Assigning inflectional paradigms to named entities by linear successive abstraction

This paper describes how a supervised learning method is used for assigning inflectional paradigms to organization entity names as the main prerequisite for generating a morphological lexicon of these named entities. An inflectional paradigm consists of a set of rules for generating all forms of a l...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:316366/Details
Host publication: Proceedings Vol.III. MIPRO 2008. Computers in Technical Systems & Intelligent Systems (CTS & CIS)
Rijeka : Croatian Society for Information and Communication Technology, 2008
Main authors: Ljubešić, Nikola, computer scientist (-), Bakarić, Nikola (Author), Lauc, Tomislava
Material type: Article
Language: eng
LEADER 02783naa a2200253uu 4500
008 131111s2008 xx 1 eng|d
035 |a (CROSBI)418550 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 445  |a Ljubešić, Nikola,   |c computer scientist 
245 1 0 |a Assigning inflectional paradigms to named entities by linear successive abstraction /  |c Ljubešić, Nikola ; Bakarić, Nikola ; Lauc, Tomislava. 
246 3 |i Title in English:  |a Assigning Inflectional Paradigms to Named Entities by Linear Successive Abstraction 
300 |a 190-193  |f pp. 
520 |a This paper describes how a supervised learning method is used for assigning inflectional paradigms to organization entity names as the main prerequisite for generating a morphological lexicon of these named entities. An inflectional paradigm consists of a set of rules for generating all forms of a lexicon entry. A morphological lexicon consists of lexicon entries and their corresponding forms. This type of language resource is crucial in tasks such as natural language generation (generating natural language business news from database data and news templates) and named entity identification (a necessary step in data mining and business intelligence). The basic resource used in this research is a list of 106,530 named entities of organizations given in basic form (nominative case) and ranked by relevance. On the first 5,000 manually tagged named entities, 59 inflectional paradigm classes are defined. Using linear successive abstraction, a suffix model is trained, validated and tested on this tagged dataset. Morphological lexica of general language, personal names and settlements are used as additional resources in the decision process. The achieved accuracy on the test set is 98.70%. 
536 |a MZOS project  |f 130-1301679-1380 
536 |a MZOS project  |f 130-1301799-1999 
546 |a ENG 
690 |a 5.04 
693 |a inflectional morphology, supervised learning, linear successive abstraction, morphological paradigm assignment, named entity  |l hrv  |2 crosbi 
693 |a inflectional morphology, supervised learning, linear successive abstraction, morphological paradigm assignment, named entity  |l eng  |2 crosbi 
700 1 |a Bakarić, Nikola  |4 aut 
700 1 |9 436  |a Lauc, Tomislava  |4 aut 
773 0 |a 31st International Convention on Information and Communication Technology, Electronics and Microelectronics (26-30.5.2008. ; Opatija, Croatia)  |t Proceedings Vol.III. MIPRO 2008. Computers in Technical Systems & Intelligent Systems (CTS & CIS)  |d Rijeka : Croatian Society for Information and Communication Technology, 2008  |n Bogunović, Nikola ; Ribarić, Slobodan  |z 978-953-233-038-0  |g pp. 190-193 
942 |c RZB  |u 2  |v Peer review  |z Scientific - Lecture - Full paper  |t 1.08 
999 |c 316366  |d 316364