MARC: Generating a Morphological Lexicon of Organization Entity Names

This paper describes methods used for generating a morphological lexicon of organization entity names in Croatian. This resource is intended for two primary tasks: template-based natural language generation and named entity identification. The main problems concerning the lexicon generation are high...

Permalink:	http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:316364/Details
Parent publication:	Proceedings of the Sixth International Language Resources and Evaluation (LREC'08) Marrakech, Morocco : European Language Resources Association (ELRA), 2008
Main authors:	Ljubešić, Nikola, computer scientist (-), Boras, Damir (Author), Lauc, Tomislava
Material type:	Article
Language:	eng
Online access:	http://www.lrec-conf.org/proceedings/lrec2008/


LEADER	02721naa a2200265uu 4500
008	131111s2008 xx 1 eng\|d
035			\|a (CROSBI)418517
040			\|a HR-ZaFF \|b hrv \|c HR-ZaFF \|e ppiak
100	1		\|9 445 \|a Ljubešić, Nikola, \|c computer scientist
245	1	0	\|a Generating a Morphological Lexicon of Organization Entity Names / \|c Ljubešić, Nikola ; Lauc, Tomislava ; Boras, Damir.
246	3		\|i Title in English: \|a Generating a Morphological Lexicon of Organization Entity Names
300			\|f pp.
520			\|a This paper describes methods used for generating a morphological lexicon of organization entity names in Croatian. This resource is intended for two primary tasks: template-based natural language generation and named entity identification. The main problems in generating the lexicon are the high level of inflection in Croatian and the low linguistic quality of the primary resource containing named entities in normal form. The problem is divided into two subproblems concerning single-word and multi-word expressions. The single-word problem is solved by training a supervised learning algorithm called linear successive abstraction. With existing common language morphological resources and two simple hand-crafted rules backing up the algorithm, an accuracy of 98.70% on the test set is achieved. The multi-word problem is solved through a semi-automated process for multi-word entities occurring in the first 10,000 named entities. The generated multi-word lexicon will be used for natural language generation only, while named entity identification will be solved algorithmically in forthcoming research. The single-word lexicon is capable of handling both tasks.
536			\|a MZOS project \|f 130-1301679-1380
536			\|a MZOS project \|f 130-1301799-1999
546			\|a ENG
690			\|a 5.04
693			\|a morphological lexicon, lexicon generation, organization entity names, linear successive abstraction \|l hrv \|2 crosbi
693			\|a morphological lexicon, lexicon generation, organization entity names, linear successive abstraction \|l eng \|2 crosbi
773	0		\|a Sixth International Language Resources and Evaluation Conference (28-30 May 2008 ; Marrakech, Morocco) \|t Proceedings of the Sixth International Language Resources and Evaluation (LREC'08) \|d Marrakech, Morocco : European Language Resources Association (ELRA), 2008 \|n Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias \|z 2-9517408-4-0
700	1		\|9 418 \|a Boras, Damir \|4 aut
700	1		\|9 436 \|a Lauc, Tomislava \|4 aut
856			\|u http://www.lrec-conf.org/proceedings/lrec2008/
942			\|c RZB \|u 2 \|v Peer review \|z Scientific - Poster - Full paper \|t 1.08
999			\|c 316364 \|d 316362
