Domain-aware Evaluation of Named Entity Recognition Systems for Croatian

We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this end, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper tex...


Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:310233/Details
Parent publication: CIT. Journal of computing and information technology
21 (2013), 3; pp. 1-15
Main authors: Agić, Željko; Bekavac, Božo (Author)
Material type: Article
Language: eng
Online access: http://cit.srce.unizg.hr
LEADER 02191naa a2200265uu 4500
008 131105s2013 xx eng|d
022 |a 1330-1136 
035 |a (CROSBI)643539 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 495  |a Agić, Željko 
245 1 0 |a Domain-aware Evaluation of Named Entity Recognition Systems for Croatian /  |c Agić, Željko ; Bekavac, Božo. 
246 3 |i Naslov na engleskom:  |a Domain-aware Evaluation of Named Entity Recognition Systems for Croatian 
300 |a 1-15  |f str. 
363 |a 21  |b 3  |i 2013 
520 |a We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this end, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tagset denoting personal names, locations and organizations. We give insight into feature selection, domain sensitivity and the effects of increasing training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian considering domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an F1-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system shows consistency in state-of-the-art scores for detecting names of persons, locations and organizations. 
536 |a Projekt MZOS  |f 130-1300646-1776 
546 |a ENG 
690 |a 5.04 
693 |a named entity recognition, Croatian language, text domain, domain dependence, evaluation  |l hrv  |2 crosbi 
693 |a named entity recognition, Croatian language, text domain, domain dependence, evaluation  |l eng  |2 crosbi 
773 0 |t CIT. Journal of computing and information technology  |x 1330-1136  |g 21 (2013), 3 ; str. 1-15 
700 1 |9 835  |a Bekavac, Božo  |4 aut 
856 |u http://cit.srce.unizg.hr 
942 |c CLA  |t 1.01  |u 2  |z Znanstveni - clanak 
999 |c 310233  |d 310231