Evaluating full lemmatization of Croatian texts

Full description

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:312004/Details
Host publication: Recent Advances in Intelligent Information Systems
Series: Challenging Problems of Science: Computer Science
Main authors: Agić, Željko; Tadić, Marko; Dovedan Han, Zdravko
Material type: Article
Language: English
Online access: http://bib.irb.hr/datoteka/390143.zamtzd_bsnlp09.pdf
LEADER 02330naa a2200301uu 4500
008 131111s2009 xx eng|d
020 |a 978-83-60434-59-8 
035 |a (CROSBI)390143 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 495  |a Agić, Željko 
245 1 0 |a Evaluating full lemmatization of Croatian texts /  |c Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko. 
246 3 |i Title in English:  |a Evaluating Full Lemmatization of Croatian Texts 
300 |a 175-184  |f pp. 
520 |a The paper presents the implementation and evaluation of a module for full lemmatization of Croatian texts. The module implements several lemmatization procedures, all of them based on merging the outputs of the previously developed stochastic morphosyntactic tagger CroTag and the inflectional lexicon of Croatian. Evaluation of the lemmatization module on two test cases, simulating realistic and ideal operating conditions, yielded full lemmatization accuracy scores of 96.96 and 98.15 percent, respectively. It is also shown that a majority of errors in this framework, 57.14 percent in the realistic testing scenario, occur on word forms with external homography. Moreover, approximately 80 percent of all lemmatization errors occur on nouns, adjectives and adverbs, in that order. Language resources, the testing environment and procedure descriptions are provided in the paper, along with a discussion of the obtained results and possible future research directions. 
536 |a Projekt MZOS  |f 130-1300646-0645 
536 |a Projekt MZOS  |f 130-1300646-1776 
546 |a ENG 
690 |a 2.09 
690 |a 5.04 
690 |a 6.03 
693 |a full lemmatization, morphosyntactic tagging, Croatian language  |l hrv  |2 crosbi 
693 |a full lemmatization, morphosyntactic tagging, Croatian language  |l eng  |2 crosbi 
773 0 |t Recent Advances in Intelligent Information Systems  |d Warsaw : Academic Publishing House EXIT, 2009  |k Challenging Problems of Science: Computer Science  |h 762  |n Klopotek, Mieczyslaw ; Przepiorkowski, Adam ; Wierzchon, Slawomir ; Trojanowski, Krzysztof  |z 978-83-60434-59-8  |g pp. 175-184 
700 1 |9 888  |a Tadić, Marko  |4 aut 
700 1 |9 415  |a Dovedan Han, Zdravko  |4 aut 
856 |u http://bib.irb.hr/datoteka/390143.zamtzd_bsnlp09.pdf 
942 |c POG  |t 1.16.1  |u 2  |z Scientific 
999 |c 312004  |d 312002
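The abstract (field 520) says that lemmatization works by merging CroTag's output with the inflectional lexicon of Croatian. It does not describe the individual procedures. The Python sketch below shows one plausible form such a merge could take, and every detail in it is an assumption rather than something taken from the paper. The lexicon is a hypothetical dict that maps each word form to its (lemma, MSD tag) analyses, and the tags are illustrative MULTEXT-East-style strings. When several lemmas survive the tag filter, the sketch simply picks one, and unknown word forms back off to the form itself. The names `lemmatize` and `accuracy` are hypothetical and are not CroTag's API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon format (assumption): lowercased word form -> set of (lemma, MSD tag).
Lexicon = Dict[str, Set[Tuple[str, str]]]


def lemmatize(tagged: List[Tuple[str, str]], lexicon: Lexicon) -> List[str]:
    """Assign a lemma to each (word form, MSD tag) pair emitted by a tagger."""
    lemmas: List[str] = []
    for form, tag in tagged:
        analyses = lexicon.get(form.lower(), set())
        # Keep only lexicon analyses whose MSD agrees with the tagger's tag.
        candidates = sorted({lemma for lemma, msd in analyses if msd == tag})
        if candidates:
            # If several lemmas survive (ambiguity the tag cannot resolve),
            # this sketch just takes the alphabetically first one.
            lemmas.append(candidates[0])
        elif analyses:
            # Form is known, but not with this tag: fall back to any lexicon lemma.
            lemmas.append(sorted(lemma for lemma, _ in analyses)[0])
        else:
            # Form is absent from the lexicon: back off to the form itself.
            lemmas.append(form.lower())
    return lemmas


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Percentage of tokens whose predicted lemma equals the gold lemma."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # Toy lexicon with illustrative MULTEXT-East-style tags (not real CroTag data).
    lexicon: Lexicon = {
        "psi": {("pas", "Ncmpn")},
        "kuće": {("kuća", "Ncfsg"), ("kuća", "Ncfpn")},
    }
    tagged = [("Psi", "Ncmpn"), ("laju", "Vmr3p"), ("kuće", "Ncfsg")]
    predicted = lemmatize(tagged, lexicon)
    print(predicted)  # ['pas', 'laju', 'kuća']
    print(round(accuracy(predicted, ["pas", "lajati", "kuća"]), 2))  # 66.67
```

In the toy run, the form "laju" is missing from the lexicon, so the backoff returns the form unchanged instead of the gold lemma "lajati". A lemmatizer that relies on a lexicon can fail in exactly this way on unknown words. The tie-breaking rule is arbitrary and serves only to make the sketch deterministic. A real system would need a better-informed choice for homographs.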