Error analysis in Croatian morphosyntactic tagging

In this paper, we provide detailed insight on properties of errors generated by a stochastic morphosyntactic tagger assigning Multext-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus by the CroTag tagger in stochastic mode revealed that approximately 8...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:315892/Details
Parent publication: Proceedings of the 31st International Conference on Information Technology Interfaces
Zagreb : SRCE University Computer Centre, University of Zagreb, 2009
Main authors: Agić, Željko (-), Tadić, Marko (Author), Dovedan Han, Zdravko
Material type: Article
Language: English
Online access: http://bib.irb.hr/datoteka/389315.zamtzd_iti09.pdf
LEADER 02369naa a2200277uu 4500
008 131111s2009 xx 1 eng|d
035 |a (CROSBI)389315 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 495  |a Agić, Željko 
245 1 0 |a Error analysis in Croatian morphosyntactic tagging /  |c Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko. 
246 3 |i Title in English:  |a Error Analysis in Croatian Morphosyntactic Tagging 
300 |a 521-526  |f pp. 
520 |a In this paper, we provide detailed insight on properties of errors generated by a stochastic morphosyntactic tagger assigning Multext-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus by the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. On the basis of these properties, we propose rule-based and stochastic strategies which could be integrated in the tagging module, creating a hybrid procedure in order to raise overall tagging accuracy for Croatian. 
536 |a MZOS project  |f 130-1300646-0645 
536 |a MZOS project  |f 130-1300646-1776 
546 |a ENG 
690 |a 5.04 
690 |a 6.03 
693 |a morphosyntactic tagging, part-of-speech tagging, error analysis, error distribution, Croatian language, hybrid tagging  |l hrv  |2 crosbi 
693 |a morphosyntactic tagging, part-of-speech tagging, error analysis, error distribution, Croatian language, hybrid tagging  |l eng  |2 crosbi 
773 0 |a 31st International Conference on Information Technology Interfaces (ITI 2009) (22-25.06.2009. ; Cavtat, Croatia)  |t Proceedings of the 31st International Conference on Information Technology Interfaces  |d Zagreb : SRCE University Computer Centre, University of Zagreb, 2009  |n Lužar-Stiffler, Vesna ; Jarec, Iva ; Bekić, Zoran  |z 978-953-7138-16-5  |g pp. 521-526 
700 1 |9 888  |a Tadić, Marko  |4 aut 
700 1 |9 415  |a Dovedan Han, Zdravko  |4 aut 
856 |u http://bib.irb.hr/datoteka/389315.zamtzd_iti09.pdf 
942 |c RZB  |u 2  |v Peer review  |z Scientific - Lecture - Full paper  |t 1.08 
999 |c 315892  |d 315890