Combining part-of-speech tagger and inflectional lexicon for Croatian

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:315735/Details
Host publication: Proceedings of the 6th Language Technologies Conference
Ljubljana, Slovenia : Institut Jožef Stefan, 2008
Main authors: Agić, Željko; Tadić, Marko; Dovedan Han, Zdravko
Material type: Article
Language: English
Online access: http://bib.irb.hr/datoteka/363913.2008-ISLTC-ZAMTZD-final.pdf
LEADER 02841naa a2200301uu 4500
008 131111s2008 xx 1 eng|d
035 |a (CROSBI)363913 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 495  |a Agić, Željko 
245 1 0 |a Combining part-of-speech tagger and inflectional lexicon for Croatian /  |c Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko. 
246 3 |i English title:  |a Combining Part-of-Speech Tagger and Inflectional Lexicon for Croatian 
300 |a 116-121  |f pp. 
520 |a This paper investigates several methods of combining the output of a second-order Hidden Markov Model PoS/MSD tagger with a high-coverage inflectional lexicon for Croatian. Our primary motivation was to improve the overall tagging accuracy of Croatian texts using our newly developed PoS/MSD tagger. We also wanted to compare its tagging results, both standalone and with the morphological lexicon, to those previously reported in (Agić, Tadić, 2006) for the TnT statistical tagger applied to Croatian. We used TnT as a reference point because both taggers implement the second-order HMM tagging procedure. We first explain the basic idea behind the experiment, its motivation, and its importance for processing the Croatian language. We also describe all the tools and language resources used in the experiment, including their operating paradigms and the relevant details of their input and output formats. With these basics in place, we describe in theory all possible methods of combining these resources and tools with respect to their paradigms and their input and output capabilities, and then put these ideas to the test using the de facto standard framework of precision, recall and F-measure. Finally, the results are discussed in detail, and conclusions and plans for future work are presented. 
536 |a MZOS project  |f 036-1300646-1986 
536 |a MZOS project  |f 130-1300646-0645 
536 |a MZOS project  |f 130-1300646-1776 
546 |a ENG 
690 |a 2.09 
690 |a 5.04 
690 |a 6.03 
693 |a PoS/MSD tagging, HMM, inflectional lexicon, Croatian language  |l hrv  |2 crosbi 
693 |a PoS/MSD tagging, HMM, inflectional lexicon, Croatian language  |l eng  |2 crosbi 
773 0 |a 11th Information Society Multiconference (IS 2008) / 6th Language Technologies Conference (IS-LTC 2008) (16-17 October 2008 ; Ljubljana, Slovenia)  |t Proceedings of the 6th Language Technologies Conference  |d Ljubljana, Slovenia : Institut Jožef Stefan, 2008  |n Erjavec, Tomaž ; Žganec Gros, Jerneja  |z 978-961-264-006-4  |g pp. 116-121 
700 1 |9 888  |a Tadić, Marko  |4 aut 
700 1 |9 415  |a Dovedan Han, Zdravko  |4 aut 
856 |u http://bib.irb.hr/datoteka/363913.2008-ISLTC-ZAMTZD-final.pdf 
942 |c RZB  |u 2  |v Peer-reviewed  |z Scientific - Lecture - Full paper  |t 1.08 
999 |c 315735  |d 315733
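The abstract says that the tagger and lexicon combinations were evaluated with precision, recall and F-measure, but the record does not say how these were computed. What follows is only a minimal sketch under a stated assumption. After combining with the lexicon, each token may carry a set of zero or more candidate MSD tags. Precision is the share of proposed tags that are correct, and recall is the share of tokens whose gold tag is among the candidates. The function name precision_recall_f1 and the tags in the example are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f1(
    gold: Sequence[str], predicted: Sequence[Set[str]]
) -> Tuple[float, float, float]:
    """Token-level precision, recall and F-measure for set-valued tag output.

    Hypothetical sketch (assumption, not the paper's documented procedure):
    each token has exactly one gold tag and zero or more candidate tags.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Total number of candidate tags proposed across all tokens.
    proposed = sum(len(tags) for tags in predicted)
    # A token contributes at most one correct tag: its gold tag, if proposed.
    hits = sum(1 for g, tags in zip(gold, predicted) if g in tags)

    precision = hits / proposed if proposed else 0.0
    recall = hits / len(gold) if gold else 0.0
    # Balanced F-measure (harmonic mean of precision and recall).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative MSD-like tags only; not taken from the paper.
    gold = ["Ncmsn", "Vmr3s", "Agpmsny"]
    predicted = [{"Ncmsn"}, {"Vmr3s", "Vmm2s"}, {"Agpmsny"}]
    print(precision_recall_f1(gold, predicted))
```

In this example, four candidate tags are proposed and all three gold tags are covered. That gives a precision of 0.75, a recall of 1.0 and an F-measure of about 0.857. When every token receives exactly one tag, precision and recall both reduce to plain tagging accuracy. Under this definition, the two measures diverge only when the lexicon leaves tokens ambiguous or unresolved.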