MARC: Improving Part-of-Speech Tagging Accuracy for Croatian by Morphological Analysis

This paper investigates several methods of combining a second order hidden Markov model part-of-speech (morphosyntactic) tagger and a high-coverage inflectional lexicon for Croatian. Our primary motivation was to improve tagging accuracy of Croatian texts by using our newly-developed tagger CroTag,...

Permalink:	http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:307602/Details
Parent publication:	Informatica 32 (2008), 4 ; pp. 445-451
Main authors:	Agić, Željko (-), Tadić, Marko (Author), Dovedan Han, Zdravko
Material type:	Article
Language:	eng
Online access:	http://bib.irb.hr/datoteka/375351.2008-Informatica-ZAMTZD-final.pdf http://www.informatica.si/


LEADER	02749naa a2200337uu 4500
008	131105s2008 xx eng\|d
022			\|a 0350-5596
035			\|a (CROSBI)375351
040			\|a HR-ZaFF \|b hrv \|c HR-ZaFF \|e ppiak
100	1		\|9 495 \|a Agić, Željko
245	1	0	\|a Improving Part-of-Speech Tagging Accuracy for Croatian by Morphological Analysis / \|c Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko.
246	3		\|i Naslov na engleskom: \|a Improving Part-of-Speech Tagging Accuracy for Croatian by Morphological Analysis
300			\|a 445-451 \|f str.
363			\|a 32 \|b 4 \|i 2008
520			\|a This paper investigates several methods of combining a second order hidden Markov model part-of-speech (morphosyntactic) tagger and a high-coverage inflectional lexicon for Croatian. Our primary motivation was to improve tagging accuracy of Croatian texts by using our newly-developed tagger CroTag, currently in beta-version. We also wanted to compare its tagging results – both standalone and utilizing the morphological lexicon – to the ones previously described in (Agić and Tadić 2006), provided by the TnT statistical tagger which we used as a reference point having in mind that both implement the same tagging procedure. At the beginning we explain the basic idea behind the experiment, its motivation and importance from the perspective of processing the Croatian language. We also describe tools – namely tagger and lexicon – and language resources used in the experiment, including their implementation method and input/output format details that were of importance. With the basics presented, we describe in theory four possible methods of combining these resources and tools with respect to their operating paradigm, input and production capabilities and then put these ideas to test using the F-measure evaluation framework. Results are then discussed in detail and conclusions and future work plans are presented.
536			\|a Projekt MZOS \|f 036-1300646-1986
536			\|a Projekt MZOS \|f 130-1300646-0645
536			\|a Projekt MZOS \|f 130-1300646-1776
546			\|a ENG
690			\|a 2.09
690			\|a 5.04
690			\|a 6.03
693			\|a part-of-speech tagging, morphological analysis, inflectional lexicon, Croatian language \|l hrv \|2 crosbi
693			\|a part-of-speech tagging, morphological analysis, inflectional lexicon, Croatian language \|l eng \|2 crosbi
773	0		\|t Informatica \|x 0350-5596 \|g 32 (2008), 4 ; str. 445-451
700	1		\|9 888 \|a Tadić, Marko \|4 aut
700	1		\|9 415 \|a Dovedan Han, Zdravko \|4 aut
856			\|u http://bib.irb.hr/datoteka/375351.2008-Informatica-ZAMTZD-final.pdf
856			\|u http://www.informatica.si/
942			\|c CLA \|t 1.01 \|u 2 \|z Znanstveni - clanak
999			\|c 307602 \|d 307600
