MARC: Tagset reductions in morphosyntactic tagging of Croatian texts

Permalink:	http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:316466/Details
Parent publication:	The Future of Information Sciences: Digital Resources and Knowledge Sharing. Zagreb : Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 2009
Main authors:	Agić, Željko; Tadić, Marko; Dovedan Han, Zdravko
Material type:	Article
Language:	English
Online access:	http://bib.irb.hr/datoteka/433148.zamtzd_infuture09_final.pdf


LEADER	02717naa a2200277uu 4500
008	131111s2009 xx 1 eng|d
035			|a (CROSBI)433148
040			|a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak
100	1		|9 495 |a Agić, Željko
245	1	0	|a Tagset reductions in morphosyntactic tagging of Croatian texts / |c Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko.
246	3		|i English title: |a Tagset Reductions in Morphosyntactic Tagging of Croatian Texts
300			|a 289-298 |f pp.
520			|a Morphosyntactic tagging of Croatian texts is performed with stochastic taggers using a language model built on a manually annotated corpus that implements the Multext-East version 3 specifications for Croatian. Tagging accuracy in this framework is largely predetermined, i.e. it depends proportionally on two factors: the size of the training corpus and the number of distinct morphosyntactic tags encompassed by that corpus. Since the 100 kw Croatia Weekly newspaper corpus by definition yields a rather small language model for stochastic tagging of free-domain texts, the paper presents an approach based on tagset reductions. Several meaningful subsets of the Croatian Multext-East version 3 morphosyntactic tagset specifications are created and applied to Croatian texts with the CroTag stochastic tagger, measuring overall tagging accuracy and F1-measures. The obtained results are discussed in terms of applying different reductions in different natural language processing systems and in specific tasks defined by specific user requirements.
536			|a MZOS project |f 130-1300646-0645
536			|a MZOS project |f 130-1300646-1776
546			|a ENG
690			|a 5.04
690			|a 6.03
693			|a morphosyntactic tagging, part-of-speech tagging, stochastic tagger, Multext East tagset, tagset reductions, Croatian language |l hrv |2 crosbi
693			|a morphosyntactic tagging, part-of-speech tagging, stochastic tagger, Multext East tagset, tagset reductions, Croatian language |l eng |2 crosbi
773	0		|a 2nd International Conference The Future of Information Sciences (INFuture 2009) (3-6.11.2009. ; Zagreb, Croatia) |t The Future of Information Sciences: Digital Resources and Knowledge Sharing |d Zagreb : Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 2009 |n Stančić, Hrvoje ; Seljan, Sanja ; Bawden, David ; Lasić-Lazić, Jadranka ; Slavić, Aida |z 978-953-175-355-5 |g pp. 289-298
700	1		|9 888 |a Tadić, Marko |4 aut
700	1		|9 415 |a Dovedan Han, Zdravko |4 aut
856			|u http://bib.irb.hr/datoteka/433148.zamtzd_infuture09_final.pdf
942			|c RZB |u 2 |v Peer-reviewed |z Scientific - Lecture - Full paper |t 1.08
999			|c 316466 |d 316464
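The abstract in field 520 describes evaluating a stochastic tagger against several reduced subsets of the Multext-East version 3 tagset, scored by overall accuracy and F1-measures. The following is a minimal Python sketch of that kind of evaluation, under stated assumptions. The reduction used here, truncating each positional MSD tag to its first `keep` positions, is a hypothetical stand-in and not necessarily one of the subsets defined in the paper. The helper names (`reduce_tag`, `evaluate`, etc.) and the example tags are illustrative. They are not taken from CroTag or from the Croatia Weekly corpus.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def reduce_tag(msd: str, keep: Optional[int]) -> str:
    """Hypothetical reduction: keep only the first `keep` positions of a
    positional MSD tag (None keeps the full tag), then drop trailing '-'
    placeholders that mark unspecified attributes."""
    return msd[:keep].rstrip("-")


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Token-level tagging accuracy: share of tokens whose tags match."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_per_tag(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Per-tag F1: each mismatch is a false positive for the predicted tag
    and a false negative for the gold tag."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    scores = {}
    for tag in set(gold) | set(pred):
        precision = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        recall = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        total = precision + recall
        scores[tag] = 2 * precision * recall / total if total else 0.0
    return scores


def evaluate(gold: List[str], pred: List[str],
             keep: Optional[int]) -> Tuple[float, Dict[str, float]]:
    """Apply the same reduction to gold and predicted tags, then score."""
    g = [reduce_tag(t, keep) for t in gold]
    p = [reduce_tag(t, keep) for t in pred]
    return accuracy(g, p), f1_per_tag(g, p)


# Illustrative MSD-style tags only (not drawn from any real corpus).
gold = ["Ncmsn", "Vmip3s", "Afpmsnn", "Z"]
pred = ["Ncmsa", "Vmip3s", "Afpmsan", "Z"]
for keep in (None, 5, 1):
    acc, _ = evaluate(gold, pred, keep)
    print(keep, acc)  # prints "None 0.5", then "5 0.75", then "1 1.0"
```

In this toy example, the case errors on the noun and adjective count against the full tagset. They disappear once the tags are cut back to five positions or to the part of speech alone. This illustrates why coarser tagsets tend to score higher: fewer distinctions can be mispredicted, at the cost of morphosyntactic detail that some applications may need.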
