Tagset reductions in morphosyntactic tagging of Croatian texts



Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:316466/Details
Host publication: The Future of Information Sciences: Digital Resources and Knowledge Sharing
Zagreb : Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 2009
Main authors: Agić, Željko; Tadić, Marko; Dovedan Han, Zdravko
Material type: Article
Language: English
Online access: http://bib.irb.hr/datoteka/433148.zamtzd_infuture09_final.pdf
LEADER 02717naa a2200277uu 4500
008 131111s2009 xx 1 eng|d
035 |a (CROSBI)433148 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 495  |a Agić, Željko 
245 1 0 |a Tagset reductions in morphosyntactic tagging of Croatian texts /  |c Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko. 
246 3 |i English title:  |a Tagset Reductions in Morphosyntactic Tagging of Croatian Texts 
300 |a 289-298  |f pp. 
520 |a Morphosyntactic tagging of Croatian texts is performed with stochastic taggers using a language model built on a manually annotated corpus that implements the Multext-East version 3 specifications for Croatian. Tagging accuracy in this framework is essentially predetermined, i.e. it depends on two factors: the size of the training corpus and the number of distinct morphosyntactic tags in that corpus. Since the 100,000-word (100 kw) Croatia Weekly newspaper corpus necessarily yields a rather small language model for stochastic tagging of free-domain texts, the paper presents an approach based on tagset reductions. Several meaningful subsets of the Croatian Multext-East version 3 morphosyntactic tagset specifications are created and applied to Croatian texts with the CroTag stochastic tagger, measuring overall tagging accuracy and F1-measures. The results are discussed in terms of applying different reductions in different natural language processing systems and to specific tasks defined by specific user requirements. 
536 |a Projekt MZOS  |f 130-1300646-0645 
536 |a Projekt MZOS  |f 130-1300646-1776 
546 |a ENG 
690 |a 5.04 
690 |a 6.03 
693 |a morphosyntactic tagging, part-of-speech tagging, stochastic tagger, Multext East tagset, tagset reductions, Croatian language  |l hrv  |2 crosbi 
693 |a morphosyntactic tagging, part-of-speech tagging, stochastic tagger, Multext East tagset, tagset reductions, Croatian language  |l eng  |2 crosbi 
773 0 |a 2nd International Conference The Future of Information Sciences (INFuture 2009) (3-6.11.2009. ; Zagreb, Croatia)  |t The Future of Information Sciences: Digital Resources and Knowledge Sharing  |d Zagreb : Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 2009  |n Stančić, Hrvoje ; Seljan, Sanja ; Bawden, David ; Lasić-Lazić, Jadranka ; Slavić, Aida  |z 978-953-175-355-5  |g pp. 289-298 
700 1 |9 888  |a Tadić, Marko  |4 aut 
700 1 |9 415  |a Dovedan Han, Zdravko  |4 aut 
856 |u http://bib.irb.hr/datoteka/433148.zamtzd_infuture09_final.pdf 
942 |c RZB  |u 2  |v Peer-reviewed  |z Scientific - Lecture - Full paper  |t 1.08 
999 |c 316466  |d 316464
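The abstract in field 520 describes collapsing the full Multext-East version 3 tagset into smaller subsets and then measuring overall tagging accuracy and F1-measures. The following is a minimal Python sketch of that evaluation idea, under clearly stated assumptions: the toy tags only imitate the positional Multext-East style and were not checked against the Croatian specification, and the reductions `pos_only` and `truncate` are hypothetical illustrations, not the subsets defined in the paper. The sketch also collapses tags only at evaluation time, whereas the paper applies its reduced tagsets with the CroTag tagger, which is not modelled here.

```python
from collections import Counter


# Hypothetical reductions (NOT the paper's subsets): each maps a full
# positional MSD string onto a coarser tag.
def pos_only(msd):
    """Keep only the first position (part of speech), e.g. 'Ncmsn' -> 'N'."""
    return msd[:1]


def truncate(n):
    """Return a reduction that keeps the first n positions of an MSD."""
    def reduction(msd):
        return msd[:n]
    return reduction


def identity(msd):
    """No reduction: evaluate against the full tag."""
    return msd


def accuracy(gold, pred, reduction=identity):
    """Share of tokens whose reduced predicted tag equals the reduced gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences differ in length")
    hits = sum(reduction(g) == reduction(p) for g, p in zip(gold, pred))
    return hits / len(gold)


def f1_per_tag(gold, pred, reduction=identity):
    """Per-tag F1 = 2PR / (P + R), computed over the reduced tagset."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences differ in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        rg, rp = reduction(g), reduction(p)
        if rg == rp:
            tp[rg] += 1
        else:
            fp[rp] += 1  # predicted tag was wrong
            fn[rg] += 1  # gold tag was missed
    scores = {}
    for tag in set(tp) | set(fp) | set(fn):
        prec = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        rec = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        scores[tag] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores


# Toy data (illustrative only): one error in the final tag position.
gold = ["Ncmsn", "Vmip3s", "Ncfsa", "Z"]
pred = ["Ncmsa", "Vmip3s", "Ncfsa", "Z"]
print(accuracy(gold, pred))               # 0.75
print(accuracy(gold, pred, truncate(4)))  # 1.0
```

Dropping the final position (assumed here, for illustration, to encode case) turns the single error into a match, which shows why a coarser tagset tends to score higher; whether the lost information is acceptable depends on the task the tagger serves.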