Tagset reductions in morphosyntactic tagging of Croatian texts
Morphosyntactic tagging of Croatian texts is performed with stochastic taggers by using a language model built on a manually annotated corpus implementing the Multext East version 3 specifications for Croatian. Tagging accuracy in this framework is basically predefined, i.e. proportionally dependent...
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:316466/Details |
---|---|
Parent publication: |
The Future of Information Sciences: Digital Resources and Knowledge Sharing Zagreb : Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 2009 |
Main authors: | Agić, Željko (-), Tadić, Marko (Author), Dovedan Han, Zdravko |
Material type: | Article |
Language: | eng |
Online access: |
http://bib.irb.hr/datoteka/433148.zamtzd_infuture09_final.pdf |
LEADER | 02717naa a2200277uu 4500 | ||
---|---|---|---|
008 | 131111s2009 xx 1 eng|d | ||
035 | |a (CROSBI)433148 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |9 495 |a Agić, Željko | |
245 | 1 | 0 | |a Tagset reductions in morphosyntactic tagging of Croatian texts / |c Agić, Željko ; Tadić, Marko ; Dovedan, Zdravko. |
246 | 3 | |i Title in English: |a Tagset Reductions in Morphosyntactic Tagging of Croatian Texts | |
300 | |a 289-298 |f str. | ||
520 | |a Morphosyntactic tagging of Croatian texts is performed with stochastic taggers by using a language model built on a manually annotated corpus implementing the Multext East version 3 specifications for Croatian. Tagging accuracy in this framework is basically predefined, i.e. proportionally dependent of two things: the size of the training corpus and the number of different morphosyntactic tags encompassed by that corpus. Being that the 100 kw Croatia Weekly newspaper corpus by definition makes a rather small language model in terms of stochastic tagging of free domain texts, the paper presents an approach dealing with tagset reductions. Several meaningful subsets of the Croatian Multext-East version 3 morphosyntactic tagset specifications are created and applied on Croatian texts with the CroTag stochastic tagger, measuring overall tagging accuracy and F1-measures. Obtained results are discussed in terms of applying different reductions in different natural language processing systems and specific tasks defined by specific user requirements. | ||
536 | |a Projekt MZOS |f 130-1300646-0645 | ||
536 | |a Projekt MZOS |f 130-1300646-1776 | ||
546 | |a ENG | ||
690 | |a 5.04 | ||
690 | |a 6.03 | ||
693 | |a morphosyntactic tagging, part-of-speech tagging, stochastic tagger, Multext East tagset, tagset reductions, Croatian language |l hrv |2 crosbi | ||
693 | |a morphosyntactic tagging, part-of-speech tagging, stochastic tagger, Multext East tagset, tagset reductions, Croatian language |l eng |2 crosbi | ||
773 | 0 | |a 2nd International Conference The Future of Information Sciences (INFuture 2009) (3-6.11.2009. ; Zagreb, Hrvatska) |t The Future of Information Sciences: Digital Resources and Knowledge Sharing |d Zagreb : Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 2009 |n Stančić, Hrvoje ; Seljan, Sanja ; Bawden, David ; Lasić-Lazić, Jadranka ; Slavić, Aida |z 978-953-175-355-5 |g str. 289-298 | |
700 | 1 | |9 888 |a Tadić, Marko |4 aut | |
700 | 1 | |9 415 |a Dovedan Han, Zdravko |4 aut | |
856 | |u http://bib.irb.hr/datoteka/433148.zamtzd_infuture09_final.pdf | ||
942 | |c RZB |u 2 |v Recenzija |z Znanstveni - Predavanje - CijeliRad |t 1.08 | ||
999 | |c 316466 |d 316464 |