Preparation of POS tagging of Croatian using CLaRK System

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:314159/Details
Parent publication: Proceeding of RANLP2003 Conference
Sofia : BAS, 2003
Main authors: Tadić, Marko; Bekavac, Božo (Author)
Material type: Article
Language: eng
LEADER 02026naa a2200229uu 4500
008 131111s2003 xx 1 eng|d
035 |a (CROSBI)126493 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Tadić, Marko 
245 1 0 |a Preparation of POS tagging of Croatian using CLaRK System /  |c Tadić, Marko ; Bekavac, Božo. 
246 3 |i English title:  |a Preparation of POS tagging of Croatian using CLaRK System 
300 |a 455-459  |f pp. 
520 |a This paper presents the first results of POS tagging of Croatian texts using a word-form list generated from the Croatian Morphological Lexicon. A corpus of 500,000 tokens was processed using the CLaRK System, developed at SFS Tübingen and LML Sofia. The first part of the paper describes the process of mapping word-forms, with their accompanying MSDs (morphosyntactic descriptions), to tokens in the corpus. The phenomena of “internal” homography (several word-forms of the same lemma sharing the same form) and “external” homography (word-forms potentially belonging to different lemmas sharing the same form) are discussed. Statistics measuring the MSD and lemma ambiguity of Croatian nouns, verbs and adjectives are also presented. The final part of the paper describes the extraction and quantification of several POS patterns from the same corpus that are expected to represent characteristic patterns of multiword terminological units in Croatian. 
536 |a MZOS project  |f 0130418 
546 |a ENG 
690 |a 6.03 
693 |a Croatian Language, Croatian Morphological Lexicon, POS tagging, homography, CLaRK system  |l hrv  |2 crosbi 
693 |a Croatian Language, Croatian Morphological Lexicon, POS tagging, homography, CLaRK system  |l eng  |2 crosbi 
700 1 |a Bekavac, Božo  |4 aut 
773 0 |a Recent Advances in Natural Language Processing 2003 (10-12.09.2003 ; Borovets, Bulgaria)  |t Proceeding of RANLP2003 Conference  |d Sofia : BAS, 2003  |g pp. 455-459 
942 |c RZB  |u 1  |v Peer-reviewed  |z Scientific - Poster - Full paper 
999 |c 314159  |d 314157