Preparation of POS tagging of Croatian using CLaRK System
This paper presents the first results of POS tagging of Croatian texts using a word-form list generated from the Croatian Morphological Lexicon. A corpus of 500,000 tokens was processed with the CLaRK System, developed at SFS Tübingen and LML Sofia. The first part of the paper describes the process of mappi...
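The mapping step summarised above, together with the two kinds of homography defined in the full abstract (field 520), can be illustrated with a minimal Python sketch. It assumes a word-form list shaped as a dictionary from surface form to a set of (lemma, tag) pairs. The toy entries and the simplified tags (POS letter first) are hypothetical. They are not the actual MSD inventory of the Croatian Morphological Lexicon, nor the CLaRK System's own processing. Each token receives every candidate pair and is then classified as internally or externally homographic. The average number of tags and lemmas per recognised token serves as a rough ambiguity measure.

```python
# Hypothetical toy word-form list: surface form -> set of (lemma, tag) pairs.
# Tags are simplified MSD-like labels with the POS letter first; they are
# illustrative only, not the Croatian Morphological Lexicon's real MSDs.
WORD_FORMS = {
    "kuće": {("kuća", "N-f-sg-gen"), ("kuća", "N-f-pl-nom"),
             ("kuća", "N-f-pl-acc"), ("kuća", "N-f-pl-voc")},
    "sam": {("biti", "V-pres-1sg"), ("sam", "A-m-sg-nom")},
    "ja": {("ja", "P-1sg-nom")},
    "i": {("i", "C")},
}


def annotate(tokens, lexicon):
    """Attach every candidate (lemma, tag) pair from the lexicon to each token."""
    return [(tok, sorted(lexicon.get(tok.lower(), set()))) for tok in tokens]


def homography(candidates):
    """Classify one token's candidates as unknown, unambiguous, internal or external."""
    if not candidates:
        return "unknown"
    lemmas = {lemma for lemma, _ in candidates}
    if len(lemmas) > 1:
        return "external"  # word-forms of different lemmas share one string
    if len(candidates) > 1:
        return "internal"  # several word-forms of one lemma share one string
    return "unambiguous"


def ambiguity(annotated):
    """Average number of candidate tags and candidate lemmas per recognised token."""
    known = [cands for _, cands in annotated if cands]
    if not known:
        return 0.0, 0.0
    tags = sum(len(c) for c in known) / len(known)
    lemmas = sum(len({lemma for lemma, _ in c}) for c in known) / len(known)
    return tags, lemmas


if __name__ == "__main__":
    # Toy token sequence, not a real sentence.
    annotated = annotate(["Kuće", "i", "ja", "sam"], WORD_FORMS)
    for token, cands in annotated:
        print(token, homography(cands), cands)
    print(ambiguity(annotated))
```

Lower-casing tokens before lookup is a simplifying assumption. A real pipeline would also have to handle capitalised proper nouns and tokens that are missing from the word-form list.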
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:314159/Details |
---|---|
Parent publication: | Proceeding of RANLP2003 Conference, Sofia : BAS, 2003 |
Main authors: | Tadić, Marko; Bekavac, Božo |
Material type: | Article |
Language: | English |
LEADER | 02026naa a2200229uu 4500 | ||
---|---|---|---|
008 | 131111s2003 xx 1 eng|d | ||
035 | |a (CROSBI)126493 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |a Tadić, Marko | |
245 | 1 | 0 | |a Preparation of POS tagging of Croatian using CLaRK System / |c Tadić, Marko ; Bekavac, Božo. |
246 | 3 | |i Title in English: |a Preparation of POS tagging of Croatian using CLaRK System | |
300 | |a 455-459 |f pp. | ||
520 | |a This paper presents the first results of POS tagging of Croatian texts using a word-form list generated from the Croatian Morphological Lexicon. A corpus of 500,000 tokens was processed with the CLaRK System, developed at SFS Tübingen and LML Sofia. The first part of the paper describes the process of mapping word-forms, together with their accompanying MSDs (morphosyntactic descriptions), to tokens in the corpus. The phenomena of "internal" homography (several word-forms of the same lemma sharing the same form) and "external" homography (word-forms potentially belonging to different lemmas sharing the same form) are discussed. Statistics measuring the MSD and lemma ambiguity of Croatian nouns, verbs and adjectives are also presented. The final part of the paper describes the extraction and quantification of several POS patterns from the same corpus that are expected to represent characteristic patterns of multiword terminological units in Croatian. | ||
536 | |a MZOS project |f 0130418 | ||
546 | |a ENG | ||
690 | |a 6.03 | ||
693 | |a Croatian Language, Croatian Morphological Lexicon, POS tagging, homography, CLaRK system |l hrv |2 crosbi | ||
693 | |a Croatian Language, Croatian Morphological Lexicon, POS tagging, homography, CLaRK system |l eng |2 crosbi | ||
700 | 1 | |a Bekavac, Božo |4 aut | |
773 | 0 | |a Recent Advances in Natural Language Processing 2003 (10-12.09.2003 ; Borovets, Bulgaria) |t Proceeding of RANLP2003 Conference |d Sofia : BAS, 2003 |g pp. 455-459 | |
942 | |c RZB |u 1 |v Peer review |z Scientific - Poster - Full paper | ||
999 | |c 314159 |d 314157 |
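The final step described in the abstract is the extraction and quantification of POS patterns that may mark multiword terminological units. It can be sketched as counting POS n-grams over tagged text. This minimal sketch assumes two things. First, tokens are already disambiguated to a single (token, tag) pair. Second, the POS is the first character of the tag. The pattern list and the toy tags are hypothetical, because this record does not name the patterns the paper actually used.

```python
from collections import Counter, defaultdict

# Hypothetical POS patterns for candidate multiword terms; the record does
# not list the patterns the paper actually extracted.
PATTERNS = [("A", "N"), ("N", "N"), ("N", "A", "N")]


def count_patterns(tagged, patterns):
    """Count each POS pattern over a sliding window and collect matched word sequences.

    `tagged` is a list of (token, tag) pairs assumed to be already
    disambiguated; the POS is taken to be the first character of the tag.
    """
    pos = [tag[0] for _, tag in tagged]
    counts = Counter()
    matches = defaultdict(list)
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(pos) - n + 1):
            if tuple(pos[i:i + n]) == pattern:
                counts[pattern] += 1
                matches[pattern].append(" ".join(tok for tok, _ in tagged[i:i + n]))
    return counts, matches


if __name__ == "__main__":
    # Toy tagged text ("natural language processing and machine translation")
    # with hypothetical simplified tags.
    tagged = [
        ("obrada", "N-f-sg-nom"), ("prirodnog", "A-m-sg-gen"),
        ("jezika", "N-m-sg-gen"), ("i", "C"),
        ("strojno", "A-n-sg-nom"), ("prevođenje", "N-n-sg-nom"),
    ]
    counts, matches = count_patterns(tagged, PATTERNS)
    for pattern in PATTERNS:
        print("-".join(pattern), counts[pattern], matches[pattern])
```

A plain sliding window counts overlapping matches. As a result, a three-word N-A-N match such as "obrada prirodnog jezika" also contributes an A-N match. Whether nested candidates should be counted separately is a design choice this sketch leaves open.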