Croatian Lemmatization Server
The need for lemmatization in inflectionally rich languages is indisputable: it applies to the whole range of procedures, from text search to parsing. Of the two predominant approaches to lemmatization, 1) algorithmic (generally rule-based and realized with FSA) and 2) relational...
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:315117/Details |
---|---|
Host publication: |
Formal Approaches to South Slavic and Balkan Languages Sofia : Bulgarian Academy of Sciences, 2006 |
Main author: | Tadić, Marko (-) |
Material type: | Article |
Language: | eng |
Online access: |
http://bib.irb.hr/datoteka/280673.MT4FASSBL2006.pdf http://hnk.ffzg.hr/txts/mt4FASSBL2006.pdf |
LEADER | 02530naa a2200241uu 4500 | ||
---|---|---|---|
008 | 131111s2006 xx 1 eng|d | ||
035 | |a (CROSBI)280673 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |a Tadić, Marko | |
245 | 1 | 0 | |a Croatian Lemmatization Server / |c Tadić, Marko. |
246 | 3 | |i Title in English: |a Croatian Lemmatization Server | |
300 | |a 140-146 |f pp. | ||
520 | |a The need for lemmatization in inflectionally rich languages is indisputable: it applies to the whole range of procedures, from text search to parsing. Of the two predominant approaches to lemmatization, 1) algorithmic (generally rule-based and realized with FSA) and 2) relational (generally data-driven and realized with databases), this paper opted for the latter. One reason is that formal-grammar approaches to Croatian morphology are rare and limited to only part of the morphological system. The other is that a generator for Croatian has already been developed (Tadić 1994), as has the Croatian Morphological Lexicon (CML) (Tadić & Fulgosi 2003). The idea was to offer an on-line lemmatization and POS/MSD tagging service using CML v 4.5 as the back-end. The Croatian Lemmatization Server (CLS) is available at http://hml.hnk.ffzg.hr and currently offers lemmatization and POS/MSD tagging at the unigram level. For each token in the submitted text, the server delivers all possible lemmas of which this token may be a word-form. For homographic tokens, each lemma is accompanied by all possible POS/MSD tags, which comply with the MULTEXT-East v3 specifications for Croatian. The CLS can also be used for generation: when a lemma is entered and marked, all its possible word-forms are retrieved and delivered. | ||
536 | |a MZOS project |f 0130418 | ||
546 | |a ENG | ||
690 | |a 6.03 | ||
693 | |a lemmatization, POS tagging, MSD tagging, Croatian, web-service |l hrv |2 crosbi | ||
693 | |a lemmatization, POS tagging, MSD tagging, Croatian, web-service |l eng |2 crosbi | ||
773 | 0 | |a Fifth International Conference Formal Approaches to South Slavic and Balkan languages (FASSBL) (18-20.10.2006. ; Sofia, Bulgaria) |t Formal Approaches to South Slavic and Balkan Languages |d Sofia : Bulgarian Academy of Sciences, 2006 |n Vulchanova, Mila Dimitrova ; Koeva, Svetla ; Krapova, Iliyana ; Vulchanov, Valentin |g pp. 140-146 | |
856 | |u http://bib.irb.hr/datoteka/280673.MT4FASSBL2006.pdf | ||
856 | |u http://hnk.ffzg.hr/txts/mt4FASSBL2006.pdf | ||
942 | |c RZB |u 1 |v Peer review |z Scientific - Lecture - Full paper | ||
999 | |c 315117 |d 315115 |
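
The relational approach summarized in the abstract (one lookup from word-form to every candidate lemma with its POS/MSD tags, and the reverse lookup from lemma to word-forms for generation) can be illustrated with a minimal sketch. This is not the CLS implementation: the table layout, the function names `build_lexicon`, `lemmatize` and `generate`, and the handful of toy entries are all assumptions made for illustration, and the MSD strings are only indicative and have not been checked against the MULTEXT-East v3 specification or CML.

```python
import sqlite3

# Toy entries (word-form, lemma, MSD). Hypothetical sample data for
# illustration only; not taken from the Croatian Morphological Lexicon.
TOY_LEXICON = [
    ("mora", "mora", "Ncfsn"),     # noun "mora" (nightmare)
    ("mora", "more", "Ncnsg"),     # noun "more" (sea), genitive singular
    ("mora", "more", "Ncnpn"),     # noun "more" (sea), nominative plural
    ("mora", "morati", "Vmip3s"),  # verb "morati" (must), 3rd person singular
    ("more", "more", "Ncnsn"),
    ("moru", "more", "Ncnsd"),
    ("moram", "morati", "Vmip1s"),
]


def build_lexicon(rows):
    """Load (wordform, lemma, msd) triples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE forms (wordform TEXT, lemma TEXT, msd TEXT)")
    conn.execute("CREATE INDEX idx_wordform ON forms (wordform)")
    conn.execute("CREATE INDEX idx_lemma ON forms (lemma)")
    conn.executemany("INSERT INTO forms VALUES (?, ?, ?)", rows)
    return conn


def lemmatize(conn, token):
    """Return {lemma: [msd, ...]} for every lemma the token may be a form of.

    Unigram level only: no context is used, so homographs are not disambiguated.
    """
    result = {}
    query = "SELECT lemma, msd FROM forms WHERE wordform = ? ORDER BY lemma, msd"
    for lemma, msd in conn.execute(query, (token.lower(),)):
        result.setdefault(lemma, []).append(msd)
    return result


def generate(conn, lemma):
    """Return all distinct word-forms stored for the given lemma."""
    query = "SELECT DISTINCT wordform FROM forms WHERE lemma = ? ORDER BY wordform"
    return [row[0] for row in conn.execute(query, (lemma,))]


conn = build_lexicon(TOY_LEXICON)
print(lemmatize(conn, "mora"))  # every candidate lemma with its tags
print(generate(conn, "more"))   # every stored word-form of the lemma
```

An unknown token simply yields an empty result here. How the actual server treats out-of-lexicon tokens is not stated in this record.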