The Croatian Lemmatization Server

The need for lemmatization in inflectionally rich languages is indisputable: it applies to the whole range of procedures, from text search to parsing. Of the two predominant approaches to lemmatization (algorithmic, generally rule-based and realized with FSA; and relation...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:307125/Details
Parent publication: Southern Journal of Linguistics
29 (2005), 1/2 ; pp. 206-217
Main author: Tadić, Marko (-)
Material type: Article
Language: eng
LEADER 02271naa a2200253uu 4500
008 131105s2005 xx eng|d
022 |a 0730-6245 
035 |a (CROSBI)332419 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Tadić, Marko 
245 1 4 |a The Croatian Lemmatization Server /  |c Tadić, Marko. 
246 3 |i Title in English:  |a The Croatian Lemmatization Server 
300 |a 206-217  |f pp. 
363 |a 29  |b 1/2  |i 2005 
520 |a The need for lemmatization in inflectionally rich languages is indisputable: it applies to the whole range of procedures, from text search to parsing. Of the two predominant approaches to lemmatization (algorithmic, generally rule-based and realized with FSA; and relational, generally data-driven and realized with databases), this paper opts for the latter. One reason is that formal-grammar approaches to Croatian morphology are rare and cover only part of the morphological system. The other is that a generator for Croatian had already been developed (Tadić 1994), as had the Croatian Morphological Lexicon (CML) (Tadić and Fulgosi 2003). The idea was to offer an online lemmatization and POS/MSD tagging service using CML v4.5 as the back end. The Croatian Lemmatization Server (CLS) is available at http://hml.hnk.ffzg.hr and currently offers lemmatization and POS/MSD tagging at the unigram level. For each token in the submitted text, the server delivers all lemmas of which the token may be a word-form. For homographic tokens, each lemma is accompanied by all possible POS/MSD tags, which comply with the MulTextEast v3 specifications for Croatian. The CLS can also be used for generation: when a lemma is entered and marked, all its possible word-forms are retrieved and delivered. 
536 |a MZOS project  |f 0130418 
546 |a ENG 
690 |a 5.04 
690 |a 6.03 
693 |a lemmatization, morphological processing, computational linguistics, Croatian, web service  |l hrv  |2 crosbi 
693 |a lemmatization, morphological processing, computational linguistics, Croatian, web service  |l eng  |2 crosbi 
773 0 |t Southern Journal of Linguistics  |x 0730-6245  |g 29 (2005), 1/2 ; pp. 206-217 
942 |c CLA  |t 1.01  |u 1  |z Scientific - article 
999 |c 307125  |d 307123
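
Illustration (not part of the catalog record): the abstract describes a relational approach in which every token is looked up in a stored table of word-forms, and the same table is read in reverse for generation. The sketch below shows that idea with an in-memory SQLite table. It is only a minimal sketch under stated assumptions: the table layout, the function names (build_lexicon, lemmatize, generate, lemmatize_text) and the handful of rows are hypothetical and are not taken from CML or from the CLS; the MSD strings are illustrative placeholders in the general style of MulTextEast tags.

```python
import sqlite3

# Toy rows (word_form, lemma, msd). Illustrative only; NOT taken from CML.
ROWS = [
    ("kosa", "kosa", "Ncfsn"),
    ("kose", "kosa", "Ncfsg"),
    ("kose", "kosa", "Ncfpn"),
    ("kose", "kositi", "Vmip3p"),
    ("kosi", "kositi", "Vmip3s"),
]


def build_lexicon(rows):
    """Load the toy rows into an in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE forms (word_form TEXT, lemma TEXT, msd TEXT)")
    conn.execute("CREATE INDEX idx_form ON forms (word_form)")
    conn.execute("CREATE INDEX idx_lemma ON forms (lemma)")
    conn.executemany("INSERT INTO forms VALUES (?, ?, ?)", rows)
    return conn


def lemmatize(conn, token):
    """Return {lemma: [msd, ...]} for every lemma the token may be a form of."""
    result = {}
    query = "SELECT lemma, msd FROM forms WHERE word_form = ? ORDER BY lemma, msd"
    for lemma, msd in conn.execute(query, (token.lower(),)):
        result.setdefault(lemma, []).append(msd)
    return result


def generate(conn, lemma):
    """Return the sorted, distinct word-forms stored for a lemma."""
    query = "SELECT DISTINCT word_form FROM forms WHERE lemma = ? ORDER BY word_form"
    return [row[0] for row in conn.execute(query, (lemma,))]


def lemmatize_text(conn, text):
    """Unigram level: each whitespace-separated token is looked up on its own."""
    return [(tok, lemmatize(conn, tok)) for tok in text.split()]


if __name__ == "__main__":
    conn = build_lexicon(ROWS)
    print(lemmatize(conn, "kose"))   # homographic token: two candidate lemmas
    print(generate(conn, "kosa"))    # reverse lookup: lemma to word-forms
```

Because lookup works at the unigram level, a homographic token such as "kose" comes back with every candidate lemma and tag; choosing among them would need context, which is outside the service the abstract describes.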