Croatian Lemmatization Server

The need for lemmatization in inflectionally rich languages is indisputable: it applies to the whole range of procedures, from text search up to parsing. Of the two predominant approaches to lemmatization, 1) algorithmic (generally rule-based and realized with FSA) and 2) relational...


Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:315117/Details
Parent publication: Formal Approaches to South Slavic and Balkan Languages
Sofia : Bulgarian Academy of Sciences, 2006
Main author: Tadić, Marko
Material type: Article
Language: eng
Online access: http://bib.irb.hr/datoteka/280673.MT4FASSBL2006.pdf
http://hnk.ffzg.hr/txts/mt4FASSBL2006.pdf
LEADER 02530naa a2200241uu 4500
008 131111s2006 xx 1 eng|d
035 |a (CROSBI)280673 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Tadić, Marko 
245 1 0 |a Croatian Lemmatization Server /  |c Tadić, Marko. 
246 3 |i Title in English:  |a Croatian Lemmatization Server 
300 |a 140-146  |f pp. 
520 |a The need for lemmatization in inflectionally rich languages is indisputable: it applies to the whole range of procedures, from text search up to parsing. Of the two predominant approaches to lemmatization, 1) algorithmic (generally rule-based and realized with FSA) and 2) relational (generally data-driven and realized with databases), this paper opted for the latter. One reason is that formal-grammar approaches to Croatian morphology are rare and limited to just a part of the morphological system. The other is that a generator for Croatian has already been developed (Tadić 1994), as has the Croatian Morphological Lexicon (CML) (Tadić & Fulgosi 2003). The idea was to offer an on-line lemmatization and POS/MSD tagging service using CML v 4.5 as the back-end. The Croatian Lemmatization Server (CLS) is available at http://hml.hnk.ffzg.hr and currently offers lemmatization and POS/MSD tagging at the unigram level. For each token in the submitted text, the server delivers all possible lemmas of which this token may be a word-form. For homographic tokens, each lemma is accompanied by all possible POS/MSD tags, which comply with the MULTEXT-East v3 specifications for Croatian. The CLS can also be used for generation: when a lemma is entered and marked as such, all its possible word-forms are retrieved and delivered. 
536 |a MZOS project  |f 0130418 
546 |a ENG 
690 |a 6.03 
693 |a lemmatization, POS tagging, MSD tagging, Croatian, web-service  |l hrv  |2 crosbi 
693 |a lemmatization, POS tagging, MSD tagging, Croatian, web-service  |l eng  |2 crosbi 
773 0 |a Fifth International Conference Formal Approaches to South Slavic and Balkan Languages (FASSBL) (18-20.10.2006. ; Sofia, Bulgaria)  |t Formal Approaches to South Slavic and Balkan Languages  |d Sofia : Bulgarian Academy of Sciences, 2006  |n Vulchanova, Mila Dimitrova ; Koeva, Svetla ; Krapova, Iliyana ; Vulchanov, Valentin  |g pp. 140-146 
856 |u http://bib.irb.hr/datoteka/280673.MT4FASSBL2006.pdf 
856 |u http://hnk.ffzg.hr/txts/mt4FASSBL2006.pdf 
942 |c RZB  |u 1  |v Peer review  |z Scientific - Lecture - Full paper 
999 |c 315117  |d 315115
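
The relational approach described in the abstract (field 520) can be illustrated with a minimal sketch. It assumes a tiny in-memory table of (word-form, lemma, MSD) rows in place of the actual CML database; the rows, the MULTEXT-East-style tags, and the function names `build_indexes`, `lemmatize`, and `generate` are all hypothetical and are not taken from the CLS or from CML itself.

```python
from collections import defaultdict

# Toy stand-in for a morphological lexicon: (word_form, lemma, msd) rows.
# These rows and tags are illustrative assumptions, not data from CML v 4.5.
ROWS = [
    ("kuća", "kuća", "Ncfsn"),
    ("kuće", "kuća", "Ncfsg"),
    ("kuće", "kuća", "Ncfpn"),
    ("more", "more", "Ncnsn"),
    ("mora", "more", "Ncnsg"),
    ("mora", "more", "Ncnpn"),
    ("mora", "mora", "Ncfsn"),  # homograph: a different lemma, same form
]


def build_indexes(rows):
    """Index the table both ways: form -> lemma -> tags, and lemma -> forms."""
    by_form = defaultdict(lambda: defaultdict(list))
    by_lemma = defaultdict(set)
    for form, lemma, msd in rows:
        by_form[form][lemma].append(msd)
        by_lemma[lemma].add(form)
    return by_form, by_lemma


BY_FORM, BY_LEMMA = build_indexes(ROWS)


def lemmatize(text):
    """Unigram lookup: return every candidate lemma and its tags per token."""
    result = []
    for token in text.split():
        entry = BY_FORM.get(token.lower(), {})
        analyses = {lemma: list(tags) for lemma, tags in entry.items()}
        result.append((token, analyses))
    return result


def generate(lemma):
    """Reverse lookup: return all word-forms recorded for a lemma."""
    return sorted(BY_LEMMA.get(lemma, ()))
```

In this sketch no disambiguation is attempted: a homographic token such as "mora" simply returns both candidate lemmas with their tags, which mirrors the unigram-level behaviour the abstract describes. Unknown tokens yield an empty analysis rather than an error.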