Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search

The first version of the Croatian Morphological Lexicon (HML) was developed as early as 1994 and was utilized in the implementation of various experiments and systems dealing with Croatian. Since the HML is frequently used both as a stand-alone application and as a module in many other systems for p...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:335525/Details
Parent publication: Proceedings of the 6th International Conference on Corpus Linguistics
Las Palmas : AELINCO, 2014
Main authors: Merkler, Danijela (-), Tadić, Marko (Author), Agić, Željko
Material type: Article
Language: eng
Online access: http://www.congresos.ulpgc.es/cilc6/resources/Programa_FINAL.pdf
LEADER 02434naa a22002657i 4500
005 20170115202523.0
008 150109s2014 sp 1 eng|d
035 |a (CROSBI)727739 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Merkler, Danijela  |9 868 
245 1 0 |a Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search /  |c Merkler, Danijela ; Agić, Željko ; Tadić, Marko. 
246 3 |i Title in English:  |a Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search 
300 |a 42-42  |f pp. 
520 |a The first version of the Croatian Morphological Lexicon (HML) was developed as early as 1994 and was utilized in the implementation of various experiments and systems dealing with Croatian. Since the HML is frequently used both as a stand-alone application and as a module in many other systems for processing Croatian, the lexicon is constantly being updated to newer versions by manually inserting unknown wordforms (i.e. the corresponding 3-tuples of lemmas, wordforms and morphosyntactic tags) in batches. The current version of HML consists of 110,000 lemmas and more than 4,000,000 lexicon entries. Due to the limited availability of expert human annotators and various other constraints, the process of manual inspection, lemma assignment and inflectional pattern selection for unknown wordforms is a rather slow procedure. Accordingly, in this paper, we propose a generic approach to (semi-)automatic generation of new candidate lemmas for HML, their verification, assignment of inflectional patterns and finally creation and insertion of new lexicon entries into HML in a single processing pipeline. 
536 |a Projekt MZOS  |f 130-1300646-1776 
546 |a ENG 
690 |a 5.04 
693 |a morphological lexicon, automatic enlargement, Croatian language  |l hrv  |2 crosbi 
693 |a morphological lexicon, automatic enlargement, Croatian language  |l eng  |2 crosbi 
700 1 |a Tadić, Marko  |4 aut  |9 888 
700 1 |9 495  |a Agić, Željko  |4 aut 
773 0 |a 6th International Conference on Corpus Linguistics (CILC 2014) (22-24.05.2014 ; Las Palmas, Spain)  |t Proceedings of the 6th International Conference on Corpus Linguistics  |d Las Palmas : AELINCO, 2014  |g pp. 42-42 
856 |u http://www.congresos.ulpgc.es/cilc6/resources/Programa_FINAL.pdf 
942 |c RZB  |u 2  |v Peer review  |z Scientific - Lecture - Abstract  |t 1.12 
999 |c 335525  |d 335522