Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search


Full description

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:318107/Details
Host publication: Proceedings of FASSBL 2012
Main authors: Merkler, Danijela; Tadić, Marko; Agić, Željko
Material type: Article
Language: English
Online access: http://bib.irb.hr/datoteka/603927.dmzamt_fassbl_2012.pdf
LEADER 02374naa a2200277uu 4500
008 131111s2012 xx 1 eng|d
035 |a (CROSBI)603927 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 868  |a Merkler, Danijela 
245 1 0 |a Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search /  |c Merkler, Danijela ; Agić, Željko ; Tadić, Marko. 
246 3 |i English title:  |a Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search 
300 |f pp. 
520 |a Inflectional (or morphological) lexica are considered highly important language resources for highly inflectional languages such as Croatian. They are frequently used in many language processing tasks, from basic problems such as lemmatization and morphosyntactic tagging of written text to applications in machine learning, information extraction, information retrieval and machine translation. Since the Croatian Morphological Lexicon (HML) is frequently used both as a stand-alone application and as a module in many other systems for processing Croatian, unknown wordforms (those not found when matching unseen text against the current version of the HML database) are continuously logged, and the lexicon is updated to newer versions by inserting these new wordforms in batches. Accordingly, in this paper we propose a generic approach to the (semi-)automatic generation of new candidate lemmas for HML, their verification, the assignment of inflectional patterns, and finally the creation and insertion of new lexicon entries into HML, all within a single processing pipeline. 
536 |a MZOS project  |f 130-1300646-0645 
536 |a MZOS project  |f 130-1300646-1776 
546 |a ENG 
690 |a 5.04 
690 |a 6.03 
693 |a automatic enrichment, morphological lexicon, large corpora  |l hrv  |2 crosbi 
693 |a automatic enrichment, morphological lexicon, large corpora  |l eng  |2 crosbi 
773 0 |a The 8th International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL 2012) (19-21 September 2012; Dubrovnik, Croatia)  |t Proceedings of FASSBL 2012 
700 1 |9 888  |a Tadić, Marko  |4 aut 
700 1 |9 495  |a Agić, Željko  |4 aut 
856 |u http://bib.irb.hr/datoteka/603927.dmzamt_fassbl_2012.pdf 
942 |c RZB  |u 2  |v Peer-reviewed  |z Scientific - Lecture - ppt  |t 3.15 
999 |c 318107  |d 318105