Lexicon-Based Morphological Normalisation and its Application to Croatian Language

Due to language morphology, words appear in text in various inflectional and derivational forms. This morphological variation has been shown to negatively affect the performance of most information retrieval and text mining systems. Morphological variation may be reduced by performing morphological...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:312304/Details
Parent publication: Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project
Language and Technology
Main authors: Šnajder, Jan (-), Dalbelo Bašić, Bojana (Author), Tadić, Marko
Material type: Article
Language: eng
LEADER 02740naa a2200253uu 4500
008 131111s2009 xx eng|d
020 |a 978-953-55375-1-9 
035 |a (CROSBI)427400 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Šnajder, Jan 
245 1 0 |a Lexicon-Based Morphological Normalisation and its Application to Croatian Language /  |c Šnajder, Jan ; Dalbelo Bašić, Bojana ; Tadić, Marko. 
246 3 |i Title in English:  |a Lexicon-Based Morphological Normalisation and its Application to Croatian Language 
300 |a 23-80  |f pp. 
520 |a Due to language morphology, words appear in text in various inflectional and derivational forms. This morphological variation has been shown to negatively affect the performance of most information retrieval and text mining systems. Morphological variation may be reduced by performing morphological normalisation, i.e., the conflation of morphological variants of a word into a single representative form. A lexicon-based approach to normalisation allows for high normalisation precision, which for morphologically complex languages may otherwise be difficult to achieve. In this paper we describe a two-stage lexicon-based approach to morphological normalisation that addresses both inflectional and derivational variation. To eliminate the immense effort required to compile a lexicon by hand, we devise a procedure for automatically acquiring an inflectional morphological lexicon from raw corpora. We also propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. We apply our approach to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. A detailed task-independent evaluation reveals that our approach yields good normalisation performance at both the inflectional and the derivational level. 
536 |a Projekt MZOS  |f 036-1300646-1986 
546 |a ENG 
690 |a 2.09 
693 |a Morphological normalisation, morphological lexicon, inflection, derivation, lexicon acquisition, Croatian language  |l hrv  |2 crosbi 
693 |a Morphological normalisation, morphological lexicon, inflection, derivation, lexicon acquisition, Croatian language  |l eng  |2 crosbi 
700 1 |a Dalbelo Bašić, Bojana  |4 aut 
700 1 |9 888  |a Tadić, Marko  |4 aut 
773 0 |t Technologies for the Processing and Retrieval of Semi-Structured Documents: Experience from the CADIAL Project  |d Zagreb : Croatian Language Technologies Society, 2009  |k Language and Technology  |h 238  |n Tadić, Marko ; Dalbelo Bašić, Bojana ; Moens, Marie-Francine  |z 978-953-55375-1-9  |g pp. 23-80 
942 |c POG  |t 1.16.1  |u 2  |z Scientific 
999 |c 312304  |d 312302