Automatic Acquisition of Inflectional Lexica for Morphological Normalisation

Due to natural language morphology, words can take on various morphological forms. Morphological normalisation – often used in information retrieval and text mining systems – conflates morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:307089/Details
Parent publication: Information Processing & Management
44 (2008), 5 ; pp. 1720-1731
Main authors: Šnajder, Jan; Dalbelo Bašić, Bojana (Author); Tadić, Marko
Material type: Article
Language: eng
Online access: http://dx.doi.org/10.1016/j.ipm.2008.03.006
LEADER 02539naa a2200325uu 4500
008 131105s2008 xx eng|d
022 |a 0306-4573 
024 |2 doi  |a doi:10.1016/j.ipm.2008.03.006 
035 |a (CROSBI)326450 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Šnajder, Jan 
245 1 0 |a Automatic Acquisition of Inflectional Lexica for Morphological Normalisation /  |c Šnajder, Jan ; Dalbelo Bašić, Bojana ; Tadić, Marko. 
246 3 |i Naslov na engleskom:  |a Automatic Acquisition of Inflectional Lexica for Morphological Normalisation 
300 |a 1720-1731  |f str. 
363 |a 44  |b 5  |i 2008 
520 |a Due to natural language morphology, words can take on various morphological forms. Morphological normalisation – often used in information retrieval and text mining systems – conflates morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon-based inflectional normalisation. This approach is in between stemming and lemmatisation, and is suitable for morphological normalisation of inflectionally complex languages. To eliminate the immense effort required to compile the lexicon by hand, we focus on the problem of acquiring automatically an inflectional morphological lexicon from raw corpora. We propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. Our approach is applied to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. Experimental results show that our approach can be used to acquire a lexicon whose linguistic quality allows for rather good normalisation performance. 
536 |a Projekt MZOS  |f 036-1300646-1986 
536 |a Projekt MZOS  |f 130-1300646-0645 
546 |a ENG 
690 |a 2.09 
690 |a 5.04 
690 |a 6.03 
693 |a Morphological normalisation, morphological lexicon, lexicon acquisition, inflection, Croatian language, text mining, information retrieval  |l hrv  |2 crosbi 
693 |a Morphological normalisation, morphological lexicon, lexicon acquisition, inflection, Croatian language, text mining, information retrieval  |l eng  |2 crosbi 
700 1 |a Dalbelo Bašić, Bojana  |4 aut 
700 1 |a Tadić, Marko  |4 aut 
773 0 |t Information Processing & Management  |x 0306-4573  |g 44 (2008), 5 ; str. 1720-1731 
856 |u http://dx.doi.org/10.1016/j.ipm.2008.03.006 
942 |c CLA  |t 1.01  |u 2  |z Znanstveni - clanak 
999 |c 307089  |d 307087