Bilingual lexicon extraction from comparable corpora: a comparative study

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:317596/Details
Host publication: First International Workshop on Lexical Resources, An ESSLLI 2011 Workshop, Ljubljana, Slovenia - August 1-5, 2011
Year: 2011
Main authors: Ljubešić, Nikola (computer scientist); Fišer, Darja; Vintar, Špela; Pollak, Senja
Material type: Article
Language: English
Online access: http://alpage.inria.fr/~sagot/woler2011/WoLeR2011/Home.html
LEADER 02076naa a2200265uu 4500
008 131111s2011 xx 1 eng|d
035 |a (CROSBI)552786 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 445  |a Ljubešić, Nikola,   |c computer scientist 
245 1 0 |a Bilingual lexicon extraction from comparable corpora: a comparative study /  |c Ljubešić, Nikola ; Fišer, Darja ; Vintar, Špela ; Pollak, Senja. 
246 3 |i English title:  |a Bilingual lexicon extraction from comparable corpora: A comparative study 
300 |f pp. 
520 |a This paper presents a comparative study of the impact of key parameters on bilingual lexicon extraction for nouns from comparable corpora. The parameters we analyzed are corpus size and comparability, dictionary size and type, feature selection for context vectors and window size, and association and similarity measures. Evaluation against the gold standard shows that a window size of 7 with encoded position yields the best results. The consistently best-performing combination of measures is log-likelihood as the association measure with Jensen-Shannon divergence as the similarity measure. We show that very good results can be achieved with small but purpose-built seed lexicons, and that problems arising from dissimilarities between the source and the target corpus can be compensated for by making the corpora sufficiently large. 
536 |a MZOS project  |f 130-1301679-1380 
546 |a ENG 
690 |a 5.04 
693 |a comparable corpora, bilingual lexicon extraction  |l hrv  |2 crosbi 
693 |a comparable corpora, bilingual lexicon extraction  |l eng  |2 crosbi 
700 1 |a Fišer, Darja  |4 aut 
700 1 |a Vintar, Špela  |4 aut 
700 1 |a Pollak, Senja  |4 aut 
773 0 |a First International Workshop on Lexical Resources (1-5.8.2011. ; Ljubljana, Slovenia)  |t First International Workshop on Lexical Resources, An ESSLLI 2011 Workshop, Ljubljana, Slovenia - August 1-5, 2011  |d 2011 
856 |u http://alpage.inria.fr/~sagot/woler2011/WoLeR2011/Home.html 
942 |c RZB  |u 2  |v Peer-reviewed  |z Scientific - Lecture - Full paper  |t 1.08 
999 |c 317596  |d 317594
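The abstract names the ingredients of the context-vector approach to lexicon extraction without showing how they fit together. The following is a minimal Python sketch of such a pipeline under stated assumptions: context features are (word, offset) pairs taken from a symmetric window (assumed here to be 3 tokens on each side, i.e. 7 tokens including the headword; the paper's exact definition of "window size 7" may differ). Raw counts are re-weighted with Dunning's log-likelihood ratio, source vectors are projected into the target language through a seed lexicon, and target candidates are ranked by Jensen-Shannon divergence. All function names, the toy corpora and the seed lexicon are hypothetical; this is an illustration, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def context_counts(sentences, half_window=3):
    """Count position-encoded context features for every token.

    A feature is a (word, offset) pair such as ("barks", 1). With the
    hypothetical half_window=3 the window spans 7 tokens including the
    headword, which is one possible reading of "window size 7".
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, head in enumerate(tokens):
            for off in range(-half_window, half_window + 1):
                j = i + off
                if off != 0 and 0 <= j < len(tokens):
                    counts[head][(tokens[j], off)] += 1
    return counts


def log_likelihood_vectors(counts):
    """Replace raw co-occurrence counts with log-likelihood ratio (G2) scores."""
    row = {w: sum(c.values()) for w, c in counts.items()}
    col = Counter()
    for c in counts.values():
        col.update(c)
    n = sum(row.values())
    vectors = {}
    for w, feats in counts.items():
        vec = {}
        for f, o11 in feats.items():
            o12, o21 = row[w] - o11, col[f] - o11
            o22 = n - o11 - o12 - o21
            # (observed, row total, column total) for each cell of the 2x2 table
            cells = ((o11, row[w], col[f]), (o12, row[w], n - col[f]),
                     (o21, n - row[w], col[f]), (o22, n - row[w], n - col[f]))
            g2 = sum(o * math.log(o * n / (r * k)) for o, r, k in cells if o > 0)
            vec[f] = max(0.0, 2.0 * g2)  # clamp tiny negative rounding error
        vectors[w] = vec
    return vectors


def translate_vector(vec, seed_lexicon):
    """Project source features into the target space via the seed lexicon.

    Features whose word has no dictionary entry are dropped; the
    positional offset is kept unchanged.
    """
    out = Counter()
    for (word, off), weight in vec.items():
        if word in seed_lexicon:
            out[(seed_lexicon[word], off)] += weight
    return out


def jensen_shannon(p_vec, q_vec):
    """Jensen-Shannon divergence (base 2, range [0, 1]); lower means more similar."""
    sp, sq = sum(p_vec.values()), sum(q_vec.values())
    if sp == 0 or sq == 0:
        return 1.0  # no usable evidence: treat as maximally dissimilar
    jsd = 0.0
    for key in set(p_vec) | set(q_vec):
        p, q = p_vec.get(key, 0.0) / sp, q_vec.get(key, 0.0) / sq
        m = (p + q) / 2
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd


def rank_translations(src_word, src_vectors, tgt_vectors, seed_lexicon):
    """Return target words ordered from most to least likely translation."""
    query = translate_vector(src_vectors[src_word], seed_lexicon)
    scored = sorted((jensen_shannon(query, vec), tgt) for tgt, vec in tgt_vectors.items())
    return [tgt for _, tgt in scored]


# Toy, hand-made corpora and seed lexicon (illustrative only, not from the paper).
src = [["dog", "barks", "loudly"], ["cat", "meows", "quietly"], ["dog", "barks"], ["cat", "meows"]]
tgt = [["hund", "bellt", "laut"], ["katze", "miaut", "leise"], ["hund", "bellt"], ["katze", "miaut"]]
seed = {"barks": "bellt", "loudly": "laut", "meows": "miaut", "quietly": "leise"}

src_vecs = log_likelihood_vectors(context_counts(src))
tgt_vecs = log_likelihood_vectors(context_counts(tgt))
print(rank_translations("dog", src_vecs, tgt_vecs, seed)[:3])
```

Because each feature keeps its offset, "barks" at +1 and "barks" at -1 are separate dimensions, which is what makes the representation position-aware. Since the study concerns nouns, a realistic setup would also restrict the candidate list to target-language nouns; the sketch ranks every target token for simplicity.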