Bootstrapping bilingual lexicons from comparable corpora for closely related languages

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:312925/Details
Parent publication: Text, Speech and Dialogue : 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 1-5, 2011. : Proceedings
Lecture Notes in Computer Science
Main authors: Ljubešić, Nikola, computer scientist; Fišer, Darja (Author)
Material type: Article
Language: English
Online access: http://www.springerlink.com/content/n5m86t5h212h2753/
LEADER 02036naa a2200253uu 4500
008 131111s2011 xx eng|d
020 |a 978-3-642-23537-5 
035 |a (CROSBI)552910 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 445  |a Ljubešić, Nikola,   |c computer scientist 
245 1 0 |a Bootstrapping bilingual lexicons from comparable corpora for closely related languages /  |c Ljubešić, Nikola ; Fišer, Darja. 
246 3 |i English title:  |a Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages 
300 |a 91-98  |f pp. 
520 |a In this paper we present an approach to bootstrapping a Croatian-Slovene bilingual lexicon from comparable news corpora from scratch, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from words that are identical in both languages and extend it with context-based cognates and translation candidates of the most frequent words. By enlarging the seed dictionary by only 7%, we were able to improve the baseline precision, measured as the mean reciprocal rank over the ten top-ranking translation candidates, from 0.597 to 0.731, with 50.4% recall on a gold standard of 500 entries. 
536 |a MZOS project  |f 130-1301679-1380 
546 |a ENG 
690 |a 5.04 
693 |a comparable corpora, bilingual lexicon extraction, bootstrapping  |l hrv  |2 crosbi 
693 |a comparable corpora, bilingual lexicon extraction, bootstrapping  |l eng  |2 crosbi 
700 1 |a Fišer, Darja  |4 aut 
773 0 |t Text, Speech and Dialogue : 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 1-5, 2011. : Proceedings  |d Berlin / Heidelberg : Springer, 2011  |k Lecture Notes in Computer Science  |n Habernal, Ivan ; Matoušek, Václav  |z 978-3-642-23537-5  |g pp. 91-98  |a International Conference, TSD 2011(14 ; 2011 ; Pilsen, Czech Republic) 
856 |u http://www.springerlink.com/content/n5m86t5h212h2753/ 
942 |c RZB  |t 1.08  |u 2  |z Scientific  |v International peer review 
999 |c 312925  |d 312923
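The approach summarized in the 520 abstract field translates context vectors through a seed lexicon built from words that are identical in both languages, rather than through an external dictionary. The Python sketch below illustrates that general idea under stated assumptions. It is not the authors' implementation. The helper names (`identical_word_seed`, `translate_context_vector`, `rank_candidates`), the raw co-occurrence weights, and cosine similarity as the ranking measure are all hypothetical choices. The step that extends the seed with context-based cognates and frequent-word translations is omitted.

```python
import math
from collections import Counter


def identical_word_seed(src_vocab, tgt_vocab):
    """Hypothetical helper: pair every word form found in both vocabularies.

    For closely related languages such as Croatian and Slovene, many
    identical forms are true translations (some are false friends).
    """
    shared = set(src_vocab) & set(tgt_vocab)
    return {word: word for word in sorted(shared)}


def translate_context_vector(vector, seed):
    """Map a source-language context vector into the target space.

    Features absent from the seed lexicon are dropped, because no
    translation is known for them.
    """
    translated = Counter()
    for feature, weight in vector.items():
        if feature in seed:
            translated[seed[feature]] += weight
    return translated


def cosine(a, b):
    """Cosine similarity of two sparse dict vectors (an assumed measure)."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(src_vector, tgt_vectors, seed, k=10):
    """Return the k target words whose context vectors are most similar."""
    translated = translate_context_vector(src_vector, seed)
    scored = [(cosine(translated, vec), word) for word, vec in tgt_vectors.items()]
    # Sort by descending similarity and break ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]


# Toy example with invented context counts (not data from the paper).
seed = identical_word_seed(
    ["vlada", "ministar", "kruh", "tjedan"],  # Croatian
    ["vlada", "minister", "kruh", "teden"],   # Slovene
)
src_vector = {"vlada": 3.0, "tjedan": 1.0}  # context of Croatian "ministar"
tgt_vectors = {
    "minister": {"vlada": 2.0, "teden": 1.0},
    "pek": {"kruh": 4.0},
}
print(seed)
print(rank_candidates(src_vector, tgt_vectors, seed))
```

In this toy data the cognate pair "tjedan"/"teden" is not identical, so it never enters the seed and its context feature is lost during translation. Pairs like this are what an extension step with cognates would be meant to recover.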
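The evaluation in the 520 abstract field reports mean reciprocal rank over the ten top-ranking translation candidates. The following is a minimal Python sketch of MRR restricted to the top k candidates. The function name and data layout are hypothetical. The abstract does not state exactly which gold entries enter the average, so the sketch simply averages over the gold entries passed in and scores an entry 0 when no correct candidate appears in its top ten.

```python
def mean_reciprocal_rank(ranked, gold, k=10):
    """Mean reciprocal rank over the top-k candidates (hypothetical helper).

    ranked: source word -> list of candidate translations, best first
    gold:   source word -> set of correct translations
    Each gold entry scores 1/r, where r is the rank of the first correct
    candidate within the top k, or 0 if none appears there.
    """
    if not gold:
        return 0.0
    total = 0.0
    for word, correct in gold.items():
        for rank, candidate in enumerate(ranked.get(word, [])[:k], start=1):
            if candidate in correct:
                total += 1.0 / rank
                break
    return total / len(gold)


# Invented Croatian -> Slovene rankings, for illustration only.
ranked = {
    "tjedan": ["teden", "mesec"],
    "kuća": ["dom", "stavba", "hiša"],
}
gold = {"tjedan": {"teden"}, "kuća": {"hiša"}}
print(mean_reciprocal_rank(ranked, gold))
```

Here the first entry is correct at rank 1 and the second at rank 3, so the score is (1 + 1/3) / 2, roughly 0.667.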