Bootstrapping bilingual lexicons from comparable corpora for closely related languages
In this paper we present an approach to bootstrapping a Croatian-Slovene bilingual lexicon from scratch from comparable news corpora, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from identical words in...
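The summary describes translating context vectors through a seed lexicon of words spelled identically in both languages instead of through a dictionary. The sketch below shows the general context-vector technique under stated assumptions: raw co-occurrence counts in a fixed window, cosine similarity, and an optional frequency threshold. The record does not give the paper's actual window size, association weighting, or similarity measure, and all function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict


def identical_word_seed(src_tokens, tgt_tokens, min_freq=1):
    # Hypothetical helper: every word form attested in both corpora
    # (at least min_freq times in each) is paired with itself.
    src_freq, tgt_freq = Counter(src_tokens), Counter(tgt_tokens)
    return {w: w for w, n in src_freq.items()
            if w in tgt_freq and n >= min_freq and tgt_freq[w] >= min_freq}


def context_vectors(tokens, window=2):
    # Raw co-occurrence counts within +/- `window` positions (assumed weighting).
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vec, seed):
    # Keep only context words covered by the seed lexicon, mapped to the target side.
    out = Counter()
    for ctx, count in vec.items():
        if ctx in seed:
            out[seed[ctx]] += count
    return out


def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(word, src_vecs, tgt_vecs, seed, top_n=10):
    # Rank every target word by cosine similarity to the translated source vector.
    query = translate_vector(src_vecs.get(word, Counter()), seed)
    scored = [(cosine(query, vec), cand) for cand, vec in tgt_vecs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cand for _, cand in scored[:top_n]]
```

The extension step mentioned in the abstract (adding context-based cognates and translations of the most frequent words to the seed) would add further pairs to `seed` before re-ranking; it is not sketched here.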
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:312925/Details |
---|---|
Host publication: | Text, Speech and Dialogue : 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 1-5, 2011 : Proceedings (Lecture Notes in Computer Science) |
Main authors: | Ljubešić, Nikola, computer scientist; Fišer, Darja (Author) |
Material type: | Article |
Language: | eng |
Online access: | http://www.springerlink.com/content/n5m86t5h212h2753/ |
LEADER | 02036naa a2200253uu 4500 | ||
---|---|---|---|
008 | 131111s2011 xx eng|d | ||
020 | |a 978-3-642-23537-5 | ||
035 | |a (CROSBI)552910 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |9 445 |a Ljubešić, Nikola, |c computer scientist | |
245 | 1 | 0 | |a Bootstrapping bilingual lexicons from comparable corpora for closely related languages / |c Ljubešić, Nikola ; Fišer, Darja. |
246 | 3 | |i Title in English: |a Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages | |
300 | |a 91-98 |f pp. | ||
520 | |a In this paper we present an approach to bootstrapping a Croatian-Slovene bilingual lexicon from scratch from comparable news corpora, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from words that are identical in both languages and extend it with context-based cognates and translation candidates of the most frequent words. By enlarging the seed dictionary by only 7%, we improved precision, measured as mean reciprocal rank over the ten top-ranking translation candidates, from a baseline of 0.597 to 0.731, at 50.4% recall on a gold standard of 500 entries. | ||
536 | |a MZOS project |f 130-1301679-1380 | ||
546 | |a ENG | ||
690 | |a 5.04 | ||
693 | |a comparable corpora, bilingual lexicon extraction, bootstrapping |l hrv |2 crosbi | ||
693 | |a comparable corpora, bilingual lexicon extraction, bootstrapping |l eng |2 crosbi | ||
700 | 1 | |a Fišer, Darja |4 aut | |
773 | 0 | |t Text, Speech and Dialogue : 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 1-5, 2011. : Proceedings |d Berlin / Heidelberg : Springer, 2011 |k Lecture Notes in Computer Science |n Habernal, Ivan ; Matoušek, Václav |z 978-3-642-23537-5 |g pp. 91-98 |a International Conference, TSD 2011 (14 ; 2011 ; Pilsen, Czech Republic) | |
856 | |u http://www.springerlink.com/content/n5m86t5h212h2753/ | ||
942 | |c RZB |t 1.08 |u 2 |z Scientific |v International peer review | ||
999 | |c 312925 |d 312923 |
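The evaluation in the abstract (field 520) reports mean reciprocal rank over the ten top-ranking translation candidates. The following is a minimal sketch of MRR@10, assuming each evaluated entry contributes 1/rank of its first correct candidate within the top ten and 0 otherwise. The record does not state whether the reported scores are averaged over all 500 gold entries or only over entries for which candidates were produced, nor how the 50.4% recall is computed; `mrr_at_k` is a hypothetical name.

```python
def mrr_at_k(results, k=10):
    # results: list of (ranked_candidates, gold_translations) pairs,
    # one pair per evaluated lexicon entry (assumed input format).
    if not results:
        return 0.0
    total = 0.0
    for candidates, gold in results:
        # Only the first correct candidate within the top k counts.
        for rank, cand in enumerate(candidates[:k], start=1):
            if cand in gold:
                total += 1.0 / rank
                break
    return total / len(results)


# Toy usage: hits at rank 1 and rank 3, one miss -> (1 + 1/3 + 0) / 3
example = [(["a", "b"], {"a"}), (["x", "y", "z"], {"z"}), (["p"], {"q"})]
score = mrr_at_k(example)
```

Averaging over the entries passed in keeps the function neutral about coverage; a separate recall figure would have to be computed from however many gold entries receive candidates.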