MARC: Bootstrapping bilingual lexicons from comparable corpora for closely related languages

Permalink:	http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:312925/Details
Host publication:	Text, Speech and Dialogue : 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 1-5, 2011. : Proceedings (Lecture Notes in Computer Science)
Main authors:	Ljubešić, Nikola, computer scientist; Fišer, Darja (Author)
Material type:	Article
Language:	eng
Online access:	http://www.springerlink.com/content/n5m86t5h212h2753/


LEADER	02036naa a2200253uu 4500
008	131111s2011 xx eng|d
020			|a 978-3-642-23537-5
035			|a (CROSBI)552910
040			|a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak
100	1		|9 445 |a Ljubešić, Nikola, |c computer scientist
245	1	0	|a Bootstrapping bilingual lexicons from comparable corpora for closely related languages / |c Ljubešić, Nikola ; Fišer, Darja.
246	3		|i Title in English: |a Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages
300			|a 91-98 |f pp.
520			|a In this paper we present an approach to bootstrapping a Croatian-Slovene bilingual lexicon from comparable news corpora from scratch, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from words that are identical in both languages and extend it with context-based cognates and translation candidates of the most frequent words. By enlarging the seed dictionary by only 7%, we were able to improve the baseline precision, measured as the mean reciprocal rank over the ten top-ranking translation candidates, from 0.597 to 0.731, with a 50.4% recall on a gold standard of 500 entries.
536			|a MZOS project |f 130-1301679-1380
546			|a ENG
690			|a 5.04
693			|a comparable corpora, bilingual lexicon extraction, bootstrapping |l hrv |2 crosbi
693			|a comparable corpora, bilingual lexicon extraction, bootstrapping |l eng |2 crosbi
700	1		|a Fišer, Darja |4 aut
773	0		|t Text, Speech and Dialogue : 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 1-5, 2011. : Proceedings |d Berlin / Heidelberg : Springer, 2011 |k Lecture Notes in Computer Science |n Habernal, Ivan ; Matoušek, Václav |z 978-3-642-23537-5 |g pp. 91-98 |a International Conference, TSD 2011 (14 ; 2011 ; Pilsen, Czech Republic)
856			|u http://www.springerlink.com/content/n5m86t5h212h2753/
942			|c RZB |t 1.08 |u 2 |z Scientific |v International peer review
999			|c 312925 |d 312923
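The approach summarized in the 520 abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The context window, raw co-occurrence counts, cosine similarity, the toy sentences, and every function name (`build_seed_lexicon`, `context_vectors`, `translate_vector`, `rank_candidates`) are assumptions made for illustration. The extension of the seed lexicon with context-based cognates and frequent-word translations is not shown. The sketch shows the core idea: words spelled identically in Croatian and Slovene form a seed lexicon, this seed lexicon takes the place of an external dictionary when translating source context vectors, and target words are then ranked by how similar their context vectors are to the translated vector.

```python
import math
from collections import Counter, defaultdict

# Illustrative sketch only: window size, raw counts and cosine similarity are
# assumptions, not details taken from the paper.


def build_seed_lexicon(src_vocab, tgt_vocab):
    """Seed lexicon: every word spelled identically in both vocabularies maps to itself."""
    return {w: w for w in src_vocab if w in tgt_vocab}


def context_vectors(sentences, window=1):
    """Count co-occurrences of each word with its neighbours within +/- `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vec, seed):
    """Map source context features into the target space; features missing from the seed are dropped."""
    out = Counter()
    for feature, count in vec.items():
        if feature in seed:
            out[seed[feature]] += count
    return out


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[feat] for feat, count in a.items() if feat in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(word, src_vecs, tgt_vecs, seed, top_n=10):
    """Return the top_n target words most similar to the translated context vector of `word`."""
    query = translate_vector(src_vecs[word], seed)
    scored = sorted(
        ((cosine(query, vec), cand) for cand, vec in tgt_vecs.items()),
        key=lambda pair: (-pair[0], pair[1]),  # best score first, ties broken alphabetically
    )
    return [cand for _, cand in scored[:top_n]]


# Invented toy "comparable corpora" (tokenised sentences), for illustration only.
hr = [["vlada", "donijela", "proračun"], ["policija", "uhitila", "lopova"]]
sl = [["vlada", "sprejela", "proračun"], ["policija", "aretirala", "tatu"]]

seed = build_seed_lexicon({t for s in hr for t in s}, {t for s in sl for t in s})
src_vecs, tgt_vecs = context_vectors(hr), context_vectors(sl)
print(sorted(seed))  # -> ['policija', 'proračun', 'vlada']
print(rank_candidates("donijela", src_vecs, tgt_vecs, seed, top_n=1))  # -> ['sprejela']
```

Context words that are not in the seed lexicon (here `lopova`) are simply dropped from the translated vector. This is one reason why enlarging the seed lexicon, as the abstract describes, can change the ranking of candidates.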
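The evaluation in the 520 abstract reports the mean reciprocal rank (MRR) over the ten top-ranking translation candidates. A minimal sketch of that metric follows: each gold-standard entry contributes 1/rank of its first correct candidate within the top k, or 0 if none appears there. The function name `mean_reciprocal_rank` and the example words are hypothetical. This record does not say whether the average is taken over all 500 gold entries or only over the entries for which candidates were produced (compare the 50.4% recall), so averaging over whatever entries are passed in is an assumption.

```python
def mean_reciprocal_rank(ranked, gold, k=10):
    """Average of 1/rank of the first correct translation within the top k candidates (0 if none).

    Assumption: the average is taken over all entries in `gold`.
    """
    if not gold:
        return 0.0
    total = 0.0
    for word, correct in gold.items():
        for rank, candidate in enumerate(ranked.get(word, [])[:k], start=1):
            if candidate in correct:
                total += 1.0 / rank
                break  # only the first correct candidate counts
    return total / len(gold)


# Invented example: three gold-standard entries, each with a set of acceptable translations.
gold = {"donijela": {"sprejela"}, "uhitila": {"aretirala"}, "lopov": {"tat"}}
ranked = {"donijela": ["objavila", "sprejela"], "uhitila": ["aretirala"], "lopov": ["policija"]}
print(mean_reciprocal_rank(ranked, gold))  # -> 0.5  (= (1/2 + 1/1 + 0) / 3)
```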
