MARC: Bilingual lexicon extraction from comparable corpora for closely related languages

Bilingual lexicon extraction from comparable corpora for closely related languages

In this paper we present a knowledge-light approach to extract a bilingual lexicon for closely related languages from comparable corpora. While in most related work an existing dictionary is used to translate context vectors, we take advantage of the similarities between languages instead and build...

Permalink:	http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:317598/Details
Parent publication:	Proceedings of the International Conference Recent Advances in Natural Language Processing 2011 Hissar, Bulgaria : RANLP 2011 Organising Committee
Main authors:	Fišer, Darja (-), Ljubešić, Nikola, computer scientist (Author)
Material type:	Article
Language:	eng
Online access:	http://www.aclweb.org/anthology-new/R/R11/


LEADER	02451naa a2200241uu 4500
008	131111s2011 xx 1 eng\|d
035			\|a (CROSBI)552852
040			\|a HR-ZaFF \|b hrv \|c HR-ZaFF \|e ppiak
100	1		\|a Fišer, Darja
245	1	0	\|a Bilingual lexicon extraction from comparable corpora for closely related languages / \|c Fišer, Darja ; Ljubešić, Nikola.
246	3		\|i Title in English: \|a Bilingual Lexicon Extraction from Comparable Corpora for Closely Related Languages
300			\|a 125-131 \|f pp.
520			\|a In this paper we present a knowledge-light approach to extract a bilingual lexicon for closely related languages from comparable corpora. While in most related work an existing dictionary is used to translate context vectors, we take advantage of the similarities between languages instead and build a seed lexicon from words that are identical in both languages and then further extend it with context-based cognates and translations of the most frequent words. We also use cognates for reranking translation candidates obtained via context similarity and extract translation equivalents for all content words, not just nouns as in most related work. The results are very encouraging, suggesting that other similar languages could benefit from the same approach. By enlarging the seed lexicon with cognates and translations of the most frequent words and by cognate-based reranking of translation candidates we were able to improve the average baseline precision from 0.592 to 0.797 on the mean reciprocal rank for the ten top-ranking translation candidates for nouns, verbs and adjectives with a 46% recall on the gold standard of 1000 random entries from a traditional dictionary.
536			\|a Projekt MZOS \|f 130-1301679-1380
546			\|a ENG
690			\|a 5.04
693			\|a comparable corpora, lexicon extraction, closely related languages \|l hrv \|2 crosbi
693			\|a comparable corpora, lexicon extraction, closely related languages \|l eng \|2 crosbi
773	0		\|a Recent Advances in Natural Language Processing 2011 (12-14.09.2011. ; Hissar, Bulgaria) \|t Proceedings of the International Conference Recent Advances in Natural Language Processing 2011 \|d Hissar, Bulgaria : RANLP 2011 Organising Committee \|g pp. 125-131
700	1		\|9 445 \|a Ljubešić, Nikola, \|c computer scientist \|4 aut
856			\|u http://www.aclweb.org/anthology-new/R/R11/
942			\|c RZB \|u 2 \|v Peer review \|z Scientific - Lecture - Full paper \|t 1.08
999			\|c 317598 \|d 317596
