Disambiguating vectors for bilingual lexicon extraction from comparable corpora

This paper presents an approach to enhance the extraction of translation equivalents from comparable corpora by plugging in bilingual lexico-semantic knowledge harvested from a parallel corpus. First, the bilingual lexicon obtained from word-aligning the parallel corpus replaces an external seed dic...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:318227/Details
Parent publication: Proceedings of the Eighth LANGUAGE TECHNOLOGIES Conference
Ljubljana : 2012
Main authors: Apidianaki, Marianna (-), Fišer, Darja (Author), Ljubešić, Nikola, computer scientist
Material type: Article
Language: eng
LEADER 02175naa a2200241uu 4500
008 131111s2012 xx 1 eng|d
035 |a (CROSBI)616795 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Apidianaki, Marianna 
245 1 0 |a Disambiguating vectors for bilingual lexicon extraction from comparable corpora /  |c Apidianaki, Marianna ; Ljubešić, Nikola ; Fišer, Darja. 
246 3 |i Title in English:  |a Disambiguating vectors for bilingual lexicon extraction from comparable corpora 
300 |a 10-15  |f pp. 
520 |a This paper presents an approach to enhance the extraction of translation equivalents from comparable corpora by plugging in bilingual lexico-semantic knowledge harvested from a parallel corpus. First, the bilingual lexicon obtained from word-aligning the parallel corpus replaces an external seed dictionary, making the approach knowledge-light and portable. Next, instead of using simple 1:1 mappings between the source and the target language, translation equivalents are clustered into sets of synonyms based on contextual similarities, enabling us to expand the translation of vector features with several translation variants. And last but not least, the vector features are disambiguated and translated only with the translation variants from the most appropriate cluster, thus producing less noisy vectors that allow for a more successful cross-lingual comparison of the vectors compared to simpler methods. 
536 |a MZOS project  |f 130-1301679-1380 
546 |a ENG 
690 |a 5.04 
693 |a bilingual lexicon extraction, cross-lingual sense clustering, feature disambiguation  |l hrv  |2 crosbi 
693 |a bilingual lexicon extraction, cross-lingual sense clustering, feature disambiguation  |l eng  |2 crosbi 
700 1 |a Fišer, Darja  |4 aut 
700 1 |9 445  |a Ljubešić, Nikola,   |c computer scientist  |4 aut 
773 0 |a Eighth LANGUAGE TECHNOLOGIES Conference (8.-9.10.2012. ; Ljubljana, Slovenia)  |t Proceedings of the Eighth LANGUAGE TECHNOLOGIES Conference  |d Ljubljana : 2012  |n Erjavec, Tomaž ; Žganec Gros, Jerneja  |g pp. 10-15 
942 |c RZB  |u 2  |v Peer review  |z Scientific - Lecture - Full paper  |t 1.08 
999 |c 318227  |d 318225
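
Illustrative note: the abstract in field 520 describes translating context-vector features using only the translation variants of the best-fitting sense cluster. The Python sketch below is one possible reading of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the toy English-to-Slovene entries, the names `SENSE_CLUSTERS`, `pick_cluster`, `translate_vector` and `cosine`, and in particular the selection rule (overlap between a cluster's context words and the other features of the vector), which the record does not specify.

```python
from math import sqrt

# Hypothetical toy lexicon: each source feature maps to sense clusters.
# A cluster holds target-language translation variants plus source-language
# context words assumed to characterise that sense.
SENSE_CLUSTERS = {
    "bank": [
        {"translations": {"banka"}, "context": {"money", "loan", "account"}},
        {"translations": {"obala", "breg"}, "context": {"river", "water", "shore"}},
    ],
    "money": [{"translations": {"denar"}, "context": {"bank", "loan", "pay"}}],
    "loan": [{"translations": {"posojilo", "kredit"}, "context": {"bank", "money"}}],
}


def pick_cluster(feature, vector, clusters):
    """Assumed rule: choose the cluster sharing most context words with the
    other features of the vector (ties go to the first cluster listed)."""
    others = set(vector) - {feature}
    return max(clusters, key=lambda c: len(c["context"] & others))


def translate_vector(vector, sense_clusters):
    """Translate each feature with the variants of its selected cluster only."""
    translated = {}
    for feature, weight in vector.items():
        clusters = sense_clusters.get(feature)
        if not clusters:
            continue  # features missing from the lexicon are dropped
        best = pick_cluster(feature, vector, clusters)
        for target in best["translations"]:
            translated[target] = translated.get(target, 0.0) + weight
    return translated


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Source-language context vector of some headword (invented weights).
source_vector = {"bank": 2.0, "money": 1.0, "loan": 1.0}
translated = translate_vector(source_vector, SENSE_CLUSTERS)

# Invented target-language candidate vectors to compare against.
financial_candidate = {"banka": 1.0, "kredit": 1.0}
river_candidate = {"obala": 1.0, "voda": 1.0}
print(cosine(translated, financial_candidate), cosine(translated, river_candidate))
```

In this toy run the feature "bank" is translated only as "banka", so the river-sense variants never enter the translated vector and the river-related candidate scores zero; a plain 1:1 or all-variants translation would not make that distinction.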