Possibilities of Identification of Translation Equivalents in Croatian-English Parallel Corpus
The paper discusses associations between translation equivalents in a parallel aligned corpus. The focus is on identification of multi-word units in a parallel corpus and verification of translation equivalents. The data have been extracted from the Croatian-English parallel corpus aligned to the lev...
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:313871/Details |
---|---|
Parent publication: |
Proceedings of the 5th TELRI seminar Mannheim : TELRI Association, 2001 |
Main authors: | Šojat, Krešimir (-), Tadić, Marko (Author) |
Material type: | Article |
Language: | eng |
LEADER | 02560naa a2200229uu 4500 | ||
---|---|---|---|
008 | 131111s2001 xx 1 eng|d | ||
035 | |a (CROSBI)69119 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |a Šojat, Krešimir | |
245 | 1 | 0 | |a Possibilities of Identification of Translation Equivalents in Croatian-English Parallel Corpus / |c Šojat, Krešimir ; Tadić, Marko. |
246 | 3 | |i Title in English: |a Possibilities of Identification of Translation Equivalents in Croatian-English Parallel Corpus | |
300 | |a (in press) |f pp. | ||
520 | |a The paper discusses associations between translation equivalents in a parallel aligned corpus. The focus is on identification of multi-word units in a parallel corpus and verification of translation equivalents. The data have been extracted from the Croatian-English parallel corpus, aligned at the sentence level, that has been compiled at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. Using statistical measures, primarily the mutual information value, significant co-occurrences of words were identified first in the source language. In order to define significant multi-word units both in the source language (Croatian) and in the target language (English), the same procedure has been carried out in the target language. The analysis and classification of the results take place afterwards. In order to establish which translations are significant, the next step consists of applying the statistical procedures between translation equivalents in a test sample extracted from the parallel corpus. Applying such statistical measures between translation equivalents enables a systematization of terminology from the areas where a constant lack of new Croatian terms exists (e.g. market-economy, computer science). On the other hand, examination of translation equivalents in the target language can be used as an instrument for information extraction in the source language. | ||
536 | |a Projekt MZOS |f 130718 | ||
546 | |a ENG | ||
690 | |a 6.03 | ||
693 | |a corpus, parallel corpora, Croatian, English, translation equivalents |l hrv |2 crosbi | ||
693 | |a corpus, parallel corpora, Croatian, English, translation equivalents |l eng |2 crosbi | ||
700 | 1 | |a Tadić, Marko |4 aut | |
773 | 0 | |a 5th TELRI seminar "Extracting Meaning from Corpora" (20-23.09.2000. ; Ljubljana, Slovenia) |t Proceedings of the 5th TELRI seminar |d Mannheim : TELRI Association, 2001 |n Teubert, Wolfgang et al. |g pp. (in press) | |
942 | |c RZB |u 1 |v Peer review |z Scientific - Lecture - None | ||
999 | |c 313871 |d 313869 |
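
The abstract in field 520 names mutual information as the main statistic, applied first to word co-occurrences within each language and then between translation equivalents in sentence-aligned text. The record does not give the paper's exact formula, counting scheme, or thresholds, so the following is only a minimal sketch of one common formulation (pointwise mutual information, base 2). The function names `bigram_pmi` and `aligned_pmi`, the `min_count` parameter, and the toy sentences are all hypothetical and are not taken from the paper or its corpus.

```python
import math
from collections import Counter
from itertools import product


def bigram_pmi(sentences, min_count=1):
    """PMI (log base 2) for adjacent word pairs in tokenized sentences.

    Assumption: a "co-occurrence" is two directly adjacent tokens; the
    paper may have used a different window.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs seen fewer than min_count times
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def aligned_pmi(pairs):
    """PMI between source and target words over aligned sentence pairs.

    Each aligned pair counts as one event; a word is counted at most
    once per sentence. `pairs` is a list of (source_tokens, target_tokens).
    """
    n = len(pairs)
    src = Counter()
    tgt = Counter()
    joint = Counter()
    for s_tokens, t_tokens in pairs:
        s_set, t_set = set(s_tokens), set(t_tokens)
        src.update(s_set)
        tgt.update(t_set)
        joint.update(product(s_set, t_set))
    # log2( P(s,t) / (P(s) * P(t)) ) with all probabilities estimated over n pairs
    return {
        (s, t): math.log2(count * n / (src[s] * tgt[t]))
        for (s, t), count in joint.items()
    }


# Invented toy data, for illustration only.
toy_pairs = [
    (["tržišno", "gospodarstvo"], ["market", "economy"]),
    (["tržišno", "natjecanje"], ["market", "competition"]),
    (["slobodno", "gospodarstvo"], ["free", "economy"]),
]
cross_scores = aligned_pmi(toy_pairs)
mono_scores = bigram_pmi([tgt_tokens for _, tgt_tokens in toy_pairs])
```

On the toy data, `cross_scores[("tržišno", "market")]` is higher than `cross_scores[("tržišno", "economy")]`, because the first pair co-occurs in every aligned sentence where either word appears. Pointwise mutual information tends to overrate rare pairs, which is why a frequency floor such as `min_count` is usually applied on real corpora.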