MARC: The applicability of lemmatisation in translation equivalents detection

The aim of the research is to help identify translation equivalents (TEs) in 1:1 aligned sentences at the level of single-word units. The research is based on the Croatian-English parallel corpus compiled at the University of Zagreb. The method is based entirely on a statistical approach with no linguistic filter...

Permalink:	http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:310964/Details
Host publication:	Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora Barnbrook, Geoff ; Danielsson, Pernilla ; Mahlberg, Michaela
Main authors:	Tadić, Marko (-), Fulgosi, Sanja (Author), Šojat, Krešimir
Material type:	Article
Language:	eng
Online access:	http://www.is.bham.ac.uk/ubpress/corpus_meaningful.asp


LEADER	02432naa a2200265uu 4500
008	131111s2004 xx eng\|d
020			\|a 082647490X
035			\|a (CROSBI)125583
040			\|a HR-ZaFF \|b hrv \|c HR-ZaFF \|e ppiak
100	1		\|a Tadić, Marko
245	1	4	\|a The applicability of lemmatisation in translation equivalents detection / \|c Tadić, Marko ; Fulgosi, Sanja ; Šojat, Krešimir.
246	3		\|i Title in English: \|a The applicability of lemmatisation in translation equivalents detection
300			\|a 195-206 \|f pp.
520			\|a The aim of the research is to help identify translation equivalents (TEs) in 1:1 aligned sentences at the level of single-word units. The research is based on the Croatian-English parallel corpus compiled at the University of Zagreb. The method is entirely statistical, with no linguistic filter applied before or after processing, and has three steps: 1) generation of all possible pairs of tokens from 1:1 aligned sentences (Cartesian product); 2) application of mutual information (MI) to the generated pairs in order to detect candidates for real TEs; 3) sorting the pairs by the calculated MI and choosing real TEs for further use. The same method was applied to non-lemmatized and lemmatized material. The latter yielded 4.5% higher precision, supporting our hypothesis that for the Croatian-English pair (and possibly for other pairs involving morphologically rich languages like Croatian) the lemmatized form of corpus data helps statistical methods of TE detection.
536			\|a Projekt MZOS \|f 0130418
546			\|a ENG
690			\|a 6.03
693			\|a Croatian Language, English Language, Croatian-English Parallel Corpus, parallel corpus, lemmatization, translation equivalents, translation equivalents detection \|l hrv \|2 crosbi
693			\|a Croatian Language, English Language, Croatian-English Parallel Corpus, parallel corpus, lemmatization, translation equivalents, translation equivalents detection \|l eng \|2 crosbi
700	1		\|a Fulgosi, Sanja \|4 aut
700	1		\|a Šojat, Krešimir \|4 aut
773	0		\|t Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora \|d London, New York : Continuum international publishing group, 2004 \|n Barnbrook, Geoff ; Danielsson, Pernilla ; Mahlberg, Michaela \|z 082647490X \|g pp. 195-206
856			\|u http://www.is.bham.ac.uk/ubpress/corpus_meaningful.asp
942			\|c POG \|t 1.16.1 \|u 1 \|z Scientific
999			\|c 310964 \|d 310962
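
The three steps named in the abstract (field 520) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record does not say which mutual information formula or counting scheme was used, so the sketch assumes pointwise mutual information over sentence-level co-occurrence counts. The function name `rank_translation_candidates` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def rank_translation_candidates(aligned_pairs):
    """Rank token pairs from 1:1 aligned sentences by pointwise MI.

    aligned_pairs: list of (source_tokens, target_tokens) tuples.
    Assumption: counts are per aligned sentence pair, not per token.
    """
    n = len(aligned_pairs)
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()

    for src_tokens, tgt_tokens in aligned_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        # Step 1: all possible token pairs (Cartesian product).
        pair_count.update(product(src_set, tgt_set))

    scored = []
    for (s, t), c in pair_count.items():
        # Step 2: PMI = log2( p(s,t) / (p(s) * p(t)) ).
        mi = math.log2(c * n / (src_count[s] * tgt_count[t]))
        scored.append((s, t, mi))

    # Step 3: sort by MI, highest first (ties broken alphabetically).
    scored.sort(key=lambda item: (-item[2], item[0], item[1]))
    return scored


# Toy Croatian-English corpus, invented for illustration only.
corpus = [
    (["pas", "laje"], ["dog", "barks"]),
    (["pas", "spava"], ["dog", "sleeps"]),
    (["riba", "spava"], ["fish", "sleeps"]),
]
ranked = rank_translation_candidates(corpus)
for source, target, mi in ranked[:5]:
    print(f"{source}\t{target}\t{mi:.3f}")
```

To mirror the comparison described in the abstract, the same function could be run twice, once on word forms and once on lemmas; the lemmatizer itself is outside the scope of this sketch.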
