Evaluating Sentence Alignment on Croatian-English Parallel Corpora

This paper describes an experiment in applying sentence alignment methods to Croatian-English parallel corpora and systematically evaluating their performance within the recall, precision and F-measure framework. Our primary goal is to provide insight and a reference point on sentence alignment...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:315741/Details
Published in: Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages
Zagreb : Croatian Language Technologies Society, 2008
Main authors: Seljan, Sanja (-), Tadić, Marko (Author), Agić, Željko
Format: Article
Language: eng
Online access: http://bib.irb.hr/datoteka/364628.2008-FASSBL-SSZAMT-final.pdf
LEADER 02672naa a2200301uu 4500
008 131111s2008 xx 1 eng|d
035 |a (CROSBI)364628 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 430  |a Seljan, Sanja 
245 1 0 |a Evaluating Sentence Alignment on Croatian-English Parallel Corpora /  |c Seljan, Sanja ; Agić, Željko ; Tadić, Marko. 
246 3 |i Title in English:  |a Evaluating sentence alignment on Croatian-English parallel corpora 
300 |a 101-108  |f pp. 
520 |a This paper describes an experiment in applying sentence alignment methods to Croatian-English parallel corpora and systematically evaluating their performance within the recall, precision and F-measure framework. Our primary goal is to provide insight into, and a reference point for, sentence alignment accuracy for the Croatian-English language pair, and also to extend the scope of (Tadić, 2000), to our knowledge the first experiment dealing with sentence alignment of Croatian-English parallel corpora, by utilizing newly implemented tools, creating corpus subsets defined by genre, and expanding and formalizing its preliminary observations on alignment accuracy. We therefore start by briefly describing and motivating the sentence alignment paradigms of choice and presenting the available language resources, our primary asset being a subset of the Croatian-English parallel corpus described in (Tadić, 2000). These descriptions are followed by a formal definition of our testing framework. Results are then discussed in detail, and conclusions are stated along with a brief outlook on possible future work. 
536 |a Projekt MZOS  |f 130-1300646-0645 
536 |a Projekt MZOS  |f 130-1300646-0909 
536 |a Projekt MZOS  |f 130-1300646-1776 
546 |a ENG 
690 |a 2.09 
690 |a 5.04 
690 |a 6.03 
693 |a sentence alignment, croatian-english parallel corpora  |l hrv  |2 crosbi 
693 |a sentence alignment, croatian-english parallel corpora  |l eng  |2 crosbi 
700 1 |a Tadić, Marko  |4 aut 
700 1 |9 495  |a Agić, Željko  |4 aut 
773 0 |a 6th International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL 2008) (25-28.09.2008. ; Dubrovnik, Croatia)  |t Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages  |d Zagreb : Croatian Language Technologies Society, 2008  |n Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla  |z 978-953-55375-0-2  |g pp. 101-108 
856 |u http://bib.irb.hr/datoteka/364628.2008-FASSBL-SSZAMT-final.pdf 
942 |c RZB  |u 2  |v Peer review  |z Scientific - Lecture - Full paper  |t 1.08 
999 |c 315741  |d 315739