Towards Obtaining High Quality Sentence-Aligned English-Croatian Parallel Corpus
This paper presents the acquisition of a parallel bilingual corpus and all the steps involved in the process of unsupervised sentence alignment, such as tokenization, lowercasing, etc. The problem of sentence alignment is not trivial because translators do not necessarily translate one sentence in the...
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:317111 |
---|---|
Parent publication: | Proceedings of the 4th IEEE International Conference on Computer Science and Information Technology (ICCSIT 2011), Chengdu, China, 2011 |
Main authors: | Brkić, Marija; Matetić, Maja (Author); Seljan, Sanja |
Material type: | Article |
Language: | eng |