Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian

This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. Its base are two newspaper subcorpora from larger reference corpora of Bulgarian and Croatian. In the beginning we rely on more extralinguistically-oriented, but methodologically cleaner parameters o...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:311205/Details
Parent publication: Fourth International Conference on Language Resources and Evaluation LREC2004
Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima ; Costa, Rute ; Silva, Raquel
Main authors: Bekavac, Božo (-), Osenova, Petya (Author), Simov, Kiril, Tadić, Marko
Material type: Article
Language: eng
Online access: http://bib.irb.hr/datoteka/174994.Comparable-paper529.pdf
LEADER 02026naa a2200301uu 4500
008 131111s2004 xx eng|d
020 |a 2-9517408-1-6 
035 |a (CROSBI)174994 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Bekavac, Božo 
245 1 0 |a Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian /  |c Bekavac, Božo ; Osenova, Petya ; Simov, Kiril ; Tadić, Marko. 
246 3 |i Naslov na engleskom:  |a Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian 
300 |a 1187-1190  |f str. 
520 |a This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. Its base are two newspaper subcorpora from larger reference corpora of Bulgarian and Croatian. In the beginning we rely on more extralinguistically-oriented, but methodologically cleaner parameters of similarity like: specific topics, pre-defined time span and data size. The idea of ‘light’ and ‘hard’ comparable corpora is introduced. At this stage we aim at producing a ‘light’ bilingual comparable corpus. The algorithm for identifying lexical similarity and aligning linguistic units is presented, and the initial experiments are outlined. 
536 |a Projekt MZOS  |f 0130418 
546 |a ENG 
690 |a 5.04 
690 |a 6.03 
690 |a 6.06 
693 |a corpus linguistics, comparable corpora, Croatian, Bulgarian  |l hrv  |2 crosbi 
693 |a corpus linguistics, comparable corpora, Croatian, Bulgarian  |l eng  |2 crosbi 
700 1 |a Osenova, Petya  |4 aut 
700 1 |a Simov, Kiril  |4 aut 
700 1 |a Tadić, Marko  |4 aut 
773 0 |t Fourth International Conference on Language Resources and Evaluation LREC2004  |d Pariz-Lisabon : ELRA, 2004  |n Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima ; Costa, Rute ; Silva, Raquel  |z 2-9517408-1-6  |g str. 1187-1190 
856 |u http://bib.irb.hr/datoteka/174994.Comparable-paper529.pdf 
942 |c POG  |t 1.16.1  |u 1  |z Znanstveni 
999 |c 311205  |d 311203