Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian
This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. Its base are two newspaper subcorpora from larger reference corpora of Bulgarian and Croatian. In the beginning we rely on more extralinguistically-oriented, but methodologically cleaner parameters o...
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:311205/Details |
---|---|
Parent publication: | Fourth International Conference on Language Resources and Evaluation LREC2004 Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima ; Costa, Rute ; Silva, Raquel |
Main authors: | Bekavac, Božo (-), Osenova, Petya (Author), Simov, Kiril, Tadić, Marko |
Type of material: | Article |
Language: | eng |
Online access: | http://bib.irb.hr/datoteka/174994.Comparable-paper529.pdf |
LEADER | 02026naa a2200301uu 4500 | ||
---|---|---|---|
008 | 131111s2004 xx eng|d | ||
020 | |a 2-9517408-1-6 | ||
035 | |a (CROSBI)174994 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |a Bekavac, Božo | |
245 | 1 | 0 | |a Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian / |c Bekavac, Božo ; Osenova, Petya ; Simov, Kiril ; Tadić, Marko. |
246 | 3 | |i Title in English: |a Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian | |
300 | |a 1187-1190 |f pp. | ||
520 | |a This paper describes the first steps towards the creation of a Bulgarian-Croatian comparable corpus. Its base are two newspaper subcorpora from larger reference corpora of Bulgarian and Croatian. In the beginning we rely on more extralinguistically-oriented, but methodologically cleaner parameters of similarity like: specific topics, pre-defined time span and data size. The idea of ‘light’ and ‘hard’ comparable corpora is introduced. At this stage we aim at producing a ‘light’ bilingual comparable corpus. The algorithm for identifying lexical similarity and aligning linguistic units is presented, and the initial experiments are outlined. | ||
536 | |a Projekt MZOS |f 0130418 | ||
546 | |a ENG | ||
690 | |a 5.04 | ||
690 | |a 6.03 | ||
690 | |a 6.06 | ||
693 | |a corpus linguistics, comparable corpora, Croatian, Bulgarian |l hrv |2 crosbi | ||
693 | |a corpus linguistics, comparable corpora, Croatian, Bulgarian |l eng |2 crosbi | ||
700 | 1 | |a Osenova, Petya |4 aut | |
700 | 1 | |a Simov, Kiril |4 aut | |
700 | 1 | |a Tadić, Marko |4 aut | |
773 | 0 | |t Fourth International Conference on Language Resources and Evaluation LREC2004 |d Pariz-Lisabon : ELRA, 2004 |n Lino, Maria Teresa ; Xavier, Maria Francesca ; Ferreira, Fátima ; Costa, Rute ; Silva, Raquel |z 2-9517408-1-6 |g pp. 1187-1190 | |
856 | |u http://bib.irb.hr/datoteka/174994.Comparable-paper529.pdf | ||
942 | |c POG |t 1.16.1 |u 1 |z Scientific | ||
999 | |c 311205 |d 311203 |