Corpus-Based Comparison of Contemporary Croatian, Serbian and Bosnian

This paper explores the differences between three Slavic languages: Bosnian, Croatian and Serbian, drawing on the Southeast European Times newspaper corpus, translated to each language from the source English text and consisting of approximately 330,000 tokens for each language. The paper is an eff...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:315860/Details
Published in: Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages
Zagreb : Croatian Language Technologies Society, 2008
Main authors: Bekavac, Božo (-), Seljan, Sanja (Author), Simeon, Ivana
Format: Article
Language: eng
Online access: http://bib.irb.hr/datoteka/382519.FASSBL2008_paper_BB_SS_IS_v3.pdf
LEADER 02581naa a2200289uu 4500
008 131111s2008 xx 1 eng|d
035 |a (CROSBI)382519 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 835  |a Bekavac, Božo 
245 1 0 |a Corpus-Based Comparison of Contemporary Croatian, Serbian and Bosnian /  |c Bekavac, Božo ; Seljan, Sanja ; Simeon, Ivana. 
246 3 |i Title in English:  |a Corpus-Based Comparison of Contemporary Croatian, Serbian and Bosnian 
300 |a 33-39  |f pp. 
520 |a This paper explores the differences between three Slavic languages: Bosnian, Croatian and Serbian, drawing on the Southeast European Times newspaper corpus, translated to each language from the source English text and consisting of approximately 330,000 tokens for each language. The paper is an effort intended to contribute to the establishment of the criteria and methodology for measuring similarities between these languages. The differences were explored at five levels: at the level of phonology, morphology, lexis, syntax and semantics. Empirical analysis has shown that a huge portion of differences across the three languages are systematic and regular, and as such, could be formalized for automatic translation/generation. The results of this study and of similar future corpus-based studies can be used in developing NLP tools such as annotating tools, e-dictionaries, text summarizers, machine translation systems, computer-assisted language learning etc. for the three languages, as well as further linguistic investigation of their mutual relationship. 
536 |a Projekt MZOS  |f 130-1300646-0645 
536 |a Projekt MZOS  |f 130-1300646-0909 
536 |a Projekt MZOS  |f 130-1300646-1002 
546 |a ENG 
690 |a 5.04 
690 |a 6.03 
693 |a slavenski jezici, hrvatski, srpski, bosanski, jezične razlike  |l hrv  |2 crosbi 
693 |a Slavic languages, Croatian, Serbian, Bosnian, language differences  |l eng  |2 crosbi 
773 0 |a Formal Approaches to South Slavic and Balkan Languages FASSBL (25-28.09.2008. ; Dubrovnik, Croatia)  |t Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages  |d Zagreb : Croatian Language Technologies Society, 2008  |n Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla  |z 978-953-55375-0-2  |g pp. 33-39 
700 1 |9 430  |a Seljan, Sanja  |4 aut 
700 1 |9 878  |a Simeon, Ivana  |4 aut 
856 |u http://bib.irb.hr/datoteka/382519.FASSBL2008_paper_BB_SS_IS_v3.pdf 
942 |c RZB  |u 2  |v Peer review  |z Scientific - Lecture - Full paper  |t 1.08 
999 |c 315860  |d 315858