New version of the Croatian National Corpus

This contribution presents the new version (v 2.5) of the Croatian National Corpus (HNK). In the beginning it briefly describes the history of collecting HNK and its first two versions. It continues with describing the differences and novelties introduced in this new version: 1) new text samples tha...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:312373/Details
Host publication: After Half a Century of Slavonic Natural Language Processing
261
Main author: Tadić, Marko (-)
Material type: Article
Language: eng
LEADER 02001naa a2200229uu 4500
008 131111s2009 xx eng|d
020 |a 978-80-7399-815-8 
035 |a (CROSBI)449387 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 888  |a Tadić, Marko 
245 1 0 |a New version of the Croatian National Corpus /  |c Tadić, Marko. 
246 3 |i Naslov na engleskom:  |a New version of the Croatian National Corpus 
300 |a 199-205  |f str. 
520 |a This contribution presents the new version (v 2.5) of the Croatian National Corpus (HNK). In the beginning it briefly describes the history of collecting HNK and its first two versions. It continues with describing the differences and novelties introduced in this new version: 1) new text samples that bring the existing corpus structure more to the desired ideal ensemble of text types, genres and topics; 2) lemmatization and full MSD-tagging of the whole corpus. This second update is realized using lemmatizer and MSD-tagger for Croatian described in (Agić et al. 2008, Agić et al. 2009a). It achieves results at the level of state-of-art of taggers for other Slavic languages while in lemmatization it offers some novel solutions in its hybrid approach to disambiguation of lemmatization. Lemmatized, MSD-tagged and disambiguated HNK is available for querying through standard client-server architecture Manatee/Bonito. The contribution concludes with future directions for HNK. 
536 |a Projekt MZOS  |f 130-1300646-0645 
546 |a ENG 
690 |a 6.03 
693 |a corpus, corpus linguistics, Croatian National Corpus, Croatian language  |l hrv  |2 crosbi 
693 |a corpus, corpus linguistics, Croatian National Corpus, Croatian language  |l eng  |2 crosbi 
773 0 |t After Half a Century of Slavonic Natural Language Processing  |d Brno : Masaryk University, 2009  |h 261  |n Hlaváčková, Dana ; Horák, Aleš ; Osolsobě, Klara ; Rychlý, Pavel  |z 978-80-7399-815-8  |g str. 199-205 
942 |c POG  |t 1.16.1  |u 2  |z Znanstveni 
999 |c 312373  |d 312371