New version of the Croatian National Corpus
This contribution presents the new version (v 2.5) of the Croatian National Corpus (HNK). In the beginning it briefly describes the history of collecting HNK and its first two versions. It continues with describing the differences and novelties introduced in this new version: 1) new text samples tha...
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:312373/Details |
---|---|
Host publication: | After Half a Century of Slavonic Natural Language Processing 261 |
Main author: | Tadić, Marko (-) |
Material type: | Article |
Language: | eng |
LEADER | 02001naa a2200229uu 4500 | ||
---|---|---|---|
008 | 131111s2009 xx eng|d | ||
020 | |a 978-80-7399-815-8 | ||
035 | |a (CROSBI)449387 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |9 888 |a Tadić, Marko | |
245 | 1 | 0 | |a New version of the Croatian National Corpus / |c Tadić, Marko. |
246 | 3 | |i Naslov na engleskom: |a New version of the Croatian National Corpus | |
300 | |a 199-205 |f str. | ||
520 | |a This contribution presents the new version (v 2.5) of the Croatian National Corpus (HNK). In the beginning it briefly describes the history of collecting HNK and its first two versions. It continues with describing the differences and novelties introduced in this new version: 1) new text samples that bring the existing corpus structure closer to the desired ideal ensemble of text types, genres and topics; 2) lemmatization and full MSD-tagging of the whole corpus. This second update is realized using the lemmatizer and MSD-tagger for Croatian described in (Agić et al. 2008, Agić et al. 2009a). It achieves results at the level of the state of the art of taggers for other Slavic languages, while in lemmatization it offers some novel solutions in its hybrid approach to disambiguation of lemmatization. The lemmatized, MSD-tagged and disambiguated HNK is available for querying through the standard client-server architecture Manatee/Bonito. The contribution concludes with future directions for HNK. | ||
536 | |a Projekt MZOS |f 130-1300646-0645 | ||
546 | |a ENG | ||
690 | |a 6.03 | ||
693 | |a corpus, corpus linguistics, Croatian National Corpus, Croatian language |l hrv |2 crosbi | ||
693 | |a corpus, corpus linguistics, Croatian National Corpus, Croatian language |l eng |2 crosbi | ||
773 | 0 | |t After Half a Century of Slavonic Natural Language Processing |d Brno : Masaryk University, 2009 |h 261 |n Hlaváčková, Dana ; Horák, Aleš ; Osolsobě, Klara ; Rychlý, Pavel |z 978-80-7399-815-8 |g str. 199-205 | |
942 | |c POG |t 1.16.1 |u 2 |z Znanstveni | ||
999 | |c 312373 |d 312371 |