Building the Croatian National Corpus

The paper presents the work done so far on building the Croatian National Corpus (HNK). It has been collected since 1998 at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. The size, time span, composition, and criteria for text selection are being pr...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:310962/Details
Host publication: Third International Conference on Language Resources and Evaluation LREC2002
González Rodriguez, M. ; Suarez Araujo, C. P.
Main author: Tadić, Marko
Material type: Article
Language: eng
Online access: http://bib.irb.hr/datoteka/125424.MT4LREC2002.pdf
http://www.hnk.ffzg.hr/txts/mt4LREC2002.pdf
http://www.hnk.ffzg.hr/txts/mt4LREC2002.zip
LEADER 01970naa a2200265uu 4500
008 131111s2002 xx eng|d
020 |a 2-9517408-0-8 
035 |a (CROSBI)125424 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Tadić, Marko 
245 1 0 |a Building the Croatian National Corpus /  |c Tadić, Marko. 
246 3 |i Naslov na engleskom:  |a Building the Croatian National Corpus 
300 |a 441-446  |f str. 
520 |a The paper presents the work done so far on building the Croatian National Corpus (HNK), which has been collected since 1998 at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. Its size, time span, composition, and criteria for text selection are presented. The HNK consists of two parts: 1) a 30-million corpus of the contemporary Croatian language, 2) the Croatian Electronic Textual Archive. The procedures of corpus mark-up and processing are discussed. One of the most interesting features of this corpus since its launch in 1998 has been its availability for querying through the WWW. Future directions are discussed at the end: enlarging the 30m corpus to 100m in the next few years, enhanced corpus management and querying, and further annotation and processing. 
536 |a Projekt MZOS  |f 0130418 
546 |a ENG 
690 |a 6.03 
693 |a Croatian language, Corpus building, Croatian national corpus, Pos tagging  |l hrv  |2 crosbi 
693 |a Croatian language, Corpus building, Croatian national corpus, Pos tagging  |l eng  |2 crosbi 
773 0 |t Third International Conference on Language Resources and Evaluation LREC2002  |d Pariz-Las Palmas : ELRA, 2002  |n González Rodriguez, M. ; Suarez Araujo, C. P.  |z 2-9517408-0-8  |g str. 441-446 
856 |u http://bib.irb.hr/datoteka/125424.MT4LREC2002.pdf 
856 |u http://www.hnk.ffzg.hr/txts/mt4LREC2002.pdf 
856 |u http://www.hnk.ffzg.hr/txts/mt4LREC2002.zip 
942 |c POG  |t 1.16.1  |u 1  |z Znanstveni 
999 |c 310962  |d 310960