Building the Croatian National Corpus
The paper presents the work done so far on building the Croatian National Corpus (HNK). It has been collected since 1998 at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. The size, time-span, its composition and criteria for text selection are being pr...
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:310962/Details |
---|---|
Parent publication: |
Third International Conference on Language Resources and Evaluation LREC2002 González Rodriguez, M. ; Suarez Araujo, C. P. |
Main author: | Tadić, Marko (-) |
Material type: | Article |
Language: | eng |
Online access: |
http://bib.irb.hr/datoteka/125424.MT4LREC2002.pdf http://www.hnk.ffzg.hr/txts/mt4LREC2002.pdf http://www.hnk.ffzg.hr/txts/mt4LREC2002.zip |
LEADER | 01970naa a2200265uu 4500 | ||
---|---|---|---|
008 | 131111s2002 xx eng|d | ||
020 | |a 2-9517408-0-8 | ||
035 | |a (CROSBI)125424 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |a Tadić, Marko | |
245 | 1 | 0 | |a Building the Croatian National Corpus / |c Tadić, Marko. |
246 | 3 | |i Title in English: |a Building the Croatian National Corpus | |
300 | |a 441-446 |f pp. | ||
520 | |a The paper presents the work done so far on building the Croatian National Corpus (HNK), which has been collected since 1998 at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. The size, time span, composition and criteria for text selection are presented. The HNK consists of two parts: 1) a 30-million corpus of contemporary Croatian language, 2) the Croatian Electronic Textual Archive. The procedures of corpus mark-up and processing are discussed. One of the most interesting features of this corpus since its launch in 1998 is its availability for querying through the WWW. Future directions, namely enlargement of the 30m corpus to 100m in the next few years, enhanced corpus management and querying, and annotation and processing, are discussed at the end. | ||
536 | |a MZOS project |f 0130418 | ||
546 | |a ENG | ||
690 | |a 6.03 | ||
693 | |a Croatian language, Corpus building, Croatian national corpus, Pos tagging |l hrv |2 crosbi | ||
693 | |a Croatian language, Corpus building, Croatian national corpus, Pos tagging |l eng |2 crosbi | ||
773 | 0 | |t Third International Conference on Language Resources and Evaluation LREC2002 |d Paris-Las Palmas : ELRA, 2002 |n González Rodriguez, M. ; Suarez Araujo, C. P. |z 2-9517408-0-8 |g pp. 441-446 | |
856 | |u http://bib.irb.hr/datoteka/125424.MT4LREC2002.pdf | ||
856 | |u http://www.hnk.ffzg.hr/txts/mt4LREC2002.pdf | ||
856 | |u http://www.hnk.ffzg.hr/txts/mt4LREC2002.zip | ||
942 | |c POG |t 1.16.1 |u 1 |z Scientific | ||
999 | |c 310962 |d 310960 |