The SETimes.HR Linguistically Annotated Corpus of Croatian

We present SETIMES.HR, the first linguistically annotated corpus of Croatian that is freely available for all purposes. The corpus is built on top of the SETIMES parallel corpus of nine Southeast European languages and English. It is manually annotated for lemmas, morphosyntactic tags, named entities...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:335696/Details
Host publication: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)
Reykjavik, Iceland : European Language Resources Association (ELRA), 2014
Main authors: Agić, Željko (-), Ljubešić, Nikola, computer scientist (Author)
Material type: Article
Language: eng
Online access: http://bib.irb.hr/datoteka/698032.690_Paper.pdf
http://www.lrec-conf.org/proceedings/lrec2014/pdf/690_Paper.pdf
LEADER 02418naa a22002897i 4500
003 HR-ZaFF
005 20180114222553.0
008 150109s2014 ic 1 eng|d
999 |c 335696  |d 335693 
035 |a (CROSBI)698032 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 495  |a Agić, Željko 
245 1 4 |a The SETimes.HR Linguistically Annotated Corpus of Croatian /  |c Agić, Željko ; Ljubešić, Nikola. 
246 3 |i Naslov na engleskom:  |a The SETimes.HR Linguistically Annotated Corpus of Croatian 
300 |a 1724-1727  |f str. 
520 |a We present SETIMES.HR, the first linguistically annotated corpus of Croatian that is freely available for all purposes. The corpus is built on top of the SETIMES parallel corpus of nine Southeast European languages and English. It is manually annotated for lemmas, morphosyntactic tags, named entities and dependency syntax. We couple the corpus with domain-sensitive test sets for Croatian and Serbian to support direct model transfer evaluation between these closely related languages. We build and evaluate statistical models for lemmatization, morphosyntactic tagging, named entity recognition and dependency parsing on top of SETIMES.HR and the test sets, providing the state of the art in all the tasks. We make all resources presented in the paper freely available under a very permissive licensing scheme. 
536 |a Projekt MZOS  |f 130-1300646-1776 
546 |a ENG 
690 |a 5.04 
693 |a dependency treebank, Croatian language, free availability  |l hrv  |2 crosbi 
693 |a dependency treebank, Croatian language, free availability  |l eng  |2 crosbi 
700 1 |9 445  |a Ljubešić, Nikola,   |c informatičar  |4 aut 
773 0 |a International Conference on Language Resources and Evaluation, LREC ( 9 ; 2014 ; Reykjavik, Island)  |t Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)  |d Reykjavik, Iceland : European Language Resources Association (ELRA), 2014  |n Calzolari, Nicoletta ; Choukri, Khalid ; Declerck, Thierry ; Loftsson, Hrafn ; Maegaard, Bente ; Mariani, Joseph ; Moreno, Asuncion ; Odijk, Jan ; Piperidis, Stelios  |z 978-2-9517408-8-4  |g str. 1724-1727 
856 |u http://bib.irb.hr/datoteka/698032.690_Paper.pdf 
856 |u http://www.lrec-conf.org/proceedings/lrec2014/pdf/690_Paper.pdf 
942 |c RZB  |u 2  |v Recenzija  |z Znanstveni - Poster - CijeliRad  |t 1.08 
962 |w WOS:000355611003054