Rule Based Chunker for Croatian

In this paper we discuss a rule-based approach to chunking sentences in Croatian, implemented using local regular grammars within the NooJ development environment. We describe the rules and their implementation by regular grammars and at the same time show that in the NooJ environment it is extremely ea...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:315515/Details
Host publication: Proceedings of the Sixth International Conference on Language Resources and Evaluation LREC2008
Marrakech-Paris : European Language Resources Association (ELRA), 2008
Main authors: Kocijan, Kristina (-), Tadić, Marko (Author), Dovedan Han, Zdravko
Material type: Article
Language: eng
Online access: http://bib.irb.hr/datoteka/342299.KVMTZD4LREC2008.pdf
http://www.lrec-conf.org/proceedings/lrec2008/pdf/631_paper.pdf
LEADER 02495naa a2200301uu 4500
008 131111s2008 xx 1 eng|d
035 |a (CROSBI)342299 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 446  |a Kocijan, Kristina 
245 1 0 |a Rule based chunker for croatian /  |c Vučković, Kristina ; Tadić, Marko ; Dovedan, Zdravko. 
246 3 |i Naslov na engleskom:  |a Rule Based Chunker for Croatian 
300 |a 2544-2549  |f str. 
520 |a In this paper we discuss a rule-based approach to chunking sentences in Croatian, implemented using local regular grammars within the NooJ development environment. We describe the rules and their implementation by regular grammars and at the same time show that in the NooJ environment it is extremely easy to fine-tune their different sub-rules. Since Croatian has strong morphosyntactic features that are shared between most or all elements of a chunk, the rules are built by taking these features into account and relying strongly on them. For the evaluation of our chunker we used an extracted set of manually annotated sentences from a 100 kw MSD-tagged and disambiguated Croatian corpus. Our chunker performed best on VP-chunks (F: 97.01), while NP-chunks (F: 92.31) and PP-chunks (F: 83.08) were of lower quality. The results are comparable to chunker performance in the CoNLL-2000 shared task on chunking. 
536 |a Projekt MZOS  |f 036-1300646-1986 
536 |a Projekt MZOS  |f 130-1300646-0645 
536 |a Projekt MZOS  |f 130-1300646-1776 
546 |a ENG 
690 |a 5.04 
690 |a 6.03 
693 |a chunker, rule based, local regular grammar, Croatian  |l hrv  |2 crosbi 
693 |a chunker, rule based, local regular grammar, Croatian  |l eng  |2 crosbi 
773 0 |a The Sixth International Conference on Language Resources and Evaluation LREC2008 (28-30.05.2008. ; Marakeš, Maroko)  |t Proceedings of the Sixth International Conference on Language Resources and Evaluation LREC2008  |d Marakeš-Pariz : European Language Resources Association (ELRA), 2008  |n Calzolari, Nicoletta ; Choukri, Khalid ; Maegaard, Bente ; Mariani, Joseph ; Odjik, Jan ; Piperidis, Stelios ; Tapias, Daniel  |z 2-9517408-4-0  |g str. 2544-2549 
700 1 |9 888  |a Tadić, Marko  |4 aut 
700 1 |9 415  |a Dovedan Han, Zdravko  |4 aut 
856 |u http://bib.irb.hr/datoteka/342299.KVMTZD4LREC2008.pdf 
856 |u http://www.lrec-conf.org/proceedings/lrec2008/pdf/631_paper.pdf 
942 |c RZB  |u 2  |v Recenzija  |z Znanstveni - Poster - CijeliRad  |t 1.08 
999 |c 315515  |d 315513