MARC: Improving chunking accuracy on Croatian texts by morphosyntactic tagging

Permalink:	http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:316610/Details
Host publication:	Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC2010), Valletta: European Language Resources Association, 2010
Main authors:	Kocijan, Kristina; Tadić, Marko; Agić, Željko
Material type:	Article
Language:	English
Online access:	http://bib.irb.hr/datoteka/455001.834_Paper.pdf
	http://www.lrec-conf.org/proceedings/lrec2010/pdf/834_Paper.pdf


LEADER	02876naa a2200289uu 4500
008	131111s2010 xx 1 eng|d
035			|a (CROSBI)455001
040			|a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak
100	1		|9 446 |a Kocijan, Kristina
245	1	0	|a Improving chunking accuracy on Croatian texts by morphosyntactic tagging / |c Vučković, Kristina ; Agić, Željko ; Tadić, Marko.
246	3		|i English title: |a Improving Chunking Accuracy on Croatian Texts by Morphosyntactic Tagging
300			|a 1944-1949 |f p.
520			|a In this paper, we present the results of an experiment in using a stochastic morphosyntactic tagger as a pre-processing module for a rule-based chunker and partial parser for Croatian, in order to raise its overall chunking and partial parsing accuracy on Croatian texts. To conduct the experiment, we manually chunked and partially parsed 459 sentences from the Croatia Weekly 100 kw newspaper sub-corpus of the Croatian National Corpus; these sentences had previously also been morphosyntactically disambiguated and lemmatized. Owing to the lack of resources of this type, these sentences were designated as a temporary chunking and partial parsing gold standard for Croatian. We then evaluated the chunker and partial parser in three scenarios: (1) chunking text without morphosyntactic tags, (2) chunking text tagged by the stochastic morphosyntactic tagger for Croatian, and (3) chunking manually tagged text. The F1 scores obtained for the three scenarios were 0.875 (P: 0.826, R: 0.930), 0.900 (P: 0.866, R: 0.937) and 0.930 (P: 0.912, R: 0.949), respectively. The paper describes the language resources and tools used in the experiment and its setup, and discusses the results and perspectives for future work.
536			|a MZOS project |f 130-1300646-0645
536			|a MZOS project |f 130-1300646-1776
546			|a ENG
690			|a 5.04
690			|a 6.03
693			|a chunking, partial parsing, morphosyntactic tagging |l hrv |2 crosbi
693			|a chunking, partial parsing, morphosyntactic tagging |l eng |2 crosbi
773	0		|a Seventh International Conference on Language Resources and Evaluation (19.-21.05.2010. ; Valletta, Malta) |t Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC2010) |d Valletta : European Language Resources Association, 2010 |n Calzolari, Nicoletta ; Choukri, Khalid ; Maegaard, Bente ; Mariani, Joseph ; Odijk, Jan ; Piperidis, Stelios ; Rosner, Mike ; Tapias, Daniel |z 2-9517408-6-7 |g p. 1944-1949
700	1		|9 888 |a Tadić, Marko |4 aut
700	1		|9 495 |a Agić, Željko |4 aut
856			|u http://bib.irb.hr/datoteka/455001.834_Paper.pdf
856			|u http://www.lrec-conf.org/proceedings/lrec2010/pdf/834_Paper.pdf
942			|c RZB |u 2 |v Peer-reviewed |z Scientific - Poster - Full paper |t 1.08
999			|c 316610 |d 316608
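
The F1 scores in the abstract (field 520) can be cross-checked against the precision (P) and recall (R) reported alongside them. The following is a minimal sketch that assumes the standard balanced F-measure, F1 = 2PR / (P + R); the function name `f1_score` and the `SCENARIOS` table are hypothetical names chosen for illustration and are not taken from the paper's tooling.

```python
# Balanced F-measure (F1): the harmonic mean of precision P and recall R.
# Assumption: the paper uses this standard definition.
def f1_score(precision: float, recall: float) -> float:
    """Return 2PR / (P + R), or 0.0 when both P and R are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical table: (P, R, reported F1) for the three evaluation scenarios
# listed in the abstract.
SCENARIOS = {
    "untagged input": (0.826, 0.930, 0.875),
    "stochastic tagger": (0.866, 0.937, 0.900),
    "manually tagged input": (0.912, 0.949, 0.930),
}

if __name__ == "__main__":
    for name, (p, r, reported) in SCENARIOS.items():
        computed = f1_score(p, r)
        # The reported scores have three decimals, so print at that precision.
        print(f"{name}: computed F1 = {computed:.3f}, reported F1 = {reported:.3f}")
```

For all three scenarios, the harmonic mean of the listed P and R agrees with the reported F1 to three decimal places. This is consistent with the scores being balanced F1 values.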
