Sentence classification and clause detection for Croatian

We present a method for classifying Croatian sentences by structure and detecting independent and dependent clauses within these sentences and provide its evaluation. A prototype system applying the method was implemented by using the NooJ linguistic development environment, both for purposes of thi...


Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:316872/Details
Host publication: Proceedings of the 7th International Conference on Formal Approaches to South Slavic and Balkan Languages
Zagreb : Croatian Language Technologies Society -- Faculty of Humanities and Social Sciences, 2010
Main authors: Kocijan, Kristina (-), Tadić, Marko (Author), Agić, Željko
Material type: Article
Language: eng
Online access: http://hnk.ffzg.hr/fassbl2010/
LEADER 02689naa a2200277uu 4500
008 131111s2010 xx 1 eng|d
035 |a (CROSBI)484912 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 446  |a Kocijan, Kristina 
245 1 0 |a Sentence classification and clause detection for Croatian /  |c Vučković, Kristina ; Agić, Željko ; Tadić, Marko. 
246 3 |i Naslov na engleskom:  |a Sentence Classification and Clause Detection for Croatian 
300 |a 131-138  |f str. 
520 |a We present a method for classifying Croatian sentences by structure and detecting independent and dependent clauses within these sentences and provide its evaluation. A prototype system applying the method was implemented by using the NooJ linguistic development environment, both for purposes of this experiment and for further utilization in a prototype rule-based chunking and shallow parsing system for Croatian. With regard to pre-processing, we implemented and evaluated three different approaches to designing the system: (1) no pre-processing of input sentences, (2) automatic morphosyntactic tagging of sentences by using the CroTag stochastic tagger and (3) manual morphosyntactic annotation of input sentences. All three approaches were evaluated for sentence classification and clause detection accuracy in terms of precision and recall. The highest-scoring system was the one using sentences with manually assigned morphosyntactic tags as input, and it scored an overall F1-measure of 0.861 (P: 0.928, R: 0.813). In the paper, a more detailed discussion of system design and experiment setup is provided, followed by a discussion of the obtained results and future research directions. 
536 |a Projekt MZOS  |f 130-1300646-0645 
536 |a Projekt MZOS  |f 130-1300646-1776 
546 |a ENG 
690 |a 5.04 
690 |a 6.03 
693 |a sentence detection, sentence classification, clause detection, Croatian language  |l hrv  |2 crosbi 
693 |a sentence detection, sentence classification, clause detection, Croatian language  |l eng  |2 crosbi 
773 0 |a Formal Approaches to South Slavic and Balkan Languages (04-06.10.2010. ; Dubrovnik, Hrvatska)  |t Proceedings of the 7th International Conference on Formal Approaches to South Slavic and Balkan Languages  |d Zagreb : Croatian Language Technologies Society -- Faculty of Humanities and Social Sciences, 2010  |n Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla  |z 978-953-55375-2-6  |g str. 131-138 
700 1 |9 888  |a Tadić, Marko  |4 aut 
700 1 |9 495  |a Agić, Željko  |4 aut 
856 |u http://hnk.ffzg.hr/fassbl2010/ 
942 |c RZB  |u 2  |v Recenzija  |z Znanstveni - Predavanje - CijeliRad  |t 1.08 
999 |c 316872  |d 316870