Inductive Morphosyntactic Tagsets

There are a number of morphological generators for the Croatian language, such as Kržak (1988), Silić (1996), Tadić (1994, 2003), and others that are part of Korektor©, Hrvatska Riječ©, Lapis© and other applications, all of which, except Tadić's, are used as spelling checkers. Tadić's Gen...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:314795/Details
Main authors: Stojanov, Tomislav (-), Dovedan Han, Zdravko (Author), Kocijan, Kristina
Material type: Article
Language: eng
LEADER 03571naa a2200289uu 4500
008 131111s2005 xx 1 eng|d
035 |a (CROSBI)217175 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Stojanov, Tomislav 
245 1 0 |a Inductive Morphosyntactic Tagsets /  |c Stojanov, Tomislav ; Vučković, Kristina ; Dovedan, Zdravko. 
246 3 |i Title in English:  |a Inductive Morphosyntactic Tagsets 
300 |f str. 
500 |a Second review for publication in the print edition is still in progress. 
520 |a There are a number of morphological generators for the Croatian language, such as Kržak (1988), Silić (1996), Tadić (1994, 2003), and others that are part of Korektor©, Hrvatska Riječ©, Lapis© and other applications, all of which, except Tadić's, are used as spelling checkers. Tadić's GenOblik was developed for the needs of a corpus linguistics project and is annotated according to the Multext-East specification (Erjavec 2001), which Przepiórkowski & Woliński (2003a, b) critically evaluated, adopting their own tagset closer to the grammatical system of Polish. This paper also starts from a criticism of that specification, but on different grounds. The following is emphasized: (i) insufficient differentiation between inherent and relationally motivated morphosyntactic features: verbal relational categories such as modality, conditionality and compound tense cannot be annotated by a tag added to an individual lexical unit, because these features (in Croatian as well as in other languages) do not derive from the form as such but are relationally conditioned; (ii) lack of adherence to morphosyntactic criteria in establishing formal criteria: semantic features, such as the distinction between common and proper nouns, are introduced, whereas other semantic categories, such as countability, collectiveness, transitivity, and optativity, are not included. Most of the critique of the Multext-East specification concerns the so-called deductive approach to tagset design. A tagging system that relies on a more emphasized qualitative approach is discussed in the second part of the paper. It explains the so-called inductive approach to creating a system of tags, in which the tags are derived from the morphological generator itself; this avoids the disadvantages of the deductive system of tags and gains greater grammatical reliability. This could contribute to greater accuracy in resolving homographic forms in the parser's algorithm. 
Six arguments are made in favor of the inductive approach. 
536 |a Projekt MZOS  |f 0130440 
536 |a Projekt MZOS  |f 0212010 
546 |a ENG 
690 |a 2.09 
690 |a 5.04 
690 |a 6.03 
693 |a tagging, tagset, Croatian language, morphological generator, MULTEXT-East, morphosyntax, morphosyntactic category, morphosyntactic feature, adjective aspect, adjective indefiniteness, deductive method, inductive method, corpus linguistics, machine translat  |l hrv  |2 crosbi 
693 |a tagging, tagset, Croatian language, morphological generator, MULTEXT-East, morphosyntax, morphosyntactic category, morphosyntactic feature, adjective aspect, adjective indefiniteness, deductive method, inductive method, corpus linguistics, machine translat  |l eng  |2 crosbi 
773 0 |a Computational Modeling of Lexical Acquisition (25.-28.08.2005. ; Split, Croatia) 
700 1 |9 415  |a Dovedan Han, Zdravko  |4 aut 
700 1 |9 446  |a Kocijan, Kristina  |4 aut 
942 |c RZB  |u 2  |v None  |z Scientific - Lecture - None  |t 3.15 
999 |c 314795  |d 314793