Retrieving Information in Croatian: Building a Simple and Efficient Rule-based Stemmer

Because Croatian is a highly inflected language, natural language information needs morphological normalization so that it can be retrieved more efficiently. Although this topic has been researched in Croatia for more than two decades, the vast majority of inf...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:315626/Details
Host publication: 1. međunarodna znanstvena konferencija "The Future of Information Sciences" (INFuture 2007) : Digital information and heritage : zbornik radova
Zagreb : Odsjek za informacijske znanosti Filozofskog fakulteta, 2007
Main authors: Ljubešić, Nikola (-), Boras, Damir (Author), Kubelka, Ozren
Material type: Article
Language: eng
LEADER 02692naa a2200241uu 4500
008 131111s2007 xx 1 eng|d
035 |a (CROSBI)348540 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Ljubešić, Nikola 
245 1 0 |a Retrieving Information in Croatian: Building a Simple and Efficient Rule-based Stemmer /  |c Ljubešić, Nikola ; Boras, Damir ; Kubelka, Ozren. 
246 3 |i Title in English:  |a Retrieving Information in Croatian: Building a Simple and Efficient Rule-based Stemmer 
300 |a 313-320  |f str. 
520 |a Because Croatian is a highly inflected language, natural language information needs morphological normalization so that it can be retrieved more efficiently. Although this topic has been researched in Croatia for more than two decades, the vast majority of information systems that store information written in Croatian still have not solved this problem. The primary cause is the high price of existing systems. The aim of this paper is to analyze the current situation in the industry regarding this problem and to build a rule-based stemmer consisting of a minimal set of rules for expanding queries to the whole possible paradigm. Such a system could make expensive morphological databases obsolete in information retrieval. We used a corpus sample, a morphological lexicon and a query sample of the 1,000 most frequent nouns in base form to build a rule-based stemmer optimized through the steepest ascent hill climbing algorithm. Using this method we built a stemmer that performs almost as well as the noun lexicon, with F1 measures of 97.82% without the rules for adjectives and 97.64% with them. 
536 |a Projekt MZOS  |f 130-1301679-1380 
546 |a ENG 
690 |a 5.04 
693 |a Information retrieval, Croatian language, rule-based stemming, hill climbing optimization, industry awareness  |l hrv  |2 crosbi 
693 |a Information retrieval, Croatian language, rule-based stemming, hill climbing optimization, industry awareness  |l eng  |2 crosbi 
700 1 |a Boras, Damir  |4 aut 
700 1 |a Kubelka, Ozren  |4 aut 
773 0 |a Međunarodna znanstvena konferencija "The Future of Information Sciences" : Digital information and heritage (1 ; 2007) (07.-09.11.2007 ; Zagreb, Hrvatska)  |t 1. međunarodna znanstvena konferencija "The Future of Information Sciences" (INFuture 2007) : Digital information and heritage : zbornik radova  |d Zagreb : Odsjek za informacijske znanosti Filozofskog fakulteta, 2007  |n Seljan, Sanja ; Stančić, Hrvoje  |z 978-953-175-305-0  |g str. 313-320 
942 |c RZB  |u 1  |v Nista  |z Znanstveni - Poster - CijeliRad 
999 |c 315626  |d 315624
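
Illustrative note (not part of the catalogued record): the abstract describes choosing a minimal set of stemming rules by steepest ascent hill climbing and scoring the result with the F1 measure. The sketch below shows that general idea only. The suffix rules, the toy word list and every function name are assumptions made for illustration; they are not the rules, data or code of the paper, and the toy score says nothing about the reported results.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical candidate rules: each rule strips one suffix.
CANDIDATES: List[str] = ["a", "e", "i", "u"]

# Hypothetical toy judgments: (query in base form, candidate word, is relevant).
GOLD: List[Tuple[str, str, bool]] = [
    ("kuća", "kuće", True),
    ("kuća", "kući", True),
    ("kuća", "kuću", True),
    ("kuća", "kuhar", False),
    ("grad", "grada", True),
    ("grad", "gradu", True),
    ("grad", "grah", False),
]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R); taken as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stem(word: str, suffixes: Iterable[str]) -> str:
    """Strip the longest matching suffix, keeping at least two characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def evaluate(suffixes: FrozenSet[str], gold: List[Tuple[str, str, bool]]) -> float:
    """Score a rule set: a word is retrieved when its stem equals the query's stem."""
    tp = fp = fn = 0
    for query, word, relevant in gold:
        matched = stem(query, suffixes) == stem(word, suffixes)
        if matched and relevant:
            tp += 1
        elif matched:
            fp += 1
        elif relevant:
            fn += 1
    return f1_score(tp, fp, fn)


def hill_climb(
    candidates: List[str], score: Callable[[FrozenSet[str]], float]
) -> Tuple[FrozenSet[str], float]:
    """Steepest ascent: try toggling every rule, move to the best neighbour,
    and stop as soon as no neighbour is strictly better."""
    current: FrozenSet[str] = frozenset()
    current_score = score(current)
    while True:
        neighbours = [current ^ {rule} for rule in candidates]
        best = max(neighbours, key=score)
        best_score = score(best)
        if best_score <= current_score:
            return current, current_score
        current, current_score = best, best_score


if __name__ == "__main__":
    rules, f1 = hill_climb(CANDIDATES, lambda s: evaluate(s, GOLD))
    print(sorted(rules), f1)
```

Stopping when no neighbour improves the score means the search can end in a local optimum, which is a known property of hill climbing rather than a statement about the paper's results.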