Document Representation Methods for News Event Detection in Croatian

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:315743/Details
Parent publication: Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages
Zagreb : Croatian Language Technologies Society, 2008
Main authors: Ljubešić, Nikola, informatičar (-), Bakarić, Nikola (Author), Agić, Željko
Material type: Article
Language: eng
Online access: http://bib.irb.hr/datoteka/364770.2008-FASSBL-NLZANB-final.pdf
LEADER 02656naa a2200289uu 4500
008 131111s2008 xx 1 eng|d
035 |a (CROSBI)364770 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 445  |a Ljubešić, Nikola,   |c informatičar 
245 1 0 |a Document Representation Methods for News Event Detection in Croatian /  |c Ljubešić, Nikola ; Agić, Željko ; Bakarić, Nikola. 
246 3 |i Title in English:  |a Document Representation Methods for News Event Detection in Croatian 
300 |a 79-84  |f pp. 
520 |a The constant increase in the amount of available data demands new organizational and representational ideas and approaches. Document clustering as a method for event detection uses, supplements and upgrades existing information retrieval methods in order to improve knowledge management and representation. This article describes research into the impact of various document representation methods on cluster analysis. Several statistical and linguistic (NLP) morphological normalization methods of document representation are tested in an event detection scenario. Event detection was conducted using online newspaper articles issued on a single day. A cluster analysis was done using the various document representation methods and a clustering algorithm. The results were then compared against a human-evaluated gold standard. The results show that both statistical and linguistic methods simplify the representational complexity and minimally improve the results, which leads to the conclusion that for this task statistical methods should be preferred. 
536 |a MZOS project  |f 130-1300646-1776 
536 |a MZOS project  |f 130-1301679-1380 
546 |a ENG 
690 |a 2.09 
690 |a 5.04 
690 |a 6.03 
693 |a document representation, document clustering, news event detection  |l hrv  |2 crosbi 
693 |a document representation, document clustering, news event detection  |l eng  |2 crosbi 
700 1 |a Bakarić, Nikola  |4 aut 
700 1 |9 495  |a Agić, Željko  |4 aut 
773 0 |a 6th International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL 2008) (25-28.09.2008. ; Dubrovnik, Croatia)  |t Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages  |d Zagreb : Croatian Language Technologies Society, 2008  |n Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla  |z 978-953-55375-0-2  |g pp. 79-84 
856 |u http://bib.irb.hr/datoteka/364770.2008-FASSBL-NLZANB-final.pdf 
942 |c RZB  |u 2  |v Peer review  |z Scientific - Lecture - Full paper  |t 1.08 
999 |c 315743  |d 315741