Document Representation Methods for News Event Detection in Croatian
Constant increase in the amount of available data in the world in general demands new organizational and representational ideas and approaches. Document clustering as a method for event detection uses, supplements and upgrades existing information retrieval methods in order to improve knowledge mana...
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:315743/Details |
---|---|
Parent publication: | Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages Zagreb : Croatian Language Technologies Society, 2008 |
Main authors: | Ljubešić, Nikola (computer scientist); Bakarić, Nikola (Author); Agić, Željko |
Material type: | Article |
Language: | eng |
Online access: | http://bib.irb.hr/datoteka/364770.2008-FASSBL-NLZANB-final.pdf |
LEADER | 02656naa a2200289uu 4500 | ||
---|---|---|---|
008 | 131111s2008 xx 1 eng|d | ||
035 | |a (CROSBI)364770 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |9 445 |a Ljubešić, Nikola, |c informatičar | |
245 | 1 | 0 | |a Document Representation Methods for News Event Detection in Croatian / |c Ljubešić, Nikola ; Agić, Željko ; Bakarić, Nikola. |
246 | 3 | |i Title in English: |a Document Representation Methods for News Event Detection in Croatian | |
300 | |a 79-84 |f str. | ||
520 | |a Constant increase in the amount of available data in the world in general demands new organizational and representational ideas and approaches. Document clustering as a method for event detection uses, supplements and upgrades existing information retrieval methods in order to improve knowledge management and representation. This article describes the research done in order to determine the impact of various methods of document representation on cluster analysis. Several statistical and linguistic NLP morphological normalization methods of document representation are tested in an event detection scenario. Event detection was conducted using online newspaper articles issued on a single day. A cluster analysis was done using the various document representation methods and a clustering algorithm. The results were then compared against a human evaluated golden standard. The results show that both statistical and linguistic methods simplify the representational complexity and minimally improve the results which lead to the conclusion that for this task statistical methods should be preferred. | ||
536 | |a MZOS project |f 130-1300646-1776 | ||
536 | |a MZOS project |f 130-1301679-1380 | ||
546 | |a ENG | ||
690 | |a 2.09 | ||
690 | |a 5.04 | ||
690 | |a 6.03 | ||
693 | |a document representation, document clustering, news event detection |l hrv |2 crosbi | ||
693 | |a document representation, document clustering, news event detection |l eng |2 crosbi | ||
700 | 1 | |a Bakarić, Nikola |4 aut | |
700 | 1 | |9 495 |a Agić, Željko |4 aut | |
773 | 0 | |a 6th International Conference on Formal Approaches to South Slavic and Balkan Languages (FASSBL 2008) (25-28.09.2008. ; Dubrovnik, Hrvatska) |t Proceedings of the 6th International Conference on Formal Approaches to South Slavic and Balkan Languages |d Zagreb : Croatian Language Technologies Society, 2008 |n Tadić, Marko ; Dimitrova-Vulchanova, Mila ; Koeva, Svetla |z 978-953-55375-0-2 |g str. 79-84 | |
856 | |u http://bib.irb.hr/datoteka/364770.2008-FASSBL-NLZANB-final.pdf | ||
942 | |c RZB |u 2 |v Recenzija |z Znanstveni - Predavanje - CijeliRad |t 1.08 | ||
999 | |c 315743 |d 315741 |