Analysis of text collections for the purposes of keyword extraction task

The article discusses the evaluation of automatic keyword extraction algorithms (AKEA) and points out AKEA’s dependence on the properties of the test collection for effectiveness. As a result, it is difficult to compare different algorithms whose tests were based on various test datasets. It is also...

Permalink: http://skupni.nsk.hr/Record/nsk.NSK01001088898/Details
Parent publication: Journal of information and organizational sciences (Online)
44 (2020), 1 ; pp. 171-184
Main authors: Vanyushkin, Alexander (Author), Graschenko, Leonid
Material type: e-article
Language: eng
Subject:
Online access: https://doi.org/10.31341/jios.44.1.8
Journal of information and organizational sciences (Online)
Hrčak
LEADER 02366naa a22003734i 4500
001 NSK01001088898
003 HR-ZaNSK
005 20210218153504.0
006 m d
007 cr||||||||||||
008 210201s2020 ci a |o |0|| ||eng
024 7 |2 doi  |a 10.31341/jios.44.1.8 
035 |a (HR-ZaNSK)001088898 
040 |a HR-ZaNSK  |b hrv  |c HR-ZaNSK  |e ppiak 
041 0 |a eng  |b eng 
042 |a croatica 
044 |a ci  |c hr 
080 1 |a 004  |2 2011 
080 1 |a 81  |2 2011 
100 1 |a Vanyushkin, Alexander  |4 aut  |9 HR-ZaNSK 
245 1 0 |a Analysis of text collections for the purposes of keyword extraction task  |h [Elektronička građa] /  |c Alexander Vanyushkin, Leonid Graschenko. 
300 |b Ilustr. 
504 |a Bibliografske bilješke uz tekst ; bibliografija: 33 jed. 
504 |a Abstract. 
520 |a The article discusses the evaluation of automatic keyword extraction algorithms (AKEA) and points out AKEA’s dependence on the properties of the test collection for effectiveness. As a result, it is difficult to compare different algorithms whose tests were based on various test datasets. It is also difficult to predict the effectiveness of different systems for solving real-world problems of natural language processing (NLP). We take into consideration a number of characteristics, such as the text length distribution in words and the method of keyword assignment. Our analysis of publicly available analytical exposition texts, which are typical for the keyword extraction domain, revealed that their length distributions are very regular and are described by the lognormal form. Moreover, most of the article lengths range between 400 and 2500 words. Additionally, the paper presents a brief review of eleven corpora that have been used to evaluate AKEA. 
653 0 |a Korpusna lingvistika  |a Ključne riječi  |a Obrada prirodnog jezika 
700 1 |a Graschenko, Leonid  |4 aut  |9 HR-ZaNSK 
773 0 |t Journal of information and organizational sciences (Online)  |x 1846-9418  |g 44 (2020), 1 ; str. 171-184  |w nsk.(HR-ZaNSK)000672813 
981 |b Be2020  |b B02/20 
998 |b tino2102 
856 4 0 |u https://doi.org/10.31341/jios.44.1.8 
856 4 0 |u https://jios.foi.hr/index.php/jios/article/view/1185  |y Journal of information and organizational sciences (Online) 
856 4 1 |y Digitalna.nsk.hr 
856 4 0 |u https://hrcak.srce.hr/239794  |y Hrčak