Finding Multiword Term Candidates in Croatian

The paper presents the research in the field of statistical processing of a corpus of texts in Croatian with the primary aim of finding statistically significant co-occurrences of n-grams of tokens (digrams, trigrams and tetragrams). The collocations found with this method present the list of candi...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:314161/Details
Host publication: Proceedings of Information Extraction for Slavic Languages 2003 Workshop (IESL2003)
Sofia : BAS, 2003
Main authors: Tadić, Marko (-), Šojat, Krešimir (Author)
Material type: Article
Language: eng
LEADER 02030naa a2200229uu 4500
008 131111s2003 xx 1 eng|d
035 |a (CROSBI)126566 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |a Tadić, Marko 
245 1 0 |a Finding Multiword Term Candidates in Croatian /  |c Tadić, Marko ; Šojat, Krešimir. 
246 3 |i Naslov na engleskom:  |a Finding Multiword Term Candidates in Croatian 
300 |a 102-107  |f str. 
520 |a The paper presents the research in the field of statistical processing of a corpus of texts in Croatian with the primary aim of finding statistically significant co-occurrences of n-grams of tokens (digrams, trigrams and tetragrams). The collocations found with this method present the list of candidates for multiword terminological units submitted to terminologists for further processing i.e. manual selecting of the “real terms”. The statistical measure of co-occurrence used is mutual information (MI3) accompanied with linguistic filters: stop-words and POS. The results on non-lemmatized material of a highly inflected language such as Croatian show that MI measure alone is not sufficient to find satisfactory number of multi-word term candidates. In this case the usage of absolute frequency combined with linguistic filtering techniques gives broader list of candidates for real terms. 
536 |a Projekt MZOS  |f 0130418 
546 |a ENG 
690 |a 6.03 
693 |a Croatian Language, multiword terms, term candidates, statistical processing, mutual information  |l hrv  |2 crosbi 
693 |a Croatian Language, multiword terms, term candidates, statistical processing, mutual information  |l eng  |2 crosbi 
700 1 |a Šojat, Krešimir  |4 aut 
773 0 |a Information Extraction for Slavic Languages 2003 Workshop (08.-09.09.2003 ; Borovets, Bugarska)  |t Proceedings of Information Extraction for Slavic Languages 2003 Workshop (IESL2003)  |d Sofija : BAS, 2003  |g str. 102-107 
942 |c RZB  |u 1  |v Recenzija  |z Znanstveni - Predavanje - CijeliRad 
999 |c 314161  |d 314159