Finding Multiword Term Candidates in Croatian
The paper presents the research in the field of statistical processing of a corpus of texts in Croatian with the primary aim of finding statistically significant co-occurrences of n-grams of tokens (digrams, trigrams and tetragrams). The collocations found with this method present the list of candi...
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:314161/Details |
---|---|
Parent publication: | Proceedings of Information Extraction for Slavic Languages 2003 Workshop (IESL2003) Sofia : BAS, 2003 |
Main authors: | Tadić, Marko (-), Šojat, Krešimir (Author) |
Material type: | Article |
Language: | English |
LEADER | 02030naa a2200229uu 4500 | ||
---|---|---|---|
008 | 131111s2003 xx 1 eng|d | ||
035 | |a (CROSBI)126566 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |a Tadić, Marko | |
245 | 1 | 0 | |a Finding Multiword Term Candidates in Croatian / |c Tadić, Marko ; Šojat, Krešimir. |
246 | 3 | |i Title in English: |a Finding Multiword Term Candidates in Croatian | |
300 | |a 102-107 |f pp. | ||
520 | |a The paper presents the research in the field of statistical processing of a corpus of texts in Croatian with the primary aim of finding statistically significant co-occurrences of n-grams of tokens (digrams, trigrams and tetragrams). The collocations found with this method present the list of candidates for multiword terminological units submitted to terminologists for further processing i.e. manual selecting of the “real terms”. The statistical measure of co-occurrence used is mutual information (MI3) accompanied with linguistic filters: stop-words and POS. The results on non-lemmatized material of a highly inflected language such as Croatian show that MI measure alone is not sufficient to find satisfactory number of multi-word term candidates. In this case the usage of absolute frequency combined with linguistic filtering techniques gives broader list of candidates for real terms. | ||
536 | |a MZOS project |f 0130418 | ||
546 | |a ENG | ||
690 | |a 6.03 | ||
693 | |a Croatian Language, multiword terms, term candidates, statistical processing, mutual information |l hrv |2 crosbi | ||
693 | |a Croatian Language, multiword terms, term candidates, statistical processing, mutual information |l eng |2 crosbi | ||
700 | 1 | |a Šojat, Krešimir |4 aut | |
773 | 0 | |a Information Extraction for Slavic Languages 2003 Workshop (08.-09.09.2003 ; Borovets, Bulgaria) |t Proceedings of Information Extraction for Slavic Languages 2003 Workshop (IESL2003) |d Sofia : BAS, 2003 |g pp. 102-107 | |
942 | |c RZB |u 1 |v Peer review |z Scientific - Lecture - Full paper | ||
999 | |c 314161 |d 314159 |