First Steps Toward Developing a System for Terminology Extraction

The aim of this paper is to describe first steps in developing a system for terminology extraction. First a data sample is built from synopses of doctoral theses at the Faculty of Humanities and Social Sciences, University of Zagreb, accepted in the period from 2004 to 2009 written mostly in Croatia...

Permalink: http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:316499/Details
Host publication: INFuture2009: Digital Resources and Knowledge Sharing
Zagreb : Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 2009
Main authors: Bago, Petra (-), Boras, Damir (Author), Ljubešić, Nikola, computer scientist
Material type: Article
Language: eng
LEADER 02231naa a2200241uu 4500
008 131111s2009 xx 1 eng|d
035 |a (CROSBI)439599 
040 |a HR-ZaFF  |b hrv  |c HR-ZaFF  |e ppiak 
100 1 |9 474  |a Bago, Petra 
245 1 0 |a First Steps Toward Developing a System for Terminology Extraction /  |c Bago, Petra ; Boras, Damir ; Ljubešić, Nikola. 
246 3 |i Title in English:  |a First Steps Toward Developing a System for Terminology Extraction 
300 |a 197-206  |f pp. 
520 |a The aim of this paper is to describe the first steps in developing a system for terminology extraction. First, a data sample is built from synopses of doctoral theses accepted at the Faculty of Humanities and Social Sciences, University of Zagreb, in the period from 2004 to 2009 and written mostly in Croatian. The data sample consists of 420 documents and 338,706 tokens. A small sample was manually tagged for terminology to be used in an initial experiment. The approach to terminology extraction is knowledge-driven and consists of differential analysis of reference and domain-specific corpora. The specific method used is the log-likelihood ratio test. The experiment deals with different reference corpora and linguistic pre-processing. The first results are promising. Guidelines for further research are discussed. 
536 |a MZOS project  |f 130-1301679-1380 
546 |a ENG 
690 |a 5.04 
693 |a terminology extraction, data sample, log-likelihood ratio test  |l hrv  |2 crosbi 
693 |a terminology extraction, data sample, log-likelihood ratio test  |l eng  |2 crosbi 
773 0 |a 2nd International Conference “The Future of Information Sciences: INFuture2009 – Digital Resources and Knowledge Sharing” (4-6.11.2009. ; Zagreb, Croatia)  |t INFuture2009: Digital Resources and Knowledge Sharing  |d Zagreb : Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 2009  |n Stančić, Hrvoje ; Seljan, Sanja ; Bawden, David ; Lasić-Lazić, Jadranka ; Slavić, Aida  |z 978-953-175-355-5  |g pp. 197-206 
700 1 |9 418  |a Boras, Damir  |4 aut 
700 1 |9 445  |a Ljubešić, Nikola,   |c computer scientist  |4 aut 
942 |c RZB  |u 2  |v Peer review  |z Scientific - Lecture - Full paper  |t 1.08 
999 |c 316499  |d 316497
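
The abstract describes terminology extraction as a differential analysis of a domain-specific corpus against a reference corpus, scored with a log-likelihood ratio test. The record does not give the paper's exact formulation, so the following is only a minimal sketch under stated assumptions: it uses the two-term log-likelihood form commonly applied in corpus comparison, treats each token as a single-word candidate, and skips the linguistic pre-processing the abstract mentions. The function names `log_likelihood` and `rank_candidates` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood score for one word.

    a: frequency of the word in the domain corpus
    b: frequency of the word in the reference corpus
    c: total number of tokens in the domain corpus
    d: total number of tokens in the reference corpus
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero count contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def rank_candidates(domain_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the domain corpus."""
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c = sum(dom.values())
    d = sum(ref.values())
    if c == 0 or d == 0:
        raise ValueError("both corpora must contain at least one token")
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        # Assumption: only words overrepresented in the domain corpus
        # are treated as term candidates.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

A higher score means the word's frequency differs more strongly between the two corpora. Swapping in a different reference corpus, as the abstract's experiment does, changes only the `reference_tokens` argument.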