First Steps Toward Developing a System for Terminology Extraction
Permalink: | http://skupni.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:316499/Details |
---|---|
Host publication: | INFuture2009: Digital Resources and Knowledge Sharing. Zagreb : Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 2009 |
Main authors: | Bago, Petra; Boras, Damir; Ljubešić, Nikola (computer scientist) |
Material type: | Article |
Language: | English |
LEADER | 02231naa a2200241uu 4500 | ||
---|---|---|---|
008 | 131111s2009 xx 1 eng|d | ||
035 | |a (CROSBI)439599 | ||
040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
100 | 1 | |9 474 |a Bago, Petra | |
245 | 1 | 0 | |a First Steps Toward Developing a System for Terminology Extraction / |c Bago, Petra ; Boras, Damir ; Ljubešić, Nikola. |
246 | 3 | |i Title in English: |a First Steps Toward Developing a System for Terminology Extraction | |
300 | |a 197-206 |f pp. | ||
520 | |a The aim of this paper is to describe the first steps in developing a system for terminology extraction. First, a data sample is built from synopses of doctoral theses accepted at the Faculty of Humanities and Social Sciences, University of Zagreb, between 2004 and 2009, written mostly in Croatian. The data sample consists of 420 documents and 338,706 tokens. A small subset was manually tagged for terminology for use in an initial experiment. The approach to terminology extraction is knowledge-driven and consists of a differential analysis of reference and domain-specific corpora. The specific method used is the log-likelihood ratio test. The experiment compares different reference corpora and different kinds of linguistic pre-processing. The first results are promising. Guidelines for further research are discussed. | ||
536 | |a Projekt MZOS |f 130-1301679-1380 | ||
546 | |a ENG | ||
690 | |a 5.04 | ||
693 | |a terminology extraction, data sample, log-likelihood ratio test |l hrv |2 crosbi | ||
693 | |a terminology extraction, data sample, log-likelihood ratio test |l eng |2 crosbi | ||
773 | 0 | |a 2nd International Conference “The Future of Information Sciences: INFuture2009 – Digital Resources and Knowledge Sharing” (4-6.11.2009. ; Zagreb, Croatia) |t INFuture2009: Digital Resources and Knowledge Sharing |d Zagreb : Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, 2009 |n Stančić, Hrvoje ; Seljan, Sanja ; Bawden, David ; Lasić-Lazić, Jadranka ; Slavić, Aida |z 978-953-175-355-5 |g pp. 197-206 | |
700 | 1 | |9 418 |a Boras, Damir |4 aut | |
700 | 1 | |9 445 |a Ljubešić, Nikola, |c computer scientist |4 aut | |
942 | |c RZB |u 2 |v Peer-reviewed |z Scientific - Lecture - Full paper |t 1.08 | ||
999 | |c 316499 |d 316497 |
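The abstract describes term extraction as a differential analysis of a domain-specific corpus against a reference corpus using the log-likelihood ratio test. The following is a minimal sketch of that idea, not the authors' implementation. It assumes Dunning's G² statistic over a 2x2 contingency table (word vs. all other tokens, domain vs. reference corpus), pre-tokenized input with no lemmatization or other linguistic pre-processing, and a filter that keeps only words relatively more frequent in the domain corpus. The function names `log_likelihood` and `rank_terms` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G^2 for a 2x2 contingency table (assumed formulation).

    a: frequency of the word in the domain corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the domain corpus
    d: total tokens in the reference corpus
    """
    observed = [a, b, c - a, d - b]
    total = c + d
    row_word = a + b             # occurrences of the word in both corpora
    row_other = total - row_word  # all other tokens in both corpora
    expected = [
        c * row_word / total,
        d * row_word / total,
        c * row_other / total,
        d * row_other / total,
    ]
    # 0 * ln(0) is taken as 0, so empty cells contribute nothing
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def rank_terms(domain_tokens, reference_tokens, top_n=10):
    """Rank domain words by G^2 (hypothetical helper; assumes non-empty corpora)."""
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(dom.values()), sum(ref.values())
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        # keep only words over-represented in the domain corpus (assumed filter)
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [(word, score) for score, word in scored[:top_n]]
```

As a usage example, `rank_terms("term extraction term corpus the the".split(), "the the the the cat sat on the mat".split())` ranks `term` first and drops `the`, which is relatively more frequent in the toy reference corpus. A real setup would replace whitespace tokens with the lemmatized or tagged tokens whose effect the experiment examines.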