Article | Proceedings of CHAT 2012: The 2nd Workshop on the Creation, Harmonization and Application of Terminology Resources, Co-located with TKE 2012, June 22, 2012, Madrid, Spain | Towards the Automated Enrichment of Multilingual Terminology Databases with Knowledge-Rich Contexts - Experiments with Russian EuroTermBank Data

Title:
Towards the Automated Enrichment of Multilingual Terminology Databases with Knowledge-Rich Contexts - Experiments with Russian EuroTermBank Data
Author:
Anne-Kathrin Schumann: University of Vienna, Austria / Tilde, Latvia
Year:
2012
Conference:
Proceedings of CHAT 2012: The 2nd Workshop on the Creation, Harmonization and Application of Terminology Resources, Co-located with TKE 2012, June 22, 2012, Madrid, Spain
Issue:
072
Article no.:
004
Pages:
27-34
No. of pages:
8
Publication type:
Abstract and Fulltext
Published:
2012-06-11
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Publisher:
Linköping University Electronic Press; Linköpings universitet



Although knowledge-rich context (KRC) extraction has received a lot of attention, to our knowledge few attempts have been made to feed KRCs directly into a terminological resource. The aim of this study, therefore, is to investigate to what extent pattern-based KRC extraction can be useful for the enrichment of terminological resources. The paper describes experiments aimed at enriching a multilingual term bank, namely EuroTermBank, with KRCs extracted from Russian-language web corpora. The contexts are extracted using a simple pattern-based method and then ranked by means of a supervised machine learning algorithm. The internet is used as a source of information because it is a primary means of finding information about terms and concepts for many language professionals, and a KRC extraction approach must therefore be able to deal with the quality of data found online in order to be applicable to real tasks.
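To illustrate what a simple pattern-based extraction step of this kind might look like, the following is a minimal sketch in Python. It is not the paper's implementation: the pattern inventory, the category names, and the helper functions `split_sentences` and `extract_candidates` are hypothetical and chosen only for illustration. The supervised ranking step mentioned in the abstract is not shown.

```python
import re

# Hypothetical knowledge patterns for Russian; the pattern set actually used
# in the paper is not given on this page, so these are illustrative only.
PATTERNS = {
    "definition": re.compile(r"\b(?:представляет собой|называется|это)\b", re.IGNORECASE),
    "hyperonymy": re.compile(r"\b(?:является видом|относится к)\b", re.IGNORECASE),
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_candidates(text, term):
    """Return (sentence, matched_pattern_names) pairs for sentences that
    contain the term and at least one knowledge pattern."""
    candidates = []
    for sentence in split_sentences(text):
        # Case-insensitive substring check for the term (no lemmatization).
        if term.lower() not in sentence.lower():
            continue
        matched = [name for name, pattern in PATTERNS.items() if pattern.search(sentence)]
        if matched:
            candidates.append((sentence, matched))
    return candidates


if __name__ == "__main__":
    sample = (
        "Ферментация представляет собой биохимический процесс. "
        "Ферментация известна давно. "
        "Погода хорошая."
    )
    for sentence, names in extract_candidates(sample, "ферментация"):
        print(names, sentence)
```

In this sketch only the first sample sentence is kept, because it contains both the term and a definitional pattern. A real system would need lemmatization to handle Russian inflection, and the matched pattern names could serve as features for a downstream ranker.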

Keywords: computer-aided terminography; knowledge-rich contexts; web as corpus; Russian language; multilingual terminology databases
