Article | Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden | Tilde MODEL - Multilingual Open Data for EU Languages

Title:
Tilde MODEL - Multilingual Open Data for EU Languages
Author:
Roberts Rozis: Tilde; Raivis Skadiņš: Tilde
Download:
Full text (pdf)
Year:
2017
Conference:
Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Issue:
131
Article no.:
035
Pages:
263-265
No. of pages:
3
Publication type:
Abstract and Fulltext
Published:
2017-05-08
ISBN:
978-91-7685-601-7
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



This paper describes a Multilingual Open Data corpus for European languages that was built within the scope of the MODEL project. We describe how data sources were selected, which sources were used, how the source data was processed, which tools were used, and what data the project produced. We present the quality of the obtained data and summarize the challenges encountered and the solutions chosen. The paper may serve as a guide and reference for anyone attempting a similar effort, as well as a guide to the newly released open data.

References:

Koehn, P. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Conference Proceedings: the tenth Machine Translation Summit. Phuket, Thailand: AAMT, pp. 79-86.


Moore, R.C. 2002. Fast and Accurate Sentence Alignment of Bilingual Corpora. In Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users. London, UK: Springer-Verlag, pp. 135-144.


Skadiņš, R., Tiedemann, J., Rozis, R., Deksne, D. 2014. Billions of Parallel Words for Free: Building and Using the EU Bookshop Corpus. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC’14), pp. 1850-1855.


Steinberger, R., Eisele, A., Klocek, S., Pilos, S., & Schlüter, P. 2012. DGT-TM: A Freely Available Translation Memory in 22 Languages. Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’2012). Istanbul, Turkey, pp. 454-459.


Tiedemann, J. 2009. News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces. In N. Nicolov, K. Bontcheva, G. Angelova and R. Mitkov (eds.) Recent Advances in Natural Language Processing (vol V). Amsterdam/Philadelphia: John Benjamins, pp. 237-248.


