Article | Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22–24, 2013, Oslo University, Norway. NEALT Proceedings Series 16 | Combining Statistical Machine Translation and Translation Memories with Domain Adaptation

Title:
Combining Statistical Machine Translation and Translation Memories with Domain Adaptation
Author:
Samuel Läubli: Institute of Computational Linguistics, University of Zurich, Zürich, Switzerland
Mark Fishel: Institute of Computational Linguistics, University of Zurich, Zürich, Switzerland
Martin Volk: Institute of Computational Linguistics, University of Zurich, Zürich, Switzerland
Manuela Weibel: SemioticTransfer AG, Baden, Switzerland
Year:
2013
Conference:
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22–24, 2013, Oslo University, Norway. NEALT Proceedings Series 16
Issue:
085
Article no.:
030
Pages:
331-341
No. of pages:
11
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-589-6
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press; Linköpings universitet



Since the emergence of translation memory software, translation companies and freelance translators have been accumulating translated text for various languages and domains. This data could be used to train domain-specific machine translation systems for corporate or even personal use. However, while the resulting systems usually perform well on domain-specific language, their coverage of out-of-domain vocabulary is often insufficient because translation memories are small. In this paper, we demonstrate that small in-domain translation memories can be successfully complemented with freely available general-domain parallel corpora such that (a) the number of out-of-vocabulary (OOV) words is reduced while (b) the in-domain terminology is preserved. In our experiments, German–French and German–Italian statistical machine translation systems geared to marketing texts from the automobile industry were significantly improved using Europarl and OpenSubtitles data, both in terms of automatic evaluation metrics and human judgement.

Keywords: Machine Translation; Translation Memory; Domain Adaptation; Perplexity Minimization
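This page gives only the abstract, so the following is a minimal sketch under stated assumptions, not the authors' pipeline. It illustrates, on invented toy data, the two ideas named above: (a) measuring the OOV rate of a development set before and after adding a general-domain corpus, and (b) choosing a linear interpolation weight between an in-domain and a general-domain model by minimizing perplexity on held-out data, the idea behind the "Perplexity Minimization" keyword. Assumptions: smoothed unigram models stand in for the phrase tables and n-gram language models of a real SMT system, the weight is fitted with EM (which maximizes dev-set likelihood and therefore minimizes perplexity), and all function names and sentences are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration only; not the paper's implementation.


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training data."""
    vocab = set(train_tokens)
    unseen = sum(1 for t in test_tokens if t not in vocab)
    return unseen / len(test_tokens)


def unigram_model(tokens, vocab, alpha=0.1):
    """Add-alpha smoothed unigram probabilities over a shared, closed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def perplexity(dev_tokens, p_in, p_gen, lam):
    """Perplexity of the mixture lam * p_in + (1 - lam) * p_gen on dev data."""
    log_lik = sum(math.log(lam * p_in[w] + (1 - lam) * p_gen[w]) for w in dev_tokens)
    return math.exp(-log_lik / len(dev_tokens))


def fit_weight(dev_tokens, p_in, p_gen, lam=0.5, iterations=100):
    """EM for the mixture weight; no step can increase dev-set perplexity."""
    for _ in range(iterations):
        # E-step: posterior probability that each dev token came from the in-domain model
        resp = [lam * p_in[w] / (lam * p_in[w] + (1 - lam) * p_gen[w])
                for w in dev_tokens]
        # M-step: the new weight is the mean posterior
        lam = sum(resp) / len(resp)
    return lam


# Invented toy data: a tiny "translation memory" of car marketing text,
# a tiny general-domain corpus, and a held-out development set.
tm = "der neue Motor überzeugt durch Leistung und der neue Innenraum überzeugt durch Komfort".split()
general = "der Rat hat die Entscheidung getroffen und die Kommission hat zugestimmt".split()
dev = "der neue Motor überzeugt und die Kommission hat zugestimmt".split()

print(round(oov_rate(tm, dev), 3))   # 0.444: 4 of 9 dev tokens are unseen in the TM
print(oov_rate(tm + general, dev))   # 0.0: the general corpus covers the remaining tokens

vocab = set(tm) | set(general) | set(dev)  # closed vocabulary, a toy simplification
p_in = unigram_model(tm, vocab)
p_gen = unigram_model(general, vocab)
lam = fit_weight(dev, p_in, p_gen)
print(f"lambda={lam:.3f}  ppl={perplexity(dev, p_in, p_gen, lam):.2f}  "
      f"ppl@0.5={perplexity(dev, p_in, p_gen, 0.5):.2f}")
```

Because the dev-set log-likelihood is concave in the weight, EM moves towards a single optimum. Keeping the in-domain component as a separately weighted model, rather than simply concatenating corpora, is one way to let the general data fill vocabulary gaps without overriding in-domain terminology. The approach described by Sennrich (2012) operates on translation model features rather than a single unigram distribution, which this sketch does not attempt.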

References:

Bertoldi, N., Haddow, B., and Fouet, J.-B. (2009). Improved minimum error rate training in Moses. The Prague Bulletin of Mathematical Linguistics, 91(1):7–16.

Callison-Burch, C., Koehn, P., Monz, C., Post, M., Soricut, R., and Specia, L. (2012). Findings of the 2012 workshop on statistical machine translation. In Proceedings of the Seventh Workshop on Statistical Machine Translation, pages 10–51, Montréal, Canada. Association for Computational Linguistics.

Chen, S. F. and Goodman, J. (1998). An empirical study of smoothing techniques for language modeling. Computer Speech & Language, 13:359–393.

Clark, J. H., Dyer, C., Lavie, A., and Smith, N. A. (2011). Better hypothesis testing for statistical machine translation: Controlling for optimizer instability. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, HLT ’11, pages 176–181, Stroudsburg, PA, USA. Association for Computational Linguistics.

Denkowski, M. and Lavie, A. (2011). Meteor 1.3: automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of the Sixth Workshop on Statistical Machine Translation, WMT ’11, pages 85–91, Stroudsburg, PA, USA. Association for Computational Linguistics.

Dyer, C. (2009). Using a maximum entropy model to build segmentation lattices for MT. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL ’09, pages 406–414, Stroudsburg, PA, USA. Association for Computational Linguistics.

Federico, M. and Cettolo, M. (2007). Efficient handling of n-gram language models for statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, StatMT ’07, pages 88–95, Stroudsburg, PA, USA. Association for Computational Linguistics.

Foster, G. and Kuhn, R. (2007). Mixture-model adaptation for SMT. In Proceedings of the Second Workshop on Statistical Machine Translation, StatMT ’07, pages 128–135, Stroudsburg, PA, USA. Association for Computational Linguistics.

Hardmeier, C., Bisazza, A., and Federico, M. (2010). FBK at WMT 2010: word lattices for morphological reduction and chunk-based reordering. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, WMT ’10, pages 88–92, Stroudsburg, PA, USA. Association for Computational Linguistics.

Kanavos, P. and Kartsaklis, D. (2010). Integrating machine translation with translation memory: A practical approach. In JEC 2010: Second joint EM+/CNGL Workshop “Bringing MT to the user: research on integrating MT in the translation industry”, pages 11–20.

Koehn, P. (2005). Europarl: A Parallel Corpus for Statistical Machine Translation. In Machine Translation Summit X, pages 79–86, Phuket, Thailand.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL ’07, pages 177–180, Stroudsburg, PA, USA. Association for Computational Linguistics.

Koehn, P. and Knight, K. (2003). Empirical methods for compound splitting. In Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1, EACL ’03, pages 187–193, Stroudsburg, PA, USA. Association for Computational Linguistics.

Koehn, P. and Senellart, J. (2010). Convergence of translation memory and statistical machine translation. In JEC 2010: Second joint EM+/CNGL Workshop “Bringing MT to the user: research on integrating MT in the translation industry”, pages 21–31.

Och, F. J. (2003). Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, pages 160–167, Stroudsburg, PA, USA. Association for Computational Linguistics.

Pym, A., Grin, F., Sfreddo, C., and Chan, A. L. J. (2012). The Status of the Translation Profession in the European Union, volume 7/2012 of Studies on translation and multilingualism. Publications Office of the European Union.

Sennrich, R. (2012). Perplexity Minimization for Translation Model Domain Adaptation in Statistical Machine Translation. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’12, pages 539–549, Stroudsburg, PA, USA. Association for Computational Linguistics.

Stymne, S. (2009). Compound processing for phrase-based statistical machine translation. Master’s thesis, Linköping University, Sweden.

Tiedemann, J. (2009). News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces. In Nicolov, N., Bontcheva, K., Angelova, G., and Mitkov, R., editors, Recent Advances in Natural Language Processing, volume V, pages 237–248, Borovets, Bulgaria. John Benjamins, Amsterdam/Philadelphia.


