Article | Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania | Automatic conversion of colloquial Finnish to standard Finnish | Linköping University Electronic Press Conference Proceedings

Title:
Automatic conversion of colloquial Finnish to standard Finnish
Author:
Inari Listenmaa: Chalmers Institute of Technology, Sweden; Francis M. Tyers: HSL-fakultehta, UiT Norgga árktalaš universitehta, Norway
Download:
Full text (pdf)
Year:
2015
Conference:
Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania
Issue:
109
Article no.:
027
Pages:
219-223
No. of pages:
5
Publication type:
Abstract and Fulltext
Published:
2015-05-06
ISBN:
978-91-7519-098-3
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



This paper presents an unsupervised method for converting between colloquial Finnish and standard Finnish. The method relies on a small number of orthographical rules, combined with a large language model of standard Finnish that ranks the possible conversions. In addition, the paper presents an evaluation corpus consisting of aligned sentences in colloquial Finnish, orthographically standardised colloquial Finnish, and standard Finnish. The method we present outperforms the baseline of simply treating colloquial Finnish as standard Finnish, and offers promise for adapting language-technology tools created for standard Finnish to colloquial Finnish. To this end, the paper also presents preliminary results that show promise for using normalisation in the machine translation task.
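The general idea of generating candidate conversions with rules and ranking them with a language model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three rewrite rules, the three-sentence "standard Finnish" corpus, and the add-one-smoothed unigram model are hypothetical stand-ins, not the rules, data, or language model used in the paper.

```python
import math
import re
from collections import Counter
from itertools import product

# Hypothetical rewrite rules (regex pattern -> standard-form replacement).
# Illustrative only; these are not the rules from the paper.
RULES = [
    (r"^mä$", "minä"),    # colloquial first-person pronoun
    (r"^oon$", "olen"),   # colloquial form of "olen" ('I am')
    (r"anen$", "ainen"),  # e.g. punanen -> punainen
]

# Toy stand-in for a large corpus of standard Finnish (assumption).
STANDARD_CORPUS = [
    "minä olen täällä",
    "talo on punainen",
    "minä olen iloinen",
]

COUNTS = Counter(w for s in STANDARD_CORPUS for w in s.split())
TOTAL = sum(COUNTS.values())
VOCAB = len(COUNTS)


def log_prob(word):
    """Add-one-smoothed unigram log-probability; unseen words get a small mass."""
    return math.log((COUNTS[word] + 1) / (TOTAL + VOCAB + 1))


def candidates(word):
    """Return the word itself plus every form produced by a matching rule."""
    forms = [word]
    for pattern, replacement in RULES:
        if re.search(pattern, word):
            forms.append(re.sub(pattern, replacement, word))
    return forms


def normalise(sentence):
    """Pick the candidate sentence that the toy language model scores highest."""
    options = [candidates(w) for w in sentence.split()]
    best = max(
        product(*options),
        key=lambda words: sum(log_prob(w) for w in words),
    )
    return " ".join(best)


print(normalise("mä oon täällä"))    # minä olen täällä
print(normalise("talo on punanen"))  # talo on punainen
```

Because the original word is always kept as a candidate, a rule that fires on a word that is already standard does no harm as long as the language model prefers the original. Enumerating every combination with `product` is only workable for short sentences; a realistic system would search the candidate space more efficiently and use a higher-order n-gram model.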



