Article | Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden | Multilingwis2 – Explore Your Parallel Corpus

Title:
Multilingwis2 – Explore Your Parallel Corpus
Author:
Johannes Graën: Institute of Computational Linguistics, University of Zurich, Switzerland Dominique Sandoz: Institute of Computational Linguistics, University of Zurich, Switzerland Martin Volk: Institute of Computational Linguistics, University of Zurich, Switzerland
Year:
2017
Conference:
Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Issue:
131
Article no.:
031
Pages:
247-250
No. of pages:
4
Publication type:
Abstract and Fulltext
Published:
2017-05-08
ISBN:
978-91-7685-601-7
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



We present Multilingwis2, a web-based search engine for exploration of word-aligned parallel and multiparallel corpora. Our application extends the search facilities by Clematide et al. (2016) and is designed to be easily employable on any parallel corpus comprising universal part-of-speech tags, lemmas and word alignments. In addition to corpus exploration, it has proven useful for the assessment of word alignment quality. Loading the results of different alignment methods on the same corpus as different corpora into Multilingwis2 facilitates their comparison.

References:

Bartunov, Oleg and Teodor Sigaev (2016). “FTS is DEAD ? – Long live FTS !” https://www.slideshare.net/ArthurZakirov1/better-full-text-search-in-postgresql. Accessed March 12th, 2017.


Clematide, Simon, Johannes Graën, and Martin Volk (2016). “Multilingwis – A Multilingual Search Tool for Multi-Word Units in Multiparallel Corpora”. In: Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives – Fraseología computacional y basada en corpus: perspectivas monolingües y multilingües. Ed. by Gloria Corpas Pastor. Geneva: Tradulex, pp. 447–455.


Dyer, Chris, Victor Chahuneau, and Noah A. Smith (2013). “A Simple, Fast, and Effective Reparameterization of IBM Model 2”. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 644–649.


Göhring, Anne and Martin Volk (2011). “The Text+Berg Corpus: An Alpine French-German Parallel Resource”. In: Traitement Automatique des Langues Naturelles, p. 63.


Graën, Johannes, Dolores Batinic, and Martin Volk (2014). “Cleaning the Europarl Corpus for Linguistic Applications”. In: Proceedings of the Conference on Natural Language Processing. (Hildesheim). Stiftung Universität Hildesheim, pp. 222–227.


Graën, Johannes, Simon Clematide, and Martin Volk (2016). “Efficient Exploration of Translation Variants in Large Multiparallel Corpora Using a Relational Database”. In: 4th Workshop on Challenges in the Management of Large Corpora Workshop Programme. Ed. by Piotr Banski, Marc Kupietz, Harald Lüngen, et al., pp. 20–23.


Koehn, Philipp (2005). “Europarl: A parallel corpus for statistical machine translation”. In: Machine Translation Summit. (Phuket). Vol. 5, pp. 79–86.


Liang, Percy, Ben Taskar, and Dan Klein (2006). “Alignment by Agreement”. In: Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pp. 104–111.


Och, Franz Josef and Hermann Ney (2003). “A Systematic Comparison of Various Statistical Alignment Models”. In: Computational Linguistics 29.1, pp. 19–51.


Petrov, Slav, Dipanjan Das, and Ryan McDonald (2012). “A Universal Part-of-Speech Tagset”. In: Proceedings of the 8th International Conference on Language Resources and Evaluation. Ed. by Nicoletta Calzolari et al. Istanbul: European Language Resources Association (ELRA).


PostgreSQL Global Development Group (2017). PostgreSQL 9.6 Documentation – Chapter 12. Full Text Search. https://www.postgresql.org/docs/9.6/static/textsearch.html. Accessed March 12th, 2017.


Schmid, Helmut (1994). “Probabilistic part-of-speech tagging using decision trees”. In: Proceedings of International Conference on New Methods in Natural Language Processing. (Manchester). Vol. 12, pp. 44–49.


Tiedemann, Jörg (2011). Bitext Alignment. Vol. 4. Synthesis Lectures on Human Language Technologies 2. Morgan & Claypool.


Varga, Dániel, László Németh, Péter Halácsy, András Kornai, Viktor Trón, and Viktor Nagy (2005). “Parallel corpora for medium density languages”. In: Proceedings of the Recent Advances in Natural Language Processing. (Borovets), pp. 590–596.


Volk, Martin, Chantal Amrhein, Noëmi Aepli, Mathias Müller, and Phillip Ströbel (2016). “Building a Parallel Corpus on the World’s Oldest Banking Magazine”. In: Proceedings of the Conference on Natural Language Processing. (Bochum).


