Article | Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden | Joint UD Parsing of Norwegian Bokmål and Nynorsk

Title:
Joint UD Parsing of Norwegian Bokmål and Nynorsk
Author:
Erik Velldal: Department of Informatics, University of Oslo, Norway
Lilja Øvrelid: Department of Informatics, University of Oslo, Norway
Petter Hohle: Department of Informatics, University of Oslo, Norway
Year:
2017
Conference:
Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Issue:
131
Article no.:
001
Pages:
1-10
No. of pages:
10
Publication type:
Abstract and Fulltext
Published:
2017-05-08
ISBN:
978-91-7685-601-7
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



This paper investigates interactions in parser performance for the two official standards for written Norwegian: Bokmål and Nynorsk. We demonstrate that while applying models across standards yields poor performance, combining the training data for both standards yields better results than previously achieved for each of them in isolation. This has immediate practical value for processing Norwegian, as it means that a single parsing pipeline is sufficient to cover both varieties, with no loss in accuracy. Based on the Norwegian Universal Dependencies treebank we present results for multiple taggers and parsers, experimenting with different ways of varying the training data given to the learners, including the use of machine translation.




Responsible for this page: Peter Berkesand
Last updated: 2017-02-21