Article | Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden | Optimizing a PoS Tagset for Norwegian Dependency Parsing

Title:
Optimizing a PoS Tagset for Norwegian Dependency Parsing
Author:
Petter Hohle: Department of Informatics, University of Oslo, Norway
Lilja Øvrelid: Department of Informatics, University of Oslo, Norway
Erik Velldal: Department of Informatics, University of Oslo, Norway
Download:
Full text (pdf)
Year:
2017
Conference:
Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Issue:
131
Article no.:
017
Pages:
142-151
No. of pages:
10
Publication type:
Abstract and Fulltext
Published:
2017-05-08
ISBN:
978-91-7685-601-7
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



This paper reports on a suite of experiments that evaluates how the linguistic granularity of part-of-speech tagsets affects the performance of tagging and syntactic dependency parsing. Our results show that parsing accuracy can be significantly improved by introducing more fine-grained morphological information in the tagset, even if tagger accuracy is compromised. Our taggers and parsers are trained and tested using the annotations of the Norwegian Dependency Treebank.



