Article | Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania | Analysing Inconsistencies and Errors in PoS Tagging in two Icelandic Gold Standards | Linköping University Electronic Press Conference Proceedings

Title:
Analysing Inconsistencies and Errors in PoS Tagging in two Icelandic Gold Standards
Author:
Steinþór Steingrímsson: The Árni Magnússon Institute for Icelandic Studies, Reykjavík, Iceland
Sigrún Helgadóttir: The Árni Magnússon Institute for Icelandic Studies, Reykjavík, Iceland
Eiríkur Rögnvaldsson: University of Iceland, Reykjavík, Iceland
Year:
2015
Conference:
Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania
Issue:
109
Article no.:
038
Pages:
287-291
No. of pages:
5
Publication type:
Abstract and Fulltext
Published:
2015-05-06
ISBN:
978-91-7519-098-3
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



This paper describes work in progress. We experiment with training a state-of-the-art tagger, Stagger, on MIM-GOLD, a new gold standard for the PoS tagging of Icelandic, and compare the results with those obtained using an earlier gold standard, IFD. Tagging accuracy is lower with MIM-GOLD: 92.76%, compared to 93.67% for IFD, a difference of 0.91 percentage points. We analyze and classify the errors made by the tagger in order to explain this difference, and find that inconsistencies and incorrect tags in MIM-GOLD may account for it.
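The comparison in the abstract rests on two simple quantities: token-level tagging accuracy, and a count of which gold tags the tagger confuses with which predicted tags. The Python sketch below illustrates both on toy data. It is not the authors' evaluation code and does not use Stagger. The function names `accuracy` and `confusion_counts` and the tag strings in the example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusion_counts(gold, predicted):
    """Count each (gold tag, predicted tag) pair where the two tags differ."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


# Toy data: illustrative tag strings only, not taken from MIM-GOLD or IFD.
gold_tags = ["nken", "sfg3en", "aa", "nkeo", "nken"]
pred_tags = ["nken", "sfg3en", "aa", "nken", "nkeo"]

toy_accuracy = accuracy(gold_tags, pred_tags)
toy_errors = confusion_counts(gold_tags, pred_tags)

# The accuracy gap reported in the abstract, in percentage points.
reported_gap = round(93.67 - 92.76, 2)

print(f"toy accuracy: {toy_accuracy:.2%}")
for (gold_tag, pred_tag), count in toy_errors.most_common():
    print(f"gold={gold_tag} predicted={pred_tag}: {count}")
print(f"reported gap (IFD minus MIM-GOLD): {reported_gap} percentage points")
```

Sorting the confusion pairs by frequency is one plausible first step toward classifying errors: pairs that recur often can then be inspected by hand to decide whether the tagger or the gold standard is at fault.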



