Article | Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16 | Tagging a Morphologically Complex Language Using an Averaged Perceptron Tagger: The Case of Icelandic

Title:
Tagging a Morphologically Complex Language Using an Averaged Perceptron Tagger: The Case of Icelandic
Author:
Hrafn Loftsson: School of Computer Science, Reykjavik University, Iceland
Robert Östling: Department of Linguistics, Stockholm University, Sweden
Download:
Full text (pdf)
Year:
2013
Conference:
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16
Issue:
085
Article no.:
013
Pages:
105-119
No. of pages:
15
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-589-6
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press; Linköpings universitet



In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Perceptron tagger, to tag Icelandic, a morphologically complex language. By adding language-specific linguistic features and using IceMorphy, an unknown word guesser, we obtain state-of-the-art tagging accuracy of 92.82%. Furthermore, by adding data from a morphological database and word embeddings induced from an unannotated corpus, the accuracy increases to 93.84%. This is equivalent to an error reduction of 5.5% compared to the previously best tagger for Icelandic, which consists of linguistic rules and a Hidden Markov Model.

Keywords: Averaged Perceptron; Part-of-Speech Tagging; Morphological Database; Linguistic Features; Word Embeddings
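
The error-reduction figure quoted in the abstract is a relative reduction in error rate, not a difference in accuracy. As a rough check, the sketch below (Python, with hypothetical helper names that are not part of Stagger or IceNLP) computes the relative error reduction between the two accuracies reported above, and back-solves the baseline accuracy that a 5.5% reduction would imply. The implied baseline is an inference from the abstract's numbers only, not a figure taken from the paper.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Relative error reduction in percent, given two accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc, reduction_pct):
    """Baseline accuracy (percent) implied by a given relative error reduction."""
    new_err = 100.0 - new_acc
    baseline_err = new_err / (1.0 - reduction_pct / 100.0)
    return 100.0 - baseline_err


# Accuracies reported in the abstract (percent).
ACC_WITH_FEATURES = 92.82    # language-specific features + IceMorphy
ACC_WITH_RESOURCES = 93.84   # + morphological database + word embeddings

# Gain from adding the morphological database and word embeddings.
gain = relative_error_reduction(ACC_WITH_FEATURES, ACC_WITH_RESOURCES)

# Accuracy of the previous best tagger implied by the quoted 5.5% reduction
# (an inference, not a number stated in the abstract).
implied = implied_baseline_accuracy(ACC_WITH_RESOURCES, 5.5)

print(f"relative error reduction, 92.82 -> 93.84: {gain:.1f}%")
print(f"baseline accuracy implied by 5.5%: {implied:.2f}%")
```

Under these assumptions, moving from 92.82% to 93.84% corresponds to a relative error reduction of roughly 14.2%, while the quoted 5.5% reduction implies that the previous best tagger scored about 93.48%.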

References:

Berger, A. L., Pietra, V. J. D., and Pietra, S. A. D. (1996). A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22:39–71.


Bjarnadóttir, K. (2012). The Database of Modern Icelandic Inflection. In Proceedings of the workshop “Language Technology for Normalization of Less-Resourced Languages”, SaLTMiL 8 – AfLaT, LREC, Istanbul, Turkey.


Brants, T. (2000). TnT: A statistical part-of-speech tagger. In Proceedings of the 6th Conference on Applied Natural Language Processing, Seattle, WA, USA.


Collins, M. (2002). Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Philadelphia, PA, USA.


Collobert, R. and Weston, J. (2008). A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. In Proceedings of the 25th International Conference on Machine Learning, ICML, Helsinki, Finland.


Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12:2493–2537.


Dredze, M. and Wallenberg, J. (2008a). Further Results and Analysis of Icelandic Part of Speech Tagging. Technical report, Department of Computer and Information Science, University of Pennsylvania.


Dredze, M. and Wallenberg, J. (2008b). Icelandic Data Driven Part of Speech Tagging. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT, Columbus, OH, USA.


Georgiev, G., Zhikov, V., Simov, K., Osenova, P., and Nakov, P. (2012). Feature-Rich Part-of-speech Tagging for Morphologically Complex Languages: Application to Bulgarian. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL, Avignon, France.


Giménez, J. and Màrquez, L. (2004). SVMTool: A general POS tagger generator based on Support Vector Machines. In Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC, Lisbon, Portugal.


Helgadóttir, S. (2005). Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic. In Holmboe, H., editor, Nordisk Sprogteknologi 2004. Museum Tusculanums Forlag, Copenhagen.


Lafferty, J., McCallum, A., and Pereira, F. (2001). Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the 18th International Conference on Machine Learning, ICML, Williamstown, MA, USA.


Loftsson, H. (2008). Tagging Icelandic text: A linguistic rule-based approach. Nordic Journal of Linguistics, 31(1):47–72.


Loftsson, H., Helgadóttir, S., and Rögnvaldsson, E. (2011). Using a morphological database to increase the accuracy in PoS tagging. In Proceedings of Recent Advances in Natural Language Processing, RANLP, Hissar, Bulgaria.


Loftsson, H., Kramarczyk, I., Helgadóttir, S., and Rögnvaldsson, E. (2009). Improving the PoS tagging accuracy of Icelandic text. In Proceedings of the 17th Nordic Conference of Computational Linguistics, NoDaLiDa, Odense, Denmark.


Loftsson, H. and Rögnvaldsson, E. (2007). IceNLP: A Natural Language Processing Toolkit for Icelandic. In Proceedings of Interspeech 2007, Special Session: “Speech and language technology for less-resourced languages”, Interspeech, Antwerp, Belgium.


Marcus, M. P., Santorini, B., and Marcinkiewicz, M. A. (1993). Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.


Mikheev, A. (1997). Automatic Rule Induction for Unknown Word Guessing. Computational Linguistics, 21(4):543–565.


Nakagawa, T. and Matsumoto, Y. (2006). Guessing parts-of-speech of unknown words using global information. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia.


Nakov, P., Bonev, Y., Angelova, G., Cius, E., and Hahn, W. v. (2003). Guessing Morphological Classes of Unknown German Nouns. In Proceedings of Recent Advances in Natural Language Processing, RANLP, Borovets, Bulgaria.


Pind, J., Magnússon, F., and Briem, S. (1991). Íslensk orðtíðnibók [The Icelandic Frequency Dictionary]. The Institute of Lexicography, University of Iceland, Reykjavik, Iceland.


Radziszewski, A. (2013). A tiered CRF tagger for Polish. In Bembenik, R., Skonieczny, L., Rybiński, H., Kryszkiewicz, M., and Niezgódka, M., editors, Intelligent Tools for Building a Scientific Information Platform: Advanced Architectures and Solutions. Springer Verlag.


Ratnaparkhi, A. (1996). A Maximum Entropy Model for Part-Of-Speech Tagging. In Proceedings of the Empirical Methods in Natural Language Processing Conference, Philadelphia, PA, USA.


Rögnvaldsson, E. and Helgadóttir, S. (2011). Morphosyntactic Tagging of Old Icelandic Texts and Its Use in Studying Syntactic Variation and Change. In Sporleder, C., van den Bosch, A., and Zervanou, K., editors, Language Technology for Cultural Heritage: Selected Papers from the LaTeCH Workshop Series. Springer, Berlin.


Shen, L., Satta, G., and Joshi, A. (2007). Guided Learning for Bidirectional Sequence Classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, ACL, Prague, Czech Republic.


Søgaard, A. (2011). Semi-supervised condensed nearest neighbor for part-of-speech tagging. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT, Portland, Oregon.


Spoustová, D. j., Hajič, J., Raab, J., and Spousta, M. (2009). Semi-supervised Training for the Averaged Perceptron POS Tagger. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL, Athens, Greece.


Toutanova, K., Klein, D., Manning, C. D., and Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL, Edmonton, Canada.


Tsuruoka, Y., Miyao, Y., and Kazama, J. (2011). Learning with Lookahead: Can History-Based Models Rival Globally Optimized Models? In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, CoNLL, Portland, Oregon, USA.


Turian, J., Ratinov, L., and Bengio, Y. (2010). Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL, Uppsala, Sweden.


Östling, R. (2012). Stagger: A modern POS tagger for Swedish. In Proceedings of the Swedish Language Technology Conference, SLTC, Lund, Sweden.
