Conference article

Language Resources for Icelandic

Sigrún Helgadóttir
Stofnun Árna Magnússonar í íslenskum fræðum, Reykjavík, Iceland

Eiríkur Rögnvaldsson
University of Iceland, Reykjavík, Iceland


Published in: Proceedings of the workshop on Nordic language research infrastructure at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 20

Linköping Electronic Conference Proceedings 89:6, p. 60-76

NEALT Proceedings Series 20:6, p. 60-76


Published: 2013-05-17

ISBN: 978-91-7519-585-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We describe the current status of Icelandic language technology with respect to available language resources and tools. The recent META-NET survey of the state of language technology support for 30 languages clearly demonstrated that Icelandic lags behind almost all European languages in this respect. However, it is encouraging that, as a result of the META-NORD project, almost all basic language resources for Icelandic are now available through the META-SHARE repository and the local site http://www.málföng.is, many of them in standard formats and under standard CC or GNU licenses. This is a major achievement, since many of these resources were previously either unavailable or obtainable only through personal contacts. In this paper, we briefly describe most of the major resources that have been made accessible through META-SHARE: their type, content, size, format, and license scheme. We emphasize that, even though these resources are extremely valuable as a basis for further R&D work, Icelandic language technology is far from self-sustaining, and the Icelandic language technology community will need support from partners in the Nordic countries and Europe if Icelandic is to survive in the Digital Age.

Keywords

Icelandic; Language Resources; Repositories; Licenses
