Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Jörg Tiedemann: University of Helsinki, Finland
On behalf of the program committee, I am pleased to welcome you to the 21st Nordic Conference on Computational Linguistics (NoDaLiDa 2017), held at the Wallenberg Conference Center in the beautiful city of Gothenburg in Sweden, on May 22–24, 2017. The proceedings are published as part of the NEALT Proceedings Series by Linköping University Electronic Press and they will also be available from the ACL Anthology together with the proceedings of the co-located workshops. The NoDaLiDa conference has been organized every two years since 1977 and, for this anniversary event, returns after 40 years to Gothenburg, where it started as a friendly gathering to discuss ongoing research in the field of computational linguistics in the Nordic countries. The Northern European Association for Language Technology (NEALT), founded later in 2006, is now responsible for organizing NoDaLiDa among other events in the Nordic countries, the Baltic states and Northwest Russia. Since the early days, NoDaLiDa has grown into a recognized international conference and the tradition continues with the program of this year’s conference. It is a great honor for me to serve as the general chair of NoDaLiDa 2017 and I am grateful for all the support during the process.

Before diving deeper into the acknowledgements, let me first briefly introduce the setup of the conference. As in the last edition, we included different paper categories: regular long papers, student papers, short papers and system demonstration papers. Regular papers are presented orally during the conference and short papers received a slot in one of the two poster sessions. System demonstrations are given at the same time as the posters. We selected two student papers for an oral presentation and three student papers for poster presentations. In total, we received 78 submissions and accepted 49. The submissions included 32 regular papers (21 accepted, 65.6% acceptance rate), 8 student papers (5 accepted, 62.5% acceptance rate), 27 short papers (12 accepted, 44.4% acceptance rate) and 11 system demonstration papers (all of which were accepted).

In addition to the submitted papers, NoDaLiDa 2017 also features invited keynote speakers – three distinguished international researchers: Kyunghyun Cho from New York University, Sharon Goldwater from the University of Edinburgh and Rada Mihalcea from the University of Michigan. We are excited about their contributions and grateful for their participation in the conference.

Furthermore, four workshops are connected to NoDaLiDa 2017: the First Workshop on Universal Dependencies (UDW 2017), the Joint 6th Workshop on NLP for CALL and 2nd Workshop on NLP for Research on Language Acquisition (NLP4CALL & LA), the Workshop on Processing Historical Language and the Workshop on Constraint Grammar - Methods, Tools, and Applications. We would like to thank the workshop organizers for their efforts in making these events happen and for enriching the whole conference and its scientific coverage.

Finally, I would also like to thank the entire team behind the conference. Organizing such an event is a complex process and would not be possible without the help of many people. I would like to thank all members of the program committee, especially Beáta Megyesi for a smooth transition from the previous NoDaLiDa and all her valuable input coming from the organization of that event, Inguna Skadiņa for taking care of the submission system EasyChair, and Lilja Øvrelid for the publicity and the calls that we needed to send out. My greatest relief from organisational pain came from the professional local committee in Gothenburg. It is a pleasure to work together with their team, and without the hard work of the local organizers we could not run the event at all. Thank you very much, and special thanks to Nina Tahmasebi for leading the local team. All names are properly listed below and I am grateful to all of you for your efforts. I would also like to acknowledge the large number of reviewers and sub-reviewers for their assessment of the submissions, NEALT for backing the conference, Linköping University Electronic Press for publishing the proceedings, as well as the people behind the ACL Anthology for offering the space for storing our publications. And, last but not least, I would also like to thank our sponsors for all the financial support, which really helped us to organize a pleasant and affordable meeting.

With all of these acknowledgments, and with my apologies for forgetting to mention many names that should be listed here, I would like to wish you all, once again, a fruitful conference and a nice stay in Gothenburg. And I wish you a lot of pleasure reading the contributions in this volume, especially if you, for whatever reason, happen to read this welcome address after the conference has already ended.

Jörg Tiedemann (general chair of NoDaLiDa 2017)

Erik Velldal, Lilja Øvrelid, Petter Hohle
Joint UD Parsing of Norwegian Bokmål and Nynorsk

Prasanth Kolachina, Martin Riedl, Chris Biemann
Replacing OOV Words For Dependency Parsing With Distributional Semantics

Ali Basirat, Joakim Nivre
Real-valued Syntactic Word Vectors (RSV) for Greedy Neural Dependency Parsing

Kimmo Kettunen, Laura Löfberg
Tagging Named Entities in 19th Century and Modern Finnish Newspaper Material with a Finnish Semantic Tagger

Marie Dubremetz, Joakim Nivre
Machine Learning for Rhetorical Figure Detection: More Chiasmus with Less Annotation

Alexander Wallin, Pierre Nugues
Coreference Resolution for Swedish and German using Distant Supervision

Kimmo Koskenniemi
Aligning phonemes using finite-state methods

Katri Leino, Mikko Kurimo
Acoustic Model Compression with MAP adaptation

Senka Drobac, Pekka Kauppinen, Krister Lindén
OCR and post-correction of historical Finnish texts

Asbjørn Ottesen Steinskog, Jonas Foyn Therkelsen, Björn Gambäck
Twitter Topic Modeling by Tweet Aggregation

Anton Södergren, Pierre Nugues
A Multilingual Entity Linker Using PageRank and Semantic Graphs

Avo Muromägi, Kairit Sirts, Sven Laur
Linear Ensembles of Word Embedding Models

Flavio Massimiliano Cecchini, Martin Riedl, Chris Biemann
Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction

Ryan Johnson, Tommi A Pirinen, Tiina Puolakainen, Francis Tyers, Trond Trosterud, Kevin Unhammer
North-Sámi to Finnish rule-based machine translation system

Lene Antonsen, Ciprian Gerstenberger, Maja Kappfjell, Sandra Nystø Rahka, Marja-Liisa Olthuis, Trond Trosterud, Francis M. Tyers
Machine translation with North Saami as a pivot language

Jesper Näsman, Beáta Megyesi, Anne Palmér
SWEGRAM – A Web-Based Tool for Automatic Annotation and Analysis of Swedish Texts

Petter Hohle, Lilja Øvrelid, Erik Velldal
Optimizing a PoS Tagset for Norwegian Dependency Parsing

Veronika Laippala, Juhani Luotolahti, Aki-Juhani Kyröläinen, Tapio Salakoski, Filip Ginter
Creating register sub-corpora for the Finnish Internet Parsebank

Simon Dobnik, Erik Wouter de Graaf
KILLE: a Framework for Situated Agents for Learning Language Through Interaction

Dimitrios Kokkinakis, Kristina Lundholm Fors, Eva Björkner, Arto Nordlund
Data Collection from Persons with Mild Forms of Cognitive Impairment and Healthy Controls - Infrastructure for Classification and Prediction of Dementia

Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen
Evaluation of language identification methods using 285 languages

Heiki-Jaan Kaalep, Siim Orasmaa
Can We Create a Tool for General Domain Event Analysis?

Eckhard Bick
From Treebank to Propbank: A Semantic-Role and VerbNet Corpus for Danish

Johannes Bjerva, Robert Östling
Cross-lingual Learning of Semantic Textual Similarity with Multilingual Word Representations

Johannes Bjerva
Will my auxiliary tagging task help? Estimating Auxiliary Tasks Effectivity in Multi-Task Learning

Carl Börstell, Robert Östling
Iconic Locations in Swedish Sign Language: Mapping Form to Meaning with Lexical Databases

Marcus Klang, Pierre Nugues
Docforia: A Multilayer Document Model

Viljami Venekoski, Jouko Vankka
Finnish resources for evaluating language model semantics

Steinþór Steingrímsson, Jón Guðnason, Sigrún Helgadóttir, Eiríkur Rögnvaldsson
Málrómur: A Manually Verified Corpus of Recorded Icelandic Speech

Sara Stymne
The Effect of Translationese on Tuning for Statistical Machine Translation

Johannes Graën, Dominique Sandoz, Martin Volk
Multilingwis2 – Explore Your Parallel Corpus

Anders Nøklestad, Kristin Hagen, Janne Bondi Johannessen, Michal Kosek, Joel Priestley
A modernised version of the Glossa corpus search system

Juhani Luotolahti, Jenna Kanerva, Filip Ginter
Dep_search: Efficient Search Tool for Large Dependency Parsebanks

Jouna Pyysalo
Proto-Indo-European Lexicon: The Generative Etymological Dictionary of Indo-European Languages

Roberts Rozis, Raivis Skadiņš
Tilde MODEL - Multilingual Open Data for EU Languages

Adam Ek, Sofia Knuutinen
Mainstreaming August Strindberg with Text Normalization

Murhaf Fares, Andrey Kutuzov, Stephan Oepen, Erik Velldal
Word vectors, reuse, and replicability: Towards a community repository of large-text resources

Mika Koistinen, Kimmo Kettunen, Tuula Pääkkönen
Improving Optical Character Recognition of Finnish Historical Newspapers with a Combination of Fraktur & Antiqua Models and Image Preprocessing

Pierre Lison, Andrei Kutuzov
Redefining Context Windows for Word Embedding Models: An Experimental Study

Adam Persson
The Effect of Excluding Out of Domain Training Data from Supervised Named-Entity Recognition

Andrew Salway, Paul Meurer, Knut Hofland, Øystein Reigem
Quote Extraction and Attribution from Norwegian Newspapers

Heidi Sand, Erik Velldal, Lilja Øvrelid
Wordnet extension via word embeddings: Experiments on the Norwegian Wordnet

Robert Östling, Carl Börstell, Moa Gärdenfors, Mats Wirén
Universal Dependencies for Swedish Sign Language

Johan Falkenjack, Evelina Rennes, Daniel Fahlborg, Vida Johansson, Arne Jönsson
Services for text simplification and analysis

Johannes Graën, Christof Bless
Exploring Properties of Intralingual and Interlingual Association Measures Visually

Peter Juel Henrichsen
TALERUM - Learning Danish by Doing Danish

Aarne Ranta, Prasanth Kolachina, Thomas Hallgren
Cross-Lingual Syntax: Relating Grammatical Framework with Universal Dependencies

Victoria Rosén, Helge Dyvik, Paul Meurer, Koenraad De Smedt
Exploring Treebanks with INESS Search

Aleksi Vesanto, Asko Nivala, Tapio Salakoski, Hannu Salmi, Filip Ginter
A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora

