Article | Proceedings of the 7th Workshop on NLP for Computer Assisted Language Learning (NLP4CALL 2018) at SLTC, Stockholm, 7th November 2018 | Learner Corpus Anonymization in the Age of GDPR: Insights from the Creation of a Learner Corpus of Swedish | Linköping University Electronic Press Conference Proceedings

Title:
Learner Corpus Anonymization in the Age of GDPR: Insights from the Creation of a Learner Corpus of Swedish
Author:
Beáta Megyesi: Uppsala University, Sweden; Lena Granstedt: Umeå University, Sweden; Sofia Johansson: Stockholm University, Sweden; Julia Prentice: University of Gothenburg, Sweden; Dan Rosén: University of Gothenburg, Sweden; Carl-Johan Schenström: University of Gothenburg, Sweden; Gunlög Sundberg: Stockholm University, Sweden; Mats Wirén: Stockholm University, Sweden; Elena Volodina: University of Gothenburg, Sweden
Year:
2018
Conference:
Proceedings of the 7th Workshop on NLP for Computer Assisted Language Learning (NLP4CALL 2018) at SLTC, Stockholm, 7th November 2018
Issue:
152
Article no.:
006
Pages:
47-56
No. of pages:
10
Publication type:
Abstract and Fulltext
Published:
2018-11-02
ISBN:
978-91-7685-173-9
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



This paper reports on the status of learner corpus anonymization in the ongoing research infrastructure project SweLL. The main aim of the project is to deliver, and make available for research, a well-annotated corpus of essays written by second language (L2) learners of Swedish. As practice shows, annotation of learner texts is a sensitive process that demands many compromises between ethical and legal demands on the one hand, and research and technical demands on the other. Below is a concise description of the current status of pseudonymization of language learner data, which is intended to ensure the anonymity of the learners, with numerous examples of the above-mentioned compromises.



Keywords: learner corpus, anonymization, pseudonymization, legal issues, GDPR



