
Title:
Proceedings of the Workshop on NLP and Pseudonymisation, September 30, 2019, Turku, Finland
Editor(s):
Lars Ahrenberg: Department of Computer and Information Science, Human-Centered systems, Linköping University, Linköping, Sweden
Beáta Megyesi: Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
Year:
2019
No. of pages:
23
Language:
English
ISBN:
978-91-7929-996-5
Series:
Linköping Electronic Conference Proceedings
Issue:
166
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Issue:
41
Published:
2019-09-30
Publisher:
Linköping University Electronic Press, Linköpings universitet



The goal of making research data freely available often comes into conflict with the rights of individuals. These rights are mainly of two kinds: intellectual property rights and rights to personal data protection. In Europe, the rights to personal data protection have been codified in the recently adopted General Data Protection Regulation (GDPR). While research, being in the public interest, may process personal data, the GDPR requires appropriate safeguards to be in place. Consent from authors or subjects cannot always be obtained, or may not be general enough; in such cases pseudonymisation may be applied, with the intended effect that real individuals can no longer be identified from the language data.

Personal data protection was a concern for creators of language corpora long before the GDPR, and there is a body of literature discussing the legal and ethical aspects of corpus publishing. When the data are to be changed or masked in some way, the terms used have been anonymisation or de-identification. With textual data, however, the originals are usually kept, which means that anyone with access to the originals and their metadata can connect the transformed text to them, and thus to the individuals who are authors or participants. For this reason we have used the GDPR term and called this workshop ‘NLP and Pseudonymisation’.

NLP is affected by this conflict in two ways. First, it uses language data of all kinds to develop systems, and these data may contain sensitive personal data. Second, it may contribute to making the pseudonymisation process more efficient, or even safer. We invited submissions on both of these aspects to the workshop.

NLP has been applied to the de-identification of medical texts for quite a long time, and two of the three papers included in these proceedings deal with medical data. Moreover, in medicine, taxonomies of sensitive data categories are well established and annotated data already exist. Many other fields, however, not least in the Humanities and Social Sciences, are increasingly aiming to share human-generated data and will need to develop tools and processes for this purpose. We hope that future workshops on the theme of NLP and Pseudonymisation will have a wider spread of contributions.

We would like to express our gratitude to the members of the program committee for their valuable advice and their reviews of the papers: Hercules Dalianis, Koenraad de Smedt, Cyril Grouin, Dimitrios Kokkinakis, Krister Lindén, Aurélie Névéol, Sumithra Velupillai, Sussi Olsen, Elena Volodina, and Mats Wirén. We gratefully acknowledge financial support for the workshop from Swe-Clarin, the Swedish node of the European CLARIN infrastructure, with long-term support from the Swedish Research Council.

Linköping and Uppsala, August 26, 2019

Lars Ahrenberg and Beáta Megyesi
Program co-chairs



Proceedings of the Workshop on NLP and Pseudonymisation, September 30, 2019, Turku, Finland

166:001
Allison Adams, Eric Aili, Daniel Aioanei, Rebecca Jonsson, Lina Mickelsson, Dagmar Mikmekova, Fred Roberts, Javier Fernandez Valencia, Roger Wechsler
AnonyMate: A Toolkit for Anonymizing Unstructured Chat Data

166:002
Hanna Berg, Hercules Dalianis
Augmenting a De-identification System for Swedish Clinical Text Using Open Resources and Deep Learning

166:003
Hercules Dalianis
Pseudonymisation of Swedish Electronic Patient Records Using a Rule-Based Approach



Responsible for this page: Peter Berkesand
Last updated: 2019-11-06