
Title:
Mainstreaming August Strindberg with Text Normalization
Author:
Adam Ek: Department of Linguistics, Stockholm University, Sweden
Sofia Knuutinen: Department of Linguistics, Stockholm University, Sweden
Download:
Full text (pdf)
Year:
2017
Conference:
Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Issue:
131
Article no.:
036
Pages:
266-270
No. of pages:
5
Publication type:
Abstract and Fulltext
Published:
2017-05-08
ISBN:
978-91-7685-601-7
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



This article explores the application of text normalization methods based on Levenshtein distance and Statistical Machine Translation to literary texts, specifically the collected works of August Strindberg. The goal is to normalize archaic spellings to modern-day spelling. The study finds evidence that text normalization succeeds on this material, and discusses problems with, and possible improvements to, the analysis of Swedish texts from the mid-19th to the early 20th century. This article is part of an ongoing project at Stockholm University which aims to create a corpus and web-friendly texts from Strindberg's collected works.
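As a rough illustration of the Levenshtein-based side of such an approach, the sketch below maps a historical token to its closest entry in a modern word list. It is not the authors' implementation: the function names, the tiny lexicon, the distance threshold and the example spellings (hvad, qvinna, af) are assumptions chosen for illustration, and the distance is the plain unweighted variant rather than any weighted or context-sensitive one.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain (unweighted) edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalize(token: str, lexicon: set[str], max_distance: int = 1) -> str:
    """Return the closest modern form, or the token itself if none is close.

    `lexicon` is a hypothetical set of modern spellings; ties are broken
    alphabetically so the result is deterministic.
    """
    if token in lexicon or not lexicon:
        return token
    best = min(lexicon, key=lambda word: (levenshtein(token, word), word))
    return best if levenshtein(token, best) <= max_distance else token


# Hypothetical modern lexicon and archaic spellings, for illustration only.
modern_lexicon = {"vad", "kvinna", "av", "och"}
for old in ["hvad", "qvinna", "af", "Strindberg"]:
    print(old, "->", normalize(old, modern_lexicon))
```

A token with no lexicon entry within the threshold, such as a proper name, is left unchanged, which keeps the sketch from over-normalizing words that are simply missing from the word list.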

References:

A. Baron and P. Rayson. 2008. Vard2: A tool for dealing with spelling variation in historical corpora, In Postgraduate Conference in Corpus Linguistics, Aston University, Birmingham, UK.


E. Pettersson, B. Megyesi and J. Nivre. 2012. Rule-Based Normalisation of Historical Text: A Diachronic Study, Proceedings of KONVENS 2012 (LThist 2012 workshop), Vienna, September 21, 2012.


E. Pettersson, B. Megyesi and J. Tiedemann. 2013. An SMT Approach to Automatic Annotation of Historical Text, Proceedings of the workshop on computational historical linguistics at NODALIDA 2013. NEALT Proceedings Series 18 / Linköping Electronic Conference Proceedings 87: 54-69.


E. Pettersson, B. Megyesi and J. Nivre. 2013. Normalisation of Historical Text Using Context-Sensitive Weighted Levenshtein Distance and Compound Splitting, Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013); Linköping Electronic Conference Proceedings 85: 163-179.


E. Pettersson. 2016. Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction, Studia Linguistica Upsaliensia 17. 147 pp. Uppsala: Acta Universitatis Upsaliensis, Uppsala, Sweden.


E. Pettersson. 2016. User Manual for Normalisation of Noisy Input Data using HistNorm, Department of Linguistics and Philology. Uppsala: Uppsala University, Uppsala, Sweden.


L. Borin, M. Forsberg and L. Lönngren. 2008. Saldo 1.0 (svenskt associationslexikon version 2), Språkbanken, Gothenburg: University of Gothenburg. Gothenburg, Sweden.


P. Rayson, D. Archer, and N. Smith. 2005. VARD versus Word: A comparison of the UCREL variant detector and modern spell checkers on English historical corpora, In Proceedings from the Corpus Linguistics Conference Series online e-journal, volume 1, Birmingham, UK.


P. Nakov and J. Tiedemann. 2012. Combining word-level and character-level models for machine translation between closely-related languages, In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 301-305. Jeju Island, Korea. Association for Computational Linguistics.


