
Title:
Applying BLAST to Text Reuse Detection in Finnish Newspapers and Journals, 1771-1910
Author:
Aleksi Vesanto: Turku NLP Group, Department of FT, University of Turku, Finland
Asko Nivala: Cultural History and Turku Institute for Advanced Studies, University of Turku, Finland
Heli Rantala: Cultural History, University of Turku, Finland
Tapio Salakoski: Turku NLP Group, Department of FT, University of Turku, Finland
Hannu Salmi: Cultural History, University of Turku, Finland
Filip Ginter: Turku NLP Group, Department of FT, University of Turku, Finland
Year:
2017
Conference:
Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language
Issue:
133
Article no.:
010
Pages:
54-58
No. of pages:
5
Publication type:
Abstract and Fulltext
Published:
2017-05-10
ISBN:
978-91-7685-503-4
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet

We present the results of text reuse detection on a corpus of scanned and OCR-recognized Finnish newspapers and journals published between 1771 and 1910. Our study draws on BLAST, software created for comparing and aligning biological sequences. We show different types of text reuse found in this corpus and present a comparison with Passim, software for text reuse detection developed at Northeastern University in Boston.

References:

Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman. 1990. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, Oct.


Ryan Cordell. 2015. Reprinting, Circulation, and the Network Author in Antebellum Newspapers. American Literary History, 27(3):417–445.


Kimmo Kettunen. 2016. Keep, change or delete? Setting up a low resource OCR post-correction framework for a digitized old Finnish newspaper collection. In D. Calvanese, D. De Nart, and C. Tasso, editors, Digital Libraries on the Move. IRCDL 2015. Communications in Computer and Information Science, volume 612. Springer, Cham.


Tuula Pääkkönen, Jukka Kervinen, Asko Nivala, Kimmo Kettunen, and Eetu Mäkelä. 2016. Exporting Finnish Digitized Historical Newspaper Contents for Offline Use. D-Lib Magazine, 22(7).


David A. Smith, Ryan Cordell, Elizabeth Maddock Dillon, Nick Stramp, and John Wilkerson. 2014. Detecting and modeling local text reuse. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’14, pages 183–192, Piscataway, NJ, USA. IEEE Press.


David A. Smith, Ryan Cordell, and Abby Mullen. 2015. Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers. American Literary History, 27(3):E1–E15.


Aleksi Vesanto, Asko Nivala, Tapio Salakoski, Hannu Salmi, and Filip Ginter. 2017. A system for identifying and exploring text repetition in large historical document corpora. In Proceedings of NoDaLiDa 2017.


