Article | Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden | OCR and post-correction of historical Finnish texts

Title:
OCR and post-correction of historical Finnish texts
Author:
Senka Drobac: Department of Modern Languages, University of Helsinki, Finland
Pekka Kauppinen: Department of Modern Languages, University of Helsinki, Finland
Krister Lindén: Department of Modern Languages, University of Helsinki, Finland
Year:
2017
Conference:
Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Issue:
131
Article no.:
009
Pages:
70-76
No. of pages:
7
Publication type:
Abstract and Fulltext
Published:
2017-05-08
ISBN:
978-91-7685-601-7
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



This paper presents experiments on optical character recognition (OCR) that combine the Ocropy software with data-driven spelling correction based on weighted finite-state methods. Both model training and testing were carried out on Finnish corpora of historical newspaper text, and the best combination of OCR and post-processing models gives 95.21% character recognition accuracy.
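
The abstract does not say how character recognition accuracy is computed. A common convention, assumed here rather than taken from the paper, is to count the character-level edit operations (Levenshtein distance) between the OCR output and the ground truth and divide by the ground-truth length. The function names and the sample strings below are hypothetical, and the sketch is for illustration only:

```python
def levenshtein(source: str, target: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                       # delete s_char
                current[j - 1] + 1,                    # insert t_char
                previous[j - 1] + (s_char != t_char),  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(ocr_output: str, ground_truth: str) -> float:
    """Accuracy in percent, assuming accuracy = 1 - errors / reference length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    errors = levenshtein(ocr_output, ground_truth)
    return 100.0 * (1.0 - errors / len(ground_truth))


if __name__ == "__main__":
    # Hypothetical example: one misrecognized character in a six-letter word.
    print(round(character_accuracy("Suomcn", "Suomen"), 2))
```

Under this convention a score can fall below zero when the OCR output contains many spurious insertions, so some tools clip the value or normalize by the longer of the two strings instead.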



