Article | Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language | The Making of the Royal Society Corpus

Title:
The Making of the Royal Society Corpus
Author:
Jörg Knappen: Sprachwissenschaft und Sprachtechnologie, Universität des Saarlandes, Germany
Stefan Fischer: Sprachwissenschaft und Sprachtechnologie, Universität des Saarlandes, Germany
Hannah Kermes: Sprachwissenschaft und Sprachtechnologie, Universität des Saarlandes, Germany
Elke Teich: Sprachwissenschaft und Sprachtechnologie, Universität des Saarlandes, Germany
Peter Fankhauser: Institut für Deutsche Sprache (IDS), Germany
Year:
2017
Conference:
Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language
Issue:
133
Article no.:
003
Pages:
7-11
No. of pages:
5
Publication type:
Abstract and Fulltext
Published:
2017-05-10
ISBN:
978-91-7685-503-4
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



The Royal Society Corpus is a corpus of Early and Late Modern English, built in an agile process, that covers publications of the Royal Society of London from 1665 to 1869 (Kermes et al., 2016) and comprises approximately 30 million words. In this paper, we provide details on two aspects of the building process: the mining of patterns for OCR correction, and the improvement and evaluation of part-of-speech tagging.
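This page does not describe how either step works. As a rough illustration only, the sketch below assumes that mined OCR corrections take the form of plain regular-expression rewrite rules and that tagging quality is measured as token-level accuracy against a hand-annotated sample. The patterns and the function names `apply_patterns` and `tagging_accuracy` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical correction patterns (regex -> replacement). They illustrate
# typical OCR confusions in historical print ("h" misread as "li", long s
# misread as "f"); they are NOT the patterns used for the Royal Society Corpus.
PATTERNS = [
    (r"\btlie\b", "the"),
    (r"\bwliich\b", "which"),
    (r"\bfuch\b", "such"),
]


def apply_patterns(text, patterns=PATTERNS):
    """Apply each (regex, replacement) pair in order; return text and rewrite count."""
    total = 0
    for pattern, replacement in patterns:
        # re.subn returns the new string and the number of substitutions made
        text, n = re.subn(pattern, replacement, text)
        total += n
    return text, total


def tagging_accuracy(predicted, gold):
    """Token-level accuracy of predicted POS tags against a gold standard."""
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


fixed, n = apply_patterns("tlie motion of fuch bodies, wliich")
print(fixed, n)  # the motion of such bodies, which 3
print(tagging_accuracy(["DT", "NN", "IN"], ["DT", "NN", "VBZ"]))  # 2 of 3 tags correct
```

The word boundaries (`\b`) keep a rule from firing inside longer words, so `tliere` is left alone by the `tlie` rule. A real pattern set would have to be mined from the corpus and checked for false corrections before it is applied.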

References:

Bea Alex, Claire Grover, Ewan Klein, and Richard Tobin. 2012. Digitised historical text: Does it have to be mediOCRe? In Proceedings of KONVENS 2012 (LThist 2012 workshop), pages 401–409, Vienna, Austria.


Alistair Baron and Paul Rayson. 2008. VARD 2: A tool for dealing with spelling variation in historical corpora. In Proceedings of the Postgraduate Conference in Corpus Linguistics, Birmingham, UK.


Alistair Cockburn. 2001. Agile Software Development. Addison-Wesley Professional, Boston, USA.


Hannah Kermes, Stefania Degaetano-Ortlieb, Ashraf Khamis, Jörg Knappen, and Elke Teich. 2016. The Royal Society Corpus: From uncharted data to corpus. In Proceedings of LREC 2016, Portorož, Slovenia, May 23–28.


Wang Ling, Chris Dyer, Alan Black, and Isabel Trancoso. 2015. Two/too simple adaptations of word2vec for syntax problems. In Proceedings of NAACL.


Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.


Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of International Conference on New Methods in Language Processing.


Helmut Schmid. 1995. Improvements in part-of-speech tagging with an application to German. In Proceedings of the ACL SIGDAT-Workshop.


Ted Underwood and Loretta Auvil. 2012. Basic OCR correction. http://usesofscale.com/gritty-details/basic-ocr-correction/.


Holger Voormann and Ulrike Gut. 2008. Agile corpus building. Corpus Linguistics and Linguistic Theory, 4(2):235–251.


