
Title:
Improving POS Tagging in Old Spanish Using TEITOK
Author:
Maarten Janssen: CELGA-ILTEC, Portugal
Josep Ausensi: Universitat Pompeu Fabra, Department of Translation and Language Sciences, Spain
Josep M. Fontana: Universitat Pompeu Fabra, Department of Translation and Language Sciences, Spain
Year:
2017
Conference:
Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language
Issue:
133
Article no.:
002
Pages:
2-6
No. of pages:
5
Publication type:
Abstract and Fulltext
Published:
2017-05-10
ISBN:
978-91-7685-503-4
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



In this paper, we describe how the TEITOK corpus tools helped to create a diachronic corpus of Old Spanish that contains both paleographic and linguistic information, that is easy for non-specialists to use, and in which automatically assigned POS tags and lemmas are easy to correct manually.



