Article | Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016 | Towards error annotation in a learner corpus of Portuguese

Title:
Towards error annotation in a learner corpus of Portuguese
Author:
Iria del Río: University of Lisbon – CLUL, Portugal; Sandra Antunes: University of Lisbon – CLUL, Portugal; Amália Mendes: University of Lisbon – CLUL, Portugal; Maarten Janssen: University of Coimbra – CELGA-ILTEC, Portugal
Year:
2016
Conference:
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016
Issue:
130
Article no.:
002
Pages:
8-17
No. of pages:
10
Publication type:
Abstract and Fulltext
Published:
2016-11-15
ISBN:
978-91-7685-633-8
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Publisher:
Linköping University Electronic Press, Linköpings universitet



In this article, we present COPLE2, a new corpus of Portuguese that encompasses written and spoken data produced by learners of Portuguese as a foreign or second language (FL/L2). Following the trend towards learner corpus research applied to less commonly taught languages, our aim is to enrich the learner data available for Portuguese L2. These data may be useful not only for educational purposes (design of learning materials, curricula, etc.) but also for the development of NLP tools to support students in their learning process. The corpus is available online through the TEITOK environment, a web-based framework for corpus treatment that provides several built-in NLP tools and a rich set of functionalities (multiple orthographic transcription layers, lemmatization and POS tagging, token normalization, error annotation) to automatically process and annotate texts in XML format. A CQP-based search interface allows the corpus to be searched by different fields, such as words, lemmas, POS tags or error tags. We briefly describe the work in progress on the compilation and linguistic annotation of this corpus, focusing in particular on error annotation.

Keywords: Learner corpus, Error annotation, Corpus processing tool, Pedagogical resource
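
As an illustration of the kind of field-based search the abstract describes, the following is a minimal sketch that assembles CQP token expressions of the form [attribute="pattern"]. The bracket syntax and the & operator are standard CQP; the attribute names (lemma, pos, error) and the tag values are hypothetical and may not match the names actually used in COPLE2. The helper functions cqp_token and cqp_sequence are likewise invented for this example and are not part of TEITOK or CQP.

```python
def cqp_token(**constraints: str) -> str:
    """Build one CQP token expression, e.g. [lemma="ser" & pos="V.*"].

    Attribute names are passed as keyword arguments; they are assumptions
    here and must match the positional attributes of the target corpus.
    """
    if not constraints:
        return "[]"  # an empty bracket pair matches any single token
    parts = []
    for attr, pattern in constraints.items():
        if '"' in pattern:
            # Keep the sketch simple: refuse patterns that would need escaping.
            raise ValueError(f"pattern for {attr!r} contains a double quote")
        parts.append(f'{attr}="{pattern}"')
    return "[" + " & ".join(parts) + "]"


def cqp_sequence(*tokens: str) -> str:
    """Join token expressions into a query over adjacent tokens."""
    return " ".join(tokens)


# Hypothetical query: any form of the verb "ser" followed by any token
# carrying a (made-up) error tag "agreement".
query = cqp_sequence(
    cqp_token(lemma="ser", pos="V.*"),
    cqp_token(error="agreement"),
)
print(query)
```

The resulting string could be pasted into a CQP-style search box once the attribute names are replaced with those documented for the corpus.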

References:

Boyd, A., J. Hana, L. Nicolas, D. Meurers, K. Wisniewski, A. Abel, K. Schöne, B. Štindlová and C. Vettori. 2014. The MERLIN corpus: Learner Language and the CEFR. In Proceedings of LREC, Reykjavik, Iceland. pp. 1281-1288.


Burnard, L. and S. Bauman. Eds. 2013. TEI P5: Guidelines for Electronic Text Encoding and Interchange. Text Encoding Initiative Consortium: Charlottesville, Virginia.


Council of Europe. 2001. Common European framework of reference for languages: Learning, teaching, assessment. Cambridge, U.K.: Press Syndicate of the University of Cambridge.


Cresti, E. and M. Moneglia. Eds. 2005. C-ORAL-ROM. Integrated Reference Corpora for Spoken Romance Languages. Amsterdam/Philadelphia: John Benjamins Publishing Company.


Christ, O., B. Schulze, A. Hofmann and E. Koenig. 1999. The IMS Corpus Workbench: Corpus Query Processor (CQP): User’s Manual. Institute for Natural Language Processing. University of Stuttgart. (CQP V2.2).


Dagneaux, E., S. Denness, S. Granger, F. Meunier, J. Neff and J. Thewissen. Eds. 2005. Error Tagging Manual. Version 1.2. Centre for English Corpus Linguistics. Université Catholique de Louvain.


Delais-Roussarie, E. and H. Yoo. 2010. The COREIL corpus: a learner corpus designed for studying phrasal phonology and intonation. In K. Dziubalska-Kolaczyk, M. Wrembel and M. Kul (Eds.). Proceedings of New Sounds 2010. Poznan, Poland, pp. 100-105.


Díaz-Negrillo, A. & Fernández-Domínguez, J. 2006. Error Tagging Systems for Learner Corpora. RESLA, 19:83-102.


Granger, S. 1996. From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg and M. Johansson (Eds.). Languages in Contrast. Text-based cross-linguistic studies. Lund Studies in English 88. Lund: Lund University Press, pp. 37-51.


Granger, S. 2003. Error-tagged Learner Corpora and CALL: A Promising Synergy. CALICO Journal 20 (3). Special issue on error analysis and error correction in computer-assisted language learning, pp. 465-480.


Granger, S. 2004. Computer learner corpus research: current status and future prospects. In U. Connor & T. Upton (Eds.), Applied Corpus Linguistics: A Multidimensional Perspective (pp. 123-145). Amsterdam & Atlanta: Rodopi.


Granger, S. 2015. Contrastive Interlanguage Analysis: a reappraisal. International Journal of Learner Corpus Research. Vol. 1:1. John Benjamins Publishing Company, pp. 7-24.


Granger, S., E. Dagneaux, F. Meunier and M. Paquot. Eds. 2009. International Corpus of Learner English. Version 2. UCL: Presses Universitaires de Louvain.


Hinrichs, L. 2006. Codeswitching on the Web. English and Jamaican Creole in e-mail communication. Amsterdam/Philadelphia: John Benjamins Publishing Company.


Janssen, M. 2012. NeoTag: a POS Tagger for Grammatical Neologism Detection. In Proceedings of LREC 2012, Istanbul, Turkey.


Janssen, M. 2016. TEITOK: Text-Faithful Annotated Corpora. In Proceedings of LREC 2016, Portorož, Slovenia.


Leiria, I. 2001. Léxico – aquisição e ensino do Português Europeu língua não materna. PhD Dissertation. Faculdade de Letras da Universidade de Lisboa.


Lozano, C. 2009. CEDEL2: Corpus Escrito del Español L2. In C. M. Bretones Callejas et al. (Eds.). Applied Linguistics Now: Understanding Language and Mind / La Lingüística Aplicada Hoy: Comprendiendo el Lenguaje y la Mente. Almería: Universidad de Almería, pp. 197-212.


MacWhinney, B. 2000. The CHILDES Project: Tools for Analyzing Talk. 3rd Edition. Mahwah, NJ: Lawrence Erlbaum Associates.


Mendes, A., M. Généreux and I. Hendricks. 2014. Manual for the CRPC on the CQPweb interface. Manual 1.3. http://alfclul.clul.ul.pt/CQPweb/doc/CRPCmanual.v1_2_en.pdf.


Mendes, A., S. Antunes, M. Janssen and A. Gonçalves. 2016. The COPLE2 Corpus: a Learner Corpus for Portuguese. In Proceedings of LREC 2016, Portorož, Slovenia.


Meurers, D. 2015. Learner Corpora and Natural Language Processing. In S. Granger, G. Gilquin and F. Meunier (Eds.). The Cambridge Handbook of Learner Corpus Research. Cambridge University Press, pp. 537-566.


Nicholls, D. 2003. The Cambridge Learner Corpus – error coding and analysis for lexicography and ELT. In D. Archer, P. Rayson, A. Wilson and T. McEnery (Eds.). Proceedings of the Corpus Linguistics 2003 Conference. Lancaster University, pp. 572-581.


Rosen, A., J. Hana, B. Štindlová and A. Feldman. 2013. Evaluating and automating the annotation of a learner corpus. Language Resources and Evaluation, pp. 1-28.


Schmidt, T. 2012. EXMARaLDA and the FOLK tools – two toolsets for transcribing and annotating spoken language. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC). Istanbul, Turkey, pp. 236-40.


Tono, Y. 2003. Learner corpora: Design, development and applications. In D. Archer, P. Rayson, A. Wilson and T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003 Conference. Lancaster University, pp. 800-809.



Responsible for this page: Peter Berkesand
Last updated: 2017-02-21