
Title:
Annotating Errors in Student Texts: First Experiences and Experiments
Author:
Sara Stymne: Linguistics and Philology, Uppsala University, Sweden
Eva Pettersson: Linguistics and Philology, Uppsala University, Sweden
Beáta Megyesi: Linguistics and Philology, Uppsala University, Sweden
Anne Palmér: Scandinavian Languages, Uppsala University, Sweden
Year:
2017
Conference:
Proceedings of the Joint 6th Workshop on NLP for Computer Assisted Language Learning and 2nd Workshop on NLP for Research on Language Acquisition at NoDaLiDa, Gothenburg, 22nd May 2017
Issue:
134
Article no.:
006
Pages:
47-60
No. of pages:
14
Publication type:
Abstract and Fulltext
Published:
2017-05-11
ISBN:
978-91-7685-502-7
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



We describe the creation of an annotation layer for word-based writing errors in a corpus of student writing. The texts are written in Swedish by students between 9 and 19 years old. Our main purpose is to identify spelling errors, split compounds, and merged words. We also identify simple word-based grammatical errors, including morphological errors and extra words. In this paper we describe the corpus and the annotation process, including detailed descriptions of the error types and guidelines. We find that this annotation can be performed with substantial inter-annotator agreement, although some issues with the annotation remain. We also report results from two pilot experiments, on spelling correction and on the consistency of downstream NLP tools, to illustrate the usefulness of the annotated corpus.



