Article | Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016 | From Distributions to Labels: A Lexical Proficiency Analysis using Learner Corpora

Title:
From Distributions to Labels: A Lexical Proficiency Analysis using Learner Corpora
Author:
David Alfter: University of Gothenburg, Sweden; Yuri Bizzoni: University of Gothenburg, Sweden; Anders Agebjörn: University of Gothenburg, Sweden; Elena Volodina: University of Gothenburg, Sweden; Ildikó Pilán: University of Gothenburg, Sweden
Year:
2016
Conference:
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016
Issue:
130
Article no.:
001
Pages:
1-7
No. of pages:
7
Publication type:
Abstract and Fulltext
Published:
2016-11-15
ISBN:
978-91-7685-633-8
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Publisher:
Linköping University Electronic Press, Linköpings universitet



In this work we look at how information from second language learner essay corpora can be used to evaluate unseen learner essays. Using a corpus of learner essays that have been graded by well-trained human assessors on the CEFR scale, we extract a list of word distributions over CEFR levels. To analyze unseen essays, we want to use this word list to map each word to a single, so-called target CEFR level. However, mapping from a distribution to a single label is not trivial. We also investigate how the mapping from distribution to label can be evaluated. We show that the distributional profile of words from the essays, informed by the essays' levels, consistently overlaps with our frequency-based method, in the sense that words assigned the same proficiency level by our mapping tend to cluster together in a semantic space. In the absence of a gold standard, this information is useful for seeing how often a word is associated with the same level in two different models. The semantic space also provides a similarity measure that can show which words are more central to a given level and which are more peripheral.
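To make the mapping problem concrete, the following is a minimal sketch in Python, not the authors' implementation. The abstract does not specify how a distribution is turned into a label, so the two strategies shown (most probable level, and first level whose share reaches a threshold) are illustrative assumptions, as are all function names, the threshold value of 0.2, and the toy data.

```python
from collections import Counter

# CEFR levels in ascending order of proficiency.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def word_distributions(essays):
    """Build {word: {level: share}} from (level, tokens) pairs.

    Counts are first normalised by the number of tokens seen at each level,
    so that levels with more essay text do not dominate, and then rescaled
    so that each word's shares sum to 1. (Hypothetical helper.)
    """
    counts = {}
    totals = Counter()
    for level, tokens in essays:
        for token in tokens:
            counts.setdefault(token, Counter())[level] += 1
            totals[level] += 1
    distributions = {}
    for word, per_level in counts.items():
        rel = {lvl: per_level[lvl] / totals[lvl] for lvl in CEFR_LEVELS if totals[lvl]}
        norm = sum(rel.values())
        distributions[word] = {lvl: value / norm for lvl, value in rel.items()}
    return distributions


def label_argmax(dist):
    """Assumed strategy 1: the level with the largest share (ties go to the lower level)."""
    return max(CEFR_LEVELS, key=lambda lvl: dist.get(lvl, 0.0))


def label_first_significant(dist, threshold=0.2):
    """Assumed strategy 2: the lowest level whose share reaches the threshold."""
    for lvl in CEFR_LEVELS:
        if dist.get(lvl, 0.0) >= threshold:
            return lvl
    return label_argmax(dist)  # fall back if no level reaches the threshold


# Toy data, invented for illustration only.
essays = [
    ("A1", ["jag", "heter", "jag"]),
    ("B1", ["jag", "emellertid", "anser"]),
]
dists = word_distributions(essays)
labels = {word: label_argmax(dist) for word, dist in dists.items()}
```

The two strategies can disagree on the same word: for a distribution such as `{"A1": 0.1, "A2": 0.3, "B1": 0.6}`, the most probable level is B1, while the first level reaching the 0.2 threshold is A2. This is one way to see why the choice of mapping is not trivial and why it needs an evaluation of its own.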

Keywords: Lexical complexity, Common European Framework of Reference, Mapping, Semantic space
