Article | Proceedings of the third workshop on NLP for computer-assisted language learning at SLTC 2014, Uppsala University | Automatic CEFR Level Prediction for Estonian Learner Text

Title:
Automatic CEFR Level Prediction for Estonian Learner Text
Author:
Sowmya Vajjala: LEAD Graduate School, University of Tübingen, Germany
Kaidi Lõo: Department of Linguistics, University of Alberta, Canada
Download:
Full text (pdf)
Year:
2014
Conference:
Proceedings of the third workshop on NLP for computer-assisted language learning at SLTC 2014, Uppsala University
Issue:
107
Article no.:
009
Pages:
113–127
No. of pages:
15
Publication type:
Abstract and Fulltext
Published:
2014-11-11
ISBN:
978-91-7519-175-1
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



This paper reports on approaches for automatically predicting a learner’s language proficiency in Estonian according to the European CEFR scale. We used the morphological and POS tag information extracted from the texts written by learners. We compared classification and regression modeling for this task. Our models achieve a classification accuracy of 79% and a correlation of 0.85 when modeled as regression. After a comparison between them, we concluded that classification is more effective than regression in terms of exact error and the direction of error. Apart from this, we investigated the most predictive features for both multi-class and binary classification between groups and also explored the nature of the correlations between highly predictive features. Our results show considerable improvement in classification accuracy over previously reported results and take us a step closer towards the automated assessment of Estonian learner text.

Keywords: Estonian; Proficiency Classification; CEFR; Morphological Features; Machine Learning
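
The abstract compares the two modeling setups by classification accuracy, by correlation, and by the direction of error. The following is a minimal sketch of how such a comparison could be computed; it is not the authors' implementation. The level inventory, the integer mapping, the rounding of regression scores to levels, and all function names and sample values are assumptions made for illustration only.

```python
from math import sqrt

# Hypothetical level inventory and ordinal mapping (assumptions, not from the paper).
LEVELS = ["A2", "B1", "B2", "C1"]
LEVEL_TO_INT = {level: i + 1 for i, level in enumerate(LEVELS)}


def accuracy(gold, predicted):
    """Share of texts whose predicted level matches the gold level exactly."""
    hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return hits / len(gold)


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def round_to_level(score):
    """Map a continuous regression score to the nearest level, clamped to the scale."""
    index = min(max(round(score), 1), len(LEVELS))
    return LEVELS[index - 1]


def error_direction(gold, predicted):
    """Count predictions below, equal to, and above the gold level."""
    counts = {"under": 0, "exact": 0, "over": 0}
    for g, p in zip(gold, predicted):
        diff = LEVEL_TO_INT[p] - LEVEL_TO_INT[g]
        key = "exact" if diff == 0 else ("over" if diff > 0 else "under")
        counts[key] += 1
    return counts


# Made-up toy data, purely illustrative.
gold = ["A2", "B1", "B1", "B2", "C1"]
class_pred = ["A2", "B1", "B2", "B2", "C1"]   # output of a hypothetical classifier
reg_scores = [1.2, 1.6, 2.4, 3.3, 3.4]        # output of a hypothetical regressor

gold_ints = [LEVEL_TO_INT[g] for g in gold]
reg_pred = [round_to_level(s) for s in reg_scores]

print("classification accuracy:", accuracy(gold, class_pred))
print("classification errors:  ", error_direction(gold, class_pred))
print("regression correlation: ", pearson(gold_ints, reg_scores))
print("regression errors:      ", error_direction(gold, reg_pred))
```

Rounding the regression output to the nearest level is only one way of putting both setups on the same footing; other thresholds would shift the exact-error and direction counts.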



Responsible for this page: Peter Berkesand
Last updated: 2017-02-21