Article | Proceedings of the SLTC 2012 workshop on NLP for CALL, Lund, 25th October, 2012 | Semi-automatic selection of best corpus examples for Swedish: Initial algorithm evaluation

Title:
Semi-automatic selection of best corpus examples for Swedish: Initial algorithm evaluation
Author:
Elena Volodina: Department of Swedish & Språkbanken, University of Gothenburg, Sweden
Richard Johansson: Department of Swedish & Språkbanken, University of Gothenburg, Sweden
Sofie Johansson Kokkinakis: Department of Swedish & Språkbanken, University of Gothenburg, Sweden
Year:
2012
Conference:
Proceedings of the SLTC 2012 workshop on NLP for CALL, Lund, 25th October, 2012
Issue:
080
Article no.:
007
Pages:
59-70
No. of pages:
12
Publication type:
Abstract and Fulltext
Published:
2012-11-12
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Publisher:
Linköping University Electronic Press; Linköpings universitet


This study presents the results of an initial evaluation of two sorting approaches to the automatic ranking of corpus examples for Swedish. Representatives of two potential target user groups, second/foreign language (L2) teachers and lexicographers, were asked to rate the top three hits per approach for sixty search items from the point of view of their professional needs. The evaluation has shown, on the one hand, which of the two approaches to example rating (referred to in the paper as algorithms #1 and #2) finds better examples for each target user group and, on the other hand, which features evaluators associate with good examples. It has also facilitated statistical analysis of “good” versus “bad” examples with reference to measurable features, such as sentence length, word length, lexical frequency profiles, PoS constitution and dependency structure, with the potential to identify new reliable classifiers.
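
As a rough illustration of the simplest measurable features named in the abstract (sentence length and word length), the sketch below computes them for a single sentence. It is not the authors' implementation: the function name `sentence_features`, the regex-based tokenisation, and the threshold of more than six characters for a "long" word are all assumptions made for this example.

```python
import re


def sentence_features(sentence, long_word_threshold=6):
    """Compute simple surface features for one sentence.

    Hypothetical helper for illustration only. The tokenisation (runs of
    word characters) and the long-word threshold are assumptions, not
    details taken from the study.
    """
    # \w is Unicode-aware for str patterns, so Swedish letters such as
    # å, ä and ö are kept inside tokens.
    tokens = re.findall(r"\w+", sentence)
    n = len(tokens)
    if n == 0:
        return {
            "sentence_length": 0,
            "mean_word_length": 0.0,
            "long_word_ratio": 0.0,
        }
    lengths = [len(token) for token in tokens]
    long_words = sum(1 for length in lengths if length > long_word_threshold)
    return {
        "sentence_length": n,                    # number of tokens
        "mean_word_length": sum(lengths) / n,    # characters per token
        "long_word_ratio": long_words / n,       # share of long tokens
    }


if __name__ == "__main__":
    print(sentence_features("Universitetet ligger i Göteborg"))
```

Features of this kind could then be compared between examples rated "good" and "bad" by the evaluators. The richer features mentioned in the abstract (lexical frequency profiles, PoS constitution, dependency structure) would need an annotated corpus and are not covered by this sketch.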
