Article | Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden | SWEGRAM – A Web-Based Tool for Automatic Annotation and Analysis of Swedish Texts

Title: SWEGRAM – A Web-Based Tool for Automatic Annotation and Analysis of Swedish Texts
Author:
Jesper Näsman: Linguistics and Philology, Uppsala University, Sweden
Beáta Megyesi: Linguistics and Philology, Uppsala University, Sweden
Anne Palmér: Scandinavian Languages, Uppsala University, Sweden
Year: 2017
Conference: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Issue: 131
Article no.: 016
Pages: 132-141
No. of pages: 10
Publication type: Abstract and Fulltext
Published: 2017-05-08
ISBN: 978-91-7685-601-7
Series: Linköping Electronic Conference Proceedings
ISSN (print): 1650-3686
ISSN (online): 1650-3740
Series: NEALT Proceedings Series
Publisher: Linköping University Electronic Press, Linköpings universitet

We present SWEGRAM, a web-based tool for the automatic linguistic annotation and quantitative analysis of Swedish text. It enables researchers in the humanities and social sciences to annotate their own texts and, on the basis of this annotation, to produce statistics on linguistic and other text-related features. Users can upload one or several documents, which are automatically fed into a pipeline of tools for tokenization and sentence segmentation, spell checking, part-of-speech tagging and morpho-syntactic analysis, and dependency parsing for the syntactic annotation of sentences. The analyzer provides statistics on the number of tokens, words and sentences, the number of each part of speech (PoS), readability measures, the average length of various units, and frequency lists of tokens, lemmas, PoS and spelling errors. With SWEGRAM, users can create their own corpus or compare texts on various linguistic levels.
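The abstract describes documents being passed through a chain of processing tools whose output then feeds a set of descriptive statistics. The Python sketch below illustrates that design under loudly stated assumptions: all function names are hypothetical, naive regex-based sentence segmentation and tokenization stand in for SWEGRAM's real components, spell checking, tagging and parsing are omitted, and LIX is assumed as the example readability measure because the abstract does not say which measures the tool reports.

```python
import re
from collections import Counter
from typing import Callable

# Minimal sketch, NOT SWEGRAM's implementation: every name here is hypothetical,
# and the regex-based stages are naive stand-ins for real NLP components.
Doc = dict
Stage = Callable[[Doc], Doc]

SENT_END = re.compile(r"(?<=[.!?])\s+")  # whitespace following ., ! or ?
TOKEN = re.compile(r"\w+|[^\w\s]")       # word-character runs or single punctuation marks


def segment_stage(doc: Doc) -> Doc:
    """Naive sentence segmentation: split after sentence-final punctuation."""
    doc["sentences"] = [s for s in SENT_END.split(doc["text"].strip()) if s]
    return doc


def tokenize_stage(doc: Doc) -> Doc:
    """Naive tokenization of each sentence."""
    # \w matches Unicode letters, so å, ä and ö stay inside words.
    doc["tokens"] = [TOKEN.findall(s) for s in doc["sentences"]]
    return doc


def run_pipeline(text: str, stages: list[Stage]) -> Doc:
    """Feed a document through the stages in order; each stage adds an annotation layer.
    Spell checking, PoS tagging and parsing would be further stages (omitted here)."""
    doc: Doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


def lix(n_words: int, n_sentences: int, n_long_words: int) -> float:
    """LIX readability index: average sentence length in words plus the
    percentage of long words (words of more than six characters)."""
    if n_words == 0 or n_sentences == 0:
        return 0.0
    return n_words / n_sentences + 100 * n_long_words / n_words


def statistics(doc: Doc) -> dict:
    """Compute a few of the descriptive statistics listed in the abstract."""
    tokens = [t for sentence in doc["tokens"] for t in sentence]
    # A token counts as a word if it contains a letter (drops punctuation and bare numbers).
    words = [t for t in tokens if any(ch.isalpha() for ch in t)]
    n_sentences = len(doc["sentences"])
    n_long = sum(1 for w in words if len(w) > 6)
    return {
        "sentences": n_sentences,
        "tokens": len(tokens),
        "words": len(words),
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "avg_sentence_length": len(words) / n_sentences if n_sentences else 0.0,
        "lix": lix(len(words), n_sentences, n_long),
        "word_freq": Counter(w.lower() for w in words),  # frequency list of word forms
    }


doc = run_pipeline("Det här är en kort text. Den innehåller två meningar!",
                   [segment_stage, tokenize_stage])
stats = statistics(doc)
print(stats["sentences"], stats["tokens"], stats["words"], stats["lix"])  # 2 12 10 25.0
```

Lemma and PoS frequency lists would follow the same `Counter` pattern applied to the output of a tagging stage. Keeping each tool as a separate stage that adds one annotation layer mirrors the abstract's description of uploaded documents passing through a sequence of tools before analysis.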
