Title:
Defining the Eukalyptus forest - the Koala treebank of Swedish
Authors:
Yvonne Adesam: Språkbanken, Department of Swedish, University of Gothenburg, Sweden
Gerlof Bouma: Språkbanken, Department of Swedish, University of Gothenburg, Sweden
Richard Johansson: Språkbanken, Department of Swedish, University of Gothenburg, Sweden
Year:
2015
Conference:
Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania
Issue:
109
Article no.:
004
Pages:
1-9
No. of pages:
9
Publication type:
Abstract and Fulltext
Published:
2015-05-06
ISBN:
978-91-7519-098-3
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet


This paper describes the creation of the Koala corpus, a 100k-token, manually annotated corpus of contemporary Swedish texts, focusing in particular on its part-of-speech and syntactic annotation. The resource will be made freely available.
