
Title:
Increasing Return on Annotation Investment: the Automatic Construction of a Universal Dependency Treebank for Dutch
Author:
Gosse Bouma: Centre for Language and Cognition, University of Groningen, The Netherlands
Gertjan van Noord: Centre for Language and Cognition, University of Groningen, The Netherlands
Year:
2017
Conference:
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies, 22 May, Gothenburg Sweden
Issue:
135
Article no.:
003
Pages:
19-26
No. of pages:
8
Publication type:
Abstract and Fulltext
Published:
2017-05-29
ISBN:
978-91-7685-501-0
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



We present a method for automatically converting the Dutch Lassy Small treebank, a phrasal dependency treebank, to UD. All of the information required to produce accurate UD annotation appears to be available in the underlying annotation. However, we also note that the close connection between POS-tags and dependency labels that is present in UD is missing in the Lassy treebanks. As a consequence, annotation decisions in the Dutch data for such phenomena as nominalization and clausal complements of prepositions seem to differ to some extent from comparable data in English and German. Because the conversion is automatic, we can now also compare three state-of-the-art dependency parsers trained on UD Lassy Small with Alpino, a hybrid Dutch parser which produces output that is compatible with the original Lassy annotations.


