Article | Proceedings of the Workshop on Semantic resources and Semantic Annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015, Vilnius, 11th May, 2015 | Coarse-grained sense annotation of Danish across textual domains | Linköping University Electronic Press Conference Proceedings

Title:
Coarse-grained sense annotation of Danish across textual domains
Author:
Sussi Olsen: University of Copenhagen, Copenhagen, Denmark; Bolette S. Pedersen: University of Copenhagen, Copenhagen, Denmark; Héctor Martínez Alonso: University of Copenhagen, Copenhagen, Denmark; Anders Johannsen: University of Copenhagen, Copenhagen, Denmark
Download:
Full text (pdf)
Year:
2015
Conference:
Proceedings of the Workshop on Semantic resources and Semantic Annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015, Vilnius, 11th May, 2015
Issue:
112
Article no.:
006
Pages:
36–43
No. of pages:
8
Publication type:
Abstract and Fulltext
Published:
2015-05-06
ISBN:
978-91-7519-049-5
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



We present the results of a coarse-grained sense annotation task on verbs, nouns and adjectives across six textual domains in Danish. We report the domain-wise differences in intercoder agreement and discuss how the applicability and validity of the sense inventory vary by domain. We find that agreement is not higher in highly canonical or edited text. In fact, newswire text and parliament speeches have lower agreement than blogs and chats, probably because the language of these text types is more complex and uses more abstract concepts. We further observe that domains differ in their sense distribution. For instance, newswire and magazines stand out as having a strong focus on persons, while discussion fora typically include a restricted number of senses that depend on specialized topics. We anticipate that these findings can be exploited in automatic sense tagging when dealing with domain shift.

Keywords: sense annotation; sense tagging; sense inventory; supersenses; Danish; textual domains



