Article | Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16 | 6,909 Reasons to Mess Up Your Data | Linköping University Electronic Press Conference Proceedings

Title:
6,909 Reasons to Mess Up Your Data
Author:
Anders Søgaard: Københavns Universitet, Denmark
Year:
2013
Conference:
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16
Issue:
085
Article no.:
003
Pages:
5-5
No. of pages:
1
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-589-6
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press; Linköpings universitet



In computational linguistics we develop tools and online services for everything from literature to social media data, but our tools are often optimized to minimize expected error on a single annotated dataset, typically newspaper articles, and evaluated on held-out data sampled from the same dataset. Significance testing across data points randomly sampled from a standard dataset only tells us how likely we are to see better performance on more data points sampled in the same way; it says nothing about performance on other datasets. This talk discusses how to modify learning algorithms to minimize expected error on future, unseen datasets, with applications to PoS tagging and dependency parsing, including cross-language learning problems. It also discusses the related issue of how best to evaluate NLP tools (intrinsically), taking their possible out-of-domain applications into account.

Keywords: Domain Variation; PoS Tagging; Dependency Parsing; Evaluation
