Article | Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden | The Effect of Excluding Out of Domain Training Data from Supervised Named-Entity Recognition

Title:
The Effect of Excluding Out of Domain Training Data from Supervised Named-Entity Recognition
Author:
Adam Persson: Department of Linguistics, Stockholm University, Stockholm, Sweden
Year:
2017
Conference:
Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Issue:
131
Article no.:
040
Pages:
289-292
No. of pages:
4
Publication type:
Abstract and Fulltext
Published:
2017-05-08
ISBN:
978-91-7685-601-7
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



Supervised named-entity recognition (NER) systems perform better on text that is similar to their training data. Despite this, systems are often trained on as much data as possible, regardless of its relevance. This study explores whether NER can be improved by excluding out-of-domain training data. A maximum entropy model is developed and evaluated twice with each domain in the Stockholm-Umeå Corpus (SUC), once with all data and once with only in-domain data. For some domains, excluding out-of-domain training data improves tagging, but over the entire corpus it has a negative effect of less than two percentage points (for both strict and fuzzy matching).
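The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The `Doc` record, the domain labels, the held-out split, and the span format are all hypothetical. The fuzzy criterion used here (same label, at least one shared token) is an assumption, since the abstract does not define fuzzy matching.

```python
from collections import namedtuple

# Hypothetical document record; the field names are assumptions for this sketch.
Doc = namedtuple("Doc", ["doc_id", "domain"])


def training_sets(docs, test_ids, target_domain):
    """Return (all_data, in_domain_only) training sets for one target domain.

    Held-out test documents are excluded from both sets.
    """
    all_data = [d for d in docs if d.doc_id not in test_ids]
    in_domain_only = [d for d in all_data if d.domain == target_domain]
    return all_data, in_domain_only


def count_matches(gold, predicted, fuzzy=False):
    """Count predicted entity spans that match some gold span.

    Spans are (start, end, label) tuples with an exclusive end index.
    Strict: identical boundaries and identical label.
    Fuzzy (assumed definition): identical label and overlapping tokens.
    """
    matched = 0
    for p_start, p_end, p_label in predicted:
        for g_start, g_end, g_label in gold:
            if p_label != g_label:
                continue
            if fuzzy:
                hit = p_start < g_end and g_start < p_end
            else:
                hit = (p_start, p_end) == (g_start, g_end)
            if hit:
                matched += 1
                break  # each predicted span is counted at most once
    return matched


# Toy data with made-up domain labels.
docs = [
    Doc("a1", "news"),
    Doc("a2", "news"),
    Doc("k1", "fiction"),
    Doc("k2", "fiction"),
]
all_data, in_domain = training_sets(docs, test_ids={"a2"}, target_domain="news")

gold = [(0, 2, "person"), (5, 6, "place")]
pred = [(0, 1, "person"), (5, 6, "place")]
strict_hits = count_matches(gold, pred)
fuzzy_hits = count_matches(gold, pred, fuzzy=True)
```

In this toy case the first predicted span has the right label but the wrong boundary, so it counts only under the fuzzy criterion. A model would be trained once on `all_data` and once on `in_domain`, and both would be scored on the same held-out documents from the target domain.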



