Article | Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden | Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction

Title:
Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction
Author:
Flavio Massimiliano Cecchini: DISCo, Università degli Studi di Milano-Bicocca, Italy
Martin Riedl: Language Technology Group, Universität Hamburg, Germany
Chris Biemann: Language Technology Group, Universität Hamburg, Germany
Year:
2017
Conference:
Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Issue:
131
Article no.:
013
Pages:
105-114
No. of pages:
10
Publication type:
Abstract and Fulltext
Published:
2017-05-08
ISBN:
978-91-7685-601-7
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



In this paper we define two parallel data sets based on pseudowords, extracted from the same corpus. Both consist of word-centered graphs for each of 1225 different pseudowords; one uses first-order co-occurrences and the other second-order semantic similarities. We propose an evaluation framework on these data sets for graph-based Word Sense Induction (WSI), focused on the case of coarse-grained homonymy: we compare different WSI clustering algorithms by measuring how well their outputs agree with the a priori known ground-truth decomposition of a pseudoword. We perform this evaluation for four clustering algorithms: the Markov cluster algorithm, Chinese Whispers, MaxMax and a gangplank-based clustering algorithm. To further improve the comparison between these algorithms and the analysis of their behaviours, we also define a new, specific evaluation measure. As far as we know, this is the first large-scale systematic pseudoword evaluation dedicated to the induction of coarse-grained homonymous word senses.
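The idea of scoring a clustering against a pseudoword's known decomposition can be illustrated with a minimal sketch. It assumes a pseudoword formed from two component words, with every neighbour in its word-centered graph labelled by the component it came from. The score used here is plain cluster purity, a standard measure chosen only for illustration; it is not the new evaluation measure defined in the paper. The function name `cluster_purity`, the pseudoword `banana_door` and all neighbour words are hypothetical.

```python
from collections import Counter


def cluster_purity(clusters, gold):
    """Share of nodes that carry the majority gold label of their cluster.

    clusters: iterable of sets of node names (output of a WSI clustering)
    gold:     dict mapping each node name to its ground-truth component word
    """
    total = 0
    matched = 0
    for cluster in clusters:
        if not cluster:
            continue  # ignore empty clusters
        counts = Counter(gold[node] for node in cluster)
        total += len(cluster)
        # size of the largest group of same-labelled nodes in this cluster
        matched += counts.most_common(1)[0][1]
    return matched / total if total else 0.0


# Hypothetical pseudoword "banana_door": each neighbour is tagged with the
# component word whose context it originally belonged to.
gold = {
    "fruit": "banana", "peel": "banana", "yellow": "banana",
    "hinge": "door", "knob": "door", "window": "door",
}

perfect = [{"fruit", "peel", "yellow"}, {"hinge", "knob", "window"}]
mixed = [{"fruit", "peel", "hinge"}, {"yellow", "knob", "window"}]

print(cluster_purity(perfect, gold))
print(cluster_purity(mixed, gold))
```

Purity alone rewards over-splitting, since a clustering of singletons is trivially pure. A comparison of algorithms therefore needs a complementary measure that penalises fragmentation.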



