Conference article

UDLex: Towards Cross-language Subcategorization Lexicons

Giulia Rambelli
Computational Linguistics Laboratory, Department of Philology, Literature, and Linguistics, University of Pisa, Pisa, Italy

Alessandro Lenci
Computational Linguistics Laboratory, Department of Philology, Literature, and Linguistics, University of Pisa, Pisa, Italy

Thierry Poibeau
LATTICE, CNRS, École normale supérieure and Université Sorbonne nouvelle, PSL Research University and USPC, Paris, France

Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy

Linköping Electronic Conference Proceedings 139:24, p. 207-217

Published: 2017-09-13

ISBN: 978-91-7685-467-9

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper introduces UDLex, a computational framework for the automatic extraction of argument structures for several languages. By exploiting the versatility of the Universal Dependency annotation scheme, our system acquires subcategorization frames directly from a dependency-parsed corpus, regardless of the input language. It thus uses a universal set of language-independent rules to detect verb dependencies in a sentence. In this paper we describe how the system has been developed by adapting the LexIt (Lenci et al., 2012) framework, originally designed to describe argument structures of Italian predicates. We also discuss practical issues that arose when building argument structure representations for typologically different languages.
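
As a rough illustration of the idea in the abstract, the sketch below collects, for each verb in a CoNLL-U parsed sentence, the set of dependency relations attached to it and counts the resulting frames. It is not the UDLex implementation: the function names, the sample sentence, and the choice of relations in ARGUMENT_RELATIONS are assumptions made for this example only.

```python
from collections import Counter, defaultdict

# Assumption: a hypothetical subset of UD relations treated as argument slots.
ARGUMENT_RELATIONS = {"nsubj", "obj", "iobj", "obl", "ccomp", "xcomp"}


def read_conllu(text):
    """Yield sentences as lists of (id, lemma, upos, head, deprel) tuples."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        sentence.append((int(cols[0]), cols[2], cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def extract_frames(sentences):
    """Count, per verb lemma, the sorted tuple of its argument relations."""
    frames = defaultdict(Counter)
    for sentence in sentences:
        for tok_id, lemma, upos, _head, _deprel in sentence:
            if upos != "VERB":
                continue
            # Keep only dependents of this verb whose base relation
            # (subtype after ":" dropped) is in the argument set.
            slots = sorted(
                deprel.split(":")[0]
                for _id, _lemma, _upos, head, deprel in sentence
                if head == tok_id
                and deprel.split(":")[0] in ARGUMENT_RELATIONS
            )
            frames[lemma][tuple(slots)] += 1
    return frames


# Hand-written sample in CoNLL-U format (10 tab-separated columns).
SAMPLE = "\n".join([
    "# text = Mary gives John a book",
    "1\tMary\tMary\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tgives\tgive\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tJohn\tJohn\tPROPN\t_\t_\t2\tiobj\t_\t_",
    "4\ta\ta\tDET\t_\t_\t5\tdet\t_\t_",
    "5\tbook\tbook\tNOUN\t_\t_\t2\tobj\t_\t_",
])

frames = extract_frames(read_conllu(SAMPLE))
```

Because the rules refer only to universal part-of-speech tags and dependency relations, the same functions could in principle be run on a treebank of any language annotated in this scheme. Frequency filtering and language-specific details such as adpositions or case marking are left out of the sketch.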

References

Marco Baroni, Silvia Bernardini, Federica Comastri, Lorenzo Piccioni, Alessandra Volpi, Guy Aston, and Marco Mazzoleni. 2004. Introducing the La Repubblica Corpus: A Large, Annotated, TEI(XML)-Compliant Corpus of Newspaper Italian. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC’04):1771–1774.

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation, 43(3):209–226.

Joan Bresnan. 1996. Lexicality and Argument Structure. In Paris Syntax and Semantics Conference.

Michela Cennamo and Claudia Fabrizio. 2013. Italian Valency Patterns. In I. Hartmann, M. Haspelmath and B. Taylor (Eds.), Valency Patterns Leipzig. Max Planck Institute for Evolutionary Anthropology, Leipzig.

Guersande Chaminade and Thierry Poibeau. 2017. Preliminary Experiments in the Extraction of Predicative Structures from a Large Finnish Corpus. In Proceedings of the 3rd International Workshop for Computational Linguistics of Uralic Languages:37–55.

Noam Chomsky. 1957. Syntactic Structures. Mouton, The Hague.

Noam Chomsky. 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.

Montserrat Civit, Joan Castelvi, Roser Morante, Antoni Oliver, and Joan Aparicio. 2005. 4LEX: a Multilingual Lexical Resource. In Cross-Language Knowledge Induction Workshop:39–45.

Katrin Erk, Sebastian Padó, and Ulrike Padó. 2010. A flexible, corpus-driven model of regular and inverse selectional preferences. Computational Linguistics, 36(4):723–763.

Stefan Evert. 2009. Corpora and Collocations. In A. Lüdeling and M. Kytö (Eds.), Corpus Linguistics. An International Handbook, chapter 58. Mouton de Gruyter, Berlin.

Karel van den Eynde and Piet Mertens. 2003. La valence: l’approche pronominale et son application au lexique verbal. Journal of French Language Studies, 13:63–104.

Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA.

Charles J. Fillmore and Beryl T. (Sue) Atkins. 1992. Towards a frame-based lexicon: The semantics of RISK and its neighbors. In A. Lehrer and E.F. Kittay (Eds.), Frames, fields and contrasts:75–102. Lawrence Erlbaum Associates, Hillsdale, NJ.

Charles J. Fillmore. 1982. Frame Semantics. In Linguistics in the Morning Calm: Selected Papers from SICOL 1981:111–137.

Charles J. Fillmore. 1985. Frames and the semantics of understanding. Quaderni di semantica, 6:222–254.

Cliff Goddard. 2013. English Valency Patterns. In I. Hartmann, M. Haspelmath and B. Taylor (Eds.), Valency Patterns Leipzig. Max Planck Institute for Evolutionary Anthropology, Leipzig.

Xiwu Han, Tiejun Zhao, Haoliang Qi, and Hao Yu. 2004. Subcategorization acquisition and evaluation for Chinese verbs. In Proceedings of the 20th International Conference on Computational Linguistics (COLING’04).

Lars Hellan, Dorothee Beermann, Tore Bruland, Mary Esther Kropp Dakubu, and Montserrat Marimon. 2014. MultiVal: towards a multilingual valence lexicon. In Proceedings of the 9th Edition of the Language, Resources and Evaluation Conference (LREC’14):2478–2485.

Fred Karlsson. 2008. Finnish: An Essential Grammar. 2nd edition. Routledge Essential Grammars, London.

Karin Kipper-Schuler. 2005. VerbNet: A Broad-coverage, Comprehensive Verb Lexicon. PhD thesis, University of Pennsylvania, Philadelphia, PA.

Anna Korhonen, Yuval Krymolowski, and Ted Briscoe. 2006. A Large Subcategorization Lexicon for Natural Language Processing Applications. In Proceedings of the 5th Edition of the Language, Resources and Evaluation Conference (LREC’06):1015–1020.

Anna Korhonen. 2009. Automatic Lexical Classification - Balancing between Machine Learning and Linguistics. In Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC 23):19–28.

Anna Korhonen. 2002. Subcategorization acquisition. PhD thesis, University of Cambridge.

Alessandro Lenci, Gabriella Lapesa, and Giulia Bonansinga. 2012. LexIt: A Computational Resource on Italian Argument Structure. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’12):3712–3718.

Beth Levin and Malka Rappaport-Hovav. 2005. Argument Realization. Cambridge University Press, Cambridge, UK.

Beth Levin. 1993. English Verb Classes and Alternations. The University of Chicago Press, Chicago, IL.

Marc Light and Warren Greiff. 2002. Statistical models for the induction and use of selectional preferences. Cognitive Science, 26(3):269–281.

Christopher D. Manning. 2015. The case for universal dependencies. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015):1.

Pierre Marchal. 2015. Acquisition de schémas prédicatifs verbaux en japonais. PhD thesis, INaLCO.

Marie-Catherine de Marneffe and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In Proceedings of COLING 2008 Workshop on Cross-framework and Cross-domain Parser Evaluation.

Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford Dependencies: A cross-linguistic typology. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14):4585–4592.

Diana McCarthy. 2001. Lexical Acquisition at the Syntax-Semantics Interface: Diathesis Alternations, Subcategorization Frames and Selectional Preferences. PhD thesis, University of Sussex.

Piet Mertens. 2010. Restrictions de sélection et réalisations syntagmatiques dans DICOVALENCE. Conversion vers un format utilisable en TAL. In Actes TALN 2010.

Cédric Messiant, Thierry Poibeau, and Anna Korhonen. 2008. Lexschem: a large sub-categorization lexicon for French verbs. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’08):142–147.

Cédric Messiant, Kata Gábor, and Thierry Poibeau. 2010. Lexical acquisition from corpora: the case of subcategorization frames in French. Traitement Automatique des Langues, 51(1):65–96.

Joakim Nivre. 2015. Towards a Universal Grammar for Natural Language Processing. In Alexander Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing. CICLing 2015:3–16. Springer, Cham.

Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’12):2089–2096.

Thierry Poibeau and Cédric Messiant. 2008. Do we still need gold standard for evaluation? In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC’08).

Thierry Poibeau. 2011. Traitement automatique du contenu textuel. Lavoisier. Paris.

Judita Preiss, Ted Briscoe, and Anna Korhonen. 2007. A system for large-scale acquisition of verbal, nominal and adjectival subcategorization frames from corpora. In Proceedings of the 45th Meeting of the Association for Computational Linguistics (ACL’07):912–918.

Giulia Rambelli, Gianluca E. Lebani, Alessandro Lenci, and Laurent Prévot. 2016. LexFr: adapting the LexIt framework to build a corpus-based French subcategorization lexicon. In Proceedings of the 10th Edition of the Language, Resources and Evaluation Conference (LREC’16):930–937.

Philip Resnik. 1996. Selectional constraints: an information-theoretic model and its computational realization. Cognition, 61(1–2):127–159.

Douglas Roland and Daniel Jurafsky. 2002. Verb sense and verb subcategorization probabilities. In Paola Merlo and Suzanne Stevenson (Eds.), The Lexical Basis of Sentence Processing: Formal, Computational, and Experimental Issues:325–346. John Benjamins, Amsterdam.

Anna Rumshisky. 2008. Resolving polysemy in verbs: Contextualized distributional approach to argument semantics. In Distributional Models of the Lexicon in Linguistics and Cognitive Science, special issue of Italian Journal of Linguistics / Rivista di Linguistica.

Sabine Schulte im Walde. 2002. A subcategorization lexicon for German verbs induced from a lexicalized PCFG. In Proceedings of the 3rd Conference on Language Resources and Evaluation (LREC’02):1351–1357.

Sabine Schulte im Walde. 2009. The induction of verb frames and verb classes from corpora. In A. Lüdeling and M. Kytö (Eds.), Corpus Linguistics. An International Handbook, chapter 61. Mouton de Gruyter, Berlin.

Sandra A. Thompson. 1997. Discourse Motivations for the Core-Oblique Distinction as a Language Universal. In Akio Kamio (Ed.), Directions in Functional Linguistics:59–82. Benjamins, Amsterdam.

Daniel Zeman and Philip Resnik. 2008. Cross-Language Parser Adaptation between Related Languages. In Proceedings of IJCNLP 2008 Workshop on NLP for Less Privileged Languages.
