Article | Proceedings of the workshop on computational historical linguistics at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 18 | Edit transducers for spelling variation in Old Spanish | Linköping University Electronic Press Conference Proceedings

Title:
Edit transducers for spelling variation in Old Spanish
Author:
Jordi Porta: Departamento de Tecnología y Sistemas, Centro de Estudios de la Real Academia Española, Madrid, Spain
José-Luis Sancho: Departamento de Tecnología y Sistemas, Centro de Estudios de la Real Academia Española, Madrid, Spain
Javier Gómez: Departamento de Tecnología y Sistemas, Centro de Estudios de la Real Academia Española, Madrid, Spain
Year:
2013
Conference:
Proceedings of the workshop on computational historical linguistics at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 18
Issue:
087
Article no.:
006
Pages:
70-79
No. of pages:
10
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-587-2
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press; Linköpings universitet



A system for the analysis of Old Spanish word forms using weighted finite-state transducers is presented. The system uses previously existing resources such as a modern lexicon, a phonological transcriber, and a set of rules implementing the evolution of Spanish from the Middle Ages. The results obtained on all datasets show significant improvements, both in accuracy and in the trade-off between precision and recall, with respect to the baseline and to the Levenshtein edit distance. A qualitative error analysis suggests several potential ways to improve the performance of the system.
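The abstract measures the system against the Levenshtein edit distance. The following is a minimal Python sketch of that kind of baseline: it looks up candidate modern forms for a historical spelling in a lexicon, keeping every entry within a cost threshold. It also takes an optional table of cheaper substitutions to suggest how weighting historically plausible edits reorders candidates. Everything here is assumed for illustration only. The cost table `CHEAP_SUBS`, the helper `candidates`, the threshold and the tiny lexicon are hypothetical, and this dynamic-programming function is not the authors' weighted finite-state transducer or their rule set.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical, illustrative substitution costs (NOT the paper's rules or
# weights): cheaper edits for a few alternations common in Old Spanish
# spelling, such as u/v, i/y and f/h.
CHEAP_SUBS: Dict[Tuple[str, str], float] = {
    ("u", "v"): 0.2, ("v", "u"): 0.2,
    ("i", "y"): 0.2, ("y", "i"): 0.2,
    ("f", "h"): 0.3, ("h", "f"): 0.3,
}


def edit_distance(a: str, b: str,
                  sub_costs: Optional[Dict[Tuple[str, str], float]] = None) -> float:
    """Weighted Levenshtein distance between a and b.

    Insertions and deletions cost 1.0. A substitution costs 1.0 unless the
    (source, target) pair appears in sub_costs. With sub_costs=None this is
    the plain Levenshtein distance.
    """
    sub_costs = sub_costs or {}
    # prev[j] holds the cost of turning a[:i-1] into b[:j] (one DP row).
    prev = [float(j) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [float(i)] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                sub = 0.0
            else:
                sub = sub_costs.get((a[i - 1], b[j - 1]), 1.0)
            cur[j] = min(prev[j] + 1.0,      # delete a[i-1]
                         cur[j - 1] + 1.0,   # insert b[j-1]
                         prev[j - 1] + sub)  # match or substitute
        prev = cur
    return prev[len(b)]


def candidates(word: str, lexicon: Iterable[str], max_cost: float,
               sub_costs: Optional[Dict[Tuple[str, str], float]] = None
               ) -> List[Tuple[str, float]]:
    """Return lexicon entries within max_cost of word, cheapest first.

    Ties are broken alphabetically.
    """
    scored = [(w, edit_distance(word, w, sub_costs)) for w in lexicon]
    kept = [(w, c) for w, c in scored if c <= max_cost]
    return sorted(kept, key=lambda wc: (wc[1], wc[0]))


if __name__ == "__main__":
    lexicon = ["vida", "nada", "ida", "uva"]  # toy modern lexicon
    print(candidates("uida", lexicon, 1.0))              # plain Levenshtein
    print(candidates("uida", lexicon, 1.0, CHEAP_SUBS))  # weighted variant
```

With the plain distance, the medieval spelling "uida" ties between "ida" and "vida" at cost 1.0. With the hypothetical weights, "vida" comes first at 0.2. This hints at why weights informed by historical spelling could improve precision over an unweighted edit distance, though it says nothing about the paper's actual results.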

Keywords: Old Spanish; Finite-State Transducers; Spelling Variation; Historical Linguistics

References:

Allauzen, C. and Mohri, M. (2008). 3-way composition of weighted finite-state transducers. In Proceedings of the 13th International Conference on Implementation and Application of Automata (CIAA–2008), pages 262–273, San Francisco, California, USA.

Allauzen, C., Riley, M., Schalkwyk, J., Skut, W., and Mohri, M. (2007). OpenFst: A general and efficient weighted finite-state transducer library. In Proceedings of the Ninth International Conference on Implementation and Application of Automata (CIAA–2007), pages 11–23, Prague, Czech Republic.

Bollmann, M., Petran, F., and Dipper, S. (2011). Applying rule-based normalization to different types of historical texts — An evaluation. In Proceedings of the 5th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pages 339–344, Poznan, Poland.

Borin, L. and Forsberg, M. (2008). Something old, something new: A computational morphological description of Old Swedish. In LREC 2008 Workshop on Language Technology for Cultural Heritage Data (LaTeCH–2008), pages 9–16, Marrakech, Morocco.

Chomsky, N. and Halle, M. (1968). The sound pattern of English. Harper & Row, New York.

Damerau, F. J. (1964). A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171–176.

Jurish, B. (2010a). Efficient online k-best lookup in weighted finite-state cascades. In Hanneforth, T. and Fanselow, G., editors, Language and Logos: Studies in Theoretical and Computational Linguistics, volume 72 of Studia grammatica, pages 313–327. Akademie Verlag, Berlin.

Jurish, B. (2010b). More than words: Using token context to improve canonicalization of historical German. Journal for Language Technology and Computational Linguistics, 25(1):23–39.

Kaplan, R. M. and Kay, M. (1994). Regular models of phonological rule systems. Computational Linguistics, 20(3):331–378.

Karttunen, L. (1995). The replace operator. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL–95), pages 16–23, Cambridge, Massachusetts, USA.

Karttunen, L. (1996). Directed replacement. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL–96), pages 108–115, Santa Cruz, California, USA.

Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710.

Lloyd, P. M. (1987). From Latin to Spanish. American Philosophical Society, Philadelphia.

Mohri, M. (2009). Weighted automata algorithms. In Droste, M., Kuich, W., and Vogler, H., editors, Handbook of Weighted Automata, pages 213–254. Springer, Berlin.

Mohri, M. and Riley, M. (2002). An efficient algorithm for the n-best-strings problem. In Proceedings of the International Conference on Spoken Language Processing 2002 (ICSLP–2002), Denver, Colorado, USA.

Morreale, M. (1978). Trascendencia de la variatio para el estudio de la grafía, fonética, morfología y sintaxis de un texto medieval, ejemplificada en el MS Esc. I.I.6. In Annali della Facoltà di Lettere e Filosofia dell’Università di Padova, volume II, pages 249–261, Florence, Italy.

Penny, R. J. (2002). A history of the Spanish language. Cambridge University Press, Cambridge, second edition.

Piotrowski, M. (2012). Natural language processing for historical texts. Synthesis Lectures on Human Language Technologies, 5(2):1–157.

Pombo, E. L. (2012). Variation and standardization in the history of Spanish spelling. In Baddeley, S. and Voeste, A., editors, Orthographies in Early Modern Europe, pages 15–62. De Gruyter Mouton, Berlin, Boston.

RAE (2001). Diccionario de la lengua española. Espasa, Madrid, 22nd edition.

Roark, B., Sproat, R., Allauzen, C., Riley, M., Sorensen, J., and Tai, T. (2012). The OpenGrm open-source finite-state grammar software libraries. In Proceedings of the ACL 2012 System Demonstrations, pages 61–66, Jeju Island, Korea.

Sánchez, F., Porta, J., Sancho, J. L., Nieto, A., Ballester, A., Fernández, A., Gómez, J., Gómez, L., Raigal, E., and Ruiz, R. (1999). La anotación de los corpus CREA y CORDE. In Proceedings of SEPLN 1999, volume 25, pages 175–182, Lleida, Spain.

Sánchez-Marco, C., Boleda, G., and Padró, L. (2011). Extending the tool, or how to annotate historical language varieties. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 1–9, Portland, OR, USA.

Wells, J. C. (1997). SAMPA computer readable phonetic alphabet. In Gibbon, D., Moore, R., and Winski, R., editors, Handbook of Standards and Resources for Spoken Language Systems, pages 684–732. Mouton de Gruyter, Berlin and New York.
