Article | Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16 | Tone restoration in transcribed Kammu: Decision-list word sense disambiguation for an unwritten language

Title:
Tone restoration in transcribed Kammu: Decision-list word sense disambiguation for an unwritten language
Author:
Marcus Uneson: Centre for Languages and Literature, Lund University, Sweden
Year:
2013
Conference:
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16
Issue:
085
Article no.:
036
Pages:
399-409
No. of pages:
11
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-589-6
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



The RWAAI (Repository and Workspace for Austroasiatic Intangible Heritage) project aims to build a digital archive from existing legacy data on the Austroasiatic language family. One aspect of the project is the preservation of analogue legacy data. In this context, we have at hand a large number of mostly phonemic transcriptions of narrative monologues, often with accompanying sound recordings, in the unwritten Kammu language of northern Laos. Some of the transcriptions, however, lack tone marks, which for a tonal language such as Kammu makes them substantially less useful. The problem of restoring tones can be recast as one of word sense disambiguation or, more generally, lexical ambiguity resolution. We attack it with decision lists, along the lines of Yarowsky (1994), using the tone-marked part of the corpus (120k words) as training data. The performance ceiling for this corpus is uncertain: the stories were all annotated, primarily for human rather than machine consumption, by a single person over almost 40 years, with slowly emerging idiosyncratic conventions. Thus, both inter-annotator and intra-annotator agreement figures are unknown. Nevertheless, with the data from this one annotator as a gold standard, we improve from an already high baseline accuracy of 95.7% to 97.2% (by 10-fold cross-validation).

Keywords: Word sense disambiguation; Kammu; decision lists; lexical ambiguity resolution; tone restoration; legacy data
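
The decision-list idea mentioned in the abstract can be illustrated with a minimal Python sketch in the spirit of Yarowsky (1994). It is not the paper's implementation. Everything specific here is an assumption made for illustration: the toy tokens are invented and are not Kammu, tone is written as a trailing digit, the only features are the neighbouring toneless forms, and the smoothing constant `alpha` is arbitrary. For each ambiguous toneless form, every context feature becomes a rule scored by a smoothed log-likelihood ratio; rules are sorted by score, the first matching rule decides, and the most frequent tone-marked form serves as the fallback.

```python
import math
from collections import Counter, defaultdict

# Invented toy data (NOT Kammu): a trailing digit stands in for a tone mark.
TOY_CORPUS = [
    ["ta1", "ka1", "mo2"],
    ["ta1", "ka1", "si1"],
    ["nu2", "ka2", "mo2"],
    ["nu2", "ka2", "si1"],
    ["ta1", "ka1", "mo2"],
]

def strip_tone(token):
    """Remove the toy tone mark (a trailing '1' or '2')."""
    return token.rstrip("12")

def features(toneless, i):
    """Context features at position i: the neighbouring toneless forms."""
    left = toneless[i - 1] if i > 0 else "<s>"
    right = toneless[i + 1] if i + 1 < len(toneless) else "</s>"
    return [("prev", left), ("next", right)]

def train(sentences, alpha=0.1):
    """Return (defaults, rules): a fallback form and a sorted rule list per toneless form."""
    form_counts = defaultdict(Counter)  # toneless form -> counts of tone-marked forms
    feat_counts = defaultdict(Counter)  # (toneless form, feature) -> counts of tone-marked forms
    for sent in sentences:
        toneless = [strip_tone(t) for t in sent]
        for i, toned in enumerate(sent):
            form_counts[toneless[i]][toned] += 1
            for feat in features(toneless, i):
                feat_counts[(toneless[i], feat)][toned] += 1

    # Fallback: the most frequent tone-marked form of each toneless form.
    defaults = {form: c.most_common(1)[0][0] for form, c in form_counts.items()}

    rules = defaultdict(list)
    for (form, feat), counts in feat_counts.items():
        if len(form_counts[form]) < 2:
            continue  # unambiguous form: the default is enough
        best, n_best = counts.most_common(1)[0]
        n_rest = sum(counts.values()) - n_best
        # Smoothed log-likelihood ratio of the favoured form against all others.
        score = math.log((n_best + alpha) / (n_rest + alpha))
        rules[form].append((score, feat, best))
    for form in rules:
        rules[form].sort(key=lambda rule: rule[0], reverse=True)
    return defaults, rules

def restore(sentence, defaults, rules):
    """Restore tone marks; the strongest matching rule wins, else the default."""
    toneless = [strip_tone(t) for t in sentence]
    restored = []
    for i, form in enumerate(toneless):
        active = set(features(toneless, i))
        choice = defaults.get(form, form)  # unseen forms are left unchanged
        for _score, feat, toned in rules.get(form, []):
            if feat in active:
                choice = toned
                break
        restored.append(choice)
    return restored

defaults, rules = train(TOY_CORPUS)
print(restore(["nu", "ka", "mo"], defaults, rules))  # -> ['nu2', 'ka2', 'mo2']
```

In this toy corpus the fallback alone would restore "ka" as "ka1", its most frequent form; the rule triggered by the preceding "nu" overrides that choice. A real system along these lines would use a richer feature set and would be evaluated against held-out tone-marked text, for example by cross-validation as in the abstract.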

References:

Agirre, E. and Edmonds, P. (2006). Word sense disambiguation: Algorithms and applications, volume 33. Springer Science+Business Media.


Jurafsky, D. and Martin, J. H. (2008). An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall, 2nd edition.


Lindell, K., Öjvind Swahn, J., and Tayanin, D. (1977). A Kammu story-listener’s tales. Number 33 in Scandinavian Institute of Asian Studies Monograph Series. Curzon Press, London.


Lindell, K., Öjvind Swahn, J., and Tayanin, D. (1980). Folk Tales from Kammu II: A Story-teller’s Tales, volume 40 of Scandinavian Institute of Asian Studies Monograph Series. Curzon Press, London.


Lindell, K., Öjvind Swahn, J., and Tayanin, D. (1984). Folk Tales from Kammu III: Pearls of Kammu Literature. Number 51 in Scandinavian Institute of Asian Studies Monograph Series. Curzon Press, London.


Lindell, K., Öjvind Swahn, J., and Tayanin, D. (1989). Folk Tales from Kammu IV: A Master-Teller’s Tales. Number 56 in Scandinavian Institute of Asian Studies Monograph Series. Curzon Press, London.


Lindell, K., Öjvind Swahn, J., and Tayanin, D. (1995). Folk Tales from Kammu V: A Young Story-Teller’s Tales. Number 66 in Nordic Institute of Asian Studies Monograph Series. Curzon Press, London.


Lindell, K., Öjvind Swahn, J., and Tayanin, D. (1998). Folk Tales from Kammu VI: A Teller’s Last Tales, volume 77 of Nordic Institute of Asian Studies Monograph Series. Curzon Press, London.


Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. (1990). Five papers on WordNet. International Journal of Lexicography, 3(4):235–244.


Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys, 41(2):1–69.


Rivest, R. (1987). Learning decision lists. Machine Learning, 2(3):229–246.


Settles, B. (2009). Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison.


Svantesson, J.-O. (1983). Kammu Phonology and Morphology. PhD thesis, Lund University. Travaux de l’Institut de linguistique de Lund, 18.


Svantesson, J.-O. (1989). Tonogenetic mechanisms in Northern Mon-Khmer. Phonetica, 46(1-3):60–79.


Svantesson, J.-O., Tayanin, D., Lindell, K., and Lundström, H. (in press). Kammu Yùan-English dictionary.


Yarowsky, D. (1994). Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pages 88–95. Association for Computational Linguistics.


Yarowsky, D. (1996). Homograph disambiguation in text-to-speech synthesis. Pages 157–172.


