Article | Proceedings of the workshop on computational historical linguistics at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 18 | Experiments on sentence segmentation in Old Swedish editions | Linköping University Electronic Press Conference Proceedings

Title:
Experiments on sentence segmentation in Old Swedish editions
Author:
Gerlof Bouma: Språkbanken, Department of Swedish, University of Gothenburg, Sweden
Yvonne Adesam: Språkbanken, Department of Swedish, University of Gothenburg, Sweden
Year:
2013
Conference:
Proceedings of the workshop on computational historical linguistics at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 18
Issue:
087
Article no.:
002
Pages:
11-26
No. of pages:
16
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-587-2
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press; Linköpings universitet



We present experiments on automatic segmentation of electronic Old Swedish editions into sentence-like units. Our target material is characterized by great variation in the type of boundaries that are marked orthographically, the extent of boundary marking, and the means of boundary marking. We begin with an exploration of boundary marking in a large, unannotated corpus of Old Swedish texts. Then we show that we are able to improve upon a simple but effective segmenting baseline, using a conditional random field model trained on a manually annotated corpus. A more valuable lesson the modelling teaches us, however, is that we need to address the boundary marking variation explicitly.
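To make the idea of a simple orthographic segmenting baseline concrete, the following is a minimal sketch and not the baseline used in the paper, whose details are not given on this page. It assumes that a boundary is signalled by a punctuation token followed by a capitalized token. The mark set `BOUNDARY_MARKS`, the function name `baseline_segment`, and the example tokens are all illustrative assumptions.

```python
from typing import List, Sequence, Set

# Hypothetical set of orthographic boundary marks (an assumption, not from the paper).
BOUNDARY_MARKS: Set[str] = {".", "/", ":"}


def baseline_segment(tokens: Sequence[str],
                     marks: Set[str] = BOUNDARY_MARKS) -> List[List[str]]:
    """Split a token sequence into sentence-like units.

    A unit ends after a boundary mark when the next token starts with an
    uppercase letter, or when the mark is the last token of the text.
    """
    units: List[List[str]] = []
    current: List[str] = []
    for i, tok in enumerate(tokens):
        current.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in marks and (nxt is None or nxt[:1].isupper()):
            units.append(current)
            current = []
    if current:
        # Trailing tokens without a closing mark still form a unit.
        units.append(current)
    return units


# Invented token sequence for illustration only, not a quotation from any edition.
example = ["Tha", "kom", "han", ".", "Oc", "foor", "han", "/", "bort", "."]
segments = baseline_segment(example)
```

In this sketch the virgule before the lowercase token does not trigger a split, so the example yields two units. A rule of this kind depends entirely on consistent marking, which is the variation the abstract identifies as the main difficulty.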

Keywords: Sentence-like units; boundary detection; Old Swedish


