Article | NEAL Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland | Toward Multilingual Identification of Online Registers Linköping University Electronic Press Conference Proceedings

Title:
Toward Multilingual Identification of Online Registers
Author:
Veronika Laippala: School of Languages and Translation Studies, University of Turku, Finland
Roosa Kyllönen: School of Languages and Translation Studies, University of Turku, Finland
Jesse Egbert: Applied Linguistics, Northern Arizona University, USA
Douglas Biber: Applied Linguistics, Northern Arizona University, USA
Sampo Pyysalo: Department of Future Technologies, University of Turku, Finland
Year:
2019
Conference:
NEAL Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland
Issue:
167
Article no.:
030
Pages:
292-297
No. of pages:
6
Publication type:
Abstract and Fulltext
Published:
2019-10-02
ISBN:
978-91-7929-995-8
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



We consider cross- and multilingual text classification approaches to the identification of online registers (genres), i.e. text varieties with specific situational characteristics. Register is the most important predictor of linguistic variation, and register information could improve the potential of online data for many applications. We introduce the first manually annotated non-English corpus of online registers featuring the full range of linguistic variation found online. The data set consists of 2,237 Finnish documents and follows the register taxonomy developed for the Corpus of Online Registers of English (CORE). Using CORE and the newly introduced corpus, we demonstrate the feasibility of cross-lingual register identification using a simple approach based on convolutional neural networks and multilingual word embeddings. We further find that register identification results can be improved through multilingual training even when a substantial number of annotations is available in the target language.

Keywords: Multilingual text classification, Online Registers, Convolutional neural networks, Multilingual word vectors

