Article | Proceedings of the workshop on lexical semantic resources for NLP at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 19 | LBK2013: A balanced, annotated national corpus for Norwegian Bokmål | Linköping University Electronic Press Conference Proceedings

Title:
LBK2013: A balanced, annotated national corpus for Norwegian Bokmål
Author:
Rune Lain Knudsen: Institute of Linguistic and Nordic Studies, University of Oslo
Ruth Vatvedt Fjeld: Institute of Linguistic and Nordic Studies, University of Oslo
Year:
2013
Conference:
Proceedings of the workshop on lexical semantic resources for NLP at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 19
Issue:
088
Article no.:
003
Pages:
12-20
No. of pages:
9
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-586-5
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press; Linköpings universitet



At the Department of Linguistics and Scandinavian Studies (ILN) at the University of Oslo, the task of assembling a balanced corpus representing modern Norwegian Bokmål has reached a significant milestone. The Corpus for Bokmål Lexicography (LBK) now consists of more than 100,000,000 words. Its documents have been selected on the basis of a statistical analysis of reading habits in the general population of Norway. Each document has been annotated both manually, with bibliographic information, and automatically, with morphological information. LBK will play a central part in a set of interconnected lexical resources, the aim of which is to provide extensive documentation of Norwegian Bokmål covering lexical as well as other linguistic and lexico-syntactic aspects. This paper presents LBK2013, a subset of LBK that we consider to be an accurate and comprehensive representation of modern written Norwegian Bokmål. We describe the corpus itself as well as a number of related projects.
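The abstract states that documents were chosen on the basis of a statistical analysis of reading habits, but gives no procedure. As a hedged illustration of what "balanced" can mean in practice, the sketch below uses proportional allocation with the largest-remainder method: a total word budget is split across text categories in proportion to reading-habit weights, and the integer quotas are guaranteed to sum to the budget. The function name `allocate_word_budget`, the category names and the weights are hypothetical; they are not LBK's actual categories or figures, and this is not claimed to be the authors' method.

```python
def allocate_word_budget(weights: dict[str, int], total_words: int) -> dict[str, int]:
    """Split total_words across categories in proportion to integer weights
    (hypothetical reading-habit figures), using the largest-remainder method
    so that the returned quotas sum exactly to total_words."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    quotas: dict[str, int] = {}
    remainders: dict[str, int] = {}
    for category, weight in weights.items():
        # Integer division keeps the arithmetic exact (no float rounding).
        quota, remainder = divmod(total_words * weight, total_weight)
        quotas[category] = quota
        remainders[category] = remainder

    # Words left over after flooring every quota (always fewer than the
    # number of categories) go to the categories with the largest remainders.
    leftover = total_words - sum(quotas.values())
    for category in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[category] += 1
    return quotas


# Hypothetical weights, e.g. share of reading time per category (illustrative only).
weights = {"newspapers": 50, "fiction": 30, "non-fiction": 20}
print(allocate_word_budget(weights, 100_000_000))
# prints {'newspapers': 50000000, 'fiction': 30000000, 'non-fiction': 20000000}
```

Integer weights are used instead of floating-point shares so that the quotas are exact; because Python's sort is stable, ties in the remainders are broken by the order in which categories were listed.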

Keywords: NoDaLiDa 2013; Speech and Language Technologies; Northern Europe; Corpora; Lexicography; Lexical Semantics
