Article | Proceedings of the workshop on Nordic language research infrastructure at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 20 | Towards Large-Scale Language Analysis in the Cloud | Linköping University Electronic Press Conference Proceedings

Title:
Towards Large-Scale Language Analysis in the Cloud
Author:
Emanuele Lapponi: Language Technology Group, Department of Informatics, University of Oslo, Norway
Erik Velldal: Language Technology Group, Department of Informatics, University of Oslo, Norway
Nikolay A. Vazov: Research Support Services Group, University Center for Information Technology, University of Oslo, Norway
Stephan Oepen: Language Technology Group, Department of Informatics, University of Oslo, Norway
Download:
Full text (pdf)
Year:
2013
Conference:
Proceedings of the workshop on Nordic language research infrastructure at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 20
Issue:
089
Article no.:
001
Pages:
1-10
No. of pages:
10
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-585-8
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press; Linköpings universitet



This paper documents ongoing work within the Norwegian CLARINO project on building a Language Analysis Portal (LAP). The portal will provide an intuitive and easily accessible web interface to a centralized repository of a wide range of language technology tools, all installed on a high-performance computing cluster. Users will be able to compose and run workflows using an easy-to-use graphical interface, with multiple tools and resources chained together in potentially complex pipelines. Although the project aims to reach out to a diverse set of user groups, it will in particular facilitate the use of language analysis in the social sciences, humanities, and other fields without strong computational traditions. While the development of the portal is still in its early stages, this paper documents ongoing work towards an already operable pilot, in addition to providing an overview of long-term goals and visions. At the core of the current pilot implementation is Galaxy, a web-based workflow management system initially developed for data-intensive research in genomics and bioinformatics. An important part of the work on the pilot is therefore to adapt and evaluate Galaxy for the context of a language analysis portal.

Keywords: Research infrastructure; High-Performance Computing; web portal; CLARINO

References:

Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly.

Blankenberg, D., Kuster, G. V., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. (2010). Galaxy: a web-based genome analysis tool for experimentalists. Current Protocols in Molecular Biology, pages 19.10.1–19.10.21.

Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V., Aswani, N., Roberts, I., Gorrell, G., Funk, A., Roberts, A., Damljanovic, D., Heitz, T., Greenwood, M. A., Saggion, H., Petrak, J., Li, Y., and Peters, W. (2011). Text Processing with GATE (Version 6).

Giardine, B., Riemer, C., Hardison, R. C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., Miller, W., Kent, W. J., and Nekrutenko, A. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Research, 15(10):1451–5.

Goecks, J., Nekrutenko, A., Taylor, J., and The Galaxy Team (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8):R86.

Götz, T. and Suhre, O. (2004). Design and implementation of the UIMA common analysis system. IBM Syst. J., 43(3):476–489.

Heid, U., Schmid, H., Eckart, K., and Hinrichs, E. (2010). A corpus representation format for linguistic web services: The D-SPIN Text Corpus Format and its relationship with ISO standards. In Proceedings of the 7th International Conference on Language Resources and Evaluation, pages 494–499.

Missier, P., Soiland-Reyes, S., Owen, S., Tan, W., Nenadic, A., Dunlop, I., Williams, A., Oinn, T., and Goble, C. (2010). Taverna, reloaded. In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, pages 471–481.


