Article | Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013); May 22-24; 2013; Oslo University; Norway. NEALT Proceedings Series 16 | Tidying up the Basement: A Tale of Large-Scale Parsing on National eInfrastructure Link�ping University Electronic Press Conference Proceedings
Göm menyn

Title:
Tidying up the Basement: A Tale of Large-Scale Parsing on National eInfrastructure
Author:
Stephan Oepen: Universitetet i Oslo, Norway
Download:
Full text (pdf)
Year:
2013
Conference:
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013); May 22-24; 2013; Oslo University; Norway. NEALT Proceedings Series 16
Issue:
085
Article no.:
005
Pages:
9-9
No. of pages:
1
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-589-6
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press; Linköpings universitet


Export in BibTex, RIS or text

Until about six years ago; our research group used non-trivial amounts of project funds and researcher time on maintaining a dedicated server farm in the basement of our department. Rack space and cooling (just as much as funds and time) were in short supply; and we never quite got around to implementing automated load balancing across compute nodes; tuning the Linux kernel and filesystem for optimum performance; or connecting to the uninterruptible power supply. When pointed to the Norwegian National High-Performance Computing Initiative; we were intially doubtful that Natural Language Processing should be among their target user groups. Also; we were a tad hesitant to give up control of our own equipment and of course worried we would miss what we thought were our fancy toys. Today; any member of the group can access thousands of cpus simultaneously; we have about five terabytes of project data on-line; and our research has scaled to dataset sizes and turn-around times that would be just inconceivable on group-local hardware—at no charge to our project funds and no administrator responsibilities. For example; ‘deep’ semantic parsing of the about 900 million words of the English Wikipedia we can typically complete in less than one day (while expending what would be about eight sequential years of computation). Or; when searching for the best-performing features and hyper-parameters in a machine learning problem; we can explore a large ‘grid’ of possible configurations in parallel; without much need for a staged; partly manual; ‘coarseto- fine’ search strategy. Access to the very large-scale Norwegian National eInfrastructure and its high-quality technical support have enabled a comparatively computation-heavy research profile of our group and has thus contributed to its international competitiveness. In this presentation; I will review some of our experiences in establishing a dialogue with the HPC crowd and propose HPC for the Masses as a candidate vision in the on-going development trend towards more and more large-scale computational sciences.

Keywords: Parsing Wikipedia; HPC for the Masses; National eInfrastructure

Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013); May 22-24; 2013; Oslo University; Norway. NEALT Proceedings Series 16

Author:
Stephan Oepen
Title:
Tidying up the Basement: A Tale of Large-Scale Parsing on National eInfrastructure
References:
No references available

Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013); May 22-24; 2013; Oslo University; Norway. NEALT Proceedings Series 16

Author:
Stephan Oepen
Title:
Tidying up the Basement: A Tale of Large-Scale Parsing on National eInfrastructure
Note: the following are taken directly from CrossRef
Citations:
No citations available at the moment


Responsible for this page: Peter Berkesand
Last updated: 2018-9-11