Article | Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16 | Building an open-source development infrastructure for language technology projects

Title:
Building an open-source development infrastructure for language technology projects
Author:
Sjur N. Moshagen: University of Tromsø, Norway
Tommi A. Pirinen: Helsinki University, Finland
Trond Trosterud: University of Tromsø, Norway
Year:
2013
Conference:
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16
Issue:
085
Article no.:
031
Pages:
343-352
No. of pages:
10
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-589-6
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press, Linköpings universitet



The article presents the Giellatekno & Divvun language technology resources, more specifically the effort to utilise open-source tools to improve the build infrastructure, and the solutions to help adapt to best practices for software development. The article especially discusses how the infrastructure has been remade to cope with an increasing number of languages without incurring extra overhead for the maintainers, and at the same time let the linguists concentrate on the linguistic work. Finally, the article discusses how a uniform infrastructure like the one presented can be used to easily compare languages in terms of morphological or computational complexity, coverage or for cross-lingual applications.

Keywords: NoDaLiDa 2013; Infrastructure; Computational linguistics; Finite-state transducers; Language resources; Multilinguality
