Title:
MTAS: A Solr/Lucene based Multi Tier Annotation Search solution
Author:
Matthijs Brouwer: Meertens Institute, The Netherlands
Hennie Brugman: Meertens Institute, The Netherlands
Marc Kemps-Snijders: Meertens Institute, The Netherlands
Year:
2017
Conference:
Selected papers from the CLARIN Annual Conference 2016, Aix-en-Provence, 26–28 October 2016, CLARIN Common Language Resources and Technology Infrastructure
Issue:
136
Article no.:
002
Pages:
19-37
No. of pages:
19
Publication type:
Abstract and Fulltext
Published:
2017-05-23
ISBN:
978-91-7685-499-0
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Publisher:
Linköping University Electronic Press, Linköpings universitet


In recent years, multiple solutions have become available that provide search over huge amounts of plain text and metadata. Scalable search over annotated text, however, still appears to be problematic. With Mtas, an acronym for Multi-Tier Annotation Search, we add annotation layers and structure to the existing Lucene approach of creating and searching indexes, and present an implementation as a Solr plugin that provides both searchability and scalability. We present a configurable indexing process that supports multiple document formats and provides extended search options on both metadata and annotated text, such as advanced statistics, faceting, grouping and keyword-in-context. Mtas is currently used in production environments with up to 15 million documents and 9.5 billion words. Mtas is available from GitHub.

Keywords: Multi tier annotation search, Lucene, SOLR, kwic, statistics
