Reflections and a Proposal for a Query and Reporting Language for Richly Annotated Multiparallel Corpora
Simon Clematide: Institute of Computational Linguistics, University of Zurich, Switzerland
Large and open multiparallel corpora are a valuable resource for contrastive corpus linguists if the data is annotated and stored in a way that allows precise and flexible ad hoc searches. A linguistic query language should also support computational linguists in automated multilingual data mining. We review a broad range of approaches for linguistic query and reporting languages according to usability criteria such as expressibility, expressiveness, and efficiency. We propose an architecture that tries to strike the right balance to suit practical purposes.

Keywords: multiparallel corpora; treebank query languages; text corpus query languages

