The fundamental problem of computational linguistics is the modelling of the basic linguistic processes: comprehension, production and learning of language. It includes central AI problems such as perception, communication, knowledge, planning, reasoning and learning. On the application side, information retrieval, including text mining, and machine translation currently seem to dominate.
By tradition, the term Computational Linguistics refers to written language, whereas Speech Technology is used for the analysis and synthesis of spoken language. In recent years, Language Technology has emerged as a common term covering both modes of language. The division into Speech Technology and Computational Linguistics was mainly based on different scientific traditions and methods: statistical methods dominated in speech technology, whereas symbolic processing dominated in computational linguistics. However, the statistical methods originally developed for the analysis of speech were found to be more or less directly applicable to the fairly new paradigm of data-driven machine translation. The focus below is on machine translation.
Intuitively, translation may be understood as comprehending a source language text and producing an equivalent text in the target language. In terms of AI, or CL, it would imply modelling the comprehension process and the production process, respectively.
Assuming that the comprehension process would result in a complete, language-independent meaning representation, an interlingua, this representation might be the starting point of the production process, and there would be no need for a specific translation step. In such an approach, knowledge becomes a central issue: how should it be identified, and how should it be represented? This ideal approach to machine translation, the interlingua approach, has been explored by many researchers with interesting findings but no viable translation system as a result. The Google translation service is about as far from the interlingua model as one can get, since it does not use any refined knowledge of language at all.
AI researchers in the 1970s, such as Roger Schank, Terry Winograd and Yorick Wilks, made substantial and innovative contributions to the exploration of language comprehension and production, illustrating the relevance of knowledge (in particular world knowledge), planning and reasoning in the use of language. Their approaches were exclusively symbolic, essentially excluding machine learning of language. Currently, the data-driven approach dominates the field of machine translation, and attempts are being made to combine or complement it with rule-based methods in order to overcome the inherent limitations of the approach.