|Title:||Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes|
|Authors:||C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B. Kell, P. Reiser, and R.D. King|
|Series:||Linköping Electronic Articles in Computer and Information Science|
|Issue:||Vol. 6 (2001), No. 012|
|Abstract:|| We aim to partially automate some aspects of scientific work,
namely the processes of forming hypotheses, devising trials to discriminate
between these competing hypotheses, physically performing these trials and
then using the results of these trials to converge upon an accurate hypothesis.
We have developed ASE-Progol, an Active Learning system which uses Inductive
Logic Programming to construct hypothesised first-order theories and uses
a CART-like algorithm to select trials for eliminating ILP derived hypotheses.
We have developed a novel form of learning curve which, in contrast to the
form of learning curve normally used in Active Learning, allows one to compare
the costs incurred by different learning strategies.
We plan to combine ASE-Progol with a standard laboratory robot to create a general automated approach to Functional Genomics. As a first step towards this goal, we are using ASE-Progol to rediscover how genes participate in the aromatic amino acid pathway of Saccharomyces cerevisiae. Our approach involves auxotrophic mutant trials. To date, ASE-Progol has conducted such trials in silico. However, we describe how they will be performed automatically in vitro by a standard laboratory robot designed for these sorts of liquid handling tasks, namely the Beckman/Coulter Biomek 2000.
Although our work to date has been limited to trials conducted in silico, the results have been encouraging. Parts of the model were removed and the ability of ASE-Progol to efficiently recover the performance of the model was measured. The cost of the chemicals consumed in converging upon a hypothesis with an accuracy in the range 46-88% was lower when trials were selected by ASE-Progol than when they were sampled at random (without replacement). To reach an accuracy in the range 46-80%, ASE-Progol incurs experimental costs five orders of magnitude lower than random sampling. ASE-Progol also requires less time to converge upon a hypothesis with an accuracy in the range 74-87% than either random sampling (without replacement) or the naive strategy of always choosing the cheapest trial from the set of candidate trials. For example, to reach an accuracy of 80%, ASE-Progol requires 4 days, while random sampling requires 6 days and the naive strategy requires 10 days.
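The contrast between cost-aware trial selection and the naive cheapest-trial strategy can be illustrated with a small Python sketch. This is an illustration under stated assumptions, not ASE-Progol's actual selection function: the scoring rule (expected probability mass of hypotheses eliminated per unit cost), the names `expected_eliminated_mass` and `select_trial`, and all hypotheses, priors, trial names and costs are hypothetical.

```python
from typing import Dict

# Hypothetical priors over four competing hypotheses (assumed uniform).
PRIORS: Dict[str, float] = {"h1": 0.25, "h2": 0.25, "h3": 0.25, "h4": 0.25}

# Hypothetical candidate trials: each has a cost and, for every hypothesis,
# the binary outcome that hypothesis predicts (e.g. growth / no growth).
TRIALS: Dict[str, dict] = {
    "t_cheap": {"cost": 1.0,
                "predictions": {"h1": True, "h2": True, "h3": True, "h4": True}},
    "t_mid": {"cost": 2.0,
              "predictions": {"h1": True, "h2": False, "h3": False, "h4": False}},
    "t_split": {"cost": 10.0,
                "predictions": {"h1": True, "h2": True, "h3": False, "h4": False}},
}


def expected_eliminated_mass(predictions: Dict[str, bool],
                             priors: Dict[str, float]) -> float:
    """Expected prior mass of hypotheses refuted by running the trial.

    If p is the total prior mass predicting a positive outcome, a positive
    result refutes mass (1 - p) and a negative result refutes mass p, so the
    expectation is p * (1 - p) + (1 - p) * p = 2 * p * (1 - p).
    """
    p_true = sum(priors[h] for h, outcome in predictions.items() if outcome)
    return 2.0 * p_true * (1.0 - p_true)


def select_trial(trials: Dict[str, dict], priors: Dict[str, float]) -> str:
    """Pick the trial with the highest expected eliminated mass per unit cost."""
    def score(name: str) -> float:
        trial = trials[name]
        return expected_eliminated_mass(trial["predictions"], priors) / trial["cost"]
    return max(trials, key=score)


def cheapest_trial(trials: Dict[str, dict]) -> str:
    """Naive baseline: always choose the cheapest candidate trial."""
    return min(trials, key=lambda name: trials[name]["cost"])


chosen = select_trial(TRIALS, PRIORS)
naive = cheapest_trial(TRIALS)
print(chosen, naive)
```

In this toy setting the cheapest trial is predicted identically by every hypothesis, so running it cannot eliminate any of them, whereas the cost-aware score prefers a slightly more expensive trial that discriminates between hypotheses.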
| Original publication