Title: Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes
Authors: C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B. Kell, P. Reiser, and R.D. King
Series: Linköping Electronic Articles in Computer and Information Science
ISSN: 1401-9841
Issue: Vol. 6 (2001), No. 012
URL: http://www.ep.liu.se/ea/cis/2001/012/

Abstract: We aim to partially automate some aspects of scientific work, namely the processes of forming hypotheses, devising trials to discriminate between these competing hypotheses, physically performing these trials and then using the results of these trials to converge upon an accurate hypothesis. We have developed ASE-Progol, an Active Learning system which uses Inductive Logic Programming (ILP) to construct hypothesised first-order theories and uses a CART-like algorithm to select trials for eliminating ILP-derived hypotheses. We have also developed a novel form of learning curve which, in contrast to the form of learning curve normally used in Active Learning, allows one to compare the costs incurred by different learning strategies.
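To make the idea of selecting trials that discriminate between competing hypotheses more concrete, the following is a minimal sketch, not ASE-Progol's actual algorithm. It assumes each hypothesis carries a prior weight and a yes/no prediction for every candidate trial, and it scores a trial by the entropy of the predicted split divided by the trial's cost. The function names, the data layout and the scoring rule are all assumptions made for illustration.

```python
import math


def split_entropy(p):
    """Binary entropy (in bits) of a predicted-outcome probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_trial(hypotheses, trials, costs):
    """Return the trial with the highest split entropy per unit cost.

    hypotheses: dict name -> (prior, predictions), where predictions is a
                dict trial -> bool (hypothetical layout, for illustration)
    trials:     iterable of candidate trial names
    costs:      dict trial -> positive cost
    """
    total = sum(prior for prior, _ in hypotheses.values())
    best, best_score = None, -1.0
    for trial in trials:
        # Prior-weighted probability that the trial outcome is True.
        p_true = sum(
            prior for prior, pred in hypotheses.values() if pred[trial]
        ) / total
        score = split_entropy(p_true) / costs[trial]
        if score > best_score:
            best, best_score = trial, score
    return best


def eliminate(hypotheses, trial, outcome):
    """Keep only the hypotheses whose prediction matches the observed outcome."""
    return {
        name: (prior, pred)
        for name, (prior, pred) in hypotheses.items()
        if pred[trial] == outcome
    }
```

Under this scoring, a trial on which all hypotheses agree has zero entropy and is never preferred, while of two equally informative trials the cheaper one is chosen. Repeating select_trial and eliminate until one hypothesis remains gives a simple active learning loop.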

We plan to combine ASE-Progol with a standard laboratory robot to create a general automated approach to Functional Genomics. As a first step towards this goal, we are using ASE-Progol to rediscover how genes participate in the aromatic amino acid pathway of Saccharomyces cerevisiae. Our approach involves auxotrophic mutant trials. To date, ASE-Progol has conducted such trials in silico. However, we describe how they will be performed automatically in vitro by a standard laboratory robot designed for this sort of liquid handling task, namely the Beckman/Coulter Biomek 2000.

Although our work to date has been limited to trials conducted in silico, the results have been encouraging. Parts of the model were removed and the ability of ASE-Progol to efficiently recover the performance of the model was measured. The cost of the chemicals consumed in converging upon a hypothesis with an accuracy in the range 46-88% was lower when trials were selected by ASE-Progol than when they were sampled at random (without replacement). To reach an accuracy in the range 46-80%, ASE-Progol incurs experimental costs five orders of magnitude lower than random sampling. ASE-Progol also requires less time to converge upon a hypothesis with an accuracy in the range 74-87% than if trials are sampled at random (without replacement) or selected using the naive strategy of always choosing the cheapest trial from the set of candidate trials. For example, to reach an accuracy of 80%, ASE-Progol requires 4 days while random sampling requires 6 days and the naive strategy requires 10 days.
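The cost comparisons above rest on a learning curve that plots accuracy against cumulative cost rather than against the number of trials. A minimal sketch of that bookkeeping follows. The function names are hypothetical, and the numbers in the usage example are made up for illustration; they are not results from the paper.

```python
from itertools import accumulate


def cost_learning_curve(trial_costs, accuracies):
    """Pair the cumulative cost after each trial with the accuracy reached.

    trial_costs: cost of each trial, in the order the trials were run
    accuracies:  accuracy of the hypothesis after each corresponding trial
    """
    if len(trial_costs) != len(accuracies):
        raise ValueError("trial_costs and accuracies must have equal length")
    return list(zip(accumulate(trial_costs), accuracies))


def cost_to_reach(curve, target):
    """Return the cumulative cost at which accuracy first reaches target.

    Returns None if the curve never reaches the target accuracy.
    """
    for cumulative_cost, accuracy in curve:
        if accuracy >= target:
            return cumulative_cost
    return None


if __name__ == "__main__":
    # Made-up values for two hypothetical strategies, for illustration only.
    strategy_a = cost_learning_curve([1, 2, 4], [0.50, 0.70, 0.85])
    strategy_b = cost_learning_curve([5, 5, 5], [0.40, 0.60, 0.82])
    print(cost_to_reach(strategy_a, 0.80), cost_to_reach(strategy_b, 0.80))
```

Comparing cost_to_reach across strategies at the same target accuracy is one way to express statements of the form "strategy A reaches 80% accuracy at lower cost than strategy B".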


Original publication: 2001-08-30