Title: Automatic determination of protein fold signatures from structured superposition
Authors: A.P. Cootes, S.H. Muggleton, and M.J. Sternberg
Series: Linköping Electronic Articles in Computer and Information Science
ISSN: 1401-9841
Issue: Vol. 6 (2001), No. 026
URL: http://www.ep.liu.se/ea/cis/2001/026/

Abstract: It remains unclear what principles determine which fold a given protein sequence or structure adopts. Local properties, such as the arrangement of secondary structure elements adjacent in sequence, or global properties, such as the total number of secondary structure elements, may act as constraints on the type of fold that a protein can adopt. Such constraints might be considered "signatures" of a given fold, and their identification would be useful for the classification of protein structure. Inductive Logic Programming (ILP) has been applied to the problem of automatically identifying structural signatures. The signatures generated by ILP can be both readily interpreted by a protein structure expert and tested for their accuracy. A previous application of ILP to this problem indicated that large insertions/deletions in proteins are an obstacle to learning rules that effectively discriminate between positive and negative examples of a given fold. Here, we apply an ILP learning scheme that reduces this problem by employing the structural superposition of protein domains with similar folds. This was done in three basic steps. Firstly, a multiple alignment of domains was generated for each type of fold studied. Secondly, the alignment was used to determine which secondary structure elements in each of those domains can be considered equivalent to one another (the "core" elements of that fold). Thirdly, an ILP learning experiment was conducted to learn rules defining a fold in terms of those core elements.
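
The second step of the abstract (picking out "core" elements from a multiple alignment) can be illustrated with a small sketch. This is not the authors' implementation: the input representation, the function names `core_columns` and `core_facts`, the `min_fraction` threshold, and the `core_element/3` fact format are all assumptions made for illustration. Each domain is given as one row of a toy alignment whose columns hold a secondary structure element type or `None` for a gap, and a column counts as "core" when enough domains have an element there.

```python
from typing import Dict, List, Optional

# Hypothetical toy representation: one alignment row per domain. Each column
# holds a secondary structure element type, or None where the domain has a gap
# (for example because of an insertion/deletion relative to the other domains).
Alignment = Dict[str, List[Optional[str]]]


def core_columns(alignment: Alignment, min_fraction: float = 1.0) -> List[int]:
    """Return indices of columns occupied in at least min_fraction of domains."""
    rows = list(alignment.values())
    if not rows:
        return []
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all alignment rows must have the same length")
    core = []
    for col in range(n_cols):
        occupied = sum(1 for row in rows if row[col] is not None)
        if occupied / len(rows) >= min_fraction:
            core.append(col)
    return core


def core_facts(alignment: Alignment, min_fraction: float = 1.0) -> List[str]:
    """Describe each domain by its core elements only, as Prolog-style facts.

    The core_element/3 predicate is an assumed format, not taken from the paper.
    """
    cols = core_columns(alignment, min_fraction)
    facts = []
    for name, row in alignment.items():
        # Core elements are renumbered 1..k so equivalent elements share an index.
        for position, col in enumerate(cols, start=1):
            if row[col] is not None:
                facts.append(f"core_element({name}, {position}, {row[col]}).")
    return facts


# Made-up example data: three domains of one fold, five alignment columns.
example = {
    "d1": ["strand", "helix", None, "strand", "helix"],
    "d2": ["strand", "helix", "helix", "strand", None],
    "d3": ["strand", "helix", None, "strand", "helix"],
}

print(core_columns(example))        # [0, 1, 3]
print(core_columns(example, 0.6))   # [0, 1, 3, 4]
for fact in core_facts(example):
    print(fact)
```

Facts of this kind, restricted to core elements, are the sort of background knowledge an ILP system could take as input in the third step. Requiring every domain to occupy a column (`min_fraction = 1.0`) is the strictest choice, and lowering the threshold tolerates elements missing from a few domains.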

Original publication: 2001-08-30