| Abstract: |
It remains unclear what principles determine which fold a protein sequence/structure
adopts. Local properties, such as the arrangement of secondary
structure elements adjacent in sequence, or global properties, such as the
total number of secondary structure elements, may act as constraints on
the type of fold that a protein can adopt. Such constraints might be considered
"signatures" of a given fold and their identification would be useful for
the classification of protein structure. Inductive Logic Programming (ILP)
has been applied to the problem of automatic identification of structural
signatures. The signatures generated by ILP can be both readily interpreted
by a protein structure expert and tested for accuracy. A previous
application of ILP to this problem indicated that large insertions/deletions
in proteins are an obstacle to learning rules that effectively discriminate
between positive and negative examples of a given fold. Here, we apply an
ILP learning scheme that reduces this problem by employing the structural
superposition of protein domains with similar folds. This was done in three
basic steps. Firstly, a multiple alignment of domains was generated for
each type of fold studied. Secondly, the alignment was used to determine
the secondary structure elements in each of those domains that can be considered
equivalent to one another (the "core" elements of that fold). Thirdly, an
ILP learning experiment was conducted to learn rules defining a fold in
terms of those core elements. |
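
The second step, identifying "core" secondary structure elements from a multiple alignment, can be illustrated with a minimal sketch. It is not the authors' procedure: the input encoding (one equal-length string per domain, with `H` for helix, `E` for strand and `-` for anything else), the function name `core_elements`, and the `min_fraction` and `min_length` thresholds are all assumptions made for illustration. A column is treated as conserved when a sufficient fraction of domains share the same element type there, and contiguous conserved columns of one type form a candidate core element.

```python
from collections import Counter
from itertools import groupby


def core_elements(aligned_ss, min_fraction=1.0, min_length=2):
    """Return (type, start, end) runs of conserved alignment columns.

    aligned_ss   -- hypothetical input: equal-length strings, one per domain,
                    using 'H' (helix), 'E' (strand) and '-' (gap or coil).
    min_fraction -- assumed threshold: fraction of domains that must share
                    the same element type in a column.
    min_length   -- assumed threshold: minimum run length, in columns.
    The end index is exclusive.
    """
    if not aligned_ss:
        return []
    length = len(aligned_ss[0])
    if any(len(row) != length for row in aligned_ss):
        raise ValueError("all aligned strings must have the same length")

    # Label each column with its majority element type, or None if the
    # majority does not reach min_fraction of the domains.
    labels = []
    for column in zip(*aligned_ss):
        counts = Counter(s for s in column if s in "HE")
        if counts:
            sse, n = counts.most_common(1)[0]
        else:
            sse, n = None, 0
        conserved = sse is not None and n / len(column) >= min_fraction
        labels.append(sse if conserved else None)

    # Merge contiguous columns with the same label into candidate elements.
    cores = []
    position = 0
    for label, run in groupby(labels):
        run_length = len(list(run))
        if label is not None and run_length >= min_length:
            cores.append((label, position, position + run_length))
        position += run_length
    return cores


# Toy alignment of three domains (invented data, for illustration only).
aligned = [
    "-HHHH--EEE----EEE-",
    "HHHHH--EEE-HH-EEE-",
    "--HHH---EE----EEEE",
]
print(core_elements(aligned))
```

In this toy alignment the helix found in only the second domain (columns 11 to 12) is excluded, which mirrors the idea of ignoring insertions when defining the elements a fold's rules are learned over. Lowering `min_fraction` relaxes the requirement that every domain contain the element.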