Title: Finding Genes from DNA Sequences by Using Multi-Stream Hidden Markov Models
Authors: Kiyoshi Asai
Series: Linköping Electronic Articles in Computer and Information Science
ISSN 1401-9841
Issue: Vol. 6 (2001), No. 017
URL: http://www.ep.liu.se/ea/cis/2001/017/

Abstract: Recognizing genes in DNA sequences requires integrating various types of information. This information may include the four letters of the DNA sequence, the stochastic frequencies of the corresponding codons, homology search scores, and splice site signal strengths. We have developed a software system for multi-stream Hidden Markov Models (HMMs) with which this information can be easily integrated under a consistent probabilistic measure. The output symbols of an HMM are the signals that we can observe. In bioinformatics, the output symbols of HMMs are mostly the four letters of nucleic acids or the twenty letters of amino acids. However, the output symbols can be anything, as long as a probability distribution can be attached to them. They can be discrete symbols, real values, real-valued vectors, or multiple streams of such values. We propose gene annotation using HMMs with multiple streams, which combine the sequence with other pre-processed information. An important feature of multi-stream HMMs is that the weights of the streams can be optimized for each model. A multi-stream HMM with adjustable weights permits very flexible design of gene-finding systems.
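
The abstract does not state how the streams are combined. One common convention for multi-stream HMMs, assumed here for illustration only, is to raise each stream's output probability to its stream weight and multiply, which becomes a weighted sum in log space. The sketch below shows this for a single hypothetical "exon" state with two streams: a discrete nucleotide and a real-valued homology score modeled as a Gaussian. The state, the probability table, the Gaussian parameters, and the weights are all made-up values, not taken from the article.

```python
import math


def multistream_log_emission(stream_log_probs, weights):
    """Combine per-stream log-probabilities as a weighted sum.

    Assumed convention: b(o) = prod_s b_s(o_s) ** w_s, computed in log space.
    """
    if len(stream_log_probs) != len(weights):
        raise ValueError("one weight per stream is required")
    return sum(w * lp for w, lp in zip(weights, stream_log_probs))


def gaussian_log_pdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


# Hypothetical output distribution of an "exon" state over the four DNA letters.
EXON_NUCLEOTIDE_PROBS = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}


def exon_log_emission(base, homology_score, weights=(1.0, 0.5)):
    """Log emission score of the hypothetical exon state for one position.

    Stream 1: discrete nucleotide symbol.
    Stream 2: real-valued homology score (illustrative mean and variance).
    """
    lp_base = math.log(EXON_NUCLEOTIDE_PROBS[base])
    lp_score = gaussian_log_pdf(homology_score, mean=2.0, var=1.0)
    return multistream_log_emission([lp_base, lp_score], weights)


if __name__ == "__main__":
    # Setting the second weight to 0 ignores the homology stream entirely.
    print(exon_log_emission("G", 2.0, weights=(1.0, 0.0)))
    print(exon_log_emission("G", 2.0, weights=(1.0, 0.5)))
```

Because the weights enter only as multipliers on the per-stream log-probabilities, each model can carry its own weight vector, which is the kind of per-model adjustment the abstract describes. Note that with weights other than 1 the combined score is no longer a normalized probability, so it is best read as a score under this assumed convention.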
