Author_Institution :
Dept. of Clinical Sci., Univ. of Bergen, Bergen, Norway
Abstract :
A deoxyribonucleic acid (DNA) sequence can be represented as a string over a four-letter alphabet (A, C, G, T). When a particular property of the DNA is studied, for example GC content, the sequence can be reduced to a binary one. In many cases, a segment whose probabilistic properties differ from those of its neighbours can play a structural role. DNA segmentation has therefore received special attention, and it is one of the most significant applications of change-point detection. Problems of this type also arise in a wide variety of other areas, for example seismology, industry (e.g., fault detection), biomedical signal processing, financial mathematics, and speech and image processing. In this study, we have developed a Cross-Entropy algorithm for identifying change-points in binary sequences with first-order Markov dependence. We propose a statistical model for this problem and demonstrate the effectiveness of our algorithm on synthetic and real datasets.
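The abstract does not spell out the model or the algorithm, so the following is only a minimal sketch of how a Cross-Entropy (CE) search for change-points in a binary first-order Markov sequence might look. Everything here is an assumption for illustration: the function names (`gc_binary`, `segment_loglik`, `total_loglik`, `ce_changepoints`) are hypothetical, the number of change-points `k` is fixed in advance, each segment is scored by its maximised Markov log-likelihood (the transition across each boundary is ignored), and each change-point location is sampled from an independent Gaussian whose mean and spread are refitted to the elite samples with smoothing parameter `alpha`.

```python
import numpy as np

def gc_binary(seq):
    """Hypothetical helper: map DNA to 0/1 (1 for G or C, 0 for A or T)."""
    return np.array([1 if b in "GCgc" else 0 for b in seq], dtype=int)

def segment_loglik(x):
    """Maximised log-likelihood of one segment under a first-order binary
    Markov chain; transition probabilities are replaced by their MLEs and
    the first symbol of the segment is conditioned on (not modelled)."""
    if len(x) < 2:
        return 0.0
    counts = np.zeros((2, 2))
    np.add.at(counts, (x[:-1], x[1:]), 1)  # transition counts n_ij
    ll = 0.0
    for i in range(2):
        row = counts[i].sum()
        for j in range(2):
            if counts[i, j] > 0:
                ll += counts[i, j] * np.log(counts[i, j] / row)
    return ll

def total_loglik(x, cps):
    """Sum of segment log-likelihoods; cps are sorted segment start indices.
    Assumption: segments are scored independently of each other."""
    bounds = [0] + list(cps) + [len(x)]
    return sum(segment_loglik(x[a:b]) for a, b in zip(bounds[:-1], bounds[1:]))

def ce_changepoints(x, k, n_samples=200, elite_frac=0.1, alpha=0.7,
                    iters=50, seed=0):
    """Generic CE search for k change-point locations (assumed sampling scheme:
    independent Gaussians, rounded and clipped to valid positions)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = np.linspace(0, n, k + 2)[1:-1]      # evenly spaced initial guesses
    sigma = np.full(k, n / 4.0)              # wide initial spread
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_score = None, -np.inf
    for _ in range(iters):
        draws = rng.normal(mu, sigma, size=(n_samples, k))
        samples = np.sort(np.clip(np.round(draws), 1, n - 1).astype(int), axis=1)
        scores = np.array([total_loglik(x, s) for s in samples])
        elite_idx = np.argsort(scores)[-n_elite:]   # best n_elite samples
        elite = samples[elite_idx]
        if scores[elite_idx[-1]] > best_score:
            best_score, best = scores[elite_idx[-1]], elite[-1]
        # Smoothed CE update of the sampling distribution's parameters.
        mu = alpha * elite.mean(axis=0) + (1 - alpha) * mu
        sigma = alpha * elite.std(axis=0) + (1 - alpha) * sigma
        if sigma.max() < 0.5:                # distribution has degenerated
            break
    return best, best_score
```

Because splitting a segment can never lower the maximised likelihood, this score alone cannot choose the number of change-points; fixing `k` (or adding a penalty term) is a design choice the sketch leaves to the user, and the paper's own statistical model may handle it differently.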
Keywords :
DNA; Markov processes; biology computing; entropy; molecular biophysics; molecular configurations; statistical analysis; DNA segmentation; GC content; binary Markov DNA sequences; biomedical signal processing; change-point detection; cross-entropy method; deoxyribonucleic acid sequence; fault detection; financial mathematics; first-order Markov dependence; image processing; probabilistic properties; real datasets; seismology; speech processing; statistical model; synthetic datasets; Educational institutions; Estimation; Genomics; Optimization; Vectors;