Periodic pattern detection in sparse boolean sequences

I Junier, J Hérisson, F Képès - Algorithms for Molecular Biology, 2010 - Springer
Background
The specific position of functionally related genes along the DNA has been shown to reflect the interplay between chromosome structure and genetic regulation. By investigating the statistical properties of the distances separating such genes, several studies have highlighted various periodic trends. In many cases, however, groups built from co-functional or co-regulated genes are small and contain erroneous information (data contamination), so that the statistics are difficult to exploit. In addition, gene positions are not expected to follow a perfectly ordered pattern along the DNA. Within this scope, we present an algorithm that aims to highlight periodic patterns in sparse boolean sequences, i.e. sequences of the type 010011011010... in which the ratio of the number of 1's (here denoting the transcription start of a gene) to the number of 0's is small.
Results
The algorithm is particularly robust with respect to strong signal distortions such as the addition of 1's at arbitrary positions (contaminated data), the deletion of existing 1's in the sequence (missing data) and the presence of disorder in the position of the 1's (noise). This robustness property stems from an appropriate exploitation of the remarkable alignment properties of periodic points in solenoidal coordinates.
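To make the three distortions and the alignment idea concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes that a solenoidal coordinate amounts to wrapping each position x around a circle of circumference P (phase 2πx/P), and it uses the mean resultant length from circular statistics as a stand-in alignment measure. The names `make_sequence` and `alignment_score`, and all parameter values, are hypothetical.

```python
import math
import random


def make_sequence(length, period, n_missing, n_extra, jitter, seed=0):
    """Return sorted positions of the 1's of a distorted periodic pattern.

    Hypothetical helper: parameters are illustrative, not from the paper.
    """
    rng = random.Random(seed)
    periodic = list(range(0, length, period))
    # Missing data: delete some of the periodic 1's.
    kept = rng.sample(periodic, len(periodic) - n_missing)
    # Noise: shift each remaining 1 by a small random offset.
    noisy = [(p + rng.randint(-jitter, jitter)) % length for p in kept]
    # Contamination: add 1's at arbitrary positions.
    extra = [rng.randrange(length) for _ in range(n_extra)]
    return sorted(set(noisy + extra))


def alignment_score(positions, period):
    """Mean resultant length of the phases 2*pi*x/period, between 0 and 1.

    Assumed stand-in for an alignment measure: values near 1 mean the 1's
    line up at the same phase once wrapped around a circle of that period.
    """
    c = sum(math.cos(2 * math.pi * x / period) for x in positions)
    s = sum(math.sin(2 * math.pi * x / period) for x in positions)
    return math.hypot(c, s) / len(positions)


if __name__ == "__main__":
    ones = make_sequence(length=10_000, period=100,
                         n_missing=30, n_extra=20, jitter=5)
    for trial_period in (73, 100):
        print(trial_period, round(alignment_score(ones, trial_period), 3))
```

In this sketch the score at the true period stays high despite the deleted, shifted, and added 1's, because the surviving periodic points still share nearly the same phase, whereas at an unrelated trial period the phases spread around the circle and largely cancel.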
Conclusions
The efficiency of the algorithm is demonstrated in situations where standard Fourier-based spectral methods are poorly adapted. We also show how the proposed framework makes it possible to identify the 1's that participate in the periodic trends, i.e. how it allows a positional score to be allocated to genes, in the same spirit as the sequence score. The software is available for public use at http://www.issb.genopole.fr/MEGA/Softwares/iSSB_SolenoidalApplication.zip .
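The idea of scoring individual 1's can be illustrated with a short Python sketch. This is an assumed, simplified score and not the positional score defined in the paper: each position is wrapped to a phase for a given period, and its score is the cosine of its offset from the mean phase of all positions. The function name `positional_scores` and the example positions are hypothetical.

```python
import math


def positional_scores(positions, period):
    """Map each position to cos(phase - mean phase), a value in [-1, 1].

    Assumed illustrative score: close to 1 for 1's aligned with the
    dominant phase at this period, close to -1 for 1's in antiphase.
    """
    phases = [2 * math.pi * (x % period) / period for x in positions]
    mean_phase = math.atan2(sum(math.sin(a) for a in phases),
                            sum(math.cos(a) for a in phases))
    return {x: math.cos(a - mean_phase) for x, a in zip(positions, phases)}


if __name__ == "__main__":
    # Five positions close to multiples of 100, plus one outlier at 350.
    scores = positional_scores([0, 100, 201, 299, 400, 350], period=100)
    for position, score in sorted(scores.items()):
        print(position, round(score, 3))
```

With these example positions, the five 1's near multiples of 100 receive scores close to 1, while the 1 at position 350 sits half a period away and receives a score close to -1, which is how such a score could flag likely contaminants.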