DNA pattern recognition using canonical correlation algorithm

BK Sarkar, C Chakraborty - Journal of biosciences, 2015 - Springer
Abstract
We used canonical correlation analysis (CCA) as an unsupervised statistical tool that describes related views of the same semantic object in order to identify patterns. A CCA-based pattern recognition technique was proposed for finding a required genetic code in a DNA sequence. Two related but different objects were considered: one was a particular pattern, and the other was a test DNA sequence. CCA found correlations between these two observations, the pattern and the test sequence. We conclude that the correlation reaches its maximum at the position where the pattern occurs. As a case study, the potential of CCA was demonstrated on sequences from HIV-1 preferred integration sites. The subsequences flanking the integration site on the left and on the right were taken as the two views, and statistically significant relationships were established between these views to elucidate viral preference as an important factor in the correlation.
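The idea of scoring a pattern against each position of a test sequence can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoding or the windowing, so the one-hot encoding, the sliding window, and the helper names `one_hot`, `first_canonical_correlation` and `scan` are all assumptions made for illustration. The first canonical correlation is computed in the standard way, as the largest singular value of the product of orthonormal bases of the two centred views.

```python
import numpy as np

NUCLEOTIDES = "ACGT"  # assumed alphabet; ambiguity codes are not handled


def one_hot(seq):
    """Encode a DNA string as a (length x 4) indicator matrix (assumed encoding)."""
    idx = [NUCLEOTIDES.index(c) for c in seq]
    return np.eye(4)[idx]


def _orthonormal_basis(M, tol=1e-10):
    """Orthonormal basis of the column space of M after centring each column."""
    M = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol]  # drop directions with (numerically) zero variance


def first_canonical_correlation(X, Y):
    """Largest canonical correlation between views X and Y (rows are samples)."""
    Qx = _orthonormal_basis(X)
    Qy = _orthonormal_basis(Y)
    if Qx.shape[1] == 0 or Qy.shape[1] == 0:
        return 0.0  # a constant view (e.g. "AAAA") carries no variance
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clamp tiny floating-point overshoot


def scan(pattern, sequence):
    """Score every window of the test sequence against the pattern."""
    P = one_hot(pattern)
    L = len(pattern)
    return [
        first_canonical_correlation(P, one_hot(sequence[i:i + L]))
        for i in range(len(sequence) - L + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    scores = scan("ACGTTGCA", "GGGGACGTTGCAGGGG")
    for position, score in enumerate(scores):
        print(position, round(score, 3))
```

In this sketch the window that contains the pattern attains the maximal correlation of 1. Because CCA is invariant to invertible linear transformations of each view, a window that differs from the pattern only by a consistent relabelling of nucleotides would also score 1 under this encoding, so a practical detector would likely need an additional exact-match check.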