Abstract
Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the “inverse” statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
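The hard-rod picture mentioned above can be illustrated with a minimal sketch. It assumes a lattice of DNA positions on which a bound TF covers `rod_len` consecutive sites and cannot overlap another bound TF. The function names, the energy array, and the chemical potential `mu` are hypothetical choices made for illustration and are not taken from the paper. The recursion sums over all non-overlapping binding configurations in the same way that an HMM forward pass sums over hidden-state paths, and a brute-force enumeration is included as a check.

```python
import math
from itertools import product


def hard_rod_partition(energies, rod_len, mu=0.0):
    """Partition function of non-overlapping rods on a 1D lattice.

    energies[i] is an assumed binding energy (in units of kT) for a rod
    whose left end sits at site i; mu is an assumed chemical potential.
    """
    n = len(energies) + rod_len - 1  # total number of lattice sites
    z = [1.0] * (n + 1)              # z[k]: partition function of the first k sites
    for k in range(1, n + 1):
        z[k] = z[k - 1]              # the last of the k sites is empty
        if k >= rod_len:             # or a rod ends on the last of the k sites
            start = k - rod_len
            z[k] += z[start] * math.exp(mu - energies[start])
    return z[n]


def brute_force_partition(energies, rod_len, mu=0.0):
    """Same quantity by explicit enumeration (only feasible for short lattices)."""
    total = 0.0
    for occupancy in product([0, 1], repeat=len(energies)):
        starts = [i for i, occupied in enumerate(occupancy) if occupied]
        # Rods must not overlap: consecutive left ends are at least rod_len apart.
        if all(b - a >= rod_len for a, b in zip(starts, starts[1:])):
            total += math.exp(sum(mu - energies[i] for i in starts))
    return total
```

For short lattices the two functions should agree to floating-point precision, while the recursion scales linearly with sequence length. Learning the binding energies from sequence data, which is the "inverse" problem the abstract refers to, is not attempted in this sketch.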
About this article
Cite this article
Mehta, P., Schwab, D.J. & Sengupta, A.M. Statistical Mechanics of Transcription-Factor Binding Site Discovery Using Hidden Markov Models. J Stat Phys 142, 1187–1205 (2011). https://doi.org/10.1007/s10955-010-0102-x
DOI: https://doi.org/10.1007/s10955-010-0102-x