Integrating genomic binding site predictions using real-valued meta classifiers

  • Original Article
Neural Computing and Applications

Abstract

Currently, the best algorithms for predicting transcription factor binding sites in DNA sequences are severely limited in accuracy. There is good reason to believe that predictions from different classes of algorithms could be combined to improve prediction quality. In this paper, we apply single-layer networks, rule sets, support vector machines and the AdaBoost algorithm to the predictions of 12 key real-valued algorithms. Furthermore, we use a ‘window’ of consecutive results as the input vector, so that each prediction is placed in the context of its neighbours. We improve the classification result with the aid of under- and over-sampling techniques. We find that support vector machines and the AdaBoost algorithm outperform the original individual algorithms and the other classifiers employed in this work. In particular, they give a better trade-off between recall and precision.
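The windowing and sampling steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the zero padding at the sequence ends, the window half-width, and the use of plain random under-sampling of the negative class are all assumptions made for illustration only.

```python
import random
from typing import List, Sequence, Tuple


def window_features(scores: Sequence[Sequence[float]],
                    half_width: int) -> List[List[float]]:
    """Build one input vector per sequence position.

    scores[i] holds the real-valued outputs of all base predictors at
    position i. Each output vector concatenates the scores of the centre
    position and of half_width neighbours on each side. Positions past
    either end are padded with zeros (an assumption, not from the paper).
    """
    n_predictors = len(scores[0]) if scores else 0
    padding = [0.0] * n_predictors
    vectors: List[List[float]] = []
    for centre in range(len(scores)):
        vector: List[float] = []
        for pos in range(centre - half_width, centre + half_width + 1):
            row = scores[pos] if 0 <= pos < len(scores) else padding
            vector.extend(row)
        vectors.append(vector)
    return vectors


def undersample_majority(vectors: Sequence[List[float]],
                         labels: Sequence[int],
                         ratio: float = 1.0,
                         seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Randomly drop negative examples (label 0, assumed to be the majority)
    until at most ratio * n_positive of them remain."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(negatives), int(ratio * len(positives)))
    kept = sorted(positives + rng.sample(negatives, n_keep))
    return [vectors[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Toy data: 5 positions, 12 hypothetical base-predictor scores each.
    toy_scores = [[0.1 * p] * 12 for p in range(5)]
    toy_labels = [0, 0, 1, 0, 0]
    X = window_features(toy_scores, half_width=1)
    X_bal, y_bal = undersample_majority(X, toy_labels)
    print(len(X[0]), y_bal)
```

With 12 base predictors and a half-width of 1, each input vector has 12 × 3 = 36 entries. The balanced vectors could then be passed to any meta classifier, such as a support vector machine or a boosted ensemble.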

Author information

Corresponding author

Correspondence to Yi Sun.

About this article

Cite this article

Sun, Y., Robinson, M., Adams, R. et al. Integrating genomic binding site predictions using real-valued meta classifiers. Neural Comput & Applic 18, 577–590 (2009). https://doi.org/10.1007/s00521-008-0204-4

  • DOI: https://doi.org/10.1007/s00521-008-0204-4