DOI: 10.1145/1854776.1854786
research-article

Relational operators for prioritizing candidate biomarkers in high-throughput differential expression data

Published: 02 August 2010

Abstract

Recent developments in high-throughput proteomics technologies have made it possible to detect and identify low-abundance proteins, providing a new window through which proteomes can be analyzed. Despite this promise, the contribution of mass spectrometry-based proteomics to identifying novel diagnostic biomarkers has been disappointing. This failure has been attributed, in part, to the lack of effective strategies for determining which candidate biomarkers justify more expensive and time-consuming validation studies. An approach that bridges the gap between the unbiased experimental paradigm, which emphasizes comprehensive characterization of proteins, and a candidate-driven paradigm would overcome this limitation [38]. To this end, we have developed database operators that extend database management systems to analyze high-throughput proteomics and genomics data. By analyzing differentially expressed genes and proteins against pathway databases, these operators take advantage of established expert domain knowledge in pathway annotation to prioritize candidate biomarkers. To test the operators, we analyzed a dataset of salivary proteins differentially expressed between pre-malignant and malignant oral lesions. Six proteins were identified as candidate biomarkers worthy of validation studies. A literature search revealed that these high-priority candidates interact with proteins implicated in cancer development, which highlights their potential utility as biomarkers and demonstrates the effectiveness of our operators. The operators help overcome one of the main challenges of high-throughput computational techniques: they provide a systematic way of bridging the gap between the unbiased, data-driven approach and the hypothesis-driven approach, so that candidate biomarkers worthy of more expensive and time-consuming validation studies can be prioritized.
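The abstract does not specify how the operators are defined, so the following is only an illustrative sketch of the general idea of prioritizing differentially expressed proteins inside a relational database using pathway annotation. It is not the authors' operators. The table names (`diff_expr`, `pathway_member`), the function `prioritize`, the `min_abs_log2_fc` threshold, and the scoring rule (rank by the number of distinct annotated pathways) are all assumptions made for the example.

```python
import sqlite3


def prioritize(diff_expr, pathway_members, min_abs_log2_fc=1.0):
    """Rank differentially expressed proteins by pathway membership.

    Hypothetical scoring rule (an assumption, not the paper's method):
    a protein scores one point per distinct pathway it is annotated to.

    diff_expr:        iterable of (protein, log2_fold_change)
    pathway_members:  iterable of (pathway, protein)
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: one table of expression changes and one
        # table of pathway annotations.
        conn.execute(
            "CREATE TABLE diff_expr (protein TEXT PRIMARY KEY, log2_fc REAL)"
        )
        conn.execute("CREATE TABLE pathway_member (pathway TEXT, protein TEXT)")
        conn.executemany("INSERT INTO diff_expr VALUES (?, ?)", diff_expr)
        conn.executemany(
            "INSERT INTO pathway_member VALUES (?, ?)", pathway_members
        )

        # Keep proteins whose change passes the threshold, join them to
        # their pathway annotations, and order by annotation count.
        # Proteins with no annotation are dropped by the inner join.
        rows = conn.execute(
            """
            SELECT d.protein, COUNT(DISTINCT m.pathway) AS n_pathways
            FROM diff_expr AS d
            JOIN pathway_member AS m ON m.protein = d.protein
            WHERE ABS(d.log2_fc) >= ?
            GROUP BY d.protein
            ORDER BY n_pathways DESC, d.protein ASC
            """,
            (min_abs_log2_fc,),
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    expr = [("P1", 2.0), ("P2", -1.5), ("P3", 0.2), ("P4", 3.0)]
    members = [
        ("pwA", "P1"), ("pwB", "P1"),
        ("pwA", "P2"),
        ("pwA", "P3"), ("pwB", "P3"), ("pwC", "P3"),
    ]
    for protein, n_pathways in prioritize(expr, members):
        print(protein, n_pathways)
```

Expressing the ranking as a single query keeps the prioritization inside the database, which is the setting the abstract describes; a real operator would presumably use richer pathway information than a simple membership count.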

References

[1]
T. Andreasen, H. Bulskov, and R. Knappe. On ontology-based querying. In Flexible query answering systems: recent advances: proceedings of the Fourth International Conference on Flexible Query Answering Systems, FQAS'2000, October 25--28, 2000, Warsaw, Poland, page 15. Physica Verlag, 2000.
[2]
G. Bebek and J. Yang. PathFinder: mining signal transduction pathway segments from protein-protein interaction networks. BMC bioinformatics, 8(1):335, 2007.
[3]
A. H. Bild, G. Yao, J. T. Chang, Q. Wang, A. Potti, D. Chasse, M.-B. Joshi, D. Harpole, J. M. Lancaster, A. Berchuck, J. A. J. Olson, J. R. Marks, H. K. Dressman, M. West, and J. R. Nevins. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature, 439(7074):353--7, 2006.
[4]
P. Bouros, S. Skiadopoulos, T. Dalamagas, D. Sacharidis, and T. Sellis. Evaluating reachability queries over path collections. In Proceedings of the 21st International Conference on Scientific and Statistical Database Management, page 416. Springer, 2009.
[5]
K. Bruso. The development of bucketing operators and a supporting operator framework for relational database management systems. 2009.
[6]
K. Clement, N. Gustafson, A. Berbert, H. Carroll, C. Merris, A. Olsen, M. Clement, Q. Snell, J. Allen, and R. Roper. PathGen: A Transitive Gene Pathway Generator. Bioinformatics, 2009.
[7]
E. Cohen, K. Ihida-Stansbury, M. Lu, R. Panettieri, P. Jones, and E. Morrisey. Wnt signaling regulates smooth muscle precursor development in the mouse lung via a tenascin C/PDGFR pathway. J Clin Invest, 119(9):2538--49, 2009.
[8]
J. Ding, D. Song, X. Ye, and S. F. Liu. A pivotal role of endothelial-specific NF-kappaB signaling in the pathogenesis of septic shock and septic vascular dysfunction. J Immunol, 183(6):4031--8, 2009.
[9]
B. Eckman and P. Brown. Graph data management for molecular and cell biology. IBM Journal of Research and Development, 50(6):560, 2006.
[10]
M. Eltabakh, M. Ouzzani, W. Aref, A. Elmagarmid, Y. Laura-Silva, M. Arshad, D. Salt, and I. Baxter. Managing biological data using bdbms. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, pages 1600--1603. Citeseer, 2008.
[11]
J. A. Engelman. Targeting PI3K signalling in cancer: opportunities, challenges and limitations. Nat Rev Cancer, 9(8):550--62, 2009.
[12]
R. A. George, J. Y. Liu, L. L. Feng, R. J. Bryson-Richardson, D. Fatkin, and M. A. Wouters. Analysis of protein sequence and interaction data for candidate disease gene prediction. Nucleic Acids Res, 34(19):e130, 2006.
[13]
G. Hart, A. Ramani, and E. Marcotte. How complete are current yeast and human protein-interaction networks. Genome Biol, 7(11):120, 2006.
[14]
Q. He and J. Chiu. Proteomics in biomarker discovery and drug development. Journal of cellular biochemistry, 89(5):868--886, 2003.
[15]
M. Horner, L. Ries, M. Krapcho, N. Neyman, R. Aminou, N. Howlader, S. Altekruse, E. Feuer, L. Huang, A. Mariotto, et al. SEER cancer statistics review, 1975--2006. Bethesda, MD: National Cancer Institute. (Accessed June 22, 2009, at http://seer.cancer.gov/csr/1975_2006.), 2009.
[16]
T. Ideker and R. Sharan. Protein networks in disease. Genome Res, 18(4):644--52, 2008.
[17]
M. G. Kann. Protein interactions and disease: computational approaches to uncover the etiology of diseases. Brief Bioinform, 8(5):333--46, 2007.
[18]
Keydata. Sql rank. http://www.1keydata.com/sql/sql-rank.html.
[19]
S. Kohler, S. Bauer, D. Horn, and P. Robinson. Walking the interactome for prioritization of candidate disease genes. The American Journal of Human Genetics, 82(4):949--958, 2008.
[20]
L. Krishnamurthy, J. Nadeau, G. Ozsoyoglu, M. Ozsoyoglu, G. Schaeffer, M. Tasan, and W. Xu. Pathways database system: an integrated system for biological pathways. Bioinformatics, 19(8):930, 2003.
[21]
S. Kruck, J. Bedke, J. Hennenlotter, P. A. Ohneseit, U. Kuehs, E. Senger, K.-D. Sievert, and A. Stenzl. Activation of mTOR in renal cell carcinoma is due to increased phosphorylation rather than protein overexpression. Oncol Rep, 23(1):159--63, 2010.
[22]
S. Lapenna and A. Giordano. Cell cycle kinases as therapeutic targets for cancer. Nat Rev Drug Discov, 8(7):547--66, 2009.
[23]
U. Leser. A query language for biological networks. Bioinformatics, 21(2), 2005.
[24]
H. H. Luu, R. Zhang, R. C. Haydon, E. Rayburn, Q. Kang, W. Si, J. K. Park, H. Wang, Y. Peng, W. Jiang, and T.-C. He. Wnt/beta-catenin signaling pathway as a novel cancer drug target. Curr Cancer Drug Targets, 4(8):653--71, 2004.
[25]
M. Mannino and L. Shapiro. Extensions to query languages for graph traversal problems. IEEE Transactions on Knowledge and Data Engineering, 2(3):353--363, 1990.
[26]
R. Mehrian-Shai, C. D. Chen, T. Shi, S. Horvath, S. F. Nelson, J. K. V. Reichardt, and C. L. Sawyers. Insulin growth factor-binding protein 2 is a candidate biomarker for PTEN status and PI3K/Akt pathway activation in glioblastoma and prostate cancer. Proc Natl Acad Sci U S A, 104(13):5563--8, 2007.
[27]
R. Nibbe, M. Koyuturk, and M. Chance. An Integrative-omics Approach to Identify Functional Sub-Networks in Human Colorectal Cancer. 2010.
[28]
R. Nibbe, S. Markowitz, L. Myeroff, R. Ewing, and M. Chance. Discovery and scoring of protein interaction subnetworks discriminative of late stage human colon cancer. Molecular & Cellular Proteomics, 8(4):827, 2009.
[29]
F. Olken. Graph data management for molecular biology. OMICS A Journal of Integrative Biology, 7(1):75--78, 2003.
[30]
G. Onsongo, H. Xie, T. Griffin, and J. Carlis. Generating GO Slim Using Relational Database Management Systems to Support Proteomics Analysis. In 21st IEEE International Symposium on Computer-Based Medical Systems, 2008. CBMS'08, pages 215--217, 2008.
[31]
M. Oti and H. G. Brunner. The modular nature of genetic diseases. Clin Genet, 71(1):1--11, 2007.
[32]
M. Pepe, R. Etzioni, Z. Feng, J. Potter, M. Thompson, M. Thornquist, M. Winget, and Y. Yasui. Phases of biomarker development for early detection of cancer. JNCI Journal of the National Cancer Institute, 93(14):1054--61, 2001.
[33]
C. Perez-Iratxeta, M. Wjst, P. Bork, and M. A. Andrade. G2D: a tool for mining genes associated with disease. BMC Genet, 6:45, 2005.
[34]
B. Perez-Ordonez, M. Beauchemin, and R. C. K. Jordan. Molecular biology of squamous cell carcinoma of the head and neck. J Clin Pathol, 59(5):445--53, 2006.
[35]
T. Prasad, R. Goel, K. Kandasamy, S. Keerthikumar, S. Kumar, S. Mathivanan, D. Telikicherla, R. Raju, B. Shafreen, A. Venugopal, et al. Human Protein Reference Database-2009 update. Nucleic acids research, 2008.
[36]
A. A. Ptitsyn, M. M. Weil, and D. H. Thamm. Systems biology approach to identification of biomarkers for metastatic progression in cancer. BMC Bioinformatics, 9(Suppl 9):S8, 2008.
[37]
P. Qu, J. Roberts, Y. Li, M. Albrecht, O. W. Cummings, J. N. Eble, H. Du, and C. Yan. Stat3 downstream genes serve as biomarkers in human lung carcinomas and chronic obstructive pulmonary disease. Lung Cancer, 63(3):341--7, 2009.
[38]
N. Rifai, M. Gillette, and S. Carr. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nature biotechnology, 24(8):971--984, 2006.
[39]
L. H. Saal, P. Johansson, K. Holm, S. K. Gruvberger-Saal, Q.-B. She, M. Maurer, S. Koujak, A. A. Ferrando, P. Malmstrom, L. Memeo, J. Isola, P.-O. Bendahl, N. Rosen, H. Hibshoosh, M. Ringner, A. Borg, and R. Parsons. Poor prognosis in carcinoma is associated with a gene expression signature of aberrant PTEN tumor suppressor pathway activity. Proc Natl Acad Sci U S A, 104(18):7564--9, 2007.
[40]
C. L. Sawyers. The cancer biomarker problem. Nature, 452(7187):548--52, 2008.
[41]
J. S. Sebolt-Leopold and R. Herrera. Targeting the mitogen-activated protein kinase cascade to treat cancer. Nat Rev Cancer, 4(12):937--47, 2004.
[42]
S. Shekhar, A. Fetterer, and B. Goyal. Materialization trade-offs in hierarchical shortest path algorithms. Lecture notes in computer science, pages 94--114, 1997.
[43]
D. Soh, D. Dong, Y. Guo, and L. Wong. Enabling more sophisticated gene expression analysis for understanding diseases and optimizing treatments. 2007.
[44]
S. Tata, W. Lang, and J. Patel. Periscope/SQ: interactive exploration of biological sequence databases. In Proceedings of the 33rd international conference on Very large data bases, pages 1406--1409. VLDB Endowment, 2007.
[45]
S. Tata and J. Patel. PiQA: An algebra for querying protein data sets. In Proc. of 15th SSDBM Conf. Citeseer, 2003.
[46]
N. Tiffin, M. A. Andrade-Navarro, and C. Perez-Iratxeta. Linking genes to diseases: it's all in the data. Genome Med, 1(8):77, 2009.
[47]
S. Tilghman. Lessons learned, promises kept: a biologist's eye view of the Genome Project. Genome Research, 6(9):773, 1996.
[48]
L.-C. Tranchevent, R. Barriot, S. Yu, S. V. Vooren, P. V. Loo, B. Coessens, B. D. Moor, S. Aerts, and Y. Moreau. ENDEAVOUR update: a web resource for gene prioritization in multiple species. Nucleic Acids Res, 36(Web Server issue):W377--84, 2008.
[49]
Z. Tu, L. Wang, M. Xu, X. Zhou, T. Chen, and F. Sun. Further understanding human disease genes by comparing with housekeeping genes and other genes. BMC genomics, 7(1):31, 2006.
[50]
M. A. van Driel, K. Cuelenaere, P. P. C. W. Kemmeren, J. A. M. Leunissen, H. G. Brunner, and G. Vriend. GeneSeeker: extraction and integration of human disease-related information from web-based genetic databases. Nucleic Acids Res, 33(Web Server issue):W758--61, 2005.
[51]
O. Vanunu, O. Magger, E. Ruppin, T. Shlomi, and R. Sharan. Associating Genes and Protein Complexes with Disease via Network Propagation. 2010.
[52]
I. Vastrik, P. D'Eustachio, E. Schmidt, G. Gopinath, D. Croft, B. de Bono, M. Gillespie, B. Jassal, S. Lewis, L. Matthews, G. Wu, E. Birney, and L. Stein. Reactome: a knowledge base of biologic pathways and processes. Genome Biol, 8(3):R39, 2007.
[53]
X. Wu, R. Jiang, M. Q. Zhang, and S. Li. Network-based global inference of human disease genes. Mol Syst Biol, 4:189, 2008.
[54]
I. Xenarios, D. W. Rice, L. Salwinski, M. K. Baron, E. M. Marcotte, and D. Eisenberg. DIP: the database of interacting proteins. Nucleic Acids Res, 28(1):289--91, 2000.
[55]
H. Xie, G. Onsongo, J. Popko, E. P. de Jong, J. Cao, J. V. Carlis, R. J. Griffin, N. L. Rhodus, and T. J. Griffin. Proteomics analysis of cells in whole saliva from oral cancer patients via value-added three-dimensional peptide fractionation and tandem mass spectrometry. Mol Cell Proteomics, 7(3):486--98, 2008.
[56]
J. Xu and Y. Li. Discovering disease-genes by topological features in human protein-protein interaction network. Bioinformatics, 22(22):2800--5, 2006.
[57]
G. Zhang, K. A. Kernan, A. Thomas, S. Collins, Y. Song, L. Li, W. Zhu, R. C. Leboeuf, and A. A. Eddy. A novel signaling pathway: fibroblast nicotinic receptor alpha1 binds urokinase and promotes renal fibrosis. J Biol Chem, 284(42):29050--64, 2009.

Published In

BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
August 2010
705 pages
ISBN:9781450304382
DOI:10.1145/1854776

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
