Abstract
Knowledge discovery in clinico-genomic data is a task that requires integrating not only highly heterogeneous kinds of data but also the requirements and interests of very different user groups. Grid computing technologies promise to be an effective tool for combining all these requirements into a single architecture. In this paper, we describe scenarios and future research directions related to grid-based knowledge discovery in clinico-genomic data, and introduce the approach taken by the recently launched ACGT project. The whole endeavor is considered in the context of biomedical informatics research and aims at the realization of an integrated, grid-enabled biomedical infrastructure. The presented integrated clinico-genomics knowledge discovery (ICGKD) scenario and its process realization are based on a multi-strategy data-mining approach that seamlessly integrates three distinct data-mining components: clustering, association rule mining, and feature selection. Preliminary experimental results are indicative of the rationale and reliability of the approach.
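The abstract names the three mining components but not how they are chained. The following is a minimal sketch under stated assumptions, not the ICGKD implementation: it assumes one plausible ordering (cluster samples on gene expression, rank genes that separate the clusters, then mine rules linking cluster membership to a clinical attribute). The toy data, the names `GENES`, `POINTS`, `OUTCOMES`, and the helpers `kmeans`, `rank_genes`, and `cluster_rules` are hypothetical. Plain k-means and a mean-difference gene score are swapped-in stand-ins for the paper's own clustering and feature-selection methods.

```python
"""Hypothetical sketch of a multi-strategy mining chain:
clustering -> feature selection -> association rules (toy data only)."""
import math
from collections import Counter

# Hypothetical toy data: 6 samples x 3 genes, plus one clinical attribute.
GENES = ["G1", "G2", "G3"]
POINTS = [
    [5.0, 1.0, 2.0], [1.0, 3.0, 2.1], [5.2, 0.8, 1.9],
    [0.9, 3.1, 2.0], [4.8, 1.2, 2.2], [1.1, 2.9, 1.8],
]
OUTCOMES = ["relapse", "no_relapse", "relapse",
            "no_relapse", "no_relapse", "no_relapse"]


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic init (first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: centroid = mean of its members (kept if cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rank_genes(points, labels, genes):
    """Rank genes by |mean in cluster 0 - mean in cluster 1| (assumes 2 clusters)."""
    scores = {}
    for j, gene in enumerate(genes):
        a = [p[j] for p, lab in zip(points, labels) if lab == 0]
        b = [p[j] for p, lab in zip(points, labels) if lab == 1]
        scores[gene] = abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(genes, key=lambda g: scores[g], reverse=True)


def cluster_rules(labels, outcomes, min_conf=0.8):
    """Rules 'cluster=c -> outcome=o' as (c, o, support, confidence)."""
    n = len(labels)
    pair_counts = Counter(zip(labels, outcomes))
    cluster_counts = Counter(labels)
    rules = []
    for (c, o), cnt in sorted(pair_counts.items()):
        conf = cnt / cluster_counts[c]
        if conf >= min_conf:
            rules.append((c, o, cnt / n, conf))
    return rules


if __name__ == "__main__":
    labels = kmeans(POINTS, k=2)
    print("clusters:", labels)
    print("genes ranked:", rank_genes(POINTS, labels, GENES))
    for c, o, sup, conf in cluster_rules(labels, OUTCOMES, min_conf=0.6):
        print(f"cluster={c} -> outcome={o} (support={sup:.2f}, confidence={conf:.2f})")
```

Chaining the steps this way lets the clusters found without supervision be re-described in terms of a few discriminating genes and then related to clinical attributes through rule confidence; the one mismatched outcome in the toy data shows how confidence drops below 1 when a cluster is clinically mixed.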
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
May, M., Potamias, G., Rüping, S. (2006). Grid-Based Knowledge Discovery in Clinico-Genomic Data. In: Maglaveras, N., Chouvarda, I., Koutkias, V., Brause, R. (eds) Biological and Medical Data Analysis. ISBMDA 2006. Lecture Notes in Computer Science, vol 4345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946465_20
DOI: https://doi.org/10.1007/11946465_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68063-5
Online ISBN: 978-3-540-68065-9
eBook Packages: Computer Science, Computer Science (R0)