Novel approaches for small biomolecule classification and structural similarity search

Published: 01 June 2007

Abstract

Structural similarity search among small molecules is a standard tool in molecular classification and in-silico drug discovery. The effectiveness of this approach depends on how well two problems are addressed. First, the notion of similarity should be chosen to provide the highest level of discrimination among compounds with respect to the bioactivity of interest. Second, the data structure used for search should be very efficient, as the molecular databases of interest contain several million compounds.
In this paper we summarize recent applications of the k-nearest-neighbor (k-nn) search method to small molecule classification. The k-nn classification of small molecules is based on selecting the most relevant set of chemical descriptors, which are then compared under the standard Minkowski distance Lp. Here we describe how to computationally design the optimal weighted Minkowski distance wLp that maximizes the discrimination between active and inactive compounds with respect to the bioactivities of interest. Because k-nn classification requires fast similarity search to predict the bioactivity of a new molecule, we then focus on the construction of pruning-based k-nn search data structures that minimize similarity search time for any wLp distance.
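As a rough illustration of the classification step described above, the following minimal sketch assumes that wLp takes the common form (sum_i w_i |x_i - y_i|^p)^(1/p) with non-negative weights; the abstract does not spell out the exact definition. The function names, the toy descriptor vectors, and the labels (1 for active, 0 for inactive) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_minkowski(x, y, w, p=2.0):
    """Assumed wLp form: (sum_i w_i * |x_i - y_i|**p) ** (1/p)."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(wi * abs(xi - yi) ** p for xi, yi, wi in zip(x, y, w))
    return total ** (1.0 / p)


def knn_predict(query, compounds, labels, w, k=3, p=2.0):
    """Return (majority label, fraction of active neighbors) for the query.

    This is a brute-force scan over all compounds; it only shows the
    classification rule, not a fast search structure.
    """
    ranked = sorted(
        range(len(compounds)),
        key=lambda i: weighted_minkowski(query, compounds[i], w, p),
    )
    nearest = [labels[i] for i in ranked[:k]]
    majority = Counter(nearest).most_common(1)[0][0]
    active_fraction = sum(1 for lab in nearest if lab == 1) / len(nearest)
    return majority, active_fraction


# Hypothetical two-descriptor toy data: 1 = active, 0 = inactive.
compounds = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
             (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = [1, 1, 1, 0, 0, 0]
weights = (1.0, 1.0)

if __name__ == "__main__":
    print(knn_predict((0.5, 0.5), compounds, labels, weights, k=3))
```

Setting a weight to zero removes the corresponding descriptor from the comparison, which is one simple way to picture how a learned weighting could discard descriptors unrelated to the activity. How the weights are optimized is not shown here.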
The accuracy achieved by the k-nn classifier is better than that of the alternative LDA and MLR approaches and is comparable to that of ANN methods. In terms of running time, the k-nn classifier is considerably faster than the ANN approach, especially on large data sets. Furthermore, the k-nn classifier can quantify the level of bioactivity rather than return only a binary decision, and it can bring more insight into the nature of the activity by eliminating descriptors of the compounds that are unrelated to the activity in question.
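The pruning-based search mentioned in the abstract can be pictured with a vantage-point tree, a well-known metric-space index. The abstract does not say which structure the authors build, so the sketch below is an assumption chosen for illustration. It relies only on the triangle inequality, which the assumed wLp form satisfies for non-negative weights and p >= 1. All names (`wlp`, `VPNode`, `build_vp_tree`, `knn_search`) are hypothetical.

```python
import heapq
import random


def wlp(x, y, w, p=2.0):
    """Assumed weighted Minkowski distance (metric for w >= 0, p >= 1)."""
    return sum(wi * abs(a - b) ** p for a, b, wi in zip(x, y, w)) ** (1.0 / p)


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build_vp_tree(points, dist, rng=None):
    """Recursively split points around a randomly chosen vantage point."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    scored = [(dist(vp, q), q) for q in points]
    radius = sorted(d for d, _ in scored)[len(scored) // 2]
    inside = [q for d, q in scored if d <= radius]
    outside = [q for d, q in scored if d > radius]
    return VPNode(vp, radius,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))


def knn_search(root, query, k, dist):
    """Return the k nearest (distance, point) pairs, closest first."""
    heap = []    # max-heap on distance, stored as (-distance, tiebreak, point)
    counter = 0  # unique tiebreak so points themselves are never compared

    def tau():
        # Distance of the current k-th best candidate, or infinity if fewer.
        return -heap[0][0] if len(heap) == k else float("inf")

    def visit(node):
        nonlocal counter
        if node is None:
            return
        d = dist(query, node.point)
        if len(heap) < k:
            heapq.heappush(heap, (-d, counter, node.point))
            counter += 1
        elif d < tau():
            heapq.heapreplace(heap, (-d, counter, node.point))
            counter += 1
        if d <= node.radius:
            visit(node.inside)
            # The outside subtree can only help if the query ball crosses the radius.
            if d + tau() > node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            # The inside subtree can only help if the query ball reaches the radius.
            if d - tau() <= node.radius:
                visit(node.inside)

    visit(root)
    return sorted((-neg_d, pt) for neg_d, _, pt in heap)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.random(), rng.random()) for _ in range(50)]
    metric = lambda a, b: wlp(a, b, (1.0, 3.0), 2.0)
    tree = build_vp_tree(data, metric)
    for d, pt in knn_search(tree, (0.5, 0.5), 3, metric):
        print(round(d, 4), pt)
```

Whole subtrees are skipped whenever the triangle inequality shows they cannot contain a closer neighbor than the current k-th best, which is the general idea behind pruning-based search. How much work this saves depends on the data and the distance, and this sketch makes no claim about the speedups reported in the paper.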

Published In

ACM SIGKDD Explorations Newsletter, Volume 9, Issue 1
Special issue on data mining for health informatics
June 2007
58 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1294301

Publisher

Association for Computing Machinery

New York, NY, United States


Cited By

  • (2022) Machine Learning in Antibacterial Drug Design. Frontiers in Pharmacology, 13. DOI: 10.3389/fphar.2022.864412. Online publication date: 3-May-2022
  • (2018) FASTBEE: A Fast and Self-Adaptive Clustering Algorithm Towards to Edge Computing. 2018 5th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2018 4th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), 128-133. DOI: 10.1109/CSCloud/EdgeCom.2018.00031. Online publication date: Jun-2018
  • (2013) Semantic Computing and Drug Discovery - A Preliminary Report. Proceedings of the 2013 IEEE Seventh International Conference on Semantic Computing, 453-458. DOI: 10.1109/ICSC.2013.86. Online publication date: 16-Sep-2013
  • (2008) Substructure similarity measurement in chinese recipes. Proceedings of the 17th international conference on World Wide Web, 979-988. DOI: 10.1145/1367497.1367629. Online publication date: 21-Apr-2008
  • (2008) Accommodating substructure similarity-based search in a recipe database system. 2008 2nd IEEE International Conference on Digital Ecosystems and Technologies, 91-96. DOI: 10.1109/DEST.2008.4635199. Online publication date: Feb-2008
