George Papadatos - KNIME Tutorial
Outline
Introduction to KNIME
Basic components
Desktop, nodes, dialogs, workflows
Demo
Compound selection for focused screening
12/12/2013
What is KNIME?
KNIME resources
Web pages (documentation)
www.knime.org | tech.knime.org | tech.knime.org/installation-0
Downloads
knime.org/download-desktop
Community forum
tech.knime.org/forum
Myself
georgep@ebi.ac.uk
Chemoinformatics
Conversions, similarity, clustering, (Q)SAR analysis, MMPs, reaction enumeration
Scripting integration
R, Perl, Python, Matlab, Octave, Groovy
Reporting
So much more
Bioinformatics, HTS & image analysis, network & text mining
Marketing, big data and business analytics
Bioinformatics
HCS (MPI), NGS (Konstanz), Image analysis
Text mining
Palladian
Integration
Python, Perl, R, Groovy, Matlab (MPI), PDB web services client (Vernalis)
KNIME Workbench
Node description
tabs
workflow projects
favorite nodes
public server
workflow editor
node repository
outline
console
Icon
Status display (traffic lights)
Sequence number
Red (not ready)
Amber (ready)
Green (executed)
Right-click menu
To configure and execute the node, display the output views, edit the node, and display the data for the ports
The objective
First steps - I
Locate the directory with today's material
First steps - II
Open a new workflow
Right-click on the workflow projects area
SDF Reader
.\data\SMDC_cleaned_nodups.sdf
Molecule to RDKit
Descriptor Calculation
Java Snippet
.\code\Lipinski.txt
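The contents of Lipinski.txt are not reproduced in these slides. As a rough idea of what such a filter computes, here is a minimal plain-Java sketch of Lipinski's rule of five. It assumes the molecular weight (e.g. ExactMW) and logP (e.g. SlogP) from the descriptor step, plus H-bond donor and acceptor counts, are already available as numbers. The class and method names are hypothetical, and the code is standalone Java rather than the Java Snippet node's column-access syntax. Tolerating one violation is a common convention, not necessarily the one used in Lipinski.txt.

```java
/** Minimal sketch of a rule-of-five check (hypothetical; not the tutorial's Lipinski.txt). */
public class Lipinski {

    /** Counts how many of the four rule-of-five criteria a compound violates. */
    static int countViolations(double exactMw, double slogP, int hbd, int hba) {
        int violations = 0;
        if (exactMw > 500.0) violations++;  // molecular weight above 500 Da
        if (slogP > 5.0) violations++;      // calculated logP above 5
        if (hbd > 5) violations++;          // more than 5 H-bond donors
        if (hba > 10) violations++;         // more than 10 H-bond acceptors
        return violations;
    }

    /** ASSUMPTION: at most one violation is tolerated (a common reading of the rule). */
    static boolean passes(double exactMw, double slogP, int hbd, int hba) {
        return countViolations(exactMw, slogP, hbd, hba) <= 1;
    }

    public static void main(String[] args) {
        // Hypothetical descriptor values, for illustration only.
        System.out.println(passes(350.4, 2.1, 2, 5));  // true
        System.out.println(passes(650.0, 6.3, 3, 8));  // false (two violations)
    }
}
```

In the workflow, the same test would run once per row, with the pass/fail result written to a new column that a downstream filter can use.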
Java Snippet
.\code\Oprea.txt
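Oprea.txt is likewise not shown. The sketch below is a hypothetical lead-likeness filter in the same spirit. ASSUMPTION: its thresholds are the commonly cited Oprea lead-like limits (MW at most 460, logP between -4 and 4.2, at most 10 rotatable bonds, 4 rings, 5 donors and 9 acceptors), with the aqueous-solubility criterion left out. Check them against Oprea.txt before relying on them. Unlike the rule-of-five sketch, every criterion must hold here.

```java
/**
 * Minimal sketch of a lead-likeness filter (hypothetical; not the tutorial's Oprea.txt).
 * ASSUMPTION: thresholds are the commonly cited Oprea lead-like limits,
 * without the solubility criterion; verify against Oprea.txt.
 */
public class LeadLike {

    static boolean isLeadLike(double mw, double logP, int rotatableBonds,
                              int rings, int hbd, int hba) {
        return mw <= 460.0                      // molecular weight
            && logP >= -4.0 && logP <= 4.2      // calculated logP window
            && rotatableBonds <= 10             // flexibility
            && rings <= 4                       // ring count
            && hbd <= 5                         // H-bond donors
            && hba <= 9;                        // H-bond acceptors
    }

    public static void main(String[] args) {
        // Hypothetical descriptor values, for illustration only.
        System.out.println(isLeadLike(320.0, 2.5, 4, 3, 1, 4));   // true
        System.out.println(isLeadLike(480.0, 4.5, 12, 5, 2, 7));  // false
    }
}
```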
Molecule to Indigo
File Reader
.\data\PAINS_clean_half.sdf
Substructure Matcher
Loop End
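The PAINS step appears to read filter patterns (PAINS_clean_half.sdf), run the Substructure Matcher inside a loop, and collect the results with Loop End. Assuming the loop runs one iteration per pattern, the bookkeeping looks roughly like the plain-Java sketch below. The matcher is a HYPOTHETICAL stand-in passed in as a function. The toy matcher in main is a plain substring test on SMILES text, not a substructure search. A real workflow would delegate matching to RDKit or Indigo.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class PainsLoop {

    /**
     * One outer iteration per filter pattern (like one loop iteration in the workflow);
     * hits accumulate across iterations, much as Loop End collects rows.
     * Returns each flagged molecule with the patterns it matched.
     */
    static Map<String, List<String>> flagMatches(List<String> molecules, List<String> patterns,
                                                 BiPredicate<String, String> matcher) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        for (String pattern : patterns) {
            for (String mol : molecules) {
                if (matcher.test(mol, pattern)) {
                    hits.computeIfAbsent(mol, k -> new ArrayList<>()).add(pattern);
                }
            }
        }
        return hits;
    }

    /** Keeps only the molecules that matched no pattern. */
    static List<String> clean(List<String> molecules, Map<String, List<String>> hits) {
        List<String> kept = new ArrayList<>();
        for (String mol : molecules) {
            if (!hits.containsKey(mol)) {
                kept.add(mol);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL toy matcher: substring test on SMILES text, NOT a substructure search.
        BiPredicate<String, String> toyMatcher = (mol, pattern) -> mol.contains(pattern);
        List<String> molecules = List.of("c1ccccc1O", "CCN=Nc1ccccc1", "CCO");
        List<String> patterns = List.of("N=N", "c1ccccc1O");
        Map<String, List<String>> hits = flagMatches(molecules, patterns, toyMatcher);
        System.out.println(hits);                    // {CCN=Nc1ccccc1=[N=N], c1ccccc1O=[c1ccccc1O]}
        System.out.println(clean(molecules, hits));  // [CCO]
    }
}
```

Keeping the matched patterns, not just a yes/no flag, makes it easier to see which PAINS filter removed which compound.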
RDKit Fingerprint
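The slides do not show how the fingerprints are compared downstream. Binary fingerprints such as those from the RDKit Fingerprint node are commonly compared with the Tanimoto coefficient: the number of shared on-bits divided by the number of on-bits in either fingerprint. Here is a minimal sketch using java.util.BitSet. The helper bits() is hypothetical and only builds toy fingerprints. Scoring two empty fingerprints as 0 is an assumed convention.

```java
import java.util.BitSet;

public class Tanimoto {

    /** Tanimoto coefficient of two binary fingerprints: |A and B| / |A or B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);                       // bits set in both fingerprints
        BitSet either = (BitSet) a.clone();
        either.or(b);                      // bits set in at least one fingerprint
        int union = either.cardinality();
        // ASSUMED convention: two empty fingerprints get similarity 0.
        return union == 0 ? 0.0 : (double) both.cardinality() / union;
    }

    /** Hypothetical helper: builds a toy fingerprint from on-bit positions. */
    static BitSet bits(int... onBits) {
        BitSet bs = new BitSet();
        for (int i : onBits) {
            bs.set(i);
        }
        return bs;
    }

    public static void main(String[] args) {
        BitSet a = bits(1, 3, 5, 7);
        BitSet b = bits(3, 5, 8);
        System.out.println(tanimoto(a, b));  // 0.4 (2 shared bits out of 5 set in either)
    }
}
```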
2D/3D Scatterplot
Exercise 1
Exercise 2
Tips
Use the Molecule to RDKit node
Use the RDKit Descriptor Calculation node
Include the SlogP and ExactMW descriptors
Conclusions
Compound selection for focused screening
Typical scenario
KNIME
Open and free
Data analysis
Chemoinformatics toolkits
Erl Wood, RDKit, Indigo, CDK, etc.
Further reading
Open data and tools
1. Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G., ZINC:
A free tool to discover chemistry for biology. Journal of Chemical Information
and Modeling 2012 ASAP.
2. Saubern, S.; Guha, R.; Baell, J. B., KNIME workflow to assess PAINS filters in
SMARTS format. Comparison of RDKit and Indigo cheminformatics libraries.
Molecular Informatics 2011, 30, (10), 847-850.
3. Barnes, M. R.; Harland, L.; Foord, S. M.; Hall, M. D.; Dix, I.; Thomas, S.;
Williams-Jones, B. I.; Brouwer, C. R., Lowering industry firewalls: precompetitive informatics initiatives in drug discovery. Nature Reviews Drug
Discovery 2009, 8, (9), 701-708.
4. Berthold, M. R.; Cebron, N.; Dill, F.; Gabriel, T. R.; Kötter, T.; Meinl, T.; Ohl, P.;
Sieb, C.; Thiel, K.; Wiswedel, B., KNIME: The Konstanz Information Miner. In
Data Analysis, Machine Learning and Applications, Preisach, C.; Burkhardt, H.;
Schmidt-Thieme, L.; Decker, R., Eds. Springer: Berlin, 2008; pp 319-326.
5. Tiwari, A.; Sekhar, A. K. T., Workflow based framework for life science
informatics. Computational Biology and Chemistry 2007, 31, (5-6), 305-319.
Further reading
High throughput screening
1. Bajorath, J., Integration of virtual and high-throughput screening. Nature
Reviews Drug Discovery 2002, 1, (11), 882-894.
2. Harper, G.; Pickett, S. D.; Green, D. V. S., Design of a compound
screening collection for use in High Throughput Screening. Combinatorial
Chemistry & High Throughput Screening 2004, 7, (1), 63-70.
Further reading
Physicochemical properties and drug discovery
1. Brüstle, M.; Beck, B.; Schindler, T.; King, W.; Mitchell, T.; Clark, T., Descriptors,
physical properties, and drug-likeness. Journal of Medicinal Chemistry 2002, 45,
(16), 3345-3355.
2. Hill, A. P.; Young, R. J., Getting physical in drug discovery: A contemporary
perspective on solubility and hydrophobicity. Drug Discovery Today 2010, 15,
(15/16), 648-655.
3. Leeson, P. D.; Springthorpe, B., The influence of drug-like concepts on decision-making in medicinal chemistry. Nature Reviews Drug Discovery 2007, 6, (11), 881-890.
Further reading
Similarity and diversity
1. Ashton, M.; Barnard, J.; Casset, F.; Charlton, M.; Downs, G.; Gorse, D.; Holliday,
J.; Lahana, R.; Willett, P., Identification of diverse database subsets using
property-based and fragment-based molecular descriptions. Quantitative
Structure-Activity Relationships 2002, 21, (6), 598-604.
2. Bender, A.; Glen, R. C., Molecular similarity: a key technique in molecular
informatics. Organic and Biomolecular Chemistry 2004, 2, 3204-3218.
3. Gorse, A.-D., Diversity in medicinal chemistry space. Current Topics in Medicinal
Chemistry 2006, 6, (1), 3-18.
4. Maldonado, A.; Doucet, J.; Petitjean, M.; Fan, B.-T., Molecular similarity and
diversity in chemoinformatics: From theory to applications. Molecular Diversity
2006, 10, (1), 39-79.
5. Rogers, D.; Hahn, M., Extended-connectivity fingerprints. Journal of Chemical
Information and Modeling 2010, 50, (5), 742-754.
6. Schuffenhauer, A.; Brown, N., Chemical diversity and biological activity. Drug
Discovery Today: Technologies 2006, 3, (4), 387-395.
7. Willett, P.; Barnard, J. M.; Downs, G. M., Chemical similarity searching. Journal
of Chemical Information and Computer Sciences 1998, 38, (6), 983-996.