Predicting the functional sites of a protein from its structure, such as the binding sites of small molecules, other proteins or antibodies, sheds light on its function in vivo. Currently, two classes of methods prevail: Machine Learning (ML) models built on top of handcrafted features and comparative modeling. They are respectively limited by the expressivity of the handcrafted features and the availability of similar proteins. Here, we introduce ScanNet, an end-to-end, interpretable geometric deep learning model that learns features directly from 3D structures. ScanNet builds representations of atoms and amino acids based on the spatio-chemical arrangement of their neighbors. We train ScanNet for detecting protein-protein and protein-antibody binding sites, demonstrate its accuracy - including for unseen protein folds - and interpret the filters learned. Finally, we predict epitopes of the SARS-CoV-2 spike protein, validating known antigenic regions and predicting previously unchar...
Non-coding RNAs are transcripts that do not encode proteins but play key roles in many biological processes. The alignment of their secondary structures has become a major tool in RNA functional annotation. Many non-coding RNAs contain pseudoknots as a structural motif, which has proved to be functionally important. We present HARP, a heuristic algorithm for the pairwise alignment of non-restricted (arbitrary) classes of pseudoknotted RNA secondary structures. HARP applies "geodesic hashing" to perform inexact matching of specially defined reduced RNA secondary structure graphs. The method proved to be efficient in both time and memory and was successfully tested on a benchmark of available experimental structures with known function. A web server is available at http://bioinfo3d.cs.tau.ac.il/HARP/.
Proceedings. International Conference on Intelligent Systems for Molecular Biology, 2000
We present two algorithms which align flexible protein structures. Both apply efficient structural pattern detection and graph theoretic techniques. The FlexProt algorithm simultaneously detects the hinge regions and aligns the rigid subparts of the molecules. It does so by efficiently detecting maximal congruent rigid fragments in both molecules and calculating their optimal arrangement which does not violate the protein sequence order. The FlexMol algorithm is sequence order independent, yet requires as input the hypothesized hinge positions. Due to its sequence order independence it can also be applied to protein-protein interface matching and drug molecule alignment. It aligns the rigid parts of the molecule using the Geometric Hashing method and calculates optimal connectivity among these parts by graph-theoretic techniques. Both algorithms are highly efficient even compared with rigid structure alignment algorithms. Typical running times on a standard desktop PC (400 MHz) are abo...
We present a novel, highly efficient method for the detection of a pharmacophore from a set of drug-like ligands that interact with a target receptor. A pharmacophore is a spatial arrangement of physico-chemical features in a ligand that is essential for the interaction with a specific receptor. In the absence of a known three-dimensional (3D) receptor structure, a pharmacophore can be identified from a multiple structural alignment of ligand molecules. The key advantages of the presented algorithm are: (a) its ability to multiply align flexible ligands in a deterministic manner, (b) its ability to focus on subsets of the input ligands, which may share a large common substructure, resulting in the detection of both outlier molecules and alternative binding modes, and (c) its computational efficiency, which allows the detection of pharmacophores shared by a large number of molecules on a standard PC. The algorithm was extensively tested on a dataset of almost 80 ligands acting on 12 different receptors. The results, which were achieved using a set of standard default parameters, were consistent with reference pharmacophores that were derived from the bound ligand-receptor complexes. The pharmacophores detected by the algorithm are expected to be a key component in the discovery of new leads by screening large databases of drug-like molecules. A user-friendly web interface is available at http://bioinfo3d.cs.tau.ac.il/pharma. Supplementary material can be found at http://bioinfo3d.cs.tau.ac.il/pharma/reduction/.
The molecular chaperone Hsp90 is a ubiquitous ATPase-directed protein responsible for the activation and structural stabilization of a large clientele of proteins. As such, Hsp90 has emerged as a suitable candidate for the treatment of a diverse set of diseases, such as cancer and neurodegeneration. The inhibition of the chaperone through ATP-competitive inhibitors, however, was shown to lead to undesirable side effects. One strategy to alleviate this problem is the development of molecules that are able to disrupt specific protein–protein interactions, thus modulating the activity of Hsp90 only in the particular cellular pathway that needs to be targeted. Here, we exploit novel computational and theoretical approaches to design a set of peptides that are able to bind Hsp90 and compete for its interaction with the co-chaperone Cdc37, which is found to be responsible for the promotion of cancer cell proliferation. In spite of their capability to disrupt the Hsp90–Cdc37 interaction, n...
Methods in molecular biology (Clifton, N.J.), 2017
In this chapter we present two methods related to the rational design of inhibitory peptides. PepCrawler: a tool for deriving binding peptides from protein-protein complexes and for predicting protein-peptide complexes. Given an initial protein-peptide complex, the method detects improved predicted peptide binding conformations which bind the protein with higher affinity. This program is a robotics-motivated algorithm, representing the peptide as a robotic arm moving among obstacles and exploring its conformational space in an efficient way. PinaColada: a peptide design program for the discovery of novel peptide candidates that inhibit protein-protein interactions (PPIs). PinaColada uses PepCrawler while introducing sequence mutations, in order to find novel inhibitory peptides for PPIs. It uses the ant colony optimization approach to explore the peptide's sequence space, while using PepCrawler in the refinement stage.
Motivation: Design of protein-protein interaction (PPI) inhibitors is a major challenge in Structural Bioinformatics. Peptides, especially short ones (5-15 amino acids long), are natural candidates for inhibition of protein-protein complexes due to several attractive features such as high structural compatibility with the protein binding site (mimicking the surface of one of the proteins), small size and the ability to form strong hotspot binding connections with the protein surface. Efficient rational peptide design is still a major challenge in computer aided drug design, due to the huge space of possible sequences, which is exponential in the length of the peptide, and the high flexibility of peptide conformations. Results: In this article we present PinaColada, a novel computational method for the design of peptide inhibitors for protein-protein interactions. We employ a version of the ant colony optimization heuristic, which is used to explore the exponential space (20^n) of length-n peptide sequences, in combination with our fast robotics-motivated PepCrawler algorithm, which explores the conformational space for each candidate sequence. PinaColada runs in parallel on a DELL PowerEdge 2.8 GHz computer with 20 cores and 256 GB memory, and takes up to 24 h to design a peptide 5-15 amino acids long.
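To give a feel for the exponential sequence space mentioned above, the following minimal sketch evaluates 20^n for a few peptide lengths in the 5-15 range. The function name and the choice of lengths are illustrative assumptions, not part of PinaColada.

```python
def sequence_space_size(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct peptide sequences of the given length.

    Assumes the 20 standard amino acids and no constraints on the sequence.
    """
    return alphabet_size ** length


# Illustrative lengths spanning the 5-15 residue range quoted in the abstract.
for n in (5, 10, 15):
    print(f"n = {n:2d}: {sequence_space_size(n):.3e} sequences")
```

Even the shortest case, n = 5, already gives 3.2 million sequences, which is why exhaustive enumeration combined with a conformational search per sequence is impractical and a heuristic such as ant colony optimization is used instead.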
Motivation: A highly efficient template-based protein-protein docking algorithm, nicknamed SnapDock, is presented. It employs a Geometric Hashing-based structural alignment scheme to align the target proteins to the interfaces of non-redundant protein-protein interface libraries. Docking of a pair of proteins utilizing the 22 600 interface PIFACE library is performed in < 2 min on average. A flexible version of the algorithm allowing hinge motion in one of the proteins is presented as well. Results: To evaluate the performance of the algorithm, a blind remodelling of 3547 PDB complexes, which were uploaded after the PIFACE publication, was performed with a success ratio of about 35%. Interestingly, a similar experiment with the template-free PatchDock docking algorithm yielded a success rate of about 23%, with roughly 1/3 of the solutions different from those of SnapDock. Consequently, the combination of the two methods gave a 42% success ratio.
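The combined figure can be sanity-checked with simple arithmetic. The sketch below assumes one particular reading of the abstract: that about one third of PatchDock's successes fall on complexes where SnapDock did not succeed, so only that third adds to the total. The function and variable names are hypothetical.

```python
def combined_success_rate(rate_a: float, rate_b: float,
                          fraction_b_not_in_a: float) -> float:
    """Success rate of using two methods together.

    rate_a: success rate of the first method.
    rate_b: success rate of the second method.
    fraction_b_not_in_a: share of the second method's successes that the
        first method missed (an assumed interpretation of "different").
    """
    return rate_a + rate_b * fraction_b_not_in_a


# Rounded values quoted in the abstract: about 35%, about 23%, roughly 1/3.
estimate = combined_success_rate(0.35, 0.23, 1 / 3)
print(f"estimated combined success rate: {estimate:.1%}")
```

Under this reading the estimate comes out a little under 43%, which agrees with the reported 42% given that all three inputs are rounded.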
[1988 Proceedings] 9th International Conference on Pattern Recognition, 2000
A method for recognition of partially occluded and overlapping objects in composite scenes is presented. The objects to be recognized belong to a large database of model objects which are known in advance. A geometric hashing technique which is an improvement of a previous technique due to Schwartz and Sharir is proposed. The algorithm's complexity is linear in the number of sample points on the boundary of the composite scene. Experimental results from large databases are presented, including cases that could not be successfully solved by the previous technique. 1. Introduction. A major task in robotic vision is the identification and localization of objects in the robot's workspace. The objects to be recognized may overlap and be partially occluded. Since in an industrial environment one is usually faced with a limited number of possible objects which are known in advance, it is practical to build a model-based system (see [1]). In such a system the known objects are precompiled, creating a model database, and this database is used to recognize and localize objects in an image scene. In this paper we suggest a new 2-D model-based recognition scheme. It allows recognition of flat objects, or 3-D objects having a small
The building block protein folding model states that the native protein structure is the product of a combinatorial assembly of relatively structurally independent contiguous parts of the protein that possess a hydrophobic core, i.e., building blocks (BBs). According to this model, our group proposed a three-stage scheme for a feasible time-wise semi ab initio protein structure prediction. Given a protein sequence, at the first stage of the prediction scheme, we propose cutting the sequence into structurally assigned BBs. Next, we perform a combinatorial assembly and attempt to predict the relative three-dimensional arrangement of the BBs. In the third stage, we refine and rank the assemblies. The scheme has proven to be very promising in reducing the complexity of the protein folding problem and gaining insight into the protein folding process. In this chapter, we describe the different stages of the scheme and discuss a possible application of the model to protein design.
Combinatorial Chemistry High Throughput Screening, Sep 1, 1999
Here we examine the recognition of small molecules by their protein and DNA receptors. We focus on two questions: first, how well does the solid angle molecular surface representation perform in fitting together the surfaces of small ligands, such as drugs and cofactors, to their corresponding receptors; and second, in particular, to what extent does shape complementarity play a role in the matching (recognition) process of such small molecules. Both questions have been investigated in protein-protein binding: "Critical Points" based on solid angle calculations have been shown to perform well in the matching of large protein molecules. They are robust, may be few in number, and capture the molecular shape satisfactorily. Shape complementarity has been shown to be a critical factor in protein-protein recognition, but has not been examined in drug-receptor recognition. To probe these questions, here we dock 185 receptor-small ligand molecule pairs. We find that such a representation performs adequately for the smaller ligands too, and that shape complementarity is also observed. These issues are important, given the large databases of drugs that routinely have to be scanned to find candidate lead compounds. We have been able to carry out such large scale docking experiments owing to our efficient, computer-vision based docking algorithms. Their fast CPU matching times, on the order of minutes on a PC, allow such large scale docking experiments.
Proteins Structure Function and Bioinformatics, Jun 1, 2006
Correlated mutations have been repeatedly exploited for intramolecular contact map prediction. Over the last decade these efforts yielded several methods for measuring correlated mutations. Nevertheless, the application of correlated mutations for the prediction of intermolecular interactions has not yet been explored. This gap is due to several obstacles, such as 3D complexes availability, paralog discrimination, and the availability of sequence pairs that are required for inter- but not intramolecular analyses. Here we selected for analysis fusion protein families that bypass some of these obstacles. We find that several correlated mutation measurements yield reasonable accuracy for intramolecular contact map prediction on the fusion dataset. However, the accuracy level drops sharply in intermolecular contacts prediction. This drop in accuracy does not always occur. In the Cohesin-Dockerin family, reasonable accuracy is achieved in the prediction of both intra- and intermolecular contacts. The Cohesin-Dockerin family is well suited for correlated mutation analysis. Because, however, this family constitutes a special case (it has radical mutations, has domain repeats, within each species each Dockerin domain interacts with each Cohesin domain, see below), the successful prediction in this family does not point to a general potential in using correlated mutations for predicting intermolecular contacts. Overall, the results of our study indicate that current methodologies of correlated mutations analysis are not suitable for large-scale intermolecular contact prediction, and thus cannot assist in docking. With current measurements, sequence availability, sequence annotations, and underdeveloped sequence pairing methods, correlated mutations can yield reasonable accuracy only for a handful of families. Proteins 2006;63:832-845.
Cryo-EM has become an increasingly powerful technique for elucidating the structure, dynamics and function of large flexible macromolecule assemblies that cannot be determined at atomic-resolution. A major challenge in analyzing EM maps of complexes is the identification of their subunits. We propose a fully automated highly efficient method for discovering high-resolution subunits of a complex, given as an intermediate resolution
Geometric Hashing: A General and Efficient Model-Based Recognition Scheme. Yehezkel Lamdan and Haim J. Wolfson, Robotics Research Laboratory, Courant Inst. of Math., NYU, 715 Broadway, 12th floor, New York, NY 10003. Abstract: A general method for model-based ...
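As a rough illustration of the geometric hashing idea for 2-D point sets, the sketch below stores, for every ordered pair of model points used as a basis, the quantized coordinates of the remaining points in that basis frame, and then recognizes a scene by voting. It is a minimal toy version under stated assumptions (similarity transformations only, a fixed bin size of 0.25, no occlusion handling or verification step), and all names, models and parameters are hypothetical rather than taken from the paper.

```python
from collections import defaultdict
from itertools import permutations


def basis_coords(p, a, b):
    """Coordinates of p in the frame that maps a to (0, 0) and b to (1, 0).

    The complex ratio (p - a) / (b - a) is unchanged by translation,
    rotation and uniform scaling of all three points.
    """
    z = (complex(*p) - complex(*a)) / (complex(*b) - complex(*a))
    return z.real, z.imag


def quantize(xy, bin_size=0.25):
    """Map invariant coordinates to a hash-table bin (nearest grid node)."""
    return round(xy[0] / bin_size), round(xy[1] / bin_size)


def build_table(models, bin_size=0.25):
    """Preprocessing: hash every model point under every ordered basis pair."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k not in (i, j):
                    key = quantize(basis_coords(p, pts[i], pts[j]), bin_size)
                    table[key].append((name, i, j))
    return table


def recognize(table, scene, basis=(0, 1), bin_size=0.25):
    """Recognition: pick one scene basis and vote for (model, basis) entries.

    Returns ((model_name, i, j), votes) for the best entry, or None.
    """
    votes = defaultdict(int)
    a, b = scene[basis[0]], scene[basis[1]]
    for k, p in enumerate(scene):
        if k in basis:
            continue
        for entry in table.get(quantize(basis_coords(p, a, b), bin_size), []):
            votes[entry] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None


# Two toy models; "bar" is collinear, so it cannot match an off-axis scene.
models = {
    "tri": [(0, 0), (2, 0), (2, 1), (0, 3)],
    "bar": [(0, 0), (1, 0), (5, 0), (9, 0)],
}
table = build_table(models)

# Scene: model "tri" rotated by 90 degrees, scaled by 2, shifted by (5, 5).
scene = [(5, 5), (5, 9), (3, 9), (-1, 5)]
best = recognize(table, scene)
print(best)
```

In a fuller implementation one would try several scene bases, tolerate noise by also checking neighbouring bins, and verify the winning candidate by computing the actual transformation.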
Here we address the following questions. How many structurally different entries are there in the Protein Data Bank (PDB)? How do the proteins populate the structural universe? To investigate these questions a structurally non-redundant set of representative entries was selected from the PDB. Construction of such a dataset is not trivial: (i) the considerable size of the PDB requires a large number of comparisons (there were more than 3250 structures of protein chains available in May 1994); (ii) the PDB is highly redundant, containing many structurally similar entries, not necessarily with significant sequence homology, and (iii) there is no clear-cut definition of structural similarity. The latter depends on the criteria and methods used. Here, we analyze structural similarity ignoring protein topology. To date, representative sets have been selected either by hand, by sequence comparison techniques which ignore the three-dimensional (3D) structures of the proteins or by using sequence comparisons followed by linear structural comparison (i.e. the topology, or the sequential order of the chains, is enforced in the structural comparison). Here we describe a 3D sequence-independent automated and efficient method to obtain a representative set of protein molecules from the PDB which contains all unique structures and which is structurally non-redundant. The method has two novel features. The first is the use of strictly structural criteria in the selection process without taking into account the sequence information. To this end we employ a fast structural comparison algorithm which requires on average approximately 2 s per pairwise comparison on a workstation. The second novel feature is the iterative application of a heuristic clustering algorithm that greatly reduces the number of comparisons required.
We obtain a representative set of 220 chains with resolution better than 3.0 Å, or 268 chains including lower resolution entries, NMR entries and models. The resulting set can serve as a basis for extensive structural classification and studies of 3D recurring motifs and of sequence-structure relationships. The clustering algorithm succeeds in classifying into the same structural family chains with no significant sequence homology, e.g. all the globins in one single group, all the trypsin-like serine proteases in another or all the immunoglobulin-like folds into a third. In addition, unexpected structural similarities of interest have been automatically detected between pairs of chains. A cluster analysis of the representative structures demonstrates the way the "structural universe" is populated.
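The abstract does not spell out the heuristic clustering algorithm, so the following is only a sketch of one common way to cut down pairwise comparisons: greedy "leader" clustering, in which each new item is compared with the current representatives only rather than with every other item. The function name, the toy data and the similarity test are all assumptions made for illustration; a real application would plug in a structural comparison routine.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pick_representatives(
    items: Sequence[T],
    is_similar: Callable[[T, T], bool],
) -> Tuple[List[T], Dict[int, List[T]]]:
    """Greedy leader clustering.

    Each item joins the first representative it is similar to; otherwise it
    becomes a new representative. Returns the representatives and a mapping
    from representative index to cluster members.
    """
    representatives: List[T] = []
    clusters: Dict[int, List[T]] = {}
    for item in items:
        for rep_index, rep in enumerate(representatives):
            if is_similar(rep, item):
                clusters[rep_index].append(item)
                break
        else:
            # No existing representative is similar: start a new cluster.
            clusters[len(representatives)] = [item]
            representatives.append(item)
    return representatives, clusters


# Toy stand-in for structures: numbers that count as "similar" within 1.0.
toy_items = [1.0, 1.5, 5.0, 5.2, 9.0]
reps, groups = pick_representatives(toy_items, lambda a, b: abs(a - b) <= 1.0)
print(reps)
print(groups)
```

With k representatives and n items this needs at most n times k comparisons instead of roughly n squared over two, which matters when each comparison costs on the order of seconds.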
Predicting the functional sites of a protein from its structure, such as the binding sites of sma... more Predicting the functional sites of a protein from its structure, such as the binding sites of small molecules, other proteins or antibodies sheds light on its function in vivo. Currently, two classes of methods prevail: Machine Learning (ML) models built on top of handcrafted features and comparative modeling. They are respectively limited by the expressivity of the handcrafted features and the availability of similar proteins. Here, we introduce ScanNet, an end-to-end, interpretable geometric deep learning model that learns features directly from 3D structures. ScanNet builds representations of atoms and amino acids based on the spatio-chemical arrangement of their neighbors. We train ScanNet for detecting protein-protein and protein-antibody binding sites, demonstrate its accuracy - including for unseen protein folds - and interpret the filters learned. Finally, we predict epitopes of the SARS-CoV-2 spike protein, validating known antigenic regions and predicting previously unchar...
Non-coding RNAs are transcripts that do not encode proteins play key roles in many biological pro... more Non-coding RNAs are transcripts that do not encode proteins play key roles in many biological processes. The alignment of their secondary structures has become a major tool in RNA functional annotation. Many of the non-coding RNAs contain pseudoknots as a structural motif, which proved to be functionally important. We present HARP, a heuristic algorithm for the pairwise alignment of non-restricted (arbitrary) classes of pseudoknotted RNA secondary structures. HARP applies "geodesic hashing" to perform inexact matching of specially defined reduced RNA secondary structure graphs. The method proved to be efficient both in time and memory and was successfully tested on a benchmark of available experimental structures with known function. A web server is available at http://bioinfo3d.cs.tau.ac.il/HARP/.
Non-coding RNAs are transcripts that do not encode proteins play key roles in many biological pro... more Non-coding RNAs are transcripts that do not encode proteins play key roles in many biological processes. The alignment of their secondary structures has become a major tool in RNA functional annotation. Many of the non-coding RNAs contain pseudoknots as a structural motif, which proved to be functionally important. We present HARP, a heuristic algorithm for the pairwise alignment of non-restricted (arbitrary) classes of pseudoknotted RNA secondary structures. HARP applies “geodesic hashing” to perform inexact matching of specially defined reduced RNA secondary structure graphs. The method proved to be efficient both in time and memory and was successfully tested on a benchmark of available experimental structures with known function. A web server is available at http://bioinfo3d.cs.tau.ac.il/HARP/.
Proceedings. International Conference on Intelligent Systems for Molecular Biology, 2000
We present two algorithms which align flexible protein structures. Both apply efficient structura... more We present two algorithms which align flexible protein structures. Both apply efficient structural pattern detection and graph theoretic techniques. The FlexProt algorithm simultaneously detects the hinge regions and aligns the rigid subparts of the molecules. It does it by efficiently detecting maximal congruent rigid fragments in both molecules and calculating their optimal arrangement which does not violate the protein sequence order. The FlexMol algorithm is sequence order independent, yet requires as input the hypothesized hinge positions. Due its sequence order independence it can also be applied to protein-protein interface matching and drug molecule alignment. It aligns the rigid parts of the molecule using the Geometric Hashing method and calculates optimal connectivity among these parts by graph-theoretic techniques. Both algorithms are highly efficient even compared with rigid structure alignment algorithms. Typical running times on a standard desktop PC (400 MHz) are abo...
We present a novel highly efficient method for the detection of a pharmacophore from a set of dru... more We present a novel highly efficient method for the detection of a pharmacophore from a set of drug-like ligands that interact with a target receptor. A pharmacophore is a spatial arrangement of physico-chemical features in a ligand that is essential for the interaction with a specific receptor. In the absence of a known three-dimensional (3D) receptor structure, a pharmacophore can be identified from a multiple structural alignment of ligand molecules. The key advantages of the presented algorithm are: (a) its ability to multiply align flexible ligands in a deterministic manner, (b) its ability to focus on subsets of the input ligands, which may share a large common substructure, resulting in the detection of both outlier molecules and alternative binding modes, and (c) its computational efficiency, which allows to detect pharmacophores shared by a large number of molecules on a standard PC. The algorithm was extensively tested on a dataset of almost 80 ligands acting on 12 different receptors. The results, which were achieved using a set of standard default parameters, were consistent with reference pharmacophores that were derived from the bound ligand-receptor complexes. The pharmacophores detected by the algorithm are expected to be a key component in the discovery of new leads by screening large databases of drug-like molecules. A user-friendly web interface is available at http://bioinfo3d.cs.tau.ac.il/pharma. Supplementary material can be found at http://bioinfo3d.cs.tau.ac.il/pharma/reduction/.
The molecular chaperone Hsp90 is a ubiquitous ATPase-directed protein responsible for the activat... more The molecular chaperone Hsp90 is a ubiquitous ATPase-directed protein responsible for the activation and structural stabilization of a large clientele of proteins. As such, Hsp90 has emerged as a suitable candidate for the treatment of a diverse set of diseases, such as cancer and neurodegeneration. The inhibition of the chaperone through ATP-competitive inhibitors, however, was shown to lead to undesirable side effects. One strategy to alleviate this problem is the development of molecules that are able to disrupt specific protein–protein interactions, thus modulating the activity of Hsp90 only in the particular cellular pathway that needs to be targeted. Here, we exploit novel computational and theoretical approaches to design a set of peptides that are able to bind Hsp90 and compete for its interaction with the co-chaperone Cdc37, which is found to be responsible for the promotion of cancer cell proliferation. In spite of their capability to disrupt the Hsp90–Cdc37 interaction, n...
Methods in molecular biology (Clifton, N.J.), 2017
In this chapter we present two methods related to rational design of inhibitory peptides: PepCraw... more In this chapter we present two methods related to rational design of inhibitory peptides: PepCrawler: A tool to derive binding peptides from protein-protein complexes and the prediction of protein-peptide complexes. Given an initial protein-peptide complex, the method detects improved predicted peptide binding conformations which bind the protein with higher affinity. This program is a robotics motivated algorithm, representing the peptide as a robotic arm moving among obstacles and exploring its conformational space in an efficient way. PinaColada: A peptide design program for the discovery of novel peptide candidates that inhibit protein-protein interactions. PinaColada uses PepCrawler while introducing sequence mutations, in order to find novel inhibitory peptides for PPIs. It uses the ant colony optimization approach to explore the peptide's sequence space, while using PepCrawler in the refinement stage.
Motivation: Design of protein-protein interaction (PPI) inhibitors is a major challenge in Struct... more Motivation: Design of protein-protein interaction (PPI) inhibitors is a major challenge in Structural Bioinformatics. Peptides, especially short ones (5-15 amino acid long), are natural candidates for inhibition of protein-protein complexes due to several attractive features such as high structural compatibility with the protein binding site (mimicking the surface of one of the proteins), small size and the ability to form strong hotspot binding connections with the protein surface. Efficient rational peptide design is still a major challenge in computer aided drug design, due to the huge space of possible sequences, which is exponential in the length of the peptide, and the high flexibility of peptide conformations. Results: In this article we present PinaColada, a novel computational method for the design of peptide inhibitors for protein-protein interactions. We employ a version of the ant colony optimization heuristic, which is used to explore the exponential space (20 n) of length n peptide sequences, in combination with our fast robotics motivated PepCrawler algorithm, which explores the conformational space for each candidate sequence. PinaColada is being run in parallel, on a DELL PowerEdge 2.8 GHZ computer with 20 cores and 256 GB memory, and takes up to 24 h to design a peptide of 5-15 amino acids length.
Motivation: A highly efficient template-based protein-protein docking algorithm, nicknamed SnapDo... more Motivation: A highly efficient template-based protein-protein docking algorithm, nicknamed SnapDock, is presented. It employs a Geometric Hashing-based structural alignment scheme to align the target proteins to the interfaces of non-redundant protein-protein interface libraries. Docking of a pair of proteins utilizing the 22 600 interface PIFACE library is performed in < 2 min on the average. A flexible version of the algorithm allowing hinge motion in one of the proteins is presented as well. Results: To evaluate the performance of the algorithm a blind remodelling of 3547 PDB complexes, which have been uploaded after the PIFACE publication has been performed with success ratio of about 35%. Interestingly, a similar experiment with the template free PatchDock docking algorithm yielded a success rate of about 23% with roughly 1/3 of the solutions different from those of SnapDock. Consequently, the combination of the two methods gave a 42% success ratio.
[1988 Proceedings] 9th International Conference on Pattern Recognition, 2000
A method for recognition of partially occluded and overlapping objects in composite scenes is pre... more A method for recognition of partially occluded and overlapping objects in composite scenes is presented. The objects to be recognized belong to a large data base of model objects which are known in advance. A geometric hashing technique which is an improvement of a previous technique due to Schwartz and Sharir is proposed. The algorithm's complexity is linear in the number of sample points on the boundary of the composite scene. Experimental results from large databases are presented, including results that could not be successfully solved by the previous technique. 1. Introduction. A major task in robotic vision is the identification and localization of objects in the robots workspace. The objects to be recognized may overlap and be partially occluded. Since in an industrial environment one is usually faced with a limited number of possible objects which are known in advance, it is practical to build a model-based system (see [l]). In such a system the known objects are precompiled creating a model data base, and this database is used to recognize and localize objects in an image scene. In this paper we suggest a new 2-D modelbased recognition scheme. It allows recognition of flat objects, or 3-D objects having a small
The building block protein folding model states that the native protein structure is the product of a combinatorial assembly of relatively structurally independent contiguous parts of the protein that possess a hydrophobic core, i.e., building blocks (BBs). Based on this model, our group proposed a three-stage scheme for semi ab-initio protein structure prediction that is feasible in terms of running time. Given a protein sequence, in the first stage of the prediction scheme we cut the sequence into structurally assigned BBs. Next, we perform a combinatorial assembly and attempt to predict the relative three-dimensional arrangement of the BBs. In the third stage, we refine and rank the assemblies. The scheme has proven to be very promising in reducing the complexity of the protein folding problem and in gaining insight into the protein folding process. In this chapter, we describe the different stages of the scheme and discuss a possible application of the model to protein design.
Combinatorial Chemistry High Throughput Screening, Sep 1, 1999
Here we examine the recognition of small molecules by their protein and DNA receptors. We focus on two questions: first, how well does the solid angle molecular surface representation perform in fitting together the surfaces of small ligands, such as drugs and cofactors, to their corresponding receptors; and second, in particular, to what extent does shape complementarity play a role in the matching (recognition) process of such small molecules. Both questions have been investigated in protein-protein binding: "Critical Points" based on solid angle calculations have been shown to perform well in the matching of large protein molecules. They are robust, may be few in number, and capture the molecular shape satisfactorily. Shape complementarity has been shown to be a critical factor in protein-protein recognition, but has not been examined in drug-receptor recognition. To probe these questions, here we dock 185 receptor-small ligand molecule pairs. We find that such a representation performs adequately for the smaller ligands too, and that shape complementarity is also observed. These issues are important, given the large databases of drugs that routinely have to be scanned to find candidate lead compounds. We have been able to carry out such large-scale docking experiments owing to our efficient, computer-vision based docking algorithms, whose fast CPU matching times, on the order of minutes on a PC, make experiments at this scale feasible.
Proteins Structure Function and Bioinformatics, Jun 1, 2006
Correlated mutations have been repeatedly exploited for intramolecular contact map prediction. Over the last decade these efforts yielded several methods for measuring correlated mutations. Nevertheless, the application of correlated mutations for the prediction of intermolecular interactions has not yet been explored. This gap is due to several obstacles, such as the availability of 3D complexes, paralog discrimination, and the availability of sequence pairs that are required for inter- but not intramolecular analyses. Here we selected for analysis fusion protein families that bypass some of these obstacles. We find that several correlated mutation measurements yield reasonable accuracy for intramolecular contact map prediction on the fusion dataset. However, the accuracy level drops sharply in intermolecular contact prediction. This drop in accuracy does not always occur. In the Cohesin-Dockerin family, reasonable accuracy is achieved in the prediction of both intra- and intermolecular contacts. The Cohesin-Dockerin family is well suited for correlated mutation analysis. However, because this family constitutes a special case (it has radical mutations, has domain repeats, and within each species each Dockerin domain interacts with each Cohesin domain, see below), the successful prediction in this family does not point to a general potential in using correlated mutations for predicting intermolecular contacts. Overall, the results of our study indicate that current methodologies of correlated mutation analysis are not suitable for large-scale intermolecular contact prediction, and thus cannot assist in docking. With current measurements, sequence availability, sequence annotations, and underdeveloped sequence pairing methods, correlated mutations can yield reasonable accuracy only for a handful of families. Proteins 2006;63:832-845.
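For readers unfamiliar with the term, a correlated mutation measure scores how strongly two alignment columns covary across paired sequences. The sketch below uses mutual information, which is one commonly used measure; the abstract does not say which measurements were evaluated, so treating it as representative is an assumption, and the toy columns and function name are made up for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )


# Toy columns: one residue position per column, one paired sequence per character.
col_i = "AAAAVVVV"  # position in the first domain
col_j = "DDDDKKKK"  # position in the second domain that covaries with col_i
col_k = "DKDKDKDK"  # position that varies independently of col_i

print(mutual_information(col_i, col_j))  # high: the columns change together
print(mutual_information(col_i, col_k))  # zero: no covariation
```

For an intermolecular analysis the two columns come from different proteins or domains, which is why correctly paired sequences (one of the obstacles named in the abstract) are a prerequisite.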
Cryo-EM has become an increasingly powerful technique for elucidating the structure, dynamics and function of large flexible macromolecular assemblies that cannot be determined at atomic resolution. A major challenge in analyzing EM maps of complexes is the identification of their subunits. We propose a fully automated, highly efficient method for discovering high-resolution subunits of a complex, given as an intermediate resolution
Geometric Hashing: A General and Efficient Model-Based Recognition Scheme. Yehezkel Lamdan and Haim J. Wolfson, Robotics Research Laboratory, Courant Inst. of Math., NYU, 715 Broadway, 12th floor, New York, NY 10003. Abstract: A general method for model-based ...
Here we address the following questions. How many structurally different entries are there in the Protein Data Bank (PDB)? How do the proteins populate the structural universe? To investigate these questions, a structurally non-redundant set of representative entries was selected from the PDB. Construction of such a dataset is not trivial: (i) the considerable size of the PDB requires a large number of comparisons (there were more than 3250 structures of protein chains available in May 1994); (ii) the PDB is highly redundant, containing many structurally similar entries, not necessarily with significant sequence homology; and (iii) there is no clear-cut definition of structural similarity. The latter depends on the criteria and methods used. Here, we analyze structural similarity ignoring protein topology. To date, representative sets have been selected either by hand, by sequence comparison techniques which ignore the three-dimensional (3D) structures of the proteins, or by using sequence comparisons followed by linear structural comparison (i.e. the topology, or the sequential order of the chains, is enforced in the structural comparison). Here we describe a 3D sequence-independent, automated and efficient method to obtain a representative set of protein molecules from the PDB which contains all unique structures and which is structurally non-redundant. The method has two novel features. The first is the use of strictly structural criteria in the selection process without taking into account the sequence information. To this end we employ a fast structural comparison algorithm which requires on average approximately 2 s per pairwise comparison on a workstation. The second novel feature is the iterative application of a heuristic clustering algorithm that greatly reduces the number of comparisons required.
We obtain a representative set of 220 chains with resolution better than 3.0 Å, or 268 chains including lower resolution entries, NMR entries and models. The resulting set can serve as a basis for extensive structural classification and studies of 3D recurring motifs and of sequence-structure relationships. The clustering algorithm succeeds in classifying into the same structural family chains with no significant sequence homology, e.g. all the globins in one single group, all the trypsin-like serine proteases in another or all the immunoglobulin-like folds into a third. In addition, unexpected structural similarities of interest have been automatically detected between pairs of chains. A cluster analysis of the representative structures demonstrates the way the "structural universe" is populated.
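A quick calculation shows why reducing the number of comparisons matters at the quoted figures (3250 chains, about 2 s per pairwise comparison). The greedy pass that follows is only a generic illustration of comparing each item against current representatives rather than against everything; it is an assumption for the sake of the example, not the heuristic clustering algorithm of the paper, and the toy data and `similar` predicate are hypothetical.

```python
# Cost of an exhaustive all-against-all comparison at the figures quoted above.
n_chains = 3250
seconds_per_comparison = 2
all_pairs = n_chains * (n_chains - 1) // 2
days = all_pairs * seconds_per_comparison / 86_400
print(f"{all_pairs:,} comparisons, about {days:.0f} days of workstation time")


def select_representatives(items, similar):
    """Greedy pass: an item joins the first representative it resembles,
    otherwise it becomes a new representative. Returns clusters and the
    number of pairwise comparisons actually performed."""
    clusters = {}  # representative -> list of members
    comparisons = 0
    for item in items:
        for rep in clusters:
            comparisons += 1
            if similar(item, rep):
                clusters[rep].append(item)
                break
        else:
            clusters[item] = [item]
    return clusters, comparisons


# Toy stand-in for structures: numbers that are "similar" when closer than 1.
toy = [0.0, 0.2, 5.0, 5.1, 0.1, 9.0, 5.3]
clusters, comparisons = select_representatives(toy, lambda a, b: abs(a - b) < 1)
print(list(clusters), comparisons, "comparisons instead of", len(toy) * (len(toy) - 1) // 2)
```

When the data are highly redundant, as the abstract says the PDB is, most items match an early representative, so the number of comparisons grows with the number of clusters rather than with the square of the number of items.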
Papers by Haim Wolfson