David Baker


    • I am very lucky to be working with wonderful colleagues on a fascinating problem: how to model biomolecular systems w...
    In 2008, a successful computational design procedure was reported that yielded active enzyme catalysts for the Kemp elimination. Here, we studied these proteins together with a set of previously unpublished inactive designs to determine the sources of activity or lack thereof, and to predict which of the designed structures are most likely to be catalytic. Methods that range from quantum mechanics (QM) on truncated model systems to the treatment of the full protein with ONIOM QM/MM and AMBER molecular dynamics (MD) were explored. The most effective procedure involved molecular dynamics, and a general MD protocol was established. Substantial deviations from the ideal catalytic geometries were observed for a number of designs. Penetration of water into the catalytic site and insufficient residue-packing around the active site are the main factors that can cause enzyme designs to be inactive. Where in the past, computational evaluations of designed enzymes were too time-extensive for p...
    The creation of enzymes capable of catalyzing any desired chemical reaction is a grand challenge for computational protein design. Using new algorithms that rely on hashing techniques to construct active sites for multistep reactions, we designed retro-aldolases that use four different catalytic motifs to catalyze the breaking of a carbon-carbon bond in a nonnatural substrate. Of the 72 designs that were experimentally characterized, 32, spanning a range of protein folds, had detectable retro-aldolase activity. Designs that used an explicit water molecule to mediate proton shuffling were significantly more successful, with rate accelerations of up to four orders of magnitude and multiple turnovers, than those involving charged side-chain networks. The atomic accuracy of the design process was confirmed by the x-ray crystal structure of active designs embedded in two protein scaffolds, both of which were nearly superimposable on the design model.
    Water‐mediated hydrogen bonds play critical roles at protein–protein and protein–nucleic acid interfaces, and the interactions formed by discrete water molecules cannot be captured using continuum solvent models. We describe a simple model for the energetics of water‐mediated hydrogen bonds, and show that, together with knowledge of the positions of buried water molecules observed in X‐ray crystal structures, the model improves the prediction of free‐energy changes upon mutation at protein–protein interfaces, and the recovery of native amino acid sequences in protein interface design calculations. We then describe a “solvated rotamer” approach to efficiently predict the positions of water molecules, at protein–protein interfaces and in monomeric proteins, that is compatible with widely used rotamer‐based side‐chain packing and protein design algorithms. Finally, we examine the extent to which the predicted water molecules can be used to improve prediction of amino acid identities an...
    Protein–small molecule docking algorithms provide a means to model the structure of protein–small molecule complexes in structural detail and play an important role in drug development. In recent years the necessity of simulating protein side‐chain flexibility for an accurate prediction of the protein–small molecule interfaces has become apparent, and an increasing number of docking algorithms probe different approaches to include protein flexibility. Here we describe a new method for docking small molecules into protein binding sites employing a Monte Carlo minimization procedure in which the rigid body position and orientation of the small molecule and the protein side‐chain conformations are optimized simultaneously. The energy function comprises van der Waals (VDW) interactions, an implicit solvation model, an explicit orientation hydrogen bonding potential, and an electrostatics model. In an evaluation of the scoring function the computed energy correlated with experimental sma...
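The Monte Carlo minimization idea named in this abstract (perturb, locally minimize, then accept or reject on the minimized energy) can be sketched on a toy problem. This is only an illustration under stated assumptions: `toy_energy`, `local_minimize`, the step sizes, and `kT` are all hypothetical stand-ins, and a single coordinate replaces the ligand rigid-body pose and side-chain conformations that the actual method optimizes.

```python
import math
import random

def toy_energy(x):
    # Hypothetical rugged 1-D landscape standing in for a docking score
    return 0.1 * x * x + math.sin(3.0 * x)

def local_minimize(f, x, step=0.01, iters=200, h=1e-5):
    # Crude gradient descent using a central finite-difference gradient
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x -= step * grad
    return x

def monte_carlo_minimization(f, x0, n_steps=200, kT=0.5, max_move=1.0, seed=0):
    rng = random.Random(seed)
    x = local_minimize(f, x0)
    e = f(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # Random perturbation followed by local minimization
        trial = local_minimize(f, x + rng.uniform(-max_move, max_move))
        e_trial = f(trial)
        # Metropolis criterion applied to the minimized energies
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = monte_carlo_minimization(toy_energy, x0=4.0)
```

Because every trial is minimized before the acceptance test, the walk moves between local minima rather than across the raw landscape, which is the design choice that distinguishes this scheme from plain Metropolis sampling.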
    Guided by the recent success of an empirical model that predicts the folding rates of small two‐state folding proteins from the relative contact order (CO) of their native structures, by a theoretical model of protein folding that predicts that the logarithm of the folding rate decreases with the protein chain length L as L^(2/3), and by the finding that the folding rates of multistate folding proteins correlate strongly with their sizes but very poorly with CO, we reexamined the dependence of folding rate on CO and L in an attempt to find a structural parameter that determines folding rates for the totality of proteins. We show that Abs_CO = CO × L predicts folding rates rather accurately for both two‐state and multistate folding proteins, as well as short peptides, and that Abs_CO scales with the protein chain length as L^(0.70 ± 0.07) for the totality of studied single‐domain proteins and peptides.
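The relation Abs_CO = CO × L can be illustrated with a short sketch. It assumes the usual definition of relative contact order as the mean sequence separation of contacting residue pairs divided by chain length; the function name `contact_orders` and the toy contact list are hypothetical, and how contacts are detected (for example, a distance cutoff) is left out.

```python
def contact_orders(contacts, chain_length):
    """Return (relative CO, Abs_CO) for a list of (i, j) residue contacts."""
    if not contacts:
        raise ValueError("need at least one contact")
    # Mean sequence separation over all contacting pairs
    mean_sep = sum(abs(i - j) for i, j in contacts) / len(contacts)
    rel_co = mean_sep / chain_length  # relative contact order (CO)
    abs_co = rel_co * chain_length    # Abs_CO = CO x L
    return rel_co, abs_co

# Hypothetical toy contact list for a 10-residue chain
toy_contacts = [(1, 4), (2, 9), (3, 7)]
rel_co, abs_co = contact_orders(toy_contacts, 10)
```

Under this definition Abs_CO is simply the mean sequence separation of the contacts, so it grows with chain length where the relative CO is normalized by it.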
    Recently we proposed a novel method of alignment–alignment comparison, COMPASS (the tool for COmparison of Multiple Protein Alignments with Assessment of Statistical Significance). Here we present several examples of the relations between PFAM protein families that were detected by COMPASS and that lead to the predictions of presently unresolved protein structures. We discuss relatively straightforward COMPASS predictions that are new and interesting to us, and that would require substantial time and effort to justify even for a skilled PSI‐BLAST user. All of the presented COMPASS hits are independently confirmed by other methods, including the ab initio structure‐prediction method ROSETTA. The tertiary structure predictions made by ROSETTA proved to be useful for improving sequence‐derived alignments, because they are based on a reasonable folding of the polypeptide chain rather than on the information from sequence databases. The ability of COMPASS to predict new relations withi...
    We have developed a phage display system that provides a means to select variants of the IgG binding domain of peptostreptococcal protein L that fold from large combinatorial libraries. The premise underlying the selection scheme is that binding of protein L to IgG requires that the protein be properly folded. Using a combination of molecular biological and biophysical methods, we show that this assumption is valid. First, the phage selection procedure strongly selects against a point mutation in protein L that disrupts folding but is not in the IgG binding interface. Second, variants recovered from a library in which the first third of protein L was randomized are properly folded. The degree of sequence variation in the selected population is striking: the variants have as many as nine substitutions in the 14 residues that were mutagenized. The approach provides a selection for “foldedness” that is potentially applicable to any small binding protein.
    We have used cluster analysis to identify recurring sequence patterns that transcend protein family boundaries. A subset of these patterns occurs predominantly in a single type of local structure in proteins. Here we characterize the three‐dimensional structures and contexts in which these sequence patterns occur, with particular attention to the interactions responsible for their structural selectivity.
    Formation of many dsDNA viruses begins with the assembly of a procapsid, containing scaffolding proteins and a multisubunit portal but lacking DNA, which matures into an infectious virion. This process, conserved among dsDNA viruses such as herpes viruses and bacteriophages, is key to forming infectious virions. Bacteriophage P22 has served as a model system for this study in the past several decades. However, how capsid assembly is initiated, where and how scaffolding proteins bind to coat proteins in the procapsid, and the conformational changes upon capsid maturation still remain elusive. Here, we report Cα backbone models for the P22 procapsid and infectious virion derived from electron cryomicroscopy density maps determined at 3.8- and 4.0-Å resolution, respectively, and the first procapsid structure at subnanometer resolution without imposing symmetry. The procapsid structures show the scaffolding protein interacting electrostatically with the N terminus (N arm) of the coat pr...
    Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as “recipes” and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and re...
    The type III secretion system (T3SS) is an interspecies protein transport machine that plays a major role in interactions of Gram-negative bacteria with animals and plants by delivering bacterial effector proteins into host cells. T3SSs span both membranes of Gram-negative bacteria by forming a structure of connected oligomeric rings termed the needle complex (NC). Here, the localization of subunits in the Salmonella enterica serovar Typhimurium T3SS NC was probed via mass spectrometry-assisted identification of chemical cross-links in intact NC preparations. Cross-links between amino acids near the amino terminus of the outer membrane ring component InvG and the carboxyl terminus of the inner membrane ring component PrgH and between the two inner membrane components PrgH and PrgK allowed for spatial localization of the three ring components within the electron density map structures of NCs. Mutational and biochemical analysis demonstrated that the amino terminus of InvG and the ca...
    Protein L consists of a single α-helix packed on a four-stranded β-sheet formed by two symmetrically opposed β-hairpins. We use a computer-based protein design procedure to stabilize a domain-swapped dimer of protein L in which the second β-turn straightens and the C-terminal strand inserts into the β-sheet of the partner. The designed obligate dimer contains three mutations (A52V, N53P, and G55A) and has a dissociation constant of ≈700 pM, which is comparable to the dissociation constant of many naturally occurring protein dimers. The structure of the dimer has been determined by x-ray crystallography and is close to the in silico model.
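To put the reported dissociation constant in energetic terms, the standard relation ΔG° = RT ln(Kd / 1 M) can be applied. The sketch below is a worked calculation only: the temperature of 298.15 K is an assumption (the abstract does not state one), and the function name `binding_free_energy` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from Kd, relative to 1 M."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)

# Kd of about 700 pM, as reported for the designed dimer; temperature is assumed
dg = binding_free_energy(700e-12)
```

At the assumed temperature this evaluates to roughly -12.5 kcal/mol.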
    Genome sequencing projects are producing linear amino acid sequences, but full understanding of the biological role of these proteins will require knowledge of their structure and function. Although experimental structure determination methods are providing high-resolution structure information about a subset of the proteins, computational structure prediction methods will provide valuable information for the large fraction of sequences whose structures will not be determined experimentally. The first class of protein structure prediction methods, including threading and comparative modeling, relies on detectable similarity spanning most of the modeled sequence and at least one known structure. The second class of methods, de novo or ab initio methods, predicts the structure from sequence alone, without relying on similarity at the fold level between the modeled sequence and any of the known structures. In this Viewpoint, we begin by describing the essential features of the methods, th...
    Accurate high-resolution refinement of protein structure models is a formidable challenge because of the delicate balance of forces in the native state, the difficulty in sampling the very large number of alternative tightly packed conformations, and the inaccuracies in current force fields. Indeed, energy-based refinement of comparative models generally leads to degradation rather than improvement in model quality, and, hence, most current comparative modeling procedures omit physically based refinement. However, despite their inaccuracies, current force fields do contain information that is orthogonal to the evolutionary information on which comparative models are based, and, hence, refinement might be able to improve comparative models if the space that is sampled is restricted sufficiently so that false attractors are avoided. Here, we use the principal components of the variation of backbone structures within a homologous family to define a small number of evolutionarily favore...
    Chaperonins are large ATP-driven molecular machines that mediate cellular protein folding. Group II chaperonins use their "built-in lid" to close their central folding chamber. Here we report the structure of an archaeal group II chaperonin in its prehydrolysis ATP-bound state at subnanometer resolution using single particle cryo-electron microscopy (cryo-EM). Structural comparison of Mm-cpn in ATP-free, ATP-bound, and ATP-hydrolysis states reveals that ATP binding alone causes the chaperonin to close slightly with a ∼45° counterclockwise rotation of the apical domain. The subsequent ATP hydrolysis drives each subunit to rock toward the folding chamber and to close the lid completely. These motions are attributable to the local interactions of specific active site residues with the nucleotide, the tight couplings between the apical and intermediate domains within the subunit, and the aligned interactions between two subunits across the rings. This mechanism of structural c...
