
    Morgan Giddings

    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
    Data from the Encyclopedia of DNA Elements (ENCODE) project show over 9640 human genome loci classified as long noncoding RNAs (lncRNAs), yet only ∼100 have been deeply characterized to determine their role in the cell. To measure the protein-coding output from these RNAs, we jointly analyzed two recent data sets produced in the ENCODE project: tandem mass spectrometry (MS/MS) data mapping expressed peptides to their encoding genomic loci, and RNA-seq data generated by ENCODE in long polyA+ and polyA− fractions in the cell lines K562 and GM12878. We used the machine-learning algorithm RuleFit3 to regress the peptide data against RNA expression data. The most important covariate for predicting translation was, surprisingly, the Cytosol polyA− fraction in both cell lines. LncRNAs are ∼13-fold less likely to produce detectable peptides than similar mRNAs, indicating that ∼92% of GENCODE v7 lncRNAs are not translated in these two ENCODE cell lines. Intersecting 9640 lncRNA loci with 79,...
    Microbes have developed resistance to nearly every antibiotic, yet the steps leading to drug resistance remain unclear. Here we report a multistage process by which Pseudomonas aeruginosa acquires drug resistance following exposure to ciprofloxacin at levels ranging from 0.5× to 8× the initial MIC. In stage I, susceptible cells are killed en masse by the exposure. In stage II, a small, slow-growing to nongrowing population survives the antibiotic exposure but does not exhibit significantly increased resistance according to the MIC measure. In stage III, exhibited at 0.5× to 4× the MIC, a growing population emerges to reconstitute the population, and these cells display heritable increases in drug resistance of up to 50 times the original level. We studied the stage III cells by proteomic methods to uncover differences in the regulatory pathways that are involved in this phenotype, revealing upregulation of phosphorylation on two proteins, succinate-semialdehyde dehydrogenase (SSADH) and methyl...
    Genome-based peptide fingerprint scanning (GFS) directly maps several types of protein mass spectral (MS) data to the loci in the genome that may have encoded the protein. This process can be used either for protein identification or for proteogenomic mapping, which is gene-finding and annotation based on proteomic data. Inputs to the program are one or more mass spectrometry files from peptide mass fingerprinting and/or tandem MS (MS/MS), along with one or more sequences to search them against; the output is the coordinates of any matches found. This unit describes the use of GFS and subsequent results analysis.
    Membrane proteomics, the large-scale analysis of membrane proteins, is often constrained by the difficulties of achieving fully resolvable separation and resistance to proteolysis, both of which could lead to low recovery and low identification rates of membrane proteins. Here, we introduce a novel integrated approach, GELFrEE Optimized FASP Technology (GOFAST), for large-scale, comprehensive membrane protein analysis. Using an array of sample preparation techniques including gel-eluted liquid fraction entrapment electrophoresis (GELFrEE), filter-aided sample preparation (FASP), and microwave-assisted on-filter enzymatic digestion, we identified 2090 proteins from the membrane fraction of a leukemia cell line (K562). Of these, 37% are annotated as membrane proteins according to gene ontology analysis, resulting in the largest membrane proteome of leukemia cells reported to date. Our approach combines the advantages of GELFrEE's high loading capacity, gel-free separation, efficient depletion of detergents, and microwave-assisted on-filter digestion, minimizing sample losses and maximizing MS-detectable sequence coverage of individual proteins. In addition, this approach shows great potential for the identification of alternative splicing products.
    This chapter describes using the Protein Inference Engine (PIE) to integrate various types of data, especially top-down and bottom-up mass spectrometry (MS) data, to describe a protein's posttranslational modifications (PTMs). PTMs include cleavage events such as the N-terminal loss of methionine and residue modifications like phosphorylation. Modifications are key elements in many biological processes, but are difficult to study as no single, general method adequately characterizes a protein's PTMs; manually integrating data from several MS experiments is usually required. The PIE is designed to automate this process using a guess-and-refine process similar to how an expert manually integrates data. The PIE repeatedly "imagines" a possible modification set, evaluates it using available data, and then tries to improve on it. After many rounds of refinement, the resulting modification set is proposed as a candidate answer. Multiple candidate answers are generated to...
    Software for gel image analysis and base-calling in fluorescence-based sequencing consisting of two primary programs, BaseFinder and GelImager, is described. BaseFinder is a framework for trace processing, analysis, and base-calling. BaseFinder is highly extensible, allowing the addition of trace analysis and processing modules without recompilation. Powerful scripting capabilities combined with modularity and multilane handling allow the user to customize BaseFinder to virtually any type of trace processing. We have developed an extensive set of data processing and analysis modules for use with the program in fluorescence-based sequencing. GelImager is a framework for gel image manipulation. It can be used for gel visualization, lane retracking, and as a front end to the Washington University Getlanes program. The programs were designed using a cross-platform development environment, currently allowing them to run in Windows NT, Windows 95, Openstep/Mach, and Rhapsody. Work is ongo...
    We have developed a high speed instrument for automated DNA sequence analysis. The apparatus employs laser excitation and a cooled CCD detector for the parallel detection of up to 18 sets of four fluorescently labeled DNA sequencing reactions during their electrophoretic separation in ultrathin (50-100 microns) denaturing polyacrylamide gels. Four hundred and fifty bases of sequence information is obtained from 100 ng of M13 template DNA in less than one hour, corresponding to an overall instrument throughput of over 8000 bases/hr.
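    The quoted throughput follows from the per-set figures in the abstract. A quick check, assuming all 18 sample sets are read in parallel and taking the run time as exactly one hour (the abstract says less than one hour, so the result is a lower bound); the names below are illustrative only:

```python
# Figures taken from the abstract above.
SAMPLE_SETS = 18      # parallel sets of four sequencing reactions
BASES_PER_SET = 450   # bases of sequence obtained per template
RUN_HOURS = 1.0       # assumption: upper bound on the run time ("less than one hour")


def throughput_bases_per_hour(sets: int, bases_per_set: int, hours: float) -> float:
    """Total bases read across all sets, divided by the run time in hours."""
    return sets * bases_per_set / hours


rate = throughput_bases_per_hour(SAMPLE_SETS, BASES_PER_SET, RUN_HOURS)
print(rate)  # 8100.0
```

    Because the actual run is shorter than one hour, the real rate is somewhat higher, which is consistent with "over 8000 bases/hr".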
    With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the o...
    Proteogenomic searching is a useful method for identifying novel proteins, annotating genes and detecting peptides unique to an individual genome. The approach, however, can be laborious, as it often requires search segmentation and the use of several unintegrated tools. Furthermore, many proteogenomic efforts have been limited to small genomes, as large genomes can prove impractical due to the required amount of computer memory and computation time. We present Peppy, a software tool designed to perform every necessary task of proteogenomic searches quickly, accurately and automatically. The software generates a peptide database from a genome, tracks peptide loci, matches peptides to MS/MS spectra and assigns confidence values to those matches. Peppy automatically performs decoy database generation, search and analysis to return identifications at the desired false discovery rate threshold. Written in Java for cross-platform execution, the software is fully multithreaded for enhanced speed. The program can run on regular desktop computers, opening the doors of proteogenomic searching to a wider audience of proteomics and genomics researchers. Peppy is available at http://geneffects.com/peppy.
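    The target-decoy idea behind a false discovery rate threshold can be sketched as follows. This is a generic illustration, not Peppy's code: the function name, the (score, is_decoy) tuple layout, and the simple decoys/targets estimate are all assumptions made for the example.

```python
from typing import List, Tuple


def score_cutoff_at_fdr(matches: List[Tuple[float, bool]], max_fdr: float) -> float:
    """Return the lowest score cutoff whose estimated FDR is within max_fdr.

    matches: (score, is_decoy) pairs; a higher score means a better match.
    The FDR at a cutoff is estimated as decoys / targets among all matches
    scoring at or above that cutoff. Returns float("inf") if no cutoff works.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = float("inf")
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate once every match sharing this score has been counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical scores: True marks a hit against the decoy database.
example = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, True), (6.1, False),
]
print(score_cutoff_at_fdr(example, 0.25))  # 7.5
```

    With the example data, four target hits and one decoy hit score at or above 7.5, giving an estimated FDR of 0.25; lowering the cutoff further would admit a second decoy and exceed the threshold.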
