
Mudita Singhal

... susan.havre@pnl.gov, mudita.singhal@pnl.gov, debbie.payne@pnl.gov, bobbie-jo.webb-robertson@pnl.gov ... The proteome is an essential key to understanding the complex processes of cells. ...
The use of computer tools and technologies is unavoidable when it comes to conducting mass spectrometry (MS) research at any significant level. This is mainly due to the large volume of MS data and the processing rates required. Most of the existing tools focus on one particular task: be it storing and maintaining the data or visualizing the dataset to
Scientists face an ever-increasing challenge in investigating biological systems with high-throughput experimental methods such as mass spectrometry and gene arrays because of the scale and complexity of the data and the need to integrate results broadly with other heterogeneous types of information. Many analyses require merging the experimental results with datasets returned from public databases, such as those hosted
Human experts can annotate peaks in MALDI-TOF profiles of detached N-glycans with some degree of accuracy. Even though MALDI-TOF profiles give only intact masses without any fragmentation information, expert knowledge of the most common glycans and biosynthetic pathways in the biological system can point to a small set of most likely glycan structures at the "cartoon" level of detail. Cartoonist is a recently developed, fully automatic annotation tool for MALDI-TOF glycan profiles. Here we benchmark Cartoonist's automatic annotations against human expert annotations on human and mouse N-glycan data from the Consortium for Functional Glycomics. We find that Cartoonist and expert annotations largely agree, but the expert tends to annotate more specifically, meaning fewer suggested structures per peak, and Cartoonist more comprehensively, meaning more annotated peaks. On peaks for which both Cartoonist and the expert give unique cartoons, the two cartoons agree in over 90% of all cases. This article is part of a Special Issue entitled: Computational Proteomics.
Processing rules in a distributed active database involves evaluating distributed rules correctly. For databases whose rules are not changed very often, the performance of the distributed rule evaluation algorithm plays the key role in the overall performance of distributed active databases. In this paper, we study the performance of a distributed rule evaluation algorithm. An analytical model is developed
The Software Environment for BIological Network Inference (SEBINI) has been created to provide an interactive environment for the deployment and testing of network inference algorithms that use high-throughput expression data. Networks inferred from the SEBINI software ...
Searching a large document collection to learn about a broad subject involves the iterative process of figuring out what to ask, filtering the results, identifying useful documents, and deciding when one has covered enough material to stop searching. We are calling this activity "discoverage," discovery of relevant material and tracking coverage of that material. We built a visual analytic tool called Footprints that uses multiple coordinated visualizations to help users navigate through the discoverage process. To support discovery, Footprints displays topics extracted from documents that provide an overview of the search space and are used to construct searches visuospatially. Footprints allows users to triage their search results by assigning a status to each document (To Read, Read, Useful), and those status markings are shown on interactive histograms depicting the user's coverage through the documents across dates, sources, and topics. Coverage histograms help users notice biases in their search and fill any gaps in their analytic process. To create Footprints, we used a highly iterative, user-centered approach in which we conducted many evaluations during both the design and implementation stages and continually modified the design in response to feedback.
... Joshua N. Adkins, Pacific Northwest National Lab, Richland, WA, USA, +1-509-371-6583, joshua.adkins@pnl.gov; Roslyn Brown, Pacific Northwest National Lab, Richland, WA, USA, +1-509-371-7629, roslyn.brown@pnl.gov ...
... Anuj R. Shah, Mudita Singhal, Tara D. Gibson, Chandrika Sivaramakrishnan, Katrina M. Waters, Ian Gorton*, Pacific Northwest National Laboratory ... provides a consistent framework that combines a database management system, a collection of tools to store, manage, and query ...
Attaining a detailed understanding of the various biological networks in an organism lies at the core of the emerging discipline of systems biology. A precise description of the relationships formed between genes, mRNA molecules, and proteins is a necessary step toward a complete description of the dynamic behavior of an organism at the cellular level, and toward intelligent, efficient, and directed modification of an organism. The importance of understanding such regulatory, signaling, and interaction networks has fueled the development of numerous in silico inference algorithms, as well as new experimental techniques and a growing collection of public databases. The Software Environment for BIological Network Inference (SEBINI) has been created to provide an interactive environment for the deployment, evaluation, and improvement of algorithms used to reconstruct the structure of biological regulatory and interaction networks. SEBINI can be used to analyze high-throughput gene expression, protein abundance, or protein activation data via a suite of state-of-the-art network inference algorithms. It also allows algorithm developers to compare and train network inference methods on artificial networks and simulated gene expression perturbation data. SEBINI can therefore be used by software developers wishing to evaluate, refine, or combine inference techniques, as well as by bioinformaticians analyzing experimental data. Networks inferred from the SEBINI software platform can be further analyzed using Collective Analysis of Biological Interaction Networks (CABIN), an exploratory data analysis tool that enables integration and analysis of protein-protein interaction and gene-to-gene regulatory evidence obtained from multiple sources. The collection of edges in a public database, along with the confidence held in each edge (if available), can be fed into CABIN as one "evidence network," using the Cytoscape SIF file format.
Using CABIN, one may increase the confidence in individual edges in a network inferred by an algorithm in SEBINI, as well as extend such a network by combining it with species-specific or generic information, e.g., known protein-protein interactions or target genes identified for known transcription factors. Thus, the combined SEBINI-CABIN toolkit aids in the more accurate reconstruction of biological networks, with less effort, in less time. A demonstration web site for SEBINI can be accessed from https://www.emsl.pnl.gov/SEBINI/RootServlet . Source code and the PostgreSQL database schema are available under an open-source license (contact: ronald.taylor@pnl.gov). For commercial use, some algorithms included in SEBINI require licensing from the original developers. CABIN can be downloaded from http://www.sysbio.org/dataresources/cabin.stm (contact: mudita.singhal@pnl.gov).
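As a sketch of the SIF exchange step described above, the following hypothetical reader turns Cytoscape SIF lines (a source node, a relationship type, and one or more target nodes per line) into an edge list. It is illustrative only, not CABIN's actual import code.

```python
# Minimal sketch of reading a Cytoscape SIF "evidence network".
# Each SIF line has the form: <source> <relationship> <target> [<target> ...]
# Illustration only; not CABIN's implementation.

def parse_sif(lines):
    """Yield (source, relationship, target) edges from SIF-formatted lines."""
    edges = []
    for line in lines:
        # SIF files may be tab- or whitespace-delimited.
        parts = line.split("\t") if "\t" in line else line.split()
        if len(parts) < 3:
            continue  # blank line or isolated node: contributes no edges
        source, relationship = parts[0], parts[1]
        for target in parts[2:]:
            edges.append((source, relationship, target))
    return edges

sif_lines = [
    "geneA pp geneB geneC",   # protein-protein edges A-B and A-C
    "tfX pd geneA",           # protein-DNA (regulatory) edge
]
print(parse_sif(sif_lines))
# [('geneA', 'pp', 'geneB'), ('geneA', 'pp', 'geneC'), ('tfX', 'pd', 'geneA')]
```

A real evidence network would also carry a per-edge confidence, typically as an extra attribute file alongside the SIF topology.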
We present a platform for the reconstruction of protein-protein interaction networks inferred from Mass Spectrometry (MS) bait-prey data. The Software Environment for Biological Network Inference (SEBINI), an environment for the deployment of network inference algorithms that use high-throughput data, forms the platform core. Among the many algorithms available in SEBINI is the Bayesian Estimator of Probabilities of Protein-Protein Associations (BEPro3) algorithm, which is used to infer interaction networks from such MS affinity isolation data. Also, the pipeline incorporates the Collective Analysis of Biological Interaction Networks (CABIN) software. We have thus created a structured workflow for protein-protein network inference and supplemental analysis.
For the SC|06 analytics challenge, we demonstrate an end-to-end solution for processing data produced by high-throughput mass spectrometry (MS)-based proteomics so biological hypotheses can be explored. This approach is based on a tool called the Bioinformatics Resource Manager (BRM), which interacts with high-performance architectures and experimental data sources to provide high-throughput analytics for a specific experimental dataset. Peptide identification
For scientific data visualizations, real-time data streams present many interesting challenges when compared to static data. Real-time data are dynamic, transient, high-volume and temporal. Effective visualizations need to accommodate dynamic data behavior and to present the data in ways that make sense to, and are usable by, humans. The Visual Content Analysis of Real-Time Data Streams project at the Pacific Northwest National Laboratory is researching and prototyping dynamic visualization techniques and tools to help facilitate human understanding and comprehension of high-volume, real-time data. The general strategy of the project is to develop and evolve visual contexts that will organize and orient high-volume dynamic data in conceptual and perceptive views. The goal is to allow users to quickly grasp dynamic data in forms that are intuitive and natural without requiring intensive training in the use of specific visualization or analysis tools and methods. Thus...
Proteins play a key role in cellular processes, making proteomics central to understanding systems biology. MS techniques provide a means to observe entire proteomes at a global level. Yet, high-throughput MS proteomics techniques generate data faster than it can currently be analyzed. The success of proteomics depends on high-throughput experimental techniques coupled with sophisticated visual analysis and data-mining methods. Visual analysis has been applied successfully in a number of fields plagued with huge, complex data sets and will likely be an important tool in proteomics discovery. PQuad, a novel visualization of MS proteomics data, provides powerful analysis capabilities that support a number of proteomic data applications. In particular, PQuad supports differential proteomics by simplifying the comparison of peptide sets from different experimental conditions as well as different protein identification or confidence scoring techniques. Finally, PQuad supports data validation and quality control by providing a variety of resolutions for huge amounts of data to reveal errors undetected by other methods.
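The differential-proteomics comparison mentioned above boils down to set operations over identified peptides. As a toy illustration (hypothetical peptide names; not PQuad's implementation):

```python
# Sketch of the set comparison underlying a differential-proteomics view:
# which peptides are shared between two conditions and which are unique.
# Peptide names are hypothetical; this is not PQuad's code.

control = {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"}
treated = {"PEPTIDEB", "PEPTIDEC", "PEPTIDED"}

shared = control & treated        # seen under both conditions
only_control = control - treated  # lost after treatment
only_treated = treated - control  # gained after treatment

print(sorted(shared))        # ['PEPTIDEB', 'PEPTIDEC']
print(sorted(only_control))  # ['PEPTIDEA']
print(sorted(only_treated))  # ['PEPTIDED']
```

The same comparison applies when the two sets come from different identification or confidence-scoring techniques rather than different experimental conditions.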
The recent advances in high-throughput data acquisition have driven a revolution in the study of human disease and determination of molecular biomarkers of disease states. It has become increasingly clear that many of the most important human diseases arise as the result of a complex interplay between several factors including environmental factors, such as exposure to toxins or pathogens, diet, lifestyle, and the genetics of the individual patient. Recent research has begun to describe these factors in the context of networks which describe relationships between biological components, such as genes, proteins and metabolites, and have made progress towards the understanding of disease as a dysfunction of the entire system, rather than, for example, mutations in single genes. We provide a summary of some of the recent work in this area, focusing on how the integration of different kinds of complementary data, and analysis of biological networks and pathways can lead to discovery of r...
The importance of understanding biological interaction networks has fueled the development of numerous interaction data generation techniques, databases and prediction tools. However, no prediction tool or database identifies interactions with complete accuracy. Generating high-confidence interaction networks is the first step toward deciphering unknown protein functions, determining protein complexes, and discovering drugs. CABIN (Collective Analysis of Biological Interaction Networks) is an exploratory data analysis tool that enables analysis and integration of interaction evidence obtained from multiple sources, thereby increasing the confidence of computational predictions as well as validating experimental observations. CABIN is written in Java and is available as a plugin for Cytoscape, an open-source network visualization tool.
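One simple way evidence from multiple sources can be combined into a single edge confidence is a noisy-OR over per-source scores, which rewards agreement between independent sources. This is a sketch of the general idea, not CABIN's actual scoring method:

```python
# Sketch: combining per-source confidence scores for the same interaction edge.
# A noisy-OR treats the sources as independent; an edge is "absent" only if
# every source fails to support it. Illustrative only, not CABIN's method.

def combine_confidences(scores):
    """Noisy-OR: P(edge) = 1 - prod(1 - p_i) over evidence sources."""
    p_absent = 1.0
    for p in scores:
        p_absent *= (1.0 - p)
    return 1.0 - p_absent

# An edge supported weakly by two databases and strongly by one experiment:
print(round(combine_confidences([0.3, 0.4, 0.8]), 3))  # 0.916
```

The combined score exceeds any single source's score, capturing the intuition that independent corroboration raises confidence.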
Biologists and bioinformaticists face the ever-increasing challenge of managing large datasets queried from diverse data sources. Genomics and proteomics databases such as the National Center for Biotechnology Information (NCBI), Kyoto Encyclopedia of Genes and Genomes (KEGG), and the European Molecular Biology Laboratory (EMBL) are becoming the standard biological data department stores that biologists visit on a regular basis to obtain the supplies necessary for conducting their research. However, much of the data that biologists retrieve from these databases needs to be further managed and organized in a meaningful way so that the researcher can focus on the problem that they are trying to investigate and share their data and findings with other researchers. We are working towards developing a problem-solving environment called the Computational Cell Environment (CCE) that provides connectivity to these diverse data stores and provides data retrieval, management, and analysis through all aspects of biological study. In this paper we discuss the system and database design of CCE. We also outline a few problems encountered at various stages of its development and the design decisions taken to resolve them.
The recent development of high-throughput proteomics techniques has resulted in the exponential growth of experimental proteomics data. At the same time, the amount of published biological information--which includes not only journal articles but also gene sequences, ...
Systems biology research demands the availability of tools and technologies that span a comprehensive range of computational capabilities, including data management, transfer, processing, integration, and interpretation. To address these needs, we have created the bioinformatics resource manager (BRM), a scalable, flexible, and easy-to-use tool for biologists to undertake complex analyses. This paper describes the underlying software architecture of the BRM that integrates multiple commodity platforms to provide a highly extensible and scalable software infrastructure for bioinformatics. The architecture integrates a J2EE 3-tier application with an archival experimental data management system, the GAGGLE framework for desktop tool integration, and the MeDICi integration framework for high-throughput data analysis workflows. This architecture facilitates a systems biology software solution that enables the entire spectrum of scientific activities, from experimental data access to hig...
Simulation and modeling is becoming one of the standard approaches to understanding complex biochemical processes. Therefore, there is a great need for software tools that allow access to diverse simulation and modeling methods as well as support for the use of these methods. Here, we present a new software tool that is platform-independent, user-friendly and offers several unique features. In addition, we discuss numerical considerations and support for switching between simulation methods.
Motivation: Simulation and modeling is becoming a standard approach to understanding complex biochemical processes. Therefore, there is a great need for software tools that allow access to diverse simulation and modeling methods as well as support for the use of these methods. Results: Here, we present COPASI, a platform-independent and user-friendly biochemical simulator that offers several unique features. We discuss numerical issues with these features; in particular, the criteria to switch between stochastic ...
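The stochastic methods such a simulator offers are typically Gillespie-type algorithms. As a rough illustration of what a stochastic biochemical simulation looks like, here is a minimal direct-method sketch for a single first-order degradation reaction; names and parameters are hypothetical, and this is not COPASI's code:

```python
import random

# Minimal Gillespie direct-method sketch for a single reaction, A -> (degraded),
# with stochastic rate constant k. Illustrative only; COPASI implements far more
# general and optimized deterministic and stochastic solvers.

def gillespie_decay(n_a, k, t_end, seed=42):
    """Simulate first-order decay of n_a molecules; return a (time, count) trajectory."""
    rng = random.Random(seed)
    t, traj = 0.0, [(0.0, n_a)]
    while n_a > 0:
        propensity = k * n_a               # a(x) = k * [A] for first-order decay
        t += rng.expovariate(propensity)   # waiting time to next event ~ Exp(a(x))
        if t > t_end:
            break
        n_a -= 1                           # fire the reaction: one A is removed
        traj.append((t, n_a))
    return traj

trajectory = gillespie_decay(100, 0.1, 50.0)
print(len(trajectory), trajectory[-1])
```

Switching criteria of the kind the abstract mentions typically compare molecule counts or propensities against thresholds, since stochastic effects matter most when counts are small.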