Massively-parallel sequencing (MPS) technologies and their diverse applications in genomics and epigenomics research have yielded enormous new insights into the physiology and pathophysiology of the human genome. The biggest hurdle remains the magnitude and diversity of the datasets generated, which compromise our ability to manage, organize, process and ultimately analyse data. The Wiki-based Automated Sequence Processor (WASP), developed at the Albert Einstein College of Medicine (hereafter Einstein), tightly couples the sequencing platform, the sequencing assay, sample metadata and the automated workflows deployed on a heterogeneous high-performance computing cluster infrastructure. These workflows yield sequenced, quality-controlled and 'mapped' sequence data, all within a single operating environment accessible through a web-based GUI. WASP at Einstein processes 4-6 TB of data per week, and since its production cycle commenced it has processed ~1 PB of data overa...
A novel DNA methylation assay, HELP-tagging, has recently been described that uses massively parallel sequencing technology for genome-wide methylation profiling. Massively parallel sequencing-based assays such as this produce substantial amounts of data, which complicate analysis and necessitate the use of significant computational resources. To simplify the processing and analysis of HELP-tagging data, a bioinformatic analytical pipeline was developed. To ensure the accuracy of the results, quality checks are performed on the data at various stages as they are processed by the pipeline. A quantitative methylation score is provided for each locus, along with a confidence score based on the amount of information available for determining the quantification. HELP-tagging analysis results are supplied in standard file formats (BED and WIG) that can be readily examined on the UCSC genome browser.
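As a rough illustration of what per-locus results in BED and WIG form might look like, the following Python sketch formats a methylation score and a confidence score for each locus. It is not the pipeline's code: the `Locus` record, the score ranges, the track names and the choice to put confidence in the BED score column and methylation in the WIG track are all assumptions made for this example. Only the BED and variableStep WIG line layouts follow the published UCSC format descriptions.

```python
from typing import List, NamedTuple


class Locus(NamedTuple):
    """Hypothetical per-locus result; field names are illustrative only."""
    chrom: str
    start: int          # 0-based start, as BED expects
    end: int            # exclusive end
    methylation: float  # assumed quantitative score (here 0-100)
    confidence: int     # assumed confidence, scaled to BED's 0-1000 score column


def to_bed(loci: List[Locus]) -> List[str]:
    """Format loci as BED5 lines, carrying confidence in the score column."""
    lines = ['track name="confidence" useScore=1']
    for i, locus in enumerate(loci, start=1):
        lines.append(
            f"{locus.chrom}\t{locus.start}\t{locus.end}\tlocus{i}\t{locus.confidence}"
        )
    return lines


def to_wig(loci: List[Locus], span: int = 1) -> List[str]:
    """Format loci as variableStep WIG lines carrying the methylation score."""
    lines = ['track type=wiggle_0 name="methylation"']
    current_chrom = None
    for locus in sorted(loci, key=lambda l: (l.chrom, l.start)):
        if locus.chrom != current_chrom:
            # A new declaration line is needed whenever the chromosome changes.
            lines.append(f"variableStep chrom={locus.chrom} span={span}")
            current_chrom = locus.chrom
        # WIG positions are 1-based, unlike BED's 0-based starts.
        lines.append(f"{locus.start + 1} {locus.methylation}")
    return lines
```

The off-by-one between the two formats (0-based BED starts, 1-based WIG positions) is the detail most worth checking when both files are loaded as custom tracks in the same browser session.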
The challenges associated with the management, analysis and interpretation of assays based on massively-parallel sequencing (MPS) are both individually complex and numerous. We describe what we believe to be the appropriate solution, one that represents a departure from traditional computational biology approaches. The Wasp System is an open source, distributed package written in Spring/J2EE that creates a foundation for development of an end-to-end solution for MPS-based experiments or clinical tests. Recognizing that one group will be unable to solve these challenges in isolation, we describe a nurtured open source development model that will allow the software to be collectively used, shared and developed. The ultimate goal is to emulate resources such as the Virtual Observatory of the astrophysics community, enabling computationally-inexpert scientists and clinicians to explore and interpret their MPS data. Here we describe the current implementation and development of the Wasp System and the roadmap for its community development.