Methods in molecular biology (Clifton, N.J.), 2018
In the "big data" era, research biologists are faced with analyzing new data types that usually require some level of computational expertise. A number of programs and pipelines exist, but acquiring the expertise to run them, and then understanding the output, can be a challenge. The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org ) has created an end-to-end analysis platform that allows researchers to take their raw reads, assemble a genome, annotate it, and then use a suite of user-friendly tools to compare it to any public data available in the repository. With close to 113,000 bacterial and more than 1000 archaeal genomes, PATRIC creates a unique research experience with "virtual integration" of private and public data. PATRIC contains many diverse tools and functionalities for exploring both genome-scale and gene expression data, but the main focus of this chapter is on assembly, annotation, and the downstream comparative analysis functiona...
This Argonne report serves as a companion to our CADE-10 paper. To fulfill promises made in that paper, we include here detailed proofs in clause notation, input files compatible with OTTER, and explanations for our choice of approach. We also include certain of the original and unpublished proofs (of Winker) that answered four open questions, two in equivalential calculus and two in the R-calculus. The organization parallels that of our CADE-10 paper. 34 refs.
The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient, and slow. There is a major bottleneck in screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it depends on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the requir...
With the growing presence of multimedia-enabled systems, we will see an integration of collaborative computing concepts into future scientific and technical workplaces. Desktop teleconferencing is common today, while more complex teleconferencing technology that relies on the availability of multipoint-enabled tools is starting to become available on PCs. A critical problem when using these collaborative tools is archiving multistream, multipoint meetings and making the content available to others. Ideally, one would like the ability to capture, record, play back, index, annotate, and distribute multimedia stream data as easily as we currently handle text or still-image data. The Argonne Voyager project is exploring and developing the media server technology needed to provide such a flexible, virtual multipoint recording/playback capability. In this article we describe the motivating requirements, architecture, implementation, operation, performance, and related work.
2018 IEEE/ACM 4th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), 2018
Advanced programming models, domain-specific languages, and scripting toolkits have the potential to greatly accelerate the adoption of high performance computing. These complex software systems, however, are often difficult to install and maintain, especially on exotic high-end systems. We consider deep learning workflows used on petascale systems and their redeployment on research clusters using containers. Containers are used to deploy the MPI-based infrastructure, but challenges in efficiency, usability, and complexity must be overcome. In this work, we address these challenges through enhancements to a unified workflow system that manages interaction with the container abstraction, the cluster scheduler, and the programming tools. We also report results from running the application on our system, harnessing 298 TFLOPS (single precision).
This white paper describes a collaborative project that brings together systems software developers, computer vendors, and applications teams to develop hardware and software systems to support scalable I/O for high performance computer systems. The project is organized around the provision of a full-scale testbed for the development and evaluation of new systems software for scalable I/O. In addition, research projects will be formed to address the scalable I/O problem from a number of perspectives, such as languages, compilers, file systems, networking software, persistent object stores, and low-level system services. Unlike the current state of parallel operating systems, where a commonly available software platform exists (i.e., Mach), vendors that today wish to provide parallel I/O capabilities for their MPP systems must largely work from scratch when developing the file systems and user software needed to support a scalable I/O system. This problem forces many vendors to duplicate ...
The race to meet the challenges of the global pandemic has served as a reminder that the existing... more The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case, developed for linear accelerators, and physics-based methods. The two in silico methods, each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it is dependent on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the requi...
The University of Chicago and Argonne National Laboratory are building a highly distributed storage system to meet the needs of a cross-disciplinary group of scientists and to investigate the issues involved in implementing such a system. The storage system is based on IBM 3590/3494 tape technology managed by ADSM software. High-speed wide-area networking is required to support the distributed user community. NFS, DFS, FTP, and other user access methods are supported.
Papers by Rick Stevens