Sasha Ames
    As the number and variety of files stored and accessed by a typical user has dramatically increased, existing file system structures have begun to fail as a mechanism for managing all of the information contained in those files. Many applications—email clients, multimedia management applications, and desktop search engines are examples—have been forced to develop their own richer metadata infrastructures. While effective, these solutions are generally non-standard, proprietary, neither portable nor sharable across applications, users, or platforms, and potentially inefficient. In the interest of providing a rich, efficient, shared file system metadata infrastructure, we have developed the Linking File System (LiFS). Taking advantage of non-volatile storage class memories, LiFS supports a wide variety of user and application metadata needs while efficiently supporting traditional file system operations.
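    As a rough illustration of the linking model, the sketch below represents files as nodes and relationships between files as directed, attributed links. The names (FileNode, Link, link_to) are hypothetical stand-ins for the idea, not the LiFS interface.

```python
# Hypothetical sketch of the LiFS data model: files as nodes, directed
# attributed links between them. All names here are illustrative.
from dataclasses import dataclass, field

@dataclass
class Link:
    target: "FileNode"
    attrs: dict = field(default_factory=dict)   # e.g. {"rel": "attachment"}

@dataclass
class FileNode:
    path: str
    attrs: dict = field(default_factory=dict)   # user/application metadata
    links: list = field(default_factory=list)   # outgoing attributed links

    def link_to(self, other, **attrs):
        self.links.append(Link(other, attrs))

# A mail message linked to its attachment, with the relationship itself
# carrying metadata -- the key idea behind LiFS.
msg = FileNode("/mail/msg-42", {"subject": "Q3 report"})
att = FileNode("/mail/att/report.pdf")
msg.link_to(att, rel="attachment", mime="application/pdf")
```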
    Metagenomic analysis, the study of microbial communities found in environmental samples, presents considerable challenges in quantity of data and computational cost. We present a novel metagenomic analysis pipeline that leverages emerging large-address-space compute nodes with NVRAM to hold a searchable, memory-mapped “k-mer” database of all known genomes and their taxonomic lineage. We describe challenges to creating the many-hundred-gigabyte databases and describe database organization optimizations that enable our Livermore Metagenomic Analysis Toolkit (LMAT) software to effectively query the k-mer key-value store, which resides in high-performance flash storage, as if fully in memory. To make database creation tractable, we have designed, implemented, and evaluated an optimized ingest pipeline. To optimize query performance for the database, we present a two-level index scheme that yields speedups of 8.4× to 74× over a conventional hash table index. LMAT, including...
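    To make the two-level idea concrete, here is a minimal sketch, assuming a dense first-level table over a short k-mer prefix and binary-searched sorted suffix lists at the second level; the class and parameter names are illustrative, not LMAT's.

```python
# Illustrative two-level k-mer index (not the LMAT code): level 1 is a dense
# table over a short prefix; level 2 holds sorted suffixes searched by
# binary search, keeping each probe local to one bucket.
import bisect

BASE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(kmer):
    v = 0
    for c in kmer:
        v = (v << 2) | BASE[c]           # 2 bits per base
    return v

class TwoLevelIndex:
    def __init__(self, k, prefix_len):
        self.k, self.p = k, prefix_len
        self.buckets = [[] for _ in range(4 ** prefix_len)]  # level 1

    def insert(self, kmer, taxid):
        code = encode(kmer)
        suffix_bits = 2 * (self.k - self.p)
        bucket = self.buckets[code >> suffix_bits]
        bisect.insort(bucket, (code & ((1 << suffix_bits) - 1), taxid))

    def query(self, kmer):
        code = encode(kmer)
        suffix_bits = 2 * (self.k - self.p)
        bucket = self.buckets[code >> suffix_bits]
        suffix = code & ((1 << suffix_bits) - 1)
        i = bisect.bisect_left(bucket, (suffix, -1))
        if i < len(bucket) and bucket[i][0] == suffix:
            return bucket[i][1]          # taxonomic label
        return None

idx = TwoLevelIndex(k=20, prefix_len=8)
idx.insert("ACGTACGTACGTACGTACGT", taxid=562)
print(idx.query("ACGTACGTACGTACGTACGT"))        # -> 562
```

    The point of the layout is locality: a lookup touches exactly one bucket, so a query against a store that is memory-mapped from flash costs few page faults.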
    File system metadata management has become a bottleneck for many data-intensive applications that rely on high-performance file systems. Part of the bottleneck is due to the limitations of an almost 50-year-old interface standard, with metadata abstractions designed at a time when high-end file systems managed less than 100 MB. Today’s high-performance file systems store 7 to 9 orders of magnitude more data, resulting in numbers of data items for which these metadata abstractions are inadequate; directory hierarchies, for example, cannot handle complex relationships among data. Users of file systems have attempted to work around these inadequacies by moving application-specific metadata management to relational databases to make metadata searchable. Splitting file system metadata management into two separate systems introduces inefficiencies and systems management problems. To address this problem, we propose QMDS: a file system metadata management service that integrates all...
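    For contrast, here is a minimal sketch of the split-system workaround the abstract critiques, with SQLite as the stand-alone metadata store; the paths and schema are hypothetical.

```python
# The "two separate systems" pattern: raw data lives in the file system,
# searchable metadata lives in a relational database, and the application
# must keep the two consistent by hand.
import sqlite3

db = sqlite3.connect("metadata.db")
db.execute("""CREATE TABLE IF NOT EXISTS meta
              (path TEXT, key TEXT, value TEXT, PRIMARY KEY (path, key))""")

def save(path, data, **meta):
    with open(path, "wb") as f:          # data goes to the file system...
        f.write(data)
    for k, v in meta.items():            # ...metadata goes to the database
        db.execute("INSERT OR REPLACE INTO meta VALUES (?, ?, ?)",
                   (path, k, v))
    db.commit()

save("/tmp/run-001.out", b"...", experiment="climate-v2")
# If the file is later renamed or deleted through the file system alone,
# the database silently goes stale -- the consistency problem QMDS targets.
```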
    Deep learning techniques have been successfully applied to solve many problems in climate and geoscience using massive-scale observed and modeled data. For extreme climate event detection, several models based on deep neural networks have recently been proposed and attain superior performance that overshadows all previous handcrafted, expert-based methods. The issue, though, is that accurate localization of events requires high-quality climate data. In this work, we propose a framework capable of detecting and localizing extreme climate events in very coarse climate data. Our framework is based on two deep neural network models: (1) convolutional neural networks (CNNs) to detect and localize extreme climate events, and (2) a pixel-recursive super-resolution model to reconstruct high-resolution climate data from low-resolution climate data. Based on our preliminary work, we have presented two CNNs in our framework for different purposes, detection and localization...
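    A heavily stubbed sketch of the two-stage pipeline follows, with numpy placeholders standing in for both networks; super_resolve and detect_events are hypothetical and do not reflect the authors' architectures.

```python
# Stage 1 reconstructs resolution, stage 2 detects and localizes events.
# Both models are stubbed: nearest-neighbor upsampling stands in for the
# pixel-recursive super-resolution network, a z-score threshold for the CNN.
import numpy as np

def super_resolve(coarse, factor=4):
    # Placeholder for the super-resolution model.
    return np.kron(coarse, np.ones((factor, factor)))

def detect_events(field, threshold=3.0):
    # Placeholder for the CNN detector: flag grid cells whose anomaly
    # exceeds `threshold` standard deviations and return their locations.
    z = (field - field.mean()) / field.std()
    return np.argwhere(z > threshold)

coarse = np.random.randn(45, 90)          # very coarse global field
fine = super_resolve(coarse)              # stage 1: reconstruct resolution
events = detect_events(fine)              # stage 2: detect and localize
print(f"{len(events)} candidate extreme-event cells")
```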
    The DREAM project was funded more than 3 years ago to design and implement a next-generation ESGF (Earth System Grid Federation [1]) architecture suitable for managing and accessing data and services resources in a distributed and scalable environment. In particular, the project intended to focus on the computing and visualization capabilities of the stack, which at the time were rather primitive. At the beginning, the team had the general notion that a better ESGF architecture could be built by modularizing each component and redefining its interaction with other components through a well-defined, exposed API. Although this remained the high-level principle that guided the work, the DREAM project was able to accomplish its goals by leveraging new practices in IT that started just about 3 or 4 years ago: the advent of containerization technologies (specifically, Docker) and the development of frameworks to manage containers at scale (Docker Swarm and Kubernetes...
    The Linking File System introduces a new storage paradigm for enhanced user productivity through relationships between files, yet it opens up new challenges in usability. LiFSBrowse is a GUI for LiFS that attempts to meet those challenges by providing customizable graphical views of the file system. LiFSBrowse supports interaction through link manipulation and file system querying. We describe the layout of LiFSBrowse in detail and illustrate its usability through a sample file system view.
    Data-intensive applications are best suited to high-performance computing architectures that contain large quantities of main memory. Creating these systems with DRAM-based main memory remains costly and power-intensive. Due to improvements in density and cost, non-volatile random access memories (NVRAM) have emerged as compelling storage technologies to augment traditional DRAM. This work explores the potential of future NVRAM technologies to store program state at performance comparable to DRAM. We have developed the PerMA NVRAM simulator, which allows us to explore applications with working sets ranging up to hundreds of gigabytes per node. The simulator is implemented as a Linux device driver that allows application execution at native speeds. Using the simulator, we show the impact of future technology generations of I/O-bus-attached NVRAM on an unstructured-access, level-asynchronous breadth-first search (BFS) graph traversal algorithm. Our simulations show that within a couple of technology generations, a system architecture with local high-performance NVRAM will be able to effectively augment DRAM to support highly concurrent data-intensive applications with large memory footprints. However, improvements will be needed in the I/O stack to deliver this performance to applications. The simulator shows that future technology generations of NVRAM, in conjunction with an improved I/O runtime, will enable parallel data-intensive applications to offload in-memory data structures to NVRAM with minimal performance loss.
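    For reference, the access pattern in question: a simple level-by-level BFS sketch (the paper's algorithm is level-asynchronous and far more concurrent), where each frontier expansion issues scattered reads into the adjacency structure.

```python
# Each frontier expansion reads arbitrary neighbor lists -- exactly the
# unstructured access that stresses NVRAM behind a memory map. In the
# paper's setting `adj` would live in NVRAM, not DRAM.
from collections import deque

def bfs(adj, source):
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:                 # random access into a huge array
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

adj = [[1, 2], [0, 3], [0, 3], [1, 2]]   # toy 4-vertex graph
print(bfs(adj, 0))                        # {0: 0, 1: 1, 2: 1, 3: 2}
```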
    Despite continual improvements in the performance and reliability of large-scale file systems, the management of file system metadata has changed little in the past decade. The mismatch between the size and complexity of large-scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, metadata, and file relationships are all first-class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS includes Quasar, an XPath-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance, comparable query performance on user-metadata-intensive operations, and superior performance on normal file metadata operations.
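    A toy sketch of the graph data model follows, assuming files and relationships both carry attributes; the follow helper only mimics the flavor of one XPath-like path step and is not Quasar syntax.

```python
# Files and relationships as first-class, attributed objects; all names
# here (FSObject, relate, follow) are illustrative, not the QFS API.
class FSObject:
    def __init__(self, **attrs):
        self.attrs = attrs
        self.out = []                    # outgoing (edge_attrs, target) pairs

    def relate(self, target, **edge_attrs):
        self.out.append((edge_attrs, target))

def follow(objs, **edge_match):
    """One path step: follow edges whose attributes match edge_match."""
    return [t for o in objs for (e, t) in o.out
            if all(e.get(k) == v for k, v in edge_match.items())]

paper = FSObject(name="qfs.pdf", type="paper")
data = FSObject(name="results.csv", type="dataset")
paper.relate(data, rel="derived-from")

# "paper --derived-from--> ?" as a one-step path query:
print([t.attrs["name"] for t in follow([paper], rel="derived-from")])
```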
    We present DI-MMAP, a high-performance runtime that memory-maps large external data sets into an application's address space and shows significantly better performance than the Linux mmap system call. Our implementation is particularly effective when used with high-performance locally attached flash arrays on highly concurrent, latency-tolerant data-intensive HPC applications. We describe the kernel module and show performance results on a benchmark test suite and on a new bioinformatics metagenomic classification application. For the complex metagenomics classification application, DI-MMAP performs up to 4.88× better than standard Linux mmap.
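    DI-MMAP itself is a kernel-level runtime, so there is nothing special to call from user code; the sketch below only illustrates the application-side pattern it accelerates, using Python's standard mmap module against a hypothetical record file.

```python
# Memory-map a large data set and issue latency-bound random reads -- the
# workload shape DI-MMAP targets. "kmers.db" and the fixed 8-byte record
# layout are assumptions for illustration.
import mmap
import random
import struct

with open("kmers.db", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    nrec = len(mm) // 8                  # assume fixed 8-byte records
    for _ in range(1000):                # scattered probes, as in a lookup-
        i = random.randrange(nrec)       # heavy metagenomic classification
        (value,) = struct.unpack_from("<Q", mm, i * 8)
    mm.close()
```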