Error-controlled Progressive Retrieval of Scientific Data under Derivable Quantities of Interest
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 86, Pages 1–16, https://doi.org/10.1109/SC41406.2024.00092
The unprecedented amount of scientific data has introduced heavy pressure on the current data storage and transmission systems. Progressive compression has been proposed to mitigate this problem, which offers data access with on-demand precision. However,...
- research-article, October 2024
Navigating the Landscape of Reproducible Research: A Predictive Modeling Approach
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 24–33, https://doi.org/10.1145/3627673.3679831
The reproducibility of scientific articles is central to the advancement of science. Despite this importance, evaluating reproducibility remains challenging due to the scarcity of ground truth data. Predictive models can address this limitation by ...
- short-paper, July 2024
Open Science Data Federation - operation and monitoring
PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 63, Pages 1–5, https://doi.org/10.1145/3626203.3670557
Extensive data processing is becoming commonplace in many fields of science. Distributing data to processing sites and providing methods to share the data with collaborators efficiently has become essential. The Open Science Data Federation (OSDF) ...
- research-article, September 2024
GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data
FlexScience'24: Proceedings of the 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, Pages 34–41, https://doi.org/10.1145/3659995.3660041
The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove ...
- extended-abstract, February 2024
Abstractive Summarization of Scientific Documents: Models and Evaluation Techniques
FIRE '23: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, Pages 121–124, https://doi.org/10.1145/3632754.3632771
Text summarization refers to the procedure of condensing the key ideas present in a single document or a group of related documents. The fundamental advantage of text summarization is that it saves the reader’s time by extracting the most crucial ...
- research-article, August 2023
FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs
HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, Pages 129–142, https://doi.org/10.1145/3588195.3592994
Today's large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing ...
- research-article, August 2023
SciDG: Benchmarking Scientific Dynamic Graph Queries
SSDBM '23: Proceedings of the 35th International Conference on Scientific and Statistical Database Management, Article No.: 4, Pages 1–12, https://doi.org/10.1145/3603719.3603724
Dynamic graphs are increasingly being utilized in domain knowledge modeling and large-scale scientific data management. Managing dynamic graph data requires a graph database system that can handle constantly changing volumes and data versions, while ...
- research-article, September 2022
The Information Ecosystem of Open Science: Key Aspects of Development
Scientific and Technical Information Processing (SPSTIP), Volume 49, Issue 3, Pages 151–158, https://doi.org/10.3103/S0147688222030042
This paper presents the results of an analysis of trends in the development of the information ecosystem of open science based on the study of the practices of the main actors (research and educational organizations, publishers, sponsors of ...
- research-article, June 2022
CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No.: 12, Pages 1–13, https://doi.org/10.1145/3524059.3532362
As HPC systems continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding. To this end, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve the I/O ...
- research-article, June 2022
PROV-IO: An I/O-Centric Provenance Framework for Scientific Data on HPC Systems
HPDC '22: Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, Pages 213–226, https://doi.org/10.1145/3502181.3531477
Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance ...
- research-article, June 2022
Ultrafast Error-bounded Lossy Compression for Scientific Datasets
HPDC '22: Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, Pages 159–171, https://doi.org/10.1145/3502181.3531473
Today's scientific high-performance computing applications and advanced instruments are producing vast volumes of data across a wide range of domains, which impose a serious burden on data transfer and storage. Error-bounded lossy compression has been ...
- research-article, June 2022
TAC: Optimizing Error-Bounded Lossy Compression for Three-Dimensional Adaptive Mesh Refinement Simulations
HPDC '22: Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, Pages 135–147, https://doi.org/10.1145/3502181.3531458
Today's scientific simulations require a significant reduction of data volume because of extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most ...
- research-article, October 2021
The Library in the Information Ecosystem of Open Science
Scientific and Technical Information Processing (SPSTIP), Volume 48, Issue 4, Pages 239–247, https://doi.org/10.3103/S0147688221040043
The results of an analysis of trends in the development of the information ecosystem of open science based on the study of the global document flow, open access resources, and scientific data repositories, as well as initiatives in the field of ...
- research-article, September 2020
cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data
Jiannan Tian, Sheng Di, Kai Zhao, Cody Rivera, Megan Hickman Fulp, Robert Underwood, Sian Jin, Xin Liang, Jon Calhoun, Dingwen Tao, Franck Cappello
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 3–15, https://doi.org/10.1145/3410463.3414624
Error-bounded lossy compression is a state-of-the-art data reduction technique for HPC applications because it not only significantly reduces storage overhead but also can retain high fidelity for postanalysis. Because supercomputers and HPC ...
- short-paper, August 2020
Keyword Recommendation Methods for Earth Science Data Considering Hierarchical Structure of Vocabularies
JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Pages 353–356, https://doi.org/10.1145/3383583.3398622
To understand and properly use scientific data, it is important that the metadata, which describes information related to data, contains sufficient information. Keywords are one of the metadata items, and for assigning keywords to the earth science ...
waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data
PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Pages 74–88, https://doi.org/10.1145/3332466.3374525
Error-bounded lossy compression is critical to the success of extreme-scale scientific research because of ever-increasing volumes of data produced by today's high-performance computing (HPC) applications. Not only can error-controlled lossy compressors ...
- research-article, March 2019
Z-checker: A framework for assessing lossy compression of scientific data
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 33, Issue 2, Pages 285–303, https://doi.org/10.1177/1094342017737147
Because of the vast volume of data being produced by today’s scientific simulations and experiments, lossy data compressor allowing user-controlled loss of accuracy during the compression is a relevant solution for significantly reducing the data size. ...
- research-article, January 2019
Recovery of scientific data using Intelligent Distributed Data Warehouse
Procedia Computer Science (PROCS), Volume 151, Issue C, Pages 1249–1254, https://doi.org/10.1016/j.procs.2019.04.180
A Retrieval System requires several components that define its functionality and behavior. In the case of a meta-search engine for the retrieval of scientific data, a schema that defines the way to store such data is considered a necessary element ...
- research-article, October 2018
Scientific Data Relevance Criteria Classification and Usage
CSAE '18: Proceedings of the 2nd International Conference on Computer Science and Application Engineering, Article No.: 30, Pages 1–7, https://doi.org/10.1145/3207677.3278010
In the big data era, scientific data plays a crucial role in scientific research. Data sharing, retrieval and usage have become an inevitable trend. We study how the users of scientific data select relevant data from the data sharing platform. The study ...
- poster, September 2018
NDN-SCI for managing large scale genomics data
ICN '18: Proceedings of the 5th ACM Conference on Information-Centric Networking, Pages 204–205, https://doi.org/10.1145/3267955.3269022
Genomics datasets are currently managed by iRODS, the Integrated Rule-Oriented Data System, which is an open source data management software. iRODS provides several services, including indexing, publishing, integrity, storage, and provenance. In this ...