DOI: 10.1145/3311790.3401776

Deploying large fixed file datasets with SquashFS and Singularity

Published: 26 July 2020

Abstract

Shared high-performance computing (HPC) platforms, such as those provided by XSEDE and Compute Canada, enable researchers to carry out large-scale computational experiments at a fraction of the cost of the cloud. Most systems require the use of distributed filesystems (e.g., Lustre) to provide a large-capacity, multi-user storage environment. These filesystems suffer performance penalties as the number of files increases, due to network contention and metadata-server overhead. We demonstrate how a combination of two technologies, Singularity and SquashFS, can help developers, integrators, architects, and scientists deploy large datasets (O(10M) files) on these shared systems with minimal performance limitations. The proposed integration enables more efficient access and indexing than conventional file-based dataset installations, while providing transparent file access to users and processes. Furthermore, the approach does not require administrative privileges on the target system. While the examples studied here are taken from the field of neuroimaging, the technologies adopted are not specific to that field. Currently, this solution is limited to read-only datasets. We propose the adoption of this technology for the consumption and dissemination of community datasets across shared computing resources.
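As a concrete illustration of the workflow the abstract describes, the sketch below packs a directory of many small files into a single SquashFS image and then exposes it read-only inside a Singularity container via the --overlay option. This is a minimal, hypothetical example assembled from the standard mksquashfs and singularity command-line interfaces, not code from the paper; all paths, image names, and the container name are illustrative placeholders.

```python
# Hypothetical sketch of the SquashFS + Singularity workflow described above.
# Assumes mksquashfs (squashfs-tools) and singularity are on PATH; all paths
# below are illustrative placeholders, not taken from the paper.
import subprocess

DATASET_DIR = "/scratch/user/dataset"      # directory holding O(10M) small files
IMAGE = "/scratch/user/dataset.squashfs"   # becomes a single large file on Lustre
CONTAINER = "analysis.sif"                 # any Singularity container image

# One-time packing step: -keep-as-directory preserves the top-level directory
# name inside the image, so the files later appear under /dataset.
subprocess.run(
    ["mksquashfs", DATASET_DIR, IMAGE, "-keep-as-directory", "-no-progress"],
    check=True,
)

# Run time: the image is mounted as a read-only overlay, so processes in the
# container see ordinary files while Lustre serves only one large file.
subprocess.run(
    ["singularity", "exec", "--overlay", IMAGE, CONTAINER, "ls", "/dataset"],
    check=True,
)
```

Both steps run entirely in user space, with no administrative privileges required: metadata lookups that would otherwise hit the Lustre metadata servers are instead resolved by the kernel's SquashFS driver against a single large file.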

Supplemental Material

MP4 file: presentation video


Cited By

  • (2023) Lazy Python Dependency Management in Large-Scale Systems. 2023 IEEE 19th International Conference on e-Science (e-Science), pp. 1–10. DOI: 10.1109/e-Science58273.2023.10254910. Online publication date: 9 October 2023.


Published In

PEARC '20: Practice and Experience in Advanced Research Computing 2020: Catch the Wave
July 2020
556 pages
ISBN:9781450366892
DOI:10.1145/3311790


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Lustre
  2. Singularity
  3. SquashFS
  4. Virtualization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Canada First Research Excellence Fund/Healthy Brains for Healthy Lives

Conference

PEARC '20

Acceptance Rates

Overall acceptance rate: 133 of 202 submissions (66%)

