SINGULARITY
CONTAINERS FOR SCIENCE
Vanessa Sochat, PhD
Research Software Engineer
Research Computing Stanford University
THE PERFECT SANDWICH
The Perfect Sandwich
1. Peanut Butter
2. Jelly
3. Bread
4. Spread on Bread
5. Eat
IS IT THE SAME SANDWICH?
SAME SAME, BUT DIFFERENT
WE COULD HAVE DONE WORSE...
Why does it taste different?
1. Our recipe was not reproducible
2. We had missing dependencies
3. The perfect sandwich might never be made again
no ability to easily distribute or validate work
Introducing Singularity
Give them the sandwich.
Container: encapsulation of system environment
LIFE’S WORK
Why not Docker?
DOCKER IS (STILL) GREAT!
Docker
Well-known container platform
Micro-service virtualization
Create + distribute containers
Reproducible
Easy to use, well documented
WHY NOT DOCKER?
Docker is not designed for, efficient for, or even compatible with traditional HPC architectures
No centers run Docker on their traditional HPC
+
HPC
HPC ADMIN
HPC USER
scientists need containers too
1. Singularity, three ways
2. Singularity Hub
3. Reproducible Science
Singularity
...three ways
How do I use it
THE SINGULARITY FLOW
Image Creation
$ singularity create ubuntu.img
$ singularity import ubuntu.img docker://ubuntu:14.04
$ singularity pull docker://ubuntu:14.04
ubuntu-14.04.img
Image Bootstrap
$ singularity create ubuntu.img
$ sudo singularity bootstrap ubuntu.img Singularity
Bootstrap: docker
From: python:latest
%post
apt-get update
apt-get install -y vim wget
mkdir /cave
%labels
MAINTAINER vanessasaurus
%files
/home/vanessa/Desktop/rawr.sh /cave/rawr.sh
%environment
DINOSAUR_HOME=/cave
export DINOSAUR_HOME
%runscript
exec /bin/bash /cave/rawr.sh "$@"
Singularity
PEARC17: Reproducibility and Containers: The Perfect Sandwich
Where does it live?
open source
https://octodex.github.com/
client (bash)
src (C)
helper (python)
/home/vanessa/.singularity
├── docker
├── metadata
└── shub
/usr/local/var/singularity/
└── mnt
    ├── container
    ├── overlay
    └── session
/usr/local/
├── bin
├── etc
├── include
├── lib
├── libexec
└── var
./configure --prefix=/usr/local
The same tree is shown for each component in turn: client, src, python, mount, config
How does it work?
Installation
git clone https://www.github.com/singularityware/singularity.git
cd singularity
./autogen.sh
./configure --prefix=/usr/local
make
sudo make install
Customizable by the HPC Admin
SINGULARITY.CONF
- bind/mount points
- permissions
- overlayfs
- config file must be root owned
- controls what user can/not do
- dis/allow different devices
- paths, session dirs all controlled
If you want to be root inside the container, you must be root outside the container.
contained processes exit
all namespaces collapse
...leaving a cleaned system
The Singularity Command
singularity --debug run --contain sandwich.img
singularity [global options] <action> [command options] <image>
share sandwich?
Singularity Hub
1. Singularity, three ways
2. Singularity Hub
3. Reproducible Science
WHERE IS THE BOTTLENECK?
SINGULARITY HUB: CONTAINER REGISTRY
COLLECTIONS
COLLECTION
commit
CONTAINER BUILD
CONTAINER BUILD LOG
ESTIMATED OPERATING SYSTEMS
How does it work? :>
1. Add a bootstrap specification file to the base of a GitHub repo
2. “Turn build on” in Singularity Hub
3. Commits are built automatically on Google Cloud
4. Accessible via command line
1. Singularity, three ways
2. Singularity Hub
3. Reproducible Science
What happens next?
container... predictions!
Change in the movement of information
bits
file
folder
software
apt-get install -y party-animal
pip install party-animal
I’m missing dependencies.
I didn’t get the same result
What version of Python did you use?
It doesn’t compile on my system!
The unit of information isn’t good enough.
Change in the movement of information
bits
file
folder
software
os
containers
container... predictions!
Change in the movement of information: put stuff in containers
Too many containers!
Which containers do genomic analysis?
Which containers do it best? How do we define best?
Which ones have the most varying result? Why?
I don’t know how to measure that.
Our representation of containers isn’t good enough
Expectation: This container makes the perfect sandwich!
Reality:
container... predictions!
Change in the movement of information: put stuff in containers
Change in the representation of containers: reproducibility metrics
1. Singularity, three ways
2. Singularity Hub
3. Reproducibility Metrics
How is container C1 similar to container C2?
Intersection of sets C1 and C2
Total sum of files in C1 and C2
Is container C1 similar to container C2?
It depends who is asking
SONIC MADE IT THROUGH
REPRODUCIBILITY LEVEL REPLICATE!
Levels of Reproducibility
Identical: the exact same image file
Replicate: the same image built at different times
Base: the core OS is estimated to be the same
Runscript: the content of the runscript is the same
Environment: the environments are the same
Labels: the container labels are the same
What is a level of reproducibility?
A set of files between containers that are compared via content hash
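One way to picture a level is as a filter that picks which files take part in the comparison, followed by a content hash of each selected file. The sketch below is only an illustration under stated assumptions: the `LEVELS` table, the paths in it, and the function names are hypothetical and are not taken from the talk or from Singularity itself. Containers are modeled as plain dictionaries of path to file bytes so the example runs on its own.

```python
import hashlib

# Hypothetical level definitions (assumption): each level is a predicate
# that decides which file paths are included in the comparison.
LEVELS = {
    "ALL": lambda path: True,
    "RUNSCRIPT": lambda path: path == "/singularity",
    "ENVIRONMENT": lambda path: path == "/environment",
}


def content_hash(data: bytes) -> str:
    """Hash file content so files are compared by what they contain."""
    return hashlib.sha256(data).hexdigest()


def level_hashes(files: dict, level: str) -> dict:
    """Map every path selected by the level to its content hash."""
    keep = LEVELS[level]
    return {path: content_hash(data) for path, data in files.items() if keep(path)}


def same_at_level(c1: dict, c2: dict, level: str) -> bool:
    """True when both containers hold identical content for the level's files."""
    return level_hashes(c1, level) == level_hashes(c2, level)


# Two toy containers that differ only in one unrelated file.
c1 = {"/singularity": b"exec /bin/bash /cave/rawr.sh", "/tmp/log": b"built monday"}
c2 = {"/singularity": b"exec /bin/bash /cave/rawr.sh", "/tmp/log": b"built tuesday"}
print(same_at_level(c1, c2, "RUNSCRIPT"), same_at_level(c1, c2, "ALL"))
```

The same pair of containers can therefore match at one level and differ at another, which is the sense in which the answer depends on who is asking.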
Reproducibility Assessment Algorithm
“Hash Content Comparison”
Intersection of sets C1 and C2
Total sum of files in C1 and C2
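A minimal sketch of a hash content comparison follows. It reads the two captions above as the numerator and denominator of a ratio, which is an assumption about the slide layout. The factor of 2 is also an assumption, added so that a container compared with itself scores 1.0. The function names are hypothetical, and containers are modeled as dictionaries of path to file bytes.

```python
import hashlib


def file_hashes(files: dict) -> set:
    """Return the set of (path, content hash) pairs for one container."""
    return {(path, hashlib.sha256(data).hexdigest()) for path, data in files.items()}


def similarity(c1: dict, c2: dict) -> float:
    """Shared files over total files, scaled so identical containers score 1.0.

    Assumption: score = 2 * |C1 intersect C2| / (|C1| + |C2|).
    """
    h1, h2 = file_hashes(c1), file_hashes(c2)
    total = len(h1) + len(h2)
    if total == 0:
        return 0.0  # two empty containers: defined as 0.0 here by choice
    return 2 * len(h1 & h2) / total


c1 = {"/bin/sh": b"shell", "/etc/os-release": b"ubuntu 14.04"}
c2 = {"/bin/sh": b"shell", "/etc/os-release": b"ubuntu 16.04"}
print(similarity(c1, c1))  # identical content
print(similarity(c1, c2))  # one of two files differs
```

A file counts as shared only when both its path and its content hash match, so a file that exists in both containers with different content lowers the score.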
Do the levels behave as I would expect?
Compare an image to itself
- At step 1, start with the image compared to its full self
- Subtract one file from the second image, recalculate, until empty
a. Remove more recent files first
Do this across all levels
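The check described above can be sketched as a loop: score the image against itself, then remove one file at a time from the second copy (most recent first) and rescore until nothing is left. This is an illustration only. The scoring formula with its factor of 2, the function names, and the use of a modification time to order removals are assumptions, and the toy image is a dictionary of path to (mtime, bytes). Repeating the loop with each level's file selection would cover "all levels".

```python
import hashlib


def score(h1: set, h2: set) -> float:
    """Assumed score: 2 * shared files / total files in both sets."""
    total = len(h1) + len(h2)
    return 2 * len(h1 & h2) / total if total else 0.0


def removal_curve(files: dict) -> list:
    """Scores for an image versus a shrinking copy of itself.

    files maps path -> (mtime, content bytes).
    """
    hashed = {
        path: (path, hashlib.sha256(data).hexdigest())
        for path, (_, data) in files.items()
    }
    full = set(hashed.values())
    remaining = set(full)
    # Remove more recent files first.
    order = sorted(files, key=lambda p: files[p][0], reverse=True)
    scores = [score(full, remaining)]  # step 1: image versus its full self
    for path in order:
        remaining.remove(hashed[path])
        scores.append(score(full, remaining))
    return scores


image = {
    "/bin/sh": (100, b"shell"),
    "/etc/hosts": (200, b"127.0.0.1 localhost"),
    "/cave/rawr.sh": (300, b"echo RAWR"),
}
print(removal_curve(image))
```

If the levels behave as expected, the curve starts at 1.0 and falls steadily to 0.0 as the second copy empties.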
Reproducibility Metrics: Takeaways
1. “Operating system science” needs to be a thing
2. Definitions of levels are important
3. I learned things about the OS just looking at the graphs
4. A way to derive features for an operating system?
thinking about the future
How can containers support reproducible science?
How can the HPC community support containers?
container sharing
integration
incentives
This is not optimized for scaled building!
Singularity Registry
a local registry for a cluster resource
Challenges
- Most resources can’t support web download links
- How to share images? manifests?
- Storage (for most) is a file system
- No Docker for orchestration
- Permissions?
- Integration with Singularity Hub?
- Management?
storage
- file system
builders
- job queue
- build node
- virtual machines
manager
- singularity image
- command line
- web interface
/usr/local/libexec/sregistry
├── cli
│ ├── help.database
│ ├── help.init
│ ├── sregistry.build
│ ├── sregistry.database
│ ├── sregistry.help
│ └── sregistry.init
│
├── helpers
│ ├── args
│ ├── update
│ └── utils
└── singularity.registry
sudo ./install.sh --prefix=/usr/local
/opt/shub/
builder/
templates/
recipes/
.git/
.travis.yml
storage/
containers/
sudo sregistry init --base /opt/shub
/opt/shub/builder
recipes/
.git/
tensorflow/
tensorflow/ ← collection tensorflow/tensorflow
Singularity shub://tacc/tensorflow/tensorflow:tag
Singularity.gpu
Parts of the URI labeled in turn: registry, container name, tag
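To make the labeled parts concrete, here is a small sketch that splits a URI of the form shown above. It is not the parser used by Singularity or sregistry. It assumes the first path segment names the registry, the rest of the path names the container (here the same path the slide calls the collection), and an optional tag follows the colon. `ShubUri` and `parse_shub_uri` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ShubUri(NamedTuple):
    registry: str
    name: str
    tag: Optional[str]


def parse_shub_uri(uri: str) -> ShubUri:
    """Split shub://<registry>/<name>[:<tag>] into its parts (assumed layout)."""
    prefix = "shub://"
    if not uri.startswith(prefix):
        raise ValueError("expected a shub:// URI, got: " + uri)
    # Everything after the first colon in the path is treated as the tag.
    path, colon, tag = uri[len(prefix):].partition(":")
    registry, _, name = path.partition("/")
    if not registry or not name:
        raise ValueError("expected shub://<registry>/<name>, got: " + uri)
    return ShubUri(registry, name, tag if colon else None)


print(parse_shub_uri("shub://tacc/tensorflow/tensorflow:gpu"))
print(parse_shub_uri("shub://tacc/tensorflow/tensorflow"))
```

Returning `None` for a missing tag leaves the choice of a default to the caller, since the slides do not say which tag applies when none is given.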
registry:
container collection corresponds to a folder in the repository
Individual user:
container collection corresponds to an entire GitHub repo
both
build multiple tags for one collection from within the same repository
Connect to Singularity Hub
...permission to build, granted!
...build away, Merrill.
1. If set up to build locally
Launches local build job
2. If set up to only build on Singularity Hub
Pings Singularity Hub
3. Both
Launches local build job
Successful builds ping Singularity Hub
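The three cases above amount to a small decision table. The sketch below expresses it as a function that returns the planned steps as text. `plan_build` and its two flags are hypothetical names used for illustration, not part of the sregistry tool, and nothing is actually launched.

```python
def plan_build(build_locally: bool, build_on_hub: bool) -> list:
    """Return the steps taken for a build under the given setup (illustrative)."""
    steps = []
    if build_locally:
        steps.append("launch local build job")
        if build_on_hub:
            # Case 3: local build first, then report success to Singularity Hub.
            steps.append("on success, ping Singularity Hub")
    elif build_on_hub:
        # Case 2: no local builder, so Singularity Hub is asked to build.
        steps.append("ping Singularity Hub")
    return steps


for local, hub in [(True, False), (False, True), (True, True)]:
    print(local, hub, plan_build(local, hub))
```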
1. run a build command
sudo sregistry build tensorflow
sudo sregistry build tensorflow/tensorflow
sudo sregistry build tensorflow/tensorflow:gpu
/opt/shub/builder
templates/
recipes/
.git/
tensorflow/
tensorflow/
Singularity
Singularity.gpu
container sharing
integration
incentives
THE CLOUD HPC RESOURCE
How can we work together?
Local Development Environment
Testing
Deploy and Share
Run!
Result
Creation
Testing
Publication
Run!
Reproduce
How can we work together?
Just try.
Thank you Google!
container sharing
integration
incentives
Academic Layer Cake
scientists
staff
scientist
“I need a custom tool”
staff
“I offer resources”
scientist
“I’ll do it myself”
Why can’t we do better?
Academic Layer Cake
scientists
staff
software engineers
Lessons from Software Engineering
1. Continuous Integration (testing)
2. Version Control
3. Documentation
4. Logging, Handling Errors
5. Databases, Organization, and Storage
We need incentives and support for Research Software Engineers
scientist
“I need a custom tool”
software engineer
“I can help with that”
containers-ftw
crowdsourcing science with
competitive containers
- package your challenge
- define metric of success
- share it
- ...may the best container win!
SHOW ME WHAT YOU GOT
- Dave Godlove, NIH
- Stefan Kombrink
https://singularity-hub.org/demos/6/
1. Singularity, three ways
2. Singularity Hub
3. Reproducible Science
[ Singularity ]
reproducible tools
are a group effort
grow out of need
[ Singularity Hub ]
reproducible practices
sharing containers, data, software
working across lines
[ Reproducibility Metrics ]
representation for understanding
container transparency
[ Incentives ]
Research software engineering
Build for how you want the world to be
Party on, party dinosaur
[ the perfect sandwich ]
HPC Admin, Developers, and Scientists
http://singularityware.github.io
https://www.singularity-hub.org
https://www.github.com/singularityhub
Got messy code?
Need to use a node?
...jokes, help, for free!
#SRCC
Only the best
for your analysis mess
SRCC
vsochat@stanford.edu
