The document is a presentation on Transcendent Memory on Xen given at the Xen Summit in 2009. It discusses the challenges of optimizing physical memory distribution across virtual machines on a hypervisor. It introduces Transcendent Memory as a new approach that collects unused and wasted guest memory into a shared pool to better optimize memory allocation over time without performance penalties.
XS Oracle 2009 Transcendent Memory
1. Transcendent Memory on Xen
2009 Speaker: Dan Magenheimer
Oracle Corporation
2. Agenda
• Motivation and Challenge
• Overview of Physical Memory Management
• Transcendent Memory (“tmem”) Overview
• Transcendent Memory in Action
• Status, Futures, etc.
Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer
3. Motivation
• Memory is increasingly becoming a bottleneck in virtualized systems
• Existing mechanisms have major holes: ballooning, memory overcommitment, page sharing
[Figure: four underutilized 2-CPU virtual servers, each with 1GB RAM; one 4-CPU physical server w/4GB RAM]
4. The Virtualized Physical Memory Resource Optimization Challenge
Optimize, across time, the distribution of machine memory among a maximal set of virtual machines by:
• measuring the current and future memory need of each running VM and
• reclaiming memory from those VMs that have an excess of memory and either:
• providing it to VMs that need more memory or
• using it to provision additional new VMs.
• without suffering a significant performance penalty
…Why is this a hard problem?
6. Agenda
• Motivation and Challenge
• Overview of Physical Memory Management
• in an operating system
• in a virtual machine monitor (Xen)
• Transcendent Memory Overview
• Transcendent Memory In Action
• Status, Futures, etc.
7. OS Physical Memory Management
• Operating systems are memory hogs!
[Figure: an OS inside its memory constraint]
8. OS Physical Memory Management
• Operating systems are memory hogs!
• If you give an operating system more memory…
[Figure: an OS inside a new, larger memory constraint]
9. OS Physical Memory Management
• Operating systems are memory hogs!
• …it uses up any memory you give it!
[Figure: “My name is Linux and I am a memory hog”, an OS filling its memory constraint]
10. OS Physical Memory Management
• What does an OS do with all that memory?
[Figure: memory divided among kernel code, kernel data, page tables, user code, user data, page cache, and everything else]
11. OS Physical Memory Management
• What does an OS do with all that memory?
• page cache
13. OS Physical Memory Management
• What does an OS do with all that memory?
• …much of the time, mostly page cache
• …some of which will be useful in the future
• …and some of which is wasted
[Figure: memory dominated by page cache, with a small share for everything else]
14. Agenda
• Motivation and Challenge
• Overview of Physical Memory Management
• in an operating system
• in a virtual machine monitor (Xen)
• Transcendent Memory Overview
• Transcendent Memory In Action
• Status, Futures, etc.
15. VMM Physical Memory Management
• Xen partitions memory
• hypervisor memory
• dom0 memory
• guest memory
Dom0 is special ☺
16. VMM Physical Memory Management
• Xen partitions memory
• Xen memory
• dom0 memory
• guest 1 memory
• guest 2 memory
• whatever’s left over: “fallow” memory
fallow, adj., land left without a crop for one or more years
18. VMM Physical Memory Management
• Xen partitions memory among more guests
• Xen memory
• dom0 memory
• guest 1 memory
• guest 2 memory
• guest 3…
• BUT still fallow memory leftover
[Figure: machine memory holding several guests, with fallow regions between them]
19. VMM Physical Memory Management
in the presence of migration
• migration
• requires fallow memory in the target machine
• leaves behind fallow memory in the originating machine
[Figure: guests and fallow memory on physical machines “A” and “B”]
20. VMM Physical Memory Management
in the presence of ballooning
• Use ballooning to allow guest memory size to grow?
• Goal: fill fallow memory
[Figure: guests with fallow memory between them]
21. VMM Physical Memory Management
in the presence of ballooning
• Look! No more fallow memory!
But….
And but…
[Figure: guests expanded to fill all machine memory]
23. VMM Physical Memory Management
in the presence of ballooning
Using ballooning to take memory away:
• not instantaneous (memory inertia)
• guest can’t predict future needs
• good pages are evicted along with the bad
• don’t know how much/fast to balloon
• Too much or too fast: thrashing or the dreaded OOM killer
24. The Virtualized Physical Memory Resource Optimization Challenge
Optimize, across time, the distribution of machine memory among a maximal set of virtual machines by:
• measuring the current and future memory need of each running VM and
• reclaiming memory from those VMs that have an excess of memory and either:
• providing it to VMs that need more memory or
• using it to provision additional new VMs.
• without suffering a significant performance penalty
…This IS a hard problem!!!
25. Why this IS a hard problem!
Summary
• OS’s use as much memory as they are given
• but cannot predict the future so often guess wrong
• and often much memory owned by an OS is wasted
• Xen leaves large amounts of memory fallow
• fixed partitioning results in fragmentation
• migration requires fallow memory to succeed
• Ballooning helps but:
• can’t predict future memory needs of guests
• memory has inertia
• the price of incorrect guesses can be dire
NEED A NEW APPROACH TO VIRTUALIZED
PHYSICAL MEMORY MANAGEMENT!!
26. Agenda
• Motivation and Challenge
• Overview of Physical Memory Management
• Transcendent Memory Overview
• Transcendent Memory In Action
• Status, Futures, etc.
27. Transcendent memory
creating the transcendent memory pool
• Step 1a: reclaim all fallow memory
• Step 1b: reclaim wasted guest memory (e.g. via ballooning)
• Step 1c: collect it all into a pool
[Figure: fallow memory and reclaimed guest memory gathered into the transcendent memory pool]
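Steps 1a through 1c can be sketched as simple bookkeeping. This is only an illustration under stated assumptions, not how Xen does it: the function name build_tmem_pool and every megabyte figure below are hypothetical.

```python
def build_tmem_pool(machine_mb, xen_mb, dom0_mb, guests):
    """Return (pool_mb, new_guest_sizes) for a toy memory layout.

    guests maps a guest name to (allocated_mb, wasted_mb), where wasted_mb
    is the part a balloon driver could hand back. All values are made up.
    """
    allocated = xen_mb + dom0_mb + sum(a for a, _ in guests.values())
    fallow_mb = machine_mb - allocated                     # step 1a
    if fallow_mb < 0:
        raise ValueError("allocations exceed machine memory")
    reclaimed_mb = sum(w for _, w in guests.values())      # step 1b
    new_sizes = {name: a - w for name, (a, w) in guests.items()}
    return fallow_mb + reclaimed_mb, new_sizes             # step 1c


pool_mb, sizes = build_tmem_pool(
    machine_mb=4096, xen_mb=128, dom0_mb=512,
    guests={"guest1": (1024, 256), "guest2": (1024, 128)},
)
```

Here the pool ends up with the 1408MB that no domain owned plus the 384MB ballooned out of the two guests.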
28. Transcendent memory
creating the transcendent memory pool
• Step 2: provide indirect access, strictly controlled by the hypervisor and dom0
[Figure: data and control paths between the guests and the transcendent memory pool]
29. Transcendent memory
API characteristics
Transcendent memory API
• paravirtualized (lightly)
• narrow
• well-specified
• operations are:
• synchronous
• page-oriented (one page per op)
• copy-based
• multi-faceted
• extensible
30. Transcendent memory
four different subpool types, four different uses
• private + ephemeral: “second-chance” clean-page cache!! (“hcache”)
• private + persistent: fast swap “device”!! (“hswap”)
• shared + ephemeral: server-side cluster filesystem cache? (“shared hcache”)
• shared + persistent: inter-domain shared memory?
Legend: implemented and working today (Linux + Xen); in development; under investigation
eph-em-er-al, adj., … transitory, existing only briefly, short-lived (i.e. NOT persistent)
31. Transcendent memory
caveats
• Requirements
• guest OS must be paravirtualized
• 64-bit hypervisor and CPU
• Workload:
• should exert memory pressure in at least one guest
• memory pressure in multiple guests should vary across time
• For best results:
• dom0 should be configured with a fixed memory size
• guest should have a (virtual) swap disk configured
• Complementary to:
• feedback-directed ballooning
• transparent content-based page sharing
32. Agenda
• Motivation and Challenge
• Overview of Physical Memory Management
• Transcendent Memory Overview
• Transcendent Memory In Action
“hcache”*
• private-ephemeral pool
“shared hcache”
• shared-ephemeral pool
“hswap”*
• private-persistent pool
• Status, Future, etc.
* called “precache” and “preswap” for Linux
33. hcache
• a second-chance clean page cache for a guest
• “put” clean pages only
• “get” only valuable pages
• pages eventually are evicted
• coherency managed by guest
• exclusive cache semantics
[Figure: a guest issuing “put” and “get” against a private+ephemeral transcendent memory pool]
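The put/get behaviour listed above can be modelled in a few lines. This is a toy sketch, not the Xen code: the class name PrivateEphemeralPool, the oldest-first eviction order, and the (object, index) keys are assumptions made for illustration.

```python
from collections import OrderedDict


class PrivateEphemeralPool:
    """Toy model of an hcache pool (capacity of at least one page assumed)."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # key -> page bytes, oldest first

    def put(self, key, data):
        """Copy a clean page into the pool, evicting the oldest if full."""
        self.pages.pop(key, None)            # a repeated put replaces old data
        while len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # ephemeral: eviction is allowed
        self.pages[key] = bytes(data)

    def get(self, key):
        """Return the page and remove it (exclusive cache), or None."""
        return self.pages.pop(key, None)


pool = PrivateEphemeralPool(capacity_pages=2)
pool.put(("inode7", 0), b"A")
pool.put(("inode7", 1), b"B")
pool.put(("inode7", 2), b"C")        # pool is full, so page 0 is evicted
missing = pool.get(("inode7", 0))    # miss: the guest must reread from disk
hit = pool.get(("inode7", 2))        # hit: the page moves back to the guest
again = pool.get(("inode7", 2))      # miss: the first get removed the page
```

Because a miss is always possible, the guest can only put pages it is able to fetch again from disk, which is why the slide restricts hcache to clean pages.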
34. hcache (with compression)
• Compression
• Option (per-domain)
• nominally doubles available memory
• performance-space tradeoff
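The performance-space tradeoff can be illustrated with a per-page compress step. The slides do not name a compression algorithm, so zlib is used here purely as a stand-in, and the 4096-byte page size and function names are assumptions.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def compress_page(page):
    """Return (is_compressed, stored_bytes); keep the raw page if no gain."""
    packed = zlib.compress(page)
    if len(packed) < len(page):
        return True, packed
    return False, page


def decompress_page(is_compressed, stored):
    """Undo compress_page, spending CPU time to get the original page back."""
    return zlib.decompress(stored) if is_compressed else stored


page = (b"transcendent memory " * 256)[:PAGE_SIZE]  # highly repetitive sample
is_compressed, stored = compress_page(page)
restored = decompress_page(is_compressed, stored)
```

How much space is saved depends entirely on page contents, so a figure such as "doubles available memory" is nominal rather than guaranteed.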
35. hcache (multiple guests)
• second-chance page cache for multiple guests
• Need “memory scheduler”:
• global admission/eviction policy:
• LRU queue, or
• weight balanced (future)
[Figure: two guests, each with its own private ephemeral tmem pool (#1 and #2)]
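A global LRU queue across pools might look like the sketch below. It is a simplified model, not the scheduler from the Xen patch: the class name GlobalLruScheduler and the page budget are hypothetical, and the weight-balanced policy is not modelled.

```python
from collections import OrderedDict


class GlobalLruScheduler:
    """Toy global eviction policy shared by several private ephemeral pools."""

    def __init__(self, total_pages):
        self.total_pages = total_pages  # assumed to be at least 1
        self.lru = OrderedDict()        # (pool_id, key) -> data, oldest first

    def put(self, pool_id, key, data):
        self.lru.pop((pool_id, key), None)
        while len(self.lru) >= self.total_pages:
            self.lru.popitem(last=False)  # evict the globally oldest page
        self.lru[(pool_id, key)] = data

    def get(self, pool_id, key):
        return self.lru.pop((pool_id, key), None)  # exclusive, as in hcache

    def pages_held(self, pool_id):
        return sum(1 for owner, _ in self.lru if owner == pool_id)


sched = GlobalLruScheduler(total_pages=3)
sched.put(1, "a", b"1")
sched.put(1, "b", b"2")
sched.put(2, "x", b"3")
sched.put(2, "y", b"4")   # budget exhausted: pool 1's oldest page is evicted
```

With one queue for all pools, a busy guest can displace the pages of an idle one, which is the fairness question a weight-balanced policy would address.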
36. shared hcache (for clustering)
• guests sharing a clustered filesystem
• non-exclusive
• LFU instead of LRU
• compression optional
• a server-side disk cache!
[Figure: guests on a clustered filesystem sharing one SHARED ephemeral tmem pool]
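The two differences from hcache, non-exclusive gets and LFU eviction, can be sketched as follows. This is an illustrative model only: the class name SharedEphemeralPool, the hit counters, and the block keys are assumptions.

```python
class SharedEphemeralPool:
    """Toy shared pool: gets leave the page in place; eviction is LFU."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = {}  # key -> data
        self.hits = {}   # key -> number of gets served

    def put(self, key, data):
        if key not in self.pages and len(self.pages) >= self.capacity:
            victim = min(self.hits, key=self.hits.get)  # least frequently used
            del self.pages[victim], self.hits[victim]
        self.pages[key] = data
        self.hits.setdefault(key, 0)

    def get(self, key):
        if key not in self.pages:
            return None
        self.hits[key] += 1
        return self.pages[key]  # non-exclusive: other guests can still hit


pool = SharedEphemeralPool(capacity_pages=2)
pool.put("blk1", b"one")
pool.put("blk2", b"two")
seen_by_guest_a = pool.get("blk1")
seen_by_guest_b = pool.get("blk1")   # still present after guest A's get
pool.put("blk3", b"three")           # evicts blk2, which was never read
```

Counting hits favours blocks that several guests read repeatedly, which suits a cache sitting in front of a shared filesystem.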
37. hswap
• over-ballooned guests experiencing unexpected memory pressure have an emergency swap disk
• much faster than swapping
• persistent (“dirty”) pages OK
• prioritized higher than hcache
• limited by domain’s maxmem
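A persistent pool behaves differently from hcache: pages are never evicted, so a put can be refused instead. The sketch below is a toy model under assumptions that the slides do not spell out, namely that pool pages count against the domain's maxmem and that a get leaves the page in place until it is flushed. The class name is hypothetical.

```python
class PrivatePersistentPool:
    """Toy hswap pool: no eviction, but puts fail once the limit is hit."""

    def __init__(self, maxmem_pages, resident_pages):
        # Assumption: the pool may use whatever maxmem the guest is not using.
        self.limit = max(0, maxmem_pages - resident_pages)
        self.pages = {}

    def put(self, key, data):
        if key not in self.pages and len(self.pages) >= self.limit:
            return False             # caller falls back to the real swap disk
        self.pages[key] = bytes(data)
        return True

    def get(self, key):
        return self.pages.get(key)   # the page stays until it is flushed

    def flush(self, key):
        self.pages.pop(key, None)


swap = PrivatePersistentPool(maxmem_pages=4, resident_pages=2)
ok1 = swap.put(10, b"dirty1")
ok2 = swap.put(11, b"dirty2")
ok3 = swap.put(12, b"dirty3")        # refused: the pool is at its limit
data = swap.get(10)
swap.flush(10)
ok4 = swap.put(12, b"dirty3")        # accepted now that a slot is free
```

Since the pool holds the only copy of a dirty page, it must never drop one silently, which is why the failure shows up at put time.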
38. Agenda
• Motivation and Challenge
• Overview of Physical Memory Management
• Transcendent Memory Overview
• Transcendent Memory In Action
• Status, Future, etc.
39. Current Status
• hcache and hswap fully working
• shared hcache soon
• xen-side patch ready for inclusion in xen-unstable
• ~3K line patch, but low impact on existing code
• enabled with xen boot option (off by default)
• “technology preview”
• goal: broader community usage (3.4?)
• linux-side patch ready
• low impact on existing code
• 2.6.18-xen version ready for inclusion in Xen-linux tree
• 2.6.28 version working
40. Future Work
• finish “shared hcache” work (ocfs2)
• shared-persistent pool investigation
• inter-domain communication?
• real world performance measurement/analysis
• identify tuning opportunities (e.g. scalability) and repeat
• finish “memory scheduler”
• tmem for:
• native Linux?
• Linux containers?
• KVM?
• HVM domains?
41. Acknowledgements
• Chris Mason (Oracle)
• Linux vfs changes for hcache
• Zhigang Wang (Oracle)
• Xen tools (xm + libxc) code
• Kurt Hackel (Oracle), various HP friends, Ian, Keir, Jeremy
• design feedback along the way
42. For more information
http://oss.oracle.com/projects/tmem
45. Transcendent Memory API
overview (API v0.0.1)
46. Transcendent memory API
op overview (API v0.0.1)
Two classes of operations:
• Create a pool
Syntax: pool_id = tmem_new_pool(uuid, flags)
• Operate on a created pool
Generic syntax:
retval = tmem_op(handle,pfn[,ofs1,ofs2,len])
47. Transcendent memory API
pool creation (API v0.0.1)
Syntax: pool_id = tmem_new_pool(uuid, flags)
• private + ephemeral: “second-chance” clean-page cache!! (“hcache”)
• private + persistent: fast swap “device”!! (“hswap”)
• shared + ephemeral: server-side cluster filesystem cache?
• shared + persistent: inter-domain shared memory?
Legend: implemented and working today (Linux + Xen); under investigation
flags: private vs. shared, ephemeral vs. persistent, page size, API version, … ???
uuid: 128-bit “share name” (for shared pools, ignored for private pools)
48. Transcendent Memory API
what is a “handle”?? (API v0.0.1)
retval = tmem_op(handle,pfn) (is actually)
retval = tmem_op(pool_id,object_id,page_id,pfn)
• The “handle” used in previous slides is actually a
three-element “handle-tuple” consisting of:
• a 32-bit pool-id (obtained from tmem_new_pool())
• a 64-bit object-id
• a 32-bit page-id
• In filesystem-like usage:
• pool-id: one per filesystem
• object-id: inode
• page-id: page index into a file
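The handle-tuple and its filesystem-like mapping can be written down directly. The field widths come from the slide; the little-endian packed byte layout, the 4096-byte page size, and the helper names are assumptions for illustration.

```python
import struct
from collections import namedtuple

Handle = namedtuple("Handle", "pool_id object_id page_id")

# 32-bit pool-id, 64-bit object-id, 32-bit page-id, with no padding.
HANDLE_FORMAT = "<IQI"


def pack_handle(handle):
    return struct.pack(HANDLE_FORMAT, *handle)


def unpack_handle(raw):
    return Handle(*struct.unpack(HANDLE_FORMAT, raw))


def handle_for_file_page(pool_id, inode, byte_offset, page_size=4096):
    """Map a file position to a handle: the inode picks the object."""
    return Handle(pool_id, inode, byte_offset // page_size)


h = handle_for_file_page(pool_id=3, inode=123456789, byte_offset=8192 + 100)
raw = pack_handle(h)
```

A byte offset of 8292 falls in the third page of the file, so the handle carries page-id 2.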
50. Transcendent Memory API
important semantic details (v0.0.1)
• get_page on a private+ephemeral pool is destructive (auto-flush)
• implements exclusive cache semantics
• no serialization guarantees are provided for SMP VMs
• clients must ensure coherency with their own caches/data stores but
implementation provides following guarantees:
• put/put/get (aka “dup put”) coherency
tmem_put_page(ABC,D1);
tmem_put_page(ABC,D2);
tmem_get_page(ABC,E);
E may never contain the data from D1.
(implies that on persistent pools, dup put must never fail)
• get/get coherency
tmem_get_page(ABC,E);
tmem_get_page(ABC,E);
If the first get fails, the second must also fail
• all flush operations must always succeed
• return values: >=0 means success, < 0 failure (errno)
• see spec for more information
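The coherency guarantees above can be checked against a toy private+ephemeral pool. This is a sketch and not the v0.0.1 implementation: the class, its method names, and the handle value are hypothetical, and -1 stands in for an errno-style failure.

```python
class EphemeralPool:
    """Toy private+ephemeral pool used to exercise the coherency rules."""

    def __init__(self):
        self.pages = {}

    def put_page(self, handle, data):
        # Drop any older copy first, so a later get can never see stale data
        # even if storing the new copy were to fail.
        self.pages.pop(handle, None)
        self.pages[handle] = bytes(data)
        return 0

    def get_page(self, handle):
        """Return (retval, data); a successful get also flushes the page."""
        if handle not in self.pages:
            return -1, None
        return 0, self.pages.pop(handle)

    def flush_page(self, handle):
        self.pages.pop(handle, None)
        return 0  # flushes always succeed, even for absent pages


pool = EphemeralPool()
ABC = (0, 42, 7)  # hypothetical (pool_id, object_id, page_id)
pool.put_page(ABC, b"D1")
pool.put_page(ABC, b"D2")
ret1, e1 = pool.get_page(ABC)   # sees D2, never D1
ret2, e2 = pool.get_page(ABC)   # fails: the first get was destructive
ret3 = pool.flush_page(ABC)     # succeeds although nothing is stored
```

The same model also satisfies get/get coherency: a handle that misses once keeps missing until a new put arrives.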
51. Transcendent memory
hcache performance
[Figure: two bar charts, seconds and disk reads (K), smaller is better, for pcpu=2/vcpu=2, pcpu=4/vcpu=2, and pcpu=4/vcpu=4; series: 256MB w/hcache, 256MB no hcache, 1024MB no hcache, 2048MB no hcache]
Benchmark: Linux compile, cold page cache, pre-caching enabled (ccache)
52. Transcendent memory
hcache compensates for underprovisioned memory
[Figure: two bar charts, seconds and disk reads (K), at pcpu=4 vcpu=4; series: 128MB w/hcache, 128MB no hcache, 256MB no hcache, 1024MB no hcache]
Benchmark: Linux compile, warm page cache, pre-caching disabled
53. hcache (multiple domains + compressed)
• shared compressed extended page cache for more than one guest