
                Transcendent Memory on Xen
    2009                   Speaker: Dan Magenheimer
                                  Oracle Corporation

         •   Motivation and Challenge
         •   Overview of Physical Memory Management
         •   Transcendent Memory (“tmem”) Overview
         •   Transcendent Memory in Action
         •   Status, Futures, etc.

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer
        • Memory is increasingly becoming a
          bottleneck in virtualized systems
        • Existing mechanisms have major holes

        [Figure: four underutilized 2-CPU virtual servers, each with 1GB RAM;
         one 4-CPU physical server w/4GB RAM. Surviving labels: “overcommitment”, “page”]

The Virtualized Physical Memory
                      Resource Optimization Challenge
         Optimize, across time, the distribution of machine
           memory among a maximal set of virtual machines by:
         • measuring the current and future memory need of
           each running VM and
         • reclaiming memory from those VMs that have an
           excess of memory and either:
               • providing it to VMs that need more memory or
               • using it to provision additional new VMs.
         • without suffering a significant performance penalty

         …..Why is this a hard problem?

         • Motivation and Challenge
         • Overview of Physical Memory Management
               • in an operating system
               • in a virtual machine monitor (Xen)
         • Transcendent Memory Overview
         • Transcendent Memory In Action
         • Status, Futures, etc.

OS Physical Memory Management

         • Operating systems are memory hogs!
         • If you give an operating system more memory…
           …it uses up any memory you give it!

         [Figure labels: “Memory constraint”, “New larger memory”;
          speech bubble, cut off in the transcript: “My name is Linux and I am a”]

OS Physical Memory Management

         • What does an OS do with all that memory?
               …much of the time mostly page cache
               … some of which will be useful in the future
               … and some of which is wasted

         [Figure labels: kernel code, user data, user code, page cache, everything else]

VMM Physical Memory Management

                                                                 • Xen partitions memory
                                                                   • hypervisor memory
                                                                   • dom0 memory
                                                                   • guest memory

                            Dom0 is special ☺

VMM Physical Memory Management

         • Xen partitions memory
               • Xen memory
               • dom0 memory
               • guest 1 memory
               • guest 2 memory
               • whatever’s left over: “fallow” memory

         fallow, adj., land left without a crop for one or more years
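
The partitioning above is simple arithmetic: fallow memory is whatever remains after Xen, dom0, and every guest have taken their fixed share. A minimal sketch in Python follows. The function name and all sizes are made up for illustration and do not come from the talk.

```python
# Hypothetical sizes in MB, chosen only to illustrate the arithmetic.
def fallow_memory(total_mb, xen_mb, dom0_mb, guest_mbs):
    """Return the memory left over after Xen, dom0 and all guests are carved out."""
    used = xen_mb + dom0_mb + sum(guest_mbs)
    if used > total_mb:
        raise ValueError("partitions exceed physical memory")
    return total_mb - used


fallow = fallow_memory(total_mb=4096, xen_mb=128, dom0_mb=512,
                       guest_mbs=[1024, 1024])
print(fallow)  # 1408
```

Adding a third 1024 MB guest in this example would still leave 384 MB fallow, which is too little to start a fourth.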

VMM Physical Memory Management

         • Xen partitions memory among more guests
               • Xen memory
               • dom0 memory
               • guest 1 memory
               • guest 2 memory
               • guest 3…
         • BUT still fallow memory

VMM Physical Memory Management
                            in the presence of migration

         • migration
               • requires fallow memory in the target machine
               • leaves behind fallow memory in the originating machine

         [Figure labels: guests and fallow memory on physical machines “A” and “B”]

VMM Physical Memory Management
                            in the presence of ballooning

         • Use ballooning to allow guest memory size to grow?
               • Goal: fill fallow memory
         • Look! No more fallow memory!

         And but…

VMM Physical Memory Management
                            in the presence of ballooning

         Using ballooning to take memory away:
         • not instantaneous (memory inertia)
         • guest can’t predict future needs
               • good pages are evicted along with the bad
         • don’t know how much/fast to balloon
               • too much or too fast → thrashing or the dreaded OOM killer

         …..This IS a hard problem!!!
Why this IS a hard problem!
         • OS’s use as much memory as they are given
               • but cannot predict the future so often guess wrong
               • and often much memory owned by an OS is wasted
         • Xen leaves large amounts of memory fallow
               • fixed partitioning results in fragmentation
               • migration requires fallow memory to succeed
         • Ballooning helps but:
               • can’t predict future memory needs of guests
               • memory has inertia
               • the price of incorrect guesses can be dire

Transcendent memory
                    creating the transcendent memory pool

         • Step 1a: reclaim all fallow memory
         • Step 1b: reclaim wasted guest memory (e.g. via ballooning)
         • Step 1c: collect it all into a pool
         • Step 2: provide indirect access, strictly controlled by
           the hypervisor and dom0

         [Figure labels: guest, guest, transcendent memory, data]


Transcendent memory
                                              API characteristics

         Transcendent memory API
         • paravirtualized (lightly)
         • narrow
         • well-specified
         • operations are:
               • synchronous
               • page-oriented (one page per op)
               • copy-based
         • multi-faceted
         • extensible

Transcendent memory
     four different subpool types, four different uses

                    ephemeral                    persistent

         private    “second-chance”              Fast swap
                    clean-page cache!!           “device”!!
                    “hcache”                     “hswap”

         shared     server-side cluster          inter-domain
                    filesystem cache?            shared memory?
                    “shared hcache”

         Status:
         • “hcache” and “hswap”: implemented and working today (Linux + Xen)
         • “shared hcache”: in development
         • inter-domain shared memory: under investigation

         eph-em-er-al, adj., … transitory, existing only briefly, short-lived (i.e. NOT persistent)

Transcendent memory
        • Requirements
              • guest OS must be paravirtualized
              • 64-bit hypervisor and CPU
        • Workload:
              • should exert memory pressure in at least one guest
              • memory pressure in multiple guests should vary across time
        • For best results:
              • dom0 should be configured with a fixed memory size
              • guest should have a (virtual) swap disk configured
        • Complementary to:
              • feedback-directed ballooning
              • transparent content-based page sharing

         • Motivation and Challenge
         • Overview of Physical Memory Management
         • Transcendent Memory Overview
         • Transcendent Memory In Action
            • private-ephemeral pool (“hcache”*)
            • shared-ephemeral pool (“shared hcache”)
            • private-persistent pool (“hswap”*)
         • Status, Future, etc.

         * called “precache” and “preswap” for Linux

hcache

         • a second-chance clean page cache for a guest
               • “put” clean pages only
               • “get” only valuable pages
               • pages eventually are evicted
               • coherency managed by guest
               • exclusive cache semantics

         [Figure labels: guest, “get”, memory pool]
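
The put/get behaviour listed above can be sketched as a toy model in Python. This is not the Xen implementation. The class name, the oldest-first eviction order, and the page keys are assumptions made for illustration, and `capacity_pages` is assumed to be at least 1.

```python
from collections import OrderedDict


class EphemeralPool:
    """Toy model of a private ephemeral pool (hypothetical, not Xen code)."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages  # assumed >= 1
        self.pages = OrderedDict()      # key -> page bytes, oldest first

    def put(self, key, data):
        """Copy a clean page in; the pool may evict an older page to make room."""
        self.pages.pop(key, None)
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # pages are eventually evicted
        self.pages[key] = bytes(data)

    def get(self, key):
        """Return the page and remove it (exclusive cache), or None on a miss."""
        return self.pages.pop(key, None)


pool = EphemeralPool(capacity_pages=2)
pool.put("a", b"A" * 4096)
pool.put("b", b"B" * 4096)
pool.put("c", b"C" * 4096)   # evicts "a", the oldest page
first = pool.get("b")        # hit
second = pool.get("b")       # miss: the first get removed the page
```

Because a page may vanish at any time, the guest can only put pages it is able to re-read from disk, which is why the pool accepts clean pages only.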

hcache (with compression)

                                                                 • Compression
                                                                  • Option (per-domain)
                                                                  • nominally doubles available memory
                                                                  • performance-space tradeoff
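
How close compression gets to "doubling" depends entirely on page contents. The sketch below uses Python's `zlib` as a stand-in; the talk does not say which compressor tmem uses, and the 4096-byte page size and sample data are assumptions.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size


def compressed_size(page):
    """Bytes needed to hold one page after zlib compression."""
    return len(zlib.compress(page))


# A repetitive, source-code-like page versus an incompressible one.
text_like = (b"int main(void) { return 0; }\n" * 200)[:PAGE_SIZE]
random_like = os.urandom(PAGE_SIZE)

ratio_text = PAGE_SIZE / compressed_size(text_like)
ratio_random = PAGE_SIZE / compressed_size(random_like)
```

The repetitive page shrinks far beyond 2:1 while the random page does not shrink at all, so "nominally doubles" is best read as an average over typical page-cache contents, paid for with CPU time on every put and get.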

hcache (multiple guests)

         • second-chance page cache for multiple guests
         • Need “memory scheduler”:
               • global admission/eviction policy:
                     • LRU queue, or
                     • weight balanced (future)

         [Figure labels: guest, private ephemeral tmem pool #1, private ephemeral tmem pool #2]
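
One way to picture a global LRU eviction policy across several private pools is a single queue keyed by (pool, page). The sketch below is a hypothetical model, not the scheduler from the Xen patch. The class and method names are invented, and `total_pages` is assumed to be at least 1.

```python
from collections import OrderedDict


class GlobalLruScheduler:
    """Toy global eviction policy shared by several private ephemeral pools."""

    def __init__(self, total_pages):
        self.total_pages = total_pages  # assumed >= 1
        self.lru = OrderedDict()        # (pool_id, key) -> bytes, oldest first

    def put(self, pool_id, key, data):
        self.lru.pop((pool_id, key), None)
        while len(self.lru) >= self.total_pages:
            self.lru.popitem(last=False)  # evict the globally oldest page
        self.lru[(pool_id, key)] = bytes(data)

    def get(self, pool_id, key):
        # Keys are namespaced by pool, so one guest cannot read another's pages.
        return self.lru.pop((pool_id, key), None)

    def pages_held(self, pool_id):
        return sum(1 for (pid, _key) in self.lru if pid == pool_id)


sched = GlobalLruScheduler(total_pages=3)
sched.put(1, "x", b"1")
sched.put(1, "y", b"2")
sched.put(2, "x", b"3")
sched.put(2, "y", b"4")   # evicts (1, "x"), the globally oldest page
```

With a plain global LRU the busiest guest can crowd out the others, which is presumably the motivation for the weight-balanced policy listed as future work.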


shared hcache (for clustering)

         • guests sharing a clustered filesystem
               • non-exclusive
               • LFU instead of LRU
               • compression optional
         • a server-side disk cache!

         [Figure labels: guest, filesystem, ephemeral tmem pool]

hswap

         • over-ballooned guests experiencing unexpected memory
           pressure have an emergency swap disk
               • much faster than swapping
               • persistent (“dirty”) pages OK
               • prioritized higher than hcache
               • limited by domain’s maxmem
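
The persistent case differs from the ephemeral one in two ways: a page that was accepted is never dropped behind the guest's back, and a put can be refused once the domain's quota is used up. A minimal Python sketch follows, assuming a simple page-count quota in place of maxmem. The class name and the fallback to the real swap disk are illustrative assumptions.

```python
class PersistentPool:
    """Toy model of a private persistent pool with a per-domain page quota."""

    def __init__(self, max_pages):
        self.max_pages = max_pages  # stands in for the domain's maxmem limit
        self.pages = {}

    def put(self, key, data):
        """Return True if stored, False if the quota is exhausted."""
        if key not in self.pages and len(self.pages) >= self.max_pages:
            return False  # assumed: the guest then uses its real swap disk
        self.pages[key] = bytes(data)
        return True

    def get(self, key):
        """Pages that were accepted stay until the guest flushes them."""
        return self.pages.get(key)

    def flush(self, key):
        self.pages.pop(key, None)


swap = PersistentPool(max_pages=2)
ok1 = swap.put(10, b"dirty-1")
ok2 = swap.put(11, b"dirty-2")
ok3 = swap.put(12, b"dirty-3")   # refused: quota of 2 pages reached
```

A refused put is harmless as long as the guest still has a (virtual) swap disk to fall back on, which matches the "for best results" advice earlier in the deck.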

Current Status

         • hcache and hswap fully working
               • shared hcache soon
         • xen-side patch ready for inclusion in xen-unstable
               • ~3K line patch, but low impact on existing code
               • enabled with xen boot option (off by default)
                  • “technology preview”
               • goal: broader community usage (3.4?)
         • linux-side patch ready
               • low impact on existing code
               • 2.6.18-xen version ready for inclusion in Xen-linux tree
               • 2.6.28 version working

Future Work

         • finish “shared hcache” work (ocfs2)
         • shared-persistent pool investigation
               • inter-domain communication?
         • real world performance measurement/analysis
               • identify tuning opportunities (e.g. scalability) and repeat
         • finish “memory scheduler”
         • tmem for:
               •   native Linux?
               •   Linux containers?
               •   KVM?
               •   HVM domains?


     • Chris Mason (Oracle)
           • Linux vfs changes for hcache
     • Zhigang Wang (Oracle)
           • Xen tools (xm + libxc) code
     • Kurt Hackel (Oracle), various HP friends, Ian, Keir, Jeremy
           • design feedback along the way

For more information


Backup Slides

Transcendent Memory API
                                                        overview (API v0.0.1)

Transcendent memory API
                                          op overview (API v0.0.1)

         Two classes of operations:
         • Create a pool
                     Syntax: pool_id = tmem_new_pool(uuid, flags)
         • Operate on a created pool
                   Generic syntax:
                    retval = tmem_op(handle,pfn[,ofs1,ofs2,len])

Transcendent memory API
                                         pool creation (API v0.0.1)
           Syntax: pool_id = tmem_new_pool(uuid, flags)

                    ephemeral                    persistent

         private    “second-chance”              Fast swap
                    clean-page cache!!           “device”!!
                    “hcache”                     “hswap”

         shared     server-side cluster          inter-domain
                    filesystem cache?            shared memory?

         Status:
         • private pools: implemented and working today (Linux + Xen)
         • shared pools: under investigation

              flags: private vs. shared, ephemeral vs. persistent, page size, API version, … ???
              uuid: 128-bit “share name” (for shared pools, ignored for private pools)
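
The slide names what the flags word carries but not how it is laid out. The Python sketch below uses a made-up bit layout purely to show how one integer can select among the four pool types. Every constant and function name here is hypothetical.

```python
# Hypothetical bit layout, for illustration only; the slide does not give one.
FLAG_PERSISTENT = 1 << 0   # clear = ephemeral
FLAG_SHARED = 1 << 1       # clear = private
PAGESIZE_SHIFT = 4         # assumed position of a page-size field
VERSION_SHIFT = 24         # assumed position of an API-version field


def make_flags(persistent, shared, page_order=0, version=0):
    """Pack pool attributes into a single flags word."""
    flags = 0
    if persistent:
        flags |= FLAG_PERSISTENT
    if shared:
        flags |= FLAG_SHARED
    flags |= (page_order & 0xFF) << PAGESIZE_SHIFT
    flags |= (version & 0xFF) << VERSION_SHIFT
    return flags


def pool_kind(flags):
    """Decode the two type bits back into a readable name."""
    scope = "shared" if flags & FLAG_SHARED else "private"
    life = "persistent" if flags & FLAG_PERSISTENT else "ephemeral"
    return f"{scope} {life}"


hcache_flags = make_flags(persistent=False, shared=False)
hswap_flags = make_flags(persistent=True, shared=False)
```

Under this layout `pool_kind(hcache_flags)` gives "private ephemeral" and `pool_kind(hswap_flags)` gives "private persistent".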

Transcendent Memory API
                              what is a “handle”?? (API v0.0.1)
                  retval = tmem_op(handle,pfn)                    (is actually)

      retval = tmem_op(pool_id,object_id,page_id,pfn)

             • The “handle” used in previous slides is actually a
               three-element “handle-tuple” consisting of:
                   • a 32-bit pool-id (obtained from tmem_new_pool())
                   • a 64-bit object-id
                   • a 32-bit page-id
             • In filesystem-like usage:
                   • pool-id: one per filesystem
                   • object-id: inode
                   • page-id: page index into a file
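
The filesystem-like mapping can be sketched as follows. The field widths come from the slide. The `Handle` type, the helper name, and the 4096-byte page size are assumptions for illustration.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    pool_id: int    # 32-bit, obtained from tmem_new_pool()
    object_id: int  # 64-bit
    page_id: int    # 32-bit


def file_page_handle(pool_id, inode, byte_offset, page_size=4096):
    """Map (filesystem pool, inode, byte offset) to a handle-tuple."""
    page_index = byte_offset // page_size
    if not 0 <= pool_id < 2**32:
        raise ValueError("pool_id must fit in 32 bits")
    if not 0 <= inode < 2**64:
        raise ValueError("object_id must fit in 64 bits")
    if not 0 <= page_index < 2**32:
        raise ValueError("page_id must fit in 32 bits")
    return Handle(pool_id, inode, page_index)


# Byte 10000 of inode 1234 lives in the file's third page (index 2).
h = file_page_handle(pool_id=0, inode=1234, byte_offset=10000)
```

With 4096-byte pages, a 32-bit page-id covers files up to 16 TiB per object.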

Transcendent Memory API
                                      API operations (API v0.0.1)
        • tmem_new_pool(uuid,flags)
        • tmem_destroy_pool(pool_id)
        • tmem_put_page(pool_id,object_id,page_id,pfn)
        • tmem_get_page(pool_id,object_id,page_id,empty_pfn)
        • tmem_flush_page(pool_id,object_id,page_id)
        • tmem_flush_object(pool_id,object_id)
        • tmem_read(pool_id,object_id,page_id,pfn,
        • tmem_write(pool_id,object_id,page_id,pfn,
        • tmem_xchg(pool_id,object_id,page_id,pfn,
        • tmem_control(TBD…)

Transcendent Memory API
                         important semantic details (v0.0.1)
           • get_page on a private+ephemeral pool is destructive (auto-flush)
                 • implements exclusive cache semantics
           • no serialization guarantees are provided for SMP VMs
           • clients must ensure coherency with their own caches/data stores but
             implementation provides following guarantees:
                 • put/put/get (aka “dup put”) coherency
                     After put(D1), put(D2), get(E) on the same handle,
                     E may never contain the data from D1.
                   (implies that on persistent pools, dup put must never fail)
                 • get/get coherency
                      If the first get fails, the second must also fail
           • all flush operations must always succeed
           • return values: >=0 means success, < 0 failure (errno)
           • see spec for more information
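
These rules can be exercised against a small model. The Python sketch below is hypothetical and models a private ephemeral pool only. The class name, the choice of ENOMEM and ENOENT as error codes, and the use of 0 for success are assumptions; the slide says only that negative values are errno-style failures.

```python
import errno


class Tmem:
    """Toy model of the return-value and coherency rules (not real tmem code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}

    def put_page(self, handle, data):
        # A duplicate put drops the old copy first, so stale data can never
        # be returned even if storing the new copy fails.
        self.store.pop(handle, None)
        if len(self.store) >= self.capacity:
            return -errno.ENOMEM
        self.store[handle] = bytes(data)
        return 0

    def get_page(self, handle):
        """Return (0, data) on a hit or (-ENOENT, None) on a miss; destructive."""
        if handle not in self.store:
            return -errno.ENOENT, None
        return 0, self.store.pop(handle)

    def flush_page(self, handle):
        self.store.pop(handle, None)  # flushing a missing page still succeeds
        return 0


t = Tmem(capacity=1)
h = (0, 42, 7)                 # (pool_id, object_id, page_id)
r1 = t.put_page(h, b"D1")
r2 = t.put_page(h, b"D2")      # dup put replaces D1
rc, e = t.get_page(h)          # E is D2, never D1
rc2, e2 = t.get_page(h)        # the first get was destructive: now a miss
```

A persistent pool would need a stricter `put_page`, since the slide requires that a dup put there must never fail.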

Transcendent memory
                                           hcache performance

         [Figure: two bar charts (smaller is better); one axis is labeled “disk reads (K)”.
          Configurations: pcpu=2 vcpu=2, pcpu=4 vcpu=2, pcpu=4 vcpu=4.
          Series: 256MB w/hcache, 256MB no hcache, 1024MB no hcache, 2048MB no hcache.]

                     Benchmark: Linux compile, cold page cache, pre-caching enabled (ccache)

Transcendent memory
                     hcache compensates for underprovisioned memory

         [Figure: two bar charts, elapsed time (seconds) and disk reads (K),
          for pcpu=4/vcpu=4.
          Series: 128MB w/hcache, 128MB no hcache, 256MB no hcache,
          1024MB no hcache.]

         Benchmark: Linux compile, warm page cache, pre-caching disabled

hcache (multiple domains + compressed)

         • shared compressed extended page cache for more than one guest
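
A rough sketch of what a compressed page cache serving more than one guest could look like. Everything here is an assumption made for illustration: `CompressedSharedCache` is a hypothetical class, zlib stands in for whatever compressor the hypervisor actually uses, and a byte budget with one global LRU queue stands in for the real admission/eviction policy.

```python
import zlib
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size


class CompressedSharedCache:
    """Toy model: one compressed, globally LRU-evicted cache used by several guests."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # (guest, key) -> compressed bytes, oldest first

    def flush(self, guest, key):
        old = self.entries.pop((guest, key), None)
        if old is not None:
            self.used_bytes -= len(old)
        return 0  # flush always succeeds

    def put(self, guest, key, page):
        self.flush(guest, key)  # never leave an older copy visible
        packed = zlib.compress(page, 1)
        if len(packed) > self.budget_bytes:
            return -1  # toy choice: reject a page that cannot fit at all
        while self.used_bytes + len(packed) > self.budget_bytes:
            # Evict the globally oldest page, whichever guest put it.
            _, victim = self.entries.popitem(last=False)
            self.used_bytes -= len(victim)
        self.entries[(guest, key)] = packed
        self.used_bytes += len(packed)
        return 0

    def get(self, guest, key):
        packed = self.entries.pop((guest, key), None)  # destructive, as for hcache
        if packed is None:
            return None
        self.used_bytes -= len(packed)
        return zlib.decompress(packed)


cache = CompressedSharedCache(budget_bytes=PAGE_SIZE)
page_a = (b"guest A data " * 400)[:PAGE_SIZE]
page_b = (b"guest B data " * 400)[:PAGE_SIZE]
cache.put("A", 1, page_a)
cache.put("B", 1, page_b)
```

With highly repetitive pages like these, both guests' pages fit inside a budget of a single uncompressed page. Real pages compress less predictably, so the gain depends on the workload.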


XS Oracle 2009 Transcendent Memory

  • 1. Transcendent Memory on Xen 2009 Speaker: Dan Magenheimer, Oracle Corporation
  • 2. Agenda • Motivation and Challenge • Overview of Physical Memory Management • Transcendent Memory (“tmem”) Overview • Transcendent Memory in Action • Status, Futures, etc.
  • 3. Motivation • Memory is increasingly becoming a bottleneck in virtualized systems • Existing mechanisms have major holes: ballooning, memory overcommitment, page sharing [Diagram labels: four underutilized 2-CPU virtual servers, each with 1GB RAM; one 4-CPU physical server w/4GB RAM]
  • 4. The Virtualized Physical Memory Resource Optimization Challenge Optimize, across time, the distribution of machine memory among a maximal set of virtual machines by: • measuring the current and future memory need of each running VM and • reclaiming memory from those VMs that have an excess of memory and either: • providing it to VMs that need more memory or • using it to provision additional new VMs. • without suffering a significant performance penalty
  • 5. The Virtualized Physical Memory Resource Optimization Challenge Optimize, across time, the distribution of machine memory among a maximal set of virtual machines by: • measuring the current and future memory need of each running VM and • reclaiming memory from those VMs that have an excess of memory and either: • providing it to VMs that need more memory or • using it to provision additional new VMs. • without suffering a significant performance penalty … Why is this a hard problem?
  • 6. Agenda • Motivation and Challenge • Overview of Physical Memory Management • in an operating system • in a virtual machine monitor (Xen) • Transcendent Memory Overview • Transcendent Memory In Action • Status, Futures, etc.
  • 7. OS Physical Memory Management • Operating systems are memory hogs! [Diagram labels: OS; memory constraint]
  • 8. OS Physical Memory Management • Operating systems are memory hogs! • If you give an operating system more memory… [Diagram labels: OS; new larger memory constraint]
  • 9. OS Physical Memory Management • Operating systems are memory hogs! • …it uses up any memory you give it! • “My name is Linux and I am a memory hog” [Diagram label: memory constraint]
  • 10. OS Physical Memory Management • What does an OS do with all that memory? [Diagram labels: kernel code, kernel data, user code, user data, page tables, page cache, everything else]
  • 11. OS Physical Memory Management • What does an OS do with all that memory? [Diagram label: page cache]
  • 12. OS Physical Memory Management • What does an OS do with all that memory? [Diagram label: page cache]
  • 13. OS Physical Memory Management • What does an OS do with all that memory? • …much of the time mostly page cache • … some of which will be useful in the future • … and some of which is wasted [Diagram labels: page cache; everything else]
  • 14. Agenda • Motivation and Challenge • Overview of Physical Memory Management • in an operating system • in a virtual machine monitor (Xen) • Transcendent Memory Overview • Transcendent Memory In Action • Status, Futures, etc.
  • 15. VMM Physical Memory Management • Xen partitions memory • hypervisor memory • dom0 memory • guest memory • Dom0 is special ☺
  • 16. VMM Physical Memory Management • Xen partitions memory • Xen memory • dom0 memory • guest 1 memory • guest 2 memory • whatever’s left over: “fallow” memory (fallow, adj., land left without a crop for one or more years)
  • 17. VMM Physical Memory Management • Xen partitions memory • Xen memory • dom0 memory • guest 1 memory • guest 2 memory • whatever’s left over: “fallow” memory (fallow, adj., land left without a crop for one or more years) [Diagram labels: guest, fallow]
  • 18. VMM Physical Memory Management • Xen partitions memory among more guests • Xen memory • dom0 memory • guest 1 memory • guest 2 memory • guest 3… • BUT still fallow memory leftover [Diagram labels: guest, fallow]
  • 19. VMM Physical Memory Management in the presence of migration • migration • requires fallow memory in the target machine • leaves behind fallow memory in the originating machine [Diagram labels: guest, fallow; physical machine “A”; physical machine “B”]
  • 20. VMM Physical Memory Management in the presence of ballooning • Use ballooning to allow guest memory size to grow? • Goal: fill fallow memory [Diagram labels: guest, fallow]
  • 21. VMM Physical Memory Management in the presence of ballooning • Look! No more fallow memory! But… [Diagram label: guest]
  • 22. VMM Physical Memory Management in the presence of ballooning • Look! No more fallow memory! But… And but… [Diagram label: guest]
  • 23. VMM Physical Memory Management in the presence of ballooning Using ballooning to take memory away: • not instantaneous (memory inertia) • guest can’t predict future needs • good pages are evicted along with the bad • don’t know how much/fast to balloon • Too much or too fast: thrashing or the dreaded OOM killer
  • 24. The Virtualized Physical Memory Resource Optimization Challenge Optimize, across time, the distribution of machine memory among a maximal set of virtual machines by: • measuring the current and future memory need of each running VM and • reclaiming memory from those VMs that have an excess of memory and either: • providing it to VMs that need more memory or • using it to provision additional new VMs. • without suffering a significant performance penalty … This IS a hard problem!!!
  • 25. Why this IS a hard problem! Summary • OS’s use as much memory as they are given • but cannot predict the future so often guess wrong • and often much memory owned by an OS is wasted • Xen leaves large amounts of memory fallow • fixed partitioning results in fragmentation • migration requires fallow memory to succeed • Ballooning helps but: • can’t predict future memory needs of guests • memory has inertia • the price of incorrect guesses can be dire • NEED A NEW APPROACH TO VIRTUALIZED PHYSICAL MEMORY MANAGEMENT!!
  • 26. Agenda • Motivation and Challenge • Overview of Physical Memory Management • Transcendent Memory Overview • Transcendent Memory In Action • Status, Futures, etc.
  • 27. Transcendent memory: creating the transcendent memory pool • Step 1a: reclaim all fallow memory • Step 1b: reclaim wasted guest memory (e.g. via ballooning) • Step 1c: collect it all into a pool [Diagram labels: guest, fallow, transcendent memory pool]
  • 28. Transcendent memory: creating the transcendent memory pool • Step 2: provide indirect access, strictly controlled by the hypervisor and dom0 [Diagram labels: guest, data, control, transcendent memory pool]
  • 29. Transcendent memory: API characteristics • paravirtualized (lightly) • narrow • well-specified • operations are: • synchronous • page-oriented (one page per op) • copy-based • multi-faceted • extensible [Diagram labels: guest, transcendent memory API, transcendent memory pool]
  • 30. Transcendent memory: four different subpool types, four different uses • private + ephemeral: “second-chance” clean-page cache!! “hcache” (implemented and working today, Linux + Xen) • private + persistent: fast swap “device”!! “hswap” (implemented and working today, Linux + Xen) • shared + ephemeral: server-side cluster filesystem cache? “shared hcache” (in development) • shared + persistent: inter-domain shared memory? (under investigation) • eph-em-er-al, adj., … transitory, existing only briefly, short-lived (i.e. NOT persistent)
  • 31. Transcendent memory: caveats • Requirements: • guest OS must be paravirtualized • 64-bit hypervisor and CPU • Workload: • should exert memory pressure in at least one guest • memory pressure in multiple guests should vary across time • For best results: • dom0 should be configured with a fixed memory size • guest should have a (virtual) swap disk configured • Complementary to: • feedback-directed ballooning • transparent content-based page sharing
  • 32. Agenda • Motivation and Challenge • Overview of Physical Memory Management • Transcendent Memory Overview • Transcendent Memory In Action • “hcache”*: private-ephemeral pool • “shared hcache”: shared-ephemeral pool • “hswap”*: private-persistent pool • Status, Future, etc. (* called “precache” and “preswap” for Linux)
  • 33. hcache • a second-chance clean page cache for a guest • “put” clean pages only • “get” only valuable pages • pages eventually are evicted • coherency managed by guest • exclusive cache semantics [Diagram: guest issues “put” and “get” against a transcendent memory pool (private+ephemeral)]
  • 34. hcache (with compression) • Compression • Option (per-domain) • nominally doubles available memory • performance-space tradeoff
  • 35. hcache (multiple guests) • second-chance page cache for multiple guests • Need “memory scheduler”: • global admission/eviction policy: • LRU queue, or • weight balanced (future) [Diagram labels: guest; private ephemeral tmem pool #1; private ephemeral tmem pool #2]
  • 36. shared hcache (for clustering) • guests sharing a clustered filesystem • non-exclusive • LFU instead of LRU • compression optional • a server-side disk cache! [Diagram labels: guest; SHARED ephemeral tmem pool; clustered filesystem]
  • 37. hswap • over-ballooned guests experiencing unexpected memory pressure have an emergency swap disk • much faster than swapping • persistent (“dirty”) pages OK • prioritized higher than hcache • limited by domain’s maxmem [Pool type: private + persistent]
  • 38. Agenda • Motivation and Challenge • Overview of Physical Memory Management • Transcendent Memory Overview • Transcendent Memory In Action • Status, Future, etc.
  • 39. Current Status • hcache and hswap fully working • shared hcache soon • xen-side patch ready for inclusion in xen-unstable • ~3K line patch, but low impact on existing code • enabled with xen boot option (off by default) • “technology preview” • goal: broader community usage (3.4?) • linux-side patch ready • low impact on existing code • 2.6.18-xen version ready for inclusion in Xen-linux tree • 2.6.28 version working
  • 40. Future Work • finish “shared hcache” work (ocfs2) • shared-persistent pool investigation • inter-domain communication? • real world performance measurement/analysis • identify tuning opportunities (e.g. scalability) and repeat • finish “memory scheduler” • tmem for: • native Linux? • Linux containers? • KVM? • HVM domains?
  • 41. Acknowledgements • Chris Mason (Oracle) • Linux vfs changes for hcache • Zhigang Wang (Oracle) • Xen tools (xm + libxc) code • Kurt Hackel (Oracle), various HP friends, Ian, Keir, Jeremy • design feedback along the way
  • 42. For more information http://oss.oracle.com/projects/tmem
  • 43. Transcendent Memory on Xen 2009 Speaker: Dan Magenheimer, Oracle Corporation
  • 44. Backup Slides
  • 45. Transcendent Memory API overview (API v0.0.1)
  • 46. Transcendent memory API: op overview (API v0.0.1) Two classes of operations: • Create a pool. Syntax: pool_id = tmem_new_pool(uuid, flags) • Operate on a created pool. Generic syntax: retval = tmem_op(handle,pfn[,ofs1,ofs2,len])
  • 47. Transcendent memory API: pool creation (API v0.0.1) Syntax: pool_id = tmem_new_pool(uuid, flags) • private + ephemeral: “second-chance” clean-page cache!! “hcache” (implemented and working today, Linux + Xen) • private + persistent: fast swap “device”!! “hswap” (implemented and working today, Linux + Xen) • shared + ephemeral: server-side cluster filesystem cache? (under investigation) • shared + persistent: inter-domain shared memory? (under investigation) • flags: private vs. shared, ephemeral vs. persistent, page size, API version, … ??? • uuid: 128-bit “share name” (for shared pools, ignored for private pools)
  • 48. Transcendent Memory API: what is a “handle”?? (API v0.0.1) retval = tmem_op(handle,pfn) (is actually) retval = tmem_op(pool_id,object_id,page_id,pfn) • The “handle” used in previous slides is actually a three-element “handle-tuple” consisting of: • a 32-bit pool-id (obtained from tmem_new_pool()) • a 64-bit object-id • a 32-bit page-id • In filesystem-like usage: • pool-id: one per filesystem • object-id: inode • page-id: page index into a file
  • 49. Transcendent Memory API: API operations (API v0.0.1) • tmem_new_pool(uuid,flags) • tmem_destroy_pool(pool_id) • tmem_put_page(pool_id,object_id,page_id,pfn) • tmem_get_page(pool_id,object_id,page_id,empty_pfn) • tmem_flush_page(pool_id,object_id,page_id) • tmem_flush_object(pool_id,object_id) • tmem_read(pool_id,object_id,page_id,pfn,offset1,offset2,len) • tmem_write(pool_id,object_id,page_id,pfn,offset1,offset2,len) • tmem_xchg(pool_id,object_id,page_id,pfn,offset1,offset2,len) • tmem_control(TBD…)
  • 50. Transcendent Memory API: important semantic details (v0.0.1) • get_page on a private+ephemeral pool is destructive (auto-flush) • implements exclusive cache semantics • no serialization guarantees are provided for SMP VMs • clients must ensure coherency with their own caches/data stores, but the implementation provides the following guarantees: • put/put/get (aka “dup put”) coherency: tmem_put_page(ABC,D1); tmem_put_page(ABC,D2); tmem_get_page(ABC,E); E may never contain the data from D1 (implies that on persistent pools, dup put must never fail) • get/get coherency: tmem_get_page(ABC,E); tmem_get_page(ABC,E); if the first get fails, the second must also fail • all flush operations must always succeed • return values: >=0 means success, <0 failure (errno) • see spec for more information
  • 51. Transcendent memory: hcache performance (smaller is better) [Figure: bar charts of elapsed time (seconds) and disk reads (K) for pcpu=2/vcpu=2, pcpu=4/vcpu=2 and pcpu=4/vcpu=4; series: 256MB w/hcache, 256MB no hcache, 1024MB no hcache, 2048MB no hcache] Benchmark: Linux compile, cold page cache, pre-caching enabled (ccache)
  • 52. Transcendent memory: hcache compensates for underprovisioned memory [Figure: bar charts of elapsed time (seconds) and disk reads (K) for pcpu=4/vcpu=4; series: 128MB w/hcache, 128MB no hcache, 256MB no hcache, 1024MB no hcache] Benchmark: Linux compile, warm page cache, pre-caching disabled
  • 53. hcache (multiple domains + compressed) • shared compressed extended page cache for more than one guest
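
The hcache flow (slide 33) and the handle-tuple (slide 48) can be tied together in a small guest-side sketch. It is a toy model under stated assumptions: `Handle` and `ToyGuest` are hypothetical names, a plain dict stands in for the private ephemeral pool, and a 4096-byte page size is assumed. A real ephemeral pool may drop pages at any time, which is why a failed "get" simply falls back to a disk read.

```python
from collections import OrderedDict
from typing import NamedTuple

PAGE_SIZE = 4096  # assumed page size


class Handle(NamedTuple):
    pool_id: int    # 32-bit, one pool per filesystem
    object_id: int  # 64-bit, the file's inode number
    page_id: int    # 32-bit, page index into the file


class ToyGuest:
    """Hypothetical guest with a tiny LRU page cache backed by a toy hcache."""

    def __init__(self, pool_id, cache_pages, disk):
        self.pool_id = pool_id
        self.cache_pages = cache_pages
        self.disk = disk                 # (inode, page index) -> bytes
        self.page_cache = OrderedDict()  # Handle -> bytes, oldest first
        self.hcache = {}                 # stand-in for the private ephemeral pool
        self.disk_reads = 0

    def read(self, inode, byte_offset):
        h = Handle(self.pool_id, inode, byte_offset // PAGE_SIZE)
        if h in self.page_cache:
            self.page_cache.move_to_end(h)
            return self.page_cache[h]
        data = self.hcache.pop(h, None)  # "get" is destructive (exclusive cache)
        if data is None:
            data = self.disk[(h.object_id, h.page_id)]
            self.disk_reads += 1
        self.page_cache[h] = data
        if len(self.page_cache) > self.cache_pages:
            victim, clean_page = self.page_cache.popitem(last=False)
            self.hcache[victim] = clean_page  # "put" the evicted clean page
        return data


disk = {(42, i): bytes([i]) * PAGE_SIZE for i in range(3)}
guest = ToyGuest(pool_id=0, cache_pages=2, disk=disk)
for offset in (0, PAGE_SIZE, 2 * PAGE_SIZE, 0):
    guest.read(42, offset)
```

In this run the fourth read is served from the toy hcache rather than from disk, so only three disk reads occur even though the guest's own page cache holds just two pages.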