Memory Management of Linux PDF
Dr. Ulrich Weigand
Linux on zSeries Development, IBM Lab Böblingen
Ulrich.Weigand@de.ibm.com
Or this:
$ cat /proc/meminfo
MemTotal:        512832 kB
MemFree:          39512 kB
Buffers:          52308 kB
Cached:          236768 kB
SwapCached:         532 kB
Active:          246328 kB
Inactive:         61920 kB
HighTotal:            0 kB
HighFree:             0 kB
LowTotal:        512832 kB
LowFree:          39512 kB
SwapTotal:       524280 kB
SwapFree:        523368 kB
Dirty:               28 kB
Writeback:            0 kB
Mapped:            5492 kB
Slab:            158608 kB
Committed_AS:      7656 kB
PageTables:         208 kB
VmallocTotal:   1564671 kB
VmallocUsed:        724 kB
VmallocChunk:   1563947 kB
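Output like this is easy to process in a script. The following is a minimal sketch, not part of the original slides: the helper name `parse_meminfo` and the shortened `SAMPLE` text are assumptions made for illustration, and the parser only handles lines of the form `Name: value kB`.

```python
def parse_meminfo(text):
    """Parse 'Name: value kB' lines into a dict of integers (values in kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a 'Name:' prefix (e.g. the shell prompt)
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


# A few lines taken from the sample output above.
SAMPLE = """\
MemTotal:        512832 kB
MemFree:          39512 kB
Buffers:          52308 kB
Cached:          236768 kB
"""

info = parse_meminfo(SAMPLE)
# Memory that is not free; much of it is buffers and page cache.
in_use_kb = info["MemTotal"] - info["MemFree"]
```

On a live system the same function could be fed the contents of /proc/meminfo instead of `SAMPLE`.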
Agenda
Physical Memory
Dynamic Address Translation
Process Address Spaces and Page Cache
Kernel Memory Allocators
Page Replacement and Swapping
Virtualization Considerations
Physical Memory
Pages aggregated into memory zones
Zones may be aggregated into nodes (NUMA systems)
Boot process
Kernel loaded at bottom of memory
Determine size of memory, create 'struct page' array
Kernel pages and boot memory marked as 'reserved'
All other pages form the master pool for page allocation
Memory Zones
HighMem zone
Memory not directly addressable from kernel space
On Intel machines all > 4/2/1 GB
On zSeries empty
DMA zone
On 31-bit zSeries all memory
On 64-bit zSeries all < 2 GB
(On Intel all < 16 MB)
Slab cache
Manages allocations of objects of the same type
Large-scale users: inodes, dentries, block I/O, network ...
kmalloc (generic allocator) implemented on top
Buddy Allocator
Order-n free lists (per zone), for orders 0, 1, 2, ... MAX_ORDER
An order-n block consists of 2^n pages (e.g. order 0: 2^0 pages, order 3: 2^3 pages, order 7: 2^7 pages)
Allocate order-n block: if none is free, an order-(n+1) block is split
Free order-n block: if its 'buddy' is also free, merge them into an order-(n+1) block
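The split and merge rules can be sketched in a few lines. This is a toy model under stated assumptions, not the kernel's implementation: the class name `Buddy`, the small `MAX_ORDER` value and the use of Python sets as free lists are all chosen for illustration. Blocks are identified by the number of their first page, and the buddy of an order-n block is found by flipping bit n of that page number.

```python
MAX_ORDER = 4  # hypothetical small value; the kernel's constant differs


class Buddy:
    def __init__(self, max_order=MAX_ORDER):
        self.max_order = max_order
        # free[n] holds the first page number of every free order-n block
        self.free = [set() for _ in range(max_order + 1)]
        self.free[max_order].add(0)  # start with one maximal free block

    def alloc(self, order):
        """Return the first page of a free order-`order` block, or None."""
        n = order
        while n <= self.max_order and not self.free[n]:
            n += 1  # nothing free at this order: look at the next larger one
        if n > self.max_order:
            return None
        start = min(self.free[n])
        self.free[n].remove(start)
        while n > order:
            n -= 1
            # split: keep the lower half, put the upper half on the free list
            self.free[n].add(start + (1 << n))
        return start

    def release(self, start, order):
        """Free a block and merge it with its buddy as long as possible."""
        while order < self.max_order:
            buddy = start ^ (1 << order)
            if buddy not in self.free[order]:
                break
            self.free[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free[order].add(start)


pool = Buddy()
page = pool.alloc(0)   # splits the order-4 block down to a single page
pool.release(page, 0)  # merges everything back into one order-4 block
```

Keeping one free list per order is what makes both operations cheap: allocation walks at most MAX_ORDER lists, and freeing needs only one buddy lookup per merge step.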
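[Figure: 31-bit dynamic address translation. The virtual address is split into segment index (SX), page index (PX) and byte index (BX). The Segment Table Origin (STO) plus SX selects a Segment Table entry, which supplies the Page Table Origin (PTO). PTO plus PX selects a Page Table entry, which supplies the Page Frame Real Address (PFRA). PFRA combined with BX gives the Real Address.]

[Figure: 64-bit dynamic address translation starting at the Region-1st Table. The virtual address is split into region-first index (RFX), region-second index (RSX), region-third index (RTX), SX, PX and BX. The walk uses, in turn, the Region-1st Table Origin (RFTO), Region-2nd Table Origin (RSTO), Region-3rd Table Origin (RTTO), STO and PTO, and ends with PFRA combined with BX as the Real Address.]

[Figure: 64-bit dynamic address translation starting at the Region-3rd Table. The walk begins with RTTO plus RTX and continues through STO, PTO and PFRA as above.]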
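How a virtual address breaks down into these indexes can be shown with a small sketch. It assumes the usual ESA/390 and z/Architecture field widths (11 bits for each region index and the segment index, 8 bits for the page index, 12 bits for the byte index), which the slides do not spell out. The function names are made up for this example.

```python
def split_31bit(addr):
    """Split a 31-bit virtual address into SX (11 bits), PX (8), BX (12)."""
    return {
        "SX": (addr >> 20) & 0x7FF,
        "PX": (addr >> 12) & 0xFF,
        "BX": addr & 0xFFF,
    }


def split_64bit(addr):
    """Split a 64-bit virtual address into RFX, RSX, RTX, SX, PX, BX."""
    return {
        "RFX": (addr >> 53) & 0x7FF,  # index into the Region-1st Table
        "RSX": (addr >> 42) & 0x7FF,  # index into the Region-2nd Table
        "RTX": (addr >> 31) & 0x7FF,  # index into the Region-3rd Table
        "SX": (addr >> 20) & 0x7FF,   # index into the Segment Table
        "PX": (addr >> 12) & 0xFF,    # index into the Page Table
        "BX": addr & 0xFFF,           # byte offset within the 4 KB page
    }


parts = split_31bit(0x40001234)
```

With these widths, one page table covers 256 pages of 4 KB (1 MB, one segment), and each additional table level multiplies the reachable range by 2048.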
Access registers
Base register used in memory access identifies AR
AR specifies STO/RTO via Access List Entry Token
Operating System manages ALETs and grants privilege
ALET 0 is primary space, ALET 1 is secondary space
 _ _ _ _ _ _ _ _ _______ _ _ _ _ ___ ___ _______ _____________ _
| | | | | | |I|E|       | | | | |   |   | Prog  |             |E|
|0|R|0|0|0|T|O|X|  Key  |0|M|W|P|A S|C C| Mask  |0 0 0 0 0 0 0|A|
|_|_|_|_|_|_|_|_|_______|_|_|_|_|___|___|_______|_____________|_|
0         5     8       12      16  18  20      24            31
 _ _____________________________________________________________
|B|                                                             |
|A|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
|_|_____________________________________________________________|
32                                                             63
 _______________________________________________________________
|                                                               |
|                      Instruction Address                      |
|_______________________________________________________________|
64                                                             95
 _______________________________________________________________
|                                                               |
|                Instruction Address (Continued)                |
|_______________________________________________________________|
96                                                            127

R: Program Event Recording Mask
T: Dynamic Address Translation Mode
IO: I/O Interruption Mask
EX: External Interruption Mask
Key: PSW Key (storage protection)
M: Machine Check Mask
W: Wait State
P: Problem State
AS: Address Space Control
CC: Condition Code
Prog Mask: Program Mask
EA: Extended Addressing Mode
BA: Basic Addressing Mode
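[Figure: address space layout and cross-space access. Labels: Home Space, Primary Space; Stack, vmalloc area, Shared Libs, 0x40000000, Kernel code, 0x00000000. The instruction mvc 0(8,%r2),0(%r4) is shown twice, once together with a setting of the access register paired with %r4.]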
'Address spaces'
Represent some page-addressed data
Examples: inodes (files), shared memory segments, swap
Contents cached in 'page cache'
'Memory map'
Describes a process' user address space
List of 'virtual memory areas' (VMAs)
VMA maps part of an address space into a process
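Resolving an address against such a list can be sketched as follows. This is an illustration only: the `Vma` tuple, the `find_vma` helper and every address range and description in `memory_map` are hypothetical, and the kernel uses its own data structures for this lookup.

```python
import bisect
from collections import namedtuple

# One area covers the half-open range [start, end) of the user address space.
Vma = namedtuple("Vma", "start end backing")


def find_vma(vmas, addr):
    """Return the area containing addr, or None.

    `vmas` must be sorted by start address and must not overlap.
    """
    starts = [v.start for v in vmas]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and vmas[i].start <= addr < vmas[i].end:
        return vmas[i]
    return None  # no mapping: an access here cannot be resolved


# Hypothetical memory map of one process.
memory_map = [
    Vma(0x00400000, 0x00410000, "program file"),
    Vma(0x40000000, 0x40020000, "shared library file"),
    Vma(0x7FFF0000, 0x80000000, "anonymous memory (stack)"),
]

hit = find_vma(memory_map, 0x40001000)
```

A lookup of this kind is what tells the memory manager which address space, and which offset within it, backs a given user address.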
Page tables
Hardware-defined structure: region, segment, page tables
Linux uses platform-independent abstraction
Contents filled on-demand as defined by MM
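[Figure: relation between page tables, the memory map and address spaces. A Shared VMA and a Private VMA in the memory map each refer to an address space whose contents are held in the page cache.]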
Page table entry formats (32-bit entries, bits numbered 0 to 31 from the left)

General layout: bit 0 = 0, bits 1-19 = PFRA, bit 20 = 0, bit 21 = I, bit 22 = P, bit 23 = 0, bits 24-31 = - - -

Entry type                Bits 1-19   Bit 21 (I)   Bit 22 (P)   Bits 24-30   Bit 31
Empty PTE slot            0           1            0            0            0
Read-write page           PFRA        0            0            0            0
Read-only page            PFRA        0            1            0            0
No-access page            PFRA        1            0            0            1
Swapped-out page          SO          1            1            SA           0
Paged-out remapped page   FOH         1            1            FOL          1

PFRA: Page Frame Real Address
I: Invalid
P: Page Protection
SA: Swap area (file/device)
SO: Offset within area
FOL: Offset (low bits)
FOH: Offset (high bits)
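The six entry types can be told apart from three bits alone: I, P and bit 31. The sketch below assumes the bit positions shown in the diagrams (I at bit 21, P at bit 22, counting from the most significant bit of a 32-bit entry). The constant and function names are made up for this example and are not taken from the kernel source.

```python
PAGE_INVALID = 0x400    # I, bit 21 of the 32-bit entry
PAGE_PROTECT = 0x200    # P, bit 22
TYPE_BIT = 0x001        # bit 31, used by software to refine invalid entries
PFRA_MASK = 0x7FFFF000  # bits 1-19


def classify_pte(pte):
    """Name the kind of page table entry, following the formats above."""
    invalid = bool(pte & PAGE_INVALID)
    protect = bool(pte & PAGE_PROTECT)
    low = bool(pte & TYPE_BIT)
    if not invalid:
        # Valid entries: the hardware translates through PFRA.
        return "read-only page" if protect else "read-write page"
    if not protect:
        return "no-access page" if low else "empty PTE slot"
    # Invalid and protected: the entry holds software information only.
    return "paged-out remapped page" if low else "swapped-out page"


kind = classify_pte(0x12345200)
```

Because the hardware ignores the remaining bits of an invalid entry, the kernel is free to store swap or file offsets in them, which is what the last two formats do.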
Reverse Mappings
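[Figure: reverse mapping. A single Physical Page is mapped by VM A, VM B and VM C; the Reverse Mapping leads from the physical page back to the address spaces that map it.]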
Hardware support
Accessing invalid pages causes 'page translation' check
Writing to protected pages causes 'protection exception'
Translation-exception identification provides address
'Suppression on protection' facility essential!
Page Replacement
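[Figure: page replacement with two LRU lists. The Active List and the Inactive List each have a Head and a Tail. Transitions between the lists are labelled 'referenced' and 'unreferenced'. At the tail of the Inactive List, 'clean' pages are distinguished from 'dirty' pages, for which the step is Start Writeback. An Anonymous Page uses the Swap File as its Backing Store.]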
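One plausible reading of the two-list scheme is sketched below. It is a simplified model and not the kernel's algorithm: the `Page` class, both function names and the exact handling of each case are assumptions. Each list is a deque whose left end is the head and whose right end is the tail.

```python
from collections import deque


class Page:
    def __init__(self, name, dirty=False):
        self.name = name
        self.dirty = dirty
        self.referenced = False  # set when the page is accessed


def age_active(active, inactive):
    """Look at the tail of the active list."""
    if not active:
        return None
    page = active.pop()
    if page.referenced:
        page.referenced = False
        active.appendleft(page)    # recently used: back to the head
        return "kept"
    inactive.appendleft(page)      # unreferenced: demote to the inactive list
    return "deactivated"


def reclaim_inactive(active, inactive):
    """Look at the tail of the inactive list."""
    if not inactive:
        return None
    page = inactive.pop()
    if page.referenced:
        page.referenced = False
        active.appendleft(page)    # used again: promote to the active list
        return "activated"
    if page.dirty:
        page.dirty = False         # model only: writeback is treated as instant
        inactive.appendleft(page)
        return "writeback"
    return "freed"                 # clean and unreferenced: frame is reusable


active = deque([Page("a")])
inactive = deque([Page("c"), Page("b", dirty=True)])  # "b" is at the tail
first_step = reclaim_inactive(active, inactive)
```

The point of the second list is that a page gets one more chance to be referenced before it is freed, so a single scan over rarely used data does not evict the working set.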
Or this:
$ cat /proc/meminfo
MemTotal:        512832 kB
MemFree:          39512 kB
Buffers:          52308 kB
Cached:          236768 kB
SwapCached:         532 kB
Active:          246328 kB
Inactive:         61920 kB
HighTotal:            0 kB
HighFree:             0 kB
LowTotal:        512832 kB
LowFree:          39512 kB
SwapTotal:       524280 kB
SwapFree:        523368 kB
Dirty:               28 kB
Writeback:            0 kB
Mapped:            5492 kB
Slab:            158608 kB
Committed_AS:      7656 kB
PageTables:         208 kB
VmallocTotal:   1564671 kB
VmallocUsed:        724 kB
VmallocChunk:   1563947 kB

[Annotations on the slide: Page Cache; Total LRU (PC + Anon); Fixed; User]
/proc/sys/vm/swappiness
Influences page-out decision of mapped vs. unmapped pages
/proc/sys/vm/page-cluster
Controls swap-in read-ahead
/proc/sys/vm/dirty_background_ratio / dirty_ratio
Percentage of memory allowed to fill with dirty pages
/proc/sys/vm/dirty_writeback_centisecs / expire_centisecs
Average/maximum time a page is allowed to remain dirty
Virtualization Considerations
Caveats
Access to kernel pages
Access to user pages from kernel code
Changes effective available memory size without reboot!
IUCV special message interface allows a central instance to manage the total memory consumption of a server farm
/proc/sys/vm/cmm_timed_pages
Read to query number of pages temporarily reserved
Write increment to add to target
/proc/sys/vm/cmm_timeout
Holds pair of N pages / X seconds (read/write)
Every time X seconds have passed, release N temporary pages
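The effect of the timeout pair on the temporary reservation is simple arithmetic. The sketch below models only that arithmetic, taking the release rule above literally; the function name and its parameters are hypothetical, and it does not touch the /proc files.

```python
def temporary_pages_left(reserved, n_pages, x_seconds, elapsed):
    """Temporarily reserved pages remaining after `elapsed` seconds.

    Every full interval of x_seconds releases n_pages, never going below zero.
    """
    if x_seconds <= 0:
        raise ValueError("x_seconds must be positive")
    intervals = elapsed // x_seconds
    return max(0, reserved - intervals * n_pages)


# 1000 temporary pages, released 100 at a time every 10 seconds.
left = temporary_pages_left(1000, 100, 10, 35)
```

In this model a larger N or a smaller X returns temporarily reserved memory to the guest more quickly.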
Resources