Dave Probert is a kernel architect at Microsoft who has worked on Windows kernels for over 13 years. He manages platform-independent kernel development and works on support for multi-core and heterogeneous parallel computing. Probert also co-instigated the Windows Academic Program which provides kernel source code and curriculum materials to universities to aid in operating systems education. The document discusses differences between the UNIX and Windows NT design environments and how those influenced OS design choices. It provides an overview of the Windows kernel architecture and changes made in newer versions like Windows 7 to improve scalability and support for multi-core systems.
1. Dave Probert, Ph.D. - Windows Kernel Architect
Microsoft Windows Division
Copyright Microsoft Corporation
2. About Me
Ph.D. in Computer Engineering (Operating Systems w/o
Kernels)
Kernel Architect at Microsoft for over 13 years
Managed platform-independent kernel development in Win2K/XP
Working on multi-core & heterogeneous parallel computing support
Architect for UMS in Windows 7 / Windows Server 2008 R2
Co-instigator of the Windows Academic Program
Providing kernel source and curriculum materials to universities
http://microsoft.com/WindowsAcademic or compsci@microsoft.com
Wrote the Windows material for leading OS textbooks
Tanenbaum, Silberschatz, Stallings
Consulted on others, including a successful OS textbook in China
3. UNIX vs NT Design Environments
Environment which influenced fundamental design decisions
UNIX [1969]:
16-bit program address space
Kbytes of physical memory
Swapping system with memory mapping
Kbytes of disk, fixed disks
Uniprocessor
State-machine based I/O devices
Standalone interactive systems
Small number of friendly users
Windows (NT) [1989]:
32-bit program address space
Mbytes of physical memory
Virtual memory
Mbytes of disk, removable disks
Multiprocessor (4-way)
Micro-controller based I/O devices
Client/Server distributed computing
Large, diverse user populations
4. Effect on OS Design
Although both Windows and Linux have adapted to changes in the environment, the original design environments (i.e. in 1989 and 1969) heavily influenced the design choices:
Design choice: NT vs UNIX [environmental influence]
Unit of concurrency: Threads vs processes [Addr space, uniproc]
Process creation: CreateProcess() vs fork() [Addr space, swapping]
I/O: Async vs sync [Swapping, I/O devices]
Namespace root: Virtual vs Filesystem [Removable storage]
Security: ACLs vs uid/gid [User populations]
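The "ACLs vs uid/gid" row can be made concrete with a toy model. The sketch below is a deliberate simplification, not the real Windows or UNIX access-check code: `unix_allows`, `Ace` and `acl_allows` are hypothetical names, and access rights are reduced to a 3-bit rwx mask.

```python
from dataclasses import dataclass


def unix_allows(mode, owner_uid, group_gid, uid, gids, want):
    """Owner/group/other check; `want` is an rwx mask (r=4, w=2, x=1)."""
    if uid == owner_uid:
        bits = (mode >> 6) & 7      # owner bits
    elif group_gid in gids:
        bits = (mode >> 3) & 7      # group bits
    else:
        bits = mode & 7             # other bits
    return (bits & want) == want


@dataclass
class Ace:
    """One toy access-control entry (hypothetical shape)."""
    allow: bool
    principal: str
    mask: int


def acl_allows(acl, principals, want):
    """Walk entries in order: a matching deny refuses, allows accumulate."""
    remaining = want
    for ace in acl:
        if ace.principal not in principals:
            continue
        if not ace.allow and (ace.mask & remaining):
            return False            # explicit deny on a bit still needed
        if ace.allow:
            remaining &= ~ace.mask  # these bits are now granted
        if remaining == 0:
            return True
    return False                    # nothing granted the remaining bits


# "staff may read and write, except bob may not write" needs per-principal
# entries, which nine mode bits on a single file cannot express.
acl = [Ace(False, "bob", 2), Ace(True, "staff", 6)]
```

With this list, `acl_allows(acl, {"alice", "staff"}, 6)` succeeds while the same request for `{"bob", "staff"}` is refused, which is the kind of policy a large, diverse user population tends to need.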
5. Today’s Environment [2009]
64-bit addresses
GBytes of physical memory
TBytes of rotational disk
New Storage hierarchies (SSDs)
Hypervisors, virtual processors
Multi-core/Many-core
Heterogeneous CPU architectures, Fixed function hardware
High-speed internet/intranet, Web Services
Media-rich applications
Single user, but vulnerable to hackers worldwide
Convergence: Smartphone / Netbook / Laptop / Desktop / TV / Web / Cloud
6. Windows Architecture
[Figure: Windows architecture block diagram.
User mode: System Processes (Service Control Mgr., LSASS, WinLogon, Session Manager); Services (Services.Exe, SvcHost.Exe, WinMgt.Exe, SpoolSv.Exe); Applications (Task Manager, Explorer, User Application); Environment Subsystems (Windows, POSIX, OS/2); Subsystem DLLs, Windows DLLs; NTDLL.DLL
Kernel mode: System Service Dispatcher (kernel mode callable interfaces); I/O Mgr, File System Cache, Object Mgr., Plug and Play Mgr., Power Mgr., Security Reference Monitor, Virtual Memory, Processes & Threads, Configuration Mgr (registry), Local Procedure Call; Windows USER, GDI; Device & File Sys. Drivers; Graphics Drivers; System Threads; Kernel; Hardware Abstraction Layer (HAL)
Hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.)]
7. Kernel-mode Architecture of Windows
[Figure: layered diagram, top to bottom]
user mode: NT API stubs (wrap sysenter) -- system library (ntdll.dll)
kernel mode:
NTOS kernel layer: Trap/Exception/Interrupt Dispatch; CPU mgmt: scheduling, synchr, ISRs/DPCs/APCs
NTOS executive layer: Procs/Threads, Virtual Memory, IPC, Object Mgr, I/O, Caching Mgr, Security, Registry, glue
Drivers: Devices, Filters, Volumes, Networking, Graphics
Hardware Abstraction Layer (HAL): BIOS/chipset details
firmware/hardware: CPU, MMU, APIC, BIOS/ACPI, memory, devices
8. Kernel/Executive layers
Kernel layer (ntos/ke, ~5% of NTOS source)
Abstracts the CPU
Threads, Asynchronous Procedure Calls (APCs)
Interrupt Service Routines (ISRs)
Deferred Procedure Calls (DPCs – aka Software Interrupts)
Provides low-level synchronization
Executive layer
OS Services running in a multithreaded environment
Full virtual memory, heap, handles
Extensions to NTOS: drivers, file systems, network, …
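The ISR/DPC split listed under the kernel layer can be illustrated with a toy model: the interrupt routine does the minimum and queues the rest as deferred work. This is only a sketch under simplifying assumptions (one processor, one FIFO queue, no interrupt levels); `ToyCpu`, `isr` and `drain_dpcs` are hypothetical names, not NT interfaces.

```python
from collections import deque


class ToyCpu:
    """Single-processor toy: ISRs queue deferred calls, drained later."""

    def __init__(self):
        self.dpc_queue = deque()
        self.log = []

    def isr(self, device):
        # Interrupt time: record the event and defer the heavy lifting.
        self.log.append(f"isr:{device}")
        self.dpc_queue.append(lambda: self.log.append(f"dpc:{device}"))

    def drain_dpcs(self):
        # Runs once no interrupt is being serviced; FIFO order.
        while self.dpc_queue:
            self.dpc_queue.popleft()()


cpu = ToyCpu()
cpu.isr("disk")
cpu.isr("nic")      # second interrupt arrives before any deferred work runs
cpu.drain_dpcs()
```

After the two interrupts, `cpu.log` holds both `isr:` entries first and the two `dpc:` entries afterwards, which is the ordering the deferral is meant to achieve.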
10. Windows Vista Kernel Changes
Kernel changes mostly minor improvements
Algorithms, scalability, code maintainability
CPU timing: Uses Time Stamp Counter (TSC)
Interrupts not charged to threads
Timing and quanta are more accurate
Communication
ALPC: Advanced Lightweight Procedure Calls
Kernel-mode RPC
New TCP/IP stack (integrated IPv4 and IPv6)
I/O
Remove a context switch from I/O Completion Ports
I/O cancellation improvements
Memory management
Address space randomization (DLLs, stacks)
Kernel address space dynamically configured
Security: BitLocker, DRM, UAC, Integrity Levels
11. Windows 7 Kernel Changes
Miscellaneous kernel changes
MinWin
Change how Windows is built
Lots of DLL refactoring
API Sets (virtual DLLs)
Working-set management
Runaway processes quickly start reusing own pages
Break up kernel working-set into multiple working-sets
System cache, paged pool, pageable system code
Security
Better UAC, new account types, fewer BitLocker blockers
Energy efficiency
Trigger-started background services
Core Parking
Timer-coalescing, tick skipping
Major scalability improvements for large server apps
Broke apart last two major kernel locks, support for >64 processors
Kernel support for ConcRT
User-Mode Scheduling (UMS)
12. MinWin
MinWin is a first step at creating architectural partitions
Can be built, booted and tested separately from the rest of the system
Higher layers can evolve independently
An engineering process improvement, not a microkernel NT!
MinWin was defined as the set of components required to boot and access the network
Kernel, file system driver, TCP/IP stack, device drivers, services
No servicing, WMI, graphics, audio or shell, etc, etc, etc
MinWin footprint:
150 binaries, 25MB on disk, 40MB in-memory
14. Timer Coalescing
Secret of energy efficiency: Go idle and Stay idle
Staying idle requires minimizing timer interrupts
Before, periodic timers had independent cycles even when the period was the same
New timer APIs permit timer coalescing
Application or driver specifies tolerable delay
Timer system shifts timer firing
MarkRuss
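The idea of shifting timer firings within a tolerable delay can be sketched as an interval problem. The code below is not the Windows timer algorithm; it swaps in a plain greedy interval-stabbing pass, and `coalesce` plus the sample timers are hypothetical. Each timer may fire anywhere in [due, due + tolerable delay], and timers whose windows overlap share one wakeup.

```python
def coalesce(timers):
    """timers: list of (name, due_ms, tolerable_delay_ms).

    Returns {wakeup_ms: [names]} using as few wakeups as possible.
    """
    wakeups = {}
    current = None
    # Visit timers by latest allowed firing time.
    for name, due, tol in sorted(timers, key=lambda t: t[1] + t[2]):
        if current is None or due > current:
            # No existing wakeup falls inside this timer's window:
            # schedule a new one as late as this timer tolerates.
            current = due + tol
            wakeups[current] = []
        wakeups[current].append(name)
    return wakeups


timers = [("a", 100, 50), ("b", 120, 10), ("c", 140, 30), ("d", 300, 0)]
plan = coalesce(timers)
```

Here timers `a` and `b` share one wakeup, so four timers cost three wakeups instead of four. A timer with zero tolerance, like `d`, always fires exactly when due.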
15. Broke apart the Dispatcher Lock
Scheduler Dispatcher lock hottest on server workloads
Lock protects all thread state changes (wait, unwait)
Very hot lock at >64x
Dispatcher lock broken up in Windows 7 / Server 2008 R2
Each object protected by its own lock
Many operations are lock-free
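The effect of replacing one global lock with a lock per object can be shown with a small sketch. It says nothing about the actual dispatcher code; `GlobalLockTable`, `PerObjectLockTable` and `try_unwait` are hypothetical, and a non-blocking acquire stands in for "would this state change have to wait?".

```python
import threading


class GlobalLockTable:
    """All objects share one lock (the pre-Windows 7 shape, in spirit)."""

    def __init__(self, n):
        self._lock = threading.Lock()
        self.state = ["waiting"] * n

    def lock_for(self, i):
        return self._lock


class PerObjectLockTable:
    """Each object is protected by its own lock."""

    def __init__(self, n):
        self._locks = [threading.Lock() for _ in range(n)]
        self.state = ["waiting"] * n

    def lock_for(self, i):
        return self._locks[i]


def try_unwait(table, i):
    """Change object i's state; return False if the lock is contended."""
    lock = table.lock_for(i)
    if not lock.acquire(blocking=False):
        return False
    try:
        table.state[i] = "ready"
        return True
    finally:
        lock.release()


g = GlobalLockTable(2)
with g.lock_for(0):                 # someone is updating object 0
    global_result = try_unwait(g, 1)

p = PerObjectLockTable(2)
with p.lock_for(0):                 # same situation, per-object locks
    per_object_result = try_unwait(p, 1)
```

With the global lock, work on object 1 is held up by an unrelated update to object 0. With per-object locks it proceeds.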
16. Removed PFN Lock
Windows tracks the state of pages in physical memory
In use: in working sets
Not assigned: on paging lists: free, modified, standby, …
Before, all page state changes protected by global PFN (Page Frame Number) lock
As of Windows 7 the PFN lock is gone
Pages are now locked individually
Improves scalability for large memory applications
17. The Silicon Power Wall
The situation:
Power² ∝ Clock frequency
Voltage ∝ Power²
⇨ Clock frequency and Voltage offset each other
Clock frequency inversely proportional to logic path length
Bad News:
Power is about as low as it can go
Logic paths between clocked elements are pretty short
Good News:
Moore’s Law continues (# transistors doubles ~22 months)
All that parallel computational theory is going into practice
Transistors going into more cores, not faster cores!
Software subject to Amdahl’s Law, not Moore’s Law
(or Gustafson’s Law – if my wife can find large enough datasets she cares about)
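The contrast between the two laws can be made numeric. The sketch below uses the standard textbook forms of Amdahl's and Gustafson's laws; the function names and the 95% parallel fraction are illustrative assumptions, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Fixed problem size: the serial part bounds the speedup."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def gustafson_speedup(parallel_fraction, cores):
    """Problem size grows with core count: scaled speedup."""
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction * cores


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, for illustration only
    for n in (2, 8, 64, 1024):
        print(n, round(amdahl_speedup(p, n), 2), round(gustafson_speedup(p, n), 2))
```

For a fixed workload that is 95% parallel, Amdahl's form can never exceed 1 / 0.05 = 20 however many cores are added. Gustafson's form keeps growing, because it assumes the dataset grows with the machine.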
18. Approaches to HW parallelism
Homogeneous
More big superscalar cores
Extend with private (or shared) SIMD engines (SSE on steroids)
(Maybe) not very energy efficient
A few more big cores and lots of smaller, slower, cooler cores
Use SIMD for performance
Shutoff idle small cores for energy efficiency (but leakage?)
Lots of little fully programmable cores, all the same
Nobody has ever gotten this to work – more on this later
Heterogeneous
Programmable Accelerators (e.g. GPUs)
Attach loosely-coupled, specialized (non-x86), energy-efficient cores
Fixed-function Accelerators
Very energy-efficient, device-like computational units for very-specific tasks
19. User Mode Scheduling (UMS)
Improve support for efficient cooperative multithreaded
scheduling of small tasks (over-decomposition)
⇒ Want to schedule tasks in user-mode
⇒ Use NT threads to simulate CPUs, multiplex tasks onto these
threads
When a task calls into the kernel and blocks, the CPU may get
scheduled to a different app
⇒ If a single NT thread per CPU, when it blocks it blocks.
⇒ Could have extra threads, but then kernel and user-mode are
competing to schedule the CPU
Tasks run arbitrary Win32 code (but only x64/IA64)
⇒ Assumes running on an NT thread (TEB, kernel thread)
Used by ConcRT (Visual Studio 2010’s Concurrency Run-Time)
20. Windows 7 User-Mode Scheduling
UMS breaks NT thread into two parts:
UT: user-mode portion (TEB, ustack, registers)
KT: kernel-mode portion (ETHREAD, kstack, registers)
Three key properties:
User-mode scheduler switches UTs w/o ring crossing
KT switch is lazy: at kernel entry (e.g. syscall, pagefault)
CPU returned to user-mode scheduler when KT blocks
KT “returns” to user-mode by queuing completion
User-mode scheduler schedules corresponding UT
(similar to scheduler activations, etc)
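The three properties above can be walked through with a toy simulation. This is not the Windows UMS API: `ToyUms` and its queues are hypothetical, and a "kernel tick" stands in for a blocked KT unblocking and queuing its completion. Each UT is a list of steps, `"run"` for user-mode work and `"block"` for a blocking system call.

```python
from collections import deque


class ToyUms:
    def __init__(self):
        self.ready = deque()        # runnable UTs, owned by the user-mode scheduler
        self.in_kernel = deque()    # UTs whose KT is blocked in the kernel
        self.completion = deque()   # UT completion list, filled by the "kernel"
        self.trace = []

    def add(self, name, steps):
        self.ready.append((name, iter(steps)))

    def kernel_tick(self):
        # One blocked KT unblocks, parks, and queues its UT completion.
        if self.in_kernel:
            self.completion.append(self.in_kernel.popleft())

    def run(self):
        while self.ready or self.in_kernel or self.completion:
            # Scheduler consumes completions and marks those UTs runnable.
            while self.completion:
                self.ready.append(self.completion.popleft())
            if not self.ready:
                self.kernel_tick()
                continue
            name, steps = self.ready.popleft()
            step = next(steps, None)
            if step is None:
                self.trace.append(f"{name}:done")
            elif step == "block":
                # KT blocks: the CPU comes back to the scheduler, not to another app.
                self.trace.append(f"{name}:blocks")
                self.in_kernel.append((name, steps))
            else:
                # Pure user-mode switch between UTs.
                self.trace.append(f"{name}:runs")
                self.ready.append((name, steps))
            self.kernel_tick()
        return self.trace


ums = ToyUms()
ums.add("A", ["run", "block", "run"])
ums.add("B", ["run", "run"])
trace = ums.run()
```

In the resulting trace, `B` runs immediately after `A` blocks, and `A` resumes only after its completion has been consumed by the scheduler.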
21. Normal NT Threading
[Figure: Normal NT threading. User mode: UT0, UT1, UT2. Kernel mode: KT0, KT1, KT2, kernel-mode scheduler, NTOS executive and trap code, on an x86 core.]
NT Thread is Kernel Thread (KT) and User Thread (UT)
UT/KT form a single logical thread representing NT thread in user or kernel
KT: ETHREAD, KSTACK, link to EPROCESS
UT: TEB, USTACK
22. User-Mode Scheduling (UMS)
[Figure: User-Mode Scheduling. User mode: user-mode scheduler, primary thread, UTs (UT0, UT1) and the UT completion list. Kernel mode: KT0, KT1, KT2 (thread parking), trap code and NTOS executive. KT0 blocks.]
Only primary thread runs in user-mode
Trap code switches to parked KT
KT blocks ⇒ primary returns to user-mode
KT unblocks & parks ⇒ queue UT completion
23. UMS
Based on NT threads
⇒ Each NT thread has user & kernel parts (UT & KT)
⇒ When a thread becomes UMS, KT never returns to UT
⇒ (Well, sort of)
⇒ Instead, the primary thread calls the USched
USched
⇒ Switches between UTs, all in user-mode
⇒ When a UT enters kernel and blocks, the primary thread will hand
CPU back to the USched declaring UT blocked
⇒ When UT unblocks, kernel queues notification
⇒ USched consumes notifications, marks UT runnable
Primary Thread
⇒ Self-identified by entering kernel with wrong TEB
⇒ So UTs can migrate between threads
⇒ Affinities of primaries and KTs are orthogonal issues
24. UMS Thread Roles
Primary threads: represent CPUs. Normal app threads enter the USched world and become primaries; primaries can also be created by UScheds to allow parallel execution
Primaries represent concurrent execution
UMS threads (UT/KTs): allow blocking in the kernel without losing the CPU
UMS threads represent concurrent blocking in kernel
25. Thread Scheduling vs UMS
[Figure: Thread scheduling vs UMS. Left panel: Threads 1 to 6 scheduled onto Core 1 and Core 2, with the remaining threads shown as non-running. Right panel: Core 1 and Core 2 with six User Thread / Kernel Thread pairs (1 to 6).]
MarkRuss
26. Win32 compat considerations
Why not Win32 fibers?
TEB issues
⇒ Contains TLS and Win32-specific fields (incl LastError)
⇒ Fibers run on multiple threads, so TEB state doesn’t track
Kernel thread issues
⇒ Visibility to TEB
⇒ I/O is queued to thread
⇒ Mutexes record thread owner
⇒ Impersonation
⇒ Cross-thread operations expect to find threads and IDs
⇒ Win32 code has thread and affinity awareness
27. Futures: Master/Slave UMS?
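[Figure: Master/slave UMS. Remote side (remote x86 or accelerator): remote scheduler running UT0, UT1, UT2. Kernel side (x86 core): KT0, KT1, KT2 (thread parking), trap code, NTOS executive, kernel-mode scheduler. The two sides are connected by a Syscall Request Queue and a Syscall Completion Queue.]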
UTs (can) run on accelerators or x86s
KTs run on x86s, syscalls remoted/batched
Pagefaults are just like syscalls
Accelerator never “loses the CPU” (implicit primary)
28. Operating Systems Futures
Many-core challenge
New driving force in software innovation:
Amdahl’s Law overtakes Moore’s Law as high-order bit
Heterogeneous cores?
OS Scalability
Loosely-coupled OS: mem + cpu + services?
Energy efficiency
Shrink-wrap and Freeze-dry applications?
Hypervisor/Kernel/Runtime relationships
Move kernel scheduling (cpu/memory) into run-times?
Move kernel resource management into Hypervisor?
29. Windows Academic Program
Windows Kernel Internals
Windows kernel in source (Windows Research Kernel – WRK)
Windows kernel in PowerPoint (Curriculum Resource Kit – CRK)
Based on Windows Server 2003 Service Pack 1
Latest kernel at time of release
First kernel release with AMD64 support
Joint program between Windows Product Group and MS Academic Groups
Program directed by Arkady Retik (Need a DVD? Have questions?)
Information available at
http://microsoft.com/WindowsAcademic OR
compsci@microsoft.com
Microsoft Academic Contacts in Buenos Aires
Miguel Saez (masaez@microsoft.com) or
Ezequiel Glinsky (eglinsky@microsoft.com)