The document discusses the goals and architecture of the Windows operating system. It aims to be hardware-portable, software-portable, and high performance. It uses an object-oriented model with handles, security descriptors, and access rights. Key components include processes, threads, memory management, I/O, and structured exception handling.
2. Goals
• Hardware-portable
– Used to support MIPS, PowerPC and Alpha
– Currently supports x86, ia64, and amd64
– Multiple vendors build hardware
• Software-portable
– POSIX, OS/2, and Win32 subsystems
• OS/2 is dead
• POSIX is still supported—separate product
• Lots of Win32 software out there in the world
3. Goals
• High performance
– Anticipated PC speeds approaching those of
minicomputers and mainframes
– Async IO model is standard
– Support for large physical memories
– SMP was an early design goal
– Designed to support multi-threaded processes
– Kernel has to be reentrant
4. Process Model
• Threads and processes are distinct
• Process:
– Address space
– Handle table (handles are roughly analogous to UNIX file descriptors)
– Process default security token
• Thread:
– Execution Context
– Optional thread-specific security token
5. Tokens
• “Who you are”—list of identities
– Each identity is a SID
• Also contains Privileges
– Shutdown, Load drivers, Backup, Debug…
• Can be passed through LPC ports and
named pipe requests
– Server side can use this to selectively
impersonate the client.
6. Object Manager
• Uniform interface to kernel mode objects.
• Handles are 32-bit opaque integers
• Per-process handle table maps handles to
objects and permissions on the objects
• Implements refcount GC
– Pointer count—total number of references
– Handle count—number of open handles
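The two counters can be sketched with a toy model. This is an illustration under stated assumptions, not the Object Manager's code: `ToyObject` and every `toy_*` function are hypothetical names, and the model assumes that opening a handle bumps both counts, so that the pointer count stays the total as described above.

```c
#include <stdbool.h>

/* Toy object header with the two counters from the slide (hypothetical type). */
typedef struct {
    long pointer_count;  /* every reference, including those held by handles */
    long handle_count;   /* open handles only */
    bool destroyed;      /* set once the last reference is dropped */
} ToyObject;

void toy_object_init(ToyObject *o) {
    o->pointer_count = 0;
    o->handle_count = 0;
    o->destroyed = false;
}

/* A reference taken by pointer, with no handle involved. */
void toy_reference(ToyObject *o) { o->pointer_count++; }

/* Drop one reference; the object goes away when none remain. */
void toy_dereference(ToyObject *o) {
    if (--o->pointer_count == 0)
        o->destroyed = true;
}

/* Opening a handle counts as a handle AND as a reference. */
void toy_open_handle(ToyObject *o) {
    o->handle_count++;
    toy_reference(o);
}

/* Closing a handle drops both counts. */
void toy_close_handle(ToyObject *o) {
    o->handle_count--;
    toy_dereference(o);
}
```

In this sketch an object with no open handles survives as long as some pointer reference is still outstanding.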
7. Object Manager
• Implements an object namespace
– Win32 objects are under \BaseNamedObjects
– Devices under \Device
• This includes filesystems
– Drive letters are symbolic links
• \??\C: => the appropriate filesystem device
• Some things have other names
– Processes and threads are opened by
specifying a CID: (Process.Thread)
8. Standard operations on handles
• CloseHandle()
• DuplicateHandle()
– Takes source and destination process
– Very useful for servers
• WaitForSingleObject(),
WaitForMultipleObjects()
– Wait for something to happen
– Can wait on up to 64 handles at once
9. Security Descriptors
• Each object has a Security Descriptor
– Owner—special SID, CREATOR_OWNER
– Group—special SID, CREATOR_GROUP
– DACL
• Discretionary Access Control List
• List of SIDs and granted or denied access rights
– SACL
• System Access Control List
• List of SIDs and access rights to be audited
11. Security Use
• Objects are referred to via handles
• Security checks occur when an object is
opened
– Open requests contain a mask of requested
access rights
– If granted to the token by the DACL, the
handle contains those access rights
• Access rights are checked on use
– Just a bit test—very fast
12. Object Open
evt = OpenEvent(EVENT_MODIFY_STATE,
FALSE, "SomeName");
– Finds the event object by name
– Walks the DACL, looking for token SIDs
– Keeps looking until all permissions are
granted
– If access is granted, inserts a handle to the
object into the process’s handle table, with
EVENT_MODIFY_STATE access
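The DACL walk can be sketched as a loop that clears requested bits as matching allow entries are found. This is a simplified toy, assuming string SIDs and only two ACE types; `ToyAce`, `ToyToken`, and `toy_access_check` are hypothetical names, not Windows APIs, and the real check handles more cases (owner rights, privileges, inherited entries).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int AccessMask;

typedef enum { ACE_ALLOW, ACE_DENY } AceType;

/* One DACL entry: a SID plus the rights it allows or denies (toy layout). */
typedef struct {
    AceType type;
    const char *sid;
    AccessMask mask;
} ToyAce;

/* The caller's identities (toy layout: SIDs as strings). */
typedef struct {
    const char *const *sids;
    size_t sid_count;
} ToyToken;

bool toy_token_has_sid(const ToyToken *t, const char *sid) {
    for (size_t i = 0; i < t->sid_count; i++)
        if (strcmp(t->sids[i], sid) == 0)
            return true;
    return false;
}

/* Walk the DACL in order until every requested bit has been granted. */
bool toy_access_check(const ToyToken *t, const ToyAce *dacl, size_t n,
                      AccessMask desired) {
    AccessMask remaining = desired;
    for (size_t i = 0; i < n && remaining != 0; i++) {
        if (!toy_token_has_sid(t, dacl[i].sid))
            continue;                      /* entry is about someone else */
        if (dacl[i].type == ACE_DENY) {
            if (dacl[i].mask & remaining)
                return false;              /* a still-needed right is denied */
        } else {
            remaining &= ~dacl[i].mask;    /* these rights are now granted */
        }
    }
    return remaining == 0;                 /* granted only if nothing is left */
}
```

Because the walk is ordered, a deny entry only matters if it is reached before the same right has been granted by an earlier allow entry.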
13. Object Use
SetEvent(evt);
– SetEvent() requires EVENT_MODIFY_STATE
access, and an event object.
– The kernel looks up the handle in the
process’s handle table.
– Checks to make sure that it maps to an event
object, and that the granted access bits
contain the EVENT_MODIFY_STATE bit.
– If all is good, the event is set.
14. Object Use
WaitForSingleObject(evt, INFINITE);
– WaitForSingleObject() requires a
synchronization object (like an event) and
SYNCHRONIZE access.
– evt maps to an event object
– SYNCHRONIZE access was not requested
when the handle was inserted.
– Even if the DACL permits it, the wait fails.
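The open-then-use pattern from the last three slides can be modelled in a few lines. This is a sketch, not kernel code: the table layout and all `toy_*` / `Toy*` names are hypothetical, and only the two access-right values mirror the Win32 constants of the same names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same numeric values as the Win32 constants; names prefixed to mark the toy. */
#define TOY_EVENT_MODIFY_STATE 0x0002u
#define TOY_SYNCHRONIZE        0x00100000u

typedef enum { TOY_TYPE_NONE, TOY_TYPE_EVENT, TOY_TYPE_FILE } ToyType;

typedef struct { ToyType type; bool signaled; } ToyKObject;

/* A handle table slot: the object plus the rights granted at open time. */
typedef struct { ToyKObject *object; unsigned granted; } ToyHandleEntry;

#define TOY_TABLE_SIZE 16
typedef struct { ToyHandleEntry slots[TOY_TABLE_SIZE]; } ToyHandleTable;

void toy_table_init(ToyHandleTable *t) {
    for (int i = 0; i < TOY_TABLE_SIZE; i++) {
        t->slots[i].object = NULL;
        t->slots[i].granted = 0;
    }
}

/* Assumes the DACL check already passed. Returns a handle, or 0 if full. */
int toy_insert_handle(ToyHandleTable *t, ToyKObject *obj, unsigned granted) {
    for (int i = 0; i < TOY_TABLE_SIZE; i++) {
        if (t->slots[i].object == NULL) {
            t->slots[i].object = obj;
            t->slots[i].granted = granted;
            return i + 1;
        }
    }
    return 0;
}

/* Use-time check: right object type, and a bit test on the granted mask. */
ToyKObject *toy_reference_by_handle(const ToyHandleTable *t, int h,
                                    ToyType type, unsigned required) {
    if (h <= 0 || h > TOY_TABLE_SIZE)
        return NULL;
    const ToyHandleEntry *e = &t->slots[h - 1];
    if (e->object == NULL || e->object->type != type)
        return NULL;
    if ((e->granted & required) != required)
        return NULL;
    return e->object;
}

bool toy_set_event(ToyHandleTable *t, int h) {
    ToyKObject *o = toy_reference_by_handle(t, h, TOY_TYPE_EVENT,
                                            TOY_EVENT_MODIFY_STATE);
    if (o == NULL)
        return false;
    o->signaled = true;
    return true;
}

/* A wait needs SYNCHRONIZE on the handle, whatever the DACL would allow. */
bool toy_can_wait(const ToyHandleTable *t, int h) {
    return toy_reference_by_handle(t, h, TOY_TYPE_EVENT,
                                   TOY_SYNCHRONIZE) != NULL;
}
```

A handle inserted with only `TOY_EVENT_MODIFY_STATE` can set the event but fails `toy_can_wait`; the DACL is never consulted again after the open.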
15. Types of Objects
• Events
– State is set or clear.
– Can clear when a wait completes (auto-reset)
• Mutexes
– Can be acquired by a single thread at a time.
– Automatically released when the owning thread exits.
• Semaphores
– Maintain a count
– Waits decrement the count
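The wait semantics of events and semaphores can be sketched without any real blocking. The code below is a toy, assuming a non-blocking "try wait" in place of a real wait; `ToyEvent`, `ToySemaphore`, and the `toy_*` functions are hypothetical names.

```c
#include <stdbool.h>

typedef struct { bool signaled; bool auto_reset; } ToyEvent;
typedef struct { long count; long max; } ToySemaphore;

void toy_event_set(ToyEvent *e) { e->signaled = true; }

/* Returns true if a wait would be satisfied right now. */
bool toy_event_try_wait(ToyEvent *e) {
    if (!e->signaled)
        return false;
    if (e->auto_reset)
        e->signaled = false;   /* auto-reset: the satisfied wait clears it */
    return true;
}

/* A satisfied wait on a semaphore takes one unit of the count. */
bool toy_sem_try_wait(ToySemaphore *s) {
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

/* Releasing adds to the count, up to the maximum. */
bool toy_sem_release(ToySemaphore *s, long n) {
    if (n <= 0 || s->count + n > s->max)
        return false;
    s->count += n;
    return true;
}
```

With an auto-reset event, a second try-wait after a single set fails; a manual-reset event keeps satisfying waits until it is explicitly cleared.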
16. More objects
• Threads, Processes, Timers: waitable, like events
• Registry Keys
– Manipulate data in the registry—centralized
store of system configuration info.
• LPC Ports
– Fast local RPC
– Security tokens can transfer over LPC calls
• Files
17. Files & IO
• File objects maintain a current offset, and
a pointer to the underlying stream.
• Default internal model is asynchronous
– Synchronous IO just waits for the IO to
complete
– Async IO can set an event, or run a callback
in the thread which queued the IO, or post a
message to an IO completion port.
• Each request is an IRP
18. IRPs
• Maintain state of IO requests, independent
of the thread working on the IO
• IRPs are handed off through the device
stack to their destinations
– Threads process IRPs
– Initiating thread processes the IRP until a
device returns STATUS_PENDING
– Subsequent processing can be done in kernel
worker threads
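A rough way to picture this: the request carries its own state, so whichever thread picks it up next can finish it. The sketch below is a toy, assuming each layer of the device stack does exactly one of three things; `ToyIrp`, `ToyAction`, and the `toy_*` functions are hypothetical and much simpler than a real IRP.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    TOY_STATUS_SUCCESS,
    TOY_STATUS_PENDING,
    TOY_STATUS_UNHANDLED   /* toy-only: no layer took the request */
} ToyStatus;

/* What a layer in the device stack does with a request. */
typedef enum { TOY_PASS_DOWN, TOY_COMPLETE, TOY_PEND } ToyAction;

/* All request state lives here, not in the thread that issued it. */
typedef struct {
    size_t owner;       /* index of the layer currently holding the request */
    ToyStatus status;
    bool completed;
} ToyIrp;

/* Runs in the initiating thread until a layer completes or pends. */
ToyStatus toy_send_irp(const ToyAction *stack, size_t depth, ToyIrp *irp) {
    irp->completed = false;
    irp->status = TOY_STATUS_UNHANDLED;
    for (irp->owner = 0; irp->owner < depth; irp->owner++) {
        if (stack[irp->owner] == TOY_COMPLETE) {
            irp->completed = true;
            irp->status = TOY_STATUS_SUCCESS;
            return irp->status;
        }
        if (stack[irp->owner] == TOY_PEND) {
            irp->status = TOY_STATUS_PENDING;
            return irp->status;            /* initiating thread is free to go */
        }
        /* TOY_PASS_DOWN: hand the request to the next lower layer */
    }
    return irp->status;
}

/* Stands in for later processing, e.g. from a worker thread. */
bool toy_complete_pending(ToyIrp *irp) {
    if (irp->completed || irp->status != TOY_STATUS_PENDING)
        return false;
    irp->completed = true;
    irp->status = TOY_STATUS_SUCCESS;
    return true;
}
```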
19. Interrupts
IRQL—Interrupt Request Level:
0 => PASSIVE_LEVEL
Processor is running threads
All usermode code is at IRQL 0
1 => APC_LEVEL: threads still run, but APC delivery is disabled
2 => DISPATCH_LEVEL
• Running as the processor, not as a thread: cannot block or wait
• Can’t take a page fault
• Only locks available are KSPIN_LOCKs
20. Interrupts
3-26 => Device Interrupt Service Routines
• Device interrupts are mapped to an IRQL and an
interrupt service routine; ISR is called at that IRQL
27 => PROFILE_LEVEL—profiling
28 => CLOCK2_LEVEL—clock interrupt
29 => IPI_LEVEL—interprocessor interrupt
• Requests another processor to do something
30 => POWER_LEVEL—power failure
31 => HIGH_LEVEL—interrupts disabled
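The level table above can be written down as an enum with a few predicates. This is a sketch using the numbering from these slides; the `TOY_` names and helper functions are hypothetical and are not the WDK definitions.

```c
#include <stdbool.h>

/* IRQL values as listed on the slides (names prefixed to mark the toy). */
typedef enum {
    TOY_PASSIVE_LEVEL  = 0,
    TOY_APC_LEVEL      = 1,
    TOY_DISPATCH_LEVEL = 2,
    TOY_DEVICE_LOW     = 3,   /* first device ISR level */
    TOY_DEVICE_HIGH    = 26,  /* last device ISR level */
    TOY_PROFILE_LEVEL  = 27,
    TOY_CLOCK2_LEVEL   = 28,
    TOY_IPI_LEVEL      = 29,
    TOY_POWER_LEVEL    = 30,
    TOY_HIGH_LEVEL     = 31
} ToyIrql;

/* Page faults cannot be taken at DISPATCH_LEVEL or above. */
bool toy_can_take_page_fault(int irql) {
    return irql < TOY_DISPATCH_LEVEL;
}

/* Levels 3 through 26 belong to device interrupt service routines. */
bool toy_is_device_irql(int irql) {
    return irql >= TOY_DEVICE_LOW && irql <= TOY_DEVICE_HIGH;
}

/* An interrupt is held off while the processor runs at its level or higher. */
bool toy_interrupt_is_masked(int current_irql, int incoming_irql) {
    return incoming_irql <= current_irql;
}
```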
21. Interrupts
• Hardware signals an interrupt
• Interrupt’s ISR runs at device IRQL
– Has to be fast; get off the processor and allow
other ISRs to run
– Typically queues a DPC, acknowledges the
interrupt, and returns
• DPC: Deferred Procedure Call
– Further processing at DISPATCH_LEVEL
– Queues work to kernel worker threads
22. IO Completion
• Driver calls IO Manager to complete the
IRP
• IO Manager queues a kernel mode APC to
the initiating thread
• APC: Asynchronous Procedure Call
– Kernel mode APC preempts thread execution
– Writes data back to user mode in the context
of the thread which initiated the IO
– Signals completion of the IO
23. IO Cache
• Classic: block cache
– Page mappings translate directly to blocks on
the underlying partition.
• Windows: stream cache
– Page mappings are offsets within a stream.
– IO Cache Manager uses the same mappings.
– All cache management (trimming) is
centralized in the memory manager
– All modifications show up in mapped views.
24. Virtual Memory
• Sections—another object type
– Can be created to map a file
– Can also be created off the pagefile
– Optionally named, for shared memory
• Reservation
– Range of VA which will not be handed out for
some other purpose
• Committed
– VA which actually maps to something
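The difference between reserving and committing can be shown with a per-page state array. This is a toy under stated assumptions: 64 pages, reserve-before-commit as a rule of the model, and hypothetical names throughout (`ToyAddressSpace`, `toy_reserve`, `toy_commit`); it is not the memory manager's data structure.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_PAGE_FREE, TOY_PAGE_RESERVED, TOY_PAGE_COMMITTED } ToyPageState;

#define TOY_PAGES 64
typedef struct { ToyPageState pages[TOY_PAGES]; } ToyAddressSpace;

void toy_as_init(ToyAddressSpace *as) {
    for (size_t i = 0; i < TOY_PAGES; i++)
        as->pages[i] = TOY_PAGE_FREE;
}

static bool toy_range_ok(size_t first, size_t count) {
    return count > 0 && first < TOY_PAGES && count <= TOY_PAGES - first;
}

/* Reserve: claim the range so nothing else is handed out there. */
bool toy_reserve(ToyAddressSpace *as, size_t first, size_t count) {
    if (!toy_range_ok(first, count))
        return false;
    for (size_t i = first; i < first + count; i++)
        if (as->pages[i] != TOY_PAGE_FREE)
            return false;
    for (size_t i = first; i < first + count; i++)
        as->pages[i] = TOY_PAGE_RESERVED;
    return true;
}

/* Commit: back part of a reservation with something real. */
bool toy_commit(ToyAddressSpace *as, size_t first, size_t count) {
    if (!toy_range_ok(first, count))
        return false;
    for (size_t i = first; i < first + count; i++)
        if (as->pages[i] == TOY_PAGE_FREE)
            return false;      /* toy rule: must be reserved first */
    for (size_t i = first; i < first + count; i++)
        as->pages[i] = TOY_PAGE_COMMITTED;
    return true;
}

/* Only committed pages map to anything that can be touched. */
bool toy_can_access(const ToyAddressSpace *as, size_t page) {
    return page < TOY_PAGES && as->pages[page] == TOY_PAGE_COMMITTED;
}
```

A large range can be reserved up front and committed a few pages at a time, which keeps the address range contiguous without paying for backing store until it is needed.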
25. Aside: CreateProcess
• Just a user mode Win32 API
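{
NtCreateFile(&file, szImage);   // open the executable image (arguments simplified)
NtCreateSection(&sec, file);    // create a section that maps the image file
NtCreateProcess(&proc, sec);    // create the process object from the section
NtCreateThread(&thrd, proc);    // create the initial thread in the new process
}
WaitForSingleObject(proc);      // the process handle is signaled when the process exits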
26. Virtual Memory
• Memory Manager maintains processor-specific
page table entry mappings.
– Some parts of the address space are shared
between processes—for instance, the kernel’s
address space and the per-session space.
• On a page fault, the memory manager reads in the data
• Pages can be mapped without the
appropriate access… what to do?
27. Signals
• With threads, signals don’t work very well.
• Some software designs expect to touch
inaccessible memory.
– Large structured files
– Concurrent garbage collection
– SLists
• Single global handler has to somehow
know about all possible situations.
28. Structured Exception Handling
• Exceptions unwind the stack
– Almost like C++!
– C++ matches against a type hierarchy
– SEH calls exception filter code—filters are
Turing-complete.
• Two ways to deal with exceptions:
– try/finally
– try/except
31. try/except
• GetExceptionCode()
– A code indicating the cause of the exception
• GetExceptionInformation()
– Additional code-specific info
– The full processor context
• Filter decides what to do
– EXCEPTION_EXECUTE_HANDLER
– EXCEPTION_CONTINUE_SEARCH
– EXCEPTION_CONTINUE_EXECUTION
32. Structured Exception Handling
• On x86, TEB points to stack of
EXCEPTION_REGISTRATION_RECORD
– auto structs, pointing to handler code
– pushed by function prolog
– popped by function epilog
• On exception, RtlDispatchException()
walks the list.
– Runs the filters to figure out what to do
– Calls handler functions
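The filter dispositions and the list walk can be sketched together. This is a toy, assuming one filter per registration record (real x86 frames reach their filters through a per-function handler); `ToyRegistration`, `toy_dispatch`, and the sample filters are hypothetical. The three disposition values and the two exception codes match the Win32 constants they are named after.

```c
#include <stddef.h>

/* Same values as EXCEPTION_EXECUTE_HANDLER / _CONTINUE_SEARCH / _CONTINUE_EXECUTION. */
#define TOY_EXECUTE_HANDLER      1
#define TOY_CONTINUE_SEARCH      0
#define TOY_CONTINUE_EXECUTION (-1)

#define TOY_ACCESS_VIOLATION 0xC0000005ul
#define TOY_DIVIDE_BY_ZERO   0xC0000094ul

typedef int (*ToyFilter)(unsigned long code);

/* One record per active try/except frame, innermost first. */
typedef struct ToyRegistration {
    struct ToyRegistration *next;   /* next outer frame, NULL at the top */
    ToyFilter filter;
} ToyRegistration;

typedef enum { TOY_UNHANDLED, TOY_HANDLED, TOY_RESUMED } ToyOutcome;

/* Walk the records from innermost to outermost, asking each filter. */
ToyOutcome toy_dispatch(ToyRegistration *head, unsigned long code,
                        ToyRegistration **handler_frame) {
    if (handler_frame != NULL)
        *handler_frame = NULL;
    for (ToyRegistration *r = head; r != NULL; r = r->next) {
        int disposition = r->filter(code);
        if (disposition == TOY_EXECUTE_HANDLER) {
            if (handler_frame != NULL)
                *handler_frame = r;     /* unwind to this frame, run its handler */
            return TOY_HANDLED;
        }
        if (disposition == TOY_CONTINUE_EXECUTION)
            return TOY_RESUMED;         /* filter fixed things up; retry */
        /* TOY_CONTINUE_SEARCH: keep walking outward */
    }
    return TOY_UNHANDLED;               /* nobody claimed the exception */
}

/* Sample filter: only interested in access violations. */
int toy_filter_access_violation(unsigned long code) {
    return code == TOY_ACCESS_VIOLATION ? TOY_EXECUTE_HANDLER
                                        : TOY_CONTINUE_SEARCH;
}

/* Sample filter: never interested. */
int toy_filter_never(unsigned long code) {
    (void)code;
    return TOY_CONTINUE_SEARCH;
}
```

Because a filter is ordinary code, it can inspect the exception code before deciding, which is what lets an inner frame decline and an outer frame accept the same exception.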
33. Structured Exception Handling
• On x86, there’s some overhead with
pushing and popping the registration
record
• On ia64, there is no overhead
– Stack traces are reliable
– It’s always possible to look up the handler
• Exception handling is very slow
– Especially on ia64
• Used only for truly exceptional conditions
34. Structured Exception Handling
• Used in kernel mode too!
– Most user mode access will just work
– Still need to validate address ranges & data
– Works great for SMP when another thread
might be in the middle of modifying the
address space
– Expected read exceptions are returned as
status codes from system calls
– Expected writes are returned as SUCCESS
– Unexpected => buggy kernel => blue screen
35. Top-level Exception Filter
• Top frame on each thread defines a
catchall exception filter
• Top-level exception filter:
– Notifies the debugger (if being debugged)
– Launches a just-in-time debugger (if set up)
– Loads faultrep.dll to report the failure
36. Faultrep.dll
• faultrep.dll offers to report the failure back
to Microsoft
• We analyze the failures
– A significant number are recognized instantly;
we can tell the user what happened and how
to fix it.
– The others go through the standard triage
process; developers analyze the dumps and
figure out what happened.
37. OCA
• 67 million machines running XP
• Tens of thousands of drivers
• Over 100 drivers on any given machine
• One bug in one driver => Crash
• A significant number of crashes come
from third-party drivers (some of which
ship on the CD)
• Lots of different problems, though
38. Driver Verifier
• Controlled by verifier.exe
• Uses the special pool for the driver's allocations
– Detects allocation overruns & use after free
• Validates some behaviors
– IRQL—touching paged memory?
– DMA buffers
• Can inject failures—useful for testing
behavior under sub-optimal conditions
39. Stress
• Every night, a couple hundred machines
run stress on the latest build
• Stress exercises filesystems, memory,
GUI, scheduler, etc., trying to uncover
low-memory handling problems and race
conditions
• Every morning, the stress test team
triages failed machines
• Developers debug the failures