The Windows Operating
System
Goals
• Hardware-portable
– Used to support MIPS, PowerPC and Alpha
– Currently supports x86, ia64, and amd64
– Multiple vendors build hardware
• Software-portable
– POSIX, OS/2, and Win32 subsystems
• OS/2 is dead
• POSIX is still supported—separate product
• Lots of Win32 software out there in the world
Goals
• High performance
– Anticipated PC speeds approaching
minicomputers and mainframes
– Async IO model is standard
– Support for large physical memories
– SMP was an early design goal
– Designed to support multi-threaded processes
– Kernel has to be reentrant
Process Model
• Threads and processes are distinct
• Process:
– Address space
– Handle table (handles play the role of Unix file descriptors)
– Process default security token
• Thread:
– Execution Context
– Optional thread-specific security token
Tokens
• “Who you are”—list of identities
– Each identity is a SID
• Also contains Privileges
– Shutdown, Load drivers, Backup, Debug…
• Can be passed through LPC ports and
named pipe requests
– Server side can use this to selectively
impersonate the client.
Object Manager
• Uniform interface to kernel mode objects.
• Handles are 32-bit opaque integers
• Per-process handle table maps handles to
objects and permissions on the objects
• Implements refcount GC
– Pointer count—total number of references
– Handle count—number of open handles
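The two counters can be sketched in portable C. This is a minimal illustration, not the kernel's implementation: DemoObject and the demo_* helpers are hypothetical names, and the sketch assumes that every open handle also holds one pointer reference, so the pointer count is the total.

```c
#include <stdbool.h>

/* Hypothetical object header with the two counters from the slide. */
typedef struct {
    long pointer_count;  /* every reference, including those held by handles */
    long handle_count;   /* open handles only */
    bool destroyed;      /* set once the last reference goes away */
} DemoObject;

static void demo_init(DemoObject *o) {
    o->pointer_count = 0;
    o->handle_count = 0;
    o->destroyed = false;
}

/* Kernel-mode code takes a raw pointer reference. */
static void demo_reference(DemoObject *o) {
    o->pointer_count++;
}

/* Dropping the last reference of any kind destroys the object. */
static void demo_dereference(DemoObject *o) {
    if (--o->pointer_count == 0) {
        o->destroyed = true;
    }
}

/* Assumption: a handle counts once in each counter. */
static void demo_open_handle(DemoObject *o) {
    o->handle_count++;
    demo_reference(o);
}

static void demo_close_handle(DemoObject *o) {
    o->handle_count--;
    demo_dereference(o);
}
```

Under these assumptions, closing the last handle does not destroy an object that kernel code still references by pointer; the object goes away only when the pointer count reaches zero.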
Object Manager
• Implements an object namespace
– Win32 objects are under \BaseNamedObjects
– Devices under \Device
• This includes filesystems
– Drive letters are symbolic links
• \??\C: => the appropriate filesystem device
• Some things have other names
– Processes and threads are opened by
specifying a CID: (Process.Thread)
Standard operations on handles
• CloseHandle()
• DuplicateHandle()
– Takes source and destination process
– Very useful for servers
• WaitForSingleObject(),
WaitForMultipleObjects()
– Wait for something to happen
– Can wait on up to 64 handles at once
Security Descriptors
• Each object has a Security Descriptor
– Owner—special SID, CREATOR_OWNER
– Group—special SID, CREATOR_GROUP
– DACL
• Discretionary Access Control List
• List of SIDs and granted or denied access rights
– SACL
• System Access Control List
• List of SIDs and access rights to be audited
Access Rights
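typedef struct _ACCESS_MASK {
    USHORT SpecificRights;       // bits 0-15: rights defined per object type
    UCHAR  StandardRights;       // bits 16-23: DELETE, READ_CONTROL, ..., SYNCHRONIZE
    UCHAR  AccessSystemAcl : 1;  // bit 24: right to read or write the SACL
    UCHAR  Reserved : 3;         // bits 25-27
    UCHAR  GenericAll : 1;       // bit 28
    UCHAR  GenericExecute : 1;   // bit 29
    UCHAR  GenericWrite : 1;     // bit 30
    UCHAR  GenericRead : 1;      // bit 31
} ACCESS_MASK;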
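In practice an access mask is handled as a plain 32-bit value, and the fields above become bit masks. The sketch below redefines a few well-known constants locally so it compiles without any Windows headers; the helper functions specific_rights() and standard_rights() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Values mirror the documented Win32 constants; redefined here so the
 * sketch builds without windows.h. */
#define SPECIFIC_RIGHTS_ALL    0x0000FFFFu  /* bits 0-15  */
#define STANDARD_RIGHTS_ALL    0x001F0000u  /* bits 16-20 */
#define SYNCHRONIZE            0x00100000u  /* bit 20     */
#define ACCESS_SYSTEM_SECURITY 0x01000000u  /* bit 24     */
#define GENERIC_ALL            0x10000000u  /* bit 28     */
#define GENERIC_EXECUTE        0x20000000u  /* bit 29     */
#define GENERIC_WRITE          0x40000000u  /* bit 30     */
#define GENERIC_READ           0x80000000u  /* bit 31     */

/* Hypothetical helper: the object-type-specific part of a mask. */
static uint32_t specific_rights(uint32_t mask) {
    return mask & SPECIFIC_RIGHTS_ALL;
}

/* Hypothetical helper: the standard-rights part of a mask. */
static uint32_t standard_rights(uint32_t mask) {
    return mask & STANDARD_RIGHTS_ALL;
}
```

For example, a mask of SYNCHRONIZE | 0x0002 splits into the standard right SYNCHRONIZE and the specific right 0x0002, which for an event object is EVENT_MODIFY_STATE.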
Security Use
• Objects are referred to via handles
• Security checks occur when an object is
opened
– Open requests contain a mask of requested
access rights
– If granted to the token by the DACL, the
handle contains those access rights
• Access rights are checked on use
– Just a bit test—very fast
Object Open
evt = OpenEvent(EVENT_MODIFY_STATE,
FALSE, "SomeName");
– Finds the event object by name
– Walks the DACL, looking for token SIDs
– Keeps looking until all permissions are
granted
– If access is granted, inserts a handle to the
object into the process’s handle table, with
EVENT_MODIFY_STATE access
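The DACL walk above can be sketched as a loop over access control entries. This is a simplified model under stated assumptions: SIDs are stand-in integers, the Ace type and dacl_access_check() are hypothetical names, and the sketch assumes a deny entry that matches a still-needed right fails the open immediately.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ACE_ALLOW, ACE_DENY } AceType;

/* Hypothetical access control entry; `sid` is a stand-in for a real SID. */
typedef struct {
    AceType  type;
    int      sid;
    uint32_t mask;
} Ace;

static int token_has_sid(const int *sids, size_t count, int sid) {
    for (size_t i = 0; i < count; i++) {
        if (sids[i] == sid) return 1;
    }
    return 0;
}

/* Returns 1 if every bit in `desired` is granted to the token, else 0.
 * Entries are processed in order and the walk stops as soon as nothing
 * remains to be granted. */
static int dacl_access_check(const Ace *dacl, size_t ace_count,
                             const int *token_sids, size_t sid_count,
                             uint32_t desired) {
    uint32_t remaining = desired;
    for (size_t i = 0; i < ace_count && remaining != 0; i++) {
        if (!token_has_sid(token_sids, sid_count, dacl[i].sid)) {
            continue;                     /* entry is for someone else */
        }
        if (dacl[i].type == ACE_DENY) {
            if (dacl[i].mask & remaining) {
                return 0;                 /* a needed right is denied */
            }
        } else {
            remaining &= ~dacl[i].mask;   /* these rights are now granted */
        }
    }
    return remaining == 0;
}
```

Because the walk is order-sensitive, a deny entry placed ahead of an allow entry wins for the rights it names, which is why deny entries are conventionally listed first.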
Object Use
SetEvent(evt);
– SetEvent() requires EVENT_MODIFY_STATE
access, and an event object.
– The kernel looks up the handle in the
process’s handle table.
– Checks to make sure that it maps to an event
object, and that the granted access bits
contain the EVENT_MODIFY_STATE bit.
– If all is good, the event is set.
Object Use
WaitForSingleObject(evt, INFINITE);
– WaitForSingleObject() requires a
synchronization object (like an event) and
SYNCHRONIZE access.
– evt maps to an event object
– SYNCHRONIZE access was not requested
when the handle was inserted.
– Even if the DACL permits it, the wait fails.
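The SetEvent() and WaitForSingleObject() cases on the last two slides come down to a type check plus a bit test against the access stored with the handle. A minimal sketch, assuming a fixed-size table and hypothetical names (HandleTable, insert_handle, check_handle); the two access constants carry their documented Win32 values.

```c
#include <stdint.h>

#define EVENT_MODIFY_STATE 0x0002u
#define SYNCHRONIZE        0x00100000u

typedef enum { OBJ_NONE = 0, OBJ_EVENT, OBJ_MUTEX } ObjType;

/* Hypothetical handle table entry: object type plus the access rights
 * that were granted when the handle was opened. */
typedef struct {
    ObjType  type;
    uint32_t granted;
} HandleEntry;

#define TABLE_SIZE 16
typedef struct {
    HandleEntry entries[TABLE_SIZE];
} HandleTable;

static void handle_table_init(HandleTable *t) {
    for (int h = 0; h < TABLE_SIZE; h++) {
        t->entries[h].type = OBJ_NONE;
        t->entries[h].granted = 0;
    }
}

/* Open time: record the granted access. Returns the handle or -1. */
static int insert_handle(HandleTable *t, ObjType type, uint32_t granted) {
    for (int h = 0; h < TABLE_SIZE; h++) {
        if (t->entries[h].type == OBJ_NONE) {
            t->entries[h].type = type;
            t->entries[h].granted = granted;
            return h;
        }
    }
    return -1;
}

/* Use time: no DACL walk, just a type check and a bit test. */
static int check_handle(const HandleTable *t, int h,
                        ObjType type, uint32_t required) {
    if (h < 0 || h >= TABLE_SIZE) return 0;
    const HandleEntry *e = &t->entries[h];
    if (e->type != type) return 0;
    return (e->granted & required) == required;
}
```

A handle inserted with only EVENT_MODIFY_STATE passes the check for setting the event and fails the check for SYNCHRONIZE, which is the wait failure described above. The fix is to request EVENT_MODIFY_STATE | SYNCHRONIZE when opening.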
Types of Objects
• Events
– State is set or clear.
– Can clear when a wait completes (auto-reset)
• Mutexes
– Can be acquired by a single thread at a time.
– Automatically released when the owner exits.
• Semaphores
– Maintain a count
– Waits decrement the count
More objects
• Threads, Processes, Timers—like events
• Registry Keys
– Manipulate data in the registry—centralized
store of system configuration info.
• LPC Ports
– Fast local RPC
– Security tokens can transfer over LPC calls
• Files
Files & IO
• File objects maintain a current offset, and
a pointer to the underlying stream.
• Default internal model is asynchronous
– Synchronous IO just waits for the IO to
complete
– Async IO can set an event, or run a callback
in the thread which queued the IO, or post a
message to an IO completion port.
• Each request is an IRP
IRPs
• Maintain state of IO requests, independent
of the thread working on the IO
• IRPs are handed off through the device
stack to their destinations
– Threads process IRPs
– Initiating thread processes the IRP until a
device returns STATUS_PENDING
– Subsequent processing can be done in kernel
worker threads
Interrupts
IRQL—Interrupt Request Level:
0 => PASSIVE_LEVEL
Processor is running threads
All usermode code is at IRQL 0
1 => APC_LEVEL; threads, APCs disabled
2 => DISPATCH_LEVEL
• Running as the processor: can’t stop!
• Can’t take a page fault
• Only locks available are KSPIN_LOCKs
Interrupts
3-26 => Device Interrupt Service Routines
• Device interrupts are mapped to an IRQL and an
interrupt service routine; ISR is called at that IRQL
27 => PROFILE_LEVEL—profiling
28 => CLOCK2_LEVEL—clock interrupt
29 => IPI_LEVEL—interprocessor interrupt
• Requests another processor to do something
30 => POWER_LEVEL—power failure
31 => HIGH_LEVEL—interrupts disabled
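The levels listed on these two slides can be written down as an enum, with the two rules that follow from them. The numeric values are the ones given above; the Irql type and the helper functions are hypothetical names for illustration only.

```c
/* IRQL values as listed on the slides. */
typedef enum {
    PASSIVE_LEVEL  = 0,   /* normal thread execution, all user-mode code */
    APC_LEVEL      = 1,   /* threads run, APC delivery disabled */
    DISPATCH_LEVEL = 2,   /* scheduler level: no waits, no page faults */
    /* 3-26: device interrupt service routines */
    PROFILE_LEVEL  = 27,
    CLOCK2_LEVEL   = 28,
    IPI_LEVEL      = 29,
    POWER_LEVEL    = 30,
    HIGH_LEVEL     = 31   /* all interrupts disabled */
} Irql;

/* Paged memory may only be touched below DISPATCH_LEVEL. */
static int can_take_page_fault(Irql current) {
    return current < DISPATCH_LEVEL;
}

/* An interrupt at or below the processor's current IRQL is held off
 * until the IRQL drops. */
static int is_masked(Irql current, Irql incoming) {
    return incoming <= current;
}

/* Device ISRs occupy the range between DISPATCH_LEVEL and PROFILE_LEVEL. */
static int is_device_irql(int level) {
    return level > DISPATCH_LEVEL && level < PROFILE_LEVEL;
}
```

The second rule is why an ISR has to be short: while it runs at its device IRQL, every interrupt at that level or lower stays pending.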
Interrupts
• Hardware signals an interrupt
• Interrupt’s ISR runs at device IRQL
– Has to be fast; get off the processor and allow
other ISRs to run
– Typically queues a DPC, acknowledges the
interrupt, and returns
• DPC: Deferred Procedure Call
– Further processing at DISPATCH_LEVEL
– Queues work to kernel worker threads
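The split between ISR and DPC can be modeled with a small queue: the ISR only records that work is pending, and the work itself runs later when the queue is drained. This is a single-threaded simulation with hypothetical names (DpcQueue, queue_dpc, drain_dpcs, demo_isr, demo_dpc); it does not use the real kernel DPC routines.

```c
#define DPC_QUEUE_MAX 8

typedef void (*DpcRoutine)(void *context);

typedef struct {
    DpcRoutine routine;
    void      *context;
} Dpc;

/* Hypothetical per-processor DPC queue (fixed-size ring buffer). */
typedef struct {
    Dpc items[DPC_QUEUE_MAX];
    int head;
    int count;
} DpcQueue;

static void dpc_queue_init(DpcQueue *q) {
    q->head = 0;
    q->count = 0;
}

/* Called from the ISR: cheap, just records the deferred work. */
static int queue_dpc(DpcQueue *q, DpcRoutine routine, void *context) {
    if (q->count == DPC_QUEUE_MAX) return 0;
    int tail = (q->head + q->count) % DPC_QUEUE_MAX;
    q->items[tail].routine = routine;
    q->items[tail].context = context;
    q->count++;
    return 1;
}

/* Models the processor dropping to DISPATCH_LEVEL and running the
 * queued routines in order. Returns how many ran. */
static int drain_dpcs(DpcQueue *q) {
    int ran = 0;
    while (q->count > 0) {
        Dpc d = q->items[q->head];
        q->head = (q->head + 1) % DPC_QUEUE_MAX;
        q->count--;
        d.routine(d.context);
        ran++;
    }
    return ran;
}

/* Hypothetical deferred work: count one processed interrupt. */
static void demo_dpc(void *context) {
    int *processed = context;
    (*processed)++;
}

/* Hypothetical ISR: acknowledge the device (omitted), queue, return. */
static void demo_isr(DpcQueue *q, int *processed) {
    queue_dpc(q, demo_dpc, processed);
}
```

Calling demo_isr() leaves the counter unchanged; it moves only once drain_dpcs() runs the queued routines.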
IO Completion
• Driver calls IO Manager to complete the
IRP
• IO Manager queues a kernel mode APC to
the initiating thread
• APC: Asynchronous Procedure Call
– Kernel mode APC preempts thread execution
– Writes data back to user mode in the context
of the thread which initiated the IO
– Signals completion of the IO
IO Cache
• Classic: block cache
– Page mappings translate directly to blocks on
the underlying partition.
• Windows: stream cache
– Page mappings are offsets within a stream.
– IO Cache Manager uses the same mappings.
– All cache management (trimming) is
centralized in the memory manager
– All modifications show up in mapped views.
Virtual Memory
• Sections—another object type
– Can be created to map a file
– Can also be created off the pagefile
– Optionally named, for shared memory
• Reservation
– Range of VA which will not be handed out for
some other purpose
• Committed
– VA which actually maps to something
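The difference between reserved and committed address space can be shown as a per-page state machine. This is a simplified model, not how the memory manager tracks address space: the VaState enum and the demo_* functions are hypothetical, and the sketch assumes a range must be reserved before it can be committed.

```c
#include <stddef.h>

typedef enum { VA_FREE = 0, VA_RESERVED, VA_COMMITTED } VaState;

#define DEMO_PAGES 8

/* Hypothetical address space: one state per page. */
typedef struct {
    VaState pages[DEMO_PAGES];
} DemoAddressSpace;

static void demo_as_init(DemoAddressSpace *as) {
    for (size_t i = 0; i < DEMO_PAGES; i++) as->pages[i] = VA_FREE;
}

static int range_ok(size_t first, size_t count) {
    return count > 0 && first < DEMO_PAGES && count <= DEMO_PAGES - first;
}

/* Reserve: claim the range so nothing else is placed there. */
static int demo_reserve(DemoAddressSpace *as, size_t first, size_t count) {
    if (!range_ok(first, count)) return 0;
    for (size_t i = first; i < first + count; i++) {
        if (as->pages[i] != VA_FREE) return 0;
    }
    for (size_t i = first; i < first + count; i++) {
        as->pages[i] = VA_RESERVED;
    }
    return 1;
}

/* Commit: back already-reserved pages with storage. */
static int demo_commit(DemoAddressSpace *as, size_t first, size_t count) {
    if (!range_ok(first, count)) return 0;
    for (size_t i = first; i < first + count; i++) {
        if (as->pages[i] == VA_FREE) return 0;
    }
    for (size_t i = first; i < first + count; i++) {
        as->pages[i] = VA_COMMITTED;
    }
    return 1;
}

/* Only committed pages map to something and can be touched. */
static int demo_can_touch(const DemoAddressSpace *as, size_t page) {
    return page < DEMO_PAGES && as->pages[page] == VA_COMMITTED;
}
```

Reserving a large range and committing it piece by piece is the usual way to grow a contiguous structure without paying for the whole range up front.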
Aside: CreateProcess
• Just a user mode Win32 API
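{
    // Simplified pseudocode: the real Nt* calls take more parameters
    NtCreateFile(&file, szImage);    // open the executable image
    NtCreateSection(&sec, file);     // create a section that maps it
    NtCreateProcess(&proc, sec);     // new address space backed by the section
    NtCreateThread(&thrd, proc);     // initial thread in the new process
}
WaitForSingleObject(proc, INFINITE); // a process handle is signaled on exit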
Virtual Memory
• Memory Manager maintains processor-
specific page table entry mappings.
– Some parts of the address space are shared
between processes—for instance, the kernel’s
address space and the per-session space.
• On a page fault, the memory manager reads in the data
• Pages can be mapped without the
appropriate access… what to do?
Signals
• With threads, signals don’t work very well.
• Some software designs expect to touch
inaccessible memory.
– Large structured files
– Concurrent garbage collection
– SLists
• Single global handler has to somehow
know about all possible situations.
Structured Exception Handling
• Exceptions unwind the stack
– Almost like C++!
– C++ matches against a type hierarchy
– SEH calls exception filter code—filters are
Turing-complete.
• Two ways to deal with exceptions:
– try/finally
– try/except
try/finally
res = AllocateSomeResource();
try {                              // MSVC spells these __try / __finally
    SomeOperation(res);
} finally {
    if (AbnormalTermination()) {   // nonzero if the try block did not finish normally
        FreeSomeResource(res);
    }
}
return res;
try/except
try {                              // MSVC spells these __try / __except
    SomeOperationWhichMayAV();
} except (Filter(
        GetExceptionCode(),
        GetExceptionInformation())) {
    DoSomethingElse();             // runs only if Filter() returns EXCEPTION_EXECUTE_HANDLER
}
try/except
• GetExceptionCode()
– A code indicating the cause of the exception
• GetExceptionInformation()
– Additional code-specific info
– The full processor context
• Filter decides what to do
– EXCEPTION_EXECUTE_HANDLER
– EXCEPTION_CONTINUE_SEARCH
– EXCEPTION_CONTINUE_EXECUTION
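A filter is ordinary code that maps an exception to one of these three dispositions. The sketch below is portable C rather than real SEH: the constants carry their documented values but are redefined locally, and demo_filter() with its can_repair parameter is a hypothetical example of the decision a filter makes.

```c
#include <stdint.h>

/* Documented values, redefined so the sketch builds without windows.h. */
#define EXCEPTION_EXECUTE_HANDLER      1
#define EXCEPTION_CONTINUE_SEARCH      0
#define EXCEPTION_CONTINUE_EXECUTION (-1)

#define EXCEPTION_ACCESS_VIOLATION 0xC0000005u

/* Hypothetical filter: only access violations are of interest.
 * If the fault can be repaired (for instance by committing the page
 * that was touched), resume the faulting instruction; otherwise run
 * the handler block. Anything else is left to outer frames. */
static int demo_filter(uint32_t code, int can_repair) {
    if (code != EXCEPTION_ACCESS_VIOLATION) {
        return EXCEPTION_CONTINUE_SEARCH;
    }
    return can_repair ? EXCEPTION_CONTINUE_EXECUTION
                      : EXCEPTION_EXECUTE_HANDLER;
}
```

The EXCEPTION_CONTINUE_EXECUTION path is what makes designs that deliberately touch inaccessible memory workable: the filter fixes the mapping and the faulting instruction is retried.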
Structured Exception Handling
• On x86, TEB points to stack of
EXCEPTION_REGISTRATION_RECORD
– auto structs, pointing to handler code
– pushed by function prolog
– popped by function epilog
• On exception, RtlDispatchException()
walks the list.
– Runs the filters to figure out what to do
– Calls handler functions
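The list walk can be sketched as a linked list of records searched from the innermost frame outward. This is a simplified model with hypothetical names (Registration, demo_dispatch and the two filters): it collapses the handler and filter into a single callback and leaves out unwinding, which the real dispatcher performs before running the chosen handler.

```c
#include <stddef.h>

#define EXCEPTION_EXECUTE_HANDLER      1
#define EXCEPTION_CONTINUE_SEARCH      0
#define EXCEPTION_CONTINUE_EXECUTION (-1)

typedef int (*FilterFn)(unsigned code);

/* Hypothetical registration record: `next` points at the enclosing
 * (outer) frame's record, as with records pushed on the stack. */
typedef struct Registration {
    struct Registration *next;
    FilterFn             filter;
} Registration;

/* Walks from the innermost record outward. Returns the record whose
 * filter chose to handle the exception, or NULL if none did. *resumed
 * is set to 1 if a filter asked to continue execution instead. */
static const Registration *demo_dispatch(const Registration *head,
                                         unsigned code, int *resumed) {
    *resumed = 0;
    for (const Registration *r = head; r != NULL; r = r->next) {
        int disposition = r->filter(code);
        if (disposition == EXCEPTION_EXECUTE_HANDLER) {
            return r;
        }
        if (disposition == EXCEPTION_CONTINUE_EXECUTION) {
            *resumed = 1;
            return NULL;
        }
        /* EXCEPTION_CONTINUE_SEARCH: try the next outer frame */
    }
    return NULL;
}

/* Hypothetical filters used to exercise the walk. */
static int filter_av_only(unsigned code) {
    return code == 0xC0000005u ? EXCEPTION_EXECUTE_HANDLER
                               : EXCEPTION_CONTINUE_SEARCH;
}

static int filter_pass(unsigned code) {
    (void)code;
    return EXCEPTION_CONTINUE_SEARCH;
}
```

With an inner frame using filter_pass and an outer frame using filter_av_only, an access violation is claimed by the outer record, while any other code falls off the end of the list. In a real process that last case reaches the top-level filter.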
Structured Exception Handling
• On x86, there’s some overhead with
pushing and popping the registration
record
• On ia64, there is no overhead
– Stack traces are reliable
– It’s always possible to look up the handler
• Exception handling is very slow
– Especially on ia64
• Used only for truly exceptional conditions
Structured Exception Handling
• Used in kernel mode too!
– Most user mode access will just work
– Still need to validate address ranges & data
– Works great for SMP when another thread
might be in the middle of modifying the
address space
– Expected read exceptions are returned as
status codes from system calls
– Expected writes are returned as SUCCESS
– Unexpected => buggy kernel => blue screen
Top-level Exception Filter
• Top frame on each thread defines a
catchall exception filter
• Top-level exception filter:
– Notifies the debugger (if being debugged)
– Launches a just-in-time debugger (if set up)
– Loads faultrep.dll to report the failure
Faultrep.dll
• faultrep.dll offers to report the failure back
to Microsoft
• We analyze the failures
– A significant number are recognized instantly;
we can tell the user what happened and how
to fix it.
– The others go through the standard triage
process; developers analyze the dumps and
figure out what happened.
OCA
• 67 million machines running XP
• Tens of thousands of drivers
• Over 100 drivers on any given machine
• One bug in one driver => Crash
• A significant number of crashes come
from third-party drivers (some of which
ship on the CD)
• Lots of different problems, though
Driver Verifier
• Controlled by verifier.exe
• Routes allocations through special pool
– Detects allocation overruns & use after free
• Validates some behaviors
– IRQL—touching paged memory?
– DMA buffers
• Can inject failures—useful for testing
behavior under sub-optimal conditions
Stress
• Every night, a couple hundred machines
run stress on the latest build
• Stress exercises filesystems, memory,
GUI, scheduler, etc., trying to uncover
low-memory handling problems and race
conditions
• Every morning, the stress test team
triages failed machines
• Developers debug the failures
Questions?
