Win32 API Interceptor
A Microsoft Windows API function call interception application
Final Report
Win32API Interceptor Final Report
Table of contents
Abstract
Introduction
COM
DllInjectionAppLoader
InterceptLogger
Known Issues
Appendices
Microsoft Research “Detours”
Abstract
This project introduces a novel approach to intercepting Win32API function calls.
It is based on a Microsoft Research technology called Detours.
The final product of this project is an MS Access-based application that logs all the
Win32API function calls that are issued by an application of the user's choice.
Introduction
Innovative systems research hinges on the ability to easily instrument and extend
existing operating system and application functionality. With access to appropriate
source code, it is often trivial to insert new instrumentation or extensions by
rebuilding the OS or application. However, in today’s world of commercial software,
researchers seldom have access to all relevant source code.
In this project we use "Detours", which is a library for instrumenting arbitrary Win32
functions on x86 machines. Detours intercepts Win32 functions by re-writing target
function images.
While prior researchers have used binary rewriting to insert debugging and profiling
instrumentation, to our knowledge, Detours is the first package on any platform to
logically preserve the un-instrumented target function (callable through a trampoline)
as a subroutine for use by the instrumentation. This unique trampoline design
is crucial for extending existing binary software.
Since the project’s scope is bounded by an academic course, we did not implement a
complete solution. We concentrated mainly on understanding the technologies
involved and on producing a working prototype of an infrastructure that intercepts
Windows API functions (for NT-family OSs).
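This illustration shows how the memory looks right before we resume the thread
(step 13):
[Illustration: the memory of "Our Process" (left) and of the "New Spawned (Soon to be
intercepted) Process" (right), each drawn from address 0xFF….F at the top down to
0x00….0 at the bottom.]
Our Process holds the local "struct Code { …. } code;" variable, the UINT32 nCodeBase
value and the CreateProcess() (Suspended) call.
The New Spawned Process holds, from high addresses to low:
    Old ESP
    DLL-name (copy)      \
    JMP Old(EIP)          |  Copy of the Code structure
    CALL LoadLibrary      |
    PUSH DLL-name        /   New EIP = Old(ESP) - sizeof(Code)
    New ESP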
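The layout in the illustration can be sketched in portable C++ as follows. This is a simplified illustration, not the code from creatwth.cpp: it emits only the three instructions shown in the illustration (the real InjectLibrary may emit more), it assumes nCodeBase is Old(ESP) - sizeof(Code) as the illustration suggests, and buildStub, emitRel32, put32 and Stub are hypothetical names introduced here. The opcodes 0x68 (PUSH imm32), 0xE8 (CALL rel32) and 0xE9 (JMP rel32) are the standard 32-bit x86 encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simplified stand-in for the report's Code structure: machine code first,
// then the DLL name. Field sizes follow the InjectLibrary excerpt.
struct Code {
    std::uint8_t rbCode[128];
    char         szLibFile[512];
};

// Result of building the stub locally, before it is copied to the new process.
struct Stub {
    Code          code;
    std::uint32_t nCodeBase;  // address of the copy in the target (the new EIP)
    std::uint32_t cbCode;     // number of code bytes generated
};

// Store a 32-bit value in little-endian order, as x86 expects.
static void put32(std::uint8_t* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) p[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Emit a 5-byte "opcode rel32" instruction (CALL = 0xE8, JMP = 0xE9).
// The displacement is relative to the address of the NEXT instruction.
static std::uint8_t* emitRel32(std::uint8_t* p, std::uint8_t opcode,
                               std::uint32_t at, std::uint32_t target) {
    p[0] = opcode;
    put32(p + 1, target - (at + 5));
    return p + 5;
}

Stub buildStub(std::uint32_t oldEsp, std::uint32_t oldEip,
               std::uint32_t pfLoadLibrary, const char* dllName) {
    Stub s{};
    assert(std::strlen(dllName) < sizeof(s.code.szLibFile));
    std::strcpy(s.code.szLibFile, dllName);

    // Assumption: the copy lives just below the old stack pointer.
    s.nCodeBase = oldEsp - static_cast<std::uint32_t>(sizeof(Code));

    std::uint8_t* p = s.code.rbCode;
    // PUSH imm32: address of the DLL name inside the copied structure.
    p[0] = 0x68;
    put32(p + 1, s.nCodeBase +
                     static_cast<std::uint32_t>(offsetof(Code, szLibFile)));
    p += 5;
    // CALL LoadLibrary, then JMP back to where the thread was about to start.
    p = emitRel32(p, 0xE8, s.nCodeBase + 5, pfLoadLibrary);
    p = emitRel32(p, 0xE9, s.nCodeBase + 10, oldEip);

    s.cbCode = static_cast<std::uint32_t>(p - s.code.rbCode);
    return s;
}
```

In the real loader the resulting structure would then be written into the suspended process at nCodeBase and the thread context updated; those steps need the Win32 API and are left out of this sketch.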
Detouring a function
Now, for the real thing. In this section we'll describe how the Detours mechanism
works and how it was incorporated into our project.
In general, Detours accomplishes this task by changing the application's machine
code after it has been loaded into memory, so that instead of running the real
function's code, execution jumps to the detouring code.
We say "jumps" because this is precisely how Detours does the trick: it takes the
first 5 bytes of the function you want to detour (assuming it has at least 5 bytes;
this is the biggest restriction of this method) and copies them into a
"Trampoline function". In place of those 5 bytes it writes an unconditional jump
to the "Detour function" (see below).
The "Detour function" is a separate block of code that contains the user's
interception code (anything the user wants to do before the real function
runs), followed by a call to the "Trampoline function".
As you might recall, the "Trampoline function" includes the 5 bytes that were taken
from the original function. At the end of the "Trampoline function", Detours appends
an unconditional jump to the rest of the original function's code.
And now for the unwinding: when the original function reaches its end it "returns" to
its caller, and that would be… the "Detour function": since the "Trampoline
function" used an unconditional jump, the return address on the stack is that of the
"Detour function".
When the "Detour function" completes, it "returns" to… the function that called
the original function to begin with (and not to the original function itself since, as you
recall, we added an unconditional jump to the "Detour function").
That’s it!
Easy, huh? Well, we're well aware that the explanation above is a "bit" obscure. To
remedy this, here is a diagram that illustrates the idea.
The diagram below is based on a diagram from a PowerPoint presentation that is
included in the Detours archive file, which can be downloaded from the web.
Before Detours:
    1. Call:    Caller -> Target
    2. Return:  Target -> Caller
After Detours:
    1. Call:    Caller -> Target
    2. Jump:    Target -> Detour
    3. Call:    Detour -> Trampoline
    4. Jump:    Trampoline -> Target+5
    5. Return:  Target -> Detour
    6. Return:  Detour -> Caller

Target (before):                  Target (after):
    push ebp      [1 byte]            jmp Detour    [5 bytes]
    mov ebp,esp   [2 bytes]
    push ebx      [1 byte]
    push esi      [1 byte]
    push edi                          push edi
    ....                              ....

Detour:
    ....
    ...Your code...
    Call Trampoline
    ...More of your code...

Trampoline:
    push ebp
    mov ebp,esp
    push ebx
    push esi
    jmp Target+5
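The byte-level effect of the diagram can be sketched on a simulated address space. This is a portable illustration under stated assumptions, not Detours code: memory is a plain byte vector whose index plays the role of the address, the caller supplies the number of bytes to move, and writeJmp and insertDetour are hypothetical names. The only x86 fact relied on is that "jmp rel32" is opcode 0xE9 followed by a 32-bit displacement measured from the end of the jump instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// A flat, simulated 32-bit address space: index == address.
using Memory = std::vector<std::uint8_t>;

// Write "jmp rel32" (opcode 0xE9) at address `at`, jumping to `dest`.
void writeJmp(Memory& mem, std::uint32_t at, std::uint32_t dest) {
    assert(at + 5 <= mem.size());
    std::uint32_t rel = dest - (at + 5);  // relative to the next instruction
    mem[at] = 0xE9;
    for (int i = 0; i < 4; ++i)
        mem[at + 1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
}

// Move the first `cb` bytes of the target into the trampoline, make the
// trampoline continue at target+cb, and redirect the target to the detour.
// The target and trampoline regions are assumed not to overlap.
void insertDetour(Memory& mem, std::uint32_t target, std::uint32_t trampoline,
                  std::uint32_t detour, std::uint32_t cb) {
    assert(cb >= 5);  // the jump written over the target needs 5 bytes
    std::memcpy(&mem[trampoline], &mem[target], cb);
    writeJmp(mem, trampoline + cb, target + cb);
    writeJmp(mem, target, detour);
}
```

For the prologue in the diagram (push ebp, mov ebp,esp, push ebx, push esi: bytes 55 8B EC 53 56), calling insertDetour with cb = 5 leaves "push edi" untouched at Target+5, which is exactly where the trampoline's final jump lands.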
Now that we understand how the mechanism works, we need to understand
how to create the "Detour functions" and how to connect them to the "Original
functions" we want to detour.
The Detours library comes with code that creates this connection: given a
function you want to detour and a "Detour function" that contains the code
you want to inject, it instruments the "Original function" as described above.
Instrumenting a function, which is done in the DLL that we described in the
section above, is divided into two steps:
1. Create a "Trampoline function" and store in it the address of the original
function. (Done at compile time.)
For this task Detours provides a C macro:
"DETOUR_TRAMPOLINE(<Trampoline function signature>, <Original
function's name>)". This macro generates the code of a Trampoline function and
stores the address of the original function in it.
The macro can be found in the "Detours.h" file.
2. Connect the Trampoline with the Detour function and the original function,
using the address of the original function that is stored in the Trampoline.
(Done at run time.) This is done with the function:
"DetourFunctionWithTrampoline(<Trampoline function name>, <Detour
function name>)". The Trampoline function name is the name that was
declared in the macro in step 1, and the Detour function name is the name
of the function that you want to be called instead of the original
function.
To sum up: for every "Original function" that you want to detour, you invoke the
"DETOUR_TRAMPOLINE" macro with the original function and the signature of the
trampoline function (which should be the same as the signature of the original
function). Then you add a call to the "DetourFunctionWithTrampoline" function,
which binds the Trampoline to your function, the Detour function, in which you can
add the code that you want to run before the call to the original function.
Do not forget one important thing: the Detour function you write should call
the Trampoline function. (This function, as you may recall, holds the first few
instructions of the original function and a jump to the rest of the function.)
Strictly speaking, you don't have to call the Trampoline function. If you don't call
it, there is no run-time error; when the Detour function finishes, it simply returns
to the calling function without running any of the original function's code. This, in
fact, is a way to replace the original function with your own implementation.
Furthermore, you can add code after the call to the Trampoline function (in the
Detour function), and that code will run after the original implementation.
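The two ways of writing a Detour function described above can be mimicked without Detours at all. In the sketch below, plain function pointers stand in for the patched entry point and for the Trampoline; Target, WrappingDetour, ReplacingDetour, Entry and g_log are hypothetical names chosen for the example, and no code is actually rewritten in memory.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_log;  // records the order of execution

// Stand-in for the original API function.
int Target(int x) { g_log.push_back("target"); return x * 2; }

// In this simulation the trampoline is just a pointer to the original code;
// with Detours it would be the generated Trampoline function.
int (*Trampoline)(int) = Target;

// Pattern 1: run code before and after the original implementation.
int WrappingDetour(int x) {
    g_log.push_back("before");
    int rv = Trampoline(x);      // runs the original function
    g_log.push_back("after");    // runs after the original has returned
    return rv;
}

// Pattern 2: never call the trampoline, which replaces the original.
int ReplacingDetour(int) {
    g_log.push_back("replacement");
    return -1;
}

// Simulated call site: callers reach whatever the "patched" entry points to.
int (*Entry)(int) = Target;
```

Pointing Entry at WrappingDetour and calling Entry(21) records "before", "target", "after" in that order and still returns the original result; pointing it at ReplacingDetour skips the original function entirely.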
Since we wrap all this code in a DLL binary, we want the instrumentation to happen in
the DllMain() function, when it is called with the "reason" parameter set to
"DLL_PROCESS_ATTACH". (To be specific, we want the calls to
"DetourFunctionWithTrampoline" to be made from DllMain().)
This ensures that once the LoadLibrary() call has been injected into the
instrumented process (as we saw in the first section), the calls to
"DetourFunctionWithTrampoline" run as soon as the process
resumes execution.
COM
Describing the COM technology is way beyond the scope of this document. Furthermore,
COM is only used as a by-product; it is not the main technology used in the
project.
Nevertheless, we will briefly describe how this technology helped us in the
project.
COM is a way to share objects created in one language with code written in another.
It wraps the object in a binary capsule that can be used from several languages.
COM is more complicated than this description suggests, and there is more to it
than what we stated.
The main reason we used COM objects is that we had C/C++ code that
implemented the Detours functionality, while the application we wrote was based on
Visual Basic for Applications.
Both C++ and VB handle COM objects, so this was a good way to run the needed
functionality from within the VB application's memory space and not as separate
processes.
In the following sections we'll describe the two COM objects we created.
DllInjectionAppLoader
This COM object spawns a new process running the requested application and
injects into it a DLL that includes the function-instrumentation code (as described
in the Microsoft Research "Detours" technology section).
InterceptLogger
This COM object logs the Win32API functions that are called by the
application that was spawned with the injected DLL.
This object implements the IInterceptLogger interface.
IInterceptLogger declares the following methods and properties:
HRESULT StartLogging(
[in] VARIANT_BOOL bBlocking,
[in] BSTR ODBC_DSN);
This method opens the OS pipe and logs every message from it to the database.
bBlocking should be true if and only if you want the call to block. This is
useful if you write a VB script that plays the role of the listener.
ODBC_DSN is the name of the ODBC DSN that the logger should write to.
HRESULT Shutdown();
This method should be called when you want to close the connection to the
database.
After calling this method, no logging messages are inserted into the
database until StartLogging() is called again.
HRESULT IsConnected([out, retval] VARIANT_BOOL* pVal);
This property returns true if and only if both of the following conditions hold:
o StartLogging() was called previously and the connection was
made successfully
o Shutdown() was not called after StartLogging() succeeded
HRESULT AddFunctionToFilter([in] BSTR FunctionName);
This method adds the function name that is passed in the parameter
FunctionName to a list of filtered functions.
HRESULT SetFilterType([in] long FilterType);
This method sets the way the InterceptLogger object filters the messages
according to the function names that were added by the AddFunctionToFilter()
method.
FilterType can be any one of the following values:
o 0 – No filtering is done; the functions in the filter list are ignored
o 1 – Log if and only if the log message is of a function that exists in the
filter list
o 2 – Log if and only if the log message is of a function that does not exist
in the filter list
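The three FilterType values amount to a small decision rule. The sketch below shows that rule in plain C++; it is not the COM object's implementation. LogFilter, ShouldLog and the enumerator names are hypothetical, and std::string stands in for BSTR.

```cpp
#include <cassert>
#include <set>
#include <string>

// Values follow the FilterType description above; the enumerator names are ours.
enum FilterType { kNoFilter = 0, kLogOnlyListed = 1, kLogAllButListed = 2 };

class LogFilter {
public:
    void AddFunctionToFilter(const std::string& name) { names_.insert(name); }

    void SetFilterType(long type) {
        assert(type >= 0 && type <= 2);
        type_ = static_cast<FilterType>(type);
    }

    // Decide whether a log message for `functionName` should be stored.
    bool ShouldLog(const std::string& functionName) const {
        const bool listed = names_.count(functionName) != 0;
        switch (type_) {
            case kLogOnlyListed:   return listed;   // 1: only listed functions
            case kLogAllButListed: return !listed;  // 2: everything but the list
            default:               return true;     // 0: the list is ignored
        }
    }

private:
    std::set<std::string> names_;
    FilterType type_ = kNoFilter;
};
```

With "GetLocaleInfoW" in the list, type 1 logs only GetLocaleInfoW, type 2 logs everything except it, and type 0 logs every call regardless of the list.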
The Solution
Architecture
In this section we'll describe the pieces that build our solution, what technology we
used for each of them and how we put all the pieces together.
The following diagram displays the big picture.
[Diagram. Components: InterceptLogger (COM Object); DllInjectionAppLoader (COM
Object); OS Pipe; TraceAPI.DLL (Binary DLL with the Detours functions); Spawned
process (with the TraceAPI.DLL injected). Arrow labels: "3a. Spawn", "3b. Inject".]
Actually all the elements that are displayed in the diagram above were described
throughout the document.
Win32API Interceptor
This is an MS Access-based application that runs the show.
As soon as the database is opened, a form appears. Using this form the user can
choose which application to intercept and can change the filter that controls
which functions are logged.
All the logged data is displayed on the screen, and there is a graph that displays the
most frequently called functions. (See the MS Access sub-section in the Technologies
section.)
InterceptLogger
This is a COM object that contains the code that extracts the logging data from
the \\.\pipe\Win32APIInterceptor OS pipe and inserts it into the database.
(See the COM sub-section in the Technologies section.)
DllInjectionAppLoader
This, too, is a COM object. It spawns the user-selected application, which is
instrumented with the TraceAPI.DLL detour functions. (See the COM sub-section in
the Technologies section.)
\\.\pipe\Win32APIInterceptor
This is the name of the Windows pipe that the TraceAPI.DLL functions send logging
data to and that the InterceptLogger extracts the logging data from.
This is how the spawned process communicates with the Win32API Interceptor
application.
TraceAPI.DLL
This DLL file includes the detour functions and the code that changes the Win32
API functions so that they first detour through our code, which sends logging data
via the pipe. (See the Microsoft Research "Detours" technology section.)
Spawned Process
This is the process whose Win32API function calls the user wants to log.
It is spawned by the DllInjectionAppLoader COM object, which injects the
TraceAPI.DLL into its memory space. Every call to a Win32API function that is
issued by this application/process is posted to the \\.\pipe\Win32APIInterceptor pipe.
The message is then extracted from the pipe by the InterceptLogger and
stored by it in the Win32API Interceptor database.
Code snippets
#define DETOUR_TRAMPOLINE(trampoline,target) \
static PVOID __fastcall _Detours_GetVA_##target(VOID) \
{ \
return &target; \
} \
\
__declspec(naked) trampoline \
{ \
__asm { nop };\
__asm { nop };\
__asm { call _Detours_GetVA_##target };\
__asm { jmp eax };\
__asm { ret };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
__asm { nop };\
}
DetourGenMovEax function
This function generates, at the given pointer, machine code that will be interpreted
as a "MOV EAX, nValue" assembly instruction.
There are many other functions of this sort, so we show only this one as a teaser.
The following code was excerpted from the Detours.h file.
DetourFunctionWithTrampolineEx function
This function changes the original function so that it jumps to the detour
function, and fills in the trampoline. (See the Detouring a function section.)
The following code was excerpted from the Detours.cpp file.
BOOL WINAPI DetourFunctionWithTrampolineEx(PBYTE pbTrampoline,
PBYTE pbDetour,
PBYTE *ppbRealTrampoline,
PBYTE *ppbRealTarget)
{
PBYTE pbTarget = NULL;
return FALSE;
}
*ppbRealTarget = pbTarget;
// Kfir: this function will copy the code from the original
// code to the trampoline and add
// the needed jmp opcodes in the trampoline and in the
// original function
// pbTarget is the pointer to the ORIGINAL FUNCTION
// pbDetour is the pointer to the DETOUR FUNCTION (the
// code you want to run before the original one)
// pbTrampoline is the pointer to where we will dump the
// first 5 bytes of the original code.
return detour_insert_detour(pbTarget, pbTrampoline, pbDetour);
}
detour_insert_detour function
This is an inner function that is used by the DetourFunctionWithTrampolineEx
function. (See where this function was called, at the end of the function)
The following code was excerpted from the Detours.cpp file.
// Kfir: this function will copy the code from the original code to
// the trampoline and add
// the needed jmp opcodes in the trampoline and in the original
// function
// pbTarget is the pointer to the ORIGINAL FUNCTION
// pbDetour is the pointer to the DETOUR FUNCTION (the code you
// want to run before the original one)
// pbTrampoline is the pointer to where we will dump the first
// 5 bytes of the original code.
static BOOL detour_insert_detour(PBYTE pbTarget,
PBYTE pbTrampoline,
PBYTE pbDetour)
{
PBYTE pbCont = pbTarget;
//Kfir: First we want to *check* what kind of instructions exist at
// the beginning of a function.
// Generally we want to remove at least 5 bytes
// (SIZE_OF_TRP_OPS) but we
// don't want to break an instruction in the middle (I don't
// know how they drew the line exactly,
// but some opcodes need to be glued together and some don't
// need to be,
// so if the first one moves so will the rest)
for (LONG cbTarget = 0; cbTarget < SIZE_OF_TRP_OPS;) {
PBYTE pbOp = pbCont;
BYTE bOp = *pbOp;
pbCont = DetourCopyInstruction(NULL, pbCont, NULL);
cbTarget = pbCont - pbTarget;
if (bOp == OP_JMP ||
bOp == OP_JMP_EAX ||
bOp == OP_RET_POP ||
bOp == OP_RET) {
break;
}
if (bOp == OP_PREFIX && pbOp[1] == OP_JMP_SEG) {
break;
}
if ((bOp == OP_PRE_ES ||
bOp == OP_PRE_CS ||
bOp == OP_PRE_SS ||
bOp == OP_PRE_DS ||
bOp == OP_PRE_FS ||
bOp == OP_PRE_GS) &&
pbOp[1] == OP_PREFIX &&
pbOp[2] == OP_JMP_SEG) {
break;
}
} // Kfir: End of FOR!!!
CDetourEnableWriteOnCodePage ewTrampoline(pbTrampoline,
DETOUR_TRAMPOLINE_SIZE);
CDetourEnableWriteOnCodePage ewTarget(pbTarget, cbTarget);
if (!ewTrampoline.SetPermission(PAGE_EXECUTE_READWRITE))
return FALSE;
if (!ewTarget.IsValid())
return FALSE;
pbTrampoline[DETOUR_TRAMPOLINE_SIZE-1] = (BYTE)cbTarget;
return TRUE;
}
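The loop at the top of detour_insert_detour decides how many bytes to move to the trampoline. Its rule (take whole instructions until at least 5 bytes are covered, stop early at a jmp or ret) can be sketched on its own. In the sketch below the instruction lengths are supplied by the caller, whereas Detours obtains them from DetourCopyInstruction; Insn, kJmpSize and bytesToRelocate are hypothetical names, and returning 0 for "cannot be detoured" is a convention of this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One decoded instruction at the start of the target function.
struct Insn {
    std::size_t length;    // size of the instruction in bytes
    bool        endsFlow;  // true for jmp/ret-style instructions
};

const std::size_t kJmpSize = 5;  // bytes needed for "jmp rel32"

// Returns how many bytes must move to the trampoline, or 0 if fewer than
// kJmpSize bytes are available and the function cannot be detoured.
std::size_t bytesToRelocate(const std::vector<Insn>& prologue) {
    std::size_t cb = 0;
    for (const Insn& in : prologue) {
        if (cb >= kJmpSize) break;  // enough room for the jump already
        assert(in.length > 0);
        cb += in.length;            // always take whole instructions
        if (in.endsFlow) break;     // nothing usable follows a jmp/ret
    }
    return cb >= kJmpSize ? cb : 0;
}
```

For the prologue used earlier in this report (instruction lengths 1, 2, 1, 1) the result is exactly 5; a prologue of lengths 3 and 4 yields 7, because the second instruction cannot be split.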
BOOL rv = 0;
__try {
// Kfir: Here we call the trampoline function that will
InjectLibrary function
This function injects a LoadLibrary() function call into a process.
It is used to inject the TraceAPI.dll file into the process whose Win32API calls the
user asked to intercept. (See the Loading a DLL into a process’s context section for
more details.)
The following code was excerpted from the creatwth.cpp file.
static BOOL InjectLibrary(HANDLE hProcess,
HANDLE hThread,
PBYTE pfLoadLibrary,
PBYTE pbData,
DWORD cbData)
{
BOOL fSucceeded = FALSE;
DWORD nProtect = 0;
DWORD nWritten = 0;
CONTEXT cxt;
UINT32 nCodeBase;
PBYTE pbCode;
struct Code
{
BYTE rbCode[128];
union
{
WCHAR wzLibFile[512];
CHAR szLibFile[512];
};
} code;
//Kfir: suspend the thread so we can change its context & stack
//Kfir: hThread is the main running thread
SuspendThread(hThread);
ZeroMemory(&cxt, sizeof(cxt));
cxt.ContextFlags = CONTEXT_FULL;
pbCode = code.rbCode;
if (pbData) {
CopyMemory(code.szLibFile, pbData, cbData);
//Kfir: probably "DetourGenPush" adds a "push" opcode on
// the stack and the
//Kfir: address of the Dll Name afterwards
pbCode = DetourGenPush(pbCode, nCodeBase +
offsetof(Code, szLibFile));
//Kfir: probably adds a "call" opcode to the
// 'LoadLibrary' function located at
//Kfir: the kernel dll (uses internal GetLoadLibraryA())
pbCode = DetourGenCall(pbCode, pfLoadLibrary,
(PBYTE)nCodeBase + (pbCode -
code.rbCode));
}
fSucceeded = TRUE;
finish:
System Requirements
1. x86 version of Windows NT, Windows 2000 or Windows XP
2. Microsoft Visual C++ .Net 2003
3. Microsoft Office Access 2003
4. Administrator permissions
Now open the ODBC Data Source Administrator and select the User DSN section (illustration
4.2). Then press the Add button, select Driver do Microsoft Access (*.mdb) and press the
Finish button. The ODBC Microsoft Access Setup screen appears. Type
Win32APIInterceptor in the Data Source Name field, press the Select button and select the
Win32APIInterceptor.mdb file from the Win32APIInterceptor directory. Press OK.
Once the ODBC definition is complete, you must compile the source files. These files can be
found on the project’s website.
Compilation
To compile the application, open the Win32APIInterceptor -> Win32APIIntercept
directory, then open the Win32APIIntercept.sln file with Microsoft Visual C++ .NET.
illustration 4.3
The next step is to build the solution. Select the Rebuild Solution option from the
Build menu as shown on the previous illustration and wait until the build process
completes.
GUI guide
Main Screen
To start Win32APIInterceptor, open the Win32APIInterceptor.mdb file. The main
window appears, as follows:
illustration 4.4
Illustration 4.5
Log section
All intercepted API calls for a certain application are listed in this section. There is a
separate record for each API function, which consists of a function name, call time,
return value and the first five arguments (at most).
While intercepting, all these details are saved in a database. In order to display them
on the screen, one must press the Refresh button.
There is a record counter at the bottom of the section.
illustration 4.6
Tools section
The user can change some interception properties:
• To clear the database, use the Clear Database button. After the database is
cleared, the Log list is refreshed and all the data is removed from it.
• In the Filters section the user can manage the Log list by selecting the
functions to display (or those to screen out). The default
condition is Unfiltered, which means that all API functions are displayed.
To display only certain functions, select the Wanted
option and add all of the desired functions to the list below. To block some
functions from being displayed, select the Unwanted option and add those
functions to the list in the same way.
The function list can be managed according to the user’s needs (add/remove
functions, clear the list). All changes in the Tools section can be made
dynamically, without stopping the interception process.
illustration 4.7
illustration 4.8
Hands-on example
Let’s examine the API calls that occur while running the command line interpreter
(cmd.exe, in our case).
1. Open Win32APIInterceptor and press the Clear Database button for your own
convenience. (This deletes the saved information from the database and
removes all the records from the Log section.)
2. Type cmd in the Executable location field, and then press the Start
Interception button.
3. As you can see, nothing happens and the Log section remains clear. The
reason is that we didn’t press the Refresh button: the information about
API calls is stored in the database, but is not shown on the screen. Press the
Refresh button and the functions' details will appear. Wait a few seconds until all
API calls have occurred. (You can press the Refresh button at the end of the
interception to ensure that all functions are displayed.)
4. Press the Stop Interception button.
5. The counter at the bottom of the Log section shows that 282 API calls have
occurred. We can see that the most frequently called function was
GetLocaleInfoW (138 times, according to the chart).
6. Clear the function list by pressing the Clear button in the Filters section. Add
the GetLocaleInfoW function to the list and select the Unwanted option.
7. This will cause the interceptor not to display the GetLocaleInfoW function in
the Log section.
8. Repeat steps 1-3.
9. The GetLocaleInfoW function doesn’t appear in the Log section, and the function
counter equals 144. As you can see, 144 + 138 = 282.
10. Note that instead of using the Refresh button, you can use the AutoRefresh
option.
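The numbers in this example can be reproduced with a small tally over a list of logged function names. The sketch below is only an illustration of the counting behind the record counter and the chart; it does not read the real database, and countCalls and topFunction are hypothetical helpers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Count how often each function name appears in the log.
std::map<std::string, int> countCalls(const std::vector<std::string>& log) {
    std::map<std::string, int> counts;
    for (const std::string& name : log) ++counts[name];
    return counts;
}

// Return the most frequently called function together with its count.
std::pair<std::string, int> topFunction(const std::map<std::string, int>& counts) {
    assert(!counts.empty());
    std::pair<std::string, int> best = *counts.begin();
    for (const auto& kv : counts)
        if (kv.second > best.second) best = kv;
    return best;
}
```

Applied to a log of 282 records in which GetLocaleInfoW appears 138 times, topFunction reports GetLocaleInfoW with a count of 138, and filtering that function out leaves 282 - 138 = 144 records, matching step 9.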
Known Issues
• Building the project depends on the MS Visual Studio 2003 (.Net) IDE; there is
no makefile that builds the whole project.
There is a makefile that builds everything except the COM objects.
Appendices
Microsoft Research “Detours”
;; Target Function                  ;; Target Function
…                                   …
TargetFunction:                     TargetFunction:
    push ebp                            jmp DetourFunction
    mov ebp,esp
    push ebx
    push esi                        TargetFunction+5:
    push edi                            push edi
…                                   …
;; Trampoline                       ;; Trampoline
…                                   …
TrampolineFunction:                 TrampolineFunction:
    jmp TargetFunction                  push ebp
…                                       mov ebp,esp
                                        push ebx
                                        push esi
                                        jmp TargetFunction+5
                                    …
Figure 2. Trampoline and target functions, before and after insertion of the detour
(left and right).
Figure 2 shows the insertion of a detour. To detour a target function, Detours first
allocates memory for the dynamic trampoline function (if no static trampoline is
provided) and then enables write access to both the target and the trampoline.
Starting with the first instruction, Detours copies instructions from the target to
the trampoline until at least 5 bytes have been copied (enough for an unconditional
jump instruction). If the target function is fewer than 5 bytes, Detours aborts and
returns an error code. To copy instructions, Detours uses a simple table-driven
disassembler. Detours adds a jump instruction from the end of the trampoline to the
first non-copied instruction of the target function. Detours writes an unconditional
jump instruction to the detour function as the first instruction of the target
function. To finish, Detours restores the original page permissions on both the target
and trampoline functions and flushes the CPU instruction cache with a call to
FlushInstructionCache.
Payloads and DLL Import Editing
While a number of tools exist for editing binary files [10, 12, 13, 17], most systems
research doesn’t require such heavy-handed access to binary files. Instead, it is often
sufficient to add an extra DLL or data segment to an application or system binary file.
In addition to detour functions, the Detours library also contains fully reversible
support for attaching arbitrary data segments, called payloads, to Win32 binary files
and for editing DLL import tables.
Figure 3 shows the basic structure of a Win32 Portable Executable (PE) binary file.
The PE format for Win32 binaries is an extension of COFF (the Common Object File
Format). A Win32 binary consists of a DOS compatible header, a PE header, a text
section containing program code, a data section containing initialized data, an import
table listing any imported DLLs and functions, an export table listing functions
exported by the code, and debug symbols. With the exception of the two headers, each of
the other sections of the file is optional and may not exist in a given binary.
Start of File
    DOS Header
    PE (w/COFF) Header
    .text Section: Program Code
    .data Section: Initialized Data
    .idata Section: Import Table
    .edata Section: Export Table
    Debug Symbols
End of File
Figure 3. Format of a Win32 PE binary file.
To modify a Win32 binary, Detours creates a new .detours section between the export
table and the debug symbols. Note that debug symbols must always reside last in a Win32
binary. The new section contains a detours header record and a copy of the original PE
header. If modifying the import table, Detours creates the new import table, appends it
to the copied PE header, then modifies the original PE header to point to the new
import table. Finally, Detours writes any user payloads at the end of the .detours
section and appends the debug symbols to finish the file. Detours can reverse
modifications to the Win32 binary by restoring the original PE header from the .detours
section and removing the .detours section. Figure 4 shows the format of a
Detours-modified Win32 binary.
Creating a new import table serves two purposes. First, it preserves the original
import table in case the programmer needs to reverse all modifications to the Win32
file.
Second, the new import table can contain must include the detours.h header file and
renamed import DLLs and functions or link with the detours.lib library.
entirely new DLLs and functions. For
#include <windows.h>
example, Coign [7] uses Detours to insert an #include <detours.h>
initial entry for coignrte.dll into each VOID (*DynamicTrampoline)(VOID) = NULL;
instrumented application. As the first entry in DETOUR_TRAMPOLINE(
the applications import table, VOID WINAPI SleepTrampoline(DWORD),
Sleep
coignrte.dll always is the first DLL to );
run in the application’s address space. VOID WINAPI SleepDetour(DWORD dw)
{
Start of File return SleepTrampoline(dw);
DOS Header
}
PE (w/COFF) Header
VOID DynamicDetour(VOID)
.text Section
{
Program Code
return DynamicTrampoline();
.data Section }
Initialized Data void main(void)
{
.idata Section VOID (*DynamicTarget)(VOID) = SomeFunction;
unused Import Table
DynamicTrampoline
.edata Section =(FUNCPTR)DetourFunction(
Export Table (PBYTE)DynamicTarget,
(PBYTE)DynamicDetour);
.detours Section
detour header DetourFunctionWithTrampoline(
original PE header (PBYTE)SleepTrampoline,
new import table (PBYTE)SleepDetour);
user payloads
// Execute the remainder of program.
Debug Symbols
DetourRemoveTrampoline(SleepTrampoline);
End of File DetourRemoveTrampoline(DynamicTrampoline);
}
Figure 4. Format of a Detours-modified
binary file.
Figure 5. Sample Instrumentation Program.
Detours provides functions for editing import tables, adding payloads, enumerating
payloads, removing payloads, and rebinding binary files. Detours also provides routines
for enumerating the binary files mapped into an address space and locating payloads
within those mapped binaries. Each payload is identified by a 128-bit globally unique
identifier (GUID). Coign uses Detours to attach per-application configuration data to
application binaries.
In cases where instrumentation must be inserted into an application without modifying
binary files, Detours provides functions to inject a DLL into either a new or an existing
process. To inject a DLL, Detours writes a LoadLibrary call into the target process with
the VirtualAllocEx and WriteProcessMemory APIs, then invokes the call with the
CreateRemoteThread API.
Using Detours
The code fragment in Figure 5 illustrates the usage of the Detours library. User code
Trampolines may be created either statically or dynamically. To intercept a target
function with a static trampoline, the application must create the trampoline with the
DETOUR_TRAMPOLINE macro. DETOUR_TRAMPOLINE takes two arguments: the prototype for the
static trampoline and the name of the target function.
Note that for proper interception the prototype, target, trampoline, and detour functions
must all have exactly the same call signature, including number of arguments and calling
convention. It is the responsibility of the detour function to copy arguments when
invoking the target function through the trampoline. This is intuitive, as the target
function is just a subroutine callable by the detour function.
Using the same calling convention ensures that registers will be properly preserved and
that the stack will be properly aligned between detour and target functions.
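The relationship between target, trampoline, and detour can be illustrated without the
Detours library at all. The following is a minimal sketch in portable C, not Detours
code: it models the patched entry point of the target as a function-pointer slot, and
every name in it (sim_sleep, sleep_entry, sleep_trampoline, sleep_detour, install_detour,
remove_detour, app_sleep) is hypothetical. What it is meant to show is that the three
functions share a single prototype and that the detour forwards its argument through
the trampoline.

```c
#include <stddef.h>

/* Shared prototype: target, trampoline, and detour must all match it. */
typedef unsigned (*sleep_fn)(unsigned millis);

static unsigned total_slept;   /* side effect of the (hypothetical) target */
static unsigned detour_calls;  /* instrumentation added by the detour      */

/* Hypothetical target, standing in for an API such as Sleep. */
unsigned sim_sleep(unsigned millis)
{
    total_slept += millis;
    return millis;
}

/* Callers reach the target through this slot. Detours itself rewrites the
 * target's first instructions; the slot only models the resulting control
 * transfer (an assumption made to keep the sketch portable). */
static sleep_fn sleep_entry = sim_sleep;

/* Saved route to the original target, playing the trampoline's role. */
static sleep_fn sleep_trampoline = NULL;

/* Detour: same signature as the target, forwards its argument unchanged. */
unsigned sleep_detour(unsigned millis)
{
    detour_calls++;
    return sleep_trampoline(millis);
}

/* Enable interception (no effect if it is already enabled). */
void install_detour(void)
{
    if (sleep_trampoline == NULL) {
        sleep_trampoline = sleep_entry;
        sleep_entry = sleep_detour;
    }
}

/* Disable interception (no effect if it is not enabled). */
void remove_detour(void)
{
    if (sleep_trampoline != NULL) {
        sleep_entry = sleep_trampoline;
        sleep_trampoline = NULL;
    }
}

/* What application code calls. */
unsigned app_sleep(unsigned millis)
{
    return sleep_entry(millis);
}
```

After install_detour(), a call to app_sleep(5) runs sleep_detour first and still reaches
sim_sleep through the saved pointer. After remove_detour(), calls go straight to the
target again. If the detour's prototype differed from the target's, the forwarding call
would no longer be well formed, which is the point of the signature rule above.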
Interception of the target function is enabled by invoking the
DetourFunctionWithTrampoline function with two arguments: the trampoline and the pointer
to the detour function. The target function is not given as an argument because it is
already encoded in the trampoline.
A dynamic trampoline is created by calling DetourFunction with two arguments: a pointer
to the target function and a pointer to the detour function. DetourFunction allocates a
new trampoline and inserts the appropriate interception code in the target function.
Static trampolines are extremely easy to use when the target function is available as a
link symbol. When the target function is not available for linking, a dynamic trampoline
can be used. Often a function pointer to the target function can be acquired from a
second function. For those times when a pointer to the target function is not readily
available, DetourFindFunction can find the pointer to a function when it is either
exported from a known DLL or when debugging symbols are available for the target
function's binary (see footnote 1). DetourFindFunction accepts two arguments: the name of
the binary and the name of the function. DetourFindFunction returns either a valid
pointer to the function or NULL if the symbol for the function could not be found.
DetourFindFunction first attempts to locate the function using the Win32 LoadLibrary and
GetProcAddress APIs. If the function is not found in the export table of the DLL,
DetourFindFunction uses the ImageHlp library to search available debugging symbols. The
function pointer returned by DetourFindFunction can be given to DetourFunction to create
a dynamic trampoline.
Interception of a target function can be removed by invoking the DetourRemoveTrampoline
function.
Note that because the functions in the Detours library modify code in the application
address space, it is the programmer's responsibility to ensure that no other threads are
executing in the address space while a detour is inserted or removed. An easy way to
ensure single-threaded execution is to call functions in the Detours library from a
DllMain routine.
Evaluation
Several alternative techniques exist for intercepting function calls. Alternative
interception techniques include:
Call replacement in application source code. Calls to the target function are replaced
with calls to the detour function by modifying application source code. The major
drawback of this technique is that it requires access to source code.
Call replacement in application binary code. Calls to the target function are replaced
with calls to the detour function by modifying application binaries. While this technique
does not require source code, replacement in the application binary does require the
ability to identify all applicable call sites. This requires substantial symbolic
information that is not generally available for binary software.
DLL redirection. If the target function resides in a DLL, the DLL import entries in
the binary can be modified to point to a detour
DLL. Redirection to the detour DLL can be
achieved by either replacing the name of the
original DLL in the import table before load
time or replacing the function addresses in the
indirect import jump table after load [2].
Unfortunately, redirecting to the detour DLL
through the import table fails to intercept DLL
internal calls and calls on pointers obtained
from the LoadLibrary and
GetProcAddress APIs early in an
application's execution.
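The limitation of import-table redirection can be made concrete with a small model. The
sketch below is portable C and does not touch a real PE import table. The import_table
array, lookup_export (a stand-in for GetProcAddress), and all other names are
hypothetical. It is meant to show that rewriting an import entry catches calls made
through that entry, but misses DLL-internal calls and calls through a pointer taken
directly from the export lookup.

```c
#include <stddef.h>
#include <string.h>

typedef int (*api_fn)(int);

static int detour_hits;  /* calls observed by the detour */

/* "Original DLL": an exported function and an internal caller of it. */
int real_api(int x) { return x + 1; }
int dll_internal_caller(int x) { return real_api(x) * 2; }

/* "Detour DLL": records the call, then forwards to the original. */
int detour_api(int x)
{
    detour_hits++;
    return real_api(x);
}

/* Application import table: calls bound at link time go through it. */
struct import_entry { const char *name; api_fn addr; };
static struct import_entry import_table[] = { { "api", real_api } };

/* Stand-in for GetProcAddress: consults the DLL's exports, not the
 * application's import table. */
api_fn lookup_export(const char *name)
{
    return strcmp(name, "api") == 0 ? real_api : NULL;
}

/* Redirect one import entry; returns 1 if the name was found, else 0. */
int redirect_import(const char *name, api_fn replacement)
{
    size_t i;
    for (i = 0; i < sizeof import_table / sizeof import_table[0]; i++) {
        if (strcmp(import_table[i].name, name) == 0) {
            import_table[i].addr = replacement;
            return 1;
        }
    }
    return 0;
}

/* A call site in the application that uses the import table. */
int call_via_import(int x)
{
    return import_table[0].addr(x);
}
```

After redirect_import("api", detour_api), call_via_import reaches detour_api, while
dll_internal_caller and any pointer returned by lookup_export still reach real_api
without being counted.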
Footnote 1: Microsoft ships debugging symbols for the entire Windows NT operating system
as part of the retail release. These symbols can be found in the \support\symbols
directory on the OS distribution media.
Breakpoint trapping. Rather than replace the DLL, the target function can be intercepted
by inserting a debugging breakpoint into the target function. The debugging exception
handler can then invoke the detour function. The major drawback to breakpoint trapping is
that debugging exceptions suspend all application threads. In addition, the debug
exception must be caught in a second operating-system process. Interception via
breakpoint trapping has a high performance penalty.
Table 1 lists times for intercepting either an empty function or the CoCreateInstance
API. Times are on a 200 MHz Pentium Pro. Rows list the time to invoke the functions
without interception, with interception through call replacement, with interception
through DLL redirection, with interception using the Detours library, or with
interception through breakpoint trapping. As can be seen, function interception with the
Detours library has only minimal overhead (less than 400 ns in either case).

                            Intercepted Function
Interception Technique      Empty Function      CoCreateInstance
Direct                        0.113 µs            14.836 µs
Call Replacement              0.143 µs            15.193 µs
DLL Redirection               0.143 µs            15.193 µs
Detours Library               0.145 µs            15.194 µs
Breakpoint Trap             229.564 µs           265.851 µs

Table 1. Comparison of Interception Techniques.
Experience
The Detours package has been used extensively in Microsoft Research over the last two
years to instrument and extend Win32 applications and the Windows NT operating system.
Detours was originally developed for the Coign Automatic Distributed Partitioning System
[7]. Coign converts local desktop applications built from COM components into distributed
client-server applications. During profiling, Coign uses Detours to intercept calls to
COM instantiation functions such as CoCreateInstance. The detour functions invoke the
original library functions through trampolines, then wrap output interface pointers in an
additional instrumentation layer (for more details see [8]). The instrumentation layer
measures inter-component communication to determine how application components should be
partitioned across a network. During distributed executions, new Coign detour functions
intercept calls to COM instantiation functions and re-route those calls to distributed
machines. In essence, Coign extends the COM library to support intelligent remote
invocation. Whereas DCOM supports remote invocation of a few COM instantiation functions,
Coign supports remote invocation for approximately 50 COM functions through detour
extensions. Coign uses Detours' DLL redirection functions to attach a runtime loader and
the payload functions to attach profiling data to application binaries.
Our colleagues have used Detours to instrument the user-mode portion of the DCOM protocol
stack, including marshaling proxies, DCOM runtime, RPC runtime, WinSock runtime, and
marshaling stubs [11]. The resultant detailed analysis was then used to drive a
re-architecture of DCOM for fast user-mode networks. While they could have used source
code modifications to produce a special profiling version of DCOM, the source-based
instrumentation would have been version dependent and shared by all DCOM applications on
the profiling machine. With binary instrumentation based on Detours, the profiling tool
can be attached to any Windows NT 4 build of DCOM and only affects the process being
profiled.
In another extension exercise, Detours was used to create a thunking layer for COP (the
Component-based Operating System Proxy) [14]. COP is a COM-based version of the Win32
API. COP-aware applications access operating system functionality through COM interfaces,
such as IWin32FileHandle. Because the COP interfaces are distributable with DCOM, a COP
application can use OS resources, including file systems, keyboards, mice, displays,
registries, etc., from any machine in a network. To provide support for legacy
applications, COP uses detour functions to intercept all application calls to the Win32
APIs. Native application API calls are converted to calls on COP interfaces. At the
bottom, the COP implementation communicates with the underlying operating system through
trampoline functions. COP requires no modifications to application binaries. At load
time, the COP DLL is injected into the application's address space with Detours'
injection functions. Through its
simple interception, Detours has facilitated this massive extension of the Win32 API.
Finally, to support Software Distributed Shared Memory (SDSM) systems, we have
implemented a first-chance exception filter for Win32 structured exception handling. The
Win32 API contains an API, SetUnhandledExceptionFilter, through which an application can
specify an exception filter to execute should no other filter handle an application
exception. For applications such as SDSM systems, the programmer would like to insert a
first-chance exception filter to remove page faults caused by the SDSM's manipulation of
VM page permissions. Windows NT does not provide such a first-chance exception filter
mechanism. A simple detour intercepts the exception entry point from kernel mode to user
mode (KiUserExceptionDispatcher). With only a few lines of code, the detour function
calls a user-provided first-chance exception filter and then forwards the exception, if
unhandled, to the default exception mechanism through a trampoline.
Related Work
Detours are an extension of the general technique of code patching. To intercept
execution, an unconditional branch or jump is inserted into the desired point of
interception in the target function. Code overwritten by the unconditional branch is
moved to a code patch. The code patch consists of either the instrumentation code or a
call to the instrumentation code, followed by the instructions moved to insert the
unconditional branch and a jump to the first instruction in the target function after the
unconditional branch. Logically, a code patch can be prepended to the beginning of a
function, inserted at some arbitrary point in a function, or appended to the end of a
function.
Whereas a code patch invokes instrumentation then continues the target function, our
technique transfers control completely to the detour function, which can invoke the
original target function through the trampoline at its leisure. The trampoline gives
instrumentation complete freedom to invoke the semantics of the original function as a
callable subroutine at any time.
Techniques for code patching have existed since the dawn of digital computing [3-5, 9,
15]. Code patching has been applied to insert debugging or profiling code. In the distant
past, code patching was generally considered to be a much more practical update method
than re-compiling the entire application. In addition to debugging and profiling, Detours
has also been used to resourcefully extend the functionality of existing systems [7, 14].
While recent systems have extended code patching to parallel applications [1] and system
kernels [16], Detours is to our knowledge the only code patching system that preserves
the semantics of the target function as a callable subroutine. The detour function
replaces the target function, but can invoke its functionality at any point through the
trampoline. Our unique trampoline design makes it trivial to extend the functionality of
existing binary functions.
Recent research has produced a class of detailed binary rewriting tools including Atom
[13], Etch [12], EEL [10], and Morph [17]. In general, these tools take as input an
application binary and an instrumentation script. The instrumentation script passes over
the binary inserting code between instructions, basic blocks, or functions. The output of
the script is a new, instrumented binary. In a departure from earlier systems, DyninstAPI
[6] can modify applications dynamically.
Detours' primary advantage over detailed binary rewriters is its size. Detours adds less
than 18KB to an instrumentation package, whereas detailed binary rewriters add at least a
few hundred KB. The cost of Detours' small size is an inability to insert code between
instructions or basic blocks. Detailed binary rewriters can insert instrumentation around
any instruction through sophisticated features such as free register discovery. Detours
relies on adherence to calling conventions in order to preserve register values. While
detailed binary rewriters support insertion of code before or after any basic instruction
unit, they do not preserve the semantics of the uninstrumented target function as a
callable subroutine.
Conclusions
The Detours library provides an important set of tools to the arsenal of the systems
researcher. Detour functions are fast, flexible, and friendly. A detour of the
CoCreateInstance function has less than a 3% overhead, which is an order of magnitude
smaller than the penalty for breakpoint trapping. The Detours library is very small. The
runtime consists of less than 40KB of compiled code, although typically less than 18KB of
code is added to the user's instrumentation.
Unlike DLL redirection, the Detours library intercepts both statically and dynamically
bound invocations. Finally, the Detours library is much more flexible than DLL
redirection or application code modification. Interception of any function can be
selectively enabled or disabled for each process individually at execution time.
Our unique trampoline preserves the semantics of the original, uninstrumented target
function for use as a subroutine of the detour function. Using detour functions and
trampolines, it is trivial to produce compelling system extensions without access to
system source code and without recompiling the underlying binary files. Detours makes
possible a whole new generation of innovative systems research on the Windows NT
platform.
Bibliography
[1] Aral, Ziya, Illya Gertner, and Greg Schaffer. Efficient Debugging Primitives for
Multiprocessors. Proceedings of the Third International Conference on Architectural
Support for Programming Languages and Operating Systems, pp. 87-95. Boston, MA, April
1989.
[2] Balzer, Robert and Neil Goldman. Mediating Connectors. Proceedings of the 19th IEEE
International Conference on Distributed Computing Systems Workshop, pp. 73-77. Austin,
TX, June 1999.
[3] Digital Equipment Corporation. DDT Reference Manual, 1972.
[9] Kessler, Peter. Fast Breakpoints: Design and Implementation. Proceedings of the ACM
SIGPLAN '90 Conference on Programming Language Design and Implementation, pp. 78-84.
White Plains, NY, June 1990.
[10] Larus, James R. and Eric Schnarr. EEL: Machine-Independent Executable Editing.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and
Implementation, pp. 291-300. La Jolla, CA, June 1995.
[11] Li, Li, Alessandro Forin, Galen Hunt, and Yi-Min Wang. High-Performance Distributed
Objects over a System Area Network. Proceedings of the Third USENIX NT Symposium.
Seattle, WA, July 1999.
[12] Romer, Ted, Geoff Voelker, Dennis Lee, Alec Wolman, Wayne Wong, Hank Levy, Brian
Bershad, and J. Bradley Chen. Instrumentation and Optimization of Win32/Intel Executables
Using Etch. Proceedings of the USENIX Windows NT Workshop 1997, pp. 1-7. Seattle, WA,
August 1997. USENIX.
[13] Srivastava, Amitabh and Alan Eustace. ATOM: A System for Building Customized Program
Analysis Tools. Proceedings of the SIGPLAN '94 Conference on Programming Language Design
and Implementation, pp. 196-205. Orlando, FL, June 1994.
[14] Stets, Robert J., Galen C. Hunt, and Michael L. Scott. Component-based Operating
System APIs: A Versioning and Distributed Resource Solution. IEEE Computer, 32(7), July
1999.
[15] Stockham, T.G. and J.B. Dennis. FLIT- Flexowriter Interrogation Tape: A Symbolic
Utility Program for the TX-0. Department of Electrical Engineering, MIT, Cambridge, MA,
Memo 5001-23, July 1960.
[16] Tamches, Ariel and Barton P. Miller. Fine-Grained Dynamic Instrumentation of
Commodity Operating System Kernels. Proceedings of the Third Symposium on Operating
Systems Design and Implementation (OSDI '99), pp. 117-130. New Orleans, LA, February
1999. USENIX.
[17] Zhang, Xiaolan, Zheng Wang, Nicholas Gloy, J. Bradley Chen, and Michael D. Smith.
System Support for Automated Profiling and Optimization. Proceedings of the Sixteenth ACM
Symposium on Operating System Principles. Saint-Malo, France, October 1997.