DTrace Internals x86
Invalid operation causes #UD trap
How SDT works
• Source code: name of the statically-defined probe (see the macro expansion sketch below)
DTRACE_PROBE4(squeue__enqueuechain, squeue_t *, sqp, \
mblk_t *, mp, mblk_t *, tail, int, cnt); \
• ELF object:
Symbol Table Section: .symtab
index value size type bind oth ver shndx name
[2561] 0x000be0d1 0x000000000965 FUNC GLOB D 0 .text squeue_enter_chain
Relocation Section: .rela.text
  type           offset     addend               section      with respect to
  R_AMD64_PC32   0xbe283    0xfffffffffffffffc   .rela.text   __dtrace_probe_squeue__enqueuechain
                                                              (= squeue_enter_chain+0x1b2)
Relocation hook
• code in object file:
squeue_enter_chain+0x1b1: e8 00 00 00 00 call <...>
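For reference, in kernel code <sys/sdt.h> expands DTRACE_PROBE4() roughly as sketched below (an approximation, not the verbatim header); this is where the call to the undefined __dtrace_probe_* routine, and hence the relocation entry above, comes from.

/*
 * Approximate shape of the kernel DTRACE_PROBE4() macro: it declares and
 * calls an undefined function named after the probe, so the compiler emits
 * an ordinary direct call (e8 + 4-byte displacement) that krtld later
 * resolves.
 */
#define DTRACE_PROBE4(name, type1, arg1, type2, arg2, type3, arg3, \
    type4, arg4) { \
        extern void __dtrace_probe_##name(uintptr_t, uintptr_t, \
            uintptr_t, uintptr_t); \
        __dtrace_probe_##name((uintptr_t)(arg1), (uintptr_t)(arg2), \
            (uintptr_t)(arg3), (uintptr_t)(arg4)); \
}

For squeue__enqueuechain this becomes a call to __dtrace_probe_squeue__enqueuechain, matching the relocation shown above.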
How SDT works (continued)
• Executable file doesn't match running binary:

  Code offset                  executable file contents     running code
  [ ... ]                      [ ... ]                      [ ... ]
  squeue_enter_chain+0x1b1:    call <__dtrace_probe_...>    nop
  squeue_enter_chain+0x1b2:    ...                          nop
  squeue_enter_chain+0x1b3:    ...                          nop
  squeue_enter_chain+0x1b4:    ...                          nop
  squeue_enter_chain+0x1b5:    ...                          nop

Zero overhead!
How SDT works (continued)
• Kernel runtime linker, krtld:
usr/src/uts/intel/amd64/krtld/kobj_reloc.c
#define SDT_NOP 0x90
#define SDT_NOPS 5
static int
sdt_reloc_resolve(struct module *mp, char *symname, uint8_t *instr)
{
[ ... ]
/*
* The "statically defined tracing" (SDT) provider for DTrace uses
* a mechanism similar to TNF, but somewhat simpler. (Surprise,
* surprise.) The SDT mechanism works by replacing calls to the
* undefined routine __dtrace_probe_[name] with nop instructions.
* The relocations are logged, and SDT itself will later patch the
* running binary appropriately.
*/
[ ... ]
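/*
 * instr points at the call's 4-byte PC-relative operand, one byte past the
 * 0xe8 opcode (compare the relocation at squeue_enter_chain+0x1b2 with the
 * call at +0x1b1 on the previous slide), so the loop below writes
 * instr[-1] .. instr[3] and nops out the entire 5-byte call.
 */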
for (i = 0; i < SDT_NOPS; i++)
instr[i - 1] = SDT_NOP;
[ ... ]
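Once the module is loaded and the relocation logged this way, the probe surfaces under the sdt provider, with the double underscore in the C name translated to a dash; it can be listed with, for example:

# dtrace -l -n 'sdt:::squeue-enqueuechain'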
Tracepoint insertion – DTracing DTrace
• Quick idea:
# dtrace -n "fbt::fasttrap_tracepoint_install:entry { stack();ustack();exit(0) }"
libc.so.1`ioctl+0xa
libdtrace.so.1`dtrace_program_exec+0x51
dtrace`exec_prog+0x37
dtrace`main+0xc02
dtrace`0x4026cc
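fasttrap_tracepoint_install() is driven by the pid provider, so any pid enabling started from a second shell will fire the probe above; for example (the target command /bin/ls is arbitrary):

# dtrace -n 'pid$target::malloc:entry' -c /bin/ls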
Tracepoint insertion, FBT provider
• Tracepoint enabling/disabling: simple memory write
static void
fbt_enable(void *arg, dtrace_id_t id, void *parg)
{
fbt_probe_t *fbt = parg;
struct modctl *ctl = fbt->fbtp_ctl;
[ ... ]
for (; fbt != NULL; fbt = fbt->fbtp_next)
*fbt->fbtp_patchpoint = fbt->fbtp_patchval;
}
static void
fbt_disable(void *arg, dtrace_id_t id, void *parg)
{
fbt_probe_t *fbt = parg;
struct modctl *ctl = fbt->fbtp_ctl;
[ ... ]
for (; fbt != NULL; fbt = fbt->fbtp_next)
*fbt->fbtp_patchpoint = fbt->fbtp_savedval;
}
uts/intel/dtrace/fbt.c
The core of DTrace – trap interposition
uts/intel/ia32/ml/exception.s:
/*
 * #BP
 */
ENTRY_NP(brktrap)
#if defined(__amd64)
cmpw $KCS_SEL, 8(%rsp)
je bp_jmpud
#endif
/* usermode tracepoint hook: */
TRAP_NOERR(T_BPTFLT) /* $3 */
jmp dtrace_trap
#if defined(__amd64)
bp_jmpud:
/*
* This is a breakpoint in the kernel -- it is very likely that this
* is DTrace-induced. To unify DTrace handling, we spoof this as an
* invalid opcode (#UD) fault. Note that #BP is a trap, not a fault --
* we must decrement the trapping %rip to make it appear as a fault.
* We then push a non-zero error code to indicate that this is coming
* from #BP.
*/
decq (%rsp)
push $1 /* error code -- non-zero for #BP */
jmp ud_kernel
#endif
SET_SIZE(brktrap)
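To make the stack arithmetic concrete: #BP pushes no error code, so on entry the saved %rip sits at (%rsp) and the saved %cs at 8(%rsp). A rough sketch of that hardware trap frame (illustration only, not a kernel structure):

#include <stdint.h>

struct bp_frame {               /* pushed by the CPU on a long-mode #BP */
        uint64_t rip;           /* (%rsp)  -- decq (%rsp) rewinds it past the 1-byte int3 */
        uint64_t cs;            /* 8(%rsp) -- cmpw $KCS_SEL, 8(%rsp) tests kernel vs. user */
        uint64_t rflags;
        uint64_t rsp;
        uint64_t ss;
};

Because #BP is a trap, the saved %rip points just past the int3; decrementing it makes the frame look like the fault-style frame ud_kernel expects.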
The core of DTrace – trap interposition
uts/intel/ia32/ml/exception.s:
ENTRY_NP(invoptrap)
cmpw $KCS_SEL, 8(%rsp)
jne ud_user
/*
* We must first check if DTrace has set its NOFAULT bit. This
* regrettably must happen before the TRAPTRACE data is recorded,
* because recording the TRAPTRACE data includes obtaining a stack
* trace -- which requires a call to getpcstack() and may induce
* recursion if an fbt::getpcstack: enabling is inducing the bad load.
*/
movl %gs:CPU_ID, %eax
shlq $CPU_CORE_SHIFT, %rax
leaq cpu_core(%rip), %r8
addq %r8, %rax
movw CPUC_DTRACE_FLAGS(%rax), %cx
testw $CPU_DTRACE_NOFAULT, %cx
jnz .dtrace_induced
[ ... ]
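In C terms, the instructions above compute &cpu_core[CPU->cpu_id] and test its DTrace flags word; roughly (a sketch, not the kernel source):

/* Rough C equivalent of the per-CPU NOFAULT check above. */
cpu_core_t *cpup = &cpu_core[CPU->cpu_id];

if (cpup->cpuc_dtrace_flags & CPU_DTRACE_NOFAULT)
        goto dtrace_induced;    /* fault is DTrace-induced; handled specially */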
The DTrace backend on Solaris for x86/x64
Frank Hofmann
Frank.Hofmann@sun.com