Understanding Kernel Oops
http://www.linuxforu.com/2011/01/understanding-a-kerne...
Understanding a kernel panic and doing the forensics to trace the bug
is considered a hacker's job. This is a complex task that requires sound
knowledge of both the architecture you are working on, and the
internals of the Linux kernel. Depending on the type of error detected by
the kernel, panics in the Linux kernel are classified as hard panics
(Aiee!) and soft panics (Oops!). This article explains the workings of a
Linux kernel Oops, helps to create a simple version, and then debug it.
It is mainly intended for beginners getting into Linux kernel
development, who need to debug the kernel. Knowledge of the Linux
kernel, and C programming, is assumed.
An Oops is what the kernel throws at us when it finds something faulty, or an exception, in the
kernel code. It's somewhat like the segfaults of user-space. An Oops dumps its message on the
console; it contains the processor status and the CPU registers at the time the fault occurred. The
offending process that triggered this Oops gets killed without releasing locks or cleaning up
structures. The system may not even resume its normal operations sometimes; this is called an
unstable state. Once an Oops has occurred, the system cannot be trusted any further.
Let's try to generate an Oops message with sample code, and try to understand the dump.
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>

/* Deliberately write through a NULL pointer to trigger an Oops. */
static void create_oops(void)
{
    *(int *)0 = 0;
}

/* Runs when the module is loaded; the Oops is raised from here. */
static int __init my_oops_init(void)
{
    printk("oops from the module\n");
    create_oops();
    return 0;
}

static void __exit my_oops_exit(void)
{
    printk("Goodbye world\n");
}

module_init(my_oops_init);
module_exit(my_oops_exit);
The error code reported in the Oops message is a value in hex. Each bit has a significance of its own:
bit 0: 0 means no page found, 1 means a protection fault
bit 1: 0 means read, 1 means write
bit 2: 0 means kernel, 1 means user-mode
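As a rough illustration of the bit layout listed above, the following C sketch decodes the three low bits of an error code. The names pf_info, decode_pf_error and print_pf_error are hypothetical helpers written for this example, not kernel APIs, and the value 0x0002 used in the note below is only an assumed sample.

```c
#include <stdio.h>

/* Bit positions of the error code, as listed in the text. */
#define PF_PROT  (1UL << 0)  /* 0: no page found, 1: protection fault */
#define PF_WRITE (1UL << 1)  /* 0: read,          1: write            */
#define PF_USER  (1UL << 2)  /* 0: kernel,        1: user-mode        */

/* Hypothetical container for the decoded bits (not a kernel type). */
struct pf_info {
    int protection;  /* 1 if bit 0 is set */
    int write;       /* 1 if bit 1 is set */
    int user;        /* 1 if bit 2 is set */
};

/* Split an error code into its three low bits. */
struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info;

    info.protection = (code & PF_PROT) ? 1 : 0;
    info.write = (code & PF_WRITE) ? 1 : 0;
    info.user = (code & PF_USER) ? 1 : 0;
    return info;
}

/* Print the decoded bits in words, one line per error code. */
void print_pf_error(unsigned long code)
{
    struct pf_info info = decode_pf_error(code);

    printf("%s, %s, %s\n",
           info.protection ? "protection fault" : "no page found",
           info.write ? "write" : "read",
           info.user ? "user-mode" : "kernel-mode");
}
```

With the assumed sample value, print_pf_error(0x0002) prints "no page found, write, kernel-mode", which is what one would expect for a write to address 0 from kernel code.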
[#1]: This value is the number of times the Oops occurred. Multiple Oops messages can be
triggered as a cascading effect of the first one.
CPU 1
2.6.33.3-85.fc13.x86_64
The Tainted flag points to P here. Each flag has its own meaning. A few other flags, and their
meanings, picked up from kernel/panic.c:
P Proprietary module has been loaded.
F Module has been forcibly loaded.
S SMP with a CPU not designed for SMP.
R User forced a module unload.
M System experienced a machine check exception.
B System has hit bad_page.
U Userspace-defined naughtiness.
A ACPI table overridden.
W Taint on warning.
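The flag list above maps naturally onto a small lookup table. The sketch below is only an illustration: the table is copied from the list in this article, and taint_flag, taint_meaning and print_taint are hypothetical names, not functions from kernel/panic.c.

```c
#include <stdio.h>
#include <stddef.h>

/* One taint letter and its meaning, as listed in the text. */
struct taint_flag {
    char letter;
    const char *meaning;
};

static const struct taint_flag taint_flags[] = {
    { 'P', "Proprietary module has been loaded." },
    { 'F', "Module has been forcibly loaded." },
    { 'S', "SMP with a CPU not designed for SMP." },
    { 'R', "User forced a module unload." },
    { 'M', "System experienced a machine check exception." },
    { 'B', "System has hit bad_page." },
    { 'U', "Userspace-defined naughtiness." },
    { 'A', "ACPI table overridden." },
    { 'W', "Taint on warning." },
};

/* Return the meaning of one letter, or NULL if it is not in the table. */
const char *taint_meaning(char letter)
{
    size_t i;

    for (i = 0; i < sizeof(taint_flags) / sizeof(taint_flags[0]); i++) {
        if (taint_flags[i].letter == letter)
            return taint_flags[i].meaning;
    }
    return NULL;
}

/* Print one line for every recognised letter in a taint string. */
void print_taint(const char *flags)
{
    for (; *flags != '\0'; flags++) {
        const char *meaning = taint_meaning(*flags);

        if (meaning != NULL)
            printf("%c: %s\n", *flags, meaning);
    }
}
```

For the Oops discussed here, print_taint("P") prints "P: Proprietary module has been loaded."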
RIP: 0010:[<ffffffffa03e1012>]
RIP is the CPU register containing the address of the instruction that is getting executed. 0010
RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
The call trace is the list of functions being called just before the Oops occurred.
Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00
The Code is a hex-dump of the section of machine code that was being run at the time the Oops
occurred.
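In the Code line, the byte wrapped in angle brackets marks the position of the instruction pointer; here it is c7, the first byte of the faulting store. As a minimal sketch, the C function below scans the hex bytes of such a line and reports where the marked byte sits. The name find_marked_byte is a hypothetical helper, and the function assumes the input is the space-separated byte list that follows the "Code:" label.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Scan space-separated hex bytes from an Oops "Code:" line.
 * Returns the zero-based index of the byte wrapped in <...> and stores
 * its value in *value, or returns -1 if no byte is marked.
 */
int find_marked_byte(const char *bytes, unsigned int *value)
{
    const char *p = bytes;
    int index = 0;

    while (*p != '\0') {
        /* Skip the spaces between byte tokens. */
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;

        if (*p == '<') {
            /* strtoul stops at the closing '>' on its own. */
            *value = (unsigned int)strtoul(p + 1, NULL, 16);
            return index;
        }

        /* Step over one unmarked byte token. */
        while (*p != '\0' && *p != ' ')
            p++;
        index++;
    }
    return -1;
}
```

Passing the bytes of the Code line above, find_marked_byte("<c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3", &v) returns 0 and sets v to 0xc7.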
Next, add the symbol file to the debugger. The add-symbol-file command's first argument is
oops.o and the second argument is the address of the text section of the module. You can obtain
this address from /sys/module/oops/sections/.init.text (where oops is the module name):
(gdb) add-symbol-file oops.o 0xffffffffa03e1000
add symbol table from file "oops.o" at
.text_addr = 0xffffffffa03e1000
(y or n) y
Reading symbols from /code/oops/oops.o...done.
From the RIP instruction line, we can get the name of the offending function, and disassemble it.
(gdb) disassemble my_oops_init
Dump of assembler code for function my_oops_init:
   0x0000000000000038 <+0>:     push   %rbp
   0x0000000000000039 <+1>:     mov    $0x0,%rdi
   0x0000000000000040 <+8>:     xor    %eax,%eax
   0x0000000000000042 <+10>:    mov    %rsp,%rbp
   0x0000000000000045 <+13>:    callq  0x4a <my_oops_init+18>
   0x000000000000004a <+18>:    movl   $0x0,0x0
   0x0000000000000055 <+29>:    xor    %eax,%eax
   0x0000000000000057 <+31>:    leaveq
   0x0000000000000058 <+32>:    retq
End of assembler dump.
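Now, to pinpoint the actual line of offending code, we add the starting address and the offset. The
offset is available in the same RIP instruction line. In our case, we are adding
0x0000000000000038 + 0x012 = 0x000000000000004a. In the disassembly, this is the address of the
movl $0x0,0x0 instruction, that is, the write to address 0 that comes from create_oops().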
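The addition can be checked with a few lines of C. This is only a sketch of the arithmetic: faulting_address and print_faulting_address are hypothetical helper names, and the two inputs are the function start shown by the disassembler and the offset taken from the RIP line.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Address of the faulting instruction: function start plus RIP offset. */
uint64_t faulting_address(uint64_t func_start, uint64_t offset)
{
    return func_start + offset;
}

/* Print the sum in the same zero-padded hex style gdb uses. */
void print_faulting_address(uint64_t func_start, uint64_t offset)
{
    printf("0x%016" PRIx64 " + 0x%" PRIx64 " = 0x%016" PRIx64 "\n",
           func_start, offset, faulting_address(func_start, offset));
}
```

Calling print_faulting_address(0x38, 0x12) prints "0x0000000000000038 + 0x12 = 0x000000000000004a", the address of the movl $0x0,0x0 instruction in the disassembly.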
References
The kerneloops.org website can be used to pick up a lot of Oops messages to debug. The Linux
kernel documentation directory has information about Oops in kernel/Documentation/oops-tracing.txt. This, and numerous other online resources, were used while creating this article.
All published articles are released under Creative Commons Attribution-ShareAlike 3.0 Unported License, unless otherwise noted.