Linux Troubleshooting Cheatsheet: Strace
This cheatsheet is a guide to command lines that Linux admins can use to get
insights into their servers. Whether you’ve been an admin for one month or 20
years, you’ve almost certainly used at least one of these tools to troubleshoot an issue.
Because we love sysdig (naturally!), we also included a translation of each of these
common operations into the sysdig command line or csysdig.
Rather than attempt to cover every option from the manpages (which would have made
for boring coverage of many esoteric, rarely used switches), we’ve started from
examples referenced on the most popular web pages you’d find when you search
for terms like “strace examples”, “htop examples”, and so forth.
Do you have favorites that aren’t listed here? Let us know and we’ll include
them in future articles.
strace
There’s one subtle difference between strace and sysdig that will be apparent in many of these side-by-side
comparisons: many of the simplest strace examples include command lines that are executed and traced as a
“one-shot” operation. sysdig has a somewhat different philosophy: it either watches live events from afar as
they happen, or analyzes capture data previously saved to a file. Thankfully, sysdig’s rich filtering
options provide the knobs to watch for specific one-shot executions, as you’ll soon see.
Print a timestamp for each output line of the trace
  strace:   strace -t who
  sysdig:   sysdig proc.name=who
  Note:     sysdig prints timestamps by default.

Print relative time for system calls
  strace:   strace -r who
  sysdig:   sysdig -tD proc.name=who
  Note:     sysdig offers several more ways to represent timestamps via the -t option.

Generate batch statistics reports of system calls
  strace:   strace -c who
  sysdig:   sysdig -w output.scap proc.name=who
            # Now run the “who” separately
            For one-shot batch text reports:
            sysdig -r output.scap -c topscalls -c topscalls_time
  Note:     sysdig’s default behavior is more optimized for the case of presenting event data as it
            happens rather than “batch” reporting. This is why the sysdig equivalent is done in two
            steps here.

Generate live, per-second statistics reports of system calls for running process with PID=1363
  strace:   N/A
  csysdig:  csysdig -v syscalls proc.pid=1363
  Note:     While strace can show individual events as they happen live, or provide a single batch
            report for the execution of a command, csysdig’s views provide a unique ability to show
            live, periodic reports.
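The two-step batch workflow above can be sketched as a small dry-run script. It only prints the commands in the order you would run them, so it is safe to try on a machine without sysdig. The variable names CAPTURE_FILE and TARGET and the helper functions are illustrative assumptions, not part of sysdig; the sysdig command lines themselves are the ones from the table.

```shell
#!/bin/sh
# Dry-run sketch of the two-step sysdig workflow for batch syscall statistics.
# CAPTURE_FILE and TARGET are illustrative names, not sysdig options.
CAPTURE_FILE=output.scap
TARGET=who

# Step 1: the capture command, filtered to the process name of interest.
capture_cmd() {
    printf 'sysdig -w %s proc.name=%s\n' "$CAPTURE_FILE" "$TARGET"
}

# Step 2: the report command, reading the saved capture through both chisels.
report_cmd() {
    printf 'sysdig -r %s -c topscalls -c topscalls_time\n' "$CAPTURE_FILE"
}

echo "# terminal 1: start the capture and leave it running"
capture_cmd
echo "# terminal 2: run the command you want statistics for"
echo "$TARGET"
echo "# terminal 1: stop the capture with Ctrl-C, then print the reports"
report_cmd
```

Running the printed commands for real assumes sysdig is installed and that you have sufficient privileges (typically root) to capture events.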
htop
Since htop is a live, interactive, curses-style tool, we’ll compare it to the live, interactive, curses-style csysdig.
For starters, both tools use the same approach of navigating the live table via Up/Down/Left/Right arrows and
also PgUp/PgDn. For operations that affect a single process (killing, renicing, etc.) it is assumed you’ve used these
controls to first highlight a particular process.
Renice a process
  htop:     Press F7 or ] to reduce the nice value by 1
            Press F8 or [ to increase the nice value by 1
  csysdig:  Press ] to reduce the nice value by 1
            Press [ to increase the nice value by 1

Change the output refresh interval to once every 5 seconds
  htop:     Launch as: htop -d 50
  csysdig:  Launch as: csysdig -d 5000
  Note:     As you can see, htop works in units of tenths-of-a-second, while csysdig works in
            milliseconds.

List open files for a process
  htop:     Press l to run a one-time lsof
  csysdig:  Press f to run a one-time lsof. Or, to see real-time, updating reports of
            files/directories used by a process, drill down to a specific process by pressing
            Enter, then press F2 and select a View such as Files, File Opens List, or Directories.
  Note:     See the Note above for “Renice a process” about how the one-time lsof was recently
            added as an enhancement.
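Because the two tools take their refresh interval in different units, it can help to derive both values from a single number of seconds. The following is a minimal sketch; the function names htop_delay and csysdig_delay are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Convert a refresh interval in whole seconds to each tool's -d unit.
# htop_delay and csysdig_delay are illustrative helper names.

# htop -d expects tenths of a second.
htop_delay() {
    echo $(( $1 * 10 ))
}

# csysdig -d expects milliseconds.
csysdig_delay() {
    echo $(( $1 * 1000 ))
}

seconds=5
echo "htop -d $(htop_delay "$seconds")"
echo "csysdig -d $(csysdig_delay "$seconds")"
```

For a 5-second interval this prints the same two launch commands shown in the table.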
lsof

List processes that have opened the specific file /var/log/syslog
  lsof:     lsof /var/log/syslog
  sysdig:   sysdig -c lsof "fd.name=/var/log/syslog"

List processes that have opened files under the directory /var/log
  lsof:     lsof +d /var/log
  sysdig:   sysdig -c lsof "fd.directory=/var/log"

List files opened by processes named “sshd”
  lsof:     lsof -c sshd
  sysdig:   sysdig -c lsof "proc.name=sshd"

List files opened by a specific user named “phil”
  lsof:     lsof -u phil
  sysdig:   sysdig -c lsof "user.name=phil"

List files opened by everyone except for the user named “phil”
  lsof:     lsof -u ^phil
  sysdig:   sysdig -c lsof "user.name!=phil"

List all open files for a specific process with PID=1081
  lsof:     lsof -p 1081
  sysdig:   sysdig -c lsof "proc.pid=1081"

List all files opened by user “phil” or a process named “sshd” (OR logic)
  lsof:     lsof -u phil -c sshd
  sysdig:   sysdig -c lsof "'user.name=phil or proc.name=sshd'"
  Note:     Note the use of two layers of quotes with the sysdig filter.

List all files opened by an “sshd” process for user “phil” (AND logic)
  lsof:     lsof -u phil -c sshd -a
  sysdig:   sysdig -c lsof "'user.name=phil and proc.name=sshd'"
  Note:     Note the use of two layers of quotes with the sysdig filter.

Observe repeating reports of open files based on live activity
  lsof:     Enable repeat mode with one of:
            lsof -r
            lsof +r
  csysdig:  Similar live data can be obtained with a live/interactive csysdig view, launched like so:
            csysdig -v files
            csysdig -v file_opens

List network connections in use by a specific process with PID=1014
  lsof:     lsof -i -a -p 1014
  sysdig:   sysdig -c lsof "'fd.type=ipv4 and proc.pid=1014'"
  Note:     Note the use of two layers of quotes with the sysdig filter.

List processes that are listening on port 22
  lsof:     lsof -i :22
  sysdig:   sysdig -c lsof "'fd.port=22 and fd.is_server=true'"
  Note:     Note the use of two layers of quotes with the sysdig filter.

List all TCP or UDP connections
  lsof:     lsof -i tcp
  sysdig:   sysdig -c lsof "fd.l4proto=tcp"
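The two layers of quotes in the multi-term lsof chisel filters are easy to mistype. The sketch below shows what each layer does from the shell's point of view: the outer double quotes are consumed by the shell, and the inner single quotes travel with the text as part of the argument. The helper name chisel_arg is a hypothetical name introduced here, and the script only prints the argument rather than invoking sysdig.

```shell
#!/bin/sh
# Wrap a multi-term filter in the inner (single) quote layer.
# chisel_arg is an illustrative helper name, not part of sysdig.
chisel_arg() {
    printf "'%s'" "$1"
}

filter="fd.port=22 and fd.is_server=true"
arg=$(chisel_arg "$filter")

# This is the text the chisel would receive as its single argument:
printf '%s\n' "$arg"

# To run it for real (assumes sysdig is installed, typically as root):
#   sysdig -c lsof "$arg"
```

Quoting "$arg" in the real invocation supplies the outer layer, so the whole expression, inner quotes included, arrives as one argument.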
tcpdump
tcpdump is focused entirely on network traffic, while network traffic is only a subset of what sysdig covers. Many
tcpdump use cases involve filtering, and tcpdump uses network-specific BPF filters, whereas sysdig uses its own
broader sysdig filtering. The two approaches look similar in many ways, but you’ll want to look at the docs for each
side-by-side as you progress to more advanced filtering needs. Also, since in Linux everything is a file, you’ll notice
the sysdig filtering examples below all leverage a “network-connections-via-file-descriptors” approach.
Capture packet data, writing it into a file
  tcpdump:  tcpdump -w saved.pcap
  sysdig:   sysdig -w saved.scap fd.type=ipv4
  Note:     The sysdig file format is capable of holding event data for much more than just
            network packets (e.g. system calls).

Capture only packets longer/smaller than 1024 bytes
  tcpdump:  tcpdump greater 1024
            tcpdump less 1024
  sysdig:   sysdig "fd.type=ipv4 and evt.buflen > 1024"
            sysdig "fd.type=ipv4 and evt.buflen < 1024"
  Note:     The greater/less options in tcpdump reference overall packet length, whereas
            evt.buflen in sysdig is relative to payload size.

Capture only UDP or TCP packets
  tcpdump:  tcpdump udp
            tcpdump tcp
  sysdig:   sysdig fd.l4proto=udp
            sysdig fd.l4proto=tcp
  Note:     Note that we don’t need to explicitly include fd.type=ipv4 since we’re using other
            network-only filters here.

Capture only packets going to/from a particular port
  tcpdump:  tcpdump port 22
  sysdig:   sysdig fd.port=22
  Note:     Note that we don’t need to explicitly include fd.type=ipv4 since we’re using other
            network-only filters here.

Capture packets for a particular destination IP and port
  tcpdump:  tcpdump dst 54.165.81.189 and port 6666
  sysdig:   sysdig fd.rip=54.165.81.189 and fd.port=6666
  Note:     Note that we don’t need to explicitly include fd.type=ipv4 since we’re using other
            network-only filters here.
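The sysdig network filters in this table combine with "and" in the same way for each case, so a small helper can assemble them from whichever terms you have. This is a sketch only: net_filter is a hypothetical function name, and it prints the filter text rather than running sysdig. The field names fd.rip, fd.port, and fd.l4proto are the ones used in the table.

```shell
#!/bin/sh
# Build a sysdig network filter from optional terms, joined with "and".
# Usage: net_filter [remote_ip] [port] [proto]   (pass "" to skip a term)
# net_filter is an illustrative helper name, not part of sysdig.
net_filter() {
    remote_ip=${1:-}
    port=${2:-}
    proto=${3:-}
    out=""
    [ -n "$remote_ip" ] && out="fd.rip=$remote_ip"
    [ -n "$port" ] && out="${out:+$out and }fd.port=$port"
    [ -n "$proto" ] && out="${out:+$out and }fd.l4proto=$proto"
    printf '%s\n' "$out"
}

net_filter 54.165.81.189 6666   # fd.rip=54.165.81.189 and fd.port=6666
net_filter "" 22 tcp            # fd.port=22 and fd.l4proto=tcp
```

The resulting text could then be passed as the filter, for example sysdig "$(net_filter 54.165.81.189 6666)", assuming sysdig is installed and run with sufficient privileges.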
iftop

Observe traffic for just the eth0 interface (192.168.10.119)
  iftop:    Launch as: iftop -i eth0
  csysdig:  Launch as: csysdig -v connections fd.ip=192.168.10.119
            Or mouse-click on Filter: from within csysdig, then append
            and fd.ip=192.168.10.119 to the existing filter text
  Note:     sysdig/csysdig do not currently have filtering based on named interfaces, but the
            equivalent via IP address is shown here.

Resolve DNS names
  iftop:    Press n from within iftop to toggle resolution for all hosts shown
  csysdig:  Press n from within csysdig to run nslookup on the currently-highlighted remote host

Change sort order based on a column of the table
  iftop:    Press < to sort by source
            Press > to sort by destination
  csysdig:  Press F9 or > and then select a column by name, or
            Press shift <1-9> to sort by any column n, and press repeatedly to invert sort order, or

Scroll the display
  iftop:    Press j to scroll up
            Press k to scroll down
  csysdig:  Press Up/Down/Left/Right arrows or PgUp/PgDn to scroll through the table
  Note:     sysdig/csysdig go well beyond scrolling through a single table, since you can drill
            down into the Connections View to see data in other groupings such as per-container
            or per-thread.