Tips and Tricks For Diagnosing Lustre Problems On Cray Systems
7. Acknowledgments
Many thanks to the Cray benchmarking and SPS staff,
including field support, for always providing the
data, insights, and operational support on whose
experience the authors based this paper.
Also, thank you to the CUG 2010 attendees for
requesting this type of contribution from the Cray Lustre
team.
Finally, thank you to Nic Henke for providing insight
into gnilnd internals. Questions about gnilnd internals
that are not covered in this paper can be directed
to nic@cray.com.
This material is based upon work supported by the
Defense Advanced Research Projects Agency under its
Agreement No. HR0011-07-9-0001. Any opinions,
findings and conclusions or recommendations expressed
in this material are those of the author(s) and do not
necessarily reflect the views of the Defense Advanced
Research Projects Agency.
#!/bin/bash
#
# Extract Lustre server messages from console log into separate files
# Copyright 2011 Cray Inc. All Rights Reserved.
#
# Usage: <script> <console log> [<hosts>]
usage () {
echo ""
echo "*** Usage: $(basename "$0") [-h] <console_log>"
echo ""
echo "*** Extracts MDS and OSS messages from the specified console"
echo "*** log and places them in separate files based on node id."
echo ""
echo "*** File names identify the server type, cname, nid, and the"
echo "*** targets on the server. The OST list in a file name is not"
echo "*** guaranteed to be complete, but gaps in the numbering"
echo "*** usually make this obvious."
echo ""
echo "*** Options:"
echo "*** -h Prints this message."
echo ""
}
# Parses the cname and Lustre target from console log messages of the form:
#
# [2010-10-12 04:16:36][c0-0c0s4n3]Lustre: Server garnid15-OST0005 on device /dev/sdb has started
#
# Finds cname/nid pairings for server nodes. Record format is:
# [2010-10-12 04:13:22][c0-0c0s0n3] HOSTNAME: nid00003
#
# Builds file names: <oss | mds>.<cname>.<nid>.<target list>
# Extracts the records for each cname from the console log and writes them to <filename>.
# Produces SERVERS entries such as: mds:c#-#c#s#n#:.MDT0000.MGS or
# oss#:c#-#c#s#n#:.OST####.OST####...
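find_servernodes () {
    obj_field=$1        # field number holding the target name
    srch_string="$2"    # pattern selecting the records to parse
    SERVERS=$( \
    grep "${srch_string}" "${CONSOLE_LOG}" | sort -k "${obj_field}" -u | \
    awk -v fld="${obj_field}" \
    '# Three-argument match() is a gawk extension.
    {
        match($2, /c[0-9]+-[0-9]+c[0-9]+s[0-9]+n[0-9]+/, cn);
        obj = $(fld);
        sub(/^.*-/, "", obj);       # strip the file system name prefix
        nodes[cn[0]] = sprintf("%s.%s", nodes[cn[0]], obj);
    }
    END {
        ndx = 0;
        for (cname in nodes) {
            if (match(nodes[cname], /OST/)) {
                printf "oss%d:%s:%s ", ndx, cname, nodes[cname];
                ndx++;
            }
            else
                printf "mds:%s:%s ", cname, nodes[cname];
        }
    }'
    )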
}
# Main
SERVERS=""
for idx in $(seq 1 ${#srch[@]}); do
find_servernodes ${objfld[$idx]} "${srch[$idx]}"
if [ "${SERVERS}" != "" ]; then
break
fi
done
nid_file=$(mktemp /tmp/.nidsXXXXX)
grep "HOSTNAME" "${CONSOLE_LOG}" > "${nid_file}"
fname=${prefix}.${cname}${nid}${objs}
echo " ${fname}"
grep "${cname}" "${CONSOLE_LOG}" > "${fname}"
done
rm -f "${nid_file}"
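The cname, target, and NID matching described in the comments of the script above can be shown in a self-contained form. The following is a minimal sketch, not the original Cray script: it assumes the two console record formats quoted in those comments, GNU sed (for -E), and bash process substitution, and it prints the <oss | mds>.<cname>.<nid>.<target list> names instead of writing one file per server. The function names list_targets, map_nids, and server_names are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: derive <oss|mds>.<cname>.<nid>.<targets> names from a
# console log, assuming records shaped like:
#   [2010-10-12 04:16:36][c0-0c0s4n3]Lustre: Server garnid15-OST0005 on device /dev/sdb has started
#   [2010-10-12 04:13:22][c0-0c0s0n3] HOSTNAME: nid00003
export LC_ALL=C   # make sort and join agree on collation order

CNAME_RE='c[0-9]+-[0-9]+c[0-9]+s[0-9]+n[0-9]+'

# Print "<cname> <target>" for each "Server ... has started" record,
# keeping only the part after the last '-' (garnid15-OST0005 -> OST0005).
list_targets () {
    sed -n -E "s/^\[[^]]*\]\[(${CNAME_RE})\].*Lustre: Server [^ ]*-([^ -]+) on device .* has started.*/\1 \2/p" "$1" | sort -u
}

# Print "<cname> <nid>" for each HOSTNAME record.
map_nids () {
    sed -n -E "s/^\[[^]]*\]\[(${CNAME_RE})\] HOSTNAME: (nid[0-9]+).*/\1 \2/p" "$1" | sort -u
}

# Collapse targets per cname, join with the NID map on cname, and print
# one name per server node.
server_names () {
    join <(list_targets "$1" |
               awk '{ t[$1] = t[$1] "." $2 } END { for (c in t) print c, t[c] }' |
               sort) \
         <(map_nids "$1") |
        awk '{ type = ($2 ~ /OST/) ? "oss" : "mds"; print type "." $1 "." $3 $2 }'
}
```

Running server_names on a console log prints lines such as oss.c0-0c0s4n3.nid00019.OST0004.OST0005; each name could then be used as the output file for a grep on its cname, as the original script does.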
#!/bin/bash
#
# Sort Lustre dk log into chronological order
# Copyright 2011 Cray Inc. All Rights Reserved.
#
INF=$*
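A chronological sort of dk output can be expressed as a single sort call. The following is a minimal sketch, assuming the usual colon-separated lctl dk record layout in which the fourth field is the timestamp as <seconds>.<microseconds> with zero-padded microseconds; the function name dk_sort is hypothetical and this is not the body of the Cray script.

```shell
#!/bin/bash
# Hypothetical sketch: merge and sort one or more "lctl dk" logs by time.
# Assumes records look like:
#   00000100:00100000:0.0:1286877396.123456:0:2416:0:(file.c:1:fn()) message
dk_sort () {
    # -t:     split fields on colons
    # -k4,4n  compare only field 4 (the timestamp), numerically
    # -s      stable: records with identical timestamps keep their input order
    LC_ALL=C sort -s -t: -k4,4n "$@"
}
```

For example, dk_sort dk.oss1 dk.oss2 > dk.sorted interleaves the records from both nodes. Restricting the key to field 4 means colons inside the message text do not affect the ordering.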
lctl_daytime.sh
#!/bin/bash
#
# Convert dk log into time of day format
# Copyright 2011 Cray Inc. All Rights Reserved.
#
if [ $# -lt 2 ]; then
echo "usage: $(basename "$0") <input_file> <output_file>" >&2
exit 1
fi
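Converting dk timestamps into a time of day amounts to rewriting the epoch seconds in each record. The following is a minimal sketch, assuming the same colon-separated record layout (field 4 is <seconds>.<microseconds>); it reads standard input rather than the <input_file> <output_file> arguments, uses the bash 4.2+ printf %(...)T format so that no external date command runs per line, and passes lines that do not match through unchanged. The function name dk_daytime is hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: replace the epoch timestamp in each dk record with
# "YYYY-MM-DD HH:MM:SS.usec" in the local time zone (TZ).
dk_daytime () {
    local line f1 f2 f3 ts rest
    while IFS= read -r line || [[ -n $line ]]; do
        # Split off the first four colon-separated fields; rest keeps the
        # remainder of the record, colons included.
        IFS=: read -r f1 f2 f3 ts rest <<< "$line"
        if [[ $ts =~ ^([0-9]+)\.([0-9]+)$ ]]; then
            printf '%s:%s:%s:%(%Y-%m-%d %H:%M:%S)T.%s:%s\n' \
                "$f1" "$f2" "$f3" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "$rest"
        else
            printf '%s\n' "$line"   # not a dk record; leave it alone
        fi
    done
}
```

Used as dk_daytime < dk.in > dk.out, this matches the calling convention of the script above only loosely; setting TZ first (for example TZ=UTC) makes the output comparable with console log times from another time zone.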
NOTE: The errno.h text descriptions are listed so that the strings printed by functions such as strerror() can be matched;
they do not reflect exactly how the gnilnd uses each code. Some codes are used in a rather unconventional way.
Error code (name)
errno.h text description - meaning of the error(s) in the gnilnd
-2 (-ENOENT)
No such file or directory - could not find peer, often for lctl --net peer_list, del_peer, disconnect, etc.
-3 (-ESRCH)
No such process - RCA could not resolve the NID to a NIC address.
-5 (-EIO)
I/O error - generic error returned to LNET for failed transactions; also used in the gnilnd for failed IP socket reads, etc.
-7 (-E2BIG)
Argument list too long - too many peers/conns/endpoints
-9 (-EBADF)
Bad file number - could not validate the connection request (datagram) header. Like -EPROTO, but for fields that
should be more static. Most likely a corrupt packet, so it is dropped rather than answered with the NAK sent for -EPROTO.
-12 (-ENOMEM)
Out of memory - memory couldn't be allocated for some function; also indicates a GART registration failure (for now)
-14 (-EFAULT)
Bad address - failed RDMA send due to fatal network error
-19 (-ENODEV)
No such device - connection request to invalid device
-53 (-EBADR)
Invalid request descriptor - couldn't post datagram for outgoing connection request
-54 (-EXFULL)
Exchange Full - too many SMSG retransmits
-57 (-EBADSLT)
Invalid slot - datagram match for wrong NID.
-70 (-ECOMM)
Communication error on send - an SMSG (FMA) send to the peer failed with GNI_RC_TRANSACTION_ERROR, meaning a
hardware problem occurred during the send. To find the type and cause of the error, look for messages such as
"SMSG send error to 29@gni: rc 11 (SOURCE_SSID_SRSP:REQUEST_TIMEOUT)".
-71 (-EPROTO)
Protocol error - invalid bits in messages: bad magic, wire version, NID wrong for mailbox, bad timeout. The remote peer
will receive a NAK.
-100 (-ENETDOWN)
Network is down - could not create EP or post datagram for new connection setup
-102 (-ENETRESET)
Network dropped connection because of reset - admin ran lctl --net gni disconnect
-103 (-ECONNABORTED)
Software caused connection abort - could not configure EP for new connection with the parameters provided from
remote peer
-104 (-ECONNRESET)
Connection reset by peer - remote peer sent CLOSE to us
-108 (-ESHUTDOWN)
Cannot send after transport endpoint shutdown - we are tearing down the LND.
-110 (-ETIMEDOUT)
Connection timed out - the connection did not receive an SMSG from the peer within the timeout.
-111 (-ECONNREFUSED)
Connection refused - hardware datagram timeout trying to connect to peer.
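When reading console or dk logs, it can help to have the table above in a form a script can query. The following is a minimal sketch of a lookup function; the name gni_errno is hypothetical and the descriptions are abbreviated from the table. It accepts only the negative codes listed, because positive numbers in messages such as the SMSG send error quoted above (rc 11) appear to be GNI-level return codes rather than errno values.

```shell
#!/bin/bash
# Hypothetical helper: map a negative gnilnd return code to its errno name
# and the gnilnd meaning listed in the table. Returns 1 for unknown codes.
gni_errno () {
    case "$1" in
        -2)   echo "ENOENT: could not find peer (lctl --net peer_list, del_peer, disconnect)" ;;
        -3)   echo "ESRCH: RCA could not resolve the NID to a NIC address" ;;
        -5)   echo "EIO: generic failed transaction; also failed IP socket reads" ;;
        -7)   echo "E2BIG: too many peers/conns/endpoints" ;;
        -9)   echo "EBADF: connection request header failed validation; packet dropped" ;;
        -12)  echo "ENOMEM: allocation failure or GART registration failure" ;;
        -14)  echo "EFAULT: RDMA send failed due to fatal network error" ;;
        -19)  echo "ENODEV: connection request to invalid device" ;;
        -53)  echo "EBADR: could not post datagram for outgoing connection request" ;;
        -54)  echo "EXFULL: too many SMSG retransmits" ;;
        -57)  echo "EBADSLT: datagram match for wrong NID" ;;
        -70)  echo "ECOMM: SMSG (FMA) send failed with GNI_RC_TRANSACTION_ERROR" ;;
        -71)  echo "EPROTO: invalid message contents; remote peer receives a NAK" ;;
        -100) echo "ENETDOWN: could not create EP or post datagram for new connection" ;;
        -102) echo "ENETRESET: admin ran lctl --net gni disconnect" ;;
        -103) echo "ECONNABORTED: could not configure EP with remote peer parameters" ;;
        -104) echo "ECONNRESET: remote peer sent CLOSE" ;;
        -108) echo "ESHUTDOWN: the LND is being torn down" ;;
        -110) echo "ETIMEDOUT: no SMSG received from peer within timeout" ;;
        -111) echo "ECONNREFUSED: hardware datagram timeout connecting to peer" ;;
        *)    echo "$1: not in the gnilnd table" >&2; return 1 ;;
    esac
}
```

For example, gni_errno -110 prints the ETIMEDOUT entry, while gni_errno 11 reports that the code is not in the table.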