A Unix/Linux Find Command Tutorial: Locating Files
A Unix/Linux Find Command Tutorial: Locating Files
Locating Files:
The find command is used to locate files on a Unix or Linux system. find will search
any set of directories you specify for files that match the supplied search criteria. You
can search for files by name, owner, group, type, permissions, date, and other criteria.
The search is recursive in that it will search all subdirectories too. The syntax looks like
this:
All arguments to find are optional, and there are defaults for all parts. (This may depend
on which version of find is used. Here we discuss the freely available GNU version of
find, which is the version available on YborStudent.) For example where-to-look
defaults to . (that is, the current working directory), criteria defaults to none (that is,
show all files), and what-to-do (known as the find action) defaults to -print (that is,
display the names of found files to standard output). Technically the criteria and actions
are all known as find primaries.
For example:
find
will display the pathnames of all files in the current directory and all subdirectories. The
commands
find . -print
find -print
find .
do the exact same thing. Here's an example find command using a search criterion and
the default action:
This will search the whole system for any files named foo and display their pathnames.
Here we are using the criterion -name with the argument foo to tell find to perform a
name search for the filename foo. The output might look like this:
/home/wpollock/foo
/home/ua02/foo
/tmp/foo
If find doesn't locate any matching files, it produces no output.
The above example said to search the whole system, by specifying the root directory (/)
to search. If you don't run this command as root, find will display a error message for
each directory on which you don't have read permission. This can be a lot of messages,
and the matching files that are found may scroll right off your screen. A good way to
deal with this problem is to redirect the error messages so you don't have to see them at
all:
In such cases you can specify the action -print0 instead. This lists the found files
separated not with a newline but with a null (or NUL) character, which is not a legal
character in Unix or Linux file names. Of course the command that reads the output of
find must be able to handle such a list of file names. Many commands commonly used
with find (such as tar or cpio) have special options to read in file names separated with
NULs instead of spaces.
This will search from the current directory down for foo*bar (that is, any filename that
begins with foo and ends with bar). Note that wildcards in the name argument must be
quoted so the shell doesn't expand them before passing them to find. Also, unlike
regular shell wildcards, these will match leading periods in filenames. (For example
find -name \*.txt.)
You can search for other criteria beside the name. Also you can list multiple search
criteria. When you have multiple criteria any found files must match all listed criteria.
That is, there is an implied Boolean AND operator between the listed search criteria.
find also allows OR and NOT Boolean operators, as well as grouping, to combine search
criteria in powerful ways (not shown here.)
will find any regular files (i.e., not directories or other special files) with the criteria
-type f, and only those modified seven or fewer days ago (-mtime -7). Note the use of
xargs, a handy utility that coverts a stream of input (in this case the output of find) into
command line arguments for the supplied command (in this case tar, used to create a
backup archive).
Using the tar option -c is dangerous here; xargs may invoke tar
several times if there are many files found and each -c will cause tar
to over-write the previous invocation. The -r option appends files
to an archive. Other options such as those that would permit
filenames containing spaces would be useful in a production quality
backup script.
Another use of xargs is illustrated below. This command will efficiently remove all files
named core from your system (provided you run the command as root of course):
(The last two forms run the rm command once per file, and are not as efficient as the first
form.)
One of my favorite find criteria is to locate files modified less than 10 minutes ago. I
use this right after using some system administration tool, to learn which files got
changed by that tool:
(This search is also useful when I've downloaded some file but can't locate it.)
Another common use is to locate all files owned by a given user (-user username).
This is useful when deleting user accounts.
You can also find files with various permissions set. -perm /permissions means to
find files with any of the specified permissions on, -perm -permissions means to find
files with all of the specified permissions on, and -perm permissions means to find
files with exactly permissions. Permissions can be specified either symbolically
(preferred) or with an octal number. The following will locate files that are writeable by
others (including symlinks, which should be writeable by all):
(Using -perm is more complex than this example shows. You should check both the
POSIX documentation for find (which explains how the symbolic modes work) and the
Gnu find man page (which describes the Gnu extensions).
When using find to locate files for backups, it often pays to use the -depth option
(really a criterion that is always true), which forces the output to be depth-first—that is,
files first and then the directories containing them. This helps when the directories have
restrictive permissions, and restoring the directory first could prevent the files from
restoring at all (and would change the time stamp on the directory in any case).
Normally, find returns the directory first, before any of the files in that directory. This is
useful when using the -prune action to prevent find from examining any files you want
to ignore:
When specifying time with find options such as -mmin (minutes) or -mtime (24 hour
periods, starting from now), you can specify a number n to mean exactly n, -n to mean
less than n, and +n to mean more than n.
For example:
find . -mtime 0 # find files modified between now and 1 day ago
# (i.e., within the past 24 hours)
find . -mtime -1 # find files modified less than 1 day ago
# (i.e., within the past 24 hours, as before)
find . -mtime 1 # find files modified between 24 and 48 hours ago
find . -mtime +1 # find files modified more than 48 hours ago
Using the -printf action instead of the default -print is useful to control the output
format better than you can with ls or dir. You can use find with -printf to produce
output that can easily be parsed by other utilities or imported into spreadsheets or
databases. See the man page for the dozens of possibilities with the -printf action. (In
fact find with -printf is more versatile than ls and is the preferred tool for forensic
examiners even on Windows systems, to list file information.) For example the
following displays non-hidden (no leading dot) files in the current directory only (no
subdirectories), with an custom output format:
find . -maxdepth 1 -name '[!.]*' -printf 'Name: %16f Size: %6s\n'
On any version of find you can use this more complex (but portable) code:
Note that -maxdepth 1 will include . unless you also specify -mindepth 1. A
portable way to include . is:
As a system administrator you can use find to locate suspicious files (e.g., world
writable files, files with no valid owner and/or group, SetUID files, files with unusual
permissions, sizes, names, or dates). Here's a final more complex example (which I
saved as a shell script):
This says to seach the whole system, skipping the directories /proc, /sys, /dev, and
/windows-C-Drive (presumably a Windows partition on a dual-booted computer). The
Gnu -noleaf option tells find not to assume all remaining mounted filesystems are Unix
file systems (you might have a mounted CD for instance). The -o is the Boolean OR
operator, and ! is the Boolean NOT operator (applies to the following criteria).
So these criteria say to locate files that are world writable (-perm -2, same as -o=w) and
NOT symlinks (! -type l) and NOT sockets (! -type s) and NOT directories with the
sticky (or text) bit set (! \( -type d -perm -1000 \)). (Symlinks, sockets and
directories with the sticky bit set are often world-writable and generally not suspicious.)
A common request is a way to find all the hard links to some file. Using ls -li file
will tell you how many hard links the file has, and the inode number. You can locate all
pathnames to this file with:
find mount-point -xdev -inum inode-number
Since hard links are restricted to a single filesystem, you need to search that whole
filesystem so you start the search at the filesystem's mount point. (This is likely to be
either /home or / for files in your home directory.) The -xdev options tells find to not
search any other filesystems.
(While most Unix and all Linux systems have a find command that supports the -inum
criterion, this isn't POSIX standard. Older Unix systems provided the ncheck utility
instead that could be used for this.)
However this approach has two limitations. Firstly not all commands accept the list of
files at the end of the command. A good example is cp:
(Note the Gnu version of cp has a non-POSIX option -t for this, and xargs has options
to handle this too.)
Secondly filenames may contain spaces or newlines, which would confuse the command
used with xargs. (Again Gnu tools have options for that, find ... -print0 |xargs -
0 ....)
There are POSIX (but non-obvious) solutions to both problems. An alternate form of
-exec ends with a plus-sign, not a semi-colon. This form collects the filenames into
groups or sets, and runs the command once per set. (This is exactly what xargs does, to
prevent argument lists from becoming too long for the system to handle.) In this form the
{} argument expands to the set of filenames. For example:
This form of -exec can be combined with a shell feature to solve the other problem
(names with spaces). The POSIX shell allows us to use:
(We don't usually care about the command-name, so X, dummy, or inline cmd is often
used.) Here's an example of efficiently copying found files to /tmp, in a POSIX-
compliant way (Posted on comp.unix.shell netnews newsgroup on Oct. 28 2007 by
Stephane CHAZELAS):
Common Gotcha:
If the given expression to find does not contain any of the action primaries -exec, -ok,
or -print, the given expression is effectively replaced by:
The implied parenthesis can cause unexpected results. For example, consider these two
similar commands: