Lesson 04 Text Files

This document provides an overview of common Linux text processing tools:
- Section 4.1 describes text file viewing tools like head, tail, cat
- Section 4.2 covers the grep tool for searching text files
- Section 4.3 defines regular expressions used with grep and other tools
- Section 4.4 presents awk for data extraction and reporting
- Section 4.5 introduces sed for editing text files

The document gives examples of using each tool to view, search, extract, and edit parts of text files.

Uploaded by

Taha
Created @December 8, 2021 6:38 AM

Last Update @December 8, 2021 10:23 PM


4.1 Text tools
4.2 grep (Global Regular Expression Print)
4.3 Regular Expressions
POSIX:
4.4 awk
4.5 sed (Stream Editor)

4.1 Text tools


more // read file contents, one screen at a time
less // more advanced features than "more" // can browse forward (space bar) and backward (Page Up)
head // show the first 10 lines
tail // show the last 10 lines
-n nn // (head and tail) specify the exact number of lines
cat
-A : show all non-printable characters (tab, end of line, ...)
-b : number the non-empty lines (-n numbers all lines)
-s : suppress repeated empty lines
tac // same as cat, but prints the lines in reverse order
cut // extract selected fields or columns from each line
sort // sort output
tr // translate // replaces or deletes single characters, like a per-character find & replace
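
The cat options and tac listed above can be tried on a small throwaway file. This is a minimal sketch, assuming GNU coreutils (cat -A and tac are not available on every system); the file contents and the variable names are made up for illustration.

```shell
# Hypothetical sample: "one", three empty lines, then a line containing a tab.
sample=$(mktemp)
printf 'one\n\n\n\ntwo\tTAB\n' > "$sample"

# -s squeezes the run of three empty lines into a single empty line.
squeezed=$(cat -s "$sample")
echo "$squeezed"

# -b numbers only the non-empty lines.
cat -b "$sample"

# -A shows a tab as ^I and marks each line end with $ (GNU cat).
visible=$(cat -A "$sample" | tail -n 1)
echo "$visible"

# tac prints the last line first.
reversed_first=$(tac "$sample" | head -n 1)
echo "$reversed_first"

rm -f "$sample"
```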

head -n 5 /etc/passwd
head -n 10 /etc/passwd | tail -n 1 // show line number 10
tail -n 3 /etc/passwd
tail -f /var/log/messages

$ head -n 5 /etc/passwd | tail -n 1 // show line number 5 from the file /etc/passwd
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
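
The head | tail combination can be checked on a file whose contents are known in advance. The sketch below uses a generated 12-line file instead of /etc/passwd, so the file and the variable names are assumptions made for the example.

```shell
# Build a small sample file (hypothetical contents) so the result is predictable.
sample=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$sample"

# head keeps the first 5 lines; tail keeps the last of those, i.e. line 5.
fifth=$(head -n 5 "$sample" | tail -n 1)
echo "$fifth"

# The last three lines of the file.
last_three=$(tail -n 3 "$sample")
echo "$last_three"

rm -f "$sample"
```

The same idea works for any line number N: head -n N cuts the file off after line N, and tail -n 1 keeps only that last line.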

cut -f 3 -d : /etc/passwd | less // cut field number 3, where delimiter is ":"
cut -f 3 -d : /etc/passwd | sort | less
cut -f 3 -d : /etc/passwd | sort -n | less // sort as numbers
cut -f 1 -d : /etc/passwd | sort | tr 'a-z' 'A-Z' // all converted to UPPER CASE
cut -f 1 -d : /etc/passwd | sort | tr '[:lower:]' '[:upper:]' // all converted to UPPER CASE // works with special characters // better multi-language support
// quote the tr sets so the shell does not try to expand them as file name patterns
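
The difference between sort and sort -n, and the quoted tr character classes, can be seen on a few passwd-style lines. This is a sketch on invented data (the accounts and IDs are not real); paste is used only to join the results onto one line for display.

```shell
# Hypothetical passwd-style sample (name:password:UID:GID), not real accounts.
sample=$(mktemp)
printf '%s\n' 'root:x:0:0' 'linda:x:1001:1001' 'bin:x:1:1' 'lp:x:4:7' 'games:x:12:100' > "$sample"

# Plain sort compares character by character, so 1001 and 12 come before 4.
lexical=$(cut -f 3 -d : "$sample" | sort | paste -sd ' ' -)
echo "$lexical"

# sort -n compares the numeric values.
numeric=$(cut -f 3 -d : "$sample" | sort -n | paste -sd ' ' -)
echo "$numeric"

# Quoting the tr sets stops the shell from treating [...] as a glob pattern.
upper=$(cut -f 1 -d : "$sample" | sort | tr '[:lower:]' '[:upper:]' | paste -sd ' ' -)
echo "$upper"

rm -f "$sample"
```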

4.2 grep (Global Regular Expression Print)
Finds text in a file or in the output of another command.

ps -aux | grep ssh


grep linda * 2> /dev/null // search for linda in all files in the current directory
// it will show the file names & the lines containing "linda"
grep '\<root\>' * 2> /dev/null // search for the whole word "root" in all files in the current directory
grep -l linda * 2> /dev/null // -l : list only the names of the files that contain a match
grep -i linda * // -i : ignore case
grep -A5 linda /etc/passwd // print the 5 lines after each line that matches linda // useful in logs
grep -B5 linda /etc/passwd // print the 5 lines before each line that matches linda // useful in logs
grep -R root /etc // recursively search /etc for root
grep -Rl root /etc 2> /dev/null | less // -l : list file names only

egrep '^[[:alpha:]]{3}$' * 2> /dev/null // egrep (same as grep -E): all lines that are exactly 3 letters
grep '^...$' * 2> /dev/null // grep all lines that are exactly 3 characters
grep '^endif$' * 2> /dev/null // find lines that are exactly "endif"
grep '\<endif\>' * 2> /dev/null // find "endif" as a whole word anywhere in a line
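
The grep options above can be compared side by side on two small files. This is a sketch assuming GNU grep (for \< and \>); the directory, file names and contents are invented, and -c (count matching lines) is used only to make the results easy to read.

```shell
# Hypothetical sample files in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'root:x:0:0' 'linda:x:1001:1001' 'Linda Smith' 'chroot jail' > "$dir/users.txt"
printf '%s\n' 'if' 'endif' 'endif # done' 'abc' > "$dir/script.txt"

# -i ignores case: both "linda" and "Linda" are counted.
ci_count=$(grep -ci linda "$dir/users.txt")
# -l prints only the names of the files that contain a match.
files=$(cd "$dir" && grep -l linda * 2> /dev/null)
# \<root\> matches "root" as a whole word, so "chroot" is skipped.
word_count=$(grep -c '\<root\>' "$dir/users.txt")
# ^endif$ matches only lines that consist of exactly "endif".
exact_count=$(grep -c '^endif$' "$dir/script.txt")
# \<endif\> matches every line that contains the word "endif".
anywhere_count=$(grep -c '\<endif\>' "$dir/script.txt")
# ^...$ matches lines of exactly three characters.
three=$(grep '^...$' "$dir/script.txt")

echo "$ci_count $files $word_count $exact_count $anywhere_count $three"
rm -rf "$dir"
```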

4.3 Regular Expressions


globbing : applies to file names

Regular Expression : applies to search patterns for text inside a file

grep 'a*' a* // first 'a*' is a regular expression, the pattern to search for inside the files
// second a* is globbing, expanded by the shell to the files whose names start with "a"
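
The two roles of a* can be separated in a throwaway directory. In this sketch the file names and contents are invented, and the pattern 'aa*' (one "a" followed by zero or more "a") is used instead of 'a*', because 'a*' on its own also matches the empty string and therefore every line.

```shell
# Hypothetical files: two names start with "a", one does not.
dir=$(mktemp -d)
printf 'banana\n' > "$dir/apple.txt"
printf 'xyz\n'    > "$dir/avocado.txt"
printf 'aaa\n'    > "$dir/cherry.txt"

# Unquoted a* is expanded by the shell into the matching file names.
globbed=$(cd "$dir" && echo a*)
echo "$globbed"

# Quoted 'aa*' reaches grep unchanged and is applied to the file contents.
# cherry.txt contains "aaa" but is never searched, because its name does not match a*.
matching_files=$(cd "$dir" && grep -l 'aa*' a*)
echo "$matching_files"

rm -rf "$dir"
```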

Regular expressions are used with:

grep

vim

awk

sed

POSIX:
The Portable Operating System Interface is a family of standards specified by the
IEEE Computer Society for maintaining compatibility between operating systems.

The goal of POSIX is to ease the task of cross-platform software development by
establishing a set of guidelines for operating system vendors to follow. Ideally, a
developer should have to write a program only once to run on all POSIX-compliant
systems.

man 7 regex // Regular Expression

$ cat regtext
b
bt
bit
bite
boot
bloat
boat

A regular expression should be enclosed in single quotes ' ' so that the shell does not expand it, for example 'b.*t'.

The period . matches any single character.

Anchoring
The caret ^ and the dollar sign $ are meta-characters that respectively match the
empty string at the beginning and end of a line.
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end
of a word.
The symbol \b matches the empty string at the edge of a word, and \B
matches the empty string provided it's not at the edge of a word.

The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].
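
Anchors and word boundaries can be compared on a handful of lines. This is a sketch assuming GNU grep, where \<, \> and \B are available and the underscore counts as a word character; the sample lines and variable names are invented.

```shell
# Hypothetical sample lines, one per line.
text=$(printf '%s\n' 'cat' 'concat' 'cat_food' 'the cat sat')

# ^cat : lines that begin with "cat" (cat, cat_food).
starts=$(printf '%s\n' "$text" | grep -c '^cat')
# cat$ : lines that end with "cat" (cat, concat).
ends=$(printf '%s\n' "$text" | grep -c 'cat$')
# \<cat\> : "cat" as a whole word (cat, the cat sat); the "_" in cat_food is a word character.
whole=$(printf '%s\n' "$text" | grep -c '\<cat\>')
# \Bcat : "cat" that does not start at the edge of a word (concat).
inner=$(printf '%s\n' "$text" | grep '\Bcat')

echo "$starts $ends $whole $inner"
```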


Repetition
A regular expression may be followed by one of several repetition operators:
? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{,m} The preceding item is matched at most m times. This is a GNU extension.
{n,m} The preceding item is matched at least n times, but not more than m times.
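
The repetition operators can be tried against the regtext word list shown earlier. This is a sketch that recreates the list in a temporary file and uses grep -E (extended regular expressions); the patterns are illustrative choices, not taken from the lesson.

```shell
# Recreate the regtext word list in a temporary file.
regtext=$(mktemp)
printf '%s\n' b bt bit bite boot bloat boat > "$regtext"

# o? : "o" is optional, so only bt fits b + (o or nothing) + t.
optional=$(grep -E '^bo?t$' "$regtext" | paste -sd ' ' -)
# o+ : at least one "o" is required.
one_or_more=$(grep -E '^bo+t$' "$regtext" | paste -sd ' ' -)
# o{2} : exactly two "o".
exactly_two=$(grep -E '^bo{2}t$' "$regtext" | paste -sd ' ' -)
# .{2,3} : any two or three characters between b and t.
two_to_three=$(grep -E '^b.{2,3}t$' "$regtext" | paste -sd ' ' -)

echo "$optional | $one_or_more | $exactly_two | $two_to_three"
rm -f "$regtext"
```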

* is a repetition operator for zero or more of the preceding item.

? belongs to Extended Regular Expressions: it did not work with grep, but it works with egrep.

With * after "o", boat does not match, because * means only that "o" (the preceding character) is repeated zero or more times.
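
The notes above do not show the exact patterns that were used, so the sketch below assumes 'bo*t' and 'bo?t' against the regtext word list. It illustrates why boat is left out and how ? behaves differently in basic and extended mode.

```shell
# Recreate the regtext word list in a temporary file.
regtext=$(mktemp)
printf '%s\n' b bt bit bite boot bloat boat > "$regtext"

# bo*t (assumed pattern): b, then zero or more "o", then t. The "a" in boat breaks the match.
star=$(grep 'bo*t' "$regtext" | paste -sd ' ' -)
# b.*t : b, then any characters, then t.
dot_star=$(grep 'b.*t' "$regtext" | paste -sd ' ' -)
# In basic mode ? is an ordinary character, so no line in the file matches.
basic_q=$(grep -c 'bo?t' "$regtext" || true)
# In extended mode (grep -E, the same as egrep) ? makes "o" optional.
extended_q=$(grep -E 'bo?t' "$regtext" | paste -sd ' ' -)

echo "$star | $dot_star | $basic_q | $extended_q"
rm -f "$regtext"
```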

4.4 awk
awk specializes in data extraction and reporting (the report could, for example, be sent to a printer).

$ awk -F : '/linda/ { print $4 }' /etc/passwd // -F : sets the delimiter, $4 is field number 4
1001
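
A pattern such as /linda/ is tested against the whole line, not only the user name. The sketch below shows the difference on invented passwd-style lines; the accounts, the file and the variable names are assumptions for the example.

```shell
# Hypothetical passwd-style sample; bob's comment field also mentions linda.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'linda:x:1001:1001::/home/linda:/bin/bash' \
  'bob:x:1002:1002:friend of linda:/home/bob:/bin/bash' > "$sample"

# /linda/ matches anywhere on the line, so field 4 of bob's line is printed too.
anywhere=$(awk -F : '/linda/ { print $4 }' "$sample" | paste -sd ' ' -)
# Comparing field 1 restricts the match to the user name.
by_name=$(awk -F : '$1 == "linda" { print $4 }' "$sample")

echo "$anywhere | $by_name"
rm -f "$sample"
```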

awk -F : '{ print $NF }' /etc/passwd // NF is the number of fields, so $NF prints the last field in the line.
// useful when the number of fields is not the same in all lines.
/bin/bash
/sbin/nologin
/sbin/nologin
/sbin/nologin
/sbin/nologin
/bin/sync
/sbin/shutdown
/sbin/halt
/sbin/nologin

// print the last column of ps -aux


$ ps -aux | awk '{ print $NF }'
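
The difference between NF and $NF is easiest to see on lines with different numbers of fields. This sketch uses three invented lines instead of the ps output, with awk's default whitespace delimiter.

```shell
# Hypothetical input: lines with 3, 2 and 1 whitespace-separated fields.
lines=$(printf '%s\n' 'a b c' 'd e' 'f')

# NF is the number of fields on the current line.
counts=$(printf '%s\n' "$lines" | awk '{ print NF }' | paste -sd ' ' -)
# $NF is the value of the last field, whatever its position.
last=$(printf '%s\n' "$lines" | awk '{ print $NF }' | paste -sd ' ' -)

echo "$counts | $last"
```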

$ ls -l /etc | awk '/pass/ { print }' | less


-rw-r--r--. 1 root root 2598 Dec 6 16:04 passwd
-rw-r--r--. 1 root root 2557 Dec 4 23:41 passwd-
(END)

$ ls -l /etc | grep pass
-rw-r--r--. 1 root root 2598 Dec 6 16:04 passwd
-rw-r--r--. 1 root root 2557 Dec 4 23:41 passwd-

4.5 sed (Stream Editor)


$ cat sedfile
one
two
three
four
five

$ sed -n 4p sedfile // -n 4p print line number 4


four

$ sed -i 's/four/FOUR/g' sedfile // -i writes directly to the file // s substitutes (find and replace), g replaces every match on the line
// without -i it will write to stdout
$ cat sedfile
one
two
three
FOUR
five

$
$ sed -n 4p sedfile
FOUR

$ sed -i -e '2d' sedfile // -i modify the file, 2d delete line number 2


$ cat sedfile
one
three
FOUR
five
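
The whole sed sequence can be replayed on a temporary copy of sedfile. This is a sketch: the temporary file stands in for sedfile, and -i.bak is used instead of plain -i so that sed keeps a backup copy of the previous contents.

```shell
# Recreate sedfile in a temporary location.
f=$(mktemp)
printf '%s\n' one two three four five > "$f"

# -n suppresses the normal output; 4p prints only line 4.
line4=$(sed -n '4p' "$f")

# Without -i the result goes to stdout and the file is left unchanged.
preview=$(sed 's/four/FOUR/g' "$f" | sed -n '4p')
unchanged=$(sed -n '4p' "$f")

# With -i the file itself is edited; the .bak suffix keeps a backup copy.
sed -i.bak 's/four/FOUR/g' "$f"
# 2d deletes line number 2.
sed -i.bak '2d' "$f"
final=$(paste -sd ' ' "$f")

echo "$line4 | $preview | $unchanged | $final"
rm -f "$f" "$f.bak"
```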
