Fundamental UNIX
Lecture 6: Unix — text processing(1)
Dr Mahamed Lamine Guindo
Assistant Professor
• Simple String Search//Searches for "hello" in [Link].
grep "hello" [Link]
• Case-Insensitive Search//Searches for "hello" in [Link] without
considering case.
grep -i "hello" [Link]
• Search Recursively//Recursively searches for "hello" in all files within
/path/to/directory.
grep -r "hello" /path/to/directory
• Counting Matches//Counts the number of lines containing "hello" in
[Link].
grep -c "hello" [Link]
• Show Lines Without the Match//Prints every line that does not contain
"hello".
grep -v "hello" [Link]
• Match Whole Words Only//Matches "hello" as a whole word, so
"hello" matches but "hello123" does not.
grep -w "hello" [Link]
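The options above can be compared side by side on one small file. This is only a sketch: the file name greetings.txt and its four lines are invented for illustration, and the commands are assumed to run under a POSIX shell with GNU grep.

```shell
# Hypothetical sample file in a throwaway directory.
demo=$(mktemp -d)
printf '%s\n' 'Hello world' 'say hello' 'hello123' 'goodbye' > "$demo/greetings.txt"

plain=$(grep -c "hello" "$demo/greetings.txt")     # case-sensitive count of matching lines
nocase=$(grep -ci "hello" "$demo/greetings.txt")   # -i also matches "Hello"
whole=$(grep -w "hello" "$demo/greetings.txt")     # whole word only, so "hello123" is skipped
without=$(grep -v "hello" "$demo/greetings.txt")   # lines that do NOT contain "hello"

echo "plain=$plain nocase=$nocase"   # plain=2 nocase=3
echo "$whole"                        # say hello
echo "$without"                      # Hello world, then goodbye (one per line)
grep -r "hello" "$demo"              # recursive: each match is prefixed with its file path
```

Note that -c counts matching lines, not individual occurrences, so a line containing "hello" twice still counts once.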
bzcat and zcat are both command-line tools used to decompress and display the
contents of compressed files directly to the terminal, but they work with different types of
compressed files:
bzcat: Used for files compressed with bzip2 (.bz2).
zcat: Used for files compressed with gzip (.gz).
bzcat
•Purpose: Decompress and display the contents of a bzip2-compressed file (.bz2 file) without
actually uncompressing it on disk.
bzcat [Link].bz2
zcat
•Purpose: Decompress and display the contents of a gzip-compressed file (.gz file) without
uncompressing it on disk.
zcat [Link]
-------------compress-------------------
echo "This is some text" > [Link]
gzip [Link]
• echo "This is some text" > [Link]
• bzip2 [Link]
-------------decompress-------------------
• zcat [Link]
• bzcat [Link].bz2 | grep "text"
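The round trip can be checked end to end. The sketch below assumes GNU gzip (whose zcat reads .gz files) and uses a hypothetical file name, notes.txt. The same pattern should apply to bzip2 and bzcat with a .bz2 file.

```shell
# Hypothetical file in a throwaway directory.
demo=$(mktemp -d)
echo "This is some text" > "$demo/notes.txt"
gzip "$demo/notes.txt"                  # replaces notes.txt with notes.txt.gz

shown=$(zcat "$demo/notes.txt.gz")      # decompressed text goes to standard output only
hits=$(zcat "$demo/notes.txt.gz" | grep -c "text")   # search without unpacking to disk

echo "$shown"    # This is some text
echo "$hits"     # 1
ls "$demo"       # notes.txt.gz  (the compressed file is still the only file)
```

Because zcat writes to standard output, its result can be piped into grep, cut, wc, or sort without creating an uncompressed copy.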
• The cut command is used for extracting specific sections from each
line of files or input data. It’s particularly useful for handling CSV, TSV,
or other text-based data formats.
• cut OPTION [FILE...]
Common Options
•-f : Specifies the fields to cut by number.
•-d : Specifies the delimiter to use.
•-c : Extracts a specific character or range of characters from each line.
•-b : Extracts a specific byte or range of bytes from each line.
• Example 1: Cutting Columns from a CSV File
Assume you have a file [Link] with this content:
Name,Age,Grade,City
Alice,14,A,New York
Bob,15,B,Los Angeles
Charlie,13,C,Chicago
Extract only the Name and Grade columns:
cut -d',' -f1,3 [Link]
• To extract the first 5 characters of each line in a file [Link]:
cut -c1-5 [Link]
Use cut to quickly extract usernames from a list:
• cut -d':' -f1 /etc/passwd
• Changing the Output Delimiter with --output-delimiter
cut -d',' -f1,3 --output-delimiter=';' [Link]
• To exclude a column:
cut -d',' -f3 --complement [Link]
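The cut commands above can be tried on the four-line sample data from Example 1. This is a sketch under two assumptions: the file name students.csv is invented here, and --output-delimiter and --complement are GNU cut options that may be missing from other implementations.

```shell
# Hypothetical file name; the contents are the sample data from Example 1.
demo=$(mktemp -d)
csv="$demo/students.csv"
cat > "$csv" <<'EOF'
Name,Age,Grade,City
Alice,14,A,New York
Bob,15,B,Los Angeles
Charlie,13,C,Chicago
EOF

picked=$(cut -d',' -f1,3 "$csv")                        # fields 1 and 3
semi=$(cut -d',' -f1,3 --output-delimiter=';' "$csv")   # same fields, joined with ';'
dropped=$(cut -d',' -f3 --complement "$csv")            # every field except field 3
chars=$(cut -c1-5 "$csv")                               # first five characters of each line

echo "$picked"    # first two lines: Name,Grade and Alice,A
echo "$semi"      # first two lines: Name;Grade and Alice;A
echo "$dropped"   # first two lines: Name,Age,City and Alice,14,New York
echo "$chars"     # Name, / Alice / Bob,1 / Charl (one per line)
```

The -c output shows that cut counts characters without regard to the delimiter: "Bob,1" includes the comma and the first digit of the age.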
Exercise: cut the fruit field from the following data
• apple:2.50
• banana:1.20
• orange:1.75
• grape:2.00
• Assignment 2: cut the Product and Stock columns
• Product,Category,Price,Stock
• Laptop,Electronics,1200,5
• Tablet,Electronics,600,10
• Chair,Furniture,150,20
• Desk,Furniture,300,15
CUT EXAMPLE
• echo "Name,Age,Location" > [Link]
• echo "John,30,New York" >> [Link]
• echo "Alice,25,Los Angeles" >> [Link]
• echo "Bob,35,Chicago" >> [Link]
• echo "Eve,28,Miami" >> [Link]
• cut -d ',' -f 1,3 [Link]
Example with wc
echo "This is the first line." > [Link]
echo "This is the second line." >> [Link]
echo "And this is the third line." >> [Link]
• Display the number of lines
wc -l [Link]
• To count the number of words
wc -w [Link]
• To count the number of bytes (use -m to count characters instead)
wc -c [Link]
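The three counts can be verified on the three-line file built above. In this sketch the file name lines.txt is hypothetical, and GNU wc is assumed. Reading the file through standard input (< file) makes wc print the number alone, which is convenient when the count is stored in a variable.

```shell
# Hypothetical file name; the contents match the three echo commands above.
demo=$(mktemp -d)
f="$demo/lines.txt"
echo "This is the first line." > "$f"
echo "This is the second line." >> "$f"
echo "And this is the third line." >> "$f"

wc -l "$f"               # prints the count followed by the file name
lines=$(wc -l < "$f")    # from standard input: the number only
words=$(wc -w < "$f")
bytes=$(wc -c < "$f")    # the newline ending each line counts as one byte

echo "$lines $words $bytes"   # 3 16 77
```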
• sort [OPTION]... [FILE]...
Key Options
•-n : Sorts numerically.
•-r : Reverses the sorting order.
•-k : Specifies a key (column) to sort by.
•-t : Sets the delimiter for field separation.
•-o : Writes the result to an output file.
•-u : Removes duplicate lines.
•-b : Ignores leading blanks.
•-f : Ignores case.
•-M : Sorts by month name.
•-c : Checks if a file is sorted, without sorting it.
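Several of these options are usually combined. The sketch below uses an invented name,score file (scores.csv) and assumes GNU sort. The form -k2,2 restricts the key to field 2 only, whereas a bare -k2 would extend the key to the end of the line.

```shell
# Hypothetical name,score data in a throwaway directory.
demo=$(mktemp -d)
f="$demo/scores.csv"
printf '%s\n' 'bob,72' 'alice,95' 'carol,9' 'dave,60' > "$f"

asc=$(sort -t',' -k2,2 -n "$f")       # numeric, ascending on field 2
desc=$(sort -t',' -k2,2 -n -r "$f")   # same key, reversed
text=$(sort -t',' -k2,2 "$f")         # without -n the scores are compared as text

sort -t',' -k2,2 -n -o "$demo/by_score.csv" "$f"   # -o writes the result to a file

# -c does not sort; its exit status is 0 only if the input is already in order.
if sort -c -t',' -k2,2 -n "$f" 2>/dev/null; then original=sorted; else original=unsorted; fi
if sort -c -t',' -k2,2 -n "$demo/by_score.csv"; then copy=sorted; else copy=unsorted; fi

echo "$asc"     # carol,9 / dave,60 / bob,72 / alice,95 (one per line)
echo "$desc"    # alice,95 / bob,72 / dave,60 / carol,9
echo "$text"    # dave,60 / bob,72 / carol,9 / alice,95
echo "$original $copy"   # unsorted sorted
```

The third result shows why -n matters: compared as text, "9" sorts after "72" because only the first character decides.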
Example 1: Basic Alphabetical Sort
Assume you have a file [Link]:
Charlie
Alice
Bob
Eve
David
To sort this file alphabetically:
sort [Link]
Using the same [Link] file, you can reverse the sort order:
sort -r [Link]
Assume you have a file [Link]:
10
2
25
1
17
sort -n [Link]
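The difference between the default order and -n is easiest to see by running both on the same numbers. This sketch assumes a hypothetical file name, numbers.txt, holding the five values listed above.

```shell
# Hypothetical file name; the values are the five numbers from the example.
demo=$(mktemp -d)
printf '%s\n' 10 2 25 1 17 > "$demo/numbers.txt"

as_text=$(sort "$demo/numbers.txt")       # default: compared character by character
as_number=$(sort -n "$demo/numbers.txt")  # -n: compared by numeric value

echo "$as_text"     # 1 / 10 / 17 / 2 / 25 (one per line)
echo "$as_number"   # 1 / 2 / 10 / 17 / 25
```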
• Suppose you have a file [Link] with the following tab-
separated data:
101 John 55000
102 Jane 75000
103 Bob 62000
104 Emma 48000
To sort by the third column (Salary) in ascending order:
sort -k3 -n [Link]
Removing Duplicates with -u
Assume you have a file [Link]:
apple
banana
apple
orange
banana
grape
sort -u [Link]
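The effect of -u can be confirmed by counting lines before and after. The sketch assumes a hypothetical file name, fruits.txt, containing the six fruit lines from the example, and combines sort with wc from earlier in the lecture.

```shell
# Hypothetical file name; the contents are the six lines from the example.
demo=$(mktemp -d)
printf '%s\n' apple banana apple orange banana grape > "$demo/fruits.txt"

unique=$(sort -u "$demo/fruits.txt")           # sorted, each distinct line kept once
total=$(wc -l < "$demo/fruits.txt")            # lines in the original file
distinct=$(sort -u "$demo/fruits.txt" | wc -l) # lines left after removing duplicates

echo "$unique"             # apple / banana / grape / orange (one per line)
echo "$total $distinct"    # 6 4
```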