Python Notes Mod4
You’ll first learn how to open and read information from files.
After that, you’ll learn about the different techniques for reading files, and then you’ll see several
case studies that use the various techniques.
Text files, on the other hand, don’t contain any style information. They contain only readable
characters. You can open a text file in any text editor and read it. You can’t include style
information in text files, but you gain a lot in portability. Plain-text files take up very little disk
space. Compare the size of an empty text file to “empty” OpenOffice, Apple Pages, and Microsoft
Word documents.
They take up little disk space and are easy to process. The power comes from applications that can
process text files that are written with a particular syntax. The Python programs we have been
writing are text files, and by themselves they are only letters in a file. But combined with a Python
interpreter, these Python text files are robust: you can express a powerful algorithm and the
interpreter will follow your instructions.
Similarly, web browsers read and process HTML files, spreadsheets read and process comma-
separated value files, calendar programs read and process calendar data files, and other
programming language applications read and process files written with a particular programming
language syntax.
Opening a File
• Built-in function open opens a file and returns an object that knows how to get
information from the file, how much you’ve read, and which part of the file you’re about
to read next.
• If you call open with only the name of the file (omitting the mode), then the default is 'r'.
file is a path-like object giving the pathname (absolute or relative to the current working directory)
of the file to be opened or an integer file descriptor of the file to be wrapped.
mode is an optional string that specifies the mode in which the file is opened. It defaults
to 'r' which means open for reading in text mode. Other common values are 'w' for writing
(truncating the file if it already exists), 'x' for exclusive creation and 'a' for appending (which
on some Unix systems, means that all writes append to the end of the file regardless of the current
seek position). In text mode, if encoding is not specified the encoding used is platform
dependent: locale.getpreferredencoding(False) is called to get the current locale
encoding.
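Character   Meaning
'r'         open for reading (default)
'w'         open for writing, truncating the file first
'x'         open for exclusive creation, failing if the file already exists
'a'         open for writing, appending to the end of the file if it exists
'b'         binary mode
't'         text mode (default)
'+'         open for updating (reading and writing)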
Example:
file = open('file_example.txt', 'r')
contents = file.read()
print(contents)
file.close()
The first argument in the example call on function open, 'file_example.txt', is the name of the file
to open, and the second argument, 'r', tells Python that you want to read the file; this is called the
file mode. Other options for the mode include 'w' for writing and 'a' for appending. If you call open
with only the name of the file (omitting the mode), then the default is 'r'.
The second statement, contents = file.read(), tells Python that you want to read the
contents of the entire file into a string, which we assign to a variable called contents.
The third statement prints that string. When you run the program, you’ll see that newline characters
are treated just like every other character; a newline character is just another character in the file.
The last statement, file.close(), releases all resources associated with the open file object.
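The same steps can also be written with a with statement, which closes the file automatically. The sketch below is only an illustration: the temporary directory, the sample text, and the explicit encoding='utf-8' argument are assumptions made so that the code runs on its own.

```python
import os
import tempfile

# Hypothetical sample file, created in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), 'file_example.txt')
with open(path, 'w', encoding='utf-8') as file:
    file.write('First line of text.\nSecond line of text.\n')

# The with statement closes the file when the block ends,
# even if an error occurs inside the block.
with open(path, 'r', encoding='utf-8') as file:
    contents = file.read()

print(contents)
print(file.closed)  # the file object is already closed here
```

Passing encoding explicitly avoids depending on the platform default described earlier.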
This example reads the contents of a file into a list of strings and then prints that list:
print(lines)
Here is the output:
Mars
Earth
Venus
Mercury
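The output above lists the planets from Mars back to Mercury, which is what you get when the list returned by readlines is printed in reverse order. A minimal sketch, assuming a file that holds the four planet names one per line (the file name planets.txt and the temporary directory are assumptions for the demo):

```python
import os
import tempfile

# Assumed sample data: one planet name per line.
path = os.path.join(tempfile.mkdtemp(), 'planets.txt')
with open(path, 'w') as file:
    file.write('Mercury\nVenus\nEarth\nMars\n')

with open(path, 'r') as file:
    lines = file.readlines()  # a list of strings, one per line

print(lines)  # each string still ends with its newline character

# Walk the list from the last line to the first.
for line in reversed(lines):
    print(line.strip())
```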
We can use the Readlines technique to read the file, sort the lines, and print the planets
alphabetically (here, we use built-in function sorted, which returns the items in the list in order
from smallest to largest):
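A short sketch of that idea, again assuming a sample file of four planet names (the file name and temporary directory are assumptions). For strings, "smallest to largest" means alphabetical order by character code:

```python
import os
import tempfile

# Assumed sample data: one planet name per line.
path = os.path.join(tempfile.mkdtemp(), 'planets.txt')
with open(path, 'w') as file:
    file.write('Mercury\nVenus\nEarth\nMars\n')

with open(path, 'r') as file:
    lines = file.readlines()

# sorted returns a new list; the original list is left unchanged.
sorted_lines = sorted(lines)
for line in sorted_lines:
    print(line.strip())
```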
Use this technique when you want to do the same thing to every line from the file cursor to the
end of a file. On each iteration, the file cursor is moved to the beginning of the next line.
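As an illustration of looping over a file directly, the sketch below prints the length of each line. The sample file and its contents are assumptions made for the demo:

```python
import os
import tempfile

# Assumed sample data, written to a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), 'planets.txt')
with open(path, 'w') as file:
    file.write('Mercury\nVenus\nEarth\nMars\n')

lengths = []
with open(path, 'r') as file:
    for line in file:          # one line per iteration
        name = line.strip()    # drop the trailing newline
        lengths.append((name, len(name)))
        print(name, len(name))
```

Unlike readlines, this approach never holds the whole file in memory at once.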
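Write a Python program to read the first 10 lines and the last 5 lines of a text file:
# 'file_example.txt' is a placeholder; use the name of your own file.
with open('file_example.txt', 'r') as file:
    lines = file.readlines()
print("First 10 lines")
for line in lines[:10]:
    print(line.strip())
print("Last 5 lines")
for line in lines[-5:]:
    print(line.strip())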
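The Readline technique reads one line at a time, which makes it easy to skip a header before
processing the data:
with open('hopedale.txt', 'r') as hopedale_file:
    hopedale_file.readline()  # Skip the description line.
    # Keep reading until the first line that is not a '#' comment.
    data = hopedale_file.readline().strip()
    while data.startswith('#'):
        data = hopedale_file.readline().strip()
    # The first data value starts the running total.
    total_pelts = int(data)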
Writing Files
This program opens a file called topics.txt, writes the words Computer Science to the file, and then
closes the file:
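A minimal sketch of such a program follows. It writes topics.txt into a temporary directory (an assumption made so the demo leaves no file behind) and uses a with statement, which closes the file at the end of the block:

```python
import os
import tempfile

# Write into a temporary directory so the demo does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'topics.txt')

with open(path, 'w') as output_file:
    count = output_file.write('Computer Science')

print(count)  # number of characters written

# Read the file back to check what was stored.
with open(path, 'r') as input_file:
    written = input_file.read()
print(written)
```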
In addition to writing characters to a file, method write returns the number of characters written.
For example, output_file.write('Computer Science') returns 16. To create a new file or to replace
the contents of an existing file, we use write mode ('w'). If the filename doesn’t exist already, then
a new file is created; otherwise the file contents are erased and replaced. Once opened for writing,
you can use method write to write a string to the file.
Rather than replacing the file contents, we can also add to a file using the append mode ('a'). When
we write to a file that is opened in append mode, the data we write is added to the end of the file
and the current file contents are not overwritten. For example, to add to our previous file topics.txt,
we can append the words Software Engineering:
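A minimal sketch of the append step, assuming topics.txt already holds the words Computer Science (the temporary directory is an assumption for the demo):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'topics.txt')

# Start from the file produced by the earlier write example.
with open(path, 'w') as output_file:
    output_file.write('Computer Science')

# Mode 'a' keeps the existing contents and adds to the end.
with open(path, 'a') as output_file:
    output_file.write('Software Engineering')

with open(path, 'r') as input_file:
    contents = input_file.read()
print(contents)
```

Method write does not add a newline, so the two strings run together unless you write '\n' yourself.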
The next example, in a file called total.py, is more complex, and it involves both reading from and
writing to a file. Our input file contains two numbers per line separated by a space. The output file
will contain three numbers: the two from the input file and their sum (all separated by spaces):
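One way to write such a function is sketched below. The name sum_num_pairs and its two parameters follow the call used in these notes; the body is an assumption about how it could be implemented, and the temporary directory exists only to make the demo runnable.

```python
import os
import tempfile


def sum_num_pairs(input_file, output_filename):
    """ (file open for reading, str) -> NoneType
    For each line of input_file (two numbers separated by a space), write
    a line to output_filename holding the two numbers and their sum.
    """
    with open(output_filename, 'w') as output_file:
        for number_pair in input_file:
            number_pair = number_pair.strip()
            operands = number_pair.split()
            total = float(operands[0]) + float(operands[1])
            output_file.write('{0} {1}\n'.format(number_pair, total))


# Demo with the three sample pairs used in these notes.
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, 'num_pairs.txt')
out_path = os.path.join(tmp_dir, 'out.txt')
with open(in_path, 'w') as f:
    f.write('1.3 3.4\n2 4.2\n-1 1\n')

with open(in_path, 'r') as input_file:
    sum_num_pairs(input_file, out_path)

with open(out_path, 'r') as f:
    result_lines = f.read().splitlines()
print(result_lines)
```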
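Assume that a file called num_pairs.txt exists with these contents:
1.3 3.4
2 4.2
-1 1
Then the call sum_num_pairs(open('num_pairs.txt', 'r'), 'out.txt') creates the file out.txt, in
which each line holds the two numbers from the matching input line followed by their sum.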
def skip_header(reader):
    """ (file open for reading) -> str
    Skip the header in reader and return the first real piece of data.
    """
    # Read the description line
    line = reader.readline()
    # Find the first non-comment line
    line = reader.readline()
    while line.startswith('#'):
        line = reader.readline()
    # Now line contains the first real piece of data
    return line
def process_file(reader):
    """ (file open for reading) -> NoneType
    Read and print the data from reader, which must start with a single
    description line, then a sequence of lines beginning with '#', then
    a sequence of data.
    """
    # Find and print the first piece of data
    line = skip_header(reader).strip()
    print(line)
    # Read and print the rest of the data
    for line in reader:
        print(line.strip())
if __name__ == '__main__':
    with open('hopedale.txt', 'r') as input_file:
        process_file(input_file)
This program processes the Hopedale data set to find the smallest number of fox pelts produced in
any year. As we progress through the file, we keep the smallest value seen so far in a variable
called smallest. That variable is initially set to the value on the first line, since it’s the smallest (and
only) value seen so far:
def smallest_value(reader):
    """ (file open for reading) -> int
    Read and process reader and return the smallest value after the
    time_series header.
    """
    line = skip_header(reader).strip()
    # Now line contains the first data value; this is the smallest value
    # found so far, because it is the only one we have seen.
    smallest = int(line)
    for line in reader:
        value = int(line.strip())
        # If we find a smaller value, remember it.
        if value < smallest:
            smallest = value
    return smallest
The hyphen indicates that data for the year 1836 is missing. Unfortunately, calling smallest_value
on the Hebron data fails with a ValueError.
The problem is that '-' isn’t an integer, so calling int('-') fails. This isn’t an isolated problem. In
general, we will often need to skip blank lines, comments, or lines containing other “nonvalues”
in our data. Real data sets often contain omissions or contradictions; dealing with them is just a
fact of scientific life.
To fix our code, we must add a check inside the loop that processes a line only if it contains a real
value. In the TSDL data sets, missing entries are always marked with hyphens, so we just need to
check for that before trying to convert the string we have read to an integer:
def smallest_value_skip(reader):
    line = skip_header(reader).strip()
    # Now line contains the first data value; this is the smallest value
    # found so far, because it is the only one we have seen.
    smallest = int(line)
    for line in reader:
        line = line.strip()
        if line != '-':
            value = int(line)
            smallest = min(smallest, value)
    return smallest
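One limitation of this version is that it still calls int on the first data value, so it fails if that value is itself a hyphen. The sketch below is a hypothetical variant (the name smallest_value_skip_all and the sample data are made up for illustration) that treats every data line the same way and also ignores blank lines:

```python
import io


def skip_header(reader):
    """Skip the description line and '#' comment lines.
    Return the first data line."""
    line = reader.readline()  # description line
    line = reader.readline()
    while line.startswith('#'):
        line = reader.readline()
    return line


def smallest_value_skip_all(reader):
    """Hypothetical variant: return the smallest value after the header,
    or None if the file holds no real values."""
    smallest = None
    line = skip_header(reader)
    while line:  # readline returns '' at end of file
        line = line.strip()
        if line and line != '-':
            value = int(line)
            if smallest is None or value < smallest:
                smallest = value
        line = reader.readline()
    return smallest


# Made-up sample data, held in memory instead of a real file.
sample = io.StringIO('Sample pelt counts\n# comment\n-\n22\n-\n29\n2\n')
result = smallest_value_skip_all(sample)
print(result)
```

Returning None for a file with no values is a design choice of this sketch; raising an error would be another reasonable option.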
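The next program finds the largest value in a file such as lynx.txt, where each line holds several
whitespace-separated integers, each followed by a period:
def find_largest(line):
    """ (str) -> int
    Return the largest value in line, a whitespace-delimited string of
    integers that each end with a '.'.
    """
    # The largest value seen so far.
    largest = -1
    for value in line.split():
        # Remove the trailing period.
        v = int(value[:-1])
        # If we find a larger value, remember it.
        if v > largest:
            largest = v
    return largest
def process_file(reader):
    """ (file open for reading) -> int
    Read and process reader, which must start with a time_series header.
    Return the largest value after that header.
    """
    line = skip_header(reader).strip()
    largest = find_largest(line)
    # Check the rest of the lines for larger values.
    for line in reader:
        large = find_largest(line)
        if large > largest:
            largest = large
    return largest
if __name__ == '__main__':
    with open('lynx.txt', 'r') as input_file:
        print(process_file(input_file))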
Multiline Records
Not every data record will fit onto a single line. Here is a file in simplified Protein Data Bank
(PDB) format that describes the arrangements of atoms in ammonia:
COMPND AMMONIA
ATOM 1 N 0.257 -0.363 0.000
ATOM 2 H 0.257 0.727 0.000
ATOM 3 H 0.771 -0.727 0.890
ATOM 4 H 0.771 -0.727 -0.890
END
The first line is the name of the molecule. All subsequent lines down to the one containing END
specify the ID, type, and XYZ coordinates of one of the atoms in the molecule.
Reading this file is straightforward using the techniques we have built up in this chapter. But what
if the file contained two or more molecules, like this:
COMPND AMMONIA
ATOM 1 N 0.257 -0.363 0.000
ATOM 2 H 0.257 0.727 0.000
ATOM 3 H 0.771 -0.727 0.890
ATOM 4 H 0.771 -0.727 -0.890
END
COMPND METHANOL
ATOM 1 C -0.748 -0.015 0.024
ATOM 2 O 0.558 0.420 -0.278
ATOM 3 H -1.293 -0.202 -0.901
ATOM 4 H -1.263 0.754 0.600
ATOM 5 H -0.699 -0.934 0.609
ATOM 6 H 0.716 1.404 0.137
END
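As always, we tackle this problem by dividing it into smaller ones and solving each of those in turn.
Our first algorithm is as follows: while there are more molecules in the file, read a molecule from
the file and append it to the list of molecules read so far.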
def read_molecule(reader):
    """ (file open for reading) -> list or NoneType
    Read a single molecule from reader and return it, or return None
    to signal end of file. The first item in the result is the name
    of the compound; each list contains an atom type and the X, Y,
    and Z coordinates of that atom.
    """
    # If there isn't another line, we're at the end of the file.
    line = reader.readline()
    if not line:
        return None
    # Name of the molecule: "COMPND name"
    key, name = line.split()
    # Other lines are either "END" or "ATOM num atom_type x y z"
    molecule = [name]
    line = reader.readline()
    # Parse all the atoms in the molecule.
    while not line.startswith('END'):
        key, num, atom_type, x, y, z = line.split()
        molecule.append([atom_type, x, y, z])
        line = reader.readline()
    return molecule
def read_all_molecules(reader):
    """ (file open for reading) -> list
    Read zero or more molecules from reader, returning a list of the
    molecule information.
    """
    # The list of molecule information.
    result = []
    reading = True
    while reading:
        molecule = read_molecule(reader)
        if molecule:  # None is treated as False in an if statement
            result.append(molecule)
        else:
            reading = False
    return result
if __name__ == '__main__':
    with open('multimol.pdb', 'r') as molecule_file:
        molecules = read_all_molecules(molecule_file)
    for m in molecules:
        print(m)
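To try the two functions without a file on disk, the data can be held in memory with io.StringIO. The sketch below is a compact restatement rather than the original listing: the sample text is shortened to two atoms per molecule, and converting the coordinates to float is an added assumption.

```python
import io


def read_molecule(reader):
    """Read one molecule from reader, or return None at end of file."""
    line = reader.readline()
    if not line:
        return None
    key, name = line.split()
    molecule = [name]
    line = reader.readline()
    while not line.startswith('END'):
        key, num, atom_type, x, y, z = line.split()
        # Assumption: store coordinates as floats rather than strings.
        molecule.append([atom_type, float(x), float(y), float(z)])
        line = reader.readline()
    return molecule


def read_all_molecules(reader):
    """Read molecules until read_molecule signals end of file."""
    result = []
    molecule = read_molecule(reader)
    while molecule:
        result.append(molecule)
        molecule = read_molecule(reader)
    return result


# Shortened sample: two atoms per molecule, each record closed by END.
sample = io.StringIO(
    'COMPND AMMONIA\n'
    'ATOM 1 N 0.257 -0.363 0.000\n'
    'ATOM 2 H 0.257 0.727 0.000\n'
    'END\n'
    'COMPND METHANOL\n'
    'ATOM 1 C -0.748 -0.015 0.024\n'
    'ATOM 2 O 0.558 0.420 -0.278\n'
    'END\n')
molecules = read_all_molecules(sample)
for m in molecules:
    print(m)
```

Note that every record must end with an END line; without it, read_molecule runs past the end of the data and fails when it tries to split an empty line.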
1. What is the use of the open() function in Python? Explain with its syntax and an example.
6. Write a Python program to read the first 10 lines and the last 5 lines of a text file.
10. Write a Python program to display the contents of a file in sorted order.
11. Write a Python program to display the length of each line of a text file.
12. Consider the following hopedale.txt. Write a Python program to calculate the total.
16. Design a Python program to read the contents of a text file and write the same contents to another
file.
17. Consider num_pairs.txt. Read the data, find the total for each line, and write the results to a new file.
18. Write an algorithm to skip the header information of a text file and display the file contents.
19. Design an algorithm to find the smallest number among the contents of a file.
20. Design an algorithm to find the largest number among the data set in the following file.
A set is an unordered collection of distinct items. A set literal looks much like a list, except
that sets use braces (that is, { and }) instead of brackets (that is,
[ and ]). Notice that, when displayed in the shell, the set is unordered. Python does some
mathematical tricks behind the scenes to make accessing the items very fast, and one of the side
effects of this is that the items aren’t in any particular order. Here we show that each item is distinct;
duplicates are ignored:
>>> vowels = {'a', 'e', 'a', 'a', 'i', 'o', 'u', 'u'}
>>> vowels
{'u', 'o', 'i', 'e', 'a'}
Even though there were three 'a's and two 'u's when we created the set, only one of each was kept.
Python considers the two sets to be equal:
>>> {'a', 'e', 'i', 'o', 'u'} == {'a', 'e', 'a', 'i', 'o', 'u', 'u'}
True
The reason they are equal is that they contain the same items. Again, order doesn’t matter, and only
one of each element is kept. Variable vowels refers to an object of type set:
>>> type(vowels)
<class 'set'>
>>> type({1, 2, 3})
<class 'set'>
Function set()
Function set expects either no arguments (to create an empty set) or a single argument that is a
collection of values. We can, for example, create a set from a list:
Note: Function set expects at most one argument. You can’t pass several values as separate
arguments:
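Both points can be checked with a short sketch: building a set from a list, and seeing what happens when several values are passed as separate arguments. The variable names are chosen for illustration only.

```python
# A set built from a list keeps one copy of each item.
vowels = set(['a', 'e', 'a', 'a', 'i', 'o', 'u', 'u'])
print(vowels == {'a', 'e', 'i', 'o', 'u'})

# With no arguments, set creates an empty set.
empty = set()
print(len(empty))

# Passing several separate values raises a TypeError.
try:
    set(2, 3, 5)
except TypeError as error:
    message = str(error)
    print('TypeError:', message)
```

To build a set from several values, put them in a single collection first, for example set([2, 3, 5]).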
In addition to lists, there are a couple of other types that can be used as arguments to function set.
One is a set:
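A minimal sketch of passing a set to function set. The result is a new, independent set, so changing the copy leaves the original alone (the names below are illustrative):

```python
vowels = {'a', 'e', 'i', 'o', 'u'}

# set(vowels) builds a new set with the same items.
copy_of_vowels = set(vowels)
print(copy_of_vowels == vowels)   # same contents
print(copy_of_vowels is vowels)   # but a different object

# Changing the copy does not affect the original.
copy_of_vowels.add('y')
print(len(vowels), len(copy_of_vowels))
```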
Another type is range, the type produced by function range. In the following code a set is created
with the values 0 to 4 inclusive:
>>> set(range(5))
{0, 1, 2, 3, 4}
Set Operations
In mathematics, set operations include union, intersection, add, and remove. In Python, these are
implemented as methods (for a complete list see the following table)
Other methods, such as intersection and union, return new sets based on their arguments. In the
following code, we show all of these methods in action:
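A minimal sketch of these methods, using two small sets of integers (the names lows and odds are chosen for illustration). Methods add and remove change the set they are called on, while intersection, union, and difference return new sets:

```python
lows = {0, 1, 2, 3, 4}
odds = {1, 3, 5, 7, 9}

# add and remove modify lows in place.
lows.add(9)
lows.remove(0)
print(lows == {1, 2, 3, 4, 9})

# These methods return new sets and leave lows and odds unchanged.
both = lows.intersection(odds)
either = lows.union(odds)
only_lows = lows.difference(odds)
print(both, either, only_lows)

# issubset tests whether every item of one set is in another.
print({1, 3}.issubset(odds))
```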
Many of the tasks performed by methods can also be accomplished using operators. If acids and
bases are two sets, for example, then acids | bases creates a new set containing their union (that is,
all the elements from both acids and bases), while acids <= bases tests whether all the values in
acids are also in bases.
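The operator forms can be sketched as follows. The contents of acids and bases are made-up sample values for illustration:

```python
# Made-up sample contents for the two sets.
acids = {'HCl', 'HNO3'}
bases = {'NaOH', 'KOH'}

everything = acids | bases     # union
common = acids & bases         # intersection
acids_only = acids - bases     # difference

print(everything)
print(common == set())         # the two sets share no items
print(acids <= bases)          # is acids a subset of bases?
print(acids <= everything)     # every acid is in the union
```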
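Sets:
• Written with braces: { } (an empty set is written set(), because {} creates an empty dictionary)
• Cannot contain duplicates
• Unordered, so elements cannot be accessed using an index
• Mathematical set operations can be carried out
Lists:
• Written with brackets: [ ]
• Can contain duplicate elements
• Ordered, so elements can be accessed using an index
• Double and triple indexing is possible (for sublists)
• No special mathematical operations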
Tuples
• A tuple is an immutable sequence of Python objects.
• Tuples are sequences, just like lists. The differences are that tuples cannot be changed, unlike
lists, and that tuples use parentheses, whereas lists use square brackets.
• Creating a tuple is as simple as writing comma-separated values. Optionally, you can put these
values between parentheses; for example, 1, 2, 3 and (1, 2, 3) create the same tuple.
• There’s one small catch: although () represents the empty tuple, a tuple with one element is not
written as (x) but as (x,) (with a trailing comma).
• If the trailing comma weren’t required, (5 + 3) could mean either 8 or the tuple containing only
the value 8:
>>> (8)
8
>>> type((8))
<class 'int'>
>>> (8,)
(8,)
>>> type((8,))
<class 'tuple'>
>>> (5 + 3)
8
>>> (5 + 3,)
(8,)
• To access values in a tuple, use square brackets with an index to get a single item, or with a
slice to get several items.
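The points above about creating tuples and accessing their items can be sketched as follows (the values are arbitrary sample data):

```python
# Parentheses are optional when creating a tuple.
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = 1, 2, 3, 4, 5

# A one-element tuple needs the trailing comma.
single = (8,)
not_a_tuple = (8)

print(tup1[0])     # first item
print(tup1[-1])    # last item
print(tup2[1:4])   # a slice returns a new tuple
print(type(single), type(not_a_tuple))
```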
Updating Tuples
• Tuples are immutable which means you cannot update or change the values of tuple elements.
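A minimal sketch of what immutability means in practice: assigning to an item raises a TypeError, but you can always build a new tuple from existing ones (the values are arbitrary sample data):

```python
tup1 = (12, 34.56)
tup2 = ('abc', 'xyz')

# Item assignment is not allowed on a tuple.
try:
    tup1[0] = 100
except TypeError as error:
    message = str(error)
    print('TypeError:', message)

# Concatenation creates a new tuple; tup1 and tup2 are unchanged.
tup3 = tup1 + tup2
print(tup3)
```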
India 91
USA 1
UK 41
Japan 9
List:
lst1 = [['India', 91], ['USA', 1], ['UK', 41]]
Tuple:
tup3 = (('UK', 41), ('India', 91), ('USA', 1))
Python Dictionary
• A dictionary, also known as a map, is a mutable collection of key/value pairs. Since Python 3.7,
a dictionary keeps its items in insertion order.
• Dictionaries are created by putting key/value pairs inside braces.
• Each key is separated from its value by a colon (:), the items are separated by commas, and the
whole thing is enclosed in curly braces. An empty dictionary without any items is written with
just two curly braces, like this: {}.
>>> bird_to_observations = {}
>>> bird_to_observations['snow goose'] = 33
>>> bird_to_observations['eagle'] = 999
>>> bird_to_observations
{'snow goose': 33, 'eagle': 999}
>>> bird_to_observations['eagle'] = 9
>>> bird_to_observations
{'snow goose': 33, 'eagle': 9}
Notice that the values in the dictionary are ignored; the in operator only looks at the keys.
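A short sketch of that behaviour, using a small dictionary of bird counts as sample data:

```python
bird_to_observations = {'snow goose': 33, 'eagle': 9}

# in checks the keys only.
print('eagle' in bird_to_observations)
print(33 in bird_to_observations)

# To test for a value, look in values() explicitly.
print(33 in bird_to_observations.values())
```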
• For dictionaries, the loop variable is assigned each key from the dictionary in turn:
>>> birds = {'canada goose': 183, 'long-tailed jaeger': 71, 'snow goose': 63,
...          'northern fulmar': 1}
>>> for key in birds:
...     print(key, birds[key])
Methods of Dictionaries
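As a rough illustration, the sketch below exercises a few commonly used dictionary methods (keys, values, items, get, pop, and update). The selection is an assumption about which methods this section covers; the data is the birds dictionary used above.

```python
birds = {'canada goose': 183, 'long-tailed jaeger': 71,
         'snow goose': 63, 'northern fulmar': 1}

# keys, values, and items give views of the dictionary's contents.
keys = list(birds.keys())
values = list(birds.values())
items = list(birds.items())
print(keys)
print(values)

# get returns a default instead of raising KeyError for a missing key.
eagle_count = birds.get('eagle', 0)

# pop removes a key and returns its value.
removed = birds.pop('northern fulmar')

# update adds or replaces entries from another dictionary.
birds.update({'eagle': 9, 'snow goose': 64})
print(birds)
```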
Comparing Collections