
Python Notes Mod4


Module 4: Reading and writing text files

What do we learn in this chapter?


In this chapter, you’ll learn about different file formats, common ways to organize data, and how
to read and write that data using Python.

You’ll first learn how to open and read information from files.
After that, you’ll learn about the different techniques for reading files, and then you’ll see several
case studies that use the various techniques.

What Kinds of Files Are There?


There are many kinds of files. Text files, music files, videos, and various word processor and
presentation documents are common. Text files only contain characters; all the other file formats
include formatting information that is specific to that particular file format, and in order to use a
file in a particular format you need a special program that understands that format.

Text files, on the other hand, don’t contain any style information. They contain only readable
characters. You can open a text file in any text editor and read it. You can’t include style
information in text files, but you gain a lot in portability. Plain-text files take up very little disk
space. Compare the size of an empty text file to “empty” OpenOffice, Apple Pages, and Microsoft
Word documents.

Text files are also easy to process. Their power comes from applications that can process text files
written with a particular syntax. The Python programs we have been writing are text files, and by
themselves they are only letters in a file. But combined with a Python interpreter, these Python text
files become powerful: you can express an algorithm and the interpreter will follow your instructions.

Similarly, web browsers read and process HTML files, spreadsheets read and process comma-
separated value files, calendar programs read and process calendar data files, and other
programming language applications read and process files written with a particular programming
language syntax.

Opening a File
• Built-in function open opens a file and returns an object that knows how to get
information from the file, how much you’ve read, and which part of the file you’re about
to read next.

Prepared By: Dr. Rama Satish K V, Asst Professor, RNSIT, Bengaluru


• The marker that keeps track of the current location in the file is called a file cursor.
• The file cursor is initially at the beginning of the file, but as we read or write data it
moves.

Syntax: open(file, mode='r')

file is a path-like object giving the pathname (absolute or relative to the current working directory)
of the file to be opened or an integer file descriptor of the file to be wrapped.

mode is an optional string that specifies the mode in which the file is opened. It defaults
to 'r' which means open for reading in text mode. Other common values are 'w' for writing
(truncating the file if it already exists), 'x' for exclusive creation and 'a' for appending (which
on some Unix systems, means that all writes append to the end of the file regardless of the current
seek position). In text mode, if encoding is not specified the encoding used is platform
dependent: locale.getpreferredencoding(False) is called to get the current locale
encoding.

Character   Meaning
'r'         open for reading (default)
'w'         open for writing, truncating the file first
'x'         open for exclusive creation, failing if the file already exists
'a'         open for writing, appending to the end of the file if it exists
'b'         binary mode
't'         text mode (default)
'+'         open a disk file for updating (reading and writing)
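A minimal sketch of the 'w', 'a', 'x', and default 'r' modes from the table above. It works inside a temporary directory created with tempfile, and the file name demo.txt is a hypothetical placeholder, so nothing outside that directory is touched.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')  # hypothetical file name

    # 'w' creates the file (or truncates an existing one) and writes to it.
    with open(path, 'w') as f:
        f.write('first\n')

    # 'a' keeps the existing contents and adds to the end.
    with open(path, 'a') as f:
        f.write('second\n')

    # No mode given, so the default 'r' is used to read the file back.
    with open(path) as f:
        contents = f.read()
    print(repr(contents))  # 'first\nsecond\n'

    # 'x' refuses to open a file that already exists.
    try:
        open(path, 'x')
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True
    print(exclusive_failed)  # True
```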

Example:
file = open('file_example.txt', 'r')
contents = file.read()
print(contents)
file.close()

The first argument in the example call on function open, 'file_example.txt', is the name of the file
to open, and the second argument, 'r', tells Python that you want to read the file; this is called the
file mode. Other options for the mode include 'w' for writing and 'a' for appending. If you call open
with only the name of the file (omitting the mode), then the default is 'r'.

The second statement, contents = file.read(), tells Python that you want to read the
contents of the entire file into a string, which we assign to a variable called contents.

The third statement prints that string. When you run the program, you’ll see that newline characters
are treated just like every other character; a newline character is just another character in the file.
The last statement, file.close(), releases all resources associated with the open file object.

The with Statement


Because every call on function open should have a corresponding call on method close, Python
provides a with statement that automatically closes a file when the end of the block is reached.
Here is the same example using a with statement:

with open('file_example.txt', 'r') as file:
    contents = file.read()
print(contents)

The general form of a with statement is as follows:

with open(«filename», «mode») as «variable»:
    «block»
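To see that the with statement really closes the file, the sketch below checks the file object's closed attribute inside and after the block. It first writes a stand-in file_example.txt into a temporary directory (using the three lines shown in the readlines output in this module), so that path and content are assumptions of the sketch.

```python
import os
import tempfile

# Write a stand-in file_example.txt into a temporary directory.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'file_example.txt')
with open(path, 'w') as out:
    out.write('First line of text.\nSecond line of text.\nThird line of text.\n')

with open(path, 'r') as file:
    contents = file.read()
    print(file.closed)  # False: still inside the with block

print(file.closed)      # True: leaving the block closed the file
print(contents)

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(folder)
```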

The Read Technique


Use this technique when you want to read the contents of a file into a single string, or when you
want to specify exactly how many characters to read.

with open('file_example.txt', 'r') as file:
    contents = file.read()
print(contents)

When method read is called with no arguments, it reads everything from the current file cursor all
the way to the end of the file and moves the file cursor to the end of the file. When called with one
integer argument, it reads that many characters and moves the file cursor just past the characters
that were read.


with open('file_example.txt', 'r') as example_file:
    first_ten_chars = example_file.read(10)
    the_rest = example_file.read()

print("The first 10 characters:", first_ten_chars)
print("The rest of the file:", the_rest)
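Once the cursor reaches the end of the file, further calls to read return an empty string rather than raising an error. The sketch below uses io.StringIO, an in-memory text stream with the same read method, as a stand-in for file_example.txt; its text is taken from the readlines output shown in this module.

```python
import io

text = 'First line of text.\nSecond line of text.\nThird line of text.\n'

with io.StringIO(text) as example_file:
    first_ten_chars = example_file.read(10)  # cursor moves past 10 characters
    the_rest = example_file.read()           # reads from the cursor to the end
    at_end = example_file.read()             # cursor is already at the end

print(repr(first_ten_chars))  # 'First line'
print(repr(at_end))           # ''
```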

The Readlines Technique


Use this technique when you want to get a Python list of strings containing the individual lines
from a file. Method readlines works much like method read, except that it splits the contents into a
list of strings, one per line. As with read, the file cursor is moved to the end of the file.

This example reads the contents of a file into a list of strings and then prints that list:

with open('file_example.txt', 'r') as example_file:
    lines = example_file.readlines()

print(lines)
Here is the output:

['First line of text.\n', 'Second line of text.\n', 'Third line of text.\n']


Take a close look at that list; you’ll see that each line ends in a \n character. Python does not
remove any characters from what is read; it only splits them into separate strings.
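If the newline characters are not wanted, one alternative (not part of the Readlines technique itself) is to read the whole file and call the string method splitlines, which drops them. io.StringIO again stands in for file_example.txt.

```python
import io

text = 'First line of text.\nSecond line of text.\nThird line of text.\n'

with io.StringIO(text) as example_file:
    with_newlines = example_file.readlines()

with io.StringIO(text) as example_file:
    # splitlines() breaks the string at line boundaries and drops the '\n' characters.
    without_newlines = example_file.read().splitlines()

print(with_newlines)
print(without_newlines)
```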

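The next example reads planets.txt, which lists the planets Mercury, Venus, Earth, and Mars, one per
line, and prints them in reverse order:

with open('planets.txt', 'r') as planets_file:
    planets = planets_file.readlines()

for planet in reversed(planets):
    print(planet.strip())

Output:

Mars
Earth
Venus
Mercury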
We can use the Readlines technique to read the file, sort the lines, and print the planets
alphabetically (here, we use built-in function sorted, which returns a new list containing the items
in order from smallest to largest):

>>> with open('planets.txt', 'r') as planets_file:
...     planets = planets_file.readlines()
...
>>> for planet in sorted(planets):
...     print(planet.strip())
...
Earth
Mars
Mercury
Venus


The “For Line in File” Technique

Use this technique when you want to do the same thing to every line from the file cursor to the
end of a file. On each iteration, the file cursor is moved to the beginning of the next line.

>>> with open('planets.txt', 'r') as data_file:
...     for line in data_file:
...         print(len(line))
...
8
6
6
5
Take a close look at the last line of output. There are only four characters in the word Mars, but
our program is reporting that the line is five characters long. The reason for this is the same as
for function readlines: each of the lines we read from the file has a newline character at the end.
We can get rid of it using string method strip, which returns a copy of a string that has leading
and trailing whitespace characters (spaces, tabs, and newlines) stripped away:

>>> with open('planets.txt', 'r') as data_file:
...     for line in data_file:
...         print(len(line.strip()))
...
7
5
5
4
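The same "for line in file" pattern counts the lines in a file (compare question 7 at the end of this module). In this sketch io.StringIO stands in for planets.txt, whose four lines are the planets listed in the outputs above.

```python
import io

# io.StringIO stands in for planets.txt: one planet per line.
planets_text = 'Mercury\nVenus\nEarth\nMars\n'

line_count = 0
with io.StringIO(planets_text) as data_file:
    for line in data_file:  # each iteration moves the cursor to the next line
        line_count += 1

print(line_count)  # 4
```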

Write a Python program to read the first 10 lines and the last 5 lines of a text file

with open('lynx.txt', 'r') as lynx_file:
    lines = lynx_file.readlines()

print("First 10 lines:")
for line in lines[:10]:
    print(line.strip())

print("Last 5 lines:")
for line in lines[-5:]:
    print(line.strip())

The Readline Technique


This technique reads one line at a time, unlike the Readlines technique. Use this technique when
you want to read only part of a file. For example, you might want to treat lines differently
depending on context; perhaps you want to process a file that has a header section followed by a
series of records, either one record per line or with multiline records.
The following example uses data taken from the Time Series Data Library, stored in a file named
hopedale.txt.


The first line contains a description of the data. The next two lines contain comments about the
data, each of which begins with a # character. Each piece of actual data appears on a single line.
We’ll use the Readline technique to skip the header, and then we’ll use the For Line in File
technique to process the data in the file, counting how many fox fur pelts were produced.

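with open('hopedale.txt', 'r') as hopedale_file:
    # Read and skip the description line.
    hopedale_file.readline()

    # Keep reading and skipping comment lines until we read the first piece of data.
    data = hopedale_file.readline().strip()
    while data.startswith('#'):
        data = hopedale_file.readline().strip()

    # Now we have the first piece of data; accumulate the total number of pelts.
    total_pelts = int(data)

    # Read the rest of the data.
    for data in hopedale_file:
        total_pelts += int(data.strip())

print("Total number of pelts:", total_pelts)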


And here is the output:

Total number of pelts: 373
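The Readline technique relies on one detail: readline returns an empty string ('') only at the end of the file, while a blank line inside the file comes back as '\n'. That is how a loop can tell the end of the data from an empty line, and read_molecule in this module uses it to detect the end of the file. The sample text in this sketch is hypothetical.

```python
import io

# Hypothetical text with a blank line in the middle.
with io.StringIO('Mercury\n\nVenus\n') as reader:
    first = reader.readline()    # 'Mercury\n' (the newline is kept)
    blank = reader.readline()    # '\n' (a blank line is not the end of the file)
    second = reader.readline()   # 'Venus\n'
    at_end = reader.readline()   # '' (end of file)

print(repr(first), repr(blank), repr(second), repr(at_end))
```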

Files over the Internet


These days, of course, the file containing the data we want could be on a machine half a world
away. Provided the file is accessible over the Internet, though, we can read it just as we do a local
file. For example, the Hopedale data not only exists on our computers, but it’s also on a web page.
At the time of writing, the URL for the file is

http://robjhyndman.com/tsdldata/ecology1/hopedale.dat (you can look at it online!).


(Note that the examples in this section will work only if your computer is actually connected to the
Internet.)
Module urllib.request contains a function called urlopen that opens a web page for reading.
urlopen returns a file-like object that you can use much as if you were reading a local file.


import urllib.request

url = 'http://robjhyndman.com/tsdldata/ecology1/hopedale.dat'
with urllib.request.urlopen(url) as webpage:
    for line in webpage:
        # urlopen delivers each line as bytes, not str,
        # so strip it and then decode it into a string.
        line = line.strip()
        line = line.decode('utf-8')
        print(line)

Writing Files
This program opens a file called topics.txt, writes the words Computer Science to the file, and then
closes the file:

with open('topics.txt', 'w') as output_file:
    output_file.write('Computer Science')

In addition to writing characters to a file, method write returns the number of characters written.
For example, output_file.write('Computer Science') returns 16. To create a new file or to replace
the contents of an existing file, we use write mode ('w'). If the filename doesn’t exist already, then
a new file is created; otherwise the file contents are erased and replaced. Once opened for writing,
you can use method write to write a string to the file.

Rather than replacing the file contents, we can also add to a file using the append mode ('a'). When
we write to a file that is opened in append mode, the data we write is added to the end of the file
and the current file contents are not overwritten. For example, to add to our previous file topics.txt,
we can append the words Software Engineering:

with open('topics.txt', 'a') as output_file:
    output_file.write('Software Engineering')

At this point, if we print the contents of topics.txt, we’d see the following:

Computer ScienceSoftware Engineering
Unlike function print, method write doesn’t automatically start a new line; if you want a string to
end in a newline, you have to include it manually using '\n'. In each of the previous examples, we
called write only once, but you’ll typically call it multiple times.
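A short sketch of writing several lines: each string passed to write needs its own '\n', whereas print with its file argument adds the newline itself. io.StringIO stands in for topics.txt opened in 'w' mode, so nothing is written to disk.

```python
import io

# io.StringIO stands in for topics.txt opened in 'w' mode.
output_file = io.StringIO()

# write() adds no newline, so include '\n' yourself; it returns the character count.
count = output_file.write('Computer Science\n')

# print() with file= writes to the file and adds the newline automatically.
print('Software Engineering', file=output_file)

print(count)  # 17
print(output_file.getvalue())
```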

The next example, in a file called total.py, is more complex, and it involves both reading from and
writing to a file. Our input file contains two numbers per line separated by a space. The output file
will contain three numbers: the two from the input file and their sum (all separated by spaces):

def sum_num_pairs(input_file, output_filename):
    """ (file open for reading, str) -> NoneType

    Read pairs of numbers from input_file and write each pair, followed by
    its sum, to the file named output_filename.
    """
    with open(output_filename, 'w') as output_file:
        for number_pair in input_file:
            number_pair = number_pair.strip()
            operands = number_pair.split()
            total = float(operands[0]) + float(operands[1])
            new_line = '{0} {1}\n'.format(number_pair, total)
            output_file.write(new_line)


if __name__ == '__main__':
    with open('num_pairs.txt', 'r') as input_file:
        sum_num_pairs(input_file, 'out.txt')

Assume that a file called num_pairs.txt exists with these contents:

1.3 3.4
2 4.2
-1 1

Running total.py then creates out.txt with these contents:

1.3 3.4 4.7
2 4.2 6.2
-1 1 0.0
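Because sum_num_pairs takes an already-open input file rather than a filename, any iterable of lines can be passed in. The sketch below (a copy of the function above plus a temporary output path) feeds it the num_pairs.txt contents through io.StringIO, so the function can be tried without creating the input file.

```python
import io
import os
import tempfile


def sum_num_pairs(input_file, output_filename):
    """Copy of the function above: write each pair of numbers and its sum."""
    with open(output_filename, 'w') as output_file:
        for number_pair in input_file:
            number_pair = number_pair.strip()
            operands = number_pair.split()
            total = float(operands[0]) + float(operands[1])
            output_file.write('{0} {1}\n'.format(number_pair, total))


# An in-memory stream holding the num_pairs.txt contents replaces the real file.
pairs = io.StringIO('1.3 3.4\n2 4.2\n-1 1\n')

folder = tempfile.mkdtemp()
out_path = os.path.join(folder, 'out.txt')
sum_num_pairs(pairs, out_path)

with open(out_path) as result_file:
    result = result_file.read()
print(result)

os.remove(out_path)
os.rmdir(folder)
```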

Writing Algorithms That Use the File-Reading Techniques


Many data files begin with a header. TSDL files begin with a one-line description followed by
comments in lines beginning with a #, and the Readline technique can be used to skip that header.
The technique ends when we read the first real piece of data, which will be the first line after the
description that doesn’t start with a #.

In English, we might try this algorithm to process this kind of file:

Skip the first line in the file
Skip over the comment lines in the file
For each of the remaining lines in the file:
    Process the data on that line

Implemented in Python, the algorithm looks like this:

def skip_header(reader):
    """ (file open for reading) -> str

    Skip the header in reader and return the first real piece of data.
    """
    # Read the description line
    line = reader.readline()

    # Find the first non-comment line
    line = reader.readline()
    while line.startswith('#'):
        line = reader.readline()

    # Now line contains the first real piece of data
    return line


def process_file(reader):
    """ (file open for reading) -> NoneType

    Read and print the data from reader, which must start with a single
    description line, then a sequence of lines beginning with '#', then
    a sequence of data.
    """
    # Find and print the first piece of data
    line = skip_header(reader).strip()
    print(line)

    # Read the rest of the data
    for line in reader:
        line = line.strip()
        print(line)


if __name__ == '__main__':
    with open('hopedale.txt', 'r') as input_file:
        process_file(input_file)
The next program processes the Hopedale data set to find the smallest number of fox pelts produced
in any year. As we progress through the file, we keep the smallest value seen so far in a variable
called smallest. That variable is initially set to the value on the first line of data, since it’s the
smallest (and only) value seen so far:

# skip_header is the function defined above, assumed to be saved in Module4/skipHeader.py
from Module4.skipHeader import skip_header


def smallest_value(reader):
    """ (file open for reading) -> int

    Read and process reader and return the smallest value after the
    time_series header.
    """
    line = skip_header(reader).strip()
    # Now line contains the first data value; this is the smallest value
    # found so far, because it is the only one we have seen.
    smallest = int(line)

    for line in reader:
        value = int(line.strip())
        # If we find a smaller value, remember it.
        if value < smallest:
            smallest = value

    return smallest


with open('hopedale.txt', 'r') as input_file:
    print(smallest_value(input_file))
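Once the header is skipped, the remaining work is just finding a minimum, so an alternative sketch can collect the values in a list and call built-in min. skip_header is repeated so the block runs on its own; smallest_value_compact is a hypothetical name, and the sample data is made up for illustration.

```python
import io


def skip_header(reader):
    """Skip the description line and '#' comment lines; return the first data line."""
    reader.readline()  # the description line
    line = reader.readline()
    while line.startswith('#'):
        line = reader.readline()
    return line


def smallest_value_compact(reader):
    """Return the smallest value after the header (hypothetical compact variant)."""
    first = skip_header(reader)
    # Convert the first data line and every remaining line to int, then take the minimum.
    values = [int(first.strip())] + [int(line.strip()) for line in reader]
    return min(values)


# Hypothetical sample in the same layout as hopedale.txt.
sample_text = 'Description line\n#comment\n#another comment\n22\n29\n2\n16\n'
result = smallest_value_compact(io.StringIO(sample_text))
print(result)  # 2
```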

Dealing with Missing Values in Data


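Consider hebron.txt, another Time Series Data Library file with the same layout as hopedale.txt,
in which one data line contains a hyphen (-) instead of a number.

The hyphen indicates that data for the year 1836 is missing. Unfortunately, calling smallest_value
from module read_smallest on the Hebron data produces this error: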


>>> import read_smallest
>>> read_smallest.smallest_value(open('hebron.txt', 'r'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "./read_smallest.py", line 19, in smallest_value
value = int(line.strip())
ValueError: invalid literal for int() with base 10: '-'

The problem is that '-' isn’t an integer, so calling int('-') fails. This isn’t an isolated problem. In
general, we will often need to skip blank lines, comments, or lines containing other “nonvalues”
in our data. Real data sets often contain omissions or contradictions; dealing with them is just a
fact of scientific life.

To fix our code, we must add a check inside the loop that processes a line only if it contains a real
value. In the TSDL data sets, missing entries are always marked with hyphens, so we just need to
check for that before trying to convert the string we have read to an integer:

from Module4.skipHeader import skip_header


def smallest_value_skip(reader):
    """ (file open for reading) -> int

    Return the smallest value after the header, skipping missing values ('-').
    """
    line = skip_header(reader).strip()
    # Now line contains the first data value; this is the smallest value
    # found so far, because it is the only one we have seen.
    smallest = int(line)

    for line in reader:
        line = line.strip()
        # Missing values are marked with a hyphen; process only real values.
        if line != '-':
            value = int(line)
            smallest = min(smallest, value)

    return smallest


with open('hebron.txt', 'r') as input_file:
    print(smallest_value_skip(input_file))
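smallest_value_skip assumes that the first data line after the header is a real value; if it were '-', int(line) would fail before the loop starts. The sketch below treats every data line the same way, also skips blank lines, and returns None when no values are present. itertools.chain puts the first data line back in front of the remaining lines; the function name and sample data are hypothetical.

```python
import io
from itertools import chain


def skip_header(reader):
    """Skip the description line and '#' comment lines; return the first data line."""
    reader.readline()
    line = reader.readline()
    while line.startswith('#'):
        line = reader.readline()
    return line


def smallest_value_robust(reader):
    """Return the smallest value after the header, or None if every value is missing."""
    smallest = None
    first = skip_header(reader)
    # chain() yields the first data line again, followed by the rest of the file.
    for line in chain([first], reader):
        line = line.strip()
        if line == '' or line == '-':
            continue  # blank line or missing value: skip it
        value = int(line)
        if smallest is None or value < smallest:
            smallest = value
    return smallest


# Hypothetical samples; the first one starts its data with a missing value.
result = smallest_value_robust(io.StringIO('Hebron sample\n#comment\n-\n55\n\n12\n-\n30\n'))
empty_result = smallest_value_robust(io.StringIO('Empty sample\n#comment\n-\n-\n'))
print(result, empty_result)  # 12 None
```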

Processing Whitespace-Delimited Data


The file at http://robjhyndman.com/tsdldata/ecology1/lynx.dat contains information about lynx
pelts in the years 1821–1934.


Each data line holds several whitespace-separated values, and each value ends with a period. To
process this, we will break each line into pieces and strip off the periods. Our algorithm is the
same as it was for the fox pelt data: find and process the first line of data in the file, and then
process each of the subsequent lines. However, the notion of “processing a line” needs to be
examined further because there are many values per line. Our refined algorithm, shown next, uses
nested loops to handle the notion of “for each line and for each value on that line”:

Find the first line of real data after the header
Find the largest value in that line
For each of the remaining lines of data:
    Find the largest value in that line
    If that value is larger than the previous largest, remember it
String method split will split around the whitespace, but we still have to remove the periods at the
ends of the values. We can also simplify our code by initializing largest to -1, because that value
is guaranteed to be smaller than any of the (positive) values in the file. That way, no matter what
the first real value is, it will be larger than the “previous” value (our -1) and replace it.

from Module4.skipHeader import skip_header


def find_largest(line):
    """ (str) -> int

    Return the largest value in line, a whitespace-delimited string of
    values that each end with a period.
    """
    # The largest value seen so far.
    largest = -1
    for value in line.split():
        # Remove the trailing period.
        v = int(value[:-1])
        # If we find a larger value, remember it.
        if v > largest:
            largest = v
    return largest


def process_file(reader):
    """ (file open for reading) -> int

    Return the largest value in the data after the header.
    """
    line = skip_header(reader).strip()
    # The largest value on the first line of data.
    largest = find_largest(line)

    # Check the rest of the lines for larger values.
    for line in reader:
        large = find_largest(line)
        if large > largest:
            largest = large
    return largest


with open('lynx.txt', 'r') as input_file:
    print(process_file(input_file))
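A more compact version of find_largest can use built-in max with a generator expression and str.rstrip('.'), which removes a trailing period only if one is present (value[:-1] always drops the last character). The default=-1 argument mirrors the -1 starting value above for a line with no values. The function name and sample values are hypothetical.

```python
def find_largest_compact(line):
    """Return the largest value on a line of period-terminated values, or -1 if there are none."""
    # rstrip('.') removes a trailing period; max() replaces the explicit loop.
    return max((int(value.rstrip('.')) for value in line.split()), default=-1)


# Hypothetical data lines in the lynx.txt style.
lines = ['269. 321. 585.\n', '871. 1475. 2821.\n']
overall = max(find_largest_compact(line) for line in lines)
print(overall)  # 2821
```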

Multiline Records
Not every data record will fit onto a single line. Here is a file in simplified Protein Data Bank
(PDB) format that describes the arrangements of atoms in ammonia:

COMPND AMMONIA
ATOM 1 N 0.257 -0.363 0.000
ATOM 2 H 0.257 0.727 0.000
ATOM 3 H 0.771 -0.727 0.890
ATOM 4 H 0.771 -0.727 -0.890
END
The first line is the name of the molecule. All subsequent lines down to the one containing END
specify the ID, type, and XYZ coordinates of one of the atoms in the molecule.

Reading this file is straightforward using the techniques we have built up in this chapter. But what
if the file contained two or more molecules, like this:

COMPND AMMONIA
ATOM 1 N 0.257 -0.363 0.000
ATOM 2 H 0.257 0.727 0.000
ATOM 3 H 0.771 -0.727 0.890
ATOM 4 H 0.771 -0.727 -0.890
END
COMPND METHANOL
ATOM 1 C -0.748 -0.015 0.024
ATOM 2 O 0.558 0.420 -0.278
ATOM 3 H -1.293 -0.202 -0.901
ATOM 4 H -1.263 0.754 0.600
ATOM 5 H -0.699 -0.934 0.609
ATOM 6 H 0.716 1.404 0.137
END
As always, we tackle this problem by dividing it into smaller ones and solving each of those in turn.
Our first algorithm is as follows:

While there are more molecules in the file:
    Read a molecule from the file
    Append it to the list of molecules read so far

def read_molecule(reader):
    """ (file open for reading) -> list or NoneType

    Read a single molecule from reader and return it, or return None
    to signal end of file. The first item in the result is the name
    of the compound; each remaining item is a list containing an atom
    type and the X, Y, and Z coordinates of that atom.
    """
    # If there isn't another line, we're at the end of the file.
    line = reader.readline()
    if not line:
        return None

    # Name of the molecule: "COMPND name"
    key, name = line.split()

    # Other lines are either "END" or "ATOM num atom_type x y z"
    molecule = [name]
    line = reader.readline()

    # Parse all the atoms in the molecule.
    while not line.startswith('END'):
        key, num, atom_type, x, y, z = line.split()
        molecule.append([atom_type, x, y, z])
        line = reader.readline()

    return molecule


def read_all_molecules(reader):
    """ (file open for reading) -> list

    Read zero or more molecules from reader, returning a list of the
    molecule information.
    """
    # The list of molecule information.
    result = []

    reading = True
    while reading:
        molecule = read_molecule(reader)
        if molecule:  # None is treated as False in an if statement
            result.append(molecule)
        else:
            reading = False
    return result


if __name__ == '__main__':
    with open('multimol.pdb', 'r') as molecule_file:
        molecules = read_all_molecules(molecule_file)
    for m in molecules:
        print(m)
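To try the molecule reader without creating multimol.pdb, the sketch below feeds it the ammonia and methanol records through io.StringIO. The two functions are repeated in condensed form so the block runs on its own; read_all_molecules here loops on the returned value instead of a reading flag, which behaves the same way. The file must end with an END line: if it is missing, readline returns '' and the six-way unpacking in read_molecule raises ValueError.

```python
import io


def read_molecule(reader):
    """Return [name, [atom_type, x, y, z], ...] for one molecule, or None at end of file."""
    line = reader.readline()
    if not line:
        return None  # readline() returned '': end of file
    key, name = line.split()  # "COMPND name"
    molecule = [name]
    line = reader.readline()
    while not line.startswith('END'):
        key, num, atom_type, x, y, z = line.split()  # "ATOM num atom_type x y z"
        molecule.append([atom_type, x, y, z])
        line = reader.readline()
    return molecule


def read_all_molecules(reader):
    """Return a list of every molecule read from reader."""
    result = []
    molecule = read_molecule(reader)
    while molecule:  # stops when read_molecule returns None
        result.append(molecule)
        molecule = read_molecule(reader)
    return result


# The two records from the text, held in memory instead of multimol.pdb.
pdb_text = """COMPND AMMONIA
ATOM 1 N 0.257 -0.363 0.000
ATOM 2 H 0.257 0.727 0.000
ATOM 3 H 0.771 -0.727 0.890
ATOM 4 H 0.771 -0.727 -0.890
END
COMPND METHANOL
ATOM 1 C -0.748 -0.015 0.024
ATOM 2 O 0.558 0.420 -0.278
ATOM 3 H -1.293 -0.202 -0.901
ATOM 4 H -1.263 0.754 0.600
ATOM 5 H -0.699 -0.934 0.609
ATOM 6 H 0.716 1.404 0.137
END
"""

molecules = read_all_molecules(io.StringIO(pdb_text))
for m in molecules:
    print(m[0], len(m) - 1, 'atoms')
```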


Module - 4 Questions

1. What is the use of the open() function in Python? Explain with its syntax and an example.

2. What is the purpose of the with statement?

3. Write a Python program to read a file and display its contents.

4. Explain the read() function with and without arguments.

5. How does the readlines() function work? Explain with an example.

6. Write a Python program to read the first 10 lines and the last 5 lines of a text file.

7. Write a Python program to find the number of lines in a file.

8. Differentiate between the readlines() and readline() functions.

9. What is the use of the strip() function? Demonstrate with an example.

10. Write a Python program to display the contents of a file in sorted order.

11. Write a Python program to display the length of each line of a text file.

12. Consider hopedale.txt. Write a Python program to calculate the total number of pelts.

13. What is a ragged list? Explain.

14. How do you access a file over the Internet? Explain.

15. Demonstrate writing a text file.

16. Design a Python program to read the contents of a text file and write the same contents to another
file.

17. Consider num_pairs.txt: read the data, find the total of each line, and write the results to a new file.

18. Write an algorithm to skip the header information of a text file and display the file contents.

19. Design an algorithm to find the smallest number among the contents of a file.

20. Design an algorithm to find the largest number in the data set in the following file.

21. How do you read a file with multiline records? Give an example.


Module 4: Storing Data Using Other Collection Types
In Chapter 8, Storing Collections of Data Using Lists, you learned how to store collections of data
using lists. In this chapter, you will learn about three other kinds of collections: sets, tuples, and
dictionaries. With four different options for storing your collections of data, you will be able to
pick the one that best matches your problem in order to keep your code as simple and efficient as
possible.

Storing Data Using Sets


A set is an unordered collection of distinct items. Unordered means that items aren’t stored in any
particular order. Something is either in the set or it’s not, but there’s no notion of it being the first,
second, or last item. Distinct means that any item appears in a set at most once; in other words,
there are no duplicates.
Python has a type called set that allows us to store mutable collections of unordered, distinct items.
(Remember that a mutable object means one that you can modify.) Here we create a set containing
these vowels:

>>> vowels = {'a', 'e', 'i', 'o', 'u'}
>>> vowels
{'a', 'u', 'o', 'i', 'e'}

It looks much like a list, except that sets use braces (that is, { and }) instead of brackets (that is,
[ and ]). Notice that, when displayed in the shell, the set is unordered. Python does some
mathematical tricks behind the scenes to make accessing the items very fast, and one of the side
effects of this is that the items aren’t in any particular order. Here we show that each item is distinct;
duplicates are ignored:

>>> vowels = {'a', 'e', 'a', 'a', 'i', 'o', 'u', 'u'}
>>> vowels
{'u', 'o', 'i', 'e', 'a'}

Even though there were three 'a's and two 'u's when we created the set, only one of each was kept.
Python considers the two sets to be equal:

>>> {'a', 'e', 'i', 'o', 'u'} == {'a', 'e', 'a', 'i', 'o', 'u', 'u'}
True

The reason they are equal is that they contain the same items. Again, order doesn’t matter, and only
one of each element is kept. Variable vowels refers to an object of type set:

>>> type(vowels)
<class 'set'>
>>> type({1, 2, 3})
<class 'set'>

Function set()
Function set expects either no arguments (to create an empty set) or a single argument that is a
collection of values. We can, for example, create a set from a list:


>>> set()
set()
>>> type(set())
<class 'set'>
>>> set([2, 3, 2, 5])
{2, 3, 5}

Note: Function set expects at most one argument. You can’t pass several values as separate
arguments:
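The call the note warns against looks like the one below. The sketch catches the TypeError rather than quoting its exact message, which has varied between Python versions, and then shows the one-argument form that works.

```python
# Several separate arguments: set() rejects this with a TypeError.
try:
    set(2, 3, 5)
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# One argument that is a collection: this works.
from_list = set([2, 3, 5])
print(from_list == {2, 3, 5})  # True
```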

In addition to lists, other types can be used as arguments to function set (in general, any iterable
collection works). One is a set:
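Passing a set to set builds a new, independent set with the same items, so changing the copy leaves the original alone. More generally, set accepts any iterable; a string, for example, yields its distinct characters.

```python
vowels = {'a', 'e', 'i', 'o', 'u'}

# set() of a set builds a new, independent set with the same items.
copy_of_vowels = set(vowels)
copy_of_vowels.add('y')

print('y' in copy_of_vowels)  # True
print('y' in vowels)          # False: the original is unchanged

# Any iterable works; a string gives its distinct characters.
letters = set('hello')
print(sorted(letters))        # ['e', 'h', 'l', 'o']
```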

Another type is range, from Generating Ranges of Numbers. In the following code, a set is created
with the values 0 to 4 inclusive:

>>> set(range(5))
{0, 1, 2, 3, 4}

Set Operations
In mathematics, set operations include union, intersection, add, and remove. In Python, these are
implemented as methods (for a complete list, see the following table).


Sets are mutable, which means you can change what is in a set object. The methods add, remove,
and clear all modify what is in a set. The letter y is sometimes considered to be a vowel; here we
add it to our set of vowels:

>>> vowels = {'a', 'e', 'i', 'o', 'u'}
>>> vowels
{'o', 'u', 'a', 'e', 'i'}
>>> vowels.add('y')
>>> vowels
{'u', 'y', 'e', 'a', 'o', 'i'}

Other methods, such as intersection and union, return new sets based on their arguments. In the
following code, we show all of these methods in action:

>>> ten = set(range(10))
>>> lows = {0, 1, 2, 3, 4}
>>> odds = {1, 3, 5, 7, 9}
>>> lows.add(9)
>>> lows
{0, 1, 2, 3, 4, 9}
>>> lows.difference(odds)
{0, 2, 4}
>>> lows.intersection(odds)
{1, 3, 9}
>>> lows.issubset(ten)
True
>>> lows.issuperset(odds)
False
>>> lows.remove(0)
>>> lows
{1, 2, 3, 4, 9}
>>> lows.symmetric_difference(odds)
{2, 4, 5, 7}
>>> lows.union(odds)
{1, 2, 3, 4, 5, 7, 9}
>>> lows.clear()
>>> lows
set()

Many of the tasks performed by methods can also be accomplished using operators. If acids and
bases are two sets, for example, then acids | bases creates a new set containing their union (that is,
all the elements from both acids and bases), while acids <= bases tests whether all the values in
acids are also in bases.
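The operator forms of the methods used above, applied to sets like lows and odds. One difference worth knowing: the methods accept any iterable as their argument, while the operators require both operands to be sets.

```python
lows = {0, 1, 2, 3, 4}
odds = {1, 3, 5, 7, 9}
ten = set(range(10))

print(lows | odds)   # union, like lows.union(odds)
print(lows & odds)   # intersection, like lows.intersection(odds)
print(lows - odds)   # difference, like lows.difference(odds)
print(lows ^ odds)   # symmetric difference, like lows.symmetric_difference(odds)
print(lows <= ten)   # subset test, like lows.issubset(ten): True
print(lows >= odds)  # superset test, like lows.issuperset(odds): False

# Methods accept any iterable; operators need a set on both sides.
print(lows.union([5, 6]))
try:
    lows | [5, 6]
    operator_ok = True
except TypeError:
    operator_ok = False
print(operator_ok)   # False
```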

Difference between Set and List

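• Syntax: sets are written with braces, for example {1, 2, 3} (the empty set is set(), because {} is an
empty dictionary); lists are written with brackets, for example [1, 2, 3].
• Duplicates: sets can't contain duplicates; lists can contain duplicate elements.
• Indexing: sets are unordered, so their elements cannot be accessed using an index; list elements are
accessed by index, and nested lists allow double and triple indexing (for example, lst[0][1]), so
sublists are possible.
• Operations: mathematical set operations (union, intersection, difference, and so on) can be carried
out on sets; lists have no special set operations.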

Tuples
• A tuple is an immutable sequence of Python objects.
• Tuples are sequences, just like lists. The differences are that tuples cannot be changed, unlike
lists, and that tuples use parentheses, whereas lists use square brackets.
• Creating a tuple is as simple as putting different comma-separated values together. Optionally,
you can put these comma-separated values between parentheses. For example:

tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5)
tup3 = "a", "b", "c", "d"

The empty tuple is written as two parentheses containing nothing: tup1 = ()


Storing Data Using Tuples
• Python also has an immutable sequence type called a tuple. Tuples are written using
parentheses instead of brackets; like strings and lists, they can be subscripted, sliced, and looped
over:

>>> bases = ('A', 'C', 'G', 'T')
>>> for base in bases:
...     print(base)
...
A
C
G
T
• To write a tuple containing a single value, you have to include a comma, even though there is
only one value: tup1 = (50,)

• There’s one small catch: although () represents the empty tuple, a tuple with one element is not
written as (x) but as (x,) (with a trailing comma).

• This is done to avoid ambiguity.

• If the trailing comma weren’t required, (5 + 3) could mean either 8 or the tuple containing only
the value 8:

>>> (8)
8
>>> type((8))
<class 'int'>
>>> (8,)
(8,)
>>> type((8,))
<class 'tuple'>
>>> (5 + 3)
8
>>> (5 + 3,)
(8,)

Accessing Values in Tuples:


• Like string indices, tuple indices start at 0, and tuples can be sliced, concatenated, and so on.

• To access values in a tuple, use square brackets with an index to get a single value, or with a
slice to get a range of values:

tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 6, 7)
print("tup1[0]: ", tup1[0])
print("tup2[1:5]: ", tup2[1:5])

Output:
tup1[0]:  physics
tup2[1:5]:  (2, 3, 4, 5)

Updating Tuples
• Tuples are immutable which means you cannot update or change the values of tuple elements.


Example:
tup1 = (12, 34.56)
tup2 = ('abc', 'xyz')
# tup1[0] = 100  # This action is not valid for tuples
# So let's create a new tuple as follows
tup3 = tup1 + tup2
print(tup3)  # (12, 34.56, 'abc', 'xyz')
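Uncommenting the invalid line above would raise a TypeError; the sketch below catches it to show that the tuple is left unchanged.

```python
tup1 = (12, 34.56)

try:
    tup1[0] = 100  # tuples do not support item assignment
    changed = True
except TypeError:
    changed = False

print(changed)  # False
print(tup1)     # (12, 34.56)
```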

Storing lists in a tuple


>>> canada = ['Canada', 76.5]
>>> usa = ['United States', 75.5]
>>> mexico = ['Mexico', 72.0]

>>> life = (canada, usa, mexico)
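A tuple's immutability is shallow: its slots cannot be reassigned, but a mutable object stored in a slot, such as one of these lists, can still be changed in place. The value 80.0 is a hypothetical update.

```python
canada = ['Canada', 76.5]
usa = ['United States', 75.5]
mexico = ['Mexico', 72.0]
life = (canada, usa, mexico)

# Reassigning a slot of the tuple is not allowed...
try:
    life[0] = ['Canada', 80.0]  # 80.0 is a hypothetical value
    replaced = True
except TypeError:
    replaced = False

# ...but the list stored in that slot can still be modified in place.
life[0][1] = 80.0
print(replaced)  # False
print(canada)    # ['Canada', 80.0]
print(life[0])   # ['Canada', 80.0] (the same list object)
```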

Store the following data in a list, a tuple, a set and a dictionary:

India 91
USA 1
UK 41
Japan 9

List:
lst1 = [['India', 91], ['USA', 1], ['UK', 41], ['Japan', 9]]
Tuple:
tup3 = (('UK', 41), ('India', 91), ('USA', 1), ('Japan', 9))
Or
tup4 = (['India', 91], ['USA', 1], ['UK', 41], ['Japan', 9])
Set:
set1 = {('UK', 41), ('India', 91), ('USA', 1), ('Japan', 9)}
Note: a set cannot contain lists, because lists are mutable and therefore unhashable:

>>> set1 = {["India", 91], ["USA", 1]}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Dictionary:
dict1 = {'India': 91, 'USA': 1, 'UK': 41, 'Japan': 9}
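These representations can be converted into one another. A sketch, assuming the list-of-pairs form above: dict() accepts any iterable of two-element sequences, and items() goes back the other way as (key, value) tuples. The names tup5 and set2 are illustrative.

```python
lst1 = [['India', 91], ['USA', 1], ['UK', 41], ['Japan', 9]]

# dict() builds a dictionary from two-element sequences
dict1 = dict(lst1)
print(dict1)   # {'India': 91, 'USA': 1, 'UK': 41, 'Japan': 9}

# items() yields (key, value) tuples, which are hashable
tup5 = tuple(dict1.items())
set2 = set(dict1.items())
print(tup5)    # (('India', 91), ('USA', 1), ('UK', 41), ('Japan', 9))
```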

Python Dictionary
• A dictionary, also known as a map, is a mutable collection of key/value pairs. (Since Python
3.7, dictionaries preserve the order in which keys were inserted; older versions made no ordering
guarantee.)
• Each key is separated from its value by a colon (:), the items are separated by commas, and the
whole thing is enclosed in curly braces. An empty dictionary without any items is written with
just two curly braces, like this: {}.

>>> bird_to_observations = {'canada goose': 3, 'northern fulmar': 1}
>>> bird_to_observations
{'canada goose': 3, 'northern fulmar': 1}
• Keys are unique within a dictionary, while values need not be. The values of a dictionary can
be of any type, but the keys must be of an immutable (hashable) type such as strings, numbers,
or tuples of immutable values.
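The key rules above can be tried out in a few lines. This sketch uses hypothetical dictionaries (counts, codes, grid) to show that a repeated key keeps only its last value, that values may repeat, that tuples work as keys, and that a list is rejected.

```python
# Keys are unique: if a key is repeated, the last value wins.
counts = {'eagle': 1, 'eagle': 2}
print(counts)              # {'eagle': 2}

# Values need not be unique.
codes = {'USA': 1, 'Canada': 1}

# A tuple of immutable values can be a key.
grid = {(0, 0): 'origin', (1, 2): 'point'}
print(grid[(1, 2)])        # point

# A list cannot be a key, because lists are mutable (unhashable).
list_key_rejected = False
try:
    grid[[3, 4]] = 'bad'
except TypeError:
    list_key_rejected = True
print(list_key_rejected)   # True
```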

Accessing the elements

• To get the value associated with a key, we put the key in square brackets, much like indexing
into a list:

>>> bird_to_observations['northern fulmar']
1
>>> bird_to_observations['canada goose']
3

• Indexing a dictionary with a key it doesn’t contain produces an error, just like an out-of-range
index for a list does:

>>> bird_to_observations['long-tailed jaeger']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'long-tailed jaeger'
• The empty dictionary is written {} (this is why we can’t use this notation for the empty set; an
empty set is written set()).
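When a key might be missing, the KeyError shown earlier can be avoided. This sketch shows two common options: the get method, which returns None or a supplied default, and catching KeyError explicitly. The default of 0 is an assumption that suits counting data.

```python
bird_to_observations = {'canada goose': 3, 'northern fulmar': 1}

# get returns None (or a supplied default) instead of raising KeyError
print(bird_to_observations.get('canada goose'))           # 3
print(bird_to_observations.get('long-tailed jaeger'))     # None
print(bird_to_observations.get('long-tailed jaeger', 0))  # 0

# Alternatively, catch the KeyError explicitly
try:
    count = bird_to_observations['long-tailed jaeger']
except KeyError:
    count = 0
print(count)   # 0
```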

Adding key and value


• To update the value associated with a key, you use the same notation as for lists, except you
use a key instead of an index.
• If the key is already in the dictionary, this assignment statement changes the value associated
with it.
• If the key isn’t present, the key/value pair is added to the dictionary.

>>> bird_to_observations = {}
>>> bird_to_observations['snow goose'] = 33
>>> bird_to_observations['eagle'] = 999
>>> bird_to_observations
{'snow goose': 33, 'eagle': 999}

Updating the values


• Change the value associated with key 'eagle' to 9.

>>> bird_to_observations['eagle'] = 9
>>> bird_to_observations
{'snow goose': 33, 'eagle': 9}

Removing the value


• To remove an entry from a dictionary, use del d[k], where d is the dictionary and k is the key
being removed. Only entries that are present can be removed.
• Trying to remove one that isn’t there results in an error:

>>> bird_to_observations = {'snow goose': 33, 'eagle': 9}
>>> del bird_to_observations['snow goose']
>>> bird_to_observations
{'eagle': 9}
>>> del bird_to_observations['gannet']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'gannet'
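If removing a missing key should not be an error, the pop method with a default value is one option, and checking with in before del is another. A minimal sketch:

```python
bird_to_observations = {'snow goose': 33, 'eagle': 9}

removed = bird_to_observations.pop('snow goose')     # removes the key, returns its value
print(removed)                # 33

missing = bird_to_observations.pop('gannet', None)   # the default avoids KeyError
print(missing)                # None
print(bird_to_observations)   # {'eagle': 9}

# Or check membership before using del
if 'gannet' in bird_to_observations:
    del bird_to_observations['gannet']
```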

Checking the membership


• To test whether a key is in a dictionary, we can use the in operator:

birds = {'eagle': 999, 'snow goose': 33}
if 'eagle' in birds:
    print('eagles have been seen')

Output:

eagles have been seen

Using the In Operator on Tuples, Sets, and Dictionaries


As with lists, the in operator can be applied to tuples and sets to check whether an item is a member
of the collection:
>>> odds = set([1, 3, 5, 7, 9])
>>> 9 in odds
True
>>> 8 in odds
False
>>> '9' in odds
False
>>> evens = (0, 2, 4, 6, 8)
>>> 4 in evens
True
>>> 11 in evens
False
When used on a dictionary, in checks whether a value is a key in the dictionary:

>>> bird2observations = {'canada goose': 183, 'long-tailed jaeger': 71,
...                      'snow goose': 63, 'northern fulmar': 1}
>>> 'snow goose' in bird2observations
True
>>> 183 in bird2observations
False

Notice that the values in the dictionary are ignored; the in operator only looks at the keys.
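To search the values (or whole key/value pairs) instead, apply in to the views returned by values() and items(). A short sketch reusing bird2observations:

```python
bird2observations = {'canada goose': 183, 'long-tailed jaeger': 71,
                     'snow goose': 63, 'northern fulmar': 1}

print(183 in bird2observations)            # False: 183 is not a key
print(183 in bird2observations.values())   # True: search the values instead
print(('snow goose', 63) in bird2observations.items())  # True: a key/value pair
```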

Looping Over Dictionaries


• Like the other collections you’ve seen, you can loop over dictionaries. The general form of
a for loop over a dictionary is as follows:

for «variable» in «dictionary»:
    «block»

• For dictionaries, the loop variable is assigned each key from the dictionary in turn:

>>> birds = {'canada goose': 183, 'long-tailed jaeger': 71,
...          'snow goose': 63, 'northern fulmar': 1}
>>> for key in birds:
...     print(key, birds[key])
...
canada goose 183
long-tailed jaeger 71
snow goose 63
northern fulmar 1

>>> scientist_to_birthdate = {'Newton': 1642, 'Darwin': 1809,
...                           'Turing': 1912}
>>> for scientist, birthdate in scientist_to_birthdate.items():
...     print(scientist, 'was born in', birthdate)
...
Newton was born in 1642
Darwin was born in 1809
Turing was born in 1912



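Instead of a single loop variable, there are two. The method items returns each key/value pair
as a tuple, and the two parts of each tuple are assigned to the two loop variables: scientist refers
to the key and birthdate refers to the value.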
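Both loops above visit the entries in insertion order. To loop in a different order, one option is the built-in sorted: applied to a dictionary it sorts the keys, and with a key function it can order the (key, value) pairs by value. A sketch:

```python
scientist_to_birthdate = {'Newton': 1642, 'Darwin': 1809, 'Turing': 1912}

# Loop over the keys in alphabetical order
for scientist in sorted(scientist_to_birthdate):
    print(scientist, 'was born in', scientist_to_birthdate[scientist])

# Order the (key, value) pairs by value (the birth year)
by_year = sorted(scientist_to_birthdate.items(), key=lambda pair: pair[1])
print(by_year)   # [('Newton', 1642), ('Darwin', 1809), ('Turing', 1912)]
```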

Methods of Dictionaries
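A sketch exercising several commonly used dictionary methods: keys, values, items, update, copy and clear. The bird counts are hypothetical example data.

```python
birds = {'canada goose': 183, 'snow goose': 63}

print(list(birds.keys()))    # ['canada goose', 'snow goose']
print(list(birds.values()))  # [183, 63]
print(list(birds.items()))   # [('canada goose', 183), ('snow goose', 63)]

# update adds new keys and overwrites the values of existing ones
birds.update({'eagle': 9, 'snow goose': 64})
print(birds)     # {'canada goose': 183, 'snow goose': 64, 'eagle': 9}

snapshot = birds.copy()   # a new (shallow) copy of the dictionary
birds.clear()             # remove every entry from the original
print(birds)     # {}
print(snapshot)  # {'canada goose': 183, 'snow goose': 64, 'eagle': 9}
```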

Reading data from text file and storing in Dictionary


observations_file = open('observation.txt')   # one bird name per line
bird_to_observations = {}
for line in observations_file:
    bird = line.strip()                       # remove the trailing newline
    if bird in bird_to_observations:
        bird_to_observations[bird] = bird_to_observations[bird] + 1  # seen before
    else:
        bird_to_observations[bird] = 1                               # first sighting
observations_file.close()
for bird, observations in bird_to_observations.items():
    print(bird, observations)
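The same counting logic can be packaged as a function and tried without a real file. This sketch assumes one bird name per line, uses io.StringIO as a stand-in for observation.txt, replaces the if/else with dict.get and a default of 0, and skips blank lines (a small change from the code above). The function name count_observations is hypothetical.

```python
import io


def count_observations(lines):
    """Count how many times each bird name appears, one name per line."""
    bird_to_observations = {}
    for line in lines:
        bird = line.strip()
        if not bird:               # skip blank lines
            continue
        # get(bird, 0) returns 0 the first time a bird is seen
        bird_to_observations[bird] = bird_to_observations.get(bird, 0) + 1
    return bird_to_observations


# io.StringIO behaves like an opened text file for iteration
sample = io.StringIO('canada goose\nsnow goose\ncanada goose\n\nnorthern fulmar\n')
counts = count_observations(sample)
print(counts)   # {'canada goose': 2, 'snow goose': 1, 'northern fulmar': 1}

# With a real file, a with statement closes it automatically:
# with open('observation.txt') as observations_file:
#     counts = count_observations(observations_file)
```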



Inverting a Dictionary
You might want to print the birds in another order—in order of the number of observations, for
example. To do this, you need to invert the dictionary; that is, create a new dictionary in which you
use the values as keys and the keys as values. There’s no guarantee that the values are unique, so
you have to handle what are called collisions. For example, if you invert the dictionary {'a': 1, 'b':
1, 'c': 1}, a key would be 1, but it’s not clear what the value associated with it would be. Since
you’d like to keep all of the data from the original dictionary, you may need to use a collection,
such as a list, to keep track of the values associated with a key. If we go this route, the inverse of
the dictionary shown earlier would be {1: ['a', 'b', 'c']}. Here’s a program to invert the dictionary of
birds to observations:

bird_to_observations = {'canada goose': 5,
                        'northern fulmar': 1,
                        'long-tailed jaeger': 2,
                        'snow goose': 1}
observations_to_birds_list = {}
for bird, observations in bird_to_observations.items():
    if observations in observations_to_birds_list:
        observations_to_birds_list[observations].append(bird)  # add to the existing list
    else:
        observations_to_birds_list[observations] = [bird]       # start a new list
print(observations_to_birds_list)
# {5: ['canada goose'], 1: ['northern fulmar', 'snow goose'], 2: ['long-tailed jaeger']}
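The inverted dictionary makes the original goal, printing birds in order of the number of observations, straightforward. This sketch wraps the inversion in a hypothetical function invert, swaps the if/else for the dict method setdefault (which inserts an empty list the first time a count is seen), and then loops over the sorted counts.

```python
def invert(bird_to_observations):
    """Map each observation count to the list of birds with that count."""
    observations_to_birds = {}
    for bird, observations in bird_to_observations.items():
        # setdefault returns the existing list, or inserts [] and returns it
        observations_to_birds.setdefault(observations, []).append(bird)
    return observations_to_birds


bird_to_observations = {'canada goose': 5, 'northern fulmar': 1,
                        'long-tailed jaeger': 2, 'snow goose': 1}
observations_to_birds = invert(bird_to_observations)

# Print the birds in increasing order of observation count
for observations in sorted(observations_to_birds):
    print(observations, ':', ', '.join(observations_to_birds[observations]))
```

Sorting the keys of the inverted dictionary gives the counts in increasing order, and each count's list keeps every bird that shared it, so no data from the original dictionary is lost.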

Comparing Collections
