Python Tutorial: File Management
File Management
"The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn." (Alvin Toffler, American Science Fiction author)
This string contains the complete content of the file, which includes the carriage returns and line feeds.
Resetting the File's Current Position
It's possible to set, or reset, a file's position to a certain position, also called the offset. To do this, we use the method seek. Its first parameter is the offset to which we want to set the current position. Python 3 still accepts an optional second parameter, "whence", but for files opened in text mode only offsets measured from the beginning of the file are supported, apart from seek(0, 2), which jumps to the end of the file. Offsets relative to the current position or to the end of the file require binary mode. To work with seek, we will often need the method tell, which "tells" us the current position. When we have just opened a file, it will be zero. We will demonstrate the way of working with both seek and tell in the following example. You have to create a file called "buck_mulligan.txt" with the content "Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and a razor lay crossed.", written on two lines with a line break after "bowl of":
>>> fh = open("buck_mulligan.txt")
>>> fh.tell()
0
>>> fh.read(7)
'Stately'
>>> fh.tell()
7
>>> fh.read()
', plump Buck Mulligan came from the stairhead, bearing a bowl of\nlather on which a mirror and a razor lay crossed.\n'
>>> fh.tell()
122
>>> fh.seek(9)
9
>>> fh.read(5)
'plump'

"Just because some of us can read and write and do a little math that doesn't mean we deserve to conquer the Universe." (Kurt Vonnegut, from his novel "Hocus Pocus")
It's also possible to set the file position relative to the current position by combining seek with tell:
>>> fh = open("buck_mulligan.txt")
>>> fh.read(15)
'Stately, plump '
>>> # set the current position 6 characters to the left:
...
>>> fh.seek(fh.tell() - 6)
9
>>> fh.read(5)
'plump'
>>> # now, we will advance 29 characters to the
>>> # 'right' relative to the current position:
...
>>> fh.seek(fh.tell() + 29)
43
>>> fh.read(10)
'stairhead,'
>>>
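In binary mode, seek also accepts a second argument, whence, so a position can be given relative to the current position or to the end of the file without calling tell. The following is a minimal sketch under that assumption. It recreates the sample file first so that it runs on its own, and the offsets are byte counts rather than character counts.

```python
import os

# Recreate the sample file from the text so the snippet runs on its own.
text = ("Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of\n"
        "lather on which a mirror and a razor lay crossed.\n")
with open("buck_mulligan.txt", "w", newline="\n") as fh:
    fh.write(text)

# "rb" opens the file in binary mode; read() then returns bytes objects.
with open("buck_mulligan.txt", "rb") as fh:
    fh.read(15)                  # b'Stately, plump '
    fh.seek(-6, os.SEEK_CUR)     # 6 bytes back from the current position
    word = fh.read(5)            # b'plump'
    fh.seek(-9, os.SEEK_END)     # 9 bytes before the end of the file
    tail = fh.read()             # b'crossed.\n'

print(word, tail)
```

For plain ASCII text like this, byte offsets and character offsets coincide; with other encodings they can differ.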
In the following example we will open a file for reading and writing at the same time with the mode "w+". If the file doesn't exist, it will be created; if it already exists, its content is deleted. If you want to open an existing file for reading and writing, you should use "r+" instead, because this will not delete the content of the file.
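fh = open('colours.txt', 'w+')
fh.write('The colour brown')
fh.seek(11)          # go to the 12th character; counting starts with 0
print(fh.read(5))    # prints: brown
print(fh.tell())     # prints: 16
fh.seek(11)
fh.write('green')    # overwrite 'brown' with 'green'
fh.seek(0)
print(fh.read())     # prints: The colour green
fh.close()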
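The difference between "r+" and "w+" can be checked with a short experiment. This is only a sketch; the file name colours_demo.txt is a hypothetical name chosen so that no other file is overwritten.

```python
# Hypothetical file name, chosen only for this sketch.
path = "colours_demo.txt"

with open(path, "w") as fh:
    fh.write("The colour brown")

# "r+" keeps the existing content; we can read it and overwrite parts of it.
with open(path, "r+") as fh:
    before = fh.read()
    fh.seek(11)            # plain ASCII text, so character 11 is also byte 11
    fh.write("green")
    fh.seek(0)
    after = fh.read()

# "w+" truncates the file as soon as it is opened.
with open(path, "w+") as fh:
    truncated = fh.read()

print(repr(before), repr(after), repr(truncated))
```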
We don't mean what the heading says. On the contrary, we want to prevent any nasty situation, like losing the data which your Python program has calculated. So we will show you how you can save your data in an easy way, so that you, or rather your program, can read them in again at a later date. We are "pickling" the data, so that nothing gets lost.
For this purpose, Python offers a module called "pickle". With the algorithms of the pickle module we can serialize and de-serialize Python object structures. "Pickling" denotes the process which converts a Python object hierarchy into a byte stream, and "unpickling" is the inverse operation, i.e. the byte stream is converted back into an object hierarchy. What we call pickling (and unpickling) is also known as "serialization" or "flattening" a data structure.
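An object can be dumped with the dump function of the pickle module:
pickle.dump(obj, file, protocol=None, *, fix_imports=True)
dump() writes a pickled representation of obj to the open file object file, which has to be opened for writing in binary mode. The optional protocol argument tells the pickler to use the given protocol: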
Protocol version 0 is the original human-readable (ASCII) protocol and is backwards compatible with earlier versions of Python.
Protocol version 1 is the old binary format, which is also compatible with earlier versions of Python.
Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.
Protocol version 3 was introduced with Python 3.0. It has explicit support for bytes and cannot be unpickled by Python 2.x pickle modules. It was the default protocol from Python 3.0 to 3.7; later releases added protocol 4 (Python 3.4) and protocol 5 (Python 3.8).
If fix_imports is True and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2.
Objects which have been dumped to a file with pickle.dump can be read back into a program with the function pickle.load(file). pickle.load automatically recognizes which format was used for writing the data.
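The effect of the protocol argument can be tried without a file. The following is a minimal sketch that uses pickle.dumps and pickle.loads, the in-memory counterparts of dump and load; the sample dictionary is made up for illustration.

```python
import pickle

# Made-up sample data for this sketch.
data = {"language": "Python", "versions": [2, 3]}

# dumps() returns the pickle as a bytes object instead of writing to a file.
pickle_v0 = pickle.dumps(data, protocol=0)   # ASCII-based protocol
pickle_v2 = pickle.dumps(data, protocol=2)   # binary protocol

# loads() detects the protocol on its own, just like load().
restored_v0 = pickle.loads(pickle_v0)
restored_v2 = pickle.loads(pickle_v2)

print(restored_v0 == data, restored_v2 == data)

# The default and the highest protocol depend on the Python version in use.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```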
A simple example:
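A minimal sketch of such a dump could look as follows. The list name cities and the city names in it are assumptions made for illustration; only the file name data.pkl is taken from the text.

```python
import pickle

# Illustrative data; the concrete city names are an assumption.
cities = ["Paris", "Dijon", "Lyon", "Strasbourg"]

# Pickle files have to be opened in binary mode, here "wb".
with open("data.pkl", "wb") as fh:
    pickle.dump(cities, fh)
```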
The file data.pkl can be read in again by Python in the same or another session or by a different program:
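Reading the file back might be sketched like this. The snippet writes data.pkl first, with illustrative city names, so that it runs on its own; in practice the file would already exist from an earlier session.

```python
import pickle

# Recreate data.pkl so that this snippet runs on its own (illustrative data).
with open("data.pkl", "wb") as fh:
    pickle.dump(["Paris", "Dijon", "Lyon", "Strasbourg"], fh)

# Read the pickled object back; binary mode ("rb") is required.
with open("data.pkl", "rb") as f:
    villes = pickle.load(f)

print(villes)
```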
Only the objects and not their names are saved. That's why we have to assign the result of pickle.load to a name when we read the data back, e.g. villes = pickle.load(f).
In our previous example, we had pickled only one object, i.e. a list of French cities. But what about pickling multiple objects? The solution is easy: We pack the objects into another object, so we will only have to
pickle one object again. We will pack two lists "programming_languages" and "python_dialects" into a list pickle_objects in the following example:
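The packing step can be sketched as follows. The names programming_languages, python_dialects and pickle_objects come from the text, while the list contents are assumptions chosen for illustration.

```python
import pickle

# Illustrative contents; only the variable names are taken from the text.
programming_languages = ["Python", "Perl", "C++", "Java", "Lisp"]
python_dialects = ["Jython", "IronPython", "CPython"]

# Pack both lists into one object, so that a single dump is enough.
pickle_objects = [programming_languages, python_dialects]

with open("data.pkl", "wb") as output:
    pickle.dump(pickle_objects, output)
```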
The pickled data from the previous example, i.e. the data which we have written to the file data.pkl, can be separated into two lists again when we read the data back in:
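A sketch of the unpacking step, again with illustrative list contents. The file is written first so that the snippet is self-contained.

```python
import pickle

# Recreate data.pkl with two packed lists (illustrative contents).
with open("data.pkl", "wb") as output:
    pickle.dump([["Python", "Perl", "C++", "Java", "Lisp"],
                 ["Jython", "IronPython", "CPython"]], output)

# load() returns the outer list; tuple unpacking splits it into its two parts.
with open("data.pkl", "rb") as f:
    programming_languages, python_dialects = pickle.load(f)

print(programming_languages)
print(python_dialects)
```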
shelve Module
One drawback of the pickle module is that it pickles a complete object, which then has to be unpickled in one go. Let's imagine this data object is a dictionary. It may be desirable that we don't have to save and load the whole dictionary every time, but can save and load just a single value corresponding to just one key. The shelve module is the solution to this request. A "shelf", as used in the shelve module, is a persistent, dictionary-like object. The difference from dbm databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects, i.e. anything that the pickle module can handle. This includes most class instances, recursive data types, and objects containing lots of shared sub-objects. The keys have to be strings.
The shelve module is easy to use. Actually, it is as easy as using a dictionary in Python. Before we can use a shelf object, we have to import the module. After this, we have to open a shelf object with the function open of the shelve module. The open function opens a special shelf file for reading and writing:
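A minimal sketch of opening and filling a shelf. The file name MyShelve and the two entries match the interactive session shown later in the text; the variable names city and keys are only used for this sketch.

```python
import shelve

# Open the shelf file, creating it if it does not exist yet.
s = shelve.open("MyShelve")

# A shelf is used like a dictionary; the keys have to be strings.
s["street"] = "Fleet Str"
s["city"] = "London"

# A single value can be fetched without loading the other entries.
city = s["city"]
keys = sorted(s.keys())

# Closing the shelf makes sure everything is written to disk.
s.close()

print(city, keys)
```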
If the file "MyShelve" already exists, the open function will try to open it. If it isn't a shelf file, i.e. a file which has been created with the shelve module, we will get an error message. If the file doesn't exist, it will be created.
A shelf object has to be closed with the close method when we are done with it:
>>> s.close()
We can use the previously created shelf file in another program or in an interactive Python session:
$ python3
Python 3.2.3 (default, Feb 28 2014, 00:22:33)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> s = shelve.open("MyShelve")
>>> s["street"]
'Fleet Str'
>>> s["city"]
'London'
>>>
It is also possible to cast a shelf object into an "ordinary" dictionary with the dict function:
>>> s
<shelve.DbfilenameShelf object at 0xb7133dcc>
>>> dict(s)
{'city': 'London', 'street': 'Fleet Str'}
>>>
The following example uses more complex values for our shelf object:
$ python3
Python 3.2.3 (default, Feb 28 2014, 00:22:33)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> tele = shelve.open("MyPhoneBook")
>>> tele["Steve"]["phone"]
'8745'
>>>
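How such a phone book could have been filled is sketched below. The shelf name MyPhoneBook and the phone number of Steve are taken from the session above; the field "first" and the second entry are made-up placeholders. The sketch also shows one way to update a nested value, assuming the shelf is opened without writeback=True.

```python
import shelve

# Fill the shelf; "first" and the entry "Example" are hypothetical.
tele = shelve.open("MyPhoneBook")
tele["Steve"] = {"first": "Steve", "phone": "8745"}
tele["Example"] = {"first": "Example", "phone": "0000"}
tele.close()

tele = shelve.open("MyPhoneBook")
phone = tele["Steve"]["phone"]

# Without writeback=True, a change made in place to a nested value is not
# saved. Fetch the value, modify it, and assign it back instead.
entry = tele["Steve"]
entry["phone"] = "8746"
tele["Steve"] = entry
new_phone = tele["Steve"]["phone"]
tele.close()

print(phone, new_phone)
```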
Exercises
1. The file cities_and_times.txt contains city names and times. Each line contains the name of the city, followed by the name of the day ("Sun") and the time in the form hh:mm. Read in the file and create an
alphabetically ordered list of the form
[('Amsterdam', 'Sun', (8, 52)), ('Anchorage', 'Sat', (23, 52)), ('Ankara', 'Sun', (10, 52)), ('Athens', 'Sun', (9, 52)), ('Atlanta', 'Sun', (2, 52)), ('Auckland', 'Sun',
(20, 52)), ('Barcelona', 'Sun', (8, 52)), ('Beirut', 'Sun', (9, 52)),
...
('Toronto', 'Sun', (2, 52)), ('Vancouver', 'Sun', (0, 52)), ('Vienna', 'Sun', (8, 52)), ('Warsaw', 'Sun', (8, 52)), ('Washington DC', 'Sun', (2, 52)), ('Winnipeg', 'Sun',
(1, 52)), ('Zurich', 'Sun', (8, 52))]
Finally, the list should be dumped for later usage with the pickle module. We will use this list in our chapter on Numpy dtype.
1.
import pickle

with open("cities_and_times.txt") as fh:
    lines = fh.readlines()
lines.sort()

cities = []
for line in lines:
    # The city name may consist of several words; *city collects them all.
    *city, day, time = line.split()
    hours, minutes = time.split(":")
    cities.append((" ".join(city), day, (int(hours), int(minutes))))

# Pickle files have to be opened in binary write mode.
with open("cities_and_times.pkl", "wb") as fh:
    pickle.dump(cities, fh)
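The starred assignment used in the solution can be tried on a single line in isolation. The input line below is a hypothetical example in the format described in the exercise.

```python
# Hypothetical input line in the format described in the exercise.
line = "Salt Lake City Sun 01:52"

# The starred name collects everything that is left over
# after day and time have taken the last two fields.
*city, day, time = line.split()
hours, minutes = time.split(":")
entry = (" ".join(city), day, (int(hours), int(minutes)))

print(city)    # ['Salt', 'Lake', 'City']
print(entry)   # ('Salt Lake City', 'Sun', (1, 52))
```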
City names can consist of multiple words like "Salt Lake City". That is why we have to use the asterisk in the assignment that splits a line. So city will be a list with the words of the city name, e.g. ["Salt", "Lake", "City"]. " ".join(city) turns such a list into a "proper" string with the city name, i.e. in our example "Salt Lake City".
© 2011 - 2018, Bernd Klein, Bodenseo; Design by Denise Mitchinson adapted for python-course.eu by Bernd Klein