Porting Python 2 Code to Python 3
Guido van Rossum
Fred L. Drake, Jr., editor
Release 3.3.2
Contents
1 Choosing a Strategy
    1.1 Universal Bits of Advice
2 Python 3 and 3to2
3 Python 2 and 2to3
    3.1 Support Python 2.7
    3.2 Try to Support Python 2.6 and Newer Only
        from __future__ import print_function
        from __future__ import unicode_literals
        Bytes literals
    3.3 Supporting Python 2.5 and Newer Only
        from __future__ import absolute_import
        Mark all Unicode strings with a u prefix
    3.4 Handle Common Gotchas
        from __future__ import division
        Specify when opening a file as binary
        Text files
        Subclass object
        Deal With the Bytes/String Dichotomy
        Indexing bytes objects
        __str__()/__unicode__()
        Don't Index on Exceptions
        Don't use __getslice__ & Friends
        Updating doctests
        Update map for imbalanced input sequences
    3.5 Eliminate -3 Warnings
    3.6 Run 2to3
        Manually
        During Installation
    3.7 Verify & Test
4 Python 2/3 Compatible Source
    4.1 Follow The Steps for Using 2to3
    4.2 Use six
    4.3 Capturing the Currently Raised Exception
5 Other Resources
Author: Brett Cannon

Abstract

With Python 3 being the future of Python while Python 2 is still in active use, it is good to have your project available for both major releases of Python. This guide is meant to help you choose which strategy works best for your project to support both Python 2 & 3 along with how to execute that strategy. If you are looking to port an extension module instead of pure Python code, please see cporting-howto.
1 Choosing a Strategy
When a project chooses to support both Python 2 & 3, a decision needs to be made as to how to go about accomplishing that goal. The chosen strategy will depend on how large the project's existing codebase is and how much divergence you want from your current Python 2 codebase (e.g., changing your code to work simultaneously with Python 2 and 3).

If you would prefer to maintain a codebase which is semantically and syntactically compatible with Python 2 & 3 simultaneously, you can write Python 2/3 Compatible Source. While this tends to lead to somewhat non-idiomatic code, it does mean you keep a rapid development process for you, the developer.

If your project is brand-new or does not have a large codebase, then you may want to consider writing/porting all of your code for Python 3 and using 3to2 to port your code for Python 2.

Finally, you do have the option of using 2to3 to translate Python 2 code into Python 3 code (with some manual help). This can take the form of branching your code and using 2to3 to start a Python 3 branch. You can also have users perform the translation at installation time automatically so that you only have to maintain a Python 2 codebase.

Regardless of which approach you choose, porting is not as hard or time-consuming as you might initially think. You can also tackle the problem piecemeal, as a good portion of porting is simply updating your code to follow current best practices in a Python 2/3 compatible way.
    setup(
      name='Your Library',
      version='1.0',
      classifiers=[
          # make sure to use :: Python *and* :: Python :: 3 so
          # that pypi can list the package on the python 3 page
          'Programming Language :: Python',
          'Programming Language :: Python :: 3'
      ],
      packages=['yourlibrary'],
      # make sure to add custom_fixers to the MANIFEST.in
      include_package_data=True,
      # ...
    )

Doing so will cause your project to show up in the Python 3 packages list. You will know you set the classifier properly as visiting your project page on the Cheeseshop will show a Python 3 logo in the upper-left corner of the page.

Three, the six project provides a library which helps iron out differences between Python 2 & 3. If you find there is a sticky point that is a continual point of contention in your translation or maintenance of code, consider using a source-compatible solution relying on six. If you have to create your own Python 2/3 compatible solution, you can use sys.version_info[0] >= 3 as a guard.

Four, read all the approaches. Just because some bit of advice applies to one approach more than another doesn't mean that some advice doesn't apply to other strategies. This is especially true of whether you decide to use 2to3 or be source-compatible; tips for one approach almost always apply to the other.

Five, drop support for older Python versions if possible. Python 2.5 introduced a lot of useful syntax and libraries which have become idiomatic in Python 3. Python 2.6 introduced future statements which make compatibility much easier if you are going from Python 2 to 3. Python 2.7 continues the trend in the stdlib. So choose the newest version of Python which you believe can be your minimum supported version and work from there.

Six, target the newest version of Python 3 that you can. Beyond just the usual bugfixes, compatibility has continued to improve between Python 2 and 3 as time has passed. This is especially true for Python 3.3 where the u prefix for strings is allowed, making source-compatible Python code easier.
Seven, make sure to look at the Other Resources for tips from other people which may help you out.
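As a rough illustration of the sys.version_info[0] >= 3 guard mentioned in the third piece of advice, the following sketch picks the text and binary types for the running interpreter. The names PY3, text_type, binary_type and is_text are hypothetical and chosen only for this example (six offers similar helpers of its own).

```python
import sys

# True on any Python 3.x interpreter, False on Python 2.x.
PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str       # Python 3: str holds text
    binary_type = bytes
else:
    text_type = unicode   # Python 2: unicode holds text (name exists only there)
    binary_type = str     # Python 2: bytes is an alias of str


def is_text(value):
    """Return True if value is text on the running interpreter."""
    return isinstance(value, text_type)
```

Keeping the guard in one small module and importing the resulting names elsewhere tends to be easier to maintain than scattering version checks throughout the codebase.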
code from their Python 2 codebase and maintain them as independent codebases. You can even begin preparing to use this approach today by writing future-compatible Python code which works cleanly in Python 2 in conjunction with 2to3; all steps outlined below will work with Python 2 code up to the point when the actual use of 2to3 occurs.

Use of 2to3 as an on-demand translation step at install time is also possible, preventing the need to maintain a separate Python 3 codebase, but this approach does come with some drawbacks. While users will only have to pay the translation cost once at installation, you as a developer will need to pay the cost regularly during development. If your codebase is sufficiently large then the translation step ends up acting like a compilation step, robbing you of the rapid development process you are used to with Python. Obviously the time required to translate a project will vary, so do an experimental translation to see how long it takes, and use that to evaluate whether you prefer this approach compared to using Python 2/3 Compatible Source or simply keeping a separate Python 3 codebase.

Below are the typical steps taken by a project which tries to support Python 2 & 3 while keeping the code directly executable by Python 2.
This point cannot be stressed enough: make sure you know what all of your string literals in Python 2 are meant to become in Python 3. Any string literal that should be treated as bytes should have the b prefix. Any string literal that should be Unicode/text in Python 2 should either have the u prefix (supported, but ignored, in Python 3.3 and later) or you should have from __future__ import unicode_literals at the top of the file. But the key point is you should know how Python 3 will treat every one of your string literals and you should mark them as appropriate.

There are some differences between byte literals in Python 2 and those in Python 3 thanks to the bytes type just being an alias to str in Python 2. Probably the biggest gotcha is that indexing results in different values. In Python 2, the value of b'py'[1] is 'y', while in Python 3 it's 121. You can avoid this disparity by always slicing at the size of a single element: b'py'[1:2] is 'y' in Python 2 and b'y' in Python 3 (i.e., close enough).

You cannot concatenate bytes and strings in Python 3. But since Python 2 has bytes aliased to str, it will succeed: b'a' + u'b' works in Python 2, but b'a' + 'b' in Python 3 is a TypeError. A similar issue also comes about when doing comparisons between bytes and strings.
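The two byte-literal gotchas above can be checked with a short sketch. The variable names are illustrative only; the comments describe Python 3 behaviour unless noted otherwise.

```python
data = b'py'

# Slicing one element wide yields a length-1 bytes object on Python 3
# (and a length-1 str on Python 2), instead of the integer 121.
second = data[1:2]

# Concatenating bytes with text raises TypeError on Python 3,
# whereas Python 2 silently coerces the result to unicode.
try:
    b'a' + u'b'
except TypeError:
    mixed_ok = False   # Python 3
else:
    mixed_ok = True    # Python 2

# Encoding the text explicitly makes the concatenation valid on both.
joined = b'a' + u'b'.encode('ascii')
```

Making the encoding step explicit is usually the clearest fix, since it records in the code which side of the bytes/text boundary each value lives on.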
Specify when opening a file as binary

Unless you have been working on Windows, there is a chance you have not always bothered to add the b mode when opening a binary file (e.g., rb for binary reading). Under Python 3, binary files and text files are clearly distinct and mutually incompatible; see the io module for details. Therefore, you must decide whether a file will be used for binary access (allowing you to read and/or write bytes data) or text access (allowing you to read and/or write unicode data).

Text files

Text files created using open() under Python 2 return byte strings, while under Python 3 they return unicode strings. Depending on your porting strategy, this can be an issue.

If you want text files to return unicode strings in Python 2, you have two possibilities:

Under Python 2.6 and higher, use io.open(). Since io.open() is essentially the same function in both Python 2 and Python 3, it will help iron out any issues that might arise.

If pre-2.6 compatibility is needed, then you should use codecs.open() instead. This will make sure that you get back unicode strings in Python 2.

Subclass object

New-style classes have been around since Python 2.2. You need to make sure you are subclassing from object to avoid odd edge cases involving method resolution order, etc. This continues to be totally valid in Python 3 (although unneeded as all classes implicitly inherit from object).

Deal With the Bytes/String Dichotomy

One of the biggest issues people have when porting code to Python 3 is handling the bytes/string dichotomy. Because Python 2 allowed the str type to hold textual data, people have over the years been rather loose in their delineation of what str instances held text compared to bytes. In Python 3 you cannot be so care-free anymore and need to properly handle the difference. The key to handling this issue is to make sure that every string literal in your Python 2 code is either syntactically or functionally marked as either bytes or text data. After this is done you then need to make sure your APIs are designed to either handle a specific type or made to be properly polymorphic.
Mark Up Python 2 String Literals
The first thing you must do is designate every single string literal in Python 2 as either textual or bytes data. If you are only supporting Python 2.6 or newer, this can be accomplished by marking bytes literals with a b prefix and then designating textual data with a u prefix or using the unicode_literals future statement.

If your project supports versions of Python predating 2.6, then you should use the six project and its b() function to denote bytes literals. For text literals you can either use six's u() function or use a u prefix.
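A small sketch of what explicitly marked literals look like, assuming Python 2.6+ or Python 3.3+ (where the u prefix is accepted). The constant names are made up for illustration.

```python
# Text: marked with a u prefix, so it is unicode on Python 2 and str on Python 3.
GREETING = u'caf\xe9'

# Binary data: marked with a b prefix, so it is bytes on Python 3
# (and plain str on Python 2, where bytes is an alias of str).
MAGIC = b'\x89PNG'

# Crossing the boundary is always an explicit encode or decode.
encoded = GREETING.encode('utf-8')
decoded = encoded.decode('utf-8')
```

Once every literal carries a marker like this, an unmarked literal becomes a visible signal that a decision about bytes versus text is still outstanding.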
Decide what APIs Will Accept
In Python 2 it was very easy to accidentally create an API that accepted both bytes and textual data. But in Python 3, thanks to the more strict handling of disparate types, this loose usage of bytes and text together tends to fail.

Take the dict {b'a': 'bytes', u'a': 'text'} in Python 2.6. It creates the dict {u'a': 'text'} since b'a' == u'a'. But in Python 3 the equivalent dict creates {b'a': 'bytes', 'a': 'text'}, i.e., no lost data. Similar issues can crop up when transitioning Python 2 code to Python 3.
This means you need to choose what an API is going to accept and create and consistently stick to that API in both Python 2 and 3.
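One possible way to stick to a single accepted type is to normalise input at the API boundary. The sketch below assumes a text-only API and UTF-8 encoded input; to_text and shout are hypothetical names used only for this example.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def shout(message):
    """Text-only API: normalise once at the boundary, then use only text."""
    return to_text(message).upper() + u'!'
```

For instance, shout(b'spam') and shout(u'spam') both return the text u'SPAM!' on either major version. A stricter design would reject bytes outright with a TypeError rather than decode them; which choice is right depends on the callers you need to support.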
Bytes / Unicode Comparison
In Python 3, mixing bytes and unicode is forbidden in most situations; it will raise a TypeError where Python 2 would have attempted an implicit coercion between types. However, there is one case where it doesn't and it can be very misleading:

    >>> b"" == ""
    False

This is because an equality comparison is required by the language to always succeed (and return False for incompatible types). However, this also means that code incorrectly ported to Python 3 can display buggy behaviour if such comparisons are silently executed. To detect such situations, Python 3 has a -b flag that will display a warning:

    $ python3 -b
    >>> b"" == ""
    __main__:1: BytesWarning: Comparison between bytes and string
    False

To turn the warning into an exception, use the -bb flag instead:

    $ python3 -bb
    >>> b"" == ""
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    BytesWarning: Comparison between bytes and string

Indexing bytes objects

Another potentially surprising change is the indexing behaviour of bytes objects in Python 3:

    >>> b"xyz"[0]
    120

Indeed, Python 3 bytes objects (as well as bytearray objects) are sequences of integers. But code converted from Python 2 will often assume that indexing a bytestring produces another bytestring, not an integer. To reconcile both behaviours, use slicing:

    >>> b"xyz"[0:1]
    b'x'
    >>> n = 1
    >>> b"xyz"[n:n+1]
    b'y'

The only remaining gotcha is that an out-of-bounds slice returns an empty bytes object instead of raising IndexError:

    >>> b"xyz"[3]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: index out of range
    >>> b"xyz"[3:4]
    b''
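If the silent empty result of an out-of-bounds slice is a concern, the slicing idiom can be wrapped in a small helper that restores the IndexError. This is only a sketch: byte_at is a hypothetical name, and it deliberately supports non-negative indexes only.

```python
def byte_at(data, index):
    """Return the byte at a non-negative index as a length-1 bytes object.

    Slicing keeps the result a bytestring on both Python 2 and 3; the
    length check restores the IndexError that plain indexing would raise.
    """
    chunk = data[index:index + 1]
    if len(chunk) != 1:
        raise IndexError('index out of range')
    return chunk
```

With this helper, byte_at(b"xyz", 1) gives b'y', while byte_at(b"xyz", 3) raises IndexError instead of returning an empty bytes object.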
__str__()/__unicode__()

In Python 2, objects can specify both a string and unicode representation of themselves. In Python 3, though, there is only a string representation. This becomes an issue as people can inadvertently do things in their __str__() methods which have unpredictable results (e.g., infinite recursion if you happen to use the unicode(self).encode('utf8') idiom as the body of your __str__() method).

There are two ways to solve this issue. One is to use a custom 2to3 fixer. The blog post at http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/ specifies how to do this. That will allow 2to3 to change all instances of def __unicode__(self): ... to def __str__(self): .... This does require that you define your __str__() method in Python 2 before your __unicode__() method.

The other option is to use a mixin class. This allows you to only define a __unicode__() method for your class and let the mixin derive __str__() for you (code from http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/):

    import sys

    class UnicodeMixin(object):

        """Mixin class to handle defining the proper __str__/__unicode__
        methods in Python 2 or 3."""

        if sys.version_info[0] >= 3:  # Python 3
            def __str__(self):
                return self.__unicode__()
        else:  # Python 2
            def __str__(self):
                return self.__unicode__().encode('utf8')

    class Spam(UnicodeMixin):

        def __unicode__(self):
            return u'spam-spam-bacon-spam'

Don't Index on Exceptions

In Python 2, the following worked:

    >>> exc = Exception(1, 2, 3)
    >>> exc.args[1]
    2
    >>> exc[1]  # Python 2 only!
    2

But in Python 3, indexing directly on an exception is an error. You need to make sure to only index on the BaseException.args attribute which is a sequence containing all arguments passed to the __init__() method.

Even better is to use the documented attributes the exception provides.

Don't use __getslice__ & Friends

These methods have been deprecated for a while, and Python 3 finally drops support for __getslice__(), etc. Move completely over to __getitem__() and friends.
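As a sketch of what moving from __getslice__() to __getitem__() can look like, the class below handles both plain indexes and slice objects in a single method. Deck is a made-up example class, not part of any library.

```python
class Deck(object):
    """Minimal sequence wrapper that supports indexing and slicing."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        # Slicing arrives here as a slice object, so no __getslice__()
        # is needed; return a new Deck to preserve the type.
        if isinstance(index, slice):
            return Deck(self._cards[index])
        return self._cards[index]
```

Here Deck('abcd')[0] returns 'a', and Deck('abcd')[1:3] returns a new Deck holding 'b' and 'c'.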
Updating doctests

2to3 will attempt to generate fixes for doctests that it comes across. It's not perfect, though. If you wrote a monolithic set of doctests (e.g., a single docstring containing all of your doctests), you should at least consider breaking the doctests up into smaller pieces to make it more manageable to fix. Otherwise it might very well be worth your time and effort to port your tests to unittest.

Update map for imbalanced input sequences

With Python 2, map would pad input sequences of unequal length with None values, returning a sequence as long as the longest input sequence.

With Python 3, if the input sequences to map are of unequal length, map will stop at the termination of the shortest of the sequences. For full compatibility with map from Python 2.x, also wrap the sequences in itertools.zip_longest(), e.g. map(func, *sequences) becomes list(itertools.starmap(func, itertools.zip_longest(*sequences))). itertools.starmap() is used rather than map() because zip_longest() yields one tuple per step, and that tuple has to be unpacked into the separate arguments func expects.
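The difference can be seen in a short Python 3 sketch. Note that it uses itertools.starmap() so that each tuple produced by zip_longest() is unpacked into separate arguments; pair and the two lists are illustrative names only.

```python
import itertools


def pair(a, b):
    """Two-argument function, as would be passed to Python 2's map."""
    return (a, b)


letters = ['a', 'b', 'c']
numbers = [1, 2]

# Python 3 map() stops at the end of the shortest input.
truncated = list(map(pair, letters, numbers))

# zip_longest() pads the shorter input with None, like Python 2's map();
# starmap() unpacks each padded tuple into pair's two arguments.
padded = list(itertools.starmap(pair, itertools.zip_longest(letters, numbers)))
```

On Python 3, truncated has two items while padded has three, the last being ('c', None). On Python 2 the padding function is spelled itertools.izip_longest(), so code shared between both versions would need a guarded import.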
          # ...
          )

For Distribute:

    setup(use_2to3=True,
          # ...
          )

This will allow you to not have to distribute a separate Python 3 version of your project. It does require, though, that during development you at least build your project and use the built Python 3 source for testing.
    try:
        raise Exception()
    except Exception, exc:
        # Current exception is 'exc'
        pass

This syntax changed in Python 3 (and was backported to Python 2.6 and later) to:

    try:
        raise Exception()
    except Exception as exc:
        # Current exception is 'exc'
        # In Python 3, 'exc' is restricted to the block; Python 2.6 will "leak"
        pass

Because of this syntax change you must change how you capture the current exception to:

    try:
        raise Exception()
    except Exception:
        import sys
        exc = sys.exc_info()[1]
        # Current exception is 'exc'
        pass

You can get more information about the raised exception from sys.exc_info() than simply the current exception instance, but you most likely don't need it.

Note: In Python 3, the traceback is attached to the exception instance through the __traceback__ attribute. If the instance is saved in a local variable that persists outside of the except block, the traceback will create a reference cycle with the current frame and its dictionary of local variables. This will delay reclaiming dead resources until the next cyclic garbage collection pass.

In Python 2, this problem only occurs if you save the traceback itself (e.g. the third element of the tuple returned by sys.exc_info()) in a variable.
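Putting the sys.exc_info() idiom and the note about reference cycles together, one way to write a function that runs unchanged on both major versions is sketched below. parse_int is a hypothetical helper; the explicit del is one possible way to avoid keeping the exception (and its traceback) alive in a local variable.

```python
import sys


def parse_int(text):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        return int(text), None
    except ValueError:
        # Works on Python 2.5 and earlier as well as on Python 3.
        exc = sys.exc_info()[1]
        message = str(exc)
        # Drop the local reference so the exception and its traceback
        # do not form a reference cycle with this frame.
        del exc
        return None, message
```

Only the message string leaves the except block, so nothing that refers back to the frame is retained after the function returns.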
5 Other Resources
The authors of the following blog posts, wiki pages, and books deserve special thanks for making public their tips for porting Python 2 code to Python 3 (and thus helping provide information for this document):

http://python3porting.com/
http://docs.pythonsprints.com/python3_porting/py-porting.html
http://techspot.zzzeek.org/2011/01/24/zzzeek-s-guide-to-python-3-porting/
http://dabeaz.blogspot.com/2011/01/porting-py65-and-my-superboard-to.html
http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/
http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide/
http://wiki.python.org/moin/PortingPythonToPy3k
https://wiki.ubuntu.com/Python/3
If you feel there is something missing from this document that should be added, please email the python-porting mailing list.