This blog post walks through some interesting (and some common) modernization issues that came up during the Python 2 to 3 migration at work. The migration was prompted by Python 2.7 reaching its end of life on January 1, 2020.
MRO Algorithm Changed From DLR to C3 Linearization
Method Resolution Order (MRO) is the order in which a child (derived) class searches its parent (base) classes to resolve an invoked method or attribute. A well-defined order is essential for predictable and reproducible inheritance behavior.
In Python 2, the “Depth-first and Left-to-Right” (DLR) algorithm is used to evaluate multi-level inheritance patterns for classic (old-style) classes, meaning those that do not inherit from object; new-style classes have used C3 since Python 2.3. In DLR, lookup climbs from a class to its top-most (root) ancestor first, and only then moves left to right across the remaining parents at each level of descent.
Python 3, on the other hand, uses the C3 linearization algorithm, which guarantees that a class always comes before its parents and that parents keep the order in which they are listed (which can look BFS-like). Instead of resolving the top-most (root) class first, it only moves up to a parent once every child of that parent has been visited.
For example, here’s a commonly-used linear inheritance pattern.
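A minimal sketch of such a linear hierarchy, assuming a class A that defines method() and a class B that inherits from A without overriding it. The class bodies are illustrative, not the original listing.

```python
class A(object):
    def method(self):
        return "A.method"


class B(A):
    pass  # B does not define method(), so lookup falls through to A


print([cls.__name__ for cls in B.__mro__])  # ['B', 'A', 'object']
print(B().method())                         # A.method

try:
    B.no_method()  # defined nowhere along the MRO
except AttributeError as error:
    print("AttributeError:", error)
```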
In this example, both Python 2 and 3 produce identical MROs. Invoking B.method() executes its super A.method() implementation since method() is not defined in B. But if it tries to invoke, for instance, a B.no_method() static method, it will raise AttributeError, as that name is defined in none of B, A, or object.
Here’s another (diamond) pattern that’s used somewhat commonly as well.
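A minimal sketch of a diamond, assuming D inherits from B and C (in that order) and both B and C inherit from A. The method bodies are hypothetical and exist only to make the resolution visible. Note that the classes below are new-style, so they follow C3; the same hierarchy written as classic classes in Python 2 (class A: with no object base) would resolve D, B, A, C and return "A" instead.

```python
class A(object):
    def method(self):
        return "A"


class B(A):
    pass


class C(A):
    def method(self):
        return "C"


class D(B, C):
    pass


print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(D().method())                         # C
```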
The MROs of classes B and C are nearly identical, except that C starts its resolution from C instead of B.
But from class D, the MRO diverges between the Python 2 and Python 3 implementations. The MRO in Python 2 would look like this (for an explanation, please refer to the description above):
And in Python 3 (for an explanation, please refer to the description above):
As we can see, under DLR class D resolves depth-first up to the top-most class and then, at each level of descent, traverses left to right. Incidentally, if we reverse the order of D’s inheritance to go from C to B, the traversal becomes counter-clockwise but still follows the same concept.
Python 2:
Python 3:
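To compare both orders on a single Python 3 interpreter, the depth-first, left-to-right walk can be emulated by hand. This is only a sketch: dlr_order, c3_order, and DSwapped are hypothetical names, and the diamond is assumed to be D(B, C) with both B and C deriving from A. The implicit object base is left out so the output resembles classic classes.

```python
def dlr_order(cls):
    """Depth-first, left-to-right walk that keeps the first occurrence of each class."""
    order = []

    def visit(klass):
        if klass is object or klass in order:
            return
        order.append(klass)
        for base in klass.__bases__:  # bases in the order they were declared
            visit(base)

    visit(cls)
    return [klass.__name__ for klass in order]


def c3_order(cls):
    """The interpreter's own C3 linearization, minus the implicit object base."""
    return [klass.__name__ for klass in cls.__mro__ if klass is not object]


class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass
class DSwapped(C, B): pass  # same diamond with the bases reversed


print(dlr_order(D), c3_order(D))
# ['D', 'B', 'A', 'C'] ['D', 'B', 'C', 'A']
print(dlr_order(DSwapped), c3_order(DSwapped))
# ['DSwapped', 'C', 'A', 'B'] ['DSwapped', 'C', 'B', 'A']
```

A helper like this could also be run across a codebase's classes to flag any hierarchy where the two orders disagree.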
This is what happens if we resolve an “eight”-shaped inheritance pattern from F.
Python 2:
Python 3:
This is what happens if we resolve a “mixin”-style inheritance pattern from G.
Python 2 and Python 3:
This is what happens if we resolve a multi-inheritance pattern from K.
Python 2:
Python 3:
In summary, moving from the Python 2 to the Python 3 interpreter also changes how the MRO resolves class inheritance. If your OOP design is structured around multiple inheritance and deep hierarchies, it is a good time to double-check that the new resolution order does not break any existing expectations.
Bytes vs. Strings
One of the biggest pain points was having to deal with strings and bytes separately, especially around network packets and files. In Python 2, where bytes was simply an alias for str, operators could be applied to str and bytes interchangeably, as such:
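A minimal sketch of such a comparison, run on Python 3. The variable names are illustrative; on Python 2 the same comparison would be True and the concatenation would succeed.

```python
text = "abc"
raw = b"abc"

same = (text == raw)
print(same)  # False on Python 3: str and bytes never compare equal

try:
    text + raw  # mixing the two types is an error on Python 3
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```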
This equality operator would yield True in Python 2 and False in Python 3. The simplicity of treating binary data the same as strings was replaced by the requirement to use .encode() and .decode() to convert from one type to the other.
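As a rough illustration of the conversion, here is a round trip through UTF-8 on Python 3. The sample string and the choice of UTF-8 are assumptions; real network or file code should use whatever encoding the protocol or format specifies.

```python
message = "h\u00e9llo"               # text: 5 code points, one of them non-ASCII
payload = message.encode("utf-8")    # str -> bytes, e.g. before writing to a socket
restored = payload.decode("utf-8")   # bytes -> str, e.g. after reading from a file

print(payload)                       # b'h\xc3\xa9llo'
print(len(message), len(payload))    # 5 6
print(restored == message)           # True
```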
And a couple more relationships for convenience.
Memory Allocation
I thought this was interesting. I wrote a small routine that calculates the memory offset and direction between consecutive integers. The output differs between Python 2 and Python 3.
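A rough sketch of how such offsets could be measured, assuming CPython, where id() happens to return the object's memory address. The function name and row layout are hypothetical and this is not the original routine; the addresses it prints will differ on every run and platform.

```python
def address_offsets(values):
    """Return (value, address, offset, direction) rows for consecutive values.

    The offset is the previous address minus the current one, so a positive
    offset means the addresses are descending.
    """
    rows = []
    previous = None
    for value in values:
        address = id(value)  # CPython implementation detail: the object's address
        if previous is not None:
            offset = previous - address
            direction = "high -> low" if offset > 0 else "low -> high"
            rows.append((value, hex(address), offset, direction))
        previous = address
    return rows


for row in address_offsets(range(-2, 10)):
    print(row)
```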
Python 2:
a = -2 (0x55eaaa39bd20)
b = -1 (0x55eaaa39bd08)
a,b offset = 0x18
+---------------------------------------+-----+------------+-------------+
| mem_addr | int | offset | direction |
+---------------------------------------+-----+------------+-------------+
| 0x55eaaa39bd08 (-0.0000000000000568%) | -1 | 24 B | high -> low |
| 0x55eaaa39bcf0 (0.0000000000000000%) | 0 | 24 B | high -> low |
| 0x55eaaa39bcd8 (0.0000000000000568%) | 1 | 24 B | high -> low |
| 0x55eaaa39bcc0 (0.0000000000001137%) | 2 | 24 B | high -> low |
| 0x55eaaa39bca8 (0.0000000000001705%) | 3 | 24 B | high -> low |
| 0x55eaaa39bc90 (0.0000000000002274%) | 4 | 24 B | high -> low |
| 0x55eaaa39bc78 (0.0000000000002842%) | 5 | 24 B | high -> low |
| 0x55eaaa39bc60 (0.0000000000003411%) | 6 | 24 B | high -> low |
| 0x55eaaa39bc48 (0.0000000000003979%) | 7 | 24 B | high -> low |
| 0x55eaaa39bc30 (0.0000000000004547%) | 8 | 24 B | high -> low |
| 0x55eaaa39bc18 (0.0000000000005116%) | 9 | 24 B | high -> low |
...
| 0x55eaaa39b9c0 (0.0000000000019327%) | 34 | 24 B | high -> low |
| 0x55eaaa39b9a8 (0.0000000000019895%) | 35 | -1968 B | low -> high |
...
| 0x55eaaa39cd58 (0.0000000000136424%) | 240 | -1968 B | low -> high |
| 0x55eaaa39d508 (0.0000000000136993%) | 241 | 24 B | high -> low |
...
| 0x55eaaa39d3b8 (0.0000000000144951%) | 255 | 24 B | high -> low |
| 0x55eaaa39d3a0 (0.0000000000145519%) | 256 | -3585288 B | low -> high |
| 0x55eaaa7086e0 (0.0000000000146088%) | 257 | -240 B | low -> high |
+---------------------------------------+-----+------------+-------------+
Python 3:
a = -2 (0x953de0)
b = -1 (0x953e00)
a,b offset = -0x20
+--------------------------------------+-----+--------------------+-------------+
| mem_addr | int | offset | direction |
+--------------------------------------+-----+--------------------+-------------+
| 0x953e00 (-0.0000000000000568%) | -1 | -32 B | low -> high |
| 0x953e20 (0.0000000000000000%) | 0 | -32 B | low -> high |
| 0x953e40 (0.0000000000000568%) | 1 | -32 B | low -> high |
| 0x953e60 (0.0000000000001137%) | 2 | -32 B | low -> high |
| 0x953e80 (0.0000000000001705%) | 3 | -32 B | low -> high |
| 0x953ea0 (0.0000000000002274%) | 4 | -32 B | low -> high |
| 0x953ec0 (0.0000000000002842%) | 5 | -32 B | low -> high |
| 0x953ee0 (0.0000000000003411%) | 6 | -32 B | low -> high |
| 0x953f00 (0.0000000000003979%) | 7 | -32 B | low -> high |
| 0x953f20 (0.0000000000004547%) | 8 | -32 B | low -> high |
| 0x953f40 (0.0000000000005116%) | 9 | -32 B | low -> high |
...
| 0x955e00 (0.0000000000144951%) | 255 | -32 B | low -> high |
| 0x955e20 (0.0000000000145519%) | 256 | -139864651158032 B | low -> high |
| 0x7f34c76eb0d0 (0.0000000000146088%) | 257 | -64 B | low -> high |
+--------------------------------------+-----+--------------------+-------------+
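Not something that will affect runtime behavior (maybe a tiny performance difference?), but an interesting observation. The jump after 256 lines up with CPython's cache of preallocated small integers, which covers -5 through 256.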
Easier Type Hints
In Python 2, I personally disliked that there were no type annotations like in Java or C. As a result, I caught simple type mistakes too late, only when the buggy line was executed (Python checks types at run time rather than at compile time).
Although you could use tools like mypy and generate *.pyi stub files to define types separately, Python 3 allows for a more convenient syntax, as such:
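A minimal sketch of the inline annotation syntax. The function itself is a made-up example; the annotations are not enforced at run time, but a checker such as mypy can use them to flag a call like average("abc") before the code runs.

```python
from typing import List, Optional


def average(values: List[float]) -> Optional[float]:
    """Return the mean of values, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)


print(average([1.0, 2.0, 3.0]))      # 2.0
print(sorted(average.__annotations__))  # ['return', 'values']
```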
Print()
In Python 2, print can be written as a statement, but in Python 3 it is only available as a function.
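A small sketch of the function form on Python 3, with the Python 2 statement equivalents shown as comments. The report() helper is hypothetical. Python 2 code can opt into the same behavior with from __future__ import print_function at the top of the module.

```python
import sys


def report(message, stream=sys.stdout):
    # Python 2 statement form:  print >>stream, "status:", message
    # Python 3 function form, with the destination passed as a keyword argument:
    print("status:", message, file=stream)


# Python 2 statement form:  print "a", "b"
print("a", "b", sep=", ")  # a, b
report("ok")               # status: ok
```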
Finding this code pattern is easy with a simple grep.
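The same kind of search can be sketched in Python with a regular expression instead of grep. This is a heuristic, not the original command: the pattern and the find_print_statements() name are assumptions, and it only catches print followed by whitespace and something other than an opening parenthesis.

```python
import re

# Heuristic: "print", then whitespace, then anything except "(".
PRINT_STATEMENT = re.compile(r"^\s*print\s+[^\s(]")


def find_print_statements(source):
    """Return (line_number, line) pairs that look like Python 2 print statements."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if PRINT_STATEMENT.match(line):
            hits.append((number, line.strip()))
    return hits


sample = "\n".join([
    'print "hello"',
    'print("hello")',
    '    print >>sys.stderr, "oops"',
    'printer = 1',
])

print(find_print_statements(sample))
# [(1, 'print "hello"'), (3, 'print >>sys.stderr, "oops"')]
```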
Different Division Behaviors
In Python 2, the division operator “/” returns a floored value when both operands are integers.
To explicitly get a floating point, either numerator or denominator must be a float.
In Python 3, the default behavior of the division operator is to return a float.
But you can floor the value by using the double-slash operator “//”.
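A short sketch of both operators on Python 3. The variable names are illustrative. Note that “//” floors toward negative infinity rather than truncating toward zero, which matters for negative operands.

```python
true_div = 7 / 2       # 3.5  (Python 2 with two ints would give 3)
floor_div = 7 // 2     # 3
negative = -7 // 2     # -4   (floored, not truncated)
float_floor = 7.0 // 2 # 3.0  (floors, but keeps the float type)
exact = 6 / 3          # 2.0  ("/" returns a float even when the result is whole)

print(true_div, floor_div, negative, float_floor, exact)
```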
Pickle Serializations
In some places in our codebase, there were a few hard-coded references that explicitly set the pickle protocol version to 2. The reasoning was probably to stay consistent with other codebases, but it also meant with Python 3,
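For reference, a sketch of what pinning the protocol looks like on Python 3. PINNED_PROTOCOL and dump_pinned() are hypothetical names, not taken from the codebase described here. Protocol 2 is the highest version Python 2 can read, while Python 3 also offers newer protocols through pickle.DEFAULT_PROTOCOL and pickle.HIGHEST_PROTOCOL.

```python
import pickle

PINNED_PROTOCOL = 2  # hypothetical constant; the highest protocol Python 2 understands


def dump_pinned(obj):
    """Serialize obj with the pinned protocol instead of the interpreter default."""
    return pickle.dumps(obj, protocol=PINNED_PROTOCOL)


record = {"id": 1, "tags": ["a", "b"]}
payload = dump_pinned(record)
restored = pickle.loads(payload)

print(payload[:2])                 # b'\x80\x02' : the PROTO opcode followed by the version
print(restored == record)          # True
print(pickle.HIGHEST_PROTOCOL >= PINNED_PROTOCOL)  # True
```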
Long and int
In Python 2, there were two types of integers: long and int. A long could grow as large as system memory allowed, while an int was constrained to the size of a C integer (32 or 64 bits). But there are other differences as well.
In Python 3, these two were merged into a single int type.
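A quick sketch of the merged type on Python 3. The values are arbitrary; sys.maxsize is used here only as a stand-in for the old machine-word boundary.

```python
import sys

big = 2 ** 64                # larger than any 64-bit C integer
beyond_word = sys.maxsize + 1  # no overflow and no separate long type

print(type(big).__name__)          # int
print(type(beyond_word).__name__)  # int
print(big > sys.maxsize)           # True
# The Python 2 literal suffix (for example 10L) is a SyntaxError on Python 3.
```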
map, reduce, filter, and range
The three functions map(), reduce(), and filter() were my go-tos in Python 2, as they abstracted for loops away into a simple func(func, iterable) call. In Python 2, all three evaluated immediately and returned the result (a list for map() and filter()).
In Python 3, however, map() and filter() simply return a lazy iterator to consume later.
And reduce was removed from the built-ins; it now lives in functools.
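A minimal sketch of the Python 3 behavior, using made-up sample data. The lazy objects can be consumed only once, so code that indexes the result or iterates over it twice needs an explicit list(). range() is lazy in the same spirit, although a range object can be iterated repeatedly.

```python
from functools import reduce  # no longer a built-in on Python 3

numbers = [1, 2, 3, 4]

squares = map(lambda n: n * n, numbers)        # lazy map object, nothing computed yet
evens = filter(lambda n: n % 2 == 0, numbers)  # lazy filter object

first_pass = list(squares)   # [1, 4, 9, 16]
second_pass = list(squares)  # [] because the iterator is already exhausted
even_values = list(evens)    # [2, 4]
total = reduce(lambda a, b: a + b, numbers)  # 10

print(first_pass, second_pass, even_values, total)
print(range(3), list(range(3)))  # range(0, 3) [0, 1, 2]
```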