This post covers some interesting issues that came up while migrating from Python 2.x to Python 3.

MRO Algorithm Changed From DLR to C3 Linearization

Method Resolution Order (MRO) is the logical path for a child class to follow to resolve an invoked method or an attribute. Having a deterministic order is essential to produce predictable and reproducible class inheritance behaviors.

In Python 2, classic (old-style) classes, meaning those that do not inherit from object, use the “Depth-first and Left-to-Right” (DLR) algorithm to resolve multiple inheritance. With DLR, the lookup climbs from a class all the way to its top-most ancestor before moving left-to-right to the next base class. New-style classes in Python 2 (those that inherit from object) and all classes in Python 3 use the C3 linearization algorithm instead. C3 guarantees that a class always comes before its parents and that the left-to-right order of the bases is preserved, so a shared ancestor is visited only after all of the classes that inherit from it.

Linear Inheritance

Here’s the commonly used linear inheritance pattern, which results in an identical resolution order in both versions.

Invoking B.method() executes A.method() since method() is not defined in class B. Invoking B.no_method() raises AttributeError, as expected, because no_method() is defined in neither B nor A.
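A minimal sketch of the linear case described above, assuming A defines method() and B(A) adds nothing of its own. The method body and the helper variable names are made up for illustration.

```python
class A:
    def method(self):
        return "A.method"


class B(A):
    pass  # B defines nothing, so every lookup falls through to A


b = B()
result = b.method()  # resolved on A because B has no method()

# The resolution order can be inspected directly on the class.
linear_mro = [cls.__name__ for cls in B.__mro__]

# no_method() exists on neither B nor A, so the lookup fails.
try:
    b.no_method()
    missing_raised = False
except AttributeError:
    missing_raised = True
```

In Python 3 the order also ends with the implicit object base class, which is why it shows up at the tail of __mro__.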

Diamond Inheritance

MRO from B(A) or C(A)

Python 2:
  B -> A

Python 3:
  C -> A

MRO from D(B, C)

Python 2:
  D -> B -> A -> C

Python 3:
  D -> B -> C -> A

MRO from D(C, B)

Python 2:
  D -> C -> A -> B

Python 3:
  D -> C -> B -> A
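
The Python 3 orders listed above can be checked through the __mro__ attribute. This is a small sketch under the assumption that the diamond is A at the top, B(A) and C(A) in the middle, and D at the bottom. The who() method, the D2 class, and the mro_names() helper are hypothetical names added for the demonstration.

```python
class A:
    def who(self):
        return "A"


class B(A):
    pass


class C(A):
    def who(self):
        return "C"


class D(B, C):
    pass


class D2(C, B):  # same diamond with the bases swapped
    pass


def mro_names(cls):
    """Return the class names in resolution order, leaving out object."""
    return [k.__name__ for k in cls.__mro__ if k is not object]


order_bc = mro_names(D)   # C3 puts C ahead of the shared ancestor A
order_cb = mro_names(D2)
winner = D().who()        # found on C before A is ever consulted
```

Under the depth-first D -> B -> A -> C order, the same who() call would have been answered by A, which is the kind of silent behavior change worth checking for.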

“Eight” Inheritance

MRO from F(D)

Python 2:
  F -> D -> B -> A -> C

Python 3:
  F -> D -> B -> C -> A

“Mix-in” Inheritance

MRO from G(D, E)

Python 2:
  G -> D -> A -> E -> B

Python 3:
  G -> D -> A -> E -> B

Moving from Python 2 to Python 3 can therefore change which method a class hierarchy resolves to. If your code is structured around multiple inheritance and deep hierarchies, it is a good time to double-check that the new resolution order does not break any existing expectations.

Bytes vs. Strings

One of the biggest pain points is dealing with strings and bytes, especially when parsing network packets and reading files. In Python 2, bytes is simply an alias for str, so the two can be used interchangeably.

b'a' * 100 == 'a' * 100  # True

In Python 3 this comparison evaluates to False, because str (text) and bytes (binary data) are distinct types. Converting between them requires an explicit str.encode() or bytes.decode().
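
A short sketch of the explicit conversions under Python 3. The sample string and the choice of UTF-8 are assumptions for the example; real code should use whatever encoding the protocol or file format dictates.

```python
text = "héllo"

# str -> bytes: encoding turns text into raw bytes.
raw = text.encode("utf-8")

# bytes -> str: decoding turns raw bytes back into text.
round_trip = raw.decode("utf-8")

# The two types never compare equal, even for plain ASCII content.
mixed_equal = (b"a" * 100 == "a" * 100)

# Mixing them in an operation is an error rather than a silent coercion.
try:
    b"a" + "a"
    concat_failed = False
except TypeError:
    concat_failed = True
```

Note that the encoded form is longer than the text here, since the accented character takes two bytes in UTF-8.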

And a couple more relationships.

Memory Allocation

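I wrote the small routine below to print the memory addresses of consecutive integers, the offset between each neighboring pair, and the direction in which the addresses move.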

from __future__ import print_function
from prettytable import PrettyTable  # third-party package: prettytable

# (16 * 2) << 39 == 2**44; only used to scale the value shown next to the address
MEMORY_UPPER_LIMIT = float(16 * 2 << 39)

a = -2
b = a + 1
print('a = %s (%s)' % (a, hex(id(a))))
print('b = %s (%s)' % (b, hex(id(b))))
print('a,b offset = %s' % hex(id(a) - id(b))) # 24 bytes offset???

x = a
y = b
t = PrettyTable(['mem_addr', 'int', 'offset', 'direction'])

# Walk x from -1 up to 257, with y always one ahead of x.
for i in range(259):
    x += 1
    y += 1
    # In CPython, id() returns the memory address of the object.
    offset = id(x) - id(y)
    direction = (
        'high -> low'
        if offset > 0
        else 'low -> high')
    t.add_row([
        '%s (%0.16f%%)' % (hex(id(x)), x / MEMORY_UPPER_LIMIT),
        x,
        '%s B' % (offset),
        direction])

print(t)

Python 2

a = -2 (0x55eaaa39bd20)
b = -1 (0x55eaaa39bd08)
a,b offset = 0x18

mem_addr         int   offset        direction
0x55eaaa39bd08   -1    24 B          high -> low
0x55eaaa39bcf0   0     24 B          high -> low
0x55eaaa39bcd8   1     24 B          high -> low
0x55eaaa39bcc0   2     24 B          high -> low
0x55eaaa39bca8   3     24 B          high -> low
0x55eaaa39bc90   4     24 B          high -> low
0x55eaaa39bc78   5     24 B          high -> low
0x55eaaa39bc60   6     24 B          high -> low
0x55eaaa39bc48   7     24 B          high -> low
0x55eaaa39bc30   8     24 B          high -> low
0x55eaaa39bc18   9     24 B          high -> low
...
0x55eaaa39b9c0   34    24 B          high -> low
0x55eaaa39b9a8   35    -1968 B       low -> high
...
0x55eaaa39cd58   240   -1968 B       low -> high
0x55eaaa39d508   241   24 B          high -> low
...
0x55eaaa39d3b8   255   24 B          high -> low
0x55eaaa39d3a0   256   -3585288 B    low -> high
0x55eaaa7086e0   257   -240 B        low -> high

Python 3

a = -2 (0x953de0)
b = -1 (0x953e00)
a,b offset = -0x20

mem_addr         int   offset               direction
0x953e00         -1    -32 B                low -> high
0x953e20         0     -32 B                low -> high
0x953e40         1     -32 B                low -> high
0x953e60         2     -32 B                low -> high
0x953e80         3     -32 B                low -> high
0x953ea0         4     -32 B                low -> high
0x953ec0         5     -32 B                low -> high
0x953ee0         6     -32 B                low -> high
0x953f00         7     -32 B                low -> high
0x953f20         8     -32 B                low -> high
0x953f40         9     -32 B                low -> high
...
0x955e00         255   -32 B                low -> high
0x955e20         256   -139864651158032 B   low -> high
0x7f34c76eb0d0   257   -64 B                low -> high

Print

In Python 2, print can be written as a statement; in Python 3, it is only available as a function.

print "Hello"   # Python 2
print("Hello")  # Python 2 and Python 3

Finding this code pattern is easy with a simple grep.

fgrep -Hr 'print ' .

Different Division Behaviors

In Python 2, the division operator “/” returns a floored value when both operands are integers.

>>> 1/2  # 0

To explicitly get a floating-point result, either the numerator or the denominator must be a float.

>>> 1/2.  # 0.5

In Python 3, the default behavior of the division operator is to return a float.

>>> 1/2  # 0.5

You can still get the floored value by using the floor division operator “//”.

>>> 1//2  # 0
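
The Python 3 behaviors above can be put side by side in a short sketch. The variable names are illustrative only. One detail that tends to matter during a migration is that “//” floors toward negative infinity rather than truncating toward zero.

```python
true_div = 1 / 2           # true division always yields a float
even_div = 4 / 2           # still a float, even when the result is whole
floor_div = 1 // 2         # floor division keeps the int type
negative_floor = -1 // 2   # floors toward negative infinity, not toward zero
float_floor = 1.0 // 2     # floor division with a float operand yields a float
```

Code that relied on Python 2's integer “/” can usually be ported by switching those call sites to “//”, but each one is worth reviewing when negative operands are possible.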

map, reduce, filter, range

map(), reduce(), and filter() were my go-tos in Python 2 as they abstracted away for loops into a simple func(func, iterable). In Python 2, all three evaluated eagerly and returned the result right away: a list for map() and filter(), and a single value for reduce().

>>> map(lambda e: e+1, range(10))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In Python 3, however, map() and filter() return a lazy iterator object that produces values only when it is consumed.

>>> map(lambda x: x+1, range(10))
<map object at 0x7fa1fc2032b0>

And reduce() is no longer a built-in; it now lives in the functools module as functools.reduce().
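
As a rough sketch of how these calls behave under Python 3, the snippet below wraps the lazy results in list() to get the Python 2 style output back and imports reduce from functools. The lambdas and variable names are just for illustration.

```python
from functools import reduce

# map() returns a one-shot iterator; list() forces evaluation.
lazy = map(lambda x: x + 1, range(10))
as_list = list(lazy)
exhausted = list(lazy)  # the iterator is already consumed at this point

# filter() is lazy in the same way.
evens = list(filter(lambda x: x % 2 == 0, range(10)))

# reduce() behaves as before once it is imported.
total = reduce(lambda acc, x: acc + x, range(10), 0)

# range() itself is lazy too: it returns a range object, not a list.
range_is_list = isinstance(range(10), list)
```

The one-shot nature of the iterator is the usual source of bugs: code that loops over a map() result twice silently sees nothing the second time.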

Long and int

In Python 2, there were two integer types: int and long. A long could grow as large as system memory allowed, while an int was bounded by the size of a C long (32 or 64 bits), among other differences.

In Python 3, these two were merged into a single int type.
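
A small sketch of the merged type in Python 3. The values are arbitrary examples; sys.maxsize is used here only as a convenient word-sized boundary to step across.

```python
import sys

big = 2 ** 100                  # far beyond any 64-bit machine integer
beyond_word = sys.maxsize + 1   # crossing the word-size boundary changes nothing

big_type = type(big).__name__
beyond_type = type(beyond_word).__name__
```

There is no separate long type and no trailing L suffix in Python 3, so a Python 2 literal such as 10L is a syntax error and has to be rewritten as 10.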