@initbar | Python 2/3 Modernization

This post goes over some interesting modernization issues that came up during a Python 2.x to Python 3 migration.

MRO Algorithm Changed From DLR to C3 Linearization

Method Resolution Order (MRO) is the logical path for a child class to follow to resolve an invoked method or an attribute. Having a deterministic order is essential to produce predictable and reproducible class inheritance behaviors.

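In Python 2, classic classes (those that do not inherit from object) are resolved with the “Depth-first and Left-to-Right” (DLR) algorithm: the lookup climbs from the child all the way up to the top-most ancestor of the first base before moving on to the next base, left to right. New-style classes, which inherit from object, have used C3 since Python 2.3. In Python 3, every class is new-style, so the C3 algorithm always applies. C3 guarantees that a class is visited before any of its parents and that the order of the bases listed in each class statement is preserved. Instead of resolving the top-most (root) ancestor first, a shared ancestor is resolved only after all of the children that inherit from it.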

Linear Inheritance

Here’s the commonly used linear inheritance pattern, which results in an identical resolution order in both versions.

Invoking B.method() executes A.method() since method() is not defined in class B. Invoking B.no_method() raises AttributeError, as expected, because no_method() is defined in neither B nor A.
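A minimal sketch of the linear case described above. The class bodies are assumptions made for illustration (the original listing is not reproduced here); only the names A, B, method, and no_method come from the text.

```python
class A(object):
    def method(self):
        return "A.method"


class B(A):
    pass  # B defines nothing, so every lookup falls through to A


b = B()
result = b.method()  # resolved on A because B has no method()

# The resolution order as the interpreter sees it.
mro = [klass.__name__ for klass in B.__mro__]

try:
    b.no_method()  # defined in neither B nor A
except AttributeError:
    missing = True
else:
    missing = False

print(result, mro, missing)
```

With a single chain of parents there is only one possible path, which is why both algorithms agree here.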

Diamond Inheritance

MRO from B(A) or C(A)

Python 2:
  B -> A

Python 3:
  C -> A

MRO from D(B, C)

Python 2:
  D -> B -> A -> C

Python 3:
  D -> B -> C -> A

MRO from D(C, B)

Python 2:
  D -> C -> A -> B

Python 3:
  D -> C -> B -> A
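The two orders above can be reproduced with a small sketch. The dlr() helper is a hypothetical emulation of the classic depth-first, left-to-right walk (it is not an interpreter API), c3() simply reads the interpreter's own `__mro__`, and D2 is an illustrative name for the D(C, B) variant.

```python
def dlr(cls):
    """Depth-first, left-to-right order, keeping the first visit of each class."""
    order = []

    def visit(klass):
        if klass is object or klass in order:
            return
        order.append(klass)
        for base in klass.__bases__:
            visit(base)

    visit(cls)
    return [klass.__name__ for klass in order]


def c3(cls):
    """Order used by the interpreter for new-style classes, without object."""
    return [klass.__name__ for klass in cls.__mro__ if klass is not object]


class A(object): pass
class B(A): pass
class C(A): pass
class D(B, C): pass
class D2(C, B): pass  # hypothetical name for the D(C, B) variant


print(dlr(D), c3(D))
print(dlr(D2), c3(D2))
```

The only difference is where the shared ancestor A lands: DLR reaches it through the first base, while C3 holds it back until both B and C have been visited.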

“Eight” Inheritance

MRO from F(D)

Python 2:
  F -> D -> B -> A -> C

Python 3:
  F -> D -> B -> C -> A
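To see what the C3 order means in practice, here is a hedged sketch where every class forwards to super(). It assumes F derives only from the diamond's D, as the MRO above suggests; the trace() method is a hypothetical name used purely for illustration.

```python
class A(object):
    def trace(self):
        return ["A"]


class B(A):
    def trace(self):
        return ["B"] + super(B, self).trace()


class C(A):
    def trace(self):
        return ["C"] + super(C, self).trace()


class D(B, C):
    def trace(self):
        return ["D"] + super(D, self).trace()


class F(D):
    def trace(self):
        return ["F"] + super(F, self).trace()


# super() follows the MRO of the instance's class, so B hands over to C,
# not directly to its own parent A.
calls = F().trace()
print(" -> ".join(calls))
```

Each class runs exactly once, and A runs last even though B inherits from it directly.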

“Mix-in” Inheritance

MRO from G(D, E)

Python 2:
  G -> D -> A -> E -> B

Python 3:
  G -> D -> A -> E -> B

Because Python 3 resolves every class with C3, the interpreter change also affects how inheritance is resolved for code that was written as classic classes. If your OOP design is structured around multiple inheritance and deep hierarchies, it is a good time to double-check that the new resolution order does not break any existing expectations.

Bytes vs. Strings

One of the biggest pain points is having to deal with strings and bytes, especially when parsing network packets and reading files. In Python 2, bytes is simply an alias for str, so the two can be used interchangeably and compare equal.

b'a' * 100 == 'a' * 100  # True

This comparison now yields False in Python 3, where str holds Unicode text and bytes holds raw binary data. Converting from one type to the other now requires an explicit .encode() or .decode().

And a couple more relationships.
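A few of these str/bytes relationships, sketched under the assumption that the code runs on Python 3. The variable names are illustrative only.

```python
text = "caf\u00e9"                  # str: four Unicode code points
data = text.encode("utf-8")         # str -> bytes (five bytes in UTF-8)
round_trip = data.decode("utf-8")   # bytes -> str

# bytes and str never compare equal on Python 3.
same = (b"a" * 100 == "a" * 100)

# Indexing bytes yields an int, not a one-character string.
first_byte = b"abc"[0]

# Mixing the two types in an operation raises TypeError.
try:
    b"abc" + "def"
except TypeError:
    mixed_concat_fails = True
else:
    mixed_concat_fails = False

print(data, round_trip, same, first_byte, mixed_concat_fails)
```

When porting parsers, the indexing change tends to be the least obvious one, since the code keeps running but starts comparing integers against strings.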

Memory Allocation

I wrote a small routine below that calculates memory offsets and direction.

from __future__ import print_function
from prettytable import PrettyTable

# (16 * 2) << 39 == 2**44; used to scale the value shown next to each address.
MEMORY_UPPER_LIMIT = float(16 * 2 << 39)


def direction_of(offset):
    # A positive offset means x (the smaller int) sits at a higher address
    # than y (the next int), so memory is walked from high to low.
    return 'high -> low' if offset > 0 else 'low -> high'


a = -2
b = a + 1
print('a = %s (%s)' % (a, hex(id(a))))
print('b = %s (%s)' % (b, hex(id(b))))
print('a,b offset = %s' % hex(id(a) - id(b)))  # 24 bytes offset???

x = a
y = b
t = PrettyTable(['mem_addr', 'int', 'offset', 'direction'])

# Walk x from -1 up to 257 (259 steps) with y always one ahead, and record
# how far apart the two consecutive int objects are in memory.
for i in range(259):
    x += 1
    y += 1
    offset = id(x) - id(y)
    t.add_row([
        '%s (%0.16f%%)' % (hex(id(x)), x / MEMORY_UPPER_LIMIT),
        x,
        '%s B' % (offset),
        direction_of(offset)])

print(t)

Python 2

a = -2 (0x55eaaa39bd20)
b = -1 (0x55eaaa39bd08)
a,b offset = 0x18

mem_addr	int	offset	direction
0x55eaaa39bd08	-1	24 B	high -> low
0x55eaaa39bcf0	0	24 B	high -> low
0x55eaaa39bcd8	1	24 B	high -> low
0x55eaaa39bcc0	2	24 B	high -> low
0x55eaaa39bca8	3	24 B	high -> low
0x55eaaa39bc90	4	24 B	high -> low
0x55eaaa39bc78	5	24 B	high -> low
0x55eaaa39bc60	6	24 B	high -> low
0x55eaaa39bc48	7	24 B	high -> low
0x55eaaa39bc30	8	24 B	high -> low
0x55eaaa39bc18	9	24 B	high -> low
0x55eaaa39b9c0	34	24 B	high -> low
0x55eaaa39b9a8	35	-1968 B	low -> high
0x55eaaa39cd58	240	-1968 B	low -> high
0x55eaaa39d508	241	24 B	high -> low
0x55eaaa39d3b8	255	24 B	high -> low
0x55eaaa39d3a0	256	-3585288 B	low -> high
0x55eaaa7086e0	257	-240 B	low -> high

Python 3

a = -2 (0x953de0)
b = -1 (0x953e00)
a,b offset = -0x20

mem_addr	int	offset	direction
0x953e00	-1	-32 B	low -> high
0x953e20	0	-32 B	low -> high
0x953e40	1	-32 B	low -> high
0x953e60	2	-32 B	low -> high
0x953e80	3	-32 B	low -> high
0x953ea0	4	-32 B	low -> high
0x953ec0	5	-32 B	low -> high
0x953ee0	6	-32 B	low -> high
0x953f00	7	-32 B	low -> high
0x953f20	8	-32 B	low -> high
0x953f40	9	-32 B	low -> high
0x955e00	255	-32 B	low -> high
0x955e20	256	-139864651158032 B	low -> high
0x7f34c76eb0d0	257	-64 B	low -> high

Print

In Python 2, print can be written as a statement; in Python 3, it exists only as a function.

print "Hello"   # Python 2
print("Hello")  # Python 2 and Python 3
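A minimal sketch of code that behaves the same on both versions, assuming the `__future__` import is acceptable in your codebase (it is a no-op on Python 3). The Collector class is a hypothetical stand-in for any file-like object.

```python
from __future__ import print_function


class Collector(object):
    """Tiny file-like object that records everything written to it."""

    def __init__(self):
        self.parts = []

    def write(self, chunk):
        self.parts.append(chunk)


sink = Collector()

# sep, end, and file are keyword arguments of the print function;
# the print statement has no equivalent for sep.
print("Hello", "world", sep=", ", end="!\n", file=sink)

output = "".join(sink.parts)
print(repr(output))
```

Adding the import to each module first lets the print calls be converted on Python 2, before the interpreter itself is switched.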

Finding this code pattern is easy with a simple grep.

fgrep -Hr 'print ' .

Different Division Behaviors

In Python 2, the division operator “/” returns a floored value when both operands are integers.

>>> 1/2  # 0

To explicitly get a floating-point result, either the numerator or the denominator must be a float.

>>> 1/2.  # 0.5

In Python 3, the default behavior of the division operator is to return a float.

>>> 1/2  # 0.5

But you can still floor the value by using the double-slash operator “//”.

>>> 1//2  # 0

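The division rules above can be pinned down in a short sketch. The `__future__` import gives Python 2 the Python 3 behavior and is a no-op on Python 3; the variable names are illustrative.

```python
from __future__ import division

true_div = 1 / 2          # always a float with true division
floor_div = 1 // 2        # floor division on ints stays an int
neg_floor = -1 // 2       # floors toward negative infinity, not toward zero
truncated = int(-1 / 2)   # int() truncates toward zero instead
float_floor = 1.0 // 2    # floor division on a float stays a float

print(true_div, floor_div, neg_floor, truncated, float_floor)
```

The negative case is worth checking during a migration: replacing an old integer “/” with “//” preserves the Python 2 result, while wrapping the new “/” in int() does not for negative operands.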
`map`, `reduce`, `filter`, `range`

map(), reduce(), and filter() were my go-tos in Python 2 as they abstracted for loops away into a simple func(func, iterable) call. In Python 2, all three were evaluated immediately and returned the result: a list for map() and filter(), and a single value for reduce().

>>> map(lambda e: e+1, range(10))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

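In Python 3, however, map() and filter() return a lazy iterator that produces values only when it is consumed, and range() returns a lazy range object instead of a list.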

>>> map(lambda x: x+1, range(10))
<map object at 0x7fa1fc2032b0>

And reduce() was moved out of the built-ins and into the functools module.
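A short sketch of the usual fixes, assuming Python 3: wrap the call in list() when a list is really needed, and import reduce from functools (the import also works on Python 2.6 and later).

```python
from functools import reduce

incremented = map(lambda e: e + 1, range(10))
first_pass = list(incremented)    # consumes the iterator
second_pass = list(incremented)   # nothing left the second time

evens = list(filter(lambda e: e % 2 == 0, range(10)))
total = reduce(lambda acc, e: acc + e, range(10), 0)

print(first_pass, second_pass, evens, total)
```

The empty second pass is the behavior most likely to surprise ported code that iterates over the same map() result twice.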

Long and int

In Python 2, there were two types of integers: long and int. A long could grow as large as system memory allowed. An int was constrained by the size of a C long (32 or 64 bits, see sys.maxint), plus other differences.

In Python 3, these two were merged into a single int type.
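A minimal sketch of the merged type, assuming Python 3. Note that sys.maxint no longer exists there; sys.maxsize is used below only as a convenient large value, not as a limit on int.

```python
import sys

big = 2 ** 64                 # would have been a long on Python 2
is_int = type(big) is int     # a plain int on Python 3
bits = big.bit_length()       # number of bits needed to represent it

beyond = sys.maxsize + 1      # no overflow and no separate type
still_int = type(beyond) is int

print(big, is_int, bits, beyond, still_int)
```

Code that checked isinstance(value, (int, long)) or used the trailing L literal suffix needs updating, since neither long nor the suffix exists in Python 3.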
