Back in 2022, I participated in Google’s foo.bar challenge and wrote a short writeup. While looking through my solutions, I noticed that my solution for Bunny Worker Locations could benefit from a small improvement: it can produce the same answers much faster, in O(1) time.

Analysis of the Previous Solution

def solution(x, y):
    # i-th x on x-coordinate with y=0 will always be the sum.
    xn = sum(range(1, x+1))

    # Deltas from the y-coordinate growth series.
    yn = sum(range(1, y-1))

    # Organic growth from the y-coordinate.
    return xn + yn + (x * (y-1))
  • Line 3: Time complexity of O(x). Although range() in Python 3 is a lazy sequence, sum() still has to iterate over every element of [1, x+1) (i.e., 1 through x) when it evaluates.

  • Line 6: Time complexity of O(y), for the same reason as above.

  • Line 9: Time complexity of O(1), as it’s simple arithmetic.

So this function runs at O(x + y) time complexity.

New Solution using Triangular Number Formula

A triangular number, as Wikipedia defines it, is “the nth … is the number of dots in the triangular arrangement with n dots on each side, and is equal to the sum of the n natural numbers from 1 to n”. So what I’m doing in lines 3 and 6 can be expressed using the shortcut:

Tn = 1 + 2 + ⋯ + n = n(n + 1) / 2

Or as code:

xn = (x * (x + 1)) // 2  # xn = sum(range(1, x+1))

But you might be wondering: why not just apply the distributive property and write it as xn = (x**2 + x) // 2? If we disassemble these two implementations in Python, we can see the following:

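The disassembly is easy to reproduce with the standard-library dis module. A minimal sketch (tri_mul and tri_pow are just illustrative names; note that opcode names vary across CPython versions — 3.11 and later fold the arithmetic opcodes into a generic BINARY_OP, so the names discussed below are from 3.10 and earlier):

```python
import dis

def tri_mul(x):
    # Triangular number via multiplication.
    return (x * (x + 1)) // 2

def tri_pow(x):
    # Same value via exponentiation.
    return (x**2 + x) // 2

dis.dis(tri_mul)
dis.dis(tri_pow)
```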
Both implementations compile to roughly the same opcodes; the difference is:

  • xn = (x * (x + 1)) // 2

    • BINARY_ADD -> BINARY_MULTIPLY -> BINARY_FLOOR_DIVIDE
  • xn = (x**2 + x) // 2

    • BINARY_POWER -> BINARY_ADD -> BINARY_FLOOR_DIVIDE

BINARY_POWER (i.e., pow()) is more expensive than BINARY_MULTIPLY, so the first form is preferred if we’re squeezing every bit of performance (and simplicity) out of the code.
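This claim is easy to sanity-check with timeit. A rough sketch (the value of x and the repeat count are arbitrary; the margin is small and can vary between builds, so treat the numbers as indicative only):

```python
import timeit

x = 12345
# Time each expression a million times with the same operand.
t_mul = timeit.timeit("(x * (x + 1)) // 2", globals={"x": x}, number=1_000_000)
t_pow = timeit.timeit("(x**2 + x) // 2", globals={"x": x}, number=1_000_000)
print(f"multiply: {t_mul:.4f}s  power: {t_pow:.4f}s")
```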

Now, filling in the remaining part for the y series, the new solution becomes:

def solution(x, y):
    # i-th x on x-coordinate with y=0 will always be the sum.
    xn = (x * (x + 1)) // 2

    # Deltas from the y-coordinate growth series. The `if` guards for when `y < 2`,
    # as multiplying two negative factors would output a positive number (which
    # leads to an off-by-one error).
    yn = ((y - 2) * (y - 1)) // 2 if y >= 2 else 0

    # Organic growth from the y-coordinate.
    return xn + yn + (x * (y - 1))

The entire solution is now a single chain of arithmetic with no function calls.
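Before benchmarking, it’s worth a quick check that the two versions agree everywhere (solution_old and solution_new are just labels for this sketch):

```python
def solution_old(x, y):
    # Previous solution: O(x + y) due to the two sums.
    return sum(range(1, x + 1)) + sum(range(1, y - 1)) + (x * (y - 1))

def solution_new(x, y):
    # New solution: pure arithmetic, O(1).
    xn = (x * (x + 1)) // 2
    yn = ((y - 2) * (y - 1)) // 2 if y >= 2 else 0
    return xn + yn + (x * (y - 1))

# The two agree on every cell of a 200x200 grid.
assert all(
    solution_old(x, y) == solution_new(x, y)
    for x in range(1, 201)
    for y in range(1, 201)
)
print(solution_new(3, 2))  # -> 9
```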

Benchmark

As (x, y) grows larger, the previous solution (a()) slows down considerably, whereas the new one (b()) stays constant at O(1)!
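The benchmark can be reproduced with something like the following sketch — the input sizes and repeat count are arbitrary choices, and a()/b() mirror the two solutions above:

```python
import timeit

def a(x, y):
    # Previous solution: O(x + y).
    return sum(range(1, x + 1)) + sum(range(1, y - 1)) + (x * (y - 1))

def b(x, y):
    # New solution: O(1) arithmetic.
    xn = (x * (x + 1)) // 2
    yn = ((y - 2) * (y - 1)) // 2 if y >= 2 else 0
    return xn + yn + (x * (y - 1))

# Time both at increasingly large coordinates; a() grows with n, b() doesn't.
for n in (10**4, 10**5, 10**6):
    ta = timeit.timeit(lambda: a(n, n), number=10)
    tb = timeit.timeit(lambda: b(n, n), number=10)
    print(f"x = y = {n:>9,}: a() {ta:.6f}s  b() {tb:.6f}s")
```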