Tuesday, January 26, 2016

heapq and Missing Features

You sometimes need a bit more convenience than a barrel to store your water.

Recently, I wrote about heaps in Python and promised a sequel. Here, you are. This time, we ponder over the shortcomings of Python's heap module, heapq.

So, what's wrong with it? Nothing at all if you have the basic needs: fast push and pop onto a heap. However, as usual, at a certain point, you want more features in your program which in turn set the requirements higher for the heap implementation you use. If that happens, you usually
put a question online on StackOverflow or some mailing lists; or even more likely, because you are almost certainly not the first one having the issue at hand, you find several of them already (answered) there:
  1. removal of an item [1] [2] [3]
  2. different order of items (e.g. max-heaps) [4] [5] [6]
  3. object-oriented heaps [7]
Why would you need these features in the first place?

Consider the following situation: I have a set of items (let's call them tasks). I need to sort them by two independent criteria (priority and deadline). Naturally, I used two heaps to achieve that. The priority heap for working on the next most important tasks and the deadline heap for removing items which are too old. Furthermore, users can cancel arbitrary tasks at their discretion at any time.

So, we need to remove items from anywhere within in the heaps and we need two heaps of the same items but with a different sorting criteria. By sifting through the proposals, I realized I don't want to clutter the rest of my code to implement the missing functionality. I want a object-oriented heap implementation which encapsulates all of the magic involved to perform fast removal and ordering.

That's something heapq cannot deliver out of the box.

What can we do?

Realizing that heapq is a blazingly fast implementation, an object-oriented version should not slow you down. On the other hand considering the cleanliness of your code, I find heapq equally unacceptable to be used in production for problems described here and on StackOverflow. Considering the sheer amount of re-asking similar questions on platforms like StackOverflow is an indicator that the stdlib could do better here.

So, I went out to solve the issue once and for all; but that's another post. ;)

Best,
Sven

No comments:

Post a Comment