--- layout: post title: Replacing Objects in Python tags: Python description: More reflection than you cared to ask for draft: true --- Today, we're going to demonstrate a fairly evil thing in Python, which I call _object replacement_. Say you have some program that's been running for a while, and a particular object has made its way throughout your code. It lives inside lists, class attributes, maybe even inside some closures. You want to completely replace this object with another one; that is to say, you want to find all references to object `A` and replace them with object `B`, enabling `A` to be garbage collected. This has some interesting implications for special object types. If you have methods that are bound to `A`, you want to rebind them to `B`. If `A` is a class, you want all instances of `A` to become instances of `B`. And so on. _But why on Earth would you want to do that?_ you ask. I'll focus on a concrete use case in a future post, but for now, I imagine this could be useful in some kind of advanted unit testing situation with mock objects. Still, it's fairly insane, so let's leave it as primarily an intellectual exercise. ## Review First, a recap on terminology here. You can skip this section if you know Python well. In Python, _names_ are what most languages call "variables". They reference _objects_. So when we do: {% highlight python %} a = [1, 2, 3, 4] {% endhighlight %} We are creating a list object with four integers, and binding it to the name `a`: %3L[1, 2, 3, 4]aaa->L In each of the following examples, we are creating new _references_ to the list object, but we are never duplicating it. Each reference points to the same memory address (which you can get using `id(a)`, but that's a CPython implementation detail). {% highlight python %} b = a {% endhighlight %} {% highlight python %} c = SomeContainerClass() c.data = a {% endhighlight %} {% highlight python %} def wrapper(L): def inner(): return L.pop() return inner d = wrapper(a) {% endhighlight %} %3cluster0dobj[1, 2, 3, 4]aaa->objbbb->objcc.datac->objLLL->obj Note that these references are all equal. `a` is no more valid a name for the list than `b`, `c.data`, or `L` (from the perspective of `d`, which is exposed to everyone else as `d.func_closure[0].cell_contents`, but that's cumbersome and you would never do that in practice). As a result, if you delete one of these references—explicitly with `del a`, or implicitly if a name goes out of scope—then the other references are still around, and object continues to exist. If all of an object's references disappear, then Python's garbage collector should eliminate it. ## Fishing for references with Guppy So, this boils down to finding all of the references to a particular object, and then updating them to point to a different object. But how do we track references? Fortunately for us, there is a library called [Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. ## Handling different reference types ### Dictionaries dicts, class attributes via `__dict__`, locals() ### Lists simple replacement ### Tuples recursively replace parent since immutable ### Bound methods note that built-in methods and regular methods have different underlying C structs, but have the same offsets for their self field ### Closure cells function closures ### Frames ... ### Slots ... ### Classes ... ### Other cases Certainly, not every case is handled above, but it seems to cover the vast majority of instances that I've found through testing. There are a number of reference relations in Guppy that I couldn't figure out how to replicate without doing something insane (`R_HASATTR`, `R_CELL`, and `R_STACK`), so some obscure replacements are likely unimplemented. Some other kinds of replacements are known, but impossible. For example, replacing a class object that uses `__slots__` with another class will not work if the replacement class has a different slot layout and instances of the old class exist. More generally, replacing a class with a non-class object won't work if instances of the class exist. Furthermore, references stored in data structures managed by C extensions cannot be changed, since there's no good way for us to track these. Remaining areas to explore include behavior when metaclasses and more complex descriptors are involved. Implementing a more complete version of `replace()` is left as an exercise for the reader. ## Notes The [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) used to generate graphs in this post are [available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6).