|
- ---
- layout: post
- title: Replacing Objects in Python
- tags: Python
- description: More reflection than you cared to ask for
- draft: true
- ---
-
- Today, we're going to demonstrate a fairly evil thing in Python, which I call
- _object replacement_.
-
- Say you have some program that's been running for a while, and a particular
- object has made its way throughout your code. It lives inside lists, class
- attributes, maybe even inside some closures. You want to completely replace
- this object with another one; that is to say, you want to find all references
- to object `A` and replace them with object `B`, enabling `A` to be garbage
- collected. This has some interesting implications for special object types. If
- you have methods that are bound to `A`, you want to rebind them to `B`. If `A`
- is a class, you want all instances of `A` to become instances of `B`. And so
- on.
-
- _But why on Earth would you want to do that?_ you ask. I'll focus on a concrete
- use case in a future post, but for now, I imagine this could be useful in some
- kind of advanced unit testing situation with mock objects. Still, it's fairly
- insane, so let's leave it as primarily an intellectual exercise.
-
- This article is written for [CPython](https://en.wikipedia.org/wiki/CPython)
- 2.7.<sup><a id="ref1" href="#fn1">[1]</a></sup>
-
- ## Review
-
- First, a recap on terminology here. You can skip this section if you know
- Python well.
-
- In Python, _names_ are what most languages call "variables". They reference
- _objects_. So when we do:
-
- {% highlight python %}
-
- a = [1, 2, 3, 4]
-
- {% endhighlight %}
-
- ...we are creating a list object with four integers, and binding it to the name
- `a`. In graph form:<sup><a id="ref2" href="#fn2">[2]</a></sup>
-
- <svg width="223pt" height="44pt" viewBox="0.00 0.00 223.01 44.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 40)"><title>%3</title><polygon fill="white" stroke="none" points="-4,4 -4,-40 219.012,-40 219.012,4 -4,4"/><g id="node1" class="node"><title>L</title><polygon fill="none" stroke="black" stroke-width="0.5" points="215.018,-36 126.994,-36 126.994,-0 215.018,-0 215.018,-36"/><text text-anchor="middle" x="171.006" y="-15" font-family="Courier,monospace" font-size="10.00">[1, 2, 3, 4]</text></g><g id="node2" class="node"><title>a</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="27" cy="-18" rx="27" ry="18"/><text text-anchor="middle" x="27" y="-13.8" font-family="Courier,monospace" font-size="14.00">a</text></g><g id="edge1" class="edge"><title>a->L</title><path fill="none" stroke="black" stroke-width="0.5" d="M54.0461,-18C72.2389,-18 97.1211,-18 119.173,-18"/><polygon fill="black" stroke="black" stroke-width="0.5" points="119.339,-20.6251 126.839,-18 119.339,-15.3751 119.339,-20.6251"/></g></g></svg>
-
- In each of the following examples, we are creating new _references_ to the
- list object, but we are never duplicating it. Each reference points to the same
- memory address (which you can get using `id(a)`).
-
- {% highlight python %}
-
- b = a
-
- {% endhighlight %}
-
- {% highlight python %}
-
- c = SomeContainerClass()
- c.data = a
-
- {% endhighlight %}
-
- {% highlight python %}
-
- def wrapper(L):
-     def inner():
-         return L.pop()
-     return inner
-
- d = wrapper(a)
-
- {% endhighlight %}
-
- <svg width="254pt" height="234pt" viewBox="0.00 0.00 253.96 234.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 238)"><title>%3</title><polygon fill="white" stroke="none" points="-4,4 -4,-238 249.96,-238 249.96,4 -4,4"/><g id="clust3" class="cluster"><title>cluster0</title><polygon fill="none" stroke="black" stroke-width="0.5" points="8,-8 8,-82 78,-82 78,-8 8,-8"/><text text-anchor="middle" x="43" y="-66.8" font-family="Courier,monospace" font-size="14.00">d</text></g><g id="node1" class="node"><title>obj</title><polygon fill="none" stroke="black" stroke-width="0.5" points="245.966,-153 157.943,-153 157.943,-117 245.966,-117 245.966,-153"/><text text-anchor="middle" x="201.954" y="-132" font-family="Courier,monospace" font-size="10.00">[1, 2, 3, 4]</text></g><g id="node2" class="node"><title>a</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-216" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-211.8" font-family="Courier,monospace" font-size="14.00">a</text></g><g id="edge1" class="edge"><title>a->obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M64.8423,-205.244C88.7975,-192.881 128.721,-172.278 159.152,-156.573"/><polygon fill="black" stroke="black" stroke-width="0.5" points="160.422,-158.872 165.883,-153.1 158.014,-154.206 160.422,-158.872"/></g><g id="node3" class="node"><title>b</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-162" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-157.8" font-family="Courier,monospace" font-size="14.00">b</text></g><g id="edge2" class="edge"><title>b->obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M69.2174,-157.662C90.9996,-153.915 123.147,-148.385 150.231,-143.726"/><polygon fill="black" stroke="black" stroke-width="0.5" points="150.777,-146.295 157.724,-142.437 149.887,-141.121 150.777,-146.295"/></g><g id="node4" 
class="node"><title>c</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-108" rx="41.897" ry="18"/><text text-anchor="middle" x="43" y="-103.8" font-family="Courier,monospace" font-size="14.00">c.data</text></g><g id="edge3" class="edge"><title>c->obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M82.3954,-114.605C102.772,-118.11 128.077,-122.463 150.069,-126.247"/><polygon fill="black" stroke="black" stroke-width="0.5" points="149.86,-128.874 157.697,-127.559 150.75,-123.7 149.86,-128.874"/></g><g id="node5" class="node"><title>L</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-34" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-29.8" font-family="Courier,monospace" font-size="14.00">L</text></g><g id="edge4" class="edge"><title>L->obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M62.9324,-46.183C88.5083,-62.6411 134.554,-92.2712 166.386,-112.755"/><polygon fill="black" stroke="black" stroke-width="0.5" points="165.223,-115.128 172.951,-116.98 168.064,-110.714 165.223,-115.128"/></g></g></svg>
-
- Note that these references are all equal. `a` is no more valid a name for the
- list than `b`, `c.data`, or `L` (the name used inside `d`; from the outside
- the same reference is reachable as `d.func_closure[0].cell_contents`, but
- that's cumbersome and you would never do that in practice). As a result, if
- you delete one of these references (explicitly with `del a`, or implicitly if
- a name goes out of scope), then the other references are still around, and the
- object continues to exist. If all of an object's references disappear, then
- Python's garbage collector should eliminate it.
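
We can check this directly. The following is a minimal sketch rather than anything taken from the examples above: `Container` is a hypothetical stand-in for `SomeContainerClass`, and it uses `__closure__`, an alias for `func_closure` that also exists in Python 3.

```python
class Container(object):
    """Hypothetical stand-in for SomeContainerClass."""
    pass


def wrapper(L):
    def inner():
        return L.pop()
    return inner


a = [1, 2, 3, 4]
b = a
c = Container()
c.data = a
d = wrapper(a)

# The closure's reference is reachable from outside, just awkwardly.
cell_value = d.__closure__[0].cell_contents

# Every reference designates the very same object.
same_object = all(ref is a for ref in (b, c.data, cell_value))

# Deleting one name leaves the object and the other references intact.
del a
b.append(5)
survivors_agree = c.data == [1, 2, 3, 4, 5] and cell_value is b
```

Both `same_object` and `survivors_agree` end up true: the list never moved or got copied, we just dropped one of its names.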
-
- ## Dead ends
-
- My first thought when approaching this problem was to physically write over the
- memory where our target object is stored. This can be done using
- [`ctypes.memmove()`](https://docs.python.org/2/library/ctypes.html#ctypes.memmove)
- from the Python standard library:
-
- {% highlight pycon %}
-
- >>> class A(object): pass
- ...
- >>> class B(object): pass
- ...
- >>> obj = A()
- >>> print obj
- <__main__.A object at 0x10e3e1190>
- >>> import ctypes
- >>> ctypes.memmove(id(A), id(B), object.__sizeof__(A))
- 140576340136752
- >>> print obj
- <__main__.B object at 0x10e3e1190>
-
- {% endhighlight %}
-
- What we are doing here is overwriting the fields of `A`'s underlying C struct
- with the fields of `B`'s. Since both are new-style classes, that struct is a
- [`PyTypeObject`](https://github.com/python/cpython/blob/2.7/Include/object.h)
- (the [`PyClassObject` struct](https://github.com/python/cpython/blob/2.7/Include/classobject.h#L12)
- is only used by old-style classes). As a result, they now share various
- properties, such as their attribute dictionaries
- ([`__dict__`](https://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy)).
- So, we can do things like this:
-
- {% highlight pycon %}
-
- >>> B.foo = 123
- >>> obj.foo
- 123
-
- {% endhighlight %}
-
- However, there are clear issues. What we've done is create a
- [_shallow copy_](https://en.wikipedia.org/wiki/Object_copy#Shallow_copy).
- Therefore, `A` and `B` are still distinct objects, so certain changes made to
- one will not be replicated to the other:
-
- {% highlight pycon %}
-
- >>> A is B
- False
- >>> B.__name__ = "C"
- >>> A.__name__
- 'B'
-
- {% endhighlight %}
-
- Also, this won't work if `A` and `B` are different sizes, since we will be
- either reading from or writing to memory we don't necessarily own:
-
- {% highlight pycon %}
-
- >>> A = ()
- >>> B = []
- >>> print A.__sizeof__(), B.__sizeof__()
- 24 40
- >>> import ctypes
- >>> ctypes.memmove(id(A), id(B), A.__sizeof__())
- 4321271888
- Python(33575,0x7fff76925300) malloc: *** error for object 0x6f: pointer being freed was not allocated
- *** set a breakpoint in malloc_error_break to debug
- Abort trap: 6
-
- {% endhighlight %}
-
- Oh, and there's a bit of a problem when we deallocate these objects, too...
-
- {% highlight pycon %}
-
- >>> A = []
- >>> B = range(8)
- >>> import ctypes
- >>> ctypes.memmove(id(A), id(B), A.__sizeof__())
- 4514685728
- >>> print A
- [0, 1, 2, 3, 4, 5, 6, 7]
- >>> del A
- >>> del B
- Segmentation fault: 11
-
- {% endhighlight %}
-
- ## Fishing for references with Guppy
-
- A more correct solution is finding all of the _references_ to the old object,
- and then updating them to point to the new object, rather than replacing the
- old object directly.
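
To get a feel for what "finding references" means before bringing in any third-party tooling, here is a rough sketch using `gc.get_referrers()` from the standard library. This is a stand-in chosen for illustration, not the approach the rest of this post takes, and it only sees containers tracked by the garbage collector. `Holder` is a hypothetical class.

```python
import gc


def find_referrers(target):
    """Return the gc-tracked objects that directly refer to target."""
    return gc.get_referrers(target)


class Holder(object):
    """Hypothetical class, used only for this illustration."""
    pass


target = Holder()
in_list = [target]
in_dict = {"key": target}

referrers = find_referrers(target)

# Compare by identity: the referrers are the containers themselves.
found_list = any(r is in_list for r in referrers)
found_dict = any(r is in_dict for r in referrers)
```

The result also includes things we didn't ask about, such as the namespace dictionary holding the name `target`, which is a hint at how many different kinds of referrers a full replacement has to deal with.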
-
- But how do we track references? Fortunately, there is a library called
- [Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. It is
- often used for diagnosing memory leaks, and we can take advantage of its
- robust object tracking features here. Install it with
- [pip](https://pypi.python.org/pypi/pip) (`pip install guppy`).
-
- I've always found Guppy hard to use (as many debuggers are, though justified by
- the complexity of the task involved), so we'll begin with a feature demo before
- delving into the actual problem.
-
- ### Feature demonstration
-
- Guppy's interface is deceptively simple. We begin by creating an instance of
- the Heapy interface, which is the component of Guppy that has the features we
- want:
-
- {% highlight pycon %}
-
- >>> import guppy
- >>> hp = guppy.hpy()
-
- {% endhighlight %}
-
- [...]
-
- ## Handling different reference types
-
- ### Dictionaries
-
- This covers plain dicts, class attributes via `__dict__`, and `locals()`.
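
As a minimal sketch of the dictionary case (the helper name `replace_in_dict` and the `Config` class are hypothetical, not part of any library), we can swap the object wherever it appears as a key or a value. This handles plain dicts and instance `__dict__`s. Attributes of new-style classes sit behind a read-only proxy, so those would presumably go through `setattr` instead, and the dictionary returned by `locals()` inside a function is only a snapshot, so neither is covered here.

```python
def replace_in_dict(mapping, old, new):
    """Swap old for new wherever it appears as a key or a value (by identity)."""
    # Snapshot the items first so the dict is not mutated while iterating.
    for key, value in list(mapping.items()):
        if value is old:
            mapping[key] = new
        if key is old:
            # Re-insert the entry under the new key; new must be hashable.
            mapping[new] = mapping.pop(key)


class Config(object):
    """Hypothetical holder; its instance attributes live in __dict__."""
    pass


old, new = object(), object()
plain = {"a": old, old: "as key"}
cfg = Config()
cfg.value = old

replace_in_dict(plain, old, new)
replace_in_dict(cfg.__dict__, old, new)
```

Comparing with `is` rather than `==` matters: we want to replace this particular object, not everything that happens to compare equal to it.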
-
- ### Lists
-
- Lists are mutable, so this is a simple in-place replacement.
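
A sketch of what that could look like, with a hypothetical helper name:

```python
def replace_in_list(items, old, new):
    """Overwrite, in place, every slot of items that holds old."""
    for index, value in enumerate(items):
        if value is old:
            # Assigning to an existing index keeps the list's length,
            # so this is safe to do while iterating.
            items[index] = new


old, new = object(), object()
items = [old, 1, old, "x"]
alias = items

replace_in_list(items, old, new)
```

Because the list is modified in place, other references to the same list (such as `alias`) see the change too.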
-
- ### Tuples
-
- Tuples are immutable, so we recursively replace the parent tuple instead.
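
Here is a rough sketch of that idea, with a hypothetical `rebuild_tuple` helper. Since a tuple can't be patched, we build a new one, and then the stale tuple itself becomes the thing that needs replacing in whatever holds it. In this example the holder is a list, so the last step is a plain assignment.

```python
def rebuild_tuple(parent, old, new):
    """Return a tuple like parent with old replaced, rebuilding nested tuples."""
    rebuilt = []
    for item in parent:
        if item is old:
            rebuilt.append(new)
        elif isinstance(item, tuple):
            # Nested tuples are immutable too, so rebuild them as well.
            rebuilt.append(rebuild_tuple(item, old, new))
        else:
            rebuilt.append(item)
    return tuple(rebuilt)


old, new = object(), object()
inner = (old, 1)
outer = (inner, 2)
holder = [outer]

replacement = rebuild_tuple(outer, old, new)
# Same job one level up: swap the stale tuple for the rebuilt one.
holder[0] = replacement
```

Note that `outer` and `inner` still contain `old` afterwards. The replacement only takes effect once every reference to the stale tuples has been updated as well.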
-
- ### Bound methods
-
- Note that built-in methods and regular methods have different underlying C
- structs, but the same offset for their self field.
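
A tamer alternative to poking at those structs is to build a fresh bound method around the new object. The sketch below does that with `types.MethodType`; the helper and the `Greeter` class are hypothetical. It creates a new method object rather than changing the existing one, so references to the old method would still need replacing, and it does not cover built-in methods, which have no `__func__`.

```python
import types


def rebind_method(method, new_self):
    """Return a bound method with the same function but a different self."""
    return types.MethodType(method.__func__, new_self)


class Greeter(object):
    """Hypothetical class, used only for this illustration."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello from " + self.name


a = Greeter("A")
b = Greeter("B")
bound_to_a = a.greet
bound_to_b = rebind_method(bound_to_a, b)
```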
-
- ### Closure cells
-
- function closures
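
One way to update a closure cell, sketched here under the assumption that we are on CPython, is to call the C API function `PyCell_Set` through `ctypes`. The helper name `replace_in_closure` is hypothetical, and `__closure__` is used as the portable spelling of `func_closure`.

```python
import ctypes

# PyCell_Set is part of the CPython C API; declare its signature for ctypes.
_PyCell_Set = ctypes.pythonapi.PyCell_Set
_PyCell_Set.argtypes = [ctypes.py_object, ctypes.py_object]
_PyCell_Set.restype = ctypes.c_int


def replace_in_closure(func, old, new):
    """Point every closure cell of func that holds old at new instead."""
    for cell in func.__closure__ or ():
        try:
            contents = cell.cell_contents
        except ValueError:
            # Reading an empty cell raises ValueError; nothing to replace.
            continue
        if contents is old:
            _PyCell_Set(cell, new)


def wrapper(L):
    def inner():
        return L.pop()
    return inner


a = [1, 2, 3, 4]
b = ["x", "y"]
d = wrapper(a)
replace_in_closure(d, a, b)
```

After the call, `d()` pops from `b`, and `a` is left untouched.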
-
- ### Frames
-
- ...
-
- ### Slots
-
- ...
-
- ### Classes
-
- ...
-
- ### Other cases
-
- Certainly, not every case is handled above, but it seems to cover the vast
- majority of instances that I've found through testing. There are a number of
- reference relations in Guppy that I couldn't figure out how to replicate
- without doing something insane (`R_HASATTR`, `R_CELL`, and `R_STACK`), so some
- obscure replacements are likely unimplemented.
-
- Some other kinds of replacements are known, but impossible. For example,
- replacing a class object that uses `__slots__` with another class will not work
- if the replacement class has a different slot layout and instances of the old
- class exist. More generally, replacing a class with a non-class object won't
- work if instances of the class exist. Furthermore, references stored in data
- structures managed by C extensions cannot be changed, since there's no good way
- for us to track these.
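
The slot layout restriction is easy to see if we assume that instances are moved to the new class by assigning to `__class__` (an assumption made for this sketch; the classes below are hypothetical):

```python
class A(object):
    __slots__ = ("x",)


class B(object):
    __slots__ = ("x", "y")  # different slot layout from A


class C(object):
    __slots__ = ("x",)  # same slot layout as A


obj = A()
obj.x = 1

try:
    obj.__class__ = B
    layout_mismatch_rejected = False
except TypeError:
    # CPython refuses: the instance's memory layout would not match B.
    layout_mismatch_rejected = True

# A class with an identical slot layout is accepted.
obj.__class__ = C
```

The instance keeps its slot values across the successful assignment, which is exactly why the layouts have to match.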
-
- Remaining areas to explore include behavior when metaclasses and more complex
- descriptors are involved. Implementing a more complete version of `replace()`
- is left as an exercise for the reader.
-
- ## Notes
-
- 1. <a id="fn1" href="#ref1">^</a> This post relies _heavily_ on implementation
- details of CPython 2.7. It could be adapted for Python 3 by examining how the
- internal object structures used above have changed, but replicating it on
- [Jython](http://www.jython.org/) or some other implementation would be a
- lost cause. We are so dependent on concepts specific to CPython that you
- would need to start from scratch, beginning with an implementation-specific
- replacement for Guppy.
-
- 2. <a id="fn2" href="#ref2">^</a> The
- [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language))
- used to generate graphs in this post are
- [available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6).
|