Explorar el Código

Finish most of the first major chunk of the post.

master
Ben Kurtovic hace 9 años
padre
commit
eee67a7a02
Se han modificado 1 ficheros con 134 adiciones y 11 borrados
  1. +134
    -11
      _drafts/python-object-replacement.md

+ 134
- 11
_drafts/python-object-replacement.md Ver fichero

@@ -24,6 +24,9 @@ use case in a future post, but for now, I imagine this could be useful in some
kind of advanted unit testing situation with mock objects. Still, it's fairly
insane, so let's leave it as primarily an intellectual exercise.

This article is written for [CPython](https://en.wikipedia.org/wiki/CPython)
2.7.<sup><a id="ref1" href="#fn1">[1]</a></sup>

## Review

First, a recap on terminology here. You can skip this section if you know
@@ -38,15 +41,14 @@ a = [1, 2, 3, 4]

{% endhighlight %}

We are creating a list object with four integers, and binding it to the name
`a`:
...we are creating a list object with four integers, and binding it to the name
`a`. In graph form:<sup><a id="ref2" href="#fn2">[2]</a></sup>

<svg width="223pt" height="44pt" viewBox="0.00 0.00 223.01 44.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 40)"><title>%3</title><polygon fill="white" stroke="none" points="-4,4 -4,-40 219.012,-40 219.012,4 -4,4"/><g id="node1" class="node"><title>L</title><polygon fill="none" stroke="black" stroke-width="0.5" points="215.018,-36 126.994,-36 126.994,-0 215.018,-0 215.018,-36"/><text text-anchor="middle" x="171.006" y="-15" font-family="Courier,monospace" font-size="10.00">[1, 2, 3, 4]</text></g><g id="node2" class="node"><title>a</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="27" cy="-18" rx="27" ry="18"/><text text-anchor="middle" x="27" y="-13.8" font-family="Courier,monospace" font-size="14.00">a</text></g><g id="edge1" class="edge"><title>a&#45;&gt;L</title><path fill="none" stroke="black" stroke-width="0.5" d="M54.0461,-18C72.2389,-18 97.1211,-18 119.173,-18"/><polygon fill="black" stroke="black" stroke-width="0.5" points="119.339,-20.6251 126.839,-18 119.339,-15.3751 119.339,-20.6251"/></g></g></svg>

In each of the following examples, we are creating new _references_ to the
list object, but we are never duplicating it. Each reference points to the same
memory address (which you can get using `id(a)`, but that's a CPython
implementation detail).
memory address (which you can get using `id(a)`).

{% highlight python %}

@@ -83,13 +85,125 @@ scope—then the other references are still around, and object continues to
exist. If all of an object's references disappear, then Python's garbage
collector should eliminate it.

## Dead ends

My first thought when approaching this problem was to physically write over the
memory where our target object is stored. This can be done using
[`ctypes.memmove()`](https://docs.python.org/2/library/ctypes.html#ctypes.memmove)
from the Python standard library:

{% highlight pycon %}

>>> class A(object): pass
...
>>> class B(object): pass
...
>>> obj = A()
>>> print obj
<__main__.A object at 0x10e3e1190>
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), object.__sizeof__(A))
140576340136752
>>> print obj
<__main__.B object at 0x10e3e1190>

{% endhighlight %}

What we are doing here is overwriting the fields of the `A` instance of the
[`PyClassObject` C struct](https://github.com/python/cpython/blob/2.7/Include/classobject.h#L12)
with fields from the `B` struct instance. As a result, they now share various
properties, such as their attribute dictionaries
([`__dict__`](https://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy)).
So, we can do things like this:

{% highlight pycon %}

>>> B.foo = 123
>>> obj.foo
123

{% endhighlight %}

However, there are clear issues. What we've done is create a
[_shallow copy_](https://en.wikipedia.org/wiki/Object_copy#Shallow_copy).
Therefore, `A` and `B` are still distinct objects, so certain changes made to
one will not be replicated to the other:

{% highlight pycon %}

>>> A is B
False
>>> B.__name__ = "C"
>>> A.__name__
'B'

{% endhighlight %}

Also, this won't work if `A` and `B` are different sizes, since we will be
either reading from or writing to memory we don't necessarily own:

{% highlight pycon %}

>>> A = ()
>>> B = []
>>> print A.__sizeof__(), B.__sizeof__()
24 40
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), A.__sizeof__())
4321271888
Python(33575,0x7fff76925300) malloc: *** error for object 0x6f: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

{% endhighlight %}

Oh, and there's a bit of a problem when we deallocate these objects, too...

{% highlight pycon %}

>>> A = []
>>> B = range(8)
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), A.__sizeof__())
4514685728
>>> print A
[0, 1, 2, 3, 4, 5, 6, 7]
>>> del A
>>> del B
Segmentation fault: 11

{% endhighlight %}

## Fishing for references with Guppy

So, this boils down to finding all of the references to a particular object,
and then updating them to point to a different object.
A more correct solution is finding all of the _references_ to the old object,
and then updating them to point to the new object, rather than replacing the
old object directly.

But how do we track references? Fortunately, there is a library called
[Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. Often used
for diagnosing memory leaks, we can take advantage of its robust object
tracking features here. Install it with [pip](https://pypi.python.org/pypi/pip)
(`pip install guppy`).

I've always found Guppy hard to use (as many debuggers are, though justified by
the complexity of the task involved), so we'll begin with a feature demo before
delving into the actual problem.

### Feature demonstration

Guppy's interface is deceptively simple. We begin by creating an instance of
the Heapy interface, which is the component of Guppy that has the features we
want:

{% highlight pycon %}

>>> import guppy
>>> hp = guppy.hpy()

{% endhighlight %}

But how do we track references? Fortunately for us, there is a library called
[Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this.
[...]

## Handling different reference types

@@ -148,6 +262,15 @@ is left as an exercise for the reader.

## Notes

The [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language))
used to generate graphs in this post are
[available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6).
1. <a id="fn1" href="#ref1">^</a> This post relies _heavily_ on implementation
details of CPython 2.7. While it could be adapted for Python 3 by examining
changes to the internal structures of objects that we used above, that would
be a lost cause if you wanted to replicate this on
[Jython](http://www.jython.org/) or some other implementation. We are so
dependent on concepts specific to CPython that you would need to start from
scratch, beginning with a language-specific replacement for Guppy.

2. <a id="fn2" href="#ref2">^</a> The
[DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language))
used to generate graphs in this post are
[available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6).

Cargando…
Cancelar
Guardar