diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md
index c033425..352fe93 100644
--- a/_drafts/python-object-replacement.md
+++ b/_drafts/python-object-replacement.md
@@ -22,7 +22,7 @@ on.
 _But why on Earth would you want to do that?_ you ask. I'll focus on a concrete
 use case in a future post, but for now, I imagine this could be useful in some
 kind of advanced unit testing situation with mock objects. Still, it's fairly
-insane, so let's leave it as primarily an intellectual exercise.
+insane, so let's leave it primarily as an intellectual exercise.

 This article is written for [CPython](https://en.wikipedia.org/wiki/CPython)
 2.7.[1]
@@ -77,13 +77,12 @@ d = wrapper(a)
 [Figure: graph of the names `a`, `b`, `c.data`, and `L` (inside `d`), each pointing to the same list object `[1, 2, 3, 4]`]

 Note that these references are all equal. `a` is no more valid a name for the
-list than `b`, `c.data`, or `L` (from the perspective of `d`, which is exposed
-to everyone else as `d.func_closure[0].cell_contents`, but that's cumbersome
-and you would never do that in practice). As a result, if you delete one of
-these references (explicitly with `del a`, or implicitly if a name goes out of
-scope), then the other references are still around, and the object continues to
-exist. If all of an object's references disappear, then Python's garbage
-collector should eliminate it.
+list than `b`, `c.data`, or `L` (or `d.func_closure[0].cell_contents` to the
+outside world). As a result, if you delete one of these references (explicitly
+with `del a`, or implicitly if a name goes out of scope), then the other
+references are still around, and the object continues to exist. If all of an
+object's references disappear, then Python's garbage collector should eliminate
+it.

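To make the point about equal references concrete, here is a minimal sketch that is separate from the post's own code. It assumes CPython, uses a hypothetical `Payload` class in place of the list (plain lists cannot be weakly referenced), and reads the closure through `__closure__`, which Python 2.6+ accepts as an alias for `func_closure`. The `probe` weak reference is only an observation tool and is not part of the original example.

```python
import gc
import weakref


class Payload(object):
    """Hypothetical stand-in for the list; lists do not support weak references."""


def wrapper(obj):
    # Same shape as the post's wrapper(): `obj` ends up in a closure cell.
    def inner():
        return obj
    return inner


a = Payload()
b = a                    # a second name for the same object
d = wrapper(a)           # a third reference, held by d's closure cell
probe = weakref.ref(a)   # observes the object without keeping it alive

# Every reference leads to the very same object.
same_object = (b is a) and (d.__closure__[0].cell_contents is a)

del a, b                 # the closure cell still refers to the object
alive_after_del = probe() is not None

del d                    # the last strong reference goes away with the closure
gc.collect()             # reference counting already suffices here; this is just a safeguard
alive_at_end = probe() is not None
```

Under those assumptions, the object survives the removal of `a` and `b` because the closure still refers to it, and it is only reclaimed once `d` is gone as well.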
 ## Dead ends

@@ -176,9 +175,9 @@ Segmentation fault: 11

 ## Fishing for references with Guppy

-A more correct solution is finding all of the _references_ to the old object,
-and then updating them to point to the new object, rather than replacing the
-old object directly.
+A more appropriate solution is finding all of the _references_ to the old
+object, and then updating them to point to the new object, rather than
+replacing the old object directly.

 But how do we track references? Fortunately, there's a library called
 [Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. Often used
@@ -194,7 +193,7 @@ delving into the actual problem.

 Guppy's interface is deceptively simple. We begin by calling
 [`guppy.hpy()`](http://guppy-pe.sourceforge.net/guppy.html#kindnames.guppy.hpy),
-to expose the Heapy interface, which is the component of Guppy that has the
+to expose the Heapy interface, which is the component of Guppy with the
 features we want:

 {% highlight pycon %}
@@ -836,31 +835,131 @@ True

 ### Dictionary keys

-...
+Dictionary keys have a different reference relation type than values, but the
+replacement works mostly the same way. We pop the value of the old key from the
+dictionary, and then insert it in again under the new key. Here's the code,
+which we'll stick into the main block in `replace()`:
+
+{% highlight python %}
+
+elif isinstance(relation, Path.R_INDEXKEY):
+    source = path.src.theone
+    source[new] = source.pop(source.keys()[relation.r])
+
+{% endhighlight %}
+
+And, a demonstration:
+
+{% highlight pycon %}
+
+>>> a, b = A(), B()
+>>> d = {a: 1}
+>>> replace(a, b)
+>>> d
+{<__main__.B object at 0x10fb47950>: 1}
+
+{% endhighlight %}

 ### Closure cells

-...
+We'll cover just one more case, this time involving a
+[closure](https://en.wikipedia.org/wiki/Closure_(computer_programming)). Here's
+our test function:

-### Frames
+{% highlight python %}

-...
+def wrapper(obj):
+    def inner():
+        return obj
+    return inner

-### Slots
+{% endhighlight %}

-...
+As we can see, an instance of the inner function keeps references to the locals
+of the wrapper function, even after using our current
+version of `replace()`:
+
+{% highlight pycon %}

-### Classes
+>>> a, b = A(), B()
+>>> f = wrapper(a)
+>>> f()
+<__main__.A object at 0x109446090>
+>>> replace(a, b)
+>>> f()
+<__main__.A object at 0x109446090>

-...
+{% endhighlight %}
+
+Internally, CPython implements this using things called
+[_cells_](https://docs.python.org/2/c-api/cell.html). We notice that
+`f.func_closure` gives us a tuple of `cell` objects, and we can examine an
+individual cell's contents with `.cell_contents`:
+
+{% highlight pycon %}
+
+>>> f.func_closure
+(<cell at 0x...: A object at 0x109446090>,)
+>>> f.func_closure[0].cell_contents
+<__main__.A object at 0x109446090>
+
+{% endhighlight %}
+
+As expected, we can't just modify it...
+
+{% highlight pycon %}
+
+>>> f.func_closure[0].cell_contents = b
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+AttributeError: attribute 'cell_contents' of 'cell' objects is not writable
+
+{% endhighlight %}
+
+...because that would be too easy. So, how can we replace it? Well, we could
+go back to `memmove`, but there's an easier way thanks to the `ctypes` module
+also exposing Python's C API. Specifically, the
+[`PyCell_Set`](https://docs.python.org/2/c-api/cell.html#c.PyCell_Set) function
+(which seems to lack a pure Python equivalent) does exactly what we want. Since
+the function expects `PyObject*`s as arguments, we'll need to use
+`ctypes.py_object` as a wrapper. Here it is:
+
+{% highlight pycon %}
+
+>>> from ctypes import py_object, pythonapi
+>>> pythonapi.PyCell_Set(py_object(f.func_closure[0]), py_object(b))
+0
+>>> f()
+<__main__.B object at 0x10ad94dd0>
+
+{% endhighlight %}
+
+Perfect – the replacement worked.
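The same call can be wrapped in a small reusable function. The following is a minimal sketch and not the post's actual `replace()` code: the names `set_cell` and `_PyCell_Set` are hypothetical, the C signature is declared up front through `argtypes` so that plain Python objects can be passed without wrapping each one by hand, and `__closure__` is used because Python 2.6+ accepts it as an alias for `func_closure`.

```python
import ctypes

# Declare the C signature: int PyCell_Set(PyObject *cell, PyObject *value)
_PyCell_Set = ctypes.pythonapi.PyCell_Set
_PyCell_Set.argtypes = (ctypes.py_object, ctypes.py_object)
_PyCell_Set.restype = ctypes.c_int


def set_cell(cell, value):
    """Hypothetical helper: make an existing closure cell refer to `value`."""
    if _PyCell_Set(cell, value) != 0:
        raise RuntimeError("PyCell_Set failed")


def wrapper(obj):
    # Same test function as in the post.
    def inner():
        return obj
    return inner


class A(object):
    pass


class B(object):
    pass


a, b = A(), B()
f = wrapper(a)
before = f()                    # the closure still yields `a`
set_cell(f.__closure__[0], b)   # swap the cell's contents in place
after = f()                     # the same function object now yields `b`
```

Declaring `argtypes` once is a design choice rather than a requirement; wrapping each argument in `py_object(...)` at the call site, as in the session above, passes the same pointers.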
To tie it together with `replace()`, we'll
+note that Guppy represents the cell contents relationship with
+`Based_R_INTERATTR`, which I assume stands for "internal attribute". We can use
+this to find the cell object within the inner function that references our
+target object, and then use the method above to make the change:
+
+{% highlight python %}
+
+elif isinstance(relation, Path.R_INTERATTR):
+    if isinstance(source, CellType):
+        pythonapi.PyCell_Set(py_object(source), py_object(new))
+        return
+
+{% endhighlight %}

 ### Other cases

-Certainly, not every case is handled above, but it seems to cover the vast
-majority of instances that I've found through testing. There are a number of
-reference relations in Guppy that I couldn't figure out how to replicate
-without doing something insane (`R_HASATTR`, `R_CELL`, and `R_STACK`), so some
-obscure replacements are likely unimplemented.
+There are many, many more types of possible replacements. I've written a more
+extensible version of `replace()` with some test cases, which can be viewed
+on Gist [here](https://gist.github.com/earwig/28a64ffb94d51a608e3d).
+
+Certainly, not every case is handled by it, but it seems to cover the majority
+that I've found through testing. There are a number of reference relations in
+Guppy that I couldn't figure out how to replicate without doing something
+insane (`R_HASATTR`, `R_CELL`, and `R_STACK`), so some obscure replacements are
+likely unimplemented.

 Some other kinds of replacements are known, but impossible. For example,
 replacing a class object that uses `__slots__` with another class will not work
@@ -870,10 +969,6 @@ work if instances of the class exist. Furthermore, references stored in data
 structures managed by C extensions cannot be changed, since there's no good way
 for us to track these.

-Remaining areas to explore include behavior when metaclasses and more complex
-descriptors are involved.
Implementing a more complete version of `replace()`
-is left as an exercise for the reader.
-
 ## Footnotes

 1. ^ This post relies _heavily_ on implementation