From 33a17d46fcbdc1712381dd4705e1d15a2361f9a5 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Mon, 10 Nov 2014 02:24:24 -0600 Subject: [PATCH 01/22] Start work on Python object replacement post. --- _drafts/python-object-replacement.md | 75 ++++++++++++++++++++++++++++++++++++ 1 file changed, 75 insertions(+) create mode 100644 _drafts/python-object-replacement.md diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md new file mode 100644 index 0000000..502a071 --- /dev/null +++ b/_drafts/python-object-replacement.md @@ -0,0 +1,75 @@ +--- +layout: post +title: Replacing Objects in Python +tags: Python +description: More reflection than you cared to ask for +draft: true +--- + +Today, we're going to demonstrate a fairly evil thing in Python, which I call +_object replacement_. + +## Review + +First, a recap on terminology here. You can skip this if you know Python well. + +In Python, _names_ are what most languages call "variables". They reference +_objects_. So when we do: + +{% highlight python %} + +a = [1, 2, 3, 4] + +{% endhighlight %} + +We are creating a list object with four integers, and binding it to the name +`a`: + +
[Figure: the name `a` pointing to the list object `[1, 2, 3, 4]`]
+ +In each of the following examples, we are creating new _references_ to the +list object, but we are never duplicating it. Each reference points to the same +memory address (which you can get using `id(a)`, but that's a CPython +implementation detail). + +{% highlight python %} + +b = a + +{% endhighlight %} + +{% highlight python %} + +c = SomeContainerClass() +c.data = a + +{% endhighlight %} + +{% highlight python %} + +def wrapper(L): + def inner(): + return L.pop() + return inner + +d = wrapper(a) + +{% endhighlight %} + +[insert charts here] + +Note that these references are all equal. `a` is no more valid a name for the +list than `b`, `c.data`, or `L` from the perspective of `d` (and +`d.func_closure[0].cell_contents` from the perspective of the outside world, +even though that's cumbersome and you would never do that in practice). As a +result, if you delete one of these references—explicitly with `del a`, or +implicitly if a name goes out of scope—then the other references are still +around, and object continues to exist. If all of an object's references +disappear, then Python's garbage collector will eliminate it. + +## Fishing for references with Guppy + +[...] + +So, what this boils down to is finding all references to a given object and +replacing them with another object. From 01d821002508134efd5ef86d2a2a0eec4748195f Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Mon, 10 Nov 2014 04:26:11 -0600 Subject: [PATCH 02/22] Add more. --- _drafts/python-object-replacement.md | 56 +++++++++++++++++++++++++++++------- 1 file changed, 45 insertions(+), 11 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index 502a071..4b1f3c1 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -11,7 +11,8 @@ _object replacement_. ## Review -First, a recap on terminology here. You can skip this if you know Python well. +First, a recap on terminology here. 
You can skip this section if you know +Python well. In Python, _names_ are what most languages call "variables". They reference _objects_. So when we do: @@ -59,17 +60,50 @@ d = wrapper(a) [insert charts here] Note that these references are all equal. `a` is no more valid a name for the -list than `b`, `c.data`, or `L` from the perspective of `d` (and -`d.func_closure[0].cell_contents` from the perspective of the outside world, -even though that's cumbersome and you would never do that in practice). As a -result, if you delete one of these references—explicitly with `del a`, or -implicitly if a name goes out of scope—then the other references are still -around, and object continues to exist. If all of an object's references -disappear, then Python's garbage collector will eliminate it. +list than `b`, `c.data`, or `L` (from the perspective of `d`, which is exposed +to everyone else as `d.func_closure[0].cell_contents`, but that's cumbersome +and you would never do that in practice). As a result, if you delete one of +these references—explicitly with `del a`, or implicitly if a name goes out of +scope—then the other references are still around, and object continues to +exist. If all of an object's references disappear, then Python's garbage +collector will eliminate it. + +## The task + +Say you have some program that's been running for a while, and a particular +object has made its way throughout your code. It lives inside lists, class +attributes, maybe even inside some closures. You want to completely replace +this object with another one; that is to say, you want to find all references +to object `A` and replace them with object `B`, enabling `A` to be garbage +collected. + +_But why on Earth would you want to do that?_ you ask. I'll focus on a concrete +use case in a future post, but for now, I imagine this could be useful in some +kind of advanted unit testing situation with mock objects. 
Still, it's fairly +insane, so let's leave it as primarily an intellectual exercise. ## Fishing for references with Guppy -[...] +Guppy! -So, what this boils down to is finding all references to a given object and -replacing them with another object. +## Handling different references + +### Dictionaries + +dicts, class attributes via `__dict__`, locals() + +### Lists + +.... + +### Tuples + +recursively replace parent since immutable + +### Closure cells + +function closures + +### Bound methods + +bound built-in methods separately? From 022b2ed53c2418a3aed3c404ee28eca740e6a036 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Mon, 10 Nov 2014 18:05:38 -0600 Subject: [PATCH 03/22] Add more skeleton to post. --- _drafts/python-object-replacement.md | 39 +++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 18 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index 4b1f3c1..64f3854 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -9,6 +9,18 @@ draft: true Today, we're going to demonstrate a fairly evil thing in Python, which I call _object replacement_. +Say you have some program that's been running for a while, and a particular +object has made its way throughout your code. It lives inside lists, class +attributes, maybe even inside some closures. You want to completely replace +this object with another one; that is to say, you want to find all references +to object `A` and replace them with object `B`, enabling `A` to be garbage +collected. + +_But why on Earth would you want to do that?_ you ask. I'll focus on a concrete +use case in a future post, but for now, I imagine this could be useful in some +kind of advanted unit testing situation with mock objects. Still, it's fairly +insane, so let's leave it as primarily an intellectual exercise. + ## Review First, a recap on terminology here. 
You can skip this section if you know @@ -66,21 +78,7 @@ and you would never do that in practice). As a result, if you delete one of these references—explicitly with `del a`, or implicitly if a name goes out of scope—then the other references are still around, and object continues to exist. If all of an object's references disappear, then Python's garbage -collector will eliminate it. - -## The task - -Say you have some program that's been running for a while, and a particular -object has made its way throughout your code. It lives inside lists, class -attributes, maybe even inside some closures. You want to completely replace -this object with another one; that is to say, you want to find all references -to object `A` and replace them with object `B`, enabling `A` to be garbage -collected. - -_But why on Earth would you want to do that?_ you ask. I'll focus on a concrete -use case in a future post, but for now, I imagine this could be useful in some -kind of advanted unit testing situation with mock objects. Still, it's fairly -insane, so let's leave it as primarily an intellectual exercise. +collector should eliminate it. ## Fishing for references with Guppy @@ -94,16 +92,21 @@ dicts, class attributes via `__dict__`, locals() ### Lists -.... +simple replacement ### Tuples recursively replace parent since immutable +### Bound methods + +note that built-in methods and regular methods have different underlying C +structs, but have the same offsets for their self field + ### Closure cells function closures -### Bound methods +### Frames -bound built-in methods separately? +... From 3773cba45cf4a7f90e9b58ad871949e974e51ec9 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Wed, 12 Nov 2014 03:22:29 -0600 Subject: [PATCH 04/22] Develop post more. 
--- _drafts/python-object-replacement.md | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index 64f3854..fe03e17 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -82,9 +82,13 @@ collector should eliminate it. ## Fishing for references with Guppy -Guppy! +So, this boils down to finding all of the references to a particular object, +and then updating them to point to a different object. -## Handling different references +But how do we track references? Fortunately for us, there is a library called +[Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. + +## Handling different reference types ### Dictionaries @@ -110,3 +114,15 @@ function closures ### Frames ... + +### Classes + +... + +### Other cases + +Certainly, not every case is handled above, but it seems to cover the vast +majority of instances that I've found through testing. Remaining areas to +explore include behavior when metaclasses and more complex descriptors are +involved. Implementing a more complete version of `replace()` is left as an +exercise for the reader. From fbd4012c0fbc0ac3895a4f540b6f3d4f3481e064 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Wed, 12 Nov 2014 18:33:31 -0600 Subject: [PATCH 05/22] More work on post. --- _drafts/python-object-replacement.md | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index fe03e17..4f91858 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -115,6 +115,10 @@ function closures ... +### Slots + +... + ### Classes ... @@ -122,7 +126,18 @@ function closures ### Other cases Certainly, not every case is handled above, but it seems to cover the vast -majority of instances that I've found through testing. 
Remaining areas to -explore include behavior when metaclasses and more complex descriptors are -involved. Implementing a more complete version of `replace()` is left as an -exercise for the reader. +majority of instances that I've found through testing. There are a number of +reference relations in Guppy that I couldn't figure out how to replicate +without doing something insane (`R_CELL` and `R_STACK`), so some obscure +replacements are likely unimplemented. + +Some other kinds of replacements are known, but impossible. For example, +replacing a class object that uses `__slots__` with another class will not work +if the replacement class has a different slot layout and instances of the old +class exist. Furthermore, it doesn't work for references stored in the code of +C extensions, since there's effectively no way for us to track these, but this +is an exceptional circumstance. + +Remaining areas to explore include behavior when metaclasses and more complex +descriptors are involved. Implementing a more complete version of `replace()` +is left as an exercise for the reader. From 25c8ffe01376b8c13d46dd4b2742c50eae9e16d8 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Wed, 12 Nov 2014 23:24:49 -0600 Subject: [PATCH 06/22] More progress; new graph via graphviz. --- _drafts/python-object-replacement.md | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index 4f91858..835fe2c 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -14,7 +14,10 @@ object has made its way throughout your code. It lives inside lists, class attributes, maybe even inside some closures. You want to completely replace this object with another one; that is to say, you want to find all references to object `A` and replace them with object `B`, enabling `A` to be garbage -collected. +collected. 
This has some interesting implications for special object types. If +you have methods that are bound to `A`, you want to rebind them to `B`. If `A` +is a class, you want all instances of `A` to become instances of `B`. And so +on. _But why on Earth would you want to do that?_ you ask. I'll focus on a concrete use case in a future post, but for now, I imagine this could be useful in some @@ -38,7 +41,7 @@ a = [1, 2, 3, 4] We are creating a list object with four integers, and binding it to the name `a`: -
[Figure: the name `a` pointing to the list object `[1, 2, 3, 4]`]
+%3L[1, 2, 3, 4]aaa->L In each of the following examples, we are creating new _references_ to the list object, but we are never duplicating it. Each reference points to the same @@ -128,15 +131,16 @@ function closures Certainly, not every case is handled above, but it seems to cover the vast majority of instances that I've found through testing. There are a number of reference relations in Guppy that I couldn't figure out how to replicate -without doing something insane (`R_CELL` and `R_STACK`), so some obscure -replacements are likely unimplemented. +without doing something insane (`R_HASATTR`, `R_CELL`, and `R_STACK`), so some +obscure replacements are likely unimplemented. Some other kinds of replacements are known, but impossible. For example, replacing a class object that uses `__slots__` with another class will not work if the replacement class has a different slot layout and instances of the old -class exist. Furthermore, it doesn't work for references stored in the code of -C extensions, since there's effectively no way for us to track these, but this -is an exceptional circumstance. +class exist. More generally, replacing a class with a non-class object won't +work if instances of the class exist. Furthermore, references stored in data +structures managed by C extensions cannot be changed, since there's no good way +for us to track these. Remaining areas to explore include behavior when metaclasses and more complex descriptors are involved. Implementing a more complete version of `replace()` From 1cc40087769cde8e7909efb59508bbfda9c837a6 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Wed, 12 Nov 2014 23:55:44 -0600 Subject: [PATCH 07/22] Add other reference chart; fix code box margins. 
--- _drafts/python-object-replacement.md | 2 +- static/main.css | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index 835fe2c..d59cd05 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -72,7 +72,7 @@ d = wrapper(a) {% endhighlight %} -[insert charts here] +%3cluster0dobj[1, 2, 3, 4]aaa->objbbb->objcc.datac->objLLL->obj Note that these references are all equal. `a` is no more valid a name for the list than `b`, `c.data`, or `L` (from the perspective of `d`, which is exposed diff --git a/static/main.css b/static/main.css index c0fdff5..03fd79c 100644 --- a/static/main.css +++ b/static/main.css @@ -205,6 +205,7 @@ pre code { border: 1px solid #e8e8e8; font-size: 14px; line-height: 1.35em; + margin: 1em 0; padding-left: 16px; } From 3daf0a1350d89e0a3179ae275f29cc75640d30bf Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Thu, 13 Nov 2014 22:13:53 -0600 Subject: [PATCH 08/22] Link to DOT files on Gist. --- _drafts/python-object-replacement.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index d59cd05..d4dc41a 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -145,3 +145,9 @@ for us to track these. Remaining areas to explore include behavior when metaclasses and more complex descriptors are involved. Implementing a more complete version of `replace()` is left as an exercise for the reader. + +## Notes + +The [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) +used to generate graphs in this post are +[available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6). From eee67a7a02c5ab69d711eab094d123078a847ba1 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Fri, 14 Nov 2014 01:01:05 -0600 Subject: [PATCH 09/22] Finish most of the first major chunk of the post. 
--- _drafts/python-object-replacement.md | 145 ++++++++++++++++++++++++++++++++--- 1 file changed, 134 insertions(+), 11 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index d4dc41a..2a3bfb6 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -24,6 +24,9 @@ use case in a future post, but for now, I imagine this could be useful in some kind of advanted unit testing situation with mock objects. Still, it's fairly insane, so let's leave it as primarily an intellectual exercise. +This article is written for [CPython](https://en.wikipedia.org/wiki/CPython) +2.7.[1] + ## Review First, a recap on terminology here. You can skip this section if you know @@ -38,15 +41,14 @@ a = [1, 2, 3, 4] {% endhighlight %} -We are creating a list object with four integers, and binding it to the name -`a`: +...we are creating a list object with four integers, and binding it to the name +`a`. In graph form:[2] %3L[1, 2, 3, 4]aaa->L In each of the following examples, we are creating new _references_ to the list object, but we are never duplicating it. Each reference points to the same -memory address (which you can get using `id(a)`, but that's a CPython -implementation detail). +memory address (which you can get using `id(a)`). {% highlight python %} @@ -83,13 +85,125 @@ scope—then the other references are still around, and object continues to exist. If all of an object's references disappear, then Python's garbage collector should eliminate it. +## Dead ends + +My first thought when approaching this problem was to physically write over the +memory where our target object is stored. This can be done using +[`ctypes.memmove()`](https://docs.python.org/2/library/ctypes.html#ctypes.memmove) +from the Python standard library: + +{% highlight pycon %} + +>>> class A(object): pass +... +>>> class B(object): pass +... 
+>>> obj = A() +>>> print obj +<__main__.A object at 0x10e3e1190> +>>> import ctypes +>>> ctypes.memmove(id(A), id(B), object.__sizeof__(A)) +140576340136752 +>>> print obj +<__main__.B object at 0x10e3e1190> + +{% endhighlight %} + +What we are doing here is overwriting the fields of the `A` instance of the +[`PyClassObject` C struct](https://github.com/python/cpython/blob/2.7/Include/classobject.h#L12) +with fields from the `B` struct instance. As a result, they now share various +properties, such as their attribute dictionaries +([`__dict__`](https://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy)). +So, we can do things like this: + +{% highlight pycon %} + +>>> B.foo = 123 +>>> obj.foo +123 + +{% endhighlight %} + +However, there are clear issues. What we've done is create a +[_shallow copy_](https://en.wikipedia.org/wiki/Object_copy#Shallow_copy). +Therefore, `A` and `B` are still distinct objects, so certain changes made to +one will not be replicated to the other: + +{% highlight pycon %} + +>>> A is B +False +>>> B.__name__ = "C" +>>> A.__name__ +'B' + +{% endhighlight %} + +Also, this won't work if `A` and `B` are different sizes, since we will be +either reading from or writing to memory we don't necessarily own: + +{% highlight pycon %} + +>>> A = () +>>> B = [] +>>> print A.__sizeof__(), B.__sizeof__() +24 40 +>>> import ctypes +>>> ctypes.memmove(id(A), id(B), A.__sizeof__()) +4321271888 +Python(33575,0x7fff76925300) malloc: *** error for object 0x6f: pointer being freed was not allocated +*** set a breakpoint in malloc_error_break to debug +Abort trap: 6 + +{% endhighlight %} + +Oh, and there's a bit of a problem when we deallocate these objects, too... 
+ +{% highlight pycon %} + +>>> A = [] +>>> B = range(8) +>>> import ctypes +>>> ctypes.memmove(id(A), id(B), A.__sizeof__()) +4514685728 +>>> print A +[0, 1, 2, 3, 4, 5, 6, 7] +>>> del A +>>> del B +Segmentation fault: 11 + +{% endhighlight %} + ## Fishing for references with Guppy -So, this boils down to finding all of the references to a particular object, -and then updating them to point to a different object. +A more correct solution is finding all of the _references_ to the old object, +and then updating them to point to the new object, rather than replacing the +old object directly. + +But how do we track references? Fortunately, there is a library called +[Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. Often used +for diagnosing memory leaks, we can take advantage of its robust object +tracking features here. Install it with [pip](https://pypi.python.org/pypi/pip) +(`pip install guppy`). + +I've always found Guppy hard to use (as many debuggers are, though justified by +the complexity of the task involved), so we'll begin with a feature demo before +delving into the actual problem. + +### Feature demonstration + +Guppy's interface is deceptively simple. We begin by creating an instance of +the Heapy interface, which is the component of Guppy that has the features we +want: + +{% highlight pycon %} + +>>> import guppy +>>> hp = guppy.hpy() + +{% endhighlight %} -But how do we track references? Fortunately for us, there is a library called -[Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. +[...] ## Handling different reference types @@ -148,6 +262,15 @@ is left as an exercise for the reader. ## Notes -The [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) -used to generate graphs in this post are -[available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6). +1. ^ This post relies _heavily_ on implementation + details of CPython 2.7. 
While it could be adapted for Python 3 by examining + changes to the internal structures of objects that we used above, that would + be a lost cause if you wanted to replicate this on + [Jython](http://www.jython.org/) or some other implementation. We are so + dependent on concepts specific to CPython that you would need to start from + scratch, beginning with a language-specific replacement for Guppy. + +2. ^ The + [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) + used to generate graphs in this post are + [available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6). From 63ef371c55efbe26897bb6191d28129c3a5651f2 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Fri, 14 Nov 2014 21:49:17 -0600 Subject: [PATCH 10/22] Finish section on using Guppy. --- _drafts/python-object-replacement.md | 254 ++++++++++++++++++++++++++++++++++- 1 file changed, 248 insertions(+), 6 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index 2a3bfb6..0db4f0b 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -140,7 +140,7 @@ False {% endhighlight %} Also, this won't work if `A` and `B` are different sizes, since we will be -either reading from or writing to memory we don't necessarily own: +either reading from or writing to memory that we don't necessarily own: {% highlight pycon %} @@ -192,21 +192,259 @@ delving into the actual problem. ### Feature demonstration -Guppy's interface is deceptively simple. We begin by creating an instance of -the Heapy interface, which is the component of Guppy that has the features we -want: +Guppy's interface is deceptively simple. 
We begin by calling +[`guppy.hpy()`](http://guppy-pe.sourceforge.net/guppy.html#kindnames.guppy.hpy), +to expose the Heapy interface, which is the component of Guppy that has the +features we want: {% highlight pycon %} >>> import guppy >>> hp = guppy.hpy() +>>> hp +Top level interface to Heapy. +Use eg: hp.doc for more info on hp. {% endhighlight %} -[...] +Calling +[`hp.heap()`](http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.heap) +shows us a table of the objects known to Guppy, grouped together +(mathematically speaking, +[_partitioned_](https://en.wikipedia.org/wiki/Partition_of_a_set)) by +type[3] and sorted by how much space +they take up in memory: + +{% highlight pycon %} + +>>> heap = hp.heap() +>>> heap +Partition of a set of 45761 objects. Total size = 4699200 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 15547 34 1494736 32 1494736 32 str + 1 8356 18 770272 16 2265008 48 tuple + 2 346 1 452080 10 2717088 58 dict (no owner) + 3 13685 30 328440 7 3045528 65 int + 4 71 0 221096 5 3266624 70 dict of module + 5 1652 4 211456 4 3478080 74 types.CodeType + 6 199 0 210856 4 3688936 79 dict of type + 7 1614 4 193680 4 3882616 83 function + 8 199 0 177008 4 4059624 86 type + 9 124 0 135328 3 4194952 89 dict of class +<91 more rows. Type e.g. '_.more' to view.> + +{% endhighlight %} + +This object (called an +[`IdentitySet`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet)) +looks bizarre, but it can be treated roughly like a list. If we want to take a +look at strings, we can do `heap[0]`: + +{% highlight pycon %} + +>>> heap[0] +Partition of a set of 22606 objects. Total size = 2049896 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 22606 100 2049896 100 2049896 100 str + +{% endhighlight %} + +This isn't very useful, though. What we really want to do is re-partition this +subset by another relationship. 
There are a number of options: + +{% highlight pycon %} + +>>> heap[0].byid # Group by object ID; each subset therefore has one element +Set of 22606 objects. Total size = 2049896 bytes. + Index Size % Cumulative % Representation (limited) + 0 7480 0.4 7480 0.4 'The class Bi... copy of S.\n' + 1 4872 0.2 12352 0.6 "Support for ... 'error'.\n\n" + 2 4760 0.2 17112 0.8 'Heap queues\...at Art! :-)\n' + 3 4760 0.2 21872 1.1 'Heap queues\...at Art! :-)\n' + 4 3896 0.2 25768 1.3 'This module ...ng function\n' + 5 3824 0.2 29592 1.4 'The type of ...call order.\n' + 6 3088 0.2 32680 1.6 't\x00\x00|\x...x00|\x02\x00S' + 7 2992 0.1 35672 1.7 'HeapView(roo... size, etc.\n' + 8 2808 0.1 38480 1.9 'Directory tr...ories\n\n ' + 9 2640 0.1 41120 2.0 'The class No... otherwise.\n' +<22596 more rows. Type e.g. '_.more' to view.> + +{% endhighlight %} + +{% highlight pycon %} + +>>> heap[0].byrcs # Group by what types of objects reference the strings +Partition of a set of 22606 objects. Total size = 2049896 bytes. + Index Count % Size % Cumulative % Referrers by Kind (class / dict of class) + 0 6146 27 610752 30 610752 30 types.CodeType + 1 5304 23 563984 28 1174736 57 tuple + 2 4104 18 237536 12 1412272 69 dict (no owner) + 3 1959 9 139880 7 1552152 76 list + 4 564 2 136080 7 1688232 82 function, tuple + 5 809 4 97896 5 1786128 87 dict of module + 6 346 2 71760 4 1857888 91 dict of type + 7 365 2 19408 1 1877296 92 dict of module, tuple + 8 192 1 16176 1 1893472 92 dict (no owner), list + 9 232 1 11784 1 1905256 93 dict of class, function, tuple, types.CodeType +<229 more rows. Type e.g. '_.more' to view.> + +{% endhighlight %} + +{% highlight pycon %} + +>>> heap[0].byvia # Group by how the strings are related to their referrers +Partition of a set of 22606 objects. Total size = 2049896 bytes. 
+ Index Count % Size % Cumulative % Referred Via: + 0 2656 12 420456 21 420456 21 '[0]' + 1 2095 9 259008 13 679464 33 '.co_code' + 2 2095 9 249912 12 929376 45 '.co_filename' + 3 564 2 136080 7 1065456 52 '.func_doc', '[0]' + 4 243 1 103528 5 1168984 57 "['__doc__']" + 5 1930 9 100584 5 1269568 62 '.co_lnotab' + 6 502 2 31128 2 1300696 63 '[1]' + 7 306 1 16272 1 1316968 64 '[2]' + 8 242 1 12960 1 1329928 65 '[3]' + 9 184 1 9872 0 1339800 65 '[4]' +<7323 more rows. Type e.g. '_.more' to view.> + +{% endhighlight %} + +From this, we can see that the plurality of memory devoted to strings is taken +up by those referenced by code objects (`types.CodeType` represents +Python code—accessible from a non-C-defined function through +`func.func_code`—and contains things like the names of its local variables and +the actual sequence of opcodes that make it up). + +For fun, let's pick a random string. + +{% highlight pycon %} + +>>> import random +>>> obj = heap[0].byid[random.randrange(0, heap[0].count)] +>>> obj +Set of 1 object. Total size = 176 bytes. + Index Size % Cumulative % Representation (limited) + 0 176 100.0 176 100.0 'Define names...not listed.\n' + +{% endhighlight %} + +Interesting. Since this heap subset contains only one element, we can use +[`.theone`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySetSingleton.theone) +to get the actual object represented here: + +{% highlight pycon %} + +>>> obj.theone +'Define names for all type symbols known in the standard interpreter.\n\nTypes that are part of optional modules (e.g. array) are not listed.\n' + +{% endhighlight %} + +Looks like the docstring for the +[`types`](https://docs.python.org/2/library/types.html) module. We can confirm +by using +[`.referrers`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet.referrers) +to get the set of objects that refer to objects in the given set: + +{% highlight pycon %} + +>>> obj.referrers +Partition of a set of 1 object. 
Total size = 3352 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 1 100 3352 100 3352 100 dict of module + +{% endhighlight %} + +This is `types.__dict__` (since the docstring we got is actually stored as +`types.__dict__["__doc__"]`), so if we use `.referrers` again: + +{% highlight pycon %} + +>>> obj.referrers.referrers +Partition of a set of 1 object. Total size = 56 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 1 100 56 100 56 100 module +>>> obj.referrers.referrers.theone + +>>> import types +>>> types.__doc__ is obj.theone +True + +{% endhighlight %} + +_But why did we find an object in the `types` module if we never imported it?_ +Well, let's see. We can use +[`hp.iso()`](http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.iso) +to get the Heapy set consisting of a single given object: + +{% highlight pycon %} + +>>> hp.iso(types) +Partition of a set of 1 object. Total size = 56 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 1 100 56 100 56 100 module + +{% endhighlight %} + +Using a similar procedure as before, we see that `types` is imported by the +[`traceback`](https://docs.python.org/2/library/traceback.html) module: + +{% highlight pycon %} + +>>> hp.iso(types).referrers +Partition of a set of 10 objects. Total size = 25632 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 2 20 13616 53 13616 53 dict (no owner) + 1 5 50 9848 38 23464 92 dict of module + 2 1 10 1048 4 24512 96 dict of guppy.etc.Glue.Interface + 3 1 10 1048 4 25560 100 dict of guppy.etc.Glue.Share + 4 1 10 72 0 25632 100 tuple +>>> hp.iso(types).referrers[1].byid +Set of 5 objects. Total size = 9848 bytes. 
+ Index Size % Cumulative % Owner Name + 0 3352 34.0 3352 34.0 traceback + 1 3352 34.0 6704 68.1 warnings + 2 1048 10.6 7752 78.7 __main__ + 3 1048 10.6 8800 89.4 abc + 4 1048 10.6 9848 100.0 guppy.etc.Glue + +{% endhighlight %} + +...and that is imported by +[`site`](https://docs.python.org/2/library/site.html): + +{% highlight pycon %} + +>>> import traceback +>>> hp.iso(traceback).referrers +Partition of a set of 3 objects. Total size = 15992 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 1 33 12568 79 12568 79 dict (no owner) + 1 1 33 3352 21 15920 100 dict of module + 2 1 33 72 0 15992 100 tuple +>>> hp.iso(traceback).referrers[1].byid +Set of 1 object. Total size = 3352 bytes. + Index Size % Cumulative % Owner Name + 0 3352 100.0 3352 100.0 site + +{% endhighlight %} + +Since `site` is imported by Python on startup, we've figured out why objects +from `types` exist, even though we've never used them. + +We've learned something important, too. When objects are stored as ordinary +attributes of a parent object (like `types.__doc__`, `traceback.types`, and +`site.traceback` from above), they are not referenced directly by the parent +object, but by that object's `__dict__` attribute. Therefore, if we want to +replace `A` with `B` and `A` is an attribute of `C`, we (probably) don't need +to know anything special about `C`—just how to modify dictionaries. + +A good Guppy/Heapy tutorial, while a bit old and incomplete, can be found on +[Andrey Smirnov's website](http://smira.ru/wp-content/uploads/2011/08/heapy.html). ## Handling different reference types +[...] + ### Dictionaries dicts, class attributes via `__dict__`, locals() @@ -260,7 +498,7 @@ Remaining areas to explore include behavior when metaclasses and more complex descriptors are involved. Implementing a more complete version of `replace()` is left as an exercise for the reader. -## Notes +## Footnotes 1. ^ This post relies _heavily_ on implementation details of CPython 2.7. 
While it could be adapted for Python 3 by examining @@ -274,3 +512,7 @@ is left as an exercise for the reader. [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) used to generate graphs in this post are [available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6). + +3. ^ They're actually grouped together by _clodo_ + ("class or dict object"), which is similar to type, but groups `__dict__`s + separately by their owner's type. From 381b5e6d2da7dd1df0916210b6dfedb7eeed5370 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Mon, 17 Nov 2014 14:49:41 -0600 Subject: [PATCH 11/22] More work on replacement. --- _drafts/python-object-replacement.md | 126 ++++++++++++++++++++++++++++++++++- 1 file changed, 124 insertions(+), 2 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index 0db4f0b..0441928 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -180,7 +180,7 @@ A more correct solution is finding all of the _references_ to the old object, and then updating them to point to the new object, rather than replacing the old object directly. -But how do we track references? Fortunately, there is a library called +But how do we track references? Fortunately, there's a library called [Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. Often used for diagnosing memory leaks, we can take advantage of its robust object tracking features here. Install it with [pip](https://pypi.python.org/pypi/pip) @@ -250,7 +250,7 @@ Partition of a set of 22606 objects. Total size = 2049896 bytes. {% endhighlight %} This isn't very useful, though. What we really want to do is re-partition this -subset by another relationship. There are a number of options: +subset using another relationship. There are a number of options, such as: {% highlight pycon %} @@ -441,6 +441,128 @@ to know anything special about `C`—just how to modify dictionaries. 
A good Guppy/Heapy tutorial, while a bit old and incomplete, can be found on [Andrey Smirnov's website](http://smira.ru/wp-content/uploads/2011/08/heapy.html). +## Examining paths + +Let's set up an example replacement using class instances: + +{% highlight python %} + +class A(object): + pass + +class B(object): + pass + +a = A() +b = B() + +{% endhighlight %} + +Suppose we want to replace `a` with `b`. From the demo above, we know that we +can get the Heapy set of a single object using `hp.iso()`. We also know we can +use `.referrers` to get a set of objects that reference the given object: + +{% highlight pycon %} + +>>> import guppy +>>> hp = guppy.hpy() +>>> print hp.iso(a).referrers +Partition of a set of 1 object. Total size = 1048 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 1 100 1048 100 1048 100 dict of module + +{% endhighlight %} + +`a` is only referenced by one object, which makes sense, since we've only used +it in one place—as a local variable—meaning `hp.iso(a).referrers.theone` must +be [`locals()`](https://docs.python.org/2/library/functions.html#locals): + +{% highlight pycon %} + +>>> hp.iso(a).referrers.theone is locals() +True + +{% highlight pycon %} + +However, there is a more useful feature available to us: +[`.pathsin`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet.pathsin). +This also returns references to the given object, but instead of a Heapy set, +it is a list of `Path` objects. These are more useful since they tell us not +only _what_ objects are related to the given object, but _how_ they are +related. + +{% highlight pycon %} + +>>> print hp.iso(a).pathsin + 0: Src['a'] + +{% endhighlight %} + +This looks very ambiguous. However, we find that we can extract the source of +the reference using `.src`: + +{% highlight pycon %} + +>>> path = hp.iso(a).pathsin[0] +>>> print path.src +Partition of a set of 1 object. Total size = 1048 bytes. 
+ Index Count % Size % Cumulative % Kind (class / dict of class) + 0 1 100 1048 100 1048 100 dict of module +>>> path.src.theone is locals() +True + +{% endhighlight %} + +...and, we can examine the type of relation by looking at `.path[1]` (the +actual reason for this isn't worth getting into, due to Guppy's lack of +documentation on the subject): + +{% highlight pycon %} + +>>> relation = path.path[1] +>>> relation + + +{% endhighlight %} + +We notice that `relation` is a `Based_R_INDEXVAL` object. Sounds bizarre, but +this tells us that `path.src` is related to `a` by being a particular index +value of it. What index? We can get this using `relation.r`: + +{% highlight pycon %} + +>>> rel = relation.r +>>> print rel +a + +{% endhighlight %} + +Ah ha! So now we know that `a` is equal to the reference source indexed by +`rel`. But what is the reference source? It's just `path.src.theone`: + +{% highlight pycon %} + +>>> path.src.theone[rel] is a +True + +{% endhighlight %} + +But `path.src.theone` is just a dictionary, meaning we know how to modify it +very easily: + +{% highlight pycon %} + +>>> path.src.theone[rel] = b +>>> a +<__main__.B object at 0x100dae090> +>>> a is b +True + +{% endhighlight %} + +Python's documentation tells us not to modify the locals dictionary, but screw +it, we're gonna do it anyway. + ## Handling different reference types [...] From 42b06f9d37523419e2d77fe57a3314044953a728 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Mon, 17 Nov 2014 14:50:07 -0600 Subject: [PATCH 12/22] Fix typo. 
--- _drafts/python-object-replacement.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index 0441928..be7432d 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -482,7 +482,7 @@ be [`locals()`](https://docs.python.org/2/library/functions.html#locals): >>> hp.iso(a).referrers.theone is locals() True -{% highlight pycon %} +{% endhighlight %} However, there is a more useful feature available to us: [`.pathsin`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet.pathsin). From 974c75a21f0a61af4211892bde51ea271eb0cff0 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Mon, 17 Nov 2014 15:26:08 -0600 Subject: [PATCH 13/22] Clean up recent additions; stub for next section. --- _drafts/python-object-replacement.md | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index be7432d..27584bf 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -460,7 +460,7 @@ b = B() Suppose we want to replace `a` with `b`. From the demo above, we know that we can get the Heapy set of a single object using `hp.iso()`. We also know we can -use `.referrers` to get a set of objects that reference the given object: +use `.referrers` to get the set of objects that reference the given object: {% highlight pycon %} @@ -526,8 +526,8 @@ documentation on the subject): {% endhighlight %} We notice that `relation` is a `Based_R_INDEXVAL` object. Sounds bizarre, but -this tells us that `path.src` is related to `a` by being a particular index -value of it. What index? We can get this using `relation.r`: +this tells us that `a` is a particular indexed value of `path.src`. What index? 
+We can get this using `relation.r`: {% highlight pycon %} @@ -537,8 +537,8 @@ a {% endhighlight %} -Ah ha! So now we know that `a` is equal to the reference source indexed by -`rel`. But what is the reference source? It's just `path.src.theone`: +Ah ha! So now we know that `a` is equal to the reference source (i.e., +`path.src.theone`) indexed by `rel`: {% highlight pycon %} @@ -548,7 +548,7 @@ True {% endhighlight %} But `path.src.theone` is just a dictionary, meaning we know how to modify it -very easily: +very easily:[4] {% highlight pycon %} @@ -560,12 +560,19 @@ True {% endhighlight %} -Python's documentation tells us not to modify the locals dictionary, but screw -it, we're gonna do it anyway. +Bingo. We've successfully replaced `a` with `b`, using a general method that +should work for any case where `a` is in a dictionary-like object. ## Handling different reference types -[...] +We'll continue by wrapping this code up in a nice function: + +{% highlight python %} + +def replace(old, new): + pass + +{% endhighlight %} ### Dictionaries @@ -638,3 +645,6 @@ is left as an exercise for the reader. 3. ^ They're actually grouped together by _clodo_ ("class or dict object"), which is similar to type, but groups `__dict__`s separately by their owner's type. + +4. ^ Python's documentation tells us not to modify + the locals dictionary, but screw that; we're gonna do it anyway. From 56e3a4a913e3e761928210d8dbdb7068a0f68ce5 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Wed, 19 Nov 2014 02:45:56 -0600 Subject: [PATCH 14/22] Finish dictionaries, lists, and tuples. 
---
 _drafts/python-object-replacement.md | 95 +++++++++++++++++++++++++++++++++---
 1 file changed, 87 insertions(+), 8 deletions(-)

diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md
index 27584bf..d2fd9ad 100644
--- a/_drafts/python-object-replacement.md
+++ b/_drafts/python-object-replacement.md
@@ -565,26 +565,105 @@ should work for any case where `a` is in a dictionary-like object.

 ## Handling different reference types

-We'll continue by wrapping this code up in a nice function:
+We'll continue by wrapping this code up in a nice function, which we will
+expand as we go:

 {% highlight python %}

+import guppy
+from guppy.heapy import Path
+
+hp = guppy.hpy()
+
 def replace(old, new):
-    pass
+    for path in hp.iso(old).pathsin:
+        relation = path.path[1]
+        if isinstance(relation, Path.R_INDEXVAL):
+            path.src.theone[relation.r] = new

 {% endhighlight %}

-### Dictionaries
+### Dictionaries, lists, and tuples
+
+As noted above, this is versatile enough to handle many dictionary-like
+situations, including `__dict__`, which means we already know how to replace
+object attributes:
+
+{% highlight pycon %}
+
+>>> a, b = A(), B()
+>>>
+>>> class X(object):
+...     pass
+...
+>>> X.cattr = a
+>>> x = X()
+>>> x.iattr = a
+>>> d1 = {1: a}
+>>> d2 = [{1: {0: ("foo", "bar", {"a": a, "b": b})}}]
+>>>
+>>> replace(a, b)
+>>>
+>>> print a
+<__main__.B object at 0x1042b9910>
+>>> print X.cattr
+<__main__.B object at 0x1042b9910>
+>>> print x.iattr
+<__main__.B object at 0x1042b9910>
+>>> print d1[1]
+<__main__.B object at 0x1042b9910>
+>>> print d2[0][1][0][2]["a"]
+<__main__.B object at 0x1042b9910>
+
+{% endhighlight %}
+
+Lists can be handled exactly the same as dictionaries, although the keys in
+this case (i.e., `relation.r`) will always be integers.
-dicts, class attributes via `__dict__`, locals() +{% highlight pycon %} -### Lists +>>> a, b = A(), B() +>>> L = [0, 1, 2, a, b] +>>> print L +[0, 1, 2, <__main__.A object at 0x104598950>, <__main__.B object at 0x104598910>] +>>> replace(a, b) +>>> print L +[0, 1, 2, <__main__.B object at 0x104598910>, <__main__.B object at 0x104598910>] -simple replacement +{% endhighlight %} -### Tuples +Tuples are interesting. We can't modify them directly because they're +immutable, but we _can_ create a new tuple with the new value, and then replace +that tuple just like we replaced our original object: -recursively replace parent since immutable +{% highlight python %} + + # Meanwhile, in replace()... + if isinstance(relation, Path.R_INDEXVAL): + source = path.src.theone + if isinstance(source, tuple): + temp = list(source) + temp[relation.r] = new + replace(source, tuple(temp)) + else: + source[relation.r] = new + +{% endhighlight %} + +As a result: + +{% highlight pycon %} + +>>> a, b = A(), B() +>>> t1 = (0, 1, 2, a) +>>> t2 = (0, (1, (2, (3, (4, (5, (a,))))))) +>>> replace(a, b) +>>> print t1 +(0, 1, 2, <__main__.B object at 0x104598e50>) +>>> print t2 +(0, (1, (2, (3, (4, (5, (<__main__.B object at 0x104598e50>,))))))) + +{% endhighlight %} ### Bound methods From f25dd72d0c39f90a83c63985133b197a438ef429 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Wed, 19 Nov 2014 14:47:32 -0600 Subject: [PATCH 15/22] Work on the bound method section. --- _drafts/python-object-replacement.md | 73 +++++++++++++++++++++++++++++++++++- 1 file changed, 71 insertions(+), 2 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index d2fd9ad..27817fc 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -667,8 +667,77 @@ As a result: ### Bound methods -note that built-in methods and regular methods have different underlying C -structs, but have the same offsets for their self field +Here's a fun one. 
Let's upgrade our definitions of `A` and `B`:
+
+{% highlight python %}
+
+class A(object):
+    def func(self):
+        return self
+
+class B(object):
+    pass
+
+{% endhighlight %}
+
+After replacing `a` with `b`, `a.func` no longer exists, as we'd expect:
+
+{% highlight pycon %}
+
+>>> a, b = A(), B()
+>>> a.func()
+<__main__.A object at 0x10c4a5b10>
+>>> replace(a, b)
+>>> a.func()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+AttributeError: 'B' object has no attribute 'func'
+
+{% endhighlight %}
+
+But what if we save a reference to `a.func` before the replacement?
+
+{% highlight pycon %}
+
+>>> a, b = A(), B()
+>>> f = a.func
+>>> replace(a, b)
+>>> f()
+<__main__.A object at 0x10c4b6090>
+
+{% endhighlight %}
+
+Hmm. So `f` has kept a reference to `a` somehow, but not in a dictionary-like
+object. So where is it?
+
+Well, we can reveal it with the attribute `f.__self__`:
+
+{% highlight pycon %}
+
+>>> f.__self__
+<__main__.A object at 0x10c4b6090>
+
+{% endhighlight %}
+
+Unfortunately, this attribute is magical and we can't write to it directly:
+
+{% highlight pycon %}
+
+>>> f.__self__ = b
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: readonly attribute
+
+{% endhighlight %}
+
+Python clearly doesn't want us to re-bind bound methods, and a reasonable
+person would give up here, but we still have a few tricks up our sleeve. Let's
+examine the internal C structure of bound methods,
+[`PyMethodObject`](https://github.com/python/cpython/blob/2.7/Include/classobject.h#L31).
+
+[...]
+
+[`PyCFunctionObject`](https://github.com/python/cpython/blob/2.7/Include/methodobject.h#L81)

 ### Closure cells

From 207654c73f18d5ec809d7ba9a8b9f2b2add97e26 Mon Sep 17 00:00:00 2001
From: Ben Kurtovic
Date: Sat, 22 Nov 2014 12:32:20 -0600
Subject: [PATCH 16/22] Add graph for PyMethodObject.
---
 _drafts/python-object-replacement.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md
index 27817fc..338ca75 100644
--- a/_drafts/python-object-replacement.md
+++ b/_drafts/python-object-replacement.md
@@ -1,6 +1,6 @@
 ---
 layout: post
-title: Replacing Objects in Python
+title: Finding and Replacing Objects in Python
 tags: Python
 description: More reflection than you cared to ask for
 draft: true
@@ -735,7 +735,9 @@ person would give up here, but we still have a few tricks up our sleeve. Let's
 examine the internal C structure of bound methods,
 [`PyMethodObject`](https://github.com/python/cpython/blob/2.7/Include/classobject.h#L31).

-[...]
+[Graph: layout of the `PyMethodObject` struct (`struct _object* _ob_next`, `struct _object* _ob_prev`, `Py_ssize_t ob_refcnt`, `struct _typeobject* ob_type`, `PyObject* im_func`, `PyObject* im_self`, `PyObject* im_class`, `PyObject* im_weakreflist`), with an edge from the struct to `<__main__.A object at 0xdeadbeef>`]
+
+[`_PyObject_HEAD_EXTRA`](https://github.com/python/cpython/blob/2.7/Include/object.h#L64)

 [`PyCFunctionObject`](https://github.com/python/cpython/blob/2.7/Include/methodobject.h#L81)

From e2babb35f932b0ee817c61cee7616425a9483926 Mon Sep 17 00:00:00 2001
From: Ben Kurtovic
Date: Sat, 22 Nov 2014 13:02:01 -0600
Subject: [PATCH 17/22] More work on bound methods.
---
 _drafts/python-object-replacement.md | 39 ++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md
index 338ca75..81688c7 100644
--- a/_drafts/python-object-replacement.md
+++ b/_drafts/python-object-replacement.md
@@ -737,13 +737,48 @@ examine the internal C structure of bound methods,

 [Graph: layout of the `PyMethodObject` struct (`struct _object* _ob_next`, `struct _object* _ob_prev`, `Py_ssize_t ob_refcnt`, `struct _typeobject* ob_type`, `PyObject* im_func`, `PyObject* im_self`, `PyObject* im_class`, `PyObject* im_weakreflist`), with an edge from the struct to `<__main__.A object at 0xdeadbeef>`]

-[`_PyObject_HEAD_EXTRA`](https://github.com/python/cpython/blob/2.7/Include/object.h#L64)
+The four gray fields of the struct come from
+[`PyObject_HEAD`](https://github.com/python/cpython/blob/2.7/Include/object.h#L78),
+which exist in all Python objects. The first two fields are from
+[`_PyObject_HEAD_EXTRA`](https://github.com/python/cpython/blob/2.7/Include/object.h#L66),
+and only exist when the debugging macro `Py_TRACE_REFS` is defined, in order to
+support more advanced reference counting. We can see that the `im_self` field,
+which maintains the reference to our target object, is either fourth or sixth in
+the struct depending on `Py_TRACE_REFS`. If we can figure out the size of the
+field and its offset from the start of the struct, then we can set its value
+directly using `ctypes.memmove()`:
+
+{% highlight python %}
+
+ctypes.memmove(id(f) + offset, ctypes.byref(ctypes.py_object(b)), field_size)
+
+{% endhighlight %}
+
+Here, `id(f)` is the memory location of our method, which refers to the start
+of the C struct from above. `offset` is the number of bytes between this memory
+location and the start of the `im_self` field.
We use +[`ctypes.byref()`](https://docs.python.org/2/library/ctypes.html#ctypes.byref) +to create a reference to the replacement object, `b`, which will be copied over +the existing reference to `a`. Finally, `field_size` is the number of bytes +we're copying, equal to the size of the `im_self` field. + +Well, all but one of these fields are pointers, meaning they have the same +size, equal to +[`ctypes.sizeof(ctypes.c_void_p)`](https://docs.python.org/2/library/ctypes.html#ctypes.sizeof). +This is (probably) 4 or 8 bytes, depending on whether you're on a 32-bit or a +64-bit system. The other field is a `Py_ssize_t` object—_very_ likely to be +the same size as a pointer, but we can't be sure—which is equal to +`ctypes.sizeof(ctypes.c_ssize_t)`. [`PyCFunctionObject`](https://github.com/python/cpython/blob/2.7/Include/methodobject.h#L81) +### Dictionary keys + +... + ### Closure cells -function closures +... ### Frames From 1567a8fe5ad39036eb9343ce1a12bd875eb64920 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Mon, 24 Nov 2014 04:17:41 -0500 Subject: [PATCH 18/22] Clarify nuances of C standards. --- _drafts/python-object-replacement.md | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index 81688c7..aad63da 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -762,12 +762,12 @@ to create a reference to the replacement object, `b`, which will be copied over the existing reference to `a`. Finally, `field_size` is the number of bytes we're copying, equal to the size of the `im_self` field. -Well, all but one of these fields are pointers, meaning they have the same -size, equal to -[`ctypes.sizeof(ctypes.c_void_p)`](https://docs.python.org/2/library/ctypes.html#ctypes.sizeof). 
+Well, all but one of these fields are pointers to structure types, meaning they +have the same size,[5] equal to +[`ctypes.sizeof(ctypes.py_object)`](https://docs.python.org/2/library/ctypes.html#ctypes.sizeof). This is (probably) 4 or 8 bytes, depending on whether you're on a 32-bit or a -64-bit system. The other field is a `Py_ssize_t` object—_very_ likely to be -the same size as a pointer, but we can't be sure—which is equal to +64-bit system. The other field is a `Py_ssize_t` object—possibly the same size +as the pointers, but we can't be sure—which is equal to `ctypes.sizeof(ctypes.c_ssize_t)`. [`PyCFunctionObject`](https://github.com/python/cpython/blob/2.7/Include/methodobject.h#L81) @@ -833,3 +833,10 @@ is left as an exercise for the reader. 4. ^ Python's documentation tells us not to modify the locals dictionary, but screw that; we're gonna do it anyway. + +5. ^ According to the + [C99](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) and + [C11 standards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf); + section 6.2.5.27 in the former and 6.2.5.28 in the latter: "All pointers to + structure types shall have the same representation and alignment + requirements as each other." From b6f107856a64389d0fae19cdb08f0e19ffb8155d Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Wed, 3 Dec 2014 01:36:24 -0600 Subject: [PATCH 19/22] More on bound methods. --- _drafts/python-object-replacement.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index aad63da..191e5d9 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -770,6 +770,33 @@ This is (probably) 4 or 8 bytes, depending on whether you're on a 32-bit or a as the pointers, but we can't be sure—which is equal to `ctypes.sizeof(ctypes.c_ssize_t)`. 
+We know that `field_size` must be `ctypes.sizeof(ctypes.py_object)`, since we
+are copying a structure pointer. `offset` is this value multiplied by the
+number of structure pointers before `im_self` (4 if `Py_TRACE_REFS` is defined
+and 2 otherwise), plus `ctypes.sizeof(ctypes.c_ssize_t)` for `ob_refcnt`. But how
+do we determine if `Py_TRACE_REFS` is defined? We can't check the value of a
+macro at runtime, but we can check for the existence of
+[`sys.getobjects()`](https://github.com/python/cpython/blob/2.7/Misc/SpecialBuilds.txt#L54),
+which is
+[only defined when that macro is](https://github.com/python/cpython/blob/2.7/Python/sysmodule.c#L951).
+Therefore, we can make our replacement like so:
+
+{% highlight pycon %}
+
+>>> import ctypes
+>>> import sys
+>>> field_size = ctypes.sizeof(ctypes.py_object)
+>>> ptrs_in_struct = 4 if hasattr(sys, "getobjects") else 2
+>>> offset = ptrs_in_struct * field_size + ctypes.sizeof(ctypes.c_ssize_t)
+>>> ctypes.memmove(id(f) + offset, ctypes.byref(ctypes.py_object(b)), field_size)
+4470258440
+>>> f()
+<__main__.B object at 0x10a8af290>
+
+{% endhighlight %}
+
+Excellent—it worked!
+
 [`PyCFunctionObject`](https://github.com/python/cpython/blob/2.7/Include/methodobject.h#L81)

 ### Dictionary keys

From 92e437d07b8d8a17d1d7e903c3eb1f44721580dd Mon Sep 17 00:00:00 2001
From: Ben Kurtovic
Date: Wed, 3 Dec 2014 12:43:16 -0600
Subject: [PATCH 20/22] Add PyCFunctionObjects to bound methods section.
---
 _drafts/python-object-replacement.md | 45 ++++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md
index 191e5d9..c033425 100644
--- a/_drafts/python-object-replacement.md
+++ b/_drafts/python-object-replacement.md
@@ -44,7 +44,7 @@ a = [1, 2, 3, 4]

 ...we are creating a list object with four integers, and binding it to the name
 `a`.
In graph form:[2] -%3L[1, 2, 3, 4]aaa->L +L[1, 2, 3, 4]aaa->L In each of the following examples, we are creating new _references_ to the list object, but we are never duplicating it. Each reference points to the same @@ -74,7 +74,7 @@ d = wrapper(a) {% endhighlight %} -%3cluster0dobj[1, 2, 3, 4]aaa->objbbb->objcc.datac->objLLL->obj +cluster0dobj[1, 2, 3, 4]aaa->objbbb->objcc.datac->objLLL->obj Note that these references are all equal. `a` is no more valid a name for the list than `b`, `c.data`, or `L` (from the perspective of `d`, which is exposed @@ -733,9 +733,9 @@ TypeError: readonly attribute Python clearly doesn't want us to re-bind bound methods, and a reasonable person would give up here, but we still have a few tricks up our sleeve. Let's examine the internal C structure of bound methods, -[`PyMethodObject`](https://github.com/python/cpython/blob/2.7/Include/classobject.h#L31). +[`PyMethodObject`](https://github.com/python/cpython/blob/2.7/Include/classobject.h#L31): -%3clusterPyMethodObjectobj<__main__.A object at 0xdeadbeef>structstruct _object* _ob_nextstruct _object* _ob_prevPy_ssize_t ob_refcntstruct _typeobject* ob_typePyObject* im_funcPyObject* im_selfPyObject* im_classPyObject* im_weakrefliststruct:f->obj +clusterPyMethodObjectobj<__main__.A object at 0xdeadbeef>structstruct _object* _ob_nextstruct _object* _ob_prevPy_ssize_t ob_refcntstruct _typeobject* ob_typePyObject* im_funcPyObject* im_selfPyObject* im_classPyObject* im_weakrefliststruct:f->obj The four gray fields of the struct come from [`PyObject_HEAD`](https://github.com/python/cpython/blob/2.7/Include/object.h#L78), @@ -790,6 +790,8 @@ Therefore, we can make our replacement like so: >>> offset = ptrs_in_struct * field_size + ctypes.sizeof(ctypes.c_ssize_t) >>> ctypes.memmove(id(f) + offset, ctypes.byref(ctypes.py_object(b)), field_size) 4470258440 +>>> f.__self__ is b +True >>> f() <__main__.B object at 0x10a8af290> @@ -797,7 +799,40 @@ Therefore, we can make our replacement like so: 
 Excellent—it worked!

-[`PyCFunctionObject`](https://github.com/python/cpython/blob/2.7/Include/methodobject.h#L81)
+There's another kind of bound method, which is the built-in variety as opposed
+to the user-defined variety we saw above. An example is `a.__sizeof__()`:
+
+{% highlight pycon %}
+
+>>> a, b = A(), B()
+>>> f = a.__sizeof__
+>>> f
+
+>>> replace(a, b)
+>>> f.__self__
+<__main__.A object at 0x10ab44b50>
+
+{% endhighlight %}
+
+This is stored internally as a
+[`PyCFunctionObject`](https://github.com/python/cpython/blob/2.7/Include/methodobject.h#L81).
+Let's take a look at its layout:
+
+[Graph: layout of the `PyCFunctionObject` struct (`struct _object* _ob_next`, `struct _object* _ob_prev`, `Py_ssize_t ob_refcnt`, `struct _typeobject* ob_type`, `PyMethodDef* m_ml`, `PyObject* m_self`, `PyObject* m_module`), with an edge from the struct to `<__main__.A object at 0xdeadbeef>`]
+
+Fortunately, `m_self` here has the same offset as `im_self` from before, so we
+can just use the same code:
+
+{% highlight pycon %}
+
+>>> ctypes.memmove(id(f) + offset, ctypes.byref(ctypes.py_object(b)), field_size)
+4474703768
+>>> f.__self__ is b
+True
+>>> f
+
+
+{% endhighlight %}

 ### Dictionary keys

From 3500c3cae2fe1f71f28ad14aca1f4da913685bba Mon Sep 17 00:00:00 2001
From: Ben Kurtovic
Date: Wed, 28 Jan 2015 14:19:41 -0600
Subject: [PATCH 21/22] Finish post.
---
 _drafts/python-object-replacement.md | 153 ++++++++++++++++++++++++++++-------
 1 file changed, 124 insertions(+), 29 deletions(-)

diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md
index c033425..352fe93 100644
--- a/_drafts/python-object-replacement.md
+++ b/_drafts/python-object-replacement.md
@@ -22,7 +22,7 @@ on.

 _But why on Earth would you want to do that?_ you ask. I'll focus on a
 concrete use case in a future post, but for now, I imagine this could be useful
 in some kind of advanced unit testing situation with mock objects. Still, it's fairly
-insane, so let's leave it as primarily an intellectual exercise.
+insane, so let's leave it primarily as an intellectual exercise. This article is written for [CPython](https://en.wikipedia.org/wiki/CPython) 2.7.[1] @@ -77,13 +77,12 @@ d = wrapper(a) cluster0dobj[1, 2, 3, 4]aaa->objbbb->objcc.datac->objLLL->obj Note that these references are all equal. `a` is no more valid a name for the -list than `b`, `c.data`, or `L` (from the perspective of `d`, which is exposed -to everyone else as `d.func_closure[0].cell_contents`, but that's cumbersome -and you would never do that in practice). As a result, if you delete one of -these references—explicitly with `del a`, or implicitly if a name goes out of -scope—then the other references are still around, and object continues to -exist. If all of an object's references disappear, then Python's garbage -collector should eliminate it. +list than `b`, `c.data`, or `L` (or `d.func_closure[0].cell_contents` to the +outside world). As a result, if you delete one of these references—explicitly +with `del a`, or implicitly if a name goes out of scope—then the other +references are still around, and object continues to exist. If all of an +object's references disappear, then Python's garbage collector should eliminate +it. ## Dead ends @@ -176,9 +175,9 @@ Segmentation fault: 11 ## Fishing for references with Guppy -A more correct solution is finding all of the _references_ to the old object, -and then updating them to point to the new object, rather than replacing the -old object directly. +A more appropriate solution is finding all of the _references_ to the old +object, and then updating them to point to the new object, rather than +replacing the old object directly. But how do we track references? Fortunately, there's a library called [Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. Often used @@ -194,7 +193,7 @@ delving into the actual problem. Guppy's interface is deceptively simple. 
 We begin by calling
 [`guppy.hpy()`](http://guppy-pe.sourceforge.net/guppy.html#kindnames.guppy.hpy),
-to expose the Heapy interface, which is the component of Guppy that has the
+to expose the Heapy interface, which is the component of Guppy with the
 features we want:
 
 {% highlight pycon %}
@@ -836,31 +835,131 @@ True
 
 ### Dictionary keys
 
-...
+Dictionary keys have a different reference relation type than values, but the
+replacement works mostly the same way. We pop the value of the old key from the
+dictionary, and then insert it in again under the new key. Here's the code,
+which we'll stick into the main block in `replace()`:
+
+{% highlight python %}
+
+elif isinstance(relation, Path.R_INDEXKEY):
+    source = path.src.theone
+    source[new] = source.pop(source.keys()[relation.r])
+
+{% endhighlight %}
+
+And, a demonstration:
+
+{% highlight pycon %}
+
+>>> a, b = A(), B()
+>>> d = {a: 1}
+>>> replace(a, b)
+>>> d
+{<__main__.B object at 0x10fb47950>: 1}
+
+{% endhighlight %}
 
 ### Closure cells
 
-...
+We'll cover just one more case, this time involving a
+[closure](https://en.wikipedia.org/wiki/Closure_(computer_programming)). Here's
+our test function:
 
-### Frames
+{% highlight python %}
 
-...
+def wrapper(obj):
+    def inner():
+        return obj
+    return inner
 
-### Slots
+{% endhighlight %}
 
-...
+As we can see, an instance of the inner function keeps references to the locals
+of the wrapper function, even after using our current
+version of `replace()`:
+
+{% highlight pycon %}
 
-### Classes
+>>> a, b = A(), B()
+>>> f = wrapper(a)
+>>> f()
+<__main__.A object at 0x109446090>
+>>> replace(a, b)
+>>> f()
+<__main__.A object at 0x109446090>
 
-...
+{% endhighlight %}
+
+Internally, CPython implements this using things called
+[_cells_](https://docs.python.org/2/c-api/cell.html). We notice that
+`f.func_closure` gives us a tuple of `cell` objects, and we can examine an
+individual cell's contents with `.cell_contents`:
+
+{% highlight pycon %}
+
+>>> f.func_closure
+(,)
+>>> f.func_closure[0].cell_contents
+<__main__.A object at 0x109446090>
+
+{% endhighlight %}
+
+As expected, we can't just modify it...
+
+{% highlight pycon %}
+
+>>> f.func_closure[0].cell_contents = b
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+AttributeError: attribute 'cell_contents' of 'cell' objects is not writable
+
+{% endhighlight %}
+
+...because that would be too easy. So, how can we replace it? Well, we could
+go back to `memmove`, but there's an easier way thanks to the `ctypes` module
+also exposing Python's C API. Specifically, the
+[`PyCell_Set`](https://docs.python.org/2/c-api/cell.html#c.PyCell_Set) function
+(which seems to lack a pure Python equivalent) does exactly what we want. Since
+the function expects `PyObject*`s as arguments, we'll need to use
+`ctypes.py_object` as a wrapper. Here it is:
+
+{% highlight pycon %}
+
+>>> from ctypes import py_object, pythonapi
+>>> pythonapi.PyCell_Set(py_object(f.func_closure[0]), py_object(b))
+0
+>>> f()
+<__main__.B object at 0x10ad94dd0>
+
+{% endhighlight %}
+
+Perfect – the replacement worked. To tie it together with `replace()`, we'll
+note that Guppy represents the cell contents relationship with
+`Based_R_INTERATTR`, for what I assume to be "internal attribute". We can use
+this to find the cell object within the inner function that references our
+target object, and then use the method above to make the change:
+
+{% highlight python %}
+
+elif isinstance(relation, Path.R_INTERATTR):
+    if isinstance(source, CellType):
+        pythonapi.PyCell_Set(py_object(source), py_object(new))
+        return
+
+{% endhighlight %}
 
 ### Other cases
 
-Certainly, not every case is handled above, but it seems to cover the vast
-majority of instances that I've found through testing. There are a number of
-reference relations in Guppy that I couldn't figure out how to replicate
-without doing something insane (`R_HASATTR`, `R_CELL`, and `R_STACK`), so some
-obscure replacements are likely unimplemented.
+There are many, many more types of possible replacements. I've written a more
+extensible version of `replace()` with some test cases, which can be viewed
+on Gist [here](https://gist.github.com/earwig/28a64ffb94d51a608e3d).
+
+Certainly, not every case is handled by it, but it seems to cover the majority
+that I've found through testing. There are a number of reference relations in
+Guppy that I couldn't figure out how to replicate without doing something
+insane (`R_HASATTR`, `R_CELL`, and `R_STACK`), so some obscure replacements are
+likely unimplemented.
 
 Some other kinds of replacements are known, but impossible. For example,
 replacing a class object that uses `__slots__` with another class will not work
@@ -870,10 +969,6 @@ work if instances of the class exist. Furthermore, references stored in data
 structures managed by C extensions cannot be changed, since there's no good way
 for us to track these.
 
-Remaining areas to explore include behavior when metaclasses and more complex
-descriptors are involved. Implementing a more complete version of `replace()`
-is left as an exercise for the reader.
-
 ## Footnotes
 
 1. ^ This post relies _heavily_ on implementation

From 95569acd58b1eeca52d0abaf9cbe2b171cb26b94 Mon Sep 17 00:00:00 2001
From: Ben Kurtovic
Date: Wed, 28 Jan 2015 14:21:46 -0600
Subject: [PATCH 22/22] Publish post on Python object replacement.

---
 .../2015-01-28-python-object-replacement.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename _drafts/python-object-replacement.md => _posts/2015-01-28-python-object-replacement.md (100%)

diff --git a/_drafts/python-object-replacement.md b/_posts/2015-01-28-python-object-replacement.md
similarity index 100%
rename from _drafts/python-object-replacement.md
rename to _posts/2015-01-28-python-object-replacement.md
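
As a rough illustration of the closure-cell replacement added in the patch above, here is a minimal, self-contained sketch. It assumes CPython 3 rather than the Python 2 the post targets, so `f.func_closure` is spelled `f.__closure__`. The helper name `replace_in_closure` is hypothetical: it is not part of the post's `replace()` or of Guppy, and it only walks one function's cells instead of asking Heapy for referrers.

```python
from ctypes import py_object, pythonapi


class A(object):
    pass


class B(object):
    pass


def wrapper(obj):
    def inner():
        return obj
    return inner


def replace_in_closure(func, old, new):
    """Hypothetical helper: rebind every closure cell of `func` that holds `old`.

    Relies on CPython's C API through ctypes, so it is an
    implementation-specific sketch, not portable Python.
    """
    replaced = 0
    for cell in func.__closure__ or ():
        try:
            contents = cell.cell_contents
        except ValueError:
            # An empty cell has no contents to compare against.
            continue
        if contents is old:
            # PyCell_Set expects PyObject* arguments, hence the py_object wrappers.
            pythonapi.PyCell_Set(py_object(cell), py_object(new))
            replaced += 1
    return replaced


a, b = A(), B()
f = wrapper(a)
count = replace_in_closure(f, a, b)
print(count, f() is b)
```

Going through `PyCell_Set` mirrors the approach in the patch. On newer CPython 3 releases, `cell_contents` can, as far as I know, be assigned directly, in which case the `ctypes` call is unnecessary.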