Personal website https://benkurtovic.com/

2015-01-28-python-object-replacement.md 44 KiB

---
layout: post
title: Finding and Replacing Objects in Python
tags: Python
description: More reflection than you cared to ask for
draft: true
---

Today, we're going to demonstrate a fairly evil thing in Python, which I call
_object replacement_.

Say you have some program that's been running for a while, and a particular
object has made its way throughout your code. It lives inside lists, class
attributes, maybe even inside some closures. You want to completely replace
this object with another one; that is to say, you want to find all references
to object `A` and replace them with object `B`, enabling `A` to be garbage
collected. This has some interesting implications for special object types. If
you have methods that are bound to `A`, you want to rebind them to `B`. If `A`
is a class, you want all instances of `A` to become instances of `B`. And so
on.

_But why on Earth would you want to do that?_ you ask. I'll focus on a concrete
use case in a future post, but for now, I imagine this could be useful in some
kind of advanced unit testing situation with mock objects. Still, it's fairly
insane, so let's leave it primarily as an intellectual exercise.

This article is written for [CPython](https://en.wikipedia.org/wiki/CPython)
2.7.<sup><a id="ref1" href="#fn1">[1]</a></sup>

## Review

First, a recap on terminology here. You can skip this section if you know
Python well.

In Python, _names_ are what most languages call "variables". They reference
_objects_. So when we do:

{% highlight python %}
a = [1, 2, 3, 4]
{% endhighlight %}

...we are creating a list object with four integers, and binding it to the name
`a`. In graph form:<sup><a id="ref2" href="#fn2">[2]</a></sup>

  35. <svg width="223pt" height="44pt" viewBox="0.00 0.00 223.01 44.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 40)"><polygon fill="white" stroke="none" points="-4,4 -4,-40 219.012,-40 219.012,4 -4,4"/><g id="node1" class="node"><title>L</title><polygon fill="none" stroke="black" stroke-width="0.5" points="215.018,-36 126.994,-36 126.994,-0 215.018,-0 215.018,-36"/><text text-anchor="middle" x="171.006" y="-15" font-family="Courier,monospace" font-size="10.00">[1, 2, 3, 4]</text></g><g id="node2" class="node"><title>a</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="27" cy="-18" rx="27" ry="18"/><text text-anchor="middle" x="27" y="-13.8" font-family="Courier,monospace" font-size="14.00">a</text></g><g id="edge1" class="edge"><title>a&#45;&gt;L</title><path fill="none" stroke="black" stroke-width="0.5" d="M54.0461,-18C72.2389,-18 97.1211,-18 119.173,-18"/><polygon fill="black" stroke="black" stroke-width="0.5" points="119.339,-20.6251 126.839,-18 119.339,-15.3751 119.339,-20.6251"/></g></g></svg>

In each of the following examples, we are creating new _references_ to the
list object, but we are never duplicating it. Each reference points to the same
memory address (which you can get using `id(a)`).

{% highlight python %}
b = a
{% endhighlight %}

{% highlight python %}
c = SomeContainerClass()
c.data = a
{% endhighlight %}

{% highlight python %}
def wrapper(L):
    def inner():
        return L.pop()
    return inner

d = wrapper(a)
{% endhighlight %}

  53. <svg width="254pt" height="234pt" viewBox="0.00 0.00 253.96 234.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 238)"><polygon fill="white" stroke="none" points="-4,4 -4,-238 249.96,-238 249.96,4 -4,4"/><g id="clust3" class="cluster"><title>cluster0</title><polygon fill="none" stroke="black" stroke-width="0.5" points="8,-8 8,-82 78,-82 78,-8 8,-8"/><text text-anchor="middle" x="43" y="-66.8" font-family="Courier,monospace" font-size="14.00">d</text></g><g id="node1" class="node"><title>obj</title><polygon fill="none" stroke="black" stroke-width="0.5" points="245.966,-153 157.943,-153 157.943,-117 245.966,-117 245.966,-153"/><text text-anchor="middle" x="201.954" y="-132" font-family="Courier,monospace" font-size="10.00">[1, 2, 3, 4]</text></g><g id="node2" class="node"><title>a</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-216" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-211.8" font-family="Courier,monospace" font-size="14.00">a</text></g><g id="edge1" class="edge"><title>a&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M64.8423,-205.244C88.7975,-192.881 128.721,-172.278 159.152,-156.573"/><polygon fill="black" stroke="black" stroke-width="0.5" points="160.422,-158.872 165.883,-153.1 158.014,-154.206 160.422,-158.872"/></g><g id="node3" class="node"><title>b</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-162" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-157.8" font-family="Courier,monospace" font-size="14.00">b</text></g><g id="edge2" class="edge"><title>b&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M69.2174,-157.662C90.9996,-153.915 123.147,-148.385 150.231,-143.726"/><polygon fill="black" stroke="black" stroke-width="0.5" points="150.777,-146.295 157.724,-142.437 149.887,-141.121 150.777,-146.295"/></g><g id="node4" 
class="node"><title>c</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-108" rx="41.897" ry="18"/><text text-anchor="middle" x="43" y="-103.8" font-family="Courier,monospace" font-size="14.00">c.data</text></g><g id="edge3" class="edge"><title>c&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M82.3954,-114.605C102.772,-118.11 128.077,-122.463 150.069,-126.247"/><polygon fill="black" stroke="black" stroke-width="0.5" points="149.86,-128.874 157.697,-127.559 150.75,-123.7 149.86,-128.874"/></g><g id="node5" class="node"><title>L</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-34" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-29.8" font-family="Courier,monospace" font-size="14.00">L</text></g><g id="edge4" class="edge"><title>L&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M62.9324,-46.183C88.5083,-62.6411 134.554,-92.2712 166.386,-112.755"/><polygon fill="black" stroke="black" stroke-width="0.5" points="165.223,-115.128 172.951,-116.98 168.064,-110.714 165.223,-115.128"/></g></g></svg>

Note that these references are all equal. `a` is no more valid a name for the
list than `b`, `c.data`, or `L` (or `d.func_closure[0].cell_contents` to the
outside world). As a result, if you delete one of these references (explicitly
with `del a`, or implicitly if a name goes out of scope), then the other
references are still around, and the object continues to exist. If all of an
object's references disappear, then Python's garbage collector should eliminate
it.
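
As a quick check of the points above, the following sketch confirms that every name refers to the same list. It is written to run on Python 3 as well as 2.7, so it uses `__closure__` rather than `func_closure`. `SomeContainerClass` is a placeholder defined here purely for illustration.

```python
class SomeContainerClass(object):
    """Hypothetical placeholder container, defined only for this example."""
    pass


def wrapper(L):
    def inner():
        return L.pop()
    return inner


a = [1, 2, 3, 4]
b = a
c = SomeContainerClass()
c.data = a
d = wrapper(a)

# The cell inside the closure holds yet another reference to the same list.
cell_contents = d.__closure__[0].cell_contents

# All of these references are the same object, so they share one id().
same_object = all(ref is a for ref in (b, c.data, cell_contents))
same_id = len({id(a), id(b), id(c.data), id(cell_contents)}) == 1

# Deleting one name removes only that reference; the list itself survives.
del a
still_alive = b == [1, 2, 3, 4]
```

Because `b`, `c.data`, and the closure cell all still point at the list after `del a`, calling `d()` continues to pop items from it.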

## Dead ends

My first thought when approaching this problem was to physically write over the
memory where our target object is stored. This can be done using
[`ctypes.memmove()`](https://docs.python.org/2/library/ctypes.html#ctypes.memmove)
from the Python standard library:

{% highlight pycon %}
>>> class A(object): pass
...
>>> class B(object): pass
...
>>> obj = A()
>>> print obj
<__main__.A object at 0x10e3e1190>
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), object.__sizeof__(A))
140576340136752
>>> print obj
<__main__.B object at 0x10e3e1190>
{% endhighlight %}

What we are doing here is overwriting the fields of the C struct behind `A` (a
[`PyHeapTypeObject`](https://github.com/python/cpython/blob/2.7/Include/object.h),
since `A` is a new-style class) with the fields of the struct behind `B`. As a
result, they now share various properties, such as their attribute dictionaries
([`__dict__`](https://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy)).
So, we can do things like this:

{% highlight pycon %}
>>> B.foo = 123
>>> obj.foo
123
{% endhighlight %}

However, there are clear issues. What we've done is create a
[_shallow copy_](https://en.wikipedia.org/wiki/Object_copy#Shallow_copy).
Therefore, `A` and `B` are still distinct objects, so certain changes made to
one will not be replicated to the other:

{% highlight pycon %}
>>> A is B
False
>>> B.__name__ = "C"
>>> A.__name__
'B'
{% endhighlight %}

Also, this won't work if `A` and `B` are different sizes, since we will be
either reading from or writing to memory that we don't necessarily own:

{% highlight pycon %}
>>> A = ()
>>> B = []
>>> print A.__sizeof__(), B.__sizeof__()
24 40
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), A.__sizeof__())
4321271888
Python(33575,0x7fff76925300) malloc: *** error for object 0x6f: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
{% endhighlight %}
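
One way to see the size mismatch without touching any memory is to compare the two objects first. The helper below is a hypothetical guard written for this post, not part of `ctypes`. It only reports whether the two objects have the same reported size, and the exact byte counts vary between Python versions and platforms.

```python
import sys


def same_footprint(dst, src):
    """Return True if dst and src report the same size in bytes.

    Hypothetical helper: sys.getsizeof() calls the object's __sizeof__()
    and adds the garbage collector's per-object overhead where it applies.
    """
    return sys.getsizeof(dst) == sys.getsizeof(src)


# An empty tuple and an empty list have different struct layouts and sizes.
tuple_vs_list = same_footprint((), [])

# Two empty lists report the same size.
list_vs_list = same_footprint([], [])
```

A matching size is necessary but nowhere near sufficient: the copied struct still contains pointers that both objects now believe they own.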

Oh, and there's a bit of a problem when we deallocate these objects, too...

{% highlight pycon %}
>>> A = []
>>> B = range(8)
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), A.__sizeof__())
4514685728
>>> print A
[0, 1, 2, 3, 4, 5, 6, 7]
>>> del A
>>> del B
Segmentation fault: 11
{% endhighlight %}

## Fishing for references with Guppy

A more appropriate solution is finding all of the _references_ to the old
object, and then updating them to point to the new object, rather than
replacing the old object directly.

But how do we track references? Fortunately, there's a library called
[Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. It is
often used for diagnosing memory leaks, but we can take advantage of its robust
object tracking features here. Install it with
[pip](https://pypi.python.org/pypi/pip) (`pip install guppy`).

I've always found Guppy hard to use (as many debuggers are, though justified by
the complexity of the task involved), so we'll begin with a feature demo before
delving into the actual problem.

### Feature demonstration

Guppy's interface is deceptively simple. We begin by calling
[`guppy.hpy()`](http://guppy-pe.sourceforge.net/guppy.html#kindnames.guppy.hpy)
to expose the Heapy interface, which is the component of Guppy with the
features we want:

{% highlight pycon %}
>>> import guppy
>>> hp = guppy.hpy()
>>> hp
Top level interface to Heapy.
Use eg: hp.doc for more info on hp.
{% endhighlight %}

Calling
[`hp.heap()`](http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.heap)
shows us a table of the objects known to Guppy, grouped together
(mathematically speaking,
[_partitioned_](https://en.wikipedia.org/wiki/Partition_of_a_set)) by
type<sup><a id="ref3" href="#fn3">[3]</a></sup> and sorted by how much space
they take up in memory:

{% highlight pycon %}
>>> heap = hp.heap()
>>> heap
Partition of a set of 45761 objects. Total size = 4699200 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0  15547  34  1494736  32    1494736  32 str
     1   8356  18   770272  16    2265008  48 tuple
     2    346   1   452080  10    2717088  58 dict (no owner)
     3  13685  30   328440   7    3045528  65 int
     4     71   0   221096   5    3266624  70 dict of module
     5   1652   4   211456   4    3478080  74 types.CodeType
     6    199   0   210856   4    3688936  79 dict of type
     7   1614   4   193680   4    3882616  83 function
     8    199   0   177008   4    4059624  86 type
     9    124   0   135328   3    4194952  89 dict of class
<91 more rows. Type e.g. '_.more' to view.>
{% endhighlight %}

This object (called an
[`IdentitySet`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet))
looks bizarre, but it can be treated roughly like a list. If we want to take a
look at strings, we can do `heap[0]`:

{% highlight pycon %}
>>> heap[0]
Partition of a set of 22606 objects. Total size = 2049896 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0  22606 100  2049896 100    2049896 100 str
{% endhighlight %}

This isn't very useful, though. What we really want to do is re-partition this
subset using another relationship. There are a number of options, such as:

{% highlight pycon %}
>>> heap[0].byid  # Group by object ID; each subset therefore has one element
Set of 22606 <str> objects. Total size = 2049896 bytes.
 Index   Size     % Cumulative     % Representation (limited)
     0   7480   0.4       7480   0.4 'The class Bi... copy of S.\n'
     1   4872   0.2      12352   0.6 "Support for ... 'error'.\n\n"
     2   4760   0.2      17112   0.8 'Heap queues\...at Art! :-)\n'
     3   4760   0.2      21872   1.1 'Heap queues\...at Art! :-)\n'
     4   3896   0.2      25768   1.3 'This module ...ng function\n'
     5   3824   0.2      29592   1.4 'The type of ...call order.\n'
     6   3088   0.2      32680   1.6 't\x00\x00|\x...x00|\x02\x00S'
     7   2992   0.1      35672   1.7 'HeapView(roo... size, etc.\n'
     8   2808   0.1      38480   1.9 'Directory tr...ories\n\n '
     9   2640   0.1      41120   2.0 'The class No... otherwise.\n'
<22596 more rows. Type e.g. '_.more' to view.>
{% endhighlight %}

{% highlight pycon %}
>>> heap[0].byrcs  # Group by what types of objects reference the strings
Partition of a set of 22606 objects. Total size = 2049896 bytes.
 Index  Count   %     Size   % Cumulative   % Referrers by Kind (class / dict of class)
     0   6146  27   610752  30     610752  30 types.CodeType
     1   5304  23   563984  28    1174736  57 tuple
     2   4104  18   237536  12    1412272  69 dict (no owner)
     3   1959   9   139880   7    1552152  76 list
     4    564   2   136080   7    1688232  82 function, tuple
     5    809   4    97896   5    1786128  87 dict of module
     6    346   2    71760   4    1857888  91 dict of type
     7    365   2    19408   1    1877296  92 dict of module, tuple
     8    192   1    16176   1    1893472  92 dict (no owner), list
     9    232   1    11784   1    1905256  93 dict of class, function, tuple, types.CodeType
<229 more rows. Type e.g. '_.more' to view.>
{% endhighlight %}

{% highlight pycon %}
>>> heap[0].byvia  # Group by how the strings are related to their referrers
Partition of a set of 22606 objects. Total size = 2049896 bytes.
 Index  Count   %     Size   % Cumulative   % Referred Via:
     0   2656  12   420456  21     420456  21 '[0]'
     1   2095   9   259008  13     679464  33 '.co_code'
     2   2095   9   249912  12     929376  45 '.co_filename'
     3    564   2   136080   7    1065456  52 '.func_doc', '[0]'
     4    243   1   103528   5    1168984  57 "['__doc__']"
     5   1930   9   100584   5    1269568  62 '.co_lnotab'
     6    502   2    31128   2    1300696  63 '[1]'
     7    306   1    16272   1    1316968  64 '[2]'
     8    242   1    12960   1    1329928  65 '[3]'
     9    184   1     9872   0    1339800  65 '[4]'
<7323 more rows. Type e.g. '_.more' to view.>
{% endhighlight %}

From this, we can see that the plurality of memory devoted to strings is taken
up by those referenced by code objects (`types.CodeType` represents Python
code, accessible from a non-C-defined function through `func.func_code`, and
contains things like the names of its local variables and the actual sequence
of opcodes that make it up).

For fun, let's pick a random string.

{% highlight pycon %}
>>> import random
>>> obj = heap[0].byid[random.randrange(0, heap[0].count)]
>>> obj
Set of 1 <str> object. Total size = 176 bytes.
 Index   Size     % Cumulative     % Representation (limited)
     0    176 100.0        176 100.0 'Define names...not listed.\n'
{% endhighlight %}

Interesting. Since this heap subset contains only one element, we can use
[`.theone`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySetSingleton.theone)
to get the actual object represented here:

{% highlight pycon %}
>>> obj.theone
'Define names for all type symbols known in the standard interpreter.\n\nTypes that are part of optional modules (e.g. array) are not listed.\n'
{% endhighlight %}

Looks like the docstring for the
[`types`](https://docs.python.org/2/library/types.html) module. We can confirm
by using
[`.referrers`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet.referrers)
to get the set of objects that refer to objects in the given set:

{% highlight pycon %}
>>> obj.referrers
Partition of a set of 1 object. Total size = 3352 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0      1 100     3352 100       3352 100 dict of module
{% endhighlight %}

This is `types.__dict__` (since the docstring we got is actually stored as
`types.__dict__["__doc__"]`), so if we use `.referrers` again:

{% highlight pycon %}
>>> obj.referrers.referrers
Partition of a set of 1 object. Total size = 56 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0      1 100       56 100         56 100 module
>>> obj.referrers.referrers.theone
<module 'types' from '/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/types.pyc'>
>>> import types
>>> types.__doc__ is obj.theone
True
{% endhighlight %}

_But why did we find an object in the `types` module if we never imported it?_
Well, let's see. We can use
[`hp.iso()`](http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.iso)
to get the Heapy set consisting of a single given object:

{% highlight pycon %}
>>> hp.iso(types)
Partition of a set of 1 object. Total size = 56 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0      1 100       56 100         56 100 module
{% endhighlight %}

Using the same procedure as before, we see that `types` is imported by the
[`traceback`](https://docs.python.org/2/library/traceback.html) module:

{% highlight pycon %}
>>> hp.iso(types).referrers
Partition of a set of 10 objects. Total size = 25632 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0      2  20    13616  53      13616  53 dict (no owner)
     1      5  50     9848  38      23464  92 dict of module
     2      1  10     1048   4      24512  96 dict of guppy.etc.Glue.Interface
     3      1  10     1048   4      25560 100 dict of guppy.etc.Glue.Share
     4      1  10       72   0      25632 100 tuple
>>> hp.iso(types).referrers[1].byid
Set of 5 <dict of module> objects. Total size = 9848 bytes.
 Index   Size     % Cumulative     % Owner Name
     0   3352  34.0       3352  34.0 traceback
     1   3352  34.0       6704  68.1 warnings
     2   1048  10.6       7752  78.7 __main__
     3   1048  10.6       8800  89.4 abc
     4   1048  10.6       9848 100.0 guppy.etc.Glue
{% endhighlight %}

...and that is imported by
[`site`](https://docs.python.org/2/library/site.html):

{% highlight pycon %}
>>> import traceback
>>> hp.iso(traceback).referrers
Partition of a set of 3 objects. Total size = 15992 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0      1  33    12568  79      12568  79 dict (no owner)
     1      1  33     3352  21      15920 100 dict of module
     2      1  33       72   0      15992 100 tuple
>>> hp.iso(traceback).referrers[1].byid
Set of 1 <dict of module> object. Total size = 3352 bytes.
 Index   Size     % Cumulative     % Owner Name
     0   3352 100.0       3352 100.0 site
{% endhighlight %}

Since `site` is imported by Python on startup, we've figured out why objects
from `types` exist, even though we've never used them.

We've learned something important, too. When objects are stored as ordinary
attributes of a parent object (like `types.__doc__`, `traceback.types`, and
`site.traceback` from above), they are not referenced directly by the parent
object, but by that object's `__dict__` attribute. Therefore, if we want to
replace `A` with `B` and `A` is an attribute of `C`, we (probably) don't need
to know anything special about `C`, just how to modify dictionaries.
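
The same observation can be reproduced without Guppy. This sketch uses the standard library's `gc.get_referrers()` as a stand-in for Heapy's `.referrers`. It makes no claim about how Heapy works internally, and the module name `parent_demo` and the `child` attribute are invented for the example.

```python
import gc
import types

# A throwaway module object plays the role of the parent object C.
parent = types.ModuleType("parent_demo")
child = ["some", "attribute", "value"]
parent.child = child

# gc.get_referrers() returns the containers that refer directly to child.
referrers = gc.get_referrers(child)

# The module's attribute dictionary is one of them...
held_by_dict = any(ref is vars(parent) for ref in referrers)

# ...but the module object itself is not: it only refers to its __dict__.
held_by_parent = any(ref is parent for ref in referrers)
```

So rewriting `vars(parent)["child"]` is all it takes to change what `parent.child` returns.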

A good Guppy/Heapy tutorial, while a bit old and incomplete, can be found on
[Andrey Smirnov's website](http://smira.ru/wp-content/uploads/2011/08/heapy.html).

## Examining paths

Let's set up an example replacement using class instances:

{% highlight python %}
class A(object):
    pass

class B(object):
    pass

a = A()
b = B()
{% endhighlight %}

Suppose we want to replace `a` with `b`. From the demo above, we know that we
can get the Heapy set of a single object using `hp.iso()`. We also know we can
use `.referrers` to get the set of objects that reference the given object:

{% highlight pycon %}
>>> import guppy
>>> hp = guppy.hpy()
>>> print hp.iso(a).referrers
Partition of a set of 1 object. Total size = 1048 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0      1 100     1048 100       1048 100 dict of module
{% endhighlight %}

`a` is only referenced by one object, which makes sense, since we've only used
it in one place (as a local variable), meaning `hp.iso(a).referrers.theone`
must be [`locals()`](https://docs.python.org/2/library/functions.html#locals):

{% highlight pycon %}
>>> hp.iso(a).referrers.theone is locals()
True
{% endhighlight %}

However, there is a more useful feature available to us:
[`.pathsin`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet.pathsin).
This also returns references to the given object, but instead of a Heapy set,
it is a list of `Path` objects. These are more useful since they tell us not
only _what_ objects are related to the given object, but _how_ they are
related.

{% highlight pycon %}
>>> print hp.iso(a).pathsin
 0: Src['a']
{% endhighlight %}

This looks very ambiguous. However, we find that we can extract the source of
the reference using `.src`:

{% highlight pycon %}
>>> path = hp.iso(a).pathsin[0]
>>> print path.src
Partition of a set of 1 object. Total size = 1048 bytes.
 Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
     0      1 100     1048 100       1048 100 dict of module
>>> path.src.theone is locals()
True
{% endhighlight %}

...and we can examine the type of relation by looking at `.path[1]` (the
actual reason for this isn't worth getting into, due to Guppy's lack of
documentation on the subject):

{% highlight pycon %}
>>> relation = path.path[1]
>>> relation
<guppy.heapy.Path.Based_R_INDEXVAL object at 0x100f38230>
{% endhighlight %}

We notice that `relation` is a `Based_R_INDEXVAL` object. Sounds bizarre, but
this tells us that `a` is a particular indexed value of `path.src`. What index?
We can get this using `relation.r`:

{% highlight pycon %}
>>> rel = relation.r
>>> print rel
a
{% endhighlight %}

Ah ha! So now we know that `a` is equal to the reference source (i.e.,
`path.src.theone`) indexed by `rel`:

{% highlight pycon %}
>>> path.src.theone[rel] is a
True
{% endhighlight %}

But `path.src.theone` is just a dictionary, meaning we know how to modify it
very easily:<sup><a id="ref4" href="#fn4">[4]</a></sup>

{% highlight pycon %}
>>> path.src.theone[rel] = b
>>> a
<__main__.B object at 0x100dae090>
>>> a is b
True
{% endhighlight %}

Bingo. We've successfully replaced `a` with `b`, using a general method that
should work for any case where `a` is in a dictionary-like object.
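
For comparison, here is a rough standard-library analogue of the steps above. It swaps Heapy's `.pathsin` for `gc.get_referrers()`, which reports only the referring containers and not the relation, so the matching keys have to be found by scanning each dictionary. The name `replace_in_dicts` is made up for this sketch, and it deliberately ignores every referrer that is not a dictionary.

```python
import gc


def replace_in_dicts(old, new):
    """Point every dictionary value that refers to old at new instead.

    Returns the number of entries that were rewritten.
    """
    count = 0
    for referrer in gc.get_referrers(old):
        if not isinstance(referrer, dict):
            continue  # frames, lists, tuples, etc. are out of scope here
        # Copy the items first so the dictionary is not mutated mid-iteration.
        for key, value in list(referrer.items()):
            if value is old:
                referrer[key] = new
                count += 1
    return count


class A(object):
    pass


class B(object):
    pass


namespace = {"a": A(), "b": B()}
replaced = replace_in_dicts(namespace["a"], namespace["b"])
```

Scanning every value is slower than asking Heapy for the exact key, but it needs nothing outside the standard library.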

## Handling different reference types

We'll continue by wrapping this code up in a nice function, which we will
expand as we go:

{% highlight python %}
import guppy
from guppy.heapy import Path

hp = guppy.hpy()

def replace(old, new):
    for path in hp.iso(old).pathsin:
        relation = path.path[1]
        if isinstance(relation, Path.R_INDEXVAL):
            path.src.theone[relation.r] = new
{% endhighlight %}

### Dictionaries, lists, and tuples

As noted above, this is versatile enough to handle many dictionary-like
situations, including `__dict__`, which means we already know how to replace
object attributes:

{% highlight pycon %}
>>> a, b = A(), B()
>>>
>>> class X(object):
...     pass
...
>>> X.cattr = a
>>> x = X()
>>> x.iattr = a
>>> d1 = {1: a}
>>> d2 = [{1: {0: ("foo", "bar", {"a": a, "b": b})}}]
>>>
>>> replace(a, b)
>>>
>>> print a
<__main__.B object at 0x1042b9910>
>>> print X.cattr
<__main__.B object at 0x1042b9910>
>>> print x.iattr
<__main__.B object at 0x1042b9910>
>>> print d1[1]
<__main__.B object at 0x1042b9910>
>>> print d2[0][1][0][2]["a"]
<__main__.B object at 0x1042b9910>
{% endhighlight %}

Lists can be handled exactly the same as dictionaries, although the keys in
this case (i.e., `relation.r`) will always be integers.

{% highlight pycon %}
>>> a, b = A(), B()
>>> L = [0, 1, 2, a, b]
>>> print L
[0, 1, 2, <__main__.A object at 0x104598950>, <__main__.B object at 0x104598910>]
>>> replace(a, b)
>>> print L
[0, 1, 2, <__main__.B object at 0x104598910>, <__main__.B object at 0x104598910>]
{% endhighlight %}

Tuples are interesting. We can't modify them directly because they're
immutable, but we _can_ create a new tuple with the new value, and then replace
that tuple just like we replaced our original object:

{% highlight python %}
# Meanwhile, in replace()...
if isinstance(relation, Path.R_INDEXVAL):
    source = path.src.theone
    if isinstance(source, tuple):
        temp = list(source)
        temp[relation.r] = new
        replace(source, tuple(temp))
    else:
        source[relation.r] = new
{% endhighlight %}

As a result:

{% highlight pycon %}
>>> a, b = A(), B()
>>> t1 = (0, 1, 2, a)
>>> t2 = (0, (1, (2, (3, (4, (5, (a,)))))))
>>> replace(a, b)
>>> print t1
(0, 1, 2, <__main__.B object at 0x104598e50>)
>>> print t2
(0, (1, (2, (3, (4, (5, (<__main__.B object at 0x104598e50>,)))))))
{% endhighlight %}
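
The dictionary, list, and tuple cases can also be sketched with nothing but the standard library. This is an illustration of the same idea rather than the Guppy-based `replace()` developed in this post: `gc_replace` is a hypothetical name, `gc.get_referrers()` stands in for `.pathsin`, and anything that is not a plain `dict`, `list`, or `tuple` is skipped.

```python
import gc


def gc_replace(old, new, _skip=()):
    """Rewrite references to old held by dicts, lists, and tuples."""
    referrers = gc.get_referrers(old)
    # Our own bookkeeping lists show up as referrers in recursive calls,
    # so remember them and leave them alone.
    skip = _skip + (referrers,)
    for source in referrers:
        if any(source is internal for internal in skip):
            continue
        if isinstance(source, dict):
            for key, value in list(source.items()):
                if value is old:
                    source[key] = new
        elif isinstance(source, list):
            for index, value in enumerate(source):
                if value is old:
                    source[index] = new
        elif type(source) is tuple:
            # Tuples are immutable: build a fixed copy, then replace the
            # old tuple wherever it is referenced.
            fixed = tuple(new if value is old else value for value in source)
            gc_replace(source, fixed, skip)


class A(object):
    pass


class B(object):
    pass


a, b = A(), B()
holder = {"d": {1: a}, "L": [0, a, b], "t": (0, (1, (a,)))}
gc_replace(a, b)
```

After the call, every container reachable from `holder` refers to `b`, including the innermost tuple, which has been rebuilt level by level.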
  497. ### Bound methods
  498. Here's a fun one. Let's upgrade our definitions of `A` and `B`:
  499. {% highlight python %}
  500. class A(object):
  501. def func(self):
  502. return self
  503. class B(object):
  504. pass
  505. {% endhighlight %}
  506. After replacing `a` with `b`, `a.func` no longer exists, as we'd expect:
  507. {% highlight pycon %}
  508. >>> a, b = A(), B()
  509. >>> a.func()
  510. <__main__.A object at 0x10c4a5b10>
  511. >>> replace(a, b)
  512. >>> a.func()
  513. Traceback (most recent call last):
  514. File "<stdin>", line 1, in <module>
  515. AttributeError: 'B' object has no attribute 'func'
  516. {% endhighlight %}
  517. But what if we save a reference to `a.func` before the replacement?
  518. {% highlight pycon %}
  519. >>> a, b = A(), B()
  520. >>> f = a.func
  521. >>> replace(a, b)
  522. >>> f()
  523. <__main__.A object at 0x10c4b6090>
  524. {% endhighlight %}
  525. Hmm. So `f` has kept a reference to `a` somehow, but not in a dictionary-like
  526. object. So where is it?
  527. Well, we can reveal it with the attribute `f.__self__`:
  528. {% highlight pycon %}
  529. >>> f.__self__
  530. <__main__.A object at 0x10c4b6090>
  531. {% endhighlight %}
  532. Unfortunately, this attribute is magical and we can't write to it directly:
{% highlight pycon %}
>>> f.__self__ = b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: readonly attribute
{% endhighlight %}
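For reference, a bound method is a plain function paired with the instance it was looked up on, and a fresh one can be built with `types.MethodType`. The sketch below (the variable `g` is introduced only for illustration) shows why that doesn't solve our problem: the new method is a different object, and anything already holding `f` keeps the old binding.

```python
import types


class A(object):
    def func(self):
        return self


class B(object):
    pass


a, b = A(), B()
f = a.func

# A bound method pairs the underlying function with an instance.
assert f.__self__ is a
assert f.__func__ is A.__dict__["func"]

# Building a *new* bound method around b is easy...
g = types.MethodType(f.__func__, b)
assert g() is b

# ...but the existing object f still points at a.
assert f() is a
```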
  539. Python clearly doesn't want us to re-bind bound methods, and a reasonable
  540. person would give up here, but we still have a few tricks up our sleeve. Let's
  541. examine the internal C structure of bound methods,
  542. [`PyMethodObject`](https://github.com/python/cpython/blob/2.7/Include/classobject.h#L31):
  543. <svg width="559pt" height="200pt" viewBox="0.00 18.00 559.03 200.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 226)"><polygon fill="white" stroke="none" points="-4,4 -4,-226 555.032,-226 555.032,4 -4,4"/><g id="clust2" class="cluster"><title>cluster</title><polygon fill="none" stroke="black" stroke-width="0" points="8,-8 8,-214 272,-214 272,-8 8,-8"/><text text-anchor="middle" x="140" y="-14.8" font-family="Courier,monospace" font-size="14.00">PyMethodObject</text></g><g id="node1" class="node"><title>obj</title><polygon fill="none" stroke="black" stroke-width="0.5" points="551.048,-110 336.984,-110 336.984,-74 551.048,-74 551.048,-110"/><text text-anchor="middle" x="444.016" y="-89" font-family="Courier,monospace" font-size="10.00">&lt;__main__.A object at 0xdeadbeef&gt;</text></g><g id="node2" class="node"><title>struct</title><polygon fill="#eeeeee" stroke="none" points="24,-182 24,-202 256,-202 256,-182 24,-182"/><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-182 24,-202 256,-202 256,-182 24,-182"/><text text-anchor="start" x="27" y="-188.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">struct _object* </text><text text-anchor="start" x="161.422" y="-188.8" font-family="Courier,monospace" font-style="oblique" font-size="14.00" fill="#666666">_ob_next</text><polygon fill="#eeeeee" stroke="none" points="24,-162 24,-182 256,-182 256,-162 24,-162"/><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-162 24,-182 256,-182 256,-162 24,-162"/><text text-anchor="start" x="27" y="-168.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">struct _object* </text><text text-anchor="start" x="161.422" y="-168.8" font-family="Courier,monospace" font-style="oblique" font-size="14.00" fill="#666666">_ob_prev</text><polygon fill="#eeeeee" stroke="none" points="24,-142 24,-162 256,-162 256,-142 
24,-142"/><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-142 24,-162 256,-162 256,-142 24,-142"/><text text-anchor="start" x="27" y="-148.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">Py_ssize_t </text><text text-anchor="start" x="119.415" y="-148.8" font-family="Courier,monospace" font-size="14.00">ob_refcnt</text><polygon fill="#eeeeee" stroke="none" points="24,-122 24,-142 256,-142 256,-122 24,-122"/><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-122 24,-142 256,-142 256,-122 24,-122"/><text text-anchor="start" x="26.5815" y="-128.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">struct _typeobject* </text><text text-anchor="start" x="194.609" y="-128.8" font-family="Courier,monospace" font-size="14.00">ob_type</text><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-102 24,-122 256,-122 256,-102 24,-102"/><text text-anchor="start" x="27" y="-108.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">PyObject* </text><text text-anchor="start" x="111.014" y="-108.8" font-family="Courier,monospace" font-size="14.00">im_func</text><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-82 24,-102 256,-102 256,-82 24,-82"/><text text-anchor="start" x="27" y="-88.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">PyObject* </text><text text-anchor="start" x="111.014" y="-88.8" font-family="Courier,monospace" font-size="14.00">im_self</text><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-62 24,-82 256,-82 256,-62 24,-62"/><text text-anchor="start" x="27" y="-68.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">PyObject* </text><text text-anchor="start" x="111.014" y="-68.8" font-family="Courier,monospace" font-size="14.00">im_class</text><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-42 24,-62 256,-62 256,-42 24,-42"/><text text-anchor="start" x="27" y="-48.8" 
font-family="Courier,monospace" font-size="14.00" fill="#888888">PyObject* </text><text text-anchor="start" x="111.014" y="-48.8" font-family="Courier,monospace" font-size="14.00">im_weakreflist</text></g><g id="edge1" class="edge"><title>struct:f&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M257,-92C280.313,-92 305.269,-92 329.087,-92"/><polygon fill="black" stroke="black" stroke-width="0.5" points="329.277,-94.6251 336.777,-92 329.277,-89.3751 329.277,-94.6251"/></g></g></svg>
The four gray fields of the struct come from
[`PyObject_HEAD`](https://github.com/python/cpython/blob/2.7/Include/object.h#L78),
which exist in all Python objects. The first two fields are from
[`_PyObject_HEAD_EXTRA`](https://github.com/python/cpython/blob/2.7/Include/object.h#L66),
and only exist when the debugging macro `Py_TRACE_REFS` is defined, in order to
keep track of every live object for reference debugging. We can see that the
`im_self` field, which maintains the reference to our target object, is either
fourth or sixth in the struct depending on `Py_TRACE_REFS`. If we can figure
out the size of the field and its offset from the start of the struct, then we
can set its value directly using `ctypes.memmove()`:
  554. {% highlight python %}
  555. ctypes.memmove(id(f) + offset, ctypes.byref(ctypes.py_object(b)), field_size)
  556. {% endhighlight %}
  557. Here, `id(f)` is the memory location of our method, which refers to the start
  558. of the C struct from above. `offset` is the number of bytes between this memory
  559. location and the start of the `im_self` field. We use
  560. [`ctypes.byref()`](https://docs.python.org/2/library/ctypes.html#ctypes.byref)
  561. to create a reference to the replacement object, `b`, which will be copied over
  562. the existing reference to `a`. Finally, `field_size` is the number of bytes
  563. we're copying, equal to the size of the `im_self` field.
Well, all but one of these fields are pointers to structure types, meaning they
have the same size,<sup><a id="ref5" href="#fn5">[5]</a></sup> equal to
[`ctypes.sizeof(ctypes.py_object)`](https://docs.python.org/2/library/ctypes.html#ctypes.sizeof).
This is (probably) 4 or 8 bytes, depending on whether you're on a 32-bit or a
64-bit system. The other field, `ob_refcnt`, is a `Py_ssize_t`. It is possibly
the same size as the pointers, but we can't be sure, so we measure it
separately with `ctypes.sizeof(ctypes.c_ssize_t)`.
We know that `field_size` must be `ctypes.sizeof(ctypes.py_object)`, since we
are copying a structure pointer. `offset` is this value multiplied by the
number of structure pointers before `im_self` (4 if `Py_TRACE_REFS` is defined
and 2 otherwise), plus `ctypes.sizeof(ctypes.c_ssize_t)` for `ob_refcnt`. But how
  575. do we determine if `Py_TRACE_REFS` is defined? We can't check the value of a
  576. macro at runtime, but we can check for the existence of
  577. [`sys.getobjects()`](https://github.com/python/cpython/blob/2.7/Misc/SpecialBuilds.txt#L54),
  578. which is
  579. [only defined when that macro is](https://github.com/python/cpython/blob/2.7/Python/sysmodule.c#L951).
  580. Therefore, we can make our replacement like so:
  581. {% highlight pycon %}
  582. >>> import ctypes
  583. >>> import sys
  584. >>> field_size = ctypes.sizeof(ctypes.py_object)
  585. >>> ptrs_in_struct = 4 if hasattr(sys, "getobjects") else 2
  586. >>> offset = ptrs_in_struct * field_size + ctypes.sizeof(ctypes.c_ssize_t)
  587. >>> ctypes.memmove(id(f) + offset, ctypes.byref(ctypes.py_object(b)), field_size)
  588. 4470258440
  589. >>> f.__self__ is b
  590. True
  591. >>> f()
  592. <__main__.B object at 0x10a8af290>
  593. {% endhighlight %}
  594. Excellent—it worked!
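The offset arithmetic can also be handed to `ctypes` by mirroring the front of the struct as a `ctypes.Structure`, which accounts for field sizes and padding. This is a sketch under assumptions, not the code used in this post: `MethodHeader`, `method_header_fields` and `read_im_self` are names made up here, it assumes the struct layout shown in the diagram, and it only *reads* the field to check that the computed offset is right.

```python
import ctypes
import sys


def method_header_fields():
    """Describe the start of PyMethodObject, up to and including im_self."""
    fields = []
    if hasattr(sys, "getobjects"):
        # Assumption: a Py_TRACE_REFS build that stores two extra pointers
        # at the front of every object, as CPython 2.7 does.
        fields += [("_ob_next", ctypes.c_void_p), ("_ob_prev", ctypes.c_void_p)]
    fields += [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("im_func", ctypes.c_void_p),
        ("im_self", ctypes.c_void_p),
    ]
    return fields


class MethodHeader(ctypes.Structure):
    _fields_ = method_header_fields()


# ctypes computes each field's offset and size for us.
offset = MethodHeader.im_self.offset
field_size = MethodHeader.im_self.size


def read_im_self(method):
    """Return the address stored in the method's im_self field (read-only)."""
    return ctypes.c_void_p.from_address(id(method) + offset).value


class A(object):
    def func(self):
        return self


a = A()
f = a.func
# On the CPython builds assumed here, the stored pointer is a's address.
assert read_im_self(f) == id(a)
```

Reading the field back before overwriting it is a cheap sanity check: if the value isn't `id(a)`, the assumed layout is wrong and a `memmove()` to that address would corrupt the object.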
  595. There's another kind of bound method, which is the built-in variety as opposed
  596. to the user-defined variety we saw above. An example is `a.__sizeof__()`:
  597. {% highlight pycon %}
  598. >>> a, b = A(), B()
  599. >>> f = a.__sizeof__
  600. >>> f
  601. <built-in method __sizeof__ of A object at 0x10ab44b50>
  602. >>> replace(a, b)
  603. >>> f.__self__
  604. <__main__.A object at 0x10ab44b50>
  605. {% endhighlight %}
  606. This is stored internally as a
  607. [`PyCFunctionObject`](https://github.com/python/cpython/blob/2.7/Include/methodobject.h#L81).
  608. Let's take a look at its layout:
  609. <svg width="559pt" height="180pt" viewBox="0.00 18.00 559.03 180.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 206)"><polygon fill="white" stroke="none" points="-4,4 -4,-206 555.032,-206 555.032,4 -4,4"/><g id="clust2" class="cluster"><title>cluster</title><polygon fill="none" stroke="black" stroke-width="0" points="8,-8 8,-194 272,-194 272,-8 8,-8"/><text text-anchor="middle" x="140" y="-14.8" font-family="Courier,monospace" font-size="14.00">PyCFunctionObject</text></g><g id="node1" class="node"><title>obj</title><polygon fill="none" stroke="black" stroke-width="0.5" points="551.048,-90 336.984,-90 336.984,-54 551.048,-54 551.048,-90"/><text text-anchor="middle" x="444.016" y="-69" font-family="Courier,monospace" font-size="10.00">&lt;__main__.A object at 0xdeadbeef&gt;</text></g><g id="node2" class="node"><title>struct</title><polygon fill="#eeeeee" stroke="none" points="24,-162 24,-182 256,-182 256,-162 24,-162"/><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-162 24,-182 256,-182 256,-162 24,-162"/><text text-anchor="start" x="27" y="-168.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">struct _object* </text><text text-anchor="start" x="161.422" y="-168.8" font-family="Courier,monospace" font-style="oblique" font-size="14.00" fill="#666666">_ob_next</text><polygon fill="#eeeeee" stroke="none" points="24,-142 24,-162 256,-162 256,-142 24,-142"/><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-142 24,-162 256,-162 256,-142 24,-142"/><text text-anchor="start" x="27" y="-148.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">struct _object* </text><text text-anchor="start" x="161.422" y="-148.8" font-family="Courier,monospace" font-style="oblique" font-size="14.00" fill="#666666">_ob_prev</text><polygon fill="#eeeeee" stroke="none" points="24,-122 24,-142 256,-142 256,-122 
24,-122"/><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-122 24,-142 256,-142 256,-122 24,-122"/><text text-anchor="start" x="27" y="-128.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">Py_ssize_t </text><text text-anchor="start" x="119.415" y="-128.8" font-family="Courier,monospace" font-size="14.00">ob_refcnt</text><polygon fill="#eeeeee" stroke="none" points="24,-102 24,-122 256,-122 256,-102 24,-102"/><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-102 24,-122 256,-122 256,-102 24,-102"/><text text-anchor="start" x="26.5815" y="-108.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">struct _typeobject* </text><text text-anchor="start" x="194.609" y="-108.8" font-family="Courier,monospace" font-size="14.00">ob_type</text><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-82 24,-102 256,-102 256,-82 24,-82"/><text text-anchor="start" x="27" y="-88.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">PyMethodDef* </text><text text-anchor="start" x="136.218" y="-88.8" font-family="Courier,monospace" font-size="14.00">m_ml</text><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-62 24,-82 256,-82 256,-62 24,-62"/><text text-anchor="start" x="27" y="-68.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">PyObject* </text><text text-anchor="start" x="111.014" y="-68.8" font-family="Courier,monospace" font-size="14.00">m_self</text><polygon fill="none" stroke="black" stroke-width="0.5" points="24,-42 24,-62 256,-62 256,-42 24,-42"/><text text-anchor="start" x="27" y="-48.8" font-family="Courier,monospace" font-size="14.00" fill="#888888">PyObject* </text><text text-anchor="start" x="111.014" y="-48.8" font-family="Courier,monospace" font-size="14.00">m_module</text></g><g id="edge1" class="edge"><title>struct:f&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M257,-72C280.313,-72 305.269,-72 
329.087,-72"/><polygon fill="black" stroke="black" stroke-width="0.5" points="329.277,-74.6251 336.777,-72 329.277,-69.3751 329.277,-74.6251"/></g></g></svg>
  610. Fortunately, `m_self` here has the same offset as `im_self` from before, so we
  611. can just use the same code:
  612. {% highlight pycon %}
  613. >>> ctypes.memmove(id(f) + offset, ctypes.byref(ctypes.py_object(b)), field_size)
  614. 4474703768
  615. >>> f.__self__ is b
  616. True
  617. >>> f
  618. <built-in method __sizeof__ of B object at 0x10ab4f150>
  619. {% endhighlight %}
  620. ### Dictionary keys
  621. Dictionary keys have a different reference relation type than values, but the
  622. replacement works mostly the same way. We pop the value of the old key from the
  623. dictionary, and then insert it in again under the new key. Here's the code,
  624. which we'll stick into the main block in `replace()`:
{% highlight python %}
elif isinstance(relation, Path.R_INDEXKEY):
    source = path.src.theone
    source[new] = source.pop(source.keys()[relation.r])
{% endhighlight %}
  630. And, a demonstration:
  631. {% highlight pycon %}
  632. >>> a, b = A(), B()
  633. >>> d = {a: 1}
  634. >>> replace(a, b)
  635. >>> d
  636. {<__main__.B object at 0x10fb47950>: 1}
  637. {% endhighlight %}
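The pop-and-reinsert step can be sketched without Guppy. The helper name `rekey` is made up for this example, and it finds the key by identity instead of by index. Two caveats apply to this kind of swap: the replacement must be hashable, and if it is already a key in the dictionary, its old value gets overwritten.

```python
def rekey(mapping, old, new):
    """Move the value stored under key `old` (matched by identity) to `new`.

    Hypothetical helper for illustration; returns True if a key was moved.
    """
    for key in list(mapping):
        if key is old:
            mapping[new] = mapping.pop(key)
            return True
    return False


class A(object):
    pass


class B(object):
    pass


a, b = A(), B()
d = {a: 1, "other": 2}
assert rekey(d, a, b)
assert d[b] == 1 and a not in d and d["other"] == 2
```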
  638. ### Closure cells
  639. We'll cover just one more case, this time involving a
  640. [closure](https://en.wikipedia.org/wiki/Closure_(computer_programming)). Here's
  641. our test function:
{% highlight python %}
def wrapper(obj):
    def inner():
        return obj
    return inner
{% endhighlight %}
  648. As we can see, an instance of the inner function keeps references to the locals
  649. of the wrapper function, even after using our current
  650. version of `replace()`:
  651. {% highlight pycon %}
  652. >>> a, b = A(), B()
  653. >>> f = wrapper(a)
  654. >>> f()
  655. <__main__.A object at 0x109446090>
  656. >>> replace(a, b)
  657. >>> f()
  658. <__main__.A object at 0x109446090>
  659. {% endhighlight %}
  660. Internally, CPython implements this using things called
  661. [_cells_](https://docs.python.org/2/c-api/cell.html). We notice that
  662. `f.func_closure` gives us a tuple of `cell` objects, and we can examine an
  663. individual cell's contents with `.cell_contents`:
{% highlight pycon %}
>>> f.func_closure
(<cell at 0x10ad9f478: A object at 0x109446090>,)
>>> f.func_closure[0].cell_contents
<__main__.A object at 0x109446090>
{% endhighlight %}
  670. As expected, we can't just modify it...
{% highlight pycon %}
>>> f.func_closure[0].cell_contents = b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute 'cell_contents' of 'cell' objects is not writable
{% endhighlight %}
  677. ...because that would be too easy. So, how can we replace it? Well, we could
  678. go back to `memmove`, but there's an easier way thanks to the `ctypes` module
  679. also exposing Python's C API. Specifically, the
  680. [`PyCell_Set`](https://docs.python.org/2/c-api/cell.html#c.PyCell_Set) function
  681. (which seems to lack a pure Python equivalent) does exactly what we want. Since
  682. the function expects `PyObject*`s as arguments, we'll need to use
  683. `ctypes.py_object` as a wrapper. Here it is:
  684. {% highlight pycon %}
  685. >>> from ctypes import py_object, pythonapi
  686. >>> pythonapi.PyCell_Set(py_object(f.func_closure[0]), py_object(b))
  687. 0
  688. >>> f()
  689. <__main__.B object at 0x10ad94dd0>
  690. {% endhighlight %}
  691. Perfect – the replacement worked. To tie it together with `replace()`, we'll
  692. note that Guppy represents the cell contents relationship with
  693. `Based_R_INTERATTR`, for what I assume to be "internal attribute". We can use
  694. this to find the cell object within the inner function that references our
  695. target object, and then use the method above to make the change:
{% highlight python %}
elif isinstance(relation, Path.R_INTERATTR):
    source = path.src.theone
    # CellType is assumed to be defined elsewhere, e.g. as the type of
    # any function's func_closure[0].
    if isinstance(source, CellType):
        pythonapi.PyCell_Set(py_object(source), py_object(new))
        return
{% endhighlight %}
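Outside of Guppy, the same swap can be sketched by scanning a function's closure directly. This is an illustration under assumptions, not the code from `replace()`: `replace_in_closure` and `_make_cell` are names made up here, the cell type is derived from a throwaway closure, and the function is reached through `__closure__`, which is the same tuple as `func_closure` on Python 2.7.

```python
import ctypes

# Declare the C signature so ctypes passes both arguments as PyObject*.
PyCell_Set = ctypes.pythonapi.PyCell_Set
PyCell_Set.argtypes = [ctypes.py_object, ctypes.py_object]
PyCell_Set.restype = ctypes.c_int


def _make_cell(value):
    """Return a cell object by building a throwaway closure."""
    return (lambda: value).__closure__[0]


# Derive the cell type instead of relying on a named constant.
CellType = type(_make_cell(None))


def replace_in_closure(func, old, new):
    """Point every cell of `func` that holds `old` at `new`; return the count."""
    swapped = 0
    for cell in func.__closure__ or ():
        try:
            contents = cell.cell_contents
        except ValueError:  # the cell is empty
            continue
        if contents is old:
            PyCell_Set(cell, new)
            swapped += 1
    return swapped


def wrapper(obj):
    def inner():
        return obj
    return inner


class A(object):
    pass


class B(object):
    pass


a, b = A(), B()
f = wrapper(a)
assert replace_in_closure(f, a, b) == 1
assert f() is b
```

Setting `argtypes` up front means the objects can be passed directly, without wrapping each one in `py_object` at the call site.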
  702. ### Other cases
  703. There are many, many more types of possible replacements. I've written a more
  704. extensible version of `replace()` with some test cases, which can be viewed
  705. on Gist [here](https://gist.github.com/earwig/28a64ffb94d51a608e3d).
  706. Certainly, not every case is handled by it, but it seems to cover the majority
  707. that I've found through testing. There are a number of reference relations in
  708. Guppy that I couldn't figure out how to replicate without doing something
  709. insane (`R_HASATTR`, `R_CELL`, and `R_STACK`), so some obscure replacements are
  710. likely unimplemented.
  711. Some other kinds of replacements are known, but impossible. For example,
  712. replacing a class object that uses `__slots__` with another class will not work
  713. if the replacement class has a different slot layout and instances of the old
  714. class exist. More generally, replacing a class with a non-class object won't
  715. work if instances of the class exist. Furthermore, references stored in data
  716. structures managed by C extensions cannot be changed, since there's no good way
  717. for us to track these.
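One way to see the slot-layout restriction in action, assuming existing instances would have to be moved to the replacement class by assigning `__class__`: CPython refuses that assignment when the two classes lay out their instances differently. The class names and the `can_retype` helper below are made up for this sketch, and the behaviour shown is that of recent CPython versions.

```python
class Old(object):
    __slots__ = ("x",)


class SameLayout(object):
    __slots__ = ("x",)


class OtherLayout(object):
    __slots__ = ("x", "y")


def can_retype(instance, new_class):
    """Try to move `instance` to `new_class`; report whether it was allowed."""
    try:
        instance.__class__ = new_class
    except TypeError:
        return False
    return True


obj = Old()
obj.x = 1
assert not can_retype(obj, OtherLayout)  # layouts differ: refused
assert can_retype(obj, SameLayout)       # identical slots: allowed
assert obj.x == 1
```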
  718. ## Footnotes
1. <a id="fn1" href="#ref1">^</a> This post relies _heavily_ on implementation
   details of CPython 2.7. While it could be adapted for Python 3 by examining
   changes to the internal structures of objects that we used above,
   replicating it on [Jython](http://www.jython.org/) or some other
   implementation would be a lost cause. We are so dependent on concepts
   specific to CPython that you would need to start from scratch, beginning
   with an implementation-specific replacement for Guppy.
  726. 2. <a id="fn2" href="#ref2">^</a> The
  727. [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language))
  728. used to generate graphs in this post are
  729. [available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6).
  730. 3. <a id="fn3" href="#ref3">^</a> They're actually grouped together by _clodo_
  731. ("class or dict object"), which is similar to type, but groups `__dict__`s
  732. separately by their owner's type.
  733. 4. <a id="fn4" href="#ref4">^</a> Python's documentation tells us not to modify
  734. the locals dictionary, but screw that; we're gonna do it anyway.
  735. 5. <a id="fn5" href="#ref5">^</a> According to the
  736. [C99](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) and
  737. [C11 standards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf);
  738. section 6.2.5.27 in the former and 6.2.5.28 in the latter: "All pointers to
  739. structure types shall have the same representation and alignment
  740. requirements as each other."