---
layout: post
title: Replacing Objects in Python
tags: Python
description: More reflection than you cared to ask for
draft: true
---

Today, we're going to demonstrate a fairly evil thing in Python, which I call
_object replacement_.

Say you have some program that's been running for a while, and a particular
object has made its way throughout your code. It lives inside lists, class
attributes, maybe even inside some closures. You want to completely replace
this object with another one; that is to say, you want to find all references
to object `A` and replace them with object `B`, enabling `A` to be garbage
collected. This has some interesting implications for special object types. If
you have methods that are bound to `A`, you want to rebind them to `B`. If `A`
is a class, you want all instances of `A` to become instances of `B`. And so
on.

_But why on Earth would you want to do that?_ you ask. I'll focus on a concrete
use case in a future post, but for now, I imagine this could be useful in some
kind of advanced unit testing situation with mock objects. Still, it's fairly
insane, so let's leave it as primarily an intellectual exercise.

This article is written for [CPython](https://en.wikipedia.org/wiki/CPython)
2.7.<sup><a id="ref1" href="#fn1">[1]</a></sup>

## Review

First, a recap on terminology here. You can skip this section if you know
Python well.

In Python, _names_ are what most languages call "variables". They reference
_objects_. So when we do:

{% highlight python %}
a = [1, 2, 3, 4]
{% endhighlight %}

...we are creating a list object with four integers, and binding it to the name
`a`. In graph form:<sup><a id="ref2" href="#fn2">[2]</a></sup>

<svg width="223pt" height="44pt" viewBox="0.00 0.00 223.01 44.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 40)"><title>%3</title><polygon fill="white" stroke="none" points="-4,4 -4,-40 219.012,-40 219.012,4 -4,4"/><g id="node1" class="node"><title>L</title><polygon fill="none" stroke="black" stroke-width="0.5" points="215.018,-36 126.994,-36 126.994,-0 215.018,-0 215.018,-36"/><text text-anchor="middle" x="171.006" y="-15" font-family="Courier,monospace" font-size="10.00">[1, 2, 3, 4]</text></g><g id="node2" class="node"><title>a</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="27" cy="-18" rx="27" ry="18"/><text text-anchor="middle" x="27" y="-13.8" font-family="Courier,monospace" font-size="14.00">a</text></g><g id="edge1" class="edge"><title>a&#45;&gt;L</title><path fill="none" stroke="black" stroke-width="0.5" d="M54.0461,-18C72.2389,-18 97.1211,-18 119.173,-18"/><polygon fill="black" stroke="black" stroke-width="0.5" points="119.339,-20.6251 126.839,-18 119.339,-15.3751 119.339,-20.6251"/></g></g></svg>

In each of the following examples, we are creating new _references_ to the
list object, but we are never duplicating it. Each reference points to the same
memory address (which you can get using `id(a)`).

{% highlight python %}
b = a
{% endhighlight %}

{% highlight python %}
c = SomeContainerClass()
c.data = a
{% endhighlight %}

{% highlight python %}
def wrapper(L):
    def inner():
        return L.pop()
    return inner

d = wrapper(a)
{% endhighlight %}

<svg width="254pt" height="234pt" viewBox="0.00 0.00 253.96 234.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 238)"><title>%3</title><polygon fill="white" stroke="none" points="-4,4 -4,-238 249.96,-238 249.96,4 -4,4"/><g id="clust3" class="cluster"><title>cluster0</title><polygon fill="none" stroke="black" stroke-width="0.5" points="8,-8 8,-82 78,-82 78,-8 8,-8"/><text text-anchor="middle" x="43" y="-66.8" font-family="Courier,monospace" font-size="14.00">d</text></g><g id="node1" class="node"><title>obj</title><polygon fill="none" stroke="black" stroke-width="0.5" points="245.966,-153 157.943,-153 157.943,-117 245.966,-117 245.966,-153"/><text text-anchor="middle" x="201.954" y="-132" font-family="Courier,monospace" font-size="10.00">[1, 2, 3, 4]</text></g><g id="node2" class="node"><title>a</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-216" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-211.8" font-family="Courier,monospace" font-size="14.00">a</text></g><g id="edge1" class="edge"><title>a&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M64.8423,-205.244C88.7975,-192.881 128.721,-172.278 159.152,-156.573"/><polygon fill="black" stroke="black" stroke-width="0.5" points="160.422,-158.872 165.883,-153.1 158.014,-154.206 160.422,-158.872"/></g><g id="node3" class="node"><title>b</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-162" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-157.8" font-family="Courier,monospace" font-size="14.00">b</text></g><g id="edge2" class="edge"><title>b&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M69.2174,-157.662C90.9996,-153.915 123.147,-148.385 150.231,-143.726"/><polygon fill="black" stroke="black" stroke-width="0.5" points="150.777,-146.295 157.724,-142.437 149.887,-141.121 150.777,-146.295"/></g><g id="node4" class="node"><title>c</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-108" rx="41.897" ry="18"/><text text-anchor="middle" x="43" y="-103.8" font-family="Courier,monospace" font-size="14.00">c.data</text></g><g id="edge3" class="edge"><title>c&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M82.3954,-114.605C102.772,-118.11 128.077,-122.463 150.069,-126.247"/><polygon fill="black" stroke="black" stroke-width="0.5" points="149.86,-128.874 157.697,-127.559 150.75,-123.7 149.86,-128.874"/></g><g id="node5" class="node"><title>L</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-34" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-29.8" font-family="Courier,monospace" font-size="14.00">L</text></g><g id="edge4" class="edge"><title>L&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M62.9324,-46.183C88.5083,-62.6411 134.554,-92.2712 166.386,-112.755"/><polygon fill="black" stroke="black" stroke-width="0.5" points="165.223,-115.128 172.951,-116.98 168.064,-110.714 165.223,-115.128"/></g></g></svg>

Note that these references are all equal. `a` is no more valid a name for the
list than `b`, `c.data`, or `L` (from the perspective of `d`, which is exposed
to everyone else as `d.func_closure[0].cell_contents`, but that's cumbersome
and you would never do that in practice). As a result, if you delete one of
these references (explicitly with `del a`, or implicitly if a name goes out of
scope), the other references are still around, and the object continues to
exist. Only once all of an object's references disappear does Python free it:
immediately through reference counting, or later through the cyclic garbage
collector if the object is part of a reference cycle.

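
These claims are easy to check for yourself. The standalone sketch below is written to run on CPython 2.7 or 3.x. `SomeContainerClass` is a throwaway stand-in, and the cell is read through `__closure__`, which is the spelling Python 3 uses (2.6 and later accept it alongside `func_closure`).

```python
def wrapper(L):
    def inner():
        return L.pop()
    return inner


class SomeContainerClass(object):
    """Throwaway stand-in for the container used in the text."""


a = [1, 2, 3, 4]
b = a                       # a second name for the list
c = SomeContainerClass()
c.data = a                  # an attribute referencing it
d = wrapper(a)              # a closure cell referencing it

# All four references reach the very same object; nothing was copied.
cell = d.__closure__[0]
assert b is a and c.data is a and cell.cell_contents is a
assert len(set([id(a), id(b), id(c.data), id(cell.cell_contents)])) == 1

# Deleting one name leaves the list alive through the others.
del a
assert b == [1, 2, 3, 4] and cell.cell_contents is b
```
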
## Dead ends

My first thought when approaching this problem was to physically write over the
memory where our target object is stored. This can be done using
[`ctypes.memmove()`](https://docs.python.org/2/library/ctypes.html#ctypes.memmove)
from the Python standard library:

{% highlight pycon %}
>>> class A(object): pass
...
>>> class B(object): pass
...
>>> obj = A()
>>> print obj
<__main__.A object at 0x10e3e1190>
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), object.__sizeof__(A))
140576340136752
>>> print obj
<__main__.B object at 0x10e3e1190>
{% endhighlight %}

What we are doing here is overwriting the fields of `A`'s
[`PyTypeObject` C struct](https://github.com/python/cpython/blob/2.7/Include/object.h)
(strictly, a `PyHeapTypeObject`, since `A` is a new-style class) with the
fields of `B`'s. As a result, they now share various properties, such as
their attribute dictionaries
([`__dict__`](https://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy)).
So, we can do things like this:

{% highlight pycon %}
>>> B.foo = 123
>>> obj.foo
123
{% endhighlight %}

However, there are clear issues. What we've done is create a
[_shallow copy_](https://en.wikipedia.org/wiki/Object_copy#Shallow_copy).
Therefore, `A` and `B` are still distinct objects, so certain changes made to
one will not be replicated to the other:

{% highlight pycon %}
>>> A is B
False
>>> B.__name__ = "C"
>>> A.__name__
'B'
{% endhighlight %}

Also, this won't work if `A` and `B` are different sizes, since we will be
either reading from or writing to memory that we don't necessarily own:

{% highlight pycon %}
>>> A = ()
>>> B = []
>>> print A.__sizeof__(), B.__sizeof__()
24 40
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), A.__sizeof__())
4321271888
Python(33575,0x7fff76925300) malloc: *** error for object 0x6f: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
{% endhighlight %}

Oh, and there's a bit of a problem when we deallocate these objects, too...

{% highlight pycon %}
>>> A = []
>>> B = range(8)
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), A.__sizeof__())
4514685728
>>> print A
[0, 1, 2, 3, 4, 5, 6, 7]
>>> del A
>>> del B
Segmentation fault: 11
{% endhighlight %}

## Fishing for references with Guppy

A more correct solution is to find all of the _references_ to the old object
and update them to point to the new object, rather than replacing the old
object directly.

But how do we track references? Fortunately, there's a library called
[Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. It's often
used for diagnosing memory leaks, but we can take advantage of its robust object
tracking features here. Install it with [pip](https://pypi.python.org/pypi/pip)
(`pip install guppy`).

I've always found Guppy hard to use (as many debuggers are, though justifiably
so given the complexity of the task involved), so we'll begin with a feature
demo before delving into the actual problem.

### Feature demonstration

Guppy's interface is deceptively simple. We begin by calling
[`guppy.hpy()`](http://guppy-pe.sourceforge.net/guppy.html#kindnames.guppy.hpy)
to expose the Heapy interface, which is the component of Guppy that has the
features we want:

{% highlight pycon %}
>>> import guppy
>>> hp = guppy.hpy()
>>> hp
Top level interface to Heapy.
Use eg: hp.doc for more info on hp.
{% endhighlight %}

Calling
[`hp.heap()`](http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.heap)
shows us a table of the objects known to Guppy, grouped together
(mathematically speaking,
[_partitioned_](https://en.wikipedia.org/wiki/Partition_of_a_set)) by
type<sup><a id="ref3" href="#fn3">[3]</a></sup> and sorted by how much space
they take up in memory:

{% highlight pycon %}
>>> heap = hp.heap()
>>> heap
Partition of a set of 45761 objects. Total size = 4699200 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  15547  34  1494736  32   1494736  32 str
     1   8356  18   770272  16   2265008  48 tuple
     2    346   1   452080  10   2717088  58 dict (no owner)
     3  13685  30   328440   7   3045528  65 int
     4     71   0   221096   5   3266624  70 dict of module
     5   1652   4   211456   4   3478080  74 types.CodeType
     6    199   0   210856   4   3688936  79 dict of type
     7   1614   4   193680   4   3882616  83 function
     8    199   0   177008   4   4059624  86 type
     9    124   0   135328   3   4194952  89 dict of class
<91 more rows. Type e.g. '_.more' to view.>
{% endhighlight %}

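
For comparison, here is a rough standard-library approximation of that table. `partition_by_type()` is a hypothetical helper. It groups objects by type name, sums `sys.getsizeof()`, and sorts by total size. This is a sketch, not a Heapy replacement. When fed `gc.get_objects()`, it only sees objects tracked by the cycle collector, so most strings and ints (the top rows above) won't appear. Also, `sys.getsizeof()` is shallow.

```python
import gc
import sys
from collections import defaultdict


def partition_by_type(objects):
    """Group objects by type name; return (kind, count, size) rows, largest first."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for obj in objects:
        kind = type(obj).__name__
        counts[kind] += 1
        # getsizeof() is shallow; the default of 0 covers objects
        # that cannot report a size at all.
        sizes[kind] += sys.getsizeof(obj, 0)
    rows = [(kind, counts[kind], sizes[kind]) for kind in counts]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Only objects tracked by the cycle collector are visible here.
print(" Index  Count     Size Kind")
for index, (kind, count, size) in enumerate(partition_by_type(gc.get_objects())[:10]):
    print("%6d %6d %8d %s" % (index, count, size, kind))
```
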
This object (called an
[`IdentitySet`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet))
looks bizarre, but it can be treated roughly like a list. If we want to take a
look at strings, we can do `heap[0]`:

{% highlight pycon %}
>>> heap[0]
Partition of a set of 22606 objects. Total size = 2049896 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  22606 100  2049896 100   2049896 100 str
{% endhighlight %}

This isn't very useful, though. What we really want to do is re-partition this
subset using another relationship. There are a number of options, such as:

{% highlight pycon %}
>>> heap[0].byid  # Group by object ID; each subset therefore has one element
Set of 22606 <str> objects. Total size = 2049896 bytes.
 Index     Size     % Cumulative     % Representation (limited)
     0     7480   0.4       7480   0.4 'The class Bi... copy of S.\n'
     1     4872   0.2      12352   0.6 "Support for ... 'error'.\n\n"
     2     4760   0.2      17112   0.8 'Heap queues\...at Art! :-)\n'
     3     4760   0.2      21872   1.1 'Heap queues\...at Art! :-)\n'
     4     3896   0.2      25768   1.3 'This module ...ng function\n'
     5     3824   0.2      29592   1.4 'The type of ...call order.\n'
     6     3088   0.2      32680   1.6 't\x00\x00|\x...x00|\x02\x00S'
     7     2992   0.1      35672   1.7 'HeapView(roo... size, etc.\n'
     8     2808   0.1      38480   1.9 'Directory tr...ories\n\n '
     9     2640   0.1      41120   2.0 'The class No... otherwise.\n'
<22596 more rows. Type e.g. '_.more' to view.>
{% endhighlight %}

{% highlight pycon %}
>>> heap[0].byrcs  # Group by what types of objects reference the strings
Partition of a set of 22606 objects. Total size = 2049896 bytes.
 Index  Count   %     Size   % Cumulative  % Referrers by Kind (class / dict of class)
     0   6146  27   610752  30    610752  30 types.CodeType
     1   5304  23   563984  28   1174736  57 tuple
     2   4104  18   237536  12   1412272  69 dict (no owner)
     3   1959   9   139880   7   1552152  76 list
     4    564   2   136080   7   1688232  82 function, tuple
     5    809   4    97896   5   1786128  87 dict of module
     6    346   2    71760   4   1857888  91 dict of type
     7    365   2    19408   1   1877296  92 dict of module, tuple
     8    192   1    16176   1   1893472  92 dict (no owner), list
     9    232   1    11784   1   1905256  93 dict of class, function, tuple, types.CodeType
<229 more rows. Type e.g. '_.more' to view.>
{% endhighlight %}

{% highlight pycon %}
>>> heap[0].byvia  # Group by how the strings are related to their referrers
Partition of a set of 22606 objects. Total size = 2049896 bytes.
 Index  Count   %     Size   % Cumulative  % Referred Via:
     0   2656  12   420456  21    420456  21 '[0]'
     1   2095   9   259008  13    679464  33 '.co_code'
     2   2095   9   249912  12    929376  45 '.co_filename'
     3    564   2   136080   7   1065456  52 '.func_doc', '[0]'
     4    243   1   103528   5   1168984  57 "['__doc__']"
     5   1930   9   100584   5   1269568  62 '.co_lnotab'
     6    502   2    31128   2   1300696  63 '[1]'
     7    306   1    16272   1   1316968  64 '[2]'
     8    242   1    12960   1   1329928  65 '[3]'
     9    184   1     9872   0   1339800  65 '[4]'
<7323 more rows. Type e.g. '_.more' to view.>
{% endhighlight %}

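
The `byvia` labels describe the edge itself: the key, index, or attribute through which the referrer holds the object. That is exactly the information replacement will need. To make the notation concrete, here is a hypothetical `describe_via()` helper (not part of Guppy). It reproduces the key and index labels for dicts, lists, and tuples. Attribute labels like `'.co_code'` depend on per-type knowledge and are left out of this sketch.

```python
def describe_via(referrer, target):
    """Return byvia-style labels for each slot of `referrer` that holds `target`.

    Hypothetical helper: handles dicts, lists and tuples only.
    """
    if isinstance(referrer, dict):
        return ["[%r]" % (key,) for key, value in referrer.items() if value is target]
    if isinstance(referrer, (list, tuple)):
        return ["[%d]" % index for index, item in enumerate(referrer) if item is target]
    return []


doc = "Define names for all type symbols..."
print(describe_via({"__doc__": doc}, doc))   # ["['__doc__']"]
print(describe_via(("x", doc, doc), doc))    # ['[1]', '[2]']
```

Note the identity test (`is`): we care about the exact object, not an equal one.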
From this, we can see that the plurality of memory devoted to strings is taken
up by those referenced by code objects (`types.CodeType` represents Python
code, which is accessible from a non-C-defined function through
`func.func_code` and contains things like the names of its local variables and
the actual sequence of opcodes that make it up).

For fun, let's pick a random string.

{% highlight pycon %}
>>> import random
>>> obj = heap[0].byid[random.randrange(0, heap[0].count)]
>>> obj
Set of 1 <str> object. Total size = 176 bytes.
 Index     Size     % Cumulative     % Representation (limited)
     0      176 100.0        176 100.0 'Define names...not listed.\n'
{% endhighlight %}

Interesting. Since this heap subset contains only one element, we can use
[`.theone`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySetSingleton.theone)
to get the actual object represented here:

{% highlight pycon %}
>>> obj.theone
'Define names for all type symbols known in the standard interpreter.\n\nTypes that are part of optional modules (e.g. array) are not listed.\n'
{% endhighlight %}

Looks like the docstring for the
[`types`](https://docs.python.org/2/library/types.html) module. We can confirm
by using
[`.referrers`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet.referrers)
to get the set of objects that refer to objects in the given set:

{% highlight pycon %}
>>> obj.referrers
Partition of a set of 1 object. Total size = 3352 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100     3352 100      3352 100 dict of module
{% endhighlight %}

This is `types.__dict__` (since the docstring we got is actually stored as
`types.__dict__["__doc__"]`), so if we use `.referrers` again:

{% highlight pycon %}
>>> obj.referrers.referrers
Partition of a set of 1 object. Total size = 56 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       56 100        56 100 module
>>> obj.referrers.referrers.theone
<module 'types' from '/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/types.pyc'>
>>> import types
>>> types.__doc__ is obj.theone
True
{% endhighlight %}

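
The standard library can retrace this chain too, though more crudely. The sketch below swaps in `gc.get_referrers()` for Guppy's `.referrers`. Unlike Heapy, it only reports objects tracked by the cycle collector. `referrers_of_type()` is a hypothetical helper, not part of either library.

```python
import gc
import types


def referrers_of_type(obj, kind):
    """Return the collector-tracked objects that refer to obj and are instances of kind."""
    return [ref for ref in gc.get_referrers(obj) if isinstance(ref, kind)]


module_dict = vars(types)

# Step 1: among the dicts holding the docstring is types.__dict__.
doc_holders = referrers_of_type(types.__doc__, dict)
assert any(holder is module_dict for holder in doc_holders)

# Step 2: the module that owns that dict is types itself.
owners = referrers_of_type(module_dict, types.ModuleType)
assert any(owner is types for owner in owners)
```
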
_But why did we find an object in the `types` module if we never imported it?_
Well, let's see. We can use
[`hp.iso()`](http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.iso)
to get the Heapy set consisting of a single given object:

{% highlight pycon %}
>>> hp.iso(types)
Partition of a set of 1 object. Total size = 56 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       56 100        56 100 module
{% endhighlight %}

Using the same procedure as before, we see that `types` is imported by (among
others) the [`traceback`](https://docs.python.org/2/library/traceback.html)
module:

{% highlight pycon %}
>>> hp.iso(types).referrers
Partition of a set of 10 objects. Total size = 25632 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      2  20    13616  53     13616  53 dict (no owner)
     1      5  50     9848  38     23464  92 dict of module
     2      1  10     1048   4     24512  96 dict of guppy.etc.Glue.Interface
     3      1  10     1048   4     25560 100 dict of guppy.etc.Glue.Share
     4      1  10       72   0     25632 100 tuple
>>> hp.iso(types).referrers[1].byid
Set of 5 <dict of module> objects. Total size = 9848 bytes.
 Index     Size     % Cumulative     % Owner Name
     0     3352  34.0       3352  34.0 traceback
     1     3352  34.0       6704  68.1 warnings
     2     1048  10.6       7752  78.7 __main__
     3     1048  10.6       8800  89.4 abc
     4     1048  10.6       9848 100.0 guppy.etc.Glue
{% endhighlight %}

...and that is imported by
[`site`](https://docs.python.org/2/library/site.html):

{% highlight pycon %}
>>> import traceback
>>> hp.iso(traceback).referrers
Partition of a set of 3 objects. Total size = 15992 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  33    12568  79     12568  79 dict (no owner)
     1      1  33     3352  21     15920 100 dict of module
     2      1  33       72   0     15992 100 tuple
>>> hp.iso(traceback).referrers[1].byid
Set of 1 <dict of module> object. Total size = 3352 bytes.
 Index     Size     % Cumulative     % Owner Name
     0     3352 100.0       3352 100.0 site
{% endhighlight %}

Since `site` is imported by Python on startup, we've figured out why objects
from `types` exist, even though we've never used them.

We've learned something important, too. When objects are stored as ordinary
attributes of a parent object (like `types.__doc__`, `traceback.types`, and
`site.traceback` from above), they are not referenced directly by the parent
object, but by that object's `__dict__` attribute. Therefore, if we want to
replace `A` with `B` and `A` is an attribute of `C`, we (probably) don't need
to know anything special about `C`, just how to modify dictionaries.

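
Since dictionaries are the common case, here is a first, deliberately naive sketch that handles them alone. It uses the standard library's `gc.get_referrers()` rather than Guppy, and `replace_in_dicts()` is a hypothetical name. It only finds dicts tracked by the cycle collector and only rewrites values, never keys. On recent CPython 3 releases, instance attributes are not always backed by a separate dict object, so attributes set on instances may be missed. On 2.7, the version this post targets, they live in a real `__dict__`.

```python
import gc


def replace_in_dicts(old, new):
    """Rebind every dict value that is `old` to `new`; return how many slots changed.

    Sketch only: dicts are found via gc.get_referrers(), so only dicts tracked
    by the cycle collector are seen, and dict keys are never touched.
    """
    changed = 0
    for referrer in gc.get_referrers(old):
        if isinstance(referrer, dict):
            # Snapshot the items so the dict isn't mutated while being iterated.
            for key, value in list(referrer.items()):
                if value is old:
                    referrer[key] = new
                    changed += 1
    return changed


class A(object):
    pass


class B(object):
    pass


a = A()
b = B()
registry = {"current": a, "other": 42}
replace_in_dicts(a, b)
assert registry["current"] is b and registry["other"] == 42
```

Run at the top level of a script, this also rebinds the global name `a`, because the module namespace is itself one of the dicts holding it.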
A good Guppy/Heapy tutorial, while a bit old and incomplete, can be found on
[Andrey Smirnov's website](http://smira.ru/wp-content/uploads/2011/08/heapy.html).

## Examining paths

Let's set up an example replacement using class instances:

{% highlight python %}
class A(object):
    pass

class B(object):
    pass

a = A()
b = B()
{% endhighlight %}

Suppose we want to replace `a` with `b`. From the demo above, we know that we
can get the Heapy set of a single object using `hp.iso()`. We also know we can
use `.referrers` to get a set of objects that reference the given object:

{% highlight pycon %}
>>> import guppy
>>> hp = guppy.hpy()
>>> print hp.iso(a).referrers
Partition of a set of 1 object. Total size = 1048 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100     1048 100      1048 100 dict of module
{% endhighlight %}

`a` is only referenced by one object, which makes sense, since we've only used
it in one place (as a name at the top level of our session), meaning
`hp.iso(a).referrers.theone` must be the module's namespace, which at the top
level is exactly what
[`locals()`](https://docs.python.org/2/library/functions.html#locals) returns:

{% highlight pycon %}
>>> hp.iso(a).referrers.theone is locals()
True
{% endhighlight %}

However, there is a more useful feature available to us:
[`.pathsin`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet.pathsin).
This also returns references to the given object, but instead of a Heapy set,
it is a list of `Path` objects. These are more useful since they tell us not
only _what_ objects are related to the given object, but _how_ they are
related.

{% highlight pycon %}
>>> print hp.iso(a).pathsin
0: Src['a']
{% endhighlight %}

This output is rather cryptic, but we can extract the source of the reference
using `.src`:

{% highlight pycon %}
>>> path = hp.iso(a).pathsin[0]
>>> print path.src
Partition of a set of 1 object. Total size = 1048 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100     1048 100      1048 100 dict of module
>>> path.src.theone is locals()
True
{% endhighlight %}

...and we can examine the type of relation by looking at `.path[1]` (the
actual reason for this isn't worth getting into, due to Guppy's lack of
documentation on the subject):

{% highlight pycon %}
>>> relation = path.path[1]
>>> relation
<guppy.heapy.Path.Based_R_INDEXVAL object at 0x100f38230>
{% endhighlight %}

We notice that `relation` is a `Based_R_INDEXVAL` object. It sounds bizarre,
but this tells us that `a` is related to `path.src` by being the value stored
at a particular index of it. Which index? We can get this using `relation.r`:

{% highlight pycon %}
>>> rel = relation.r
>>> print rel
a
{% endhighlight %}

Aha! So now we know that `a` is the object stored in the reference source under
the index `rel`. But what is the reference source? It's just `path.src.theone`:

{% highlight pycon %}
>>> path.src.theone[rel] is a
True
{% endhighlight %}

But `path.src.theone` is just a dictionary, meaning we know how to modify it
very easily:

{% highlight pycon %}
>>> path.src.theone[rel] = b
>>> a
<__main__.B object at 0x100dae090>
>>> a is b
True
{% endhighlight %}

Python's documentation tells us not to modify the locals dictionary, but screw
it, we're gonna do it anyway. (At the top level, `locals()` is the module's
globals dictionary, which is why the change actually sticks here.)

## Handling different reference types

[...]

### Dictionaries

dicts, class attributes via `__dict__`, locals()

### Lists

simple replacement

### Tuples

recursively replace parent since immutable

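
Lists can be patched in place, but tuples can't. That is why the tuple note implies recursion: build a new tuple, then replace the old tuple wherever it is referenced. Below is a minimal sketch of that idea, again built on `gc.get_referrers()` rather than Guppy. The `replace()` here is hypothetical and far less complete than the post envisions. It covers only dict values, list items, and tuple items, and is best-effort.

```python
import gc


def replace(old, new, _seen=None):
    """Best-effort sketch: repoint dict values, list items and tuple items at `new`."""
    if _seen is None:
        _seen = set()
    if id(old) in _seen:            # don't process the same object twice
        return
    _seen.add(id(old))
    for ref in gc.get_referrers(old):
        if isinstance(ref, dict):
            for key, value in list(ref.items()):
                if value is old:
                    ref[key] = new
        elif isinstance(ref, list):
            for i, item in enumerate(ref):
                if item is old:
                    ref[i] = new
        elif isinstance(ref, tuple):
            # Tuples are immutable: build a replacement tuple, then
            # recursively swap it in wherever the old tuple is referenced.
            rebuilt = tuple(new if item is old else item for item in ref)
            replace(ref, rebuilt, _seen)


class Old(object):
    pass


class New(object):
    pass


old, new = Old(), New()
holder = {"pair": (1, old), "items": [old, 2]}
replace(old, new)
assert holder["items"][0] is new
assert holder["pair"] == (1, new)
```
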
### Bound methods

note that built-in methods and regular methods have different underlying C
structs, but have the same offsets for their self field

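
The note above is about patching the `self` field in the C structs directly. For methods defined in Python, a gentler alternative is to build a fresh bound method with `types.MethodType` and swap it in like any other object, because `__self__` itself is read-only. This is an assumption on my part, not the approach the note describes. The sketch below uses hypothetical names. Built-in methods such as `[].append` have no `__func__` and are not handled.

```python
import types


def rebind(method, new_self):
    """Return a new bound method: same underlying function, bound to new_self."""
    return types.MethodType(method.__func__, new_self)


class Greeter(object):
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello from " + self.name


a, b = Greeter("a"), Greeter("b")
bound = a.greet
try:
    bound.__self__ = b              # read-only on both 2.7 and 3.x
except (AttributeError, TypeError):
    pass
bound = rebind(bound, b)
assert bound() == "hello from b"
```
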
### Closure cells

function closures

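
Python 3.7 and later let you assign to a cell's `cell_contents` directly. On 2.7 the attribute is read-only. One workaround is to call the C API function `PyCell_Set` through `ctypes.pythonapi`, though this is an assumption, not necessarily the approach this post had in mind. `set_cell()` below is a hypothetical helper that tries both.

```python
import ctypes


def set_cell(cell, value):
    """Store value in a closure cell, falling back to the C API where needed."""
    try:
        cell.cell_contents = value          # writable on Python 3.7+
    except (AttributeError, TypeError):     # read-only on older versions
        pycell_set = ctypes.pythonapi.PyCell_Set
        pycell_set.argtypes = (ctypes.py_object, ctypes.py_object)
        pycell_set.restype = ctypes.c_int
        if pycell_set(cell, value) != 0:
            raise RuntimeError("PyCell_Set failed")


def make_counter():
    count = [0]

    def counter():
        count[0] += 1
        return count[0]
    return counter


counter = make_counter()
set_cell(counter.__closure__[0], [41])
assert counter() == 42
```
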
### Frames

...

### Slots

...

### Classes

...

### Other cases

Certainly, not every case is handled above, but it seems to cover the vast
majority of instances that I've found through testing. There are a number of
reference relations in Guppy that I couldn't figure out how to replicate
without doing something insane (`R_HASATTR`, `R_CELL`, and `R_STACK`), so some
obscure replacements are likely unimplemented.

Some other kinds of replacements are known to be impossible. For example,
replacing a class object that uses `__slots__` with another class will not work
if the replacement class has a different slot layout and instances of the old
class exist. More generally, replacing a class with a non-class object won't
work if instances of the class exist. Furthermore, references stored in data
structures managed by C extensions cannot be changed, since there's no good way
for us to track these.

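
Moving an existing instance to another class comes down to assigning its `__class__`. That is an assumption about the mechanism, since the post doesn't spell it out. CPython itself enforces the layout restriction described above. Here is a small sketch with made-up classes:

```python
class Plain(object):
    pass


class Other(object):
    pass


class Slotted(object):
    __slots__ = ("x",)


obj = Plain()
obj.__class__ = Other           # same instance layout: allowed
assert type(obj) is Other

try:
    obj.__class__ = Slotted     # different layout: CPython refuses
except TypeError as exc:
    refused = str(exc)
else:
    refused = None
assert refused is not None
```
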
Remaining areas to explore include behavior when metaclasses and more complex
descriptors are involved. Implementing a more complete version of `replace()`
is left as an exercise for the reader.

## Footnotes

1. <a id="fn1" href="#ref1">^</a> This post relies _heavily_ on implementation
   details of CPython 2.7. While it could be adapted for Python 3 by examining
   changes to the internal structures of objects that we used above, that would
   be a lost cause if you wanted to replicate this on
   [Jython](http://www.jython.org/) or some other implementation. We are so
   dependent on concepts specific to CPython that you would need to start from
   scratch, beginning with a language-specific replacement for Guppy.
2. <a id="fn2" href="#ref2">^</a> The
   [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language))
   used to generate graphs in this post are
   [available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6).
3. <a id="fn3" href="#ref3">^</a> They're actually grouped together by _clodo_
   ("class or dict object"), which is similar to type, but groups `__dict__`s
   separately by their owner's type.