---
layout: post
title: Replacing Objects in Python
tags: Python
description: More reflection than you cared to ask for
draft: true
---

Today, we're going to demonstrate a fairly evil thing in Python, which I call
_object replacement_.

Say you have some program that's been running for a while, and a particular
object has made its way throughout your code. It lives inside lists, class
attributes, maybe even inside some closures. You want to completely replace
this object with another one; that is to say, you want to find all references
to object `A` and replace them with object `B`, enabling `A` to be garbage
collected. This has some interesting implications for special object types. If
you have methods that are bound to `A`, you want to rebind them to `B`. If `A`
is a class, you want all instances of `A` to become instances of `B`. And so
on.

_But why on Earth would you want to do that?_ you ask. I'll focus on a concrete
use case in a future post, but for now, I imagine this could be useful in some
kind of advanced unit testing situation with mock objects. Still, it's fairly
insane, so let's leave it as primarily an intellectual exercise.

This article is written for [CPython](https://en.wikipedia.org/wiki/CPython)
2.7.<sup><a id="ref1" href="#fn1">[1]</a></sup>

## Review

First, a quick recap of terminology. You can skip this section if you know
Python well.

In Python, _names_ are what most languages call "variables". They reference
_objects_. So when we do:

{% highlight python %}
a = [1, 2, 3, 4]
{% endhighlight %}

...we are creating a list object with four integers, and binding it to the name
`a`. In graph form:<sup><a id="ref2" href="#fn2">[2]</a></sup>

<svg width="223pt" height="44pt" viewBox="0.00 0.00 223.01 44.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 40)"><title>%3</title><polygon fill="white" stroke="none" points="-4,4 -4,-40 219.012,-40 219.012,4 -4,4"/><g id="node1" class="node"><title>L</title><polygon fill="none" stroke="black" stroke-width="0.5" points="215.018,-36 126.994,-36 126.994,-0 215.018,-0 215.018,-36"/><text text-anchor="middle" x="171.006" y="-15" font-family="Courier,monospace" font-size="10.00">[1, 2, 3, 4]</text></g><g id="node2" class="node"><title>a</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="27" cy="-18" rx="27" ry="18"/><text text-anchor="middle" x="27" y="-13.8" font-family="Courier,monospace" font-size="14.00">a</text></g><g id="edge1" class="edge"><title>a&#45;&gt;L</title><path fill="none" stroke="black" stroke-width="0.5" d="M54.0461,-18C72.2389,-18 97.1211,-18 119.173,-18"/><polygon fill="black" stroke="black" stroke-width="0.5" points="119.339,-20.6251 126.839,-18 119.339,-15.3751 119.339,-20.6251"/></g></g></svg>

In each of the following examples, we are creating new _references_ to the
list object, but we are never duplicating it. Each reference points to the same
memory address (which you can get using `id(a)`).

{% highlight python %}
b = a
{% endhighlight %}

{% highlight python %}
c = SomeContainerClass()
c.data = a
{% endhighlight %}

{% highlight python %}
def wrapper(L):
    def inner():
        return L.pop()
    return inner

d = wrapper(a)
{% endhighlight %}

<svg width="254pt" height="234pt" viewBox="0.00 0.00 253.96 234.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 238)"><title>%3</title><polygon fill="white" stroke="none" points="-4,4 -4,-238 249.96,-238 249.96,4 -4,4"/><g id="clust3" class="cluster"><title>cluster0</title><polygon fill="none" stroke="black" stroke-width="0.5" points="8,-8 8,-82 78,-82 78,-8 8,-8"/><text text-anchor="middle" x="43" y="-66.8" font-family="Courier,monospace" font-size="14.00">d</text></g><g id="node1" class="node"><title>obj</title><polygon fill="none" stroke="black" stroke-width="0.5" points="245.966,-153 157.943,-153 157.943,-117 245.966,-117 245.966,-153"/><text text-anchor="middle" x="201.954" y="-132" font-family="Courier,monospace" font-size="10.00">[1, 2, 3, 4]</text></g><g id="node2" class="node"><title>a</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-216" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-211.8" font-family="Courier,monospace" font-size="14.00">a</text></g><g id="edge1" class="edge"><title>a&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M64.8423,-205.244C88.7975,-192.881 128.721,-172.278 159.152,-156.573"/><polygon fill="black" stroke="black" stroke-width="0.5" points="160.422,-158.872 165.883,-153.1 158.014,-154.206 160.422,-158.872"/></g><g id="node3" class="node"><title>b</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-162" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-157.8" font-family="Courier,monospace" font-size="14.00">b</text></g><g id="edge2" class="edge"><title>b&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M69.2174,-157.662C90.9996,-153.915 123.147,-148.385 150.231,-143.726"/><polygon fill="black" stroke="black" stroke-width="0.5" points="150.777,-146.295 157.724,-142.437 149.887,-141.121 150.777,-146.295"/></g><g id="node4" class="node"><title>c</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-108" rx="41.897" ry="18"/><text text-anchor="middle" x="43" y="-103.8" font-family="Courier,monospace" font-size="14.00">c.data</text></g><g id="edge3" class="edge"><title>c&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M82.3954,-114.605C102.772,-118.11 128.077,-122.463 150.069,-126.247"/><polygon fill="black" stroke="black" stroke-width="0.5" points="149.86,-128.874 157.697,-127.559 150.75,-123.7 149.86,-128.874"/></g><g id="node5" class="node"><title>L</title><ellipse fill="none" stroke="black" stroke-width="0.5" cx="43" cy="-34" rx="27" ry="18"/><text text-anchor="middle" x="43" y="-29.8" font-family="Courier,monospace" font-size="14.00">L</text></g><g id="edge4" class="edge"><title>L&#45;&gt;obj</title><path fill="none" stroke="black" stroke-width="0.5" d="M62.9324,-46.183C88.5083,-62.6411 134.554,-92.2712 166.386,-112.755"/><polygon fill="black" stroke="black" stroke-width="0.5" points="165.223,-115.128 172.951,-116.98 168.064,-110.714 165.223,-115.128"/></g></g></svg>
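
The snippets above can be combined into one script to check that claim. `SomeContainerClass` is only a placeholder name in the post, so this sketch assumes a trivial stand-in class. It also uses `__closure__`, which works on both 2.7 and 3.x, rather than the 2.7-only `func_closure`:

```python
# A trivial stand-in for the post's placeholder SomeContainerClass.
class SomeContainerClass(object):
    pass


def wrapper(L):
    def inner():
        return L.pop()
    return inner


a = [1, 2, 3, 4]
b = a
c = SomeContainerClass()
c.data = a
d = wrapper(a)

# The closure cell inside d holds the same list as well.
cell_value = d.__closure__[0].cell_contents

# Every reference is the very same object, not a copy of it.
assert b is a and c.data is a and cell_value is a
assert id(b) == id(c.data) == id(cell_value) == id(a)

# Mutating through one reference is visible through all of them.
d()       # pops the last element via the closure
print(a)  # [1, 2, 3]
```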

Note that these references are all equivalent. `a` is no more valid a name for
the list than `b`, `c.data`, or `L` (from the perspective of `d`; everyone else
would have to reach it as `d.func_closure[0].cell_contents`, which is
cumbersome, and you would never do that in practice). As a result, if you
delete one of these references (explicitly with `del a`, or implicitly when a
name goes out of scope), the other references are still around, and the object
continues to exist. If all of an object's references disappear, then Python's
garbage collector should eliminate it.
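
That lifetime rule can be watched with the standard `weakref` module, which observes an object without keeping it alive. A plain class stands in for the list in this sketch, since `list` instances can't be weakly referenced; the immediate cleanup relies on CPython's reference counting:

```python
import weakref


class Thing(object):
    """A plain class; unlike list, its instances support weak references."""


obj = Thing()
holder = {"ref": obj}      # a second, independent strong reference
probe = weakref.ref(obj)   # a weak reference does not keep obj alive

del obj
# The dict still holds a reference, so the object survives.
assert probe() is holder["ref"]

del holder["ref"]
# That was the last strong reference. CPython's reference counting frees the
# object right away, so the weak reference now reports None.
print(probe())  # None
```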

## Dead ends

My first thought when approaching this problem was to physically write over the
memory where our target object is stored. This can be done using
[`ctypes.memmove()`](https://docs.python.org/2/library/ctypes.html#ctypes.memmove)
from the Python standard library:

{% highlight pycon %}
>>> class A(object): pass
...
>>> class B(object): pass
...
>>> obj = A()
>>> print obj
<__main__.A object at 0x10e3e1190>
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), object.__sizeof__(A))
140576340136752
>>> print obj
<__main__.B object at 0x10e3e1190>
{% endhighlight %}

What we are doing here is overwriting the fields of the C struct that
represents `A` (since `A` is a new-style class, that struct is a
[`PyTypeObject`](https://github.com/python/cpython/blob/2.7/Include/object.h))
with the fields of the struct that represents `B`. As a result, they now share
various properties, such as their attribute dictionaries
([`__dict__`](https://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy)).
So, we can do things like this:

{% highlight pycon %}
>>> B.foo = 123
>>> obj.foo
123
{% endhighlight %}
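
It helps to see what `id(A)` actually points at. This read-only sketch assumes a standard CPython build (debug and free-threaded builds lay out the object header differently) and reads the second header field of `obj`, which is a pointer to its type:

```python
import ctypes


class A(object):
    pass


obj = A()

# In CPython, id() returns an object's address. On a standard build, every
# object starts with a Py_ssize_t-sized reference-count field, followed by a
# pointer to the object's type.
type_field = id(obj) + ctypes.sizeof(ctypes.c_ssize_t)
type_pointer = ctypes.c_void_p.from_address(type_field).value

# obj's type pointer is simply the address of the class object A. Overwriting
# the memory at id(A) therefore changes A's contents in place, while obj keeps
# pointing at the same address.
assert type_pointer == id(A) == id(type(obj))
```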

However, there are clear issues. What we've done is create a
[_shallow copy_](https://en.wikipedia.org/wiki/Object_copy#Shallow_copy).
Therefore, `A` and `B` are still distinct objects, so certain changes made to
one will not be replicated to the other:

{% highlight pycon %}
>>> A is B
False
>>> B.__name__ = "C"
>>> A.__name__
'B'
{% endhighlight %}

Also, this won't work if `A` and `B` are different sizes, since we will be
either reading from or writing to memory we don't necessarily own:

{% highlight pycon %}
>>> A = ()
>>> B = []
>>> print A.__sizeof__(), B.__sizeof__()
24 40
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), A.__sizeof__())
4321271888
Python(33575,0x7fff76925300) malloc: *** error for object 0x6f: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
{% endhighlight %}

Oh, and there's a bit of a problem when we deallocate these objects, too, since
both lists now point at the same internal item array, which ends up being freed
twice:

{% highlight pycon %}
>>> A = []
>>> B = range(8)
>>> import ctypes
>>> ctypes.memmove(id(A), id(B), A.__sizeof__())
4514685728
>>> print A
[0, 1, 2, 3, 4, 5, 6, 7]
>>> del A
>>> del B
Segmentation fault: 11
{% endhighlight %}

## Fishing for references with Guppy

A more correct solution is finding all of the _references_ to the old object,
and then updating them to point to the new object, rather than replacing the
old object directly.

But how do we track references? Fortunately, there is a library called
[Guppy](http://guppy-pe.sourceforge.net/) that allows us to do this. It is often
used for diagnosing memory leaks, and we can take advantage of its robust object
tracking features here. Install it with [pip](https://pypi.python.org/pypi/pip)
(`pip install guppy`).

I've always found Guppy hard to use (as many debuggers are, though that is
justified by the complexity of the task involved), so we'll begin with a feature
demo before delving into the actual problem.

### Feature demonstration

Guppy's interface is deceptively simple. We begin by creating an instance of
the Heapy interface, which is the component of Guppy that has the features we
want:

{% highlight pycon %}
>>> import guppy
>>> hp = guppy.hpy()
{% endhighlight %}

[...]
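
For comparison, the standard library can answer a cruder form of the same question. This is not Guppy: it's a minimal sketch with `gc.get_referrers()`, which only reports containers tracked by the cyclic garbage collector and doesn't describe *how* each one refers to the object. The class and variable names are illustrative:

```python
import gc


class Target(object):
    pass


target = Target()
in_list = [target, 1, 2]
in_dict = {"key": target}
in_tuple = (target, "x")

# Every GC-tracked container currently holding a reference to target,
# including (at module level) this module's globals dict.
referrers = gc.get_referrers(target)

# Check membership by identity; `in` would compare with == instead.
found = [any(r is c for r in referrers) for c in (in_list, in_dict, in_tuple)]
print(found)
```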

## Handling different reference types

### Dictionaries

dicts, class attributes via `__dict__`, locals()
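
A minimal sketch of the dictionary case, assuming identity (`is`) is the right test for "a reference to `A`". The helpers `replace_in_dict` and `replace_in_class_attrs` are hypothetical names, not from the post. A class's `__dict__` is a read-only proxy, so class attributes go through `setattr()`; `locals()` is left out because writes to it aren't reliably reflected in a function's real local variables:

```python
class Old(object):
    pass


class New(object):
    pass


def replace_in_dict(d, old, new):
    """Hypothetical helper: rebind every value (and key) of d that *is* old."""
    # Compare by identity: == could match an unrelated but equal object.
    for key, value in list(d.items()):
        if value is old:
            d[key] = new
    # Keys can't be rebound in place, so pop the old key and reinsert it.
    # Assumes new is hashable; otherwise it could never be a dict key.
    for key in list(d):
        if key is old:
            d[new] = d.pop(key)


def replace_in_class_attrs(cls, old, new):
    """Hypothetical helper: vars(cls) is read-only, so go through setattr."""
    for name, value in list(vars(cls).items()):
        if value is old:
            setattr(cls, name, new)


old_obj, new_obj = Old(), New()

data = {"a": old_obj, old_obj: "as a key", "b": 1}
replace_in_dict(data, old_obj, new_obj)
assert data["a"] is new_obj and data[new_obj] == "as a key"
assert old_obj not in data


class Config(object):
    default = old_obj


replace_in_class_attrs(Config, old_obj, new_obj)
assert Config.default is new_obj
```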

### Lists

simple replacement
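
For lists, "simple replacement" can be as short as this hypothetical helper, which rebinds each slot holding the old object. Comparing with `is` matters here, since an equal but distinct object must be left alone:

```python
def replace_in_list(lst, old, new):
    """Hypothetical helper: rebind every slot of lst that holds old itself."""
    for i, item in enumerate(lst):
        # Identity, not equality: [0] == [0], but only one of them is old.
        if item is old:
            lst[i] = new


old_obj = [0]
twin = [0]                # equal to old_obj, but a different object
new_obj = "replacement"
items = [old_obj, twin, old_obj]

replace_in_list(items, old_obj, new_obj)
assert items == ["replacement", [0], "replacement"] and items[1] is twin
```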

### Tuples

recursively replace parent since immutable
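
The tuple case is where recursion comes in. This is a rough sketch, not the post's implementation: it uses the standard library's `gc.get_referrers()` in place of Guppy to find containers, handles only lists, dicts, and tuples, and `replace_everywhere` is a hypothetical name. A tuple holding the old object is rebuilt, and the old tuple is then replaced wherever *it* is referenced:

```python
import gc


def replace_everywhere(old, new):
    """Hypothetical sketch: swap old for new in GC-tracked lists, dicts, tuples.

    Tuples are immutable, so a tuple holding old is rebuilt, and then the
    old tuple is itself replaced in its own referrers, recursively.
    """
    for ref in gc.get_referrers(old):
        if isinstance(ref, list):
            for i, item in enumerate(ref):
                if item is old:
                    ref[i] = new
        elif isinstance(ref, dict):
            for key, value in list(ref.items()):
                if value is old:
                    ref[key] = new
        elif isinstance(ref, tuple):
            rebuilt = tuple(new if item is old else item for item in ref)
            replace_everywhere(ref, rebuilt)


class Token(object):
    pass


old_obj, new_obj = Token(), Token()
nested = [(1, (old_obj, 2))]
box = {"t": (old_obj,)}

replace_everywhere(old_obj, new_obj)
assert nested[0][1][0] is new_obj
assert box["t"][0] is new_obj
```

When this runs at module level, the module's globals dictionary is one of the referrers too, so the name `old_obj` itself gets rebound to `new_obj`, which is the "find every reference" behavior described at the start.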

### Bound methods

note that built-in methods and regular methods have different underlying C
structs, but have the same offsets for their self field
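
The note above concerns the method's C struct directly. As a simpler point of comparison, and a different technique, this sketch builds a *new* bound method around the same function with `types.MethodType`. Because that yields a new object, existing references to the old bound method would still need replacing in turn. The class names are hypothetical:

```python
import types


class Old(object):
    name = "old"

    def greet(self):
        return "hello from " + self.name


class New(Old):
    name = "new"


old_obj, new_obj = Old(), New()
bound = old_obj.greet          # a bound method holding old_obj as self

# __self__ is read-only from Python, so wrap the same underlying function
# in a fresh bound method whose self is new_obj.
rebound = types.MethodType(bound.__func__, new_obj)
assert rebound.__self__ is new_obj
print(rebound())  # hello from new
```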

### Closure cells

function closures
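
On Python 3.7 and later, a closure cell's contents can be rebound from pure Python, which makes the closure case easy to sketch. That is an assumption about the runtime: on CPython 2.7, the target of this post, `cell_contents` is read-only, so this shortcut doesn't apply there:

```python
import sys


def wrapper(L):
    def inner():
        return L.pop()
    return inner


old_list, new_list = [1, 2, 3], [7, 8, 9]
d = wrapper(old_list)

# Assigning to cell_contents needs Python 3.7+; CPython 2.7 rejects it.
assert sys.version_info >= (3, 7), "cell_contents is read-only before 3.7"

for cell in d.__closure__ or ():
    if cell.cell_contents is old_list:
        cell.cell_contents = new_list

assert d() == 9            # inner() now pops from new_list
assert old_list == [1, 2, 3] and new_list == [7, 8]
```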

### Frames

...

### Slots

...

### Classes

...

### Other cases

Certainly, not every case is handled above, but it seems to cover the vast
majority of cases that I've found through testing. There are a number of
reference relations in Guppy that I couldn't figure out how to replicate
without doing something insane (`R_HASATTR`, `R_CELL`, and `R_STACK`), so some
obscure replacements are likely unimplemented.

Some other replacements are known to be impossible. For example, replacing a
class object that uses `__slots__` with another class will not work if the
replacement class has a different slot layout and instances of the old class
exist. More generally, replacing a class with a non-class object won't work if
instances of the class exist. Furthermore, references stored in data
structures managed by C extensions cannot be changed, since there's no good way
for us to track these.
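
Both class-related limits show up in CPython's rules for assigning to an instance's `__class__`, one plausible way to turn instances of `A` into instances of `B` (an assumption; the post doesn't spell out its mechanism). In this sketch, swapping between two plain classes succeeds, while a `__slots__` layout mismatch or a non-class value raises `TypeError`:

```python
class A(object):
    pass


class B(object):
    pass


class Slotted(object):
    __slots__ = ("x",)   # no per-instance __dict__, so a different layout


obj = A()
# A and B instances share the same memory layout, so CPython allows this.
obj.__class__ = B
assert isinstance(obj, B) and not isinstance(obj, A)

refused = []
try:
    Slotted().__class__ = B    # layouts differ: rejected
except TypeError as exc:
    refused.append(str(exc))
try:
    obj.__class__ = 42         # not a class at all: rejected
except TypeError as exc:
    refused.append(str(exc))

print(len(refused))  # 2
```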

Remaining areas to explore include behavior when metaclasses and more complex
descriptors are involved. Implementing a more complete version of `replace()`
is left as an exercise for the reader.

## Notes

1. <a id="fn1" href="#ref1">^</a> This post relies _heavily_ on implementation
   details of CPython 2.7. While it could be adapted for Python 3 by examining
   changes to the internal structures of objects that we used above, that would
   be a lost cause if you wanted to replicate this on
   [Jython](http://www.jython.org/) or some other implementation. We are so
   dependent on concepts specific to CPython that you would need to start from
   scratch, beginning with an implementation-specific replacement for Guppy.
2. <a id="fn2" href="#ref2">^</a> The
   [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language))
   used to generate graphs in this post are
   [available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6).