From 63ef371c55efbe26897bb6191d28129c3a5651f2 Mon Sep 17 00:00:00 2001 From: Ben Kurtovic Date: Fri, 14 Nov 2014 21:49:17 -0600 Subject: [PATCH] Finish section on using Guppy. --- _drafts/python-object-replacement.md | 254 ++++++++++++++++++++++++++++++++++- 1 file changed, 248 insertions(+), 6 deletions(-) diff --git a/_drafts/python-object-replacement.md b/_drafts/python-object-replacement.md index 2a3bfb6..0db4f0b 100644 --- a/_drafts/python-object-replacement.md +++ b/_drafts/python-object-replacement.md @@ -140,7 +140,7 @@ False {% endhighlight %} Also, this won't work if `A` and `B` are different sizes, since we will be -either reading from or writing to memory we don't necessarily own: +either reading from or writing to memory that we don't necessarily own: {% highlight pycon %} @@ -192,21 +192,259 @@ delving into the actual problem. ### Feature demonstration -Guppy's interface is deceptively simple. We begin by creating an instance of -the Heapy interface, which is the component of Guppy that has the features we -want: +Guppy's interface is deceptively simple. We begin by calling +[`guppy.hpy()`](http://guppy-pe.sourceforge.net/guppy.html#kindnames.guppy.hpy), +to expose the Heapy interface, which is the component of Guppy that has the +features we want: {% highlight pycon %} >>> import guppy >>> hp = guppy.hpy() +>>> hp +Top level interface to Heapy. +Use eg: hp.doc for more info on hp. {% endhighlight %} -[...] +Calling +[`hp.heap()`](http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.heap) +shows us a table of the objects known to Guppy, grouped together +(mathematically speaking, +[_partitioned_](https://en.wikipedia.org/wiki/Partition_of_a_set)) by +type[3] and sorted by how much space +they take up in memory: + +{% highlight pycon %} + +>>> heap = hp.heap() +>>> heap +Partition of a set of 45761 objects. Total size = 4699200 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 15547 34 1494736 32 1494736 32 str + 1 8356 18 770272 16 2265008 48 tuple + 2 346 1 452080 10 2717088 58 dict (no owner) + 3 13685 30 328440 7 3045528 65 int + 4 71 0 221096 5 3266624 70 dict of module + 5 1652 4 211456 4 3478080 74 types.CodeType + 6 199 0 210856 4 3688936 79 dict of type + 7 1614 4 193680 4 3882616 83 function + 8 199 0 177008 4 4059624 86 type + 9 124 0 135328 3 4194952 89 dict of class +<91 more rows. Type e.g. '_.more' to view.> + +{% endhighlight %} + +This object (called an +[`IdentitySet`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet)) +looks bizarre, but it can be treated roughly like a list. If we want to take a +look at strings, we can do `heap[0]`: + +{% highlight pycon %} + +>>> heap[0] +Partition of a set of 22606 objects. Total size = 2049896 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 22606 100 2049896 100 2049896 100 str + +{% endhighlight %} + +This isn't very useful, though. What we really want to do is re-partition this +subset by another relationship. There are a number of options: + +{% highlight pycon %} + +>>> heap[0].byid # Group by object ID; each subset therefore has one element +Set of 22606 objects. Total size = 2049896 bytes. + Index Size % Cumulative % Representation (limited) + 0 7480 0.4 7480 0.4 'The class Bi... copy of S.\n' + 1 4872 0.2 12352 0.6 "Support for ... 'error'.\n\n" + 2 4760 0.2 17112 0.8 'Heap queues\...at Art! :-)\n' + 3 4760 0.2 21872 1.1 'Heap queues\...at Art! :-)\n' + 4 3896 0.2 25768 1.3 'This module ...ng function\n' + 5 3824 0.2 29592 1.4 'The type of ...call order.\n' + 6 3088 0.2 32680 1.6 't\x00\x00|\x...x00|\x02\x00S' + 7 2992 0.1 35672 1.7 'HeapView(roo... size, etc.\n' + 8 2808 0.1 38480 1.9 'Directory tr...ories\n\n ' + 9 2640 0.1 41120 2.0 'The class No... otherwise.\n' +<22596 more rows. Type e.g. '_.more' to view.> + +{% endhighlight %} + +{% highlight pycon %} + +>>> heap[0].byrcs # Group by what types of objects reference the strings +Partition of a set of 22606 objects. Total size = 2049896 bytes. + Index Count % Size % Cumulative % Referrers by Kind (class / dict of class) + 0 6146 27 610752 30 610752 30 types.CodeType + 1 5304 23 563984 28 1174736 57 tuple + 2 4104 18 237536 12 1412272 69 dict (no owner) + 3 1959 9 139880 7 1552152 76 list + 4 564 2 136080 7 1688232 82 function, tuple + 5 809 4 97896 5 1786128 87 dict of module + 6 346 2 71760 4 1857888 91 dict of type + 7 365 2 19408 1 1877296 92 dict of module, tuple + 8 192 1 16176 1 1893472 92 dict (no owner), list + 9 232 1 11784 1 1905256 93 dict of class, function, tuple, types.CodeType +<229 more rows. Type e.g. '_.more' to view.> + +{% endhighlight %} + +{% highlight pycon %} + +>>> heap[0].byvia # Group by how the strings are related to their referrers +Partition of a set of 22606 objects. Total size = 2049896 bytes. + Index Count % Size % Cumulative % Referred Via: + 0 2656 12 420456 21 420456 21 '[0]' + 1 2095 9 259008 13 679464 33 '.co_code' + 2 2095 9 249912 12 929376 45 '.co_filename' + 3 564 2 136080 7 1065456 52 '.func_doc', '[0]' + 4 243 1 103528 5 1168984 57 "['__doc__']" + 5 1930 9 100584 5 1269568 62 '.co_lnotab' + 6 502 2 31128 2 1300696 63 '[1]' + 7 306 1 16272 1 1316968 64 '[2]' + 8 242 1 12960 1 1329928 65 '[3]' + 9 184 1 9872 0 1339800 65 '[4]' +<7323 more rows. Type e.g. '_.more' to view.> + +{% endhighlight %} + +From this, we can see that the plurality of memory devoted to strings is taken +up by those referenced by code objects (`types.CodeType` represents +Python code—accessible from a non-C-defined function through +`func.func_code`—and contains things like the names of its local variables and +the actual sequence of opcodes that make it up). + +For fun, let's pick a random string. + +{% highlight pycon %} + +>>> import random +>>> obj = heap[0].byid[random.randrange(0, heap[0].count)] +>>> obj +Set of 1 object. Total size = 176 bytes. + Index Size % Cumulative % Representation (limited) + 0 176 100.0 176 100.0 'Define names...not listed.\n' + +{% endhighlight %} + +Interesting. Since this heap subset contains only one element, we can use +[`.theone`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySetSingleton.theone) +to get the actual object represented here: + +{% highlight pycon %} + +>>> obj.theone +'Define names for all type symbols known in the standard interpreter.\n\nTypes that are part of optional modules (e.g. array) are not listed.\n' + +{% endhighlight %} + +Looks like the docstring for the +[`types`](https://docs.python.org/2/library/types.html) module. We can confirm +by using +[`.referrers`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet.referrers) +to get the set of objects that refer to objects in the given set: + +{% highlight pycon %} + +>>> obj.referrers +Partition of a set of 1 object. Total size = 3352 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 1 100 3352 100 3352 100 dict of module + +{% endhighlight %} + +This is `types.__dict__` (since the docstring we got is actually stored as +`types.__dict__["__doc__"]`), so if we use `.referrers` again: + +{% highlight pycon %} + +>>> obj.referrers.referrers +Partition of a set of 1 object. Total size = 56 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 1 100 56 100 56 100 module +>>> obj.referrers.referrers.theone + +>>> import types +>>> types.__doc__ is obj.theone +True + +{% endhighlight %} + +_But why did we find an object in the `types` module if we never imported it?_ +Well, let's see. We can use +[`hp.iso()`](http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.iso) +to get the Heapy set consisting of a single given object: + +{% highlight pycon %} + +>>> hp.iso(types) +Partition of a set of 1 object. Total size = 56 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 1 100 56 100 56 100 module + +{% endhighlight %} + +Using a similar procedure as before, we see that `types` is imported by the +[`traceback`](https://docs.python.org/2/library/traceback.html) module: + +{% highlight pycon %} + +>>> hp.iso(types).referrers +Partition of a set of 10 objects. Total size = 25632 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 2 20 13616 53 13616 53 dict (no owner) + 1 5 50 9848 38 23464 92 dict of module + 2 1 10 1048 4 24512 96 dict of guppy.etc.Glue.Interface + 3 1 10 1048 4 25560 100 dict of guppy.etc.Glue.Share + 4 1 10 72 0 25632 100 tuple +>>> hp.iso(types).referrers[1].byid +Set of 5 objects. Total size = 9848 bytes. + Index Size % Cumulative % Owner Name + 0 3352 34.0 3352 34.0 traceback + 1 3352 34.0 6704 68.1 warnings + 2 1048 10.6 7752 78.7 __main__ + 3 1048 10.6 8800 89.4 abc + 4 1048 10.6 9848 100.0 guppy.etc.Glue + +{% endhighlight %} + +...and that is imported by +[`site`](https://docs.python.org/2/library/site.html): + +{% highlight pycon %} + +>>> import traceback +>>> hp.iso(traceback).referrers +Partition of a set of 3 objects. Total size = 15992 bytes. + Index Count % Size % Cumulative % Kind (class / dict of class) + 0 1 33 12568 79 12568 79 dict (no owner) + 1 1 33 3352 21 15920 100 dict of module + 2 1 33 72 0 15992 100 tuple +>>> hp.iso(traceback).referrers[1].byid +Set of 1 object. Total size = 3352 bytes. + Index Size % Cumulative % Owner Name + 0 3352 100.0 3352 100.0 site + +{% endhighlight %} + +Since `site` is imported by Python on startup, we've figured out why objects +from `types` exist, even though we've never used them. + +We've learned something important, too. When objects are stored as ordinary +attributes of a parent object (like `types.__doc__`, `traceback.types`, and +`site.traceback` from above), they are not referenced directly by the parent +object, but by that object's `__dict__` attribute. Therefore, if we want to +replace `A` with `B` and `A` is an attribute of `C`, we (probably) don't need +to know anything special about `C`—just how to modify dictionaries. + +A good Guppy/Heapy tutorial, while a bit old and incomplete, can be found on +[Andrey Smirnov's website](http://smira.ru/wp-content/uploads/2011/08/heapy.html). ## Handling different reference types +[...] + ### Dictionaries dicts, class attributes via `__dict__`, locals() @@ -260,7 +498,7 @@ Remaining areas to explore include behavior when metaclasses and more complex descriptors are involved. Implementing a more complete version of `replace()` is left as an exercise for the reader. -## Notes +## Footnotes 1. ^ This post relies _heavily_ on implementation details of CPython 2.7. While it could be adapted for Python 3 by examining @@ -274,3 +512,7 @@ is left as an exercise for the reader. [DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) used to generate graphs in this post are [available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6). + +3. ^ They're actually grouped together by _clodo_ + ("class or dict object"), which is similar to type, but groups `__dict__`s + separately by their owner's type.