Finish section on using Guppy.

master
Ben Kurtovic, 9 years ago
parent
commit 63ef371c55
1 changed file with 248 additions and 6 deletions
_drafts/python-object-replacement.md (+248, -6)

@@ -140,7 +140,7 @@ False
{% endhighlight %}

Also, this won't work if `A` and `B` are different sizes, since we will be
either reading from or writing to memory we don't necessarily own:
either reading from or writing to memory that we don't necessarily own:

{% highlight pycon %}

@@ -192,21 +192,259 @@ delving into the actual problem.

### Feature demonstration

Guppy's interface is deceptively simple. We begin by creating an instance of
the Heapy interface, which is the component of Guppy that has the features we
want:
Guppy's interface is deceptively simple. We begin by calling
[`guppy.hpy()`](http://guppy-pe.sourceforge.net/guppy.html#kindnames.guppy.hpy)
to expose the Heapy interface, which is the component of Guppy that has the
features we want:

{% highlight pycon %}

>>> import guppy
>>> hp = guppy.hpy()
>>> hp
Top level interface to Heapy.
Use eg: hp.doc for more info on hp.

{% endhighlight %}

[...]
Calling
[`hp.heap()`](http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.heap)
shows us a table of the objects known to Guppy, grouped together
(mathematically speaking,
[_partitioned_](https://en.wikipedia.org/wiki/Partition_of_a_set)) by
type<sup><a id="ref3" href="#fn3">[3]</a></sup> and sorted by how much space
they take up in memory:

{% highlight pycon %}

>>> heap = hp.heap()
>>> heap
Partition of a set of 45761 objects. Total size = 4699200 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 15547 34 1494736 32 1494736 32 str
1 8356 18 770272 16 2265008 48 tuple
2 346 1 452080 10 2717088 58 dict (no owner)
3 13685 30 328440 7 3045528 65 int
4 71 0 221096 5 3266624 70 dict of module
5 1652 4 211456 4 3478080 74 types.CodeType
6 199 0 210856 4 3688936 79 dict of type
7 1614 4 193680 4 3882616 83 function
8 199 0 177008 4 4059624 86 type
9 124 0 135328 3 4194952 89 dict of class
<91 more rows. Type e.g. '_.more' to view.>

{% endhighlight %}
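
For intuition about what this table represents, here is a rough standard-library sketch of the same idea: group a collection of objects by type and sort the groups by total size. It is not how Heapy works internally. `partition_by_type` is a hypothetical helper, and `sys.getsizeof` reports shallow sizes only, so the numbers will not match Heapy's.

```python
import gc
import sys
from collections import defaultdict


def partition_by_type(objects):
    """Group objects by type and return (type name, count, total bytes) rows,
    largest total first. Hypothetical helper, not part of Guppy."""
    groups = defaultdict(lambda: [0, 0])  # type -> [count, total size]
    for obj in objects:
        entry = groups[type(obj)]
        entry[0] += 1
        entry[1] += sys.getsizeof(obj, 0)  # shallow size; 0 if unavailable
    rows = [(t.__name__, count, size) for t, (count, size) in groups.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# gc.get_objects() only returns objects tracked by the garbage collector,
# so plain strings and ints are missing here, unlike in hp.heap().
for name, count, size in partition_by_type(gc.get_objects())[:5]:
    print("%-20s %8d %10d" % (name, count, size))
```

Unlike Heapy, this sketch does not separate `__dict__`s by owner, which is why its rows are coarser than the ones in the table above.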

This object (called an
[`IdentitySet`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet))
looks bizarre, but it can be treated roughly like a list. If we want to take a
look at strings, we can do `heap[0]`:

{% highlight pycon %}

>>> heap[0]
Partition of a set of 22606 objects. Total size = 2049896 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 22606 100 2049896 100 2049896 100 str

{% endhighlight %}

This isn't very useful, though. What we really want to do is re-partition this
subset by another relationship. There are a number of options:

{% highlight pycon %}

>>> heap[0].byid # Group by object ID; each subset therefore has one element
Set of 22606 <str> objects. Total size = 2049896 bytes.
Index Size % Cumulative % Representation (limited)
0 7480 0.4 7480 0.4 'The class Bi... copy of S.\n'
1 4872 0.2 12352 0.6 "Support for ... 'error'.\n\n"
2 4760 0.2 17112 0.8 'Heap queues\...at Art! :-)\n'
3 4760 0.2 21872 1.1 'Heap queues\...at Art! :-)\n'
4 3896 0.2 25768 1.3 'This module ...ng function\n'
5 3824 0.2 29592 1.4 'The type of ...call order.\n'
6 3088 0.2 32680 1.6 't\x00\x00|\x...x00|\x02\x00S'
7 2992 0.1 35672 1.7 'HeapView(roo... size, etc.\n'
8 2808 0.1 38480 1.9 'Directory tr...ories\n\n '
9 2640 0.1 41120 2.0 'The class No... otherwise.\n'
<22596 more rows. Type e.g. '_.more' to view.>

{% endhighlight %}

{% highlight pycon %}

>>> heap[0].byrcs # Group by what types of objects reference the strings
Partition of a set of 22606 objects. Total size = 2049896 bytes.
Index Count % Size % Cumulative % Referrers by Kind (class / dict of class)
0 6146 27 610752 30 610752 30 types.CodeType
1 5304 23 563984 28 1174736 57 tuple
2 4104 18 237536 12 1412272 69 dict (no owner)
3 1959 9 139880 7 1552152 76 list
4 564 2 136080 7 1688232 82 function, tuple
5 809 4 97896 5 1786128 87 dict of module
6 346 2 71760 4 1857888 91 dict of type
7 365 2 19408 1 1877296 92 dict of module, tuple
8 192 1 16176 1 1893472 92 dict (no owner), list
9 232 1 11784 1 1905256 93 dict of class, function, tuple, types.CodeType
<229 more rows. Type e.g. '_.more' to view.>

{% endhighlight %}

{% highlight pycon %}

>>> heap[0].byvia # Group by how the strings are related to their referrers
Partition of a set of 22606 objects. Total size = 2049896 bytes.
Index Count % Size % Cumulative % Referred Via:
0 2656 12 420456 21 420456 21 '[0]'
1 2095 9 259008 13 679464 33 '.co_code'
2 2095 9 249912 12 929376 45 '.co_filename'
3 564 2 136080 7 1065456 52 '.func_doc', '[0]'
4 243 1 103528 5 1168984 57 "['__doc__']"
5 1930 9 100584 5 1269568 62 '.co_lnotab'
6 502 2 31128 2 1300696 63 '[1]'
7 306 1 16272 1 1316968 64 '[2]'
8 242 1 12960 1 1329928 65 '[3]'
9 184 1 9872 0 1339800 65 '[4]'
<7323 more rows. Type e.g. '_.more' to view.>

{% endhighlight %}

From this, we can see that the plurality of memory devoted to strings is taken
up by those referenced by code objects (`types.CodeType` represents
Python code—accessible from a non-C-defined function through
`func.func_code`—and contains things like the names of its local variables and
the actual sequence of opcodes that make it up).
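
To make that concrete, the sketch below inspects the code object of a small function. `add` is a made-up function for illustration. The `__code__` spelling is used because it works on both Python 2.6+ and Python 3, whereas `func_code` exists only on Python 2.

```python
def add(a, b):
    total = a + b
    return total


code = add.__code__          # the same object as add.func_code on Python 2
print(code.co_varnames)      # names of the arguments and local variables
print(code.co_filename)      # file the function was compiled from
print(repr(code.co_code))    # raw bytecode: a str on Python 2, bytes on Python 3
```

These attributes are the same ones that appear as `.co_code` and `.co_filename` in the `byvia` table above.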

For fun, let's pick a random string.

{% highlight pycon %}

>>> import random
>>> obj = heap[0].byid[random.randrange(0, heap[0].count)]
>>> obj
Set of 1 <str> object. Total size = 176 bytes.
Index Size % Cumulative % Representation (limited)
0 176 100.0 176 100.0 'Define names...not listed.\n'

{% endhighlight %}

Interesting. Since this heap subset contains only one element, we can use
[`.theone`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySetSingleton.theone)
to get the actual object represented here:

{% highlight pycon %}

>>> obj.theone
'Define names for all type symbols known in the standard interpreter.\n\nTypes that are part of optional modules (e.g. array) are not listed.\n'

{% endhighlight %}

Looks like the docstring for the
[`types`](https://docs.python.org/2/library/types.html) module. We can confirm
by using
[`.referrers`](http://guppy-pe.sourceforge.net/heapy_UniSet.html#heapykinds.IdentitySet.referrers)
to get the set of objects that refer to objects in the given set:

{% highlight pycon %}

>>> obj.referrers
Partition of a set of 1 object. Total size = 3352 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 100 3352 100 3352 100 dict of module

{% endhighlight %}

This is `types.__dict__` (since the docstring we got is actually stored as
`types.__dict__["__doc__"]`), so if we use `.referrers` again:

{% highlight pycon %}

>>> obj.referrers.referrers
Partition of a set of 1 object. Total size = 56 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 100 56 100 56 100 module
>>> obj.referrers.referrers.theone
<module 'types' from '/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/types.pyc'>
>>> import types
>>> types.__doc__ is obj.theone
True

{% endhighlight %}
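
The same chain can be followed without Guppy by using `gc.get_referrers()` from the standard library, which returns a plain list of referring objects rather than a Heapy set. This is a sketch of an equivalent lookup, not a description of what `.referrers` does internally:

```python
import gc
import types

doc = types.__doc__

# Containers holding a reference to the docstring. The module's __dict__
# should be among them, since the string is stored as
# types.__dict__["__doc__"].
doc_referrers = gc.get_referrers(doc)
print(any(r is types.__dict__ for r in doc_referrers))

# One level up: the module object refers to its own __dict__.
dict_referrers = gc.get_referrers(types.__dict__)
print(any(r is types for r in dict_referrers))
```

The identity checks use `is` rather than `==` because we care about the exact object, not an equal-looking copy.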

_But why did we find an object in the `types` module if we never imported it?_
Well, let's see. We can use
[`hp.iso()`](http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.iso)
to get the Heapy set consisting of a single given object:

{% highlight pycon %}

>>> hp.iso(types)
Partition of a set of 1 object. Total size = 56 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 100 56 100 56 100 module

{% endhighlight %}

Using a procedure similar to the one above, we see that `types` is imported by the
[`traceback`](https://docs.python.org/2/library/traceback.html) module:

{% highlight pycon %}

>>> hp.iso(types).referrers
Partition of a set of 10 objects. Total size = 25632 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 2 20 13616 53 13616 53 dict (no owner)
1 5 50 9848 38 23464 92 dict of module
2 1 10 1048 4 24512 96 dict of guppy.etc.Glue.Interface
3 1 10 1048 4 25560 100 dict of guppy.etc.Glue.Share
4 1 10 72 0 25632 100 tuple
>>> hp.iso(types).referrers[1].byid
Set of 5 <dict of module> objects. Total size = 9848 bytes.
Index Size % Cumulative % Owner Name
0 3352 34.0 3352 34.0 traceback
1 3352 34.0 6704 68.1 warnings
2 1048 10.6 7752 78.7 __main__
3 1048 10.6 8800 89.4 abc
4 1048 10.6 9848 100.0 guppy.etc.Glue

{% endhighlight %}

...and `traceback`, in turn, is imported by
[`site`](https://docs.python.org/2/library/site.html):

{% highlight pycon %}

>>> import traceback
>>> hp.iso(traceback).referrers
Partition of a set of 3 objects. Total size = 15992 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1 33 12568 79 12568 79 dict (no owner)
1 1 33 3352 21 15920 100 dict of module
2 1 33 72 0 15992 100 tuple
>>> hp.iso(traceback).referrers[1].byid
Set of 1 <dict of module> object. Total size = 3352 bytes.
Index Size % Cumulative % Owner Name
0 3352 100.0 3352 100.0 site

{% endhighlight %}

Since `site` is imported by Python on startup, we've figured out why objects
from `types` exist, even though we've never used them.

We've learned something important, too. When objects are stored as ordinary
attributes of a parent object (like `types.__doc__`, `traceback.types`, and
`site.traceback` from above), they are not referenced directly by the parent
object, but by that object's `__dict__` attribute. Therefore, if we want to
replace `A` with `B` and `A` is an attribute of `C`, we (probably) don't need
to know anything special about `C`—just how to modify dictionaries.
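
A minimal sketch of that idea, using only the standard library: find every dict that refers to `A` and swap the value for `B`. The names `replace_in_dicts`, `demo`, and `parent` are hypothetical and exist only for illustration. The sketch handles dictionary values and nothing else (no lists, tuples, or dictionary keys).

```python
import gc
import types


def replace_in_dicts(old, new):
    """Replace `old` with `new` in every dict value that refers to it.
    Hypothetical helper; returns the number of entries replaced."""
    replaced = 0
    for referrer in gc.get_referrers(old):
        if isinstance(referrer, dict):
            # Copy the items first so the dict can be modified in the loop.
            for key, value in list(referrer.items()):
                if value is old:
                    referrer[key] = new
                    replaced += 1
    return replaced


def demo():
    parent = types.ModuleType("parent")  # stand-in for the parent object C
    a, b = ["A"], ["B"]
    parent.attr = a                      # stored in parent.__dict__["attr"]
    count = replace_in_dicts(a, b)
    return parent.attr is b, count


print(demo())
```

A module is used as the parent here because its attributes always live in a real `__dict__`. Instances of ordinary classes may store attributes differently on newer CPython versions, so this sketch makes no claims about them.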

A good Guppy/Heapy tutorial, while a bit old and incomplete, can be found on
[Andrey Smirnov's website](http://smira.ru/wp-content/uploads/2011/08/heapy.html).

## Handling different reference types

[...]

### Dictionaries

dicts, class attributes via `__dict__`, locals()
@@ -260,7 +498,7 @@ Remaining areas to explore include behavior when metaclasses and more complex
descriptors are involved. Implementing a more complete version of `replace()`
is left as an exercise for the reader.

## Notes
## Footnotes

1. <a id="fn1" href="#ref1">^</a> This post relies _heavily_ on implementation
details of CPython 2.7. While it could be adapted for Python 3 by examining
@@ -274,3 +512,7 @@ is left as an exercise for the reader.
[DOT files](https://en.wikipedia.org/wiki/DOT_(graph_description_language))
used to generate graphs in this post are
[available on Gist](https://gist.github.com/earwig/edc13f04f871c110eea6).

3. <a id="fn3" href="#ref3">^</a> They're actually grouped together by _clodo_
("class or dict owner"), which is similar to type, but groups `__dict__`s
separately by their owner's type.
