mwparserfromhell

A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/
Ben Kurtovic ce8adf4b2e Initial commit (12 years ago)

mwtemplateparserfromhell
========================

**mwtemplateparserfromhell** (the *MediaWiki Template Parser from Hell*) is a
Python package that provides an easy-to-use and outrageously powerful template
parser for MediaWiki_ wikicode.

Coded by Earwig_ and named by `Σ`_.

Installation
------------

The easiest way to install the parser is from the `Python Package Index`_:
``pip install mwtemplateparserfromhell`` installs the latest release
(`get pip`_). Alternatively, get the latest development version::

    git clone https://github.com/earwig/mwtemplateparserfromhell.git mwtemplateparserfromhell
    cd mwtemplateparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with ``python setup.py test``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwtemplateparserfromhell
    >>> parser = mwtemplateparserfromhell.Parser()
    >>> templates = parser.parse(text)

``templates`` is a list of ``mwtemplateparserfromhell.Template`` objects, each
of which has a ``name`` attribute, a ``params`` attribute, and a ``get()``
method. For example::

    >>> templates = parser.parse("{{foo|bar|baz|eggs=spam}}")
    >>> print templates
    [Template(name="foo", params={"1": "bar", "2": "baz", "eggs": "spam"})]
    >>> print templates[0].name
    foo
    >>> print templates[0].params
    ['bar', 'baz']
    >>> print templates[0].get(0)
    bar
    >>> print templates[0].get("eggs")
    spam

If the argument to ``get()`` is an integer *n*, it returns the parameter at
position *n*, counting from zero; otherwise it returns the parameter with the
given name. Unnamed parameters are given numerical names starting with 1, so
``{{foo|bar}}`` is the same as ``{{foo|1=bar}}``, and
``templates[0].get(0) is templates[0].get("1")``.
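
The lookup rule can be pictured with a small stand-in class. This is only a
sketch of the behaviour described above, not the package's implementation:
``TemplateSketch`` and its internals are hypothetical, and it assumes that an
integer *n* is simply shorthand for the name ``str(n + 1)``::

    class TemplateSketch(object):
        """Hypothetical stand-in that mimics the documented get() lookup."""

        def __init__(self, name, unnamed=(), named=None):
            self.name = name
            # Unnamed parameters receive the names "1", "2", ... in order.
            self._by_name = dict(
                (str(i), value) for i, value in enumerate(unnamed, 1))
            # Explicitly named parameters are stored under their own names.
            self._by_name.update(named or {})

        def get(self, key):
            # Assumption: integer n is shorthand for the name str(n + 1).
            if isinstance(key, int):
                key = str(key + 1)
            return self._by_name[key]


    # Models {{foo|bar|baz|eggs=spam}} from the session above.
    foo = TemplateSketch("foo", ["bar", "baz"], {"eggs": "spam"})
    assert foo.get(0) is foo.get("1")
    assert foo.get("eggs") == "spam"

Storing everything by name keeps the two spellings of a lookup pointing at the
same object, which is what the ``is`` comparison relies on.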

By default, nested templates are supported like so::

    >>> templates = parser.parse("{{foo|this {{includes a|template}}}}")
    >>> print templates
    [Template(name="foo", params={"1": "this {{includes a|template}}"})]
    >>> print templates[0].get(0)
    this {{includes a|template}}
    >>> print templates[0].get(0).templates
    [Template(name="includes a", params={"1": "template"})]
    >>> print templates[0].get(0).templates[0].params[0]
    template
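
Because nested templates hang off the parameter values, collecting every
template on a page takes a recursive walk. The helper below is a sketch rather
than part of the package: ``iter_templates`` is a hypothetical name, and it
assumes only what the session above shows, namely that ``params`` is a sequence
of values and that a value containing templates exposes them through a
``templates`` attribute::

    def iter_templates(templates):
        """Yield each template, then any templates nested in its parameters."""
        for template in templates:
            yield template
            for value in template.params:
                # Values without a "templates" attribute are treated as plain.
                nested = getattr(value, "templates", [])
                for inner in iter_templates(nested):
                    yield inner

Under those assumptions, ``[t.name for t in iter_templates(templates)]`` on the
example above would be expected to list ``foo`` followed by ``includes a``.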

Integration
-----------

``mwtemplateparserfromhell`` is used by and originally developed for
EarwigBot_; ``Page`` objects have a ``parse_templates`` method that essentially
calls ``Parser().parse()`` on ``page.get()``.

If you're using PyWikipedia_, your code might look like this::

    import mwtemplateparserfromhell
    import wikipedia as pywikibot

    def parse_templates(title):
        # Fetch the page's wikitext through PyWikipedia, then parse it.
        site = pywikibot.getSite()
        page = pywikibot.Page(site, title)
        text = page.get()
        parser = mwtemplateparserfromhell.Parser()
        return parser.parse(text)

If you're not using a library, you can parse templates in any page using the
following code (via the API_)::

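    import json
    import urllib
    import mwtemplateparserfromhell

    API_URL = "https://en.wikipedia.org/w/api.php"

    def parse_templates(title):
        # Ask the API for the latest revision's wikitext, as JSON.
        data = {"action": "query", "prop": "revisions", "rvprop": "content",
                "rvlimit": 1, "titles": title, "format": "json"}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        # "pages" is keyed by page ID; take the only entry (Python 2 list).
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        parser = mwtemplateparserfromhell.Parser()
        return parser.parse(text)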
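
On Python 3, ``urllib.urlopen`` has moved to ``urllib.request`` and
``dict.values()`` can no longer be indexed. The following is a sketch of the
two network-free pieces, building the POST body and pulling the wikitext out
of the response. ``build_query`` and ``extract_text`` are hypothetical helper
names, and the parameters are the standard ``action=query`` /
``prop=revisions`` arguments of the MediaWiki API::

    import json
    from urllib.parse import urlencode


    def build_query(title):
        """Return the URL-encoded POST body asking for a page's wikitext."""
        params = {
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "rvlimit": 1,
            "titles": title,
            "format": "json",
        }
        return urlencode(params).encode("ascii")


    def extract_text(raw):
        """Pull the wikitext of the first page out of a raw JSON response."""
        res = json.loads(raw)
        # "pages" is keyed by page ID, which is not known in advance.
        page = next(iter(res["query"]["pages"].values()))
        return page["revisions"][0]["*"]

The body could then be sent with
``urllib.request.urlopen(API_URL, build_query(title)).read()`` and the result
passed through ``extract_text`` before it is handed to the parser.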

.. _MediaWiki:            http://mediawiki.org
.. _Earwig:               http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ:                    http://en.wikipedia.org/wiki/User:Σ
.. _Python Package Index: http://pypi.python.org
.. _get pip:              http://pypi.python.org/pypi/pip
.. _EarwigBot:            https://github.com/earwig/earwigbot
.. _PyWikipedia:          http://pywikipediabot.sourceforge.net/
.. _API:                  http://mediawiki.org/wiki/API