mwparserfromhell
========================

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode.

Developed by Earwig_ and named by `Σ`_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_:
install the latest release with ``pip install mwparserfromhell`` (`get pip`_).
Alternatively, get the latest development version::

    git clone git://github.com/earwig/mwparserfromhell.git mwparserfromhell
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with ``python setup.py test``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary unicode object. It also contains a list of nodes representing the
components of the wikicode, including ordinary text nodes, templates, and
links. For example::

    >>> wikicode = mwparserfromhell.parse(u"{{foo|bar|baz|eggs=spam}}")
    >>> print wikicode
    {{foo|bar|baz|eggs=spam}}
    >>> templates = wikicode.filter_templates()
    >>> print templates
    [Template(name="foo", params={"1": "bar", "2": "baz", "eggs": "spam"})]
    >>> template = templates[0]
    >>> print template.name
    foo
    >>> print template.params
    ['bar', 'baz']
    >>> print template.get(0)
    bar
    >>> print template.get("eggs")
    spam
    >>> print template.render()
    {{foo|bar|baz|eggs=spam}}

If ``get``\ 's argument is a number *n*, it returns the *n*\ th parameter;
otherwise, it returns the parameter with the given name. Unnamed parameters
are given numerical names starting with 1, so ``{{foo|bar}}`` is the same as
``{{foo|1=bar}}``, and ``templates[0].get(0) is templates[0].get("1")``.
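
For example, a short sketch of that equivalence, using the ``Parser``
interface that appears elsewhere in this document (the exact output is
illustrative)::

    >>> templates = mwparserfromhell.Parser().parse(u"{{foo|bar}}")
    >>> print templates[0].get(0)
    bar
    >>> templates[0].get(0) is templates[0].get("1")
    True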

By default, nested templates are supported like so::

    >>> parser = mwparserfromhell.Parser()
    >>> templates = parser.parse("{{foo|this {{includes a|template}}}}")
    >>> print templates
    [Template(name="foo", params={"1": "this {{includes a|template}}"})]
    >>> print templates[0].get(0)
    this {{includes a|template}}
    >>> print templates[0].get(0).templates
    [Template(name="includes a", params={"1": "template"})]
    >>> print templates[0].get(0).templates[0].params[0]
    template

Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse_templates`` method that essentially calls
``Parser().parse()`` on ``page.get()``.
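
In other words, that method amounts to something like the following sketch
(not EarwigBot's actual source; ``page`` stands in for an EarwigBot ``Page``
object)::

    import mwparserfromhell

    def parse_templates(page):
        # Fetch the page's current wikitext and hand it to the parser.
        text = page.get()
        return mwparserfromhell.Parser().parse(text)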

If you're using PyWikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse_templates(title):
        site = pywikibot.getSite()
        page = pywikibot.Page(site, title)
        text = page.get()
        parser = mwparserfromhell.Parser()
        return parser.parse(text)

If you're not using a library, you can parse templates in any page using the
following code (via the API_)::

    import json
    import urllib
    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse_templates(title):
        # Request the latest revision's wikitext for the given page title.
        data = {"action": "query", "prop": "revisions", "rvprop": "content",
                "rvlimit": 1, "format": "json", "titles": title}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        parser = mwparserfromhell.Parser()
        return parser.parse(text)
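
The helper returns the same kind of template list as before, so you might
inspect a live page along these lines (the title is just an example, and the
output naturally depends on the page)::

    >>> templates = parse_templates("Python (programming language)")
    >>> for template in templates:
    ...     print template.name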

.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:Σ
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _PyWikipedia: http://pywikipediabot.sourceforge.net/
.. _API: http://mediawiki.org/wiki/API