A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/
Ben Kurtovic 33acb3eea3 Starting main parser: Parser, Tokens, Tokenizer, Builder, BuildStack. 12 years ago
mwparserfromhell Starting main parser: Parser, Tokens, Tokenizer, Builder, BuildStack. 12 years ago
tests Update copyright notice; some additions. 12 years ago
.gitignore Initial commit 12 years ago
LICENSE Update copyright notice; some additions. 12 years ago
README.rst Updating with a more logical project structure. 12 years ago
setup.py Update copyright notice; some additions. 12 years ago

README.rst

mwparserfromhell
========================

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode.

Developed by Earwig_ and named by `Σ`_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_:
install the latest release with ``pip install mwparserfromhell``
(`get pip`_). Alternatively, get the latest development version::

    git clone git://github.com/earwig/mwparserfromhell.git mwparserfromhell
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with ``python setup.py test``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary unicode object. It also contains a list of nodes representing the
components of the wikicode, including ordinary text nodes, templates, and
links. For example::

    >>> wikicode = mwparserfromhell.parse(u"{{foo|bar|baz|eggs=spam}}")
    >>> print wikicode
    {{foo|bar|baz|eggs=spam}}
    >>> templates = wikicode.filter_templates()
    >>> print templates
    [Template(name="foo", params={"1": "bar", "2": "baz", "eggs": "spam"})]
    >>> template = templates[0]
    >>> print template.name
    foo
    >>> print template.params
    ['bar', 'baz']
    >>> print template[0]
    bar
    >>> print template["eggs"]
    spam
    >>> print template.render()
    {{foo|bar|baz|eggs=spam}}

If ``get``\ 's argument is a number *n*, it returns the *n*\ th parameter;
otherwise, it returns the parameter with the given name. Unnamed parameters
are given numerical names starting with 1, so ``{{foo|bar}}`` is the same as
``{{foo|1=bar}}``, and ``templates[0].get(0) is templates[0].get("1")``.
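
For instance, following the equivalence described above (a short sketch; the
exact printed output may vary slightly)::

    >>> templates = mwparserfromhell.Parser().parse("{{foo|bar}}")
    >>> templates[0].get(0) is templates[0].get("1")
    True
    >>> print templates[0].get("1")
    bar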

By default, nested templates are supported like so::

    >>> parser = mwparserfromhell.Parser()
    >>> templates = parser.parse("{{foo|this {{includes a|template}}}}")
    >>> print templates
    [Template(name="foo", params={"1": "this {{includes a|template}}"})]
    >>> print templates[0].get(0)
    this {{includes a|template}}
    >>> print templates[0].get(0).templates
    [Template(name="includes a", params={"1": "template"})]
    >>> print templates[0].get(0).templates[0].params[0]
    template
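
Note that ``get(0)`` here returns the value of the first parameter, which in
turn exposes its own ``templates`` list for anything nested inside it.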

Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse_templates`` method that essentially calls
``Parser().parse()`` on ``page.get()``.
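
In other words, such a method boils down to something like the following
(a minimal sketch based on the description above, not EarwigBot's actual
code)::

    import mwparserfromhell

    def parse_templates(page):
        # ``page`` is assumed to be an EarwigBot-style Page object whose
        # get() method returns the page's wikitext.
        return mwparserfromhell.Parser().parse(page.get())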

If you're using PyWikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse_templates(title):
        site = pywikibot.getSite()
        page = pywikibot.Page(site, title)
        text = page.get()
        parser = mwparserfromhell.Parser()
        return parser.parse(text)
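
The result can then be treated like any other list of templates; for example
(using a hypothetical page title)::

    for template in parse_templates("Example"):
        print template.name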

If you're not using a library, you can parse templates in any page using the
following code (via the API_)::

    import json
    import urllib

    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse_templates(title):
        # Request the raw wikitext of the page's latest revision as JSON.
        data = {"action": "query", "prop": "revisions", "rvprop": "content",
                "rvlimit": 1, "format": "json", "titles": title}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        parser = mwparserfromhell.Parser()
        return parser.parse(text)
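
The request asks the API for the raw wikitext of the page's latest revision in
JSON format; once the text is pulled out of the response, parsing proceeds
exactly as in the PyWikipedia example above.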

.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:Σ
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _PyWikipedia: http://pywikipediabot.sourceforge.net/
.. _API: http://mediawiki.org/wiki/API