|
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111 |
- mwparserfromhell
- ========================
-
- **mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
- that provides an easy-to-use and outrageously powerful parser for MediaWiki_
- wikicode.
-
- Developed by Earwig_ and named by `Σ`_.
-
- Installation
- ------------
-
- The easiest way to install the parser is through the `Python Package Index`_,
- so you can install the latest release with ``pip install mwparserfromhell``
- (`get pip`_). Alternatively, get the latest development version::
-
- git clone git://github.com/earwig/mwparserfromhell.git mwparserfromhell
- cd mwparserfromhell
- python setup.py install
-
- You can run the comprehensive unit testing suite with ``python setup.py test``.
-
- Usage
- -----
-
- Normal usage is rather straightforward (where ``text`` is page text)::
-
- >>> import mwparserfromhell
- >>> wikicode = mwparserfromhell.parse(text)
-
- ``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
- ordinary unicode object. It also contains a list of nodes representing the
- components of the wikicode, including ordinary text nodes, templates, and
- links. For example::
-
- >>> wikicode = mwparserfromhell.parse(u"{{foo|bar|baz|eggs=spam}}")
- >>> print wikicode
- u"{{foo|bar|baz|eggs=spam}}"
- >>>
-
-
- [Template(name="foo", params={"1": "bar", "2": "baz", "eggs": "spam"})]
- >>> template = templates[0]
- >>> print template.name
- foo
- >>> print template.params
- ['bar', 'baz']
- >>> print template[0]
- bar
- >>> print template["eggs"]
- spam
- >>> print template.render()
- {{foo|bar|baz|eggs=spam}}
-
- If ``get``\ 's argument is a number *n*, it'll return the *n*\ th parameter,
- otherwise it will return the parameter with the given name. Unnamed parameters
- are given numerical names starting with 1, so ``{{foo|bar}}`` is the same as
- ``{{foo|1=bar}}``, and ``templates[0].get(0) is templates[0].get("1")``.
-
- By default, nested templates are supported like so::
-
- >>> templates = parser.parse("{{foo|this {{includes a|template}}}}")
- >>> print templates
- [Template(name="foo", params={"1": "this {{includes a|template}}"})]
- >>> print templates[0].get(0)
- this {{includes a|template}}
- >>> print templates[0].get(0).templates
- [Template(name="includes a", params={"1": "template"})]
- >>> print templates[0].get(0).templates[0].params[0]
- template
-
- Integration
- -----------
-
- ``mwparserfromhell`` is used by and originally developed for EarwigBot_;
- ``Page`` objects have a ``parse_templates`` method that essentially calls
- ``Parser().parse()`` on ``page.get()``.
-
- If you're using PyWikipedia_, your code might look like this::
-
- import mwparserfromhell
- import wikipedia as pywikibot
- def parse_templates(title):
- site = pywikibot.get_site()
- page = pywikibot.Page(site, title)
- text = page.get()
- parser = mwparserfromhell.Parser()
- return parser.parse(text)
-
- If you're not using a library, you can parse templates in any page using the
- following code (via the API_)::
-
- import json
- import urllib
- import mwparserfromhell
- API_URL = "http://en.wikipedia.org/w/api.php"
- def parse_templates(title):
- raw = urllib.urlopen(API_URL, data).read()
- res = json.loads(raw)
- text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
- parser = mwparserfromhell.Parser()
- return parser.parse(text)
-
- .. _MediaWiki: http://mediawiki.org
- .. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
- .. _Σ: http://en.wikipedia.org/wiki/User:Σ
- .. _Python Package Index: http://pypi.python.org
- .. _get pip: http://pypi.python.org/pypi/pip
- .. _EarwigBot: https://github.com/earwig/earwigbot
- .. _PyWikipedia: http://pywikipediabot.sourceforge.net/
- .. _API: http://mediawiki.org/wiki/API
|