A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/
Ben Kurtovic 81e5ce30af Working on the framework for the tokenizer, plus some cleanup, fixes. 12 years ago
mwparserfromhell Working on the framework for the tokenizer, plus some cleanup, fixes. 12 years ago
tests Update copyright notice; some additions. 12 years ago
.gitignore Initial commit 12 years ago
LICENSE Update copyright notice; some additions. 12 years ago
README.rst Updating with a more logical project structure. 12 years ago
setup.py Update copyright notice; some additions. 12 years ago

README.rst

mwparserfromhell
========================

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode.

Developed by Earwig_ and named by `Σ`_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_,
so you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). Alternatively, get the latest development version::

    git clone git://github.com/earwig/mwparserfromhell.git mwparserfromhell
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with ``python setup.py test``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary unicode object. It also contains a list of nodes representing the
components of the wikicode, including ordinary text nodes, templates, and
links. For example::

    >>> wikicode = mwparserfromhell.parse(u"{{foo|bar|baz|eggs=spam}}")
    >>> print wikicode
    u"{{foo|bar|baz|eggs=spam}}"
    >>>

    [Template(name="foo", params={"1": "bar", "2": "baz", "eggs": "spam"})]
    >>> template = templates[0]
    >>> print template.name
    foo
    >>> print template.params
    ['bar', 'baz']
    >>> print template[0]
    bar
    >>> print template["eggs"]
    spam
    >>> print template.render()
    {{foo|bar|baz|eggs=spam}}

If the argument to ``get`` is a number *n*, it returns the *n*\ th parameter;
otherwise, it returns the parameter with the given name. Unnamed parameters
are given numerical names starting with 1, so ``{{foo|bar}}`` is the same as
``{{foo|1=bar}}``, and ``templates[0].get(0) is templates[0].get("1")``.
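The lookup rule described above can be sketched in a few lines. This is only
an illustration written for Python 3: ``SketchTemplate`` and ``build`` are
hypothetical names invented for this example and are not part of
``mwparserfromhell``, whose own implementation may differ.

```python
from collections import OrderedDict


class SketchTemplate:
    """Hypothetical stand-in for a parsed template; not the library's class."""

    def __init__(self, name, params):
        self.name = name
        # Keep parameters in source order so positional lookup is well defined.
        self.params = OrderedDict(params)

    def get(self, key):
        if isinstance(key, int):
            # A number n selects the parameter at position n (counting from 0).
            return list(self.params.values())[key]
        # Anything else is treated as a parameter name.
        return self.params[key]


def build(name, *unnamed, **named):
    """Name unnamed parameters "1", "2", ... and append the named ones."""
    params = [(str(i), value) for i, value in enumerate(unnamed, start=1)]
    params.extend(named.items())
    return SketchTemplate(name, params)


template = build("foo", "bar", "baz", eggs="spam")
print(template.get(0))                       # bar
print(template.get("eggs"))                  # spam
print(template.get(0) is template.get("1"))  # True
```

Both lookups return the same stored object, which is why the identity check
holds in this sketch.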

By default, nested templates are supported like so::

    >>> templates = parser.parse("{{foo|this {{includes a|template}}}}")
    >>> print templates
    [Template(name="foo", params={"1": "this {{includes a|template}}"})]
    >>> print templates[0].get(0)
    this {{includes a|template}}
    >>> print templates[0].get(0).templates
    [Template(name="includes a", params={"1": "template"})]
    >>> print templates[0].get(0).templates[0].params[0]
    template
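To see why nesting works out, it can help to look at how matching braces
might be paired up. The following Python 3 sketch uses a simple depth
counter; ``find_templates`` is a hypothetical helper written for this
example, not the parser's actual tokenizer, and it ignores other wikicode
constructs such as triple-brace arguments.

```python
def find_templates(text):
    """Return the source text of each outermost {{...}} span in ``text``."""
    spans = []
    depth = 0      # how many "{{" are currently open
    start = None   # index where the current outermost template began
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(text[start:i + 2])
            i += 2
        else:
            i += 1
    return spans


outer = find_templates("{{foo|this {{includes a|template}}}}")
print(outer)   # ['{{foo|this {{includes a|template}}}}']

# Strip the outer braces and search again to reach the nested template.
inner = find_templates(outer[0][2:-2])
print(inner)   # ['{{includes a|template}}']
```

Searching the contents of a template again, as in the last two lines,
mirrors the way the example above reaches the inner template through the
outer one's parameter.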

Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse_templates`` method that essentially calls
``Parser().parse()`` on ``page.get()``.

If you're using PyWikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse_templates(title):
        site = pywikibot.get_site()
        page = pywikibot.Page(site, title)
        text = page.get()
        parser = mwparserfromhell.Parser()
        return parser.parse(text)

If you're not using a library, you can parse templates in any page using the
following code (via the API_)::

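    import json
    import urllib
    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse_templates(title):
        # Ask the API for the content of the page's latest revision.
        data = {"action": "query", "prop": "revisions", "rvprop": "content",
                "rvlimit": 1, "titles": title, "format": "json"}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        parser = mwparserfromhell.Parser()
        return parser.parse(text)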
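For readers on Python 3, where ``urllib.urlopen`` and indexable
``dict.values()`` are not available, the request and the response lookup
might be split as sketched below. ``build_query_url`` and ``extract_text``
are hypothetical helper names chosen for this example, the query parameters
are an assumption about what the request needs, and ``sample`` is a
hand-written response in the shape the lookup expects, not real API output.
No network access is performed.

```python
import json
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """Build a URL asking for the content of the page's latest revision."""
    params = {"action": "query", "prop": "revisions", "rvprop": "content",
              "rvlimit": 1, "titles": title, "format": "json"}
    return API_URL + "?" + urlencode(params)


def extract_text(response):
    """Pull the page text out of a decoded API response."""
    pages = response["query"]["pages"]
    # The pages mapping is keyed by page ID; take the only entry.
    page = next(iter(pages.values()))
    return page["revisions"][0]["*"]


# Hand-written sample for illustration only; not real API output.
sample = json.loads(
    '{"query": {"pages": {"1": {"revisions": [{"*": "{{foo|bar}}"}]}}}}'
)
print(extract_text(sample))   # {{foo|bar}}
print(build_query_url("Main Page"))
```

The string returned by ``extract_text`` is what would then be handed to the
parser, as in the function above.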

.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:Σ
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _PyWikipedia: http://pywikipediabot.sourceforge.net/
.. _API: http://mediawiki.org/wiki/API