A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/

mwparserfromhell
========================

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode.

Developed by Earwig_ and named by `Σ`_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_,
so you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). Alternatively, get the latest development version::

    git clone https://github.com/earwig/mwparserfromhell.git mwparserfromhell
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit test suite with ``python setup.py test``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary unicode object. It also contains a list of nodes representing the
components of the wikicode, including ordinary text nodes, templates, and
links. For example::

    >>> wikicode = mwparserfromhell.parse(u"{{foo|bar|baz|eggs=spam}}")
    >>> print wikicode
    {{foo|bar|baz|eggs=spam}}
    >>>
    [Template(name="foo", params={"1": "bar", "2": "baz", "eggs": "spam"})]
    >>> template = templates[0]
    >>> print template.name
    foo
    >>> print template.params
    ['bar', 'baz']
    >>> print template[0]
    bar
    >>> print template["eggs"]
    spam
    >>> print template.render()
    {{foo|bar|baz|eggs=spam}}

If the argument to ``get`` is a number *n*, it returns the parameter at index
*n* (counting from zero); otherwise it returns the parameter with the given
name. Unnamed parameters are given numerical names starting with 1, so
``{{foo|bar}}`` is the same as ``{{foo|1=bar}}``, and
``templates[0].get(0) is templates[0].get("1")``.
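
As a rough illustration of the naming rule above, here is a minimal standalone
sketch. It does not use or reproduce the library's internals: ``name_params``
and ``get_param`` are hypothetical helpers, and representing parameters as a
list of ``(name, value)`` pairs is an assumption made only for this example.

```python
def name_params(raw_params):
    """Give every parameter a name, numbering the unnamed ones from 1.

    `raw_params` is a hypothetical list of raw parameter strings such as
    ["bar", "baz", "eggs=spam"]; this is not the library's data model.
    """
    named = []
    next_number = 1
    for raw in raw_params:
        if "=" in raw:
            # Explicitly named parameter: split on the first "=" only.
            name, value = raw.split("=", 1)
        else:
            # Unnamed parameter: it receives the next numerical name.
            name, value = str(next_number), raw
            next_number += 1
        named.append((name, value))
    return named


def get_param(params, key):
    """Look up a value by zero-based position (int) or by name (str)."""
    if isinstance(key, int):
        return params[key][1]
    for name, value in params:
        if name == key:
            return value
    raise KeyError(key)


# The parameters of {{foo|bar|baz|eggs=spam}}.
params = name_params(["bar", "baz", "eggs=spam"])
```

With these assumptions, ``get_param(params, 0)`` and ``get_param(params, "1")``
both return ``"bar"``, and ``name_params(["bar"])`` equals
``name_params(["1=bar"])``.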

By default, nested templates are supported like so::

    >>> parser = mwparserfromhell.Parser()
    >>> templates = parser.parse("{{foo|this {{includes a|template}}}}")
    >>> print templates
    [Template(name="foo", params={"1": "this {{includes a|template}}"})]
    >>> print templates[0].get(0)
    this {{includes a|template}}
    >>> print templates[0].get(0).templates
    [Template(name="includes a", params={"1": "template"})]
    >>> print templates[0].get(0).templates[0].params[0]
    template

Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse_templates`` method that essentially calls
``Parser().parse()`` on ``page.get()``.

If you're using PyWikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse_templates(title):
        site = pywikibot.get_site()
        page = pywikibot.Page(site, title)
        text = page.get()
        parser = mwparserfromhell.Parser()
        return parser.parse(text)

If you're not using a library, you can parse templates in any page using the
following code (via the API_)::

    import json
    import urllib
    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse_templates(title):
        # Request the content of the page's latest revision as JSON.
        data = urllib.urlencode({"action": "query", "prop": "revisions",
                                 "rvprop": "content", "rvlimit": 1,
                                 "format": "json", "titles": title})
        raw = urllib.urlopen(API_URL, data).read()
        res = json.loads(raw)
        # The response holds a single page, keyed by its page ID.
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        parser = mwparserfromhell.Parser()
        return parser.parse(text)
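
The snippet above is written for Python 2. On Python 3, building the request
and reading the response might look roughly like the sketch below. It makes no
network call and does not invoke the parser. ``build_query`` and
``extract_text`` are hypothetical helper names, the query parameters are an
assumption about what the request should contain, and ``sample`` is made-up
data shaped like the response the code above indexes into.

```python
import json
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"


def build_query(title):
    """Return a URL-encoded POST body asking for the latest revision text."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvlimit": 1,
        "format": "json",
        "titles": title,
    }
    return urlencode(params).encode("utf-8")


def extract_text(raw):
    """Pull the wikicode out of a JSON response with one page in it."""
    res = json.loads(raw)
    pages = res["query"]["pages"]
    # dict.values() is a view on Python 3, so it cannot be indexed with [0].
    page = next(iter(pages.values()))
    return page["revisions"][0]["*"]


# Made-up response for illustration only.
sample = json.dumps(
    {"query": {"pages": {"123": {"revisions": [{"*": "{{foo|bar}}"}]}}}}
)
```

The body returned by ``build_query`` could be passed to
``urllib.request.urlopen(API_URL, body)``, and the bytes read from the response
to ``extract_text``, before handing the resulting text to the parser.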

.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:Σ
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _PyWikipedia: http://pywikipediabot.sourceforge.net/
.. _API: http://mediawiki.org/wiki/API