A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/

mwtemplateparserfromhell
========================

**mwtemplateparserfromhell** (the *MediaWiki Template Parser from Hell*) is a
Python package that provides an easy-to-use and outrageously powerful template
parser for MediaWiki_ wikicode.

Coded by Earwig_ and named by `Σ`_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_,
so you can install the latest release with ``pip install
mwtemplateparserfromhell`` (`get pip`_). Alternatively, get the latest
development version::

    git clone git://github.com/earwig/mwtemplateparserfromhell.git mwtemplateparserfromhell
    cd mwtemplateparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with ``python setup.py test``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwtemplateparserfromhell
    >>> parser = mwtemplateparserfromhell.Parser()
    >>> templates = parser.parse(text)

``templates`` is a list of ``mwtemplateparserfromhell.Template`` objects, which
contain a ``name`` attribute, a ``params`` attribute, and a ``get()`` method.
For example::

    >>> templates = parser.parse("{{foo|bar|baz|eggs=spam}}")
    >>> print templates
    [Template(name="foo", params={"1": "bar", "2": "baz", "eggs": "spam"})]
    >>> print templates[0].name
    foo
    >>> print templates[0].params
    ['bar', 'baz']
    >>> print templates[0].get(0)
    bar
    >>> print templates[0].get("eggs")
    spam

If ``get``\ 's argument is a number *n*, it returns the *n*\ th parameter
(counting from zero, as in ``get(0)`` above); otherwise, it returns the
parameter with the given name. Unnamed parameters are given numerical names
starting with 1, so ``{{foo|bar}}`` is the same as ``{{foo|1=bar}}``, and
``templates[0].get(0) is templates[0].get("1")``.

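
The lookup rule described above can be illustrated with a small stand-in class.
This is only a sketch written for Python 3: ``ToyTemplate`` and its constructor
arguments are hypothetical and are not the package's ``Template`` class, and
counting named parameters when looking up by position is an assumption.

```python
class ToyTemplate:
    """Hypothetical stand-in used for illustration; not the library's Template."""

    def __init__(self, name, positional=(), named=None):
        self.name = name
        # Unnamed parameters receive the names "1", "2", ... in order.
        self._params = {str(i): value
                        for i, value in enumerate(positional, start=1)}
        self._params.update(named or {})
        # Parameter names in order of appearance (dicts keep insertion order).
        self._order = list(self._params)

    def get(self, key):
        # An integer is treated as a zero-based position (assumption: it
        # counts every parameter in order); anything else is a name.
        if isinstance(key, int):
            return self._params[self._order[key]]
        return self._params[key]


template = ToyTemplate("foo", ["bar", "baz"], {"eggs": "spam"})
print(template.get(0))       # bar
print(template.get("1"))     # bar
print(template.get("eggs"))  # spam
```

Here ``get(0)`` and ``get("1")`` reach the same value because the first unnamed
parameter is stored under the name ``"1"``.
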
By default, nested templates are supported like so::

    >>> templates = parser.parse("{{foo|this {{includes a|template}}}}")
    >>> print templates
    [Template(name="foo", params={"1": "this {{includes a|template}}"})]
    >>> print templates[0].get(0)
    this {{includes a|template}}
    >>> print templates[0].get(0).templates
    [Template(name="includes a", params={"1": "template"})]
    >>> print templates[0].get(0).templates[0].params[0]
    template

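
To see why the outer template's parameter keeps the inner template intact, it
can help to look at how a pipe inside ``{{...}}`` differs from a pipe at the top
level. The following Python 3 sketch is not the package's parsing algorithm;
``split_template`` is a hypothetical helper that tracks brace depth and splits
only on pipes outside any nested template. It ignores wikilinks, triple-brace
arguments, and other wikicode constructs.

```python
def split_template(text):
    """Split '{{name|a|b}}' into (name, [raw params]).

    Pipes inside nested '{{...}}' are left alone. Hypothetical helper for
    illustration only.
    """
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected text wrapped in {{ and }}")
    body = text[2:-2]
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair == "{{":
            # Entering a nested template.
            depth += 1
            current.append(pair)
            i += 2
        elif pair == "}}" and depth > 0:
            # Leaving a nested template.
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            # A top-level pipe ends the current piece.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts[0], parts[1:]


name, params = split_template("{{foo|this {{includes a|template}}}}")
print(name)    # foo
print(params)  # ['this {{includes a|template}}']
```

Calling ``split_template`` again on the ``{{includes a|template}}`` portion of
the parameter yields ``("includes a", ["template"])``, which mirrors the nested
result shown above.
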
Integration
-----------

``mwtemplateparserfromhell`` is used by and originally developed for
EarwigBot_; ``Page`` objects have a ``parse_templates`` method that essentially
calls ``Parser().parse()`` on ``page.get()``.

If you're using PyWikipedia_, your code might look like this::

    import mwtemplateparserfromhell
    import wikipedia as pywikibot

    def parse_templates(title):
        site = pywikibot.get_site()
        page = pywikibot.Page(site, title)
        text = page.get()
        parser = mwtemplateparserfromhell.Parser()
        return parser.parse(text)

If you're not using a library, you can parse templates in any page using the
following code (via the API_)::

    import json
    import urllib
    import mwtemplateparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse_templates(title):
        # Query parameters asking for the page's current revision content.
        params = {"action": "query", "prop": "revisions", "rvprop": "content",
                  "format": "json", "titles": title}
        data = urllib.urlencode(params)
        raw = urllib.urlopen(API_URL, data).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        parser = mwtemplateparserfromhell.Parser()
        return parser.parse(text)

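
The snippet above targets Python 2, where ``urllib.urlopen`` exists and
``dict.values()`` returns a list. A rough Python 3 sketch of the same request
might look like the following. The helper names ``build_query``,
``extract_text``, and ``fetch_text`` are made up for this example, the response
layout is taken from the snippet above, and the final parsing step is left out
so that the block does not depend on the package.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(title):
    """Return the URL-encoded POST body requesting a page's revision content."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
        "titles": title,
    }
    return urlencode(params).encode("utf-8")


def extract_text(response):
    """Pull the wikitext out of a decoded API response (layout assumed as above)."""
    # dict.values() is a view in Python 3, so take the first page via an iterator.
    page = next(iter(response["query"]["pages"].values()))
    return page["revisions"][0]["*"]


def fetch_text(title):
    # Performs a network request; not exercised by the sample below.
    with urlopen(API_URL, build_query(title)) as handle:
        return extract_text(json.loads(handle.read().decode("utf-8")))


# Offline sample shaped like the response the snippet above expects.
sample = {"query": {"pages": {"123": {"revisions": [{"*": "{{foo|bar}}"}]}}}}
print(extract_text(sample))  # {{foo|bar}}
```

The text returned by ``fetch_text`` could then be handed to ``Parser().parse()``
as in the earlier examples. Note that this sketch sends no custom
``User-Agent`` header, which Wikimedia's User-Agent policy asks clients to
provide.
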
.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:Σ
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _PyWikipedia: http://pywikipediabot.sourceforge.net/
.. _API: http://mediawiki.org/wiki/API