A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/

mwparserfromhell
================

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ with help from `Σ`_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_,
so you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). Alternatively, get the latest development version::

    git clone https://github.com/earwig/mwparserfromhell.git
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with
``python setup.py test -q``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary ``unicode`` object (or ``str`` in Python 3) with some extra methods.
For example::

    >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
    >>> wikicode = mwparserfromhell.parse(text)
    >>> print(wikicode)
    I has a template! {{foo|bar|baz|eggs=spam}} See it?
    >>> templates = wikicode.filter_templates()
    >>> print(templates)
    ['{{foo|bar|baz|eggs=spam}}']
    >>> template = templates[0]
    >>> print(template.name)
    foo
    >>> print(template.params)
    ['bar', 'baz', 'eggs=spam']
    >>> print(template.get(1).value)
    bar
    >>> print(template.get("eggs").value)
    spam

Since every parameter value is itself a ``Wikicode`` object, it's trivial to
get nested templates::

    >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
    >>> print(code.filter_templates())
    ['{{foo|this {{includes a|template}}}}']
    >>> foo = code.filter_templates()[0]
    >>> print(foo.get(1).value)
    this {{includes a|template}}
    >>> print(foo.get(1).value.filter_templates()[0])
    {{includes a|template}}
    >>> print(foo.get(1).value.filter_templates()[0].get(1).value)
    template
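
To make the numbering in the examples above concrete (positional parameters
counted from 1, named ones looked up by name, and pipes inside a nested
template left alone), here is a toy splitter for a single template string.
It is only an illustration: ``split_template`` and ``_top_level`` are
hypothetical helpers, not part of mwparserfromhell, and unlike the real parser
they do not handle ``{{{arguments}}}``, links, comments, or other wikicode.

```python
def _top_level(s, ch):
    """Yield each index of ``ch`` in ``s`` that is not inside a nested {{...}}."""
    depth, i = 0, 0
    while i < len(s):
        if s.startswith("{{", i):
            depth += 1
            i += 2
        elif s.startswith("}}", i):
            depth -= 1
            i += 2
        else:
            if s[i] == ch and depth == 0:
                yield i
            i += 1


def split_template(text):
    """Toy: split one "{{name|...}}" string into (name, params).

    Positional parameters are numbered "1", "2", ... in order; named ones are
    keyed by the text before their first top-level "=".
    """
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a single {{...}} template")
    body = text[2:-2]
    # Cut the body at every top-level "|" so pipes in nested templates are kept.
    cuts = [-1] + list(_top_level(body, "|")) + [len(body)]
    parts = [body[a + 1:b] for a, b in zip(cuts, cuts[1:])]
    name, params, position = parts[0], {}, 0
    for part in parts[1:]:
        eq = next(_top_level(part, "="), None)
        if eq is None:
            # Only positional parameters consume a number.
            position += 1
            params[str(position)] = part
        else:
            params[part[:eq]] = part[eq + 1:]
    return name, params
```

For example, ``split_template("{{foo|bar|baz|eggs=spam}}")`` gives
``("foo", {"1": "bar", "2": "baz", "eggs": "spam"})``, and the nested example
yields a single parameter ``"1"`` whose value is
``"this {{includes a|template}}"``.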

Additionally, you can include nested templates in ``filter_templates()`` by
passing ``recursive=True``::

    >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
    >>> mwparserfromhell.parse(text).filter_templates(recursive=True)
    ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']
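
Conceptually, a recursive search collects every nested template as well as the
top-level ones. The toy scanner below (a hypothetical ``find_templates``
helper that keeps a stack of open ``{{`` positions; it is not mwparserfromhell's
tokenizer) reproduces the output above for this particular input by returning
each matched span ordered by its starting position. It does not understand
``{{{arguments}}}``, ``<nowiki>``, or any other wikicode construct.

```python
def find_templates(text):
    """Toy: return every {{...}} span in ``text``, nested ones included,
    ordered by the position where each span starts."""
    stack, spans, i = [], [], 0
    while i < len(text):
        if text.startswith("{{", i):
            stack.append(i)  # remember where this template opens
            i += 2
        elif text.startswith("}}", i) and stack:
            start = stack.pop()  # close the innermost open template
            spans.append((start, text[start:i + 2]))
            i += 2
        else:
            i += 1
    # Inner templates close first, so sort by start to list outer ones first.
    return [span for _, span in sorted(spans)]
```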

Templates can be easily modified to add, remove, or alter params. ``Wikicode``
can also be treated like a list with ``append()``, ``insert()``, ``remove()``,
``replace()``, and more::

    >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
    >>> code = mwparserfromhell.parse(text)
    >>> for template in code.filter_templates():
    ...     if template.name == "cleanup" and not template.has_param("date"):
    ...         template.add("date", "July 2012")
    ...
    >>> print(code)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
    >>> code.replace("{{uncategorized}}", "{{bar-stub}}")
    >>> print(code)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> print(code.filter_templates())
    ['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert ``code`` back into a regular ``unicode`` object (for
saving the page!) by calling ``unicode()`` on it::

    >>> text = unicode(code)
    >>> print(text)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> text == code
    True

Likewise, use ``str(code)`` in Python 3.
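
If the same script has to run on both interpreters, one common pattern is to
pick the text type once at import time. A minimal sketch, where ``text_type``
and ``to_text`` are hypothetical names rather than part of mwparserfromhell:

```python
import sys

# Python 2 has a separate ``unicode`` type; on Python 3 ``str`` is already text.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)


def to_text(code):
    """Return ``code`` (e.g. a parsed Wikicode object) as a plain text string."""
    return text_type(code)
```

With this in place, ``text = to_text(code)`` gives the page text on either
version.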

Integration
-----------

``mwparserfromhell`` is used by and was originally developed for EarwigBot_,
whose ``Page`` objects have a ``parse`` method that essentially calls
``mwparserfromhell.parse()`` on ``page.get()``.

If you're using PyWikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse(title):
        site = pywikibot.get_site()
        page = pywikibot.Page(site, title)
        text = page.get()
        return mwparserfromhell.parse(text)

If you're not using a library, you can parse templates in any page using the
following Python 2 code (via the API_)::

    import json
    import urllib

    import mwparserfromhell

    API_URL = "https://en.wikipedia.org/w/api.php"

    def parse(title):
        # Ask the API for the content of the page's latest revision.
        data = {"action": "query", "prop": "revisions", "rvlimit": 1,
                "rvprop": "content", "format": "json", "titles": title}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        # "pages" is keyed by page ID; take the only entry.
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        return mwparserfromhell.parse(text)
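
The API snippet is Python 2 only: ``urllib.urlopen`` and ``urllib.urlencode``
moved to ``urllib.request`` and ``urllib.parse`` in Python 3, and
``dict.values()`` can no longer be indexed. A Python 3 sketch using only the
standard library, assuming the same JSON response shape the snippet relies on
(page text under ``"*"``); ``build_query``, ``extract_text``, and
``fetch_text`` are hypothetical helper names:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(title):
    """Return the urlencoded POST body asking for one page's latest revision."""
    data = {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}
    return urlencode(data).encode("utf-8")


def extract_text(res):
    """Pull the wikitext of the first (only) page out of a decoded response."""
    # "pages" is keyed by page ID, so take the first value instead of indexing.
    page = next(iter(res["query"]["pages"].values()))
    return page["revisions"][0]["*"]


def fetch_text(title):
    """Fetch the current wikitext of ``title`` (performs a network request)."""
    with urlopen(API_URL, build_query(title)) as resp:
        return extract_text(json.loads(resp.read().decode("utf-8")))
```

Passing the result to ``mwparserfromhell.parse()`` then gives the same kind of
``Wikicode`` object as the Python 2 ``parse()`` function. Keeping the network
call separate from ``extract_text`` also makes the response handling easy to
check against a saved response.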

.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _PyWikipedia: http://pywikipediabot.sourceforge.net/
.. _API: http://mediawiki.org/wiki/API