A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/
mwparserfromhell
================

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ with help from `Σ`_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_,
so you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). Alternatively, get the latest development version::

    git clone https://github.com/earwig/mwparserfromhell.git
    cd mwparserfromhell
    python setup.py install

If you get ``error: Unable to find vcvarsall.bat`` while installing, this is
because Windows can't find the compiler for C extensions. Consult this
`StackOverflow question`_ for help. You can also set ``ext_modules`` in
``setup.py`` to an empty list to prevent the extension from building.

You can run the comprehensive unit test suite with
``python setup.py test -q``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary ``unicode`` object (or ``str`` in Python 3) with some extra methods.
For example::

    >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
    >>> wikicode = mwparserfromhell.parse(text)
    >>> print(wikicode)
    I has a template! {{foo|bar|baz|eggs=spam}} See it?
    >>> templates = wikicode.filter_templates()
    >>> print(templates)
    ['{{foo|bar|baz|eggs=spam}}']
    >>> template = templates[0]
    >>> print(template.name)
    foo
    >>> print(template.params)
    ['bar', 'baz', 'eggs=spam']
    >>> print(template.get(1).value)
    bar
    >>> print(template.get("eggs").value)
    spam

Since every node you reach is also a ``Wikicode`` object, it's trivial to get
nested templates::

    >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
    >>> print(code.filter_templates())
    ['{{foo|this {{includes a|template}}}}']
    >>> foo = code.filter_templates()[0]
    >>> print(foo.get(1).value)
    this {{includes a|template}}
    >>> print(foo.get(1).value.filter_templates()[0])
    {{includes a|template}}
    >>> print(foo.get(1).value.filter_templates()[0].get(1).value)
    template

Additionally, you can include nested templates in ``filter_templates()`` by
passing ``recursive=True``::

    >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
    >>> mwparserfromhell.parse(text).filter_templates(recursive=True)
    ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']
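
One thing the recursive form makes easy is tallying which templates a page
uses, however deeply they are nested. The following is only a sketch for
Python 3: ``count_template_names`` is a hypothetical helper name, not part of
the package, and it assumes ``code`` is the object returned by
``mwparserfromhell.parse()`` with the ``filter_templates(recursive=True)``
call shown above.

```python
from collections import Counter


def count_template_names(code):
    """Count how often each template name occurs in ``code``, nesting included.

    ``code`` is assumed to be a parsed Wikicode object; this helper is an
    illustration and is not provided by mwparserfromhell itself.
    """
    counts = Counter()
    for template in code.filter_templates(recursive=True):
        # Names can carry surrounding whitespace, so normalize before counting.
        counts[str(template.name).strip()] += 1
    return counts
```

Applied to the ``{{foo|{{bar}}={{baz|{{spam}}}}}}`` text above, this should
count each of the four template names once.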

Templates can be easily modified to add, remove, or alter params. ``Wikicode``
can also be treated like a list with ``append()``, ``insert()``, ``remove()``,
``replace()``, and more::

    >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
    >>> code = mwparserfromhell.parse(text)
    >>> for template in code.filter_templates():
    ...     if template.name == "cleanup" and not template.has_param("date"):
    ...         template.add("date", "July 2012")
    ...
    >>> print(code)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
    >>> code.replace("{{uncategorized}}", "{{bar-stub}}")
    >>> print(code)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> print(code.filter_templates())
    ['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert ``code`` back into a regular ``unicode`` object (for
saving the page!) by calling ``unicode()`` on it::

    >>> text = unicode(code)
    >>> print(text)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> text == code
    True

Likewise, use ``str(code)`` in Python 3.
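
The cleanup loop and the final conversion to text can be wrapped into small
reusable functions. What follows is a minimal sketch for Python 3, not part of
the package: the names ``add_missing_date`` and ``tag_page_text`` are
hypothetical, and the code relies only on the ``filter_templates()``,
``has_param()`` and ``add()`` calls shown above. If your installed release
spells any of these methods differently, adjust accordingly.

```python
def add_missing_date(code, template_name, date):
    """Add ``date=<date>`` to each matching template that lacks one.

    ``code`` is assumed to be the object returned by
    ``mwparserfromhell.parse()``. Returns the number of templates changed.
    """
    changed = 0
    for template in code.filter_templates():
        # Names can carry surrounding whitespace, so normalize before comparing.
        name = str(template.name).strip()
        if name == template_name and not template.has_param("date"):
            template.add("date", date)
            changed += 1
    return changed


def tag_page_text(text, date):
    """Parse ``text``, date its cleanup templates, and return the new text."""
    # Imported here so that add_missing_date() has no import-time dependency.
    import mwparserfromhell

    code = mwparserfromhell.parse(text)
    add_missing_date(code, "cleanup", date)
    return str(code)  # Python 3 counterpart of unicode(code)
```

Keeping the template logic separate from parsing means the same helper can be
reused on ``Wikicode`` objects obtained elsewhere, for example from a bot
framework.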

Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse`` method that essentially calls
``mwparserfromhell.parse()`` on ``page.get()``.

If you're using Pywikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse(title):
        site = pywikibot.getSite()
        page = pywikibot.Page(site, title)
        text = page.get()
        return mwparserfromhell.parse(text)

If you're not using a library, you can parse templates in any page using the
following code (via the API_)::

    import json
    import urllib
    import mwparserfromhell

    API_URL = "https://en.wikipedia.org/w/api.php"

    def parse(title):
        # Python 2 only: relies on urllib.urlopen() and list-style dict.values()
        data = {"action": "query", "prop": "revisions", "rvlimit": 1,
                "rvprop": "content", "format": "json", "titles": title}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        return mwparserfromhell.parse(text)
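
The function above is written against Python 2's ``urllib``. A rough Python 3
equivalent might look like the sketch below. It sends the same query
parameters, but the helper names (``build_query``, ``extract_wikitext``,
``fetch_wikitext``) and the ``USER_AGENT`` string are hypothetical choices
made for illustration; sending a descriptive User-Agent is an assumption about
what the target wiki expects, not something this package requires.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical identifier; replace it with one describing your own tool.
USER_AGENT = "mwparserfromhell-readme-example/0.1"


def build_query(title):
    """Return the API parameters requesting the latest revision of ``title``."""
    return {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}


def extract_wikitext(response):
    """Pull the page text out of a decoded API response."""
    # "pages" is keyed by page ID, which is not known in advance, so take
    # the first (and only) entry.
    page = next(iter(response["query"]["pages"].values()))
    return page["revisions"][0]["*"]


def fetch_wikitext(title):
    """Download the current wikitext of ``title`` (performs a network request)."""
    body = urlencode(build_query(title)).encode("utf-8")
    request = Request(API_URL, data=body, headers={"User-Agent": USER_AGENT})
    with urlopen(request) as handle:
        return extract_wikitext(json.loads(handle.read().decode("utf-8")))


def parse(title):
    """Fetch ``title`` and return it as a parsed Wikicode object."""
    # Imported here so the helpers above need only the standard library.
    import mwparserfromhell

    return mwparserfromhell.parse(fetch_wikitext(title))
```

Splitting the request into small functions keeps the network call in one
place, so the response handling can be exercised with a canned dictionary.
Note that ``extract_wikitext`` raises ``KeyError`` when the response carries
no ``revisions`` entry, which is what a nonexistent title would be expected to
produce.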

.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3
.. _Python Package Index: http://pypi.python.org
.. _StackOverflow question: http://stackoverflow.com/questions/2817869/error-unable-to-find-vcvarsall-bat
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _Pywikipedia: https://www.mediawiki.org/wiki/Manual:Pywikipediabot
.. _API: http://mediawiki.org/wiki/API