A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/
mwparserfromhell
================

.. image:: https://img.shields.io/travis/earwig/mwparserfromhell/develop.svg
  :alt: Build Status
  :target: http://travis-ci.org/earwig/mwparserfromhell

.. image:: https://img.shields.io/coveralls/earwig/mwparserfromhell/develop.svg
  :alt: Coverage Status
  :target: https://coveralls.io/r/earwig/mwparserfromhell

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ with contributions from `Σ`_, Legoktm_, and others.
Full documentation is available on ReadTheDocs_. Development occurs on GitHub_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_;
you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). On Windows, make sure you have the latest version of pip
installed by running ``pip install --upgrade pip``.

Alternatively, get the latest development version::

    git clone https://github.com/earwig/mwparserfromhell.git
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with
``python setup.py test -q``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary ``str`` object (or ``unicode`` in Python 2) with some extra methods.
For example::

    >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
    >>> wikicode = mwparserfromhell.parse(text)
    >>> print(wikicode)
    I has a template! {{foo|bar|baz|eggs=spam}} See it?
    >>> templates = wikicode.filter_templates()
    >>> print(templates)
    ['{{foo|bar|baz|eggs=spam}}']
    >>> template = templates[0]
    >>> print(template.name)
    foo
    >>> print(template.params)
    ['bar', 'baz', 'eggs=spam']
    >>> print(template.get(1).value)
    bar
    >>> print(template.get("eggs").value)
    spam

Since nodes can contain other nodes, getting nested templates is trivial::

    >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
    >>> mwparserfromhell.parse(text).filter_templates()
    ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass ``recursive=False`` to ``filter_templates()`` and explore
templates manually. This is possible because nodes can contain additional
``Wikicode`` objects::

    >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
    >>> print(code.filter_templates(recursive=False))
    ['{{foo|this {{includes a|template}}}}']
    >>> foo = code.filter_templates(recursive=False)[0]
    >>> print(foo.get(1).value)
    this {{includes a|template}}
    >>> print(foo.get(1).value.filter_templates()[0])
    {{includes a|template}}
    >>> print(foo.get(1).value.filter_templates()[0].get(1).value)
    template

Templates can be easily modified to add, remove, or alter params. ``Wikicode``
objects can be treated like lists, with ``append()``, ``insert()``,
``remove()``, ``replace()``, and more. They also have a ``matches()`` method
for comparing page or template names, which takes care of capitalization and
whitespace::

    >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
    >>> code = mwparserfromhell.parse(text)
    >>> for template in code.filter_templates():
    ...     if template.name.matches("Cleanup") and not template.has("date"):
    ...         template.add("date", "July 2012")
    ...
    >>> print(code)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
    >>> code.replace("{{uncategorized}}", "{{bar-stub}}")
    >>> print(code)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> print(code.filter_templates())
    ['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert ``code`` back into a regular ``str`` object (for
saving the page!) by calling ``str()`` on it::

    >>> text = str(code)
    >>> print(text)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> text == code
    True

Likewise, use ``unicode(code)`` in Python 2.

Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse`` method that essentially calls
``mwparserfromhell.parse()`` on ``page.get()``.

If you're using Pywikibot_, your code might look like this::

    import mwparserfromhell
    import pywikibot

    def parse(title):
        site = pywikibot.Site()
        page = pywikibot.Page(site, title)
        text = page.get()
        return mwparserfromhell.parse(text)

If you're not using a library, you can parse any page using the following
Python 3 code (via the API_)::

    import json
    from urllib.parse import urlencode
    from urllib.request import urlopen
    import mwparserfromhell

    API_URL = "https://en.wikipedia.org/w/api.php"

    def parse(title):
        # Request the content of the latest revision of the page.
        data = {"action": "query", "prop": "revisions", "rvlimit": 1,
                "rvprop": "content", "format": "json", "titles": title}
        raw = urlopen(API_URL, urlencode(data).encode()).read()
        res = json.loads(raw)
        # dict.values() is a view in Python 3 and cannot be indexed directly.
        page = list(res["query"]["pages"].values())[0]
        text = page["revisions"][0]["*"]
        return mwparserfromhell.parse(text)

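The response handling can be exercised without a network connection. The
sketch below separates building the request body from extracting the page
text, using only the standard library. The function names ``build_query`` and
``extract_text`` are hypothetical, and both sample responses are hand-written
to resemble the shape the code above expects; they are not captured API
output. Treating a page entry without a ``revisions`` key as missing is an
assumption of this sketch.

```python
import json
from urllib.parse import urlencode


def build_query(title):
    """Return the URL-encoded request body for the latest revision of a page."""
    data = {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}
    return urlencode(data).encode()


def extract_text(res):
    """Return the wikitext from a decoded response, or None if there is none."""
    # Only one title is requested, so take the single entry in "pages".
    page = next(iter(res["query"]["pages"].values()))
    if "revisions" not in page:
        return None  # assumed to mean the page does not exist
    return page["revisions"][0]["*"]


# Hand-written samples (assumptions), not real API output.
found = json.loads(
    '{"query": {"pages": {"123": {"title": "Foo",'
    ' "revisions": [{"*": "{{cleanup}} Foo"}]}}}}'
)
missing = json.loads('{"query": {"pages": {"-1": {"title": "Nope", "missing": ""}}}}')

print(extract_text(found))
print(extract_text(missing))
```

The string returned by ``extract_text`` can then be passed to
``mwparserfromhell.parse()`` as in the function above.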
.. _MediaWiki:            http://mediawiki.org
.. _ReadTheDocs:          http://mwparserfromhell.readthedocs.org
.. _Earwig:               http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ:                    http://en.wikipedia.org/wiki/User:%CE%A3
.. _Legoktm:              http://en.wikipedia.org/wiki/User:Legoktm
.. _GitHub:               https://github.com/earwig/mwparserfromhell
.. _Python Package Index: http://pypi.python.org
.. _get pip:              http://pypi.python.org/pypi/pip
.. _EarwigBot:            https://github.com/earwig/earwigbot
.. _Pywikibot:            https://www.mediawiki.org/wiki/Manual:Pywikibot
.. _API:                  http://mediawiki.org/wiki/API