A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/

mwparserfromhell
================

.. image:: https://api.travis-ci.org/earwig/mwparserfromhell.svg?branch=develop
    :alt: Build Status
    :target: http://travis-ci.org/earwig/mwparserfromhell

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ with contributions from `Σ`_, Legoktm_, and others.
Full documentation is available on ReadTheDocs_. Development occurs on GitHub_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_;
you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). On Windows, make sure you have the latest version of pip
installed by running ``pip install --upgrade pip``.

Alternatively, get the latest development version::

    git clone https://github.com/earwig/mwparserfromhell.git
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with
``python setup.py test -q``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary ``unicode`` object (or ``str`` in Python 3) with some extra methods.
For example::

    >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
    >>> wikicode = mwparserfromhell.parse(text)
    >>> print wikicode
    I has a template! {{foo|bar|baz|eggs=spam}} See it?
    >>> templates = wikicode.filter_templates()
    >>> print templates
    ['{{foo|bar|baz|eggs=spam}}']
    >>> template = templates[0]
    >>> print template.name
    foo
    >>> print template.params
    ['bar', 'baz', 'eggs=spam']
    >>> print template.get(1).value
    bar
    >>> print template.get("eggs").value
    spam

Since nodes can contain other nodes, getting nested templates is trivial::

    >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
    >>> mwparserfromhell.parse(text).filter_templates()
    ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass ``recursive=False`` to ``filter_templates()`` and explore
templates manually. This is possible because nodes can contain additional
``Wikicode`` objects::

    >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
    >>> print code.filter_templates(recursive=False)
    ['{{foo|this {{includes a|template}}}}']
    >>> foo = code.filter_templates(recursive=False)[0]
    >>> print foo.get(1).value
    this {{includes a|template}}
    >>> print foo.get(1).value.filter_templates()[0]
    {{includes a|template}}
    >>> print foo.get(1).value.filter_templates()[0].get(1).value
    template

Templates can be easily modified to add, remove, or alter params. ``Wikicode``
objects can be treated like lists, with ``append()``, ``insert()``,
``remove()``, ``replace()``, and more. They also have a ``matches()`` method
for comparing page or template names, which takes care of capitalization and
whitespace::

    >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
    >>> code = mwparserfromhell.parse(text)
    >>> for template in code.filter_templates():
    ...     if template.name.matches("Cleanup") and not template.has("date"):
    ...         template.add("date", "July 2012")
    ...
    >>> print code
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
    >>> code.replace("{{uncategorized}}", "{{bar-stub}}")
    >>> print code
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> print code.filter_templates()
    ['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert ``code`` back into a regular ``unicode`` object (for
saving the page!) by calling ``unicode()`` on it::

    >>> text = unicode(code)
    >>> print text
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> text == code
    True

Likewise, use ``str(code)`` in Python 3.

Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse`` method that essentially calls
``mwparserfromhell.parse()`` on ``page.get()``.

If you're using Pywikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse(title):
        site = pywikibot.getSite()
        page = pywikibot.Page(site, title)
        text = page.get()
        return mwparserfromhell.parse(text)

If you're not using a library, you can parse any page using the following code
(via the API_)::

    import json
    import urllib
    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse(title):
        data = {"action": "query", "prop": "revisions", "rvlimit": 1,
                "rvprop": "content", "format": "json", "titles": title}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        return mwparserfromhell.parse(text)

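The snippet above targets Python 2 (``urllib.urlopen`` and indexing into
``dict.values()``). A rough Python 3 equivalent using only the standard
library might look like the sketch below. The helper names ``build_query``,
``extract_wikitext`` and ``fetch_wikitext``, the ``https`` scheme, and the
``User-Agent`` string are assumptions made for illustration; the query
parameters and the response layout are the same ones the code above relies on.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same endpoint as above, but over https.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(title):
    """Build the same query parameters used in the Python 2 example."""
    return {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}


def extract_wikitext(response):
    """Pull the page text out of a decoded API response.

    Assumes the response layout used by the Python 2 example:
    query -> pages -> <page id> -> revisions[0] -> "*".
    """
    pages = response["query"]["pages"]
    # dict.values() is a view in Python 3, so take the first item via an iterator.
    page = next(iter(pages.values()))
    return page["revisions"][0]["*"]


def fetch_wikitext(title):
    """POST the query to the API and return the raw wikitext (needs network)."""
    body = urllib.parse.urlencode(build_query(title)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body,
        headers={"User-Agent": "mwparserfromhell-readme-example"})  # placeholder
    with urllib.request.urlopen(request) as resp:
        return extract_wikitext(json.loads(resp.read().decode("utf-8")))


def parse(title):
    """Fetch a page and parse it; the parser is imported only when needed."""
    import mwparserfromhell
    return mwparserfromhell.parse(fetch_wikitext(title))
```

Keeping the request building and the response unpacking in separate functions
means both can be checked without touching the network.
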
.. _MediaWiki:              http://mediawiki.org
.. _ReadTheDocs:            http://mwparserfromhell.readthedocs.org
.. _Earwig:                 http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ:                      http://en.wikipedia.org/wiki/User:%CE%A3
.. _Legoktm:                http://en.wikipedia.org/wiki/User:Legoktm
.. _GitHub:                 https://github.com/earwig/mwparserfromhell
.. _Python Package Index:   http://pypi.python.org
.. _StackOverflow question: http://stackoverflow.com/questions/2817869/error-unable-to-find-vcvarsall-bat
.. _get pip:                http://pypi.python.org/pypi/pip
.. _EarwigBot:              https://github.com/earwig/earwigbot
.. _Pywikipedia:            https://www.mediawiki.org/wiki/Manual:Pywikipediabot
.. _API:                    http://mediawiki.org/wiki/API