A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/


mwparserfromhell
================

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ with help from `Σ`_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_:
install the latest release with ``pip install mwparserfromhell``
(`get pip`_). Alternatively, get the latest development version::

    git clone https://github.com/earwig/mwparserfromhell.git
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit test suite with ``python setup.py test``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.wikicode.Wikicode`` object, which acts
like an ordinary ``unicode`` object (or ``str`` in Python 3) with some extra
methods. For example::

    >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
    >>> wikicode = mwparserfromhell.parse(text)
    >>> print(wikicode)
    I has a template! {{foo|bar|baz|eggs=spam}} See it?
    >>> templates = wikicode.filter_templates()
    >>> print(templates)
    ['{{foo|bar|baz|eggs=spam}}']
    >>> template = templates[0]
    >>> print(template.name)
    foo
    >>> print(template.params)
    ['bar', 'baz', 'eggs=spam']
    >>> print(template.get(1).value)
    bar
    >>> print(template.get("eggs").value)
    spam

Since every node you reach is also a ``Wikicode`` object, it's trivial to get
nested templates::

    >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
    >>> print(code.filter_templates())
    ['{{foo|this {{includes a|template}}}}']
    >>> foo = code.filter_templates()[0]
    >>> print(foo.get(1).value)
    this {{includes a|template}}
    >>> print(foo.get(1).value.filter_templates()[0])
    {{includes a|template}}
    >>> print(foo.get(1).value.filter_templates()[0].get(1).value)
    template

Additionally, you can include nested templates in ``filter_templates()`` by
passing ``recursive=True``::

    >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
    >>> mwparserfromhell.parse(text).filter_templates(recursive=True)
    ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

Templates can easily be modified to add, remove, or alter params. ``Wikicode``
can also be treated like a list with ``append()``, ``insert()``, ``remove()``,
``replace()``, and more::

    >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
    >>> code = mwparserfromhell.parse(text)
    >>> for template in code.filter_templates():
    ...     if template.name == "cleanup" and not template.has_param("date"):
    ...         template.add("date", "July 2012")
    ...
    >>> print(code)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
    >>> code.replace("{{uncategorized}}", "{{bar-stub}}")
    >>> print(code)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> print(code.filter_templates())
    ['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert ``code`` back into a regular ``unicode`` object (for
saving the page!) by calling ``unicode()`` on it::

    >>> text = unicode(code)
    >>> print(text)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> text == code
    True

Likewise, use ``str(code)`` in Python 3.

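For code that has to run unchanged on both interpreters, one option is a small helper that picks the right text type once, at import time. This is only a minimal sketch; ``text_type`` and ``to_text`` are names invented here for illustration and are not part of mwparserfromhell.

```python
import sys

# Pick the interpreter's native text type: str on Python 3, unicode on Python 2.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)


def to_text(code):
    """Convert a Wikicode object (or any other object) to plain text."""
    return text_type(code)
```

With this in place, ``to_text(code)`` can stand in for ``unicode(code)`` or ``str(code)`` wherever the page text is needed for saving.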
Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse`` method that essentially calls
``mwparserfromhell.parse()`` on ``page.get()``.

If you're using PyWikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse(title):
        site = pywikibot.get_site()
        page = pywikibot.Page(site, title)
        text = page.get()
        return mwparserfromhell.parse(text)

If you're not using a library, you can parse the text of any page using the
following Python 2 code (via the API_)::

    import json
    import urllib

    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse(title):
        # Ask the API for the content of the page's latest revision, as JSON.
        data = urllib.urlencode({"action": "query", "prop": "revisions",
                                 "rvprop": "content", "rvlimit": 1,
                                 "format": "json", "titles": title})
        raw = urllib.urlopen(API_URL, data).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        return mwparserfromhell.parse(text)

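On Python 3, ``urllib.urlopen`` no longer exists, so the function above would need ``urllib.request`` and ``urllib.parse`` instead. The following is a rough sketch under that assumption, not a definitive implementation: the helper names ``build_query`` and ``extract_wikitext``, the query parameters, and the User-Agent string are all choices made here for illustration, and the response layout is assumed to be the same one the function above indexes into.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(title):
    """Return query parameters asking for the latest revision's content."""
    return {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvlimit": 1,
        "format": "json",
        "titles": title,
    }


def extract_wikitext(response):
    """Pull the revision text out of a decoded API response."""
    pages = response["query"]["pages"]
    # dict.values() is a view on Python 3, so take the first item explicitly.
    page = next(iter(pages.values()))
    return page["revisions"][0]["*"]


def parse(title):
    # Imported here so the helpers above work without the parser installed.
    import mwparserfromhell

    data = urlencode(build_query(title)).encode("utf-8")
    # Placeholder User-Agent (an assumption); replace it with your own.
    request = Request(API_URL, data=data,
                      headers={"User-Agent": "readme-example/0.1"})
    with urlopen(request) as handle:
        response = json.loads(handle.read().decode("utf-8"))
    return mwparserfromhell.parse(extract_wikitext(response))
```

Splitting the request into ``build_query`` and ``extract_wikitext`` keeps the network call in one place, so the two helpers can be checked against a hand-written response without touching the wiki.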

.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:Σ
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _PyWikipedia: http://pywikipediabot.sourceforge.net/
.. _API: http://mediawiki.org/wiki/API