A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/


mwparserfromhell
================

.. image:: https://travis-ci.org/earwig/mwparserfromhell.png?branch=develop
  :alt: Build Status
  :target: http://travis-ci.org/earwig/mwparserfromhell

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ with help from `Σ`_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_,
so you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). Alternatively, get the latest development version::

    git clone https://github.com/earwig/mwparserfromhell.git
    cd mwparserfromhell
    python setup.py install

If you get ``error: Unable to find vcvarsall.bat`` while installing, this is
because Windows can't find the compiler for C extensions. Consult this
`StackOverflow question`_ for help. You can also set ``ext_modules`` in
``setup.py`` to an empty list to prevent the extension from building.

You can run the comprehensive unit testing suite with
``python setup.py test -q``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary ``unicode`` object (or ``str`` in Python 3) with some extra methods.
For example::

    >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
    >>> wikicode = mwparserfromhell.parse(text)
    >>> print(wikicode)
    I has a template! {{foo|bar|baz|eggs=spam}} See it?
    >>> templates = wikicode.filter_templates()
    >>> print(templates)
    ['{{foo|bar|baz|eggs=spam}}']
    >>> template = templates[0]
    >>> print(template.name)
    foo
    >>> print(template.params)
    ['bar', 'baz', 'eggs=spam']
    >>> print(template.get(1).value)
    bar
    >>> print(template.get("eggs").value)
    spam

Since every node you reach is also a ``Wikicode`` object, it's trivial to get
nested templates::

    >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
    >>> print(code.filter_templates())
    ['{{foo|this {{includes a|template}}}}']
    >>> foo = code.filter_templates()[0]
    >>> print(foo.get(1).value)
    this {{includes a|template}}
    >>> print(foo.get(1).value.filter_templates()[0])
    {{includes a|template}}
    >>> print(foo.get(1).value.filter_templates()[0].get(1).value)
    template

Additionally, you can include nested templates in ``filter_templates()`` by
passing ``recursive=True``::

    >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
    >>> mwparserfromhell.parse(text).filter_templates(recursive=True)
    ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

Templates can be easily modified to add, remove, or alter params. ``Wikicode``
can also be treated like a list with ``append()``, ``insert()``, ``remove()``,
``replace()``, and more::

    >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
    >>> code = mwparserfromhell.parse(text)
    >>> for template in code.filter_templates():
    ...     if template.name == "cleanup" and not template.has_param("date"):
    ...         template.add("date", "July 2012")
    ...
    >>> print(code)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
    >>> code.replace("{{uncategorized}}", "{{bar-stub}}")
    >>> print(code)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> print(code.filter_templates())
    ['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert ``code`` back into a regular ``unicode`` object (for
saving the page!) by calling ``unicode()`` on it::

    >>> text = unicode(code)
    >>> print(text)
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> text == code
    True

Likewise, use ``str(code)`` in Python 3.
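
If the same script has to run on both Python 2 and Python 3, the choice
between ``unicode()`` and ``str()`` can be made once, up front. The helper
below is a minimal sketch of that idea; ``text_type`` and ``to_text`` are
hypothetical names introduced here for illustration and are not provided by
the library.

```python
import sys

# Pick the text type once: ``unicode`` on Python 2, ``str`` on Python 3.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)


def to_text(code):
    """Return ``code`` as a plain text string on either Python version."""
    return text_type(code)
```

``to_text(code)`` can then stand in for ``unicode(code)`` or ``str(code)``
wherever the page text is about to be saved.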

Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse`` method that essentially calls
``mwparserfromhell.parse()`` on ``page.get()``.

If you're using Pywikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse(title):
        site = pywikibot.getSite()
        page = pywikibot.Page(site, title)
        text = page.get()
        return mwparserfromhell.parse(text)

If you're not using a library, you can parse templates in any page using the
following code (via the API_). It is written for Python 2: ``urllib.urlopen``
and indexing ``dict.values()`` do not work in Python 3::

    import json
    import urllib
    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse(title):
        data = {"action": "query", "prop": "revisions", "rvlimit": 1,
                "rvprop": "content", "format": "json", "titles": title}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        return mwparserfromhell.parse(text)
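
For Python 3, the same request can be sketched with ``urllib.request`` and
``urllib.parse``. Several things here are assumptions rather than part of the
original example: the helper names ``build_query``, ``extract_text`` and
``fetch_text``, the ``https`` URL, and the placeholder ``USER_AGENT`` string
(Wikimedia asks clients to identify themselves, so replace it with your own
details). The sketch also assumes the API still returns revision content under
the ``"*"`` key, as in the code above.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder value; substitute a description of your own script.
USER_AGENT = "example-script/0.1 (hypothetical; add contact details)"


def build_query(title):
    """Return the query parameters used above, URL-encoded as bytes."""
    data = {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}
    return urllib.parse.urlencode(data).encode("utf-8")


def extract_text(res):
    """Pull the latest revision's wikitext out of a decoded API response."""
    # Dict views cannot be indexed in Python 3, so take the first page
    # with next(iter(...)) instead of .values()[0].
    page = next(iter(res["query"]["pages"].values()))
    return page["revisions"][0]["*"]


def fetch_text(title):
    """Fetch the current wikitext of ``title`` (performs a network request)."""
    request = urllib.request.Request(
        API_URL, data=build_query(title),
        headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        res = json.loads(response.read().decode("utf-8"))
    return extract_text(res)


def parse(title):
    """Fetch ``title`` and return it as parsed wikicode."""
    # Imported here so the helpers above work without the parser installed.
    import mwparserfromhell
    return mwparserfromhell.parse(fetch_text(title))
```

Keeping ``extract_text`` separate from the network call makes it possible to
check the response handling against a saved or hand-written response.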

.. _MediaWiki:              http://mediawiki.org
.. _Earwig:                 http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ:                      http://en.wikipedia.org/wiki/User:%CE%A3
.. _Python Package Index:   http://pypi.python.org
.. _StackOverflow question: http://stackoverflow.com/questions/2817869/error-unable-to-find-vcvarsall-bat
.. _get pip:                http://pypi.python.org/pypi/pip
.. _EarwigBot:              https://github.com/earwig/earwigbot
.. _Pywikipedia:            https://www.mediawiki.org/wiki/Manual:Pywikipediabot
.. _API:                    http://mediawiki.org/wiki/API