A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/

mwparserfromhell
========================

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ and named by `Σ`_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_,
so you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). Alternatively, get the latest development version::

    git clone git://github.com/earwig/mwparserfromhell.git mwparserfromhell
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with ``python setup.py test``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.wikicode.Wikicode`` object, which acts
like an ordinary unicode object with some extra methods. For example::

    >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
    >>> wikicode = mwparserfromhell.parse(text)
    >>> print wikicode
    I has a template! {{foo|bar|baz|eggs=spam}} See it?
    >>> templates = wikicode.filter_templates()
    >>> print templates
    ['{{foo|bar|baz|eggs=spam}}']
    >>> template = templates[0]
    >>> print template.name
    foo
    >>> print template.params
    ['bar', 'baz', 'eggs=spam']
    >>> print template.get(1).value
    bar
    >>> print template.get("eggs").value
    spam

Since every node you reach is also a ``Wikicode`` object, it's trivial to get
nested templates::

    >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
    >>> print code.filter_templates()
    ['{{foo|this {{includes a|template}}}}']
    >>> foo = code.filter_templates()[0]
    >>> print foo.get(1).value
    this {{includes a|template}}
    >>> print foo.get(1).value.filter_templates()[0]
    {{includes a|template}}
    >>> print foo.get(1).value.filter_templates()[0].get(1).value
    template

Additionally, you can include nested templates in ``filter_templates()`` by
passing ``recursive=True``::

    >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
    >>> mwparserfromhell.parse(text).filter_templates(recursive=True)
    ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

Templates can be easily modified to add, remove, or alter params. ``Wikicode``
can also be treated like a list with ``append()``, ``insert()``, ``remove()``,
``replace()``, and more::

    >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
    >>> code = mwparserfromhell.parse(text)
    >>> for template in code.filter_templates():
    ...     if template.name == "cleanup" and not template.has_param("date"):
    ...         template.add("date", "July 2012")
    ...
    >>> print code
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
    >>> code.replace("{{uncategorized}}", "{{bar-stub}}")
    >>> print code
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> print code.filter_templates()
    ['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert ``code`` back into a regular ``unicode`` object (for
saving the page!) by calling ``unicode()`` on it::

    >>> text = unicode(code)
    >>> print text
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> text == code
    True

Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse`` method that essentially calls
``mwparserfromhell.parse()`` on ``page.get()``.

If you're using PyWikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse(title):
        site = pywikibot.get_site()
        page = pywikibot.Page(site, title)
        text = page.get()
        return mwparserfromhell.parse(text)

If you're not using a library, you can parse templates in any page using the
following code (via the API_)::

    import json
    import urllib
    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse(title):
        # Ask the API for the content of the latest revision of the page
        data = {"action": "query", "prop": "revisions", "rvlimit": 1,
                "rvprop": "content", "format": "json", "titles": title}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        return mwparserfromhell.parse(text)

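The example above targets Python 2 (``urllib.urlopen``). A rough Python 3
equivalent, using only the standard library, could look like the sketch
below. The helper names, the ``https`` URL, and the ``User-Agent`` string are
assumptions made for illustration, and the response layout is assumed to be
the legacy JSON format in which revision text sits under the ``"*"`` key:

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; replace it with one that identifies your tool.
USER_AGENT = "mwparserfromhell-readme-example/0.1"


def build_query(title):
    """Return API parameters asking for the latest revision of ``title``."""
    return {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}


def extract_wikitext(response):
    """Pull the page text out of a decoded API response (legacy layout)."""
    # "pages" is keyed by page ID; take the only entry.
    page = next(iter(response["query"]["pages"].values()))
    return page["revisions"][0]["*"]


def fetch_wikitext(title):
    """POST the query to the API and return the page's wikitext."""
    body = urllib.parse.urlencode(build_query(title)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as conn:
        return extract_wikitext(json.loads(conn.read().decode("utf-8")))
```

The string returned by ``fetch_wikitext(title)`` can then be handed to
``mwparserfromhell.parse()``. Keeping ``extract_wikitext`` separate makes the
response handling easy to check against a canned response without touching
the network.
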
.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:Σ
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _PyWikipedia: http://pywikipediabot.sourceforge.net/
.. _API: http://mediawiki.org/wiki/API