A Python parser for MediaWiki wikicode https://mwparserfromhell.readthedocs.io/

mwparserfromhell
================

.. image:: https://api.travis-ci.org/earwig/mwparserfromhell.svg?branch=develop
    :alt: Build Status
    :target: http://travis-ci.org/earwig/mwparserfromhell

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ with contributions from `Σ`_, Legoktm_, and others.
Full documentation is available on ReadTheDocs_. Development occurs on GitHub_.

Installation
------------

The easiest way to install the parser is through the `Python Package Index`_;
you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). On Windows, make sure you have the latest version of pip
installed by running ``pip install --upgrade pip``.

Alternatively, get the latest development version::

    git clone https://github.com/earwig/mwparserfromhell.git
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with
``python setup.py test -q``.

Usage
-----

Normal usage is rather straightforward (where ``text`` is page text)::

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

``wikicode`` is a ``mwparserfromhell.Wikicode`` object, which acts like an
ordinary ``unicode`` object (or ``str`` in Python 3) with some extra methods.
For example::

    >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
    >>> wikicode = mwparserfromhell.parse(text)
    >>> print wikicode
    I has a template! {{foo|bar|baz|eggs=spam}} See it?
    >>> templates = wikicode.filter_templates()
    >>> print templates
    ['{{foo|bar|baz|eggs=spam}}']
    >>> template = templates[0]
    >>> print template.name
    foo
    >>> print template.params
    ['bar', 'baz', 'eggs=spam']
    >>> print template.get(1).value
    bar
    >>> print template.get("eggs").value
    spam

Since nodes can contain other nodes, getting nested templates is trivial::

    >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
    >>> mwparserfromhell.parse(text).filter_templates()
    ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass ``recursive=False`` to ``filter_templates()`` and explore
templates manually. This is possible because nodes can contain additional
``Wikicode`` objects::

    >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
    >>> print code.filter_templates(recursive=False)
    ['{{foo|this {{includes a|template}}}}']
    >>> foo = code.filter_templates(recursive=False)[0]
    >>> print foo.get(1).value
    this {{includes a|template}}
    >>> print foo.get(1).value.filter_templates()[0]
    {{includes a|template}}
    >>> print foo.get(1).value.filter_templates()[0].get(1).value
    template

Templates can be easily modified to add, remove, or alter params. ``Wikicode``
objects can be treated like lists, with ``append()``, ``insert()``,
``remove()``, ``replace()``, and more. They also have a ``matches()`` method
for comparing page or template names, which takes care of capitalization and
whitespace::

    >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
    >>> code = mwparserfromhell.parse(text)
    >>> for template in code.filter_templates():
    ...     if template.name.matches("Cleanup") and not template.has("date"):
    ...         template.add("date", "July 2012")
    ...
    >>> print code
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
    >>> code.replace("{{uncategorized}}", "{{bar-stub}}")
    >>> print code
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> print code.filter_templates()
    ['{{cleanup|date=July 2012}}', '{{bar-stub}}']

You can then convert ``code`` back into a regular ``unicode`` object (for
saving the page!) by calling ``unicode()`` on it::

    >>> text = unicode(code)
    >>> print text
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> text == code
    True

Likewise, use ``str(code)`` in Python 3.

Integration
-----------

``mwparserfromhell`` is used by and originally developed for EarwigBot_;
``Page`` objects have a ``parse`` method that essentially calls
``mwparserfromhell.parse()`` on ``page.get()``.

If you're using Pywikipedia_, your code might look like this::

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse(title):
        site = pywikibot.getSite()
        page = pywikibot.Page(site, title)
        text = page.get()
        return mwparserfromhell.parse(text)

If you're not using a library, you can parse any page using the following code
(via the API_)::

    import json
    import urllib
    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse(title):
        data = {"action": "query", "prop": "revisions", "rvlimit": 1,
                "rvprop": "content", "format": "json", "titles": title}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        return mwparserfromhell.parse(text)

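The snippet above is written for Python 2 (``urllib.urlopen`` and indexing
``dict.values()`` do not work in Python 3). A Python 3 equivalent using only
the standard library might look like the sketch below. The function names
``build_query``, ``extract_text``, and ``fetch_text`` are hypothetical, and
the ``https`` scheme in ``API_URL`` is an assumption; the query parameters are
the ones used above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same endpoint as above, but over HTTPS.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(title):
    """Return the query parameters used above, URL-encoded as bytes."""
    data = {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}
    return urllib.parse.urlencode(data).encode("utf-8")


def extract_text(res):
    """Pull the wikitext of the first page out of a decoded API response."""
    # dict.values() is a view in Python 3, so take the first item with next().
    page = next(iter(res["query"]["pages"].values()))
    return page["revisions"][0]["*"]


def fetch_text(title):
    """POST the query to the API and return the page's wikitext."""
    with urllib.request.urlopen(API_URL, build_query(title)) as response:
        res = json.loads(response.read().decode("utf-8"))
    return extract_text(res)
```

The result can then be parsed as before with
``mwparserfromhell.parse(fetch_text(title))``. Splitting the request into
small functions keeps the encoding and extraction steps checkable without
network access; a real client would probably also send a descriptive
``User-Agent`` header.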
.. _MediaWiki:              http://mediawiki.org
.. _ReadTheDocs:            http://mwparserfromhell.readthedocs.org
.. _Earwig:                 http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ:                      http://en.wikipedia.org/wiki/User:%CE%A3
.. _Legoktm:                http://en.wikipedia.org/wiki/User:Legoktm
.. _GitHub:                 https://github.com/earwig/mwparserfromhell
.. _Python Package Index:   http://pypi.python.org
.. _StackOverflow question: http://stackoverflow.com/questions/2817869/error-unable-to-find-vcvarsall-bat
.. _get pip:                http://pypi.python.org/pypi/pip
.. _EarwigBot:              https://github.com/earwig/earwigbot
.. _Pywikipedia:            https://www.mediawiki.org/wiki/Manual:Pywikipediabot
.. _API:                    http://mediawiki.org/wiki/API