@@ -113,23 +113,38 @@ saving the page!) by calling ``str()`` on it::

Likewise, use ``unicode(code)`` in Python 2.

Limitations
-----------

While the MediaWiki parser generates HTML, mwparserfromhell acts as an interface
to the source code. It is therefore unaware of template definitions: if it
substituted templates with their output, you would no longer be working with
the source code. This has several implications:

* Start and end tags generated by templates aren't recognized, e.g.
  ``<b>foobar{{bold-end}}``.

* Templates adjacent to external links, e.g. ``http://example.com{{foo}}``, are
  considered part of the link.

* Crossed constructs like ``{{echo|''Hello}}, world!''`` are not supported;
  the first node is treated as plain text.

An inherent limitation in wikicode prevents us from generating complete parse
trees in certain cases. For example, the string ``{{echo|''Hello}}, world!''``
produces the valid output ``<i>Hello, world!</i>`` in MediaWiki, assuming
``{{echo}}`` is a template that returns its first parameter. But since
representing this in mwparserfromhell's node tree would be impossible, we
compromise by treating the first node (i.e., the template) as plain text,
parsing only the italics.

The current workaround for cases where you are not interested in text
formatting is to pass ``skip_style_tags=True`` to ``mwparserfromhell.parse()``.
This treats ``''`` and ``'''`` like plain text.

A future version of mwparserfromhell will include multiple parsing modes to get
around this restriction.

Configuration unawareness
-------------------------

mwparserfromhell also has no access to a wiki's configuration, which has the
following consequences:

* `word-ending links`_ are not supported since the linktrail rules are
  language-specific.

* Localized namespace names aren't recognized, so links such as
  ``[[File:...]]`` are treated as regular wikilinks.

* Anything that looks like an XML tag is parsed as a tag
  since the available tags are extension-dependent.
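Because word-ending links are not handled by the parser, code that needs them
has to apply a linktrail rule itself to the text that follows a wikilink. The
sketch below illustrates why that rule cannot be hard-coded. It uses only the
standard library; ``split_linktrail`` is a hypothetical helper, not part of
mwparserfromhell, and both patterns are assumptions modelled on typical
English and German linktrail rules.

```python
import re

# Assumption: lowercase ASCII letters directly after the link belong to it,
# as in a typical English linktrail rule.
ENGLISH_LINKTRAIL = re.compile(r"^([a-z]+)(.*)$", re.DOTALL)

# Assumption: an illustrative German-style rule that also accepts umlauts.
GERMAN_STYLE = re.compile(r"^([äöüßa-z]+)(.*)$", re.DOTALL)


def split_linktrail(following_text, pattern=ENGLISH_LINKTRAIL):
    """Split the text that follows a wikilink into (trail, rest).

    Hypothetical helper: the trail is the part that a wiki using `pattern`
    would render as belonging to the link.
    """
    match = pattern.match(following_text)
    if match is None:
        return "", following_text
    return match.group(1), match.group(2)


# For "[[bus]]es stop here.", the text after the link is "es stop here."
print(split_linktrail("es stop here."))

# The same following text is split differently under different rules.
print(split_linktrail("änderungen", ENGLISH_LINKTRAIL))
print(split_linktrail("änderungen", GERMAN_STYLE))
```

The caller has to know which wiki the text came from in order to choose the
pattern, which is exactly the configuration the parser does not have.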

Integration
-----------

@@ -174,6 +189,7 @@ Python 3 code (via the API_)::

.. _GitHub: https://github.com/earwig/mwparserfromhell
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _word-ending links: https://www.mediawiki.org/wiki/Help:Links#linktrail
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _Pywikibot: https://www.mediawiki.org/wiki/Manual:Pywikibot
.. _API: http://mediawiki.org/wiki/API