@@ -115,36 +115,47 @@ Likewise, use ``unicode(code)`` in Python 2.
 
 Limitations
 -----------
-While the MediaWiki parser generates HTML, mwparserfromhell acts as an interface to
-the source code. mwparserfromhell therefore is unaware of template definitions since
-if it would substitute templates with their output you would no longer be working
-with the source code. This has several implications:
 
-* Start and end tags generated by templates aren't recognized e.g. ``<b>foobar{{bold-end}}``.
+While the MediaWiki parser generates HTML and has access to the contents of
+templates, among other things, mwparserfromhell acts as a direct interface to
+the source code only. This has several implications:
 
-* Templates adjacent to external links e.g. ``http://example.com{{foo}}`` are
-  considered part of the link.
+* Syntax elements produced by a template transclusion cannot be detected. For
+  example, imagine a hypothetical page ``"Template:End-bold"`` that contained
+  the text ``</b>``. While MediaWiki would correctly understand that
+  ``<b>foobar{{end-bold}}`` translates to ``<b>foobar</b>``, mwparserfromhell
+  has no way of examining the contents of ``{{end-bold}}``. Instead, it would
+  treat the bold tag as unfinished, possibly extending further down the page.
 
-* Crossed constructs like ``{{echo|''Hello}}, world!''`` are not supported,
-  the first node is treated as plain text.
+* Templates adjacent to external links, as in ``http://example.com{{foo}}``,
+  are considered part of the link. In reality, this would depend on the
+  contents of the template.
 
-The current workaround for cases where you are not interested in text
-formatting is to pass ``skip_style_tags=True`` to ``mwparserfromhell.parse()``.
-This treats ``''`` and ``'''`` like plain text.
+* When different syntax elements cross over each other, as in
+  ``{{echo|''Hello}}, world!''``, the parser gets confused because this cannot
+  be represented by an ordinary syntax tree. Instead, the parser will treat the
+  first syntax construct as plain text. In this case, only the italic tag would
+  be properly parsed.
 
-A future version of mwparserfromhell will include multiple parsing modes to get
-around this restriction.
+**Workaround:** Since this commonly occurs with text formatting and text
+formatting is often not of interest to users, you may pass
+*skip_style_tags=True* to ``mwparserfromhell.parse()``. This treats ``''``
+and ``'''`` as plain text.
 
-Configuration unawareness
+A future version of mwparserfromhell may include multiple parsing modes to
+get around this restriction more sensibly.
 
-* `word-ending links`_ are not supported since the linktrail rules are language-specific.
+Additionally, the parser lacks awareness of certain wiki-specific settings:
 
-* Localized namespace names aren't recognized, e.g. ``[[File:...]]``
-  links are treated as regular wikilinks.
+* `word-ending links`_ are not supported, since the linktrail rules are
+  language-specific.
 
-* Anything that looks like an XML tag is parsed as a tag
-  since the available tags are extension-dependent.
+* Localized namespace names aren't recognized, so file links (such as
+  ``[[File:...]]``) are treated as regular wikilinks.
 
+* Anything that looks like an XML tag is treated as a tag, even if it is not a
+  recognized tag name, since the list of valid tags depends on loaded MediaWiki
+  extensions.
 
 Integration
 -----------