From 2d89f611be365e181d2fa3df2bfbab6fde2ab07c Mon Sep 17 00:00:00 2001
From: Larivact
Date: Sun, 4 Jun 2017 22:37:05 +0200
Subject: [PATCH] rewrite Caveats

>not supported, since they cannot be represented in the node tree.

It's not that they cannot be represented, it's that they would have to be evaluated.
---
 README.rst | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/README.rst b/README.rst
index 86143c6..5ac605a 100644
--- a/README.rst
+++ b/README.rst
@@ -115,12 +115,18 @@ Likewise, use ``unicode(code)`` in Python 2.
 Caveats
 -------
 
-mwparserfromhell generates an abstract syntax tree instead of HTML.
+While the MediaWiki parser generates HTML, mwparserfromhell acts as an interface to
+the source code. mwparserfromhell therefore is unaware of template definitions since
+if it would substitute templates with their output you could no longer change the templates.
 This has several implications:
 
-* Crossed constructs like ``{{echo|''Hello}}, world!''`` are not supported,
-  since they cannot be represented in the node tree. We compromise by treating
-  the first node (i.e. the template) as plain text, parsing only the italics.
+* Start and end tags generated by templates aren't recognized e.g. ``foobar{{bold-end}}``.
+
+* Templates adjacent to external links e.g. ``http://example.com{{foo}}`` are
+  considered part of the link.
+
+* Crossed constructs like ``{{echo|''Hello}}, world!''`` are not supported.
+  We compromise by treating the first node as plain text.
 
   The current workaround for cases where you are not interested in text
   formatting is to pass ``skip_style_tags=True`` to ``mwparserfromhell.parse()``.
@@ -129,11 +135,6 @@ This has several implications:
   A future version of mwparserfromhell will include multiple parsing modes to
   get around this restriction.
 
-* Templates adjacent to external links e.g. ``http://example.com{{foo}}`` are
-  considered part of the link, since mwparserfromhell does not know the
-  definition of templates and even if it did the template could only be
-  partially part of the link which also couldn't be represented in the AST.
-
 Integration
 -----------
 