@@ -1,4 +1,24 @@ | |||
v0.1.1 (19da4d2144) to v0.2: | |||
v0.3 (released August 24, 2013): | |||
- Added complete support for HTML Tags, including forms like <ref>foo</ref>, | |||
<ref name="bar"/>, and wiki-markup tags like bold ('''), italics (''), and | |||
lists (*, #, ; and :). | |||
- Added support for ExternalLinks (http://example.com/ and | |||
[http://example.com/ Example]). | |||
- Wikicode's filter methods are now passed 'recursive=True' by default instead | |||
of False. This is a breaking change if you rely on any filter() methods being | |||
non-recursive by default. | |||
- Added a matches() method to Wikicode for page/template name comparisons. | |||
- The 'obj' param of Wikicode.insert_before(), insert_after(), replace(), and | |||
remove() now accepts other Wikicode objects and strings representing parts of | |||
wikitext, instead of just nodes. These methods also make all possible | |||
substitutions instead of just one. | |||
- Renamed Template.has_param() to has() for consistency with Template's other | |||
methods; has_param() is now an alias. | |||
- The C tokenizer extension now works on Python 3 in addition to Python 2.7. | |||
- Various bugfixes, internal changes, and cleanup. | |||
v0.2 (released June 20, 2013): | |||
- The parser now fully supports Python 3 in addition to Python 2.7. | |||
- Added a C tokenizer extension that is significantly faster than its Python | |||
@@ -24,10 +44,14 @@ v0.1.1 (19da4d2144) to v0.2: | |||
- Fixed some broken example code in the README; other copyedits. | |||
- Other bugfixes and code cleanup. | |||
v0.1 (ba94938fe8) to v0.1.1 (19da4d2144): | |||
v0.1.1 (released September 21, 2012): | |||
- Added support for Comments (<!-- foo -->) and Wikilinks ([[foo]]). | |||
- Added corresponding ifilter_links() and filter_links() methods to Wikicode. | |||
- Fixed a bug when parsing incomplete templates. | |||
- Fixed strip_code() to affect the contents of headings. | |||
- Various copyedits in documentation and comments. | |||
v0.1 (released August 23, 2012): | |||
- Initial release. |
@@ -9,7 +9,8 @@ mwparserfromhell | |||
that provides an easy-to-use and outrageously powerful parser for MediaWiki_ | |||
wikicode. It supports Python 2 and Python 3. | |||
Developed by Earwig_ with help from `Σ`_. | |||
Developed by Earwig_ with help from `Σ`_. Full documentation is available on | |||
ReadTheDocs_. | |||
Installation | |||
------------ | |||
@@ -18,7 +19,7 @@ The easiest way to install the parser is through the `Python Package Index`_, | |||
so you can install the latest release with ``pip install mwparserfromhell`` | |||
(`get pip`_). Alternatively, get the latest development version:: | |||
git clone git://github.com/earwig/mwparserfromhell.git | |||
git clone https://github.com/earwig/mwparserfromhell.git | |||
cd mwparserfromhell | |||
python setup.py install | |||
@@ -59,13 +60,20 @@ For example:: | |||
>>> print template.get("eggs").value | |||
spam | |||
Since every node you reach is also a ``Wikicode`` object, it's trivial to get | |||
nested templates:: | |||
Since nodes can contain other nodes, getting nested templates is trivial:: | |||
>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}" | |||
>>> mwparserfromhell.parse(text).filter_templates() | |||
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}'] | |||
You can also pass ``recursive=False`` to ``filter_templates()`` and explore | |||
templates manually. This is possible because nodes can contain additional | |||
``Wikicode`` objects:: | |||
>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}") | |||
>>> print code.filter_templates() | |||
>>> print code.filter_templates(recursive=False) | |||
['{{foo|this {{includes a|template}}}}'] | |||
>>> foo = code.filter_templates()[0] | |||
>>> foo = code.filter_templates(recursive=False)[0] | |||
>>> print foo.get(1).value | |||
this {{includes a|template}} | |||
>>> print foo.get(1).value.filter_templates()[0] | |||
@@ -73,21 +81,16 @@ nested templates:: | |||
>>> print foo.get(1).value.filter_templates()[0].get(1).value | |||
template | |||
Additionally, you can include nested templates in ``filter_templates()`` by | |||
passing ``recursive=True``:: | |||
>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}" | |||
>>> mwparserfromhell.parse(text).filter_templates(recursive=True) | |||
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}'] | |||
Templates can be easily modified to add, remove, or alter params. ``Wikicode`` | |||
can also be treated like a list with ``append()``, ``insert()``, ``remove()``, | |||
``replace()``, and more:: | |||
objects can be treated like lists, with ``append()``, ``insert()``, | |||
``remove()``, ``replace()``, and more. They also have a ``matches()`` method | |||
for comparing page or template names, which takes care of capitalization and | |||
whitespace:: | |||
>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}" | |||
>>> code = mwparserfromhell.parse(text) | |||
>>> for template in code.filter_templates(): | |||
... if template.name == "cleanup" and not template.has_param("date"): | |||
... if template.name.matches("Cleanup") and not template.has("date"): | |||
... template.add("date", "July 2012") | |||
... | |||
>>> print code | |||
@@ -142,6 +145,7 @@ following code (via the API_):: | |||
return mwparserfromhell.parse(text) | |||
.. _MediaWiki: http://mediawiki.org | |||
.. _ReadTheDocs: http://mwparserfromhell.readthedocs.org | |||
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig | |||
.. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3 | |||
.. _Python Package Index: http://pypi.python.org | |||
@@ -25,6 +25,14 @@ nodes Package | |||
:undoc-members: | |||
:show-inheritance: | |||
:mod:`external_link` Module | |||
--------------------------- | |||
.. automodule:: mwparserfromhell.nodes.external_link | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: | |||
:mod:`heading` Module | |||
--------------------- | |||
@@ -46,6 +54,7 @@ nodes Package | |||
.. automodule:: mwparserfromhell.nodes.tag | |||
:members: | |||
:undoc-members: | |||
:show-inheritance: | |||
:mod:`template` Module | |||
@@ -30,6 +30,12 @@ mwparserfromhell Package | |||
:members: | |||
:undoc-members: | |||
:mod:`definitions` Module | |||
------------------------- | |||
.. automodule:: mwparserfromhell.definitions | |||
:members: | |||
:mod:`utils` Module | |||
------------------- | |||
@@ -1,10 +1,38 @@ | |||
Changelog | |||
========= | |||
v0.3 | |||
---- | |||
`Released August 24, 2013 <https://github.com/earwig/mwparserfromhell/tree/v0.3>`_ | |||
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.2...v0.3>`__): | |||
- Added complete support for HTML :py:class:`Tags <.Tag>`, including forms like | |||
``<ref>foo</ref>``, ``<ref name="bar"/>``, and wiki-markup tags like bold | |||
(``'''``), italics (``''``), and lists (``*``, ``#``, ``;`` and ``:``). | |||
- Added support for :py:class:`.ExternalLink`\ s (``http://example.com/`` and | |||
``[http://example.com/ Example]``). | |||
- :py:class:`Wikicode's <.Wikicode>` :py:meth:`.filter` methods are now passed | |||
*recursive=True* by default instead of *False*. **This is a breaking change | |||
if you rely on any filter() methods being non-recursive by default.** | |||
- Added a :py:meth:`.matches` method to :py:class:`~.Wikicode` for | |||
page/template name comparisons. | |||
- The *obj* param of :py:meth:`Wikicode.insert_before() <.insert_before>`, | |||
:py:meth:`~.insert_after`, :py:meth:`~.Wikicode.replace`, and | |||
:py:meth:`~.Wikicode.remove` now accepts :py:class:`~.Wikicode` objects and | |||
strings representing parts of wikitext, instead of just nodes. These methods | |||
also make all possible substitutions instead of just one. | |||
- Renamed :py:meth:`Template.has_param() <.has_param>` to | |||
:py:meth:`~.Template.has` for consistency with :py:class:`~.Template`\ 's | |||
other methods; :py:meth:`~.has_param` is now an alias. | |||
- The C tokenizer extension now works on Python 3 in addition to Python 2.7. | |||
- Various bugfixes, internal changes, and cleanup. | |||
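For example, the new *obj* handling of :py:meth:`~.Wikicode.replace` and the
new :py:meth:`.matches` method can be combined as follows (a brief,
illustrative sketch)::

    >>> import mwparserfromhell
    >>> code = mwparserfromhell.parse("{{foo}} and {{foo}}")
    >>> code.replace("{{foo}}", "{{bar}}")
    >>> print code
    {{bar}} and {{bar}}
    >>> code.filter_templates()[0].name.matches("Bar")
    True
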
v0.2 | |||
---- | |||
19da4d2144_ to master_ (released June 20, 2013) | |||
`Released June 20, 2013 <https://github.com/earwig/mwparserfromhell/tree/v0.2>`_ | |||
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.1.1...v0.2>`__): | |||
- The parser now fully supports Python 3 in addition to Python 2.7. | |||
- Added a C tokenizer extension that is significantly faster than its Python | |||
@@ -38,7 +66,8 @@ v0.2 | |||
v0.1.1 | |||
------ | |||
ba94938fe8_ to 19da4d2144_ (released September 21, 2012) | |||
`Released September 21, 2012 <https://github.com/earwig/mwparserfromhell/tree/v0.1.1>`_ | |||
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.1...v0.1.1>`__): | |||
- Added support for :py:class:`Comments <.Comment>` (``<!-- foo -->``) and | |||
:py:class:`Wikilinks <.Wikilink>` (``[[foo]]``). | |||
@@ -51,8 +80,6 @@ ba94938fe8_ to 19da4d2144_ (released September 21, 2012) | |||
v0.1 | |||
---- | |||
ba94938fe8_ (released August 23, 2012) | |||
`Released August 23, 2012 <https://github.com/earwig/mwparserfromhell/tree/v0.1>`_: | |||
.. _master: https://github.com/earwig/mwparserfromhell/tree/v0.2 | |||
.. _19da4d2144: https://github.com/earwig/mwparserfromhell/tree/v0.1.1 | |||
.. _ba94938fe8: https://github.com/earwig/mwparserfromhell/tree/v0.1 | |||
- Initial release. |
@@ -1,15 +1,18 @@ | |||
MWParserFromHell v0.2 Documentation | |||
=================================== | |||
MWParserFromHell v\ |version| Documentation | |||
=========================================== | |||
:py:mod:`mwparserfromhell` (the *MediaWiki Parser from Hell*) is a Python | |||
package that provides an easy-to-use and outrageously powerful parser for | |||
MediaWiki_ wikicode. It supports Python 2 and Python 3. | |||
Developed by Earwig_ with help from `Σ`_. | |||
Developed by Earwig_ with contributions from `Σ`_, Legoktm_, and others. | |||
Development occurs on GitHub_. | |||
.. _MediaWiki: http://mediawiki.org | |||
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig | |||
.. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3 | |||
.. _Legoktm: http://en.wikipedia.org/wiki/User:Legoktm | |||
.. _GitHub: https://github.com/earwig/mwparserfromhell | |||
Installation | |||
------------ | |||
@@ -18,7 +21,7 @@ The easiest way to install the parser is through the `Python Package Index`_, | |||
so you can install the latest release with ``pip install mwparserfromhell`` | |||
(`get pip`_). Alternatively, get the latest development version:: | |||
git clone git://github.com/earwig/mwparserfromhell.git | |||
git clone https://github.com/earwig/mwparserfromhell.git | |||
cd mwparserfromhell | |||
python setup.py install | |||
@@ -27,13 +27,20 @@ some extra methods. For example:: | |||
>>> print template.get("eggs").value | |||
spam | |||
Since every node you reach is also a :py:class:`~.Wikicode` object, it's | |||
trivial to get nested templates:: | |||
Since nodes can contain other nodes, getting nested templates is trivial:: | |||
>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}" | |||
>>> mwparserfromhell.parse(text).filter_templates() | |||
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}'] | |||
You can also pass *recursive=False* to :py:meth:`~.filter_templates` and | |||
explore templates manually. This is possible because nodes can contain | |||
additional :py:class:`~.Wikicode` objects:: | |||
>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}") | |||
>>> print code.filter_templates() | |||
>>> print code.filter_templates(recursive=False) | |||
['{{foo|this {{includes a|template}}}}'] | |||
>>> foo = code.filter_templates()[0] | |||
>>> foo = code.filter_templates(recursive=False)[0] | |||
>>> print foo.get(1).value | |||
this {{includes a|template}} | |||
>>> print foo.get(1).value.filter_templates()[0] | |||
@@ -41,22 +48,17 @@ trivial to get nested templates:: | |||
>>> print foo.get(1).value.filter_templates()[0].get(1).value | |||
template | |||
Additionally, you can include nested templates in :py:meth:`~.filter_templates` | |||
by passing *recursive=True*:: | |||
>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}" | |||
>>> mwparserfromhell.parse(text).filter_templates(recursive=True) | |||
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}'] | |||
Templates can be easily modified to add, remove, or alter params. | |||
:py:class:`~.Wikicode` can also be treated like a list with | |||
:py:class:`~.Wikicode` objects can be treated like lists, with | |||
:py:meth:`~.Wikicode.append`, :py:meth:`~.Wikicode.insert`, | |||
:py:meth:`~.Wikicode.remove`, :py:meth:`~.Wikicode.replace`, and more:: | |||
:py:meth:`~.Wikicode.remove`, :py:meth:`~.Wikicode.replace`, and more. They | |||
also have a :py:meth:`~.Wikicode.matches` method for comparing page or template | |||
names, which takes care of capitalization and whitespace:: | |||
>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}" | |||
>>> code = mwparserfromhell.parse(text) | |||
>>> for template in code.filter_templates(): | |||
... if template.name == "cleanup" and not template.has_param("date"): | |||
... if template.name.matches("Cleanup") and not template.has("date"): | |||
... template.add("date", "July 2012") | |||
... | |||
>>> print code | |||
@@ -31,9 +31,10 @@ from __future__ import unicode_literals | |||
__author__ = "Ben Kurtovic" | |||
__copyright__ = "Copyright (C) 2012, 2013 Ben Kurtovic" | |||
__license__ = "MIT License" | |||
__version__ = "0.2" | |||
__version__ = "0.3" | |||
__email__ = "ben.kurtovic@verizon.net" | |||
from . import compat, nodes, parser, smart_list, string_mixin, utils, wikicode | |||
from . import (compat, definitions, nodes, parser, smart_list, string_mixin, | |||
utils, wikicode) | |||
parse = utils.parse_anything |
@@ -15,14 +15,12 @@ py3k = sys.version_info[0] == 3 | |||
if py3k: | |||
bytes = bytes | |||
str = str | |||
basestring = str | |||
maxsize = sys.maxsize | |||
import html.entities as htmlentities | |||
else: | |||
bytes = str | |||
str = unicode | |||
basestring = basestring | |||
maxsize = sys.maxint | |||
import htmlentitydefs as htmlentities | |||
@@ -0,0 +1,91 @@ | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net> | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy | |||
# of this software and associated documentation files (the "Software"), to deal | |||
# in the Software without restriction, including without limitation the rights | |||
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |||
# copies of the Software, and to permit persons to whom the Software is | |||
# furnished to do so, subject to the following conditions: | |||
# | |||
# The above copyright notice and this permission notice shall be included in | |||
# all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |||
# SOFTWARE. | |||
"""Contains data about certain markup, like HTML tags and external links.""" | |||
from __future__ import unicode_literals | |||
__all__ = ["get_html_tag", "is_parsable", "is_visible", "is_single", | |||
"is_single_only", "is_scheme"] | |||
URI_SCHEMES = { | |||
# [mediawiki/core.git]/includes/DefaultSettings.php @ 374a0ad943 | |||
"http": True, "https": True, "ftp": True, "ftps": True, "ssh": True, | |||
"sftp": True, "irc": True, "ircs": True, "xmpp": False, "sip": False, | |||
"sips": False, "gopher": True, "telnet": True, "nntp": True, | |||
"worldwind": True, "mailto": False, "tel": False, "sms": False, | |||
"news": False, "svn": True, "git": True, "mms": True, "bitcoin": False, | |||
"magnet": False, "urn": False, "geo": False | |||
} | |||
PARSER_BLACKLIST = [ | |||
# enwiki extensions @ 2013-06-28 | |||
"categorytree", "gallery", "hiero", "imagemap", "inputbox", "math", | |||
"nowiki", "pre", "score", "section", "source", "syntaxhighlight", | |||
"templatedata", "timeline" | |||
] | |||
INVISIBLE_TAGS = [ | |||
# enwiki extensions @ 2013-06-28 | |||
"categorytree", "gallery", "imagemap", "inputbox", "math", "score", | |||
"section", "templatedata", "timeline" | |||
] | |||
# [mediawiki/core.git]/includes/Sanitizer.php @ 87a0aef762 | |||
SINGLE_ONLY = ["br", "hr", "meta", "link", "img"] | |||
SINGLE = SINGLE_ONLY + ["li", "dt", "dd"] | |||
MARKUP_TO_HTML = { | |||
"#": "li", | |||
"*": "li", | |||
";": "dt", | |||
":": "dd" | |||
} | |||
def get_html_tag(markup): | |||
"""Return the HTML tag associated with the given wiki-markup.""" | |||
return MARKUP_TO_HTML[markup] | |||
def is_parsable(tag): | |||
"""Return if the given *tag*'s contents should be passed to the parser.""" | |||
return tag.lower() not in PARSER_BLACKLIST | |||
def is_visible(tag): | |||
"""Return whether or not the given *tag* contains visible text.""" | |||
return tag.lower() not in INVISIBLE_TAGS | |||
def is_single(tag): | |||
"""Return whether or not the given *tag* can exist without a close tag.""" | |||
return tag.lower() in SINGLE | |||
def is_single_only(tag): | |||
"""Return whether or not the given *tag* must exist without a close tag.""" | |||
return tag.lower() in SINGLE_ONLY | |||
def is_scheme(scheme, slashes=True, reverse=False): | |||
"""Return whether *scheme* is valid for external links.""" | |||
if reverse: # Convenience for C | |||
scheme = scheme[::-1] | |||
scheme = scheme.lower() | |||
if slashes: | |||
return scheme in URI_SCHEMES | |||
return scheme in URI_SCHEMES and not URI_SCHEMES[scheme] |
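# A brief usage sketch of the helpers above (illustrative only; the expected
# values follow directly from the tables defined in this module):
from mwparserfromhell import definitions

assert definitions.is_scheme("http")             # listed URI scheme
assert not definitions.is_scheme("foo")          # unknown scheme
assert definitions.get_html_tag("*") == "li"     # wiki list markup -> HTML tag
assert definitions.is_single_only("br")          # <br> must not have a close tag
assert definitions.is_parsable("ref")            # not in PARSER_BLACKLIST
assert not definitions.is_parsable("nowiki")     # <nowiki> contents are left alone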
@@ -69,6 +69,7 @@ from . import extras | |||
from .text import Text | |||
from .argument import Argument | |||
from .comment import Comment | |||
from .external_link import ExternalLink | |||
from .heading import Heading | |||
from .html_entity import HTMLEntity | |||
from .tag import Tag | |||
@@ -0,0 +1,97 @@ | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net> | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy | |||
# of this software and associated documentation files (the "Software"), to deal | |||
# in the Software without restriction, including without limitation the rights | |||
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |||
# copies of the Software, and to permit persons to whom the Software is | |||
# furnished to do so, subject to the following conditions: | |||
# | |||
# The above copyright notice and this permission notice shall be included in | |||
# all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |||
# SOFTWARE. | |||
from __future__ import unicode_literals | |||
from . import Node | |||
from ..compat import str | |||
from ..utils import parse_anything | |||
__all__ = ["ExternalLink"] | |||
class ExternalLink(Node): | |||
"""Represents an external link, like ``[http://example.com/ Example]``.""" | |||
def __init__(self, url, title=None, brackets=True): | |||
super(ExternalLink, self).__init__() | |||
self._url = url | |||
self._title = title | |||
self._brackets = brackets | |||
def __unicode__(self): | |||
if self.brackets: | |||
if self.title is not None: | |||
return "[" + str(self.url) + " " + str(self.title) + "]" | |||
return "[" + str(self.url) + "]" | |||
return str(self.url) | |||
def __iternodes__(self, getter): | |||
yield None, self | |||
for child in getter(self.url): | |||
yield self.url, child | |||
if self.title is not None: | |||
for child in getter(self.title): | |||
yield self.title, child | |||
def __strip__(self, normalize, collapse): | |||
if self.brackets: | |||
if self.title: | |||
return self.title.strip_code(normalize, collapse) | |||
return None | |||
return self.url.strip_code(normalize, collapse) | |||
def __showtree__(self, write, get, mark): | |||
if self.brackets: | |||
write("[") | |||
get(self.url) | |||
if self.title is not None: | |||
get(self.title) | |||
if self.brackets: | |||
write("]") | |||
@property | |||
def url(self): | |||
"""The URL of the link target, as a :py:class:`~.Wikicode` object.""" | |||
return self._url | |||
@property | |||
def title(self): | |||
"""The link title (if given), as a :py:class:`~.Wikicode` object.""" | |||
return self._title | |||
@property | |||
def brackets(self): | |||
"""Whether to enclose the URL in brackets or display it straight.""" | |||
return self._brackets | |||
@url.setter | |||
def url(self, value): | |||
from ..parser import contexts | |||
self._url = parse_anything(value, contexts.EXT_LINK_URI) | |||
@title.setter | |||
def title(self, value): | |||
self._title = None if value is None else parse_anything(value) | |||
@brackets.setter | |||
def brackets(self, value): | |||
self._brackets = bool(value) |
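# A rough usage sketch of the new node (this assumes Wikicode grows a
# filter_external_links() helper analogous to filter_templates()):
import mwparserfromhell

code = mwparserfromhell.parse("See [http://example.com/ Example] for details.")
link = code.filter_external_links()[0]
assert str(link.url) == "http://example.com/"
assert str(link.title) == "Example"
assert link.brackets                     # a bare URL would have brackets=False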
@@ -36,18 +36,34 @@ class Attribute(StringMixIn): | |||
whose value is ``"foo"``. | |||
""" | |||
def __init__(self, name, value=None, quoted=True): | |||
def __init__(self, name, value=None, quoted=True, pad_first=" ", | |||
pad_before_eq="", pad_after_eq=""): | |||
super(Attribute, self).__init__() | |||
self._name = name | |||
self._value = value | |||
self._quoted = quoted | |||
self._pad_first = pad_first | |||
self._pad_before_eq = pad_before_eq | |||
self._pad_after_eq = pad_after_eq | |||
def __unicode__(self): | |||
if self.value: | |||
result = self.pad_first + str(self.name) + self.pad_before_eq | |||
if self.value is not None: | |||
result += "=" + self.pad_after_eq | |||
if self.quoted: | |||
return str(self.name) + '="' + str(self.value) + '"' | |||
return str(self.name) + "=" + str(self.value) | |||
return str(self.name) | |||
return result + '"' + str(self.value) + '"' | |||
return result + str(self.value) | |||
return result | |||
def _set_padding(self, attr, value): | |||
"""Setter for the value of a padding attribute.""" | |||
if not value: | |||
setattr(self, attr, "") | |||
else: | |||
value = str(value) | |||
if not value.isspace(): | |||
raise ValueError("padding must be entirely whitespace") | |||
setattr(self, attr, value) | |||
@property | |||
def name(self): | |||
@@ -64,14 +80,41 @@ class Attribute(StringMixIn): | |||
"""Whether the attribute's value is quoted with double quotes.""" | |||
return self._quoted | |||
@property | |||
def pad_first(self): | |||
"""Spacing to insert right before the attribute.""" | |||
return self._pad_first | |||
@property | |||
def pad_before_eq(self): | |||
"""Spacing to insert right before the equal sign.""" | |||
return self._pad_before_eq | |||
@property | |||
def pad_after_eq(self): | |||
"""Spacing to insert right after the equal sign.""" | |||
return self._pad_after_eq | |||
@name.setter | |||
def name(self, newval): | |||
self._name = parse_anything(newval) | |||
def name(self, value): | |||
self._name = parse_anything(value) | |||
@value.setter | |||
def value(self, newval): | |||
self._value = parse_anything(newval) | |||
self._value = None if newval is None else parse_anything(newval) | |||
@quoted.setter | |||
def quoted(self, newval): | |||
self._quoted = bool(newval) | |||
def quoted(self, value): | |||
self._quoted = bool(value) | |||
@pad_first.setter | |||
def pad_first(self, value): | |||
self._set_padding("_pad_first", value) | |||
@pad_before_eq.setter | |||
def pad_before_eq(self, value): | |||
self._set_padding("_pad_before_eq", value) | |||
@pad_after_eq.setter | |||
def pad_after_eq(self, value): | |||
self._set_padding("_pad_after_eq", value) |
@@ -22,8 +22,10 @@ | |||
from __future__ import unicode_literals | |||
from . import Node, Text | |||
from . import Node | |||
from .extras import Attribute | |||
from ..compat import str | |||
from ..definitions import is_visible | |||
from ..utils import parse_anything | |||
__all__ = ["Tag"] | |||
@@ -31,146 +33,85 @@ __all__ = ["Tag"] | |||
class Tag(Node): | |||
"""Represents an HTML-style tag in wikicode, like ``<ref>``.""" | |||
TAG_UNKNOWN = 0 | |||
# Basic HTML: | |||
TAG_ITALIC = 1 | |||
TAG_BOLD = 2 | |||
TAG_UNDERLINE = 3 | |||
TAG_STRIKETHROUGH = 4 | |||
TAG_UNORDERED_LIST = 5 | |||
TAG_ORDERED_LIST = 6 | |||
TAG_DEF_TERM = 7 | |||
TAG_DEF_ITEM = 8 | |||
TAG_BLOCKQUOTE = 9 | |||
TAG_RULE = 10 | |||
TAG_BREAK = 11 | |||
TAG_ABBR = 12 | |||
TAG_PRE = 13 | |||
TAG_MONOSPACE = 14 | |||
TAG_CODE = 15 | |||
TAG_SPAN = 16 | |||
TAG_DIV = 17 | |||
TAG_FONT = 18 | |||
TAG_SMALL = 19 | |||
TAG_BIG = 20 | |||
TAG_CENTER = 21 | |||
# MediaWiki parser hooks: | |||
TAG_REF = 101 | |||
TAG_GALLERY = 102 | |||
TAG_MATH = 103 | |||
TAG_NOWIKI = 104 | |||
TAG_NOINCLUDE = 105 | |||
TAG_INCLUDEONLY = 106 | |||
TAG_ONLYINCLUDE = 107 | |||
# Additional parser hooks: | |||
TAG_SYNTAXHIGHLIGHT = 201 | |||
TAG_POEM = 202 | |||
# Lists of tags: | |||
TAGS_INVISIBLE = set((TAG_REF, TAG_GALLERY, TAG_MATH, TAG_NOINCLUDE)) | |||
TAGS_VISIBLE = set(range(300)) - TAGS_INVISIBLE | |||
def __init__(self, type_, tag, contents=None, attrs=None, showtag=True, | |||
self_closing=False, open_padding=0, close_padding=0): | |||
def __init__(self, tag, contents=None, attrs=None, wiki_markup=None, | |||
self_closing=False, invalid=False, implicit=False, padding="", | |||
closing_tag=None): | |||
super(Tag, self).__init__() | |||
self._type = type_ | |||
self._tag = tag | |||
self._contents = contents | |||
if attrs: | |||
self._attrs = attrs | |||
if contents is None and not self_closing: | |||
self._contents = parse_anything("") | |||
else: | |||
self._attrs = [] | |||
self._showtag = showtag | |||
self._contents = contents | |||
self._attrs = attrs if attrs else [] | |||
self._wiki_markup = wiki_markup | |||
self._self_closing = self_closing | |||
self._open_padding = open_padding | |||
self._close_padding = close_padding | |||
self._invalid = invalid | |||
self._implicit = implicit | |||
self._padding = padding | |||
if closing_tag: | |||
self._closing_tag = closing_tag | |||
else: | |||
self._closing_tag = tag | |||
def __unicode__(self): | |||
if not self.showtag: | |||
open_, close = self._translate() | |||
if self.wiki_markup: | |||
if self.self_closing: | |||
return open_ | |||
return self.wiki_markup | |||
else: | |||
return open_ + str(self.contents) + close | |||
return self.wiki_markup + str(self.contents) + self.wiki_markup | |||
result = "<" + str(self.tag) | |||
if self.attrs: | |||
result += " " + " ".join([str(attr) for attr in self.attrs]) | |||
result = ("</" if self.invalid else "<") + str(self.tag) | |||
if self.attributes: | |||
result += "".join([str(attr) for attr in self.attributes]) | |||
if self.self_closing: | |||
result += " " * self.open_padding + "/>" | |||
result += self.padding + (">" if self.implicit else "/>") | |||
else: | |||
result += " " * self.open_padding + ">" + str(self.contents) | |||
result += "</" + str(self.tag) + " " * self.close_padding + ">" | |||
result += self.padding + ">" + str(self.contents) | |||
result += "</" + str(self.closing_tag) + ">" | |||
return result | |||
def __iternodes__(self, getter): | |||
yield None, self | |||
if self.showtag: | |||
if not self.wiki_markup: | |||
for child in getter(self.tag): | |||
yield self.tag, child | |||
for attr in self.attrs: | |||
for attr in self.attributes: | |||
for child in getter(attr.name): | |||
yield attr.name, child | |||
if attr.value: | |||
for child in getter(attr.value): | |||
yield attr.value, child | |||
for child in getter(self.contents): | |||
yield self.contents, child | |||
if self.contents: | |||
for child in getter(self.contents): | |||
yield self.contents, child | |||
if not self.self_closing and not self.wiki_markup and self.closing_tag: | |||
for child in getter(self.closing_tag): | |||
yield self.closing_tag, child | |||
def __strip__(self, normalize, collapse): | |||
if self.type in self.TAGS_VISIBLE: | |||
if self.contents and is_visible(self.tag): | |||
return self.contents.strip_code(normalize, collapse) | |||
return None | |||
def __showtree__(self, write, get, mark): | |||
tagnodes = self.tag.nodes | |||
if (not self.attrs and len(tagnodes) == 1 and isinstance(tagnodes[0], Text)): | |||
write("<" + str(tagnodes[0]) + ">") | |||
write("</" if self.invalid else "<") | |||
get(self.tag) | |||
for attr in self.attributes: | |||
get(attr.name) | |||
if not attr.value: | |||
continue | |||
write(" = ") | |||
mark() | |||
get(attr.value) | |||
if self.self_closing: | |||
write(">" if self.implicit else "/>") | |||
else: | |||
write("<") | |||
get(self.tag) | |||
for attr in self.attrs: | |||
get(attr.name) | |||
if not attr.value: | |||
continue | |||
write(" = ") | |||
mark() | |||
get(attr.value) | |||
write(">") | |||
get(self.contents) | |||
if len(tagnodes) == 1 and isinstance(tagnodes[0], Text): | |||
write("</" + str(tagnodes[0]) + ">") | |||
else: | |||
get(self.contents) | |||
write("</") | |||
get(self.tag) | |||
get(self.closing_tag) | |||
write(">") | |||
def _translate(self): | |||
"""If the HTML-style tag has a wikicode representation, return that. | |||
For example, ``<b>Foo</b>`` can be represented as ``'''Foo'''``. This | |||
returns a tuple of the character starting the sequence and the | |||
character ending it. | |||
""" | |||
translations = { | |||
self.TAG_ITALIC: ("''", "''"), | |||
self.TAG_BOLD: ("'''", "'''"), | |||
self.TAG_UNORDERED_LIST: ("*", ""), | |||
self.TAG_ORDERED_LIST: ("#", ""), | |||
self.TAG_DEF_TERM: (";", ""), | |||
self.TAG_DEF_ITEM: (":", ""), | |||
self.TAG_RULE: ("----", ""), | |||
} | |||
return translations[self.type] | |||
@property | |||
def type(self): | |||
"""The tag type.""" | |||
return self._type | |||
@property | |||
def tag(self): | |||
"""The tag itself, as a :py:class:`~.Wikicode` object.""" | |||
@@ -182,7 +123,7 @@ class Tag(Node): | |||
return self._contents | |||
@property | |||
def attrs(self): | |||
def attributes(self): | |||
"""The list of attributes affecting the tag. | |||
Each attribute is an instance of :py:class:`~.Attribute`. | |||
@@ -190,52 +131,142 @@ class Tag(Node): | |||
return self._attrs | |||
@property | |||
def showtag(self): | |||
"""Whether to show the tag itself instead of a wikicode version.""" | |||
return self._showtag | |||
def wiki_markup(self): | |||
"""The wikified version of a tag to show instead of HTML. | |||
If set to a value, this will be displayed instead of the brackets. | |||
For example, set to ``''`` to replace ``<i>`` or ``----`` to replace | |||
``<hr>``. | |||
""" | |||
return self._wiki_markup | |||
@property | |||
def self_closing(self): | |||
"""Whether the tag is self-closing with no content.""" | |||
"""Whether the tag is self-closing with no content (like ``<br/>``).""" | |||
return self._self_closing | |||
@property | |||
def open_padding(self): | |||
"""How much spacing to insert before the first closing >.""" | |||
return self._open_padding | |||
def invalid(self): | |||
"""Whether the tag starts with a backslash after the opening bracket. | |||
This makes the tag look like a lone close tag. It is technically | |||
invalid and is only parsable Wikicode when the tag itself is | |||
single-only, like ``<br>`` and ``<img>``. See | |||
:py:func:`.definitions.is_single_only`. | |||
""" | |||
return self._invalid | |||
@property | |||
def close_padding(self): | |||
"""How much spacing to insert before the last closing >.""" | |||
return self._close_padding | |||
def implicit(self): | |||
"""Whether the tag is implicitly self-closing, with no ending slash. | |||
@type.setter | |||
def type(self, value): | |||
value = int(value) | |||
if value not in self.TAGS_INVISIBLE | self.TAGS_VISIBLE: | |||
raise ValueError(value) | |||
self._type = value | |||
This is only possible for specific "single" tags like ``<br>`` and | |||
``<li>``. See :py:func:`.definitions.is_single`. This field only has an | |||
effect if :py:attr:`self_closing` is also ``True``. | |||
""" | |||
return self._implicit | |||
@property | |||
def padding(self): | |||
"""Spacing to insert before the first closing ``>``.""" | |||
return self._padding | |||
@property | |||
def closing_tag(self): | |||
"""The closing tag, as a :py:class:`~.Wikicode` object. | |||
This will usually equal :py:attr:`tag`, unless there is additional | |||
spacing, comments, or the like. | |||
""" | |||
return self._closing_tag | |||
@tag.setter | |||
def tag(self, value): | |||
self._tag = parse_anything(value) | |||
self._tag = self._closing_tag = parse_anything(value) | |||
@contents.setter | |||
def contents(self, value): | |||
self._contents = parse_anything(value) | |||
@showtag.setter | |||
def showtag(self, value): | |||
self._showtag = bool(value) | |||
@wiki_markup.setter | |||
def wiki_markup(self, value): | |||
self._wiki_markup = str(value) if value else None | |||
@self_closing.setter | |||
def self_closing(self, value): | |||
self._self_closing = bool(value) | |||
@open_padding.setter | |||
def open_padding(self, value): | |||
self._open_padding = int(value) | |||
@invalid.setter | |||
def invalid(self, value): | |||
self._invalid = bool(value) | |||
@implicit.setter | |||
def implicit(self, value): | |||
self._implicit = bool(value) | |||
@close_padding.setter | |||
def close_padding(self, value): | |||
self._close_padding = int(value) | |||
@padding.setter | |||
def padding(self, value): | |||
if not value: | |||
self._padding = "" | |||
else: | |||
value = str(value) | |||
if not value.isspace(): | |||
raise ValueError("padding must be entirely whitespace") | |||
self._padding = value | |||
@closing_tag.setter | |||
def closing_tag(self, value): | |||
self._closing_tag = parse_anything(value) | |||
def has(self, name): | |||
"""Return whether any attribute in the tag has the given *name*. | |||
Note that a tag may have multiple attributes with the same name, but | |||
only the last one is read by the MediaWiki parser. | |||
""" | |||
for attr in self.attributes: | |||
if attr.name == name.strip(): | |||
return True | |||
return False | |||
def get(self, name): | |||
"""Get the attribute with the given *name*. | |||
The returned object is a :py:class:`~.Attribute` instance. Raises | |||
:py:exc:`ValueError` if no attribute has this name. Since multiple | |||
attributes can have the same name, we'll return the last match, since | |||
all but the last are ignored by the MediaWiki parser. | |||
""" | |||
for attr in reversed(self.attributes): | |||
if attr.name == name.strip(): | |||
return attr | |||
raise ValueError(name) | |||
def add(self, name, value=None, quoted=True, pad_first=" ", | |||
pad_before_eq="", pad_after_eq=""): | |||
"""Add an attribute with the given *name* and *value*. | |||
*name* and *value* can be anything parsable by
:py:func:`.utils.parse_anything`; *value* can be omitted if the | |||
attribute is valueless. *quoted* is a bool telling whether to wrap the | |||
*value* in double quotes (this is recommended). *pad_first*, | |||
*pad_before_eq*, and *pad_after_eq* are whitespace used as padding | |||
before the name, before the equal sign (or after the name if no value), | |||
and after the equal sign (ignored if no value), respectively. | |||
""" | |||
if value is not None: | |||
value = parse_anything(value) | |||
attr = Attribute(parse_anything(name), value, quoted) | |||
attr.pad_first = pad_first | |||
attr.pad_before_eq = pad_before_eq | |||
attr.pad_after_eq = pad_after_eq | |||
self.attributes.append(attr) | |||
return attr | |||
def remove(self, name): | |||
"""Remove all attributes with the given *name*.""" | |||
attrs = [attr for attr in self.attributes if attr.name == name.strip()] | |||
if not attrs: | |||
raise ValueError(name) | |||
for attr in attrs: | |||
self.attributes.remove(attr) |
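# A rough sketch of the new attribute API (this assumes Wikicode's
# filter_tags() helper, analogous to filter_templates()):
import mwparserfromhell

code = mwparserfromhell.parse('<ref name="foo">bar</ref>')
tag = code.filter_tags()[0]
assert str(tag.tag) == "ref" and str(tag.contents) == "bar"
assert str(tag.get("name").value) == "foo"
tag.add("group", "notes")                # appends ' group="notes"'
assert tag.has("group")
assert str(tag) == '<ref name="foo" group="notes">bar</ref>'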
@@ -26,7 +26,7 @@ import re | |||
from . import HTMLEntity, Node, Text | |||
from .extras import Parameter | |||
from ..compat import basestring, str | |||
from ..compat import str | |||
from ..utils import parse_anything | |||
__all__ = ["Template"] | |||
@@ -84,7 +84,7 @@ class Template(Node): | |||
replacement = str(HTMLEntity(value=ord(char))) | |||
for node in code.filter_text(recursive=False): | |||
if char in node: | |||
code.replace(node, node.replace(char, replacement)) | |||
code.replace(node, node.replace(char, replacement), False) | |||
def _blank_param_value(self, value): | |||
"""Remove the content from *value* while keeping its whitespace. | |||
@@ -164,15 +164,15 @@ class Template(Node): | |||
def name(self, value): | |||
self._name = parse_anything(value) | |||
def has_param(self, name, ignore_empty=True): | |||
def has(self, name, ignore_empty=True): | |||
"""Return ``True`` if any parameter in the template is named *name*. | |||
With *ignore_empty*, ``False`` will be returned even if the template | |||
contains a parameter with the name *name*, if the parameter's value | |||
is empty. Note that a template may have multiple parameters with the | |||
same name. | |||
same name, but only the last one is read by the MediaWiki parser. | |||
""" | |||
name = name.strip() if isinstance(name, basestring) else str(name) | |||
name = str(name).strip() | |||
for param in self.params: | |||
if param.name.strip() == name: | |||
if ignore_empty and not param.value.strip(): | |||
@@ -180,6 +180,9 @@ class Template(Node): | |||
return True | |||
return False | |||
has_param = lambda self, *args, **kwargs: self.has(*args, **kwargs) | |||
has_param.__doc__ = "Alias for :py:meth:`has`." | |||
def get(self, name): | |||
"""Get the parameter whose name is *name*. | |||
@@ -188,7 +191,7 @@ class Template(Node): | |||
parameters can have the same name, we'll return the last match, since | |||
the last parameter is the only one read by the MediaWiki parser. | |||
""" | |||
name = name.strip() if isinstance(name, basestring) else str(name) | |||
name = str(name).strip() | |||
for param in reversed(self.params): | |||
if param.name.strip() == name: | |||
return param | |||
@@ -226,7 +229,7 @@ class Template(Node): | |||
name, value = parse_anything(name), parse_anything(value) | |||
self._surface_escape(value, "|") | |||
if self.has_param(name): | |||
if self.has(name): | |||
self.remove(name, keep_field=True) | |||
existing = self.get(name) | |||
if showkey is not None: | |||
@@ -291,7 +294,7 @@ class Template(Node): | |||
the first instance if none have dependents, otherwise the one with | |||
dependents will be kept). | |||
""" | |||
name = name.strip() if isinstance(name, basestring) else str(name) | |||
name = str(name).strip() | |||
removed = False | |||
to_remove = [] | |||
for i, param in enumerate(self.params): | |||
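# For example, has() (with has_param() kept as an alias) honors ignore_empty:
import mwparserfromhell

template = mwparserfromhell.parse("{{foo|bar=|baz=qux}}").filter_templates()[0]
assert template.has("baz")
assert not template.has("bar")                  # empty value is skipped...
assert template.has("bar", ignore_empty=False)  # ...unless told otherwise
assert template.has_param("baz")                # old name still works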
@@ -46,16 +46,15 @@ class Parser(object): | |||
:py:class:`~.Node`\ s by the :py:class:`~.Builder`. | |||
""" | |||
def __init__(self, text): | |||
self.text = text | |||
def __init__(self): | |||
if use_c and CTokenizer: | |||
self._tokenizer = CTokenizer() | |||
else: | |||
self._tokenizer = Tokenizer() | |||
self._builder = Builder() | |||
def parse(self): | |||
"""Return a string as a parsed :py:class:`~.Wikicode` object tree.""" | |||
tokens = self._tokenizer.tokenize(self.text) | |||
def parse(self, text, context=0): | |||
"""Parse *text*, returning a :py:class:`~.Wikicode` object tree.""" | |||
tokens = self._tokenizer.tokenize(text, context) | |||
code = self._builder.build(tokens) | |||
return code |
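# Sketch of the updated API: the Parser is now constructed once and reused,
# with the text (and an optional starting context) passed to parse():
from mwparserfromhell.parser import Parser

parser = Parser()
code = parser.parse("{{foo|bar}}")       # context defaults to 0
assert str(code.filter_templates()[0].name) == "foo"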
@@ -24,8 +24,8 @@ from __future__ import unicode_literals | |||
from . import tokens | |||
from ..compat import str | |||
from ..nodes import (Argument, Comment, Heading, HTMLEntity, Tag, Template, | |||
Text, Wikilink) | |||
from ..nodes import (Argument, Comment, ExternalLink, Heading, HTMLEntity, Tag, | |||
Template, Text, Wikilink) | |||
from ..nodes.extras import Attribute, Parameter | |||
from ..smart_list import SmartList | |||
from ..wikicode import Wikicode | |||
@@ -83,7 +83,7 @@ class Builder(object): | |||
tokens.TemplateClose)): | |||
self._tokens.append(token) | |||
value = self._pop() | |||
if not key: | |||
if key is None: | |||
key = self._wrap([Text(str(default))]) | |||
return Parameter(key, value, showkey) | |||
else: | |||
@@ -142,6 +142,22 @@ class Builder(object): | |||
else: | |||
self._write(self._handle_token(token)) | |||
def _handle_external_link(self, token): | |||
"""Handle when an external link is at the head of the tokens.""" | |||
brackets, url = token.brackets, None | |||
self._push() | |||
while self._tokens: | |||
token = self._tokens.pop() | |||
if isinstance(token, tokens.ExternalLinkSeparator): | |||
url = self._pop() | |||
self._push() | |||
elif isinstance(token, tokens.ExternalLinkClose): | |||
if url is not None: | |||
return ExternalLink(url, self._pop(), brackets) | |||
return ExternalLink(self._pop(), brackets=brackets) | |||
else: | |||
self._write(self._handle_token(token)) | |||
def _handle_entity(self): | |||
"""Handle a case where an HTML entity is at the head of the tokens.""" | |||
token = self._tokens.pop() | |||
@@ -170,7 +186,7 @@ class Builder(object): | |||
self._write(self._handle_token(token)) | |||
def _handle_comment(self): | |||
"""Handle a case where a hidden comment is at the head of the tokens.""" | |||
"""Handle a case where an HTML comment is at the head of the tokens.""" | |||
self._push() | |||
while self._tokens: | |||
token = self._tokens.pop() | |||
@@ -180,7 +196,7 @@ class Builder(object): | |||
else: | |||
self._write(self._handle_token(token)) | |||
def _handle_attribute(self): | |||
def _handle_attribute(self, start): | |||
"""Handle a case where a tag attribute is at the head of the tokens.""" | |||
name, quoted = None, False | |||
self._push() | |||
@@ -191,37 +207,46 @@ class Builder(object): | |||
self._push() | |||
elif isinstance(token, tokens.TagAttrQuote): | |||
quoted = True | |||
elif isinstance(token, (tokens.TagAttrStart, | |||
tokens.TagCloseOpen)): | |||
elif isinstance(token, (tokens.TagAttrStart, tokens.TagCloseOpen, | |||
tokens.TagCloseSelfclose)): | |||
self._tokens.append(token) | |||
if name is not None: | |||
return Attribute(name, self._pop(), quoted) | |||
return Attribute(self._pop(), quoted=quoted) | |||
if name: | |||
value = self._pop() | |||
else: | |||
name, value = self._pop(), None | |||
return Attribute(name, value, quoted, start.pad_first, | |||
start.pad_before_eq, start.pad_after_eq) | |||
else: | |||
self._write(self._handle_token(token)) | |||
def _handle_tag(self, token): | |||
"""Handle a case where a tag is at the head of the tokens.""" | |||
type_, showtag = token.type, token.showtag | |||
attrs = [] | |||
close_tokens = (tokens.TagCloseSelfclose, tokens.TagCloseClose) | |||
implicit, attrs, contents, closing_tag = False, [], None, None | |||
wiki_markup, invalid = token.wiki_markup, token.invalid or False | |||
self._push() | |||
while self._tokens: | |||
token = self._tokens.pop() | |||
if isinstance(token, tokens.TagAttrStart): | |||
attrs.append(self._handle_attribute()) | |||
attrs.append(self._handle_attribute(token)) | |||
elif isinstance(token, tokens.TagCloseOpen): | |||
open_pad = token.padding | |||
padding = token.padding or "" | |||
tag = self._pop() | |||
self._push() | |||
elif isinstance(token, tokens.TagCloseSelfclose): | |||
tag = self._pop() | |||
return Tag(type_, tag, attrs=attrs, showtag=showtag, | |||
self_closing=True, open_padding=token.padding) | |||
elif isinstance(token, tokens.TagOpenClose): | |||
contents = self._pop() | |||
elif isinstance(token, tokens.TagCloseClose): | |||
return Tag(type_, tag, contents, attrs, showtag, False, | |||
open_pad, token.padding) | |||
self._push() | |||
elif isinstance(token, close_tokens): | |||
if isinstance(token, tokens.TagCloseSelfclose): | |||
tag = self._pop() | |||
self_closing = True | |||
padding = token.padding or "" | |||
implicit = token.implicit or False | |||
else: | |||
self_closing = False | |||
closing_tag = self._pop() | |||
return Tag(tag, contents, attrs, wiki_markup, self_closing, | |||
invalid, implicit, padding, closing_tag) | |||
else: | |||
self._write(self._handle_token(token)) | |||
@@ -235,6 +260,8 @@ class Builder(object): | |||
return self._handle_argument() | |||
elif isinstance(token, tokens.WikilinkOpen): | |||
return self._handle_wikilink() | |||
elif isinstance(token, tokens.ExternalLinkOpen): | |||
return self._handle_external_link(token) | |||
elif isinstance(token, tokens.HTMLEntityStart): | |||
return self._handle_entity() | |||
elif isinstance(token, tokens.HeadingStart): | |||
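# The Builder turns the tokenizer's flat token stream into a node tree; a tiny
# hand-built sketch (token classes accept arbitrary keyword attributes):
from mwparserfromhell.parser import tokens
from mwparserfromhell.parser.builder import Builder

toks = [tokens.TemplateOpen(), tokens.Text(text="foo"), tokens.TemplateClose()]
code = Builder().build(toks)
assert str(code) == "{{foo}}"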
@@ -51,6 +51,12 @@ Local (stack-specific) contexts: | |||
* :py:const:`WIKILINK_TITLE` | |||
* :py:const:`WIKILINK_TEXT` | |||
* :py:const:`EXT_LINK` | |||
* :py:const:`EXT_LINK_URI` | |||
* :py:const:`EXT_LINK_TITLE` | |||
* :py:const:`EXT_LINK_BRACKETS` | |||
* :py:const:`HEADING` | |||
* :py:const:`HEADING_LEVEL_1` | |||
@@ -60,7 +66,21 @@ Local (stack-specific) contexts: | |||
* :py:const:`HEADING_LEVEL_5` | |||
* :py:const:`HEADING_LEVEL_6` | |||
* :py:const:`COMMENT` | |||
* :py:const:`TAG` | |||
* :py:const:`TAG_OPEN` | |||
* :py:const:`TAG_ATTR` | |||
* :py:const:`TAG_BODY` | |||
* :py:const:`TAG_CLOSE` | |||
* :py:const:`STYLE` | |||
* :py:const:`STYLE_ITALICS` | |||
* :py:const:`STYLE_BOLD` | |||
* :py:const:`STYLE_PASS_AGAIN` | |||
* :py:const:`STYLE_SECOND_PASS` | |||
* :py:const:`DL_TERM` | |||
* :py:const:`SAFETY_CHECK` | |||
@@ -74,41 +94,76 @@ Local (stack-specific) contexts: | |||
Global contexts: | |||
* :py:const:`GL_HEADING` | |||
Aggregate contexts: | |||
* :py:const:`FAIL` | |||
* :py:const:`UNSAFE` | |||
* :py:const:`DOUBLE` | |||
* :py:const:`INVALID_LINK` | |||
""" | |||
# Local contexts: | |||
TEMPLATE = 0b00000000000000000111 | |||
TEMPLATE_NAME = 0b00000000000000000001 | |||
TEMPLATE_PARAM_KEY = 0b00000000000000000010 | |||
TEMPLATE_PARAM_VALUE = 0b00000000000000000100 | |||
ARGUMENT = 0b00000000000000011000 | |||
ARGUMENT_NAME = 0b00000000000000001000 | |||
ARGUMENT_DEFAULT = 0b00000000000000010000 | |||
WIKILINK = 0b00000000000001100000 | |||
WIKILINK_TITLE = 0b00000000000000100000 | |||
WIKILINK_TEXT = 0b00000000000001000000 | |||
HEADING = 0b00000001111110000000 | |||
HEADING_LEVEL_1 = 0b00000000000010000000 | |||
HEADING_LEVEL_2 = 0b00000000000100000000 | |||
HEADING_LEVEL_3 = 0b00000000001000000000 | |||
HEADING_LEVEL_4 = 0b00000000010000000000 | |||
HEADING_LEVEL_5 = 0b00000000100000000000 | |||
HEADING_LEVEL_6 = 0b00000001000000000000 | |||
COMMENT = 0b00000010000000000000 | |||
SAFETY_CHECK = 0b11111100000000000000 | |||
HAS_TEXT = 0b00000100000000000000 | |||
FAIL_ON_TEXT = 0b00001000000000000000 | |||
FAIL_NEXT = 0b00010000000000000000 | |||
FAIL_ON_LBRACE = 0b00100000000000000000 | |||
FAIL_ON_RBRACE = 0b01000000000000000000 | |||
FAIL_ON_EQUALS = 0b10000000000000000000 | |||
TEMPLATE_NAME = 1 << 0 | |||
TEMPLATE_PARAM_KEY = 1 << 1 | |||
TEMPLATE_PARAM_VALUE = 1 << 2 | |||
TEMPLATE = TEMPLATE_NAME + TEMPLATE_PARAM_KEY + TEMPLATE_PARAM_VALUE | |||
ARGUMENT_NAME = 1 << 3 | |||
ARGUMENT_DEFAULT = 1 << 4 | |||
ARGUMENT = ARGUMENT_NAME + ARGUMENT_DEFAULT | |||
WIKILINK_TITLE = 1 << 5 | |||
WIKILINK_TEXT = 1 << 6 | |||
WIKILINK = WIKILINK_TITLE + WIKILINK_TEXT | |||
EXT_LINK_URI = 1 << 7 | |||
EXT_LINK_TITLE = 1 << 8 | |||
EXT_LINK_BRACKETS = 1 << 9 | |||
EXT_LINK = EXT_LINK_URI + EXT_LINK_TITLE + EXT_LINK_BRACKETS | |||
HEADING_LEVEL_1 = 1 << 10 | |||
HEADING_LEVEL_2 = 1 << 11 | |||
HEADING_LEVEL_3 = 1 << 12 | |||
HEADING_LEVEL_4 = 1 << 13 | |||
HEADING_LEVEL_5 = 1 << 14 | |||
HEADING_LEVEL_6 = 1 << 15 | |||
HEADING = (HEADING_LEVEL_1 + HEADING_LEVEL_2 + HEADING_LEVEL_3 + | |||
HEADING_LEVEL_4 + HEADING_LEVEL_5 + HEADING_LEVEL_6) | |||
TAG_OPEN = 1 << 16 | |||
TAG_ATTR = 1 << 17 | |||
TAG_BODY = 1 << 18 | |||
TAG_CLOSE = 1 << 19 | |||
TAG = TAG_OPEN + TAG_ATTR + TAG_BODY + TAG_CLOSE | |||
STYLE_ITALICS = 1 << 20 | |||
STYLE_BOLD = 1 << 21 | |||
STYLE_PASS_AGAIN = 1 << 22 | |||
STYLE_SECOND_PASS = 1 << 23 | |||
STYLE = STYLE_ITALICS + STYLE_BOLD + STYLE_PASS_AGAIN + STYLE_SECOND_PASS | |||
DL_TERM = 1 << 24 | |||
HAS_TEXT = 1 << 25 | |||
FAIL_ON_TEXT = 1 << 26 | |||
FAIL_NEXT = 1 << 27 | |||
FAIL_ON_LBRACE = 1 << 28 | |||
FAIL_ON_RBRACE = 1 << 29 | |||
FAIL_ON_EQUALS = 1 << 30 | |||
SAFETY_CHECK = (HAS_TEXT + FAIL_ON_TEXT + FAIL_NEXT + FAIL_ON_LBRACE + | |||
FAIL_ON_RBRACE + FAIL_ON_EQUALS) | |||
# Global contexts: | |||
GL_HEADING = 0b1 | |||
GL_HEADING = 1 << 0 | |||
# Aggregate contexts: | |||
FAIL = TEMPLATE + ARGUMENT + WIKILINK + EXT_LINK_TITLE + HEADING + TAG + STYLE | |||
UNSAFE = (TEMPLATE_NAME + WIKILINK + EXT_LINK_TITLE + TEMPLATE_PARAM_KEY + | |||
ARGUMENT_NAME + TAG_CLOSE) | |||
DOUBLE = TEMPLATE_PARAM_KEY + TAG_CLOSE | |||
INVALID_LINK = TEMPLATE_NAME + ARGUMENT_NAME + WIKILINK + EXT_LINK |
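# Contexts are bit flags, so membership tests are simple bitwise checks:
from mwparserfromhell.parser import contexts

ctx = contexts.TEMPLATE_NAME | contexts.HAS_TEXT
assert ctx & contexts.TEMPLATE           # somewhere inside a template
assert ctx & contexts.FAIL               # TEMPLATE is part of the FAIL aggregate
assert not (ctx & contexts.TAG)          # not inside an HTML tag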
@@ -28,6 +28,7 @@ SOFTWARE. | |||
#include <Python.h> | |||
#include <math.h> | |||
#include <structmember.h> | |||
#include <bytesobject.h> | |||
#if PY_MAJOR_VERSION >= 3 | |||
#define IS_PY3K | |||
@@ -41,8 +42,8 @@ SOFTWARE. | |||
#define ALPHANUM "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" | |||
static const char* MARKERS[] = { | |||
"{", "}", "[", "]", "<", ">", "|", "=", "&", "#", "*", ";", ":", "/", "-", | |||
"!", "\n", ""}; | |||
"{", "}", "[", "]", "<", ">", "|", "=", "&", "'", "#", "*", ";", ":", "/", | |||
"-", "\n", ""}; | |||
#define NUM_MARKERS 18 | |||
#define TEXTBUFFER_BLOCKSIZE 1024 | |||
@@ -51,19 +52,20 @@ static const char* MARKERS[] = { | |||
#define MAX_BRACES 255 | |||
#define MAX_ENTITY_SIZE 8 | |||
static int route_state = 0; | |||
#define BAD_ROUTE (route_state) | |||
#define FAIL_ROUTE() (route_state = 1) | |||
#define RESET_ROUTE() (route_state = 0) | |||
static int route_state = 0, route_context = 0; | |||
#define BAD_ROUTE route_state | |||
#define BAD_ROUTE_CONTEXT route_context | |||
#define FAIL_ROUTE(context) route_state = 1; route_context = context | |||
#define RESET_ROUTE() route_state = 0 | |||
static char** entitydefs; | |||
static PyObject* EMPTY; | |||
static PyObject* NOARGS; | |||
static PyObject* tokens; | |||
static PyObject* definitions; | |||
/* Tokens */ | |||
/* Tokens: */ | |||
static PyObject* Text; | |||
@@ -80,6 +82,10 @@ static PyObject* WikilinkOpen; | |||
static PyObject* WikilinkSeparator; | |||
static PyObject* WikilinkClose; | |||
static PyObject* ExternalLinkOpen; | |||
static PyObject* ExternalLinkSeparator; | |||
static PyObject* ExternalLinkClose; | |||
static PyObject* HTMLEntityStart; | |||
static PyObject* HTMLEntityNumeric; | |||
static PyObject* HTMLEntityHex; | |||
@@ -102,47 +108,83 @@ static PyObject* TagCloseClose; | |||
/* Local contexts: */ | |||
#define LC_TEMPLATE 0x00007 | |||
#define LC_TEMPLATE_NAME 0x00001 | |||
#define LC_TEMPLATE_PARAM_KEY 0x00002 | |||
#define LC_TEMPLATE_PARAM_VALUE 0x00004 | |||
#define LC_ARGUMENT 0x00018 | |||
#define LC_ARGUMENT_NAME 0x00008 | |||
#define LC_ARGUMENT_DEFAULT 0x00010 | |||
#define LC_WIKILINK 0x00060 | |||
#define LC_WIKILINK_TITLE 0x00020 | |||
#define LC_WIKILINK_TEXT 0x00040 | |||
#define LC_HEADING 0x01F80 | |||
#define LC_HEADING_LEVEL_1 0x00080 | |||
#define LC_HEADING_LEVEL_2 0x00100 | |||
#define LC_HEADING_LEVEL_3 0x00200 | |||
#define LC_HEADING_LEVEL_4 0x00400 | |||
#define LC_HEADING_LEVEL_5 0x00800 | |||
#define LC_HEADING_LEVEL_6 0x01000 | |||
#define LC_COMMENT 0x02000 | |||
#define LC_SAFETY_CHECK 0xFC000 | |||
#define LC_HAS_TEXT 0x04000 | |||
#define LC_FAIL_ON_TEXT 0x08000 | |||
#define LC_FAIL_NEXT 0x10000 | |||
#define LC_FAIL_ON_LBRACE 0x20000 | |||
#define LC_FAIL_ON_RBRACE 0x40000 | |||
#define LC_FAIL_ON_EQUALS 0x80000 | |||
#define LC_TEMPLATE 0x00000007 | |||
#define LC_TEMPLATE_NAME 0x00000001 | |||
#define LC_TEMPLATE_PARAM_KEY 0x00000002 | |||
#define LC_TEMPLATE_PARAM_VALUE 0x00000004 | |||
#define LC_ARGUMENT 0x00000018 | |||
#define LC_ARGUMENT_NAME 0x00000008 | |||
#define LC_ARGUMENT_DEFAULT 0x00000010 | |||
#define LC_WIKILINK 0x00000060 | |||
#define LC_WIKILINK_TITLE 0x00000020 | |||
#define LC_WIKILINK_TEXT 0x00000040 | |||
#define LC_EXT_LINK 0x00000380 | |||
#define LC_EXT_LINK_URI 0x00000080 | |||
#define LC_EXT_LINK_TITLE 0x00000100 | |||
#define LC_EXT_LINK_BRACKETS 0x00000200 | |||
#define LC_HEADING 0x0000FC00 | |||
#define LC_HEADING_LEVEL_1 0x00000400 | |||
#define LC_HEADING_LEVEL_2 0x00000800 | |||
#define LC_HEADING_LEVEL_3 0x00001000 | |||
#define LC_HEADING_LEVEL_4 0x00002000 | |||
#define LC_HEADING_LEVEL_5 0x00004000 | |||
#define LC_HEADING_LEVEL_6 0x00008000 | |||
#define LC_TAG 0x000F0000 | |||
#define LC_TAG_OPEN 0x00010000 | |||
#define LC_TAG_ATTR 0x00020000 | |||
#define LC_TAG_BODY 0x00040000 | |||
#define LC_TAG_CLOSE 0x00080000 | |||
#define LC_STYLE 0x00F00000 | |||
#define LC_STYLE_ITALICS 0x00100000 | |||
#define LC_STYLE_BOLD 0x00200000 | |||
#define LC_STYLE_PASS_AGAIN 0x00400000 | |||
#define LC_STYLE_SECOND_PASS 0x00800000 | |||
#define LC_DLTERM 0x01000000 | |||
#define LC_SAFETY_CHECK 0x7E000000 | |||
#define LC_HAS_TEXT 0x02000000 | |||
#define LC_FAIL_ON_TEXT 0x04000000 | |||
#define LC_FAIL_NEXT 0x08000000 | |||
#define LC_FAIL_ON_LBRACE 0x10000000 | |||
#define LC_FAIL_ON_RBRACE 0x20000000 | |||
#define LC_FAIL_ON_EQUALS 0x40000000 | |||
/* Global contexts: */ | |||
#define GL_HEADING 0x1 | |||
/* Aggregate contexts: */ | |||
#define AGG_FAIL (LC_TEMPLATE | LC_ARGUMENT | LC_WIKILINK | LC_EXT_LINK_TITLE | LC_HEADING | LC_TAG | LC_STYLE) | |||
#define AGG_UNSAFE (LC_TEMPLATE_NAME | LC_WIKILINK | LC_EXT_LINK_TITLE | LC_TEMPLATE_PARAM_KEY | LC_ARGUMENT_NAME) | |||
#define AGG_DOUBLE (LC_TEMPLATE_PARAM_KEY | LC_TAG_CLOSE) | |||
#define AGG_INVALID_LINK (LC_TEMPLATE_NAME | LC_ARGUMENT_NAME | LC_WIKILINK | LC_EXT_LINK) | |||
/* Tag contexts: */ | |||
#define TAG_NAME 0x01 | |||
#define TAG_ATTR_READY 0x02 | |||
#define TAG_ATTR_NAME 0x04 | |||
#define TAG_ATTR_VALUE 0x08 | |||
#define TAG_QUOTED 0x10 | |||
#define TAG_NOTE_SPACE 0x20 | |||
#define TAG_NOTE_EQUALS 0x40 | |||
#define TAG_NOTE_QUOTE 0x80 | |||
/* Miscellaneous structs: */ | |||
struct Textbuffer { | |||
Py_ssize_t size; | |||
Py_UNICODE* data; | |||
struct Textbuffer* prev; | |||
struct Textbuffer* next; | |||
}; | |||
@@ -158,13 +200,24 @@ typedef struct { | |||
int level; | |||
} HeadingData; | |||
typedef struct { | |||
int context; | |||
struct Textbuffer* pad_first; | |||
struct Textbuffer* pad_before_eq; | |||
struct Textbuffer* pad_after_eq; | |||
Py_ssize_t reset; | |||
} TagData; | |||
typedef struct Textbuffer Textbuffer; | |||
typedef struct Stack Stack; | |||
/* Tokenizer object definition: */ | |||
typedef struct { | |||
PyObject_HEAD | |||
PyObject* text; /* text to tokenize */ | |||
struct Stack* topstack; /* topmost stack */ | |||
Stack* topstack; /* topmost stack */ | |||
Py_ssize_t head; /* current position in text */ | |||
Py_ssize_t length; /* length of text */ | |||
int global; /* global context */ | |||
@@ -173,78 +226,80 @@ typedef struct { | |||
} Tokenizer; | |||
/* Macros for accessing Tokenizer data: */ | |||
/* Macros related to Tokenizer functions: */ | |||
#define Tokenizer_READ(self, delta) (*PyUnicode_AS_UNICODE(Tokenizer_read(self, delta))) | |||
#define Tokenizer_READ_BACKWARDS(self, delta) \ | |||
(*PyUnicode_AS_UNICODE(Tokenizer_read_backwards(self, delta))) | |||
#define Tokenizer_CAN_RECURSE(self) (self->depth < MAX_DEPTH && self->cycles < MAX_CYCLES) | |||
#define Tokenizer_emit(self, token) Tokenizer_emit_token(self, token, 0) | |||
#define Tokenizer_emit_first(self, token) Tokenizer_emit_token(self, token, 1) | |||
#define Tokenizer_emit_kwargs(self, token, kwargs) Tokenizer_emit_token_kwargs(self, token, kwargs, 0) | |||
#define Tokenizer_emit_first_kwargs(self, token, kwargs) Tokenizer_emit_token_kwargs(self, token, kwargs, 1) | |||
/* Macros for accessing definitions: */ | |||
#define GET_HTML_TAG(markup) (markup == *":" ? "dd" : markup == *";" ? "dt" : "li") | |||
#define IS_PARSABLE(tag) (call_def_func("is_parsable", tag, NULL, NULL)) | |||
#define IS_SINGLE(tag) (call_def_func("is_single", tag, NULL, NULL)) | |||
#define IS_SINGLE_ONLY(tag) (call_def_func("is_single_only", tag, NULL, NULL)) | |||
#define IS_SCHEME(scheme, slashes, reverse) \ | |||
(call_def_func("is_scheme", scheme, slashes ? Py_True : Py_False, reverse ? Py_True : Py_False)) | |||
/* Function prototypes: */ | |||
static int heading_level_from_context(int); | |||
static Textbuffer* Textbuffer_new(void); | |||
static void Textbuffer_dealloc(Textbuffer*); | |||
static TagData* TagData_new(void); | |||
static void TagData_dealloc(TagData*); | |||
static PyObject* Tokenizer_new(PyTypeObject*, PyObject*, PyObject*); | |||
static struct Textbuffer* Textbuffer_new(void); | |||
static void Tokenizer_dealloc(Tokenizer*); | |||
static void Textbuffer_dealloc(struct Textbuffer*); | |||
static int Tokenizer_init(Tokenizer*, PyObject*, PyObject*); | |||
static int Tokenizer_push(Tokenizer*, int); | |||
static PyObject* Textbuffer_render(struct Textbuffer*); | |||
static int Tokenizer_push_textbuffer(Tokenizer*); | |||
static void Tokenizer_delete_top_of_stack(Tokenizer*); | |||
static PyObject* Tokenizer_pop(Tokenizer*); | |||
static PyObject* Tokenizer_pop_keeping_context(Tokenizer*); | |||
static void* Tokenizer_fail_route(Tokenizer*); | |||
static int Tokenizer_write(Tokenizer*, PyObject*); | |||
static int Tokenizer_write_first(Tokenizer*, PyObject*); | |||
static int Tokenizer_write_text(Tokenizer*, Py_UNICODE); | |||
static int Tokenizer_write_all(Tokenizer*, PyObject*); | |||
static int Tokenizer_write_text_then_stack(Tokenizer*, const char*); | |||
static PyObject* Tokenizer_read(Tokenizer*, Py_ssize_t); | |||
static PyObject* Tokenizer_read_backwards(Tokenizer*, Py_ssize_t); | |||
static int Tokenizer_parse_template_or_argument(Tokenizer*); | |||
static int Tokenizer_parse_template(Tokenizer*); | |||
static int Tokenizer_parse_argument(Tokenizer*); | |||
static int Tokenizer_handle_template_param(Tokenizer*); | |||
static int Tokenizer_handle_template_param_value(Tokenizer*); | |||
static PyObject* Tokenizer_handle_template_end(Tokenizer*); | |||
static int Tokenizer_handle_argument_separator(Tokenizer*); | |||
static PyObject* Tokenizer_handle_argument_end(Tokenizer*); | |||
static int Tokenizer_parse_wikilink(Tokenizer*); | |||
static int Tokenizer_handle_wikilink_separator(Tokenizer*); | |||
static PyObject* Tokenizer_handle_wikilink_end(Tokenizer*); | |||
static int Tokenizer_parse_heading(Tokenizer*); | |||
static HeadingData* Tokenizer_handle_heading_end(Tokenizer*); | |||
static int Tokenizer_really_parse_entity(Tokenizer*); | |||
static int Tokenizer_parse_entity(Tokenizer*); | |||
static int Tokenizer_parse_comment(Tokenizer*); | |||
static int Tokenizer_verify_safe(Tokenizer*, int, Py_UNICODE); | |||
static PyObject* Tokenizer_parse(Tokenizer*, int); | |||
static int Tokenizer_handle_dl_term(Tokenizer*); | |||
static int Tokenizer_parse_tag(Tokenizer*); | |||
static PyObject* Tokenizer_parse(Tokenizer*, int, int); | |||
static PyObject* Tokenizer_tokenize(Tokenizer*, PyObject*); | |||
/* Macros for Python 2/3 compatibility: */ | |||
#ifdef IS_PY3K | |||
#define NEW_INT_FUNC PyLong_FromSsize_t | |||
#define IMPORT_NAME_FUNC PyUnicode_FromString | |||
#define CREATE_MODULE PyModule_Create(&module_def); | |||
#define ENTITYDEFS_MODULE "html.entities" | |||
#define INIT_FUNC_NAME PyInit__tokenizer | |||
#define INIT_ERROR return NULL | |||
#else | |||
#define NEW_INT_FUNC PyInt_FromSsize_t | |||
#define IMPORT_NAME_FUNC PyBytes_FromString | |||
#define CREATE_MODULE Py_InitModule("_tokenizer", NULL); | |||
#define ENTITYDEFS_MODULE "htmlentitydefs" | |||
#define INIT_FUNC_NAME init_tokenizer | |||
#define INIT_ERROR return | |||
#endif | |||
/* More structs for creating the Tokenizer type: */ | |||
static PyMethodDef | |||
Tokenizer_methods[] = { | |||
static PyMethodDef Tokenizer_methods[] = { | |||
{"tokenize", (PyCFunction) Tokenizer_tokenize, METH_VARARGS, | |||
"Build a list of tokens from a string of wikicode and return it."}, | |||
{NULL} | |||
}; | |||
static PyMemberDef | |||
Tokenizer_members[] = { | |||
static PyMemberDef Tokenizer_members[] = { | |||
{NULL} | |||
}; | |||
static PyMethodDef | |||
module_methods[] = { | |||
{NULL} | |||
}; | |||
static PyTypeObject | |||
TokenizerType = { | |||
PyObject_HEAD_INIT(NULL) | |||
0, /* ob_size */ | |||
static PyTypeObject TokenizerType = { | |||
PyVarObject_HEAD_INIT(NULL, 0) | |||
"_tokenizer.CTokenizer", /* tp_name */ | |||
sizeof(Tokenizer), /* tp_basicsize */ | |||
0, /* tp_itemsize */ | |||
@@ -283,3 +338,12 @@ TokenizerType = { | |||
0, /* tp_alloc */ | |||
Tokenizer_new, /* tp_new */ | |||
}; | |||
#ifdef IS_PY3K | |||
static PyModuleDef module_def = { | |||
PyModuleDef_HEAD_INIT, | |||
"_tokenizer", | |||
"Creates a list of tokens from a string of wikicode.", | |||
-1, NULL, NULL, NULL, NULL, NULL | |||
}; | |||
#endif |
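
A rough sketch of driving the C extension directly, based on the type and method
definitions above (``_tokenizer.CTokenizer`` exposing ``tokenize()``). Normally the
``Parser`` chooses between the C and pure-Python tokenizers itself, so treat this
as illustrative only::

    # Illustrative only; in practice mwparserfromhell.parser.Parser picks the
    # C or pure-Python tokenizer automatically (see parser.use_c).
    from mwparserfromhell.parser._tokenizer import CTokenizer

    tokenizer = CTokenizer()
    tokens = tokenizer.tokenize("{{foo|bar}}")  # flat list of Token objects
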
@@ -30,7 +30,7 @@ into the :py:class`~.Wikicode` tree by the :py:class:`~.Builder`. | |||
from __future__ import unicode_literals | |||
from ..compat import basestring, py3k | |||
from ..compat import py3k, str | |||
__all__ = ["Token"] | |||
@@ -43,7 +43,7 @@ class Token(object): | |||
def __repr__(self): | |||
args = [] | |||
for key, value in self._kwargs.items(): | |||
if isinstance(value, basestring) and len(value) > 100: | |||
if isinstance(value, str) and len(value) > 100: | |||
args.append(key + "=" + repr(value[:97] + "...")) | |||
else: | |||
args.append(key + "=" + repr(value)) | |||
@@ -55,7 +55,7 @@ class Token(object): | |||
return False | |||
def __getattr__(self, key): | |||
return self._kwargs[key] | |||
return self._kwargs.get(key) | |||
def __setattr__(self, key, value): | |||
self._kwargs[key] = value | |||
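
With the ``__getattr__`` change above, asking a token for a keyword argument it was
never given now returns ``None`` instead of raising ``KeyError``; a small sketch of
the intended behavior (the ``padding`` lookup is just an example of an absent kwarg)::

    from mwparserfromhell.parser import tokens

    tok = tokens.Text(text="foo")
    tok.text     # "foo" -- stored in the token's keyword arguments
    tok.padding  # None  -- a missing kwarg no longer raises KeyError
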
@@ -84,6 +84,10 @@ WikilinkOpen = make("WikilinkOpen") # [[ | |||
WikilinkSeparator = make("WikilinkSeparator") # | | |||
WikilinkClose = make("WikilinkClose") # ]] | |||
ExternalLinkOpen = make("ExternalLinkOpen") # [ | |||
ExternalLinkSeparator = make("ExternalLinkSeparator") # | |||
ExternalLinkClose = make("ExternalLinkClose") # ] | |||
HTMLEntityStart = make("HTMLEntityStart") # & | |||
HTMLEntityNumeric = make("HTMLEntityNumeric") # # | |||
HTMLEntityHex = make("HTMLEntityHex") # x | |||
@@ -31,7 +31,9 @@ from .compat import bytes, str | |||
from .nodes import Node | |||
from .smart_list import SmartList | |||
def parse_anything(value): | |||
__all__ = ["parse_anything"] | |||
def parse_anything(value, context=0): | |||
"""Return a :py:class:`~.Wikicode` for *value*, allowing multiple types. | |||
This differs from :py:meth:`.Parser.parse` in that we accept more than just | |||
@@ -42,6 +44,12 @@ def parse_anything(value): | |||
on-the-fly by various methods of :py:class:`~.Wikicode` and others like | |||
:py:class:`~.Template`, such as :py:meth:`wikicode.insert() | |||
<.Wikicode.insert>` or setting :py:meth:`template.name <.Template.name>`. | |||
If given, *context* will be passed as a starting context to the parser. | |||
This is helpful when this function is used inside node attribute setters. | |||
For example, :py:class:`~.ExternalLink`\ 's :py:attr:`~.ExternalLink.url` | |||
setter sets *context* to :py:mod:`contexts.EXT_LINK_URI <.contexts>` to | |||
prevent the URL itself from becoming an :py:class:`~.ExternalLink`. | |||
""" | |||
from .parser import Parser | |||
from .wikicode import Wikicode | |||
@@ -51,17 +59,17 @@ def parse_anything(value): | |||
elif isinstance(value, Node): | |||
return Wikicode(SmartList([value])) | |||
elif isinstance(value, str): | |||
return Parser(value).parse() | |||
return Parser().parse(value, context) | |||
elif isinstance(value, bytes): | |||
return Parser(value.decode("utf8")).parse() | |||
return Parser().parse(value.decode("utf8"), context) | |||
elif isinstance(value, int): | |||
return Parser(str(value)).parse() | |||
return Parser().parse(str(value), context) | |||
elif value is None: | |||
return Wikicode(SmartList()) | |||
try: | |||
nodelist = SmartList() | |||
for item in value: | |||
nodelist += parse_anything(item).nodes | |||
nodelist += parse_anything(item, context).nodes | |||
except TypeError: | |||
error = "Needs string, Node, Wikicode, int, None, or iterable of these, but got {0}: {1}" | |||
raise ValueError(error.format(type(value).__name__, value)) | |||
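
Putting the branches above together, ``parse_anything()`` normalizes several input
types into a single ``Wikicode``; a short sketch (the iterable case simply
concatenates the parsed nodes)::

    from mwparserfromhell.utils import parse_anything

    parse_anything("{{foo}}")           # str: parsed directly
    parse_anything(b"{{foo}}")          # bytes: decoded as UTF-8, then parsed
    parse_anything(42)                  # int: converted to "42", then parsed
    parse_anything(None)                # empty Wikicode
    parse_anything(["{{foo}}", "bar"])  # iterable: parsed nodes are concatenated
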
@@ -24,8 +24,8 @@ from __future__ import unicode_literals | |||
import re | |||
from .compat import maxsize, py3k, str | |||
from .nodes import (Argument, Comment, Heading, HTMLEntity, Node, Tag, | |||
Template, Text, Wikilink) | |||
from .nodes import (Argument, Comment, ExternalLink, Heading, HTMLEntity, | |||
Node, Tag, Template, Text, Wikilink) | |||
from .string_mixin import StringMixIn | |||
from .utils import parse_anything | |||
@@ -60,19 +60,6 @@ class Wikicode(StringMixIn): | |||
for context, child in node.__iternodes__(self._get_all_nodes): | |||
yield child | |||
def _get_context(self, node, obj): | |||
"""Return a ``Wikicode`` that contains *obj* in its descendants. | |||
The closest (shortest distance from *node*) suitable ``Wikicode`` will | |||
be returned, or ``None`` if *obj* is *node* itself.
Raises ``ValueError`` if *obj* is not within *node*. | |||
""" | |||
for context, child in node.__iternodes__(self._get_all_nodes): | |||
if self._is_equivalent(obj, child): | |||
return context | |||
raise ValueError(obj) | |||
def _get_all_nodes(self, code): | |||
"""Iterate over all of our descendant nodes. | |||
@@ -105,26 +92,56 @@ class Wikicode(StringMixIn): | |||
return False | |||
return obj in nodes | |||
def _do_search(self, obj, recursive, callback, context, *args, **kwargs): | |||
"""Look within *context* for *obj*, executing *callback* if found. | |||
def _do_search(self, obj, recursive, context=None, literal=None): | |||
"""Return some info about the location of *obj* within *context*. | |||
If *recursive* is ``True``, we'll look within context and its | |||
descendants; otherwise we'll just execute *callback*. We raise
:py:exc:`ValueError` if *obj* isn't in our node list or context. If | |||
found, *callback* is passed the context, the index of the node within | |||
the context, and whatever were passed as ``*args`` and ``**kwargs``. | |||
If *recursive* is ``True``, we'll look within *context* (``self`` by | |||
default) and its descendants; otherwise just *context*. We raise
:py:exc:`ValueError` if *obj* isn't found. The return data is a list of | |||
3-tuples (*type*, *context*, *data*) where *type* is *obj*\ 's best | |||
type resolution (either ``Node``, ``Wikicode``, or ``str``), *context* | |||
is the closest ``Wikicode`` encompassing it, and *data* is either a | |||
``Node``, a list of ``Node``\ s, or ``None`` depending on *type*. | |||
""" | |||
if recursive: | |||
for i, node in enumerate(context.nodes): | |||
if self._is_equivalent(obj, node): | |||
return callback(context, i, *args, **kwargs) | |||
if self._contains(self._get_children(node), obj): | |||
context = self._get_context(node, obj) | |||
return self._do_search(obj, recursive, callback, context, | |||
*args, **kwargs) | |||
raise ValueError(obj) | |||
if not context: | |||
context = self | |||
literal = isinstance(obj, (Node, Wikicode)) | |||
obj = parse_anything(obj) | |||
if not obj or obj not in self: | |||
raise ValueError(obj) | |||
if len(obj.nodes) == 1: | |||
obj = obj.get(0) | |||
compare = lambda a, b: (a is b) if literal else (a == b) | |||
results = [] | |||
i = 0 | |||
while i < len(context.nodes): | |||
node = context.get(i) | |||
if isinstance(obj, Node) and compare(obj, node): | |||
results.append((Node, context, node)) | |||
elif isinstance(obj, Wikicode) and compare(obj.get(0), node): | |||
for j in range(1, len(obj.nodes)): | |||
if not compare(obj.get(j), context.get(i + j)): | |||
break | |||
else: | |||
nodes = list(context.nodes[i:i + len(obj.nodes)]) | |||
results.append((Wikicode, context, nodes)) | |||
i += len(obj.nodes) - 1 | |||
elif recursive: | |||
contexts = node.__iternodes__(self._get_all_nodes) | |||
processed = [] | |||
for code in (ctx for ctx, child in contexts): | |||
if code and code not in processed and obj in code: | |||
search = self._do_search(obj, recursive, code, literal) | |||
results.extend(search) | |||
processed.append(code) | |||
i += 1 | |||
callback(context, self.index(obj, recursive=False), *args, **kwargs) | |||
if not results and not literal and recursive: | |||
results.append((str, context, None)) | |||
if not results and context is self: | |||
raise ValueError(obj) | |||
return results | |||
def _get_tree(self, code, lines, marker, indent): | |||
"""Build a tree to illustrate the way the Wikicode object was parsed. | |||
@@ -253,41 +270,64 @@ class Wikicode(StringMixIn): | |||
def insert_before(self, obj, value, recursive=True): | |||
"""Insert *value* immediately before *obj* in the list of nodes. | |||
*obj* can be either a string or a :py:class:`~.Node`. *value* can be | |||
anything parsable by :py:func:`.parse_anything`. If *recursive* is
``True``, we will try to find *obj* within our child nodes even if it | |||
is not a direct descendant of this :py:class:`~.Wikicode` object. If | |||
*obj* is not in the node list, :py:exc:`ValueError` is raised. | |||
*obj* can be either a string, a :py:class:`~.Node`, or other | |||
:py:class:`~.Wikicode` object (as created by :py:meth:`get_sections`, | |||
for example). *value* can be anything parsable by
:py:func:`.parse_anything`. If *recursive* is ``True``, we will try to | |||
find *obj* within our child nodes even if it is not a direct descendant | |||
of this :py:class:`~.Wikicode` object. If *obj* is not found, | |||
:py:exc:`ValueError` is raised. | |||
""" | |||
callback = lambda self, i, value: self.insert(i, value) | |||
self._do_search(obj, recursive, callback, self, value) | |||
for restype, context, data in self._do_search(obj, recursive): | |||
if restype in (Node, Wikicode): | |||
i = context.index(data if restype is Node else data[0], False) | |||
context.insert(i, value) | |||
else: | |||
obj = str(obj) | |||
context.nodes = str(context).replace(obj, str(value) + obj) | |||
def insert_after(self, obj, value, recursive=True): | |||
"""Insert *value* immediately after *obj* in the list of nodes. | |||
*obj* can be either a string or a :py:class:`~.Node`. *value* can be | |||
anything parsable by :py:func:`.parse_anything`. If *recursive* is
``True``, we will try to find *obj* within our child nodes even if it | |||
is not a direct descendant of this :py:class:`~.Wikicode` object. If | |||
*obj* is not in the node list, :py:exc:`ValueError` is raised. | |||
*obj* can be either a string, a :py:class:`~.Node`, or other | |||
:py:class:`~.Wikicode` object (as created by :py:meth:`get_sections`, | |||
for example). *value* can be anything parsable by
:py:func:`.parse_anything`. If *recursive* is ``True``, we will try to | |||
find *obj* within our child nodes even if it is not a direct descendant | |||
of this :py:class:`~.Wikicode` object. If *obj* is not found, | |||
:py:exc:`ValueError` is raised. | |||
""" | |||
callback = lambda self, i, value: self.insert(i + 1, value) | |||
self._do_search(obj, recursive, callback, self, value) | |||
for restype, context, data in self._do_search(obj, recursive): | |||
if restype in (Node, Wikicode): | |||
i = context.index(data if restype is Node else data[-1], False) | |||
context.insert(i + 1, value) | |||
else: | |||
obj = str(obj) | |||
context.nodes = str(context).replace(obj, obj + str(value)) | |||
def replace(self, obj, value, recursive=True): | |||
"""Replace *obj* with *value* in the list of nodes. | |||
*obj* can be either a string or a :py:class:`~.Node`. *value* can be | |||
anything parsable by :py:func:`.parse_anything`. If *recursive* is
``True``, we will try to find *obj* within our child nodes even if it | |||
is not a direct descendant of this :py:class:`~.Wikicode` object. If | |||
*obj* is not in the node list, :py:exc:`ValueError` is raised. | |||
*obj* can be either a string, a :py:class:`~.Node`, or other | |||
:py:class:`~.Wikicode` object (as created by :py:meth:`get_sections`, | |||
for example). *value* can be anything parsable by
:py:func:`.parse_anything`. If *recursive* is ``True``, we will try to | |||
find *obj* within our child nodes even if it is not a direct descendant | |||
of this :py:class:`~.Wikicode` object. If *obj* is not found, | |||
:py:exc:`ValueError` is raised. | |||
""" | |||
def callback(self, i, value): | |||
self.nodes.pop(i) | |||
self.insert(i, value) | |||
self._do_search(obj, recursive, callback, self, value) | |||
for restype, context, data in self._do_search(obj, recursive): | |||
if restype is Node: | |||
i = context.index(data, False) | |||
context.nodes.pop(i) | |||
context.insert(i, value) | |||
elif restype is Wikicode: | |||
i = context.index(data[0], False) | |||
for _ in data: | |||
context.nodes.pop(i) | |||
context.insert(i, value) | |||
else: | |||
context.nodes = str(context).replace(str(obj), str(value)) | |||
def append(self, value): | |||
"""Insert *value* at the end of the list of nodes. | |||
@@ -301,15 +341,39 @@ class Wikicode(StringMixIn): | |||
def remove(self, obj, recursive=True): | |||
"""Remove *obj* from the list of nodes. | |||
*obj* can be either a string or a :py:class:`~.Node`. If *recursive* is | |||
``True``, we will try to find *obj* within our child nodes even if it | |||
is not a direct descendant of this :py:class:`~.Wikicode` object. If | |||
*obj* is not in the node list, :py:exc:`ValueError` is raised. | |||
*obj* can be either a string, a :py:class:`~.Node`, or other | |||
:py:class:`~.Wikicode` object (as created by :py:meth:`get_sections`, | |||
for example). If *recursive* is ``True``, we will try to find *obj* | |||
within our child nodes even if it is not a direct descendant of this | |||
:py:class:`~.Wikicode` object. If *obj* is not found, | |||
:py:exc:`ValueError` is raised. | |||
""" | |||
for restype, context, data in self._do_search(obj, recursive): | |||
if restype is Node: | |||
context.nodes.pop(context.index(data, False)) | |||
elif restype is Wikicode: | |||
i = context.index(data[0], False) | |||
for _ in data: | |||
context.nodes.pop(i) | |||
else: | |||
context.nodes = str(context).replace(str(obj), "") | |||
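
A brief sketch of what the reworked editing methods above allow -- plain strings and
``Wikicode`` objects as targets, with every occurrence affected rather than just the
first (the printed output is illustrative)::

    import mwparserfromhell

    code = mwparserfromhell.parse("{{foo}} and {{foo}}")
    code.replace("{{foo}}", "{{bar}}")       # every occurrence is replaced
    print(code)                              # {{bar}} and {{bar}}
    code.insert_after("{{bar}}", "{{baz}}")  # accepts strings, nodes, or Wikicode
    print(code)                              # {{bar}}{{baz}} and {{bar}}{{baz}}
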
def matches(self, other): | |||
"""Do a loose equivalency test suitable for comparing page names. | |||
*other* can be any string-like object, including | |||
:py:class:`~.Wikicode`. This operation is symmetric; both sides are | |||
adjusted. Specifically, whitespace and markup are stripped and the first
letter's case is normalized. Typical usage is | |||
``if template.name.matches("stub"): ...``. | |||
""" | |||
callback = lambda self, i: self.nodes.pop(i) | |||
self._do_search(obj, recursive, callback, self) | |||
this = self.strip_code().strip() | |||
that = parse_anything(other).strip_code().strip() | |||
if not this or not that: | |||
return this == that | |||
return this[0].upper() + this[1:] == that[0].upper() + that[1:] | |||
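
A short sketch of the loose comparison ``matches()`` performs, where the first
letter's case and surrounding whitespace are ignored::

    import mwparserfromhell

    template = mwparserfromhell.parse("{{cleanup}}").filter_templates()[0]
    template.name.matches("Cleanup")      # True: first-letter case is normalized
    template.name.matches(" cleanup ")    # True: surrounding whitespace is stripped
    template.name.matches("Cleanup/doc")  # False
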
def ifilter(self, recursive=False, matches=None, flags=FLAGS, | |||
def ifilter(self, recursive=True, matches=None, flags=FLAGS, | |||
forcetype=None): | |||
"""Iterate over nodes in our list matching certain conditions. | |||
@@ -327,7 +391,7 @@ class Wikicode(StringMixIn): | |||
if not matches or re.search(matches, str(node), flags): | |||
yield node | |||
def filter(self, recursive=False, matches=None, flags=FLAGS, | |||
def filter(self, recursive=True, matches=None, flags=FLAGS, | |||
forcetype=None): | |||
"""Return a list of nodes within our list matching certain conditions. | |||
@@ -360,9 +424,8 @@ class Wikicode(StringMixIn): | |||
""" | |||
if matches: | |||
matches = r"^(=+?)\s*" + matches + r"\s*\1$" | |||
headings = self.filter_headings(recursive=True) | |||
filtered = self.filter_headings(recursive=True, matches=matches, | |||
flags=flags) | |||
headings = self.filter_headings() | |||
filtered = self.filter_headings(matches=matches, flags=flags) | |||
if levels: | |||
filtered = [head for head in filtered if head.level in levels] | |||
@@ -446,6 +509,6 @@ class Wikicode(StringMixIn): | |||
return "\n".join(self._get_tree(self, [], marker, 0)) | |||
Wikicode._build_filter_methods( | |||
arguments=Argument, comments=Comment, headings=Heading, | |||
html_entities=HTMLEntity, tags=Tag, templates=Template, text=Text, | |||
wikilinks=Wikilink) | |||
arguments=Argument, comments=Comment, external_links=ExternalLink, | |||
headings=Heading, html_entities=HTMLEntity, tags=Tag, templates=Template, | |||
text=Text, wikilinks=Wikilink) |
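
Registering ``ExternalLink`` here generates ``filter_external_links()`` and
``ifilter_external_links()`` alongside the existing filter methods; a sketch::

    import mwparserfromhell

    code = mwparserfromhell.parse("See [http://example.com/ Example] for details.")
    link = code.filter_external_links()[0]
    link.url    # http://example.com/
    link.title  # Example
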
@@ -29,16 +29,13 @@ from mwparserfromhell.compat import py3k | |||
with open("README.rst") as fp: | |||
long_docs = fp.read() | |||
# builder = Extension("mwparserfromhell.parser._builder", | |||
# sources = ["mwparserfromhell/parser/builder.c"]) | |||
tokenizer = Extension("mwparserfromhell.parser._tokenizer", | |||
sources = ["mwparserfromhell/parser/tokenizer.c"]) | |||
setup( | |||
name = "mwparserfromhell", | |||
packages = find_packages(exclude=("tests",)), | |||
ext_modules = [] if py3k else [tokenizer], | |||
ext_modules = [tokenizer], | |||
test_suite = "tests", | |||
version = __version__, | |||
author = "Ben Kurtovic", | |||
@@ -50,13 +47,13 @@ setup( | |||
keywords = "earwig mwparserfromhell wikipedia wiki mediawiki wikicode template parsing", | |||
license = "MIT License", | |||
classifiers = [ | |||
"Development Status :: 3 - Alpha", | |||
"Development Status :: 4 - Beta", | |||
"Environment :: Console", | |||
"Intended Audience :: Developers", | |||
"License :: OSI Approved :: MIT License", | |||
"Operating System :: OS Independent", | |||
"Programming Language :: Python :: 2.7", | |||
"Programming Language :: Python :: 3", | |||
"Programming Language :: Python :: 3.3", | |||
"Topic :: Text Processing :: Markup" | |||
], | |||
) |
@@ -91,7 +91,27 @@ class TreeEqualityTestCase(TestCase): | |||
def assertTagNodeEqual(self, expected, actual): | |||
"""Assert that two Tag nodes have the same data.""" | |||
self.fail("Holding this until feature/html_tags is ready.") | |||
self.assertWikicodeEqual(expected.tag, actual.tag) | |||
if expected.contents is not None: | |||
self.assertWikicodeEqual(expected.contents, actual.contents) | |||
length = len(expected.attributes) | |||
self.assertEqual(length, len(actual.attributes)) | |||
for i in range(length): | |||
exp_attr = expected.attributes[i] | |||
act_attr = actual.attributes[i] | |||
self.assertWikicodeEqual(exp_attr.name, act_attr.name) | |||
if exp_attr.value is not None: | |||
self.assertWikicodeEqual(exp_attr.value, act_attr.value) | |||
self.assertIs(exp_attr.quoted, act_attr.quoted) | |||
self.assertEqual(exp_attr.pad_first, act_attr.pad_first) | |||
self.assertEqual(exp_attr.pad_before_eq, act_attr.pad_before_eq) | |||
self.assertEqual(exp_attr.pad_after_eq, act_attr.pad_after_eq) | |||
self.assertIs(expected.wiki_markup, actual.wiki_markup) | |||
self.assertIs(expected.self_closing, actual.self_closing) | |||
self.assertIs(expected.invalid, actual.invalid) | |||
self.assertIs(expected.implicit, actual.implicit) | |||
self.assertEqual(expected.padding, actual.padding) | |||
self.assertWikicodeEqual(expected.closing_tag, actual.closing_tag) | |||
def assertTemplateNodeEqual(self, expected, actual): | |||
"""Assert that two Template nodes have the same data.""" | |||
@@ -0,0 +1,89 @@ | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net> | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy | |||
# of this software and associated documentation files (the "Software"), to deal | |||
# in the Software without restriction, including without limitation the rights | |||
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |||
# copies of the Software, and to permit persons to whom the Software is | |||
# furnished to do so, subject to the following conditions: | |||
# | |||
# The above copyright notice and this permission notice shall be included in | |||
# all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |||
# SOFTWARE. | |||
from __future__ import unicode_literals | |||
import unittest | |||
from mwparserfromhell.compat import str | |||
from mwparserfromhell.nodes import Template | |||
from mwparserfromhell.nodes.extras import Attribute | |||
from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext | |||
class TestAttribute(TreeEqualityTestCase): | |||
"""Test cases for the Attribute node extra.""" | |||
def test_unicode(self): | |||
"""test Attribute.__unicode__()""" | |||
node = Attribute(wraptext("foo")) | |||
self.assertEqual(" foo", str(node)) | |||
node2 = Attribute(wraptext("foo"), wraptext("bar")) | |||
self.assertEqual(' foo="bar"', str(node2)) | |||
node3 = Attribute(wraptext("a"), wraptext("b"), True, "", " ", " ") | |||
self.assertEqual('a = "b"', str(node3)) | |||
node3 = Attribute(wraptext("a"), wraptext("b"), False, "", " ", " ") | |||
self.assertEqual("a = b", str(node3)) | |||
node4 = Attribute(wraptext("a"), wrap([]), False, " ", "", " ") | |||
self.assertEqual(" a= ", str(node4)) | |||
def test_name(self): | |||
"""test getter/setter for the name attribute""" | |||
name = wraptext("id") | |||
node = Attribute(name, wraptext("bar")) | |||
self.assertIs(name, node.name) | |||
node.name = "{{id}}" | |||
self.assertWikicodeEqual(wrap([Template(wraptext("id"))]), node.name) | |||
def test_value(self): | |||
"""test getter/setter for the value attribute""" | |||
value = wraptext("foo") | |||
node = Attribute(wraptext("id"), value) | |||
self.assertIs(value, node.value) | |||
node.value = "{{bar}}" | |||
self.assertWikicodeEqual(wrap([Template(wraptext("bar"))]), node.value) | |||
node.value = None | |||
self.assertIs(None, node.value) | |||
def test_quoted(self): | |||
"""test getter/setter for the quoted attribute""" | |||
node1 = Attribute(wraptext("id"), wraptext("foo"), False) | |||
node2 = Attribute(wraptext("id"), wraptext("bar")) | |||
self.assertFalse(node1.quoted) | |||
self.assertTrue(node2.quoted) | |||
node1.quoted = True | |||
node2.quoted = "" | |||
self.assertTrue(node1.quoted) | |||
self.assertFalse(node2.quoted) | |||
def test_padding(self): | |||
"""test getter/setter for the padding attributes""" | |||
for pad in ["pad_first", "pad_before_eq", "pad_after_eq"]: | |||
node = Attribute(wraptext("id"), wraptext("foo"), **{pad: "\n"}) | |||
self.assertEqual("\n", getattr(node, pad)) | |||
setattr(node, pad, " ") | |||
self.assertEqual(" ", getattr(node, pad)) | |||
setattr(node, pad, None) | |||
self.assertEqual("", getattr(node, pad)) | |||
self.assertRaises(ValueError, setattr, node, pad, True) | |||
if __name__ == "__main__": | |||
unittest.main(verbosity=2) |
@@ -23,8 +23,8 @@ | |||
from __future__ import unicode_literals | |||
import unittest | |||
from mwparserfromhell.nodes import (Argument, Comment, Heading, HTMLEntity, | |||
Tag, Template, Text, Wikilink) | |||
from mwparserfromhell.nodes import (Argument, Comment, ExternalLink, Heading, | |||
HTMLEntity, Tag, Template, Text, Wikilink) | |||
from mwparserfromhell.nodes.extras import Attribute, Parameter | |||
from mwparserfromhell.parser import tokens | |||
from mwparserfromhell.parser.builder import Builder | |||
@@ -72,6 +72,14 @@ class TestBuilder(TreeEqualityTestCase): | |||
wrap([Template(wraptext("foo"), params=[ | |||
Parameter(wraptext("bar"), wraptext("baz"))])])), | |||
([tokens.TemplateOpen(), tokens.TemplateParamSeparator(), | |||
tokens.TemplateParamSeparator(), tokens.TemplateParamEquals(), | |||
tokens.TemplateParamSeparator(), tokens.TemplateClose()], | |||
wrap([Template(wrap([]), params=[ | |||
Parameter(wraptext("1"), wrap([]), showkey=False), | |||
Parameter(wrap([]), wrap([]), showkey=True), | |||
Parameter(wraptext("2"), wrap([]), showkey=False)])])), | |||
([tokens.TemplateOpen(), tokens.Text(text="foo"), | |||
tokens.TemplateParamSeparator(), tokens.Text(text="bar"), | |||
tokens.TemplateParamEquals(), tokens.Text(text="baz"), | |||
@@ -142,6 +150,48 @@ class TestBuilder(TreeEqualityTestCase): | |||
for test, valid in tests: | |||
self.assertWikicodeEqual(valid, self.builder.build(test)) | |||
def test_external_link(self): | |||
"""tests for building ExternalLink nodes""" | |||
tests = [ | |||
([tokens.ExternalLinkOpen(brackets=False), | |||
tokens.Text(text="http://example.com/"), | |||
tokens.ExternalLinkClose()], | |||
wrap([ExternalLink(wraptext("http://example.com/"), | |||
brackets=False)])), | |||
([tokens.ExternalLinkOpen(brackets=True), | |||
tokens.Text(text="http://example.com/"), | |||
tokens.ExternalLinkClose()], | |||
wrap([ExternalLink(wraptext("http://example.com/"))])), | |||
([tokens.ExternalLinkOpen(brackets=True), | |||
tokens.Text(text="http://example.com/"), | |||
tokens.ExternalLinkSeparator(), tokens.ExternalLinkClose()], | |||
wrap([ExternalLink(wraptext("http://example.com/"), wrap([]))])), | |||
([tokens.ExternalLinkOpen(brackets=True), | |||
tokens.Text(text="http://example.com/"), | |||
tokens.ExternalLinkSeparator(), tokens.Text(text="Example"), | |||
tokens.ExternalLinkClose()], | |||
wrap([ExternalLink(wraptext("http://example.com/"), | |||
wraptext("Example"))])), | |||
([tokens.ExternalLinkOpen(brackets=False), | |||
tokens.Text(text="http://example"), tokens.Text(text=".com/foo"), | |||
tokens.ExternalLinkClose()], | |||
wrap([ExternalLink(wraptext("http://example", ".com/foo"), | |||
brackets=False)])), | |||
([tokens.ExternalLinkOpen(brackets=True), | |||
tokens.Text(text="http://example"), tokens.Text(text=".com/foo"), | |||
tokens.ExternalLinkSeparator(), tokens.Text(text="Example"), | |||
tokens.Text(text=" Web Page"), tokens.ExternalLinkClose()], | |||
wrap([ExternalLink(wraptext("http://example", ".com/foo"), | |||
wraptext("Example", " Web Page"))])), | |||
] | |||
for test, valid in tests: | |||
self.assertWikicodeEqual(valid, self.builder.build(test)) | |||
def test_html_entity(self): | |||
"""tests for building HTMLEntity nodes""" | |||
tests = [ | |||
@@ -190,6 +240,129 @@ class TestBuilder(TreeEqualityTestCase): | |||
for test, valid in tests: | |||
self.assertWikicodeEqual(valid, self.builder.build(test)) | |||
def test_tag(self): | |||
"""tests for building Tag nodes""" | |||
tests = [ | |||
# <ref></ref> | |||
([tokens.TagOpenOpen(), tokens.Text(text="ref"), | |||
tokens.TagCloseOpen(padding=""), tokens.TagOpenClose(), | |||
tokens.Text(text="ref"), tokens.TagCloseClose()], | |||
wrap([Tag(wraptext("ref"), wrap([]), | |||
closing_tag=wraptext("ref"))])), | |||
# <ref name></ref> | |||
([tokens.TagOpenOpen(), tokens.Text(text="ref"), | |||
tokens.TagAttrStart(pad_first=" ", pad_before_eq="", | |||
pad_after_eq=""), | |||
tokens.Text(text="name"), tokens.TagCloseOpen(padding=""), | |||
tokens.TagOpenClose(), tokens.Text(text="ref"), | |||
tokens.TagCloseClose()], | |||
wrap([Tag(wraptext("ref"), wrap([]), | |||
attrs=[Attribute(wraptext("name"))])])), | |||
# <ref name="abc" /> | |||
([tokens.TagOpenOpen(), tokens.Text(text="ref"), | |||
tokens.TagAttrStart(pad_first=" ", pad_before_eq="", | |||
pad_after_eq=""), | |||
tokens.Text(text="name"), tokens.TagAttrEquals(), | |||
tokens.TagAttrQuote(), tokens.Text(text="abc"), | |||
tokens.TagCloseSelfclose(padding=" ")], | |||
wrap([Tag(wraptext("ref"), | |||
attrs=[Attribute(wraptext("name"), wraptext("abc"))], | |||
self_closing=True, padding=" ")])), | |||
# <br/> | |||
([tokens.TagOpenOpen(), tokens.Text(text="br"), | |||
tokens.TagCloseSelfclose(padding="")], | |||
wrap([Tag(wraptext("br"), self_closing=True)])), | |||
# <li> | |||
([tokens.TagOpenOpen(), tokens.Text(text="li"), | |||
tokens.TagCloseSelfclose(padding="", implicit=True)], | |||
wrap([Tag(wraptext("li"), self_closing=True, implicit=True)])), | |||
# </br> | |||
([tokens.TagOpenOpen(invalid=True), tokens.Text(text="br"), | |||
tokens.TagCloseSelfclose(padding="", implicit=True)], | |||
wrap([Tag(wraptext("br"), self_closing=True, invalid=True, | |||
implicit=True)])), | |||
# </br/> | |||
([tokens.TagOpenOpen(invalid=True), tokens.Text(text="br"), | |||
tokens.TagCloseSelfclose(padding="")], | |||
wrap([Tag(wraptext("br"), self_closing=True, invalid=True)])), | |||
# <ref name={{abc}} foo="bar {{baz}}" abc={{de}}f ghi=j{{k}}{{l}} | |||
# mno = "{{p}} [[q]] {{r}}">[[Source]]</ref> | |||
([tokens.TagOpenOpen(), tokens.Text(text="ref"), | |||
tokens.TagAttrStart(pad_first=" ", pad_before_eq="", | |||
pad_after_eq=""), | |||
tokens.Text(text="name"), tokens.TagAttrEquals(), | |||
tokens.TemplateOpen(), tokens.Text(text="abc"), | |||
tokens.TemplateClose(), | |||
tokens.TagAttrStart(pad_first=" ", pad_before_eq="", | |||
pad_after_eq=""), | |||
tokens.Text(text="foo"), tokens.TagAttrEquals(), | |||
tokens.TagAttrQuote(), tokens.Text(text="bar "), | |||
tokens.TemplateOpen(), tokens.Text(text="baz"), | |||
tokens.TemplateClose(), | |||
tokens.TagAttrStart(pad_first=" ", pad_before_eq="", | |||
pad_after_eq=""), | |||
tokens.Text(text="abc"), tokens.TagAttrEquals(), | |||
tokens.TemplateOpen(), tokens.Text(text="de"), | |||
tokens.TemplateClose(), tokens.Text(text="f"), | |||
tokens.TagAttrStart(pad_first=" ", pad_before_eq="", | |||
pad_after_eq=""), | |||
tokens.Text(text="ghi"), tokens.TagAttrEquals(), | |||
tokens.Text(text="j"), tokens.TemplateOpen(), | |||
tokens.Text(text="k"), tokens.TemplateClose(), | |||
tokens.TemplateOpen(), tokens.Text(text="l"), | |||
tokens.TemplateClose(), | |||
tokens.TagAttrStart(pad_first=" \n ", pad_before_eq=" ", | |||
pad_after_eq=" "), | |||
tokens.Text(text="mno"), tokens.TagAttrEquals(), | |||
tokens.TagAttrQuote(), tokens.TemplateOpen(), | |||
tokens.Text(text="p"), tokens.TemplateClose(), | |||
tokens.Text(text=" "), tokens.WikilinkOpen(), | |||
tokens.Text(text="q"), tokens.WikilinkClose(), | |||
tokens.Text(text=" "), tokens.TemplateOpen(), | |||
tokens.Text(text="r"), tokens.TemplateClose(), | |||
tokens.TagCloseOpen(padding=""), tokens.WikilinkOpen(), | |||
tokens.Text(text="Source"), tokens.WikilinkClose(), | |||
tokens.TagOpenClose(), tokens.Text(text="ref"), | |||
tokens.TagCloseClose()], | |||
wrap([Tag(wraptext("ref"), wrap([Wikilink(wraptext("Source"))]), [ | |||
Attribute(wraptext("name"), | |||
wrap([Template(wraptext("abc"))]), False), | |||
Attribute(wraptext("foo"), wrap([Text("bar "), | |||
Template(wraptext("baz"))]), pad_first=" "), | |||
Attribute(wraptext("abc"), wrap([Template(wraptext("de")), | |||
Text("f")]), False), | |||
Attribute(wraptext("ghi"), wrap([Text("j"), | |||
Template(wraptext("k")), | |||
Template(wraptext("l"))]), False), | |||
Attribute(wraptext("mno"), wrap([Template(wraptext("p")), | |||
Text(" "), Wikilink(wraptext("q")), Text(" "), | |||
Template(wraptext("r"))]), True, " \n ", " ", | |||
" ")])])), | |||
# "''italic text''" | |||
([tokens.TagOpenOpen(wiki_markup="''"), tokens.Text(text="i"), | |||
tokens.TagCloseOpen(), tokens.Text(text="italic text"), | |||
tokens.TagOpenClose(), tokens.Text(text="i"), | |||
tokens.TagCloseClose()], | |||
wrap([Tag(wraptext("i"), wraptext("italic text"), | |||
wiki_markup="''")])), | |||
# * bullet | |||
([tokens.TagOpenOpen(wiki_markup="*"), tokens.Text(text="li"), | |||
tokens.TagCloseSelfclose(), tokens.Text(text=" bullet")], | |||
wrap([Tag(wraptext("li"), wiki_markup="*", self_closing=True), | |||
Text(" bullet")])), | |||
] | |||
for test, valid in tests: | |||
self.assertWikicodeEqual(valid, self.builder.build(test)) | |||
def test_integration(self): | |||
"""a test for building a combination of templates together""" | |||
# {{{{{{{{foo}}bar|baz=biz}}buzz}}usr|{{bin}}}} | |||
@@ -61,36 +61,36 @@ class TestDocs(unittest.TestCase): | |||
def test_readme_2(self): | |||
"""test a block of example code in the README""" | |||
text = "{{foo|{{bar}}={{baz|{{spam}}}}}}" | |||
temps = mwparserfromhell.parse(text).filter_templates() | |||
if py3k: | |||
res = "['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']" | |||
else: | |||
res = "[u'{{foo|{{bar}}={{baz|{{spam}}}}}}', u'{{bar}}', u'{{baz|{{spam}}}}', u'{{spam}}']" | |||
self.assertPrint(temps, res) | |||
def test_readme_3(self): | |||
"""test a block of example code in the README""" | |||
code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}") | |||
if py3k: | |||
self.assertPrint(code.filter_templates(), | |||
self.assertPrint(code.filter_templates(recursive=False), | |||
"['{{foo|this {{includes a|template}}}}']") | |||
else: | |||
self.assertPrint(code.filter_templates(), | |||
self.assertPrint(code.filter_templates(recursive=False), | |||
"[u'{{foo|this {{includes a|template}}}}']") | |||
foo = code.filter_templates()[0] | |||
foo = code.filter_templates(recursive=False)[0] | |||
self.assertPrint(foo.get(1).value, "this {{includes a|template}}") | |||
self.assertPrint(foo.get(1).value.filter_templates()[0], | |||
"{{includes a|template}}") | |||
self.assertPrint(foo.get(1).value.filter_templates()[0].get(1).value, | |||
"template") | |||
def test_readme_3(self): | |||
"""test a block of example code in the README""" | |||
text = "{{foo|{{bar}}={{baz|{{spam}}}}}}" | |||
temps = mwparserfromhell.parse(text).filter_templates(recursive=True) | |||
if py3k: | |||
res = "['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']" | |||
else: | |||
res = "[u'{{foo|{{bar}}={{baz|{{spam}}}}}}', u'{{bar}}', u'{{baz|{{spam}}}}', u'{{spam}}']" | |||
self.assertPrint(temps, res) | |||
def test_readme_4(self): | |||
"""test a block of example code in the README""" | |||
text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}" | |||
code = mwparserfromhell.parse(text) | |||
for template in code.filter_templates(): | |||
if template.name == "cleanup" and not template.has_param("date"): | |||
if template.name.matches("Cleanup") and not template.has("date"): | |||
template.add("date", "July 2012") | |||
res = "{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}" | |||
self.assertPrint(code, res) | |||
@@ -0,0 +1,130 @@ | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net> | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy | |||
# of this software and associated documentation files (the "Software"), to deal | |||
# in the Software without restriction, including without limitation the rights | |||
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |||
# copies of the Software, and to permit persons to whom the Software is | |||
# furnished to do so, subject to the following conditions: | |||
# | |||
# The above copyright notice and this permission notice shall be included in | |||
# all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |||
# SOFTWARE. | |||
from __future__ import unicode_literals | |||
import unittest | |||
from mwparserfromhell.compat import str | |||
from mwparserfromhell.nodes import ExternalLink, Text | |||
from ._test_tree_equality import TreeEqualityTestCase, getnodes, wrap, wraptext | |||
class TestExternalLink(TreeEqualityTestCase): | |||
"""Test cases for the ExternalLink node.""" | |||
def test_unicode(self): | |||
"""test ExternalLink.__unicode__()""" | |||
node = ExternalLink(wraptext("http://example.com/"), brackets=False) | |||
self.assertEqual("http://example.com/", str(node)) | |||
node2 = ExternalLink(wraptext("http://example.com/")) | |||
self.assertEqual("[http://example.com/]", str(node2)) | |||
node3 = ExternalLink(wraptext("http://example.com/"), wrap([])) | |||
self.assertEqual("[http://example.com/ ]", str(node3)) | |||
node4 = ExternalLink(wraptext("http://example.com/"), | |||
wraptext("Example Web Page")) | |||
self.assertEqual("[http://example.com/ Example Web Page]", str(node4)) | |||
def test_iternodes(self): | |||
"""test ExternalLink.__iternodes__()""" | |||
node1n1 = Text("http://example.com/") | |||
node2n1 = Text("http://example.com/") | |||
node2n2, node2n3 = Text("Example"), Text("Page") | |||
node1 = ExternalLink(wrap([node1n1]), brackets=False) | |||
node2 = ExternalLink(wrap([node2n1]), wrap([node2n2, node2n3])) | |||
gen1 = node1.__iternodes__(getnodes) | |||
gen2 = node2.__iternodes__(getnodes) | |||
self.assertEqual((None, node1), next(gen1)) | |||
self.assertEqual((None, node2), next(gen2)) | |||
self.assertEqual((node1.url, node1n1), next(gen1)) | |||
self.assertEqual((node2.url, node2n1), next(gen2)) | |||
self.assertEqual((node2.title, node2n2), next(gen2)) | |||
self.assertEqual((node2.title, node2n3), next(gen2)) | |||
self.assertRaises(StopIteration, next, gen1) | |||
self.assertRaises(StopIteration, next, gen2) | |||
def test_strip(self): | |||
"""test ExternalLink.__strip__()""" | |||
node1 = ExternalLink(wraptext("http://example.com"), brackets=False) | |||
node2 = ExternalLink(wraptext("http://example.com")) | |||
node3 = ExternalLink(wraptext("http://example.com"), wrap([])) | |||
node4 = ExternalLink(wraptext("http://example.com"), wraptext("Link")) | |||
for a in (True, False): | |||
for b in (True, False): | |||
self.assertEqual("http://example.com", node1.__strip__(a, b)) | |||
self.assertEqual(None, node2.__strip__(a, b)) | |||
self.assertEqual(None, node3.__strip__(a, b)) | |||
self.assertEqual("Link", node4.__strip__(a, b)) | |||
def test_showtree(self): | |||
"""test ExternalLink.__showtree__()""" | |||
output = [] | |||
getter, marker = object(), object() | |||
get = lambda code: output.append((getter, code)) | |||
mark = lambda: output.append(marker) | |||
node1 = ExternalLink(wraptext("http://example.com"), brackets=False) | |||
node2 = ExternalLink(wraptext("http://example.com"), wraptext("Link")) | |||
node1.__showtree__(output.append, get, mark) | |||
node2.__showtree__(output.append, get, mark) | |||
valid = [ | |||
(getter, node1.url), "[", (getter, node2.url), | |||
(getter, node2.title), "]"] | |||
self.assertEqual(valid, output) | |||
def test_url(self): | |||
"""test getter/setter for the url attribute""" | |||
url = wraptext("http://example.com/") | |||
node1 = ExternalLink(url, brackets=False) | |||
node2 = ExternalLink(url, wraptext("Example")) | |||
self.assertIs(url, node1.url) | |||
self.assertIs(url, node2.url) | |||
node1.url = "mailto:héhehé@spam.com" | |||
node2.url = "mailto:héhehé@spam.com" | |||
self.assertWikicodeEqual(wraptext("mailto:héhehé@spam.com"), node1.url) | |||
self.assertWikicodeEqual(wraptext("mailto:héhehé@spam.com"), node2.url) | |||
def test_title(self): | |||
"""test getter/setter for the title attribute""" | |||
title = wraptext("Example!") | |||
node1 = ExternalLink(wraptext("http://example.com/"), brackets=False) | |||
node2 = ExternalLink(wraptext("http://example.com/"), title) | |||
self.assertIs(None, node1.title) | |||
self.assertIs(title, node2.title) | |||
node2.title = None | |||
self.assertIs(None, node2.title) | |||
node2.title = "My Website" | |||
self.assertWikicodeEqual(wraptext("My Website"), node2.title) | |||
def test_brackets(self): | |||
"""test getter/setter for the brackets attribute""" | |||
node1 = ExternalLink(wraptext("http://example.com/"), brackets=False) | |||
node2 = ExternalLink(wraptext("http://example.com/"), wraptext("Link")) | |||
self.assertFalse(node1.brackets) | |||
self.assertTrue(node2.brackets) | |||
node1.brackets = True | |||
node2.brackets = False | |||
self.assertTrue(node1.brackets) | |||
self.assertFalse(node2.brackets) | |||
self.assertEqual("[http://example.com/]", str(node1)) | |||
self.assertEqual("http://example.com/", str(node2)) | |||
if __name__ == "__main__": | |||
unittest.main(verbosity=2) |
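
The URL setter tested above re-parses its value with a starting context (see the
``parse_anything()`` notes earlier), so the new URL cannot itself become a nested
``ExternalLink``; a sketch with an assumed replacement URL::

    import mwparserfromhell

    text = "[http://example.com/ Example]"
    link = mwparserfromhell.parse(text).filter_external_links()[0]
    link.url = "http://example.org/"  # re-parsed with the EXT_LINK_URI context
    print(link)                       # [http://example.org/ Example]
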
@@ -36,9 +36,9 @@ class TestParser(TreeEqualityTestCase): | |||
def test_use_c(self): | |||
"""make sure the correct tokenizer is used""" | |||
if parser.use_c: | |||
self.assertTrue(parser.Parser(None)._tokenizer.USES_C) | |||
self.assertTrue(parser.Parser()._tokenizer.USES_C) | |||
parser.use_c = False | |||
self.assertFalse(parser.Parser(None)._tokenizer.USES_C) | |||
self.assertFalse(parser.Parser()._tokenizer.USES_C) | |||
def test_parsing(self): | |||
"""integration test for parsing overall""" | |||
@@ -59,7 +59,7 @@ class TestParser(TreeEqualityTestCase): | |||
])) | |||
]) | |||
]) | |||
actual = parser.Parser(text).parse() | |||
actual = parser.Parser().parse(text) | |||
self.assertWikicodeEqual(expected, actual) | |||
if __name__ == "__main__": | |||
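
The change exercised here moves the text from the ``Parser`` constructor to
``parse()``, matching the ``Parser().parse(value, context)`` calls in ``utils.py``;
a sketch of the two calling conventions::

    from mwparserfromhell.parser import Parser

    code = Parser().parse("{{foo|bar}}")  # new calling convention
    # previously: Parser("{{foo|bar}}").parse()
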
@@ -0,0 +1,315 @@ | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net> | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy | |||
# of this software and associated documentation files (the "Software"), to deal | |||
# in the Software without restriction, including without limitation the rights | |||
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |||
# copies of the Software, and to permit persons to whom the Software is | |||
# furnished to do so, subject to the following conditions: | |||
# | |||
# The above copyright notice and this permission notice shall be included in | |||
# all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |||
# SOFTWARE. | |||
from __future__ import unicode_literals | |||
import unittest | |||
from mwparserfromhell.compat import str | |||
from mwparserfromhell.nodes import Tag, Template, Text | |||
from mwparserfromhell.nodes.extras import Attribute | |||
from ._test_tree_equality import TreeEqualityTestCase, getnodes, wrap, wraptext | |||
agen = lambda name, value: Attribute(wraptext(name), wraptext(value)) | |||
agennq = lambda name, value: Attribute(wraptext(name), wraptext(value), False) | |||
agenp = lambda name, v, a, b, c: Attribute(wraptext(name), v, True, a, b, c) | |||
agenpnv = lambda name, a, b, c: Attribute(wraptext(name), None, True, a, b, c) | |||
class TestTag(TreeEqualityTestCase): | |||
"""Test cases for the Tag node.""" | |||
def test_unicode(self): | |||
"""test Tag.__unicode__()""" | |||
node1 = Tag(wraptext("ref")) | |||
node2 = Tag(wraptext("span"), wraptext("foo"), | |||
[agen("style", "color: red;")]) | |||
node3 = Tag(wraptext("ref"), | |||
attrs=[agennq("name", "foo"), | |||
agenpnv("some_attr", " ", "", "")], | |||
self_closing=True) | |||
node4 = Tag(wraptext("br"), self_closing=True, padding=" ") | |||
node5 = Tag(wraptext("br"), self_closing=True, implicit=True) | |||
node6 = Tag(wraptext("br"), self_closing=True, invalid=True, | |||
implicit=True) | |||
node7 = Tag(wraptext("br"), self_closing=True, invalid=True, | |||
padding=" ") | |||
node8 = Tag(wraptext("hr"), wiki_markup="----", self_closing=True) | |||
node9 = Tag(wraptext("i"), wraptext("italics!"), wiki_markup="''") | |||
self.assertEqual("<ref></ref>", str(node1)) | |||
self.assertEqual('<span style="color: red;">foo</span>', str(node2)) | |||
self.assertEqual("<ref name=foo some_attr/>", str(node3)) | |||
self.assertEqual("<br />", str(node4)) | |||
self.assertEqual("<br>", str(node5)) | |||
self.assertEqual("</br>", str(node6)) | |||
self.assertEqual("</br />", str(node7)) | |||
self.assertEqual("----", str(node8)) | |||
self.assertEqual("''italics!''", str(node9)) | |||
def test_iternodes(self): | |||
"""test Tag.__iternodes__()""" | |||
node1n1, node1n2 = Text("ref"), Text("foobar") | |||
node2n1, node3n1, node3n2 = Text("bold text"), Text("img"), Text("id") | |||
node3n3, node3n4, node3n5 = Text("foo"), Text("class"), Text("bar") | |||
# <ref>foobar</ref> | |||
node1 = Tag(wrap([node1n1]), wrap([node1n2])) | |||
# '''bold text''' | |||
node2 = Tag(wraptext("b"), wrap([node2n1]), wiki_markup="'''") | |||
# <img id="foo" class="bar" /> | |||
node3 = Tag(wrap([node3n1]), | |||
attrs=[Attribute(wrap([node3n2]), wrap([node3n3])), | |||
Attribute(wrap([node3n4]), wrap([node3n5]))], | |||
self_closing=True, padding=" ") | |||
gen1 = node1.__iternodes__(getnodes) | |||
gen2 = node2.__iternodes__(getnodes) | |||
gen3 = node3.__iternodes__(getnodes) | |||
self.assertEqual((None, node1), next(gen1)) | |||
self.assertEqual((None, node2), next(gen2)) | |||
self.assertEqual((None, node3), next(gen3)) | |||
self.assertEqual((node1.tag, node1n1), next(gen1)) | |||
self.assertEqual((node3.tag, node3n1), next(gen3)) | |||
self.assertEqual((node3.attributes[0].name, node3n2), next(gen3)) | |||
self.assertEqual((node3.attributes[0].value, node3n3), next(gen3)) | |||
self.assertEqual((node3.attributes[1].name, node3n4), next(gen3)) | |||
self.assertEqual((node3.attributes[1].value, node3n5), next(gen3)) | |||
self.assertEqual((node1.contents, node1n2), next(gen1)) | |||
self.assertEqual((node2.contents, node2n1), next(gen2)) | |||
self.assertEqual((node1.closing_tag, node1n1), next(gen1)) | |||
self.assertRaises(StopIteration, next, gen1) | |||
self.assertRaises(StopIteration, next, gen2) | |||
self.assertRaises(StopIteration, next, gen3) | |||
def test_strip(self): | |||
"""test Tag.__strip__()""" | |||
node1 = Tag(wraptext("i"), wraptext("foobar")) | |||
node2 = Tag(wraptext("math"), wraptext("foobar")) | |||
node3 = Tag(wraptext("br"), self_closing=True) | |||
for a in (True, False): | |||
for b in (True, False): | |||
self.assertEqual("foobar", node1.__strip__(a, b)) | |||
self.assertEqual(None, node2.__strip__(a, b)) | |||
self.assertEqual(None, node3.__strip__(a, b)) | |||
def test_showtree(self): | |||
"""test Tag.__showtree__()""" | |||
output = [] | |||
getter, marker = object(), object() | |||
get = lambda code: output.append((getter, code)) | |||
mark = lambda: output.append(marker) | |||
node1 = Tag(wraptext("ref"), wraptext("text"), [agen("name", "foo")]) | |||
node2 = Tag(wraptext("br"), self_closing=True, padding=" ") | |||
node3 = Tag(wraptext("br"), self_closing=True, invalid=True, | |||
implicit=True, padding=" ") | |||
node1.__showtree__(output.append, get, mark) | |||
node2.__showtree__(output.append, get, mark) | |||
node3.__showtree__(output.append, get, mark) | |||
valid = [ | |||
"<", (getter, node1.tag), (getter, node1.attributes[0].name), | |||
" = ", marker, (getter, node1.attributes[0].value), ">", | |||
(getter, node1.contents), "</", (getter, node1.closing_tag), ">", | |||
"<", (getter, node2.tag), "/>", "</", (getter, node3.tag), ">"] | |||
self.assertEqual(valid, output) | |||
def test_tag(self): | |||
"""test getter/setter for the tag attribute""" | |||
tag = wraptext("ref") | |||
node = Tag(tag, wraptext("text")) | |||
self.assertIs(tag, node.tag) | |||
self.assertIs(tag, node.closing_tag) | |||
node.tag = "span" | |||
self.assertWikicodeEqual(wraptext("span"), node.tag) | |||
self.assertWikicodeEqual(wraptext("span"), node.closing_tag) | |||
self.assertEqual("<span>text</span>", node) | |||
def test_contents(self): | |||
"""test getter/setter for the contents attribute""" | |||
contents = wraptext("text") | |||
node = Tag(wraptext("ref"), contents) | |||
self.assertIs(contents, node.contents) | |||
node.contents = "text and a {{template}}" | |||
parsed = wrap([Text("text and a "), Template(wraptext("template"))]) | |||
self.assertWikicodeEqual(parsed, node.contents) | |||
self.assertEqual("<ref>text and a {{template}}</ref>", node) | |||
def test_attributes(self): | |||
"""test getter for the attributes attribute""" | |||
attrs = [agen("name", "bar")] | |||
node1 = Tag(wraptext("ref"), wraptext("foo")) | |||
node2 = Tag(wraptext("ref"), wraptext("foo"), attrs) | |||
self.assertEqual([], node1.attributes) | |||
self.assertIs(attrs, node2.attributes) | |||
def test_wiki_markup(self): | |||
"""test getter/setter for the wiki_markup attribute""" | |||
node = Tag(wraptext("i"), wraptext("italic text")) | |||
self.assertIs(None, node.wiki_markup) | |||
node.wiki_markup = "''" | |||
self.assertEqual("''", node.wiki_markup) | |||
self.assertEqual("''italic text''", node) | |||
node.wiki_markup = False | |||
self.assertFalse(node.wiki_markup) | |||
self.assertEqual("<i>italic text</i>", node) | |||
def test_self_closing(self): | |||
"""test getter/setter for the self_closing attribute""" | |||
node = Tag(wraptext("ref"), wraptext("foobar")) | |||
self.assertFalse(node.self_closing) | |||
node.self_closing = True | |||
self.assertTrue(node.self_closing) | |||
self.assertEqual("<ref/>", node) | |||
node.self_closing = 0 | |||
self.assertFalse(node.self_closing) | |||
self.assertEqual("<ref>foobar</ref>", node) | |||
def test_invalid(self): | |||
"""test getter/setter for the invalid attribute""" | |||
node = Tag(wraptext("br"), self_closing=True, implicit=True) | |||
self.assertFalse(node.invalid) | |||
node.invalid = True | |||
self.assertTrue(node.invalid) | |||
self.assertEqual("</br>", node) | |||
node.invalid = 0 | |||
self.assertFalse(node.invalid) | |||
self.assertEqual("<br>", node) | |||
def test_implicit(self): | |||
"""test getter/setter for the implicit attribute""" | |||
node = Tag(wraptext("br"), self_closing=True) | |||
self.assertFalse(node.implicit) | |||
node.implicit = True | |||
self.assertTrue(node.implicit) | |||
self.assertEqual("<br>", node) | |||
node.implicit = 0 | |||
self.assertFalse(node.implicit) | |||
self.assertEqual("<br/>", node) | |||
def test_padding(self): | |||
"""test getter/setter for the padding attribute""" | |||
node = Tag(wraptext("ref"), wraptext("foobar")) | |||
self.assertEqual("", node.padding) | |||
node.padding = " " | |||
self.assertEqual(" ", node.padding) | |||
self.assertEqual("<ref >foobar</ref>", node) | |||
node.padding = None | |||
self.assertEqual("", node.padding) | |||
self.assertEqual("<ref>foobar</ref>", node) | |||
self.assertRaises(ValueError, setattr, node, "padding", True) | |||
def test_closing_tag(self): | |||
"""test getter/setter for the closing_tag attribute""" | |||
tag = wraptext("ref") | |||
node = Tag(tag, wraptext("foobar")) | |||
self.assertIs(tag, node.closing_tag) | |||
node.closing_tag = "ref {{ignore me}}" | |||
parsed = wrap([Text("ref "), Template(wraptext("ignore me"))]) | |||
self.assertWikicodeEqual(parsed, node.closing_tag) | |||
self.assertEqual("<ref>foobar</ref {{ignore me}}>", node) | |||
def test_has(self): | |||
"""test Tag.has()""" | |||
node = Tag(wraptext("ref"), wraptext("cite"), [agen("name", "foo")]) | |||
self.assertTrue(node.has("name")) | |||
self.assertTrue(node.has(" name ")) | |||
self.assertTrue(node.has(wraptext("name"))) | |||
self.assertFalse(node.has("Name")) | |||
self.assertFalse(node.has("foo")) | |||
attrs = [agen("id", "foo"), agenp("class", "bar", " ", "\n", "\n"), | |||
agen("foo", "bar"), agenpnv("foo", " ", " \n ", " \t")] | |||
node2 = Tag(wraptext("div"), attrs=attrs, self_closing=True) | |||
self.assertTrue(node2.has("id")) | |||
self.assertTrue(node2.has("class")) | |||
self.assertTrue(node2.has(attrs[1].pad_first + str(attrs[1].name) + | |||
attrs[1].pad_before_eq)) | |||
self.assertTrue(node2.has(attrs[3])) | |||
self.assertTrue(node2.has(str(attrs[3]))) | |||
self.assertFalse(node2.has("idclass")) | |||
self.assertFalse(node2.has("id class")) | |||
self.assertFalse(node2.has("id=foo")) | |||
def test_get(self): | |||
"""test Tag.get()""" | |||
attrs = [agen("name", "foo")] | |||
node = Tag(wraptext("ref"), wraptext("cite"), attrs) | |||
self.assertIs(attrs[0], node.get("name")) | |||
self.assertIs(attrs[0], node.get(" name ")) | |||
self.assertIs(attrs[0], node.get(wraptext("name"))) | |||
self.assertRaises(ValueError, node.get, "Name") | |||
self.assertRaises(ValueError, node.get, "foo") | |||
attrs = [agen("id", "foo"), agenp("class", "bar", " ", "\n", "\n"), | |||
agen("foo", "bar"), agenpnv("foo", " ", " \n ", " \t")] | |||
node2 = Tag(wraptext("div"), attrs=attrs, self_closing=True) | |||
self.assertIs(attrs[0], node2.get("id")) | |||
self.assertIs(attrs[1], node2.get("class")) | |||
self.assertIs(attrs[1], node2.get( | |||
attrs[1].pad_first + str(attrs[1].name) + attrs[1].pad_before_eq)) | |||
self.assertIs(attrs[3], node2.get(attrs[3])) | |||
self.assertIs(attrs[3], node2.get(str(attrs[3]))) | |||
self.assertIs(attrs[3], node2.get(" foo")) | |||
self.assertRaises(ValueError, node2.get, "idclass") | |||
self.assertRaises(ValueError, node2.get, "id class") | |||
self.assertRaises(ValueError, node2.get, "id=foo") | |||
def test_add(self): | |||
"""test Tag.add()""" | |||
node = Tag(wraptext("ref"), wraptext("cite")) | |||
node.add("name", "value") | |||
node.add("name", "value", quoted=False) | |||
node.add("name") | |||
node.add(1, False) | |||
node.add("style", "{{foobar}}") | |||
node.add("name", "value", True, "\n", " ", " ") | |||
attr1 = ' name="value"' | |||
attr2 = " name=value" | |||
attr3 = " name" | |||
attr4 = ' 1="False"' | |||
attr5 = ' style="{{foobar}}"' | |||
attr6 = '\nname = "value"' | |||
self.assertEqual(attr1, node.attributes[0]) | |||
self.assertEqual(attr2, node.attributes[1]) | |||
self.assertEqual(attr3, node.attributes[2]) | |||
self.assertEqual(attr4, node.attributes[3]) | |||
self.assertEqual(attr5, node.attributes[4]) | |||
self.assertEqual(attr6, node.attributes[5]) | |||
self.assertEqual(attr6, node.get("name")) | |||
self.assertWikicodeEqual(wrap([Template(wraptext("foobar"))]), | |||
node.attributes[4].value) | |||
self.assertEqual("".join(("<ref", attr1, attr2, attr3, attr4, attr5, | |||
attr6, ">cite</ref>")), node) | |||
def test_remove(self): | |||
"""test Tag.remove()""" | |||
attrs = [agen("id", "foo"), agenp("class", "bar", " ", "\n", "\n"), | |||
agen("foo", "bar"), agenpnv("foo", " ", " \n ", " \t")] | |||
node = Tag(wraptext("div"), attrs=attrs, self_closing=True) | |||
node.remove("class") | |||
self.assertEqual('<div id="foo" foo="bar" foo \n />', node) | |||
node.remove("foo") | |||
self.assertEqual('<div id="foo"/>', node) | |||
self.assertRaises(ValueError, node.remove, "foo") | |||
node.remove("id") | |||
self.assertEqual('<div/>', node) | |||
if __name__ == "__main__": | |||
unittest.main(verbosity=2) |
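
A doctest-style sketch of how the Tag attribute API exercised above looks
through the public parser; the session output is illustrative rather than
copied from a real run::

    >>> import mwparserfromhell
    >>> code = mwparserfromhell.parse('<ref name="foo">cite</ref>')
    >>> tag = code.filter_tags()[0]
    >>> tag.has("name")
    True
    >>> print tag.get("name").value
    foo
    >>> tag.remove("name")           # drops every attribute named "name"
    >>> print code
    <ref>cite</ref>
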
@@ -115,23 +115,23 @@ class TestTemplate(TreeEqualityTestCase): | |||
self.assertEqual([], node1.params) | |||
self.assertIs(plist, node2.params) | |||
def test_has_param(self): | |||
"""test Template.has_param()""" | |||
def test_has(self): | |||
"""test Template.has()""" | |||
node1 = Template(wraptext("foobar")) | |||
node2 = Template(wraptext("foo"), | |||
[pgenh("1", "bar"), pgens("\nabc ", "def")]) | |||
node3 = Template(wraptext("foo"), | |||
[pgenh("1", "a"), pgens("b", "c"), pgens("1", "d")]) | |||
node4 = Template(wraptext("foo"), [pgenh("1", "a"), pgens("b", " ")]) | |||
self.assertFalse(node1.has_param("foobar")) | |||
self.assertTrue(node2.has_param(1)) | |||
self.assertTrue(node2.has_param("abc")) | |||
self.assertFalse(node2.has_param("def")) | |||
self.assertTrue(node3.has_param("1")) | |||
self.assertTrue(node3.has_param(" b ")) | |||
self.assertFalse(node4.has_param("b")) | |||
self.assertTrue(node3.has_param("b", False)) | |||
self.assertTrue(node4.has_param("b", False)) | |||
self.assertFalse(node1.has("foobar")) | |||
self.assertTrue(node2.has(1)) | |||
self.assertTrue(node2.has("abc")) | |||
self.assertFalse(node2.has("def")) | |||
self.assertTrue(node3.has("1")) | |||
self.assertTrue(node3.has(" b ")) | |||
self.assertFalse(node4.has("b")) | |||
self.assertTrue(node3.has("b", False)) | |||
self.assertTrue(node4.has("b", False)) | |||
def test_get(self): | |||
"""test Template.get()""" | |||
@@ -44,8 +44,8 @@ class TestTokens(unittest.TestCase): | |||
self.assertEqual("bar", token2.foo) | |||
self.assertEqual(123, token2.baz) | |||
self.assertRaises(KeyError, lambda: token1.foo) | |||
self.assertRaises(KeyError, lambda: token2.bar) | |||
self.assertFalse(token1.foo) | |||
self.assertFalse(token2.bar) | |||
token1.spam = "eggs" | |||
token2.foo = "ham" | |||
@@ -53,7 +53,7 @@ class TestTokens(unittest.TestCase): | |||
self.assertEqual("eggs", token1.spam) | |||
self.assertEqual("ham", token2.foo) | |||
self.assertRaises(KeyError, lambda: token2.baz) | |||
self.assertFalse(token2.baz) | |||
self.assertRaises(KeyError, delattr, token2, "baz") | |||
def test_repr(self): | |||
@@ -21,6 +21,7 @@ | |||
# SOFTWARE. | |||
from __future__ import unicode_literals | |||
from functools import partial | |||
import re | |||
from types import GeneratorType | |||
import unittest | |||
@@ -122,66 +123,99 @@ class TestWikicode(TreeEqualityTestCase): | |||
code3.insert(-1000, "derp") | |||
self.assertEqual("derp{{foo}}bar[[baz]]", code3) | |||
def _test_search(self, meth, expected): | |||
"""Base test for insert_before(), insert_after(), and replace().""" | |||
code = parse("{{a}}{{b}}{{c}}{{d}}{{e}}") | |||
func = partial(meth, code) | |||
func("{{b}}", "x", recursive=True) | |||
func("{{d}}", "[[y]]", recursive=False) | |||
func(code.get(2), "z") | |||
self.assertEqual(expected[0], code) | |||
self.assertRaises(ValueError, func, "{{r}}", "n", recursive=True) | |||
self.assertRaises(ValueError, func, "{{r}}", "n", recursive=False) | |||
fake = parse("{{a}}").get(0) | |||
self.assertRaises(ValueError, func, fake, "n", recursive=True) | |||
self.assertRaises(ValueError, func, fake, "n", recursive=False) | |||
code2 = parse("{{a}}{{a}}{{a}}{{b}}{{b}}{{b}}") | |||
func = partial(meth, code2) | |||
func(code2.get(1), "c", recursive=False) | |||
func("{{a}}", "d", recursive=False) | |||
func(code2.get(-1), "e", recursive=True) | |||
func("{{b}}", "f", recursive=True) | |||
self.assertEqual(expected[1], code2) | |||
code3 = parse("{{a|{{b}}|{{c|d={{f}}}}}}") | |||
func = partial(meth, code3) | |||
obj = code3.get(0).params[0].value.get(0) | |||
self.assertRaises(ValueError, func, obj, "x", recursive=False) | |||
func(obj, "x", recursive=True) | |||
self.assertRaises(ValueError, func, "{{f}}", "y", recursive=False) | |||
func("{{f}}", "y", recursive=True) | |||
self.assertEqual(expected[2], code3) | |||
code4 = parse("{{a}}{{b}}{{c}}{{d}}{{e}}{{f}}{{g}}{{h}}{{i}}{{j}}") | |||
func = partial(meth, code4) | |||
fake = parse("{{b}}{{c}}") | |||
self.assertRaises(ValueError, func, fake, "q", recursive=False) | |||
self.assertRaises(ValueError, func, fake, "q", recursive=True) | |||
func("{{b}}{{c}}", "w", recursive=False) | |||
func("{{d}}{{e}}", "x", recursive=True) | |||
func(wrap(code4.nodes[-2:]), "y", recursive=False) | |||
func(wrap(code4.nodes[-2:]), "z", recursive=True) | |||
self.assertEqual(expected[3], code4) | |||
self.assertRaises(ValueError, func, "{{c}}{{d}}", "q", recursive=False) | |||
self.assertRaises(ValueError, func, "{{c}}{{d}}", "q", recursive=True) | |||
code5 = parse("{{a|{{b}}{{c}}|{{f|{{g}}={{h}}{{i}}}}}}") | |||
func = partial(meth, code5) | |||
self.assertRaises(ValueError, func, "{{b}}{{c}}", "x", recursive=False) | |||
func("{{b}}{{c}}", "x", recursive=True) | |||
obj = code5.get(0).params[1].value.get(0).params[0].value | |||
self.assertRaises(ValueError, func, obj, "y", recursive=False) | |||
func(obj, "y", recursive=True) | |||
self.assertEqual(expected[4], code5) | |||
code6 = parse("here is {{some text and a {{template}}}}") | |||
func = partial(meth, code6) | |||
self.assertRaises(ValueError, func, "text and", "ab", recursive=False) | |||
func("text and", "ab", recursive=True) | |||
self.assertRaises(ValueError, func, "is {{some", "cd", recursive=False) | |||
func("is {{some", "cd", recursive=True) | |||
self.assertEqual(expected[5], code6) | |||
def test_insert_before(self): | |||
"""test Wikicode.insert_before()""" | |||
code = parse("{{a}}{{b}}{{c}}{{d}}") | |||
code.insert_before("{{b}}", "x", recursive=True) | |||
code.insert_before("{{d}}", "[[y]]", recursive=False) | |||
self.assertEqual("{{a}}x{{b}}{{c}}[[y]]{{d}}", code) | |||
code.insert_before(code.get(2), "z") | |||
self.assertEqual("{{a}}xz{{b}}{{c}}[[y]]{{d}}", code) | |||
self.assertRaises(ValueError, code.insert_before, "{{r}}", "n", | |||
recursive=True) | |||
self.assertRaises(ValueError, code.insert_before, "{{r}}", "n", | |||
recursive=False) | |||
code2 = parse("{{a|{{b}}|{{c|d={{f}}}}}}") | |||
code2.insert_before(code2.get(0).params[0].value.get(0), "x", | |||
recursive=True) | |||
code2.insert_before("{{f}}", "y", recursive=True) | |||
self.assertEqual("{{a|x{{b}}|{{c|d=y{{f}}}}}}", code2) | |||
self.assertRaises(ValueError, code2.insert_before, "{{f}}", "y", | |||
recursive=False) | |||
meth = lambda code, *args, **kw: code.insert_before(*args, **kw) | |||
expected = [ | |||
"{{a}}xz{{b}}{{c}}[[y]]{{d}}{{e}}", | |||
"d{{a}}cd{{a}}d{{a}}f{{b}}f{{b}}ef{{b}}", | |||
"{{a|x{{b}}|{{c|d=y{{f}}}}}}", | |||
"{{a}}w{{b}}{{c}}x{{d}}{{e}}{{f}}{{g}}{{h}}yz{{i}}{{j}}", | |||
"{{a|x{{b}}{{c}}|{{f|{{g}}=y{{h}}{{i}}}}}}", | |||
"here cdis {{some abtext and a {{template}}}}"] | |||
self._test_search(meth, expected) | |||
def test_insert_after(self): | |||
"""test Wikicode.insert_after()""" | |||
code = parse("{{a}}{{b}}{{c}}{{d}}") | |||
code.insert_after("{{b}}", "x", recursive=True) | |||
code.insert_after("{{d}}", "[[y]]", recursive=False) | |||
self.assertEqual("{{a}}{{b}}x{{c}}{{d}}[[y]]", code) | |||
code.insert_after(code.get(2), "z") | |||
self.assertEqual("{{a}}{{b}}xz{{c}}{{d}}[[y]]", code) | |||
self.assertRaises(ValueError, code.insert_after, "{{r}}", "n", | |||
recursive=True) | |||
self.assertRaises(ValueError, code.insert_after, "{{r}}", "n", | |||
recursive=False) | |||
code2 = parse("{{a|{{b}}|{{c|d={{f}}}}}}") | |||
code2.insert_after(code2.get(0).params[0].value.get(0), "x", | |||
recursive=True) | |||
code2.insert_after("{{f}}", "y", recursive=True) | |||
self.assertEqual("{{a|{{b}}x|{{c|d={{f}}y}}}}", code2) | |||
self.assertRaises(ValueError, code2.insert_after, "{{f}}", "y", | |||
recursive=False) | |||
meth = lambda code, *args, **kw: code.insert_after(*args, **kw) | |||
expected = [ | |||
"{{a}}{{b}}xz{{c}}{{d}}[[y]]{{e}}", | |||
"{{a}}d{{a}}dc{{a}}d{{b}}f{{b}}f{{b}}fe", | |||
"{{a|{{b}}x|{{c|d={{f}}y}}}}", | |||
"{{a}}{{b}}{{c}}w{{d}}{{e}}x{{f}}{{g}}{{h}}{{i}}{{j}}yz", | |||
"{{a|{{b}}{{c}}x|{{f|{{g}}={{h}}{{i}}y}}}}", | |||
"here is {{somecd text andab a {{template}}}}"] | |||
self._test_search(meth, expected) | |||
def test_replace(self): | |||
"""test Wikicode.replace()""" | |||
code = parse("{{a}}{{b}}{{c}}{{d}}") | |||
code.replace("{{b}}", "x", recursive=True) | |||
code.replace("{{d}}", "[[y]]", recursive=False) | |||
self.assertEqual("{{a}}x{{c}}[[y]]", code) | |||
code.replace(code.get(1), "z") | |||
self.assertEqual("{{a}}z{{c}}[[y]]", code) | |||
self.assertRaises(ValueError, code.replace, "{{r}}", "n", | |||
recursive=True) | |||
self.assertRaises(ValueError, code.replace, "{{r}}", "n", | |||
recursive=False) | |||
code2 = parse("{{a|{{b}}|{{c|d={{f}}}}}}") | |||
code2.replace(code2.get(0).params[0].value.get(0), "x", recursive=True) | |||
code2.replace("{{f}}", "y", recursive=True) | |||
self.assertEqual("{{a|x|{{c|d=y}}}}", code2) | |||
self.assertRaises(ValueError, code2.replace, "y", "z", recursive=False) | |||
meth = lambda code, *args, **kw: code.replace(*args, **kw) | |||
expected = [ | |||
"{{a}}xz[[y]]{{e}}", "dcdffe", "{{a|x|{{c|d=y}}}}", | |||
"{{a}}wx{{f}}{{g}}z", "{{a|x|{{f|{{g}}=y}}}}", | |||
"here cd ab a {{template}}}}"] | |||
self._test_search(meth, expected) | |||
def test_append(self): | |||
"""test Wikicode.append()""" | |||
@@ -197,18 +231,25 @@ class TestWikicode(TreeEqualityTestCase): | |||
def test_remove(self): | |||
"""test Wikicode.remove()""" | |||
code = parse("{{a}}{{b}}{{c}}{{d}}") | |||
code.remove("{{b}}", recursive=True) | |||
code.remove(code.get(1), recursive=True) | |||
self.assertEqual("{{a}}{{d}}", code) | |||
self.assertRaises(ValueError, code.remove, "{{r}}", recursive=True) | |||
self.assertRaises(ValueError, code.remove, "{{r}}", recursive=False) | |||
code2 = parse("{{a|{{b}}|{{c|d={{f}}{{h}}}}}}") | |||
code2.remove(code2.get(0).params[0].value.get(0), recursive=True) | |||
code2.remove("{{f}}", recursive=True) | |||
self.assertEqual("{{a||{{c|d={{h}}}}}}", code2) | |||
self.assertRaises(ValueError, code2.remove, "{{h}}", recursive=False) | |||
meth = lambda code, obj, value, **kw: code.remove(obj, **kw) | |||
expected = [ | |||
"{{a}}{{c}}", "", "{{a||{{c|d=}}}}", "{{a}}{{f}}", | |||
"{{a||{{f|{{g}}=}}}}", "here a {{template}}}}" | |||
] | |||
self._test_search(meth, expected) | |||
def test_matches(self): | |||
"""test Wikicode.matches()""" | |||
code1 = parse("Cleanup") | |||
code2 = parse("\nstub<!-- TODO: make more specific -->") | |||
self.assertTrue(code1.matches("Cleanup")) | |||
self.assertTrue(code1.matches("cleanup")) | |||
self.assertTrue(code1.matches(" cleanup\n")) | |||
self.assertFalse(code1.matches("CLEANup")) | |||
self.assertFalse(code1.matches("Blah")) | |||
self.assertTrue(code2.matches("stub")) | |||
self.assertTrue(code2.matches("Stub<!-- no, it's fine! -->")) | |||
self.assertFalse(code2.matches("StuB")) | |||
def test_filter_family(self): | |||
"""test the Wikicode.i?filter() family of functions""" | |||
@@ -219,11 +260,11 @@ class TestWikicode(TreeEqualityTestCase): | |||
code = parse("a{{b}}c[[d]]{{{e}}}{{f}}[[g]]") | |||
for func in (code.filter, ifilter(code)): | |||
self.assertEqual(["a", "{{b}}", "c", "[[d]]", "{{{e}}}", "{{f}}", | |||
"[[g]]"], func()) | |||
self.assertEqual(["a", "{{b}}", "b", "c", "[[d]]", "d", "{{{e}}}", | |||
"e", "{{f}}", "f", "[[g]]", "g"], func()) | |||
self.assertEqual(["{{{e}}}"], func(forcetype=Argument)) | |||
self.assertIs(code.get(4), func(forcetype=Argument)[0]) | |||
self.assertEqual(["a", "c"], func(forcetype=Text)) | |||
self.assertEqual(list("abcdefg"), func(forcetype=Text)) | |||
self.assertEqual([], func(forcetype=Heading)) | |||
self.assertRaises(TypeError, func, forcetype=True) | |||
@@ -235,11 +276,12 @@ class TestWikicode(TreeEqualityTestCase): | |||
self.assertEqual(["{{{e}}}"], get_filter("arguments")) | |||
self.assertIs(code.get(4), get_filter("arguments")[0]) | |||
self.assertEqual([], get_filter("comments")) | |||
self.assertEqual([], get_filter("external_links")) | |||
self.assertEqual([], get_filter("headings")) | |||
self.assertEqual([], get_filter("html_entities")) | |||
self.assertEqual([], get_filter("tags")) | |||
self.assertEqual(["{{b}}", "{{f}}"], get_filter("templates")) | |||
self.assertEqual(["a", "c"], get_filter("text")) | |||
self.assertEqual(list("abcdefg"), get_filter("text")) | |||
self.assertEqual(["[[d]]", "[[g]]"], get_filter("wikilinks")) | |||
code2 = parse("{{a|{{b}}|{{c|d={{f}}{{h}}}}}}") | |||
@@ -252,13 +294,13 @@ class TestWikicode(TreeEqualityTestCase): | |||
code3 = parse("{{foobar}}{{FOO}}{{baz}}{{bz}}") | |||
for func in (code3.filter, ifilter(code3)): | |||
self.assertEqual(["{{foobar}}", "{{FOO}}"], func(matches=r"foo")) | |||
self.assertEqual(["{{foobar}}", "{{FOO}}"], func(recursive=False, matches=r"foo")) | |||
self.assertEqual(["{{foobar}}", "{{FOO}}"], | |||
func(matches=r"^{{foo.*?}}")) | |||
func(recursive=False, matches=r"^{{foo.*?}}")) | |||
self.assertEqual(["{{foobar}}"], | |||
func(matches=r"^{{foo.*?}}", flags=re.UNICODE)) | |||
self.assertEqual(["{{baz}}", "{{bz}}"], func(matches=r"^{{b.*?z")) | |||
self.assertEqual(["{{baz}}"], func(matches=r"^{{b.+?z}}")) | |||
func(recursive=False, matches=r"^{{foo.*?}}", flags=re.UNICODE)) | |||
self.assertEqual(["{{baz}}", "{{bz}}"], func(recursive=False, matches=r"^{{b.*?z")) | |||
self.assertEqual(["{{baz}}"], func(recursive=False, matches=r"^{{b.+?z}}")) | |||
self.assertEqual(["{{a|{{b}}|{{c|d={{f}}{{h}}}}}}"], | |||
code2.filter_templates(recursive=False)) | |||
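
The Wikicode tests above rely on the v0.3 behavior where replace(),
insert_before(), insert_after(), and remove() accept plain strings (including
strings spanning several nodes) and act on every match, and on the new
matches() comparison that ignores surrounding whitespace and the case of the
first letter. A short illustrative sketch::

    >>> import mwparserfromhell
    >>> code = mwparserfromhell.parse("{{foo}}{{bar}}{{foo}}")
    >>> code.replace("{{foo}}", "{{baz}}")      # every match is replaced
    >>> print code
    {{baz}}{{bar}}{{baz}}
    >>> code.insert_after("{{bar}}", "!")
    >>> print code
    {{baz}}{{bar}}!{{baz}}
    >>> code.filter_templates()[0].name.matches("Baz")
    True
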
@@ -0,0 +1,473 @@ | |||
name: basic | |||
label: basic external link | |||
input: "http://example.com/" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com/"), ExternalLinkClose()] | |||
--- | |||
name: basic_brackets | |||
label: basic external link in brackets | |||
input: "[http://example.com/]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/"), ExternalLinkClose()] | |||
--- | |||
name: brackets_space | |||
label: basic external link in brackets, with a space after | |||
input: "[http://example.com/ ]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/"), ExternalLinkSeparator(), ExternalLinkClose()] | |||
--- | |||
name: brackets_title | |||
label: basic external link in brackets, with a title | |||
input: "[http://example.com/ Example]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()] | |||
--- | |||
name: brackets_multiword_title | |||
label: basic external link in brackets, with a multi-word title | |||
input: "[http://example.com/ Example Web Page]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/"), ExternalLinkSeparator(), Text(text="Example Web Page"), ExternalLinkClose()] | |||
--- | |||
name: brackets_adjacent | |||
label: three adjacent bracket-enclosed external links | |||
input: "[http://foo.com/ Foo][http://bar.com/ Bar]\n[http://baz.com/ Baz]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://foo.com/"), ExternalLinkSeparator(), Text(text="Foo"), ExternalLinkClose(), ExternalLinkOpen(brackets=True), Text(text="http://bar.com/"), ExternalLinkSeparator(), Text(text="Bar"), ExternalLinkClose(), Text(text="\n"), ExternalLinkOpen(brackets=True), Text(text="http://baz.com/"), ExternalLinkSeparator(), Text(text="Baz"), ExternalLinkClose()] | |||
--- | |||
name: brackets_newline_before | |||
label: bracket-enclosed link with a newline before the title | |||
input: "[http://example.com/ \nExample]" | |||
output: [Text(text="["), ExternalLinkOpen(brackets=False), Text(text="http://example.com/"), ExternalLinkClose(), Text(text=" \nExample]")] | |||
--- | |||
name: brackets_newline_inside | |||
label: bracket-enclosed link with a newline in the title | |||
input: "[http://example.com/ Example \nWeb Page]" | |||
output: [Text(text="["), ExternalLinkOpen(brackets=False), Text(text="http://example.com/"), ExternalLinkClose(), Text(text=" Example \nWeb Page]")] | |||
--- | |||
name: brackets_newline_after | |||
label: bracket-enclosed link with a newline after the title | |||
input: "[http://example.com/ Example\n]" | |||
output: [Text(text="["), ExternalLinkOpen(brackets=False), Text(text="http://example.com/"), ExternalLinkClose(), Text(text=" Example\n]")] | |||
--- | |||
name: brackets_space_before | |||
label: bracket-enclosed link with a space before the URL | |||
input: "[ http://example.com Example]" | |||
output: [Text(text="[ "), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text=" Example]")] | |||
--- | |||
name: brackets_title_like_url | |||
label: bracket-enclosed link with a title that looks like a URL | |||
input: "[http://example.com http://example.com]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com"), ExternalLinkSeparator(), Text(text="http://example.com"), ExternalLinkClose()] | |||
--- | |||
name: brackets_recursive | |||
label: bracket-enclosed link with a bracket-enclosed link as the title | |||
input: "[http://example.com [http://example.com]]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com"), ExternalLinkSeparator(), Text(text="[http://example.com"), ExternalLinkClose(), Text(text="]")] | |||
--- | |||
name: period_after | |||
label: a period after a free link that is excluded | |||
input: "http://example.com." | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text=".")] | |||
--- | |||
name: colons_after | |||
label: colons after a free link that are excluded | |||
input: "http://example.com/foo:bar.:;baz!?," | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com/foo:bar.:;baz"), ExternalLinkClose(), Text(text="!?,")] | |||
--- | |||
name: close_paren_after_excluded | |||
label: a closing parenthesis after a free link that is excluded | |||
input: "http://example.)com)" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.)com"), ExternalLinkClose(), Text(text=")")] | |||
--- | |||
name: close_paren_after_included | |||
label: a closing parenthesis after a free link that is included because of an opening parenthesis in the URL | |||
input: "http://example.(com)" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.(com)"), ExternalLinkClose()] | |||
--- | |||
name: open_bracket_inside | |||
label: an open bracket inside a free link that causes it to be ended abruptly | |||
input: "http://foobar[baz.com" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://foobar"), ExternalLinkClose(), Text(text="[baz.com")] | |||
--- | |||
name: brackets_period_after | |||
label: a period after a bracket-enclosed link that is included | |||
input: "[http://example.com. Example]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com."), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()] | |||
--- | |||
name: brackets_colons_after | |||
label: colons after a bracket-enclosed link that are included | |||
input: "[http://example.com/foo:bar.:;baz!?, Example]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/foo:bar.:;baz!?,"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()] | |||
--- | |||
name: brackets_close_paren_after_included | |||
label: a closing parenthesis after a bracket-enclosed link that is included | |||
input: "[http://example.)com) Example]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.)com)"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()] | |||
--- | |||
name: brackets_close_paren_after_included_2 | |||
label: a closing parenthesis after a bracket-enclosed link that is also included | |||
input: "[http://example.(com) Example]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.(com)"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()] | |||
--- | |||
name: brackets_open_bracket_inside | |||
label: an open bracket inside a bracket-enclosed link that is also included | |||
input: "[http://foobar[baz.com Example]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="http://foobar[baz.com"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()] | |||
--- | |||
name: adjacent_space | |||
label: two free links separated by a space | |||
input: "http://example.com http://example.com" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text=" "), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose()] | |||
--- | |||
name: adjacent_newline | |||
label: two free links separated by a newline | |||
input: "http://example.com\nhttp://example.com" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text="\n"), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose()] | |||
--- | |||
name: adjacent_close_bracket | |||
label: two free links separated by a close bracket | |||
input: "http://example.com]http://example.com" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text="]"), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose()] | |||
--- | |||
name: html_entity_in_url | |||
label: a HTML entity parsed correctly inside a free link | |||
input: "http://exa mple.com/" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://exa"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="mple.com/"), ExternalLinkClose()] | |||
--- | |||
name: template_in_url | |||
label: a template parsed correctly inside a free link | |||
input: "http://exa{{template}}mple.com/" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://exa"), TemplateOpen(), Text(text="template"), TemplateClose(), Text(text="mple.com/"), ExternalLinkClose()] | |||
--- | |||
name: argument_in_url | |||
label: an argument parsed correctly inside a free link | |||
input: "http://exa{{{argument}}}mple.com/" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://exa"), ArgumentOpen(), Text(text="argument"), ArgumentClose(), Text(text="mple.com/"), ExternalLinkClose()] | |||
--- | |||
name: wikilink_in_url | |||
label: a wikilink that destroys a free link | |||
input: "http://exa[[wikilink]]mple.com/" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://exa"), ExternalLinkClose(), WikilinkOpen(), Text(text="wikilink"), WikilinkClose(), Text(text="mple.com/")] | |||
--- | |||
name: external_link_in_url | |||
label: a bracketed link that destroys a free link | |||
input: "http://exa[http://example.com/]mple.com/" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="http://exa"), ExternalLinkClose(), ExternalLinkOpen(brackets=True), Text(text="http://example.com/"), ExternalLinkClose(), Text(text="mple.com/")] | |||
--- | |||
name: spaces_padding | |||
label: spaces padding a free link | |||
input: " http://example.com " | |||
output: [Text(text=" "), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text=" ")] | |||
--- | |||
name: text_and_spaces_padding | |||
label: text and spaces padding a free link | |||
input: "x http://example.com x" | |||
output: [Text(text="x "), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text=" x")] | |||
--- | |||
name: template_before | |||
label: a template before a free link | |||
input: "{{foo}}http://example.com" | |||
output: [TemplateOpen(), Text(text="foo"), TemplateClose(), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose()] | |||
--- | |||
name: spaces_padding_no_slashes | |||
label: spaces padding a free link with no slashes after the colon | |||
input: " mailto:example@example.com " | |||
output: [Text(text=" "), ExternalLinkOpen(brackets=False), Text(text="mailto:example@example.com"), ExternalLinkClose(), Text(text=" ")] | |||
--- | |||
name: text_and_spaces_padding_no_slashes | |||
label: text and spaces padding a free link with no slashes after the colon | |||
input: "x mailto:example@example.com x" | |||
output: [Text(text="x "), ExternalLinkOpen(brackets=False), Text(text="mailto:example@example.com"), ExternalLinkClose(), Text(text=" x")] | |||
--- | |||
name: template_before_no_slashes | |||
label: a template before a free link with no slashes after the colon | |||
input: "{{foo}}mailto:example@example.com" | |||
output: [TemplateOpen(), Text(text="foo"), TemplateClose(), ExternalLinkOpen(brackets=False), Text(text="mailto:example@example.com"), ExternalLinkClose()] | |||
--- | |||
name: no_slashes | |||
label: a free link with no slashes after the colon | |||
input: "mailto:example@example.com" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="mailto:example@example.com"), ExternalLinkClose()] | |||
--- | |||
name: slashes_optional | |||
label: a free link using a scheme that doesn't need slashes, but has them anyway | |||
input: "mailto://example@example.com" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="mailto://example@example.com"), ExternalLinkClose()] | |||
--- | |||
name: short | |||
label: a very short free link | |||
input: "mailto://abc" | |||
output: [ExternalLinkOpen(brackets=False), Text(text="mailto://abc"), ExternalLinkClose()] | |||
--- | |||
name: slashes_missing | |||
label: slashes missing from a free link with a scheme that requires them | |||
input: "http:example@example.com" | |||
output: [Text(text="http:example@example.com")] | |||
--- | |||
name: no_scheme_but_slashes | |||
label: no scheme in a free link, but slashes (protocol-relative free links are not supported) | |||
input: "//example.com" | |||
output: [Text(text="//example.com")] | |||
--- | |||
name: no_scheme_but_colon | |||
label: no scheme in a free link, but a colon | |||
input: " :example.com" | |||
output: [Text(text=" :example.com")] | |||
--- | |||
name: no_scheme_but_colon_and_slashes | |||
label: no scheme in a free link, but a colon and slashes | |||
input: " ://example.com" | |||
output: [Text(text=" ://example.com")] | |||
--- | |||
name: fake_scheme_no_slashes | |||
label: a nonexistent scheme in a free link, without slashes | |||
input: "fake:example.com" | |||
output: [Text(text="fake:example.com")] | |||
--- | |||
name: fake_scheme_slashes | |||
label: a nonexistent scheme in a free link, with slashes | |||
input: "fake://example.com" | |||
output: [Text(text="fake://example.com")] | |||
--- | |||
name: fake_scheme_brackets_no_slashes | |||
label: a nonexistent scheme in a bracketed link, without slashes | |||
input: "[fake:example.com]" | |||
output: [Text(text="[fake:example.com]")] | |||
--- | |||
name: fake_scheme_brackets_slashes | |||
label: a nonexistent scheme in a bracketed link, with slashes
input: "[fake://example.com]" | |||
output: [Text(text="[fake://example.com]")] | |||
--- | |||
name: interrupted_scheme | |||
label: an otherwise valid scheme with something in the middle of it, in a free link | |||
input: "ht?tp://example.com" | |||
output: [Text(text="ht?tp://example.com")] | |||
--- | |||
name: interrupted_scheme_brackets | |||
label: an otherwise valid scheme with something in the middle of it, in a bracketed link | |||
input: "[ht?tp://example.com]" | |||
output: [Text(text="[ht?tp://example.com]")] | |||
--- | |||
name: no_slashes_brackets | |||
label: no slashes after the colon in a bracketed link | |||
input: "[mailto:example@example.com Example]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="mailto:example@example.com"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()] | |||
--- | |||
name: space_before_no_slashes_brackets | |||
label: a space before a bracketed link with no slashes after the colon | |||
input: "[ mailto:example@example.com Example]" | |||
output: [Text(text="[ "), ExternalLinkOpen(brackets=False), Text(text="mailto:example@example.com"), ExternalLinkClose(), Text(text=" Example]")] | |||
--- | |||
name: slashes_optional_brackets | |||
label: a bracketed link using a scheme that doesn't need slashes, but has them anyway | |||
input: "[mailto://example@example.com Example]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="mailto://example@example.com"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()] | |||
--- | |||
name: short_brackets | |||
label: a very short link in brackets | |||
input: "[mailto://abc Example]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="mailto://abc"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()] | |||
--- | |||
name: slashes_missing_brackets | |||
label: slashes missing from a scheme that requires them in a bracketed link | |||
input: "[http:example@example.com Example]" | |||
output: [Text(text="[http:example@example.com Example]")] | |||
--- | |||
name: protocol_relative
label: a protocol-relative link (in brackets) | |||
input: "[//example.com Example]" | |||
output: [ExternalLinkOpen(brackets=True), Text(text="//example.com"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()] | |||
--- | |||
name: scheme_missing_but_colon_brackets | |||
label: scheme missing from a bracketed link, but with a colon | |||
input: "[:example.com Example]" | |||
output: [Text(text="[:example.com Example]")] | |||
--- | |||
name: scheme_missing_but_colon_slashes_brackets | |||
label: scheme missing from a bracketed link, but with a colon and slashes | |||
input: "[://example.com Example]" | |||
output: [Text(text="[://example.com Example]")] | |||
--- | |||
name: unclosed_protocol_relative | |||
label: an unclosed protocol-relative bracketed link | |||
input: "[//example.com" | |||
output: [Text(text="[//example.com")] | |||
--- | |||
name: space_before_protocol_relative
label: a space before a protocol-relative bracketed link | |||
input: "[ //example.com]" | |||
output: [Text(text="[ //example.com]")] | |||
--- | |||
name: unclosed_just_scheme | |||
label: an unclosed bracketed link, ending after the scheme | |||
input: "[http" | |||
output: [Text(text="[http")] | |||
--- | |||
name: unclosed_scheme_colon | |||
label: an unclosed bracketed link, ending after the colon | |||
input: "[http:" | |||
output: [Text(text="[http:")] | |||
--- | |||
name: unclosed_scheme_colon_slashes | |||
label: an unclosed bracketed link, ending after the slashes | |||
input: "[http://" | |||
output: [Text(text="[http://")] | |||
--- | |||
name: incomplete_bracket | |||
label: just an open bracket | |||
input: "[" | |||
output: [Text(text="[")] | |||
--- | |||
name: incomplete_scheme_colon | |||
label: a free link with just a scheme and a colon | |||
input: "http:" | |||
output: [Text(text="http:")] | |||
--- | |||
name: incomplete_scheme_colon_slashes | |||
label: a free link with just a scheme, colon, and slashes | |||
input: "http://" | |||
output: [Text(text="http://")] | |||
--- | |||
name: brackets_scheme_but_no_url | |||
label: brackets around a scheme and a colon | |||
input: "[mailto:]" | |||
output: [Text(text="[mailto:]")] | |||
--- | |||
name: brackets_scheme_slashes_but_no_url | |||
label: brackets around a scheme, colon, and slashes | |||
input: "[http://]" | |||
output: [Text(text="[http://]")] | |||
--- | |||
name: brackets_scheme_title_but_no_url | |||
label: brackets around a scheme, colon, and slashes, with a title | |||
input: "[http:// Example]" | |||
output: [Text(text="[http:// Example]")] |
@@ -117,6 +117,20 @@ output: [Text(text="&;")] | |||
--- | |||
name: invalid_partial_amp_pound | |||
label: invalid entities: just an ampersand, pound sign | |||
input: "&#" | |||
output: [Text(text="&#")] | |||
--- | |||
name: invalid_partial_amp_pound_x | |||
label: invalid entities: just an ampersand, pound sign, x | |||
input: "&#x" | |||
output: [Text(text="&#x")] | |||
--- | |||
name: invalid_partial_amp_pound_semicolon | |||
label: invalid entities: an ampersand, pound sign, and semicolon | |||
input: "&#;" | |||
@@ -12,6 +12,13 @@ output: [TemplateOpen(), ArgumentOpen(), ArgumentOpen(), Text(text="foo"), Argum | |||
--- | |||
name: link_in_template_name | |||
label: a wikilink inside a template name, which breaks the template | |||
input: "{{foo[[bar]]}}" | |||
output: [Text(text="{{foo"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), Text(text="}}")] | |||
--- | |||
name: rich_heading | |||
label: a heading with templates/wikilinks in it | |||
input: "== Head{{ing}} [[with]] {{{funky|{{stuf}}}}} ==" | |||
@@ -33,6 +40,13 @@ output: [Text(text="&n"), CommentStart(), Text(text="foo"), CommentEnd(), Text(t | |||
--- | |||
name: rich_tags | |||
label: a HTML tag with tons of other things in it | |||
input: "{{dubious claim}}<ref name={{abc}} foo="bar {{baz}}" abc={{de}}f ghi=j{{k}}{{l}} \n mno = "{{p}} [[q]] {{r}}">[[Source]]</ref>" | |||
output: [TemplateOpen(), Text(text="dubious claim"), TemplateClose(), TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TemplateOpen(), Text(text="abc"), TemplateClose(), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="foo"), TagAttrEquals(), TagAttrQuote(), Text(text="bar "), TemplateOpen(), Text(text="baz"), TemplateClose(), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="abc"), TagAttrEquals(), TemplateOpen(), Text(text="de"), TemplateClose(), Text(text="f"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="ghi"), TagAttrEquals(), Text(text="j"), TemplateOpen(), Text(text="k"), TemplateClose(), TemplateOpen(), Text(text="l"), TemplateClose(), TagAttrStart(pad_first=" \n ", pad_before_eq=" ", pad_after_eq=" "), Text(text="mno"), TagAttrEquals(), TagAttrQuote(), TemplateOpen(), Text(text="p"), TemplateClose(), Text(text=" "), WikilinkOpen(), Text(text="q"), WikilinkClose(), Text(text=" "), TemplateOpen(), Text(text="r"), TemplateClose(), TagCloseOpen(padding=""), WikilinkOpen(), Text(text="Source"), WikilinkClose(), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: wildcard | |||
label: a wildcard assortment of various things | |||
input: "{{{{{{{{foo}}bar|baz=biz}}buzz}}usr|{{bin}}}}" | |||
@@ -44,3 +58,17 @@ name: wildcard_redux | |||
label: an even wilder assortment of various things | |||
input: "{{a|b|{{c|[[d]]{{{e}}}}}}}[[f|{{{g}}}<!--h-->]]{{i|j= }}" | |||
output: [TemplateOpen(), Text(text="a"), TemplateParamSeparator(), Text(text="b"), TemplateParamSeparator(), TemplateOpen(), Text(text="c"), TemplateParamSeparator(), WikilinkOpen(), Text(text="d"), WikilinkClose(), ArgumentOpen(), Text(text="e"), ArgumentClose(), TemplateClose(), TemplateClose(), WikilinkOpen(), Text(text="f"), WikilinkSeparator(), ArgumentOpen(), Text(text="g"), ArgumentClose(), CommentStart(), Text(text="h"), CommentEnd(), WikilinkClose(), TemplateOpen(), Text(text="i"), TemplateParamSeparator(), Text(text="j"), TemplateParamEquals(), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), TemplateClose()] | |||
--- | |||
name: link_inside_dl | |||
label: an external link inside a def list, such that the external link is parsed | |||
input: ";;;mailto:example" | |||
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), ExternalLinkOpen(brackets=False), Text(text="mailto:example"), ExternalLinkClose()] | |||
--- | |||
name: link_inside_dl_2 | |||
label: an external link inside a def list, such that the external link is not parsed | |||
input: ";;;malito:example" | |||
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="malito"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="example")] |
@@ -0,0 +1,578 @@ | |||
name: basic | |||
label: a basic tag with an open and close | |||
input: "<ref></ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: basic_selfclosing | |||
label: a basic self-closing tag | |||
input: "<ref/>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagCloseSelfclose(padding="")] | |||
--- | |||
name: content | |||
label: a tag with some content in the middle | |||
input: "<ref>this is a reference</ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=""), Text(text="this is a reference"), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: padded_open | |||
label: a tag with some padding in the open tag | |||
input: "<ref ></ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=" "), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: padded_close | |||
label: a tag with some padding in the close tag | |||
input: "<ref></ref >" | |||
output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref "), TagCloseClose()] | |||
--- | |||
name: padded_selfclosing | |||
label: a self-closing tag with padding | |||
input: "<ref />" | |||
output: [TagOpenOpen(), Text(text="ref"), TagCloseSelfclose(padding=" ")] | |||
--- | |||
name: attribute | |||
label: a tag with a single attribute | |||
input: "<ref name></ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: attribute_value | |||
label: a tag with a single attribute with a value | |||
input: "<ref name=foo></ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), Text(text="foo"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: attribute_quoted | |||
label: a tag with a single quoted attribute | |||
input: "<ref name="foo bar"></ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagAttrQuote(), Text(text="foo bar"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: attribute_hyphen | |||
label: a tag with a single attribute, containing a hyphen | |||
input: "<ref name=foo-bar></ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), Text(text="foo-bar"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: attribute_quoted_hyphen | |||
label: a tag with a single quoted attribute, containing a hyphen | |||
input: "<ref name="foo-bar"></ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagAttrQuote(), Text(text="foo-bar"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: attribute_selfclosing | |||
label: a self-closing tag with a single attribute | |||
input: "<ref name/>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagCloseSelfclose(padding="")] | |||
--- | |||
name: attribute_selfclosing_value | |||
label: a self-closing tag with a single attribute with a value | |||
input: "<ref name=foo/>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), Text(text="foo"), TagCloseSelfclose(padding="")] | |||
--- | |||
name: attribute_selfclosing_value_quoted | |||
label: a self-closing tag with a single quoted attribute | |||
input: "<ref name="foo"/>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagAttrQuote(), Text(text="foo"), TagCloseSelfclose(padding="")] | |||
--- | |||
name: nested_tag | |||
label: a tag nested within the attributes of another | |||
input: "<ref name=<span style="color: red;">foo</span>>citation</ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagOpenOpen(), Text(text="span"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="style"), TagAttrEquals(), TagAttrQuote(), Text(text="color: red;"), TagCloseOpen(padding=""), Text(text="foo"), TagOpenClose(), Text(text="span"), TagCloseClose(), TagCloseOpen(padding=""), Text(text="citation"), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: nested_tag_quoted | |||
label: a tag nested within the attributes of another, quoted | |||
input: "<ref name="<span style="color: red;">foo</span>">citation</ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagAttrQuote(), TagOpenOpen(), Text(text="span"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="style"), TagAttrEquals(), TagAttrQuote(), Text(text="color: red;"), TagCloseOpen(padding=""), Text(text="foo"), TagOpenClose(), Text(text="span"), TagCloseClose(), TagCloseOpen(padding=""), Text(text="citation"), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: nested_troll_tag | |||
label: a bogus tag that appears to be nested within the attributes of another | |||
input: "<ref name=</ ><//>>citation</ref>" | |||
output: [Text(text="<ref name=</ ><//>>citation</ref>")] | |||
--- | |||
name: nested_troll_tag_quoted | |||
label: a bogus tag that appears to be nested within the attributes of another, quoted | |||
input: "<ref name="</ ><//>">citation</ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagAttrQuote(), Text(text="</ ><//>"), TagCloseOpen(padding=""), Text(text="citation"), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: invalid_space_begin_open | |||
label: invalid tag: a space at the beginning of the open tag | |||
input: "< ref>test</ref>" | |||
output: [Text(text="< ref>test</ref>")] | |||
--- | |||
name: invalid_space_begin_close | |||
label: invalid tag: a space at the beginning of the close tag | |||
input: "<ref>test</ ref>" | |||
output: [Text(text="<ref>test</ ref>")] | |||
--- | |||
name: valid_space_end | |||
label: valid tag: spaces at the ends of both the open and close tags | |||
input: "<ref >test</ref >" | |||
output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=" "), Text(text="test"), TagOpenClose(), Text(text="ref "), TagCloseClose()] | |||
--- | |||
name: invalid_template_ends | |||
label: invalid tag: a template at the ends of both the open and close tags | |||
input: "<ref {{foo}}>test</ref {{foo}}>" | |||
output: [Text(text="<ref "), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text=">test</ref "), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text=">")] | |||
--- | |||
name: invalid_template_ends_nospace | |||
label: invalid tag: a template at the ends of both the open and close tags, without spacing | |||
input: "<ref {{foo}}>test</ref{{foo}}>" | |||
output: [Text(text="<ref "), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text=">test</ref"), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text=">")] | |||
--- | |||
name: valid_template_end_open | |||
label: valid tag: a template at the end of the open tag | |||
input: "<ref {{foo}}>test</ref>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), TemplateOpen(), Text(text="foo"), TemplateClose(), TagCloseOpen(padding=""), Text(text="test"), TagOpenClose(), Text(text="ref"), TagCloseClose()] | |||
--- | |||
name: valid_template_end_open_space_end_close | |||
label: valid tag: a template at the end of the open tag; whitespace at the end of the close tag | |||
input: "<ref {{foo}}>test</ref\n>" | |||
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), TemplateOpen(), Text(text="foo"), TemplateClose(), TagCloseOpen(padding=""), Text(text="test"), TagOpenClose(), Text(text="ref\n"), TagCloseClose()] | |||
--- | |||
name: invalid_template_end_open_nospace | |||
label: invalid tag: a template at the end of the open tag, without spacing | |||
input: "<ref{{foo}}>test</ref>" | |||
output: [Text(text="<ref"), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text=">test</ref>")] | |||
--- | |||
name: invalid_template_start_close | |||
label: invalid tag: a template at the beginning of the close tag | |||
input: "<ref>test</{{foo}}ref>" | |||
output: [Text(text="<ref>test</"), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="ref>")] | |||
--- | |||
name: invalid_template_start_open | |||
label: invalid tag: a template at the beginning of the open tag | |||
input: "<{{foo}}ref>test</ref>" | |||
output: [Text(text="<"), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="ref>test</ref>")] | |||
--- | |||
name: unclosed_quote | |||
label: a quoted attribute that is never closed | |||
input: "<span style="foobar>stuff</span>" | |||
output: [TagOpenOpen(), Text(text="span"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="style"), TagAttrEquals(), Text(text="\"foobar"), TagCloseOpen(padding=""), Text(text="stuff"), TagOpenClose(), Text(text="span"), TagCloseClose()] | |||
--- | |||
name: fake_quote | |||
label: a fake quoted attribute | |||
input: "<span style="foo"bar>stuff</span>" | |||
output: [TagOpenOpen(), Text(text="span"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="style"), TagAttrEquals(), Text(text="\"foo\"bar"), TagCloseOpen(padding=""), Text(text="stuff"), TagOpenClose(), Text(text="span"), TagCloseClose()] | |||
--- | |||
name: fake_quote_complex | |||
label: a fake quoted attribute, with spaces and templates and links | |||
input: "<span style="foo {{bar}}\n[[baz]]"buzz >stuff</span>" | |||
output: [TagOpenOpen(), Text(text="span"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="style"), TagAttrEquals(), Text(text="\"foo"), TagAttrStart(pad_first=" ", pad_before_eq="\n", pad_after_eq=""), TemplateOpen(), Text(text="bar"), TemplateClose(), TagAttrStart(pad_first="", pad_before_eq=" ", pad_after_eq=""), WikilinkOpen(), Text(text="baz"), WikilinkClose(), Text(text="\"buzz"), TagCloseOpen(padding=""), Text(text="stuff"), TagOpenClose(), Text(text="span"), TagCloseClose()] | |||
--- | |||
name: incomplete_lbracket | |||
label: incomplete tags: just a left bracket | |||
input: "<" | |||
output: [Text(text="<")] | |||
--- | |||
name: incomplete_lbracket_junk | |||
label: incomplete tags: just a left bracket, surrounded by stuff | |||
input: "foo<bar" | |||
output: [Text(text="foo<bar")] | |||
--- | |||
name: incomplete_unclosed_open | |||
label: incomplete tags: an unclosed open tag | |||
input: "junk <ref" | |||
output: [Text(text="junk <ref")] | |||
--- | |||
name: incomplete_unclosed_open_space | |||
label: incomplete tags: an unclosed open tag, space | |||
input: "junk <ref " | |||
output: [Text(text="junk <ref ")] | |||
--- | |||
name: incomplete_unclosed_open_unnamed_attr | |||
label: incomplete tags: an unclosed open tag, unnamed attribute | |||
input: "junk <ref name" | |||
output: [Text(text="junk <ref name")] | |||
--- | |||
name: incomplete_unclosed_open_attr_equals | |||
label: incomplete tags: an unclosed open tag, attribute, equal sign | |||
input: "junk <ref name=" | |||
output: [Text(text="junk <ref name=")] | |||
--- | |||
name: incomplete_unclosed_open_attr_equals_quoted | |||
label: incomplete tags: an unclosed open tag, attribute, equal sign, quote | |||
input: "junk <ref name="" | |||
output: [Text(text="junk <ref name=\"")] | |||
--- | |||
name: incomplete_unclosed_open_attr | |||
label: incomplete tags: an unclosed open tag, attribute with a key/value | |||
input: "junk <ref name=foo" | |||
output: [Text(text="junk <ref name=foo")] | |||
--- | |||
name: incomplete_unclosed_open_attr_quoted | |||
label: incomplete tags: an unclosed open tag, attribute with a key/value, quoted | |||
input: "junk <ref name="foo"" | |||
output: [Text(text="junk <ref name=\"foo\"")] | |||
--- | |||
name: incomplete_open | |||
label: incomplete tags: an open tag | |||
input: "junk <ref>" | |||
output: [Text(text="junk <ref>")] | |||
--- | |||
name: incomplete_open_unnamed_attr | |||
label: incomplete tags: an open tag, unnamed attribute | |||
input: "junk <ref name>" | |||
output: [Text(text="junk <ref name>")] | |||
--- | |||
name: incomplete_open_attr_equals | |||
label: incomplete tags: an open tag, attribute, equal sign | |||
input: "junk <ref name=>" | |||
output: [Text(text="junk <ref name=>")] | |||
--- | |||
name: incomplete_open_attr | |||
label: incomplete tags: an open tag, attribute with a key/value | |||
input: "junk <ref name=foo>" | |||
output: [Text(text="junk <ref name=foo>")] | |||
--- | |||
name: incomplete_open_attr_quoted | |||
label: incomplete tags: an open tag, attribute with a key/value, quoted | |||
input: "junk <ref name="foo">" | |||
output: [Text(text="junk <ref name=\"foo\">")] | |||
--- | |||
name: incomplete_open_text | |||
label: incomplete tags: an open tag, text | |||
input: "junk <ref>foo" | |||
output: [Text(text="junk <ref>foo")] | |||
--- | |||
name: incomplete_open_attr_text | |||
label: incomplete tags: an open tag, attribute with a key/value, text | |||
input: "junk <ref name=foo>bar" | |||
output: [Text(text="junk <ref name=foo>bar")] | |||
--- | |||
name: incomplete_open_text_lbracket | |||
label: incomplete tags: an open tag, text, left open bracket | |||
input: "junk <ref>bar<" | |||
output: [Text(text="junk <ref>bar<")] | |||
--- | |||
name: incomplete_open_text_lbracket_slash | |||
label: incomplete tags: an open tag, text, left bracket, slash | |||
input: "junk <ref>bar</" | |||
output: [Text(text="junk <ref>bar</")] | |||
--- | |||
name: incomplete_open_text_unclosed_close | |||
label: incomplete tags: an open tag, text, unclosed close | |||
input: "junk <ref>bar</ref" | |||
output: [Text(text="junk <ref>bar</ref")] | |||
--- | |||
name: incomplete_open_text_wrong_close | |||
label: incomplete tags: an open tag, text, wrong close | |||
input: "junk <ref>bar</span>" | |||
output: [Text(text="junk <ref>bar</span>")] | |||
--- | |||
name: incomplete_unclosed_close | |||
label: incomplete tags: an unclosed close tag | |||
input: "junk </" | |||
output: [Text(text="junk </")] | |||
--- | |||
name: incomplete_unclosed_close_text | |||
label: incomplete tags: an unclosed close tag, with text | |||
input: "junk </br" | |||
output: [Text(text="junk </br")] | |||
--- | |||
name: incomplete_close | |||
label: incomplete tags: a close tag | |||
input: "junk </ref>" | |||
output: [Text(text="junk </ref>")] | |||
--- | |||
name: incomplete_no_tag_name_open | |||
label: incomplete tags: no tag name within brackets; just an open | |||
input: "junk <>" | |||
output: [Text(text="junk <>")] | |||
--- | |||
name: incomplete_no_tag_name_selfclosing | |||
label: incomplete tags: no tag name within brackets; self-closing | |||
input: "junk < />" | |||
output: [Text(text="junk < />")] | |||
--- | |||
name: incomplete_no_tag_name_open_close | |||
label: incomplete tags: no tag name within brackets; open and close | |||
input: "junk <></>" | |||
output: [Text(text="junk <></>")] | |||
--- | |||
name: backslash_premature_before | |||
label: a backslash before a quote before a space | |||
input: "<foo attribute="this is\\" quoted">blah</foo>" | |||
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="this is\\\" quoted"), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()] | |||
--- | |||
name: backslash_premature_after | |||
label: a backslash before a quote after a space | |||
input: "<foo attribute="this is \\"quoted">blah</foo>" | |||
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="this is \\\"quoted"), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()] | |||
--- | |||
name: backslash_premature_middle | |||
label: a backslash before a quote in the middle of a word | |||
input: "<foo attribute="this i\\"s quoted">blah</foo>" | |||
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="this i\\\"s quoted"), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()] | |||
--- | |||
name: backslash_adjacent | |||
label: escaped quotes next to unescaped quotes | |||
input: "<foo attribute="\\"this is quoted\\"">blah</foo>" | |||
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="\\\"this is quoted\\\""), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()] | |||
--- | |||
name: backslash_endquote | |||
label: backslashes before the end quote, causing the attribute to become unquoted | |||
input: "<foo attribute="this_is quoted\\">blah</foo>" | |||
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), Text(text="\"this_is"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="quoted\\\""), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()] | |||
--- | |||
name: backslash_double | |||
label: two adjacent backslashes, which do *not* affect the quote | |||
input: "<foo attribute="this is\\\\" quoted">blah</foo>" | |||
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="this is\\\\"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="quoted\""), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()] | |||
--- | |||
name: backslash_triple | |||
label: three adjacent backslashes, which do *not* affect the quote | |||
input: "<foo attribute="this is\\\\\\" quoted">blah</foo>" | |||
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="this is\\\\\\"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="quoted\""), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()] | |||
--- | |||
name: backslash_unaffecting | |||
label: backslashes near quotes, but not immediately adjacent, thus having no effect | |||
input: "<foo attribute="\\quote\\d" also="quote\\d\\">blah</foo>" | |||
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="\\quote\\d"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="also"), TagAttrEquals(), Text(text="\"quote\\d\\\""), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()] | |||
--- | |||
name: unparsable | |||
label: a tag that should not be put through the normal parser | |||
input: "{{t1}}<nowiki>{{t2}}</nowiki>{{t3}}" | |||
output: [TemplateOpen(), Text(text="t1"), TemplateClose(), TagOpenOpen(), Text(text="nowiki"), TagCloseOpen(padding=""), Text(text="{{t2}}"), TagOpenClose(), Text(text="nowiki"), TagCloseClose(), TemplateOpen(), Text(text="t3"), TemplateClose()] | |||
--- | |||
name: unparsable_complex | |||
label: a tag that should not be put through the normal parser; lots of stuff inside | |||
input: "{{t1}}<pre>{{t2}}\n==Heading==\nThis is some text with a [[page|link]].</pre>{{t3}}" | |||
output: [TemplateOpen(), Text(text="t1"), TemplateClose(), TagOpenOpen(), Text(text="pre"), TagCloseOpen(padding=""), Text(text="{{t2}}\n==Heading==\nThis is some text with a [[page|link]]."), TagOpenClose(), Text(text="pre"), TagCloseClose(), TemplateOpen(), Text(text="t3"), TemplateClose()] | |||
--- | |||
name: unparsable_attributed | |||
label: a tag that should not be put through the normal parser; parsed attributes | |||
input: "{{t1}}<nowiki attr=val attr2="{{val2}}">{{t2}}</nowiki>{{t3}}" | |||
output: [TemplateOpen(), Text(text="t1"), TemplateClose(), TagOpenOpen(), Text(text="nowiki"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attr"), TagAttrEquals(), Text(text="val"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attr2"), TagAttrEquals(), TagAttrQuote(), TemplateOpen(), Text(text="val2"), TemplateClose(), TagCloseOpen(padding=""), Text(text="{{t2}}"), TagOpenClose(), Text(text="nowiki"), TagCloseClose(), TemplateOpen(), Text(text="t3"), TemplateClose()]
--- | |||
name: unparsable_incomplete | |||
label: a tag that should not be put through the normal parser; incomplete | |||
input: "{{t1}}<nowiki>{{t2}}{{t3}}" | |||
output: [TemplateOpen(), Text(text="t1"), TemplateClose(), Text(text="<nowiki>"), TemplateOpen(), Text(text="t2"), TemplateClose(), TemplateOpen(), Text(text="t3"), TemplateClose()] | |||
--- | |||
name: unparsable_entity | |||
label: an HTML entity inside unparsable text is still parsed
input: "{{t1}}<nowiki>{{t2}}&nbsp;{{t3}}</nowiki>{{t4}}"
output: [TemplateOpen(), Text(text="t1"), TemplateClose(), TagOpenOpen(), Text(text="nowiki"), TagCloseOpen(padding=""), Text(text="{{t2}}"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="{{t3}}"), TagOpenClose(), Text(text="nowiki"), TagCloseClose(), TemplateOpen(), Text(text="t4"), TemplateClose()] | |||
--- | |||
name: unparsable_entity_incomplete | |||
label: an incomplete HTML entity inside unparsable text | |||
input: "<nowiki>&</nowiki>" | |||
output: [TagOpenOpen(), Text(text="nowiki"), TagCloseOpen(padding=""), Text(text="&"), TagOpenClose(), Text(text="nowiki"), TagCloseClose()] | |||
--- | |||
name: unparsable_entity_incomplete_2 | |||
label: an incomplete HTML entity inside unparsable text, with no closing tag
input: "<nowiki>&" | |||
output: [Text(text="<nowiki>&")] | |||
--- | |||
name: single_open_close | |||
label: a tag that supports being single; both an open and a close tag | |||
input: "foo<li>bar{{baz}}</li>" | |||
output: [Text(text="foo"), TagOpenOpen(), Text(text="li"), TagCloseOpen(padding=""), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose(), TagOpenClose(), Text(text="li"), TagCloseClose()] | |||
--- | |||
name: single_open | |||
label: a tag that supports being single; just an open tag | |||
input: "foo<li>bar{{baz}}" | |||
output: [Text(text="foo"), TagOpenOpen(), Text(text="li"), TagCloseSelfclose(padding="", implicit=True), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()] | |||
--- | |||
name: single_selfclose | |||
label: a tag that supports being single; a self-closing tag | |||
input: "foo<li/>bar{{baz}}" | |||
output: [Text(text="foo"), TagOpenOpen(), Text(text="li"), TagCloseSelfclose(padding=""), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()] | |||
--- | |||
name: single_close | |||
label: a tag that supports being single; just a close tag | |||
input: "foo</li>bar{{baz}}" | |||
output: [Text(text="foo</li>bar"), TemplateOpen(), Text(text="baz"), TemplateClose()] | |||
--- | |||
name: single_only_open_close | |||
label: a tag that can only be single; both an open and a close tag | |||
input: "foo<br>bar{{baz}}</br>" | |||
output: [Text(text="foo"), TagOpenOpen(), Text(text="br"), TagCloseSelfclose(padding="", implicit=True), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose(), TagOpenOpen(invalid=True), Text(text="br"), TagCloseSelfclose(padding="", implicit=True)] | |||
--- | |||
name: single_only_open | |||
label: a tag that can only be single; just an open tag | |||
input: "foo<br>bar{{baz}}" | |||
output: [Text(text="foo"), TagOpenOpen(), Text(text="br"), TagCloseSelfclose(padding="", implicit=True), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()] | |||
--- | |||
name: single_only_selfclose | |||
label: a tag that can only be single; a self-closing tag | |||
input: "foo<br/>bar{{baz}}" | |||
output: [Text(text="foo"), TagOpenOpen(), Text(text="br"), TagCloseSelfclose(padding=""), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()] | |||
--- | |||
name: single_only_close | |||
label: a tag that can only be single; just a close tag | |||
input: "foo</br>bar{{baz}}" | |||
output: [Text(text="foo"), TagOpenOpen(invalid=True), Text(text="br"), TagCloseSelfclose(padding="", implicit=True), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()] | |||
--- | |||
name: single_only_double | |||
label: a tag that can only be single; a tag with slashes at the beginning and end
input: "foo</br/>bar{{baz}}" | |||
output: [Text(text="foo"), TagOpenOpen(invalid=True), Text(text="br"), TagCloseSelfclose(padding=""), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()] | |||
--- | |||
name: single_only_close_attribute | |||
label: a tag that can only be single; presented as a close tag with an attribute | |||
input: "</br id="break">" | |||
output: [TagOpenOpen(invalid=True), Text(text="br"), TagAttrStart(pad_first=" ", pad_after_eq="", pad_before_eq=""), Text(text="id"), TagAttrEquals(), TagAttrQuote(), Text(text="break"), TagCloseSelfclose(padding="", implicit=True)] | |||
--- | |||
name: capitalization | |||
label: caps should be ignored within tag names | |||
input: "<NoWiKi>{{test}}</nOwIkI>" | |||
output: [TagOpenOpen(), Text(text="NoWiKi"), TagCloseOpen(padding=""), Text(text="{{test}}"), TagOpenClose(), Text(text="nOwIkI"), TagCloseClose()] |
@@ -0,0 +1,523 @@ | |||
name: basic_italics | |||
label: basic italic text | |||
input: "''text''" | |||
output: [TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="text"), TagOpenClose(), Text(text="i"), TagCloseClose()] | |||
--- | |||
name: basic_bold | |||
label: basic bold text | |||
input: "'''text'''" | |||
output: [TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="text"), TagOpenClose(), Text(text="b"), TagCloseClose()] | |||
--- | |||
name: basic_ul | |||
label: basic unordered list | |||
input: "*text" | |||
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="text")] | |||
--- | |||
name: basic_ol | |||
label: basic ordered list | |||
input: "#text" | |||
output: [TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="text")] | |||
--- | |||
name: basic_dt | |||
label: basic description term | |||
input: ";text" | |||
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="text")] | |||
--- | |||
name: basic_dd | |||
label: basic description item | |||
input: ":text" | |||
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="text")] | |||
--- | |||
name: basic_hr | |||
label: basic horizontal rule | |||
input: "----" | |||
output: [TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose()] | |||
--- | |||
name: complex_italics | |||
label: italics with a lot in them | |||
input: "''this is a test of [[Italic text|italics]] with {{plenty|of|stuff}}''" | |||
output: [TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of "), WikilinkOpen(), Text(text="Italic text"), WikilinkSeparator(), Text(text="italics"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose(), TagOpenClose(), Text(text="i"), TagCloseClose()] | |||
--- | |||
name: multiline_italics | |||
label: italics spanning multiple lines
input: "foo\nbar''testing\ntext\nspanning\n\n\n\n\nmultiple\nlines''foo\n\nbar" | |||
output: [Text(text="foo\nbar"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="testing\ntext\nspanning\n\n\n\n\nmultiple\nlines"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="foo\n\nbar")] | |||
--- | |||
name: unending_italics | |||
label: italics without an ending tag | |||
input: "''unending formatting!" | |||
output: [Text(text="''unending formatting!")] | |||
--- | |||
name: misleading_italics_end | |||
label: italics with something that looks like an end but isn't | |||
input: "''this is 'not' the en'd'<nowiki>''</nowiki>" | |||
output: [Text(text="''this is 'not' the en'd'"), TagOpenOpen(), Text(text="nowiki"), TagCloseOpen(padding=""), Text(text="''"), TagOpenClose(), Text(text="nowiki"), TagCloseClose()] | |||
--- | |||
name: italics_start_outside_end_inside | |||
label: italics that start outside a link and end inside it | |||
input: "''foo[[bar|baz'']]spam" | |||
output: [Text(text="''foo"), WikilinkOpen(), Text(text="bar"), WikilinkSeparator(), Text(text="baz''"), WikilinkClose(), Text(text="spam")] | |||
--- | |||
name: italics_start_inside_end_outside | |||
label: italics that start inside a link and end outside it | |||
input: "[[foo|''bar]]baz''spam" | |||
output: [Text(text="[[foo|"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar]]baz"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="spam")] | |||
--- | |||
name: complex_bold | |||
label: bold with a lot in it | |||
input: "'''this is a test of [[Bold text|bold]] with {{plenty|of|stuff}}'''" | |||
output: [TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of "), WikilinkOpen(), Text(text="Bold text"), WikilinkSeparator(), Text(text="bold"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose(), TagOpenClose(), Text(text="b"), TagCloseClose()] | |||
--- | |||
name: multiline_bold | |||
label: bold spanning multiple lines
input: "foo\nbar'''testing\ntext\nspanning\n\n\n\n\nmultiple\nlines'''foo\n\nbar" | |||
output: [Text(text="foo\nbar"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="testing\ntext\nspanning\n\n\n\n\nmultiple\nlines"), TagOpenClose(), Text(text="b"), TagCloseClose(), Text(text="foo\n\nbar")] | |||
--- | |||
name: unending_bold | |||
label: bold without an ending tag | |||
input: "'''unending formatting!" | |||
output: [Text(text="'''unending formatting!")] | |||
--- | |||
name: misleading_bold_end | |||
label: bold with something that looks like an end but isn't | |||
input: "'''this is 'not' the en''d'<nowiki>'''</nowiki>" | |||
output: [Text(text="'"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="this is 'not' the en"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="d'"), TagOpenOpen(), Text(text="nowiki"), TagCloseOpen(padding=""), Text(text="'''"), TagOpenClose(), Text(text="nowiki"), TagCloseClose()] | |||
--- | |||
name: bold_start_outside_end_inside | |||
label: bold that starts outside a link and ends inside it
input: "'''foo[[bar|baz''']]spam" | |||
output: [Text(text="'''foo"), WikilinkOpen(), Text(text="bar"), WikilinkSeparator(), Text(text="baz'''"), WikilinkClose(), Text(text="spam")] | |||
--- | |||
name: bold_start_inside_end_outside | |||
label: bold that starts inside a link and ends outside it
input: "[[foo|'''bar]]baz'''spam" | |||
output: [Text(text="[[foo|"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bar]]baz"), TagOpenClose(), Text(text="b"), TagCloseClose(), Text(text="spam")] | |||
--- | |||
name: bold_and_italics | |||
label: bold and italics together | |||
input: "this is '''''bold and italic text'''''!" | |||
output: [Text(text="this is "), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bold and italic text"), TagOpenClose(), Text(text="b"), TagCloseClose(), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="!")] | |||
--- | |||
name: both_then_bold | |||
label: text that starts bold/italic, then is just bold | |||
input: "'''''both''bold'''" | |||
output: [TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="both"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="bold"), TagOpenClose(), Text(text="b"), TagCloseClose()] | |||
--- | |||
name: both_then_italics | |||
label: text that starts bold/italic, then is just italic | |||
input: "'''''both'''italics''" | |||
output: [TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="both"), TagOpenClose(), Text(text="b"), TagCloseClose(), Text(text="italics"), TagOpenClose(), Text(text="i"), TagCloseClose()] | |||
--- | |||
name: bold_then_both | |||
label: text that starts just bold, then is bold/italic | |||
input: "'''bold''both'''''" | |||
output: [TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bold"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="both"), TagOpenClose(), Text(text="i"), TagCloseClose(), TagOpenClose(), Text(text="b"), TagCloseClose()] | |||
--- | |||
name: italics_then_both | |||
label: text that starts just italic, then is bold/italic | |||
input: "''italics'''both'''''" | |||
output: [TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="italics"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="both"), TagOpenClose(), Text(text="b"), TagCloseClose(), TagOpenClose(), Text(text="i"), TagCloseClose()] | |||
--- | |||
name: italics_then_bold | |||
label: text that starts italic, then is bold | |||
input: "none''italics'''''bold'''none" | |||
output: [Text(text="none"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="italics"), TagOpenClose(), Text(text="i"), TagCloseClose(), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bold"), TagOpenClose(), Text(text="b"), TagCloseClose(), Text(text="none")] | |||
--- | |||
name: bold_then_italics | |||
label: text that starts bold, then is italic | |||
input: "none'''bold'''''italics''none" | |||
output: [Text(text="none"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bold"), TagOpenClose(), Text(text="b"), TagCloseClose(), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="italics"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="none")] | |||
--- | |||
name: five_three | |||
label: five ticks to open, three to close (bold) | |||
input: "'''''foobar'''" | |||
output: [Text(text="''"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="foobar"), TagOpenClose(), Text(text="b"), TagCloseClose()] | |||
--- | |||
name: five_two | |||
label: five ticks to open, two to close (bold) | |||
input: "'''''foobar''" | |||
output: [Text(text="'''"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="foobar"), TagOpenClose(), Text(text="i"), TagCloseClose()] | |||
--- | |||
name: four | |||
label: four ticks | |||
input: "foo ''''bar'''' baz" | |||
output: [Text(text="foo '"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bar'"), TagOpenClose(), Text(text="b"), TagCloseClose(), Text(text=" baz")] | |||
--- | |||
name: four_two | |||
label: four ticks to open, two to close | |||
input: "foo ''''bar'' baz" | |||
output: [Text(text="foo ''"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text=" baz")] | |||
--- | |||
name: two_three | |||
label: two ticks to open, three to close | |||
input: "foo ''bar''' baz" | |||
output: [Text(text="foo "), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar'"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text=" baz")] | |||
--- | |||
name: two_four | |||
label: two ticks to open, four to close | |||
input: "foo ''bar'''' baz" | |||
output: [Text(text="foo "), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar''"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text=" baz")] | |||
--- | |||
name: two_three_two | |||
label: two ticks to open, three to close, two afterwards | |||
input: "foo ''bar''' baz''" | |||
output: [Text(text="foo "), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar''' baz"), TagOpenClose(), Text(text="i"), TagCloseClose()] | |||
--- | |||
name: two_four_four | |||
label: two ticks to open, four to close, four afterwards | |||
input: "foo ''bar'''' baz''''" | |||
output: [Text(text="foo ''bar'"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text=" baz'"), TagOpenClose(), Text(text="b"), TagCloseClose()] | |||
--- | |||
name: seven | |||
label: seven ticks | |||
input: "'''''''seven'''''''" | |||
output: [Text(text="''"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="seven''"), TagOpenClose(), Text(text="b"), TagCloseClose(), TagOpenClose(), Text(text="i"), TagCloseClose()] | |||
--- | |||
name: complex_ul | |||
label: ul with a lot in it | |||
input: "* this is a test of an [[Unordered list|ul]] with {{plenty|of|stuff}}" | |||
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text=" this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of an "), WikilinkOpen(), Text(text="Unordered list"), WikilinkSeparator(), Text(text="ul"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose()] | |||
--- | |||
name: ul_multiline_template | |||
label: ul with a template that spans multiple lines | |||
input: "* this has a template with a {{line|\nbreak}}\nthis is not part of the list" | |||
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text=" this has a template with a "), TemplateOpen(), Text(text="line"), TemplateParamSeparator(), Text(text="\nbreak"), TemplateClose(), Text(text="\nthis is not part of the list")] | |||
--- | |||
name: ul_adjacent | |||
label: multiple adjacent uls | |||
input: "a\n*b\n*c\nd\n*e\nf" | |||
output: [Text(text="a\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="c\nd\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="e\nf")] | |||
--- | |||
name: ul_depths | |||
label: multiple adjacent uls, with differing depths | |||
input: "*a\n**b\n***c\n********d\n**e\nf\n***g" | |||
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="g")] | |||
--- | |||
name: ul_space_before | |||
label: uls with space before them | |||
input: "foo *bar\n *baz\n*buzz" | |||
output: [Text(text="foo *bar\n *baz\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="buzz")] | |||
--- | |||
name: ul_interruption | |||
label: high-depth ul with something blocking it | |||
input: "**f*oobar" | |||
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="f*oobar")] | |||
--- | |||
name: complex_ol | |||
label: ol with a lot in it | |||
input: "# this is a test of an [[Ordered list|ol]] with {{plenty|of|stuff}}" | |||
output: [TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text=" this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of an "), WikilinkOpen(), Text(text="Ordered list"), WikilinkSeparator(), Text(text="ol"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose()] | |||
--- | |||
name: ol_multiline_template | |||
label: ol with a template that spans multiple lines
input: "# this has a template with a {{line|\nbreak}}\nthis is not part of the list" | |||
output: [TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text=" this has a template with a "), TemplateOpen(), Text(text="line"), TemplateParamSeparator(), Text(text="\nbreak"), TemplateClose(), Text(text="\nthis is not part of the list")] | |||
--- | |||
name: ol_adjacent | |||
label: multiple adjacent ols
input: "a\n#b\n#c\nd\n#e\nf" | |||
output: [Text(text="a\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="c\nd\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="e\nf")] | |||
--- | |||
name: ol_depths | |||
label: multiple adjacent ols, with differing depths
input: "#a\n##b\n###c\n########d\n##e\nf\n###g" | |||
output: [TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="g")] | |||
--- | |||
name: ol_space_before | |||
label: ols with space before them | |||
input: "foo #bar\n #baz\n#buzz" | |||
output: [Text(text="foo #bar\n #baz\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="buzz")] | |||
--- | |||
name: ol_interruption | |||
label: high-depth ol with something blocking it | |||
input: "##f#oobar" | |||
output: [TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="f#oobar")] | |||
--- | |||
name: ul_ol_mix | |||
label: a mix of adjacent uls and ols | |||
input: "*a\n*#b\n*##c\n*##*#*#*d\n*#e\nf\n##*g" | |||
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="g")] | |||
--- | |||
name: complex_dt | |||
label: dt with a lot in it | |||
input: "; this is a test of an [[description term|dt]] with {{plenty|of|stuff}}" | |||
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text=" this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of an "), WikilinkOpen(), Text(text="description term"), WikilinkSeparator(), Text(text="dt"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose()] | |||
--- | |||
name: dt_multiline_template | |||
label: dt with a template that spans multiple lines
input: "; this has a template with a {{line|\nbreak}}\nthis is not part of the list" | |||
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text=" this has a template with a "), TemplateOpen(), Text(text="line"), TemplateParamSeparator(), Text(text="\nbreak"), TemplateClose(), Text(text="\nthis is not part of the list")] | |||
--- | |||
name: dt_adjacent | |||
label: multiple adjacent dts
input: "a\n;b\n;c\nd\n;e\nf" | |||
output: [Text(text="a\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="c\nd\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="e\nf")] | |||
--- | |||
name: dt_depths | |||
label: multiple adjacent dts, with differing depths
input: ";a\n;;b\n;;;c\n;;;;;;;;d\n;;e\nf\n;;;g" | |||
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="g")] | |||
--- | |||
name: dt_space_before | |||
label: dts with space before them | |||
input: "foo ;bar\n ;baz\n;buzz" | |||
output: [Text(text="foo ;bar\n ;baz\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="buzz")] | |||
--- | |||
name: dt_interruption | |||
label: high-depth dt with something blocking it | |||
input: ";;f;oobar" | |||
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="f;oobar")] | |||
--- | |||
name: complex_dd | |||
label: dd with a lot in it | |||
input: ": this is a test of an [[description item|dd]] with {{plenty|of|stuff}}" | |||
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text=" this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of an "), WikilinkOpen(), Text(text="description item"), WikilinkSeparator(), Text(text="dd"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose()] | |||
--- | |||
name: dd_multiline_template | |||
label: dd with a template that spans multiple lines
input: ": this has a template with a {{line|\nbreak}}\nthis is not part of the list" | |||
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text=" this has a template with a "), TemplateOpen(), Text(text="line"), TemplateParamSeparator(), Text(text="\nbreak"), TemplateClose(), Text(text="\nthis is not part of the list")] | |||
--- | |||
name: dd_adjacent | |||
label: multiple adjacent dds
input: "a\n:b\n:c\nd\n:e\nf" | |||
output: [Text(text="a\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="c\nd\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="e\nf")] | |||
--- | |||
name: dd_depths | |||
label: multiple adjacent dds, with differing depths
input: ":a\n::b\n:::c\n::::::::d\n::e\nf\n:::g" | |||
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="g")] | |||
--- | |||
name: dd_space_before | |||
label: dds with space before them | |||
input: "foo :bar\n :baz\n:buzz" | |||
output: [Text(text="foo :bar\n :baz\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="buzz")] | |||
--- | |||
name: dd_interruption | |||
label: high-depth dd with something blocking it | |||
input: "::f:oobar" | |||
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="f:oobar")] | |||
--- | |||
name: dt_dd_mix | |||
label: a mix of adjacent dts and dds | |||
input: ";a\n;:b\n;::c\n;::;:;:;d\n;:e\nf\n::;g" | |||
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="g")] | |||
--- | |||
name: dt_dd_mix2 | |||
label: the correct usage of a dt/dd unit, as in a dl | |||
input: ";foo:bar:baz" | |||
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="foo"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="bar:baz")] | |||
--- | |||
name: dt_dd_mix3 | |||
label: another example of correct (but strange) dt/dd usage | |||
input: ":;;::foo:bar:baz" | |||
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="foo"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="bar:baz")] | |||
--- | |||
name: ul_ol_dt_dd_mix | |||
label: an assortment of uls, ols, dds, and dts | |||
input: ";:#*foo\n:#*;foo\n#*;:foo\n*;:#foo" | |||
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="foo\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="foo\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="foo\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="foo")] | |||
--- | |||
name: hr_text_before | |||
label: text before an otherwise-valid hr | |||
input: "foo----" | |||
output: [Text(text="foo----")] | |||
--- | |||
name: hr_text_after | |||
label: text after a valid hr | |||
input: "----bar" | |||
output: [TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose(), Text(text="bar")] | |||
--- | |||
name: hr_text_before_after | |||
label: text at both ends of an otherwise-valid hr | |||
input: "foo----bar" | |||
output: [Text(text="foo----bar")] | |||
--- | |||
name: hr_newlines | |||
label: newlines surrounding a valid hr | |||
input: "foo\n----\nbar" | |||
output: [Text(text="foo\n"), TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose(), Text(text="\nbar")] | |||
--- | |||
name: hr_adjacent | |||
label: two adjacent hrs | |||
input: "----\n----" | |||
output: [TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose(), Text(text="\n"), TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose()] | |||
--- | |||
name: hr_adjacent_space | |||
label: two adjacent hrs, with a space before the second one, making it invalid | |||
input: "----\n ----" | |||
output: [TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose(), Text(text="\n ----")] | |||
--- | |||
name: hr_short | |||
label: an invalid three-hyphen-long hr | |||
input: "---" | |||
output: [Text(text="---")] | |||
--- | |||
name: hr_long | |||
label: a very long, valid hr | |||
input: "------------------------------------------" | |||
output: [TagOpenOpen(wiki_markup="------------------------------------------"), Text(text="hr"), TagCloseSelfclose()] | |||
--- | |||
name: hr_interruption_short | |||
label: an hr that is interrupted, making it invalid
input: "---x-" | |||
output: [Text(text="---x-")] | |||
--- | |||
name: hr_interruption_long | |||
label: an hr that is interrupted, but the first part remains valid because it is long enough
input: "----x--" | |||
output: [TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose(), Text(text="x--")] | |||
--- | |||
name: nowiki_cancel | |||
label: a nowiki tag before a list prevents the list from being parsed
input: "<nowiki />* Unordered list" | |||
output: [TagOpenOpen(), Text(text="nowiki"), TagCloseSelfclose(padding=" "), Text(text="* Unordered list")] |
@@ -23,3 +23,10 @@ name: unicode2 | |||
label: additional unicode check for non-BMP codepoints | |||
input: "𐌲𐌿𐍄𐌰𐍂𐌰𐌶𐌳𐌰" | |||
output: [Text(text="𐌲𐌿𐍄𐌰𐍂𐌰𐌶𐌳𐌰")] | |||
--- | |||
name: large | |||
label: a lot of text, requiring multiple textbuffer blocks in the C tokenizer | |||
input: "ZWfsZYcZyhGbkDYJiguJuuhsNyHGFkFhnjkbLJyXIygTHqcXdhsDkEOTSIKYlBiohLIkiXxvyebUyCGvvBcYqFdtcftGmaAanKXEIyYSEKlTfEEbdGhdePVwVImOyKiHSzAEuGyEVRIKPZaNjQsYqpqARIQfvAklFtQyTJVGlLwjJIxYkiqmHBmdOvTyNqJRbMvouoqXRyOhYDwowtkcZGSOcyzVxibQdnzhDYbrgbatUrlOMRvFSzmLWHRihtXnddwYadPgFWUOxAzAgddJVDXHerawdkrRuWaEXfuwQSkQUmLEJUmrgXDVlXCpciaisfuOUjBldElygamkkXbewzLucKRnAEBimIIotXeslRRhnqQjrypnLQvvdCsKFWPVTZaHvzJMFEahDHWcCbyXgxFvknWjhVfiLSDuFhGoFxqSvhjnnRZLmCMhmWeOgSoanDEInKTWHnbpKyUlabLppITDFFxyWKAnUYJQIcmYnrvMmzmtYvsbCYbebgAhMFVVFAKUSvlkLFYluDpbpBaNFWyfXTaOdSBrfiHDTWGBTUCXMqVvRCIMrEjWpQaGsABkioGnveQWqBTDdRQlxQiUipwfyqAocMddXqdvTHhEwjEzMkOSWVPjJvDtClhYwpvRztPmRKCSpGIpXQqrYtTLmShFdpKtOxGtGOZYIdyUGPjdmyvhJTQMtgYJWUUZnecRjBfQXsyWQWikyONySLzLEqRFqcJYdRNFcGwWZtfZasfFWcvdsHRXoqKlKYihRAOJdrPBDdxksXFwKceQVncmFXfUfBsNgjKzoObVExSnRnjegeEhqxXzPmFcuiasViAFeaXrAxXhSfSyCILkKYpjxNeKynUmdcGAbwRwRnlAFbOSCafmzXddiNpLCFTHBELvArdXFpKUGpSHRekhrMedMRNkQzmSyFKjVwiWwCvbNWjgxJRzYeRxHiCCRMXktmKBxbxGZvOpvZIJOwvGIxcBLzsMFlDqAMLtScdsJtrbIUAvKfcdChXGnBzIxGxXMgxJhayrziaCswdpjJJJhkaYnGhHXqZwOzHFdhhUIEtfjERdLaSPRTDDMHpQtonNaIgXUYhjdbnnKppfMBxgNSOOXJAPtFjfAKnrRDrumZBpNhxMstqjTGBViRkDqbTdXYUirsedifGYzZpQkvdNhtFTOPgsYXYCwZHLcSLSfwfpQKtWfZuRUUryHJsbVsAOQcIJdSKKlOvCeEjUQNRPHKXuBJUjPuaAJJxcDMqyaufqfVwUmHLdjeYZzSiiGLHOTCInpVAalbXXTMLugLiwFiyPSuSFiyJUKVrWjbZAHaJtZnQmnvorRrxdPKThqXzNgTjszQiCoMczRnwGYJMERUWGXFyrSbAqsHmLwLlnJOJoXNsjVehQjVOpQOQJAZWwFZBlgyVIplzLTlFwumPgBLYrUIAJAcmvHPGfHfWQguCjfTYzxYfbohaLFAPwxFRrNuCdCzLlEbuhyYjCmuDBTJDMCdLpNRVqEALjnPSaBPsKWRCKNGwEMFpiEWbYZRwaMopjoUuBUvMpvyLfsPKDrfQLiFOQIWPtLIMoijUEUYfhykHrSKbTtrvjwIzHdWZDVwLIpNkloCqpzIsErxxKAFuFEjikWNYChqYqVslXMtoSWzNhbMuxYbzLfJIcPGoUeGPkGyPQNhDyrjgdKekzftFrRPTuyLYqCArkDcWHTrjPQHfoThBNnTQyMwLEWxEnBXLtzJmFVLGEPrdbEwlXpgYfnVnWoNXgPQKKyiXifpvrmJATzQOzYwFhliiYxlbnsEPKbHYUfJLrwYPfSUwTIHiEvBFMrEtVmqJobfcwsiiEudTIiAnrtuywgKLOiMYbEIOAOJdOXqroPjWnQQcTNxFvkIEIsuHLyhSqSphuSmlvknzydQEnebOreeZwOouXYKlObAkaWHhOdTFLoMCHOWrVKeXjcniaxtgCziKEqWOZUWHJQpcDJzYnnduDZrmxgjZroBRwoPBUTJMYipsgJwbTSlvMyXXdAmiEWGMiQxhGvHGPLOKeTxNaLnFVbWpiYIVyqN" | |||
output: [Text(text="ZWfsZYcZyhGbkDYJiguJuuhsNyHGFkFhnjkbLJyXIygTHqcXdhsDkEOTSIKYlBiohLIkiXxvyebUyCGvvBcYqFdtcftGmaAanKXEIyYSEKlTfEEbdGhdePVwVImOyKiHSzAEuGyEVRIKPZaNjQsYqpqARIQfvAklFtQyTJVGlLwjJIxYkiqmHBmdOvTyNqJRbMvouoqXRyOhYDwowtkcZGSOcyzVxibQdnzhDYbrgbatUrlOMRvFSzmLWHRihtXnddwYadPgFWUOxAzAgddJVDXHerawdkrRuWaEXfuwQSkQUmLEJUmrgXDVlXCpciaisfuOUjBldElygamkkXbewzLucKRnAEBimIIotXeslRRhnqQjrypnLQvvdCsKFWPVTZaHvzJMFEahDHWcCbyXgxFvknWjhVfiLSDuFhGoFxqSvhjnnRZLmCMhmWeOgSoanDEInKTWHnbpKyUlabLppITDFFxyWKAnUYJQIcmYnrvMmzmtYvsbCYbebgAhMFVVFAKUSvlkLFYluDpbpBaNFWyfXTaOdSBrfiHDTWGBTUCXMqVvRCIMrEjWpQaGsABkioGnveQWqBTDdRQlxQiUipwfyqAocMddXqdvTHhEwjEzMkOSWVPjJvDtClhYwpvRztPmRKCSpGIpXQqrYtTLmShFdpKtOxGtGOZYIdyUGPjdmyvhJTQMtgYJWUUZnecRjBfQXsyWQWikyONySLzLEqRFqcJYdRNFcGwWZtfZasfFWcvdsHRXoqKlKYihRAOJdrPBDdxksXFwKceQVncmFXfUfBsNgjKzoObVExSnRnjegeEhqxXzPmFcuiasViAFeaXrAxXhSfSyCILkKYpjxNeKynUmdcGAbwRwRnlAFbOSCafmzXddiNpLCFTHBELvArdXFpKUGpSHRekhrMedMRNkQzmSyFKjVwiWwCvbNWjgxJRzYeRxHiCCRMXktmKBxbxGZvOpvZIJOwvGIxcBLzsMFlDqAMLtScdsJtrbIUAvKfcdChXGnBzIxGxXMgxJhayrziaCswdpjJJJhkaYnGhHXqZwOzHFdhhUIEtfjERdLaSPRTDDMHpQtonNaIgXUYhjdbnnKppfMBxgNSOOXJAPtFjfAKnrRDrumZBpNhxMstqjTGBViRkDqbTdXYUirsedifGYzZpQkvdNhtFTOPgsYXYCwZHLcSLSfwfpQKtWfZuRUUryHJsbVsAOQcIJdSKKlOvCeEjUQNRPHKXuBJUjPuaAJJxcDMqyaufqfVwUmHLdjeYZzSiiGLHOTCInpVAalbXXTMLugLiwFiyPSuSFiyJUKVrWjbZAHaJtZnQmnvorRrxdPKThqXzNgTjszQiCoMczRnwGYJMERUWGXFyrSbAqsHmLwLlnJOJoXNsjVehQjVOpQOQJAZWwFZBlgyVIplzLTlFwumPgBLYrUIAJAcmvHPGfHfWQguCjfTYzxYfbohaLFAPwxFRrNuCdCzLlEbuhyYjCmuDBTJDMCdLpNRVqEALjnPSaBPsKWRCKNGwEMFpiEWbYZRwaMopjoUuBUvMpvyLfsPKDrfQLiFOQIWPtLIMoijUEUYfhykHrSKbTtrvjwIzHdWZDVwLIpNkloCqpzIsErxxKAFuFEjikWNYChqYqVslXMtoSWzNhbMuxYbzLfJIcPGoUeGPkGyPQNhDyrjgdKekzftFrRPTuyLYqCArkDcWHTrjPQHfoThBNnTQyMwLEWxEnBXLtzJmFVLGEPrdbEwlXpgYfnVnWoNXgPQKKyiXifpvrmJATzQOzYwFhliiYxlbnsEPKbHYUfJLrwYPfSUwTIHiEvBFMrEtVmqJobfcwsiiEudTIiAnrtuywgKLOiMYbEIOAOJdOXqroPjWnQQcTNxFvkIEIsuHLyhSqSphuSmlvknzydQEnebOreeZwOouXYKlObAkaWHhOdTFLoMCHOWrVKeXjcniaxtgCziKEqWOZUWHJQpcDJzYnnduDZrmxgjZroBRwoPBUTJMYipsgJwbTSlvMyXXdAmiEWGMiQxhGvHGPLOKeTxNaLnFVbWpiYIVyqN")] |
@@ -40,17 +40,17 @@ output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="bar|b | |||
--- | |||
name: nested | |||
label: a wikilink nested within the value of another | |||
input: "[[foo|[[bar]]]]" | |||
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), WikilinkOpen(), Text(text="bar"), WikilinkClose(), WikilinkClose()] | |||
name: newline_text | |||
label: a newline in the middle of the text | |||
input: "[[foo|foo\nbar]]" | |||
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="foo\nbar"), WikilinkClose()] | |||
--- | |||
name: nested_with_text | |||
label: a wikilink nested within the value of another, separated by other data | |||
input: "[[foo|a[[b]]c]]" | |||
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="a"), WikilinkOpen(), Text(text="b"), WikilinkClose(), Text(text="c"), WikilinkClose()] | |||
name: bracket_text | |||
label: a left bracket in the middle of the text | |||
input: "[[foo|bar[baz]]" | |||
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="bar[baz"), WikilinkClose()] | |||
--- | |||
@@ -96,13 +96,34 @@ output: [Text(text="[[foo"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), | |||
--- | |||
name: invalid_nested_text | |||
name: invalid_nested_padding | |||
label: invalid wikilink: trying to nest in the wrong context, with a text param | |||
input: "[[foo[[bar]]|baz]]" | |||
output: [Text(text="[[foo"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), Text(text="|baz]]")] | |||
--- | |||
name: invalid_nested_text | |||
label: invalid wikilink: a wikilink nested within the value of another | |||
input: "[[foo|[[bar]]" | |||
output: [Text(text="[[foo|"), WikilinkOpen(), Text(text="bar"), WikilinkClose()] | |||
--- | |||
name: invalid_nested_text_2 | |||
label: invalid wikilink: a wikilink nested within the value of another, two pairs of closing brackets | |||
input: "[[foo|[[bar]]]]" | |||
output: [Text(text="[[foo|"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), Text(text="]]")] | |||
--- | |||
name: invalid_nested_text_padding | |||
label: invalid wikilink: a wikilink nested within the value of another, separated by other data | |||
input: "[[foo|a[[b]]c]]" | |||
output: [Text(text="[[foo|a"), WikilinkOpen(), Text(text="b"), WikilinkClose(), Text(text="c]]")] | |||
--- | |||
name: incomplete_open_only | |||
label: incomplete wikilinks: just an open | |||
input: "[[" | |||