Browse Source

Merge branch 'develop'

tags/v0.3.1
Ben Kurtovic 11 years ago
parent
commit
d53ca7837a
42 changed files with 6184 additions and 1296 deletions
  1. +26
    -2
      CHANGELOG
  2. +20
    -16
      README.rst
  3. +9
    -0
      docs/api/mwparserfromhell.nodes.rst
  4. +6
    -0
      docs/api/mwparserfromhell.rst
  5. +33
    -6
      docs/changelog.rst
  6. +7
    -4
      docs/index.rst
  7. +16
    -14
      docs/usage.rst
  8. +3
    -2
      mwparserfromhell/__init__.py
  9. +0
    -2
      mwparserfromhell/compat.py
  10. +91
    -0
      mwparserfromhell/definitions.py
  11. +1
    -0
      mwparserfromhell/nodes/__init__.py
  12. +97
    -0
      mwparserfromhell/nodes/external_link.py
  13. +53
    -10
      mwparserfromhell/nodes/extras/attribute.py
  14. +166
    -135
      mwparserfromhell/nodes/tag.py
  15. +11
    -8
      mwparserfromhell/nodes/template.py
  16. +4
    -5
      mwparserfromhell/parser/__init__.py
  17. +48
    -21
      mwparserfromhell/parser/builder.py
  18. +87
    -32
      mwparserfromhell/parser/contexts.py
  19. +1968
    -639
      mwparserfromhell/parser/tokenizer.c
  20. +151
    -87
      mwparserfromhell/parser/tokenizer.h
  21. +702
    -113
      mwparserfromhell/parser/tokenizer.py
  22. +7
    -3
      mwparserfromhell/parser/tokens.py
  23. +13
    -5
      mwparserfromhell/utils.py
  24. +133
    -70
      mwparserfromhell/wikicode.py
  25. +3
    -6
      setup.py
  26. +21
    -1
      tests/_test_tree_equality.py
  27. +89
    -0
      tests/test_attribute.py
  28. +175
    -2
      tests/test_builder.py
  29. +14
    -14
      tests/test_docs.py
  30. +130
    -0
      tests/test_external_link.py
  31. +3
    -3
      tests/test_parser.py
  32. +315
    -0
      tests/test_tag.py
  33. +11
    -11
      tests/test_template.py
  34. +3
    -3
      tests/test_tokens.py
  35. +115
    -73
      tests/test_wikicode.py
  36. +473
    -0
      tests/tokenizer/external_links.mwtest
  37. +14
    -0
      tests/tokenizer/html_entities.mwtest
  38. +28
    -0
      tests/tokenizer/integration.mwtest
  39. +578
    -0
      tests/tokenizer/tags.mwtest
  40. +523
    -0
      tests/tokenizer/tags_wikimarkup.mwtest
  41. +7
    -0
      tests/tokenizer/text.mwtest
  42. +30
    -9
      tests/tokenizer/wikilinks.mwtest

+ 26
- 2
CHANGELOG View File

@@ -1,4 +1,24 @@
v0.1.1 (19da4d2144) to v0.2:
v0.3 (released August 24, 2013):

- Added complete support for HTML Tags, including forms like <ref>foo</ref>,
<ref name="bar"/>, and wiki-markup tags like bold ('''), italics (''), and
lists (*, #, ; and :).
- Added support for ExternalLinks (http://example.com/ and
[http://example.com/ Example]).
- Wikicode's filter methods are now passed 'recursive=True' by default instead
of False. This is a breaking change if you rely on any filter() methods being
non-recursive by default.
- Added a matches() method to Wikicode for page/template name comparisons.
- The 'obj' param of Wikicode.insert_before(), insert_after(), replace(), and
remove() now accepts other Wikicode objects and strings representing parts of
wikitext, instead of just nodes. These methods also make all possible
substitutions instead of just one.
- Renamed Template.has_param() to has() for consistency with Template's other
methods; has_param() is now an alias.
- The C tokenizer extension now works on Python 3 in addition to Python 2.7.
- Various bugfixes, internal changes, and cleanup.

v0.2 (released June 20, 2013):

- The parser now fully supports Python 3 in addition to Python 2.7.
- Added a C tokenizer extension that is significantly faster than its Python
@@ -24,10 +44,14 @@ v0.1.1 (19da4d2144) to v0.2:
- Fixed some broken example code in the README; other copyedits.
- Other bugfixes and code cleanup.

v0.1 (ba94938fe8) to v0.1.1 (19da4d2144):
v0.1.1 (released September 21, 2012):

- Added support for Comments (<!-- foo -->) and Wikilinks ([[foo]]).
- Added corresponding ifilter_links() and filter_links() methods to Wikicode.
- Fixed a bug when parsing incomplete templates.
- Fixed strip_code() to affect the contents of headings.
- Various copyedits in documentation and comments.

v0.1 (released August 23, 2012):

- Initial release.

+ 20
- 16
README.rst View File

@@ -9,7 +9,8 @@ mwparserfromhell
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ with help from `Σ`_.
Developed by Earwig_ with help from `Σ`_. Full documentation is available on
ReadTheDocs_.

Installation
------------
@@ -18,7 +19,7 @@ The easiest way to install the parser is through the `Python Package Index`_,
so you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). Alternatively, get the latest development version::

git clone git://github.com/earwig/mwparserfromhell.git
git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
python setup.py install

@@ -59,13 +60,20 @@ For example::
>>> print template.get("eggs").value
spam

Since every node you reach is also a ``Wikicode`` object, it's trivial to get
nested templates::
Since nodes can contain other nodes, getting nested templates is trivial::

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass ``recursive=False`` to ``filter_templates()`` and explore
templates manually. This is possible because nodes can contain additional
``Wikicode`` objects::

>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print code.filter_templates()
>>> print code.filter_templates(recursive=False)
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates()[0]
>>> foo = code.filter_templates(recursive=False)[0]
>>> print foo.get(1).value
this {{includes a|template}}
>>> print foo.get(1).value.filter_templates()[0]
@@ -73,21 +81,16 @@ nested templates::
>>> print foo.get(1).value.filter_templates()[0].get(1).value
template

Additionally, you can include nested templates in ``filter_templates()`` by
passing ``recursive=True``::

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates(recursive=True)
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

Templates can be easily modified to add, remove, or alter params. ``Wikicode``
can also be treated like a list with ``append()``, ``insert()``, ``remove()``,
``replace()``, and more::
objects can be treated like lists, with ``append()``, ``insert()``,
``remove()``, ``replace()``, and more. They also have a ``matches()`` method
for comparing page or template names, which takes care of capitalization and
whitespace::

>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
>>> code = mwparserfromhell.parse(text)
>>> for template in code.filter_templates():
... if template.name == "cleanup" and not template.has_param("date"):
... if template.name.matches("Cleanup") and not template.has("date"):
... template.add("date", "July 2012")
...
>>> print code
@@ -142,6 +145,7 @@ following code (via the API_)::
return mwparserfromhell.parse(text)

.. _MediaWiki: http://mediawiki.org
.. _ReadTheDocs: http://mwparserfromhell.readthedocs.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3
.. _Python Package Index: http://pypi.python.org


+ 9
- 0
docs/api/mwparserfromhell.nodes.rst View File

@@ -25,6 +25,14 @@ nodes Package
:undoc-members:
:show-inheritance:

:mod:`external_link` Module
---------------------------

.. automodule:: mwparserfromhell.nodes.external_link
:members:
:undoc-members:
:show-inheritance:

:mod:`heading` Module
---------------------

@@ -46,6 +54,7 @@ nodes Package

.. automodule:: mwparserfromhell.nodes.tag
:members:
:undoc-members:
:show-inheritance:

:mod:`template` Module


+ 6
- 0
docs/api/mwparserfromhell.rst View File

@@ -30,6 +30,12 @@ mwparserfromhell Package
:members:
:undoc-members:

:mod:`definitions` Module
-------------------------

.. automodule:: mwparserfromhell.definitions
:members:

:mod:`utils` Module
-------------------



+ 33
- 6
docs/changelog.rst View File

@@ -1,10 +1,38 @@
Changelog
=========

v0.3
----

`Released August 24, 2013 <https://github.com/earwig/mwparserfromhell/tree/v0.3>`_
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.2...v0.3>`__):

- Added complete support for HTML :py:class:`Tags <.Tag>`, including forms like
``<ref>foo</ref>``, ``<ref name="bar"/>``, and wiki-markup tags like bold
(``'''``), italics (``''``), and lists (``*``, ``#``, ``;`` and ``:``).
- Added support for :py:class:`.ExternalLink`\ s (``http://example.com/`` and
``[http://example.com/ Example]``).
- :py:class:`Wikicode's <.Wikicode>` :py:meth:`.filter` methods are now passed
*recursive=True* by default instead of *False*. **This is a breaking change
if you rely on any filter() methods being non-recursive by default.**
- Added a :py:meth:`.matches` method to :py:class:`~.Wikicode` for
page/template name comparisons.
- The *obj* param of :py:meth:`Wikicode.insert_before() <.insert_before>`,
:py:meth:`~.insert_after`, :py:meth:`~.Wikicode.replace`, and
:py:meth:`~.Wikicode.remove` now accepts :py:class:`~.Wikicode` objects and
strings representing parts of wikitext, instead of just nodes. These methods
also make all possible substitutions instead of just one.
- Renamed :py:meth:`Template.has_param() <.has_param>` to
:py:meth:`~.Template.has` for consistency with :py:class:`~.Template`\ 's
other methods; :py:meth:`~.has_param` is now an alias.
- The C tokenizer extension now works on Python 3 in addition to Python 2.7.
- Various bugfixes, internal changes, and cleanup.

v0.2
----

19da4d2144_ to master_ (released June 20, 2013)
`Released June 20, 2013 <https://github.com/earwig/mwparserfromhell/tree/v0.2>`_
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.1.1...v0.2>`__):

- The parser now fully supports Python 3 in addition to Python 2.7.
- Added a C tokenizer extension that is significantly faster than its Python
@@ -38,7 +66,8 @@ v0.2
v0.1.1
------

ba94938fe8_ to 19da4d2144_ (released September 21, 2012)
`Released September 21, 2012 <https://github.com/earwig/mwparserfromhell/tree/v0.1.1>`_
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.1...v0.1.1>`__):

- Added support for :py:class:`Comments <.Comment>` (``<!-- foo -->``) and
:py:class:`Wikilinks <.Wikilink>` (``[[foo]]``).
@@ -51,8 +80,6 @@ ba94938fe8_ to 19da4d2144_ (released September 21, 2012)
v0.1
----

ba94938fe8_ (released August 23, 2012)
`Released August 23, 2012 <https://github.com/earwig/mwparserfromhell/tree/v0.1>`_:

.. _master: https://github.com/earwig/mwparserfromhell/tree/v0.2
.. _19da4d2144: https://github.com/earwig/mwparserfromhell/tree/v0.1.1
.. _ba94938fe8: https://github.com/earwig/mwparserfromhell/tree/v0.1
- Initial release.

+ 7
- 4
docs/index.rst View File

@@ -1,15 +1,18 @@
MWParserFromHell v0.2 Documentation
===================================
MWParserFromHell v\ |version| Documentation
===========================================

:py:mod:`mwparserfromhell` (the *MediaWiki Parser from Hell*) is a Python
package that provides an easy-to-use and outrageously powerful parser for
MediaWiki_ wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ with help from `Σ`_.
Developed by Earwig_ with contributions from `Σ`_, Legoktm_, and others.
Development occurs on GitHub_.

.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3
.. _Legoktm: http://en.wikipedia.org/wiki/User:Legoktm
.. _GitHub: https://github.com/earwig/mwparserfromhell

Installation
------------
@@ -18,7 +21,7 @@ The easiest way to install the parser is through the `Python Package Index`_,
so you can install the latest release with ``pip install mwparserfromhell``
(`get pip`_). Alternatively, get the latest development version::

git clone git://github.com/earwig/mwparserfromhell.git
git clone https://github.com/earwig/mwparserfromhell.git
cd mwparserfromhell
python setup.py install



+ 16
- 14
docs/usage.rst View File

@@ -27,13 +27,20 @@ some extra methods. For example::
>>> print template.get("eggs").value
spam

Since every node you reach is also a :py:class:`~.Wikicode` object, it's
trivial to get nested templates::
Since nodes can contain other nodes, getting nested templates is trivial::

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates()
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

You can also pass *recursive=False* to :py:meth:`~.filter_templates` and
explore templates manually. This is possible because nodes can contain
additional :py:class:`~.Wikicode` objects::

>>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
>>> print code.filter_templates()
>>> print code.filter_templates(recursive=False)
['{{foo|this {{includes a|template}}}}']
>>> foo = code.filter_templates()[0]
>>> foo = code.filter_templates(recursive=False)[0]
>>> print foo.get(1).value
this {{includes a|template}}
>>> print foo.get(1).value.filter_templates()[0]
@@ -41,22 +48,17 @@ trivial to get nested templates::
>>> print foo.get(1).value.filter_templates()[0].get(1).value
template

Additionally, you can include nested templates in :py:meth:`~.filter_templates`
by passing *recursive=True*::

>>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
>>> mwparserfromhell.parse(text).filter_templates(recursive=True)
['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']

Templates can be easily modified to add, remove, or alter params.
:py:class:`~.Wikicode` can also be treated like a list with
:py:class:`~.Wikicode` objects can be treated like lists, with
:py:meth:`~.Wikicode.append`, :py:meth:`~.Wikicode.insert`,
:py:meth:`~.Wikicode.remove`, :py:meth:`~.Wikicode.replace`, and more::
:py:meth:`~.Wikicode.remove`, :py:meth:`~.Wikicode.replace`, and more. They
also have a :py:meth:`~.Wikicode.matches` method for comparing page or template
names, which takes care of capitalization and whitespace::

>>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
>>> code = mwparserfromhell.parse(text)
>>> for template in code.filter_templates():
... if template.name == "cleanup" and not template.has_param("date"):
... if template.name.matches("Cleanup") and not template.has("date"):
... template.add("date", "July 2012")
...
>>> print code


+ 3
- 2
mwparserfromhell/__init__.py View File

@@ -31,9 +31,10 @@ from __future__ import unicode_literals
__author__ = "Ben Kurtovic"
__copyright__ = "Copyright (C) 2012, 2013 Ben Kurtovic"
__license__ = "MIT License"
__version__ = "0.2"
__version__ = "0.3"
__email__ = "ben.kurtovic@verizon.net"

from . import compat, nodes, parser, smart_list, string_mixin, utils, wikicode
from . import (compat, definitions, nodes, parser, smart_list, string_mixin,
utils, wikicode)

parse = utils.parse_anything

+ 0
- 2
mwparserfromhell/compat.py View File

@@ -15,14 +15,12 @@ py3k = sys.version_info[0] == 3
if py3k:
bytes = bytes
str = str
basestring = str
maxsize = sys.maxsize
import html.entities as htmlentities

else:
bytes = str
str = unicode
basestring = basestring
maxsize = sys.maxint
import htmlentitydefs as htmlentities



+ 91
- 0
mwparserfromhell/definitions.py View File

@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

"""Contains data about certain markup, like HTML tags and external links."""

from __future__ import unicode_literals

__all__ = ["get_html_tag", "is_parsable", "is_visible", "is_single",
"is_single_only", "is_scheme"]

URI_SCHEMES = {
# [mediawiki/core.git]/includes/DefaultSettings.php @ 374a0ad943
"http": True, "https": True, "ftp": True, "ftps": True, "ssh": True,
"sftp": True, "irc": True, "ircs": True, "xmpp": False, "sip": False,
"sips": False, "gopher": True, "telnet": True, "nntp": True,
"worldwind": True, "mailto": False, "tel": False, "sms": False,
"news": False, "svn": True, "git": True, "mms": True, "bitcoin": False,
"magnet": False, "urn": False, "geo": False
}

PARSER_BLACKLIST = [
# enwiki extensions @ 2013-06-28
"categorytree", "gallery", "hiero", "imagemap", "inputbox", "math",
"nowiki", "pre", "score", "section", "source", "syntaxhighlight",
"templatedata", "timeline"
]

INVISIBLE_TAGS = [
# enwiki extensions @ 2013-06-28
"categorytree", "gallery", "imagemap", "inputbox", "math", "score",
"section", "templatedata", "timeline"
]

# [mediawiki/core.git]/includes/Sanitizer.php @ 87a0aef762
SINGLE_ONLY = ["br", "hr", "meta", "link", "img"]
SINGLE = SINGLE_ONLY + ["li", "dt", "dd"]

MARKUP_TO_HTML = {
"#": "li",
"*": "li",
";": "dt",
":": "dd"
}

def get_html_tag(markup):
"""Return the HTML tag associated with the given wiki-markup."""
return MARKUP_TO_HTML[markup]

def is_parsable(tag):
"""Return if the given *tag*'s contents should be passed to the parser."""
return tag.lower() not in PARSER_BLACKLIST

def is_visible(tag):
"""Return whether or not the given *tag* contains visible text."""
return tag.lower() not in INVISIBLE_TAGS

def is_single(tag):
"""Return whether or not the given *tag* can exist without a close tag."""
return tag.lower() in SINGLE

def is_single_only(tag):
"""Return whether or not the given *tag* must exist without a close tag."""
return tag.lower() in SINGLE_ONLY

def is_scheme(scheme, slashes=True, reverse=False):
"""Return whether *scheme* is valid for external links."""
if reverse: # Convenience for C
scheme = scheme[::-1]
scheme = scheme.lower()
if slashes:
return scheme in URI_SCHEMES
return scheme in URI_SCHEMES and not URI_SCHEMES[scheme]

+ 1
- 0
mwparserfromhell/nodes/__init__.py View File

@@ -69,6 +69,7 @@ from . import extras
from .text import Text
from .argument import Argument
from .comment import Comment
from .external_link import ExternalLink
from .heading import Heading
from .html_entity import HTMLEntity
from .tag import Tag


+ 97
- 0
mwparserfromhell/nodes/external_link.py View File

@@ -0,0 +1,97 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals

from . import Node
from ..compat import str
from ..utils import parse_anything

__all__ = ["ExternalLink"]

class ExternalLink(Node):
"""Represents an external link, like ``[http://example.com/ Example]``."""

def __init__(self, url, title=None, brackets=True):
super(ExternalLink, self).__init__()
self._url = url
self._title = title
self._brackets = brackets

def __unicode__(self):
if self.brackets:
if self.title is not None:
return "[" + str(self.url) + " " + str(self.title) + "]"
return "[" + str(self.url) + "]"
return str(self.url)

def __iternodes__(self, getter):
yield None, self
for child in getter(self.url):
yield self.url, child
if self.title is not None:
for child in getter(self.title):
yield self.title, child

def __strip__(self, normalize, collapse):
if self.brackets:
if self.title:
return self.title.strip_code(normalize, collapse)
return None
return self.url.strip_code(normalize, collapse)

def __showtree__(self, write, get, mark):
if self.brackets:
write("[")
get(self.url)
if self.title is not None:
get(self.title)
if self.brackets:
write("]")

@property
def url(self):
"""The URL of the link target, as a :py:class:`~.Wikicode` object."""
return self._url

@property
def title(self):
"""The link title (if given), as a :py:class:`~.Wikicode` object."""
return self._title

@property
def brackets(self):
"""Whether to enclose the URL in brackets or display it straight."""
return self._brackets

@url.setter
def url(self, value):
from ..parser import contexts
self._url = parse_anything(value, contexts.EXT_LINK_URI)

@title.setter
def title(self, value):
self._title = None if value is None else parse_anything(value)

@brackets.setter
def brackets(self, value):
self._brackets = bool(value)

+ 53
- 10
mwparserfromhell/nodes/extras/attribute.py View File

@@ -36,18 +36,34 @@ class Attribute(StringMixIn):
whose value is ``"foo"``.
"""

def __init__(self, name, value=None, quoted=True):
def __init__(self, name, value=None, quoted=True, pad_first=" ",
pad_before_eq="", pad_after_eq=""):
super(Attribute, self).__init__()
self._name = name
self._value = value
self._quoted = quoted
self._pad_first = pad_first
self._pad_before_eq = pad_before_eq
self._pad_after_eq = pad_after_eq

def __unicode__(self):
if self.value:
result = self.pad_first + str(self.name) + self.pad_before_eq
if self.value is not None:
result += "=" + self.pad_after_eq
if self.quoted:
return str(self.name) + '="' + str(self.value) + '"'
return str(self.name) + "=" + str(self.value)
return str(self.name)
return result + '"' + str(self.value) + '"'
return result + str(self.value)
return result

def _set_padding(self, attr, value):
"""Setter for the value of a padding attribute."""
if not value:
setattr(self, attr, "")
else:
value = str(value)
if not value.isspace():
raise ValueError("padding must be entirely whitespace")
setattr(self, attr, value)

@property
def name(self):
@@ -64,14 +80,41 @@ class Attribute(StringMixIn):
"""Whether the attribute's value is quoted with double quotes."""
return self._quoted

@property
def pad_first(self):
"""Spacing to insert right before the attribute."""
return self._pad_first

@property
def pad_before_eq(self):
"""Spacing to insert right before the equal sign."""
return self._pad_before_eq

@property
def pad_after_eq(self):
"""Spacing to insert right after the equal sign."""
return self._pad_after_eq

@name.setter
def name(self, newval):
self._name = parse_anything(newval)
def name(self, value):
self._name = parse_anything(value)

@value.setter
def value(self, newval):
self._value = parse_anything(newval)
self._value = None if newval is None else parse_anything(newval)

@quoted.setter
def quoted(self, newval):
self._quoted = bool(newval)
def quoted(self, value):
self._quoted = bool(value)

@pad_first.setter
def pad_first(self, value):
self._set_padding("_pad_first", value)

@pad_before_eq.setter
def pad_before_eq(self, value):
self._set_padding("_pad_before_eq", value)

@pad_after_eq.setter
def pad_after_eq(self, value):
self._set_padding("_pad_after_eq", value)

+ 166
- 135
mwparserfromhell/nodes/tag.py View File

@@ -22,8 +22,10 @@

from __future__ import unicode_literals

from . import Node, Text
from . import Node
from .extras import Attribute
from ..compat import str
from ..definitions import is_visible
from ..utils import parse_anything

__all__ = ["Tag"]
@@ -31,146 +33,85 @@ __all__ = ["Tag"]
class Tag(Node):
"""Represents an HTML-style tag in wikicode, like ``<ref>``."""

TAG_UNKNOWN = 0

# Basic HTML:
TAG_ITALIC = 1
TAG_BOLD = 2
TAG_UNDERLINE = 3
TAG_STRIKETHROUGH = 4
TAG_UNORDERED_LIST = 5
TAG_ORDERED_LIST = 6
TAG_DEF_TERM = 7
TAG_DEF_ITEM = 8
TAG_BLOCKQUOTE = 9
TAG_RULE = 10
TAG_BREAK = 11
TAG_ABBR = 12
TAG_PRE = 13
TAG_MONOSPACE = 14
TAG_CODE = 15
TAG_SPAN = 16
TAG_DIV = 17
TAG_FONT = 18
TAG_SMALL = 19
TAG_BIG = 20
TAG_CENTER = 21

# MediaWiki parser hooks:
TAG_REF = 101
TAG_GALLERY = 102
TAG_MATH = 103
TAG_NOWIKI = 104
TAG_NOINCLUDE = 105
TAG_INCLUDEONLY = 106
TAG_ONLYINCLUDE = 107

# Additional parser hooks:
TAG_SYNTAXHIGHLIGHT = 201
TAG_POEM = 202

# Lists of tags:
TAGS_INVISIBLE = set((TAG_REF, TAG_GALLERY, TAG_MATH, TAG_NOINCLUDE))
TAGS_VISIBLE = set(range(300)) - TAGS_INVISIBLE

def __init__(self, type_, tag, contents=None, attrs=None, showtag=True,
self_closing=False, open_padding=0, close_padding=0):
def __init__(self, tag, contents=None, attrs=None, wiki_markup=None,
self_closing=False, invalid=False, implicit=False, padding="",
closing_tag=None):
super(Tag, self).__init__()
self._type = type_
self._tag = tag
self._contents = contents
if attrs:
self._attrs = attrs
if contents is None and not self_closing:
self._contents = parse_anything("")
else:
self._attrs = []
self._showtag = showtag
self._contents = contents
self._attrs = attrs if attrs else []
self._wiki_markup = wiki_markup
self._self_closing = self_closing
self._open_padding = open_padding
self._close_padding = close_padding
self._invalid = invalid
self._implicit = implicit
self._padding = padding
if closing_tag:
self._closing_tag = closing_tag
else:
self._closing_tag = tag

def __unicode__(self):
if not self.showtag:
open_, close = self._translate()
if self.wiki_markup:
if self.self_closing:
return open_
return self.wiki_markup
else:
return open_ + str(self.contents) + close
return self.wiki_markup + str(self.contents) + self.wiki_markup

result = "<" + str(self.tag)
if self.attrs:
result += " " + " ".join([str(attr) for attr in self.attrs])
result = ("</" if self.invalid else "<") + str(self.tag)
if self.attributes:
result += "".join([str(attr) for attr in self.attributes])
if self.self_closing:
result += " " * self.open_padding + "/>"
result += self.padding + (">" if self.implicit else "/>")
else:
result += " " * self.open_padding + ">" + str(self.contents)
result += "</" + str(self.tag) + " " * self.close_padding + ">"
result += self.padding + ">" + str(self.contents)
result += "</" + str(self.closing_tag) + ">"
return result

def __iternodes__(self, getter):
yield None, self
if self.showtag:
if not self.wiki_markup:
for child in getter(self.tag):
yield self.tag, child
for attr in self.attrs:
for attr in self.attributes:
for child in getter(attr.name):
yield attr.name, child
if attr.value:
for child in getter(attr.value):
yield attr.value, child
for child in getter(self.contents):
yield self.contents, child
if self.contents:
for child in getter(self.contents):
yield self.contents, child
if not self.self_closing and not self.wiki_markup and self.closing_tag:
for child in getter(self.closing_tag):
yield self.closing_tag, child

def __strip__(self, normalize, collapse):
if self.type in self.TAGS_VISIBLE:
if self.contents and is_visible(self.tag):
return self.contents.strip_code(normalize, collapse)
return None

def __showtree__(self, write, get, mark):
tagnodes = self.tag.nodes
if (not self.attrs and len(tagnodes) == 1 and isinstance(tagnodes[0], Text)):
write("<" + str(tagnodes[0]) + ">")
write("</" if self.invalid else "<")
get(self.tag)
for attr in self.attributes:
get(attr.name)
if not attr.value:
continue
write(" = ")
mark()
get(attr.value)
if self.self_closing:
write(">" if self.implicit else "/>")
else:
write("<")
get(self.tag)
for attr in self.attrs:
get(attr.name)
if not attr.value:
continue
write(" = ")
mark()
get(attr.value)
write(">")
get(self.contents)
if len(tagnodes) == 1 and isinstance(tagnodes[0], Text):
write("</" + str(tagnodes[0]) + ">")
else:
get(self.contents)
write("</")
get(self.tag)
get(self.closing_tag)
write(">")

def _translate(self):
"""If the HTML-style tag has a wikicode representation, return that.

For example, ``<b>Foo</b>`` can be represented as ``'''Foo'''``. This
returns a tuple of the character starting the sequence and the
character ending it.
"""
translations = {
self.TAG_ITALIC: ("''", "''"),
self.TAG_BOLD: ("'''", "'''"),
self.TAG_UNORDERED_LIST: ("*", ""),
self.TAG_ORDERED_LIST: ("#", ""),
self.TAG_DEF_TERM: (";", ""),
self.TAG_DEF_ITEM: (":", ""),
self.TAG_RULE: ("----", ""),
}
return translations[self.type]

@property
def type(self):
"""The tag type."""
return self._type

@property
def tag(self):
"""The tag itself, as a :py:class:`~.Wikicode` object."""
@@ -182,7 +123,7 @@ class Tag(Node):
return self._contents

@property
def attrs(self):
def attributes(self):
"""The list of attributes affecting the tag.

Each attribute is an instance of :py:class:`~.Attribute`.
@@ -190,52 +131,142 @@ class Tag(Node):
return self._attrs

@property
def showtag(self):
"""Whether to show the tag itself instead of a wikicode version."""
return self._showtag
def wiki_markup(self):
"""The wikified version of a tag to show instead of HTML.

If set to a value, this will be displayed instead of the brackets.
For example, set to ``''`` to replace ``<i>`` or ``----`` to replace
``<hr>``.
"""
return self._wiki_markup

@property
def self_closing(self):
"""Whether the tag is self-closing with no content."""
"""Whether the tag is self-closing with no content (like ``<br/>``)."""
return self._self_closing

@property
def open_padding(self):
"""How much spacing to insert before the first closing >."""
return self._open_padding
def invalid(self):
"""Whether the tag starts with a backslash after the opening bracket.

This makes the tag look like a lone close tag. It is technically
invalid and is only parsable Wikicode when the tag itself is
single-only, like ``<br>`` and ``<img>``. See
:py:func:`.definitions.is_single_only`.
"""
return self._invalid

@property
def close_padding(self):
"""How much spacing to insert before the last closing >."""
return self._close_padding
def implicit(self):
"""Whether the tag is implicitly self-closing, with no ending slash.

@type.setter
def type(self, value):
value = int(value)
if value not in self.TAGS_INVISIBLE | self.TAGS_VISIBLE:
raise ValueError(value)
self._type = value
This is only possible for specific "single" tags like ``<br>`` and
``<li>``. See :py:func:`.definitions.is_single`. This field only has an
effect if :py:attr:`self_closing` is also ``True``.
"""
return self._implicit

@property
def padding(self):
"""Spacing to insert before the first closing ``>``."""
return self._padding

@property
def closing_tag(self):
"""The closing tag, as a :py:class:`~.Wikicode` object.

This will usually equal :py:attr:`tag`, unless there is additional
spacing, comments, or the like.
"""
return self._closing_tag

@tag.setter
def tag(self, value):
self._tag = parse_anything(value)
self._tag = self._closing_tag = parse_anything(value)

@contents.setter
def contents(self, value):
self._contents = parse_anything(value)

@showtag.setter
def showtag(self, value):
self._showtag = bool(value)
@wiki_markup.setter
def wiki_markup(self, value):
self._wiki_markup = str(value) if value else None

@self_closing.setter
def self_closing(self, value):
self._self_closing = bool(value)

@open_padding.setter
def open_padding(self, value):
self._open_padding = int(value)
@invalid.setter
def invalid(self, value):
self._invalid = bool(value)

@implicit.setter
def implicit(self, value):
self._implicit = bool(value)

@close_padding.setter
def close_padding(self, value):
self._close_padding = int(value)
@padding.setter
def padding(self, value):
if not value:
self._padding = ""
else:
value = str(value)
if not value.isspace():
raise ValueError("padding must be entirely whitespace")
self._padding = value

@closing_tag.setter
def closing_tag(self, value):
self._closing_tag = parse_anything(value)

def has(self, name):
"""Return whether any attribute in the tag has the given *name*.

Note that a tag may have multiple attributes with the same name, but
only the last one is read by the MediaWiki parser.
"""
for attr in self.attributes:
if attr.name == name.strip():
return True
return False

def get(self, name):
"""Get the attribute with the given *name*.

The returned object is a :py:class:`~.Attribute` instance. Raises
:py:exc:`ValueError` if no attribute has this name. Since multiple
attributes can have the same name, we'll return the last match, since
all but the last are ignored by the MediaWiki parser.
"""
for attr in reversed(self.attributes):
if attr.name == name.strip():
return attr
raise ValueError(name)

def add(self, name, value=None, quoted=True, pad_first=" ",
pad_before_eq="", pad_after_eq=""):
"""Add an attribute with the given *name* and *value*.

*name* and *value* can be anything parasable by
:py:func:`.utils.parse_anything`; *value* can be omitted if the
attribute is valueless. *quoted* is a bool telling whether to wrap the
*value* in double quotes (this is recommended). *pad_first*,
*pad_before_eq*, and *pad_after_eq* are whitespace used as padding
before the name, before the equal sign (or after the name if no value),
and after the equal sign (ignored if no value), respectively.
"""
if value is not None:
value = parse_anything(value)
attr = Attribute(parse_anything(name), value, quoted)
attr.pad_first = pad_first
attr.pad_before_eq = pad_before_eq
attr.pad_after_eq = pad_after_eq
self.attributes.append(attr)
return attr

def remove(self, name):
"""Remove all attributes with the given *name*."""
attrs = [attr for attr in self.attributes if attr.name == name.strip()]
if not attrs:
raise ValueError(name)
for attr in attrs:
self.attributes.remove(attr)

+ 11
- 8
mwparserfromhell/nodes/template.py View File

@@ -26,7 +26,7 @@ import re

from . import HTMLEntity, Node, Text
from .extras import Parameter
from ..compat import basestring, str
from ..compat import str
from ..utils import parse_anything

__all__ = ["Template"]
@@ -84,7 +84,7 @@ class Template(Node):
replacement = str(HTMLEntity(value=ord(char)))
for node in code.filter_text(recursive=False):
if char in node:
code.replace(node, node.replace(char, replacement))
code.replace(node, node.replace(char, replacement), False)

def _blank_param_value(self, value):
"""Remove the content from *value* while keeping its whitespace.
@@ -164,15 +164,15 @@ class Template(Node):
def name(self, value):
self._name = parse_anything(value)

def has_param(self, name, ignore_empty=True):
def has(self, name, ignore_empty=True):
"""Return ``True`` if any parameter in the template is named *name*.

With *ignore_empty*, ``False`` will be returned even if the template
contains a parameter with the name *name*, if the parameter's value
is empty. Note that a template may have multiple parameters with the
same name.
same name, but only the last one is read by the MediaWiki parser.
"""
name = name.strip() if isinstance(name, basestring) else str(name)
name = str(name).strip()
for param in self.params:
if param.name.strip() == name:
if ignore_empty and not param.value.strip():
@@ -180,6 +180,9 @@ class Template(Node):
return True
return False

has_param = lambda self, *args, **kwargs: self.has(*args, **kwargs)
has_param.__doc__ = "Alias for :py:meth:`has`."

def get(self, name):
"""Get the parameter whose name is *name*.

@@ -188,7 +191,7 @@ class Template(Node):
parameters can have the same name, we'll return the last match, since
the last parameter is the only one read by the MediaWiki parser.
"""
name = name.strip() if isinstance(name, basestring) else str(name)
name = str(name).strip()
for param in reversed(self.params):
if param.name.strip() == name:
return param
@@ -226,7 +229,7 @@ class Template(Node):
name, value = parse_anything(name), parse_anything(value)
self._surface_escape(value, "|")

if self.has_param(name):
if self.has(name):
self.remove(name, keep_field=True)
existing = self.get(name)
if showkey is not None:
@@ -291,7 +294,7 @@ class Template(Node):
the first instance if none have dependents, otherwise the one with
dependents will be kept).
"""
name = name.strip() if isinstance(name, basestring) else str(name)
name = str(name).strip()
removed = False
to_remove = []
for i, param in enumerate(self.params):


+ 4
- 5
mwparserfromhell/parser/__init__.py View File

@@ -46,16 +46,15 @@ class Parser(object):
:py:class:`~.Node`\ s by the :py:class:`~.Builder`.
"""

def __init__(self, text):
self.text = text
def __init__(self):
if use_c and CTokenizer:
self._tokenizer = CTokenizer()
else:
self._tokenizer = Tokenizer()
self._builder = Builder()

def parse(self):
"""Return a string as a parsed :py:class:`~.Wikicode` object tree."""
tokens = self._tokenizer.tokenize(self.text)
def parse(self, text, context=0):
"""Parse *text*, returning a :py:class:`~.Wikicode` object tree."""
tokens = self._tokenizer.tokenize(text, context)
code = self._builder.build(tokens)
return code

+ 48
- 21
mwparserfromhell/parser/builder.py View File

@@ -24,8 +24,8 @@ from __future__ import unicode_literals

from . import tokens
from ..compat import str
from ..nodes import (Argument, Comment, Heading, HTMLEntity, Tag, Template,
Text, Wikilink)
from ..nodes import (Argument, Comment, ExternalLink, Heading, HTMLEntity, Tag,
Template, Text, Wikilink)
from ..nodes.extras import Attribute, Parameter
from ..smart_list import SmartList
from ..wikicode import Wikicode
@@ -83,7 +83,7 @@ class Builder(object):
tokens.TemplateClose)):
self._tokens.append(token)
value = self._pop()
if not key:
if key is None:
key = self._wrap([Text(str(default))])
return Parameter(key, value, showkey)
else:
@@ -142,6 +142,22 @@ class Builder(object):
else:
self._write(self._handle_token(token))

def _handle_external_link(self, token):
"""Handle when an external link is at the head of the tokens."""
brackets, url = token.brackets, None
self._push()
while self._tokens:
token = self._tokens.pop()
if isinstance(token, tokens.ExternalLinkSeparator):
url = self._pop()
self._push()
elif isinstance(token, tokens.ExternalLinkClose):
if url is not None:
return ExternalLink(url, self._pop(), brackets)
return ExternalLink(self._pop(), brackets=brackets)
else:
self._write(self._handle_token(token))

def _handle_entity(self):
"""Handle a case where an HTML entity is at the head of the tokens."""
token = self._tokens.pop()
@@ -170,7 +186,7 @@ class Builder(object):
self._write(self._handle_token(token))

def _handle_comment(self):
"""Handle a case where a hidden comment is at the head of the tokens."""
"""Handle a case where an HTML comment is at the head of the tokens."""
self._push()
while self._tokens:
token = self._tokens.pop()
@@ -180,7 +196,7 @@ class Builder(object):
else:
self._write(self._handle_token(token))

def _handle_attribute(self):
def _handle_attribute(self, start):
"""Handle a case where a tag attribute is at the head of the tokens."""
name, quoted = None, False
self._push()
@@ -191,37 +207,46 @@ class Builder(object):
self._push()
elif isinstance(token, tokens.TagAttrQuote):
quoted = True
elif isinstance(token, (tokens.TagAttrStart,
tokens.TagCloseOpen)):
elif isinstance(token, (tokens.TagAttrStart, tokens.TagCloseOpen,
tokens.TagCloseSelfclose)):
self._tokens.append(token)
if name is not None:
return Attribute(name, self._pop(), quoted)
return Attribute(self._pop(), quoted=quoted)
if name:
value = self._pop()
else:
name, value = self._pop(), None
return Attribute(name, value, quoted, start.pad_first,
start.pad_before_eq, start.pad_after_eq)
else:
self._write(self._handle_token(token))

def _handle_tag(self, token):
"""Handle a case where a tag is at the head of the tokens."""
type_, showtag = token.type, token.showtag
attrs = []
close_tokens = (tokens.TagCloseSelfclose, tokens.TagCloseClose)
implicit, attrs, contents, closing_tag = False, [], None, None
wiki_markup, invalid = token.wiki_markup, token.invalid or False
self._push()
while self._tokens:
token = self._tokens.pop()
if isinstance(token, tokens.TagAttrStart):
attrs.append(self._handle_attribute())
attrs.append(self._handle_attribute(token))
elif isinstance(token, tokens.TagCloseOpen):
open_pad = token.padding
padding = token.padding or ""
tag = self._pop()
self._push()
elif isinstance(token, tokens.TagCloseSelfclose):
tag = self._pop()
return Tag(type_, tag, attrs=attrs, showtag=showtag,
self_closing=True, open_padding=token.padding)
elif isinstance(token, tokens.TagOpenClose):
contents = self._pop()
elif isinstance(token, tokens.TagCloseClose):
return Tag(type_, tag, contents, attrs, showtag, False,
open_pad, token.padding)
self._push()
elif isinstance(token, close_tokens):
if isinstance(token, tokens.TagCloseSelfclose):
tag = self._pop()
self_closing = True
padding = token.padding or ""
implicit = token.implicit or False
else:
self_closing = False
closing_tag = self._pop()
return Tag(tag, contents, attrs, wiki_markup, self_closing,
invalid, implicit, padding, closing_tag)
else:
self._write(self._handle_token(token))

@@ -235,6 +260,8 @@ class Builder(object):
return self._handle_argument()
elif isinstance(token, tokens.WikilinkOpen):
return self._handle_wikilink()
elif isinstance(token, tokens.ExternalLinkOpen):
return self._handle_external_link(token)
elif isinstance(token, tokens.HTMLEntityStart):
return self._handle_entity()
elif isinstance(token, tokens.HeadingStart):


+ 87
- 32
mwparserfromhell/parser/contexts.py View File

@@ -51,6 +51,12 @@ Local (stack-specific) contexts:
* :py:const:`WIKILINK_TITLE`
* :py:const:`WIKILINK_TEXT`

* :py:const:`EXT_LINK`

* :py:const:`EXT_LINK_URI`
* :py:const:`EXT_LINK_TITLE`
* :py:const:`EXT_LINK_BRACKETS`

* :py:const:`HEADING`

* :py:const:`HEADING_LEVEL_1`
@@ -60,7 +66,21 @@ Local (stack-specific) contexts:
* :py:const:`HEADING_LEVEL_5`
* :py:const:`HEADING_LEVEL_6`

* :py:const:`COMMENT`
* :py:const:`TAG`

* :py:const:`TAG_OPEN`
* :py:const:`TAG_ATTR`
* :py:const:`TAG_BODY`
* :py:const:`TAG_CLOSE`

* :py:const:`STYLE`

* :py:const:`STYLE_ITALICS`
* :py:const:`STYLE_BOLD`
* :py:const:`STYLE_PASS_AGAIN`
* :py:const:`STYLE_SECOND_PASS`

* :py:const:`DL_TERM`

* :py:const:`SAFETY_CHECK`

@@ -74,41 +94,76 @@ Local (stack-specific) contexts:
Global contexts:

* :py:const:`GL_HEADING`

Aggregate contexts:

* :py:const:`FAIL`
* :py:const:`UNSAFE`
* :py:const:`DOUBLE`
* :py:const:`INVALID_LINK`

"""

# Local contexts:

TEMPLATE = 0b00000000000000000111
TEMPLATE_NAME = 0b00000000000000000001
TEMPLATE_PARAM_KEY = 0b00000000000000000010
TEMPLATE_PARAM_VALUE = 0b00000000000000000100

ARGUMENT = 0b00000000000000011000
ARGUMENT_NAME = 0b00000000000000001000
ARGUMENT_DEFAULT = 0b00000000000000010000

WIKILINK = 0b00000000000001100000
WIKILINK_TITLE = 0b00000000000000100000
WIKILINK_TEXT = 0b00000000000001000000

HEADING = 0b00000001111110000000
HEADING_LEVEL_1 = 0b00000000000010000000
HEADING_LEVEL_2 = 0b00000000000100000000
HEADING_LEVEL_3 = 0b00000000001000000000
HEADING_LEVEL_4 = 0b00000000010000000000
HEADING_LEVEL_5 = 0b00000000100000000000
HEADING_LEVEL_6 = 0b00000001000000000000

COMMENT = 0b00000010000000000000

SAFETY_CHECK = 0b11111100000000000000
HAS_TEXT = 0b00000100000000000000
FAIL_ON_TEXT = 0b00001000000000000000
FAIL_NEXT = 0b00010000000000000000
FAIL_ON_LBRACE = 0b00100000000000000000
FAIL_ON_RBRACE = 0b01000000000000000000
FAIL_ON_EQUALS = 0b10000000000000000000
TEMPLATE_NAME = 1 << 0
TEMPLATE_PARAM_KEY = 1 << 1
TEMPLATE_PARAM_VALUE = 1 << 2
TEMPLATE = TEMPLATE_NAME + TEMPLATE_PARAM_KEY + TEMPLATE_PARAM_VALUE

ARGUMENT_NAME = 1 << 3
ARGUMENT_DEFAULT = 1 << 4
ARGUMENT = ARGUMENT_NAME + ARGUMENT_DEFAULT

WIKILINK_TITLE = 1 << 5
WIKILINK_TEXT = 1 << 6
WIKILINK = WIKILINK_TITLE + WIKILINK_TEXT

EXT_LINK_URI = 1 << 7
EXT_LINK_TITLE = 1 << 8
EXT_LINK_BRACKETS = 1 << 9
EXT_LINK = EXT_LINK_URI + EXT_LINK_TITLE + EXT_LINK_BRACKETS

HEADING_LEVEL_1 = 1 << 10
HEADING_LEVEL_2 = 1 << 11
HEADING_LEVEL_3 = 1 << 12
HEADING_LEVEL_4 = 1 << 13
HEADING_LEVEL_5 = 1 << 14
HEADING_LEVEL_6 = 1 << 15
HEADING = (HEADING_LEVEL_1 + HEADING_LEVEL_2 + HEADING_LEVEL_3 +
HEADING_LEVEL_4 + HEADING_LEVEL_5 + HEADING_LEVEL_6)

TAG_OPEN = 1 << 16
TAG_ATTR = 1 << 17
TAG_BODY = 1 << 18
TAG_CLOSE = 1 << 19
TAG = TAG_OPEN + TAG_ATTR + TAG_BODY + TAG_CLOSE

STYLE_ITALICS = 1 << 20
STYLE_BOLD = 1 << 21
STYLE_PASS_AGAIN = 1 << 22
STYLE_SECOND_PASS = 1 << 23
STYLE = STYLE_ITALICS + STYLE_BOLD + STYLE_PASS_AGAIN + STYLE_SECOND_PASS

DL_TERM = 1 << 24

HAS_TEXT = 1 << 25
FAIL_ON_TEXT = 1 << 26
FAIL_NEXT = 1 << 27
FAIL_ON_LBRACE = 1 << 28
FAIL_ON_RBRACE = 1 << 29
FAIL_ON_EQUALS = 1 << 30
SAFETY_CHECK = (HAS_TEXT + FAIL_ON_TEXT + FAIL_NEXT + FAIL_ON_LBRACE +
FAIL_ON_RBRACE + FAIL_ON_EQUALS)

# Global contexts:

GL_HEADING = 0b1
GL_HEADING = 1 << 0

# Aggregate contexts:

FAIL = TEMPLATE + ARGUMENT + WIKILINK + EXT_LINK_TITLE + HEADING + TAG + STYLE
UNSAFE = (TEMPLATE_NAME + WIKILINK + EXT_LINK_TITLE + TEMPLATE_PARAM_KEY +
ARGUMENT_NAME + TAG_CLOSE)
DOUBLE = TEMPLATE_PARAM_KEY + TAG_CLOSE
INVALID_LINK = TEMPLATE_NAME + ARGUMENT_NAME + WIKILINK + EXT_LINK

+ 1968
- 639
mwparserfromhell/parser/tokenizer.c
File diff suppressed because it is too large
View File


+ 151
- 87
mwparserfromhell/parser/tokenizer.h View File

@@ -28,6 +28,7 @@ SOFTWARE.
#include <Python.h>
#include <math.h>
#include <structmember.h>
#include <bytesobject.h>

#if PY_MAJOR_VERSION >= 3
#define IS_PY3K
@@ -41,8 +42,8 @@ SOFTWARE.
#define ALPHANUM "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

static const char* MARKERS[] = {
"{", "}", "[", "]", "<", ">", "|", "=", "&", "#", "*", ";", ":", "/", "-",
"!", "\n", ""};
"{", "}", "[", "]", "<", ">", "|", "=", "&", "'", "#", "*", ";", ":", "/",
"-", "\n", ""};

#define NUM_MARKERS 18
#define TEXTBUFFER_BLOCKSIZE 1024
@@ -51,19 +52,20 @@ static const char* MARKERS[] = {
#define MAX_BRACES 255
#define MAX_ENTITY_SIZE 8

static int route_state = 0;
#define BAD_ROUTE (route_state)
#define FAIL_ROUTE() (route_state = 1)
#define RESET_ROUTE() (route_state = 0)
static int route_state = 0, route_context = 0;
#define BAD_ROUTE route_state
#define BAD_ROUTE_CONTEXT route_context
#define FAIL_ROUTE(context) route_state = 1; route_context = context
#define RESET_ROUTE() route_state = 0

static char** entitydefs;

static PyObject* EMPTY;
static PyObject* NOARGS;
static PyObject* tokens;
static PyObject* definitions;


/* Tokens */
/* Tokens: */

static PyObject* Text;

@@ -80,6 +82,10 @@ static PyObject* WikilinkOpen;
static PyObject* WikilinkSeparator;
static PyObject* WikilinkClose;

static PyObject* ExternalLinkOpen;
static PyObject* ExternalLinkSeparator;
static PyObject* ExternalLinkClose;

static PyObject* HTMLEntityStart;
static PyObject* HTMLEntityNumeric;
static PyObject* HTMLEntityHex;
@@ -102,47 +108,83 @@ static PyObject* TagCloseClose;

/* Local contexts: */

#define LC_TEMPLATE 0x00007
#define LC_TEMPLATE_NAME 0x00001
#define LC_TEMPLATE_PARAM_KEY 0x00002
#define LC_TEMPLATE_PARAM_VALUE 0x00004

#define LC_ARGUMENT 0x00018
#define LC_ARGUMENT_NAME 0x00008
#define LC_ARGUMENT_DEFAULT 0x00010

#define LC_WIKILINK 0x00060
#define LC_WIKILINK_TITLE 0x00020
#define LC_WIKILINK_TEXT 0x00040

#define LC_HEADING 0x01F80
#define LC_HEADING_LEVEL_1 0x00080
#define LC_HEADING_LEVEL_2 0x00100
#define LC_HEADING_LEVEL_3 0x00200
#define LC_HEADING_LEVEL_4 0x00400
#define LC_HEADING_LEVEL_5 0x00800
#define LC_HEADING_LEVEL_6 0x01000

#define LC_COMMENT 0x02000

#define LC_SAFETY_CHECK 0xFC000
#define LC_HAS_TEXT 0x04000
#define LC_FAIL_ON_TEXT 0x08000
#define LC_FAIL_NEXT 0x10000
#define LC_FAIL_ON_LBRACE 0x20000
#define LC_FAIL_ON_RBRACE 0x40000
#define LC_FAIL_ON_EQUALS 0x80000
#define LC_TEMPLATE 0x00000007
#define LC_TEMPLATE_NAME 0x00000001
#define LC_TEMPLATE_PARAM_KEY 0x00000002
#define LC_TEMPLATE_PARAM_VALUE 0x00000004

#define LC_ARGUMENT 0x00000018
#define LC_ARGUMENT_NAME 0x00000008
#define LC_ARGUMENT_DEFAULT 0x00000010

#define LC_WIKILINK 0x00000060
#define LC_WIKILINK_TITLE 0x00000020
#define LC_WIKILINK_TEXT 0x00000040

#define LC_EXT_LINK 0x00000380
#define LC_EXT_LINK_URI 0x00000080
#define LC_EXT_LINK_TITLE 0x00000100
#define LC_EXT_LINK_BRACKETS 0x00000200

#define LC_HEADING 0x0000FC00
#define LC_HEADING_LEVEL_1 0x00000400
#define LC_HEADING_LEVEL_2 0x00000800
#define LC_HEADING_LEVEL_3 0x00001000
#define LC_HEADING_LEVEL_4 0x00002000
#define LC_HEADING_LEVEL_5 0x00004000
#define LC_HEADING_LEVEL_6 0x00008000

#define LC_TAG 0x000F0000
#define LC_TAG_OPEN 0x00010000
#define LC_TAG_ATTR 0x00020000
#define LC_TAG_BODY 0x00040000
#define LC_TAG_CLOSE 0x00080000

#define LC_STYLE 0x00F00000
#define LC_STYLE_ITALICS 0x00100000
#define LC_STYLE_BOLD 0x00200000
#define LC_STYLE_PASS_AGAIN 0x00400000
#define LC_STYLE_SECOND_PASS 0x00800000

#define LC_DLTERM 0x01000000

#define LC_SAFETY_CHECK 0x7E000000
#define LC_HAS_TEXT 0x02000000
#define LC_FAIL_ON_TEXT 0x04000000
#define LC_FAIL_NEXT 0x08000000
#define LC_FAIL_ON_LBRACE 0x10000000
#define LC_FAIL_ON_RBRACE 0x20000000
#define LC_FAIL_ON_EQUALS 0x40000000

/* Global contexts: */

#define GL_HEADING 0x1

/* Aggregate contexts: */

#define AGG_FAIL (LC_TEMPLATE | LC_ARGUMENT | LC_WIKILINK | LC_EXT_LINK_TITLE | LC_HEADING | LC_TAG | LC_STYLE)
#define AGG_UNSAFE (LC_TEMPLATE_NAME | LC_WIKILINK | LC_EXT_LINK_TITLE | LC_TEMPLATE_PARAM_KEY | LC_ARGUMENT_NAME)
#define AGG_DOUBLE (LC_TEMPLATE_PARAM_KEY | LC_TAG_CLOSE)
#define AGG_INVALID_LINK (LC_TEMPLATE_NAME | LC_ARGUMENT_NAME | LC_WIKILINK | LC_EXT_LINK)

/* Tag contexts: */

#define TAG_NAME 0x01
#define TAG_ATTR_READY 0x02
#define TAG_ATTR_NAME 0x04
#define TAG_ATTR_VALUE 0x08
#define TAG_QUOTED 0x10
#define TAG_NOTE_SPACE 0x20
#define TAG_NOTE_EQUALS 0x40
#define TAG_NOTE_QUOTE 0x80


/* Miscellaneous structs: */

struct Textbuffer {
Py_ssize_t size;
Py_UNICODE* data;
struct Textbuffer* prev;
struct Textbuffer* next;
};

@@ -158,13 +200,24 @@ typedef struct {
int level;
} HeadingData;

typedef struct {
int context;
struct Textbuffer* pad_first;
struct Textbuffer* pad_before_eq;
struct Textbuffer* pad_after_eq;
Py_ssize_t reset;
} TagData;

typedef struct Textbuffer Textbuffer;
typedef struct Stack Stack;


/* Tokenizer object definition: */

typedef struct {
PyObject_HEAD
PyObject* text; /* text to tokenize */
struct Stack* topstack; /* topmost stack */
Stack* topstack; /* topmost stack */
Py_ssize_t head; /* current position in text */
Py_ssize_t length; /* length of text */
int global; /* global context */
@@ -173,78 +226,80 @@ typedef struct {
} Tokenizer;


/* Macros for accessing Tokenizer data: */
/* Macros related to Tokenizer functions: */

#define Tokenizer_READ(self, delta) (*PyUnicode_AS_UNICODE(Tokenizer_read(self, delta)))
#define Tokenizer_READ_BACKWARDS(self, delta) \
(*PyUnicode_AS_UNICODE(Tokenizer_read_backwards(self, delta)))
#define Tokenizer_CAN_RECURSE(self) (self->depth < MAX_DEPTH && self->cycles < MAX_CYCLES)

#define Tokenizer_emit(self, token) Tokenizer_emit_token(self, token, 0)
#define Tokenizer_emit_first(self, token) Tokenizer_emit_token(self, token, 1)
#define Tokenizer_emit_kwargs(self, token, kwargs) Tokenizer_emit_token_kwargs(self, token, kwargs, 0)
#define Tokenizer_emit_first_kwargs(self, token, kwargs) Tokenizer_emit_token_kwargs(self, token, kwargs, 1)


/* Macros for accessing definitions: */

#define GET_HTML_TAG(markup) (markup == *":" ? "dd" : markup == *";" ? "dt" : "li")
#define IS_PARSABLE(tag) (call_def_func("is_parsable", tag, NULL, NULL))
#define IS_SINGLE(tag) (call_def_func("is_single", tag, NULL, NULL))
#define IS_SINGLE_ONLY(tag) (call_def_func("is_single_only", tag, NULL, NULL))
#define IS_SCHEME(scheme, slashes, reverse) \
(call_def_func("is_scheme", scheme, slashes ? Py_True : Py_False, reverse ? Py_True : Py_False))


/* Function prototypes: */

static int heading_level_from_context(int);
static Textbuffer* Textbuffer_new(void);
static void Textbuffer_dealloc(Textbuffer*);

static TagData* TagData_new(void);
static void TagData_dealloc(TagData*);

static PyObject* Tokenizer_new(PyTypeObject*, PyObject*, PyObject*);
static struct Textbuffer* Textbuffer_new(void);
static void Tokenizer_dealloc(Tokenizer*);
static void Textbuffer_dealloc(struct Textbuffer*);
static int Tokenizer_init(Tokenizer*, PyObject*, PyObject*);
static int Tokenizer_push(Tokenizer*, int);
static PyObject* Textbuffer_render(struct Textbuffer*);
static int Tokenizer_push_textbuffer(Tokenizer*);
static void Tokenizer_delete_top_of_stack(Tokenizer*);
static PyObject* Tokenizer_pop(Tokenizer*);
static PyObject* Tokenizer_pop_keeping_context(Tokenizer*);
static void* Tokenizer_fail_route(Tokenizer*);
static int Tokenizer_write(Tokenizer*, PyObject*);
static int Tokenizer_write_first(Tokenizer*, PyObject*);
static int Tokenizer_write_text(Tokenizer*, Py_UNICODE);
static int Tokenizer_write_all(Tokenizer*, PyObject*);
static int Tokenizer_write_text_then_stack(Tokenizer*, const char*);
static PyObject* Tokenizer_read(Tokenizer*, Py_ssize_t);
static PyObject* Tokenizer_read_backwards(Tokenizer*, Py_ssize_t);
static int Tokenizer_parse_template_or_argument(Tokenizer*);
static int Tokenizer_parse_template(Tokenizer*);
static int Tokenizer_parse_argument(Tokenizer*);
static int Tokenizer_handle_template_param(Tokenizer*);
static int Tokenizer_handle_template_param_value(Tokenizer*);
static PyObject* Tokenizer_handle_template_end(Tokenizer*);
static int Tokenizer_handle_argument_separator(Tokenizer*);
static PyObject* Tokenizer_handle_argument_end(Tokenizer*);
static int Tokenizer_parse_wikilink(Tokenizer*);
static int Tokenizer_handle_wikilink_separator(Tokenizer*);
static PyObject* Tokenizer_handle_wikilink_end(Tokenizer*);
static int Tokenizer_parse_heading(Tokenizer*);
static HeadingData* Tokenizer_handle_heading_end(Tokenizer*);
static int Tokenizer_really_parse_entity(Tokenizer*);
static int Tokenizer_parse_entity(Tokenizer*);
static int Tokenizer_parse_comment(Tokenizer*);
static int Tokenizer_verify_safe(Tokenizer*, int, Py_UNICODE);
static PyObject* Tokenizer_parse(Tokenizer*, int);
static int Tokenizer_handle_dl_term(Tokenizer*);
static int Tokenizer_parse_tag(Tokenizer*);
static PyObject* Tokenizer_parse(Tokenizer*, int, int);
static PyObject* Tokenizer_tokenize(Tokenizer*, PyObject*);


/* Macros for Python 2/3 compatibility: */

#ifdef IS_PY3K
#define NEW_INT_FUNC PyLong_FromSsize_t
#define IMPORT_NAME_FUNC PyUnicode_FromString
#define CREATE_MODULE PyModule_Create(&module_def);
#define ENTITYDEFS_MODULE "html.entities"
#define INIT_FUNC_NAME PyInit__tokenizer
#define INIT_ERROR return NULL
#else
#define NEW_INT_FUNC PyInt_FromSsize_t
#define IMPORT_NAME_FUNC PyBytes_FromString
#define CREATE_MODULE Py_InitModule("_tokenizer", NULL);
#define ENTITYDEFS_MODULE "htmlentitydefs"
#define INIT_FUNC_NAME init_tokenizer
#define INIT_ERROR return
#endif


/* More structs for creating the Tokenizer type: */

static PyMethodDef
Tokenizer_methods[] = {
static PyMethodDef Tokenizer_methods[] = {
{"tokenize", (PyCFunction) Tokenizer_tokenize, METH_VARARGS,
"Build a list of tokens from a string of wikicode and return it."},
{NULL}
};

static PyMemberDef
Tokenizer_members[] = {
static PyMemberDef Tokenizer_members[] = {
{NULL}
};

static PyMethodDef
module_methods[] = {
{NULL}
};

static PyTypeObject
TokenizerType = {
PyObject_HEAD_INIT(NULL)
0, /* ob_size */
static PyTypeObject TokenizerType = {
PyVarObject_HEAD_INIT(NULL, 0)
"_tokenizer.CTokenizer", /* tp_name */
sizeof(Tokenizer), /* tp_basicsize */
0, /* tp_itemsize */
@@ -283,3 +338,12 @@ TokenizerType = {
0, /* tp_alloc */
Tokenizer_new, /* tp_new */
};

#ifdef IS_PY3K
static PyModuleDef module_def = {
PyModuleDef_HEAD_INIT,
"_tokenizer",
"Creates a list of tokens from a string of wikicode.",
-1, NULL, NULL, NULL, NULL, NULL
};
#endif

+ 702
- 113
mwparserfromhell/parser/tokenizer.py
File diff suppressed because it is too large
View File


+ 7
- 3
mwparserfromhell/parser/tokens.py View File

@@ -30,7 +30,7 @@ into the :py:class`~.Wikicode` tree by the :py:class:`~.Builder`.

from __future__ import unicode_literals

from ..compat import basestring, py3k
from ..compat import py3k, str

__all__ = ["Token"]

@@ -43,7 +43,7 @@ class Token(object):
def __repr__(self):
args = []
for key, value in self._kwargs.items():
if isinstance(value, basestring) and len(value) > 100:
if isinstance(value, str) and len(value) > 100:
args.append(key + "=" + repr(value[:97] + "..."))
else:
args.append(key + "=" + repr(value))
@@ -55,7 +55,7 @@ class Token(object):
return False

def __getattr__(self, key):
return self._kwargs[key]
return self._kwargs.get(key)

def __setattr__(self, key, value):
self._kwargs[key] = value
@@ -84,6 +84,10 @@ WikilinkOpen = make("WikilinkOpen") # [[
WikilinkSeparator = make("WikilinkSeparator") # |
WikilinkClose = make("WikilinkClose") # ]]

ExternalLinkOpen = make("ExternalLinkOpen") # [
ExternalLinkSeparator = make("ExternalLinkSeparator") #
ExternalLinkClose = make("ExternalLinkClose") # ]

HTMLEntityStart = make("HTMLEntityStart") # &
HTMLEntityNumeric = make("HTMLEntityNumeric") # #
HTMLEntityHex = make("HTMLEntityHex") # x


+ 13
- 5
mwparserfromhell/utils.py View File

@@ -31,7 +31,9 @@ from .compat import bytes, str
from .nodes import Node
from .smart_list import SmartList

def parse_anything(value):
__all__ = ["parse_anything"]

def parse_anything(value, context=0):
"""Return a :py:class:`~.Wikicode` for *value*, allowing multiple types.

This differs from :py:meth:`.Parser.parse` in that we accept more than just
@@ -42,6 +44,12 @@ def parse_anything(value):
on-the-fly by various methods of :py:class:`~.Wikicode` and others like
:py:class:`~.Template`, such as :py:meth:`wikicode.insert()
<.Wikicode.insert>` or setting :py:meth:`template.name <.Template.name>`.

If given, *context* will be passed as a starting context to the parser.
This is helpful when this function is used inside node attribute setters.
For example, :py:class:`~.ExternalLink`\ 's :py:attr:`~.ExternalLink.url`
setter sets *context* to :py:mod:`contexts.EXT_LINK_URI <.contexts>` to
prevent the URL itself from becoming an :py:class:`~.ExternalLink`.
"""
from .parser import Parser
from .wikicode import Wikicode
@@ -51,17 +59,17 @@ def parse_anything(value):
elif isinstance(value, Node):
return Wikicode(SmartList([value]))
elif isinstance(value, str):
return Parser(value).parse()
return Parser().parse(value, context)
elif isinstance(value, bytes):
return Parser(value.decode("utf8")).parse()
return Parser().parse(value.decode("utf8"), context)
elif isinstance(value, int):
return Parser(str(value)).parse()
return Parser().parse(str(value), context)
elif value is None:
return Wikicode(SmartList())
try:
nodelist = SmartList()
for item in value:
nodelist += parse_anything(item).nodes
nodelist += parse_anything(item, context).nodes
except TypeError:
error = "Needs string, Node, Wikicode, int, None, or iterable of these, but got {0}: {1}"
raise ValueError(error.format(type(value).__name__, value))


+ 133
- 70
mwparserfromhell/wikicode.py View File

@@ -24,8 +24,8 @@ from __future__ import unicode_literals
import re

from .compat import maxsize, py3k, str
from .nodes import (Argument, Comment, Heading, HTMLEntity, Node, Tag,
Template, Text, Wikilink)
from .nodes import (Argument, Comment, ExternalLink, Heading, HTMLEntity,
Node, Tag, Template, Text, Wikilink)
from .string_mixin import StringMixIn
from .utils import parse_anything

@@ -60,19 +60,6 @@ class Wikicode(StringMixIn):
for context, child in node.__iternodes__(self._get_all_nodes):
yield child

def _get_context(self, node, obj):
"""Return a ``Wikicode`` that contains *obj* in its descendants.

The closest (shortest distance from *node*) suitable ``Wikicode`` will
be returned, or ``None`` if the *obj* is the *node* itself.

Raises ``ValueError`` if *obj* is not within *node*.
"""
for context, child in node.__iternodes__(self._get_all_nodes):
if self._is_equivalent(obj, child):
return context
raise ValueError(obj)

def _get_all_nodes(self, code):
"""Iterate over all of our descendant nodes.

@@ -105,26 +92,56 @@ class Wikicode(StringMixIn):
return False
return obj in nodes

def _do_search(self, obj, recursive, callback, context, *args, **kwargs):
"""Look within *context* for *obj*, executing *callback* if found.
def _do_search(self, obj, recursive, context=None, literal=None):
"""Return some info about the location of *obj* within *context*.

If *recursive* is ``True``, we'll look within context and its
descendants, otherwise we'll just execute callback. We raise
:py:exc:`ValueError` if *obj* isn't in our node list or context. If
found, *callback* is passed the context, the index of the node within
the context, and whatever were passed as ``*args`` and ``**kwargs``.
If *recursive* is ``True``, we'll look within *context* (``self`` by
default) and its descendants, otherwise just *context*. We raise
:py:exc:`ValueError` if *obj* isn't found. The return data is a list of
3-tuples (*type*, *context*, *data*) where *type* is *obj*\ 's best
type resolution (either ``Node``, ``Wikicode``, or ``str``), *context*
is the closest ``Wikicode`` encompassing it, and *data* is either a
``Node``, a list of ``Node``\ s, or ``None`` depending on *type*.
"""
if recursive:
for i, node in enumerate(context.nodes):
if self._is_equivalent(obj, node):
return callback(context, i, *args, **kwargs)
if self._contains(self._get_children(node), obj):
context = self._get_context(node, obj)
return self._do_search(obj, recursive, callback, context,
*args, **kwargs)
raise ValueError(obj)
if not context:
context = self
literal = isinstance(obj, (Node, Wikicode))
obj = parse_anything(obj)
if not obj or obj not in self:
raise ValueError(obj)
if len(obj.nodes) == 1:
obj = obj.get(0)

compare = lambda a, b: (a is b) if literal else (a == b)
results = []
i = 0
while i < len(context.nodes):
node = context.get(i)
if isinstance(obj, Node) and compare(obj, node):
results.append((Node, context, node))
elif isinstance(obj, Wikicode) and compare(obj.get(0), node):
for j in range(1, len(obj.nodes)):
if not compare(obj.get(j), context.get(i + j)):
break
else:
nodes = list(context.nodes[i:i + len(obj.nodes)])
results.append((Wikicode, context, nodes))
i += len(obj.nodes) - 1
elif recursive:
contexts = node.__iternodes__(self._get_all_nodes)
processed = []
for code in (ctx for ctx, child in contexts):
if code and code not in processed and obj in code:
search = self._do_search(obj, recursive, code, literal)
results.extend(search)
processed.append(code)
i += 1

callback(context, self.index(obj, recursive=False), *args, **kwargs)
if not results and not literal and recursive:
results.append((str, context, None))
if not results and context is self:
raise ValueError(obj)
return results

def _get_tree(self, code, lines, marker, indent):
"""Build a tree to illustrate the way the Wikicode object was parsed.
@@ -253,41 +270,64 @@ class Wikicode(StringMixIn):
def insert_before(self, obj, value, recursive=True):
"""Insert *value* immediately before *obj* in the list of nodes.

*obj* can be either a string or a :py:class:`~.Node`. *value* can be
anything parasable by :py:func:`.parse_anything`. If *recursive* is
``True``, we will try to find *obj* within our child nodes even if it
is not a direct descendant of this :py:class:`~.Wikicode` object. If
*obj* is not in the node list, :py:exc:`ValueError` is raised.
*obj* can be either a string, a :py:class:`~.Node`, or other
:py:class:`~.Wikicode` object (as created by :py:meth:`get_sections`,
for example). *value* can be anything parasable by
:py:func:`.parse_anything`. If *recursive* is ``True``, we will try to
find *obj* within our child nodes even if it is not a direct descendant
of this :py:class:`~.Wikicode` object. If *obj* is not found,
:py:exc:`ValueError` is raised.
"""
callback = lambda self, i, value: self.insert(i, value)
self._do_search(obj, recursive, callback, self, value)
for restype, context, data in self._do_search(obj, recursive):
if restype in (Node, Wikicode):
i = context.index(data if restype is Node else data[0], False)
context.insert(i, value)
else:
obj = str(obj)
context.nodes = str(context).replace(obj, str(value) + obj)

def insert_after(self, obj, value, recursive=True):
"""Insert *value* immediately after *obj* in the list of nodes.

*obj* can be either a string or a :py:class:`~.Node`. *value* can be
anything parasable by :py:func:`.parse_anything`. If *recursive* is
``True``, we will try to find *obj* within our child nodes even if it
is not a direct descendant of this :py:class:`~.Wikicode` object. If
*obj* is not in the node list, :py:exc:`ValueError` is raised.
*obj* can be either a string, a :py:class:`~.Node`, or other
:py:class:`~.Wikicode` object (as created by :py:meth:`get_sections`,
for example). *value* can be anything parasable by
:py:func:`.parse_anything`. If *recursive* is ``True``, we will try to
find *obj* within our child nodes even if it is not a direct descendant
of this :py:class:`~.Wikicode` object. If *obj* is not found,
:py:exc:`ValueError` is raised.
"""
callback = lambda self, i, value: self.insert(i + 1, value)
self._do_search(obj, recursive, callback, self, value)
for restype, context, data in self._do_search(obj, recursive):
if restype in (Node, Wikicode):
i = context.index(data if restype is Node else data[-1], False)
context.insert(i + 1, value)
else:
obj = str(obj)
context.nodes = str(context).replace(obj, obj + str(value))

def replace(self, obj, value, recursive=True):
"""Replace *obj* with *value* in the list of nodes.

*obj* can be either a string or a :py:class:`~.Node`. *value* can be
anything parasable by :py:func:`.parse_anything`. If *recursive* is
``True``, we will try to find *obj* within our child nodes even if it
is not a direct descendant of this :py:class:`~.Wikicode` object. If
*obj* is not in the node list, :py:exc:`ValueError` is raised.
*obj* can be either a string, a :py:class:`~.Node`, or other
:py:class:`~.Wikicode` object (as created by :py:meth:`get_sections`,
for example). *value* can be anything parasable by
:py:func:`.parse_anything`. If *recursive* is ``True``, we will try to
find *obj* within our child nodes even if it is not a direct descendant
of this :py:class:`~.Wikicode` object. If *obj* is not found,
:py:exc:`ValueError` is raised.
"""
def callback(self, i, value):
self.nodes.pop(i)
self.insert(i, value)

self._do_search(obj, recursive, callback, self, value)
for restype, context, data in self._do_search(obj, recursive):
if restype is Node:
i = context.index(data, False)
context.nodes.pop(i)
context.insert(i, value)
elif restype is Wikicode:
i = context.index(data[0], False)
for _ in data:
context.nodes.pop(i)
context.insert(i, value)
else:
context.nodes = str(context).replace(str(obj), str(value))

def append(self, value):
"""Insert *value* at the end of the list of nodes.
@@ -301,15 +341,39 @@ class Wikicode(StringMixIn):
def remove(self, obj, recursive=True):
"""Remove *obj* from the list of nodes.

*obj* can be either a string or a :py:class:`~.Node`. If *recursive* is
``True``, we will try to find *obj* within our child nodes even if it
is not a direct descendant of this :py:class:`~.Wikicode` object. If
*obj* is not in the node list, :py:exc:`ValueError` is raised.
*obj* can be either a string, a :py:class:`~.Node`, or other
:py:class:`~.Wikicode` object (as created by :py:meth:`get_sections`,
for example). If *recursive* is ``True``, we will try to find *obj*
within our child nodes even if it is not a direct descendant of this
:py:class:`~.Wikicode` object. If *obj* is not found,
:py:exc:`ValueError` is raised.
"""
for restype, context, data in self._do_search(obj, recursive):
if restype is Node:
context.nodes.pop(context.index(data, False))
elif restype is Wikicode:
i = context.index(data[0], False)
for _ in data:
context.nodes.pop(i)
else:
context.nodes = str(context).replace(str(obj), "")

def matches(self, other):
"""Do a loose equivalency test suitable for comparing page names.

*other* can be any string-like object, including
:py:class:`~.Wikicode`. This operation is symmetric; both sides are
adjusted. Specifically, whitespace and markup is stripped and the first
letter's case is normalized. Typical usage is
``if template.name.matches("stub"): ...``.
"""
callback = lambda self, i: self.nodes.pop(i)
self._do_search(obj, recursive, callback, self)
this = self.strip_code().strip()
that = parse_anything(other).strip_code().strip()
if not this or not that:
return this == that
return this[0].upper() + this[1:] == that[0].upper() + that[1:]

def ifilter(self, recursive=False, matches=None, flags=FLAGS,
def ifilter(self, recursive=True, matches=None, flags=FLAGS,
forcetype=None):
"""Iterate over nodes in our list matching certain conditions.

@@ -327,7 +391,7 @@ class Wikicode(StringMixIn):
if not matches or re.search(matches, str(node), flags):
yield node

def filter(self, recursive=False, matches=None, flags=FLAGS,
def filter(self, recursive=True, matches=None, flags=FLAGS,
forcetype=None):
"""Return a list of nodes within our list matching certain conditions.

@@ -360,9 +424,8 @@ class Wikicode(StringMixIn):
"""
if matches:
matches = r"^(=+?)\s*" + matches + r"\s*\1$"
headings = self.filter_headings(recursive=True)
filtered = self.filter_headings(recursive=True, matches=matches,
flags=flags)
headings = self.filter_headings()
filtered = self.filter_headings(matches=matches, flags=flags)
if levels:
filtered = [head for head in filtered if head.level in levels]

@@ -446,6 +509,6 @@ class Wikicode(StringMixIn):
return "\n".join(self._get_tree(self, [], marker, 0))

Wikicode._build_filter_methods(
arguments=Argument, comments=Comment, headings=Heading,
html_entities=HTMLEntity, tags=Tag, templates=Template, text=Text,
wikilinks=Wikilink)
arguments=Argument, comments=Comment, external_links=ExternalLink,
headings=Heading, html_entities=HTMLEntity, tags=Tag, templates=Template,
text=Text, wikilinks=Wikilink)

+ 3
- 6
setup.py View File

@@ -29,16 +29,13 @@ from mwparserfromhell.compat import py3k
with open("README.rst") as fp:
long_docs = fp.read()

# builder = Extension("mwparserfromhell.parser._builder",
# sources = ["mwparserfromhell/parser/builder.c"])

tokenizer = Extension("mwparserfromhell.parser._tokenizer",
sources = ["mwparserfromhell/parser/tokenizer.c"])

setup(
name = "mwparserfromhell",
packages = find_packages(exclude=("tests",)),
ext_modules = [] if py3k else [tokenizer],
ext_modules = [tokenizer],
test_suite = "tests",
version = __version__,
author = "Ben Kurtovic",
@@ -50,13 +47,13 @@ setup(
keywords = "earwig mwparserfromhell wikipedia wiki mediawiki wikicode template parsing",
license = "MIT License",
classifiers = [
"Development Status :: 3 - Alpha",
"Development Status :: 4 - Beta",
"Environment :: Console",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.3",
"Topic :: Text Processing :: Markup"
],
)

+ 21
- 1
tests/_test_tree_equality.py View File

@@ -91,7 +91,27 @@ class TreeEqualityTestCase(TestCase):

def assertTagNodeEqual(self, expected, actual):
"""Assert that two Tag nodes have the same data."""
self.fail("Holding this until feature/html_tags is ready.")
self.assertWikicodeEqual(expected.tag, actual.tag)
if expected.contents is not None:
self.assertWikicodeEqual(expected.contents, actual.contents)
length = len(expected.attributes)
self.assertEqual(length, len(actual.attributes))
for i in range(length):
exp_attr = expected.attributes[i]
act_attr = actual.attributes[i]
self.assertWikicodeEqual(exp_attr.name, act_attr.name)
if exp_attr.value is not None:
self.assertWikicodeEqual(exp_attr.value, act_attr.value)
self.assertIs(exp_attr.quoted, act_attr.quoted)
self.assertEqual(exp_attr.pad_first, act_attr.pad_first)
self.assertEqual(exp_attr.pad_before_eq, act_attr.pad_before_eq)
self.assertEqual(exp_attr.pad_after_eq, act_attr.pad_after_eq)
self.assertIs(expected.wiki_markup, actual.wiki_markup)
self.assertIs(expected.self_closing, actual.self_closing)
self.assertIs(expected.invalid, actual.invalid)
self.assertIs(expected.implicit, actual.implicit)
self.assertEqual(expected.padding, actual.padding)
self.assertWikicodeEqual(expected.closing_tag, actual.closing_tag)

def assertTemplateNodeEqual(self, expected, actual):
"""Assert that two Template nodes have the same data."""


+ 89
- 0
tests/test_attribute.py View File

@@ -0,0 +1,89 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import str
from mwparserfromhell.nodes import Template
from mwparserfromhell.nodes.extras import Attribute

from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext

class TestAttribute(TreeEqualityTestCase):
"""Test cases for the Attribute node extra."""

def test_unicode(self):
"""test Attribute.__unicode__()"""
node = Attribute(wraptext("foo"))
self.assertEqual(" foo", str(node))
node2 = Attribute(wraptext("foo"), wraptext("bar"))
self.assertEqual(' foo="bar"', str(node2))
node3 = Attribute(wraptext("a"), wraptext("b"), True, "", " ", " ")
self.assertEqual('a = "b"', str(node3))
node3 = Attribute(wraptext("a"), wraptext("b"), False, "", " ", " ")
self.assertEqual("a = b", str(node3))
node4 = Attribute(wraptext("a"), wrap([]), False, " ", "", " ")
self.assertEqual(" a= ", str(node4))

def test_name(self):
"""test getter/setter for the name attribute"""
name = wraptext("id")
node = Attribute(name, wraptext("bar"))
self.assertIs(name, node.name)
node.name = "{{id}}"
self.assertWikicodeEqual(wrap([Template(wraptext("id"))]), node.name)

def test_value(self):
"""test getter/setter for the value attribute"""
value = wraptext("foo")
node = Attribute(wraptext("id"), value)
self.assertIs(value, node.value)
node.value = "{{bar}}"
self.assertWikicodeEqual(wrap([Template(wraptext("bar"))]), node.value)
node.value = None
self.assertIs(None, node.value)

def test_quoted(self):
"""test getter/setter for the quoted attribute"""
node1 = Attribute(wraptext("id"), wraptext("foo"), False)
node2 = Attribute(wraptext("id"), wraptext("bar"))
self.assertFalse(node1.quoted)
self.assertTrue(node2.quoted)
node1.quoted = True
node2.quoted = ""
self.assertTrue(node1.quoted)
self.assertFalse(node2.quoted)

def test_padding(self):
"""test getter/setter for the padding attributes"""
for pad in ["pad_first", "pad_before_eq", "pad_after_eq"]:
node = Attribute(wraptext("id"), wraptext("foo"), **{pad: "\n"})
self.assertEqual("\n", getattr(node, pad))
setattr(node, pad, " ")
self.assertEqual(" ", getattr(node, pad))
setattr(node, pad, None)
self.assertEqual("", getattr(node, pad))
self.assertRaises(ValueError, setattr, node, pad, True)

if __name__ == "__main__":
unittest.main(verbosity=2)

+ 175
- 2
tests/test_builder.py View File

@@ -23,8 +23,8 @@
from __future__ import unicode_literals
import unittest

from mwparserfromhell.nodes import (Argument, Comment, Heading, HTMLEntity,
Tag, Template, Text, Wikilink)
from mwparserfromhell.nodes import (Argument, Comment, ExternalLink, Heading,
HTMLEntity, Tag, Template, Text, Wikilink)
from mwparserfromhell.nodes.extras import Attribute, Parameter
from mwparserfromhell.parser import tokens
from mwparserfromhell.parser.builder import Builder
@@ -72,6 +72,14 @@ class TestBuilder(TreeEqualityTestCase):
wrap([Template(wraptext("foo"), params=[
Parameter(wraptext("bar"), wraptext("baz"))])])),

([tokens.TemplateOpen(), tokens.TemplateParamSeparator(),
tokens.TemplateParamSeparator(), tokens.TemplateParamEquals(),
tokens.TemplateParamSeparator(), tokens.TemplateClose()],
wrap([Template(wrap([]), params=[
Parameter(wraptext("1"), wrap([]), showkey=False),
Parameter(wrap([]), wrap([]), showkey=True),
Parameter(wraptext("2"), wrap([]), showkey=False)])])),

([tokens.TemplateOpen(), tokens.Text(text="foo"),
tokens.TemplateParamSeparator(), tokens.Text(text="bar"),
tokens.TemplateParamEquals(), tokens.Text(text="baz"),
@@ -142,6 +150,48 @@ class TestBuilder(TreeEqualityTestCase):
for test, valid in tests:
self.assertWikicodeEqual(valid, self.builder.build(test))

def test_external_link(self):
"""tests for building ExternalLink nodes"""
tests = [
([tokens.ExternalLinkOpen(brackets=False),
tokens.Text(text="http://example.com/"),
tokens.ExternalLinkClose()],
wrap([ExternalLink(wraptext("http://example.com/"),
brackets=False)])),

([tokens.ExternalLinkOpen(brackets=True),
tokens.Text(text="http://example.com/"),
tokens.ExternalLinkClose()],
wrap([ExternalLink(wraptext("http://example.com/"))])),

([tokens.ExternalLinkOpen(brackets=True),
tokens.Text(text="http://example.com/"),
tokens.ExternalLinkSeparator(), tokens.ExternalLinkClose()],
wrap([ExternalLink(wraptext("http://example.com/"), wrap([]))])),

([tokens.ExternalLinkOpen(brackets=True),
tokens.Text(text="http://example.com/"),
tokens.ExternalLinkSeparator(), tokens.Text(text="Example"),
tokens.ExternalLinkClose()],
wrap([ExternalLink(wraptext("http://example.com/"),
wraptext("Example"))])),

([tokens.ExternalLinkOpen(brackets=False),
tokens.Text(text="http://example"), tokens.Text(text=".com/foo"),
tokens.ExternalLinkClose()],
wrap([ExternalLink(wraptext("http://example", ".com/foo"),
brackets=False)])),

([tokens.ExternalLinkOpen(brackets=True),
tokens.Text(text="http://example"), tokens.Text(text=".com/foo"),
tokens.ExternalLinkSeparator(), tokens.Text(text="Example"),
tokens.Text(text=" Web Page"), tokens.ExternalLinkClose()],
wrap([ExternalLink(wraptext("http://example", ".com/foo"),
wraptext("Example", " Web Page"))])),
]
for test, valid in tests:
self.assertWikicodeEqual(valid, self.builder.build(test))

def test_html_entity(self):
"""tests for building HTMLEntity nodes"""
tests = [
@@ -190,6 +240,129 @@ class TestBuilder(TreeEqualityTestCase):
for test, valid in tests:
self.assertWikicodeEqual(valid, self.builder.build(test))

def test_tag(self):
"""tests for building Tag nodes"""
tests = [
# <ref></ref>
([tokens.TagOpenOpen(), tokens.Text(text="ref"),
tokens.TagCloseOpen(padding=""), tokens.TagOpenClose(),
tokens.Text(text="ref"), tokens.TagCloseClose()],
wrap([Tag(wraptext("ref"), wrap([]),
closing_tag=wraptext("ref"))])),

# <ref name></ref>
([tokens.TagOpenOpen(), tokens.Text(text="ref"),
tokens.TagAttrStart(pad_first=" ", pad_before_eq="",
pad_after_eq=""),
tokens.Text(text="name"), tokens.TagCloseOpen(padding=""),
tokens.TagOpenClose(), tokens.Text(text="ref"),
tokens.TagCloseClose()],
wrap([Tag(wraptext("ref"), wrap([]),
attrs=[Attribute(wraptext("name"))])])),

# <ref name="abc" />
([tokens.TagOpenOpen(), tokens.Text(text="ref"),
tokens.TagAttrStart(pad_first=" ", pad_before_eq="",
pad_after_eq=""),
tokens.Text(text="name"), tokens.TagAttrEquals(),
tokens.TagAttrQuote(), tokens.Text(text="abc"),
tokens.TagCloseSelfclose(padding=" ")],
wrap([Tag(wraptext("ref"),
attrs=[Attribute(wraptext("name"), wraptext("abc"))],
self_closing=True, padding=" ")])),

# <br/>
([tokens.TagOpenOpen(), tokens.Text(text="br"),
tokens.TagCloseSelfclose(padding="")],
wrap([Tag(wraptext("br"), self_closing=True)])),

# <li>
([tokens.TagOpenOpen(), tokens.Text(text="li"),
tokens.TagCloseSelfclose(padding="", implicit=True)],
wrap([Tag(wraptext("li"), self_closing=True, implicit=True)])),

# </br>
([tokens.TagOpenOpen(invalid=True), tokens.Text(text="br"),
tokens.TagCloseSelfclose(padding="", implicit=True)],
wrap([Tag(wraptext("br"), self_closing=True, invalid=True,
implicit=True)])),

# </br/>
([tokens.TagOpenOpen(invalid=True), tokens.Text(text="br"),
tokens.TagCloseSelfclose(padding="")],
wrap([Tag(wraptext("br"), self_closing=True, invalid=True)])),

# <ref name={{abc}} foo="bar {{baz}}" abc={{de}}f ghi=j{{k}}{{l}}
# mno = "{{p}} [[q]] {{r}}">[[Source]]</ref>
([tokens.TagOpenOpen(), tokens.Text(text="ref"),
tokens.TagAttrStart(pad_first=" ", pad_before_eq="",
pad_after_eq=""),
tokens.Text(text="name"), tokens.TagAttrEquals(),
tokens.TemplateOpen(), tokens.Text(text="abc"),
tokens.TemplateClose(),
tokens.TagAttrStart(pad_first=" ", pad_before_eq="",
pad_after_eq=""),
tokens.Text(text="foo"), tokens.TagAttrEquals(),
tokens.TagAttrQuote(), tokens.Text(text="bar "),
tokens.TemplateOpen(), tokens.Text(text="baz"),
tokens.TemplateClose(),
tokens.TagAttrStart(pad_first=" ", pad_before_eq="",
pad_after_eq=""),
tokens.Text(text="abc"), tokens.TagAttrEquals(),
tokens.TemplateOpen(), tokens.Text(text="de"),
tokens.TemplateClose(), tokens.Text(text="f"),
tokens.TagAttrStart(pad_first=" ", pad_before_eq="",
pad_after_eq=""),
tokens.Text(text="ghi"), tokens.TagAttrEquals(),
tokens.Text(text="j"), tokens.TemplateOpen(),
tokens.Text(text="k"), tokens.TemplateClose(),
tokens.TemplateOpen(), tokens.Text(text="l"),
tokens.TemplateClose(),
tokens.TagAttrStart(pad_first=" \n ", pad_before_eq=" ",
pad_after_eq=" "),
tokens.Text(text="mno"), tokens.TagAttrEquals(),
tokens.TagAttrQuote(), tokens.TemplateOpen(),
tokens.Text(text="p"), tokens.TemplateClose(),
tokens.Text(text=" "), tokens.WikilinkOpen(),
tokens.Text(text="q"), tokens.WikilinkClose(),
tokens.Text(text=" "), tokens.TemplateOpen(),
tokens.Text(text="r"), tokens.TemplateClose(),
tokens.TagCloseOpen(padding=""), tokens.WikilinkOpen(),
tokens.Text(text="Source"), tokens.WikilinkClose(),
tokens.TagOpenClose(), tokens.Text(text="ref"),
tokens.TagCloseClose()],
wrap([Tag(wraptext("ref"), wrap([Wikilink(wraptext("Source"))]), [
Attribute(wraptext("name"),
wrap([Template(wraptext("abc"))]), False),
Attribute(wraptext("foo"), wrap([Text("bar "),
Template(wraptext("baz"))]), pad_first=" "),
Attribute(wraptext("abc"), wrap([Template(wraptext("de")),
Text("f")]), False),
Attribute(wraptext("ghi"), wrap([Text("j"),
Template(wraptext("k")),
Template(wraptext("l"))]), False),
Attribute(wraptext("mno"), wrap([Template(wraptext("p")),
Text(" "), Wikilink(wraptext("q")), Text(" "),
Template(wraptext("r"))]), True, " \n ", " ",
" ")])])),

# "''italic text''"
([tokens.TagOpenOpen(wiki_markup="''"), tokens.Text(text="i"),
tokens.TagCloseOpen(), tokens.Text(text="italic text"),
tokens.TagOpenClose(), tokens.Text(text="i"),
tokens.TagCloseClose()],
wrap([Tag(wraptext("i"), wraptext("italic text"),
wiki_markup="''")])),

# * bullet
([tokens.TagOpenOpen(wiki_markup="*"), tokens.Text(text="li"),
tokens.TagCloseSelfclose(), tokens.Text(text=" bullet")],
wrap([Tag(wraptext("li"), wiki_markup="*", self_closing=True),
Text(" bullet")])),
]
for test, valid in tests:
self.assertWikicodeEqual(valid, self.builder.build(test))

def test_integration(self):
"""a test for building a combination of templates together"""
# {{{{{{{{foo}}bar|baz=biz}}buzz}}usr|{{bin}}}}


+ 14
- 14
tests/test_docs.py View File

@@ -61,36 +61,36 @@ class TestDocs(unittest.TestCase):

def test_readme_2(self):
"""test a block of example code in the README"""
text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
temps = mwparserfromhell.parse(text).filter_templates()
if py3k:
res = "['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']"
else:
res = "[u'{{foo|{{bar}}={{baz|{{spam}}}}}}', u'{{bar}}', u'{{baz|{{spam}}}}', u'{{spam}}']"
self.assertPrint(temps, res)

def test_readme_3(self):
"""test a block of example code in the README"""
code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
if py3k:
self.assertPrint(code.filter_templates(),
self.assertPrint(code.filter_templates(recursive=False),
"['{{foo|this {{includes a|template}}}}']")
else:
self.assertPrint(code.filter_templates(),
self.assertPrint(code.filter_templates(recursive=False),
"[u'{{foo|this {{includes a|template}}}}']")
foo = code.filter_templates()[0]
foo = code.filter_templates(recursive=False)[0]
self.assertPrint(foo.get(1).value, "this {{includes a|template}}")
self.assertPrint(foo.get(1).value.filter_templates()[0],
"{{includes a|template}}")
self.assertPrint(foo.get(1).value.filter_templates()[0].get(1).value,
"template")

def test_readme_3(self):
"""test a block of example code in the README"""
text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
temps = mwparserfromhell.parse(text).filter_templates(recursive=True)
if py3k:
res = "['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']"
else:
res = "[u'{{foo|{{bar}}={{baz|{{spam}}}}}}', u'{{bar}}', u'{{baz|{{spam}}}}', u'{{spam}}']"
self.assertPrint(temps, res)

def test_readme_4(self):
"""test a block of example code in the README"""
text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
code = mwparserfromhell.parse(text)
for template in code.filter_templates():
if template.name == "cleanup" and not template.has_param("date"):
if template.name.matches("Cleanup") and not template.has("date"):
template.add("date", "July 2012")
res = "{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}"
self.assertPrint(code, res)


+ 130
- 0
tests/test_external_link.py View File

@@ -0,0 +1,130 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import str
from mwparserfromhell.nodes import ExternalLink, Text

from ._test_tree_equality import TreeEqualityTestCase, getnodes, wrap, wraptext

class TestExternalLink(TreeEqualityTestCase):
"""Test cases for the ExternalLink node."""

def test_unicode(self):
"""test ExternalLink.__unicode__()"""
node = ExternalLink(wraptext("http://example.com/"), brackets=False)
self.assertEqual("http://example.com/", str(node))
node2 = ExternalLink(wraptext("http://example.com/"))
self.assertEqual("[http://example.com/]", str(node2))
node3 = ExternalLink(wraptext("http://example.com/"), wrap([]))
self.assertEqual("[http://example.com/ ]", str(node3))
node4 = ExternalLink(wraptext("http://example.com/"),
wraptext("Example Web Page"))
self.assertEqual("[http://example.com/ Example Web Page]", str(node4))

def test_iternodes(self):
"""test ExternalLink.__iternodes__()"""
node1n1 = Text("http://example.com/")
node2n1 = Text("http://example.com/")
node2n2, node2n3 = Text("Example"), Text("Page")
node1 = ExternalLink(wrap([node1n1]), brackets=False)
node2 = ExternalLink(wrap([node2n1]), wrap([node2n2, node2n3]))
gen1 = node1.__iternodes__(getnodes)
gen2 = node2.__iternodes__(getnodes)
self.assertEqual((None, node1), next(gen1))
self.assertEqual((None, node2), next(gen2))
self.assertEqual((node1.url, node1n1), next(gen1))
self.assertEqual((node2.url, node2n1), next(gen2))
self.assertEqual((node2.title, node2n2), next(gen2))
self.assertEqual((node2.title, node2n3), next(gen2))
self.assertRaises(StopIteration, next, gen1)
self.assertRaises(StopIteration, next, gen2)

def test_strip(self):
"""test ExternalLink.__strip__()"""
node1 = ExternalLink(wraptext("http://example.com"), brackets=False)
node2 = ExternalLink(wraptext("http://example.com"))
node3 = ExternalLink(wraptext("http://example.com"), wrap([]))
node4 = ExternalLink(wraptext("http://example.com"), wraptext("Link"))
for a in (True, False):
for b in (True, False):
self.assertEqual("http://example.com", node1.__strip__(a, b))
self.assertEqual(None, node2.__strip__(a, b))
self.assertEqual(None, node3.__strip__(a, b))
self.assertEqual("Link", node4.__strip__(a, b))

def test_showtree(self):
"""test ExternalLink.__showtree__()"""
output = []
getter, marker = object(), object()
get = lambda code: output.append((getter, code))
mark = lambda: output.append(marker)
node1 = ExternalLink(wraptext("http://example.com"), brackets=False)
node2 = ExternalLink(wraptext("http://example.com"), wraptext("Link"))
node1.__showtree__(output.append, get, mark)
node2.__showtree__(output.append, get, mark)
valid = [
(getter, node1.url), "[", (getter, node2.url),
(getter, node2.title), "]"]
self.assertEqual(valid, output)

def test_url(self):
"""test getter/setter for the url attribute"""
url = wraptext("http://example.com/")
node1 = ExternalLink(url, brackets=False)
node2 = ExternalLink(url, wraptext("Example"))
self.assertIs(url, node1.url)
self.assertIs(url, node2.url)
node1.url = "mailto:héhehé@spam.com"
node2.url = "mailto:héhehé@spam.com"
self.assertWikicodeEqual(wraptext("mailto:héhehé@spam.com"), node1.url)
self.assertWikicodeEqual(wraptext("mailto:héhehé@spam.com"), node2.url)

def test_title(self):
"""test getter/setter for the title attribute"""
title = wraptext("Example!")
node1 = ExternalLink(wraptext("http://example.com/"), brackets=False)
node2 = ExternalLink(wraptext("http://example.com/"), title)
self.assertIs(None, node1.title)
self.assertIs(title, node2.title)
node2.title = None
self.assertIs(None, node2.title)
node2.title = "My Website"
self.assertWikicodeEqual(wraptext("My Website"), node2.title)

def test_brackets(self):
"""test getter/setter for the brackets attribute"""
node1 = ExternalLink(wraptext("http://example.com/"), brackets=False)
node2 = ExternalLink(wraptext("http://example.com/"), wraptext("Link"))
self.assertFalse(node1.brackets)
self.assertTrue(node2.brackets)
node1.brackets = True
node2.brackets = False
self.assertTrue(node1.brackets)
self.assertFalse(node2.brackets)
self.assertEqual("[http://example.com/]", str(node1))
self.assertEqual("http://example.com/", str(node2))

if __name__ == "__main__":
unittest.main(verbosity=2)

+ 3
- 3
tests/test_parser.py View File

@@ -36,9 +36,9 @@ class TestParser(TreeEqualityTestCase):
def test_use_c(self):
"""make sure the correct tokenizer is used"""
if parser.use_c:
self.assertTrue(parser.Parser(None)._tokenizer.USES_C)
self.assertTrue(parser.Parser()._tokenizer.USES_C)
parser.use_c = False
self.assertFalse(parser.Parser(None)._tokenizer.USES_C)
self.assertFalse(parser.Parser()._tokenizer.USES_C)

def test_parsing(self):
"""integration test for parsing overall"""
@@ -59,7 +59,7 @@ class TestParser(TreeEqualityTestCase):
]))
])
])
actual = parser.Parser(text).parse()
actual = parser.Parser().parse(text)
self.assertWikicodeEqual(expected, actual)

if __name__ == "__main__":


+ 315
- 0
tests/test_tag.py View File

@@ -0,0 +1,315 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import str
from mwparserfromhell.nodes import Tag, Template, Text
from mwparserfromhell.nodes.extras import Attribute
from ._test_tree_equality import TreeEqualityTestCase, getnodes, wrap, wraptext

agen = lambda name, value: Attribute(wraptext(name), wraptext(value))
agennq = lambda name, value: Attribute(wraptext(name), wraptext(value), False)
agenp = lambda name, v, a, b, c: Attribute(wraptext(name), v, True, a, b, c)
agenpnv = lambda name, a, b, c: Attribute(wraptext(name), None, True, a, b, c)

class TestTag(TreeEqualityTestCase):
"""Test cases for the Tag node."""

def test_unicode(self):
"""test Tag.__unicode__()"""
node1 = Tag(wraptext("ref"))
node2 = Tag(wraptext("span"), wraptext("foo"),
[agen("style", "color: red;")])
node3 = Tag(wraptext("ref"),
attrs=[agennq("name", "foo"),
agenpnv("some_attr", " ", "", "")],
self_closing=True)
node4 = Tag(wraptext("br"), self_closing=True, padding=" ")
node5 = Tag(wraptext("br"), self_closing=True, implicit=True)
node6 = Tag(wraptext("br"), self_closing=True, invalid=True,
implicit=True)
node7 = Tag(wraptext("br"), self_closing=True, invalid=True,
padding=" ")
node8 = Tag(wraptext("hr"), wiki_markup="----", self_closing=True)
node9 = Tag(wraptext("i"), wraptext("italics!"), wiki_markup="''")

self.assertEqual("<ref></ref>", str(node1))
self.assertEqual('<span style="color: red;">foo</span>', str(node2))
self.assertEqual("<ref name=foo some_attr/>", str(node3))
self.assertEqual("<br />", str(node4))
self.assertEqual("<br>", str(node5))
self.assertEqual("</br>", str(node6))
self.assertEqual("</br />", str(node7))
self.assertEqual("----", str(node8))
self.assertEqual("''italics!''", str(node9))

def test_iternodes(self):
"""test Tag.__iternodes__()"""
node1n1, node1n2 = Text("ref"), Text("foobar")
node2n1, node3n1, node3n2 = Text("bold text"), Text("img"), Text("id")
node3n3, node3n4, node3n5 = Text("foo"), Text("class"), Text("bar")

# <ref>foobar</ref>
node1 = Tag(wrap([node1n1]), wrap([node1n2]))
# '''bold text'''
node2 = Tag(wraptext("b"), wrap([node2n1]), wiki_markup="'''")
# <img id="foo" class="bar" />
node3 = Tag(wrap([node3n1]),
attrs=[Attribute(wrap([node3n2]), wrap([node3n3])),
Attribute(wrap([node3n4]), wrap([node3n5]))],
self_closing=True, padding=" ")

gen1 = node1.__iternodes__(getnodes)
gen2 = node2.__iternodes__(getnodes)
gen3 = node3.__iternodes__(getnodes)
self.assertEqual((None, node1), next(gen1))
self.assertEqual((None, node2), next(gen2))
self.assertEqual((None, node3), next(gen3))
self.assertEqual((node1.tag, node1n1), next(gen1))
self.assertEqual((node3.tag, node3n1), next(gen3))
self.assertEqual((node3.attributes[0].name, node3n2), next(gen3))
self.assertEqual((node3.attributes[0].value, node3n3), next(gen3))
self.assertEqual((node3.attributes[1].name, node3n4), next(gen3))
self.assertEqual((node3.attributes[1].value, node3n5), next(gen3))
self.assertEqual((node1.contents, node1n2), next(gen1))
self.assertEqual((node2.contents, node2n1), next(gen2))
self.assertEqual((node1.closing_tag, node1n1), next(gen1))
self.assertRaises(StopIteration, next, gen1)
self.assertRaises(StopIteration, next, gen2)
self.assertRaises(StopIteration, next, gen3)

def test_strip(self):
"""test Tag.__strip__()"""
node1 = Tag(wraptext("i"), wraptext("foobar"))
node2 = Tag(wraptext("math"), wraptext("foobar"))
node3 = Tag(wraptext("br"), self_closing=True)
for a in (True, False):
for b in (True, False):
self.assertEqual("foobar", node1.__strip__(a, b))
self.assertEqual(None, node2.__strip__(a, b))
self.assertEqual(None, node3.__strip__(a, b))

def test_showtree(self):
"""test Tag.__showtree__()"""
output = []
getter, marker = object(), object()
get = lambda code: output.append((getter, code))
mark = lambda: output.append(marker)
node1 = Tag(wraptext("ref"), wraptext("text"), [agen("name", "foo")])
node2 = Tag(wraptext("br"), self_closing=True, padding=" ")
node3 = Tag(wraptext("br"), self_closing=True, invalid=True,
implicit=True, padding=" ")
node1.__showtree__(output.append, get, mark)
node2.__showtree__(output.append, get, mark)
node3.__showtree__(output.append, get, mark)
valid = [
"<", (getter, node1.tag), (getter, node1.attributes[0].name),
" = ", marker, (getter, node1.attributes[0].value), ">",
(getter, node1.contents), "</", (getter, node1.closing_tag), ">",
"<", (getter, node2.tag), "/>", "</", (getter, node3.tag), ">"]
self.assertEqual(valid, output)

def test_tag(self):
"""test getter/setter for the tag attribute"""
tag = wraptext("ref")
node = Tag(tag, wraptext("text"))
self.assertIs(tag, node.tag)
self.assertIs(tag, node.closing_tag)
node.tag = "span"
self.assertWikicodeEqual(wraptext("span"), node.tag)
self.assertWikicodeEqual(wraptext("span"), node.closing_tag)
self.assertEqual("<span>text</span>", node)

def test_contents(self):
"""test getter/setter for the contents attribute"""
contents = wraptext("text")
node = Tag(wraptext("ref"), contents)
self.assertIs(contents, node.contents)
node.contents = "text and a {{template}}"
parsed = wrap([Text("text and a "), Template(wraptext("template"))])
self.assertWikicodeEqual(parsed, node.contents)
self.assertEqual("<ref>text and a {{template}}</ref>", node)

def test_attributes(self):
"""test getter for the attributes attribute"""
attrs = [agen("name", "bar")]
node1 = Tag(wraptext("ref"), wraptext("foo"))
node2 = Tag(wraptext("ref"), wraptext("foo"), attrs)
self.assertEqual([], node1.attributes)
self.assertIs(attrs, node2.attributes)

def test_wiki_markup(self):
"""test getter/setter for the wiki_markup attribute"""
node = Tag(wraptext("i"), wraptext("italic text"))
self.assertIs(None, node.wiki_markup)
node.wiki_markup = "''"
self.assertEqual("''", node.wiki_markup)
self.assertEqual("''italic text''", node)
node.wiki_markup = False
self.assertFalse(node.wiki_markup)
self.assertEqual("<i>italic text</i>", node)

def test_self_closing(self):
"""test getter/setter for the self_closing attribute"""
node = Tag(wraptext("ref"), wraptext("foobar"))
self.assertFalse(node.self_closing)
node.self_closing = True
self.assertTrue(node.self_closing)
self.assertEqual("<ref/>", node)
node.self_closing = 0
self.assertFalse(node.self_closing)
self.assertEqual("<ref>foobar</ref>", node)

def test_invalid(self):
"""test getter/setter for the invalid attribute"""
node = Tag(wraptext("br"), self_closing=True, implicit=True)
self.assertFalse(node.invalid)
node.invalid = True
self.assertTrue(node.invalid)
self.assertEqual("</br>", node)
node.invalid = 0
self.assertFalse(node.invalid)
self.assertEqual("<br>", node)

def test_implicit(self):
"""test getter/setter for the implicit attribute"""
node = Tag(wraptext("br"), self_closing=True)
self.assertFalse(node.implicit)
node.implicit = True
self.assertTrue(node.implicit)
self.assertEqual("<br>", node)
node.implicit = 0
self.assertFalse(node.implicit)
self.assertEqual("<br/>", node)

def test_padding(self):
"""test getter/setter for the padding attribute"""
node = Tag(wraptext("ref"), wraptext("foobar"))
self.assertEqual("", node.padding)
node.padding = " "
self.assertEqual(" ", node.padding)
self.assertEqual("<ref >foobar</ref>", node)
node.padding = None
self.assertEqual("", node.padding)
self.assertEqual("<ref>foobar</ref>", node)
self.assertRaises(ValueError, setattr, node, "padding", True)

def test_closing_tag(self):
"""test getter/setter for the closing_tag attribute"""
tag = wraptext("ref")
node = Tag(tag, wraptext("foobar"))
self.assertIs(tag, node.closing_tag)
node.closing_tag = "ref {{ignore me}}"
parsed = wrap([Text("ref "), Template(wraptext("ignore me"))])
self.assertWikicodeEqual(parsed, node.closing_tag)
self.assertEqual("<ref>foobar</ref {{ignore me}}>", node)

def test_has(self):
"""test Tag.has()"""
node = Tag(wraptext("ref"), wraptext("cite"), [agen("name", "foo")])
self.assertTrue(node.has("name"))
self.assertTrue(node.has(" name "))
self.assertTrue(node.has(wraptext("name")))
self.assertFalse(node.has("Name"))
self.assertFalse(node.has("foo"))

attrs = [agen("id", "foo"), agenp("class", "bar", " ", "\n", "\n"),
agen("foo", "bar"), agenpnv("foo", " ", " \n ", " \t")]
node2 = Tag(wraptext("div"), attrs=attrs, self_closing=True)
self.assertTrue(node2.has("id"))
self.assertTrue(node2.has("class"))
self.assertTrue(node2.has(attrs[1].pad_first + str(attrs[1].name) +
attrs[1].pad_before_eq))
self.assertTrue(node2.has(attrs[3]))
self.assertTrue(node2.has(str(attrs[3])))
self.assertFalse(node2.has("idclass"))
self.assertFalse(node2.has("id class"))
self.assertFalse(node2.has("id=foo"))

def test_get(self):
"""test Tag.get()"""
attrs = [agen("name", "foo")]
node = Tag(wraptext("ref"), wraptext("cite"), attrs)
self.assertIs(attrs[0], node.get("name"))
self.assertIs(attrs[0], node.get(" name "))
self.assertIs(attrs[0], node.get(wraptext("name")))
self.assertRaises(ValueError, node.get, "Name")
self.assertRaises(ValueError, node.get, "foo")

attrs = [agen("id", "foo"), agenp("class", "bar", " ", "\n", "\n"),
agen("foo", "bar"), agenpnv("foo", " ", " \n ", " \t")]
node2 = Tag(wraptext("div"), attrs=attrs, self_closing=True)
self.assertIs(attrs[0], node2.get("id"))
self.assertIs(attrs[1], node2.get("class"))
self.assertIs(attrs[1], node2.get(
attrs[1].pad_first + str(attrs[1].name) + attrs[1].pad_before_eq))
self.assertIs(attrs[3], node2.get(attrs[3]))
self.assertIs(attrs[3], node2.get(str(attrs[3])))
self.assertIs(attrs[3], node2.get(" foo"))
self.assertRaises(ValueError, node2.get, "idclass")
self.assertRaises(ValueError, node2.get, "id class")
self.assertRaises(ValueError, node2.get, "id=foo")

def test_add(self):
"""test Tag.add()"""
node = Tag(wraptext("ref"), wraptext("cite"))
node.add("name", "value")
node.add("name", "value", quoted=False)
node.add("name")
node.add(1, False)
node.add("style", "{{foobar}}")
node.add("name", "value", True, "\n", " ", " ")
attr1 = ' name="value"'
attr2 = " name=value"
attr3 = " name"
attr4 = ' 1="False"'
attr5 = ' style="{{foobar}}"'
attr6 = '\nname = "value"'
self.assertEqual(attr1, node.attributes[0])
self.assertEqual(attr2, node.attributes[1])
self.assertEqual(attr3, node.attributes[2])
self.assertEqual(attr4, node.attributes[3])
self.assertEqual(attr5, node.attributes[4])
self.assertEqual(attr6, node.attributes[5])
self.assertEqual(attr6, node.get("name"))
self.assertWikicodeEqual(wrap([Template(wraptext("foobar"))]),
node.attributes[4].value)
self.assertEqual("".join(("<ref", attr1, attr2, attr3, attr4, attr5,
attr6, ">cite</ref>")), node)

def test_remove(self):
"""test Tag.remove()"""
attrs = [agen("id", "foo"), agenp("class", "bar", " ", "\n", "\n"),
agen("foo", "bar"), agenpnv("foo", " ", " \n ", " \t")]
node = Tag(wraptext("div"), attrs=attrs, self_closing=True)
node.remove("class")
self.assertEqual('<div id="foo" foo="bar" foo \n />', node)
node.remove("foo")
self.assertEqual('<div id="foo"/>', node)
self.assertRaises(ValueError, node.remove, "foo")
node.remove("id")
self.assertEqual('<div/>', node)

if __name__ == "__main__":
unittest.main(verbosity=2)

+ 11
- 11
tests/test_template.py View File

@@ -115,23 +115,23 @@ class TestTemplate(TreeEqualityTestCase):
self.assertEqual([], node1.params)
self.assertIs(plist, node2.params)

def test_has_param(self):
"""test Template.has_param()"""
def test_has(self):
"""test Template.has()"""
node1 = Template(wraptext("foobar"))
node2 = Template(wraptext("foo"),
[pgenh("1", "bar"), pgens("\nabc ", "def")])
node3 = Template(wraptext("foo"),
[pgenh("1", "a"), pgens("b", "c"), pgens("1", "d")])
node4 = Template(wraptext("foo"), [pgenh("1", "a"), pgens("b", " ")])
self.assertFalse(node1.has_param("foobar"))
self.assertTrue(node2.has_param(1))
self.assertTrue(node2.has_param("abc"))
self.assertFalse(node2.has_param("def"))
self.assertTrue(node3.has_param("1"))
self.assertTrue(node3.has_param(" b "))
self.assertFalse(node4.has_param("b"))
self.assertTrue(node3.has_param("b", False))
self.assertTrue(node4.has_param("b", False))
self.assertFalse(node1.has("foobar"))
self.assertTrue(node2.has(1))
self.assertTrue(node2.has("abc"))
self.assertFalse(node2.has("def"))
self.assertTrue(node3.has("1"))
self.assertTrue(node3.has(" b "))
self.assertFalse(node4.has("b"))
self.assertTrue(node3.has("b", False))
self.assertTrue(node4.has("b", False))

def test_get(self):
"""test Template.get()"""


+ 3
- 3
tests/test_tokens.py View File

@@ -44,8 +44,8 @@ class TestTokens(unittest.TestCase):

self.assertEqual("bar", token2.foo)
self.assertEqual(123, token2.baz)
self.assertRaises(KeyError, lambda: token1.foo)
self.assertRaises(KeyError, lambda: token2.bar)
self.assertFalse(token1.foo)
self.assertFalse(token2.bar)

token1.spam = "eggs"
token2.foo = "ham"
@@ -53,7 +53,7 @@ class TestTokens(unittest.TestCase):

self.assertEqual("eggs", token1.spam)
self.assertEqual("ham", token2.foo)
self.assertRaises(KeyError, lambda: token2.baz)
self.assertFalse(token2.baz)
self.assertRaises(KeyError, delattr, token2, "baz")

def test_repr(self):


+ 115
- 73
tests/test_wikicode.py View File

@@ -21,6 +21,7 @@
# SOFTWARE.

from __future__ import unicode_literals
from functools import partial
import re
from types import GeneratorType
import unittest
@@ -122,66 +123,99 @@ class TestWikicode(TreeEqualityTestCase):
code3.insert(-1000, "derp")
self.assertEqual("derp{{foo}}bar[[baz]]", code3)

def _test_search(self, meth, expected):
"""Base test for insert_before(), insert_after(), and replace()."""
code = parse("{{a}}{{b}}{{c}}{{d}}{{e}}")
func = partial(meth, code)
func("{{b}}", "x", recursive=True)
func("{{d}}", "[[y]]", recursive=False)
func(code.get(2), "z")
self.assertEqual(expected[0], code)
self.assertRaises(ValueError, func, "{{r}}", "n", recursive=True)
self.assertRaises(ValueError, func, "{{r}}", "n", recursive=False)
fake = parse("{{a}}").get(0)
self.assertRaises(ValueError, func, fake, "n", recursive=True)
self.assertRaises(ValueError, func, fake, "n", recursive=False)

code2 = parse("{{a}}{{a}}{{a}}{{b}}{{b}}{{b}}")
func = partial(meth, code2)
func(code2.get(1), "c", recursive=False)
func("{{a}}", "d", recursive=False)
func(code2.get(-1), "e", recursive=True)
func("{{b}}", "f", recursive=True)
self.assertEqual(expected[1], code2)

code3 = parse("{{a|{{b}}|{{c|d={{f}}}}}}")
func = partial(meth, code3)
obj = code3.get(0).params[0].value.get(0)
self.assertRaises(ValueError, func, obj, "x", recursive=False)
func(obj, "x", recursive=True)
self.assertRaises(ValueError, func, "{{f}}", "y", recursive=False)
func("{{f}}", "y", recursive=True)
self.assertEqual(expected[2], code3)

code4 = parse("{{a}}{{b}}{{c}}{{d}}{{e}}{{f}}{{g}}{{h}}{{i}}{{j}}")
func = partial(meth, code4)
fake = parse("{{b}}{{c}}")
self.assertRaises(ValueError, func, fake, "q", recursive=False)
self.assertRaises(ValueError, func, fake, "q", recursive=True)
func("{{b}}{{c}}", "w", recursive=False)
func("{{d}}{{e}}", "x", recursive=True)
func(wrap(code4.nodes[-2:]), "y", recursive=False)
func(wrap(code4.nodes[-2:]), "z", recursive=True)
self.assertEqual(expected[3], code4)
self.assertRaises(ValueError, func, "{{c}}{{d}}", "q", recursive=False)
self.assertRaises(ValueError, func, "{{c}}{{d}}", "q", recursive=True)

code5 = parse("{{a|{{b}}{{c}}|{{f|{{g}}={{h}}{{i}}}}}}")
func = partial(meth, code5)
self.assertRaises(ValueError, func, "{{b}}{{c}}", "x", recursive=False)
func("{{b}}{{c}}", "x", recursive=True)
obj = code5.get(0).params[1].value.get(0).params[0].value
self.assertRaises(ValueError, func, obj, "y", recursive=False)
func(obj, "y", recursive=True)
self.assertEqual(expected[4], code5)

code6 = parse("here is {{some text and a {{template}}}}")
func = partial(meth, code6)
self.assertRaises(ValueError, func, "text and", "ab", recursive=False)
func("text and", "ab", recursive=True)
self.assertRaises(ValueError, func, "is {{some", "cd", recursive=False)
func("is {{some", "cd", recursive=True)
self.assertEqual(expected[5], code6)

def test_insert_before(self):
"""test Wikicode.insert_before()"""
code = parse("{{a}}{{b}}{{c}}{{d}}")
code.insert_before("{{b}}", "x", recursive=True)
code.insert_before("{{d}}", "[[y]]", recursive=False)
self.assertEqual("{{a}}x{{b}}{{c}}[[y]]{{d}}", code)
code.insert_before(code.get(2), "z")
self.assertEqual("{{a}}xz{{b}}{{c}}[[y]]{{d}}", code)
self.assertRaises(ValueError, code.insert_before, "{{r}}", "n",
recursive=True)
self.assertRaises(ValueError, code.insert_before, "{{r}}", "n",
recursive=False)

code2 = parse("{{a|{{b}}|{{c|d={{f}}}}}}")
code2.insert_before(code2.get(0).params[0].value.get(0), "x",
recursive=True)
code2.insert_before("{{f}}", "y", recursive=True)
self.assertEqual("{{a|x{{b}}|{{c|d=y{{f}}}}}}", code2)
self.assertRaises(ValueError, code2.insert_before, "{{f}}", "y",
recursive=False)
meth = lambda code, *args, **kw: code.insert_before(*args, **kw)
expected = [
"{{a}}xz{{b}}{{c}}[[y]]{{d}}{{e}}",
"d{{a}}cd{{a}}d{{a}}f{{b}}f{{b}}ef{{b}}",
"{{a|x{{b}}|{{c|d=y{{f}}}}}}",
"{{a}}w{{b}}{{c}}x{{d}}{{e}}{{f}}{{g}}{{h}}yz{{i}}{{j}}",
"{{a|x{{b}}{{c}}|{{f|{{g}}=y{{h}}{{i}}}}}}",
"here cdis {{some abtext and a {{template}}}}"]
self._test_search(meth, expected)

def test_insert_after(self):
"""test Wikicode.insert_after()"""
code = parse("{{a}}{{b}}{{c}}{{d}}")
code.insert_after("{{b}}", "x", recursive=True)
code.insert_after("{{d}}", "[[y]]", recursive=False)
self.assertEqual("{{a}}{{b}}x{{c}}{{d}}[[y]]", code)
code.insert_after(code.get(2), "z")
self.assertEqual("{{a}}{{b}}xz{{c}}{{d}}[[y]]", code)
self.assertRaises(ValueError, code.insert_after, "{{r}}", "n",
recursive=True)
self.assertRaises(ValueError, code.insert_after, "{{r}}", "n",
recursive=False)

code2 = parse("{{a|{{b}}|{{c|d={{f}}}}}}")
code2.insert_after(code2.get(0).params[0].value.get(0), "x",
recursive=True)
code2.insert_after("{{f}}", "y", recursive=True)
self.assertEqual("{{a|{{b}}x|{{c|d={{f}}y}}}}", code2)
self.assertRaises(ValueError, code2.insert_after, "{{f}}", "y",
recursive=False)
meth = lambda code, *args, **kw: code.insert_after(*args, **kw)
expected = [
"{{a}}{{b}}xz{{c}}{{d}}[[y]]{{e}}",
"{{a}}d{{a}}dc{{a}}d{{b}}f{{b}}f{{b}}fe",
"{{a|{{b}}x|{{c|d={{f}}y}}}}",
"{{a}}{{b}}{{c}}w{{d}}{{e}}x{{f}}{{g}}{{h}}{{i}}{{j}}yz",
"{{a|{{b}}{{c}}x|{{f|{{g}}={{h}}{{i}}y}}}}",
"here is {{somecd text andab a {{template}}}}"]
self._test_search(meth, expected)

def test_replace(self):
"""test Wikicode.replace()"""
code = parse("{{a}}{{b}}{{c}}{{d}}")
code.replace("{{b}}", "x", recursive=True)
code.replace("{{d}}", "[[y]]", recursive=False)
self.assertEqual("{{a}}x{{c}}[[y]]", code)
code.replace(code.get(1), "z")
self.assertEqual("{{a}}z{{c}}[[y]]", code)
self.assertRaises(ValueError, code.replace, "{{r}}", "n",
recursive=True)
self.assertRaises(ValueError, code.replace, "{{r}}", "n",
recursive=False)

code2 = parse("{{a|{{b}}|{{c|d={{f}}}}}}")
code2.replace(code2.get(0).params[0].value.get(0), "x", recursive=True)
code2.replace("{{f}}", "y", recursive=True)
self.assertEqual("{{a|x|{{c|d=y}}}}", code2)
self.assertRaises(ValueError, code2.replace, "y", "z", recursive=False)
meth = lambda code, *args, **kw: code.replace(*args, **kw)
expected = [
"{{a}}xz[[y]]{{e}}", "dcdffe", "{{a|x|{{c|d=y}}}}",
"{{a}}wx{{f}}{{g}}z", "{{a|x|{{f|{{g}}=y}}}}",
"here cd ab a {{template}}}}"]
self._test_search(meth, expected)

def test_append(self):
"""test Wikicode.append()"""
@@ -197,18 +231,25 @@ class TestWikicode(TreeEqualityTestCase):

def test_remove(self):
"""test Wikicode.remove()"""
code = parse("{{a}}{{b}}{{c}}{{d}}")
code.remove("{{b}}", recursive=True)
code.remove(code.get(1), recursive=True)
self.assertEqual("{{a}}{{d}}", code)
self.assertRaises(ValueError, code.remove, "{{r}}", recursive=True)
self.assertRaises(ValueError, code.remove, "{{r}}", recursive=False)

code2 = parse("{{a|{{b}}|{{c|d={{f}}{{h}}}}}}")
code2.remove(code2.get(0).params[0].value.get(0), recursive=True)
code2.remove("{{f}}", recursive=True)
self.assertEqual("{{a||{{c|d={{h}}}}}}", code2)
self.assertRaises(ValueError, code2.remove, "{{h}}", recursive=False)
meth = lambda code, obj, value, **kw: code.remove(obj, **kw)
expected = [
"{{a}}{{c}}", "", "{{a||{{c|d=}}}}", "{{a}}{{f}}",
"{{a||{{f|{{g}}=}}}}", "here a {{template}}}}"
]
self._test_search(meth, expected)

def test_matches(self):
"""test Wikicode.matches()"""
code1 = parse("Cleanup")
code2 = parse("\nstub<!-- TODO: make more specific -->")
self.assertTrue(code1.matches("Cleanup"))
self.assertTrue(code1.matches("cleanup"))
self.assertTrue(code1.matches(" cleanup\n"))
self.assertFalse(code1.matches("CLEANup"))
self.assertFalse(code1.matches("Blah"))
self.assertTrue(code2.matches("stub"))
self.assertTrue(code2.matches("Stub<!-- no, it's fine! -->"))
self.assertFalse(code2.matches("StuB"))

def test_filter_family(self):
"""test the Wikicode.i?filter() family of functions"""
@@ -219,11 +260,11 @@ class TestWikicode(TreeEqualityTestCase):

code = parse("a{{b}}c[[d]]{{{e}}}{{f}}[[g]]")
for func in (code.filter, ifilter(code)):
self.assertEqual(["a", "{{b}}", "c", "[[d]]", "{{{e}}}", "{{f}}",
"[[g]]"], func())
self.assertEqual(["a", "{{b}}", "b", "c", "[[d]]", "d", "{{{e}}}",
"e", "{{f}}", "f", "[[g]]", "g"], func())
self.assertEqual(["{{{e}}}"], func(forcetype=Argument))
self.assertIs(code.get(4), func(forcetype=Argument)[0])
self.assertEqual(["a", "c"], func(forcetype=Text))
self.assertEqual(list("abcdefg"), func(forcetype=Text))
self.assertEqual([], func(forcetype=Heading))
self.assertRaises(TypeError, func, forcetype=True)

@@ -235,11 +276,12 @@ class TestWikicode(TreeEqualityTestCase):
self.assertEqual(["{{{e}}}"], get_filter("arguments"))
self.assertIs(code.get(4), get_filter("arguments")[0])
self.assertEqual([], get_filter("comments"))
self.assertEqual([], get_filter("external_links"))
self.assertEqual([], get_filter("headings"))
self.assertEqual([], get_filter("html_entities"))
self.assertEqual([], get_filter("tags"))
self.assertEqual(["{{b}}", "{{f}}"], get_filter("templates"))
self.assertEqual(["a", "c"], get_filter("text"))
self.assertEqual(list("abcdefg"), get_filter("text"))
self.assertEqual(["[[d]]", "[[g]]"], get_filter("wikilinks"))

code2 = parse("{{a|{{b}}|{{c|d={{f}}{{h}}}}}}")
@@ -252,13 +294,13 @@ class TestWikicode(TreeEqualityTestCase):

code3 = parse("{{foobar}}{{FOO}}{{baz}}{{bz}}")
for func in (code3.filter, ifilter(code3)):
self.assertEqual(["{{foobar}}", "{{FOO}}"], func(matches=r"foo"))
self.assertEqual(["{{foobar}}", "{{FOO}}"], func(recursive=False, matches=r"foo"))
self.assertEqual(["{{foobar}}", "{{FOO}}"],
func(matches=r"^{{foo.*?}}"))
func(recursive=False, matches=r"^{{foo.*?}}"))
self.assertEqual(["{{foobar}}"],
func(matches=r"^{{foo.*?}}", flags=re.UNICODE))
self.assertEqual(["{{baz}}", "{{bz}}"], func(matches=r"^{{b.*?z"))
self.assertEqual(["{{baz}}"], func(matches=r"^{{b.+?z}}"))
func(recursive=False, matches=r"^{{foo.*?}}", flags=re.UNICODE))
self.assertEqual(["{{baz}}", "{{bz}}"], func(recursive=False, matches=r"^{{b.*?z"))
self.assertEqual(["{{baz}}"], func(recursive=False, matches=r"^{{b.+?z}}"))

self.assertEqual(["{{a|{{b}}|{{c|d={{f}}{{h}}}}}}"],
code2.filter_templates(recursive=False))


+ 473
- 0
tests/tokenizer/external_links.mwtest View File

@@ -0,0 +1,473 @@
name: basic
label: basic external link
input: "http://example.com/"
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com/"), ExternalLinkClose()]

---

name: basic_brackets
label: basic external link in brackets
input: "[http://example.com/]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/"), ExternalLinkClose()]

---

name: brackets_space
label: basic external link in brackets, with a space after
input: "[http://example.com/ ]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/"), ExternalLinkSeparator(), ExternalLinkClose()]

---

name: brackets_title
label: basic external link in brackets, with a title
input: "[http://example.com/ Example]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()]

---

name: brackets_multiword_title
label: basic external link in brackets, with a multi-word title
input: "[http://example.com/ Example Web Page]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/"), ExternalLinkSeparator(), Text(text="Example Web Page"), ExternalLinkClose()]

---

name: brackets_adjacent
label: three adjacent bracket-enclosed external links
input: "[http://foo.com/ Foo][http://bar.com/ Bar]\n[http://baz.com/ Baz]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://foo.com/"), ExternalLinkSeparator(), Text(text="Foo"), ExternalLinkClose(), ExternalLinkOpen(brackets=True), Text(text="http://bar.com/"), ExternalLinkSeparator(), Text(text="Bar"), ExternalLinkClose(), Text(text="\n"), ExternalLinkOpen(brackets=True), Text(text="http://baz.com/"), ExternalLinkSeparator(), Text(text="Baz"), ExternalLinkClose()]

---

name: brackets_newline_before
label: bracket-enclosed link with a newline before the title
input: "[http://example.com/ \nExample]"
output: [Text(text="["), ExternalLinkOpen(brackets=False), Text(text="http://example.com/"), ExternalLinkClose(), Text(text=" \nExample]")]

---

name: brackets_newline_inside
label: bracket-enclosed link with a newline in the title
input: "[http://example.com/ Example \nWeb Page]"
output: [Text(text="["), ExternalLinkOpen(brackets=False), Text(text="http://example.com/"), ExternalLinkClose(), Text(text=" Example \nWeb Page]")]

---

name: brackets_newline_after
label: bracket-enclosed link with a newline after the title
input: "[http://example.com/ Example\n]"
output: [Text(text="["), ExternalLinkOpen(brackets=False), Text(text="http://example.com/"), ExternalLinkClose(), Text(text=" Example\n]")]

---

name: brackets_space_before
label: bracket-enclosed link with a space before the URL
input: "[ http://example.com Example]"
output: [Text(text="[ "), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text=" Example]")]

---

name: brackets_title_like_url
label: bracket-enclosed link with a title that looks like a URL
input: "[http://example.com http://example.com]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com"), ExternalLinkSeparator(), Text(text="http://example.com"), ExternalLinkClose()]

---

name: brackets_recursive
label: bracket-enclosed link with a bracket-enclosed link as the title
input: "[http://example.com [http://example.com]]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com"), ExternalLinkSeparator(), Text(text="[http://example.com"), ExternalLinkClose(), Text(text="]")]

---

name: period_after
label: a period after a free link that is excluded
input: "http://example.com."
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text=".")]

---

name: colons_after
label: colons after a free link that are excluded
input: "http://example.com/foo:bar.:;baz!?,"
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com/foo:bar.:;baz"), ExternalLinkClose(), Text(text="!?,")]

---

name: close_paren_after_excluded
label: a closing parenthesis after a free link that is excluded
input: "http://example.)com)"
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.)com"), ExternalLinkClose(), Text(text=")")]

---

name: close_paren_after_included
label: a closing parenthesis after a free link that is included because of an opening parenthesis in the URL
input: "http://example.(com)"
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.(com)"), ExternalLinkClose()]

---

name: open_bracket_inside
label: an open bracket inside a free link that causes it to be ended abruptly
input: "http://foobar[baz.com"
output: [ExternalLinkOpen(brackets=False), Text(text="http://foobar"), ExternalLinkClose(), Text(text="[baz.com")]

---

name: brackets_period_after
label: a period after a bracket-enclosed link that is included
input: "[http://example.com. Example]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com."), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()]

---

name: brackets_colons_after
label: colons after a bracket-enclosed link that are included
input: "[http://example.com/foo:bar.:;baz!?, Example]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.com/foo:bar.:;baz!?,"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()]

---

name: brackets_close_paren_after_included
label: a closing parenthesis after a bracket-enclosed link that is included
input: "[http://example.)com) Example]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.)com)"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()]

---

name: brackets_close_paren_after_included_2
label: a closing parenthesis after a bracket-enclosed link that is also included
input: "[http://example.(com) Example]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://example.(com)"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()]

---

name: brackets_open_bracket_inside
label: an open bracket inside a bracket-enclosed link that is also included
input: "[http://foobar[baz.com Example]"
output: [ExternalLinkOpen(brackets=True), Text(text="http://foobar[baz.com"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()]

---

name: adjacent_space
label: two free links separated by a space
input: "http://example.com http://example.com"
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text=" "), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose()]

---

name: adjacent_newline
label: two free links separated by a newline
input: "http://example.com\nhttp://example.com"
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text="\n"), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose()]

---

name: adjacent_close_bracket
label: two free links separated by a close bracket
input: "http://example.com]http://example.com"
output: [ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text="]"), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose()]

---

name: html_entity_in_url
label: a HTML entity parsed correctly inside a free link
input: "http://exa&nbsp;mple.com/"
output: [ExternalLinkOpen(brackets=False), Text(text="http://exa"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="mple.com/"), ExternalLinkClose()]

---

name: template_in_url
label: a template parsed correctly inside a free link
input: "http://exa{{template}}mple.com/"
output: [ExternalLinkOpen(brackets=False), Text(text="http://exa"), TemplateOpen(), Text(text="template"), TemplateClose(), Text(text="mple.com/"), ExternalLinkClose()]

---

name: argument_in_url
label: an argument parsed correctly inside a free link
input: "http://exa{{{argument}}}mple.com/"
output: [ExternalLinkOpen(brackets=False), Text(text="http://exa"), ArgumentOpen(), Text(text="argument"), ArgumentClose(), Text(text="mple.com/"), ExternalLinkClose()]

---

name: wikilink_in_url
label: a wikilink that destroys a free link
input: "http://exa[[wikilink]]mple.com/"
output: [ExternalLinkOpen(brackets=False), Text(text="http://exa"), ExternalLinkClose(), WikilinkOpen(), Text(text="wikilink"), WikilinkClose(), Text(text="mple.com/")]

---

name: external_link_in_url
label: a bracketed link that destroys a free link
input: "http://exa[http://example.com/]mple.com/"
output: [ExternalLinkOpen(brackets=False), Text(text="http://exa"), ExternalLinkClose(), ExternalLinkOpen(brackets=True), Text(text="http://example.com/"), ExternalLinkClose(), Text(text="mple.com/")]

---

name: spaces_padding
label: spaces padding a free link
input: " http://example.com "
output: [Text(text=" "), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text=" ")]

---

name: text_and_spaces_padding
label: text and spaces padding a free link
input: "x http://example.com x"
output: [Text(text="x "), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose(), Text(text=" x")]

---

name: template_before
label: a template before a free link
input: "{{foo}}http://example.com"
output: [TemplateOpen(), Text(text="foo"), TemplateClose(), ExternalLinkOpen(brackets=False), Text(text="http://example.com"), ExternalLinkClose()]

---

name: spaces_padding_no_slashes
label: spaces padding a free link with no slashes after the colon
input: " mailto:example@example.com "
output: [Text(text=" "), ExternalLinkOpen(brackets=False), Text(text="mailto:example@example.com"), ExternalLinkClose(), Text(text=" ")]

---

name: text_and_spaces_padding_no_slashes
label: text and spaces padding a free link with no slashes after the colon
input: "x mailto:example@example.com x"
output: [Text(text="x "), ExternalLinkOpen(brackets=False), Text(text="mailto:example@example.com"), ExternalLinkClose(), Text(text=" x")]

---

name: template_before_no_slashes
label: a template before a free link with no slashes after the colon
input: "{{foo}}mailto:example@example.com"
output: [TemplateOpen(), Text(text="foo"), TemplateClose(), ExternalLinkOpen(brackets=False), Text(text="mailto:example@example.com"), ExternalLinkClose()]

---

name: no_slashes
label: a free link with no slashes after the colon
input: "mailto:example@example.com"
output: [ExternalLinkOpen(brackets=False), Text(text="mailto:example@example.com"), ExternalLinkClose()]

---

name: slashes_optional
label: a free link using a scheme that doesn't need slashes, but has them anyway
input: "mailto://example@example.com"
output: [ExternalLinkOpen(brackets=False), Text(text="mailto://example@example.com"), ExternalLinkClose()]

---

name: short
label: a very short free link
input: "mailto://abc"
output: [ExternalLinkOpen(brackets=False), Text(text="mailto://abc"), ExternalLinkClose()]

---

name: slashes_missing
label: slashes missing from a free link with a scheme that requires them
input: "http:example@example.com"
output: [Text(text="http:example@example.com")]

---

name: no_scheme_but_slashes
label: no scheme in a free link, but slashes (protocol-relative free links are not supported)
input: "//example.com"
output: [Text(text="//example.com")]

---

name: no_scheme_but_colon
label: no scheme in a free link, but a colon
input: " :example.com"
output: [Text(text=" :example.com")]

---

name: no_scheme_but_colon_and_slashes
label: no scheme in a free link, but a colon and slashes
input: " ://example.com"
output: [Text(text=" ://example.com")]

---

name: fake_scheme_no_slashes
label: a nonexistent scheme in a free link, without slashes
input: "fake:example.com"
output: [Text(text="fake:example.com")]

---

name: fake_scheme_slashes
label: a nonexistent scheme in a free link, with slashes
input: "fake://example.com"
output: [Text(text="fake://example.com")]

---

name: fake_scheme_brackets_no_slashes
label: a nonexistent scheme in a bracketed link, without slashes
input: "[fake:example.com]"
output: [Text(text="[fake:example.com]")]

---

name: fake_scheme_brackets_slashes
label: #=a nonexistent scheme in a bracketed link, with slashes
input: "[fake://example.com]"
output: [Text(text="[fake://example.com]")]

---

name: interrupted_scheme
label: an otherwise valid scheme with something in the middle of it, in a free link
input: "ht?tp://example.com"
output: [Text(text="ht?tp://example.com")]

---

name: interrupted_scheme_brackets
label: an otherwise valid scheme with something in the middle of it, in a bracketed link
input: "[ht?tp://example.com]"
output: [Text(text="[ht?tp://example.com]")]

---

name: no_slashes_brackets
label: no slashes after the colon in a bracketed link
input: "[mailto:example@example.com Example]"
output: [ExternalLinkOpen(brackets=True), Text(text="mailto:example@example.com"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()]

---

name: space_before_no_slashes_brackets
label: a space before a bracketed link with no slashes after the colon
input: "[ mailto:example@example.com Example]"
output: [Text(text="[ "), ExternalLinkOpen(brackets=False), Text(text="mailto:example@example.com"), ExternalLinkClose(), Text(text=" Example]")]

---

name: slashes_optional_brackets
label: a bracketed link using a scheme that doesn't need slashes, but has them anyway
input: "[mailto://example@example.com Example]"
output: [ExternalLinkOpen(brackets=True), Text(text="mailto://example@example.com"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()]

---

name: short_brackets
label: a very short link in brackets
input: "[mailto://abc Example]"
output: [ExternalLinkOpen(brackets=True), Text(text="mailto://abc"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()]

---

name: slashes_missing_brackets
label: slashes missing from a scheme that requires them in a bracketed link
input: "[http:example@example.com Example]"
output: [Text(text="[http:example@example.com Example]")]

---

name: protcol_relative
label: a protocol-relative link (in brackets)
input: "[//example.com Example]"
output: [ExternalLinkOpen(brackets=True), Text(text="//example.com"), ExternalLinkSeparator(), Text(text="Example"), ExternalLinkClose()]

---

name: scheme_missing_but_colon_brackets
label: scheme missing from a bracketed link, but with a colon
input: "[:example.com Example]"
output: [Text(text="[:example.com Example]")]

---

name: scheme_missing_but_colon_slashes_brackets
label: scheme missing from a bracketed link, but with a colon and slashes
input: "[://example.com Example]"
output: [Text(text="[://example.com Example]")]

---

name: unclosed_protocol_relative
label: an unclosed protocol-relative bracketed link
input: "[//example.com"
output: [Text(text="[//example.com")]

---

name: space_before_protcol_relative
label: a space before a protocol-relative bracketed link
input: "[ //example.com]"
output: [Text(text="[ //example.com]")]

---

name: unclosed_just_scheme
label: an unclosed bracketed link, ending after the scheme
input: "[http"
output: [Text(text="[http")]

---

name: unclosed_scheme_colon
label: an unclosed bracketed link, ending after the colon
input: "[http:"
output: [Text(text="[http:")]

---

name: unclosed_scheme_colon_slashes
label: an unclosed bracketed link, ending after the slashes
input: "[http://"
output: [Text(text="[http://")]

---

name: incomplete_bracket
label: just an open bracket
input: "["
output: [Text(text="[")]

---

name: incomplete_scheme_colon
label: a free link with just a scheme and a colon
input: "http:"
output: [Text(text="http:")]

---

name: incomplete_scheme_colon_slashes
label: a free link with just a scheme, colon, and slashes
input: "http://"
output: [Text(text="http://")]

---

name: brackets_scheme_but_no_url
label: brackets around a scheme and a colon
input: "[mailto:]"
output: [Text(text="[mailto:]")]

---

name: brackets_scheme_slashes_but_no_url
label: brackets around a scheme, colon, and slashes
input: "[http://]"
output: [Text(text="[http://]")]

---

name: brackets_scheme_title_but_no_url
label: brackets around a scheme, colon, and slashes, with a title
input: "[http:// Example]"
output: [Text(text="[http:// Example]")]

+ 14
- 0
tests/tokenizer/html_entities.mwtest View File

@@ -117,6 +117,20 @@ output: [Text(text="&;")]

---

name: invalid_partial_amp_pound
label: invalid entities: just an ampersand, pound sign
input: "&#"
output: [Text(text="&#")]

---

name: invalid_partial_amp_pound_x
label: invalid entities: just an ampersand, pound sign, x
input: "&#x"
output: [Text(text="&#x")]

---

name: invalid_partial_amp_pound_semicolon
label: invalid entities: an ampersand, pound sign, and semicolon
input: "&#;"


+ 28
- 0
tests/tokenizer/integration.mwtest View File

@@ -12,6 +12,13 @@ output: [TemplateOpen(), ArgumentOpen(), ArgumentOpen(), Text(text="foo"), Argum

---

name: link_in_template_name
label: a wikilink inside a template name, which breaks the template
input: "{{foo[[bar]]}}"
output: [Text(text="{{foo"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), Text(text="}}")]

---

name: rich_heading
label: a heading with templates/wikilinks in it
input: "== Head{{ing}} [[with]] {{{funky|{{stuf}}}}} =="
@@ -33,6 +40,13 @@ output: [Text(text="&n"), CommentStart(), Text(text="foo"), CommentEnd(), Text(t

---

name: rich_tags
label: a HTML tag with tons of other things in it
input: "{{dubious claim}}<ref name={{abc}} foo="bar {{baz}}" abc={{de}}f ghi=j{{k}}{{l}} \n mno = "{{p}} [[q]] {{r}}">[[Source]]</ref>"
output: [TemplateOpen(), Text(text="dubious claim"), TemplateClose(), TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TemplateOpen(), Text(text="abc"), TemplateClose(), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="foo"), TagAttrEquals(), TagAttrQuote(), Text(text="bar "), TemplateOpen(), Text(text="baz"), TemplateClose(), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="abc"), TagAttrEquals(), TemplateOpen(), Text(text="de"), TemplateClose(), Text(text="f"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="ghi"), TagAttrEquals(), Text(text="j"), TemplateOpen(), Text(text="k"), TemplateClose(), TemplateOpen(), Text(text="l"), TemplateClose(), TagAttrStart(pad_first=" \n ", pad_before_eq=" ", pad_after_eq=" "), Text(text="mno"), TagAttrEquals(), TagAttrQuote(), TemplateOpen(), Text(text="p"), TemplateClose(), Text(text=" "), WikilinkOpen(), Text(text="q"), WikilinkClose(), Text(text=" "), TemplateOpen(), Text(text="r"), TemplateClose(), TagCloseOpen(padding=""), WikilinkOpen(), Text(text="Source"), WikilinkClose(), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: wildcard
label: a wildcard assortment of various things
input: "{{{{{{{{foo}}bar|baz=biz}}buzz}}usr|{{bin}}}}"
@@ -44,3 +58,17 @@ name: wildcard_redux
label: an even wilder assortment of various things
input: "{{a|b|{{c|[[d]]{{{e}}}}}}}[[f|{{{g}}}<!--h-->]]{{i|j=&nbsp;}}"
output: [TemplateOpen(), Text(text="a"), TemplateParamSeparator(), Text(text="b"), TemplateParamSeparator(), TemplateOpen(), Text(text="c"), TemplateParamSeparator(), WikilinkOpen(), Text(text="d"), WikilinkClose(), ArgumentOpen(), Text(text="e"), ArgumentClose(), TemplateClose(), TemplateClose(), WikilinkOpen(), Text(text="f"), WikilinkSeparator(), ArgumentOpen(), Text(text="g"), ArgumentClose(), CommentStart(), Text(text="h"), CommentEnd(), WikilinkClose(), TemplateOpen(), Text(text="i"), TemplateParamSeparator(), Text(text="j"), TemplateParamEquals(), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), TemplateClose()]

---

name: link_inside_dl
label: an external link inside a def list, such that the external link is parsed
input: ";;;mailto:example"
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), ExternalLinkOpen(brackets=False), Text(text="mailto:example"), ExternalLinkClose()]

---

name: link_inside_dl_2
label: an external link inside a def list, such that the external link is not parsed
input: ";;;malito:example"
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="malito"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="example")]

+ 578
- 0
tests/tokenizer/tags.mwtest View File

@@ -0,0 +1,578 @@
name: basic
label: a basic tag with an open and close
input: "<ref></ref>"
output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: basic_selfclosing
label: a basic self-closing tag
input: "<ref/>"
output: [TagOpenOpen(), Text(text="ref"), TagCloseSelfclose(padding="")]

---

name: content
label: a tag with some content in the middle
input: "<ref>this is a reference</ref>"
output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=""), Text(text="this is a reference"), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: padded_open
label: a tag with some padding in the open tag
input: "<ref ></ref>"
output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=" "), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: padded_close
label: a tag with some padding in the close tag
input: "<ref></ref >"
output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref "), TagCloseClose()]

---

name: padded_selfclosing
label: a self-closing tag with padding
input: "<ref />"
output: [TagOpenOpen(), Text(text="ref"), TagCloseSelfclose(padding=" ")]

---

name: attribute
label: a tag with a single attribute
input: "<ref name></ref>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: attribute_value
label: a tag with a single attribute with a value
input: "<ref name=foo></ref>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), Text(text="foo"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: attribute_quoted
label: a tag with a single quoted attribute
input: "<ref name="foo bar"></ref>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagAttrQuote(), Text(text="foo bar"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: attribute_hyphen
label: a tag with a single attribute, containing a hyphen
input: "<ref name=foo-bar></ref>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), Text(text="foo-bar"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: attribute_quoted_hyphen
label: a tag with a single quoted attribute, containing a hyphen
input: "<ref name="foo-bar"></ref>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagAttrQuote(), Text(text="foo-bar"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: attribute_selfclosing
label: a self-closing tag with a single attribute
input: "<ref name/>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagCloseSelfclose(padding="")]

---

name: attribute_selfclosing_value
label: a self-closing tag with a single attribute with a value
input: "<ref name=foo/>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), Text(text="foo"), TagCloseSelfclose(padding="")]

---

name: attribute_selfclosing_value_quoted
label: a self-closing tag with a single quoted attribute
input: "<ref name="foo"/>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagAttrQuote(), Text(text="foo"), TagCloseSelfclose(padding="")]

---

name: nested_tag
label: a tag nested within the attributes of another
input: "<ref name=<span style="color: red;">foo</span>>citation</ref>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagOpenOpen(), Text(text="span"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="style"), TagAttrEquals(), TagAttrQuote(), Text(text="color: red;"), TagCloseOpen(padding=""), Text(text="foo"), TagOpenClose(), Text(text="span"), TagCloseClose(), TagCloseOpen(padding=""), Text(text="citation"), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: nested_tag_quoted
label: a tag nested within the attributes of another, quoted
input: "<ref name="<span style="color: red;">foo</span>">citation</ref>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagAttrQuote(), TagOpenOpen(), Text(text="span"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="style"), TagAttrEquals(), TagAttrQuote(), Text(text="color: red;"), TagCloseOpen(padding=""), Text(text="foo"), TagOpenClose(), Text(text="span"), TagCloseClose(), TagCloseOpen(padding=""), Text(text="citation"), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: nested_troll_tag
label: a bogus tag that appears to be nested within the attributes of another
input: "<ref name=</ ><//>>citation</ref>"
output: [Text(text="<ref name=</ ><//>>citation</ref>")]

---

name: nested_troll_tag_quoted
label: a bogus tag that appears to be nested within the attributes of another, quoted
input: "<ref name="</ ><//>">citation</ref>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="name"), TagAttrEquals(), TagAttrQuote(), Text(text="</ ><//>"), TagCloseOpen(padding=""), Text(text="citation"), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: invalid_space_begin_open
label: invalid tag: a space at the beginning of the open tag
input: "< ref>test</ref>"
output: [Text(text="< ref>test</ref>")]

---

name: invalid_space_begin_close
label: invalid tag: a space at the beginning of the close tag
input: "<ref>test</ ref>"
output: [Text(text="<ref>test</ ref>")]

---

name: valid_space_end
label: valid tag: spaces at the ends of both the open and close tags
input: "<ref >test</ref >"
output: [TagOpenOpen(), Text(text="ref"), TagCloseOpen(padding=" "), Text(text="test"), TagOpenClose(), Text(text="ref "), TagCloseClose()]

---

name: invalid_template_ends
label: invalid tag: a template at the ends of both the open and close tags
input: "<ref {{foo}}>test</ref {{foo}}>"
output: [Text(text="<ref "), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text=">test</ref "), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text=">")]

---

name: invalid_template_ends_nospace
label: invalid tag: a template at the ends of both the open and close tags, without spacing
input: "<ref {{foo}}>test</ref{{foo}}>"
output: [Text(text="<ref "), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text=">test</ref"), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text=">")]

---

name: valid_template_end_open
label: valid tag: a template at the end of the open tag
input: "<ref {{foo}}>test</ref>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), TemplateOpen(), Text(text="foo"), TemplateClose(), TagCloseOpen(padding=""), Text(text="test"), TagOpenClose(), Text(text="ref"), TagCloseClose()]

---

name: valid_template_end_open_space_end_close
label: valid tag: a template at the end of the open tag; whitespace at the end of the close tag
input: "<ref {{foo}}>test</ref\n>"
output: [TagOpenOpen(), Text(text="ref"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), TemplateOpen(), Text(text="foo"), TemplateClose(), TagCloseOpen(padding=""), Text(text="test"), TagOpenClose(), Text(text="ref\n"), TagCloseClose()]

---

name: invalid_template_end_open_nospace
label: invalid tag: a template at the end of the open tag, without spacing
input: "<ref{{foo}}>test</ref>"
output: [Text(text="<ref"), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text=">test</ref>")]

---

name: invalid_template_start_close
label: invalid tag: a template at the beginning of the close tag
input: "<ref>test</{{foo}}ref>"
output: [Text(text="<ref>test</"), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="ref>")]

---

name: invalid_template_start_open
label: invalid tag: a template at the beginning of the open tag
input: "<{{foo}}ref>test</ref>"
output: [Text(text="<"), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="ref>test</ref>")]

---

name: unclosed_quote
label: a quoted attribute that is never closed
input: "<span style="foobar>stuff</span>"
output: [TagOpenOpen(), Text(text="span"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="style"), TagAttrEquals(), Text(text="\"foobar"), TagCloseOpen(padding=""), Text(text="stuff"), TagOpenClose(), Text(text="span"), TagCloseClose()]

---

name: fake_quote
label: a fake quoted attribute
input: "<span style="foo"bar>stuff</span>"
output: [TagOpenOpen(), Text(text="span"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="style"), TagAttrEquals(), Text(text="\"foo\"bar"), TagCloseOpen(padding=""), Text(text="stuff"), TagOpenClose(), Text(text="span"), TagCloseClose()]

---

name: fake_quote_complex
label: a fake quoted attribute, with spaces and templates and links
input: "<span style="foo {{bar}}\n[[baz]]"buzz >stuff</span>"
output: [TagOpenOpen(), Text(text="span"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="style"), TagAttrEquals(), Text(text="\"foo"), TagAttrStart(pad_first=" ", pad_before_eq="\n", pad_after_eq=""), TemplateOpen(), Text(text="bar"), TemplateClose(), TagAttrStart(pad_first="", pad_before_eq=" ", pad_after_eq=""), WikilinkOpen(), Text(text="baz"), WikilinkClose(), Text(text="\"buzz"), TagCloseOpen(padding=""), Text(text="stuff"), TagOpenClose(), Text(text="span"), TagCloseClose()]

---

name: incomplete_lbracket
label: incomplete tags: just a left bracket
input: "<"
output: [Text(text="<")]

---

name: incomplete_lbracket_junk
label: incomplete tags: just a left bracket, surrounded by stuff
input: "foo<bar"
output: [Text(text="foo<bar")]

---

name: incomplete_unclosed_open
label: incomplete tags: an unclosed open tag
input: "junk <ref"
output: [Text(text="junk <ref")]

---

name: incomplete_unclosed_open_space
label: incomplete tags: an unclosed open tag, space
input: "junk <ref "
output: [Text(text="junk <ref ")]

---

name: incomplete_unclosed_open_unnamed_attr
label: incomplete tags: an unclosed open tag, unnamed attribute
input: "junk <ref name"
output: [Text(text="junk <ref name")]

---

name: incomplete_unclosed_open_attr_equals
label: incomplete tags: an unclosed open tag, attribute, equal sign
input: "junk <ref name="
output: [Text(text="junk <ref name=")]

---

name: incomplete_unclosed_open_attr_equals_quoted
label: incomplete tags: an unclosed open tag, attribute, equal sign, quote
input: "junk <ref name=""
output: [Text(text="junk <ref name=\"")]

---

name: incomplete_unclosed_open_attr
label: incomplete tags: an unclosed open tag, attribute with a key/value
input: "junk <ref name=foo"
output: [Text(text="junk <ref name=foo")]

---

name: incomplete_unclosed_open_attr_quoted
label: incomplete tags: an unclosed open tag, attribute with a key/value, quoted
input: "junk <ref name="foo""
output: [Text(text="junk <ref name=\"foo\"")]

---

name: incomplete_open
label: incomplete tags: an open tag
input: "junk <ref>"
output: [Text(text="junk <ref>")]

---

name: incomplete_open_unnamed_attr
label: incomplete tags: an open tag, unnamed attribute
input: "junk <ref name>"
output: [Text(text="junk <ref name>")]

---

name: incomplete_open_attr_equals
label: incomplete tags: an open tag, attribute, equal sign
input: "junk <ref name=>"
output: [Text(text="junk <ref name=>")]

---

name: incomplete_open_attr
label: incomplete tags: an open tag, attribute with a key/value
input: "junk <ref name=foo>"
output: [Text(text="junk <ref name=foo>")]

---

name: incomplete_open_attr_quoted
label: incomplete tags: an open tag, attribute with a key/value, quoted
input: "junk <ref name="foo">"
output: [Text(text="junk <ref name=\"foo\">")]

---

name: incomplete_open_text
label: incomplete tags: an open tag, text
input: "junk <ref>foo"
output: [Text(text="junk <ref>foo")]

---

name: incomplete_open_attr_text
label: incomplete tags: an open tag, attribute with a key/value, text
input: "junk <ref name=foo>bar"
output: [Text(text="junk <ref name=foo>bar")]

---

name: incomplete_open_text_lbracket
label: incomplete tags: an open tag, text, left open bracket
input: "junk <ref>bar<"
output: [Text(text="junk <ref>bar<")]

---

name: incomplete_open_text_lbracket_slash
label: incomplete tags: an open tag, text, left bracket, slash
input: "junk <ref>bar</"
output: [Text(text="junk <ref>bar</")]

---

name: incomplete_open_text_unclosed_close
label: incomplete tags: an open tag, text, unclosed close
input: "junk <ref>bar</ref"
output: [Text(text="junk <ref>bar</ref")]

---

name: incomplete_open_text_wrong_close
label: incomplete tags: an open tag, text, wrong close
input: "junk <ref>bar</span>"
output: [Text(text="junk <ref>bar</span>")]

---

name: incomplete_unclosed_close
label: incomplete tags: an unclosed close tag
input: "junk </"
output: [Text(text="junk </")]

---

name: incomplete_unclosed_close_text
label: incomplete tags: an unclosed close tag, with text
input: "junk </br"
output: [Text(text="junk </br")]

---

name: incomplete_close
label: incomplete tags: a close tag
input: "junk </ref>"
output: [Text(text="junk </ref>")]

---

name: incomplete_no_tag_name_open
label: incomplete tags: no tag name within brackets; just an open
input: "junk <>"
output: [Text(text="junk <>")]

---

name: incomplete_no_tag_name_selfclosing
label: incomplete tags: no tag name within brackets; self-closing
input: "junk < />"
output: [Text(text="junk < />")]

---

name: incomplete_no_tag_name_open_close
label: incomplete tags: no tag name within brackets; open and close
input: "junk <></>"
output: [Text(text="junk <></>")]

---

name: backslash_premature_before
label: a backslash before a quote before a space
input: "<foo attribute="this is\\" quoted">blah</foo>"
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="this is\\\" quoted"), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()]

---

name: backslash_premature_after
label: a backslash before a quote after a space
input: "<foo attribute="this is \\"quoted">blah</foo>"
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="this is \\\"quoted"), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()]

---

name: backslash_premature_middle
label: a backslash before a quote in the middle of a word
input: "<foo attribute="this i\\"s quoted">blah</foo>"
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="this i\\\"s quoted"), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()]

---

name: backslash_adjacent
label: escaped quotes next to unescaped quotes
input: "<foo attribute="\\"this is quoted\\"">blah</foo>"
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="\\\"this is quoted\\\""), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()]

---

name: backslash_endquote
label: backslashes before the end quote, causing the attribute to become unquoted
input: "<foo attribute="this_is quoted\\">blah</foo>"
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), Text(text="\"this_is"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="quoted\\\""), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()]

---

name: backslash_double
label: two adjacent backslashes, which do *not* affect the quote
input: "<foo attribute="this is\\\\" quoted">blah</foo>"
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="this is\\\\"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="quoted\""), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()]

---

name: backslash_triple
label: three adjacent backslashes, which do *not* affect the quote
input: "<foo attribute="this is\\\\\\" quoted">blah</foo>"
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="this is\\\\\\"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="quoted\""), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()]

---

name: backslash_unaffecting
label: backslashes near quotes, but not immediately adjacent, thus having no effect
input: "<foo attribute="\\quote\\d" also="quote\\d\\">blah</foo>"
output: [TagOpenOpen(), Text(text="foo"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attribute"), TagAttrEquals(), TagAttrQuote(), Text(text="\\quote\\d"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="also"), TagAttrEquals(), Text(text="\"quote\\d\\\""), TagCloseOpen(padding=""), Text(text="blah"), TagOpenClose(), Text(text="foo"), TagCloseClose()]

---

name: unparsable
label: a tag that should not be put through the normal parser
input: "{{t1}}<nowiki>{{t2}}</nowiki>{{t3}}"
output: [TemplateOpen(), Text(text="t1"), TemplateClose(), TagOpenOpen(), Text(text="nowiki"), TagCloseOpen(padding=""), Text(text="{{t2}}"), TagOpenClose(), Text(text="nowiki"), TagCloseClose(), TemplateOpen(), Text(text="t3"), TemplateClose()]

---

name: unparsable_complex
label: a tag that should not be put through the normal parser; lots of stuff inside
input: "{{t1}}<pre>{{t2}}\n==Heading==\nThis is some text with a [[page|link]].</pre>{{t3}}"
output: [TemplateOpen(), Text(text="t1"), TemplateClose(), TagOpenOpen(), Text(text="pre"), TagCloseOpen(padding=""), Text(text="{{t2}}\n==Heading==\nThis is some text with a [[page|link]]."), TagOpenClose(), Text(text="pre"), TagCloseClose(), TemplateOpen(), Text(text="t3"), TemplateClose()]

---

name: unparsable_attributed
label: a tag that should not be put through the normal parser; parsed attributes
input: "{{t1}}<nowiki attr=val attr2="{{val2}}">{{t2}}</nowiki>{{t3}}"
output: [TemplateOpen(), Text(text=u't1'), TemplateClose(), TagOpenOpen(), Text(text="nowiki"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attr"), TagAttrEquals(), Text(text="val"), TagAttrStart(pad_first=" ", pad_before_eq="", pad_after_eq=""), Text(text="attr2"), TagAttrEquals(), TagAttrQuote(), TemplateOpen(), Text(text="val2"), TemplateClose(), TagCloseOpen(padding=""), Text(text="{{t2}}"), TagOpenClose(), Text(text="nowiki"), TagCloseClose(), TemplateOpen(), Text(text="t3"), TemplateClose()]

---

name: unparsable_incomplete
label: a tag that should not be put through the normal parser; incomplete
input: "{{t1}}<nowiki>{{t2}}{{t3}}"
output: [TemplateOpen(), Text(text="t1"), TemplateClose(), Text(text="<nowiki>"), TemplateOpen(), Text(text="t2"), TemplateClose(), TemplateOpen(), Text(text="t3"), TemplateClose()]

---

name: unparsable_entity
label: a HTML entity inside unparsable text is still parsed
input: "{{t1}}<nowiki>{{t2}}&nbsp;{{t3}}</nowiki>{{t4}}"
output: [TemplateOpen(), Text(text="t1"), TemplateClose(), TagOpenOpen(), Text(text="nowiki"), TagCloseOpen(padding=""), Text(text="{{t2}}"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="{{t3}}"), TagOpenClose(), Text(text="nowiki"), TagCloseClose(), TemplateOpen(), Text(text="t4"), TemplateClose()]

---

name: unparsable_entity_incomplete
label: an incomplete HTML entity inside unparsable text
input: "<nowiki>&</nowiki>"
output: [TagOpenOpen(), Text(text="nowiki"), TagCloseOpen(padding=""), Text(text="&"), TagOpenClose(), Text(text="nowiki"), TagCloseClose()]

---

name: unparsable_entity_incomplete_2
label: an incomplete HTML entity inside unparsable text
input: "<nowiki>&"
output: [Text(text="<nowiki>&")]

---

name: single_open_close
label: a tag that supports being single; both an open and a close tag
input: "foo<li>bar{{baz}}</li>"
output: [Text(text="foo"), TagOpenOpen(), Text(text="li"), TagCloseOpen(padding=""), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose(), TagOpenClose(), Text(text="li"), TagCloseClose()]

---

name: single_open
label: a tag that supports being single; just an open tag
input: "foo<li>bar{{baz}}"
output: [Text(text="foo"), TagOpenOpen(), Text(text="li"), TagCloseSelfclose(padding="", implicit=True), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()]

---

name: single_selfclose
label: a tag that supports being single; a self-closing tag
input: "foo<li/>bar{{baz}}"
output: [Text(text="foo"), TagOpenOpen(), Text(text="li"), TagCloseSelfclose(padding=""), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()]

---

name: single_close
label: a tag that supports being single; just a close tag
input: "foo</li>bar{{baz}}"
output: [Text(text="foo</li>bar"), TemplateOpen(), Text(text="baz"), TemplateClose()]

---

name: single_only_open_close
label: a tag that can only be single; both an open and a close tag
input: "foo<br>bar{{baz}}</br>"
output: [Text(text="foo"), TagOpenOpen(), Text(text="br"), TagCloseSelfclose(padding="", implicit=True), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose(), TagOpenOpen(invalid=True), Text(text="br"), TagCloseSelfclose(padding="", implicit=True)]

---

name: single_only_open
label: a tag that can only be single; just an open tag
input: "foo<br>bar{{baz}}"
output: [Text(text="foo"), TagOpenOpen(), Text(text="br"), TagCloseSelfclose(padding="", implicit=True), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()]

---

name: single_only_selfclose
label: a tag that can only be single; a self-closing tag
input: "foo<br/>bar{{baz}}"
output: [Text(text="foo"), TagOpenOpen(), Text(text="br"), TagCloseSelfclose(padding=""), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()]

---

name: single_only_close
label: a tag that can only be single; just a close tag
input: "foo</br>bar{{baz}}"
output: [Text(text="foo"), TagOpenOpen(invalid=True), Text(text="br"), TagCloseSelfclose(padding="", implicit=True), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()]

---

name: single_only_double
label: a tag that can only be single; a tag with backslashes at the beginning and end
input: "foo</br/>bar{{baz}}"
output: [Text(text="foo"), TagOpenOpen(invalid=True), Text(text="br"), TagCloseSelfclose(padding=""), Text(text="bar"), TemplateOpen(), Text(text="baz"), TemplateClose()]

---

name: single_only_close_attribute
label: a tag that can only be single; presented as a close tag with an attribute
input: "</br id="break">"
output: [TagOpenOpen(invalid=True), Text(text="br"), TagAttrStart(pad_first=" ", pad_after_eq="", pad_before_eq=""), Text(text="id"), TagAttrEquals(), TagAttrQuote(), Text(text="break"), TagCloseSelfclose(padding="", implicit=True)]

---

name: capitalization
label: caps should be ignored within tag names
input: "<NoWiKi>{{test}}</nOwIkI>"
output: [TagOpenOpen(), Text(text="NoWiKi"), TagCloseOpen(padding=""), Text(text="{{test}}"), TagOpenClose(), Text(text="nOwIkI"), TagCloseClose()]

+ 523
- 0
tests/tokenizer/tags_wikimarkup.mwtest View File

@@ -0,0 +1,523 @@
name: basic_italics
label: basic italic text
input: "''text''"
output: [TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="text"), TagOpenClose(), Text(text="i"), TagCloseClose()]

---

name: basic_bold
label: basic bold text
input: "'''text'''"
output: [TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="text"), TagOpenClose(), Text(text="b"), TagCloseClose()]

---

name: basic_ul
label: basic unordered list
input: "*text"
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="text")]

---

name: basic_ol
label: basic ordered list
input: "#text"
output: [TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="text")]

---

name: basic_dt
label: basic description term
input: ";text"
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="text")]

---

name: basic_dd
label: basic description item
input: ":text"
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="text")]

---

name: basic_hr
label: basic horizontal rule
input: "----"
output: [TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose()]

---

name: complex_italics
label: italics with a lot in them
input: "''this is a&nbsp;test of [[Italic text|italics]] with {{plenty|of|stuff}}''"
output: [TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of "), WikilinkOpen(), Text(text="Italic text"), WikilinkSeparator(), Text(text="italics"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose(), TagOpenClose(), Text(text="i"), TagCloseClose()]

---

name: multiline_italics
label: italics spanning mulitple lines
input: "foo\nbar''testing\ntext\nspanning\n\n\n\n\nmultiple\nlines''foo\n\nbar"
output: [Text(text="foo\nbar"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="testing\ntext\nspanning\n\n\n\n\nmultiple\nlines"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="foo\n\nbar")]

---

name: unending_italics
label: italics without an ending tag
input: "''unending formatting!"
output: [Text(text="''unending formatting!")]

---

name: misleading_italics_end
label: italics with something that looks like an end but isn't
input: "''this is 'not' the en'd'<nowiki>''</nowiki>"
output: [Text(text="''this is 'not' the en'd'"), TagOpenOpen(), Text(text="nowiki"), TagCloseOpen(padding=""), Text(text="''"), TagOpenClose(), Text(text="nowiki"), TagCloseClose()]
]

---

name: italics_start_outside_end_inside
label: italics that start outside a link and end inside it
input: "''foo[[bar|baz'']]spam"
output: [Text(text="''foo"), WikilinkOpen(), Text(text="bar"), WikilinkSeparator(), Text(text="baz''"), WikilinkClose(), Text(text="spam")]

---

name: italics_start_inside_end_outside
label: italics that start inside a link and end outside it
input: "[[foo|''bar]]baz''spam"
output: [Text(text="[[foo|"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar]]baz"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="spam")]

---

name: complex_bold
label: bold with a lot in it
input: "'''this is a&nbsp;test of [[Bold text|bold]] with {{plenty|of|stuff}}'''"
output: [TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of "), WikilinkOpen(), Text(text="Bold text"), WikilinkSeparator(), Text(text="bold"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose(), TagOpenClose(), Text(text="b"), TagCloseClose()]

---

name: multiline_bold
label: bold spanning mulitple lines
input: "foo\nbar'''testing\ntext\nspanning\n\n\n\n\nmultiple\nlines'''foo\n\nbar"
output: [Text(text="foo\nbar"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="testing\ntext\nspanning\n\n\n\n\nmultiple\nlines"), TagOpenClose(), Text(text="b"), TagCloseClose(), Text(text="foo\n\nbar")]

---

name: unending_bold
label: bold without an ending tag
input: "'''unending formatting!"
output: [Text(text="'''unending formatting!")]

---

name: misleading_bold_end
label: bold with something that looks like an end but isn't
input: "'''this is 'not' the en''d'<nowiki>'''</nowiki>"
output: [Text(text="'"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="this is 'not' the en"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="d'"), TagOpenOpen(), Text(text="nowiki"), TagCloseOpen(padding=""), Text(text="'''"), TagOpenClose(), Text(text="nowiki"), TagCloseClose()]

---

name: bold_start_outside_end_inside
label: bold that start outside a link and end inside it
input: "'''foo[[bar|baz''']]spam"
output: [Text(text="'''foo"), WikilinkOpen(), Text(text="bar"), WikilinkSeparator(), Text(text="baz'''"), WikilinkClose(), Text(text="spam")]

---

name: bold_start_inside_end_outside
label: bold that start inside a link and end outside it
input: "[[foo|'''bar]]baz'''spam"
output: [Text(text="[[foo|"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bar]]baz"), TagOpenClose(), Text(text="b"), TagCloseClose(), Text(text="spam")]

---

name: bold_and_italics
label: bold and italics together
input: "this is '''''bold and italic text'''''!"
output: [Text(text="this is "), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bold and italic text"), TagOpenClose(), Text(text="b"), TagCloseClose(), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="!")]

---

name: both_then_bold
label: text that starts bold/italic, then is just bold
input: "'''''both''bold'''"
output: [TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="both"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="bold"), TagOpenClose(), Text(text="b"), TagCloseClose()]

---

name: both_then_italics
label: text that starts bold/italic, then is just italic
input: "'''''both'''italics''"
output: [TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="both"), TagOpenClose(), Text(text="b"), TagCloseClose(), Text(text="italics"), TagOpenClose(), Text(text="i"), TagCloseClose()]

---

name: bold_then_both
label: text that starts just bold, then is bold/italic
input: "'''bold''both'''''"
output: [TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bold"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="both"), TagOpenClose(), Text(text="i"), TagCloseClose(), TagOpenClose(), Text(text="b"), TagCloseClose()]

---

name: italics_then_both
label: text that starts just italic, then is bold/italic
input: "''italics'''both'''''"
output: [TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="italics"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="both"), TagOpenClose(), Text(text="b"), TagCloseClose(), TagOpenClose(), Text(text="i"), TagCloseClose()]

---

name: italics_then_bold
label: text that starts italic, then is bold
input: "none''italics'''''bold'''none"
output: [Text(text="none"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="italics"), TagOpenClose(), Text(text="i"), TagCloseClose(), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bold"), TagOpenClose(), Text(text="b"), TagCloseClose(), Text(text="none")]

---

name: bold_then_italics
label: text that starts bold, then is italic
input: "none'''bold'''''italics''none"
output: [Text(text="none"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bold"), TagOpenClose(), Text(text="b"), TagCloseClose(), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="italics"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text="none")]

---

name: five_three
label: five ticks to open, three to close (bold)
input: "'''''foobar'''"
output: [Text(text="''"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="foobar"), TagOpenClose(), Text(text="b"), TagCloseClose()]

---

name: five_two
label: five ticks to open, two to close (bold)
input: "'''''foobar''"
output: [Text(text="'''"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="foobar"), TagOpenClose(), Text(text="i"), TagCloseClose()]

---

name: four
label: four ticks
input: "foo ''''bar'''' baz"
output: [Text(text="foo '"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="bar'"), TagOpenClose(), Text(text="b"), TagCloseClose(), Text(text=" baz")]

---

name: four_two
label: four ticks to open, two to close
input: "foo ''''bar'' baz"
output: [Text(text="foo ''"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text=" baz")]

---

name: two_three
label: two ticks to open, three to close
input: "foo ''bar''' baz"
output: [Text(text="foo "), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar'"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text=" baz")]

---

name: two_four
label: two ticks to open, four to close
input: "foo ''bar'''' baz"
output: [Text(text="foo "), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar''"), TagOpenClose(), Text(text="i"), TagCloseClose(), Text(text=" baz")]

---

name: two_three_two
label: two ticks to open, three to close, two afterwards
input: "foo ''bar''' baz''"
output: [Text(text="foo "), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), Text(text="bar''' baz"), TagOpenClose(), Text(text="i"), TagCloseClose()]

---

name: two_four_four
label: two ticks to open, four to close, four afterwards
input: "foo ''bar'''' baz''''"
output: [Text(text="foo ''bar'"), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text=" baz'"), TagOpenClose(), Text(text="b"), TagCloseClose()]

---

name: seven
label: seven ticks
input: "'''''''seven'''''''"
output: [Text(text="''"), TagOpenOpen(wiki_markup="''"), Text(text="i"), TagCloseOpen(), TagOpenOpen(wiki_markup="'''"), Text(text="b"), TagCloseOpen(), Text(text="seven''"), TagOpenClose(), Text(text="b"), TagCloseClose(), TagOpenClose(), Text(text="i"), TagCloseClose()]

---

name: complex_ul
label: ul with a lot in it
input: "* this is a&nbsp;test of an [[Unordered list|ul]] with {{plenty|of|stuff}}"
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text=" this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of an "), WikilinkOpen(), Text(text="Unordered list"), WikilinkSeparator(), Text(text="ul"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose()]

---

name: ul_multiline_template
label: ul with a template that spans multiple lines
input: "* this has a template with a {{line|\nbreak}}\nthis is not part of the list"
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text=" this has a template with a "), TemplateOpen(), Text(text="line"), TemplateParamSeparator(), Text(text="\nbreak"), TemplateClose(), Text(text="\nthis is not part of the list")]

---

name: ul_adjacent
label: multiple adjacent uls
input: "a\n*b\n*c\nd\n*e\nf"
output: [Text(text="a\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="c\nd\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="e\nf")]

---

name: ul_depths
label: multiple adjacent uls, with differing depths
input: "*a\n**b\n***c\n********d\n**e\nf\n***g"
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="g")]

---

name: ul_space_before
label: uls with space before them
input: "foo *bar\n *baz\n*buzz"
output: [Text(text="foo *bar\n *baz\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="buzz")]

---

name: ul_interruption
label: high-depth ul with something blocking it
input: "**f*oobar"
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="f*oobar")]

---

name: complex_ol
label: ol with a lot in it
input: "# this is a&nbsp;test of an [[Ordered list|ol]] with {{plenty|of|stuff}}"
output: [TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text=" this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of an "), WikilinkOpen(), Text(text="Ordered list"), WikilinkSeparator(), Text(text="ol"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose()]

---

name: ol_multiline_template
label: ol with a template that spans moltiple lines
input: "# this has a template with a {{line|\nbreak}}\nthis is not part of the list"
output: [TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text=" this has a template with a "), TemplateOpen(), Text(text="line"), TemplateParamSeparator(), Text(text="\nbreak"), TemplateClose(), Text(text="\nthis is not part of the list")]

---

name: ol_adjacent
label: moltiple adjacent ols
input: "a\n#b\n#c\nd\n#e\nf"
output: [Text(text="a\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="c\nd\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="e\nf")]

---

name: ol_depths
label: moltiple adjacent ols, with differing depths
input: "#a\n##b\n###c\n########d\n##e\nf\n###g"
output: [TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="g")]

---

name: ol_space_before
label: ols with space before them
input: "foo #bar\n #baz\n#buzz"
output: [Text(text="foo #bar\n #baz\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="buzz")]

---

name: ol_interruption
label: high-depth ol with something blocking it
input: "##f#oobar"
output: [TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="f#oobar")]

---

name: ul_ol_mix
label: a mix of adjacent uls and ols
input: "*a\n*#b\n*##c\n*##*#*#*d\n*#e\nf\n##*g"
output: [TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="g")]

---

name: complex_dt
label: dt with a lot in it
input: "; this is a&nbsp;test of an [[description term|dt]] with {{plenty|of|stuff}}"
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text=" this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of an "), WikilinkOpen(), Text(text="description term"), WikilinkSeparator(), Text(text="dt"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose()]

---

name: dt_multiline_template
label: dt with a template that spans mdttiple lines
input: "; this has a template with a {{line|\nbreak}}\nthis is not part of the list"
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text=" this has a template with a "), TemplateOpen(), Text(text="line"), TemplateParamSeparator(), Text(text="\nbreak"), TemplateClose(), Text(text="\nthis is not part of the list")]

---

name: dt_adjacent
label: mdttiple adjacent dts
input: "a\n;b\n;c\nd\n;e\nf"
output: [Text(text="a\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="c\nd\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="e\nf")]

---

name: dt_depths
label: mdttiple adjacent dts, with differing depths
input: ";a\n;;b\n;;;c\n;;;;;;;;d\n;;e\nf\n;;;g"
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="g")]

---

name: dt_space_before
label: dts with space before them
input: "foo ;bar\n ;baz\n;buzz"
output: [Text(text="foo ;bar\n ;baz\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="buzz")]

---

name: dt_interruption
label: high-depth dt with something blocking it
input: ";;f;oobar"
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="f;oobar")]

---

name: complex_dd
label: dd with a lot in it
input: ": this is a&nbsp;test of an [[description item|dd]] with {{plenty|of|stuff}}"
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text=" this is a"), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), Text(text="test of an "), WikilinkOpen(), Text(text="description item"), WikilinkSeparator(), Text(text="dd"), WikilinkClose(), Text(text=" with "), TemplateOpen(), Text(text="plenty"), TemplateParamSeparator(), Text(text="of"), TemplateParamSeparator(), Text(text="stuff"), TemplateClose()]

---

name: dd_multiline_template
label: dd with a template that spans mddtiple lines
input: ": this has a template with a {{line|\nbreak}}\nthis is not part of the list"
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text=" this has a template with a "), TemplateOpen(), Text(text="line"), TemplateParamSeparator(), Text(text="\nbreak"), TemplateClose(), Text(text="\nthis is not part of the list")]

---

name: dd_adjacent
label: mddtiple adjacent dds
input: "a\n:b\n:c\nd\n:e\nf"
output: [Text(text="a\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="c\nd\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="e\nf")]

---

name: dd_depths
label: mddtiple adjacent dds, with differing depths
input: ":a\n::b\n:::c\n::::::::d\n::e\nf\n:::g"
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="g")]

---

name: dd_space_before
label: dds with space before them
input: "foo :bar\n :baz\n:buzz"
output: [Text(text="foo :bar\n :baz\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="buzz")]

---

name: dd_interruption
label: high-depth dd with something blocking it
input: "::f:oobar"
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="f:oobar")]

---

name: dt_dd_mix
label: a mix of adjacent dts and dds
input: ";a\n;:b\n;::c\n;::;:;:;d\n;:e\nf\n::;g"
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="a\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="b\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="c\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="d\n"), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="e\nf\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="g")]

---

name: dt_dd_mix2
label: the correct usage of a dt/dd unit, as in a dl
input: ";foo:bar:baz"
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="foo"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="bar:baz")]

---

name: dt_dd_mix3
label: another example of correct (but strange) dt/dd usage
input: ":;;::foo:bar:baz"
output: [TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="foo"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="bar:baz")]

---

name: ul_ol_dt_dd_mix
label: an assortment of uls, ols, dds, and dts
input: ";:#*foo\n:#*;foo\n#*;:foo\n*;:#foo"
output: [TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), Text(text="foo\n"), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), Text(text="foo\n"), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), Text(text="foo\n"), TagOpenOpen(wiki_markup="*"), Text(text="li"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=";"), Text(text="dt"), TagCloseSelfclose(), TagOpenOpen(wiki_markup=":"), Text(text="dd"), TagCloseSelfclose(), TagOpenOpen(wiki_markup="#"), Text(text="li"), TagCloseSelfclose(), Text(text="foo")]

---

name: hr_text_before
label: text before an otherwise-valid hr
input: "foo----"
output: [Text(text="foo----")]

---

name: hr_text_after
label: text after a valid hr
input: "----bar"
output: [TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose(), Text(text="bar")]

---

name: hr_text_before_after
label: text at both ends of an otherwise-valid hr
input: "foo----bar"
output: [Text(text="foo----bar")]

---

name: hr_newlines
label: newlines surrounding a valid hr
input: "foo\n----\nbar"
output: [Text(text="foo\n"), TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose(), Text(text="\nbar")]

---

name: hr_adjacent
label: two adjacent hrs
input: "----\n----"
output: [TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose(), Text(text="\n"), TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose()]

---

name: hr_adjacent_space
label: two adjacent hrs, with a space before the second one, making it invalid
input: "----\n ----"
output: [TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose(), Text(text="\n ----")]

---

name: hr_short
label: an invalid three-hyphen-long hr
input: "---"
output: [Text(text="---")]

---

name: hr_long
label: a very long, valid hr
input: "------------------------------------------"
output: [TagOpenOpen(wiki_markup="------------------------------------------"), Text(text="hr"), TagCloseSelfclose()]

---

name: hr_interruption_short
label: a hr that is interrupted, making it invalid
input: "---x-"
output: [Text(text="---x-")]

---

name: hr_interruption_long
label: a hr that is interrupted, but the first part remains valid because it is long enough
input: "----x--"
output: [TagOpenOpen(wiki_markup="----"), Text(text="hr"), TagCloseSelfclose(), Text(text="x--")]

---

name: nowiki_cancel
label: a nowiki tag before a list causes it to not be parsed
input: "<nowiki />* Unordered list"
output: [TagOpenOpen(), Text(text="nowiki"), TagCloseSelfclose(padding=" "), Text(text="* Unordered list")]

+ 7
- 0
tests/tokenizer/text.mwtest View File

@@ -23,3 +23,10 @@ name: unicode2
label: additional unicode check for non-BMP codepoints
input: "𐌲𐌿𐍄𐌰𐍂𐌰𐌶𐌳𐌰"
output: [Text(text="𐌲𐌿𐍄𐌰𐍂𐌰𐌶𐌳𐌰")]

---

name: large
label: a lot of text, requiring multiple textbuffer blocks in the C tokenizer
input: "ZWfsZYcZyhGbkDYJiguJuuhsNyHGFkFhnjkbLJyXIygTHqcXdhsDkEOTSIKYlBiohLIkiXxvyebUyCGvvBcYqFdtcftGmaAanKXEIyYSEKlTfEEbdGhdePVwVImOyKiHSzAEuGyEVRIKPZaNjQsYqpqARIQfvAklFtQyTJVGlLwjJIxYkiqmHBmdOvTyNqJRbMvouoqXRyOhYDwowtkcZGSOcyzVxibQdnzhDYbrgbatUrlOMRvFSzmLWHRihtXnddwYadPgFWUOxAzAgddJVDXHerawdkrRuWaEXfuwQSkQUmLEJUmrgXDVlXCpciaisfuOUjBldElygamkkXbewzLucKRnAEBimIIotXeslRRhnqQjrypnLQvvdCsKFWPVTZaHvzJMFEahDHWcCbyXgxFvknWjhVfiLSDuFhGoFxqSvhjnnRZLmCMhmWeOgSoanDEInKTWHnbpKyUlabLppITDFFxyWKAnUYJQIcmYnrvMmzmtYvsbCYbebgAhMFVVFAKUSvlkLFYluDpbpBaNFWyfXTaOdSBrfiHDTWGBTUCXMqVvRCIMrEjWpQaGsABkioGnveQWqBTDdRQlxQiUipwfyqAocMddXqdvTHhEwjEzMkOSWVPjJvDtClhYwpvRztPmRKCSpGIpXQqrYtTLmShFdpKtOxGtGOZYIdyUGPjdmyvhJTQMtgYJWUUZnecRjBfQXsyWQWikyONySLzLEqRFqcJYdRNFcGwWZtfZasfFWcvdsHRXoqKlKYihRAOJdrPBDdxksXFwKceQVncmFXfUfBsNgjKzoObVExSnRnjegeEhqxXzPmFcuiasViAFeaXrAxXhSfSyCILkKYpjxNeKynUmdcGAbwRwRnlAFbOSCafmzXddiNpLCFTHBELvArdXFpKUGpSHRekhrMedMRNkQzmSyFKjVwiWwCvbNWjgxJRzYeRxHiCCRMXktmKBxbxGZvOpvZIJOwvGIxcBLzsMFlDqAMLtScdsJtrbIUAvKfcdChXGnBzIxGxXMgxJhayrziaCswdpjJJJhkaYnGhHXqZwOzHFdhhUIEtfjERdLaSPRTDDMHpQtonNaIgXUYhjdbnnKppfMBxgNSOOXJAPtFjfAKnrRDrumZBpNhxMstqjTGBViRkDqbTdXYUirsedifGYzZpQkvdNhtFTOPgsYXYCwZHLcSLSfwfpQKtWfZuRUUryHJsbVsAOQcIJdSKKlOvCeEjUQNRPHKXuBJUjPuaAJJxcDMqyaufqfVwUmHLdjeYZzSiiGLHOTCInpVAalbXXTMLugLiwFiyPSuSFiyJUKVrWjbZAHaJtZnQmnvorRrxdPKThqXzNgTjszQiCoMczRnwGYJMERUWGXFyrSbAqsHmLwLlnJOJoXNsjVehQjVOpQOQJAZWwFZBlgyVIplzLTlFwumPgBLYrUIAJAcmvHPGfHfWQguCjfTYzxYfbohaLFAPwxFRrNuCdCzLlEbuhyYjCmuDBTJDMCdLpNRVqEALjnPSaBPsKWRCKNGwEMFpiEWbYZRwaMopjoUuBUvMpvyLfsPKDrfQLiFOQIWPtLIMoijUEUYfhykHrSKbTtrvjwIzHdWZDVwLIpNkloCqpzIsErxxKAFuFEjikWNYChqYqVslXMtoSWzNhbMuxYbzLfJIcPGoUeGPkGyPQNhDyrjgdKekzftFrRPTuyLYqCArkDcWHTrjPQHfoThBNnTQyMwLEWxEnBXLtzJmFVLGEPrdbEwlXpgYfnVnWoNXgPQKKyiXifpvrmJATzQOzYwFhliiYxlbnsEPKbHYUfJLrwYPfSUwTIHiEvBFMrEtVmqJobfcwsiiEudTIiAnrtuywgKLOiMYbEIOAOJdOXqroPjWnQQcTNxFvkIEIsuHLyhSqSphuSmlvknzydQEnebOreeZwOouXYKlObAkaWHhOdTFLoMCHOWrVKeXjcniaxtgCziKEqWOZUWHJQpcDJzYnnduDZrmxgjZroBRwoPBUTJMYipsgJwbTSlvMyXXdAmiEWGMiQxhGvHGPLOKeTxNaLnFVbWpiYIVyqN"
output: [Text(text="ZWfsZYcZyhGbkDYJiguJuuhsNyHGFkFhnjkbLJyXIygTHqcXdhsDkEOTSIKYlBiohLIkiXxvyebUyCGvvBcYqFdtcftGmaAanKXEIyYSEKlTfEEbdGhdePVwVImOyKiHSzAEuGyEVRIKPZaNjQsYqpqARIQfvAklFtQyTJVGlLwjJIxYkiqmHBmdOvTyNqJRbMvouoqXRyOhYDwowtkcZGSOcyzVxibQdnzhDYbrgbatUrlOMRvFSzmLWHRihtXnddwYadPgFWUOxAzAgddJVDXHerawdkrRuWaEXfuwQSkQUmLEJUmrgXDVlXCpciaisfuOUjBldElygamkkXbewzLucKRnAEBimIIotXeslRRhnqQjrypnLQvvdCsKFWPVTZaHvzJMFEahDHWcCbyXgxFvknWjhVfiLSDuFhGoFxqSvhjnnRZLmCMhmWeOgSoanDEInKTWHnbpKyUlabLppITDFFxyWKAnUYJQIcmYnrvMmzmtYvsbCYbebgAhMFVVFAKUSvlkLFYluDpbpBaNFWyfXTaOdSBrfiHDTWGBTUCXMqVvRCIMrEjWpQaGsABkioGnveQWqBTDdRQlxQiUipwfyqAocMddXqdvTHhEwjEzMkOSWVPjJvDtClhYwpvRztPmRKCSpGIpXQqrYtTLmShFdpKtOxGtGOZYIdyUGPjdmyvhJTQMtgYJWUUZnecRjBfQXsyWQWikyONySLzLEqRFqcJYdRNFcGwWZtfZasfFWcvdsHRXoqKlKYihRAOJdrPBDdxksXFwKceQVncmFXfUfBsNgjKzoObVExSnRnjegeEhqxXzPmFcuiasViAFeaXrAxXhSfSyCILkKYpjxNeKynUmdcGAbwRwRnlAFbOSCafmzXddiNpLCFTHBELvArdXFpKUGpSHRekhrMedMRNkQzmSyFKjVwiWwCvbNWjgxJRzYeRxHiCCRMXktmKBxbxGZvOpvZIJOwvGIxcBLzsMFlDqAMLtScdsJtrbIUAvKfcdChXGnBzIxGxXMgxJhayrziaCswdpjJJJhkaYnGhHXqZwOzHFdhhUIEtfjERdLaSPRTDDMHpQtonNaIgXUYhjdbnnKppfMBxgNSOOXJAPtFjfAKnrRDrumZBpNhxMstqjTGBViRkDqbTdXYUirsedifGYzZpQkvdNhtFTOPgsYXYCwZHLcSLSfwfpQKtWfZuRUUryHJsbVsAOQcIJdSKKlOvCeEjUQNRPHKXuBJUjPuaAJJxcDMqyaufqfVwUmHLdjeYZzSiiGLHOTCInpVAalbXXTMLugLiwFiyPSuSFiyJUKVrWjbZAHaJtZnQmnvorRrxdPKThqXzNgTjszQiCoMczRnwGYJMERUWGXFyrSbAqsHmLwLlnJOJoXNsjVehQjVOpQOQJAZWwFZBlgyVIplzLTlFwumPgBLYrUIAJAcmvHPGfHfWQguCjfTYzxYfbohaLFAPwxFRrNuCdCzLlEbuhyYjCmuDBTJDMCdLpNRVqEALjnPSaBPsKWRCKNGwEMFpiEWbYZRwaMopjoUuBUvMpvyLfsPKDrfQLiFOQIWPtLIMoijUEUYfhykHrSKbTtrvjwIzHdWZDVwLIpNkloCqpzIsErxxKAFuFEjikWNYChqYqVslXMtoSWzNhbMuxYbzLfJIcPGoUeGPkGyPQNhDyrjgdKekzftFrRPTuyLYqCArkDcWHTrjPQHfoThBNnTQyMwLEWxEnBXLtzJmFVLGEPrdbEwlXpgYfnVnWoNXgPQKKyiXifpvrmJATzQOzYwFhliiYxlbnsEPKbHYUfJLrwYPfSUwTIHiEvBFMrEtVmqJobfcwsiiEudTIiAnrtuywgKLOiMYbEIOAOJdOXqroPjWnQQcTNxFvkIEIsuHLyhSqSphuSmlvknzydQEnebOreeZwOouXYKlObAkaWHhOdTFLoMCHOWrVKeXjcniaxtgCziKEqWOZUWHJQpcDJzYnnduDZrmxgjZroBRwoPBUTJMYipsgJwbTSlvMyXXdAmiEWGMiQxhGvHGPLOKeTxNaLnFVbWpiYIVyqN")]

+ 30
- 9
tests/tokenizer/wikilinks.mwtest View File

@@ -40,17 +40,17 @@ output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="bar|b

---

name: nested
label: a wikilink nested within the value of another
input: "[[foo|[[bar]]]]"
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), WikilinkOpen(), Text(text="bar"), WikilinkClose(), WikilinkClose()]
name: newline_text
label: a newline in the middle of the text
input: "[[foo|foo\nbar]]"
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="foo\nbar"), WikilinkClose()]

---

name: nested_with_text
label: a wikilink nested within the value of another, separated by other data
input: "[[foo|a[[b]]c]]"
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="a"), WikilinkOpen(), Text(text="b"), WikilinkClose(), Text(text="c"), WikilinkClose()]
name: bracket_text
label: a left bracket in the middle of the text
input: "[[foo|bar[baz]]"
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="bar[baz"), WikilinkClose()]

---

@@ -96,13 +96,34 @@ output: [Text(text="[[foo"), WikilinkOpen(), Text(text="bar"), WikilinkClose(),

---

name: invalid_nested_text
name: invalid_nested_padding
label: invalid wikilink: trying to nest in the wrong context, with a text param
input: "[[foo[[bar]]|baz]]"
output: [Text(text="[[foo"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), Text(text="|baz]]")]

---

name: invalid_nested_text
label: invalid wikilink: a wikilink nested within the value of another
input: "[[foo|[[bar]]"
output: [Text(text="[[foo|"), WikilinkOpen(), Text(text="bar"), WikilinkClose()]

---

name: invalid_nested_text_2
label: invalid wikilink: a wikilink nested within the value of another, two pairs of closing brackets
input: "[[foo|[[bar]]]]"
output: [Text(text="[[foo|"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), Text(text="]]")]

---

name: invalid_nested_text_padding
label: invalid wikilink: a wikilink nested within the value of another, separated by other data
input: "[[foo|a[[b]]c]]"
output: [Text(text="[[foo|a"), WikilinkOpen(), Text(text="b"), WikilinkClose(), Text(text="c]]")]

---

name: incomplete_open_only
label: incomplete wikilinks: just an open
input: "[["


Loading…
Cancel
Save