Merge branch 'develop'

tags/v0.3
Ben Kurtovic 11 years ago
parent
commit
1baf3f6af6
65 changed files with 7170 additions and 602 deletions
  1. .gitignore (+2, -0)
  2. .travis.yml (+6, -0)
  3. CHANGELOG (+33, -0)
  4. LICENSE (+1, -1)
  5. README.rst (+25, -12)
  6. docs/changelog.rst (+58, -0)
  7. docs/conf.py (+4, -3)
  8. docs/index.rst (+10, -3)
  9. docs/integration.rst (+3, -3)
  10. mwparserfromhell/__init__.py (+5, -6)
  11. mwparserfromhell/compat.py (+29, -29)
  12. mwparserfromhell/nodes/__init__.py (+1, -1)
  13. mwparserfromhell/nodes/argument.py (+2, -1)
  14. mwparserfromhell/nodes/comment.py (+2, -1)
  15. mwparserfromhell/nodes/extras/__init__.py (+1, -1)
  16. mwparserfromhell/nodes/extras/attribute.py (+1, -1)
  17. mwparserfromhell/nodes/extras/parameter.py (+1, -1)
  18. mwparserfromhell/nodes/heading.py (+1, -1)
  19. mwparserfromhell/nodes/html_entity.py (+68, -33)
  20. mwparserfromhell/nodes/tag.py (+1, -1)
  21. mwparserfromhell/nodes/template.py (+61, -48)
  22. mwparserfromhell/nodes/text.py (+5, -1)
  23. mwparserfromhell/nodes/wikilink.py (+6, -2)
  24. mwparserfromhell/parser/__init__.py (+12, -7)
  25. mwparserfromhell/parser/builder.py (+1, -1)
  26. mwparserfromhell/parser/contexts.py (+40, -24)
  27. mwparserfromhell/parser/tokenizer.c (+1512, -0)
  28. mwparserfromhell/parser/tokenizer.h (+285, -0)
  29. mwparserfromhell/parser/tokenizer.py (+102, -50)
  30. mwparserfromhell/parser/tokens.py (+1, -1)
  31. mwparserfromhell/smart_list.py (+134, -45)
  32. mwparserfromhell/string_mixin.py (+100, -18)
  33. mwparserfromhell/utils.py (+8, -8)
  34. mwparserfromhell/wikicode.py (+78, -102)
  35. setup.py (+10, -2)
  36. tests/MWPFHTestCase.tmlanguage (+130, -0)
  37. tests/_test_tokenizer.py (+133, -0)
  38. tests/_test_tree_equality.py (+126, -0)
  39. tests/compat.py (+20, -0)
  40. tests/test_argument.py (+107, -0)
  41. tests/test_builder.py (+247, -0)
  42. tests/test_comment.py (+68, -0)
  43. tests/test_ctokenizer.py (+48, -0)
  44. tests/test_docs.py (+131, -0)
  45. tests/test_heading.py (+91, -0)
  46. tests/test_html_entity.py (+169, -0)
  47. tests/test_parameter.py (+42, -86)
  48. tests/test_parser.py (+38, -35)
  49. tests/test_pytokenizer.py (+44, -0)
  50. tests/test_smart_list.py (+392, -0)
  51. tests/test_string_mixin.py (+435, -0)
  52. tests/test_template.py (+332, -74)
  53. tests/test_text.py (+75, -0)
  54. tests/test_tokens.py (+108, -0)
  55. tests/test_utils.py (+62, -0)
  56. tests/test_wikicode.py (+364, -0)
  57. tests/test_wikilink.py (+107, -0)
  58. tests/tokenizer/arguments.mwtest (+130, -0)
  59. tests/tokenizer/comments.mwtest (+39, -0)
  60. tests/tokenizer/headings.mwtest (+109, -0)
  61. tests/tokenizer/html_entities.mwtest (+144, -0)
  62. tests/tokenizer/integration.mwtest (+46, -0)
  63. tests/tokenizer/templates.mwtest (+641, -0)
  64. tests/tokenizer/text.mwtest (+25, -0)
  65. tests/tokenizer/wikilinks.mwtest (+158, -0)

+ 2
- 0
.gitignore

@@ -1,4 +1,6 @@
*.pyc
*.so
*.dll
*.egg
*.egg-info
.DS_Store


+ 6
- 0
.travis.yml

@@ -0,0 +1,6 @@
language: python
python:
- "2.7"
- "3.3"
install: python setup.py build
script: python setup.py test -q

+ 33
- 0
CHANGELOG

@@ -0,0 +1,33 @@
v0.1.1 (19da4d2144) to v0.2:

- The parser now fully supports Python 3 in addition to Python 2.7.
- Added a C tokenizer extension that is significantly faster than its Python
equivalent. It is enabled by default (if available) and can be toggled by
setting `mwparserfromhell.parser.use_c` to a boolean value.
- Added a complete set of unit tests covering parsing and wikicode
manipulation.
- Renamed Wikicode.filter_links() to filter_wikilinks() (applies to ifilter as
well).
- Added filter methods for Arguments, Comments, Headings, and HTMLEntities.
- Added 'before' param to Template.add(); renamed 'force_nonconformity' to
'preserve_spacing'.
- Added 'include_lead' param to Wikicode.get_sections().
- Removed 'flat' param from Wikicode.get_sections().
- Removed 'force_no_field' param from Template.remove().
- Added support for Travis CI.
- Added note about Windows build issue in the README.
- The tokenizer will limit itself to a realistic recursion depth to prevent
errors and unreasonably long parse times.
- Fixed how some nodes' attribute setters handle input.
- Fixed multiple bugs in the tokenizer's handling of invalid markup.
- Fixed bugs in the implementation of SmartList and StringMixIn.
- Fixed some broken example code in the README; other copyedits.
- Other bugfixes and code cleanup.

v0.1 (ba94938fe8) to v0.1.1 (19da4d2144):

- Added support for Comments (<!-- foo -->) and Wikilinks ([[foo]]).
- Added corresponding ifilter_links() and filter_links() methods to Wikicode.
- Fixed a bug when parsing incomplete templates.
- Fixed strip_code() to affect the contents of headings.
- Various copyedits in documentation and comments.

+ 1
- 1
LICENSE

@@ -1,4 +1,4 @@
Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


+ 25
- 12
README.rst

@@ -1,6 +1,10 @@
mwparserfromhell
================

.. image:: https://travis-ci.org/earwig/mwparserfromhell.png?branch=develop
:alt: Build Status
:target: http://travis-ci.org/earwig/mwparserfromhell

**mwparserfromhell** (the *MediaWiki Parser from Hell*) is a Python package
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.
@@ -18,7 +22,13 @@ so you can install the latest release with ``pip install mwparserfromhell``
cd mwparserfromhell
python setup.py install

You can run the comprehensive unit testing suite with ``python setup.py test``.
If you get ``error: Unable to find vcvarsall.bat`` while installing, this is
because Windows can't find the compiler for C extensions. Consult this
`StackOverflow question`_ for help. You can also set ``ext_modules`` in
``setup.py`` to an empty list to prevent the extension from building.

You can run the comprehensive unit testing suite with
``python setup.py test -q``.

Usage
-----
@@ -106,12 +116,12 @@ Integration
``Page`` objects have a ``parse`` method that essentially calls
``mwparserfromhell.parse()`` on ``page.get()``.

If you're using PyWikipedia_, your code might look like this::
If you're using Pywikipedia_, your code might look like this::

import mwparserfromhell
import wikipedia as pywikibot
def parse(title):
site = pywikibot.get_site()
site = pywikibot.getSite()
page = pywikibot.Page(site, title)
text = page.get()
return mwparserfromhell.parse(text)
@@ -124,16 +134,19 @@ following code (via the API_)::
import mwparserfromhell
API_URL = "http://en.wikipedia.org/w/api.php"
def parse(title):
raw = urllib.urlopen(API_URL, data).read()
data = {"action": "query", "prop": "revisions", "rvlimit": 1,
"rvprop": "content", "format": "json", "titles": title}
raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
res = json.loads(raw)
text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
return mwparserfromhell.parse(text)

.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _PyWikipedia: http://pywikipediabot.sourceforge.net/
.. _API: http://mediawiki.org/wiki/API
.. _MediaWiki: http://mediawiki.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3
.. _Python Package Index: http://pypi.python.org
.. _StackOverflow question: http://stackoverflow.com/questions/2817869/error-unable-to-find-vcvarsall-bat
.. _get pip: http://pypi.python.org/pypi/pip
.. _EarwigBot: https://github.com/earwig/earwigbot
.. _Pywikipedia: https://www.mediawiki.org/wiki/Manual:Pywikipediabot
.. _API: http://mediawiki.org/wiki/API
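
Reviewer note: the API example in this README hunk relies on Python 2 only calls (`urllib.urlopen`, indexing `dict.values()`), although the README states that the package supports Python 3 as well. A possible Python 3 equivalent is sketched below. The helper names `build_query`, `extract_wikitext`, and `fetch_wikitext` are hypothetical, and the sketch assumes the API returns the same legacy JSON layout (page content under the `"*"` key) that the README example reads.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://en.wikipedia.org/w/api.php"


def build_query(title):
    """Return the POST body for fetching the latest revision of *title*."""
    data = {"action": "query", "prop": "revisions", "rvlimit": 1,
            "rvprop": "content", "format": "json", "titles": title}
    # urlencode() percent-encodes everything, so the result is plain ASCII.
    return urlencode(data).encode("ascii")


def extract_wikitext(raw):
    """Pull the revision text out of a JSON response string."""
    res = json.loads(raw)
    # dict views are not indexable on Python 3, so take the first value.
    page = next(iter(res["query"]["pages"].values()))
    return page["revisions"][0]["*"]


def fetch_wikitext(title):
    """Perform the request (needs network access; not exercised here)."""
    with urlopen(API_URL, build_query(title)) as response:
        return extract_wikitext(response.read().decode("utf8"))
```

The string returned by `fetch_wikitext()` is what would then be passed to `mwparserfromhell.parse()`.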

+ 58
- 0
docs/changelog.rst

@@ -0,0 +1,58 @@
Changelog
=========

v0.2
----

19da4d2144_ to master_ (released June 20, 2013)

- The parser now fully supports Python 3 in addition to Python 2.7.
- Added a C tokenizer extension that is significantly faster than its Python
equivalent. It is enabled by default (if available) and can be toggled by
setting :py:attr:`mwparserfromhell.parser.use_c` to a boolean value.
- Added a complete set of unit tests covering parsing and wikicode
manipulation.
- Renamed :py:meth:`.filter_links` to :py:meth:`.filter_wikilinks` (applies to
:py:meth:`.ifilter` as well).
- Added filter methods for :py:class:`Arguments <.Argument>`,
:py:class:`Comments <.Comment>`, :py:class:`Headings <.Heading>`, and
:py:class:`HTMLEntities <.HTMLEntity>`.
- Added *before* param to :py:meth:`Template.add() <.Template.add>`; renamed
*force_nonconformity* to *preserve_spacing*.
- Added *include_lead* param to :py:meth:`Wikicode.get_sections()
<.get_sections>`.
- Removed *flat* param from :py:meth:`.get_sections`.
- Removed *force_no_field* param from :py:meth:`Template.remove()
<.Template.remove>`.
- Added support for Travis CI.
- Added note about Windows build issue in the README.
- The tokenizer will limit itself to a realistic recursion depth to prevent
errors and unreasonably long parse times.
- Fixed how some nodes' attribute setters handle input.
- Fixed multiple bugs in the tokenizer's handling of invalid markup.
- Fixed bugs in the implementation of :py:class:`.SmartList` and
:py:class:`.StringMixIn`.
- Fixed some broken example code in the README; other copyedits.
- Other bugfixes and code cleanup.

v0.1.1
------

ba94938fe8_ to 19da4d2144_ (released September 21, 2012)

- Added support for :py:class:`Comments <.Comment>` (``<!-- foo -->``) and
:py:class:`Wikilinks <.Wikilink>` (``[[foo]]``).
- Added corresponding :py:meth:`.ifilter_links` and :py:meth:`.filter_links`
methods to :py:class:`.Wikicode`.
- Fixed a bug when parsing incomplete templates.
- Fixed :py:meth:`.strip_code` to affect the contents of headings.
- Various copyedits in documentation and comments.

v0.1
----

ba94938fe8_ (released August 23, 2012)

.. _master: https://github.com/earwig/mwparserfromhell/tree/v0.2
.. _19da4d2144: https://github.com/earwig/mwparserfromhell/tree/v0.1.1
.. _ba94938fe8: https://github.com/earwig/mwparserfromhell/tree/v0.1
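
Reviewer note: the changelog above mentions the new *before* parameter of `Template.add()`. According to the docstring changed in this commit, the new parameter is placed immediately before the named one, before its last occurrence if the name is repeated, and `ValueError` is raised if the name is absent. A minimal standalone sketch of that placement rule, using plain `(name, value)` tuples and a hypothetical `insert_param` helper rather than the library's own classes:

```python
def insert_param(params, new, before=None):
    """Insert *new* into *params*, optionally before the parameter *before*.

    *params* is a list of (name, value) tuples; *before* is a name.
    """
    if before is None:
        params.append(new)
        return
    # Search from the end so that duplicates resolve to the last occurrence.
    for i in range(len(params) - 1, -1, -1):
        if params[i][0] == before:
            params.insert(i, new)
            return
    raise ValueError(before)


params = [("a", "1"), ("b", "2"), ("a", "3")]
insert_param(params, ("c", "9"), before="a")
```

After the call, `("c", "9")` sits between `("b", "2")` and the second `("a", ...)` entry.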

+ 4
- 3
docs/conf.py

@@ -17,6 +17,7 @@ import sys, os
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath('..'))
import mwparserfromhell

# -- General configuration -----------------------------------------------------

@@ -41,16 +42,16 @@ master_doc = 'index'

# General information about the project.
project = u'mwparserfromhell'
copyright = u'2012 Ben Kurtovic'
copyright = u'2012, 2013 Ben Kurtovic'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '0.1'
version = ".".join(mwparserfromhell.__version__.split(".", 2)[:2])
# The full version, including alpha/beta/rc tags.
release = '0.1.1'
release = mwparserfromhell.__version__

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
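
Reviewer note: the hunk above derives the short `X.Y` version from the package's `__version__` string instead of hard-coding it. The expression can be checked in isolation; `short_version` is a hypothetical wrapper name used only for this sketch.

```python
def short_version(full_version):
    """Return the leading X.Y part of a dotted version string."""
    # split(".", 2) yields at most three pieces; keep the first two.
    return ".".join(full_version.split(".", 2)[:2])


examples = {v: short_version(v) for v in ("0.2", "0.1.1", "1.2.3.4")}
```

A two-component version such as `0.2` passes through unchanged, while longer versions are truncated to their first two components.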


+ 10
- 3
docs/index.rst

@@ -1,4 +1,4 @@
MWParserFromHell v0.1 Documentation
MWParserFromHell v0.2 Documentation
===================================

:py:mod:`mwparserfromhell` (the *MediaWiki Parser from Hell*) is a Python
@@ -22,10 +22,16 @@ so you can install the latest release with ``pip install mwparserfromhell``
cd mwparserfromhell
python setup.py install

If you get ``error: Unable to find vcvarsall.bat`` while installing, this is
because Windows can't find the compiler for C extensions. Consult this
`StackOverflow question`_ for help. You can also set ``ext_modules`` in
``setup.py`` to an empty list to prevent the extension from building.

You can run the comprehensive unit testing suite with ``python setup.py test``.

.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _Python Package Index: http://pypi.python.org
.. _get pip: http://pypi.python.org/pypi/pip
.. _StackOverflow question: http://stackoverflow.com/questions/2817869/error-unable-to-find-vcvarsall-bat

Contents
--------
@@ -35,6 +41,7 @@ Contents

usage
integration
changelog
API Reference <api/modules>




+ 3
- 3
docs/integration.rst

@@ -7,12 +7,12 @@ Integration
:py:func:`mwparserfromhell.parse() <mwparserfromhell.__init__.parse>` on
:py:meth:`~earwigbot.wiki.page.Page.get`.

If you're using PyWikipedia_, your code might look like this::
If you're using Pywikipedia_, your code might look like this::

import mwparserfromhell
import wikipedia as pywikibot
def parse(title):
site = pywikibot.get_site()
site = pywikibot.getSite()
page = pywikibot.Page(site, title)
text = page.get()
return mwparserfromhell.parse(text)
@@ -31,5 +31,5 @@ following code (via the API_)::
return mwparserfromhell.parse(text)

.. _EarwigBot: https://github.com/earwig/earwigbot
.. _PyWikipedia: http://pywikipediabot.sourceforge.net/
.. _Pywikipedia: https://www.mediawiki.org/wiki/Manual:Pywikipediabot
.. _API: http://mediawiki.org/wiki/API

+ 5
- 6
mwparserfromhell/__init__.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -29,12 +29,11 @@ outrageously powerful parser for `MediaWiki <http://mediawiki.org>`_ wikicode.
from __future__ import unicode_literals

__author__ = "Ben Kurtovic"
__copyright__ = "Copyright (C) 2012 Ben Kurtovic"
__copyright__ = "Copyright (C) 2012, 2013 Ben Kurtovic"
__license__ = "MIT License"
__version__ = "0.1.1"
__version__ = "0.2"
__email__ = "ben.kurtovic@verizon.net"

from . import nodes, parser, smart_list, string_mixin, wikicode
from . import compat, nodes, parser, smart_list, string_mixin, utils, wikicode

parse = lambda text: parser.Parser(text).parse()
parse.__doc__ = "Short for :py:meth:`.Parser.parse`."
parse = utils.parse_anything

+ 29
- 29
mwparserfromhell/compat.py

@@ -1,29 +1,29 @@
# -*- coding: utf-8 -*-
"""
Implements support for both Python 2 and Python 3 by defining common types in
terms of their Python 2/3 variants. For example, :py:class:`str` is set to
:py:class:`unicode` on Python 2 but :py:class:`str` on Python 3; likewise,
:py:class:`bytes` is :py:class:`str` on 2 but :py:class:`bytes` on 3. These
types are meant to be imported directly from within the parser's modules.
"""
import sys
py3k = sys.version_info.major == 3
if py3k:
bytes = bytes
str = str
basestring = str
maxsize = sys.maxsize
import html.entities as htmlentities
else:
bytes = str
str = unicode
basestring = basestring
maxsize = sys.maxint
import htmlentitydefs as htmlentities
del sys
# -*- coding: utf-8 -*-
"""
Implements support for both Python 2 and Python 3 by defining common types in
terms of their Python 2/3 variants. For example, :py:class:`str` is set to
:py:class:`unicode` on Python 2 but :py:class:`str` on Python 3; likewise,
:py:class:`bytes` is :py:class:`str` on 2 but :py:class:`bytes` on 3. These
types are meant to be imported directly from within the parser's modules.
"""
import sys
py3k = sys.version_info[0] == 3
if py3k:
bytes = bytes
str = str
basestring = str
maxsize = sys.maxsize
import html.entities as htmlentities
else:
bytes = str
str = unicode
basestring = basestring
maxsize = sys.maxint
import htmlentitydefs as htmlentities
del sys
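
Reviewer note: the only substantive change in this file appears to be `sys.version_info.major` becoming `sys.version_info[0]`. Index access works on every Python version, whereas the named attributes were only added in 2.7 and 3.1. A reduced sketch of the same compatibility pattern follows; the names `text_type`, `binary_type`, and `string_types` are chosen for this illustration so that the builtins are not shadowed, and they are not the names the module itself exports.

```python
import sys

# Index access is safe everywhere; `.major` needs Python 2.7+ or 3.1+.
py3k = sys.version_info[0] == 3

if py3k:
    text_type = str
    binary_type = bytes
    string_types = (str,)
    import html.entities as htmlentities
else:
    text_type = unicode  # noqa: F821 (name exists on Python 2 only)
    binary_type = str
    string_types = (basestring,)  # noqa: F821 (Python 2 only)
    import htmlentitydefs as htmlentities

# Both branches expose the same entity tables under one name.
amp_code_point = htmlentities.name2codepoint["amp"]
```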

+ 1
- 1
mwparserfromhell/nodes/__init__.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal


+ 2
- 1
mwparserfromhell/nodes/argument.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -30,6 +30,7 @@ __all__ = ["Argument"]

class Argument(Node):
"""Represents a template argument substitution, like ``{{{foo}}}``."""

def __init__(self, name, default=None):
super(Argument, self).__init__()
self._name = name


+ 2
- 1
mwparserfromhell/nodes/comment.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -29,6 +29,7 @@ __all__ = ["Comment"]

class Comment(Node):
"""Represents a hidden HTML comment, like ``<!-- foobar -->``."""

def __init__(self, contents):
super(Comment, self).__init__()
self._contents = contents


+ 1
- 1
mwparserfromhell/nodes/extras/__init__.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal


+ 1
- 1
mwparserfromhell/nodes/extras/attribute.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal


+ 1
- 1
mwparserfromhell/nodes/extras/parameter.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal


+ 1
- 1
mwparserfromhell/nodes/heading.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal


+ 68
- 33
mwparserfromhell/nodes/html_entity.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -23,7 +23,7 @@
from __future__ import unicode_literals

from . import Node
from ..compat import htmlentities, str
from ..compat import htmlentities, py3k, str

__all__ = ["HTMLEntity"]

@@ -63,28 +63,31 @@ class HTMLEntity(Node):
return self.normalize()
return self

def _unichr(self, value):
"""Implement the builtin unichr() with support for non-BMP code points.
if not py3k:
@staticmethod
def _unichr(value):
"""Implement builtin unichr() with support for non-BMP code points.

On wide Python builds, this functions like the normal unichr(). On
narrow builds, this returns the value's corresponding surrogate pair.
"""
try:
return unichr(value)
except ValueError:
# Test whether we're on the wide or narrow Python build. Check the
# length of a non-BMP code point (U+1F64A, SPEAK-NO-EVIL MONKEY):
if len("\U0001F64A") == 2:
# Ensure this is within the range we can encode:
if value > 0x10FFFF:
raise ValueError("unichr() arg not in range(0x110000)")
code = value - 0x10000
if value < 0: # Invalid code point
raise
lead = 0xD800 + (code >> 10)
trail = 0xDC00 + (code % (1 << 10))
return unichr(lead) + unichr(trail)
raise
On wide Python builds, this functions like the normal unichr(). On
narrow builds, this returns the value's encoded surrogate pair.
"""
try:
return unichr(value)
except ValueError:
# Test whether we're on the wide or narrow Python build. Check
# the length of a non-BMP code point
# (U+1F64A, SPEAK-NO-EVIL MONKEY):
if len("\U0001F64A") == 2:
# Ensure this is within the range we can encode:
if value > 0x10FFFF:
raise ValueError("unichr() arg not in range(0x110000)")
code = value - 0x10000
if value < 0: # Invalid code point
raise
lead = 0xD800 + (code >> 10)
trail = 0xDC00 + (code % (1 << 10))
return unichr(lead) + unichr(trail)
raise

@property
def value(self):
@@ -119,28 +122,60 @@ class HTMLEntity(Node):
@value.setter
def value(self, newval):
newval = str(newval)
if newval not in htmlentities.entitydefs:
test = int(self.value, 16)
if test < 0 or (test > 0x10FFFF and int(self.value) > 0x10FFFF):
raise ValueError(newval)
try:
int(newval)
except ValueError:
try:
int(newval, 16)
except ValueError:
if newval not in htmlentities.entitydefs:
raise ValueError("entity value is not a valid name")
self._named = True
self._hexadecimal = False
else:
if int(newval, 16) < 0 or int(newval, 16) > 0x10FFFF:
raise ValueError("entity value is not in range(0x110000)")
self._named = False
self._hexadecimal = True
else:
test = int(newval, 16 if self.hexadecimal else 10)
if test < 0 or test > 0x10FFFF:
raise ValueError("entity value is not in range(0x110000)")
self._named = False
self._value = newval

@named.setter
def named(self, newval):
self._named = bool(newval)
newval = bool(newval)
if newval and self.value not in htmlentities.entitydefs:
raise ValueError("entity value is not a valid name")
if not newval:
try:
int(self.value, 16)
except ValueError:
err = "current entity value is not a valid Unicode codepoint"
raise ValueError(err)
self._named = newval

@hexadecimal.setter
def hexadecimal(self, newval):
self._hexadecimal = bool(newval)
newval = bool(newval)
if newval and self.named:
raise ValueError("a named entity cannot be hexadecimal")
self._hexadecimal = newval

@hex_char.setter
def hex_char(self, newval):
self._hex_char = bool(newval)
newval = str(newval)
if newval not in ("x", "X"):
raise ValueError(newval)
self._hex_char = newval

def normalize(self):
"""Return the unicode character represented by the HTML entity."""
chrfunc = chr if py3k else HTMLEntity._unichr
if self.named:
return unichr(htmlentities.name2codepoint[self.value])
return chrfunc(htmlentities.name2codepoint[self.value])
if self.hexadecimal:
return self._unichr(int(self.value, 16))
return self._unichr(int(self.value))
return chrfunc(int(self.value, 16))
return chrfunc(int(self.value))
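
Reviewer note: the `_unichr` helper in this file builds a UTF-16 surrogate pair by hand for code points above the Basic Multilingual Plane on narrow Python 2 builds. The arithmetic can be checked on Python 3 against the built-in UTF-16 encoder. `surrogate_pair` is a hypothetical standalone function written for this sketch; it mirrors the lead/trail computation in the diff but is not part of the package.

```python
def surrogate_pair(code_point):
    """Return the (lead, trail) UTF-16 surrogates for a non-BMP code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000      # 20 significant bits remain
    lead = 0xD800 + (offset >> 10)     # high 10 bits
    trail = 0xDC00 + (offset & 0x3FF)  # low 10 bits, same as offset % (1 << 10)
    return lead, trail


# U+1F64A SPEAK-NO-EVIL MONKEY, the code point used in the diff's comment.
lead, trail = surrogate_pair(0x1F64A)
encoded = chr(0x1F64A).encode("utf-16-be")
```

The two 16-bit values should match the four bytes produced by the encoder, which is a convenient cross-check for the bit arithmetic.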

+ 1
- 1
mwparserfromhell/nodes/tag.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal


+ 61
- 48
mwparserfromhell/nodes/template.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -81,7 +81,7 @@ class Template(Node):
in parameter names or values so they are not mistaken for new
parameters.
"""
replacement = HTMLEntity(value=ord(char))
replacement = str(HTMLEntity(value=ord(char)))
for node in code.filter_text(recursive=False):
if char in node:
code.replace(node, node.replace(char, replacement))
@@ -107,7 +107,7 @@ class Template(Node):
values = tuple(theories.values())
best = max(values)
confidence = float(best) / sum(values)
if confidence > 0.75:
if confidence >= 0.75:
return tuple(theories.keys())[values.index(best)]

def _get_spacing_conventions(self, use_names):
@@ -142,9 +142,9 @@ class Template(Node):
return False
return True

def _remove_without_field(self, param, i, force_no_field):
def _remove_without_field(self, param, i):
"""Return False if a parameter name should be kept, otherwise True."""
if not param.showkey and not force_no_field:
if not param.showkey:
dependents = [not after.showkey for after in self.params[i+1:]]
if any(dependents):
return False
@@ -183,11 +183,10 @@ class Template(Node):
def get(self, name):
"""Get the parameter whose name is *name*.

The returned object is a
:py:class:`~.Parameter` instance. Raises :py:exc:`ValueError` if no
parameter has this name. Since multiple parameters can have the same
name, we'll return the last match, since the last parameter is the only
one read by the MediaWiki parser.
The returned object is a :py:class:`~.Parameter` instance. Raises
:py:exc:`ValueError` if no parameter has this name. Since multiple
parameters can have the same name, we'll return the last match, since
the last parameter is the only one read by the MediaWiki parser.
"""
name = name.strip() if isinstance(name, basestring) else str(name)
for param in reversed(self.params):
@@ -195,20 +194,34 @@ class Template(Node):
return param
raise ValueError(name)

def add(self, name, value, showkey=None, force_nonconformity=False):
def add(self, name, value, showkey=None, before=None,
preserve_spacing=True):
"""Add a parameter to the template with a given *name* and *value*.

*name* and *value* can be anything parsable by
:py:func:`.utils.parse_anything`; pipes (and equal signs, if
appropriate) are automatically escaped from *value* where applicable.
:py:func:`.utils.parse_anything`; pipes and equal signs are
automatically escaped from *value* when appropriate.

If *showkey* is given, this will determine whether or not to show the
parameter's name (e.g., ``{{foo|bar}}``'s parameter has a name of
``"1"`` but it is hidden); otherwise, we'll make a safe and intelligent
guess. If *name* is already a parameter, we'll replace its value while
keeping the same spacing rules unless *force_nonconformity* is
``True``. We will also try to guess the dominant spacing convention
when adding a new parameter using :py:meth:`_get_spacing_conventions`
unless *force_nonconformity* is ``True``.
guess.

If *name* is already a parameter in the template, we'll replace its
value while keeping the same whitespace around it. We will also try to
guess the dominant spacing convention when adding a new parameter using
:py:meth:`_get_spacing_conventions`.

If *before* is given (either a :py:class:`~.Parameter` object or a
name), then we will place the parameter immediately before this one.
Otherwise, it will be added at the end. If *before* is a name and
exists multiple times in the template, we will place it before the last
occurrence. If *before* is not in the template, :py:exc:`ValueError` is
raised. The argument is ignored if the new parameter already exists.

If *preserve_spacing* is ``False``, we will avoid preserving spacing
conventions when changing the value of an existing parameter or when
adding a new one.
"""
name, value = parse_anything(name), parse_anything(value)
self._surface_escape(value, "|")
@@ -217,14 +230,17 @@ class Template(Node):
self.remove(name, keep_field=True)
existing = self.get(name)
if showkey is not None:
if not showkey:
self._surface_escape(value, "=")
existing.showkey = showkey
if not existing.showkey:
self._surface_escape(value, "=")
nodes = existing.value.nodes
if force_nonconformity:
existing.value = value
else:
if preserve_spacing:
for i in range(2): # Ignore empty text nodes
if not nodes[i]:
nodes[i] = None
existing.value = parse_anything([nodes[0], value, nodes[1]])
else:
existing.value = value
return existing

if showkey is None:
@@ -246,43 +262,38 @@ class Template(Node):
if not showkey:
self._surface_escape(value, "=")

if not force_nonconformity:
if preserve_spacing:
before_n, after_n = self._get_spacing_conventions(use_names=True)
if before_n and after_n:
name = parse_anything([before_n, value, after_n])
elif before_n:
name = parse_anything([before_n, value])
elif after_n:
name = parse_anything([value, after_n])

before_v, after_v = self._get_spacing_conventions(use_names=False)
if before_v and after_v:
value = parse_anything([before_v, value, after_v])
elif before_v:
value = parse_anything([before_v, value])
elif after_v:
value = parse_anything([value, after_v])
name = parse_anything([before_n, name, after_n])
value = parse_anything([before_v, value, after_v])

param = Parameter(name, value, showkey)
self.params.append(param)
if before:
if not isinstance(before, Parameter):
before = self.get(before)
self.params.insert(self.params.index(before), param)
else:
self.params.append(param)
return param

def remove(self, name, keep_field=False, force_no_field=False):
def remove(self, name, keep_field=False):
"""Remove a parameter from the template whose name is *name*.

If *keep_field* is ``True``, we will keep the parameter's name, but
blank its value. Otherwise, we will remove the parameter completely
*unless* other parameters are dependent on it (e.g. removing ``bar``
from ``{{foo|bar|baz}}`` is unsafe because ``{{foo|baz}}`` is not what
we expected, so ``{{foo||baz}}`` will be produced instead), unless
*force_no_field* is also ``True``. If the parameter shows up multiple
times in the template, we will remove all instances of it (and keep
one if *keep_field* is ``True`` - that being the first instance if
none of the instances have dependents, otherwise that instance will be
kept).
we expected, so ``{{foo||baz}}`` will be produced instead).
If the parameter shows up multiple times in the template, we will
remove all instances of it (and keep one if *keep_field* is ``True`` -
the first instance if none have dependents, otherwise the one with
dependents will be kept).
"""
name = name.strip() if isinstance(name, basestring) else str(name)
removed = False
to_remove = []
for i, param in enumerate(self.params):
if param.name.strip() == name:
if keep_field:
@@ -290,13 +301,15 @@ class Template(Node):
self._blank_param_value(param.value)
keep_field = False
else:
self.params.remove(param)
to_remove.append(param)
else:
if self._remove_without_field(param, i, force_no_field):
self.params.remove(param)
if self._remove_without_field(param, i):
to_remove.append(param)
else:
self._blank_param_value(param.value)
if not removed:
removed = True
if not removed:
raise ValueError(name)
for param in to_remove:
self.params.remove(param)
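
Reviewer note: the last hunk of `remove()` stops deleting parameters inside the `for` loop and instead collects them in `to_remove`, deleting them afterwards. The likely motivation is that removing items from a list while iterating over it makes the iterator skip elements. A small standalone demonstration on plain strings, with hypothetical function names:

```python
def remove_while_iterating(items, target):
    """Buggy variant: mutating the list during iteration skips elements."""
    for item in items:
        if item == target:
            items.remove(item)
    return items


def remove_deferred(items, target):
    """Fixed variant: decide first, mutate after the loop has finished."""
    to_remove = [item for item in items if item == target]
    for item in to_remove:
        items.remove(item)
    return items


buggy = remove_while_iterating(["a", "a", "b"], "a")
fixed = remove_deferred(["a", "a", "b"], "a")
```

In the buggy variant the second `"a"` shifts into the slot the iterator has already passed, so it survives; the deferred variant removes both.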

+ 5
- 1
mwparserfromhell/nodes/text.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -29,6 +29,7 @@ __all__ = ["Text"]

class Text(Node):
"""Represents ordinary, unformatted text with no special properties."""

def __init__(self, value):
super(Text, self).__init__()
self._value = value
@@ -39,6 +40,9 @@ class Text(Node):
def __strip__(self, normalize, collapse):
return self

def __showtree__(self, write, get, mark):
write(str(self).encode("unicode_escape").decode("utf8"))

@property
def value(self):
"""The actual text itself."""


+ 6
- 2
mwparserfromhell/nodes/wikilink.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -30,6 +30,7 @@ __all__ = ["Wikilink"]

class Wikilink(Node):
"""Represents an internal wikilink, like ``[[Foo|Bar]]``."""

def __init__(self, title, text=None):
super(Wikilink, self).__init__()
self._title = title
@@ -78,4 +79,7 @@ class Wikilink(Node):

@text.setter
def text(self, value):
self._text = parse_anything(value)
if value is None:
self._text = None
else:
self._text = parse_anything(value)
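
Note on the hunk above: the setter now stores `None` as-is rather than passing it to `parse_anything`, so a link's display text can be cleared. A rough sketch of the same guard, assuming a hypothetical `fake_parse` stand-in for `parse_anything` and a hypothetical `LinkSketch` class (neither exists in the library):

```python
def fake_parse(value):
    # Hypothetical stand-in for parse_anything(); it only wraps the value.
    return ["parsed", value]


class LinkSketch(object):
    """Toy model of a node with an optional text attribute."""

    def __init__(self, title, text=None):
        self._title = title
        self._text = None
        self.text = text  # route through the setter below

    @property
    def text(self):
        return self._text

    @text.setter
    def text(self, value):
        if value is None:
            # Keep None as "no display text" instead of parsing it.
            self._text = None
        else:
            self._text = fake_parse(value)


link = LinkSketch("Foo", "Bar")
parsed_text = link.text
link.text = None
cleared_text = link.text
```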

+ 12
- 7
mwparserfromhell/parser/__init__.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -26,14 +26,16 @@ modules: the :py:mod:`~.tokenizer` and the :py:mod:`~.builder`. This module
joins them together under one interface.
"""

from .builder import Builder
from .tokenizer import Tokenizer
try:
from ._builder import CBuilder as Builder
from ._tokenizer import CTokenizer as Tokenizer
from ._tokenizer import CTokenizer
use_c = True
except ImportError:
from .builder import Builder
from .tokenizer import Tokenizer
CTokenizer = None
use_c = False

__all__ = ["Parser"]
__all__ = ["use_c", "Parser"]

class Parser(object):
"""Represents a parser for wikicode.
@@ -46,7 +48,10 @@ class Parser(object):

def __init__(self, text):
self.text = text
self._tokenizer = Tokenizer()
if use_c and CTokenizer:
self._tokenizer = CTokenizer()
else:
self._tokenizer = Tokenizer()
self._builder = Builder()
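
Note on the hunk above: the module tries to import the C tokenizer, records the outcome in `use_c`, and the constructor picks an implementation at runtime. A minimal sketch of that optional-accelerator pattern, assuming a hypothetical module `_hypothetical_fast_tokenizer` (expected to be missing, so the fallback runs) and hypothetical class names:

```python
try:
    # Hypothetical accelerated module; it is not expected to exist,
    # so the except branch below is what normally runs.
    from _hypothetical_fast_tokenizer import FastTokenizer
    use_fast = True
except ImportError:
    FastTokenizer = None
    use_fast = False


class PureTokenizer(object):
    """Pure-Python fallback that is always available."""

    def tokenize(self, text):
        return text.split()


def make_tokenizer():
    # Mirror the constructor's check: prefer the fast class only when
    # the flag is set and the import actually succeeded.
    if use_fast and FastTokenizer:
        return FastTokenizer()
    return PureTokenizer()


result = make_tokenizer().tokenize("foo bar")
```

Keeping the flag as a module-level name means a caller could set it to `False` to force the pure-Python path, for example while debugging.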

def parse(self):


+ 1
- 1
mwparserfromhell/parser/builder.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal


+ 40
- 24
mwparserfromhell/parser/contexts.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -62,6 +62,15 @@ Local (stack-specific) contexts:

* :py:const:`COMMENT`

* :py:const:`SAFETY_CHECK`

* :py:const:`HAS_TEXT`
* :py:const:`FAIL_ON_TEXT`
* :py:const:`FAIL_NEXT`
* :py:const:`FAIL_ON_LBRACE`
* :py:const:`FAIL_ON_RBRACE`
* :py:const:`FAIL_ON_EQUALS`

Global contexts:

* :py:const:`GL_HEADING`
@@ -69,29 +78,36 @@ Global contexts:

# Local contexts:

TEMPLATE = 0b00000000000111
TEMPLATE_NAME = 0b00000000000001
TEMPLATE_PARAM_KEY = 0b00000000000010
TEMPLATE_PARAM_VALUE = 0b00000000000100

ARGUMENT = 0b00000000011000
ARGUMENT_NAME = 0b00000000001000
ARGUMENT_DEFAULT = 0b00000000010000

WIKILINK = 0b00000001100000
WIKILINK_TITLE = 0b00000000100000
WIKILINK_TEXT = 0b00000001000000

HEADING = 0b01111110000000
HEADING_LEVEL_1 = 0b00000010000000
HEADING_LEVEL_2 = 0b00000100000000
HEADING_LEVEL_3 = 0b00001000000000
HEADING_LEVEL_4 = 0b00010000000000
HEADING_LEVEL_5 = 0b00100000000000
HEADING_LEVEL_6 = 0b01000000000000

COMMENT = 0b10000000000000

TEMPLATE = 0b00000000000000000111
TEMPLATE_NAME = 0b00000000000000000001
TEMPLATE_PARAM_KEY = 0b00000000000000000010
TEMPLATE_PARAM_VALUE = 0b00000000000000000100

ARGUMENT = 0b00000000000000011000
ARGUMENT_NAME = 0b00000000000000001000
ARGUMENT_DEFAULT = 0b00000000000000010000

WIKILINK = 0b00000000000001100000
WIKILINK_TITLE = 0b00000000000000100000
WIKILINK_TEXT = 0b00000000000001000000

HEADING = 0b00000001111110000000
HEADING_LEVEL_1 = 0b00000000000010000000
HEADING_LEVEL_2 = 0b00000000000100000000
HEADING_LEVEL_3 = 0b00000000001000000000
HEADING_LEVEL_4 = 0b00000000010000000000
HEADING_LEVEL_5 = 0b00000000100000000000
HEADING_LEVEL_6 = 0b00000001000000000000

COMMENT = 0b00000010000000000000

SAFETY_CHECK = 0b11111100000000000000
HAS_TEXT = 0b00000100000000000000
FAIL_ON_TEXT = 0b00001000000000000000
FAIL_NEXT = 0b00010000000000000000
FAIL_ON_LBRACE = 0b00100000000000000000
FAIL_ON_RBRACE = 0b01000000000000000000
FAIL_ON_EQUALS = 0b10000000000000000000

# Global contexts:
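
Note on the local contexts above: each flag occupies one bit, and a group name such as `SAFETY_CHECK` is the bitwise OR of its members. The sketch below rebuilds the new safety flags from the values shown in this hunk and exercises the `|=`, `&` and `^=` operations the tokenizer uses on them; the variable names outside the constants are illustrative only:

```python
HAS_TEXT = 1 << 14
FAIL_ON_TEXT = 1 << 15
FAIL_NEXT = 1 << 16
FAIL_ON_LBRACE = 1 << 17
FAIL_ON_RBRACE = 1 << 18
FAIL_ON_EQUALS = 1 << 19

# The group mask is simply every member OR-ed together.
SAFETY_CHECK = (HAS_TEXT | FAIL_ON_TEXT | FAIL_NEXT |
                FAIL_ON_LBRACE | FAIL_ON_RBRACE | FAIL_ON_EQUALS)

context = 0
context |= FAIL_NEXT                         # set one flag
in_safety_group = bool(context & SAFETY_CHECK)  # test against the group
context ^= FAIL_NEXT                         # toggle the flag back off
```

Note that `^=` toggles rather than clears, so it only turns a flag off when that flag is known to be set.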



+ 1512
- 0
mwparserfromhell/parser/tokenizer.c
File diff suppressed because it is too large


+ 285
- 0
mwparserfromhell/parser/tokenizer.h

@@ -0,0 +1,285 @@
/*
Tokenizer Header File for MWParserFromHell
Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/

#ifndef PY_SSIZE_T_CLEAN
#define PY_SSIZE_T_CLEAN
#endif

#include <Python.h>
#include <math.h>
#include <structmember.h>

#if PY_MAJOR_VERSION >= 3
#define IS_PY3K
#endif

#define malloc PyObject_Malloc
#define free PyObject_Free

#define DIGITS "0123456789"
#define HEXDIGITS "0123456789abcdefABCDEF"
#define ALPHANUM "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

static const char* MARKERS[] = {
"{", "}", "[", "]", "<", ">", "|", "=", "&", "#", "*", ";", ":", "/", "-",
"!", "\n", ""};

#define NUM_MARKERS 18
#define TEXTBUFFER_BLOCKSIZE 1024
#define MAX_DEPTH 40
#define MAX_CYCLES 100000
#define MAX_BRACES 255
#define MAX_ENTITY_SIZE 8

static int route_state = 0;
#define BAD_ROUTE (route_state)
#define FAIL_ROUTE() (route_state = 1)
#define RESET_ROUTE() (route_state = 0)

static char** entitydefs;

static PyObject* EMPTY;
static PyObject* NOARGS;
static PyObject* tokens;


/* Tokens */

static PyObject* Text;

static PyObject* TemplateOpen;
static PyObject* TemplateParamSeparator;
static PyObject* TemplateParamEquals;
static PyObject* TemplateClose;

static PyObject* ArgumentOpen;
static PyObject* ArgumentSeparator;
static PyObject* ArgumentClose;

static PyObject* WikilinkOpen;
static PyObject* WikilinkSeparator;
static PyObject* WikilinkClose;

static PyObject* HTMLEntityStart;
static PyObject* HTMLEntityNumeric;
static PyObject* HTMLEntityHex;
static PyObject* HTMLEntityEnd;
static PyObject* HeadingStart;
static PyObject* HeadingEnd;

static PyObject* CommentStart;
static PyObject* CommentEnd;

static PyObject* TagOpenOpen;
static PyObject* TagAttrStart;
static PyObject* TagAttrEquals;
static PyObject* TagAttrQuote;
static PyObject* TagCloseOpen;
static PyObject* TagCloseSelfclose;
static PyObject* TagOpenClose;
static PyObject* TagCloseClose;


/* Local contexts: */

#define LC_TEMPLATE 0x00007
#define LC_TEMPLATE_NAME 0x00001
#define LC_TEMPLATE_PARAM_KEY 0x00002
#define LC_TEMPLATE_PARAM_VALUE 0x00004

#define LC_ARGUMENT 0x00018
#define LC_ARGUMENT_NAME 0x00008
#define LC_ARGUMENT_DEFAULT 0x00010

#define LC_WIKILINK 0x00060
#define LC_WIKILINK_TITLE 0x00020
#define LC_WIKILINK_TEXT 0x00040

#define LC_HEADING 0x01F80
#define LC_HEADING_LEVEL_1 0x00080
#define LC_HEADING_LEVEL_2 0x00100
#define LC_HEADING_LEVEL_3 0x00200
#define LC_HEADING_LEVEL_4 0x00400
#define LC_HEADING_LEVEL_5 0x00800
#define LC_HEADING_LEVEL_6 0x01000

#define LC_COMMENT 0x02000

#define LC_SAFETY_CHECK 0xFC000
#define LC_HAS_TEXT 0x04000
#define LC_FAIL_ON_TEXT 0x08000
#define LC_FAIL_NEXT 0x10000
#define LC_FAIL_ON_LBRACE 0x20000
#define LC_FAIL_ON_RBRACE 0x40000
#define LC_FAIL_ON_EQUALS 0x80000

/* Global contexts: */

#define GL_HEADING 0x1


/* Miscellaneous structs: */

struct Textbuffer {
Py_ssize_t size;
Py_UNICODE* data;
struct Textbuffer* next;
};

struct Stack {
PyObject* stack;
int context;
struct Textbuffer* textbuffer;
struct Stack* next;
};

typedef struct {
PyObject* title;
int level;
} HeadingData;


/* Tokenizer object definition: */

typedef struct {
PyObject_HEAD
PyObject* text; /* text to tokenize */
struct Stack* topstack; /* topmost stack */
Py_ssize_t head; /* current position in text */
Py_ssize_t length; /* length of text */
int global; /* global context */
int depth; /* stack recursion depth */
int cycles; /* total number of stack recursions */
} Tokenizer;


/* Macros for accessing Tokenizer data: */

#define Tokenizer_READ(self, delta) (*PyUnicode_AS_UNICODE(Tokenizer_read(self, delta)))
#define Tokenizer_CAN_RECURSE(self) (self->depth < MAX_DEPTH && self->cycles < MAX_CYCLES)


/* Function prototypes: */

static int heading_level_from_context(int);
static PyObject* Tokenizer_new(PyTypeObject*, PyObject*, PyObject*);
static struct Textbuffer* Textbuffer_new(void);
static void Tokenizer_dealloc(Tokenizer*);
static void Textbuffer_dealloc(struct Textbuffer*);
static int Tokenizer_init(Tokenizer*, PyObject*, PyObject*);
static int Tokenizer_push(Tokenizer*, int);
static PyObject* Textbuffer_render(struct Textbuffer*);
static int Tokenizer_push_textbuffer(Tokenizer*);
static void Tokenizer_delete_top_of_stack(Tokenizer*);
static PyObject* Tokenizer_pop(Tokenizer*);
static PyObject* Tokenizer_pop_keeping_context(Tokenizer*);
static void* Tokenizer_fail_route(Tokenizer*);
static int Tokenizer_write(Tokenizer*, PyObject*);
static int Tokenizer_write_first(Tokenizer*, PyObject*);
static int Tokenizer_write_text(Tokenizer*, Py_UNICODE);
static int Tokenizer_write_all(Tokenizer*, PyObject*);
static int Tokenizer_write_text_then_stack(Tokenizer*, const char*);
static PyObject* Tokenizer_read(Tokenizer*, Py_ssize_t);
static PyObject* Tokenizer_read_backwards(Tokenizer*, Py_ssize_t);
static int Tokenizer_parse_template_or_argument(Tokenizer*);
static int Tokenizer_parse_template(Tokenizer*);
static int Tokenizer_parse_argument(Tokenizer*);
static int Tokenizer_handle_template_param(Tokenizer*);
static int Tokenizer_handle_template_param_value(Tokenizer*);
static PyObject* Tokenizer_handle_template_end(Tokenizer*);
static int Tokenizer_handle_argument_separator(Tokenizer*);
static PyObject* Tokenizer_handle_argument_end(Tokenizer*);
static int Tokenizer_parse_wikilink(Tokenizer*);
static int Tokenizer_handle_wikilink_separator(Tokenizer*);
static PyObject* Tokenizer_handle_wikilink_end(Tokenizer*);
static int Tokenizer_parse_heading(Tokenizer*);
static HeadingData* Tokenizer_handle_heading_end(Tokenizer*);
static int Tokenizer_really_parse_entity(Tokenizer*);
static int Tokenizer_parse_entity(Tokenizer*);
static int Tokenizer_parse_comment(Tokenizer*);
static int Tokenizer_verify_safe(Tokenizer*, int, Py_UNICODE);
static PyObject* Tokenizer_parse(Tokenizer*, int);
static PyObject* Tokenizer_tokenize(Tokenizer*, PyObject*);


/* More structs for creating the Tokenizer type: */

static PyMethodDef
Tokenizer_methods[] = {
{"tokenize", (PyCFunction) Tokenizer_tokenize, METH_VARARGS,
"Build a list of tokens from a string of wikicode and return it."},
{NULL}
};

static PyMemberDef
Tokenizer_members[] = {
{NULL}
};

static PyMethodDef
module_methods[] = {
{NULL}
};

static PyTypeObject
TokenizerType = {
PyObject_HEAD_INIT(NULL)
0, /* ob_size */
"_tokenizer.CTokenizer", /* tp_name */
sizeof(Tokenizer), /* tp_basicsize */
0, /* tp_itemsize */
(destructor) Tokenizer_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_compare */
0, /* tp_repr */
0, /* tp_as_number */
0, /* tp_as_sequence */
0, /* tp_as_mapping */
0, /* tp_hash */
0, /* tp_call */
0, /* tp_str */
0, /* tp_getattro */
0, /* tp_setattro */
0, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT, /* tp_flags */
"Creates a list of tokens from a string of wikicode.", /* tp_doc */
0, /* tp_traverse */
0, /* tp_clear */
0, /* tp_richcompare */
0, /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
Tokenizer_methods, /* tp_methods */
Tokenizer_members, /* tp_members */
0, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
(initproc) Tokenizer_init, /* tp_init */
0, /* tp_alloc */
Tokenizer_new, /* tp_new */
};

+ 102
- 50
mwparserfromhell/parser/tokenizer.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -23,7 +23,6 @@
from __future__ import unicode_literals
from math import log
import re
import string

from . import contexts
from . import tokens
@@ -38,10 +37,13 @@ class BadRoute(Exception):

class Tokenizer(object):
"""Creates a list of tokens from a string of wikicode."""
USES_C = False
START = object()
END = object()
MARKERS = ["{", "}", "[", "]", "<", ">", "|", "=", "&", "#", "*", ";", ":",
"/", "-", "!", "\n", END]
MAX_DEPTH = 40
MAX_CYCLES = 100000
regex = re.compile(r"([{}\[\]<>|=&#*;:/\-!\n])", flags=re.IGNORECASE)

def __init__(self):
@@ -49,6 +51,8 @@ class Tokenizer(object):
self._head = 0
self._stacks = []
self._global = 0
self._depth = 0
self._cycles = 0

@property
def _stack(self):
@@ -76,6 +80,8 @@ class Tokenizer(object):
def _push(self, context=0):
"""Add a new token stack, context, and textbuffer to the list."""
self._stacks.append([[], context, []])
self._depth += 1
self._cycles += 1

def _push_textbuffer(self):
"""Push the textbuffer onto the stack as a Text node and clear it."""
@@ -86,10 +92,11 @@ class Tokenizer(object):
def _pop(self, keep_context=False):
"""Pop the current stack/context/textbuffer, returing the stack.

If *keep_context is ``True``, then we will replace the underlying
If *keep_context* is ``True``, then we will replace the underlying
stack's context with the current stack's.
"""
self._push_textbuffer()
self._depth -= 1
if keep_context:
context = self._context
stack = self._stacks.pop()[0]
@@ -97,6 +104,10 @@ class Tokenizer(object):
return stack
return self._stacks.pop()[0]

def _can_recurse(self):
"""Return whether we can still recurse (neither limit has been reached)."""
return self._depth < self.MAX_DEPTH and self._cycles < self.MAX_CYCLES
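
Note on `_can_recurse` above: `_push` increments both counters, `_pop` decrements only `_depth`, so `_depth` tracks current nesting while `_cycles` counts every push made during the run. A small sketch of that bookkeeping, with deliberately tiny limits and hypothetical names (`DepthGuard`, `nest`) chosen for illustration:

```python
class DepthGuard(object):
    MAX_DEPTH = 3    # much smaller than the tokenizer's limits
    MAX_CYCLES = 10

    def __init__(self):
        self.depth = 0    # current nesting level
        self.cycles = 0   # total pushes so far, never decremented

    def push(self):
        self.depth += 1
        self.cycles += 1

    def pop(self):
        self.depth -= 1

    def can_recurse(self):
        return self.depth < self.MAX_DEPTH and self.cycles < self.MAX_CYCLES


def nest(guard, levels):
    """Try to open `levels` nested scopes; return how many were opened."""
    if levels == 0 or not guard.can_recurse():
        return 0
    guard.push()
    opened = 1 + nest(guard, levels - 1)
    guard.pop()
    return opened


guard = DepthGuard()
opened = nest(guard, 5)
```

When the guard refuses, the caller is expected to fall back to something cheap; in the tokenizer hunks further down, that fallback is writing the literal `{` or `[` as text.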

def _fail_route(self):
"""Fail the current tokenization route.

@@ -162,8 +173,8 @@ class Tokenizer(object):
self._head += 2
braces = 2
while self._read() == "{":
braces += 1
self._head += 1
braces += 1
self._push()

while braces:
@@ -197,10 +208,9 @@ class Tokenizer(object):
except BadRoute:
self._head = reset
raise
else:
self._write_first(tokens.TemplateOpen())
self._write_all(template)
self._write(tokens.TemplateClose())
self._write_first(tokens.TemplateOpen())
self._write_all(template)
self._write(tokens.TemplateClose())

def _parse_argument(self):
"""Parse an argument at the head of the wikicode string."""
@@ -210,29 +220,13 @@ class Tokenizer(object):
except BadRoute:
self._head = reset
raise
else:
self._write_first(tokens.ArgumentOpen())
self._write_all(argument)
self._write(tokens.ArgumentClose())

def _verify_safe(self, unsafes):
"""Verify that there are no unsafe characters in the current stack.

The route will be failed if the name contains any element of *unsafes*
in it (not merely at the beginning or end). This is used when parsing a
template name or parameter key, which cannot contain newlines.
"""
self._push_textbuffer()
if self._stack:
text = [tok for tok in self._stack if isinstance(tok, tokens.Text)]
text = "".join([token.text for token in text]).strip()
if text and any([unsafe in text for unsafe in unsafes]):
self._fail_route()
self._write_first(tokens.ArgumentOpen())
self._write_all(argument)
self._write(tokens.ArgumentClose())

def _handle_template_param(self):
"""Handle a template parameter at the head of the string."""
if self._context & contexts.TEMPLATE_NAME:
self._verify_safe(["\n", "{", "}", "[", "]"])
self._context ^= contexts.TEMPLATE_NAME
elif self._context & contexts.TEMPLATE_PARAM_VALUE:
self._context ^= contexts.TEMPLATE_PARAM_VALUE
@@ -244,37 +238,26 @@ class Tokenizer(object):

def _handle_template_param_value(self):
"""Handle a template parameter's value at the head of the string."""
try:
self._verify_safe(["\n", "{{", "}}"])
except BadRoute:
self._pop()
raise
else:
self._write_all(self._pop(keep_context=True))
self._write_all(self._pop(keep_context=True))
self._context ^= contexts.TEMPLATE_PARAM_KEY
self._context |= contexts.TEMPLATE_PARAM_VALUE
self._write(tokens.TemplateParamEquals())

def _handle_template_end(self):
"""Handle the end of a template at the head of the string."""
if self._context & contexts.TEMPLATE_NAME:
self._verify_safe(["\n", "{", "}", "[", "]"])
elif self._context & contexts.TEMPLATE_PARAM_KEY:
if self._context & contexts.TEMPLATE_PARAM_KEY:
self._write_all(self._pop(keep_context=True))
self._head += 1
return self._pop()

def _handle_argument_separator(self):
"""Handle the separator between an argument's name and default."""
self._verify_safe(["\n", "{{", "}}"])
self._context ^= contexts.ARGUMENT_NAME
self._context |= contexts.ARGUMENT_DEFAULT
self._write(tokens.ArgumentSeparator())

def _handle_argument_end(self):
"""Handle the end of an argument at the head of the string."""
if self._context & contexts.ARGUMENT_NAME:
self._verify_safe(["\n", "{{", "}}"])
self._head += 2
return self._pop()

@@ -294,15 +277,12 @@ class Tokenizer(object):

def _handle_wikilink_separator(self):
"""Handle the separator between a wikilink's title and its text."""
self._verify_safe(["\n", "{", "}", "[", "]"])
self._context ^= contexts.WIKILINK_TITLE
self._context |= contexts.WIKILINK_TEXT
self._write(tokens.WikilinkSeparator())

def _handle_wikilink_end(self):
"""Handle the end of a wikilink at the head of the string."""
if self._context & contexts.WIKILINK_TITLE:
self._verify_safe(["\n", "{", "}", "[", "]"])
self._head += 1
return self._pop()

@@ -342,14 +322,14 @@ class Tokenizer(object):
current = int(log(self._context / contexts.HEADING_LEVEL_1, 2)) + 1
level = min(current, min(best, 6))

try:
try: # Try to check for a heading closure after this one
after, after_level = self._parse(self._context)
except BadRoute:
if level < best:
self._write_text("=" * (best - level))
self._head = reset + best - 1
return self._pop(), level
else:
else: # Found another closure
self._write_text("=" * best)
self._write_all(after)
return self._pop(), after_level
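
Note on the heading hunk above: `current` is derived from which `HEADING_LEVEL_n` bit is set in the context. A sketch of the same mapping using integer arithmetic instead of `log`, with the two constants copied from `contexts.py` in this commit; the function name and the masking step are assumptions added for the example:

```python
HEADING = 0b00000001111110000000
HEADING_LEVEL_1 = 0b00000000000010000000


def heading_level(context):
    """Map a context containing one HEADING_LEVEL_n bit to n (1 to 6)."""
    bits = context & HEADING  # ignore unrelated context flags
    if not bits:
        raise ValueError("context has no heading level bit set")
    # Dividing by the level-1 bit leaves 1, 2, 4, ... for levels 1, 2, 3, ...
    # and bit_length() turns that power of two into its 1-based position.
    return (bits // HEADING_LEVEL_1).bit_length()


level_three = heading_level(HEADING_LEVEL_1 << 2)
```

This avoids floating-point division, which can be convenient when the same code must behave identically on Python 2 and 3.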
@@ -376,9 +356,9 @@ class Tokenizer(object):
else:
numeric = hexadecimal = False

valid = string.hexdigits if hexadecimal else string.digits
valid = "0123456789abcdefABCDEF" if hexadecimal else "0123456789"
if not numeric and not hexadecimal:
valid += string.ascii_letters
valid += "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
if not all([char in valid for char in this]):
self._fail_route()

@@ -423,18 +403,83 @@ class Tokenizer(object):
self._write(tokens.CommentEnd())
self._head += 2

def _verify_safe(self, this):
"""Make sure we are not trying to write an invalid character."""
context = self._context
if context & contexts.FAIL_NEXT:
return False
if context & contexts.WIKILINK_TITLE:
if this == "]" or this == "{":
self._context |= contexts.FAIL_NEXT
elif this == "\n" or this == "[" or this == "}":
return False
return True
if context & contexts.TEMPLATE_NAME:
if this == "{" or this == "}" or this == "[":
self._context |= contexts.FAIL_NEXT
return True
if this == "]":
return False
if this == "|":
return True
if context & contexts.HAS_TEXT:
if context & contexts.FAIL_ON_TEXT:
if this is self.END or not this.isspace():
return False
else:
if this == "\n":
self._context |= contexts.FAIL_ON_TEXT
elif this is self.END or not this.isspace():
self._context |= contexts.HAS_TEXT
return True
else:
if context & contexts.FAIL_ON_EQUALS:
if this == "=":
return False
elif context & contexts.FAIL_ON_LBRACE:
if this == "{" or (self._read(-1) == self._read(-2) == "{"):
if context & contexts.TEMPLATE:
self._context |= contexts.FAIL_ON_EQUALS
else:
self._context |= contexts.FAIL_NEXT
return True
self._context ^= contexts.FAIL_ON_LBRACE
elif context & contexts.FAIL_ON_RBRACE:
if this == "}":
if context & contexts.TEMPLATE:
self._context |= contexts.FAIL_ON_EQUALS
else:
self._context |= contexts.FAIL_NEXT
return True
self._context ^= contexts.FAIL_ON_RBRACE
elif this == "{":
self._context |= contexts.FAIL_ON_LBRACE
elif this == "}":
self._context |= contexts.FAIL_ON_RBRACE
return True

def _parse(self, context=0):
"""Parse the wikicode string, using *context* for when to stop."""
self._push(context)
while True:
this = self._read()
unsafe = (contexts.TEMPLATE_NAME | contexts.WIKILINK_TITLE |
contexts.TEMPLATE_PARAM_KEY | contexts.ARGUMENT_NAME)
if self._context & unsafe:
if not self._verify_safe(this):
if self._context & contexts.TEMPLATE_PARAM_KEY:
self._pop()
self._fail_route()
if this not in self.MARKERS:
self._write_text(this)
self._head += 1
continue
if this is self.END:
fail = (contexts.TEMPLATE | contexts.ARGUMENT |
contexts.HEADING | contexts.COMMENT)
contexts.WIKILINK | contexts.HEADING |
contexts.COMMENT)
if self._context & contexts.TEMPLATE_PARAM_KEY:
self._pop()
if self._context & fail:
self._fail_route()
return self._pop()
@@ -445,7 +490,12 @@ class Tokenizer(object):
else:
self._write_text(this)
elif this == next == "{":
self._parse_template_or_argument()
if self._can_recurse():
self._parse_template_or_argument()
if self._context & contexts.FAIL_NEXT:
self._context ^= contexts.FAIL_NEXT
else:
self._write_text("{")
elif this == "|" and self._context & contexts.TEMPLATE:
self._handle_template_param()
elif this == "=" and self._context & contexts.TEMPLATE_PARAM_KEY:
@@ -460,8 +510,10 @@ class Tokenizer(object):
else:
self._write_text("}")
elif this == next == "[":
if not self._context & contexts.WIKILINK_TITLE:
if not self._context & contexts.WIKILINK_TITLE and self._can_recurse():
self._parse_wikilink()
if self._context & contexts.FAIL_NEXT:
self._context ^= contexts.FAIL_NEXT
else:
self._write_text("[")
elif this == "|" and self._context & contexts.WIKILINK_TITLE:


+ 1
- 1
mwparserfromhell/parser/tokens.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal


+ 134
- 45
mwparserfromhell/smart_list.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -41,8 +41,23 @@ def inheritdoc(method):
method.__doc__ = getattr(list, method.__name__).__doc__
return method

class _SliceNormalizerMixIn(object):
"""MixIn that provides a private method to normalize slices."""

class SmartList(list):
def _normalize_slice(self, key):
"""Return a slice equivalent to the input *key*, standardized."""
if key.start is not None:
start = (len(self) + key.start) if key.start < 0 else key.start
else:
start = 0
if key.stop is not None:
stop = (len(self) + key.stop) if key.stop < 0 else key.stop
else:
stop = maxsize
return slice(start, stop, key.step or 1)
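
Note on `_normalize_slice` above: it rewrites a slice so that `start` and `stop` are non-negative, an open end becomes `maxsize`, and a missing step becomes `1`. A standalone sketch of the same logic, taking the list length as a parameter instead of calling `len(self)` (the free-function form is an adaptation for the example):

```python
from sys import maxsize


def normalize_slice(length, key):
    """Return a slice equivalent to *key* with explicit, non-negative bounds."""
    if key.start is not None:
        start = (length + key.start) if key.start < 0 else key.start
    else:
        start = 0
    if key.stop is not None:
        stop = (length + key.stop) if key.stop < 0 else key.stop
    else:
        stop = maxsize  # sentinel meaning "to the end of the list"
    return slice(start, stop, key.step or 1)


tail = normalize_slice(5, slice(-2, None))
evens = normalize_slice(5, slice(None, -1, 2))
```

Using `maxsize` as the open-end sentinel is why several later hunks compare `stop` against `maxsize` before adjusting it.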


class SmartList(_SliceNormalizerMixIn, list):
"""Implements the ``list`` interface with special handling of sublists.

When a sublist is created (by ``list[i:j]``), any changes made to this
@@ -76,7 +91,8 @@ class SmartList(list):
def __getitem__(self, key):
if not isinstance(key, slice):
return super(SmartList, self).__getitem__(key)
sliceinfo = [key.start, key.stop, 1 if not key.step else key.step]
key = self._normalize_slice(key)
sliceinfo = [key.start, key.stop, key.step]
child = _ListProxy(self, sliceinfo)
self._children[id(child)] = (child, sliceinfo)
return child
@@ -86,25 +102,28 @@ class SmartList(list):
return super(SmartList, self).__setitem__(key, item)
item = list(item)
super(SmartList, self).__setitem__(key, item)
diff = len(item) - key.stop + key.start
key = self._normalize_slice(key)
diff = len(item) + (key.start - key.stop) // key.step
values = self._children.values if py3k else self._children.itervalues
if diff:
for child, (start, stop, step) in values():
if start >= key.stop:
if start > key.stop:
self._children[id(child)][1][0] += diff
if stop >= key.stop and stop != maxsize:
self._children[id(child)][1][1] += diff

def __delitem__(self, key):
super(SmartList, self).__delitem__(key)
if not isinstance(key, slice):
key = slice(key, key + 1)
diff = key.stop - key.start
if isinstance(key, slice):
key = self._normalize_slice(key)
else:
key = slice(key, key + 1, 1)
diff = (key.stop - key.start) // key.step
values = self._children.values if py3k else self._children.itervalues
for child, (start, stop, step) in values():
if start > key.start:
self._children[id(child)][1][0] -= diff
if stop >= key.stop:
if stop >= key.stop and stop != maxsize:
self._children[id(child)][1][1] -= diff

if not py3k:
@@ -160,24 +179,35 @@ class SmartList(list):
child._parent = copy
super(SmartList, self).reverse()

@inheritdoc
def sort(self, cmp=None, key=None, reverse=None):
copy = list(self)
for child in self._children:
child._parent = copy
if cmp is not None:
if py3k:
@inheritdoc
def sort(self, key=None, reverse=None):
copy = list(self)
for child in self._children:
child._parent = copy
kwargs = {}
if key is not None:
if reverse is not None:
super(SmartList, self).sort(cmp, key, reverse)
else:
super(SmartList, self).sort(cmp, key)
else:
super(SmartList, self).sort(cmp)
else:
super(SmartList, self).sort()
kwargs["key"] = key
if reverse is not None:
kwargs["reverse"] = reverse
super(SmartList, self).sort(**kwargs)
else:
@inheritdoc
def sort(self, cmp=None, key=None, reverse=None):
copy = list(self)
for child in self._children:
child._parent = copy
kwargs = {}
if cmp is not None:
kwargs["cmp"] = cmp
if key is not None:
kwargs["key"] = key
if reverse is not None:
kwargs["reverse"] = reverse
super(SmartList, self).sort(**kwargs)
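
Note on the `sort` rewrite above: the nested `if` ladder is replaced by building a `kwargs` dict that contains only the arguments the caller actually supplied. A minimal sketch of that forwarding pattern on a plain list, covering the Python 3 signature only (the helper name is hypothetical):

```python
def forward_sort(items, key=None, reverse=None):
    """Sort *items* in place, forwarding only the options that were given."""
    kwargs = {}
    if key is not None:
        kwargs["key"] = key
    if reverse is not None:
        kwargs["reverse"] = reverse
    # list.sort() applies its own defaults for anything left out.
    items.sort(**kwargs)
    return items


plain = forward_sort([3, 1, 2])
folded = forward_sort(["b", "A", "c"], key=str.lower, reverse=True)
```

The same shape extends to the Python 2 branch by adding a `cmp` entry, as the hunk does.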


class _ListProxy(list):
class _ListProxy(_SliceNormalizerMixIn, list):
"""Implement the ``list`` interface by getting elements from a parent.

This is created by a :py:class:`~.SmartList` object when slicing. It does
@@ -231,25 +261,52 @@ class _ListProxy(list):
return bool(self._render())

def __len__(self):
return (self._stop - self._start) / self._step
return (self._stop - self._start) // self._step

def __getitem__(self, key):
return self._render()[key]
if isinstance(key, slice):
key = self._normalize_slice(key)
if key.stop == maxsize:
keystop = self._stop
else:
keystop = key.stop + self._start
adjusted = slice(key.start + self._start, keystop, key.step)
return self._parent[adjusted]
else:
return self._render()[key]

def __setitem__(self, key, item):
if isinstance(key, slice):
adjusted = slice(key.start + self._start, key.stop + self._stop,
key.step)
key = self._normalize_slice(key)
if key.stop == maxsize:
keystop = self._stop
else:
keystop = key.stop + self._start
adjusted = slice(key.start + self._start, keystop, key.step)
self._parent[adjusted] = item
else:
length = len(self)
if key < 0:
key = length + key
if key < 0 or key >= length:
raise IndexError("list assignment index out of range")
self._parent[self._start + key] = item
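
Note on the integer branch above: a proxy-relative index is first normalized (negative values count from the proxy's end), then bounds-checked against the proxy's own length, and only then offset by `_start` into the parent. A sketch of that translation as a free function; the name `resolve_index` and its parameters are assumptions for illustration:

```python
def resolve_index(start, length, key):
    """Translate index *key* within a window into an index in the parent.

    *start* is the window's first parent index, *length* its size.
    """
    if key < 0:
        key = length + key
    if key < 0 or key >= length:
        # Without this check, an out-of-window index could silently
        # touch a parent element that lies outside the window.
        raise IndexError("list assignment index out of range")
    return start + key


first = resolve_index(2, 3, 0)
last = resolve_index(2, 3, -1)
```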

def __delitem__(self, key):
if isinstance(key, slice):
adjusted = slice(key.start + self._start, key.stop + self._stop,
key.step)
key = self._normalize_slice(key)
if key.stop == maxsize:
keystop = self._stop
else:
keystop = key.stop + self._start
adjusted = slice(key.start + self._start, keystop, key.step)
del self._parent[adjusted]
else:
length = len(self)
if key < 0:
key = length + key
if key < 0 or key >= length:
raise IndexError("list assignment index out of range")
del self._parent[self._start + key]

def __iter__(self):
@@ -287,6 +344,16 @@ class _ListProxy(list):
self.extend(other)
return self

def __mul__(self, other):
return SmartList(list(self) * other)

def __rmul__(self, other):
return SmartList(other * list(self))

def __imul__(self, other):
self.extend(list(self) * (other - 1))
return self

@property
def _start(self):
"""The starting index of this list, inclusive."""
@@ -295,6 +362,8 @@ class _ListProxy(list):
@property
def _stop(self):
"""The ending index of this list, exclusive."""
if self._sliceinfo[1] == maxsize:
return len(self._parent)
return self._sliceinfo[1]

@property
@@ -328,18 +397,25 @@ class _ListProxy(list):

@inheritdoc
def insert(self, index, item):
if index < 0:
index = len(self) + index
self._parent.insert(self._start + index, item)

@inheritdoc
def pop(self, index=None):
length = len(self)
if index is None:
index = len(self) - 1
index = length - 1
elif index < 0:
index = length + index
if index < 0 or index >= length:
raise IndexError("pop index out of range")
return self._parent.pop(self._start + index)

@inheritdoc
def remove(self, item):
index = self.index(item)
del self._parent[index]
del self._parent[self._start + index]

@inheritdoc
def reverse(self):
@@ -347,17 +423,30 @@ class _ListProxy(list):
item.reverse()
self._parent[self._start:self._stop:self._step] = item

@inheritdoc
def sort(self, cmp=None, key=None, reverse=None):
item = self._render()
if cmp is not None:
if py3k:
@inheritdoc
def sort(self, key=None, reverse=None):
item = self._render()
kwargs = {}
if key is not None:
if reverse is not None:
item.sort(cmp, key, reverse)
else:
item.sort(cmp, key)
else:
item.sort(cmp)
else:
item.sort()
self._parent[self._start:self._stop:self._step] = item
kwargs["key"] = key
if reverse is not None:
kwargs["reverse"] = reverse
item.sort(**kwargs)
self._parent[self._start:self._stop:self._step] = item
else:
@inheritdoc
def sort(self, cmp=None, key=None, reverse=None):
item = self._render()
kwargs = {}
if cmp is not None:
kwargs["cmp"] = cmp
if key is not None:
kwargs["key"] = key
if reverse is not None:
kwargs["reverse"] = reverse
item.sort(**kwargs)
self._parent[self._start:self._stop:self._step] = item


del inheritdoc

+ 100
- 18
mwparserfromhell/string_mixin.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -40,7 +40,6 @@ def inheritdoc(method):
method.__doc__ = getattr(str, method.__name__).__doc__
return method


class StringMixIn(object):
"""Implement the interface for ``unicode``/``str`` in a dynamic manner.

@@ -114,6 +113,9 @@ class StringMixIn(object):
def __getitem__(self, key):
return self.__unicode__()[key]

def __reversed__(self):
return reversed(self.__unicode__())

def __contains__(self, item):
if isinstance(item, StringMixIn):
return str(item) in self.__unicode__()
@@ -123,22 +125,39 @@ class StringMixIn(object):
def capitalize(self):
return self.__unicode__().capitalize()

if py3k:
@inheritdoc
def casefold(self):
return self.__unicode__().casefold()

@inheritdoc
def center(self, width, fillchar=None):
if fillchar is None:
return self.__unicode__().center(width)
return self.__unicode__().center(width, fillchar)

@inheritdoc
def count(self, sub=None, start=None, end=None):
def count(self, sub, start=None, end=None):
return self.__unicode__().count(sub, start, end)

if not py3k:
@inheritdoc
def decode(self, encoding=None, errors=None):
return self.__unicode__().decode(encoding, errors)
kwargs = {}
if encoding is not None:
kwargs["encoding"] = encoding
if errors is not None:
kwargs["errors"] = errors
return self.__unicode__().decode(**kwargs)

@inheritdoc
def encode(self, encoding=None, errors=None):
return self.__unicode__().encode(encoding, errors)
kwargs = {}
if encoding is not None:
kwargs["encoding"] = encoding
if errors is not None:
kwargs["errors"] = errors
return self.__unicode__().encode(**kwargs)

@inheritdoc
def endswith(self, prefix, start=None, end=None):
@@ -146,18 +165,25 @@ class StringMixIn(object):

@inheritdoc
def expandtabs(self, tabsize=None):
if tabsize is None:
return self.__unicode__().expandtabs()
return self.__unicode__().expandtabs(tabsize)

@inheritdoc
def find(self, sub=None, start=None, end=None):
def find(self, sub, start=None, end=None):
return self.__unicode__().find(sub, start, end)

@inheritdoc
def format(self, *args, **kwargs):
return self.__unicode__().format(*args, **kwargs)

if py3k:
@inheritdoc
def format_map(self, mapping):
return self.__unicode__().format_map(mapping)

@inheritdoc
def index(self, sub=None, start=None, end=None):
def index(self, sub, start=None, end=None):
return self.__unicode__().index(sub, start, end)

@inheritdoc
@@ -176,6 +202,11 @@ class StringMixIn(object):
def isdigit(self):
return self.__unicode__().isdigit()

if py3k:
@inheritdoc
def isidentifier(self):
return self.__unicode__().isidentifier()

@inheritdoc
def islower(self):
return self.__unicode__().islower()
@@ -184,6 +215,11 @@ class StringMixIn(object):
def isnumeric(self):
return self.__unicode__().isnumeric()

if py3k:
@inheritdoc
def isprintable(self):
return self.__unicode__().isprintable()

@inheritdoc
def isspace(self):
return self.__unicode__().isspace()
@@ -202,6 +238,8 @@ class StringMixIn(object):

@inheritdoc
def ljust(self, width, fillchar=None):
if fillchar is None:
return self.__unicode__().ljust(width)
return self.__unicode__().ljust(width, fillchar)

@inheritdoc
@@ -212,44 +250,88 @@ class StringMixIn(object):
def lstrip(self, chars=None):
return self.__unicode__().lstrip(chars)

if py3k:
@staticmethod
@inheritdoc
def maketrans(x, y=None, z=None):
if z is None:
if y is None:
return str.maketrans(x)
return str.maketrans(x, y)
return str.maketrans(x, y, z)

@inheritdoc
def partition(self, sep):
return self.__unicode__().partition(sep)

@inheritdoc
def replace(self, old, new, count):
def replace(self, old, new, count=None):
if count is None:
return self.__unicode__().replace(old, new)
return self.__unicode__().replace(old, new, count)

@inheritdoc
def rfind(self, sub=None, start=None, end=None):
def rfind(self, sub, start=None, end=None):
return self.__unicode__().rfind(sub, start, end)

@inheritdoc
def rindex(self, sub=None, start=None, end=None):
def rindex(self, sub, start=None, end=None):
return self.__unicode__().rindex(sub, start, end)

@inheritdoc
def rjust(self, width, fillchar=None):
if fillchar is None:
return self.__unicode__().rjust(width)
return self.__unicode__().rjust(width, fillchar)

@inheritdoc
def rpartition(self, sep):
return self.__unicode__().rpartition(sep)

@inheritdoc
def rsplit(self, sep=None, maxsplit=None):
return self.__unicode__().rsplit(sep, maxsplit)
if py3k:
@inheritdoc
def rsplit(self, sep=None, maxsplit=None):
kwargs = {}
if sep is not None:
kwargs["sep"] = sep
if maxsplit is not None:
kwargs["maxsplit"] = maxsplit
return self.__unicode__().rsplit(**kwargs)
else:
@inheritdoc
def rsplit(self, sep=None, maxsplit=None):
if maxsplit is None:
if sep is None:
return self.__unicode__().rsplit()
return self.__unicode__().rsplit(sep)
return self.__unicode__().rsplit(sep, maxsplit)

@inheritdoc
def rstrip(self, chars=None):
return self.__unicode__().rstrip(chars)

@inheritdoc
def split(self, sep=None, maxsplit=None):
return self.__unicode__().split(sep, maxsplit)
if py3k:
@inheritdoc
def split(self, sep=None, maxsplit=None):
kwargs = {}
if sep is not None:
kwargs["sep"] = sep
if maxsplit is not None:
kwargs["maxsplit"] = maxsplit
return self.__unicode__().split(**kwargs)
else:
@inheritdoc
def split(self, sep=None, maxsplit=None):
if maxsplit is None:
if sep is None:
return self.__unicode__().split()
return self.__unicode__().split(sep)
return self.__unicode__().split(sep, maxsplit)

@inheritdoc
def splitlines(self, keepends=None):
if keepends is None:
return self.__unicode__().splitlines()
return self.__unicode__().splitlines(keepends)

@inheritdoc
@@ -269,8 +351,8 @@ class StringMixIn(object):
return self.__unicode__().title()

@inheritdoc
def translate(self, table, deletechars=None):
return self.__unicode__().translate(table, deletechars)
def translate(self, table):
return self.__unicode__().translate(table)

@inheritdoc
def upper(self):
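
Note on the wrappers in this file: several of them now forward an optional argument only when the caller supplied it, since passing an explicit `None` is not always equivalent to omitting the argument. A small sketch of that pattern on a plain `str`, assuming Python 3; `call_without_nones` is a hypothetical helper, not part of the library.

```python
def call_without_nones(method, *args, **kwargs):
    """Call *method*, dropping keyword arguments whose value is None."""
    supplied = {k: v for k, v in kwargs.items() if v is not None}
    return method(*args, **supplied)


text = "a,b,c"
# maxsplit=None is dropped, so str.split falls back to its own default.
parts_all = call_without_nones(text.split, sep=",", maxsplit=None)
parts_one = call_without_nones(text.split, sep=",", maxsplit=1)
# Neither argument is forwarded, so str.encode uses its defaults.
encoded = call_without_nones(text.encode, encoding=None, errors=None)
```

Calling `text.split(",", None)` directly raises `TypeError` on Python 3, which is why the argument is left out rather than passed through.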


+ 8
- 8
mwparserfromhell/utils.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -34,16 +34,16 @@ from .smart_list import SmartList
def parse_anything(value):
"""Return a :py:class:`~.Wikicode` for *value*, allowing multiple types.

This differs from :py:func:`mwparserfromhell.parse` in that we accept more
than just a string to be parsed. Unicode objects (strings in py3k), strings
(bytes in py3k), integers (converted to strings), ``None``, existing
This differs from :py:meth:`.Parser.parse` in that we accept more than just
a string to be parsed. Unicode objects (strings in py3k), strings (bytes in
py3k), integers (converted to strings), ``None``, existing
:py:class:`~.Node` or :py:class:`~.Wikicode` objects, as well as an
iterable of these types, are supported. This is used to parse input
on-the-fly by various methods of :py:class:`~.Wikicode` and others like
:py:class:`~.Template`, such as :py:meth:`wikicode.insert()
<.Wikicode.insert>` or setting :py:meth:`template.name <.Template.name>`.
"""
from . import parse
from .parser import Parser
from .wikicode import Wikicode

if isinstance(value, Wikicode):
@@ -51,11 +51,11 @@ def parse_anything(value):
elif isinstance(value, Node):
return Wikicode(SmartList([value]))
elif isinstance(value, str):
return parse(value)
return Parser(value).parse()
elif isinstance(value, bytes):
return parse(value.decode("utf8"))
return Parser(value.decode("utf8")).parse()
elif isinstance(value, int):
return parse(str(value))
return Parser(str(value)).parse()
elif value is None:
return Wikicode(SmartList())
try:


+ 78
- 102
mwparserfromhell/wikicode.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -23,8 +23,9 @@
from __future__ import unicode_literals
import re

from .compat import maxsize, str
from .nodes import Heading, Node, Tag, Template, Text, Wikilink
from .compat import maxsize, py3k, str
from .nodes import (Argument, Comment, Heading, HTMLEntity, Node, Tag,
Template, Text, Wikilink)
from .string_mixin import StringMixIn
from .utils import parse_anything

@@ -68,7 +69,7 @@ class Wikicode(StringMixIn):
Raises ``ValueError`` if *obj* is not within *node*.
"""
for context, child in node.__iternodes__(self._get_all_nodes):
if child is obj:
if self._is_equivalent(obj, child):
return context
raise ValueError(obj)

@@ -88,13 +89,7 @@ class Wikicode(StringMixIn):
If *obj* is a ``Node``, the function will test whether they are the
same object, otherwise it will compare them with ``==``.
"""
if isinstance(obj, Node):
if node is obj:
return True
else:
if node == obj:
return True
return False
return (node is obj) if isinstance(obj, Node) else (node == obj)

def _contains(self, nodes, obj):
"""Return ``True`` if *obj* is inside of *nodes*, else ``False``.
@@ -157,6 +152,36 @@ class Wikicode(StringMixIn):
node.__showtree__(write, get, mark)
return lines

@classmethod
def _build_filter_methods(cls, **meths):
"""Given Node types, build the corresponding i?filter shortcuts.

They should be given as keys storing the method's base name paired
with values storing the corresponding :py:class:`~.Node` type. For
example, the dict may contain the pair ``("templates", Template)``,
which will produce the methods :py:meth:`ifilter_templates` and
:py:meth:`filter_templates`, which are shortcuts for
:py:meth:`ifilter(forcetype=Template) <ifilter>` and
:py:meth:`filter(forcetype=Template) <filter>`, respectively. These
shortcuts are added to the class itself, with an appropriate docstring.
"""
doc = """Iterate over {0}.

This is equivalent to :py:meth:`{1}` with *forcetype* set to
:py:class:`~{2.__module__}.{2.__name__}`.
"""
make_ifilter = lambda ftype: (lambda self, **kw:
self.ifilter(forcetype=ftype, **kw))
make_filter = lambda ftype: (lambda self, **kw:
self.filter(forcetype=ftype, **kw))
for name, ftype in (meths.items() if py3k else meths.iteritems()):
ifilter = make_ifilter(ftype)
filter = make_filter(ftype)
ifilter.__doc__ = doc.format(name, "ifilter", ftype)
filter.__doc__ = doc.format(name, "filter", ftype)
setattr(cls, "ifilter_" + name, ifilter)
setattr(cls, "filter_" + name, filter)

@property
def nodes(self):
"""A list of :py:class:`~.Node` objects.
@@ -168,6 +193,8 @@ class Wikicode(StringMixIn):

@nodes.setter
def nodes(self, value):
if not isinstance(value, list):
value = parse_anything(value).nodes
self._nodes = value

def get(self, index):
@@ -188,9 +215,10 @@ class Wikicode(StringMixIn):
raise ValueError("Cannot coerce multiple nodes into one index")
if index >= len(self.nodes) or -1 * index > len(self.nodes):
raise IndexError("List assignment index out of range")
self.nodes.pop(index)
if nodes:
self.nodes[index] = nodes[0]
else:
self.nodes.pop(index)

def index(self, obj, recursive=False):
"""Return the index of *obj* in the list of nodes.
@@ -294,47 +322,11 @@ class Wikicode(StringMixIn):
*flags*. If *forcetype* is given, only nodes that are instances of this
type are yielded.
"""
if recursive:
nodes = self._get_all_nodes(self)
else:
nodes = self.nodes
for node in nodes:
for node in (self._get_all_nodes(self) if recursive else self.nodes):
if not forcetype or isinstance(node, forcetype):
if not matches or re.search(matches, str(node), flags):
yield node

def ifilter_links(self, recursive=False, matches=None, flags=FLAGS):
"""Iterate over wikilink nodes.

This is equivalent to :py:meth:`ifilter` with *forcetype* set to
:py:class:`~.Wikilink`.
"""
return self.ifilter(recursive, matches, flags, forcetype=Wikilink)

def ifilter_templates(self, recursive=False, matches=None, flags=FLAGS):
"""Iterate over template nodes.

This is equivalent to :py:meth:`ifilter` with *forcetype* set to
:py:class:`~.Template`.
"""
return self.filter(recursive, matches, flags, forcetype=Template)

def ifilter_text(self, recursive=False, matches=None, flags=FLAGS):
"""Iterate over text nodes.

This is equivalent to :py:meth:`ifilter` with *forcetype* set to
:py:class:`~.nodes.Text`.
"""
return self.filter(recursive, matches, flags, forcetype=Text)

def ifilter_tags(self, recursive=False, matches=None, flags=FLAGS):
"""Iterate over tag nodes.

This is equivalent to :py:meth:`ifilter` with *forcetype* set to
:py:class:`~.Tag`.
"""
return self.ifilter(recursive, matches, flags, forcetype=Tag)

def filter(self, recursive=False, matches=None, flags=FLAGS,
forcetype=None):
"""Return a list of nodes within our list matching certain conditions.
@@ -343,77 +335,56 @@ class Wikicode(StringMixIn):
"""
return list(self.ifilter(recursive, matches, flags, forcetype))

def filter_links(self, recursive=False, matches=None, flags=FLAGS):
"""Return a list of wikilink nodes.

This is equivalent to calling :py:func:`list` on
:py:meth:`ifilter_links`.
"""
return list(self.ifilter_links(recursive, matches, flags))

def filter_templates(self, recursive=False, matches=None, flags=FLAGS):
"""Return a list of template nodes.

This is equivalent to calling :py:func:`list` on
:py:meth:`ifilter_templates`.
"""
return list(self.ifilter_templates(recursive, matches, flags))

def filter_text(self, recursive=False, matches=None, flags=FLAGS):
"""Return a list of text nodes.

This is equivalent to calling :py:func:`list` on
:py:meth:`ifilter_text`.
"""
return list(self.ifilter_text(recursive, matches, flags))

def filter_tags(self, recursive=False, matches=None, flags=FLAGS):
"""Return a list of tag nodes.

This is equivalent to calling :py:func:`list` on
:py:meth:`ifilter_tags`.
"""
return list(self.ifilter_tags(recursive, matches, flags))

def get_sections(self, flat=True, matches=None, levels=None, flags=FLAGS,
include_headings=True):
def get_sections(self, levels=None, matches=None, flags=FLAGS,
include_lead=None, include_headings=True):
"""Return a list of sections within the page.

Sections are returned as :py:class:`~.Wikicode` objects with a shared
node list (implemented using :py:class:`~.SmartList`) so that changes
to sections are reflected in the parent Wikicode object.

With *flat* as ``True``, each returned section contains all of its
subsections within the :py:class:`~.Wikicode`; otherwise, the returned
sections contain only the section up to the next heading, regardless of
its size. If *matches* is given, it should be a regex to be matched
against the titles of section headings; only sections whose headings
match the regex will be included. If *levels* is given, it should be a
list of integers; only sections whose heading levels are within the
list will be returned. If *include_headings* is ``True``, the section's
literal :py:class:`~.Heading` object will be included in returned
:py:class:`~.Wikicode` objects; otherwise, this is skipped.
Each section contains all of its subsections. If *levels* is given, it
should be an iterable of integers; only sections whose heading levels
are within it will be returned. If *matches* is given, it should be a
regex to be matched against the titles of section headings; only
sections whose headings match the regex will be included. *flags* can
be used to override the default regex flags (see :py:meth:`ifilter`) if
*matches* is used.

If *include_lead* is ``True``, the first, lead section (without a
heading) will be included in the list; ``False`` will not include it;
the default will include it only if no specific *levels* were given. If
*include_headings* is ``True``, the section's beginning
:py:class:`~.Heading` object will be included; otherwise, this is
skipped.
"""
if matches:
matches = r"^(=+?)\s*" + matches + r"\s*\1$"
headings = self.filter(recursive=True, matches=matches, flags=flags,
forcetype=Heading)
headings = self.filter_headings(recursive=True)
filtered = self.filter_headings(recursive=True, matches=matches,
flags=flags)
if levels:
headings = [head for head in headings if head.level in levels]
filtered = [head for head in filtered if head.level in levels]

if matches or include_lead is False or (not include_lead and levels):
buffers = []
else:
buffers = [(maxsize, 0)]
sections = []
buffers = [[maxsize, 0]]
i = 0
while i < len(self.nodes):
if self.nodes[i] in headings:
this = self.nodes[i].level
for (level, start) in buffers:
if not flat or this <= level:
buffers.remove([level, start])
if this <= level:
sections.append(Wikicode(self.nodes[start:i]))
buffers.append([this, i])
if not include_headings:
i += 1
buffers = [buf for buf in buffers if buf[0] < this]
if self.nodes[i] in filtered:
if not include_headings:
i += 1
if i >= len(self.nodes):
break
buffers.append((this, i))
i += 1
for (level, start) in buffers:
if start != i:
@@ -473,3 +444,8 @@ class Wikicode(StringMixIn):
"""
marker = object() # Random object we can find with certainty in a list
return "\n".join(self._get_tree(self, [], marker, 0))

Wikicode._build_filter_methods(
arguments=Argument, comments=Comment, headings=Heading,
html_entities=HTMLEntity, tags=Tag, templates=Template, text=Text,
wikilinks=Wikilink)
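
Note on `_build_filter_methods` above: the nested lambdas matter because each generated method must capture its own *ftype*. A single lambda written directly in the loop body would see only the last value of the loop variable. Below is a rough standalone sketch of the same technique. The `Container` class and the use of built-in types in place of node classes are assumptions for illustration, not the library's code.

```python
class Container(object):
    """Hypothetical stand-in for a node container."""

    def __init__(self, items):
        self.items = list(items)

    def filter(self, forcetype=None):
        return [item for item in self.items
                if forcetype is None or isinstance(item, forcetype)]

    @classmethod
    def build_filter_methods(cls, **meths):
        def make_filter(ftype):
            # ftype is bound per call, so every shortcut keeps its own type.
            def shortcut(self):
                return self.filter(forcetype=ftype)
            return shortcut

        for name, ftype in meths.items():
            method = make_filter(ftype)
            method.__doc__ = "Return items of type {0}.".format(ftype.__name__)
            setattr(cls, "filter_" + name, method)


Container.build_filter_methods(ints=int, strings=str)
box = Container([1, "a", 2.5, "b"])
```

After the call, `box.filter_ints()` and `box.filter_strings()` exist as ordinary methods, each with its own docstring.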

+ 10
- 2
setup.py

@@ -1,7 +1,7 @@
#! /usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -21,16 +21,24 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from setuptools import setup, find_packages
from setuptools import setup, find_packages, Extension

from mwparserfromhell import __version__
from mwparserfromhell.compat import py3k

with open("README.rst") as fp:
long_docs = fp.read()

# builder = Extension("mwparserfromhell.parser._builder",
# sources = ["mwparserfromhell/parser/builder.c"])

tokenizer = Extension("mwparserfromhell.parser._tokenizer",
sources = ["mwparserfromhell/parser/tokenizer.c"])

setup(
name = "mwparserfromhell",
packages = find_packages(exclude=("tests",)),
ext_modules = [] if py3k else [tokenizer],
test_suite = "tests",
version = __version__,
author = "Ben Kurtovic",


+ 130
- 0
tests/MWPFHTestCase.tmlanguage

@@ -0,0 +1,130 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>fileTypes</key>
<array>
<string>mwtest</string>
</array>
<key>name</key>
<string>MWParserFromHell Test Case</string>
<key>patterns</key>
<array>
<dict>
<key>match</key>
<string>---</string>
<key>name</key>
<string>markup.heading.divider.mwpfh</string>
</dict>
<dict>
<key>captures</key>
<dict>
<key>1</key>
<dict>
<key>name</key>
<string>keyword.other.name.mwpfh</string>
</dict>
<key>2</key>
<dict>
<key>name</key>
<string>variable.other.name.mwpfh</string>
</dict>
</dict>
<key>match</key>
<string>(name:)\s*(\w*)</string>
<key>name</key>
<string>meta.name.mwpfh</string>
</dict>
<dict>
<key>captures</key>
<dict>
<key>1</key>
<dict>
<key>name</key>
<string>keyword.other.label.mwpfh</string>
</dict>
<key>2</key>
<dict>
<key>name</key>
<string>comment.line.other.label.mwpfh</string>
</dict>
</dict>
<key>match</key>
<string>(label:)\s*(.*)</string>
<key>name</key>
<string>meta.label.mwpfh</string>
</dict>
<dict>
<key>captures</key>
<dict>
<key>1</key>
<dict>
<key>name</key>
<string>keyword.other.input.mwpfh</string>
</dict>
<key>2</key>
<dict>
<key>name</key>
<string>string.quoted.double.input.mwpfh</string>
</dict>
</dict>
<key>match</key>
<string>(input:)\s*(.*)</string>
<key>name</key>
<string>meta.input.mwpfh</string>
</dict>
<dict>
<key>captures</key>
<dict>
<key>1</key>
<dict>
<key>name</key>
<string>keyword.other.output.mwpfh</string>
</dict>
</dict>
<key>match</key>
<string>(output:)</string>
<key>name</key>
<string>meta.output.mwpfh</string>
</dict>
<dict>
<key>captures</key>
<dict>
<key>1</key>
<dict>
<key>name</key>
<string>support.language.token.mwpfh</string>
</dict>
</dict>
<key>match</key>
<string>(\w+)\s*\(</string>
<key>name</key>
<string>meta.name.token.mwpfh</string>
</dict>
<dict>
<key>captures</key>
<dict>
<key>1</key>
<dict>
<key>name</key>
<string>variable.parameter.token.mwpfh</string>
</dict>
</dict>
<key>match</key>
<string>(\w+)\s*(=)</string>
<key>name</key>
<string>meta.name.parameter.token.mwpfh</string>
</dict>
<dict>
<key>match</key>
<string>".*?"</string>
<key>name</key>
<string>string.quoted.double.mwpfh</string>
</dict>
</array>
<key>scopeName</key>
<string>text.mwpfh</string>
<key>uuid</key>
<string>cd3e2ffa-a57d-4c40-954f-1a2e87ffd638</string>
</dict>
</plist>

+ 133
- 0
tests/_test_tokenizer.py

@@ -0,0 +1,133 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function, unicode_literals
from os import listdir, path
import sys

from mwparserfromhell.compat import py3k
from mwparserfromhell.parser import tokens

class _TestParseError(Exception):
"""Raised internally when a test could not be parsed."""
pass


class TokenizerTestCase(object):
"""A base test case for tokenizers, whose tests are loaded dynamically.

Subclassed along with unittest.TestCase to form TestPyTokenizer and
TestCTokenizer. Tests are loaded dynamically from files in the 'tokenizer'
directory.
"""

@classmethod
def _build_test_method(cls, funcname, data):
"""Create and return a method to be treated as a test case method.

*data* is a dict containing multiple keys: the *input* text to be
tokenized, the expected list of tokens as *output*, and an optional
*label* for the method's docstring.
"""
def inner(self):
expected = data["output"]
actual = self.tokenizer().tokenize(data["input"])
self.assertEqual(expected, actual)
if not py3k:
inner.__name__ = funcname.encode("utf8")
inner.__doc__ = data["label"]
return inner

@classmethod
def _load_tests(cls, filename, name, text):
"""Load all tests in *text* from the file *filename*."""
tests = text.split("\n---\n")
counter = 1
digits = len(str(len(tests)))
for test in tests:
data = {"name": None, "label": None, "input": None, "output": None}
try:
for line in test.strip().splitlines():
if line.startswith("name:"):
data["name"] = line[len("name:"):].strip()
elif line.startswith("label:"):
data["label"] = line[len("label:"):].strip()
elif line.startswith("input:"):
raw = line[len("input:"):].strip()
if raw[0] == '"' and raw[-1] == '"':
raw = raw[1:-1]
raw = raw.encode("raw_unicode_escape")
data["input"] = raw.decode("unicode_escape")
elif line.startswith("output:"):
raw = line[len("output:"):].strip()
try:
data["output"] = eval(raw, vars(tokens))
except Exception as err:
raise _TestParseError(err)
except _TestParseError as err:
if data["name"]:
error = "Could not parse test '{0}' in '{1}':\n\t{2}"
print(error.format(data["name"], filename, err))
else:
error = "Could not parse a test in '{0}':\n\t{1}"
print(error.format(filename, err))
continue
if not data["name"]:
error = "A test in '{0}' was ignored because it lacked a name"
print(error.format(filename))
continue
if data["input"] is None or data["output"] is None:
error = "Test '{0}' in '{1}' was ignored because it lacked an input or an output"
print(error.format(data["name"], filename))
continue
number = str(counter).zfill(digits)
fname = "test_{0}{1}_{2}".format(name, number, data["name"])
meth = cls._build_test_method(fname, data)
setattr(cls, fname, meth)
counter += 1

@classmethod
def build(cls):
"""Load and install all tests from the 'tokenizer' directory."""
def load_file(filename):
with open(filename, "rU") as fp:
text = fp.read()
if not py3k:
text = text.decode("utf8")
name = path.split(filename)[1][:0-len(extension)]
cls._load_tests(filename, name, text)

directory = path.join(path.dirname(__file__), "tokenizer")
extension = ".mwtest"
if len(sys.argv) > 2 and sys.argv[1] == "--use":
for name in sys.argv[2:]:
load_file(path.join(directory, name + extension))
sys.argv = [sys.argv[0]] # So unittest doesn't try to load these
cls.skip_others = True
else:
for filename in listdir(directory):
if not filename.endswith(extension):
continue
load_file(path.join(directory, filename))
cls.skip_others = False

TokenizerTestCase.build()
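
Note on the loader above: test files are split on `---` lines, and each `input:` value is unquoted and then passed through `raw_unicode_escape` and `unicode_escape`, so that sequences such as `\n` written in the file become real characters. The following is a simplified sketch of that parsing, assuming Python 3. `parse_tests` is a hypothetical helper, the sample text is made up for illustration, and `output:` is kept as a raw string instead of being evaluated.

```python
def parse_tests(text):
    """Yield one dict per test block in *text* (blocks separated by ---)."""
    for block in text.split("\n---\n"):
        data = {"name": None, "label": None, "input": None, "output": None}
        for line in block.strip().splitlines():
            key, sep, value = line.partition(":")
            if not sep or key not in data:
                continue
            value = value.strip()
            if key == "input":
                if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
                    value = value[1:-1]
                # Turn escapes written in the file (e.g. \n) into characters.
                raw = value.encode("raw_unicode_escape")
                value = raw.decode("unicode_escape")
            data[key] = value
        # Skip blocks that lack a name, an input, or an output.
        if data["name"] and None not in (data["input"], data["output"]):
            yield data


sample = (
    'name: basic\n'
    'label: plain text\n'
    'input: "foo\\nbar"\n'
    'output: [Text(text="foo\\nbar")]\n'
    '---\n'
    'label: this block has no name and is skipped\n'
    'input: "x"\n'
    'output: []\n'
)
tests = list(parse_tests(sample))
```

The second block is dropped because it has no `name:` line, mirroring the "ignored because it lacked a name" branch in the loader.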

+ 126
- 0
tests/_test_tree_equality.py

@@ -0,0 +1,126 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
from unittest import TestCase

from mwparserfromhell.nodes import (Argument, Comment, Heading, HTMLEntity,
Tag, Template, Text, Wikilink)
from mwparserfromhell.nodes.extras import Attribute, Parameter
from mwparserfromhell.smart_list import SmartList
from mwparserfromhell.wikicode import Wikicode

wrap = lambda L: Wikicode(SmartList(L))
wraptext = lambda *args: wrap([Text(t) for t in args])

def getnodes(code):
"""Iterate over all child nodes of a given parent node.

Imitates Wikicode._get_all_nodes().
"""
for node in code.nodes:
for context, child in node.__iternodes__(getnodes):
yield child

class TreeEqualityTestCase(TestCase):
"""A base test case with support for comparing the equality of node trees.

This adds a number of type equality functions, for Wikicode, Text,
Templates, and Wikilinks.
"""

def assertNodeEqual(self, expected, actual):
"""Assert that two Nodes have the same type and have the same data."""
registry = {
Argument: self.assertArgumentNodeEqual,
Comment: self.assertCommentNodeEqual,
Heading: self.assertHeadingNodeEqual,
HTMLEntity: self.assertHTMLEntityNodeEqual,
Tag: self.assertTagNodeEqual,
Template: self.assertTemplateNodeEqual,
Text: self.assertTextNodeEqual,
Wikilink: self.assertWikilinkNodeEqual
}
for nodetype in registry:
if isinstance(expected, nodetype):
self.assertIsInstance(actual, nodetype)
registry[nodetype](expected, actual)

def assertArgumentNodeEqual(self, expected, actual):
"""Assert that two Argument nodes have the same data."""
self.assertWikicodeEqual(expected.name, actual.name)
if expected.default is not None:
self.assertWikicodeEqual(expected.default, actual.default)
else:
self.assertIs(None, actual.default)

def assertCommentNodeEqual(self, expected, actual):
"""Assert that two Comment nodes have the same data."""
self.assertWikicodeEqual(expected.contents, actual.contents)

def assertHeadingNodeEqual(self, expected, actual):
"""Assert that two Heading nodes have the same data."""
self.assertWikicodeEqual(expected.title, actual.title)
self.assertEqual(expected.level, actual.level)

def assertHTMLEntityNodeEqual(self, expected, actual):
"""Assert that two HTMLEntity nodes have the same data."""
self.assertEqual(expected.value, actual.value)
self.assertIs(expected.named, actual.named)
self.assertIs(expected.hexadecimal, actual.hexadecimal)
self.assertEqual(expected.hex_char, actual.hex_char)

def assertTagNodeEqual(self, expected, actual):
"""Assert that two Tag nodes have the same data."""
self.fail("Holding this until feature/html_tags is ready.")

def assertTemplateNodeEqual(self, expected, actual):
"""Assert that two Template nodes have the same data."""
self.assertWikicodeEqual(expected.name, actual.name)
length = len(expected.params)
self.assertEqual(length, len(actual.params))
for i in range(length):
exp_param = expected.params[i]
act_param = actual.params[i]
self.assertWikicodeEqual(exp_param.name, act_param.name)
self.assertWikicodeEqual(exp_param.value, act_param.value)
self.assertIs(exp_param.showkey, act_param.showkey)

def assertTextNodeEqual(self, expected, actual):
"""Assert that two Text nodes have the same data."""
self.assertEqual(expected.value, actual.value)

def assertWikilinkNodeEqual(self, expected, actual):
"""Assert that two Wikilink nodes have the same data."""
self.assertWikicodeEqual(expected.title, actual.title)
if expected.text is not None:
self.assertWikicodeEqual(expected.text, actual.text)
else:
self.assertIs(None, actual.text)

def assertWikicodeEqual(self, expected, actual):
"""Assert that two Wikicode objects have the same data."""
self.assertIsInstance(actual, Wikicode)
length = len(expected.nodes)
self.assertEqual(length, len(actual.nodes))
for i in range(length):
self.assertNodeEqual(expected.get(i), actual.get(i))
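
Note on `assertNodeEqual` above: it dispatches on node type through a dict that maps each class to its comparison method, and the per-type comparisons recurse back into the generic one. A minimal sketch of that dispatch idea follows. The `Leaf` and `Pair` classes are hypothetical, not the library's node types, and the functions return booleans instead of raising assertion errors.

```python
class Leaf(object):
    def __init__(self, value):
        self.value = value


class Pair(object):
    def __init__(self, left, right):
        self.left, self.right = left, right


def leaves_equal(expected, actual):
    return expected.value == actual.value


def pairs_equal(expected, actual):
    # Recurse through the generic entry point for each child.
    return (nodes_equal(expected.left, actual.left) and
            nodes_equal(expected.right, actual.right))


REGISTRY = {Leaf: leaves_equal, Pair: pairs_equal}


def nodes_equal(expected, actual):
    """Compare two trees by dispatching on the type of *expected*."""
    for nodetype, compare in REGISTRY.items():
        if isinstance(expected, nodetype):
            return isinstance(actual, nodetype) and compare(expected, actual)
    return False
```

Adding support for a new node type then only requires one comparison function and one registry entry.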

+ 20
- 0
tests/compat.py

@@ -0,0 +1,20 @@
# -*- coding: utf-8 -*-

"""
Serves the same purpose as mwparserfromhell.compat, but only for objects
required by unit tests. This avoids unnecessary imports (like urllib) within
the main library.
"""

from mwparserfromhell.compat import py3k

if py3k:
range = range
from io import StringIO
from urllib.parse import urlencode
from urllib.request import urlopen

else:
range = xrange
from StringIO import StringIO
from urllib import urlencode, urlopen

+ 107
- 0
tests/test_argument.py

@@ -0,0 +1,107 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import str
from mwparserfromhell.nodes import Argument, Text

from ._test_tree_equality import TreeEqualityTestCase, getnodes, wrap, wraptext

class TestArgument(TreeEqualityTestCase):
    """Test cases for the Argument node."""

    def test_unicode(self):
        """test Argument.__unicode__()"""
        node = Argument(wraptext("foobar"))
        self.assertEqual("{{{foobar}}}", str(node))
        node2 = Argument(wraptext("foo"), wraptext("bar"))
        self.assertEqual("{{{foo|bar}}}", str(node2))

    def test_iternodes(self):
        """test Argument.__iternodes__()"""
        node1n1 = Text("foobar")
        node2n1, node2n2, node2n3 = Text("foo"), Text("bar"), Text("baz")
        node1 = Argument(wrap([node1n1]))
        node2 = Argument(wrap([node2n1]), wrap([node2n2, node2n3]))
        gen1 = node1.__iternodes__(getnodes)
        gen2 = node2.__iternodes__(getnodes)
        self.assertEqual((None, node1), next(gen1))
        self.assertEqual((None, node2), next(gen2))
        self.assertEqual((node1.name, node1n1), next(gen1))
        self.assertEqual((node2.name, node2n1), next(gen2))
        self.assertEqual((node2.default, node2n2), next(gen2))
        self.assertEqual((node2.default, node2n3), next(gen2))
        self.assertRaises(StopIteration, next, gen1)
        self.assertRaises(StopIteration, next, gen2)

    def test_strip(self):
        """test Argument.__strip__()"""
        node = Argument(wraptext("foobar"))
        node2 = Argument(wraptext("foo"), wraptext("bar"))
        for a in (True, False):
            for b in (True, False):
                self.assertIs(None, node.__strip__(a, b))
                self.assertEqual("bar", node2.__strip__(a, b))

    def test_showtree(self):
        """test Argument.__showtree__()"""
        output = []
        getter, marker = object(), object()
        get = lambda code: output.append((getter, code))
        mark = lambda: output.append(marker)
        node1 = Argument(wraptext("foobar"))
        node2 = Argument(wraptext("foo"), wraptext("bar"))
        node1.__showtree__(output.append, get, mark)
        node2.__showtree__(output.append, get, mark)
        valid = [
            "{{{", (getter, node1.name), "}}}", "{{{", (getter, node2.name),
            " | ", marker, (getter, node2.default), "}}}"]
        self.assertEqual(valid, output)

    def test_name(self):
        """test getter/setter for the name attribute"""
        name = wraptext("foobar")
        node1 = Argument(name)
        node2 = Argument(name, wraptext("baz"))
        self.assertIs(name, node1.name)
        self.assertIs(name, node2.name)
        node1.name = "héhehé"
        node2.name = "héhehé"
        self.assertWikicodeEqual(wraptext("héhehé"), node1.name)
        self.assertWikicodeEqual(wraptext("héhehé"), node2.name)

    def test_default(self):
        """test getter/setter for the default attribute"""
        default = wraptext("baz")
        node1 = Argument(wraptext("foobar"))
        node2 = Argument(wraptext("foobar"), default)
        self.assertIs(None, node1.default)
        self.assertIs(default, node2.default)
        node1.default = "buzz"
        node2.default = None
        self.assertWikicodeEqual(wraptext("buzz"), node1.default)
        self.assertIs(None, node2.default)

if __name__ == "__main__":
    unittest.main(verbosity=2)

+ 247
- 0
tests/test_builder.py

@@ -0,0 +1,247 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.nodes import (Argument, Comment, Heading, HTMLEntity,
                                    Tag, Template, Text, Wikilink)
from mwparserfromhell.nodes.extras import Attribute, Parameter
from mwparserfromhell.parser import tokens
from mwparserfromhell.parser.builder import Builder

from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext

class TestBuilder(TreeEqualityTestCase):
    """Tests for the builder, which turns tokens into Wikicode objects."""

    def setUp(self):
        self.builder = Builder()

    def test_text(self):
        """tests for building Text nodes"""
        tests = [
            ([tokens.Text(text="foobar")], wraptext("foobar")),
            ([tokens.Text(text="fóóbar")], wraptext("fóóbar")),
            ([tokens.Text(text="spam"), tokens.Text(text="eggs")],
             wraptext("spam", "eggs")),
        ]
        for test, valid in tests:
            self.assertWikicodeEqual(valid, self.builder.build(test))

    def test_template(self):
        """tests for building Template nodes"""
        tests = [
            ([tokens.TemplateOpen(), tokens.Text(text="foobar"),
              tokens.TemplateClose()],
             wrap([Template(wraptext("foobar"))])),

            ([tokens.TemplateOpen(), tokens.Text(text="spam"),
              tokens.Text(text="eggs"), tokens.TemplateClose()],
             wrap([Template(wraptext("spam", "eggs"))])),

            ([tokens.TemplateOpen(), tokens.Text(text="foo"),
              tokens.TemplateParamSeparator(), tokens.Text(text="bar"),
              tokens.TemplateClose()],
             wrap([Template(wraptext("foo"), params=[
                 Parameter(wraptext("1"), wraptext("bar"), showkey=False)])])),

            ([tokens.TemplateOpen(), tokens.Text(text="foo"),
              tokens.TemplateParamSeparator(), tokens.Text(text="bar"),
              tokens.TemplateParamEquals(), tokens.Text(text="baz"),
              tokens.TemplateClose()],
             wrap([Template(wraptext("foo"), params=[
                 Parameter(wraptext("bar"), wraptext("baz"))])])),

            ([tokens.TemplateOpen(), tokens.Text(text="foo"),
              tokens.TemplateParamSeparator(), tokens.Text(text="bar"),
              tokens.TemplateParamEquals(), tokens.Text(text="baz"),
              tokens.TemplateParamSeparator(), tokens.Text(text="biz"),
              tokens.TemplateParamSeparator(), tokens.Text(text="buzz"),
              tokens.TemplateParamSeparator(), tokens.Text(text="3"),
              tokens.TemplateParamEquals(), tokens.Text(text="buff"),
              tokens.TemplateParamSeparator(), tokens.Text(text="baff"),
              tokens.TemplateClose()],
             wrap([Template(wraptext("foo"), params=[
                 Parameter(wraptext("bar"), wraptext("baz")),
                 Parameter(wraptext("1"), wraptext("biz"), showkey=False),
                 Parameter(wraptext("2"), wraptext("buzz"), showkey=False),
                 Parameter(wraptext("3"), wraptext("buff")),
                 Parameter(wraptext("3"), wraptext("baff"),
                           showkey=False)])])),
        ]
        for test, valid in tests:
            self.assertWikicodeEqual(valid, self.builder.build(test))

    def test_argument(self):
        """tests for building Argument nodes"""
        tests = [
            ([tokens.ArgumentOpen(), tokens.Text(text="foobar"),
              tokens.ArgumentClose()],
             wrap([Argument(wraptext("foobar"))])),

            ([tokens.ArgumentOpen(), tokens.Text(text="spam"),
              tokens.Text(text="eggs"), tokens.ArgumentClose()],
             wrap([Argument(wraptext("spam", "eggs"))])),

            ([tokens.ArgumentOpen(), tokens.Text(text="foo"),
              tokens.ArgumentSeparator(), tokens.Text(text="bar"),
              tokens.ArgumentClose()],
             wrap([Argument(wraptext("foo"), wraptext("bar"))])),

            ([tokens.ArgumentOpen(), tokens.Text(text="foo"),
              tokens.Text(text="bar"), tokens.ArgumentSeparator(),
              tokens.Text(text="baz"), tokens.Text(text="biz"),
              tokens.ArgumentClose()],
             wrap([Argument(wraptext("foo", "bar"), wraptext("baz", "biz"))])),
        ]
        for test, valid in tests:
            self.assertWikicodeEqual(valid, self.builder.build(test))

    def test_wikilink(self):
        """tests for building Wikilink nodes"""
        tests = [
            ([tokens.WikilinkOpen(), tokens.Text(text="foobar"),
              tokens.WikilinkClose()],
             wrap([Wikilink(wraptext("foobar"))])),

            ([tokens.WikilinkOpen(), tokens.Text(text="spam"),
              tokens.Text(text="eggs"), tokens.WikilinkClose()],
             wrap([Wikilink(wraptext("spam", "eggs"))])),

            ([tokens.WikilinkOpen(), tokens.Text(text="foo"),
              tokens.WikilinkSeparator(), tokens.Text(text="bar"),
              tokens.WikilinkClose()],
             wrap([Wikilink(wraptext("foo"), wraptext("bar"))])),

            ([tokens.WikilinkOpen(), tokens.Text(text="foo"),
              tokens.Text(text="bar"), tokens.WikilinkSeparator(),
              tokens.Text(text="baz"), tokens.Text(text="biz"),
              tokens.WikilinkClose()],
             wrap([Wikilink(wraptext("foo", "bar"), wraptext("baz", "biz"))])),
        ]
        for test, valid in tests:
            self.assertWikicodeEqual(valid, self.builder.build(test))

    def test_html_entity(self):
        """tests for building HTMLEntity nodes"""
        tests = [
            ([tokens.HTMLEntityStart(), tokens.Text(text="nbsp"),
              tokens.HTMLEntityEnd()],
             wrap([HTMLEntity("nbsp", named=True, hexadecimal=False)])),

            ([tokens.HTMLEntityStart(), tokens.HTMLEntityNumeric(),
              tokens.Text(text="107"), tokens.HTMLEntityEnd()],
             wrap([HTMLEntity("107", named=False, hexadecimal=False)])),

            ([tokens.HTMLEntityStart(), tokens.HTMLEntityNumeric(),
              tokens.HTMLEntityHex(char="X"), tokens.Text(text="6B"),
              tokens.HTMLEntityEnd()],
             wrap([HTMLEntity("6B", named=False, hexadecimal=True,
                              hex_char="X")])),
        ]
        for test, valid in tests:
            self.assertWikicodeEqual(valid, self.builder.build(test))

    def test_heading(self):
        """tests for building Heading nodes"""
        tests = [
            ([tokens.HeadingStart(level=2), tokens.Text(text="foobar"),
              tokens.HeadingEnd()],
             wrap([Heading(wraptext("foobar"), 2)])),

            ([tokens.HeadingStart(level=4), tokens.Text(text="spam"),
              tokens.Text(text="eggs"), tokens.HeadingEnd()],
             wrap([Heading(wraptext("spam", "eggs"), 4)])),
        ]
        for test, valid in tests:
            self.assertWikicodeEqual(valid, self.builder.build(test))

    def test_comment(self):
        """tests for building Comment nodes"""
        tests = [
            ([tokens.CommentStart(), tokens.Text(text="foobar"),
              tokens.CommentEnd()],
             wrap([Comment(wraptext("foobar"))])),

            ([tokens.CommentStart(), tokens.Text(text="spam"),
              tokens.Text(text="eggs"), tokens.CommentEnd()],
             wrap([Comment(wraptext("spam", "eggs"))])),
        ]
        for test, valid in tests:
            self.assertWikicodeEqual(valid, self.builder.build(test))

    def test_integration(self):
        """a test for building a combination of templates together"""
        # {{{{{{{{foo}}bar|baz=biz}}buzz}}usr|{{bin}}}}
        test = [tokens.TemplateOpen(), tokens.TemplateOpen(),
                tokens.TemplateOpen(), tokens.TemplateOpen(),
                tokens.Text(text="foo"), tokens.TemplateClose(),
                tokens.Text(text="bar"), tokens.TemplateParamSeparator(),
                tokens.Text(text="baz"), tokens.TemplateParamEquals(),
                tokens.Text(text="biz"), tokens.TemplateClose(),
                tokens.Text(text="buzz"), tokens.TemplateClose(),
                tokens.Text(text="usr"), tokens.TemplateParamSeparator(),
                tokens.TemplateOpen(), tokens.Text(text="bin"),
                tokens.TemplateClose(), tokens.TemplateClose()]
        valid = wrap(
            [Template(wrap([Template(wrap([Template(wrap([Template(wraptext(
            "foo")), Text("bar")]), params=[Parameter(wraptext("baz"),
            wraptext("biz"))]), Text("buzz")])), Text("usr")]), params=[
            Parameter(wraptext("1"), wrap([Template(wraptext("bin"))]),
            showkey=False)])])
        self.assertWikicodeEqual(valid, self.builder.build(test))

    def test_integration2(self):
        """an even more audacious test for building a horrible wikicode mess"""
        # {{a|b|{{c|[[d]]{{{e}}}}}}}[[f|{{{g}}}<!--h-->]]{{i|j=&nbsp;}}
        test = [tokens.TemplateOpen(), tokens.Text(text="a"),
                tokens.TemplateParamSeparator(), tokens.Text(text="b"),
                tokens.TemplateParamSeparator(), tokens.TemplateOpen(),
                tokens.Text(text="c"), tokens.TemplateParamSeparator(),
                tokens.WikilinkOpen(), tokens.Text(text="d"),
                tokens.WikilinkClose(), tokens.ArgumentOpen(),
                tokens.Text(text="e"), tokens.ArgumentClose(),
                tokens.TemplateClose(), tokens.TemplateClose(),
                tokens.WikilinkOpen(), tokens.Text(text="f"),
                tokens.WikilinkSeparator(), tokens.ArgumentOpen(),
                tokens.Text(text="g"), tokens.ArgumentClose(),
                tokens.CommentStart(), tokens.Text(text="h"),
                tokens.CommentEnd(), tokens.WikilinkClose(),
                tokens.TemplateOpen(), tokens.Text(text="i"),
                tokens.TemplateParamSeparator(), tokens.Text(text="j"),
                tokens.TemplateParamEquals(), tokens.HTMLEntityStart(),
                tokens.Text(text="nbsp"), tokens.HTMLEntityEnd(),
                tokens.TemplateClose()]
        valid = wrap(
            [Template(wraptext("a"), params=[Parameter(wraptext("1"), wraptext(
            "b"), showkey=False), Parameter(wraptext("2"), wrap([Template(
            wraptext("c"), params=[Parameter(wraptext("1"), wrap([Wikilink(
            wraptext("d")), Argument(wraptext("e"))]), showkey=False)])]),
            showkey=False)]), Wikilink(wraptext("f"), wrap([Argument(wraptext(
            "g")), Comment(wraptext("h"))])), Template(wraptext("i"), params=[
            Parameter(wraptext("j"), wrap([HTMLEntity("nbsp",
            named=True)]))])])
        self.assertWikicodeEqual(valid, self.builder.build(test))

if __name__ == "__main__":
    unittest.main(verbosity=2)

+ 68
- 0
tests/test_comment.py

@@ -0,0 +1,68 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import str
from mwparserfromhell.nodes import Comment

from ._test_tree_equality import TreeEqualityTestCase

class TestComment(TreeEqualityTestCase):
    """Test cases for the Comment node."""

    def test_unicode(self):
        """test Comment.__unicode__()"""
        node = Comment("foobar")
        self.assertEqual("<!--foobar-->", str(node))

    def test_iternodes(self):
        """test Comment.__iternodes__()"""
        node = Comment("foobar")
        gen = node.__iternodes__(None)
        self.assertEqual((None, node), next(gen))
        self.assertRaises(StopIteration, next, gen)

    def test_strip(self):
        """test Comment.__strip__()"""
        node = Comment("foobar")
        for a in (True, False):
            for b in (True, False):
                self.assertIs(None, node.__strip__(a, b))

    def test_showtree(self):
        """test Comment.__showtree__()"""
        output = []
        node = Comment("foobar")
        node.__showtree__(output.append, None, None)
        self.assertEqual(["<!--foobar-->"], output)

    def test_contents(self):
        """test getter/setter for the contents attribute"""
        node = Comment("foobar")
        self.assertEqual("foobar", node.contents)
        node.contents = "barfoo"
        self.assertEqual("barfoo", node.contents)

if __name__ == "__main__":
    unittest.main(verbosity=2)

+ 48
- 0
tests/test_ctokenizer.py

@@ -0,0 +1,48 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

try:
    from mwparserfromhell.parser._tokenizer import CTokenizer
except ImportError:
    CTokenizer = None

from ._test_tokenizer import TokenizerTestCase

@unittest.skipUnless(CTokenizer, "C tokenizer not available")
class TestCTokenizer(TokenizerTestCase, unittest.TestCase):
    """Test cases for the C tokenizer."""

    @classmethod
    def setUpClass(cls):
        cls.tokenizer = CTokenizer

    if not TokenizerTestCase.skip_others:
        def test_uses_c(self):
            """make sure the C tokenizer identifies as using a C extension"""
            self.assertTrue(CTokenizer.USES_C)
            self.assertTrue(CTokenizer().USES_C)

if __name__ == "__main__":
    unittest.main(verbosity=2)

+ 131
- 0
tests/test_docs.py

@@ -0,0 +1,131 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function, unicode_literals
import json
import unittest

import mwparserfromhell
from mwparserfromhell.compat import py3k, str

from .compat import StringIO, urlencode, urlopen

class TestDocs(unittest.TestCase):
    """Integration test cases for mwparserfromhell's documentation."""

    def assertPrint(self, input, output):
        """Assertion check that *input*, when printed, produces *output*."""
        buff = StringIO()
        print(input, end="", file=buff)
        buff.seek(0)
        self.assertEqual(output, buff.read())

    def test_readme_1(self):
        """test a block of example code in the README"""
        text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
        wikicode = mwparserfromhell.parse(text)
        self.assertPrint(wikicode,
                         "I has a template! {{foo|bar|baz|eggs=spam}} See it?")
        templates = wikicode.filter_templates()
        if py3k:
            self.assertPrint(templates, "['{{foo|bar|baz|eggs=spam}}']")
        else:
            self.assertPrint(templates, "[u'{{foo|bar|baz|eggs=spam}}']")
        template = templates[0]
        self.assertPrint(template.name, "foo")
        if py3k:
            self.assertPrint(template.params, "['bar', 'baz', 'eggs=spam']")
        else:
            self.assertPrint(template.params, "[u'bar', u'baz', u'eggs=spam']")
        self.assertPrint(template.get(1).value, "bar")
        self.assertPrint(template.get("eggs").value, "spam")

    def test_readme_2(self):
        """test a block of example code in the README"""
        code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
        if py3k:
            self.assertPrint(code.filter_templates(),
                             "['{{foo|this {{includes a|template}}}}']")
        else:
            self.assertPrint(code.filter_templates(),
                             "[u'{{foo|this {{includes a|template}}}}']")
        foo = code.filter_templates()[0]
        self.assertPrint(foo.get(1).value, "this {{includes a|template}}")
        self.assertPrint(foo.get(1).value.filter_templates()[0],
                         "{{includes a|template}}")
        self.assertPrint(foo.get(1).value.filter_templates()[0].get(1).value,
                         "template")

    def test_readme_3(self):
        """test a block of example code in the README"""
        text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
        temps = mwparserfromhell.parse(text).filter_templates(recursive=True)
        if py3k:
            res = "['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']"
        else:
            res = "[u'{{foo|{{bar}}={{baz|{{spam}}}}}}', u'{{bar}}', u'{{baz|{{spam}}}}', u'{{spam}}']"
        self.assertPrint(temps, res)

    def test_readme_4(self):
        """test a block of example code in the README"""
        text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
        code = mwparserfromhell.parse(text)
        for template in code.filter_templates():
            if template.name == "cleanup" and not template.has_param("date"):
                template.add("date", "July 2012")
        res = "{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}"
        self.assertPrint(code, res)
        code.replace("{{uncategorized}}", "{{bar-stub}}")
        res = "{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}"
        self.assertPrint(code, res)
        if py3k:
            res = "['{{cleanup|date=July 2012}}', '{{bar-stub}}']"
        else:
            res = "[u'{{cleanup|date=July 2012}}', u'{{bar-stub}}']"
        self.assertPrint(code.filter_templates(), res)
        text = str(code)
        res = "{{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}"
        self.assertPrint(text, res)
        self.assertEqual(text, code)

    def test_readme_5(self):
        """test a block of example code in the README; includes a web call"""
        url1 = "http://en.wikipedia.org/w/api.php"
        url2 = "http://en.wikipedia.org/w/index.php?title={0}&action=raw"
        title = "Test"
        data = {"action": "query", "prop": "revisions", "rvlimit": 1,
                "rvprop": "content", "format": "json", "titles": title}
        try:
            raw = urlopen(url1, urlencode(data).encode("utf8")).read()
        except IOError:
            self.skipTest("cannot continue because of unsuccessful web call")
        res = json.loads(raw.decode("utf8"))
        text = list(res["query"]["pages"].values())[0]["revisions"][0]["*"]
        try:
            expected = urlopen(url2.format(title)).read().decode("utf8")
        except IOError:
            self.skipTest("cannot continue because of unsuccessful web call")
        actual = mwparserfromhell.parse(text)
        self.assertEqual(expected, actual)

if __name__ == "__main__":
    unittest.main(verbosity=2)

+ 91
- 0
tests/test_heading.py

@@ -0,0 +1,91 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import str
from mwparserfromhell.nodes import Heading, Text

from ._test_tree_equality import TreeEqualityTestCase, getnodes, wrap, wraptext

class TestHeading(TreeEqualityTestCase):
    """Test cases for the Heading node."""

    def test_unicode(self):
        """test Heading.__unicode__()"""
        node = Heading(wraptext("foobar"), 2)
        self.assertEqual("==foobar==", str(node))
        node2 = Heading(wraptext(" zzz "), 5)
        self.assertEqual("===== zzz =====", str(node2))

    def test_iternodes(self):
        """test Heading.__iternodes__()"""
        text1, text2 = Text("foo"), Text("bar")
        node = Heading(wrap([text1, text2]), 3)
        gen = node.__iternodes__(getnodes)
        self.assertEqual((None, node), next(gen))
        self.assertEqual((node.title, text1), next(gen))
        self.assertEqual((node.title, text2), next(gen))
        self.assertRaises(StopIteration, next, gen)

    def test_strip(self):
        """test Heading.__strip__()"""
        node = Heading(wraptext("foobar"), 3)
        for a in (True, False):
            for b in (True, False):
                self.assertEqual("foobar", node.__strip__(a, b))

    def test_showtree(self):
        """test Heading.__showtree__()"""
        output = []
        getter = object()
        get = lambda code: output.append((getter, code))
        node1 = Heading(wraptext("foobar"), 3)
        node2 = Heading(wraptext(" baz "), 4)
        node1.__showtree__(output.append, get, None)
        node2.__showtree__(output.append, get, None)
        valid = ["===", (getter, node1.title), "===",
                 "====", (getter, node2.title), "===="]
        self.assertEqual(valid, output)

    def test_title(self):
        """test getter/setter for the title attribute"""
        title = wraptext("foobar")
        node = Heading(title, 3)
        self.assertIs(title, node.title)
        node.title = "héhehé"
        self.assertWikicodeEqual(wraptext("héhehé"), node.title)

    def test_level(self):
        """test getter/setter for the level attribute"""
        node = Heading(wraptext("foobar"), 3)
        self.assertEqual(3, node.level)
        node.level = 5
        self.assertEqual(5, node.level)
        self.assertRaises(ValueError, setattr, node, "level", 0)
        self.assertRaises(ValueError, setattr, node, "level", 7)
        self.assertRaises(ValueError, setattr, node, "level", "abc")
        self.assertRaises(ValueError, setattr, node, "level", False)

if __name__ == "__main__":
    unittest.main(verbosity=2)

+ 169
- 0
tests/test_html_entity.py

@@ -0,0 +1,169 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import str
from mwparserfromhell.nodes import HTMLEntity

from ._test_tree_equality import TreeEqualityTestCase, wrap

class TestHTMLEntity(TreeEqualityTestCase):
    """Test cases for the HTMLEntity node."""

    def test_unicode(self):
        """test HTMLEntity.__unicode__()"""
        node1 = HTMLEntity("nbsp", named=True, hexadecimal=False)
        node2 = HTMLEntity("107", named=False, hexadecimal=False)
        node3 = HTMLEntity("6b", named=False, hexadecimal=True)
        node4 = HTMLEntity("6C", named=False, hexadecimal=True, hex_char="X")
        self.assertEqual("&nbsp;", str(node1))
        self.assertEqual("&#107;", str(node2))
        self.assertEqual("&#x6b;", str(node3))
        self.assertEqual("&#X6C;", str(node4))

    def test_iternodes(self):
        """test HTMLEntity.__iternodes__()"""
        node = HTMLEntity("nbsp", named=True, hexadecimal=False)
        gen = node.__iternodes__(None)
        self.assertEqual((None, node), next(gen))
        self.assertRaises(StopIteration, next, gen)

    def test_strip(self):
        """test HTMLEntity.__strip__()"""
        node1 = HTMLEntity("nbsp", named=True, hexadecimal=False)
        node2 = HTMLEntity("107", named=False, hexadecimal=False)
        node3 = HTMLEntity("e9", named=False, hexadecimal=True)
        for a in (True, False):
            self.assertEqual("\xa0", node1.__strip__(True, a))
            self.assertEqual("&nbsp;", node1.__strip__(False, a))
            self.assertEqual("k", node2.__strip__(True, a))
            self.assertEqual("&#107;", node2.__strip__(False, a))
            self.assertEqual("é", node3.__strip__(True, a))
            self.assertEqual("&#xe9;", node3.__strip__(False, a))

    def test_showtree(self):
        """test HTMLEntity.__showtree__()"""
        output = []
        node1 = HTMLEntity("nbsp", named=True, hexadecimal=False)
        node2 = HTMLEntity("107", named=False, hexadecimal=False)
        node3 = HTMLEntity("e9", named=False, hexadecimal=True)
        node1.__showtree__(output.append, None, None)
        node2.__showtree__(output.append, None, None)
        node3.__showtree__(output.append, None, None)
        res = ["&nbsp;", "&#107;", "&#xe9;"]
        self.assertEqual(res, output)

    def test_value(self):
        """test getter/setter for the value attribute"""
        node1 = HTMLEntity("nbsp")
        node2 = HTMLEntity("107")
        node3 = HTMLEntity("e9")
        self.assertEqual("nbsp", node1.value)
        self.assertEqual("107", node2.value)
        self.assertEqual("e9", node3.value)

        node1.value = "ffa4"
        node2.value = 72
        node3.value = "Sigma"
        self.assertEqual("ffa4", node1.value)
        self.assertFalse(node1.named)
        self.assertTrue(node1.hexadecimal)
        self.assertEqual("72", node2.value)
        self.assertFalse(node2.named)
        self.assertFalse(node2.hexadecimal)
        self.assertEqual("Sigma", node3.value)
        self.assertTrue(node3.named)
        self.assertFalse(node3.hexadecimal)

        node1.value = "10FFFF"
        node2.value = 110000
        node2.value = 1114111
        self.assertRaises(ValueError, setattr, node3, "value", "")
        self.assertRaises(ValueError, setattr, node3, "value", "foobar")
        self.assertRaises(ValueError, setattr, node3, "value", True)
        self.assertRaises(ValueError, setattr, node3, "value", -1)
        self.assertRaises(ValueError, setattr, node1, "value", 110000)
        self.assertRaises(ValueError, setattr, node1, "value", "1114112")

    def test_named(self):
        """test getter/setter for the named attribute"""
        node1 = HTMLEntity("nbsp")
        node2 = HTMLEntity("107")
        node3 = HTMLEntity("e9")
        self.assertTrue(node1.named)
        self.assertFalse(node2.named)
        self.assertFalse(node3.named)
        node1.named = 1
        node2.named = 0
        node3.named = 0
        self.assertTrue(node1.named)
        self.assertFalse(node2.named)
        self.assertFalse(node3.named)
        self.assertRaises(ValueError, setattr, node1, "named", False)
        self.assertRaises(ValueError, setattr, node2, "named", True)
        self.assertRaises(ValueError, setattr, node3, "named", True)

    def test_hexadecimal(self):
        """test getter/setter for the hexadecimal attribute"""
        node1 = HTMLEntity("nbsp")
        node2 = HTMLEntity("107")
        node3 = HTMLEntity("e9")
        self.assertFalse(node1.hexadecimal)
        self.assertFalse(node2.hexadecimal)
        self.assertTrue(node3.hexadecimal)
        node1.hexadecimal = False
        node2.hexadecimal = True
        node3.hexadecimal = False
        self.assertFalse(node1.hexadecimal)
        self.assertTrue(node2.hexadecimal)
        self.assertFalse(node3.hexadecimal)
        self.assertRaises(ValueError, setattr, node1, "hexadecimal", True)

    def test_hex_char(self):
        """test getter/setter for the hex_char attribute"""
        node1 = HTMLEntity("e9")
        node2 = HTMLEntity("e9", hex_char="X")
        self.assertEqual("x", node1.hex_char)
        self.assertEqual("X", node2.hex_char)
        node1.hex_char = "X"
        node2.hex_char = "x"
        self.assertEqual("X", node1.hex_char)
        self.assertEqual("x", node2.hex_char)
        self.assertRaises(ValueError, setattr, node1, "hex_char", 123)
        self.assertRaises(ValueError, setattr, node1, "hex_char", "foobar")
        self.assertRaises(ValueError, setattr, node1, "hex_char", True)

def test_normalize(self):
"""test getter/setter for the normalize attribute"""
node1 = HTMLEntity("nbsp")
node2 = HTMLEntity("107")
node3 = HTMLEntity("e9")
node4 = HTMLEntity("1f648")
self.assertEqual("\xa0", node1.normalize())
self.assertEqual("k", node2.normalize())
self.assertEqual("é", node3.normalize())
self.assertEqual("\U0001F648", node4.normalize())

if __name__ == "__main__":
unittest.main(verbosity=2)
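
Reviewer note: the mapping that test_normalize checks can be approximated with the standard library alone. This is only a sketch and not the HTMLEntity implementation; `normalize_entity` and its `named`/`hexadecimal` parameters are hypothetical names, and Python 3 is assumed so that `chr()` covers the full Unicode range.

```python
from html.entities import name2codepoint


def normalize_entity(value, named=False, hexadecimal=False):
    """Return the character an entity body stands for (hypothetical helper)."""
    if named:
        # Named entities such as "nbsp" are looked up in the stdlib table.
        return chr(name2codepoint[value])
    # Numeric entities are parsed as base 16 or base 10.
    return chr(int(value, 16 if hexadecimal else 10))


# The same four inputs used by test_normalize above.
samples = [
    normalize_entity("nbsp", named=True),
    normalize_entity("107"),
    normalize_entity("e9", hexadecimal=True),
    normalize_entity("1f648", hexadecimal=True),
]
```

The real node stores `named` and `hexadecimal` as attributes rather than taking them as arguments; the sketch passes them explicitly to stay self-contained.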

+ 42
- 86
tests/test_parameter.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -20,100 +20,56 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.parameter import Parameter
from mwparserfromhell.template import Template
from mwparserfromhell.compat import str
from mwparserfromhell.nodes import Text
from mwparserfromhell.nodes.extras import Parameter

class TestParameter(unittest.TestCase):
def setUp(self):
self.name = "foo"
self.value1 = "bar"
self.value2 = "{{spam}}"
self.value3 = "bar{{spam}}"
self.value4 = "embedded {{eggs|spam|baz=buz}} {{goes}} here"
self.templates2 = [Template("spam")]
self.templates3 = [Template("spam")]
self.templates4 = [Template("eggs", [Parameter("1", "spam"),
Parameter("baz", "buz")]),
Template("goes")]
from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext

def test_construct(self):
Parameter(self.name, self.value1)
Parameter(self.name, self.value2, self.templates2)
Parameter(name=self.name, value=self.value3)
Parameter(name=self.name, value=self.value4, templates=self.templates4)
class TestParameter(TreeEqualityTestCase):
"""Test cases for the Parameter node extra."""

def test_unicode(self):
"""test Parameter.__unicode__()"""
node = Parameter(wraptext("1"), wraptext("foo"), showkey=False)
self.assertEqual("foo", str(node))
node2 = Parameter(wraptext("foo"), wraptext("bar"))
self.assertEqual("foo=bar", str(node2))

def test_name(self):
params = [
Parameter(self.name, self.value1),
Parameter(self.name, self.value2, self.templates2),
Parameter(name=self.name, value=self.value3),
Parameter(name=self.name, value=self.value4,
templates=self.templates4)
]
for param in params:
self.assertEqual(param.name, self.name)
"""test getter/setter for the name attribute"""
name1 = wraptext("1")
name2 = wraptext("foobar")
node1 = Parameter(name1, wraptext("foobar"), showkey=False)
node2 = Parameter(name2, wraptext("baz"))
self.assertIs(name1, node1.name)
self.assertIs(name2, node2.name)
node1.name = "héhehé"
node2.name = "héhehé"
self.assertWikicodeEqual(wraptext("héhehé"), node1.name)
self.assertWikicodeEqual(wraptext("héhehé"), node2.name)

def test_value(self):
tests = [
(Parameter(self.name, self.value1), self.value1),
(Parameter(self.name, self.value2, self.templates2), self.value2),
(Parameter(name=self.name, value=self.value3), self.value3),
(Parameter(name=self.name, value=self.value4,
templates=self.templates4), self.value4)
]
for param, correct in tests:
self.assertEqual(param.value, correct)

def test_templates(self):
tests = [
(Parameter(self.name, self.value3, self.templates3),
self.templates3),
(Parameter(name=self.name, value=self.value4,
templates=self.templates4), self.templates4)
]
for param, correct in tests:
self.assertEqual(param.templates, correct)

def test_magic(self):
params = [Parameter(self.name, self.value1),
Parameter(self.name, self.value2, self.templates2),
Parameter(self.name, self.value3, self.templates3),
Parameter(self.name, self.value4, self.templates4)]
for param in params:
self.assertEqual(repr(param), repr(param.value))
self.assertEqual(str(param), str(param.value))
self.assertIs(param < "eggs", param.value < "eggs")
self.assertIs(param <= "bar{{spam}}", param.value <= "bar{{spam}}")
self.assertIs(param == "bar", param.value == "bar")
self.assertIs(param != "bar", param.value != "bar")
self.assertIs(param > "eggs", param.value > "eggs")
self.assertIs(param >= "bar{{spam}}", param.value >= "bar{{spam}}")
self.assertEquals(bool(param), bool(param.value))
self.assertEquals(len(param), len(param.value))
self.assertEquals(list(param), list(param.value))
self.assertEquals(param[2], param.value[2])
self.assertEquals(list(reversed(param)),
list(reversed(param.value)))
self.assertIs("bar" in param, "bar" in param.value)
self.assertEquals(param + "test", param.value + "test")
self.assertEquals("test" + param, "test" + param.value)
# add param
# add template left
# add template right

self.assertEquals(param * 3, Parameter(param.name, param.value * 3,
param.templates * 3))
self.assertEquals(3 * param, Parameter(param.name, 3 * param.value,
3 * param.templates))
"""test getter/setter for the value attribute"""
value = wraptext("bar")
node = Parameter(wraptext("foo"), value)
self.assertIs(value, node.value)
node.value = "héhehé"
self.assertWikicodeEqual(wraptext("héhehé"), node.value)

# add param inplace
# add template inplace
# add str inplace
# multiply int inplace
self.assertIsInstance(param, Parameter)
self.assertIsInstance(param.value, str)
def test_showkey(self):
"""test getter/setter for the showkey attribute"""
node1 = Parameter(wraptext("1"), wraptext("foo"), showkey=False)
node2 = Parameter(wraptext("foo"), wraptext("bar"))
self.assertFalse(node1.showkey)
self.assertTrue(node2.showkey)
node1.showkey = True
node2.showkey = ""
self.assertTrue(node1.showkey)
self.assertFalse(node2.showkey)

if __name__ == "__main__":
unittest.main(verbosity=2)
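
Reviewer note: the string forms asserted in test_unicode and test_showkey can be pictured with a small stand-in class. `ParamSketch` is a hypothetical name and deliberately simplified (plain strings instead of Wikicode objects); it is not the library's Parameter class.

```python
class ParamSketch:
    """Hypothetical stand-in for a template parameter."""

    def __init__(self, name, value, showkey=True):
        self.name = name
        self.value = value
        # Store the flag as a bool so any truthy/falsy input is accepted.
        self.showkey = bool(showkey)

    def __str__(self):
        # Positional parameters hide their key; named ones render "key=value".
        if self.showkey:
            return "{}={}".format(self.name, self.value)
        return str(self.value)


positional = ParamSketch("1", "foo", showkey=False)
named = ParamSketch("foo", "bar")
```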

+ 38
- 35
tests/test_parser.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -20,44 +20,47 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.parameter import Parameter
from mwparserfromhell.parser import Parser
from mwparserfromhell.template import Template
from mwparserfromhell import parser
from mwparserfromhell.nodes import Template, Text, Wikilink
from mwparserfromhell.nodes.extras import Parameter

TESTS = [
("", []),
("abcdef ghijhk", []),
("abc{this is not a template}def", []),
("neither is {{this one}nor} {this one {despite}} containing braces", []),
("this is an acceptable {{template}}", [Template("template")]),
("{{multiple}}{{templates}}", [Template("multiple"),
Template("templates")]),
("multiple {{-}} templates {{+}}!", [Template("-"), Template("+")]),
("{{{no templates here}}}", []),
("{ {{templates here}}}", [Template("templates here")]),
("{{{{I do not exist}}}}", []),
("{{foo|bar|baz|eggs=spam}}",
[Template("foo", [Parameter("1", "bar"), Parameter("2", "baz"),
Parameter("eggs", "spam")])]),
("{{abc def|ghi|jk=lmno|pqr|st=uv|wx|yz}}",
[Template("abc def", [Parameter("1", "ghi"), Parameter("jk", "lmno"),
Parameter("2", "pqr"), Parameter("st", "uv"),
Parameter("3", "wx"), Parameter("4", "yz")])]),
("{{this has a|{{template}}|inside of it}}",
[Template("this has a", [Parameter("1", "{{template}}",
[Template("template")]),
Parameter("2", "inside of it")])]),
("{{{{I exist}} }}", [Template("I exist", [] )]),
("{{}}")
]
from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext
from .compat import range

class TestParser(unittest.TestCase):
def test_parse(self):
parser = Parser()
for unparsed, parsed in TESTS:
self.assertEqual(parser.parse(unparsed), parsed)
class TestParser(TreeEqualityTestCase):
"""Tests for the Parser class itself, which tokenizes and builds nodes."""

def test_use_c(self):
"""make sure the correct tokenizer is used"""
if parser.use_c:
self.assertTrue(parser.Parser(None)._tokenizer.USES_C)
parser.use_c = False
self.assertFalse(parser.Parser(None)._tokenizer.USES_C)

def test_parsing(self):
"""integration test for parsing overall"""
text = "this is text; {{this|is=a|template={{with|[[links]]|in}}it}}"
expected = wrap([
Text("this is text; "),
Template(wraptext("this"), [
Parameter(wraptext("is"), wraptext("a")),
Parameter(wraptext("template"), wrap([
Template(wraptext("with"), [
Parameter(wraptext("1"),
wrap([Wikilink(wraptext("links"))]),
showkey=False),
Parameter(wraptext("2"),
wraptext("in"), showkey=False)
]),
Text("it")
]))
])
])
actual = parser.Parser(text).parse()
self.assertWikicodeEqual(expected, actual)

if __name__ == "__main__":
unittest.main(verbosity=2)

+ 44
- 0
tests/test_pytokenizer.py

@@ -0,0 +1,44 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.parser.tokenizer import Tokenizer

from ._test_tokenizer import TokenizerTestCase

class TestPyTokenizer(TokenizerTestCase, unittest.TestCase):
"""Test cases for the Python tokenizer."""

@classmethod
def setUpClass(cls):
cls.tokenizer = Tokenizer

if not TokenizerTestCase.skip_others:
def test_uses_c(self):
"""make sure the Python tokenizer identifies as not using C"""
self.assertFalse(Tokenizer.USES_C)
self.assertFalse(Tokenizer().USES_C)

if __name__ == "__main__":
unittest.main(verbosity=2)

+ 392
- 0
tests/test_smart_list.py

@@ -0,0 +1,392 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import py3k
from mwparserfromhell.smart_list import SmartList, _ListProxy

from .compat import range

class TestSmartList(unittest.TestCase):
"""Test cases for the SmartList class and its child, _ListProxy."""

def _test_get_set_del_item(self, builder):
"""Run tests on __get/set/delitem__ of a list built with *builder*."""
def assign(L, s1, s2, s3, val):
L[s1:s2:s3] = val
def delete(L, s1):
del L[s1]

list1 = builder([0, 1, 2, 3, "one", "two"])
list2 = builder(list(range(10)))

self.assertEqual(1, list1[1])
self.assertEqual("one", list1[-2])
self.assertEqual([2, 3], list1[2:4])
self.assertRaises(IndexError, lambda: list1[6])
self.assertRaises(IndexError, lambda: list1[-7])

self.assertEqual([0, 1, 2], list1[:3])
self.assertEqual([0, 1, 2, 3, "one", "two"], list1[:])
self.assertEqual([3, "one", "two"], list1[3:])
self.assertEqual(["one", "two"], list1[-2:])
self.assertEqual([0, 1], list1[:-4])
self.assertEqual([], list1[6:])
self.assertEqual([], list1[4:2])

self.assertEqual([0, 2, "one"], list1[0:5:2])
self.assertEqual([0, 2], list1[0:-3:2])
self.assertEqual([0, 1, 2, 3, "one", "two"], list1[::])
self.assertEqual([2, 3, "one", "two"], list1[2::])
self.assertEqual([0, 1, 2, 3], list1[:4:])
self.assertEqual([2, 3], list1[2:4:])
self.assertEqual([0, 2, 4, 6, 8], list2[::2])
self.assertEqual([2, 5, 8], list2[2::3])
self.assertEqual([0, 3], list2[:6:3])
self.assertEqual([2, 5, 8], list2[-8:9:3])
self.assertEqual([], list2[100000:1000:-100])

list1[3] = 100
self.assertEqual(100, list1[3])
list1[-3] = 101
self.assertEqual([0, 1, 2, 101, "one", "two"], list1)
list1[5:] = [6, 7, 8]
self.assertEqual([6, 7, 8], list1[5:])
self.assertEqual([0, 1, 2, 101, "one", 6, 7, 8], list1)
list1[2:4] = [-1, -2, -3, -4, -5]
self.assertEqual([0, 1, -1, -2, -3, -4, -5, "one", 6, 7, 8], list1)
list1[0:-3] = [99]
self.assertEqual([99, 6, 7, 8], list1)
list2[0:6:2] = [100, 102, 104]
self.assertEqual([100, 1, 102, 3, 104, 5, 6, 7, 8, 9], list2)
list2[::3] = [200, 203, 206, 209]
self.assertEqual([200, 1, 102, 203, 104, 5, 206, 7, 8, 209], list2)
list2[::] = range(7)
self.assertEqual([0, 1, 2, 3, 4, 5, 6], list2)
self.assertRaises(ValueError, assign, list2, 0, 5, 2,
[100, 102, 104, 106])

del list2[2]
self.assertEqual([0, 1, 3, 4, 5, 6], list2)
del list2[-3]
self.assertEqual([0, 1, 3, 5, 6], list2)
self.assertRaises(IndexError, delete, list2, 100)
self.assertRaises(IndexError, delete, list2, -6)
list2[:] = range(10)
del list2[3:6]
self.assertEqual([0, 1, 2, 6, 7, 8, 9], list2)
del list2[-2:]
self.assertEqual([0, 1, 2, 6, 7], list2)
del list2[:2]
self.assertEqual([2, 6, 7], list2)
list2[:] = range(10)
del list2[2:8:2]
self.assertEqual([0, 1, 3, 5, 7, 8, 9], list2)

def _test_add_radd_iadd(self, builder):
"""Run tests on __r/i/add__ of a list built with *builder*."""
list1 = builder(range(5))
list2 = builder(range(5, 10))
self.assertEqual([0, 1, 2, 3, 4, 5, 6], list1 + [5, 6])
self.assertEqual([0, 1, 2, 3, 4], list1)
self.assertEqual(list(range(10)), list1 + list2)
self.assertEqual([-2, -1, 0, 1, 2, 3, 4], [-2, -1] + list1)
self.assertEqual([0, 1, 2, 3, 4], list1)
list1 += ["foo", "bar", "baz"]
self.assertEqual([0, 1, 2, 3, 4, "foo", "bar", "baz"], list1)

def _test_other_magic_methods(self, builder):
"""Run tests on other magic methods of a list built with *builder*."""
list1 = builder([0, 1, 2, 3, "one", "two"])
list2 = builder([])
list3 = builder([0, 2, 3, 4])
list4 = builder([0, 1, 2])

if py3k:
self.assertEqual("[0, 1, 2, 3, 'one', 'two']", str(list1))
self.assertEqual(b"\x00\x01\x02", bytes(list4))
self.assertEqual("[0, 1, 2, 3, 'one', 'two']", repr(list1))
else:
self.assertEqual("[0, 1, 2, 3, u'one', u'two']", unicode(list1))
self.assertEqual(b"[0, 1, 2, 3, u'one', u'two']", str(list1))
self.assertEqual(b"[0, 1, 2, 3, u'one', u'two']", repr(list1))

self.assertTrue(list1 < list3)
self.assertTrue(list1 <= list3)
self.assertFalse(list1 == list3)
self.assertTrue(list1 != list3)
self.assertFalse(list1 > list3)
self.assertFalse(list1 >= list3)

other1 = [0, 2, 3, 4]
self.assertTrue(list1 < other1)
self.assertTrue(list1 <= other1)
self.assertFalse(list1 == other1)
self.assertTrue(list1 != other1)
self.assertFalse(list1 > other1)
self.assertFalse(list1 >= other1)

other2 = [0, 0, 1, 2]
self.assertFalse(list1 < other2)
self.assertFalse(list1 <= other2)
self.assertFalse(list1 == other2)
self.assertTrue(list1 != other2)
self.assertTrue(list1 > other2)
self.assertTrue(list1 >= other2)

other3 = [0, 1, 2, 3, "one", "two"]
self.assertFalse(list1 < other3)
self.assertTrue(list1 <= other3)
self.assertTrue(list1 == other3)
self.assertFalse(list1 != other3)
self.assertFalse(list1 > other3)
self.assertTrue(list1 >= other3)

self.assertTrue(bool(list1))
self.assertFalse(bool(list2))

self.assertEqual(6, len(list1))
self.assertEqual(0, len(list2))

out = []
for obj in list1:
out.append(obj)
self.assertEqual([0, 1, 2, 3, "one", "two"], out)

out = []
for ch in list2:
out.append(ch)
self.assertEqual([], out)

gen1 = iter(list1)
out = []
for i in range(len(list1)):
out.append(next(gen1))
self.assertRaises(StopIteration, next, gen1)
self.assertEqual([0, 1, 2, 3, "one", "two"], out)
gen2 = iter(list2)
self.assertRaises(StopIteration, next, gen2)

self.assertEqual(["two", "one", 3, 2, 1, 0], list(reversed(list1)))
self.assertEqual([], list(reversed(list2)))

self.assertTrue("one" in list1)
self.assertTrue(3 in list1)
self.assertFalse(10 in list1)
self.assertFalse(0 in list2)

self.assertEqual([], list2 * 5)
self.assertEqual([], 5 * list2)
self.assertEqual([0, 1, 2, 0, 1, 2, 0, 1, 2], list4 * 3)
self.assertEqual([0, 1, 2, 0, 1, 2, 0, 1, 2], 3 * list4)
list4 *= 2
self.assertEqual([0, 1, 2, 0, 1, 2], list4)

def _test_list_methods(self, builder):
"""Run tests on the public methods of a list built with *builder*."""
list1 = builder(range(5))
list2 = builder(["foo"])
list3 = builder([("a", 5), ("d", 2), ("b", 8), ("c", 3)])

list1.append(5)
list1.append(1)
list1.append(2)
self.assertEqual([0, 1, 2, 3, 4, 5, 1, 2], list1)

self.assertEqual(0, list1.count(6))
self.assertEqual(2, list1.count(1))

list1.extend(range(5, 8))
self.assertEqual([0, 1, 2, 3, 4, 5, 1, 2, 5, 6, 7], list1)

self.assertEqual(1, list1.index(1))
self.assertEqual(6, list1.index(1, 3))
self.assertEqual(6, list1.index(1, 3, 7))
self.assertRaises(ValueError, list1.index, 1, 3, 5)

list1.insert(0, -1)
self.assertEqual([-1, 0, 1, 2, 3, 4, 5, 1, 2, 5, 6, 7], list1)
list1.insert(-1, 6.5)
self.assertEqual([-1, 0, 1, 2, 3, 4, 5, 1, 2, 5, 6, 6.5, 7], list1)
list1.insert(13, 8)
self.assertEqual([-1, 0, 1, 2, 3, 4, 5, 1, 2, 5, 6, 6.5, 7, 8], list1)

self.assertEqual(8, list1.pop())
self.assertEqual(7, list1.pop())
self.assertEqual([-1, 0, 1, 2, 3, 4, 5, 1, 2, 5, 6, 6.5], list1)
self.assertEqual(-1, list1.pop(0))
self.assertEqual(5, list1.pop(5))
self.assertEqual(6.5, list1.pop(-1))
self.assertEqual([0, 1, 2, 3, 4, 1, 2, 5, 6], list1)
self.assertEqual("foo", list2.pop())
self.assertRaises(IndexError, list2.pop)
self.assertEqual([], list2)

list1.remove(6)
self.assertEqual([0, 1, 2, 3, 4, 1, 2, 5], list1)
list1.remove(1)
self.assertEqual([0, 2, 3, 4, 1, 2, 5], list1)
list1.remove(1)
self.assertEqual([0, 2, 3, 4, 2, 5], list1)
self.assertRaises(ValueError, list1.remove, 1)

list1.reverse()
self.assertEqual([5, 2, 4, 3, 2, 0], list1)

list1.sort()
self.assertEqual([0, 2, 2, 3, 4, 5], list1)
list1.sort(reverse=True)
self.assertEqual([5, 4, 3, 2, 2, 0], list1)
if not py3k:
func = lambda x, y: abs(3 - x) - abs(3 - y) # Distance from 3
list1.sort(cmp=func)
self.assertEqual([3, 4, 2, 2, 5, 0], list1)
list1.sort(cmp=func, reverse=True)
self.assertEqual([0, 5, 4, 2, 2, 3], list1)
list3.sort(key=lambda i: i[1])
self.assertEqual([("d", 2), ("c", 3), ("a", 5), ("b", 8)], list3)
list3.sort(key=lambda i: i[1], reverse=True)
self.assertEqual([("b", 8), ("a", 5), ("c", 3), ("d", 2)], list3)

def test_docs(self):
"""make sure the methods of SmartList/_ListProxy have docstrings"""
methods = ["append", "count", "extend", "index", "insert", "pop",
"remove", "reverse", "sort"]
for meth in methods:
expected = getattr(list, meth).__doc__
smartlist_doc = getattr(SmartList, meth).__doc__
listproxy_doc = getattr(_ListProxy, meth).__doc__
self.assertEqual(expected, smartlist_doc)
self.assertEqual(expected, listproxy_doc)

def test_doctest(self):
"""make sure the test embedded in SmartList's docstring passes"""
parent = SmartList([0, 1, 2, 3])
self.assertEqual([0, 1, 2, 3], parent)
child = parent[2:]
self.assertEqual([2, 3], child)
child.append(4)
self.assertEqual([2, 3, 4], child)
self.assertEqual([0, 1, 2, 3, 4], parent)

def test_parent_get_set_del(self):
"""make sure SmartList's getitem/setitem/delitem work"""
self._test_get_set_del_item(SmartList)

def test_parent_add(self):
"""make sure SmartList's add/radd/iadd work"""
self._test_add_radd_iadd(SmartList)

def test_parent_unaffected_magics(self):
"""sanity checks against SmartList features that were not modified"""
self._test_other_magic_methods(SmartList)

def test_parent_methods(self):
"""make sure SmartList's non-magic methods work, like append()"""
self._test_list_methods(SmartList)

def test_child_get_set_del(self):
"""make sure _ListProxy's getitem/setitem/delitem work"""
self._test_get_set_del_item(lambda L: SmartList(list(L))[:])
self._test_get_set_del_item(lambda L: SmartList([999] + list(L))[1:])
self._test_get_set_del_item(lambda L: SmartList(list(L) + [999])[:-1])
builder = lambda L: SmartList([101, 102] + list(L) + [201, 202])[2:-2]
self._test_get_set_del_item(builder)

def test_child_add(self):
"""make sure _ListProxy's add/radd/iadd work"""
self._test_add_radd_iadd(lambda L: SmartList(list(L))[:])
self._test_add_radd_iadd(lambda L: SmartList([999] + list(L))[1:])
self._test_add_radd_iadd(lambda L: SmartList(list(L) + [999])[:-1])
builder = lambda L: SmartList([101, 102] + list(L) + [201, 202])[2:-2]
self._test_add_radd_iadd(builder)

def test_child_other_magics(self):
"""make sure _ListProxy's other magically implemented features work"""
self._test_other_magic_methods(lambda L: SmartList(list(L))[:])
self._test_other_magic_methods(lambda L: SmartList([999] + list(L))[1:])
self._test_other_magic_methods(lambda L: SmartList(list(L) + [999])[:-1])
builder = lambda L: SmartList([101, 102] + list(L) + [201, 202])[2:-2]
self._test_other_magic_methods(builder)

def test_child_methods(self):
"""make sure _ListProxy's non-magic methods work, like append()"""
self._test_list_methods(lambda L: SmartList(list(L))[:])
self._test_list_methods(lambda L: SmartList([999] + list(L))[1:])
self._test_list_methods(lambda L: SmartList(list(L) + [999])[:-1])
builder = lambda L: SmartList([101, 102] + list(L) + [201, 202])[2:-2]
self._test_list_methods(builder)

def test_influence(self):
"""make sure changes are propagated between parents and children"""
parent = SmartList([0, 1, 2, 3, 4, 5])
child1 = parent[2:]
child2 = parent[2:5]

parent.append(6)
child1.append(7)
child2.append(4.5)
self.assertEqual([0, 1, 2, 3, 4, 4.5, 5, 6, 7], parent)
self.assertEqual([2, 3, 4, 4.5, 5, 6, 7], child1)
self.assertEqual([2, 3, 4, 4.5], child2)

parent.insert(0, -1)
parent.insert(4, 2.5)
parent.insert(10, 6.5)
self.assertEqual([-1, 0, 1, 2, 2.5, 3, 4, 4.5, 5, 6, 6.5, 7], parent)
self.assertEqual([2, 2.5, 3, 4, 4.5, 5, 6, 6.5, 7], child1)
self.assertEqual([2, 2.5, 3, 4, 4.5], child2)

self.assertEqual(7, parent.pop())
self.assertEqual(6.5, child1.pop())
self.assertEqual(4.5, child2.pop())
self.assertEqual([-1, 0, 1, 2, 2.5, 3, 4, 5, 6], parent)
self.assertEqual([2, 2.5, 3, 4, 5, 6], child1)
self.assertEqual([2, 2.5, 3, 4], child2)

parent.remove(-1)
child1.remove(2.5)
self.assertEqual([0, 1, 2, 3, 4, 5, 6], parent)
self.assertEqual([2, 3, 4, 5, 6], child1)
self.assertEqual([2, 3, 4], child2)

self.assertEqual(0, parent.pop(0))
self.assertEqual([1, 2, 3, 4, 5, 6], parent)
self.assertEqual([2, 3, 4, 5, 6], child1)
self.assertEqual([2, 3, 4], child2)

child2.reverse()
self.assertEqual([1, 4, 3, 2, 5, 6], parent)
self.assertEqual([4, 3, 2, 5, 6], child1)
self.assertEqual([4, 3, 2], child2)

parent.extend([7, 8])
child1.extend([8.1, 8.2])
child2.extend([1.9, 1.8])
self.assertEqual([1, 4, 3, 2, 1.9, 1.8, 5, 6, 7, 8, 8.1, 8.2], parent)
self.assertEqual([4, 3, 2, 1.9, 1.8, 5, 6, 7, 8, 8.1, 8.2], child1)
self.assertEqual([4, 3, 2, 1.9, 1.8], child2)

if __name__ == "__main__":
unittest.main(verbosity=2)
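
Reviewer note: the SmartList tests compare results against plain list literals, so the built-in list is effectively the reference behaviour. As a reminder of the less obvious slice rules being exercised, here is a short sketch using ordinary lists only (it does not import SmartList, and the variable names are illustrative):

```python
# Extended-slice assignment replaces items in place and needs an equal count.
data = list(range(10))
data[0:6:2] = [100, 102, 104]

# A size mismatch on an extended slice raises ValueError.
short = list(range(7))
try:
    short[0:5:2] = [100, 102, 104, 106]  # 3 slots, 4 values
except ValueError:
    mismatch_rejected = True
else:
    mismatch_rejected = False

# Step-less slice assignment may grow or shrink the list.
items = [0, 1, 2, 101, "one", 6, 7, 8]
items[2:4] = [-1, -2, -3, -4, -5]

# Deleting an extended slice removes every step-th item in the range.
evens_removed = list(range(10))
del evens_removed[2:8:2]
```

The same inputs appear in _test_get_set_del_item, which is why a list-compatible class has to raise ValueError rather than silently resize on a stepped assignment.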

+ 435
- 0
tests/test_string_mixin.py

@@ -0,0 +1,435 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
from sys import getdefaultencoding
from types import GeneratorType
import unittest

from mwparserfromhell.compat import bytes, py3k, str
from mwparserfromhell.string_mixin import StringMixIn

from .compat import range

class _FakeString(StringMixIn):
def __init__(self, data):
self._data = data

def __unicode__(self):
return self._data


class TestStringMixIn(unittest.TestCase):
"""Test cases for the StringMixIn class."""

def test_docs(self):
"""make sure the various methods of StringMixIn have docstrings"""
methods = [
"capitalize", "center", "count", "encode", "endswith",
"expandtabs", "find", "format", "index", "isalnum", "isalpha",
"isdecimal", "isdigit", "islower", "isnumeric", "isspace",
"istitle", "isupper", "join", "ljust", "lower", "lstrip",
"partition", "replace", "rfind", "rindex", "rjust", "rpartition",
"rsplit", "rstrip", "split", "splitlines", "startswith", "strip",
"swapcase", "title", "translate", "upper", "zfill"]
if py3k:
methods.extend(["casefold", "format_map", "isidentifier",
"isprintable", "maketrans"])
else:
methods.append("decode")
for meth in methods:
expected = getattr(str, meth).__doc__
actual = getattr(StringMixIn, meth).__doc__
self.assertEqual(expected, actual)

def test_types(self):
"""make sure StringMixIns convert to different types correctly"""
fstr = _FakeString("fake string")
self.assertEqual(str(fstr), "fake string")
self.assertEqual(bytes(fstr), b"fake string")
if py3k:
self.assertEqual(repr(fstr), "'fake string'")
else:
self.assertEqual(repr(fstr), b"u'fake string'")

self.assertIsInstance(str(fstr), str)
self.assertIsInstance(bytes(fstr), bytes)
if py3k:
self.assertIsInstance(repr(fstr), str)
else:
self.assertIsInstance(repr(fstr), bytes)

def test_comparisons(self):
"""make sure comparison operators work"""
str1 = _FakeString("this is a fake string")
str2 = _FakeString("this is a fake string")
str3 = _FakeString("fake string, this is")
str4 = "this is a fake string"
str5 = "fake string, this is"

self.assertFalse(str1 > str2)
self.assertTrue(str1 >= str2)
self.assertTrue(str1 == str2)
self.assertFalse(str1 != str2)
self.assertFalse(str1 < str2)
self.assertTrue(str1 <= str2)

self.assertTrue(str1 > str3)
self.assertTrue(str1 >= str3)
self.assertFalse(str1 == str3)
self.assertTrue(str1 != str3)
self.assertFalse(str1 < str3)
self.assertFalse(str1 <= str3)

self.assertFalse(str1 > str4)
self.assertTrue(str1 >= str4)
self.assertTrue(str1 == str4)
self.assertFalse(str1 != str4)
self.assertFalse(str1 < str4)
self.assertTrue(str1 <= str4)

self.assertTrue(str1 > str5)
self.assertTrue(str1 >= str5)
self.assertFalse(str1 == str5)
self.assertTrue(str1 != str5)
self.assertFalse(str1 < str5)
self.assertFalse(str1 <= str5)

def test_other_magics(self):
"""test other magically implemented features, like len() and iter()"""
str1 = _FakeString("fake string")
str2 = _FakeString("")
expected = ["f", "a", "k", "e", " ", "s", "t", "r", "i", "n", "g"]

self.assertTrue(str1)
self.assertFalse(str2)
self.assertEqual(11, len(str1))
self.assertEqual(0, len(str2))

out = []
for ch in str1:
out.append(ch)
self.assertEqual(expected, out)

out = []
for ch in str2:
out.append(ch)
self.assertEqual([], out)

gen1 = iter(str1)
gen2 = iter(str2)
self.assertIsInstance(gen1, GeneratorType)
self.assertIsInstance(gen2, GeneratorType)

out = []
for i in range(len(str1)):
out.append(next(gen1))
self.assertRaises(StopIteration, next, gen1)
self.assertEqual(expected, out)
self.assertRaises(StopIteration, next, gen2)

self.assertEqual("gnirts ekaf", "".join(list(reversed(str1))))
self.assertEqual([], list(reversed(str2)))

self.assertEqual("f", str1[0])
self.assertEqual(" ", str1[4])
self.assertEqual("g", str1[10])
self.assertEqual("n", str1[-2])
self.assertRaises(IndexError, lambda: str1[11])
self.assertRaises(IndexError, lambda: str2[0])

self.assertTrue("k" in str1)
self.assertTrue("fake" in str1)
self.assertTrue("str" in str1)
self.assertTrue("" in str1)
self.assertTrue("" in str2)
self.assertFalse("real" in str1)
self.assertFalse("s" in str2)

def test_other_methods(self):
"""test the remaining non-magic methods of StringMixIn"""
str1 = _FakeString("fake string")
self.assertEqual("Fake string", str1.capitalize())

self.assertEqual("  fake string  ", str1.center(15))
self.assertEqual("  fake string   ", str1.center(16))
self.assertEqual("qqfake stringqq", str1.center(15, "q"))

self.assertEqual(1, str1.count("e"))
self.assertEqual(0, str1.count("z"))
self.assertEqual(1, str1.count("r", 7))
self.assertEqual(0, str1.count("r", 8))
self.assertEqual(1, str1.count("r", 5, 9))
self.assertEqual(0, str1.count("r", 5, 7))

if not py3k:
str2 = _FakeString("fo")
self.assertEqual(str1, str1.decode())
actual = _FakeString("\\U00010332\\U0001033f\\U00010344")
self.assertEqual("𐌲𐌿𐍄", actual.decode("unicode_escape"))
self.assertRaises(UnicodeError, str2.decode, "punycode")
self.assertEqual("", str2.decode("punycode", "ignore"))

str3 = _FakeString("𐌲𐌿𐍄")
actual = b"\xF0\x90\x8C\xB2\xF0\x90\x8C\xBF\xF0\x90\x8D\x84"
self.assertEqual(b"fake string", str1.encode())
self.assertEqual(actual, str3.encode("utf-8"))
self.assertEqual(actual, str3.encode(encoding="utf-8"))
if getdefaultencoding() == "ascii":
self.assertRaises(UnicodeEncodeError, str3.encode)
elif getdefaultencoding() == "utf-8":
self.assertEqual(actual, str3.encode())
self.assertRaises(UnicodeEncodeError, str3.encode, "ascii")
self.assertRaises(UnicodeEncodeError, str3.encode, "ascii", "strict")
if getdefaultencoding() == "ascii":
self.assertRaises(UnicodeEncodeError, str3.encode, errors="strict")
elif getdefaultencoding() == "utf-8":
self.assertEqual(actual, str3.encode(errors="strict"))
self.assertEqual(b"", str3.encode("ascii", "ignore"))
if getdefaultencoding() == "ascii":
self.assertEqual(b"", str3.encode(errors="ignore"))
elif getdefaultencoding() == "utf-8":
self.assertEqual(actual, str3.encode(errors="ignore"))

self.assertTrue(str1.endswith("ing"))
self.assertFalse(str1.endswith("ingh"))

str4 = _FakeString("\tfoobar")
self.assertEqual("fake string", str1)
self.assertEqual("        foobar", str4.expandtabs())
self.assertEqual("    foobar", str4.expandtabs(4))

self.assertEqual(3, str1.find("e"))
self.assertEqual(-1, str1.find("z"))
self.assertEqual(7, str1.find("r", 7))
self.assertEqual(-1, str1.find("r", 8))
self.assertEqual(7, str1.find("r", 5, 9))
self.assertEqual(-1, str1.find("r", 5, 7))

str5 = _FakeString("foo{0}baz")
str6 = _FakeString("foo{abc}baz")
str7 = _FakeString("foo{0}{abc}buzz")
str8 = _FakeString("{0}{1}")
self.assertEqual("fake string", str1.format())
self.assertEqual("foobarbaz", str5.format("bar"))
self.assertEqual("foobarbaz", str6.format(abc="bar"))
self.assertEqual("foobarbazbuzz", str7.format("bar", abc="baz"))
self.assertRaises(IndexError, str8.format, "abc")

if py3k:
    self.assertEqual("fake string", str1.format_map({}))
    self.assertEqual("foobarbaz", str6.format_map({"abc": "bar"}))
    self.assertRaises(ValueError, str5.format_map, {0: "abc"})

self.assertEqual(3, str1.index("e"))
self.assertRaises(ValueError, str1.index, "z")
self.assertEqual(7, str1.index("r", 7))
self.assertRaises(ValueError, str1.index, "r", 8)
self.assertEqual(7, str1.index("r", 5, 9))
self.assertRaises(ValueError, str1.index, "r", 5, 7)

str9 = _FakeString("foobar")
str10 = _FakeString("foobar123")
str11 = _FakeString("foo bar")
self.assertTrue(str9.isalnum())
self.assertTrue(str10.isalnum())
self.assertFalse(str11.isalnum())

self.assertTrue(str9.isalpha())
self.assertFalse(str10.isalpha())
self.assertFalse(str11.isalpha())

str12 = _FakeString("123")
str13 = _FakeString("\u2155")
str14 = _FakeString("\u00B2")
self.assertFalse(str9.isdecimal())
self.assertTrue(str12.isdecimal())
self.assertFalse(str13.isdecimal())
self.assertFalse(str14.isdecimal())

self.assertFalse(str9.isdigit())
self.assertTrue(str12.isdigit())
self.assertFalse(str13.isdigit())
self.assertTrue(str14.isdigit())

if py3k:
    self.assertTrue(str9.isidentifier())
    self.assertTrue(str10.isidentifier())
    self.assertFalse(str11.isidentifier())
    self.assertFalse(str12.isidentifier())

str15 = _FakeString("")
str16 = _FakeString("FooBar")
self.assertTrue(str9.islower())
self.assertFalse(str15.islower())
self.assertFalse(str16.islower())

self.assertFalse(str9.isnumeric())
self.assertTrue(str12.isnumeric())
self.assertTrue(str13.isnumeric())
self.assertTrue(str14.isnumeric())

if py3k:
    str16B = _FakeString("\x01\x02")
    self.assertTrue(str9.isprintable())
    self.assertTrue(str13.isprintable())
    self.assertTrue(str14.isprintable())
    self.assertTrue(str15.isprintable())
    self.assertFalse(str16B.isprintable())

str17 = _FakeString(" ")
str18 = _FakeString("\t \t \r\n")
self.assertFalse(str1.isspace())
self.assertFalse(str9.isspace())
self.assertTrue(str17.isspace())
self.assertTrue(str18.isspace())

str19 = _FakeString("This Sentence Looks Like A Title")
str20 = _FakeString("This sentence doesn't LookLikeATitle")
self.assertFalse(str15.istitle())
self.assertTrue(str19.istitle())
self.assertFalse(str20.istitle())

str21 = _FakeString("FOOBAR")
self.assertFalse(str9.isupper())
self.assertFalse(str15.isupper())
self.assertTrue(str21.isupper())

self.assertEqual("foobar", str15.join(["foo", "bar"]))
self.assertEqual("foo123bar123baz", str12.join(("foo", "bar", "baz")))

self.assertEqual("fake string    ", str1.ljust(15))
self.assertEqual("fake string     ", str1.ljust(16))
self.assertEqual("fake stringqqqq", str1.ljust(15, "q"))

str22 = _FakeString("ß")
self.assertEqual("", str15.lower())
self.assertEqual("foobar", str16.lower())
self.assertEqual("ß", str22.lower())
if py3k:
    self.assertEqual("", str15.casefold())
    self.assertEqual("foobar", str16.casefold())
    self.assertEqual("ss", str22.casefold())

str23 = _FakeString(" fake string ")
self.assertEqual("fake string", str1.lstrip())
self.assertEqual("fake string ", str23.lstrip())
self.assertEqual("ke string", str1.lstrip("abcdef"))

self.assertEqual(("fa", "ke", " string"), str1.partition("ke"))
self.assertEqual(("fake string", "", ""), str1.partition("asdf"))

str24 = _FakeString("boo foo moo")
self.assertEqual("real string", str1.replace("fake", "real"))
self.assertEqual("bu fu moo", str24.replace("oo", "u", 2))

self.assertEqual(3, str1.rfind("e"))
self.assertEqual(-1, str1.rfind("z"))
self.assertEqual(7, str1.rfind("r", 7))
self.assertEqual(-1, str1.rfind("r", 8))
self.assertEqual(7, str1.rfind("r", 5, 9))
self.assertEqual(-1, str1.rfind("r", 5, 7))

self.assertEqual(3, str1.rindex("e"))
self.assertRaises(ValueError, str1.rindex, "z")
self.assertEqual(7, str1.rindex("r", 7))
self.assertRaises(ValueError, str1.rindex, "r", 8)
self.assertEqual(7, str1.rindex("r", 5, 9))
self.assertRaises(ValueError, str1.rindex, "r", 5, 7)

self.assertEqual("    fake string", str1.rjust(15))
self.assertEqual("     fake string", str1.rjust(16))
self.assertEqual("qqqqfake string", str1.rjust(15, "q"))

self.assertEqual(("fa", "ke", " string"), str1.rpartition("ke"))
self.assertEqual(("", "", "fake string"), str1.rpartition("asdf"))

str25 = _FakeString("   this is a   sentence with  whitespace ")
actual = ["this", "is", "a", "sentence", "with", "whitespace"]
self.assertEqual(actual, str25.rsplit())
self.assertEqual(actual, str25.rsplit(None))
actual = ["", "", "", "this", "is", "a", "", "", "sentence", "with",
          "", "whitespace", ""]
self.assertEqual(actual, str25.rsplit(" "))
actual = ["   this is a", "sentence", "with", "whitespace"]
self.assertEqual(actual, str25.rsplit(None, 3))
actual = ["   this is a   sentence with", "", "whitespace", ""]
self.assertEqual(actual, str25.rsplit(" ", 3))
if py3k:
    actual = ["   this is a", "sentence", "with", "whitespace"]
    self.assertEqual(actual, str25.rsplit(maxsplit=3))

self.assertEqual("fake string", str1.rstrip())
self.assertEqual(" fake string", str23.rstrip())
self.assertEqual("fake stri", str1.rstrip("ngr"))

actual = ["this", "is", "a", "sentence", "with", "whitespace"]
self.assertEqual(actual, str25.split())
self.assertEqual(actual, str25.split(None))
actual = ["", "", "", "this", "is", "a", "", "", "sentence", "with",
          "", "whitespace", ""]
self.assertEqual(actual, str25.split(" "))
actual = ["this", "is", "a", "sentence with  whitespace "]
self.assertEqual(actual, str25.split(None, 3))
actual = ["", "", "", "this is a   sentence with  whitespace "]
self.assertEqual(actual, str25.split(" ", 3))
if py3k:
    actual = ["this", "is", "a", "sentence with  whitespace "]
    self.assertEqual(actual, str25.split(maxsplit=3))

str26 = _FakeString("lines\nof\ntext\r\nare\r\npresented\nhere")
self.assertEqual(["lines", "of", "text", "are", "presented", "here"],
str26.splitlines())
self.assertEqual(["lines\n", "of\n", "text\r\n", "are\r\n",
"presented\n", "here"], str26.splitlines(True))

self.assertTrue(str1.startswith("fake"))
self.assertFalse(str1.startswith("faker"))

self.assertEqual("fake string", str1.strip())
self.assertEqual("fake string", str23.strip())
self.assertEqual("ke stri", str1.strip("abcdefngr"))

self.assertEqual("fOObAR", str16.swapcase())

self.assertEqual("Fake String", str1.title())

if py3k:
    table1 = StringMixIn.maketrans({97: "1", 101: "2", 105: "3",
                                    111: "4", 117: "5"})
    table2 = StringMixIn.maketrans("aeiou", "12345")
    table3 = StringMixIn.maketrans("aeiou", "12345", "rts")
    self.assertEqual("f1k2 str3ng", str1.translate(table1))
    self.assertEqual("f1k2 str3ng", str1.translate(table2))
    self.assertEqual("f1k2 3ng", str1.translate(table3))
else:
    table = {97: "1", 101: "2", 105: "3", 111: "4", 117: "5"}
    self.assertEqual("f1k2 str3ng", str1.translate(table))

self.assertEqual("", str15.upper())
self.assertEqual("FOOBAR", str16.upper())

self.assertEqual("123", str12.zfill(3))
self.assertEqual("000123", str12.zfill(6))

if __name__ == "__main__":
    unittest.main(verbosity=2)
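
The split()/rsplit() assertions above depend on how separators and maxsplit interact. The following is a minimal sketch using the built-in str type rather than _FakeString; the sample string and variable names are chosen here for illustration, and the sketch only assumes that the mixin forwards to ordinary str semantics.

```python
# Illustrative sample: runs of spaces inside, plus leading and trailing spaces.
text = "   this is a   sentence with  whitespace "

# No separator: runs of whitespace collapse and the edges are trimmed.
words = text.split()

# Explicit separator: every single space is a boundary, so each extra
# space in a run produces an empty string in the result.
pieces = text.split(" ")

# maxsplit counts from the left for split() and from the right for rsplit().
# The unsplit remainder keeps its inner spacing.
left = text.split(None, 3)
right = text.rsplit(None, 3)

for name, value in (("words", words), ("pieces", pieces),
                    ("left", left), ("right", right)):
    print(name, value)
```

With a None separator the remainder is trimmed only on the side facing the split, which is why the rsplit() remainder keeps its leading spaces while the split() remainder keeps its trailing one.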

+ 332
- 74
tests/test_template.py

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012 Ben Kurtovic <ben.kurtovic@verizon.net>
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -20,87 +20,345 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from itertools import permutations
from __future__ import unicode_literals
import unittest

from mwparserfromhell.parameter import Parameter
from mwparserfromhell.template import Template
from mwparserfromhell.compat import str
from mwparserfromhell.nodes import HTMLEntity, Template, Text
from mwparserfromhell.nodes.extras import Parameter
from ._test_tree_equality import TreeEqualityTestCase, getnodes, wrap, wraptext

class TestTemplate(unittest.TestCase):
def setUp(self):
self.name = "foo"
self.bar = Parameter("1", "bar")
self.baz = Parameter("2", "baz")
self.eggs = Parameter("eggs", "spam")
self.params = [self.bar, self.baz, self.eggs]
pgens = lambda k, v: Parameter(wraptext(k), wraptext(v), showkey=True)
pgenh = lambda k, v: Parameter(wraptext(k), wraptext(v), showkey=False)

def test_construct(self):
Template(self.name)
Template(self.name, self.params)
Template(name=self.name)
Template(name=self.name, params=self.params)
class TestTemplate(TreeEqualityTestCase):
"""Test cases for the Template node."""

def test_unicode(self):
"""test Template.__unicode__()"""
node = Template(wraptext("foobar"))
self.assertEqual("{{foobar}}", str(node))
node2 = Template(wraptext("foo"),
[pgenh("1", "bar"), pgens("abc", "def")])
self.assertEqual("{{foo|bar|abc=def}}", str(node2))

def test_iternodes(self):
"""test Template.__iternodes__()"""
node1n1 = Text("foobar")
node2n1, node2n2, node2n3 = Text("foo"), Text("bar"), Text("abc")
node2n4, node2n5 = Text("def"), Text("ghi")
node2p1 = Parameter(wraptext("1"), wrap([node2n2]), showkey=False)
node2p2 = Parameter(wrap([node2n3]), wrap([node2n4, node2n5]),
showkey=True)
node1 = Template(wrap([node1n1]))
node2 = Template(wrap([node2n1]), [node2p1, node2p2])

gen1 = node1.__iternodes__(getnodes)
gen2 = node2.__iternodes__(getnodes)
self.assertEqual((None, node1), next(gen1))
self.assertEqual((None, node2), next(gen2))
self.assertEqual((node1.name, node1n1), next(gen1))
self.assertEqual((node2.name, node2n1), next(gen2))
self.assertEqual((node2.params[0].value, node2n2), next(gen2))
self.assertEqual((node2.params[1].name, node2n3), next(gen2))
self.assertEqual((node2.params[1].value, node2n4), next(gen2))
self.assertEqual((node2.params[1].value, node2n5), next(gen2))
self.assertRaises(StopIteration, next, gen1)
self.assertRaises(StopIteration, next, gen2)

def test_strip(self):
"""test Template.__strip__()"""
node1 = Template(wraptext("foobar"))
node2 = Template(wraptext("foo"),
[pgenh("1", "bar"), pgens("abc", "def")])
for a in (True, False):
for b in (True, False):
self.assertEqual(None, node1.__strip__(a, b))
self.assertEqual(None, node2.__strip__(a, b))

def test_showtree(self):
"""test Template.__showtree__()"""
output = []
getter, marker = object(), object()
get = lambda code: output.append((getter, code))
mark = lambda: output.append(marker)
node1 = Template(wraptext("foobar"))
node2 = Template(wraptext("foo"),
[pgenh("1", "bar"), pgens("abc", "def")])
node1.__showtree__(output.append, get, mark)
node2.__showtree__(output.append, get, mark)
valid = [
"{{", (getter, node1.name), "}}", "{{", (getter, node2.name),
" | ", marker, (getter, node2.params[0].name), " = ", marker,
(getter, node2.params[0].value), " | ", marker,
(getter, node2.params[1].name), " = ", marker,
(getter, node2.params[1].value), "}}"]
self.assertEqual(valid, output)

def test_name(self):
templates = [
Template(self.name),
Template(self.name, self.params),
Template(name=self.name),
Template(name=self.name, params=self.params)
]
for template in templates:
self.assertEqual(template.name, self.name)
"""test getter/setter for the name attribute"""
name = wraptext("foobar")
node1 = Template(name)
node2 = Template(name, [pgenh("1", "bar")])
self.assertIs(name, node1.name)
self.assertIs(name, node2.name)
node1.name = "asdf"
node2.name = "téstïng"
self.assertWikicodeEqual(wraptext("asdf"), node1.name)
self.assertWikicodeEqual(wraptext("téstïng"), node2.name)

def test_params(self):
for template in (Template(self.name), Template(name=self.name)):
self.assertEqual(template.params, [])
for template in (Template(self.name, self.params),
Template(name=self.name, params=self.params)):
self.assertEqual(template.params, self.params)

def test_getitem(self):
template = Template(name=self.name, params=self.params)
self.assertIs(template[0], self.bar)
self.assertIs(template[1], self.baz)
self.assertIs(template[2], self.eggs)
self.assertIs(template["1"], self.bar)
self.assertIs(template["2"], self.baz)
self.assertIs(template["eggs"], self.eggs)

def test_render(self):
tests = [
(Template(self.name), "{{foo}}"),
(Template(self.name, self.params), "{{foo|bar|baz|eggs=spam}}")
]
for template, rendered in tests:
self.assertEqual(template.render(), rendered)

def test_repr(self):
correct1= 'Template(name=foo, params={})'
correct2 = 'Template(name=foo, params={"1": "bar", "2": "baz", "eggs": "spam"})'
tests = [(Template(self.name), correct1),
(Template(self.name, self.params), correct2)]
for template, correct in tests:
self.assertEqual(repr(template), correct)
self.assertEqual(str(template), correct)

def test_cmp(self):
tmp1 = Template(self.name)
tmp2 = Template(name=self.name)
tmp3 = Template(self.name, [])
tmp4 = Template(name=self.name, params=[])
tmp5 = Template(self.name, self.params)
tmp6 = Template(name=self.name, params=self.params)

for tmpA, tmpB in permutations((tmp1, tmp2, tmp3, tmp4), 2):
self.assertEqual(tmpA, tmpB)

for tmpA, tmpB in permutations((tmp5, tmp6), 2):
self.assertEqual(tmpA, tmpB)

for tmpA in (tmp5, tmp6):
for tmpB in (tmp1, tmp2, tmp3, tmp4):
self.assertNotEqual(tmpA, tmpB)
self.assertNotEqual(tmpB, tmpA)
"""test getter for the params attribute"""
node1 = Template(wraptext("foobar"))
plist = [pgenh("1", "bar"), pgens("abc", "def")]
node2 = Template(wraptext("foo"), plist)
self.assertEqual([], node1.params)
self.assertIs(plist, node2.params)

def test_has_param(self):
"""test Template.has_param()"""
node1 = Template(wraptext("foobar"))
node2 = Template(wraptext("foo"),
[pgenh("1", "bar"), pgens("\nabc ", "def")])
node3 = Template(wraptext("foo"),
[pgenh("1", "a"), pgens("b", "c"), pgens("1", "d")])
node4 = Template(wraptext("foo"), [pgenh("1", "a"), pgens("b", " ")])
self.assertFalse(node1.has_param("foobar"))
self.assertTrue(node2.has_param(1))
self.assertTrue(node2.has_param("abc"))
self.assertFalse(node2.has_param("def"))
self.assertTrue(node3.has_param("1"))
self.assertTrue(node3.has_param(" b "))
self.assertFalse(node4.has_param("b"))
self.assertTrue(node3.has_param("b", False))
self.assertTrue(node4.has_param("b", False))

def test_get(self):
"""test Template.get()"""
node1 = Template(wraptext("foobar"))
node2p1 = pgenh("1", "bar")
node2p2 = pgens("abc", "def")
node2 = Template(wraptext("foo"), [node2p1, node2p2])
node3p1 = pgens("b", "c")
node3p2 = pgens("1", "d")
node3 = Template(wraptext("foo"), [pgenh("1", "a"), node3p1, node3p2])
node4p1 = pgens(" b", " ")
node4 = Template(wraptext("foo"), [pgenh("1", "a"), node4p1])
self.assertRaises(ValueError, node1.get, "foobar")
self.assertIs(node2p1, node2.get(1))
self.assertIs(node2p2, node2.get("abc"))
self.assertRaises(ValueError, node2.get, "def")
self.assertIs(node3p1, node3.get("b"))
self.assertIs(node3p2, node3.get("1"))
self.assertIs(node4p1, node4.get("b "))

def test_add(self):
"""test Template.add()"""
node1 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")])
node2 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")])
node3 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")])
node4 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")])
node5 = Template(wraptext("a"), [pgens("b", "c"),
pgens(" d ", "e")])
node6 = Template(wraptext("a"), [pgens("b", "c"), pgens("b", "d"),
pgens("b", "e")])
node7 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")])
node8p = pgenh("1", "d")
node8 = Template(wraptext("a"), [pgens("b", "c"), node8p])
node9 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "d")])
node10 = Template(wraptext("a"), [pgens("b", "c"), pgenh("1", "e")])
node11 = Template(wraptext("a"), [pgens("b", "c")])
node12 = Template(wraptext("a"), [pgens("b", "c")])
node13 = Template(wraptext("a"), [
pgens("\nb ", " c"), pgens("\nd ", " e"), pgens("\nf ", " g")])
node14 = Template(wraptext("a\n"), [
pgens("b ", "c\n"), pgens("d ", " e"), pgens("f ", "g\n"),
pgens("h ", " i\n")])
node15 = Template(wraptext("a"), [
pgens("b ", " c\n"), pgens("\nd ", " e"), pgens("\nf ", "g ")])
node16 = Template(wraptext("a"), [
pgens("\nb ", " c"), pgens("\nd ", " e"), pgens("\nf ", " g")])
node17 = Template(wraptext("a"), [
pgens("\nb ", " c"), pgens("\nd ", " e"), pgens("\nf ", " g")])
node18 = Template(wraptext("a\n"), [
pgens("b ", "c\n"), pgens("d ", " e"), pgens("f ", "g\n"),
pgens("h ", " i\n")])
node19 = Template(wraptext("a"), [
pgens("b ", " c\n"), pgens("\nd ", " e"), pgens("\nf ", "g ")])
node20 = Template(wraptext("a"), [
pgens("\nb ", " c"), pgens("\nd ", " e"), pgens("\nf ", " g")])
node21 = Template(wraptext("a"), [pgenh("1", "b")])
node22 = Template(wraptext("a"), [pgenh("1", "b")])
node23 = Template(wraptext("a"), [pgenh("1", "b")])
node24 = Template(wraptext("a"), [pgenh("1", "b"), pgenh("2", "c"),
pgenh("3", "d"), pgenh("4", "e")])
node25 = Template(wraptext("a"), [pgenh("1", "b"), pgenh("2", "c"),
pgens("4", "d"), pgens("5", "e")])
node26 = Template(wraptext("a"), [pgenh("1", "b"), pgenh("2", "c"),
pgens("4", "d"), pgens("5", "e")])
node27 = Template(wraptext("a"), [pgenh("1", "b")])
node28 = Template(wraptext("a"), [pgenh("1", "b")])
node29 = Template(wraptext("a"), [pgens("b", "c")])
node30 = Template(wraptext("a"), [pgenh("1", "b")])
node31 = Template(wraptext("a"), [pgenh("1", "b")])
node32 = Template(wraptext("a"), [pgens("1", "b")])
node33 = Template(wraptext("a"), [
pgens("\nb ", " c"), pgens("\nd ", " e"), pgens("\nf ", " g")])
node34 = Template(wraptext("a\n"), [
pgens("b ", "c\n"), pgens("d ", " e"), pgens("f ", "g\n"),
pgens("h ", " i\n")])
node35 = Template(wraptext("a"), [
pgens("b ", " c\n"), pgens("\nd ", " e"), pgens("\nf ", "g ")])
node36 = Template(wraptext("a"), [
pgens("\nb ", " c "), pgens("\nd ", " e "), pgens("\nf ", " g ")])
node37 = Template(wraptext("a"), [pgens("b", "c"), pgens("d", "e"),
pgens("b", "f"), pgens("b", "h"),
pgens("i", "j")])
node38 = Template(wraptext("a"), [pgens("1", "b"), pgens("x", "y"),
pgens("1", "c"), pgens("2", "d")])
node39 = Template(wraptext("a"), [pgens("1", "b"), pgens("x", "y"),
pgenh("1", "c"), pgenh("2", "d")])
node40 = Template(wraptext("a"), [pgens("b", "c"), pgens("d", "e"),
pgens("f", "g")])

node1.add("e", "f", showkey=True)
node2.add(2, "g", showkey=False)
node3.add("e", "foo|bar", showkey=True)
node4.add("e", "f", showkey=True, before="b")
node5.add("f", "g", showkey=True, before=" d ")
node6.add("f", "g", showkey=True, before="b")
self.assertRaises(ValueError, node7.add, "e", "f", showkey=True,
before="q")
node8.add("e", "f", showkey=True, before=node8p)
node9.add("e", "f", showkey=True, before=pgenh("1", "d"))
self.assertRaises(ValueError, node10.add, "e", "f", showkey=True,
before=pgenh("1", "d"))
node11.add("d", "foo=bar", showkey=True)
node12.add("1", "foo=bar", showkey=False)
node13.add("h", "i", showkey=True)
node14.add("j", "k", showkey=True)
node15.add("h", "i", showkey=True)
node16.add("h", "i", showkey=True, preserve_spacing=False)
node17.add("h", "i", showkey=False)
node18.add("j", "k", showkey=False)
node19.add("h", "i", showkey=False)
node20.add("h", "i", showkey=False, preserve_spacing=False)
node21.add("2", "c")
node22.add("3", "c")
node23.add("c", "d")
node24.add("5", "f")
node25.add("3", "f")
node26.add("6", "f")
node27.add("c", "foo=bar")
node28.add("2", "foo=bar")
node29.add("b", "d")
node30.add("1", "foo=bar")
node31.add("1", "foo=bar", showkey=True)
node32.add("1", "foo=bar", showkey=False)
node33.add("d", "foo")
node34.add("f", "foo")
node35.add("f", "foo")
node36.add("d", "foo", preserve_spacing=False)
node37.add("b", "k")
node38.add("1", "e")
node39.add("1", "e")
node40.add("d", "h", before="b")

self.assertEqual("{{a|b=c|d|e=f}}", node1)
self.assertEqual("{{a|b=c|d|g}}", node2)
self.assertEqual("{{a|b=c|d|e=foo&#124;bar}}", node3)
self.assertIsInstance(node3.params[2].value.get(1), HTMLEntity)
self.assertEqual("{{a|e=f|b=c|d}}", node4)
self.assertEqual("{{a|b=c|f=g| d =e}}", node5)
self.assertEqual("{{a|b=c|b=d|f=g|b=e}}", node6)
self.assertEqual("{{a|b=c|d}}", node7)
self.assertEqual("{{a|b=c|e=f|d}}", node8)
self.assertEqual("{{a|b=c|e=f|d}}", node9)
self.assertEqual("{{a|b=c|e}}", node10)
self.assertEqual("{{a|b=c|d=foo=bar}}", node11)
self.assertEqual("{{a|b=c|foo&#61;bar}}", node12)
self.assertIsInstance(node12.params[1].value.get(1), HTMLEntity)
self.assertEqual("{{a|\nb = c|\nd = e|\nf = g|\nh = i}}", node13)
self.assertEqual("{{a\n|b =c\n|d = e|f =g\n|h = i\n|j =k\n}}", node14)
self.assertEqual("{{a|b = c\n|\nd = e|\nf =g |h =i}}", node15)
self.assertEqual("{{a|\nb = c|\nd = e|\nf = g|h=i}}", node16)
self.assertEqual("{{a|\nb = c|\nd = e|\nf = g| i}}", node17)
self.assertEqual("{{a\n|b =c\n|d = e|f =g\n|h = i\n|k\n}}", node18)
self.assertEqual("{{a|b = c\n|\nd = e|\nf =g |i}}", node19)
self.assertEqual("{{a|\nb = c|\nd = e|\nf = g|i}}", node20)
self.assertEqual("{{a|b|c}}", node21)
self.assertEqual("{{a|b|3=c}}", node22)
self.assertEqual("{{a|b|c=d}}", node23)
self.assertEqual("{{a|b|c|d|e|f}}", node24)
self.assertEqual("{{a|b|c|4=d|5=e|f}}", node25)
self.assertEqual("{{a|b|c|4=d|5=e|6=f}}", node26)
self.assertEqual("{{a|b|c=foo=bar}}", node27)
self.assertEqual("{{a|b|foo&#61;bar}}", node28)
self.assertIsInstance(node28.params[1].value.get(1), HTMLEntity)
self.assertEqual("{{a|b=d}}", node29)
self.assertEqual("{{a|foo&#61;bar}}", node30)
self.assertIsInstance(node30.params[0].value.get(1), HTMLEntity)
self.assertEqual("{{a|1=foo=bar}}", node31)
self.assertEqual("{{a|foo&#61;bar}}", node32)
self.assertIsInstance(node32.params[0].value.get(1), HTMLEntity)
self.assertEqual("{{a|\nb = c|\nd = foo|\nf = g}}", node33)
self.assertEqual("{{a\n|b =c\n|d = e|f =foo\n|h = i\n}}", node34)
self.assertEqual("{{a|b = c\n|\nd = e|\nf =foo }}", node35)
self.assertEqual("{{a|\nb = c |\nd =foo|\nf = g }}", node36)
self.assertEqual("{{a|b=k|d=e|i=j}}", node37)
self.assertEqual("{{a|1=e|x=y|2=d}}", node38)
self.assertEqual("{{a|x=y|e|d}}", node39)
self.assertEqual("{{a|b=c|d=h|f=g}}", node40)

def test_remove(self):
"""test Template.remove()"""
node1 = Template(wraptext("foobar"))
node2 = Template(wraptext("foo"), [pgenh("1", "bar"),
pgens("abc", "def")])
node3 = Template(wraptext("foo"), [pgenh("1", "bar"),
pgens("abc", "def")])
node4 = Template(wraptext("foo"), [pgenh("1", "bar"),
pgenh("2", "baz")])
node5 = Template(wraptext("foo"), [
pgens(" a", "b"), pgens("b", "c"), pgens("a ", "d")])
node6 = Template(wraptext("foo"), [
pgens(" a", "b"), pgens("b", "c"), pgens("a ", "d")])
node7 = Template(wraptext("foo"), [
pgens("1 ", "a"), pgens(" 1", "b"), pgens("2", "c")])
node8 = Template(wraptext("foo"), [
pgens("1 ", "a"), pgens(" 1", "b"), pgens("2", "c")])
node9 = Template(wraptext("foo"), [
pgens("1 ", "a"), pgenh("1", "b"), pgenh("2", "c")])
node10 = Template(wraptext("foo"), [
pgens("1 ", "a"), pgenh("1", "b"), pgenh("2", "c")])

node2.remove("1")
node2.remove("abc")
node3.remove(1, keep_field=True)
node3.remove("abc", keep_field=True)
node4.remove("1", keep_field=False)
node5.remove("a", keep_field=False)
node6.remove("a", keep_field=True)
node7.remove(1, keep_field=True)
node8.remove(1, keep_field=False)
node9.remove(1, keep_field=True)
node10.remove(1, keep_field=False)

self.assertRaises(ValueError, node1.remove, 1)
self.assertRaises(ValueError, node1.remove, "a")
self.assertRaises(ValueError, node2.remove, "1")
self.assertEqual("{{foo}}", node2)
self.assertEqual("{{foo||abc=}}", node3)
self.assertEqual("{{foo||baz}}", node4)
self.assertEqual("{{foo|b=c}}", node5)
self.assertEqual("{{foo| a=|b=c}}", node6)
self.assertEqual("{{foo|1 =|2=c}}", node7)
self.assertEqual("{{foo|2=c}}", node8)
self.assertEqual("{{foo||c}}", node9)
self.assertEqual("{{foo||c}}", node10)

if __name__ == "__main__":
    unittest.main(verbosity=2)
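
Several test_add() expectations above show "|" rendered as "&#124;" and, for parameters added with showkey=False, "=" rendered as "&#61;". A rough sketch of that escaping rule follows. The helper name escape_param_value is hypothetical and not part of mwparserfromhell, and the sketch ignores nesting (it escapes every occurrence, which the real library may not do).

```python
import html


def escape_param_value(value, showkey):
    """Hypothetical helper: escape characters that would change template structure."""
    # A bare pipe would be read as the start of a new parameter.
    value = value.replace("|", "&#124;")
    if not showkey:
        # Without an explicit key, a bare "=" would be read as name=value.
        value = value.replace("=", "&#61;")
    return value


print(escape_param_value("foo|bar", showkey=True))
print(escape_param_value("foo=bar", showkey=True))
print(escape_param_value("foo=bar", showkey=False))

# The numeric entities decode back to the original characters.
print(html.unescape(escape_param_value("foo=bar", showkey=False)))
```

Using HTML entities keeps the rendered text the same while preventing the wikitext parser from treating the characters as delimiters.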

+ 75
- 0
tests/test_text.py

@@ -0,0 +1,75 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import str
from mwparserfromhell.nodes import Text

class TestText(unittest.TestCase):
    """Test cases for the Text node."""

    def test_unicode(self):
        """test Text.__unicode__()"""
        node = Text("foobar")
        self.assertEqual("foobar", str(node))
        node2 = Text("fóóbar")
        self.assertEqual("fóóbar", str(node2))

    def test_iternodes(self):
        """test Text.__iternodes__()"""
        node = Text("foobar")
        gen = node.__iternodes__(None)
        self.assertEqual((None, node), next(gen))
        self.assertRaises(StopIteration, next, gen)

    def test_strip(self):
        """test Text.__strip__()"""
        node = Text("foobar")
        for a in (True, False):
            for b in (True, False):
                self.assertIs(node, node.__strip__(a, b))

    def test_showtree(self):
        """test Text.__showtree__()"""
        output = []
        node1 = Text("foobar")
        node2 = Text("fóóbar")
        node3 = Text("𐌲𐌿𐍄")
        node1.__showtree__(output.append, None, None)
        node2.__showtree__(output.append, None, None)
        node3.__showtree__(output.append, None, None)
        res = ["foobar", r"f\xf3\xf3bar", "\\U00010332\\U0001033f\\U00010344"]
        self.assertEqual(res, output)

    def test_value(self):
        """test getter/setter for the value attribute"""
        node = Text("foobar")
        self.assertEqual("foobar", node.value)
        self.assertIsInstance(node.value, str)
        node.value = "héhéhé"
        self.assertEqual("héhéhé", node.value)
        self.assertIsInstance(node.value, str)

if __name__ == "__main__":
    unittest.main(verbosity=2)
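
The expected values in test_showtree() above are the backslash-escaped forms of the non-ASCII input. One way to produce such strings in Python 3 is the standard unicode_escape codec, sketched below. The helper name escape_for_display is hypothetical; whether Text.__showtree__() is implemented this way is an assumption, not something this diff shows.

```python
def escape_for_display(text):
    """Hypothetical helper: render non-ASCII characters as Python escapes."""
    # unicode_escape produces ASCII-only bytes such as b"f\\xf3\\xf3bar",
    # so decoding them as ASCII is always safe.
    return text.encode("unicode_escape").decode("ascii")


# Inputs are written with escapes so this file itself stays ASCII-only.
samples = ["foobar", "f\u00f3\u00f3bar", "\U00010332\U0001033f\U00010344"]
for sample in samples:
    print(escape_for_display(sample))
```

Characters below U+0100 come out as \xNN, and characters outside the Basic Multilingual Plane come out as \UNNNNNNNN with lowercase hex digits.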

+ 108
- 0
tests/test_tokens.py

@@ -0,0 +1,108 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import py3k
from mwparserfromhell.parser import tokens

class TestTokens(unittest.TestCase):
    """Test cases for the Token class and its subclasses."""

    def test_issubclass(self):
        """check that all classes within the tokens module are really Tokens"""
        for name in tokens.__all__:
            klass = getattr(tokens, name)
            self.assertTrue(issubclass(klass, tokens.Token))
            self.assertIsInstance(klass(), klass)
            self.assertIsInstance(klass(), tokens.Token)

    def test_attributes(self):
        """check that Token attributes can be managed properly"""
        token1 = tokens.Token()
        token2 = tokens.Token(foo="bar", baz=123)

        self.assertEqual("bar", token2.foo)
        self.assertEqual(123, token2.baz)
        self.assertRaises(KeyError, lambda: token1.foo)
        self.assertRaises(KeyError, lambda: token2.bar)

        token1.spam = "eggs"
        token2.foo = "ham"
        del token2.baz

        self.assertEqual("eggs", token1.spam)
        self.assertEqual("ham", token2.foo)
        self.assertRaises(KeyError, lambda: token2.baz)
        self.assertRaises(KeyError, delattr, token2, "baz")

    def test_repr(self):
        """check that repr() on a Token works as expected"""
        token1 = tokens.Token()
        token2 = tokens.Token(foo="bar", baz=123)
        token3 = tokens.Text(text="earwig" * 100)
        hundredchars = ("earwig" * 100)[:97] + "..."

        self.assertEqual("Token()", repr(token1))
        if py3k:
            token2repr1 = "Token(foo='bar', baz=123)"
            token2repr2 = "Token(baz=123, foo='bar')"
            token3repr = "Text(text='" + hundredchars + "')"
        else:
            token2repr1 = "Token(foo=u'bar', baz=123)"
            token2repr2 = "Token(baz=123, foo=u'bar')"
            token3repr = "Text(text=u'" + hundredchars + "')"
        token2repr = repr(token2)
        self.assertTrue(token2repr == token2repr1 or token2repr == token2repr2)
        self.assertEqual(token3repr, repr(token3))

    def test_equality(self):
        """check that equivalent tokens are considered equal"""
        token1 = tokens.Token()
        token2 = tokens.Token()
        token3 = tokens.Token(foo="bar", baz=123)
        token4 = tokens.Text(text="asdf")
        token5 = tokens.Text(text="asdf")
        token6 = tokens.TemplateOpen(text="asdf")

        self.assertEqual(token1, token2)
        self.assertEqual(token2, token1)
        self.assertEqual(token4, token5)
        self.assertEqual(token5, token4)
        self.assertNotEqual(token1, token3)
        self.assertNotEqual(token2, token3)
        self.assertNotEqual(token4, token6)
        self.assertNotEqual(token5, token6)

    def test_repr_equality(self):
        "check that eval(repr(token)) == token"
        tests = [
            tokens.Token(),
            tokens.Token(foo="bar", baz=123),
            tokens.Text(text="earwig")
        ]
        for token in tests:
            self.assertEqual(token, eval(repr(token), vars(tokens)))

if __name__ == "__main__":
    unittest.main(verbosity=2)
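
The tests above expect attribute access on a missing name to raise KeyError, repr() to be eval()-able, and tokens of different classes to compare unequal even with identical attributes. A dict-backed class is one way to get that behaviour. The sketch below is illustrative only: SketchToken and its subclasses are hypothetical names, not the classes in mwparserfromhell.parser.tokens, and the truncation of long values in repr() is left out.

```python
class SketchToken(dict):
    """Hypothetical dict-backed token; attributes are stored as dict items."""

    def __getattr__(self, key):
        # Only called when normal lookup fails; a missing key raises KeyError.
        return self[key]

    def __setattr__(self, key, value):
        self[key] = value

    def __delattr__(self, key):
        del self[key]

    def __repr__(self):
        args = ", ".join("{0}={1!r}".format(k, v) for k, v in self.items())
        return "{0}({1})".format(type(self).__name__, args)

    def __eq__(self, other):
        # Same class and same items; different token types never match.
        return type(other) is type(self) and dict.__eq__(self, other)

    def __ne__(self, other):
        return not self == other


class SketchText(SketchToken):
    """Hypothetical token subclass."""


class SketchTemplateOpen(SketchToken):
    """Hypothetical token subclass."""


token = SketchToken(foo="bar", baz=123)
token.spam = "eggs"
del token.baz
print(repr(token))
print(SketchText(text="asdf") == SketchTemplateOpen(text="asdf"))
```

Because the keyword arguments become dict items, the constructor call printed by repr() rebuilds an equal object when evaluated in a namespace that contains the class.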

+ 62
- 0
tests/test_utils.py

@@ -0,0 +1,62 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.nodes import Template, Text
from mwparserfromhell.utils import parse_anything

from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext

class TestUtils(TreeEqualityTestCase):
    """Tests for the utils module, which provides parse_anything()."""

    def test_parse_anything_valid(self):
        """tests for valid input to utils.parse_anything()"""
        tests = [
            (wraptext("foobar"), wraptext("foobar")),
            (Template(wraptext("spam")), wrap([Template(wraptext("spam"))])),
            ("fóóbar", wraptext("fóóbar")),
            (b"foob\xc3\xa1r", wraptext("foobár")),
            (123, wraptext("123")),
            (True, wraptext("True")),
            (None, wrap([])),
            ([Text("foo"), Text("bar"), Text("baz")],
             wraptext("foo", "bar", "baz")),
            ([wraptext("foo"), Text("bar"), "baz", 123, 456],
             wraptext("foo", "bar", "baz", "123", "456")),
            ([[[([[((("foo",),),)], "bar"],)]]], wraptext("foo", "bar"))
        ]
        for test, valid in tests:
            self.assertWikicodeEqual(valid, parse_anything(test))

    def test_parse_anything_invalid(self):
        """tests for invalid input to utils.parse_anything()"""
        self.assertRaises(ValueError, parse_anything, Ellipsis)
        self.assertRaises(ValueError, parse_anything, object)
        self.assertRaises(ValueError, parse_anything, object())
        self.assertRaises(ValueError, parse_anything, type)
        self.assertRaises(ValueError, parse_anything, ["foo", [object]])

if __name__ == "__main__":
    unittest.main(verbosity=2)

+ 364
- 0
tests/test_wikicode.py

@@ -0,0 +1,364 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import re
from types import GeneratorType
import unittest

from mwparserfromhell.nodes import (Argument, Comment, Heading, HTMLEntity,
Node, Tag, Template, Text, Wikilink)
from mwparserfromhell.smart_list import SmartList
from mwparserfromhell.wikicode import Wikicode
from mwparserfromhell import parse
from mwparserfromhell.compat import py3k, str

from ._test_tree_equality import TreeEqualityTestCase, wrap, wraptext

class TestWikicode(TreeEqualityTestCase):
    """Tests for the Wikicode class, which manages a list of nodes."""

    def test_unicode(self):
        """test Wikicode.__unicode__()"""
        code1 = parse("foobar")
        code2 = parse("Have a {{template}} and a [[page|link]]")
        self.assertEqual("foobar", str(code1))
        self.assertEqual("Have a {{template}} and a [[page|link]]", str(code2))

    def test_nodes(self):
        """test getter/setter for the nodes attribute"""
        code = parse("Have a {{template}}")
        self.assertEqual(["Have a ", "{{template}}"], code.nodes)
        L1 = SmartList([Text("foobar"), Template(wraptext("abc"))])
        L2 = [Text("barfoo"), Template(wraptext("cba"))]
        L3 = "abc{{def}}"
        code.nodes = L1
        self.assertIs(L1, code.nodes)
        code.nodes = L2
        self.assertIs(L2, code.nodes)
        code.nodes = L3
        self.assertEqual(["abc", "{{def}}"], code.nodes)
        self.assertRaises(ValueError, setattr, code, "nodes", object)

    def test_get(self):
        """test Wikicode.get()"""
        code = parse("Have a {{template}} and a [[page|link]]")
        self.assertIs(code.nodes[0], code.get(0))
        self.assertIs(code.nodes[2], code.get(2))
        self.assertRaises(IndexError, code.get, 4)

    def test_set(self):
        """test Wikicode.set()"""
        code = parse("Have a {{template}} and a [[page|link]]")
        code.set(1, "{{{argument}}}")
        self.assertEqual("Have a {{{argument}}} and a [[page|link]]", code)
        self.assertIsInstance(code.get(1), Argument)
        code.set(2, None)
        self.assertEqual("Have a {{{argument}}}[[page|link]]", code)
        code.set(-3, "This is an ")
        self.assertEqual("This is an {{{argument}}}[[page|link]]", code)
        self.assertRaises(ValueError, code.set, 1, "foo {{bar}}")
        self.assertRaises(IndexError, code.set, 3, "{{baz}}")
        self.assertRaises(IndexError, code.set, -4, "{{baz}}")

    def test_index(self):
        """test Wikicode.index()"""
        code = parse("Have a {{template}} and a [[page|link]]")
        self.assertEqual(0, code.index("Have a "))
        self.assertEqual(3, code.index("[[page|link]]"))
        self.assertEqual(1, code.index(code.get(1)))
        self.assertRaises(ValueError, code.index, "foo")

        code = parse("{{foo}}{{bar|{{baz}}}}")
        self.assertEqual(1, code.index("{{bar|{{baz}}}}"))
        self.assertEqual(1, code.index("{{baz}}", recursive=True))
        self.assertEqual(1, code.index(code.get(1).get(1).value,
                                       recursive=True))
        self.assertRaises(ValueError, code.index, "{{baz}}", recursive=False)
        self.assertRaises(ValueError, code.index,
                          code.get(1).get(1).value, recursive=False)

    def test_insert(self):
        """test Wikicode.insert()"""
        code = parse("Have a {{template}} and a [[page|link]]")
        code.insert(1, "{{{argument}}}")
        self.assertEqual(
            "Have a {{{argument}}}{{template}} and a [[page|link]]", code)
        self.assertIsInstance(code.get(1), Argument)
        code.insert(2, None)
        self.assertEqual(
            "Have a {{{argument}}}{{template}} and a [[page|link]]", code)
        code.insert(-3, Text("foo"))
        self.assertEqual(
            "Have a {{{argument}}}foo{{template}} and a [[page|link]]", code)

        code2 = parse("{{foo}}{{bar}}{{baz}}")
        code2.insert(1, "abc{{def}}ghi[[jk]]")
        self.assertEqual("{{foo}}abc{{def}}ghi[[jk]]{{bar}}{{baz}}", code2)
        self.assertEqual(["{{foo}}", "abc", "{{def}}", "ghi", "[[jk]]",
                          "{{bar}}", "{{baz}}"], code2.nodes)

        code3 = parse("{{foo}}bar")
        code3.insert(1000, "[[baz]]")
        code3.insert(-1000, "derp")
        self.assertEqual("derp{{foo}}bar[[baz]]", code3)

    def test_insert_before(self):
        """test Wikicode.insert_before()"""
        code = parse("{{a}}{{b}}{{c}}{{d}}")
        code.insert_before("{{b}}", "x", recursive=True)
        code.insert_before("{{d}}", "[[y]]", recursive=False)
        self.assertEqual("{{a}}x{{b}}{{c}}[[y]]{{d}}", code)
        code.insert_before(code.get(2), "z")
        self.assertEqual("{{a}}xz{{b}}{{c}}[[y]]{{d}}", code)
        self.assertRaises(ValueError, code.insert_before, "{{r}}", "n",
                          recursive=True)
        self.assertRaises(ValueError, code.insert_before, "{{r}}", "n",
                          recursive=False)

        code2 = parse("{{a|{{b}}|{{c|d={{f}}}}}}")
        code2.insert_before(code2.get(0).params[0].value.get(0), "x",
                            recursive=True)
        code2.insert_before("{{f}}", "y", recursive=True)
        self.assertEqual("{{a|x{{b}}|{{c|d=y{{f}}}}}}", code2)
        self.assertRaises(ValueError, code2.insert_before, "{{f}}", "y",
                          recursive=False)

    def test_insert_after(self):
        """test Wikicode.insert_after()"""
        code = parse("{{a}}{{b}}{{c}}{{d}}")
        code.insert_after("{{b}}", "x", recursive=True)
        code.insert_after("{{d}}", "[[y]]", recursive=False)
        self.assertEqual("{{a}}{{b}}x{{c}}{{d}}[[y]]", code)
        code.insert_after(code.get(2), "z")
        self.assertEqual("{{a}}{{b}}xz{{c}}{{d}}[[y]]", code)
        self.assertRaises(ValueError, code.insert_after, "{{r}}", "n",
                          recursive=True)
        self.assertRaises(ValueError, code.insert_after, "{{r}}", "n",
                          recursive=False)

        code2 = parse("{{a|{{b}}|{{c|d={{f}}}}}}")
        code2.insert_after(code2.get(0).params[0].value.get(0), "x",
                           recursive=True)
        code2.insert_after("{{f}}", "y", recursive=True)
        self.assertEqual("{{a|{{b}}x|{{c|d={{f}}y}}}}", code2)
        self.assertRaises(ValueError, code2.insert_after, "{{f}}", "y",
                          recursive=False)

    def test_replace(self):
        """test Wikicode.replace()"""
        code = parse("{{a}}{{b}}{{c}}{{d}}")
        code.replace("{{b}}", "x", recursive=True)
        code.replace("{{d}}", "[[y]]", recursive=False)
        self.assertEqual("{{a}}x{{c}}[[y]]", code)
        code.replace(code.get(1), "z")
        self.assertEqual("{{a}}z{{c}}[[y]]", code)
        self.assertRaises(ValueError, code.replace, "{{r}}", "n",
                          recursive=True)
        self.assertRaises(ValueError, code.replace, "{{r}}", "n",
                          recursive=False)

        code2 = parse("{{a|{{b}}|{{c|d={{f}}}}}}")
        code2.replace(code2.get(0).params[0].value.get(0), "x", recursive=True)
        code2.replace("{{f}}", "y", recursive=True)
        self.assertEqual("{{a|x|{{c|d=y}}}}", code2)
        self.assertRaises(ValueError, code2.replace, "y", "z", recursive=False)

    def test_append(self):
        """test Wikicode.append()"""
        code = parse("Have a {{template}}")
        code.append("{{{argument}}}")
        self.assertEqual("Have a {{template}}{{{argument}}}", code)
        self.assertIsInstance(code.get(2), Argument)
        code.append(None)
        self.assertEqual("Have a {{template}}{{{argument}}}", code)
        code.append(Text(" foo"))
        self.assertEqual("Have a {{template}}{{{argument}}} foo", code)
        self.assertRaises(ValueError, code.append, slice(0, 1))

    def test_remove(self):
        """test Wikicode.remove()"""
        code = parse("{{a}}{{b}}{{c}}{{d}}")
        code.remove("{{b}}", recursive=True)
        code.remove(code.get(1), recursive=True)
        self.assertEqual("{{a}}{{d}}", code)
        self.assertRaises(ValueError, code.remove, "{{r}}", recursive=True)
        self.assertRaises(ValueError, code.remove, "{{r}}", recursive=False)

        code2 = parse("{{a|{{b}}|{{c|d={{f}}{{h}}}}}}")
        code2.remove(code2.get(0).params[0].value.get(0), recursive=True)
        code2.remove("{{f}}", recursive=True)
        self.assertEqual("{{a||{{c|d={{h}}}}}}", code2)
        self.assertRaises(ValueError, code2.remove, "{{h}}", recursive=False)

    def test_filter_family(self):
        """test the Wikicode.i?filter() family of functions"""
        def genlist(gen):
            self.assertIsInstance(gen, GeneratorType)
            return list(gen)
        ifilter = lambda code: (lambda **kw: genlist(code.ifilter(**kw)))

        code = parse("a{{b}}c[[d]]{{{e}}}{{f}}[[g]]")
        for func in (code.filter, ifilter(code)):
            self.assertEqual(["a", "{{b}}", "c", "[[d]]", "{{{e}}}", "{{f}}",
                              "[[g]]"], func())
            self.assertEqual(["{{{e}}}"], func(forcetype=Argument))
            self.assertIs(code.get(4), func(forcetype=Argument)[0])
            self.assertEqual(["a", "c"], func(forcetype=Text))
            self.assertEqual([], func(forcetype=Heading))
            self.assertRaises(TypeError, func, forcetype=True)

        funcs = [
            lambda name, **kw: getattr(code, "filter_" + name)(**kw),
            lambda name, **kw: genlist(getattr(code, "ifilter_" + name)(**kw))
        ]
        for get_filter in funcs:
            self.assertEqual(["{{{e}}}"], get_filter("arguments"))
            self.assertIs(code.get(4), get_filter("arguments")[0])
            self.assertEqual([], get_filter("comments"))
            self.assertEqual([], get_filter("headings"))
            self.assertEqual([], get_filter("html_entities"))
            self.assertEqual([], get_filter("tags"))
            self.assertEqual(["{{b}}", "{{f}}"], get_filter("templates"))
            self.assertEqual(["a", "c"], get_filter("text"))
            self.assertEqual(["[[d]]", "[[g]]"], get_filter("wikilinks"))

        code2 = parse("{{a|{{b}}|{{c|d={{f}}{{h}}}}}}")
        for func in (code2.filter, ifilter(code2)):
            self.assertEqual(["{{a|{{b}}|{{c|d={{f}}{{h}}}}}}"],
                             func(recursive=False, forcetype=Template))
            self.assertEqual(["{{a|{{b}}|{{c|d={{f}}{{h}}}}}}", "{{b}}",
                              "{{c|d={{f}}{{h}}}}", "{{f}}", "{{h}}"],
                             func(recursive=True, forcetype=Template))

        code3 = parse("{{foobar}}{{FOO}}{{baz}}{{bz}}")
        for func in (code3.filter, ifilter(code3)):
            self.assertEqual(["{{foobar}}", "{{FOO}}"], func(matches=r"foo"))
            self.assertEqual(["{{foobar}}", "{{FOO}}"],
                             func(matches=r"^{{foo.*?}}"))
            self.assertEqual(["{{foobar}}"],
                             func(matches=r"^{{foo.*?}}", flags=re.UNICODE))
            self.assertEqual(["{{baz}}", "{{bz}}"], func(matches=r"^{{b.*?z"))
            self.assertEqual(["{{baz}}"], func(matches=r"^{{b.+?z}}"))

        self.assertEqual(["{{a|{{b}}|{{c|d={{f}}{{h}}}}}}"],
                         code2.filter_templates(recursive=False))
        self.assertEqual(["{{a|{{b}}|{{c|d={{f}}{{h}}}}}}", "{{b}}",
                          "{{c|d={{f}}{{h}}}}", "{{f}}", "{{h}}"],
                         code2.filter_templates(recursive=True))
        self.assertEqual(["{{baz}}", "{{bz}}"],
                         code3.filter_templates(matches=r"^{{b.*?z"))
        self.assertEqual([], code3.filter_tags(matches=r"^{{b.*?z"))
        self.assertEqual([], code3.filter_tags(matches=r"^{{b.*?z", flags=0))

        self.assertRaises(TypeError, code.filter_templates, 100)
        self.assertRaises(TypeError, code.filter_templates, a=42)
        self.assertRaises(TypeError, code.filter_templates, forcetype=Template)

    def test_get_sections(self):
        """test Wikicode.get_sections()"""
        page1 = parse("")
        page2 = parse("==Heading==")
        page3 = parse("===Heading===\nFoo bar baz\n====Gnidaeh====\n")

        p4_lead = "This is a lead.\n"
        p4_IA = "=== Section I.A ===\nSection I.A [[body]].\n"
        p4_IB1 = "==== Section I.B.1 ====\nSection I.B.1 body.\n\n&bull;Some content.\n\n"
        p4_IB = "=== Section I.B ===\n" + p4_IB1
        p4_I = "== Section I ==\nSection I body. {{and a|template}}\n" + p4_IA + p4_IB
        p4_II = "== Section II ==\nSection II body.\n\n"
        p4_IIIA1a = "===== Section III.A.1.a =====\nMore text.\n"
        p4_IIIA2ai1 = "======= Section III.A.2.a.i.1 =======\nAn invalid section!"
        p4_IIIA2 = "==== Section III.A.2 ====\nEven more text.\n" + p4_IIIA2ai1
        p4_IIIA = "=== Section III.A ===\nText.\n" + p4_IIIA1a + p4_IIIA2
        p4_III = "== Section III ==\n" + p4_IIIA
        page4 = parse(p4_lead + p4_I + p4_II + p4_III)

        self.assertEqual([], page1.get_sections())
        self.assertEqual(["", "==Heading=="], page2.get_sections())
        self.assertEqual(["", "===Heading===\nFoo bar baz\n====Gnidaeh====\n",
                          "====Gnidaeh====\n"], page3.get_sections())
        self.assertEqual([p4_lead, p4_IA, p4_I, p4_IB, p4_IB1, p4_II,
                          p4_IIIA1a, p4_III, p4_IIIA, p4_IIIA2, p4_IIIA2ai1],
                         page4.get_sections())

        self.assertEqual(["====Gnidaeh====\n"], page3.get_sections(levels=[4]))
        self.assertEqual(["===Heading===\nFoo bar baz\n====Gnidaeh====\n"],
                         page3.get_sections(levels=(2, 3)))
        self.assertEqual([], page3.get_sections(levels=[0]))
        self.assertEqual(["", "====Gnidaeh====\n"],
                         page3.get_sections(levels=[4], include_lead=True))
        self.assertEqual(["===Heading===\nFoo bar baz\n====Gnidaeh====\n",
                          "====Gnidaeh====\n"],
                         page3.get_sections(include_lead=False))

        self.assertEqual([p4_IB1, p4_IIIA2], page4.get_sections(levels=[4]))
        self.assertEqual([""], page2.get_sections(include_headings=False))
        self.assertEqual(["\nSection I.B.1 body.\n\n&bull;Some content.\n\n",
                          "\nEven more text.\n" + p4_IIIA2ai1],
                         page4.get_sections(levels=[4],
                                            include_headings=False))

        self.assertEqual([], page4.get_sections(matches=r"body"))
        self.assertEqual([p4_IA, p4_I, p4_IB, p4_IB1],
                         page4.get_sections(matches=r"Section\sI[.\s].*?"))
        self.assertEqual([p4_IA, p4_IIIA1a, p4_IIIA, p4_IIIA2, p4_IIIA2ai1],
                         page4.get_sections(matches=r".*?a.*?"))
        self.assertEqual([p4_IIIA1a, p4_IIIA2ai1],
                         page4.get_sections(matches=r".*?a.*?", flags=re.U))
        self.assertEqual(["\nMore text.\n", "\nAn invalid section!"],
                         page4.get_sections(matches=r".*?a.*?", flags=re.U,
                                            include_headings=False))

        page5 = parse("X\n== Foo ==\nBar\n== Baz ==\nBuzz")
        section = page5.get_sections(matches="Foo")[0]
        section.replace("\nBar\n", "\nBarf ")
        section.append("{{Haha}}\n")
        self.assertEqual("== Foo ==\nBarf {{Haha}}\n", section)
        self.assertEqual("X\n== Foo ==\nBarf {{Haha}}\n== Baz ==\nBuzz", page5)

    def test_strip_code(self):
        """test Wikicode.strip_code()"""
        # Since individual nodes have test cases for their __strip__ methods,
        # we're only going to do an integration test:
        code = parse("Foo [[bar]]\n\n{{baz}}\n\n[[a|b]] &Sigma;")
        self.assertEqual("Foo bar\n\nb Σ",
                         code.strip_code(normalize=True, collapse=True))
        self.assertEqual("Foo bar\n\n\n\nb Σ",
                         code.strip_code(normalize=True, collapse=False))
        self.assertEqual("Foo bar\n\nb &Sigma;",
                         code.strip_code(normalize=False, collapse=True))
        self.assertEqual("Foo bar\n\n\n\nb &Sigma;",
                         code.strip_code(normalize=False, collapse=False))

    def test_get_tree(self):
        """test Wikicode.get_tree()"""
        # Since individual nodes have test cases for their __showtree__
        # methods, and the docstring covers all possibilities for the output of
        # __showtree__, we'll test it only:
        code = parse("Lorem ipsum {{foo|bar|{{baz}}|spam=eggs}}")
        expected = "Lorem ipsum \n{{\n\t foo\n\t| 1\n\t= bar\n\t| 2\n\t= " + \
                   "{{\n\t\t\tbaz\n\t }}\n\t| spam\n\t= eggs\n}}"
        self.assertEqual(expected.expandtabs(4), code.get_tree())

if __name__ == "__main__":
    unittest.main(verbosity=2)

+ 107
- 0
tests/test_wikilink.py

@@ -0,0 +1,107 @@
# -*- coding: utf-8 -*-
#
# Copyright (C) 2012-2013 Ben Kurtovic <ben.kurtovic@verizon.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import unicode_literals
import unittest

from mwparserfromhell.compat import str
from mwparserfromhell.nodes import Text, Wikilink

from ._test_tree_equality import TreeEqualityTestCase, getnodes, wrap, wraptext

class TestWikilink(TreeEqualityTestCase):
    """Test cases for the Wikilink node."""

    def test_unicode(self):
        """test Wikilink.__unicode__()"""
        node = Wikilink(wraptext("foobar"))
        self.assertEqual("[[foobar]]", str(node))
        node2 = Wikilink(wraptext("foo"), wraptext("bar"))
        self.assertEqual("[[foo|bar]]", str(node2))

    def test_iternodes(self):
        """test Wikilink.__iternodes__()"""
        node1n1 = Text("foobar")
        node2n1, node2n2, node2n3 = Text("foo"), Text("bar"), Text("baz")
        node1 = Wikilink(wrap([node1n1]))
        node2 = Wikilink(wrap([node2n1]), wrap([node2n2, node2n3]))
        gen1 = node1.__iternodes__(getnodes)
        gen2 = node2.__iternodes__(getnodes)
        self.assertEqual((None, node1), next(gen1))
        self.assertEqual((None, node2), next(gen2))
        self.assertEqual((node1.title, node1n1), next(gen1))
        self.assertEqual((node2.title, node2n1), next(gen2))
        self.assertEqual((node2.text, node2n2), next(gen2))
        self.assertEqual((node2.text, node2n3), next(gen2))
        self.assertRaises(StopIteration, next, gen1)
        self.assertRaises(StopIteration, next, gen2)

    def test_strip(self):
        """test Wikilink.__strip__()"""
        node = Wikilink(wraptext("foobar"))
        node2 = Wikilink(wraptext("foo"), wraptext("bar"))
        for a in (True, False):
            for b in (True, False):
                self.assertEqual("foobar", node.__strip__(a, b))
                self.assertEqual("bar", node2.__strip__(a, b))

    def test_showtree(self):
        """test Wikilink.__showtree__()"""
        output = []
        getter, marker = object(), object()
        get = lambda code: output.append((getter, code))
        mark = lambda: output.append(marker)
        node1 = Wikilink(wraptext("foobar"))
        node2 = Wikilink(wraptext("foo"), wraptext("bar"))
        node1.__showtree__(output.append, get, mark)
        node2.__showtree__(output.append, get, mark)
        valid = [
            "[[", (getter, node1.title), "]]", "[[", (getter, node2.title),
            " | ", marker, (getter, node2.text), "]]"]
        self.assertEqual(valid, output)

    def test_title(self):
        """test getter/setter for the title attribute"""
        title = wraptext("foobar")
        node1 = Wikilink(title)
        node2 = Wikilink(title, wraptext("baz"))
        self.assertIs(title, node1.title)
        self.assertIs(title, node2.title)
        node1.title = "héhehé"
        node2.title = "héhehé"
        self.assertWikicodeEqual(wraptext("héhehé"), node1.title)
        self.assertWikicodeEqual(wraptext("héhehé"), node2.title)

    def test_text(self):
        """test getter/setter for the text attribute"""
        text = wraptext("baz")
        node1 = Wikilink(wraptext("foobar"))
        node2 = Wikilink(wraptext("foobar"), text)
        self.assertIs(None, node1.text)
        self.assertIs(text, node2.text)
        node1.text = "buzz"
        node2.text = None
        self.assertWikicodeEqual(wraptext("buzz"), node1.text)
        self.assertIs(None, node2.text)

if __name__ == "__main__":
    unittest.main(verbosity=2)

+ 130
- 0
tests/tokenizer/arguments.mwtest

@@ -0,0 +1,130 @@
name: blank
label: argument with no content
input: "{{{}}}"
output: [ArgumentOpen(), ArgumentClose()]

---

name: blank_with_default
label: argument with no content but a pipe
input: "{{{|}}}"
output: [ArgumentOpen(), ArgumentSeparator(), ArgumentClose()]

---

name: basic
label: simplest type of argument
input: "{{{argument}}}"
output: [ArgumentOpen(), Text(text="argument"), ArgumentClose()]

---

name: default
label: argument with a default value
input: "{{{foo|bar}}}"
output: [ArgumentOpen(), Text(text="foo"), ArgumentSeparator(), Text(text="bar"), ArgumentClose()]

---

name: blank_with_multiple_defaults
label: no content, multiple pipes
input: "{{{|||}}}"
output: [ArgumentOpen(), ArgumentSeparator(), Text(text="||"), ArgumentClose()]

---

name: multiple_defaults
label: multiple values separated by pipes
input: "{{{foo|bar|baz}}}"
output: [ArgumentOpen(), Text(text="foo"), ArgumentSeparator(), Text(text="bar|baz"), ArgumentClose()]

---

name: newline
label: newline as only content
input: "{{{\n}}}"
output: [ArgumentOpen(), Text(text="\n"), ArgumentClose()]

---

name: right_braces
label: multiple } scattered throughout text
input: "{{{foo}b}a}r}}}"
output: [ArgumentOpen(), Text(text="foo}b}a}r"), ArgumentClose()]

---

name: right_braces_default
label: multiple } scattered throughout text, with a default value
input: "{{{foo}b}|}a}r}}}"
output: [ArgumentOpen(), Text(text="foo}b}"), ArgumentSeparator(), Text(text="}a}r"), ArgumentClose()]

---

name: nested
label: an argument nested within another argument
input: "{{{{{{foo}}}|{{{bar}}}}}}"
output: [ArgumentOpen(), ArgumentOpen(), Text(text="foo"), ArgumentClose(), ArgumentSeparator(), ArgumentOpen(), Text(text="bar"), ArgumentClose(), ArgumentClose()]

---

name: invalid_braces
label: invalid argument: multiple braces that are not part of a template or argument
input: "{{{foo{{[a}}}}}"
output: [Text(text="{{{foo{{[a}}}}}")]

---

name: incomplete_open_only
label: incomplete arguments: just an open
input: "{{{"
output: [Text(text="{{{")]

---

name: incomplete_open_text
label: incomplete arguments: an open with some text
input: "{{{foo"
output: [Text(text="{{{foo")]

---

name: incomplete_open_text_pipe
label: incomplete arguments: an open, text, then a pipe
input: "{{{foo|"
output: [Text(text="{{{foo|")]

---

name: incomplete_open_pipe
label: incomplete arguments: an open, then a pipe
input: "{{{|"
output: [Text(text="{{{|")]

---

name: incomplete_open_pipe_text
label: incomplete arguments: an open, then a pipe, then text
input: "{{{|foo"
output: [Text(text="{{{|foo")]

---

name: incomplete_open_pipes_text
label: incomplete arguments: an open, a pipe, then text, then two pipes
input: "{{{|f||"
output: [Text(text="{{{|f||")]

---

name: incomplete_open_partial_close
label: incomplete arguments: an open, then one right brace
input: "{{{{}"
output: [Text(text="{{{{}")]

---

name: incomplete_preserve_previous
label: incomplete arguments: a valid argument followed by an invalid one
input: "{{{foo}}} {{{bar"
output: [ArgumentOpen(), Text(text="foo"), ArgumentClose(), Text(text=" {{{bar")]
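
The `.mwtest` files in this commit appear to store each case as `name`, `label`, `input`, and `output` fields, with cases separated by a line containing only `---`. The sketch below reads that layout using nothing but the standard library. `load_cases` and `SAMPLE` are hypothetical names introduced for illustration, not the project's actual test loader, and the `input` and `output` values are kept as raw strings rather than evaluated.

```python
def load_cases(text):
    """Split .mwtest-style text into a list of {field: raw value} dicts."""
    cases = []
    for block in text.split("\n---\n"):
        case = {}
        for line in block.strip().splitlines():
            # Split on the first colon only, so labels may contain colons.
            key, sep, value = line.partition(":")
            if sep:
                case[key.strip()] = value.strip()
        if case:
            cases.append(case)
    return cases


# Two cases copied from arguments.mwtest (assumed layout, see above).
SAMPLE = '''name: blank
label: argument with no content
input: "{{{}}}"
output: [ArgumentOpen(), ArgumentClose()]

---

name: incomplete_open_only
label: incomplete arguments: just an open
input: "{{{"
output: [Text(text="{{{")]
'''

cases = load_cases(SAMPLE)
```

Splitting each line on the first colon keeps labels such as `incomplete arguments: just an open` intact. A real loader would also need to decode the quoted `input` string and build token objects from `output`, which this sketch leaves out.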

+ 39
- 0
tests/tokenizer/comments.mwtest

@@ -0,0 +1,39 @@
name: blank
label: a blank comment
input: "<!---->"
output: [CommentStart(), CommentEnd()]

---

name: basic
label: a basic comment
input: "<!-- comment -->"
output: [CommentStart(), Text(text=" comment "), CommentEnd()]

---

name: tons_of_nonsense
label: a comment with tons of ignorable garbage in it
input: "<!-- foo{{bar}}[[basé\n\n]{}{}{}{}]{{{{{{haha{{--a>aa<!--aa -->"
output: [CommentStart(), Text(text=" foo{{bar}}[[basé\n\n]{}{}{}{}]{{{{{{haha{{--a>aa<!--aa "), CommentEnd()]

---

name: incomplete_blank
label: a comment that doesn't close
input: "<!--"
output: [Text(text="<!--")]

---

name: incomplete_text
label: a comment that doesn't close, with text
input: "<!-- foo"
output: [Text(text="<!-- foo")]

---

name: incomplete_partial_close
label: a comment that doesn't close, with a partial close
input: "<!-- foo --\x01>"
output: [Text(text="<!-- foo --\x01>")]

+ 109
- 0
tests/tokenizer/headings.mwtest

@@ -0,0 +1,109 @@
name: level_1
label: a basic level-1 heading
input: "= Heading ="
output: [HeadingStart(level=1), Text(text=" Heading "), HeadingEnd()]

---

name: level_2
label: a basic level-2 heading
input: "== Heading =="
output: [HeadingStart(level=2), Text(text=" Heading "), HeadingEnd()]

---

name: level_3
label: a basic level-3 heading
input: "=== Heading ==="
output: [HeadingStart(level=3), Text(text=" Heading "), HeadingEnd()]

---

name: level_4
label: a basic level-4 heading
input: "==== Heading ===="
output: [HeadingStart(level=4), Text(text=" Heading "), HeadingEnd()]

---

name: level_5
label: a basic level-5 heading
input: "===== Heading ====="
output: [HeadingStart(level=5), Text(text=" Heading "), HeadingEnd()]

---

name: level_6
label: a basic level-6 heading
input: "====== Heading ======"
output: [HeadingStart(level=6), Text(text=" Heading "), HeadingEnd()]

---

name: level_7
label: a level-6 heading that pretends to be a level-7 heading
input: "======= Heading ======="
output: [HeadingStart(level=6), Text(text="= Heading ="), HeadingEnd()]

---

name: level_3_2
label: a level-2 heading that pretends to be a level-3 heading
input: "=== Heading =="
output: [HeadingStart(level=2), Text(text="= Heading "), HeadingEnd()]

---

name: level_4_6
label: a level-4 heading that pretends to be a level-6 heading
input: "==== Heading ======"
output: [HeadingStart(level=4), Text(text=" Heading =="), HeadingEnd()]

---

name: newline_before
label: a heading that starts after a newline
input: "This is some text.\n== Foobar ==\nbaz"
output: [Text(text="This is some text.\n"), HeadingStart(level=2), Text(text=" Foobar "), HeadingEnd(), Text(text="\nbaz")]

---

name: text_after
label: text on the same line after
input: "This is some text.\n== Foobar == baz"
output: [Text(text="This is some text.\n"), HeadingStart(level=2), Text(text=" Foobar "), HeadingEnd(), Text(text=" baz")]

---

name: invalid_text_before
label: invalid headings: text on the same line before
input: "This is some text. == Foobar ==\nbaz"
output: [Text(text="This is some text. == Foobar ==\nbaz")]

---

name: invalid_newline_middle
label: invalid headings: newline in the middle
input: "This is some text.\n== Foo\nbar =="
output: [Text(text="This is some text.\n== Foo\nbar ==")]

---

name: invalid_newline_end
label: invalid headings: newline at the end
input: "This is some text.\n=== Foo\n==="
output: [Text(text="This is some text.\n=== Foo\n===")]

---

name: invalid_nesting
label: invalid headings: attempts at nesting
input: "== Foo === Bar === Baz =="
output: [HeadingStart(level=2), Text(text=" Foo === Bar === Baz "), HeadingEnd()]

---

name: incomplete
label: a heading that starts but doesn't finish
input: "Foobar. \n== Heading "
output: [Text(text="Foobar. \n== Heading ")]
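
The `level_7`, `level_3_2`, and `level_4_6` cases above suggest that the expected level is the smaller of the leading and trailing `=` runs, capped at 6, with any surplus `=` left in the heading text. The sketch below illustrates that rule for a single line that both starts and ends with `=`. `heading_level` is a hypothetical helper, not the tokenizer's code, and it ignores the start-of-line check, text after the closing run, and newline handling that the other cases exercise.

```python
def heading_level(line, max_level=6):
    """Return (level, inner_text) for a one-line heading, or None."""
    if not line.strip("="):
        # Empty or made up only of '=' characters: out of scope here.
        return None
    if not (line.startswith("=") and line.endswith("=")):
        return None
    leading = len(line) - len(line.lstrip("="))
    trailing = len(line) - len(line.rstrip("="))
    level = min(leading, trailing, max_level)
    # Surplus '=' on either side stays inside the heading text.
    return level, line[level:len(line) - level]


result_mismatched = heading_level("=== Heading ==")
result_capped = heading_level("======= Heading =======")
```

Under these assumptions `result_mismatched` is `(2, "= Heading ")` and `result_capped` is `(6, "= Heading =")`, matching the `output` fields of the corresponding cases.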

+ 144
- 0
tests/tokenizer/html_entities.mwtest

@@ -0,0 +1,144 @@
name: named
label: a basic named HTML entity
input: "&nbsp;"
output: [HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd()]

---

name: numeric_decimal
label: a basic decimal HTML entity
input: "&#107;"
output: [HTMLEntityStart(), HTMLEntityNumeric(), Text(text="107"), HTMLEntityEnd()]

---

name: numeric_hexadecimal_x
label: a basic hexadecimal HTML entity, using 'x' as a signal
input: "&#x6B;"
output: [HTMLEntityStart(), HTMLEntityNumeric(), HTMLEntityHex(char="x"), Text(text="6B"), HTMLEntityEnd()]

---

name: numeric_hexadecimal_X
label: a basic hexadecimal HTML entity, using 'X' as a signal
input: "&#X6B;"
output: [HTMLEntityStart(), HTMLEntityNumeric(), HTMLEntityHex(char="X"), Text(text="6B"), HTMLEntityEnd()]

---

name: numeric_decimal_max
label: the maximum acceptable decimal numeric entity
input: "&#1114111;"
output: [HTMLEntityStart(), HTMLEntityNumeric(), Text(text="1114111"), HTMLEntityEnd()]

---

name: numeric_hex_max
label: the maximum acceptable hexadecimal numeric entity
input: "&#x10FFFF;"
output: [HTMLEntityStart(), HTMLEntityNumeric(), HTMLEntityHex(char="x"), Text(text="10FFFF"), HTMLEntityEnd()]

---

name: numeric_zeros
label: zeros accepted at the beginning of a numeric entity
input: "&#0000000107;"
output: [HTMLEntityStart(), HTMLEntityNumeric(), Text(text="0000000107"), HTMLEntityEnd()]

---

name: numeric_hex_zeros
label: zeros accepted at the beginning of a hex numeric entity
input: "&#x0000000107;"
output: [HTMLEntityStart(), HTMLEntityNumeric(), HTMLEntityHex(char="x"), Text(text="0000000107"), HTMLEntityEnd()]

---

name: invalid_named_too_long
label: a named entity that is too long
input: "&sigmaSigma;"
output: [Text(text="&sigmaSigma;")]

---

name: invalid_named_undefined
label: a named entity that doesn't exist
input: "&foobar;"
output: [Text(text="&foobar;")]

---

name: invalid_named_nonascii
label: a named entity with non-ASCII characters
input: "&sígma;"
output: [Text(text="&sígma;")]

---

name: invalid_numeric_out_of_range_1
label: a numeric entity that is out of range: < 1
input: "&#0;"
output: [Text(text="&#0;")]

---

name: invalid_numeric_out_of_range_2
label: a hex numeric entity that is out of range: < 1
input: "&#x0;"
output: [Text(text="&#x0;")]

---

name: invalid_numeric_out_of_range_3
label: a numeric entity that is out of range: > 0x10FFFF
input: "&#1114112;"
output: [Text(text="&#1114112;")]

---

name: invalid_numeric_out_of_range_4
label: a hex numeric entity that is out of range: > 0x10FFFF
input: "&#x0110000;"
output: [Text(text="&#x0110000;")]

---

name: invalid_partial_amp
label: invalid entities: just an ampersand
input: "&"
output: [Text(text="&")]

---

name: invalid_partial_amp_semicolon
label: invalid entities: an ampersand and semicolon
input: "&;"
output: [Text(text="&;")]

---

name: invalid_partial_amp_pound_semicolon
label: invalid entities: an ampersand, pound sign, and semicolon
input: "&#;"
output: [Text(text="&#;")]

---

name: invalid_partial_amp_pound_x_semicolon
label: invalid entities: an ampersand, pound sign, x, and semicolon
input: "&#x;"
output: [Text(text="&#x;")]

---

name: invalid_partial_amp_pound_numbers
label: invalid entities: an ampersand, pound sign, numbers
input: "&#123"
output: [Text(text="&#123")]

---

name: invalid_partial_amp_pound_x
label: invalid entities: an ampersand, pound sign, and x
input: "&#x"
output: [Text(text="&#x")]
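
The numeric cases above imply an accepted range of 1 through 0x10FFFF, with leading zeros allowed and either `x` or `X` marking hexadecimal. The sketch below checks a complete entity string against that range. `numeric_entity_is_valid` is a hypothetical helper written for illustration and is not how the tokenizer itself is structured.

```python
DECIMAL_DIGITS = "0123456789"
HEX_DIGITS = DECIMAL_DIGITS + "abcdefABCDEF"


def numeric_entity_is_valid(entity):
    """Check a '&#...;' or '&#x...;' string against the 1..0x10FFFF range."""
    if not (entity.startswith("&#") and entity.endswith(";")):
        return False
    digits, base, allowed = entity[2:-1], 10, DECIMAL_DIGITS
    if digits[:1] in ("x", "X"):
        digits, base, allowed = digits[1:], 16, HEX_DIGITS
    # Reject empty bodies and stray characters before calling int().
    if not digits or any(ch not in allowed for ch in digits):
        return False
    return 1 <= int(digits, base) <= 0x10FFFF


accepted = [e for e in ("&#107;", "&#X6B;", "&#0;", "&#x0110000;", "&#x;")
            if numeric_entity_is_valid(e)]
```

Checking the characters explicitly before calling `int()` avoids accepting inputs such as surrounding whitespace or underscores, which `int()` would otherwise tolerate.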

+ 46
- 0
tests/tokenizer/integration.mwtest

@@ -0,0 +1,46 @@
name: empty
label: sanity check that parsing an empty string yields nothing
input: ""
output: []

---

name: template_argument_mix
label: an ambiguous mix of templates and arguments
input: "{{{{{{{{foo}}}}}}}}{{{{{{{bar}}baz}}}buz}}"
output: [TemplateOpen(), ArgumentOpen(), ArgumentOpen(), Text(text="foo"), ArgumentClose(), ArgumentClose(), TemplateClose(), TemplateOpen(), ArgumentOpen(), TemplateOpen(), Text(text="bar"), TemplateClose(), Text(text="baz"), ArgumentClose(), Text(text="buz"), TemplateClose()]

---

name: rich_heading
label: a heading with templates/wikilinks in it
input: "== Head{{ing}} [[with]] {{{funky|{{stuf}}}}} =="
output: [HeadingStart(level=2), Text(text=" Head"), TemplateOpen(), Text(text="ing"), TemplateClose(), Text(text=" "), WikilinkOpen(), Text(text="with"), WikilinkClose(), Text(text=" "), ArgumentOpen(), Text(text="funky"), ArgumentSeparator(), TemplateOpen(), Text(text="stuf"), TemplateClose(), ArgumentClose(), Text(text=" "), HeadingEnd()]

---

name: html_entity_with_template
label: an HTML entity with a template embedded inside
input: "&n{{bs}}p;"
output: [Text(text="&n"), TemplateOpen(), Text(text="bs"), TemplateClose(), Text(text="p;")]

---

name: html_entity_with_comment
label: an HTML entity with a comment embedded inside
input: "&n<!--foo-->bsp;"
output: [Text(text="&n"), CommentStart(), Text(text="foo"), CommentEnd(), Text(text="bsp;")]

---

name: wildcard
label: a wildcard assortment of various things
input: "{{{{{{{{foo}}bar|baz=biz}}buzz}}usr|{{bin}}}}"
output: [TemplateOpen(), TemplateOpen(), TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateParamSeparator(), Text(text="baz"), TemplateParamEquals(), Text(text="biz"), TemplateClose(), Text(text="buzz"), TemplateClose(), Text(text="usr"), TemplateParamSeparator(), TemplateOpen(), Text(text="bin"), TemplateClose(), TemplateClose()]

---

name: wildcard_redux
label: an even wilder assortment of various things
input: "{{a|b|{{c|[[d]]{{{e}}}}}}}[[f|{{{g}}}<!--h-->]]{{i|j=&nbsp;}}"
output: [TemplateOpen(), Text(text="a"), TemplateParamSeparator(), Text(text="b"), TemplateParamSeparator(), TemplateOpen(), Text(text="c"), TemplateParamSeparator(), WikilinkOpen(), Text(text="d"), WikilinkClose(), ArgumentOpen(), Text(text="e"), ArgumentClose(), TemplateClose(), TemplateClose(), WikilinkOpen(), Text(text="f"), WikilinkSeparator(), ArgumentOpen(), Text(text="g"), ArgumentClose(), CommentStart(), Text(text="h"), CommentEnd(), WikilinkClose(), TemplateOpen(), Text(text="i"), TemplateParamSeparator(), Text(text="j"), TemplateParamEquals(), HTMLEntityStart(), Text(text="nbsp"), HTMLEntityEnd(), TemplateClose()]

+ 641
- 0
tests/tokenizer/templates.mwtest

@@ -0,0 +1,641 @@
name: blank
label: template with no content
input: "{{}}"
output: [TemplateOpen(), TemplateClose()]

---

name: blank_with_params
label: template with no content, but pipes and equal signs
input: "{{||=|}}"
output: [TemplateOpen(), TemplateParamSeparator(), TemplateParamSeparator(), TemplateParamEquals(), TemplateParamSeparator(), TemplateClose()]

---

name: no_params
label: simplest type of template
input: "{{template}}"
output: [TemplateOpen(), Text(text="template"), TemplateClose()]

---

name: one_param_unnamed
label: basic template with one unnamed parameter
input: "{{foo|bar}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar"), TemplateClose()]

---

name: one_param_named
label: basic template with one named parameter
input: "{{foo|bar=baz}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar"), TemplateParamEquals(), Text(text="baz"), TemplateClose()]

---

name: multiple_unnamed_params
label: basic template with multiple unnamed parameters
input: "{{foo|bar|baz|biz|buzz}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar"), TemplateParamSeparator(), Text(text="baz"), TemplateParamSeparator(), Text(text="biz"), TemplateParamSeparator(), Text(text="buzz"), TemplateClose()]

---

name: multiple_named_params
label: basic template with multiple named parameters
input: "{{foo|bar=baz|biz=buzz|buff=baff|usr=bin}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar"), TemplateParamEquals(), Text(text="baz"), TemplateParamSeparator(), Text(text="biz"), TemplateParamEquals(), Text(text="buzz"), TemplateParamSeparator(), Text(text="buff"), TemplateParamEquals(), Text(text="baff"), TemplateParamSeparator(), Text(text="usr"), TemplateParamEquals(), Text(text="bin"), TemplateClose()]

---

name: multiple_mixed_params
label: basic template with multiple unnamed/named parameters
input: "{{foo|bar=baz|biz|buzz=buff|usr|bin}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar"), TemplateParamEquals(), Text(text="baz"), TemplateParamSeparator(), Text(text="biz"), TemplateParamSeparator(), Text(text="buzz"), TemplateParamEquals(), Text(text="buff"), TemplateParamSeparator(), Text(text="usr"), TemplateParamSeparator(), Text(text="bin"), TemplateClose()]

---

name: multiple_mixed_params2
label: basic template with multiple unnamed/named parameters in another order
input: "{{foo|bar|baz|biz=buzz|buff=baff|usr=bin}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar"), TemplateParamSeparator(), Text(text="baz"), TemplateParamSeparator(), Text(text="biz"), TemplateParamEquals(), Text(text="buzz"), TemplateParamSeparator(), Text(text="buff"), TemplateParamEquals(), Text(text="baff"), TemplateParamSeparator(), Text(text="usr"), TemplateParamEquals(), Text(text="bin"), TemplateClose()]

---

name: nested_unnamed_param
label: nested template as an unnamed parameter
input: "{{foo|{{bar}}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), TemplateOpen(), Text(text="bar"), TemplateClose(), TemplateClose()]

---

name: nested_named_param_value
label: nested template as a parameter value with a named parameter
input: "{{foo|bar={{baz}}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar"), TemplateParamEquals(), TemplateOpen(), Text(text="baz"), TemplateClose(), TemplateClose()]

---

name: nested_named_param_name_and_value
label: nested templates as a parameter name and value
input: "{{foo|{{bar}}={{baz}}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), TemplateOpen(), Text(text="bar"), TemplateClose(), TemplateParamEquals(), TemplateOpen(), Text(text="baz"), TemplateClose(), TemplateClose()]

---

name: nested_name_start
label: nested template at the beginning of a template name
input: "{{{{foo}}bar}}"
output: [TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateClose()]

---

name: nested_name_start_unnamed_param
label: nested template at the beginning of a template name and as an unnamed parameter
input: "{{{{foo}}bar|{{baz}}}}"
output: [TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateParamSeparator(), TemplateOpen(), Text(text="baz"), TemplateClose(), TemplateClose()]

---

name: nested_name_start_named_param_value
label: nested template at the beginning of a template name and as a parameter value with a named parameter
input: "{{{{foo}}bar|baz={{biz}}}}"
output: [TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateParamSeparator(), Text(text="baz"), TemplateParamEquals(), TemplateOpen(), Text(text="biz"), TemplateClose(), TemplateClose()]

---

name: nested_name_start_named_param_name_and_value
label: nested template at the beginning of a template name and as a parameter name and value
input: "{{{{foo}}bar|{{baz}}={{biz}}}}"
output: [TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateParamSeparator(), TemplateOpen(), Text(text="baz"), TemplateClose(), TemplateParamEquals(), TemplateOpen(), Text(text="biz"), TemplateClose(), TemplateClose()]

---

name: nested_name_end
label: nested template at the end of a template name
input: "{{foo{{bar}}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateOpen(), Text(text="bar"), TemplateClose(), TemplateClose()]

---

name: nested_name_end_unnamed_param
label: nested template at the end of a template name and as an unnamed parameter
input: "{{foo{{bar}}|{{baz}}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateOpen(), Text(text="bar"), TemplateClose(), TemplateParamSeparator(), TemplateOpen(), Text(text="baz"), TemplateClose(), TemplateClose()]

---

name: nested_name_end_named_param_value
label: nested template at the end of a template name and as a parameter value with a named parameter
input: "{{foo{{bar}}|baz={{biz}}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateOpen(), Text(text="bar"), TemplateClose(), TemplateParamSeparator(), Text(text="baz"), TemplateParamEquals(), TemplateOpen(), Text(text="biz"), TemplateClose(), TemplateClose()]

---

name: nested_name_end_named_param_name_and_value
label: nested template at the end of a template name and as a parameter name and value
input: "{{foo{{bar}}|{{baz}}={{biz}}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateOpen(), Text(text="bar"), TemplateClose(), TemplateParamSeparator(), TemplateOpen(), Text(text="baz"), TemplateClose(), TemplateParamEquals(), TemplateOpen(), Text(text="biz"), TemplateClose(), TemplateClose()]

---

name: nested_name_mid
label: nested template in the middle of a template name
input: "{{foo{{bar}}baz}}"
output: [TemplateOpen(), Text(text="foo"), TemplateOpen(), Text(text="bar"), TemplateClose(), Text(text="baz"), TemplateClose()]

---

name: nested_name_mid_unnamed_param
label: nested template in the middle of a template name and as an unnamed parameter
input: "{{foo{{bar}}baz|{{biz}}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateOpen(), Text(text="bar"), TemplateClose(), Text(text="baz"), TemplateParamSeparator(), TemplateOpen(), Text(text="biz"), TemplateClose(), TemplateClose()]

---

name: nested_name_mid_named_param_value
label: nested template in the middle of a template name and as a parameter value with a named parameter
input: "{{foo{{bar}}baz|biz={{buzz}}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateOpen(), Text(text="bar"), TemplateClose(), Text(text="baz"), TemplateParamSeparator(), Text(text="biz"), TemplateParamEquals(), TemplateOpen(), Text(text="buzz"), TemplateClose(), TemplateClose()]

---

name: nested_name_mid_named_param_name_and_value
label: nested template in the middle of a template name and as a parameter name and value
input: "{{foo{{bar}}baz|{{biz}}={{buzz}}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateOpen(), Text(text="bar"), TemplateClose(), Text(text="baz"), TemplateParamSeparator(), TemplateOpen(), Text(text="biz"), TemplateClose(), TemplateParamEquals(), TemplateOpen(), Text(text="buzz"), TemplateClose(), TemplateClose()]

---

name: nested_name_start_end
label: nested template at the beginning and end of a template name
input: "{{{{foo}}{{bar}}}}"
output: [TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), TemplateOpen(), Text(text="bar"), TemplateClose(), TemplateClose()]

---

name: nested_name_start_end_unnamed_param
label: nested template at the beginning and end of a template name and as an unnamed parameter
input: "{{{{foo}}{{bar}}|{{baz}}}}"
output: [TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), TemplateOpen(), Text(text="bar"), TemplateClose(), TemplateParamSeparator(), TemplateOpen(), Text(text="baz"), TemplateClose(), TemplateClose()]

---

name: nested_name_start_end_named_param_value
label: nested template at the beginning and end of a template name and as a parameter value with a named parameter
input: "{{{{foo}}{{bar}}|baz={{biz}}}}"
output: [TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), TemplateOpen(), Text(text="bar"), TemplateClose(), TemplateParamSeparator(), Text(text="baz"), TemplateParamEquals(), TemplateOpen(), Text(text="biz"), TemplateClose(), TemplateClose()]

---

name: nested_name_start_end_named_param_name_and_value
label: nested template at the beginning and end of a template name and as a parameter name and value
input: "{{{{foo}}{{bar}}|{{baz}}={{biz}}}}"
output: [TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), TemplateOpen(), Text(text="bar"), TemplateClose(), TemplateParamSeparator(), TemplateOpen(), Text(text="baz"), TemplateClose(), TemplateParamEquals(), TemplateOpen(), Text(text="biz"), TemplateClose(), TemplateClose()]

---

name: nested_names_multiple
label: multiple nested templates within nested templates
input: "{{{{{{{{foo}}bar}}baz}}biz}}"
output: [TemplateOpen(), TemplateOpen(), TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateClose(), Text(text="baz"), TemplateClose(), Text(text="biz"), TemplateClose()]

---

name: nested_names_multiple_unnamed_param
label: multiple nested templates within nested templates with a nested unnamed parameter
input: "{{{{{{{{foo}}bar}}baz}}biz|{{buzz}}}}"
output: [TemplateOpen(), TemplateOpen(), TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateClose(), Text(text="baz"), TemplateClose(), Text(text="biz"), TemplateParamSeparator(), TemplateOpen(), Text(text="buzz"), TemplateClose(), TemplateClose()]

---

name: nested_names_multiple_named_param_value
label: multiple nested templates within nested templates with a nested parameter value in a named parameter
input: "{{{{{{{{foo}}bar}}baz}}biz|buzz={{bin}}}}"
output: [TemplateOpen(), TemplateOpen(), TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateClose(), Text(text="baz"), TemplateClose(), Text(text="biz"), TemplateParamSeparator(), Text(text="buzz"), TemplateParamEquals(), TemplateOpen(), Text(text="bin"), TemplateClose(), TemplateClose()]

---

name: nested_names_multiple_named_param_name_and_value
label: multiple nested templates within nested templates with a nested parameter name and value
input: "{{{{{{{{foo}}bar}}baz}}biz|{{buzz}}={{bin}}}}"
output: [TemplateOpen(), TemplateOpen(), TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateClose(), Text(text="baz"), TemplateClose(), Text(text="biz"), TemplateParamSeparator(), TemplateOpen(), Text(text="buzz"), TemplateClose(), TemplateParamEquals(), TemplateOpen(), Text(text="bin"), TemplateClose(), TemplateClose()]

---

name: mixed_nested_templates
label: mixed assortment of nested templates within template names, parameter names, and values
input: "{{{{{{{{foo}}bar|baz=biz}}buzz}}usr|{{bin}}}}"
output: [TemplateOpen(), TemplateOpen(), TemplateOpen(), TemplateOpen(), Text(text="foo"), TemplateClose(), Text(text="bar"), TemplateParamSeparator(), Text(text="baz"), TemplateParamEquals(), Text(text="biz"), TemplateClose(), Text(text="buzz"), TemplateClose(), Text(text="usr"), TemplateParamSeparator(), TemplateOpen(), Text(text="bin"), TemplateClose(), TemplateClose()]

---

name: newlines_start
label: a newline at the start of a template name
input: "{{\nfoobar}}"
output: [TemplateOpen(), Text(text="\nfoobar"), TemplateClose()]

---

name: newlines_end
label: a newline at the end of a template name
input: "{{foobar\n}}"
output: [TemplateOpen(), Text(text="foobar\n"), TemplateClose()]

---

name: newlines_start_end
label: a newline at the start and end of a template name
input: "{{\nfoobar\n}}"
output: [TemplateOpen(), Text(text="\nfoobar\n"), TemplateClose()]

---

name: newlines_mid
label: a newline in the middle of a template name
input: "{{foo\nbar}}"
output: [Text(text="{{foo\nbar}}")]

---

name: newlines_start_mid
label: a newline at the start and middle of a template name
input: "{{\nfoo\nbar}}"
output: [Text(text="{{\nfoo\nbar}}")]

---

name: newlines_mid_end
label: a newline at the middle and end of a template name
input: "{{foo\nbar\n}}"
output: [Text(text="{{foo\nbar\n}}")]

---

name: newlines_start_mid_end
label: a newline at the start, middle, and end of a template name
input: "{{\nfoo\nbar\n}}"
output: [Text(text="{{\nfoo\nbar\n}}")]

---

name: newlines_unnamed_param
label: newlines within an unnamed template parameter
input: "{{foo|\nb\nar\n}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="\nb\nar\n"), TemplateClose()]

---

name: newlines_enclose_template_name_unnamed_param
label: newlines enclosing a template name and within an unnamed template parameter
input: "{{\nfoo\n|\nb\nar\n}}"
output: [TemplateOpen(), Text(text="\nfoo\n"), TemplateParamSeparator(), Text(text="\nb\nar\n"), TemplateClose()]

---

name: newlines_within_template_name_unnamed_param
label: newlines within a template name and within an unnamed template parameter
input: "{{\nfo\no\n|\nb\nar\n}}"
output: [Text(text="{{\nfo\no\n|\nb\nar\n}}")]

---

name: newlines_enclose_template_name_named_param_value
label: newlines enclosing a template name and within a named parameter value
input: "{{\nfoo\n|1=\nb\nar\n}}"
output: [TemplateOpen(), Text(text="\nfoo\n"), TemplateParamSeparator(), Text(text="1"), TemplateParamEquals(), Text(text="\nb\nar\n"), TemplateClose()]

---

name: newlines_within_template_name_named_param_value
label: newlines within a template name and within a named parameter value
input: "{{\nf\noo\n|1=\nb\nar\n}}"
output: [Text(text="{{\nf\noo\n|1=\nb\nar\n}}")]

---

name: newlines_named_param_name
label: newlines within a parameter name
input: "{{foo|\nb\nar\n=baz}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="\nb\nar\n"), TemplateParamEquals(), Text(text="baz"), TemplateClose()]

---

name: newlines_named_param_name_param_value
label: newlines within a parameter name and within a parameter value
input: "{{foo|\nb\nar\n=\nba\nz\n}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="\nb\nar\n"), TemplateParamEquals(), Text(text="\nba\nz\n"), TemplateClose()]

---

name: newlines_enclose_template_name_named_param_name
label: newlines enclosing a template name and within a parameter name
input: "{{\nfoo\n|\nb\nar\n=baz}}"
output: [TemplateOpen(), Text(text="\nfoo\n"), TemplateParamSeparator(), Text(text="\nb\nar\n"), TemplateParamEquals(), Text(text="baz"), TemplateClose()]

---

name: newlines_enclose_template_name_named_param_name_param_value
label: newlines enclosing a template name and within a parameter name and within a parameter value
input: "{{\nfoo\n|\nb\nar\n=\nba\nz\n}}"
output: [TemplateOpen(), Text(text="\nfoo\n"), TemplateParamSeparator(), Text(text="\nb\nar\n"), TemplateParamEquals(), Text(text="\nba\nz\n"), TemplateClose()]

---

name: newlines_within_template_name_named_param_name
label: newlines within a template name and within a parameter name
input: "{{\nfo\no\n|\nb\nar\n=baz}}"
output: [Text(text="{{\nfo\no\n|\nb\nar\n=baz}}")]

---

name: newlines_within_template_name_named_param_name_param_value
label: newlines within a template name and within a parameter name and within a parameter value
input: "{{\nf\noo\n|\nb\nar\n=\nba\nz\n}}"
output: [Text(text="{{\nf\noo\n|\nb\nar\n=\nba\nz\n}}")]

---

name: newlines_wildcard
label: a random, complex assortment of templates and newlines
input: "{{\nfoo\n|\nb\nar\n=\nb\naz\n|\nb\nuz\n}}"
output: [TemplateOpen(), Text(text="\nfoo\n"), TemplateParamSeparator(), Text(text="\nb\nar\n"), TemplateParamEquals(), Text(text="\nb\naz\n"), TemplateParamSeparator(), Text(text="\nb\nuz\n"), TemplateClose()]

---

name: newlines_wildcard_redux
label: an even more random and complex assortment of templates and newlines
input: "{{\nfoo\n|\n{{\nbar\n|\nb\naz\n=\nb\niz\n}}\n=\nb\nuzz\n}}"
output: [TemplateOpen(), Text(text="\nfoo\n"), TemplateParamSeparator(), Text(text="\n"), TemplateOpen(), Text(text="\nbar\n"), TemplateParamSeparator(), Text(text="\nb\naz\n"), TemplateParamEquals(), Text(text="\nb\niz\n"), TemplateClose(), Text(text="\n"), TemplateParamEquals(), Text(text="\nb\nuzz\n"), TemplateClose()]

---

name: newlines_wildcard_redux_invalid
label: a variation of the newlines_wildcard_redux test that is invalid
input: "{{\nfoo\n|\n{{\nb\nar\n|\nb\naz\n=\nb\niz\n}}\n=\nb\nuzz\n}}"
output: [Text(text="{{\nfoo\n|\n{{\nb\nar\n|\nb\naz\n=\nb\niz\n}}\n=\nb\nuzz\n}}")]

---

name: invalid_name_left_brace_middle
label: invalid characters in template name: left brace in middle
input: "{{foo{bar}}"
output: [Text(text="{{foo{bar}}")]

---

name: invalid_name_right_brace_middle
label: invalid characters in template name: right brace in middle
input: "{{foo}bar}}"
output: [Text(text="{{foo}bar}}")]

---

name: invalid_name_left_braces
label: invalid characters in template name: two left braces in middle
input: "{{foo{b{ar}}"
output: [Text(text="{{foo{b{ar}}")]

---

name: invalid_name_left_bracket_middle
label: invalid characters in template name: left bracket in middle
input: "{{foo[bar}}"
output: [Text(text="{{foo[bar}}")]

---

name: invalid_name_right_bracket_middle
label: invalid characters in template name: right bracket in middle
input: "{{foo]bar}}"
output: [Text(text="{{foo]bar}}")]

---

name: invalid_name_left_bracket_start
label: invalid characters in template name: left bracket at start
input: "{{[foobar}}"
output: [Text(text="{{[foobar}}")]

---

name: invalid_name_right_bracket_start
label: invalid characters in template name: right bracket at end
input: "{{foobar]}}"
output: [Text(text="{{foobar]}}")]

---

name: valid_name_left_brace_start
label: valid characters in template name: left brace at start
input: "{{{foobar}}"
output: [Text(text="{"), TemplateOpen(), Text(text="foobar"), TemplateClose()]

---

name: valid_unnamed_param_left_brace
label: valid characters in unnamed template parameter: left brace
input: "{{foo|ba{r}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="ba{r"), TemplateClose()]

---

name: valid_unnamed_param_braces
label: valid characters in unnamed template parameter: left and right braces
input: "{{foo|ba{r}}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="ba{r"), TemplateClose(), Text(text="}")]

---

name: valid_param_name_braces
label: valid characters in template parameter name: left and right braces
input: "{{foo|ba{r}=baz}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="ba{r}"), TemplateParamEquals(), Text(text="baz"), TemplateClose()]

---

name: valid_param_name_brackets
label: valid characters in template parameter name: left and right brackets
input: "{{foo|ba[r]=baz}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="ba[r]"), TemplateParamEquals(), Text(text="baz"), TemplateClose()]

---

name: valid_param_name_double_left_brackets
label: valid characters in template parameter name: double left brackets
input: "{{foo|bar[[in\nvalid=baz}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar[[in\nvalid"), TemplateParamEquals(), Text(text="baz"), TemplateClose()]

---

name: valid_param_name_double_right_brackets
label: valid characters in template parameter name: double right brackets
input: "{{foo|bar]]=baz}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar]]"), TemplateParamEquals(), Text(text="baz"), TemplateClose()]

---

name: valid_param_name_double_brackets
label: valid characters in template parameter name: double left and right brackets
input: "{{foo|bar[[in\nvalid]]=baz}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar[[in\nvalid]]"), TemplateParamEquals(), Text(text="baz"), TemplateClose()]

---

name: invalid_param_name_double_left_braces
label: invalid characters in template parameter name: double left braces
input: "{{foo|bar{{in\nvalid=baz}}"
output: [Text(text="{{foo|bar{{in\nvalid=baz}}")]

---

name: invalid_param_name_double_braces
label: invalid characters in template parameter name: double left and right braces
input: "{{foo|bar{{in\nvalid}}=baz}}"
output: [TemplateOpen(), Text(text="foo"), TemplateParamSeparator(), Text(text="bar{{in\nvalid"), TemplateClose(), Text(text="=baz}}")]

---

name: incomplete_stub
label: incomplete templates that should fail gracefully: just an opening
input: "{{"
output: [Text(text="{{")]

---

name: incomplete_plain
label: incomplete templates that should fail gracefully: no close whatsoever
input: "{{stuff}} {{foobar"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foobar")]

---

name: incomplete_right_brace
label: incomplete templates that should fail gracefully: only one right brace
input: "{{stuff}} {{foobar}"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foobar}")]

---

name: incomplete_pipe
label: incomplete templates that should fail gracefully: a pipe
input: "{{stuff}} {{foobar|"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foobar|")]

---

name: incomplete_unnamed_param
label: incomplete templates that should fail gracefully: an unnamed parameter
input: "{{stuff}} {{foo|bar"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar")]

---

name: incomplete_unnamed_param_pipe
label: incomplete templates that should fail gracefully: an unnamed parameter, then a pipe
input: "{{stuff}} {{foo|bar|"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar|")]

---

name: incomplete_valueless_param
label: incomplete templates that should fail gracefully: a named parameter with no value
input: "{{stuff}} {{foo|bar="
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar=")]

---

name: incomplete_valueless_param_pipe
label: incomplete templates that should fail gracefully: a named parameter with no value, then a pipe
input: "{{stuff}} {{foo|bar=|"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar=|")]

---

name: incomplete_named_param
label: incomplete templates that should fail gracefully: a named parameter with a value
input: "{{stuff}} {{foo|bar=baz"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar=baz")]

---

name: incomplete_named_param_pipe
label: incomplete templates that should fail gracefully: a named parameter with a value, then a pipe
input: "{{stuff}} {{foo|bar=baz|"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar=baz|")]

---

name: incomplete_two_unnamed_params
label: incomplete templates that should fail gracefully: two unnamed parameters
input: "{{stuff}} {{foo|bar|baz"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar|baz")]

---

name: incomplete_unnamed_param_valueless_param
label: incomplete templates that should fail gracefully: an unnamed parameter, then a named parameter with no value
input: "{{stuff}} {{foo|bar|baz="
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar|baz=")]

---

name: incomplete_unnamed_param_named_param
label: incomplete templates that should fail gracefully: an unnamed parameter, then a named parameter with a value
input: "{{stuff}} {{foo|bar|baz=biz"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar|baz=biz")]

---

name: incomplete_named_param_unnamed_param
label: incomplete templates that should fail gracefully: a named parameter with a value, then an unnamed parameter
input: "{{stuff}} {{foo|bar=baz|biz"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar=baz|biz")]

---

name: incomplete_named_param_valueless_param
label: incomplete templates that should fail gracefully: a named parameter with a value, then a named parameter with no value
input: "{{stuff}} {{foo|bar=baz|biz="
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar=baz|biz=")]

---

name: incomplete_two_named_params
label: incomplete templates that should fail gracefully: two named parameters with values
input: "{{stuff}} {{foo|bar=baz|biz=buzz"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar=baz|biz=buzz")]

---

name: incomplete_nested_template_as_unnamed_param
label: incomplete templates that should fail gracefully: a valid nested template as an unnamed parameter
input: "{{stuff}} {{foo|{{bar}}"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|"), TemplateOpen(), Text(text="bar"), TemplateClose()]

---

name: incomplete_nested_template_as_param_value
label: incomplete templates that should fail gracefully: a valid nested template as a parameter value
input: "{{stuff}} {{foo|bar={{baz}}"
output: [TemplateOpen(), Text(text="stuff"), TemplateClose(), Text(text=" {{foo|bar="), TemplateOpen(), Text(text="baz"), TemplateClose()]

---

name: recursion_five_hundred_opens
label: test potentially dangerous recursion: five hundred template openings, without spaces
input: "{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{"
output: [Text(text="{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{")]

---

name: recursion_one_hundred_opens
label: test potentially dangerous recursion: one hundred template openings, with spaces
input: "{{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{"
output: [Text(text="{{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{")]

---

name: recursion_opens_and_closes
label: test potentially dangerous recursion: template openings and closings
input: "{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}"
output: [Text(text="{{|"), TemplateOpen(), TemplateClose(), Text(text="{{|"), TemplateOpen(), TemplateClose(), TemplateOpen(), TemplateParamSeparator(), TemplateOpen(), TemplateClose(), Text(text="{{"), TemplateParamSeparator(), Text(text="{{"), TemplateClose(), Text(text="{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}{{|{{}}")]

+ 25
- 0
tests/tokenizer/text.mwtest

@@ -0,0 +1,25 @@
name: basic
label: sanity check for basic text parsing, no gimmicks
input: "foobar"
output: [Text(text="foobar")]

---

name: newlines
label: slightly more complex text parsing, with newlines
input: "This is a line of text.\nThis is another line of text.\nThis is another."
output: [Text(text="This is a line of text.\nThis is another line of text.\nThis is another.")]

---

name: unicode
label: ensure unicode data is handled properly
input: "Thís ís å sëñtënce with diœcritiçs."
output: [Text(text="Thís ís å sëñtënce with diœcritiçs.")]

---

name: unicode2
label: additional unicode check for non-BMP codepoints
input: "𐌲𐌿𐍄𐌰𐍂𐌰𐌶𐌳𐌰"
output: [Text(text="𐌲𐌿𐍄𐌰𐍂𐌰𐌶𐌳𐌰")]

+ 158
- 0
tests/tokenizer/wikilinks.mwtest

@@ -0,0 +1,158 @@
name: blank
label: wikilink with no content
input: "[[]]"
output: [WikilinkOpen(), WikilinkClose()]

---

name: blank_with_text
label: wikilink with no content but a pipe
input: "[[|]]"
output: [WikilinkOpen(), WikilinkSeparator(), WikilinkClose()]

---

name: basic
label: simplest type of wikilink
input: "[[wikilink]]"
output: [WikilinkOpen(), Text(text="wikilink"), WikilinkClose()]

---

name: with_text
label: wikilink with a text value
input: "[[foo|bar]]"
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="bar"), WikilinkClose()]

---

name: blank_with_multiple_texts
label: no content, multiple pipes
input: "[[|||]]"
output: [WikilinkOpen(), WikilinkSeparator(), Text(text="||"), WikilinkClose()]

---

name: multiple_texts
label: multiple text values separated by pipes
input: "[[foo|bar|baz]]"
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="bar|baz"), WikilinkClose()]

---

name: nested
label: a wikilink nested within the value of another
input: "[[foo|[[bar]]]]"
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), WikilinkOpen(), Text(text="bar"), WikilinkClose(), WikilinkClose()]

---

name: nested_with_text
label: a wikilink nested within the value of another, separated by other data
input: "[[foo|a[[b]]c]]"
output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="a"), WikilinkOpen(), Text(text="b"), WikilinkClose(), Text(text="c"), WikilinkClose()]

---

name: invalid_newline
label: invalid wikilink: newline as only content
input: "[[\n]]"
output: [Text(text="[[\n]]")]

---

name: invalid_right_brace
label: invalid wikilink: right brace
input: "[[foo}b}a}r]]"
output: [Text(text="[[foo}b}a}r]]")]

---

name: invalid_left_brace
label: invalid wikilink: left brace
input: "[[foo{{[a}}]]"
output: [Text(text="[[foo{{[a}}]]")]

---

name: invalid_right_bracket
label: invalid wikilink: right bracket
input: "[[foo]bar]]"
output: [Text(text="[[foo]bar]]")]

---

name: invalid_left_bracket
label: invalid wikilink: left bracket
input: "[[foo[bar]]"
output: [Text(text="[[foo[bar]]")]

---

name: invalid_nested
label: invalid wikilink: trying to nest in the wrong context
input: "[[foo[[bar]]]]"
output: [Text(text="[[foo"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), Text(text="]]")]

---

name: invalid_nested_text
label: invalid wikilink: trying to nest in the wrong context, with a text param
input: "[[foo[[bar]]|baz]]"
output: [Text(text="[[foo"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), Text(text="|baz]]")]

---

name: incomplete_open_only
label: incomplete wikilinks: just an open
input: "[["
output: [Text(text="[[")]

---

name: incomplete_open_text
label: incomplete wikilinks: an open with some text
input: "[[foo"
output: [Text(text="[[foo")]

---

name: incomplete_open_text_pipe
label: incomplete wikilinks: an open, text, then a pipe
input: "[[foo|"
output: [Text(text="[[foo|")]

---

name: incomplete_open_pipe
label: incomplete wikilinks: an open, then a pipe
input: "[[|"
output: [Text(text="[[|")]

---

name: incomplete_open_pipe_text
label: incomplete wikilinks: an open, then a pipe, then text
input: "[[|foo"
output: [Text(text="[[|foo")]

---

name: incomplete_open_pipes_text
label: incomplete wikilinks: an open, a pipe, then text, then two pipes
input: "[[|f||"
output: [Text(text="[[|f||")]

---

name: incomplete_open_partial_close
label: incomplete wikilinks: an open, then a left brace and one right brace
input: "[[{}"
output: [Text(text="[[{}")]

---

name: incomplete_preserve_previous
label: incomplete wikilinks: a valid wikilink followed by an invalid one
input: "[[foo]] [[bar"
output: [WikilinkOpen(), Text(text="foo"), WikilinkClose(), Text(text=" [[bar")]
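
The cases above share one layout: `name`, `label`, `input`, and `output` fields, with `---` on its own line between cases. As a rough illustration of how such a file could be read, here is a minimal sketch using only the standard library. It is not the project's actual test loader; `parse_mwtest` and `SAMPLE` are hypothetical names, and the sketch assumes that `input` is always a double-quoted string literal and that `output` can stay an unparsed string.

```python
import ast


def parse_mwtest(source):
    """Split .mwtest-style text into a list of dicts, one per test case.

    Hypothetical helper for illustration; assumes "key: value" lines and
    cases separated by a line containing only "---".
    """
    cases = []
    for block in source.split("\n---\n"):
        case = {}
        for line in block.strip().splitlines():
            # Split on the first colon only, so labels may contain colons.
            key, sep, value = line.partition(":")
            if not sep:
                continue  # blank or malformed line
            case[key.strip()] = value.strip()
        if "input" in case:
            # The input is a quoted literal; this decodes escapes like \n.
            case["input"] = ast.literal_eval(case["input"])
        if case:
            cases.append(case)
    return cases


# Two cases copied from the file above (raw string keeps the \n escape).
SAMPLE = r'''name: basic
label: simplest type of wikilink
input: "[[wikilink]]"
output: [WikilinkOpen(), Text(text="wikilink"), WikilinkClose()]

---

name: invalid_newline
label: invalid wikilink: newline as only content
input: "[[\n]]"
output: [Text(text="[[\n]]")]
'''

cases = parse_mwtest(SAMPLE)
print([case["name"] for case in cases])  # ['basic', 'invalid_newline']
```

The sketch leaves the `output` field as text. Turning it into token objects would need the token classes from the parser package, which is outside the scope of this illustration.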
