@@ -1,3 +1,25 @@ | |||
v0.5 (released June 23, 2017): | |||
- Added Wikicode.contains() to determine whether a Node or Wikicode object is | |||
contained within another Wikicode object. | |||
- Added Wikicode.get_ancestors() and Wikicode.get_parent() to find all | |||
ancestors and the direct parent of a Node, respectively. | |||
- Fixed a long-standing performance issue with deeply nested, invalid syntax | |||
(issue #42). The parser should be much faster on certain complex pages. The | |||
"max cycle" restriction has also been removed, so some situations where | |||
templates at the end of a page were being skipped are now resolved. | |||
- Made Template.remove(keep_field=True) behave more reasonably when the | |||
parameter is already empty. | |||
- Added the keep_template_params argument to Wikicode.strip_code(). If True, | |||
then template parameters will be preserved in the output. | |||
- Wikicode objects can now be pickled properly (fixed infinite recursion error | |||
on incompletely-constructed StringMixIn subclasses). | |||
- Fixed Wikicode.matches()'s behavior on iterables besides lists and tuples. | |||
- Fixed len() sometimes raising ValueError on empty node lists. | |||
- Fixed a rare parsing bug involving self-closing tags inside the attributes of | |||
unpaired tags. | |||
- Fixed release script after changes to PyPI. | |||
v0.4.4 (released December 30, 2016): | |||
- Added support for Python 3.6. | |||
@@ -1,4 +1,4 @@ | |||
Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Permission is hereby granted, free of charge, to any person obtaining a copy | |||
of this software and associated documentation files (the "Software"), to deal | |||
@@ -113,23 +113,49 @@ saving the page!) by calling ``str()`` on it:: | |||
Likewise, use ``unicode(code)`` in Python 2. | |||
Caveats | |||
Limitations | |||
----------- | |||
While the MediaWiki parser generates HTML and has access to the contents of | |||
templates, among other things, mwparserfromhell acts as a direct interface to | |||
the source code only. This has several implications: | |||
* Syntax elements produced by a template transclusion cannot be detected. For | |||
example, imagine a hypothetical page ``"Template:End-bold"`` that contained | |||
the text ``</b>``. While MediaWiki would correctly understand that | |||
``<b>foobar{{end-bold}}`` translates to ``<b>foobar</b>``, mwparserfromhell | |||
has no way of examining the contents of ``{{end-bold}}``. Instead, it would | |||
treat the bold tag as unfinished, possibly extending further down the page. | |||
* Templates adjacent to external links, as in ``http://example.com{{foo}}``, | |||
are considered part of the link. In reality, this would depend on the | |||
contents of the template. | |||
* When different syntax elements cross over each other, as in | |||
``{{echo|''Hello}}, world!''``, the parser gets confused because this cannot | |||
be represented by an ordinary syntax tree. Instead, the parser will treat the | |||
first syntax construct as plain text. In this case, only the italic tag would | |||
be properly parsed. | |||
**Workaround:** Since this issue commonly occurs with text formatting, and
formatting is often not of interest to users, you may pass
*skip_style_tags=True* to ``mwparserfromhell.parse()``. This treats ``''``
and ``'''`` as plain text.
A future version of mwparserfromhell may include multiple parsing modes to | |||
get around this restriction more sensibly. | |||
Additionally, the parser lacks awareness of certain wiki-specific settings: | |||
An inherent limitation in wikicode prevents us from generating complete parse | |||
trees in certain cases. For example, the string ``{{echo|''Hello}}, world!''`` | |||
produces the valid output ``<i>Hello, world!</i>`` in MediaWiki, assuming | |||
``{{echo}}`` is a template that returns its first parameter. But since | |||
representing this in mwparserfromhell's node tree would be impossible, we | |||
compromise by treating the first node (i.e., the template) as plain text, | |||
parsing only the italics. | |||
* `Word-ending links`_ are not supported, since the linktrail rules are | |||
language-specific. | |||
The current workaround for cases where you are not interested in text | |||
formatting is to pass ``skip_style_tags=True`` to ``mwparserfromhell.parse()``. | |||
This treats ``''`` and ``'''`` like plain text. | |||
* Localized namespace names aren't recognized, so file links (such as | |||
``[[File:...]]``) are treated as regular wikilinks. | |||
A future version of mwparserfromhell will include multiple parsing modes to get | |||
around this restriction. | |||
* Anything that looks like an XML tag is treated as a tag, even if it is not a | |||
recognized tag name, since the list of valid tags depends on loaded MediaWiki | |||
extensions. | |||
Integration | |||
----------- | |||
@@ -174,6 +200,7 @@ Python 3 code (via the API_):: | |||
.. _GitHub: https://github.com/earwig/mwparserfromhell | |||
.. _Python Package Index: http://pypi.python.org | |||
.. _get pip: http://pypi.python.org/pypi/pip | |||
.. _Word-ending links: https://www.mediawiki.org/wiki/Help:Links#linktrail | |||
.. _EarwigBot: https://github.com/earwig/earwigbot | |||
.. _Pywikibot: https://www.mediawiki.org/wiki/Manual:Pywikibot | |||
.. _API: http://mediawiki.org/wiki/API |
@@ -1,6 +1,6 @@ | |||
# This config file is used by appveyor.com to build Windows release binaries | |||
version: 0.4.4-b{build} | |||
version: 0.5-b{build} | |||
branches: | |||
only: | |||
@@ -52,6 +52,14 @@ environment: | |||
PYTHON_VERSION: "3.5" | |||
PYTHON_ARCH: "64" | |||
- PYTHON: "C:\\Python36" | |||
PYTHON_VERSION: "3.6" | |||
PYTHON_ARCH: "32" | |||
- PYTHON: "C:\\Python36-x64" | |||
PYTHON_VERSION: "3.6" | |||
PYTHON_ARCH: "64" | |||
install: | |||
- "%PIP% install --disable-pip-version-check --user --upgrade pip" | |||
- "%PIP% install wheel twine" | |||
@@ -1,17 +0,0 @@ | |||
Caveats | |||
======= | |||
An inherent limitation in wikicode prevents us from generating complete parse | |||
trees in certain cases. For example, the string ``{{echo|''Hello}}, world!''`` | |||
produces the valid output ``<i>Hello, world!</i>`` in MediaWiki, assuming | |||
``{{echo}}`` is a template that returns its first parameter. But since | |||
representing this in mwparserfromhell's node tree would be impossible, we | |||
compromise by treating the first node (i.e., the template) as plain text, | |||
parsing only the italics. | |||
The current workaround for cases where you are not interested in text | |||
formatting is to pass *skip_style_tags=True* to :func:`mwparserfromhell.parse`. | |||
This treats ``''`` and ``'''`` like plain text. | |||
A future version of mwparserfromhell will include multiple parsing modes to get | |||
around this restriction. |
@@ -1,6 +1,36 @@ | |||
Changelog | |||
========= | |||
v0.5 | |||
---- | |||
`Released June 23, 2017 <https://github.com/earwig/mwparserfromhell/tree/v0.5>`_ | |||
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.4.4...v0.5>`__): | |||
- Added :meth:`.Wikicode.contains` to determine whether a :class:`.Node` or | |||
:class:`.Wikicode` object is contained within another :class:`.Wikicode` | |||
object. | |||
- Added :meth:`.Wikicode.get_ancestors` and :meth:`.Wikicode.get_parent` to
find all ancestors and the direct parent of a :class:`.Node`, respectively
(see the example at the end of this list).
- Fixed a long-standing performance issue with deeply nested, invalid syntax | |||
(`issue #42 <https://github.com/earwig/mwparserfromhell/issues/42>`_). The | |||
parser should be much faster on certain complex pages. The "max cycle" | |||
restriction has also been removed, so some situations where templates at the | |||
end of a page were being skipped are now resolved. | |||
- Made :meth:`Template.remove(keep_field=True) <.Template.remove>` behave more | |||
reasonably when the parameter is already empty. | |||
- Added the *keep_template_params* argument to :meth:`.Wikicode.strip_code`. | |||
If *True*, then template parameters will be preserved in the output. | |||
- :class:`.Wikicode` objects can now be pickled properly (fixed infinite | |||
recursion error on incompletely-constructed :class:`.StringMixIn` | |||
subclasses). | |||
- Fixed :meth:`.Wikicode.matches`\ 's behavior on iterables besides lists and | |||
tuples. | |||
- Fixed ``len()`` sometimes raising ``ValueError`` on empty node lists. | |||
- Fixed a rare parsing bug involving self-closing tags inside the attributes of | |||
unpaired tags. | |||
- Fixed release script after changes to PyPI. | |||
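The following is a quick sketch of the new traversal helpers and the
*keep_template_params* flag; the wikitext is made up purely for illustration::

    import mwparserfromhell

    code = mwparserfromhell.parse("{{foo|this is a [[bar|link]]}}")
    link = code.filter_wikilinks()[0]

    code.contains(link)          # True: the link lives somewhere inside code
    code.get_parent(link)        # the {{foo}} template node holding the link
    code.get_ancestors(link)     # [that same template] -- all nodes above it
    code.strip_code()                            # "" (templates are dropped)
    code.strip_code(keep_template_params=True)   # "this is a link"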
v0.4.4 | |||
------ | |||
@@ -31,7 +61,7 @@ v0.4.3 | |||
v0.4.2 | |||
------ | |||
`Released July 30, 2015 <https://github.com/earwig/mwparserfromhell/tree/v0.4.2>`_ | |||
`Released July 30, 2015 <https://github.com/earwig/mwparserfromhell/tree/v0.4.2>`__ | |||
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.4.1...v0.4.2>`__): | |||
- Fixed setup script not including header files in releases. | |||
@@ -40,7 +70,7 @@ v0.4.2 | |||
v0.4.1 | |||
------ | |||
`Released July 30, 2015 <https://github.com/earwig/mwparserfromhell/tree/v0.4.1>`_ | |||
`Released July 30, 2015 <https://github.com/earwig/mwparserfromhell/tree/v0.4.1>`__ | |||
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.4...v0.4.1>`__): | |||
- The process for building Windows binaries has been fixed, and these should be | |||
@@ -42,7 +42,7 @@ master_doc = 'index' | |||
# General information about the project. | |||
project = u'mwparserfromhell' | |||
copyright = u'2012, 2013, 2014, 2015, 2016 Ben Kurtovic' | |||
copyright = u'2012, 2013, 2014, 2015, 2016, 2017 Ben Kurtovic' | |||
# The version info for the project you're documenting, acts as replacement for | |||
# |version| and |release|, also used in various other places throughout the | |||
@@ -40,7 +40,7 @@ Contents | |||
:maxdepth: 2 | |||
usage | |||
caveats | |||
limitations | |||
integration | |||
changelog | |||
API Reference <api/modules> | |||
@@ -0,0 +1,45 @@ | |||
Limitations | |||
=========== | |||
While the MediaWiki parser generates HTML and has access to the contents of | |||
templates, among other things, mwparserfromhell acts as a direct interface to | |||
the source code only. This has several implications: | |||
* Syntax elements produced by a template transclusion cannot be detected. For | |||
example, imagine a hypothetical page ``"Template:End-bold"`` that contained | |||
the text ``</b>``. While MediaWiki would correctly understand that | |||
``<b>foobar{{end-bold}}`` translates to ``<b>foobar</b>``, mwparserfromhell | |||
has no way of examining the contents of ``{{end-bold}}``. Instead, it would | |||
treat the bold tag as unfinished, possibly extending further down the page. | |||
* Templates adjacent to external links, as in ``http://example.com{{foo}}``, | |||
are considered part of the link. In reality, this would depend on the | |||
contents of the template. | |||
* When different syntax elements cross over each other, as in | |||
``{{echo|''Hello}}, world!''``, the parser gets confused because this cannot | |||
be represented by an ordinary syntax tree. Instead, the parser will treat the | |||
first syntax construct as plain text. In this case, only the italic tag would | |||
be properly parsed. | |||
**Workaround:** Since this issue commonly occurs with text formatting, and
formatting is often not of interest to users, you may pass
*skip_style_tags=True* to ``mwparserfromhell.parse()``. This treats ``''``
and ``'''`` as plain text.
A future version of mwparserfromhell may include multiple parsing modes to | |||
get around this restriction more sensibly. | |||
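For example, here is a minimal sketch of the workaround, reusing the
hypothetical ``{{echo}}`` template from above::

    import mwparserfromhell

    text = "{{echo|''Hello}}, world!''"

    # Default mode: the template is treated as plain text and only the
    # italics are parsed, so no Template node is produced.
    mwparserfromhell.parse(text).filter_templates()        # []

    # With skip_style_tags=True, '' and ''' stay plain text, so the
    # {{echo|''Hello}} template is parsed normally instead.
    mwparserfromhell.parse(text, skip_style_tags=True).filter_templates()
    # [the {{echo}} template]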
Additionally, the parser lacks awareness of certain wiki-specific settings: | |||
* `Word-ending links`_ are not supported, since the linktrail rules are | |||
language-specific. | |||
* Localized namespace names aren't recognized, so file links (such as
``[[File:...]]``) are treated as regular wikilinks (see the example below).
* Anything that looks like an XML tag is treated as a tag, even if it is not a | |||
recognized tag name, since the list of valid tags depends on loaded MediaWiki | |||
extensions. | |||
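For example, a small sketch of the last two points; the tag and file names
are arbitrary::

    import mwparserfromhell

    code = mwparserfromhell.parse("[[File:Example.png|thumb|caption]] <foo>bar</foo>")

    # The file link is returned as an ordinary wikilink, not a special node:
    code.filter_wikilinks()[0].title    # "File:Example.png"

    # <foo> is not a standard MediaWiki tag, but it still parses as a Tag:
    code.filter_tags()[0].tag           # "foo"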
.. _Word-ending links: https://www.mediawiki.org/wiki/Help:Links#linktrail |
@@ -1,6 +1,6 @@ | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
# Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy | |||
# of this software and associated documentation files (the "Software"), to deal | |||
@@ -29,7 +29,7 @@ outrageously powerful parser for `MediaWiki <http://mediawiki.org>`_ wikicode. | |||
__author__ = "Ben Kurtovic" | |||
__copyright__ = "Copyright (C) 2012, 2013, 2014, 2015, 2016 Ben Kurtovic" | |||
__license__ = "MIT License" | |||
__version__ = "0.4.4" | |||
__version__ = "0.5" | |||
__email__ = "ben.kurtovic@gmail.com" | |||
from . import (compat, definitions, nodes, parser, smart_list, string_mixin, | |||
@@ -58,7 +58,7 @@ class Node(StringMixIn): | |||
return | |||
yield # pragma: no cover (this is a generator that yields nothing) | |||
def __strip__(self, normalize, collapse): | |||
def __strip__(self, **kwargs): | |||
return None | |||
def __showtree__(self, write, get, mark): | |||
@@ -47,9 +47,9 @@ class Argument(Node): | |||
if self.default is not None: | |||
yield self.default | |||
def __strip__(self, normalize, collapse): | |||
def __strip__(self, **kwargs): | |||
if self.default is not None: | |||
return self.default.strip_code(normalize, collapse) | |||
return self.default.strip_code(**kwargs) | |||
return None | |||
def __showtree__(self, write, get, mark): | |||
@@ -49,12 +49,12 @@ class ExternalLink(Node): | |||
if self.title is not None: | |||
yield self.title | |||
def __strip__(self, normalize, collapse): | |||
def __strip__(self, **kwargs): | |||
if self.brackets: | |||
if self.title: | |||
return self.title.strip_code(normalize, collapse) | |||
return self.title.strip_code(**kwargs) | |||
return None | |||
return self.url.strip_code(normalize, collapse) | |||
return self.url.strip_code(**kwargs) | |||
def __showtree__(self, write, get, mark): | |||
if self.brackets: | |||
@@ -42,8 +42,8 @@ class Heading(Node): | |||
def __children__(self): | |||
yield self.title | |||
def __strip__(self, normalize, collapse): | |||
return self.title.strip_code(normalize, collapse) | |||
def __strip__(self, **kwargs): | |||
return self.title.strip_code(**kwargs) | |||
def __showtree__(self, write, get, mark): | |||
write("=" * self.level) | |||
@@ -58,8 +58,8 @@ class HTMLEntity(Node): | |||
return "&#{0}{1};".format(self.hex_char, self.value) | |||
return "&#{0};".format(self.value) | |||
def __strip__(self, normalize, collapse): | |||
if normalize: | |||
def __strip__(self, **kwargs): | |||
if kwargs.get("normalize"): | |||
return self.normalize() | |||
return self | |||
@@ -98,9 +98,9 @@ class Tag(Node): | |||
if not self.self_closing and not self.wiki_markup and self.closing_tag: | |||
yield self.closing_tag | |||
def __strip__(self, normalize, collapse): | |||
def __strip__(self, **kwargs): | |||
if self.contents and is_visible(self.tag): | |||
return self.contents.strip_code(normalize, collapse) | |||
return self.contents.strip_code(**kwargs) | |||
return None | |||
def __showtree__(self, write, get, mark): | |||
@@ -1,6 +1,6 @@ | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
# Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy | |||
# of this software and associated documentation files (the "Software"), to deal | |||
@@ -58,6 +58,12 @@ class Template(Node): | |||
yield param.name | |||
yield param.value | |||
def __strip__(self, **kwargs): | |||
if kwargs.get("keep_template_params"): | |||
parts = [param.value.strip_code(**kwargs) for param in self.params] | |||
return " ".join(part for part in parts if part) | |||
return None | |||
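# Example of the new behavior (hypothetical input, for illustration only):
# parsing "{{foo|bar|baz}}" and calling strip_code() still yields "", while
# strip_code(keep_template_params=True) yields "bar baz", i.e. the parameter
# values stripped and joined by single spaces.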
def __showtree__(self, write, get, mark): | |||
write("{{") | |||
get(self.name) | |||
@@ -70,7 +76,8 @@ class Template(Node): | |||
get(param.value) | |||
write("}}") | |||
def _surface_escape(self, code, char): | |||
@staticmethod | |||
def _surface_escape(code, char): | |||
"""Return *code* with *char* escaped as an HTML entity. | |||
The main use of this is to escape pipes (``|``) or equal signs (``=``) | |||
@@ -82,7 +89,8 @@ class Template(Node): | |||
if char in node: | |||
code.replace(node, node.replace(char, replacement), False) | |||
def _select_theory(self, theories): | |||
@staticmethod | |||
def _select_theory(theories): | |||
"""Return the most likely spacing convention given different options. | |||
Given a dictionary of convention options as keys and their occurrence | |||
@@ -96,6 +104,22 @@ class Template(Node): | |||
if confidence >= 0.75: | |||
return tuple(theories.keys())[values.index(best)] | |||
@staticmethod | |||
def _blank_param_value(value): | |||
"""Remove the content from *value* while keeping its whitespace. | |||
Replace *value*\ 's nodes with two text nodes, the first containing | |||
whitespace from before its content and the second containing whitespace | |||
from after its content. | |||
""" | |||
sval = str(value) | |||
if sval.isspace(): | |||
before, after = "", sval | |||
else: | |||
match = re.search(r"^(\s*).*?(\s*)$", sval, FLAGS) | |||
before, after = match.group(1), match.group(2) | |||
value.nodes = [Text(before), Text(after)] | |||
def _get_spacing_conventions(self, use_names): | |||
"""Try to determine the whitespace conventions for parameters. | |||
@@ -112,6 +136,11 @@ class Template(Node): | |||
component = str(param.value) | |||
match = re.search(r"^(\s*).*?(\s*)$", component, FLAGS) | |||
before, after = match.group(1), match.group(2) | |||
if not use_names and component.isspace() and "\n" in before: | |||
# If the value is empty, we expect newlines in the whitespace | |||
# to be after the content, not before it: | |||
before, after = before.split("\n", 1) | |||
after = "\n" + after | |||
before_theories[before] += 1 | |||
after_theories[after] += 1 | |||
@@ -119,16 +148,6 @@ class Template(Node): | |||
after = self._select_theory(after_theories) | |||
return before, after | |||
def _blank_param_value(self, value): | |||
"""Remove the content from *value* while keeping its whitespace. | |||
Replace *value*\ 's nodes with two text nodes, the first containing | |||
whitespace from before its content and the second containing whitespace | |||
from after its content. | |||
""" | |||
match = re.search(r"^(\s*).*?(\s*)$", str(value), FLAGS) | |||
value.nodes = [Text(match.group(1)), Text(match.group(2))] | |||
def _fix_dependendent_params(self, i): | |||
"""Unhide keys if necessary after removing the param at index *i*.""" | |||
if not self.params[i].showkey: | |||
@@ -37,7 +37,7 @@ class Text(Node): | |||
def __unicode__(self): | |||
return self.value | |||
def __strip__(self, normalize, collapse): | |||
def __strip__(self, **kwargs): | |||
return self | |||
def __showtree__(self, write, get, mark): | |||
@@ -46,10 +46,10 @@ class Wikilink(Node): | |||
if self.text is not None: | |||
yield self.text | |||
def __strip__(self, normalize, collapse): | |||
def __strip__(self, **kwargs): | |||
if self.text is not None: | |||
return self.text.strip_code(normalize, collapse) | |||
return self.title.strip_code(normalize, collapse) | |||
return self.text.strip_code(**kwargs) | |||
return self.title.strip_code(**kwargs) | |||
def __showtree__(self, write, get, mark): | |||
write("[[") | |||
@@ -1,6 +1,6 @@ | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
# Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy | |||
# of this software and associated documentation files (the "Software"), to deal | |||
@@ -100,6 +100,8 @@ Local (stack-specific) contexts: | |||
* :const:`TABLE_TH_LINE` | |||
* :const:`TABLE_CELL_LINE_CONTEXTS` | |||
* :const:`HTML_ENTITY` | |||
Global contexts: | |||
* :const:`GL_HEADING` | |||
@@ -176,6 +178,8 @@ TABLE_CELL_LINE_CONTEXTS = TABLE_TD_LINE + TABLE_TH_LINE + TABLE_CELL_STYLE | |||
TABLE = (TABLE_OPEN + TABLE_CELL_OPEN + TABLE_CELL_STYLE + TABLE_ROW_OPEN + | |||
TABLE_TD_LINE + TABLE_TH_LINE) | |||
HTML_ENTITY = 1 << 37 | |||
# Global contexts: | |||
GL_HEADING = 1 << 0 | |||
@@ -0,0 +1,795 @@ | |||
/* | |||
* avl_tree.c - intrusive, nonrecursive AVL tree data structure (self-balancing | |||
* binary search tree), implementation file | |||
* | |||
* Written in 2014-2016 by Eric Biggers <ebiggers3@gmail.com> | |||
* Slight changes for compatibility by Ben Kurtovic <ben.kurtovic@gmail.com> | |||
* | |||
* To the extent possible under law, the author(s) have dedicated all copyright | |||
* and related and neighboring rights to this software to the public domain | |||
* worldwide via the Creative Commons Zero 1.0 Universal Public Domain | |||
* Dedication (the "CC0"). | |||
* | |||
* This software is distributed in the hope that it will be useful, but WITHOUT | |||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS | |||
* FOR A PARTICULAR PURPOSE. See the CC0 for more details. | |||
* | |||
* You should have received a copy of the CC0 along with this software; if not | |||
* see <http://creativecommons.org/publicdomain/zero/1.0/>. | |||
*/ | |||
#define false 0 | |||
#define true 1 | |||
typedef int bool; | |||
#include "avl_tree.h" | |||
/* Returns the left child (sign < 0) or the right child (sign > 0) of the | |||
* specified AVL tree node. | |||
* Note: for all calls of this, 'sign' is constant at compilation time, | |||
* so the compiler can remove the conditional. */ | |||
static AVL_INLINE struct avl_tree_node * | |||
avl_get_child(const struct avl_tree_node *parent, int sign) | |||
{ | |||
if (sign < 0) | |||
return parent->left; | |||
else | |||
return parent->right; | |||
} | |||
static AVL_INLINE struct avl_tree_node * | |||
avl_tree_first_or_last_in_order(const struct avl_tree_node *root, int sign) | |||
{ | |||
const struct avl_tree_node *first = root; | |||
if (first) | |||
while (avl_get_child(first, +sign)) | |||
first = avl_get_child(first, +sign); | |||
return (struct avl_tree_node *)first; | |||
} | |||
/* Starts an in-order traversal of the tree: returns the least-valued node, or | |||
* NULL if the tree is empty. */ | |||
struct avl_tree_node * | |||
avl_tree_first_in_order(const struct avl_tree_node *root) | |||
{ | |||
return avl_tree_first_or_last_in_order(root, -1); | |||
} | |||
/* Starts a *reverse* in-order traversal of the tree: returns the | |||
* greatest-valued node, or NULL if the tree is empty. */ | |||
struct avl_tree_node * | |||
avl_tree_last_in_order(const struct avl_tree_node *root) | |||
{ | |||
return avl_tree_first_or_last_in_order(root, 1); | |||
} | |||
static AVL_INLINE struct avl_tree_node * | |||
avl_tree_next_or_prev_in_order(const struct avl_tree_node *node, int sign) | |||
{ | |||
const struct avl_tree_node *next; | |||
if (avl_get_child(node, +sign)) | |||
for (next = avl_get_child(node, +sign); | |||
avl_get_child(next, -sign); | |||
next = avl_get_child(next, -sign)) | |||
; | |||
else | |||
for (next = avl_get_parent(node); | |||
next && node == avl_get_child(next, +sign); | |||
node = next, next = avl_get_parent(next)) | |||
; | |||
return (struct avl_tree_node *)next; | |||
} | |||
/* Continues an in-order traversal of the tree: returns the next-greatest-valued | |||
* node, or NULL if there is none. */ | |||
struct avl_tree_node * | |||
avl_tree_next_in_order(const struct avl_tree_node *node) | |||
{ | |||
return avl_tree_next_or_prev_in_order(node, 1); | |||
} | |||
/* Continues a *reverse* in-order traversal of the tree: returns the | |||
* previous-greatest-valued node, or NULL if there is none. */ | |||
struct avl_tree_node * | |||
avl_tree_prev_in_order(const struct avl_tree_node *node) | |||
{ | |||
return avl_tree_next_or_prev_in_order(node, -1); | |||
} | |||
/* Starts a postorder traversal of the tree. */ | |||
struct avl_tree_node * | |||
avl_tree_first_in_postorder(const struct avl_tree_node *root) | |||
{ | |||
const struct avl_tree_node *first = root; | |||
if (first) | |||
while (first->left || first->right) | |||
first = first->left ? first->left : first->right; | |||
return (struct avl_tree_node *)first; | |||
} | |||
/* Continues a postorder traversal of the tree. @prev will not be dereferenced as
* it's allowed that its memory has been freed; @prev_parent must be its saved | |||
* parent node. Returns NULL if there are no more nodes (i.e. @prev was the | |||
* root of the tree). */ | |||
struct avl_tree_node * | |||
avl_tree_next_in_postorder(const struct avl_tree_node *prev, | |||
const struct avl_tree_node *prev_parent) | |||
{ | |||
const struct avl_tree_node *next = prev_parent; | |||
if (next && prev == next->left && next->right) | |||
for (next = next->right; | |||
next->left || next->right; | |||
next = next->left ? next->left : next->right) | |||
; | |||
return (struct avl_tree_node *)next; | |||
} | |||
/* Sets the left child (sign < 0) or the right child (sign > 0) of the | |||
* specified AVL tree node. | |||
* Note: for all calls of this, 'sign' is constant at compilation time, | |||
* so the compiler can remove the conditional. */ | |||
static AVL_INLINE void | |||
avl_set_child(struct avl_tree_node *parent, int sign, | |||
struct avl_tree_node *child) | |||
{ | |||
if (sign < 0) | |||
parent->left = child; | |||
else | |||
parent->right = child; | |||
} | |||
/* Sets the parent and balance factor of the specified AVL tree node. */ | |||
static AVL_INLINE void | |||
avl_set_parent_balance(struct avl_tree_node *node, struct avl_tree_node *parent, | |||
int balance_factor) | |||
{ | |||
node->parent_balance = (uintptr_t)parent | (balance_factor + 1); | |||
} | |||
/* Sets the parent of the specified AVL tree node. */ | |||
static AVL_INLINE void | |||
avl_set_parent(struct avl_tree_node *node, struct avl_tree_node *parent) | |||
{ | |||
node->parent_balance = (uintptr_t)parent | (node->parent_balance & 3); | |||
} | |||
/* Returns the balance factor of the specified AVL tree node --- that is, the | |||
* height of its right subtree minus the height of its left subtree. */ | |||
static AVL_INLINE int | |||
avl_get_balance_factor(const struct avl_tree_node *node) | |||
{ | |||
return (int)(node->parent_balance & 3) - 1; | |||
} | |||
/* Adds @amount to the balance factor of the specified AVL tree node. | |||
* The caller must ensure this still results in a valid balance factor | |||
* (-1, 0, or 1). */ | |||
static AVL_INLINE void | |||
avl_adjust_balance_factor(struct avl_tree_node *node, int amount) | |||
{ | |||
node->parent_balance += amount; | |||
} | |||
static AVL_INLINE void | |||
avl_replace_child(struct avl_tree_node **root_ptr, | |||
struct avl_tree_node *parent, | |||
struct avl_tree_node *old_child, | |||
struct avl_tree_node *new_child) | |||
{ | |||
if (parent) { | |||
if (old_child == parent->left) | |||
parent->left = new_child; | |||
else | |||
parent->right = new_child; | |||
} else { | |||
*root_ptr = new_child; | |||
} | |||
} | |||
/* | |||
* Template for performing a single rotation --- | |||
* | |||
* sign > 0: Rotate clockwise (right) rooted at A: | |||
* | |||
* P? P? | |||
* | | | |||
* A B | |||
* / \ / \ | |||
* B C? => D? A | |||
* / \ / \ | |||
* D? E? E? C? | |||
* | |||
* (nodes marked with ? may not exist) | |||
* | |||
* sign < 0: Rotate counterclockwise (left) rooted at A: | |||
* | |||
* P? P? | |||
* | | | |||
* A B | |||
* / \ / \ | |||
* C? B => A D? | |||
* / \ / \ | |||
* E? D? C? E? | |||
* | |||
* This updates pointers but not balance factors! | |||
*/ | |||
static AVL_INLINE void | |||
avl_rotate(struct avl_tree_node ** const root_ptr, | |||
struct avl_tree_node * const A, const int sign) | |||
{ | |||
struct avl_tree_node * const B = avl_get_child(A, -sign); | |||
struct avl_tree_node * const E = avl_get_child(B, +sign); | |||
struct avl_tree_node * const P = avl_get_parent(A); | |||
avl_set_child(A, -sign, E); | |||
avl_set_parent(A, B); | |||
avl_set_child(B, +sign, A); | |||
avl_set_parent(B, P); | |||
if (E) | |||
avl_set_parent(E, A); | |||
avl_replace_child(root_ptr, P, A, B); | |||
} | |||
/* | |||
* Template for performing a double rotation --- | |||
* | |||
* sign > 0: Rotate counterclockwise (left) rooted at B, then | |||
* clockwise (right) rooted at A: | |||
* | |||
* P? P? P? | |||
* | | | | |||
* A A E | |||
* / \ / \ / \ | |||
* B C? => E C? => B A | |||
* / \ / \ / \ / \ | |||
* D? E B G? D? F?G? C? | |||
* / \ / \ | |||
* F? G? D? F? | |||
* | |||
* (nodes marked with ? may not exist) | |||
* | |||
* sign < 0: Rotate clockwise (right) rooted at B, then | |||
* counterclockwise (left) rooted at A: | |||
* | |||
* P? P? P? | |||
* | | | | |||
* A A E | |||
* / \ / \ / \ | |||
* C? B => C? E => A B | |||
* / \ / \ / \ / \ | |||
* E D? G? B C? G?F? D? | |||
* / \ / \ | |||
* G? F? F? D? | |||
* | |||
* Returns a pointer to E and updates balance factors. Except for those | |||
* two things, this function is equivalent to: | |||
* avl_rotate(root_ptr, B, -sign); | |||
* avl_rotate(root_ptr, A, +sign); | |||
* | |||
* See comment in avl_handle_subtree_growth() for explanation of balance | |||
* factor updates. | |||
*/ | |||
static AVL_INLINE struct avl_tree_node * | |||
avl_do_double_rotate(struct avl_tree_node ** const root_ptr, | |||
struct avl_tree_node * const B, | |||
struct avl_tree_node * const A, const int sign) | |||
{ | |||
struct avl_tree_node * const E = avl_get_child(B, +sign); | |||
struct avl_tree_node * const F = avl_get_child(E, -sign); | |||
struct avl_tree_node * const G = avl_get_child(E, +sign); | |||
struct avl_tree_node * const P = avl_get_parent(A); | |||
const int e = avl_get_balance_factor(E); | |||
avl_set_child(A, -sign, G); | |||
avl_set_parent_balance(A, E, ((sign * e >= 0) ? 0 : -e)); | |||
avl_set_child(B, +sign, F); | |||
avl_set_parent_balance(B, E, ((sign * e <= 0) ? 0 : -e)); | |||
avl_set_child(E, +sign, A); | |||
avl_set_child(E, -sign, B); | |||
avl_set_parent_balance(E, P, 0); | |||
if (G) | |||
avl_set_parent(G, A); | |||
if (F) | |||
avl_set_parent(F, B); | |||
avl_replace_child(root_ptr, P, A, E); | |||
return E; | |||
} | |||
/* | |||
* This function handles the growth of a subtree due to an insertion. | |||
* | |||
* @root_ptr | |||
* Location of the tree's root pointer. | |||
* | |||
* @node | |||
* A subtree that has increased in height by 1 due to an insertion. | |||
* | |||
* @parent | |||
* Parent of @node; must not be NULL. | |||
* | |||
* @sign | |||
* -1 if @node is the left child of @parent; | |||
* +1 if @node is the right child of @parent. | |||
* | |||
* This function will adjust @parent's balance factor, then do a (single | |||
* or double) rotation if necessary. The return value will be %true if | |||
* the full AVL tree is now adequately balanced, or %false if the subtree | |||
* rooted at @parent is now adequately balanced but has increased in | |||
* height by 1, so the caller should continue up the tree. | |||
* | |||
* Note that if %false is returned, no rotation will have been done. | |||
* Indeed, a single node insertion cannot require that more than one | |||
* (single or double) rotation be done. | |||
*/ | |||
static AVL_INLINE bool | |||
avl_handle_subtree_growth(struct avl_tree_node ** const root_ptr, | |||
struct avl_tree_node * const node, | |||
struct avl_tree_node * const parent, | |||
const int sign) | |||
{ | |||
int old_balance_factor, new_balance_factor; | |||
old_balance_factor = avl_get_balance_factor(parent); | |||
if (old_balance_factor == 0) { | |||
avl_adjust_balance_factor(parent, sign); | |||
/* @parent is still sufficiently balanced (-1 or +1 | |||
* balance factor), but must have increased in height. | |||
* Continue up the tree. */ | |||
return false; | |||
} | |||
new_balance_factor = old_balance_factor + sign; | |||
if (new_balance_factor == 0) { | |||
avl_adjust_balance_factor(parent, sign); | |||
/* @parent is now perfectly balanced (0 balance factor). | |||
* It cannot have increased in height, so there is | |||
* nothing more to do. */ | |||
return true; | |||
} | |||
/* @parent is too left-heavy (new_balance_factor == -2) or | |||
* too right-heavy (new_balance_factor == +2). */ | |||
/* Test whether @node is left-heavy (-1 balance factor) or | |||
* right-heavy (+1 balance factor). | |||
* Note that it cannot be perfectly balanced (0 balance factor) | |||
* because here we are under the invariant that @node has | |||
* increased in height due to the insertion. */ | |||
if (sign * avl_get_balance_factor(node) > 0) { | |||
/* @node (B below) is heavy in the same direction @parent | |||
* (A below) is heavy. | |||
* | |||
* @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |||
* The comment, diagram, and equations below assume sign < 0. | |||
* The other case is symmetric! | |||
* @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |||
* | |||
* Do a clockwise rotation rooted at @parent (A below): | |||
* | |||
* A B | |||
* / \ / \ | |||
* B C? => D A | |||
* / \ / \ / \ | |||
* D E? F? G?E? C? | |||
* / \ | |||
* F? G? | |||
* | |||
* Before the rotation: | |||
* balance(A) = -2 | |||
* balance(B) = -1 | |||
* Let x = height(C). Then: | |||
* height(B) = x + 2 | |||
* height(D) = x + 1 | |||
* height(E) = x | |||
* max(height(F), height(G)) = x. | |||
* | |||
* After the rotation: | |||
* height(D) = max(height(F), height(G)) + 1 | |||
* = x + 1 | |||
* height(A) = max(height(E), height(C)) + 1 | |||
* = max(x, x) + 1 = x + 1 | |||
* balance(B) = 0 | |||
* balance(A) = 0 | |||
*/ | |||
avl_rotate(root_ptr, parent, -sign); | |||
/* Equivalent to setting @parent's balance factor to 0. */ | |||
avl_adjust_balance_factor(parent, -sign); /* A */ | |||
/* Equivalent to setting @node's balance factor to 0. */ | |||
avl_adjust_balance_factor(node, -sign); /* B */ | |||
} else { | |||
/* @node (B below) is heavy in the direction opposite | |||
* from the direction @parent (A below) is heavy. | |||
* | |||
* @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |||
* The comment, diagram, and equations below assume sign < 0. | |||
* The other case is symmetric! | |||
* @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |||
* | |||
* Do a counterclockwise rotation rooted at @node (B below),
* then a clockwise rotation rooted at @parent (A below): | |||
* | |||
* A A E | |||
* / \ / \ / \ | |||
* B C? => E C? => B A | |||
* / \ / \ / \ / \ | |||
* D? E B G? D? F?G? C? | |||
* / \ / \ | |||
* F? G? D? F? | |||
* | |||
* Before the rotation: | |||
* balance(A) = -2 | |||
* balance(B) = +1 | |||
* Let x = height(C). Then: | |||
* height(B) = x + 2 | |||
* height(E) = x + 1 | |||
* height(D) = x | |||
* max(height(F), height(G)) = x | |||
* | |||
* After both rotations: | |||
* height(A) = max(height(G), height(C)) + 1 | |||
* = x + 1 | |||
* balance(A) = balance(E{orig}) >= 0 ? 0 : -balance(E{orig}) | |||
* height(B) = max(height(D), height(F)) + 1 | |||
* = x + 1 | |||
* balance(B) = balance(E{orig}) <= 0 ? 0 : -balance(E{orig})
* | |||
* height(E) = x + 2 | |||
* balance(E) = 0 | |||
*/ | |||
avl_do_double_rotate(root_ptr, node, parent, -sign); | |||
} | |||
/* Height after rotation is unchanged; nothing more to do. */ | |||
return true; | |||
} | |||
/* Rebalance the tree after insertion of the specified node. */ | |||
void | |||
avl_tree_rebalance_after_insert(struct avl_tree_node **root_ptr, | |||
struct avl_tree_node *inserted) | |||
{ | |||
struct avl_tree_node *node, *parent; | |||
bool done; | |||
inserted->left = NULL; | |||
inserted->right = NULL; | |||
node = inserted; | |||
/* Adjust balance factor of new node's parent. | |||
* No rotation will need to be done at this level. */ | |||
parent = avl_get_parent(node); | |||
if (!parent) | |||
return; | |||
if (node == parent->left) | |||
avl_adjust_balance_factor(parent, -1); | |||
else | |||
avl_adjust_balance_factor(parent, +1); | |||
if (avl_get_balance_factor(parent) == 0) | |||
/* @parent did not change in height. Nothing more to do. */ | |||
return; | |||
/* The subtree rooted at @parent increased in height by 1. */ | |||
do { | |||
/* Adjust balance factor of next ancestor. */ | |||
node = parent; | |||
parent = avl_get_parent(node); | |||
if (!parent) | |||
return; | |||
/* The subtree rooted at @node has increased in height by 1. */ | |||
if (node == parent->left) | |||
done = avl_handle_subtree_growth(root_ptr, node, | |||
parent, -1); | |||
else | |||
done = avl_handle_subtree_growth(root_ptr, node, | |||
parent, +1); | |||
} while (!done); | |||
} | |||
/* | |||
* This function handles the shrinkage of a subtree due to a deletion. | |||
* | |||
* @root_ptr | |||
* Location of the tree's root pointer. | |||
* | |||
* @parent | |||
* A node in the tree, exactly one of whose subtrees has decreased | |||
* in height by 1 due to a deletion. (This includes the case where | |||
* one of the child pointers has become NULL, since we can consider | |||
* the "NULL" subtree to have a height of 0.) | |||
* | |||
* @sign | |||
* +1 if the left subtree of @parent has decreased in height by 1; | |||
* -1 if the right subtree of @parent has decreased in height by 1. | |||
* | |||
* @left_deleted_ret | |||
* If the return value is not NULL, this will be set to %true if the | |||
* left subtree of the returned node has decreased in height by 1, | |||
* or %false if the right subtree of the returned node has decreased | |||
* in height by 1. | |||
* | |||
* This function will adjust @parent's balance factor, then do a (single | |||
* or double) rotation if necessary. The return value will be NULL if | |||
* the full AVL tree is now adequately balanced, or a pointer to the | |||
* parent of @parent if @parent is now adequately balanced but has | |||
* decreased in height by 1. Also in the latter case, *left_deleted_ret | |||
* will be set. | |||
*/ | |||
static AVL_INLINE struct avl_tree_node * | |||
avl_handle_subtree_shrink(struct avl_tree_node ** const root_ptr, | |||
struct avl_tree_node *parent, | |||
const int sign, | |||
bool * const left_deleted_ret) | |||
{ | |||
struct avl_tree_node *node; | |||
int old_balance_factor, new_balance_factor; | |||
old_balance_factor = avl_get_balance_factor(parent); | |||
if (old_balance_factor == 0) { | |||
/* Prior to the deletion, the subtree rooted at | |||
* @parent was perfectly balanced. It's now | |||
* unbalanced by 1, but that's okay and its height | |||
* hasn't changed. Nothing more to do. */ | |||
avl_adjust_balance_factor(parent, sign); | |||
return NULL; | |||
} | |||
new_balance_factor = old_balance_factor + sign; | |||
if (new_balance_factor == 0) { | |||
/* The subtree rooted at @parent is now perfectly | |||
* balanced, whereas before the deletion it was | |||
* unbalanced by 1. Its height must have decreased | |||
* by 1. No rotation is needed at this location, | |||
* but continue up the tree. */ | |||
avl_adjust_balance_factor(parent, sign); | |||
node = parent; | |||
} else { | |||
/* @parent is too left-heavy (new_balance_factor == -2) or | |||
* too right-heavy (new_balance_factor == +2). */ | |||
node = avl_get_child(parent, sign); | |||
/* The rotations below are similar to those done during | |||
* insertion (see avl_handle_subtree_growth()), so full | |||
* comments are not provided. The only new case is the | |||
* one where @node has a balance factor of 0, and that is | |||
* commented. */ | |||
if (sign * avl_get_balance_factor(node) >= 0) { | |||
avl_rotate(root_ptr, parent, -sign); | |||
if (avl_get_balance_factor(node) == 0) { | |||
/* | |||
* @node (B below) is perfectly balanced. | |||
* | |||
* @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |||
* The comment, diagram, and equations | |||
* below assume sign < 0. The other case | |||
* is symmetric! | |||
* @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | |||
* | |||
* Do a clockwise rotation rooted at | |||
* @parent (A below): | |||
* | |||
* A B | |||
* / \ / \ | |||
* B C? => D A | |||
* / \ / \ / \ | |||
* D E F? G?E C? | |||
* / \ | |||
* F? G? | |||
* | |||
* Before the rotation: | |||
* balance(A) = -2 | |||
* balance(B) = 0 | |||
* Let x = height(C). Then: | |||
* height(B) = x + 2 | |||
* height(D) = x + 1 | |||
* height(E) = x + 1 | |||
* max(height(F), height(G)) = x. | |||
* | |||
* After the rotation: | |||
* height(D) = max(height(F), height(G)) + 1 | |||
* = x + 1 | |||
* height(A) = max(height(E), height(C)) + 1 | |||
* = max(x + 1, x) + 1 = x + 2 | |||
* balance(A) = -1 | |||
* balance(B) = +1 | |||
*/ | |||
/* A: -2 => -1 (sign < 0) | |||
* or +2 => +1 (sign > 0) | |||
* No change needed --- that's the same as | |||
* old_balance_factor. */ | |||
/* B: 0 => +1 (sign < 0) | |||
* or 0 => -1 (sign > 0) */ | |||
avl_adjust_balance_factor(node, -sign); | |||
/* Height is unchanged; nothing more to do. */ | |||
return NULL; | |||
} else { | |||
avl_adjust_balance_factor(parent, -sign); | |||
avl_adjust_balance_factor(node, -sign); | |||
} | |||
} else { | |||
node = avl_do_double_rotate(root_ptr, node, | |||
parent, -sign); | |||
} | |||
} | |||
parent = avl_get_parent(node); | |||
if (parent) | |||
*left_deleted_ret = (node == parent->left); | |||
return parent; | |||
} | |||
/* Swaps node X, which must have 2 children, with its in-order successor, then | |||
* unlinks node X. Returns the parent of X just before unlinking, without its | |||
* balance factor having been updated to account for the unlink. */ | |||
static AVL_INLINE struct avl_tree_node * | |||
avl_tree_swap_with_successor(struct avl_tree_node **root_ptr, | |||
struct avl_tree_node *X, | |||
bool *left_deleted_ret) | |||
{ | |||
struct avl_tree_node *Y, *ret; | |||
Y = X->right; | |||
if (!Y->left) { | |||
/* | |||
* P? P? P? | |||
* | | | | |||
* X Y Y | |||
* / \ / \ / \ | |||
* A Y => A X => A B? | |||
* / \ / \ | |||
* (0) B? (0) B? | |||
* | |||
* [ X unlinked, Y returned ] | |||
*/ | |||
ret = Y; | |||
*left_deleted_ret = false; | |||
} else { | |||
struct avl_tree_node *Q; | |||
do { | |||
Q = Y; | |||
Y = Y->left; | |||
} while (Y->left); | |||
/* | |||
* P? P? P? | |||
* | | | | |||
* X Y Y | |||
* / \ / \ / \ | |||
* A ... => A ... => A ... | |||
* | | | | |||
* Q Q Q | |||
* / / / | |||
* Y X B? | |||
* / \ / \ | |||
* (0) B? (0) B? | |||
* | |||
* | |||
* [ X unlinked, Q returned ] | |||
*/ | |||
Q->left = Y->right; | |||
if (Q->left) | |||
avl_set_parent(Q->left, Q); | |||
Y->right = X->right; | |||
avl_set_parent(X->right, Y); | |||
ret = Q; | |||
*left_deleted_ret = true; | |||
} | |||
Y->left = X->left; | |||
avl_set_parent(X->left, Y); | |||
Y->parent_balance = X->parent_balance; | |||
avl_replace_child(root_ptr, avl_get_parent(X), X, Y); | |||
return ret; | |||
} | |||
/* | |||
* Removes an item from the specified AVL tree. | |||
* | |||
* @root_ptr | |||
* Location of the AVL tree's root pointer. Indirection is needed | |||
* because the root node may change if the tree needed to be rebalanced | |||
* because of the deletion or if @node was the root node. | |||
* | |||
* @node | |||
* Pointer to the `struct avl_tree_node' embedded in the item to | |||
* remove from the tree. | |||
* | |||
* Note: This function *only* removes the node and rebalances the tree. | |||
* It does not free any memory, nor does it do the equivalent of | |||
* avl_tree_node_set_unlinked(). | |||
*/ | |||
void | |||
avl_tree_remove(struct avl_tree_node **root_ptr, struct avl_tree_node *node) | |||
{ | |||
struct avl_tree_node *parent; | |||
bool left_deleted = false; | |||
if (node->left && node->right) { | |||
/* @node is fully internal, with two children. Swap it | |||
* with its in-order successor (which must exist in the | |||
* right subtree of @node and can have, at most, a right | |||
* child), then unlink @node. */ | |||
parent = avl_tree_swap_with_successor(root_ptr, node, | |||
&left_deleted); | |||
/* @parent is now the parent of what was @node's in-order | |||
* successor. It cannot be NULL, since @node itself was | |||
* an ancestor of its in-order successor. | |||
* @left_deleted has been set to %true if @node's | |||
* in-order successor was the left child of @parent, | |||
* otherwise %false. */ | |||
} else { | |||
struct avl_tree_node *child; | |||
/* @node is missing at least one child. Unlink it. Set | |||
* @parent to @node's parent, and set @left_deleted to | |||
* reflect which child of @parent @node was. Or, if | |||
* @node was the root node, simply update the root node | |||
* and return. */ | |||
child = node->left ? node->left : node->right; | |||
parent = avl_get_parent(node); | |||
if (parent) { | |||
if (node == parent->left) { | |||
parent->left = child; | |||
left_deleted = true; | |||
} else { | |||
parent->right = child; | |||
left_deleted = false; | |||
} | |||
if (child) | |||
avl_set_parent(child, parent); | |||
} else { | |||
if (child) | |||
avl_set_parent(child, parent); | |||
*root_ptr = child; | |||
return; | |||
} | |||
} | |||
/* Rebalance the tree. */ | |||
do { | |||
if (left_deleted) | |||
parent = avl_handle_subtree_shrink(root_ptr, parent, | |||
+1, &left_deleted); | |||
else | |||
parent = avl_handle_subtree_shrink(root_ptr, parent, | |||
-1, &left_deleted); | |||
} while (parent); | |||
} |
@@ -0,0 +1,363 @@ | |||
/* | |||
* avl_tree.h - intrusive, nonrecursive AVL tree data structure (self-balancing | |||
* binary search tree), header file | |||
* | |||
* Written in 2014-2016 by Eric Biggers <ebiggers3@gmail.com> | |||
* Slight changes for compatibility by Ben Kurtovic <ben.kurtovic@gmail.com> | |||
* | |||
* To the extent possible under law, the author(s) have dedicated all copyright | |||
* and related and neighboring rights to this software to the public domain | |||
* worldwide via the Creative Commons Zero 1.0 Universal Public Domain | |||
* Dedication (the "CC0"). | |||
* | |||
* This software is distributed in the hope that it will be useful, but WITHOUT | |||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS | |||
* FOR A PARTICULAR PURPOSE. See the CC0 for more details. | |||
* | |||
* You should have received a copy of the CC0 along with this software; if not | |||
* see <http://creativecommons.org/publicdomain/zero/1.0/>. | |||
*/ | |||
#ifndef _AVL_TREE_H_ | |||
#define _AVL_TREE_H_ | |||
#include <stddef.h> | |||
#if !defined(_MSC_VER) || (_MSC_VER >= 1600) | |||
#include <stdint.h> | |||
#endif | |||
#ifdef __GNUC__ | |||
# define AVL_INLINE inline __attribute__((always_inline)) | |||
#elif defined(_MSC_VER) && (_MSC_VER < 1900) | |||
# define AVL_INLINE __inline | |||
#else | |||
# define AVL_INLINE inline | |||
#endif | |||
/* Node in an AVL tree. Embed this in some other data structure. */ | |||
struct avl_tree_node { | |||
/* Pointer to left child or NULL */ | |||
struct avl_tree_node *left; | |||
/* Pointer to right child or NULL */ | |||
struct avl_tree_node *right; | |||
/* Pointer to parent combined with the balance factor. This saves 4 or | |||
* 8 bytes of memory depending on the CPU architecture. | |||
* | |||
* Low 2 bits: One greater than the balance factor of this subtree, | |||
* which is equal to height(right) - height(left). The mapping is: | |||
* | |||
* 00 => -1 | |||
* 01 => 0 | |||
* 10 => +1 | |||
* 11 => undefined | |||
* | |||
* The rest of the bits are the pointer to the parent node. It must be | |||
* 4-byte aligned, and it will be NULL if this is the root node and | |||
* therefore has no parent. */ | |||
uintptr_t parent_balance; | |||
}; | |||
/* Cast an AVL tree node to the containing data structure. */ | |||
#define avl_tree_entry(entry, type, member) \ | |||
((type*) ((char *)(entry) - offsetof(type, member))) | |||
/* Returns a pointer to the parent of the specified AVL tree node, or NULL if it | |||
* is already the root of the tree. */ | |||
static AVL_INLINE struct avl_tree_node * | |||
avl_get_parent(const struct avl_tree_node *node) | |||
{ | |||
return (struct avl_tree_node *)(node->parent_balance & ~3); | |||
} | |||
/* Marks the specified AVL tree node as unlinked from any tree. */ | |||
static AVL_INLINE void | |||
avl_tree_node_set_unlinked(struct avl_tree_node *node) | |||
{ | |||
node->parent_balance = (uintptr_t)node; | |||
} | |||
/* Returns true iff the specified AVL tree node has been marked with | |||
* avl_tree_node_set_unlinked() and has not subsequently been inserted into a | |||
* tree. */ | |||
static AVL_INLINE int | |||
avl_tree_node_is_unlinked(const struct avl_tree_node *node) | |||
{ | |||
return node->parent_balance == (uintptr_t)node; | |||
} | |||
/* (Internal use only) */ | |||
extern void | |||
avl_tree_rebalance_after_insert(struct avl_tree_node **root_ptr, | |||
struct avl_tree_node *inserted); | |||
/* | |||
* Looks up an item in the specified AVL tree. | |||
* | |||
* @root | |||
* Pointer to the root of the AVL tree. (This can be NULL --- that just | |||
* means the tree is empty.) | |||
* | |||
* @cmp_ctx | |||
* First argument to pass to the comparison callback. This generally | |||
* should be a pointer to an object equal to the one being searched for. | |||
* | |||
* @cmp | |||
* Comparison callback. Must return < 0, 0, or > 0 if the first argument | |||
* is less than, equal to, or greater than the second argument, | |||
* respectively. The first argument will be @cmp_ctx and the second | |||
* argument will be a pointer to the AVL tree node of an item in the tree. | |||
* | |||
* Returns a pointer to the AVL tree node of the resulting item, or NULL if the | |||
* item was not found. | |||
* | |||
* Example: | |||
* | |||
* struct int_wrapper { | |||
* int data; | |||
* struct avl_tree_node index_node; | |||
* }; | |||
* | |||
* static int _avl_cmp_int_to_node(const void *intptr, | |||
* const struct avl_tree_node *nodeptr) | |||
* { | |||
* int n1 = *(const int *)intptr; | |||
* int n2 = avl_tree_entry(nodeptr, struct int_wrapper, index_node)->data; | |||
* if (n1 < n2) | |||
* return -1; | |||
* else if (n1 > n2) | |||
* return 1; | |||
* else | |||
* return 0; | |||
* } | |||
* | |||
* bool contains_int(struct avl_tree_node *root, int n) | |||
* { | |||
* struct avl_tree_node *result; | |||
* | |||
* result = avl_tree_lookup(root, &n, _avl_cmp_int_to_node); | |||
* return result ? true : false; | |||
* } | |||
*/ | |||
static AVL_INLINE struct avl_tree_node * | |||
avl_tree_lookup(const struct avl_tree_node *root, | |||
const void *cmp_ctx, | |||
int (*cmp)(const void *, const struct avl_tree_node *)) | |||
{ | |||
const struct avl_tree_node *cur = root; | |||
while (cur) { | |||
int res = (*cmp)(cmp_ctx, cur); | |||
if (res < 0) | |||
cur = cur->left; | |||
else if (res > 0) | |||
cur = cur->right; | |||
else | |||
break; | |||
} | |||
return (struct avl_tree_node*)cur; | |||
} | |||
/* Same as avl_tree_lookup(), but uses a more specific type for the comparison | |||
* function. Specifically, with this function the item being searched for is | |||
* expected to be in the same format as those already in the tree, with an | |||
* embedded 'struct avl_tree_node'. */ | |||
static AVL_INLINE struct avl_tree_node * | |||
avl_tree_lookup_node(const struct avl_tree_node *root, | |||
const struct avl_tree_node *node, | |||
int (*cmp)(const struct avl_tree_node *, | |||
const struct avl_tree_node *)) | |||
{ | |||
const struct avl_tree_node *cur = root; | |||
while (cur) { | |||
int res = (*cmp)(node, cur); | |||
if (res < 0) | |||
cur = cur->left; | |||
else if (res > 0) | |||
cur = cur->right; | |||
else | |||
break; | |||
} | |||
return (struct avl_tree_node*)cur; | |||
} | |||
/* | |||
* Inserts an item into the specified AVL tree. | |||
* | |||
* @root_ptr | |||
* Location of the AVL tree's root pointer. Indirection is needed because | |||
* the root node may change as a result of rotations caused by the | |||
* insertion. Initialize *root_ptr to NULL for an empty tree. | |||
* | |||
* @item | |||
* Pointer to the `struct avl_tree_node' embedded in the item to insert. | |||
* No members in it need be pre-initialized, although members in the | |||
* containing structure should be pre-initialized so that @cmp can use them | |||
* in comparisons. | |||
* | |||
* @cmp | |||
* Comparison callback. Must return < 0, 0, or > 0 if the first argument | |||
* is less than, equal to, or greater than the second argument, | |||
* respectively. The first argument will be @item and the second | |||
* argument will be a pointer to an AVL tree node embedded in some | |||
* previously-inserted item to which @item is being compared. | |||
* | |||
* If no item in the tree is comparatively equal (via @cmp) to @item, inserts | |||
* @item and returns NULL. Otherwise does nothing and returns a pointer to the | |||
* AVL tree node embedded in the previously-inserted item which compared equal | |||
* to @item. | |||
* | |||
* Example: | |||
* | |||
* struct int_wrapper { | |||
* int data; | |||
* struct avl_tree_node index_node; | |||
* }; | |||
* | |||
* #define GET_DATA(i) avl_tree_entry((i), struct int_wrapper, index_node)->data | |||
* | |||
* static int _avl_cmp_ints(const struct avl_tree_node *node1, | |||
* const struct avl_tree_node *node2) | |||
* { | |||
* int n1 = GET_DATA(node1); | |||
* int n2 = GET_DATA(node2); | |||
* if (n1 < n2) | |||
* return -1; | |||
* else if (n1 > n2) | |||
* return 1; | |||
* else | |||
* return 0; | |||
* } | |||
* | |||
* bool insert_int(struct avl_tree_node **root_ptr, int data) | |||
* { | |||
* struct int_wrapper *i = malloc(sizeof(struct int_wrapper)); | |||
* i->data = data; | |||
* if (avl_tree_insert(root_ptr, &i->index_node, _avl_cmp_ints)) { | |||
* // Duplicate. | |||
* free(i); | |||
* return false; | |||
* } | |||
* return true; | |||
* } | |||
*/ | |||
static AVL_INLINE struct avl_tree_node * | |||
avl_tree_insert(struct avl_tree_node **root_ptr, | |||
struct avl_tree_node *item, | |||
int (*cmp)(const struct avl_tree_node *, | |||
const struct avl_tree_node *)) | |||
{ | |||
struct avl_tree_node **cur_ptr = root_ptr, *cur = NULL; | |||
int res; | |||
while (*cur_ptr) { | |||
cur = *cur_ptr; | |||
res = (*cmp)(item, cur); | |||
if (res < 0) | |||
cur_ptr = &cur->left; | |||
else if (res > 0) | |||
cur_ptr = &cur->right; | |||
else | |||
return cur; | |||
} | |||
*cur_ptr = item; | |||
item->parent_balance = (uintptr_t)cur | 1; | |||
avl_tree_rebalance_after_insert(root_ptr, item); | |||
return NULL; | |||
} | |||
/* Removes an item from the specified AVL tree. | |||
* See implementation for details. */ | |||
extern void | |||
avl_tree_remove(struct avl_tree_node **root_ptr, struct avl_tree_node *node); | |||
/* Nonrecursive AVL tree traversal functions */ | |||
extern struct avl_tree_node * | |||
avl_tree_first_in_order(const struct avl_tree_node *root); | |||
extern struct avl_tree_node * | |||
avl_tree_last_in_order(const struct avl_tree_node *root); | |||
extern struct avl_tree_node * | |||
avl_tree_next_in_order(const struct avl_tree_node *node); | |||
extern struct avl_tree_node * | |||
avl_tree_prev_in_order(const struct avl_tree_node *node); | |||
extern struct avl_tree_node * | |||
avl_tree_first_in_postorder(const struct avl_tree_node *root); | |||
extern struct avl_tree_node * | |||
avl_tree_next_in_postorder(const struct avl_tree_node *prev, | |||
const struct avl_tree_node *prev_parent); | |||
/* | |||
* Iterate through the nodes in an AVL tree in sorted order. | |||
* You may not modify the tree during the iteration. | |||
* | |||
* @child_struct | |||
* Variable that will receive a pointer to each struct inserted into the | |||
* tree. | |||
* @root | |||
* Root of the AVL tree. | |||
* @struct_name | |||
* Type of *child_struct. | |||
* @struct_member | |||
* Member of @struct_name type that is the AVL tree node. | |||
* | |||
* Example: | |||
* | |||
* struct int_wrapper { | |||
* int data; | |||
* struct avl_tree_node index_node; | |||
* }; | |||
* | |||
* void print_ints(struct avl_tree_node *root) | |||
* { | |||
* struct int_wrapper *i; | |||
* | |||
* avl_tree_for_each_in_order(i, root, struct int_wrapper, index_node) | |||
* printf("%d\n", i->data); | |||
* } | |||
*/ | |||
#define avl_tree_for_each_in_order(child_struct, root, \ | |||
struct_name, struct_member) \ | |||
for (struct avl_tree_node *_cur = \ | |||
avl_tree_first_in_order(root); \ | |||
_cur && ((child_struct) = \ | |||
avl_tree_entry(_cur, struct_name, \ | |||
struct_member), 1); \ | |||
_cur = avl_tree_next_in_order(_cur)) | |||
/* | |||
* Like avl_tree_for_each_in_order(), but uses the reverse order. | |||
*/ | |||
#define avl_tree_for_each_in_reverse_order(child_struct, root, \ | |||
struct_name, struct_member) \ | |||
for (struct avl_tree_node *_cur = \ | |||
avl_tree_last_in_order(root); \ | |||
_cur && ((child_struct) = \ | |||
avl_tree_entry(_cur, struct_name, \ | |||
struct_member), 1); \ | |||
_cur = avl_tree_prev_in_order(_cur)) | |||
/* | |||
* Like avl_tree_for_each_in_order(), but iterates through the nodes in | |||
* postorder, so the current node may be deleted or freed. | |||
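*
* Example (a minimal sketch, reusing struct int_wrapper from the examples
* above; the helper name free_ints() is hypothetical):
*
*     void free_ints(struct avl_tree_node *root)
*     {
*         struct int_wrapper *i;
*
*         avl_tree_for_each_in_postorder(i, root, struct int_wrapper, index_node)
*             free(i);
*
*         // The caller's root pointer is now dangling and should be reset to NULL.
*     }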
*/ | |||
#define avl_tree_for_each_in_postorder(child_struct, root, \ | |||
struct_name, struct_member) \ | |||
for (struct avl_tree_node *_cur = \ | |||
avl_tree_first_in_postorder(root), *_parent; \ | |||
_cur && ((child_struct) = \ | |||
avl_tree_entry(_cur, struct_name, \ | |||
struct_member), 1) \ | |||
&& (_parent = avl_get_parent(_cur), 1); \ | |||
_cur = avl_tree_next_in_postorder(_cur, _parent)) | |||
#endif /* _AVL_TREE_H_ */ |
@@ -1,5 +1,5 @@ | |||
/* | |||
Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Permission is hereby granted, free of charge, to any person obtaining a copy of | |||
this software and associated documentation files (the "Software"), to deal in | |||
@@ -30,6 +30,8 @@ SOFTWARE. | |||
#include <structmember.h> | |||
#include <bytesobject.h> | |||
#include "avl_tree.h" | |||
/* Compatibility macros */ | |||
#if PY_MAJOR_VERSION >= 3 | |||
@@ -92,10 +94,16 @@ typedef struct { | |||
#endif | |||
} Textbuffer; | |||
typedef struct { | |||
Py_ssize_t head; | |||
uint64_t context; | |||
} StackIdent; | |||
struct Stack { | |||
PyObject* stack; | |||
uint64_t context; | |||
Textbuffer* textbuffer; | |||
StackIdent ident; | |||
struct Stack* next; | |||
}; | |||
typedef struct Stack Stack; | |||
@@ -111,6 +119,13 @@ typedef struct { | |||
#endif | |||
} TokenizerInput; | |||
typedef struct avl_tree_node avl_tree; | |||
typedef struct { | |||
StackIdent id; | |||
struct avl_tree_node node; | |||
} route_tree_node; | |||
typedef struct { | |||
PyObject_HEAD | |||
TokenizerInput text; /* text to tokenize */ | |||
@@ -118,8 +133,8 @@ typedef struct { | |||
Py_ssize_t head; /* current position in text */ | |||
int global; /* global context */ | |||
int depth; /* stack recursion depth */ | |||
int cycles; /* total number of stack recursions */ | |||
int route_state; /* whether a BadRoute has been triggered */ | |||
uint64_t route_context; /* context when the last BadRoute was triggered */ | |||
avl_tree* bad_routes; /* stack idents for routes known to fail */ | |||
int skip_style_tags; /* temp fix for the sometimes broken tag parser */ | |||
} Tokenizer; |
@@ -1,5 +1,5 @@ | |||
/* | |||
Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Permission is hereby granted, free of charge, to any person obtaining a copy of | |||
this software and associated documentation files (the "Software"), to deal in | |||
@@ -81,6 +81,8 @@ SOFTWARE. | |||
#define LC_TABLE_TD_LINE 0x0000000800000000 | |||
#define LC_TABLE_TH_LINE 0x0000001000000000 | |||
#define LC_HTML_ENTITY 0x0000002000000000 | |||
/* Global contexts */ | |||
#define GL_HEADING 0x1 | |||
@@ -1,5 +1,5 @@ | |||
/* | |||
Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Permission is hereby granted, free of charge, to any person obtaining a copy of | |||
this software and associated documentation files (the "Software"), to deal in | |||
@@ -445,6 +445,8 @@ static int Tokenizer_parse_bracketed_uri_scheme(Tokenizer* self) | |||
Unicode this; | |||
int slashes, i; | |||
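	/* Consult the bad-route cache first: if parsing from this head position
	   with this context has already failed once, Tokenizer_check_route() sets
	   BAD_ROUTE and we abandon the route immediately. */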
if (Tokenizer_check_route(self, LC_EXT_LINK_URI) < 0) | |||
return 0; | |||
if (Tokenizer_push(self, LC_EXT_LINK_URI)) | |||
return -1; | |||
if (Tokenizer_read(self, 0) == '/' && Tokenizer_read(self, 1) == '/') { | |||
@@ -461,7 +463,7 @@ static int Tokenizer_parse_bracketed_uri_scheme(Tokenizer* self) | |||
while (1) { | |||
if (!valid[i]) | |||
goto end_of_loop; | |||
if (this == valid[i]) | |||
if (this == (Unicode) valid[i]) | |||
break; | |||
i++; | |||
} | |||
@@ -517,6 +519,7 @@ static int Tokenizer_parse_free_uri_scheme(Tokenizer* self) | |||
Unicode chunk; | |||
Py_ssize_t i; | |||
int slashes, j; | |||
uint64_t new_context; | |||
if (!scheme_buffer) | |||
return -1; | |||
@@ -533,7 +536,7 @@ static int Tokenizer_parse_free_uri_scheme(Tokenizer* self) | |||
FAIL_ROUTE(0); | |||
return 0; | |||
} | |||
} while (chunk != valid[j++]); | |||
} while (chunk != (Unicode) valid[j++]); | |||
Textbuffer_write(scheme_buffer, chunk); | |||
} | |||
end_of_loop: | |||
@@ -552,7 +555,12 @@ static int Tokenizer_parse_free_uri_scheme(Tokenizer* self) | |||
return 0; | |||
} | |||
Py_DECREF(scheme); | |||
if (Tokenizer_push(self, self->topstack->context | LC_EXT_LINK_URI)) { | |||
new_context = self->topstack->context | LC_EXT_LINK_URI; | |||
if (Tokenizer_check_route(self, new_context) < 0) { | |||
Textbuffer_dealloc(scheme_buffer); | |||
return 0; | |||
} | |||
if (Tokenizer_push(self, new_context)) { | |||
Textbuffer_dealloc(scheme_buffer); | |||
return -1; | |||
} | |||
@@ -1000,7 +1008,7 @@ static int Tokenizer_really_parse_entity(Tokenizer* self) | |||
while (1) { | |||
if (!valid[j]) | |||
FAIL_ROUTE_AND_EXIT() | |||
if (this == valid[j]) | |||
if (this == (Unicode) valid[j]) | |||
break; | |||
j++; | |||
} | |||
@@ -1065,11 +1073,14 @@ static int Tokenizer_parse_entity(Tokenizer* self) | |||
Py_ssize_t reset = self->head; | |||
PyObject *tokenlist; | |||
if (Tokenizer_push(self, 0)) | |||
if (Tokenizer_check_route(self, LC_HTML_ENTITY) < 0) | |||
goto on_bad_route; | |||
if (Tokenizer_push(self, LC_HTML_ENTITY)) | |||
return -1; | |||
if (Tokenizer_really_parse_entity(self)) | |||
return -1; | |||
if (BAD_ROUTE) { | |||
on_bad_route: | |||
RESET_ROUTE(); | |||
self->head = reset; | |||
if (Tokenizer_emit_char(self, '&')) | |||
@@ -1537,6 +1548,14 @@ static PyObject* Tokenizer_handle_single_tag_end(Tokenizer* self) | |||
if (depth == 0) | |||
break; | |||
} | |||
is_instance = PyObject_IsInstance(token, TagCloseSelfclose); | |||
if (is_instance == -1) | |||
return NULL; | |||
else if (is_instance == 1) { | |||
depth--; | |||
if (depth == 0) // Should never happen | |||
return NULL; | |||
} | |||
} | |||
if (!token || depth > 0) | |||
return NULL; | |||
@@ -1574,6 +1593,8 @@ static PyObject* Tokenizer_really_parse_tag(Tokenizer* self) | |||
if (!data) | |||
return NULL; | |||
if (Tokenizer_check_route(self, LC_TAG_OPEN) < 0) | |||
return NULL; | |||
if (Tokenizer_push(self, LC_TAG_OPEN)) { | |||
TagData_dealloc(data); | |||
return NULL; | |||
@@ -2191,14 +2212,18 @@ static PyObject* Tokenizer_handle_table_style(Tokenizer* self, Unicode end_token | |||
static int Tokenizer_parse_table(Tokenizer* self) | |||
{ | |||
Py_ssize_t reset = self->head; | |||
PyObject *style, *padding; | |||
PyObject *style, *padding, *trash; | |||
PyObject *table = NULL; | |||
StackIdent restore_point; | |||
self->head += 2; | |||
if(Tokenizer_push(self, LC_TABLE_OPEN)) | |||
if (Tokenizer_check_route(self, LC_TABLE_OPEN) < 0) | |||
goto on_bad_route; | |||
if (Tokenizer_push(self, LC_TABLE_OPEN)) | |||
return -1; | |||
padding = Tokenizer_handle_table_style(self, '\n'); | |||
if (BAD_ROUTE) { | |||
on_bad_route: | |||
RESET_ROUTE(); | |||
self->head = reset; | |||
if (Tokenizer_emit_char(self, '{')) | |||
@@ -2214,11 +2239,16 @@ static int Tokenizer_parse_table(Tokenizer* self) | |||
} | |||
self->head++; | |||
restore_point = self->topstack->ident; | |||
table = Tokenizer_parse(self, LC_TABLE_OPEN, 1); | |||
if (BAD_ROUTE) { | |||
RESET_ROUTE(); | |||
Py_DECREF(padding); | |||
Py_DECREF(style); | |||
while (!Tokenizer_IS_CURRENT_STACK(self, restore_point)) { | |||
trash = Tokenizer_pop(self); | |||
Py_XDECREF(trash); | |||
} | |||
self->head = reset; | |||
if (Tokenizer_emit_char(self, '{')) | |||
return -1; | |||
@@ -2243,7 +2273,7 @@ static int Tokenizer_parse_table(Tokenizer* self) | |||
*/ | |||
static int Tokenizer_handle_table_row(Tokenizer* self) | |||
{ | |||
PyObject *padding, *style, *row, *trash; | |||
PyObject *padding, *style, *row; | |||
self->head += 2; | |||
if (!Tokenizer_CAN_RECURSE(self)) { | |||
@@ -2253,14 +2283,13 @@ static int Tokenizer_handle_table_row(Tokenizer* self) | |||
return 0; | |||
} | |||
if(Tokenizer_push(self, LC_TABLE_OPEN | LC_TABLE_ROW_OPEN)) | |||
if (Tokenizer_check_route(self, LC_TABLE_OPEN | LC_TABLE_ROW_OPEN) < 0) | |||
return 0; | |||
if (Tokenizer_push(self, LC_TABLE_OPEN | LC_TABLE_ROW_OPEN)) | |||
return -1; | |||
padding = Tokenizer_handle_table_style(self, '\n'); | |||
if (BAD_ROUTE) { | |||
trash = Tokenizer_pop(self); | |||
Py_XDECREF(trash); | |||
if (BAD_ROUTE) | |||
return 0; | |||
} | |||
if (!padding) | |||
return -1; | |||
style = Tokenizer_pop(self); | |||
@@ -2319,8 +2348,8 @@ Tokenizer_handle_table_cell(Tokenizer* self, const char *markup, | |||
if (cell_context & LC_TABLE_CELL_STYLE) { | |||
Py_DECREF(cell); | |||
self->head = reset; | |||
if(Tokenizer_push(self, LC_TABLE_OPEN | LC_TABLE_CELL_OPEN | | |||
line_context)) | |||
if (Tokenizer_push(self, LC_TABLE_OPEN | LC_TABLE_CELL_OPEN | | |||
line_context)) | |||
return -1; | |||
padding = Tokenizer_handle_table_style(self, '|'); | |||
if (!padding) | |||
@@ -2541,6 +2570,8 @@ PyObject* Tokenizer_parse(Tokenizer* self, uint64_t context, int push) | |||
PyObject* temp; | |||
if (push) { | |||
if (Tokenizer_check_route(self, context) < 0) | |||
return NULL; | |||
if (Tokenizer_push(self, context)) | |||
return NULL; | |||
} | |||
@@ -1,5 +1,5 @@ | |||
/* | |||
Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Permission is hereby granted, free of charge, to any person obtaining a copy of | |||
this software and associated documentation files (the "Software"), to deal in | |||
@@ -40,10 +40,11 @@ int Tokenizer_push(Tokenizer* self, uint64_t context) | |||
top->textbuffer = Textbuffer_new(&self->text); | |||
if (!top->textbuffer) | |||
return -1; | |||
top->ident.head = self->head; | |||
top->ident.context = context; | |||
top->next = self->topstack; | |||
self->topstack = top; | |||
self->depth++; | |||
self->cycles++; | |||
return 0; | |||
} | |||
@@ -130,20 +131,88 @@ PyObject* Tokenizer_pop_keeping_context(Tokenizer* self) | |||
} | |||
/* | |||
Compare two route_tree_nodes that are in their avl_tree_node forms. | |||
*/ | |||
static int compare_nodes( | |||
const struct avl_tree_node* na, const struct avl_tree_node* nb) | |||
{ | |||
route_tree_node *a = avl_tree_entry(na, route_tree_node, node); | |||
route_tree_node *b = avl_tree_entry(nb, route_tree_node, node); | |||
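	/* Order routes by their starting head position first, then by context;
	   together these two fields uniquely identify a parse route. */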
if (a->id.head < b->id.head) | |||
return -1; | |||
if (a->id.head > b->id.head) | |||
return 1; | |||
return (a->id.context > b->id.context) - (a->id.context < b->id.context); | |||
} | |||
/* | |||
Fail the current tokenization route. Discards the current | |||
stack/context/textbuffer and sets the BAD_ROUTE flag. | |||
stack/context/textbuffer and sets the BAD_ROUTE flag. Also records the | |||
ident of the failed stack so future parsing attempts down this route can be | |||
stopped early. | |||
*/ | |||
void* Tokenizer_fail_route(Tokenizer* self) | |||
{ | |||
uint64_t context = self->topstack->context; | |||
PyObject* stack = Tokenizer_pop(self); | |||
PyObject* stack; | |||
route_tree_node *node = malloc(sizeof(route_tree_node)); | |||
if (node) { | |||
node->id = self->topstack->ident; | |||
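		/* If a route with this ident is already recorded, avl_tree_insert()
		   returns the existing node and ours is never linked in, so free it. */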
if (avl_tree_insert(&self->bad_routes, &node->node, compare_nodes)) | |||
free(node); | |||
} | |||
stack = Tokenizer_pop(self); | |||
Py_XDECREF(stack); | |||
FAIL_ROUTE(context); | |||
return NULL; | |||
} | |||
/* | |||
Check if pushing a new route here with the given context would definitely | |||
fail, based on a previous call to Tokenizer_fail_route() with the same | |||
stack. | |||
Return 0 if safe and -1 if unsafe. The BAD_ROUTE flag will be set in the | |||
latter case. | |||
    Calling this function is optional; it exists purely as an optimization. | |||
    (The Python tokenizer checks every route on push, but doing the same in | |||
    the C tokenizer would add too much overhead, since BAD_ROUTE would have | |||
    to be checked after every call to Tokenizer_push.) | |||
*/ | |||
int Tokenizer_check_route(Tokenizer* self, uint64_t context) | |||
{ | |||
StackIdent ident = {self->head, context}; | |||
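	/* compare_nodes() only reads the id field of the containing
	   route_tree_node, so we can fake a lookup key on the stack: &ident + 1 is
	   where the embedded avl_tree_node would sit if ident were the id member
	   of a route_tree_node (assuming no padding between the two members). */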
struct avl_tree_node *node = (struct avl_tree_node*) (&ident + 1); | |||
if (avl_tree_lookup_node(self->bad_routes, node, compare_nodes)) { | |||
FAIL_ROUTE(context); | |||
return -1; | |||
} | |||
return 0; | |||
} | |||
/* | |||
Free the tokenizer's bad route cache tree. Intended to be called by the | |||
main tokenizer function after parsing is finished. | |||
*/ | |||
void Tokenizer_free_bad_route_tree(Tokenizer *self) | |||
{ | |||
struct avl_tree_node *cur = avl_tree_first_in_postorder(self->bad_routes); | |||
struct avl_tree_node *parent; | |||
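	/* Walk the tree in postorder: each node's parent is read before the node
	   is freed, which is all the iterator needs to find the next node. */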
while (cur) { | |||
route_tree_node *node = avl_tree_entry(cur, route_tree_node, node); | |||
parent = avl_get_parent(cur); | |||
free(node); | |||
cur = avl_tree_next_in_postorder(cur, parent); | |||
} | |||
self->bad_routes = NULL; | |||
} | |||
/* | |||
Write a token to the current token stack. | |||
*/ | |||
int Tokenizer_emit_token(Tokenizer* self, PyObject* token, int first) | |||
@@ -1,5 +1,5 @@ | |||
/* | |||
Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Permission is hereby granted, free of charge, to any person obtaining a copy of | |||
this software and associated documentation files (the "Software"), to deal in | |||
@@ -32,6 +32,8 @@ void Tokenizer_delete_top_of_stack(Tokenizer*); | |||
PyObject* Tokenizer_pop(Tokenizer*); | |||
PyObject* Tokenizer_pop_keeping_context(Tokenizer*); | |||
void* Tokenizer_fail_route(Tokenizer*); | |||
int Tokenizer_check_route(Tokenizer*, uint64_t); | |||
void Tokenizer_free_bad_route_tree(Tokenizer*); | |||
int Tokenizer_emit_token(Tokenizer*, PyObject*, int); | |||
int Tokenizer_emit_token_kwargs(Tokenizer*, PyObject*, PyObject*, int); | |||
@@ -47,10 +49,11 @@ Unicode Tokenizer_read_backwards(Tokenizer*, Py_ssize_t); | |||
/* Macros */ | |||
#define MAX_DEPTH 40 | |||
#define MAX_CYCLES 100000 | |||
#define Tokenizer_CAN_RECURSE(self) \ | |||
(self->depth < MAX_DEPTH && self->cycles < MAX_CYCLES) | |||
(self->depth < MAX_DEPTH) | |||
#define Tokenizer_IS_CURRENT_STACK(self, id) \ | |||
(self->topstack->ident.head == (id).head && \ | |||
self->topstack->ident.context == (id).context) | |||
#define Tokenizer_emit(self, token) \ | |||
Tokenizer_emit_token(self, token, 0) | |||
@@ -1,5 +1,5 @@ | |||
/* | |||
Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
Permission is hereby granted, free of charge, to any person obtaining a copy of | |||
this software and associated documentation files (the "Software"), to deal in | |||
@@ -22,6 +22,7 @@ SOFTWARE. | |||
#include "tokenizer.h" | |||
#include "tok_parse.h" | |||
#include "tok_support.h" | |||
#include "tokens.h" | |||
/* Globals */ | |||
@@ -103,8 +104,9 @@ static int Tokenizer_init(Tokenizer* self, PyObject* args, PyObject* kwds) | |||
return -1; | |||
init_tokenizer_text(&self->text); | |||
self->topstack = NULL; | |||
self->head = self->global = self->depth = self->cycles = 0; | |||
self->head = self->global = self->depth = 0; | |||
self->route_context = self->route_state = 0; | |||
self->bad_routes = NULL; | |||
self->skip_style_tags = 0; | |||
return 0; | |||
} | |||
@@ -158,10 +160,14 @@ static PyObject* Tokenizer_tokenize(Tokenizer* self, PyObject* args) | |||
return NULL; | |||
} | |||
self->head = self->global = self->depth = self->cycles = 0; | |||
self->head = self->global = self->depth = 0; | |||
self->skip_style_tags = skip_style_tags; | |||
self->bad_routes = NULL; | |||
tokens = Tokenizer_parse(self, context, 1); | |||
Tokenizer_free_bad_route_tree(self); | |||
if (!tokens || self->topstack) { | |||
Py_XDECREF(tokens); | |||
if (PyErr_Occurred()) | |||
@@ -1,6 +1,6 @@ | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
# Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy | |||
# of this software and associated documentation files (the "Software"), to deal | |||
@@ -65,7 +65,6 @@ class Tokenizer(object): | |||
MARKERS = ["{", "}", "[", "]", "<", ">", "|", "=", "&", "'", "#", "*", ";", | |||
":", "/", "-", "!", "\n", START, END] | |||
MAX_DEPTH = 40 | |||
MAX_CYCLES = 100000 | |||
regex = re.compile(r"([{}\[\]<>|=&'#*;:/\\\"\-!\n])", flags=re.IGNORECASE) | |||
tag_splitter = re.compile(r"([\s\"\'\\]+)") | |||
@@ -75,7 +74,8 @@ class Tokenizer(object): | |||
self._stacks = [] | |||
self._global = 0 | |||
self._depth = 0 | |||
self._cycles = 0 | |||
self._bad_routes = set() | |||
self._skip_style_tags = False | |||
@property | |||
def _stack(self): | |||
@@ -100,11 +100,24 @@ class Tokenizer(object): | |||
def _textbuffer(self, value): | |||
self._stacks[-1][2] = value | |||
@property | |||
def _stack_ident(self): | |||
"""An identifier for the current stack. | |||
This is based on the starting head position and context. Stacks with | |||
the same identifier are always parsed in the same way. This can be used | |||
to cache intermediate parsing info. | |||
""" | |||
return self._stacks[-1][3] | |||
def _push(self, context=0): | |||
"""Add a new token stack, context, and textbuffer to the list.""" | |||
self._stacks.append([[], context, []]) | |||
new_ident = (self._head, context) | |||
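        # Stacks with the same (head, context) ident always parse the same
        # way, so a route that failed here before is guaranteed to fail again.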
if new_ident in self._bad_routes: | |||
raise BadRoute(context) | |||
self._stacks.append([[], context, [], new_ident]) | |||
self._depth += 1 | |||
self._cycles += 1 | |||
def _push_textbuffer(self): | |||
"""Push the textbuffer onto the stack as a Text node and clear it.""" | |||
@@ -129,7 +142,7 @@ class Tokenizer(object): | |||
def _can_recurse(self): | |||
"""Return whether or not our max recursion depth has been exceeded.""" | |||
return self._depth < self.MAX_DEPTH and self._cycles < self.MAX_CYCLES | |||
return self._depth < self.MAX_DEPTH | |||
def _fail_route(self): | |||
"""Fail the current tokenization route. | |||
@@ -138,6 +151,7 @@ class Tokenizer(object): | |||
:exc:`.BadRoute`. | |||
""" | |||
context = self._context | |||
self._bad_routes.add(self._stack_ident) | |||
self._pop() | |||
raise BadRoute(context) | |||
@@ -609,8 +623,8 @@ class Tokenizer(object): | |||
def _parse_entity(self): | |||
"""Parse an HTML entity at the head of the wikicode string.""" | |||
reset = self._head | |||
self._push() | |||
try: | |||
self._push(contexts.HTML_ENTITY) | |||
self._really_parse_entity() | |||
except BadRoute: | |||
self._head = reset | |||
@@ -650,8 +664,9 @@ class Tokenizer(object): | |||
self._emit_first(tokens.TagAttrQuote(char=data.quoter)) | |||
self._emit_all(self._pop()) | |||
buf = data.padding_buffer | |||
self._emit_first(tokens.TagAttrStart(pad_first=buf["first"], | |||
pad_before_eq=buf["before_eq"], pad_after_eq=buf["after_eq"])) | |||
self._emit_first(tokens.TagAttrStart( | |||
pad_first=buf["first"], pad_before_eq=buf["before_eq"], | |||
pad_after_eq=buf["after_eq"])) | |||
self._emit_all(self._pop()) | |||
for key in data.padding_buffer: | |||
data.padding_buffer[key] = "" | |||
@@ -804,6 +819,12 @@ class Tokenizer(object): | |||
depth -= 1 | |||
if depth == 0: | |||
break | |||
elif isinstance(token, tokens.TagCloseSelfclose): | |||
depth -= 1 | |||
if depth == 0: # pragma: no cover (untestable/exceptional) | |||
raise ParserError( | |||
"_handle_single_tag_end() got an unexpected " | |||
"TagCloseSelfclose") | |||
else: # pragma: no cover (untestable/exceptional case) | |||
raise ParserError("_handle_single_tag_end() missed a TagCloseOpen") | |||
padding = stack[index].padding | |||
@@ -1076,8 +1097,8 @@ class Tokenizer(object): | |||
"""Parse a wikicode table by starting with the first line.""" | |||
reset = self._head | |||
self._head += 2 | |||
self._push(contexts.TABLE_OPEN) | |||
try: | |||
self._push(contexts.TABLE_OPEN) | |||
padding = self._handle_table_style("\n") | |||
except BadRoute: | |||
self._head = reset | |||
@@ -1086,9 +1107,12 @@ class Tokenizer(object): | |||
style = self._pop() | |||
self._head += 1 | |||
restore_point = self._stack_ident | |||
try: | |||
table = self._parse(contexts.TABLE_OPEN) | |||
except BadRoute: | |||
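            # The failed parse may have pushed extra stacks for nested routes;
            # unwind them until we are back on the stack this table started on.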
while self._stack_ident != restore_point: | |||
self._pop() | |||
self._head = reset | |||
self._emit_text("{") | |||
return | |||
@@ -1106,11 +1130,7 @@ class Tokenizer(object): | |||
return | |||
self._push(contexts.TABLE_OPEN | contexts.TABLE_ROW_OPEN) | |||
try: | |||
padding = self._handle_table_style("\n") | |||
except BadRoute: | |||
self._pop() | |||
raise | |||
padding = self._handle_table_style("\n") | |||
style = self._pop() | |||
# Don't parse the style separator: | |||
@@ -1348,7 +1368,8 @@ class Tokenizer(object): | |||
# Kill potential table contexts | |||
self._context &= ~contexts.TABLE_CELL_LINE_CONTEXTS | |||
# Start of table parsing | |||
elif this == "{" and next == "|" and (self._read(-1) in ("\n", self.START) or | |||
elif this == "{" and next == "|" and ( | |||
self._read(-1) in ("\n", self.START) or | |||
(self._read(-2) in ("\n", self.START) and self._read(-1).isspace())): | |||
if self._can_recurse(): | |||
self._parse_table() | |||
@@ -1374,7 +1395,7 @@ class Tokenizer(object): | |||
self._context &= ~contexts.TABLE_CELL_LINE_CONTEXTS | |||
self._emit_text(this) | |||
elif (self._read(-1) in ("\n", self.START) or | |||
(self._read(-2) in ("\n", self.START) and self._read(-1).isspace())): | |||
(self._read(-2) in ("\n", self.START) and self._read(-1).isspace())): | |||
if this == "|" and next == "}": | |||
if self._context & contexts.TABLE_CELL_OPEN: | |||
return self._handle_table_cell_end() | |||
@@ -1406,10 +1427,12 @@ class Tokenizer(object): | |||
def tokenize(self, text, context=0, skip_style_tags=False): | |||
"""Build a list of tokens from a string of wikicode and return it.""" | |||
self._skip_style_tags = skip_style_tags | |||
split = self.regex.split(text) | |||
self._text = [segment for segment in split if segment] | |||
self._head = self._global = self._depth = self._cycles = 0 | |||
self._head = self._global = self._depth = 0 | |||
self._bad_routes = set() | |||
self._skip_style_tags = skip_style_tags | |||
try: | |||
tokens = self._parse(context) | |||
except BadRoute: # pragma: no cover (untestable/exceptional case) | |||
@@ -271,7 +271,7 @@ class _ListProxy(_SliceNormalizerMixIn, list): | |||
return bool(self._render()) | |||
def __len__(self): | |||
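        # An emptied child list can end up with stop < start; clamp the result
        # to zero so len() never returns a negative value (which would make it
        # raise ValueError).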
return (self._stop - self._start) // self._step | |||
return max((self._stop - self._start) // self._step, 0) | |||
def __getitem__(self, key): | |||
if isinstance(key, slice): | |||
@@ -108,6 +108,9 @@ class StringMixIn(object): | |||
return str(item) in self.__unicode__() | |||
def __getattr__(self, attr): | |||
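        # Only delegate attributes that exist on str; raising AttributeError
        # here avoids the infinite recursion that used to break pickling of
        # incompletely-constructed instances.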
if not hasattr(str, attr): | |||
raise AttributeError("{0!r} object has no attribute {1!r}".format( | |||
type(self).__name__, attr)) | |||
return getattr(self.__unicode__(), attr) | |||
if py3k: | |||
@@ -1,6 +1,6 @@ | |||
# -*- coding: utf-8 -*- | |||
# | |||
# Copyright (C) 2012-2016 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
# Copyright (C) 2012-2017 Ben Kurtovic <ben.kurtovic@gmail.com> | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy | |||
# of this software and associated documentation files (the "Software"), to deal | |||
@@ -24,7 +24,7 @@ from __future__ import unicode_literals | |||
from itertools import chain | |||
import re | |||
from .compat import py3k, range, str | |||
from .compat import bytes, py3k, range, str | |||
from .nodes import (Argument, Comment, ExternalLink, Heading, HTMLEntity, | |||
Node, Tag, Template, Text, Wikilink) | |||
from .string_mixin import StringMixIn | |||
@@ -275,6 +275,21 @@ class Wikicode(StringMixIn): | |||
else: | |||
self.nodes.pop(index) | |||
def contains(self, obj): | |||
"""Return whether this Wikicode object contains *obj*. | |||
If *obj* is a :class:`.Node` or :class:`.Wikicode` object, then we | |||
search for it exactly among all of our children, recursively. | |||
Otherwise, this method just uses :meth:`.__contains__` on the string. | |||
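        For example (an illustrative sketch)::

            >>> code = mwparserfromhell.parse("{{foo|{{bar}}}}")
            >>> inner = code.filter_templates()[1]
            >>> code.contains(inner)
            True
            >>> code.contains("{{bar}}")
            True
            >>> code.contains(mwparserfromhell.parse("{{bar}}").get(0))
            False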
""" | |||
if not isinstance(obj, (Node, Wikicode)): | |||
return obj in self | |||
try: | |||
self._do_strong_search(obj, recursive=True) | |||
except ValueError: | |||
return False | |||
return True | |||
def index(self, obj, recursive=False): | |||
"""Return the index of *obj* in the list of nodes. | |||
@@ -294,6 +309,52 @@ class Wikicode(StringMixIn): | |||
return i | |||
raise ValueError(obj) | |||
def get_ancestors(self, obj): | |||
"""Return a list of all ancestor nodes of the :class:`.Node` *obj*. | |||
The list is ordered from the most shallow ancestor (greatest great- | |||
grandparent) to the direct parent. The node itself is not included in | |||
the list. For example:: | |||
>>> text = "{{a|{{b|{{c|{{d}}}}}}}}" | |||
>>> code = mwparserfromhell.parse(text) | |||
>>> node = code.filter_templates(matches=lambda n: n == "{{d}}")[0] | |||
>>> code.get_ancestors(node) | |||
['{{a|{{b|{{c|{{d}}}}}}}}', '{{b|{{c|{{d}}}}}}', '{{c|{{d}}}}'] | |||
Will return an empty list if *obj* is at the top level of this Wikicode | |||
object. Will raise :exc:`ValueError` if it wasn't found. | |||
""" | |||
def _get_ancestors(code, needle): | |||
for node in code.nodes: | |||
if node is needle: | |||
return [] | |||
for code in node.__children__(): | |||
ancestors = _get_ancestors(code, needle) | |||
if ancestors is not None: | |||
return [node] + ancestors | |||
if isinstance(obj, Wikicode): | |||
obj = obj.get(0) | |||
elif not isinstance(obj, Node): | |||
raise ValueError(obj) | |||
ancestors = _get_ancestors(self, obj) | |||
if ancestors is None: | |||
raise ValueError(obj) | |||
return ancestors | |||
def get_parent(self, obj): | |||
"""Return the direct parent node of the :class:`.Node` *obj*. | |||
This function is equivalent to calling :meth:`.get_ancestors` and | |||
taking the last element of the resulting list. Will return None if | |||
the node exists but does not have a parent; i.e., it is at the top | |||
level of the Wikicode object. | |||
""" | |||
ancestors = self.get_ancestors(obj) | |||
return ancestors[-1] if ancestors else None | |||
def insert(self, index, value): | |||
"""Insert *value* at *index* in the list of nodes. | |||
@@ -413,22 +474,23 @@ class Wikicode(StringMixIn): | |||
"""Do a loose equivalency test suitable for comparing page names. | |||
*other* can be any string-like object, including :class:`.Wikicode`, or | |||
a tuple of these. This operation is symmetric; both sides are adjusted. | |||
Specifically, whitespace and markup is stripped and the first letter's | |||
case is normalized. Typical usage is | |||
an iterable of these. This operation is symmetric; both sides are | |||
        adjusted. Specifically, whitespace and markup are stripped and the first | |||
letter's case is normalized. Typical usage is | |||
``if template.name.matches("stub"): ...``. | |||
""" | |||
cmp = lambda a, b: (a[0].upper() + a[1:] == b[0].upper() + b[1:] | |||
if a and b else a == b) | |||
this = self.strip_code().strip() | |||
if isinstance(other, (tuple, list)): | |||
for obj in other: | |||
that = parse_anything(obj).strip_code().strip() | |||
if cmp(this, that): | |||
return True | |||
return False | |||
that = parse_anything(other).strip_code().strip() | |||
return cmp(this, that) | |||
if isinstance(other, (str, bytes, Wikicode, Node)): | |||
that = parse_anything(other).strip_code().strip() | |||
return cmp(this, that) | |||
for obj in other: | |||
that = parse_anything(obj).strip_code().strip() | |||
if cmp(this, that): | |||
return True | |||
return False | |||
def ifilter(self, recursive=True, matches=None, flags=FLAGS, | |||
forcetype=None): | |||
@@ -530,23 +592,33 @@ class Wikicode(StringMixIn): | |||
# Ensure that earlier sections are earlier in the returned list: | |||
return [section for i, section in sorted(sections)] | |||
def strip_code(self, normalize=True, collapse=True): | |||
def strip_code(self, normalize=True, collapse=True, | |||
keep_template_params=False): | |||
"""Return a rendered string without unprintable code such as templates. | |||
The way a node is stripped is handled by the | |||
:meth:`~.Node.__strip__` method of :class:`.Node` objects, which | |||
generally return a subset of their nodes or ``None``. For example, | |||
templates and tags are removed completely, links are stripped to just | |||
their display part, headings are stripped to just their title. If | |||
*normalize* is ``True``, various things may be done to strip code | |||
their display part, headings are stripped to just their title. | |||
If *normalize* is ``True``, various things may be done to strip code | |||
further, such as converting HTML entities like ``Σ``, ``Σ``, | |||
and ``Σ`` to ``Σ``. If *collapse* is ``True``, we will try to | |||
remove excess whitespace as well (three or more newlines are converted | |||
to two, for example). | |||
to two, for example). If *keep_template_params* is ``True``, then | |||
template parameters will be preserved in the output (normally, they are | |||
removed completely). | |||
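        For example (an illustrative sketch)::

            >>> code = mwparserfromhell.parse("{{foo|bar|abc=def}}")
            >>> code.strip_code()
            ''
            >>> code.strip_code(keep_template_params=True)
            'bar def'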
""" | |||
kwargs = { | |||
"normalize": normalize, | |||
"collapse": collapse, | |||
"keep_template_params": keep_template_params | |||
} | |||
nodes = [] | |||
for node in self.nodes: | |||
stripped = node.__strip__(normalize, collapse) | |||
stripped = node.__strip__(**kwargs) | |||
if stripped: | |||
nodes.append(str(stripped)) | |||
@@ -117,11 +117,11 @@ test_release() { | |||
fi | |||
pip -q uninstall -y mwparserfromhell | |||
echo -n "Downloading mwparserfromhell source tarball and GPG signature..." | |||
curl -sL "https://pypi.python.org/packages/source/m/mwparserfromhell/mwparserfromhell-$VERSION.tar.gz" -o "mwparserfromhell.tar.gz" | |||
curl -sL "https://pypi.python.org/packages/source/m/mwparserfromhell/mwparserfromhell-$VERSION.tar.gz.asc" -o "mwparserfromhell.tar.gz.asc" | |||
curl -sL "https://pypi.io/packages/source/m/mwparserfromhell/mwparserfromhell-$VERSION.tar.gz" -o "mwparserfromhell.tar.gz" | |||
curl -sL "https://pypi.io/packages/source/m/mwparserfromhell/mwparserfromhell-$VERSION.tar.gz.asc" -o "mwparserfromhell.tar.gz.asc" | |||
echo " done." | |||
echo "Verifying tarball..." | |||
gpg --verify mwparserfromhell.tar.gz.asc | |||
gpg --verify mwparserfromhell.tar.gz.asc mwparserfromhell.tar.gz | |||
if [[ "$?" != "0" ]]; then | |||
echo "*** ERROR: GPG signature verification failed!" | |||
deactivate | |||
@@ -56,12 +56,10 @@ class TestArgument(TreeEqualityTestCase): | |||
def test_strip(self): | |||
"""test Argument.__strip__()""" | |||
node = Argument(wraptext("foobar")) | |||
node1 = Argument(wraptext("foobar")) | |||
node2 = Argument(wraptext("foo"), wraptext("bar")) | |||
for a in (True, False): | |||
for b in (True, False): | |||
self.assertIs(None, node.__strip__(a, b)) | |||
self.assertEqual("bar", node2.__strip__(a, b)) | |||
self.assertIs(None, node1.__strip__()) | |||
self.assertEqual("bar", node2.__strip__()) | |||
def test_showtree(self): | |||
"""test Argument.__showtree__()""" | |||
@@ -49,9 +49,7 @@ class TestComment(TreeEqualityTestCase): | |||
def test_strip(self): | |||
"""test Comment.__strip__()""" | |||
node = Comment("foobar") | |||
for a in (True, False): | |||
for b in (True, False): | |||
self.assertIs(None, node.__strip__(a, b)) | |||
self.assertIs(None, node.__strip__()) | |||
def test_showtree(self): | |||
"""test Comment.__showtree__()""" | |||
@@ -66,12 +66,11 @@ class TestExternalLink(TreeEqualityTestCase): | |||
node2 = ExternalLink(wraptext("http://example.com")) | |||
node3 = ExternalLink(wraptext("http://example.com"), wrap([])) | |||
node4 = ExternalLink(wraptext("http://example.com"), wraptext("Link")) | |||
for a in (True, False): | |||
for b in (True, False): | |||
self.assertEqual("http://example.com", node1.__strip__(a, b)) | |||
self.assertEqual(None, node2.__strip__(a, b)) | |||
self.assertEqual(None, node3.__strip__(a, b)) | |||
self.assertEqual("Link", node4.__strip__(a, b)) | |||
self.assertEqual("http://example.com", node1.__strip__()) | |||
self.assertEqual(None, node2.__strip__()) | |||
self.assertEqual(None, node3.__strip__()) | |||
self.assertEqual("Link", node4.__strip__()) | |||
def test_showtree(self): | |||
"""test ExternalLink.__showtree__()""" | |||
@@ -52,9 +52,7 @@ class TestHeading(TreeEqualityTestCase): | |||
def test_strip(self): | |||
"""test Heading.__strip__()""" | |||
node = Heading(wraptext("foobar"), 3) | |||
for a in (True, False): | |||
for b in (True, False): | |||
self.assertEqual("foobar", node.__strip__(a, b)) | |||
self.assertEqual("foobar", node.__strip__()) | |||
def test_showtree(self): | |||
"""test Heading.__showtree__()""" | |||
@@ -57,13 +57,13 @@ class TestHTMLEntity(TreeEqualityTestCase): | |||
node1 = HTMLEntity("nbsp", named=True, hexadecimal=False) | |||
node2 = HTMLEntity("107", named=False, hexadecimal=False) | |||
node3 = HTMLEntity("e9", named=False, hexadecimal=True) | |||
for a in (True, False): | |||
self.assertEqual("\xa0", node1.__strip__(True, a)) | |||
self.assertEqual(" ", node1.__strip__(False, a)) | |||
self.assertEqual("k", node2.__strip__(True, a)) | |||
self.assertEqual("k", node2.__strip__(False, a)) | |||
self.assertEqual("é", node3.__strip__(True, a)) | |||
self.assertEqual("é", node3.__strip__(False, a)) | |||
self.assertEqual("\xa0", node1.__strip__(normalize=True)) | |||
self.assertEqual(" ", node1.__strip__(normalize=False)) | |||
self.assertEqual("k", node2.__strip__(normalize=True)) | |||
self.assertEqual("k", node2.__strip__(normalize=False)) | |||
self.assertEqual("é", node3.__strip__(normalize=True)) | |||
self.assertEqual("é", node3.__strip__(normalize=False)) | |||
def test_showtree(self): | |||
"""test HTMLEntity.__showtree__()""" | |||
@@ -398,6 +398,7 @@ class TestSmartList(unittest.TestCase): | |||
self.assertEqual([4, 3, 2, 1.9, 1.8, 5, 6], child1) | |||
self.assertEqual([4, 3, 2, 1.9, 1.8], child2) | |||
self.assertEqual([], child3) | |||
self.assertEqual(0, len(child3)) | |||
del child1 | |||
self.assertEqual([1, 4, 3, 2, 1.9, 1.8, 5, 6], parent) | |||
@@ -103,11 +103,10 @@ class TestTag(TreeEqualityTestCase): | |||
node1 = Tag(wraptext("i"), wraptext("foobar")) | |||
node2 = Tag(wraptext("math"), wraptext("foobar")) | |||
node3 = Tag(wraptext("br"), self_closing=True) | |||
for a in (True, False): | |||
for b in (True, False): | |||
self.assertEqual("foobar", node1.__strip__(a, b)) | |||
self.assertEqual(None, node2.__strip__(a, b)) | |||
self.assertEqual(None, node3.__strip__(a, b)) | |||
self.assertEqual("foobar", node1.__strip__()) | |||
self.assertEqual(None, node2.__strip__()) | |||
self.assertEqual(None, node3.__strip__()) | |||
def test_showtree(self): | |||
"""test Tag.__showtree__()""" | |||
@@ -67,12 +67,19 @@ class TestTemplate(TreeEqualityTestCase): | |||
def test_strip(self): | |||
"""test Template.__strip__()""" | |||
node1 = Template(wraptext("foobar")) | |||
node2 = Template(wraptext("foo"), | |||
[pgenh("1", "bar"), pgens("abc", "def")]) | |||
for a in (True, False): | |||
for b in (True, False): | |||
self.assertEqual(None, node1.__strip__(a, b)) | |||
self.assertEqual(None, node2.__strip__(a, b)) | |||
node2 = Template(wraptext("foo"), [ | |||
pgenh("1", "bar"), pgens("foo", ""), pgens("abc", "def")]) | |||
node3 = Template(wraptext("foo"), [ | |||
pgenh("1", "foo"), | |||
Parameter(wraptext("2"), wrap([Template(wraptext("hello"))]), | |||
showkey=False), | |||
pgenh("3", "bar")]) | |||
self.assertEqual(None, node1.__strip__(keep_template_params=False)) | |||
self.assertEqual(None, node2.__strip__(keep_template_params=False)) | |||
self.assertEqual("", node1.__strip__(keep_template_params=True)) | |||
self.assertEqual("bar def", node2.__strip__(keep_template_params=True)) | |||
self.assertEqual("foo bar", node3.__strip__(keep_template_params=True)) | |||
def test_showtree(self): | |||
"""test Template.__showtree__()""" | |||
@@ -216,6 +223,7 @@ class TestTemplate(TreeEqualityTestCase): | |||
node39 = Template(wraptext("a"), [pgenh("1", " b ")]) | |||
node40 = Template(wraptext("a"), [pgenh("1", " b"), pgenh("2", " c")]) | |||
node41 = Template(wraptext("a"), [pgens("1", " b"), pgens("2", " c")]) | |||
node42 = Template(wraptext("a"), [pgens("b", " \n")]) | |||
node1.add("e", "f", showkey=True) | |||
node2.add(2, "g", showkey=False) | |||
@@ -261,6 +269,7 @@ class TestTemplate(TreeEqualityTestCase): | |||
node39.add("1", "c") | |||
node40.add("3", "d") | |||
node41.add("3", "d") | |||
node42.add("b", "hello") | |||
self.assertEqual("{{a|b=c|d|e=f}}", node1) | |||
self.assertEqual("{{a|b=c|d|g}}", node2) | |||
@@ -308,6 +317,7 @@ class TestTemplate(TreeEqualityTestCase): | |||
self.assertEqual("{{a|c}}", node39) | |||
self.assertEqual("{{a| b| c|d}}", node40) | |||
self.assertEqual("{{a|1= b|2= c|3= d}}", node41) | |||
self.assertEqual("{{a|b=hello \n}}", node42) | |||
def test_remove(self): | |||
"""test Template.remove()""" | |||
@@ -49,9 +49,7 @@ class TestText(unittest.TestCase): | |||
def test_strip(self): | |||
"""test Text.__strip__()""" | |||
node = Text("foobar") | |||
for a in (True, False): | |||
for b in (True, False): | |||
self.assertIs(node, node.__strip__(a, b)) | |||
self.assertIs(node, node.__strip__()) | |||
def test_showtree(self): | |||
"""test Text.__showtree__()""" | |||
@@ -85,6 +85,17 @@ class TestWikicode(TreeEqualityTestCase): | |||
self.assertRaises(IndexError, code.set, 3, "{{baz}}") | |||
self.assertRaises(IndexError, code.set, -4, "{{baz}}") | |||
def test_contains(self): | |||
"""test Wikicode.contains()""" | |||
code = parse("Here is {{aaa|{{bbb|xyz{{ccc}}}}}} and a [[page|link]]") | |||
tmpl1, tmpl2, tmpl3 = code.filter_templates() | |||
tmpl4 = parse("{{ccc}}").filter_templates()[0] | |||
self.assertTrue(code.contains(tmpl1)) | |||
self.assertTrue(code.contains(tmpl3)) | |||
self.assertFalse(code.contains(tmpl4)) | |||
self.assertTrue(code.contains(str(tmpl4))) | |||
self.assertTrue(code.contains(tmpl2.params[0].value)) | |||
def test_index(self): | |||
"""test Wikicode.index()""" | |||
code = parse("Have a {{template}} and a [[page|link]]") | |||
@@ -102,6 +113,22 @@ class TestWikicode(TreeEqualityTestCase): | |||
self.assertRaises(ValueError, code.index, | |||
code.get(1).get(1).value, recursive=False) | |||
def test_get_ancestors_parent(self): | |||
"""test Wikicode.get_ancestors() and Wikicode.get_parent()""" | |||
code = parse("{{a|{{b|{{d|{{e}}{{f}}}}{{g}}}}}}{{c}}") | |||
tmpl = code.filter_templates(matches=lambda n: n.name == "f")[0] | |||
parent1 = code.filter_templates(matches=lambda n: n.name == "d")[0] | |||
parent2 = code.filter_templates(matches=lambda n: n.name == "b")[0] | |||
parent3 = code.filter_templates(matches=lambda n: n.name == "a")[0] | |||
fake = parse("{{f}}").get(0) | |||
self.assertEqual([parent3, parent2, parent1], code.get_ancestors(tmpl)) | |||
self.assertIs(parent1, code.get_parent(tmpl)) | |||
self.assertEqual([], code.get_ancestors(parent3)) | |||
self.assertIs(None, code.get_parent(parent3)) | |||
self.assertRaises(ValueError, code.get_ancestors, fake) | |||
self.assertRaises(ValueError, code.get_parent, fake) | |||
def test_insert(self): | |||
"""test Wikicode.insert()""" | |||
code = parse("Have a {{template}} and a [[page|link]]") | |||
@@ -433,7 +460,7 @@ class TestWikicode(TreeEqualityTestCase): | |||
"""test Wikicode.strip_code()""" | |||
# Since individual nodes have test cases for their __strip__ methods, | |||
# we're only going to do an integration test: | |||
code = parse("Foo [[bar]]\n\n{{baz}}\n\n[[a|b]] Σ") | |||
code = parse("Foo [[bar]]\n\n{{baz|hello}}\n\n[[a|b]] Σ") | |||
self.assertEqual("Foo bar\n\nb Σ", | |||
code.strip_code(normalize=True, collapse=True)) | |||
self.assertEqual("Foo bar\n\n\n\nb Σ", | |||
@@ -442,6 +469,9 @@ class TestWikicode(TreeEqualityTestCase): | |||
code.strip_code(normalize=False, collapse=True)) | |||
self.assertEqual("Foo bar\n\n\n\nb Σ", | |||
code.strip_code(normalize=False, collapse=False)) | |||
self.assertEqual("Foo bar\n\nhello\n\nb Σ", | |||
code.strip_code(normalize=True, collapse=True, | |||
keep_template_params=True)) | |||
def test_get_tree(self): | |||
"""test Wikicode.get_tree()""" | |||
@@ -58,10 +58,8 @@ class TestWikilink(TreeEqualityTestCase): | |||
"""test Wikilink.__strip__()""" | |||
node = Wikilink(wraptext("foobar")) | |||
node2 = Wikilink(wraptext("foo"), wraptext("bar")) | |||
for a in (True, False): | |||
for b in (True, False): | |||
self.assertEqual("foobar", node.__strip__(a, b)) | |||
self.assertEqual("bar", node2.__strip__(a, b)) | |||
self.assertEqual("foobar", node.__strip__()) | |||
self.assertEqual("bar", node2.__strip__()) | |||
def test_showtree(self): | |||
"""test Wikilink.__showtree__()""" | |||
@@ -346,3 +346,10 @@ name: tables_in_templates_2 | |||
label: catch error handling mistakes when wikitables are inside templates | |||
input: "{{hello|test\n{|\n| }}" | |||
output: [TemplateOpen(), Text(text="hello"), TemplateParamSeparator(), Text(text="test\n{"), TemplateParamSeparator(), Text(text="\n"), TemplateParamSeparator(), Text(text=" "), TemplateClose()] | |||
--- | |||
name: many_invalid_nested_tags | |||
label: many unending nested tags that should be treated as plain text, followed by valid wikitext (see issues #42, #183) | |||
input: "<b><b><b><b><b><b><b><b><b><b><b><b><b><b><b><b><b><b>[[{{x}}" | |||
output: [Text(text="<b><b><b><b><b><b><b><b><b><b><b><b><b><b><b><b><b><b>[["), TemplateOpen(), Text(text="x"), TemplateClose()] |
@@ -646,3 +646,10 @@ name: non_ascii_full | |||
label: an open/close tag pair containing non-ASCII characters | |||
input: "<éxamplé></éxamplé>" | |||
output: [TagOpenOpen(), Text(text="éxamplé"), TagCloseOpen(padding=""), TagOpenClose(), Text(text="éxamplé"), TagCloseClose()] | |||
--- | |||
name: single_nested_selfclosing | |||
label: a single (unpaired) tag with a self-closing tag in the middle (see issue #147) | |||
input: "<li a <br/> c>foobar" | |||
output: [TagOpenOpen(), Text(text="li"), TagAttrStart(pad_first=" ", pad_after_eq="", pad_before_eq=" "), Text(text="a"), TagAttrStart(pad_first="", pad_after_eq="", pad_before_eq=" "), TagOpenOpen(), Text(text="br"), TagCloseSelfclose(padding=""), TagAttrStart(pad_first="", pad_after_eq="", pad_before_eq=""), Text(text="c"), TagCloseSelfclose(padding="", implicit=True), Text(text="foobar")] |
@@ -694,4 +694,4 @@ output: [Text(text="{{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ {{ | |||
name: recursion_opens_and_closes | |||
label: test potentially dangerous recursion: template openings and closings | |||
input: "{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}" | |||
output: [Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), TemplateOpen(), Text(text="x"), TemplateParamSeparator(), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x"), TemplateParamSeparator(), Text(text="{{x"), TemplateClose(), Text(text="{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}{{x|{{x}}")] | |||
output: [Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose(), Text(text="{{x|"), TemplateOpen(), Text(text="x"), TemplateClose()] |