Browse Source

Merge branch 'develop' into feature/html_tags

Conflicts:
	tests/test_builder.py
tags/v0.3
Ben Kurtovic 11 years ago
parent
commit
b89ae9ad48
14 changed files with 142 additions and 20 deletions
  1. +1
    -0
      .gitignore
  2. +41
    -0
      CHANGELOG
  3. +3
    -1
      README.rst
  4. +66
    -0
      docs/changelog.rst
  5. +3
    -2
      docs/index.rst
  6. +1
    -1
      mwparserfromhell/__init__.py
  7. +0
    -0
     
  8. +1
    -1
      mwparserfromhell/nodes/template.py
  9. +13
    -3
      mwparserfromhell/parser/tokenizer.c
  10. +1
    -0
      mwparserfromhell/parser/tokenizer.h
  11. +4
    -4
      mwparserfromhell/string_mixin.py
  12. +3
    -3
      mwparserfromhell/wikicode.py
  13. +1
    -1
      tests/_test_tokenizer.py
  14. +4
    -4
      tests/test_string_mixin.py

+ 1
- 0
.gitignore View File

@@ -1,5 +1,6 @@
*.pyc
*.so
*.dll
*.egg
*.egg-info
.DS_Store


+ 41
- 0
CHANGELOG View File

@@ -0,0 +1,41 @@
v0.3 (unreleased):

- Various fixes and cleanup.

v0.2 (released June 20, 2013):

- The parser now fully supports Python 3 in addition to Python 2.7.
- Added a C tokenizer extension that is significantly faster than its Python
equivalent. It is enabled by default (if available) and can be toggled by
setting `mwparserfromhell.parser.use_c` to a boolean value.
- Added a complete set of unit tests covering parsing and wikicode
manipulation.
- Renamed Wikicode.filter_links() to filter_wikilinks() (applies to ifilter as
well).
- Added filter methods for Arguments, Comments, Headings, and HTMLEntities.
- Added 'before' param to Template.add(); renamed 'force_nonconformity' to
'preserve_spacing'.
- Added 'include_lead' param to Wikicode.get_sections().
- Removed 'flat' param from Wikicode.get_sections().
- Removed 'force_no_field' param from Template.remove().
- Added support for Travis CI.
- Added note about Windows build issue in the README.
- The tokenizer will limit itself to a realistic recursion depth to prevent
errors and unreasonably long parse times.
- Fixed how some nodes' attribute setters handle input.
- Fixed multiple bugs in the tokenizer's handling of invalid markup.
- Fixed bugs in the implementation of SmartList and StringMixIn.
- Fixed some broken example code in the README; other copyedits.
- Other bugfixes and code cleanup.

v0.1.1 (released September 21, 2012):

- Added support for Comments (<!-- foo -->) and Wikilinks ([[foo]]).
- Added corresponding ifilter_links() and filter_links() methods to Wikicode.
- Fixed a bug when parsing incomplete templates.
- Fixed strip_code() to affect the contents of headings.
- Various copyedits in documentation and comments.

v0.1 (released August 23, 2012):

- Initial release.

+ 3
- 1
README.rst View File

@@ -9,7 +9,8 @@ mwparserfromhell
that provides an easy-to-use and outrageously powerful parser for MediaWiki_
wikicode. It supports Python 2 and Python 3.

Developed by Earwig_ with help from `Σ`_.
Developed by Earwig_ with help from `Σ`_. Full documentation is available on
ReadTheDocs_.

Installation
------------
@@ -142,6 +143,7 @@ following code (via the API_)::
return mwparserfromhell.parse(text)

.. _MediaWiki: http://mediawiki.org
.. _ReadTheDocs: http://mwparserfromhell.readthedocs.org
.. _Earwig: http://en.wikipedia.org/wiki/User:The_Earwig
.. _Σ: http://en.wikipedia.org/wiki/User:%CE%A3
.. _Python Package Index: http://pypi.python.org


+ 66
- 0
docs/changelog.rst View File

@@ -0,0 +1,66 @@
Changelog
=========

v0.3
----

Unreleased
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.2...develop>`__):

- Various fixes and cleanup.

v0.2
----

`Released June 20, 2013 <https://github.com/earwig/mwparserfromhell/tree/v0.2>`_
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.1.1...v0.2>`__):

- The parser now fully supports Python 3 in addition to Python 2.7.
- Added a C tokenizer extension that is significantly faster than its Python
equivalent. It is enabled by default (if available) and can be toggled by
setting :py:attr:`mwparserfromhell.parser.use_c` to a boolean value.
- Added a complete set of unit tests covering parsing and wikicode
manipulation.
- Renamed :py:meth:`.filter_links` to :py:meth:`.filter_wikilinks` (applies to
:py:meth:`.ifilter` as well).
- Added filter methods for :py:class:`Arguments <.Argument>`,
:py:class:`Comments <.Comment>`, :py:class:`Headings <.Heading>`, and
:py:class:`HTMLEntities <.HTMLEntity>`.
- Added *before* param to :py:meth:`Template.add() <.Template.add>`; renamed
*force_nonconformity* to *preserve_spacing*.
- Added *include_lead* param to :py:meth:`Wikicode.get_sections()
<.get_sections>`.
- Removed *flat* param from :py:meth:`.get_sections`.
- Removed *force_no_field* param from :py:meth:`Template.remove()
<.Template.remove>`.
- Added support for Travis CI.
- Added note about Windows build issue in the README.
- The tokenizer will limit itself to a realistic recursion depth to prevent
errors and unreasonably long parse times.
- Fixed how some nodes' attribute setters handle input.
- Fixed multiple bugs in the tokenizer's handling of invalid markup.
- Fixed bugs in the implementation of :py:class:`.SmartList` and
:py:class:`.StringMixIn`.
- Fixed some broken example code in the README; other copyedits.
- Other bugfixes and code cleanup.

v0.1.1
------

`Released September 21, 2012 <https://github.com/earwig/mwparserfromhell/tree/v0.1.1>`_
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.1...v0.1.1>`__):

- Added support for :py:class:`Comments <.Comment>` (``<!-- foo -->``) and
:py:class:`Wikilinks <.Wikilink>` (``[[foo]]``).
- Added corresponding :py:meth:`.ifilter_links` and :py:meth:`.filter_links`
methods to :py:class:`.Wikicode`.
- Fixed a bug when parsing incomplete templates.
- Fixed :py:meth:`.strip_code` to affect the contents of headings.
- Various copyedits in documentation and comments.

v0.1
----

`Released August 23, 2012 <https://github.com/earwig/mwparserfromhell/tree/v0.1>`_:

- Initial release.

+ 3
- 2
docs/index.rst View File

@@ -1,5 +1,5 @@
MWParserFromHell v0.2 Documentation
===================================
MWParserFromHell v\ |version| Documentation
===========================================

:py:mod:`mwparserfromhell` (the *MediaWiki Parser from Hell*) is a Python
package that provides an easy-to-use and outrageously powerful parser for
@@ -41,6 +41,7 @@ Contents

usage
integration
changelog
API Reference <api/modules>




+ 1
- 1
mwparserfromhell/__init__.py View File

@@ -31,7 +31,7 @@ from __future__ import unicode_literals
__author__ = "Ben Kurtovic"
__copyright__ = "Copyright (C) 2012, 2013 Ben Kurtovic"
__license__ = "MIT License"
__version__ = "0.2.dev"
__version__ = "0.3.dev"
__email__ = "ben.kurtovic@verizon.net"

from . import compat, nodes, parser, smart_list, string_mixin, utils, wikicode


+ 0
- 0
View File


+ 1
- 1
mwparserfromhell/nodes/template.py View File

@@ -293,7 +293,7 @@ class Template(Node):
"""
name = name.strip() if isinstance(name, basestring) else str(name)
removed = False
to_remove =[]
to_remove = []
for i, param in enumerate(self.params):
if param.name.strip() == name:
if keep_field:


+ 13
- 3
mwparserfromhell/parser/tokenizer.c View File

@@ -23,9 +23,16 @@ SOFTWARE.

#include "tokenizer.h"

double log2(double n)
/*
Given a context, return the heading level encoded within it.
*/
static int heading_level_from_context(int n)
{
return log(n) / log(2);
int level;
n /= LC_HEADING_LEVEL_1;
for (level = 1; n > 1; n >>= 1)
level++;
return level;
}

static PyObject*
@@ -175,6 +182,9 @@ Tokenizer_push_textbuffer(Tokenizer* self)
return 0;
}

/*
Pop and deallocate the top token stack/context/textbuffer.
*/
static void
Tokenizer_delete_top_of_stack(Tokenizer* self)
{
@@ -857,7 +867,7 @@ Tokenizer_handle_heading_end(Tokenizer* self)
best++;
self->head++;
}
current = log2(self->topstack->context / LC_HEADING_LEVEL_1) + 1;
current = heading_level_from_context(self->topstack->context);
level = current > best ? (best > 6 ? 6 : best) :
(current > 6 ? 6 : current);
after = (HeadingData*) Tokenizer_parse(self, self->topstack->context);


+ 1
- 0
mwparserfromhell/parser/tokenizer.h View File

@@ -181,6 +181,7 @@ typedef struct {

/* Function prototypes: */

static int heading_level_from_context(int);
static PyObject* Tokenizer_new(PyTypeObject*, PyObject*, PyObject*);
static struct Textbuffer* Textbuffer_new(void);
static void Tokenizer_dealloc(Tokenizer*);


+ 4
- 4
mwparserfromhell/string_mixin.py View File

@@ -253,12 +253,12 @@ class StringMixIn(object):
if py3k:
@staticmethod
@inheritdoc
def maketrans(self, x, y=None, z=None):
def maketrans(x, y=None, z=None):
if z is None:
if y is None:
return self.__unicode__.maketrans(x)
return self.__unicode__.maketrans(x, y)
return self.__unicode__.maketrans(x, y, z)
return str.maketrans(x)
return str.maketrans(x, y)
return str.maketrans(x, y, z)

@inheritdoc
def partition(self, sep):


+ 3
- 3
mwparserfromhell/wikicode.py View File

@@ -168,7 +168,7 @@ class Wikicode(StringMixIn):
doc = """Iterate over {0}.

This is equivalent to :py:meth:`{1}` with *forcetype* set to
:py:class:`~.{2}`.
:py:class:`~{2.__module__}.{2.__name__}`.
"""
make_ifilter = lambda ftype: (lambda self, **kw:
self.ifilter(forcetype=ftype, **kw))
@@ -177,8 +177,8 @@ class Wikicode(StringMixIn):
for name, ftype in (meths.items() if py3k else meths.iteritems()):
ifilter = make_ifilter(ftype)
filter = make_filter(ftype)
ifilter.__doc__ = doc.format(name, "ifilter", ftype.__name__)
filter.__doc__ = doc.format(name, "filter", ftype.__name__)
ifilter.__doc__ = doc.format(name, "ifilter", ftype)
filter.__doc__ = doc.format(name, "filter", ftype)
setattr(cls, "ifilter_" + name, ifilter)
setattr(cls, "filter_" + name, filter)



+ 1
- 1
tests/_test_tokenizer.py View File

@@ -109,7 +109,7 @@ class TokenizerTestCase(object):
def build(cls):
"""Load and install all tests from the 'tokenizer' directory."""
def load_file(filename):
with open(filename, "r") as fp:
with open(filename, "rU") as fp:
text = fp.read()
if not py3k:
text = text.decode("utf8")


+ 4
- 4
tests/test_string_mixin.py View File

@@ -414,10 +414,10 @@ class TestStringMixIn(unittest.TestCase):
self.assertEqual("Fake String", str1.title())

if py3k:
table1 = str.maketrans({97: "1", 101: "2", 105: "3", 111: "4",
117: "5"})
table2 = str.maketrans("aeiou", "12345")
table3 = str.maketrans("aeiou", "12345", "rts")
table1 = StringMixIn.maketrans({97: "1", 101: "2", 105: "3",
111: "4", 117: "5"})
table2 = StringMixIn.maketrans("aeiou", "12345")
table3 = StringMixIn.maketrans("aeiou", "12345", "rts")
self.assertEqual("f1k2 str3ng", str1.translate(table1))
self.assertEqual("f1k2 str3ng", str1.translate(table2))
self.assertEqual("f1k2 3ng", str1.translate(table3))


Loading…
Cancel
Save