Also removed the max cycles stop-gap, allowing much more complex pages
to be parsed quickly without losing nodes at the end
Also fixes#65, fixes#102, fixes#165, fixes#183
Also fixes#81 (Rafael Nadal parsing bug)
Also fixes#53, fixes#58, fixes#88, fixes#152 (duplicate issues)
Tests were not correctly testing the situations without a table close.
Fixed tests and then fixed tokenizers for failing tests. Also refactored
pytokenizer to more closely match the ctokenizer by only holding the
`_parse` methods in the try blocks and no other code.
Python 3.4 compiles C extensions with the
`-Werror=declaration-after-statement` flag that enforces C90 more
strictly than previous versions. Move all statements after declarations
to make sure this extension builds on 3.4.
Table tags no longer self-closing. Rows and cells now contain their
contents. Also refactored out an `emit_table_tag` method.
Note: this will require changes to the Tag node and possibly the builder,
those changes will be in the next commit.
CTokenizer is completely implemented in this commit - it didn't
make much sense to me to split it up. All tests passing, memory test
shows no leaks on Linux.
For the C tokenizer, include `<stdint.h>` and use `uint64_t` instead
of `int` for context. Changes to tables mean that context can be
larger than 32 bits, and it is possible for `int` to only have 16
bits anyways (though this is very unlikely).