Ben Kurtovic
b24ddaea10
Tokenizer support for implicitly self-closing tags.
11 years ago
Ben Kurtovic
7d1a28a249
Support single and single-only tags like <br>.
11 years ago
Ben Kurtovic
50beda0914
Improve/fix the way padding is handled.
11 years ago
Ben Kurtovic
fb92012fcb
Support parser-blacklisted tags like <nowiki>
11 years ago
Ben Kurtovic
4cfa40685e
Clean up the way contexts are defined.
11 years ago
Ben Kurtovic
a42a704230
Support backslash-escaped quotes in tags; CX_NEED_* -> CX_NOTE_*
11 years ago
Ben Kurtovic
591a0f5ed5
Change 'write' to 'emit'; adjust some other names for PEP8.
11 years ago
Ben Kurtovic
e99c9d3038
More tag refactoring; fix some bugs.
11 years ago
Ben Kurtovic
5e8794da5e
Refactor more of the tag tokenization process.
11 years ago
Ben Kurtovic
dd6bb1637d
Support tag nesting properly; unit tests; recursion checks for tags.
11 years ago
Ben Kurtovic
9693b6d5e6
Replace data.literal and data.quoted with a data.CX_QUOTED context
11 years ago
Ben Kurtovic
e34026dabe
Support templates and wikilinks inside <open> tags (part 2)
11 years ago
Ben Kurtovic
dfe100ceb7
Support templates and wikilinks inside <open> tags (part 1)
11 years ago
Ben Kurtovic
6c2898d7bd
Make {{|=}} build correctly; add a test for this.
11 years ago
Ben Kurtovic
f63480bcf3
Update the integration.rich_tags test to use the new tag tokens.
Remove a now-unused import in the tokenizer.
11 years ago
Ben Kurtovic
82edc93bbb
Pass some tests by simplifying the way tags are read from the stack.
Two still fail because templates aren't implemented yet, but those
are otherwise handled correctly.
11 years ago
Ben Kurtovic
962adcd62c
Add docstrings for a couple new methods in the tokenizer.
11 years ago
Ben Kurtovic
5f5a081d91
Rewrite tag parser to be cleaner and safer.
All tag tests passing. Still need to finish backslash support and
support for templates and tags within <open> tags.
11 years ago
Ben Kurtovic
81e8fdd682
Give Attributes more attributes for padding data.
11 years ago
Ben Kurtovic
ce27d5d385
Fix six failing tests; add three more (all passing).
11 years ago
Ben Kurtovic
6450814729
Remove 'type' attribute from tags; rework tag definitions.
11 years ago
Ben Kurtovic
2596e697ae
Fix a possible compiler warning on some build systems.
11 years ago
Ben
a689467577
Replace broken log2 function; add a missing comment.
11 years ago
Ben Kurtovic
d2b3954669
Fix remaining broken tests; some refactoring.
11 years ago
Ben Kurtovic
9ea06c2830
Push the textbuffer to fix a couple broken tests.
11 years ago
Ben Kurtovic
1b4c01b4c0
Implement assertTagNodeEqual(), start test_tag(), add to tags.mwtest.
11 years ago
Ben Kurtovic
61fc5b5eab
Fix handling of self-closing tags (closes #31)
11 years ago
Ben Kurtovic
22e869b142
Fix a failing HTML entity test in the C tokenizer.
Remove some extraneous whitespace in string_mixin.py.
11 years ago
Ben Kurtovic
496475c977
Whoops, that should be one larger (#25).
11 years ago
Ben Kurtovic
9ede1121ba
Fix tokenizer.c on Windows; add another template test (#25)
Mostly by @gdooms, with tweaks.
11 years ago
Ben Kurtovic
debcb6577e
Fix recursion issues by giving up at a certain point (closes #16).
- Stop parsing new templates if the template depth gets above
MAX_DEPTH (40) or if we've already tried to parse over MAX_CYCLES
(100,000) templates.
- Add two tests to ensure recursion works somewhat correctly.
- Fix parsing the string "{{" with the Python tokenizer; add a test.
11 years ago
Ben Kurtovic
f803269514
Add a USES_C field to the tokenizers; add TestParser.test_use_c()
11 years ago
Ben Kurtovic
6a741db7ce
Applying fb71f5507e
11 years ago
Ben Kurtovic
d8814968b7
Applying latest commit from develop
11 years ago
Ben Kurtovic
fb71f5507e
Support a 'use_c' field to explicitly disable the C tokenizer.
11 years ago
Ben Kurtovic
054a84afe0
A bit of misc cleanup.
11 years ago
Ben Kurtovic
718fcb24c8
Fix eight failing tests; all template parsing tests now passing (#25).
11 years ago
Ben Kurtovic
5a0a00ba98
Change the way verify_safe() handles template params (#25).
- Newlines are now allowed in template param names.
- Changes also affect handling of arguments like {{{foo}}}.
- Update unit tests: remove some unnecessary ones, and add some to cover the changes.
- Update StringMixIn tests to actually work for some of the methods.
- Update copyright notices for the C extensions.
11 years ago
Ben Kurtovic
0803417901
Port CTokenizer's verify_safe method to Python to solve a failing test.
11 years ago
Ben Kurtovic
acb7e57904
Make mwparserfromhell.parser() be an alias for parse_anything().
Some other changes, including removal of the 'string' import in the tokenizer.
11 years ago
Ben Kurtovic
d6f2723a06
Fix safety checks on template params in some odd cases (closes #24).
Also, fix parsing of wikilinks in both tokenizers such that newlines
in any location within the title are an automatic failure.
11 years ago
Ben Kurtovic
0ee505b5a5
Docstrings for new tokenizer methods.
11 years ago
Ben Kurtovic
cd5cc6a7d0
Update copyright notices for 2013.
11 years ago
Ben Kurtovic
11cf5def75
Fix handling of section headers with equal signs (closes #20)
11 years ago
Ben Kurtovic
6ea618460f
_get_tag_type_from_stack() makes more sense now
11 years ago
Ben Kurtovic
eed7c918bf
Implement padding support for Tags completely; open_padding->padding.
11 years ago
Ben Kurtovic
a58c480639
Fix some usage of attrs; shorten a context, fix some behavior I broke.
11 years ago
Ben Kurtovic
ca47305074
Fix attribute behavior under certain strange circumstances.
11 years ago
Ben Kurtovic
26d30f3d1a
Seems to be working for quoted attributes now.
11 years ago
Ben Kurtovic
d459899649
More attribute stuff.
11 years ago