* Proposed fix for https://github.com/earwig/mwparserfromhell/issues/197
* Port the fix for #197 to the C tokenizer
* Fix parsing of external links where the URL is terminated by some special character
- One existing test case was found to be wrong: the current MediaWiki
  version always terminates the URL when an opening bracket is
  encountered.
- Other test cases added: a double quote, two single quotes, and angle
  brackets always terminate the URL (regardless of whether it is a free
  link or an external link inside brackets); a single quote on its own
  does not terminate the URL. See the example after this list.
* Fix case-insensitive parsing of URI schemes
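
For illustration, the intended behavior can be checked from the Python
side (a hedged sketch assuming a build that includes these fixes; the
exact output is what the new test cases assert):

```python
import mwparserfromhell

# Bracketed external link: the double quote is expected to end the URL,
# with the quote and the remaining text becoming the link title.
code = mwparserfromhell.parse('[http://example.com/foo"bar baz]')
link = code.filter_external_links()[0]
print(link.url)    # expected: http://example.com/foo
print(link.title)  # expected: "bar baz

# Free link: a mixed-case scheme should still be recognized once URI
# schemes are parsed case-insensitively.
code = mwparserfromhell.parse("See HtTp://example.com/page for details")
print(len(code.filter_external_links()))  # expected: 1
```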
Also removed the max cycles stop-gap, allowing much more complex pages
to be parsed quickly without losing nodes at the end
Also fixes #65, fixes #102, fixes #165, fixes #183
Also fixes #81 (Rafael Nadal parsing bug)
Also fixes #53, fixes #58, fixes #88, fixes #152 (duplicate issues)
The tests were not correctly covering situations where a table is never
closed. Fixed the tests, then fixed the tokenizers for the newly failing
cases. Also refactored the pytokenizer to more closely match the
ctokenizer by holding only the `_parse` methods in the try blocks and no
other code.
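
As a rough illustration of that layout, here is a minimal, self-contained
sketch (not the project's code; `BadRoute`, `_parse`, `_emit_text`, and
`_emit_table_tag` are stand-ins modeled on the tokenizer's naming):

```python
class BadRoute(Exception):
    """Raised when the current parse route turns out to be invalid."""


class SketchTokenizer:
    def __init__(self, text):
        self._text = text
        self._head = 0
        self._tokens = []

    def _parse(self, context):
        # Stand-in for the recursive parser; the real method walks the
        # text and may raise BadRoute at any point.
        raise BadRoute()

    def _emit_text(self, text):
        self._tokens.append(("text", text))

    def _emit_table_tag(self, markup, tag, attrs, contents):
        self._tokens.append(("tag", markup, tag, attrs, contents))

    def _parse_table(self):
        reset = self._head
        self._head += 2                      # skip the "{|" opener
        try:
            # Only the recursive _parse() call is guarded, so the except
            # clause cannot swallow errors from the surrounding code.
            table = self._parse(context="table")
        except BadRoute:
            self._head = reset               # rewind and fall back to text
            self._emit_text("{|")
            return
        # All remaining bookkeeping runs outside the try block, matching
        # the ctokenizer's structure.
        self._emit_table_tag("{|", "table", None, table)
```

Running `SketchTokenizer("{| x |}")._parse_table()` just falls back to
plain text here, since the stub `_parse` always raises; the point is only
the shape of the try block.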
Python 3.4 compiles C extensions with the
`-Werror=declaration-after-statement` flag, which enforces C90 more
strictly than previous versions did. Move all declarations ahead of
statements to make sure this extension builds on 3.4.
Table tags are no longer self-closing; rows and cells now contain their
contents. Also refactored out an `emit_table_tag` method.
Note: this will require changes to the Tag node and possibly the builder;
those changes will be in the next commit.
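
Once those follow-up changes land, a parse along the following lines
should show table, row, and cell tags that carry their contents rather
than self-closing (a hedged example; the exact nodes and whitespace may
differ):

```python
import mwparserfromhell

code = mwparserfromhell.parse("{|\n|-\n| cell one\n| cell two\n|}")
for tag in code.filter_tags():
    # Expected roughly: table/tr/td Tag nodes whose contents hold the
    # nested rows, cells, and cell text, with self_closing now False.
    print(tag.tag, tag.self_closing, repr(str(tag.contents)))
```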
The CTokenizer is completely implemented in this commit - it didn't
make much sense to me to split it up. All tests pass, and the memory
test shows no leaks on Linux.
For the C tokenizer, include `<stdint.h>` and use `uint64_t` instead
of `int` for the context. The changes to tables mean that the context
can be larger than 32 bits, and it is possible for `int` to have only
16 bits anyway (though this is very unlikely).