This regression seems more severe than the bug the commit was
attempting to fix (incorrect parsing of nested wikilinks in normal
links), so that bug is reintroduced until localization-aware parsing
that allows us to detect file links is added.
This commit partially reverts fac60dee48.
* Proposed fix for https://github.com/earwig/mwparserfromhell/issues/197
* Port the fix for #197 to the C tokenizer
* Fix parsing of external links where the URL is terminated by some special character
- One existing test case has been found wrong -- current MediaWiki
version always terminates the URL when an opening bracket is
encountered.
- Other test cases added: double quote, two single quotes and angles
always terminate the URL (regardless if it is a free link or external
link inside brackets). One single quote does not terminate the URL.
* Fix case-insensitive parsing of URI schemes
pytest is the preferred way to write and run unit tests these days and
it has a cleaner interface - so lets switch to it. The tokenizer tests
especially are much easier to read/understand.
This was mostly done with find/replace regexes and then cleaned up
manually.
Step one of refactoring - making SmartList into its own
package, with each class having its own file. No code
changes were made.
Note that SmartList and ListProxy import each other,
so had to import SmartList as a full package name
rather than use from ... import ... construct.
Also removed the max cycles stop-gap, allowing much more complex pages
to be parsed quickly without losing nodes at the end
Also fixes#65, fixes#102, fixes#165, fixes#183
Also fixes#81 (Rafael Nadal parsing bug)
Also fixes#53, fixes#58, fixes#88, fixes#152 (duplicate issues)