
Fix regression in parsing nested wikilinks in file captions

This regression seems more severe than the bug the commit was
attempting to fix (incorrect parsing of nested wikilinks in normal
links), so that bug is reintroduced until localization-aware parsing
that allows us to detect file links is added.

This commit partially reverts fac60dee48.
tags/v0.6.4
Ben Kurtovic 2 years ago
parent
commit
2155638b91
5 changed files with 34 additions and 20 deletions
  1. CHANGELOG (+4, -1)
  2. docs/changelog.rst (+7, -1)
  3. src/mwparserfromhell/parser/ctokenizer/tok_parse.c (+6, -4)
  4. src/mwparserfromhell/parser/tokenizer.py (+3, -2)
  5. tests/tokenizer/wikilinks.mwtest (+14, -12)

CHANGELOG (+4, -1)

@@ -1,7 +1,10 @@
-v0.7 (unreleased):
+v0.6.4 (unreleased):

 - Dropped support for end-of-life Python 3.5.
 - Added support for Python 3.10. (#278)
+- Fixed a regression in v0.6.2 that broke parsing of nested wikilinks in file
+  captions. For now, the parser will interpret nested wikilinks in normal links
+  as well, even though this differs from MediaWiki. (#270)

 v0.6.3 (released September 2, 2021):



docs/changelog.rst (+7, -1)

@@ -1,14 +1,19 @@
Changelog
=========

v0.7
v0.6.4
------

Unreleased
(`changes <https://github.com/earwig/mwparserfromhell/compare/v0.6.3...develop>`__):

- Dropped support for end-of-life Python 3.5.
- Added support for Python 3.10.
(`#278 <https://github.com/earwig/mwparserfromhell/issues/278>`_)
- Fixed a regression in v0.6.2 that broke parsing of nested wikilinks in file
captions. For now, the parser will interpret nested wikilinks in normal links
as well, even though this differs from MediaWiki.
(`#270 <https://github.com/earwig/mwparserfromhell/issues/270>`_)

v0.6.3
------


src/mwparserfromhell/parser/ctokenizer/tok_parse.c (+6, -4)

@@ -51,7 +51,8 @@ static int Tokenizer_parse_tag(Tokenizer *);
 /*
     Determine whether the given code point is a marker.
 */
-static int is_marker(Py_UCS4 this)
+static int
+is_marker(Py_UCS4 this)
 {
     int i;

@@ -2929,9 +2930,10 @@ Tokenizer_parse(Tokenizer *self, uint64_t context, int push)
                 return NULL;
             }
         } else if (this == next && next == '[' && Tokenizer_CAN_RECURSE(self)) {
-            if (this_context & LC_WIKILINK_TEXT) {
-                return Tokenizer_fail_route(self);
-            }
+            // TODO: Only do this if not in a file context:
+            // if (this_context & LC_WIKILINK_TEXT) {
+            //     return Tokenizer_fail_route(self);
+            // }
             if (!(this_context & AGG_NO_WIKILINKS)) {
                 if (Tokenizer_parse_wikilink(self)) {
                     return NULL;


src/mwparserfromhell/parser/tokenizer.py (+3, -2)

@@ -1406,8 +1406,9 @@ class Tokenizer:
                     return self._handle_argument_end()
                 self._emit_text("}")
             elif this == nxt == "[" and self._can_recurse():
-                if self._context & contexts.WIKILINK_TEXT:
-                    self._fail_route()
+                # TODO: Only do this if not in a file context:
+                # if self._context & contexts.WIKILINK_TEXT:
+                #     self._fail_route()
                 if not self._context & contexts.NO_WIKILINKS:
                     self._parse_wikilink()
                 else:
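
The check that remains after this change, `not self._context & contexts.NO_WIKILINKS`, relies on `&` binding more tightly than `not`, so it reads as "none of the no-wikilink bits are set". A minimal sketch of that bitmask test follows; the flag values and the membership of `NO_WIKILINKS` are hypothetical stand-ins, not the library's actual constants, and `may_open_wikilink` is an illustrative helper that does not exist in the tokenizer.

```python
# Hypothetical context flags, one bit each (values are illustrative only).
TEMPLATE_NAME = 1 << 0
WIKILINK_TITLE = 1 << 1
WIKILINK_TEXT = 1 << 2

# Hypothetical aggregate: contexts in which a new wikilink may not start.
NO_WIKILINKS = TEMPLATE_NAME | WIKILINK_TITLE


def may_open_wikilink(context):
    """Return True if no bit of NO_WIKILINKS is set in `context`."""
    # Parsed as `not (context & NO_WIKILINKS)`: `&` binds tighter than `not`.
    return not context & NO_WIKILINKS


print(may_open_wikilink(WIKILINK_TEXT))                  # link text only
print(may_open_wikilink(WIKILINK_TITLE))                 # inside a link title
print(may_open_wikilink(WIKILINK_TEXT | TEMPLATE_NAME))  # mixed context
```

With the `WIKILINK_TEXT` early exit commented out, a `[[` seen inside link text falls through to this test, which is why nested links in captions parse again.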


tests/tokenizer/wikilinks.mwtest (+14, -12)

@@ -54,6 +54,20 @@ output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="bar[b

 ---

+name: nested
+label: a wikilink nested within another
+input: "[[file:foo|[[bar]]]]"
+output: [WikilinkOpen(), Text(text="file:foo"), WikilinkSeparator(), WikilinkOpen(), Text(text="bar"), WikilinkClose(), WikilinkClose()]
+
+---
+
+name: nested_padding
+label: a wikilink nested within another, separated by other data
+input: "[[file:foo|a[[b]]c]]"
+output: [WikilinkOpen(), Text(text="file:foo"), WikilinkSeparator(), Text(text="a"), WikilinkOpen(), Text(text="b"), WikilinkClose(), Text(text="c"), WikilinkClose()]
+
+---
+
 name: invalid_newline
 label: invalid wikilink: newline as only content
 input: "[[\n]]"
@@ -89,20 +103,6 @@ output: [Text(text="[[foo[bar]]")]

 ---

-name: invalid_nested_text
-label: invalid wikilink: nested within the text of another
-input: "[[foo|[[bar]]]]"
-output: [Text(text="[[foo|"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), Text(text="]]")]
-
----
-
-name: invalid_nested_text_2
-label: invalid wikilink: a wikilink nested within the text of another, with additional content
-input: "[[foo|a[[b]]c]]"
-output: [Text(text="[[foo|a"), WikilinkOpen(), Text(text="b"), WikilinkClose(), Text(text="c]]")]
-
----
-
 name: invalid_nested_title
 label: invalid wikilink: nested within the title of another
 input: "[[foo[[bar]]]]"
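
To make the behavioral difference between the added and removed test cases concrete, here is a toy tokenizer, offered as a rough sketch rather than the library's implementation. It handles only `[[`, `|`, and `]]`, represents tokens as plain tuples, and uses a hypothetical `forbid_nested_in_text` flag to stand in for the `WIKILINK_TEXT` check that this commit comments out. `BadRoute` here is a local exception defined for the sketch.

```python
class BadRoute(Exception):
    """Signals that the current wikilink attempt must be abandoned."""


def tokenize(text, forbid_nested_in_text=False):
    """Toy wikilink tokenizer (illustration only, not the library's code)."""

    def parse_link(pos):
        # `pos` points just past an opening "[[".
        tokens = [("WikilinkOpen",)]
        buf = []
        in_text = False  # False while reading the title, True after "|"

        def flush():
            if buf:
                tokens.append(("Text", "".join(buf)))
                buf.clear()

        while pos < len(text):
            if text.startswith("]]", pos):
                flush()
                tokens.append(("WikilinkClose",))
                return tokens, pos + 2
            if text.startswith("[[", pos):
                # Nesting in the title always fails; nesting in the text
                # fails only when the reverted check is switched back on.
                if not in_text or forbid_nested_in_text:
                    raise BadRoute()
                flush()
                inner, pos = parse_link(pos + 2)
                tokens.extend(inner)
            elif text[pos] == "|" and not in_text:
                flush()
                tokens.append(("WikilinkSeparator",))
                in_text = True
                pos += 1
            else:
                buf.append(text[pos])
                pos += 1
        raise BadRoute()  # ran out of input before "]]"

    out, pending, pos = [], [], 0
    while pos < len(text):
        if text.startswith("[[", pos):
            try:
                inner, pos_after = parse_link(pos + 2)
            except BadRoute:
                pending.append("[[")  # treat the brackets as literal text
                pos += 2
                continue
            if pending:
                out.append(("Text", "".join(pending)))
                pending = []
            out.extend(inner)
            pos = pos_after
        else:
            pending.append(text[pos])
            pos += 1
    if pending:
        out.append(("Text", "".join(pending)))
    return out


# Behavior after this commit: the nested link in the caption is tokenized.
print(tokenize("[[file:foo|[[bar]]]]"))
# Behavior before this commit: the outer link fails and only the inner survives.
print(tokenize("[[foo|[[bar]]]]", forbid_nested_in_text=True))
```

With the flag off, the sketch reproduces the token sequence of the `nested` and `nested_padding` cases; with it on, it reproduces the removed `invalid_nested_text` case. Telling a file link from a normal link, which the TODO comments call for, would need namespace information this sketch does not model.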

