
Fix regression in parsing nested wikilinks in file captions

This regression seems more severe than the bug the commit was
attempting to fix (incorrect parsing of nested wikilinks in normal
links), so that bug is reintroduced until localization-aware parsing
that allows us to detect file links is added.

This commit partially reverts fac60dee48.
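
The behavioral difference can be illustrated with a toy tokenizer. This is only a sketch under stated assumptions, not mwparserfromhell's implementation: the names `tokenize`, `_parse`, `BadRoute`, and the `allow_nested` flag are hypothetical, tokens are modeled as plain tuples, and nesting inside a link title is not modeled. With `allow_nested=True` it mirrors the behavior restored by this commit; with `allow_nested=False` it mimics the check that is now commented out, where a nested `[[` in link text fails the outer link.

```python
class BadRoute(Exception):
    """Raised when the current wikilink attempt must be abandoned."""


def _parse(text, pos, depth, allow_nested):
    """Parse from pos; depth > 0 means we are inside a wikilink."""
    tokens = []
    seen_separator = False
    while pos < len(text):
        if text.startswith("[[", pos):
            # Hypothetical stand-in for the check this commit comments out:
            # a nested link inside link text fails the enclosing link.
            if depth and seen_separator and not allow_nested:
                raise BadRoute
            try:
                inner, pos = _parse(text, pos + 2, depth + 1, allow_nested)
            except BadRoute:
                # The link attempt failed, so treat "[[" as literal text.
                tokens.append(("Text", "[["))
                pos += 2
            else:
                tokens += [("WikilinkOpen",), *inner, ("WikilinkClose",)]
        elif depth and text.startswith("]]", pos):
            return tokens, pos + 2
        elif depth and text[pos] == "|" and not seen_separator:
            tokens.append(("WikilinkSeparator",))
            seen_separator = True
            pos += 1
        else:
            tokens.append(("Text", text[pos]))
            pos += 1
    if depth:
        raise BadRoute  # reached end of input without a closing "]]"
    return tokens, pos


def tokenize(text, allow_nested=True):
    """Tokenize text and merge adjacent Text tokens."""
    raw, _ = _parse(text, 0, 0, allow_nested)
    merged = []
    for token in raw:
        if token[0] == "Text" and merged and merged[-1][0] == "Text":
            merged[-1] = ("Text", merged[-1][1] + token[1])
        else:
            merged.append(token)
    return merged


if __name__ == "__main__":
    print(tokenize("[[file:foo|[[bar]]]]"))
    print(tokenize("[[foo|[[bar]]]]", allow_nested=False))
```

The two calls correspond to the `nested` test added in this commit and the `invalid_nested_text` test it removes.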
tags/v0.6.4
Ben Kurtovic 2 years ago
commit 2155638b91
5 changed files with 34 additions and 20 deletions
  1. CHANGELOG (+4, -1)
  2. docs/changelog.rst (+7, -1)
  3. src/mwparserfromhell/parser/ctokenizer/tok_parse.c (+6, -4)
  4. src/mwparserfromhell/parser/tokenizer.py (+3, -2)
  5. tests/tokenizer/wikilinks.mwtest (+14, -12)

CHANGELOG (+4, -1)

@@ -1,7 +1,10 @@
-v0.7 (unreleased):
+v0.6.4 (unreleased):

 - Dropped support for end-of-life Python 3.5.
 - Added support for Python 3.10. (#278)
+- Fixed a regression in v0.6.2 that broke parsing of nested wikilinks in file
+  captions. For now, the parser will interpret nested wikilinks in normal links
+  as well, even though this differs from MediaWiki. (#270)

 v0.6.3 (released September 2, 2021):



docs/changelog.rst (+7, -1)

@@ -1,14 +1,19 @@
 Changelog
 =========

-v0.7
+v0.6.4
 ------

 Unreleased
 (`changes <https://github.com/earwig/mwparserfromhell/compare/v0.6.3...develop>`__):

 - Dropped support for end-of-life Python 3.5.
 - Added support for Python 3.10.
   (`#278 <https://github.com/earwig/mwparserfromhell/issues/278>`_)
+- Fixed a regression in v0.6.2 that broke parsing of nested wikilinks in file
+  captions. For now, the parser will interpret nested wikilinks in normal links
+  as well, even though this differs from MediaWiki.
+  (`#270 <https://github.com/earwig/mwparserfromhell/issues/270>`_)

 v0.6.3
 ------


src/mwparserfromhell/parser/ctokenizer/tok_parse.c (+6, -4)

@@ -51,7 +51,8 @@ static int Tokenizer_parse_tag(Tokenizer *);
 /*
     Determine whether the given code point is a marker.
 */
-static int is_marker(Py_UCS4 this)
+static int
+is_marker(Py_UCS4 this)
 {
     int i;

@@ -2929,9 +2930,10 @@ Tokenizer_parse(Tokenizer *self, uint64_t context, int push)
                 return NULL;
             }
         } else if (this == next && next == '[' && Tokenizer_CAN_RECURSE(self)) {
-            if (this_context & LC_WIKILINK_TEXT) {
-                return Tokenizer_fail_route(self);
-            }
+            // TODO: Only do this if not in a file context:
+            // if (this_context & LC_WIKILINK_TEXT) {
+            //     return Tokenizer_fail_route(self);
+            // }
             if (!(this_context & AGG_NO_WIKILINKS)) {
                 if (Tokenizer_parse_wikilink(self)) {
                     return NULL;


src/mwparserfromhell/parser/tokenizer.py (+3, -2)

@@ -1406,8 +1406,9 @@ class Tokenizer:
                     return self._handle_argument_end()
                 self._emit_text("}")
             elif this == nxt == "[" and self._can_recurse():
-                if self._context & contexts.WIKILINK_TEXT:
-                    self._fail_route()
+                # TODO: Only do this if not in a file context:
+                # if self._context & contexts.WIKILINK_TEXT:
+                #     self._fail_route()
                 if not self._context & contexts.NO_WIKILINKS:
                     self._parse_wikilink()
                 else:


tests/tokenizer/wikilinks.mwtest (+14, -12)

@@ -54,6 +54,20 @@ output: [WikilinkOpen(), Text(text="foo"), WikilinkSeparator(), Text(text="bar[b

 ---

+name: nested
+label: a wikilink nested within another
+input: "[[file:foo|[[bar]]]]"
+output: [WikilinkOpen(), Text(text="file:foo"), WikilinkSeparator(), WikilinkOpen(), Text(text="bar"), WikilinkClose(), WikilinkClose()]
+
+---
+
+name: nested_padding
+label: a wikilink nested within another, separated by other data
+input: "[[file:foo|a[[b]]c]]"
+output: [WikilinkOpen(), Text(text="file:foo"), WikilinkSeparator(), Text(text="a"), WikilinkOpen(), Text(text="b"), WikilinkClose(), Text(text="c"), WikilinkClose()]
+
+---
+
 name: invalid_newline
 label: invalid wikilink: newline as only content
 input: "[[\n]]"
@@ -89,20 +103,6 @@ output: [Text(text="[[foo[bar]]")]

 ---

-name: invalid_nested_text
-label: invalid wikilink: nested within the text of another
-input: "[[foo|[[bar]]]]"
-output: [Text(text="[[foo|"), WikilinkOpen(), Text(text="bar"), WikilinkClose(), Text(text="]]")]
-
-----
-
-name: invalid_nested_text_2
-label: invalid wikilink: a wikilink nested within the text of another, with additional content
-input: "[[foo|a[[b]]c]]"
-output: [Text(text="[[foo|a"), WikilinkOpen(), Text(text="b"), WikilinkClose(), Text(text="c]]")]
-
-----
-
 name: invalid_nested_title
 label: invalid wikilink: nested within the title of another
 input: "[[foo[[bar]]]]"

