
Add support for detailed text comparison in API (T132949)

copyvios-ng
Ben Kurtovic, 8 years ago
parent
commit
2a81217de8
2 changed files with 24 additions and 8 deletions
  1. copyvios/api.py  +11 -2
  2. templates/api.mako  +13 -6

copyvios/api.py  +11 -2

@@ -2,6 +2,7 @@

 from collections import OrderedDict

+from .highlighter import highlight_delta
 from .checker import do_check, T_POSSIBLE, T_SUSPECT
 from .misc import Query, cache
 from .sites import update_sites
@@ -40,6 +41,11 @@ def _serialize_source(source, show_skip=True):
         data["excluded"] = source.excluded
     return data

+def _serialize_detail(result):
+    article = highlight_delta(result.article_chain, result.best.chains[1])
+    source = highlight_delta(result.best.chains[0], result.best.chains[1])
+    return OrderedDict((("article", article), ("source", source)))
+
 def format_api_error(code, info):
     if isinstance(info, BaseException):
         info = type(info).__name__ + ": " + str(info)
@@ -90,12 +96,15 @@ def _hook_check(query):
         data["original_page"] = _serialize_page(query.redirected_from)
     data["best"] = _serialize_source(result.best, show_skip=False)
     data["sources"] = [_serialize_source(source) for source in result.sources]
+    if query.detail in ("1", "true"):
+        data["detail"] = _serialize_detail(result)
     return data

 def _hook_sites(query):
     update_sites()
-    return OrderedDict((("status", "ok"),
-        ("langs", cache.langs), ("projects", cache.projects)))
+    return OrderedDict((
+        ("status", "ok"), ("langs", cache.langs), ("projects", cache.projects)
+    ))

 _HOOKS = {
     "compare": _hook_check,


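The new check in `_hook_check` compares `query.detail` against the literal strings `"1"` and `"true"`. Although the template documents `detail` as a boolean, values such as `True`, `yes`, or a missing parameter therefore leave the detail block out. The following is a minimal sketch of that logic outside the tool. `wants_detail` and `attach_detail` are hypothetical helpers, and the two HTML strings stand in for `highlight_delta()` output, whose implementation is not part of this diff.

```python
from collections import OrderedDict


def wants_detail(raw):
    # Same test as in _hook_check: only the exact query strings "1" and
    # "true" enable the detail block. "True", "yes", or None do not.
    return raw in ("1", "true")


def attach_detail(data, article_html, source_html, raw_detail):
    # Hypothetical helper. article_html and source_html stand in for the
    # highlight_delta() results that _serialize_detail computes.
    if wants_detail(raw_detail):
        data["detail"] = OrderedDict((("article", article_html),
                                      ("source", source_html)))
    return data
```

Like `_serialize_detail`, this builds the pair with `OrderedDict`. That keeps `article` ahead of `source` in the serialized JSON, even on Python versions where plain dicts are unordered.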
templates/api.mako  +13 -6

@@ -112,6 +112,12 @@
 <td>Yes</td>
 <td>The URL of the suspected violation source that will be compared to the page.</td>
 </tr>
+<tr>
+<td>detail</td>
+<td>boolean</td>
+<td>No (default: <span class="code">false</span>)</td>
+<td>Whether to include the detailed HTML text comparison available in the regular interface. If not, only the confidence percentage is available.</td>
+</tr>
 </table>
 <table class="parameters">
 <tr>
@@ -219,7 +225,11 @@
 "excluded": <span class="resp-dtype">boolean</span> <span class="resp-desc">whether the source was skipped for being in the excluded URL list</span>
 },
 ...
-]
+],
+<span class="resp-cond">only if action=compare and detail=true</span> "detail": {
+"article": <span class="resp-dtype">string</span> <span class="resp-desc">article text, with shared passages marked with HTML</span>,
+"source": <span class="resp-dtype">string</span> <span class="resp-desc">best source text, with shared passages marked with HTML</span>
+}
 }</pre>
 <p>In the case of <span class="code">action=search</span>, <span class="code">sources</span> will contain one entry for each source checked (or skipped if the check ends early), sorted in order of confidence, with skipped and excluded sources at the bottom.</p>
 <p>In the case of <span class="code">action=compare</span>, <span class="code">best</span> will always contain information about the URL that was given, so <span class="code">response["best"]["url"]</span> will never be <span class="code">null</span>. Also, <span class="code">sources</span> will always contain one entry, with the same data as <span class="code">best</span>, since only one source is checked in comparison mode.</p>
@@ -241,11 +251,8 @@
 ...
 ]
 }</pre>
-<h2>Caveats</h2>
-<ul>
-<li>There is currently no way to get the contents of the article or suspected source, nor can you get the data behind the visual comparison available from the main tool. This may be changed in a future version if there is sufficient demand for it.</li>
-<li>Requests are typically not rate-limited, but the tool uses the same workers to handle all requests, so making simultaneous API calls is only going to slow you down. In general, you are fine making an unlimited number of requests, as long as they are not concurrent and you wait a few seconds between them.</li>
-</ul>
+<h2>Etiquette</h2>
+The tool uses the same workers to handle all requests, so making concurrent API calls is only going to slow you down. Most operations are not rate-limited, but full searches with <span class="code">use_engine=True</span> are globally limited to a few thousand per day. Be respectful!
 <h2>Example</h2>
 <p><a class="no-color" href="https://tools.wmflabs.org/copyvios/api.json?version=1&amp;action=search&amp;project=wikipedia&amp;lang=en&amp;title=User:EarwigBot/Copyvios/Tests/2"><span class="code">https://tools.wmflabs.org/copyvios/api.json?<span class="param-key">version</span>=<span class="param-val">1</span>&amp;<span class="param-key">action</span>=<span class="param-val">search</span>&amp;<span class="param-key">project</span>=<span class="param-val">wikipedia</span>&amp;<span class="param-key">lang</span>=<span class="param-val">en</span>&amp;<span class="param-key">title</span>=<span class="param-val">User:EarwigBot/Copyvios/Tests/2</span></span></a></p>
 <pre>{

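The following client-side sketch requests and reads the new `detail` block, assuming Python 3's standard library. The base URL and the `version`, `action`, `project`, `lang`, and `title` parameters come from the example URL in the template. The name `url` for the source parameter of `action=compare` is an assumption, because that parameter's row is only partly visible in this diff. `compare_url` and `read_detail` are hypothetical helpers.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://tools.wmflabs.org/copyvios/api.json"


def compare_url(lang, title, source_url, project="wikipedia", detail=True):
    """Build an action=compare request URL (hypothetical helper)."""
    params = [
        ("version", "1"),
        ("action", "compare"),
        ("project", project),
        ("lang", lang),
        ("title", title),
        ("url", source_url),  # assumed parameter name for the source URL
    ]
    if detail:
        # The server only honours the exact strings "1" or "true".
        params.append(("detail", "true"))
    return API_BASE + "?" + urlencode(params)


def read_detail(body):
    """Return (article_html, source_html), or None if detail is absent."""
    data = json.loads(body)
    detail = data.get("detail")
    if detail is None:
        return None
    return detail["article"], detail["source"]
```

Both returned strings contain HTML markup around the shared passages, as the response format describes. Render or strip them accordingly.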

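The Etiquette paragraph asks clients not to make concurrent calls, and the removed Caveats text suggested waiting a few seconds between requests. One way to honour both is to issue requests strictly one after another, with a pause in between. In this sketch, `fetch_sequentially` is a hypothetical helper and the HTTP call is injected as `fetch`. The 3-second default is an assumption drawn from "a few seconds", not a documented limit.

```python
import time
from typing import Callable, Iterable, List


def fetch_sequentially(urls: Iterable[str],
                       fetch: Callable[[str], str],
                       pause: float = 3.0,
                       sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch each URL in turn, never running two requests at once.

    The default pause is an assumed value, not a limit set by the tool.
    """
    results = []
    for i, url in enumerate(urls):
        if i:
            # Space out consecutive requests; the loop itself never overlaps them.
            sleep(pause)
        results.append(fetch(url))
    return results
```

For real calls, `fetch` could be `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`. Injecting `fetch` and `sleep` keeps the pacing logic testable without network access.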