This is the first version of the API for Earwig's Copyvio Detector. It works, but some bugs might still need to be ironed out, so please report any that you find.
The API responds to GET requests made to https://tools.wmflabs.org/copyvios/api.json. Parameters are described in the tables below:

Always

Parameter | Values | Required? | Description |
---|---|---|---|
action | compare, search, sites | Yes | The API will do URL comparisons in compare mode, run full copyvio checks in search mode, and list all known site languages and projects in sites mode. |
format | json, jsonfm | No (default: json) | The default output format is JSON. jsonfm mode produces the same output, but renders it as a formatted HTML document for debugging. |
version | integer | No (default: 1) | Currently, the API only has one version. You can skip this parameter, but it is recommended to include it for forward compatibility. |

compare Mode

Parameter | Values | Required? | Description |
---|---|---|---|
project | string | Yes | The project code of the site the page lives on. Examples are wikipedia and wiktionary. A list of acceptable values can be retrieved using action=sites. |
lang | string | Yes | The language code of the site the page lives on. Examples are en and de. A list of acceptable values can be retrieved using action=sites. |
title | string | Yes (either title or oldid) | The title of the page or article to make a comparison against. The namespace must be included if the page isn't in the mainspace. |
oldid | integer | Yes (either title or oldid) | The revision ID (also called oldid) of the page revision to make a comparison against. If both a title and oldid are given, the oldid will be used. |
url | string | Yes | The URL of the suspected violation source that will be compared to the page. |
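
As a rough illustration of the compare-mode parameters above, the following sketch assembles a request URL with the standard library. The function name `build_compare_url` and the constant `API_URL` are hypothetical names chosen for this example, not part of the API; the sketch only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

# Endpoint as given in this document.
API_URL = "https://tools.wmflabs.org/copyvios/api.json"


def build_compare_url(project, lang, url, title=None, oldid=None, version=1):
    """Build a GET URL for action=compare (hypothetical helper)."""
    # Either title or oldid is required.
    if title is None and oldid is None:
        raise ValueError("either title or oldid must be given")

    params = {
        "action": "compare",
        "version": version,  # optional, but recommended for forward compatibility
        "project": project,
        "lang": lang,
    }
    # The API uses oldid when both are given, so this sketch sends only one.
    if oldid is not None:
        params["oldid"] = oldid
    else:
        params["title"] = title
    params["url"] = url

    # urlencode percent-encodes the values, including the source URL.
    return API_URL + "?" + urlencode(params)
```

Calling `build_compare_url("wikipedia", "en", "http://example.com/", title="Example")` yields a URL whose query string carries `action=compare` together with the encoded title and source URL.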

search Mode

Parameter | Values | Required? | Description |
---|---|---|---|
project | string | Yes | The project code of the site the page lives on. Examples are wikipedia and wiktionary. A list of acceptable values can be retrieved using action=sites. |
lang | string | Yes | The language code of the site the page lives on. Examples are en and de. A list of acceptable values can be retrieved using action=sites. |
title | string | Yes (either title or oldid) | The title of the page or article to make a check against. The namespace must be included if the page isn't in the mainspace. |
oldid | integer | Yes (either title or oldid) | The revision ID (also called oldid) of the page revision to make a check against. If both a title and oldid are given, the oldid will be used. |
use_engine | boolean | No (default: true) | Whether to use a search engine (Yahoo! BOSS) as a source of URLs to compare against the page. |
use_links | boolean | No (default: true) | Whether to compare the page against external links found in its wikitext. |
nocache | boolean | No (default: false) | Whether to bypass search results cached from previous checks. It is recommended that you don't pass this option unless a user specifically asks for it. |
noredirect | boolean | No (default: false) | Whether to avoid following redirects if the given page is a redirect. |
noskip | boolean | No (default: false) | If a source is found during a check to have a sufficiently high confidence value, the check ends early and any pending URLs are skipped. Passing this option prevents this behavior, resulting in complete (but more time-consuming) checks. |
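
The search-mode options above can be combined in a similar way. The sketch below is a minimal example, assuming that boolean options are sent as the lowercase strings `true` and `false` (the table gives the defaults in that form, but the exact encoding the server accepts is an assumption here). The helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as given in this document.
API_URL = "https://tools.wmflabs.org/copyvios/api.json"

# Boolean options listed in the search-mode table.
SEARCH_FLAGS = {"use_engine", "use_links", "nocache", "noredirect", "noskip"}


def build_search_url(project, lang, title=None, oldid=None, **flags):
    """Build a GET URL for action=search (hypothetical helper)."""
    unknown = set(flags) - SEARCH_FLAGS
    if unknown:
        raise ValueError("unknown options: " + ", ".join(sorted(unknown)))
    if title is None and oldid is None:
        raise ValueError("either title or oldid must be given")

    params = {"action": "search", "version": 1, "project": project, "lang": lang}
    if oldid is not None:
        params["oldid"] = oldid
    else:
        params["title"] = title

    # Only options the caller passed are sent; the rest keep their defaults.
    # Assumption: booleans are encoded as "true" / "false".
    for name in sorted(flags):
        params[name] = "true" if flags[name] else "false"

    return API_URL + "?" + urlencode(params)
```

Leaving `nocache` out unless a user explicitly asks for it follows the recommendation in the table.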
The JSON response object always contains a status key, whose value is either ok or error. If an error has occurred, the response will look like this:

    {
        "status": "error",
        "error": {
            "code": string error code,
            "info": string human-readable description of error
        }
    }

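
One way a client might act on the status key is to turn error responses into exceptions. The sketch below assumes the response body has already been read as text; `CopyvioAPIError` and `parse_response` are hypothetical names introduced for this example.

```python
import json


class CopyvioAPIError(Exception):
    """Raised when the API reports status == "error" (hypothetical class)."""

    def __init__(self, code, info):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def parse_response(text):
    """Decode a JSON response body and raise on an error status."""
    data = json.loads(text)
    if data.get("status") == "error":
        error = data.get("error", {})
        raise CopyvioAPIError(error.get("code"), error.get("info"))
    return data
```

Callers can then catch `CopyvioAPIError` and inspect its `code` and `info` attributes instead of checking the status key at every call site.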
Valid responses for action=compare and action=search are formatted like this:

    {
        "status": "ok",
        "meta": {
            "time": float time to generate results, in seconds,
            "queries": int number of search engine queries made,
            "cached": boolean whether or not these results are cached from an earlier search (always false in the case of action=compare),
            (only if cached=true)
            "cache_time": string human-readable time of the original search that the results are cached from,
            "redirected": boolean whether or not a redirect was followed
        },
        "page": {
            "title": string the normalized title of the page checked,
            "url": string the full URL of the page checked
        },
        (only if redirected=true)
        "original_page": {
            "title": string the normalized title of the original page whose redirect was followed,
            "url": string the full URL of the original page whose redirect was followed
        },
        "best": {
            "url": string the URL of the best match found, or null if no matches were found,
            "confidence": float the confidence of a violation in the best match, or 0.0 if no matches were found,
            "violation": string one of "suspected", "possible", or "none"
        },
        "sources": [
            {
                "url": string the URL of the source,
                "confidence": float the confidence of a violation in the source,
                "violation": string one of "suspected", "possible", or "none",
                "skipped": boolean whether or not the source was skipped due to the check finishing early (see note about noskip above)
            },
            ...
        ]
    }

In the case of action=search, sources will contain one entry for each source checked (or skipped if the check ends early), sorted by confidence in descending order, with skipped sources at the bottom.
In the case of action=compare, best will always contain information about the URL that was given, so response["best"]["url"] will never be null. Also, sources will always contain one entry, with the same data as best, since only one source is checked in comparison mode.
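
To show how the best and sources keys might be consumed, here is a small sketch that summarizes a decoded response. It assumes the response has already been parsed into a Python dict with status ok; the function name `summarize` and the shape of its return value are choices made for this example only.

```python
def summarize(response):
    """Summarize a decoded action=search or action=compare response."""
    sources = response["sources"]

    # Skipped sources were never compared, so they are counted separately.
    checked = [s for s in sources if not s["skipped"]]
    skipped = [s for s in sources if s["skipped"]]

    # "violation" is one of "suspected", "possible", or "none".
    flagged = [s["url"] for s in checked if s["violation"] != "none"]

    return {
        "best_url": response["best"]["url"],  # may be None in search mode
        "best_confidence": response["best"]["confidence"],
        "checked": len(checked),
        "skipped": len(skipped),
        "flagged": flagged,
    }
```

For a compare-mode response, `checked` is 1 and `skipped` is 0, since only the given URL is checked.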
Valid responses for action=sites are formatted like this:

    {
        "status": "ok",
        "langs": [
            [
                string language code,
                string human-readable language name
            ],
            ...
        ],
        "projects": [
            [
                string project code,
                string human-readable project name
            ],
            ...
        ]
    }

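
Because langs and projects are lists of code and name pairs, a client may find it convenient to convert them into lookup tables before validating the project and lang parameters. The following is a minimal sketch, assuming a decoded action=sites response; `site_tables` and `is_valid_site` are hypothetical helper names.

```python
def site_tables(response):
    """Turn the pair lists of an action=sites response into dicts."""
    langs = {code: name for code, name in response["langs"]}
    projects = {code: name for code, name in response["projects"]}
    return langs, projects


def is_valid_site(response, project, lang):
    """Check that both codes appear in the action=sites response."""
    langs, projects = site_tables(response)
    return lang in langs and project in projects
```

This only confirms that each code is known; it does not check that a given project exists in a given language.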
An example response for action=search:

    {
        "status": "ok",
        "meta": {
            "time": 2.2474379539489746,
            "queries": 1,
            "cached": false,
            "redirected": false
        },
        "page": {
            "title": "User:The Earwig/Sandbox/CopyvioExample",
            "url": "https://en.wikipedia.org/wiki/User:The_Earwig/Sandbox/CopyvioExample"
        },
        "best": {
            "url": "http://www.whitehouse.gov/administration/president-obama/",
            "confidence": 0.9886608511242603,
            "violation": "suspected"
        },
        "sources": [
            {
                "url": "http://www.whitehouse.gov/administration/president-obama/",
                "confidence": 0.9886608511242603,
                "violation": "suspected",
                "skipped": false
            },
            {
                "url": "http://maige2009.blogspot.com/2013/07/barack-h-obama-is-44th-president-of.html",
                "confidence": 0.9864798816568047,
                "violation": "suspected",
                "skipped": false
            },
            {
                "url": "http://jeuxdemonstre-apkdownload.rhcloud.com/luo-people-of-kenya-and-tanzania---wikipedia--the-free",
                "confidence": 0.0,
                "violation": "none",
                "skipped": false
            },
            {
                "url": "http://www.whitehouse.gov/about/presidents/barackobama",
                "confidence": 0.0,
                "violation": "none",
                "skipped": true
            },
            {
                "url": "http://jeuxdemonstre-apkdownload.rhcloud.com/president-barack-obama---the-white-house",
                "confidence": 0.0,
                "violation": "none",
                "skipped": true
            }
        ]
    }