This is the first version of the API for Earwig's Copyvio Detector. Please report any issues you encounter.
The API responds to GET requests made to https://copyvios.toolforge.org/api.json. Parameters are described in the tables below:
Always

Parameter | Values | Required? | Description |
---|---|---|---|
action | compare, search, sites | Yes | The API will do URL comparisons in compare mode, run full copyvio checks in search mode, and list all known site languages and projects in sites mode. |
format | json, jsonfm | No (default: json) | The default output format is JSON. jsonfm mode produces the same output, but renders it as a formatted HTML document for debugging. |
version | integer | No (default: 1) | Currently, the API only has one version. You can skip this parameter, but it is recommended to include it for forward compatibility. |
compare Mode

Parameter | Values | Required? | Description |
---|---|---|---|
project | string | Yes | The project code of the site the page lives on. Examples are wikipedia and wiktionary. A list of acceptable values can be retrieved using action=sites. |
lang | string | Yes | The language code of the site the page lives on. Examples are en and de. A list of acceptable values can be retrieved using action=sites. |
title | string | Yes (either title or oldid) | The title of the page or article to make a comparison against. Namespace must be included if the page isn't in the mainspace. |
oldid | integer | Yes (either title or oldid) | The revision ID (also called oldid) of the page revision to make a comparison against. If both a title and oldid are given, the oldid will be used. |
url | string | Yes | The URL of the suspected violation source that will be compared to the page. |
detail | boolean | No (default: false) | Whether to include the detailed HTML text comparison available in the regular interface. If not, only the similarity percentage is available. |
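
As a rough illustration of the compare parameters above, the following sketch assembles a GET URL with the standard library. The helper name `build_compare_url` is hypothetical, and sending boolean flags as the string `true` is an assumption; the table only states that the value is a boolean.

```python
from urllib.parse import urlencode

API_URL = "https://copyvios.toolforge.org/api.json"


def build_compare_url(project, lang, url, title=None, oldid=None, detail=False):
    """Return a GET URL for action=compare (hypothetical helper)."""
    if title is None and oldid is None:
        raise ValueError("either title or oldid is required")
    params = {
        "action": "compare",
        "version": 1,  # optional, but recommended for forward compatibility
        "project": project,
        "lang": lang,
        "url": url,
    }
    # The API prefers oldid when both are given, so send only one of them.
    if oldid is not None:
        params["oldid"] = int(oldid)
    else:
        params["title"] = title
    if detail:
        # Assumption: booleans are accepted as the string "true".
        params["detail"] = "true"
    return API_URL + "?" + urlencode(params)


print(build_compare_url("wikipedia", "en", "http://example.com/a b", title="Foo Bar"))
```

Passing the parameters through `urlencode` keeps titles with spaces or namespace colons safely escaped.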
search Mode

Parameter | Values | Required? | Description |
---|---|---|---|
project | string | Yes | The project code of the site the page lives on. Examples are wikipedia and wiktionary. A list of acceptable values can be retrieved using action=sites. |
lang | string | Yes | The language code of the site the page lives on. Examples are en and de. A list of acceptable values can be retrieved using action=sites. |
title | string | Yes (either title or oldid) | The title of the page or article to make a check against. Namespace must be included if the page isn't in the mainspace. |
oldid | integer | Yes (either title or oldid) | The revision ID (also called oldid) of the page revision to make a check against. If both a title and oldid are given, the oldid will be used. |
use_engine | boolean | No (default: true) | Whether to use a search engine (Google) as a source of URLs to compare against the page. |
use_links | boolean | No (default: true) | Whether to compare the page against external links found in its wikitext. |
nocache | boolean | No (default: false) | Whether to bypass search results cached from previous checks. It is recommended that you don't pass this option unless a user specifically asks for it. |
noredirect | boolean | No (default: false) | Whether to avoid following redirects if the given page is a redirect. |
noskip | boolean | No (default: false) | If a suspected source is found during a check to have a sufficiently high similarity value, the check will end prematurely, and other pending URLs will be skipped. Passing this option will prevent this behavior, resulting in complete (but more time-consuming) checks. |
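
The search parameters can be assembled in the same way. This is a minimal sketch, assuming booleans may be sent as `true` or `false`; the helper name `build_search_url` is hypothetical. Flags that are not passed are left out, so the defaults in the table apply.

```python
from urllib.parse import urlencode

API_URL = "https://copyvios.toolforge.org/api.json"

# Optional boolean parameters listed in the search table.
SEARCH_FLAGS = {"use_engine", "use_links", "nocache", "noredirect", "noskip"}


def build_search_url(project, lang, title=None, oldid=None, **flags):
    """Return a GET URL for action=search (hypothetical helper)."""
    if title is None and oldid is None:
        raise ValueError("either title or oldid is required")
    unknown = set(flags) - SEARCH_FLAGS
    if unknown:
        raise ValueError("unknown flags: " + ", ".join(sorted(unknown)))
    params = {"action": "search", "version": 1, "project": project, "lang": lang}
    if oldid is not None:
        params["oldid"] = int(oldid)
    else:
        params["title"] = title
    for name in sorted(flags):
        # Assumption: booleans are accepted as "true" / "false".
        params[name] = "true" if flags[name] else "false"
    return API_URL + "?" + urlencode(params)


print(build_search_url("wikipedia", "en",
                       title="User:EarwigBot/Copyvios/Tests/2",
                       use_engine=False, noskip=True))
```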
The JSON response object always contains a status key, whose value is either ok or error. If an error has occurred, the response will look like this:
    {
        "status": "error",
        "error": {
            "code": string error code,
            "info": string human-readable description of error
        }
    }
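
A client will usually want to check `status` before reading anything else. The sketch below turns an error response into an exception; the `CopyvioAPIError` class and `parse_response` function are hypothetical names, and the error code in the sample string is a placeholder rather than a real API code.

```python
import json


class CopyvioAPIError(Exception):
    """Raised when the API reports status=error (hypothetical class)."""

    def __init__(self, code, info):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def parse_response(text):
    """Decode a JSON response body and raise if it describes an error."""
    data = json.loads(text)
    if data.get("status") == "error":
        error = data.get("error", {})
        raise CopyvioAPIError(error.get("code"), error.get("info"))
    return data


# Placeholder error body, for illustration only.
sample_error = '{"status": "error", "error": {"code": "example_code", "info": "example description"}}'
try:
    parse_response(sample_error)
except CopyvioAPIError as exc:
    print("API error:", exc.code, "-", exc.info)
```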
Valid responses for action=compare and action=search are formatted like this:
    {
        "status": "ok",
        "meta": {
            "time": float time to generate results, in seconds,
            "queries": int number of search engine queries made,
            "cached": boolean whether these results are cached from an earlier search (always false in the case of action=compare),
            "redirected": boolean whether a redirect was followed,
            only if cached=true
            "cache_time": string human-readable time of the original search that the results are cached from
        },
        "page": {
            "title": string the normalized title of the page checked,
            "url": string the full URL of the page checked
        },
        only if redirected=true
        "original_page": {
            "title": string the normalized title of the original page whose redirect was followed,
            "url": string the full URL of the original page whose redirect was followed
        },
        "best": {
            "url": string the URL of the best match found, or null if no matches were found,
            "confidence": float the similarity of a violation in the best match, or 0.0 if no matches were found,
            "violation": string one of "suspected", "possible", or "none"
        },
        "sources": [
            {
                "url": string the URL of the source,
                "confidence": float the similarity of the source to the page checked as a ratio between 0.0 and 1.0,
                "violation": string one of "suspected", "possible", or "none",
                "skipped": boolean whether the source was skipped due to the check finishing early (see note about noskip above) or an exclusion,
                "excluded": boolean whether the source was skipped for being in the excluded URL list
            },
            ...
        ],
        only if action=compare and detail=true
        "detail": {
            "article": string article text, with shared passages marked with HTML,
            "source": string source text, with shared passages marked with HTML
        }
    }
In the case of action=search, sources will contain one entry for each source checked (or skipped if the check ends early), sorted by similarity, with skipped and excluded sources at the bottom.
In the case of action=compare, best will always contain information about the URL that was given, so response["best"]["url"] will never be null. Also, sources will always contain one entry, with the same data as best, since only one source is checked in comparison mode.
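
To show how the `best` and `sources` fields might be consumed, here is a small sketch that condenses a decoded response into a summary. The `summarize` helper and its output keys are hypothetical; the sample data is a trimmed copy of values from this page's example response.

```python
def summarize(response):
    """Condense a compare/search response into a few fields (hypothetical helper)."""
    best = response["best"]
    # Skipped sources were never actually compared, so count them separately.
    checked = [s for s in response["sources"] if not s["skipped"]]
    flagged = [s["url"] for s in checked if s["violation"] != "none"]
    return {
        "best_url": best["url"],  # may be None for action=search with no matches
        "best_confidence": best["confidence"],
        "violation": best["violation"],
        "checked": len(checked),
        "skipped": len(response["sources"]) - len(checked),
        "flagged_urls": flagged,
    }


sample = {
    "status": "ok",
    "best": {
        "url": "http://www.whitehouse.gov/administration/president-obama/",
        "confidence": 0.9886608511242603,
        "violation": "suspected",
    },
    "sources": [
        {"url": "http://www.whitehouse.gov/administration/president-obama/",
         "confidence": 0.9886608511242603, "violation": "suspected",
         "skipped": False, "excluded": False},
        {"url": "http://www.whitehouse.gov/about/presidents/barackobama",
         "confidence": 0.0, "violation": "none",
         "skipped": True, "excluded": False},
    ],
}
print(summarize(sample))
```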
Valid responses for action=sites are formatted like this:
    {
        "status": "ok",
        "langs": [
            [
                string language code,
                string human-readable language name
            ],
            ...
        ],
        "projects": [
            [
                string project code,
                string human-readable project name
            ],
            ...
        ]
    }
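
Since `langs` and `projects` are lists of two-element lists, it can be convenient to convert them to dictionaries before validating user input. The sketch below assumes that shape; the helper names are hypothetical, and the sample names are illustrative rather than actual API output.

```python
def sites_to_maps(response):
    """Turn the [code, name] pairs of a sites response into two dicts."""
    langs = {code: name for code, name in response["langs"]}
    projects = {code: name for code, name in response["projects"]}
    return langs, projects


def is_known_site(response, project, lang):
    """Check a project/lang pair against the lists returned by action=sites."""
    langs, projects = sites_to_maps(response)
    return project in projects and lang in langs


# Illustrative sample only; real output lists every known language and project.
sample_sites = {
    "status": "ok",
    "langs": [["en", "English"], ["de", "Deutsch"]],
    "projects": [["wikipedia", "Wikipedia"], ["wiktionary", "Wiktionary"]],
}
print(is_known_site(sample_sites, "wikipedia", "en"))
```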
The tool uses the same workers to handle all requests, so making concurrent API calls is only going to slow you down. Most operations are not rate-limited, but full searches with use_engine=true are globally limited to around a thousand per day. Be respectful!
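
Because concurrent calls share the same workers, a client might simply issue its requests one at a time. The sketch below does that with a hypothetical `run_sequentially` helper. The `fetch` callable is supplied by the caller, and the one-second pause is an arbitrary assumption, not a documented requirement.

```python
import time


def run_sequentially(urls, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL in order, pausing between calls.

    fetch is any callable that performs the request; delay is an assumed
    courtesy pause in seconds, not a value specified by the API.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # wait before every request except the first
        results.append(fetch(url))
    return results


# Demonstration with a stand-in fetch function that makes no network calls.
print(run_sequentially(["a", "b"], fetch=str.upper, delay=0))
```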
Aside from testing, you must set a reasonable user agent that identifies your bot and gives some way to contact you. You may be blocked if you use an improper user agent (for example, the default user agent set by your HTTP library), or if your bot makes requests too frequently.
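
One way to set a custom user agent with Python's standard library is sketched below. The `USER_AGENT` string, the helper names, and the 60-second timeout are placeholders chosen for illustration; substitute your own bot name and contact details.

```python
import json
from urllib.request import Request, urlopen

# Placeholder value: replace with your bot's name and a way to reach you.
USER_AGENT = "ExampleBot/0.1 (https://example.org/bot; operator@example.org)"


def make_request(url):
    """Build a request object that carries an identifying User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_json(url, timeout=60):
    """Perform the GET request and decode the JSON body (timeout is an assumption)."""
    with urlopen(make_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = make_request("https://copyvios.toolforge.org/api.json?action=sites")
print(request.get_header("User-agent"))
```

Nothing is sent over the network until `fetch_json` is called.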
An example response for action=search:

    {
        "status": "ok",
        "meta": {
            "time": 2.2474379539489746,
            "queries": 1,
            "cached": false,
            "redirected": false
        },
        "page": {
            "title": "User:EarwigBot/Copyvios/Tests/2",
            "url": "https://en.wikipedia.org/wiki/User:EarwigBot/Copyvios/Tests/2"
        },
        "best": {
            "url": "http://www.whitehouse.gov/administration/president-obama/",
            "confidence": 0.9886608511242603,
            "violation": "suspected"
        },
        "sources": [
            {
                "url": "http://www.whitehouse.gov/administration/president-obama/",
                "confidence": 0.9886608511242603,
                "violation": "suspected",
                "skipped": false,
                "excluded": false
            },
            {
                "url": "http://maige2009.blogspot.com/2013/07/barack-h-obama-is-44th-president-of.html",
                "confidence": 0.9864798816568047,
                "violation": "suspected",
                "skipped": false,
                "excluded": false
            },
            {
                "url": "http://jeuxdemonstre-apkdownload.rhcloud.com/luo-people-of-kenya-and-tanzania---wikipedia--the-free",
                "confidence": 0.0,
                "violation": "none",
                "skipped": false,
                "excluded": false
            },
            {
                "url": "http://www.whitehouse.gov/about/presidents/barackobama",
                "confidence": 0.0,
                "violation": "none",
                "skipped": true,
                "excluded": false
            },
            {
                "url": "http://jeuxdemonstre-apkdownload.rhcloud.com/president-barack-obama---the-white-house",
                "confidence": 0.0,
                "violation": "none",
                "skipped": true,
                "excluded": false
            }
        ]
    }