A copyright violation detector running on Wikimedia Cloud Services https://tools.wmflabs.org/copyvios/

<%def name="walk_json(obj)">
    <!-- TODO -->
    ${obj | h}
</%def>
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>API - Earwig's Copyvio Detector</title>
    <link rel="stylesheet" href="${request.script_root}/static/api.min.css" type="text/css" />
</head>
<body>
% if help:
    <div id="help">
        <h1>Copyvio Detector API</h1>
        <p>This is the first version of the <a href="//en.wikipedia.org/wiki/Application_programming_interface">API</a> for <a href="${request.script_root}">Earwig's Copyvio Detector</a>. It works, but some bugs might still need to be ironed out, so please <a href="https://github.com/earwig/copyvios/issues">report any</a> if you see them.</p>
        <h2>Requests</h2>
        <p>The API responds to GET requests made to <span class="code">https://tools.wmflabs.org/copyvios/api.json</span>. Parameters are described in the tables below:</p>
        <table class="parameters">
            <tr>
                <th colspan="4">Always</th>
            </tr>
            <tr>
                <th>Parameter</th>
                <th>Values</th>
                <th>Required?</th>
                <th>Description</th>
            </tr>
            <tr>
                <td>action</td>
                <td><span class="code">compare</span>, <span class="code">search</span>, <span class="code">sites</span></td>
                <td>Yes</td>
                <td>The API will do URL comparisons in <span class="code">compare</span> mode, run full copyvio checks in <span class="code">search</span> mode, and list all known site languages and projects in <span class="code">sites</span> mode.</td>
            </tr>
            <tr>
                <td>format</td>
                <td><span class="code">json</span>, <span class="code">jsonfm</span></td>
                <td>No&nbsp;(default:&nbsp;<span class="code">json</span>)</td>
                <td>The default output format is <a href="http://json.org/">JSON</a>. <span class="code">jsonfm</span> mode produces the same output, but renders it as a formatted HTML document for debugging.</td>
            </tr>
            <tr>
                <td>version</td>
                <td>integer</td>
                <td>No (default: <span class="code">1</span>)</td>
                <td>Currently, the API only has one version. You can skip this parameter, but it is recommended to include it for forward compatibility.</td>
            </tr>
        </table>
        <table class="parameters">
            <tr>
                <th colspan="4"><span class="code">compare</span> Mode</th>
            </tr>
            <tr>
                <th>Parameter</th>
                <th>Values</th>
                <th>Required?</th>
                <th>Description</th>
            </tr>
            <tr>
                <td>project</td>
                <td>string</td>
                <td>Yes</td>
                <td>The project code of the site the page lives on. Examples are <span class="code">wikipedia</span> and <span class="code">wiktionary</span>. A list of acceptable values can be retrieved using <span class="code">action=sites</span>.</td>
            </tr>
            <tr>
                <td>lang</td>
                <td>string</td>
                <td>Yes</td>
                <td>The language code of the site the page lives on. Examples are <span class="code">en</span> and <span class="code">de</span>. A list of acceptable values can be retrieved using <span class="code">action=sites</span>.</td>
            </tr>
            <tr>
                <td>title</td>
                <td>string</td>
                <td>Yes&nbsp;(either&nbsp;<span class="code">title</span>&nbsp;or&nbsp;<span class="code">oldid</span>)</td>
                <td>The title of the page or article to make a comparison against. Namespace must be included if the page isn't in the mainspace.</td>
            </tr>
            <tr>
                <td>oldid</td>
                <td>integer</td>
                <td>Yes (either <span class="code">title</span> or <span class="code">oldid</span>)</td>
                <td>The revision ID (also called oldid) of the page revision to make a comparison against. If both a title and oldid are given, the oldid will be used.</td>
            </tr>
            <tr>
                <td>url</td>
                <td>string</td>
                <td>Yes</td>
                <td>The URL of the suspected violation source that will be compared to the page.</td>
            </tr>
        </table>
        <table class="parameters">
            <tr>
                <th colspan="4"><span class="code">search</span> Mode</th>
            </tr>
            <tr>
                <th>Parameter</th>
                <th>Values</th>
                <th>Required?</th>
                <th>Description</th>
            </tr>
            <tr>
                <td>project</td>
                <td>string</td>
                <td>Yes</td>
                <td>The project code of the site the page lives on. Examples are <span class="code">wikipedia</span> and <span class="code">wiktionary</span>. A list of acceptable values can be retrieved using <span class="code">action=sites</span>.</td>
            </tr>
            <tr>
                <td>lang</td>
                <td>string</td>
                <td>Yes</td>
                <td>The language code of the site the page lives on. Examples are <span class="code">en</span> and <span class="code">de</span>. A list of acceptable values can be retrieved using <span class="code">action=sites</span>.</td>
            </tr>
            <tr>
                <td>title</td>
                <td>string</td>
                <td>Yes&nbsp;(either&nbsp;<span class="code">title</span>&nbsp;or&nbsp;<span class="code">oldid</span>)</td>
                <td>The title of the page or article to make a check against. Namespace must be included if the page isn't in the mainspace.</td>
            </tr>
            <tr>
                <td>oldid</td>
                <td>integer</td>
                <td>Yes (either <span class="code">title</span> or <span class="code">oldid</span>)</td>
                <td>The revision ID (also called oldid) of the page revision to make a check against. If both a title and oldid are given, the oldid will be used.</td>
            </tr>
            <tr>
                <td>use_engine</td>
                <td>boolean</td>
                <td>No (default: <span class="code">true</span>)</td>
                <td>Whether to use a search engine (<a href="//developer.yahoo.com/boss/search/">Yahoo! BOSS</a>) as a source of URLs to compare against the page.</td>
            </tr>
            <tr>
                <td>use_links</td>
                <td>boolean</td>
                <td>No (default: <span class="code">true</span>)</td>
                <td>Whether to compare the page against external links found in its wikitext.</td>
            </tr>
            <tr>
                <td>nocache</td>
                <td>boolean</td>
                <td>No (default: <span class="code">false</span>)</td>
                <td>Whether to bypass search results cached from previous checks. It is recommended that you don't pass this option unless a user specifically asks for it.</td>
            </tr>
            <tr>
                <td>noredirect</td>
                <td>boolean</td>
                <td>No (default: <span class="code">false</span>)</td>
                <td>Whether to avoid following redirects if the given page is a redirect.</td>
            </tr>
            <tr>
                <td>noskip</td>
                <td>boolean</td>
                <td>No (default: <span class="code">false</span>)</td>
                <td>If a suspected source is found during a check to have a sufficiently high confidence value, the check will end prematurely, and other pending URLs will be skipped. Passing this option will prevent this behavior, resulting in complete (but more time-consuming) checks.</td>
            </tr>
        </table>
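        <p>The parameter tables above map directly onto a query string. As a rough illustration, the following Python sketch assembles a <span class="code">search</span> request URL using only the standard library. The helper name <span class="code">build_search_url</span> is hypothetical, and sending booleans as lowercase <span class="code">true</span>/<span class="code">false</span> is an assumption, since the tables only say that these parameters are boolean.</p>

```python
from urllib.parse import urlencode

# Endpoint quoted in the "Requests" section of this page.
API_URL = "https://tools.wmflabs.org/copyvios/api.json"


def build_search_url(title, lang="en", project="wikipedia", **options):
    """Return a GET URL for an action=search request (hypothetical helper)."""
    params = {
        "version": 1,
        "action": "search",
        "project": project,
        "lang": lang,
        "title": title,
    }
    # Optional parameters such as use_links or noskip. Encoding booleans as
    # "true"/"false" is an assumption, not something the documentation states.
    for name, value in options.items():
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return API_URL + "?" + urlencode(params)


print(build_search_url("User:The_Earwig/Sandbox/CopyvioExample", noskip=True))
```

        <p>Using <span class="code">urlencode</span> rather than manual string concatenation means that titles containing spaces, colons, or slashes are percent-encoded automatically.</p>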
        <h2>Responses</h2>
        <p>The JSON response object always contains a <span class="code">status</span> key, whose value is either <span class="code">ok</span> or <span class="code">error</span>. If an error has occurred, the response will look like this:</p>
        <pre>{
    "status": "error",
    "error": {
        "code": (string) error code,
        "info": (string) human-readable description of error
    }
}</pre>
        <p>Valid responses for <span class="code">action=compare</span> and <span class="code">action=search</span> are formatted like this:</p>
        <pre>{
    "status": "ok",
    "meta": {
        "time": (float) time to generate results, in seconds,
        "queries": (int) number of search engine queries made,
        "cached": (boolean) whether or not these results are cached from an earlier search (always false in the case of action=compare),
        (only if cached=true) "cache_time": (string) human-readable time of the original search that the results are cached from,
        "redirected": (boolean) whether or not a redirect was followed
    },
    "page": {
        "title": (string) the normalized title of the page checked,
        "url": (string) the full URL of the page checked
    },
    (only if redirected=true) "original_page": {
        "title": (string) the normalized title of the original page whose redirect was followed,
        "url": (string) the full URL of the original page whose redirect was followed
    },
    "best": {
        "url": (string) the URL of the best match found, or null if no matches were found,
        "confidence": (float) the confidence of a violation in the best match, or 0.0 if no matches were found,
        "violation": (string) one of "suspected", "possible", or "none"
    },
    "sources": [
        {
            "url": (string) the URL of the source,
            "confidence": (float) the confidence of a violation in the source,
            "violation": (string) one of "suspected", "possible", or "none",
            "skipped": (boolean) whether or not the source was skipped due to the check finishing early (see note about noskip above)
        },
        ...
    ]
}</pre>
        <p>In the case of <span class="code">action=search</span>, <span class="code">sources</span> will contain one entry for each source checked (or skipped if the check ends early), sorted in order of confidence, with skipped sources at the bottom.</p>
        <p>In the case of <span class="code">action=compare</span>, <span class="code">best</span> will always contain information about the URL that was given, so <span class="code">response["best"]["url"]</span> will never be <span class="code">null</span>. Also, <span class="code">sources</span> will always contain one entry, with the same data as <span class="code">best</span>, since only one source is checked in comparison mode.</p>
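        <p>A client will usually check <span class="code">status</span> first and only then read <span class="code">sources</span>. The Python sketch below shows one way to do that for an already-decoded response. The names <span class="code">CopyvioAPIError</span> and <span class="code">suspected_sources</span> are hypothetical, and the sample data is hand-written to follow the format above; it is not real API output.</p>

```python
class CopyvioAPIError(Exception):
    """Hypothetical exception raised for responses whose status is "error"."""


def suspected_sources(response):
    """Return the URLs of checked sources whose violation is "suspected"."""
    if response["status"] == "error":
        error = response["error"]
        raise CopyvioAPIError("{}: {}".format(error["code"], error["info"]))
    return [
        source["url"]
        for source in response["sources"]
        if source["violation"] == "suspected" and not source["skipped"]
    ]


# Hand-written sample in the documented shape (placeholder URLs and scores).
sample = {
    "status": "ok",
    "best": {"url": "http://example.org/a", "confidence": 0.9, "violation": "suspected"},
    "sources": [
        {"url": "http://example.org/a", "confidence": 0.9,
         "violation": "suspected", "skipped": False},
        {"url": "http://example.org/b", "confidence": 0.0,
         "violation": "none", "skipped": True},
    ],
}
print(suspected_sources(sample))
```

        <p>Skipped sources are filtered out here because they were never actually compared against the page, so their confidence values say nothing about them.</p>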
        <p>Valid responses for <span class="code">action=sites</span> are formatted like this:</p>
        <pre>{
    "status": "ok",
    "langs": [
        [
            (string) language code,
            (string) human-readable language name
        ],
        ...
    ],
    "projects": [
        [
            (string) project code,
            (string) human-readable project name
        ],
        ...
    ]
}</pre>
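        <p>Because <span class="code">langs</span> and <span class="code">projects</span> are lists of two-element lists, it can be convenient to turn them into lookup tables before validating user input. This is a minimal Python sketch; <span class="code">index_sites</span> is a hypothetical helper, and the sample response is hand-written in the documented shape rather than copied from the API.</p>

```python
def index_sites(response):
    """Turn the [code, name] pairs of an action=sites response into dicts."""
    langs = {code: name for code, name in response["langs"]}
    projects = {code: name for code, name in response["projects"]}
    return langs, projects


# Hand-written sample; a real response would list many more entries.
sample = {
    "status": "ok",
    "langs": [["en", "English"], ["de", "Deutsch"]],
    "projects": [["wikipedia", "Wikipedia"], ["wiktionary", "Wiktionary"]],
}
langs, projects = index_sites(sample)
print("de" in langs, "wikiquote" in projects)
```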
        <h2>Caveats</h2>
        <ul>
            <li>There is currently no way to get the contents of the article or suspected source, nor can you get the data behind the visual comparison available from the main tool. This may change in a future version if there is sufficient demand for it.</li>
            <li>Requests are typically not rate-limited, but the tool uses the same workers to handle all requests, so making simultaneous API calls will only slow you down. In general, you may make as many requests as you like, as long as they are not concurrent and you wait a few seconds between them.</li>
        </ul>
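        <p>The second caveat can be followed with a simple loop that issues one request at a time and pauses between them. The Python sketch below is one possible approach: the names <span class="code">fetch_json</span> and <span class="code">check_sequentially</span> are hypothetical, and the three-second default pause is an assumed reading of "a few seconds", not a documented limit.</p>

```python
import json
import time
import urllib.request


def fetch_json(url):
    """Download one API response and decode it (performs a real HTTP GET)."""
    with urllib.request.urlopen(url) as reply:
        return json.load(reply)


def check_sequentially(urls, fetch=fetch_json, pause=3.0, sleep=time.sleep):
    """Fetch each URL in turn, never concurrently, pausing between requests."""
    results = []
    for index, url in enumerate(urls):
        if index:
            sleep(pause)  # wait before every request except the first
        results.append(fetch(url))
    return results


def fake_fetch(url):
    """Stand-in for fetch_json so the demonstration sends nothing over the network."""
    return {"status": "ok", "requested": url}


print(check_sequentially(["first", "second"], fetch=fake_fetch, pause=0))
```

        <p>Passing <span class="code">fetch</span> and <span class="code">sleep</span> as arguments keeps the loop easy to test without contacting the tool.</p>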
        <h2>Example</h2>
        <p><a class="no-color" href="https://tools.wmflabs.org/copyvios/api.json?version=1&amp;action=search&amp;project=wikipedia&amp;lang=en&amp;title=User:The_Earwig/Sandbox/CopyvioExample"><span class="code">https://tools.wmflabs.org/copyvios/api.json?<span class="param-key">version</span>=<span class="param-val">1</span>&amp;<span class="param-key">action</span>=<span class="param-val">search</span>&amp;<span class="param-key">project</span>=<span class="param-val">wikipedia</span>&amp;<span class="param-key">lang</span>=<span class="param-val">en</span>&amp;<span class="param-key">title</span>=<span class="param-val">User:The_Earwig/Sandbox/CopyvioExample</span></span></a></p>
        <pre>{
    "status": "ok",
    "meta": {
        "time": 2.2474379539489746,
        "queries": 1,
        "cached": false,
        "redirected": false
    },
    "page": {
        "title": "User:The Earwig/Sandbox/CopyvioExample",
        "url": "https://en.wikipedia.org/wiki/User:The_Earwig/Sandbox/CopyvioExample"
    },
    "best": {
        "url": "http://www.whitehouse.gov/administration/president-obama/",
        "confidence": 0.9886608511242603,
        "violation": "suspected"
    },
    "sources": [
        {
            "url": "http://www.whitehouse.gov/administration/president-obama/",
            "confidence": 0.9886608511242603,
            "violation": "suspected",
            "skipped": false
        },
        {
            "url": "http://maige2009.blogspot.com/2013/07/barack-h-obama-is-44th-president-of.html",
            "confidence": 0.9864798816568047,
            "violation": "suspected",
            "skipped": false
        },
        {
            "url": "http://jeuxdemonstre-apkdownload.rhcloud.com/luo-people-of-kenya-and-tanzania---wikipedia--the-free",
            "confidence": 0.0,
            "violation": "none",
            "skipped": false
        },
        {
            "url": "http://www.whitehouse.gov/about/presidents/barackobama",
            "confidence": 0.0,
            "violation": "none",
            "skipped": true
        },
        {
            "url": "http://jeuxdemonstre-apkdownload.rhcloud.com/president-barack-obama---the-white-house",
            "confidence": 0.0,
            "violation": "none",
            "skipped": true
        }
    ]
}
</pre>
    </div>
% endif
% if result:
    <div id="result">
        <p>You are using <span class="code">jsonfm</span> output mode, which renders JSON data as a formatted HTML document. This is intended for testing and debugging only.</p>
        <pre>${walk_json(result)}</pre>
    </div>
% endif
</body>
</html>