Browse Source

Update URLs

copyvios-ng
Ben Kurtovic 4 years ago
parent
commit
eacb294735
9 changed files with 29 additions and 29 deletions
  1. +8
    -8
      README.md
  2. +1
    -1
      app.py
  3. +1
    -1
      copyvios/turnitin.py
  4. +1
    -1
      static/toolinfo.json
  5. +4
    -4
      templates/api.mako
  6. +10
    -10
      templates/index.mako
  7. +1
    -1
      templates/settings.mako
  8. +2
    -2
      templates/support/footer.mako
  9. +1
    -1
      templates/support/header.mako

+ 8
- 8
README.md View File

@@ -1,21 +1,21 @@
This is a [copyright violation](https://en.wikipedia.org/wiki/WP:COPYVIO)
detector running on [Wikimedia Labs](https://tools.wmflabs.org/copyvios).
detector running on [Wikimedia Cloud Services](https://copyvios.toolforge.org/).

It can search the web for content similar to a given article, and graphically
compare an article to a specific URL. Some technical details are expanded upon
[in a blog post](http://benkurtovic.com/2014/08/20/copyvio-detector.html).
[in a blog post](https://benkurtovic.com/2014/08/20/copyvio-detector.html).

Dependencies
============

* [earwigbot](https://github.com/earwig/earwigbot) >= 0.1
* [flask](http://flask.pocoo.org/) >= 0.10.1
* [flask](https://flask.palletsprojects.com/) >= 0.10.1
* [flask-mako](https://pythonhosted.org/Flask-Mako/) >= 0.3
* [mako](http://www.makotemplates.org/) >= 0.7.2
* [mako](https://www.makotemplates.org/) >= 0.7.2
* [mwparserfromhell](https://github.com/earwig/mwparserfromhell) >= 0.3
* [oursql](http://packages.python.org/oursql/) >= 0.9.3.1
* [requests](http://python-requests.org/) >= 2.9.1
* [SQLAlchemy](http://sqlalchemy.org/) >= 0.9.6
* [oursql](https://pythonhosted.org/oursql/) >= 0.9.3.1
* [requests](https://requests.readthedocs.io/) >= 2.9.1
* [SQLAlchemy](https://www.sqlalchemy.org/) >= 0.9.6
* [apsw](https://github.com/rogerbinns/apsw) >= 3.26.0
* [uglifycss](https://github.com/fmarcia/UglifyCSS/)
* [uglifyjs](https://github.com/mishoo/UglifyJS/) >= 1.3.3
@@ -25,7 +25,7 @@ Running

- If using Tool Labs, you should clone the repository to `~/www/python/src`, or
otherwise symlink it to that directory. A
[virtualenv](http://virtualenv.readthedocs.org/) should be created at
[virtualenv](https://virtualenv.pypa.io/) should be created at
`~/www/python/venv`.

- Install all dependencies listed above.


+ 1
- 1
app.py View File

@@ -58,7 +58,7 @@ def setup_app():
def prepare_request():
g._db = None
g.cookies = parse_cookies(
request.script_root, request.environ.get("HTTP_COOKIE"))
request.script_root or "/", request.environ.get("HTTP_COOKIE"))
g.new_cookies = []

@app.after_request


+ 1
- 1
copyvios/turnitin.py View File

@@ -8,7 +8,7 @@ from .misc import parse_wiki_timestamp

__all__ = ['search_turnitin', 'TURNITIN_API_ENDPOINT']

TURNITIN_API_ENDPOINT = 'http://tools.wmflabs.org/eranbot/plagiabot/api.py'
TURNITIN_API_ENDPOINT = 'https://eranbot.toolforge.org/plagiabot/api.py'

def search_turnitin(page_title, lang):
""" Search the Plagiabot database for Turnitin reports for a page.


+ 1
- 1
static/toolinfo.json View File

@@ -2,7 +2,7 @@
"name" : "copyvios",
"title" : "Copyvios",
"description" : "Detects copyright violations in pages by searching for their contents online. Can also compare a page and a specific URL.",
"url" : "https://tools.wmflabs.org/copyvios",
"url" : "https://copyvios.toolforge.org/",
"keywords" : "copyvios, copyright violations",
"author" : "The Earwig",
"repository" : "https://github.com/earwig/copyvios"


+ 4
- 4
templates/api.mako View File

@@ -40,9 +40,9 @@
% if help:
<div id="help">
<h1>Copyvio Detector API</h1>
<p>This is the first version of the <a href="//en.wikipedia.org/wiki/Application_programming_interface">API</a> for <a href="${request.script_root}">Earwig's Copyvio Detector</a>. It works, but some bugs might still need to be ironed out, so please <a href="https://github.com/earwig/copyvios/issues">report any</a> if you see them.</p>
<p>This is the first version of the <a href="https://en.wikipedia.org/wiki/Application_programming_interface">API</a> for <a href="${request.script_root}/">Earwig's Copyvio Detector</a>. It works, but some bugs might still need to be ironed out, so please <a href="https://github.com/earwig/copyvios/issues">report any</a> if you see them.</p>
<h2>Requests</h2>
<p>The API responds to GET requests made to <span class="code">https://tools.wmflabs.org/copyvios/api.json</span>. Parameters are described in the tables below:</p>
<p>The API responds to GET requests made to <span class="code">https://copyvios.toolforge.org/api.json</span>. Parameters are described in the tables below:</p>
<table class="parameters">
<tr>
<th colspan="4">Always</th>
@@ -63,7 +63,7 @@
<td>format</td>
<td><span class="code">json</span>, <span class="code">jsonfm</span></td>
<td>No&nbsp;(default:&nbsp;<span class="code">json</span>)</td>
<td>The default output format is <a href="http://json.org/">JSON</a>. <span class="code">jsonfm</span> mode produces the same output, but renders it as a formatted HTML document for debugging.</td>
<td>The default output format is <a href="https://www.json.org/">JSON</a>. <span class="code">jsonfm</span> mode produces the same output, but renders it as a formatted HTML document for debugging.</td>
</tr>
<tr>
<td>version</td>
@@ -254,7 +254,7 @@
<h2>Etiquette</h2>
The tool uses the same workers to handle all requests, so making concurrent API calls is only going to slow you down. Most operations are not rate-limited, but full searches with <span class="code">use_engine=True</span> are globally limited to around a thousand per day. Be respectful!
<h2>Example</h2>
<p><a class="no-color" href="https://tools.wmflabs.org/copyvios/api.json?version=1&amp;action=search&amp;project=wikipedia&amp;lang=en&amp;title=User:EarwigBot/Copyvios/Tests/2"><span class="code">https://tools.wmflabs.org/copyvios/api.json?<span class="param-key">version</span>=<span class="param-val">1</span>&amp;<span class="param-key">action</span>=<span class="param-val">search</span>&amp;<span class="param-key">project</span>=<span class="param-val">wikipedia</span>&amp;<span class="param-key">lang</span>=<span class="param-val">en</span>&amp;<span class="param-key">title</span>=<span class="param-val">User:EarwigBot/Copyvios/Tests/2</span></span></a></p>
<p><a class="no-color" href="https://copyvios.toolforge.org/api.json?version=1&amp;action=search&amp;project=wikipedia&amp;lang=en&amp;title=User:EarwigBot/Copyvios/Tests/2"><span class="code">https://copyvios.toolforge.org/api.json?<span class="param-key">version</span>=<span class="param-val">1</span>&amp;<span class="param-key">action</span>=<span class="param-val">search</span>&amp;<span class="param-key">project</span>=<span class="param-val">wikipedia</span>&amp;<span class="param-key">lang</span>=<span class="param-val">en</span>&amp;<span class="param-key">title</span>=<span class="param-val">User:EarwigBot/Copyvios/Tests/2</span></span></a></p>
<pre>{
"status": "ok",
"meta": {


+ 10
- 10
templates/index.mako View File

@@ -35,11 +35,11 @@
</p></div>
% elif not query.site:
<div id="info-box" class="red-box">
<p>The given site (project=<b><span class="mono">${query.project | h}</span></b>, language=<b><span class="mono">${query.lang | h}</span></b>) doesn't seem to exist. It may also be closed or private. <a href="//${query.lang | h}.${query.project | h}.org/">Confirm its URL.</a></p>
<p>The given site (project=<b><span class="mono">${query.project | h}</span></b>, language=<b><span class="mono">${query.lang | h}</span></b>) doesn't seem to exist. It may also be closed or private. <a href="https://${query.lang | h}.${query.project | h}.org/">Confirm its URL.</a></p>
</div>
% elif query.oldid and not result:
<div id="info-box" class="red-box">
<p>The given revision ID doesn't seem to exist: <a href="//${query.site.domain | h}/w/index.php?oldid=${query.oldid | h}">${query.oldid | h}</a>.</p>
<p>The given revision ID doesn't seem to exist: <a href="https://${query.site.domain | h}/w/index.php?oldid=${query.oldid | h}">${query.oldid | h}</a>.</p>
</div>
% elif query.title and not result:
<div id="info-box" class="red-box">
@@ -47,10 +47,10 @@
</div>
% endif
%endif
<p>This tool attempts to detect <a href="//en.wikipedia.org/wiki/WP:COPYVIO">copyright violations</a> in articles. In <i>search mode</i>, it will check for similar content elsewhere on the web using <a href="https://developers.google.com/custom-search/">Google</a>, external links present in the text of the page, or <a href="//en.wikipedia.org/wiki/Wikipedia:Turnitin">Turnitin</a> (provided by <a href="//en.wikipedia.org/wiki/User:EranBot">EranBot</a>), depending on which options are selected. In <i>comparison mode</i>, the tool will compare the article to a specific webpage without making additional searches, like the <a href="//tools.wmflabs.org/dupdet/">Duplication Detector</a>.</p>
<p>This tool attempts to detect <a href="https://en.wikipedia.org/wiki/WP:COPYVIO">copyright violations</a> in articles. In <i>search mode</i>, it will check for similar content elsewhere on the web using <a href="https://developers.google.com/custom-search/">Google</a>, external links present in the text of the page, or <a href="https://en.wikipedia.org/wiki/Wikipedia:Turnitin">Turnitin</a> (provided by <a href="https://en.wikipedia.org/wiki/User:EranBot">EranBot</a>), depending on which options are selected. In <i>comparison mode</i>, the tool will compare the article to a specific webpage without making additional searches, like the <a href="https://dupdet.toolforge.org/">Duplication Detector</a>.</p>
<p>Running a full check can take up to a minute if other websites are slow or if the tool is under heavy use. Please be patient. If you get a timeout, wait a moment and refresh the page.</p>
<p>Be aware that other websites can copy from Wikipedia, so check the results carefully, especially for older or well-developed articles. Specific websites can be skipped by being added to the <a href="//en.wikipedia.org/wiki/User:EarwigBot/Copyvios/Exclusions">excluded URL list</a>.</p>
<form id="cv-form" action="${request.script_root}" method="get">
<p>Be aware that other websites can copy from Wikipedia, so check the results carefully, especially for older or well-developed articles. Specific websites can be skipped by being added to the <a href="https://en.wikipedia.org/wiki/User:EarwigBot/Copyvios/Exclusions">excluded URL list</a>.</p>
<form id="cv-form" action="${request.script_root}/" method="get">
<table id="cv-form-outer">
<tr>
<td>Site:</td>
@@ -165,7 +165,7 @@
% else:
seconds.
% endif
<a href="${request.script_root | h}?lang=${query.lang | h}&amp;project=${query.project | h}&amp;oldid=${query.oldid or query.page.lastrevid | h}&amp;action=${query.action | h}&amp;${"use_engine={0}&use_links={1}".format(int(query.use_engine not in ("0", "false")), int(query.use_links not in ("0", "false"))) if query.action == "search" else "" | h}${"url=" if query.action == "compare" else ""}${query.url if query.action == "compare" else "" | u}">Permalink.</a>
<a href="${request.script_root | h}/?lang=${query.lang | h}&amp;project=${query.project | h}&amp;oldid=${query.oldid or query.page.lastrevid | h}&amp;action=${query.action | h}&amp;${"use_engine={0}&use_links={1}".format(int(query.use_engine not in ("0", "false")), int(query.use_links not in ("0", "false"))) if query.action == "search" else "" | h}${"url=" if query.action == "compare" else ""}${query.url if query.action == "compare" else "" | u}">Permalink.</a>
</div>

<div id="cv-result" class="${'red' if result.confidence >= T_SUSPECT else 'yellow' if result.confidence >= T_POSSIBLE else 'green'}-box">
@@ -179,11 +179,11 @@
<td>
<a href="${query.page.url}">${query.page.title | h}</a>
% if query.oldid:
@<a href="//${query.site.domain | h}/w/index.php?oldid=${query.oldid | h}">${query.oldid | h}</a>
@<a href="https://${query.site.domain | h}/w/index.php?oldid=${query.oldid | h}">${query.oldid | h}</a>
% endif
% if query.redirected_from:
<br />
<span id="redirected-from">Redirected from <a href="//${query.site.domain | h}/w/index.php?title=${query.redirected_from.title | u}&amp;redirect=no">${query.redirected_from.title | h}</a>. <a href="${request.url | httpsfix, h}&amp;noredirect=1">Check original.</a></span>
<span id="redirected-from">Redirected from <a href="https://${query.site.domain | h}/w/index.php?title=${query.redirected_from.title | u}&amp;redirect=no">${query.redirected_from.title | h}</a>. <a href="${request.url | httpsfix, h}&amp;noredirect=1">Check original.</a></span>
% endif
</td>
<td>
@@ -225,7 +225,7 @@
% if query.turnitin_result.reports:
<table id="turnitin-table"><tbody>
% for report in turnitin_result.reports:
<tr><td class="turnitin-table-cell"><a href="https://tools.wmflabs.org/eranbot/ithenticate.py?rid=${report.reportid}">Report ${report.reportid}</a> for text added at <a href="https://${query.lang}.wikipedia.org/w/index.php?title=${query.title}&amp;diff=${report.diffid}"> ${report.time_posted.strftime("%H:%M, %d %B %Y (UTC)")}</a>:
<tr><td class="turnitin-table-cell"><a href="https://eranbot.toolforge.org/ithenticate.py?rid=${report.reportid}">Report ${report.reportid}</a> for text added at <a href="https://${query.lang}.wikipedia.org/w/index.php?title=${query.title}&amp;diff=${report.diffid}"> ${report.time_posted.strftime("%H:%M, %d %B %Y (UTC)")}</a>:
<ul>
% for source in report.sources:
<li>${source['percent']}% of revision text (${source['words']} words) found at <a href="${source['url'] | h}">${source['url'] | h}</a></li>
@@ -269,7 +269,7 @@
% endif
</td>
<td>
<a href="${request.script_root | h}?lang=${query.lang | h}&amp;project=${query.project | h}&amp;oldid=${query.oldid or query.page.lastrevid | h}&amp;action=compare&amp;url=${source.url | u}">Compare</a>
<a href="${request.script_root | h}/?lang=${query.lang | h}&amp;project=${query.project | h}&amp;oldid=${query.oldid or query.page.lastrevid | h}&amp;action=compare&amp;url=${source.url | u}">Compare</a>
</td>
</tr>
% endfor


+ 1
- 1
templates/settings.mako View File

@@ -42,7 +42,7 @@
</tr>
<%
background_options = [
("list", 'Randomly select from <a href="http://commons.wikimedia.org/wiki/User:The_Earwig/POTD">a subset</a> of previous <a href="//commons.wikimedia.org/">Wikimedia Commons</a> <a href="//commons.wikimedia.org/wiki/Commons:Picture_of_the_day">Pictures of the Day</a> that work well as widescreen backgrounds, refreshed daily (default).'),
("list", 'Randomly select from <a href="https://commons.wikimedia.org/wiki/User:The_Earwig/POTD">a subset</a> of previous <a href="https://commons.wikimedia.org/">Wikimedia Commons</a> <a href="https://commons.wikimedia.org/wiki/Commons:Picture_of_the_day">Pictures of the Day</a> that work well as widescreen backgrounds, refreshed daily (default).'),
("potd", 'Use the current Commons Picture of the Day, unfiltered. Certain POTDs may be unsuitable as backgrounds due to their aspect ratio or subject matter.'),
("plain", "Use a plain background."),
]


+ 2
- 2
templates/support/footer.mako View File

@@ -4,13 +4,13 @@
%>\
</div>
<div id="footer">
<p>Copyright &copy; 2009&ndash;${datetime.now().year} <a href="//en.wikipedia.org/wiki/User:The_Earwig">Ben Kurtovic</a> &bull; \
<p>Copyright &copy; 2009&ndash;${datetime.now().year} <a href="https://en.wikipedia.org/wiki/User:The_Earwig">Ben Kurtovic</a> &bull; \
<a href="${request.script_root}/api">API</a> &bull; \
<a href="https://github.com/earwig/copyvios">Source Code</a> &bull; \
% if ("CopyviosBackground" in g.cookies and g.cookies["CopyviosBackground"].value in ["potd", "list"]) or "CopyviosBackground" not in g.cookies:
<a href="${g.descurl | h}">Background</a> &bull; \
% endif
<a href="http://validator.w3.org/check?uri=referer">Valid HTML5</a>
<a href="https://validator.w3.org/check?uri=referer">Valid HTML5</a>
</p>
</div>
</body>


+ 1
- 1
templates/support/header.mako View File

@@ -21,7 +21,7 @@
<div id="header">
<table id="heading">
<tr>
<td id="head-home"><a id="a-home" href="${request.script_root}">Earwig's Copyvio Detector</a></td>
<td id="head-home"><a id="a-home" href="${request.script_root}/">Earwig's Copyvio Detector</a></td>
<td id="head-settings"><a id="a-settings" href="${request.script_root}/settings">Settings</a></td>
</tr>
</table>


Loading…
Cancel
Save