A copyright violation detector running on Wikimedia Cloud Services https://tools.wmflabs.org/copyvios/
Ben Kurtovic 4cebe36eee Validate site info if not in sites DB 1 year ago


This is a copyright violation detector running on Wikimedia Labs.

It can search the web for content similar to a given article, and graphically compare an article to a specific URL. Some technical details are expanded upon in a blog post.
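As a rough illustration of what comparing an article to a source page can involve, here is a minimal sketch that measures how many of an article's word trigrams also appear in another text. It is an assumption made for illustration, not necessarily the method the tool uses; the names `trigrams` and `overlap_ratio` are hypothetical.

```python
import re


def trigrams(text):
    """Return the set of lowercase word trigrams found in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def overlap_ratio(article, source):
    """Fraction of the article's trigrams that also occur in the source.

    Hypothetical similarity measure; returns 0.0 when the article has
    fewer than three words.
    """
    article_grams = trigrams(article)
    if not article_grams:
        return 0.0
    return len(article_grams & trigrams(source)) / len(article_grams)
```

A ratio close to 1.0 would suggest that most of the article's phrasing also appears in the source, which is the kind of signal a reviewer might then inspect by hand.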



  • If using Tool Labs, you should clone the repository to ~/www/python/src, or otherwise symlink it to that directory. A virtualenv should be created at ~/www/python/venv.

  • Install all dependencies listed above.

  • Create an SQL database with the cache and cache_data tables defined by earwigbot-plugins.

  • Create an earwigbot instance in .earwigbot (run earwigbot .earwigbot). In .earwigbot/config.yml, fill out the connection info for the database by adding the following to the wiki section:

        host: <hostname of database server>
        db:   <name of database>

    If additional arguments are needed by oursql.connect(), like usernames or passwords, they should be added to the _copyviosSQL section.

  • Run ./build.py to minify JS and CSS files.

  • Start the web server (on Tool Labs, webservice2 uwsgi-python start).
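The database configuration step above can be sketched in Python as follows. This is a hedged illustration of how the `host` and `db` values and any extra `_copyviosSQL` arguments might be combined into keyword arguments for a connection call. The function name `build_connect_kwargs` is hypothetical, and the exact nesting of `_copyviosSQL` inside the wiki section is an assumption. The sketch does not import or call oursql.

```python
def build_connect_kwargs(wiki_section):
    """Combine database settings from the parsed wiki section of config.yml.

    Assumptions (not confirmed by the instructions above):
    - wiki_section is a dict produced by parsing the YAML config;
    - "host" and "db" may appear directly in the wiki section;
    - extra connection arguments, such as usernames or passwords,
      live in a nested "_copyviosSQL" mapping.
    """
    kwargs = {key: wiki_section[key] for key in ("host", "db") if key in wiki_section}
    # Values in _copyviosSQL take precedence over top-level ones.
    kwargs.update(wiki_section.get("_copyviosSQL", {}))
    return kwargs
```

The resulting dictionary could then be unpacked into a connection call, for example `connect(**build_connect_kwargs(wiki_section))`, where `connect` stands in for whichever database driver is configured.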