Ben Kurtovic
a73f618e0a
Initial conversion to Python 3
3 years ago
Ben Kurtovic
9d66ebc6b2
copyvios: Config-directed URL proxying
3 years ago
Ben Kurtovic
91846ce4fb
Refactor out mirror hinting logic in source parsers.
8 years ago
Ben Kurtovic
03910b6cb5
Add mirror detection logic to parsers; fixes.
8 years ago
Ben Kurtovic
bb819c9306
Explicitly include excluded URLs in the result set; mark as excluded.
8 years ago
Ben Kurtovic
4e8be871b7
Update copyright year for 2015.
9 years ago
Ben Kurtovic
5194525a32
Note when sources might have been missed.
9 years ago
Ben Kurtovic
30f72df470
Refactor parsers; fix empty document behavior.
9 years ago
Ben Kurtovic
f8f4669460
Remove unnecessary key attribute of sources.
9 years ago
Ben Kurtovic
9fd145da5c
Add some docs; better sorting function.
9 years ago
Ben Kurtovic
7afb484cea
Refactor a bunch of copyvio internals. Store all sources with a result object.
9 years ago
Ben Kurtovic
54ddff049f
Make CopyvioSource public; tweaks.
9 years ago
Ben Kurtovic
ae0c390ceb
Redesign copyvio internals to parallelize URL loading/parsing.
10 years ago
Ben Kurtovic
39d5c7c149
Update copyright notices for 2014.
10 years ago
Ben Kurtovic
ed95c99f0e
Update email address.
10 years ago
Ben Kurtovic
0b7a13eca5
Update copyright notices for 2013.
11 years ago
Ben Kurtovic
bcf9b70107
Keep track of how long generating results takes; support 'max_time'.
11 years ago
Ben Kurtovic
a4dda89a61
Various fixes for copyvios.
- Fix a bug in ExclusionsDB; improve URL regexes.
- NLTK's LookupError is actually an IOError.
- Fix bug in __repr__ for CopyvioCheckResult.
- Rewrite YahooBOSSSearchEngine to actually work with oauth2.
- Search engines now take a URL opener in addition to credentials.
11 years ago
Ben Kurtovic
a074da853b
More work on copyvios, including an exclusions database (#5)
* Added exclusions module with a fully implemented ExclusionsDB that can pull
  from multiple sources for different sites.
* Moved CopyvioCheckResult to its own module, to be imported by __init__.
* Some other related changes.
12 years ago