Ben Kurtovic
|
9d66ebc6b2
|
copyvios: Config-directed URL proxying
|
3 years ago |
Ben Kurtovic
|
fe2e7879e4
|
Fix issues in previous commit
|
3 years ago |
Ben Kurtovic
|
2324a73624
|
copyvios: Refactor some parsing logic and add dynamic Blogger support
|
3 years ago |
Ben Kurtovic
|
abb9403e5d
|
More bug fixes
|
3 years ago |
Ben Kurtovic
|
a49a82e263
|
Fix a few bugs
|
3 years ago |
Ben Kurtovic
|
2b5914b6ae
|
Support parser-directed URL redirecting (for Wayback Machine PDFs)
|
3 years ago |
Ben Kurtovic
|
b9074c9f9d
|
URL exclusions: fix uppercase characters in patterns never matching
|
4 years ago |
Ben Kurtovic
|
88f9c21111
|
URL exclusions: fix comment parsing
|
5 years ago |
Ben Kurtovic
|
1cdc0a5a4c
|
Improve excluded URL list parsing
|
5 years ago |
Ben Kurtovic
|
774628b34e
|
OAuth support; switch to requests; update login flow
|
5 years ago |
Ben Kurtovic
|
8a945b0782
|
Greatly simplify MarkovChain implementation
|
5 years ago |
Ben Kurtovic
|
466d3a42f1
|
copyvios: Minor refactor for cleaner stack frames.
|
5 years ago |
Ben Kurtovic
|
42a224f365
|
copyvios: Catch PDF parser exceptions more aggressively.
|
5 years ago |
Ben Kurtovic
|
a463c6d052
|
Fix lazy loading bug where lxml.etree wasn't accessible to bs4.
|
8 years ago |
Ben Kurtovic
|
f2099df5d5
|
Minor refactor in HTML parser.
|
8 years ago |
Ben Kurtovic
|
fbb9ea7b03
|
Catch empty Google results properly.
|
8 years ago |
Ben Kurtovic
|
aba91c0f1c
|
Missing comma.
|
8 years ago |
Ben Kurtovic
|
a95356676b
|
Add GoogleSearchEngine.
|
8 years ago |
Ben Kurtovic
|
98d0977c19
|
Refactor search; cleanup; fixup.
|
8 years ago |
Ben Kurtovic
|
7853bcc0f3
|
Fix dependency checking for search engines.
|
8 years ago |
Ben Kurtovic
|
76b068c4df
|
Add Yandex proxy support.
|
8 years ago |
Ben Kurtovic
|
a0d7eb62a2
|
Add Yandex search support.
|
8 years ago |
Ben Kurtovic
|
04ed5257c7
|
Refactor search engines.
|
8 years ago |
Ben Kurtovic
|
80890fb191
|
WebFileType doesn't work
|
8 years ago |
Ben Kurtovic
|
977b587e5e
|
Add support for Bing Search
|
8 years ago |
Ben Kurtovic
|
69cdb41d07
|
Adjust mirror hints to include direct links back to the article.
|
8 years ago |
Ben Kurtovic
|
b4b079ffd0
|
Update copyright year for 2016.
|
8 years ago |
Ben Kurtovic
|
4828cbad69
|
Catch possible ValueError when doing opener.open().
|
8 years ago |
Ben Kurtovic
|
eceb4d139a
|
Minor refactor.
|
8 years ago |
Ben Kurtovic
|
f92fb34d0e
|
Improve sentence splitting, again.
|
8 years ago |
Ben Kurtovic
|
75058997c2
|
Split copyvio queries a bit differently; maybe better on other languages.
|
8 years ago |
Ben Kurtovic
|
f52fb06c19
|
Add a debug message when catching ParserExclusionError.
|
9 years ago |
Ben Kurtovic
|
c81d1d949d
|
Update global exclusion lists more often than site-specific ones.
|
9 years ago |
Ben Kurtovic
|
108eca13ac
|
Finish mirror hinting algorithm.
|
9 years ago |
Ben Kurtovic
|
91846ce4fb
|
Refactor out mirror hinting logic in source parsers.
|
9 years ago |
Ben Kurtovic
|
147b46f572
|
A couple more fixes and cleanup.
|
9 years ago |
Ben Kurtovic
|
03910b6cb5
|
Add mirror detection logic to parsers; fixes.
|
9 years ago |
Ben Kurtovic
|
81a090c923
|
Allow content parsers to signal that a source should be excluded.
|
9 years ago |
Ben Kurtovic
|
bb819c9306
|
Explicitly include excluded URLs in the result set; mark as excluded.
|
9 years ago |
Ben Kurtovic
|
e99e1c1ef1
|
Typo fix.
|
9 years ago |
Ben Kurtovic
|
509598d7fc
|
Try merging in templates with parameter values of a certain size (fixes #42)
|
9 years ago |
Ben Kurtovic
|
d741667c4c
|
Try using pentagrams rather than trigrams for copyvio Markov chains.
|
9 years ago |
Ben Kurtovic
|
4e8be871b7
|
Update copyright year for 2015.
|
9 years ago |
Ben Kurtovic
|
09319b1675
|
Don't die on broken regexes.
|
9 years ago |
Ben Kurtovic
|
4cdfafd487
|
Skip site check.
|
9 years ago |
Ben Kurtovic
|
4075d887e9
|
Fix return.
|
9 years ago |
Ben Kurtovic
|
a2c10650a8
|
Add support for User:EranBot/Copyright/Blacklist (closes #52)
|
9 years ago |
Ben Kurtovic
|
9ffc3f1bf5
|
Raise file crawl size limit for PDFs.
|
9 years ago |
Ben Kurtovic
|
b87d5ac673
|
Pass parameter to recursive call.
|
10 years ago |
Ben Kurtovic
|
170f810735
|
Allow ExclusionDB to force a sync.
|
10 years ago |