Ben Kurtovic | a73f618e0a | Initial conversion to Python 3 | 3 years ago
Ben Kurtovic | 01dcbd4394 | copyvios: Support on-error condition for URL proxying | 3 years ago
Ben Kurtovic | 92d43e566e | copyvios: Missed a file | 3 years ago
Ben Kurtovic | 9d66ebc6b2 | copyvios: Config-directed URL proxying | 3 years ago
Ben Kurtovic | fe2e7879e4 | Fix issues in previous commit | 3 years ago
Ben Kurtovic | 2324a73624 | copyvios: Refactor some parsing logic and add dynamic Blogger support | 3 years ago
Ben Kurtovic | a8a605fe05 | Reduce wait between queries from 2s to 1s | 3 years ago
Ben Kurtovic | abb9403e5d | More bug fixes | 3 years ago
Ben Kurtovic | a49a82e263 | Fix a few bugs | 3 years ago
Ben Kurtovic | 2b5914b6ae | Support parser-directed URL redirecting (for Wayback Machine PDFs) | 3 years ago
Ben Kurtovic | b9074c9f9d | URL exclusions: fix uppercase characters in patterns never matching | 4 years ago
Ben Kurtovic | 2df64ede65 | Fix not sending Content-Type in POST requests | 4 years ago
Ben Kurtovic | 88f9c21111 | URL exclusions: fix comment parsing | 4 years ago
Ben Kurtovic | 1cdc0a5a4c | Improve excluded URL list parsing | 4 years ago
Ben Kurtovic | ea4ee76691 | release/0.3 | 5 years ago
Ben Kurtovic | 774628b34e | OAuth support; switch to requests; update login flow | 5 years ago
Ben Kurtovic | f1b93a465a | Log warnings; use rvslots when fetching revision content | 5 years ago
Ben Kurtovic | 8a945b0782 | Greatly simplify MarkovChain implementation | 5 years ago
Ben Kurtovic | 466d3a42f1 | copyvios: Minor refactor for cleaner stack frames. | 5 years ago
Ben Kurtovic | 42a224f365 | copyvios: Catch PDF parser exceptions more aggressively. | 5 years ago
Ben Kurtovic | c68a5e6dfb | Fix Page.toggle_talk() on mainspace titles with colons. | 5 years ago
Ben Kurtovic | 7d7d1aceea | Update dependencies, copyright year. | 6 years ago
Ben Kurtovic | aed5a5954d | Fix SitesDB lookup for sites with overlapping URLs. | 7 years ago
Ben Kurtovic | a463c6d052 | Fix lazy loading bug where lxml.etree wasn't accessible to bs4. | 7 years ago
Ben Kurtovic | f2099df5d5 | Minor refactor in HTML parser. | 7 years ago
Ben Kurtovic | fbb9ea7b03 | Catch empty Google results properly. | 8 years ago
Ben Kurtovic | aba91c0f1c | Missing comma. | 8 years ago
Ben Kurtovic | a95356676b | Add GoogleSearchEngine. | 8 years ago
Ben Kurtovic | 98d0977c19 | Refactor search; cleanup; fixup. | 8 years ago
Ben Kurtovic | 7853bcc0f3 | Fix dependency checking for search engines. | 8 years ago
Ben Kurtovic | 76b068c4df | Add Yandex proxy support. | 8 years ago
Ben Kurtovic | a0d7eb62a2 | Add Yandex search support. | 8 years ago
Ben Kurtovic | 04ed5257c7 | Refactor search engines. | 8 years ago
Ben Kurtovic | 80890fb191 | WebFileType doesn't work | 8 years ago
Ben Kurtovic | 977b587e5e | Add support for Bing Search | 8 years ago
Ben Kurtovic | 69cdb41d07 | Adjust mirror hints to include direct links back to the article. | 8 years ago
Ben Kurtovic | b4b079ffd0 | Update copyright year for 2016. | 8 years ago
Ben Kurtovic | 4828cbad69 | Catch possible ValueError when doing opener.open(). | 8 years ago
Ben Kurtovic | 48a14ee3ed | Don't log the full debug line when sending a lot of data. | 8 years ago
Ben Kurtovic | eceb4d139a | Minor refactor. | 8 years ago
Ben Kurtovic | f92fb34d0e | Improve sentence splitting, again. | 8 years ago
Ben Kurtovic | 75058997c2 | Split copyvio queries a bit differently; maybe better on other languages. | 8 years ago
Ben Kurtovic | f52fb06c19 | Add a debug message when catching ParserExclusionError. | 8 years ago
Ben Kurtovic | c81d1d949d | Update global exclusion lists more often than site-specific ones. | 8 years ago
Ben Kurtovic | 108eca13ac | Finish mirror hinting algorithm. | 8 years ago
Ben Kurtovic | 91846ce4fb | Refactor out mirror hinting logic in source parsers. | 8 years ago
Ben Kurtovic | 147b46f572 | A couple more fixes and cleanup. | 8 years ago
Ben Kurtovic | 03910b6cb5 | Add mirror detection logic to parsers; fixes. | 8 years ago
Ben Kurtovic | 81a090c923 | Allow content parsers to signal that a source should be excluded. | 8 years ago
Ben Kurtovic | bb819c9306 | Explicitly include excluded URLs in the result set; mark as excluded. | 8 years ago