312 Commits (a73f618e0aec26efdc28b85ae824911e1b9536c1)

Author SHA1 Message Date
  Ben Kurtovic a73f618e0a Initial conversion to Python 3 3 years ago
  Ben Kurtovic 01dcbd4394 copyvios: Support on-error condition for URL proxying 3 years ago
  Ben Kurtovic 92d43e566e copyvios: Missed a file 3 years ago
  Ben Kurtovic 9d66ebc6b2 copyvios: Config-directed URL proxying 3 years ago
  Ben Kurtovic fe2e7879e4 Fix issues in previous commit 3 years ago
  Ben Kurtovic 2324a73624 copyvios: Refactor some parsing logic and add dynamic Blogger support 3 years ago
  Ben Kurtovic a8a605fe05 Reduce wait between queries from 2s to 1s 3 years ago
  Ben Kurtovic abb9403e5d More bug fixes 3 years ago
  Ben Kurtovic a49a82e263 Fix a few bugs 3 years ago
  Ben Kurtovic 2b5914b6ae Support parser-directed URL redirecting (for Wayback Machine PDFs) 3 years ago
  Ben Kurtovic b9074c9f9d URL exclusions: fix uppercase characters in patterns never matching 4 years ago
  Ben Kurtovic 2df64ede65 Fix not sending Content-Type in POST requests 4 years ago
  Ben Kurtovic 88f9c21111 URL exclusions: fix comment parsing 4 years ago
  Ben Kurtovic 1cdc0a5a4c Improve excluded URL list parsing 4 years ago
  Ben Kurtovic ea4ee76691 release/0.3 5 years ago
  Ben Kurtovic 774628b34e OAuth support; switch to requests; update login flow 5 years ago
  Ben Kurtovic f1b93a465a Log warnings; use rvslots when fetching revision content 5 years ago
  Ben Kurtovic 8a945b0782 Greatly simplify MarkovChain implementation 5 years ago
  Ben Kurtovic 466d3a42f1 copyvios: Minor refactor for cleaner stack frames. 5 years ago
  Ben Kurtovic 42a224f365 copyvios: Catch PDF parser exceptions more aggressively. 5 years ago
  Ben Kurtovic c68a5e6dfb Fix Page.toggle_talk() on mainspace titles with colons. 5 years ago
  Ben Kurtovic 7d7d1aceea Update dependencies, copyright year. 6 years ago
  Ben Kurtovic aed5a5954d Fix SitesDB lookup for sites with overlapping URLs. 7 years ago
  Ben Kurtovic a463c6d052 Fix lazy loading bug where lxml.etree wasn't accessible to bs4. 7 years ago
  Ben Kurtovic f2099df5d5 Minor refactor in HTML parser. 7 years ago
  Ben Kurtovic fbb9ea7b03 Catch empty Google results properly. 8 years ago
  Ben Kurtovic aba91c0f1c Missing comma. 8 years ago
  Ben Kurtovic a95356676b Add GoogleSearchEngine. 8 years ago
  Ben Kurtovic 98d0977c19 Refactor search; cleanup; fixup. 8 years ago
  Ben Kurtovic 7853bcc0f3 Fix dependency checking for search engines. 8 years ago
  Ben Kurtovic 76b068c4df Add Yandex proxy support. 8 years ago
  Ben Kurtovic a0d7eb62a2 Add Yandex search support. 8 years ago
  Ben Kurtovic 04ed5257c7 Refactor search engines. 8 years ago
  Ben Kurtovic 80890fb191 WebFileType doesn't work 8 years ago
  Ben Kurtovic 977b587e5e Add support for Bing Search 8 years ago
  Ben Kurtovic 69cdb41d07 Adjust mirror hints to include direct links back to the article. 8 years ago
  Ben Kurtovic b4b079ffd0 Update copyright year for 2016. 8 years ago
  Ben Kurtovic 4828cbad69 Catch possible ValueError when doing opener.open(). 8 years ago
  Ben Kurtovic 48a14ee3ed Don't log the full debug line when sending a lot of data. 8 years ago
  Ben Kurtovic eceb4d139a Minor refactor. 8 years ago
  Ben Kurtovic f92fb34d0e Improve sentence splitting, again. 8 years ago
  Ben Kurtovic 75058997c2 Split copyvio queries a bit differently; maybe better on other languages. 8 years ago
  Ben Kurtovic f52fb06c19 Add a debug message when catching ParserExclusionError. 8 years ago
  Ben Kurtovic c81d1d949d Update global exclusion lists more often than site-specific ones. 8 years ago
  Ben Kurtovic 108eca13ac Finish mirror hinting algorithm. 8 years ago
  Ben Kurtovic 91846ce4fb Refactor out mirror hinting logic in source parsers. 8 years ago
  Ben Kurtovic 147b46f572 A couple more fixes and cleanup. 8 years ago
  Ben Kurtovic 03910b6cb5 Add mirror detection logic to parsers; fixes. 8 years ago
  Ben Kurtovic 81a090c923 Allow content parsers to signal that a source should be excluded. 8 years ago
  Ben Kurtovic bb819c9306 Explicitly include excluded URLs in the result set; mark as excluded. 8 years ago