Ben Kurtovic | c68a5e6dfb | Fix Page.toggle_talk() on mainspace titles with colons. | 5 years ago
Ben Kurtovic | 7d7d1aceea | Update dependencies, copyright year. | 7 years ago
Ben Kurtovic | aed5a5954d | Fix SitesDB lookup for sites with overlapping URLs. | 8 years ago
Ben Kurtovic | a463c6d052 | Fix lazy loading bug where lxml.etree wasn't accessible to bs4. | 8 years ago
Ben Kurtovic | f2099df5d5 | Minor refactor in HTML parser. | 8 years ago
Ben Kurtovic | fbb9ea7b03 | Catch empty Google results properly. | 8 years ago
Ben Kurtovic | aba91c0f1c | Missing comma. | 8 years ago
Ben Kurtovic | a95356676b | Add GoogleSearchEngine. | 8 years ago
Ben Kurtovic | 98d0977c19 | Refactor search; cleanup; fixup. | 8 years ago
Ben Kurtovic | 7853bcc0f3 | Fix dependency checking for search engines. | 8 years ago
Ben Kurtovic | 76b068c4df | Add Yandex proxy support. | 8 years ago
Ben Kurtovic | a0d7eb62a2 | Add Yandex search support. | 8 years ago
Ben Kurtovic | 04ed5257c7 | Refactor search engines. | 8 years ago
Ben Kurtovic | 80890fb191 | WebFileType doesn't work | 8 years ago
Ben Kurtovic | 977b587e5e | Add support for Bing Search | 8 years ago
Ben Kurtovic | 69cdb41d07 | Adjust mirror hints to include direct links back to the article. | 8 years ago
Ben Kurtovic | b4b079ffd0 | Update copyright year for 2016. | 9 years ago
Ben Kurtovic | 4828cbad69 | Catch possible ValueError when doing opener.open(). | 9 years ago
Ben Kurtovic | 48a14ee3ed | Don't log the full debug line when sending a lot of data. | 9 years ago
Ben Kurtovic | eceb4d139a | Minor refactor. | 9 years ago
Ben Kurtovic | f92fb34d0e | Improve sentence splitting, again. | 9 years ago
Ben Kurtovic | 75058997c2 | Split copyvio queries a bit differently; maybe better on other languages. | 9 years ago
Ben Kurtovic | f52fb06c19 | Add a debug message when catching ParserExclusionError. | 9 years ago
Ben Kurtovic | c81d1d949d | Update global exclusion lists more often than site-specific ones. | 9 years ago
Ben Kurtovic | 108eca13ac | Finish mirror hinting algorithm. | 9 years ago
Ben Kurtovic | 91846ce4fb | Refactor out mirror hinting logic in source parsers. | 9 years ago
Ben Kurtovic | 147b46f572 | A couple more fixes and cleanup. | 9 years ago
Ben Kurtovic | 03910b6cb5 | Add mirror detection logic to parsers; fixes. | 9 years ago
Ben Kurtovic | 81a090c923 | Allow content parsers to signal that a source should be excluded. | 9 years ago
Ben Kurtovic | bb819c9306 | Explicitly include excluded URLs in the result set; mark as excluded. | 9 years ago
Ben Kurtovic | e99e1c1ef1 | Typo fix. | 9 years ago
Ben Kurtovic | 509598d7fc | Try merging in templates with parameter values of a certain size (fixes #42) | 9 years ago
Ben Kurtovic | d741667c4c | Try using pentagrams rather than trigrams for copyvio Markov chains. | 9 years ago
Ben Kurtovic | b315e9bdc5 | Default to useHTTPS=True for new Sites. | 9 years ago
Ben Kurtovic | 68e3ec2a73 | Don't pass starttimestamp when None. This sometimes happens when pages are edited without loading them beforehand. | 9 years ago
Ben Kurtovic | 4e8be871b7 | Update copyright year for 2015. | 9 years ago
Ben Kurtovic | 09319b1675 | Don't die on broken regexes. | 9 years ago
Ben Kurtovic | 4cdfafd487 | Skip site check. | 9 years ago
Ben Kurtovic | 4075d887e9 | Fix return. | 9 years ago
Ben Kurtovic | a2c10650a8 | Add support for User:EranBot/Copyright/Blacklist (closes #52) | 9 years ago
Ben Kurtovic | 9ffc3f1bf5 | Raise file crawl size limit for PDFs. | 10 years ago
Ben Kurtovic | 33107c6a01 | Add a lastrevid property to pages. | 10 years ago
Ben Kurtovic | b87d5ac673 | Pass parameter to recursive call. | 10 years ago
Ben Kurtovic | 170f810735 | Allow ExclusionDB to force a sync. | 10 years ago
Ben Kurtovic | 6ae3cd6d08 | Handle interwiki page titles correctly. | 10 years ago
Ben Kurtovic | 901192ec18 | Handle errors from UnicodeDamnit. | 10 years ago
Ben Kurtovic | 3f2dd1094f | Catch HTTPException in opener.open. | 10 years ago
Ben Kurtovic | 9eaad11efb | Fix unicode bug in exception. | 10 years ago
Ben Kurtovic | 699f6e3b17 | Seems it will sometimes raise AssertionError. | 10 years ago
Ben Kurtovic | 12c5170815 | Catch another exception thrown by pdfminer. | 10 years ago