17 Commits (9254158fc56c6d91f3eea6454505ede8ad909a70)

Author SHA1 Message Date
  Ben Kurtovic 4baab6f57c Implement lazy importing of root-level modules and packages. 12 years ago
  Ben Kurtovic 8d8703358c More fixes and tweaks; cleanup; etc. 12 years ago
  Ben Kurtovic f993b847ab Encode URLs as UTF-8 before opening them. 12 years ago
  Ben Kurtovic 570168ed0e Institute a timeout so we don't try to open these suspicious URLs forever. 12 years ago
  Ben Kurtovic 439b855254 Fully implement logging; fix non-unicode log messages. 12 years ago
  Ben Kurtovic a074da853b More work on copyvios, including an exclusions database (#5) 12 years ago
  Ben Kurtovic c260648bdb Finish chunking algorithm, improve !link, other fixes. 12 years ago
  Ben Kurtovic 569c815d99 Implement NLTK for chunking article content (#5). 12 years ago
  Ben Kurtovic 1af4217b63 Update copyright notices and some other improvements. 12 years ago
  Ben Kurtovic d45e342bac DOCUMENT EVERYTHING (#5) 12 years ago
  Ben Kurtovic d87c226417 __repr__ and __str__ for everything per #5 and #22. 12 years ago
  Ben Kurtovic 7dbbe9683c Update imports and exceptions. 12 years ago
  Ben Kurtovic 5ca1d91f3e Use __all__ within e.w.copyvios and shorter imports 12 years ago
  Ben Kurtovic 86a8440730 Moving parsers to own file. 12 years ago
  Ben Kurtovic d4e947b98b earwigbot.wiki.copyvios.search module split 12 years ago
  Ben Kurtovic e6a381f3f7 Restructuring copyvio stuff as its own package. 12 years ago
  Ben Kurtovic 9434a416a1 Moved search engine/credential info into config proper. 12 years ago
  Ben Kurtovic f382ceb38e Pushing some smarter logic for MarkovChains 12 years ago
  Ben Kurtovic 755dff9714 Copyvios: auto-fail very small articles (< 20 chain links) 12 years ago
  Ben Kurtovic 6009c050f9 Minor integer division fix. 12 years ago
  Ben Kurtovic df7868da3e Updates to copyright violation stuff. 12 years ago
  Ben Kurtovic ee2b1133bb Algorithm for comparing article content against a suspected source using MarkovChains 12 years ago
  Ben Kurtovic 2da906109b Copyright update for 2012. 12 years ago
  Ben Kurtovic 13100533b9 CopyrightMixin needs Page._site 12 years ago
  Ben Kurtovic c48073515b #wikipedia-en-afc -> #wikipedia-en-afc-feed 12 years ago
  Ben Kurtovic 24f7eabb77 Some more work on copyvio detection code 12 years ago
  Ben Kurtovic 56e6140284 More work on copyright violation detection code. 12 years ago
  Ben Kurtovic 0b6d5eac5e Some code for copyvio detection, including querying Yahoo! BOSS correctly. 12 years ago