earwigbot

Commit Graph

Author	SHA1	Message	Date
Ben Kurtovic	a73f618e0a	Initial conversion to Python 3	3 years ago
Ben Kurtovic	fe2e7879e4	Fix issues in previous commit	3 years ago
Ben Kurtovic	2324a73624	copyvios: Refactor some parsing logic and add dynamic Blogger support	3 years ago
Ben Kurtovic	a49a82e263	Fix a few bugs	3 years ago
Ben Kurtovic	2b5914b6ae	Support parser-directed URL redirecting (for Wayback Machine PDFs)	3 years ago
Ben Kurtovic	466d3a42f1	copyvios: Minor refactor for cleaner stack frames.	5 years ago
Ben Kurtovic	42a224f365	copyvios: Catch PDF parser exceptions more aggressively.	5 years ago
Ben Kurtovic	f2099df5d5	Minor refactor in HTML parser.	7 years ago
Ben Kurtovic	eceb4d139a	Minor refactor.	8 years ago
Ben Kurtovic	f92fb34d0e	Improve sentence splitting, again.	8 years ago
Ben Kurtovic	75058997c2	Split copyvio queries a bit differently; maybe better on other languages.	8 years ago
Ben Kurtovic	91846ce4fb	Refactor out mirror hinting logic in source parsers.	8 years ago
Ben Kurtovic	147b46f572	A couple more fixes and cleanup.	8 years ago
Ben Kurtovic	03910b6cb5	Add mirror detection logic to parsers; fixes.	8 years ago
Ben Kurtovic	81a090c923	Allow content parsers to signal that a source should be excluded.	8 years ago
Ben Kurtovic	e99e1c1ef1	Typo fix.	8 years ago
Ben Kurtovic	509598d7fc	Try merging in templates with parameter values of a certain size (fixes #42)	8 years ago
Ben Kurtovic	4e8be871b7	Update copyright year for 2015.	9 years ago
Ben Kurtovic	9ffc3f1bf5	Raise file crawl size limit for PDFs.	9 years ago
Ben Kurtovic	901192ec18	Handle errors from UnicodeDamnit.	9 years ago
Ben Kurtovic	699f6e3b17	Seems it will sometimes raise AssertionError.	9 years ago
Ben Kurtovic	12c5170815	Catch another exception thrown by pdfminer.	9 years ago
Ben Kurtovic	77514ee925	Add another PDF string substitution.	9 years ago
Ben Kurtovic	0bdcbca8b0	Rudimentary solution for PDF parsing (closes earwig/copyvios#18)	9 years ago
Ben Kurtovic	30f72df470	Refactor parsers; fix empty document behavior.	9 years ago
Ben Kurtovic	5349179088	Fix parsing of plain text documents (earwig/copyvios#3)	9 years ago
Ben Kurtovic	7afb484cea	Refactor a bunch of copyvio internals. Store all sources with a result object.	9 years ago
Ben Kurtovic	54ddff049f	Make CopyvioSource public; tweaks.	9 years ago
Ben Kurtovic	9b87e2e5f7	Fix trying to remove a node that was already removed.	9 years ago
Ben Kurtovic	193f96451e	Also strip <ref>s in ArticleTextParser.strip().	9 years ago
Ben Kurtovic	c4dede1459	Reorder length check to potentially fix an empty-query bug.	9 years ago
Ben Kurtovic	5874467ec3	Bugfix, cleanup.	9 years ago
Ben Kurtovic	ae0c390ceb	Redesign copyvio internals to parallelize URL loading/parsing.	9 years ago
Ben Kurtovic	3e4dac967d	Remove auto-quotes from queries; add min_query; halve max_query.	9 years ago
Ben Kurtovic	6b146a397a	Also strip out files and categories in ATP.strip().	9 years ago
Ben Kurtovic	2dfdf1bd4a	Ensure the text is stripped properly.	10 years ago
Ben Kurtovic	3dde1c5d60	Correctly handle HTML with no <body> tags.	10 years ago
Ben Kurtovic	39d5c7c149	Update copyright notices for 2014.	10 years ago
Ben Kurtovic	ed95c99f0e	Update email address.	10 years ago
Ben Kurtovic	0b7a13eca5	Update copyright notices for 2013.	11 years ago
Ben Kurtovic	8862bec3d9	Fix statements assigned to nothing.	11 years ago
Ben Kurtovic	a4dda89a61	Various fixes for copyvios. - Fix a bug in ExclusionsDB; improve URL regexes. - NLTK's LookupError is actually an IOError. - Fix bug in __repr__ for CopyvioCheckResult. - Rewrite YahooBOSSSearchEngine to actually work with oauth2. - Search engines now take a URL opener in addition to credentials.	11 years ago
Ben Kurtovic	0ca84ab9bc	Implement lazy-importing of oauth2, nltk, and bs4.	11 years ago
Ben Kurtovic	4baab6f57c	Implement lazy importing of root-level modules and packages. - Simplify all imports - Update dependency version in setup.py - Change waitTime default from three seconds to two	11 years ago
Ben Kurtovic	7d52d4558a	Some updates to !lag.	11 years ago
Ben Kurtovic	33aa1d6744	Collapse extra newlines to avoid distorting trigrams.	11 years ago
Ben Kurtovic	5ab736723b	Fixing a number of silly mistakes; refactoring (thanks pylint)	12 years ago
Ben Kurtovic	c260648bdb	Finish chunking algorithm, improve !link, other fixes.	12 years ago
Ben Kurtovic	569c815d99	Implement NLTK for chunking article content (#5).	12 years ago
Ben Kurtovic	cb87004107	Primitive screen scraper for HTML using BeautifulSoup and LXML. Obviously this can and should be improved significantly later, but it seems good enough for now.	12 years ago

55 commits (a73f618e0aec26efdc28b85ae824911e1b9536c1)