Ben Kurtovic | 0bdcbca8b0 | Rudimentary solution for PDF parsing (closes earwig/copyvios#18) | 10 years ago
Ben Kurtovic | 30f72df470 | Refactor parsers; fix empty document behavior. | 10 years ago
Ben Kurtovic | 5349179088 | Fix parsing of plain text documents (earwig/copyvios#3) | 10 years ago
Ben Kurtovic | f10908e34e | Handle struct.error from GzipFile.read() (Python bug?) | 10 years ago
Ben Kurtovic | 693cdc302f | Catch errors while searching. | 10 years ago
Ben Kurtovic | 303c39c8c7 | Add an option to disable short-circuiting. | 10 years ago
Ben Kurtovic | f8f4669460 | Remove unnecessary key attribute of sources. | 10 years ago
Ben Kurtovic | 9fd145da5c | Add some docs; better sorting function. | 10 years ago
Ben Kurtovic | 7afb484cea | Refactor a bunch of copyvio internals. Store all sources with a result object. | 10 years ago
Ben Kurtovic | e88d1c2c70 | Fix lazy module behavior after failure. | 10 years ago
Ben Kurtovic | 54ddff049f | Make CopyvioSource public; tweaks. | 10 years ago
Ben Kurtovic | 0438766ee4 | Handle empty URLs better. | 10 years ago
Ben Kurtovic | 2147207388 | Remove unnecessary variable assign. | 10 years ago
Ben Kurtovic | f94a67e0e3 | Define num_queries in the proper place. | 10 years ago
Ben Kurtovic | 12247dd756 | Add no_links and no_searches to copyvio_check(). | 10 years ago
Ben Kurtovic | f37621e5ec | Use a deque for a FIFO instead of the python list LIFO. | 10 years ago
Ben Kurtovic | 8e439e1eea | source.join() now blocks when in the middle of processing. | 10 years ago
Ben Kurtovic | dbb1ae5483 | Handle empty queues correctly. Remove some log messages. | 10 years ago
Ben Kurtovic | 2fa8aeba5b | Fix a blocking issue. | 10 years ago
Ben Kurtovic | c56838e742 | Only spawn one worker for comparisons in local mode. | 10 years ago
Ben Kurtovic | 939d8be08f | Fix variable. | 10 years ago
Ben Kurtovic | 3ed8837a3e | Fix stopping queues in local mode. | 10 years ago
Ben Kurtovic | de7576728f | Fix dequeueing logic a bit. | 10 years ago
Ben Kurtovic | b939262b11 | Bugfix. | 10 years ago
Ben Kurtovic | 32ef0fbf1f | Add a bunch of temporary debugging code. | 10 years ago
Ben Kurtovic | c7b3b7bc7f | CopyvioSource.workspace should be public. | 10 years ago
Ben Kurtovic | e73e626994 | Some locks needed to be tightened. | 10 years ago
Ben Kurtovic | 486c4692ed | Remove _workers attr of workspaces. | 10 years ago
Ben Kurtovic | 7c0e98596c | Some bugfixes. | 10 years ago
Ben Kurtovic | 361f7709f8 | Starting work on global workers. | 10 years ago
Ben Kurtovic | bdcbfa5327 | Catch errors around response.read(). | 10 years ago
Ben Kurtovic | 9b87e2e5f7 | Fix trying to remove a node that was already removed. | 10 years ago
Ben Kurtovic | 24dd497fd9 | Catch more general socket.error. | 10 years ago
Ben Kurtovic | 5e72e74759 | Employ new piecewise article-delta confidence function. | 10 years ago
Ben Kurtovic | 193f96451e | Also strip <ref>s in ArticleTextParser.strip(). | 10 years ago
Ben Kurtovic | c4dede1459 | Reorder length check to potentially fix an empty-query bug. | 10 years ago
Ben Kurtovic | 203c65280c | Float delta. | 10 years ago
Ben Kurtovic | 6b0f8ad311 | Fix reference. | 10 years ago
Ben Kurtovic | e2d7c7aef6 | Update with new confidence function; fix unicode. | 10 years ago
Ben Kurtovic | 05010933c7 | Reorder some URL opening code; zip protection. | 10 years ago
Ben Kurtovic | 4f5a22a2e5 | Apparently oauth2 converts the query to unicode. | 10 years ago
Ben Kurtovic | 5003c21ff6 | Quoting the entire query works now. | 10 years ago
Ben Kurtovic | 5677664476 | Properly encode URL for the search engine. | 10 years ago
Ben Kurtovic | 5890ee6e6a | Don't quote_plus() the query. | 10 years ago
Ben Kurtovic | 2bddf79a3d | Fix deadlock when calling queue.put() while holding the mutex. | 10 years ago
Ben Kurtovic | 7a4fcd7807 | Fix queue clear call. | 10 years ago
Ben Kurtovic | efae85a1fe | Move thread spawning code to worker class. | 10 years ago
Ben Kurtovic | 6a90efc812 | Improve !threads command output. | 10 years ago
Ben Kurtovic | 7137dda920 | Update copyvio checker to not make concurrent requests to a single domain. | 10 years ago
Ben Kurtovic | 5874467ec3 | Bugfix, cleanup. | 10 years ago