Add:
bitshift/crawler/crawler.py
-Add more efficient method of querying GitHub's API for stargazer
counts, by batching 25 repositories per request.
-Add watcher counts for Bitbucket repositories, by querying the
Bitbucket API once per repository. This is inefficient, but the API in
question isn't robust enough to accommodate a better approach, and Git
repositories surface so infrequently on Bitbucket that there shouldn't
be any query limit problems.
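
The batching described above can be sketched as follows. This is only an illustration of grouping repositories into sets of 25; `batch()` and `repo_names` are hypothetical names, and the actual request to GitHub's API is left out.

```python
def batch(items, size=25):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical repository names, for illustration only.
repo_names = ["user/repo%d" % i for i in range(60)]

# Each batch would be covered by one API request rather than one per repository.
batches = list(batch(repo_names))
```

With 60 repositories this yields three batches, so three requests would be needed rather than 60.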
Add:
bitshift/
__init__.py
-add `_configure_logging()`, which sets up a more robust logging
infrastructure than was previously used: log files are rotated once
per hour, and have some additional formatting rules.
(crawler, indexer).py
-add hierarchically-descending loggers to individual threaded
classes (`GitHubCrawler`, `GitIndexer`, etc.); add logging calls.
indexer.py
-remove file filtering regex matches from `_get_tracked_files()`,
as non-code files will be discarded by the parsers.
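
A minimal sketch of the logging setup described above, assuming the standard library's `TimedRotatingFileHandler` is used for the hourly rotation. The function body, the log file name, the format string, and the `Worker` class are assumptions for illustration, not the code from this commit.

```python
import logging
import logging.handlers
import os
import tempfile


def configure_logging(log_dir):
    """Attach an hourly-rotating file handler to the root logger."""
    handler = logging.handlers.TimedRotatingFileHandler(
        os.path.join(log_dir, "app.log"), when="H", interval=1)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s | %(levelname)s | %(name)s | %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)
    return handler


class Worker(object):
    """Hypothetical stand-in for a threaded class such as `GitIndexer`."""

    def __init__(self):
        # A dotted name places this logger below "crawler" in the hierarchy,
        # so its records propagate up to the handlers on the root logger.
        self.logger = logging.getLogger(
            "crawler.%s" % self.__class__.__name__)


log_dir = tempfile.mkdtemp()
handler = configure_logging(log_dir)
worker = Worker()
worker.logger.info("Starting up.")
```

Naming each logger after its class keeps the records of the individual threads distinguishable in a shared log file.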
Add:
bitshift/crawler/
__init__.py
-Initialize 'BitbucketCrawler()' singleton.
-Instantiate all thread instances on-the-fly in a 'threads' array, as
opposed to individual named variables.
crawler.py
-Add 'BitbucketCrawler()', to crawl Bitbucket for repositories.
-Not entirely tested for proper functionality.
-The Bitbucket framework is not yet accounted for in
'indexer._generate_file_url()'.
Add:
bitshift/crawler/indexer.py
-add _debug().
-add content to the module docstring; add documentation to GitIndexer,
and the functions that were lacking it.
-add another perl one-liner to supplement the `git clone` subprocess
call; it terminates the clone after a set amount of time, should it
have frozen. Fixes a major bug that caused the entire indexer to hang.
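
For comparison, the same idea (killing a frozen child process after a fixed time) can be sketched without perl. The example below uses the `timeout` argument of Python 3's `subprocess.run()`, which is a different technique from the one-liner in this commit; `run_with_timeout()` is a hypothetical helper, and a sleeping child process stands in for a hung `git clone`.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run `command`; return True if it finished in time, False if it was killed."""
    try:
        subprocess.run(command, timeout=timeout_seconds,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising.
        return False
    return True


# Stand-in for a frozen `git clone`: a child process that sleeps for 30 seconds.
frozen = [sys.executable, "-c", "import time; time.sleep(30)"]
finished = run_with_timeout(frozen, timeout_seconds=1)
```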
Add, Fix:
bitshift/crawler/
__init__.py
-add module and crawl() docstrings.
-add repository_queue size limit.
crawler.py
-account for time spent executing an API query in the run() loop
sleep() interval.
Additions are not tested and not yet documented.
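
The two changes above can be sketched roughly as follows, assuming a bounded queue and a sleep that subtracts the time already spent on the query. `MAX_QUEUE_SIZE`, `API_QUERY_INTERVAL`, and `remaining_sleep()` are hypothetical; the real limit and interval are not stated here.

```python
import queue
import time

# Hypothetical values, chosen only to keep the example short.
MAX_QUEUE_SIZE = 3
API_QUERY_INTERVAL = 0.2  # seconds between the start of successive queries


def remaining_sleep(interval, started_at, now):
    """Return how long to sleep so that queries start `interval` seconds apart."""
    return max(0.0, interval - (now - started_at))


# A bounded queue: put() blocks once MAX_QUEUE_SIZE items are waiting.
repository_queue = queue.Queue(maxsize=MAX_QUEUE_SIZE)

for page in range(MAX_QUEUE_SIZE):
    started_at = time.time()
    repository_queue.put("repo-page-%d" % page)  # stands in for an API query
    time.sleep(remaining_sleep(API_QUERY_INTERVAL, started_at, time.time()))
```

Clamping at zero matters when a query takes longer than the interval: the loop then proceeds immediately instead of passing a negative value to `sleep()`.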
Add:
crawler.py
-add threaded GitHubCrawler class, which interacts with a GitIndexer
via a Queue.
git_indexer.py
-add threaded GitIndexer class, which interacts with GitHubCrawler via
a Queue.
-rename context-manager ChangeDir class to _ChangeDir, because it's
essentially "private".
__init__.py
-add body to crawl(), which creates instances of GitHubCrawler and
GitIndexer and starts them.
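
The crawler/indexer arrangement above is a producer/consumer pair sharing a queue. A minimal sketch, assuming Python 3's `queue` module: the `Crawler` and `Indexer` classes are simplified stand-ins, and the `None` sentinel that stops the consumer is an addition made so that the example terminates.

```python
import queue
import threading


class Crawler(threading.Thread):
    """Hypothetical producer: puts repository names on a shared queue."""

    def __init__(self, repository_queue, names):
        super().__init__()
        self.repository_queue = repository_queue
        self.names = names

    def run(self):
        for name in self.names:
            self.repository_queue.put(name)
        self.repository_queue.put(None)  # sentinel: nothing more to index


class Indexer(threading.Thread):
    """Hypothetical consumer: takes repository names off the shared queue."""

    def __init__(self, repository_queue):
        super().__init__()
        self.repository_queue = repository_queue
        self.indexed = []

    def run(self):
        while True:
            name = self.repository_queue.get()
            if name is None:
                break
            self.indexed.append(name)


def crawl(names):
    """Create one crawler and one indexer, start both, and wait for them."""
    repository_queue = queue.Queue()
    crawler = Crawler(repository_queue, names)
    indexer = Indexer(repository_queue)
    for thread in (crawler, indexer):
        thread.start()
    for thread in (crawler, indexer):
        thread.join()
    return indexer.indexed


indexed = crawl(["user/a", "user/b", "user/c"])
```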
Add:
bitshift/codelet.py
-add name field to Codelet.
bitshift/crawler/crawler.py
-fix previously non-functional code (committed while still incomplete):
incorrect dictionary keys, etc.
-reformat some function calls' argument alignment to fit PEP 8.
bitshift/crawler.py
-add sleep() to ensure that an API query is made at regular intervals
(determined by the GitHub API limit).