bitshift

Commit graph

Author	SHA1	Message	Date
Severyn Kozak	7c5c9fc7e1	Add GitHub stars, Bitbucket watchers; close #14. Add: bitshift/crawler/crawler.py -Add more efficient method of querying GitHub's API for stargazer counts, by batching 25 repositories per request. -Add watcher counts for Bitbucket repositories, by querying the Bitbucket API once per repository (inefficient, but the API in question isn't sufficiently robust to accommodate a better approach, and Git repositories surface so infrequently that there shouldn't be any query limit problems).	10 years ago
Severyn Kozak	d142f1fd55	Complete Crawler. Close #15, #14, #11, #8. Several of the closed issues were addressed partly in previous commits; definitively close them with this, for the moment, final update to the crawler package. Ref: bitshift/crawler/indexer.py -move all `GitIndexer` specific functions (eg, `_decode`, `_is_ascii()`) from the global scope to the class definition.	10 years ago
Severyn Kozak	6762c1fa3d	Re-add logging, rem file filters. Add: bitshift/ __init__.py -add `_configure_logging()`, which sets up a more robust logging infrastructure than was previously used: log files are rotated once per hour, and have some additional formatting rules. (crawler, indexer).py -add hierarchically-descending loggers to individual threaded classes (`GitHubCrawler`, `GitIndexer`, etc.); add logging calls. indexer.py -remove file filtering regex matches from `_get_tracked_files()`, as non-code files will be discarded by the parsers.	10 years ago
Severyn Kozak	1b2739f8c4	Add GitHub repo star count, simple logging. Add: bitshift/crawler/crawler.py -add `_get_repo_stars()` to `GitHubCrawler`, which queries the GitHub API for the number of stars that a given repository has. -log the `next_api_url` every time it's generated by `GitHubCrawler` and `BitbucketCrawler` to two respective log-files.	10 years ago
Severyn Kozak	ad7ce9d9cf	Commit latest crawler, continue fix of #8. Add: bitshift/crawler/*.py -Remove use of the `logging` module, which appeared to be causing a memory leak even with log-file rotation.	10 years ago
Severyn Kozak	f38772760b	Remove some subprocesses, comment out logging. Add: bitshift/crawler/ (crawler, indexer).py -comment out all logging statements, as they may be causing a memory leak (the crawler is meant to run perpetually, meaning that, depending on how the `logging` module is implemented, it may be accumulating logged strings in memory.) bitshift/crawler/indexer.py -make `_index_repository()` and `_index_repository_codelets()` functions of the `GitIndexer` class. -replace `_get_tracked_files()` subprocess call, which found the files in a Git repository and removed any that were non-ASCII, with a pure Python solution. -add `_is_ascii()`.	10 years ago
Severyn Kozak	2954161747	Add partially integrated BitbucketCrawler(). Add: bitshift/crawler/ __init__.py -Initialize 'BitbucketCrawler()' singleton. -Instantiate all thread instances on-the-fly in a 'threads' array, as opposed to individual named variables. crawler.py -Add 'BitbucketCrawler()', to crawl Bitbucket for repositories. -Not entirely tested for proper functionality. -The Bitbucket framework is not yet accounted for in 'indexer._generate_file_url()'.	10 years ago
Severyn Kozak	93ed68645d	Add partially integrated BitbucketCrawler(). Add: bitshift/crawler/ __init__.py -Initialize 'BitbucketCrawler()' singleton. -Instantiate all thread instances on-the-fly in a 'threads' array, as opposed to individual named variables. crawler.py -Add 'BitbucketCrawler()', to crawl Bitbucket for repositories. -Not entirely tested for proper functionality. -The Bitbucket framework is not yet accounted for in 'indexer._generate_file_url()'.	10 years ago
Severyn Kozak	6718650a8c	First part of #8 fix. Add: bitshift/crawler/indexer.py -Add 'pkill git' to the 'git clone' subprocess in '_clone_repository()', to kill hanging remotes -- it's un-Pythonic, but, thus far, the only method that's proved successful. The RAM problem still persists; the latest dry-run lasted 01:11:00 before terminating due to a lack of allocatable memory. -Add exception names to `logging` messages. bitshift/assets -Update 'tag()' docstring to current 'bitshift' standards (add a ':type' and ':rtype:' field).	10 years ago
Severyn Kozak	3ce399adbf	Add threaded cloner, GitRepository class (#7). Add: bitshift/crawler/ (crawler, indexer).py -add a 'time.sleep()' call whenever a thread is blocking on items in a Queue, to prevent excessive polling (which hogs system resources). indexer.py -move 'git clone' functionality from the 'GitIndexer' singleton to a separate, threaded '_GitCloner'. -'crawler.GitHubCrawler' now shares a "clone" queue with '_GitCloner', which shares an "index" queue with 'GitIndexer'. -both indexing and cloning are time-intensive processes, so this improvement should (hypothetically) boost performance. -add `GitRepository` class, instances of which are passed around in the queues.	10 years ago
Severyn Kozak	755dce6ae3	Add logging to crawler/indexer. Add: bitshift/crawler/(__init__, crawler, indexer).py -add `logging` module to all `bitshift.crawler` modules, for some basic diagnostic output.	10 years ago
Severyn Kozak	f4b28e6178	Add file-ext regex rules, exception handlers. Add: bitshift/crawler/indexer.py -add two `try: except: pass` blocks, one to _decode() and another to GitIndexer.run(); bad practice, but GitIndexer has numerous unreliable moving parts that can throw too many unforeseeable exceptions. Only current viable option. -add file-extension regex ignore rules (for text, markdown, etc. files) to _get_tracked_files().	10 years ago
Severyn Kozak	627c848f20	Add tested indexer. Add: bitshift/crawler/indexer.py -add _debug(). -add content to the module docstring; add documentation to GitIndexer, and the functions that were lacking it. -add another perl one-liner to supplement the `git clone` subprocess call, which terminates it after a set amount of time (should it have frozen) -- fixes a major bug that caused the entire indexer to hang.	10 years ago
Severyn Kozak	b680756f8d	Test crawler, complete documentation. Add, Fix: bitshift/crawler/ __init__.py -add module and crawl() docstrings. -add repository_queue size limit. crawler.py -account for time spent executing an API query in the run() loop sleep() interval.	10 years ago
Severyn Kozak	b7ccec0501	Add untested threaded indexer/crawler prototype. Additions are not tested and not yet documented. Add: crawler.py -add threaded GitHubCrawler class, which interacts with a GitIndexer via a Queue. git_indexer.py -add threaded GitIndexer class, which interacts with GitHubCrawler via a Queue. -rename context-manager ChangeDir class to _ChangeDir, because it's essentially "private". __init__.py -add body to crawl(), which creates instances of GitHubCrawler and GitIndexer and starts them.	10 years ago
Severyn Kozak	97198ee523	Update Crawler documentation. Add: bitshift/crawler/git_indexer.py -add some missing docstrings, complete others.	10 years ago
Severyn Kozak	c655d97f48	Add class ChangeDir, amend unsafe subprocess. Add: bitshift/crawler/git_indexer.py -add ChangeDir class, a context-management wrapper for os.chdir(). -replace unsafe "rm -rf" subprocess call with shutil.rmtree()	10 years ago
Severyn Kozak	9fc4598001	Clean up crawler/, fix minor bugs. Add: bitshift/codelet.py -add name field to Codelet. bitshift/crawler/crawler.py -fix previously defunct code (which was committed at a point of incompletion) -- incorrect dictionary keys, etc. -reformat some function calls' argument alignment to fit PEP standards. bitshift/crawler.py -add sleep() to ensure that an API query is made at regular intervals (determined by the GitHub API limit).	10 years ago
Severyn Kozak	77b448c3de	Mod Codelet, mov codelet creation from crawler. Add: bitshift/crawler/(crawler, git_indexer).py -move Codelet creation from the crawler to the git_indexer, in preparation for making crawling/indexing independent, threaded processes. Mod: bitshift/codelet.py -modify documentation for the author instance variable.	10 years ago
Severyn Kozak	ef9c0609fe	Mov author_files > git_inder, heavily refactor. Add: bitshift/crawler/crawler.py -add base crawler module -add github(), to index GitHub. Mod: bitshift/crawler/ -add package subdirectory for the crawler module, and any subsidiary modules (eg, git_indexer). bitshift/author_files.py > bitshift/crawler/git_indexer.py -rename the module to "git_indexer", to better reflect its use. -convert from stand-alone script to a module whose functions integrate cleanly with the rest of the application. -add all necessary, tested functions, with Sphinx documentation.	10 years ago
Severyn Kozak	ef73c04347	Add prototype repo-indexer script author_files.py. Add: author_files.py -add prototype script to output metadata about every file in a Git repository: filename, author names, dates of creation and modification. -lacking Sphinx documentation.	10 years ago
Ben Kurtovic	a5cc3537cb	Credits.	10 years ago
Severyn Kozak	20b518fccc	Minor refactor of codelet. Add: bitshift/codelet.py -complete docstrings, add filename to Codelet constructor.	10 years ago
Severyn Kozak	6a4ba580ed	Add Codelet, crawler dependencies to setup. Add: bitshift/codelet.py -add Codelet class with constructor. README.md -add SASS stylesheet documentation	10 years ago
Ben Kurtovic	902d734c28	Update __init__.py.	10 years ago
Severyn Kozak	b70e2c961d	Update assets module with template docstring. Mod: bitshift/assets.py -convert existing docstrings to the Sphinx auto-doc format.	10 years ago
Ben Kurtovic	0c68988982	CREATE THE THINGS	10 years ago
Ben Kurtovic	6a9598fe12	Basic setup.py.	10 years ago
Ben Kurtovic	08249e086e	Fix __init__.py and add some info to README.	10 years ago
Ben Kurtovic	6adea4a97e	Adding basic sphinx documentation.	10 years ago
Ben Kurtovic	404a2fb7e3	Fix names in license.	10 years ago
Ben Kurtovic	82147c7b51	Fix description.	10 years ago
Severyn Kozak	6ff65c0906	Merge branch 'master' into develop Conflicts: app.py	10 years ago
Severyn Kozak	f24d2a6be2	Add assets/config module, SASS files, templates. Add: bitshift/assets.py -add module that contains functions to be called from inside the templates/ Jinja HTML files -- currently contains tag(), which generates an HTML asset tag based on a filename. bitshift/config.py -add Flask configuration module. static/(sass/main.sass, css/main.css) -create isolated directory for SASS files; compiled CSS files will be stored in static/css. static/css/_mixins.sass -add SASS partial to contain mixins (globally relevant to the project's styling). templates/layout.html -add various metadata.	10 years ago
Severyn Kozak	9d06e0c442	Add skeleton dir-structure, content to files. Add: app.py -add boilerplate Flask source. bitshift/ -directory for all python source. templates/(layout, index).html -add global layout template, and placeholder home page. static/css/main.sass -add placeholder main SASS file.	10 years ago
Ben Kurtovic	02c58909a7	app.py	10 years ago
Ben Kurtovic	e3b711ade3	Initial commit	10 years ago
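
Commit c655d97f48 in the table above describes a ChangeDir class that wraps os.chdir() as a context manager, and the replacement of an "rm -rf" subprocess call with shutil.rmtree(). The repository's actual code is not shown on this page, so the following is only a minimal sketch of that pattern; the method bodies and the `remove_clone` helper are assumptions made for illustration.

```python
import os
import shutil
import tempfile


class ChangeDir:
    """Switch the working directory on entry and restore it on exit."""

    def __init__(self, new_path):
        self.new_path = new_path
        self.old_path = None

    def __enter__(self):
        # Remember where we were so that __exit__ can go back.
        self.old_path = os.getcwd()
        os.chdir(self.new_path)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if the body raised, so the process never stays
        # stranded inside a directory that is about to be deleted.
        os.chdir(self.old_path)
        return False  # let any exception propagate


def remove_clone(path):
    """Hypothetical helper: delete a directory tree without a shell call."""
    shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    with ChangeDir(scratch):
        print("inside:", os.getcwd())
    print("restored:", os.getcwd())
    remove_clone(scratch)
```

Restoring the directory in `__exit__` rather than by hand matters for a long-running crawler: an exception part-way through indexing would otherwise leave every later relative path resolving against the wrong directory.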

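Commits 9fc4598001 and b680756f8d describe a sleep() in the crawler's run() loop that keeps API queries at regular intervals, with the time spent on the query itself subtracted from the sleep. The crawler's real loop is not shown here; the sketch below illustrates the idea only, and `remaining_sleep`, `run_at_fixed_rate`, and the injectable `clock` and `sleep` parameters are hypothetical names.

```python
import time


def remaining_sleep(interval, elapsed):
    """Seconds left to wait after a query took `elapsed` seconds."""
    # Never return a negative value: a slow query means no extra wait.
    return max(0.0, interval - elapsed)


def run_at_fixed_rate(query, interval, iterations,
                      clock=time.monotonic, sleep=time.sleep):
    """Call `query` so that consecutive calls start about `interval` seconds apart."""
    for _ in range(iterations):
        started = clock()
        query()
        # Subtract the time the query took from the sleep interval.
        sleep(remaining_sleep(interval, clock() - started))


if __name__ == "__main__":
    # Assumed interval of 0.2 s, chosen only to keep the demo short.
    run_at_fixed_rate(lambda: print("query"), interval=0.2, iterations=3)
```

Passing `clock` and `sleep` as parameters is a design choice for testability: the timing logic can be checked with fake functions instead of waiting in real time.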
37 commits (7c5c9fc7e1c99c1d67146570c43e60d0b04c899f), all branches
