diff --git a/README.rst b/README.rst index 854ad08..5824805 100644 --- a/README.rst +++ b/README.rst @@ -19,10 +19,7 @@ made over 50,000 edits. A project to rewrite it from scratch began in early April 2011, thus moving away from the Pywikipedia framework and allowing for less overall code, better -integration between bot parts, and easier maintenance. Thanks to abstraction of -the core bot from tasks specific to my instance of it, the bot core can now be -used by other people with little additional work. How to take full advantage of -it is explained below. +integration between bot parts, and easier maintenance. Installation ------------ @@ -71,9 +68,9 @@ Setup ----- The bot stores its data in a "working directory", including its config file and -some databases. This is also the location where you will place custom IRC -commands and bot tasks, which will be explained later. It doesn't matter where -this directory is, as long as the bot can write to it. +databases. This is also the location where you will place custom IRC commands +and bot tasks, which will be explained later. It doesn't matter where this +directory is, as long as the bot can write to it. Start the bot with ``earwigbot path/to/working/dir``, or just ``earwigbot`` if the working directory is the current directory. It will notice that no @@ -90,19 +87,16 @@ then wait for instructions (as commands on IRC). For a list of commands, say "``!help``" (commands are messages prefixed with an exclamation mark). You can stop the bot at any time with Control+C, same as you stop a normal -Python program, and it will exit safely. You can also use the "``!quit``" -command on IRC for the same purpose. +Python program, and it will try to exit safely. You can also use the +"``!quit``" command on IRC. Customizing ----------- -The bot's directory contains a ``commands`` subdirectory and a ``tasks`` -subdirectory. Custom IRC commands can be placed in the former, whereas custom -wiki bot tasks go into the latter. 
Developing custom modules is explained -below, and in more detail through the bot's documentation on PyPI_. - -You can easily reload commands and tasks without restarting the bot by using -"``!reload``". +The bot's working directory contains a ``commands`` subdirectory and a +``tasks`` subdirectory. Custom IRC commands can be placed in the former, +whereas custom wiki bot tasks go into the latter. Developing custom modules is +explained below, and in more detail through the bot's documentation on PyPI_. Note that custom commands will override built-in commands and tasks with the same name. @@ -110,9 +104,55 @@ same name. ``Bot`` and ``BotConfig`` ~~~~~~~~~~~~~~~~~~~~~~~~~ -- ``bot.wiki``: entry into `the Wiki Toolset`_, explained below. +`earwigbot.bot.Bot`_ is EarwigBot's main class. You don't have to instantiate +this yourself, but it's good to be familiar with its attributes and methods, +because it is the main way to communicate with other parts of the bot. A +``Bot`` object is accessible as an attribute of commands and tasks (i.e., +``self.bot``). + +The most useful attributes are: + +- ``bot.config``: an instance of ``BotConfig``, for accessing the bot's + configuration data (see below). + +- ``bot.commands``: the bot's ``CommandManager``, which is used internally to + run IRC commands (through ``bot.commands.call()``, which you shouldn't have + to use); you can safely reload all commands with ``bot.commands.load()``. + +- ``bot.tasks``: the bot's ``TaskManager``, which can be used to start tasks + with ``bot.tasks.start(task_name, **kwargs)``. ``bot.tasks.load()`` can be + used to safely reload all tasks. + +- ``bot.frontend`` / ``bot.watcher``: instances of ``earwigbot.irc.Frontend`` + and ``earwigbot.irc.Watcher``, respectively, which represent the bot's + connections to these two servers; you can, for example, send a message to the + frontend with ``bot.frontend.say(chan, msg)`` (more on communicating with IRC + below). 
+ +- ``bot.wiki``: interface with `the Wiki Toolset`_ (see below). -XXX: TODO +- Finally, ``bot.restart()`` (restarts IRC components and reloads config, + commands, and tasks) and ``bot.stop()`` can be used almost anywhere. Both + take an optional "reason" that will be logged and used as the quit message + when disconnecting from IRC. + +`earwigbot.config.BotConfig`_ stores configuration information for the bot. Its +docstring explains what each attribute is used for, but essentially +each "node" (one of ``config.components``, ``wiki``, ``tasks``, ``irc``, or +``metadata``) maps to a section of the bot's ``config.yml`` file. For example, +if ``config.yml`` includes something like:: + + irc: + frontend: + nick: MyAwesomeBot + channels: + - "##earwigbot" + - "#channel" + - "#other-channel" + +...then ``config.irc["frontend"]["nick"]`` will be ``"MyAwesomeBot"`` and +``config.irc["frontend"]["channels"]`` will be ``["##earwigbot", "#channel", +"#other-channel"]``. Custom IRC commands ~~~~~~~~~~~~~~~~~~~ @@ -120,9 +160,8 @@ Custom IRC commands Custom commands are subclasses of `earwigbot.commands.BaseCommand`_ that override ``BaseCommand``'s ``process()`` (and optionally ``check()``) methods. -``BaseCommand``'s ``__doc__``-strings should explain what each attribute and -method is for and what they should be overridden with, but these are the -basics: +``BaseCommand``'s docstrings should explain what each attribute and method is +for and what they should be overridden with, but these are the basics: - Class attribute ``name`` is the name of the command. This must be specified. @@ -145,6 +184,7 @@ basics: goes. To respond to IRC messages, there are a number of methods of ``BaseCommand`` at your disposal. See the the test_ command for a simple example, or look in BaseCommand's ``__init__`` method for the full list. 
+ The most common ones are ``self.say(chan_or_user, msg)``, ``self.reply(data, msg)`` (convenience function; sends a reply to the issuer of the command in the channel it was received), @@ -158,7 +198,7 @@ for readability. The bot has a wide selection of built-in commands and plugins to act as sample code and/or to give ideas. Start with test_, and then check out chanops_ and -afc_status_ for some more complicated scripts! +afc_status_ for some more complicated scripts. Custom bot tasks ~~~~~~~~~~~~~~~~ @@ -166,8 +206,8 @@ Custom bot tasks Custom tasks are subclasses of `earwigbot.tasks.BaseTask`_ that override ``BaseTask``'s ``run()`` (and optionally ``setup()``) methods. -``BaseTask``'s ``_doc__``-strings should explain what each attribute and method -is for and what they should be overridden with, but these are the basics: +``BaseTask``'s docstrings should explain what each attribute and method is for +and what they should be overridden with, but these are the basics: - Class attribute ``name`` is the name of the task. This must be specified. @@ -176,10 +216,11 @@ is for and what they should be overridden with, but these are the basics: For example, EarwigBot's ``config.wiki["summary"]`` is ``"([[WP:BOT|Bot]]; [[User:EarwigBot#Task $1|Task $1]]): $2"``, which the task class's ``make_summary(comment)`` method will take and replace ``$1`` - with the task number and ``$2`` with the details of the edit. Additionally, - ``shutoff_enabled()`` (which checks whether the bot has been told to stop - on-wiki by checking the content of a particular page) can check a different - page for each task using similar variables. EarwigBot's + with the task number and ``$2`` with the details of the edit. + + Additionally, ``shutoff_enabled()`` (which checks whether the bot has been + told to stop on-wiki by checking the content of a particular page) can check + a different page for each task using similar variables. 
EarwigBot's ``config.wiki["shutoff"]["page"]`` is ``"User:$1/Shutoff/Task $2"``; ``$1`` is substituted with the bot's username, and ``$2`` is substituted with the task number, so, e.g., task #14 checks the page @@ -192,7 +233,8 @@ is for and what they should be overridden with, but these are the basics: - Method ``setup()`` is called *once* with no arguments immediately after the task is first loaded. Does nothing by default; treat it like an ``__init__()`` if you want (``__init__()`` does things by default and a - dedicated setup method is easier than asking people to use ``super``). + dedicated setup method is often easier than overriding ``__init__()`` and + using ``super``). - Method ``run()`` is called with any number of keyword arguments every time the task is executed (by ``bot.tasks.start(task_name, **kwargs)``, usually). @@ -218,16 +260,201 @@ The Wiki Toolset EarwigBot's answer to the `Pywikipedia framework`_ is the Wiki Toolset (``earwigbot.wiki``), which you will mainly access through ``bot.wiki``. -XXX: TODO +``bot.wiki`` provides three methods for the management of Sites - +``get_site()``, ``add_site()``, and ``remove_site()``. Sites are objects that +simply represent a MediaWiki site. A single instance of EarwigBot (i.e. a +single *working directory*) is expected to relate to a single site or group of +sites using the same login info (like all WMF wikis with CentralAuth). + +Load your default site (the one that you picked during setup) with +``site = bot.wiki.get_site()``. + +Dealing with other sites +^^^^^^^^^^^^^^^^^^^^^^^^ + +*Skip this section if you're only working with one site.* + +If a site is *already known to the bot* (meaning that it is stored in the +``sites.db`` file, which includes just your default wiki at first), you can +load a site with ``site = bot.wiki.get_site(name)``, where ``name`` might be +``"enwiki"`` or ``"frwiktionary"`` (you can also do +``site = bot.wiki.get_site(project="wikipedia", lang="en")``). 
Recall that not +giving any arguments to ``get_site()`` will return the default site. + +``add_site()`` is used to add new sites to the sites database. It may be called +with similar arguments as ``get_site()``, but the difference is important. +``get_site()`` only needs enough information to identify the site in its +database, which is usually just its name; the database stores all other +necessary connection info. With ``add_site()``, you need to provide enough +connection info so the toolset can successfully access the site's API/SQL +databases and store that information for later. That might not be much; for +WMF wikis, you can usually use code like this:: + + project, lang = "wikipedia", "es" + try: + site = bot.wiki.get_site(project=project, lang=lang) + except earwigbot.SiteNotFoundError: + # Load site info from http://es.wikipedia.org/w/api.php: + site = bot.wiki.add_site(project=project, lang=lang) + +This works because EarwigBot assumes that the URL for the site is +``"//{lang}.{project}.org"`` and the API is at ``/w/api.php``; this might +change if you're dealing with non-WMF wikis, where the code might look +something more like:: + + project, lang = "mywiki", "it" + try: + site = bot.wiki.get_site(project=project, lang=lang) + except earwigbot.SiteNotFoundError: + # Load site info from http://mysite.net/mywiki/it/s/api.php: + base_url = "http://mysite.net/" + project + "/" + lang + db_name = lang + project + "_p" + sql = {"host": "sql.mysite.net", "db": db_name} + site = bot.wiki.add_site(base_url=base_url, script_path="/s", sql=sql) + +``remove_site()`` does the opposite of ``add_site()``: give it a site's name +or a project/lang pair like ``get_site()`` takes, and it'll remove that site +from the sites database. 
+ +Sites +^^^^^ + +``Site`` objects provide the following attributes: + +- ``name``: the site's name (or "wikiid"), like ``"enwiki"`` +- ``project``: the site's project name, like ``"wikipedia"`` +- ``lang``: the site's language code, like ``"en"`` +- ``domain``: the site's web domain, like ``"en.wikipedia.org"`` + +and the following methods: + +- ``api_query(**kwargs)``: does an API query with the given keyword arguments + as params +- ``sql_query(query, params=(), ...)``: does an SQL query and yields its + results (as a generator) +- ``get_replag()``: returns the estimated database replication lag (if we have + the site's SQL connection info) +- ``namespace_id_to_name(id, all=False)``: given a namespace ID, returns the + primary associated namespace name (or a list of all names when ``all`` is + ``True``) +- ``namespace_name_to_id(name)``: given a namespace name, returns the + associated namespace ID +- ``get_page(title, follow_redirects=False)``: returns a ``Page`` object for + the given title (or a ``Category`` object if the page's namespace is + "``Category:``") +- ``get_category(catname, follow_redirects=False)``: returns a ``Category`` + object for the given title (sans namespace) +- ``get_user(username)``: returns a ``User`` object for the given username + +Pages (and Categories) +^^^^^^^^^^^^^^^^^^^^^^ + +Create ``Page`` objects with ``site.get_page(title)``, +``page.toggle_talk()``, ``user.get_userpage()``, or ``user.get_talkpage()``. 
+They provide the following attributes: + +- ``title``: the page's title, or pagename +- ``exists``: whether the page exists +- ``pageid``: an integer ID representing the page +- ``url``: the page's URL +- ``namespace``: the page's namespace as an integer +- ``protection``: the page's current protection status +- ``is_talkpage``: ``True`` if the page is a talkpage, else ``False`` +- ``is_redirect``: ``True`` if the page is a redirect, else ``False`` + +and the following methods: + +- ``reload()``: forcibly reload the page's attributes (emphasis on *reload* - + this is only necessary if there is reason to believe they have changed) +- ``toggle_talk(...)``: returns a content page's talk page, or vice versa +- ``get()``: returns page content +- ``get_redirect_target()``: if the page is a redirect, returns its destination +- ``get_creator()``: returns a ``User`` object representing the first user to + edit the page +- ``edit(text, summary, minor=False, bot=True, force=False)``: replaces the + page's content with ``text`` or creates a new page +- ``add_section(text, title, minor=False, bot=True, force=False)``: adds a new + section named ``title`` at the bottom of the page +- ``copyvio_check(...)``: checks the page for copyright violations +- ``copyvio_compare(url, ...)``: checks the page like ``copyvio_check()``, but + against a specific URL + +Additionally, ``Category`` objects (created with ``site.get_category(name)`` or +``site.get_page(title)`` where ``title`` is in the ``Category:`` namespace) +provide the following additional method: + +- ``get_members(use_sql=False, limit=None)``: returns a list of page titles in + the category (limit is ``50`` by default if using the API) + +Users +^^^^^ + +Create ``User`` objects with ``site.get_user(name)`` or +``page.get_creator()``. 
They provide the following attributes: + +- ``name``: the user's username +- ``exists``: ``True`` if the user exists, or ``False`` if they do not +- ``userid``: an integer ID representing the user +- ``blockinfo``: information about any current blocks on the user (``False`` if + no block, or a dict of ``{"by": blocking_user, "reason": block_reason, + "expiry": block_expire_time}``) +- ``groups``: a list of the user's groups +- ``rights``: a list of the user's rights +- ``editcount``: the number of edits made by the user +- ``registration``: the time the user registered as a ``time.struct_time`` +- ``emailable``: ``True`` if you can email the user, ``False`` if you cannot +- ``gender``: the user's gender (``"male"``, ``"female"``, or ``"unknown"``) + +and the following methods: + +- ``reload()``: forcibly reload the user's attributes (emphasis on *reload* - + this is only necessary if there is reason to believe they have changed) +- ``get_userpage()``: returns a ``Page`` object representing the user's + userpage +- ``get_talkpage()``: returns a ``Page`` object representing the user's + talkpage + +Additional features +^^^^^^^^^^^^^^^^^^^ + +Not all aspects of the toolset are covered here. Explore `its code and +docstrings`_ to learn how to use it in a more hands-on fashion. For reference, +``bot.wiki`` is an instance of ``earwigbot.wiki.SitesDB`` tied to the +``sites.db`` file in the bot's working directory. Tips ---- -- Logging_ is a fantastic way to track the bot's progress +- Logging_ is a fantastic way to monitor the bot's progress as it runs. It has + a slew of built-in loggers, and enabling log retention (so logs are saved to + ``logs/`` in the working directory) is highly recommended. In the normal + setup, there are three log files, each of which "rotate" at a specific time + (``filename.log`` becomes ``filename.log.2012-04-10``, for example). The + ``debug.log`` file rotates every hour, and maintains six hours of logs of + every level (``DEBUG`` and up). 
``bot.log`` rotates every day at midnight, + and maintains seven days of non-debug logs (``INFO`` and up). Finally, + ``error.log`` rotates every Sunday night, and maintains four weeks of logs + indicating unexpected events (``WARNING`` and up). + + To use logging in your commands or tasks (recommended), ``BaseCommand`` and + ``BaseTask`` provide ``logger`` attributes configured for the specific + command or task. If you're working with other classes, ``bot.logger`` is the + root logger (``logging.getLogger("earwigbot")`` by default), so you can use + ``getChild`` to make your logger. For example, task loggers are essentially + ``bot.logger.getChild("tasks").getChild(task.name)``. + +- A very useful IRC command is "``!reload``", which reloads all commands and + tasks without restarting the bot. [3]_ Combined with using the `!git plugin`_ + for pulling repositories from IRC, this can provide a seamless command/task + development workflow if the bot runs on an external server and you set up + its working directory as a git repo. - You can run a task by itself instead of the entire bot with ``earwigbot path/to/working/dir --task task_name``. +- Questions, comments, or suggestions about the documentation? `Let me know`_ + so I can improve it for other people. + Footnotes --------- @@ -250,6 +477,9 @@ Footnotes ones generated by joins will only have ``chan``, ``nick``, ``ident``, and ``host``. +.. [3] In reality, all this does is call ``bot.commands.load()`` and + ``bot.tasks.load()``! + .. _EarwigBot: http://en.wikipedia.org/wiki/User:EarwigBot .. _Python: http://python.org/ .. _Wikipedia: http://en.wikipedia.org/ @@ -264,6 +494,8 @@ Footnotes .. _get pip: http://pypi.python.org/pypi/pip .. _git flow: http://nvie.com/posts/a-successful-git-branching-model/ .. _explanation of YAML: http://en.wikipedia.org/wiki/YAML +.. _earwigbot.bot.Bot: https://github.com/earwig/earwigbot/blob/develop/earwigbot/bot.py +.. 
_earwigbot.config.BotConfig: https://github.com/earwig/earwigbot/blob/develop/earwigbot/config.py .. _earwigbot.commands.BaseCommand: https://github.com/earwig/earwigbot/blob/develop/earwigbot/commands/__init__.py .. _afc_status: https://github.com/earwig/earwigbot-plugins/blob/develop/commands/afc_status.py .. _chanops: https://github.com/earwig/earwigbot/blob/develop/earwigbot/commands/chanops.py @@ -271,5 +503,8 @@ Footnotes .. _earwigbot.tasks.BaseTask: https://github.com/earwig/earwigbot/blob/develop/earwigbot/tasks/__init__.py .. _wikiproject_tagger: https://github.com/earwig/earwigbot/blob/develop/earwigbot/tasks/wikiproject_tagger.py .. _afc_statistics: https://github.com/earwig/earwigbot-plugins/blob/develop/tasks/afc_statistics.py +.. _its code and docstrings: https://github.com/earwig/earwigbot/tree/develop/earwigbot/wiki .. _logging: http://docs.python.org/library/logging.html +.. _Let me know: ben.kurtovic@verizon.net +.. _!git plugin: https://github.com/earwig/earwigbot-plugins/blob/develop/commands/git.py .. 
_ident: http://en.wikipedia.org/wiki/Ident diff --git a/earwigbot/irc/frontend.py b/earwigbot/irc/frontend.py index 2065ff1..b13f5b0 100644 --- a/earwigbot/irc/frontend.py +++ b/earwigbot/irc/frontend.py @@ -57,7 +57,7 @@ class Frontend(IRCConnection): data.nick, data.ident, data.host = self.sender_regex.findall(line[0])[0] data.chan = line[2] data.parse_args() - self.bot.commands.check("join", data) + self.bot.commands.call("join", data) elif line[1] == "PRIVMSG": data.nick, data.ident, data.host = self.sender_regex.findall(line[0])[0] @@ -69,13 +69,13 @@ class Frontend(IRCConnection): # This is a privmsg to us, so set 'chan' as the nick of the # sender, then check for private-only command hooks: data.chan = data.nick - self.bot.commands.check("msg_private", data) + self.bot.commands.call("msg_private", data) else: # Check for public-only command hooks: - self.bot.commands.check("msg_public", data) + self.bot.commands.call("msg_public", data) # Check for command hooks that apply to all messages: - self.bot.commands.check("msg", data) + self.bot.commands.call("msg", data) elif line[0] == "PING": # If we are pinged, pong back self.pong(line[1]) diff --git a/earwigbot/managers.py b/earwigbot/managers.py index 293894c..62d1304 100644 --- a/earwigbot/managers.py +++ b/earwigbot/managers.py @@ -147,8 +147,8 @@ class CommandManager(_ResourceManager): base = super(CommandManager, self) base.__init__(bot, "commands", "Command", BaseCommand) - def check(self, hook, data): - """Given an IRC event, check if there's anything we can respond to.""" + def call(self, hook, data): + """Given a hook type and a Data object, respond appropriately.""" self.lock.acquire() for command in self._resources.itervalues(): if hook in command.hooks and command._wrap_check(data): diff --git a/earwigbot/wiki/category.py b/earwigbot/wiki/category.py index 67426f4..bcc8cb1 100644 --- a/earwigbot/wiki/category.py +++ b/earwigbot/wiki/category.py @@ -35,7 +35,7 @@ class Category(Page): because it 
accepts category names without the namespace prefix. Public methods: - members -- returns a list of page titles in the category + get_members -- returns a list of page titles in the category """ def __repr__(self): diff --git a/earwigbot/wiki/page.py b/earwigbot/wiki/page.py index 957878e..8772d53 100644 --- a/earwigbot/wiki/page.py +++ b/earwigbot/wiki/page.py @@ -38,19 +38,23 @@ class Page(CopyrightMixin): about the page, getting page content, and so on. Category is a subclass of Page with additional methods. + Attributes: + title -- the page's title, or pagename + exists -- whether the page exists + pageid -- an integer ID representing the page + url -- the page's URL + namespace -- the page's namespace as an integer + protection -- the page's current protection status + is_talkpage -- True if the page is a talkpage, else False + is_redirect -- True if the page is a redirect, else False + Public methods: - title -- returns the page's title, or pagename - exists -- returns whether the page exists - pageid -- returns an integer ID representing the page - url -- returns the page's URL - namespace -- returns the page's namespace as an integer - protection -- returns the page's current protection status - creator -- returns the page's creator (first user to edit) - is_talkpage -- returns True if the page is a talkpage, else False - is_redirect -- returns True if the page is a redirect, else False + reload -- forcibly reload the page's attributes toggle_talk -- returns a content page's talk page, or vice versa get -- returns page content get_redirect_target -- if the page is a redirect, returns its destination + get_creator -- returns a User object representing the first person + to edit the page edit -- replaces the page's content or creates a new page add_section -- adds a new section at the bottom of the page copyvio_check -- checks the page for copyright violations diff --git a/earwigbot/wiki/site.py b/earwigbot/wiki/site.py index 1f4d277..4154fe0 100644 --- 
a/earwigbot/wiki/site.py +++ b/earwigbot/wiki/site.py @@ -55,16 +55,18 @@ class Site(object): instances, tools.add_site() for adding new ones to config, and tools.del_site() for removing old ones from config, should suffice. + Attributes: + name -- the site's name (or "wikiid"), like "enwiki" + project -- the site's project name, like "wikipedia" + lang -- the site's language code, like "en" + domain -- the site's web domain, like "en.wikipedia.org" + Public methods: - name -- returns our name (or "wikiid"), like "enwiki" - project -- returns our project name, like "wikipedia" - lang -- returns our language code, like "en" - domain -- returns our web domain, like "en.wikipedia.org" api_query -- does an API query with the given kwargs as params sql_query -- does an SQL query and yields its results get_replag -- returns the estimated database replication lag namespace_id_to_name -- given a namespace ID, returns associated name(s) - namespace_name_to_id -- given a namespace name, returns associated id + namespace_name_to_id -- given a namespace name, returns the associated ID get_page -- returns a Page object for the given title get_category -- returns a Category object for the given title get_user -- returns a User object for the given username diff --git a/earwigbot/wiki/user.py b/earwigbot/wiki/user.py index 2747e2d..6c51051 100644 --- a/earwigbot/wiki/user.py +++ b/earwigbot/wiki/user.py @@ -36,17 +36,20 @@ class User(object): information about the user, such as editcount and user rights, methods for returning the user's userpage and talkpage, etc. 
+ Attributes: + name -- the user's username + exists -- True if the user exists, or False if they do not + userid -- an integer ID representing the user + blockinfo -- information about any current blocks on the user + groups -- a list of the user's groups + rights -- a list of the user's rights + editcount -- the number of edits made by the user + registration -- the time the user registered as a time.struct_time + emailable -- True if you can email the user, False if you cannot + gender -- the user's gender ("male", "female", or "unknown") + Public methods: - name -- returns the user's username - exists -- returns True if the user exists, False if they do not - userid -- returns an integer ID representing the user - blockinfo -- returns information about a current block on the user - groups -- returns a list of the user's groups - rights -- returns a list of the user's rights - editcount -- returns the number of edits made by the user - registration -- returns the time the user registered as a time.struct_time - emailable -- returns True if you can email the user, False if you cannot - gender -- returns the user's gender ("male", "female", or "unknown") + reload -- forcibly reload the user's attributes get_userpage -- returns a Page object representing the user's userpage get_talkpage -- returns a Page object representing the user's talkpage """
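Review note: the rename of ``CommandManager.check()`` to ``call()`` in this diff is easier to follow with a picture of how hook dispatch works. The following is a minimal, self-contained sketch and not EarwigBot's actual implementation. ``Data``, ``EchoCommand``, and ``MiniCommandManager`` are hypothetical stand-ins invented for illustration. Only the ``hook in command.hooks`` test, the lock, and the ``call(hook, data)`` signature are taken from the diff. The real method calls ``command._wrap_check(data)``, and whatever it does after a match is not shown in the diff, so the ``process()`` call below is an assumption based on the README text.

```python
import threading


class Data:
    """Hypothetical stand-in for the bot's IRC Data object."""

    def __init__(self, nick, chan, msg):
        self.nick = nick
        self.chan = chan
        self.msg = msg


class EchoCommand:
    """Hypothetical command that reacts to public messages only."""

    name = "echo"
    hooks = ["msg_public"]

    def __init__(self):
        self.seen = []

    def check(self, data):
        # Respond only to messages that start with "!echo".
        return data.msg.startswith("!echo")

    def process(self, data):
        # Record the message instead of talking to a real IRC server.
        self.seen.append((data.chan, data.msg))


class MiniCommandManager:
    """Simplified sketch of hook-based dispatch; not EarwigBot's class."""

    def __init__(self, commands):
        self._resources = {cmd.name: cmd for cmd in commands}
        self.lock = threading.Lock()

    def call(self, hook, data):
        """Given a hook type and a Data object, run every matching command."""
        with self.lock:
            for command in self._resources.values():
                # Assumption: a command runs when it is registered for the
                # hook and its check() accepts the data.
                if hook in command.hooks and command.check(data):
                    command.process(data)


echo = EchoCommand()
manager = MiniCommandManager([echo])
manager.call("msg_public", Data("alice", "#channel", "!echo hi"))  # handled
manager.call("msg_private", Data("alice", "alice", "!echo hi"))    # wrong hook
manager.call("msg_public", Data("bob", "#channel", "hello"))       # check fails
```

In this sketch only the first call reaches ``process()``, which is why ``call`` describes the method better than ``check``: it does not just test whether a command applies, it also runs it.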