
Updating README and class docstrings throughout wiki toolset

* Much of the documentation doesn't match the code; code will be
updated in develop soon
tags/v0.1^2
Ben Kurtovic 12 years ago
parent
commit
80091b2de3
7 changed files with 306 additions and 62 deletions
  1. README.rst (+266, -31)
  2. earwigbot/irc/frontend.py (+4, -4)
  3. earwigbot/managers.py (+2, -2)
  4. earwigbot/wiki/category.py (+1, -1)
  5. earwigbot/wiki/page.py (+13, -9)
  6. earwigbot/wiki/site.py (+7, -5)
  7. earwigbot/wiki/user.py (+13, -10)

README.rst (+266, -31)

@@ -19,10 +19,7 @@ made over 50,000 edits.

A project to rewrite it from scratch began in early April 2011, thus moving
away from the Pywikipedia framework and allowing for less overall code, better
integration between bot parts, and easier maintenance. Thanks to abstraction of
the core bot from tasks specific to my instance of it, the bot core can now be
used by other people with little additional work. How to take full advantage of
it is explained below.
integration between bot parts, and easier maintenance.

Installation
------------
@@ -71,9 +68,9 @@ Setup
-----

The bot stores its data in a "working directory", including its config file and
some databases. This is also the location where you will place custom IRC
commands and bot tasks, which will be explained later. It doesn't matter where
this directory is, as long as the bot can write to it.
databases. This is also the location where you will place custom IRC commands
and bot tasks, which will be explained later. It doesn't matter where this
directory is, as long as the bot can write to it.

Start the bot with ``earwigbot path/to/working/dir``, or just ``earwigbot`` if
the working directory is the current directory. It will notice that no
@@ -90,19 +87,16 @@ then wait for instructions (as commands on IRC). For a list of commands, say
"``!help``" (commands are messages prefixed with an exclamation mark).

You can stop the bot at any time with Control+C, same as you stop a normal
Python program, and it will exit safely. You can also use the "``!quit``"
command on IRC for the same purpose.
Python program, and it will try to exit safely. You can also use the
"``!quit``" command on IRC.

Customizing
-----------

The bot's directory contains a ``commands`` subdirectory and a ``tasks``
subdirectory. Custom IRC commands can be placed in the former, whereas custom
wiki bot tasks go into the latter. Developing custom modules is explained
below, and in more detail through the bot's documentation on PyPI_.

You can easily reload commands and tasks without restarting the bot by using
"``!reload``".
The bot's working directory contains a ``commands`` subdirectory and a
``tasks`` subdirectory. Custom IRC commands can be placed in the former,
whereas custom wiki bot tasks go into the latter. Developing custom modules is
explained below, and in more detail through the bot's documentation on PyPI_.

Note that custom commands will override built-in commands and tasks with the
same name.
@@ -110,9 +104,55 @@ same name.
``Bot`` and ``BotConfig``
~~~~~~~~~~~~~~~~~~~~~~~~~

- ``bot.wiki``: entry into `the Wiki Toolset`_, explained below.
`earwigbot.bot.Bot`_ is EarwigBot's main class. You don't have to instantiate
this yourself, but it's good to be familiar with its attributes and methods,
because it is the main way to communicate with other parts of the bot. A
``Bot`` object is accessible as an attribute of commands and tasks (i.e.,
``self.bot``).

The most useful attributes are:

- ``bot.config``: an instance of ``BotConfig``, for accessing the bot's
configuration data (see below).

- ``bot.commands``: the bot's ``CommandManager``, which is used internally to
run IRC commands (through ``bot.commands.call()``, which you shouldn't have
to use); you can safely reload all commands with ``bot.commands.load()``.

- ``bot.tasks``: the bot's ``TaskManager``, which can be used to start tasks
with ``bot.tasks.start(task_name, **kwargs)``. ``bot.tasks.load()`` can be
used to safely reload all tasks.

- ``bot.frontend`` / ``bot.watcher``: instances of ``earwigbot.irc.Frontend``
and ``earwigbot.irc.Watcher``, respectively, which represent the bot's
connections to these two servers; you can, for example, send a message to the
frontend with ``bot.frontend.say(chan, msg)`` (more on communicating with IRC
below).

- ``bot.wiki``: interface with `the Wiki Toolset`_ (see below).

XXX: TODO
- Finally, ``bot.restart()`` (restarts IRC components and reloads config,
commands, and tasks) and ``bot.stop()`` can be used almost anywhere. Both
take an optional "reason" that will be logged and used as the quit message
when disconnecting from IRC.

`earwigbot.config.BotConfig`_ stores configuration information for the bot. Its
docstring explains what each attribute is used for, but essentially
each "node" (one of ``config.components``, ``wiki``, ``tasks``, ``irc``, or
``metadata``) maps to a section of the bot's ``config.yml`` file. For example,
if ``config.yml`` includes something like::

    irc:
        frontend:
            nick: MyAwesomeBot
            channels:
                - "##earwigbot"
                - "#channel"
                - "#other-channel"

...then ``config.irc["frontend"]["nick"]`` will be ``"MyAwesomeBot"`` and
``config.irc["frontend"]["channels"]`` will be ``["##earwigbot", "#channel",
"#other-channel"]``.

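As a rough illustration of the mapping described above (not EarwigBot's actual
implementation), the sketch below uses a plain dict as a stand-in for the
parsed ``config.yml`` and exposes each top-level section as an attribute. The
``SketchConfig`` class and the ``PARSED_YAML`` name are assumptions made for
this example only.

```python
# Hypothetical stand-in for BotConfig; it only illustrates how each "node"
# corresponds to a top-level section of config.yml.
PARSED_YAML = {
    "irc": {
        "frontend": {
            "nick": "MyAwesomeBot",
            "channels": ["##earwigbot", "#channel", "#other-channel"],
        }
    }
}


class SketchConfig:
    """Expose top-level sections of a parsed config as attributes."""

    NODES = ("components", "wiki", "tasks", "irc", "metadata")

    def __init__(self, data):
        for node in self.NODES:
            # Sections missing from the file become empty dicts.
            setattr(self, node, data.get(node, {}))


config = SketchConfig(PARSED_YAML)
print(config.irc["frontend"]["nick"])
print(config.irc["frontend"]["channels"])
```

Sections that are absent from the file (``wiki``, ``tasks``, and so on in this
sketch) simply come back as empty dicts, which keeps lookups from raising
``AttributeError``; whether the real ``BotConfig`` behaves this way is not
stated here.
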
Custom IRC commands
~~~~~~~~~~~~~~~~~~~
@@ -120,9 +160,8 @@ Custom IRC commands
Custom commands are subclasses of `earwigbot.commands.BaseCommand`_ that
override ``BaseCommand``'s ``process()`` (and optionally ``check()``) methods.

``BaseCommand``'s ``__doc__``-strings should explain what each attribute and
method is for and what they should be overridden with, but these are the
basics:
``BaseCommand``'s docstrings should explain what each attribute and method is
for and what they should be overridden with, but these are the basics:

- Class attribute ``name`` is the name of the command. This must be specified.

@@ -145,6 +184,7 @@ basics:
goes. To respond to IRC messages, there are a number of methods of
``BaseCommand`` at your disposal. See the test_ command for a simple
example, or look in BaseCommand's ``__init__`` method for the full list.

The most common ones are ``self.say(chan_or_user, msg)``,
``self.reply(data, msg)`` (convenience function; sends a reply to the
issuer of the command in the channel it was received),
@@ -158,7 +198,7 @@ for readability.

The bot has a wide selection of built-in commands and plugins to act as sample
code and/or to give ideas. Start with test_, and then check out chanops_ and
afc_status_ for some more complicated scripts!
afc_status_ for some more complicated scripts.

Custom bot tasks
~~~~~~~~~~~~~~~~
@@ -166,8 +206,8 @@ Custom bot tasks
Custom tasks are subclasses of `earwigbot.tasks.BaseTask`_ that override
``BaseTask``'s ``run()`` (and optionally ``setup()``) methods.

``BaseTask``'s ``_doc__``-strings should explain what each attribute and method
is for and what they should be overridden with, but these are the basics:
``BaseTask``'s docstrings should explain what each attribute and method is for
and what they should be overridden with, but these are the basics:

- Class attribute ``name`` is the name of the task. This must be specified.

@@ -176,10 +216,11 @@ is for and what they should be overridden with, but these are the basics:
For example, EarwigBot's ``config.wiki["summary"]`` is
``"([[WP:BOT|Bot]]; [[User:EarwigBot#Task $1|Task $1]]): $2"``, which the
task class's ``make_summary(comment)`` method will take and replace ``$1``
with the task number and ``$2`` with the details of the edit. Additionally,
``shutoff_enabled()`` (which checks whether the bot has been told to stop
on-wiki by checking the content of a particular page) can check a different
page for each task using similar variables. EarwigBot's
with the task number and ``$2`` with the details of the edit.
Additionally, ``shutoff_enabled()`` (which checks whether the bot has been
told to stop on-wiki by checking the content of a particular page) can check
a different page for each task using similar variables. EarwigBot's
``config.wiki["shutoff"]["page"]`` is ``"User:$1/Shutoff/Task $2"``; ``$1``
is substituted with the bot's username, and ``$2`` is substituted with the
task number, so, e.g., task #14 checks the page
@@ -192,7 +233,8 @@ is for and what they should be overridden with, but these are the basics:
- Method ``setup()`` is called *once* with no arguments immediately after the
task is first loaded. Does nothing by default; treat it like an
``__init__()`` if you want (``__init__()`` does things by default and a
dedicated setup method is easier than asking people to use ``super``).
dedicated setup method is often easier than overriding ``__init__()`` and
using ``super``).

- Method ``run()`` is called with any number of keyword arguments every time
the task is executed (by ``bot.tasks.start(task_name, **kwargs)``, usually).
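
The ``$1``/``$2`` substitution described for ``make_summary()`` and
``shutoff_enabled()`` can be sketched with plain string replacement. The
``fill_template`` helper below is hypothetical and written for this example;
the README does not show how the real methods are implemented.

```python
def fill_template(template, first, second):
    """Replace ``$1`` and ``$2`` in *template* with the given values.

    Hypothetical helper: ``$1`` is substituted first, so a literal "$2"
    inside *first* would also be replaced by the second pass.
    """
    return template.replace("$1", str(first)).replace("$2", str(second))


# Template strings quoted from the README text.
SUMMARY = "([[WP:BOT|Bot]]; [[User:EarwigBot#Task $1|Task $1]]): $2"
SHUTOFF_PAGE = "User:$1/Shutoff/Task $2"

# Task number 14 with an edit comment, as make_summary(comment) might do.
summary = fill_template(SUMMARY, 14, "Fixing a typo")

# Bot username and task number, as shutoff_enabled() might do.
shutoff = fill_template(SHUTOFF_PAGE, "EarwigBot", 14)

print(summary)
print(shutoff)
```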
@@ -218,16 +260,201 @@ The Wiki Toolset
EarwigBot's answer to the `Pywikipedia framework`_ is the Wiki Toolset
(``earwigbot.wiki``), which you will mainly access through ``bot.wiki``.

XXX: TODO
``bot.wiki`` provides three methods for the management of Sites -
``get_site()``, ``add_site()``, and ``remove_site()``. Sites are objects that
simply represent a MediaWiki site. A single instance of EarwigBot (i.e. a
single *working directory*) is expected to relate to a single site or group of
sites using the same login info (like all WMF wikis with CentralAuth).

Load your default site (the one that you picked during setup) with
``site = bot.wiki.get_site()``.

Dealing with other sites
^^^^^^^^^^^^^^^^^^^^^^^^

*Skip this section if you're only working with one site.*

If a site is *already known to the bot* (meaning that it is stored in the
``sites.db`` file, which includes just your default wiki at first), you can
load a site with ``site = bot.wiki.get_site(name)``, where ``name`` might be
``"enwiki"`` or ``"frwiktionary"`` (you can also do
``site = bot.wiki.get_site(project="wikipedia", lang="en")``). Recall that not
giving any arguments to ``get_site()`` will return the default site.

``add_site()`` is used to add new sites to the sites database. It may be called
with similar arguments as ``get_site()``, but the difference is important.
``get_site()`` only needs enough information to identify the site in its
database, which is usually just its name; the database stores all other
necessary connection info. With ``add_site()``, you need to provide enough
connection info so the toolset can successfully access the site's API/SQL
databases and store that information for later. That might not be much; for
WMF wikis, you can usually use code like this::

    project, lang = "wikipedia", "es"
    try:
        site = bot.wiki.get_site(project=project, lang=lang)
    except earwigbot.SiteNotFoundError:
        # Load site info from http://es.wikipedia.org/w/api.php:
        site = bot.wiki.add_site(project=project, lang=lang)

This works because EarwigBot assumes that the URL for the site is
``"//{lang}.{project}.org"`` and the API is at ``/w/api.php``; this might
change if you're dealing with non-WMF wikis, where the code might look
something more like::

    project, lang = "mywiki", "it"
    try:
        site = bot.wiki.get_site(project=project, lang=lang)
    except earwigbot.SiteNotFoundError:
        # Load site info from http://mysite.net/mywiki/it/s/api.php:
        base_url = "http://mysite.net/" + project + "/" + lang
        db_name = lang + project + "_p"
        sql = {"host": "sql.mysite.net", "db": db_name}
        site = bot.wiki.add_site(base_url=base_url, script_path="/s", sql=sql)

``remove_site()`` does the opposite of ``add_site()``: give it a site's name
or a project/lang pair like ``get_site()`` takes, and it'll remove that site
from the sites database.
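
To make the URL assumptions above concrete, here is a small sketch of how an
API endpoint could be derived from a base URL and a script path. Both helper
functions are hypothetical and written for this example; they are not part of
the toolset, and the default ``script_path`` of ``"/w"`` is inferred from the
``/w/api.php`` location mentioned above.

```python
def wmf_base_url(project, lang):
    """Build the protocol-relative base URL assumed for WMF-style wikis."""
    return "//{lang}.{project}.org".format(lang=lang, project=project)


def api_url(base_url, script_path="/w"):
    """Join a base URL and script path into the api.php location."""
    return base_url + script_path + "/api.php"


# Default case: only project and lang are known.
print(api_url(wmf_base_url("wikipedia", "es")))

# Non-WMF case: base URL and script path are supplied explicitly.
print(api_url("http://mysite.net/mywiki/it", script_path="/s"))
```

The second call reproduces the address given in the comment of the non-WMF
example, ``http://mysite.net/mywiki/it/s/api.php``.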

Sites
^^^^^

``Site`` objects provide the following attributes:

- ``name``: the site's name (or "wikiid"), like ``"enwiki"``
- ``project``: the site's project name, like ``"wikipedia"``
- ``lang``: the site's language code, like ``"en"``
- ``domain``: the site's web domain, like ``"en.wikipedia.org"``

and the following methods:

- ``api_query(**kwargs)``: does an API query with the given keyword arguments
as params
- ``sql_query(query, params=(), ...)``: does an SQL query and yields its
results (as a generator)
- ``get_replag()``: returns the estimated database replication lag (if we have
the site's SQL connection info)
- ``namespace_id_to_name(id, all=False)``: given a namespace ID, returns the
primary associated namespace name (or a list of all names when ``all`` is
``True``)
- ``namespace_name_to_id(name)``: given a namespace name, returns the
associated namespace ID
- ``get_page(title, follow_redirects=False)``: returns a ``Page`` object for
the given title (or a ``Category`` object if the page's namespace is
"``Category:``")
- ``get_category(catname, follow_redirects=False)``: returns a ``Category``
object for the given title (sans namespace)
- ``get_user(username)``: returns a ``User`` object for the given username

Pages (and Categories)
^^^^^^^^^^^^^^^^^^^^^^

Create ``Page`` objects with ``site.get_page(title)``,
``page.toggle_talk()``, ``user.get_userpage()``, or ``user.get_talkpage()``.
They provide the following attributes:

- ``title``: the page's title, or pagename
- ``exists``: whether the page exists
- ``pageid``: an integer ID representing the page
- ``url``: the page's URL
- ``namespace``: the page's namespace as an integer
- ``protection``: the page's current protection status
- ``is_talkpage``: ``True`` if the page is a talkpage, else ``False``
- ``is_redirect``: ``True`` if the page is a redirect, else ``False``

and the following methods:

- ``reload()``: forcibly reload the page's attributes (emphasis on *reload* -
this is only necessary if there is reason to believe they have changed)
- ``toggle_talk(...)``: returns a content page's talk page, or vice versa
- ``get()``: returns page content
- ``get_redirect_target()``: if the page is a redirect, returns its destination
- ``get_creator()``: returns a ``User`` object representing the first user to
edit the page
- ``edit(text, summary, minor=False, bot=True, force=False)``: replaces the
page's content with ``text`` or creates a new page
- ``add_section(text, title, minor=False, bot=True, force=False)``: adds a new
section named ``title`` at the bottom of the page
- ``copyvio_check(...)``: checks the page for copyright violations
- ``copyvio_compare(url, ...)``: checks the page like ``copyvio_check()``, but
against a specific URL

Additionally, ``Category`` objects (created with ``site.get_category(name)`` or
``site.get_page(title)`` where ``title`` is in the ``Category:`` namespace)
provide the following additional method:

- ``get_members(use_sql=False, limit=None)``: returns a list of page titles in
the category (limit is ``50`` by default if using the API)

Users
^^^^^

Create ``User`` objects with ``site.get_user(name)`` or
``page.get_creator()``. They provide the following attributes:

- ``name``: the user's username
- ``exists``: ``True`` if the user exists, or ``False`` if they do not
- ``userid``: an integer ID representing the user
- ``blockinfo``: information about any current blocks on the user (``False`` if
no block, or a dict of ``{"by": blocking_user, "reason": block_reason,
"expiry": block_expire_time}``)
- ``groups``: a list of the user's groups
- ``rights``: a list of the user's rights
- ``editcount``: the number of edits made by the user
- ``registration``: the time the user registered as a ``time.struct_time``
- ``emailable``: ``True`` if you can email the user, ``False`` if you cannot
- ``gender``: the user's gender (``"male"``, ``"female"``, or ``"unknown"``)

and the following methods:

- ``reload()``: forcibly reload the user's attributes (emphasis on *reload* -
this is only necessary if there is reason to believe they have changed)
- ``get_userpage()``: returns a ``Page`` object representing the user's
userpage
- ``get_talkpage()``: returns a ``Page`` object representing the user's
talkpage

Additional features
^^^^^^^^^^^^^^^^^^^

Not all aspects of the toolset are covered here. Explore `its code and
docstrings`_ to learn how to use it in a more hands-on fashion. For reference,
``bot.wiki`` is an instance of ``earwigbot.wiki.SitesDB`` tied to the
``sites.db`` file in the bot's working directory.

Tips
----

- Logging_ is a fantastic way to track the bot's progress
- Logging_ is a fantastic way to monitor the bot's progress as it runs. It has
a slew of built-in loggers, and enabling log retention (so logs are saved to
``logs/`` in the working directory) is highly recommended. In the normal
setup, there are three log files, each of which "rotates" at a specific time
(``filename.log`` becomes ``filename.log.2012-04-10``, for example). The
``debug.log`` file rotates every hour, and maintains six hours of logs of
every level (``DEBUG`` and up). ``bot.log`` rotates every day at midnight,
and maintains seven days of non-debug logs (``INFO`` and up). Finally,
``error.log`` rotates every Sunday night, and maintains four weeks of logs
indicating unexpected events (``WARNING`` and up).

To use logging in your commands or tasks (recommended), ``BaseCommand`` and
``BaseTask`` provide ``logger`` attributes configured for the specific
command or task. If you're working with other classes, ``bot.logger`` is the
root logger (``logging.getLogger("earwigbot")`` by default), so you can use
``getChild`` to make your logger. For example, task loggers are essentially
``bot.logger.getChild("tasks").getChild(task.name)``.

- A very useful IRC command is "``!reload``", which reloads all commands and
tasks without restarting the bot. [3]_ Combined with using the `!git plugin`_
for pulling repositories from IRC, this can provide a seamless command/task
development workflow if the bot runs on an external server and you set up
its working directory as a git repo.

- You can run a task by itself instead of the entire bot with ``earwigbot
path/to/working/dir --task task_name``.

- Questions, comments, or suggestions about the documentation? `Let me know`_
so I can improve it for other people.
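
The logging tip above can be approximated with the standard library alone.
This is a sketch of a comparable setup, not the bot's own configuration code:
the ``make_handler`` helper, the temporary directory standing in for
``logs/``, and the task name ``my_task`` are all assumptions for this example.
``backupCount`` keeps a number of rotated files, which only roughly matches
"six hours", "seven days", and "four weeks".

```python
import logging
import logging.handlers
import os
import tempfile

# Stand-in for logs/ in the bot's working directory.
log_dir = tempfile.mkdtemp()


def make_handler(filename, when, backups, level):
    """Create a time-rotated file handler that accepts *level* and up."""
    path = os.path.join(log_dir, filename)
    handler = logging.handlers.TimedRotatingFileHandler(
        path, when=when, interval=1, backupCount=backups, delay=True)
    handler.setLevel(level)
    return handler


root = logging.getLogger("earwigbot")
root.setLevel(logging.DEBUG)
# Hourly rotation, six old files, every level.
root.addHandler(make_handler("debug.log", "h", 6, logging.DEBUG))
# Rotation at midnight, seven old files, INFO and up.
root.addHandler(make_handler("bot.log", "midnight", 7, logging.INFO))
# Weekly rotation on Sunday ("W6"), four old files, WARNING and up.
root.addHandler(make_handler("error.log", "W6", 4, logging.WARNING))

# Child loggers inherit the handlers above through propagation.
task_logger = root.getChild("tasks").getChild("my_task")
task_logger.info("Starting run")
print(task_logger.name)
```

Because the handlers are created with ``delay=True``, a log file only appears
once a record of a matching level is actually written to it.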

Footnotes
---------

@@ -250,6 +477,9 @@ Footnotes
ones generated by joins will only have ``chan``, ``nick``, ``ident``,
and ``host``.

.. [3] In reality, all this does is call ``bot.commands.load()`` and
``bot.tasks.load()``!

.. _EarwigBot: http://en.wikipedia.org/wiki/User:EarwigBot
.. _Python: http://python.org/
.. _Wikipedia: http://en.wikipedia.org/
@@ -264,6 +494,8 @@ Footnotes
.. _get pip: http://pypi.python.org/pypi/pip
.. _git flow: http://nvie.com/posts/a-successful-git-branching-model/
.. _explanation of YAML: http://en.wikipedia.org/wiki/YAML
.. _earwigbot.bot.Bot: https://github.com/earwig/earwigbot/blob/develop/earwigbot/bot.py
.. _earwigbot.config.BotConfig: https://github.com/earwig/earwigbot/blob/develop/earwigbot/config.py
.. _earwigbot.commands.BaseCommand: https://github.com/earwig/earwigbot/blob/develop/earwigbot/commands/__init__.py
.. _afc_status: https://github.com/earwig/earwigbot-plugins/blob/develop/commands/afc_status.py
.. _chanops: https://github.com/earwig/earwigbot/blob/develop/earwigbot/commands/chanops.py
@@ -271,5 +503,8 @@ Footnotes
.. _earwigbot.tasks.BaseTask: https://github.com/earwig/earwigbot/blob/develop/earwigbot/tasks/__init__.py
.. _wikiproject_tagger: https://github.com/earwig/earwigbot/blob/develop/earwigbot/tasks/wikiproject_tagger.py
.. _afc_statistics: https://github.com/earwig/earwigbot-plugins/blob/develop/tasks/afc_statistics.py
.. _its code and docstrings: https://github.com/earwig/earwigbot/tree/develop/earwigbot/wiki
.. _logging: http://docs.python.org/library/logging.html
.. _Let me know: ben.kurtovic@verizon.net
.. _!git plugin: https://github.com/earwig/earwigbot-plugins/blob/develop/commands/git.py
.. _ident: http://en.wikipedia.org/wiki/Ident

earwigbot/irc/frontend.py (+4, -4)

@@ -57,7 +57,7 @@ class Frontend(IRCConnection):
             data.nick, data.ident, data.host = self.sender_regex.findall(line[0])[0]
             data.chan = line[2]
             data.parse_args()
-            self.bot.commands.check("join", data)
+            self.bot.commands.call("join", data)

         elif line[1] == "PRIVMSG":
             data.nick, data.ident, data.host = self.sender_regex.findall(line[0])[0]
@@ -69,13 +69,13 @@ class Frontend(IRCConnection):
                 # This is a privmsg to us, so set 'chan' as the nick of the
                 # sender, then check for private-only command hooks:
                 data.chan = data.nick
-                self.bot.commands.check("msg_private", data)
+                self.bot.commands.call("msg_private", data)
             else:
                 # Check for public-only command hooks:
-                self.bot.commands.check("msg_public", data)
+                self.bot.commands.call("msg_public", data)

             # Check for command hooks that apply to all messages:
-            self.bot.commands.check("msg", data)
+            self.bot.commands.call("msg", data)

         elif line[0] == "PING":  # If we are pinged, pong back
             self.pong(line[1])


earwigbot/managers.py (+2, -2)

@@ -147,8 +147,8 @@ class CommandManager(_ResourceManager):
         base = super(CommandManager, self)
         base.__init__(bot, "commands", "Command", BaseCommand)

-    def check(self, hook, data):
-        """Given an IRC event, check if there's anything we can respond to."""
+    def call(self, hook, data):
+        """Given a hook type and a Data object, respond appropriately."""
         self.lock.acquire()
         for command in self._resources.itervalues():
             if hook in command.hooks and command._wrap_check(data):
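
The hunk above is cut off after the hook test, so what happens to a matching
command is not shown. As a hedged illustration of the dispatch idea only, the
sketch below filters commands by hook and by a check, then runs the ones that
match. ``EchoCommand``, the dict-based ``data``, and the free function
``call`` are hypothetical stand-ins, not EarwigBot classes.

```python
class EchoCommand:
    """Toy command that responds to any message addressed to 'echo'."""

    name = "echo"
    hooks = ["msg"]

    def check(self, data):
        # Decide whether this command wants to handle the event.
        return data.get("command") == self.name

    def process(self, data):
        return "echo: " + data["text"]


def call(commands, hook, data):
    """Run every command registered for *hook* whose check accepts *data*."""
    results = []
    for command in commands:
        if hook in command.hooks and command.check(data):
            results.append(command.process(data))
    return results


commands = [EchoCommand()]
event = {"command": "echo", "text": "hi"}
print(call(commands, "msg", event))   # handled: hook and check both match
print(call(commands, "join", event))  # ignored: command has no "join" hook
```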


earwigbot/wiki/category.py (+1, -1)

@@ -35,7 +35,7 @@ class Category(Page):
     because it accepts category names without the namespace prefix.

     Public methods:
-    members -- returns a list of page titles in the category
+    get_members -- returns a list of page titles in the category
     """

     def __repr__(self):


earwigbot/wiki/page.py (+13, -9)

@@ -38,19 +38,23 @@ class Page(CopyrightMixin):
about the page, getting page content, and so on. Category is a subclass of
Page with additional methods.

Attributes:
title -- the page's title, or pagename
exists -- whether the page exists
pageid -- an integer ID representing the page
url -- the page's URL
namespace -- the page's namespace as an integer
protection -- the page's current protection status
is_talkpage -- True if the page is a talkpage, else False
is_redirect -- True if the page is a redirect, else False

Public methods:
title -- returns the page's title, or pagename
exists -- returns whether the page exists
pageid -- returns an integer ID representing the page
url -- returns the page's URL
namespace -- returns the page's namespace as an integer
protection -- returns the page's current protection status
creator -- returns the page's creator (first user to edit)
is_talkpage -- returns True if the page is a talkpage, else False
is_redirect -- returns True if the page is a redirect, else False
reload -- forcibly reload the page's attributes
toggle_talk -- returns a content page's talk page, or vice versa
get -- returns page content
get_redirect_target -- if the page is a redirect, returns its destination
get_creator -- returns a User object representing the first person
to edit the page
edit -- replaces the page's content or creates a new page
add_section -- adds a new section at the bottom of the page
copyvio_check -- checks the page for copyright violations


earwigbot/wiki/site.py (+7, -5)

@@ -55,16 +55,18 @@ class Site(object):
instances, tools.add_site() for adding new ones to config, and
tools.del_site() for removing old ones from config, should suffice.

Attributes:
name -- the site's name (or "wikiid"), like "enwiki"
project -- the site's project name, like "wikipedia"
lang -- the site's language code, like "en"
domain -- the site's web domain, like "en.wikipedia.org"

Public methods:
name -- returns our name (or "wikiid"), like "enwiki"
project -- returns our project name, like "wikipedia"
lang -- returns our language code, like "en"
domain -- returns our web domain, like "en.wikipedia.org"
api_query -- does an API query with the given kwargs as params
sql_query -- does an SQL query and yields its results
get_replag -- returns the estimated database replication lag
namespace_id_to_name -- given a namespace ID, returns associated name(s)
namespace_name_to_id -- given a namespace name, returns associated id
namespace_name_to_id -- given a namespace name, returns the associated ID
get_page -- returns a Page object for the given title
get_category -- returns a Category object for the given title
get_user -- returns a User object for the given username


earwigbot/wiki/user.py (+13, -10)

@@ -36,17 +36,20 @@ class User(object):
information about the user, such as editcount and user rights, methods for
returning the user's userpage and talkpage, etc.

Attributes:
name -- the user's username
exists -- True if the user exists, or False if they do not
userid -- an integer ID representing the user
blockinfo -- information about any current blocks on the user
groups -- a list of the user's groups
rights -- a list of the user's rights
editcount -- the number of edits made by the user
registration -- the time the user registered as a time.struct_time
emailable -- True if you can email the user, False if you cannot
gender -- the user's gender ("male", "female", or "unknown")

Public methods:
name -- returns the user's username
exists -- returns True if the user exists, False if they do not
userid -- returns an integer ID representing the user
blockinfo -- returns information about a current block on the user
groups -- returns a list of the user's groups
rights -- returns a list of the user's rights
editcount -- returns the number of edits made by the user
registration -- returns the time the user registered as a time.struct_time
emailable -- returns True if you can email the user, False if you cannot
gender -- returns the user's gender ("male", "female", or "unknown")
reload -- forcibly reload the user's attributes
get_userpage -- returns a Page object representing the user's userpage
get_talkpage -- returns a Page object representing the user's talkpage
"""

