---
layout: post
title: EarwigBot and Toolserver Updates
description: More progress on EarwigBot and the Toolserver site rewrite, including dynamic backgrounds.
---
Haven’t really said much in a while, so I felt it appropriate to make a new blog post. No, I’m not dead, and yes, I am busy working on my Wikipedia responsibilities, including EarwigBot and my Toolserver site (which I’ll go into detail about in a bit). Progress is not as quick now as it was over the summer, and you can blame school for the delay in getting things done. My primary to-do list for Wikipedia right now looks like this:
#1 has a good portion of its work done, but I still have to finish the actual detection. I’ve isolated the remaining work to a few methods of `earwigbot.wiki.copyright.CopyrightMixin`: `_copyvio_strip_html()`, to extract the “text” (i.e. content inside `<p>` tags) from an HTML document (I’ll probably use something like Beautiful Soup for this); `_copyvio_strip_article()`, to extract the “text” from an article (that is, stripping templates, quotes, and references); and `_copyvio_chunk_article()`, to divide a stripped article into a list of web-searchable queries. Everything else, including the `Task` class for `afc_copyvios`, is done.
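For illustration, here’s a rough sketch of what those three helpers might look like, written as standalone Python functions rather than `CopyrightMixin` methods. The names `strip_html`, `strip_article`, and `chunk_article` and their parameters are hypothetical. The sketch swaps in the standard library’s `html.parser` where the real thing would probably use Beautiful Soup, and its wikitext handling only covers references and (possibly nested) templates, not quotes.

```python
import re
from html.parser import HTMLParser


class ParagraphTextParser(HTMLParser):
    """Collects the text inside <p> elements (a stdlib stand-in for Beautiful Soup)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.in_p = False
        self.buffer = []
        self.paragraphs = []

    def flush(self):
        # Collapse whitespace and keep the paragraph only if it has any text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.paragraphs.append(text)
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush()  # an unclosed <p> is implicitly ended by the next one
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.buffer.append(data)


def strip_html(html):
    """Hypothetical helper: return the paragraph text of an HTML page."""
    parser = ParagraphTextParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # handle a trailing <p> that was never closed
    return "\n".join(parser.paragraphs)


# <ref .../> and <ref ...>...</ref> footnotes (not robust to every edge case).
REF_RE = re.compile(r"<ref[^>/]*/>|<ref[^>]*>.*?</ref>", re.DOTALL | re.IGNORECASE)
# An innermost template: {{...}} containing no other braces.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")


def strip_article(wikitext):
    """Hypothetical helper: drop references and templates from wikitext."""
    text = REF_RE.sub("", wikitext)
    # Remove templates from the inside out so nested ones disappear too.
    while True:
        text, count = TEMPLATE_RE.subn("", text)
        if not count:
            return text


def chunk_article(text, max_words=8, min_words=4):
    """Hypothetical helper: turn sentences into short search-engine queries."""
    queries = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = sentence.split()
        if len(words) >= min_words:  # skip fragments too short to be distinctive
            queries.append(" ".join(words[:max_words]))
    return queries
```

Capping each query at a handful of words keeps it short enough for a search engine; the `min_words` cutoff is an assumption meant to skip fragments that would match almost anything.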
#2 is very simple once #1 is done. I’ve already written the code to load EarwigBot’s wiki toolset from `copyvios.mako`, and the config file is written, so running the detector is trivial once it works. The only thing left here is to have the tool produce relatively eye-pleasing output, perhaps with a “details” section showing the Markov chains formed from the two sources and comparing them visually. Not necessary at all, but a nice touch.
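Comparing the two Markov chains could look something like this: a minimal sketch, assuming a first-order, word-level chain where each text is reduced to counts of word-to-next-word transitions. The function names and the similarity measure (the fraction of the article’s transitions that also appear in the source) are illustrative assumptions, not EarwigBot’s actual code.

```python
import re
from collections import Counter


def markov_chain(text):
    """Hypothetical helper: count word-to-next-word transitions in a text."""
    words = re.findall(r"\w+", text.lower())
    return Counter(zip(words, words[1:]))


def chain_similarity(article, source):
    """Fraction of the article's transitions that also occur in the source."""
    article_chain = markov_chain(article)
    total = sum(article_chain.values())
    if not total:
        return 0.0
    # Counter "&" keeps each shared transition with the smaller of its two counts.
    shared = article_chain & markov_chain(source)
    return sum(shared.values()) / total


def shared_transitions(article, source):
    """Transitions present in both texts, e.g. to highlight in a "details" view."""
    return sorted((markov_chain(article) & markov_chain(source)).keys())


if __name__ == "__main__":
    a = "The cat sat on the mat."
    b = "A cat sat on a mat."
    print(chain_similarity(a, b))    # 0.4
    print(shared_transitions(a, b))  # [('cat', 'sat'), ('sat', 'on')]
```

Dividing by the article’s transition count makes the score answer “how much of the article could have come from this source?”, so it isn’t symmetric; `shared_transitions` is the kind of list a details section could highlight in both texts.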
Unfortunately, there’s still a bit more work to do on EarwigBot before he’s ready for his first release (0.1!). Aside from the copyvio stuff above, which is integrated directly as a function of `Page`, I want to finish porting over the remaining tasks from old EarwigBot that are still running via cron, improve the Wiki Toolset so that new sites can be added programmatically, and improve config so that it can be created by the bot and not only by hand. That last point is the main barrier stopping other people from running EarwigBot, and thus the biggest thing to fix before v0.1 is ready. Of course, none of this is urgent; getting copyvio detection finished is my primary concern.
Now that that’s covered, let’s look at something (mostly unrelated) I finished a couple of days ago: dynamic backgrounds for my new Toolserver site! You can see it in action a bit better on this page. The background is the Wikimedia Commons Picture of the Day, loaded and displayed with JavaScript. Here’s the code for it, a good deal more than I had expected to write.
Here’s what we have to do:

- Convert the image’s URL into a thumbnail URL (insert `/thumb/` in the middle somewhere and add a resolution at the end) and set that as our `body`’s `background-image` property. This is better than loading the full image and downscaling it, because less bandwidth is required.
- Set `background-size: cover;`.

All of the images I tested looked decent when displayed this way, some better than others, but all acceptable. I figured this would add a nice touch to an otherwise drab webpage (like the one you’re viewing now; it wouldn’t have been very pretty otherwise), which is why I did it. Still, I couldn’t help but wonder if there was an... easier... method that still saved bandwidth and didn’t resort to ugly scaling/cropping/repeating/whatever, but I couldn’t come up with one. It was a fun project in a language I almost never use, though, so it was worth it in the end.
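The `/thumb/` URL rewrite described above can be sketched in Python (the actual background code is JavaScript). This assumes Commons’ usual upload layout: a file lives under two directory levels taken from the MD5 hash of its underscored name (e.g. `/a/ab/`), and a thumbnail lives under `/thumb/` with a `<width>px-` prefix on the file name. `file_url` and `thumb_url` are hypothetical helpers; SVGs (whose thumbnails are rendered as PNG) and widths larger than the original aren’t handled.

```python
import hashlib
from urllib.parse import quote

# Assumed base path for files hosted on Wikimedia Commons.
UPLOAD_BASE = "https://upload.wikimedia.org/wikipedia/commons"


def file_url(filename):
    """Full-size URL for a Commons file, using the MD5-based directory layout."""
    name = filename.replace(" ", "_")  # file names use underscores in URLs
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{UPLOAD_BASE}/{digest[0]}/{digest[:2]}/{quote(name)}"


def thumb_url(url, width):
    """Rewrite a full-size upload URL into a thumbnail URL of the given width."""
    prefix = UPLOAD_BASE + "/"
    if not url.startswith(prefix):
        raise ValueError(f"not a Commons upload URL: {url}")
    path = url[len(prefix):]        # e.g. "a/ab/Example.jpg"
    name = path.rsplit("/", 1)[-1]  # e.g. "Example.jpg"
    # Insert /thumb/ after the base and append "<width>px-<name>".
    return f"{prefix}thumb/{path}/{width}px-{name}"


if __name__ == "__main__":
    full = file_url("Example.jpg")
    print(full)
    print(thumb_url(full, 1280))
```

In the browser, the width would presumably come from the window size, so the thumbnail is roughly as large as the screen and `background-size: cover;` only has to scale it slightly.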
—earwig