---
layout: post
title: EarwigBot and Toolserver Updates
date: 2012-01-29
tags: ["Wikipedia", "Status report"]
description: More progress on EarwigBot and the Toolserver site rewrite, including dynamic backgrounds
---
Haven't really said much in a while, so I felt it appropriate to make a new
blog post. No, I'm not dead, and yes, I _am_ busy working on my Wikipedia
responsibilities, including EarwigBot and my Toolserver site (which I'll go
into detail about in a bit). Progress is not as quick now as it was over the
summer, and you can blame school for the delay in getting things done. My
primary to-do list for Wikipedia right now looks like this:

1. Finish copyvio detection in the new EarwigBot
2. Integrate EarwigBot's copyvio detection with the new Toolserver site

\#1 has a good portion of its work done, but I still have to finish the actual
detection. I've narrowed the remaining work down to a few methods of
`earwigbot.wiki.copyright.CopyrightMixin`: `_copyvio_strip_html()`, to
extract the "text" (i.e. content inside `<p>` tags) from an HTML document
(I'll probably use something like
[Beautiful Soup](http://www.crummy.com/software/BeautifulSoup/) for this);
`_copyvio_strip_article()`, to extract the "text" from an article (that is,
stripping templates, quotes, and references); and `_copyvio_chunk_article()`,
to divide a stripped article into a list of web-searchable queries. Everything
else, including the Task class for `afc_copyvios`, is done.
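
To make those three steps more concrete, here's a minimal Python sketch of what they might look like. It is not EarwigBot's code. The names `strip_html`, `strip_article` and `chunk_article` are hypothetical stand-ins for the `CopyrightMixin` methods. It uses the standard library's `html.parser` in place of Beautiful Soup so that it runs without dependencies. It assumes "quotes" means passages in straight double quotation marks, and `max_words` is an arbitrary, assumed cap on query length.

```python
import re
from html.parser import HTMLParser

# Hypothetical sketch of the three CopyrightMixin steps; not EarwigBot's code.

class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> elements (stdlib stand-in for Beautiful Soup)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; arrive decoded
        self.in_p = False
        self.chunks = []
        self.paragraphs = []

    def flush(self):
        text = " ".join("".join(self.chunks).split())  # collapse whitespace
        if text:
            self.paragraphs.append(text)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self.in_p:  # a new <p> implicitly closes the previous one
                self.flush()
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.flush()
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.chunks.append(data)


def strip_html(html):
    """Return the text of an HTML page's paragraphs, one per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    if parser.in_p:  # unclosed final <p>
        parser.flush()
    return "\n".join(parser.paragraphs)


def strip_article(wikitext):
    """Strip references, templates and double-quoted passages from wikitext."""
    text = re.sub(r"<ref[^>]*/\s*>", "", wikitext)  # self-closing <ref ... />
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.S)
    previous = None
    while previous != text:  # remove innermost {{...}} until none remain
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    text = re.sub(r'"[^"]*"', "", text)  # assumption: "quotes" = quoted passages
    return " ".join(text.split())


def chunk_article(text, max_words=8):
    """Split text into sentence-based search queries of at most max_words words."""
    queries = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = sentence.split()
        for i in range(0, len(words), max_words):
            queries.append(" ".join(words[i:i + max_words]))
    return queries
```

Removing templates innermost-first is what lets a plain regex cope with nested `{{...}}`. A real parser would handle more edge cases, such as unbalanced braces.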

\#2 is very simple once #1 is done. I've already written the code to load
EarwigBot's wiki toolset from `copyvios.mako`, and the config file is written,
so running the detector will be trivial once the detection itself works. The
only thing left here is to have the tool produce reasonably eye-pleasing output,
perhaps with a "details" section that shows the Markov chains built from the
two sources and compares them visually. Not necessary at all, but a nice touch.
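
For a rough idea of what "building and comparing Markov chains" could mean, here's a small sketch that rests on assumptions. A chain is taken to be a count of word-to-next-word transitions, and the score is the fraction of the article's transitions that also appear in the source. The names `markov_chain` and `similarity`, and this scoring rule, are hypothetical. This is not necessarily how EarwigBot compares texts.

```python
from collections import Counter

# Hypothetical illustration; not EarwigBot's actual comparison.

def markov_chain(text, order=1):
    """Count transitions from each `order`-word state to the word that follows it."""
    words = text.lower().split()
    chain = Counter()
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[(state, words[i + order])] += 1
    return chain


def similarity(article, source, order=1):
    """Fraction of the article's transitions that also occur in the source."""
    a = markov_chain(article, order)
    s = markov_chain(source, order)
    if not a:
        return 0.0
    shared = a & s  # Counter intersection keeps the minimum of each count
    return sum(shared.values()) / sum(a.values())
```

For a "details" view, the `shared` counter is what you would highlight in both texts.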

Unfortunately, there's still a bit more work to do on EarwigBot before he's
ready for his first release (0.1!). Aside from the copyvio stuff above, which
is integrated directly as a method of `Page`, I want to finish porting over
the remaining tasks from old EarwigBot that are still running via cron, improve
the Wiki Toolset so that new sites can be added programmatically, and improve
the config system so that the bot can generate its config instead of it having
to be written by hand. This is the main barrier stopping other people from
running EarwigBot, and thus the main thing to address before v0.1 is ready. Of
course, none of this is urgent; getting copyvio detection finished is my
primary concern.

## Dynamic Backgrounds

Now that that's covered, let's look at something (mostly unrelated) I finished
a couple of days ago: dynamic backgrounds for
[my new Toolserver site](http://toolserver.org/~earwig/rewrite)! You can see it
in action a bit better on [this page](http://toolserver.org/~earwig/earwigbot).
The background is the [Wikimedia Commons](//commons.wikimedia.org/)
[Picture of the Day](//commons.wikimedia.org/wiki/Commons:Picture_of_the_day),
loaded and displayed with JavaScript.
[Here's the code for it](//github.com/earwig/toolserver/blob/master/static/js/potd.js),
a good deal more code than I had expected to write.

Here's what we have to do:

1. Query Commons' API for the content of the template \{\{Potd/YYYY-MM-DD}}.
2. Parse that content for the filename of the image, which will be hidden in
   something like \{\{Potd filename|1=Foo.png}}.
3. Query Commons' API again for Foo.png's URL and dimensions.
4. Since we want the image to "cover" the background (that is, be the smallest
   size possible while leaving none of the background color visible), we need
   to calculate the image's aspect ratio and our own aspect ratio, then
   determine the width of the thumbnail we want. If the image is proportionally
   narrower than our screen (a smaller width-to-height ratio), the necessary
   width is our screen's width, but if the image is proportionally wider, the
   necessary width is our screen's height multiplied by the image's aspect
   ratio.
5. If the width of our desired thumbnail is less than the width of the image,
   we'll alter the image's URL (insert a `/thumb/` in the middle and append the
   desired width at the end, as in `/1280px-Foo.png`) and set that as our
   `body`'s `background-image` property. This is better than loading the full
   image and downscaling it, because it requires less bandwidth.
6. If the width of our desired thumbnail is _greater_ than (or equal to) the
   width of the image, we load the full image and upscale it (gasp! horror!)
   using the CSS bit `background-size: cover;`.
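
The steps above can be sketched in Python as a handful of pure functions. The real thing is JavaScript in `potd.js`, and this is not a port of that file. The helper names are hypothetical. The HTTP requests and JSON parsing are left out: the functions only build the API URLs and process what comes back. The thumbnail-URL rule reflects the usual `upload.wikimedia.org` layout, which is an assumption beyond what's described above.

```python
import re
from datetime import date
from urllib.parse import urlencode

# Hypothetical sketch of the POTD background steps; not the actual potd.js logic.
API = "https://commons.wikimedia.org/w/api.php"


def potd_query(day=None):
    """Step 1: API URL for the wikitext of Template:Potd/YYYY-MM-DD."""
    day = day or date.today()
    return API + "?" + urlencode({
        "action": "query", "prop": "revisions", "rvprop": "content",
        "titles": "Template:Potd/" + day.isoformat(), "format": "json",
    })


def potd_filename(wikitext):
    """Step 2: pull Foo.png out of {{Potd filename|1=Foo.png}} (or |Foo.png|...)."""
    m = re.search(r"\{\{\s*Potd filename\s*\|\s*(?:1\s*=\s*)?([^|}]+?)\s*(?:\||\}\})",
                  wikitext, re.IGNORECASE)
    return m.group(1) if m else None


def imageinfo_query(filename):
    """Step 3: API URL for the file's URL and dimensions."""
    return API + "?" + urlencode({
        "action": "query", "prop": "imageinfo", "iiprop": "url|size",
        "titles": "File:" + filename, "format": "json",
    })


def cover_width(img_w, img_h, screen_w, screen_h):
    """Step 4: smallest image width that still covers the whole screen."""
    # Compare aspect ratios by cross-multiplying integers (no float error).
    if img_w * screen_h <= screen_w * img_h:
        return screen_w  # proportionally narrower: match the screen's width
    return -(-screen_h * img_w // img_h)  # wider: ceil(screen_h * aspect ratio)


def thumb_url(url, width):
    """.../commons/a/ab/Foo.png -> .../commons/thumb/a/ab/Foo.png/<width>px-Foo.png.
    Some file types (e.g. SVG) get differently named thumbnails; ignored here."""
    name = url.rsplit("/", 1)[1]
    return url.replace("/commons/", "/commons/thumb/", 1) + f"/{width}px-{name}"


def background_css(url, img_w, img_h, screen_w, screen_h):
    """Steps 5-6: a downscaled thumbnail, or the full image upscaled by CSS."""
    want = cover_width(img_w, img_h, screen_w, screen_h)
    if want < img_w:
        return {"background-image": f"url({thumb_url(url, want)})"}
    return {"background-image": f"url({url})", "background-size": "cover"}
```

For example, `potd_query(date(2012, 1, 29))` builds the request for that day's template. Once the image's URL and size are known, `background_css(...)` returns the properties to set on `body`. Comparing ratios with integer arithmetic keeps the "equal aspect ratio" case from flipping on a rounding error.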

All of the images I tested looked decent when displayed this way, some better
than others, but all acceptable. I figured this code added a nice touch to an
otherwise drab webpage (like the one you're viewing now, it wouldn't have been
very pretty), which is why I did it. Still, I couldn't help but wonder if there
was an... easier... method that still saved bandwidth and didn't resort to ugly
scaling/cropping/repeating/whatever, but I couldn't come up with one. It was a
fun project in a language I almost never use, though, so it was worth it in the
end.

That's all for now!
: —earwig