Limnoria-doc/develop/using_utils.rst
2014-01-20 09:31:22 +01:00

Using Supybot's utils module

Supybot provides a wealth of utilities for plugin writers in the supybot.utils module. This tutorial describes these utilities and shows you how to use them.

str.py

The Format Function

The supybot.utils.str module provides a number of utility functions for handling string values. This section gives a quick rundown of the available functions and the arguments they take.

First and foremost is the format function, which packs a lot of capability into a single function built on string-formatting style, so much so that it gets its own section in this tutorial. All other functions are covered in later sections.

format takes several arguments: first the format string (using the format characters described below), then each individual item to be formatted. Pass the items as arguments to format; do not apply the % operator to the format string yourself, because that falls back on Python's normal string formatting operator, which does not understand the extra characters. The format function uses the following string formatting characters.

  • % - literal %
  • i - integer
  • s - string
  • f - float
  • r - repr
  • b - form of the verb to be (takes an int)
  • h - form of the verb to have (takes an int)
  • L - commaAndify (takes a list of strings or a tuple of ([strings], and))
  • p - pluralize (takes a string)
  • q - quoted (takes a string)
  • n - n items (takes a 2-tuple of (n, item) or a 3-tuple of (n, between, item))
  • t - time, formatted (takes an int)
  • u - url, wrapped in angle brackets

Here are a few examples to help elaborate on the above descriptions:

>>> from supybot.utils.str import format
>>> format("Error %q has been reported %n.  For more information, see %u.",
...        "AttributeError", (5, "time"), "http://supybot.com")
'Error "AttributeError" has been reported 5 times.  For more information, see <http://supybot.com>.'

>>> i = 4
>>> format("There %b %n at this time.  You are only allowed %n at any given "
...        "time", i, (i, "active", "thread"), (5, "active", "thread"))
'There are 4 active threads at this time.  You are only allowed 5 active threads at any given time'

>>> i = 1
>>> format("There %b %n at this time.  You are only allowed %n at any given "
...        "time", i, (i, "active", "thread"), (5, "active", "thread"))
'There is 1 active thread at this time.  You are only allowed 5 active threads at any given time'

>>> ops = ["foo", "bar", "baz"]
>>> format("The following %n %h the %s capability: %L", (len(ops), "user"),
...        len(ops), "op", ops)
'The following 3 users have the op capability: foo, bar, and baz'

As you can see, you can mix any of these formatting characters in a single format string. In fact, that was the major motivation behind format. There are specific functions that you can use individually for each of those formatting types, but it became much easier to use special formatting characters and the format function than to concatenate a bunch of strings returned by other utils.str functions.
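
To make the idea of one function dispatching on formatting characters concrete, here is a minimal standalone sketch. It is not Supybot's implementation: mini_format is a hypothetical name, it handles only %%, %s, %q, %b, %u and %n, and its pluralizer naively appends "s".

```python
import re


def mini_format(template, *args):
    """Toy format-style function (hypothetical, for illustration only)."""
    remaining = list(args)

    def plural(word, n):
        # Naive assumption: every plural is formed by appending "s".
        return word if n == 1 else word + "s"

    def render(match):
        char = match.group(1)
        if char == "%":
            return "%"              # literal percent sign, consumes no argument
        value = remaining.pop(0)    # every other character consumes one argument
        if char == "s":
            return str(value)
        if char == "q":
            return '"%s"' % value
        if char == "b":
            return "is" if value == 1 else "are"
        if char == "u":
            return "<%s>" % value
        if char == "n":
            if len(value) == 2:
                n, item = value
                return "%d %s" % (n, plural(item, n))
            n, between, item = value
            return "%d %s %s" % (n, between, plural(item, n))
        raise ValueError("unsupported format character: %r" % char)

    # re.sub calls render once per match, left to right.
    return re.sub(r"%(.)", render, template)
```

Calling mini_format("There %b %n", 1, (1, "active", "thread")) walks the template once and hands each formatting character its own argument, which is the convenience the real format function offers over concatenating helper results by hand.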

The Other Functions

These are the functions that can't be handled by format. They are sorted in what I perceive to be the general order of usefulness (and I'm leaving the ones covered by format for the next section).

  • ellipsisify(s, n) - Returns a shortened version of a string. Produces up to the first n chars at the nearest word boundary.
    • s: the string to be shortened
    • n: the number of characters to shorten it to
  • perlReToPythonRe(s) - Converts a Perl-style regexp (e.g., "/abcd/i" or "m/abcd/i") to an actual Python regexp (an re object)
    • s: the regexp string
  • perlReToReplacer(s) - converts a Perl-style replacement regexp (e.g., "s/foo/bar/g") to a Python function that performs such a replacement
    • s: the regexp string
  • dqrepr(s) - Returns a repr() of s guaranteed to be in double quotes. (Double Quote Repr)
    • s: the string to be double-quote repr()'ed
  • toBool(s) - Determines whether or not a string means True or False and returns the appropriate boolean value. True is any of "true", "on", "enable", "enabled", or "1". False is any of "false", "off", "disable", "disabled", or "0".
    • s: the string to determine the boolean value for
  • rsplit(s, sep=None, maxsplit=-1) - functionally the same as str.split in the Python standard library except splitting from the right instead of the left. Python 2.4 has str.rsplit (which this function defers to for those versions >= 2.4), but Python 2.3 did not.
    • s: the string to be split
    • sep: the separator to split on, defaults to whitespace
    • maxsplit: the maximum number of splits to perform, -1 splits all possible splits.
  • normalizeWhitespace(s) - reduces all multi-spaces in a string to a single space
    • s: the string to normalize
  • depluralize(s) - the opposite of pluralize
    • s: the string to depluralize
  • unCommaThe(s) - Takes a string of the form "foo, the" and turns it into "the foo"
    • s: a string of the form "foo, the"
  • distance(s, t) - computes the Levenshtein distance (or "edit distance") between two strings
    • s: the first string
    • t: the second string
  • soundex(s, length=4) - computes the soundex for a given string
    • s: the string to compute the soundex for
    • length: the length of the soundex to generate
  • matchCase(s1, s2) - Matches the case of the first string in the second string.
    • s1: the first string
    • s2: the string which will be made to match the case of the first
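
Two of the functions above are easy to illustrate without the bot installed. The following is a standalone sketch, not the code in supybot.utils.str: edit_distance and to_bool are hypothetical names, and the case-insensitive matching and the ValueError for unrecognized strings are assumptions rather than documented behavior.

```python
def edit_distance(s, t):
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete a character from s
                current[j - 1] + 1,      # insert a character into s
                previous[j - 1] + cost,  # substitute (free if the chars match)
            ))
        previous = current
    return previous[-1]


TRUE_WORDS = {"true", "on", "enable", "enabled", "1"}
FALSE_WORDS = {"false", "off", "disable", "disabled", "0"}


def to_bool(s):
    """Map the words listed for toBool to a boolean."""
    key = s.strip().lower()  # assumption: surrounding whitespace and case are ignored
    if key in TRUE_WORDS:
        return True
    if key in FALSE_WORDS:
        return False
    # Assumption: anything else is an error rather than a silent default.
    raise ValueError("not a recognized boolean string: %r" % s)
```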

The Commands Format Already Covers

These functions aren't necessary, because you can achieve the same results more easily with the format function. They exist if you decide you want to use them anyway, but doing so is greatly discouraged for general use.

  • commaAndify(seq, comma=",", And="and") - transforms a list of items into a comma-separated list with an "and" preceding the last element. For example, ["foo", "bar", "baz"] becomes "foo, bar, and baz". It is also smart enough to convert two-element lists to just "item1 and item2".
    • seq: the sequence of items (don't have to be strings, but need to be 'str()'-able)
    • comma: the character to use to separate the list
    • And: the word to use before the last element
  • pluralize(s) - Returns the plural of a string. Put any exceptions to the general English rules of pluralization in the plurals dictionary in supybot.utils.str.
    • s: the string to pluralize
  • nItems(n, item, between=None) - returns a string that describes a given number of an item (with any string between the actual number and the item itself); handles pluralization with the pluralize function above. Note that the arguments here are in a different order from the 3-tuple taken by format's n character, since between is optional.
    • n: the number of items
    • item: the type of item
    • between: the optional string that goes between the number and the type of item
  • quoted(s) - Returns the string surrounded by double-quotes.
    • s: the string to quote
  • be(i) - Returns the proper form of the verb "to be" based on the number provided (be(1) is "is", be(anything else) is "are")
    • i: the number of things that "be"
  • has(i) - Returns the proper form of the verb "to have" based on the number provided (has(1) is "has", has(anything else) is "have")
    • i: the number of things that "has"
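
As an illustration of the list-joining rules described for commaAndify, here is a small standalone sketch. comma_and_ify is a hypothetical stand-in, not the library function, and returning an empty string for an empty sequence is an assumption.

```python
def comma_and_ify(seq, comma=",", And="and"):
    """Join items as 'a', 'a and b', or 'a, b, and c'."""
    items = [str(x) for x in seq]  # items only need to be str()-able
    if not items:
        return ""                  # assumption: empty input gives an empty string
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return "%s %s %s" % (items[0], And, items[1])
    head = (comma + " ").join(items[:-1])
    return "%s%s %s %s" % (head, comma, And, items[-1])
```

The comma and And parameters make the same function usable for other separators or conjunctions, for example comma_and_ify([1, 2, 3], And="or").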

structures.py

Intro

This module provides a number of useful data structures that aren't found in the standard Python library. For the most part they were created as needed for the bot and plugins themselves, but they were written to be of general use for anyone who needs a data structure that performs a similar duty. As usual in this document, I'll try to list these in order of usefulness, starting with the most useful.

The queue classes

The structures module provides two general-purpose queue classes for you to use. The "queue" class is a robust, full-featured queue that scales up to larger queues. The "smallqueue" class is for queues that will contain fewer than 1000 or so items. Both offer the same common interface, which consists of:

  • a constructor which will optionally accept a sequence to start the queue off with
  • enqueue(item) - adds an item to the back of the queue
  • dequeue() - removes (and returns) the item from the front of the queue
  • peek() - returns the item from the front of the queue without removing it
  • reset() - empties the queue entirely
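
The shared interface can be sketched on top of collections.deque. SketchQueue below is a hypothetical illustration, not either of the classes in supybot.utils.structures, and raising IndexError on an empty queue is an assumption.

```python
from collections import deque


class SketchQueue:
    """Minimal FIFO queue exposing the interface listed above."""

    def __init__(self, seq=()):
        self._items = deque(seq)   # optional starting contents

    def enqueue(self, item):
        self._items.append(item)   # add to the back

    def dequeue(self):
        if not self._items:
            raise IndexError("dequeue from an empty queue")  # assumed error type
        return self._items.popleft()  # remove and return the front item

    def peek(self):
        if not self._items:
            raise IndexError("peek into an empty queue")     # assumed error type
        return self._items[0]      # front item, left in place

    def reset(self):
        self._items.clear()

    def __len__(self):
        return len(self._items)
```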

In addition to these general-use queue classes, there are two more specialized queue classes. The first is the "TimeoutQueue", which holds items until they reach a certain age and then removes them from the queue. It features the following:

  • TimeoutQueue(timeout, queue=None) - you must specify the timeout (in seconds) in the constructor. You can also optionally pass it a queue using any implementation you wish, whether it is one of the above (queue or smallqueue) or a custom queue that implements the same interface. If you don't pass it a queue instance to use, it will build its own using smallqueue.
    • reset(), enqueue(item), dequeue() - all same as above queue classes
    • setTimeout(secs) - allows you to change the timeout value
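
The age-based expiry can be sketched by storing a timestamp next to each item. SketchTimeoutQueue is hypothetical and is not the library class; in particular the clock parameter is an assumption added here so the behavior can be exercised without waiting, and the underlying queue is simply a deque.

```python
import time
from collections import deque


class SketchTimeoutQueue:
    """Queue whose items disappear once they are older than `timeout` seconds."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout
        self._clock = clock        # assumption: injectable clock, for testing
        self._items = deque()      # holds (enqueue_time, item) pairs

    def _expire(self):
        # Items are stored oldest first, so stale ones are always at the front.
        cutoff = self._clock() - self.timeout
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def enqueue(self, item):
        self._expire()
        self._items.append((self._clock(), item))

    def dequeue(self):
        self._expire()
        if not self._items:
            raise IndexError("dequeue from an empty queue")  # assumed error type
        return self._items.popleft()[1]

    def setTimeout(self, secs):
        self.timeout = secs

    def reset(self):
        self._items.clear()

    def __len__(self):
        self._expire()
        return len(self._items)
```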

And for the final queue class, there's the "MaxLengthQueue" class. As you may have guessed, it's a queue that is capped at a certain specified length. It features the following:

  • MaxLengthQueue(length, seq=()) - the constructor naturally requires that you set the max length and it allows you to optionally pass in a sequence to be used as the starting queue. The underlying implementation is actually the queue from before.
    • enqueue(item) - adds an item onto the back of the queue and if it would push it over the max length, it dequeues the item on the front (it does not return this item to you)
    • all the standard methods from the queue class are inherited for this class
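
The capping behavior of enqueue can be sketched as follows. SketchMaxLengthQueue is a hypothetical illustration built on a deque, not the library class.

```python
from collections import deque


class SketchMaxLengthQueue:
    """FIFO queue that silently drops its oldest item when it grows too long."""

    def __init__(self, length, seq=()):
        self.length = length
        self._items = deque()
        for item in seq:           # route initial items through enqueue so the cap applies
            self.enqueue(item)

    def enqueue(self, item):
        self._items.append(item)
        if len(self._items) > self.length:
            self._items.popleft()  # the evicted item is not returned to the caller

    def dequeue(self):
        if not self._items:
            raise IndexError("dequeue from an empty queue")  # assumed error type
        return self._items.popleft()

    def __len__(self):
        return len(self._items)
```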

The Other Structures

The most useful of the other structures is actually very similar to the "MaxLengthQueue". It's the "RingBuffer", which is essentially a MaxLengthQueue which fills up to its maximum size and then circularly replaces the old contents as new entries are added instead of dequeuing. It features the following:

  • RingBuffer(size, seq=()) - as with the MaxLengthQueue you specify the size of the RingBuffer and optionally give it a sequence.
    • append(item) - adds item to the end of the buffer, pushing out an item from the front if necessary
    • reset() - empties out the buffer entirely
    • resize(i) - shrinks/expands the RingBuffer to the size provided
    • extend(seq) - append the items from the provided sequence onto the end of the RingBuffer
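
The circular replacement can be sketched with a fixed list and a moving start index. SketchRingBuffer is hypothetical, omits resize, and is not the implementation in supybot.utils.structures; rejecting non-positive sizes is an assumption.

```python
class SketchRingBuffer:
    """Fixed-capacity buffer that overwrites its oldest slot once full."""

    def __init__(self, size, seq=()):
        if size <= 0:
            raise ValueError("size must be positive")  # assumption for this sketch
        self.size = size
        self._slots = []
        self._start = 0            # index of the oldest item once the buffer is full
        self.extend(seq)

    def append(self, item):
        if len(self._slots) < self.size:
            self._slots.append(item)          # still filling up
        else:
            self._slots[self._start] = item   # overwrite the oldest item in place
            self._start = (self._start + 1) % self.size

    def extend(self, seq):
        for item in seq:
            self.append(item)

    def reset(self):
        self._slots = []
        self._start = 0

    def __len__(self):
        return len(self._slots)

    def __iter__(self):
        # Yield items from oldest to newest, wrapping around the list.
        count = len(self._slots)
        for offset in range(count):
            yield self._slots[(self._start + offset) % count]
```

Unlike the queue sketches, nothing is shifted or removed when the buffer is full: the write position simply wraps around, so an append costs the same no matter how large the buffer is.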

The next data structure is the TwoWayDictionary, which as the name implies is a dictionary in which key-value pairs have mappings going both directions. It features the following:

  • TwoWayDictionary(seq=(), **kwargs) - Takes an optional sequence of (key, value) pairs as well as any key=value pairs specified in the constructor as initial values for the two-way dict.
    • beyond that, it offers no extra features over a normal Python dict, except that any (key, val) pair added to the dict is also added as (val, key), so the mapping goes both ways. Elements are accessed the same way as with any Python dict.
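
The mirrored mapping can be sketched as a small dict subclass. SketchTwoWayDict is hypothetical and not the library class; it only covers item assignment and deletion, and it assumes values are hashable.

```python
class SketchTwoWayDict(dict):
    """dict that stores every (key, value) pair in both directions."""

    def __init__(self, seq=(), **kwargs):
        super().__init__()
        # Route every initial pair through __setitem__ so it gets mirrored.
        for key, value in dict(seq, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        super().__setitem__(value, key)   # the reverse mapping

    def __delitem__(self, key):
        value = self[key]
        super().__delitem__(key)
        if value != key:                  # a self-mapped key has only one entry
            super().__delitem__(value)
```

Note that this sketch does not clean up the old reverse entry when a key is reassigned to a new value, and dict methods such as update bypass __setitem__ in a subclass.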

There is also a MultiSet class available, but it's very unlikely that it will serve your purpose, so I won't go into it here. The curious coder can go check the source and see what it's all about if they wish (it's only used once in our code, in the Relay plugin).

web.py

The web portion of Supybot's utils module is mainly used for retrieving data from websites, but it also has some utility functions pertaining to HTML and email text. The functions in web are listed below, once again in order of usefulness.

  • getUrl(url, size=None, headers=None) - gets the data at the URL provided and returns it as one large string
    • url: the location of the data to be retrieved or a urllib2.Request object to be used in the retrieval
    • size: the maximum number of bytes to retrieve, defaults to None, meaning that it is to try to retrieve all data
    • headers: a dictionary mapping header types to header data
  • getUrlFd(url, headers=None) - returns a file-like object for a url
    • url: the location of the data to be retrieved or a urllib2.Request object to be used in the retrieval
    • headers: a dictionary mapping header types to header data
  • htmlToText(s, tagReplace=" ") - strips out all tags in a string of HTML, replacing them with the specified character
    • s: the HTML text to strip the tags out of
    • tagReplace: the string to replace tags with
  • strError(e) - pretty-printer for web exceptions, returns a descriptive string given a web-related exception
    • e: the exception to pretty-print
  • mungeEmail(s) - a naive e-mail obfuscation function, replaces "@" with "AT" and "." with "DOT"
    • s: the e-mail address to obfuscate
  • getDomain(url) - returns the domain of a URL
    • url: the URL in question
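
The text-handling helpers in this list can be approximated with the standard library alone. The functions below are hypothetical sketches, not the code in supybot.utils.web: the regular-expression tag stripping is deliberately naive, and padding "AT" and "DOT" with spaces is an assumption made for readability.

```python
import re
from urllib.parse import urlparse


def html_to_text(s, tag_replace=" "):
    """Replace anything that looks like an HTML tag with tag_replace."""
    # Naive: does not handle comments, scripts, or '>' inside attribute values.
    return re.sub(r"<[^>]*>", tag_replace, s)


def munge_email(s):
    """Obfuscate an address by spelling out '@' and '.'."""
    # Assumption: the replacements are padded with spaces.
    return s.replace("@", " AT ").replace(".", " DOT ")


def get_domain(url):
    """Return the network location part of a URL."""
    return urlparse(url).netloc
```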

The Best of the Rest

Intro

Rather than document each of the remaining portions of the supybot.utils module, I've elected to just pick out the choice bits from specific parts and document those instead. Here they are, broken out by module name.

supybot.utils.file - file utilities

  • touch(filename) - updates the access time of a file by opening it for writing and immediately closing it
  • mktemp(suffix="") - creates a decent random string, suitable for a temporary filename with the given suffix, if provided
  • the AtomicFile class - used for files that need to be atomically written, i.e., if there's a failure the original file remains unmodified. For more info consult file.py in src/utils
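
The idea behind atomic writing can be sketched with a temporary file that is renamed over the target only after the write succeeds. atomic_write is a hypothetical helper shown for illustration; it is not the AtomicFile class and makes no claim about how that class is implemented.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write text to path so that a failure leaves any existing file untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)   # swap the finished file into place
    except BaseException:
        # The write or rename failed: discard the partial temporary file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```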

supybot.utils.gen - general utilities

  • timeElapsed(elapsed, [lots of optional args]) - given the number of seconds elapsed, returns a string with the English description of the amount of time passed. Consult gen.py in src/utils for the exact argument list and documentation if you feel you could use this function.
  • exnToString(e) - improved exception-to-string function. Provides nicer output than a simple str(e).
  • InsensitivePreservingDict class - a dict class that is case-insensitive when accessing keys
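
To give a feel for what a seconds-to-English conversion involves, here is a standalone sketch. time_elapsed is a hypothetical function; its fixed set of units and its exact output wording are assumptions and may differ from what timeElapsed produces.

```python
def time_elapsed(seconds):
    """Describe a duration in days, hours, minutes and seconds."""
    units = (("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1))
    remaining = int(seconds)
    parts = []
    for name, size in units:
        count, remaining = divmod(remaining, size)
        if count:
            suffix = "" if count == 1 else "s"
            parts.append("%d %s%s" % (count, name, suffix))
    if not parts:
        return "0 seconds"
    if len(parts) == 1:
        return parts[0]
    # Assumed wording: commas between parts, "and" before the last one.
    return ", ".join(parts[:-1]) + " and " + parts[-1]
```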

supybot.utils.iter - iterable utilities

  • len(iterable) - returns the length of a given iterable
  • groupby(key, iterable) - equivalent to the itertools.groupby function available as of Python 2.4. Provided for backwards compatibility.
  • any(p, iterable) - Returns true if any element in the iterable satisfies the predicate p
  • all(p, iterable) - Returns true if all elements in the iterable satisfy the predicate p
  • choice(iterable) - Returns a random element from the iterable
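
The predicate-first variants of any and all, and a length function that works on arbitrary iterables, can be sketched as below. The names any_match, all_match and iter_len are hypothetical and chosen to avoid shadowing Python's built-ins; this is not the code in supybot.utils.iter.

```python
def any_match(p, iterable):
    """True if p(x) holds for at least one element; stops at the first hit."""
    for x in iterable:
        if p(x):
            return True
    return False


def all_match(p, iterable):
    """True if p(x) holds for every element; stops at the first miss."""
    for x in iterable:
        if not p(x):
            return False
    return True


def iter_len(iterable):
    """Count elements of any iterable, including generators (which it consumes)."""
    return sum(1 for _ in iterable)
```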