Limnoria/docs/USING_UTILS

Using Supybot's utils module
----------------------------
Supybot provides a wealth of utilities for plugin writers in the supybot.utils
module. This tutorial describes these utilities and shows you how to use them.

str.py
======
The Format Function

The supybot.utils.str module provides a number of utility functions for
handling string values. This section gives a quick rundown of the functions
available, along with descriptions of the arguments they take.

First and foremost is the format function. It packs so much capability into a
single printf-style call that it gets its own section in this tutorial. All
other functions are covered in later sections. format takes several
arguments: first the format string (using the format characters described
below), and then each individual item to be formatted. Do not use the %
operator to do the formatting, because that applies Python's normal string
formatting, which does not understand the extra characters listed here. The
format function uses the following string formatting characters.

    * % - literal "%"
    * i - integer
    * s - string
    * f - float
    * r - repr
    * b - form of the verb "to be" (takes an int)
    * h - form of the verb "to have" (takes an int)
    * L - commaAndify (takes a list of strings or a tuple of ([strings], and))
    * p - pluralize (takes a string)
    * q - quoted (takes a string)
    * n - n items (takes a 2-tuple of (n, item) or a 3-tuple of (n, between, item))
    * t - time, formatted (takes an int)
    * u - url, wrapped in angle brackets

Here are a few examples to help elaborate on the above descriptions:

  >>> format("Error %q has been reported %n.  For more information, see %u.",
             "AttributeError", (5, "time"), "http://supybot.com")
  'Error "AttributeError" has been reported 5 times.  For more information,
   see <http://supybot.com>.'

  >>> i = 4
  >>> format("There %b %n at this time.  You are only allowed %n at any "
             "given time", i, (i, "active", "thread"), (5, "active", "thread"))
  'There are 4 active threads at this time.  You are only allowed 5 active
   threads at any given time'

  >>> i = 1
  >>> format("There %b %n at this time.  You are only allowed %n at any "
             "given time", i, (i, "active", "thread"), (5, "active", "thread"))
  'There is 1 active thread at this time.  You are only allowed 5 active
   threads at any given time'

  >>> ops = ["foo", "bar", "baz"]
  >>> format("The following %n %h the %s capability: %L", (len(ops), "user"),
              len(ops), "op", ops)
  'The following 3 users have the op capability: foo, bar, and baz'

As you can see, a single format string can mix all sorts of formatting
characters. In fact, that was the major motivation behind format. There are
specific functions that you can call individually for each of those
formatting types, but using special formatting characters with the format
function is much easier than concatenating a bunch of strings returned by
other utils.str functions.
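
To make the idea of "one handler per formatting character" concrete, here is
a toy sketch of how such a dispatcher could work. It is not Supybot's
implementation: tiny_format, HANDLERS, and the three characters it supports
are names and choices made up for this illustration.

```python
import re


def quoted(s):
    """Wrap a string in double quotes (stand-in for the q character)."""
    return '"%s"' % s


def be(i):
    """Return "is" for 1 and "are" otherwise (stand-in for the b character)."""
    return "is" if i == 1 else "are"


# Hypothetical dispatch table: one handler per supported format character.
HANDLERS = {"s": str, "q": quoted, "b": be}


def tiny_format(template, *args):
    """Toy formatter: consumes one argument per %s, %q, or %b; %% is literal."""
    remaining = iter(args)

    def substitute(match):
        char = match.group(1)
        if char == "%":
            return "%"
        return HANDLERS[char](next(remaining))

    return re.sub(r"%([%sqb])", substitute, template)


print(tiny_format("There %b %s pending; last error was %q (100%%).",
                  2, "two jobs", "Timeout"))
# There are two jobs pending; last error was "Timeout" (100%).
```

The point of the table is that adding a new formatting character only means
adding one more entry, which is presumably why format scales to so many types.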

The Other Functions

These are the functions that can't be handled by format. They are sorted in
what I perceive to be the general order of usefulness (and I'm leaving the
ones covered by format for the next section).

    * ellipsisify(s, n) - Returns a shortened version of a string. Produces up
    to the first n chars at the nearest word boundary.
          - s: the string to be shortened
          - n: the number of characters to shorten it to

    * perlReToPythonRe(s) - Converts a Perl-style regexp (e.g., "/abcd/i" or
    "m/abcd/i") to an actual Python regexp (an re object)
          - s: the regexp string

    * perlReToReplacer(s) - Converts a Perl-style replacement regexp (e.g.,
    "s/foo/bar/g") to a Python function that performs such a replacement
          - s: the regexp string

    * dqrepr(s) - Returns a repr() of s guaranteed to be in double quotes.
    (Double Quote Repr)
          - s: the string to be double-quote repr()'ed

    * toBool(s) - Determines whether or not a string means True or False and
    returns the appropriate boolean value. True is any of "true", "on",
    "enable", "enabled", or "1". False is any of "false", "off", "disable",
    "disabled", or "0".
          - s: the string to determine the boolean value for

    * rsplit(s, sep=None, maxsplit=-1) - functionally the same as str.split in
    the Python standard library except splitting from the right instead of the
    left. Python 2.4 has str.rsplit (which this function defers to for those
    versions >= 2.4), but Python 2.3 did not.
          - s: the string to be split
          - sep: the separator to split on, defaults to whitespace
          - maxsplit: the maximum number of splits to perform, -1 splits all
                      possible splits.

    * normalizeWhitespace(s) - reduces all multi-spaces in a string to a
    single space
          - s: the string to normalize

    * depluralize(s) - the opposite of pluralize
          - s: the string to depluralize

    * unCommaThe(s) - Takes a string of the form "foo, the" and turns it into
    "the foo"
          - s: the string to convert, of the form "foo, the"

    * distance(s, t) - computes the Levenshtein distance (or "edit distance")
    between two strings
          - s: the first string
          - t: the second string

    * soundex(s, length=4) - computes the soundex for a given string
          - s: the string to compute the soundex for
          - length: the length of the soundex to generate

    * matchCase(s1, s2) - Matches the case of the first string in the second
    string.
          - s1: the first string
          - s2: the string which will be made to match the case of the first
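
For readers who want to see what a few of these do in practice, here is a
standalone sketch of three of them. These are not the Supybot sources: the
snake_case names are hypothetical, and details such as ignoring case in
to_bool or trimming the ends in normalize_whitespace are assumptions.

```python
def to_bool(s):
    """Map the strings listed above to True/False and reject anything else."""
    value = s.strip().lower()  # assumption: surrounding space and case ignored
    if value in ("true", "on", "enable", "enabled", "1"):
        return True
    if value in ("false", "off", "disable", "disabled", "0"):
        return False
    raise ValueError("Invalid string for to_bool: %r" % s)


def normalize_whitespace(s):
    """Collapse every run of whitespace into a single space."""
    return " ".join(s.split())  # note: this also trims both ends


def distance(s, t):
    """Levenshtein distance via the classic row-by-row dynamic program."""
    previous = list(range(len(t) + 1))
    for i, s_char in enumerate(s, start=1):
        current = [i]
        for j, t_char in enumerate(t, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete from s
                               current[j - 1] + 1,       # insert into s
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


print(to_bool("Enabled"))                       # True
print(normalize_whitespace("foo   bar  baz"))   # foo bar baz
print(distance("kitten", "sitting"))            # 3
```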

The Functions Format Already Covers

These functions aren't strictly necessary, because you can get the same
results more easily with the format function. They exist if you decide you
want to call them directly, but doing so is greatly discouraged for general
use.

    * commaAndify(seq, comma=",", And="and") - transforms a list of items into
    a comma separated list with an "and" preceding the last element. For
    example, ["foo", "bar", "baz"] becomes "foo, bar, and baz". It is smart
    enough to convert two-element lists to just "item1 and item2" as well.
          - seq: the sequence of items (don't have to be strings, but need to
                be 'str()'-able)
          - comma: the character to use to separate the list
          - And: the word to use before the last element

    * pluralize(s) - Returns the plural of a string. Put any exceptions to the
    general English rules of pluralization in the plurals dictionary in
    supybot.utils.str.
          - s: the string to pluralize

    * nItems(n, item, between=None) - returns a string that describes a given
    number of an item (with any string between the actual number and the item
    itself), handles pluralization with the pluralize function above. Note
    that the arguments here are in a different order from the 3-tuple taken
    by format's n character, (n, between, item), since between is optional.
          - n: the number of items
          - item: the type of item
          - between: the optional string that goes between the number and the
                     type of item

    * quoted(s) - Returns the string surrounded by double-quotes.
          - s: the string to quote

    * be(i) - Returns the proper form of the verb "to be" based on the number
    provided (be(1) is "is", be(anything else) is "are")
          - i: the number of things that "be"

    * has(i) - Returns the proper form of the verb "to have" based on the
    number provided (has(1) is "has", has(anything else) is "have")
          - i: the number of things that "has"
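
To show how these pieces fit together without format, here is a rough
standalone sketch. It is not the Supybot code: the snake_case names are
hypothetical, and the pluralization is a naive "add an s" stand-in for the
real pluralize function.

```python
def be(i):
    """Proper form of "to be" for a count."""
    return "is" if i == 1 else "are"


def has(i):
    """Proper form of "to have" for a count."""
    return "has" if i == 1 else "have"


def comma_and_ify(seq, comma=",", And="and"):
    """Join items as "a, b, and c", with a two-item special case."""
    items = [str(x) for x in seq]
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return "%s %s %s" % (items[0], And, items[1])
    separator = comma + " "
    return "%s%s%s %s" % (separator.join(items[:-1]), separator, And,
                          items[-1])


def n_items(n, item, between=None):
    """Describe n items; naive plural (assumption: just append "s")."""
    noun = item if n == 1 else item + "s"
    if between is None:
        return "%s %s" % (n, noun)
    return "%s %s %s" % (n, between, noun)


ops = ["foo", "bar", "baz"]
message = "The following %s %s the op capability: %s" % (
    n_items(len(ops), "user"), has(len(ops)), comma_and_ify(ops))
print(message)
# The following 3 users have the op capability: foo, bar, and baz
```

Compare this with the single format call shown earlier for the same sentence:
the manual version needs three helper calls plus ordinary % formatting.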

structures.py
=============
Intro

This module provides a number of useful data structures that aren't found in
the standard Python library. For the most part they were created as needed for
the bot and plugins themselves, but they were written to be of general use to
anyone who needs a data structure that does a similar job. As usual in this
document, I'll try to list these in order of usefulness, starting with the
most useful.

The queue classes

The structures module provides two general-purpose queue classes for you to
use. The "queue" class is a robust, full-featured queue that scales up to
larger queues. The "smallqueue" class is for queues that will contain fewer
than about 1000 items. Both offer the same common interface, which consists
of:

    * a constructor which will optionally accept a sequence to start the queue
      off with
    * enqueue(item) - adds an item to the back of the queue
    * dequeue() - removes (and returns) the item from the front of the queue
    * peek() - returns the item from the front of the queue without removing
               it
    * reset() - empties the queue entirely
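
As a rough picture of that interface, here is a minimal sketch built on
collections.deque from the standard library. QueueSketch is a hypothetical
name, and this is not how Supybot's queue or smallqueue are implemented. It
only mirrors the methods listed above.

```python
from collections import deque


class QueueSketch:
    """Minimal FIFO queue exposing the interface described above."""

    def __init__(self, seq=()):
        self._items = deque(seq)      # optional starting contents

    def enqueue(self, item):
        self._items.append(item)      # add to the back

    def dequeue(self):
        return self._items.popleft()  # remove and return the front

    def peek(self):
        return self._items[0]         # look at the front without removing

    def reset(self):
        self._items.clear()           # empty the queue entirely

    def __len__(self):
        return len(self._items)


q = QueueSketch(["first"])
q.enqueue("second")
print(q.peek())     # first
print(q.dequeue())  # first
print(len(q))       # 1
```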

In addition to these general-use queue classes, there are two more
specialized queue classes. The first is the "TimeoutQueue", which holds items
until they reach a certain age and then removes them from the queue. It
features the following:

    * TimeoutQueue(timeout, queue=None) - you must specify the timeout (in
    seconds) in the constructor. You can also optionally pass it a queue
    built on any implementation you wish, whether that is one of the above
    (queue or smallqueue) or a custom queue you create that implements the
    same interface. If you don't pass it a queue instance to use, it will
    build its own using smallqueue.
    * reset(), enqueue(item), dequeue() - all same as above queue classes
    * setTimeout(secs) - allows you to change the timeout value
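
The expiry behavior can be sketched as follows. This is an illustration
under assumptions, not Supybot's code: TimeoutQueueSketch is a hypothetical
name, the pluggable queue argument is left out, and the clock parameter is
invented here so that the example can run without actually waiting.

```python
import time
from collections import deque


class TimeoutQueueSketch:
    """Queue whose items silently disappear once older than `timeout`."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout
        self._clock = clock     # hypothetical hook, handy for testing
        self._items = deque()   # (enqueue_time, item) pairs, oldest first

    def _expire(self):
        # Drop items from the front while they are older than the timeout.
        cutoff = self._clock() - self.timeout
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def setTimeout(self, secs):
        self.timeout = secs

    def enqueue(self, item):
        self._items.append((self._clock(), item))

    def dequeue(self):
        self._expire()
        return self._items.popleft()[1]

    def reset(self):
        self._items.clear()

    def __len__(self):
        self._expire()
        return len(self._items)


now = [0.0]                              # fake clock we can move by hand
tq = TimeoutQueueSketch(10, clock=lambda: now[0])
tq.enqueue("a")
now[0] = 8.0
tq.enqueue("b")
now[0] = 15.0                            # "a" is 15s old, "b" is 7s old
print(len(tq))                           # 1
print(tq.dequeue())                      # b
```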

And for the final queue class, there's the "MaxLengthQueue" class. As you may
have guessed, it's a queue that is capped at a certain specified length. It
features the following:

    * MaxLengthQueue(length, seq=()) - the constructor naturally requires that
    you set the max length and it allows you to optionally pass in a sequence
    to be used as the starting queue. The underlying implementation is
    actually the queue from before.
    * enqueue(item) - adds an item onto the back of the queue and if it would
    push it over the max length, it dequeues the item on the front (it does
    not return this item to you)
    * all the standard methods from the queue class are inherited for this class
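
The capping behavior of enqueue can be sketched like this. It is a
standalone illustration with a hypothetical class name, built on
collections.deque rather than on Supybot's own queue class.

```python
from collections import deque


class MaxLengthQueueSketch:
    """FIFO queue that never holds more than `length` items."""

    def __init__(self, length, seq=()):
        self.length = length
        self._items = deque()
        for item in seq:
            self.enqueue(item)

    def enqueue(self, item):
        self._items.append(item)
        if len(self._items) > self.length:
            self._items.popleft()  # oldest item is dropped, not returned

    def dequeue(self):
        return self._items.popleft()

    def peek(self):
        return self._items[0]

    def __len__(self):
        return len(self._items)


mq = MaxLengthQueueSketch(2, ["a", "b"])
mq.enqueue("c")     # pushes "a" out of the front
print(mq.peek())    # b
print(len(mq))      # 2
```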

The Other Structures

The most useful of the other structures is actually very similar to the
"MaxLengthQueue". It's the "RingBuffer", which is essentially a MaxLengthQueue
which fills up to its maximum size and then circularly replaces the old
contents as new entries are added instead of dequeuing.  It features the
following:

    * RingBuffer(size, seq=()) - as with the MaxLengthQueue you specify the
    size of the RingBuffer and optionally give it a sequence.
    * append(item) - adds item to the end of the buffer, pushing out an item
    from the front if necessary
    * reset() - empties out the buffer entirely
    * resize(i) - shrinks/expands the RingBuffer to the size provided
    * extend(seq) - append the items from the provided sequence onto the end
    of the RingBuffer
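
From the outside, the behavior looks roughly like the sketch below. This is
an assumption-laden illustration: RingBufferSketch is a hypothetical name,
and it leans on deque(maxlen=...) from the standard library instead of
overwriting slots in place the way a real ring buffer would.

```python
from collections import deque


class RingBufferSketch:
    """Fixed-size buffer that discards the oldest items when full."""

    def __init__(self, size, seq=()):
        self._items = deque(seq, maxlen=size)

    def append(self, item):
        self._items.append(item)    # oldest item falls off if full

    def extend(self, seq):
        self._items.extend(seq)

    def reset(self):
        self._items.clear()

    def resize(self, i):
        # Rebuilding with a new maxlen keeps the newest i items.
        self._items = deque(self._items, maxlen=i)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


rb = RingBufferSketch(3, [1, 2])
rb.append(3)
rb.append(4)        # buffer is full, so 1 is pushed out
print(list(rb))     # [2, 3, 4]
rb.resize(2)
print(list(rb))     # [3, 4]
```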

The next data structure is the TwoWayDictionary, which as the name implies is
a dictionary in which key-value pairs have mappings going both directions. It
features the following:

    * TwoWayDictionary(seq=(), **kwargs) - Takes an optional sequence of (key,
    value) pairs as well as any key=value pairs specified in the constructor
    as initial values for the two-way dict.
    * other than that, it offers no features beyond those of a normal Python
    dict, except that any (key, val) pair added to the dict is also added as
    (val, key), so the mapping goes both ways. Elements are still accessed
    the same way as with any Python dict.
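
A minimal sketch of the two-way idea, assuming a plain dict subclass is good
enough for illustration. TwoWayDictSketch is a hypothetical name, and this
is not the Supybot class. Note that dict methods such as update() bypass
__setitem__ in a subclass, so this sketch only covers item assignment and
deletion.

```python
class TwoWayDictSketch(dict):
    """dict that stores every (key, value) pair as (value, key) as well."""

    def __init__(self, seq=(), **kwargs):
        super().__init__()
        for key, value in dict(seq, **kwargs).items():
            self[key] = value   # goes through __setitem__ below

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        super().__setitem__(value, key)   # the reverse mapping

    def __delitem__(self, key):
        value = self[key]
        super().__delitem__(key)
        if value in self:                 # guards the key == value case
            super().__delitem__(value)


d = TwoWayDictSketch([("one", 1)], two=2)
print(d["one"], d[1])   # 1 one
print(d[2])             # two
del d["one"]
print(1 in d)           # False
```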

There is also a MultiSet class available, but it's very unlikely that it will
serve your purpose, so I won't go into it here. The curious coder can go check
the source and see what it's all about if they wish (it's only used once in our
code, in the Relay plugin).

web.py
======
The web portion of Supybot's utils module is mainly used for retrieving data
from websites, but it also has some utility functions for working with HTML
and email text. The functions in web are listed below, once again in order of
usefulness.

    * getUrl(url, size=None, headers=None) - gets the data at the URL provided
    and returns it as one large string
          - url: the location of the data to be retrieved or a urllib2.Request
                 object to be used in the retrieval
          - size: the maximum number of bytes to retrieve, defaults to None,
                  meaning that it is to try to retrieve all data
          - headers: a dictionary mapping header types to header data

    * getUrlFd(url, headers=None) - returns a file-like object for a url
          - url: the location of the data to be retrieved or a urllib2.Request
                 object to be used in the retrieval
          - headers: a dictionary mapping header types to header data

    * htmlToText(s, tagReplace=" ") - strips out all tags in a string of HTML,
    replacing them with the specified character
          - s: the HTML text to strip the tags out of
          - tagReplace: the string to replace tags with

    * strError(e) - pretty-printer for web exceptions, returns a descriptive
    string given a web-related exception
          - e: the exception to pretty-print

    * mungeEmail(s) - a naive e-mail obfuscation function, replaces "@" with
    "AT" and "." with "DOT"
          - s: the e-mail address to obfuscate

    * getDomain(url) - returns the domain of a URL
          - url: the URL in question
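
The three text helpers at the end of that list are simple enough to sketch
with the standard library alone. The snake_case names are hypothetical, the
regular expression is a deliberately naive stand-in for real HTML parsing,
and padding "AT" and "DOT" with spaces is an assumption of this sketch.

```python
import re
from urllib.parse import urlparse


def html_to_text(s, tagReplace=" "):
    """Replace anything that looks like a tag with tagReplace (naive)."""
    return re.sub(r"<[^>]*>", tagReplace, s)


def munge_email(s):
    """Naive obfuscation: spell out the "@" and "." characters."""
    return s.replace("@", " AT ").replace(".", " DOT ")


def get_domain(url):
    """Return the network-location part of a URL."""
    return urlparse(url).netloc


print(html_to_text("<p>Hi <b>there</b></p>", tagReplace=""))  # Hi there
print(munge_email("user@example.com"))       # user AT example DOT com
print(get_domain("http://supybot.com/docs/index.html"))       # supybot.com
```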

The Best of the Rest
====================
  Highlights the most useful of the remaining functionality in supybot.utils

Intro

Rather than document each of the remaining portions of the supybot.utils
module, I've elected to just pick out the choice bits from specific parts and
document those instead. Here they are, broken out by module name.

supybot.utils.file - file utilities

    * touch(filename) - updates the access time of a file by opening it for
                        writing and immediately closing it
    * mktemp(suffix="") - creates a decent random string, suitable for a
                          temporary filename with the given suffix, if
                          provided
    * the AtomicFile class - used for files that need to be atomically
                             written, i.e., if there's a failure the original
                             file remains unmodified. For more info consult
                             file.py in src/utils
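
The usual way to get that "original file remains unmodified" guarantee is to
write to a temporary file and rename it into place. The sketch below shows
that general technique with the standard library. atomic_write is a
hypothetical helper, not the AtomicFile class itself, whose actual interface
is documented in file.py.

```python
import os
import tempfile


def atomic_write(filename, data):
    """Write data so that filename is either fully replaced or untouched."""
    directory = os.path.dirname(os.path.abspath(filename))
    # The temp file lives in the same directory so the rename stays on one
    # filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as temp_file:
            temp_file.write(data)
        os.replace(temp_path, filename)   # swap the new file into place
    except BaseException:
        if os.path.exists(temp_path):
            os.unlink(temp_path)          # clean up, original is untouched
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "example.conf")
    atomic_write(target, "first version\n")
    atomic_write(target, "second version\n")
    with open(target) as f:
        result = f.read()
    leftovers = os.listdir(workdir)

print(result, end="")   # second version
print(leftovers)        # ['example.conf']
```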

supybot.utils.gen - general utilities

    * timeElapsed(elapsed, [lots of optional args]) - given the number of
        seconds elapsed, returns a string with the English description of the
        amount of time passed. Consult gen.py in src/utils for the exact
        argument list and documentation if you feel you could use this
        function.
    * exnToString(e) - improved exception-to-string function. Provides nicer
                       output than a simple str(e).
    * InsensitivePreservingDict class - a dict class that is case-insensitive
                                        when accessing keys
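
To give a feel for what timeElapsed produces, here is a simplified sketch
of the idea. time_elapsed is a hypothetical stand-in that takes none of the
optional arguments, and its exact wording and punctuation are assumptions
rather than the real function's output format.

```python
def time_elapsed(elapsed):
    """Describe a number of seconds in days, hours, minutes, and seconds."""
    units = (("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1))
    remaining = int(elapsed)
    parts = []
    for name, size in units:
        count, remaining = divmod(remaining, size)
        if count:
            plural = "" if count == 1 else "s"
            parts.append("%d %s%s" % (count, name, plural))
    if not parts:
        return "0 seconds"
    if len(parts) == 1:
        return parts[0]
    if len(parts) == 2:
        return " and ".join(parts)
    return ", ".join(parts[:-1]) + ", and " + parts[-1]


print(time_elapsed(90))     # 1 minute and 30 seconds
print(time_elapsed(3661))   # 1 hour, 1 minute, and 1 second
```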

supybot.utils.iter - iterable utilities

    * len(iterable) - returns the length of a given iterable
    * groupby(key, iterable) - equivalent to the itertools.groupby function
                               available as of Python 2.4. Provided for
                               backwards compatibility.
    * any(p, iterable) - Returns true if any element in the iterable satisfies
                         the predicate p
    * all(p, iterable) - Returns true if all elements in the iterable satisfy
                         the predicate p
    * choice(iterable) - Returns a random element from the iterable
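
Note that the any and all described here take a predicate as their first
argument, unlike Python's built-in any and all, which take only an iterable.
The sketch below shows equivalent behavior using hypothetical names, so it
does not shadow the built-ins. It is not the Supybot source.

```python
import random


def any_match(p, iterable):
    """True if at least one element satisfies the predicate p."""
    for item in iterable:
        if p(item):
            return True
    return False


def all_match(p, iterable):
    """True if every element satisfies the predicate p."""
    for item in iterable:
        if not p(item):
            return False
    return True


def iter_len(iterable):
    """Length of any iterable, including generators (consumes it)."""
    return sum(1 for _ in iterable)


def choice(iterable):
    """Random element from an iterable (materializes it as a list first)."""
    return random.choice(list(iterable))


def is_even(n):
    return n % 2 == 0


print(any_match(is_even, [1, 3, 4]))      # True
print(all_match(is_even, [2, 4, 5]))      # False
print(iter_len(x for x in range(10)))     # 10
```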