mirror of
https://github.com/Mikaela/Limnoria.git
synced 2025-01-04 17:22:38 +01:00
Using Supybot's utils module
----------------------------
Supybot provides a wealth of utilities for plugin writers in the supybot.utils
module. This tutorial describes these utilities and shows you how to use them.

str.py
======
The Format Function

The supybot.utils.str module provides a number of utility functions for
handling string values. This section contains a quick rundown of all of the
functions available, along with descriptions of the arguments they take. First
and foremost is the format function, which packs so much capability into a
single string-formatting-style call that it gets its own section in this
tutorial. All other functions are covered in later sections. format takes
several arguments: first, the format string (using the format characters
described below), and then each individual item to be formatted. Do not
attempt to use the % operator to do the formatting, because that will fall
back on Python's normal string formatting operator, which does not know the
Supybot-specific characters such as %q or %n. The format function uses the
following string formatting characters.

* % - literal "%"
* i - integer
* s - string
* f - float
* r - repr
* b - form of the verb "to be" (takes an int)
* h - form of the verb "to have" (takes an int)
* L - commaAndify (takes a list of strings or a tuple of ([strings], and))
* p - pluralize (takes a string)
* q - quoted (takes a string)
* n - n items (takes a 2-tuple of (n, item) or a 3-tuple of (n, between, item))
* t - time, formatted (takes an int)
* u - url, wrapped in angle brackets

Here are a few examples to help elaborate on the above descriptions:

>>> from supybot.utils.str import format
>>> format("Error %q has been reported %n. For more information, see %u.",
...        "AttributeError", (5, "time"), "http://supybot.com")
'Error "AttributeError" has been reported 5 times. For more information, see <http://supybot.com>.'

>>> i = 4
>>> format("There %b %n at this time. You are only allowed %n at any given "
...        "time", i, (i, "active", "thread"), (5, "active", "thread"))
'There are 4 active threads at this time. You are only allowed 5 active threads at any given time'

>>> i = 1
>>> format("There %b %n at this time. You are only allowed %n at any given "
...        "time", i, (i, "active", "thread"), (5, "active", "thread"))
'There is 1 active thread at this time. You are only allowed 5 active threads at any given time'

>>> ops = ["foo", "bar", "baz"]
>>> format("The following %n %h the %s capability: %L", (len(ops), "user"),
...        len(ops), "op", ops)
'The following 3 users have the op capability: foo, bar, and baz'

As you can see, you can mix all sorts of formatting characters in a single
format string. In fact, that was the major motivation behind format. We have
specific functions that you can use individually for each of those formatting
types, but it is much easier to use the special formatting characters and the
format function than to concatenate a bunch of strings that were the result of
other utils.str functions.

The Other Functions

These are the functions that can't be handled by format. They are sorted in
what I perceive to be the general order of usefulness (and I'm leaving the
ones covered by format for the next section).

* ellipsisify(s, n) - Returns a shortened version of a string. Produces up
  to the first n chars at the nearest word boundary.
  - s: the string to be shortened
  - n: the number of characters to shorten it to

* perlReToPythonRe(s) - Converts a Perl-style regexp (e.g., "/abcd/i" or
  "m/abcd/i") to an actual Python regexp (an re object).
  - s: the regexp string

* perlReToReplacer(s) - Converts a Perl-style replacement regexp (e.g.,
  "s/foo/bar/g") to a Python function that performs such a replacement.
  - s: the regexp string

* dqrepr(s) - Returns a repr() of s guaranteed to be in double quotes.
  (Double Quote Repr)
  - s: the string to be double-quote repr()'ed

* toBool(s) - Determines whether a string means True or False and returns
  the appropriate boolean value. True is any of "true", "on", "enable",
  "enabled", or "1". False is any of "false", "off", "disable", "disabled",
  or "0".
  - s: the string to determine the boolean value for

* rsplit(s, sep=None, maxsplit=-1) - Functionally the same as str.split in
  the Python standard library, except that it splits from the right instead
  of the left. Python 2.4 and later have str.rsplit (which this function
  defers to on those versions), but Python 2.3 did not.
  - s: the string to be split
  - sep: the separator to split on, defaults to whitespace
  - maxsplit: the maximum number of splits to perform; -1 performs all
    possible splits

* normalizeWhitespace(s) - Reduces all multi-spaces in a string to a
  single space.
  - s: the string to normalize

* depluralize(s) - The opposite of pluralize.
  - s: the string to depluralize

* unCommaThe(s) - Takes a string of the form "foo, the" and turns it into
  "the foo".
  - s: a string of the form "string, the"

* distance(s, t) - Computes the Levenshtein distance (or "edit distance")
  between two strings.
  - s: the first string
  - t: the second string

* soundex(s, length=4) - Computes the soundex for a given string.
  - s: the string to compute the soundex for
  - length: the length of the soundex to generate

* matchCase(s1, s2) - Matches the case of the first string in the second
  string.
  - s1: the first string
  - s2: the string which will be made to match the case of the first

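To make two of the descriptions above concrete, here is a minimal pure-Python
sketch of the behaviour described for toBool and distance. These are
illustrative re-implementations written for this tutorial, not the code that
lives in supybot.utils.str. The names to_bool and edit_distance are
hypothetical, and raising ValueError for unrecognised input is an assumption.

```python
def to_bool(s):
    """Map the strings listed above to True or False (sketch only)."""
    s = s.strip().lower()
    if s in ("true", "on", "enable", "enabled", "1"):
        return True
    if s in ("false", "off", "disable", "disabled", "0"):
        return False
    # Assumption: anything else is treated as an error.
    raise ValueError("not a recognised boolean string: %r" % s)


def edit_distance(s, t):
    """Levenshtein distance between s and t, computed one row at a time."""
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        current = [i]
        for j, b in enumerate(t, 1):
            current.append(min(
                previous[j] + 1,              # delete a from s
                current[j - 1] + 1,           # insert b into s
                previous[j - 1] + (a != b),   # substitute a with b
            ))
        previous = current
    return previous[-1]
```

With these definitions, to_bool("On") is True and
edit_distance("kitten", "sitting") is 3.
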
The Commands Format Already Covers

These functions aren't strictly necessary, because you can achieve the same
results more easily with the format function. They exist if you decide you
want to use them anyway, though doing so is greatly discouraged for general
use.

* commaAndify(seq, comma=",", And="and") - Transforms a list of items into
  a comma-separated list with an "and" preceding the last element. For
  example, ["foo", "bar", "baz"] becomes "foo, bar, and baz". It is smart
  enough to convert two-element lists to just "item1 and item2" as well.
  - seq: the sequence of items (they don't have to be strings, but need to
    be 'str()'-able)
  - comma: the character to use to separate the list
  - And: the word to use before the last element

* pluralize(s) - Returns the plural of a string. Put any exceptions to the
  general English rules of pluralization in the plurals dictionary in
  supybot.utils.str.
  - s: the string to pluralize

* nItems(n, item, between=None) - Returns a string that describes a given
  number of an item (with any string between the actual number and the item
  itself). Pluralization is handled by the pluralize function above. Note
  that the arguments here are in a different order from the %n 3-tuple
  (n, between, item), since between is optional.
  - n: the number of items
  - item: the type of item
  - between: the optional string that goes between the number and the
    type of item

* quoted(s) - Returns the string surrounded by double quotes.
  - s: the string to quote

* be(i) - Returns the proper form of the verb "to be" based on the number
  provided (be(1) is "is", be(anything else) is "are").
  - i: the number of things that "be"

* has(i) - Returns the proper form of the verb "to have" based on the
  number provided (has(1) is "has", has(anything else) is "have").
  - i: the number of things that "has"

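If it helps to see how these pieces fit together, the following is a rough
pure-Python sketch of the behaviour described above. It is not Supybot's
implementation: comma_andify and n_items are hypothetical stand-ins, and the
pluralization here simply appends "s", which is a deliberate simplification.

```python
def be(i):
    """Form of "to be" for a count."""
    return "is" if i == 1 else "are"


def has(i):
    """Form of "to have" for a count."""
    return "has" if i == 1 else "have"


def comma_andify(seq, comma=",", And="and"):
    """Join items as 'a, b, and c', or 'a and b' for two items."""
    items = [str(x) for x in seq]
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return "%s %s %s" % (items[0], And, items[1])
    head = (comma + " ").join(items[:-1])
    return "%s%s %s %s" % (head, comma, And, items[-1])


def n_items(n, item, between=None):
    """Describe n items, e.g. '4 active threads'."""
    # Assumption: naive pluralization that only appends "s".
    noun = item if n == 1 else item + "s"
    if between is None:
        return "%s %s" % (n, noun)
    return "%s %s %s" % (n, between, noun)


# Building a sentence by hand, which is what format's %b and %n save you from.
sentence = "There %s %s." % (be(4), n_items(4, "thread", "active"))
```

Having to stitch the helpers together by hand like this is exactly why the
format function is the recommended route.
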
structures.py
=============
Intro

This module provides a number of useful data structures that aren't found in
the standard Python library. For the most part they were created as needed for
the bot and plugins themselves, but they were created in such a way as to be
of general use for anyone who needs a data structure that performs a similar
duty. As usual in this document, I'll try to list these in order of
usefulness, starting with the most useful.

The queue classes

The structures module provides two general-purpose queue classes for you to
use. The "queue" class is a robust, full-featured queue that scales up to
larger queues. The "smallqueue" class is for queues that will contain fewer
than 1000 or so items. Both offer the same common interface, which consists
of:

* a constructor which will optionally accept a sequence to start the queue
  off with
* enqueue(item) - adds an item to the back of the queue
* dequeue() - removes (and returns) the item from the front of the queue
* peek() - returns the item from the front of the queue without removing
  it
* reset() - empties the queue entirely

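As a quick illustration of that interface, here is a stand-in class built on
collections.deque from the standard library. SketchQueue is a hypothetical
name used only in this tutorial. It mirrors the methods listed above but makes
no claim about how queue or smallqueue are implemented internally.

```python
from collections import deque


class SketchQueue:
    """Illustrative stand-in exposing the interface listed above."""

    def __init__(self, seq=()):
        self._items = deque(seq)

    def enqueue(self, item):
        self._items.append(item)        # add to the back

    def dequeue(self):
        return self._items.popleft()    # remove and return the front

    def peek(self):
        return self._items[0]           # front item, left in place

    def reset(self):
        self._items.clear()

    def __len__(self):
        return len(self._items)


q = SketchQueue(["a", "b"])
q.enqueue("c")
first = q.dequeue()    # the item that was at the front
```
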
In addition to these general-use queue classes, there are two more specialized
queue classes. The first is the "TimeoutQueue", which holds items until they
reach a certain age, at which point they are removed from the queue. It
features the following:

* TimeoutQueue(timeout, queue=None) - you must specify the timeout (in
  seconds) in the constructor. You can also optionally pass it a queue
  that uses any implementation you wish, whether it is one of the above
  (queue or smallqueue) or some custom queue you create that implements the
  same interface. If you don't pass it a queue instance to use, it will
  build its own using smallqueue.
* reset(), enqueue(item), dequeue() - all the same as in the queue classes
  above
* setTimeout(secs) - allows you to change the timeout value

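The expiry idea can be sketched in a few lines. The class below is a
hypothetical illustration, not the TimeoutQueue source: it stores each item
with the time it was enqueued and drops stale entries lazily. The clock
parameter is an assumption added here so the example can be run without
actually waiting.

```python
import time
from collections import deque


class SketchTimeoutQueue:
    """Items older than `timeout` seconds silently disappear (sketch)."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout
        self._clock = clock             # hypothetical hook, handy for testing
        self._items = deque()           # (enqueue time, item) pairs

    def _expire(self):
        cutoff = self._clock() - self.timeout
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def setTimeout(self, secs):
        self.timeout = secs

    def enqueue(self, item):
        self._expire()
        self._items.append((self._clock(), item))

    def dequeue(self):
        self._expire()
        return self._items.popleft()[1]

    def reset(self):
        self._items.clear()

    def __len__(self):
        self._expire()
        return len(self._items)


now = [0.0]                             # fake clock we can move by hand
tq = SketchTimeoutQueue(10, clock=lambda: now[0])
tq.enqueue("old")
now[0] = 8.0
tq.enqueue("new")
now[0] = 15.0                           # "old" is 15 s old, "new" only 7 s
remaining = len(tq)
```
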
And for the final queue class, there's the "MaxLengthQueue" class. As you may
have guessed, it's a queue that is capped at a certain specified length. It
features the following:

* MaxLengthQueue(length, seq=()) - the constructor naturally requires that
  you set the max length, and it allows you to optionally pass in a sequence
  to be used as the starting queue. The underlying implementation is
  actually the queue from before.
* enqueue(item) - adds an item onto the back of the queue and, if that would
  push it over the max length, dequeues the item on the front (it does
  not return this item to you)
* all the standard methods from the queue class are inherited by this class

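The capping behaviour of enqueue is easiest to see in code. This is a
hypothetical stand-in (SketchMaxLengthQueue is not a Supybot name) that only
models the behaviour described above.

```python
from collections import deque


class SketchMaxLengthQueue:
    """A queue that never holds more than `length` items (sketch)."""

    def __init__(self, length, seq=()):
        self.length = length
        self._items = deque()
        for item in seq:
            self.enqueue(item)

    def enqueue(self, item):
        self._items.append(item)
        if len(self._items) > self.length:
            self._items.popleft()       # dropped silently, not returned

    def dequeue(self):
        return self._items.popleft()

    def peek(self):
        return self._items[0]

    def __len__(self):
        return len(self._items)


mq = SketchMaxLengthQueue(2, [1, 2])
result = mq.enqueue(3)                  # pushes 1 out of the front
```

After the last call the queue holds 2 and 3, and result is None because the
displaced item is not handed back.
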
The Other Structures

The most useful of the other structures is actually very similar to the
"MaxLengthQueue". It's the "RingBuffer", which is essentially a MaxLengthQueue
which fills up to its maximum size and then circularly replaces the old
contents as new entries are added instead of dequeuing. It features the
following:

* RingBuffer(size, seq=()) - as with the MaxLengthQueue you specify the
  size of the RingBuffer and optionally give it a sequence.
* append(item) - adds item to the end of the buffer, pushing out an item
  from the front if necessary
* reset() - empties out the buffer entirely
* resize(i) - shrinks/expands the RingBuffer to the size provided
* extend(seq) - appends the items from the provided sequence onto the end
  of the RingBuffer

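For a feel of how a ring buffer behaves, the sketch below leans on the maxlen
argument of collections.deque from the standard library. SketchRingBuffer is a
hypothetical name, and keeping the newest items when shrinking is an
assumption of this sketch rather than a documented property of RingBuffer.

```python
from collections import deque


class SketchRingBuffer:
    """Fixed-size buffer where new items push out the oldest (sketch)."""

    def __init__(self, size, seq=()):
        self._items = deque(seq, maxlen=size)

    def append(self, item):
        self._items.append(item)

    def extend(self, seq):
        self._items.extend(seq)

    def reset(self):
        self._items.clear()

    def resize(self, size):
        # Assumption: when shrinking, the newest items are the ones kept.
        self._items = deque(self._items, maxlen=size)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


rb = SketchRingBuffer(3, "ab")
rb.extend("cde")                        # "a" and "b" are pushed out
contents = list(rb)
```
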
The next data structure is the TwoWayDictionary, which as the name implies is
a dictionary in which key-value pairs have mappings going in both directions.
It features the following:

* TwoWayDictionary(seq=(), **kwargs) - Takes an optional sequence of (key,
  value) pairs as well as any key=value pairs specified in the constructor
  as initial values for the two-way dict.
* other than that, it offers no extra features beyond those of a normal
  Python dict, with the exception that any (key, val) pair added to the
  dict is also added as (val, key), so the mapping goes both ways.
  Elements are still accessed the same way you always do with Python
  'dict's.

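A two-way mapping is simple enough to sketch as a dict subclass. The class
below is a hypothetical illustration of the idea, not the TwoWayDictionary
source. Note that values have to be hashable, since they double as keys.

```python
class SketchTwoWayDict(dict):
    """dict in which every (key, value) pair is also stored reversed."""

    def __init__(self, seq=(), **kwargs):
        super().__init__()
        for key, value in dict(seq, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        dict.__setitem__(self, value, key)

    def __delitem__(self, key):
        value = self[key]
        dict.__delitem__(self, key)
        dict.pop(self, value, None)     # tolerate key == value


modes = SketchTwoWayDict([("op", "@")], voice="+")
```

Here modes["op"] gives "@" and modes["@"] gives "op", and the same holds for
"voice" and "+".
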
There is also a MultiSet class available, but it's very unlikely that it will
serve your purpose, so I won't go into it here. The curious coder can go check
the source and see what it's all about if they wish (it's only used once in our
code, in the Relay plugin).

web.py
======
The web portion of Supybot's utils module is mainly used for retrieving data
from websites, but it also has some utility functions pertaining to HTML and
email text. The functions in web are listed below, once again in order of
usefulness.

* getUrl(url, size=None, headers=None) - gets the data at the URL provided
  and returns it as one large string
  - url: the location of the data to be retrieved or a urllib2.Request
    object to be used in the retrieval
  - size: the maximum number of bytes to retrieve, defaults to None,
    meaning that it is to try to retrieve all data
  - headers: a dictionary mapping header types to header data

* getUrlFd(url, headers=None) - returns a file-like object for a url
  - url: the location of the data to be retrieved or a urllib2.Request
    object to be used in the retrieval
  - headers: a dictionary mapping header types to header data

* htmlToText(s, tagReplace=" ") - strips out all tags in a string of HTML,
  replacing them with the specified character
  - s: the HTML text to strip the tags out of
  - tagReplace: the string to replace tags with

* strError(e) - pretty-printer for web exceptions, returns a descriptive
  string given a web-related exception
  - e: the exception to pretty-print

* mungeEmail(s) - a naive e-mail obfuscation function, replaces "@" with
  "AT" and "." with "DOT"
  - s: the e-mail address to obfuscate

* getDomain(url) - returns the domain of a URL
  - url: the URL in question

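The text-handling helpers at the end of that list are small enough to
approximate with the standard library. The functions below are hypothetical
sketches of the described behaviour, not the code in supybot.utils.web. Padding
"AT" and "DOT" with spaces is an assumption, and the regular expression is a
naive tag stripper rather than a real HTML parser.

```python
import re
from urllib.parse import urlparse


def munge_email(s):
    """Naively obfuscate an e-mail address."""
    # Assumption: the replacements are padded with spaces.
    return s.replace("@", " AT ").replace(".", " DOT ")


def get_domain(url):
    """Return the network-location part of a URL."""
    return urlparse(url).netloc


def html_to_text(s, tag_replace=" "):
    """Replace anything that looks like a tag with tag_replace."""
    return re.sub(r"<[^>]*>", tag_replace, s)
```

For example, munge_email("user@example.com") yields
"user AT example DOT com" and get_domain("http://supybot.com/docs/") yields
"supybot.com".
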
The Best of the Rest
====================
Highlights the most useful of the remaining functionality in supybot.utils

Intro

Rather than document each of the remaining portions of the supybot.utils
module, I've elected to just pick out the choice bits from specific parts and
document those instead. Here they are, broken out by module name.

supybot.utils.file - file utilities

* touch(filename) - updates the access time of a file by opening it for
  writing and immediately closing it
* mktemp(suffix="") - creates a decent random string, suitable for a
  temporary filename with the given suffix, if provided
* the AtomicFile class - used for files that need to be atomically
  written, i.e., if there's a failure the original file remains unmodified.
  For more info consult file.py in src/utils

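The general technique behind an atomically written file is to write to a
temporary file next to the target and rename it into place only on success.
The function below is a sketch of that technique under those assumptions. It
is not the AtomicFile class, and atomic_write is a hypothetical name.

```python
import os
import tempfile


def atomic_write(filename, data):
    """Write data to filename so a failure leaves the old file untouched."""
    directory = os.path.dirname(os.path.abspath(filename))
    # The temp file lives in the same directory so the rename stays on one
    # filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, filename)       # swap the new file into place
    except BaseException:
        os.unlink(tmp)                  # clean up; the original is untouched
        raise
```
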
supybot.utils.gen - general utilities

* timeElapsed(elapsed, [lots of optional args]) - given the number of
  seconds elapsed, returns a string with the English description of the
  amount of time passed. Consult gen.py in src/utils for the exact
  argument list and documentation if you feel you could use this
  function.
* exnToString(e) - improved exception-to-string function. Provides nicer
  output than a simple str(e).
* InsensitivePreservingDict class - a dict class that is case-insensitive
  when accessing keys

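To show what case-insensitive access means in practice, here is a small
hypothetical sketch. It is not the InsensitivePreservingDict source. Keeping
the key's original spelling for display is inferred from the "Preserving" in
the class name and should be read as an assumption.

```python
class SketchInsensitiveDict:
    """Mapping whose string keys are looked up without regard to case."""

    def __init__(self, items=()):
        self._data = {}                 # lowercased key -> (original key, value)
        for key, value in dict(items).items():
            self[key] = value

    def __setitem__(self, key, value):
        self._data[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._data[key.lower()][1]

    def __contains__(self, key):
        return key.lower() in self._data

    def keys(self):
        # Assumption: the spelling used when the key was set is preserved.
        return [original for original, _ in self._data.values()]


nicks = SketchInsensitiveDict({"NickServ": "services"})
```

With this, nicks["nickserv"] and nicks["NICKSERV"] both return "services",
while nicks.keys() still reports "NickServ".
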
supybot.utils.iter - iterable utilities

* len(iterable) - returns the length of a given iterable
* groupby(key, iterable) - equivalent to the itertools.groupby function
  available as of Python 2.4. Provided for backwards compatibility.
* any(p, iterable) - Returns true if any element in the iterable satisfies
  the predicate p
* all(p, iterable) - Returns true if all elements in the iterable satisfy
  the predicate p
* choice(iterable) - Returns a random element from the iterable
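
The predicate-first signatures above differ from Python's built-in any and
all, which take only an iterable. The sketch below shows the described
behaviour using hypothetical names (iter_len, any_p, all_p) so as not to
shadow the built-ins. It is an illustration, not the supybot.utils.iter
source.

```python
def iter_len(iterable):
    """Count the items of any iterable, including generators."""
    return sum(1 for _ in iterable)


def any_p(p, iterable):
    """True if at least one element satisfies the predicate p."""
    return any(p(x) for x in iterable)


def all_p(p, iterable):
    """True if every element satisfies the predicate p."""
    return all(p(x) for x in iterable)


words = ["foo", "42", "bar"]
some_digits = any_p(str.isdigit, words)
only_digits = all_p(str.isdigit, words)
```

Counting a generator this way consumes it, so iter_len is best used on
iterables you no longer need afterwards.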