mirror of https://github.com/Mikaela/Limnoria.git
synced 2025-10-26 04:57:21 +01:00
344 lines, 16 KiB, Plaintext

Using Supybot's utils module
----------------------------
Supybot provides a wealth of utilities for plugin writers in the supybot.utils
module. This tutorial describes these utilities and shows you how to use them.

str.py
======
The Format Function

The supybot.utils.str module provides a number of utility functions for
handling string values. This section is a quick rundown of the functions
available, along with descriptions of the arguments they take. First and
foremost is the format function, which packs so much capability into one
string-formatting-style call that it gets its own section in this tutorial.
All other functions are covered in the sections that follow.

format takes several arguments: first the format string (using the format
characters described below), and then each individual item to be formatted.
Pass the items as arguments to format. Do not apply the % operator to the
format string yourself, because that falls back on Python's normal string
formatting operator, which does not understand the extra characters. The
format function uses the following string formatting characters.

    * % - literal "%"
    * i - integer
    * s - string
    * f - float
    * r - repr
    * b - form of the verb "to be" (takes an int)
    * h - form of the verb "to have" (takes an int)
    * L - commaAndify (takes a list of strings or a tuple of ([strings], and))
    * p - pluralize (takes a string)
    * q - quoted (takes a string)
    * n - n items (takes a 2-tuple of (n, item) or a 3-tuple of (n, between, item))
    * t - time, formatted (takes an int)
    * u - url, wrapped in angle brackets

Here are a few examples to help elaborate on the above descriptions:

  >>> from supybot.utils.str import format
  >>> format("Error %q has been reported %n.  For more information, see %u.",
  ...        "AttributeError", (5, "time"), "http://supybot.com")
  'Error "AttributeError" has been reported 5 times.  For more information, see <http://supybot.com>.'

  >>> i = 4
  >>> format("There %b %n at this time.  You are only allowed %n at any given "
  ...        "time", i, (i, "active", "thread"), (5, "active", "thread"))
  'There are 4 active threads at this time.  You are only allowed 5 active threads at any given time'

  >>> i = 1
  >>> format("There %b %n at this time.  You are only allowed %n at any given "
  ...        "time", i, (i, "active", "thread"), (5, "active", "thread"))
  'There is 1 active thread at this time.  You are only allowed 5 active threads at any given time'

  >>> ops = ["foo", "bar", "baz"]
  >>> format("The following %n %h the %s capability: %L", (len(ops), "user"),
  ...        len(ops), "op", ops)
  'The following 3 users have the op capability: foo, bar, and baz'

As you can see, you can combine many kinds of formatting characters in a
single format string. In fact, that was the major motivation behind format.
There are specific functions that you can call individually for each of those
formatting types, but using the special formatting characters with format is
much easier than concatenating a bunch of strings produced by other utils.str
functions.

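To make the idea of special formatting characters concrete, here is a minimal
sketch of how such a dispatcher could work. It is not the Supybot
implementation: mini_format, be and n_items are hypothetical names, only a
subset of the characters is handled, and pluralization is reduced to appending
"s".

```python
import re


def be(n):
    """Return "is" for exactly one item, "are" otherwise."""
    return "is" if n == 1 else "are"


def n_items(n, item, between=None):
    """Naive "n items" helper (assumption: plural is just item + "s")."""
    noun = item if n == 1 else item + "s"
    if between is None:
        return "%d %s" % (n, noun)
    return "%d %s %s" % (n, between, noun)


def mini_format(template, *args):
    """Expand a small subset of the format characters described above."""
    pending = list(args)

    def expand(match):
        char = match.group(1)
        if char == "%":
            return "%"        # literal percent sign, consumes no argument
        arg = pending.pop(0)  # every other character consumes one argument
        if char == "s":
            return str(arg)
        if char == "i":
            return "%d" % arg
        if char == "q":
            return '"%s"' % arg
        if char == "b":
            return be(arg)
        if char == "u":
            return "<%s>" % arg
        if char == "n":
            # A 2-tuple is (n, item); a 3-tuple is (n, between, item).
            if len(arg) == 2:
                return n_items(arg[0], arg[1])
            return n_items(arg[0], arg[2], between=arg[1])
        raise ValueError("unsupported format character: %" + char)

    return re.sub(r"%(.)", expand, template)


print(mini_format("There %b %n at this time.", 1, (1, "active", "thread")))
print(mini_format("Error %q has been reported %n, see %u.",
                  "AttributeError", (5, "time"), "http://supybot.com"))
```

Each format character pulls the next positional argument, which is why the
order of the arguments must follow the order of the characters in the format
string.
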
The Other Functions

These are the functions that can't be handled by format. They are sorted in
what I perceive to be the general order of usefulness (and I'm leaving the
ones covered by format for the next section).

    * ellipsisify(s, n) - Returns a shortened version of a string. Produces up
    to the first n chars, cut at the nearest word boundary.
          - s: the string to be shortened
          - n: the number of characters to shorten it to

    * perlReToPythonRe(s) - Converts a Perl-style regexp (e.g., "/abcd/i" or
    "m/abcd/i") to an actual Python regexp (an re object)
          - s: the regexp string

    * perlReToReplacer(s) - Converts a Perl-style replacement regexp (e.g.,
    "s/foo/bar/g") to a Python function that performs such a replacement
          - s: the regexp string

    * dqrepr(s) - Returns a repr() of s guaranteed to be in double quotes.
    (Double Quote Repr)
          - s: the string to be double-quote repr()'ed

    * toBool(s) - Determines whether or not a string means True or False and
    returns the appropriate boolean value. True is any of "true", "on",
    "enable", "enabled", or "1". False is any of "false", "off", "disable",
    "disabled", or "0".
          - s: the string to determine the boolean value for

    * rsplit(s, sep=None, maxsplit=-1) - Functionally the same as str.split in
    the Python standard library, except that it splits from the right instead
    of the left. Python 2.4 and later have str.rsplit (which this function
    defers to on those versions), but Python 2.3 did not.
          - s: the string to be split
          - sep: the separator to split on, defaults to whitespace
          - maxsplit: the maximum number of splits to perform, -1 performs all
                      possible splits.

    * normalizeWhitespace(s) - Reduces all multi-spaces in a string to a
    single space
          - s: the string to normalize

    * depluralize(s) - The opposite of pluralize
          - s: the string to depluralize

    * unCommaThe(s) - Takes a string of the form "foo, the" and turns it into
    "the foo"
          - s: a string of the form "foo, the"

    * distance(s, t) - Computes the Levenshtein distance (or "edit distance")
    between two strings
          - s: the first string
          - t: the second string

    * soundex(s, length=4) - Computes the soundex for a given string
          - s: the string to compute the soundex for
          - length: the length of the soundex to generate

    * matchCase(s1, s2) - Returns the second string with its case changed to
    match the case of the first string.
          - s1: the first string
          - s2: the string which will be made to match the case of the first

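As an illustration of what a few of these helpers do, here is a minimal
sketch written from the descriptions above. It is not the Supybot source:
to_bool, distance and un_comma_the are stand-in names, and the
case-insensitive matching in to_bool and the ValueError it raises on
unrecognized input are assumptions.

```python
def to_bool(s):
    """Map the strings listed above to True or False."""
    s = s.strip().lower()  # assumption: matching ignores case and padding
    if s in ("true", "on", "enable", "enabled", "1"):
        return True
    if s in ("false", "off", "disable", "disabled", "0"):
        return False
    raise ValueError("not a recognized boolean string: %r" % s)


def distance(s, t):
    """Levenshtein distance via the classic row-by-row dynamic program."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def un_comma_the(s):
    """Turn "foo, the" into "the foo"; leave other strings unchanged."""
    head, sep, tail = s.rpartition(", ")
    if sep and tail.lower() == "the":
        return tail + " " + head
    return s


print(to_bool("enabled"))
print(distance("kitten", "sitting"))
print(un_comma_the("foo, the"))
```
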
The Functions Format Already Covers

These functions aren't strictly necessary, because you can achieve the same
results more easily with the format function. They exist if you decide you
want to call them directly, but doing so is greatly discouraged for general
use.

    * commaAndify(seq, comma=",", And="and") - Transforms a list of items into
    a comma-separated list with an "and" preceding the last element. For
    example, ["foo", "bar", "baz"] becomes "foo, bar, and baz". It is smart
    enough to render two-element lists as just "item1 and item2" as well.
          - seq: the sequence of items (don't have to be strings, but need to
                be 'str()'-able)
          - comma: the string used to separate the items
          - And: the word to use before the last element

    * pluralize(s) - Returns the plural of a string. Put any exceptions to the
    general English rules of pluralization in the plurals dictionary in
    supybot.utils.str.
          - s: the string to pluralize

    * nItems(n, item, between=None) - Returns a string that describes a given
    number of an item (with any string between the actual number and the item
    itself). Handles pluralization with the pluralize function above. Note
    that the arguments here are in a different order than in the 3-tuple taken
    by the n format character, since between is optional.
          - n: the number of items
          - item: the type of item
          - between: the optional string that goes between the number and the
                     type of item

    * quoted(s) - Returns the string surrounded by double-quotes.
          - s: the string to quote

    * be(i) - Returns the proper form of the verb "to be" based on the number
    provided (be(1) is "is", be(anything else) is "are")
          - i: the number of things that "be"

    * has(i) - Returns the proper form of the verb "to have" based on the
    number provided (has(1) is "has", has(anything else) is "have")
          - i: the number of things that "has"

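To show why concatenating these helpers by hand is clumsier than one format
call, here is a minimal sketch of two of them, written from the descriptions
above. comma_and_ify and has are stand-in names, not the Supybot code, and
returning an empty string for an empty sequence is an assumption.

```python
def comma_and_ify(seq, comma=",", And="and"):
    """Join items as "a, b, and c", with special cases for short lists."""
    items = [str(x) for x in seq]
    if not items:
        return ""  # assumption: the real function may behave differently
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return "%s %s %s" % (items[0], And, items[1])
    head = (comma + " ").join(items[:-1])
    return "%s%s %s %s" % (head, comma, And, items[-1])


def has(n):
    """Return "has" for exactly one item, "have" otherwise."""
    return "has" if n == 1 else "have"


ops = ["foo", "bar", "baz"]
# Building the sentence by hand means calling each helper yourself.
sentence = "The following %d users %s the op capability: %s" % (
    len(ops), has(len(ops)), comma_and_ify(ops))
print(sentence)
```
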
structures.py
=============
Intro

This module provides a number of useful data structures that aren't found in
the standard Python library. For the most part they were created as needed for
the bot and plugins themselves, but they were written to be of general use to
anyone who needs a data structure that performs a similar duty. As usual in
this document, I'll try to order these by usefulness, starting with the most
useful.

The queue classes

The structures module provides two general-purpose queue classes for you to
use. The "queue" class is a robust, full-featured queue that scales up to
large queues. The "smallqueue" class is for queues that will contain fewer
than 1000 or so items. Both offer the same common interface, which consists
of:

    * a constructor which will optionally accept a sequence to start the queue
      off with
    * enqueue(item) - adds an item to the back of the queue
    * dequeue() - removes (and returns) the item from the front of the queue
    * peek() - returns the item from the front of the queue without removing
               it
    * reset() - empties the queue entirely

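The common interface can be sketched as follows. This is only an illustration
built on collections.deque, not the Supybot queue or smallqueue class.
SketchQueue is a hypothetical name, and raising IndexError on an empty queue
is an assumption.

```python
from collections import deque


class SketchQueue:
    """A FIFO queue exposing the interface listed above."""

    def __init__(self, seq=()):
        self._items = deque(seq)

    def enqueue(self, item):
        self._items.append(item)       # add to the back

    def dequeue(self):
        if not self._items:
            raise IndexError("dequeue from an empty queue")
        return self._items.popleft()   # remove from the front

    def peek(self):
        if not self._items:
            raise IndexError("peek into an empty queue")
        return self._items[0]          # front item, left in place

    def reset(self):
        self._items.clear()

    def __len__(self):
        return len(self._items)


q = SketchQueue(["first"])
q.enqueue("second")
print(q.peek())     # front of the queue, not removed
print(q.dequeue())  # same item, now removed
print(len(q))
```
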
In addition to these general-use queue classes, there are two more
specialized queue classes. The first is the "TimeoutQueue", which holds items
until they reach a certain age and then removes them from the queue. It
features the following:

    * TimeoutQueue(timeout, queue=None) - you must specify the timeout (in
    seconds) in the constructor. You can also optionally pass it a queue
    using any implementation you wish, whether it is one of the above (queue
    or smallqueue) or a custom queue you create that implements the same
    interface. If you don't pass it a queue instance, it will build its own
    using smallqueue.
    * reset(), enqueue(item), dequeue() - all the same as in the above queue
    classes
    * setTimeout(secs) - allows you to change the timeout value

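One way to picture the expiry behaviour is the sketch below. It is not the
Supybot implementation: SketchTimeoutQueue is a hypothetical name, the clock
parameter is an addition made here so the example can run without waiting
(the real constructor takes queue=None instead), and treating an item as
expired once its age reaches the timeout is an assumption.

```python
import time
from collections import deque


class SketchTimeoutQueue:
    """Queue whose items disappear once they are older than the timeout."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock      # hypothetical hook for a fake clock
        self._items = deque()    # (timestamp, item) pairs, oldest first

    def _expire(self):
        cutoff = self._clock() - self.timeout
        while self._items and self._items[0][0] <= cutoff:
            self._items.popleft()

    def setTimeout(self, secs):
        self.timeout = secs

    def enqueue(self, item):
        self._expire()
        self._items.append((self._clock(), item))

    def dequeue(self):
        self._expire()
        return self._items.popleft()[1]

    def reset(self):
        self._items.clear()

    def __len__(self):
        self._expire()
        return len(self._items)


now = [0.0]                                   # fake clock for the demo
q = SketchTimeoutQueue(10, clock=lambda: now[0])
q.enqueue("old")
now[0] = 5.0
q.enqueue("recent")
now[0] = 12.0                                 # "old" is now 12 seconds old
print(len(q))
print(q.dequeue())
```

Expired items are dropped lazily, whenever the queue is next touched, so no
background timer is needed.
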
And for the final queue class, there's the "MaxLengthQueue" class. As you may
have guessed, it's a queue that is capped at a specified length. It features
the following:

    * MaxLengthQueue(length, seq=()) - the constructor requires that you set
    the max length and allows you to optionally pass in a sequence to be used
    as the starting queue. The underlying implementation is actually the queue
    from before.
    * enqueue(item) - adds an item onto the back of the queue and, if that
    would push the queue over the max length, dequeues the item on the front
    (it does not return this item to you)
    * all the standard methods from the queue class are inherited for this class

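The capping behaviour of enqueue can be sketched like this. It is an
illustration only, with the hypothetical name SketchMaxLengthQueue and a
plain deque standing in for the underlying queue class.

```python
from collections import deque


class SketchMaxLengthQueue:
    """FIFO queue that silently drops the front item when it overflows."""

    def __init__(self, length, seq=()):
        self.length = length
        self._items = deque()
        for item in seq:
            self.enqueue(item)

    def enqueue(self, item):
        self._items.append(item)
        if len(self._items) > self.length:
            self._items.popleft()   # discarded, not returned to the caller

    def dequeue(self):
        return self._items.popleft()

    def peek(self):
        return self._items[0]

    def __len__(self):
        return len(self._items)


recent = SketchMaxLengthQueue(2, ["a", "b", "c"])  # "a" is pushed out
recent.enqueue("d")                                # "b" is pushed out
print(recent.peek())
print(len(recent))
```
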
The Other Structures

The most useful of the other structures is actually very similar to the
"MaxLengthQueue". It's the "RingBuffer", which is essentially a MaxLengthQueue
that fills up to its maximum size and then, instead of dequeuing, circularly
overwrites the oldest contents as new entries are added. It features the
following:

    * RingBuffer(size, seq=()) - as with the MaxLengthQueue, you specify the
    size of the RingBuffer and optionally give it a sequence.
    * append(item) - adds item to the end of the buffer, pushing out an item
    from the front if necessary
    * reset() - empties out the buffer entirely
    * resize(i) - shrinks/expands the RingBuffer to the size provided
    * extend(seq) - appends the items from the provided sequence onto the end
    of the RingBuffer

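The circular replacement can be sketched with a fixed set of slots and an
index that tracks the oldest entry. This is a simplified illustration, not
the Supybot class: SketchRingBuffer is a hypothetical name, resize is left
out, and rejecting a size below 1 is an assumption.

```python
class SketchRingBuffer:
    """Fixed-size buffer that overwrites its oldest slot once full."""

    def __init__(self, size, seq=()):
        if size < 1:
            raise ValueError("size must be at least 1")
        self.size = size
        self._slots = []    # grows until full, then overwritten in place
        self._oldest = 0    # index of the oldest item once the buffer is full
        self.extend(seq)

    def append(self, item):
        if len(self._slots) < self.size:
            self._slots.append(item)
        else:
            self._slots[self._oldest] = item
            self._oldest = (self._oldest + 1) % self.size

    def extend(self, seq):
        for item in seq:
            self.append(item)

    def reset(self):
        self._slots = []
        self._oldest = 0

    def __len__(self):
        return len(self._slots)

    def __iter__(self):
        # Yield items from oldest to newest.
        return iter(self._slots[self._oldest:] + self._slots[:self._oldest])


history = SketchRingBuffer(3, [1, 2, 3])
history.append(4)        # overwrites the slot that held 1
print(list(history))
```

No items are shifted when the buffer is full. Only one slot is rewritten and
the index of the oldest entry moves forward.
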
The next data structure is the TwoWayDictionary, which, as the name implies,
is a dictionary in which key-value pairs have mappings going in both
directions. It features the following:

    * TwoWayDictionary(seq=(), **kwargs) - Takes an optional sequence of (key,
    value) pairs as well as any key=value pairs specified in the constructor
    as initial values for the two-way dict.
    * other than that, it offers nothing beyond a normal Python dict, except
    that any (key, val) pair added to the dict is also added as (val, key),
    so the mapping goes both ways. Elements are still accessed the same way
    as with any Python dict.

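A minimal sketch of the two-way behaviour, assuming a dict subclass that
mirrors every assignment, looks like this. SketchTwoWayDict is a hypothetical
name and the code is not taken from Supybot.

```python
class SketchTwoWayDict(dict):
    """dict in which every (key, value) pair is also stored as (value, key)."""

    def __init__(self, seq=(), **kwargs):
        super().__init__()
        for key, value in dict(seq, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        super().__setitem__(value, key)   # mirror entry

    def __delitem__(self, key):
        value = self[key]
        super().__delitem__(key)
        if value != key:                  # a self-mapped key has one entry
            super().__delitem__(value)


modes = SketchTwoWayDict([("op", "@")], voice="+")
print(modes["op"])     # forward lookup
print(modes["@"])      # reverse lookup
del modes["voice"]     # removes both directions
print("+" in modes)
```

Because values become keys, they have to be hashable. Note also that in this
sketch dict methods such as update() bypass __setitem__, so they would not
create the mirror entries.
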
There is also a MultiSet class available, but it's very unlikely that it will
serve your purpose, so I won't go into it here. The curious coder can go check
the source and see what it's all about if they wish (it's only used once in our
code, in the Relay plugin).

web.py
======
The web portion of Supybot's utils module is mainly used for retrieving data
from websites, but it also has some utility functions pertaining to HTML and
email text. The functions in web are listed below, once again in order of
usefulness.

    * getUrl(url, size=None, headers=None) - gets the data at the URL provided
    and returns it as one large string
          - url: the location of the data to be retrieved or a urllib2.Request
                 object (urllib.request.Request on Python 3) to be used in the
                 retrieval
          - size: the maximum number of bytes to retrieve, defaults to None,
                  meaning that it will try to retrieve all data
          - headers: a dictionary mapping header types to header data

    * getUrlFd(url, headers=None) - returns a file-like object for a URL
          - url: the location of the data to be retrieved or a urllib2.Request
                 object (urllib.request.Request on Python 3) to be used in the
                 retrieval
          - headers: a dictionary mapping header types to header data

    * htmlToText(s, tagReplace=" ") - strips out all tags in a string of HTML,
    replacing them with the specified string
          - s: the HTML text to strip the tags out of
          - tagReplace: the string to replace tags with

    * strError(e) - pretty-printer for web exceptions, returns a descriptive
    string given a web-related exception
          - e: the exception to pretty-print

    * mungeEmail(s) - a naive e-mail obfuscation function, replaces "@" with
    "AT" and "." with "DOT"
          - s: the e-mail address to obfuscate

    * getDomain(url) - returns the domain of a URL
          - url: the URL in question

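The text-oriented helpers are simple enough to sketch without touching the
network. The versions below are illustrations written from the descriptions
above, not the Supybot code: the names are stand-ins, the tag-stripping regex
is a naive assumption that does not decode HTML entities, and the spaces
around "AT" and "DOT" are an assumption made for readability.

```python
import re
from urllib.parse import urlparse


def html_to_text(s, tagReplace=" "):
    """Replace anything that looks like an HTML tag with tagReplace."""
    return re.sub(r"<[^>]*>", tagReplace, s)


def munge_email(s):
    """Naive obfuscation of an e-mail address."""
    return s.replace("@", " AT ").replace(".", " DOT ")


def get_domain(url):
    """Return the network location part of a URL."""
    return urlparse(url).netloc


print(html_to_text("<p>Hello <b>world</b></p>", tagReplace=""))
print(munge_email("user@example.com"))
print(get_domain("http://supybot.com/documentation"))
```
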
The Best of the Rest
====================
  Highlights the most useful of the remaining functionality in supybot.utils

Intro

Rather than document each of the remaining portions of the supybot.utils
module, I've elected to just pick out the choice bits from specific parts and
document those instead. Here they are, broken out by module name.

supybot.utils.file - file utilities

    * touch(filename) - updates the access time of a file by opening it for
                        writing and immediately closing it
    * mktemp(suffix="") - creates a decent random string, suitable for a
                          temporary filename with the given suffix, if
                          provided
    * the AtomicFile class - used for files that need to be atomically
                             written, i.e., if there's a failure the original
                             file remains unmodified. For more info consult
                             file.py in src/utils

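The idea behind atomic writes can be sketched with the standard library
alone: write to a temporary file in the same directory, then rename it over
the target. The atomic_write function below is a hypothetical helper that
illustrates the technique. It is not the AtomicFile class and does not
reproduce its interface.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at path with data, or leave it untouched on failure."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the bytes reach the disk
        os.replace(tmp_path, path)   # swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)      # clean up; the original is unmodified
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "settings.conf")
    atomic_write(target, "first version\n")
    atomic_write(target, "second version\n")
    with open(target) as f:
        print(f.read(), end="")
```

The temporary file lives in the same directory as the target so that the
final rename stays on one filesystem.
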
supybot.utils.gen - general utilities

    * timeElapsed(elapsed, [lots of optional args]) - given the number of
        seconds elapsed, returns a string with the English description of the
        amount of time passed. Consult gen.py in src/utils for the exact
        argument list and documentation if you feel you could use this
        function.
    * exnToString(e) - improved exception-to-string function. Provides nicer
                       output than a simple str(e).
    * InsensitivePreservingDict class - a dict class that is case-insensitive
                                        when accessing keys

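Two of these are easy to picture with a small sketch. Both definitions below
are illustrations, not the Supybot code: the names are stand-ins, the
"ClassName: message" output format of exn_to_string is an assumption, and the
dict sketch supports string keys only.

```python
def exn_to_string(e):
    """Describe an exception as "ClassName: message"."""
    message = str(e)
    if message:
        return "%s: %s" % (type(e).__name__, message)
    return type(e).__name__


class SketchInsensitiveDict:
    """Mapping that ignores key case on lookup but remembers the original."""

    def __init__(self):
        self._data = {}   # lowercased key -> (original key, value)

    def __setitem__(self, key, value):
        self._data[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._data[key.lower()][1]

    def __contains__(self, key):
        return key.lower() in self._data

    def keys(self):
        return [original for original, _ in self._data.values()]


nicks = SketchInsensitiveDict()
nicks["JohnDoe"] = "owner"
print(nicks["johndoe"])   # lookup ignores case
print(nicks.keys())       # original spelling is preserved
print(exn_to_string(ValueError("bad input")))
```
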
supybot.utils.iter - iterable utilities

    * len(iterable) - returns the length of a given iterable
    * groupby(key, iterable) - equivalent to the itertools.groupby function
                               available as of Python 2.4. Provided for
                               backwards compatibility.
    * any(p, iterable) - Returns true if any element in the iterable satisfies
                         the predicate p
    * all(p, iterable) - Returns true if all elements in the iterable satisfy
                         the predicate p
    * choice(iterable) - Returns a random element from the iterable
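
These differ from the builtins of the same name in that they accept any
iterable (including generators) and, for any and all, take a predicate as the
first argument. A minimal sketch of that behaviour follows. The function
names are hypothetical, chosen here to avoid shadowing the builtins, and the
code is not the Supybot source.

```python
import random


def ilen(iterable):
    """Count the items of any iterable, even one without __len__."""
    return sum(1 for _ in iterable)


def any_match(p, iterable):
    """True if at least one element satisfies the predicate p."""
    return any(p(x) for x in iterable)


def all_match(p, iterable):
    """True if every element satisfies the predicate p."""
    return all(p(x) for x in iterable)


def choice(iterable):
    """Pick a random element; the iterable is materialized into a list."""
    return random.choice(list(iterable))


evens = (n for n in range(10) if n % 2 == 0)   # a generator has no len()
print(ilen(evens))
print(any_match(str.isdigit, ["a", "1"]))
print(all_match(str.isdigit, ["a", "1"]))
```
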