RSS._getConverter: Encode strings before handing them off to other functions

When the feed specifies an encoding, feedparser.parse() returns unicode
objects in its response.  To avoid possible UnicodeErrors, we need to
encode() each string before handing it off to other functions, so those
functions always deal with bytestrings rather than a mix of bytestrings and
unicode objects.  Mixing the two causes implicit conversions of the unicode
objects, which will most likely use the wrong encoding.

Signed-off-by: James McCoy <jamessan@users.sourceforge.net>

Conflicts:

	plugins/RSS/plugin.py
James McCoy 2011-10-22 15:23:56 -04:00 committed by Valentin Lorentz
parent 72077c8c97
commit ff96b898f9

@@ -291,10 +291,12 @@ class RSS(callbacks.Plugin):
         toText = utils.web.htmlToText
         if 'encoding' in feed:
             def conv(s):
-                try:
-                    return toText(s).strip().encode(feed['encoding'],'replace')
-                except UnicodeEncodeError:
-                    return toText(s.encode('utf-8', 'ignore')).strip()
+                # encode() first so there implicit encoding doesn't happen in
+                # other functions when unicode and bytestring objects are used
+                # together
+                s = s.encode(feed['encoding'], 'replace')
+                s = toText(s).strip()
+                return s
             return conv
         else:
             return lambda s: toText(s).strip()
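
The encode-first pattern from this change can be tried in isolation with the rough sketch below. It is not the plugin's code: to_text is a hypothetical stand-in for utils.web.htmlToText (whose implementation is not shown here), get_converter is an illustrative free function rather than the real RSS._getConverter method, and feed is assumed to be a plain dict.

```python
def to_text(s):
    # Hypothetical stand-in for utils.web.htmlToText; it returns its input
    # unchanged so the example stays self-contained.
    return s


def get_converter(feed):
    """Return a callable that cleans up a string taken from the feed."""
    if 'encoding' in feed:
        def conv(s):
            # Encode first, using the feed's declared encoding, so every
            # later step only ever sees a bytestring.  'replace' substitutes
            # '?' for characters the target encoding cannot represent.
            s = s.encode(feed['encoding'], 'replace')
            return to_text(s).strip()
        return conv
    # No declared encoding: pass the string through without encoding it.
    return lambda s: to_text(s).strip()


if __name__ == '__main__':
    conv = get_converter({'encoding': 'ascii'})
    print(conv(u'  caf\xe9  '))  # the e-acute cannot be encoded as ASCII
```

With an 'ascii' feed encoding the accented character is replaced by '?' and no UnicodeEncodeError is raised, which is why the old try/except fallback is no longer needed in this sketch. The implicit conversions described in the commit message are Python 2 behaviour; on Python 3, mixing str and bytes raises a TypeError instead.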