RSS._getConverter: Encode strings before handing them off to other functions

When the feed has a specified encoding, we'll be dealing with unicode objects in the response from feedparser.parse(). To avoid possible UnicodeErrors, we need to encode() before handing the string off to other functions, so the other functions are always dealing with bytestrings instead of a mix of bytestrings and unicode objects. Mixing unicode and bytestrings will cause implicit conversions of the unicode objects, which will most likely use the wrong encoding.

Signed-off-by: James McCoy <jamessan@users.sourceforge.net>

Conflicts:
	plugins/RSS/plugin.py
2026-01-12 13:38:12 +01:00 · 2011-10-22 15:23:56 -04:00 · 2011-10-22 15:23:56 -04:00 · ff96b898f9
commit ff96b898f9
parent 72077c8c97
1 changed file with 6 additions and 4 deletions
--- a/plugins/RSS/plugin.py
+++ b/plugins/RSS/plugin.py
@@ -291,10 +291,12 @@ class RSS(callbacks.Plugin):
        toText = utils.web.htmlToText
        if 'encoding' in feed:
            def conv(s):
-                try:
-                    return toText(s).strip().encode(feed['encoding'],'replace')
-                except UnicodeEncodeError:
-                    return toText(s.encode('utf-8', 'ignore')).strip()
+                # encode() first so that implicit encoding doesn't happen in
+                # other functions when unicode and bytestring objects are used
+                # together
+                s = s.encode(feed['encoding'], 'replace')
+                s = toText(s).strip()
+                return s
            return conv
        else:
            return lambda s: toText(s).strip()
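
The encode-first ordering described in the commit message can be sketched outside the plugin as follows. This is a minimal illustration written for Python 3, not the plugin's actual code: `make_converter` and `html_to_text` are hypothetical names, and `html_to_text` is only a crude stand-in for `utils.web.htmlToText`. The patched code targets Python 2, where mixing `unicode` and `str` triggers an implicit conversion using the default ASCII codec; Python 3 raises `TypeError` instead, but the remedy is the same: encode once at the boundary so every later function sees only bytestrings.

```python
import re


def html_to_text(data):
    """Hypothetical stand-in for utils.web.htmlToText: drop tags from a bytestring."""
    return re.sub(rb'<[^>]*>', b'', data)


def make_converter(encoding):
    """Return a converter that encodes first, then cleans up the bytestring.

    Hypothetical helper mirroring the shape of the patched conv() closure.
    """
    def conv(s):
        # Encode before anything else touches the value, so downstream
        # helpers never see a mix of text and bytestrings. 'replace'
        # substitutes '?' for characters the target encoding cannot
        # represent instead of raising UnicodeEncodeError.
        s = s.encode(encoding, 'replace')
        s = html_to_text(s).strip()
        return s
    return conv


conv = make_converter('iso-8859-1')
title = conv(' <b>Caf\u00e9</b> \u2603 ')
print(title)
```

Here the snowman character (U+2603) has no ISO-8859-1 representation, so it comes back as `?`, while `é` is encoded as the single byte `\xe9`. The removed code reached a similar result through a `try`/`except` fallback, but only after `toText` had already been handed the unicode object.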