When the feed specifies an encoding, feedparser.parse() returns unicode
objects in its response. To avoid possible UnicodeErrors, we need to encode()
each string before handing it off to other functions, so that those functions
always deal with bytestrings rather than a mix of bytestrings and unicode
objects. Mixing the two causes implicit conversions of the unicode objects,
which will most likely use the wrong encoding.
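
The idea of normalizing to bytestrings at the boundary can be sketched as
follows. This is only an illustration, not the code from this commit: the
helper name to_bytes and the UTF-8 default are assumptions, and the sketch
uses Python 3 naming, where a "unicode object" is str and a "bytestring" is
bytes.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as a bytestring (hypothetical helper).

    Text is encoded explicitly with the given encoding; values that are
    already bytestrings are passed through unchanged.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


# Text from a parsed feed is encoded once, at the boundary.
title = to_bytes("caf\u00e9 news")
# Values that are already bytestrings are left alone.
link = to_bytes(b"http://example.com/feed")
```

Encoding explicitly in one place means every later function sees a single
string type, so no implicit conversion is ever attempted.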
Signed-off-by: James McCoy <jamessan@users.sourceforge.net>
Upstream bug: http://bugs.python.org/issue3932
Rather than overriding the unescape method with the patch posted upstream, we just convert
the page text to unicode before passing it to the HTMLParser. Between them, UTF-8 and
Latin-1 will decode just about anything.
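
A minimal sketch of that approach, assuming the text is decoded as UTF-8
first and as Latin-1 if that fails. The names to_text and TitleParser are
hypothetical and the sketch uses Python 3's html.parser, so it shows the
technique rather than the code in this commit.

```python
from html.parser import HTMLParser


def to_text(raw, encodings=("utf-8", "latin-1")):
    """Decode raw bytes to text, trying each encoding in order."""
    if isinstance(raw, str):
        return raw
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if a caller passes a list without Latin-1, which
    # accepts every byte sequence.
    return raw.decode("utf-8", "replace")


class TitleParser(HTMLParser):
    """Collect the text inside <title> (illustrative parser only)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


parser = TitleParser()
# The parser only ever sees text, never raw bytes.
parser.feed(to_text(b"<title>caf\xe9</title>"))
```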
Signed-off-by: James Vega <jamessan@users.sourceforge.net>
(cherry picked from commit 44eb449ba4)
Signed-off-by: Daniel Folkinshteyn <nanotube@users.sourceforge.net>
Signed-off-by: James Vega <jamessan@users.sourceforge.net>
(cherry picked from commit 4661acb3a3)
Signed-off-by: Daniel Folkinshteyn <nanotube@users.sourceforge.net>
When searching for 'st*ke', 'stryker' would incorrectly match; 'stryke' would
then be added to the nick set, and the subsequent lookup would cause a
KeyError. This is fixed both by anchoring the regexp ('^st.*ke$' instead of
'st.*ke') and by adding searchNick to the nick set instead of the string that
matched the pattern.
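
The effect of anchoring can be sketched as below. The function name
wildcard_to_regexp is hypothetical and the wildcard translation is a
simplified assumption (only '*' is handled); it is not the plugin's actual
code.

```python
import re


def wildcard_to_regexp(pattern):
    """Translate a '*' wildcard pattern into an anchored regexp."""
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + ".*".join(parts) + "$")


# Unanchored: 'st.*ke' is found inside 'stryker', and the matched text
# is 'stryke', which is not a nick anyone actually has.
unanchored = re.search("st.*ke", "stryker")

# Anchored: the whole nick must match, so 'stryker' is rejected.
anchored = wildcard_to_regexp("st*ke")
```

With the anchored form, anchored.match("stryke") succeeds while
anchored.match("stryker") returns None.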
Closes: Sf#3377381
Signed-off-by: James Vega <jamessan@users.sourceforge.net>
(cherry picked from commit 0cd4939678)
Signed-off-by: Daniel Folkinshteyn <nanotube@users.sourceforge.net>
Signed-off-by: James Vega <jamessan@users.sourceforge.net>
(cherry picked from commit b0e595fbd2)
Signed-off-by: Daniel Folkinshteyn <nanotube@users.sourceforge.net>
Signed-off-by: James Vega <jamessan@users.sourceforge.net>
(cherry picked from commit d56381436c)
Signed-off-by: Daniel Folkinshteyn <nanotube@users.sourceforge.net>