Mirror of https://gitea.blesmrt.net/mikaela/gist.git, synced 2024-11-01 16:09:23 +01:00
A bit opinionated titlefetching
Preparation
load Web
config plugins.web.snarfMultipleUrls True
config plugins.web.snarferShowDomain False
config plugins.web.snarferShowTargetDomain False
config supybot.protocols.http.userAgents "Limnoria UrlPreviewBot"
config supybot.protocols.http.peekSize 1048576
- enables the plugin (shipped with Limnoria)
- enables titlefetching for all links on a line, not just the first one
- disables showing the domain (a small protection against multiple title fetchers entering a loop, and a way to avoid annoying users whose clients already show link previews, Matrix/Telegram bridges and relays included)
- disables showing the redirect target domain, for the same reasons
- sets the user-agent to “Limnoria UrlPreviewBot” instead of the ‘Mozilla/5.0 (compatible; utils.web python module)’ from 2005
  - I have heard that it’s bad to pretend to be something you aren’t, and Twitter will only give you HTML <title>s if your user-agent contains “UrlPreviewBot”
- searches for HTML titles in the first MEGABYTE (1048576 bytes) of the webpage, as the modern web is horrible (looking at you, hs.fi & youtube.com)
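To make the preparation settings concrete, here is a small Python sketch of a title fetcher that mimics them: it picks every URL on a line, identifies itself with the custom user-agent and only looks for <title> in the first 1048576 bytes. This is not Limnoria's code; the simplified URL pattern, the 10-second timeout and all function names are my own assumptions for illustration.

```python
import re
import urllib.request
from html.parser import HTMLParser
from typing import Optional

# Values mirroring the config commands above (assumed equivalents, not Limnoria code).
USER_AGENT = "Limnoria UrlPreviewBot"  # supybot.protocols.http.userAgents
PEEK_SIZE = 1048576                    # supybot.protocols.http.peekSize, i.e. 1 MiB
SNARF_MULTIPLE_URLS = True             # plugins.web.snarfMultipleUrls

# Deliberately simple URL pattern; Limnoria has its own, more careful one.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def urls_to_fetch(line: str, multiple: bool = SNARF_MULTIPLE_URLS) -> list:
    """Every URL on the line, or only the first one when multiple is False."""
    urls = URL_RE.findall(line)
    return urls if multiple else urls[:1]


def build_request(url: str) -> urllib.request.Request:
    """A GET request (not yet sent) that carries the custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_from_prefix(page: bytes, peek_size: int = PEEK_SIZE) -> Optional[str]:
    """Look for <title> only in the first peek_size bytes of the page."""
    parser = _TitleParser()
    parser.feed(page[:peek_size].decode("utf-8", errors="replace"))
    parser.close()
    if not parser.title:
        return None
    return " ".join(parser.title.split())  # collapse whitespace and newlines


def fetch_title(url: str) -> Optional[str]:
    """Network call: send the request and read at most PEEK_SIZE bytes."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return title_from_prefix(resp.read(PEEK_SIZE))
```

Only fetch_title touches the network, so the rest can be tried offline, e.g. a page with 2 MiB of inline script before its <title> yields None with the default peek size, which is exactly the kind of page the bigger peekSize is meant to cope with.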
Actually enabling it
config channel #CHAN plugins.web.titleSnarfer True
- enables titlefetching per channel, on #CHAN to be exact (keeping it out of unwanted channels in case of a bot loop)

"channel #CHAN" could also be replaced with "network NETWORKNAME" for every channel on that network, or omitted entirely (plain "config plugins.web.titleSnarfer True") for everywhere (channel takes priority over network, which probably takes priority over global)
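The lookup order described above can be modelled as "the most specific value that is actually set wins". A minimal Python sketch of that model; effective_value is a made-up helper, not how Limnoria's registry works internally, and the network-over-global step is the part I am only "probably" sure of.

```python
from typing import Optional


def effective_value(global_value: bool,
                    network_value: Optional[bool] = None,
                    channel_value: Optional[bool] = None) -> bool:
    """Return the most specific value that has been set.

    None stands for "not set at this level", so the lookup falls through
    from channel to network to the global default.
    """
    if channel_value is not None:
        return channel_value
    if network_value is not None:
        return network_value
    return global_value


# Global default off, enabled only for #CHAN:
print(effective_value(False, channel_value=True))  # True
# Enabled network-wide but switched off in one channel:
print(effective_value(False, network_value=True, channel_value=False))  # False
```

This is why the per-channel setting is the safe choice against bot loops: nothing outside #CHAN changes unless a network or global value is set as well.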
Excluding domains from titlefetching
config supybot.plugins.Web.nonSnarfingRegexp m/(t.me|matrix.to|facebook.com)/
- regexp to block t.me, matrix.to & facebook.com, which are the first examples of useless titles I have encountered recently. I just stole the regexp from canonical Limnoria.
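To check what the regexp above actually catches, it can be tried with Python's re module (Limnoria is written in Python and, as far as I know, compiles the pattern inside the m/.../ wrapper with it). The sketch assumes the plugin searches for the pattern anywhere in the URL. Because the dots are unescaped and nothing ties the pattern to the hostname, it also matches some unrelated URLs; STRICT is a hypothetical tighter variant of my own, not something from Limnoria.

```python
import re

# Pattern from the config above, without Limnoria's m/.../ wrapper.
ORIGINAL = re.compile(r"(t.me|matrix.to|facebook.com)")
# Hypothetical stricter variant: dots escaped, match tied to the host part of the URL.
STRICT = re.compile(r"^https?://(?:[^/]*\.)?(?:t\.me|matrix\.to|facebook\.com)(?:[:/?#]|$)")


def is_excluded(url: str, pattern: re.Pattern) -> bool:
    # search() looks for the pattern anywhere in the URL, not only at the start
    return pattern.search(url) is not None


urls = [
    "https://t.me/somechannel",
    "https://matrix.to/#/#room:example.org",
    "https://www.facebook.com/somepage",
    "https://start.me/",               # "star" + "t.me"
    "https://example.org/chat-me",     # unescaped "." also matches "-"
    "https://example.org/?next=t.me",  # domain only mentioned in the query
    "https://example.org/",
]
for url in urls:
    print(url, is_excluded(url, ORIGINAL), is_excluded(url, STRICT))
```

The original pattern blocks the three intended domains but also the start.me, chat-me and query-string URLs, while the stricter one only blocks the first three. Putting it back into m/.../ form may require escaping the slashes in https://, so check how Limnoria parses the delimiters before relying on it.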