Mirror of https://gitea.blesmrt.net/mikaela/gist.git, synced 2025-10-31 09:27:20 +01:00

A bit opinionated titlefetching
Preparation
load Web
config plugins.web.snarfMultipleUrls True
config plugins.web.snarferShowDomain False
config plugins.web.snarferShowTargetDomain False
config supybot.protocols.http.userAgents "Limnoria UrlPreviewBot"
config supybot.protocols.http.peekSize 1048576

One point per command, in order:
- enables the Web plugin (shipped with Limnoria)
- enables titlefetching for all links on a line, not just the first one
- disables showing the domain (a small protection against multiple titlefetchers entering a loop, or simply a way to avoid annoying users who already get client-side link previews, Matrix/Telegram bridges and relays included)
- disables showing the redirect target (see the previous point)
- sets the user agent to “Limnoria UrlPreviewBot” instead of “Mozilla/5.0 (compatible; utils.web python module)” from 2005
  - I have heard that it’s bad to pretend to be something you aren’t, and Twitter will only give you HTML <title>s if your user agent contains UrlPreviewBot (thanks to Tulir’s Synapse patch)
- searches for HTML titles in the first megabyte (1048576 bytes) of the web page, as the modern web is horrible (looking at you, HS & YouTube)
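To make the user agent and peekSize points concrete, here is a rough Python sketch of what a titlefetcher with these settings does: it sends its own User-Agent, reads at most peekSize bytes and looks for the first <title> in them. This is not Limnoria’s actual code; USER_AGENT, PEEK_SIZE, build_request, extract_title and fetch_title are hypothetical names, and decoding the page as UTF-8 is an assumption (real pages may declare other charsets).

```python
import urllib.request
from html.parser import HTMLParser
from typing import Optional

# Illustrative constants mirroring the settings above (not read from Limnoria).
USER_AGENT = "Limnoria UrlPreviewBot"
PEEK_SIZE = 1048576  # bytes, i.e. 1 MiB


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only start collecting at the first <title>; ignore later ones.
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(data: bytes, peek_size: int = PEEK_SIZE) -> Optional[str]:
    """Look for a <title> only within the first peek_size bytes."""
    parser = TitleParser()
    # Assumes UTF-8; undecodable bytes are replaced instead of raising.
    parser.feed(data[:peek_size].decode("utf-8", errors="replace"))
    return parser.title.strip() if parser.title is not None else None


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies the bot honestly."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_title(url: str, peek_size: int = PEEK_SIZE) -> Optional[str]:
    """Download at most peek_size bytes and return the page title, if any."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return extract_title(resp.read(peek_size), peek_size)
```

A page whose <title> only appears after the first peek_size bytes yields None here, which is exactly the failure a small peekSize causes on heavy pages.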
Actually enabling it
config channel #CHAN plugins.web.titleSnarfer True

- enables titlefetching per channel, on #CHAN to be exact (avoiding unwanted channels in case of a bot loop)
- "channel #CHAN" could also be replaced with "network NETWORKNAME" for every channel on a network, or dropped entirely (leaving plain "config") for everywhere (channel takes priority over network, which probably takes priority over global)
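The priority order can be pictured as a lookup that falls back from the narrowest scope to the widest. This is a minimal sketch assuming the order described above (which is itself only "probably" right for network versus global); the dictionaries, the network names and effective_value are hypothetical stand-ins, not Limnoria’s registry.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-ins for the three scopes; a real bot keeps these in its registry.
GLOBAL: Dict[str, bool] = {"plugins.web.titleSnarfer": False}
NETWORK: Dict[Tuple[str, str], bool] = {("examplenet", "plugins.web.titleSnarfer"): True}
CHANNEL: Dict[Tuple[str, str, str], bool] = {
    ("examplenet", "#quiet", "plugins.web.titleSnarfer"): False,
}


def effective_value(name: str, network: str, channel: Optional[str] = None) -> bool:
    """Channel value wins, then the network value, then the global default."""
    if channel is not None and (network, channel, name) in CHANNEL:
        return CHANNEL[(network, channel, name)]
    if (network, name) in NETWORK:
        return NETWORK[(network, name)]
    return GLOBAL[name]
```

With these example values the bot would fetch titles everywhere on examplenet except #quiet, and nowhere on other networks.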
 
Excluding domains from titlefetching
config supybot.plugins.Web.nonSnarfingRegexp m/(t.me|matrix.to|facebook.com|instagram.com|imgur.com)/

- a regexp to block the listed domains, which are the first useless examples I have encountered recently. I just stole the regexp from canonical Limnoria.
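One thing worth knowing about the pattern above: an unescaped "." in a regexp matches any character, so "t.me" also matches "time", for example in nytimes.com. The sketch below, with the m/…/ wrapper stripped and assuming the pattern is searched anywhere in the URL, compares the pattern as written with a variant whose dots are escaped. UNESCAPED, ESCAPED and is_excluded are illustrative names only.

```python
import re

# The pattern from the config, without the m/.../ wrapper.
UNESCAPED = re.compile(r"(t.me|matrix.to|facebook.com|instagram.com|imgur.com)")
# The same domains with escaped dots, so "." only matches a literal dot.
ESCAPED = re.compile(r"(t\.me|matrix\.to|facebook\.com|instagram\.com|imgur\.com)")


def is_excluded(url: str, pattern: re.Pattern) -> bool:
    """True if the pattern is found anywhere in the URL."""
    return pattern.search(url) is not None
```

Both variants still match substrings rather than whole hostnames, so anchoring on the host part would be the next refinement if false positives show up.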
Titlesnarfing ignored users
While I personally don’t like to do this, it’s possible with:
config channel #CHAN plugins.web.checkignored False

I may have the bot on multiple sides of a relay, or the user may be ignored due to abuse, so this may result in spam.
Bonus: Fediverse
If the Fediverse plugin is configured with secure fetch, fetching Fediverse profiles/statuses/usernames can be enabled by:
config channel #CHAN plugins.Fediverse.snarfers.profile true
config channel #CHAN plugins.Fediverse.snarfers.status true
config channel #CHAN plugins.Fediverse.snarfers.username true
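The Fediverse plugin’s own matching and lookup code isn’t shown here. As a rough illustration of what a username snarfer has to do, this sketch finds @user@domain handles in a chat line and builds the WebFinger (RFC 7033) URL commonly used to resolve such a handle to a profile. The handle regexp, find_handles and webfinger_url are hypothetical, whether the plugin works exactly this way is an assumption, and signing requests for secure fetch is left out.

```python
import re
from typing import List, Tuple
from urllib.parse import urlencode

# Hypothetical, simplified handle shape: @user@domain.tld
HANDLE = re.compile(r"@([A-Za-z0-9_]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def find_handles(line: str) -> List[Tuple[str, str]]:
    """Return (user, domain) pairs for every @user@domain handle in a chat line."""
    return HANDLE.findall(line)


def webfinger_url(user: str, domain: str) -> str:
    """WebFinger lookup URL for acct:user@domain, served by that domain."""
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"
```

Plain e-mail addresses such as bob@example.com are not picked up, since the pattern requires the leading "@".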