# A bit opinionated titlefetching
## Preparation
```
load Web
config plugins.web.snarfMultipleUrls True
config plugins.web.snarferShowDomain False
config plugins.web.snarferShowTargetDomain False
config supybot.protocols.http.userAgents "Limnoria UrlPreviewBot"
config supybot.protocols.http.peekSize 1048576
```
* enables the plugin (shipped with Limnoria)
* enables titlefetching for all links on a line, not just the first one
* disables showing the domain (a small protection against multiple
titlefetchers entering a loop, and it avoids annoying users with
client-side link previews, Matrix/Telegram bridges/relays included)
* disables showing the redirect target, for the same reasons
* sets the user-agent to "Limnoria UrlPreviewBot" instead of ['Mozilla/5.0 (compatible; utils.web python module)' from 2005](https://github.com/ProgVal/Limnoria/blame/2990fcd302afdc6a3b741594017c3959fd5da2fd/src/utils/web.py#L120)
* I have heard that it's bad to pretend to be something you aren't, and
Twitter will only give you HTML `<title>`s if your user-agent contains
`UrlPreviewBot` ([thanks to Tulir's Synapse patch](https://mau.dev/maunium/synapse/-/commit/55d926999cffee893cb4951890a33985beaf70ba))
* searches for HTML titles in the first MEGABYTE (1048576 bytes) of the
webpage, as the modern web is horrible (looking at you, hs.fi & youtube.com)
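
Why the peek size matters can be sketched in Python. This is only an illustration of the idea (look for `<title>` in the first N bytes of the response and give up otherwise), not Limnoria's actual code; `TitleParser` and `title_from_peek` are made-up names.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first complete <title> element it sees."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def title_from_peek(body: bytes, peek_size: int):
    """Return the title found in the first peek_size bytes, or None."""
    parser = TitleParser()
    # Only the peeked prefix is parsed; anything after it is never seen.
    parser.feed(body[:peek_size].decode("utf-8", errors="replace"))
    if not parser.done:
        return None
    # Collapse runs of whitespace so the title fits on one IRC line.
    return " ".join("".join(parser.parts).split())


# Hypothetical page whose <title> sits behind a large inline script.
page = (b"<html><head><script>" + b"x" * 2000 +
        b"</script><title>Hello   World</title></head></html>")
```

With this made-up page, a peek of 1024 bytes ends inside the script and finds no title, while the 1048576 used above reaches it.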
## Actually enabling it
```
config channel #CHAN plugins.web.titleSnarfer True
```
* enables titlefetching per channel, on #CHAN to be exact
(this keeps the bot out of unwanted channels in case of a bot loop)
* `channel #CHAN` could also be replaced with `network NETWORKNAME` for
every channel on a network, or omitted entirely (plain `config ...`) for
everywhere (channel takes priority over network, which *probably* takes
priority over global)
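
The lookup order described above can be sketched as a small Python function. It only mirrors the order as stated here (channel, then network, then global), including the "probably"; it is not taken from Limnoria's registry code, and `effective_value` and its arguments are hypothetical.

```python
def effective_value(global_value, network_values, channel_values,
                    network, channel):
    """Pick the most specific value that has been set.

    network_values and channel_values are plain dicts holding only the
    networks/channels where the setting was explicitly configured.
    """
    if channel in channel_values:
        return channel_values[channel]
    if network in network_values:
        return network_values[network]
    return global_value


# titleSnarfer is off globally, on for one network, off again in one channel.
networks = {"NETWORKNAME": True}
channels = {"#quiet": False}
```

Under that assumed order, `#CHAN` on `NETWORKNAME` gets titles, `#quiet` does not, and channels on any other network fall back to the global value.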
## Excluding domains from titlefetching
```
config supybot.plugins.Web.nonSnarfingRegexp m/(t.me|matrix.to|facebook.com|instagram.com|imgur.com)/
```
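* regexp to block the listed domains, which are the first useless
examples I have encountered recently. I just stole the regexp from
[canonical Limnoria](https://github.com/ProgVal/Limnoria/wiki/Canonical-%23limnoria-doc).
Note that the unescaped `.` matches any character, so the pattern is
slightly broader than the literal domain names.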
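
What the pattern matches can be checked with Python's `re` module. This sketch assumes the regexp is applied as a search anywhere in the URL; `is_excluded` is a made-up helper, and the escaped variant is only shown for comparison (whether the backslashes survive being typed as a bot command has not been verified here).

```python
import re

# The pattern as written in the config line above (without the m/.../ wrapper).
as_written = re.compile(
    r"(t.me|matrix.to|facebook.com|instagram.com|imgur.com)")

# Same domains with the dots escaped, for comparison only.
escaped = re.compile(
    r"(t\.me|matrix\.to|facebook\.com|instagram\.com|imgur\.com)")


def is_excluded(pattern, url):
    """True if the pattern matches anywhere in the URL."""
    return pattern.search(url) is not None


print(is_excluded(as_written, "https://t.me/somechannel"))
print(is_excluded(as_written, "https://sometimes.example/"))
print(is_excluded(escaped, "https://sometimes.example/"))
```

The second URL is excluded by the pattern as written because `t.me` also matches the "time" in "sometimes"; the escaped variant leaves it alone.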
## Titlesnarfing ignored users
While I personally don't like to do this, it's possible with:
```
config channel #CHAN plugins.web.checkignored False
```
I may have the bot on multiple sides of a relay, or the user may be ignored
due to abuse, so this may result in spam.
## Bonus: Fediverse
If [the Fediverse plugin is configured with secure fetch](https://github.com/progval/Limnoria/tree/master/plugins/Fediverse),
fetching Fediverse profiles/statuses/usernames can be enabled with:
```
config channel #CHAN plugins.Fediverse.snarfers.profile true
config channel #CHAN plugins.Fediverse.snarfers.status true
config channel #CHAN plugins.Fediverse.snarfers.username true
```