<!-- @format -->

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [A bit opinionated titlefetching](#a-bit-opinionated-titlefetching)
  - [Preparation](#preparation)
  - [Actually enabling it](#actually-enabling-it)
  - [Excluding domains from titlefetching](#excluding-domains-from-titlefetching)
  - [Titlesnarfing ignored users](#titlesnarfing-ignored-users)
  - [Bonus: Fediverse](#bonus-fediverse)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

# A bit opinionated titlefetching

## Preparation

```
load Web
config plugins.web.snarfMultipleUrls True
config plugins.web.snarferShowDomain False
config plugins.web.snarferShowTargetDomain False
config supybot.protocols.http.userAgents "Limnoria UrlPreviewBot"
config supybot.protocols.http.peekSize 1048576
```

- enables the plugin (shipped with Limnoria)
- enables titlefetching for all links on a line, not just the first one
- disables showing the domain (a small protection against multiple
  titlefetchers entering a loop, and less annoying for users with client-side
  link previews (Matrix/Telegram bridges/relays included))
- disables showing the redirect target (see the previous point)
- sets the user-agent to "Limnoria UrlPreviewBot" instead of
  ['Mozilla/5.0 (compatible; utils.web python module)' from 2005](https://github.com/ProgVal/Limnoria/blame/2990fcd302afdc6a3b741594017c3959fd5da2fd/src/utils/web.py#L120)
  - I have heard that it's bad to pretend to be something you aren't, and
    Twitter will only give you HTML `<title>`s if your user-agent contains
    `UrlPreviewBot`,
    [thanks Tulir's Synapse patch](https://mau.dev/maunium/synapse/-/commit/55d926999cffee893cb4951890a33985beaf70ba)
- searches for HTML titles in the first MEGABYTE of the web page, as the modern
  web is horrible (looking at you, [HS](https://hs.fi) &
  [YouTube](https://youtube.com))

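
To illustrate why the peek size and the user-agent matter, here is a minimal Python sketch. It is not Limnoria's own implementation: the names `TitleParser`, `peek_title` and `build_request` are hypothetical, and the sample page is made up. It only shows that a `<title>` placed after a large inline script is missed when too few bytes are inspected, and how a custom user-agent header can be attached to a request (no request is actually sent).

```python
from html.parser import HTMLParser
from urllib.request import Request

PEEK_SIZE = 1048576  # same number as supybot.protocols.http.peekSize above
USER_AGENT = "Limnoria UrlPreviewBot"  # same string as the userAgents setting


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element it sees."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def peek_title(body, peek_size=PEEK_SIZE):
    """Return the <title> found in the first peek_size bytes, or None."""
    parser = TitleParser()
    parser.feed(body[:peek_size].decode("utf-8", errors="replace"))
    title = " ".join("".join(parser.parts).split())
    return title or None


def build_request(url):
    """Build (but do not send) a request carrying the custom user-agent."""
    return Request(url, headers={"User-Agent": USER_AGENT})


# Hypothetical page: the <title> only appears after ~20 kB of inline script.
page = (
    b"<html><head><script>" + b"x" * 20000 + b"</script>"
    b"<title>Hello</title></head></html>"
)

print(peek_title(page, peek_size=4096))  # title lies beyond the peeked bytes
print(peek_title(page))                  # title is within the first megabyte
```

With the small peek size the title is not found at all, which is the failure the larger `peekSize` is meant to avoid on script-heavy pages.
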
## Actually enabling it

```
config channel #CHAN plugins.web.titleSnarfer True
```

- enables titlefetching per channel, on #CHAN to be exact (keeping unwanted
  channels out of it in case of a bot loop)
- `channel #CHAN` could also be replaced with `network NETWORKNAME` for every
  channel on the network, or left out entirely (plain `config`) for everywhere
  (channel takes priority over network, which _probably_ takes priority over
  global)

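
The precedence described above can be pictured as a "most specific wins" lookup. The following Python sketch is only an illustration of that idea, under the assumption that the order really is channel, then network, then global (which the text above only calls probable). It is not Limnoria's registry code, and `resolve`, `overrides` and the network and channel names are hypothetical.

```python
def resolve(overrides, network, channel, default):
    """Return the most specific configured value: channel, network, global."""
    for key in ((network, channel), (network, None), (None, None)):
        if key in overrides:
            return overrides[key]
    return default


# Hypothetical values of one setting, keyed by (network, channel).
title_snarfer = {
    (None, None): False,          # global
    ("ExampleNet", None): True,   # whole network
    ("ExampleNet", "#quiet"): False,  # one channel opts out again
}

print(resolve(title_snarfer, "ExampleNet", "#chan", False))
print(resolve(title_snarfer, "ExampleNet", "#quiet", False))
print(resolve(title_snarfer, "OtherNet", "#chan", False))
```

Setting the value only on the channels that want it keeps the global default off, which is the cautious choice the first bullet argues for.
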
## Excluding domains from titlefetching

```
config supybot.plugins.Web.nonSnarfingRegexp m/(t.me|matrix.to|facebook.com|instagram.com|imgur.com)/
```

- a regexp blocking the listed domains, which are the first useless examples I
  have encountered recently. I just stole the regexp from
  [canonical Limnoria](https://github.com/ProgVal/Limnoria/wiki/Canonical-%23limnoria-doc).

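
One detail of the pattern above is worth seeing in action: an unescaped `.` matches any character, so `t.me` also matches the `tome` in `customers`. The Python sketch below assumes the pattern is applied to the whole URL as a substring search. The `STRICTER` variant with escaped dots and the example URLs are hypothetical, and whether the backslashes survive the bot's command parsing is untested here.

```python
import re

# Pattern body copied from the nonSnarfingRegexp value above (without m/.../).
AS_WRITTEN = re.compile(r"(t.me|matrix.to|facebook.com|instagram.com|imgur.com)")

# Hypothetical variant with escaped dots, so "." only matches a literal dot.
STRICTER = re.compile(r"(t\.me|matrix\.to|facebook\.com|instagram\.com|imgur\.com)")

urls = [
    "https://t.me/somechannel",       # meant to be excluded
    "https://example.org/customers",  # contains "tome", which t.me matches
    "https://example.org/docs",       # unrelated
]

for url in urls:
    print(url, bool(AS_WRITTEN.search(url)), bool(STRICTER.search(url)))
```

Both variants are still substring matches, so a host that merely ends in one of the listed names is excluded as well.
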
## Titlesnarfing ignored users

While I personally don't like to do this, it's possible with:

```
config channel #CHAN plugins.web.checkignored False
```

I may have the bot on multiple sides of a relay, or the user may be ignored due
to abuse, so this may result in spam.

## Bonus: Fediverse

If
[the Fediverse plugin is configured with secure fetch](https://github.com/progval/Limnoria/tree/master/plugins/Fediverse),
fetching Fediverse profiles/statuses/usernames can be enabled by:

```
config channel #CHAN plugins.Fediverse.snarfers.profile true
config channel #CHAN plugins.Fediverse.snarfers.status true
config channel #CHAN plugins.Fediverse.snarfers.username true
```