gist/irc/limnoria/titlefetching.md

<!-- @format -->

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [A bit opinionated titlefetching](#a-bit-opinionated-titlefetching)
  - [Preparation](#preparation)
  - [Actually enabling it](#actually-enabling-it)
  - [Excluding domains from titlefetching](#excluding-domains-from-titlefetching)
  - [Titlesnarfing ignored users](#titlesnarfing-ignored-users)
  - [Bonus: Fediverse](#bonus-fediverse)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

# A bit opinionated titlefetching

## Preparation

```
load Web
config plugins.web.snarfMultipleUrls True
config plugins.web.snarferShowDomain False
config plugins.web.snarferShowTargetDomain False
config supybot.protocols.http.userAgents "Limnoria UrlPreviewBot"
config supybot.protocols.http.peekSize 1048576
```

- enables the plugin (shipped with Limnoria)
- enables titlefetching for all links on line, not just the first one
- disables showing domain (small protection against multiple titlesfetcherrs
  entering a loop or simply not annoying users with clientside link previews
  (Matrix/Telegram bridges/relays included))
- disables showing redirect target (see previous point)
- sets user-agent to "Limnoria UrlPreviewBot" instead of
  ['Mozilla/5.0 (compatible; utils.web python module)' from 2005](https://github.com/ProgVal/Limnoria/blame/2990fcd302afdc6a3b741594017c3959fd5da2fd/src/utils/web.py#L120)
  - I have heard that it's bad to pretend to be something you aren't and
    Twitter will only give you HTML `<title>`s if your user-agent contains
    `UrlPreviewBot`,
    [thanks Tulir's Synapse patch](https://mau.dev/maunium/synapse/-/commit/55d926999cffee893cb4951890a33985beaf70ba)
- search for HTML titles from the first MEGABYTE of the webpage as modern web
  is horrible (looking at you [HS](https://hs.fi) &
  [YouTube](https://youtube.com))

## Actually enabling it

```
config channel #CHAN plugins.web.titleSnarfer True
```

- enables titlefetching per-channel, on #CHAN to be accurate (avoiding
  unwanted channels in case of botloop)
  - `"channel #CHAN"` could also be replaced with `network NETWORKNAME` for
    every channel on network or `config` (or omitted entirely) for everywhere
    (channel takes priority over network which _probably_ takes priority over
    global)

## Excluding domains from titlefetching

```
config supybot.plugins.Web.nonSnarfingRegexp m/(t.me|matrix.to|facebook.com|instagram.com|imgur.com)/
```

- regexp to block the listed domains, which are the first useless examples I
  have encountered recently. I just stole the regexp from
  [canonical Limnoria](https://github.com/ProgVal/Limnoria/wiki/Canonical-%23limnoria-doc)

## Titlesnarfing ignored users

While I personally don't like to do this, it's possible by

```
config channel #CHAN plugins.web.checkignored False
```

I may have the bot on multiple sides of relay or the user may be ignored due
to abuse so this may result into spam.

## Bonus: Fediverse

If
[the Fediverse plugin is configured with secure fetch](https://github.com/progval/Limnoria/tree/master/plugins/Fediverse),
fetching Fediverse profiles/statuses/usernames can be enabled by:

```
channel #CHAN plugins.Fediverse.snarfers.profile true
channel #CHAN plugins.Fediverse.snarfers.status true
channel #CHAN plugins.Fediverse.snarfers.username true
```
run prettier 2024-06-19 07:53:27 +02:00			`<!-- @format -->`

add doctoc 2023-03-16 18:53:28 +01:00			`<!-- START doctoc generated TOC please keep comment here to allow auto update -->`
			`<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->`

			`- [A bit opinionated titlefetching](#a-bit-opinionated-titlefetching)`
			`- [Preparation](#preparation)`
			`- [Actually enabling it](#actually-enabling-it)`
			`- [Excluding domains from titlefetching](#excluding-domains-from-titlefetching)`
			`- [Titlesnarfing ignored users](#titlesnarfing-ignored-users)`
			`- [Bonus: Fediverse](#bonus-fediverse)`

			`<!-- END doctoc generated TOC please keep comment here to allow auto update -->`

irc/limnoria: writeup titlefetching.md 2021-06-11 16:56:19 +02:00			`# A bit opinionated titlefetching`

titlefetching: separate enabling, fleshen it & cut lines 2021-06-11 17:57:49 +02:00			`## Preparation`

irc/limnoria: writeup titlefetching.md 2021-06-11 16:56:19 +02:00			```
			`load Web`
			`config plugins.web.snarfMultipleUrls True`
			`config plugins.web.snarferShowDomain False`
			`config plugins.web.snarferShowTargetDomain False`
			`config supybot.protocols.http.userAgents "Limnoria UrlPreviewBot"`
irc/limnoria/titlefetching.md: add peeksize 2021-06-11 17:51:21 +02:00			`config supybot.protocols.http.peekSize 1048576`
irc/limnoria: writeup titlefetching.md 2021-06-11 16:56:19 +02:00			```

pre-commit applying mess round one 2023-02-26 18:01:13 +01:00			`- enables the plugin (shipped with Limnoria)`
			`- enables titlefetching for all links on line, not just the first one`
			`- disables showing domain (small protection against multiple titlesfetcherrs`
titlefetching.md: note Matrix & Telegram previews as avoidable 2021-06-11 19:29:54 +02:00			`entering a loop or simply not annoying users with clientside link previews`
			`(Matrix/Telegram bridges/relays included))`
pre-commit applying mess round one 2023-02-26 18:01:13 +01:00			`- disables showing redirect target (see previous point)`
run prettier 2024-06-19 07:53:27 +02:00			`- sets user-agent to "Limnoria UrlPreviewBot" instead of`
			`['Mozilla/5.0 (compatible; utils.web python module)' from 2005](https://github.com/ProgVal/Limnoria/blame/2990fcd302afdc6a3b741594017c3959fd5da2fd/src/utils/web.py#L120)`
pre-commit applying mess round one 2023-02-26 18:01:13 +01:00			`- I have heard that it's bad to pretend to be something you aren't and`
irc/limnoria/titlefetching.md: fix/improve typos… I hope that specifying the www subdomain will help Gitea recognise the links 2022-03-23 19:22:59 +01:00			Twitter will only give you HTML `<title>`s if your user-agent contains
run prettier 2024-06-19 07:53:27 +02:00			`UrlPreviewBot`,
			`[thanks Tulir's Synapse patch](https://mau.dev/maunium/synapse/-/commit/55d926999cffee893cb4951890a33985beaf70ba)`
			`- search for HTML titles from the first MEGABYTE of the webpage as modern web`
			`is horrible (looking at you [HS](https://hs.fi) &`
			`[YouTube](https://youtube.com))`
titlefetching: separate enabling, fleshen it & cut lines 2021-06-11 17:57:49 +02:00
			`## Actually enabling it`

			```
			`config channel #CHAN plugins.web.titleSnarfer True`
			```

run prettier 2024-06-19 07:53:27 +02:00			`- enables titlefetching per-channel, on #CHAN to be accurate (avoiding`
			`unwanted channels in case of botloop)`
pre-commit applying mess round one 2023-02-26 18:01:13 +01:00			- `"channel #CHAN"` could also be replaced with `network NETWORKNAME` for
run prettier 2024-06-19 07:53:27 +02:00			every channel on network or `config` (or omitted entirely) for everywhere
			`(channel takes priority over network which _probably_ takes priority over`
			`global)`
irc/limnoria/titlefetching.md: add excluding domains 2021-06-12 23:24:07 +02:00
			`## Excluding domains from titlefetching`

			```
limnoria/titlefetching: exclude imgur 2021-06-22 17:46:35 +02:00			`config supybot.plugins.Web.nonSnarfingRegexp m/(t.me\|matrix.to\|facebook.com\|instagram.com\|imgur.com)/`
irc/limnoria/titlefetching.md: add excluding domains 2021-06-12 23:24:07 +02:00			```

run prettier 2024-06-19 07:53:27 +02:00			`- regexp to block the listed domains, which are the first useless examples I`
			`have encountered recently. I just stole the regexp from`
irc/limnoria/titlefetching.md: add instagram to blocked 2021-06-17 13:10:04 +02:00			`[canonical Limnoria](https://github.com/ProgVal/Limnoria/wiki/Canonical-%23limnoria-doc)`
titlefetching.md: mention plugins.web.checkignored Thanks de-facto on #limnoria @ irc.libera.chat 2021-06-20 16:53:00 +02:00
			`## Titlesnarfing ignored users`

			`While I personally don't like to do this, it's possible by`

			```
			`config channel #CHAN plugins.web.checkignored False`
			```

			`I may have the bot on multiple sides of relay or the user may be ignored due`
			`to abuse so this may result into spam.`
irc/limnoria/titlefetching: mention Fediverse as a bonus 2022-03-12 08:33:12 +01:00
			`## Bonus: Fediverse`

run prettier 2024-06-19 07:53:27 +02:00			`If`
			`[the Fediverse plugin is configured with secure fetch](https://github.com/progval/Limnoria/tree/master/plugins/Fediverse),`
irc/limnoria/titlefetching: mention Fediverse as a bonus 2022-03-12 08:33:12 +01:00			`fetching Fediverse profiles/statuses/usernames can be enabled by:`

			```
			`channel #CHAN plugins.Fediverse.snarfers.profile true`
			`channel #CHAN plugins.Fediverse.snarfers.status true`
			`channel #CHAN plugins.Fediverse.snarfers.username true`
			```