titlefetching: separate enabling, fleshen it & cut lines

Aminda Suomalainen 2021-06-11 18:57:49 +03:00
parent a26a761849
commit eb33940d44
Signed by: Mikaela
GPG Key ID: 99392F62BAE30723

@ -1,21 +1,37 @@
 # A bit opinionated titlefetching
+## Preparation
 ```
 load Web
 config plugins.web.snarfMultipleUrls True
 config plugins.web.snarferShowDomain False
 config plugins.web.snarferShowTargetDomain False
-config channel #CHAN plugins.web.titleSnarfer True
 config supybot.protocols.http.userAgents "Limnoria UrlPreviewBot"
 config supybot.protocols.http.peekSize 1048576
 ```
 * enables the plugin (shipped with Limnoria)
 * enables titlefetching for all links on line, not just the first one
-* disables showing domain (small protection against multiple titlesnarfers entering loop)
+* disables showing domain (small protection against multiple titlesnarfers
+entering loop)
 * disables showing redirect target -||-
-* enables titlefetching per-channel (avoids unwanted channels in case of botloop)
-* could also be `config config` or `config network NETWORKNAME` to do globally or per-network, but risk of accidental botloop
 * sets user-agent to "Limnoria UrlPreviewBot" instead of ['Mozilla/5.0 (compatible; utils.web python module)' from 2005](https://github.com/ProgVal/Limnoria/blame/2990fcd302afdc6a3b741594017c3959fd5da2fd/src/utils/web.py#L120)
-* I have heard that it's bad to pretend to be something you aren't and Twitter will only give you HTML `<title>`s if your user-agent contains `UrlPreviewBot`
-* search for html titles from the first MEGABYTE of the webpage as modern web is horrible (looking at you hs.fi & youtube.com)
+* I have heard that it's bad to pretend to be something you aren't and
+Twitter will only give you HTML `<title>`s if your user-agent contains
+`UrlPreviewBot`
+* search for html titles from the first MEGABYTE of the webpage as modern
+web is horrible (looking at you hs.fi & youtube.com)
+## Actually enabling it
+```
+config channel #CHAN plugins.web.titleSnarfer True
+```
+* enables titlefetching per-channel, on #CHAN to be accurate
+(avoiding unwanted channels in case of botloop)
+* `"channel #CHAN"` could also be replaced with `network NETWORKNAME` for
+every channel on network or `config` (or omitted entirely) for
+everywhere (channel takes priority over network which *probably takes*
+priority over global)
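
For illustration only (not part of the commit), the broader scopes described in the last bullet might be typed as below. This is a sketch that assumes the same `config` command syntax as the per-channel line; `NETWORKNAME` is a placeholder, and the precedence between scopes remains as uncertain as the note above says.

```
config network NETWORKNAME plugins.web.titleSnarfer True
config config plugins.web.titleSnarfer True
config plugins.web.titleSnarfer True
```

The first line would enable titlefetching on every channel of one network. The second and third are the two spellings of the global form (`config` kept or omitted entirely). As the original notes warn, the wider the scope, the higher the risk of an accidental botloop, so the per-channel form is the cautious default.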