Commit Graph

27 Commits

Author SHA1 Message Date
Alvar Penning
5aed9591e1 Detect connection loss through IRC PONG
In its current state, alertmanager-irc-relay already sends an IRC PING
every minute. This allows checking the IRC connection's health at the
protocol level without having to deal with specific TCP settings.
However, even though we send those PINGs, we don't process the server's
PONGs or their absence.

On one of my alertmanager-irc-relay instances, roughly fifteen minutes
passed between the last received PONG and the failing TCP read. During
all this time the connection was already dead, but there was no attempt
to reestablish it.

The introduced changes keep track of the last received PONG and fail if
no new PONG arrived within twice the pingFrequencySecs time. When
establishing a new connection during the SetupPhase, the current time
is set as the last PONG's time to reset the time comparison.
2023-06-07 15:39:54 +02:00
Alvar Penning
d47139a6d6 Restore IRC Ident on Reconnect
After a connection loss on an IRC session with an ngIRCd server, the
alertmanager-irc-relay was unable to reconnect. After some debugging,
the error turned out to originate in the state tracking of the goirc
library. For an unidentified session, ngIRCd prefixes the user's ident
with a `~`. The state tracking registers this and keeps `~${NICK}` as
the current ident and as the ident for future reconnects. However, `~`
is not a valid character for the `<user>` part of the `USER` command, at
least not for ngIRCd.

To clarify this behaviour, take a look at the following log. First, the
initial connection is being established correctly. Keep an eye on the
`USER` command being sent to the server.

> http.go:132: INFO Starting HTTP server
> irc.go:308: INFO Connected to IRC server, waiting to establish session
> connection.go:543: DEBUG -> NICK alertbot
> connection.go:543: DEBUG -> USER alertbot 12 * :Alertmanager IRC Relay
> connection.go:474: DEBUG <- :__SERVER__ 001 alertbot :Welcome to the Internet Relay Network alertbot!~alertbot@__IP__

Now, there was a network incident and the session needs to be recreated.

> connection.go:466: ERROR irc.recv(): read tcp __REDACTED__: read: connection timed out
> connection.go:577: INFO irc.Close(): Disconnected from server.
> irc.go:150: INFO Disconnected from IRC
> reconciler.go:129: INFO Channel #alerts monitor: context canceled while monitoring
> irc.go:300: INFO Connecting to IRC __SERVER__
> backoff.go:111: INFO Backoff for 0s starts
> backoff.go:114: INFO Backoff for 0s ends
> connection.go:390: INFO irc.Connect(): Connecting to __SERVER__.
> irc.go:308: INFO Connected to IRC server, waiting to establish session
> connection.go:543: DEBUG -> NICK alertbot
> connection.go:543: DEBUG -> USER ~alertbot 12 * :Alertmanager IRC Relay
> connection.go:474: DEBUG <- ERROR :Invalid user name
> connection.go:577: INFO irc.Close(): Disconnected from server.
> irc.go:150: INFO Disconnected from IRC
> irc.go:319: WARN Receiving a session down before the session is up, this is odd

This time, the `user` part of the `USER` command carries the `~` prefix
and the command fails. However, without using `-debug` and taking a very
close look, this error can be missed very easily.

As the new ident is invalid, the alertmanager-irc-relay is now stuck in
an endless reconnection loop.

This fix is fairly straightforward: it just checks whether the ident has
changed before trying to reconnect and, if so, restores it. It might not
be the prettiest solution, but recreating the whole *irc.Config resulted
in other bugs as it was still referenced - even after being `Close`d.
2023-06-07 15:39:54 +02:00
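The check described in the commit above might look roughly like the following Go sketch. The `session` type and `restoreIdent` function are hypothetical stand-ins; they do not reproduce goirc's types or the actual patch.

```go
package main

import "fmt"

// session is a hypothetical stand-in for the state the IRC library
// tracks between connections.
type session struct {
	Ident string
}

// restoreIdent resets the tracked ident to the configured one if the
// server changed it (for example by prefixing "~"). It reports whether
// a change was necessary.
func restoreIdent(s *session, configured string) bool {
	if s.Ident == configured {
		return false
	}
	s.Ident = configured
	return true
}

func main() {
	// State after the first session: the server prefixed the ident.
	s := &session{Ident: "~alertbot"}

	// Before reconnecting, put the configured ident back.
	changed := restoreIdent(s, "alertbot")
	fmt.Println(changed, s.Ident) // true alertbot
}
```

Restoring the configured value, rather than stripping the `~`, avoids guessing how a particular server rewrites the ident.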
Erik Mackdanz
41b9ed2dde Add NickservName and ChanservName to fix tests
2022-10-18 22:56:45 -05:00
Luca Bigliardi
a63bfa3aad explicitly handle nickserv identify request
handle nickserv identify requests instead of blindly issuing a message
when connected

this helps if nickserv's state is wiped and we are being asked to
re-identify

introduce a `nickserv_identify_patterns` config option. these patterns
are used to guess identify requests of the various nickserv implementations

Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-04-16 18:17:08 +02:00
Luca Bigliardi
9748c8bfbf try to unban ourselves before joining a channel
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-04-16 13:53:06 +02:00
Luca Bigliardi
2eb5fb9aa5 add own logging and define debug flag
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-04-07 03:20:52 +02:00
Luca Bigliardi
dcfa3cccf0 server side cleanup should no longer be necessary w/ new goirc, removing
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-29 16:18:09 +02:00
Luca Bigliardi
559b817262 Add tests for channel join retry logic
Also adopt an interface for querying time information, so it can be
faked properly at test time

Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-29 16:06:36 +02:00
Luca Bigliardi
4d0f1f26b0 Graceful disconnect upon context cancel
Make sure the underlying library context cancellation happens only
after the session has been shut down.

Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-27 17:29:54 +01:00
Luca Bigliardi
882cecd6a6 simplify half connected test
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-27 17:27:35 +01:00
Luca Bigliardi
2990b5a309 fix ghost test
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-27 15:52:48 +01:00
Luca Bigliardi
0ec08d5ea1 stop storing context received from outside
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-27 12:35:38 +01:00
Luca Bigliardi
0b2fbef1f2 new channel management logic
this should handle bans and kicks a bit better

Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-27 00:49:16 +01:00
Luca Bigliardi
c22e7a0c84 stop using multiple channels in basic conn/auth tests
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-27 00:15:35 +01:00
Luca Bigliardi
4e0b0497f4 move channel-specific tests to new management object
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-26 23:52:07 +01:00
Luca Bigliardi
9690575d68 refactor: move test server in separate file
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-26 23:08:07 +01:00
Luca Bigliardi
cb65b4d28d Add factory-like interface to generate Delayers
Will be used to inject fake delayers in objects created during tests

Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-26 12:34:36 +01:00
Luca Bigliardi
bc13e4be9c Move Delayer/Backoff stub in its own file
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-03-25 23:28:13 +01:00
Luca Bigliardi
1eeb4dda9c Prevent race condition in TestConnectErrorRetry
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-02-24 20:45:52 +01:00
Luca Bigliardi
2471b866f1 Stop sending messages while disconnected
Make sure the session is up before consuming alert messages.
Also, split main run loop for readability.

Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-02-24 17:17:46 +01:00
Luca Bigliardi
82af7c1f69 Add Context support to Backoff
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-02-24 17:14:40 +01:00
Luca Bigliardi
bde6681de9 Use Context and WaitGroup for routines coordination
Signed-off-by: Luca Bigliardi <shammash@google.com>
2021-02-24 15:49:33 +01:00
Luca Bigliardi
826f088241 Handle IRC server password
Introduce optional config parameter 'irc_host_password' to specify the
IRC server password.

Signed-off-by: Luca Bigliardi <shammash@google.com>
2020-11-05 11:05:15 +01:00
Goutham Veeramachaneni
219d3672b7 Clean up so that lint would pass
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
2020-01-30 17:38:45 +01:00
Luca Bigliardi
ae6594c606 Add config option to deliver alerts with PRIVMSG
Add `use_privmsg` config option to deliver alerts with PRIVMSG instead
of the default NOTICE.

This addresses a use case described in
https://github.com/google/alertmanager-irc-relay/pull/1 .

Signed-off-by: Luca Bigliardi <shammash@google.com>
2020-01-25 18:03:13 +00:00
Luca Bigliardi
4e1aeaf931 s/notice/msg/
Use a more generic name as there is soon going to be support for PRIVMSG
(see https://github.com/google/alertmanager-irc-relay/pull/1 for
background).

This introduces a backward-incompatible change in the config file for
these two parameters:
- notice_template -> msg_template
- notice_once_per_alert_group -> msg_once_per_alert_group

I am not introducing the new parameters with a deprecation plan since
both parameters are relatively secondary to the core functioning of the
bot (and this is a free time project after all).

Signed-off-by: Luca Bigliardi <shammash@google.com>
2020-01-25 16:42:59 +00:00
Luca Bigliardi
60632b16e6 Initial code check-in
Signed-off-by: Luca Bigliardi <shammash@google.com>
2018-05-21 15:49:47 +01:00