squigz on freenode reported an issue where bots were responding to PING
on time, but were occasionally being timed out regardless. This was a race
condition: timeout was detected as idleTime >= it.quitTimeout, but if
the client responded promptly to its PING message and sent no further messages,
but the main loop subsequently slept for longer than expected (i.e., significantly
longer than quitTimeout), this condition would be met through no fault of the
client's.
The fix here is to explicitly track the last time the ping was sent, then test
!lastSeen.After(lastPinged) instead (making use of time.Time's monotonicity).
It is sufficient that the measurement of lastPinged happens-before the PING is sent.
This moves channel registration to an eventual consistency model,
where the in-memory datastructures (Channel and ChannelManager)
are the exclusive source of truth, and updates to them get persisted
asynchronously to the DB.
The rationale for why regenerateMembersCache didn't need to hold the Lock()
throughout was subtly wrong. It is true that at least some attempt to
regenerate the cache would see *all* the updates. However, it was possible for
the value of `result` generated by that attempt to lose the race for the final
assignment `channel.membersCache = result`.
The fix is to serialize the attempts to regenerate the cache, without adding
any additional locking on the underlying `Channel` fields via
`Channel.stateMutex`. This ensures that the final read from `Channel.members`
is paired with the final write to `Channel.membersCache`.
* move constant definitions
* always give the client at least quitTimeout to respond to ping,
even if registerTimeout or quitTimeout are longer than idleTimeout