Some drivers do not handle the colocated scan flag very well and this
results in BSS's not being seen in scans. This of course results in
very poor behavior.
This has been seen on ath11k specifically but after some
conversations [1] on the linux-wireless mailing list others have
reported issues with iwlwifi acting similarly. Since there are many
hardware variants that use both ath11k and iwlwifi this new quirk
isn't being forced to those drivers, but let users configure IWD to
disable the flag if needed.
[1] https://lore.kernel.org/linux-wireless/d1e75a08-047d-7947-d51a-2e486efead77@candelatech.com/
In kernel 6.8 a new CMD_ASSOCIATE failure path was added which checks
if the AP has a channel switch in progress. Eariler patches update
IWD into handling this case better, and this new test exercises that.
After CSA IE parsing was added to the kernel this opened up the
possibility that associations could be rejected locally based on
the contents of this CSA IE in the AP's beacons. Overall, it was
always possible for a local rejection but this case was never
considered by IWD. The CSA-based rejection is something that can
and does happen out in the wild.
When this association rejection happens it desync's IWD and the
kernel's state:
1. IWD begins an FT roam. Authenticates successfully, then proceeds
to calling netdev_ft_reassociate().
2. Immediately IWD transitions to a ft-roaming state and waits for
an association response.
3. CMD_ASSOCIATE is rejected by the kernel in the ACK which IWD
handles by sending a deauthenticate command to the kernel (since
we have a valid authentication to the new BSS).
4. Due to a bug IWD uses the target BSSID to deauthenticate which
the kernel rejects since it has no knowledge of this auth. This
error is not handled or logged.
5. IWD proceeds, assuming its deauthenticated, and transitions to a
disconnected state. The kernel remains "connected" which of course
prevents any future connections.
A simple fix for this is to address the bug (4) in IWD that deauths
using the current BSS roam target. This is actually legacy behavior
from back when IWD used CMD_AUTHENTICATE. Today the kernel is unaware
that IWD authenticated so a deauth is not going to be effective.
Instead we can issue a CMD_DISCONNECT. This is somewhat of a large
hammer, but since the handshake and internal state has already been
modified to use the new target BSS we cannot go back and maintain the
existing connect (though it is _possible_, see the TODO in the
patch).
In an ideal world userspace should never be getting a channel switch
event unless connected to an AP, but alas this has been seen at least
with ath10k hardware. This causes IWD to crash since the logic
assumes netdev->handshake is set.
These two newly parsed station info params "inactive time" and the
"connected time" would be helpful to track the duration (in ms) for
which the station was last inactive and the total duration (in s) for
which the station is currently connected to the AP.
When the wlan device is in STA mode, these fields represent the info
of this station device. And when wlan device is in AP mode, then these
fields repesents the stations that are connected to this AP device.
Since netconfig is now part of the Connect() call from a DBus
perspective add a note indicating that this method has the potential
to take a very long time if there are issues with DHCP.
Since the method return to Connect() and ConnectBssid() come after
netconfig some tests needed to be updated since they were waiting
for the method return before continuing. For timeout-based tests
specifically this caused them to fail since before they expected
the return to come before the connection was actually completed.
Let the caller specify the method timeout if there is an expectation
that it could take a long time.
For the conventional connect call (not the "bssid" debug variant) let
them pass their own callback handlers. This is useful if we don't
want to wait for the connect call to finish, but later get some
indication that it did finish either successfully or not.
A netconfig failure results in a failed connection which restarts
autoconnect and prevents IWD from retrying the connection on any
other BSS's within the network as a whole. When autoconnect restarts
IWD will scan and choose the "best" BSS which is likely the same as
the prior attempt. If that BSS is somehow misconfigured as far as
DHCP goes, it will likely fail indefinitely and in turn cause IWD to
retry indefinitely.
To improve this netconfig has been adopted into the IWD's BSS retry
logic. If netconfig fails this will not result in IWD transitioning
to a disconnected state, and instead the BSS will be network
blacklisted and the next will be tried. Only once all BSS's have been
tried will IWD go into a disconnected state and start autoconnect
over.
When netconfig is enabled the DBus reply was being sent in
station_connect_ok(), before netconfig had even started. This would
result in a call to Connect() succeeding from a DBus perspective but
really netconfig still needed to complete before IWD transitioned
to a connected state.
Fixes: 72e7d3ceb83d ("station: Handle NETCONFIG_EVENT_FAILED")
This adds a new API network_clear_blacklist() and removes this
functionality from network_connected(). This is done to support BSS
iteration when netconfig is enabled. Since a call to
network_connected() will happen prior to netconfig completing we
cannot clear the blacklist until netconfig has either passed or
failed.
If this fails, in some cases, -EAGAIN would be returned up to netdev
which would then assume a retry would be done automatically. This
would not in fact happen since it was an internal SAE failure which
would result in the connect method return to never get sent.
Now if sae_send_commit() fails, return -EPROTO which will cause
netdev to fail the connection.
A BSS can temporarily reject associations and provide a delay that
the station should wait for before retrying. This is useful when
sane values are used, but taking it to the extreme an AP could
potentially request the client wait UINT32_MAX TU's which equates
to 49 days.
Either due to a bug, or worse by design, the kernel will wait for
however long that timeout is. Luckily the kernel also sends an event
to userspace with the amount of time it will be waiting. To guard
against excessive timeouts IWD will now handle this event and enforce
a maximum allowed value. If the timeout exceeds this IWD will
deauthenticate.
Specifically for the NO_MORE_STAS reason code, add the BSS to the
(now renamed) AP_BUSY blacklist to avoid roaming to this BSS for
the near future.
Since we are now handling individual reason codes differently the
whole IS_TEMPORARY_STATUS macro was removed and replaced with a
case statement.
The initial pass of this feature only envisioned BSS transition
management frames as the trigger to "roam blacklist" a BSS, hence
the original name. But some APs actually utilize status codes that
also indicate to the stations that they are busy, or not able to
handle more connections. This directly aligns with the original
motivation of the "roam blacklist" series and these events should
also trigger this type of blacklist.
First, since we will be applying this blacklist to cases other
than being told to roam, rename this reason code internally to
BLACKLIST_REASON_AP_BUSY. The config option is also being renamed
to [Blacklist].InitialAccessPointBusyTimeout while also supporting
the old config option, but warning that it is deprecated.
Some drivers/hardware are more strict about limiting use of
certain frequencies on startup until the regulatory domain has
been set. For most cards the only way to set the regulatory domain
is to scan and see BSS's nearby that advertise the country they
reside in.
This is particularly important for AP mode since AP's are always
emitting radiation from beacons and will not start until the desired
frequency is both enabled and allows IR. To make this process
seamless in IWD we will first check that the desired frequency is
enabled/IR and if not issue a scan to (hopefully) get the regulatory
domain set.
A prior patch broke this by checking the return of
l_dbus_message_iter_next_entry. This was really subtle but the logic
actually relied on _not_ checking that return in order to handle
empty lists.
Instead of reverting the logic was adapted/commented to make it more
clear what the API expects from DBus. If list contains at least one
value the first element path will get set, if it contains zero
values "new_path" will be set to NULL which will then cause the
list to be cleared later on.
This both fixes the regression, and makes it clear that a zero
element list is supported and handled.
The length of EncryptedSecurity was assumed to be at least 16 bytes
and anything less would underflow the length to l_malloc.
Fixes: 01cd8587606b ("storage: implement network profile encryption")
The MAC and version elements weren't super critical but the channel
and bootstrapping key elements would result in memory leaks if there
were duplicates.
This patch now will not allow duplicate elements in the URI.
Fixes: f7f602e1b1e7 ("dpp-util: add URI parsing")
The survey arrays were exactly the number of valid channels for a
given band (e.g. 14 for 2.4GHz) but since channels start at 1 this
means that the last channel for a band would overflow the array.
Fixes: 35808debaefd ("scan: use GET_SURVEY for SNR calculation in ranking")
Supporting PMKSA on fullmac drivers requires that we set the PMKSA
into the kernel as well as remove it. This can now be triggered
via the new PMKSA driver callbacks which are implemented and set
with this patch.