This adds a ref count to the handshake state object (as well as
ref/unref APIs). Currently IWD is careful to ensure that netdev
holds the root reference to the handshake state. Other modules do
track it themselves, but ensure that it doesn't get referenced
after netdev frees it.
Future work related to PMKSA will require that station holds a
references to the handshake state, specifically for retry logic,
after netdev is done with it so we need a way to delay the free
until station is also done.
The utilization rank factor already existed but was very rigid
and only checked a few values. This adds the (optional) ability
to start applying an exponentially decaying factor to both
utilization and station count after some threshold is reached.
This area needs to be re-worked in order to support very highly
loaded networks. If a network either doesn't support client
balancing or does it poorly its left up to the clients to choose
the best BSS possible given all the information available. In
these cases connecting to a highly loaded BSS may fail, or result
in a disconnect soon after connecting. In these cases its likely
better for IWD to choose a slightly lower RSSI/datarate BSS over
the conventionally 'best' BSS in order to aid in distributing
the network load.
The thresholds are currently optional and not enabled by default
but if set they behave as follows:
If the value is above the threshold it is mapped to an integer
between 0 and 30. (using a starting range of <value> - 255).
This integer is then used to index in the exponential decay table
to get a factor between 1 and 0. This factor is then applied to
the rank.
Note that as the value increases above the threshold the rank
will be increasingly effected, as is expected for an exponential
function. These option should be used with care as it may have
unintended consequences, especially with very high load networks.
i.e. you may see IWD roaming to BSS's with much lower signal if
there are high load BSS's nearby.
To maintain the existing behavior if there is no utilization
factor set in main.conf the legacy thresholds/factors will be
used.
This is copied from network.c that uses a static table to lookup
exponential decay values by index (generated from 1/pow(n, 0.3)).
network.c uses this for network ranking but it can be useful for
BSS ranking as well if you need to apply some exponential backoff
to a value.
This has been needed elsewhere but generally shortcuts could be
taken mapping with ranges starting/ending with zero. This is a
more general linear mapping utility to map values between any
two ranges.
gcc-15 switched to -std=c23 by default:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212
As a result `iwd` fails the build as:
../src/crypto.c:1215:24: error: incompatible types when returning type '_Bool' but 'struct l_ecc_point *' was expected
1215 | return false;
| ^~~~~
Signed-off-by: Rudi Heitbaum <rudi@heitbaum.com>
After the band is established we check the e4 table for the channel
that matches. The problem here is we will end up checking all the
operating classes, even those that are not within the band that was
determined. This could result in false positives and return a
channel that doesn't make sense.
When the frequencies/channels were parsed there was no check that the
resulting band matched what was expected. Now, pass the band object
itself in which has the band set to what is expected.
If IPv6 is disabled or not supported at the kernel level writing the
sysfs settings will fail. A few of them had a support check but this
patch adds a supported bool to the remainder so we done get errors
like:
Unable to write drop_unsolicited_na to /proc/sys/net/ipv6/conf/wlan0/drop_unsolicited_na
Similar to several other modules DPP registers for its frame
watches on init then ignores anything is receives unless DPP
is actually running.
Due to some recent issues surrounding ath10k and multicast frames
it was discovered that simply registering for multicast RX frames
causes a significant performance impact depending on the current
channel load.
Regardless of the impact to a single driver, it is actually more
efficient to only register for the DPP frames when DPP starts
rather than when IWD initializes. This prevents any of the frames
from hitting userspace which would otherwise be ignored.
Using the frame-xchg group ID's we can only register for DPP
frames when needed, then close that group and the associated
frame watches.
DPP optionally uses the multicast RX flag for frame registrations but
since frame-xchg did not support that, it used its own registration
internally. To avoid code duplication within DPP add a flag to
frame_watch_add in order to allow DPP to utilize frame-xchg.
The selection loop was choosing an initial candidate purely for
use of the "fallback_to_blacklist" flag. But we have a similar
case with OWE transitional networks where we avoid the legacy
open network in preference for OWE:
/* Don't want to connect to the Open BSS if possible */
if (!bss->rsne)
continue;
If no OWE network gets selected we may iterate all BSS's and end
the loop, which then returns NULL.
To fix this move the blacklist check earlier and still ignore any
BSS's in the blacklist. Also add a new flag in the selection loop
indicating an open network was skipped. If we then exhaust all
other BSS's we can return this candidate.
Some drivers like brcmfmac don't support OWE but from userspace its
not possible to query this information. Rather than completely
blacklist brcmfmac we can allow the user to configure this and
disable OWE in IWD.
The "UK" alpha2 code is not the official code for the United Kingdom
but is a "reserved" code for compatibility. The official alpha2 is
"GB" which is being added to the EU list. This fixes issues parsing
neighbor reports, for example:
src/station.c:parse_neighbor_report() Neighbor report received for xx:xx:xx:xx:xx:xx: ch 136 (oper class 3), MD not set
Failed to find band with country string 'GB 32' and oper class 3, trying fallback
src/station.c:station_add_neighbor_report_freqs() Ignored: unsupported oper class
Prior to adding the polling fallback this code path was only used for
signal level list notifications and netdev_rssi_polling_update() was
structured as such, where if the RSSI list feature existed there was
nothing to be done as the kernel handled the notifications.
For certain mediatek cards this is broken, hence why the fallback was
added. But netdev_rssi_polling_update() was never changed to take
this into account which bypassed the timer cleanup on disconnections
resulting in a crash when the timer fired after IWD was disconnected:
iwd: ++++++++ backtrace ++++++++
iwd: #0 0x7b5459642520 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #1 0x7b54597aedf4 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #2 0x49f82d in l_netlink_message_append() at ome/jprestwood/iwd/ell/netlink.c:825
iwd: #3 0x4a0c12 in l_genl_msg_append_attr() at ome/jprestwood/iwd/ell/genl.c:1522
iwd: #4 0x405c61 in netdev_rssi_poll() at ome/jprestwood/iwd/src/netdev.c:764
iwd: #5 0x49cce4 in timeout_callback() at ome/jprestwood/iwd/ell/timeout.c:70
iwd: #6 0x49c2ed in l_main_iterate() at ome/jprestwood/iwd/ell/main.c:455 (discriminator 2)
iwd: #7 0x49c3bc in l_main_run() at ome/jprestwood/iwd/ell/main.c:504
iwd: #8 0x49c5f0 in l_main_run_with_signal() at ome/jprestwood/iwd/ell/main.c:632
iwd: #9 0x4049ed in main() at ome/jprestwood/iwd/src/main.c:614
iwd: #10 0x7b5459629d90 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #11 0x7b5459629e40 in /lib/x86_64-linux-gnu/libc.so.6
iwd: +++++++++++++++++++++++++++
To fix this we need to add checks for the cqm_poll_fallback flag in
netdev_rssi_polling_update().
Certain FullMAC drivers do not expose CMD_ASSOCIATE/CMD_AUTHENTICATE,
but lack the ability to fully offload SAE connections to the firmware.
Such connections can still be supported on such firmware by using
CMD_EXTERNAL_AUTH & CMD_FRAME. The firmware sets the
NL80211_FEATURE_SAE bit (which implies support for CMD_AUTHENTICATE, but
oh well), and no other offload extended features.
When CMD_CONNECT is issued, the firmware sends CMD_EXTERNAL_AUTH via
unicast to the owner of the connection. The connection owner is then
expected to send SAE frames with the firmware using CMD_FRAME and
receive authenticate frames using unicast CMD_FRAME notifications as
well. Once SAE authentication completes, userspace is expected to
send a final CMD_EXTERNAL_AUTH back to the kernel with the corresponding
status code. On failure, a non-0 status code should be used.
Note that for historical reasons, SAE AKM sent in CMD_EXTERNAL_AUTH is
given in big endian order, not CPU order as is expected!
The TX or RX bitrate attributes can contain zero nested attributes.
This causes netdev_parse_bitrate() to fail, but this shouldn't then
cause the overall parsing to fail (we just don't have those values).
Fix this by continuing to parse attributes if either the TX/RX
bitrates fail to parse.
If the affinity watch is removed by setting an empty list the
disconnect callback won't be called which was the only place
the watch ID was cleared. This resulted in the next SetProperty call
to think a watch existed, and attempt to compare the sender address
which would be NULL.
The watch ID should be cleared inside the destroy callback, not
the disconnect callback.
If we scan a huge number of frequencies the PKEX timeout can get
rather large. This was overlooked in a prior patch who's intent
was to reduce the PKEX time, but in these cases it increased it.
Now the timeout will be capped at 2 minutes, but will still be
as low as 10 seconds for a single frequency.
In addition there was no timer reset once PKEX was completed.
This could cause excessive waits if, for example, the peer left
the channel mid-authentication. IWD would just wait until the
long PKEX timeout to eventually reset DPP. Once PKEX completes
we can assume that this peer will complete authentication quickly
and if not, we can fail.
While there is proper handling for a regdom update during a
TRIGGER_SCAN scan, prior to NEW_SCAN_RESULTS there is no such
handling if the regdom update comes in during a GET_SCAN or
GET_SURVEY.
In both the 6ghz and non-6ghz code paths we have some issues:
- For non-6ghz devices, or regdom updates that did not enable
6ghz the wiphy state watch callback will automatically issues
another GET_SURVEY/GET_SCAN without checking if there was
already one pending. It does this using the current scan request
which gets freed by the prior GET_SCAN/GET_SURVEY calls when
they complete, causing invalid reads when the subsequent calls
finish.
- If 6ghz was enabled by the update we actually append another
trigger command to the list and potentially run it if its the
current request. This also will end up in the same situation as
the request is freed by the pending GET_SURVEY/GET_SCAN calls.
For the non-6ghz case there is little to no harm in ignoring the
regdom update because its very unlikely it changed the allowed
frequencies.
For the 6ghz case we could potentially handle the new trigger scan
within get_scan_done, but thats beyond the scope of this change
and is likely quite intrusive.
Since surveys end up making driver calls in the kernel its not
entirely known how they are implemented or how long they will
take. For this reason the survey will be skipped if getting the
results from an external scan.
Doing this also fixes a crash caused by external scans where the
scan request pointer is not checked and dereferenced:
0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059
0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0
0x0000749646a243cb in ?? () from /usr/lib/libell.so.0
0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0
0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0
0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0
0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614
Reported-by: Daniel Bond <danielbondno@gmail.com>
With the introduction of affinities the CQM threshold can be toggled
by a DBus call. There was no check if there was already a pending
call which would cause the command ID to be overwritten and lose any
potential to cancel it, e.g. if netdev went down.
Some drivers fail to set a CQM threshold and report not supported.
Its unclear exactly why but if this happens roaming is effectively
broken.
To work around this enable RSSI polling if -ENOTSUP is returned.
The polling callback has been changed to emit the HIGH/LOW signal
threshold events instead of just the RSSI level index, just as if
a CQM event came from the kernel.
When the affinity is set to the current BSS lower the roaming
threshold to loosly lock IWD to the current BSS. The lower
threshold is automatically removed upon roaming/disconnection
since the affinity array is also cleared out.
This property will hold an array of object paths for
BasicServiceSet (BSS) objects. For the purpose of this patch
only the setter/getter and client watch is implemented. The
purpose of this array is to guide or loosely lock IWD to certain
BSS's provided that some external client has more information
about the environment than what IWD takes into account for its
roaming decisions.
For the time being, the array is limited to only the connected
BSS path, and any roams or disconnects will clear the array.
The intended use case for this is if the device is stationary
an external client could reduce the likelihood of roaming by
setting the affinity to the current BSS.
This documents new DBus property that expose a bit more control to
how IWD roams.
Setting the affinity on the connected BSS effectively "locks" IWD to
that BSS (except at critical RSSI levels, explained below). This can
be useful for clients that have access to more information about the
environment than IWD. For example, if a client is stationary there
is likely no point in trying to roam until it has moved elsewhere.
A new main.conf option would also be added:
[General].CriticalRoamThreshold
This would be the new roam threshold set if the currently connected
BSS is in the Affinities list. If the RSSI continues to drop below
this level IWD will still attempt to roam.
A user reported a crash which was due to the roam trigger timeout
being overwritten, followed by a disconnect. Post-disconnect the
timer would fire and result in a crash. Its not clear exactly where
the overwrite was happening but upon code inspection it could
happen in the following scenario:
1. Beacon loss event, start roam timeout
2. Signal low event, no check if timeout is running and the timeout
gets overwritten.
The reported crash actually didn't appear to be from the above
scenario but something else, so this logic is being hardened and
improved
Now if a roam timeout already exists and trying to be rearmed IWD
will check the time remaining on the current timer and either keep
the active timer or reschedule it to the lesser of the two values
(current or new rearm time). This will avoid cases such as a long
roam timer being active (e.g. 60 seconds) followed by a beacon or
packet loss event which should trigger a more agressive roam
schedule.
This adds a secondary set of signal thresholds. The purpose of these
are to provide more flexibility in how IWD roams. The critical
threshold is intended to be temporary and is automatically reset
upon any connection changes: disconnects, roams, or new connections.
This prepares for the ability to toggle between two signal
thresholds in netdev. Since each netdev may not need/want the
same threshold store it in the netdev object rather than globally.