The survey arrays were exactly the number of valid channels for a
given band (e.g. 14 for 2.4GHz) but since channels start at 1 this
means that the last channel for a band would overflow the array.
Fixes: 35808debaefd ("scan: use GET_SURVEY for SNR calculation in ranking")
The utilization rank factor already existed but was very rigid
and only checked a few values. This adds the (optional) ability
to start applying an exponentially decaying factor to both
utilization and station count after some threshold is reached.
This area needs to be re-worked in order to support very highly
loaded networks. If a network either doesn't support client
balancing or does it poorly its left up to the clients to choose
the best BSS possible given all the information available. In
these cases connecting to a highly loaded BSS may fail, or result
in a disconnect soon after connecting. In these cases its likely
better for IWD to choose a slightly lower RSSI/datarate BSS over
the conventionally 'best' BSS in order to aid in distributing
the network load.
The thresholds are currently optional and not enabled by default
but if set they behave as follows:
If the value is above the threshold it is mapped to an integer
between 0 and 30. (using a starting range of <value> - 255).
This integer is then used to index in the exponential decay table
to get a factor between 1 and 0. This factor is then applied to
the rank.
Note that as the value increases above the threshold the rank
will be increasingly effected, as is expected for an exponential
function. These option should be used with care as it may have
unintended consequences, especially with very high load networks.
i.e. you may see IWD roaming to BSS's with much lower signal if
there are high load BSS's nearby.
To maintain the existing behavior if there is no utilization
factor set in main.conf the legacy thresholds/factors will be
used.
While there is proper handling for a regdom update during a
TRIGGER_SCAN scan, prior to NEW_SCAN_RESULTS there is no such
handling if the regdom update comes in during a GET_SCAN or
GET_SURVEY.
In both the 6ghz and non-6ghz code paths we have some issues:
- For non-6ghz devices, or regdom updates that did not enable
6ghz the wiphy state watch callback will automatically issues
another GET_SURVEY/GET_SCAN without checking if there was
already one pending. It does this using the current scan request
which gets freed by the prior GET_SCAN/GET_SURVEY calls when
they complete, causing invalid reads when the subsequent calls
finish.
- If 6ghz was enabled by the update we actually append another
trigger command to the list and potentially run it if its the
current request. This also will end up in the same situation as
the request is freed by the pending GET_SURVEY/GET_SCAN calls.
For the non-6ghz case there is little to no harm in ignoring the
regdom update because its very unlikely it changed the allowed
frequencies.
For the 6ghz case we could potentially handle the new trigger scan
within get_scan_done, but thats beyond the scope of this change
and is likely quite intrusive.
Since surveys end up making driver calls in the kernel its not
entirely known how they are implemented or how long they will
take. For this reason the survey will be skipped if getting the
results from an external scan.
Doing this also fixes a crash caused by external scans where the
scan request pointer is not checked and dereferenced:
0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059
0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0
0x0000749646a243cb in ?? () from /usr/lib/libell.so.0
0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0
0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0
0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0
0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614
Reported-by: Daniel Bond <danielbondno@gmail.com>
This will help to get rid of magic number use throughout the project.
The definitions should be limited to global magic numbers that are used
throughout the project, for example SSID length, MAC address length,
etc.
When the survey code was added it neglected to add the same
cancelation logic that existed for the GET_SCAN call, i.e. if
a scan was canceled and there was a pending GET_SURVEY to the
kernel that needs to be canceled, and the request cleaned up.
Fixes: 35808debae ("scan: use GET_SURVEY for SNR calculation in ranking")
For ranking purposes the utilization was defaulted to a valid (127)
which would not change the rank if that IE was not found in the
scan results. Historically this was printed (debug) as part of the
scan results but that was removed as it was somewhat confusing. i.e.
did the AP _really_ have a utilization of 127? or was the IE not
found?
Since it is useful to see the BSS load if that is advertised add a
flag to the scan_bss struct to indicate if the IE was present which
can be checked.
This issues a GET_SURVEY dump after scan results are available and
populates the survey information within the scan results. Currently
the only value obtained is the noise for a given frequency but the
survey results structure was created if in the future more values
need to be added.
From the noise, the SNR can be calculated. This is then used in the
ranking calculation to help lower BSS ranks that are on high noise
channels.
Parsing the flush flag for external scans was not done correctly
as it was not parsing the ATTR_SCAN_FLAGS but instead the flag
bitmap. Fix this by parsing the flags attribute, then checking if
the bit is set.
This was added to support a single buggy AP model that failed to
negotiate the SAE group correctly. This may still be a problem but
since then the [Network].UseDefaultEccGroup option has been added
which accomplishes the same thing.
Remove the special handling for this specific OUI and rely on the
user setting the new option if they have problems.
This mispelling was present in the configuration, so I retained parsing
of the legacy BandModifier*Ghz options for compatibility. Without this
change anyone spelling GHz correctly in their configs would be very
confused.
wiphy_get_allowed_freqs was only being used to see if 6GHz was disabled
or not. This is expensive and requires several allocations when there
already exists wiphy_is_band_disabled(). The prior patch modified
wiphy_is_band_disabled() to return -ENOTSUP which allows scan.c to
completely remove the need for wiphy_get_allowed_freqs.
scan_wiphy_watch was also slightly re-ordered to avoid allocating
freqs_6ghz if the scan request was being completed.
To support user-disabled bands periodic scans need to specify a
frequency list filtered by any bands that are disabled. This was
needed in scan.c since periodic scans don't provide a frequency
list in the scan request.
If no bands are disabled the allowed freqs API should still
result in the same scan behavior as if a frequency list is left
out i.e. IWD just filters the frequencies as opposed to the kernel.
Currently the only way a scan can be split is if the request does
not specify any frequencies, implying the request should scan the
entire spectrum. This allows the scan logic to issue an extra
request if 6GHz becomes available during the 2.4 or 5GHz scans.
This restriction was somewhat arbitrary and done to let periodic
scans pick up 6GHz APs through a single scan request.
But now with the addition of allowing user-disabled bands
periodic scans will need to specify a frequency list in case a
given band has been disabled. This will break the scan splitting
code which is why this prep work is being done.
The main difference now is the original scan frequencies are
tracked with the scan request. The reason for this is so if a
request comes in with a limited set of 6GHz frequences IWD won't
end up scanning the full 6GHz spectrum later on.
This exposes the [Rank].BandModifier* settings so other modules
can use then. Doing this will allow user-disabling of certain
bands by setting these modifier values to 0.0.
Removed several debug prints which are very verbose and provide
little to no important information.
The get_scan_{done,callback} prints are pointless since all the
parsed scan results are printed by station anyways.
Printing the BSS load is also not that useful since it doesn't
include the BSSID. If anything the BSS load should be included
when station prints out each individual BSS (along with frequency,
rank, etc).
The advertisement protocol print was just just left in there by
accident when debugging, and also provides basically no useful
information.
If the regdom updates during a periodic scan the results will be
delayed until after the update in order to, potentially, add 6GHz
frequencies since they may become available. The delayed results
happen regardless of 6GHz support but scan_wiphy_watch() was
returning early if 6GHz was not supported causing the scan request
to never complete.
The blamed commit argues that the periodic scan callback doesn't do
anything useful in the event of an aborted scan, but this is not
entirely true. In particular, the callback is responsible for re-arming
the periodic scan timer. Make sure to call scan_finished() so that iwd's
periodic scanning logic continues unabated even when a periodic scan is
aborted.
Also remove the periodic boolean member of struct scan_request, as it
serves no purpose anymore.
Fixes: 6051a1495227 ("scan: Don't callback on SCAN_ABORTED")
wiphy_get_supported_rates expected an enum defined in the nl80211
header but the argument type was an unsigned int, not exactly
intuitive to anyone using the API. Since the nl80211 enum value
was only used in a switch statement it could just as well be IWD's
internal enum band_freq.
This also allows modules which do not reference nl80211.h to use
wiphy_get_supported_rates().
If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes
that a scan is ongoing on the underlying wdev and will retry the same
command when that scan is complete. It gets notified of that completion
via the scan_notify() function, and kicks the scan logic to try again.
However, if there is another wdev on the same wiphy and that wdev has a
scan request in flight, the kernel will also return -EBUSY. In other
words, only one scan request per wiphy is permitted.
As an example, the brcmfmac driver can create an AP interface on the
same wiphy as the default station interface, and scans can be triggered
on that AP interface.
If -EBUSY is returned because another wdev is scanning, then iwd won't
know when it can retry the original trigger request because the relevant
netlink event will arrive on a different wdev. Indeed, if no scan
context exists for that other wdev, then scan_notify will return early
and the scan logic will stall indefinitely.
Instead, and in the event that no scan context matches, use it as a cue
to retry a pending scan request that happens to be destined for the same
wiphy.
This functionality works around the kernel's behavior of allowing
6GHz only after a regulatory domain update. If the regdom updates
scan.c needs to be aware in order to split up periodic scans, or
insert 6GHz frequencies into an ongoing periodic scan. Doing this
allows any 6GHz BSS's to show up in the scan results rather than
needing to issue an entirely new scan to see these BSS's.
The kernel's regulatory domain updates after some number of beacons
are processed. This triggers a regulatory domain update (and wiphy
dump) but only after a scan request. This means a full scan started
prior to the regdom being set will not include any 6Ghz BSS's even
if the regdom was unlocked during the scan.
This can be worked around by splitting up a large scan request into
multiple requests allowing one of the first commands to trigger a
regdom update. Once the regdom updates (and wiphy dumps) we are
hopefully still scanning and could append an additional request to
scan 6GHz.
In the case of an external scan, we won't have a scan_request object,
sr. Make sure to not crash in this case.
Also, since scan_request can no longer carry the frequency set in all
cases, add a new member to scan_results in order to do so.
Fixes: 27d8cf4ccc59 ("scan: track scanned frequencies for entire request")
The NEW_SCAN_RESULTS handling was written to only parse the frequency
list if there were no additional scan commands to send. This results in
the scan callback containing frequencies of only the last CMD_TRIGGER.
Until now this worked fine because a) the queue is only used for hidden
networks and b) frequencies were never defined by any callers scanning
for hidden networks (e.g. dbus/periodic scans).
Soon the scan command queue will be used to break up scan requests
meaning only the last scan request frequencies would be used in the
callback, breaking the logic in station.
Now the NEW_SCAN_RESULTS case will parse the frequencies for each scan
command rather than only the last.
The scan_passive API wasn't using a const struct scan_freq_set as it
should be since it's not modifying the contents. Changing this to
const did require some additional changes like making the scan_parameters
'freqs' member const as well.
After changing scan_parameters, p2p needed updating since it was using
scan_parameters.freqs directly. This was changed to using a separate
scan_freq_set pointer, then setting to scan_parameters.freqs when needed.
This increases the maximum data rate which now is possible with HE.
A few comments were also updated, one to include 6G when adjusting
the rank for >4000mhz, and the other fixing a typo.
Certain module dependencies were missing, which could cause a crash on
exit under (very unlikely) circumstances.
#0 l_queue_peek_head (queue=<optimized out>) at ../iwd-1.28/ell/queue.c:241
#1 0x0000aaaab752f2a0 in wiphy_radio_work_done (wiphy=0xaaaac3a129a0, id=6)
at ../iwd-1.28/src/wiphy.c:2013
#2 0x0000aaaab7523f50 in netdev_connect_free (netdev=netdev@entry=0xaaaac3a13db0)
at ../iwd-1.28/src/netdev.c:765
#3 0x0000aaaab7526208 in netdev_free (data=0xaaaac3a13db0) at ../iwd-1.28/src/netdev.c:909
#4 0x0000aaaab75a3924 in l_queue_clear (queue=queue@entry=0xaaaac3a0c800,
destroy=destroy@entry=0xaaaab7526190 <netdev_free>) at ../iwd-1.28/ell/queue.c:107
#5 0x0000aaaab75a3974 in l_queue_destroy (queue=0xaaaac3a0c800,
destroy=destroy@entry=0xaaaab7526190 <netdev_free>) at ../iwd-1.28/ell/queue.c:82
#6 0x0000aaaab7522050 in netdev_exit () at ../iwd-1.28/src/netdev.c:6653
#7 0x0000aaaab7579bb0 in iwd_modules_exit () at ../iwd-1.28/src/module.c:181
In this particular case, wiphy module was de-initialized prior to the
netdev module:
Jul 14 18:14:39 localhost iwd[2867]: ../iwd-1.28/src/wiphy.c:wiphy_free() Freeing wiphy phy0[0]
Jul 14 18:14:39 localhost iwd[2867]: ../iwd-1.28/src/netdev.c:netdev_free() Freeing netdev wlan0[45]
Periodic scan requests are meant to be performed with a lower priority
than normal scan requests. They're thus given a different priority when
inserting them into the wiphy work queue. Unfortunately, the priority
is not taken into account when they are inserted into the
sr->requests queue. This can result in the scanning code being confused
since it assumes the top of the queue is always the next scheduled or
currently ongoing scan. As a result any further wiphy_work might never be
started properly.
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 3
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_next() Starting work item 3
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/scan.c:scan_periodic_timeout() 1
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 4
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_done() Work item 3 done
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_next() Starting work item 5
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_request_triggered() Passive scan triggered for wdev 1
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_periodic_triggered() Periodic scan triggered for wdev 1
In the above log, scan request 5 (triggered by dbus) is started before
scan request 4 (periodic scan). Yet the scanning code thinks scan
request 4 was triggered.
Fix this by using the wiphy_work priority to sort the sr->requests queue
so that the scans are ordered in the same manner.
Reported-by: Alvin Šipraga <ALSI@bang-olufsen.dk>
This was already implemented in station but with no dependency on
that module at all. AP will need this for a scanning API so its
being moved into scan.c.
The periodic scan code was refactored to make normal scans and
periodic scans consistent by keeping both in the same queue. But
that change left out the abort path where periodic scans were not
actually removed from the queue.
This fixes a rare crash when a periodic scan has been triggered and
the device goes down. This path never removes the request from the
queue but still frees it. Then when the scan context is removed the
stale request is freed again.
0 0x4bb65b in scan_request_cancel src/scan.c:202
1 0x64313c in l_queue_clear ell/queue.c:107
2 0x643348 in l_queue_destroy ell/queue.c:82
3 0x4bbfb7 in scan_context_free src/scan.c:209
4 0x4c9a78 in scan_wdev_remove src/scan.c:2115
5 0x42fecd in netdev_free src/netdev.c:965
6 0x445827 in netdev_destroy src/netdev.c:6507
7 0x52beb9 in manager_config_notify src/manager.c:765
8 0x67084b in process_multicast ell/genl.c:1029
9 0x67084b in received_data ell/genl.c:1096
10 0x65e790 in io_callback ell/io.c:120
11 0x65aaae in l_main_iterate ell/main.c:478
12 0x65b213 in l_main_run ell/main.c:525
13 0x65b213 in l_main_run ell/main.c:507
14 0x65b72c in l_main_run_with_signal ell/main.c:647
15 0x4124e7 in main src/main.c:532
A select few drivers send this instead of SIGNAL_MBM. The docs say this
value is the signal 'in unspecified units, scaled to 0..100'. The range
for SIGNAL_MBM is -10000..0 so this can be scaled to the MBM range easy
enough...
Now, this isn't exactly correct because this value ultimately gets
returned from GetOrderedNetworks() and is documented as 100 * dBm where
in reality its just a unit-less signal strength value. Its not ideal, but
this patch at least will fix BSS ranking for these few drivers.
scan_request_failed and scan_finished remove the finished scan_request
from the request queue right away, before calling the callback. This
breaks those clients that rely on scan_cancel working on such requests
(i.e. to force the destroy callback to be invoked synchronously, see
a0911ca77812 ("station: Make sure roam_scan_id is always canceled").
Fix this by removing the scan_request from the request queue after
invoking the callback. Also provide a re-entrancy guard that will make
sure that the scan_request isn't removed in scan_cancel itself.
There are similar operations being performed but with different
callbacks and userdata, depending on whether 'sr' is NULL or not.
Optimize the function flow slightly to make if-else unnecessary.
While here, update the comment. periodic scans are now scheduled only
based on the periodic timeout timer.
If periodic scan is active and we receive a SCAN_ABORTED event, we would
still invoke the periodic scan callback with an error. This is rather
pointless since the periodic scan callback cannot do anything useful
with this information. Fix that.
We should never reach a point where NEW_SCAN_RESULTS or SCAN_ABORTED are
received before a corresponding TRIGGER_SCAN is received. Even if this
does happen, there's no harm from processing the commands anyway.
This makes it a little easier to book-keep the started variable. Since
scan_request already has a 'passive' bit-field, there should be no
storage penalty.
If scan_cancel is called on a scan_request that is 'finished' but with
the GET_SCAN command still in flight, it will trigger a crash as
follows:
Received Deauthentication event, reason: 2, from_ap: true
src/station.c:station_disconnect_event() 11
src/station.c:station_disassociated() 11
src/station.c:station_reset_connection_state() 11
src/station.c:station_roam_state_clear() 11
src/scan.c:scan_cancel() Trying to cancel scan id 6 for wdev 200000002
src/scan.c:scan_cancel() Scan is at the top of the queue, but not triggered
src/scan.c:get_scan_done() get_scan_done
Aborting (signal 11) [/home/denkenz/iwd-master/src/iwd]
++++++++ backtrace ++++++++
#0 0x7f9871aef3f0 in /lib64/libc.so.6
#1 0x41f470 in station_roam_scan_notify() at /home/denkenz/iwd-master/src/station.c:2285
#2 0x43936a in scan_finished() at /home/denkenz/iwd-master/src/scan.c:1709
#3 0x439495 in get_scan_done() at /home/denkenz/iwd-master/src/scan.c:1739
#4 0x4bdef5 in destroy_request() at /home/denkenz/iwd-master/ell/genl.c:676
#5 0x4c070b in l_genl_family_cancel() at /home/denkenz/iwd-master/ell/genl.c:1960
#6 0x437069 in scan_cancel() at /home/denkenz/iwd-master/src/scan.c:842
#7 0x41dc2e in station_roam_state_clear() at /home/denkenz/iwd-master/src/station.c:1594
#8 0x41dd2b in station_reset_connection_state() at /home/denkenz/iwd-master/src/station.c:1619
#9 0x41dea4 in station_disassociated() at /home/denkenz/iwd-master/src/station.c:1644
The happens because get_scan_done callback is still called as a result of
l_genl_cancel. Add a re-entrancy guard in the form of 'canceled'
variable in struct scan_request. If set, get_scan_done will skip invoking
scan_finished.
It isn't clear what 'l_queue_peek_head() == results->sr' check was trying
to accomplish. If GET_SCAN dump was scheduled, then it should be
reported. Drop it.
results->sr is set to NULL for 'opportunistic' scans which were
triggered externally. See scan_notify() for details. However,
get_scan_done would only invoke scan_finished (and thus the periodic
scan callback sc->sp.callback) only if the scan queue was empty. It
should do so in all cases.