Handle situations where the BSS we're trying to connect to is no longer
in the kernel scan result cache. Normally, the kernel will re-scan the
target frequency if this happens on the CMD_CONNECT path, and retry the
connection.
Unfortunately, the CMD_AUTHENTICATE path used for WPA3, OWE and FILS does
not have this scanning behavior. CMD_AUTHENTICATE simply fails with
a -ENOENT error. Work around this by trying a limited scan of the
target frequency and re-trying CMD_AUTHENTICATE once.
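The retry-once logic can be sketched roughly as below. This is a
synchronous simplification and not the actual netdev code: struct
auth_ops, its two hooks and the demo_* stubs are hypothetical stand-ins
for CMD_AUTHENTICATE and the limited scan of the target frequency.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical hooks standing in for CMD_AUTHENTICATE and a limited scan */
struct auth_ops {
	int (*send_authenticate)(void *user_data);
	int (*scan_frequency)(uint32_t freq, void *user_data);
};

/*
 * Try to authenticate.  If the kernel reports -ENOENT (BSS no longer in its
 * scan cache), scan only the target frequency and retry exactly once.
 */
static int authenticate_with_retry(const struct auth_ops *ops, uint32_t freq,
					void *user_data)
{
	int r;

	assert(ops && ops->send_authenticate && ops->scan_frequency);

	r = ops->send_authenticate(user_data);
	if (r != -ENOENT)
		return r;

	r = ops->scan_frequency(freq, user_data);
	if (r < 0)
		return r;

	/* Second and final attempt, any error now goes back to the caller */
	return ops->send_authenticate(user_data);
}

/* Demo stub: fails with -ENOENT on the first call, succeeds afterwards */
static int demo_authenticate(void *user_data)
{
	int *calls = user_data;

	return (*calls)++ == 0 ? -ENOENT : 0;
}

/* Demo stub: pretend the limited scan found the BSS again */
static int demo_scan_frequency(uint32_t freq, void *user_data)
{
	(void) freq;
	(void) user_data;
	return 0;
}
```

With the demo stubs, the first attempt fails, the scan runs, and the
second attempt succeeds, so the authenticate hook is called twice.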
Found using lsan:
==29896==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 9 byte(s) in 1 object(s) allocated from:
#0 0x7fcd41e0c710 in __interceptor_malloc /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libsanitizer/asan/asan_malloc_linux.cc:86
#1 0x606abd in l_malloc ell/util.c:62
#2 0x460230 in ie_tlv_vendor_ie_concat src/ie.c:140
#3 0x4605d1 in ie_tlv_extract_wfd_payload src/ie.c:216
#4 0x4a8773 in scan_parse_bss_information_elements src/scan.c:1105
#5 0x4a94a8 in scan_parse_attr_bss src/scan.c:1181
#6 0x4a99f8 in scan_parse_result src/scan.c:1238
#7 0x4abe4e in get_scan_callback src/scan.c:1451
#8 0x6442d9 in process_unicast ell/genl.c:979
#9 0x6453ff in received_data ell/genl.c:1087
#10 0x62e1a4 in io_callback ell/io.c:126
#11 0x628fca in l_main_iterate ell/main.c:473
#12 0x6294e8 in l_main_run ell/main.c:520
#13 0x629d8b in l_main_run_with_signal ell/main.c:642
#14 0x40681b in main src/main.c:505
#15 0x7fcd40a55bdd in __libc_start_main (/lib64/libc.so.6+0x21bdd)
Besides being undefined behaviour, signed integer overflow can cause
unexpected comparison results. In the case of network_rank_compare(),
a connected network with rank INT_MAX would cause newly inserted
networks with negative rank to be inserted earlier in the ordered
network list. This is reflected in the GetOrderedNetworks() DBus method
as can be seen in the following iwctl output:
[iwd]# station wlan0 get-networks
Network name Security Signal
----------------------------------------------------
BEOLAN 8021x **** }
BeoBlue psk *** } all unknown,
UI_Test_Network psk *** } hence assigned
deneb_2G psk *** } negative rank
BEOGUEST open **** }
> titan psk ****
Linksys05274_5GHz_dmt psk ****
Lyngby-4G-4 5GHz psk ****
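A comparison that never subtracts the two ranks avoids the overflow
entirely. The following is a minimal sketch of that idea, not the actual
network_rank_compare(); struct demo_network and the demo_* functions are
hypothetical, and "higher rank sorts first" is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical network entry with only the field that matters here */
struct demo_network {
	int rank;
};

/*
 * Three-way comparison, higher rank first.  Uses only relational operators,
 * so INT_MAX against a negative rank cannot overflow the way "b - a" would.
 */
static int demo_rank_compare(const void *a, const void *b)
{
	const struct demo_network *na = a;
	const struct demo_network *nb = b;

	return (nb->rank > na->rank) - (nb->rank < na->rank);
}

static void demo_sort(void)
{
	struct demo_network nets[] = {
		{ -5 }, { INT_MAX }, { 3 }, { INT_MIN },
	};

	qsort(nets, 4, sizeof(nets[0]), demo_rank_compare);

	/* The connected network (INT_MAX) stays ahead of negative ranks */
	assert(nets[0].rank == INT_MAX);
	assert(nets[1].rank == 3);
	assert(nets[2].rank == -5);
	assert(nets[3].rank == INT_MIN);
}
```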
Instead of creating the results->bss_list l_queue lazily, always create
one before sending the GET_SCAN command. This is to make sure that an
empty list is passed to the scan callback (e.g. in station.c) instead of
a NULL. Passing NULL has been causing difficult-to-debug crashes in
station.c; in fact I think I've been seeing them for over a year now
but can't be sure. station_set_scan_results has been taking ownership
of the new BSS list and, if station->connected_bss was not on the list,
it would try to add it not realizing that l_queue_push_tail() was doing
nothing. Always passing a valid list may help us prevent similar
problems in the future.
The crash might start with:
==120489== Invalid read of size 8
==120489== at 0x425D38: network_bss_select (network.c:709)
==120489== by 0x415BD1: station_try_next_bss (station.c:2263)
==120489== by 0x415E31: station_retry_with_status (station.c:2323)
==120489== by 0x415E31: station_connect_cb (station.c:2367)
==120489== by 0x407E66: netdev_connect_failed (netdev.c:569)
==120489== by 0x40B93D: netdev_connect_event (netdev.c:1801)
==120489== by 0x40B93D: netdev_mlme_notify (netdev.c:3678)
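The difference between a NULL list and an empty list can be shown with a
small self-contained stand-in. struct tiny_queue and
tiny_queue_push_tail() are hypothetical and only mimic the relevant
behaviour of l_queue_push_tail(), namely that pushing to a NULL queue
does nothing and reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TINY_QUEUE_MAX 8

/* Hypothetical stand-in for l_queue, just enough to show the failure mode */
struct tiny_queue {
	void *items[TINY_QUEUE_MAX];
	size_t len;
};

/* A NULL queue is rejected, and a caller ignoring the result never notices */
static bool tiny_queue_push_tail(struct tiny_queue *q, void *data)
{
	if (!q || q->len == TINY_QUEUE_MAX)
		return false;

	q->items[q->len++] = data;
	return true;
}

static void demo_null_vs_empty(void)
{
	struct tiny_queue empty = { .len = 0 };
	int connected_bss = 0;

	/* NULL list: the connected BSS is dropped */
	assert(!tiny_queue_push_tail(NULL, &connected_bss));

	/* Empty but valid list: the connected BSS is kept */
	assert(tiny_queue_push_tail(&empty, &connected_bss));
	assert(empty.len == 1);
}
```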
Scanning mostly remained the same when moving to the wiphy radio work
queue. start_next_scan_request was modified to be used as the work
callback, and it no longer starts the next scan once the current one is
done (the wiphy work queue takes care of this now). All calls to
start_next_scan_request were removed, and more or less replaced with
wiphy_radio_work_done.
scan_{suspend,resume} were both removed since radio management
priorities solve this for us. ANQP requests can be inserted ahead of
scan requests, which accomplishes the same thing.
If start_next_scan_request() is called while a scan request
(NL80211_CMD_TRIGGER_SCAN) is still running, the same scan request will
be sent again. Add a check in the function to avoid sending a request if
one is already in progress. For consistency, check also that scan
results are not being requested (NL80211_CMD_GET_SCAN), before trying to
send the next scan request. Finally, remove similar checks at
start_next_scan_request() callsites to simplify the code.
This also fixes a crash that occurs if the following conditions are met:
- the duplicated request is the only request in the scan request
queue, and
- both scan requests fail with an error other than EBUSY.
In this case, the first callback to scan_request_triggered() will delete
the request from the scan request queue. The second callback will find
an empty queue and consequently pass a NULL scan_request pointer to
scan_request_failed(), causing a segmentation fault.
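The added check amounts to something like the sketch below. The struct
is a hypothetical, trimmed-down scan context and the field names are
chosen for illustration only; sending the command is simulated by
bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down scan context */
struct demo_scan_ctx {
	unsigned int start_cmd_id;	/* non-zero while TRIGGER_SCAN is in flight */
	unsigned int get_scan_cmd_id;	/* non-zero while GET_SCAN is in flight */
	unsigned int queued;		/* number of pending scan requests */
	unsigned int sent;		/* trigger commands sent so far */
};

/* Returns true only if a new trigger command was actually sent */
static bool demo_start_next_scan_request(struct demo_scan_ctx *sc)
{
	assert(sc);

	/* A trigger or a results dump is already in progress, do nothing */
	if (sc->start_cmd_id || sc->get_scan_cmd_id)
		return false;

	if (!sc->queued)
		return false;

	/* Pretend the command was sent and remember its id */
	sc->start_cmd_id = ++sc->sent;
	return true;
}
```

Calling the function twice in a row now sends a single command, so the
triggered callback can no longer run twice for the same request.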
If scanning is suspended, have scan_common() queue its scan request
rather than issuing it immediately. This respects the assumption that
scans are not requested while sc->suspended is true.
#0 0x000055555558ee5d in scan_notify (msg=0x55555560b640, user_data=0x0) at src/scan.c:1706
#1 0x00007ffff7f2c78c in ?? () from /usr/lib/libell.so.0
#2 0x00007ffff7f299ec in ?? () from /usr/lib/libell.so.0
#3 0x00007ffff7f28e4a in l_main_iterate () from /usr/lib/libell.so.0
#4 0x00007ffff7f28efc in l_main_run () from /usr/lib/libell.so.0
#5 0x00007ffff7f290b9 in l_main_run_with_signal () from /usr/lib/libell.so.0
#6 0x00005555555639c4 in main (argc=1, argv=0x7fffffffec18) at src/main.c:497
Save the source frame type in struct scan_bss as it may affect how some
of the data in the struct will be parsed. Also replace the P2P IE
payload data in that struct with a union containing pre-parsed p2p
attributes corresponding to the frame type.
This means users don't have to call the parsers in p2putil.c on that
data, which wouldn't have worked anyway because those parsers assume
input is the raw IE sequence rather than just the "payload".
The kernel sends NL80211_ATTR_SCAN_START_TIME_TSF with CMD_TRIGGER and
RRM requires this value for beacon measurement reports.
The start time is parsed during CMD_TRIGGER and set into the scan request.
A getter was added to obtain this time value for an already triggered
scan.
After making the change, the SCAN_ABORTED case was cleaned up a bit to
remove the local scan_request usage in favor of the one used for all the
other cases.
The kernel allows a scan duration and duration mandatory flag to be
set in scan requests. RRM requests can contain these values so they
have been added to scan_parameters.
Scanning with drivers which do not support EXT_FEATURE_SET_SCAN_DWELL
will not include these values in scan requests.
no_cck_rates is set in the scan parameters generally to make sure
that the Probe Request frames are not sent at any of the 802.11b
rates during active scans. With this patch we also omit those rates
from the Supported Rates IEs, which is required by the p2p spec and
also matches our flag's name.
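Dropping the 802.11b rates from a Supported Rates list could look like
the sketch below. It assumes the usual encoding of one octet per rate in
500 kb/s units with bit 7 marking a basic rate; the function names are
hypothetical and this is not the actual IE builder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rates are in 500 kb/s units, bit 7 marks a basic rate */
static bool demo_is_11b_rate(uint8_t rate)
{
	switch (rate & 0x7f) {
	case 2:		/* 1 Mb/s */
	case 4:		/* 2 Mb/s */
	case 11:	/* 5.5 Mb/s */
	case 22:	/* 11 Mb/s */
		return true;
	}

	return false;
}

/* Copy 'in' to 'out' without the 802.11b rates, return the number kept */
static size_t demo_filter_no_cck_rates(const uint8_t *in, size_t n,
					uint8_t *out)
{
	size_t i;
	size_t kept = 0;

	assert(in && out);

	for (i = 0; i < n; i++)
		if (!demo_is_11b_rate(in[i]))
			out[kept++] = in[i];

	return kept;
}
```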
The intent here was to validate that the frequency is a multiple of 5
and lies in a certain range. Somehow the channel was checked for being
a multiple of 5 instead.
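The intended check is sketched below for the 5 GHz band. The range
bounds and the function name are assumptions made for illustration; the
point is only that the modulo test applies to the frequency and not to
the derived channel number.

```c
#include <assert.h>

/* Returns the channel number, or 0 if the frequency is not valid */
static unsigned int demo_freq_to_channel_5ghz(unsigned int freq)
{
	/* Assumed range for this sketch */
	if (freq < 5005 || freq > 5895)
		return 0;

	/* Validate the frequency, not the channel */
	if (freq % 5)
		return 0;

	return (freq - 5000) / 5;
}

static void demo_freq_checks(void)
{
	assert(demo_freq_to_channel_5ghz(5180) == 36);

	/* Channel 40 is itself a multiple of 5 but is perfectly valid */
	assert(demo_freq_to_channel_5ghz(5200) == 40);

	/* Not on the 5 MHz grid */
	assert(demo_freq_to_channel_5ghz(5181) == 0);
}
```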
This will be seen in Probe Requests. More IEs can and should
be added here depending on the support in IWD. E.g. HS20 indication,
Interworking, HT/VHT IEs etc.
For (Re)Association the HS20 indication element was passed exactly as
it was found in the scan results. The spec defines what bits can be
set and what cannot when this IE is used in (Re)Association. Instead
of assuming the AP's IE conforms to the spec, we now parse the IE and
re-build it for use with (Re)Association.
Since the full IE is no longer used, it was removed from scan_bss, and
replaced with a bit for HS20 support (hs20_capable). This member is
now used the same as hs20_ie was.
The version parsed during scan results is now used when building the
(Re)Association IE.
The HS20 indication element should always be included during
(Re)Association per the spec. This removes the need for a
dedicated boolean, and now the hs20_ie can be used instead.
If the scan was triggered and later aborted, make sure to reset the
triggered value when the CMD_NEW_SCAN_RESULTS event comes in.
src/station.c:station_enter_state() Old State: disconnected, new state: connecting
src/scan.c:scan_notify() Scan notification 33
src/station.c:station_netdev_event() Associating
src/scan.c:scan_notify() Scan notification 34
Aborting (signal 11) [/home/denkenz/iwd-master/src/iwd]
++++++++ backtrace ++++++++
#0 0x7efd4d6a2ef0 in /lib64/libc.so.6
#1 0x42b20d in scan_notify() at src/scan.c:1383
P2P probe requests are to be sent at a minimum of 6.0 Mb/s using OFDM;
specifically, the 802.11b rates are prohibited (section 2.4.1 in Wi-Fi
P2P Technical Spec v1.7), some of which use CCK modulation. This is
already the default for 5G but for 2.4G the drivers generally do this
if we set the NL80211_ATTR_TX_NO_CCK_RATE flags with
NL80211_CMD_TRIGGER_SCAN.
The ifindex is used to index the netdevs in the system (wlan, ethernet,
etc.) but we can also do wifi scanning on interfaces that have no
corresponding netdev object, like the P2P-device virtual interfaces.
Use the wdev ids to reference interfaces; the nl80211 API doesn't care
whether we use NL80211_ATTR_IFINDEX or NL80211_ATTR_WDEV. Only
wireless interfaces have a wdev id.
Save the actual cmd_id returned from l_genl_family_dump and zero it in
the get_scan_done. There's no need to zero it in scan_cancel because
get_scan_done gets called automatically.
Store the scan_context pointer in scan_results directly instead of
storing the ifindex. We now cancel ongoing GET_SCAN commands when the
scan_context is being freed so there's no point going through the extra
step of looking up the scan_context by ifindex inside the command
callback to guard against non-existent scan_contexts.
In order to do ANQP efficiently IWD needs the ability to suspend scanning
temporarily. This is because both scanning and ANQP go offchannel and must
remain off channel for some amount of time. This cannot be done
simultaneously and if e.g. ANQP is requested after a scan is already
pending, the kernel will wait till that scan finishes before sending out
the frame.
This IE tells us what Advertisement Protocols the AP supports. This
is only here to look for ANQP support, so all this does is iterate
through all other Advertisement Protocol tuples looking for ANQP.
If found, anqp_capable is set in the scan_bss.
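Walking the tuples might look like the sketch below. It assumes each
tuple is a Query Response Info octet followed by an Advertisement
Protocol ID octet, that ID 0 is ANQP, and that ID 221 introduces a
vendor specific element carrying its own length octet. The function
name is hypothetical and this is not the parser in ie.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ADV_PROTO_ANQP	0
#define DEMO_ADV_PROTO_VENDOR	221

/* 'data' is the element body, i.e. the tuples without the IE header */
static bool demo_adv_proto_has_anqp(const uint8_t *data, size_t len)
{
	size_t pos = 0;

	assert(data || len == 0);

	while (len - pos >= 2) {
		uint8_t id = data[pos + 1];
		size_t skip = 2;

		if (id == DEMO_ADV_PROTO_ANQP)
			return true;

		if (id == DEMO_ADV_PROTO_VENDOR) {
			/* Query info, element ID, length, then the content */
			if (len - pos < 3)
				return false;

			skip = 3 + (size_t) data[pos + 2];
			if (skip > len - pos)
				return false;
		}

		pos += skip;
	}

	return false;
}
```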
The vendor specific IE was being parsed only to check if the AP supported
WPA, which used a Microsoft OUI. Hotspot/OSEN uses neither WPA nor RSN
(although it's nearly identical to RSN) so we also need to check for
the Wi-Fi Alliance OUI and set bss->osen (new) if found.
When handling a scan finished event for a scan we haven't started, check
that we were not halfway through a scan request that would have its
results flushed by the external scan.
Instead of having two separate types of scans make the periodic scan
logic a layer on top of the one-off scan requests, with minimum code to
account for the lower priority of those scans and the fact that periodic
scans also receive results from external scans. Also try to simplify
the code for both the periodic and one-off scans. In the SCAN_RESULTS
and SCAN_ABORT handling, add more complete checks of the current
request's state so we avoid some existing crashes related to external
scans.
scan_send_next_cmd and start_next_scan_request are now just one function
since their functionality was similar and start_next_scan_request is used
everywhere. Also the state after the trigger command receives an EBUSY
is now the same as when a new scan is on top of the queue so we have
fewer situations to consider.
This code still does not account for fragmented scans where an external
scan between two of our fragments flushes the results and we lose some
of the results, or for fragmented scans that take over 30s and the
kernel expires some results (both situations are unlikely).
Previously, the scan results were disregarded once the new
ones were available. To enable the scan scenarios where the
new scan results are delivered in parts, we introduce a
concept of aging BSSs and will remove them based on
retention time.
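Retention-based removal can be pictured as in the sketch below. The
struct, its fields and the time unit are hypothetical; the sketch also
assumes last_seen is never later than now. It is not the actual BSS list
handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BSS entry carrying only what the aging logic needs */
struct demo_bss {
	int id;
	uint64_t last_seen;	/* time the BSS last appeared in scan results */
};

/*
 * Compact 'list' in place, dropping entries not seen within 'retention'
 * time units.  Returns the number of entries kept.
 */
static size_t demo_bss_list_expire(struct demo_bss *list, size_t n,
					uint64_t now, uint64_t retention)
{
	size_t i;
	size_t kept = 0;

	assert(list || n == 0);

	for (i = 0; i < n; i++) {
		if (now - list[i].last_seen > retention)
			continue;

		list[kept++] = list[i];
	}

	return kept;
}
```

A BSS missing from one partial result set is therefore kept as long as
it was seen recently enough in an earlier part.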
CC src/scan.o
src/scan.c: In function ‘scan_bss_compute_rank’:
src/scan.c:1048:4: warning: this decimal constant is unsigned only in ISO C90
factor = factor * data_rate / 2340000000 +
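The warning appears because 2340000000 does not fit in a 32-bit signed
int. One possible way to avoid it, shown only as a sketch, is to give
the constant an explicit type. The helper below is hypothetical and does
not reproduce the truncated expression above; using a double constant is
an assumption about what suits the surrounding arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale 'factor' by the ratio of 'data_rate' to a maximum rate.  The
 * floating point constant has an unambiguous type in every C standard.
 */
static double demo_scale_by_rate(double factor, uint64_t data_rate)
{
	return factor * (double) data_rate / 2340000000.0;
}

static void demo_scale_checks(void)
{
	assert(demo_scale_by_rate(1.0, 2340000000ULL) == 1.0);
	assert(demo_scale_by_rate(2.0, 1170000000ULL) == 1.0);
}
```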
This is not used by any of the scan notify callback implementations and
for P2P we're going to need to scan on an interface without an ifindex
so without this the other changes should be mostly contained in scan.
sc->state would get set when the TRIGGERED event arrived or when the
triggered callback for our own SCAN_TRIGGER command is received.
However it would not get reset to NOT_RUNNING when the NEW_SCAN_RESULTS
event is received, instead we'd first request the results with GET_SCAN
and only reset sc->state when that returns. If during that command a
new scan gets triggered, the GET_SCAN callback would still reset
sc->state and clobber the value set by the new scan.
To fix that repurpose sc->state to only track that period from the
TRIGGERED signal to the NEW_SCAN_RESULTS signal. sc->triggered can be
used to check if we're still waiting for the GET_SCAN command and
sc->start_cmd_id to check if we're waiting for the scan to get
triggered, so one of these three variables will now always indicate if
a scan is in progress.
On a successful send, scan_send_start(..) used to set msg to NULL, so
further management of the command by the caller was impossible. This
patch removes the wrapper around l_genl_family_send() and lets the
callers take responsibility for the command.
Some users may need their own control over 2.4/5GHz preference. This
adds a new user option, 'rank_5g_factor', which allows users to increase
or decrease their 5G preference.
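Applying such a factor could be as simple as the sketch below. The
function name, the use of a double rank and the 4000 MHz cut-off for
"5G" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply the rank of BSSes above the assumed 4000 MHz cut-off */
static double demo_apply_5g_factor(double rank, uint32_t freq,
					double rank_5g_factor)
{
	if (freq > 4000)
		rank *= rank_5g_factor;

	return rank;
}

static void demo_5g_factor_checks(void)
{
	/* A factor above 1.0 prefers 5G, below 1.0 penalizes it */
	assert(demo_apply_5g_factor(100.0, 5180, 1.5) == 150.0);
	assert(demo_apply_5g_factor(100.0, 5180, 0.5) == 50.0);

	/* 2.4G BSSes are left untouched */
	assert(demo_apply_5g_factor(100.0, 2412, 1.5) == 100.0);
}
```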
This adds support for parsing the VHT IE, which allows a BSS supporting
VHT (802.11ac) to be ranked higher than a BSS supporting only HT/basic
rates. Now, with basic/HT/VHT parsing we can calculate the theoretical
maximum data rate for all three and rank the BSS based on that.
This adds HT IE parsing and data rate calculation for HT (802.11n)
rates. Now, a BSS supporting HT rates will be ranked higher than
a basic rate BSS, assuming the RSSI is at an acceptable level.
The spec dictates RSSI thresholds for different modulation schemes, which
correlate to different data rates. Until now we were ranking a BSS by
looking only at its advertised data rate, which may not even be
achievable if the RSSI does not meet the threshold.
Now, RSSI is taken into consideration and the data rate returned from
parsing (Ext) Supported Rates IE(s) will reflect that.
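The idea can be sketched as below. The threshold table uses the 20 MHz
OFDM minimum sensitivity figures from the 802.11 spec as understood
here and should be treated as illustrative; the names are hypothetical
and rates are in 500 kb/s units as in the Supported Rates IE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_rate_threshold {
	uint8_t rate;		/* 500 kb/s units */
	int8_t min_rssi;	/* dBm */
};

/* Illustrative OFDM thresholds, 6 Mb/s through 54 Mb/s */
static const struct demo_rate_threshold demo_thresholds[] = {
	{ 12, -82 }, { 18, -81 }, { 24, -79 }, { 36, -77 },
	{ 48, -74 }, { 72, -70 }, { 96, -66 }, { 108, -65 },
};

/* Highest advertised rate whose RSSI threshold is met, or 0 if none */
static uint8_t demo_max_usable_rate(const uint8_t *rates, size_t n, int rssi)
{
	const size_t n_thr = sizeof(demo_thresholds) /
					sizeof(demo_thresholds[0]);
	uint8_t best = 0;
	size_t i, j;

	assert(rates || n == 0);

	for (i = 0; i < n; i++) {
		uint8_t r = rates[i] & 0x7f;	/* strip the basic rate bit */

		for (j = 0; j < n_thr; j++)
			if (demo_thresholds[j].rate == r &&
					rssi >= demo_thresholds[j].min_rssi &&
					r > best)
				best = r;
	}

	return best;
}
```

For a BSS advertising 6, 12, 24 and 54 Mb/s, a strong signal yields 54
Mb/s while a weak one falls back to a lower rate or to none at all.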
This should not change the behaviour except for fixing a rare crash
due to scan_cancel not working correctly when cancelling the first scan
request in the queue while a periodic scan was running, and potentially
other corner cases. To be able to better distinguish between a periodic
scan in progress and a scan request in progress add a sc->current_sr
field that points either at a scan request or is NULL when a periodic
scan is ongoing. Move the triggered flag from scan_request and
scan_periodic directly to scan_context so it's there together with
start_cmd_id. Hopefully make scan_cancel simpler/clearer.
Note sc->state and sc->triggered have similar semantics so one of them
may be easily removed. Also the wiphy_id parameter to the scan callback
is rather useless, note I temporarily pass 0 as the value on error but
perhaps it should be dropped.
The main difference with this is that scan_context removal will also
trigger the .destroy calls. Normally there won't be any requests left
during scan_context removal but if there were any we should call destroy on
them.
Fix incorrect usage of the caller’s scan triggered callback.
In case of a failure, destroy scan request and notify caller
about the issue by returning zero scan id instead of calling
callers’ scan triggered callback with an error code.
Until now network.c managed the list of network_info structs including
for known networks and networks that are seen in at least one device's
scan results, with the is_known flag to distinguish known networks.
Each time the list was processed though the code was either interested
in one subset of networks or the other. Split the list into a Known
Networks list and the list of other networks seen in scans. Move all
code related to Known Networks to knownnetworks.c; this simplifies
network.h. It also gets rid of network_info_get_known which actually
returned the list of all network_infos (not just for known networks),
which logically should have been private to network.c. Update device.c
and scan.c to use functions specific to Known Networks instead of
filtering the lists by the is_known flag.
This will also allow knownnetworks.c to export DBus objects and/or
properties for the Known Networks information because it now knows when
Known Networks are added, removed or modified by IWD.