The auth/action status is now tracked in ft.c. If an AP rejects the
FT attempt with "Invalid PMKID" we can now assume this AP is either
mis-configured for FT or is lagging behind getting the proper keys
from neighboring APs (e.g. was just rebooted).
If we see this condition IWD can now fall back to reassociation in
an attempt to still roam to the best candidate. The fallback decision
is still rank based: if a BSS fails FT it is marked as such, its
ranking is reset removing the FT factor and it is inserted back
into the queue.
The motivation behind this isn't necessarily to always force a roam,
but instead to handle two cases where IWD can either make a bad roam
decision or get 'stuck' and never roam:
1. If there is one good roam candidate and other bad ones. For
example say BSS A is experiencing this FT key pull issue:
Current BSS: -85dbm
BSS A: -55dbm
BSS B: -80dbm
The current logic would fail A, and roam to B. In this case
reassociation would have likely succeeded so it makes more sense
to reassociate to A as a fallback.
2. If there is only one candidate, but its failing FT. IWD will
never try anything other than FT and repeatedly fail.
Both of the above have been seen on real network deployments and
result in either poor performance (1) or eventually lead to a full
disconnect due to never roaming (2).
Certain return codes, though failures, can indicate that the AP is
just confused or booting up and treating it as a full failure may
not be the best route.
For example in some production deployments if an AP is rebooted it
may take some time for neighboring APs to exchange keys for
current associations. If a client roams during that time it will
reject saying the PMKID is invalid.
Use the ft_associate call return to communicate the status (if any)
that was in the auth/action response. If there was a parsing error
or no response -ENOENT is still returned.
In many tests the hostapd configuration does not include all the
values that a test uses. Its expected that each individual test
will add the values required. In many cases its required each test
slightly alter the configuration for each change every other test
has to set the value back to either a default or its own setting.
This results in a ton of duplicated code mainly setting things
back to defaults.
To help with this problem the hostapd configuration is read in
initially and stored as the default. Tests can then simply call
.default() to set everything back. This significantly reduces or
completely removes a ton of set_value() calls.
This does require that each hostapd configuration file includes all
values any of the subtests will set, which is a small price for the
convenience.
Removed several debug prints which are very verbose and provide
little to no important information.
The get_scan_{done,callback} prints are pointless since all the
parsed scan results are printed by station anyways.
Printing the BSS load is also not that useful since it doesn't
include the BSSID. If anything the BSS load should be included
when station prints out each individual BSS (along with frequency,
rank, etc).
The advertisement protocol print was just just left in there by
accident when debugging, and also provides basically no useful
information.
Some APs don't include the RSNE in the associate reply during
the OWE exchange. This causes IWD to be incompatible since it has
a hard requirement on the AKM being included.
This relaxes the requirement for the AKM and instead warns if it
is not included.
Below is an example of an association reply without the RSN element
IEEE 802.11 Association Response, Flags: ........
Type/Subtype: Association Response (0x0001)
Frame Control Field: 0x1000
.000 0000 0011 1100 = Duration: 60 microseconds
Receiver address: 64:c4:03:88:ff:26
Destination address: 64:c4:03:88:ff:26
Transmitter address: fc:34:97:2b:1b:48
Source address: fc:34:97:2b:1b:48
BSS Id: fc:34:97:2b:1b:48
.... .... .... 0000 = Fragment number: 0
0001 1100 1000 .... = Sequence number: 456
IEEE 802.11 wireless LAN
Fixed parameters (6 bytes)
Tagged parameters (196 bytes)
Tag: Supported Rates 6(B), 9, 12(B), 18, 24(B), 36, 48, 54, [Mbit/sec]
Tag: RM Enabled Capabilities (5 octets)
Tag: Extended Capabilities (11 octets)
Ext Tag: HE Capabilities (IEEE Std 802.11ax/D3.0)
Ext Tag: HE Operation (IEEE Std 802.11ax/D3.0)
Ext Tag: MU EDCA Parameter Set
Ext Tag: HE 6GHz Band Capabilities
Ext Tag: OWE Diffie-Hellman Parameter
Tag Number: Element ID Extension (255)
Ext Tag length: 51
Ext Tag Number: OWE Diffie-Hellman Parameter (32)
Group: 384-bit random ECP group (20)
Public Key: 14ba9d8abeb2ecd5d95e6c12491b16489d1bcc303e7a7fbd…
Tag: Vendor Specific: Broadcom
Tag: Vendor Specific: Microsoft Corp.: WMM/WME: Parameter Element
Reported-By: Wen Gong <quic_wgong@quicinc.com>
Tested-By: Wen Gong <quic_wgong@quicinc.com>
Handling these events notifies hwsim of address changes for interface
creation/removal outside the initial namespace as well as address
changes due to scanning address randomization.
Interfaces that hwsim already knows about are still handled via
nl80211. But any interfaces not known when ADD/DEL_MAC_ADDR events
come will be treated specially.
For ADD, a dummy interface object will be created and added to the
queue. This lets the frame processing match the destination address
correctly. This can happen both for scan randomization and interface
creation outside of the initial namespace.
For the DEL event we handle similarly and don't touch any interfaces
found via nl80211 (i.e. have a 'name') but need to also be careful
with the dummy interfaces that were created outside the initial
namespace. We want to keep these around but scanning MAC changes can
also delete them. This is why a reference count was added so scanning
doesn't cause a removal.
For example, the following sequence:
ADD_MAC_ADDR (interface creation)
ADD_MAC_ADDR (scanning started)
DEL_MAC_ADDR (scanning done)
Hostapd commit b6d3fd05e3 changed the PMKID derivation in accordance
with 802.11-2020 which then breaks PMKID validation in IWD. This
breaks the FT-8021x AKM in IWD if the AP uses this hostapd version
since the PMKID doesn't validate during EAPoL.
This updates the PMKID derivation to use the correct SHA hash for
this AKM and adds SHA1 based PMKID checking for interoperability
with older hostapd versions.
The PMKID derivation has gotten messy due to the spec
updating/clarifying the hash size for the FT-8021X AKM. This
has led to hostapd updating the derivation which leaves older
hostapd versions using SHA1 and newer versions using SHA256.
To support this the checksum type is being fed to
handshake_state_get_pmkid so the caller can decide what sha to
use. In addition handshake_state_pmkid_matches is being added
which uses get_pmkid() but handles sorting out the hash type
automatically.
This lets preauthentication use handshake_state_get_pmkid where
there is the potential that a new PMKID is derived and eapol
can use handshake_state_pmkid_matches which only derives the
PMKID to compare against the peers.
The existing API was limited to SHA1 or SHA256 and assumed a key
length of 32 bytes. Since other AKMs plan to be added update
this to take the checksum/length directly for better flexibility.
This is consistent with the over-Air path, and makes it clear when
reading the logs if over-DS was used, if there was a response frame,
and if the frame failed to parse in some way.
The parsing code was breaking out of the loop on the first comment
which is incorrect and causes only part of the file to be parsed.
Its odd this hasn't popped up until now but its likely due to
differing dhcpd versions, some which add comments and others that
do not.
FILS rekeys were fixed in hostapd somewhat recently but older
versions will fail this test. Document that so we don't get
confused when running tests against older hostapd versions.
Disable power save if the wiphy indicates its needed. Do this
before issuing GET_LINK so the netdev doesn't signal its up until
power save is disabled.
This allows generating code and test coverage reports using lcov &
genhtml. Useful for understanding how much of the codebase is currently
covered by unit and autotests.
Certain drivers do not handle power save very well resulting in
missed frames, firmware crashes, or other bad behavior. Its easy
enough to disable power save via iw, iwconfig, etc but since IWD
removes and creates the interface on startup it blows away any
previous power save setting. The setting must be done *after* IWD
creates the interface which can be done, but needs to be via some
external daemon monitoring IWD's state. For minimal systems,
e.g. without NetworkManager, it becomes difficult and annoying to
persistently disable power save.
For this reason a new driver flag POWER_SAVE_DISABLE is being
added. This can then be referenced when creating the interfaces
and if set, disable power save.
The driver_infos list in wiphy.c is hard coded and, naturally,
not configurable from a user perspective. As drivers are updated
or added users may be left with their system being broken until the
driver is added, IWD released, and packaged.
This adds the ability to define driver flags inside main.conf under
the "DriverQuirks" group. Keys in this group correspond to values in
enum driver_flag and values are a list of glob matches for specific
drivers:
[DriverQuirks]
DefaultInterface=rtl81*,rtl87*,rtl88*,rtw_*,brcmfmac,bcmsdh_sdmmc
ForcePae=buggy_pae_*
Rather than keep a pointer to the driver_info entry copy the flags
into the wiphy object. This preps for supporting driver flags via
a configuration file, specifically allowing for entries that are a
subset of others. For example:
{ "rtl88*", DEFAULT_IF },
{ "rtl88x2bu", FORCE_PAE },
Before it was not possible to add entires like this since only the
last entry match would get set. Now DEFAULT_IF would get set to all
matches, and FORCE_PAE to only rtl88x2bu. This isn't especially
important for the static list since it could be modified to work
correctly, but will be needed when parsing flags from a
configuration file that may contain duplicates or subsets of the
static list.
If there was some problem during the FT authenticate stage
its nice to know more of what happened: whether the AP didn't
respond, rejected the attempt, or sent an invalid frame/IEs.
In some situations its convenient for the same work item to be
inserted (rescheduled) while its in progress. FT for example does
this now if a roam fails. The same ft_work item gets re-inserted
which, currently, is not safe to do since the item is modified
and removed once completed.
Fix this by introducing wiphy_radio_work_reschedule which is an
explicit API for re-inserting work items from within the do_work
callback.
The wiphy work logic was changed around slightly to remove the item
at the head of the queue prior to starting and note the ID going
into do_work. If do_work signaled done and ID changed we know it
was re-inserted and can skip the destroy logic and move onto the
next item. If the item is not done continue as normal but set the
priority to INT_MIN, as usual, to prevent other items from getting
to the head of the queue.
This adds another radio so IWD hits the FT failure path after
authentication to the first BSS fails. This causes a wiphy work
item to be rescheduled which previously was unsafe.
If IWD connects under bad RF conditions and netconfig takes
a while to complete (e.g. slow DHCP), the roam timeout
could fire before DHCP is done. Then, after the roam,
IWD would transition automatically to connected before
DHCP was finished. In theory DHCP could still complete after
this point but any process depending on IWD's connected
state would be uninformed and assume IP networking is up.
Fix this by stopping netconfig prior to a roam if IWD is not
in a connected state. Then, once the roam either failed or
succeeded, start netconfig again.
iwctl quit (running quit non-interactively) isn't a useful command,
but it shouldn't segfault. Let's avoid calling readline functions if
we haven't initialized readline in this run.
When acting as a configurator the enrollee can start on a different
channel than IWD is connected to. IWD will begin the auth process
on this channel but tell the enrollee to transition to the current
channel after the auth request. Since a configurator must be
connected (a requirement IWD enforces) we can assume a channel
transition will always be to the currently connected channel. This
allows us to simply cancel the offchannel request and wait for a
response (rather than start another offchannel).
Doing this improves the DPP performance and reduces the potential
for a lost frame during the channel transition.
This patch also addresses the comment that we should wait for the
auth request ACK before canceling the offchannel. Now a flag is
set and IWD will cancel the offchannel once the ACK is received.
If IWD gets a disconnect during FT the roaming state will be
cleared, as well as any ft_info's during ft_clear_authentications.
This includes canceling the offchannel operation which also
destroys any pending ft_info's if !info->parsed. This causes a
double free afterwards. In addition the l_queue_remove inside the
foreach callback is not a safe operation either.
To fix this don't remove the ft_info inside the offchannel
destroy callback. The info will get freed by ft_associate regardless
of the outcome (parsed or !parsed). This is also consistent with
how the onchannel logic works.
Log and crash backtrace below:
iwd[488]: src/station.c:station_try_next_transition() 5, target aa:46:8d:37:7c:87
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16668
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16669
iwd[488]: src/wiphy.c:wiphy_radio_work_done() Work item 16667 done
iwd[488]: src/wiphy.c:wiphy_radio_work_next() Starting work item 16668
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
iwd[488]: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
iwd[488]: src/netdev.c:netdev_deauthenticate_event()
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
iwd[488]: src/netdev.c:netdev_disconnect_event()
iwd[488]: Received Deauthentication event, reason: 6, from_ap: true
iwd[488]: src/station.c:station_disconnect_event() 5
iwd[488]: src/station.c:station_disassociated() 5
iwd[488]: src/station.c:station_reset_connection_state() 5
iwd[488]: src/station.c:station_roam_state_clear() 5
iwd[488]: double free or corruption (fasttop)
5 0x0000555b3dbf44a4 in ft_info_destroy ()
6 0x0000555b3dbf45b3 in remove_ifindex ()
7 0x0000555b3dc4653c in l_queue_foreach_remove ()
8 0x0000555b3dbd0dd1 in station_reset_connection_state ()
9 0x0000555b3dbd37e5 in station_disassociated ()
10 0x0000555b3dbc8bb8 in netdev_mlme_notify ()
11 0x0000555b3dc4e80b in received_data ()
12 0x0000555b3dc4b430 in io_callback ()
13 0x0000555b3dc4a5ed in l_main_iterate ()
14 0x0000555b3dc4a6bc in l_main_run ()
15 0x0000555b3dc4a8e0 in l_main_run_with_signal ()
16 0x0000555b3dbbe888 in main ()