If the FT-Authenticate frame has been sent and a deauth is then
received, the work item for sending the FT-Associate frame is never
canceled. When this work item runs, station->connected_network is
NULL, which causes a crash:
src/station.c:station_try_next_transition() 7, target xx:xx:xx:xx:xx:xx
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5843
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5844
src/wiphy.c:wiphy_radio_work_done() Work item 5842 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5843
src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
src/ft.c:ft_send_authenticate()
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/netdev.c:netdev_link_notify() event 16 on ifindex 7
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_deauthenticate_event()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
Received Deauthentication event, reason: 7, from_ap: true
src/station.c:station_disconnect_event() 7
src/station.c:station_disassociated() 7
src/station.c:station_reset_connection_state() 7
src/station.c:station_roam_state_clear() 7
src/netconfig.c:netconfig_event_handler() l_netconfig event 2
src/netconfig-commit.c:netconfig_commit_print_addrs() removing address: yyy.yyy.yyy.yyy
src/resolve.c:resolve_systemd_revert() ifindex: 7
[DHCPv4] l_dhcp_client_stop:1264 Entering state: DHCP_STATE_INIT
src/station.c:station_enter_state() Old State: connected, new state: disconnected
src/station.c:station_enter_state() Old State: disconnected, new state: autoconnect_quick
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5845
src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
src/wiphy.c:wiphy_radio_work_done() Work item 5843 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5844
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000565359ee3f54 in network_bss_find_by_addr ()
#1 0x0000565359ec9d23 in station_ft_work_ready ()
#2 0x0000565359ec0af0 in wiphy_radio_work_next ()
#3 0x0000565359f20080 in offchannel_mlme_notify ()
#4 0x0000565359f4416b in received_data ()
#5 0x0000565359f40d90 in io_callback ()
#6 0x0000565359f3ff4d in l_main_iterate ()
#7 0x0000565359f4001c in l_main_run ()
#8 0x0000565359f40240 in l_main_run_with_signal ()
#9 0x0000565359eb3888 in main ()
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
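A minimal sketch of the safe pattern, assuming the SSID is kept as raw
bytes plus a length. struct hs_ssid and ssid_to_string() are
hypothetical stand-ins for illustration, not the actual iwd code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SSID_MAX_LEN 32

/* Hypothetical stand-in for the SSID fields of handshake_state */
struct hs_ssid {
	uint8_t ssid[SSID_MAX_LEN];	/* raw bytes, not nul-terminated */
	size_t ssid_len;
};

/*
 * Copy the SSID into a buffer that has room for a terminator, so the
 * result can be used with string functions. Returns the SSID length,
 * or 0 if the stored length is invalid.
 */
static size_t ssid_to_string(const struct hs_ssid *hs,
				char out[SSID_MAX_LEN + 1])
{
	if (hs->ssid_len > SSID_MAX_LEN)
		return 0;

	memcpy(out, hs->ssid, hs->ssid_len);
	out[hs->ssid_len] = '\0';

	return hs->ssid_len;
}
```

With a full 32 byte SSID the terminator lands in the 33rd byte of the
output buffer rather than past the end of the 32 byte field.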
Fixes: d938d362b2 ("erp: ERP implementation and key cache move")
Fixes: 433373fe28 ("eapol: cache ERP keys on EAP success")
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: 1f14782857 ("wiphy: add _generate_address_from_ssid")
Fixes: 5a1b1184fc ("netdev: support per-network MAC addresses")
In netdev_retry_owe, if l_genl_family_send fails, the connect_cmd is
never freed or reset. Fix that.
While here, use a stack variable instead of a netdev member, since the
use of such a member is unnecessary and confusing.
vendor_ies stored in handshake_state are already added as part of
netdev_populate_common_ies(), which is already invoked by
netdev_build_cmd_connect().
Normally vendor_ies is NULL for OWE connections, so no IEs are
duplicated as a result.
CC src/adhoc.o
In file included from src/adhoc.c:28:0:
/usr/include/linux/if.h:234:19: error: field ‘ifru_addr’ has incomplete type
struct sockaddr ifru_addr;
^
/usr/include/linux/if.h:235:19: error: field ‘ifru_dstaddr’ has incomplete type
struct sockaddr ifru_dstaddr;
^
/usr/include/linux/if.h:236:19: error: field ‘ifru_broadaddr’ has incomplete type
struct sockaddr ifru_broadaddr;
^
/usr/include/linux/if.h:237:19: error: field ‘ifru_netmask’ has incomplete type
struct sockaddr ifru_netmask;
^
/usr/include/linux/if.h:238:20: error: field ‘ifru_hwaddr’ has incomplete type
struct sockaddr ifru_hwaddr;
^
Very rarely on ath10k (potentially other ath cards), disabling
power save while the interface is down causes a timeout when
bringing the interface back up. This seems to be a race in the
driver or firmware but it causes IWD to never start up properly
since there is no retry logic on that path.
Retrying is an option, but a more straightforward approach is
to just reorder the logic to set power save off after the
interface is already up. If the power save setting fails we can
just log it, ignore the failure, and continue. From a user's point
of view there is no real difference in doing it this way as
PS still gets disabled prior to IWD connecting/sending data.
Changing behavior based on a buggy driver isn't something we
should be doing, but in this instance the change shouldn't have
any downside and actually isn't any different than how it has
been done prior to the driver quirks change (i.e. use network
manager, iw, or iwconfig to set power save after IWD starts).
For reference, this problem is quite rare. It is difficult to say
exactly how often it occurs, but certainly <1% of the time:
iwd[1286641]: src/netdev.c:netdev_disable_ps_cb() Disabled power save for ifindex 54
kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
iwd[1286641]: Error bringing interface 54 up: Connection timed out
kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
After this IWD just sits idle as it has no interface to start using.
This is even reproducible outside of IWD if you loop and run:
ip link set <wlan> down
iw dev <wlan> set power_save off
ip link set <wlan> up
Eventually the 'up' command will fail with a timeout.
I've brought this to the linux-wireless/ath10k mailing list but
even if it's fixed in future kernels we'd still need to support
older kernels, so a workaround/change in IWD is still required.
This is already done for DPP; do the same for PKEX. Few drivers
(ath9k upstream, ath10k/11k in progress) support this, which is
unfortunate, but since a configurator will not work without this
capability it's best to fail early.
The DPP spec allows 3rd party fields in the DPP configuration
object (section 4.5.2). IWD can take advantage of this (when
configuring another IWD supplicant) to communicate additional
profile options that may be required for the network.
The new configuration member will be called "/net/connman/iwd"
and will be an object containing settings specific to IWD.
More settings could be added here if needed but for now only
the following are defined:
{
send_hostname: true/false,
hidden: true/false
}
These correspond to the following network profile settings:
[IPv4].SendHostname
[Settings].Hidden
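As a rough illustration, the member described above could be
serialized as follows. format_iwd_settings() is a hypothetical helper
and the exact JSON layout is an assumption based on the description
here, not the code in this patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the IWD-specific member of a DPP
 * configuration object. Returns the number of bytes written (not
 * counting the terminator), or -1 if the buffer is too small.
 */
static int format_iwd_settings(char *buf, size_t len,
				bool send_hostname, bool hidden)
{
	int n = snprintf(buf, len,
			"\"/net/connman/iwd\":{\"send_hostname\":%s,"
			"\"hidden\":%s}",
			send_hostname ? "true" : "false",
			hidden ? "true" : "false");

	/* snprintf reports the untruncated length, so detect truncation */
	if (n < 0 || (size_t) n >= len)
		return -1;

	return n;
}
```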
The scan result handling was fragile because it assumed the kernel
would only give results matching the requested SSID. This isn't
something we should assume so instead keep the configuration object
around until after the scan and use the target SSID to lookup the
network.
Nearly every use of the ssid member first has to memcpy it to a
buffer and nul-terminate it. Instead just store the ssid as a
string when creating/parsing from JSON.
The DPP-PKEX spec provides a very limited list of frequencies used
to discover configurators, only 3 on 2.4 and 5GHz bands. Since
configurators (at least in IWD's implementation) are only allowed
on the current operating frequency it's very unlikely an enrollee
will find a configurator on these frequencies out of the entire
spectrum.
The spec does mention that the 3 default frequencies should be used
"In lieu of specific channel information obtained in a manner outside
the scope of this specification, ...". This allows the implementation
some flexibility in using a broader range of frequencies.
To increase the chances of finding a configurator, shared code
enrollees will first issue a scan to determine what access points are
around, then iterate over the frequencies of those access points. This
is especially helpful when the configurators are IWD-based since we
know that they'll be on the same channels as the APs in the area.
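The idea of building the frequency list from scan results can be
sketched as below. This is only an illustration: freq_list_add() and
collect_pkex_freqs() are hypothetical names, and the real code works
on scan results rather than a plain array of frequencies:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Append freq to list unless it is already present or the list is
 * full. Returns the new number of entries.
 */
static size_t freq_list_add(uint32_t *list, size_t count, size_t max,
				uint32_t freq)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (list[i] == freq)
			return count;

	if (count >= max)
		return count;

	list[count] = freq;

	return count + 1;
}

/* Collect the unique frequencies (in MHz) seen in the scan results */
static size_t collect_pkex_freqs(const uint32_t *bss_freqs, size_t n_bss,
					uint32_t *out, size_t max)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < n_bss; i++)
		count = freq_list_add(out, count, max, bss_freqs[i]);

	return count;
}
```

The enrollee would then iterate over the resulting list when sending
PKEX exchange requests.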
The post-DPP connection was never done quite right due to station's
state being unknown. The state is now tracked in DPP by a previous
patch but the scan path in DPP is still wrong.
It relies on station autoconnect logic which has the potential to
connect to a different network than what was configured with DPP.
It's unlikely but still could happen in theory. In addition the scan
was not selectively filtering results by the SSID that DPP
configured.
This fixes the above problems by first filtering the scan by the
SSID, then setting the scan results into station without triggering
autoconnect, and finally using network_autoconnect() directly
instead of relying on station to choose the SSID.
DPP (both DPP and PKEX) runs the risk of odd behavior if station
decides to change state. DPP is completely unaware of this. In the
best case this would just result in a protocol failure, in the worst
case duplicate calls to __station_connect_network.
Add a station watch and stop DPP if station changes state during
the protocol.
Commit c59669a366 ("netdev: disambiguate between disconnection types")
introduced different paths for different types of disconnection
notifications from netdev. Formalize this further by having
netdev_connect_failed only invoke connect_cb.
Disconnections that could be triggered outside of connection
related events are now handled on a different code path. For this
purpose, netdev_disconnected() is introduced.
When a roam event is received, iwd generates a firmware scan request and
notifies its event filter of the ROAMING condition. In cases where the
firmware scan could not be started successfully, netdev_connect_failed
is invoked. This is not a correct use of netdev_connect_failed since it
doesn't actually disconnect the underlying netdev and the reflected
state becomes de-synchronized from the underlying kernel device.
The firmware scan request could currently fail for two reasons:
1. nl80211 genl socket is in a bad state, or
2. the scan context does not exist
Since both reasons are highly unlikely, simply use L_WARN instead.
The other two cases where netdev_connect_failed is used could only occur
if the kernel message is invalid. The message is ignored in that case
and a warning is printed.
The situation described above also exists in netdev_get_fw_scan_cb. If
the scan could not be completed successfully, there's not much iwd can
do to recover. Have iwd remain in roaming state and print an error.
There are generally three scenarios where iwd generates a disconnection
command to the kernel:
1. Error conditions stemming from a connection related event. For
example if SAE/FT/FILS authentication fails during Authenticate or
Associate steps and the kernel doesn't disconnect properly.
2. Deauthentication after the connection has been established and not
related to a connection attempt in progress. For example, SA Query
processing that triggers a disconnect.
3. Disconnects that are triggered due to a handshake failure or if
setting keys resulting from the handshake fails. These disconnects
can be triggered as a result of a pending connection or when a
connection has been established (e.g. due to rekeying).
Distinguish between 1 and 2/3 by having the disconnect procedure take
different paths. For now there are no functional changes since all
paths end up in netdev_connect_failed(), but this will change in the
future.
While here, also get rid of netdev_del_station. The only user of this
function was in ap.c and it could easily be replaced by invoking the new
nl80211_build_del_station function. The callback used by
netdev_build_del_station only printed an error and didn't do anything
useful. Get rid of it for now.
netdev_begin_connection() already invokes netdev_connect_failed on
error. Remove any calls to netdev_connect_failed in callers of
netdev_begin_connection().
Fixes: 4165d9414f ("netdev: use wiphy radio work queue for connections")
If netdev_get_oci fails, a goto deauth is invoked in order to terminate
the current connection and return an error to the caller. Unfortunately
the deauth label builds CMD_DEAUTHENTICATE in order to terminate the
connection. This was fine because it used to handle authentication
protocols that ran over CMD_AUTHENTICATE and CMD_ASSOCIATE. However,
OCI can also be used on FullMAC hardware that does not support them.
Use CMD_DISCONNECT instead which works everywhere.
Fixes: 06482b8116 ("netdev: Obtain operating channel info")
The reason code field was being obtained as a uint8_t value, while it is
actually a uint16_t in little-endian byte order.
Fixes: f3cc96499c ("netdev: added support for SA Query")
The reason code from deauthentication frame was being obtained as a
uint8_t instead of a uint16_t. The value was only ever used in an
informational statement. Since the value was in little endian, only the
first 8 bits of the reason code were obtained. Fix that.
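For illustration, reading the full 16-bit value could look like the
sketch below. parse_reason_code() is a hypothetical helper, not the
function used in netdev.c, and it assumes body points at the start of
the frame body where the reason code is located:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read the 16-bit little-endian reason code from the start of a
 * Deauthentication frame body. Returns false if the body is too short.
 */
static bool parse_reason_code(const uint8_t *body, size_t len,
				uint16_t *out)
{
	if (len < 2)
		return false;

	/* Low byte first, then the high byte shifted into place */
	*out = (uint16_t) (body[0] | (body[1] << 8));

	return true;
}
```

Reading only body[0] gives the right answer for small reason codes,
which is why the problem went unnoticed in an informational print.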
Fixes: 2bebb4bdc7 ("netdev: Handle deauth frames prior to association")
Several tests do not pass due to some additional changes that have
not been merged. Remove these cases and add some hardening after
discovering some unfortunate wpa_supplicant behavior.
- Disable p2p in wpa_supplicant. With p2p enabled an extra device
is created which starts receiving DPP frames and printing
confusing messages.
- Remove extra asserts which don't make sense currently. These
will be added back later as future additions to PKEX are
upstreamed.
- Work around wpa_supplicant retransmit limitation. This is
described in detail in the comment in pkex_test.py
- wait_for_event was returning a list in certain cases, not the
event itself
- The configurator ID was not being printed (',' instead of '%')
- The DPP ID was not being properly waited for with PKEX
With the addition of DPP PKEX autotests some of the timeouts are
quite long and hit the test-runner's maximum timeout. For UML we should
allow this since time-travel lets us skip idle waits. Move the test
timeout out of a global define and into the argument list so QEMU
and UML can define it differently.
The StartConfigurator() call was left out since there would be no
functional difference to the user in iwctl. It's expected that
human users of the shared code API provide the code/id ahead of
time, i.e. use ConfigureEnrollee/StartEnrollee.
Check that enough space for newline and 0-byte is left in line.
This fixes a buffer overflow on specific completion results.
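A minimal sketch of this kind of bounds check, assuming a fixed-size,
nul-terminated line buffer. line_append_entry() is a hypothetical
helper and not the function touched by this patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Append entry plus a newline to line, a nul-terminated string held
 * in a buffer of size bytes. Nothing is written unless the entry, the
 * newline and the terminating 0-byte all fit.
 */
static bool line_append_entry(char *line, size_t size, const char *entry)
{
	size_t used = strlen(line);
	size_t entry_len = strlen(entry);
	size_t need = entry_len + 2;	/* entry + '\n' + '\0' */

	if (used >= size || need > size - used)
		return false;

	memcpy(line + used, entry, entry_len);
	line[used + entry_len] = '\n';
	line[used + entry_len + 1] = '\0';

	return true;
}
```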
Reported-By: Leona Maroni <dev@leona.is>