Any auth proto which did not implement the assoc_timeout handler
could end up getting 'stuck' forever if there was an associate
timeout. This is because in the event of an associate timeout IWD
only sets a few flags and relies on the connect event to actually
handle the failure. The problem is a connect event never comes
if the failure was a timeout.
To fix this we can explicitly fail the connection if the auth
proto has not implemented assoc_timeout or if it returns false.
SAE offload support requires some minor tweaks to CMD_CONNECT
as well as special checks once the connect event comes in. Since
at this point we are fully connected.
Roaming on a full mac card is quite different than soft mac
and needs to be specially handled. The process starts with
the CMD_ROAM event, which tells us the driver is already
roamed and associated with a new AP. After this it expects
the 4-way handshake to be initiated. This in itself is quite
simple, the complexity comes with how this is piped into IWD.
After CMD_ROAM fires its assumed that a scan result is
available in the kernel, which is obtained using a newly
added scan API scan_get_firmware_scan. The only special
bit of this is that it does not 'schedule' a scan but simply
calls GET_SCAN. This is treated special and will not be
queued behind any other pending scan requests. This lets us
reuse some parsing code paths in scan and initialize a
scan_bss object which ultimately gets handed to station so
it can update connected_bss/bss_list.
For consistency station must also transition to a roaming state.
Since this roam is all handled by netdev two new events were
added, NETDEV_EVENT_ROAMING and NETDEV_EVENT_ROAMED. Both allow
station to transition between roaming/connected states, and ROAMED
provides station with the new scan_bss to replace connected_bss.
Since GET_STATION (and in turn GetDiagnostics) gets the most
current station info this attribute serves as a better indication
of the current signal strength. In addition full mac cards don't
appear to always have the average attribute.
If the extended feature for CQM levels was not supported no CQM
registration would happen, not even for a single level. This
caused IWD to completely lose the ability to roam since it would
only get notified when the kernel was disconnecting, around -90
dBm, not giving IWD enough time to roam.
Instead if the extended feature is not supported we can still
register for the event, just without multiple signal levels.
Handle situations where the BSS we're trying to connect to is no longer
in the kernel scan result cache. Normally, the kernel will re-scan the
target frequency if this happens on the CMD_CONNECT path, and retry the
connection.
Unfortunately, CMD_AUTHENTICATE path used for WPA3, OWE and FILS does
not have this scanning behavior. CMD_AUTHENTICATE simply fails with
a -ENOENT error. Work around this by trying a limited scan of the
target frequency and re-trying CMD_AUTHENTICATE once.
In some cases the AP can send a deauthenticate frame right after
accepting our authentication. In this case the kernel never properly
sends a CMD_CONNECT event with a failure, even though CMD_COONNECT was
used to initiate the connection. Try to work around that by detecting
that a Deauthenticate event arrives prior to any Associte or Connect
events and handle this case as a connect failure.
netdev_shutdown calls queue_destroy on the netdev_list, which in turn
calls netdev_free. netdev_free invokes the watches to notify them about
the netdev being removed. Those clients, or anything downstream can
still invoke netdev_find. Unfortunately queue_destroy is not re-entrant
safe, so netdev_find might return stale data. Fix that by using
l_queue_peek_head / l_queue_pop_head instead.
src/station.c:station_enter_state() Old State: connecting, new state:
connected
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan1[6]
src/device.c:device_free()
Removing scan context for wdev 100000001
src/scan.c:scan_context_free() sc: 0x4ae9ca0
src/netdev.c:netdev_free() Freeing netdev wlan0[48]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
==103174== Invalid read of size 8
==103174== at 0x467AA9: l_queue_find (queue.c:346)
==103174== by 0x43ACFF: netconfig_reset (netconfig.c:1027)
==103174== by 0x43AFFC: netconfig_destroy (netconfig.c:1123)
==103174== by 0x414379: station_free (station.c:3369)
==103174== by 0x414379: station_destroy_interface (station.c:3466)
==103174== by 0x47C80C: interface_instance_free (dbus-service.c:510)
==103174== by 0x47C80C: _dbus_object_tree_remove_interface
(dbus-service.c:1694)
==103174== by 0x47C99C: _dbus_object_tree_object_destroy
(dbus-service.c:795)
==103174== by 0x409A87: netdev_free (netdev.c:770)
==103174== by 0x4677AE: l_queue_clear (queue.c:107)
==103174== by 0x4677F8: l_queue_destroy (queue.c:82)
==103174== by 0x40CDC1: netdev_shutdown (netdev.c:5089)
==103174== by 0x404736: iwd_shutdown (main.c:78)
==103174== by 0x404736: iwd_shutdown (main.c:65)
==103174== by 0x46BD61: handle_callback (signal.c:78)
==103174== by 0x46BD61: signalfd_read_cb (signal.c:104)
In the case of module_init failing due to a module that comes after
netdev, the netdev module doesn't clean up netdev_list properly.
==6254== 24 bytes in 1 blocks are still reachable in loss record 1 of 1
==6254== at 0x483777F: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6254== by 0x4675ED: l_malloc (util.c:61)
==6254== by 0x46909D: l_queue_new (queue.c:63)
==6254== by 0x406AE4: netdev_init (netdev.c:5038)
==6254== by 0x44A7B3: iwd_modules_init (module.c:152)
==6254== by 0x404713: nl80211_appeared (main.c:171)
==6254== by 0x4713DE: process_unicast (genl.c:993)
==6254== by 0x4713DE: received_data (genl.c:1101)
==6254== by 0x46E00B: io_callback (io.c:118)
==6254== by 0x46D20C: l_main_iterate (main.c:477)
==6254== by 0x46D2DB: l_main_run (main.c:524)
==6254== by 0x46D2DB: l_main_run (main.c:506)
==6254== by 0x46D502: l_main_run_with_signal (main.c:656)
==6254== by 0x403EDB: main (main.c:490)
Fix an issue with the recent changes to signal monitoring from commit
f456501b ("station: retry roaming unless notified of a high RSSI"):
1. driver sends NL80211_CQM_RSSI_THRESHOLD_EVENT_LOW
2. netdev->cur_rssi_low changes from FALSE to TRUE
3. netdev sends NETDEV_EVENT_RSSI_THRESHOLD_LOW to station
4. on roam reassociation, cur_rssi_low is reset to FALSE
5. station still assumes RSSI is low, periodically roams
until netdev sends NETDEV_EVENT_RSSI_THRESHOLD_HIGH
6. driver sends NL80211_CQM_RSSI_THRESHOLD_EVENT_HIGH
7. netdev->cur_rssi_low doesn't change (still FALSE)
8. netdev never sends NETDEV_EVENT_RSSI_THRESHOLD_HIGH
9. station remains stuck in an infinite roaming loop
The commit in question introduced the logic in (5). Previously the
assumption in station was - like in netdev - that if the signal was
still low, the driver would send a duplicate LOW event after
reassociation. This change makes netdev follow the same new logic as
station, i.e. assume the same signal state (LOW/HIGH) until told
otherwise by the driver.
With AP now getting its own diagnostic interface it made sense
to move the netdev_station_info struct definition into its own
header which eventually can be accompanied by utilities in
diagnostic.c. These utilities can then be shared with AP and
station as needed.
This is a nl80211 dump version of netdev_get_station aimed at
AP mode. This will dump all stations, parse into
netdev_station_info structs, and call the callback for each
individual station found. Once the dump is completed the destroy
callback is called.
This adds a generalized API for GET_STATION. This API handles
calling and parsing the results into a new structure,
netdev_station_info. This results structure will hold any
data needed by consumers of netdev_get_station. A helper API
(netdev_get_current_station) was added as a convenience which
automatically passes handshake->aa as the MAC.
For now only the RSSI is parsed as this is already being
done for RSSI polling/events. Looking further more info will
be added such as rx/tx rates and estimated throughput.
In both FT or FILS EAPoL isn't used for the initial handshake and only
for the later re-keys. For FT we added the
eapol_sm_set_require_handshake mechanism to tell EAPoL to not require
the initial handshake and we can re-use it for FILS.
build_cmd_ft_authenticate and build_cmd_authenticate were virtually
identical. These have been unified into a single builder.
We were also incorrectly including ATTR_IE to every authenticate
command, which violates the spec for certain protocols, This was
removed and any auth protocols will now add any IEs that they require.
In this situation the kernel is sending a low RSSI event which netdev
picks up, but since we set netdev->connected so early the event is
forwarded to station before IWD has fully connected. Station then
tries to get a neighbor report, which may fail and cause a known
frequency scan. If this is a new network the frequency scan tries to
get any known frequencies in network_info which will be unset and
cause a segfault.
This can be avoided by only sending RSSI events when netdev->operational
is set rather than netdev->connected.
It seems some APs send the IGTK key in big endian format (it is a
uin16). The kernel rightly reports an -EINVAL error when iwd issues a
NEW_KEY with such a value, resulting in the connection being aborted.
Work around this by trying to detect big-endian key indexes and 'fixing'
them up.
In order to support AlwaysRandomizeAddress and AddressOverride, station will
set the desired address into the handshake object. Then, netdev checks if
this was done and will use that address rather than generate one.
For privacy reasons its advantageous to randomize or mask
the MAC address when connecting to networks, especially public
networks.
This patch allows netdev to generate a new MAC address on a
per-network basis. The generated MAC will remain the same when
connecting to the same network. This allows reauthentications
or roaming to work, and not have to fully re-connect (which would
be required if the MAC changed on every connection).
Changing the MAC requires bringing the interface down. This does
lead to potential race conditions with respect to external
processes. There are two potential conditions which are explained
in a TODO comment in this patch.
This API was updated to take an extra boolean which will
automatically power up the device while changing the MAC
address. Since this is what IWD does anyways we can avoid
the need for an intermediate callback and go right into
netdev_initial_up_cb.
mac80211 drivers seem to send the disconnect event which is triggered by
CMD_DISCONNECT prior to the CMD_DISCONNECT response. However, some
drivers, namely brcmfmac, send the response first and then send the
disconnect event. This confused iwd when a connection was immediately
triggered after a disconnection (network switch operation).
Fix this by making sure that connected variable isn't set until the
connect event is actually processed, and ignore disconnect events which
come after CMD_DISCONNECT has alredy succeeded.
Allow netdev_create_from_genl callers to draw a random or non-random MAC
and pass it in the parameter instead of a bool to tell us to generating
the MAC locally. In P2P we are generating the MAC some time before
creating the netdev in order to pass it to the peer during negotiation.
It seems that the kernel uses -EOPNOTSUPP if the change_station
operation is not implemented by the driver. However, some drivers do
implement change_station and choose to report -ENOTSUPP instead of
-EOPNOTSUPP.
To add to the confusion, EOPNOTSUPP and -ENOTSUPP are the same on some
systems (e.g. Gentoo). Be paranoid and allow both errors to be ignored
when sending CMD_SET_STATION.
Fixes: 0238ffb8d9 ("netdev: Use -EOPNOTSUPP instead of -ENOTSUPP")
The kernel uses -EOPNOTSUPP in the case of change_station operation not
being provided. On most systems -EOPNOTSUPP is defined to be the same
as -ENOTSUPP, but seemingly not all systems.
Commit 1057d8aa74 changed the device interface creation logic
from being unconditional inside netdev.c to instead use NETDEV_WATCH_*
events. However, this broke the assumption that the device interface
was created before all others. The effect is that the scan_wdev_add
might no longer be called prior to station interface being created. Fix
this by moving scan_wdev_add/remove calls to netdev.c instead.
Fixes: 1057d8aa74 ("device: Move device creation from netdev.c to event watch")
Create and destroy the device state struct and the DBus interfaces in a
way more similar to the Station, AdHoc and AP interfaces. Drop
netdev_get_device() and the device specific code in netdev that as far
as I can tell wasn't needed.