Commit 1fe5070 added a workaround for drivers which may send the
connect event prior to the connect callback/ack. This caused IWD
to fail to start eapol if reassociation was used due to
netdev_reassociate never setting netdev->connected = false.
netdev_reassociate uses the same code path as normal connections,
but when the connect callback came in connected was already set
to true which then prevents eapol from being registered. Then,
once the connect event comes in, there is no frame watch for
eapol and IWD doesn't respond to any handshake frames.
Prior to this, an error sending the FT Reassociation was treated
as fatal, which is correct for FT-over-Air but not for FT-over-DS.
If the actual l_genl_family_send call fails for FT-over-DS the
existing connection can be maintained and there is no need to
call netdev_connect_failed.
Adding a return to the tx_associate function works for both FT
types. In the FT-over-Air case this return will ultimately get
sent back up to auth_proto_rx_authenticate in which case will
call netdev_connect_failed. For FT-over-DS tx_associate is
actually called from the 'start' operation which can fail and
still maintain the existing connection.
FT-over-DS followed the same pattern as FT-over-Air which worked,
but really limited how the protocol could be used. FT-over-DS is
unique in that we can authenticate to many APs by sending out
FT action frames and parsing the results. Once parsed IWD can
immediately Reassociate, or do so at a later time.
To take advantage of this IWD need to separate FT-over-DS into
two stages: action frame and reassociation.
The initial action frame stage is started by netdev. The target
BSS is sent an FT action frame and a new cache entry is created
in ft.c. Once the response is received the entry is updated
with all the needed data to Reassociate. To limit the record
keeping on netdev each FT-over-DS entry holds a userdata pointer
so netdev doesn't need to maintain its own list of data for
callbacks.
Once the action response is parsed netdev will call back signalling
the action frame sequence was completed (either successfully or not).
At this point the 'normal' FT procedure can start using the
FT-over-DS auth-proto.
The info struct is on the stack which leads to the potential
for uninitialized data access. Zero out the info struct prior
to calling the get station callback:
==141137== Conditional jump or move depends on uninitialised value(s)
==141137== at 0x458A6F: diagnostic_info_to_dict (diagnostic.c:109)
==141137== by 0x41200B: station_get_diagnostic_cb (station.c:3620)
==141137== by 0x405BE1: netdev_get_station_cb (netdev.c:4783)
==141137== by 0x4722F9: process_unicast (genl.c:994)
==141137== by 0x4722F9: received_data (genl.c:1102)
==141137== by 0x46F28B: io_callback (io.c:120)
==141137== by 0x46E5AC: l_main_iterate (main.c:478)
==141137== by 0x46E65B: l_main_run (main.c:525)
==141137== by 0x46E65B: l_main_run (main.c:507)
==141137== by 0x46E86B: l_main_run_with_signal (main.c:647)
==141137== by 0x403EA8: main (main.c:490)
In case the netdev is brought down while we're trying to connect, try to
detect this and fail early instead of trying to send additional
commands.
src/station.c:station_enter_state() Old State: disconnected, new state: connecting
src/station.c:station_netdev_event() Associating
src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
src/netdev.c:netdev_connect_event()
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=4
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/eapol.c:eapol_handle_ptk_3_of_4() ifindex=4
src/netdev.c:netdev_set_gtk() 4
src/station.c:station_handshake_event() Setting keys
src/netdev.c:netdev_set_tk() 4
src/netdev.c:netdev_set_rekey_offload() 4
New Key for Group Key failed for ifindex: 4:Network is down
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/station.c:station_free()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is DE
src/wiphy.c:wiphy_radio_work_done() Work item 14 done
src/station.c:station_connect_cb() 4, result: 4
Segmentation fault
If the iftype changes, kernel silently wipes out any frame registrations
we may have registered. Right now, frame registrations are only done when
the interface is created. This can result in frame watches not being
added if the interface type is changed between station mode to ap mode
and then back to station mode, e.g.:
device wlan0 set-property Mode ap
device wlan0 set-property Mode station
Make sure to re-add frame registrations according to the mode if the
interface type is changed.
Since netdev now keeps track of iftype changes, let it call
frame_watch_wdev_remove on netdevs that it manages to clear frame
registrations that should be cleared due to an iftype change.
Note that P2P_DEVICE wdevs are not managed by any netdev object, but
since their iftype cannot be changed, they should not be affected
by this change.
And set the interface type based on the event rather than the command
callback. This allows us to track interface type changes even if they
come from outside iwd (which shouldn't happen.)
The prepare_ft patch was an intermediate to a full patch
set and was not fully tested stand alone. Its placement
actually broke FT due to handshake->aa getting overwritten
prior to netdev->prev_bssid being copied out. This caused
FT to fail with "transport endpoint not connected (-107)"
Fix a regression where connection to an open network results in an
NotSupported error being returned.
Fixes: d79e883e93 ("netdev: Introduce connection types")
This makes conversions simpler. Also fixes a bug where P2P devices were
printed with an incorrect Mode value since dbus_iftype_to_string was
assuming that an iftype as defined in nl80211.h was being passed in,
while netdev was returning an enum value defined in netdev.h.
The 8021x offloading procedure still does EAP in userspace which
negotiates the PMK. The kernel then expects to obtain this PMK
from userspace by calling SET_PMK. This then allows the firmware
to begin the 4-way handshake.
Using __eapol_install_set_pmk_func to install netdev_set_pmk,
netdev now gets called into once EAP finishes and can begin
the final userspace actions prior to the firmware starting
the 4-way handshake:
- SET_PMK using PMK negotiated with EAP
- Emit SETTING_KEYS event
- netdev_connect_ok
One thing to note is that the kernel provides no way of knowing if
the 4-way handshake completed. Assuming SET_PMK/SET_STATION come
back with no errors, IWD assumes the PMK was valid. If not, or
due to some other issue in the 4-way, the kernel will send a
disconnect.
This adds a new type for 8021x offload as well as support in
building CMD_CONNECT.
As described in the comment, 8021x offloading is not particularly
similar to PSK as far as the code flow in IWD is concerned. There
still needs to be an eapol_sm due to EAP being done in userspace.
This throws somewhat of a wrench into our 'is_offload' cases. And
as such this connection type is handled specially.
The chances were extremely low, but using l_idle_oneshot
could end up causing a invalid memory access if the netdev
went down while waiting for the disconnect idle callback.
Instead netdev can keep track of the idle with l_idle_create
and remove it if the netdev goes down prior to the idle callback.
This crash was caused from the disconnect_cb being called
immediately in cases where send_disconnect was false. The
previous patch actually addressed this separately as this
flag was being set improperly which will, indirectly, fix
one of the two code paths that could cause this crash.
Still, there is a situation where send_disconnect could
be false and in this case IWD would still crash. If IWD
is waiting to queue the connect item and netdev_disconnect
is called it would result in the callback being called
immediately. Instead we can add an l_idle as to allow the
callback to happen out of scope, which is what station
expects.
Prior to this patch, the crashing behavior can be tested using
the following script (or some variant of it, your system timing
may not be the same as mine).
iwctl station wlan0 disconnect
iwctl station wlan0 connect <network1> &
sleep 0.02
iwctl station wlan0 connect <network2>
++++++++ backtrace ++++++++
0 0x7f4e1504e530 in /lib64/libc.so.6
1 0x432b54 in network_get_security() at src/network.c:253
2 0x416e92 in station_handshake_setup() at src/station.c:937
3 0x41a505 in __station_connect_network() at src/station.c:2551
4 0x41a683 in station_disconnect_onconnect_cb() at src/station.c:2581
5 0x40b4ae in netdev_disconnect() at src/netdev.c:3142
6 0x41a719 in station_disconnect_onconnect() at src/station.c:2603
7 0x41a89d in station_connect_network() at src/station.c:2652
8 0x433f1d in network_connect_psk() at src/network.c:886
9 0x43483a in network_connect() at src/network.c:1183
10 0x4add11 in _dbus_object_tree_dispatch() at ell/dbus-service.c:1802
11 0x49ff54 in message_read_handler() at ell/dbus.c:285
12 0x496d2f in io_callback() at ell/io.c:120
13 0x495894 in l_main_iterate() at ell/main.c:478
14 0x49599b in l_main_run() at ell/main.c:521
15 0x495cb3 in l_main_run_with_signal() at ell/main.c:647
16 0x404add in main() at src/main.c:490
17 0x7f4e15038b25 in /lib64/libc.so.6
The send_disconnect flag was being improperly set based only
on connect_cmd_id being zero. This does not take into account
the case of CMD_CONNECT having finished but not EAPoL. In this
case we do need to send a disconnect.
This adds a new connection type, TYPE_PSK_OFFLOAD, which
allows the 4-way handshake to be offloaded by the firmware.
Offloading will be used if the driver advertises support.
The CMD_ROAM event path was also modified to take into account
handshake offloading. If the handshake is offloaded we still
must issue GET_SCAN, but not start eapol since the firmware
takes care of this.
In the FW scan callback eapol was being stared unconditionally which
isn't correct as roaming on open networks is possible. Instead check
that a SM exists just like is done in netdev_connect_event.
This should have been updated along with the connect and roam
event separation. Since netdev_connect_event is not being
re-used for CMD_ROAM the comment did not make sense anymore.
Still, there needs to be a check to ensure we were not disconnected
while waiting for GET_SCAN to come back.
netdev_connect_event was being reused for parsing of CMD_ROAM
attributes which made some amount of sense since these events
are nearly identical, but due to the nature of firmware roaming
there really isn't much IWD needs to parse from CMD_ROAM. In
addition netdev_connect_event was getting rather complicated
since it had to handle both CMD_ROAM and CMD_CONNECT.
The only bits of information IWD needs to parse from CMD_ROAM
is the roamed BSSID, authenticator IEs, and supplicant IEs. Since
this is so limited it now makes little sense to reuse the entire
netdev_connect_event function, and intead only parse what is
needed for CMD_ROAM.
Currently netdev handles SoftMac and FullMac drivers mostly in the same
way, by building CMD_CONNECT nl80211 commands and letting the kernel
figure out the details. Exceptions to this are FILS/OWE/SAE AKMs which
are only supported on SoftMac drivers by using
CMD_AUTHENTICATE/CMD_ASSOCIATE.
Recently, basic support for SAE (WPA3-Personal) offload on FullMac cards
was introduced. When offloaded, the control flow is very different than
under typical conditions and required additional logic checks in several
places. The logic is now becoming quite complex.
Introduce a concept of a connection type in order to make it clearer
what driver and driver features are being used for this connection. In
the future, connection types can be expanded with 802.1X handshake
offload, PSK handshake offload and CMD_EXTERNAL_AUTH based SAE
connections.
Any auth proto which did not implement the assoc_timeout handler
could end up getting 'stuck' forever if there was an associate
timeout. This is because in the event of an associate timeout IWD
only sets a few flags and relies on the connect event to actually
handle the failure. The problem is a connect event never comes
if the failure was a timeout.
To fix this we can explicitly fail the connection if the auth
proto has not implemented assoc_timeout or if it returns false.
SAE offload support requires some minor tweaks to CMD_CONNECT
as well as special checks once the connect event comes in. Since
at this point we are fully connected.
Roaming on a full mac card is quite different than soft mac
and needs to be specially handled. The process starts with
the CMD_ROAM event, which tells us the driver is already
roamed and associated with a new AP. After this it expects
the 4-way handshake to be initiated. This in itself is quite
simple, the complexity comes with how this is piped into IWD.
After CMD_ROAM fires its assumed that a scan result is
available in the kernel, which is obtained using a newly
added scan API scan_get_firmware_scan. The only special
bit of this is that it does not 'schedule' a scan but simply
calls GET_SCAN. This is treated special and will not be
queued behind any other pending scan requests. This lets us
reuse some parsing code paths in scan and initialize a
scan_bss object which ultimately gets handed to station so
it can update connected_bss/bss_list.
For consistency station must also transition to a roaming state.
Since this roam is all handled by netdev two new events were
added, NETDEV_EVENT_ROAMING and NETDEV_EVENT_ROAMED. Both allow
station to transition between roaming/connected states, and ROAMED
provides station with the new scan_bss to replace connected_bss.
Since GET_STATION (and in turn GetDiagnostics) gets the most
current station info this attribute serves as a better indication
of the current signal strength. In addition full mac cards don't
appear to always have the average attribute.
If the extended feature for CQM levels was not supported no CQM
registration would happen, not even for a single level. This
caused IWD to completely lose the ability to roam since it would
only get notified when the kernel was disconnecting, around -90
dBm, not giving IWD enough time to roam.
Instead if the extended feature is not supported we can still
register for the event, just without multiple signal levels.
Handle situations where the BSS we're trying to connect to is no longer
in the kernel scan result cache. Normally, the kernel will re-scan the
target frequency if this happens on the CMD_CONNECT path, and retry the
connection.
Unfortunately, CMD_AUTHENTICATE path used for WPA3, OWE and FILS does
not have this scanning behavior. CMD_AUTHENTICATE simply fails with
a -ENOENT error. Work around this by trying a limited scan of the
target frequency and re-trying CMD_AUTHENTICATE once.
In some cases the AP can send a deauthenticate frame right after
accepting our authentication. In this case the kernel never properly
sends a CMD_CONNECT event with a failure, even though CMD_COONNECT was
used to initiate the connection. Try to work around that by detecting
that a Deauthenticate event arrives prior to any Associte or Connect
events and handle this case as a connect failure.
netdev_shutdown calls queue_destroy on the netdev_list, which in turn
calls netdev_free. netdev_free invokes the watches to notify them about
the netdev being removed. Those clients, or anything downstream can
still invoke netdev_find. Unfortunately queue_destroy is not re-entrant
safe, so netdev_find might return stale data. Fix that by using
l_queue_peek_head / l_queue_pop_head instead.
src/station.c:station_enter_state() Old State: connecting, new state:
connected
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan1[6]
src/device.c:device_free()
Removing scan context for wdev 100000001
src/scan.c:scan_context_free() sc: 0x4ae9ca0
src/netdev.c:netdev_free() Freeing netdev wlan0[48]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
==103174== Invalid read of size 8
==103174== at 0x467AA9: l_queue_find (queue.c:346)
==103174== by 0x43ACFF: netconfig_reset (netconfig.c:1027)
==103174== by 0x43AFFC: netconfig_destroy (netconfig.c:1123)
==103174== by 0x414379: station_free (station.c:3369)
==103174== by 0x414379: station_destroy_interface (station.c:3466)
==103174== by 0x47C80C: interface_instance_free (dbus-service.c:510)
==103174== by 0x47C80C: _dbus_object_tree_remove_interface
(dbus-service.c:1694)
==103174== by 0x47C99C: _dbus_object_tree_object_destroy
(dbus-service.c:795)
==103174== by 0x409A87: netdev_free (netdev.c:770)
==103174== by 0x4677AE: l_queue_clear (queue.c:107)
==103174== by 0x4677F8: l_queue_destroy (queue.c:82)
==103174== by 0x40CDC1: netdev_shutdown (netdev.c:5089)
==103174== by 0x404736: iwd_shutdown (main.c:78)
==103174== by 0x404736: iwd_shutdown (main.c:65)
==103174== by 0x46BD61: handle_callback (signal.c:78)
==103174== by 0x46BD61: signalfd_read_cb (signal.c:104)
In the case of module_init failing due to a module that comes after
netdev, the netdev module doesn't clean up netdev_list properly.
==6254== 24 bytes in 1 blocks are still reachable in loss record 1 of 1
==6254== at 0x483777F: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6254== by 0x4675ED: l_malloc (util.c:61)
==6254== by 0x46909D: l_queue_new (queue.c:63)
==6254== by 0x406AE4: netdev_init (netdev.c:5038)
==6254== by 0x44A7B3: iwd_modules_init (module.c:152)
==6254== by 0x404713: nl80211_appeared (main.c:171)
==6254== by 0x4713DE: process_unicast (genl.c:993)
==6254== by 0x4713DE: received_data (genl.c:1101)
==6254== by 0x46E00B: io_callback (io.c:118)
==6254== by 0x46D20C: l_main_iterate (main.c:477)
==6254== by 0x46D2DB: l_main_run (main.c:524)
==6254== by 0x46D2DB: l_main_run (main.c:506)
==6254== by 0x46D502: l_main_run_with_signal (main.c:656)
==6254== by 0x403EDB: main (main.c:490)