Under very rare circumstances the roaming scan triggered might not be
canceled properly. This is because we issue the roam scan recursively
from within a scan callback and re-use the id of the scan for the
subsequent request. The destroy callback is invoked right after the
callback and resets the id. This leads to the scan not being canceled
properly in roam_state_clear().
src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
src/station.c:station_roam_trigger_cb() 37
src/station.c:station_roam_scan() ifindex: 37
src/station.c:station_roam_trigger_cb() Using cached neighbor report for roam
...
src/scan.c:get_scan_done() get_scan_done
src/station.c:station_roam_failed() 37
src/station.c:station_roam_scan() ifindex: 37
src/scan.c:scan_request_triggered() Active scan triggered for wdev 22
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan0[37]
src/device.c:device_free()
src/station.c:station_free()
...
Removing scan context for wdev 22
src/scan.c:scan_context_free() sc: 0x4a362a0
src/wiphy.c:wiphy_radio_work_done() Work item 14 done
==19542== Invalid write of size 4
==19542== at 0x411500: station_roam_scan_destroy (station.c:2010)
==19542== by 0x420B5B: scan_request_free (scan.c:156)
==19542== by 0x410BAC: destroy_work (wiphy.c:294)
==19542== by 0x410BAC: wiphy_radio_work_done (wiphy.c:1613)
==19542== by 0x46C66E: l_queue_clear (queue.c:107)
==19542== by 0x46C6B8: l_queue_destroy (queue.c:82)
==19542== by 0x420BAE: scan_context_free (scan.c:205)
==19542== by 0x424135: scan_wdev_remove (scan.c:2272)
==19542== by 0x408754: netdev_free (netdev.c:847)
==19542== by 0x40E18C: netdev_shutdown (netdev.c:5773)
==19542== by 0x404756: iwd_shutdown (main.c:78)
==19542== by 0x404756: iwd_shutdown (main.c:65)
==19542== by 0x470E21: handle_callback (signal.c:78)
==19542== by 0x470E21: signalfd_read_cb (signal.c:104)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== Address 0x4d81f98 is 200 bytes inside a block of size 288 free'd
==19542== at 0x48399CB: free (vg_replace_malloc.c:538)
==19542== by 0x47F3E5: interface_instance_free (dbus-service.c:510)
==19542== by 0x481DEA: _dbus_object_tree_remove_interface (dbus-service.c:1694)
==19542== by 0x481F1C: _dbus_object_tree_object_destroy (dbus-service.c:795)
==19542== by 0x40894F: netdev_free (netdev.c:844)
==19542== by 0x40E18C: netdev_shutdown (netdev.c:5773)
==19542== by 0x404756: iwd_shutdown (main.c:78)
==19542== by 0x404756: iwd_shutdown (main.c:65)
==19542== by 0x470E21: handle_callback (signal.c:78)
==19542== by 0x470E21: signalfd_read_cb (signal.c:104)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== by 0x47088C: l_main_iterate (main.c:478)
==19542== by 0x47095B: l_main_run (main.c:525)
==19542== by 0x47095B: l_main_run (main.c:507)
==19542== by 0x470B6B: l_main_run_with_signal (main.c:647)
==19542== Block was alloc'd at
==19542== at 0x483879F: malloc (vg_replace_malloc.c:307)
==19542== by 0x46AB2D: l_malloc (util.c:62)
==19542== by 0x416599: station_create (station.c:3448)
==19542== by 0x406D55: netdev_newlink_notify (netdev.c:5324)
==19542== by 0x46D4BC: l_hashmap_foreach (hashmap.c:612)
==19542== by 0x472F46: process_broadcast (netlink.c:158)
==19542== by 0x472F46: can_read_data (netlink.c:279)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== by 0x47088C: l_main_iterate (main.c:478)
==19542== by 0x47095B: l_main_run (main.c:525)
==19542== by 0x47095B: l_main_run (main.c:507)
==19542== by 0x470B6B: l_main_run_with_signal (main.c:647)
==19542== by 0x403EDB: main (main.c:490)
==19542==
Prior to this netdev_connect_ok set setting this which really
only applies to station mode. In addition this happens for each
new station that connects to the AP. Instead set the operstate /
link mode when AP starts and stops.
Change ap_start to load all of the AP configuration from a struct
l_settings, moving the 6 or so parameters from struct ap_config members
to the l_settings groups and keys. This extends the ap profile concept
used for the DHCP settings. ap_start callers create the l_settings
object and fill the values in it or read the settings in from a file.
Since ap_setup_dhcp and ap_load_profile_and_dhcp no longer do the
settings file loading, they needed to be refactored and some issues were
fixed in their logic, e.g. l_dhcp_server_set_ip_address() was never
called when the "IP pool" was used. Also the IP pool was previously only
used if the ap->config->profile was NULL and this didn't match what the
docs said:
"If [IPv4].Address is not provided and no IP address is set on the
interface prior to calling StartProfile the IP pool will be used."
The info struct is on the stack which leads to the potential
for uninitialized data access. Zero out the info struct prior
to calling the get station callback:
==141137== Conditional jump or move depends on uninitialised value(s)
==141137== at 0x458A6F: diagnostic_info_to_dict (diagnostic.c:109)
==141137== by 0x41200B: station_get_diagnostic_cb (station.c:3620)
==141137== by 0x405BE1: netdev_get_station_cb (netdev.c:4783)
==141137== by 0x4722F9: process_unicast (genl.c:994)
==141137== by 0x4722F9: received_data (genl.c:1102)
==141137== by 0x46F28B: io_callback (io.c:120)
==141137== by 0x46E5AC: l_main_iterate (main.c:478)
==141137== by 0x46E65B: l_main_run (main.c:525)
==141137== by 0x46E65B: l_main_run (main.c:507)
==141137== by 0x46E86B: l_main_run_with_signal (main.c:647)
==141137== by 0x403EA8: main (main.c:490)
It isn't safe to return a NULL from diagnostic_akm_suite_to_security()
since the value is used directly. Also, if the AKM suite is 0, this
implies that the network is an Open network and not some unknown AKM.
==17982== Invalid read of size 1
==17982== at 0x483BC92: strlen (vg_replace_strmem.c:459)
==17982== by 0x47DE60: _dbus1_builder_append_basic (dbus-util.c:981)
==17982== by 0x41ACB2: dbus_append_dict_basic (dbus.c:197)
==17982== by 0x412050: station_get_diagnostic_cb (station.c:3614)
==17982== by 0x405B19: netdev_get_station_cb (netdev.c:4801)
==17982== by 0x47436E: process_unicast (genl.c:994)
==17982== by 0x47436E: received_data (genl.c:1102)
==17982== by 0x470FBB: io_callback (io.c:120)
==17982== by 0x4701DC: l_main_iterate (main.c:478)
==17982== by 0x4702AB: l_main_run (main.c:525)
==17982== by 0x4702AB: l_main_run (main.c:507)
==17982== by 0x4704BB: l_main_run_with_signal (main.c:647)
==17982== by 0x403EDB: main (main.c:490)
==17982== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17982==
Aborting (signal 11) [/home/denkenz/iwd/src/iwd]
++++++++ backtrace ++++++++
0 0x488a550 in /lib64/libc.so.6
1 0x483bc92 in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so
2 0x47de61 in _dbus1_builder_append_basic() at ell/dbus-util.c:983
3 0x41acb3 in dbus_append_dict_basic() at src/dbus.c:197
4 0x412051 in station_get_diagnostic_cb() at src/station.c:3618
5 0x405b1a in netdev_get_station_cb() at src/netdev.c:4801
It is possible for the RTNL command callback to come after
netconfig_reset or netconfig_destroy has been called. Make sure that
any outstanding commands that might access the netconfig object are
canceled.
src/netconfig.c:netconfig_ipv4_dhcp_event_handler() DHCPv4 event 0
src/netconfig.c:netconfig_ifaddr_added() wlan0: ifaddr 192.168.1.55/24 broadcast 192.168.1.255
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan0[15]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
src/netconfig.c:netconfig_reset()
src/netconfig.c:netconfig_reset_v4() 16
src/netconfig.c:netconfig_reset_v4() Stopping client
Removing scan context for wdev c
src/scan.c:scan_context_free() sc: 0x4a3cc10
==12792== Invalid read of size 8
==12792== at 0x43BF5A: netconfig_route_add_cmd_cb (netconfig.c:600)
==12792== by 0x4727FA: process_message (netlink.c:181)
==12792== by 0x4727FA: can_read_data (netlink.c:289)
==12792== by 0x470F4B: io_callback (io.c:120)
==12792== by 0x47016C: l_main_iterate (main.c:478)
==12792== by 0x47023B: l_main_run (main.c:525)
==12792== by 0x47023B: l_main_run (main.c:507)
==12792== by 0x47044B: l_main_run_with_signal (main.c:647)
==12792== by 0x403EDB: main (main.c:490)
In case the netdev is brought down while we're trying to connect, try to
detect this and fail early instead of trying to send additional
commands.
src/station.c:station_enter_state() Old State: disconnected, new state: connecting
src/station.c:station_netdev_event() Associating
src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
src/netdev.c:netdev_connect_event()
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=4
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/eapol.c:eapol_handle_ptk_3_of_4() ifindex=4
src/netdev.c:netdev_set_gtk() 4
src/station.c:station_handshake_event() Setting keys
src/netdev.c:netdev_set_tk() 4
src/netdev.c:netdev_set_rekey_offload() 4
New Key for Group Key failed for ifindex: 4:Network is down
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/station.c:station_free()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is DE
src/wiphy.c:wiphy_radio_work_done() Work item 14 done
src/station.c:station_connect_cb() 4, result: 4
Segmentation fault
A prior commit refactored the AKM selection in wiphy.c. This
ended up breaking FILS tests due to the hard coding of a
false fils_hint in wiphy_select_akm. Since our FILS tests
only advertise FILS AKMs wiphy_can_connect would return false
for these networks.
Similar to wiphy_select_akm, add a fils hint parameter to
wiphy_can_connect and pass that down directly to wiphy_select_akm.
If PreSharedKey is set, the current logic does not validate the
Passphrase beyond its existence. This can lead to strange situations
where an invalid WPA3-PSK passphrase might get used. This can of course
only happen if the user (as root) or NetworkManager-iwd-backend writes
such a file incorrectly.
Move the WSC Primary Device Type parsing from p2p.c and eap-wsc.c to a
common function in wscutil.c supporting both formats so that it can be
used in ap.c too.
Logically this frame watch belongs in station. It was kept in device.c
for the purported reason that the station object was removed with
ifdown/ifup changes and hence the frame watch might need to be removed
and re-added unnecessarily. Since the kernel does not actually allow to
unregister a frame watch (only when the netdev is removed or its iftype
changes), re-adding a frame watch might trigger a -EALREADY or similar
error.
Avoid this by registering the frame watch when a new netdev is detected
in STATION mode, or when the interface type changes to STATION.
If a netdev iftype is changed, all frame registrations are removed.
Make sure to re-register for the appropriate frame notifications in case
our iftype is switched back to 'station'. In any other iftype, no frame
watches are registered and rrm_state object is effectively dormant.
Right now, RRM is created when a new netdev is detected and its iftype
is of type station. That means that any devices that start their life
as any other iftype cannot be changed to a station and have RRM function
properly. Fix that by always creating the RRM state regardless of the
initial iftype.
In the case that a netdev is powered down, or an interface type change
occurs, the station object will be removed and any watches will be
freed.
Since rrm is created when the netdev is created and persists across
iftype and power up/down changes, it should provide a destroy callback
to station_add_state_watch so that it can be notified when the watch is
removed.
If the iftype changes, kernel silently wipes out any frame registrations
we may have registered. Right now, frame registrations are only done when
the interface is created. This can result in frame watches not being
added if the interface type is changed between station mode to ap mode
and then back to station mode, e.g.:
device wlan0 set-property Mode ap
device wlan0 set-property Mode station
Make sure to re-add frame registrations according to the mode if the
interface type is changed.
Since netdev now keeps track of iftype changes, let it call
frame_watch_wdev_remove on netdevs that it manages to clear frame
registrations that should be cleared due to an iftype change.
Note that P2P_DEVICE wdevs are not managed by any netdev object, but
since their iftype cannot be changed, they should not be affected
by this change.
And set the interface type based on the event rather than the command
callback. This allows us to track interface type changes even if they
come from outside iwd (which shouldn't happen.)
The prepare_ft patch was an intermediate to a full patch
set and was not fully tested stand alone. Its placement
actually broke FT due to handshake->aa getting overwritten
prior to netdev->prev_bssid being copied out. This caused
FT to fail with "transport endpoint not connected (-107)"
- Make sure to print the cookie information
- Don't print messages for frames we're not interested in. This is
particularly helpful when running auto-tests since frame acks from
hostapd pollute the iwd log.
Fix a regression where connection to an open network results in an
NotSupported error being returned.
Fixes: d79e883e93 ("netdev: Introduce connection types")
This makes conversions simpler. Also fixes a bug where P2P devices were
printed with an incorrect Mode value since dbus_iftype_to_string was
assuming that an iftype as defined in nl80211.h was being passed in,
while netdev was returning an enum value defined in netdev.h.
It was seen that some full mac cards/drivers do not include any
rate information with the NEW_STATION event. This was causing
the NEW_STATION event to be ignored, preventing AP mode from
working on these cards.
Since the full mac path does not even require sta->rates the
parsing can be removed completely.
It was found that if the user cancels/disconnects the agent prior to
entering credentials, IWD would get stuck and could no longer accept
any connect calls with the error "Operation already in progress".
For example exiting iwctl in the Password prompt would cause this:
iwctl
$ station wlan0 connect myssid
$ Password: <Ctrl-C>
This was due to the agent never calling the network callback in the
case of an agent disconnect. Network would wait indefinitely for the
credentials, and disallow any future connect attempts.
To fix this agent_finalize_pending can be called in agent_disconnect
with a NULL reply which behaves the same as if there was an
internal timeout and ultimately allows network to fail the connection
The 8021x offloading procedure still does EAP in userspace which
negotiates the PMK. The kernel then expects to obtain this PMK
from userspace by calling SET_PMK. This then allows the firmware
to begin the 4-way handshake.
Using __eapol_install_set_pmk_func to install netdev_set_pmk,
netdev now gets called into once EAP finishes and can begin
the final userspace actions prior to the firmware starting
the 4-way handshake:
- SET_PMK using PMK negotiated with EAP
- Emit SETTING_KEYS event
- netdev_connect_ok
One thing to note is that the kernel provides no way of knowing if
the 4-way handshake completed. Assuming SET_PMK/SET_STATION come
back with no errors, IWD assumes the PMK was valid. If not, or
due to some other issue in the 4-way, the kernel will send a
disconnect.
This adds a new type for 8021x offload as well as support in
building CMD_CONNECT.
As described in the comment, 8021x offloading is not particularly
similar to PSK as far as the code flow in IWD is concerned. There
still needs to be an eapol_sm due to EAP being done in userspace.
This throws somewhat of a wrench into our 'is_offload' cases. And
as such this connection type is handled specially.
802.1x offloading needs a way to call SET_PMK after EAP finishes.
In the same manner as set_tk/gtk/igtk a new 'install_pmk' function
was added which eapol can call into after EAP completes.
The chances were extremely low, but using l_idle_oneshot
could end up causing a invalid memory access if the netdev
went down while waiting for the disconnect idle callback.
Instead netdev can keep track of the idle with l_idle_create
and remove it if the netdev goes down prior to the idle callback.
This fixes an infinite loop issue when authenticate frames time
out. If the AP is not responding IWD ends up retrying indefinitely
due to how SAE was handling this timeout. Inside sae_auth_timeout
it was actually sending another authenticate frame to reject
the SAE handshake. This, again, resulted in a timeout which called
the SAE timeout handler and repeated indefinitely.
The kernel resend behavior was not taken into account when writing
the SAE timeout behavior and in practice there is actually no need
for SAE to do much of anything in response to a timeout. The
kernel automatically resends Authenticate frames 3 times which mirrors
IWDs SAE behavior anyways. Because of this the authenticate timeout
handler can be completely removed, which will cause the connection
to fail in the case of an autentication timeout.
This crash was caused from the disconnect_cb being called
immediately in cases where send_disconnect was false. The
previous patch actually addressed this separately as this
flag was being set improperly which will, indirectly, fix
one of the two code paths that could cause this crash.
Still, there is a situation where send_disconnect could
be false and in this case IWD would still crash. If IWD
is waiting to queue the connect item and netdev_disconnect
is called it would result in the callback being called
immediately. Instead we can add an l_idle as to allow the
callback to happen out of scope, which is what station
expects.
Prior to this patch, the crashing behavior can be tested using
the following script (or some variant of it, your system timing
may not be the same as mine).
iwctl station wlan0 disconnect
iwctl station wlan0 connect <network1> &
sleep 0.02
iwctl station wlan0 connect <network2>
++++++++ backtrace ++++++++
0 0x7f4e1504e530 in /lib64/libc.so.6
1 0x432b54 in network_get_security() at src/network.c:253
2 0x416e92 in station_handshake_setup() at src/station.c:937
3 0x41a505 in __station_connect_network() at src/station.c:2551
4 0x41a683 in station_disconnect_onconnect_cb() at src/station.c:2581
5 0x40b4ae in netdev_disconnect() at src/netdev.c:3142
6 0x41a719 in station_disconnect_onconnect() at src/station.c:2603
7 0x41a89d in station_connect_network() at src/station.c:2652
8 0x433f1d in network_connect_psk() at src/network.c:886
9 0x43483a in network_connect() at src/network.c:1183
10 0x4add11 in _dbus_object_tree_dispatch() at ell/dbus-service.c:1802
11 0x49ff54 in message_read_handler() at ell/dbus.c:285
12 0x496d2f in io_callback() at ell/io.c:120
13 0x495894 in l_main_iterate() at ell/main.c:478
14 0x49599b in l_main_run() at ell/main.c:521
15 0x495cb3 in l_main_run_with_signal() at ell/main.c:647
16 0x404add in main() at src/main.c:490
17 0x7f4e15038b25 in /lib64/libc.so.6
The send_disconnect flag was being improperly set based only
on connect_cmd_id being zero. This does not take into account
the case of CMD_CONNECT having finished but not EAPoL. In this
case we do need to send a disconnect.