If IWD connects under bad RF conditions and netconfig takes
a while to complete (e.g. slow DHCP), the roam timeout
could fire before DHCP is done. Then, after the roam,
IWD would transition automatically to connected before
DHCP was finished. In theory DHCP could still complete after
this point but any process depending on IWD's connected
state would be uninformed and assume IP networking is up.
Fix this by stopping netconfig prior to a roam if IWD is not
in a connected state. Then, once the roam either failed or
succeeded, start netconfig again.
When acting as a configurator the enrollee can start on a different
channel than IWD is connected to. IWD will begin the auth process
on this channel but tell the enrollee to transition to the current
channel after the auth request. Since a configurator must be
connected (a requirement IWD enforces) we can assume a channel
transition will always be to the currently connected channel. This
allows us to simply cancel the offchannel request and wait for a
response (rather than start another offchannel).
Doing this improves the DPP performance and reduces the potential
for a lost frame during the channel transition.
This patch also addresses the comment that we should wait for the
auth request ACK before canceling the offchannel. Now a flag is
set and IWD will cancel the offchannel once the ACK is received.
If IWD gets a disconnect during FT the roaming state will be
cleared, as well as any ft_info's during ft_clear_authentications.
This includes canceling the offchannel operation which also
destroys any pending ft_info's if !info->parsed. This causes a
double free afterwards. In addition the l_queue_remove inside the
foreach callback is not a safe operation either.
To fix this don't remove the ft_info inside the offchannel
destroy callback. The info will get freed by ft_associate regardless
of the outcome (parsed or !parsed). This is also consistent with
how the onchannel logic works.
Log and crash backtrace below:
iwd[488]: src/station.c:station_try_next_transition() 5, target aa:46:8d:37:7c:87
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16668
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16669
iwd[488]: src/wiphy.c:wiphy_radio_work_done() Work item 16667 done
iwd[488]: src/wiphy.c:wiphy_radio_work_next() Starting work item 16668
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
iwd[488]: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
iwd[488]: src/netdev.c:netdev_deauthenticate_event()
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
iwd[488]: src/netdev.c:netdev_disconnect_event()
iwd[488]: Received Deauthentication event, reason: 6, from_ap: true
iwd[488]: src/station.c:station_disconnect_event() 5
iwd[488]: src/station.c:station_disassociated() 5
iwd[488]: src/station.c:station_reset_connection_state() 5
iwd[488]: src/station.c:station_roam_state_clear() 5
iwd[488]: double free or corruption (fasttop)
5 0x0000555b3dbf44a4 in ft_info_destroy ()
6 0x0000555b3dbf45b3 in remove_ifindex ()
7 0x0000555b3dc4653c in l_queue_foreach_remove ()
8 0x0000555b3dbd0dd1 in station_reset_connection_state ()
9 0x0000555b3dbd37e5 in station_disassociated ()
10 0x0000555b3dbc8bb8 in netdev_mlme_notify ()
11 0x0000555b3dc4e80b in received_data ()
12 0x0000555b3dc4b430 in io_callback ()
13 0x0000555b3dc4a5ed in l_main_iterate ()
14 0x0000555b3dc4a6bc in l_main_run ()
15 0x0000555b3dc4a8e0 in l_main_run_with_signal ()
16 0x0000555b3dbbe888 in main ()
Hostapd commit bc36991791 now properly sets the secure bit on
message 1/4. This was addressed in an earlier IWD commit but
neglected to allow for backwards compatibility. The check is
fatal which now breaks earlier hostapd version (older than 2.10).
Instead warn on this condition rather than reject the rekey.
Fixes: 7fad6590bd ("eapol: allow 'secure' to be set on rekeys")
The HT40+/- flags were reversed when checking against the 802.11
behavior flags.
HT40+ means the secondary channel is above (+) the primary channel
therefore corresponds to the PRIMARY_CHANNEL_LOWER behavior. And
the opposite for HT40-.
Reported-By: Alagu Sankar <alagusankar@gmail.com>
Use a more appropriate printf conversion string in order to avoid
unnecessary implicit conversion which can lead to a buffer overflow.
Reasons similar to commit:
98b758f893 ("knownnetworks: fix printing SSID in hex")
In the case that the FT target is on the same channel as we're currently
operating on, use ft_authenticate_onchannel instead of ft_authenticate.
Going offchannel in this case can confuse some drivers.
Currently when we try FT-over-Air, the Authenticate frame is always
sent via offchannel infrastructure We request the driver to go
offchannel, then send the Authenticate frame. This works fine as long
as the target AP is on a different channel. On some networks some (or
all) APs might actually be located on the same channel. In this case
going offchannel will result in some drivers not actually sending the
Authenticate frame until after the offchannel operation completes.
Work around this by introducing a new ft_authenticate variant that will
not request an offchannel operation first.
Force conversion to unsigned char before printing to avoid sign
extension when printing SSID in hex. For example, if there are CJK
characters in SSID, it will generate a very long string like
/net/connman/iwd/ffffffe8ffffffaeffffffa1.
If a very long ssid was used (e.g. CJK characters in SSID), it might do
out of bounds write to static variable for lack of checking the position
before the last snprintf() call.
Seeing that some authenticators can't handle TLS session caching
properly, allow the EAP-TLS-based methods session caching support to be
disabled per-network using a method specific FastReauthentication setting.
Defaults to true.
With the previous commit, authentication should succeed at least every
other attempt. I'd also expect that EAP-TLS is not usually affected
because there's no phase2, unlike with EAP-PEAP/EAP-TTLS.
If we have a TLS session cached from this attempt or a previous
successful connection attempt but the overall EAP method fails, forget
the session to improve the chances that authentication succeeds on the
next attempt considering that some authenticators strangely allow
resumption but can't handle it all the way to EAP method success.
Logically the session resumption in the TLS layers on the server should
be transparent to the EAP layers so I guess those may be failed
attempts to further optimise phase 2 when the server thinks it can
already trust the client.
The extra IE length for the WMM IE was being set to 26 which is
the HT IE length, not WMM. Fix this and use the proper size for
the WMM IE of 50 bytes.
This shouldn't have caused any problems prior as the tail length
is always allocated with 256 or 512 extra bytes of headroom.
Since channels numbers are used as indexes into the array, and given
that channel numbers start at '1' instead of 0, make sure to allocate a
buffer large enough to not overflow when the max channel number for a
given band is accessed.
src/manager.c:manager_wiphy_dump_callback() New wiphy phy1 added (1)
==22290== Invalid write of size 2
==22290== at 0x4624B2: nl80211_parse_supported_frequencies (nl80211util.c:570)
==22290== by 0x417CA5: parse_supported_bands (wiphy.c:1636)
==22290== by 0x418594: wiphy_parse_attributes (wiphy.c:1805)
==22290== by 0x418E20: wiphy_update_from_genl (wiphy.c:1991)
==22290== by 0x464589: manager_wiphy_dump_callback (manager.c:564)
==22290== by 0x4CBDDA: process_unicast (genl.c:944)
==22290== by 0x4CC19C: received_data (genl.c:1056)
==22290== by 0x4C7140: io_callback (io.c:120)
==22290== by 0x4C5A97: l_main_iterate (main.c:476)
==22290== by 0x4C5BDC: l_main_run (main.c:523)
==22290== by 0x4C5F0F: l_main_run_with_signal (main.c:645)
==22290== by 0x40503B: main (main.c:600)
==22290== Address 0x4aa76ec is 0 bytes after a block of size 28 alloc'd
==22290== at 0x48417B5: malloc (vg_replace_malloc.c:393)
==22290== by 0x4BC4D1: l_malloc (util.c:62)
==22290== by 0x417BE4: parse_supported_bands (wiphy.c:1619)
==22290== by 0x418594: wiphy_parse_attributes (wiphy.c:1805)
==22290== by 0x418E20: wiphy_update_from_genl (wiphy.c:1991)
==22290== by 0x464589: manager_wiphy_dump_callback (manager.c:564)
==22290== by 0x4CBDDA: process_unicast (genl.c:944)
==22290== by 0x4CC19C: received_data (genl.c:1056)
==22290== by 0x4C7140: io_callback (io.c:120)
==22290== by 0x4C5A97: l_main_iterate (main.c:476)
==22290== by 0x4C5BDC: l_main_run (main.c:523)
==22290== by 0x4C5F0F: l_main_run_with_signal (main.c:645)
==22290==
This adds support for rekeys to AP mode. A single timer is used and
reset to the next station needing a rekey. A default rekey timer of
600 seconds is used unless the profile sets a timeout.
The only changes required was to set the secure bit for message 1,
reset the frame retry counter, and change the 2/4 verifier to use
the rekey flag rather than ptk_complete. This is because we must
set ptk_complete false in order to detect retransmissions of the
4/4 frame.
Initiating a rekey can now be done by simply calling eapol_start().
If IWD ends up dumping wiphy's twice (because of NEW_WIPHY event
soon after initial dump) it will also try and dump interfaces
twice leading to multiple DEL_INTERFACE calls. The second attempt
will fail with -ENODEV (since the interface was already deleted).
Just silently fail with this case and let the other DEL_INTERFACE
path handle the re-creation.
With really badly timed events a wiphy can be registered twice. This
happens when IWD starts and requests a wiphy dump. Immediately after
a NEW_WIPHY event comes in (presumably when the driver loads) which
starts another dump. The NEW_WIPHY event can't simply be ignored
since it could be a hotplug (e.g. USB card) so to fix this we can
instead just prevent it from being registered.
This does mean both dumps will happen but the information will just
be added to the same wiphy object.
Past commits should address any potential problems of the timer
firing during FT, but its still good practice to cancel the timer
once it is no longer needed, i.e. once FT has started.
If station has already started FT ensure station_cannot_roam takes
that into account. Since the state has not yet changed it must also
check if the FT work ID is set.
Under the following conditions IWD can accidentally trigger a second
roam scan while one is already in progress:
- A low RSSI condition is met. This starts the roam rearm timer.
- A packet loss condition is met, which triggers a roam scan.
- The roam rearm timer fires and starts another roam scan while
also overwriting the first roam scan ID.
- Then, if IWD gets disconnected the overwritten roam scan gets
canceled, and the roam state is cleared which NULL's
station->connected_network.
- The initial roam scan results then come in with the assumption
that IWD is still connected which results in a crash trying to
reference station->connected_network.
This can be fixed by adding a station_cannot_roam check in the rearm
timer. If IWD is already doing a roam scan station->preparing_roam
should be set which will cause it to return true and stop any further
action.
Aborting (signal 11) [/usr/libexec/iwd]
iwd[426]: ++++++++ backtrace ++++++++
iwd[426]: #0 0x7f858d7b2090 in /lib/x86_64-linux-gnu/libc.so.6
iwd[426]: #1 0x443df7 in network_get_security() at ome/locus/workspace/iwd/src/network.c:287
iwd[426]: #2 0x421fbb in station_roam_scan_notify() at ome/locus/workspace/iwd/src/station.c:2516
iwd[426]: #3 0x43ebc1 in scan_finished() at ome/locus/workspace/iwd/src/scan.c:1861
iwd[426]: #4 0x43ecf2 in get_scan_done() at ome/locus/workspace/iwd/src/scan.c:1891
iwd[426]: #5 0x4cbfe9 in destroy_request() at ome/locus/workspace/iwd/ell/genl.c:676
iwd[426]: #6 0x4cc98b in process_unicast() at ome/locus/workspace/iwd/ell/genl.c:954
iwd[426]: #7 0x4ccd28 in received_data() at ome/locus/workspace/iwd/ell/genl.c:1052
iwd[426]: #8 0x4c79c9 in io_callback() at ome/locus/workspace/iwd/ell/io.c:120
iwd[426]: #9 0x4c62e3 in l_main_iterate() at ome/locus/workspace/iwd/ell/main.c:476
iwd[426]: #10 0x4c6426 in l_main_run() at ome/locus/workspace/iwd/ell/main.c:519
iwd[426]: #11 0x4c6752 in l_main_run_with_signal() at ome/locus/workspace/iwd/ell/main.c:645
iwd[426]: #12 0x405987 in main() at ome/locus/workspace/iwd/src/main.c:600
iwd[426]: #13 0x7f858d793083 in /lib/x86_64-linux-gnu/libc.so.6
iwd[426]: +++++++++++++++++++++++++++
If the authenticator has already set an snonce then the packet must
be a retransmit. Handle this by sending 3/4 again but making sure
to not reset the frame counter.
Old wpa_supplicant versions do not set the secure bit on 2/4 during
rekeys which causes IWD to reject the message and eventually time out.
Modern versions do set it correctly but even Android 13 (Pixel 5a)
still uses an ancient version of wpa_supplicant which does not set the
bit.
Relax this check and instead just print a warning but allow the message
to be processed.
In try_handshake_complete() we return early if all the keys had
been installed before (initial associations). For rekeys we can
now emit the REKEY_COMPLETE event which lets AP mode reset the
rekey timer for that station.
When the TK is installed the 'ptk_installed' flag was never set to
zero. For initial associations this was fine (already zero) but for
rekeys the flag needs to be unset so try_handshake_complete knows
if the key was installed. This is consistent with how gtk/igtk keys
work as well.
Rekeys for station mode don't need to know when complete since
there is nothing to do once done. AP mode on the other hand needs
to know if the rekey was successful in order to reset/set the next
rekey timer.
The second handshake message was hard coded with the secure bit as
zero but for rekeys the secure bit should be set to 1. Fix this by
changing the 2/4 builder to take a boolean which will set the bit
properly.
It should be noted that hostapd doesn't check this bit so EAPoL
worked just fine, but IWD's checks are more strict.
The PEAP RFC wants implementations to enforce that Phase2 methods have
been successfully completed prior to accepting a successful result TLV.
However, when TLS session resumption is used, some servers will skip
phase2 methods entirely and simply send a Result TLV with a success
code. This results in iwd (erroneously) rejecting the authentication
attempt.
Fix this by marking phase2 method as successful if session resumption is
being used.
This adds a builder which sets the country IE in probes/beacons.
The IE will use the 'single subband triplet sequence' meaning
dot11OperatingClassesRequired is false. This is much easier to
build and doesn't require knowing an operating class.
The IE itself is variable in length and potentially could grow
large if the hardware has a weird configuration (many different
power levels or segmentation in supported channels) so the
overall builder was changed to take the length of the buffer and
warnings will be printed if any space issues are encountered.
IWD's channel/frequency conversions use simple math to convert and
have very minimal checks to ensure the input is valid. This can
lead to some channels/frequencies being calculated which are not
in IWD's E-4 table, specifically in the 5GHz band.
This is especially noticable using mac80211_hwsim which includes
some obscure high 5ghz frequencies which are not part of the 802.11
spec.
To fix this calculate the frequency or channel then iterate E-4
operating classes to check that the value actually matches a class.
If supported this will include the HT capabilities and HT
operations elements in beacons/probes. Some shortcuts were taken
here since not all the information is currently parsed from the
hardware. Namely the HT operation element does not include the
basic MCS set. Still, this will at least show stations that the
AP is capable of more than just basic rates.
The builders themselves are structured similar to the basic rates
builder where they build only the contents and return the length.
The caller must set the type/length manually. This is to support
the two use cases of using with an IE builder vs direct pointer.
To include HT support a chandef needs to be created for whatever
frequency is being used. This allows IWD to provide a secondary
channel to the kernel in the case of 40MHz operation. Now the AP
will generate a chandef when starting based on the channel set
in the user profile (or default).
If HT is not supported the chandef width is set to 20MHz no-HT,
otherwise band_freq_to_ht_chandef is used.
The WMM parameter IE is expected by the linux kernel for any AP
supporting HT/VHT etc. IWD won't actually use WMM and its not
clear exactly why the kernel uses this restriction, but regardless
it must be included to support HT.
For AP mode its convenient for IWD to choose an appropriate
channel definition rather than require the user provide very
low level parameters such as channel width, center1 frequency
etc. For now only HT is supported as VHT/HE etc. require
additional secondary channel frequencies.
The HT API tries to find an operating class using 40Mhz which
complies with any hardware restrictions. If an operating class is
found that is supported/not restricted it is marked as 'best' until
a better one is found. In this case 'better' is a larger channel
width. Since this is HT only 20mhz and 40mhz widths are checked.