Disable power save if the wiphy indicates its needed. Do this
before issuing GET_LINK so the netdev doesn't signal its up until
power save is disabled.
Certain drivers do not handle power save very well resulting in
missed frames, firmware crashes, or other bad behavior. Its easy
enough to disable power save via iw, iwconfig, etc but since IWD
removes and creates the interface on startup it blows away any
previous power save setting. The setting must be done *after* IWD
creates the interface which can be done, but needs to be via some
external daemon monitoring IWD's state. For minimal systems,
e.g. without NetworkManager, it becomes difficult and annoying to
persistently disable power save.
For this reason a new driver flag POWER_SAVE_DISABLE is being
added. This can then be referenced when creating the interfaces
and if set, disable power save.
The driver_infos list in wiphy.c is hard coded and, naturally,
not configurable from a user perspective. As drivers are updated
or added users may be left with their system being broken until the
driver is added, IWD released, and packaged.
This adds the ability to define driver flags inside main.conf under
the "DriverQuirks" group. Keys in this group correspond to values in
enum driver_flag and values are a list of glob matches for specific
drivers:
[DriverQuirks]
DefaultInterface=rtl81*,rtl87*,rtl88*,rtw_*,brcmfmac,bcmsdh_sdmmc
ForcePae=buggy_pae_*
Rather than keep a pointer to the driver_info entry copy the flags
into the wiphy object. This preps for supporting driver flags via
a configuration file, specifically allowing for entries that are a
subset of others. For example:
{ "rtl88*", DEFAULT_IF },
{ "rtl88x2bu", FORCE_PAE },
Before it was not possible to add entires like this since only the
last entry match would get set. Now DEFAULT_IF would get set to all
matches, and FORCE_PAE to only rtl88x2bu. This isn't especially
important for the static list since it could be modified to work
correctly, but will be needed when parsing flags from a
configuration file that may contain duplicates or subsets of the
static list.
If there was some problem during the FT authenticate stage
its nice to know more of what happened: whether the AP didn't
respond, rejected the attempt, or sent an invalid frame/IEs.
In some situations its convenient for the same work item to be
inserted (rescheduled) while its in progress. FT for example does
this now if a roam fails. The same ft_work item gets re-inserted
which, currently, is not safe to do since the item is modified
and removed once completed.
Fix this by introducing wiphy_radio_work_reschedule which is an
explicit API for re-inserting work items from within the do_work
callback.
The wiphy work logic was changed around slightly to remove the item
at the head of the queue prior to starting and note the ID going
into do_work. If do_work signaled done and ID changed we know it
was re-inserted and can skip the destroy logic and move onto the
next item. If the item is not done continue as normal but set the
priority to INT_MIN, as usual, to prevent other items from getting
to the head of the queue.
If IWD connects under bad RF conditions and netconfig takes
a while to complete (e.g. slow DHCP), the roam timeout
could fire before DHCP is done. Then, after the roam,
IWD would transition automatically to connected before
DHCP was finished. In theory DHCP could still complete after
this point but any process depending on IWD's connected
state would be uninformed and assume IP networking is up.
Fix this by stopping netconfig prior to a roam if IWD is not
in a connected state. Then, once the roam either failed or
succeeded, start netconfig again.
When acting as a configurator the enrollee can start on a different
channel than IWD is connected to. IWD will begin the auth process
on this channel but tell the enrollee to transition to the current
channel after the auth request. Since a configurator must be
connected (a requirement IWD enforces) we can assume a channel
transition will always be to the currently connected channel. This
allows us to simply cancel the offchannel request and wait for a
response (rather than start another offchannel).
Doing this improves the DPP performance and reduces the potential
for a lost frame during the channel transition.
This patch also addresses the comment that we should wait for the
auth request ACK before canceling the offchannel. Now a flag is
set and IWD will cancel the offchannel once the ACK is received.
If IWD gets a disconnect during FT the roaming state will be
cleared, as well as any ft_info's during ft_clear_authentications.
This includes canceling the offchannel operation which also
destroys any pending ft_info's if !info->parsed. This causes a
double free afterwards. In addition the l_queue_remove inside the
foreach callback is not a safe operation either.
To fix this don't remove the ft_info inside the offchannel
destroy callback. The info will get freed by ft_associate regardless
of the outcome (parsed or !parsed). This is also consistent with
how the onchannel logic works.
Log and crash backtrace below:
iwd[488]: src/station.c:station_try_next_transition() 5, target aa:46:8d:37:7c:87
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16668
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16669
iwd[488]: src/wiphy.c:wiphy_radio_work_done() Work item 16667 done
iwd[488]: src/wiphy.c:wiphy_radio_work_next() Starting work item 16668
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
iwd[488]: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
iwd[488]: src/netdev.c:netdev_deauthenticate_event()
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
iwd[488]: src/netdev.c:netdev_disconnect_event()
iwd[488]: Received Deauthentication event, reason: 6, from_ap: true
iwd[488]: src/station.c:station_disconnect_event() 5
iwd[488]: src/station.c:station_disassociated() 5
iwd[488]: src/station.c:station_reset_connection_state() 5
iwd[488]: src/station.c:station_roam_state_clear() 5
iwd[488]: double free or corruption (fasttop)
5 0x0000555b3dbf44a4 in ft_info_destroy ()
6 0x0000555b3dbf45b3 in remove_ifindex ()
7 0x0000555b3dc4653c in l_queue_foreach_remove ()
8 0x0000555b3dbd0dd1 in station_reset_connection_state ()
9 0x0000555b3dbd37e5 in station_disassociated ()
10 0x0000555b3dbc8bb8 in netdev_mlme_notify ()
11 0x0000555b3dc4e80b in received_data ()
12 0x0000555b3dc4b430 in io_callback ()
13 0x0000555b3dc4a5ed in l_main_iterate ()
14 0x0000555b3dc4a6bc in l_main_run ()
15 0x0000555b3dc4a8e0 in l_main_run_with_signal ()
16 0x0000555b3dbbe888 in main ()
Hostapd commit bc36991791 now properly sets the secure bit on
message 1/4. This was addressed in an earlier IWD commit but
neglected to allow for backwards compatibility. The check is
fatal which now breaks earlier hostapd version (older than 2.10).
Instead warn on this condition rather than reject the rekey.
Fixes: 7fad6590bd ("eapol: allow 'secure' to be set on rekeys")
The HT40+/- flags were reversed when checking against the 802.11
behavior flags.
HT40+ means the secondary channel is above (+) the primary channel
therefore corresponds to the PRIMARY_CHANNEL_LOWER behavior. And
the opposite for HT40-.
Reported-By: Alagu Sankar <alagusankar@gmail.com>
Use a more appropriate printf conversion string in order to avoid
unnecessary implicit conversion which can lead to a buffer overflow.
Reasons similar to commit:
98b758f893 ("knownnetworks: fix printing SSID in hex")
In the case that the FT target is on the same channel as we're currently
operating on, use ft_authenticate_onchannel instead of ft_authenticate.
Going offchannel in this case can confuse some drivers.
Currently when we try FT-over-Air, the Authenticate frame is always
sent via offchannel infrastructure We request the driver to go
offchannel, then send the Authenticate frame. This works fine as long
as the target AP is on a different channel. On some networks some (or
all) APs might actually be located on the same channel. In this case
going offchannel will result in some drivers not actually sending the
Authenticate frame until after the offchannel operation completes.
Work around this by introducing a new ft_authenticate variant that will
not request an offchannel operation first.
Force conversion to unsigned char before printing to avoid sign
extension when printing SSID in hex. For example, if there are CJK
characters in SSID, it will generate a very long string like
/net/connman/iwd/ffffffe8ffffffaeffffffa1.
If a very long ssid was used (e.g. CJK characters in SSID), it might do
out of bounds write to static variable for lack of checking the position
before the last snprintf() call.
Seeing that some authenticators can't handle TLS session caching
properly, allow the EAP-TLS-based methods session caching support to be
disabled per-network using a method specific FastReauthentication setting.
Defaults to true.
With the previous commit, authentication should succeed at least every
other attempt. I'd also expect that EAP-TLS is not usually affected
because there's no phase2, unlike with EAP-PEAP/EAP-TTLS.
If we have a TLS session cached from this attempt or a previous
successful connection attempt but the overall EAP method fails, forget
the session to improve the chances that authentication succeeds on the
next attempt considering that some authenticators strangely allow
resumption but can't handle it all the way to EAP method success.
Logically the session resumption in the TLS layers on the server should
be transparent to the EAP layers so I guess those may be failed
attempts to further optimise phase 2 when the server thinks it can
already trust the client.
The extra IE length for the WMM IE was being set to 26 which is
the HT IE length, not WMM. Fix this and use the proper size for
the WMM IE of 50 bytes.
This shouldn't have caused any problems prior as the tail length
is always allocated with 256 or 512 extra bytes of headroom.
Since channels numbers are used as indexes into the array, and given
that channel numbers start at '1' instead of 0, make sure to allocate a
buffer large enough to not overflow when the max channel number for a
given band is accessed.
src/manager.c:manager_wiphy_dump_callback() New wiphy phy1 added (1)
==22290== Invalid write of size 2
==22290== at 0x4624B2: nl80211_parse_supported_frequencies (nl80211util.c:570)
==22290== by 0x417CA5: parse_supported_bands (wiphy.c:1636)
==22290== by 0x418594: wiphy_parse_attributes (wiphy.c:1805)
==22290== by 0x418E20: wiphy_update_from_genl (wiphy.c:1991)
==22290== by 0x464589: manager_wiphy_dump_callback (manager.c:564)
==22290== by 0x4CBDDA: process_unicast (genl.c:944)
==22290== by 0x4CC19C: received_data (genl.c:1056)
==22290== by 0x4C7140: io_callback (io.c:120)
==22290== by 0x4C5A97: l_main_iterate (main.c:476)
==22290== by 0x4C5BDC: l_main_run (main.c:523)
==22290== by 0x4C5F0F: l_main_run_with_signal (main.c:645)
==22290== by 0x40503B: main (main.c:600)
==22290== Address 0x4aa76ec is 0 bytes after a block of size 28 alloc'd
==22290== at 0x48417B5: malloc (vg_replace_malloc.c:393)
==22290== by 0x4BC4D1: l_malloc (util.c:62)
==22290== by 0x417BE4: parse_supported_bands (wiphy.c:1619)
==22290== by 0x418594: wiphy_parse_attributes (wiphy.c:1805)
==22290== by 0x418E20: wiphy_update_from_genl (wiphy.c:1991)
==22290== by 0x464589: manager_wiphy_dump_callback (manager.c:564)
==22290== by 0x4CBDDA: process_unicast (genl.c:944)
==22290== by 0x4CC19C: received_data (genl.c:1056)
==22290== by 0x4C7140: io_callback (io.c:120)
==22290== by 0x4C5A97: l_main_iterate (main.c:476)
==22290== by 0x4C5BDC: l_main_run (main.c:523)
==22290== by 0x4C5F0F: l_main_run_with_signal (main.c:645)
==22290==
This adds support for rekeys to AP mode. A single timer is used and
reset to the next station needing a rekey. A default rekey timer of
600 seconds is used unless the profile sets a timeout.
The only changes required was to set the secure bit for message 1,
reset the frame retry counter, and change the 2/4 verifier to use
the rekey flag rather than ptk_complete. This is because we must
set ptk_complete false in order to detect retransmissions of the
4/4 frame.
Initiating a rekey can now be done by simply calling eapol_start().
If IWD ends up dumping wiphy's twice (because of NEW_WIPHY event
soon after initial dump) it will also try and dump interfaces
twice leading to multiple DEL_INTERFACE calls. The second attempt
will fail with -ENODEV (since the interface was already deleted).
Just silently fail with this case and let the other DEL_INTERFACE
path handle the re-creation.
With really badly timed events a wiphy can be registered twice. This
happens when IWD starts and requests a wiphy dump. Immediately after
a NEW_WIPHY event comes in (presumably when the driver loads) which
starts another dump. The NEW_WIPHY event can't simply be ignored
since it could be a hotplug (e.g. USB card) so to fix this we can
instead just prevent it from being registered.
This does mean both dumps will happen but the information will just
be added to the same wiphy object.
Past commits should address any potential problems of the timer
firing during FT, but its still good practice to cancel the timer
once it is no longer needed, i.e. once FT has started.
If station has already started FT ensure station_cannot_roam takes
that into account. Since the state has not yet changed it must also
check if the FT work ID is set.
Under the following conditions IWD can accidentally trigger a second
roam scan while one is already in progress:
- A low RSSI condition is met. This starts the roam rearm timer.
- A packet loss condition is met, which triggers a roam scan.
- The roam rearm timer fires and starts another roam scan while
also overwriting the first roam scan ID.
- Then, if IWD gets disconnected the overwritten roam scan gets
canceled, and the roam state is cleared which NULL's
station->connected_network.
- The initial roam scan results then come in with the assumption
that IWD is still connected which results in a crash trying to
reference station->connected_network.
This can be fixed by adding a station_cannot_roam check in the rearm
timer. If IWD is already doing a roam scan station->preparing_roam
should be set which will cause it to return true and stop any further
action.
Aborting (signal 11) [/usr/libexec/iwd]
iwd[426]: ++++++++ backtrace ++++++++
iwd[426]: #0 0x7f858d7b2090 in /lib/x86_64-linux-gnu/libc.so.6
iwd[426]: #1 0x443df7 in network_get_security() at ome/locus/workspace/iwd/src/network.c:287
iwd[426]: #2 0x421fbb in station_roam_scan_notify() at ome/locus/workspace/iwd/src/station.c:2516
iwd[426]: #3 0x43ebc1 in scan_finished() at ome/locus/workspace/iwd/src/scan.c:1861
iwd[426]: #4 0x43ecf2 in get_scan_done() at ome/locus/workspace/iwd/src/scan.c:1891
iwd[426]: #5 0x4cbfe9 in destroy_request() at ome/locus/workspace/iwd/ell/genl.c:676
iwd[426]: #6 0x4cc98b in process_unicast() at ome/locus/workspace/iwd/ell/genl.c:954
iwd[426]: #7 0x4ccd28 in received_data() at ome/locus/workspace/iwd/ell/genl.c:1052
iwd[426]: #8 0x4c79c9 in io_callback() at ome/locus/workspace/iwd/ell/io.c:120
iwd[426]: #9 0x4c62e3 in l_main_iterate() at ome/locus/workspace/iwd/ell/main.c:476
iwd[426]: #10 0x4c6426 in l_main_run() at ome/locus/workspace/iwd/ell/main.c:519
iwd[426]: #11 0x4c6752 in l_main_run_with_signal() at ome/locus/workspace/iwd/ell/main.c:645
iwd[426]: #12 0x405987 in main() at ome/locus/workspace/iwd/src/main.c:600
iwd[426]: #13 0x7f858d793083 in /lib/x86_64-linux-gnu/libc.so.6
iwd[426]: +++++++++++++++++++++++++++
If the authenticator has already set an snonce then the packet must
be a retransmit. Handle this by sending 3/4 again but making sure
to not reset the frame counter.
Old wpa_supplicant versions do not set the secure bit on 2/4 during
rekeys which causes IWD to reject the message and eventually time out.
Modern versions do set it correctly but even Android 13 (Pixel 5a)
still uses an ancient version of wpa_supplicant which does not set the
bit.
Relax this check and instead just print a warning but allow the message
to be processed.
In try_handshake_complete() we return early if all the keys had
been installed before (initial associations). For rekeys we can
now emit the REKEY_COMPLETE event which lets AP mode reset the
rekey timer for that station.
When the TK is installed the 'ptk_installed' flag was never set to
zero. For initial associations this was fine (already zero) but for
rekeys the flag needs to be unset so try_handshake_complete knows
if the key was installed. This is consistent with how gtk/igtk keys
work as well.
Rekeys for station mode don't need to know when complete since
there is nothing to do once done. AP mode on the other hand needs
to know if the rekey was successful in order to reset/set the next
rekey timer.
The second handshake message was hard coded with the secure bit as
zero but for rekeys the secure bit should be set to 1. Fix this by
changing the 2/4 builder to take a boolean which will set the bit
properly.
It should be noted that hostapd doesn't check this bit so EAPoL
worked just fine, but IWD's checks are more strict.
The PEAP RFC wants implementations to enforce that Phase2 methods have
been successfully completed prior to accepting a successful result TLV.
However, when TLS session resumption is used, some servers will skip
phase2 methods entirely and simply send a Result TLV with a success
code. This results in iwd (erroneously) rejecting the authentication
attempt.
Fix this by marking phase2 method as successful if session resumption is
being used.
This adds a builder which sets the country IE in probes/beacons.
The IE will use the 'single subband triplet sequence' meaning
dot11OperatingClassesRequired is false. This is much easier to
build and doesn't require knowing an operating class.
The IE itself is variable in length and potentially could grow
large if the hardware has a weird configuration (many different
power levels or segmentation in supported channels) so the
overall builder was changed to take the length of the buffer and
warnings will be printed if any space issues are encountered.
IWD's channel/frequency conversions use simple math to convert and
have very minimal checks to ensure the input is valid. This can
lead to some channels/frequencies being calculated which are not
in IWD's E-4 table, specifically in the 5GHz band.
This is especially noticable using mac80211_hwsim which includes
some obscure high 5ghz frequencies which are not part of the 802.11
spec.
To fix this calculate the frequency or channel then iterate E-4
operating classes to check that the value actually matches a class.
If supported this will include the HT capabilities and HT
operations elements in beacons/probes. Some shortcuts were taken
here since not all the information is currently parsed from the
hardware. Namely the HT operation element does not include the
basic MCS set. Still, this will at least show stations that the
AP is capable of more than just basic rates.
The builders themselves are structured similar to the basic rates
builder where they build only the contents and return the length.
The caller must set the type/length manually. This is to support
the two use cases of using with an IE builder vs direct pointer.
To include HT support a chandef needs to be created for whatever
frequency is being used. This allows IWD to provide a secondary
channel to the kernel in the case of 40MHz operation. Now the AP
will generate a chandef when starting based on the channel set
in the user profile (or default).
If HT is not supported the chandef width is set to 20MHz no-HT,
otherwise band_freq_to_ht_chandef is used.
The WMM parameter IE is expected by the linux kernel for any AP
supporting HT/VHT etc. IWD won't actually use WMM and its not
clear exactly why the kernel uses this restriction, but regardless
it must be included to support HT.
For AP mode its convenient for IWD to choose an appropriate
channel definition rather than require the user provide very
low level parameters such as channel width, center1 frequency
etc. For now only HT is supported as VHT/HE etc. require
additional secondary channel frequencies.
The HT API tries to find an operating class using 40Mhz which
complies with any hardware restrictions. If an operating class is
found that is supported/not restricted it is marked as 'best' until
a better one is found. In this case 'better' is a larger channel
width. Since this is HT only 20mhz and 40mhz widths are checked.
This adds some additional parsing to obtain the AMPDU parameter
byte as well as wiphy_get_ht_capabilities() which returns the
complete IE (combining the 3 separate kernel attributes).
The supported rates IE was being built in two places. This makes that
code common. Unfortunately it needs to support both an ie builder
and using a pointer directly which is why it only builds the contents
of the IE and the caller must set the type/length.
Move the l_netconfig_set_route_priority() and
l_netconfig_set_optimistic_dad_enabled() calls from netconfig_new, which
is called once for the l_netconfig object's lifetime, to
netconfig_load_settings, which is called before every connection attempt.
This is needed because we clean up the l_netconfig configuration by calling
l_netconfig_reset_config() at different points in connection setup and
teardown so we'd reset the route priority that we've set in netconfig_new,
back to 0 and never reload it.
The disabled_freqs list is being removed and replaced with a new
list in the band object. This completely removes the need for
the pending_freqs list as well since any regdom related dumps
can just overwrite the existing frequency list.
This adds two new APIs:
wiphy_get_frequency_info(): Used to get information about a given
frequency such as disabled/no-IR. This can also be used to check
if the frequency is supported (NULL return is unsupported).
wiphy_band_is_disabled(): Checks if a band is disabled. Note that
an unsupported band will also return true. Checking support should
be done with wiphy_get_supported_bands()
As additional frequency info is needed it doesn't make sense to
store a full list of frequencies for every attribute (i.e.
supported, disabled, no-IR, etc).
This changes nl80211_parse_supported_frequencies to take a list
of frequency attributes where each index corresponds to a channel,
and each value can be filled with flag bits to signal any
limitations on that frequency.
wiphy.c then had to be updated to use this rather than the existing
scan_freq_set lists. This, as-is, will break anything using
wiphy_get_disabled_freqs().
Currently the wiphy object keeps track of supported and disabled
frequencies as two separate scan_freq_set's. This is very expensive
and limiting since we have to add more sets in order to track
additional frequency flags (no-IR, no-HT, no-HE etc).
Instead we can refactor how frequencies are stored. They will now
be part of the band object and stored as a list of flag structures
where each index corresponds to a channel
IWD was optimizing FT-over-DS by authenticating to multiple BSS's
at the time of connecting which then made future roams slightly
faster since they could jump right into association. So far this
hasn't posed a problem but it was reported that some AP's actually
enforce a reassociation timeout (included in 4-way handshake).
Hostapd itself does no such enforcement but anything external to
hostapd could monitor FT events and clear the cache if any exceeded
this timeout.
For now remove the early action frames and treat FT-over-DS the
same as FT-over-Air. In the future we could parse the reassociation
timeout, batch out FT-Action frames and track responses but for the
time being this just fix the issue at a small performance cost.
Queue the FT action just like we do with FT Authenticate which makes
it able to be used the same way, i.e. call ft_action() then queue
the ft_associate work right away.
A timer was added to end the work item in case the target never
responds.
If the regdom updates during a periodic scan the results will be
delayed until after the update in order to, potentially, add 6GHz
frequencies since they may become available. The delayed results
happen regardless of 6GHz support but scan_wiphy_watch() was
returning early if 6GHz was not supported causing the scan request
to never complete.
The blamed commit argues that the periodic scan callback doesn't do
anything useful in the event of an aborted scan, but this is not
entirely true. In particular, the callback is responsible for re-arming
the periodic scan timer. Make sure to call scan_finished() so that iwd's
periodic scanning logic continues unabated even when a periodic scan is
aborted.
Also remove the periodic boolean member of struct scan_request, as it
serves no purpose anymore.
Fixes: 6051a14952 ("scan: Don't callback on SCAN_ABORTED")
This enables IWD to use 5GHz frequencies in AP mode. Currently
6GHz is not supported so we can assume a [General].Channel value
36 or above indicates the 5GHz band.
It should be noted that the system will probably need a regulatory
domain set in order for 5GHz to be allowed in AP mode. This is due
to world roaming (00) restricting any/all 5GHz frequencies. This
can be accomplished by setting main.conf [General].Country=CC to
the country this AP will operate in.
wiphy_get_supported_rates expected an enum defined in the nl80211
header but the argument type was an unsigned int, not exactly
intuitive to anyone using the API. Since the nl80211 enum value
was only used in a switch statement it could just as well be IWD's
internal enum band_freq.
This also allows modules which do not reference nl80211.h to use
wiphy_get_supported_rates().
If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes
that a scan is ongoing on the underlying wdev and will retry the same
command when that scan is complete. It gets notified of that completion
via the scan_notify() function, and kicks the scan logic to try again.
However, if there is another wdev on the same wiphy and that wdev has a
scan request in flight, the kernel will also return -EBUSY. In other
words, only one scan request per wiphy is permitted.
As an example, the brcmfmac driver can create an AP interface on the
same wiphy as the default station interface, and scans can be triggered
on that AP interface.
If -EBUSY is returned because another wdev is scanning, then iwd won't
know when it can retry the original trigger request because the relevant
netlink event will arrive on a different wdev. Indeed, if no scan
context exists for that other wdev, then scan_notify will return early
and the scan logic will stall indefinitely.
Instead, and in the event that no scan context matches, use it as a cue
to retry a pending scan request that happens to be destined for the same
wiphy.
The previous commit added an invocation of known_networks_watch_add, but
never updated the module dependency graph.
Fixes: a793a41662 ("station, eapol: Set up eap-tls-common for session caching")
Use eap_set_peer_id() to set a string identifying the TLS server,
currently the hex-encoded SSID of the network, to be used as group name
and primary key in the session cache l_settings object. Provide pointers
to storage_eap_tls_cache_{load,sync} to eap-tls-common.c using
eap_tls_set_session_cache_ops(). Listen to Known Network removed
signals and call eap_tls_forget_peer() to have any session related to
the network also dropped from the cache.
Use l_tls_set_session_cache() to enable session cache/resume in the
TLS-based EAP methods. Sessions for all 802.1x networks are stored in
one l_settings object.
eap_{get,set}_peer_id() API is added for the upper layers to set the
identifier of the authenticator (or the supplicant if we're the
authenticator, if there's ever a use case for that.)
eap-tls-common.c can't call storage_eap_tls_cache_{load,sync}()
or known_networks_watch_add() (to handle known network removals) because
it's linked into some executables that don't have storage.o,
knownnetworks.o or common.o so an upper layer (station.c) will call
eap_tls_set_session_cache_ops() and eap_tls_forget_peer() as needed.
Minor changes to these two methods resulting from two rewrites of them.
Actual changes are:
* storage_tls_session_sync parameter is const,
* more specific naming,
* storage_tls_session_load will return an empty l_settings instead of
NULL so eap-tls-common.c doesn't have to handle this.
storage.c makes no assumptions about the group names in the l_settings
object and keeps no reference to that object, eap-tls-common.c is going
to maintain the memory copy of the cache since this cache and the disk
copy of it are reserved for EAP methods only.
A comma separated list as a string was ok for pure display purposes
but if any processing needed to be done on these values by external
consumers it really makes more sense to use a DBus array.
This wasn't being updated meaning the property is missing until a
scan is issued over DBus.
Rather than duplicate all the property changed calls they were all
factored out into a helper function.
Adds the MulticastDNS option globally to main.conf. If set all
network connections (when netconfig is enabled) will set mDNS
support into the resolver. Note that an individual network profile
can still override the global value if it sets MulticastDNS.
The limitation of cipher selection in ap.c was done so to allow p2p to
work. Now with the ability to specify ciphers in the AP config put the
burden on p2p to limit ciphers as it needs which is only CCMP according
to the spec.
These can now be optionally provided in an AP profile and provide a
way to limit what ciphers can be chosen. This still is dependent on
what the hardware supports.
The validation of these ciphers for station is done when parsing
the BSS RSNE but for AP mode there is no such validation and
potentially any supported cipher could be chosen, even if its
incompatible for the type of key.
The netdev_copy_tk function was being hard coded with authenticator
set to false. This isn't important for any ciphers except TKIP but
now that AP mode supports TKIP it needs to be fixed.
Though TKIP is deprecated and insecure its trivial to support it in
AP mode as we already do in station. This is only to allow AP mode
for old hardware that may only support TKIP. If the hardware supports
any higher level cipher that will be chosen automatically.
The key descriptor version was hard coded to HMAC_SHA1_AES which
is correct when using IE_RSN_AKM_SUITE_PSK + CCMP. ap.c hard
codes the PSK AKM but still uses wiphy to select the cipher. In
theory there could be hardware that only supports TKIP which
would then make IWD non-compliant since a different key descriptor
version should be used with PSK + TKIP (HMAC_MD5_ARC4).
Now use a helper to sort out which key descriptor should be used
given the AKM and cipher suite.
Similarly to l_netconfig track whether IWD's netconfig is active (from
the moment of netconfig_configure() till netconfig_reset()) using a
"started" flag and avoid handling or emitting any events after "started"
is cleared.
This fixes an occasional issue with the Netconfig Agent backend where
station would reset netconfig, netconfig would issue DBus calls to clear
addresses and routes, station would go into DISCONNECTING, perhaps
finish and go into DISCONNECTED and after a while the DBus calls would
come back with an error which would cause a NETCONFIG_EVENT_FAILED
causing station to call netdev_disconnct() for a second time and
transition to and get stuck in DISCONNECTING.
Both CMD_ASSOCIATE and CMD_CONNECT paths were using very similar code to
build RSN specific attributes. Use a common function to build these
attributes to cut down on duplicated code.
While here, also start using ie_rsn_cipher_suite_to_cipher instead of
assuming that the pairwise / group ciphers can only be CCMP or TKIP.
Instead of copy-pasting the same basic operation (memcpy & assignment),
use a goto and a common path instead. This should also make it easier
for the compiler to optimize this function.
The known frequency list may include frequencies that once were
allowed but are now disabled due to regulatory restrictions. Don't
include these frequencies in the roam scan.
These events are sent if IWD fails to authentiate
(ft-over-air-roam-failed) or if it falls back to over air after
failing to use FT-over-DS (try-ft-over-air)
If IPv4 setup fails and the netconfig logic gives up, continue as if the
connection had failed at earlier stages so that autoconnect can try the
next available network.
Certain drivers support/require probe response offloading which
IWD did not check for or properly handle. If probe response
offloading is required the probe response frame watch will not
be added and instead the ATTR_PROBE_RESP will be included with
START_AP.
The head/tail builders were reused but slightly modified to check
if the probe request frame is NULL, since it will be for use with
START_AP.
Parse the AP probe response offload attribute during the dump. If
set this indicates the driver expects the probe response attribute
to be included with START_AP.
Clearing all authentications during ft_authenticate was a very large
hammer and may remove cached authentications that could be used if
the current auth attempt fails.
For example the best BSS may have a problem and fail to authenticate
early with FT-over-DS, then fail with FT-over-Air. But another BSS
may have succeeded early with FT-over-DS. If ft_authenticate clears
all ft_infos that successful authentication will be lost.
AP roaming was structured such that any AP roam request would
force IWD to roam (assuming BSS's were found in scan results).
This isn't always the best behavior since IWD may be connected
to the best BSS in range.
Only force a roam if the AP includes one of the 3 disassociation/
termination bits. Otherwise attempt to roam but don't set the
ap_directed_roaming flag which will allows IWD to stay with the
current BSS if no better candidates are found.
There are a few checks that can be done prior to parsing the
request, in addition the explicit check for preparing_roam was
removed since this is taken care of by station_cannot_roam().
Once offchannel completes we can check if the info structure was
parsed, indicating authentication succeeded. If not there is no
reason to keep it around since IWD will either try another BSS or
fail.
This both adds proper handling to the new roaming logic and fixes
a potential bug with firmware roams.
The new way roaming works doesn't use a connect callback. This
means that any disconnect event or call to netdev_connect_failed
will result in the event handler being called, where before the
connect callback would. This means we need to handle the ROAMING
state in the station disconnect event so IWD properly disassociates
and station goes out of ROAMING.
With firmware roams netdev gets an event which transitions station
into ROAMING. Then netdev issues GET_SCAN. During this time a
disconnect event could come in which would end up in
station_disconnect_event since there is no connect callback. This
needs to be handled the same and let IWD transition out of the
ROAMING state.
This finalizes the refactor by moving all the handshake prep
into FT itself (most was already in there). The netdev-specific
flags and state were added into netdev_ft_tx_associate which
now avoids any need for a netdev API related to FT.
The NETDEV_EVENT_FT_ROAMED event is now emitted once FT completes
(netdev_connect_ok). This did require moving the 'in_ft' flag
setting until after the keys are set into the kernel otherwise
netdev_connect_ok has no context as to if this was FT or some
other connection attempt.
In addition the prev_snonce was removed from netdev. Restoring
the snonce has no value once association begins. If association
fails it will result in a disconnect regardless which requires
a new snonce to be generated
This converts station to using ft_action/ft_authenticate and
ft_associate and dropping the use of the netdev-only/auth-proto
logic.
Doing this allows for more flexibility if FT fails by letting
IWD try another roam candidate instead of disconnecting.
Now the full action frame including the header is provided to ft
which breaks the existing parser since it assumes the buffer starts
at the body of the message.
This forwards Action, Authentication and Association frames to
ft.c via their new hooks in netdev.
Note that this will break FT-over-Air temporarily since the
auth-proto still is in use.
The current behavior is to only find the best roam candidate, which
generally is fine. But if for whatever reason IWD fails to roam it
would be nice having a few backup BSS's rather than having to
re-scan, or worse disassociate and reconnect entirely.
This patch doesn't change the roam behavior, just prepares for
using a roam candidate list. One difference though is any roam
candidates are added to station->bss_list, rather than just the
best BSS. This shouldn't effect any external behavior.
The candidate list is built based on scan_bss rank. First we establish
a base rank, the rank of the current BSS (or zero if AP roaming). Any
BSS in the results with a higher rank, excluding the current BSS, will
be added to the sorted station->roam_bss_list (as a new 'roam_bss'
entry) as well as stations overall BSS list. If the resulting list is
empty there were no better BSS's, otherwise station can now try to roam
starting with the best candidate (head of the roam list).
A new API was added, ft_authenticate, which will send an
authentication frame offchannel via CMD_FRAME. This bypasses
the kernel's authentication state allowing multiple auth
attempts to take place without disconnecting.
Currently netdev handles caching FT auth information and uses FT
parsers/auth-proto to manage the protocol. This sets up to remove
this state machine from netdev and isolate it into ft.c.
This does not break the existing auth-proto (hence the slight
modifications, which will be removed soon).
Eventually the auth-proto will be removed from FT entirely, replaced
just by an FT state machine, similar to how EAPoL works (netdev hooks
to TX/RX frames).
There may be situations (due to Multi-BSS operation) where an AP might
be advertising multiple SSIDs on the same BSSID. It is thus more
correct to lookup the preauthentication target on the network object
instead of the station bss_list. It used to be that the network list of
bsses was not updated when roam scan was performed. Hence the lookup
was always performed on the station bss_list. But this is no longer the
case, so it is safer to lookup on the network object directly on the
network.
The warnings in the authenticate and connect events were identical
so it could be difficult knowing which print it was if IWD is not
in debug mode (to see more context). The prints were changed to
indicate which event it was and for the connect event the reason
attribute is also parsed.
Note the resp_ies_len is also initialized to zero now. After making
the changes gcc was throwing a warning.
FT is special in that it really should not be interrupted. Since
FRAME/OFFCHANNEL have the highest priority we run the risk of
DPP or some other offchannel operation interfering with FT.
FT is now driven (mostly) by station which removes the connect
callback. Instead once FT is completed, keys set, etc. netdev
will send an event to notify station.
Since l_netconfig's DHCPv6 client instance no longer sets parameters on
the l_icmp6_client instance, call l_icmp6_client_set_nodelay() and
l_icmp6_client_set_debug() directly. Also enable optimistic DAD to
speed up IPv6 setup if available.
All uses of frame-xchg were for action frames, and the frame type
was hard coded. Soon other frame types will be needed so the type
must now be specified in the frame_xchg_prefix structure.
This will make the debug API more robust as well as fix issues
certain drivers have when trying to roam. Some of these drivers
may flush scan results after CMD_CONNECT which results in -ENOENT
when trying to roam with CMD_AUTHENTICATE unless you rescan
explicitly.
Now this will be taken care of automatically and station will first
scan for the BSS (or full scan if not already in results) and
attempt to roam once the BSS is seen in a fresh scan.
The logic to replace the old BSS object was factored out into its
own function to be shared by the non-debug roam scan. It was also
simplified to just update the network since this will remove the
old BSS if it exists.
Add a second netconfig-commit backend which, if enabled, doesn't
directly send any of the network configuration to the kernel or system
files but delegates the operation to an interested client's D-Bus
method as described in doc/agent-api.txt. This backend is switched to
when a client registers a netconfig agent object and is swiched away
from when the client disconnects or unregisters the agent. Only one
netconfig agent can be registered any given time.
Add netconfig_event_handler() that responds to events emitted by
the l_netconfig object by calling netconfig_commit, tracking whether
we're connected for either address family and emitting
NETCONFIG_EVENT_CONNECTED or NETCONFIG_EVENT_FAILED as necessary.
NETCONFIG_EVENT_FAILED is a new event as until now failures would cause
the netconfig state machine to stop but no event emitted so that
station.c could take action. As before, these events are only
emitted based on the IPv4 configuration state, not IPv6.
Add netconfig-commit.c whose main method, netconfig_commit actually sets
the configuration obtained by l_netconfig to the system netdev,
specifically it sets local addresses on the interface, adds routes to the
routing table, sets DNS related data and may add entries to the neighbor
cache. netconfig-commit.c uses a backend-ops type structure to allow
for switching backends. In this commit there's only a default backend
that uses l_netconfig_rtnl_apply() and a struct resolve object to write
the configuration.
netconfig_gateway_to_arp is moved from netconfig.c to netconfig-commit.c
(and renamed.) The struct netconfig definition is moved to netconfig.h
so that both files can access the settings stored in the struct.
To avoid repeated lookups by ifindex, replace the ifindex member in
struct netconfig with a struct netdev pointer. A struct netconfig
always lives shorter than the struct netdev.
* make the error handling simpler,
* make error messages more consistent,
* validate address families,
* for IPv4 skip l_rtnl_address_set_noprefixroute()
as l_netconfig will do this internally as needed.
* for IPv6 set the default prefix length to 64 as that's going to be
used for the local prefix route's prefix length and is a more
practical value.
Drop all the struct netconfig members where we were keeping the parsed
netconfig settings and add a struct l_netconfig object. In
netconfig_load_settings load all of the settings once parsed directly
into the l_netconfig object. Only preserve the mdns configuration and
save some boolean values needed to properly handle static configuration
and FILS. Update functions to use the new set of struct netconfig
members.
These booleans mirroring the l_netconfig state could be replaced by
adding l_netconfig getters for settings which currently only have
setters.
In anticipation of switching to use the l_netconfig API, which
internally handles DHCPv4, DHCPv6, ACD, etc., drop pointers to
instances of l_dhcp_client, l_dhcp6_client and l_acd from struct
netconfig. Also drop all code used for handling events from these
APIs, including code to commit the received configurations to the
system. Committing the final settings to the system netdevs is going to
be handled by a new set of utilities in a new file.
The RRM module was blindly scanning using the requested
frequency which may or may not be possible given the hardware.
Instead check that the frequency will work and if not reject
the request.
This was reported by a user seeing the RRM scan fail which was
due to the AP requesting a scan on 5GHz when the adapter was
2.4GHz only.
Support for MAC address changes while powered was recently added to
mac80211. This avoids the need to power down the device which both
saves time as well as preserves any allowed frequencies which may
have been disabled if the device powered down.
The code path for changing the address was reused but now just the
'up' callback will be provided directly to l_rtnl_set_mac. Since
there aren't multiple stages of callbacks the rtnl_data structure
isn't strictly needed, but the code looks cleaner and more
consistent between the powered/non-powered code paths.
The comment/debug error print was also updated to be more general
between the two MAC change code paths.
Documentation for MulticastDNS setting suggests it should be part of the
main iwd configuration file. See man iwd.config. However, in reality
the setting was being pulled from the network provisioning file instead.
The latter actually makes more sense since systemd-resolved has its own
set of global defaults. Fix the documentation to reflect the actual
implementation.
netdev does not keep any pointers to struct scan_bss arguments that are
passed in. Make this explicitly clear by modifying the API definitions
and mark these as const.
This adds a new netdev event for packet loss notifications from
the kernel. Depending on the scenario a station may see packet
loss events without any other indications like low RSSI. In these
cases IWD should still roam since there is no data flowing.
Some APs use an older hostapd OWE implementation which incorrectly
derives the PTK. To work around this group 19 should be used for
these APs. If there is a failure (reason=2) and the AKM is OWE
set force default group into network and retry. If this has been
done already the behavior is no different and the BSS will be
blacklisted.
If a OWE network is buggy and requires the default group this info
needs to be stored in network in order for it to set this into the
handshake on future connect attempts.
This functionality works around the kernel's behavior of allowing
6GHz only after a regulatory domain update. If the regdom updates
scan.c needs to be aware in order to split up periodic scans, or
insert 6GHz frequencies into an ongoing periodic scan. Doing this
allows any 6GHz BSS's to show up in the scan results rather than
needing to issue an entirely new scan to see these BSS's.
The kernel's regulatory domain updates after some number of beacons
are processed. This triggers a regulatory domain update (and wiphy
dump) but only after a scan request. This means a full scan started
prior to the regdom being set will not include any 6Ghz BSS's even
if the regdom was unlocked during the scan.
This can be worked around by splitting up a large scan request into
multiple requests allowing one of the first commands to trigger a
regdom update. Once the regdom updates (and wiphy dumps) we are
hopefully still scanning and could append an additional request to
scan 6GHz.
In the case of an external scan, we won't have a scan_request object,
sr. Make sure to not crash in this case.
Also, since scan_request can no longer carry the frequency set in all
cases, add a new member to scan_results in order to do so.
Fixes: 27d8cf4ccc ("scan: track scanned frequencies for entire request")
The kernel handles setting the regulatory domain by receiving beacons
which set the country IE. Presumably since most regulatory domains
disallow 6GHz the default (world) domain also disables it. This means
until the country is set, 6GHz is disabled.
This poses a problem for IWD's quick scanning since it only scans a few
frequencies and this likely isn't enough beacons for the firmware to
update the country, leaving 6Ghz inaccessable to the user without manual
intervention (e.g. iw scan passive, or periodic scans by IWD).
To try and work around this limitation the quick scan logic has been
updated to check if a 6GHz AP has been connected to before and if that
frequency is disabled (but supported). If this is the case IWD will opt
for a full passive scan rather than scanning a limited set of
frequencies.
For whatever reason the kernel will send regdom updates even if
the regdom didn't change. This ends up causing wiphy to dump
which isn't needed since there should be no changes in disabled
frequencies.
Now the previous country is checked against the new one, and if
they match the wiphy is not dumped again.
A change in regulatory domain can result in frequencies being
enabled or disabled depending on the domain. This effects the
frequencies stored in wiphy which other modules depend on
such as scanning, offchannel work etc.
When the regulatory domain changes re-dump the wiphy in order
to update any frequency restrictions.
A helper to check whether the country code corresponds to a
real country, or some special code indicating the country isn't
yet set. For now, the special codes are OO (world roaming) and
XX (unknown entity).
Events to indicate when a regulatory domain wiphy dump has
started and ended. This is important because certain actions
such as scanning need to be delayed until the dump has finished.
The NEW_SCAN_RESULTS handling was written to only parse the frequency
list if there were no additional scan commands to send. This results in
the scan callback containing frequencies of only the last CMD_TRIGGER.
Until now this worked fine because a) the queue is only used for hidden
networks and b) frequencies were never defined by any callers scanning
for hidden networks (e.g. dbus/periodic scans).
Soon the scan command queue will be used to break up scan requests
meaning only the last scan request frequencies would be used in the
callback, breaking the logic in station.
Now the NEW_SCAN_RESULTS case will parse the frequencies for each scan
command rather than only the last.
The compiler treated the '1' as an int type which was not big enough
to hold a bit shift of 31:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
Instead of doing the iftype check manually, refactor
wiphy_get_supported_iftypes by adding a subroutine which just parses
out iftypes from a mask into a char** list. This removes the need to
case each iftype into a string.
Add extra logging around CQM events to help track wifi status. This is
useful for headless systems that can only be accessed over the network
and so information in the logs is invaluable for debugging outages.
Prior to this change, the only log for CQM messages is saying one was
received. This adds details to what attributes were set and the
associated data with them.
The signal strength log format was chosen to roughly match
wpa_supplicant's which looks like this:
CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-60 noise=-96 txrate=6000
Provides useful information on why a roam might have failed, such as
failing to find the BSS or the BSS being ranked lower, and why that
might be.
The output format is the same as station_add_seen_bss for consistency.
If a frequency is disabled IWD should keep track and disallow any
operations on that channel such as scanning. A new list has been added
which contains only disabled frequencies.
The scan_passive API wasn't using a const struct scan_freq_set as it
should be since it's not modifying the contents. Changing this to
const did require some additional changes like making the scan_parameters
'freqs' member const as well.
After changing scan_parameters, p2p needed updating since it was using
scan_parameters.freqs directly. This was changed to using a separate
scan_freq_set pointer, then setting to scan_parameters.freqs when needed.
Similar to the HT/VHT APIs, this estimates the data rate based on the
HE Capabilities element, in addition to our own capabilities. The
logic is much the same as HT/VHT. The major difference being that HE
uses several MCS tables depending on the channel width. Each width
MCS set is checked (if supported) and the highest estimated rate out
of all the MCS sets is used.
There appears to be a compiler bug with gcc 11.2 which thinks the vht_mcs_set
is a zero length array, and the memset of size 8 is out of bounds. This is only
seen once an element is added to 'struct band'.
In file included from /usr/include/string.h:519,
from src/wiphy.c:34:
In function ‘memset’,
inlined from ‘band_new_from_message’ at src/wiphy.c:1300:2,
inlined from ‘parse_supported_bands’ at src/wiphy.c:1423:11,
inlined from ‘wiphy_parse_attributes’ at src/wiphy.c:1596:5,
inlined from ‘wiphy_update_from_genl’ at src/wiphy.c:1773:2:
/usr/include/bits/string_fortified.h:59:10: error: ‘__builtin_memset’ offset [0, 7] is out of the bounds [0, 0] [-Werror=array-bounds]
59 | return __builtin___memset_chk (__dest, __ch, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
60 | __glibc_objsize0 (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
This increases the maximum data rate which now is possible with HE.
A few comments were also updated, one to include 6G when adjusting
the rank for >4000mhz, and the other fixing a typo.
This is a general way of finding the best MCS/NSS values which will work
for HT, VHT, and HE by passing in the max MCS values for each value which
the MCS map could contain (0, 1, or 2).
The HE capabilities information is contained in
NL80211_BAND_ATTR_IFTYPE_DATA where each entry is a set of attributes
which define the rules for one or more interface types. This patch
specifically parses the HE PHY and HE MCS data which will be used for
data rate estimation.
Since the set of info is per-iftype(s) the data is stored in a queue
where each entry contains the PHY/MCS info, and a uint32 bit mask where
each bit index signifies an interface type.
With the addition of HE, the print function for MCS sets needs to change
slightly. The maps themselves are the same format, but the values indicate
different MCS ranges. Now the three MCS max values are passed in.
This queue will hold iftype(s) specific data for HE capabilities. Since
the capabilities may differ per-iftype the data is stored as such. Iftypes
may share a configuration so the band_he_capabilities structure has a
mask for each iftype using that configuration.
Certain module dependencies were missing, which could cause a crash on
exit under (very unlikely) circumstances.
#0 l_queue_peek_head (queue=<optimized out>) at ../iwd-1.28/ell/queue.c:241
#1 0x0000aaaab752f2a0 in wiphy_radio_work_done (wiphy=0xaaaac3a129a0, id=6)
at ../iwd-1.28/src/wiphy.c:2013
#2 0x0000aaaab7523f50 in netdev_connect_free (netdev=netdev@entry=0xaaaac3a13db0)
at ../iwd-1.28/src/netdev.c:765
#3 0x0000aaaab7526208 in netdev_free (data=0xaaaac3a13db0) at ../iwd-1.28/src/netdev.c:909
#4 0x0000aaaab75a3924 in l_queue_clear (queue=queue@entry=0xaaaac3a0c800,
destroy=destroy@entry=0xaaaab7526190 <netdev_free>) at ../iwd-1.28/ell/queue.c:107
#5 0x0000aaaab75a3974 in l_queue_destroy (queue=0xaaaac3a0c800,
destroy=destroy@entry=0xaaaab7526190 <netdev_free>) at ../iwd-1.28/ell/queue.c:82
#6 0x0000aaaab7522050 in netdev_exit () at ../iwd-1.28/src/netdev.c:6653
#7 0x0000aaaab7579bb0 in iwd_modules_exit () at ../iwd-1.28/src/module.c:181
In this particular case, wiphy module was de-initialized prior to the
netdev module:
Jul 14 18:14:39 localhost iwd[2867]: ../iwd-1.28/src/wiphy.c:wiphy_free() Freeing wiphy phy0[0]
Jul 14 18:14:39 localhost iwd[2867]: ../iwd-1.28/src/netdev.c:netdev_free() Freeing netdev wlan0[45]
This fixes a crash associated with toggling the iftype to AP mode
then calling GetDiagnostics. The diagnostic interface is never
cleaned up when netdev goes down so DBus calls can still be made
which ends up crashing since the AP interface objects are no longer
valid.
Running the following iwctl commands in a script (once or twice)
triggers this crash reliably:
iwctl device wlp2s0 set-property Mode ap
iwctl device wlp2s0 set-property Mode station
iwctl device wlp2s0 set-property Mode ap
iwctl ap wlp2s0 start myssid secret123
iwctl ap wlp2s0 show
++++++++ backtrace ++++++++
0 0x7f8f1a8fe320 in /lib64/libc.so.6
1 0x451f35 in ap_dbus_get_diagnostics() at src/ap.c:4043
2 0x4cdf5a in _dbus_object_tree_dispatch() at ell/dbus-service.c:1815
3 0x4bffc7 in message_read_handler() at ell/dbus.c:285
4 0x4b5d7b in io_callback() at ell/io.c:120
5 0x4b489b in l_main_iterate() at ell/main.c:476
6 0x4b49a6 in l_main_run() at ell/main.c:519
7 0x4b4cd9 in l_main_run_with_signal() at ell/main.c:645
8 0x404f5b in main() at src/main.c:600
9 0x7f8f1a8e8b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
About a month ago hostapd was changed to set the secure bit on
eapol frames during rekeys (bc36991791). The spec is ambiguous
about this and has conflicting info depending on the sections you
read (12.7.2 vs 12.7.6). According to the hostapd commit log TGme
is trying to clarify this and wants to set secure=1 in the case
of rekeys. Because of this, IWD is completely broken with rekeys
since its disallows secure=1 on PTK 1/4 and 2/4.
Now, a bool is passed to the verify functions which signifies if
the PTK has been negotiated already. If secure differs from this
the key frame is not verified.
The man pages (iwd.network) have a section about how to name provisioning
files containing non-alphanumeric characters but not everyone reads the
entire man page.
Warning them that the provisioning file was not read and pointing to
'man iwd.network' should lead someone in the right direction.
EAP-Success might come in with an identifier that is incremented by 1
from the last Response packet. Since identifier field is a byte, the
value might overflow (from 255 -> 0.) This overflow isn't handled
properly resulting in EAP-Success/Failure packets with a 0 identifier
due to overflow being erroneously ignored. Fix that.
Most users of storage_network_open don't log errors when the function
returns a NULL and fall back to defaults (empty l_settings).
storage_network_open() itself only logs errors if the flie is encrypted.
Now also log an error when l_settings_load_from_file() fails to help track
down potential syntax errors.
Drop the wrong negation in the error check. Check that there are no extra
characters after prefix length suffix. Reset errno 0 before the strtoul
call, as recommended by the manpage.
This is actually a false positive only because
p2p_device_validate_conn_wfd bails out if the IE is NULL which
avoids using wfd_data_length. But its subtle and without inspecting
the code it does seem like the length could be used uninitialized.
src/p2p.c:940:7: error: variable 'wfd_data_len' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
if (dev->conn_own_wfd)
^~~~~~~~~~~~~~~~~
src/p2p.c:946:8: note: uninitialized use occurs here
wfd_data_len))
^~~~~~~~~~~~
src/p2p.c:940:3: note: remove the 'if' if its condition is always true
if (dev->conn_own_wfd)
^~~~~~~~~~~~~~~~~~~~~~
src/p2p.c:906:23: note: initialize the variable 'wfd_data_len' to silence this warning
ssize_t wfd_data_len;
^
= 0
On musl-gcc the compiler is giving a warning for igtk_key_index
and gtk_key_index being used uninitialized. This isn't possible
since they are only used if gtk/igtk are non-NULL so pragma to
ignore the warning.
src/fils.c: In function 'fils_rx_associate':
src/fils.c:580:17: error: 'igtk_key_index' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
580 | handshake_state_install_igtk(fils->hs,
igtk_key_index,igtk + 6,
igtk_len - 6, igtk);
(same error for gtk_key_index)
For network configuration files the man pages (iwd.network) state
that [General].{AlwaysRandomizeAddress,AddressOverride} are only
used if main.conf has [General].AddressRandomization=network.
This actually was not being enforced and both iwd.network settings
were still taken into account regardless of what AddressRandomization
was set to (even disabled).
The handshake setup code now checks the AddressRandomization value
and if anything other than 'network' skips the randomization.
There were a few places in dpp/dpp-util which passed a single byte but
was being read in with va_arg(va, size_t). On some architectures this was
causing failures presumably from the compiler using an integer type
smaller than size_t. As we do elsewhere, cast to size_t to force the
compiler to pass a properly sized iteger to va_arg.
After one of the eap-tls-common-based methods succeeds keep the TLS
tunnel instance until the method is freed, rather than free it the
moment the method succeeds. This fixes repeated method runs where until
now each next run would attempt to create a new TLS tunnel instance
but would have no authentication data (CA certificate, client
certificate, private key and private key passphrase) since those are
were by the old l_tls object from the moment of the l_tls_set_auth_data()
call.
Use l_tls_reset() to reset the TLS state after method success, followed
by a new l_tls_start() when the reauthentication starts.
A user reported that IWD was failing to FT in some cases and this was
due to the AP setting the Retry bit in the frame type. This was
unexpected by IWD since it directly checks the frame type against
0x00b0 which does not account for any B8-B15 bits being set.
IWD doesn't need to verify the frame type field for a few reasons:
First mpdu_validate checks the management frame type, Second the kernel
checks prior to forwarding the event. Because of this the check was
removed completely.
Reported-By: Michael Johnson <mjohnson459@gmail.com>
station_signal_agent_notify() has been refactored so that its usage is
simpler. station_rssi_level_changed() has been replaced by an inlined
call to station_signal_agent_notify().
The call to netdev_rssi_level_init() in netdev_connect_common() is
currently a no-op, because netdev->connected has not yet been set at
this stage of the connection attempt. Because netdev_rssi_level_init()
is only used twice, it's been replaced by two inlined calls to
netdev_set_rssi_level_idx().
The SignalLevelAgent API is currently broken by the system bus's
security policy, which blocks iwd's outgoing method call messages. This
patch punches a hole for method calls on the
net.connman.iwd.SignalLevelAgent interface.
There may be situations where DNS information should not be set (for
example in auto-tests where the resolver daemon is not running) or if a
user wishes to ignore DNS information obtained.
Allows granularly specifying the DHCP logging level. This allows the
user to tailor the output to what they need. By default, always display
info, errors and warnings to match the rest of iwd.
Setting `IWD_DHCP_DEBUG` to "debug", "info", "warn", "error" will limit
the logging to that level or higher allowing the default logging
verbosity to be reduced.
Setting `IWD_DHCP_DEBUG` to "1" as per the current behavior will
continue to enable debug level logging.
After the initial handshake, once the TK has been installed, all frames
coming to the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to various attacks.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted EAPoL frames
received after the initial handshake has been completed.
After the initial handshake, once the TK has been installed, all frames
coming from the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to a denial-of-service attack
where receipt of an invalid, unencrypted EAP-Failure frame generated by
an adversary results in iwd terminating an ongoing connection.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted EAP frames
received after the initial handshake has been completed.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
After the initial handshake, once the TK has been installed, all frames
coming from the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to a denial-of-service attack
where receipt of an invalid, unencrypted EAPoL 1/4 frame generated by an
adversary results in iwd terminating an ongoing connection.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted PTK 1/4 frames
received after the initial handshake has been completed.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
Do not fail an ongoing handshake when an invalid EAPoL frame is
received. Instead, follow the intent of 802.11-2020 section 12.7.2:
"EAPOL-Key frames containing invalid field values shall be silently
discarded."
This prevents a denial-of-service attack where receipt of an invalid,
unencrypted EAPoL 1/4 frame generated by an adversary results in iwd
terminating an ongoing connection.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
Periodic scan requests are meant to be performed with a lower priority
than normal scan requests. They're thus given a different priority when
inserting them into the wiphy work queue. Unfortunately, the priority
is not taken into account when they are inserted into the
sr->requests queue. This can result in the scanning code being confused
since it assumes the top of the queue is always the next scheduled or
currently ongoing scan. As a result any further wiphy_work might never be
started properly.
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 3
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_next() Starting work item 3
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/scan.c:scan_periodic_timeout() 1
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 4
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_done() Work item 3 done
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_next() Starting work item 5
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_request_triggered() Passive scan triggered for wdev 1
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_periodic_triggered() Periodic scan triggered for wdev 1
In the above log, scan request 5 (triggered by dbus) is started before
scan request 4 (periodic scan). Yet the scanning code thinks scan
request 4 was triggered.
Fix this by using the wiphy_work priority to sort the sr->requests queue
so that the scans are ordered in the same manner.
Reported-by: Alvin Šipraga <ALSI@bang-olufsen.dk>
The upstream code immediately retransmitted any no-ACK frames.
This would work in cases where the peer wasn't actively switching
channels (e.g. the ACK was simply lost) but caused unintended
behavior in the case of a channel switch when the peer was not
listening.
If either IWD or the peer needed to switch channels based on the
authenticate request the response may end up not getting ACKed
because the peer is idle, or in the middle of the hardware changing
channels. IWD would get no-ACK and immediately send the frame again
and most likely the same behavior would result. This would very
quickly increment frame_retry past the maximum and DPP would fail.
Instead when no ACK is received wait 1 second before sending out
the next frame. This can re-use the existing frame_pending buffer
and the same logic for re-transmitting.
There is a potential corner case of an offchannel frame callback
being called after ROC has ended.
This could happen in theory if a received frame is queued right as
the ROC session expires. If the ROC cancel event makes it to user
space before the frame IWD will schedule another ROC then receive
the frame. This doesn't prevent IWD from sending out another
frame since OFFCHANNEL_TX_OK is used, but it will prevent IWD from
receiving a response frame since no dwell duration is used with DPP.
To handle this an roc_started bool was added to the dpp_sm which
tracks the ROC state. If dpp_send_frame is called when roc_started
is false the frame will be saved and sent out once the ROC session
is started again.
ConnectHiddenNetwork creates a temporary network object and initiates a
connection with it. If the connection fails (due to an incorrect
passphrase or other reasons), then this temporary object is destroyed.
Delay its destruction until network_disconnected() since
network_connect_failed is called too early. Also, re-order the sequence
in station_reset_connection_state() in order to avoid using the network
object after it has been freed by network_disconnected().
Fixes: 85d9d6461f ("network: Hide hidden networks on connection error")
station_hide_network will remove and free the network object, so calling
network_close_settings will result in a crash. Make sure this is done
prior to network object's destruction.
Fixes: 85d9d6461f ("network: Hide hidden networks on connection error")
If a user connection fails on a freshly scanned psk or open hidden
network, during passphrase request or after, it shall be removed from
the network list. Otherwise, it would be possible to directly connect
to that known network, which will appear as not hidden.
p2p_peer_update_existing may be called with a scan_bss struct built from
a Probe Request frame so it can't access bss->p2p_probe_resp_info even
if peer->bss was built from a Probe Response. Check the source frame
type of the scan_bss struct before updating the Device Address.
This fixes one timing issue that would make the autotest fail often.
Since l_malloc is used the frame contents are not zero'ed automatically
which could result in random bytes being present in the frame which were
expected to be zero. This poses a problem when calculating the MIC as the
crypto operations are done on the entire frame with the expectation of
the MIC being zero.
Fixes: 83212f9b23 ("eapol: change eapol_create_common to support FILS")
explicit_bzero is used in src/storage.c since commit
01cd858760 but src/missing.h is not
included, as a result build with uclibc fails on:
/home/buildroot/autobuild/instance-0/output-1/host/lib/gcc/powerpc-buildroot-linux-uclibc/10.3.0/../../../../powerpc-buildroot-linux-uclibc/bin/ld: src/storage.o: in function `storage_init':
storage.c:(.text+0x13a4): undefined reference to `explicit_bzero'
Fixes:
- http://autobuild.buildroot.org/results/2aff8d3d7c33c95e2c57f7c8a71e69939f0580a1
This is used to hold the current BSS frequency which will be
used after IWD receives a presence announcement. Since this was
not being set, the logic was always thinking there was a channel
mismatch (0 != current_freq) and attempting to go offchannel to
'0' which resulted in -EINVAL, and ultimately protocol termination.
The logic here assumed any BSS's in the roam scan were identical to
ones in station's bss_list with the same address. Usually this is true
but, for example, if the BSS changed frequency the one in station's
list is invalid.
Instead when a match is found remove the old BSS and re-insert the new
one.
With the addition of 6GHz '6000' is no longer the maximum frequency
that could be in .known_network.freq. For more robustness
band_freq_to_channel is used to validate the frequency.
Scanning while in AP mode is somewhat of an edge case, but it does
have some usefulness specifically with onboarding new devices, i.e.
a new device starts an AP, a station connects and provides the new
device with network credentials, the new device switches to station
mode and connects to the desired network.
In addition this could be used later for ACS (though this is a bit
overkill for IWD's access point needs).
Since AP performance is basically non-existant while scanning this
feature is meant to be used in a limited scope.
Two DBus API's were added which mirror the station interface: Scan and
GetOrderedNetworks.
Scan is no different than the station variant, and will perform an active
scan on all channels.
GetOrderedNetworks diverges from station and simply returns an array of
dictionaries containing basic information about networks:
{
Name: <ssid>
SignalStrength: <mBm>
Security: <psk, open, or 8021x>
}
Limitations:
- Hidden networks are not supported. This isn't really possible since
the SSID's are unknown from the AP perspective.
- Sharing scan results with station is not supported. This would be a
convenient improvement in the future with respect to onboarding new
devices. The scan could be performed in AP mode, then switch to
station and connect immediately without needing to rescan. A quick
hack could better this situation by not flushing scan results in
station (if the kernel retains these on an iftype change).
This was already implemented in station but with no dependency on
that module at all. AP will need this for a scanning API so its
being moved into scan.c.
The 802.11ax standards adds some restrictions for the 6GHz band. In short
stations must use SAE, OWE, or 8021x on this band and frame protection is
required.
All uses of this macro will work with a bitwise comparison which is
needed for 6GHz checks and somewhat more flexible since it can be
used to compare RSN info, not only single AKM values.
This adds checks if MFP is set to 0 or 1:
0 - Always fail if the frequency is 6GHz
1 - Fail if MFPC=0 and the frequency is 6GHz.
If HW is capable set MFPR=1 for 6GHz
This is a new band defined in the WiFi 6E (ax) amendment. A completely
new value is needed due to channel reuse between 2.4/5 and 6GHz.
util.c needed minimal updating to prevent compile errors which will
be fixed later to actually handle this band. WSC also needed a case
added for 6GHz but the spec does not outline any RF Band value for
6GHz so the 5GHz value will be returned in this case.
sae.c was failing to build on some platforms:
error: implicit declaration of function 'reallocarray'; did you mean 'realloc'?
[-Werror=implicit-function-declaration]
In certain rare cases IWD gets a link down event before nl80211 ever sends
a disconnect event. Netdev notifies station of the link down which causes
station to be freed, but netdev remains in the same state. Then later the
disconnect event arrives and netdev still thinks its connected, calls into
(the now freed) station object and causes a crash.
To fix this netdev_connect_free() is now called on any link down events
which will reset the netdev object to a proper state.
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_deauthenticate_event()
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
src/resolve.c:resolve_systemd_revert() ifindex: 16
src/station.c:station_roam_state_clear() 16
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
Received Deauthentication event, reason: 3, from_ap: false
0 0x472fa4 in station_disconnect_event src/station.c:2916
1 0x472fa4 in station_netdev_event src/station.c:2954
2 0x43a262 in netdev_disconnect_event src/netdev.c:1213
3 0x43a262 in netdev_mlme_notify src/netdev.c:5471
4 0x6706eb in process_multicast ell/genl.c:1029
5 0x6706eb in received_data ell/genl.c:1096
6 0x65e630 in io_callback ell/io.c:120
7 0x65a94e in l_main_iterate ell/main.c:478
8 0x65b0b3 in l_main_run ell/main.c:525
9 0x65b0b3 in l_main_run ell/main.c:507
10 0x65b5cc in l_main_run_with_signal ell/main.c:647
11 0x4124d7 in main src/main.c:532
The difference between the existing code is that IWD will send the
authentication request, making it the initiator.
This handles the use case where IWD is provided a peers URI containing
its bootstrapping key rather than IWD always providing its own URI.
A new DBus API was added, ConfigureEnrollee().
Using ConfigureEnrollee() IWD will act as a configurator but begin by
traversing a channel list (URI provided or default) and waiting for
presence announcements (with one caveat). When an announcement is
received IWD will send an authentication request to the peer, receive
its reply, and send an authentication confirm.
As with being a responder, IWD only supports configuration to the
currently connected BSS and will request the enrollee switch to this
BSS's frequency to preserve network performance.
The caveat here is that only one driver (ath9k) supports multicast frame
registration which prevents presence frame from being received. In this
case it will be required the the peer URI contains a MAC and channel
information. This is because IWD will jump right into sending auth
requests rather than waiting for a presence announcement.
The frame watch which covers the presence procedure (and most
frames for that matter) needs to support multicast frames for
presence to work. Doing this in frame-xchg seems like the right
choice but only ath9k supports multicast frame registration.
Because of this limited support DPP will register for these frames
manually.
Parses K (key), M (mac), C (class/channels), and V (version) tokens
into a new structure dpp_uri_info. H/I are not parsed since there
currently isn't any use for them.
This was caught by static analysis. As is common this should never
happen in the real world since the only way this can fail (apart from
extreme circumstances like OOM) is if the key size is incorrect, which
it will never be.
Static analysis flagged that 'path' was never being checked (which
should not ever be NULL) but during that review I noticed stat()
was being called, then fstat afterwards.
Recently systemd added the ability to pass secret credentials to
services via LoadCredentialEncrypted/SetCredentialEncrypted. Once
set up the service is able to read the decrypted credentials from
a file. The file path is found in the environment variable
CREDENTIALS_DIRECTORY + an identifier. The value of SystemdEncrypt
should be set to the systemd key ID used when the credential was
created.
When SystemdEncrypt is set IWD will attempt to read the decrypted
secret from systemd. If at any point this fails warnings will be
printed but IWD will continue normally. Its expected that any failures
will result in the inability to connect to any networks which have
previously encrypted the passphrase/PSK without re-entering
the passphrase manually. This could happen, for example, if the
systemd secret was changed.
Once the secret is read in it is set into storage to be used for
profile encryption/decryption.
Using storage_decrypt() hotspot can also support profile encyption.
The hotspot consortium name is used as the 'ssid' since this stays
consistent between hotspot networks for any profile.
Some users don't like the idea of storing network credentials in
plaintext on the file system. This patch implements an option to
encrypt such profiles using a secret key. The origin of the key can in
theory be anything, but would typically be provided by systemd via
'LoadEncryptedCredential' setting in the iwd unit file.
The encryption operates on the entire [Security] group as well as all
embedded groups. Once encrypted the [Security] group will be replaced
with two key/values:
EncryptedSalt - A random string of bytes used for the encryption
EncryptedSecurity - A string of bytes containing the encrypted
[Security] group, as well as all embedded groups.
After the profile has been encrypted these values should not be
modified. Note that any values added to [Security] after encryption
has no effect. Once the profile is encrypted there is no way to modify
[Security] without manually decrypting first, or just re-creating it
entirely which effectively treated a 'new' profile.
The encryption/decryption is done using AES-SIV with a salt value and
the network SSID as the IV.
Once a key is set any profiles opened will automatically be encrypted
and re-written to disk. Modules using network_storage_open will be
provided the decrypted profile, and will be unaware it was ever
encrypted in the first place. Similarly when network_storage_sync is
called the profile will by automatically encrypted and written to disk
without the caller needing to do anything special.
A few private storage.c helpers were added to serve several purposes:
storage_init/exit():
This sets/cleans up the encryption key direct from systemd then uses
extract and expand to create a new fixed length key to perform
encryption/decryption.
__storage_decrypt():
Low level API to decrypt an l_settings object using a previously set
key and the SSID/name for the network. This returns a 'changed' out
parameter signifying that the settings need to be encrypted and
re-written to disk. The purpose of exposing this is for a standalone
decryption tool which does not re-write any settings.
storage_decrypt():
Wrapper around __storage_decrypt() that handles re-writing a new
profile to disk. This was exposed in order to support hotspot profiles.
__storage_encrypt():
Encrypts an l_settings object and returns the full profile as data
This got merged without a few additional fixes, in particular an
over 80 character line and incorrect length check.
Fixes: d8116e8828 ("dpp-util: add dpp_point_from_asn1()")
When we detect a new phy being added, we schedule a filtered dump of
the newly detected WIPHY and associated INTERFACEs. This code path and
related processing of the dumps was mostly shared with the un-filtered
dump of all WIPHYs and INTERFACEs which is performed when iwd starts.
This normally worked fine as long as a single WIPHY was created at a
time. However, if multiphy new phys were detected in a short amount of
time, the logic would get confused and try to process phys that have not
been probed yet. This resulted in iwd trying to create devices or not
detecting devices properly.
Fix this by only processing the target WIPHY and related INTERFACEs
when the filtered dump is performed, and not any additional ones that
might still be pending.
While here, remove a misleading comment:
manager_wiphy_check_setup_done() would succeed only if iwd decided to
keep the default interfaces created by the kernel.
This debug print was before any checks which could bail out prior to
autoconnect starting. This was confusing because debug logs would
contain multiple "station_autoconnect_start()" prints making you think
autoconnect was started several times.
The periodic scan code was refactored to make normal scans and
periodic scans consistent by keeping both in the same queue. But
that change left out the abort path where periodic scans were not
actually removed from the queue.
This fixes a rare crash when a periodic scan has been triggered and
the device goes down. This path never removes the request from the
queue but still frees it. Then when the scan context is removed the
stale request is freed again.
0 0x4bb65b in scan_request_cancel src/scan.c:202
1 0x64313c in l_queue_clear ell/queue.c:107
2 0x643348 in l_queue_destroy ell/queue.c:82
3 0x4bbfb7 in scan_context_free src/scan.c:209
4 0x4c9a78 in scan_wdev_remove src/scan.c:2115
5 0x42fecd in netdev_free src/netdev.c:965
6 0x445827 in netdev_destroy src/netdev.c:6507
7 0x52beb9 in manager_config_notify src/manager.c:765
8 0x67084b in process_multicast ell/genl.c:1029
9 0x67084b in received_data ell/genl.c:1096
10 0x65e790 in io_callback ell/io.c:120
11 0x65aaae in l_main_iterate ell/main.c:478
12 0x65b213 in l_main_run ell/main.c:525
13 0x65b213 in l_main_run ell/main.c:507
14 0x65b72c in l_main_run_with_signal ell/main.c:647
15 0x4124e7 in main src/main.c:532
If netdev_connect_failed is called before netdev_get_oci_cb() the
netdev's handshake will be destroyed and ultimately crash when the
callback is called.
This patch moves the cancelation into netdev_connect_free rather than
netdev_free.
++++++++ backtrace ++++++++
0 0x7f4e1787d320 in /lib64/libc.so.6
1 0x42634c in handshake_state_set_chandef() at src/handshake.c:1057
2 0x40a11b in netdev_get_oci_cb() at src/netdev.c:2387
3 0x483d7b in process_unicast() at ell/genl.c:986
4 0x480d3c in io_callback() at ell/io.c:120
5 0x48004d in l_main_iterate() at ell/main.c:472 (discriminator 2)
6 0x4800fc in l_main_run() at ell/main.c:521
7 0x48032c in l_main_run_with_signal() at ell/main.c:649
8 0x403e95 in main() at src/main.c:532
9 0x7f4e17867b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
Commit 4d2176df29 ("handshake: Allow event handler to free handshake")
introduced a re-entrancy guard so that handshake_state objects that are
destroyed as a result of the event do not cause a crash. It rightly
used a temporary object to store the passed in handshake. Unfortunately
this caused variable shadowing which resulted in crashes fixed by commit
d22b174a73 ("handshake: use _hs directly in handshake_event").
However, since the temporary was no longer used, this fix itself caused
a crash:
#0 0x00005555f0ba8b3d in eapol_handle_ptk_1_of_4 (sm=sm@entry=0x5555f2b4a920, ek=0x5555f2b62588, ek@entry=0x16, unencrypted=unencrypted@entry=false) at src/eapol.c:1236
1236 handshake_event(sm->handshake,
(gdb) bt
#0 0x00005555f0ba8b3d in eapol_handle_ptk_1_of_4 (sm=sm@entry=0x5555f2b4a920, ek=0x5555f2b62588, ek@entry=0x16, unencrypted=unencrypted@entry=false) at src/eapol.c:1236
#1 0x00005555f0bab118 in eapol_key_handle (unencrypted=<optimized out>, frame=<optimized out>, sm=0x5555f2b4a920) at src/eapol.c:2343
#2 eapol_rx_packet (proto=<optimized out>, from=<optimized out>, frame=<optimized out>, unencrypted=<optimized out>, user_data=0x5555f2b4a920) at src/eapol.c:2665
#3 0x00005555f0bac497 in __eapol_rx_packet (ifindex=62, src=src@entry=0x5555f2b62574 "x\212 J\207\267", proto=proto@entry=34958, frame=frame@entry=0x5555f2b62588 "\002\003",
len=len@entry=121, noencrypt=noencrypt@entry=false) at src/eapol.c:3017
#4 0x00005555f0b8c617 in netdev_control_port_frame_event (netdev=0x5555f2b64450, msg=0x5555f2b62588) at src/netdev.c:5574
#5 netdev_unicast_notify (msg=msg@entry=0x5555f2b619a0, user_data=<optimized out>) at src/netdev.c:5613
#6 0x00007f60084c9a51 in dispatch_unicast_watches (msg=0x5555f2b619a0, id=<optimized out>, genl=0x5555f2b3fc80) at ell/genl.c:954
#7 process_unicast (nlmsg=0x7fff61abeac0, genl=0x5555f2b3fc80) at ell/genl.c:973
#8 received_data (io=<optimized out>, user_data=0x5555f2b3fc80) at ell/genl.c:1098
#9 0x00007f60084c61bd in io_callback (fd=<optimized out>, events=1, user_data=0x5555f2b3fd20) at ell/io.c:120
#10 0x00007f60084c536d in l_main_iterate (timeout=<optimized out>) at ell/main.c:478
#11 0x00007f60084c543e in l_main_run () at ell/main.c:525
#12 l_main_run () at ell/main.c:507
#13 0x00007f60084c5670 in l_main_run_with_signal (callback=callback@entry=0x5555f0b89150 <signal_handler>, user_data=user_data@entry=0x0) at ell/main.c:647
#14 0x00005555f0b886a4 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:532
This happens when the driver does not support rekeying, which causes iwd to
attempt a disconnect and re-connect. The disconnect action is
taken during the event callback and destroys the underlying eapol state
machine. Since a temporary isn't used, attempting to dereference
sm->handshake results in a crash.
Fix this by introducing a UNIQUE_ID macro which should prevent shadowing
and using a temporary variable as originally intended.
Fixes: d22b174a73 ("handshake: use _hs directly in handshake_event")
Fixes: 4d2176df29 ("handshake: Allow event handler to free handshake")
Reported-By: Toke Høiland-Jørgensen <toke@toke.dk>
Tested-by: Toke Høiland-Jørgensen <toke@toke.dk>
There is no need to punch the holes for netdev/wheel groups to send to
the .Agent interface. This is only done by the iwd daemon itself and
the policy for user 'root' already takes care of this.
A select few drivers send this instead of SIGNAL_MBM. The docs say this
value is the signal 'in unspecified units, scaled to 0..100'. The range
for SIGNAL_MBM is -10000..0 so this can be scaled to the MBM range easy
enough...
Now, this isn't exactly correct because this value ultimately gets
returned from GetOrderedNetworks() and is documented as 100 * dBm where
in reality its just a unit-less signal strength value. Its not ideal, but
this patch at least will fix BSS ranking for these few drivers.
The 'at_console' D-Bus policy setting has been deprecated for more then
10 years and could be ignored at any time in the future. Moreover, while
the intend was to allow locally logged on users to interact with iwd, it
didn't actually do that.
More info at https://www.spinics.net/lists/linux-bluetooth/msg75267.html
and https://gitlab.freedesktop.org/dbus/dbus/-/issues/52
Therefor remove the 'at_console' setting block.
On Debian (based) systems, there is a standard defined group which is
allowed to manage network interfaces, and that is the 'netdev' group.
So add a D-Bus setting block to grant the 'netdev' group that access.
Building on GCC 8 resulted in this compiler error.
src/sae.c:107:25: error: implicit declaration of function 'reallocarray';
did you mean 'realloc'? [-Werror=implicit-function-declaration]
sm->rejected_groups = reallocarray(NULL, 2, sizeof(uint16_t));
src/erp.c:134:10: error: comparison of integer expressions of different
signedness: 'unsigned int' and 'int' [-Werror=sign-compare]
src/eap-ttls.c:378:10: error: comparison of integer expressions of different signedness: 'uint32_t' {aka 'unsigned int'} and 'int' [-Werror=sign-compare]
Fixes the following crash:
#0 0x000211c4 in netdev_connect_event (msg=<optimized out>, netdev=0x2016940) at src/netdev.c:2915
#1 0x76f11220 in process_multicast (nlmsg=0x7e8acafc, group=<optimized out>, genl=<optimized out>) at ell/genl.c:1029
#2 received_data (io=<optimized out>, user_data=<optimized out>) at ell/genl.c:1096
#3 0x76f0da08 in io_callback (fd=<optimized out>, events=1, user_data=0x200a560) at ell/io.c:120
#4 0x76f0ca78 in l_main_iterate (timeout=<optimized out>) at ell/main.c:478
#5 0x76f0cb74 in l_main_run () at ell/main.c:525
#6 l_main_run () at ell/main.c:507
#7 0x76f0cdd4 in l_main_run_with_signal (callback=callback@entry=0x18c94 <signal_handler>, user_data=user_data@entry=0x0)
at ell/main.c:647
#8 0x00018178 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:532
This crash was introduced in commit:
4d2176df29 ("handshake: Allow event handler to free handshake")
The culprit seems to be that 'hs' is being used both in the caller and
in the macro. Since the macro defines a variable 'hs' in local block
scope, it overrides 'hs' from function scope. Yet (_hs) still evaluates
to 'hs' leading the local variable to be initialized with itself. Only
the 'handshake_event(hs, HANDSHAKE_EVENT_SETTING_KEYS))' is affected
since it is the only macro invocation that uses 'hs' from function
scope. Thus, the crash would only happen on hardware supporting handshake
offload (brcmfmac).
Fix this by removing the local scope variable declaration and evaluate
(_hs) instead.
Fixes: 4d2176df29 ("handshake: Allow event handler to free handshake")