Since netconfig is now part of the Connect() call from a DBus
perspective add a note indicating that this method has the potential
to take a very long time if there are issues with DHCP.
Since the method return to Connect() and ConnectBssid() come after
netconfig some tests needed to be updated since they were waiting
for the method return before continuing. For timeout-based tests
specifically this caused them to fail since before they expected
the return to come before the connection was actually completed.
Let the caller specify the method timeout if there is an expectation
that it could take a long time.
For the conventional connect call (not the "bssid" debug variant) let
them pass their own callback handlers. This is useful if we don't
want to wait for the connect call to finish, but later get some
indication that it did finish either successfully or not.
A netconfig failure results in a failed connection which restarts
autoconnect and prevents IWD from retrying the connection on any
other BSS's within the network as a whole. When autoconnect restarts
IWD will scan and choose the "best" BSS which is likely the same as
the prior attempt. If that BSS is somehow misconfigured as far as
DHCP goes, it will likely fail indefinitely and in turn cause IWD to
retry indefinitely.
To improve this netconfig has been adopted into the IWD's BSS retry
logic. If netconfig fails this will not result in IWD transitioning
to a disconnected state, and instead the BSS will be network
blacklisted and the next will be tried. Only once all BSS's have been
tried will IWD go into a disconnected state and start autoconnect
over.
When netconfig is enabled the DBus reply was being sent in
station_connect_ok(), before netconfig had even started. This would
result in a call to Connect() succeeding from a DBus perspective but
really netconfig still needed to complete before IWD transitioned
to a connected state.
Fixes: 72e7d3ceb83d ("station: Handle NETCONFIG_EVENT_FAILED")
This adds a new API network_clear_blacklist() and removes this
functionality from network_connected(). This is done to support BSS
iteration when netconfig is enabled. Since a call to
network_connected() will happen prior to netconfig completing we
cannot clear the blacklist until netconfig has either passed or
failed.
If this fails, in some cases, -EAGAIN would be returned up to netdev
which would then assume a retry would be done automatically. This
would not in fact happen since it was an internal SAE failure which
would result in the connect method return to never get sent.
Now if sae_send_commit() fails, return -EPROTO which will cause
netdev to fail the connection.
A BSS can temporarily reject associations and provide a delay that
the station should wait for before retrying. This is useful when
sane values are used, but taking it to the extreme an AP could
potentially request the client wait UINT32_MAX TU's which equates
to 49 days.
Either due to a bug, or worse by design, the kernel will wait for
however long that timeout is. Luckily the kernel also sends an event
to userspace with the amount of time it will be waiting. To guard
against excessive timeouts IWD will now handle this event and enforce
a maximum allowed value. If the timeout exceeds this IWD will
deauthenticate.
Specifically for the NO_MORE_STAS reason code, add the BSS to the
(now renamed) AP_BUSY blacklist to avoid roaming to this BSS for
the near future.
Since we are now handling individual reason codes differently the
whole IS_TEMPORARY_STATUS macro was removed and replaced with a
case statement.
The initial pass of this feature only envisioned BSS transition
management frames as the trigger to "roam blacklist" a BSS, hence
the original name. But some APs actually utilize status codes that
also indicate to the stations that they are busy, or not able to
handle more connections. This directly aligns with the original
motivation of the "roam blacklist" series and these events should
also trigger this type of blacklist.
First, since we will be applying this blacklist to cases other
than being told to roam, rename this reason code internally to
BLACKLIST_REASON_AP_BUSY. The config option is also being renamed
to [Blacklist].InitialAccessPointBusyTimeout while also supporting
the old config option, but warning that it is deprecated.
Some drivers/hardware are more strict about limiting use of
certain frequencies on startup until the regulatory domain has
been set. For most cards the only way to set the regulatory domain
is to scan and see BSS's nearby that advertise the country they
reside in.
This is particularly important for AP mode since AP's are always
emitting radiation from beacons and will not start until the desired
frequency is both enabled and allows IR. To make this process
seamless in IWD we will first check that the desired frequency is
enabled/IR and if not issue a scan to (hopefully) get the regulatory
domain set.
A prior patch broke this by checking the return of
l_dbus_message_iter_next_entry. This was really subtle but the logic
actually relied on _not_ checking that return in order to handle
empty lists.
Instead of reverting the logic was adapted/commented to make it more
clear what the API expects from DBus. If list contains at least one
value the first element path will get set, if it contains zero
values "new_path" will be set to NULL which will then cause the
list to be cleared later on.
This both fixes the regression, and makes it clear that a zero
element list is supported and handled.
The length of EncryptedSecurity was assumed to be at least 16 bytes
and anything less would underflow the length to l_malloc.
Fixes: 01cd8587606b ("storage: implement network profile encryption")
The MAC and version elements weren't super critical but the channel
and bootstrapping key elements would result in memory leaks if there
were duplicates.
This patch now will not allow duplicate elements in the URI.
Fixes: f7f602e1b1e7 ("dpp-util: add URI parsing")
The survey arrays were exactly the number of valid channels for a
given band (e.g. 14 for 2.4GHz) but since channels start at 1 this
means that the last channel for a band would overflow the array.
Fixes: 35808debaefd ("scan: use GET_SURVEY for SNR calculation in ranking")
Supporting PMKSA on fullmac drivers requires that we set the PMKSA
into the kernel as well as remove it. This can now be triggered
via the new PMKSA driver callbacks which are implemented and set
with this patch.
In order to support fullmac drivers the PMKSA entries must be added
and removed from the kernel. To accomplish this a set of driver
callbacks will be added to the PMKSA module. In addition a new
pmksa_cache_free API will be added whos only purpose is to handle
the removal from the kernel.
The iwd_notice function was more meant for special purpose events
not general debug prints. For these error conditions we should be
using l_warn. For the informational "External Auth to SSID" log
we already print this information when connecting from station. In
addition there are logs when performing external auth so it should
be very obvious external auth is being used without this log.
The netdev frame watches got cleaned up upon the interface going down
which works if the interface is simply being toggled but when IWD
shuts down it first shuts down the interface, then immediately frees
netdev. If a watched frame arrives immediately after that before the
interface shutdown callback it will reference netdev, which has been
freed.
Fix this by clearing out the frame watches in netdev_free.
==147== Invalid read of size 8
==147== at 0x408ADB: netdev_neighbor_report_frame_event (netdev.c:4772)
==147== by 0x467C75: frame_watch_unicast_notify (frame-xchg.c:234)
==147== by 0x4E28F8: __notifylist_notify (notifylist.c:91)
==147== by 0x4E2D37: l_notifylist_notify_matches (notifylist.c:204)
==147== by 0x4A1388: process_unicast (genl.c:844)
==147== by 0x4A1388: received_data (genl.c:972)
==147== by 0x49D82F: io_callback (io.c:105)
==147== by 0x49C93C: l_main_iterate (main.c:461)
==147== by 0x49CA0B: l_main_run (main.c:508)
==147== by 0x49CA0B: l_main_run (main.c:490)
==147== by 0x49CC3F: l_main_run_with_signal (main.c:630)
==147== by 0x4049EC: main (main.c:614)
If an AP directed roam frame comes in while IWD is roaming its
still valuable to parse that frame and blacklist the BSS that
sent it.
This can happen most frequently during a roam scan while connected
to an overloaded BSS that is requesting IWD roams elsewhere.
If the BSS is requesting IWD roam elsewhere add this BSS to the
blacklist using BLACKLIST_REASON_ROAM_REQUESTED. This will lower
the chances of IWD roaming/connecting back to this BSS in the
future.
This then allows IWD to consider this blacklist state when picking
a roam candidate. Its undesireable to fully ban a roam blacklisted
BSS, so some additional sorting logic has been added. Prior to
comparing based on rank, BSS's will be sorted into two higher level
groups:
Above Threshold - BSS is above the RoamThreshold
Below Threshold - BSS is below the RoamThreshold
Within each of these groups the BSS may be roam blacklisted which
will position it at the bottom of the list within its respecitve
group.
This adds a new (less severe) blacklist reason as well as an option
to configure the timeout. This blacklist reason will be used in cases
where a BSS has requested IWD roam elsewhere. At that time a new
blacklist entry will be added which will be used along with some
other criteria to determine if IWD should connect/roam to that BSS
again.
Now that we have multiple blacklist reasons there may be situations
where a blacklist entry already exists but with a different reason.
This is going to be handled by the reason severity. Since we have
just two reasons we will treat a connection failure as most severe
and a roam requested as less severe. This leaves us with two
possible situations:
1. BSS is roam blacklisted, then gets connection blacklisted:
The reason will be "promoted" to connection blacklisted.
2. BSS is connection blacklisted, then gets roam blacklisted:
The blacklist request will be ignored
When pruning the list check_if_expired was comparing to the maximum
amount of time a BSS can be blacklisted, not if the current time had
exceeded the expirationt time. This results in blacklist entries
hanging around longer than they should, which would result in them
poentially being blacklisted even longer if there was another reason
to blacklist in the future.
Instead on prune check the actual expiration and remove the entry if
its expired. Doing this removes the need to check any of the times
in blacklist_contains_bss since prune will remove any expired entries
correctly.
To both prepare for some new blacklisting behavior and allow for
easier consolidation of the network-specific blacklist include a
reason enum for each entry. This allows IWD to differentiate
between multiple blacklist types. For now only the existing
"permanent" type is being added which prevents connections to that
BSS via autoconnect until it expires.
Allowing the timeout blacklist to be disabled has introduced a bug
where a failed connection will not result in the BSS list to be
traversed. This causes IWD to retry the same BSS over and over which
be either a) have some issue preventing a connection or b) may simply
be unreachable/out of range.
This is because IWD was inherently relying on the timeout blacklist
to flag BSS's on failures. With it disabled there was nothing to tell
network_bss_select that we should skip the BSS and it would return
the same BSS indefinitely.
To fix this some of the blacklisting logic was re-worked in station.
Now, a BSS will always get network blacklisted upon a failure. This
allows network.c to traverse to the next BSS upon failure.
For auth/assoc failures we will then only timeout blacklist under
certain conditions, i.e. the status code was not in the temporary
list.
Fixes: 77639d2d452e ("blacklist: allow configuration to disable the blacklist")
Certain use cases may not need or want this feature so allowing it to
be disabled is a much cleaner way than doing something like setting
the timeouts very low.
Now [Blacklist].InitialTimeout can be set to zero which will prevent
any blacklisting.
In addition some other small changes were added:
- Warn if the multiplier is 0, and set to 1 if so.
- Warn if the initial timeout exceeds the maximum timeout.
- Log if the blacklist is disabled
- Use L_USEC_PER_SEC instead of magic numbers.
SAE/WPA3 is completely broken on brcmfmac, at least without a custom
kernel patch which isn't included in many OS distributions. In order
to help with this add a driver quirk so devices with brcmfmac can
utilize WPA2 instead of WPA3 and at least connect to networks at
this capacity until the fix is more widely distributed.
Instead of just printing the PMKSA pointer separate this into two
separate debug messages, one for if the PMKSA exists and the other
if it does not. In addition print out the MAC of the AP so we have
a reference of which PMKSA this is.
With external auth there is no associate event meaning the auth proto
never gets freed, which prevents eapol from starting inside the
OCI callback. Check for this specific case and free the auth proto
after signaling that external auth has completed.
The user can now limit the size and count of PCAP files iwmon will
create. This allows iwmon to run for long periods of time without
filling up disk space.
This implements support for "rolling captures" by allowing iwmon to
limit the PCAP file size and number of PCAP's that are created.
This is a useful feature when long term monitoring is needed. If
there is some rare behavior requiring iwmon to run for days, months,
or longer the resulting PCAP file would become quite large and fill
up disk space.
When enabled (command line arguments in subsequent patch) the PCAP
file size is checked on each write. If it exceeds the limit a new
PCAP file will be created. Once the number of old PCAP files reaches
the set limit the oldest PCAP will be removed from disk.
For syncing iwmon captures with other logging its useful to
timestamp in some absolute format like UTC. This adds an
option which allows the user to specify what time format to
show. For now support:
delta - (default) The time delta between the first packet
and the current packet.
utc - The packet time in UTC
The ath10k driver has shown some performance issues, specifically
packet loss, when frame watches are registered with the multicast
RX flag set. This is relevant for DPP which registers for these
when DPP starts (if the driver supports it). This has only been
observed when there are large groups of clients all using the same
wifi channel so its unlikely to be much of an issue for those using
IWD/ath10k and DPP unless you run large deployments of clients.
But for large deployments with IWD/ath10k we need a way to disable
the multicast RX registrations. Now, with the addition of
wiphy_supports_multicast_rx we can both check that the driver
supports this as well as if its been disabled by the driver quirk.
This driver quirk and associated helper API lets other modules both
check if multicast RX is supported, and if its been disabled via
the driver quirk setting.
The actual connection piece of this is very minimal, and only
requires station to check if there is a PMKSA cached, and if so
include the PMKID in the RSNE. Netdev then takes care of the rest.
The remainder of this patch is the error handling if a PMKSA
connection fails with INVALID_PMKID. In this case IWD should retry
the same BSS without PMKSA.
An option was also added to disable PMKSA if a user wants to do
that. In theory PMKSA is actually less secure compared to SAE so
it could be something a user wants to disable. Going forward though
it will be enabled by default as its a requirement from the WiFi
alliance for WPA3 certification.
To prepare for PMKSA support station needs access to the handshake
object. This is because if PMKSA fails due to an expired/missing
PMKSA on the AP station should retry using the standard association.
This poses a problem currently because netdev frees the handshake
prior to calling the connect callback.
This was quite simple and only requiring caching the PMKSA after a
successful handshake, and using the correct authentication type
for connections if we have a prior PMKSA cached.
This is only being added for initial SAE associations for now since
this is where we gain the biggest improvement, in addition to the
requirement by the WiFi alliance to label products as "WPA3 capable"
This is needed in order to clear the PMKSA from the handshake state
without actually putting it back into the cache. This is something
that will be needed in case the AP rejects the association due to
an expired (or forgotten) PMKSA.
The majority of this patch was authored by Denis Kenzior, but
I have appended setting the PMK inside handshake_state_set_pmksa
as well as checking if the pmkid exists in
handshake_state_steal_pmkid.
Authored-by: Denis Kenzior <denkenz@gmail.com>
Authored-by: James Prestwood <prestwoj@gmail.com>
There are quite a few tests here for various scenarios and PMKSA
throws a wrench into that. Rather than potentially breaking the
tests in attempt to get them working with PMKSA, just disable PMKSA.
Since IWD doesn't utilize DBus signals in "normal" operations its
fine to lazy initialize any of the DBus interfaces since properties
can be obtained as needed with Get/GetAll.
For test-runner though StationDebug uses signals for debug events
and until the StationDebug class is initialized (via a method call
or property access) all signals will be lost. Fix this by always
initializing the StationDebug interface when a Device class is
initialized.
This adds a ref count to the handshake state object (as well as
ref/unref APIs). Currently IWD is careful to ensure that netdev
holds the root reference to the handshake state. Other modules do
track it themselves, but ensure that it doesn't get referenced
after netdev frees it.
Future work related to PMKSA will require that station holds a
references to the handshake state, specifically for retry logic,
after netdev is done with it so we need a way to delay the free
until station is also done.
The utilization rank factor already existed but was very rigid
and only checked a few values. This adds the (optional) ability
to start applying an exponentially decaying factor to both
utilization and station count after some threshold is reached.
This area needs to be re-worked in order to support very highly
loaded networks. If a network either doesn't support client
balancing or does it poorly its left up to the clients to choose
the best BSS possible given all the information available. In
these cases connecting to a highly loaded BSS may fail, or result
in a disconnect soon after connecting. In these cases its likely
better for IWD to choose a slightly lower RSSI/datarate BSS over
the conventionally 'best' BSS in order to aid in distributing
the network load.
The thresholds are currently optional and not enabled by default
but if set they behave as follows:
If the value is above the threshold it is mapped to an integer
between 0 and 30. (using a starting range of <value> - 255).
This integer is then used to index in the exponential decay table
to get a factor between 1 and 0. This factor is then applied to
the rank.
Note that as the value increases above the threshold the rank
will be increasingly effected, as is expected for an exponential
function. These option should be used with care as it may have
unintended consequences, especially with very high load networks.
i.e. you may see IWD roaming to BSS's with much lower signal if
there are high load BSS's nearby.
To maintain the existing behavior if there is no utilization
factor set in main.conf the legacy thresholds/factors will be
used.
This is copied from network.c that uses a static table to lookup
exponential decay values by index (generated from 1/pow(n, 0.3)).
network.c uses this for network ranking but it can be useful for
BSS ranking as well if you need to apply some exponential backoff
to a value.
This has been needed elsewhere but generally shortcuts could be
taken mapping with ranges starting/ending with zero. This is a
more general linear mapping utility to map values between any
two ranges.
gcc-15 switched to -std=c23 by default:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212
As a result `iwd` fails the build as:
../src/crypto.c:1215:24: error: incompatible types when returning type '_Bool' but 'struct l_ecc_point *' was expected
1215 | return false;
| ^~~~~
Signed-off-by: Rudi Heitbaum <rudi@heitbaum.com>
gcc-15 switched to -std=c23 by default:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212
As a result `iwd` fails the build as:
wired/ethdev.c: In function 'pae_open':
wired/ethdev.c:340:55:
error: passing argument 4 of 'l_io_set_read_handler'
from incompatible pointer type [-Wincompatible-pointer-types]
340 | l_io_set_read_handler(pae_io, pae_read, NULL, pae_destroy);
| ^~~~~~~~~~~
| |
| void (*)(void)
In file included from ...-ell-0.70-dev/include/ell/ell.h:19,
from wired/ethdev.c:38:
...-ell-0.70-dev/include/ell/io.h:33:68:
note: expected 'l_io_destroy_cb_t' {aka 'void (*)(void *)'}
but argument is of type 'void (*)(void)'
33 | void *user_data, l_io_destroy_cb_t destroy);
| ~~~~~~~~~~~~~~~~~~^~~~~~~
C23 changed the meaning of `void (*)()` from partially defined prototype
to `void (*)(void)`.
The 3rd byte of the country code was being printed as ASCII but this
byte isn't always a printable character. Instead we can check what
the value is and describe what it means from the spec.
These frequencies were seen being advertised by a driver and IWD has
no operating class/channel mapping for them. Specifically 5960 was
causing issues due to a few bugs and mapping to channel 2 of the 6ghz
band. Those bugs have now been resolved.
If these frequencies can be supported in a clean manor we can remove
this test, but until then ensure IWD does not parse them.
After the band is established we check the e4 table for the channel
that matches. The problem here is we will end up checking all the
operating classes, even those that are not within the band that was
determined. This could result in false positives and return a
channel that doesn't make sense.
When the frequencies/channels were parsed there was no check that the
resulting band matched what was expected. Now, pass the band object
itself in which has the band set to what is expected.
If IPv6 is disabled or not supported at the kernel level writing the
sysfs settings will fail. A few of them had a support check but this
patch adds a supported bool to the remainder so we done get errors
like:
Unable to write drop_unsolicited_na to /proc/sys/net/ipv6/conf/wlan0/drop_unsolicited_na
Similar to several other modules DPP registers for its frame
watches on init then ignores anything is receives unless DPP
is actually running.
Due to some recent issues surrounding ath10k and multicast frames
it was discovered that simply registering for multicast RX frames
causes a significant performance impact depending on the current
channel load.
Regardless of the impact to a single driver, it is actually more
efficient to only register for the DPP frames when DPP starts
rather than when IWD initializes. This prevents any of the frames
from hitting userspace which would otherwise be ignored.
Using the frame-xchg group ID's we can only register for DPP
frames when needed, then close that group and the associated
frame watches.
DPP optionally uses the multicast RX flag for frame registrations but
since frame-xchg did not support that, it used its own registration
internally. To avoid code duplication within DPP add a flag to
frame_watch_add in order to allow DPP to utilize frame-xchg.
The selection loop was choosing an initial candidate purely for
use of the "fallback_to_blacklist" flag. But we have a similar
case with OWE transitional networks where we avoid the legacy
open network in preference for OWE:
/* Don't want to connect to the Open BSS if possible */
if (!bss->rsne)
continue;
If no OWE network gets selected we may iterate all BSS's and end
the loop, which then returns NULL.
To fix this move the blacklist check earlier and still ignore any
BSS's in the blacklist. Also add a new flag in the selection loop
indicating an open network was skipped. If we then exhaust all
other BSS's we can return this candidate.
Some drivers like brcmfmac don't support OWE but from userspace its
not possible to query this information. Rather than completely
blacklist brcmfmac we can allow the user to configure this and
disable OWE in IWD.
The "UK" alpha2 code is not the official code for the United Kingdom
but is a "reserved" code for compatibility. The official alpha2 is
"GB" which is being added to the EU list. This fixes issues parsing
neighbor reports, for example:
src/station.c:parse_neighbor_report() Neighbor report received for xx:xx:xx:xx:xx:xx: ch 136 (oper class 3), MD not set
Failed to find band with country string 'GB 32' and oper class 3, trying fallback
src/station.c:station_add_neighbor_report_freqs() Ignored: unsupported oper class
Test handling of technically illegal but harmless cloned IEs.
Based on real traffic captured from retail APs.
As cloned IEs are now allowed the
"/IE order/Bad (Duplicate + Out of Order IE) 1"
test payload has been altered to be more-wrong so it still fails
verification as expected.
Prior to adding the polling fallback this code path was only used for
signal level list notifications and netdev_rssi_polling_update() was
structured as such, where if the RSSI list feature existed there was
nothing to be done as the kernel handled the notifications.
For certain mediatek cards this is broken, hence why the fallback was
added. But netdev_rssi_polling_update() was never changed to take
this into account which bypassed the timer cleanup on disconnections
resulting in a crash when the timer fired after IWD was disconnected:
iwd: ++++++++ backtrace ++++++++
iwd: #0 0x7b5459642520 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #1 0x7b54597aedf4 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #2 0x49f82d in l_netlink_message_append() at ome/jprestwood/iwd/ell/netlink.c:825
iwd: #3 0x4a0c12 in l_genl_msg_append_attr() at ome/jprestwood/iwd/ell/genl.c:1522
iwd: #4 0x405c61 in netdev_rssi_poll() at ome/jprestwood/iwd/src/netdev.c:764
iwd: #5 0x49cce4 in timeout_callback() at ome/jprestwood/iwd/ell/timeout.c:70
iwd: #6 0x49c2ed in l_main_iterate() at ome/jprestwood/iwd/ell/main.c:455 (discriminator 2)
iwd: #7 0x49c3bc in l_main_run() at ome/jprestwood/iwd/ell/main.c:504
iwd: #8 0x49c5f0 in l_main_run_with_signal() at ome/jprestwood/iwd/ell/main.c:632
iwd: #9 0x4049ed in main() at ome/jprestwood/iwd/src/main.c:614
iwd: #10 0x7b5459629d90 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #11 0x7b5459629e40 in /lib/x86_64-linux-gnu/libc.so.6
iwd: +++++++++++++++++++++++++++
To fix this we need to add checks for the cqm_poll_fallback flag in
netdev_rssi_polling_update().
Certain FullMAC drivers do not expose CMD_ASSOCIATE/CMD_AUTHENTICATE,
but lack the ability to fully offload SAE connections to the firmware.
Such connections can still be supported on such firmware by using
CMD_EXTERNAL_AUTH & CMD_FRAME. The firmware sets the
NL80211_FEATURE_SAE bit (which implies support for CMD_AUTHENTICATE, but
oh well), and no other offload extended features.
When CMD_CONNECT is issued, the firmware sends CMD_EXTERNAL_AUTH via
unicast to the owner of the connection. The connection owner is then
expected to send SAE frames with the firmware using CMD_FRAME and
receive authenticate frames using unicast CMD_FRAME notifications as
well. Once SAE authentication completes, userspace is expected to
send a final CMD_EXTERNAL_AUTH back to the kernel with the corresponding
status code. On failure, a non-0 status code should be used.
Note that for historical reasons, SAE AKM sent in CMD_EXTERNAL_AUTH is
given in big endian order, not CPU order as is expected!
The TX or RX bitrate attributes can contain zero nested attributes.
This causes netdev_parse_bitrate() to fail, but this shouldn't then
cause the overall parsing to fail (we just don't have those values).
Fix this by continuing to parse attributes if either the TX/RX
bitrates fail to parse.
If the affinity watch is removed by setting an empty list the
disconnect callback won't be called which was the only place
the watch ID was cleared. This resulted in the next SetProperty call
to think a watch existed, and attempt to compare the sender address
which would be NULL.
The watch ID should be cleared inside the destroy callback, not
the disconnect callback.
If we scan a huge number of frequencies the PKEX timeout can get
rather large. This was overlooked in a prior patch who's intent
was to reduce the PKEX time, but in these cases it increased it.
Now the timeout will be capped at 2 minutes, but will still be
as low as 10 seconds for a single frequency.
In addition there was no timer reset once PKEX was completed.
This could cause excessive waits if, for example, the peer left
the channel mid-authentication. IWD would just wait until the
long PKEX timeout to eventually reset DPP. Once PKEX completes
we can assume that this peer will complete authentication quickly
and if not, we can fail.
While there is proper handling for a regdom update during a
TRIGGER_SCAN scan, prior to NEW_SCAN_RESULTS there is no such
handling if the regdom update comes in during a GET_SCAN or
GET_SURVEY.
In both the 6ghz and non-6ghz code paths we have some issues:
- For non-6ghz devices, or regdom updates that did not enable
6ghz the wiphy state watch callback will automatically issues
another GET_SURVEY/GET_SCAN without checking if there was
already one pending. It does this using the current scan request
which gets freed by the prior GET_SCAN/GET_SURVEY calls when
they complete, causing invalid reads when the subsequent calls
finish.
- If 6ghz was enabled by the update we actually append another
trigger command to the list and potentially run it if its the
current request. This also will end up in the same situation as
the request is freed by the pending GET_SURVEY/GET_SCAN calls.
For the non-6ghz case there is little to no harm in ignoring the
regdom update because its very unlikely it changed the allowed
frequencies.
For the 6ghz case we could potentially handle the new trigger scan
within get_scan_done, but thats beyond the scope of this change
and is likely quite intrusive.
Since surveys end up making driver calls in the kernel its not
entirely known how they are implemented or how long they will
take. For this reason the survey will be skipped if getting the
results from an external scan.
Doing this also fixes a crash caused by external scans where the
scan request pointer is not checked and dereferenced:
0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059
0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0
0x0000749646a243cb in ?? () from /usr/lib/libell.so.0
0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0
0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0
0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0
0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614
Reported-by: Daniel Bond <danielbondno@gmail.com>
With the introduction of affinities the CQM threshold can be toggled
by a DBus call. There was no check if there was already a pending
call which would cause the command ID to be overwritten and lose any
potential to cancel it, e.g. if netdev went down.
Some drivers fail to set a CQM threshold and report not supported.
Its unclear exactly why but if this happens roaming is effectively
broken.
To work around this enable RSSI polling if -ENOTSUP is returned.
The polling callback has been changed to emit the HIGH/LOW signal
threshold events instead of just the RSSI level index, just as if
a CQM event came from the kernel.
When the affinity is set to the current BSS lower the roaming
threshold to loosly lock IWD to the current BSS. The lower
threshold is automatically removed upon roaming/disconnection
since the affinity array is also cleared out.
This property will hold an array of object paths for
BasicServiceSet (BSS) objects. For the purpose of this patch
only the setter/getter and client watch is implemented. The
purpose of this array is to guide or loosely lock IWD to certain
BSS's provided that some external client has more information
about the environment than what IWD takes into account for its
roaming decisions.
For the time being, the array is limited to only the connected
BSS path, and any roams or disconnects will clear the array.
The intended use case for this is if the device is stationary
an external client could reduce the likelihood of roaming by
setting the affinity to the current BSS.
This documents new DBus property that expose a bit more control to
how IWD roams.
Setting the affinity on the connected BSS effectively "locks" IWD to
that BSS (except at critical RSSI levels, explained below). This can
be useful for clients that have access to more information about the
environment than IWD. For example, if a client is stationary there
is likely no point in trying to roam until it has moved elsewhere.
A new main.conf option would also be added:
[General].CriticalRoamThreshold
This would be the new roam threshold set if the currently connected
BSS is in the Affinities list. If the RSSI continues to drop below
this level IWD will still attempt to roam.
A user reported a crash which was due to the roam trigger timeout
being overwritten, followed by a disconnect. Post-disconnect the
timer would fire and result in a crash. Its not clear exactly where
the overwrite was happening but upon code inspection it could
happen in the following scenario:
1. Beacon loss event, start roam timeout
2. Signal low event, no check if timeout is running and the timeout
gets overwritten.
The reported crash actually didn't appear to be from the above
scenario but something else, so this logic is being hardened and
improved
Now if a roam timeout already exists and trying to be rearmed IWD
will check the time remaining on the current timer and either keep
the active timer or reschedule it to the lesser of the two values
(current or new rearm time). This will avoid cases such as a long
roam timer being active (e.g. 60 seconds) followed by a beacon or
packet loss event which should trigger a more agressive roam
schedule.
This adds a secondary set of signal thresholds. The purpose of these
are to provide more flexibility in how IWD roams. The critical
threshold is intended to be temporary and is automatically reset
upon any connection changes: disconnects, roams, or new connections.
This prepares for the ability to toggle between two signal
thresholds in netdev. Since each netdev may not need/want the
same threshold store it in the netdev object rather than globally.
Since IWD enrollees can send unicast frames, a PKEX configurator could
still run without multicast support. Using this combination basically
allows any driver to utilize DPP/PKEX assuming the MAC address can
be communicated using some out of band mechanism.
The DPP spec allows for obtaining frequency and MAC addresses up
to the implementation. IWD already takes advantage of this by
first scanning for nearby APs and using only those frequencies.
For further optimization an enrollee may be able to determine the
configurators frequency and MAC ahead of time which would make
finding the configurator much faster.
This will help to get rid of magic number use throughout the project.
The definitions should be limited to global magic numbers that are used
throughout the project, for example SSID length, MAC address length,
etc.
Due to an unnoticed bug after adding the BasicServiceSet object into
network, it became clear that since station already owns the scan_bss
objects it makes sense for it to manage the associated DBus objects
as well. This way network doesn't have to jump through hoops to
determine if the scan_bss object was remove, added, or updated. It
can just manage its list as it did prior.
From the station side this makes things very easy. When scan results
come in we either update or add a new DBus object. And any time a
scan_bss is freed we remove the DBus object.
To reduce code duplication and prepare for moving the BSS interface
to station, add a new API so station can create a BSS path without
a network object directly.
src/eapol.c:1041:9: error: ‘buf’ may be used uninitialized [-Werror=maybe-uninitialized]
1041 | l_put_be16(0, &frame->header.packet_len);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This warning is bogus since the buffer is initialized through use of
eapol_frame members. EAPoL-Start is a very simple frame.
This was seemingly trivial at face value but doing so ended up
pointing out a bug with how group_retry is set when forcing
the default group. Since group_retry is initialized to -1 the
increment in the force_default_group block results in it being
set to zero, which is actually group 20, not 19. This did not
matter for hunt and peck, but H2E actually uses the retry value
to index its pre-generated points which then breaks SAE if
forcing the default group with H2E.
To handle H2E and force_default_group, the group selection
logic will always begin iterating the group array regardless of
SAE type.
The property itself is an array of paths, but this is difficult
to fit nicely within a terminal. Instead just display the count
of BSS's. Displaying a detailed list of BSS's will be done via
a separate command.
There are certain cases where we may not want to display the entire
header for a given set of properties. For example displaying a list
of proxy interfaces. Add finer control by separating out the header
and the prop/value display into two functions.
This will tell network the BSS list is being updated and it can
act accordingly as far as the BSS DBus registrations/unregistration.
In addition any scan_bss object needing to be freed has to wait
until after network_bss_stop_update() because network has to be able
to iterate its old list and unregister any BSS's that were not seen
in the scan results. This is done by pushing each BSS needing to be
freed into a queue, then destroying them after the BSS's are all
added.
This adds a new DBus object/interface for tracking BSS's for
a given network. Since scanning replaces scan_bss objects some
new APIs were added to avoid tearing down the associated DBus
object for each BSS.
network_bss_start_update() should be called before any new BSS's
are added to the network object. This will keep track of the old
list and create a new network->bss_list where more entries can
be added. This is effectively replacing network_bss_list_clear,
except it keeps the old list around until...
network_bss_stop_update() is called when all BSS's have been
added to the network object. This will then iterate the old list
and lookup if any BSS DBus objects need to be destroyed. Once
completed the old list is destroyed.
iwd supports FILS only on softmac drivers. Ensure the capability check
is consistent between wiphy and netdev, both the softmac and the
relevant EXT_FEATURE bit must be checked.
CMD_EXTERNAL_AUTH could potentially be used for FILS for FullMAC cards,
but no hardware supporting this has been identified yet.
Somehow this ability was lost in the refactoring. OWE was intended to
be used on fullmac cards, but the state machine is only actually created
if the connection type ends up being softmac.
Fixes: 8b6ad5d3b9ec ("owe: netdev: refactor to remove OWE as an auth-proto")
Certain flags (for example, NLA_F_NESTED) are ORed with the netlink
attribute type identifier prior to being sent on the wire. Such flags
need to be masked off and not taken into consideration when attribute
type is being compared against known values.
The workaround for Cisco APs reporting an operating class of zero
is still a bug that remains in Cisco equipment. This is made even
worse with the introduction of 6GHz where the channel numbers
overlap with both 2.4 and 5GHz bands. This makes it impossible to
definitively choose a frequency given only a channel number.
To improve this workaround and cover the 6GHz band we can calculate
a frequency for each band and see what is successful. Then append
each frequency we get to the list. This will result in more
frequencies scanned, but this tradeoff is better than potentially
avoiding a roam to 6GHz or high order 5ghz channel numbers.
[denkenz@archdev ~]$ qemu-system-x86_64 --version
QEMU emulator version 9.0.1
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
QEMU now seems to complain that 'no-hpet' and 'no-acpi' command line
arguments are unrecognized.
l_genl class has nice ways of discovering and requesting families. The
genl functionality has been added after the iwmon skeleton was created,
but it is now time to migrate to using these APIs.
This attribute is actually an array of signed 32 bit integers and it
was being treated as a single integer. This would work until more
than one threshold was set, then it would fail to parse it.
After the station state changes in DPP setting autoconnect=True was
causing DPP to stop prior to being able to scan for the network.
Instead we can start autoconnect earlier so we aren't toggling the
property while DPP is running.
Prior to now the DPP state was required to be disconnected before
DPP would start. This is inconvenient for the user since it requires
extra state checking and/or DBus method calls. Instead model this
case like WSC and issue a disconnect to station if DPP is requested
to start.
The other conditions on stopping DPP are also preserved and no
changes to the configurator role have been made, i.e. being
disconnected while configuring still stops DPP. Similarly any
connection made during enrolling will stop DPP.
It should also be noted that station's autoconfigure setting is also
preserved and set back to its original value upon DPP completing.
Gets the current autoconenct setting. This is not the current
autoconnect state. Will be used in DPP to reset station's autoconnect
setting back to what it was prior to DPP, in case of failure.
In order to slightly rework the DPP state machine to handle
automatically disconnecting (for enrollees) functions need to be
created that isolate everything needed to start DPP/PKEX in case
a disconnect needs to be done first.
If valgrind didn't report any issues, don't dump the logs. This
makes the test run a lot easier to look at without having to scroll
through pages of valgrind logs that provide no value.
When the survey code was added it neglected to add the same
cancelation logic that existed for the GET_SCAN call, i.e. if
a scan was canceled and there was a pending GET_SURVEY to the
kernel that needs to be canceled, and the request cleaned up.
Fixes: 35808debae ("scan: use GET_SURVEY for SNR calculation in ranking")
IPv6 address entry was not updated to use display_table_row which led to
a shifted line in table, as shown below:
$ iwctl station wlan0 show | head | sed 's| |.|g'
.................................Station:.wlan0................................
--------------------------------------------------------------------------------
..Settable..Property..............Value..........................................
--------------------------------------------------------------------------------
............Scanning..............no...............................................
............State.................connected........................................
............Connected.network.....Clannad.Legacy...................................
............IPv4.address..........192.168.1.12.....................................
............IPv6.address........fdc3:541d:864f:0:96db:c9ff:fe36:b15............
............ConnectedBss..........cc:d8:43:77:91:0e................................
This patch aligns IPv6 address line with other lines in the table.
Fixes: 35dd2c08219a (client: update station to use display_table_row, 2022-07-07)
testEncryptedProfiles:
- This would occationally fail because the test is expecting
to explicitly connect but after the first failed connection
autoconnect takes over and its a race to connect.
testPSK-roam:
- Several rules were not being cleaned up which could cause
tests afterwards to fail
- The AP roam test started failing randomly because of the SNR
ranking changes. It appears that with hwsim _sometimes_ the
SNR is able to be determined which can effect the ranking. This
test assumed the two BSS's would be the same ranking but the
SNR sometimes causes this to not be true.
The wait_for_event() function allows past events to cause this
function to return immediately. This behavior is known, and
relied on for some tests. But in some cases you want to only
handle _new_ events, so we need a way to clear out prior events.
This event is not used anywhere and can be leveraged in autotesting.
Move the event to eapol_start() so it gets called unconditionally
when the 4-way handshake is started.
If a disconnect arrives at any point during the 4-way handshake or
key setting this would result in netdev sending a disconnect event
to station. If this is a reassociation this case is unhandled in
station and causes a hang as it expects any connection failure to
be handled via the reassociation callback, not a random disconnect
event.
To handle this case we can utilize netdev_disconnected() along with
the new NETDEV_RESULT_DISCONNECTED result to ensure the connect
callback gets called if it exists (indicating a pending connection)
Below are logs showing the "Unexpected disconnect event" which
prevents IWD from cleaning up its state and ultimately results in a
hang:
Jul 16 18:16:13: src/station.c:station_transition_reassociate()
Jul 16 18:16:13: event: state, old: connected, new: roaming
Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_done() Work item 65 done
Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_next() Starting work item 66
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Jul 16 18:16:13: src/netdev.c:netdev_deauthenticate_event()
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
Jul 16 18:16:13: src/station.c:station_netdev_event() Associating
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
Jul 16 18:16:13: src/netdev.c:netdev_authenticate_event()
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
Jul 16 18:16:13: src/netdev.c:netdev_associate_event()
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
Jul 16 18:16:13: src/netdev.c:netdev_connect_event()
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() aborting and ignore_connect_event not set, proceed
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() expect_connect_failure not set, proceed
Jul 16 18:16:13: src/netdev.c:parse_request_ies()
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() Request / Response IEs parsed
Jul 16 18:16:13: src/netdev.c:netdev_get_oci()
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_get_oci_cb() Obtained OCI: freq: 5220, width: 3, center1: 5210, center2: 0
Jul 16 18:16:13: src/eapol.c:eapol_start()
Jul 16 18:16:13: src/netdev.c:netdev_unicast_notify() Unicast notification Control Port Frame(129)
Jul 16 18:16:13: src/netdev.c:netdev_control_port_frame_event()
Jul 16 18:16:13: src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Control Port TX Status(139)
Jul 16 18:16:14: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Jul 16 18:16:14: src/netdev.c:netdev_cqm_event() Signal change event (above=1 signal=-60)
Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Jul 16 18:16:17: src/netdev.c:netdev_deauthenticate_event()
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Jul 16 18:16:17: src/netdev.c:netdev_disconnect_event()
Jul 16 18:16:17: Received Deauthentication event, reason: 15, from_ap: true
Jul 16 18:16:17: src/wiphy.c:wiphy_radio_work_done() Work item 66 done
Jul 16 18:16:17: src/station.c:station_disconnect_event() 6
Jul 16 18:16:17: Unexpected disconnect event
Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:17: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
Jul 16 18:16:17: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
After adding the NETDEV_RESULT_DISCONNECTED enum, handshake failures
initiated by the AP come in via this result so the existing logic
to call network_connect_failed() was broken. We could still get a
handshake failure generated internally, so that has been preserved
(via NETDEV_RESULT_HANDSHAKE_FAILED) but a check for a 4-way
handshake timeout reason code was also added.
This new event is sent during a connection if netdev recieves a
disconnect event. This patch cleans up station to handle this
case and leave the existing NETDEV_EVENT_DISCONNECTED_BY_{AP,SME}
handling only for CONNECTED, NETCONFIG, and FW_ROAMING states.
This new result is meant to handle cases where a disconnect
event (deauth/disassoc) was received during an ongoing connection.
Whether that's during authentication, association, the 4-way
handshake, or key setting.
src/p2putil.c: In function 'p2p_get_random_string':
src/p2putil.c:2641:37: error: initializer element is not constant 2641 |
static const int set_size = strlen(CHARSET); |
^~~~~~
This code path is not exercised in the autotest but commonly does
happen in the real world. There is no associated bug with this, but
its helpful to have this event triggered in case something got
introduced in the future.
The authenticating event was not used anymore and the associating
event use was questionable (after the CMD_CONNECT callback).
No other modules actually utilize these events but they are useful
for autotests. Move these events around to map 1:1 when the kernel
sends the auth/assoc events.
There are a few values which are nice to see in debug logs. Namely
the BSS load and SNR. Both of these values may not be available
either due to the AP or local hardware limiations. Rather than print
dummy values for these refactor the print so append the values only
if they are set in the scan result.
For ranking purposes the utilization was defaulted to a valid (127)
which would not change the rank if that IE was not found in the
scan results. Historically this was printed (debug) as part of the
scan results but that was removed as it was somewhat confusing. i.e.
did the AP _really_ have a utilization of 127? or was the IE not
found?
Since it is useful to see the BSS load if that is advertised add a
flag to the scan_bss struct to indicate if the IE was present which
can be checked.
This issues a GET_SURVEY dump after scan results are available and
populates the survey information within the scan results. Currently
the only value obtained is the noise for a given frequency but the
survey results structure was created if in the future more values
need to be added.
From the noise, the SNR can be calculated. This is then used in the
ranking calculation to help lower BSS ranks that are on high noise
channels.
Parsing the flush flag for external scans was not done correctly
as it was not parsing the ATTR_SCAN_FLAGS but instead the flag
bitmap. Fix this by parsing the flags attribute, then checking if
the bit is set.
Add a nested attribute parser. For the first supported attribute
add NL80211_ATTR_SURVEY_INFO.
This allows parsing of nested attributes in the same convenient
way as nl80211_parse_attrs but allows for support of any level of
nested attributes provided that a handler is added for each.
To prep for adding a _nested() variant of this function refactor
this to act on an l_genl_attr object rather than the message itself.
In addition a handler specific to the attribute being parsed is
now passed in, with the current "handler_for_type" being renamed to
"handler_for_nl80211" that corresponds to root level attributes.
This warning is guaranteed to happen for SAE networks where there are
multiple netdev_authenticate_events. This should just be a check so
we don't register eapol twice, not a warning.
EAP-TTLS Start packets are empty by default, but can still be sent with
the L flag set. When attempting to reassemble a message we should not
fail if the length of the message is 0, and just treat it as any other
unfragmented message with the L flag set.
This test uses the same country/country3 values seen by an AP vendor
which causes issues with IWD. The alpha2 is ES (Spain) and the 3rd
byte is 4, indicating to use the E-4. The issue then comes when the
neighbor report claims the BSS is under operating class 3 which is
not part of E-4.
With the fallback implemented, this test will pass since it will
try and lookup only on ES (the EU table) which operating class 3 is
part of.
Its been seen that some vendors incorrectly set the 3rd byte of the
country code which causes the band lookup to fail with the provided
operating class. This isn't compliant with the spec, but its been
seen out in the wild and it causes IWD to behave poorly, specifically
with roaming since it cannot parse neighbor reports. This then
requires IWD to do a full scan on each roam.
Instead of a hard rejection, IWD can instead attempt to determine
the band by ignoring that 3rd byte and only use the alpha2 string.
This makes IWD slightly less strict but at the advantage of not being
crippled when exposed to poor AP configurations.
This was added to support a single buggy AP model that failed to
negotiate the SAE group correctly. This may still be a problem but
since then the [Network].UseDefaultEccGroup option has been added
which accomplishes the same thing.
Remove the special handling for this specific OUI and rely on the
user setting the new option if they have problems.
Both ell/shared and ell/internal targets first create the ell/
directory within IWD. This apparently was just luck that one of
these always finished first in parallel builds. On my system at
least when building using dpkg-buildpackage IWD fails to build
due to the ell/ directory missing. From the logs it appears that
both the shared/internal targets were started but didn't complete
(or at least create the directory) before the ell/ell.h target:
make[1]: Entering directory '/home/jprestwood/tmp/iwd'
/usr/bin/mkdir -p ell
/usr/bin/mkdir -p ell
echo -n > ell/ell.h
/usr/bin/mkdir -p src
/bin/bash: line 1: ell/ell.h: No such file or directory
make[1]: *** [Makefile:4028: ell/ell.h] Error 1
Creating the ell/ directory within the ell/ell.h target solve
the issue. For reference this is the configure command dpkg
is using:
./configure --build=x86_64-linux-gnu \
--prefix=/usr \
--includedir=/usr/include \
--mandir=/usr/share/man \
--infodir=/usr/share/info \
--sysconfdir=/etc \
--localstatedir=/var \
--disable-option-checking \
--disable-silent-rules \
--libdir=/usr/lib/x86_64-linux-gnu \
--runstatedir=/run \
--disable-maintainer-mode \
--disable-dependency-tracking \
--enable-tools \
--enable-dbus-policy
Experimental AP-mode support for receiving a Confirm frame when in the
COMMITTED state. The AP will reply with a Confirm frame.
Note that when acting as an AP, on reception of a Commit frame, the AP
only replies with a Commit frame. The protocols allows to also already
send the Confirm frame, but older clients may not support simultaneously
receiving a Commit and Confirm frame.
Don't mark either client as being the authenticator. In the current unit
tests, both instances act as clients to test functionality. This ensures
the unit does not show an error during the following commits where SAE
for AP mode is added.
This was overlooked in a prior patch and causes warnings to be
printed when the RSSI is too low to estimate an HE data rate or
due to incompatible local capabilities (e.g. MCS support).
Similar to the other estimations, return -ENETUNREACH if the IE
was valid but incompatible.
If the RSSI is too low or the local capabilities were not
compatible to estimate the rate don't warn but instead treat
this the same as -ENOTSUP and drop down to the next capability
set.
If we register the main EAPOL frame listener as late as the associate
event, it may not observe ptk_1_of_4. This defeats handling for early
messages in eapol_rx_packet, which only sees messages once it has been
registered.
If we move registration to the authenticate event, then the EAPOL
frame listeners should observe all messages, without any possible
races. Note that the messages are not actually processed until
eapol_start() is called, and we haven't moved that call site. All
that's changing here is how early EAPOL messages can be observed.
netdev_disconnect() was unconditionally sending CMD_DISCONNECT which
is not the right behavior when IWD has not associated. This means
that if a connection was started then immediately canceled with
the Disconnect() method the kernel would continue to authenticate.
Instead if IWD has not yet associated it should send a deauth
command which causes the kernel to correctly cleanup its state and
stop trying to authenticate.
Below are logs showing the behavior. Autoconnect is started followed
immediately by a DBus Disconnect call, yet the kernel continues
sending authenticate events.
event: state, old: autoconnect_quick, new: connecting (auto)
src/scan.c:scan_cancel() Trying to cancel scan id 1 for wdev 7d
src/wiphy.c:wiphy_radio_work_done() Work item 1 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/station.c:station_dbus_disconnect()
src/station.c:station_reset_connection_state() 85
src/station.c:station_roam_state_clear() 85
event: state, old: connecting (auto), new: disconnecting
src/wiphy.c:wiphy_radio_work_done() Work item 2 done
src/station.c:station_connect_cb() 85, result: 5
src/station.c:station_disconnect_cb() 85, success: 1
event: state, old: disconnecting, new: disconnected
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
In most cases any failure here is likely just due to the AP not
supporting the feature, whether its HE/VHT/HE. This should result
in the estimation returning -ENOTSUP in which case we move down
the list. Any other non-zero return we will now warn to make it
clear the IEs did exist, but were not properly formatted.
All length check failures were changed to continue instead of
fail. This will now treat invalid lengths as if the IE did not
exist.
In addition HE specifically has an extra validation function which,
if failed, was bailing out of the estimation function entirely.
Instead this is now treated as if there was no HE capabilities and
the logic can move down to VHT, HT, or basic rates.
This was changed from too large of a mask (0xff) in an earlier
commit but was masking 5 bits instead of 6.
Fixes: 121c2c5653 ("monitor: properly mask HE capabilities bitfield")
Caught by static analysis, the dev->conn_peer pointer was being
dereferenced very early on without a NULL check, but further it
was being NULL checked. If there is a possibility of it being NULL
the check should be done much earlier.
If the test needs to do something very specific it may be useful to
prevent IWD from managing all the radios. This can now be done
by setting a "reserve" option in the radio settings. The value of
this should be something other than iwd, hostapd, or wpa_supplicant.
For example:
[rad1]
reserve=false
Caught by static analysis, if ATTR_MAC was not in the message there
would be a memcpy with uninitialized bytes. In addition there is no
reason to memcpy twice. Instead 'mac' can be a const pointer which
both verifies it exists and removes the need for a second memcpy.
Static analysis complains that 'last' could be NULL which is true.
This really could only happen if every frequency was disabled which
likely is impossible but in any case, check before dereferencing
the pointer.
Since these are all stack variables they are not zero initialized.
If parsing fails there may be invalid pointers within the structures
which can get dereferenced by p2p_clear_*
The input queue pointer was being initialized unconditionally so if
parsing fails the out pointer is still set after the queue is
destroyed. This causes a crash during cleanup.
Instead use a temporary pointer while parsing and only after parsing
has finished do we set the out pointer.
Reported-By: Alex Radocea <alex@supernetworks.org>
When HUP is received the IO read callback was never completing which
caused it to block indefinitely until waited for. This didn't matter
for most transient processes but for IWD, hostapd, wpa_supplicant
it would cause test-runner to hang if the process crashed.
Detecting a crash is somewhat hacky because we have no process
management like systemd and the return code isn't reliable as some
processes return non-zero under normal circumstances. So to detect
a crash the process output is being checked for the string:
"++++++++ backtrace ++++++++". This isn't 100% reliable obviously
since its dependent on how the binary is compiled, but even if the
crash itself isn't detected any test should still fail if written
correctly.
Doing this allows auto-tests to handle IWD crashes gracefully by
failing the test, printing the exception (event without debugging)
and continue with other tests.
The slaac_test was one that would occationally fail, but very rarely,
due to the resolvconf log values appearing in an unexpected order.
This appears to be related to a typo in netconfig-commit which would
not set netconfig-domains and instead set dns_list. This was fixed
with a pending patch:
https://lore.kernel.org/iwd/20240227204242.1509980-1-denkenz@gmail.com/T/#u
But applying this now leads to testNetconfig failing slaac_test
100% of the time.
I'm not familiar enough with resolveconf to know if this test change
is ok, but based on the test behavior the expected log and disk logs
are the same, just in the incorrect order. I'm not sure if this the
log order is deterministic so instead the check now iterates the
expected log and verifies each value appears once in the resolvconf
log.
Here is an example of the expected vs disk logs after running the
test:
Expected:
-a wlan1.dns
nameserver 192.168.1.2
nameserver 3ffe:501:ffff💯:10
nameserver 3ffe:501:ffff💯:50
-a wlan1.domain
search test1
search test2
Resolvconf log:
-a wlan1.domain
search test1
search test2
-a wlan1.dns
nameserver 192.168.1.2
nameserver 3ffe:501:ffff💯:10
nameserver 3ffe:501:ffff💯:50
static analysis complains that authenticator is used uninitialized.
This isn't strictly true as memory region is reserved for the
authenticator using the contents of the passed in structure. This
region is then overwritten once the authenticator is actually computed
by authenticator_put(). Silence this warning by explicitly setting
authenticator bytes to 0.
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
This shouldn't be possible in theory since the roam_bss_list being
iterated is a subset of entire scan_bss list station/network has
but to be safe, and catch any issues due to future changes warn on
this condition.
For some encrypt operations DPP passes no AD iovecs (both are
NULL/0). But since the iovec itself is on the stack 'ad' is a
valid pointer from within aes_siv_encrypt. This causes memcpy
to be called which coverity complains about. Since the copy
length is zero it was effectively a no-op, but check num_ad to
prevent the call.
Tests the 3 possible options to UseDefaultEccGroup behave as
expected:
- When not provided use the "auto" behavior.
- When false, always use higher order groups
- When true, always use default group
The SAE test made some assumptions on certain conditions due to
there being no way of checking if those conditions were met
Mainly the use of H2E/hunt-and-peck.
We assumed that when we told hostapd to use H2E or hunt/peck it
would but in reality it was not. Hostapd is apparently not very
good at swapping between the two with a simple "reload" command.
Once H2E is enabled it appears that it cannot be undone.
Similarly the vendor elements seem to carry over from test to
test, and sometimes not which causes unintended behavior.
To fix this create separate APs for the specific scenario being
tested:
- Hunt and peck
- H2E
- Special vendor_element simulating buggy APs
Another issue found was that if password identifies are used
hostapd automatically chooses H2E which was not intented, at
least based on the test names (in reality it wasn't causing any
problems).
The tests have also been improved to use hostapds "sta_status"
command which contains the group number used when authenticating,
so now that at least can be verified.
In order to complete the learned default group behavior station needs
to be aware of when an SAE/OWE connection retried. This is all
handled within netdev/sae so add a new netdev event so station can
set the appropriate network flags to prevent trying the non-default
group again.
If either the settings specify it, or the scan_bss is flagged, set
the use_default_ecc_group flag in the handshake.
This also renames the flag to cover both OWE and SAE
There is special handling for buggy OWE APs which set a network flag
to use the default OWE group. Utilize the more persistent setting
within known-networks as well as the network object (in case there
is no profile).
This also renames the get/set APIs to be generic to ECC groups rather
than only OWE.
This adds the option [Settings].UseDefaultEccGroup which allows a
network profile to specify the behavior when using an ECC-based
protocol. If unset (default) IWD will learn the behavior of the
network for the lifetime of its process.
Many APs do not support group 20 which IWD tries first by default.
This leads to an initial failure followed by a retry using group 19.
This option will allow the user to configure IWD to use group 19
first or learn the network capabilities, if the authentication fails
with group 20 IWD will always use group 19 for the process lifetime.
The information specific to auth/assoc/connect timeouts isn't
communicated to station so emit the notice events within netdev.
We could communicate this to station by adding separate netdev
events, but this does not seem worth it for this use case as
these notice events aren't strictly limited to station.
For anyone debugging or trying to identify network infrastructure
problems the IWD DBus API isn't all that useful and ultimately
requires going through debug logs to figure out exactly what
happened. Having a concise set of debug logs containing only
relavent information would be very useful. In addition, having
some kind of syntax for these logs to be parsed by tooling could
automate these tasks.
This is being done, starting with station, by using iwd_notice
which internally uses l_notice. The use of the notice log level
(5) in IWD will be strictly for the type of messages described
above.
iwd_notice is being added so modules can communicate internal
state or event information via the NOTICE log level. This log
level will be reserved in IWD for only these type of messages.
The iwd_notice macro aims to help enforce some formatting
requirements for these type of log messages. The messages
should be one or more comma-separated "key: value" pairs starting
with "event: <name>" and followed by any additional info that
pertains to that event.
iwd_notice only enforces the initial event key/value format and
additional arguments are left to the caller to be formatted
correctly.
The --logger,-l flag can now be used to specify the logger type.
Unset (default) will set log output to stderr as it is today. The
other valid options are "syslog" and "journal".
basename use is considered harmful. There are two versions of
basename (see man 3 basename for details). The more intuitive version,
which is currently being used inside wiphy.c, is not supported by musl
libc implementation. Use of the libgen version is not preferred, so
drop use of basename entirely. Since wiphy.c is the only call site of
basename() inside iwd, open code the required logic.
ELL now has a setting to limit the number of DHCP attempts. This
will now be set in IWD and if reached will result in a failure
event, and in turn a disconnect.
IWD will set a maximum of 4 retries which should keep the maximum
DHCP time to ~60 seconds roughly.
The known frequency list is now a sorted list and the roam scan
results were not complying with this new requirement. The fix is
easy though since the iteration order of the scan results does
not matter (the roam candidates are inserted by rank). To fix
the known frequencies order we can simply reverse the scan results
list before iterating it.
When operating as an AP, drop message 4 of the 4-way handshake if the AP
has not yet received message 2. Otherwise an attacker can skip message 2
and immediately send message 4 to bypass authentication (the AP would be
using an all-zero ptk to verify the authenticity of message 4).
Modify the existing frequency test to check that the ordering
lines up with the ranking of the BSS.
Add a test to check that quick scans limit the number of known
frequencies.
In very large network deployments there could be a vast amount of APs
which could create a large known frequency list after some time once
all the APs are seen in scan results. This then increases the quick
scan time significantly, in the very worst case (but unlikely) just
as long as a full scan.
To help with this support in knownnetworks was added to limit the
number of frequencies per network. Station will now only get 5
recent frequencies per network making the maximum frequencies 25
in the worst case (~2.5s scan).
The magic values are now defines, and the recent roam frequencies
was also changed to use this define as well.
In order to support an ordered list of known frequencies the list
should be in order of last seen BSS frequencies with the highest
ranked ones first. To accomplish this without adding a lot of
complexity the frequencies can be pushed into the list as long as
they are pushed in reverse rank order (lowest rank first, highest
last). This ensures that very high ranked BSS's will always get
superseded by subsequent scans if not seen.
This adds a new network API to update the known frequency list
based on the current newtork->bss_list. This assumes that station
always wipes the BSS list on scans and populates with only fresh
BSS entries. After the scan this API can be called and it will
reverse the list, then add each frequency.
I've had connections to a WPA3-Personal only network fail with no log
message from iwd, and eventually figured out to was because the driver
would've required using CMD_EXTERNAL_AUTH. With the added log messages
the reason becomes obvious.
Additionally the fallback may happen even if the user explicitly
configured WPA3 in NetworkManager, I believe a warning is appropriate
there.
There was an unhandled corner case if netconfig was running and
multiple roam conditions happened in sequence, all before netconfig
had completed. A single roam before netconfig was already handled
(23f0f5717c) but this did not take into account any additional roam
conditions.
If IWD is in this state, having started netconfig, then roamed, and
again restarted netconfig it is still in a roaming state which will
prevent any further roams. IWD will remain "stuck" on the current
BSS until netconfig completes or gets disconnected.
In addition the general state logic is wrong here. If IWD roams
prior to netconfig it should stay in a connecting state (from the
perspective of DBus).
To fix this a new internal station state was added (no changes to
the DBus API) to distinguish between a purely WiFi connecting state
(STATION_STATE_CONNECTING/AUTO) and netconfig
(STATION_STATE_NETCONFIG). This allows IWD roam as needed if
netconfig is still running. Also, some special handling was added so
the station state property remains in a "connected" state until
netconfig actually completes, regardless of roams.
For some background this scenario happens if the DHCP server goes
down for an extended period, e.g. if its being upgraded/serviced.
This fixes a build break on some systems, specifically the
raspberry Pi 3 (ARM):
monitor/main.c: In function ‘open_packet’:
monitor/main.c:176:3: error: implicit declaration of function ‘close’; did you mean ‘pclose’? [-Werror=implicit-function-declaration]
176 | close(fd);
| ^~~~~
| pclose
This was caused by the unused hostapd instance running after being
re-enabled by mistake. This cause an additional scan result with the
same rank to be seen which would then be connected to by luck of the
draw.
This really needs to be done to many more autotests but since this
one seems to have random failures ensure that all the tests still
run if one fails. In addition add better cleanup for hwsim rules.
This gives the tests a lot more fine-tune control to wait for
specific state transitions rather than only what is exposed over
DBus.
The additional events for "ft-roam" and "reassoc-roam" were removed
since these are now covered by the more generic state change events
("ft-roaming" and "roaming" respectively).
To support multiple nlmon sources, move the logic that reads from iwmon
device into main.c instead of nlmon. nlmon.c now becomes agnostic of
how the packets are actually obtained. Packets are fed in via
high-level APIs such as nlmon_print_rtnl, nlmon_print_genl,
nlmon_print_pae.
The current implementation inside nlmon_receive is asymmetrical. RTNL
packets are printed using nlmon_print_rtnl while GENL packets are
printed using nlmon_message.
nlmon_print_genl and nlmon_print_rtnl already handle iterating over data
containing multiple messages, and are used by nlmon started in reader
mode. Use these for better symmetry inside nlmon_receive.
While here, move store_netlink() call into nlmon_print_rtnl. This makes
handling of PCAP output symmetrical for both RTNL and GENL packets.
This also fixes a possibility where only the first message of a
multi-RTNL packet would be stored.
nlmon_print_genl invokes genl_ctrl when a generic netlink control
message is encountered. genl_ctrl() tries to filter nl80211 family
appearance messages and setup nlmon->id with the extracted family id.
However, the id is already provided inside main.c by using nlmon_open,
and no control messages are processed by nlmon in 'capture' mode (-r
command line argument not passed) since all genl messages go through
nlmon_message() path instead.
configure scripts need to be runnable with a POSIX-compliant /bin/sh.
On many (but not all!) systems, /bin/sh is provided by Bash, so errors
like this aren't spotted. Notably Debian defaults to /bin/sh provided
by dash which doesn't tolerate such bashisms as '+='.
This retains compatibility with bash. Just copy the expanded append like
we do on the line above.
Fixes warnings like:
```
./configure: 13352: CFLAGS+= -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2: not found
```
In order to test that extra settings are applied prior to connecting
two tests were added for hidden networks as well as one testing if
there is already an existing profile after DPP.
The reason hidden networks were used was due to the requirement of
the "Hidden" settings in the profile. If this setting doesn't get
sync'ed to disk the connection will fail.
Before this change DPP was writing the credentials both to disk
and into the network object directly. This allowed the connection
to work fine but additional settings were not picked up due to
network_set_passphrase/psk loading the settings before they were
written.
Instead DPP can avoid setting the credentials to the network
object entirely and just write them to disk. Then, wait for
known networks to notify that the profile was either created
or updated then DPP can proceed to connecting. network_autoconnect()
will take care of loading the profile that DPP wrote and remove the
need for DPP to touch the network object at all.
One thing to note is that an idle callback is still needed from
within the known networks callback. This is because a new profile
requires network.c to set the network_info which is done in the
known networks callback. Rather than assume that network.c will be
called into before dpp.c an l_idle was added.
If a known network is modified on disk known networks does not have
any way of notifying other modules. This will be needed to support a
corner case in DPP if a profile exists but is overwritten after DPP
configuration. Add this event to known networks and handle it in
network.c (though nothing needs to be done in that case).
Without the change test-dpp fails on aarch64-linux as:
$ unit/test-dpp
TEST: DPP test responder-only key derivation
TEST: DPP test mutual key derivation
TEST: DPP test PKEX key derivation
test-dpp: unit/test-dpp.c:514: test_pkex_key_derivation: Assertion `!memcmp(tmp, __tmp, 32)' failed.
This happens due to int/size_t type mismatch passed to vararg
parameters to prf_plus():
bool prf_plus(enum l_checksum_type type, const void *key, size_t key_len,
void *out, size_t out_len,
size_t n_extra, ...)
{
// ...
va_start(va, n_extra);
for (i = 0; i < n_extra; i++) {
iov[i + 1].iov_base = va_arg(va, void *);
iov[i + 1].iov_len = va_arg(va, size_t);
// ...
Note that varargs here could only be a sequence of `void *` / `size_t`
values.
But in src/dpp-util.c `iwd` attempted to pass `int` there:
prf_plus(sha, prk, bytes, z_out, bytes, 5,
mac_i, 6, // <- here
mac_r, 6, // <- and here
m_x, bytes,
n_x, bytes,
key, strlen(key));
aarch64 stores only 32-bit value part of the register:
mov w7, #0x6
str w7, [sp, #...]
and loads full 64-bit form of the register:
ldr x3, [x3]
As a result higher bits of `iov[].iov_len` contain unexpected values and
sendmsg sends a lot more data than expected to the kernel.
The change fixes test-dpp test for me.
While at it fixed obvious `int` / `size_t` mismatch in src/erp.c.
Fixes: 6320d6db0f ("crypto: remove label from prf_plus, instead use va_args")
The path argument was used purely for debugging. It can be just as
informational printing just the SSID of the profile that failed to
parse the setting without requiring callers allocate a string to
call the function.
Certain tests may require external processes to work
(e.g. testNetconfig) and if missing the test will just hang until
the maximum test timeout. Check in start_process if the exe
actually exists and if not throw an exception.
In order to support identifiers the test profiles needed to be
reworked due to hostapd allowing multiple password entires. You
cannot just call set_value() with a new entry as the old ones
still exist. Instead use a unique password for the identifier and
non-identifier use cases.
After adding this test the failure_test started failing due to
hostapd not starting up. This was due to the group being unsupported
but oddly only when hostapd was reloaded (running the test
individually worked). To fix this the group number was changed to 21
which hostapd does support but IWD does not.
Adds a new network profile setting [Security].PasswordIdentifier.
When set (and the BSS enables SAE password identifiers) the network
and handshake object will read this and use it for the SAE
exchange.
Building the handshake will fail if:
- there is no password identifier set and the BSS sets the
"exclusive" bit.
- there is a password identifier set and the BSS does not set
the "in-use" bit.
Using this will provide netdev with a connect callback and unify the
roaming result notification between FT and reassociation. Both paths
will now end up in station_reassociate_cb.
This also adds another return case for ft_handshake_setup which was
previously ignored by ft_associate. Its likely impossible to actually
happen but should be handled nevertheless.
Fixes: 30c6a10f28 ("netdev: Separate connect_failed and disconnected paths")
Essentially exposes (and renames) netdev_ft_tx_associate in order to
be called similarly to netdev_reassociate/netdev_connect where a
connect callback can be provided. This will fix the current bug where
if association times out during FT IWD will hang and never transition
to disconnected.
This also removes the calling of the FT_ROAMED event and instead just
calls the connect callback (since its now set). This unifies the
callback path for reassociation and FT roaming.
This will be called from station after FT-authentication has
finished. It sets up the handshake object to perform reassociation.
This is essentially a copy-paste of ft_associate without sending
the actual frame.
In general only the authenticator FTE is used/validated but with
some FT refactoring coming there needs to be a way to build the
supplicants FTE into the handshake object. Because of this there
needs to be separate FTE buffers for both the authenticator and
supplicant.
The default() method was added for convenience but was extending the
test times significantly when the hostapd config was lengthy. This
was because it called set_value for every value regardless if it
had changed. Instead store the current configuration and in default()
only reset values that differ.
This tests ensures IWD disconnects after receiving an association
timeout event. This exposes a current bug where IWD does not
transition to disconnected after an association timeout when
FT-roaming.
If tests end in an unknown state it is sometimes required that IWD
be stopped manually in order for future tests to run. Add a stop()
method so test tearDown() methods can explicitly stop IWD.
The path for IWD to call this doesn't ever happen in autotests
but during debugging of the DPP agent it was noticed that the
DBus signature was incorrect and would always result in an error
when calling from IWD.
For adding SAE password identifiers the capability bits need to be
verified when loading the identifier from the profile. Pass the
BSS object in to network_load_psk rather than the 'need_passphrase'
boolean.
iov_ie_append assumed that a single IE was being added and thus the
length of the IE could be extracted directly from the element. However,
iov_ie_append was used on buffers which could contain multiple IEs
concatenated together, for example in handshake_state::vendor_ies. Most
of the time this was safe since vendor_ies was NULL or contained a
single element, but would result in incorrect behavior in the general
case. Fix that by changing iov_ie_append signature to take an explicit
length argument and have the caller specify whether the element is a
single IE or multiple.
Fixes: 7e9971661bcb ("netdev: Append any vendor IEs from the handshake")
Use an _auto_ variable to cleanup IEs allocated by
p2p_build_association_req(). While here, take out unneeded L_WARN_ON
since p2p_build_association_req cannot fail.
If the FT-Authenticate frame has been sent then a deauth is received
the work item for sending the FT-Associate frame is never canceled.
When this runs station->connected_network is NULL which causes a
crash:
src/station.c:station_try_next_transition() 7, target xx:xx:xx:xx:xx:xx
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5843
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5844
src/wiphy.c:wiphy_radio_work_done() Work item 5842 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5843
src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
src/ft.c:ft_send_authenticate()
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/netdev.c:netdev_link_notify() event 16 on ifindex 7
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_deauthenticate_event()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
Received Deauthentication event, reason: 7, from_ap: true
src/station.c:station_disconnect_event() 7
src/station.c:station_disassociated() 7
src/station.c:station_reset_connection_state() 7
src/station.c:station_roam_state_clear() 7
src/netconfig.c:netconfig_event_handler() l_netconfig event 2
src/netconfig-commit.c:netconfig_commit_print_addrs() removing address: yyy.yyy.yyy.yyy
src/resolve.c:resolve_systemd_revert() ifindex: 7
[DHCPv4] l_dhcp_client_stop:1264 Entering state: DHCP_STATE_INIT
src/station.c:station_enter_state() Old State: connected, new state: disconnected
src/station.c:station_enter_state() Old State: disconnected, new state: autoconnect_quick
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5845
src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
src/wiphy.c:wiphy_radio_work_done() Work item 5843 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5844
"Program terminated with signal SIGSEGV, Segmentation fault.",
"#0 0x0000565359ee3f54 in network_bss_find_by_addr ()",
"#0 0x0000565359ee3f54 in network_bss_find_by_addr ()",
"#1 0x0000565359ec9d23 in station_ft_work_ready ()",
"#2 0x0000565359ec0af0 in wiphy_radio_work_next ()",
"#3 0x0000565359f20080 in offchannel_mlme_notify ()",
"#4 0x0000565359f4416b in received_data ()",
"#5 0x0000565359f40d90 in io_callback ()",
"#6 0x0000565359f3ff4d in l_main_iterate ()",
"#7 0x0000565359f4001c in l_main_run ()",
"#8 0x0000565359f40240 in l_main_run_with_signal ()",
"#9 0x0000565359eb3888 in main ()"
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: d938d362b212 ("erp: ERP implementation and key cache move")
Fixes: 433373fe28a4 ("eapol: cache ERP keys on EAP success")
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: 1f1478285725 ("wiphy: add _generate_address_from_ssid")
Fixes: 5a1b1184fca6 ("netdev: support per-network MAC addresses")
In netdev_retry_owe, if l_gen_family_send fails, the connect_cmd is
never freed or reset. Fix that.
While here, use a stack variable instead of netdev member, since the use
of such a member is unnecessary and confusing.
vendor_ies stored in handshake_state are already added as part of
netdev_populate_common_ies(), which is already invoked by
netdev_build_cmd_connect().
Normally vendor_ies is NULL for OWE connections, so no IEs are
duplicated as a result.
CC src/adhoc.o
In file included from src/adhoc.c:28:0:
/usr/include/linux/if.h:234:19: error: field ‘ifru_addr’ has incomplete type
struct sockaddr ifru_addr;
^
/usr/include/linux/if.h:235:19: error: field ‘ifru_dstaddr’ has incomplete type
struct sockaddr ifru_dstaddr;
^
/usr/include/linux/if.h:236:19: error: field ‘ifru_broadaddr’ has incomplete type
struct sockaddr ifru_broadaddr;
^
/usr/include/linux/if.h:237:19: error: field ‘ifru_netmask’ has incomplete type
struct sockaddr ifru_netmask;
^
/usr/include/linux/if.h:238:20: error: field ‘ifru_hwaddr’ has incomplete type
struct sockaddr ifru_hwaddr;
^
Very rarely on ath10k (potentially other ath cards), disabling
power save while the interface is down causes a timeout when
bringing the interface back up. This seems to be a race in the
driver or firmware but it causes IWD to never start up properly
since there is no retry logic on that path.
Retrying is an option, but a more straight forward approach is
to just reorder the logic to set power save off after the
interface is already up. If the power save setting fails we can
just log it, ignore the failure, and continue. From a users point
of view there is no real difference in doing it this way as
PS still gets disabled prior to IWD connecting/sending data.
Changing behavior based on a buggy driver isn't something we
should be doing, but in this instance the change shouldn't have
any downside and actually isn't any different than how it has
been done prior to the driver quirks change (i.e. use network
manager, iw, or iwconfig to set power save after IWD starts).
For reference, this problem is quite rare and difficult to say
exactly how often but certainly <1% of the time:
iwd[1286641]: src/netdev.c:netdev_disable_ps_cb() Disabled power save for ifindex 54
kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
iwd[1286641]: Error bringing interface 54 up: Connection timed out
kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
After this IWD just sits idle as it has no interface to start using.
This is even reproducable outside of IWD if you loop and run:
ip link set <wlan> down
iw dev <wlan> set power_save off
ip link set <wlan> up
Eventually the 'up' command will fail with a timeout.
I've brought this to the linux-wireless/ath10k mailing list but
even if its fixed in future kernels we'd still need to support
older kernels, so a workaround/change in IWD is still required.
This is done already for DPP, do the same for PKEX. Few drivers
(ath9k upstream, ath10k/11k in progress) support this which is
unfortunate but since a configurator will not work without this
capability its best to fail early.
The DPP spec allows 3rd party fields in the DPP configuration
object (section 4.5.2). IWD can take advantage of this (when
configuring another IWD supplicant) to communicate additional
profile options that may be required for the network.
The new configuration member will be called "/net/connman/iwd"
and will be an object containing settings specific to IWD.
More settings could be added here if needed but for now only
the following are defined:
{
send_hostname: true/false,
hidden: true/false
}
These correspond to the following network profile settings:
[IPv4].SendHostname
[Settings].Hidden
The scan result handling was fragile because it assumed the kernel
would only give results matching the requested SSID. This isn't
something we should assume so instead keep the configuration object
around until after the scan and use the target SSID to lookup the
network.
Nearly every use of the ssid member first has to memcpy it to a
buffer and NULL terminate. Instead just store the ssid as a
string when creating/parsing from JSON.
The DPP-PKEX spec provides a very limited list of frequencies used
to discover configurators, only 3 on 2.4 and 5GHz bands. Since
configurators (at least in IWD's implementation) are only allowed
on the current operating frequency its very unlikely an enrollee
will find a configurator on these frequencies out of the entire
spectrum.
The spec does mention that the 3 default frequencies should be used
"In lieu of specific channel information obtained in a manner outside
the scope of this specification, ...". This allows the implementation
some flexibility in using a broader range of frequencies.
To increase the chances of finding a configurator shared code
enrollees will first issue a scan to determine what access points are
around, then iterate these frequencies. This is especially helpful
when the configurators are IWD-based since we know that they'll be
on the same channels as the APs in the area.
The post-DPP connection was never done quite right due to station's
state being unknown. The state is now tracked in DPP by a previous
patch but the scan path in DPP is still wrong.
It relies on station autoconnect logic which has the potential to
connect to a different network than what was configured with DPP.
Its unlikely but still could happen in theory. In addition the scan
was not selectively filtering results by the SSID that DPP
configured.
This fixes the above problems by first filtering the scan by the
SSID. Then setting the scan results into station without triggering
autoconnect. And finally using network_autoconnect() directly
instead of relying on station to choose the SSID.
DPP (both DPP and PKEX) run the risk of odd behavior if station
decides to change state. DPP is completely unaware of this and
best case would just result in a protocol failure, worst case
duplicate calls to __station_connect_network.
Add a station watch and stop DPP if station changes state during
the protocol.
Commit c59669a366c5 ("netdev: disambiguate between disconnection types")
introduced different paths for different types of disconnection
notifications from netdev. Formalize this further by having
netdev_connect_failed only invoke connect_cb.
Disconnections that could be triggered outside of connection
related events are now handled on a different code path. For this
purpose, netdev_disconnected() is introduced.
When a roam event is received, iwd generates a firmware scan request and
notifies its event filter of the ROAMING condition. In cases where the
firmware scan could not be started successfully, netdev_connect_failed
is invoked. This is not a correct use of netev_connect_failed since it
doesn't actually disconnect the underlying netdev and the reflected
state becomes de-synchronized from the underlying kernel device.
The firmware scan request could currently fail for two reasons:
1. nl80211 genl socket is in a bad state, or
2. the scan context does not exist
Since both reasons are highly unlikely, simply use L_WARN instead.
The other two cases where netdev_connect_failed is used could only occur
if the kernel message is invalid. The message is ignored in that case
and a warning is printed.
The situation described above also exists in netdev_get_fw_scan_cb. If
the scan could not be completed successfully, there's not much iwd can
do to recover. Have iwd remain in roaming state and print an error.
There are generally three scenarios where iwd generates a disconnection
command to the kernel:
1. Error conditions stemming from a connection related event. For
example if SAE/FT/FILS authentication fails during Authenticate or
Associate steps and the kernel doesn't disconnect properly.
2. Deauthentication after the connection has been established and not
related to a connection attempt in progress. For example, SA Query
processing that triggers an disconnect.
3. Disconnects that are triggered due to a handshake failure or if
setting keys resulting from the handshake fails. These disconnects
can be triggered as a result of a pending connection or when a
connection has been established (e.g. due to rekeying).
Distinguish between 1 and 2/3 by having the disconnect procedure take
different paths. For now there are no functional changes since all
paths end up in netdev_connect_failed(), but this will change in the
future.
While here, also get rid of netdev_del_station. The only user of this
function was in ap.c and it could easily be replaced by invoking the new
nl80211_build_del_station function. The callback used by
netdev_build_del_station only printed an error and didn't do anything
useful. Get rid of it for now.
netdev_begin_connection() already invokes netdev_connect_failed on
error. Remove any calls to netdev_connect_failed in callers of
netdev_begin_connection().
Fixes: 4165d9414f54 ("netdev: use wiphy radio work queue for connections")
If netdev_get_oci fails, a goto deauth is invoked in order to terminate
the current connection and return an error to the caller. Unfortunately
the deauth label builds CMD_DEAUTHENTICATE in order to terminate the
connection. This was fine because it used to handle authentication
protocols that ran over CMD_AUTHENTICATE and CMD_ASSOCIATE. However,
OCI can also be used on FullMAC hardware that does not support them.
Use CMD_DISCONNECT instead which works everywhere.
Fixes: 06482b811626 ("netdev: Obtain operating channel info")
The reason code field was being obtained as a uint8_t value, while it is
actually a uint16_t in little-endian byte order.
Fixes: f3cc96499c44 ("netdev: added support for SA Query")
The reason code from deauthentication frame was being obtained as a
uint8_t instead of a uint16_t. The value was only ever used in an
informational statement. Since the value was in little endian, only the
first 8 bits of the reason code were obtained. Fix that.
Fixes: 2bebb4bdc7ee ("netdev: Handle deauth frames prior to association")
Several tests do not pass due to some additional changes that have
not been merged. Remove these cases and add some hardening after
discovering some unfortunate wpa_supplicant behavior.
- Disable p2p in wpa_supplicant. With p2p enabled an extra device
is created which starts receiving DPP frames and printing
confusing messages.
- Remove extra asserts which don't make sense currently. These
will be added back later as future additions to PKEX are
upstreamed.
- Work around wpa_supplicant retransmit limitation. This is
described in detail in the comment in pkex_test.py
- wait_for_event was returning a list in certain cases, not the
event itself
- The configurator ID was not being printed (',' instead of '%')
- The DPP ID was not being properly waited for with PKEX
With the addition of DPP PKEX autotests some of the timeouts are
quite long and hit test-runners maximum timeouts. For UML we should
allow this since time-travel lets us skip idle waits. Move the test
timeout out of a global define and into the argument list so QEMU
and UML can define it differently.
The StartConfigurator() call was left out since there would be no
functional difference to the user in iwctl. Its expected that
human users of the shared code API provide the code/id ahead of
time, i.e. use ConfigureEnrollee/StartEnrollee.
Check that enough space for newline and 0-byte is left in line.
This fixes a buffer overflow on specific completion results.
Reported-By: Leona Maroni <dev@leona.is>
Adds a configurator variant to be used along side an agent. When
called the configurator will start and wait for an initial PKEX
exchange message from an enrollee at which point it will request
the code from an agent. This provides more flexibility for
configurators that are capable of configuring multiple enrollees
with different identifiers/codes.
Note that the timing requirements per the DPP spec still apply
so this is not meant to be used with a human configurator but
within an automated agent which does a quick lookup of potential
identifiers/codes and can reply within the 200ms window.
The PKEX configurator role is currently limited to being a responder.
When started the configurator will listen on its current operating
channel for a PKEX exchange request. Once received it and the
encrypted key is properly decrypted it treats this peer as the
enrollee and won't allow configurations from other peers unless
PKEX is restarted. The configurator will encrypt and send its
encrypted ephemeral key in the PKEX exchange response. The enrollee
then sends its encrypted bootstrapping key (as commit-reveal request)
then the same for the configurator (as commit-reveal response).
After this, PKEX authentication begins. The enrollee is expected to
send the authenticate request, since its the initiator.
This is the initial support for PKEX enrollees acting as the
initiator. A PKEX initiator starts the protocol by broadcasting
the PKEX exchange request. This request contains a key encrypted
with the pre-shared PKEX code. If accepted the peer sends back
the exchange response with its own encrypted key. The enrollee
decrypts this and performs some crypto/hashing in order to establish
an ephemeral key used to encrypt its own boostrapping key. The
boostrapping key is encrypted and sent to the peer in the PKEX
commit-reveal request. The peer then does the same thing, encrypting
its own bootstrapping key and sending to the initiator as the
PKEX commit-reveal response.
After this, both peers have exchanged their boostrapping keys
securely and can begin DPP authentication, then configuration.
For now the enrollee will only iterate the default channel list
from the Easy Connect spec. Future upates will need to include some
way of discovering non-default channel configurators, but the
protocol needs to be ironed out first.
Stop() will now return NotFound if DPP is not running. This causes
the DPP test to fail since it calls this regardless if the protocol
already stopped. Ignore this exception since tests end in various
states, some stopped and some not.
PKEX and DPP will share the same state machine since the DPP protocol
follows PKEX. This does pose an issue with the DBus interfaces
because we don't want DPP initiated by the SharedCode interface to
start setting properties on the DeviceProvisioning interface.
To handle this a dpp_interface enum is being introduced which binds
the dpp_sm object to a particular interface, for the life of the
protocol run. Once the protocol finishes the dpp_sm can be unbound
allowing either interface to use it again later.
This mispelling was present in the configuration, so I retained parsing
of the legacy BandModifier*Ghz options for compatibility. Without this
change anyone spelling GHz correctly in their configs would be very
confused.
PKEX is part of the WFA EasyConnect specification and is
an additional boostrapping method (like QR codes) for
exchanging public keys between a configurator and enrollee.
PKEX operates over wifi and requires a key/code be exchanged
prior to the protocol. The key is used to encrypt the exchange
of the boostrapping information, then DPP authentication is
started immediately aftewards.
This can be useful for devices which don't have the ability to
scan a QR code, or even as a more convenient way to share
wireless credentials if the PSK is very secure (i.e. not a
human readable string).
PKEX would be used via the three DBus APIs on a new interface
SharedCodeDeviceProvisioning.
ConfigureEnrollee(a{sv}) will start a configurator with a
static shared code (optionally identifier) passed in as the
argument to this method.
StartEnrollee(a{sv}) will start a PKEX enrollee using a static
shared code (optionally identifier) passed as the argument to
the method.
StartConfigurator(o) will start a PKEX configurator and use the
agent specified by the path argument. The configurator will query
the agent for a specific code when an enrollee sends the initial
exchange message.
After the PKEX protocol is finished, DPP bootstrapping keys have
been exchanged and DPP Authentication will start, followed by
configuration.
Beacon loss handling was removed in the past because it was
determined that this even always resulted in a disconnect. This
was short sighted and not always true. The default kernel behavior
waits for 7 lost beacons before emitting this event, then sends
either a few nullfuncs or probe requests to the BSS to determine
if its really gone. If these come back successfully the connection
will remain alive. This can give IWD some time to roam in some
cases so we should be handling this event.
Since beacon loss indicates a very poor connection the roam scan
is delayed by a few seconds in order to give the kernel a chance
to send the nullfuncs/probes or receive more beacons. This may
result in a disconnect, but it would have happened anyways.
Attempting a roam mainly handles the case when the connection can
be maintained after beacon loss, but is still poor.
This is being done to allow the DPP module to work correctly. DPP
currently uses __station_connect_network incorrectly since it
does not (and cannot) change the state after calling. The only
way to connect with a state change is via station_connect_network
which requires a DBus method that triggered the connection; DPP
does not have this due to its potentially long run time.
To support DPP there are a few options:
1. Pass a state into __station_connect_network (this patch)
2. Support a NULL DBus message in station_connect_network. This
would require several NULL checks and adding all that to only
support DPP just didn't feel right.
3. A 3rd connect API in station which wraps
__station_connect_network and changes the state. And again, an
entirely new API for only DPP felt wrong (I guess we did this
for network_autoconnect though...)
Its about 50/50 between call sites that changed state after calling
and those that do not. Changing the state inside
__station_connect_network felt useful enough to cover the cases that
could benefit and the remaining cases could handle it easily enough:
- network_autoconnect(), and the state is changed by station after
calling so it more or less follows the same pattern just routes
through network. This will now pass the CONNECTING_AUTO state
from within network vs station.
- The disconnect/reconnect path. Here the state is changed to
ROAMING prior in order to avoid multiple state changes. Knowing
this the same ROAMING state can be passed which won't trigger a
state change.
- Retrying after a failed BSS. The state changes on the first call
then remains the same for each connection attempt. To support this
the current station->state is passed to avoid a state change.
Until now IWD only supported enrollees as responders (configurators
could do both). For PKEX it makes sense for the enrollee to be the
initiator because configurators in the area are already on their
operating channel and going off is inefficient. For PKEX, whoever
initiates also initiates authentication so for this reason the
authentication path is being opened up to allow enrollees to
initiate.
The check for the header was incorrect according to the spec.
Table 58 indicates that the "Query Response Info" should be set
to 0x00 for the configuration request. The frame handler was
expecting 0x7f which is the value for the config response frame.
Unfortunately wpa_supplicant also gets this wrong and uses 0x7f
in all cases which is likely why this value was set incorrectly
in IWD. The issue is that IWD's config request is correct which
means IWD<->IWD configuration is broken. (and wpa_supplicant as
a configurator likely doesn't validate the config request).
Fix this by checking both 0x7f and 0x00 to handle both
supplicants.
Stopping periodic scans and not restarting them prevents autoconnect
from working again if DPP (or the post-DPP connect) fails. Since
the DPP offchannel work is at a higher priority than scanning (and
since new offchannels are queue'd before canceling) there is no risk
of a scan happening during DPP so its safe to leave periodic scans
running.
The packet loss handler puts a higher priority on roaming compared
to the low signal roam path. This is generally beneficial since this
event usually indicates some problem with the BSS and generally is
an indicator that a disconnect will follow sometime soon.
But by immediately issuing a scan we run the risk of causing many
successive scans if more packet loss events arrive following
the roam scans (and if no candidates are found). Logs provided
further.
To help with this handle the first event with priority and
immediately issue a roam scan. If another event comes in within a
certain timeframe (2 seconds) don't immediately scan, but instead
rearm the roam timer instead of issuing a scan. This also handles
the case of a low signal roam scan followed by a packet loss
event. Delaying the roam will at least provide some time for packets
to get out in between roam scans.
Logs were snipped to be less verbose, but this cycled happened
5 times prior. In total 7 scans were issued in 5 seconds which may
very well have been the reason for the local disconnect:
Oct 27 16:23:46 src/station.c:station_roam_failed() 9
Oct 27 16:23:46 src/wiphy.c:wiphy_radio_work_done() Work item 29 done
Oct 27 16:23:47 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:47 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:47 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 30
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_next() Starting work item 30
Oct 27 16:23:47 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:47 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification New Scan Results(34)
Oct 27 16:23:47 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
... scan results ...
Oct 27 16:23:47 src/station.c:station_roam_failed() 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_done() Work item 30 done
Oct 27 16:23:47 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:47 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:47 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 31
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_next() Starting work item 31
Oct 27 16:23:47 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:47 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:48 src/scan.c:scan_notify() Scan notification New Scan Results(34)
Oct 27 16:23:48 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
... scan results ...
Oct 27 16:23:48 src/station.c:station_roam_failed() 9
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_done() Work item 31 done
Oct 27 16:23:48 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:48 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:48 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 32
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_next() Starting work item 32
Oct 27 16:23:48 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:48 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:48 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:49 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Oct 27 16:23:49 src/netdev.c:netdev_deauthenticate_event()
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Oct 27 16:23:49 src/netdev.c:netdev_disconnect_event()
Oct 27 16:23:49 Received Deauthentication event, reason: 4, from_ap: false
Include a specific timeout value so different protocols can specify
different timeouts. For example once the authentication timeout
should not take very long (even 10 seconds seems excessive) but
adding PKEX may warrant longer timeouts.
For example discovering a configurator IWD may want to wait several
minutes before ending the discovery. Similarly running PKEX as a
configurator we should put a hard limit on the time, but again
minutes rather than 10 seconds.
The memcpy in HEX2BUF was copying the length of the buffer that was
passed in, not the actual length of the converted hexstring. This
test was segfaulting in the Alpine CI which uses clang/musl.
Its been seen (so far only in mac80211_hwsim + UML) where an
offchannel requests ACK comes after the ROC started event. This
causes the ROC started event to never call back to notify since
info->roc_cookie is unset and it appears to be coming from an
external process.
We can detect this situation in the ROC notify event by checking
if there is a pending ROC command and if info->roc_cookie does
not match. This can also be true for an external event so we just
set a new "early_cookie" member and return.
Then, when the ACK comes in for the ROC request, we can validate
if the prior event was associated with IWD or some external
process. If it was from IWD call the started callback, otherwise
the ROC notify event should come later and handled under the
normal logic where the cookies match.
Instead of looking up by wdev, lookup by the ID itself. We
shouldn't ever have more than one info per wdev in the queue but
looking up the _exact_ info structure doesn't hurt in case things
change in the future.
If netconfig is canceled before completion (when roaming) the
settings are freed and never loaded again once netconfig is started
post-roam. Now after a roam make sure to re-load the settings and
start netconfig.
Commit 23f0f5717c did not correctly handle the reassociation
case where the state is set from within station_try_next_transition.
If IWD reassociates netconfig will get reset and DHCP will need to
be done over again after the roam. Instead get the state ahead of
station_try_next_transition.
Fixes: 23f0f5717ca0 ("station: allow roaming before netconfig finishes")
When using mutual authentication an additional value needs to
be hashed when deriving i/r_auth values. A NULL value indicates
no mutual authentication (zero length iovec is passed to hash).
DPP configurators are running the majority of the protocol on the
current operating channel, meaning no ROC work. The retry logic
was bailing out if !dpp->roc_started with the assumption that DPP
was in between requesting offchannel work and it actually starting.
For configurators, this may not be the case. The offchannel ID also
needs to be checked, and if no work is scheduled we can send the
frame.
The prf_plus API was a bit restrictive because it only took a
string label which isn't compatible with some specs (e.g. DPP
inputs to HKDF-Expand). In addition it took additional label
aruments which were appended to the HMAC call (and the
non-intuitive '\0' if there were extra arguments).
Instead the label argument has been removed and callers can pass
it in through va_args. This also lets the caller decided the length
and can include the '\0' or not, dependent on the spec the caller
is following.
Adds a handler for the HE capabilities element and reworks the way
the MCS/NSS support bits are printed.
Now if the MCS support is 3 (unsupported) it won't be printed. This
makes the logs a bit shorter to read.
Matched the printed function name with the actual function name.
The simple-agent test prints the function name to allow easier debugging.
One name was not set currectly (most likely through copy pasting).
SAE was also relying on the ELL bug which was incorrectly performing
a subtraction on the Y coordinate based on the compressed point type.
Correct this and make the point type more clear (rather than
something like "is_odd + 2").
EAP-PWD was incorrectly computing the PWE but due to the also
incorrect logic in ELL the point converted correctly. This is
being fixed, so both places need the reverse logic.
Also added a big comment explaining why this is, and how
l_ecc_point_from_data behaves since its somewhat confusing since
EAP-PWD expects the pwd-seed to be compared to the actual Y
coordinate (which is handled automatically by ELL).
Add a test to show the incorrect ASN1 conversion to and from points.
This was due to the check if Y is odd/even being inverted which
incorrectly prefixes the X coordinate with the wrong byte.
The test itself was not fully correct because it was using compliant
points rather than full points, and the spec contains the entire
Y coordinate so the full point should be used.
This patch also adds ASN1 conversions to validate that
dpp_point_from_asn1 and dpp_point_to_asn1 work properly.
The previous attempt at working around this warning seems to no longer
work with gcc 13
In function ‘eap_handle_response’,
inlined from ‘eap_rx_packet’ at src/eap.c:570:3:
src/eap.c:421:49: error: ‘vendor_id’ may be used uninitialized [-Werror=maybe-uninitialized]
421 | (type == EAP_TYPE_EXPANDED && vendor_id == (id) && vendor_type == (t))
| ~~~~~~~~~~^~~~~~~
src/eap.c:533:20: note: in expansion of macro ‘IS_EXPANDED_RESPONSE’
533 | } else if (IS_EXPANDED_RESPONSE(our_vendor_id, our_vendor_type))
| ^~~~~~~~~~~~~~~~~~~~
src/eap.c: In function ‘eap_rx_packet’:
src/eap.c:431:18: note: ‘vendor_id’ was declared here
431 | uint32_t vendor_id;
| ^~~~~~~~~
width must be initialized since it depends on best not being NULL. If
best passes the non-NULL check above, then width must be initialized
since both width and best are set at the same time.
For IWD to work correctly either 2.4GHz or 5GHz bands must be enabled
(even for 6GHz to work). Check this and don't allow IWD to initialize
if both 2.4 and 5GHz is disabled.
wiphy_get_allowed_freqs was only being used to see if 6GHz was disabled
or not. This is expensive and requires several allocations when there
already exists wiphy_is_band_disabled(). The prior patch modified
wiphy_is_band_disabled() to return -ENOTSUP which allows scan.c to
completely remove the need for wiphy_get_allowed_freqs.
scan_wiphy_watch was also slightly re-ordered to avoid allocating
freqs_6ghz if the scan request was being completed.
The function wiphy_band_is_disabled() return was a bit misleading
because if the band was not supported it would return true which
could be misunderstood as the band is supported, but disabled.
There was only one call site and because of this behavior
wiphy_band_is_disabled needed to be paired with checking if the
band was supported.
To be more descriptive to the caller, wiphy_band_is_disabled() now
returns an int and if the band isn't supported -ENOTSUP will be
returned, otherwise 1 is returned if the band is disabled and 0
otherwise.
This adds support to allow users to disable entire bands, preventing
scanning and connecting on those frequencies. If the
[Rank].BandModifier* options are set to 0.0 it will imply those
bands should not be used for scanning, connecting or roaming. This
now applies to autoconnect, quick, hidden, roam, and dbus scans.
This is a station only feature meaning other modules like RRM, DPP,
WSC or P2P may still utilize those bands. Trying to limit bands in
those modules may sometimes conflict with the spec which is why it
was not added there. In addition modules like DPP/WSC are only used
in limited capacity for connecting so there is little benefit gained
to disallowing those bands.
To support user-disabled bands periodic scans need to specify a
frequency list filtered by any bands that are disabled. This was
needed in scan.c since periodic scans don't provide a frequency
list in the scan request.
If no bands are disabled the allowed freqs API should still
result in the same scan behavior as if a frequency list is left
out i.e. IWD just filters the frequencies as opposed to the kernel.
Currently the only way a scan can be split is if the request does
not specify any frequencies, implying the request should scan the
entire spectrum. This allows the scan logic to issue an extra
request if 6GHz becomes available during the 2.4 or 5GHz scans.
This restriction was somewhat arbitrary and done to let periodic
scans pick up 6GHz APs through a single scan request.
But now with the addition of allowing user-disabled bands
periodic scans will need to specify a frequency list in case a
given band has been disabled. This will break the scan splitting
code which is why this prep work is being done.
The main difference now is the original scan frequencies are
tracked with the scan request. The reason for this is so if a
request comes in with a limited set of 6GHz frequences IWD won't
end up scanning the full 6GHz spectrum later on.
This is more or less copied from scan_get_allowed_freqs but is
going to be needed by station (basically just saves the need for
station to do the same clone/constrain sequence itself).
One slight alteration is now a band mask can be passed in which
provides more flexibility for additional filtering.
This exposes the [Rank].BandModifier* settings so other modules
can use then. Doing this will allow user-disabling of certain
bands by setting these modifier values to 0.0.
The loop iterating the frequency attributes list was not including
the entire channel set since it was stopping at i < band->freqs_len.
The freq_attrs array is allocated to include the last channel:
band->freq_attrs = l_new(struct band_freq_attrs, num_channels + 1);
band->freqs_len = num_channels;
So instead the for loop should use i <= band->freqs_len. (I also
changed this to start the loop at 1 since channel zero is invalid).
The auth/action status is now tracked in ft.c. If an AP rejects the
FT attempt with "Invalid PMKID" we can now assume this AP is either
mis-configured for FT or is lagging behind getting the proper keys
from neighboring APs (e.g. was just rebooted).
If we see this condition IWD can now fall back to reassociation in
an attempt to still roam to the best candidate. The fallback decision
is still rank based: if a BSS fails FT it is marked as such, its
ranking is reset removing the FT factor and it is inserted back
into the queue.
The motivation behind this isn't necessarily to always force a roam,
but instead to handle two cases where IWD can either make a bad roam
decision or get 'stuck' and never roam:
1. If there is one good roam candidate and other bad ones. For
example say BSS A is experiencing this FT key pull issue:
Current BSS: -85dbm
BSS A: -55dbm
BSS B: -80dbm
The current logic would fail A, and roam to B. In this case
reassociation would have likely succeeded so it makes more sense
to reassociate to A as a fallback.
2. If there is only one candidate, but its failing FT. IWD will
never try anything other than FT and repeatedly fail.
Both of the above have been seen on real network deployments and
result in either poor performance (1) or eventually lead to a full
disconnect due to never roaming (2).
Certain return codes, though failures, can indicate that the AP is
just confused or booting up and treating it as a full failure may
not be the best route.
For example in some production deployments if an AP is rebooted it
may take some time for neighboring APs to exchange keys for
current associations. If a client roams during that time it will
reject saying the PMKID is invalid.
Use the ft_associate call return to communicate the status (if any)
that was in the auth/action response. If there was a parsing error
or no response -ENOENT is still returned.
In many tests the hostapd configuration does not include all the
values that a test uses. Its expected that each individual test
will add the values required. In many cases its required each test
slightly alter the configuration for each change every other test
has to set the value back to either a default or its own setting.
This results in a ton of duplicated code mainly setting things
back to defaults.
To help with this problem the hostapd configuration is read in
initially and stored as the default. Tests can then simply call
.default() to set everything back. This significantly reduces or
completely removes a ton of set_value() calls.
This does require that each hostapd configuration file includes all
values any of the subtests will set, which is a small price for the
convenience.
Removed several debug prints which are very verbose and provide
little to no important information.
The get_scan_{done,callback} prints are pointless since all the
parsed scan results are printed by station anyways.
Printing the BSS load is also not that useful since it doesn't
include the BSSID. If anything the BSS load should be included
when station prints out each individual BSS (along with frequency,
rank, etc).
The advertisement protocol print was just just left in there by
accident when debugging, and also provides basically no useful
information.
Some APs don't include the RSNE in the associate reply during
the OWE exchange. This causes IWD to be incompatible since it has
a hard requirement on the AKM being included.
This relaxes the requirement for the AKM and instead warns if it
is not included.
Below is an example of an association reply without the RSN element
IEEE 802.11 Association Response, Flags: ........
Type/Subtype: Association Response (0x0001)
Frame Control Field: 0x1000
.000 0000 0011 1100 = Duration: 60 microseconds
Receiver address: 64:c4:03:88:ff:26
Destination address: 64:c4:03:88:ff:26
Transmitter address: fc:34:97:2b:1b:48
Source address: fc:34:97:2b:1b:48
BSS Id: fc:34:97:2b:1b:48
.... .... .... 0000 = Fragment number: 0
0001 1100 1000 .... = Sequence number: 456
IEEE 802.11 wireless LAN
Fixed parameters (6 bytes)
Tagged parameters (196 bytes)
Tag: Supported Rates 6(B), 9, 12(B), 18, 24(B), 36, 48, 54, [Mbit/sec]
Tag: RM Enabled Capabilities (5 octets)
Tag: Extended Capabilities (11 octets)
Ext Tag: HE Capabilities (IEEE Std 802.11ax/D3.0)
Ext Tag: HE Operation (IEEE Std 802.11ax/D3.0)
Ext Tag: MU EDCA Parameter Set
Ext Tag: HE 6GHz Band Capabilities
Ext Tag: OWE Diffie-Hellman Parameter
Tag Number: Element ID Extension (255)
Ext Tag length: 51
Ext Tag Number: OWE Diffie-Hellman Parameter (32)
Group: 384-bit random ECP group (20)
Public Key: 14ba9d8abeb2ecd5d95e6c12491b16489d1bcc303e7a7fbd…
Tag: Vendor Specific: Broadcom
Tag: Vendor Specific: Microsoft Corp.: WMM/WME: Parameter Element
Reported-By: Wen Gong <quic_wgong@quicinc.com>
Tested-By: Wen Gong <quic_wgong@quicinc.com>
Handling these events notifies hwsim of address changes for interface
creation/removal outside the initial namespace as well as address
changes due to scanning address randomization.
Interfaces that hwsim already knows about are still handled via
nl80211. But any interfaces not known when ADD/DEL_MAC_ADDR events
come will be treated specially.
For ADD, a dummy interface object will be created and added to the
queue. This lets the frame processing match the destination address
correctly. This can happen both for scan randomization and interface
creation outside of the initial namespace.
For the DEL event we handle similarly and don't touch any interfaces
found via nl80211 (i.e. have a 'name') but need to also be careful
with the dummy interfaces that were created outside the initial
namespace. We want to keep these around but scanning MAC changes can
also delete them. This is why a reference count was added so scanning
doesn't cause a removal.
For example, the following sequence:
ADD_MAC_ADDR (interface creation)
ADD_MAC_ADDR (scanning started)
DEL_MAC_ADDR (scanning done)
Hostapd commit b6d3fd05e3 changed the PMKID derivation in accordance
with 802.11-2020 which then breaks PMKID validation in IWD. This
breaks the FT-8021x AKM in IWD if the AP uses this hostapd version
since the PMKID doesn't validate during EAPoL.
This updates the PMKID derivation to use the correct SHA hash for
this AKM and adds SHA1 based PMKID checking for interoperability
with older hostapd versions.
The PMKID derivation has gotten messy due to the spec
updating/clarifying the hash size for the FT-8021X AKM. This
has led to hostapd updating the derivation which leaves older
hostapd versions using SHA1 and newer versions using SHA256.
To support this the checksum type is being fed to
handshake_state_get_pmkid so the caller can decide what sha to
use. In addition handshake_state_pmkid_matches is being added
which uses get_pmkid() but handles sorting out the hash type
automatically.
This lets preauthentication use handshake_state_get_pmkid where
there is the potential that a new PMKID is derived and eapol
can use handshake_state_pmkid_matches which only derives the
PMKID to compare against the peers.
The existing API was limited to SHA1 or SHA256 and assumed a key
length of 32 bytes. Since other AKMs plan to be added update
this to take the checksum/length directly for better flexibility.
This is consistent with the over-Air path, and makes it clear when
reading the logs if over-DS was used, if there was a response frame,
and if the frame failed to parse in some way.
The parsing code was breaking out of the loop on the first comment
which is incorrect and causes only part of the file to be parsed.
Its odd this hasn't popped up until now but its likely due to
differing dhcpd versions, some which add comments and others that
do not.
FILS rekeys were fixed in hostapd somewhat recently but older
versions will fail this test. Document that so we don't get
confused when running tests against older hostapd versions.
Disable power save if the wiphy indicates its needed. Do this
before issuing GET_LINK so the netdev doesn't signal its up until
power save is disabled.
This allows generating code and test coverage reports using lcov &
genhtml. Useful for understanding how much of the codebase is currently
covered by unit and autotests.
Certain drivers do not handle power save very well resulting in
missed frames, firmware crashes, or other bad behavior. Its easy
enough to disable power save via iw, iwconfig, etc but since IWD
removes and creates the interface on startup it blows away any
previous power save setting. The setting must be done *after* IWD
creates the interface which can be done, but needs to be via some
external daemon monitoring IWD's state. For minimal systems,
e.g. without NetworkManager, it becomes difficult and annoying to
persistently disable power save.
For this reason a new driver flag POWER_SAVE_DISABLE is being
added. This can then be referenced when creating the interfaces
and if set, disable power save.
The driver_infos list in wiphy.c is hard coded and, naturally,
not configurable from a user perspective. As drivers are updated
or added users may be left with their system being broken until the
driver is added, IWD released, and packaged.
This adds the ability to define driver flags inside main.conf under
the "DriverQuirks" group. Keys in this group correspond to values in
enum driver_flag and values are a list of glob matches for specific
drivers:
[DriverQuirks]
DefaultInterface=rtl81*,rtl87*,rtl88*,rtw_*,brcmfmac,bcmsdh_sdmmc
ForcePae=buggy_pae_*
Rather than keep a pointer to the driver_info entry copy the flags
into the wiphy object. This preps for supporting driver flags via
a configuration file, specifically allowing for entries that are a
subset of others. For example:
{ "rtl88*", DEFAULT_IF },
{ "rtl88x2bu", FORCE_PAE },
Before it was not possible to add entires like this since only the
last entry match would get set. Now DEFAULT_IF would get set to all
matches, and FORCE_PAE to only rtl88x2bu. This isn't especially
important for the static list since it could be modified to work
correctly, but will be needed when parsing flags from a
configuration file that may contain duplicates or subsets of the
static list.
If there was some problem during the FT authenticate stage
its nice to know more of what happened: whether the AP didn't
respond, rejected the attempt, or sent an invalid frame/IEs.
In some situations its convenient for the same work item to be
inserted (rescheduled) while its in progress. FT for example does
this now if a roam fails. The same ft_work item gets re-inserted
which, currently, is not safe to do since the item is modified
and removed once completed.
Fix this by introducing wiphy_radio_work_reschedule which is an
explicit API for re-inserting work items from within the do_work
callback.
The wiphy work logic was changed around slightly to remove the item
at the head of the queue prior to starting and note the ID going
into do_work. If do_work signaled done and ID changed we know it
was re-inserted and can skip the destroy logic and move onto the
next item. If the item is not done continue as normal but set the
priority to INT_MIN, as usual, to prevent other items from getting
to the head of the queue.
This adds another radio so IWD hits the FT failure path after
authentication to the first BSS fails. This causes a wiphy work
item to be rescheduled which previously was unsafe.
If IWD connects under bad RF conditions and netconfig takes
a while to complete (e.g. slow DHCP), the roam timeout
could fire before DHCP is done. Then, after the roam,
IWD would transition automatically to connected before
DHCP was finished. In theory DHCP could still complete after
this point but any process depending on IWD's connected
state would be uninformed and assume IP networking is up.
Fix this by stopping netconfig prior to a roam if IWD is not
in a connected state. Then, once the roam either failed or
succeeded, start netconfig again.
iwctl quit (running quit non-interactively) isn't a useful command,
but it shouldn't segfault. Let's avoid calling readline functions if
we haven't initialized readline in this run.
When acting as a configurator the enrollee can start on a different
channel than IWD is connected to. IWD will begin the auth process
on this channel but tell the enrollee to transition to the current
channel after the auth request. Since a configurator must be
connected (a requirement IWD enforces) we can assume a channel
transition will always be to the currently connected channel. This
allows us to simply cancel the offchannel request and wait for a
response (rather than start another offchannel).
Doing this improves the DPP performance and reduces the potential
for a lost frame during the channel transition.
This patch also addresses the comment that we should wait for the
auth request ACK before canceling the offchannel. Now a flag is
set and IWD will cancel the offchannel once the ACK is received.
If IWD gets a disconnect during FT the roaming state will be
cleared, as well as any ft_info's during ft_clear_authentications.
This includes canceling the offchannel operation which also
destroys any pending ft_info's if !info->parsed. This causes a
double free afterwards. In addition the l_queue_remove inside the
foreach callback is not a safe operation either.
To fix this don't remove the ft_info inside the offchannel
destroy callback. The info will get freed by ft_associate regardless
of the outcome (parsed or !parsed). This is also consistent with
how the onchannel logic works.
Log and crash backtrace below:
iwd[488]: src/station.c:station_try_next_transition() 5, target aa:46:8d:37:7c:87
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16668
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16669
iwd[488]: src/wiphy.c:wiphy_radio_work_done() Work item 16667 done
iwd[488]: src/wiphy.c:wiphy_radio_work_next() Starting work item 16668
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
iwd[488]: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
iwd[488]: src/netdev.c:netdev_deauthenticate_event()
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
iwd[488]: src/netdev.c:netdev_disconnect_event()
iwd[488]: Received Deauthentication event, reason: 6, from_ap: true
iwd[488]: src/station.c:station_disconnect_event() 5
iwd[488]: src/station.c:station_disassociated() 5
iwd[488]: src/station.c:station_reset_connection_state() 5
iwd[488]: src/station.c:station_roam_state_clear() 5
iwd[488]: double free or corruption (fasttop)
5 0x0000555b3dbf44a4 in ft_info_destroy ()
6 0x0000555b3dbf45b3 in remove_ifindex ()
7 0x0000555b3dc4653c in l_queue_foreach_remove ()
8 0x0000555b3dbd0dd1 in station_reset_connection_state ()
9 0x0000555b3dbd37e5 in station_disassociated ()
10 0x0000555b3dbc8bb8 in netdev_mlme_notify ()
11 0x0000555b3dc4e80b in received_data ()
12 0x0000555b3dc4b430 in io_callback ()
13 0x0000555b3dc4a5ed in l_main_iterate ()
14 0x0000555b3dc4a6bc in l_main_run ()
15 0x0000555b3dc4a8e0 in l_main_run_with_signal ()
16 0x0000555b3dbbe888 in main ()
Hostapd commit bc36991791 now properly sets the secure bit on
message 1/4. This was addressed in an earlier IWD commit but
neglected to allow for backwards compatibility. The check is
fatal which now breaks earlier hostapd version (older than 2.10).
Instead warn on this condition rather than reject the rekey.
Fixes: 7fad6590bd ("eapol: allow 'secure' to be set on rekeys")
After adding prefix matching the rule structure contained allocated
memory which was not being cleaned up on exit if rules still
remained in the list (removing the rule via DBus was done correctly)
The following commit:
80db8fd86c0c ("build: Use -Wvariadic-macros warning")
added a warning about variadic-macros. But it isn't quite clear why
since variadic macros are used throghout iwd. GCC doesn't honor this
option, but clang does. Since there's no real reason to stop using
variadic macros at this time, drop this warning.
The HT40+/- flags were reversed when checking against the 802.11
behavior flags.
HT40+ means the secondary channel is above (+) the primary channel
therefore corresponds to the PRIMARY_CHANNEL_LOWER behavior. And
the opposite for HT40-.
Reported-By: Alagu Sankar <alagusankar@gmail.com>
Use a more appropriate printf conversion string in order to avoid
unnecessary implicit conversion which can lead to a buffer overflow.
Reasons similar to commit:
98b758f8934a ("knownnetworks: fix printing SSID in hex")
In the case that the FT target is on the same channel as we're currently
operating on, use ft_authenticate_onchannel instead of ft_authenticate.
Going offchannel in this case can confuse some drivers.
Currently when we try FT-over-Air, the Authenticate frame is always
sent via offchannel infrastructure We request the driver to go
offchannel, then send the Authenticate frame. This works fine as long
as the target AP is on a different channel. On some networks some (or
all) APs might actually be located on the same channel. In this case
going offchannel will result in some drivers not actually sending the
Authenticate frame until after the offchannel operation completes.
Work around this by introducing a new ft_authenticate variant that will
not request an offchannel operation first.
Force conversion to unsigned char before printing to avoid sign
extension when printing SSID in hex. For example, if there are CJK
characters in SSID, it will generate a very long string like
/net/connman/iwd/ffffffe8ffffffaeffffffa1.
If a very long ssid was used (e.g. CJK characters in SSID), it might do
out of bounds write to static variable for lack of checking the position
before the last snprintf() call.
Seeing that some authenticators can't handle TLS session caching
properly, allow the EAP-TLS-based methods session caching support to be
disabled per-network using a method specific FastReauthentication setting.
Defaults to true.
With the previous commit, authentication should succeed at least every
other attempt. I'd also expect that EAP-TLS is not usually affected
because there's no phase2, unlike with EAP-PEAP/EAP-TTLS.
If we have a TLS session cached from this attempt or a previous
successful connection attempt but the overall EAP method fails, forget
the session to improve the chances that authentication succeeds on the
next attempt considering that some authenticators strangely allow
resumption but can't handle it all the way to EAP method success.
Logically the session resumption in the TLS layers on the server should
be transparent to the EAP layers so I guess those may be failed
attempts to further optimise phase 2 when the server thinks it can
already trust the client.
The extra IE length for the WMM IE was being set to 26 which is
the HT IE length, not WMM. Fix this and use the proper size for
the WMM IE of 50 bytes.
This shouldn't have caused any problems prior as the tail length
is always allocated with 256 or 512 extra bytes of headroom.
Since channels numbers are used as indexes into the array, and given
that channel numbers start at '1' instead of 0, make sure to allocate a
buffer large enough to not overflow when the max channel number for a
given band is accessed.
src/manager.c:manager_wiphy_dump_callback() New wiphy phy1 added (1)
==22290== Invalid write of size 2
==22290== at 0x4624B2: nl80211_parse_supported_frequencies (nl80211util.c:570)
==22290== by 0x417CA5: parse_supported_bands (wiphy.c:1636)
==22290== by 0x418594: wiphy_parse_attributes (wiphy.c:1805)
==22290== by 0x418E20: wiphy_update_from_genl (wiphy.c:1991)
==22290== by 0x464589: manager_wiphy_dump_callback (manager.c:564)
==22290== by 0x4CBDDA: process_unicast (genl.c:944)
==22290== by 0x4CC19C: received_data (genl.c:1056)
==22290== by 0x4C7140: io_callback (io.c:120)
==22290== by 0x4C5A97: l_main_iterate (main.c:476)
==22290== by 0x4C5BDC: l_main_run (main.c:523)
==22290== by 0x4C5F0F: l_main_run_with_signal (main.c:645)
==22290== by 0x40503B: main (main.c:600)
==22290== Address 0x4aa76ec is 0 bytes after a block of size 28 alloc'd
==22290== at 0x48417B5: malloc (vg_replace_malloc.c:393)
==22290== by 0x4BC4D1: l_malloc (util.c:62)
==22290== by 0x417BE4: parse_supported_bands (wiphy.c:1619)
==22290== by 0x418594: wiphy_parse_attributes (wiphy.c:1805)
==22290== by 0x418E20: wiphy_update_from_genl (wiphy.c:1991)
==22290== by 0x464589: manager_wiphy_dump_callback (manager.c:564)
==22290== by 0x4CBDDA: process_unicast (genl.c:944)
==22290== by 0x4CC19C: received_data (genl.c:1056)
==22290== by 0x4C7140: io_callback (io.c:120)
==22290== by 0x4C5A97: l_main_iterate (main.c:476)
==22290== by 0x4C5BDC: l_main_run (main.c:523)
==22290== by 0x4C5F0F: l_main_run_with_signal (main.c:645)
==22290==
This adds support for rekeys to AP mode. A single timer is used and
reset to the next station needing a rekey. A default rekey timer of
600 seconds is used unless the profile sets a timeout.
The only changes required was to set the secure bit for message 1,
reset the frame retry counter, and change the 2/4 verifier to use
the rekey flag rather than ptk_complete. This is because we must
set ptk_complete false in order to detect retransmissions of the
4/4 frame.
Initiating a rekey can now be done by simply calling eapol_start().
If IWD ends up dumping wiphy's twice (because of NEW_WIPHY event
soon after initial dump) it will also try and dump interfaces
twice leading to multiple DEL_INTERFACE calls. The second attempt
will fail with -ENODEV (since the interface was already deleted).
Just silently fail with this case and let the other DEL_INTERFACE
path handle the re-creation.
With really badly timed events a wiphy can be registered twice. This
happens when IWD starts and requests a wiphy dump. Immediately after
a NEW_WIPHY event comes in (presumably when the driver loads) which
starts another dump. The NEW_WIPHY event can't simply be ignored
since it could be a hotplug (e.g. USB card) so to fix this we can
instead just prevent it from being registered.
This does mean both dumps will happen but the information will just
be added to the same wiphy object.
Past commits should address any potential problems of the timer
firing during FT, but its still good practice to cancel the timer
once it is no longer needed, i.e. once FT has started.
If station has already started FT ensure station_cannot_roam takes
that into account. Since the state has not yet changed it must also
check if the FT work ID is set.
Under the following conditions IWD can accidentally trigger a second
roam scan while one is already in progress:
- A low RSSI condition is met. This starts the roam rearm timer.
- A packet loss condition is met, which triggers a roam scan.
- The roam rearm timer fires and starts another roam scan while
also overwriting the first roam scan ID.
- Then, if IWD gets disconnected the overwritten roam scan gets
canceled, and the roam state is cleared which NULL's
station->connected_network.
- The initial roam scan results then come in with the assumption
that IWD is still connected which results in a crash trying to
reference station->connected_network.
This can be fixed by adding a station_cannot_roam check in the rearm
timer. If IWD is already doing a roam scan station->preparing_roam
should be set which will cause it to return true and stop any further
action.
Aborting (signal 11) [/usr/libexec/iwd]
iwd[426]: ++++++++ backtrace ++++++++
iwd[426]: #0 0x7f858d7b2090 in /lib/x86_64-linux-gnu/libc.so.6
iwd[426]: #1 0x443df7 in network_get_security() at ome/locus/workspace/iwd/src/network.c:287
iwd[426]: #2 0x421fbb in station_roam_scan_notify() at ome/locus/workspace/iwd/src/station.c:2516
iwd[426]: #3 0x43ebc1 in scan_finished() at ome/locus/workspace/iwd/src/scan.c:1861
iwd[426]: #4 0x43ecf2 in get_scan_done() at ome/locus/workspace/iwd/src/scan.c:1891
iwd[426]: #5 0x4cbfe9 in destroy_request() at ome/locus/workspace/iwd/ell/genl.c:676
iwd[426]: #6 0x4cc98b in process_unicast() at ome/locus/workspace/iwd/ell/genl.c:954
iwd[426]: #7 0x4ccd28 in received_data() at ome/locus/workspace/iwd/ell/genl.c:1052
iwd[426]: #8 0x4c79c9 in io_callback() at ome/locus/workspace/iwd/ell/io.c:120
iwd[426]: #9 0x4c62e3 in l_main_iterate() at ome/locus/workspace/iwd/ell/main.c:476
iwd[426]: #10 0x4c6426 in l_main_run() at ome/locus/workspace/iwd/ell/main.c:519
iwd[426]: #11 0x4c6752 in l_main_run_with_signal() at ome/locus/workspace/iwd/ell/main.c:645
iwd[426]: #12 0x405987 in main() at ome/locus/workspace/iwd/src/main.c:600
iwd[426]: #13 0x7f858d793083 in /lib/x86_64-linux-gnu/libc.so.6
iwd[426]: +++++++++++++++++++++++++++
If the authenticator has already set an snonce then the packet must
be a retransmit. Handle this by sending 3/4 again but making sure
to not reset the frame counter.
Old wpa_supplicant versions do not set the secure bit on 2/4 during
rekeys which causes IWD to reject the message and eventually time out.
Modern versions do set it correctly but even Android 13 (Pixel 5a)
still uses an ancient version of wpa_supplicant which does not set the
bit.
Relax this check and instead just print a warning but allow the message
to be processed.
In try_handshake_complete() we return early if all the keys had
been installed before (initial associations). For rekeys we can
now emit the REKEY_COMPLETE event which lets AP mode reset the
rekey timer for that station.
When the TK is installed the 'ptk_installed' flag was never set to
zero. For initial associations this was fine (already zero) but for
rekeys the flag needs to be unset so try_handshake_complete knows
if the key was installed. This is consistent with how gtk/igtk keys
work as well.
Rekeys for station mode don't need to know when complete since
there is nothing to do once done. AP mode on the other hand needs
to know if the rekey was successful in order to reset/set the next
rekey timer.
The second handshake message was hard coded with the secure bit as
zero but for rekeys the secure bit should be set to 1. Fix this by
changing the 2/4 builder to take a boolean which will set the bit
properly.
It should be noted that hostapd doesn't check this bit so EAPoL
worked just fine, but IWD's checks are more strict.
The PEAP RFC wants implementations to enforce that Phase2 methods have
been successfully completed prior to accepting a successful result TLV.
However, when TLS session resumption is used, some servers will skip
phase2 methods entirely and simply send a Result TLV with a success
code. This results in iwd (erroneously) rejecting the authentication
attempt.
Fix this by marking phase2 method as successful if session resumption is
being used.
This adds a builder which sets the country IE in probes/beacons.
The IE will use the 'single subband triplet sequence' meaning
dot11OperatingClassesRequired is false. This is much easier to
build and doesn't require knowing an operating class.
The IE itself is variable in length and potentially could grow
large if the hardware has a weird configuration (many different
power levels or segmentation in supported channels) so the
overall builder was changed to take the length of the buffer and
warnings will be printed if any space issues are encountered.
IWD's channel/frequency conversions use simple math to convert and
have very minimal checks to ensure the input is valid. This can
lead to some channels/frequencies being calculated which are not
in IWD's E-4 table, specifically in the 5GHz band.
This is especially noticable using mac80211_hwsim which includes
some obscure high 5ghz frequencies which are not part of the 802.11
spec.
To fix this calculate the frequency or channel then iterate E-4
operating classes to check that the value actually matches a class.
The 6GHz test was not incrementing the frequencies properly which
was resulting in invalid frequencies, but since the conversion
API was never linked to E-4 the test was still passing.
The country IE can sometimes have a zero pad byte at the end for
alignment. This was not being checked for which caused the loop
to go past the end of the IE and print an entry for channel 0
(the pad byte) plus some garbage data.
Fix this by checking for the pad byte explicitly which skips the
print and terminates the loop.
If supported this will include the HT capabilities and HT
operations elements in beacons/probes. Some shortcuts were taken
here since not all the information is currently parsed from the
hardware. Namely the HT operation element does not include the
basic MCS set. Still, this will at least show stations that the
AP is capable of more than just basic rates.
The builders themselves are structured similar to the basic rates
builder where they build only the contents and return the length.
The caller must set the type/length manually. This is to support
the two use cases of using with an IE builder vs direct pointer.
To include HT support a chandef needs to be created for whatever
frequency is being used. This allows IWD to provide a secondary
channel to the kernel in the case of 40MHz operation. Now the AP
will generate a chandef when starting based on the channel set
in the user profile (or default).
If HT is not supported the chandef width is set to 20MHz no-HT,
otherwise band_freq_to_ht_chandef is used.
The WMM parameter IE is expected by the linux kernel for any AP
supporting HT/VHT etc. IWD won't actually use WMM and its not
clear exactly why the kernel uses this restriction, but regardless
it must be included to support HT.
For AP mode its convenient for IWD to choose an appropriate
channel definition rather than require the user provide very
low level parameters such as channel width, center1 frequency
etc. For now only HT is supported as VHT/HE etc. require
additional secondary channel frequencies.
The HT API tries to find an operating class using 40Mhz which
complies with any hardware restrictions. If an operating class is
found that is supported/not restricted it is marked as 'best' until
a better one is found. In this case 'better' is a larger channel
width. Since this is HT only 20mhz and 40mhz widths are checked.
This adds some additional parsing to obtain the AMPDU parameter
byte as well as wiphy_get_ht_capabilities() which returns the
complete IE (combining the 3 separate kernel attributes).
The supported rates IE was being built in two places. This makes that
code common. Unfortunately it needs to support both an ie builder
and using a pointer directly which is why it only builds the contents
of the IE and the caller must set the type/length.
Move the l_netconfig_set_route_priority() and
l_netconfig_set_optimistic_dad_enabled() calls from netconfig_new, which
is called once for the l_netconfig object's lifetime, to
netconfig_load_settings, which is called before every connection attempt.
This is needed because we clean up the l_netconfig configuration by calling
l_netconfig_reset_config() at different points in connection setup and
teardown so we'd reset the route priority that we've set in netconfig_new,
back to 0 and never reload it.
The disabled_freqs list is being removed and replaced with a new
list in the band object. This completely removes the need for
the pending_freqs list as well since any regdom related dumps
can just overwrite the existing frequency list.
This adds two new APIs:
wiphy_get_frequency_info(): Used to get information about a given
frequency such as disabled/no-IR. This can also be used to check
if the frequency is supported (NULL return is unsupported).
wiphy_band_is_disabled(): Checks if a band is disabled. Note that
an unsupported band will also return true. Checking support should
be done with wiphy_get_supported_bands()
As additional frequency info is needed it doesn't make sense to
store a full list of frequencies for every attribute (i.e.
supported, disabled, no-IR, etc).
This changes nl80211_parse_supported_frequencies to take a list
of frequency attributes where each index corresponds to a channel,
and each value can be filled with flag bits to signal any
limitations on that frequency.
wiphy.c then had to be updated to use this rather than the existing
scan_freq_set lists. This, as-is, will break anything using
wiphy_get_disabled_freqs().
Currently the wiphy object keeps track of supported and disabled
frequencies as two separate scan_freq_set's. This is very expensive
and limiting since we have to add more sets in order to track
additional frequency flags (no-IR, no-HT, no-HE etc).
Instead we can refactor how frequencies are stored. They will now
be part of the band object and stored as a list of flag structures
where each index corresponds to a channel
IWD was optimizing FT-over-DS by authenticating to multiple BSS's
at the time of connecting which then made future roams slightly
faster since they could jump right into association. So far this
hasn't posed a problem but it was reported that some AP's actually
enforce a reassociation timeout (included in 4-way handshake).
Hostapd itself does no such enforcement but anything external to
hostapd could monitor FT events and clear the cache if any exceeded
this timeout.
For now remove the early action frames and treat FT-over-DS the
same as FT-over-Air. In the future we could parse the reassociation
timeout, batch out FT-Action frames and track responses but for the
time being this just fix the issue at a small performance cost.
Queue the FT action just like we do with FT Authenticate which makes
it able to be used the same way, i.e. call ft_action() then queue
the ft_associate work right away.
A timer was added to end the work item in case the target never
responds.
If the regdom updates during a periodic scan the results will be
delayed until after the update in order to, potentially, add 6GHz
frequencies since they may become available. The delayed results
happen regardless of 6GHz support but scan_wiphy_watch() was
returning early if 6GHz was not supported causing the scan request
to never complete.
The blamed commit argues that the periodic scan callback doesn't do
anything useful in the event of an aborted scan, but this is not
entirely true. In particular, the callback is responsible for re-arming
the periodic scan timer. Make sure to call scan_finished() so that iwd's
periodic scanning logic continues unabated even when a periodic scan is
aborted.
Also remove the periodic boolean member of struct scan_request, as it
serves no purpose anymore.
Fixes: 6051a1495227 ("scan: Don't callback on SCAN_ABORTED")
This enables IWD to use 5GHz frequencies in AP mode. Currently
6GHz is not supported so we can assume a [General].Channel value
36 or above indicates the 5GHz band.
It should be noted that the system will probably need a regulatory
domain set in order for 5GHz to be allowed in AP mode. This is due
to world roaming (00) restricting any/all 5GHz frequencies. This
can be accomplished by setting main.conf [General].Country=CC to
the country this AP will operate in.
wiphy_get_supported_rates expected an enum defined in the nl80211
header but the argument type was an unsigned int, not exactly
intuitive to anyone using the API. Since the nl80211 enum value
was only used in a switch statement it could just as well be IWD's
internal enum band_freq.
This also allows modules which do not reference nl80211.h to use
wiphy_get_supported_rates().
Change wording to say that IPv6 support is enabled by default. No
functional changes.
Fixes: 00baa75e9633 ("netconfig: Enable IPV6 support by default")
Before this change, I noticed that some non-interactive commands
don't work,
$ iwctl version
$ iwctl help
while other ones do.
$ iwctl station wlan0 show
This seems to be a typo bug in the if clause checking for additional
arguments.
This file is a compilation command database used by clangd and ccls,
and can be generated by tools like https://github.com/rizsotto/Bear.
$ bear -- make clean all
If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes
that a scan is ongoing on the underlying wdev and will retry the same
command when that scan is complete. It gets notified of that completion
via the scan_notify() function, and kicks the scan logic to try again.
However, if there is another wdev on the same wiphy and that wdev has a
scan request in flight, the kernel will also return -EBUSY. In other
words, only one scan request per wiphy is permitted.
As an example, the brcmfmac driver can create an AP interface on the
same wiphy as the default station interface, and scans can be triggered
on that AP interface.
If -EBUSY is returned because another wdev is scanning, then iwd won't
know when it can retry the original trigger request because the relevant
netlink event will arrive on a different wdev. Indeed, if no scan
context exists for that other wdev, then scan_notify will return early
and the scan logic will stall indefinitely.
Instead, and in the event that no scan context matches, use it as a cue
to retry a pending scan request that happens to be destined for the same
wiphy.
The previous commit added an invocation of known_networks_watch_add, but
never updated the module dependency graph.
Fixes: a793a41662b2 ("station, eapol: Set up eap-tls-common for session caching")
Use eap_set_peer_id() to set a string identifying the TLS server,
currently the hex-encoded SSID of the network, to be used as group name
and primary key in the session cache l_settings object. Provide pointers
to storage_eap_tls_cache_{load,sync} to eap-tls-common.c using
eap_tls_set_session_cache_ops(). Listen to Known Network removed
signals and call eap_tls_forget_peer() to have any session related to
the network also dropped from the cache.
Use l_tls_set_session_cache() to enable session cache/resume in the
TLS-based EAP methods. Sessions for all 802.1x networks are stored in
one l_settings object.
eap_{get,set}_peer_id() API is added for the upper layers to set the
identifier of the authenticator (or the supplicant if we're the
authenticator, if there's ever a use case for that.)
eap-tls-common.c can't call storage_eap_tls_cache_{load,sync}()
or known_networks_watch_add() (to handle known network removals) because
it's linked into some executables that don't have storage.o,
knownnetworks.o or common.o so an upper layer (station.c) will call
eap_tls_set_session_cache_ops() and eap_tls_forget_peer() as needed.
Minor changes to these two methods resulting from two rewrites of them.
Actual changes are:
* storage_tls_session_sync parameter is const,
* more specific naming,
* storage_tls_session_load will return an empty l_settings instead of
NULL so eap-tls-common.c doesn't have to handle this.
storage.c makes no assumptions about the group names in the l_settings
object and keeps no reference to that object, eap-tls-common.c is going
to maintain the memory copy of the cache since this cache and the disk
copy of it are reserved for EAP methods only.
A comma separated list as a string was ok for pure display purposes
but if any processing needed to be done on these values by external
consumers it really makes more sense to use a DBus array.
AP mode implements a few DBus methods/properties which are named
the same as station: Scan, Scanning, and GetOrderedNetworks. Allow
the Device object to work with these in AP mode by calling the
correct method if the Mode is 'ap'.
This wasn't being updated meaning the property is missing until a
scan is issued over DBus.
Rather than duplicate all the property changed calls they were all
factored out into a helper function.
Adds the MulticastDNS option globally to main.conf. If set all
network connections (when netconfig is enabled) will set mDNS
support into the resolver. Note that an individual network profile
can still override the global value if it sets MulticastDNS.
The AP mode device APIs were hacked together and only able to start
stop an AP. Now that the AP interface has more functionality its
best to use the DBus class template to access the full AP interface
capabilities.
The limitation of cipher selection in ap.c was done so to allow p2p to
work. Now with the ability to specify ciphers in the AP config put the
burden on p2p to limit ciphers as it needs which is only CCMP according
to the spec.
These can now be optionally provided in an AP profile and provide a
way to limit what ciphers can be chosen. This still is dependent on
what the hardware supports.
The validation of these ciphers for station is done when parsing
the BSS RSNE but for AP mode there is no such validation and
potentially any supported cipher could be chosen, even if its
incompatible for the type of key.
This change removes duplicate calls to display_table_footer(), in
station show.
Before this change, the bug caused an extra newline to be output every
time the table updated. This only occurred when the network was
disconnected.
$ iwctl
[iwd]# station wlan0 show
The netdev_copy_tk function was being hard coded with authenticator
set to false. This isn't important for any ciphers except TKIP but
now that AP mode supports TKIP it needs to be fixed.
The disabled cipher list contained a '.' instead of ',' which prevented
the subsequent ciphers from being disabled. This was only group management
ciphers so it didn't have any effect on the test.
The -F option is undocumented but allows you to pass a nl80211
family ID so iwmon doesn't ignore messages which don't match the
systems nl80211 family ID (i.e. pcaps from other systems).
This is somewhat of a pain to use since its unclear what the other
system's family ID actually is until you run it though something
like wireshark. Instead iwmon can ignore the family ID when in
read mode which makes reading other systems pcap files automatic.
Expand nlmon_create to be useful for both pcaps and monitoring. Doing
this also lets iwmon filter pcaps based on --no-ies,rtnl,scan etc
flags since they are part of the config.
Though TKIP is deprecated and insecure its trivial to support it in
AP mode as we already do in station. This is only to allow AP mode
for old hardware that may only support TKIP. If the hardware supports
any higher level cipher that will be chosen automatically.
The __str__ function assumed station mode which throws an exception
if the device is in AP mode. Fix this as well as print out the mode
the device is in.
This API optimizes scanning to run tests quickly by only scanning
the frequencies which hostapd is using. But if a test doesn't use
hostapd this API raises an uncaught exception.
Check if hostapd is being used, and if not just do a full scan.
The key descriptor version was hard coded to HMAC_SHA1_AES which
is correct when using IE_RSN_AKM_SUITE_PSK + CCMP. ap.c hard
codes the PSK AKM but still uses wiphy to select the cipher. In
theory there could be hardware that only supports TKIP which
would then make IWD non-compliant since a different key descriptor
version should be used with PSK + TKIP (HMAC_MD5_ARC4).
Now use a helper to sort out which key descriptor should be used
given the AKM and cipher suite.
Similarly to l_netconfig track whether IWD's netconfig is active (from
the moment of netconfig_configure() till netconfig_reset()) using a
"started" flag and avoid handling or emitting any events after "started"
is cleared.
This fixes an occasional issue with the Netconfig Agent backend where
station would reset netconfig, netconfig would issue DBus calls to clear
addresses and routes, station would go into DISCONNECTING, perhaps
finish and go into DISCONNECTED and after a while the DBus calls would
come back with an error which would cause a NETCONFIG_EVENT_FAILED
causing station to call netdev_disconnct() for a second time and
transition to and get stuck in DISCONNECTING.
Add an additional optional PairwiseCipher property on
net.connman.iwd.StationDiagnostic interface that will hold the current
pairwise cipher in use for the connection.
Both CMD_ASSOCIATE and CMD_CONNECT paths were using very similar code to
build RSN specific attributes. Use a common function to build these
attributes to cut down on duplicated code.
While here, also start using ie_rsn_cipher_suite_to_cipher instead of
assuming that the pairwise / group ciphers can only be CCMP or TKIP.
Instead of copy-pasting the same basic operation (memcpy & assignment),
use a goto and a common path instead. This should also make it easier
for the compiler to optimize this function.
Commit c7640f8346 was meant to fix a sign compare warning
in clang because NLMSG_NEXT internally compares the length
with nlmsghdr->nlmsg_len which is a u32. The problem is the
NLMSG_NEXT can underflow an unsigned value, hence why it
expects an int type to be passed in.
To work around this we can instead pass a larger sized
int64_t which the compiler allows since it can upgrade the
unsigned nlmsghdr->nlmsg_len. There is no underflow risk
with an int64_t either because the buffer used is much
smaller than what can fit in an int64_t.
Fixes: c7640f8346 ("monitor: fix integer comparison error (clang)")
In nearly all cases the auto-line breaks can be on spaces in the
string. The only exception (so far) is DPP which displays a very
long URI without any spaces. When this is displayed a single
character was lost on each break. This was due to the current line
being NULL terminated prior to the next line string being returned.
To handle both cases the next line is copied prior to terminating
the current line. The offsets were modified slightly so the line
will be broken *after* the space, not on the space. This leaves
the space at the end of the current line which is invisible to the
user but allows the no-space case to also work without loss of
the last character (since that last character is where the space
normally would be).
Each color escape is tracked and the new_width is adjusted
accordingly. But if the color escape comes after a space which breaks
the line, the adjusted width ends up being too long since that escape
sequence isn't appearing on the current line. This causes the next
column to be shifted over.
The old 'max' parameter was being used both as an input and output
parameter which was confusing. Instead have next_line take the
column width as input, and output a new width which includes any
color escapes and wide characters.
In theory any input to this function should have valid utf-8 but
just in case the strings should be validated. This removes the
need to check the return of l_utf8_get_codepoint which is useful
since there is no graceful failure path at this point.
The known frequency list may include frequencies that once were
allowed but are now disabled due to regulatory restrictions. Don't
include these frequencies in the roam scan.
The utf-8 bytes were being counted as normal ascii so the
width maximum was not being increased to include
non-printable bytes like it is for color escape sequences.
This lead to the row not printing enough characters which
effected the text further down the line.
Fix this by increasing 'max' when non-codepoint utf-8
characters are found.
This test was taking about 5 minutes to run, specifically
the requested scan test. One slight optimization is to
remove the duplicate hidden network, since there is no
need for two. In addition the requested scan test was
changed so it does not periodic scan and only issues a dbus
scan.
The CI was sometimes taking ~10-15 minutes to run just this
test. This is likely due to the test having 7 radios and
which is a lot of beacons/probes to process.
Disabling the unused hostapd instances drops the runtime down
to about 1 minute.
If a rule was disabled it would cause hwsim to not continue processing
frames using rules further in the queue. _Most_ tests only use one
rule so this shouldn't have changed their behavior but others which
use multiple rules may be effected and the tests have not been
running properly.
These events are sent if IWD fails to authentiate
(ft-over-air-roam-failed) or if it falls back to over air after
failing to use FT-over-DS (try-ft-over-air)
If IPv4 setup fails and the netconfig logic gives up, continue as if the
connection had failed at earlier stages so that autoconnect can try the
next available network.
Certain drivers support/require probe response offloading which
IWD did not check for or properly handle. If probe response
offloading is required the probe response frame watch will not
be added and instead the ATTR_PROBE_RESP will be included with
START_AP.
The head/tail builders were reused but slightly modified to check
if the probe request frame is NULL, since it will be for use with
START_AP.
Parse the AP probe response offload attribute during the dump. If
set this indicates the driver expects the probe response attribute
to be included with START_AP.
Clearing all authentications during ft_authenticate was a very large
hammer and may remove cached authentications that could be used if
the current auth attempt fails.
For example the best BSS may have a problem and fail to authenticate
early with FT-over-DS, then fail with FT-over-Air. But another BSS
may have succeeded early with FT-over-DS. If ft_authenticate clears
all ft_infos that successful authentication will be lost.
This tests the new behavior where the roam request does not
indicate disassociation is imminent. In this case if no
candidates are found IWD should not roam.
Instead of requiring the initial condition be met when calling
wait_for_object_change, wait for it.
This is how every caller of this function uses it, specifically
with roaming where we first wait for DeviceState.roaming, then
call wait_for_object_change. This can be simplified for the caller
so the initial condition is first waited for.
AP roaming was structured such that any AP roam request would
force IWD to roam (assuming BSS's were found in scan results).
This isn't always the best behavior since IWD may be connected
to the best BSS in range.
Only force a roam if the AP includes one of the 3 disassociation/
termination bits. Otherwise attempt to roam but don't set the
ap_directed_roaming flag which will allows IWD to stay with the
current BSS if no better candidates are found.
There are a few checks that can be done prior to parsing the
request, in addition the explicit check for preparing_roam was
removed since this is taken care of by station_cannot_roam().
Once offchannel completes we can check if the info structure was
parsed, indicating authentication succeeded. If not there is no
reason to keep it around since IWD will either try another BSS or
fail.
This both adds proper handling to the new roaming logic and fixes
a potential bug with firmware roams.
The new way roaming works doesn't use a connect callback. This
means that any disconnect event or call to netdev_connect_failed
will result in the event handler being called, where before the
connect callback would. This means we need to handle the ROAMING
state in the station disconnect event so IWD properly disassociates
and station goes out of ROAMING.
With firmware roams netdev gets an event which transitions station
into ROAMING. Then netdev issues GET_SCAN. During this time a
disconnect event could come in which would end up in
station_disconnect_event since there is no connect callback. This
needs to be handled the same and let IWD transition out of the
ROAMING state.
This finalizes the refactor by moving all the handshake prep
into FT itself (most was already in there). The netdev-specific
flags and state were added into netdev_ft_tx_associate which
now avoids any need for a netdev API related to FT.
The NETDEV_EVENT_FT_ROAMED event is now emitted once FT completes
(netdev_connect_ok). This did require moving the 'in_ft' flag
setting until after the keys are set into the kernel otherwise
netdev_connect_ok has no context as to if this was FT or some
other connection attempt.
In addition the prev_snonce was removed from netdev. Restoring
the snonce has no value once association begins. If association
fails it will result in a disconnect regardless which requires
a new snonce to be generated
This converts station to using ft_action/ft_authenticate and
ft_associate and dropping the use of the netdev-only/auth-proto
logic.
Doing this allows for more flexibility if FT fails by letting
IWD try another roam candidate instead of disconnecting.
Now the full action frame including the header is provided to ft
which breaks the existing parser since it assumes the buffer starts
at the body of the message.
This forwards Action, Authentication and Association frames to
ft.c via their new hooks in netdev.
Note that this will break FT-over-Air temporarily since the
auth-proto still is in use.
The current behavior is to only find the best roam candidate, which
generally is fine. But if for whatever reason IWD fails to roam it
would be nice having a few backup BSS's rather than having to
re-scan, or worse disassociate and reconnect entirely.
This patch doesn't change the roam behavior, just prepares for
using a roam candidate list. One difference though is any roam
candidates are added to station->bss_list, rather than just the
best BSS. This shouldn't effect any external behavior.
The candidate list is built based on scan_bss rank. First we establish
a base rank, the rank of the current BSS (or zero if AP roaming). Any
BSS in the results with a higher rank, excluding the current BSS, will
be added to the sorted station->roam_bss_list (as a new 'roam_bss'
entry) as well as stations overall BSS list. If the resulting list is
empty there were no better BSS's, otherwise station can now try to roam
starting with the best candidate (head of the roam list).
A new API was added, ft_authenticate, which will send an
authentication frame offchannel via CMD_FRAME. This bypasses
the kernel's authentication state allowing multiple auth
attempts to take place without disconnecting.
Currently netdev handles caching FT auth information and uses FT
parsers/auth-proto to manage the protocol. This sets up to remove
this state machine from netdev and isolate it into ft.c.
This does not break the existing auth-proto (hence the slight
modifications, which will be removed soon).
Eventually the auth-proto will be removed from FT entirely, replaced
just by an FT state machine, similar to how EAPoL works (netdev hooks
to TX/RX frames).
There may be situations (due to Multi-BSS operation) where an AP might
be advertising multiple SSIDs on the same BSSID. It is thus more
correct to lookup the preauthentication target on the network object
instead of the station bss_list. It used to be that the network list of
bsses was not updated when roam scan was performed. Hence the lookup
was always performed on the station bss_list. But this is no longer the
case, so it is safer to lookup on the network object directly on the
network.
The warnings in the authenticate and connect events were identical
so it could be difficult knowing which print it was if IWD is not
in debug mode (to see more context). The prints were changed to
indicate which event it was and for the connect event the reason
attribute is also parsed.
Note the resp_ies_len is also initialized to zero now. After making
the changes gcc was throwing a warning.
FT is special in that it really should not be interrupted. Since
FRAME/OFFCHANNEL have the highest priority we run the risk of
DPP or some other offchannel operation interfering with FT.
FT is now driven (mostly) by station which removes the connect
callback. Instead once FT is completed, keys set, etc. netdev
will send an event to notify station.
Since l_netconfig's DHCPv6 client instance no longer sets parameters on
the l_icmp6_client instance, call l_icmp6_client_set_nodelay() and
l_icmp6_client_set_debug() directly. Also enable optimistic DAD to
speed up IPv6 setup if available.
All uses of frame-xchg were for action frames, and the frame type
was hard coded. Soon other frame types will be needed so the type
must now be specified in the frame_xchg_prefix structure.
This will make the debug API more robust as well as fix issues
certain drivers have when trying to roam. Some of these drivers
may flush scan results after CMD_CONNECT which results in -ENOENT
when trying to roam with CMD_AUTHENTICATE unless you rescan
explicitly.
Now this will be taken care of automatically and station will first
scan for the BSS (or full scan if not already in results) and
attempt to roam once the BSS is seen in a fresh scan.
The logic to replace the old BSS object was factored out into its
own function to be shared by the non-debug roam scan. It was also
simplified to just update the network since this will remove the
old BSS if it exists.
Add a second netconfig-commit backend which, if enabled, doesn't
directly send any of the network configuration to the kernel or system
files but delegates the operation to an interested client's D-Bus
method as described in doc/agent-api.txt. This backend is switched to
when a client registers a netconfig agent object and is swiched away
from when the client disconnects or unregisters the agent. Only one
netconfig agent can be registered any given time.
Add netconfig_event_handler() that responds to events emitted by
the l_netconfig object by calling netconfig_commit, tracking whether
we're connected for either address family and emitting
NETCONFIG_EVENT_CONNECTED or NETCONFIG_EVENT_FAILED as necessary.
NETCONFIG_EVENT_FAILED is a new event as until now failures would cause
the netconfig state machine to stop but no event emitted so that
station.c could take action. As before, these events are only
emitted based on the IPv4 configuration state, not IPv6.
Add netconfig-commit.c whose main method, netconfig_commit actually sets
the configuration obtained by l_netconfig to the system netdev,
specifically it sets local addresses on the interface, adds routes to the
routing table, sets DNS related data and may add entries to the neighbor
cache. netconfig-commit.c uses a backend-ops type structure to allow
for switching backends. In this commit there's only a default backend
that uses l_netconfig_rtnl_apply() and a struct resolve object to write
the configuration.
netconfig_gateway_to_arp is moved from netconfig.c to netconfig-commit.c
(and renamed.) The struct netconfig definition is moved to netconfig.h
so that both files can access the settings stored in the struct.
To avoid repeated lookups by ifindex, replace the ifindex member in
struct netconfig with a struct netdev pointer. A struct netconfig
always lives shorter than the struct netdev.
* make the error handling simpler,
* make error messages more consistent,
* validate address families,
* for IPv4 skip l_rtnl_address_set_noprefixroute()
as l_netconfig will do this internally as needed.
* for IPv6 set the default prefix length to 64 as that's going to be
used for the local prefix route's prefix length and is a more
practical value.
Drop all the struct netconfig members where we were keeping the parsed
netconfig settings and add a struct l_netconfig object. In
netconfig_load_settings load all of the settings once parsed directly
into the l_netconfig object. Only preserve the mdns configuration and
save some boolean values needed to properly handle static configuration
and FILS. Update functions to use the new set of struct netconfig
members.
These booleans mirroring the l_netconfig state could be replaced by
adding l_netconfig getters for settings which currently only have
setters.
In anticipation of switching to use the l_netconfig API, which
internally handles DHCPv4, DHCPv6, ACD, etc., drop pointers to
instances of l_dhcp_client, l_dhcp6_client and l_acd from struct
netconfig. Also drop all code used for handling events from these
APIs, including code to commit the received configurations to the
system. Committing the final settings to the system netdevs is going to
be handled by a new set of utilities in a new file.
Update ConfigureIPv{4,6}() parameters to simplify mapping our sets of
addresses and routes directly to D-Bus dictionaries. Split Cancel()
into CancelIPv{4,6}().
The RRM module was blindly scanning using the requested
frequency which may or may not be possible given the hardware.
Instead check that the frequency will work and if not reject
the request.
This was reported by a user seeing the RRM scan fail which was
due to the AP requesting a scan on 5GHz when the adapter was
2.4GHz only.
The hostapd events for RRM come regardless of success
so they need to be checked as such. In addition one more
RRM request was added to scan on a frequency which is
disabled (operating class 82, channel 14).
Support for MAC address changes while powered was recently added to
mac80211. This avoids the need to power down the device which both
saves time as well as preserves any allowed frequencies which may
have been disabled if the device powered down.
The code path for changing the address was reused but now just the
'up' callback will be provided directly to l_rtnl_set_mac. Since
there aren't multiple stages of callbacks the rtnl_data structure
isn't strictly needed, but the code looks cleaner and more
consistent between the powered/non-powered code paths.
The comment/debug error print was also updated to be more general
between the two MAC change code paths.
Documentation for MulticastDNS setting suggests it should be part of the
main iwd configuration file. See man iwd.config. However, in reality
the setting was being pulled from the network provisioning file instead.
The latter actually makes more sense since systemd-resolved has its own
set of global defaults. Fix the documentation to reflect the actual
implementation.
Previously we had an ACD failure scenario where a new client forces its
IP to create an IP conflict and an already-connected client detects the
conflict and reacts. Now first test a scenario where a newly connecting
IWD client runs ACD before setting its statically configured IP, detects
a conflict and refuses to continue, then run the second scenario where
the newly connecting DHCP-configured client ignores the conflict and
starts ACD in defend-indefinitely mode and the older client in
defent-once mode gives up its IP.
Due to those variables being global (IWD class variables) calling either
unregister_psk_agent or del on one IWD class instance would unregister
all agents on all instances. Move .psk_agents and two other class
variables to the object. They were already referenced using "self."
as if they were object variables throughout the class.
Part of static_test.py starts a second IWD instance and tries to make
it connect to the AP with the same IP address as the first IWD instance
which is already connected, to produce an IP conflict. For this, the
second instance uses DHCP and the test expects the DHCP server to offer
the address 192.168.1.10 to it. However in the current setup the DHCP
server manages to detect that 192.168.1.10 is in use and offers .11
instead. Break the DHCP server's conflict detection by disabling ICMP
ping replies in order to fix the test.
Previously this has worked because the AP's and the DHCP server's
network interface is in the same network namespace as the first IWD
instance's network interface meaning that pings between the two
interfaces shouldn't work (a known Linux kernel routing quirk...).
I am not sure why those pings currently do work but take no chances and
disable ICMP pings.
netdev does not keep any pointers to struct scan_bss arguments that are
passed in. Make this explicitly clear by modifying the API definitions
and mark these as const.
This adds a few utilities for setting up an FT environment. All the
roaming tests basically copy/paste the same code for setting up the
hostapd instances and this can cause problems if not done correctly.
set_address() sets the MAC address on the device, and restarts hostapd
group_neighbors() takes a list of HostapdCLI objects and makes each a
neighbor to the others.
The neighbor report element requires the operating class which isn't
advertised by hostapd. For this we assume operating class 81 but this
can be set explicitly if it differs. Currently no roaming tests use
5/6GHz frequencies, and just in case an exception will be thrown if
the channel is greater than 14 and the op_class didn't change.
The packet loss test had a few problems. First being that the RSSI for
the original BSS was not low enough to change the rank. This meant any
roam was just lucky that the intended BSS was first in the results.
The second problem is timing related, and only happens on UML. Disabling
the rules after the roaming condition sometimes allows IWD to fully
roam and connect before the next state change checks.
A new test which blocks all data frames once connected, then tries
to send 100 packets. This should result in the kernel sending a
packet loss event to userspace, which IWD should react to and roam.
This adds a new netdev event for packet loss notifications from
the kernel. Depending on the scenario a station may see packet
loss events without any other indications like low RSSI. In these
cases IWD should still roam since there is no data flowing.
The limitations of readline required that the autocompletion choose
a 'default' device. With multiple phys this doesn't work. Now the
readline limitation has been worked around and station can look up
the device for the command completion.
There is a limitation of libreadline where no context/userdata
can be passed to completion functions. Thi affects iwctl since
the entity value isn't known to completion functions.
Workarounds such as getting the default device are employed but
its not a great solution.
Instead hack around this limitation by parsing the prompt to
extract the entity (second arg). Then use a generic match function
given to readline which can call the actual match function and
include the entity.
The ATTR_ARRAY type was quite limited, only supporting u16/u32 and
addresses. This changes the union to a struct so nested/function
can be defined along with array_type.
Some APs use an older hostapd OWE implementation which incorrectly
derives the PTK. To work around this group 19 should be used for
these APs. If there is a failure (reason=2) and the AKM is OWE
set force default group into network and retry. If this has been
done already the behavior is no different and the BSS will be
blacklisted.
If a OWE network is buggy and requires the default group this info
needs to be stored in network in order for it to set this into the
handshake on future connect attempts.
This functionality works around the kernel's behavior of allowing
6GHz only after a regulatory domain update. If the regdom updates
scan.c needs to be aware in order to split up periodic scans, or
insert 6GHz frequencies into an ongoing periodic scan. Doing this
allows any 6GHz BSS's to show up in the scan results rather than
needing to issue an entirely new scan to see these BSS's.
The kernel's regulatory domain updates after some number of beacons
are processed. This triggers a regulatory domain update (and wiphy
dump) but only after a scan request. This means a full scan started
prior to the regdom being set will not include any 6Ghz BSS's even
if the regdom was unlocked during the scan.
This can be worked around by splitting up a large scan request into
multiple requests allowing one of the first commands to trigger a
regdom update. Once the regdom updates (and wiphy dumps) we are
hopefully still scanning and could append an additional request to
scan 6GHz.
In the case of an external scan, we won't have a scan_request object,
sr. Make sure to not crash in this case.
Also, since scan_request can no longer carry the frequency set in all
cases, add a new member to scan_results in order to do so.
Fixes: 27d8cf4ccc59 ("scan: track scanned frequencies for entire request")
The kernel handles setting the regulatory domain by receiving beacons
which set the country IE. Presumably since most regulatory domains
disallow 6GHz the default (world) domain also disables it. This means
until the country is set, 6GHz is disabled.
This poses a problem for IWD's quick scanning since it only scans a few
frequencies and this likely isn't enough beacons for the firmware to
update the country, leaving 6Ghz inaccessable to the user without manual
intervention (e.g. iw scan passive, or periodic scans by IWD).
To try and work around this limitation the quick scan logic has been
updated to check if a 6GHz AP has been connected to before and if that
frequency is disabled (but supported). If this is the case IWD will opt
for a full passive scan rather than scanning a limited set of
frequencies.
For whatever reason the kernel will send regdom updates even if
the regdom didn't change. This ends up causing wiphy to dump
which isn't needed since there should be no changes in disabled
frequencies.
Now the previous country is checked against the new one, and if
they match the wiphy is not dumped again.
A change in regulatory domain can result in frequencies being
enabled or disabled depending on the domain. This effects the
frequencies stored in wiphy which other modules depend on
such as scanning, offchannel work etc.
When the regulatory domain changes re-dump the wiphy in order
to update any frequency restrictions.
A helper to check whether the country code corresponds to a
real country, or some special code indicating the country isn't
yet set. For now, the special codes are OO (world roaming) and
XX (unknown entity).
Events to indicate when a regulatory domain wiphy dump has
started and ended. This is important because certain actions
such as scanning need to be delayed until the dump has finished.
The NEW_SCAN_RESULTS handling was written to only parse the frequency
list if there were no additional scan commands to send. This results in
the scan callback containing frequencies of only the last CMD_TRIGGER.
Until now this worked fine because a) the queue is only used for hidden
networks and b) frequencies were never defined by any callers scanning
for hidden networks (e.g. dbus/periodic scans).
Soon the scan command queue will be used to break up scan requests
meaning only the last scan request frequencies would be used in the
callback, breaking the logic in station.
Now the NEW_SCAN_RESULTS case will parse the frequencies for each scan
command rather than only the last.
The compiler treated the '1' as an int type which was not big enough
to hold a bit shift of 31:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
Instead of doing the iftype check manually, refactor
wiphy_get_supported_iftypes by adding a subroutine which just parses
out iftypes from a mask into a char** list. This removes the need to
case each iftype into a string.
Add extra logging around CQM events to help track wifi status. This is
useful for headless systems that can only be accessed over the network
and so information in the logs is invaluable for debugging outages.
Prior to this change, the only log for CQM messages is saying one was
received. This adds details to what attributes were set and the
associated data with them.
The signal strength log format was chosen to roughly match
wpa_supplicant's which looks like this:
CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-60 noise=-96 txrate=6000
Provides useful information on why a roam might have failed, such as
failing to find the BSS or the BSS being ranked lower, and why that
might be.
The output format is the same as station_add_seen_bss for consistency.
If a frequency is disabled IWD should keep track and disallow any
operations on that channel such as scanning. A new list has been added
which contains only disabled frequencies.
The scan_passive API wasn't using a const struct scan_freq_set as it
should be since it's not modifying the contents. Changing this to
const did require some additional changes like making the scan_parameters
'freqs' member const as well.
After changing scan_parameters, p2p needed updating since it was using
scan_parameters.freqs directly. This was changed to using a separate
scan_freq_set pointer, then setting to scan_parameters.freqs when needed.
Similar to the HT/VHT APIs, this estimates the data rate based on the
HE Capabilities element, in addition to our own capabilities. The
logic is much the same as HT/VHT. The major difference being that HE
uses several MCS tables depending on the channel width. Each width
MCS set is checked (if supported) and the highest estimated rate out
of all the MCS sets is used.
There appears to be a compiler bug with gcc 11.2 which thinks the vht_mcs_set
is a zero length array, and the memset of size 8 is out of bounds. This is only
seen once an element is added to 'struct band'.
In file included from /usr/include/string.h:519,
from src/wiphy.c:34:
In function ‘memset’,
inlined from ‘band_new_from_message’ at src/wiphy.c:1300:2,
inlined from ‘parse_supported_bands’ at src/wiphy.c:1423:11,
inlined from ‘wiphy_parse_attributes’ at src/wiphy.c:1596:5,
inlined from ‘wiphy_update_from_genl’ at src/wiphy.c:1773:2:
/usr/include/bits/string_fortified.h:59:10: error: ‘__builtin_memset’ offset [0, 7] is out of the bounds [0, 0] [-Werror=array-bounds]
59 | return __builtin___memset_chk (__dest, __ch, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
60 | __glibc_objsize0 (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
In test-band the band object was allocated using l_malloc, but not
memset to zero. This will cause problems if allocated pointers are
included in struct band once band is freed.
This increases the maximum data rate which now is possible with HE.
A few comments were also updated, one to include 6G when adjusting
the rank for >4000mhz, and the other fixing a typo.
This is a general way of finding the best MCS/NSS values which will work
for HT, VHT, and HE by passing in the max MCS values for each value which
the MCS map could contain (0, 1, or 2).
The HE capabilities information is contained in
NL80211_BAND_ATTR_IFTYPE_DATA where each entry is a set of attributes
which define the rules for one or more interface types. This patch
specifically parses the HE PHY and HE MCS data which will be used for
data rate estimation.
Since the set of info is per-iftype(s) the data is stored in a queue
where each entry contains the PHY/MCS info, and a uint32 bit mask where
each bit index signifies an interface type.
With the addition of HE, the print function for MCS sets needs to change
slightly. The maps themselves are the same format, but the values indicate
different MCS ranges. Now the three MCS max values are passed in.
This queue will hold iftype(s) specific data for HE capabilities. Since
the capabilities may differ per-iftype the data is stored as such. Iftypes
may share a configuration so the band_he_capabilities structure has a
mask for each iftype using that configuration.
When PCI adapters are properly configured they should exist in the
vfio-pci system tree. It is assumed any devices configured as such
are used for test-runner.
This removes the need for a hw.conf file to be supplied, but still
is required for USB adapters. Because of this the --hw option was
updated to allow no value, or a file path.
subprocess.Popen's wait() method was overwritten to be non-blocking but
in certain circumstances you do want to wait forever. Fix this to allow
timeout=None, which calls the parent wait() method directly.
Certain module dependencies were missing, which could cause a crash on
exit under (very unlikely) circumstances.
#0 l_queue_peek_head (queue=<optimized out>) at ../iwd-1.28/ell/queue.c:241
#1 0x0000aaaab752f2a0 in wiphy_radio_work_done (wiphy=0xaaaac3a129a0, id=6)
at ../iwd-1.28/src/wiphy.c:2013
#2 0x0000aaaab7523f50 in netdev_connect_free (netdev=netdev@entry=0xaaaac3a13db0)
at ../iwd-1.28/src/netdev.c:765
#3 0x0000aaaab7526208 in netdev_free (data=0xaaaac3a13db0) at ../iwd-1.28/src/netdev.c:909
#4 0x0000aaaab75a3924 in l_queue_clear (queue=queue@entry=0xaaaac3a0c800,
destroy=destroy@entry=0xaaaab7526190 <netdev_free>) at ../iwd-1.28/ell/queue.c:107
#5 0x0000aaaab75a3974 in l_queue_destroy (queue=0xaaaac3a0c800,
destroy=destroy@entry=0xaaaab7526190 <netdev_free>) at ../iwd-1.28/ell/queue.c:82
#6 0x0000aaaab7522050 in netdev_exit () at ../iwd-1.28/src/netdev.c:6653
#7 0x0000aaaab7579bb0 in iwd_modules_exit () at ../iwd-1.28/src/module.c:181
In this particular case, wiphy module was de-initialized prior to the
netdev module:
Jul 14 18:14:39 localhost iwd[2867]: ../iwd-1.28/src/wiphy.c:wiphy_free() Freeing wiphy phy0[0]
Jul 14 18:14:39 localhost iwd[2867]: ../iwd-1.28/src/netdev.c:netdev_free() Freeing netdev wlan0[45]
Since we use git ls-files to produce the list of all tests for -A, if
the source directory is owned by somebody other than root one might
get:
fatal: unsafe repository ('/home/balrog/repos/iwd' is owned by someone else)
To add an exception for this directory, call:
git config --global --add safe.directory /home/balrog/repos/iwd
Starting
/home/balrog/repos/iwd/tools/..//autotests/ threw an uncaught exception
Traceback (most recent call last):
File "/home/balrog/repos/iwd/tools/run-tests", line 966, in run_auto_tests
subtests = pre_test(ctx, test, copied)
File "/home/balrog/repos/iwd/tools/run-tests", line 814, in pre_test
raise Exception("No hw.conf found for %s" % test)
Exception: No hw.conf found for /home/balrog/repos/iwd/tools/..//autotests/
Mark args.testhome as a safe directory on every run.
Test that the DHCPv4 lease got renewed after the T1 timer runs out.
Then also simulate the DHCPREQUEST during renew being lost and
retransmitted and the lease eventually getting renewed T1 + 60s later.
The main downside is that this test will inevitably take a while if
running in Qemu without the time travel ability.
Update the test and some utility code to run hostapd in an isolated net
namespace for connection_test.py. We now need a second hostapd
instance though because in static_test.py we test ACD and we need to
produce an IP conflict. Moving the hostapd instance unexpectedly fixes
dhcpd's internal mechanism to avoid IP conflicts and it would no longer
assign 192.168.1.10 to the second client, it'd notice that address was
already in use and assign the next free address, or fail if there was
none. So add a second hostapd instance that runs in the main namespace
together with the statically-configured client, it turns out the test
relies on the kernel being unable to deliver IP traffic to interfaces on
the same system.
The kernel will not let us test some scenarios of communication between
two hwsim radios (e.g. STA and AP) if they're in the same net namespace.
For example, when connected, you can't add normal IPv4 subnet routes for
the same subnet on two different interfaces in one namespace (you'd
either get an EEXIST or you'd replace the other route), you can set
different metrics on the routes but that won't fix IP routing. For
testNetconfig the result is that communication works for DHCP before we
get the inital lease but renewals won't work because they're unicast.
Allow hostapd to run on a radio that has been moved to a different
namespace in hw.conf so we don't have to work around these issues.
The --start option was directly passed to the kernel init parameter,
preventing any environment setup from happening.
Intead always use 'run-tests' as the init process but detect --start
and execute that binary/script once inside the environment.
The table header needs to be adjusted to include spaces between
columns.
The 'adapter list' command was also updated to shorted a few
columns which were quite long for the data that is actually displayed.
The existing color code escape sequences required the user to set the
color, write the string, then unset with COLOR_OFF. Instead the macros
can be made to take the string itself and automatically terminate the
color with COLOR_OFF. This makes for much more concise strings.
ad-hoc, ap, and wsc all had descriptions longer than the max width but
this is now taken care of automatically. Remove the tab and newline's
from the description.
The WSC pin command description was also changed to be more accurate
since the pin does not need to be 8 digits.
There was no easy to use API for printing the contents of a table, and
was left up to the caller to handle manually. This adds display_table_row
which makes displaying tables much easier, including automatic support
for line truncation and continuation on the next line.
Lines which are too long will be truncated and displayed on the next
line while also taking into account any colored output. This works with any
number of columns.
This removes the need for the module to play games with encoding newlines
and tabs to make the output look nice.
As a start, this functionality was added to the command display.
This fixes a crash associated with toggling the iftype to AP mode
then calling GetDiagnostics. The diagnostic interface is never
cleaned up when netdev goes down so DBus calls can still be made
which ends up crashing since the AP interface objects are no longer
valid.
Running the following iwctl commands in a script (once or twice)
triggers this crash reliably:
iwctl device wlp2s0 set-property Mode ap
iwctl device wlp2s0 set-property Mode station
iwctl device wlp2s0 set-property Mode ap
iwctl ap wlp2s0 start myssid secret123
iwctl ap wlp2s0 show
++++++++ backtrace ++++++++
0 0x7f8f1a8fe320 in /lib64/libc.so.6
1 0x451f35 in ap_dbus_get_diagnostics() at src/ap.c:4043
2 0x4cdf5a in _dbus_object_tree_dispatch() at ell/dbus-service.c:1815
3 0x4bffc7 in message_read_handler() at ell/dbus.c:285
4 0x4b5d7b in io_callback() at ell/io.c:120
5 0x4b489b in l_main_iterate() at ell/main.c:476
6 0x4b49a6 in l_main_run() at ell/main.c:519
7 0x4b4cd9 in l_main_run_with_signal() at ell/main.c:645
8 0x404f5b in main() at src/main.c:600
9 0x7f8f1a8e8b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
The generic proxy property display was limited to a width for names/value
which makes the output look nice and uniform, but will cut off any values
which are longer than this limit.
This patch adds some logic to detect this, and continue displaying the
value on the next line.
The width arguments were also updated to be unsigned, which allows checking
the length without a cast.
testAgent had a few tests which weren't reliable, and one was not
actually testing anything, or at least not what the name implied it
should be testing.
The first issue was using iwctl in the first place. There is not a
reliable way to know when iwctl has registered its agent so relying on
that with a sleep, or waiting for the service to become available isn't
100% fool proof. To fix this use the updated PSKAgent which allows
multiple to be registered. This ensures the agent is ready for requests.
This test was also renamed to be consistent with what its actually
testing: that IWD uses the first agent registered.
This removes test_connection_with_other_agent as well because this test
case is covered by the client test itself. There is no need to re-test
iwctl's agent functionality here.
By creating a new bus connection for each agent we can register multiple
with IWD. This did mean the agent interface needs to be unique for each
agent (removing _agent_manager_if) as well as tracking multiple agents
in a list.
IWD uses a few pragmas to ignore warnings which clang does
not support. For -Werror builds these cause build failures
but can be fixed by ignoring unknown warnings and pragmas.
The dbus proxy code assumes that every interface has a set of
properties registered in a 'proxy_interface_property' structure,
assuming the interface has any properties at all. If the interface
is assumed to have no properties (and no property table) but
actually does, the property table lookup fails but is assumed to
have succeeded and causes a crash.
This caused iwctl to crash after some properties were added to DPP
since the DPP interface previously had no properties.
Now, check that the property table was valid before accessing it. This
should allow properties to be added to new interfaces without crashing
older versions of iwctl.
About a month ago hostapd was changed to set the secure bit on
eapol frames during rekeys (bc36991791). The spec is ambiguous
about this and has conflicting info depending on the sections you
read (12.7.2 vs 12.7.6). According to the hostapd commit log TGme
is trying to clarify this and wants to set secure=1 in the case
of rekeys. Because of this, IWD is completely broken with rekeys
since its disallows secure=1 on PTK 1/4 and 2/4.
Now, a bool is passed to the verify functions which signifies if
the PTK has been negotiated already. If secure differs from this
the key frame is not verified.
The test here is verifying that a DBus Connect() call will still
work with 'other' agents registered. In this case it uses iwctl to
set a password, then call Connect() manually.
The problem here is that we have no way of knowing when iwctl fully
starts and registers its agent. There was a sleep in there but that
is unreliable and we occationally were still getting past that without
iwctl having started fully.
To fix this properly we need to wait for iwctl's agent service to appear
on the bus. Since the bus name is unknown we must first find all names,
then cross reference their PID's against the iwctl PID. This is done
using ListNames, and GetConnectionUnixProcessID APIs.
The rekey/reauth logic was broken in a few different ways.
For rekeys the event list was not being reset so any past 4-way
handshake would allow the call to pass. This actually removes
the need for the sleep in the extended key ID test because the
actual handshake event is waited for correctly.
For both rekeys and reauths, just waiting for the EAP/handshake
events was not enough. Without checking if the client got
disconnected we essentially allow a full disconnect and reconnect,
meaning the rekey/reauth failed.
Now a 'disallow' array can be passed to wait_for_event which will
throw an exception if any events in that array are encountered
while waiting for the target event.
Yet another weird UML quirk. The intent of this tests was to ensure
the profile gets encrypted, and to check this both the mtime and
contents of the profile were checked.
But under UML the profile is copied, IWD started, and the profile
is encrypted all without any time passing. The (same) mtime was
then updated without any changes which fails the mtime check.
This puts a sleep after copying the profile to ensure the system
time differs once IWD encrypts the profile.
In UML if any process dies while test-runner is waiting for the DBus
service or some socket to be available it will block forever. This
is due to the way the non_block_wait works.
Its not optimal but it essentially polls over some input function
until the conditions are met. And, depending on the input function,
this can cause UML to hang since it never has a chance to go idle
and advance the time clock.
This can be fixed, at least for services/sockets, by sleeping in
the input function allowing time to pass. This will then allow
test-runner to bail out with an exception.
This patch adds a new wait_for_service function which handles this
automatically, and wait_for_socket was refactored to behave
similarly.
This function was checking if the process object exists, which can
persist long after a process is killed, or dies unexpectedly. Check
that the actual PID exists by sending signal 0.
The man pages (iwd.network) have a section about how to name provisioning
files containing non-alphanumeric characters but not everyone reads the
entire man page.
Warning them that the provisioning file was not read and pointing to
'man iwd.network' should lead someone in the right direction.
EAP-Success might come in with an identifier that is incremented by 1
from the last Response packet. Since identifier field is a byte, the
value might overflow (from 255 -> 0.) This overflow isn't handled
properly resulting in EAP-Success/Failure packets with a 0 identifier
due to overflow being erroneously ignored. Fix that.
test_decryption_failure is quite simple and only verifies that a known
network exists after starting. This causes the test to end before IWD can
fully start up leaving the DBus utilities in limbo having not fully
initialized.
Then, on the next test, stale InterfaceAdded signals arrive (for Station
and P2P) which throw exceptions when trying to get the bus (since IWD is
long gone). In addition the next IWD instance has started so any paths
included in the InterfaceAdded signals are bogus and cause additional
exceptions.
At the end of this test we can call list_devices() which will wait for
the InterfaceAdded signal, and cleanly exit afterwards.
An earlier commit fixed several options but ended up breaking others. The
result_parent/monitor_parent options are hidden from the user and only meant
to be passed to the kernel but they relied on the fact that the underscore
was present, not a dash. This updates the argument to use a dash:
--result-parent
--monitor-parent
Fixes: 00e41eb0ff ("test-runner: Fix parsing for some arguments")
The new regex match update was actually matching way more than it should
have due to how python's 'match' API works. 'match' will return successfully
if zero or more characters match from the beginning of the string. In this
case we actually need the entire regex to match otherwise we start matching
all prefixes, for example:
"--verbose iwd" will match iwd, iwd-dhcp, iwd-acd, iwd-genl and iwd-tls.
Instead use re.fullmatch which requires the entire string to match the
regex.
This was copy pasted from the autoconnect test, and depending on
how the python module cache is ordered can incorrectly use the
wrong test class. This should nothappen because we insert
the paths to the head of the list but for consistency the class
should be named something that reflects what the test is doing.
Enabling this ends up dumping so much logging and, at least with namespaces,
seems to break the logger module and cause really weird behavior, worst of
which is that all processes start dumping to stdout.
This can still be enabled explicitly with --verbose iwd-rtnl, but is turned
off by default when --log is used.
Add a fake resolvconf executable to verify that the right nameserver
addresses were actually committed by iwd. Again use unique nameserver
addresses to reduce the possibility that the test succeeds by pure luck.
In static_test.py add IPv6. Add comments on what we're actually testing
since it wasn't very clear. After the expected ACD conflict detection,
succeed if either the lost address was removed or the client disconnected
from the AP since this seems like a correct action for netconfig to
implement.
In iwd.py make sure all the static methods that touch IWD storage take the
storage_dir parameter instead of hardcoding IWD_STORAGE_DIR, and make
sure that parameter is actually used.
Create the directory if it doesn't exist before copying files into it.
This fixes a problem in testNetconfig where
`IWD.copy_to_storage('ssidTKIP.psk', '/tmp/storage')`
would result in /tmp/storage being created as a file, rather than a
directory containing a file, and resulting in IWD failing to start with:
`Failed to create /tmp/storage`
runner.py creates /tmp/iwd but that doesn't account for IWD sessions
with a custom storage dir path.
Extend test_ip_address_match to support IPv6 and to test the
netmask/prefix length while it reads the local address since those are
retrieved using the same API.
Modify testNetconfig to validate the prefix lengths, change the prefix
lengths to be less common values (not 24 bits for IPv4 or 64 for IPv6),
minor cleanup.
Currently the parameter values reach run-tests by first being parsed by
runner.py's RunnerArgParser, then the resulting object members being
encoded as a commandline string, then as environment variables, then the
environment being converted to a python string list and passed to
RunnerCoreArgParser again. Where argument names (like --sub-tests) had
dashes, the object members had underscores (.sub_tests), this wasn't
taken into account when building the python string list from environment
variables so convert all underscores to dashes and hope that all the
names match now.
Additionally some arguments used nargs='1' or nargs='*' which resulted
in their python values becoming lists. They were converted back to command
line arguments such as: --sub_tests ['static_test.py'], and when parsed
by RunnerCoreArgParser again, the values ended up being lists of lists.
In all three cases it seems the actual user of the parsed value actually
expects a single string with comma-separated substrings in it so just drop
the nargs= uses.
Most users of storage_network_open don't log errors when the function
returns a NULL and fall back to defaults (empty l_settings).
storage_network_open() itself only logs errors if the flie is encrypted.
Now also log an error when l_settings_load_from_file() fails to help track
down potential syntax errors.
Drop the wrong negation in the error check. Check that there are no extra
characters after prefix length suffix. Reset errno 0 before the strtoul
call, as recommended by the manpage.
This is actually a false positive only because
p2p_device_validate_conn_wfd bails out if the IE is NULL which
avoids using wfd_data_length. But its subtle and without inspecting
the code it does seem like the length could be used uninitialized.
src/p2p.c:940:7: error: variable 'wfd_data_len' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
if (dev->conn_own_wfd)
^~~~~~~~~~~~~~~~~
src/p2p.c:946:8: note: uninitialized use occurs here
wfd_data_len))
^~~~~~~~~~~~
src/p2p.c:940:3: note: remove the 'if' if its condition is always true
if (dev->conn_own_wfd)
^~~~~~~~~~~~~~~~~~~~~~
src/p2p.c:906:23: note: initialize the variable 'wfd_data_len' to silence this warning
ssize_t wfd_data_len;
^
= 0
Though the documentation for NLMSG_OK uses an int type for the length
the actual check is based on nlmsghdr->nlmsg_len which is a 32 bit
unsigned integer. Clang was complaining about one call in nlmon.c
because nlmsg_len was int type. Every other usage in nlmon.c uses
a uint32_t, so use that both for consistency and to fix the warning.
monitor/nlmon.c:7998:29: error: comparison of integers of different
signs: '__u32' (aka 'unsigned int') and 'int'
[-Werror,-Wsign-compare]
for (nlmsg = iov.iov_base; NLMSG_OK(nlmsg, nlmsg_len);
^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/linux/netlink.h💯24: note: expanded from macro 'NLMSG_OK'
(nlh)->nlmsg_len <= (len))
On musl-gcc the compiler is giving a warning for igtk_key_index
and gtk_key_index being used uninitialized. This isn't possible
since they are only used if gtk/igtk are non-NULL so pragma to
ignore the warning.
src/fils.c: In function 'fils_rx_associate':
src/fils.c:580:17: error: 'igtk_key_index' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
580 | handshake_state_install_igtk(fils->hs,
igtk_key_index,igtk + 6,
igtk_len - 6, igtk);
(same error for gtk_key_index)
Since commit 922fa099721903b106a7bc1ccd1ffe8c4a7bce69 in hostap, our
setting of config_methods on P2P-client interface was ignored. Work
around that commit, in addition to the previous workaround we have in
this test, to again ensure the correct config_methods value is used.
This was lazily copied from UML but really made no sense in the context
of QEMU. First QEMU needs the virtfs option to define the mount tag and
in addition a 9p mount should be used rather than 'hostfs'.
The glob match was completely broken for --verbose because globs
are actually path matches, not generally for strings. Instead
match based on regular expressions.
First the verbose option was fixed to store it as an array as well
as write any list arguments into the kernel command line properly
(str() would include []). This has worked up until now because the
'in' keyword in python will work on strings just as well
as lists, for example:
>>> 'test' in 'this,is,a,test'
True
Then, the glob match was replaced with a regex match. Any exceptions
are caught and somewhat ignored (printed, but only seen with --debug).
This only guards against fatal exceptions from a user passing an
invalid expression.
For network configuration files the man pages (iwd.network) state
that [General].{AlwaysRandomizeAddress,AddressOverride} are only
used if main.conf has [General].AddressRandomization=network.
This actually was not being enforced and both iwd.network settings
were still taken into account regardless of what AddressRandomization
was set to (even disabled).
The handshake setup code now checks the AddressRandomization value
and if anything other than 'network' skips the randomization.
This bit of code was throwing exceptions if a test cleaned up files that
test-runner was expecting to clean up. Specifically testHotspot swaps out
main.conf and PSK files many times. This led to the exception being thrown,
caught, and ignored but further on test-runner would print:
"File _X_ not cleaned up!"
Now the files will be checked if they exist before trying to remove it.
There were a few places in dpp/dpp-util which passed a single byte but
was being read in with va_arg(va, size_t). On some architectures this was
causing failures presumably from the compiler using an integer type
smaller than size_t. As we do elsewhere, cast to size_t to force the
compiler to pass a properly sized iteger to va_arg.
Similarly to ofono/phonesim allow tests to be skipped if wpa_supplicant
is not found on the system.
This required some changes to DPP/P2P where Wpas() should be called first
since this can now throw a SkipTest exception.
The Wpas class was also made to allow __del__ to be called without
throwing additional exceptions in case wpa_supplicant was not found.
This allows the EAP tests to pass, but the fix really needs to be in
hostapd itself. Hostapd currently tries to lookup the EAP session
immediately after receiving EAPOL_REAUTH. This uses the identity
it has stored which, in the case of PEAP/TTLS, will always be a phase2
identity. During this initial lookup hostapd hard codes the identity
to be phase1 which is not true for PEAP/TTLS, and the lookup fails.
The current way this was being done was to import collections and
use collections.Mapping. This has been deprecated since python 3.3
but has worked up until python 3.10. After python 3.10 this will
no longer work, and Mapping must be imported from collections.abc.
This was passing IFNAME= along with EAPOL_REAUTH which does not work
in the context of a hostapd socket where the iface is already implied.
This fixes that issue as well as resets the events array and actually
waits for the required events afterwards.
After one of the eap-tls-common-based methods succeeds keep the TLS
tunnel instance until the method is freed, rather than free it the
moment the method succeeds. This fixes repeated method runs where until
now each next run would attempt to create a new TLS tunnel instance
but would have no authentication data (CA certificate, client
certificate, private key and private key passphrase) since those are
were by the old l_tls object from the moment of the l_tls_set_auth_data()
call.
Use l_tls_reset() to reset the TLS state after method success, followed
by a new l_tls_start() when the reauthentication starts.
The signal agent notifications were changed which breaks this test.
Specifically commit ce227e7b94 sends a notification when connected
which breaks the 'agent.calls' check. Since this check is done both
after connecting and once already connected the initial value may
be 1 or 0. Because of this that check was removed entirely.
This test was just piping the PSK files into /tmp/iwd/ssidCCMP.psk
which is a bit fragile if the storage dir was ever to change. Instead
use copy_to_storage and the 'name' keyword to copy the file.
If the user specifies the same parent directory for several outfiles
skip mounting since it already exists. For example:
--monitor /outfiles/monitor.txt --result /outfiles/result.txt
Inside the virtual environments /tmp is mounted as its own FS and not
taken from the host. This poses issues if any output files are directly
under /tmp since test-runner tries to mount the parent directory (/tmp).
The can be fixed by ensuring these output files are either not under
/tmp or at least one folder down the tree (e.g. /tmp/outputs/outfile.txt).
Now this requirement is enforced and test-runner will not start if any
output files parent directory is /tmp.
Usually the test home directory is a git repo somewhere e.g. under
/home. But if the home directory is located under /tmp this poses
a problem since UML remounts /tmp. To handle both cases mount
the home directory explicity.
Certain aspects of QEMU like mounting host directories may still require
root access but for UML this is not the case. To handle both cases first
check if SUDO_UID/GID are set and use those to obtain the actual users
ID's. Otherwise if running as non-root use the UID/GID of the user
directly.
A user reported that IWD was failing to FT in some cases and this was
due to the AP setting the Retry bit in the frame type. This was
unexpected by IWD since it directly checks the frame type against
0x00b0 which does not account for any B8-B15 bits being set.
IWD doesn't need to verify the frame type field for a few reasons:
First mpdu_validate checks the management frame type, Second the kernel
checks prior to forwarding the event. Because of this the check was
removed completely.
Reported-By: Michael Johnson <mjohnson459@gmail.com>
station_signal_agent_notify() has been refactored so that its usage is
simpler. station_rssi_level_changed() has been replaced by an inlined
call to station_signal_agent_notify().
The call to netdev_rssi_level_init() in netdev_connect_common() is
currently a no-op, because netdev->connected has not yet been set at
this stage of the connection attempt. Because netdev_rssi_level_init()
is only used twice, it's been replaced by two inlined calls to
netdev_set_rssi_level_idx().
The SignalLevelAgent API is currently broken by the system bus's
security policy, which blocks iwd's outgoing method call messages. This
patch punches a hole for method calls on the
net.connman.iwd.SignalLevelAgent interface.
There may be situations where DNS information should not be set (for
example in auto-tests where the resolver daemon is not running) or if a
user wishes to ignore DNS information obtained.
Allows granularly specifying the DHCP logging level. This allows the
user to tailor the output to what they need. By default, always display
info, errors and warnings to match the rest of iwd.
Setting `IWD_DHCP_DEBUG` to "debug", "info", "warn", "error" will limit
the logging to that level or higher allowing the default logging
verbosity to be reduced.
Setting `IWD_DHCP_DEBUG` to "1" as per the current behavior will
continue to enable debug level logging.
Use scapy library which allows one to easily construct and fudge various
network packets. This makes constructing spoofed packets much easier
and more readable compared to hex-encoded, hand-crafted frames.
The TA/BSSID addresses of spoofed disassociate frames were set
incorrectly. They should be using the 02:00:00:XX:XX:XX address, but
instead were being converted over to 42:00:00:XX:XX:XX address
After the initial handshake, once the TK has been installed, all frames
coming to the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to various attacks.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted EAPoL frames
received after the initial handshake has been completed.
After the initial handshake, once the TK has been installed, all frames
coming from the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to a denial-of-service attack
where receipt of an invalid, unencrypted EAP-Failure frame generated by
an adversary results in iwd terminating an ongoing connection.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted EAP frames
received after the initial handshake has been completed.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
After the initial handshake, once the TK has been installed, all frames
coming from the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to a denial-of-service attack
where receipt of an invalid, unencrypted EAPoL 1/4 frame generated by an
adversary results in iwd terminating an ongoing connection.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted PTK 1/4 frames
received after the initial handshake has been completed.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
Do not fail an ongoing handshake when an invalid EAPoL frame is
received. Instead, follow the intent of 802.11-2020 section 12.7.2:
"EAPOL-Key frames containing invalid field values shall be silently
discarded."
This prevents a denial-of-service attack where receipt of an invalid,
unencrypted EAPoL 1/4 frame generated by an adversary results in iwd
terminating an ongoing connection.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
Periodic scan requests are meant to be performed with a lower priority
than normal scan requests. They're thus given a different priority when
inserting them into the wiphy work queue. Unfortunately, the priority
is not taken into account when they are inserted into the
sr->requests queue. This can result in the scanning code being confused
since it assumes the top of the queue is always the next scheduled or
currently ongoing scan. As a result any further wiphy_work might never be
started properly.
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 3
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_next() Starting work item 3
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/scan.c:scan_periodic_timeout() 1
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 4
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_done() Work item 3 done
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_next() Starting work item 5
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_request_triggered() Passive scan triggered for wdev 1
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_periodic_triggered() Periodic scan triggered for wdev 1
In the above log, scan request 5 (triggered by dbus) is started before
scan request 4 (periodic scan). Yet the scanning code thinks scan
request 4 was triggered.
Fix this by using the wiphy_work priority to sort the sr->requests queue
so that the scans are ordered in the same manner.
Reported-by: Alvin Šipraga <ALSI@bang-olufsen.dk>
update_config=1 lets wpa_supplicant write config changes
to the config file. In the real world this is what you want
so your DPP credentials are persistant. But for testing this
is not correct since multiple tests use the same config file
and expect it to be pristine.
Occationally wpa_supplicant was connecting to the AP without
running DPP because the config already had the network
credentials.
The upstream code immediately retransmitted any no-ACK frames.
This would work in cases where the peer wasn't actively switching
channels (e.g. the ACK was simply lost) but caused unintended
behavior in the case of a channel switch when the peer was not
listening.
If either IWD or the peer needed to switch channels based on the
authenticate request the response may end up not getting ACKed
because the peer is idle, or in the middle of the hardware changing
channels. IWD would get no-ACK and immediately send the frame again
and most likely the same behavior would result. This would very
quickly increment frame_retry past the maximum and DPP would fail.
Instead when no ACK is received wait 1 second before sending out
the next frame. This can re-use the existing frame_pending buffer
and the same logic for re-transmitting.
There is a potential corner case of an offchannel frame callback
being called after ROC has ended.
This could happen in theory if a received frame is queued right as
the ROC session expires. If the ROC cancel event makes it to user
space before the frame IWD will schedule another ROC then receive
the frame. This doesn't prevent IWD from sending out another
frame since OFFCHANNEL_TX_OK is used, but it will prevent IWD from
receiving a response frame since no dwell duration is used with DPP.
To handle this an roc_started bool was added to the dpp_sm which
tracks the ROC state. If dpp_send_frame is called when roc_started
is false the frame will be saved and sent out once the ROC session
is started again.
ConnectHiddenNetwork creates a temporary network object and initiates a
connection with it. If the connection fails (due to an incorrect
passphrase or other reasons), then this temporary object is destroyed.
Delay its destruction until network_disconnected() since
network_connect_failed is called too early. Also, re-order the sequence
in station_reset_connection_state() in order to avoid using the network
object after it has been freed by network_disconnected().
Fixes: 85d9d6461f1f ("network: Hide hidden networks on connection error")
station_hide_network will remove and free the network object, so calling
network_close_settings will result in a crash. Make sure this is done
prior to network object's destruction.
Fixes: 85d9d6461f1f ("network: Hide hidden networks on connection error")
This only posed a problem oddly if the kernel binary was in the same
directory as test-runner. Resolving the absolute path with the
argument parser resolves the issue.
The TIOCSTTY ioctl was not shared between UML and QEMU which prevented
any console input from making it into UML. This fixes that, and now
ctrl-c can be used to stop UML test execution.
The MountInfo tuple was changed to explicitly take a source string. This
is redundant for UML and system mounts since the fstype/source are the same,
but it allows QEMU to specify the '9p' fstype and use MountInfo rather than
calling mount() explicitly.
This also moves logging cleanup into _prepare_mounts so both UML and QEMU
can use it.
Many processes are not long running (e.g. hostapd_cli, ip, iw, etc)
and the separators written to log files don't show up for these which
makes debugging difficult. This is even true for IWD/Hostapd for tests
with start_iwd=0.
After writing separators for long running processes write them out for
any additional log files too.
Way too many classes have a dependency on the TestContext class, in
most cases only for is_verbose. This patch removes the dependency from
Process and Namespace classes.
For Process, the test arguments can be parsed in the class itself which
will allow for this class to be completely isolated into its own file.
The Namespace class was already relatively isolated. Both were moved
into utils.py which makes 'run-tests' quite a bit nicer to look at and
more fitting to its name.
This commonizes some mounting code between QEMU and UML to allow exporting
of files to the host environment. UML does this with a hostfs mount while
QEMU still uses 9p.
The common code sanitizing the inputs has been put into _prepare_outfiles
and _prepare_mounts was modified to take an 'extra' arugment containing
additional mount points.
The results and monitor parent directories are now passed into the environment
via arguments, and these are hidden from the help text (in addition to testhome)
If --help or unknown options were supplied to test-runner python
would thrown a maximum recusion depth exception. This was due to
the way ArgumentParser was subclassed.
To fix this call ArgumentParser.__init__() rather than using the
super() method. And do this also for the RunnerCoreArgParse
subclass as well. In addition the namespace argument was removed
from parse_args since its not used, and instead supplied directly
to the parents parse_args method.
There is really no reason to have hwsim create interfaces automatically
for test-runner. test-runner already does this for wpa_supplicant and
hostapd, and IWD can create the interface itself.
If a user connection fails on a freshly scanned psk or open hidden
network, during passphrase request or after, it shall be removed from
the network list. Otherwise, it would be possible to directly connect
to that known network, which will appear as not hidden.
The test was rekeying in a loop which ends up confusing hostapd
depending on the timing of when it gets the REKEY command and any
responses from IWD. UML seemed to handle this fine but not QEMU.
Instead delay the rekey a bit to allow it to fully complete before
sending another.
Similarly to hostapd.wait_for_event, IWD's variant needed to act on
an IO watch because events were being received prior to even calling
wait_for_event.
With how fast UML is hostapd events were being sent out prior to
ever calling wait_for_event. Instead set an IO watch on the control
socket and cache all events as they come. Then, when wait_for_event
is called, it can reference this list. If the event is found any
older events are purged from the list.
The AP-ENABLED event needed a special case because hostapd gets
started before the IO watch can be registered. To fix this an
enabled property was added which queries the state directly. This
is checked first, and if not enabled wait_for_event continues normally.
This removes prints which were never supposed to make it upstream as
well as changes sleep() to wd.wait() as well as increase the wait
period to fix issues with how fast UML runs the tests.
This allows the callers condition to be checked immediately without
the mainloop running. In addition may_block=True allows the mainloop
to poll/sleep rather than immediately return back to the caller. This
handles async IO much better than may_block=False, at least for our
use-case.
Namespace process logs were appearing under 'ip' (and also overwriting
actual 'ip' logs) since they were executed with 'ip netns exec <namespace>'.
Instead special case this and append '-<namespace>' to the log file name.
In addition processes executed prior to any tests were being put under
a folder (name of testhome directory). Now this case is detected and these
logs are put at the top level log directory.
This allows test-runner to run inside a UML binary which has some
advantages, specifically time-travel/infinite CPU speed. This should
fix any scheduler related failures we have on slower systems.
Currently this runner does not suppor the same features as the Qemu
runner, specifically:
- No hardware passthrough
- No logging/monitor (UML -> host mounting isn't implemented yet)
In order to keep all test-runner dev scripts working and to work with
the new runner.py system some file renaming was required.
test-runner was renamed to run-tests
A new test-runner was added which only creates the Runner() class.
This (as well as subsequent commits) will separate test-runner into two
parts:
1. Environment setup
2. Running tests
Spurred by interest in adding UML/host support, test-runner was in need
of a refactor to separate out the environment setup and actually running
the tests.
The environment (currently only Qemu) requires quite a bit of special
handling (ctypes mounting/reboot, 9p mounts, tons of kernel options etc)
which nobody writing tests should need to see or care about. This has all
been moved into 'runner.py'.
Running the tests (inside test-runner) won't change much.
The new 'runner.py' module adds an abstraction class which allows different
Runner's to be implemented, and setup their own environment as they see
fit. This is in preparation for UML and Host runners.
Any test using assertTrue(hostapd.list_sta()) improperly has been
replaced with wait_for_event(). There were a few places where this
was actually ok (i.e. IWD is already connected) but most needed to
be changed since the check was just after IWD connected and hostapd's
list_sta() API may not return a fully updated list.
- Setting the IP address was resulting in an error:
Error: any valid prefix is expected rather than "wln58".
This is fixed by reordering the arguments with the IP address first
- Remove the sleep, and use non_block_wait to wait for the IPv6 address
to be set.
Before setting the address, wait for the interface to go down. This
fixes somewhat rare cases where setting the address returns -EBUSY
and ultimately breaks the neighbor reports.
All tests which could avoid calling scan() directly have been
changed to use the 'full_scan' argument to get_ordered_network.
This was done because of unreliable scanning behavior on slower
systems, like VMs. If we get unlucky with the scheduler some beacons
are not received in time and in turn scan results are missing.
Using full_scan=True works around this issue by repeatedly scanning
until the SSID is found.
It looks like some architectures defconfig were adding these in
automatically, but not others. Explicitly add these to make sure
the kernel is built correctly.
Base the root user check on os.getuid() instead of SUDO_GID so as not to
implicitly require sudo. SUDO_GID being set doesn't guarantee that the
effective user is root either since you can sudo to non-root accounts.
We check that config is not None but then access config.ctx outside of
that if block anyway. Then we do the same for config.ctx and
config.ctx.args. Nest the if blocks for the checks to be useful.
p2p_peer_update_existing may be called with a scan_bss struct built from
a Probe Request frame so it can't access bss->p2p_probe_resp_info even
if peer->bss was built from a Probe Response. Check the source frame
type of the scan_bss struct before updating the Device Address.
This fixes one timing issue that would make the autotest fail often.
Since l_malloc is used the frame contents are not zero'ed automatically
which could result in random bytes being present in the frame which were
expected to be zero. This poses a problem when calculating the MIC as the
crypto operations are done on the entire frame with the expectation of
the MIC being zero.
Fixes: 83212f9b23d5 ("eapol: change eapol_create_common to support FILS")
explicit_bzero is used in src/storage.c since commit
01cd8587606bf2da1af245163150589834126c1c but src/missing.h is not
included, as a result build with uclibc fails on:
/home/buildroot/autobuild/instance-0/output-1/host/lib/gcc/powerpc-buildroot-linux-uclibc/10.3.0/../../../../powerpc-buildroot-linux-uclibc/bin/ld: src/storage.o: in function `storage_init':
storage.c:(.text+0x13a4): undefined reference to `explicit_bzero'
Fixes:
- http://autobuild.buildroot.org/results/2aff8d3d7c33c95e2c57f7c8a71e69939f0580a1
When configuring wpa_supplicant all we care about is that it
received the configuration object. wpa_supplicant takes quite a bit
of time to connect in some cases so waiting for that is unneeded.
This also increases the DPP timeout which may be required on slower
systems or if the timing is particularly unlucky when receiving
frames.
This is used to hold the current BSS frequency which will be
used after IWD receives a presence announcement. Since this was
not being set, the logic was always thinking there was a channel
mismatch (0 != current_freq) and attempting to go offchannel to
'0' which resulted in -EINVAL, and ultimately protocol termination.
Change a few critical checks that were failing sometimes:
- A few asserts were changed to wait_for_object_condition
- A 15 second timeout was removed (default used instead)
- Do a full scan at beginning of each test to clear any
cached BSS's. The second test run was getting stale results
and the RSSI values were not expected.
This was not being properly honored when existing networks were
already populated. This poses an issue for any test which uses
full_scan after setting radio values such as signal strength.
For quite a while test-runner has run into frequent OOM exceptions when
running many tests in a row. Its not completely known exactly why, but
seems to point to the 9p driver which is used for sharing the root fs
between the test-runner VM and the host.
With debugging enabled (-d) one can see the available memory available
relatively stable. If a test fails it may spike ~3-4kb but this quickly
recovers as python garbage collects.
At some point the kernel faults failing to allocate which (usually) is
shown by a python OOM exception. At this point there is plenty of
available memory.
Dumping the kernel trace its seen that the 9p driver is involved:
[ 248.962949] test-runner: page allocation failure: order:7, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[ 248.962958] CPU: 2 PID: 477 Comm: test-runner Not tainted 5.16.0 #91
[ 248.962960] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
[ 248.962961] Call Trace:
[ 248.962964] <TASK>
[ 248.962965] dump_stack_lvl+0x34/0x44
[ 248.962971] warn_alloc.cold+0x78/0xdc
[ 248.962975] ? __alloc_pages_direct_compact+0x14c/0x1e0
[ 248.962979] __alloc_pages_slowpath.constprop.0+0xbfe/0xc60
[ 248.962982] __alloc_pages+0x2d5/0x2f0
[ 248.962984] kmalloc_order+0x23/0x80
[ 248.962988] kmalloc_order_trace+0x14/0x80
[ 248.962990] v9fs_alloc_rdir_buf.isra.0+0x1f/0x30
[ 248.962994] v9fs_dir_readdir+0x51/0x1d0
[ 248.962996] ? __handle_mm_fault+0x6e0/0xb40
[ 248.962999] ? inode_security+0x1d/0x50
[ 248.963009] ? selinux_file_permission+0xff/0x140
[ 248.963011] iterate_dir+0x16f/0x1c0
[ 248.963014] __x64_sys_getdents64+0x7b/0x120
[ 248.963016] ? compat_fillonedir+0x150/0x150
[ 248.963019] do_syscall_64+0x3b/0x90
[ 248.963021] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 248.963024] RIP: 0033:0x7fedd7c6d8c7
[ 248.963026] Code: 00 00 0f 05 eb b7 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 81 a5 0f 00 f7 d8 64 89 02 48
[ 248.963028] RSP: 002b:00007ffd06cd87e8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[ 248.963031] RAX: ffffffffffffffda RBX: 000056090d87dd20 RCX: 00007fedd7c6d8c7
[ 248.963032] RDX: 0000000000080000 RSI: 000056090d87dd50 RDI: 000000000000000f
[ 248.963033] RBP: 000056090d87dd50 R08: 0000000000000030 R09: 00007fedc7d37af0
[ 248.963035] R10: 00007fedc7d7d730 R11: 0000000000000293 R12: ffffffffffffff88
[ 248.963038] R13: 000056090d87dd24 R14: 0000000000000000 R15: 000056090d0485e8
Here its seen an allocation of 512k is being requested (order:7), but faults.
In this run it there was ~35MB of available memory on the system.
Available Memory: 35268 kB
Last Test Delta: -2624 kB
Per-test Usage:
[ 0] ** 37016
[ 1] ********* 41584
[ 2] * 36280
[ 3] ********* 41452
[ 4] ******** 40940
[ 5] ****** 39284
[ 6] **** 38348
[ 7] *** 37496
[ 8] **** 37892
[ 9] 35268
This can be reproduced by running all autotests (changing the ram down to
~128MB helps trigger it faster):
./tools/test-runner -k <kernel> -d
After many attempts to fix this it was finally found that simply removing the
explicit 9p2000.u version from the kernel command line 'fixed' the problem.
This even allows decreasing the RAM down to 256MB from 384MB and so far no
OOM's have been seen.
In debug mode the test context is printed before each test. This
adds some additional information in there:
Available Memory: /proc/meminfo: MemAvailable
Last Test Delta: Change in usage between current and last test
Per-test Usage: Graph of usage relative to all past tests. This is
useful for seeing a trend down/up of usage.
This could fail and was not being checked. It was minimally changed to
take the ifindex directly (this was the only thing needed from the ethdev)
which allows checking prior to initializing the ethdev.
Running the tests inside a VM makes it difficult for the host to figure
out if the test actually failed or succeeded. For a human its easy to
read the results table, but for an automated system parsing this would
be fragile. This adds a new option --result <file> which writes PASS/FAIL
to the provided file once all tests are completed. Any failures results in
'FAIL' being written to the file.
\e[1;30m is bold black, often, but not always displayed bright black or
bold bright black. In case it is displayed as real black it is
invisible. \e[1;90m is explicit bold bright black.
\e[37m is white, therefore it is not suitable to be labeled as GREY,
which is \e[90m
The logic here assumed any BSS's in the roam scan were identical to
ones in station's bss_list with the same address. Usually this is true
but, for example, if the BSS changed frequency the one in station's
list is invalid.
Instead when a match is found remove the old BSS and re-insert the new
one.
With the addition of 6GHz '6000' is no longer the maximum frequency
that could be in .known_network.freq. For more robustness
band_freq_to_channel is used to validate the frequency.
Scanning while in AP mode is somewhat of an edge case, but it does
have some usefulness specifically with onboarding new devices, i.e.
a new device starts an AP, a station connects and provides the new
device with network credentials, the new device switches to station
mode and connects to the desired network.
In addition this could be used later for ACS (though this is a bit
overkill for IWD's access point needs).
Since AP performance is basically non-existant while scanning this
feature is meant to be used in a limited scope.
Two DBus API's were added which mirror the station interface: Scan and
GetOrderedNetworks.
Scan is no different than the station variant, and will perform an active
scan on all channels.
GetOrderedNetworks diverges from station and simply returns an array of
dictionaries containing basic information about networks:
{
Name: <ssid>
SignalStrength: <mBm>
Security: <psk, open, or 8021x>
}
Limitations:
- Hidden networks are not supported. This isn't really possible since
the SSID's are unknown from the AP perspective.
- Sharing scan results with station is not supported. This would be a
convenient improvement in the future with respect to onboarding new
devices. The scan could be performed in AP mode, then switch to
station and connect immediately without needing to rescan. A quick
hack could better this situation by not flushing scan results in
station (if the kernel retains these on an iftype change).
This was already implemented in station but with no dependency on
that module at all. AP will need this for a scanning API so its
being moved into scan.c.
The 802.11ax standards adds some restrictions for the 6GHz band. In short
stations must use SAE, OWE, or 8021x on this band and frame protection is
required.
All uses of this macro will work with a bitwise comparison which is
needed for 6GHz checks and somewhat more flexible since it can be
used to compare RSN info, not only single AKM values.
This adds checks if MFP is set to 0 or 1:
0 - Always fail if the frequency is 6GHz
1 - Fail if MFPC=0 and the frequency is 6GHz.
If HW is capable set MFPR=1 for 6GHz
This is a new band defined in the WiFi 6E (ax) amendment. A completely
new value is needed due to channel reuse between 2.4/5 and 6GHz.
util.c needed minimal updating to prevent compile errors which will
be fixed later to actually handle this band. WSC also needed a case
added for 6GHz but the spec does not outline any RF Band value for
6GHz so the 5GHz value will be returned in this case.
sae.c was failing to build on some platforms:
error: implicit declaration of function 'reallocarray'; did you mean 'realloc'?
[-Werror=implicit-function-declaration]
In certain rare cases IWD gets a link down event before nl80211 ever sends
a disconnect event. Netdev notifies station of the link down which causes
station to be freed, but netdev remains in the same state. Then later the
disconnect event arrives and netdev still thinks its connected, calls into
(the now freed) station object and causes a crash.
To fix this netdev_connect_free() is now called on any link down events
which will reset the netdev object to a proper state.
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_deauthenticate_event()
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
src/resolve.c:resolve_systemd_revert() ifindex: 16
src/station.c:station_roam_state_clear() 16
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
Received Deauthentication event, reason: 3, from_ap: false
0 0x472fa4 in station_disconnect_event src/station.c:2916
1 0x472fa4 in station_netdev_event src/station.c:2954
2 0x43a262 in netdev_disconnect_event src/netdev.c:1213
3 0x43a262 in netdev_mlme_notify src/netdev.c:5471
4 0x6706eb in process_multicast ell/genl.c:1029
5 0x6706eb in received_data ell/genl.c:1096
6 0x65e630 in io_callback ell/io.c:120
7 0x65a94e in l_main_iterate ell/main.c:478
8 0x65b0b3 in l_main_run ell/main.c:525
9 0x65b0b3 in l_main_run ell/main.c:507
10 0x65b5cc in l_main_run_with_signal ell/main.c:647
11 0x4124d7 in main src/main.c:532
If an event is in response to some command which is returning an
unexpected value (unexpected with respect to wpas.py) handle_eow
would raise an exception.
Specifically with DPP this was being hit when the URI was being
returned.
The difference between the existing code is that IWD will send the
authentication request, making it the initiator.
This handles the use case where IWD is provided a peers URI containing
its bootstrapping key rather than IWD always providing its own URI.
A new DBus API was added, ConfigureEnrollee().
Using ConfigureEnrollee() IWD will act as a configurator but begin by
traversing a channel list (URI provided or default) and waiting for
presence announcements (with one caveat). When an announcement is
received IWD will send an authentication request to the peer, receive
its reply, and send an authentication confirm.
As with being a responder, IWD only supports configuration to the
currently connected BSS and will request the enrollee switch to this
BSS's frequency to preserve network performance.
The caveat here is that only one driver (ath9k) supports multicast frame
registration which prevents presence frame from being received. In this
case it will be required the the peer URI contains a MAC and channel
information. This is because IWD will jump right into sending auth
requests rather than waiting for a presence announcement.
The frame watch which covers the presence procedure (and most
frames for that matter) needs to support multicast frames for
presence to work. Doing this in frame-xchg seems like the right
choice but only ath9k supports multicast frame registration.
Because of this limited support DPP will register for these frames
manually.
Parses K (key), M (mac), C (class/channels), and V (version) tokens
into a new structure dpp_uri_info. H/I are not parsed since there
currently isn't any use for them.
This was caught by static analysis. As is common this should never
happen in the real world since the only way this can fail (apart from
extreme circumstances like OOM) is if the key size is incorrect, which
it will never be.
Static analysis flagged that 'path' was never being checked (which
should not ever be NULL) but during that review I noticed stat()
was being called, then fstat afterwards.
Adds a new wait argument which, if false, will call the DBus method
and return immediately. This allows the caller to create multiple
radios very quickly, simulating (as close as we can) a wifi card
with dual phy's which appear in the kernel simultaneously.
The name argument was also changed to be mandatory, which is now
required by hwsim.
Currently CreateRadio only allows a single outstanding DBus message
until the radio is fully created. 99% of the time this is just fine
but in order to test dual phy cards there needs to be support for
phy's appearing at the same time.
This required storing the pending DBus message inside the radio object
rather than a single static variable.
The code was refactored to handle the internal radio info objects better
for the various cases:
- Creation from CreateRadio()
- Radio already existed before hwsim started, or created externally
- Existing radio changed name, address, etc.
First, Name is now a required option to CreateRadio(). This allows
the radio info to be pushed to the queue immediately (also allowing the
pending DBus message to be tracked). Then, when the NEW_RADIO event
fires the pending radio can be looked up (by name) and filled with the
remaining info.
If the radio was not found by name but a matching ID was found this is
the 'changed' case and the radio is re-initialized with the changed
values.
If neither name or ID matches the radio was created externally, or
prior to hwsim starting. A radio info object is created at this time
and initialized.
The ID was changed to a signed integer in order to initialize it to an
invalid number -1. Doing this was required since a pending uninitalized
radio ID (0) could match an existing radio ID. This required some
bounds checks in case the kernels counter reaches an extremely high value.
This isn't likely to ever happen in practice.
This tool will decrypt an IWD network profile which was previously
encrypted using a systemd provided key. Either a text passphrase
can be provided (--pass) or a file containing the secret (--file).
This can be useful for debugging, or recovering an encrypted
profile after enabling SystemdEncrypt.
Recently systemd added the ability to pass secret credentials to
services via LoadCredentialEncrypted/SetCredentialEncrypted. Once
set up the service is able to read the decrypted credentials from
a file. The file path is found in the environment variable
CREDENTIALS_DIRECTORY + an identifier. The value of SystemdEncrypt
should be set to the systemd key ID used when the credential was
created.
When SystemdEncrypt is set IWD will attempt to read the decrypted
secret from systemd. If at any point this fails warnings will be
printed but IWD will continue normally. Its expected that any failures
will result in the inability to connect to any networks which have
previously encrypted the passphrase/PSK without re-entering
the passphrase manually. This could happen, for example, if the
systemd secret was changed.
Once the secret is read in it is set into storage to be used for
profile encryption/decryption.
Using storage_decrypt() hotspot can also support profile encyption.
The hotspot consortium name is used as the 'ssid' since this stays
consistent between hotspot networks for any profile.
Some users don't like the idea of storing network credentials in
plaintext on the file system. This patch implements an option to
encrypt such profiles using a secret key. The origin of the key can in
theory be anything, but would typically be provided by systemd via
'LoadEncryptedCredential' setting in the iwd unit file.
The encryption operates on the entire [Security] group as well as all
embedded groups. Once encrypted the [Security] group will be replaced
with two key/values:
EncryptedSalt - A random string of bytes used for the encryption
EncryptedSecurity - A string of bytes containing the encrypted
[Security] group, as well as all embedded groups.
After the profile has been encrypted these values should not be
modified. Note that any values added to [Security] after encryption
has no effect. Once the profile is encrypted there is no way to modify
[Security] without manually decrypting first, or just re-creating it
entirely which effectively treated a 'new' profile.
The encryption/decryption is done using AES-SIV with a salt value and
the network SSID as the IV.
Once a key is set any profiles opened will automatically be encrypted
and re-written to disk. Modules using network_storage_open will be
provided the decrypted profile, and will be unaware it was ever
encrypted in the first place. Similarly when network_storage_sync is
called the profile will by automatically encrypted and written to disk
without the caller needing to do anything special.
A few private storage.c helpers were added to serve several purposes:
storage_init/exit():
This sets/cleans up the encryption key direct from systemd then uses
extract and expand to create a new fixed length key to perform
encryption/decryption.
__storage_decrypt():
Low level API to decrypt an l_settings object using a previously set
key and the SSID/name for the network. This returns a 'changed' out
parameter signifying that the settings need to be encrypted and
re-written to disk. The purpose of exposing this is for a standalone
decryption tool which does not re-write any settings.
storage_decrypt():
Wrapper around __storage_decrypt() that handles re-writing a new
profile to disk. This was exposed in order to support hotspot profiles.
__storage_encrypt():
Encrypts an l_settings object and returns the full profile as data
This got merged without a few additional fixes, in particular an
over 80 character line and incorrect length check.
Fixes: d8116e8828b4 ("dpp-util: add dpp_point_from_asn1()")
When we detect a new phy being added, we schedule a filtered dump of
the newly detected WIPHY and associated INTERFACEs. This code path and
related processing of the dumps was mostly shared with the un-filtered
dump of all WIPHYs and INTERFACEs which is performed when iwd starts.
This normally worked fine as long as a single WIPHY was created at a
time. However, if multiphy new phys were detected in a short amount of
time, the logic would get confused and try to process phys that have not
been probed yet. This resulted in iwd trying to create devices or not
detecting devices properly.
Fix this by only processing the target WIPHY and related INTERFACEs
when the filtered dump is performed, and not any additional ones that
might still be pending.
While here, remove a misleading comment:
manager_wiphy_check_setup_done() would succeed only if iwd decided to
keep the default interfaces created by the kernel.
This simulates the conditions that trigger a free-after-use which was
fixed with:
2c355db7 ("scan: remove periodic scans from queue on abort")
This behavior can be reproduced reliably using this test with the above
patch reverted.
This debug print was before any checks which could bail out prior to
autoconnect starting. This was confusing because debug logs would
contain multiple "station_autoconnect_start()" prints making you think
autoconnect was started several times.
The periodic scan code was refactored to make normal scans and
periodic scans consistent by keeping both in the same queue. But
that change left out the abort path where periodic scans were not
actually removed from the queue.
This fixes a rare crash when a periodic scan has been triggered and
the device goes down. This path never removes the request from the
queue but still frees it. Then when the scan context is removed the
stale request is freed again.
0 0x4bb65b in scan_request_cancel src/scan.c:202
1 0x64313c in l_queue_clear ell/queue.c:107
2 0x643348 in l_queue_destroy ell/queue.c:82
3 0x4bbfb7 in scan_context_free src/scan.c:209
4 0x4c9a78 in scan_wdev_remove src/scan.c:2115
5 0x42fecd in netdev_free src/netdev.c:965
6 0x445827 in netdev_destroy src/netdev.c:6507
7 0x52beb9 in manager_config_notify src/manager.c:765
8 0x67084b in process_multicast ell/genl.c:1029
9 0x67084b in received_data ell/genl.c:1096
10 0x65e790 in io_callback ell/io.c:120
11 0x65aaae in l_main_iterate ell/main.c:478
12 0x65b213 in l_main_run ell/main.c:525
13 0x65b213 in l_main_run ell/main.c:507
14 0x65b72c in l_main_run_with_signal ell/main.c:647
15 0x4124e7 in main src/main.c:532
If netdev_connect_failed is called before netdev_get_oci_cb() the
netdev's handshake will be destroyed and ultimately crash when the
callback is called.
This patch moves the cancelation into netdev_connect_free rather than
netdev_free.
++++++++ backtrace ++++++++
0 0x7f4e1787d320 in /lib64/libc.so.6
1 0x42634c in handshake_state_set_chandef() at src/handshake.c:1057
2 0x40a11b in netdev_get_oci_cb() at src/netdev.c:2387
3 0x483d7b in process_unicast() at ell/genl.c:986
4 0x480d3c in io_callback() at ell/io.c:120
5 0x48004d in l_main_iterate() at ell/main.c:472 (discriminator 2)
6 0x4800fc in l_main_run() at ell/main.c:521
7 0x48032c in l_main_run_with_signal() at ell/main.c:649
8 0x403e95 in main() at src/main.c:532
9 0x7f4e17867b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
Commit 4d2176df2985 ("handshake: Allow event handler to free handshake")
introduced a re-entrancy guard so that handshake_state objects that are
destroyed as a result of the event do not cause a crash. It rightly
used a temporary object to store the passed in handshake. Unfortunately
this caused variable shadowing which resulted in crashes fixed by commit
d22b174a7318 ("handshake: use _hs directly in handshake_event").
However, since the temporary was no longer used, this fix itself caused
a crash:
#0 0x00005555f0ba8b3d in eapol_handle_ptk_1_of_4 (sm=sm@entry=0x5555f2b4a920, ek=0x5555f2b62588, ek@entry=0x16, unencrypted=unencrypted@entry=false) at src/eapol.c:1236
1236 handshake_event(sm->handshake,
(gdb) bt
#0 0x00005555f0ba8b3d in eapol_handle_ptk_1_of_4 (sm=sm@entry=0x5555f2b4a920, ek=0x5555f2b62588, ek@entry=0x16, unencrypted=unencrypted@entry=false) at src/eapol.c:1236
#1 0x00005555f0bab118 in eapol_key_handle (unencrypted=<optimized out>, frame=<optimized out>, sm=0x5555f2b4a920) at src/eapol.c:2343
#2 eapol_rx_packet (proto=<optimized out>, from=<optimized out>, frame=<optimized out>, unencrypted=<optimized out>, user_data=0x5555f2b4a920) at src/eapol.c:2665
#3 0x00005555f0bac497 in __eapol_rx_packet (ifindex=62, src=src@entry=0x5555f2b62574 "x\212 J\207\267", proto=proto@entry=34958, frame=frame@entry=0x5555f2b62588 "\002\003",
len=len@entry=121, noencrypt=noencrypt@entry=false) at src/eapol.c:3017
#4 0x00005555f0b8c617 in netdev_control_port_frame_event (netdev=0x5555f2b64450, msg=0x5555f2b62588) at src/netdev.c:5574
#5 netdev_unicast_notify (msg=msg@entry=0x5555f2b619a0, user_data=<optimized out>) at src/netdev.c:5613
#6 0x00007f60084c9a51 in dispatch_unicast_watches (msg=0x5555f2b619a0, id=<optimized out>, genl=0x5555f2b3fc80) at ell/genl.c:954
#7 process_unicast (nlmsg=0x7fff61abeac0, genl=0x5555f2b3fc80) at ell/genl.c:973
#8 received_data (io=<optimized out>, user_data=0x5555f2b3fc80) at ell/genl.c:1098
#9 0x00007f60084c61bd in io_callback (fd=<optimized out>, events=1, user_data=0x5555f2b3fd20) at ell/io.c:120
#10 0x00007f60084c536d in l_main_iterate (timeout=<optimized out>) at ell/main.c:478
#11 0x00007f60084c543e in l_main_run () at ell/main.c:525
#12 l_main_run () at ell/main.c:507
#13 0x00007f60084c5670 in l_main_run_with_signal (callback=callback@entry=0x5555f0b89150 <signal_handler>, user_data=user_data@entry=0x0) at ell/main.c:647
#14 0x00005555f0b886a4 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:532
This happens when the driver does not support rekeying, which causes iwd to
attempt a disconnect and re-connect. The disconnect action is
taken during the event callback and destroys the underlying eapol state
machine. Since a temporary isn't used, attempting to dereference
sm->handshake results in a crash.
Fix this by introducing a UNIQUE_ID macro which should prevent shadowing
and using a temporary variable as originally intended.
Fixes: d22b174a7318 ("handshake: use _hs directly in handshake_event")
Fixes: 4d2176df2985 ("handshake: Allow event handler to free handshake")
Reported-By: Toke Høiland-Jørgensen <toke@toke.dk>
Tested-by: Toke Høiland-Jørgensen <toke@toke.dk>
There is no need to punch the holes for netdev/wheel groups to send to
the .Agent interface. This is only done by the iwd daemon itself and
the policy for user 'root' already takes care of this.
A select few drivers send this instead of SIGNAL_MBM. The docs say this
value is the signal 'in unspecified units, scaled to 0..100'. The range
for SIGNAL_MBM is -10000..0 so this can be scaled to the MBM range easy
enough...
Now, this isn't exactly correct because this value ultimately gets
returned from GetOrderedNetworks() and is documented as 100 * dBm where
in reality its just a unit-less signal strength value. Its not ideal, but
this patch at least will fix BSS ranking for these few drivers.
2022-01-31 13:40:19 -06:00
369 changed files with 30550 additions and 9425 deletions
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.