This makes it clear the BSS being selected for a connection/roam has
any quirks associated with its OUI(s) and that IWD may behave
differently based on these.
If a BSS is requesting IWD roam elsewhere but does not include a
preferred candidate list try getting a neighbor report before doing
a full scan.
If the limited scan based on the candidate list comes up empty this
would previously result in IWD giving up on the AP roam entirely.
This patch also improves that behavior slightly by doing a full
scan afterwards as a last ditch effort. If no BSS's are found after
that, IWD will give up on the AP roam.
This has been a long standing issue on Aruba APs where the scan
IEs differ from the IEs received during FT. For compatibility we
have been carrying a patch to disable the replay counter check but
this isn't something that was ever acceptable for upstream. Now
with the addition of vendor quirks this check can be disabled only
for the OUI of Aruba APs.
Reported-by: Michael Johnson <mjohnson459@gmail.com>
Co-authored-by: Michael Johnson <<mjohnson459@gmail.com>
ignore_bss_tm_candidates:
When a BSS requests a station roam it can optionally include a
list of BSS's that can be roamed to. IWD uses this list and only
scans on those frequencies. In some cases though the AP's list
contains very poor options and it would be better for IWD to
request a full neighbor report.
replay_counter_mismatch:
On some Aruba APs there is a mismatch in the replay counters
between what is seen in scans versus authentications/associations.
This difference is not allowed in the spec, therefore IWD will
not connect. This quirk is intended to relax that check.
This module will provide a database for known issues or quirks with
wireless vendors.
The vendor_quirks_append_for_oui() API is intended to be called from
scan.c when parsing vendor attributes. This will lookup any quirks
associated with the OUI provided and combine them into an existing
vendor_quirk structure. This can be repeated against all the vendor
OUI's seen in the scan then referenced later to alter IWD behavior.
In the future more critera could be added such as MAC address prefix
or more generalized IE matches e.g.
vendor_quirks_append_for_mac()
vendor_quirks_append_for_ie()
etc.
If there were no BSS candidates found after trying to roam make
sure the old roam_freqs list gets cleared so IWD doesn't end up
scanning potentially old frequencies on the next retry.
Some drivers do not handle the colocated scan flag very well and this
results in BSS's not being seen in scans. This of course results in
very poor behavior.
This has been seen on ath11k specifically but after some
conversations [1] on the linux-wireless mailing list others have
reported issues with iwlwifi acting similarly. Since there are many
hardware variants that use both ath11k and iwlwifi this new quirk
isn't being forced to those drivers, but let users configure IWD to
disable the flag if needed.
[1] https://lore.kernel.org/linux-wireless/d1e75a08-047d-7947-d51a-2e486efead77@candelatech.com/
In kernel 6.8 a new CMD_ASSOCIATE failure path was added which checks
if the AP has a channel switch in progress. Eariler patches update
IWD into handling this case better, and this new test exercises that.
After CSA IE parsing was added to the kernel this opened up the
possibility that associations could be rejected locally based on
the contents of this CSA IE in the AP's beacons. Overall, it was
always possible for a local rejection but this case was never
considered by IWD. The CSA-based rejection is something that can
and does happen out in the wild.
When this association rejection happens it desync's IWD and the
kernel's state:
1. IWD begins an FT roam. Authenticates successfully, then proceeds
to calling netdev_ft_reassociate().
2. Immediately IWD transitions to a ft-roaming state and waits for
an association response.
3. CMD_ASSOCIATE is rejected by the kernel in the ACK which IWD
handles by sending a deauthenticate command to the kernel (since
we have a valid authentication to the new BSS).
4. Due to a bug IWD uses the target BSSID to deauthenticate which
the kernel rejects since it has no knowledge of this auth. This
error is not handled or logged.
5. IWD proceeds, assuming its deauthenticated, and transitions to a
disconnected state. The kernel remains "connected" which of course
prevents any future connections.
A simple fix for this is to address the bug (4) in IWD that deauths
using the current BSS roam target. This is actually legacy behavior
from back when IWD used CMD_AUTHENTICATE. Today the kernel is unaware
that IWD authenticated so a deauth is not going to be effective.
Instead we can issue a CMD_DISCONNECT. This is somewhat of a large
hammer, but since the handshake and internal state has already been
modified to use the new target BSS we cannot go back and maintain the
existing connect (though it is _possible_, see the TODO in the
patch).
In an ideal world userspace should never be getting a channel switch
event unless connected to an AP, but alas this has been seen at least
with ath10k hardware. This causes IWD to crash since the logic
assumes netdev->handshake is set.
These two newly parsed station info params "inactive time" and the
"connected time" would be helpful to track the duration (in ms) for
which the station was last inactive and the total duration (in s) for
which the station is currently connected to the AP.
When the wlan device is in STA mode, these fields represent the info
of this station device. And when wlan device is in AP mode, then these
fields repesents the stations that are connected to this AP device.
Since netconfig is now part of the Connect() call from a DBus
perspective add a note indicating that this method has the potential
to take a very long time if there are issues with DHCP.
Since the method return to Connect() and ConnectBssid() come after
netconfig some tests needed to be updated since they were waiting
for the method return before continuing. For timeout-based tests
specifically this caused them to fail since before they expected
the return to come before the connection was actually completed.
Let the caller specify the method timeout if there is an expectation
that it could take a long time.
For the conventional connect call (not the "bssid" debug variant) let
them pass their own callback handlers. This is useful if we don't
want to wait for the connect call to finish, but later get some
indication that it did finish either successfully or not.
A netconfig failure results in a failed connection which restarts
autoconnect and prevents IWD from retrying the connection on any
other BSS's within the network as a whole. When autoconnect restarts
IWD will scan and choose the "best" BSS which is likely the same as
the prior attempt. If that BSS is somehow misconfigured as far as
DHCP goes, it will likely fail indefinitely and in turn cause IWD to
retry indefinitely.
To improve this netconfig has been adopted into the IWD's BSS retry
logic. If netconfig fails this will not result in IWD transitioning
to a disconnected state, and instead the BSS will be network
blacklisted and the next will be tried. Only once all BSS's have been
tried will IWD go into a disconnected state and start autoconnect
over.
When netconfig is enabled the DBus reply was being sent in
station_connect_ok(), before netconfig had even started. This would
result in a call to Connect() succeeding from a DBus perspective but
really netconfig still needed to complete before IWD transitioned
to a connected state.
Fixes: 72e7d3ceb83d ("station: Handle NETCONFIG_EVENT_FAILED")
This adds a new API network_clear_blacklist() and removes this
functionality from network_connected(). This is done to support BSS
iteration when netconfig is enabled. Since a call to
network_connected() will happen prior to netconfig completing we
cannot clear the blacklist until netconfig has either passed or
failed.
If this fails, in some cases, -EAGAIN would be returned up to netdev
which would then assume a retry would be done automatically. This
would not in fact happen since it was an internal SAE failure which
would result in the connect method return to never get sent.
Now if sae_send_commit() fails, return -EPROTO which will cause
netdev to fail the connection.
A BSS can temporarily reject associations and provide a delay that
the station should wait for before retrying. This is useful when
sane values are used, but taking it to the extreme an AP could
potentially request the client wait UINT32_MAX TU's which equates
to 49 days.
Either due to a bug, or worse by design, the kernel will wait for
however long that timeout is. Luckily the kernel also sends an event
to userspace with the amount of time it will be waiting. To guard
against excessive timeouts IWD will now handle this event and enforce
a maximum allowed value. If the timeout exceeds this IWD will
deauthenticate.
Specifically for the NO_MORE_STAS reason code, add the BSS to the
(now renamed) AP_BUSY blacklist to avoid roaming to this BSS for
the near future.
Since we are now handling individual reason codes differently the
whole IS_TEMPORARY_STATUS macro was removed and replaced with a
case statement.
The initial pass of this feature only envisioned BSS transition
management frames as the trigger to "roam blacklist" a BSS, hence
the original name. But some APs actually utilize status codes that
also indicate to the stations that they are busy, or not able to
handle more connections. This directly aligns with the original
motivation of the "roam blacklist" series and these events should
also trigger this type of blacklist.
First, since we will be applying this blacklist to cases other
than being told to roam, rename this reason code internally to
BLACKLIST_REASON_AP_BUSY. The config option is also being renamed
to [Blacklist].InitialAccessPointBusyTimeout while also supporting
the old config option, but warning that it is deprecated.
Some drivers/hardware are more strict about limiting use of
certain frequencies on startup until the regulatory domain has
been set. For most cards the only way to set the regulatory domain
is to scan and see BSS's nearby that advertise the country they
reside in.
This is particularly important for AP mode since AP's are always
emitting radiation from beacons and will not start until the desired
frequency is both enabled and allows IR. To make this process
seamless in IWD we will first check that the desired frequency is
enabled/IR and if not issue a scan to (hopefully) get the regulatory
domain set.
A prior patch broke this by checking the return of
l_dbus_message_iter_next_entry. This was really subtle but the logic
actually relied on _not_ checking that return in order to handle
empty lists.
Instead of reverting the logic was adapted/commented to make it more
clear what the API expects from DBus. If list contains at least one
value the first element path will get set, if it contains zero
values "new_path" will be set to NULL which will then cause the
list to be cleared later on.
This both fixes the regression, and makes it clear that a zero
element list is supported and handled.
The length of EncryptedSecurity was assumed to be at least 16 bytes
and anything less would underflow the length to l_malloc.
Fixes: 01cd8587606b ("storage: implement network profile encryption")
The MAC and version elements weren't super critical but the channel
and bootstrapping key elements would result in memory leaks if there
were duplicates.
This patch now will not allow duplicate elements in the URI.
Fixes: f7f602e1b1e7 ("dpp-util: add URI parsing")
The survey arrays were exactly the number of valid channels for a
given band (e.g. 14 for 2.4GHz) but since channels start at 1 this
means that the last channel for a band would overflow the array.
Fixes: 35808debaefd ("scan: use GET_SURVEY for SNR calculation in ranking")
Supporting PMKSA on fullmac drivers requires that we set the PMKSA
into the kernel as well as remove it. This can now be triggered
via the new PMKSA driver callbacks which are implemented and set
with this patch.
In order to support fullmac drivers the PMKSA entries must be added
and removed from the kernel. To accomplish this a set of driver
callbacks will be added to the PMKSA module. In addition a new
pmksa_cache_free API will be added whos only purpose is to handle
the removal from the kernel.
The iwd_notice function was more meant for special purpose events
not general debug prints. For these error conditions we should be
using l_warn. For the informational "External Auth to SSID" log
we already print this information when connecting from station. In
addition there are logs when performing external auth so it should
be very obvious external auth is being used without this log.
The netdev frame watches got cleaned up upon the interface going down
which works if the interface is simply being toggled but when IWD
shuts down it first shuts down the interface, then immediately frees
netdev. If a watched frame arrives immediately after that before the
interface shutdown callback it will reference netdev, which has been
freed.
Fix this by clearing out the frame watches in netdev_free.
==147== Invalid read of size 8
==147== at 0x408ADB: netdev_neighbor_report_frame_event (netdev.c:4772)
==147== by 0x467C75: frame_watch_unicast_notify (frame-xchg.c:234)
==147== by 0x4E28F8: __notifylist_notify (notifylist.c:91)
==147== by 0x4E2D37: l_notifylist_notify_matches (notifylist.c:204)
==147== by 0x4A1388: process_unicast (genl.c:844)
==147== by 0x4A1388: received_data (genl.c:972)
==147== by 0x49D82F: io_callback (io.c:105)
==147== by 0x49C93C: l_main_iterate (main.c:461)
==147== by 0x49CA0B: l_main_run (main.c:508)
==147== by 0x49CA0B: l_main_run (main.c:490)
==147== by 0x49CC3F: l_main_run_with_signal (main.c:630)
==147== by 0x4049EC: main (main.c:614)
If an AP directed roam frame comes in while IWD is roaming its
still valuable to parse that frame and blacklist the BSS that
sent it.
This can happen most frequently during a roam scan while connected
to an overloaded BSS that is requesting IWD roams elsewhere.
If the BSS is requesting IWD roam elsewhere add this BSS to the
blacklist using BLACKLIST_REASON_ROAM_REQUESTED. This will lower
the chances of IWD roaming/connecting back to this BSS in the
future.
This then allows IWD to consider this blacklist state when picking
a roam candidate. Its undesireable to fully ban a roam blacklisted
BSS, so some additional sorting logic has been added. Prior to
comparing based on rank, BSS's will be sorted into two higher level
groups:
Above Threshold - BSS is above the RoamThreshold
Below Threshold - BSS is below the RoamThreshold
Within each of these groups the BSS may be roam blacklisted which
will position it at the bottom of the list within its respecitve
group.
This adds a new (less severe) blacklist reason as well as an option
to configure the timeout. This blacklist reason will be used in cases
where a BSS has requested IWD roam elsewhere. At that time a new
blacklist entry will be added which will be used along with some
other criteria to determine if IWD should connect/roam to that BSS
again.
Now that we have multiple blacklist reasons there may be situations
where a blacklist entry already exists but with a different reason.
This is going to be handled by the reason severity. Since we have
just two reasons we will treat a connection failure as most severe
and a roam requested as less severe. This leaves us with two
possible situations:
1. BSS is roam blacklisted, then gets connection blacklisted:
The reason will be "promoted" to connection blacklisted.
2. BSS is connection blacklisted, then gets roam blacklisted:
The blacklist request will be ignored
When pruning the list check_if_expired was comparing to the maximum
amount of time a BSS can be blacklisted, not if the current time had
exceeded the expirationt time. This results in blacklist entries
hanging around longer than they should, which would result in them
poentially being blacklisted even longer if there was another reason
to blacklist in the future.
Instead on prune check the actual expiration and remove the entry if
its expired. Doing this removes the need to check any of the times
in blacklist_contains_bss since prune will remove any expired entries
correctly.
To both prepare for some new blacklisting behavior and allow for
easier consolidation of the network-specific blacklist include a
reason enum for each entry. This allows IWD to differentiate
between multiple blacklist types. For now only the existing
"permanent" type is being added which prevents connections to that
BSS via autoconnect until it expires.
Allowing the timeout blacklist to be disabled has introduced a bug
where a failed connection will not result in the BSS list to be
traversed. This causes IWD to retry the same BSS over and over which
be either a) have some issue preventing a connection or b) may simply
be unreachable/out of range.
This is because IWD was inherently relying on the timeout blacklist
to flag BSS's on failures. With it disabled there was nothing to tell
network_bss_select that we should skip the BSS and it would return
the same BSS indefinitely.
To fix this some of the blacklisting logic was re-worked in station.
Now, a BSS will always get network blacklisted upon a failure. This
allows network.c to traverse to the next BSS upon failure.
For auth/assoc failures we will then only timeout blacklist under
certain conditions, i.e. the status code was not in the temporary
list.
Fixes: 77639d2d452e ("blacklist: allow configuration to disable the blacklist")
Certain use cases may not need or want this feature so allowing it to
be disabled is a much cleaner way than doing something like setting
the timeouts very low.
Now [Blacklist].InitialTimeout can be set to zero which will prevent
any blacklisting.
In addition some other small changes were added:
- Warn if the multiplier is 0, and set to 1 if so.
- Warn if the initial timeout exceeds the maximum timeout.
- Log if the blacklist is disabled
- Use L_USEC_PER_SEC instead of magic numbers.
SAE/WPA3 is completely broken on brcmfmac, at least without a custom
kernel patch which isn't included in many OS distributions. In order
to help with this add a driver quirk so devices with brcmfmac can
utilize WPA2 instead of WPA3 and at least connect to networks at
this capacity until the fix is more widely distributed.
Instead of just printing the PMKSA pointer separate this into two
separate debug messages, one for if the PMKSA exists and the other
if it does not. In addition print out the MAC of the AP so we have
a reference of which PMKSA this is.
With external auth there is no associate event meaning the auth proto
never gets freed, which prevents eapol from starting inside the
OCI callback. Check for this specific case and free the auth proto
after signaling that external auth has completed.
The user can now limit the size and count of PCAP files iwmon will
create. This allows iwmon to run for long periods of time without
filling up disk space.
This implements support for "rolling captures" by allowing iwmon to
limit the PCAP file size and number of PCAP's that are created.
This is a useful feature when long term monitoring is needed. If
there is some rare behavior requiring iwmon to run for days, months,
or longer the resulting PCAP file would become quite large and fill
up disk space.
When enabled (command line arguments in subsequent patch) the PCAP
file size is checked on each write. If it exceeds the limit a new
PCAP file will be created. Once the number of old PCAP files reaches
the set limit the oldest PCAP will be removed from disk.
For syncing iwmon captures with other logging its useful to
timestamp in some absolute format like UTC. This adds an
option which allows the user to specify what time format to
show. For now support:
delta - (default) The time delta between the first packet
and the current packet.
utc - The packet time in UTC
The ath10k driver has shown some performance issues, specifically
packet loss, when frame watches are registered with the multicast
RX flag set. This is relevant for DPP which registers for these
when DPP starts (if the driver supports it). This has only been
observed when there are large groups of clients all using the same
wifi channel so its unlikely to be much of an issue for those using
IWD/ath10k and DPP unless you run large deployments of clients.
But for large deployments with IWD/ath10k we need a way to disable
the multicast RX registrations. Now, with the addition of
wiphy_supports_multicast_rx we can both check that the driver
supports this as well as if its been disabled by the driver quirk.
This driver quirk and associated helper API lets other modules both
check if multicast RX is supported, and if its been disabled via
the driver quirk setting.
The actual connection piece of this is very minimal, and only
requires station to check if there is a PMKSA cached, and if so
include the PMKID in the RSNE. Netdev then takes care of the rest.
The remainder of this patch is the error handling if a PMKSA
connection fails with INVALID_PMKID. In this case IWD should retry
the same BSS without PMKSA.
An option was also added to disable PMKSA if a user wants to do
that. In theory PMKSA is actually less secure compared to SAE so
it could be something a user wants to disable. Going forward though
it will be enabled by default as its a requirement from the WiFi
alliance for WPA3 certification.
To prepare for PMKSA support station needs access to the handshake
object. This is because if PMKSA fails due to an expired/missing
PMKSA on the AP station should retry using the standard association.
This poses a problem currently because netdev frees the handshake
prior to calling the connect callback.
This was quite simple and only requiring caching the PMKSA after a
successful handshake, and using the correct authentication type
for connections if we have a prior PMKSA cached.
This is only being added for initial SAE associations for now since
this is where we gain the biggest improvement, in addition to the
requirement by the WiFi alliance to label products as "WPA3 capable"
This is needed in order to clear the PMKSA from the handshake state
without actually putting it back into the cache. This is something
that will be needed in case the AP rejects the association due to
an expired (or forgotten) PMKSA.
The majority of this patch was authored by Denis Kenzior, but
I have appended setting the PMK inside handshake_state_set_pmksa
as well as checking if the pmkid exists in
handshake_state_steal_pmkid.
Authored-by: Denis Kenzior <denkenz@gmail.com>
Authored-by: James Prestwood <prestwoj@gmail.com>
There are quite a few tests here for various scenarios and PMKSA
throws a wrench into that. Rather than potentially breaking the
tests in attempt to get them working with PMKSA, just disable PMKSA.
Since IWD doesn't utilize DBus signals in "normal" operations its
fine to lazy initialize any of the DBus interfaces since properties
can be obtained as needed with Get/GetAll.
For test-runner though StationDebug uses signals for debug events
and until the StationDebug class is initialized (via a method call
or property access) all signals will be lost. Fix this by always
initializing the StationDebug interface when a Device class is
initialized.
This adds a ref count to the handshake state object (as well as
ref/unref APIs). Currently IWD is careful to ensure that netdev
holds the root reference to the handshake state. Other modules do
track it themselves, but ensure that it doesn't get referenced
after netdev frees it.
Future work related to PMKSA will require that station holds a
references to the handshake state, specifically for retry logic,
after netdev is done with it so we need a way to delay the free
until station is also done.
The utilization rank factor already existed but was very rigid
and only checked a few values. This adds the (optional) ability
to start applying an exponentially decaying factor to both
utilization and station count after some threshold is reached.
This area needs to be re-worked in order to support very highly
loaded networks. If a network either doesn't support client
balancing or does it poorly its left up to the clients to choose
the best BSS possible given all the information available. In
these cases connecting to a highly loaded BSS may fail, or result
in a disconnect soon after connecting. In these cases its likely
better for IWD to choose a slightly lower RSSI/datarate BSS over
the conventionally 'best' BSS in order to aid in distributing
the network load.
The thresholds are currently optional and not enabled by default
but if set they behave as follows:
If the value is above the threshold it is mapped to an integer
between 0 and 30. (using a starting range of <value> - 255).
This integer is then used to index in the exponential decay table
to get a factor between 1 and 0. This factor is then applied to
the rank.
Note that as the value increases above the threshold the rank
will be increasingly effected, as is expected for an exponential
function. These option should be used with care as it may have
unintended consequences, especially with very high load networks.
i.e. you may see IWD roaming to BSS's with much lower signal if
there are high load BSS's nearby.
To maintain the existing behavior if there is no utilization
factor set in main.conf the legacy thresholds/factors will be
used.
This is copied from network.c that uses a static table to lookup
exponential decay values by index (generated from 1/pow(n, 0.3)).
network.c uses this for network ranking but it can be useful for
BSS ranking as well if you need to apply some exponential backoff
to a value.
This has been needed elsewhere but generally shortcuts could be
taken mapping with ranges starting/ending with zero. This is a
more general linear mapping utility to map values between any
two ranges.
gcc-15 switched to -std=c23 by default:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212
As a result `iwd` fails the build as:
../src/crypto.c:1215:24: error: incompatible types when returning type '_Bool' but 'struct l_ecc_point *' was expected
1215 | return false;
| ^~~~~
Signed-off-by: Rudi Heitbaum <rudi@heitbaum.com>
gcc-15 switched to -std=c23 by default:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212
As a result `iwd` fails the build as:
wired/ethdev.c: In function 'pae_open':
wired/ethdev.c:340:55:
error: passing argument 4 of 'l_io_set_read_handler'
from incompatible pointer type [-Wincompatible-pointer-types]
340 | l_io_set_read_handler(pae_io, pae_read, NULL, pae_destroy);
| ^~~~~~~~~~~
| |
| void (*)(void)
In file included from ...-ell-0.70-dev/include/ell/ell.h:19,
from wired/ethdev.c:38:
...-ell-0.70-dev/include/ell/io.h:33:68:
note: expected 'l_io_destroy_cb_t' {aka 'void (*)(void *)'}
but argument is of type 'void (*)(void)'
33 | void *user_data, l_io_destroy_cb_t destroy);
| ~~~~~~~~~~~~~~~~~~^~~~~~~
C23 changed the meaning of `void (*)()` from partially defined prototype
to `void (*)(void)`.
The 3rd byte of the country code was being printed as ASCII but this
byte isn't always a printable character. Instead we can check what
the value is and describe what it means from the spec.
These frequencies were seen being advertised by a driver and IWD has
no operating class/channel mapping for them. Specifically 5960 was
causing issues due to a few bugs and mapping to channel 2 of the 6ghz
band. Those bugs have now been resolved.
If these frequencies can be supported in a clean manor we can remove
this test, but until then ensure IWD does not parse them.
After the band is established we check the e4 table for the channel
that matches. The problem here is we will end up checking all the
operating classes, even those that are not within the band that was
determined. This could result in false positives and return a
channel that doesn't make sense.
When the frequencies/channels were parsed there was no check that the
resulting band matched what was expected. Now, pass the band object
itself in which has the band set to what is expected.
If IPv6 is disabled or not supported at the kernel level writing the
sysfs settings will fail. A few of them had a support check but this
patch adds a supported bool to the remainder so we done get errors
like:
Unable to write drop_unsolicited_na to /proc/sys/net/ipv6/conf/wlan0/drop_unsolicited_na
Similar to several other modules DPP registers for its frame
watches on init then ignores anything is receives unless DPP
is actually running.
Due to some recent issues surrounding ath10k and multicast frames
it was discovered that simply registering for multicast RX frames
causes a significant performance impact depending on the current
channel load.
Regardless of the impact to a single driver, it is actually more
efficient to only register for the DPP frames when DPP starts
rather than when IWD initializes. This prevents any of the frames
from hitting userspace which would otherwise be ignored.
Using the frame-xchg group ID's we can only register for DPP
frames when needed, then close that group and the associated
frame watches.
DPP optionally uses the multicast RX flag for frame registrations but
since frame-xchg did not support that, it used its own registration
internally. To avoid code duplication within DPP add a flag to
frame_watch_add in order to allow DPP to utilize frame-xchg.
The selection loop was choosing an initial candidate purely for
use of the "fallback_to_blacklist" flag. But we have a similar
case with OWE transitional networks where we avoid the legacy
open network in preference for OWE:
/* Don't want to connect to the Open BSS if possible */
if (!bss->rsne)
continue;
If no OWE network gets selected we may iterate all BSS's and end
the loop, which then returns NULL.
To fix this move the blacklist check earlier and still ignore any
BSS's in the blacklist. Also add a new flag in the selection loop
indicating an open network was skipped. If we then exhaust all
other BSS's we can return this candidate.
Some drivers like brcmfmac don't support OWE but from userspace its
not possible to query this information. Rather than completely
blacklist brcmfmac we can allow the user to configure this and
disable OWE in IWD.
The "UK" alpha2 code is not the official code for the United Kingdom
but is a "reserved" code for compatibility. The official alpha2 is
"GB" which is being added to the EU list. This fixes issues parsing
neighbor reports, for example:
src/station.c:parse_neighbor_report() Neighbor report received for xx:xx:xx:xx:xx:xx: ch 136 (oper class 3), MD not set
Failed to find band with country string 'GB 32' and oper class 3, trying fallback
src/station.c:station_add_neighbor_report_freqs() Ignored: unsupported oper class
Test handling of technically illegal but harmless cloned IEs.
Based on real traffic captured from retail APs.
As cloned IEs are now allowed the
"/IE order/Bad (Duplicate + Out of Order IE) 1"
test payload has been altered to be more-wrong so it still fails
verification as expected.
Prior to adding the polling fallback this code path was only used for
signal level list notifications and netdev_rssi_polling_update() was
structured as such, where if the RSSI list feature existed there was
nothing to be done as the kernel handled the notifications.
For certain mediatek cards this is broken, hence why the fallback was
added. But netdev_rssi_polling_update() was never changed to take
this into account which bypassed the timer cleanup on disconnections
resulting in a crash when the timer fired after IWD was disconnected:
iwd: ++++++++ backtrace ++++++++
iwd: #0 0x7b5459642520 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #1 0x7b54597aedf4 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #2 0x49f82d in l_netlink_message_append() at ome/jprestwood/iwd/ell/netlink.c:825
iwd: #3 0x4a0c12 in l_genl_msg_append_attr() at ome/jprestwood/iwd/ell/genl.c:1522
iwd: #4 0x405c61 in netdev_rssi_poll() at ome/jprestwood/iwd/src/netdev.c:764
iwd: #5 0x49cce4 in timeout_callback() at ome/jprestwood/iwd/ell/timeout.c:70
iwd: #6 0x49c2ed in l_main_iterate() at ome/jprestwood/iwd/ell/main.c:455 (discriminator 2)
iwd: #7 0x49c3bc in l_main_run() at ome/jprestwood/iwd/ell/main.c:504
iwd: #8 0x49c5f0 in l_main_run_with_signal() at ome/jprestwood/iwd/ell/main.c:632
iwd: #9 0x4049ed in main() at ome/jprestwood/iwd/src/main.c:614
iwd: #10 0x7b5459629d90 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #11 0x7b5459629e40 in /lib/x86_64-linux-gnu/libc.so.6
iwd: +++++++++++++++++++++++++++
To fix this we need to add checks for the cqm_poll_fallback flag in
netdev_rssi_polling_update().
Certain FullMAC drivers do not expose CMD_ASSOCIATE/CMD_AUTHENTICATE,
but lack the ability to fully offload SAE connections to the firmware.
Such connections can still be supported on such firmware by using
CMD_EXTERNAL_AUTH & CMD_FRAME. The firmware sets the
NL80211_FEATURE_SAE bit (which implies support for CMD_AUTHENTICATE, but
oh well), and no other offload extended features.
When CMD_CONNECT is issued, the firmware sends CMD_EXTERNAL_AUTH via
unicast to the owner of the connection. The connection owner is then
expected to send SAE frames with the firmware using CMD_FRAME and
receive authenticate frames using unicast CMD_FRAME notifications as
well. Once SAE authentication completes, userspace is expected to
send a final CMD_EXTERNAL_AUTH back to the kernel with the corresponding
status code. On failure, a non-0 status code should be used.
Note that for historical reasons, SAE AKM sent in CMD_EXTERNAL_AUTH is
given in big endian order, not CPU order as is expected!
The TX or RX bitrate attributes can contain zero nested attributes.
This causes netdev_parse_bitrate() to fail, but this shouldn't then
cause the overall parsing to fail (we just don't have those values).
Fix this by continuing to parse attributes if either the TX/RX
bitrates fail to parse.
If the affinity watch is removed by setting an empty list the
disconnect callback won't be called which was the only place
the watch ID was cleared. This resulted in the next SetProperty call
to think a watch existed, and attempt to compare the sender address
which would be NULL.
The watch ID should be cleared inside the destroy callback, not
the disconnect callback.
If we scan a huge number of frequencies the PKEX timeout can get
rather large. This was overlooked in a prior patch who's intent
was to reduce the PKEX time, but in these cases it increased it.
Now the timeout will be capped at 2 minutes, but will still be
as low as 10 seconds for a single frequency.
In addition there was no timer reset once PKEX was completed.
This could cause excessive waits if, for example, the peer left
the channel mid-authentication. IWD would just wait until the
long PKEX timeout to eventually reset DPP. Once PKEX completes
we can assume that this peer will complete authentication quickly
and if not, we can fail.
While there is proper handling for a regdom update during a
TRIGGER_SCAN scan, prior to NEW_SCAN_RESULTS there is no such
handling if the regdom update comes in during a GET_SCAN or
GET_SURVEY.
In both the 6ghz and non-6ghz code paths we have some issues:
- For non-6ghz devices, or regdom updates that did not enable
6ghz the wiphy state watch callback will automatically issues
another GET_SURVEY/GET_SCAN without checking if there was
already one pending. It does this using the current scan request
which gets freed by the prior GET_SCAN/GET_SURVEY calls when
they complete, causing invalid reads when the subsequent calls
finish.
- If 6ghz was enabled by the update we actually append another
trigger command to the list and potentially run it if its the
current request. This also will end up in the same situation as
the request is freed by the pending GET_SURVEY/GET_SCAN calls.
For the non-6ghz case there is little to no harm in ignoring the
regdom update because its very unlikely it changed the allowed
frequencies.
For the 6ghz case we could potentially handle the new trigger scan
within get_scan_done, but thats beyond the scope of this change
and is likely quite intrusive.
Since surveys end up making driver calls in the kernel its not
entirely known how they are implemented or how long they will
take. For this reason the survey will be skipped if getting the
results from an external scan.
Doing this also fixes a crash caused by external scans where the
scan request pointer is not checked and dereferenced:
0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059
0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0
0x0000749646a243cb in ?? () from /usr/lib/libell.so.0
0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0
0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0
0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0
0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614
Reported-by: Daniel Bond <danielbondno@gmail.com>
With the introduction of affinities the CQM threshold can be toggled
by a DBus call. There was no check if there was already a pending
call which would cause the command ID to be overwritten and lose any
potential to cancel it, e.g. if netdev went down.
Some drivers fail to set a CQM threshold and report not supported.
Its unclear exactly why but if this happens roaming is effectively
broken.
To work around this enable RSSI polling if -ENOTSUP is returned.
The polling callback has been changed to emit the HIGH/LOW signal
threshold events instead of just the RSSI level index, just as if
a CQM event came from the kernel.
When the affinity is set to the current BSS lower the roaming
threshold to loosly lock IWD to the current BSS. The lower
threshold is automatically removed upon roaming/disconnection
since the affinity array is also cleared out.
This property will hold an array of object paths for
BasicServiceSet (BSS) objects. For the purpose of this patch
only the setter/getter and client watch is implemented. The
purpose of this array is to guide or loosely lock IWD to certain
BSS's provided that some external client has more information
about the environment than what IWD takes into account for its
roaming decisions.
For the time being, the array is limited to only the connected
BSS path, and any roams or disconnects will clear the array.
The intended use case for this is if the device is stationary
an external client could reduce the likelihood of roaming by
setting the affinity to the current BSS.
This documents new DBus property that expose a bit more control to
how IWD roams.
Setting the affinity on the connected BSS effectively "locks" IWD to
that BSS (except at critical RSSI levels, explained below). This can
be useful for clients that have access to more information about the
environment than IWD. For example, if a client is stationary there
is likely no point in trying to roam until it has moved elsewhere.
A new main.conf option would also be added:
[General].CriticalRoamThreshold
This would be the new roam threshold set if the currently connected
BSS is in the Affinities list. If the RSSI continues to drop below
this level IWD will still attempt to roam.
A user reported a crash which was due to the roam trigger timeout
being overwritten, followed by a disconnect. Post-disconnect the
timer would fire and result in a crash. Its not clear exactly where
the overwrite was happening but upon code inspection it could
happen in the following scenario:
1. Beacon loss event, start roam timeout
2. Signal low event, no check if timeout is running and the timeout
gets overwritten.
The reported crash actually didn't appear to be from the above
scenario but something else, so this logic is being hardened and
improved
Now if a roam timeout already exists and trying to be rearmed IWD
will check the time remaining on the current timer and either keep
the active timer or reschedule it to the lesser of the two values
(current or new rearm time). This will avoid cases such as a long
roam timer being active (e.g. 60 seconds) followed by a beacon or
packet loss event which should trigger a more agressive roam
schedule.
This adds a secondary set of signal thresholds. The purpose of these
are to provide more flexibility in how IWD roams. The critical
threshold is intended to be temporary and is automatically reset
upon any connection changes: disconnects, roams, or new connections.
This prepares for the ability to toggle between two signal
thresholds in netdev. Since each netdev may not need/want the
same threshold store it in the netdev object rather than globally.
Since IWD enrollees can send unicast frames, a PKEX configurator could
still run without multicast support. Using this combination basically
allows any driver to utilize DPP/PKEX assuming the MAC address can
be communicated using some out of band mechanism.
The DPP spec allows for obtaining frequency and MAC addresses up
to the implementation. IWD already takes advantage of this by
first scanning for nearby APs and using only those frequencies.
For further optimization an enrollee may be able to determine the
configurators frequency and MAC ahead of time which would make
finding the configurator much faster.
This will help to get rid of magic number use throughout the project.
The definitions should be limited to global magic numbers that are used
throughout the project, for example SSID length, MAC address length,
etc.
Due to an unnoticed bug after adding the BasicServiceSet object into
network, it became clear that since station already owns the scan_bss
objects it makes sense for it to manage the associated DBus objects
as well. This way network doesn't have to jump through hoops to
determine if the scan_bss object was remove, added, or updated. It
can just manage its list as it did prior.
From the station side this makes things very easy. When scan results
come in we either update or add a new DBus object. And any time a
scan_bss is freed we remove the DBus object.
To reduce code duplication and prepare for moving the BSS interface
to station, add a new API so station can create a BSS path without
a network object directly.
src/eapol.c:1041:9: error: ‘buf’ may be used uninitialized [-Werror=maybe-uninitialized]
1041 | l_put_be16(0, &frame->header.packet_len);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This warning is bogus since the buffer is initialized through use of
eapol_frame members. EAPoL-Start is a very simple frame.
This was seemingly trivial at face value but doing so ended up
pointing out a bug with how group_retry is set when forcing
the default group. Since group_retry is initialized to -1 the
increment in the force_default_group block results in it being
set to zero, which is actually group 20, not 19. This did not
matter for hunt and peck, but H2E actually uses the retry value
to index its pre-generated points which then breaks SAE if
forcing the default group with H2E.
To handle H2E and force_default_group, the group selection
logic will always begin iterating the group array regardless of
SAE type.
The property itself is an array of paths, but this is difficult
to fit nicely within a terminal. Instead just display the count
of BSS's. Displaying a detailed list of BSS's will be done via
a separate command.
There are certain cases where we may not want to display the entire
header for a given set of properties. For example displaying a list
of proxy interfaces. Add finer control by separating out the header
and the prop/value display into two functions.
This will tell network the BSS list is being updated and it can
act accordingly as far as the BSS DBus registrations/unregistration.
In addition any scan_bss object needing to be freed has to wait
until after network_bss_stop_update() because network has to be able
to iterate its old list and unregister any BSS's that were not seen
in the scan results. This is done by pushing each BSS needing to be
freed into a queue, then destroying them after the BSS's are all
added.
This adds a new DBus object/interface for tracking BSS's for
a given network. Since scanning replaces scan_bss objects some
new APIs were added to avoid tearing down the associated DBus
object for each BSS.
network_bss_start_update() should be called before any new BSS's
are added to the network object. This will keep track of the old
list and create a new network->bss_list where more entries can
be added. This is effectively replacing network_bss_list_clear,
except it keeps the old list around until...
network_bss_stop_update() is called when all BSS's have been
added to the network object. This will then iterate the old list
and lookup if any BSS DBus objects need to be destroyed. Once
completed the old list is destroyed.
iwd supports FILS only on softmac drivers. Ensure the capability check
is consistent between wiphy and netdev, both the softmac and the
relevant EXT_FEATURE bit must be checked.
CMD_EXTERNAL_AUTH could potentially be used for FILS for FullMAC cards,
but no hardware supporting this has been identified yet.
Somehow this ability was lost in the refactoring. OWE was intended to
be used on fullmac cards, but the state machine is only actually created
if the connection type ends up being softmac.
Fixes: 8b6ad5d3b9ec ("owe: netdev: refactor to remove OWE as an auth-proto")
Certain flags (for example, NLA_F_NESTED) are ORed with the netlink
attribute type identifier prior to being sent on the wire. Such flags
need to be masked off and not taken into consideration when attribute
type is being compared against known values.
The workaround for Cisco APs reporting an operating class of zero
is still a bug that remains in Cisco equipment. This is made even
worse with the introduction of 6GHz where the channel numbers
overlap with both 2.4 and 5GHz bands. This makes it impossible to
definitively choose a frequency given only a channel number.
To improve this workaround and cover the 6GHz band we can calculate
a frequency for each band and see what is successful. Then append
each frequency we get to the list. This will result in more
frequencies scanned, but this tradeoff is better than potentially
avoiding a roam to 6GHz or high order 5ghz channel numbers.
[denkenz@archdev ~]$ qemu-system-x86_64 --version
QEMU emulator version 9.0.1
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
QEMU now seems to complain that 'no-hpet' and 'no-acpi' command line
arguments are unrecognized.
l_genl class has nice ways of discovering and requesting families. The
genl functionality has been added after the iwmon skeleton was created,
but it is now time to migrate to using these APIs.
This attribute is actually an array of signed 32 bit integers and it
was being treated as a single integer. This would work until more
than one threshold was set, then it would fail to parse it.
After the station state changes in DPP setting autoconnect=True was
causing DPP to stop prior to being able to scan for the network.
Instead we can start autoconnect earlier so we aren't toggling the
property while DPP is running.
Prior to now the DPP state was required to be disconnected before
DPP would start. This is inconvenient for the user since it requires
extra state checking and/or DBus method calls. Instead model this
case like WSC and issue a disconnect to station if DPP is requested
to start.
The other conditions on stopping DPP are also preserved and no
changes to the configurator role have been made, i.e. being
disconnected while configuring still stops DPP. Similarly any
connection made during enrolling will stop DPP.
It should also be noted that station's autoconfigure setting is also
preserved and set back to its original value upon DPP completing.
Gets the current autoconenct setting. This is not the current
autoconnect state. Will be used in DPP to reset station's autoconnect
setting back to what it was prior to DPP, in case of failure.
In order to slightly rework the DPP state machine to handle
automatically disconnecting (for enrollees) functions need to be
created that isolate everything needed to start DPP/PKEX in case
a disconnect needs to be done first.
If valgrind didn't report any issues, don't dump the logs. This
makes the test run a lot easier to look at without having to scroll
through pages of valgrind logs that provide no value.
When the survey code was added it neglected to add the same
cancelation logic that existed for the GET_SCAN call, i.e. if
a scan was canceled and there was a pending GET_SURVEY to the
kernel that needs to be canceled, and the request cleaned up.
Fixes: 35808debae ("scan: use GET_SURVEY for SNR calculation in ranking")
IPv6 address entry was not updated to use display_table_row which led to
a shifted line in table, as shown below:
$ iwctl station wlan0 show | head | sed 's| |.|g'
.................................Station:.wlan0................................
--------------------------------------------------------------------------------
..Settable..Property..............Value..........................................
--------------------------------------------------------------------------------
............Scanning..............no...............................................
............State.................connected........................................
............Connected.network.....Clannad.Legacy...................................
............IPv4.address..........192.168.1.12.....................................
............IPv6.address........fdc3:541d:864f:0:96db:c9ff:fe36:b15............
............ConnectedBss..........cc:d8:43:77:91:0e................................
This patch aligns IPv6 address line with other lines in the table.
Fixes: 35dd2c08219a (client: update station to use display_table_row, 2022-07-07)
testEncryptedProfiles:
- This would occationally fail because the test is expecting
to explicitly connect but after the first failed connection
autoconnect takes over and its a race to connect.
testPSK-roam:
- Several rules were not being cleaned up which could cause
tests afterwards to fail
- The AP roam test started failing randomly because of the SNR
ranking changes. It appears that with hwsim _sometimes_ the
SNR is able to be determined which can effect the ranking. This
test assumed the two BSS's would be the same ranking but the
SNR sometimes causes this to not be true.
The wait_for_event() function allows past events to cause this
function to return immediately. This behavior is known, and
relied on for some tests. But in some cases you want to only
handle _new_ events, so we need a way to clear out prior events.
This event is not used anywhere and can be leveraged in autotesting.
Move the event to eapol_start() so it gets called unconditionally
when the 4-way handshake is started.
If a disconnect arrives at any point during the 4-way handshake or
key setting this would result in netdev sending a disconnect event
to station. If this is a reassociation this case is unhandled in
station and causes a hang as it expects any connection failure to
be handled via the reassociation callback, not a random disconnect
event.
To handle this case we can utilize netdev_disconnected() along with
the new NETDEV_RESULT_DISCONNECTED result to ensure the connect
callback gets called if it exists (indicating a pending connection)
Below are logs showing the "Unexpected disconnect event" which
prevents IWD from cleaning up its state and ultimately results in a
hang:
Jul 16 18:16:13: src/station.c:station_transition_reassociate()
Jul 16 18:16:13: event: state, old: connected, new: roaming
Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_done() Work item 65 done
Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_next() Starting work item 66
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Jul 16 18:16:13: src/netdev.c:netdev_deauthenticate_event()
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
Jul 16 18:16:13: src/station.c:station_netdev_event() Associating
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
Jul 16 18:16:13: src/netdev.c:netdev_authenticate_event()
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
Jul 16 18:16:13: src/netdev.c:netdev_associate_event()
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
Jul 16 18:16:13: src/netdev.c:netdev_connect_event()
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() aborting and ignore_connect_event not set, proceed
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() expect_connect_failure not set, proceed
Jul 16 18:16:13: src/netdev.c:parse_request_ies()
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() Request / Response IEs parsed
Jul 16 18:16:13: src/netdev.c:netdev_get_oci()
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_get_oci_cb() Obtained OCI: freq: 5220, width: 3, center1: 5210, center2: 0
Jul 16 18:16:13: src/eapol.c:eapol_start()
Jul 16 18:16:13: src/netdev.c:netdev_unicast_notify() Unicast notification Control Port Frame(129)
Jul 16 18:16:13: src/netdev.c:netdev_control_port_frame_event()
Jul 16 18:16:13: src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Control Port TX Status(139)
Jul 16 18:16:14: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Jul 16 18:16:14: src/netdev.c:netdev_cqm_event() Signal change event (above=1 signal=-60)
Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Jul 16 18:16:17: src/netdev.c:netdev_deauthenticate_event()
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Jul 16 18:16:17: src/netdev.c:netdev_disconnect_event()
Jul 16 18:16:17: Received Deauthentication event, reason: 15, from_ap: true
Jul 16 18:16:17: src/wiphy.c:wiphy_radio_work_done() Work item 66 done
Jul 16 18:16:17: src/station.c:station_disconnect_event() 6
Jul 16 18:16:17: Unexpected disconnect event
Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:17: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
Jul 16 18:16:17: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
After adding the NETDEV_RESULT_DISCONNECTED enum, handshake failures
initiated by the AP come in via this result so the existing logic
to call network_connect_failed() was broken. We could still get a
handshake failure generated internally, so that has been preserved
(via NETDEV_RESULT_HANDSHAKE_FAILED) but a check for a 4-way
handshake timeout reason code was also added.
This new event is sent during a connection if netdev recieves a
disconnect event. This patch cleans up station to handle this
case and leave the existing NETDEV_EVENT_DISCONNECTED_BY_{AP,SME}
handling only for CONNECTED, NETCONFIG, and FW_ROAMING states.
This new result is meant to handle cases where a disconnect
event (deauth/disassoc) was received during an ongoing connection.
Whether that's during authentication, association, the 4-way
handshake, or key setting.
src/p2putil.c: In function 'p2p_get_random_string':
src/p2putil.c:2641:37: error: initializer element is not constant 2641 |
static const int set_size = strlen(CHARSET); |
^~~~~~
This code path is not exercised in the autotest but commonly does
happen in the real world. There is no associated bug with this, but
its helpful to have this event triggered in case something got
introduced in the future.
The authenticating event was not used anymore and the associating
event use was questionable (after the CMD_CONNECT callback).
No other modules actually utilize these events but they are useful
for autotests. Move these events around to map 1:1 when the kernel
sends the auth/assoc events.
There are a few values which are nice to see in debug logs. Namely
the BSS load and SNR. Both of these values may not be available
either due to the AP or local hardware limiations. Rather than print
dummy values for these refactor the print so append the values only
if they are set in the scan result.
For ranking purposes the utilization was defaulted to a valid (127)
which would not change the rank if that IE was not found in the
scan results. Historically this was printed (debug) as part of the
scan results but that was removed as it was somewhat confusing. i.e.
did the AP _really_ have a utilization of 127? or was the IE not
found?
Since it is useful to see the BSS load if that is advertised add a
flag to the scan_bss struct to indicate if the IE was present which
can be checked.
This issues a GET_SURVEY dump after scan results are available and
populates the survey information within the scan results. Currently
the only value obtained is the noise for a given frequency but the
survey results structure was created if in the future more values
need to be added.
From the noise, the SNR can be calculated. This is then used in the
ranking calculation to help lower BSS ranks that are on high noise
channels.
Parsing the flush flag for external scans was not done correctly
as it was not parsing the ATTR_SCAN_FLAGS but instead the flag
bitmap. Fix this by parsing the flags attribute, then checking if
the bit is set.
Add a nested attribute parser. For the first supported attribute
add NL80211_ATTR_SURVEY_INFO.
This allows parsing of nested attributes in the same convenient
way as nl80211_parse_attrs but allows for support of any level of
nested attributes provided that a handler is added for each.
To prep for adding a _nested() variant of this function refactor
this to act on an l_genl_attr object rather than the message itself.
In addition a handler specific to the attribute being parsed is
now passed in, with the current "handler_for_type" being renamed to
"handler_for_nl80211" that corresponds to root level attributes.
This warning is guaranteed to happen for SAE networks where there are
multiple netdev_authenticate_events. This should just be a check so
we don't register eapol twice, not a warning.
EAP-TTLS Start packets are empty by default, but can still be sent with
the L flag set. When attempting to reassemble a message we should not
fail if the length of the message is 0, and just treat it as any other
unfragmented message with the L flag set.
This test uses the same country/country3 values seen by an AP vendor
which causes issues with IWD. The alpha2 is ES (Spain) and the 3rd
byte is 4, indicating to use the E-4. The issue then comes when the
neighbor report claims the BSS is under operating class 3 which is
not part of E-4.
With the fallback implemented, this test will pass since it will
try and lookup only on ES (the EU table) which operating class 3 is
part of.
Its been seen that some vendors incorrectly set the 3rd byte of the
country code which causes the band lookup to fail with the provided
operating class. This isn't compliant with the spec, but its been
seen out in the wild and it causes IWD to behave poorly, specifically
with roaming since it cannot parse neighbor reports. This then
requires IWD to do a full scan on each roam.
Instead of a hard rejection, IWD can instead attempt to determine
the band by ignoring that 3rd byte and only use the alpha2 string.
This makes IWD slightly less strict but at the advantage of not being
crippled when exposed to poor AP configurations.
This was added to support a single buggy AP model that failed to
negotiate the SAE group correctly. This may still be a problem but
since then the [Network].UseDefaultEccGroup option has been added
which accomplishes the same thing.
Remove the special handling for this specific OUI and rely on the
user setting the new option if they have problems.
Both ell/shared and ell/internal targets first create the ell/
directory within IWD. This apparently was just luck that one of
these always finished first in parallel builds. On my system at
least when building using dpkg-buildpackage IWD fails to build
due to the ell/ directory missing. From the logs it appears that
both the shared/internal targets were started but didn't complete
(or at least create the directory) before the ell/ell.h target:
make[1]: Entering directory '/home/jprestwood/tmp/iwd'
/usr/bin/mkdir -p ell
/usr/bin/mkdir -p ell
echo -n > ell/ell.h
/usr/bin/mkdir -p src
/bin/bash: line 1: ell/ell.h: No such file or directory
make[1]: *** [Makefile:4028: ell/ell.h] Error 1
Creating the ell/ directory within the ell/ell.h target solve
the issue. For reference this is the configure command dpkg
is using:
./configure --build=x86_64-linux-gnu \
--prefix=/usr \
--includedir=/usr/include \
--mandir=/usr/share/man \
--infodir=/usr/share/info \
--sysconfdir=/etc \
--localstatedir=/var \
--disable-option-checking \
--disable-silent-rules \
--libdir=/usr/lib/x86_64-linux-gnu \
--runstatedir=/run \
--disable-maintainer-mode \
--disable-dependency-tracking \
--enable-tools \
--enable-dbus-policy
Experimental AP-mode support for receiving a Confirm frame when in the
COMMITTED state. The AP will reply with a Confirm frame.
Note that when acting as an AP, on reception of a Commit frame, the AP
only replies with a Commit frame. The protocols allows to also already
send the Confirm frame, but older clients may not support simultaneously
receiving a Commit and Confirm frame.
Don't mark either client as being the authenticator. In the current unit
tests, both instances act as clients to test functionality. This ensures
the unit does not show an error during the following commits where SAE
for AP mode is added.
This was overlooked in a prior patch and causes warnings to be
printed when the RSSI is too low to estimate an HE data rate or
due to incompatible local capabilities (e.g. MCS support).
Similar to the other estimations, return -ENETUNREACH if the IE
was valid but incompatible.
If the RSSI is too low or the local capabilities were not
compatible to estimate the rate don't warn but instead treat
this the same as -ENOTSUP and drop down to the next capability
set.
If we register the main EAPOL frame listener as late as the associate
event, it may not observe ptk_1_of_4. This defeats handling for early
messages in eapol_rx_packet, which only sees messages once it has been
registered.
If we move registration to the authenticate event, then the EAPOL
frame listeners should observe all messages, without any possible
races. Note that the messages are not actually processed until
eapol_start() is called, and we haven't moved that call site. All
that's changing here is how early EAPOL messages can be observed.
netdev_disconnect() was unconditionally sending CMD_DISCONNECT which
is not the right behavior when IWD has not associated. This means
that if a connection was started then immediately canceled with
the Disconnect() method the kernel would continue to authenticate.
Instead if IWD has not yet associated it should send a deauth
command which causes the kernel to correctly cleanup its state and
stop trying to authenticate.
Below are logs showing the behavior. Autoconnect is started followed
immediately by a DBus Disconnect call, yet the kernel continues
sending authenticate events.
event: state, old: autoconnect_quick, new: connecting (auto)
src/scan.c:scan_cancel() Trying to cancel scan id 1 for wdev 7d
src/wiphy.c:wiphy_radio_work_done() Work item 1 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/station.c:station_dbus_disconnect()
src/station.c:station_reset_connection_state() 85
src/station.c:station_roam_state_clear() 85
event: state, old: connecting (auto), new: disconnecting
src/wiphy.c:wiphy_radio_work_done() Work item 2 done
src/station.c:station_connect_cb() 85, result: 5
src/station.c:station_disconnect_cb() 85, success: 1
event: state, old: disconnecting, new: disconnected
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
In most cases any failure here is likely just due to the AP not
supporting the feature, whether its HE/VHT/HE. This should result
in the estimation returning -ENOTSUP in which case we move down
the list. Any other non-zero return we will now warn to make it
clear the IEs did exist, but were not properly formatted.
All length check failures were changed to continue instead of
fail. This will now treat invalid lengths as if the IE did not
exist.
In addition HE specifically has an extra validation function which,
if failed, was bailing out of the estimation function entirely.
Instead this is now treated as if there was no HE capabilities and
the logic can move down to VHT, HT, or basic rates.
This was changed from too large of a mask (0xff) in an earlier
commit but was masking 5 bits instead of 6.
Fixes: 121c2c5653 ("monitor: properly mask HE capabilities bitfield")
Caught by static analysis, the dev->conn_peer pointer was being
dereferenced very early on without a NULL check, but further it
was being NULL checked. If there is a possibility of it being NULL
the check should be done much earlier.
If the test needs to do something very specific it may be useful to
prevent IWD from managing all the radios. This can now be done
by setting a "reserve" option in the radio settings. The value of
this should be something other than iwd, hostapd, or wpa_supplicant.
For example:
[rad1]
reserve=false
Caught by static analysis, if ATTR_MAC was not in the message there
would be a memcpy with uninitialized bytes. In addition there is no
reason to memcpy twice. Instead 'mac' can be a const pointer which
both verifies it exists and removes the need for a second memcpy.
Static analysis complains that 'last' could be NULL which is true.
This really could only happen if every frequency was disabled which
likely is impossible but in any case, check before dereferencing
the pointer.
Since these are all stack variables they are not zero initialized.
If parsing fails there may be invalid pointers within the structures
which can get dereferenced by p2p_clear_*
The input queue pointer was being initialized unconditionally so if
parsing fails the out pointer is still set after the queue is
destroyed. This causes a crash during cleanup.
Instead use a temporary pointer while parsing and only after parsing
has finished do we set the out pointer.
Reported-By: Alex Radocea <alex@supernetworks.org>
When HUP is received the IO read callback was never completing which
caused it to block indefinitely until waited for. This didn't matter
for most transient processes but for IWD, hostapd, wpa_supplicant
it would cause test-runner to hang if the process crashed.
Detecting a crash is somewhat hacky because we have no process
management like systemd and the return code isn't reliable as some
processes return non-zero under normal circumstances. So to detect
a crash the process output is being checked for the string:
"++++++++ backtrace ++++++++". This isn't 100% reliable obviously
since its dependent on how the binary is compiled, but even if the
crash itself isn't detected any test should still fail if written
correctly.
Doing this allows auto-tests to handle IWD crashes gracefully by
failing the test, printing the exception (event without debugging)
and continue with other tests.
The slaac_test was one that would occationally fail, but very rarely,
due to the resolvconf log values appearing in an unexpected order.
This appears to be related to a typo in netconfig-commit which would
not set netconfig-domains and instead set dns_list. This was fixed
with a pending patch:
https://lore.kernel.org/iwd/20240227204242.1509980-1-denkenz@gmail.com/T/#u
But applying this now leads to testNetconfig failing slaac_test
100% of the time.
I'm not familiar enough with resolveconf to know if this test change
is ok, but based on the test behavior the expected log and disk logs
are the same, just in the incorrect order. I'm not sure if this the
log order is deterministic so instead the check now iterates the
expected log and verifies each value appears once in the resolvconf
log.
Here is an example of the expected vs disk logs after running the
test:
Expected:
-a wlan1.dns
nameserver 192.168.1.2
nameserver 3ffe:501:ffff💯:10
nameserver 3ffe:501:ffff💯:50
-a wlan1.domain
search test1
search test2
Resolvconf log:
-a wlan1.domain
search test1
search test2
-a wlan1.dns
nameserver 192.168.1.2
nameserver 3ffe:501:ffff💯:10
nameserver 3ffe:501:ffff💯:50
static analysis complains that authenticator is used uninitialized.
This isn't strictly true as memory region is reserved for the
authenticator using the contents of the passed in structure. This
region is then overwritten once the authenticator is actually computed
by authenticator_put(). Silence this warning by explicitly setting
authenticator bytes to 0.
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
This shouldn't be possible in theory since the roam_bss_list being
iterated is a subset of entire scan_bss list station/network has
but to be safe, and catch any issues due to future changes warn on
this condition.
For some encrypt operations DPP passes no AD iovecs (both are
NULL/0). But since the iovec itself is on the stack 'ad' is a
valid pointer from within aes_siv_encrypt. This causes memcpy
to be called which coverity complains about. Since the copy
length is zero it was effectively a no-op, but check num_ad to
prevent the call.
Tests the 3 possible options to UseDefaultEccGroup behave as
expected:
- When not provided use the "auto" behavior.
- When false, always use higher order groups
- When true, always use default group
The SAE test made some assumptions on certain conditions due to
there being no way of checking if those conditions were met
Mainly the use of H2E/hunt-and-peck.
We assumed that when we told hostapd to use H2E or hunt/peck it
would but in reality it was not. Hostapd is apparently not very
good at swapping between the two with a simple "reload" command.
Once H2E is enabled it appears that it cannot be undone.
Similarly the vendor elements seem to carry over from test to
test, and sometimes not which causes unintended behavior.
To fix this create separate APs for the specific scenario being
tested:
- Hunt and peck
- H2E
- Special vendor_element simulating buggy APs
Another issue found was that if password identifies are used
hostapd automatically chooses H2E which was not intented, at
least based on the test names (in reality it wasn't causing any
problems).
The tests have also been improved to use hostapds "sta_status"
command which contains the group number used when authenticating,
so now that at least can be verified.
In order to complete the learned default group behavior station needs
to be aware of when an SAE/OWE connection retried. This is all
handled within netdev/sae so add a new netdev event so station can
set the appropriate network flags to prevent trying the non-default
group again.
If either the settings specify it, or the scan_bss is flagged, set
the use_default_ecc_group flag in the handshake.
This also renames the flag to cover both OWE and SAE
There is special handling for buggy OWE APs which set a network flag
to use the default OWE group. Utilize the more persistent setting
within known-networks as well as the network object (in case there
is no profile).
This also renames the get/set APIs to be generic to ECC groups rather
than only OWE.
This adds the option [Settings].UseDefaultEccGroup which allows a
network profile to specify the behavior when using an ECC-based
protocol. If unset (default) IWD will learn the behavior of the
network for the lifetime of its process.
Many APs do not support group 20 which IWD tries first by default.
This leads to an initial failure followed by a retry using group 19.
This option will allow the user to configure IWD to use group 19
first or learn the network capabilities, if the authentication fails
with group 20 IWD will always use group 19 for the process lifetime.
The information specific to auth/assoc/connect timeouts isn't
communicated to station so emit the notice events within netdev.
We could communicate this to station by adding separate netdev
events, but this does not seem worth it for this use case as
these notice events aren't strictly limited to station.
For anyone debugging or trying to identify network infrastructure
problems the IWD DBus API isn't all that useful and ultimately
requires going through debug logs to figure out exactly what
happened. Having a concise set of debug logs containing only
relavent information would be very useful. In addition, having
some kind of syntax for these logs to be parsed by tooling could
automate these tasks.
This is being done, starting with station, by using iwd_notice
which internally uses l_notice. The use of the notice log level
(5) in IWD will be strictly for the type of messages described
above.
iwd_notice is being added so modules can communicate internal
state or event information via the NOTICE log level. This log
level will be reserved in IWD for only these type of messages.
The iwd_notice macro aims to help enforce some formatting
requirements for these type of log messages. The messages
should be one or more comma-separated "key: value" pairs starting
with "event: <name>" and followed by any additional info that
pertains to that event.
iwd_notice only enforces the initial event key/value format and
additional arguments are left to the caller to be formatted
correctly.
The --logger,-l flag can now be used to specify the logger type.
Unset (default) will set log output to stderr as it is today. The
other valid options are "syslog" and "journal".
basename use is considered harmful. There are two versions of
basename (see man 3 basename for details). The more intuitive version,
which is currently being used inside wiphy.c, is not supported by musl
libc implementation. Use of the libgen version is not preferred, so
drop use of basename entirely. Since wiphy.c is the only call site of
basename() inside iwd, open code the required logic.
ELL now has a setting to limit the number of DHCP attempts. This
will now be set in IWD and if reached will result in a failure
event, and in turn a disconnect.
IWD will set a maximum of 4 retries which should keep the maximum
DHCP time to ~60 seconds roughly.
The known frequency list is now a sorted list and the roam scan
results were not complying with this new requirement. The fix is
easy though since the iteration order of the scan results does
not matter (the roam candidates are inserted by rank). To fix
the known frequencies order we can simply reverse the scan results
list before iterating it.
When operating as an AP, drop message 4 of the 4-way handshake if the AP
has not yet received message 2. Otherwise an attacker can skip message 2
and immediately send message 4 to bypass authentication (the AP would be
using an all-zero ptk to verify the authenticity of message 4).
Modify the existing frequency test to check that the ordering
lines up with the ranking of the BSS.
Add a test to check that quick scans limit the number of known
frequencies.
In very large network deployments there could be a vast amount of APs
which could create a large known frequency list after some time once
all the APs are seen in scan results. This then increases the quick
scan time significantly, in the very worst case (but unlikely) just
as long as a full scan.
To help with this support in knownnetworks was added to limit the
number of frequencies per network. Station will now only get 5
recent frequencies per network making the maximum frequencies 25
in the worst case (~2.5s scan).
The magic values are now defines, and the recent roam frequencies
was also changed to use this define as well.
In order to support an ordered list of known frequencies the list
should be in order of last seen BSS frequencies with the highest
ranked ones first. To accomplish this without adding a lot of
complexity the frequencies can be pushed into the list as long as
they are pushed in reverse rank order (lowest rank first, highest
last). This ensures that very high ranked BSS's will always get
superseded by subsequent scans if not seen.
This adds a new network API to update the known frequency list
based on the current newtork->bss_list. This assumes that station
always wipes the BSS list on scans and populates with only fresh
BSS entries. After the scan this API can be called and it will
reverse the list, then add each frequency.
I've had connections to a WPA3-Personal only network fail with no log
message from iwd, and eventually figured out to was because the driver
would've required using CMD_EXTERNAL_AUTH. With the added log messages
the reason becomes obvious.
Additionally the fallback may happen even if the user explicitly
configured WPA3 in NetworkManager, I believe a warning is appropriate
there.
There was an unhandled corner case if netconfig was running and
multiple roam conditions happened in sequence, all before netconfig
had completed. A single roam before netconfig was already handled
(23f0f5717c) but this did not take into account any additional roam
conditions.
If IWD is in this state, having started netconfig, then roamed, and
again restarted netconfig it is still in a roaming state which will
prevent any further roams. IWD will remain "stuck" on the current
BSS until netconfig completes or gets disconnected.
In addition the general state logic is wrong here. If IWD roams
prior to netconfig it should stay in a connecting state (from the
perspective of DBus).
To fix this a new internal station state was added (no changes to
the DBus API) to distinguish between a purely WiFi connecting state
(STATION_STATE_CONNECTING/AUTO) and netconfig
(STATION_STATE_NETCONFIG). This allows IWD roam as needed if
netconfig is still running. Also, some special handling was added so
the station state property remains in a "connected" state until
netconfig actually completes, regardless of roams.
For some background this scenario happens if the DHCP server goes
down for an extended period, e.g. if its being upgraded/serviced.
This fixes a build break on some systems, specifically the
raspberry Pi 3 (ARM):
monitor/main.c: In function ‘open_packet’:
monitor/main.c:176:3: error: implicit declaration of function ‘close’; did you mean ‘pclose’? [-Werror=implicit-function-declaration]
176 | close(fd);
| ^~~~~
| pclose
This was caused by the unused hostapd instance running after being
re-enabled by mistake. This cause an additional scan result with the
same rank to be seen which would then be connected to by luck of the
draw.
This really needs to be done to many more autotests but since this
one seems to have random failures ensure that all the tests still
run if one fails. In addition add better cleanup for hwsim rules.
This gives the tests a lot more fine-tune control to wait for
specific state transitions rather than only what is exposed over
DBus.
The additional events for "ft-roam" and "reassoc-roam" were removed
since these are now covered by the more generic state change events
("ft-roaming" and "roaming" respectively).
To support multiple nlmon sources, move the logic that reads from iwmon
device into main.c instead of nlmon. nlmon.c now becomes agnostic of
how the packets are actually obtained. Packets are fed in via
high-level APIs such as nlmon_print_rtnl, nlmon_print_genl,
nlmon_print_pae.
The current implementation inside nlmon_receive is asymmetrical. RTNL
packets are printed using nlmon_print_rtnl while GENL packets are
printed using nlmon_message.
nlmon_print_genl and nlmon_print_rtnl already handle iterating over data
containing multiple messages, and are used by nlmon started in reader
mode. Use these for better symmetry inside nlmon_receive.
While here, move store_netlink() call into nlmon_print_rtnl. This makes
handling of PCAP output symmetrical for both RTNL and GENL packets.
This also fixes a possibility where only the first message of a
multi-RTNL packet would be stored.
nlmon_print_genl invokes genl_ctrl when a generic netlink control
message is encountered. genl_ctrl() tries to filter nl80211 family
appearance messages and setup nlmon->id with the extracted family id.
However, the id is already provided inside main.c by using nlmon_open,
and no control messages are processed by nlmon in 'capture' mode (-r
command line argument not passed) since all genl messages go through
nlmon_message() path instead.
configure scripts need to be runnable with a POSIX-compliant /bin/sh.
On many (but not all!) systems, /bin/sh is provided by Bash, so errors
like this aren't spotted. Notably Debian defaults to /bin/sh provided
by dash which doesn't tolerate such bashisms as '+='.
This retains compatibility with bash. Just copy the expanded append like
we do on the line above.
Fixes warnings like:
```
./configure: 13352: CFLAGS+= -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2: not found
```
In order to test that extra settings are applied prior to connecting
two tests were added for hidden networks as well as one testing if
there is already an existing profile after DPP.
The reason hidden networks were used was due to the requirement of
the "Hidden" settings in the profile. If this setting doesn't get
sync'ed to disk the connection will fail.
Before this change DPP was writing the credentials both to disk
and into the network object directly. This allowed the connection
to work fine but additional settings were not picked up due to
network_set_passphrase/psk loading the settings before they were
written.
Instead DPP can avoid setting the credentials to the network
object entirely and just write them to disk. Then, wait for
known networks to notify that the profile was either created
or updated then DPP can proceed to connecting. network_autoconnect()
will take care of loading the profile that DPP wrote and remove the
need for DPP to touch the network object at all.
One thing to note is that an idle callback is still needed from
within the known networks callback. This is because a new profile
requires network.c to set the network_info which is done in the
known networks callback. Rather than assume that network.c will be
called into before dpp.c an l_idle was added.
If a known network is modified on disk known networks does not have
any way of notifying other modules. This will be needed to support a
corner case in DPP if a profile exists but is overwritten after DPP
configuration. Add this event to known networks and handle it in
network.c (though nothing needs to be done in that case).
Without the change test-dpp fails on aarch64-linux as:
$ unit/test-dpp
TEST: DPP test responder-only key derivation
TEST: DPP test mutual key derivation
TEST: DPP test PKEX key derivation
test-dpp: unit/test-dpp.c:514: test_pkex_key_derivation: Assertion `!memcmp(tmp, __tmp, 32)' failed.
This happens due to int/size_t type mismatch passed to vararg
parameters to prf_plus():
bool prf_plus(enum l_checksum_type type, const void *key, size_t key_len,
void *out, size_t out_len,
size_t n_extra, ...)
{
// ...
va_start(va, n_extra);
for (i = 0; i < n_extra; i++) {
iov[i + 1].iov_base = va_arg(va, void *);
iov[i + 1].iov_len = va_arg(va, size_t);
// ...
Note that varargs here could only be a sequence of `void *` / `size_t`
values.
But in src/dpp-util.c `iwd` attempted to pass `int` there:
prf_plus(sha, prk, bytes, z_out, bytes, 5,
mac_i, 6, // <- here
mac_r, 6, // <- and here
m_x, bytes,
n_x, bytes,
key, strlen(key));
aarch64 stores only 32-bit value part of the register:
mov w7, #0x6
str w7, [sp, #...]
and loads full 64-bit form of the register:
ldr x3, [x3]
As a result higher bits of `iov[].iov_len` contain unexpected values and
sendmsg sends a lot more data than expected to the kernel.
The change fixes test-dpp test for me.
While at it fixed obvious `int` / `size_t` mismatch in src/erp.c.
Fixes: 6320d6db0f ("crypto: remove label from prf_plus, instead use va_args")
The path argument was used purely for debugging. It can be just as
informational printing just the SSID of the profile that failed to
parse the setting without requiring callers allocate a string to
call the function.
Certain tests may require external processes to work
(e.g. testNetconfig) and if missing the test will just hang until
the maximum test timeout. Check in start_process if the exe
actually exists and if not throw an exception.
In order to support identifiers the test profiles needed to be
reworked due to hostapd allowing multiple password entires. You
cannot just call set_value() with a new entry as the old ones
still exist. Instead use a unique password for the identifier and
non-identifier use cases.
After adding this test the failure_test started failing due to
hostapd not starting up. This was due to the group being unsupported
but oddly only when hostapd was reloaded (running the test
individually worked). To fix this the group number was changed to 21
which hostapd does support but IWD does not.
Adds a new network profile setting [Security].PasswordIdentifier.
When set (and the BSS enables SAE password identifiers) the network
and handshake object will read this and use it for the SAE
exchange.
Building the handshake will fail if:
- there is no password identifier set and the BSS sets the
"exclusive" bit.
- there is a password identifier set and the BSS does not set
the "in-use" bit.
Using this will provide netdev with a connect callback and unify the
roaming result notification between FT and reassociation. Both paths
will now end up in station_reassociate_cb.
This also adds another return case for ft_handshake_setup which was
previously ignored by ft_associate. Its likely impossible to actually
happen but should be handled nevertheless.
Fixes: 30c6a10f28 ("netdev: Separate connect_failed and disconnected paths")
Essentially exposes (and renames) netdev_ft_tx_associate in order to
be called similarly to netdev_reassociate/netdev_connect where a
connect callback can be provided. This will fix the current bug where
if association times out during FT IWD will hang and never transition
to disconnected.
This also removes the calling of the FT_ROAMED event and instead just
calls the connect callback (since its now set). This unifies the
callback path for reassociation and FT roaming.
This will be called from station after FT-authentication has
finished. It sets up the handshake object to perform reassociation.
This is essentially a copy-paste of ft_associate without sending
the actual frame.
In general only the authenticator FTE is used/validated but with
some FT refactoring coming there needs to be a way to build the
supplicants FTE into the handshake object. Because of this there
needs to be separate FTE buffers for both the authenticator and
supplicant.
The default() method was added for convenience but was extending the
test times significantly when the hostapd config was lengthy. This
was because it called set_value for every value regardless if it
had changed. Instead store the current configuration and in default()
only reset values that differ.
This tests ensures IWD disconnects after receiving an association
timeout event. This exposes a current bug where IWD does not
transition to disconnected after an association timeout when
FT-roaming.
If tests end in an unknown state it is sometimes required that IWD
be stopped manually in order for future tests to run. Add a stop()
method so test tearDown() methods can explicitly stop IWD.
The path for IWD to call this doesn't ever happen in autotests
but during debugging of the DPP agent it was noticed that the
DBus signature was incorrect and would always result in an error
when calling from IWD.
For adding SAE password identifiers the capability bits need to be
verified when loading the identifier from the profile. Pass the
BSS object in to network_load_psk rather than the 'need_passphrase'
boolean.
iov_ie_append assumed that a single IE was being added and thus the
length of the IE could be extracted directly from the element. However,
iov_ie_append was used on buffers which could contain multiple IEs
concatenated together, for example in handshake_state::vendor_ies. Most
of the time this was safe since vendor_ies was NULL or contained a
single element, but would result in incorrect behavior in the general
case. Fix that by changing iov_ie_append signature to take an explicit
length argument and have the caller specify whether the element is a
single IE or multiple.
Fixes: 7e9971661bcb ("netdev: Append any vendor IEs from the handshake")
Use an _auto_ variable to cleanup IEs allocated by
p2p_build_association_req(). While here, take out unneeded L_WARN_ON
since p2p_build_association_req cannot fail.
If the FT-Authenticate frame has been sent then a deauth is received
the work item for sending the FT-Associate frame is never canceled.
When this runs station->connected_network is NULL which causes a
crash:
src/station.c:station_try_next_transition() 7, target xx:xx:xx:xx:xx:xx
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5843
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5844
src/wiphy.c:wiphy_radio_work_done() Work item 5842 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5843
src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
src/ft.c:ft_send_authenticate()
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/netdev.c:netdev_link_notify() event 16 on ifindex 7
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_deauthenticate_event()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
Received Deauthentication event, reason: 7, from_ap: true
src/station.c:station_disconnect_event() 7
src/station.c:station_disassociated() 7
src/station.c:station_reset_connection_state() 7
src/station.c:station_roam_state_clear() 7
src/netconfig.c:netconfig_event_handler() l_netconfig event 2
src/netconfig-commit.c:netconfig_commit_print_addrs() removing address: yyy.yyy.yyy.yyy
src/resolve.c:resolve_systemd_revert() ifindex: 7
[DHCPv4] l_dhcp_client_stop:1264 Entering state: DHCP_STATE_INIT
src/station.c:station_enter_state() Old State: connected, new state: disconnected
src/station.c:station_enter_state() Old State: disconnected, new state: autoconnect_quick
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5845
src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
src/wiphy.c:wiphy_radio_work_done() Work item 5843 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5844
"Program terminated with signal SIGSEGV, Segmentation fault.",
"#0 0x0000565359ee3f54 in network_bss_find_by_addr ()",
"#0 0x0000565359ee3f54 in network_bss_find_by_addr ()",
"#1 0x0000565359ec9d23 in station_ft_work_ready ()",
"#2 0x0000565359ec0af0 in wiphy_radio_work_next ()",
"#3 0x0000565359f20080 in offchannel_mlme_notify ()",
"#4 0x0000565359f4416b in received_data ()",
"#5 0x0000565359f40d90 in io_callback ()",
"#6 0x0000565359f3ff4d in l_main_iterate ()",
"#7 0x0000565359f4001c in l_main_run ()",
"#8 0x0000565359f40240 in l_main_run_with_signal ()",
"#9 0x0000565359eb3888 in main ()"
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: d938d362b212 ("erp: ERP implementation and key cache move")
Fixes: 433373fe28a4 ("eapol: cache ERP keys on EAP success")
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: 1f1478285725 ("wiphy: add _generate_address_from_ssid")
Fixes: 5a1b1184fca6 ("netdev: support per-network MAC addresses")
In netdev_retry_owe, if l_gen_family_send fails, the connect_cmd is
never freed or reset. Fix that.
While here, use a stack variable instead of netdev member, since the use
of such a member is unnecessary and confusing.
vendor_ies stored in handshake_state are already added as part of
netdev_populate_common_ies(), which is already invoked by
netdev_build_cmd_connect().
Normally vendor_ies is NULL for OWE connections, so no IEs are
duplicated as a result.
CC src/adhoc.o
In file included from src/adhoc.c:28:0:
/usr/include/linux/if.h:234:19: error: field ‘ifru_addr’ has incomplete type
struct sockaddr ifru_addr;
^
/usr/include/linux/if.h:235:19: error: field ‘ifru_dstaddr’ has incomplete type
struct sockaddr ifru_dstaddr;
^
/usr/include/linux/if.h:236:19: error: field ‘ifru_broadaddr’ has incomplete type
struct sockaddr ifru_broadaddr;
^
/usr/include/linux/if.h:237:19: error: field ‘ifru_netmask’ has incomplete type
struct sockaddr ifru_netmask;
^
/usr/include/linux/if.h:238:20: error: field ‘ifru_hwaddr’ has incomplete type
struct sockaddr ifru_hwaddr;
^
Very rarely on ath10k (potentially other ath cards), disabling
power save while the interface is down causes a timeout when
bringing the interface back up. This seems to be a race in the
driver or firmware but it causes IWD to never start up properly
since there is no retry logic on that path.
Retrying is an option, but a more straight forward approach is
to just reorder the logic to set power save off after the
interface is already up. If the power save setting fails we can
just log it, ignore the failure, and continue. From a users point
of view there is no real difference in doing it this way as
PS still gets disabled prior to IWD connecting/sending data.
Changing behavior based on a buggy driver isn't something we
should be doing, but in this instance the change shouldn't have
any downside and actually isn't any different than how it has
been done prior to the driver quirks change (i.e. use network
manager, iw, or iwconfig to set power save after IWD starts).
For reference, this problem is quite rare and difficult to say
exactly how often but certainly <1% of the time:
iwd[1286641]: src/netdev.c:netdev_disable_ps_cb() Disabled power save for ifindex 54
kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
iwd[1286641]: Error bringing interface 54 up: Connection timed out
kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
After this IWD just sits idle as it has no interface to start using.
This is even reproducable outside of IWD if you loop and run:
ip link set <wlan> down
iw dev <wlan> set power_save off
ip link set <wlan> up
Eventually the 'up' command will fail with a timeout.
I've brought this to the linux-wireless/ath10k mailing list but
even if its fixed in future kernels we'd still need to support
older kernels, so a workaround/change in IWD is still required.
This is done already for DPP, do the same for PKEX. Few drivers
(ath9k upstream, ath10k/11k in progress) support this which is
unfortunate but since a configurator will not work without this
capability its best to fail early.
The DPP spec allows 3rd party fields in the DPP configuration
object (section 4.5.2). IWD can take advantage of this (when
configuring another IWD supplicant) to communicate additional
profile options that may be required for the network.
The new configuration member will be called "/net/connman/iwd"
and will be an object containing settings specific to IWD.
More settings could be added here if needed but for now only
the following are defined:
{
send_hostname: true/false,
hidden: true/false
}
These correspond to the following network profile settings:
[IPv4].SendHostname
[Settings].Hidden
The scan result handling was fragile because it assumed the kernel
would only give results matching the requested SSID. This isn't
something we should assume so instead keep the configuration object
around until after the scan and use the target SSID to lookup the
network.
Nearly every use of the ssid member first has to memcpy it to a
buffer and NULL terminate. Instead just store the ssid as a
string when creating/parsing from JSON.
The DPP-PKEX spec provides a very limited list of frequencies used
to discover configurators, only 3 on 2.4 and 5GHz bands. Since
configurators (at least in IWD's implementation) are only allowed
on the current operating frequency its very unlikely an enrollee
will find a configurator on these frequencies out of the entire
spectrum.
The spec does mention that the 3 default frequencies should be used
"In lieu of specific channel information obtained in a manner outside
the scope of this specification, ...". This allows the implementation
some flexibility in using a broader range of frequencies.
To increase the chances of finding a configurator shared code
enrollees will first issue a scan to determine what access points are
around, then iterate these frequencies. This is especially helpful
when the configurators are IWD-based since we know that they'll be
on the same channels as the APs in the area.
The post-DPP connection was never done quite right due to station's
state being unknown. The state is now tracked in DPP by a previous
patch but the scan path in DPP is still wrong.
It relies on station autoconnect logic which has the potential to
connect to a different network than what was configured with DPP.
Its unlikely but still could happen in theory. In addition the scan
was not selectively filtering results by the SSID that DPP
configured.
This fixes the above problems by first filtering the scan by the
SSID. Then setting the scan results into station without triggering
autoconnect. And finally using network_autoconnect() directly
instead of relying on station to choose the SSID.
DPP (both DPP and PKEX) run the risk of odd behavior if station
decides to change state. DPP is completely unaware of this and
best case would just result in a protocol failure, worst case
duplicate calls to __station_connect_network.
Add a station watch and stop DPP if station changes state during
the protocol.
Commit c59669a366c5 ("netdev: disambiguate between disconnection types")
introduced different paths for different types of disconnection
notifications from netdev. Formalize this further by having
netdev_connect_failed only invoke connect_cb.
Disconnections that could be triggered outside of connection
related events are now handled on a different code path. For this
purpose, netdev_disconnected() is introduced.
When a roam event is received, iwd generates a firmware scan request and
notifies its event filter of the ROAMING condition. In cases where the
firmware scan could not be started successfully, netdev_connect_failed
is invoked. This is not a correct use of netev_connect_failed since it
doesn't actually disconnect the underlying netdev and the reflected
state becomes de-synchronized from the underlying kernel device.
The firmware scan request could currently fail for two reasons:
1. nl80211 genl socket is in a bad state, or
2. the scan context does not exist
Since both reasons are highly unlikely, simply use L_WARN instead.
The other two cases where netdev_connect_failed is used could only occur
if the kernel message is invalid. The message is ignored in that case
and a warning is printed.
The situation described above also exists in netdev_get_fw_scan_cb. If
the scan could not be completed successfully, there's not much iwd can
do to recover. Have iwd remain in roaming state and print an error.
There are generally three scenarios where iwd generates a disconnection
command to the kernel:
1. Error conditions stemming from a connection related event. For
example if SAE/FT/FILS authentication fails during Authenticate or
Associate steps and the kernel doesn't disconnect properly.
2. Deauthentication after the connection has been established and not
related to a connection attempt in progress. For example, SA Query
processing that triggers an disconnect.
3. Disconnects that are triggered due to a handshake failure or if
setting keys resulting from the handshake fails. These disconnects
can be triggered as a result of a pending connection or when a
connection has been established (e.g. due to rekeying).
Distinguish between 1 and 2/3 by having the disconnect procedure take
different paths. For now there are no functional changes since all
paths end up in netdev_connect_failed(), but this will change in the
future.
While here, also get rid of netdev_del_station. The only user of this
function was in ap.c and it could easily be replaced by invoking the new
nl80211_build_del_station function. The callback used by
netdev_build_del_station only printed an error and didn't do anything
useful. Get rid of it for now.
netdev_begin_connection() already invokes netdev_connect_failed on
error. Remove any calls to netdev_connect_failed in callers of
netdev_begin_connection().
Fixes: 4165d9414f54 ("netdev: use wiphy radio work queue for connections")
If netdev_get_oci fails, a goto deauth is invoked in order to terminate
the current connection and return an error to the caller. Unfortunately
the deauth label builds CMD_DEAUTHENTICATE in order to terminate the
connection. This was fine because it used to handle authentication
protocols that ran over CMD_AUTHENTICATE and CMD_ASSOCIATE. However,
OCI can also be used on FullMAC hardware that does not support them.
Use CMD_DISCONNECT instead which works everywhere.
Fixes: 06482b811626 ("netdev: Obtain operating channel info")
The reason code field was being obtained as a uint8_t value, while it is
actually a uint16_t in little-endian byte order.
Fixes: f3cc96499c44 ("netdev: added support for SA Query")
The reason code from deauthentication frame was being obtained as a
uint8_t instead of a uint16_t. The value was only ever used in an
informational statement. Since the value was in little endian, only the
first 8 bits of the reason code were obtained. Fix that.
Fixes: 2bebb4bdc7ee ("netdev: Handle deauth frames prior to association")
Several tests do not pass due to some additional changes that have
not been merged. Remove these cases and add some hardening after
discovering some unfortunate wpa_supplicant behavior.
- Disable p2p in wpa_supplicant. With p2p enabled an extra device
is created which starts receiving DPP frames and printing
confusing messages.
- Remove extra asserts which don't make sense currently. These
will be added back later as future additions to PKEX are
upstreamed.
- Work around wpa_supplicant retransmit limitation. This is
described in detail in the comment in pkex_test.py
- wait_for_event was returning a list in certain cases, not the
event itself
- The configurator ID was not being printed (',' instead of '%')
- The DPP ID was not being properly waited for with PKEX
With the addition of DPP PKEX autotests some of the timeouts are
quite long and hit test-runners maximum timeouts. For UML we should
allow this since time-travel lets us skip idle waits. Move the test
timeout out of a global define and into the argument list so QEMU
and UML can define it differently.
The StartConfigurator() call was left out since there would be no
functional difference to the user in iwctl. Its expected that
human users of the shared code API provide the code/id ahead of
time, i.e. use ConfigureEnrollee/StartEnrollee.
Check that enough space for newline and 0-byte is left in line.
This fixes a buffer overflow on specific completion results.
Reported-By: Leona Maroni <dev@leona.is>
Adds a configurator variant to be used along side an agent. When
called the configurator will start and wait for an initial PKEX
exchange message from an enrollee at which point it will request
the code from an agent. This provides more flexibility for
configurators that are capable of configuring multiple enrollees
with different identifiers/codes.
Note that the timing requirements per the DPP spec still apply
so this is not meant to be used with a human configurator but
within an automated agent which does a quick lookup of potential
identifiers/codes and can reply within the 200ms window.
The PKEX configurator role is currently limited to being a responder.
When started the configurator will listen on its current operating
channel for a PKEX exchange request. Once received it and the
encrypted key is properly decrypted it treats this peer as the
enrollee and won't allow configurations from other peers unless
PKEX is restarted. The configurator will encrypt and send its
encrypted ephemeral key in the PKEX exchange response. The enrollee
then sends its encrypted bootstrapping key (as commit-reveal request)
then the same for the configurator (as commit-reveal response).
After this, PKEX authentication begins. The enrollee is expected to
send the authenticate request, since its the initiator.
This is the initial support for PKEX enrollees acting as the
initiator. A PKEX initiator starts the protocol by broadcasting
the PKEX exchange request. This request contains a key encrypted
with the pre-shared PKEX code. If accepted the peer sends back
the exchange response with its own encrypted key. The enrollee
decrypts this and performs some crypto/hashing in order to establish
an ephemeral key used to encrypt its own boostrapping key. The
boostrapping key is encrypted and sent to the peer in the PKEX
commit-reveal request. The peer then does the same thing, encrypting
its own bootstrapping key and sending to the initiator as the
PKEX commit-reveal response.
After this, both peers have exchanged their boostrapping keys
securely and can begin DPP authentication, then configuration.
For now the enrollee will only iterate the default channel list
from the Easy Connect spec. Future upates will need to include some
way of discovering non-default channel configurators, but the
protocol needs to be ironed out first.
Stop() will now return NotFound if DPP is not running. This causes
the DPP test to fail since it calls this regardless if the protocol
already stopped. Ignore this exception since tests end in various
states, some stopped and some not.
PKEX and DPP will share the same state machine since the DPP protocol
follows PKEX. This does pose an issue with the DBus interfaces
because we don't want DPP initiated by the SharedCode interface to
start setting properties on the DeviceProvisioning interface.
To handle this a dpp_interface enum is being introduced which binds
the dpp_sm object to a particular interface, for the life of the
protocol run. Once the protocol finishes the dpp_sm can be unbound
allowing either interface to use it again later.
This mispelling was present in the configuration, so I retained parsing
of the legacy BandModifier*Ghz options for compatibility. Without this
change anyone spelling GHz correctly in their configs would be very
confused.
PKEX is part of the WFA EasyConnect specification and is
an additional boostrapping method (like QR codes) for
exchanging public keys between a configurator and enrollee.
PKEX operates over wifi and requires a key/code be exchanged
prior to the protocol. The key is used to encrypt the exchange
of the boostrapping information, then DPP authentication is
started immediately aftewards.
This can be useful for devices which don't have the ability to
scan a QR code, or even as a more convenient way to share
wireless credentials if the PSK is very secure (i.e. not a
human readable string).
PKEX would be used via the three DBus APIs on a new interface
SharedCodeDeviceProvisioning.
ConfigureEnrollee(a{sv}) will start a configurator with a
static shared code (optionally identifier) passed in as the
argument to this method.
StartEnrollee(a{sv}) will start a PKEX enrollee using a static
shared code (optionally identifier) passed as the argument to
the method.
StartConfigurator(o) will start a PKEX configurator and use the
agent specified by the path argument. The configurator will query
the agent for a specific code when an enrollee sends the initial
exchange message.
After the PKEX protocol is finished, DPP bootstrapping keys have
been exchanged and DPP Authentication will start, followed by
configuration.
Beacon loss handling was removed in the past because it was
determined that this even always resulted in a disconnect. This
was short sighted and not always true. The default kernel behavior
waits for 7 lost beacons before emitting this event, then sends
either a few nullfuncs or probe requests to the BSS to determine
if its really gone. If these come back successfully the connection
will remain alive. This can give IWD some time to roam in some
cases so we should be handling this event.
Since beacon loss indicates a very poor connection the roam scan
is delayed by a few seconds in order to give the kernel a chance
to send the nullfuncs/probes or receive more beacons. This may
result in a disconnect, but it would have happened anyways.
Attempting a roam mainly handles the case when the connection can
be maintained after beacon loss, but is still poor.
This is being done to allow the DPP module to work correctly. DPP
currently uses __station_connect_network incorrectly since it
does not (and cannot) change the state after calling. The only
way to connect with a state change is via station_connect_network
which requires a DBus method that triggered the connection; DPP
does not have this due to its potentially long run time.
To support DPP there are a few options:
1. Pass a state into __station_connect_network (this patch)
2. Support a NULL DBus message in station_connect_network. This
would require several NULL checks and adding all that to only
support DPP just didn't feel right.
3. A 3rd connect API in station which wraps
__station_connect_network and changes the state. And again, an
entirely new API for only DPP felt wrong (I guess we did this
for network_autoconnect though...)
Its about 50/50 between call sites that changed state after calling
and those that do not. Changing the state inside
__station_connect_network felt useful enough to cover the cases that
could benefit and the remaining cases could handle it easily enough:
- network_autoconnect(), and the state is changed by station after
calling so it more or less follows the same pattern just routes
through network. This will now pass the CONNECTING_AUTO state
from within network vs station.
- The disconnect/reconnect path. Here the state is changed to
ROAMING prior in order to avoid multiple state changes. Knowing
this the same ROAMING state can be passed which won't trigger a
state change.
- Retrying after a failed BSS. The state changes on the first call
then remains the same for each connection attempt. To support this
the current station->state is passed to avoid a state change.
Until now IWD only supported enrollees as responders (configurators
could do both). For PKEX it makes sense for the enrollee to be the
initiator because configurators in the area are already on their
operating channel and going off is inefficient. For PKEX, whoever
initiates also initiates authentication so for this reason the
authentication path is being opened up to allow enrollees to
initiate.
The check for the header was incorrect according to the spec.
Table 58 indicates that the "Query Response Info" should be set
to 0x00 for the configuration request. The frame handler was
expecting 0x7f which is the value for the config response frame.
Unfortunately wpa_supplicant also gets this wrong and uses 0x7f
in all cases which is likely why this value was set incorrectly
in IWD. The issue is that IWD's config request is correct which
means IWD<->IWD configuration is broken. (and wpa_supplicant as
a configurator likely doesn't validate the config request).
Fix this by checking both 0x7f and 0x00 to handle both
supplicants.
Stopping periodic scans and not restarting them prevents autoconnect
from working again if DPP (or the post-DPP connect) fails. Since
the DPP offchannel work is at a higher priority than scanning (and
since new offchannels are queue'd before canceling) there is no risk
of a scan happening during DPP so its safe to leave periodic scans
running.
The packet loss handler puts a higher priority on roaming compared
to the low signal roam path. This is generally beneficial since this
event usually indicates some problem with the BSS and generally is
an indicator that a disconnect will follow sometime soon.
But by immediately issuing a scan we run the risk of causing many
successive scans if more packet loss events arrive following
the roam scans (and if no candidates are found). Logs provided
further.
To help with this handle the first event with priority and
immediately issue a roam scan. If another event comes in within a
certain timeframe (2 seconds) don't immediately scan, but instead
rearm the roam timer instead of issuing a scan. This also handles
the case of a low signal roam scan followed by a packet loss
event. Delaying the roam will at least provide some time for packets
to get out in between roam scans.
Logs were snipped to be less verbose, but this cycled happened
5 times prior. In total 7 scans were issued in 5 seconds which may
very well have been the reason for the local disconnect:
Oct 27 16:23:46 src/station.c:station_roam_failed() 9
Oct 27 16:23:46 src/wiphy.c:wiphy_radio_work_done() Work item 29 done
Oct 27 16:23:47 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:47 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:47 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 30
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_next() Starting work item 30
Oct 27 16:23:47 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:47 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification New Scan Results(34)
Oct 27 16:23:47 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
... scan results ...
Oct 27 16:23:47 src/station.c:station_roam_failed() 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_done() Work item 30 done
Oct 27 16:23:47 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:47 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:47 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 31
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_next() Starting work item 31
Oct 27 16:23:47 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:47 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:48 src/scan.c:scan_notify() Scan notification New Scan Results(34)
Oct 27 16:23:48 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
... scan results ...
Oct 27 16:23:48 src/station.c:station_roam_failed() 9
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_done() Work item 31 done
Oct 27 16:23:48 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:48 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:48 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 32
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_next() Starting work item 32
Oct 27 16:23:48 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:48 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:48 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:49 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Oct 27 16:23:49 src/netdev.c:netdev_deauthenticate_event()
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Oct 27 16:23:49 src/netdev.c:netdev_disconnect_event()
Oct 27 16:23:49 Received Deauthentication event, reason: 4, from_ap: false
Include a specific timeout value so different protocols can specify
different timeouts. For example once the authentication timeout
should not take very long (even 10 seconds seems excessive) but
adding PKEX may warrant longer timeouts.
For example discovering a configurator IWD may want to wait several
minutes before ending the discovery. Similarly running PKEX as a
configurator we should put a hard limit on the time, but again
minutes rather than 10 seconds.
The memcpy in HEX2BUF was copying the length of the buffer that was
passed in, not the actual length of the converted hexstring. This
test was segfaulting in the Alpine CI which uses clang/musl.
Its been seen (so far only in mac80211_hwsim + UML) where an
offchannel requests ACK comes after the ROC started event. This
causes the ROC started event to never call back to notify since
info->roc_cookie is unset and it appears to be coming from an
external process.
We can detect this situation in the ROC notify event by checking
if there is a pending ROC command and if info->roc_cookie does
not match. This can also be true for an external event so we just
set a new "early_cookie" member and return.
Then, when the ACK comes in for the ROC request, we can validate
if the prior event was associated with IWD or some external
process. If it was from IWD call the started callback, otherwise
the ROC notify event should come later and handled under the
normal logic where the cookies match.
Instead of looking up by wdev, lookup by the ID itself. We
shouldn't ever have more than one info per wdev in the queue but
looking up the _exact_ info structure doesn't hurt in case things
change in the future.
If netconfig is canceled before completion (when roaming) the
settings are freed and never loaded again once netconfig is started
post-roam. Now after a roam make sure to re-load the settings and
start netconfig.
Commit 23f0f5717c did not correctly handle the reassociation
case where the state is set from within station_try_next_transition.
If IWD reassociates netconfig will get reset and DHCP will need to
be done over again after the roam. Instead get the state ahead of
station_try_next_transition.
Fixes: 23f0f5717ca0 ("station: allow roaming before netconfig finishes")
When using mutual authentication an additional value needs to
be hashed when deriving i/r_auth values. A NULL value indicates
no mutual authentication (zero length iovec is passed to hash).
DPP configurators are running the majority of the protocol on the
current operating channel, meaning no ROC work. The retry logic
was bailing out if !dpp->roc_started with the assumption that DPP
was in between requesting offchannel work and it actually starting.
For configurators, this may not be the case. The offchannel ID also
needs to be checked, and if no work is scheduled we can send the
frame.
The prf_plus API was a bit restrictive because it only took a
string label which isn't compatible with some specs (e.g. DPP
inputs to HKDF-Expand). In addition it took additional label
aruments which were appended to the HMAC call (and the
non-intuitive '\0' if there were extra arguments).
Instead the label argument has been removed and callers can pass
it in through va_args. This also lets the caller decided the length
and can include the '\0' or not, dependent on the spec the caller
is following.
Adds a handler for the HE capabilities element and reworks the way
the MCS/NSS support bits are printed.
Now if the MCS support is 3 (unsupported) it won't be printed. This
makes the logs a bit shorter to read.
Matched the printed function name with the actual function name.
The simple-agent test prints the function name to allow easier debugging.
One name was not set currectly (most likely through copy pasting).
SAE was also relying on the ELL bug which was incorrectly performing
a subtraction on the Y coordinate based on the compressed point type.
Correct this and make the point type more clear (rather than
something like "is_odd + 2").
EAP-PWD was incorrectly computing the PWE but due to the also
incorrect logic in ELL the point converted correctly. This is
being fixed, so both places need the reverse logic.
Also added a big comment explaining why this is, and how
l_ecc_point_from_data behaves since its somewhat confusing since
EAP-PWD expects the pwd-seed to be compared to the actual Y
coordinate (which is handled automatically by ELL).
Add a test to show the incorrect ASN1 conversion to and from points.
This was due to the check if Y is odd/even being inverted which
incorrectly prefixes the X coordinate with the wrong byte.
The test itself was not fully correct because it was using compliant
points rather than full points, and the spec contains the entire
Y coordinate so the full point should be used.
This patch also adds ASN1 conversions to validate that
dpp_point_from_asn1 and dpp_point_to_asn1 work properly.
The previous attempt at working around this warning seems to no longer
work with gcc 13
In function ‘eap_handle_response’,
inlined from ‘eap_rx_packet’ at src/eap.c:570:3:
src/eap.c:421:49: error: ‘vendor_id’ may be used uninitialized [-Werror=maybe-uninitialized]
421 | (type == EAP_TYPE_EXPANDED && vendor_id == (id) && vendor_type == (t))
| ~~~~~~~~~~^~~~~~~
src/eap.c:533:20: note: in expansion of macro ‘IS_EXPANDED_RESPONSE’
533 | } else if (IS_EXPANDED_RESPONSE(our_vendor_id, our_vendor_type))
| ^~~~~~~~~~~~~~~~~~~~
src/eap.c: In function ‘eap_rx_packet’:
src/eap.c:431:18: note: ‘vendor_id’ was declared here
431 | uint32_t vendor_id;
| ^~~~~~~~~
width must be initialized since it depends on best not being NULL. If
best passes the non-NULL check above, then width must be initialized
since both width and best are set at the same time.
For IWD to work correctly either 2.4GHz or 5GHz bands must be enabled
(even for 6GHz to work). Check this and don't allow IWD to initialize
if both 2.4 and 5GHz is disabled.
wiphy_get_allowed_freqs was only being used to see if 6GHz was disabled
or not. This is expensive and requires several allocations when there
already exists wiphy_is_band_disabled(). The prior patch modified
wiphy_is_band_disabled() to return -ENOTSUP which allows scan.c to
completely remove the need for wiphy_get_allowed_freqs.
scan_wiphy_watch was also slightly re-ordered to avoid allocating
freqs_6ghz if the scan request was being completed.
The function wiphy_band_is_disabled() return was a bit misleading
because if the band was not supported it would return true which
could be misunderstood as the band is supported, but disabled.
There was only one call site and because of this behavior
wiphy_band_is_disabled needed to be paired with checking if the
band was supported.
To be more descriptive to the caller, wiphy_band_is_disabled() now
returns an int and if the band isn't supported -ENOTSUP will be
returned, otherwise 1 is returned if the band is disabled and 0
otherwise.
This adds support to allow users to disable entire bands, preventing
scanning and connecting on those frequencies. If the
[Rank].BandModifier* options are set to 0.0 it will imply those
bands should not be used for scanning, connecting or roaming. This
now applies to autoconnect, quick, hidden, roam, and dbus scans.
This is a station only feature meaning other modules like RRM, DPP,
WSC or P2P may still utilize those bands. Trying to limit bands in
those modules may sometimes conflict with the spec which is why it
was not added there. In addition modules like DPP/WSC are only used
in limited capacity for connecting so there is little benefit gained
to disallowing those bands.
To support user-disabled bands periodic scans need to specify a
frequency list filtered by any bands that are disabled. This was
needed in scan.c since periodic scans don't provide a frequency
list in the scan request.
If no bands are disabled the allowed freqs API should still
result in the same scan behavior as if a frequency list is left
out i.e. IWD just filters the frequencies as opposed to the kernel.
Currently the only way a scan can be split is if the request does
not specify any frequencies, implying the request should scan the
entire spectrum. This allows the scan logic to issue an extra
request if 6GHz becomes available during the 2.4 or 5GHz scans.
This restriction was somewhat arbitrary and done to let periodic
scans pick up 6GHz APs through a single scan request.
But now with the addition of allowing user-disabled bands
periodic scans will need to specify a frequency list in case a
given band has been disabled. This will break the scan splitting
code which is why this prep work is being done.
The main difference now is the original scan frequencies are
tracked with the scan request. The reason for this is so if a
request comes in with a limited set of 6GHz frequences IWD won't
end up scanning the full 6GHz spectrum later on.
This is more or less copied from scan_get_allowed_freqs but is
going to be needed by station (basically just saves the need for
station to do the same clone/constrain sequence itself).
One slight alteration is now a band mask can be passed in which
provides more flexibility for additional filtering.
This exposes the [Rank].BandModifier* settings so other modules
can use then. Doing this will allow user-disabling of certain
bands by setting these modifier values to 0.0.
The loop iterating the frequency attributes list was not including
the entire channel set since it was stopping at i < band->freqs_len.
The freq_attrs array is allocated to include the last channel:
band->freq_attrs = l_new(struct band_freq_attrs, num_channels + 1);
band->freqs_len = num_channels;
So instead the for loop should use i <= band->freqs_len. (I also
changed this to start the loop at 1 since channel zero is invalid).
The auth/action status is now tracked in ft.c. If an AP rejects the
FT attempt with "Invalid PMKID" we can now assume this AP is either
mis-configured for FT or is lagging behind getting the proper keys
from neighboring APs (e.g. was just rebooted).
If we see this condition IWD can now fall back to reassociation in
an attempt to still roam to the best candidate. The fallback decision
is still rank based: if a BSS fails FT it is marked as such, its
ranking is reset removing the FT factor and it is inserted back
into the queue.
The motivation behind this isn't necessarily to always force a roam,
but instead to handle two cases where IWD can either make a bad roam
decision or get 'stuck' and never roam:
1. If there is one good roam candidate and other bad ones. For
example say BSS A is experiencing this FT key pull issue:
Current BSS: -85dbm
BSS A: -55dbm
BSS B: -80dbm
The current logic would fail A, and roam to B. In this case
reassociation would have likely succeeded so it makes more sense
to reassociate to A as a fallback.
2. If there is only one candidate, but its failing FT. IWD will
never try anything other than FT and repeatedly fail.
Both of the above have been seen on real network deployments and
result in either poor performance (1) or eventually lead to a full
disconnect due to never roaming (2).
Certain return codes, though failures, can indicate that the AP is
just confused or booting up and treating it as a full failure may
not be the best route.
For example in some production deployments if an AP is rebooted it
may take some time for neighboring APs to exchange keys for
current associations. If a client roams during that time it will
reject saying the PMKID is invalid.
Use the ft_associate call return to communicate the status (if any)
that was in the auth/action response. If there was a parsing error
or no response -ENOENT is still returned.
In many tests the hostapd configuration does not include all the
values that a test uses. Its expected that each individual test
will add the values required. In many cases its required each test
slightly alter the configuration for each change every other test
has to set the value back to either a default or its own setting.
This results in a ton of duplicated code mainly setting things
back to defaults.
To help with this problem the hostapd configuration is read in
initially and stored as the default. Tests can then simply call
.default() to set everything back. This significantly reduces or
completely removes a ton of set_value() calls.
This does require that each hostapd configuration file includes all
values any of the subtests will set, which is a small price for the
convenience.
Removed several debug prints which are very verbose and provide
little to no important information.
The get_scan_{done,callback} prints are pointless since all the
parsed scan results are printed by station anyways.
Printing the BSS load is also not that useful since it doesn't
include the BSSID. If anything the BSS load should be included
when station prints out each individual BSS (along with frequency,
rank, etc).
The advertisement protocol print was just just left in there by
accident when debugging, and also provides basically no useful
information.
Some APs don't include the RSNE in the associate reply during
the OWE exchange. This causes IWD to be incompatible since it has
a hard requirement on the AKM being included.
This relaxes the requirement for the AKM and instead warns if it
is not included.
Below is an example of an association reply without the RSN element
IEEE 802.11 Association Response, Flags: ........
Type/Subtype: Association Response (0x0001)
Frame Control Field: 0x1000
.000 0000 0011 1100 = Duration: 60 microseconds
Receiver address: 64:c4:03:88:ff:26
Destination address: 64:c4:03:88:ff:26
Transmitter address: fc:34:97:2b:1b:48
Source address: fc:34:97:2b:1b:48
BSS Id: fc:34:97:2b:1b:48
.... .... .... 0000 = Fragment number: 0
0001 1100 1000 .... = Sequence number: 456
IEEE 802.11 wireless LAN
Fixed parameters (6 bytes)
Tagged parameters (196 bytes)
Tag: Supported Rates 6(B), 9, 12(B), 18, 24(B), 36, 48, 54, [Mbit/sec]
Tag: RM Enabled Capabilities (5 octets)
Tag: Extended Capabilities (11 octets)
Ext Tag: HE Capabilities (IEEE Std 802.11ax/D3.0)
Ext Tag: HE Operation (IEEE Std 802.11ax/D3.0)
Ext Tag: MU EDCA Parameter Set
Ext Tag: HE 6GHz Band Capabilities
Ext Tag: OWE Diffie-Hellman Parameter
Tag Number: Element ID Extension (255)
Ext Tag length: 51
Ext Tag Number: OWE Diffie-Hellman Parameter (32)
Group: 384-bit random ECP group (20)
Public Key: 14ba9d8abeb2ecd5d95e6c12491b16489d1bcc303e7a7fbd…
Tag: Vendor Specific: Broadcom
Tag: Vendor Specific: Microsoft Corp.: WMM/WME: Parameter Element
Reported-By: Wen Gong <quic_wgong@quicinc.com>
Tested-By: Wen Gong <quic_wgong@quicinc.com>
Handling these events notifies hwsim of address changes for interface
creation/removal outside the initial namespace as well as address
changes due to scanning address randomization.
Interfaces that hwsim already knows about are still handled via
nl80211. But any interfaces not known when ADD/DEL_MAC_ADDR events
come will be treated specially.
For ADD, a dummy interface object will be created and added to the
queue. This lets the frame processing match the destination address
correctly. This can happen both for scan randomization and interface
creation outside of the initial namespace.
For the DEL event we handle similarly and don't touch any interfaces
found via nl80211 (i.e. have a 'name') but need to also be careful
with the dummy interfaces that were created outside the initial
namespace. We want to keep these around but scanning MAC changes can
also delete them. This is why a reference count was added so scanning
doesn't cause a removal.
For example, the following sequence:
ADD_MAC_ADDR (interface creation)
ADD_MAC_ADDR (scanning started)
DEL_MAC_ADDR (scanning done)
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.