Since netconfig is now part of the Connect() call from a DBus
perspective add a note indicating that this method has the potential
to take a very long time if there are issues with DHCP.
Since the method return to Connect() and ConnectBssid() come after
netconfig some tests needed to be updated since they were waiting
for the method return before continuing. For timeout-based tests
specifically this caused them to fail since before they expected
the return to come before the connection was actually completed.
Let the caller specify the method timeout if there is an expectation
that it could take a long time.
For the conventional connect call (not the "bssid" debug variant) let
them pass their own callback handlers. This is useful if we don't
want to wait for the connect call to finish, but later get some
indication that it did finish either successfully or not.
A netconfig failure results in a failed connection which restarts
autoconnect and prevents IWD from retrying the connection on any
other BSS's within the network as a whole. When autoconnect restarts
IWD will scan and choose the "best" BSS which is likely the same as
the prior attempt. If that BSS is somehow misconfigured as far as
DHCP goes, it will likely fail indefinitely and in turn cause IWD to
retry indefinitely.
To improve this netconfig has been adopted into the IWD's BSS retry
logic. If netconfig fails this will not result in IWD transitioning
to a disconnected state, and instead the BSS will be network
blacklisted and the next will be tried. Only once all BSS's have been
tried will IWD go into a disconnected state and start autoconnect
over.
When netconfig is enabled the DBus reply was being sent in
station_connect_ok(), before netconfig had even started. This would
result in a call to Connect() succeeding from a DBus perspective but
really netconfig still needed to complete before IWD transitioned
to a connected state.
Fixes: 72e7d3ceb83d ("station: Handle NETCONFIG_EVENT_FAILED")
This adds a new API network_clear_blacklist() and removes this
functionality from network_connected(). This is done to support BSS
iteration when netconfig is enabled. Since a call to
network_connected() will happen prior to netconfig completing we
cannot clear the blacklist until netconfig has either passed or
failed.
If this fails, in some cases, -EAGAIN would be returned up to netdev
which would then assume a retry would be done automatically. This
would not in fact happen since it was an internal SAE failure which
would result in the connect method return to never get sent.
Now if sae_send_commit() fails, return -EPROTO which will cause
netdev to fail the connection.
A BSS can temporarily reject associations and provide a delay that
the station should wait for before retrying. This is useful when
sane values are used, but taking it to the extreme an AP could
potentially request the client wait UINT32_MAX TU's which equates
to 49 days.
Either due to a bug, or worse by design, the kernel will wait for
however long that timeout is. Luckily the kernel also sends an event
to userspace with the amount of time it will be waiting. To guard
against excessive timeouts IWD will now handle this event and enforce
a maximum allowed value. If the timeout exceeds this IWD will
deauthenticate.
Specifically for the NO_MORE_STAS reason code, add the BSS to the
(now renamed) AP_BUSY blacklist to avoid roaming to this BSS for
the near future.
Since we are now handling individual reason codes differently the
whole IS_TEMPORARY_STATUS macro was removed and replaced with a
case statement.
The initial pass of this feature only envisioned BSS transition
management frames as the trigger to "roam blacklist" a BSS, hence
the original name. But some APs actually utilize status codes that
also indicate to the stations that they are busy, or not able to
handle more connections. This directly aligns with the original
motivation of the "roam blacklist" series and these events should
also trigger this type of blacklist.
First, since we will be applying this blacklist to cases other
than being told to roam, rename this reason code internally to
BLACKLIST_REASON_AP_BUSY. The config option is also being renamed
to [Blacklist].InitialAccessPointBusyTimeout while also supporting
the old config option, but warning that it is deprecated.
Some drivers/hardware are more strict about limiting use of
certain frequencies on startup until the regulatory domain has
been set. For most cards the only way to set the regulatory domain
is to scan and see BSS's nearby that advertise the country they
reside in.
This is particularly important for AP mode since AP's are always
emitting radiation from beacons and will not start until the desired
frequency is both enabled and allows IR. To make this process
seamless in IWD we will first check that the desired frequency is
enabled/IR and if not issue a scan to (hopefully) get the regulatory
domain set.
A prior patch broke this by checking the return of
l_dbus_message_iter_next_entry. This was really subtle but the logic
actually relied on _not_ checking that return in order to handle
empty lists.
Instead of reverting the logic was adapted/commented to make it more
clear what the API expects from DBus. If list contains at least one
value the first element path will get set, if it contains zero
values "new_path" will be set to NULL which will then cause the
list to be cleared later on.
This both fixes the regression, and makes it clear that a zero
element list is supported and handled.
The length of EncryptedSecurity was assumed to be at least 16 bytes
and anything less would underflow the length to l_malloc.
Fixes: 01cd8587606b ("storage: implement network profile encryption")
The MAC and version elements weren't super critical but the channel
and bootstrapping key elements would result in memory leaks if there
were duplicates.
This patch now will not allow duplicate elements in the URI.
Fixes: f7f602e1b1e7 ("dpp-util: add URI parsing")
The survey arrays were exactly the number of valid channels for a
given band (e.g. 14 for 2.4GHz) but since channels start at 1 this
means that the last channel for a band would overflow the array.
Fixes: 35808debaefd ("scan: use GET_SURVEY for SNR calculation in ranking")
Supporting PMKSA on fullmac drivers requires that we set the PMKSA
into the kernel as well as remove it. This can now be triggered
via the new PMKSA driver callbacks which are implemented and set
with this patch.
In order to support fullmac drivers the PMKSA entries must be added
and removed from the kernel. To accomplish this a set of driver
callbacks will be added to the PMKSA module. In addition a new
pmksa_cache_free API will be added whos only purpose is to handle
the removal from the kernel.
The iwd_notice function was more meant for special purpose events
not general debug prints. For these error conditions we should be
using l_warn. For the informational "External Auth to SSID" log
we already print this information when connecting from station. In
addition there are logs when performing external auth so it should
be very obvious external auth is being used without this log.
The netdev frame watches got cleaned up upon the interface going down
which works if the interface is simply being toggled but when IWD
shuts down it first shuts down the interface, then immediately frees
netdev. If a watched frame arrives immediately after that before the
interface shutdown callback it will reference netdev, which has been
freed.
Fix this by clearing out the frame watches in netdev_free.
==147== Invalid read of size 8
==147== at 0x408ADB: netdev_neighbor_report_frame_event (netdev.c:4772)
==147== by 0x467C75: frame_watch_unicast_notify (frame-xchg.c:234)
==147== by 0x4E28F8: __notifylist_notify (notifylist.c:91)
==147== by 0x4E2D37: l_notifylist_notify_matches (notifylist.c:204)
==147== by 0x4A1388: process_unicast (genl.c:844)
==147== by 0x4A1388: received_data (genl.c:972)
==147== by 0x49D82F: io_callback (io.c:105)
==147== by 0x49C93C: l_main_iterate (main.c:461)
==147== by 0x49CA0B: l_main_run (main.c:508)
==147== by 0x49CA0B: l_main_run (main.c:490)
==147== by 0x49CC3F: l_main_run_with_signal (main.c:630)
==147== by 0x4049EC: main (main.c:614)
If an AP directed roam frame comes in while IWD is roaming its
still valuable to parse that frame and blacklist the BSS that
sent it.
This can happen most frequently during a roam scan while connected
to an overloaded BSS that is requesting IWD roams elsewhere.
If the BSS is requesting IWD roam elsewhere add this BSS to the
blacklist using BLACKLIST_REASON_ROAM_REQUESTED. This will lower
the chances of IWD roaming/connecting back to this BSS in the
future.
This then allows IWD to consider this blacklist state when picking
a roam candidate. Its undesireable to fully ban a roam blacklisted
BSS, so some additional sorting logic has been added. Prior to
comparing based on rank, BSS's will be sorted into two higher level
groups:
Above Threshold - BSS is above the RoamThreshold
Below Threshold - BSS is below the RoamThreshold
Within each of these groups the BSS may be roam blacklisted which
will position it at the bottom of the list within its respecitve
group.
This adds a new (less severe) blacklist reason as well as an option
to configure the timeout. This blacklist reason will be used in cases
where a BSS has requested IWD roam elsewhere. At that time a new
blacklist entry will be added which will be used along with some
other criteria to determine if IWD should connect/roam to that BSS
again.
Now that we have multiple blacklist reasons there may be situations
where a blacklist entry already exists but with a different reason.
This is going to be handled by the reason severity. Since we have
just two reasons we will treat a connection failure as most severe
and a roam requested as less severe. This leaves us with two
possible situations:
1. BSS is roam blacklisted, then gets connection blacklisted:
The reason will be "promoted" to connection blacklisted.
2. BSS is connection blacklisted, then gets roam blacklisted:
The blacklist request will be ignored
When pruning the list check_if_expired was comparing to the maximum
amount of time a BSS can be blacklisted, not if the current time had
exceeded the expirationt time. This results in blacklist entries
hanging around longer than they should, which would result in them
poentially being blacklisted even longer if there was another reason
to blacklist in the future.
Instead on prune check the actual expiration and remove the entry if
its expired. Doing this removes the need to check any of the times
in blacklist_contains_bss since prune will remove any expired entries
correctly.
To both prepare for some new blacklisting behavior and allow for
easier consolidation of the network-specific blacklist include a
reason enum for each entry. This allows IWD to differentiate
between multiple blacklist types. For now only the existing
"permanent" type is being added which prevents connections to that
BSS via autoconnect until it expires.
Allowing the timeout blacklist to be disabled has introduced a bug
where a failed connection will not result in the BSS list to be
traversed. This causes IWD to retry the same BSS over and over which
be either a) have some issue preventing a connection or b) may simply
be unreachable/out of range.
This is because IWD was inherently relying on the timeout blacklist
to flag BSS's on failures. With it disabled there was nothing to tell
network_bss_select that we should skip the BSS and it would return
the same BSS indefinitely.
To fix this some of the blacklisting logic was re-worked in station.
Now, a BSS will always get network blacklisted upon a failure. This
allows network.c to traverse to the next BSS upon failure.
For auth/assoc failures we will then only timeout blacklist under
certain conditions, i.e. the status code was not in the temporary
list.
Fixes: 77639d2d452e ("blacklist: allow configuration to disable the blacklist")
Certain use cases may not need or want this feature so allowing it to
be disabled is a much cleaner way than doing something like setting
the timeouts very low.
Now [Blacklist].InitialTimeout can be set to zero which will prevent
any blacklisting.
In addition some other small changes were added:
- Warn if the multiplier is 0, and set to 1 if so.
- Warn if the initial timeout exceeds the maximum timeout.
- Log if the blacklist is disabled
- Use L_USEC_PER_SEC instead of magic numbers.
SAE/WPA3 is completely broken on brcmfmac, at least without a custom
kernel patch which isn't included in many OS distributions. In order
to help with this add a driver quirk so devices with brcmfmac can
utilize WPA2 instead of WPA3 and at least connect to networks at
this capacity until the fix is more widely distributed.
Instead of just printing the PMKSA pointer separate this into two
separate debug messages, one for if the PMKSA exists and the other
if it does not. In addition print out the MAC of the AP so we have
a reference of which PMKSA this is.
With external auth there is no associate event meaning the auth proto
never gets freed, which prevents eapol from starting inside the
OCI callback. Check for this specific case and free the auth proto
after signaling that external auth has completed.
The user can now limit the size and count of PCAP files iwmon will
create. This allows iwmon to run for long periods of time without
filling up disk space.
This implements support for "rolling captures" by allowing iwmon to
limit the PCAP file size and number of PCAP's that are created.
This is a useful feature when long term monitoring is needed. If
there is some rare behavior requiring iwmon to run for days, months,
or longer the resulting PCAP file would become quite large and fill
up disk space.
When enabled (command line arguments in subsequent patch) the PCAP
file size is checked on each write. If it exceeds the limit a new
PCAP file will be created. Once the number of old PCAP files reaches
the set limit the oldest PCAP will be removed from disk.
For syncing iwmon captures with other logging its useful to
timestamp in some absolute format like UTC. This adds an
option which allows the user to specify what time format to
show. For now support:
delta - (default) The time delta between the first packet
and the current packet.
utc - The packet time in UTC
The ath10k driver has shown some performance issues, specifically
packet loss, when frame watches are registered with the multicast
RX flag set. This is relevant for DPP which registers for these
when DPP starts (if the driver supports it). This has only been
observed when there are large groups of clients all using the same
wifi channel so its unlikely to be much of an issue for those using
IWD/ath10k and DPP unless you run large deployments of clients.
But for large deployments with IWD/ath10k we need a way to disable
the multicast RX registrations. Now, with the addition of
wiphy_supports_multicast_rx we can both check that the driver
supports this as well as if its been disabled by the driver quirk.
This driver quirk and associated helper API lets other modules both
check if multicast RX is supported, and if its been disabled via
the driver quirk setting.
The actual connection piece of this is very minimal, and only
requires station to check if there is a PMKSA cached, and if so
include the PMKID in the RSNE. Netdev then takes care of the rest.
The remainder of this patch is the error handling if a PMKSA
connection fails with INVALID_PMKID. In this case IWD should retry
the same BSS without PMKSA.
An option was also added to disable PMKSA if a user wants to do
that. In theory PMKSA is actually less secure compared to SAE so
it could be something a user wants to disable. Going forward though
it will be enabled by default as its a requirement from the WiFi
alliance for WPA3 certification.
To prepare for PMKSA support station needs access to the handshake
object. This is because if PMKSA fails due to an expired/missing
PMKSA on the AP station should retry using the standard association.
This poses a problem currently because netdev frees the handshake
prior to calling the connect callback.
This was quite simple and only requiring caching the PMKSA after a
successful handshake, and using the correct authentication type
for connections if we have a prior PMKSA cached.
This is only being added for initial SAE associations for now since
this is where we gain the biggest improvement, in addition to the
requirement by the WiFi alliance to label products as "WPA3 capable"
This is needed in order to clear the PMKSA from the handshake state
without actually putting it back into the cache. This is something
that will be needed in case the AP rejects the association due to
an expired (or forgotten) PMKSA.
The majority of this patch was authored by Denis Kenzior, but
I have appended setting the PMK inside handshake_state_set_pmksa
as well as checking if the pmkid exists in
handshake_state_steal_pmkid.
Authored-by: Denis Kenzior <denkenz@gmail.com>
Authored-by: James Prestwood <prestwoj@gmail.com>
There are quite a few tests here for various scenarios and PMKSA
throws a wrench into that. Rather than potentially breaking the
tests in attempt to get them working with PMKSA, just disable PMKSA.
Since IWD doesn't utilize DBus signals in "normal" operations its
fine to lazy initialize any of the DBus interfaces since properties
can be obtained as needed with Get/GetAll.
For test-runner though StationDebug uses signals for debug events
and until the StationDebug class is initialized (via a method call
or property access) all signals will be lost. Fix this by always
initializing the StationDebug interface when a Device class is
initialized.
This adds a ref count to the handshake state object (as well as
ref/unref APIs). Currently IWD is careful to ensure that netdev
holds the root reference to the handshake state. Other modules do
track it themselves, but ensure that it doesn't get referenced
after netdev frees it.
Future work related to PMKSA will require that station holds a
references to the handshake state, specifically for retry logic,
after netdev is done with it so we need a way to delay the free
until station is also done.
The utilization rank factor already existed but was very rigid
and only checked a few values. This adds the (optional) ability
to start applying an exponentially decaying factor to both
utilization and station count after some threshold is reached.
This area needs to be re-worked in order to support very highly
loaded networks. If a network either doesn't support client
balancing or does it poorly its left up to the clients to choose
the best BSS possible given all the information available. In
these cases connecting to a highly loaded BSS may fail, or result
in a disconnect soon after connecting. In these cases its likely
better for IWD to choose a slightly lower RSSI/datarate BSS over
the conventionally 'best' BSS in order to aid in distributing
the network load.
The thresholds are currently optional and not enabled by default
but if set they behave as follows:
If the value is above the threshold it is mapped to an integer
between 0 and 30. (using a starting range of <value> - 255).
This integer is then used to index in the exponential decay table
to get a factor between 1 and 0. This factor is then applied to
the rank.
Note that as the value increases above the threshold the rank
will be increasingly effected, as is expected for an exponential
function. These option should be used with care as it may have
unintended consequences, especially with very high load networks.
i.e. you may see IWD roaming to BSS's with much lower signal if
there are high load BSS's nearby.
To maintain the existing behavior if there is no utilization
factor set in main.conf the legacy thresholds/factors will be
used.
This is copied from network.c that uses a static table to lookup
exponential decay values by index (generated from 1/pow(n, 0.3)).
network.c uses this for network ranking but it can be useful for
BSS ranking as well if you need to apply some exponential backoff
to a value.
This has been needed elsewhere but generally shortcuts could be
taken mapping with ranges starting/ending with zero. This is a
more general linear mapping utility to map values between any
two ranges.
gcc-15 switched to -std=c23 by default:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212
As a result `iwd` fails the build as:
../src/crypto.c:1215:24: error: incompatible types when returning type '_Bool' but 'struct l_ecc_point *' was expected
1215 | return false;
| ^~~~~
Signed-off-by: Rudi Heitbaum <rudi@heitbaum.com>
gcc-15 switched to -std=c23 by default:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212
As a result `iwd` fails the build as:
wired/ethdev.c: In function 'pae_open':
wired/ethdev.c:340:55:
error: passing argument 4 of 'l_io_set_read_handler'
from incompatible pointer type [-Wincompatible-pointer-types]
340 | l_io_set_read_handler(pae_io, pae_read, NULL, pae_destroy);
| ^~~~~~~~~~~
| |
| void (*)(void)
In file included from ...-ell-0.70-dev/include/ell/ell.h:19,
from wired/ethdev.c:38:
...-ell-0.70-dev/include/ell/io.h:33:68:
note: expected 'l_io_destroy_cb_t' {aka 'void (*)(void *)'}
but argument is of type 'void (*)(void)'
33 | void *user_data, l_io_destroy_cb_t destroy);
| ~~~~~~~~~~~~~~~~~~^~~~~~~
C23 changed the meaning of `void (*)()` from partially defined prototype
to `void (*)(void)`.
The 3rd byte of the country code was being printed as ASCII but this
byte isn't always a printable character. Instead we can check what
the value is and describe what it means from the spec.
These frequencies were seen being advertised by a driver and IWD has
no operating class/channel mapping for them. Specifically 5960 was
causing issues due to a few bugs and mapping to channel 2 of the 6ghz
band. Those bugs have now been resolved.
If these frequencies can be supported in a clean manor we can remove
this test, but until then ensure IWD does not parse them.
After the band is established we check the e4 table for the channel
that matches. The problem here is we will end up checking all the
operating classes, even those that are not within the band that was
determined. This could result in false positives and return a
channel that doesn't make sense.
When the frequencies/channels were parsed there was no check that the
resulting band matched what was expected. Now, pass the band object
itself in which has the band set to what is expected.
If IPv6 is disabled or not supported at the kernel level writing the
sysfs settings will fail. A few of them had a support check but this
patch adds a supported bool to the remainder so we done get errors
like:
Unable to write drop_unsolicited_na to /proc/sys/net/ipv6/conf/wlan0/drop_unsolicited_na
Similar to several other modules DPP registers for its frame
watches on init then ignores anything is receives unless DPP
is actually running.
Due to some recent issues surrounding ath10k and multicast frames
it was discovered that simply registering for multicast RX frames
causes a significant performance impact depending on the current
channel load.
Regardless of the impact to a single driver, it is actually more
efficient to only register for the DPP frames when DPP starts
rather than when IWD initializes. This prevents any of the frames
from hitting userspace which would otherwise be ignored.
Using the frame-xchg group ID's we can only register for DPP
frames when needed, then close that group and the associated
frame watches.
DPP optionally uses the multicast RX flag for frame registrations but
since frame-xchg did not support that, it used its own registration
internally. To avoid code duplication within DPP add a flag to
frame_watch_add in order to allow DPP to utilize frame-xchg.
The selection loop was choosing an initial candidate purely for
use of the "fallback_to_blacklist" flag. But we have a similar
case with OWE transitional networks where we avoid the legacy
open network in preference for OWE:
/* Don't want to connect to the Open BSS if possible */
if (!bss->rsne)
continue;
If no OWE network gets selected we may iterate all BSS's and end
the loop, which then returns NULL.
To fix this move the blacklist check earlier and still ignore any
BSS's in the blacklist. Also add a new flag in the selection loop
indicating an open network was skipped. If we then exhaust all
other BSS's we can return this candidate.
Some drivers like brcmfmac don't support OWE but from userspace its
not possible to query this information. Rather than completely
blacklist brcmfmac we can allow the user to configure this and
disable OWE in IWD.
The "UK" alpha2 code is not the official code for the United Kingdom
but is a "reserved" code for compatibility. The official alpha2 is
"GB" which is being added to the EU list. This fixes issues parsing
neighbor reports, for example:
src/station.c:parse_neighbor_report() Neighbor report received for xx:xx:xx:xx:xx:xx: ch 136 (oper class 3), MD not set
Failed to find band with country string 'GB 32' and oper class 3, trying fallback
src/station.c:station_add_neighbor_report_freqs() Ignored: unsupported oper class
Test handling of technically illegal but harmless cloned IEs.
Based on real traffic captured from retail APs.
As cloned IEs are now allowed the
"/IE order/Bad (Duplicate + Out of Order IE) 1"
test payload has been altered to be more-wrong so it still fails
verification as expected.
Prior to adding the polling fallback this code path was only used for
signal level list notifications and netdev_rssi_polling_update() was
structured as such, where if the RSSI list feature existed there was
nothing to be done as the kernel handled the notifications.
For certain mediatek cards this is broken, hence why the fallback was
added. But netdev_rssi_polling_update() was never changed to take
this into account which bypassed the timer cleanup on disconnections
resulting in a crash when the timer fired after IWD was disconnected:
iwd: ++++++++ backtrace ++++++++
iwd: #0 0x7b5459642520 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #1 0x7b54597aedf4 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #2 0x49f82d in l_netlink_message_append() at ome/jprestwood/iwd/ell/netlink.c:825
iwd: #3 0x4a0c12 in l_genl_msg_append_attr() at ome/jprestwood/iwd/ell/genl.c:1522
iwd: #4 0x405c61 in netdev_rssi_poll() at ome/jprestwood/iwd/src/netdev.c:764
iwd: #5 0x49cce4 in timeout_callback() at ome/jprestwood/iwd/ell/timeout.c:70
iwd: #6 0x49c2ed in l_main_iterate() at ome/jprestwood/iwd/ell/main.c:455 (discriminator 2)
iwd: #7 0x49c3bc in l_main_run() at ome/jprestwood/iwd/ell/main.c:504
iwd: #8 0x49c5f0 in l_main_run_with_signal() at ome/jprestwood/iwd/ell/main.c:632
iwd: #9 0x4049ed in main() at ome/jprestwood/iwd/src/main.c:614
iwd: #10 0x7b5459629d90 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #11 0x7b5459629e40 in /lib/x86_64-linux-gnu/libc.so.6
iwd: +++++++++++++++++++++++++++
To fix this we need to add checks for the cqm_poll_fallback flag in
netdev_rssi_polling_update().
Certain FullMAC drivers do not expose CMD_ASSOCIATE/CMD_AUTHENTICATE,
but lack the ability to fully offload SAE connections to the firmware.
Such connections can still be supported on such firmware by using
CMD_EXTERNAL_AUTH & CMD_FRAME. The firmware sets the
NL80211_FEATURE_SAE bit (which implies support for CMD_AUTHENTICATE, but
oh well), and no other offload extended features.
When CMD_CONNECT is issued, the firmware sends CMD_EXTERNAL_AUTH via
unicast to the owner of the connection. The connection owner is then
expected to send SAE frames with the firmware using CMD_FRAME and
receive authenticate frames using unicast CMD_FRAME notifications as
well. Once SAE authentication completes, userspace is expected to
send a final CMD_EXTERNAL_AUTH back to the kernel with the corresponding
status code. On failure, a non-0 status code should be used.
Note that for historical reasons, SAE AKM sent in CMD_EXTERNAL_AUTH is
given in big endian order, not CPU order as is expected!
The TX or RX bitrate attributes can contain zero nested attributes.
This causes netdev_parse_bitrate() to fail, but this shouldn't then
cause the overall parsing to fail (we just don't have those values).
Fix this by continuing to parse attributes if either the TX/RX
bitrates fail to parse.
If the affinity watch is removed by setting an empty list the
disconnect callback won't be called which was the only place
the watch ID was cleared. This resulted in the next SetProperty call
to think a watch existed, and attempt to compare the sender address
which would be NULL.
The watch ID should be cleared inside the destroy callback, not
the disconnect callback.
If we scan a huge number of frequencies the PKEX timeout can get
rather large. This was overlooked in a prior patch who's intent
was to reduce the PKEX time, but in these cases it increased it.
Now the timeout will be capped at 2 minutes, but will still be
as low as 10 seconds for a single frequency.
In addition there was no timer reset once PKEX was completed.
This could cause excessive waits if, for example, the peer left
the channel mid-authentication. IWD would just wait until the
long PKEX timeout to eventually reset DPP. Once PKEX completes
we can assume that this peer will complete authentication quickly
and if not, we can fail.
While there is proper handling for a regdom update during a
TRIGGER_SCAN scan, prior to NEW_SCAN_RESULTS there is no such
handling if the regdom update comes in during a GET_SCAN or
GET_SURVEY.
In both the 6ghz and non-6ghz code paths we have some issues:
- For non-6ghz devices, or regdom updates that did not enable
6ghz the wiphy state watch callback will automatically issues
another GET_SURVEY/GET_SCAN without checking if there was
already one pending. It does this using the current scan request
which gets freed by the prior GET_SCAN/GET_SURVEY calls when
they complete, causing invalid reads when the subsequent calls
finish.
- If 6ghz was enabled by the update we actually append another
trigger command to the list and potentially run it if its the
current request. This also will end up in the same situation as
the request is freed by the pending GET_SURVEY/GET_SCAN calls.
For the non-6ghz case there is little to no harm in ignoring the
regdom update because its very unlikely it changed the allowed
frequencies.
For the 6ghz case we could potentially handle the new trigger scan
within get_scan_done, but thats beyond the scope of this change
and is likely quite intrusive.
Since surveys end up making driver calls in the kernel its not
entirely known how they are implemented or how long they will
take. For this reason the survey will be skipped if getting the
results from an external scan.
Doing this also fixes a crash caused by external scans where the
scan request pointer is not checked and dereferenced:
0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059
0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0
0x0000749646a243cb in ?? () from /usr/lib/libell.so.0
0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0
0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0
0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0
0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614
Reported-by: Daniel Bond <danielbondno@gmail.com>
With the introduction of affinities the CQM threshold can be toggled
by a DBus call. There was no check if there was already a pending
call which would cause the command ID to be overwritten and lose any
potential to cancel it, e.g. if netdev went down.
Some drivers fail to set a CQM threshold and report not supported.
Its unclear exactly why but if this happens roaming is effectively
broken.
To work around this enable RSSI polling if -ENOTSUP is returned.
The polling callback has been changed to emit the HIGH/LOW signal
threshold events instead of just the RSSI level index, just as if
a CQM event came from the kernel.
When the affinity is set to the current BSS lower the roaming
threshold to loosly lock IWD to the current BSS. The lower
threshold is automatically removed upon roaming/disconnection
since the affinity array is also cleared out.
This property will hold an array of object paths for
BasicServiceSet (BSS) objects. For the purpose of this patch
only the setter/getter and client watch is implemented. The
purpose of this array is to guide or loosely lock IWD to certain
BSS's provided that some external client has more information
about the environment than what IWD takes into account for its
roaming decisions.
For the time being, the array is limited to only the connected
BSS path, and any roams or disconnects will clear the array.
The intended use case for this is if the device is stationary
an external client could reduce the likelihood of roaming by
setting the affinity to the current BSS.
This documents new DBus property that expose a bit more control to
how IWD roams.
Setting the affinity on the connected BSS effectively "locks" IWD to
that BSS (except at critical RSSI levels, explained below). This can
be useful for clients that have access to more information about the
environment than IWD. For example, if a client is stationary there
is likely no point in trying to roam until it has moved elsewhere.
A new main.conf option would also be added:
[General].CriticalRoamThreshold
This would be the new roam threshold set if the currently connected
BSS is in the Affinities list. If the RSSI continues to drop below
this level IWD will still attempt to roam.
A user reported a crash which was due to the roam trigger timeout
being overwritten, followed by a disconnect. Post-disconnect the
timer would fire and result in a crash. Its not clear exactly where
the overwrite was happening but upon code inspection it could
happen in the following scenario:
1. Beacon loss event, start roam timeout
2. Signal low event, no check if timeout is running and the timeout
gets overwritten.
The reported crash actually didn't appear to be from the above
scenario but something else, so this logic is being hardened and
improved
Now if a roam timeout already exists and trying to be rearmed IWD
will check the time remaining on the current timer and either keep
the active timer or reschedule it to the lesser of the two values
(current or new rearm time). This will avoid cases such as a long
roam timer being active (e.g. 60 seconds) followed by a beacon or
packet loss event which should trigger a more agressive roam
schedule.
This adds a secondary set of signal thresholds. The purpose of these
are to provide more flexibility in how IWD roams. The critical
threshold is intended to be temporary and is automatically reset
upon any connection changes: disconnects, roams, or new connections.
This prepares for the ability to toggle between two signal
thresholds in netdev. Since each netdev may not need/want the
same threshold store it in the netdev object rather than globally.
Since IWD enrollees can send unicast frames, a PKEX configurator could
still run without multicast support. Using this combination basically
allows any driver to utilize DPP/PKEX assuming the MAC address can
be communicated using some out of band mechanism.
The DPP spec allows for obtaining frequency and MAC addresses up
to the implementation. IWD already takes advantage of this by
first scanning for nearby APs and using only those frequencies.
For further optimization an enrollee may be able to determine the
configurators frequency and MAC ahead of time which would make
finding the configurator much faster.
This will help to get rid of magic number use throughout the project.
The definitions should be limited to global magic numbers that are used
throughout the project, for example SSID length, MAC address length,
etc.
Due to an unnoticed bug after adding the BasicServiceSet object into
network, it became clear that since station already owns the scan_bss
objects it makes sense for it to manage the associated DBus objects
as well. This way network doesn't have to jump through hoops to
determine if the scan_bss object was remove, added, or updated. It
can just manage its list as it did prior.
From the station side this makes things very easy. When scan results
come in we either update or add a new DBus object. And any time a
scan_bss is freed we remove the DBus object.
To reduce code duplication and prepare for moving the BSS interface
to station, add a new API so station can create a BSS path without
a network object directly.
src/eapol.c:1041:9: error: ‘buf’ may be used uninitialized [-Werror=maybe-uninitialized]
1041 | l_put_be16(0, &frame->header.packet_len);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This warning is bogus since the buffer is initialized through use of
eapol_frame members. EAPoL-Start is a very simple frame.
This was seemingly trivial at face value but doing so ended up
pointing out a bug with how group_retry is set when forcing
the default group. Since group_retry is initialized to -1 the
increment in the force_default_group block results in it being
set to zero, which is actually group 20, not 19. This did not
matter for hunt and peck, but H2E actually uses the retry value
to index its pre-generated points which then breaks SAE if
forcing the default group with H2E.
To handle H2E and force_default_group, the group selection
logic will always begin iterating the group array regardless of
SAE type.
The property itself is an array of paths, but this is difficult
to fit nicely within a terminal. Instead just display the count
of BSS's. Displaying a detailed list of BSS's will be done via
a separate command.
There are certain cases where we may not want to display the entire
header for a given set of properties. For example displaying a list
of proxy interfaces. Add finer control by separating out the header
and the prop/value display into two functions.
This will tell network the BSS list is being updated and it can
act accordingly as far as the BSS DBus registrations/unregistration.
In addition any scan_bss object needing to be freed has to wait
until after network_bss_stop_update() because network has to be able
to iterate its old list and unregister any BSS's that were not seen
in the scan results. This is done by pushing each BSS needing to be
freed into a queue, then destroying them after the BSS's are all
added.
This adds a new DBus object/interface for tracking BSS's for
a given network. Since scanning replaces scan_bss objects some
new APIs were added to avoid tearing down the associated DBus
object for each BSS.
network_bss_start_update() should be called before any new BSS's
are added to the network object. This will keep track of the old
list and create a new network->bss_list where more entries can
be added. This is effectively replacing network_bss_list_clear,
except it keeps the old list around until...
network_bss_stop_update() is called when all BSS's have been
added to the network object. This will then iterate the old list
and lookup if any BSS DBus objects need to be destroyed. Once
completed the old list is destroyed.
iwd supports FILS only on softmac drivers. Ensure the capability check
is consistent between wiphy and netdev, both the softmac and the
relevant EXT_FEATURE bit must be checked.
CMD_EXTERNAL_AUTH could potentially be used for FILS for FullMAC cards,
but no hardware supporting this has been identified yet.
Somehow this ability was lost in the refactoring. OWE was intended to
be used on fullmac cards, but the state machine is only actually created
if the connection type ends up being softmac.
Fixes: 8b6ad5d3b9ec ("owe: netdev: refactor to remove OWE as an auth-proto")
Certain flags (for example, NLA_F_NESTED) are ORed with the netlink
attribute type identifier prior to being sent on the wire. Such flags
need to be masked off and not taken into consideration when attribute
type is being compared against known values.
The workaround for Cisco APs reporting an operating class of zero
is still a bug that remains in Cisco equipment. This is made even
worse with the introduction of 6GHz where the channel numbers
overlap with both 2.4 and 5GHz bands. This makes it impossible to
definitively choose a frequency given only a channel number.
To improve this workaround and cover the 6GHz band we can calculate
a frequency for each band and see what is successful. Then append
each frequency we get to the list. This will result in more
frequencies scanned, but this tradeoff is better than potentially
avoiding a roam to 6GHz or high order 5ghz channel numbers.
[denkenz@archdev ~]$ qemu-system-x86_64 --version
QEMU emulator version 9.0.1
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
QEMU now seems to complain that 'no-hpet' and 'no-acpi' command line
arguments are unrecognized.
l_genl class has nice ways of discovering and requesting families. The
genl functionality has been added after the iwmon skeleton was created,
but it is now time to migrate to using these APIs.
This attribute is actually an array of signed 32 bit integers and it
was being treated as a single integer. This would work until more
than one threshold was set, then it would fail to parse it.
After the station state changes in DPP setting autoconnect=True was
causing DPP to stop prior to being able to scan for the network.
Instead we can start autoconnect earlier so we aren't toggling the
property while DPP is running.
Prior to now the DPP state was required to be disconnected before
DPP would start. This is inconvenient for the user since it requires
extra state checking and/or DBus method calls. Instead model this
case like WSC and issue a disconnect to station if DPP is requested
to start.
The other conditions on stopping DPP are also preserved and no
changes to the configurator role have been made, i.e. being
disconnected while configuring still stops DPP. Similarly any
connection made during enrolling will stop DPP.
It should also be noted that station's autoconfigure setting is also
preserved and set back to its original value upon DPP completing.
Gets the current autoconenct setting. This is not the current
autoconnect state. Will be used in DPP to reset station's autoconnect
setting back to what it was prior to DPP, in case of failure.
In order to slightly rework the DPP state machine to handle
automatically disconnecting (for enrollees) functions need to be
created that isolate everything needed to start DPP/PKEX in case
a disconnect needs to be done first.
If valgrind didn't report any issues, don't dump the logs. This
makes the test run a lot easier to look at without having to scroll
through pages of valgrind logs that provide no value.
When the survey code was added it neglected to add the same
cancelation logic that existed for the GET_SCAN call, i.e. if
a scan was canceled and there was a pending GET_SURVEY to the
kernel that needs to be canceled, and the request cleaned up.
Fixes: 35808debae ("scan: use GET_SURVEY for SNR calculation in ranking")
IPv6 address entry was not updated to use display_table_row which led to
a shifted line in table, as shown below:
$ iwctl station wlan0 show | head | sed 's| |.|g'
.................................Station:.wlan0................................
--------------------------------------------------------------------------------
..Settable..Property..............Value..........................................
--------------------------------------------------------------------------------
............Scanning..............no...............................................
............State.................connected........................................
............Connected.network.....Clannad.Legacy...................................
............IPv4.address..........192.168.1.12.....................................
............IPv6.address........fdc3:541d:864f:0:96db:c9ff:fe36:b15............
............ConnectedBss..........cc:d8:43:77:91:0e................................
This patch aligns IPv6 address line with other lines in the table.
Fixes: 35dd2c08219a (client: update station to use display_table_row, 2022-07-07)
testEncryptedProfiles:
- This would occationally fail because the test is expecting
to explicitly connect but after the first failed connection
autoconnect takes over and its a race to connect.
testPSK-roam:
- Several rules were not being cleaned up which could cause
tests afterwards to fail
- The AP roam test started failing randomly because of the SNR
ranking changes. It appears that with hwsim _sometimes_ the
SNR is able to be determined which can effect the ranking. This
test assumed the two BSS's would be the same ranking but the
SNR sometimes causes this to not be true.
The wait_for_event() function allows past events to cause this
function to return immediately. This behavior is known, and
relied on for some tests. But in some cases you want to only
handle _new_ events, so we need a way to clear out prior events.
This event is not used anywhere and can be leveraged in autotesting.
Move the event to eapol_start() so it gets called unconditionally
when the 4-way handshake is started.
If a disconnect arrives at any point during the 4-way handshake or
key setting this would result in netdev sending a disconnect event
to station. If this is a reassociation this case is unhandled in
station and causes a hang as it expects any connection failure to
be handled via the reassociation callback, not a random disconnect
event.
To handle this case we can utilize netdev_disconnected() along with
the new NETDEV_RESULT_DISCONNECTED result to ensure the connect
callback gets called if it exists (indicating a pending connection)
Below are logs showing the "Unexpected disconnect event" which
prevents IWD from cleaning up its state and ultimately results in a
hang:
Jul 16 18:16:13: src/station.c:station_transition_reassociate()
Jul 16 18:16:13: event: state, old: connected, new: roaming
Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_done() Work item 65 done
Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_next() Starting work item 66
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Jul 16 18:16:13: src/netdev.c:netdev_deauthenticate_event()
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
Jul 16 18:16:13: src/station.c:station_netdev_event() Associating
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
Jul 16 18:16:13: src/netdev.c:netdev_authenticate_event()
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
Jul 16 18:16:13: src/netdev.c:netdev_associate_event()
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
Jul 16 18:16:13: src/netdev.c:netdev_connect_event()
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() aborting and ignore_connect_event not set, proceed
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() expect_connect_failure not set, proceed
Jul 16 18:16:13: src/netdev.c:parse_request_ies()
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() Request / Response IEs parsed
Jul 16 18:16:13: src/netdev.c:netdev_get_oci()
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_get_oci_cb() Obtained OCI: freq: 5220, width: 3, center1: 5210, center2: 0
Jul 16 18:16:13: src/eapol.c:eapol_start()
Jul 16 18:16:13: src/netdev.c:netdev_unicast_notify() Unicast notification Control Port Frame(129)
Jul 16 18:16:13: src/netdev.c:netdev_control_port_frame_event()
Jul 16 18:16:13: src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Control Port TX Status(139)
Jul 16 18:16:14: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Jul 16 18:16:14: src/netdev.c:netdev_cqm_event() Signal change event (above=1 signal=-60)
Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Jul 16 18:16:17: src/netdev.c:netdev_deauthenticate_event()
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Jul 16 18:16:17: src/netdev.c:netdev_disconnect_event()
Jul 16 18:16:17: Received Deauthentication event, reason: 15, from_ap: true
Jul 16 18:16:17: src/wiphy.c:wiphy_radio_work_done() Work item 66 done
Jul 16 18:16:17: src/station.c:station_disconnect_event() 6
Jul 16 18:16:17: Unexpected disconnect event
Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:17: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
Jul 16 18:16:17: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
After adding the NETDEV_RESULT_DISCONNECTED enum, handshake failures
initiated by the AP come in via this result so the existing logic
to call network_connect_failed() was broken. We could still get a
handshake failure generated internally, so that has been preserved
(via NETDEV_RESULT_HANDSHAKE_FAILED) but a check for a 4-way
handshake timeout reason code was also added.
This new event is sent during a connection if netdev recieves a
disconnect event. This patch cleans up station to handle this
case and leave the existing NETDEV_EVENT_DISCONNECTED_BY_{AP,SME}
handling only for CONNECTED, NETCONFIG, and FW_ROAMING states.
This new result is meant to handle cases where a disconnect
event (deauth/disassoc) was received during an ongoing connection.
Whether that's during authentication, association, the 4-way
handshake, or key setting.
src/p2putil.c: In function 'p2p_get_random_string':
src/p2putil.c:2641:37: error: initializer element is not constant 2641 |
static const int set_size = strlen(CHARSET); |
^~~~~~
This code path is not exercised in the autotest but commonly does
happen in the real world. There is no associated bug with this, but
its helpful to have this event triggered in case something got
introduced in the future.
The authenticating event was not used anymore and the associating
event use was questionable (after the CMD_CONNECT callback).
No other modules actually utilize these events but they are useful
for autotests. Move these events around to map 1:1 when the kernel
sends the auth/assoc events.
There are a few values which are nice to see in debug logs. Namely
the BSS load and SNR. Both of these values may not be available
either due to the AP or local hardware limiations. Rather than print
dummy values for these refactor the print so append the values only
if they are set in the scan result.
For ranking purposes the utilization was defaulted to a valid (127)
which would not change the rank if that IE was not found in the
scan results. Historically this was printed (debug) as part of the
scan results but that was removed as it was somewhat confusing. i.e.
did the AP _really_ have a utilization of 127? or was the IE not
found?
Since it is useful to see the BSS load if that is advertised add a
flag to the scan_bss struct to indicate if the IE was present which
can be checked.
This issues a GET_SURVEY dump after scan results are available and
populates the survey information within the scan results. Currently
the only value obtained is the noise for a given frequency but the
survey results structure was created if in the future more values
need to be added.
From the noise, the SNR can be calculated. This is then used in the
ranking calculation to help lower BSS ranks that are on high noise
channels.
Parsing the flush flag for external scans was not done correctly
as it was not parsing the ATTR_SCAN_FLAGS but instead the flag
bitmap. Fix this by parsing the flags attribute, then checking if
the bit is set.
Add a nested attribute parser. For the first supported attribute
add NL80211_ATTR_SURVEY_INFO.
This allows parsing of nested attributes in the same convenient
way as nl80211_parse_attrs but allows for support of any level of
nested attributes provided that a handler is added for each.
To prep for adding a _nested() variant of this function refactor
this to act on an l_genl_attr object rather than the message itself.
In addition a handler specific to the attribute being parsed is
now passed in, with the current "handler_for_type" being renamed to
"handler_for_nl80211" that corresponds to root level attributes.
This warning is guaranteed to happen for SAE networks where there are
multiple netdev_authenticate_events. This should just be a check so
we don't register eapol twice, not a warning.
EAP-TTLS Start packets are empty by default, but can still be sent with
the L flag set. When attempting to reassemble a message we should not
fail if the length of the message is 0, and just treat it as any other
unfragmented message with the L flag set.
This test uses the same country/country3 values seen by an AP vendor
which causes issues with IWD. The alpha2 is ES (Spain) and the 3rd
byte is 4, indicating to use the E-4. The issue then comes when the
neighbor report claims the BSS is under operating class 3 which is
not part of E-4.
With the fallback implemented, this test will pass since it will
try and lookup only on ES (the EU table) which operating class 3 is
part of.
Its been seen that some vendors incorrectly set the 3rd byte of the
country code which causes the band lookup to fail with the provided
operating class. This isn't compliant with the spec, but its been
seen out in the wild and it causes IWD to behave poorly, specifically
with roaming since it cannot parse neighbor reports. This then
requires IWD to do a full scan on each roam.
Instead of a hard rejection, IWD can instead attempt to determine
the band by ignoring that 3rd byte and only use the alpha2 string.
This makes IWD slightly less strict but at the advantage of not being
crippled when exposed to poor AP configurations.
This was added to support a single buggy AP model that failed to
negotiate the SAE group correctly. This may still be a problem but
since then the [Network].UseDefaultEccGroup option has been added
which accomplishes the same thing.
Remove the special handling for this specific OUI and rely on the
user setting the new option if they have problems.
Both ell/shared and ell/internal targets first create the ell/
directory within IWD. This apparently was just luck that one of
these always finished first in parallel builds. On my system at
least when building using dpkg-buildpackage IWD fails to build
due to the ell/ directory missing. From the logs it appears that
both the shared/internal targets were started but didn't complete
(or at least create the directory) before the ell/ell.h target:
make[1]: Entering directory '/home/jprestwood/tmp/iwd'
/usr/bin/mkdir -p ell
/usr/bin/mkdir -p ell
echo -n > ell/ell.h
/usr/bin/mkdir -p src
/bin/bash: line 1: ell/ell.h: No such file or directory
make[1]: *** [Makefile:4028: ell/ell.h] Error 1
Creating the ell/ directory within the ell/ell.h target solve
the issue. For reference this is the configure command dpkg
is using:
./configure --build=x86_64-linux-gnu \
--prefix=/usr \
--includedir=/usr/include \
--mandir=/usr/share/man \
--infodir=/usr/share/info \
--sysconfdir=/etc \
--localstatedir=/var \
--disable-option-checking \
--disable-silent-rules \
--libdir=/usr/lib/x86_64-linux-gnu \
--runstatedir=/run \
--disable-maintainer-mode \
--disable-dependency-tracking \
--enable-tools \
--enable-dbus-policy
Experimental AP-mode support for receiving a Confirm frame when in the
COMMITTED state. The AP will reply with a Confirm frame.
Note that when acting as an AP, on reception of a Commit frame, the AP
only replies with a Commit frame. The protocols allows to also already
send the Confirm frame, but older clients may not support simultaneously
receiving a Commit and Confirm frame.
Don't mark either client as being the authenticator. In the current unit
tests, both instances act as clients to test functionality. This ensures
the unit does not show an error during the following commits where SAE
for AP mode is added.
This was overlooked in a prior patch and causes warnings to be
printed when the RSSI is too low to estimate an HE data rate or
due to incompatible local capabilities (e.g. MCS support).
Similar to the other estimations, return -ENETUNREACH if the IE
was valid but incompatible.
If the RSSI is too low or the local capabilities were not
compatible to estimate the rate don't warn but instead treat
this the same as -ENOTSUP and drop down to the next capability
set.
If we register the main EAPOL frame listener as late as the associate
event, it may not observe ptk_1_of_4. This defeats handling for early
messages in eapol_rx_packet, which only sees messages once it has been
registered.
If we move registration to the authenticate event, then the EAPOL
frame listeners should observe all messages, without any possible
races. Note that the messages are not actually processed until
eapol_start() is called, and we haven't moved that call site. All
that's changing here is how early EAPOL messages can be observed.
netdev_disconnect() was unconditionally sending CMD_DISCONNECT which
is not the right behavior when IWD has not associated. This means
that if a connection was started then immediately canceled with
the Disconnect() method the kernel would continue to authenticate.
Instead if IWD has not yet associated it should send a deauth
command which causes the kernel to correctly cleanup its state and
stop trying to authenticate.
Below are logs showing the behavior. Autoconnect is started followed
immediately by a DBus Disconnect call, yet the kernel continues
sending authenticate events.
event: state, old: autoconnect_quick, new: connecting (auto)
src/scan.c:scan_cancel() Trying to cancel scan id 1 for wdev 7d
src/wiphy.c:wiphy_radio_work_done() Work item 1 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/station.c:station_dbus_disconnect()
src/station.c:station_reset_connection_state() 85
src/station.c:station_roam_state_clear() 85
event: state, old: connecting (auto), new: disconnecting
src/wiphy.c:wiphy_radio_work_done() Work item 2 done
src/station.c:station_connect_cb() 85, result: 5
src/station.c:station_disconnect_cb() 85, success: 1
event: state, old: disconnecting, new: disconnected
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
In most cases any failure here is likely just due to the AP not
supporting the feature, whether its HE/VHT/HE. This should result
in the estimation returning -ENOTSUP in which case we move down
the list. Any other non-zero return we will now warn to make it
clear the IEs did exist, but were not properly formatted.
All length check failures were changed to continue instead of
fail. This will now treat invalid lengths as if the IE did not
exist.
In addition HE specifically has an extra validation function which,
if failed, was bailing out of the estimation function entirely.
Instead this is now treated as if there was no HE capabilities and
the logic can move down to VHT, HT, or basic rates.
This was changed from too large of a mask (0xff) in an earlier
commit but was masking 5 bits instead of 6.
Fixes: 121c2c5653 ("monitor: properly mask HE capabilities bitfield")
Caught by static analysis, the dev->conn_peer pointer was being
dereferenced very early on without a NULL check, but further it
was being NULL checked. If there is a possibility of it being NULL
the check should be done much earlier.
If the test needs to do something very specific it may be useful to
prevent IWD from managing all the radios. This can now be done
by setting a "reserve" option in the radio settings. The value of
this should be something other than iwd, hostapd, or wpa_supplicant.
For example:
[rad1]
reserve=false
Caught by static analysis, if ATTR_MAC was not in the message there
would be a memcpy with uninitialized bytes. In addition there is no
reason to memcpy twice. Instead 'mac' can be a const pointer which
both verifies it exists and removes the need for a second memcpy.
Static analysis complains that 'last' could be NULL which is true.
This really could only happen if every frequency was disabled which
likely is impossible but in any case, check before dereferencing
the pointer.
Since these are all stack variables they are not zero initialized.
If parsing fails there may be invalid pointers within the structures
which can get dereferenced by p2p_clear_*
The input queue pointer was being initialized unconditionally so if
parsing fails the out pointer is still set after the queue is
destroyed. This causes a crash during cleanup.
Instead use a temporary pointer while parsing and only after parsing
has finished do we set the out pointer.
Reported-By: Alex Radocea <alex@supernetworks.org>
When HUP is received the IO read callback was never completing which
caused it to block indefinitely until waited for. This didn't matter
for most transient processes but for IWD, hostapd, wpa_supplicant
it would cause test-runner to hang if the process crashed.
Detecting a crash is somewhat hacky because we have no process
management like systemd and the return code isn't reliable as some
processes return non-zero under normal circumstances. So to detect
a crash the process output is being checked for the string:
"++++++++ backtrace ++++++++". This isn't 100% reliable obviously
since its dependent on how the binary is compiled, but even if the
crash itself isn't detected any test should still fail if written
correctly.
Doing this allows auto-tests to handle IWD crashes gracefully by
failing the test, printing the exception (event without debugging)
and continue with other tests.
The slaac_test was one that would occationally fail, but very rarely,
due to the resolvconf log values appearing in an unexpected order.
This appears to be related to a typo in netconfig-commit which would
not set netconfig-domains and instead set dns_list. This was fixed
with a pending patch:
https://lore.kernel.org/iwd/20240227204242.1509980-1-denkenz@gmail.com/T/#u
But applying this now leads to testNetconfig failing slaac_test
100% of the time.
I'm not familiar enough with resolveconf to know if this test change
is ok, but based on the test behavior the expected log and disk logs
are the same, just in the incorrect order. I'm not sure if this the
log order is deterministic so instead the check now iterates the
expected log and verifies each value appears once in the resolvconf
log.
Here is an example of the expected vs disk logs after running the
test:
Expected:
-a wlan1.dns
nameserver 192.168.1.2
nameserver 3ffe:501:ffff💯:10
nameserver 3ffe:501:ffff💯:50
-a wlan1.domain
search test1
search test2
Resolvconf log:
-a wlan1.domain
search test1
search test2
-a wlan1.dns
nameserver 192.168.1.2
nameserver 3ffe:501:ffff💯:10
nameserver 3ffe:501:ffff💯:50
static analysis complains that authenticator is used uninitialized.
This isn't strictly true as memory region is reserved for the
authenticator using the contents of the passed in structure. This
region is then overwritten once the authenticator is actually computed
by authenticator_put(). Silence this warning by explicitly setting
authenticator bytes to 0.
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
This shouldn't be possible in theory since the roam_bss_list being
iterated is a subset of entire scan_bss list station/network has
but to be safe, and catch any issues due to future changes warn on
this condition.
For some encrypt operations DPP passes no AD iovecs (both are
NULL/0). But since the iovec itself is on the stack 'ad' is a
valid pointer from within aes_siv_encrypt. This causes memcpy
to be called which coverity complains about. Since the copy
length is zero it was effectively a no-op, but check num_ad to
prevent the call.
Tests the 3 possible options to UseDefaultEccGroup behave as
expected:
- When not provided use the "auto" behavior.
- When false, always use higher order groups
- When true, always use default group
The SAE test made some assumptions on certain conditions due to
there being no way of checking if those conditions were met
Mainly the use of H2E/hunt-and-peck.
We assumed that when we told hostapd to use H2E or hunt/peck it
would but in reality it was not. Hostapd is apparently not very
good at swapping between the two with a simple "reload" command.
Once H2E is enabled it appears that it cannot be undone.
Similarly the vendor elements seem to carry over from test to
test, and sometimes not which causes unintended behavior.
To fix this create separate APs for the specific scenario being
tested:
- Hunt and peck
- H2E
- Special vendor_element simulating buggy APs
Another issue found was that if password identifies are used
hostapd automatically chooses H2E which was not intented, at
least based on the test names (in reality it wasn't causing any
problems).
The tests have also been improved to use hostapds "sta_status"
command which contains the group number used when authenticating,
so now that at least can be verified.
In order to complete the learned default group behavior station needs
to be aware of when an SAE/OWE connection retried. This is all
handled within netdev/sae so add a new netdev event so station can
set the appropriate network flags to prevent trying the non-default
group again.
If either the settings specify it, or the scan_bss is flagged, set
the use_default_ecc_group flag in the handshake.
This also renames the flag to cover both OWE and SAE
There is special handling for buggy OWE APs which set a network flag
to use the default OWE group. Utilize the more persistent setting
within known-networks as well as the network object (in case there
is no profile).
This also renames the get/set APIs to be generic to ECC groups rather
than only OWE.
This adds the option [Settings].UseDefaultEccGroup which allows a
network profile to specify the behavior when using an ECC-based
protocol. If unset (default) IWD will learn the behavior of the
network for the lifetime of its process.
Many APs do not support group 20 which IWD tries first by default.
This leads to an initial failure followed by a retry using group 19.
This option will allow the user to configure IWD to use group 19
first or learn the network capabilities, if the authentication fails
with group 20 IWD will always use group 19 for the process lifetime.
The information specific to auth/assoc/connect timeouts isn't
communicated to station so emit the notice events within netdev.
We could communicate this to station by adding separate netdev
events, but this does not seem worth it for this use case as
these notice events aren't strictly limited to station.
For anyone debugging or trying to identify network infrastructure
problems the IWD DBus API isn't all that useful and ultimately
requires going through debug logs to figure out exactly what
happened. Having a concise set of debug logs containing only
relavent information would be very useful. In addition, having
some kind of syntax for these logs to be parsed by tooling could
automate these tasks.
This is being done, starting with station, by using iwd_notice
which internally uses l_notice. The use of the notice log level
(5) in IWD will be strictly for the type of messages described
above.
iwd_notice is being added so modules can communicate internal
state or event information via the NOTICE log level. This log
level will be reserved in IWD for only these type of messages.
The iwd_notice macro aims to help enforce some formatting
requirements for these type of log messages. The messages
should be one or more comma-separated "key: value" pairs starting
with "event: <name>" and followed by any additional info that
pertains to that event.
iwd_notice only enforces the initial event key/value format and
additional arguments are left to the caller to be formatted
correctly.
The --logger,-l flag can now be used to specify the logger type.
Unset (default) will set log output to stderr as it is today. The
other valid options are "syslog" and "journal".
basename use is considered harmful. There are two versions of
basename (see man 3 basename for details). The more intuitive version,
which is currently being used inside wiphy.c, is not supported by musl
libc implementation. Use of the libgen version is not preferred, so
drop use of basename entirely. Since wiphy.c is the only call site of
basename() inside iwd, open code the required logic.
ELL now has a setting to limit the number of DHCP attempts. This
will now be set in IWD and if reached will result in a failure
event, and in turn a disconnect.
IWD will set a maximum of 4 retries which should keep the maximum
DHCP time to ~60 seconds roughly.
The known frequency list is now a sorted list and the roam scan
results were not complying with this new requirement. The fix is
easy though since the iteration order of the scan results does
not matter (the roam candidates are inserted by rank). To fix
the known frequencies order we can simply reverse the scan results
list before iterating it.
When operating as an AP, drop message 4 of the 4-way handshake if the AP
has not yet received message 2. Otherwise an attacker can skip message 2
and immediately send message 4 to bypass authentication (the AP would be
using an all-zero ptk to verify the authenticity of message 4).
Modify the existing frequency test to check that the ordering
lines up with the ranking of the BSS.
Add a test to check that quick scans limit the number of known
frequencies.
In very large network deployments there could be a vast amount of APs
which could create a large known frequency list after some time once
all the APs are seen in scan results. This then increases the quick
scan time significantly, in the very worst case (but unlikely) just
as long as a full scan.
To help with this support in knownnetworks was added to limit the
number of frequencies per network. Station will now only get 5
recent frequencies per network making the maximum frequencies 25
in the worst case (~2.5s scan).
The magic values are now defines, and the recent roam frequencies
was also changed to use this define as well.
In order to support an ordered list of known frequencies the list
should be in order of last seen BSS frequencies with the highest
ranked ones first. To accomplish this without adding a lot of
complexity the frequencies can be pushed into the list as long as
they are pushed in reverse rank order (lowest rank first, highest
last). This ensures that very high ranked BSS's will always get
superseded by subsequent scans if not seen.
This adds a new network API to update the known frequency list
based on the current newtork->bss_list. This assumes that station
always wipes the BSS list on scans and populates with only fresh
BSS entries. After the scan this API can be called and it will
reverse the list, then add each frequency.
I've had connections to a WPA3-Personal only network fail with no log
message from iwd, and eventually figured out to was because the driver
would've required using CMD_EXTERNAL_AUTH. With the added log messages
the reason becomes obvious.
Additionally the fallback may happen even if the user explicitly
configured WPA3 in NetworkManager, I believe a warning is appropriate
there.
There was an unhandled corner case if netconfig was running and
multiple roam conditions happened in sequence, all before netconfig
had completed. A single roam before netconfig was already handled
(23f0f5717c) but this did not take into account any additional roam
conditions.
If IWD is in this state, having started netconfig, then roamed, and
again restarted netconfig it is still in a roaming state which will
prevent any further roams. IWD will remain "stuck" on the current
BSS until netconfig completes or gets disconnected.
In addition the general state logic is wrong here. If IWD roams
prior to netconfig it should stay in a connecting state (from the
perspective of DBus).
To fix this a new internal station state was added (no changes to
the DBus API) to distinguish between a purely WiFi connecting state
(STATION_STATE_CONNECTING/AUTO) and netconfig
(STATION_STATE_NETCONFIG). This allows IWD roam as needed if
netconfig is still running. Also, some special handling was added so
the station state property remains in a "connected" state until
netconfig actually completes, regardless of roams.
For some background this scenario happens if the DHCP server goes
down for an extended period, e.g. if its being upgraded/serviced.
This fixes a build break on some systems, specifically the
raspberry Pi 3 (ARM):
monitor/main.c: In function ‘open_packet’:
monitor/main.c:176:3: error: implicit declaration of function ‘close’; did you mean ‘pclose’? [-Werror=implicit-function-declaration]
176 | close(fd);
| ^~~~~
| pclose
This was caused by the unused hostapd instance running after being
re-enabled by mistake. This cause an additional scan result with the
same rank to be seen which would then be connected to by luck of the
draw.
This really needs to be done to many more autotests but since this
one seems to have random failures ensure that all the tests still
run if one fails. In addition add better cleanup for hwsim rules.
This gives the tests a lot more fine-tune control to wait for
specific state transitions rather than only what is exposed over
DBus.
The additional events for "ft-roam" and "reassoc-roam" were removed
since these are now covered by the more generic state change events
("ft-roaming" and "roaming" respectively).
To support multiple nlmon sources, move the logic that reads from iwmon
device into main.c instead of nlmon. nlmon.c now becomes agnostic of
how the packets are actually obtained. Packets are fed in via
high-level APIs such as nlmon_print_rtnl, nlmon_print_genl,
nlmon_print_pae.
The current implementation inside nlmon_receive is asymmetrical. RTNL
packets are printed using nlmon_print_rtnl while GENL packets are
printed using nlmon_message.
nlmon_print_genl and nlmon_print_rtnl already handle iterating over data
containing multiple messages, and are used by nlmon started in reader
mode. Use these for better symmetry inside nlmon_receive.
While here, move store_netlink() call into nlmon_print_rtnl. This makes
handling of PCAP output symmetrical for both RTNL and GENL packets.
This also fixes a possibility where only the first message of a
multi-RTNL packet would be stored.
nlmon_print_genl invokes genl_ctrl when a generic netlink control
message is encountered. genl_ctrl() tries to filter nl80211 family
appearance messages and setup nlmon->id with the extracted family id.
However, the id is already provided inside main.c by using nlmon_open,
and no control messages are processed by nlmon in 'capture' mode (-r
command line argument not passed) since all genl messages go through
nlmon_message() path instead.
configure scripts need to be runnable with a POSIX-compliant /bin/sh.
On many (but not all!) systems, /bin/sh is provided by Bash, so errors
like this aren't spotted. Notably Debian defaults to /bin/sh provided
by dash which doesn't tolerate such bashisms as '+='.
This retains compatibility with bash. Just copy the expanded append like
we do on the line above.
Fixes warnings like:
```
./configure: 13352: CFLAGS+= -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2: not found
```
In order to test that extra settings are applied prior to connecting
two tests were added for hidden networks as well as one testing if
there is already an existing profile after DPP.
The reason hidden networks were used was due to the requirement of
the "Hidden" settings in the profile. If this setting doesn't get
sync'ed to disk the connection will fail.
Before this change DPP was writing the credentials both to disk
and into the network object directly. This allowed the connection
to work fine but additional settings were not picked up due to
network_set_passphrase/psk loading the settings before they were
written.
Instead DPP can avoid setting the credentials to the network
object entirely and just write them to disk. Then, wait for
known networks to notify that the profile was either created
or updated then DPP can proceed to connecting. network_autoconnect()
will take care of loading the profile that DPP wrote and remove the
need for DPP to touch the network object at all.
One thing to note is that an idle callback is still needed from
within the known networks callback. This is because a new profile
requires network.c to set the network_info which is done in the
known networks callback. Rather than assume that network.c will be
called into before dpp.c an l_idle was added.
If a known network is modified on disk known networks does not have
any way of notifying other modules. This will be needed to support a
corner case in DPP if a profile exists but is overwritten after DPP
configuration. Add this event to known networks and handle it in
network.c (though nothing needs to be done in that case).
Without the change test-dpp fails on aarch64-linux as:
$ unit/test-dpp
TEST: DPP test responder-only key derivation
TEST: DPP test mutual key derivation
TEST: DPP test PKEX key derivation
test-dpp: unit/test-dpp.c:514: test_pkex_key_derivation: Assertion `!memcmp(tmp, __tmp, 32)' failed.
This happens due to int/size_t type mismatch passed to vararg
parameters to prf_plus():
bool prf_plus(enum l_checksum_type type, const void *key, size_t key_len,
void *out, size_t out_len,
size_t n_extra, ...)
{
// ...
va_start(va, n_extra);
for (i = 0; i < n_extra; i++) {
iov[i + 1].iov_base = va_arg(va, void *);
iov[i + 1].iov_len = va_arg(va, size_t);
// ...
Note that varargs here could only be a sequence of `void *` / `size_t`
values.
But in src/dpp-util.c `iwd` attempted to pass `int` there:
prf_plus(sha, prk, bytes, z_out, bytes, 5,
mac_i, 6, // <- here
mac_r, 6, // <- and here
m_x, bytes,
n_x, bytes,
key, strlen(key));
aarch64 stores only 32-bit value part of the register:
mov w7, #0x6
str w7, [sp, #...]
and loads full 64-bit form of the register:
ldr x3, [x3]
As a result higher bits of `iov[].iov_len` contain unexpected values and
sendmsg sends a lot more data than expected to the kernel.
The change fixes test-dpp test for me.
While at it fixed obvious `int` / `size_t` mismatch in src/erp.c.
Fixes: 6320d6db0f ("crypto: remove label from prf_plus, instead use va_args")
The path argument was used purely for debugging. It can be just as
informational printing just the SSID of the profile that failed to
parse the setting without requiring callers allocate a string to
call the function.
Certain tests may require external processes to work
(e.g. testNetconfig) and if missing the test will just hang until
the maximum test timeout. Check in start_process if the exe
actually exists and if not throw an exception.
In order to support identifiers the test profiles needed to be
reworked due to hostapd allowing multiple password entires. You
cannot just call set_value() with a new entry as the old ones
still exist. Instead use a unique password for the identifier and
non-identifier use cases.
After adding this test the failure_test started failing due to
hostapd not starting up. This was due to the group being unsupported
but oddly only when hostapd was reloaded (running the test
individually worked). To fix this the group number was changed to 21
which hostapd does support but IWD does not.
Adds a new network profile setting [Security].PasswordIdentifier.
When set (and the BSS enables SAE password identifiers) the network
and handshake object will read this and use it for the SAE
exchange.
Building the handshake will fail if:
- there is no password identifier set and the BSS sets the
"exclusive" bit.
- there is a password identifier set and the BSS does not set
the "in-use" bit.
Using this will provide netdev with a connect callback and unify the
roaming result notification between FT and reassociation. Both paths
will now end up in station_reassociate_cb.
This also adds another return case for ft_handshake_setup which was
previously ignored by ft_associate. Its likely impossible to actually
happen but should be handled nevertheless.
Fixes: 30c6a10f28 ("netdev: Separate connect_failed and disconnected paths")
Essentially exposes (and renames) netdev_ft_tx_associate in order to
be called similarly to netdev_reassociate/netdev_connect where a
connect callback can be provided. This will fix the current bug where
if association times out during FT IWD will hang and never transition
to disconnected.
This also removes the calling of the FT_ROAMED event and instead just
calls the connect callback (since its now set). This unifies the
callback path for reassociation and FT roaming.
This will be called from station after FT-authentication has
finished. It sets up the handshake object to perform reassociation.
This is essentially a copy-paste of ft_associate without sending
the actual frame.
In general only the authenticator FTE is used/validated but with
some FT refactoring coming there needs to be a way to build the
supplicants FTE into the handshake object. Because of this there
needs to be separate FTE buffers for both the authenticator and
supplicant.
The default() method was added for convenience but was extending the
test times significantly when the hostapd config was lengthy. This
was because it called set_value for every value regardless if it
had changed. Instead store the current configuration and in default()
only reset values that differ.
This tests ensures IWD disconnects after receiving an association
timeout event. This exposes a current bug where IWD does not
transition to disconnected after an association timeout when
FT-roaming.
If tests end in an unknown state it is sometimes required that IWD
be stopped manually in order for future tests to run. Add a stop()
method so test tearDown() methods can explicitly stop IWD.
The path for IWD to call this doesn't ever happen in autotests
but during debugging of the DPP agent it was noticed that the
DBus signature was incorrect and would always result in an error
when calling from IWD.
For adding SAE password identifiers the capability bits need to be
verified when loading the identifier from the profile. Pass the
BSS object in to network_load_psk rather than the 'need_passphrase'
boolean.
iov_ie_append assumed that a single IE was being added and thus the
length of the IE could be extracted directly from the element. However,
iov_ie_append was used on buffers which could contain multiple IEs
concatenated together, for example in handshake_state::vendor_ies. Most
of the time this was safe since vendor_ies was NULL or contained a
single element, but would result in incorrect behavior in the general
case. Fix that by changing iov_ie_append signature to take an explicit
length argument and have the caller specify whether the element is a
single IE or multiple.
Fixes: 7e9971661bcb ("netdev: Append any vendor IEs from the handshake")
Use an _auto_ variable to cleanup IEs allocated by
p2p_build_association_req(). While here, take out unneeded L_WARN_ON
since p2p_build_association_req cannot fail.
If the FT-Authenticate frame has been sent then a deauth is received
the work item for sending the FT-Associate frame is never canceled.
When this runs station->connected_network is NULL which causes a
crash:
src/station.c:station_try_next_transition() 7, target xx:xx:xx:xx:xx:xx
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5843
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5844
src/wiphy.c:wiphy_radio_work_done() Work item 5842 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5843
src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
src/ft.c:ft_send_authenticate()
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/netdev.c:netdev_link_notify() event 16 on ifindex 7
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_deauthenticate_event()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
Received Deauthentication event, reason: 7, from_ap: true
src/station.c:station_disconnect_event() 7
src/station.c:station_disassociated() 7
src/station.c:station_reset_connection_state() 7
src/station.c:station_roam_state_clear() 7
src/netconfig.c:netconfig_event_handler() l_netconfig event 2
src/netconfig-commit.c:netconfig_commit_print_addrs() removing address: yyy.yyy.yyy.yyy
src/resolve.c:resolve_systemd_revert() ifindex: 7
[DHCPv4] l_dhcp_client_stop:1264 Entering state: DHCP_STATE_INIT
src/station.c:station_enter_state() Old State: connected, new state: disconnected
src/station.c:station_enter_state() Old State: disconnected, new state: autoconnect_quick
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5845
src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
src/wiphy.c:wiphy_radio_work_done() Work item 5843 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5844
"Program terminated with signal SIGSEGV, Segmentation fault.",
"#0 0x0000565359ee3f54 in network_bss_find_by_addr ()",
"#0 0x0000565359ee3f54 in network_bss_find_by_addr ()",
"#1 0x0000565359ec9d23 in station_ft_work_ready ()",
"#2 0x0000565359ec0af0 in wiphy_radio_work_next ()",
"#3 0x0000565359f20080 in offchannel_mlme_notify ()",
"#4 0x0000565359f4416b in received_data ()",
"#5 0x0000565359f40d90 in io_callback ()",
"#6 0x0000565359f3ff4d in l_main_iterate ()",
"#7 0x0000565359f4001c in l_main_run ()",
"#8 0x0000565359f40240 in l_main_run_with_signal ()",
"#9 0x0000565359eb3888 in main ()"
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: d938d362b212 ("erp: ERP implementation and key cache move")
Fixes: 433373fe28a4 ("eapol: cache ERP keys on EAP success")
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: 1f1478285725 ("wiphy: add _generate_address_from_ssid")
Fixes: 5a1b1184fca6 ("netdev: support per-network MAC addresses")
In netdev_retry_owe, if l_gen_family_send fails, the connect_cmd is
never freed or reset. Fix that.
While here, use a stack variable instead of netdev member, since the use
of such a member is unnecessary and confusing.
vendor_ies stored in handshake_state are already added as part of
netdev_populate_common_ies(), which is already invoked by
netdev_build_cmd_connect().
Normally vendor_ies is NULL for OWE connections, so no IEs are
duplicated as a result.
CC src/adhoc.o
In file included from src/adhoc.c:28:0:
/usr/include/linux/if.h:234:19: error: field ‘ifru_addr’ has incomplete type
struct sockaddr ifru_addr;
^
/usr/include/linux/if.h:235:19: error: field ‘ifru_dstaddr’ has incomplete type
struct sockaddr ifru_dstaddr;
^
/usr/include/linux/if.h:236:19: error: field ‘ifru_broadaddr’ has incomplete type
struct sockaddr ifru_broadaddr;
^
/usr/include/linux/if.h:237:19: error: field ‘ifru_netmask’ has incomplete type
struct sockaddr ifru_netmask;
^
/usr/include/linux/if.h:238:20: error: field ‘ifru_hwaddr’ has incomplete type
struct sockaddr ifru_hwaddr;
^
Very rarely on ath10k (potentially other ath cards), disabling
power save while the interface is down causes a timeout when
bringing the interface back up. This seems to be a race in the
driver or firmware but it causes IWD to never start up properly
since there is no retry logic on that path.
Retrying is an option, but a more straight forward approach is
to just reorder the logic to set power save off after the
interface is already up. If the power save setting fails we can
just log it, ignore the failure, and continue. From a users point
of view there is no real difference in doing it this way as
PS still gets disabled prior to IWD connecting/sending data.
Changing behavior based on a buggy driver isn't something we
should be doing, but in this instance the change shouldn't have
any downside and actually isn't any different than how it has
been done prior to the driver quirks change (i.e. use network
manager, iw, or iwconfig to set power save after IWD starts).
For reference, this problem is quite rare and difficult to say
exactly how often but certainly <1% of the time:
iwd[1286641]: src/netdev.c:netdev_disable_ps_cb() Disabled power save for ifindex 54
kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
iwd[1286641]: Error bringing interface 54 up: Connection timed out
kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
After this IWD just sits idle as it has no interface to start using.
This is even reproducable outside of IWD if you loop and run:
ip link set <wlan> down
iw dev <wlan> set power_save off
ip link set <wlan> up
Eventually the 'up' command will fail with a timeout.
I've brought this to the linux-wireless/ath10k mailing list but
even if its fixed in future kernels we'd still need to support
older kernels, so a workaround/change in IWD is still required.
This is done already for DPP, do the same for PKEX. Few drivers
(ath9k upstream, ath10k/11k in progress) support this which is
unfortunate but since a configurator will not work without this
capability its best to fail early.
The DPP spec allows 3rd party fields in the DPP configuration
object (section 4.5.2). IWD can take advantage of this (when
configuring another IWD supplicant) to communicate additional
profile options that may be required for the network.
The new configuration member will be called "/net/connman/iwd"
and will be an object containing settings specific to IWD.
More settings could be added here if needed but for now only
the following are defined:
{
send_hostname: true/false,
hidden: true/false
}
These correspond to the following network profile settings:
[IPv4].SendHostname
[Settings].Hidden
The scan result handling was fragile because it assumed the kernel
would only give results matching the requested SSID. This isn't
something we should assume so instead keep the configuration object
around until after the scan and use the target SSID to lookup the
network.
Nearly every use of the ssid member first has to memcpy it to a
buffer and NULL terminate. Instead just store the ssid as a
string when creating/parsing from JSON.
The DPP-PKEX spec provides a very limited list of frequencies used
to discover configurators, only 3 on 2.4 and 5GHz bands. Since
configurators (at least in IWD's implementation) are only allowed
on the current operating frequency its very unlikely an enrollee
will find a configurator on these frequencies out of the entire
spectrum.
The spec does mention that the 3 default frequencies should be used
"In lieu of specific channel information obtained in a manner outside
the scope of this specification, ...". This allows the implementation
some flexibility in using a broader range of frequencies.
To increase the chances of finding a configurator shared code
enrollees will first issue a scan to determine what access points are
around, then iterate these frequencies. This is especially helpful
when the configurators are IWD-based since we know that they'll be
on the same channels as the APs in the area.
The post-DPP connection was never done quite right due to station's
state being unknown. The state is now tracked in DPP by a previous
patch but the scan path in DPP is still wrong.
It relies on station autoconnect logic which has the potential to
connect to a different network than what was configured with DPP.
Its unlikely but still could happen in theory. In addition the scan
was not selectively filtering results by the SSID that DPP
configured.
This fixes the above problems by first filtering the scan by the
SSID. Then setting the scan results into station without triggering
autoconnect. And finally using network_autoconnect() directly
instead of relying on station to choose the SSID.
DPP (both DPP and PKEX) run the risk of odd behavior if station
decides to change state. DPP is completely unaware of this and
best case would just result in a protocol failure, worst case
duplicate calls to __station_connect_network.
Add a station watch and stop DPP if station changes state during
the protocol.
Commit c59669a366c5 ("netdev: disambiguate between disconnection types")
introduced different paths for different types of disconnection
notifications from netdev. Formalize this further by having
netdev_connect_failed only invoke connect_cb.
Disconnections that could be triggered outside of connection
related events are now handled on a different code path. For this
purpose, netdev_disconnected() is introduced.
When a roam event is received, iwd generates a firmware scan request and
notifies its event filter of the ROAMING condition. In cases where the
firmware scan could not be started successfully, netdev_connect_failed
is invoked. This is not a correct use of netev_connect_failed since it
doesn't actually disconnect the underlying netdev and the reflected
state becomes de-synchronized from the underlying kernel device.
The firmware scan request could currently fail for two reasons:
1. nl80211 genl socket is in a bad state, or
2. the scan context does not exist
Since both reasons are highly unlikely, simply use L_WARN instead.
The other two cases where netdev_connect_failed is used could only occur
if the kernel message is invalid. The message is ignored in that case
and a warning is printed.
The situation described above also exists in netdev_get_fw_scan_cb. If
the scan could not be completed successfully, there's not much iwd can
do to recover. Have iwd remain in roaming state and print an error.
There are generally three scenarios where iwd generates a disconnection
command to the kernel:
1. Error conditions stemming from a connection related event. For
example if SAE/FT/FILS authentication fails during Authenticate or
Associate steps and the kernel doesn't disconnect properly.
2. Deauthentication after the connection has been established and not
related to a connection attempt in progress. For example, SA Query
processing that triggers an disconnect.
3. Disconnects that are triggered due to a handshake failure or if
setting keys resulting from the handshake fails. These disconnects
can be triggered as a result of a pending connection or when a
connection has been established (e.g. due to rekeying).
Distinguish between 1 and 2/3 by having the disconnect procedure take
different paths. For now there are no functional changes since all
paths end up in netdev_connect_failed(), but this will change in the
future.
While here, also get rid of netdev_del_station. The only user of this
function was in ap.c and it could easily be replaced by invoking the new
nl80211_build_del_station function. The callback used by
netdev_build_del_station only printed an error and didn't do anything
useful. Get rid of it for now.
netdev_begin_connection() already invokes netdev_connect_failed on
error. Remove any calls to netdev_connect_failed in callers of
netdev_begin_connection().
Fixes: 4165d9414f54 ("netdev: use wiphy radio work queue for connections")
If netdev_get_oci fails, a goto deauth is invoked in order to terminate
the current connection and return an error to the caller. Unfortunately
the deauth label builds CMD_DEAUTHENTICATE in order to terminate the
connection. This was fine because it used to handle authentication
protocols that ran over CMD_AUTHENTICATE and CMD_ASSOCIATE. However,
OCI can also be used on FullMAC hardware that does not support them.
Use CMD_DISCONNECT instead which works everywhere.
Fixes: 06482b811626 ("netdev: Obtain operating channel info")
The reason code field was being obtained as a uint8_t value, while it is
actually a uint16_t in little-endian byte order.
Fixes: f3cc96499c44 ("netdev: added support for SA Query")
The reason code from deauthentication frame was being obtained as a
uint8_t instead of a uint16_t. The value was only ever used in an
informational statement. Since the value was in little endian, only the
first 8 bits of the reason code were obtained. Fix that.
Fixes: 2bebb4bdc7ee ("netdev: Handle deauth frames prior to association")
Several tests do not pass due to some additional changes that have
not been merged. Remove these cases and add some hardening after
discovering some unfortunate wpa_supplicant behavior.
- Disable p2p in wpa_supplicant. With p2p enabled an extra device
is created which starts receiving DPP frames and printing
confusing messages.
- Remove extra asserts which don't make sense currently. These
will be added back later as future additions to PKEX are
upstreamed.
- Work around wpa_supplicant retransmit limitation. This is
described in detail in the comment in pkex_test.py
- wait_for_event was returning a list in certain cases, not the
event itself
- The configurator ID was not being printed (',' instead of '%')
- The DPP ID was not being properly waited for with PKEX
With the addition of DPP PKEX autotests some of the timeouts are
quite long and hit test-runners maximum timeouts. For UML we should
allow this since time-travel lets us skip idle waits. Move the test
timeout out of a global define and into the argument list so QEMU
and UML can define it differently.
The StartConfigurator() call was left out since there would be no
functional difference to the user in iwctl. Its expected that
human users of the shared code API provide the code/id ahead of
time, i.e. use ConfigureEnrollee/StartEnrollee.
Check that enough space for newline and 0-byte is left in line.
This fixes a buffer overflow on specific completion results.
Reported-By: Leona Maroni <dev@leona.is>
Adds a configurator variant to be used along side an agent. When
called the configurator will start and wait for an initial PKEX
exchange message from an enrollee at which point it will request
the code from an agent. This provides more flexibility for
configurators that are capable of configuring multiple enrollees
with different identifiers/codes.
Note that the timing requirements per the DPP spec still apply
so this is not meant to be used with a human configurator but
within an automated agent which does a quick lookup of potential
identifiers/codes and can reply within the 200ms window.
The PKEX configurator role is currently limited to being a responder.
When started the configurator will listen on its current operating
channel for a PKEX exchange request. Once received it and the
encrypted key is properly decrypted it treats this peer as the
enrollee and won't allow configurations from other peers unless
PKEX is restarted. The configurator will encrypt and send its
encrypted ephemeral key in the PKEX exchange response. The enrollee
then sends its encrypted bootstrapping key (as commit-reveal request)
then the same for the configurator (as commit-reveal response).
After this, PKEX authentication begins. The enrollee is expected to
send the authenticate request, since its the initiator.
This is the initial support for PKEX enrollees acting as the
initiator. A PKEX initiator starts the protocol by broadcasting
the PKEX exchange request. This request contains a key encrypted
with the pre-shared PKEX code. If accepted the peer sends back
the exchange response with its own encrypted key. The enrollee
decrypts this and performs some crypto/hashing in order to establish
an ephemeral key used to encrypt its own boostrapping key. The
boostrapping key is encrypted and sent to the peer in the PKEX
commit-reveal request. The peer then does the same thing, encrypting
its own bootstrapping key and sending to the initiator as the
PKEX commit-reveal response.
After this, both peers have exchanged their boostrapping keys
securely and can begin DPP authentication, then configuration.
For now the enrollee will only iterate the default channel list
from the Easy Connect spec. Future upates will need to include some
way of discovering non-default channel configurators, but the
protocol needs to be ironed out first.
Stop() will now return NotFound if DPP is not running. This causes
the DPP test to fail since it calls this regardless if the protocol
already stopped. Ignore this exception since tests end in various
states, some stopped and some not.
PKEX and DPP will share the same state machine since the DPP protocol
follows PKEX. This does pose an issue with the DBus interfaces
because we don't want DPP initiated by the SharedCode interface to
start setting properties on the DeviceProvisioning interface.
To handle this a dpp_interface enum is being introduced which binds
the dpp_sm object to a particular interface, for the life of the
protocol run. Once the protocol finishes the dpp_sm can be unbound
allowing either interface to use it again later.
This mispelling was present in the configuration, so I retained parsing
of the legacy BandModifier*Ghz options for compatibility. Without this
change anyone spelling GHz correctly in their configs would be very
confused.
PKEX is part of the WFA EasyConnect specification and is
an additional boostrapping method (like QR codes) for
exchanging public keys between a configurator and enrollee.
PKEX operates over wifi and requires a key/code be exchanged
prior to the protocol. The key is used to encrypt the exchange
of the boostrapping information, then DPP authentication is
started immediately aftewards.
This can be useful for devices which don't have the ability to
scan a QR code, or even as a more convenient way to share
wireless credentials if the PSK is very secure (i.e. not a
human readable string).
PKEX would be used via the three DBus APIs on a new interface
SharedCodeDeviceProvisioning.
ConfigureEnrollee(a{sv}) will start a configurator with a
static shared code (optionally identifier) passed in as the
argument to this method.
StartEnrollee(a{sv}) will start a PKEX enrollee using a static
shared code (optionally identifier) passed as the argument to
the method.
StartConfigurator(o) will start a PKEX configurator and use the
agent specified by the path argument. The configurator will query
the agent for a specific code when an enrollee sends the initial
exchange message.
After the PKEX protocol is finished, DPP bootstrapping keys have
been exchanged and DPP Authentication will start, followed by
configuration.
Beacon loss handling was removed in the past because it was
determined that this even always resulted in a disconnect. This
was short sighted and not always true. The default kernel behavior
waits for 7 lost beacons before emitting this event, then sends
either a few nullfuncs or probe requests to the BSS to determine
if its really gone. If these come back successfully the connection
will remain alive. This can give IWD some time to roam in some
cases so we should be handling this event.
Since beacon loss indicates a very poor connection the roam scan
is delayed by a few seconds in order to give the kernel a chance
to send the nullfuncs/probes or receive more beacons. This may
result in a disconnect, but it would have happened anyways.
Attempting a roam mainly handles the case when the connection can
be maintained after beacon loss, but is still poor.
This is being done to allow the DPP module to work correctly. DPP
currently uses __station_connect_network incorrectly since it
does not (and cannot) change the state after calling. The only
way to connect with a state change is via station_connect_network
which requires a DBus method that triggered the connection; DPP
does not have this due to its potentially long run time.
To support DPP there are a few options:
1. Pass a state into __station_connect_network (this patch)
2. Support a NULL DBus message in station_connect_network. This
would require several NULL checks and adding all that to only
support DPP just didn't feel right.
3. A 3rd connect API in station which wraps
__station_connect_network and changes the state. And again, an
entirely new API for only DPP felt wrong (I guess we did this
for network_autoconnect though...)
Its about 50/50 between call sites that changed state after calling
and those that do not. Changing the state inside
__station_connect_network felt useful enough to cover the cases that
could benefit and the remaining cases could handle it easily enough:
- network_autoconnect(), and the state is changed by station after
calling so it more or less follows the same pattern just routes
through network. This will now pass the CONNECTING_AUTO state
from within network vs station.
- The disconnect/reconnect path. Here the state is changed to
ROAMING prior in order to avoid multiple state changes. Knowing
this the same ROAMING state can be passed which won't trigger a
state change.
- Retrying after a failed BSS. The state changes on the first call
then remains the same for each connection attempt. To support this
the current station->state is passed to avoid a state change.
Until now IWD only supported enrollees as responders (configurators
could do both). For PKEX it makes sense for the enrollee to be the
initiator because configurators in the area are already on their
operating channel and going off is inefficient. For PKEX, whoever
initiates also initiates authentication so for this reason the
authentication path is being opened up to allow enrollees to
initiate.
The check for the header was incorrect according to the spec.
Table 58 indicates that the "Query Response Info" should be set
to 0x00 for the configuration request. The frame handler was
expecting 0x7f which is the value for the config response frame.
Unfortunately wpa_supplicant also gets this wrong and uses 0x7f
in all cases which is likely why this value was set incorrectly
in IWD. The issue is that IWD's config request is correct which
means IWD<->IWD configuration is broken. (and wpa_supplicant as
a configurator likely doesn't validate the config request).
Fix this by checking both 0x7f and 0x00 to handle both
supplicants.
Stopping periodic scans and not restarting them prevents autoconnect
from working again if DPP (or the post-DPP connect) fails. Since
the DPP offchannel work is at a higher priority than scanning (and
since new offchannels are queue'd before canceling) there is no risk
of a scan happening during DPP so its safe to leave periodic scans
running.
The packet loss handler puts a higher priority on roaming compared
to the low signal roam path. This is generally beneficial since this
event usually indicates some problem with the BSS and generally is
an indicator that a disconnect will follow sometime soon.
But by immediately issuing a scan we run the risk of causing many
successive scans if more packet loss events arrive following
the roam scans (and if no candidates are found). Logs provided
further.
To help with this handle the first event with priority and
immediately issue a roam scan. If another event comes in within a
certain timeframe (2 seconds) don't immediately scan, but instead
rearm the roam timer instead of issuing a scan. This also handles
the case of a low signal roam scan followed by a packet loss
event. Delaying the roam will at least provide some time for packets
to get out in between roam scans.
Logs were snipped to be less verbose, but this cycled happened
5 times prior. In total 7 scans were issued in 5 seconds which may
very well have been the reason for the local disconnect:
Oct 27 16:23:46 src/station.c:station_roam_failed() 9
Oct 27 16:23:46 src/wiphy.c:wiphy_radio_work_done() Work item 29 done
Oct 27 16:23:47 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:47 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:47 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 30
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_next() Starting work item 30
Oct 27 16:23:47 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:47 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification New Scan Results(34)
Oct 27 16:23:47 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
... scan results ...
Oct 27 16:23:47 src/station.c:station_roam_failed() 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_done() Work item 30 done
Oct 27 16:23:47 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:47 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:47 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 31
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_next() Starting work item 31
Oct 27 16:23:47 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:47 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:48 src/scan.c:scan_notify() Scan notification New Scan Results(34)
Oct 27 16:23:48 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
... scan results ...
Oct 27 16:23:48 src/station.c:station_roam_failed() 9
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_done() Work item 31 done
Oct 27 16:23:48 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:48 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:48 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 32
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_next() Starting work item 32
Oct 27 16:23:48 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:48 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:48 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:49 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Oct 27 16:23:49 src/netdev.c:netdev_deauthenticate_event()
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Oct 27 16:23:49 src/netdev.c:netdev_disconnect_event()
Oct 27 16:23:49 Received Deauthentication event, reason: 4, from_ap: false
Include a specific timeout value so different protocols can specify
different timeouts. For example once the authentication timeout
should not take very long (even 10 seconds seems excessive) but
adding PKEX may warrant longer timeouts.
For example discovering a configurator IWD may want to wait several
minutes before ending the discovery. Similarly running PKEX as a
configurator we should put a hard limit on the time, but again
minutes rather than 10 seconds.
The memcpy in HEX2BUF was copying the length of the buffer that was
passed in, not the actual length of the converted hexstring. This
test was segfaulting in the Alpine CI which uses clang/musl.
Its been seen (so far only in mac80211_hwsim + UML) where an
offchannel requests ACK comes after the ROC started event. This
causes the ROC started event to never call back to notify since
info->roc_cookie is unset and it appears to be coming from an
external process.
We can detect this situation in the ROC notify event by checking
if there is a pending ROC command and if info->roc_cookie does
not match. This can also be true for an external event so we just
set a new "early_cookie" member and return.
Then, when the ACK comes in for the ROC request, we can validate
if the prior event was associated with IWD or some external
process. If it was from IWD call the started callback, otherwise
the ROC notify event should come later and handled under the
normal logic where the cookies match.
Instead of looking up by wdev, lookup by the ID itself. We
shouldn't ever have more than one info per wdev in the queue but
looking up the _exact_ info structure doesn't hurt in case things
change in the future.
If netconfig is canceled before completion (when roaming) the
settings are freed and never loaded again once netconfig is started
post-roam. Now after a roam make sure to re-load the settings and
start netconfig.
Commit 23f0f5717c did not correctly handle the reassociation
case where the state is set from within station_try_next_transition.
If IWD reassociates netconfig will get reset and DHCP will need to
be done over again after the roam. Instead get the state ahead of
station_try_next_transition.
Fixes: 23f0f5717ca0 ("station: allow roaming before netconfig finishes")
When using mutual authentication an additional value needs to
be hashed when deriving i/r_auth values. A NULL value indicates
no mutual authentication (zero length iovec is passed to hash).
DPP configurators are running the majority of the protocol on the
current operating channel, meaning no ROC work. The retry logic
was bailing out if !dpp->roc_started with the assumption that DPP
was in between requesting offchannel work and it actually starting.
For configurators, this may not be the case. The offchannel ID also
needs to be checked, and if no work is scheduled we can send the
frame.
The prf_plus API was a bit restrictive because it only took a
string label which isn't compatible with some specs (e.g. DPP
inputs to HKDF-Expand). In addition it took additional label
aruments which were appended to the HMAC call (and the
non-intuitive '\0' if there were extra arguments).
Instead the label argument has been removed and callers can pass
it in through va_args. This also lets the caller decided the length
and can include the '\0' or not, dependent on the spec the caller
is following.
Adds a handler for the HE capabilities element and reworks the way
the MCS/NSS support bits are printed.
Now if the MCS support is 3 (unsupported) it won't be printed. This
makes the logs a bit shorter to read.
Matched the printed function name with the actual function name.
The simple-agent test prints the function name to allow easier debugging.
One name was not set currectly (most likely through copy pasting).
SAE was also relying on the ELL bug which was incorrectly performing
a subtraction on the Y coordinate based on the compressed point type.
Correct this and make the point type more clear (rather than
something like "is_odd + 2").
EAP-PWD was incorrectly computing the PWE but due to the also
incorrect logic in ELL the point converted correctly. This is
being fixed, so both places need the reverse logic.
Also added a big comment explaining why this is, and how
l_ecc_point_from_data behaves since its somewhat confusing since
EAP-PWD expects the pwd-seed to be compared to the actual Y
coordinate (which is handled automatically by ELL).
Add a test to show the incorrect ASN1 conversion to and from points.
This was due to the check if Y is odd/even being inverted which
incorrectly prefixes the X coordinate with the wrong byte.
The test itself was not fully correct because it was using compliant
points rather than full points, and the spec contains the entire
Y coordinate so the full point should be used.
This patch also adds ASN1 conversions to validate that
dpp_point_from_asn1 and dpp_point_to_asn1 work properly.
The previous attempt at working around this warning seems to no longer
work with gcc 13
In function ‘eap_handle_response’,
inlined from ‘eap_rx_packet’ at src/eap.c:570:3:
src/eap.c:421:49: error: ‘vendor_id’ may be used uninitialized [-Werror=maybe-uninitialized]
421 | (type == EAP_TYPE_EXPANDED && vendor_id == (id) && vendor_type == (t))
| ~~~~~~~~~~^~~~~~~
src/eap.c:533:20: note: in expansion of macro ‘IS_EXPANDED_RESPONSE’
533 | } else if (IS_EXPANDED_RESPONSE(our_vendor_id, our_vendor_type))
| ^~~~~~~~~~~~~~~~~~~~
src/eap.c: In function ‘eap_rx_packet’:
src/eap.c:431:18: note: ‘vendor_id’ was declared here
431 | uint32_t vendor_id;
| ^~~~~~~~~
width must be initialized since it depends on best not being NULL. If
best passes the non-NULL check above, then width must be initialized
since both width and best are set at the same time.
For IWD to work correctly either 2.4GHz or 5GHz bands must be enabled
(even for 6GHz to work). Check this and don't allow IWD to initialize
if both 2.4 and 5GHz is disabled.
wiphy_get_allowed_freqs was only being used to see if 6GHz was disabled
or not. This is expensive and requires several allocations when there
already exists wiphy_is_band_disabled(). The prior patch modified
wiphy_is_band_disabled() to return -ENOTSUP which allows scan.c to
completely remove the need for wiphy_get_allowed_freqs.
scan_wiphy_watch was also slightly re-ordered to avoid allocating
freqs_6ghz if the scan request was being completed.
The function wiphy_band_is_disabled() return was a bit misleading
because if the band was not supported it would return true which
could be misunderstood as the band is supported, but disabled.
There was only one call site and because of this behavior
wiphy_band_is_disabled needed to be paired with checking if the
band was supported.
To be more descriptive to the caller, wiphy_band_is_disabled() now
returns an int and if the band isn't supported -ENOTSUP will be
returned, otherwise 1 is returned if the band is disabled and 0
otherwise.
This adds support to allow users to disable entire bands, preventing
scanning and connecting on those frequencies. If the
[Rank].BandModifier* options are set to 0.0 it will imply those
bands should not be used for scanning, connecting or roaming. This
now applies to autoconnect, quick, hidden, roam, and dbus scans.
This is a station only feature meaning other modules like RRM, DPP,
WSC or P2P may still utilize those bands. Trying to limit bands in
those modules may sometimes conflict with the spec which is why it
was not added there. In addition modules like DPP/WSC are only used
in limited capacity for connecting so there is little benefit gained
to disallowing those bands.
To support user-disabled bands periodic scans need to specify a
frequency list filtered by any bands that are disabled. This was
needed in scan.c since periodic scans don't provide a frequency
list in the scan request.
If no bands are disabled the allowed freqs API should still
result in the same scan behavior as if a frequency list is left
out i.e. IWD just filters the frequencies as opposed to the kernel.
Currently the only way a scan can be split is if the request does
not specify any frequencies, implying the request should scan the
entire spectrum. This allows the scan logic to issue an extra
request if 6GHz becomes available during the 2.4 or 5GHz scans.
This restriction was somewhat arbitrary and done to let periodic
scans pick up 6GHz APs through a single scan request.
But now with the addition of allowing user-disabled bands
periodic scans will need to specify a frequency list in case a
given band has been disabled. This will break the scan splitting
code which is why this prep work is being done.
The main difference now is the original scan frequencies are
tracked with the scan request. The reason for this is so if a
request comes in with a limited set of 6GHz frequences IWD won't
end up scanning the full 6GHz spectrum later on.
This is more or less copied from scan_get_allowed_freqs but is
going to be needed by station (basically just saves the need for
station to do the same clone/constrain sequence itself).
One slight alteration is now a band mask can be passed in which
provides more flexibility for additional filtering.
This exposes the [Rank].BandModifier* settings so other modules
can use then. Doing this will allow user-disabling of certain
bands by setting these modifier values to 0.0.
The loop iterating the frequency attributes list was not including
the entire channel set since it was stopping at i < band->freqs_len.
The freq_attrs array is allocated to include the last channel:
band->freq_attrs = l_new(struct band_freq_attrs, num_channels + 1);
band->freqs_len = num_channels;
So instead the for loop should use i <= band->freqs_len. (I also
changed this to start the loop at 1 since channel zero is invalid).
The auth/action status is now tracked in ft.c. If an AP rejects the
FT attempt with "Invalid PMKID" we can now assume this AP is either
mis-configured for FT or is lagging behind getting the proper keys
from neighboring APs (e.g. was just rebooted).
If we see this condition IWD can now fall back to reassociation in
an attempt to still roam to the best candidate. The fallback decision
is still rank based: if a BSS fails FT it is marked as such, its
ranking is reset removing the FT factor and it is inserted back
into the queue.
The motivation behind this isn't necessarily to always force a roam,
but instead to handle two cases where IWD can either make a bad roam
decision or get 'stuck' and never roam:
1. If there is one good roam candidate and other bad ones. For
example say BSS A is experiencing this FT key pull issue:
Current BSS: -85dbm
BSS A: -55dbm
BSS B: -80dbm
The current logic would fail A, and roam to B. In this case
reassociation would have likely succeeded so it makes more sense
to reassociate to A as a fallback.
2. If there is only one candidate, but its failing FT. IWD will
never try anything other than FT and repeatedly fail.
Both of the above have been seen on real network deployments and
result in either poor performance (1) or eventually lead to a full
disconnect due to never roaming (2).
Certain return codes, though failures, can indicate that the AP is
just confused or booting up and treating it as a full failure may
not be the best route.
For example in some production deployments if an AP is rebooted it
may take some time for neighboring APs to exchange keys for
current associations. If a client roams during that time it will
reject saying the PMKID is invalid.
Use the ft_associate call return to communicate the status (if any)
that was in the auth/action response. If there was a parsing error
or no response -ENOENT is still returned.
In many tests the hostapd configuration does not include all the
values that a test uses. Its expected that each individual test
will add the values required. In many cases its required each test
slightly alter the configuration for each change every other test
has to set the value back to either a default or its own setting.
This results in a ton of duplicated code mainly setting things
back to defaults.
To help with this problem the hostapd configuration is read in
initially and stored as the default. Tests can then simply call
.default() to set everything back. This significantly reduces or
completely removes a ton of set_value() calls.
This does require that each hostapd configuration file includes all
values any of the subtests will set, which is a small price for the
convenience.
Removed several debug prints which are very verbose and provide
little to no important information.
The get_scan_{done,callback} prints are pointless since all the
parsed scan results are printed by station anyways.
Printing the BSS load is also not that useful since it doesn't
include the BSSID. If anything the BSS load should be included
when station prints out each individual BSS (along with frequency,
rank, etc).
The advertisement protocol print was just just left in there by
accident when debugging, and also provides basically no useful
information.
Some APs don't include the RSNE in the associate reply during
the OWE exchange. This causes IWD to be incompatible since it has
a hard requirement on the AKM being included.
This relaxes the requirement for the AKM and instead warns if it
is not included.
Below is an example of an association reply without the RSN element
IEEE 802.11 Association Response, Flags: ........
Type/Subtype: Association Response (0x0001)
Frame Control Field: 0x1000
.000 0000 0011 1100 = Duration: 60 microseconds
Receiver address: 64:c4:03:88:ff:26
Destination address: 64:c4:03:88:ff:26
Transmitter address: fc:34:97:2b:1b:48
Source address: fc:34:97:2b:1b:48
BSS Id: fc:34:97:2b:1b:48
.... .... .... 0000 = Fragment number: 0
0001 1100 1000 .... = Sequence number: 456
IEEE 802.11 wireless LAN
Fixed parameters (6 bytes)
Tagged parameters (196 bytes)
Tag: Supported Rates 6(B), 9, 12(B), 18, 24(B), 36, 48, 54, [Mbit/sec]
Tag: RM Enabled Capabilities (5 octets)
Tag: Extended Capabilities (11 octets)
Ext Tag: HE Capabilities (IEEE Std 802.11ax/D3.0)
Ext Tag: HE Operation (IEEE Std 802.11ax/D3.0)
Ext Tag: MU EDCA Parameter Set
Ext Tag: HE 6GHz Band Capabilities
Ext Tag: OWE Diffie-Hellman Parameter
Tag Number: Element ID Extension (255)
Ext Tag length: 51
Ext Tag Number: OWE Diffie-Hellman Parameter (32)
Group: 384-bit random ECP group (20)
Public Key: 14ba9d8abeb2ecd5d95e6c12491b16489d1bcc303e7a7fbd…
Tag: Vendor Specific: Broadcom
Tag: Vendor Specific: Microsoft Corp.: WMM/WME: Parameter Element
Reported-By: Wen Gong <quic_wgong@quicinc.com>
Tested-By: Wen Gong <quic_wgong@quicinc.com>
Handling these events notifies hwsim of address changes for interface
creation/removal outside the initial namespace as well as address
changes due to scanning address randomization.
Interfaces that hwsim already knows about are still handled via
nl80211. But any interfaces not known when ADD/DEL_MAC_ADDR events
come will be treated specially.
For ADD, a dummy interface object will be created and added to the
queue. This lets the frame processing match the destination address
correctly. This can happen both for scan randomization and interface
creation outside of the initial namespace.
For the DEL event we handle similarly and don't touch any interfaces
found via nl80211 (i.e. have a 'name') but need to also be careful
with the dummy interfaces that were created outside the initial
namespace. We want to keep these around but scanning MAC changes can
also delete them. This is why a reference count was added so scanning
doesn't cause a removal.
For example, the following sequence:
ADD_MAC_ADDR (interface creation)
ADD_MAC_ADDR (scanning started)
DEL_MAC_ADDR (scanning done)
Hostapd commit b6d3fd05e3 changed the PMKID derivation in accordance
with 802.11-2020 which then breaks PMKID validation in IWD. This
breaks the FT-8021x AKM in IWD if the AP uses this hostapd version
since the PMKID doesn't validate during EAPoL.
This updates the PMKID derivation to use the correct SHA hash for
this AKM and adds SHA1 based PMKID checking for interoperability
with older hostapd versions.
The PMKID derivation has gotten messy due to the spec
updating/clarifying the hash size for the FT-8021X AKM. This
has led to hostapd updating the derivation which leaves older
hostapd versions using SHA1 and newer versions using SHA256.
To support this the checksum type is being fed to
handshake_state_get_pmkid so the caller can decide what sha to
use. In addition handshake_state_pmkid_matches is being added
which uses get_pmkid() but handles sorting out the hash type
automatically.
This lets preauthentication use handshake_state_get_pmkid where
there is the potential that a new PMKID is derived and eapol
can use handshake_state_pmkid_matches which only derives the
PMKID to compare against the peers.
The existing API was limited to SHA1 or SHA256 and assumed a key
length of 32 bytes. Since other AKMs plan to be added update
this to take the checksum/length directly for better flexibility.
This is consistent with the over-Air path, and makes it clear when
reading the logs if over-DS was used, if there was a response frame,
and if the frame failed to parse in some way.
The parsing code was breaking out of the loop on the first comment
which is incorrect and causes only part of the file to be parsed.
Its odd this hasn't popped up until now but its likely due to
differing dhcpd versions, some which add comments and others that
do not.
FILS rekeys were fixed in hostapd somewhat recently but older
versions will fail this test. Document that so we don't get
confused when running tests against older hostapd versions.
Disable power save if the wiphy indicates its needed. Do this
before issuing GET_LINK so the netdev doesn't signal its up until
power save is disabled.
This allows generating code and test coverage reports using lcov &
genhtml. Useful for understanding how much of the codebase is currently
covered by unit and autotests.
Certain drivers do not handle power save very well resulting in
missed frames, firmware crashes, or other bad behavior. Its easy
enough to disable power save via iw, iwconfig, etc but since IWD
removes and creates the interface on startup it blows away any
previous power save setting. The setting must be done *after* IWD
creates the interface which can be done, but needs to be via some
external daemon monitoring IWD's state. For minimal systems,
e.g. without NetworkManager, it becomes difficult and annoying to
persistently disable power save.
For this reason a new driver flag POWER_SAVE_DISABLE is being
added. This can then be referenced when creating the interfaces
and if set, disable power save.
The driver_infos list in wiphy.c is hard coded and, naturally,
not configurable from a user perspective. As drivers are updated
or added users may be left with their system being broken until the
driver is added, IWD released, and packaged.
This adds the ability to define driver flags inside main.conf under
the "DriverQuirks" group. Keys in this group correspond to values in
enum driver_flag and values are a list of glob matches for specific
drivers:
[DriverQuirks]
DefaultInterface=rtl81*,rtl87*,rtl88*,rtw_*,brcmfmac,bcmsdh_sdmmc
ForcePae=buggy_pae_*
Rather than keep a pointer to the driver_info entry copy the flags
into the wiphy object. This preps for supporting driver flags via
a configuration file, specifically allowing for entries that are a
subset of others. For example:
{ "rtl88*", DEFAULT_IF },
{ "rtl88x2bu", FORCE_PAE },
Before it was not possible to add entires like this since only the
last entry match would get set. Now DEFAULT_IF would get set to all
matches, and FORCE_PAE to only rtl88x2bu. This isn't especially
important for the static list since it could be modified to work
correctly, but will be needed when parsing flags from a
configuration file that may contain duplicates or subsets of the
static list.
If there was some problem during the FT authenticate stage
its nice to know more of what happened: whether the AP didn't
respond, rejected the attempt, or sent an invalid frame/IEs.
In some situations its convenient for the same work item to be
inserted (rescheduled) while its in progress. FT for example does
this now if a roam fails. The same ft_work item gets re-inserted
which, currently, is not safe to do since the item is modified
and removed once completed.
Fix this by introducing wiphy_radio_work_reschedule which is an
explicit API for re-inserting work items from within the do_work
callback.
The wiphy work logic was changed around slightly to remove the item
at the head of the queue prior to starting and note the ID going
into do_work. If do_work signaled done and ID changed we know it
was re-inserted and can skip the destroy logic and move onto the
next item. If the item is not done continue as normal but set the
priority to INT_MIN, as usual, to prevent other items from getting
to the head of the queue.
This adds another radio so IWD hits the FT failure path after
authentication to the first BSS fails. This causes a wiphy work
item to be rescheduled which previously was unsafe.
If IWD connects under bad RF conditions and netconfig takes
a while to complete (e.g. slow DHCP), the roam timeout
could fire before DHCP is done. Then, after the roam,
IWD would transition automatically to connected before
DHCP was finished. In theory DHCP could still complete after
this point but any process depending on IWD's connected
state would be uninformed and assume IP networking is up.
Fix this by stopping netconfig prior to a roam if IWD is not
in a connected state. Then, once the roam either failed or
succeeded, start netconfig again.
iwctl quit (running quit non-interactively) isn't a useful command,
but it shouldn't segfault. Let's avoid calling readline functions if
we haven't initialized readline in this run.
When acting as a configurator the enrollee can start on a different
channel than IWD is connected to. IWD will begin the auth process
on this channel but tell the enrollee to transition to the current
channel after the auth request. Since a configurator must be
connected (a requirement IWD enforces) we can assume a channel
transition will always be to the currently connected channel. This
allows us to simply cancel the offchannel request and wait for a
response (rather than start another offchannel).
Doing this improves the DPP performance and reduces the potential
for a lost frame during the channel transition.
This patch also addresses the comment that we should wait for the
auth request ACK before canceling the offchannel. Now a flag is
set and IWD will cancel the offchannel once the ACK is received.
If IWD gets a disconnect during FT the roaming state will be
cleared, as well as any ft_info's during ft_clear_authentications.
This includes canceling the offchannel operation which also
destroys any pending ft_info's if !info->parsed. This causes a
double free afterwards. In addition the l_queue_remove inside the
foreach callback is not a safe operation either.
To fix this don't remove the ft_info inside the offchannel
destroy callback. The info will get freed by ft_associate regardless
of the outcome (parsed or !parsed). This is also consistent with
how the onchannel logic works.
Log and crash backtrace below:
iwd[488]: src/station.c:station_try_next_transition() 5, target aa:46:8d:37:7c:87
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16668
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16669
iwd[488]: src/wiphy.c:wiphy_radio_work_done() Work item 16667 done
iwd[488]: src/wiphy.c:wiphy_radio_work_next() Starting work item 16668
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
iwd[488]: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
iwd[488]: src/netdev.c:netdev_deauthenticate_event()
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
iwd[488]: src/netdev.c:netdev_disconnect_event()
iwd[488]: Received Deauthentication event, reason: 6, from_ap: true
iwd[488]: src/station.c:station_disconnect_event() 5
iwd[488]: src/station.c:station_disassociated() 5
iwd[488]: src/station.c:station_reset_connection_state() 5
iwd[488]: src/station.c:station_roam_state_clear() 5
iwd[488]: double free or corruption (fasttop)
5 0x0000555b3dbf44a4 in ft_info_destroy ()
6 0x0000555b3dbf45b3 in remove_ifindex ()
7 0x0000555b3dc4653c in l_queue_foreach_remove ()
8 0x0000555b3dbd0dd1 in station_reset_connection_state ()
9 0x0000555b3dbd37e5 in station_disassociated ()
10 0x0000555b3dbc8bb8 in netdev_mlme_notify ()
11 0x0000555b3dc4e80b in received_data ()
12 0x0000555b3dc4b430 in io_callback ()
13 0x0000555b3dc4a5ed in l_main_iterate ()
14 0x0000555b3dc4a6bc in l_main_run ()
15 0x0000555b3dc4a8e0 in l_main_run_with_signal ()
16 0x0000555b3dbbe888 in main ()
Hostapd commit bc36991791 now properly sets the secure bit on
message 1/4. This was addressed in an earlier IWD commit but
neglected to allow for backwards compatibility. The check is
fatal which now breaks earlier hostapd version (older than 2.10).
Instead warn on this condition rather than reject the rekey.
Fixes: 7fad6590bd ("eapol: allow 'secure' to be set on rekeys")
After adding prefix matching the rule structure contained allocated
memory which was not being cleaned up on exit if rules still
remained in the list (removing the rule via DBus was done correctly)
The following commit:
80db8fd86c0c ("build: Use -Wvariadic-macros warning")
added a warning about variadic-macros. But it isn't quite clear why
since variadic macros are used throghout iwd. GCC doesn't honor this
option, but clang does. Since there's no real reason to stop using
variadic macros at this time, drop this warning.
The HT40+/- flags were reversed when checking against the 802.11
behavior flags.
HT40+ means the secondary channel is above (+) the primary channel
therefore corresponds to the PRIMARY_CHANNEL_LOWER behavior. And
the opposite for HT40-.
Reported-By: Alagu Sankar <alagusankar@gmail.com>
Use a more appropriate printf conversion string in order to avoid
unnecessary implicit conversion which can lead to a buffer overflow.
Reasons similar to commit:
98b758f8934a ("knownnetworks: fix printing SSID in hex")
In the case that the FT target is on the same channel as we're currently
operating on, use ft_authenticate_onchannel instead of ft_authenticate.
Going offchannel in this case can confuse some drivers.
Currently when we try FT-over-Air, the Authenticate frame is always
sent via offchannel infrastructure We request the driver to go
offchannel, then send the Authenticate frame. This works fine as long
as the target AP is on a different channel. On some networks some (or
all) APs might actually be located on the same channel. In this case
going offchannel will result in some drivers not actually sending the
Authenticate frame until after the offchannel operation completes.
Work around this by introducing a new ft_authenticate variant that will
not request an offchannel operation first.
Force conversion to unsigned char before printing to avoid sign
extension when printing SSID in hex. For example, if there are CJK
characters in SSID, it will generate a very long string like
/net/connman/iwd/ffffffe8ffffffaeffffffa1.
If a very long ssid was used (e.g. CJK characters in SSID), it might do
out of bounds write to static variable for lack of checking the position
before the last snprintf() call.
Seeing that some authenticators can't handle TLS session caching
properly, allow the EAP-TLS-based methods session caching support to be
disabled per-network using a method specific FastReauthentication setting.
Defaults to true.
With the previous commit, authentication should succeed at least every
other attempt. I'd also expect that EAP-TLS is not usually affected
because there's no phase2, unlike with EAP-PEAP/EAP-TTLS.
If we have a TLS session cached from this attempt or a previous
successful connection attempt but the overall EAP method fails, forget
the session to improve the chances that authentication succeeds on the
next attempt considering that some authenticators strangely allow
resumption but can't handle it all the way to EAP method success.
Logically the session resumption in the TLS layers on the server should
be transparent to the EAP layers so I guess those may be failed
attempts to further optimise phase 2 when the server thinks it can
already trust the client.
The extra IE length for the WMM IE was being set to 26 which is
the HT IE length, not WMM. Fix this and use the proper size for
the WMM IE of 50 bytes.
This shouldn't have caused any problems prior as the tail length
is always allocated with 256 or 512 extra bytes of headroom.
Since channels numbers are used as indexes into the array, and given
that channel numbers start at '1' instead of 0, make sure to allocate a
buffer large enough to not overflow when the max channel number for a
given band is accessed.
src/manager.c:manager_wiphy_dump_callback() New wiphy phy1 added (1)
==22290== Invalid write of size 2
==22290== at 0x4624B2: nl80211_parse_supported_frequencies (nl80211util.c:570)
==22290== by 0x417CA5: parse_supported_bands (wiphy.c:1636)
==22290== by 0x418594: wiphy_parse_attributes (wiphy.c:1805)
==22290== by 0x418E20: wiphy_update_from_genl (wiphy.c:1991)
==22290== by 0x464589: manager_wiphy_dump_callback (manager.c:564)
==22290== by 0x4CBDDA: process_unicast (genl.c:944)
==22290== by 0x4CC19C: received_data (genl.c:1056)
==22290== by 0x4C7140: io_callback (io.c:120)
==22290== by 0x4C5A97: l_main_iterate (main.c:476)
==22290== by 0x4C5BDC: l_main_run (main.c:523)
==22290== by 0x4C5F0F: l_main_run_with_signal (main.c:645)
==22290== by 0x40503B: main (main.c:600)
==22290== Address 0x4aa76ec is 0 bytes after a block of size 28 alloc'd
==22290== at 0x48417B5: malloc (vg_replace_malloc.c:393)
==22290== by 0x4BC4D1: l_malloc (util.c:62)
==22290== by 0x417BE4: parse_supported_bands (wiphy.c:1619)
==22290== by 0x418594: wiphy_parse_attributes (wiphy.c:1805)
==22290== by 0x418E20: wiphy_update_from_genl (wiphy.c:1991)
==22290== by 0x464589: manager_wiphy_dump_callback (manager.c:564)
==22290== by 0x4CBDDA: process_unicast (genl.c:944)
==22290== by 0x4CC19C: received_data (genl.c:1056)
==22290== by 0x4C7140: io_callback (io.c:120)
==22290== by 0x4C5A97: l_main_iterate (main.c:476)
==22290== by 0x4C5BDC: l_main_run (main.c:523)
==22290== by 0x4C5F0F: l_main_run_with_signal (main.c:645)
==22290==
This adds support for rekeys to AP mode. A single timer is used and
reset to the next station needing a rekey. A default rekey timer of
600 seconds is used unless the profile sets a timeout.
The only changes required was to set the secure bit for message 1,
reset the frame retry counter, and change the 2/4 verifier to use
the rekey flag rather than ptk_complete. This is because we must
set ptk_complete false in order to detect retransmissions of the
4/4 frame.
Initiating a rekey can now be done by simply calling eapol_start().
If IWD ends up dumping wiphy's twice (because of NEW_WIPHY event
soon after initial dump) it will also try and dump interfaces
twice leading to multiple DEL_INTERFACE calls. The second attempt
will fail with -ENODEV (since the interface was already deleted).
Just silently fail with this case and let the other DEL_INTERFACE
path handle the re-creation.
With really badly timed events a wiphy can be registered twice. This
happens when IWD starts and requests a wiphy dump. Immediately after
a NEW_WIPHY event comes in (presumably when the driver loads) which
starts another dump. The NEW_WIPHY event can't simply be ignored
since it could be a hotplug (e.g. USB card) so to fix this we can
instead just prevent it from being registered.
This does mean both dumps will happen but the information will just
be added to the same wiphy object.
Past commits should address any potential problems of the timer
firing during FT, but its still good practice to cancel the timer
once it is no longer needed, i.e. once FT has started.
If station has already started FT ensure station_cannot_roam takes
that into account. Since the state has not yet changed it must also
check if the FT work ID is set.
Under the following conditions IWD can accidentally trigger a second
roam scan while one is already in progress:
- A low RSSI condition is met. This starts the roam rearm timer.
- A packet loss condition is met, which triggers a roam scan.
- The roam rearm timer fires and starts another roam scan while
also overwriting the first roam scan ID.
- Then, if IWD gets disconnected the overwritten roam scan gets
canceled, and the roam state is cleared which NULL's
station->connected_network.
- The initial roam scan results then come in with the assumption
that IWD is still connected which results in a crash trying to
reference station->connected_network.
This can be fixed by adding a station_cannot_roam check in the rearm
timer. If IWD is already doing a roam scan station->preparing_roam
should be set which will cause it to return true and stop any further
action.
Aborting (signal 11) [/usr/libexec/iwd]
iwd[426]: ++++++++ backtrace ++++++++
iwd[426]: #0 0x7f858d7b2090 in /lib/x86_64-linux-gnu/libc.so.6
iwd[426]: #1 0x443df7 in network_get_security() at ome/locus/workspace/iwd/src/network.c:287
iwd[426]: #2 0x421fbb in station_roam_scan_notify() at ome/locus/workspace/iwd/src/station.c:2516
iwd[426]: #3 0x43ebc1 in scan_finished() at ome/locus/workspace/iwd/src/scan.c:1861
iwd[426]: #4 0x43ecf2 in get_scan_done() at ome/locus/workspace/iwd/src/scan.c:1891
iwd[426]: #5 0x4cbfe9 in destroy_request() at ome/locus/workspace/iwd/ell/genl.c:676
iwd[426]: #6 0x4cc98b in process_unicast() at ome/locus/workspace/iwd/ell/genl.c:954
iwd[426]: #7 0x4ccd28 in received_data() at ome/locus/workspace/iwd/ell/genl.c:1052
iwd[426]: #8 0x4c79c9 in io_callback() at ome/locus/workspace/iwd/ell/io.c:120
iwd[426]: #9 0x4c62e3 in l_main_iterate() at ome/locus/workspace/iwd/ell/main.c:476
iwd[426]: #10 0x4c6426 in l_main_run() at ome/locus/workspace/iwd/ell/main.c:519
iwd[426]: #11 0x4c6752 in l_main_run_with_signal() at ome/locus/workspace/iwd/ell/main.c:645
iwd[426]: #12 0x405987 in main() at ome/locus/workspace/iwd/src/main.c:600
iwd[426]: #13 0x7f858d793083 in /lib/x86_64-linux-gnu/libc.so.6
iwd[426]: +++++++++++++++++++++++++++
If the authenticator has already set an snonce then the packet must
be a retransmit. Handle this by sending 3/4 again but making sure
to not reset the frame counter.
Old wpa_supplicant versions do not set the secure bit on 2/4 during
rekeys which causes IWD to reject the message and eventually time out.
Modern versions do set it correctly but even Android 13 (Pixel 5a)
still uses an ancient version of wpa_supplicant which does not set the
bit.
Relax this check and instead just print a warning but allow the message
to be processed.
In try_handshake_complete() we return early if all the keys had
been installed before (initial associations). For rekeys we can
now emit the REKEY_COMPLETE event which lets AP mode reset the
rekey timer for that station.
When the TK is installed the 'ptk_installed' flag was never set to
zero. For initial associations this was fine (already zero) but for
rekeys the flag needs to be unset so try_handshake_complete knows
if the key was installed. This is consistent with how gtk/igtk keys
work as well.
Rekeys for station mode don't need to know when complete since
there is nothing to do once done. AP mode on the other hand needs
to know if the rekey was successful in order to reset/set the next
rekey timer.
The second handshake message was hard coded with the secure bit as
zero but for rekeys the secure bit should be set to 1. Fix this by
changing the 2/4 builder to take a boolean which will set the bit
properly.
It should be noted that hostapd doesn't check this bit so EAPoL
worked just fine, but IWD's checks are more strict.
The PEAP RFC wants implementations to enforce that Phase2 methods have
been successfully completed prior to accepting a successful result TLV.
However, when TLS session resumption is used, some servers will skip
phase2 methods entirely and simply send a Result TLV with a success
code. This results in iwd (erroneously) rejecting the authentication
attempt.
Fix this by marking phase2 method as successful if session resumption is
being used.
This adds a builder which sets the country IE in probes/beacons.
The IE will use the 'single subband triplet sequence' meaning
dot11OperatingClassesRequired is false. This is much easier to
build and doesn't require knowing an operating class.
The IE itself is variable in length and potentially could grow
large if the hardware has a weird configuration (many different
power levels or segmentation in supported channels) so the
overall builder was changed to take the length of the buffer and
warnings will be printed if any space issues are encountered.
IWD's channel/frequency conversions use simple math to convert and
have very minimal checks to ensure the input is valid. This can
lead to some channels/frequencies being calculated which are not
in IWD's E-4 table, specifically in the 5GHz band.
This is especially noticable using mac80211_hwsim which includes
some obscure high 5ghz frequencies which are not part of the 802.11
spec.
To fix this calculate the frequency or channel then iterate E-4
operating classes to check that the value actually matches a class.
The 6GHz test was not incrementing the frequencies properly which
was resulting in invalid frequencies, but since the conversion
API was never linked to E-4 the test was still passing.
The country IE can sometimes have a zero pad byte at the end for
alignment. This was not being checked for which caused the loop
to go past the end of the IE and print an entry for channel 0
(the pad byte) plus some garbage data.
Fix this by checking for the pad byte explicitly which skips the
print and terminates the loop.
If supported this will include the HT capabilities and HT
operations elements in beacons/probes. Some shortcuts were taken
here since not all the information is currently parsed from the
hardware. Namely the HT operation element does not include the
basic MCS set. Still, this will at least show stations that the
AP is capable of more than just basic rates.
The builders themselves are structured similar to the basic rates
builder where they build only the contents and return the length.
The caller must set the type/length manually. This is to support
the two use cases of using with an IE builder vs direct pointer.
To include HT support a chandef needs to be created for whatever
frequency is being used. This allows IWD to provide a secondary
channel to the kernel in the case of 40MHz operation. Now the AP
will generate a chandef when starting based on the channel set
in the user profile (or default).
If HT is not supported the chandef width is set to 20MHz no-HT,
otherwise band_freq_to_ht_chandef is used.
The WMM parameter IE is expected by the linux kernel for any AP
supporting HT/VHT etc. IWD won't actually use WMM and its not
clear exactly why the kernel uses this restriction, but regardless
it must be included to support HT.
For AP mode its convenient for IWD to choose an appropriate
channel definition rather than require the user provide very
low level parameters such as channel width, center1 frequency
etc. For now only HT is supported as VHT/HE etc. require
additional secondary channel frequencies.
The HT API tries to find an operating class using 40Mhz which
complies with any hardware restrictions. If an operating class is
found that is supported/not restricted it is marked as 'best' until
a better one is found. In this case 'better' is a larger channel
width. Since this is HT only 20mhz and 40mhz widths are checked.
This adds some additional parsing to obtain the AMPDU parameter
byte as well as wiphy_get_ht_capabilities() which returns the
complete IE (combining the 3 separate kernel attributes).
The supported rates IE was being built in two places. This makes that
code common. Unfortunately it needs to support both an ie builder
and using a pointer directly which is why it only builds the contents
of the IE and the caller must set the type/length.
Move the l_netconfig_set_route_priority() and
l_netconfig_set_optimistic_dad_enabled() calls from netconfig_new, which
is called once for the l_netconfig object's lifetime, to
netconfig_load_settings, which is called before every connection attempt.
This is needed because we clean up the l_netconfig configuration by calling
l_netconfig_reset_config() at different points in connection setup and
teardown so we'd reset the route priority that we've set in netconfig_new,
back to 0 and never reload it.
The disabled_freqs list is being removed and replaced with a new
list in the band object. This completely removes the need for
the pending_freqs list as well since any regdom related dumps
can just overwrite the existing frequency list.
This adds two new APIs:
wiphy_get_frequency_info(): Used to get information about a given
frequency such as disabled/no-IR. This can also be used to check
if the frequency is supported (NULL return is unsupported).
wiphy_band_is_disabled(): Checks if a band is disabled. Note that
an unsupported band will also return true. Checking support should
be done with wiphy_get_supported_bands()
As additional frequency info is needed it doesn't make sense to
store a full list of frequencies for every attribute (i.e.
supported, disabled, no-IR, etc).
This changes nl80211_parse_supported_frequencies to take a list
of frequency attributes where each index corresponds to a channel,
and each value can be filled with flag bits to signal any
limitations on that frequency.
wiphy.c then had to be updated to use this rather than the existing
scan_freq_set lists. This, as-is, will break anything using
wiphy_get_disabled_freqs().
Currently the wiphy object keeps track of supported and disabled
frequencies as two separate scan_freq_set's. This is very expensive
and limiting since we have to add more sets in order to track
additional frequency flags (no-IR, no-HT, no-HE etc).
Instead we can refactor how frequencies are stored. They will now
be part of the band object and stored as a list of flag structures
where each index corresponds to a channel
IWD was optimizing FT-over-DS by authenticating to multiple BSS's
at the time of connecting which then made future roams slightly
faster since they could jump right into association. So far this
hasn't posed a problem but it was reported that some AP's actually
enforce a reassociation timeout (included in 4-way handshake).
Hostapd itself does no such enforcement but anything external to
hostapd could monitor FT events and clear the cache if any exceeded
this timeout.
For now remove the early action frames and treat FT-over-DS the
same as FT-over-Air. In the future we could parse the reassociation
timeout, batch out FT-Action frames and track responses but for the
time being this just fix the issue at a small performance cost.
Queue the FT action just like we do with FT Authenticate which makes
it able to be used the same way, i.e. call ft_action() then queue
the ft_associate work right away.
A timer was added to end the work item in case the target never
responds.
If the regdom updates during a periodic scan the results will be
delayed until after the update in order to, potentially, add 6GHz
frequencies since they may become available. The delayed results
happen regardless of 6GHz support but scan_wiphy_watch() was
returning early if 6GHz was not supported causing the scan request
to never complete.
The blamed commit argues that the periodic scan callback doesn't do
anything useful in the event of an aborted scan, but this is not
entirely true. In particular, the callback is responsible for re-arming
the periodic scan timer. Make sure to call scan_finished() so that iwd's
periodic scanning logic continues unabated even when a periodic scan is
aborted.
Also remove the periodic boolean member of struct scan_request, as it
serves no purpose anymore.
Fixes: 6051a1495227 ("scan: Don't callback on SCAN_ABORTED")
This enables IWD to use 5GHz frequencies in AP mode. Currently
6GHz is not supported so we can assume a [General].Channel value
36 or above indicates the 5GHz band.
It should be noted that the system will probably need a regulatory
domain set in order for 5GHz to be allowed in AP mode. This is due
to world roaming (00) restricting any/all 5GHz frequencies. This
can be accomplished by setting main.conf [General].Country=CC to
the country this AP will operate in.
wiphy_get_supported_rates expected an enum defined in the nl80211
header but the argument type was an unsigned int, not exactly
intuitive to anyone using the API. Since the nl80211 enum value
was only used in a switch statement it could just as well be IWD's
internal enum band_freq.
This also allows modules which do not reference nl80211.h to use
wiphy_get_supported_rates().
Change wording to say that IPv6 support is enabled by default. No
functional changes.
Fixes: 00baa75e9633 ("netconfig: Enable IPV6 support by default")
Before this change, I noticed that some non-interactive commands
don't work,
$ iwctl version
$ iwctl help
while other ones do.
$ iwctl station wlan0 show
This seems to be a typo bug in the if clause checking for additional
arguments.
This file is a compilation command database used by clangd and ccls,
and can be generated by tools like https://github.com/rizsotto/Bear.
$ bear -- make clean all
If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes
that a scan is ongoing on the underlying wdev and will retry the same
command when that scan is complete. It gets notified of that completion
via the scan_notify() function, and kicks the scan logic to try again.
However, if there is another wdev on the same wiphy and that wdev has a
scan request in flight, the kernel will also return -EBUSY. In other
words, only one scan request per wiphy is permitted.
As an example, the brcmfmac driver can create an AP interface on the
same wiphy as the default station interface, and scans can be triggered
on that AP interface.
If -EBUSY is returned because another wdev is scanning, then iwd won't
know when it can retry the original trigger request because the relevant
netlink event will arrive on a different wdev. Indeed, if no scan
context exists for that other wdev, then scan_notify will return early
and the scan logic will stall indefinitely.
Instead, and in the event that no scan context matches, use it as a cue
to retry a pending scan request that happens to be destined for the same
wiphy.
The previous commit added an invocation of known_networks_watch_add, but
never updated the module dependency graph.
Fixes: a793a41662b2 ("station, eapol: Set up eap-tls-common for session caching")
Use eap_set_peer_id() to set a string identifying the TLS server,
currently the hex-encoded SSID of the network, to be used as group name
and primary key in the session cache l_settings object. Provide pointers
to storage_eap_tls_cache_{load,sync} to eap-tls-common.c using
eap_tls_set_session_cache_ops(). Listen to Known Network removed
signals and call eap_tls_forget_peer() to have any session related to
the network also dropped from the cache.
Use l_tls_set_session_cache() to enable session cache/resume in the
TLS-based EAP methods. Sessions for all 802.1x networks are stored in
one l_settings object.
eap_{get,set}_peer_id() API is added for the upper layers to set the
identifier of the authenticator (or the supplicant if we're the
authenticator, if there's ever a use case for that.)
eap-tls-common.c can't call storage_eap_tls_cache_{load,sync}()
or known_networks_watch_add() (to handle known network removals) because
it's linked into some executables that don't have storage.o,
knownnetworks.o or common.o so an upper layer (station.c) will call
eap_tls_set_session_cache_ops() and eap_tls_forget_peer() as needed.
Minor changes to these two methods resulting from two rewrites of them.
Actual changes are:
* storage_tls_session_sync parameter is const,
* more specific naming,
* storage_tls_session_load will return an empty l_settings instead of
NULL so eap-tls-common.c doesn't have to handle this.
storage.c makes no assumptions about the group names in the l_settings
object and keeps no reference to that object, eap-tls-common.c is going
to maintain the memory copy of the cache since this cache and the disk
copy of it are reserved for EAP methods only.
A comma separated list as a string was ok for pure display purposes
but if any processing needed to be done on these values by external
consumers it really makes more sense to use a DBus array.
AP mode implements a few DBus methods/properties which are named
the same as station: Scan, Scanning, and GetOrderedNetworks. Allow
the Device object to work with these in AP mode by calling the
correct method if the Mode is 'ap'.
This wasn't being updated meaning the property is missing until a
scan is issued over DBus.
Rather than duplicate all the property changed calls they were all
factored out into a helper function.
Adds the MulticastDNS option globally to main.conf. If set all
network connections (when netconfig is enabled) will set mDNS
support into the resolver. Note that an individual network profile
can still override the global value if it sets MulticastDNS.
The AP mode device APIs were hacked together and only able to start
stop an AP. Now that the AP interface has more functionality its
best to use the DBus class template to access the full AP interface
capabilities.
The limitation of cipher selection in ap.c was done so to allow p2p to
work. Now with the ability to specify ciphers in the AP config put the
burden on p2p to limit ciphers as it needs which is only CCMP according
to the spec.
These can now be optionally provided in an AP profile and provide a
way to limit what ciphers can be chosen. This still is dependent on
what the hardware supports.
The validation of these ciphers for station is done when parsing
the BSS RSNE but for AP mode there is no such validation and
potentially any supported cipher could be chosen, even if its
incompatible for the type of key.
This change removes duplicate calls to display_table_footer(), in
station show.
Before this change, the bug caused an extra newline to be output every
time the table updated. This only occurred when the network was
disconnected.
$ iwctl
[iwd]# station wlan0 show
The netdev_copy_tk function was being hard coded with authenticator
set to false. This isn't important for any ciphers except TKIP but
now that AP mode supports TKIP it needs to be fixed.
The disabled cipher list contained a '.' instead of ',' which prevented
the subsequent ciphers from being disabled. This was only group management
ciphers so it didn't have any effect on the test.
The -F option is undocumented but allows you to pass a nl80211
family ID so iwmon doesn't ignore messages which don't match the
systems nl80211 family ID (i.e. pcaps from other systems).
This is somewhat of a pain to use since its unclear what the other
system's family ID actually is until you run it though something
like wireshark. Instead iwmon can ignore the family ID when in
read mode which makes reading other systems pcap files automatic.
Expand nlmon_create to be useful for both pcaps and monitoring. Doing
this also lets iwmon filter pcaps based on --no-ies,rtnl,scan etc
flags since they are part of the config.
Though TKIP is deprecated and insecure its trivial to support it in
AP mode as we already do in station. This is only to allow AP mode
for old hardware that may only support TKIP. If the hardware supports
any higher level cipher that will be chosen automatically.
The __str__ function assumed station mode which throws an exception
if the device is in AP mode. Fix this as well as print out the mode
the device is in.
This API optimizes scanning to run tests quickly by only scanning
the frequencies which hostapd is using. But if a test doesn't use
hostapd this API raises an uncaught exception.
Check if hostapd is being used, and if not just do a full scan.
The key descriptor version was hard coded to HMAC_SHA1_AES which
is correct when using IE_RSN_AKM_SUITE_PSK + CCMP. ap.c hard
codes the PSK AKM but still uses wiphy to select the cipher. In
theory there could be hardware that only supports TKIP which
would then make IWD non-compliant since a different key descriptor
version should be used with PSK + TKIP (HMAC_MD5_ARC4).
Now use a helper to sort out which key descriptor should be used
given the AKM and cipher suite.
Similarly to l_netconfig track whether IWD's netconfig is active (from
the moment of netconfig_configure() till netconfig_reset()) using a
"started" flag and avoid handling or emitting any events after "started"
is cleared.
This fixes an occasional issue with the Netconfig Agent backend where
station would reset netconfig, netconfig would issue DBus calls to clear
addresses and routes, station would go into DISCONNECTING, perhaps
finish and go into DISCONNECTED and after a while the DBus calls would
come back with an error which would cause a NETCONFIG_EVENT_FAILED
causing station to call netdev_disconnct() for a second time and
transition to and get stuck in DISCONNECTING.
Add an additional optional PairwiseCipher property on
net.connman.iwd.StationDiagnostic interface that will hold the current
pairwise cipher in use for the connection.
Both CMD_ASSOCIATE and CMD_CONNECT paths were using very similar code to
build RSN specific attributes. Use a common function to build these
attributes to cut down on duplicated code.
While here, also start using ie_rsn_cipher_suite_to_cipher instead of
assuming that the pairwise / group ciphers can only be CCMP or TKIP.
Instead of copy-pasting the same basic operation (memcpy & assignment),
use a goto and a common path instead. This should also make it easier
for the compiler to optimize this function.
Commit c7640f8346 was meant to fix a sign compare warning
in clang because NLMSG_NEXT internally compares the length
with nlmsghdr->nlmsg_len which is a u32. The problem is the
NLMSG_NEXT can underflow an unsigned value, hence why it
expects an int type to be passed in.
To work around this we can instead pass a larger sized
int64_t which the compiler allows since it can upgrade the
unsigned nlmsghdr->nlmsg_len. There is no underflow risk
with an int64_t either because the buffer used is much
smaller than what can fit in an int64_t.
Fixes: c7640f8346 ("monitor: fix integer comparison error (clang)")
In nearly all cases the auto-line breaks can be on spaces in the
string. The only exception (so far) is DPP which displays a very
long URI without any spaces. When this is displayed a single
character was lost on each break. This was due to the current line
being NULL terminated prior to the next line string being returned.
To handle both cases the next line is copied prior to terminating
the current line. The offsets were modified slightly so the line
will be broken *after* the space, not on the space. This leaves
the space at the end of the current line which is invisible to the
user but allows the no-space case to also work without loss of
the last character (since that last character is where the space
normally would be).
Each color escape is tracked and the new_width is adjusted
accordingly. But if the color escape comes after a space which breaks
the line, the adjusted width ends up being too long since that escape
sequence isn't appearing on the current line. This causes the next
column to be shifted over.
The old 'max' parameter was being used both as an input and output
parameter which was confusing. Instead have next_line take the
column width as input, and output a new width which includes any
color escapes and wide characters.
In theory any input to this function should have valid utf-8 but
just in case the strings should be validated. This removes the
need to check the return of l_utf8_get_codepoint which is useful
since there is no graceful failure path at this point.
The known frequency list may include frequencies that once were
allowed but are now disabled due to regulatory restrictions. Don't
include these frequencies in the roam scan.
The utf-8 bytes were being counted as normal ascii so the
width maximum was not being increased to include
non-printable bytes like it is for color escape sequences.
This lead to the row not printing enough characters which
effected the text further down the line.
Fix this by increasing 'max' when non-codepoint utf-8
characters are found.
This test was taking about 5 minutes to run, specifically
the requested scan test. One slight optimization is to
remove the duplicate hidden network, since there is no
need for two. In addition the requested scan test was
changed so it does not periodic scan and only issues a dbus
scan.
The CI was sometimes taking ~10-15 minutes to run just this
test. This is likely due to the test having 7 radios and
which is a lot of beacons/probes to process.
Disabling the unused hostapd instances drops the runtime down
to about 1 minute.
If a rule was disabled it would cause hwsim to not continue processing
frames using rules further in the queue. _Most_ tests only use one
rule so this shouldn't have changed their behavior but others which
use multiple rules may be effected and the tests have not been
running properly.
These events are sent if IWD fails to authentiate
(ft-over-air-roam-failed) or if it falls back to over air after
failing to use FT-over-DS (try-ft-over-air)
If IPv4 setup fails and the netconfig logic gives up, continue as if the
connection had failed at earlier stages so that autoconnect can try the
next available network.
Certain drivers support/require probe response offloading which
IWD did not check for or properly handle. If probe response
offloading is required the probe response frame watch will not
be added and instead the ATTR_PROBE_RESP will be included with
START_AP.
The head/tail builders were reused but slightly modified to check
if the probe request frame is NULL, since it will be for use with
START_AP.
Parse the AP probe response offload attribute during the dump. If
set this indicates the driver expects the probe response attribute
to be included with START_AP.
Clearing all authentications during ft_authenticate was a very large
hammer and may remove cached authentications that could be used if
the current auth attempt fails.
For example the best BSS may have a problem and fail to authenticate
early with FT-over-DS, then fail with FT-over-Air. But another BSS
may have succeeded early with FT-over-DS. If ft_authenticate clears
all ft_infos that successful authentication will be lost.
This tests the new behavior where the roam request does not
indicate disassociation is imminent. In this case if no
candidates are found IWD should not roam.
Instead of requiring the initial condition be met when calling
wait_for_object_change, wait for it.
This is how every caller of this function uses it, specifically
with roaming where we first wait for DeviceState.roaming, then
call wait_for_object_change. This can be simplified for the caller
so the initial condition is first waited for.
AP roaming was structured such that any AP roam request would
force IWD to roam (assuming BSS's were found in scan results).
This isn't always the best behavior since IWD may be connected
to the best BSS in range.
Only force a roam if the AP includes one of the 3 disassociation/
termination bits. Otherwise attempt to roam but don't set the
ap_directed_roaming flag which will allows IWD to stay with the
current BSS if no better candidates are found.
There are a few checks that can be done prior to parsing the
request, in addition the explicit check for preparing_roam was
removed since this is taken care of by station_cannot_roam().
Once offchannel completes we can check if the info structure was
parsed, indicating authentication succeeded. If not there is no
reason to keep it around since IWD will either try another BSS or
fail.
This both adds proper handling to the new roaming logic and fixes
a potential bug with firmware roams.
The new way roaming works doesn't use a connect callback. This
means that any disconnect event or call to netdev_connect_failed
will result in the event handler being called, where before the
connect callback would. This means we need to handle the ROAMING
state in the station disconnect event so IWD properly disassociates
and station goes out of ROAMING.
With firmware roams netdev gets an event which transitions station
into ROAMING. Then netdev issues GET_SCAN. During this time a
disconnect event could come in which would end up in
station_disconnect_event since there is no connect callback. This
needs to be handled the same and let IWD transition out of the
ROAMING state.
This finalizes the refactor by moving all the handshake prep
into FT itself (most was already in there). The netdev-specific
flags and state were added into netdev_ft_tx_associate which
now avoids any need for a netdev API related to FT.
The NETDEV_EVENT_FT_ROAMED event is now emitted once FT completes
(netdev_connect_ok). This did require moving the 'in_ft' flag
setting until after the keys are set into the kernel otherwise
netdev_connect_ok has no context as to if this was FT or some
other connection attempt.
In addition the prev_snonce was removed from netdev. Restoring
the snonce has no value once association begins. If association
fails it will result in a disconnect regardless which requires
a new snonce to be generated
This converts station to using ft_action/ft_authenticate and
ft_associate and dropping the use of the netdev-only/auth-proto
logic.
Doing this allows for more flexibility if FT fails by letting
IWD try another roam candidate instead of disconnecting.
Now the full action frame including the header is provided to ft
which breaks the existing parser since it assumes the buffer starts
at the body of the message.
This forwards Action, Authentication and Association frames to
ft.c via their new hooks in netdev.
Note that this will break FT-over-Air temporarily since the
auth-proto still is in use.
The current behavior is to only find the best roam candidate, which
generally is fine. But if for whatever reason IWD fails to roam it
would be nice having a few backup BSS's rather than having to
re-scan, or worse disassociate and reconnect entirely.
This patch doesn't change the roam behavior, just prepares for
using a roam candidate list. One difference though is any roam
candidates are added to station->bss_list, rather than just the
best BSS. This shouldn't effect any external behavior.
The candidate list is built based on scan_bss rank. First we establish
a base rank, the rank of the current BSS (or zero if AP roaming). Any
BSS in the results with a higher rank, excluding the current BSS, will
be added to the sorted station->roam_bss_list (as a new 'roam_bss'
entry) as well as stations overall BSS list. If the resulting list is
empty there were no better BSS's, otherwise station can now try to roam
starting with the best candidate (head of the roam list).
A new API was added, ft_authenticate, which will send an
authentication frame offchannel via CMD_FRAME. This bypasses
the kernel's authentication state allowing multiple auth
attempts to take place without disconnecting.
Currently netdev handles caching FT auth information and uses FT
parsers/auth-proto to manage the protocol. This sets up to remove
this state machine from netdev and isolate it into ft.c.
This does not break the existing auth-proto (hence the slight
modifications, which will be removed soon).
Eventually the auth-proto will be removed from FT entirely, replaced
just by an FT state machine, similar to how EAPoL works (netdev hooks
to TX/RX frames).
There may be situations (due to Multi-BSS operation) where an AP might
be advertising multiple SSIDs on the same BSSID. It is thus more
correct to lookup the preauthentication target on the network object
instead of the station bss_list. It used to be that the network list of
bsses was not updated when roam scan was performed. Hence the lookup
was always performed on the station bss_list. But this is no longer the
case, so it is safer to lookup on the network object directly on the
network.
The warnings in the authenticate and connect events were identical
so it could be difficult knowing which print it was if IWD is not
in debug mode (to see more context). The prints were changed to
indicate which event it was and for the connect event the reason
attribute is also parsed.
Note the resp_ies_len is also initialized to zero now. After making
the changes gcc was throwing a warning.
FT is special in that it really should not be interrupted. Since
FRAME/OFFCHANNEL have the highest priority we run the risk of
DPP or some other offchannel operation interfering with FT.
FT is now driven (mostly) by station which removes the connect
callback. Instead once FT is completed, keys set, etc. netdev
will send an event to notify station.
Since l_netconfig's DHCPv6 client instance no longer sets parameters on
the l_icmp6_client instance, call l_icmp6_client_set_nodelay() and
l_icmp6_client_set_debug() directly. Also enable optimistic DAD to
speed up IPv6 setup if available.
All uses of frame-xchg were for action frames, and the frame type
was hard coded. Soon other frame types will be needed so the type
must now be specified in the frame_xchg_prefix structure.
This will make the debug API more robust as well as fix issues
certain drivers have when trying to roam. Some of these drivers
may flush scan results after CMD_CONNECT which results in -ENOENT
when trying to roam with CMD_AUTHENTICATE unless you rescan
explicitly.
Now this will be taken care of automatically and station will first
scan for the BSS (or full scan if not already in results) and
attempt to roam once the BSS is seen in a fresh scan.
The logic to replace the old BSS object was factored out into its
own function to be shared by the non-debug roam scan. It was also
simplified to just update the network since this will remove the
old BSS if it exists.
Add a second netconfig-commit backend which, if enabled, doesn't
directly send any of the network configuration to the kernel or system
files but delegates the operation to an interested client's D-Bus
method as described in doc/agent-api.txt. This backend is switched to
when a client registers a netconfig agent object and is swiched away
from when the client disconnects or unregisters the agent. Only one
netconfig agent can be registered any given time.
Add netconfig_event_handler() that responds to events emitted by
the l_netconfig object by calling netconfig_commit, tracking whether
we're connected for either address family and emitting
NETCONFIG_EVENT_CONNECTED or NETCONFIG_EVENT_FAILED as necessary.
NETCONFIG_EVENT_FAILED is a new event as until now failures would cause
the netconfig state machine to stop but no event emitted so that
station.c could take action. As before, these events are only
emitted based on the IPv4 configuration state, not IPv6.
Add netconfig-commit.c whose main method, netconfig_commit actually sets
the configuration obtained by l_netconfig to the system netdev,
specifically it sets local addresses on the interface, adds routes to the
routing table, sets DNS related data and may add entries to the neighbor
cache. netconfig-commit.c uses a backend-ops type structure to allow
for switching backends. In this commit there's only a default backend
that uses l_netconfig_rtnl_apply() and a struct resolve object to write
the configuration.
netconfig_gateway_to_arp is moved from netconfig.c to netconfig-commit.c
(and renamed.) The struct netconfig definition is moved to netconfig.h
so that both files can access the settings stored in the struct.
To avoid repeated lookups by ifindex, replace the ifindex member in
struct netconfig with a struct netdev pointer. A struct netconfig
always lives shorter than the struct netdev.
* make the error handling simpler,
* make error messages more consistent,
* validate address families,
* for IPv4 skip l_rtnl_address_set_noprefixroute()
as l_netconfig will do this internally as needed.
* for IPv6 set the default prefix length to 64 as that's going to be
used for the local prefix route's prefix length and is a more
practical value.
Drop all the struct netconfig members where we were keeping the parsed
netconfig settings and add a struct l_netconfig object. In
netconfig_load_settings load all of the settings once parsed directly
into the l_netconfig object. Only preserve the mdns configuration and
save some boolean values needed to properly handle static configuration
and FILS. Update functions to use the new set of struct netconfig
members.
These booleans mirroring the l_netconfig state could be replaced by
adding l_netconfig getters for settings which currently only have
setters.
In anticipation of switching to use the l_netconfig API, which
internally handles DHCPv4, DHCPv6, ACD, etc., drop pointers to
instances of l_dhcp_client, l_dhcp6_client and l_acd from struct
netconfig. Also drop all code used for handling events from these
APIs, including code to commit the received configurations to the
system. Committing the final settings to the system netdevs is going to
be handled by a new set of utilities in a new file.
Update ConfigureIPv{4,6}() parameters to simplify mapping our sets of
addresses and routes directly to D-Bus dictionaries. Split Cancel()
into CancelIPv{4,6}().
The RRM module was blindly scanning using the requested
frequency which may or may not be possible given the hardware.
Instead check that the frequency will work and if not reject
the request.
This was reported by a user seeing the RRM scan fail which was
due to the AP requesting a scan on 5GHz when the adapter was
2.4GHz only.
The hostapd events for RRM come regardless of success
so they need to be checked as such. In addition one more
RRM request was added to scan on a frequency which is
disabled (operating class 82, channel 14).
Support for MAC address changes while powered was recently added to
mac80211. This avoids the need to power down the device which both
saves time as well as preserves any allowed frequencies which may
have been disabled if the device powered down.
The code path for changing the address was reused but now just the
'up' callback will be provided directly to l_rtnl_set_mac. Since
there aren't multiple stages of callbacks the rtnl_data structure
isn't strictly needed, but the code looks cleaner and more
consistent between the powered/non-powered code paths.
The comment/debug error print was also updated to be more general
between the two MAC change code paths.
Documentation for MulticastDNS setting suggests it should be part of the
main iwd configuration file. See man iwd.config. However, in reality
the setting was being pulled from the network provisioning file instead.
The latter actually makes more sense since systemd-resolved has its own
set of global defaults. Fix the documentation to reflect the actual
implementation.
Previously we had an ACD failure scenario where a new client forces its
IP to create an IP conflict and an already-connected client detects the
conflict and reacts. Now first test a scenario where a newly connecting
IWD client runs ACD before setting its statically configured IP, detects
a conflict and refuses to continue, then run the second scenario where
the newly connecting DHCP-configured client ignores the conflict and
starts ACD in defend-indefinitely mode and the older client in
defent-once mode gives up its IP.
Due to those variables being global (IWD class variables) calling either
unregister_psk_agent or del on one IWD class instance would unregister
all agents on all instances. Move .psk_agents and two other class
variables to the object. They were already referenced using "self."
as if they were object variables throughout the class.
Part of static_test.py starts a second IWD instance and tries to make
it connect to the AP with the same IP address as the first IWD instance
which is already connected, to produce an IP conflict. For this, the
second instance uses DHCP and the test expects the DHCP server to offer
the address 192.168.1.10 to it. However in the current setup the DHCP
server manages to detect that 192.168.1.10 is in use and offers .11
instead. Break the DHCP server's conflict detection by disabling ICMP
ping replies in order to fix the test.
Previously this has worked because the AP's and the DHCP server's
network interface is in the same network namespace as the first IWD
instance's network interface meaning that pings between the two
interfaces shouldn't work (a known Linux kernel routing quirk...).
I am not sure why those pings currently do work but take no chances and
disable ICMP pings.
netdev does not keep any pointers to struct scan_bss arguments that are
passed in. Make this explicitly clear by modifying the API definitions
and mark these as const.
This adds a few utilities for setting up an FT environment. All the
roaming tests basically copy/paste the same code for setting up the
hostapd instances and this can cause problems if not done correctly.
set_address() sets the MAC address on the device, and restarts hostapd
group_neighbors() takes a list of HostapdCLI objects and makes each a
neighbor to the others.
The neighbor report element requires the operating class which isn't
advertised by hostapd. For this we assume operating class 81 but this
can be set explicitly if it differs. Currently no roaming tests use
5/6GHz frequencies, and just in case an exception will be thrown if
the channel is greater than 14 and the op_class didn't change.
The packet loss test had a few problems. First being that the RSSI for
the original BSS was not low enough to change the rank. This meant any
roam was just lucky that the intended BSS was first in the results.
The second problem is timing related, and only happens on UML. Disabling
the rules after the roaming condition sometimes allows IWD to fully
roam and connect before the next state change checks.
A new test which blocks all data frames once connected, then tries
to send 100 packets. This should result in the kernel sending a
packet loss event to userspace, which IWD should react to and roam.
This adds a new netdev event for packet loss notifications from
the kernel. Depending on the scenario a station may see packet
loss events without any other indications like low RSSI. In these
cases IWD should still roam since there is no data flowing.
The limitations of readline required that the autocompletion choose
a 'default' device. With multiple phys this doesn't work. Now the
readline limitation has been worked around and station can look up
the device for the command completion.
There is a limitation of libreadline where no context/userdata
can be passed to completion functions. Thi affects iwctl since
the entity value isn't known to completion functions.
Workarounds such as getting the default device are employed but
its not a great solution.
Instead hack around this limitation by parsing the prompt to
extract the entity (second arg). Then use a generic match function
given to readline which can call the actual match function and
include the entity.
The ATTR_ARRAY type was quite limited, only supporting u16/u32 and
addresses. This changes the union to a struct so nested/function
can be defined along with array_type.
Some APs use an older hostapd OWE implementation which incorrectly
derives the PTK. To work around this group 19 should be used for
these APs. If there is a failure (reason=2) and the AKM is OWE
set force default group into network and retry. If this has been
done already the behavior is no different and the BSS will be
blacklisted.
If a OWE network is buggy and requires the default group this info
needs to be stored in network in order for it to set this into the
handshake on future connect attempts.
This functionality works around the kernel's behavior of allowing
6GHz only after a regulatory domain update. If the regdom updates
scan.c needs to be aware in order to split up periodic scans, or
insert 6GHz frequencies into an ongoing periodic scan. Doing this
allows any 6GHz BSS's to show up in the scan results rather than
needing to issue an entirely new scan to see these BSS's.
The kernel's regulatory domain updates after some number of beacons
are processed. This triggers a regulatory domain update (and wiphy
dump) but only after a scan request. This means a full scan started
prior to the regdom being set will not include any 6Ghz BSS's even
if the regdom was unlocked during the scan.
This can be worked around by splitting up a large scan request into
multiple requests allowing one of the first commands to trigger a
regdom update. Once the regdom updates (and wiphy dumps) we are
hopefully still scanning and could append an additional request to
scan 6GHz.
In the case of an external scan, we won't have a scan_request object,
sr. Make sure to not crash in this case.
Also, since scan_request can no longer carry the frequency set in all
cases, add a new member to scan_results in order to do so.
Fixes: 27d8cf4ccc59 ("scan: track scanned frequencies for entire request")
The kernel handles setting the regulatory domain by receiving beacons
which set the country IE. Presumably since most regulatory domains
disallow 6GHz the default (world) domain also disables it. This means
until the country is set, 6GHz is disabled.
This poses a problem for IWD's quick scanning since it only scans a few
frequencies and this likely isn't enough beacons for the firmware to
update the country, leaving 6Ghz inaccessable to the user without manual
intervention (e.g. iw scan passive, or periodic scans by IWD).
To try and work around this limitation the quick scan logic has been
updated to check if a 6GHz AP has been connected to before and if that
frequency is disabled (but supported). If this is the case IWD will opt
for a full passive scan rather than scanning a limited set of
frequencies.
For whatever reason the kernel will send regdom updates even if
the regdom didn't change. This ends up causing wiphy to dump
which isn't needed since there should be no changes in disabled
frequencies.
Now the previous country is checked against the new one, and if
they match the wiphy is not dumped again.
A change in regulatory domain can result in frequencies being
enabled or disabled depending on the domain. This effects the
frequencies stored in wiphy which other modules depend on
such as scanning, offchannel work etc.
When the regulatory domain changes re-dump the wiphy in order
to update any frequency restrictions.
A helper to check whether the country code corresponds to a
real country, or some special code indicating the country isn't
yet set. For now, the special codes are OO (world roaming) and
XX (unknown entity).
Events to indicate when a regulatory domain wiphy dump has
started and ended. This is important because certain actions
such as scanning need to be delayed until the dump has finished.
The NEW_SCAN_RESULTS handling was written to only parse the frequency
list if there were no additional scan commands to send. This results in
the scan callback containing frequencies of only the last CMD_TRIGGER.
Until now this worked fine because a) the queue is only used for hidden
networks and b) frequencies were never defined by any callers scanning
for hidden networks (e.g. dbus/periodic scans).
Soon the scan command queue will be used to break up scan requests
meaning only the last scan request frequencies would be used in the
callback, breaking the logic in station.
Now the NEW_SCAN_RESULTS case will parse the frequencies for each scan
command rather than only the last.
The compiler treated the '1' as an int type which was not big enough
to hold a bit shift of 31:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
Instead of doing the iftype check manually, refactor
wiphy_get_supported_iftypes by adding a subroutine which just parses
out iftypes from a mask into a char** list. This removes the need to
case each iftype into a string.
Add extra logging around CQM events to help track wifi status. This is
useful for headless systems that can only be accessed over the network
and so information in the logs is invaluable for debugging outages.
Prior to this change, the only log for CQM messages is saying one was
received. This adds details to what attributes were set and the
associated data with them.
The signal strength log format was chosen to roughly match
wpa_supplicant's which looks like this:
CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-60 noise=-96 txrate=6000
Provides useful information on why a roam might have failed, such as
failing to find the BSS or the BSS being ranked lower, and why that
might be.
The output format is the same as station_add_seen_bss for consistency.
If a frequency is disabled IWD should keep track and disallow any
operations on that channel such as scanning. A new list has been added
which contains only disabled frequencies.
The scan_passive API wasn't using a const struct scan_freq_set as it
should be since it's not modifying the contents. Changing this to
const did require some additional changes like making the scan_parameters
'freqs' member const as well.
After changing scan_parameters, p2p needed updating since it was using
scan_parameters.freqs directly. This was changed to using a separate
scan_freq_set pointer, then setting to scan_parameters.freqs when needed.
Similar to the HT/VHT APIs, this estimates the data rate based on the
HE Capabilities element, in addition to our own capabilities. The
logic is much the same as HT/VHT. The major difference being that HE
uses several MCS tables depending on the channel width. Each width
MCS set is checked (if supported) and the highest estimated rate out
of all the MCS sets is used.
There appears to be a compiler bug with gcc 11.2 which thinks the vht_mcs_set
is a zero length array, and the memset of size 8 is out of bounds. This is only
seen once an element is added to 'struct band'.
In file included from /usr/include/string.h:519,
from src/wiphy.c:34:
In function ‘memset’,
inlined from ‘band_new_from_message’ at src/wiphy.c:1300:2,
inlined from ‘parse_supported_bands’ at src/wiphy.c:1423:11,
inlined from ‘wiphy_parse_attributes’ at src/wiphy.c:1596:5,
inlined from ‘wiphy_update_from_genl’ at src/wiphy.c:1773:2:
/usr/include/bits/string_fortified.h:59:10: error: ‘__builtin_memset’ offset [0, 7] is out of the bounds [0, 0] [-Werror=array-bounds]
59 | return __builtin___memset_chk (__dest, __ch, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
60 | __glibc_objsize0 (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
In test-band the band object was allocated using l_malloc, but not
memset to zero. This will cause problems if allocated pointers are
included in struct band once band is freed.
This increases the maximum data rate which now is possible with HE.
A few comments were also updated, one to include 6G when adjusting
the rank for >4000mhz, and the other fixing a typo.
This is a general way of finding the best MCS/NSS values which will work
for HT, VHT, and HE by passing in the max MCS values for each value which
the MCS map could contain (0, 1, or 2).
The HE capabilities information is contained in
NL80211_BAND_ATTR_IFTYPE_DATA where each entry is a set of attributes
which define the rules for one or more interface types. This patch
specifically parses the HE PHY and HE MCS data which will be used for
data rate estimation.
Since the set of info is per-iftype(s) the data is stored in a queue
where each entry contains the PHY/MCS info, and a uint32 bit mask where
each bit index signifies an interface type.
With the addition of HE, the print function for MCS sets needs to change
slightly. The maps themselves are the same format, but the values indicate
different MCS ranges. Now the three MCS max values are passed in.
This queue will hold iftype(s) specific data for HE capabilities. Since
the capabilities may differ per-iftype the data is stored as such. Iftypes
may share a configuration so the band_he_capabilities structure has a
mask for each iftype using that configuration.
When PCI adapters are properly configured they should exist in the
vfio-pci system tree. It is assumed any devices configured as such
are used for test-runner.
This removes the need for a hw.conf file to be supplied, but still
is required for USB adapters. Because of this the --hw option was
updated to allow no value, or a file path.
subprocess.Popen's wait() method was overwritten to be non-blocking but
in certain circumstances you do want to wait forever. Fix this to allow
timeout=None, which calls the parent wait() method directly.
Certain module dependencies were missing, which could cause a crash on
exit under (very unlikely) circumstances.
#0 l_queue_peek_head (queue=<optimized out>) at ../iwd-1.28/ell/queue.c:241
#1 0x0000aaaab752f2a0 in wiphy_radio_work_done (wiphy=0xaaaac3a129a0, id=6)
at ../iwd-1.28/src/wiphy.c:2013
#2 0x0000aaaab7523f50 in netdev_connect_free (netdev=netdev@entry=0xaaaac3a13db0)
at ../iwd-1.28/src/netdev.c:765
#3 0x0000aaaab7526208 in netdev_free (data=0xaaaac3a13db0) at ../iwd-1.28/src/netdev.c:909
#4 0x0000aaaab75a3924 in l_queue_clear (queue=queue@entry=0xaaaac3a0c800,
destroy=destroy@entry=0xaaaab7526190 <netdev_free>) at ../iwd-1.28/ell/queue.c:107
#5 0x0000aaaab75a3974 in l_queue_destroy (queue=0xaaaac3a0c800,
destroy=destroy@entry=0xaaaab7526190 <netdev_free>) at ../iwd-1.28/ell/queue.c:82
#6 0x0000aaaab7522050 in netdev_exit () at ../iwd-1.28/src/netdev.c:6653
#7 0x0000aaaab7579bb0 in iwd_modules_exit () at ../iwd-1.28/src/module.c:181
In this particular case, wiphy module was de-initialized prior to the
netdev module:
Jul 14 18:14:39 localhost iwd[2867]: ../iwd-1.28/src/wiphy.c:wiphy_free() Freeing wiphy phy0[0]
Jul 14 18:14:39 localhost iwd[2867]: ../iwd-1.28/src/netdev.c:netdev_free() Freeing netdev wlan0[45]
Since we use git ls-files to produce the list of all tests for -A, if
the source directory is owned by somebody other than root one might
get:
fatal: unsafe repository ('/home/balrog/repos/iwd' is owned by someone else)
To add an exception for this directory, call:
git config --global --add safe.directory /home/balrog/repos/iwd
Starting
/home/balrog/repos/iwd/tools/..//autotests/ threw an uncaught exception
Traceback (most recent call last):
File "/home/balrog/repos/iwd/tools/run-tests", line 966, in run_auto_tests
subtests = pre_test(ctx, test, copied)
File "/home/balrog/repos/iwd/tools/run-tests", line 814, in pre_test
raise Exception("No hw.conf found for %s" % test)
Exception: No hw.conf found for /home/balrog/repos/iwd/tools/..//autotests/
Mark args.testhome as a safe directory on every run.
Test that the DHCPv4 lease got renewed after the T1 timer runs out.
Then also simulate the DHCPREQUEST during renew being lost and
retransmitted and the lease eventually getting renewed T1 + 60s later.
The main downside is that this test will inevitably take a while if
running in Qemu without the time travel ability.
Update the test and some utility code to run hostapd in an isolated net
namespace for connection_test.py. We now need a second hostapd
instance though because in static_test.py we test ACD and we need to
produce an IP conflict. Moving the hostapd instance unexpectedly fixes
dhcpd's internal mechanism to avoid IP conflicts and it would no longer
assign 192.168.1.10 to the second client, it'd notice that address was
already in use and assign the next free address, or fail if there was
none. So add a second hostapd instance that runs in the main namespace
together with the statically-configured client, it turns out the test
relies on the kernel being unable to deliver IP traffic to interfaces on
the same system.
The kernel will not let us test some scenarios of communication between
two hwsim radios (e.g. STA and AP) if they're in the same net namespace.
For example, when connected, you can't add normal IPv4 subnet routes for
the same subnet on two different interfaces in one namespace (you'd
either get an EEXIST or you'd replace the other route), you can set
different metrics on the routes but that won't fix IP routing. For
testNetconfig the result is that communication works for DHCP before we
get the inital lease but renewals won't work because they're unicast.
Allow hostapd to run on a radio that has been moved to a different
namespace in hw.conf so we don't have to work around these issues.
The --start option was directly passed to the kernel init parameter,
preventing any environment setup from happening.
Intead always use 'run-tests' as the init process but detect --start
and execute that binary/script once inside the environment.
The table header needs to be adjusted to include spaces between
columns.
The 'adapter list' command was also updated to shorted a few
columns which were quite long for the data that is actually displayed.
The existing color code escape sequences required the user to set the
color, write the string, then unset with COLOR_OFF. Instead the macros
can be made to take the string itself and automatically terminate the
color with COLOR_OFF. This makes for much more concise strings.
ad-hoc, ap, and wsc all had descriptions longer than the max width but
this is now taken care of automatically. Remove the tab and newline's
from the description.
The WSC pin command description was also changed to be more accurate
since the pin does not need to be 8 digits.
There was no easy to use API for printing the contents of a table, and
was left up to the caller to handle manually. This adds display_table_row
which makes displaying tables much easier, including automatic support
for line truncation and continuation on the next line.
Lines which are too long will be truncated and displayed on the next
line while also taking into account any colored output. This works with any
number of columns.
This removes the need for the module to play games with encoding newlines
and tabs to make the output look nice.
As a start, this functionality was added to the command display.
This fixes a crash associated with toggling the iftype to AP mode
then calling GetDiagnostics. The diagnostic interface is never
cleaned up when netdev goes down so DBus calls can still be made
which ends up crashing since the AP interface objects are no longer
valid.
Running the following iwctl commands in a script (once or twice)
triggers this crash reliably:
iwctl device wlp2s0 set-property Mode ap
iwctl device wlp2s0 set-property Mode station
iwctl device wlp2s0 set-property Mode ap
iwctl ap wlp2s0 start myssid secret123
iwctl ap wlp2s0 show
++++++++ backtrace ++++++++
0 0x7f8f1a8fe320 in /lib64/libc.so.6
1 0x451f35 in ap_dbus_get_diagnostics() at src/ap.c:4043
2 0x4cdf5a in _dbus_object_tree_dispatch() at ell/dbus-service.c:1815
3 0x4bffc7 in message_read_handler() at ell/dbus.c:285
4 0x4b5d7b in io_callback() at ell/io.c:120
5 0x4b489b in l_main_iterate() at ell/main.c:476
6 0x4b49a6 in l_main_run() at ell/main.c:519
7 0x4b4cd9 in l_main_run_with_signal() at ell/main.c:645
8 0x404f5b in main() at src/main.c:600
9 0x7f8f1a8e8b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
The generic proxy property display was limited to a width for names/value
which makes the output look nice and uniform, but will cut off any values
which are longer than this limit.
This patch adds some logic to detect this, and continue displaying the
value on the next line.
The width arguments were also updated to be unsigned, which allows checking
the length without a cast.
testAgent had a few tests which weren't reliable, and one was not
actually testing anything, or at least not what the name implied it
should be testing.
The first issue was using iwctl in the first place. There is not a
reliable way to know when iwctl has registered its agent so relying on
that with a sleep, or waiting for the service to become available isn't
100% fool proof. To fix this use the updated PSKAgent which allows
multiple to be registered. This ensures the agent is ready for requests.
This test was also renamed to be consistent with what its actually
testing: that IWD uses the first agent registered.
This removes test_connection_with_other_agent as well because this test
case is covered by the client test itself. There is no need to re-test
iwctl's agent functionality here.
By creating a new bus connection for each agent we can register multiple
with IWD. This did mean the agent interface needs to be unique for each
agent (removing _agent_manager_if) as well as tracking multiple agents
in a list.
IWD uses a few pragmas to ignore warnings which clang does
not support. For -Werror builds these cause build failures
but can be fixed by ignoring unknown warnings and pragmas.
The dbus proxy code assumes that every interface has a set of
properties registered in a 'proxy_interface_property' structure,
assuming the interface has any properties at all. If the interface
is assumed to have no properties (and no property table) but
actually does, the property table lookup fails but is assumed to
have succeeded and causes a crash.
This caused iwctl to crash after some properties were added to DPP
since the DPP interface previously had no properties.
Now, check that the property table was valid before accessing it. This
should allow properties to be added to new interfaces without crashing
older versions of iwctl.
About a month ago hostapd was changed to set the secure bit on
eapol frames during rekeys (bc36991791). The spec is ambiguous
about this and has conflicting info depending on the sections you
read (12.7.2 vs 12.7.6). According to the hostapd commit log TGme
is trying to clarify this and wants to set secure=1 in the case
of rekeys. Because of this, IWD is completely broken with rekeys
since its disallows secure=1 on PTK 1/4 and 2/4.
Now, a bool is passed to the verify functions which signifies if
the PTK has been negotiated already. If secure differs from this
the key frame is not verified.
The test here is verifying that a DBus Connect() call will still
work with 'other' agents registered. In this case it uses iwctl to
set a password, then call Connect() manually.
The problem here is that we have no way of knowing when iwctl fully
starts and registers its agent. There was a sleep in there but that
is unreliable and we occationally were still getting past that without
iwctl having started fully.
To fix this properly we need to wait for iwctl's agent service to appear
on the bus. Since the bus name is unknown we must first find all names,
then cross reference their PID's against the iwctl PID. This is done
using ListNames, and GetConnectionUnixProcessID APIs.
The rekey/reauth logic was broken in a few different ways.
For rekeys the event list was not being reset so any past 4-way
handshake would allow the call to pass. This actually removes
the need for the sleep in the extended key ID test because the
actual handshake event is waited for correctly.
For both rekeys and reauths, just waiting for the EAP/handshake
events was not enough. Without checking if the client got
disconnected we essentially allow a full disconnect and reconnect,
meaning the rekey/reauth failed.
Now a 'disallow' array can be passed to wait_for_event which will
throw an exception if any events in that array are encountered
while waiting for the target event.
Yet another weird UML quirk. The intent of this tests was to ensure
the profile gets encrypted, and to check this both the mtime and
contents of the profile were checked.
But under UML the profile is copied, IWD started, and the profile
is encrypted all without any time passing. The (same) mtime was
then updated without any changes which fails the mtime check.
This puts a sleep after copying the profile to ensure the system
time differs once IWD encrypts the profile.
In UML if any process dies while test-runner is waiting for the DBus
service or some socket to be available it will block forever. This
is due to the way the non_block_wait works.
Its not optimal but it essentially polls over some input function
until the conditions are met. And, depending on the input function,
this can cause UML to hang since it never has a chance to go idle
and advance the time clock.
This can be fixed, at least for services/sockets, by sleeping in
the input function allowing time to pass. This will then allow
test-runner to bail out with an exception.
This patch adds a new wait_for_service function which handles this
automatically, and wait_for_socket was refactored to behave
similarly.
This function was checking if the process object exists, which can
persist long after a process is killed, or dies unexpectedly. Check
that the actual PID exists by sending signal 0.
The man pages (iwd.network) have a section about how to name provisioning
files containing non-alphanumeric characters but not everyone reads the
entire man page.
Warning them that the provisioning file was not read and pointing to
'man iwd.network' should lead someone in the right direction.
EAP-Success might come in with an identifier that is incremented by 1
from the last Response packet. Since identifier field is a byte, the
value might overflow (from 255 -> 0.) This overflow isn't handled
properly resulting in EAP-Success/Failure packets with a 0 identifier
due to overflow being erroneously ignored. Fix that.
test_decryption_failure is quite simple and only verifies that a known
network exists after starting. This causes the test to end before IWD can
fully start up leaving the DBus utilities in limbo having not fully
initialized.
Then, on the next test, stale InterfaceAdded signals arrive (for Station
and P2P) which throw exceptions when trying to get the bus (since IWD is
long gone). In addition the next IWD instance has started so any paths
included in the InterfaceAdded signals are bogus and cause additional
exceptions.
At the end of this test we can call list_devices() which will wait for
the InterfaceAdded signal, and cleanly exit afterwards.
An earlier commit fixed several options but ended up breaking others. The
result_parent/monitor_parent options are hidden from the user and only meant
to be passed to the kernel but they relied on the fact that the underscore
was present, not a dash. This updates the argument to use a dash:
--result-parent
--monitor-parent
Fixes: 00e41eb0ff ("test-runner: Fix parsing for some arguments")
The new regex match update was actually matching way more than it should
have due to how python's 'match' API works. 'match' will return successfully
if zero or more characters match from the beginning of the string. In this
case we actually need the entire regex to match otherwise we start matching
all prefixes, for example:
"--verbose iwd" will match iwd, iwd-dhcp, iwd-acd, iwd-genl and iwd-tls.
Instead use re.fullmatch which requires the entire string to match the
regex.
This was copy pasted from the autoconnect test, and depending on
how the python module cache is ordered can incorrectly use the
wrong test class. This should nothappen because we insert
the paths to the head of the list but for consistency the class
should be named something that reflects what the test is doing.
Enabling this ends up dumping so much logging and, at least with namespaces,
seems to break the logger module and cause really weird behavior, worst of
which is that all processes start dumping to stdout.
This can still be enabled explicitly with --verbose iwd-rtnl, but is turned
off by default when --log is used.
Add a fake resolvconf executable to verify that the right nameserver
addresses were actually committed by iwd. Again use unique nameserver
addresses to reduce the possibility that the test succeeds by pure luck.
In static_test.py add IPv6. Add comments on what we're actually testing
since it wasn't very clear. After the expected ACD conflict detection,
succeed if either the lost address was removed or the client disconnected
from the AP since this seems like a correct action for netconfig to
implement.
In iwd.py make sure all the static methods that touch IWD storage take the
storage_dir parameter instead of hardcoding IWD_STORAGE_DIR, and make
sure that parameter is actually used.
Create the directory if it doesn't exist before copying files into it.
This fixes a problem in testNetconfig where
`IWD.copy_to_storage('ssidTKIP.psk', '/tmp/storage')`
would result in /tmp/storage being created as a file, rather than a
directory containing a file, and resulting in IWD failing to start with:
`Failed to create /tmp/storage`
runner.py creates /tmp/iwd but that doesn't account for IWD sessions
with a custom storage dir path.
Extend test_ip_address_match to support IPv6 and to test the
netmask/prefix length while it reads the local address since those are
retrieved using the same API.
Modify testNetconfig to validate the prefix lengths, change the prefix
lengths to be less common values (not 24 bits for IPv4 or 64 for IPv6),
minor cleanup.
Currently the parameter values reach run-tests by first being parsed by
runner.py's RunnerArgParser, then the resulting object members being
encoded as a commandline string, then as environment variables, then the
environment being converted to a python string list and passed to
RunnerCoreArgParser again. Where argument names (like --sub-tests) had
dashes, the object members had underscores (.sub_tests), this wasn't
taken into account when building the python string list from environment
variables so convert all underscores to dashes and hope that all the
names match now.
Additionally some arguments used nargs='1' or nargs='*' which resulted
in their python values becoming lists. They were converted back to command
line arguments such as: --sub_tests ['static_test.py'], and when parsed
by RunnerCoreArgParser again, the values ended up being lists of lists.
In all three cases it seems the actual user of the parsed value actually
expects a single string with comma-separated substrings in it so just drop
the nargs= uses.
Most users of storage_network_open don't log errors when the function
returns a NULL and fall back to defaults (empty l_settings).
storage_network_open() itself only logs errors if the flie is encrypted.
Now also log an error when l_settings_load_from_file() fails to help track
down potential syntax errors.
Drop the wrong negation in the error check. Check that there are no extra
characters after prefix length suffix. Reset errno 0 before the strtoul
call, as recommended by the manpage.
This is actually a false positive only because
p2p_device_validate_conn_wfd bails out if the IE is NULL which
avoids using wfd_data_length. But its subtle and without inspecting
the code it does seem like the length could be used uninitialized.
src/p2p.c:940:7: error: variable 'wfd_data_len' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
if (dev->conn_own_wfd)
^~~~~~~~~~~~~~~~~
src/p2p.c:946:8: note: uninitialized use occurs here
wfd_data_len))
^~~~~~~~~~~~
src/p2p.c:940:3: note: remove the 'if' if its condition is always true
if (dev->conn_own_wfd)
^~~~~~~~~~~~~~~~~~~~~~
src/p2p.c:906:23: note: initialize the variable 'wfd_data_len' to silence this warning
ssize_t wfd_data_len;
^
= 0
Though the documentation for NLMSG_OK uses an int type for the length
the actual check is based on nlmsghdr->nlmsg_len which is a 32 bit
unsigned integer. Clang was complaining about one call in nlmon.c
because nlmsg_len was int type. Every other usage in nlmon.c uses
a uint32_t, so use that both for consistency and to fix the warning.
monitor/nlmon.c:7998:29: error: comparison of integers of different
signs: '__u32' (aka 'unsigned int') and 'int'
[-Werror,-Wsign-compare]
for (nlmsg = iov.iov_base; NLMSG_OK(nlmsg, nlmsg_len);
^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/linux/netlink.h💯24: note: expanded from macro 'NLMSG_OK'
(nlh)->nlmsg_len <= (len))
On musl-gcc the compiler is giving a warning for igtk_key_index
and gtk_key_index being used uninitialized. This isn't possible
since they are only used if gtk/igtk are non-NULL so pragma to
ignore the warning.
src/fils.c: In function 'fils_rx_associate':
src/fils.c:580:17: error: 'igtk_key_index' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
580 | handshake_state_install_igtk(fils->hs,
igtk_key_index,igtk + 6,
igtk_len - 6, igtk);
(same error for gtk_key_index)
Since commit 922fa099721903b106a7bc1ccd1ffe8c4a7bce69 in hostap, our
setting of config_methods on P2P-client interface was ignored. Work
around that commit, in addition to the previous workaround we have in
this test, to again ensure the correct config_methods value is used.
This was lazily copied from UML but really made no sense in the context
of QEMU. First QEMU needs the virtfs option to define the mount tag and
in addition a 9p mount should be used rather than 'hostfs'.
The glob match was completely broken for --verbose because globs
are actually path matches, not generally for strings. Instead
match based on regular expressions.
First the verbose option was fixed to store it as an array as well
as write any list arguments into the kernel command line properly
(str() would include []). This has worked up until now because the
'in' keyword in python will work on strings just as well
as lists, for example:
>>> 'test' in 'this,is,a,test'
True
Then, the glob match was replaced with a regex match. Any exceptions
are caught and somewhat ignored (printed, but only seen with --debug).
This only guards against fatal exceptions from a user passing an
invalid expression.
For network configuration files the man pages (iwd.network) state
that [General].{AlwaysRandomizeAddress,AddressOverride} are only
used if main.conf has [General].AddressRandomization=network.
This actually was not being enforced and both iwd.network settings
were still taken into account regardless of what AddressRandomization
was set to (even disabled).
The handshake setup code now checks the AddressRandomization value
and if anything other than 'network' skips the randomization.
This bit of code was throwing exceptions if a test cleaned up files that
test-runner was expecting to clean up. Specifically testHotspot swaps out
main.conf and PSK files many times. This led to the exception being thrown,
caught, and ignored but further on test-runner would print:
"File _X_ not cleaned up!"
Now the files will be checked if they exist before trying to remove it.
There were a few places in dpp/dpp-util which passed a single byte but
was being read in with va_arg(va, size_t). On some architectures this was
causing failures presumably from the compiler using an integer type
smaller than size_t. As we do elsewhere, cast to size_t to force the
compiler to pass a properly sized iteger to va_arg.
Similarly to ofono/phonesim allow tests to be skipped if wpa_supplicant
is not found on the system.
This required some changes to DPP/P2P where Wpas() should be called first
since this can now throw a SkipTest exception.
The Wpas class was also made to allow __del__ to be called without
throwing additional exceptions in case wpa_supplicant was not found.
This allows the EAP tests to pass, but the fix really needs to be in
hostapd itself. Hostapd currently tries to lookup the EAP session
immediately after receiving EAPOL_REAUTH. This uses the identity
it has stored which, in the case of PEAP/TTLS, will always be a phase2
identity. During this initial lookup hostapd hard codes the identity
to be phase1 which is not true for PEAP/TTLS, and the lookup fails.
The current way this was being done was to import collections and
use collections.Mapping. This has been deprecated since python 3.3
but has worked up until python 3.10. After python 3.10 this will
no longer work, and Mapping must be imported from collections.abc.
This was passing IFNAME= along with EAPOL_REAUTH which does not work
in the context of a hostapd socket where the iface is already implied.
This fixes that issue as well as resets the events array and actually
waits for the required events afterwards.
After one of the eap-tls-common-based methods succeeds keep the TLS
tunnel instance until the method is freed, rather than free it the
moment the method succeeds. This fixes repeated method runs where until
now each next run would attempt to create a new TLS tunnel instance
but would have no authentication data (CA certificate, client
certificate, private key and private key passphrase) since those are
were by the old l_tls object from the moment of the l_tls_set_auth_data()
call.
Use l_tls_reset() to reset the TLS state after method success, followed
by a new l_tls_start() when the reauthentication starts.
The signal agent notifications were changed which breaks this test.
Specifically commit ce227e7b94 sends a notification when connected
which breaks the 'agent.calls' check. Since this check is done both
after connecting and once already connected the initial value may
be 1 or 0. Because of this that check was removed entirely.
This test was just piping the PSK files into /tmp/iwd/ssidCCMP.psk
which is a bit fragile if the storage dir was ever to change. Instead
use copy_to_storage and the 'name' keyword to copy the file.
If the user specifies the same parent directory for several outfiles
skip mounting since it already exists. For example:
--monitor /outfiles/monitor.txt --result /outfiles/result.txt
Inside the virtual environments /tmp is mounted as its own FS and not
taken from the host. This poses issues if any output files are directly
under /tmp since test-runner tries to mount the parent directory (/tmp).
The can be fixed by ensuring these output files are either not under
/tmp or at least one folder down the tree (e.g. /tmp/outputs/outfile.txt).
Now this requirement is enforced and test-runner will not start if any
output files parent directory is /tmp.
Usually the test home directory is a git repo somewhere e.g. under
/home. But if the home directory is located under /tmp this poses
a problem since UML remounts /tmp. To handle both cases mount
the home directory explicity.
Certain aspects of QEMU like mounting host directories may still require
root access but for UML this is not the case. To handle both cases first
check if SUDO_UID/GID are set and use those to obtain the actual users
ID's. Otherwise if running as non-root use the UID/GID of the user
directly.
A user reported that IWD was failing to FT in some cases and this was
due to the AP setting the Retry bit in the frame type. This was
unexpected by IWD since it directly checks the frame type against
0x00b0 which does not account for any B8-B15 bits being set.
IWD doesn't need to verify the frame type field for a few reasons:
First mpdu_validate checks the management frame type, Second the kernel
checks prior to forwarding the event. Because of this the check was
removed completely.
Reported-By: Michael Johnson <mjohnson459@gmail.com>
station_signal_agent_notify() has been refactored so that its usage is
simpler. station_rssi_level_changed() has been replaced by an inlined
call to station_signal_agent_notify().
The call to netdev_rssi_level_init() in netdev_connect_common() is
currently a no-op, because netdev->connected has not yet been set at
this stage of the connection attempt. Because netdev_rssi_level_init()
is only used twice, it's been replaced by two inlined calls to
netdev_set_rssi_level_idx().
The SignalLevelAgent API is currently broken by the system bus's
security policy, which blocks iwd's outgoing method call messages. This
patch punches a hole for method calls on the
net.connman.iwd.SignalLevelAgent interface.
There may be situations where DNS information should not be set (for
example in auto-tests where the resolver daemon is not running) or if a
user wishes to ignore DNS information obtained.
Allows granularly specifying the DHCP logging level. This allows the
user to tailor the output to what they need. By default, always display
info, errors and warnings to match the rest of iwd.
Setting `IWD_DHCP_DEBUG` to "debug", "info", "warn", "error" will limit
the logging to that level or higher allowing the default logging
verbosity to be reduced.
Setting `IWD_DHCP_DEBUG` to "1" as per the current behavior will
continue to enable debug level logging.
Use scapy library which allows one to easily construct and fudge various
network packets. This makes constructing spoofed packets much easier
and more readable compared to hex-encoded, hand-crafted frames.
The TA/BSSID addresses of spoofed disassociate frames were set
incorrectly. They should be using the 02:00:00:XX:XX:XX address, but
instead were being converted over to 42:00:00:XX:XX:XX address
After the initial handshake, once the TK has been installed, all frames
coming to the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to various attacks.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted EAPoL frames
received after the initial handshake has been completed.
After the initial handshake, once the TK has been installed, all frames
coming from the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to a denial-of-service attack
where receipt of an invalid, unencrypted EAP-Failure frame generated by
an adversary results in iwd terminating an ongoing connection.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted EAP frames
received after the initial handshake has been completed.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
After the initial handshake, once the TK has been installed, all frames
coming from the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to a denial-of-service attack
where receipt of an invalid, unencrypted EAPoL 1/4 frame generated by an
adversary results in iwd terminating an ongoing connection.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted PTK 1/4 frames
received after the initial handshake has been completed.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
Do not fail an ongoing handshake when an invalid EAPoL frame is
received. Instead, follow the intent of 802.11-2020 section 12.7.2:
"EAPOL-Key frames containing invalid field values shall be silently
discarded."
This prevents a denial-of-service attack where receipt of an invalid,
unencrypted EAPoL 1/4 frame generated by an adversary results in iwd
terminating an ongoing connection.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
Periodic scan requests are meant to be performed with a lower priority
than normal scan requests. They're thus given a different priority when
inserting them into the wiphy work queue. Unfortunately, the priority
is not taken into account when they are inserted into the
sr->requests queue. This can result in the scanning code being confused
since it assumes the top of the queue is always the next scheduled or
currently ongoing scan. As a result any further wiphy_work might never be
started properly.
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 3
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_next() Starting work item 3
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/scan.c:scan_periodic_timeout() 1
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 4
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_done() Work item 3 done
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_next() Starting work item 5
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_request_triggered() Passive scan triggered for wdev 1
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_periodic_triggered() Periodic scan triggered for wdev 1
In the above log, scan request 5 (triggered by dbus) is started before
scan request 4 (periodic scan). Yet the scanning code thinks scan
request 4 was triggered.
Fix this by using the wiphy_work priority to sort the sr->requests queue
so that the scans are ordered in the same manner.
Reported-by: Alvin Šipraga <ALSI@bang-olufsen.dk>
update_config=1 lets wpa_supplicant write config changes
to the config file. In the real world this is what you want
so your DPP credentials are persistant. But for testing this
is not correct since multiple tests use the same config file
and expect it to be pristine.
Occationally wpa_supplicant was connecting to the AP without
running DPP because the config already had the network
credentials.
The upstream code immediately retransmitted any no-ACK frames.
This would work in cases where the peer wasn't actively switching
channels (e.g. the ACK was simply lost) but caused unintended
behavior in the case of a channel switch when the peer was not
listening.
If either IWD or the peer needed to switch channels based on the
authenticate request the response may end up not getting ACKed
because the peer is idle, or in the middle of the hardware changing
channels. IWD would get no-ACK and immediately send the frame again
and most likely the same behavior would result. This would very
quickly increment frame_retry past the maximum and DPP would fail.
Instead when no ACK is received wait 1 second before sending out
the next frame. This can re-use the existing frame_pending buffer
and the same logic for re-transmitting.
There is a potential corner case of an offchannel frame callback
being called after ROC has ended.
This could happen in theory if a received frame is queued right as
the ROC session expires. If the ROC cancel event makes it to user
space before the frame IWD will schedule another ROC then receive
the frame. This doesn't prevent IWD from sending out another
frame since OFFCHANNEL_TX_OK is used, but it will prevent IWD from
receiving a response frame since no dwell duration is used with DPP.
To handle this an roc_started bool was added to the dpp_sm which
tracks the ROC state. If dpp_send_frame is called when roc_started
is false the frame will be saved and sent out once the ROC session
is started again.
ConnectHiddenNetwork creates a temporary network object and initiates a
connection with it. If the connection fails (due to an incorrect
passphrase or other reasons), then this temporary object is destroyed.
Delay its destruction until network_disconnected() since
network_connect_failed is called too early. Also, re-order the sequence
in station_reset_connection_state() in order to avoid using the network
object after it has been freed by network_disconnected().
Fixes: 85d9d6461f1f ("network: Hide hidden networks on connection error")
station_hide_network will remove and free the network object, so calling
network_close_settings will result in a crash. Make sure this is done
prior to network object's destruction.
Fixes: 85d9d6461f1f ("network: Hide hidden networks on connection error")
This only posed a problem oddly if the kernel binary was in the same
directory as test-runner. Resolving the absolute path with the
argument parser resolves the issue.
The TIOCSTTY ioctl was not shared between UML and QEMU which prevented
any console input from making it into UML. This fixes that, and now
ctrl-c can be used to stop UML test execution.
The MountInfo tuple was changed to explicitly take a source string. This
is redundant for UML and system mounts since the fstype/source are the same,
but it allows QEMU to specify the '9p' fstype and use MountInfo rather than
calling mount() explicitly.
This also moves logging cleanup into _prepare_mounts so both UML and QEMU
can use it.
Many processes are not long running (e.g. hostapd_cli, ip, iw, etc)
and the separators written to log files don't show up for these which
makes debugging difficult. This is even true for IWD/Hostapd for tests
with start_iwd=0.
After writing separators for long running processes write them out for
any additional log files too.
Way too many classes have a dependency on the TestContext class, in
most cases only for is_verbose. This patch removes the dependency from
Process and Namespace classes.
For Process, the test arguments can be parsed in the class itself which
will allow for this class to be completely isolated into its own file.
The Namespace class was already relatively isolated. Both were moved
into utils.py which makes 'run-tests' quite a bit nicer to look at and
more fitting to its name.
This commonizes some mounting code between QEMU and UML to allow exporting
of files to the host environment. UML does this with a hostfs mount while
QEMU still uses 9p.
The common code sanitizing the inputs has been put into _prepare_outfiles
and _prepare_mounts was modified to take an 'extra' arugment containing
additional mount points.
The results and monitor parent directories are now passed into the environment
via arguments, and these are hidden from the help text (in addition to testhome)
If --help or unknown options were supplied to test-runner python
would thrown a maximum recusion depth exception. This was due to
the way ArgumentParser was subclassed.
To fix this call ArgumentParser.__init__() rather than using the
super() method. And do this also for the RunnerCoreArgParse
subclass as well. In addition the namespace argument was removed
from parse_args since its not used, and instead supplied directly
to the parents parse_args method.
There is really no reason to have hwsim create interfaces automatically
for test-runner. test-runner already does this for wpa_supplicant and
hostapd, and IWD can create the interface itself.
If a user connection fails on a freshly scanned psk or open hidden
network, during passphrase request or after, it shall be removed from
the network list. Otherwise, it would be possible to directly connect
to that known network, which will appear as not hidden.
The test was rekeying in a loop which ends up confusing hostapd
depending on the timing of when it gets the REKEY command and any
responses from IWD. UML seemed to handle this fine but not QEMU.
Instead delay the rekey a bit to allow it to fully complete before
sending another.
Similarly to hostapd.wait_for_event, IWD's variant needed to act on
an IO watch because events were being received prior to even calling
wait_for_event.
With how fast UML is hostapd events were being sent out prior to
ever calling wait_for_event. Instead set an IO watch on the control
socket and cache all events as they come. Then, when wait_for_event
is called, it can reference this list. If the event is found any
older events are purged from the list.
The AP-ENABLED event needed a special case because hostapd gets
started before the IO watch can be registered. To fix this an
enabled property was added which queries the state directly. This
is checked first, and if not enabled wait_for_event continues normally.
This removes prints which were never supposed to make it upstream as
well as changes sleep() to wd.wait() as well as increase the wait
period to fix issues with how fast UML runs the tests.
This allows the callers condition to be checked immediately without
the mainloop running. In addition may_block=True allows the mainloop
to poll/sleep rather than immediately return back to the caller. This
handles async IO much better than may_block=False, at least for our
use-case.
Namespace process logs were appearing under 'ip' (and also overwriting
actual 'ip' logs) since they were executed with 'ip netns exec <namespace>'.
Instead special case this and append '-<namespace>' to the log file name.
In addition processes executed prior to any tests were being put under
a folder (name of testhome directory). Now this case is detected and these
logs are put at the top level log directory.
This allows test-runner to run inside a UML binary which has some
advantages, specifically time-travel/infinite CPU speed. This should
fix any scheduler related failures we have on slower systems.
Currently this runner does not suppor the same features as the Qemu
runner, specifically:
- No hardware passthrough
- No logging/monitor (UML -> host mounting isn't implemented yet)
In order to keep all test-runner dev scripts working and to work with
the new runner.py system some file renaming was required.
test-runner was renamed to run-tests
A new test-runner was added which only creates the Runner() class.
This (as well as subsequent commits) will separate test-runner into two
parts:
1. Environment setup
2. Running tests
Spurred by interest in adding UML/host support, test-runner was in need
of a refactor to separate out the environment setup and actually running
the tests.
The environment (currently only Qemu) requires quite a bit of special
handling (ctypes mounting/reboot, 9p mounts, tons of kernel options etc)
which nobody writing tests should need to see or care about. This has all
been moved into 'runner.py'.
Running the tests (inside test-runner) won't change much.
The new 'runner.py' module adds an abstraction class which allows different
Runner's to be implemented, and setup their own environment as they see
fit. This is in preparation for UML and Host runners.
Any test using assertTrue(hostapd.list_sta()) improperly has been
replaced with wait_for_event(). There were a few places where this
was actually ok (i.e. IWD is already connected) but most needed to
be changed since the check was just after IWD connected and hostapd's
list_sta() API may not return a fully updated list.
- Setting the IP address was resulting in an error:
Error: any valid prefix is expected rather than "wln58".
This is fixed by reordering the arguments with the IP address first
- Remove the sleep, and use non_block_wait to wait for the IPv6 address
to be set.
Before setting the address, wait for the interface to go down. This
fixes somewhat rare cases where setting the address returns -EBUSY
and ultimately breaks the neighbor reports.
All tests which could avoid calling scan() directly have been
changed to use the 'full_scan' argument to get_ordered_network.
This was done because of unreliable scanning behavior on slower
systems, like VMs. If we get unlucky with the scheduler some beacons
are not received in time and in turn scan results are missing.
Using full_scan=True works around this issue by repeatedly scanning
until the SSID is found.
It looks like some architectures defconfig were adding these in
automatically, but not others. Explicitly add these to make sure
the kernel is built correctly.
Base the root user check on os.getuid() instead of SUDO_GID so as not to
implicitly require sudo. SUDO_GID being set doesn't guarantee that the
effective user is root either since you can sudo to non-root accounts.
We check that config is not None but then access config.ctx outside of
that if block anyway. Then we do the same for config.ctx and
config.ctx.args. Nest the if blocks for the checks to be useful.
p2p_peer_update_existing may be called with a scan_bss struct built from
a Probe Request frame so it can't access bss->p2p_probe_resp_info even
if peer->bss was built from a Probe Response. Check the source frame
type of the scan_bss struct before updating the Device Address.
This fixes one timing issue that would make the autotest fail often.
Since l_malloc is used the frame contents are not zero'ed automatically
which could result in random bytes being present in the frame which were
expected to be zero. This poses a problem when calculating the MIC as the
crypto operations are done on the entire frame with the expectation of
the MIC being zero.
Fixes: 83212f9b23d5 ("eapol: change eapol_create_common to support FILS")
explicit_bzero is used in src/storage.c since commit
01cd8587606bf2da1af245163150589834126c1c but src/missing.h is not
included, as a result build with uclibc fails on:
/home/buildroot/autobuild/instance-0/output-1/host/lib/gcc/powerpc-buildroot-linux-uclibc/10.3.0/../../../../powerpc-buildroot-linux-uclibc/bin/ld: src/storage.o: in function `storage_init':
storage.c:(.text+0x13a4): undefined reference to `explicit_bzero'
Fixes:
- http://autobuild.buildroot.org/results/2aff8d3d7c33c95e2c57f7c8a71e69939f0580a1
When configuring wpa_supplicant all we care about is that it
received the configuration object. wpa_supplicant takes quite a bit
of time to connect in some cases so waiting for that is unneeded.
This also increases the DPP timeout which may be required on slower
systems or if the timing is particularly unlucky when receiving
frames.
This is used to hold the current BSS frequency which will be
used after IWD receives a presence announcement. Since this was
not being set, the logic was always thinking there was a channel
mismatch (0 != current_freq) and attempting to go offchannel to
'0' which resulted in -EINVAL, and ultimately protocol termination.
Change a few critical checks that were failing sometimes:
- A few asserts were changed to wait_for_object_condition
- A 15 second timeout was removed (default used instead)
- Do a full scan at beginning of each test to clear any
cached BSS's. The second test run was getting stale results
and the RSSI values were not expected.
This was not being properly honored when existing networks were
already populated. This poses an issue for any test which uses
full_scan after setting radio values such as signal strength.
For quite a while test-runner has run into frequent OOM exceptions when
running many tests in a row. Its not completely known exactly why, but
seems to point to the 9p driver which is used for sharing the root fs
between the test-runner VM and the host.
With debugging enabled (-d) one can see the available memory available
relatively stable. If a test fails it may spike ~3-4kb but this quickly
recovers as python garbage collects.
At some point the kernel faults failing to allocate which (usually) is
shown by a python OOM exception. At this point there is plenty of
available memory.
Dumping the kernel trace its seen that the 9p driver is involved:
[ 248.962949] test-runner: page allocation failure: order:7, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[ 248.962958] CPU: 2 PID: 477 Comm: test-runner Not tainted 5.16.0 #91
[ 248.962960] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
[ 248.962961] Call Trace:
[ 248.962964] <TASK>
[ 248.962965] dump_stack_lvl+0x34/0x44
[ 248.962971] warn_alloc.cold+0x78/0xdc
[ 248.962975] ? __alloc_pages_direct_compact+0x14c/0x1e0
[ 248.962979] __alloc_pages_slowpath.constprop.0+0xbfe/0xc60
[ 248.962982] __alloc_pages+0x2d5/0x2f0
[ 248.962984] kmalloc_order+0x23/0x80
[ 248.962988] kmalloc_order_trace+0x14/0x80
[ 248.962990] v9fs_alloc_rdir_buf.isra.0+0x1f/0x30
[ 248.962994] v9fs_dir_readdir+0x51/0x1d0
[ 248.962996] ? __handle_mm_fault+0x6e0/0xb40
[ 248.962999] ? inode_security+0x1d/0x50
[ 248.963009] ? selinux_file_permission+0xff/0x140
[ 248.963011] iterate_dir+0x16f/0x1c0
[ 248.963014] __x64_sys_getdents64+0x7b/0x120
[ 248.963016] ? compat_fillonedir+0x150/0x150
[ 248.963019] do_syscall_64+0x3b/0x90
[ 248.963021] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 248.963024] RIP: 0033:0x7fedd7c6d8c7
[ 248.963026] Code: 00 00 0f 05 eb b7 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 81 a5 0f 00 f7 d8 64 89 02 48
[ 248.963028] RSP: 002b:00007ffd06cd87e8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[ 248.963031] RAX: ffffffffffffffda RBX: 000056090d87dd20 RCX: 00007fedd7c6d8c7
[ 248.963032] RDX: 0000000000080000 RSI: 000056090d87dd50 RDI: 000000000000000f
[ 248.963033] RBP: 000056090d87dd50 R08: 0000000000000030 R09: 00007fedc7d37af0
[ 248.963035] R10: 00007fedc7d7d730 R11: 0000000000000293 R12: ffffffffffffff88
[ 248.963038] R13: 000056090d87dd24 R14: 0000000000000000 R15: 000056090d0485e8
Here its seen an allocation of 512k is being requested (order:7), but faults.
In this run it there was ~35MB of available memory on the system.
Available Memory: 35268 kB
Last Test Delta: -2624 kB
Per-test Usage:
[ 0] ** 37016
[ 1] ********* 41584
[ 2] * 36280
[ 3] ********* 41452
[ 4] ******** 40940
[ 5] ****** 39284
[ 6] **** 38348
[ 7] *** 37496
[ 8] **** 37892
[ 9] 35268
This can be reproduced by running all autotests (changing the ram down to
~128MB helps trigger it faster):
./tools/test-runner -k <kernel> -d
After many attempts to fix this it was finally found that simply removing the
explicit 9p2000.u version from the kernel command line 'fixed' the problem.
This even allows decreasing the RAM down to 256MB from 384MB and so far no
OOM's have been seen.
In debug mode the test context is printed before each test. This
adds some additional information in there:
Available Memory: /proc/meminfo: MemAvailable
Last Test Delta: Change in usage between current and last test
Per-test Usage: Graph of usage relative to all past tests. This is
useful for seeing a trend down/up of usage.
This could fail and was not being checked. It was minimally changed to
take the ifindex directly (this was the only thing needed from the ethdev)
which allows checking prior to initializing the ethdev.
Running the tests inside a VM makes it difficult for the host to figure
out if the test actually failed or succeeded. For a human its easy to
read the results table, but for an automated system parsing this would
be fragile. This adds a new option --result <file> which writes PASS/FAIL
to the provided file once all tests are completed. Any failures results in
'FAIL' being written to the file.
\e[1;30m is bold black, often, but not always displayed bright black or
bold bright black. In case it is displayed as real black it is
invisible. \e[1;90m is explicit bold bright black.
\e[37m is white, therefore it is not suitable to be labeled as GREY,
which is \e[90m
The logic here assumed any BSS's in the roam scan were identical to
ones in station's bss_list with the same address. Usually this is true
but, for example, if the BSS changed frequency the one in station's
list is invalid.
Instead when a match is found remove the old BSS and re-insert the new
one.
With the addition of 6GHz '6000' is no longer the maximum frequency
that could be in .known_network.freq. For more robustness
band_freq_to_channel is used to validate the frequency.
Scanning while in AP mode is somewhat of an edge case, but it does
have some usefulness specifically with onboarding new devices, i.e.
a new device starts an AP, a station connects and provides the new
device with network credentials, the new device switches to station
mode and connects to the desired network.
In addition this could be used later for ACS (though this is a bit
overkill for IWD's access point needs).
Since AP performance is basically non-existant while scanning this
feature is meant to be used in a limited scope.
Two DBus API's were added which mirror the station interface: Scan and
GetOrderedNetworks.
Scan is no different than the station variant, and will perform an active
scan on all channels.
GetOrderedNetworks diverges from station and simply returns an array of
dictionaries containing basic information about networks:
{
Name: <ssid>
SignalStrength: <mBm>
Security: <psk, open, or 8021x>
}
Limitations:
- Hidden networks are not supported. This isn't really possible since
the SSID's are unknown from the AP perspective.
- Sharing scan results with station is not supported. This would be a
convenient improvement in the future with respect to onboarding new
devices. The scan could be performed in AP mode, then switch to
station and connect immediately without needing to rescan. A quick
hack could better this situation by not flushing scan results in
station (if the kernel retains these on an iftype change).
This was already implemented in station but with no dependency on
that module at all. AP will need this for a scanning API so its
being moved into scan.c.
The 802.11ax standards adds some restrictions for the 6GHz band. In short
stations must use SAE, OWE, or 8021x on this band and frame protection is
required.
All uses of this macro will work with a bitwise comparison which is
needed for 6GHz checks and somewhat more flexible since it can be
used to compare RSN info, not only single AKM values.
This adds checks if MFP is set to 0 or 1:
0 - Always fail if the frequency is 6GHz
1 - Fail if MFPC=0 and the frequency is 6GHz.
If HW is capable set MFPR=1 for 6GHz
This is a new band defined in the WiFi 6E (ax) amendment. A completely
new value is needed due to channel reuse between 2.4/5 and 6GHz.
util.c needed minimal updating to prevent compile errors which will
be fixed later to actually handle this band. WSC also needed a case
added for 6GHz but the spec does not outline any RF Band value for
6GHz so the 5GHz value will be returned in this case.
sae.c was failing to build on some platforms:
error: implicit declaration of function 'reallocarray'; did you mean 'realloc'?
[-Werror=implicit-function-declaration]
In certain rare cases IWD gets a link down event before nl80211 ever sends
a disconnect event. Netdev notifies station of the link down which causes
station to be freed, but netdev remains in the same state. Then later the
disconnect event arrives and netdev still thinks its connected, calls into
(the now freed) station object and causes a crash.
To fix this netdev_connect_free() is now called on any link down events
which will reset the netdev object to a proper state.
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_deauthenticate_event()
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
src/resolve.c:resolve_systemd_revert() ifindex: 16
src/station.c:station_roam_state_clear() 16
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
Received Deauthentication event, reason: 3, from_ap: false
0 0x472fa4 in station_disconnect_event src/station.c:2916
1 0x472fa4 in station_netdev_event src/station.c:2954
2 0x43a262 in netdev_disconnect_event src/netdev.c:1213
3 0x43a262 in netdev_mlme_notify src/netdev.c:5471
4 0x6706eb in process_multicast ell/genl.c:1029
5 0x6706eb in received_data ell/genl.c:1096
6 0x65e630 in io_callback ell/io.c:120
7 0x65a94e in l_main_iterate ell/main.c:478
8 0x65b0b3 in l_main_run ell/main.c:525
9 0x65b0b3 in l_main_run ell/main.c:507
10 0x65b5cc in l_main_run_with_signal ell/main.c:647
11 0x4124d7 in main src/main.c:532
If an event is in response to some command which is returning an
unexpected value (unexpected with respect to wpas.py) handle_eow
would raise an exception.
Specifically with DPP this was being hit when the URI was being
returned.
The difference between the existing code is that IWD will send the
authentication request, making it the initiator.
This handles the use case where IWD is provided a peers URI containing
its bootstrapping key rather than IWD always providing its own URI.
A new DBus API was added, ConfigureEnrollee().
Using ConfigureEnrollee() IWD will act as a configurator but begin by
traversing a channel list (URI provided or default) and waiting for
presence announcements (with one caveat). When an announcement is
received IWD will send an authentication request to the peer, receive
its reply, and send an authentication confirm.
As with being a responder, IWD only supports configuration to the
currently connected BSS and will request the enrollee switch to this
BSS's frequency to preserve network performance.
The caveat here is that only one driver (ath9k) supports multicast frame
registration which prevents presence frame from being received. In this
case it will be required the the peer URI contains a MAC and channel
information. This is because IWD will jump right into sending auth
requests rather than waiting for a presence announcement.
The frame watch which covers the presence procedure (and most
frames for that matter) needs to support multicast frames for
presence to work. Doing this in frame-xchg seems like the right
choice but only ath9k supports multicast frame registration.
Because of this limited support DPP will register for these frames
manually.
Parses K (key), M (mac), C (class/channels), and V (version) tokens
into a new structure dpp_uri_info. H/I are not parsed since there
currently isn't any use for them.
This was caught by static analysis. As is common this should never
happen in the real world since the only way this can fail (apart from
extreme circumstances like OOM) is if the key size is incorrect, which
it will never be.
Static analysis flagged that 'path' was never being checked (which
should not ever be NULL) but during that review I noticed stat()
was being called, then fstat afterwards.
Adds a new wait argument which, if false, will call the DBus method
and return immediately. This allows the caller to create multiple
radios very quickly, simulating (as close as we can) a wifi card
with dual phy's which appear in the kernel simultaneously.
The name argument was also changed to be mandatory, which is now
required by hwsim.
Currently CreateRadio only allows a single outstanding DBus message
until the radio is fully created. 99% of the time this is just fine
but in order to test dual phy cards there needs to be support for
phy's appearing at the same time.
This required storing the pending DBus message inside the radio object
rather than a single static variable.
The code was refactored to handle the internal radio info objects better
for the various cases:
- Creation from CreateRadio()
- Radio already existed before hwsim started, or created externally
- Existing radio changed name, address, etc.
First, Name is now a required option to CreateRadio(). This allows
the radio info to be pushed to the queue immediately (also allowing the
pending DBus message to be tracked). Then, when the NEW_RADIO event
fires the pending radio can be looked up (by name) and filled with the
remaining info.
If the radio was not found by name but a matching ID was found this is
the 'changed' case and the radio is re-initialized with the changed
values.
If neither name or ID matches the radio was created externally, or
prior to hwsim starting. A radio info object is created at this time
and initialized.
The ID was changed to a signed integer in order to initialize it to an
invalid number -1. Doing this was required since a pending uninitalized
radio ID (0) could match an existing radio ID. This required some
bounds checks in case the kernels counter reaches an extremely high value.
This isn't likely to ever happen in practice.
This tool will decrypt an IWD network profile which was previously
encrypted using a systemd provided key. Either a text passphrase
can be provided (--pass) or a file containing the secret (--file).
This can be useful for debugging, or recovering an encrypted
profile after enabling SystemdEncrypt.
Recently systemd added the ability to pass secret credentials to
services via LoadCredentialEncrypted/SetCredentialEncrypted. Once
set up the service is able to read the decrypted credentials from
a file. The file path is found in the environment variable
CREDENTIALS_DIRECTORY + an identifier. The value of SystemdEncrypt
should be set to the systemd key ID used when the credential was
created.
When SystemdEncrypt is set IWD will attempt to read the decrypted
secret from systemd. If at any point this fails warnings will be
printed but IWD will continue normally. Its expected that any failures
will result in the inability to connect to any networks which have
previously encrypted the passphrase/PSK without re-entering
the passphrase manually. This could happen, for example, if the
systemd secret was changed.
Once the secret is read in it is set into storage to be used for
profile encryption/decryption.
Using storage_decrypt() hotspot can also support profile encyption.
The hotspot consortium name is used as the 'ssid' since this stays
consistent between hotspot networks for any profile.
Some users don't like the idea of storing network credentials in
plaintext on the file system. This patch implements an option to
encrypt such profiles using a secret key. The origin of the key can in
theory be anything, but would typically be provided by systemd via
'LoadEncryptedCredential' setting in the iwd unit file.
The encryption operates on the entire [Security] group as well as all
embedded groups. Once encrypted the [Security] group will be replaced
with two key/values:
EncryptedSalt - A random string of bytes used for the encryption
EncryptedSecurity - A string of bytes containing the encrypted
[Security] group, as well as all embedded groups.
After the profile has been encrypted these values should not be
modified. Note that any values added to [Security] after encryption
has no effect. Once the profile is encrypted there is no way to modify
[Security] without manually decrypting first, or just re-creating it
entirely which effectively treated a 'new' profile.
The encryption/decryption is done using AES-SIV with a salt value and
the network SSID as the IV.
Once a key is set any profiles opened will automatically be encrypted
and re-written to disk. Modules using network_storage_open will be
provided the decrypted profile, and will be unaware it was ever
encrypted in the first place. Similarly when network_storage_sync is
called the profile will by automatically encrypted and written to disk
without the caller needing to do anything special.
A few private storage.c helpers were added to serve several purposes:
storage_init/exit():
This sets/cleans up the encryption key direct from systemd then uses
extract and expand to create a new fixed length key to perform
encryption/decryption.
__storage_decrypt():
Low level API to decrypt an l_settings object using a previously set
key and the SSID/name for the network. This returns a 'changed' out
parameter signifying that the settings need to be encrypted and
re-written to disk. The purpose of exposing this is for a standalone
decryption tool which does not re-write any settings.
storage_decrypt():
Wrapper around __storage_decrypt() that handles re-writing a new
profile to disk. This was exposed in order to support hotspot profiles.
__storage_encrypt():
Encrypts an l_settings object and returns the full profile as data
This got merged without a few additional fixes, in particular an
over 80 character line and incorrect length check.
Fixes: d8116e8828b4 ("dpp-util: add dpp_point_from_asn1()")
When we detect a new phy being added, we schedule a filtered dump of
the newly detected WIPHY and associated INTERFACEs. This code path and
related processing of the dumps was mostly shared with the un-filtered
dump of all WIPHYs and INTERFACEs which is performed when iwd starts.
This normally worked fine as long as a single WIPHY was created at a
time. However, if multiphy new phys were detected in a short amount of
time, the logic would get confused and try to process phys that have not
been probed yet. This resulted in iwd trying to create devices or not
detecting devices properly.
Fix this by only processing the target WIPHY and related INTERFACEs
when the filtered dump is performed, and not any additional ones that
might still be pending.
While here, remove a misleading comment:
manager_wiphy_check_setup_done() would succeed only if iwd decided to
keep the default interfaces created by the kernel.
This simulates the conditions that trigger a free-after-use which was
fixed with:
2c355db7 ("scan: remove periodic scans from queue on abort")
This behavior can be reproduced reliably using this test with the above
patch reverted.
This debug print was before any checks which could bail out prior to
autoconnect starting. This was confusing because debug logs would
contain multiple "station_autoconnect_start()" prints making you think
autoconnect was started several times.
The periodic scan code was refactored to make normal scans and
periodic scans consistent by keeping both in the same queue. But
that change left out the abort path where periodic scans were not
actually removed from the queue.
This fixes a rare crash when a periodic scan has been triggered and
the device goes down. This path never removes the request from the
queue but still frees it. Then when the scan context is removed the
stale request is freed again.
0 0x4bb65b in scan_request_cancel src/scan.c:202
1 0x64313c in l_queue_clear ell/queue.c:107
2 0x643348 in l_queue_destroy ell/queue.c:82
3 0x4bbfb7 in scan_context_free src/scan.c:209
4 0x4c9a78 in scan_wdev_remove src/scan.c:2115
5 0x42fecd in netdev_free src/netdev.c:965
6 0x445827 in netdev_destroy src/netdev.c:6507
7 0x52beb9 in manager_config_notify src/manager.c:765
8 0x67084b in process_multicast ell/genl.c:1029
9 0x67084b in received_data ell/genl.c:1096
10 0x65e790 in io_callback ell/io.c:120
11 0x65aaae in l_main_iterate ell/main.c:478
12 0x65b213 in l_main_run ell/main.c:525
13 0x65b213 in l_main_run ell/main.c:507
14 0x65b72c in l_main_run_with_signal ell/main.c:647
15 0x4124e7 in main src/main.c:532
If netdev_connect_failed is called before netdev_get_oci_cb() the
netdev's handshake will be destroyed and ultimately crash when the
callback is called.
This patch moves the cancelation into netdev_connect_free rather than
netdev_free.
++++++++ backtrace ++++++++
0 0x7f4e1787d320 in /lib64/libc.so.6
1 0x42634c in handshake_state_set_chandef() at src/handshake.c:1057
2 0x40a11b in netdev_get_oci_cb() at src/netdev.c:2387
3 0x483d7b in process_unicast() at ell/genl.c:986
4 0x480d3c in io_callback() at ell/io.c:120
5 0x48004d in l_main_iterate() at ell/main.c:472 (discriminator 2)
6 0x4800fc in l_main_run() at ell/main.c:521
7 0x48032c in l_main_run_with_signal() at ell/main.c:649
8 0x403e95 in main() at src/main.c:532
9 0x7f4e17867b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
Commit 4d2176df2985 ("handshake: Allow event handler to free handshake")
introduced a re-entrancy guard so that handshake_state objects that are
destroyed as a result of the event do not cause a crash. It rightly
used a temporary object to store the passed in handshake. Unfortunately
this caused variable shadowing which resulted in crashes fixed by commit
d22b174a7318 ("handshake: use _hs directly in handshake_event").
However, since the temporary was no longer used, this fix itself caused
a crash:
#0 0x00005555f0ba8b3d in eapol_handle_ptk_1_of_4 (sm=sm@entry=0x5555f2b4a920, ek=0x5555f2b62588, ek@entry=0x16, unencrypted=unencrypted@entry=false) at src/eapol.c:1236
1236 handshake_event(sm->handshake,
(gdb) bt
#0 0x00005555f0ba8b3d in eapol_handle_ptk_1_of_4 (sm=sm@entry=0x5555f2b4a920, ek=0x5555f2b62588, ek@entry=0x16, unencrypted=unencrypted@entry=false) at src/eapol.c:1236
#1 0x00005555f0bab118 in eapol_key_handle (unencrypted=<optimized out>, frame=<optimized out>, sm=0x5555f2b4a920) at src/eapol.c:2343
#2 eapol_rx_packet (proto=<optimized out>, from=<optimized out>, frame=<optimized out>, unencrypted=<optimized out>, user_data=0x5555f2b4a920) at src/eapol.c:2665
#3 0x00005555f0bac497 in __eapol_rx_packet (ifindex=62, src=src@entry=0x5555f2b62574 "x\212 J\207\267", proto=proto@entry=34958, frame=frame@entry=0x5555f2b62588 "\002\003",
len=len@entry=121, noencrypt=noencrypt@entry=false) at src/eapol.c:3017
#4 0x00005555f0b8c617 in netdev_control_port_frame_event (netdev=0x5555f2b64450, msg=0x5555f2b62588) at src/netdev.c:5574
#5 netdev_unicast_notify (msg=msg@entry=0x5555f2b619a0, user_data=<optimized out>) at src/netdev.c:5613
#6 0x00007f60084c9a51 in dispatch_unicast_watches (msg=0x5555f2b619a0, id=<optimized out>, genl=0x5555f2b3fc80) at ell/genl.c:954
#7 process_unicast (nlmsg=0x7fff61abeac0, genl=0x5555f2b3fc80) at ell/genl.c:973
#8 received_data (io=<optimized out>, user_data=0x5555f2b3fc80) at ell/genl.c:1098
#9 0x00007f60084c61bd in io_callback (fd=<optimized out>, events=1, user_data=0x5555f2b3fd20) at ell/io.c:120
#10 0x00007f60084c536d in l_main_iterate (timeout=<optimized out>) at ell/main.c:478
#11 0x00007f60084c543e in l_main_run () at ell/main.c:525
#12 l_main_run () at ell/main.c:507
#13 0x00007f60084c5670 in l_main_run_with_signal (callback=callback@entry=0x5555f0b89150 <signal_handler>, user_data=user_data@entry=0x0) at ell/main.c:647
#14 0x00005555f0b886a4 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:532
This happens when the driver does not support rekeying, which causes iwd to
attempt a disconnect and re-connect. The disconnect action is
taken during the event callback and destroys the underlying eapol state
machine. Since a temporary isn't used, attempting to dereference
sm->handshake results in a crash.
Fix this by introducing a UNIQUE_ID macro which should prevent shadowing
and using a temporary variable as originally intended.
Fixes: d22b174a7318 ("handshake: use _hs directly in handshake_event")
Fixes: 4d2176df2985 ("handshake: Allow event handler to free handshake")
Reported-By: Toke Høiland-Jørgensen <toke@toke.dk>
Tested-by: Toke Høiland-Jørgensen <toke@toke.dk>
There is no need to punch the holes for netdev/wheel groups to send to
the .Agent interface. This is only done by the iwd daemon itself and
the policy for user 'root' already takes care of this.
A select few drivers send this instead of SIGNAL_MBM. The docs say this
value is the signal 'in unspecified units, scaled to 0..100'. The range
for SIGNAL_MBM is -10000..0 so this can be scaled to the MBM range easy
enough...
Now, this isn't exactly correct because this value ultimately gets
returned from GetOrderedNetworks() and is documented as 100 * dBm where
in reality its just a unit-less signal strength value. Its not ideal, but
this patch at least will fix BSS ranking for these few drivers.
The 'at_console' D-Bus policy setting has been deprecated for more then
10 years and could be ignored at any time in the future. Moreover, while
the intend was to allow locally logged on users to interact with iwd, it
didn't actually do that.
More info at https://www.spinics.net/lists/linux-bluetooth/msg75267.html
and https://gitlab.freedesktop.org/dbus/dbus/-/issues/52
Therefor remove the 'at_console' setting block.
On Debian (based) systems, there is a standard defined group which is
allowed to manage network interfaces, and that is the 'netdev' group.
So add a D-Bus setting block to grant the 'netdev' group that access.
Building on GCC 8 resulted in this compiler error.
src/sae.c:107:25: error: implicit declaration of function 'reallocarray';
did you mean 'realloc'? [-Werror=implicit-function-declaration]
sm->rejected_groups = reallocarray(NULL, 2, sizeof(uint16_t));
src/erp.c:134:10: error: comparison of integer expressions of different
signedness: 'unsigned int' and 'int' [-Werror=sign-compare]
src/eap-ttls.c:378:10: error: comparison of integer expressions of different signedness: 'uint32_t' {aka 'unsigned int'} and 'int' [-Werror=sign-compare]
Fixes the following crash:
#0 0x000211c4 in netdev_connect_event (msg=<optimized out>, netdev=0x2016940) at src/netdev.c:2915
#1 0x76f11220 in process_multicast (nlmsg=0x7e8acafc, group=<optimized out>, genl=<optimized out>) at ell/genl.c:1029
#2 received_data (io=<optimized out>, user_data=<optimized out>) at ell/genl.c:1096
#3 0x76f0da08 in io_callback (fd=<optimized out>, events=1, user_data=0x200a560) at ell/io.c:120
#4 0x76f0ca78 in l_main_iterate (timeout=<optimized out>) at ell/main.c:478
#5 0x76f0cb74 in l_main_run () at ell/main.c:525
#6 l_main_run () at ell/main.c:507
#7 0x76f0cdd4 in l_main_run_with_signal (callback=callback@entry=0x18c94 <signal_handler>, user_data=user_data@entry=0x0)
at ell/main.c:647
#8 0x00018178 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:532
This crash was introduced in commit:
4d2176df2985 ("handshake: Allow event handler to free handshake")
The culprit seems to be that 'hs' is being used both in the caller and
in the macro. Since the macro defines a variable 'hs' in local block
scope, it overrides 'hs' from function scope. Yet (_hs) still evaluates
to 'hs' leading the local variable to be initialized with itself. Only
the 'handshake_event(hs, HANDSHAKE_EVENT_SETTING_KEYS))' is affected
since it is the only macro invocation that uses 'hs' from function
scope. Thus, the crash would only happen on hardware supporting handshake
offload (brcmfmac).
Fix this by removing the local scope variable declaration and evaluate
(_hs) instead.
Fixes: 4d2176df2985 ("handshake: Allow event handler to free handshake")
- Ensure that input isn't an empty string
- Ensure that EINVAL errno (which could be optionally returned by
strto{ul|l} is also checked.
- Since strtoul allows '+' and '-' characters in input, ensure that
input which is expected to be an unsigned number doesn't start with
'-'
Given an ASN1 blob of the right form, parse and create
an l_ecc_point object. The form used is specific to DPP
hence why this isn't general purpose and put into dpp-util.
Like in ap.c, allow the event callback to mark the handshake state as
destroyed, without causing invalid accesses after the callback has
returned. In this case the crash was because try_handshake_complete
needed to access members of handshake_state after emitting the event,
as well as access the netdev, which also has been destroyed:
==257707== Invalid read of size 8
==257707== at 0x408C85: try_handshake_complete (netdev.c:1487)
==257707== by 0x408C85: try_handshake_complete (netdev.c:1480)
(...)
==257707== Address 0x4e187e8 is 856 bytes inside a block of size 872 free'd
==257707== at 0x484621F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==257707== by 0x437887: ap_stop_handshake (ap.c:151)
==257707== by 0x439793: ap_del_station (ap.c:316)
==257707== by 0x43EA92: ap_station_disconnect (ap.c:3411)
==257707== by 0x43EA92: ap_station_disconnect (ap.c:3399)
==257707== by 0x454276: p2p_group_event (p2p.c:1006)
==257707== by 0x439147: ap_event (ap.c:281)
==257707== by 0x4393AB: ap_new_rsna (ap.c:390)
==257707== by 0x4393AB: ap_handshake_event (ap.c:1010)
==257707== by 0x408C7F: try_handshake_complete (netdev.c:1485)
==257707== by 0x408C7F: try_handshake_complete (netdev.c:1480)
(...)
Previously we added logic to defer doing anything in ap_free() to after
the AP event handler has returned so that ap_event() has a chance to
inform whoever called it that the ap_state has been freed. But there's
also a chance that the event handler is destroying both the AP and the
netdev it runs on, so after the handler has returned we can't even use
netdev_get_wdev_id or netdev_get_ifindex. The easiest solution seems to
be to call ap_reset() in ap_free() even if we're within an event handler
to ensure we no longer need any external objects. Also make sure
ap_reset() can be called multiple times.
Another option would be to watch for NETDEV_WATCH_EVENT_DEL and remove
our reference to the netdev (because there's no need actually call
l_rtnl_ifaddr_delete or frame_watch_wdev_remove if the netdev was
destroyed -- frame_watch already tracks netdev removals), or to save
just the ifindex and the wdev id...
The purpose of this was to have a single utility to both cancel an
existing offchannel operation (if one exists) and start a new one.
The problem was the previous offchannel operation was being canceled
first which opened up the radio work queue to other items. This is
not desireable as, for example, a scan would end up breaking the
DPP protocol most likely.
Starting the new offchannel then canceling is the correct order of
operations but to do this required saving the new ID, canceling, then
setting offchannel_id to the new ID so dpp_presence_timeout wouldn't
overwrite the new ID to zero.
This also removes an explicit call to offchannel_cancel which is
already done by dpp_offchannel_start.
Several members are named based on initiator/responder (i/r)
terminology. Eventually both initiator and responder will be
supported so rename these members to use own/peer naming
instead.
ASN1 parsing will soon be required which will need some utilities in
asn1-private.h. To avoid duplication include this private header and
replace the OID's with the defined structures as well as remove the
duplicated macros.
station_set_scan_results takes an autoconnect flag which was being
set true in both regular/quick autoconnect scans. Since OWE networks
are processed after setting the scan results IWD could end up
connecting to a network before all the OWE hidden networks are
populated.
To fix this regular/quick autoconnect results will set the flag to
false, then process OWE networks, then start autoconnect. If any
OWE network scans are pending station_autoconnect_start will fail
but will pick back up after the hidden OWE scan.
During investigation another separate crash was found. The original is
caused by a disconnect event coming in after a neighbor report scan
was completed (roam failed) during the full roam scan.
The second crash is caused by a disconnect coming in during a full
roam scan when no neighbor report scan was ever issued.
scan_request_failed and scan_finished remove the finished scan_request
from the request queue right away, before calling the callback. This
breaks those clients that rely on scan_cancel working on such requests
(i.e. to force the destroy callback to be invoked synchronously, see
a0911ca77812 ("station: Make sure roam_scan_id is always canceled").
Fix this by removing the scan_request from the request queue after
invoking the callback. Also provide a re-entrancy guard that will make
sure that the scan_request isn't removed in scan_cancel itself.
There are similar operations being performed but with different
callbacks and userdata, depending on whether 'sr' is NULL or not.
Optimize the function flow slightly to make if-else unnecessary.
While here, update the comment. periodic scans are now scheduled only
based on the periodic timeout timer.
If periodic scan is active and we receive a SCAN_ABORTED event, we would
still invoke the periodic scan callback with an error. This is rather
pointless since the periodic scan callback cannot do anything useful
with this information. Fix that.
We should never reach a point where NEW_SCAN_RESULTS or SCAN_ABORTED are
received before a corresponding TRIGGER_SCAN is received. Even if this
does happen, there's no harm from processing the commands anyway.
This makes it a little easier to book-keep the started variable. Since
scan_request already has a 'passive' bit-field, there should be no
storage penalty.
If scan_cancel is called on a scan_request that is 'finished' but with
the GET_SCAN command still in flight, it will trigger a crash as
follows:
Received Deauthentication event, reason: 2, from_ap: true
src/station.c:station_disconnect_event() 11
src/station.c:station_disassociated() 11
src/station.c:station_reset_connection_state() 11
src/station.c:station_roam_state_clear() 11
src/scan.c:scan_cancel() Trying to cancel scan id 6 for wdev 200000002
src/scan.c:scan_cancel() Scan is at the top of the queue, but not triggered
src/scan.c:get_scan_done() get_scan_done
Aborting (signal 11) [/home/denkenz/iwd-master/src/iwd]
++++++++ backtrace ++++++++
#0 0x7f9871aef3f0 in /lib64/libc.so.6
#1 0x41f470 in station_roam_scan_notify() at /home/denkenz/iwd-master/src/station.c:2285
#2 0x43936a in scan_finished() at /home/denkenz/iwd-master/src/scan.c:1709
#3 0x439495 in get_scan_done() at /home/denkenz/iwd-master/src/scan.c:1739
#4 0x4bdef5 in destroy_request() at /home/denkenz/iwd-master/ell/genl.c:676
#5 0x4c070b in l_genl_family_cancel() at /home/denkenz/iwd-master/ell/genl.c:1960
#6 0x437069 in scan_cancel() at /home/denkenz/iwd-master/src/scan.c:842
#7 0x41dc2e in station_roam_state_clear() at /home/denkenz/iwd-master/src/station.c:1594
#8 0x41dd2b in station_reset_connection_state() at /home/denkenz/iwd-master/src/station.c:1619
#9 0x41dea4 in station_disassociated() at /home/denkenz/iwd-master/src/station.c:1644
The happens because get_scan_done callback is still called as a result of
l_genl_cancel. Add a re-entrancy guard in the form of 'canceled'
variable in struct scan_request. If set, get_scan_done will skip invoking
scan_finished.
It isn't clear what 'l_queue_peek_head() == results->sr' check was trying
to accomplish. If GET_SCAN dump was scheduled, then it should be
reported. Drop it.
results->sr is set to NULL for 'opportunistic' scans which were
triggered externally. See scan_notify() for details. However,
get_scan_done would only invoke scan_finished (and thus the periodic
scan callback sc->sp.callback) only if the scan queue was empty. It
should do so in all cases.
The point type was being hard coded to 0x3 (BIT1) which may have resulted
in the peer subtracting Y from P when reading in the point (depending on
if Y was odd or not).
Instead set the compressed type to whatever avoids the subtraction which
both saves IWD from needing to do it, as well as the peer.
The intent of this check is to make sure that at least 2 bytes are
available for reading. However, the unintended consequence is that tags
with a zero length at the end of input would be rejected.
While here, rework the check to be more resistant to potential
overflow conditions.
First disconnect wpa_supplicant to make sure it wont miss frames if
it decides to connect. Also alter the order of things for the
configurator test so autoconnect doesn't start until after hostapd
is up (avoids additional scanning and delays)
The DPP spec says nothing about how to handle re-transmits but it
was found in testing this can happen relatively easily for a few
reasons.
If the configurator requests a channel switch but does not get onto
the new channel quick enough the enrollee may have already sent the
authenticate response and it was missed. Also by nature of how the
kernel goes offchannel there are moments in time between ROC when
the card is idle and not receiving any frames.
Only frames where there was no ACK will be retransmitted. If the
peer received the frame and dropped it resending the same frame wont
do any good.
Now the result is sent immediately. Prior a connect attempt or
scan could have started, potentially losing this frame. In addition
the offchannel operation is cancelled after sending the result
which will allow the subsequent connect or scan to happen much
faster since it doesn't have to wait for ROC to expire.
The previous (incorrect) else was removed since it ended up
printing in most cases since the if clause returned. This should
have been an else if conditional from the start and only print if the
station device was not found.
IWD may be in the middle of some long operation, e.g. scanning.
If the URI is returned before IWD is ready, a configurator could
start sending frames and IWD either wont receive them, or will
be unable to respond quickly.
Controlling wpa_supplicant/hostapd from a text based interface is
problematic in that there is no way of knowing if an event corresponds
to a request. In certain cases if wpa_s/hostapd is sending out multiple
events and we make a request, a random event may come back after the
request, but before the actual result.
To fix this, at least for this specific case, we can continue to read
from the socket until the result is numeric.
The offchannel priority was also changed to zero, which matches the
priority of frames. Currently there should be no interaction between
offchannel and connect (previous offchannel priority).
Periodic scans were handled specially where they were only
started if no other requests were pending in the scan queue.
This is fine, and what we want, but this can actually be
handled automatically by nature of the wiphy work queue rather
than needing to check the request queue explicitly.
Instead we can insert periodic scans at a lower priority than
other scans. This puts them at the end of the work queue, as
well as allows future requests to jump ahead if a periodic scan
has not yet started.
Eventually, once all pending scans are done, the peridoic scan
may begin. This is no different than the preivous behavior and
avoids the need for any special checks once scan requests
complete.
One check was added to address the problem of the periodic scan
timer firing before the scan could even start. Currently this
happened to be handled fine in scan_periodic_queue, as it checks
the queue length. Since this check was removed we must see check
for this condition inside scan_periodic_timeout.
This adds a priority argument to scan_common rather than hard
coding it when inserting the work item and uses the newly
defined wiphy priority for scanning.
Work priority was never explicitly defined anywhere, and a module
using wiphy_radio_work APIs needed to ensure it was not inserting
at a priority that would interfere with other work.
Now all the types of work have been defined with their own priority
and future priorities can easily be added before, after, or in
between existing priorities.
- Mostly problems with whitespace:
- Use of spaces instead of tabs
- Stray spaces before closing ')
- Missing spaces
- Missing 'void' from function declarations & definitions that
take no arguments.
- Wrong indentation level
When this attribute is included, the initiator is requesting all
future frames be sent on this channel. There is no reason for a
configurator to act on this attribute (at least for now) so the
request frame will be dropped in this case. Enrollees will act
on it by switching to the new channel and sending the authentication
response.
While connected the driver ends up choosing quite small ROC
durations leading to excessive calls to ROC. This also will
negatively effect any wireless performance for the current
network and possibly lead to missed DPP frames.
Currently the enrollee relied on autoconnect to handle connecting
to the newly configured network. This usually resulted in poor
performance since periodic scans are done at large intervals apart.
Instead first check if the newly configured network is already
in IWD's network queue. If so it can be connected to immediately.
If not, a full scan must be done and results given to station.
With better JSON support the configuration request object
can now be fully parsed. As stated in the previous comment
there really isn't much use from the configurator side apart
from verifying mandatory values are included.
This patch also modifies the configuration result to handle
sending non 'OK' status codes in case of JSON parsing errors.
json_iter_parse is only meant to work on objects while
json_iter_next is only meant to work on arrays.
This adds checks in both APIs to ensure they aren't being
used incorrectly.
Arrays can now be parsed using the JSON_ARRAY type (stored in
a struct json_iter) then iterated using json_iter_next. When
iterating the type can be checked with json_iter_get_type. For
each iteration the value can be obtained using any of the type
getters (int/uint/boolean/null).
This adds support for boolean, (unsigned) integers, and
null types. JSON_PRIMITIVE should be used as the type when
parsing and the value should be struct json_iter.
Once parsed the actual value can be obtained using one of
the primitive getters. If the type does not match they will
return false.
If using JSON_OPTIONAL with JSON_PRIMITIVE the resulting
iterator can be checked with json_iter_is_valid. If false
the key/value was not found or the type was not matching.
First, this was renamed to 'count_tokens_in_container' to be
more general purpose (i.e. include future array counting).
The way the tokens are counted also changed to be more intuitive.
While the previous way was correct, it was somewhat convoluted in
how it worked (finding the next parent of the objects parent).
Instead we can use the container token itself as the parent and
begin counting tokens. When we find a token with a parent index
less than the target we have reached the end of this container.
This also works for nested containers, including arrays since we
no longer rely on a key (which an array element would not have).
For example::
{
"first":{"foo":"bar"},
"second":{"foo2":"bar2"}
}
index 0 <overall object>
index 1 "first" with parent 0
index 2 {"foo":"bar"} with parent 1
Counting tokens inside "first"'s object we have:
index 3 "foo" with parent 2
index 4 "bar" with parent 3
If we continue counting we reach:
index 5 "second" with parent 0
This terminates the counting loop since the parent index is
less than '2' (the index of {"foo":"bar"} object).
In file included from ./ell/ell.h:15,
from ../../src/dpp.c:29:
../../src/dpp.c: In function ‘authenticate_request’:
../../ell/log.h:79:22: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 8 has type ‘size_t’ {aka ‘unsigned int’} [-Wformat=]
79 | l_log(L_LOG_DEBUG, "%s:%s() " format, __FILE__, \
| ^~~~~~~~~~
../../ell/log.h:54:16: note: in definition of macro ‘l_log’
54 | __func__, format "\n", ##__VA_ARGS__)
| ^~~~~~
../../ell/log.h:103:31: note: in expansion of macro ‘L_DEBUG_SYMBOL’
103 | #define l_debug(format, ...) L_DEBUG_SYMBOL(__debug_desc, format, ##__VA_ARGS__)
| ^~~~~~~~~~~~~~
../../src/dpp.c:1235:3: note: in expansion of macro ‘l_debug’
1235 | l_debug("I-Nonce has unexpected length %lu", i_nonce_len);
| ^~~~~~~
Some wpa_cli utilities return some result which isn't possible to
get with wait_for_event unless you know what the result will be.
This adds wait_for_result which just returns the first event that
comes in.
wait_for_event was checking the event string presence in the rx_data
array which meant the event string had to match perfectly to any
received events. This poses problems with events that include additional
information which the caller may not be able to know or does not care
about. For example:
DPP-RX src=02:00:00:00:02:00 freq=2437 type=11
Waiting for this event previously would require the caller know src, freq,
and type. If the caller only wants to wait for DPP-RX, it can now do that.
This is created by the python interpreter for speed optimization
but poses problems if copied to /tmp since previous tests may
have already copied it leading to an exception.
Since a Device class can represent multiple modes (AP, AdHoc, station)
move StationDebug out of the init and only create this class when it
is used (presumably only when the device is in station mode).
The StationDebug class is now created in a property method consistent
with 'station_if'. If Device is not in station mode it is automatically
switched if the test tries any StationDebug methods.
If the Device mode is changed from 'station' the StationDebug class
instance is destroyed.
This file is meant as a sample and contains only the most typically
changed settings. For other settings users should refer to the
iwd.config manual page.
Right now hwsim blindly tries to forward broadcast/multicast frames to
all interfaces it knows about and relies on the kernel to reject the
forwarding attempt if the frequency does not match. This results in
multiple copies of the same message being added to the genl transmit
queue.
On slower systems this can cause a run-away memory consumption effect
where the queued messages are not processed in time prior to a new
message being received for forwarding. The likelyhood of this effect
manifesting itself is directly related to the number of hostapd
instances that are created and are beaconing simultaneously.
Try to optimize frame forwarding by not sending beacon frames
to those interfaces that are in AP mode (i.e. pure hostapd instances)
since such interfaces are going to be operating on a different frequency
and would not be interested in processing beacon frames anyway.
This optimization cuts down peak memory use during certain tests by 30x
or more (~33mb to ~1mb) when profiled with 'valgrind --tool=massif'
Direct leak of 64 byte(s) in 1 object(s) allocated from:
#0 0x7fa226fbf0f8 in __interceptor_malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/9.4.0/libasan.so.5+0x10c0f8)
#1 0x688c98 in l_malloc ell/util.c:62
#2 0x6c2b19 in msg_alloc ell/genl.c:740
#3 0x6cb32c in l_genl_msg_new_sized ell/genl.c:1567
#4 0x424f57 in netdev_build_cmd_authenticate src/netdev.c:3285
#5 0x425b50 in netdev_sae_tx_authenticate src/netdev.c:3385
Direct leak of 7 byte(s) in 1 object(s) allocated from:
#0 0x7fd748ad00f8 in __interceptor_malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/9.4.0/libasan.so.5+0x10c0f8)
#1 0x688c21 in l_malloc ell/util.c:62
#2 0x4beec7 in handshake_state_set_vendor_ies src/handshake.c:324
#3 0x464e4e in station_handshake_setup src/station.c:1203
#4 0x472a2f in __station_connect_network src/station.c:2975
#5 0x473a30 in station_connect_network src/station.c:3078
#6 0x4ed728 in network_connect_8021x src/network.c:1497
Fixes: f24cfa481b0c ("handshake: Add setter for vendor IEs")
Passing the full argument list to StationDebug was removed
because any existing properties (for Device) were being
included and causing incorrect behavior.
This neglected to handle namespaces which should also be
passed to StationDebug. Unfortunately the arguments are not
named when Device() is initialized so they cannot easily be
sorted. Instead just define Device() arguments to match the
DBus abstraction and pass only the path and namespace to
StationDebug
Two commands were added:
dpp <iface> start-enrollee
dpp <iface> start-configurator
dpp <iface> stop
In addition there is support for using the qrencode utility for displaying
the QR code after DPP is started (enrollee or configurator. If qrencode is
found on the system the QR code will be displayed. Otherwise only the URI
will be printed to the console.
This implements a configurator in the responder role. Currently
configuring an enrollee is limited to only the connected network.
This is to avoid the need to go offchannel for any reason. But
because of this a roam, channel switch, or disconnect will cause
the configuration to fail as none of the frames are being sent
offchannel.
Added both enrollee and configurator roles, as well as the needed
logic inside the authentication protocol to verify role compatibility.
The dpp_sm's role will now be used when setting capability bits making
the auth protocol agnostic to enrollees or configurators.
This also allows the card to re-issue ROC if it ends in the middle of
authenticating or configuring as well as add a maximum timeout for
auth/config protocols.
IO errors were also handled as these sometimes can happen with
certain drivers but are not fatal.
Allows creating a new configuration object based on settings, ssid,
and akm suite (for configurator role) as well as converting a
configuration object to JSON.
Rather than hard coding ad0, use the actual frame data. There really
isn't a reason this would differ (only status attribute) but just
in case its better to use the frame data directly.
This is a minimal implementation only supporting legacy network
configuration, i.e. only SSID and PSK/passphrase are supported.
Missing features include:
- Fragmentation/comeback delay support
- DPP AKM support
- 8021x/PKEX support
This implements the DPP protocol used to authenticate to a
DPP configurator.
Note this is not a full implementation of the protocol and
there are a few missing features which will be added as
needed:
- Mutual authentication (needed for BLE bootstrapping)
- Configurator support
- Initiator role
The presence procedure implemented is a far cry from what the spec
actually wants. There are two reason for this: a) the kernels offchannel
support is not at a level where it will work without rather annoying
work arounds, and b) doing the procedure outlined in the spec will
result in terrible discovery performance.
Because of this a simpler single channel announcement is done by default
and the full presence procedure is left out until/if it is needed.
This is a minimal wrapper around jsmn.h to make things a bit easier
for iterating through a JSON object.
To use, first parse the JSON and create a contents object using
json_contents_new(). This object can then be used to initialize a
json_iter object using json_iter_init().
The json_iter object can then be parsed with json_iter_parse by
passing in JSON_MANDATORY/JSON_OPTIONAL arguments. Currently only
JSON_STRING and JSON_OBJECT types are supported. Any JSON_MANDATORY
values that are not found will result in an error.
If a JSON_OPTIONAL string is not found, the pointer will be NULL.
If a JSON_OPTIONAL object is not found, this iterator will be
initialized but 'start' will be -1. This can be checked with a
convenience macro json_object_not_found();
Static analysis was not happy since this return can be negative and
it was being fed into an unsigned argument. In reality this cannot
happen since the key buffer is always set to the maximum size supported
by any curves.
This module provides a convenient wrapper around both
CMD_[CANCEL_]_REMAIN_ON_CHANNEL APIs.
Certain protocols require going offchannel to send frames, and/or
wait for a response. The frame-xchg module somewhat does this but
has some limitations. For example you cannot just go offchannel;
an initial frame must be sent out to start the procedure. In addition
frame-xchg does not work for broadcasts since it expects an ACK.
This module is much simpler and only handles going offchannel for
a duration. During this time frames may be sent or received. After
the duration the caller will get a callback and any included error
if there was one. Any offchannel request can be cancelled prior to
the duration expriring if the offchannel work has finished early.
Make sure we wipe the leases file both for server and client, so that
dhclient doesn't try to re-use leases from previous tests (should really
happen) and waste time waiting for a reply. Extend the timeout from 1s
to 5s, sometimes it takes dhclient 1s just to start. Disable verbose
mode if not needed to avoid dhclient stalling if the pipe is not being
read.
The disconnect event handler was mistakenly bailing out if FT or
reassociation was going on. This was done because a disconnect
event is sent by the kernel when CMD_AUTH/CMD_ASSOC is used.
The problem is an AP could also disconnect IWD which should never
be ignored.
To fix this always parse the disconnect event and, if issued by
the AP, always notify watchers of the disconnect.
Passing *args, **kwargs into StationDebug ended up initializing the
class with Station properties since devices can be initialized from
existing property dictionaries. Since the object path is all
StationDebug needs, pass args[0] instead.
LLD 13 and GNU ld 2.37 support -z start-stop-gc which allows garbage
collection of C identifier name sections despite the __start_/__stop_
references. GNU ld before 2015-10 had the behavior as well. Simply set
the retain attribute so that GCC 11 (if configure-time binutils is 2.36
or newer)/Clang 13 will set the SHF_GNU_RETAIN section attribute to
prevent garbage collection.
Without the patch, there are linker errors with -z start-stop-gc
(LLD default) when -Wl,--gc-sections is used:
```
ld.lld: error: undefined symbol: __start___eap
>>> referenced by eap.c
>>> src/eap.o:(eap_init)
```
The remain attribute will not be needed if the metadata sections are
referenced by code directly.
Document the new API that clients can use to get notified of new network
configuration and be responsible for committing it to the netdev, the
resolver, etc.
On some systems the default radvd pid file location is not accessible.
Specify it to be under /tmp instead.
While there, enable full radvd debug output so it is logged when
test-runner is invoked with the --log option.
ap.c has been mostly careful to call the event handler at the end of any
externally called function to allow methods like ap_free() to be called
within the handler, but that isn't enough. For example in
ap_del_station we may end up emitting two events: STATION_REMOVED and
DHCP_LEASE_EXPIRED. Use a slightly more complicated mechanism to
explicitly guard ap_free calls inside the event handler.
To make it easier, simplify cleanup in ap_assoc_reassoc with the use of
_auto_.
In ap_del_station reorder the actions to send the STATION_REMOVED event
first as the DHCP_LEASE_EXPIRED is a consequence of the former and it
makes sense for the handler to react to it first.
src/eap.c: In function 'eap_rx_packet':
src/eap.c:419:50: error: 'vendor_type' may be used uninitialized in this function [-Werror=maybe-uninitialized]
419 | (type == EAP_TYPE_EXPANDED && vendor_id == (id) && vendor_type == (t))
| ^~
src/eap.c:430:11: note: 'vendor_type' was declared here
430 | uint32_t vendor_type;
It isn't clear why GCC complains about vendor_type, but not vendor_id.
But in all cases if type == EAP_TYPE_EXPANDED, then vendor_type and
vendor_id are set. Silence this spurious warning.
There is an unchecked NULL pointer access in network_has_open_pair.
open_info can be NULL, when out of multiple APs in range that advertise
the same SSID some advertise OWE transition elments and some don't.
A user reported a crash in situations where there was an OWE transition
pair, with an extra open network using the same SSID but not advertising
the OWE transition IE:
++++++++ backtrace ++++++++
0x7f199cadf320 in /lib64/libc.so.6
0x418c08 in network_has_open_pair() at /home/jprestwo/iwd/src/station.c:712
0x4262ce in scan_finished() at /home/jprestwo/iwd/src/scan.c:1718
0x4273cd in get_scan_done() at /home/jprestwo/iwd/src/scan.c:1733
0x47cf7a in destroy_request() at /home/jprestwo/iwd/ell/genl.c:674
0x479f1c in io_callback() at /home/jprestwo/iwd/ell/io.c:120
0x47922d in l_main_iterate() at /home/jprestwo/iwd/ell/main.c:472 (discriminator 2)
0x4792dc in l_main_run() at /home/jprestwo/iwd/ell/main.c:521
0x47950c in l_main_run_with_signal() at /home/jprestwo/iwd/ell/main.c:649
0x403e97 in main() at /home/jprestwo/iwd/src/main.c:532
0x7f199cac9b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
The Hotspot 2.0 spec has some requirements that IWD was missing depending
on a few bits in extended capabilities and the HS2.0 indication element.
These requirements correspond to a few sysfs options that can be set in
the kernel which are now set on CONNECTED and unset on DISCONNECTED.
Netconfig was the only user of sysfs but now other modules will
also need it.
Adding existing API for IPv6 settings, a IPv4 and IPv6 'supports'
checker, and a setter for IPv4 settings.
If a beacon is lost testAP will fail since it did not utilize any
rescanning logic. Now it can use this feature by passing full_scan.
This is required since IWD APs are not known to test-runner like
hostapd APs are.
Certain scenarios coupled with lost beacons could result in OrderedNetwork
being initialized many times until the dbus library reached its maximum
signal registrations. This could happen where there are two networks,
IWD finds one in a scan but continues to scan for the other and the beacons
are lost. The way get_ordered_networks was written it returns early if any
networks are found. Since get_ordered_network (not plural) uses
get_ordered_networks() in a loop this caused OrderedNetwork's to be created
rapidly until python raises an exception.
To fix this, pass an optional list of networks being looked for to
get_ordered_networks. Only if all the networks in the list are found will
it return early, otherwise it will continue to scan.
In non-interactive mode, when a dbus method call returns the process
exits. This is true for all methods except agent requests since e.g.
Connect() call automatically requests credentials and the client must
wait for that to return before exiting. The new daemon interface must
also be treated in the same way and not exit.
It appears different versions of pyroute2 may or may not have
iwutil, and instead use pyroute2.IW() directly. Try the iwutil
way first, then pyroute2.IW()
The way a SA Query was done following a channel switch was slightly
incorrect. One because it is only needed when OCVC is set, and two
because IWD was not waiting a random delay between 0 and 5000us as
lined out by the spec. This patch fixes both these issues.
When station 'show' is invoked, parse and print any IP addresses
associated with the interface.
If iwd network configuration is disabled and no IP addresses were
configured, print a hint to the user that a DHCP client might need to be
configured.
Commit ed10b00afa3f ("unit: Fix eapol IP Allocation test failure")
did not convert all instances of IP allocation settings to network byte
order.
Fixes: 5c9de0cf23f9 ("eapol: Store IP address in network byte order")
Cache the latest v4 and v6 domain string lists in struct netconfig state
to be able to more easily detect changes in those values in future
commits. For that split netconfig_set_domains's code into this function,
which now only commits the values in netconfig->v{4,6}_domain{,s} to the
resolver, and netconfig_domains_update() which figures out the active
domains string list and saves it into netconfig->v{4,6}_domain{,s}. This
probably saves some cycles as the callers can now decide to only
recalculate the domains list which may have changed.
While there simplify netconfig_set_domains return type to void as the
result was always 0 anyway and was never checked by callers.
Cache the latest v4 and v6 DNS IP string lists in struct netconfig state
to be able to more easily detect changes in those values in future
commits. For that split netconfig_set_dns's code into this function,
which now only commit the values in netconfig->dns{4,6}_list to the
resolver, and netconfig_dns_list_update() which figures out the active
DNS IP address list and saves it in netconfig->dns{4,6} list. This
probably saves some cycles as the callers can now decide to only
recalculate the dns_list which may have changed.
While there simplify netconfig_set_dns return type to void as the result
was always 0 anyway and was never checked by callers.
Cache the latest v4 and v6 gateway IP string in struct netconfig state
to be able to more easily detect changes in those values in future
commits and perhaps to simplify the ..._routes_install functions.
netconfig_ipv4_get_gateway's out_mac parameter can now be NULL. While
editing that function fix a small formatting annoyance.
Use a separate fils variable to make the code a bit prettier.
Also make sure that the out_mac parameter is not NULL prior to storing
the gateway_mac in it.
Add netconfig_enabled() and use that in all places that want to know
whether network configuration is enabled. Drop the enable_network_config
deprecated setting, which was only being handled in one of these 5 or so
places.
This code path was never tested and used to ensure a OWE transition
candidate gets selected over an open one (e.g. if all the BSS's are
blacklisted). But this logic was incorrect and the path was being
taken for BSS's that did not contain the owe_trans element, basically
all BSS's. For RSN's this was somewhat fine since the final check
would set a candidate, but for open BSS's the loop would start over
and potentially complete the loop without ever returning a candidate.
If fallback was false, NULL would be returned.
To fix this only take the OWE transition path if its an OWE transition
BSS, i.e. inverse the logic.
There was no open ssid provisioning file, which was fine as the
first test should have created one. But to be safe, include one
explicitly and use the proper setUp/tearDown functions.
Normally Beacon Reporting subelements are present only if repeated
measurements are requested. However, an all-zero Beacon Reporting
subelement is included by some implementations. Handle this case
similarly to the absent case.
Since Reporting Detail subelement is listed as 'extensible', make sure
that the length check is not overly restrictive. We only interpret the
first field.
It was seen during testing that several offload-capable cards
were not including the OCI in the 4-way handshake. This made
any OCV capable AP unconnectable.
To be safe disable OCV on any cards that support offloading.
802.11 requires an STA initiate the SA Query procedure on channel
switch events. This patch refactors sending the SA Query into its
own routine and starts the procedure when the channel switch event
comes in.
In addition the OCI needs to be verified, so the channel info is
parsed and set into the handshakes chandef.
There are several events for channel switching, and nl80211cmd was
naming two of them "Channel Switch Notify". Change
CH_SWITCH_STARTED_NOTIFY to "Channel Switch Started Notify" to
distinguish the two events.
By sleeping for 4 seconds IWD had plenty of time to fully disconnect
and reconnect in time to pass the final "connected" check. Instead
use wait_for_object_condition to wait for disconnected and expect
this to fail. This will let the test fail if IWD disconnects.
SA query is the final protocol that requires OCI inclusion and
verification. The OCI element is now included and verified in
both request and response frames as required by 802.11.
strcmp behavior is undefined if one of the parameters is NULL.
Server-id is a mandatory value and cannot be NULL. Gateway can be NULL
in DHCP, so check that explicitly.
Reported-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
In certain situations, it is possible for us to know the MAC of the
default gateway when DHCP finishes. This is quite typical on many home
network and small network setups. It is thus possible to pre-populate
the ARP cache with the gateway MAC address to save an extra round trip
at connection time.
Another advantage is during roaming. After version 4.20, linux kernel
flushes ARP caches by default whenever netdev encounters a no carrier
condition (as is the case during roaming). This can prevent packets
from going out after a roam for a significant amount of time due to
lost/delayed ARP responses.
This implements the new handshake callback for setting a TK with
an extended key ID. The procedure is different from legacy zero
index TKs.
First the new TK is set as RX only. Then message 4 should be sent
out (so it uses the existing TK). This poses a slight issue with
PAE sockets since message order is not guaranteed. In this case
the 4th message is stored and sent after the new TK is installed.
Then the new TK is modified using SET_KEY to both send and
receive.
In the case of control port over NL80211 the above can be avoided
and we can simply install the new key, send message 4, and modify
the TK as TX + RX all in sequence, without waiting for any callbacks.
When UseDefaultInterface is set, iwd doesn't attempt to destroy and
recreate any default interfaces it detects. However, only a single
default interface was ever remembered & initialized. This is fine for
most cases since the kernel would typically only create a single netdev
by default.
However, some drivers can create multiple netdevs by default, if
configured to do so. Other usecases, such as tethering, can also
benefit if iwd initialized & managed all default netdevs that were
detected at iwd start time or device hotplug.
oci variable is always set during handshake_util_find_kde. Do not
initialize it unnecessarily to help the compiler / static analysis find
potential issues.
If OCI is not used, then the oci array is never initialized. Do not try
to include it in our GTK 2_of_2 message.
Fixes: ad4d6398542b ("eapol: include OCI in GTK 2/2")
802.11 added Extended Key IDs which aim to solve the issue of PTK
key replacement during rekeys. Since swapping out the existing PTK
may result in data loss because there may be in flight packets still
using the old PTK.
Extended Key IDs use two key IDs for the PTK, which toggle between
0 and 1. During a rekey a new PTK is derived which uses the key ID
not already taken by the existing PTK. This new PTK is added as RX
only, then message 4/4 is sent. This ensure message 4 is encrypted
using the previous PTK. Once sent, the new PTK can be modified to
both RX and TX and the rekey is complete.
To handle this in eapol the extended key ID KDE is parsed which
gives us the new PTK key index. Using the new handshake callback
(handshake_state_set_ext_tk) the new TK is installed. The 4th
message is also included as an argument which is taken care of by
netdev (in case waiting for NEW_KEY is required due to PAE socekts).
REKEY_GTK kicks off the GTK only handshake where REKEY_PTK does
both (via the 4-way). The way this utility was written was causing
hostapd some major issues since both REKEY_GTK and REKEY_PTK was
used.
Instead if address is set only do REKEY_PTK. This will also rekey
the GTK via the 4-way handshake.
If no address is set do REKEY_GTK which will only rekey the GTK.
This may not be required but setting the group key mode explicitly
to multicast makes things consistent, even if only for the benefit
of reading iwmon logs easier.
The procedure for setting extended key IDs is different from the
single PTK key. The key ID is toggled between 0 and 1 and the new
key is set as RX only, then set to RX/TX after message 4/4 goes
out.
Since netdev needs to set this new key before sending message 4,
eapol can include a built message which netdev will store if
required (i.e. using PAE).
ext_key_id_capable indicates the handshake has set the capability bit
in the RSN info. This will only be set if the AP also has the capability
set.
active_tk_index is the key index the AP chose in message 3. This is
now used for both legacy (always zero) and extended key IDs.
Move the reading of ControlPortOverNL80211 into wiphy itself and
renamed wiphy_control_port_capable to wiphy_control_port_enabled.
This makes things easier for any modules interested in control
port support since they will only have to check this one API rather
than read the settings and check capability.
Expose the Device Address property for each peer. The spec doesn't say
much about how permanent the address or the name are, although the
device address by definition lives longer than the interface addresses.
However the device address is defined to be unique and the name is not
so the address can be used to differentiate devices with identical name.
Being unique also may imply that it's assigned globally and thus
permanent.
Network Manager uses the P2P device address when saving connection
profiles (and will need it from the backend) and in this case it seems
better justified than using the name.
The address is already in the object path but the object path also
includes the local phy index which may change for no reason even when
the peer's address hasn't changed so the path is not useful for
remembering which device we've connected to before. Looking at only
parts of the path is considered wrong.
Some drivers might not actually support control port properly even if
advertised by mac80211. Introduce a new method to wiphy that will take
care of looking up any driver quirks that override the presence of
NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211
Make consecutive calls to netconfig_load_settings() memory-leak safe by
introducing a netconfig_free_settings convenience method. This method
will free any settings that are allocated as a result of
netconfig_load_settings() and will be called from netconfig_free() to
ensure that any settings are freed as a result of netconfig_destroy().
For symmetry with IPv4, save the command id for this netlink command so
we can later add logic to the callback as well as be able to cancel the
command. No functional change in this commit alone.
The FT-over-DS test was allowed to fail as it stood. If FT-over-DS
failed it would just do a normal over-Air transition which satisfied
all the checks. To prevent this Authenticate frames are blocked after
the initial connection so if FT-over-DS fails there is no other way
to roam.
FT/FILS handle their own PMK derivation but rekeys still require
using the 4-way handshake. There is some ambiguity in the spec whether
or not the PMKID needs to be included in message 1/4 and it appears
that when rekeying after FT/FILS hostapd does not include a PMKID.
The handshake contains the current BSS's RSNE/WPA which may differ
from the FT-over-DS target. When verifying the target BSS's RSNE/WPA
IE needs to be checked, not the current BSS.
If the deauth path was triggered IWD would deauth but end up
calling the connect callback with whatever result netdev had
set, e.g. 'NETDEV_RESULT_OK'. This, of course, caused station
some confusion.
FT-over-DS cannot use OCV due to how the kernel works. This means
we could connect initially with OCVC set, but a FT-over-DS attempt
needs to unset OCVC. Set OCVC false when rebuilding the RSNE for
reassociation.
The FT-over-DS action stage builds an FT-Request which contains an
RSNE. Since FT-over-DS will not support OCV add a boolean to
ft_build_authenticate_ies so the OCVC bit can be disabled rather
than relying on the handshake setting.
This modifies the FT logic to fist call get_oci() before
reassociation. This allows the OCI to be included in reassociation
and in the 4-way handshake later on.
The code path for getting the OCI had to be slightly changed to
handle an OCI that is already set. First the handshake chandef is
NULL'ed out for any new connection. This prevents a stale OCI from
being used. Then some checks were added for this case in
netdev_connect_event and if chandef is already set, start the 4-way
handshake.
netconfig_load_settings is called when establishing a new initial
association to a network. This function tries to update dhcp/dhcpv6
clients with the MAC address of the netdev being used. However, it is
too early to update the MAC here since netdev might need to powercycle
the underlying network device in order to update the MAC (i.e. when
AddressRandomization="network" is used).
If the MAC is set incorrectly, DHCP clients are unable to obtain the
lease properly and station is stuck in "connecting" mode indefinitely.
Fix this by delaying MAC address update until netconfig_configure() is
invoked.
Fixes: ad228461abbf ("netconfig: Move loading settings to new method, refactor")
If the AP advertises FT-over-DS support it likely wants us to use
it. Additionally signal_low is probably going to be true since IWD
has started a roam attempt.
When netdev goes down so does station, but prior to netdev calling
the neighbor report callback. The way the logic was written station
is dereferenced prior to checking for any errors, causing a use
after free.
Since -ENODEV is used in this case check for that early before
accessing station.
After namespaces were added, the dbus address was customized to
be /tmp/dbus{0..N}. This prevented any dbus applications started
in the shell from working properly.
Set DBUS_SYSTEM_BUS_ADDRESS to the environment prior to entering
the shell.
This adds a utility to convert a chandef obtained from the kernel into a
3 byte OCI element format containing the operating class, primary
channel and secondary channel center frequency index.
This changes scan_bss from using separate members for each
OWE transition element data type (ssid, ssid_len, and bssid)
to a structure that holds them all.
This is being done because OWE transition has option operating
class and channel bytes which will soon be parsed. This would
end up needing 5 separate members in scan_bss which is a bit
much for a single IE that needs to be parsed.
This makes checking the presense of the IE more convenient
as well since it can be done with a simple NULL pointer check
rather than having to l_memeqzero the BSSID.
These members are currently stored in scan_bss but with the
addition of operating class/band info this will become 5
separate members. This is a bit excessive to store in scan_bss
separately so instead this structure can hold everything related
to the OWE transition IE.
Add a utility for setting the OCI obtained from the hardware (prior to
handshake starting) as well as a utility to validate the OCI obtained
from the peer.
This adds a utility that can convert an operating class + channel
combination to a frequency. Operating class is assumed to be a global
operating class from 802.11 Appendix E4.
This information can be found in Operating Channel Information (OCI) IEs,
as well as OWE Transition Mode IEs.
Calling handshake_state_setup_own_ciphers from within
handshate_state_set_authenticator_ie was misleading. In all cases the
supplicant chooses the AKM. This worked since our AP code only ever
advertises a single AKM, but would not work in the general case.
Similarly, the supplicant would choose which authentication type to use
by either sending the WPA1 or WPA2 IE (or OSEN). Thus the setting of
the related variables in handshake_state_set_authenticator_ie was also
incorrect. In iwd, the supplicant_ie would be set after the
authenticator_ie, so these settings would be overwritten in most cases.
Refactor these two setters so that the supplicant's chosen rsn_info
would be used to drive the handshake.
reallocarray has been added to glibc relatively recently (version 2.26,
from 2017) and apparently not all users run new enough glibc. Moreover,
reallocarray is not available with uclibc-ng. So use realloc if
reallocarray is not available to avoid the following build failure
raised since commit 891b78e9e892a3bcd800eb3a298e6380e9a15dd1:
/home/giuliobenetti/autobuild/run/instance-3/output-1/host/lib/gcc/xtensa-buildroot-linux-uclibc/10.3.0/../../../../xtensa-buildroot-linux-uclibc/bin/ld: src/sae.o: in function `sae_rx_authenticate':
sae.c:(.text+0xd74): undefined reference to `reallocarray'
Fixes:
- http://autobuild.buildroot.org/results/c6d3f86282c44645b4f1c61882dc63ccfc8eb35a
This adds several tests for OWE transition networks. Hostapd
does have special options for these networks but currently their
implementation is incorrect as the IE is not ever added to the
OWE BSS. Besides that using vendor_elements provides a much easier
way to create invalid IEs to test.
There isn't much control station has with how BSS's are inserted to
a network object. The rank algorithm makes that decision. Because of
this we could end up in a situation where the Open BSS is preferred
over the OWE transition BSS.
In attempt to better handle this any Open BSS in this type of network
will not be chosen unless its the only candidate (e.g. no other BSSs,
inability to connect with OWE, or an improperly configured network).
OWE Transition is described in the WiFi Alliance OWE Specification
version 1.1. The idea behind it is to support both legacy devices
without any concept of OWE as well as modern ones which support the
OWE protocol.
OWE is a somewhat special type of network. Where it advertises an
RSN element but is still "open". This apparently confuses older
devices so the OWE transition procedure was created.
The idea is simple: have two BSS's, one open, and one as a hidden
OWE network. Each network advertises a vendor IE which points to the
other. A device sees the open network and can connect (legacy) or
parse the IE, scan for the hidden OWE network, and connect to that
instead.
Care was taken to handle connections to hidden networks directly.
The policy is being set that any hidden network with the WFA OWE IE
is not connectable via ConnectHiddenNetwork(). These networks are
special, and can only be connected to via the network object for
the paired open network.
When scan results come in from any source (DBus, quick, autoconnect)
each BSS is checked for the OWE Transition IE. A few paths can be
taken here when the IE is found:
1. The BSS is open. The BSSID in the IE is checked against the
current scan results (excluding hidden networks). If a match is
found we should already have the hidden OWE BSS and nothing
else needs to be done (3).
2. The BSS is open. The BSSID in the IE is not found in the
current scan results, and the open network also has no OWE BSS
in it. This will be processed after scan results.
3. The BSS is not open and contains the OWE IE. This BSS will
automatically get added to the network object and nothing else
needs to be done.
After the scan results each network is checked for any non-paired
open BSS's. If found a scan is started for these BSS's per-network.
Once these scan results come in the network is notified.
From here network.c can detect that this is an OWE transition
network and connect to the OWE BSS rather than the open one.
Specifically OWE networks with multiple open/hidden BSS's are troublesome
to scan for with the current APIs. The scan parameters are limited to a
single SSID and even if that was changed we have the potential of hitting
the max SSID's per scan limit. In all, it puts the burden onto the caller
to sort out the SSIDs/frequencies to scan for.
Rather than requiring station to handle this a new scan API was added,
scan_owe_hidden() which takes a list of open BSS's and will automatically
scan for the SSIDs in the OWE transition IE for each.
It is slightly optimized to first check if all the hidden SSID's are the
same. This is the most likely case (e.g. single pair or single network)
and a single scan command can be used. Otherwise individual scan commands
are queued for each SSID/frequency combo.
handshake_util_ap_ie_matches() is used to make sure that the RSN element
received from the Authenticator during handshake / association response
is the same as the one advertised in Beacon/Probe Response frames. This
utility tries to bitwise compare the element first, and only if that
fails, compares RSN members individually.
For FT, bitwise comparison will always fail since the PMKID has to be
included by the Authenticator in any RSN IEs included in Authenticate
& Association Response frames.
Perform the bitwise comparison as an optimization only during processing
of eapol message 3/4. Also keep the parsed rsn information for future
use and to possibly avoid re-parsing it during later checks.
DBus scan is performed in several subsets. In certain corner-case
circumstances it would be possible for autoconnect to run after each
subset scan. Instead, trigger autoconnect only after the dbus scan
completes.
This also works around a condition where ANQP results could trigger
autoconnect too early.
Several invocations of station_set_scan_results() base the
'add_to_autoconnect' parameter on station_is_autoconnecting(). Simplify
the code by having station_set_scan_results() invoke that itself.
'add_to_autoconnect' now becomes an 'intent' parameter, specifying
whether autoconnect path should be invoked as a result of these scan
results or not when station is in an appropriate state. Rename
'add_to_autoconnect' parameter to make this clearer.
If the frequency of the bss is not in the list of frequencies for the
current scan, then this is a cached bss. It was likely already
processed for ANQP before, so skip it.
IWD has restricted SSIDs to only utf8 so they can be displayed but
with the addition of OWE transition networks this is an unneeded
restriction (for these networks). The SSID of an OWE transition
network is never displayed to the user so limiting to utf8 isn't
required.
Allow non-utf8 SSIDs to be scanned for by including the length in
the scan parameters and not relying on strlen().
This is a parser for the WFA OWE Transition element. For now the
optional band/channel bytes will not be parsed as hostapd does not
yet support these and would also require the 802.11 appendix E-1
to be added to IWD. Because of this OWE Transition networks are
assumed to be on the same channel as their open counterpart.
in6_addr.__in6_u.__u6_addr8 is glibc-specific and named differently in
the headers shipped with musl libc for example. The POSIX compliant and
universal way of accessing it is in6_addr.s6_addr.
This was actually broken if triggered because __network_connect
checks if network->connect_after_owe_hidden is set and returns
already in progress. We want to keep this behavior though for
obvious reasons.
To fix this station_connect_network can be called directly which
bypasses the check. This is essentially how ANQP avoids this
problem as well.
Similar to ANQP a connect call could come in while station is
scanning for OWE hidden networks. This is supported in the same
manor by saving away the dbus message and resuming the connection
after the hidden OWE scan.
With the addition of OWE transition network needs to be notified
of the hidden OWE scan which is quite similar to how it is notified
of ANQP. The ANQP event watch can be made generic and reused to
allow other events besides ANQP.
This is being added to support OWE transition mode. For these
type of networks the OWE BSS may contain a different SSID than
that of the network, but the WFA spec requires this be hidden
from the user. This means we need to set the handshake SSID based
on the BSS rather than the network object.
Refactor netconfig_set_dns to be a bit easier to follow and remove use
of macros. Also bail out early if no DNS addresses are provided instead
of building an empty DNS list since resolve_set_dns() simply returns if
a NULL or empty DNS list is provided.
If set, a rule will start matching 'MatchBytes' some number of bytes
into the frame (MatchBytesOffset). This is useful since header
information, addresses, and sequence numbers may be unpredictable
between test runs.
To avoid unintended matches the Prefix property is left unchanged
and will match starting at the beginning of the frame.
Since IWD tries group 20 first all other OWE tests are actually
triggering group negotiation where this test is not. Since this
code is exercised this test can be removed completely, as well
as the additional radio/network.
Kernel keeps transmitting authentication frames until told to stop or an
authentication frame the kernel considers 'final' is received. Detect
cases where the kernel would keep retransmitting, and if auth_proto
encounters a fatal protocol error, prevent these retransmissions from
occuring by sending a Deauthenticate command to the kernel.
Additionally, treat -EBADMSG/-ENOMSG return from auth_proto specially.
These error codes are meant to convey that a frame should be silently
dropped and retransmissions should continue.
This test simulates the scenario where IWDs commit is not acked which
exposes a hostapd bug that ultimately fails the connection. This behavior
can be seen by reverting the commit which works around this issue:
"sae: don't send commit in confirmed state"
With the above patch applied this test should pass.
Note: The existing timeout test was reused as it was not of much use
anyways. All it did was block auth/assoc frames and expect a failure
which didn't exercise any SAE logic anyways.
This works around a hostapd bug (described more in the TODO comment)
which is exposed because of the kernels overly agressive re-transmit
behavior on missed ACKs. Combined this results in a death if the
initial commit is not acked. This behavior has been identified in
consumer access points and likely won't ever be patched for older
devices. Because of this IWD must work around the problem which can
be eliminated by not sending out this commit message.
This bug was reported to the hostapd ML:
https://lists.infradead.org/pipermail/hostap/2021-September/039842.html
This change should not cause any compatibility problems to non-hostapd
access points and is identical to how wpa_supplicant treats this
scenario.
If a commit is received while in an accepted state the spec states
the scalar should be checked against the previous commit and if
equal the message should be silently dropped.
The hwsim rules did not treat frames and ACKs any differently which
can mislead the developer especially when setting a rule prefix.
If a prefix was used the frame ACK was actually being matched against
the original frame payload which seems wrong because the ACK is not
the original frame.
Though strange, matching the frame prefix on an ACK has its place if
the developer wants to block just the ACK rather than the frame so
to make this case more clear 'DropAck' was added as a rule property.
And only if this is true will an ACK be checked and potentially
dropped.
To maintain the current hwsim behavior DropAck will default to true.
This integer property can be set to only match a rule a number of
times rather than all packets. This is useful for testing behavior
of a single dropped frame or ack. Once the rule has been matched
'MatchTimes' the rules will no longer be applied (unless set again
to some integer greater than zero).
Since Process.processes is a weak reference dictionary any process
put in this dict will disappear if all references are lost. This
is much better than keeping a list in the Namespace which will hold
the references forever until test-runner manually kills them all at
the end of the test. This does still need to be done for daemon
processes but everything else can just go away when it is no longer
needed.
The test-runner logging is very basic and just dumps everything into files
per-test. This means any subtests are just appended to existing log files
which can be difficult to parse after the fact. This is especially hard
when IWD/Hostapd runs once for the entirety of the test (as opposed to
killing between tests).
This patch writes out a separator between each subtests in the form:
===== <file>:<function> =====
To do this all processes are now kept as weak references inside the
Process class itself. Process.write_separators() can be called which
will iterate through all running processes and write the provided
separator.
This also paves the way to remove the ctx.processes array which is more
trouble than its worth due to reference issues.
Note: For tests which start IWD this will have no effect as the separator
is written prior to the test running. For these tests though, it is
much easier to read the log files because you can clearly see when
IWD starts and exits.
Processes which were not explicitly killed ended up staying around
forever because they internally held references to other objects
such as GLib IO watches or write FDs.
This shuffles some code so these objects get cleaned up both when
explititly killed and after being waited for.
This was a placeholder at one point but modules grew to depend on it
being a string. Fix these dependencies and set the root namespace
name to None so there is no more special case needed to handle both
a named namespace and the original 'root' namespace.
In netconfig_load_settings apply the DNS overrides strings we've loaded
instead of leaking them.
Fixes: ad228461abbf ("netconfig: Move loading settings to new method, refactor")
With various versions of wpa_supplicant tested, after an IWD GO tears
the group down, the wpa_supplicant P2P client will not immediately
signal that the group has disappeared but will at least wait for the
lost beacon signal, wait some more and try reconnecting and all that
takes it 10s or a little longer. Possibly sending Deauthenticate frames
to clients first would improve this.
netdev now assumes the SSID was set in the handshake (normally via
network_handshake_setup) but WSC calls netdev_connect directly so
it also should set the SSID.
In order to support OWE in the CMD_CONNECT path the scan_bss parameter
needs to be removed since this is lost after netdev_connect returns.
Nearly everything needed is also stored in the handshake except the
privacy capability which is now being mirrored in the netdev object
itself.
Check whether verbose output is enabled for process name arg[0] before
prepending the "ip netns exec" part to arg since arg[0] is going to be
"ip" after that.
Use the MAC addresses for the gateways and DNS servers received in the
FILS IP Assigment IE together with the gateway IP and DNS server IP.
Commit the IP to MAC mappings directly to the ARP/NDP tables so that the
network stack can skip sending the corresponding queries over the air.
Send and receive the FILS IP Address Assignment IEs during association.
As implemented this would work independently of FILS although the only
AP software handling this mechanism without FILS is likely IWD itself.
No support is added for handling the IP assignment information sent from
the server after the initial Association Request/Response frames, i.e.
the information is only used if it is received directly in the
Association Response without the "response pending" bit, otherwise the
DHCP client will be started.
Add two methods that will allow station to implement FILS IP Address
Assigment, one method to decide whether to send the request during
association, and fill in the values to be used in the request IE, and
another to handle the response IE values received from the server and
apply them. The netconfig->rtm_protocol value used when the address is
assigned this way remains RTPROT_DHCP because from the user's point of
view this is automatic IP assigment by the server, a replacement for
DHCP.
Split loading settings out of network_configure into a new method,
network_load_settings. Make sure both consistently handle errors by
printing messages and informing the caller.
These modules only needed to be imported a single time for the entire
run of tests. This is significantly cheaper in terms of memory and
should prevent random OOM exceptions.
The Procss class was doing quite a bit of what Popen already does like
storing the return code and process arguments. In addition the Process
class ended up storing a Popen object which was frequently accessed.
For both simplicity and memory savings have Process inherit Popen and
add the additional functionality test-runner needs like stdout
processing to output files and the console.
To do this Popen.wait() needed to be overridden to to prevent blocking
as well as wait for the HUP signal so we are sure all the process
output was written. kill() was also overritten to perform cleanup.
The most intrusive change was removing wait as a kwarg, and instead
requiring the caller to call wait(). This doesn't change much in
terms of complexity to the caller, but simplifies the __init__
routine of Process.
Some convenient improvements:
- Separate multiple process instance output (Terminate: <args> will
be written to outfiles each time a process dies.)
- Append to outfile if the same process is started again
- Wait for HUP before returning from wait(). This allows any remaining
output to be written without the need to manually call process_io.
- Store ctx as a class variable so callers don't need to pass it in
(e.g. when using Process directly rather than start_process)
Setter which forces the use of group 19 rather than the group order
that ELL provides. Certain APs have been found to have buggy group
negotiation and only work if group 19 is tried first, and only. When
an AP like this this is found (based on vendor OUI match) SAE will
use group 19 unconditionally, and fail if group 19 does not work.
Other groups could be tried upon failure but per the spec group 19
must be supported so there isn't much use in trying other, optional
groups.
mac80211_hwsim has a funny quirk with multiple addresses in
radios. Some operations require address index zero, some index
one. And these addresses (possibly a result of how test-runner
initializes radios) sometimes get mixed up. For example scan
results may show a BSS address as 02:00:00:00:00:00, while the
next test run shows 42:00:00:00:00:00.
Ultimately, sending out frames requires the first nibble of the
address to be 0x4 so to handle both variants of addresses described
above hwsim.py was updated to always bitwise OR the first byte
with 0x40.
Handle the 802.11ai FILS IP Address Assignment IEs in Association
Request frames when netconfig is enabled. Only IPv4 is supported.
Like the P2P IP Allocation mechanism, since the payload format and logic
is independent from the rest of the FILS standard this is enabled
unconditionally for clients who want to use it even though we don't
actually do FILS in AP mode.
If netconfig is enabled tell the DHCP server to expire any leases owned
by the client that is disconnecting by using l_dhcp_server_expire_by_mac
to return the IPs to the IP pool. They're added to the expired list
so they'd only be used if there are no other addresses left in the pool
and can be reactivated if the client comes back before the address is
used by somebody else.
This should ensure that we're always able to offer an address to a new
client as long as there are fewer concurrent clients than addresses in
the configured subnet or IP range.
Use the struct handshake_state::support_ip_allocation field already
supported in eapol.c authenticator side to enable the P2P IP Allocation
mechanism in ap.c. Add the P2P_GROUP_CAP_IP_ALLOCATION bit in P2P group
capabilities to signal the feature is now supported.
There's no harm in enabling this feature in every AP (not just P2P Group
Owner) but the clients won't know whether we support it other than
through that P2P-specific group capability bit.
Add a handshake event for use by the AP side for mechanisms that
allocate client IPs during the handshake: P2P address allocation and
FILS address assignment. This is emitted only when EAPOL or the
auth_proto is actually about to send the network configuration data to
the client so that ap.c can skip allocating a DHCP leases altogether if
the client doesn't send the required KDE or IE.
This test was failing due to a change introduced in commit
5c9de0cf23f9 which changed handshake state storage of IPs from host
order to network byte order. Update the test to set IPs in network
byte-order.
Fixes: 5c9de0cf23f9 ("eapol: Store IP address in network byte order")
Some drivers ignore the initial IF_OPER_UP setting that was sent during
netdev_connect_ok(). Attempt to work around this by parsing New Link
events. If OperState setting is still not correct in a subsequent event,
retry setting OperState to IF_OPER_UP.
The idea of this test is valid but it is extremely timing dependent
which simply isn't testable on all machines. Removing this test
at least until this can be tested reliably.
This was initially put in to solve an issue that was specific to
mac80211_hwsim where the connect callback would get queued and
delayed until after the connect event. This caused IWD to get very
confused.
Later it was found that "real" drivers can sometimes do this so
some code was added to IWD core to handle it.
Now there isn't much point to delay all frames unless a rule specifies
so change the behavior back to sending out frames immediately.
The hwsim Rule API was structured as properties so once a rule is
created it automatically starts being applied to frames. This happens
before anything has time to actually define the rule (source, destination
etc). This leads to every single frame being matched to the rule until
these other properties are added, which can result in unexpected behavior.
To fix this an "Enabled" property has been added and the rule will not
be applied until this is true.
The hotspot case can actually result in network being NULL which
ends up crashing when accessing "->secrets". In addition any
secrets on this network were never removed for hotspot networks
since everything happened in network_unset_hotspot.
testHotspot suffered from improper cleanup and if a single test failed
all subsequent tests would fail due to IWD still running since IWD()
was never cleaned up.
In addition the PSK agent and hwsim rules are now set onto the cls
object and removed in tearDownClass()
There are really no cases where a test wants to remove a single
rule. Most loop through and remove rules individually so this
is being added as a convenience.
Certain autotests coupled with slower test machines can result in lost
beacons and "Network not found" errors. In attempt to help with this
the test can just rescan (30 seconds max) until the network is found.
Remove EAP-SIM from the generic PEAP test case since skipping
(if ofono is not on system) would skip the entire test rather
than just the EAP-SIM portion.
This tests all EAP methods in their standard configuration. Any
corner cases requiring changes to main.conf or other hostapd
options are not included and will be left as stand alone tests.
This was done because nearly all EAP tests are identical except
the IWD provisioning file and hostapd EAP users fine. The IWD
provisioning file can be swapped out as needed for each individual
test without actually restarting IWD. And the EAP users file can
simply be written to include every possible EAP method that
is supported.
The -S/--sub-tests option allows the user to specify a test file
from inside an autotest. Inside this file there may also be many
test functions. This option is being extended to allow running
a single test function inside a test file. For example:
* Runs all test functions inside connection_test.py *
./test-runner -A some_test -S connection_test
* Runs only connection_test.py test_connect_success() *
./test-runner -A some_test -S connection_test.test_connect_success
The destructor was trying to do more than the scope of a destructor
by trying to handle this single case of hostapd being restarted.
Instead we can simply pass a keyword argument 'reinit' to the
constructor to tell it to reinitialize everything. And as for killing
hostapd this can be done in ungraceful_restart itself rather than
trying to handle it in the destructor.
There was a race condition here where the GLib timeout could have
fired but the test function returned successfully prior to the
end of the while loop. This would end up causing source_remove to
print a warning that the source did not exist.
Instead check if the timeout fired prior to removing it.
This (hopefully) will make this test pass better on slower machines.
In addition the mechanism of copying over separate main.conf files
was changed (rather than echo'ing the option into /tmp/main.conf)
This addresses the TODO where HostapdCLI was creating separate
objects each time HostapdCLI was called. This was worked around
by manually setting the important members but instead the class
can be re-worked to act as somewhat of a singleton, per-config
at least.
If there is no HostapdCLI instance for a given config one is
created and initialized. Subsequent HostapdCLI calls (for the
same config) will be returned the same object rather than a
new one.
Tests that called skipTest would result in an exception which would
hault execution as it was uncaught. In addition this wouldn't result
in an skipped test.
Now the actual test run is surrounded in a try/except block, skipped
exceptions are handled specifically, and a stack trace is printed if
some other exception occurs.
dmesg was being called at the very end of testing and dumped into
a log file. If many tests were run this could take quite a long
time and was timing out the default process wait. Instead --follow
can be used (basically like 'tail') which prints messages as they
come and avoids the time consuming full dump at the end.
The hotspot ANQP delay test was setting a global delay on all
packets which had some unintended consequences. At the time this
was the only way of simulating the test scenario but now hwsim
supports prefix matching so only the ANQP request/response will
be delayed.
This test was accessing the subprocess object and calling terminate
which ends up causing issues with test-runners own process cleanup.
Instead kill() should be used.
Hostapd sometimes has trouble with specifying additional BSSs in
a single config file, at least in the test-runner environment.
Since all the BSS's specified were identical instead the test was
reworked to only have a single BSS and each subtest can connect
in its own unique way.
This test took quite a while to execute (~2 minutes on my machine)
because there was simply no other way to test this scenario but
waiting. Now the no-roam-candidates condition can be waited for
rather than just sleeping for 20 seconds. Additionally the default
RoamRetryInterval was being used which is 60 seconds. Instead
main.conf can set this to 5 seconds and really cut down on the time
to wait.
Part of a comment was also removed due to being incorrect. Even
with neighbor reports IWD still must scan, its just that the
scan is more limited and, in theory, faster.
This is meant to be used as a generic notification to autotests. For
now 'no-roam-candidates' is the only event being sent. The idea
is to extend these events to signal conditions that are otherwise
undiscoverable in autotesting.
There was a common bit of code all over test-runner and utilities
which would wait for 'something' in a loop. At best these loops
would do the right thing and use the GLib.iteration call as to not
block the main loop, and at worst would not use it and just busy
wait.
Namespace.non_block_wait unifies all these into a single API to
a) do the wait correctly and b) prevent duplicate code.
Replace instances of the ap_del_station() +
ap_sta_free()/ap_remove_sta() with calls to ap_station_disconnect to
make sure we consistently remove the station from the ap->sta_states
queue before using ap_del_station(). ap_del_station() may generate an
event to the ap.h API user (e.g. P2P) and this may end up tearing down
the AP completely.
For that scenario we also don't want ap_sta_free() to access sta->ap so
we make sure ap_del_station() performs these cleanup steps so that
ap_sta_free() has nothing to do that accesses sta->ap.
client_frame is not valid for a beacon frame as beacons are not sent in
response to another frame. Move the access to client_frame->address_2
to the conditional blocks for Probe Response and Association Response
frames.
get_ordered_network() now scans automatically and has been updated
to use the StationDebug.Scan() API rather than doing a full
dbus scan (unless full_scan = True). The frequencies to be scanned
are picked automatically based on the current hostapd status
(hidden behind ctx.hostapd.get_frequency()).
This is to support the autotesting framework by allowing a smaller
scan subset. This will cut down on the amount of time spent scanning
via normal DBus scans (where the entire spectrum is scanned).
While losing the convenience of unittest this patch breaks out
each individual test function in order to run it manually and
get results. This vastly improves the user experience by seeing
which test file and function is being executed rather than simply
seeing "PASSED" for the entire test set.
In addition exceptions/failures are printed out as they happen
rather than at the end.
This changes all tests to use the default get_ordered_network behavior
rather than some custom or incorrect logic. Any use of
scan_if_needed=True has been removed since this is now the default.
Also any explicit scanning has been removed for tests which do not
require it (where the default behavior is good enough).
With the addition of connect_bssid/roam very few tests actually
require hwsim. Since hwsim can lead to problems with scan results
its best to have it off by default and have each test that needs
it explicitly turn it on.
Tests which previously turned it off have had that option removed.
Tests that do require hwsim still are vulnerable to scan result
problems, so for these tests beacon_int was added to the hostapd
config which seems to help with reliability somewhat.
There is a common block of code in nearly every test which is incorrect,
most likely a copy-paste from long ago. It goes something like:
wd.wait_for_object_condition(device, 'not obj.scanning')
device.scan()
wd.wait_for_object_condition(device, 'not obj.scanning')
network = device.get_ordered_network("ssid")
The problem here is that sometimes the scanning property does not get
updated fast enough before device.scan() returns, meaning get_ordered_network
comes up with nothing. Some tests pass scan_if_needed=True which 'fixes'
this but ends up re-scanning after the original scan finishes.
To put this to rest scan_if_needed is now defaulted to True, and no
explicit scan should be needed.
Most autotests do not want autoconnect behavior so it is being
turned off by default. There are a few tests where it is needed
and in these few cases the test can enable autoconnect through
the new station debug property.
This adds the property "AutoConnect" to the station debug interface
which can be read/written to disable or enable autoconnect globally.
As one would expect this property is only going to be used for testing
hence why it was put on the debug interface. Mosts tests disable
autoconnect (or they should) because it leads to unexpected connections.
In addition the send_bss_transition call was updated to only send a
single BSS. By sending two BSS's IWD is left to pick whichever one
it wants which makes the test behavior undefined.
This will use the Roam() developer method to force a roam to
a certain BSS. This is particularly useful for any test requiring
roams that are not testing IWD's BSS selection logic. Rather than
creating hwsim rules, setting low RSSI values, and waiting for the
roam logic/scan to happen Roam() can be used to force the roam
logic immediately.
Several tests tests for connectivity with the expectation that it
will fail. This ends up taking 30+ seconds because testutil retries
3 times, each with a 10 second timeout. By passing expect_fail=True
this lowers the timeout to zero, and skips any retries.
If a test has no hw.conf file test-runner was fully exiting and not
running any additional tests. This shouldn't happen in practice
since all upstreamed tests should run, but if any locally created
tests existed like this, it would cause the entire test run to exit
early.
Instead raise an exception which bails out of only that test, and
allows the rest to continue.
This method will initiate a connection to a specific BSS rather
than relying on a network based connection (which the user has
no control over which specific BSS is selected).
The only point of failure in netdev_connect_common was setting
up the handshake type. Moving this outside of netdev_connect_common
makes the code flow much better in netdev_{connect,reassociate} as
nothing needs to be reset upon failure.
Utilize 'storage_is_file' when readdir returns DT_UNKNOWN to ensure
features like autoconnect work on filesystems that don't return a d_type
(eg. XFS).
Utilize 'storage_is_file' when readdir returns DT_UNKNOWN to ensure
features like autoconnect work on filesystems that don't return a d_type
(eg. XFS).
Add a function 'storage_is_file' which will use stat to verify a
file's existence given a path relative to the storage directory.
Not all filesystems provide a file type via readdir's d_type.
XFS is a notable system with optional d_type support.
When d_type is not supported stat must be used as a fallback.
If a stat fallback is not provided iwd will fail to load state files.
The preparing_roam flag is expected to be set by a few roam
routines and normally this is done prior to the roam scan.
The Roam() developer option was not doing this and would
cause failed roams in some cases.
This adds support in netdev_reassociate for all the auth
protocols (SAE/FILS/OWE) by moving the bulk of netdev_connect
into netdev_connect_common. In addition PREV_BSSID is set
in the associate message if 'in_reassoc' is true.
Some connections, like Hotspot require additional IEs to be used during
the Association. These are now passed as 'extra_ies' when invoking
netdev_connect, however they are also needed during ReAssociation and FT
to such APs.
Additionally, it may be that Hotspot-enabled APs will start utilizing
FILS or SAE. In these cases the extra_ies need to be accounted for
somehow, either by making a copy in handshake_state, netdev, or the
auth_proto itself. Similarly, P2P which heavily uses vendor IEs can be
used over SAE in the future.
Since a copy of these IEs is needed, might as well store them in
handshake_state itself for easy book-keeping by network/station.
RM Enabled Capabilities and Extended Capabilities IEs were correctly
being sent when using CMD_CONNECT for initial connections and
re-associations. However, for SoftMac SAE, FT, FILS and OWE connections,
these additional IEs were not added properly during the Associate step.
If the driver supports RRM, then we might as well always send the RM
Enabled Capabilities IE (and use the USE_RRM flag). 802.11-2020
suggests that this IE can be sent whenever
dot11RadioMeasurementActivated is true, and this setting is independent
of whether the peer supports RRM. There's nothing to indicate that an
STA should not send these IEs if the AP is not RRM enabled.
While we correctly emit a NETDEV_EVENT_CHANNEL_SWITCHED event from
netdev for other modules to respond to, we fail to actually update the
frequency of the netdev object in question. Since the netdev frequency
is used elsewhere (e.g. to send action frames), it needs updating too.
Fixes: 5eb0b7ca8e04 ("netdev: add a channel switch event")
This variable ended up being used only on the fast-transition path. On
the re-associate path it was never used, but memcpy-ied nevertheless.
Since its only use is by auth_proto based protocols, move it to the
auth_proto object directly.
Due to how prepare_ft works (we need prev_bssid from the handshake, but
the handshake is reset), have netdev_ft_* methods take an 'orig_bss'
parameter, similar to netdev_reassociate.
IE elements in various management frames are ordered. This ordering is
outlined in 802.11, Section 9.3.3. The ordering is actually different
depending on the frame type. Instead of trying to implement the order
manually, add a utility function that will sort the IEs in the order
expected by the particular management frame type.
Since we already have IE ordering look up tables in the various
management frame type validation functions, move them to global level
and re-use these lookup tables for the sorting utility.
This refactors some code to eliminate getting the ERP entry twice
by simply returning it from network_has_erp_identity (now renamed
to network_get_erp_cache). In addition this code was moved into
station_build_handshake_rsn and properly cleaned up in case there
was an error or if a FILS AKM was not chosen.
The authorized macs pointer was being set to either the wsc_beacon
or wsc_probe_response structures, which were initialized out of
scope to where 'amacs' was being used. This resulted in an out of
scope read, caught by address sanitizers.
One of these message buffers was overflowing due to padding not
being taken into account (caught by sanitizers). Wrapped the length
of all message buffers with EAP_SIM_ROUND as to account for any
padding that attributes may add.
The Process class requires the ability to write out any processes
output to stdout, logging, or an explicit file, as well as store
it inside python for processing by test utilities. To accomplish
this each process was given a temporary file to write to, and that
file had an IO watch set on it. Any data that was written was then
read, and re-written out to where it needed to go. This ended up
being very buggy and quite complex due to needing to mess with
read/write pointers inside the file.
Popen already creates pipes to stdout if told, and they are accessable
via the p.stdout. Its then as simple as setting an IO watch on that
pipe and keeping the same code for reading out new data and writing
it to any files we want. This greatly reduces the complexity.
After some code changes the FT-FILS AKM was no longer selectable
inside network_can_connect_bss. This normally shouldn't matter
since station ends up selecting the AKM explicitly, including
passing the fils_hint, but since the autotests only included
FT-FILS AKMs this caused the transition to fail with no available
BSS's.
To fix this the standard 8021x AKM was added to the hostapd
configs. This allows these BSS's to be selected when attempting
to roam, but since FT-FILS is the only other AKM it will be used
for the actual transition.
testScan was creating 10 separate hidden networks which
sometimes bogged down hostapd to the point that it would
not start up in time before test-runner's timeouts fired.
This appeared to be due to hostapd needing to create 10
separate interfaces which would sometimes fail with -ENFILE.
The test itself only needed two separate networks, so instead
the additional 8 can be completely removed.
Occationally python will fatally terminate trying to load a test
using importlib with an out of memory exception. Increasing RAM
allows reliable exection of all tests.
When logging is enabled TLS debugging is turned on which creates
a PEM file during runtime. There is no way for IWD itself to clean
this up since its meant to be there for debugging.
The network_config was not being copied to network_info when
updated. This caused any new settings to be lost if the network
configuration file was updated during runtime.
The RoamThreshold5G was never honored because it was being
set prior to any connections. This caused the logic inside
netdev_cqm_rssi_update to always choose the 2GHz threshold
(RoamThreshold) due to netdev->frequency being zero at this time.
Instead call netdev_cqm_rssi_update in all connect/transition
calls after netdev->frequency is updated. This will allow both
the 2G and 5G thresholds to be used depending on what frequency
the new BSS is.
The call to netdev_cqm_rssi_update in netdev_setup_interface
was also removed since it serves no purpose, at least now
that there are two thresholds to consider.
Under certain conditions, access points with very low signal could be
detected. This signal is too low to estimate a data rate and causes
this L_WARN to fire. Fix this by returning a -ENETUNREACH error code in
case the signal is too low for any of the supported rates.
The scan ranking logic was previously changed to be based off a
theoretical calculated data rate rather than signal strength.
For HT/VHT networks there are many data points that can be used
for this calculation, but non HT/VHT networks are estimated based
on a simple table mapping signal strengths to data rates.
This table starts at a signal strength of -65 dBm and decreases from
there, meaning any signal strengths greater than -65 dBm will end up
getting the same ranking. This poses a problem for 3/4 blacklisting
tests as they set signal strengths ranging from -20 to -40 dBm.
IWD will then autoconnect to whatever network popped up first, which
may not be the expected network.
To fix this the signal strengths were changed to much lower values
which ensures IWD picks the expected network.
Newer QEMU version warn that msize is set too low and may result
in poor IO performance. The default is 8KiB which QEMU claims is
too low. Explicitly setting to 10KiB removes the warning:
qemu-system-x86_64: warning: 9p: degraded performance: a
reasonable high msize should be chosen on client/guest side
(chosen msize is <= 8192).
See https://wiki.qemu.org/Documentation/9psetup#msize for details.
Transition Disable indications and information stored in the network
profile needs to be enforced. Since Transition Disable information is
now stored inside the network object, add a new method
'network_can_connect_bss' that will take this information into account.
wiphy_can_connect method is thus deprecated and removed.
Transition Disable can also result in certain AKMs and pairwise ciphers
being disabled, so wiphy_select_akm method's signature is changed and
takes the (possibly overriden) ie_rsn_info as input.
This indication can come in via EAPoL message 3 or during
FILS Association. It carries information as to whether certain
transition mode options should be disabled. See WPA3 Specification,
version 3 for more details.
Some network settings keys are set / parsed in multiple files. Add a
utility to parse all common network configuration settings in one place.
Also add some defines to make sure settings are always saved in the
expected group/key.
This returns the length of the actual contents, making the code a bit
easier to read and avoid the need to mask the KDE value which isn't
self-explanatory.
The SAE unit test was written when group 19 was preferred by default for
all SAE connections. However, we have now started to prefer higher
security groups. Trick the test into using group 19 by wrapping
l_ecc_supported_ike_groups implementation to return just curve 19 as a
supported curve.
ERROR: AddressSanitizer: global-buffer-overflow on address 0x000000512c08 at pc 0x00000041848d bp 0x7ffcdde71870 sp 0x7ffcdde71860
READ of size 8 at 0x000000512c08 thread T0
#0 0x41848c in print_attributes monitor/nlmon.c:6268
#1 0x42ac53 in print_message monitor/nlmon.c:6544
#2 0x438968 in nlmon_message monitor/nlmon.c:6698
#3 0x43d5e4 in nlmon_receive monitor/nlmon.c:7658
#4 0x4b3cd0 in io_callback ell/io.c:120
#5 0x4b085a in l_main_iterate ell/main.c:478
#6 0x4b0ee3 in l_main_run ell/main.c:525
#7 0x4b0ee3 in l_main_run ell/main.c:507
#8 0x4b13ac in l_main_run_with_signal ell/main.c:647
#9 0x4072fe in main monitor/main.c:811
Break up the SAE tests into two parts: testSAE and testSAE-AntiClogging
testSAE is simplified to only use two radios and a single phy managed
by hostapd. hostapd configurations are changed via the new 'set_value'
method added to hostapd utils. This allows forcing hostapd to use a
particular sae group set, or force hostapd to use SAE H2E/Hunting and
Pecking Loop for key derivation. A separate test for IKE Group 20 is no
longer required and is folded into connection_test.py
testSAE-AntiClogging is added with an environment for 5 radios instead
of 7, again with hostapd running on a single phy. 'sae_pwe' is used to
force hostapd to use SAE H2E or Hunting and Pecking for key derivation.
Both Anti-Clogging protocol variants are thus tested.
main.conf is added to both directories to force scan randomization off.
This seems to be required for hostapd to work properly on hwsim.
Instead of requiring each auth_proto to perform validation of the frames
received via rx_authenticate & rx_associate, have netdev itself perform
the mpdu validation. This is unlikely to happen anyway since the kernel
performs its own frame validation. Print a warning in case the
validation fails.
There's no reason why a change in groups would result in the
anti-clogging token becoming invalid. This might result in us needing
an extra round-trip if the peer is using countermeasures and our
requested group was deemed unsuitable.
We may receive multiple anti-clogging request messages. We memdup the
token every time, without checking whether memory for one has already
been allocated. Free the old token prior to allocating a new one.
The group was not checked at all. The specification doesn't
mention doing so specifically, but we are only likely to receive an Anti
Clogging Token Request message once we have sent our initial Commit. So
the group should be something we could have sent or might potentially be
able to use.
In case an exceptional condition occurs, handle this more consistently
by returning the following errors:
-ENOMSG -- If a message results in the retransmission timer t0 being
restarted without actually sending anything.
-EBADMSG -- If a received message is to be silently discarded without
affecting the t0 timer.
-ETIMEDOUT -- If SYNC_MAX has been exceeded
-EPROTO -- If a fatal protocol error occurred
Now that sae_verify_* methods no longer allow dropped frames though,
there's no reason to keep these checks. sae_process_commit and
sae_process_confirm will now always receive messages in their respective
state.
sae_verify_* functions were correctly marking frames to be dropped, but
were returning 0, which caused the to-be-dropped frames to be further
processed inside sae_rx_authenticate. Fix that by returning a proper
error.
Make sure to return -EAGAIN whenever a received frame from the peer
results in a retransmission. This also prevents the frame from being
mistakenly processed further in sae_rx_authenticate.
Do not try to transition to a new state from sae_send_commit /
sae_send_confirm since these methods can be called due to
retransmissions or other unexpected messages. Instead, transition to
the new state explicitly from sae_process_commit / sae_process_confirm.
SAE protocol is meant to authenticate peers simultaneously. Hence it
includes a tie-breaker provision in case both peers enter into the
Committed state and the Commit messages arrive at the respective peers
near simultaneously.
However, in the case of STA or Infrastructure mode, only one peer (STA)
would normally enter the Committed state (via Init) and the tie-breaker
provision is not needed. If this condition is detected, abort the
connection.
Also remove the uneeded group change check in process_commit.
sae_compute_pwe doesn't really depend on the state of sae_sm. Only the
curve to be used for the PWE calculation is needed. Rework the function
signature to reflect that and remove unneeded member of struct sae_sm.
ie_tlv_builder_init takes a size_t as input, yet for some reason
ie_tlv_builder_finalize takes an unsigned int argument as output. Fix
the latter to use size_t as well.
During processing of Connect events by netdev, some of these elements
might be updated even when already set. Instead of issuing
l_free/l_memdup each time, check and see whether the elements are
bitwise identical first.
Returns a template RSNX element that can be further modified by callers
to set any additional capabilities if required. wiphy will fill in
those capabilities that are driver / firmware dependent.
Most parameters set into the handshake object are actually known by the
network object itself and not station. This includes address
randomization settings, EAPoL settings, passphrase/psk/8021x settings,
etc. Since the number of these settings will only keep growing, move
the handshake setup into network itself. This also helps keep network
internals better encapsulated.
Refactor network_sync_psk to not require setting attributes into
multiple settings objects. This is in fact unnecessary as the parsed
security parameters are used everywhere else instead. Also make sure to
wipe the [Security] group first, in case any settings were invalid
during loading or otherwise invalidated.
Credentials obtained can now be either in passphrase or PSK form. Prior
to commit 7a9891dbef5b, passphrase credentials were always converted to
PSK form by invoking crypto_psk_from_passphrase. This was changed in
order to support WPA3 networks. Unfortunately the provisioning logic
was never properly updated. Fix that, and also try to not overwrite any
existing settings in case WSC is providing credentials for networks that
are already known.
Fixes: 7a9891dbef5b ("wsc: store plain text passphrase if available")
There will be additional security-related settings that will be
introduced for settings files. In particular, Hash-to-Curve PT
elements, Transition Disable settings and potentially others in the
future. Since PSK is now not the only element that would require
update, rename this function to better reflect this.
PRF+ from RFC 5295 is the more generic function using which HKDF_Expand
is defined. Allow this function to take a vararg list of arguments to
be hashed (these are referred to as 'S' in the RFCs).
Implement hkdf_expand in terms of prf_plus and update all uses to the
new syntax.
This fixes an issue where the udp port was not being opened due to a
permission denied error. The result of this was the dhcp client would
fail to send the renewal request and so the dhcp lease would expire.
The addition of the CAP_NET_BIND_SERVICE capability allows the service
to open sockets in the restricted port range (<1024) which is required
for dhcp.
This is based on a previous patch by Roberto Santalla Fernández.
A new config is introduced into the network config file under IPv4
called SendHostname. If this is set to true then we add the hostname
into all DHCP requests. The default is false.
If the idea is that the interface should only be present when connected
then don't do this in the DISCONNECTING state as there are various
possible transitions from CONNECTED or ROAMING directly to DISCONNECTED.
The Changed() method did not actually return anything, and in fact the
no_reply flag for that message was set.
Similarly, the Release method does not expect a reply.
Don't require a gateway address from the settings file or from the DHCP
server when doing netconfig. Failing when the gateway address was
missing was breaking P2P but also small local networks.
Be paranoid and check that the prefix length in addresses from
used_addr4_list are not zero (they shouldn't be) and that address family
is AF_INET (it should be), mainly to quiet coverity warnings:
While there also fix one line's indentation.
At the end of ip_pool_select_addr4() we'd check if the selected address
is equal to the subnet address and increment it by 1 to produce a valid
host address for the AP. That check was always correct only with 24-bit
prefix, extend it to actually use the prefix-dependent mask instead of
0xff. Fixes a testAP failure triggered 50% of the times because the
netmask is 28 bit long there.
Don't signal the connected state until the client has obtained a DHCP
lease and we can set the ConnectedIP property. From now on that
property is always set when there's a connection.
p2p_parse_association_req() already extracts the P2P IE payload from the
IE sequence, there's no need to call ie_tlv_extract_p2p_payload before
it. Pass the IE sequence directly to p2p_parse_association_req().
Similarly to commit
27d302a0 ("band: Add a utility to estimate VHT rx data rate"), this
commit adds an RX data rate estimation utility for HT connections.
This function is meant to supercede a similar function in ie.c. The
current approach results in very optimistic data rate estimates since it
only takes into account the VHT/HT Capabilities IEs. It does not take
into account any local hardware limitations (such as no VHT/HT support),
limited RX MCS sets & number of spatial streams. It also does not take
into account that the AP might not be actually operating on higher
bandwidth channels.
This function is meant to address that by matching peer TX MCS sets with
the local hardware RX MCS set capability. It also takes into account
channel bandwidth capabilities of the local hardware, as well as whether
the AP is actually operating on a wider channel.
Move the band definition out of wiphy.c and into band.[ch]. This is
done to make certain utilities that depend on band information capable
of being tested from unit tests.
The band concept will most likely grow over time. For now, the only
user will be wiphy.c and unit tests, so the structures are kept public.
It is possible that the address set command succeeds just after a
netconfig object has been destroyed.
==6485== Invalid read of size 8
==6485== at 0x458A6D: netconfig_ipv4_routes_install (netconfig.c:629)
==6485== by 0x458D1C: netconfig_ipv4_ifaddr_add_cmd_cb (netconfig.c:689)
==6485== by 0x4A5E7B: process_message (netlink.c:181)
==6485== by 0x4A626A: can_read_data (netlink.c:289)
==6485== by 0x4A3E19: io_callback (io.c:120)
==6485== by 0x4A27B5: l_main_iterate (main.c:478)
==6485== by 0x4A28F6: l_main_run (main.c:525)
==6485== by 0x4A2C0E: l_main_run_with_signal (main.c:647)
==6485== by 0x404D27: main (main.c:542)
==6485== Address 0x4a47290 is 32 bytes inside a block of size 104 free'd
==6485== at 0x48399CB: free (vg_replace_malloc.c:538)
==6485== by 0x49998B: l_free (util.c:136)
==6485== by 0x457699: netconfig_free (netconfig.c:130)
==6485== by 0x45A038: netconfig_destroy (netconfig.c:1163)
==6485== by 0x41FD16: station_free (station.c:3613)
==6485== by 0x42020E: station_destroy_interface (station.c:3710)
==6485== by 0x4B990E: interface_instance_free (dbus-service.c:510)
==6485== by 0x4BC193: _dbus_object_tree_remove_interface (dbus-service.c:1694)
==6485== by 0x4BA22A: _dbus_object_tree_object_destroy (dbus-service.c:795)
==6485== by 0x4B078D: l_dbus_unregister_object (dbus.c:1537)
==6485== by 0x417ACB: device_netdev_notify (device.c:361)
==6485== by 0x4062B6: netdev_free (netdev.c:808)
==6485== Block was alloc'd at
==6485== at 0x483879F: malloc (vg_replace_malloc.c:307)
==6485== by 0x499857: l_malloc (util.c:62)
==6485== by 0x459DC0: netconfig_new (netconfig.c:1115)
==6485== by 0x41FC29: station_create (station.c:3592)
==6485== by 0x4207B3: station_netdev_watch (station.c:3864)
==6485== by 0x411A17: netdev_initial_up_cb (netdev.c:5588)
==6485== by 0x4A5E7B: process_message (netlink.c:181)
==6485== by 0x4A626A: can_read_data (netlink.c:289)
==6485== by 0x4A3E19: io_callback (io.c:120)
==6485== by 0x4A27B5: l_main_iterate (main.c:478)
==6485== by 0x4A28F6: l_main_run (main.c:525)
==6485== by 0x4A2C0E: l_main_run_with_signal (main.c:647)
==6485==
netdev_free relies on netdev->connected being set to detect whether a
connection is in progress. This variable is only set once the driver
has been connected however, so for situations where a CMD_CONNECT is
still 'in flight' or if the wiphy work is still pending, the ongoing
connection will not be canceled. Fix that by being more thorough when
trying to detect that a connection is in progress.
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
Terminate
src/netdev.c:netdev_free() Freeing netdev wlan0[9]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
Removing scan context for wdev c
src/scan.c:scan_context_free() sc: 0x4a44c80
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
==6356== Invalid write of size 4
==6356== at 0x40A253: netdev_cmd_connect_cb (netdev.c:2522)
==6356== by 0x4A8886: process_unicast (genl.c:986)
==6356== by 0x4A8C48: received_data (genl.c:1098)
==6356== by 0x4A3DFD: io_callback (io.c:120)
==6356== by 0x4A2799: l_main_iterate (main.c:478)
==6356== by 0x4A28DA: l_main_run (main.c:525)
==6356== by 0x4A2BF2: l_main_run_with_signal (main.c:647)
==6356== by 0x404D27: main (main.c:542)
==6356== Address 0x4a3e418 is 152 bytes inside a block of size 472 free'd
==6356== at 0x48399CB: free (vg_replace_malloc.c:538)
==6356== by 0x49996F: l_free (util.c:136)
==6356== by 0x406662: netdev_free (netdev.c:886)
==6356== by 0x4129C2: netdev_shutdown (netdev.c:5980)
==6356== by 0x403A14: iwd_shutdown (main.c:79)
==6356== by 0x403A7D: signal_handler (main.c:90)
==6356== by 0x4A2AFB: sigint_handler (main.c:612)
==6356== by 0x4A2F3B: handle_callback (signal.c:78)
==6356== by 0x4A3030: signalfd_read_cb (signal.c:104)
==6356== by 0x4A3DFD: io_callback (io.c:120)
==6356== by 0x4A2799: l_main_iterate (main.c:478)
==6356== by 0x4A28DA: l_main_run (main.c:525)
==6356== Block was alloc'd at
==6356== at 0x483879F: malloc (vg_replace_malloc.c:307)
==6356== by 0x49983B: l_malloc (util.c:62)
==6356== by 0x4121BD: netdev_create_from_genl (netdev.c:5776)
==6356== by 0x451F6F: manager_new_station_interface_cb (manager.c:173)
==6356== by 0x4A8886: process_unicast (genl.c:986)
==6356== by 0x4A8C48: received_data (genl.c:1098)
==6356== by 0x4A3DFD: io_callback (io.c:120)
==6356== by 0x4A2799: l_main_iterate (main.c:478)
==6356== by 0x4A28DA: l_main_run (main.c:525)
==6356== by 0x4A2BF2: l_main_run_with_signal (main.c:647)
==6356== by 0x404D27: main (main.c:542)
If the daemon is started and killed rapidly on startup, it is possible
for netdev_shutdown to be called prior to manager processing messages
that actually create the netdev itself. Since the netdev_list has
already been freed, the storage is lost. Fix that by destroying
netdev_list only when the module is unloaded.
If we're going down, make sure to notify any watches about EVENT_DEL
earlier. Not doing so might result in us not cleaning up requests that
might have been started as the result of this event.
station_free() is invoked when one of two possibilities happen:
- Device has been powered down, and EVENT_DOWN has been emitted
- Device has been removed, and EVENT_DEL has been emitted
In both cases there is not much point for netdev_disconnect to be
invoked as that tries to cleanly shut down an existing connection. The
only thing the ABORTED error accomplishes in this case is to send a
dbus_aborted_error for the pending_connect message, if it exists.
There's already code for doing this in station_free().
src/station.c:station_enter_state() Old State: autoconnect_quick, new state: connecting (auto)
src/scan.c:scan_cancel() Trying to cancel scan id 1 for wdev 7
src/wiphy.c:wiphy_radio_work_done() Work item 1 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
Terminate
src/netdev.c:netdev_free() Freeing netdev wlan0[9]
src/device.c:device_free()
src/station.c:station_free()
src/wiphy.c:wiphy_radio_work_done() Work item 2 done
src/station.c:station_connect_cb() 9, result: 5
src/netconfig.c:netconfig_destroy()
Removing scan context for wdev 7
src/scan.c:scan_context_free() sc: 0x4a39490
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is US
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_unicast_notify() Unicast notification 129
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
==20311== Invalid write of size 4
==20311== at 0x406E74: netdev_cmd_disconnect_cb (netdev.c:1130)
==20311== by 0x4A78A8: process_unicast (genl.c:986)
==20311== by 0x4A7C6A: received_data (genl.c:1098)
==20311== by 0x4A2E1F: io_callback (io.c:120)
==20311== by 0x4A17BB: l_main_iterate (main.c:478)
==20311== by 0x4A18FC: l_main_run (main.c:525)
==20311== by 0x4A1C14: l_main_run_with_signal (main.c:647)
==20311== by 0x404D27: main (main.c:542)
==20311== Address 0x4a37a0c is 156 bytes inside a block of size 472 free'd
==20311== at 0x48399CB: free (vg_replace_malloc.c:538)
==20311== by 0x498991: l_free (util.c:136)
==20311== by 0x406651: netdev_free (netdev.c:883)
==20311== by 0x412976: netdev_shutdown (netdev.c:5970)
==20311== by 0x403A14: iwd_shutdown (main.c:79)
==20311== by 0x403A7D: signal_handler (main.c:90)
==20311== by 0x4A1B1D: sigint_handler (main.c:612)
==20311== by 0x4A1F5D: handle_callback (signal.c:78)
==20311== by 0x4A2052: signalfd_read_cb (signal.c:104)
==20311== by 0x4A2E1F: io_callback (io.c:120)
==20311== by 0x4A17BB: l_main_iterate (main.c:478)
==20311== by 0x4A18FC: l_main_run (main.c:525)
The data rate estimation belongs in wiphy since it should take hardware
capabilities into account. Right now the data rate calculation simply
assumes the hardware is as capable as the AP. scan.c will be ported to
use this utility and the data rate estimation will be expanded to take
wiphy capabilities into account.
scan_parse_result used to parse the wdev and return this to the caller
where it was compared against the expected wdev. Simplify this by
extract the wdev first, and proceeding with the bss parsing afterwards.
Right now a very limited set of band parameters are parsed into wiphy.
This includes the supported rates and the supported frequencies.
However, there is much more information that is given for each band.
Introduce a new band object that will store this information and can be
extended for future use.
[General].APRange is now [IPv4].APAddressPool and the netmask is changed
from 23 to 27 bits to make the test correctly assert that only two
default-sized subnets are allowed by IWD simultaneously (default has
changed from 24 to 28 bits)
Change the char *addr_str and uint8_t prefix_len pair to an
l_rtnl_address object and use ell/rtnl.h utilities that use that
directly. Extend broadcast_from_ip to handle prefix_len.
We generate the DBus error reply type from the errno only when
ap_start() was failing synchronously, now also send the errno through
the callbacks so that we can also return a specific DBus reply when
failing asynchronously. Thea AP autotest relies on receiving the
AlreadyExists DBus error.
Deprecate the global [General].APRanges setting in favour of
[IPv4].APAddressPool with an extended (but backwards-compatible) syntax.
Drop the existing address pool creation code.
The new APAddressPool setting has the same syntax as the profile-local
[IPv4].Address setting and the subnet selection code will fall back
to the global setting if it's missing, this way we use common code to
handle both settings.
Extend the [IPv4].Address setting's syntax to allow a new format: a list
of <IP>/<prefix_len> -form strings that define the address space from
which a subnet is selected. Rewrite the DHCP settings loading with
other notable changes:
* validate some of the settings more thoroughly,
* name all netconfig-related ap_state members with the netconfig_
prefix,
* make sure we always call l_dhcp_server_set_netmask(),
* allow netmasks other than 24-bit and change the default to 28 bits,
* as requested avoid using the l_net_ ioctl-based functions although
l_dhcp still uses them internally,
* as requested avoid touching the ap_state members until the end of
some functions so that on error they're basically a no-op (for
readability).
Add the ip_pool_select_addr4 function to select a random subnet of requested
size from an address space defined by a string list (for use with the
AP profile [IPv4].Address and the global [IPv4].APAddressPool settings),
avoiding those subnets that conflict with subnets in use. We take care
to give a similar weight to all subnets contained in the specified
ranges regardless of how many ranges contain each, basically so that
overlapping ranges don't affect the probabilities (debatable.)
Add the ip-pool submodule that tracks IPv4 addresses in use on the
system for use when selecting the address for a new AP. l_rtnl_address
is used internally because if we're going to return l_rtnl_address
objects it would be misleading if we didn't fill in all of their
properties like flags etc.
If the connected BSS changes channel, netdev will emit an event with the
new channel's frequency. In response, have station change the frequency
of the connected scan_bss struct and inform network about the update.
If the connected BSS announces that it is switching operating channel,
the kernel may emit the NL80211_CMD_CH_SWTICH_NOTIFY event when the
switch is complete. Add a new netdev event NETDEV_EVENT_CHANNEL_SWITCHED
to signal to interested modules that the connected BSS has changed
channel. The event carries a pointer to the new channel's frequency.
NL80211_BSS_LAST_SEEN_BOOTTIME is expressed in nanoseconds, while BSS
timestamps are expressed in microseconds internally. Convert the
attribute to microseconds when using it to timestamp a BSS. This makes
iwd expire absent BSSes within 30 seconds as intended.
Fixes: 454cee12d473 ("scan: Use kernel-reported time-stamp if provided")
Right now, if a connection to a network selected by auto-connect fails,
the entire autoconnect process is restarted. This means that scans are
kicked off again, auto-connect list is rebuilt, etc. This was due to
auto-connect reusing the same failure path as connections triggered via
D-Bus.
The above behavior can lead to weird situations in certain corner cases.
For example, a highly preferred network configured with the wrong
password would result in auto-connect entering an infinite loop.
Fix this by making sure that all auto-connect entries are tried and
exhausted prior to re-scanning again.
The temporary ban list is cleared when a network is connected to
successfully, and also in network_connect_failed. Unfortunately,
network_connect_failed is not called in all paths (i.e. during
autoconnect) since it messes with the state of secrets and passphrases.
Clear the list in network_disconnected() instead, since it is guaranteed
to be called in every circumstance.
This will be effectively the same as the CONNECTING state, but can be
used to enable differing behavior, depending on whether connection was
triggered by autoconnect or via D-Bus.
Code that walked the VHT TX/RX MCS maps seemed to assume that bit_field
operated on bits that start at '1'. But this utility actually operates
on bits that start at '0'. I.e. the least significant bit is at
position 0.
While we're at it, rename the mcs variable into bitoffset to make it
clearer how the maps are being iterated over. Supported MCS is actually
the value found in the map.
We seem to be not specifying the msize for the root filesystem, which
results in this warning being printed:
emu-system-x86_64: warning: 9p: degraded performance: a reasonable high msize should be chosen on client/guest side (chosen msize is <= 8192). See https://wiki.qemu.org/Documentation/9psetup#msize for details.
There doesn't seem to be much performance difference in the end since
iwd does not process large files.
This option has not been used in a very long time, and is of limited
utility since the only thing D-Bus debugging does is hexdumps the
content of D-Bus messages to the terminal.
The current calculation was giving erroneous results when it came to VHT
MCS index 4 and VHT MCS index 8 & 9.
Switch to a precomputed look up table and add a multiplication factor
for short GI.
These test cases depend on setting up the existing hostapd instance to a
set of known addresses, which might be different from what test-runner
sets. During this time, any scans might result in the old and the new
addresses used by hostapd to be found in the scan results.
Fix that by using start_iwd=0 which tells test_runner that the test
wants to start iwd itself. This delays starting iwd until after the
setUpClass routine has been called and hostapd configured properly.
Also use more sensible rssi values for the 'non-preferred' bss.
Otherwise, ranking BSSes by throughput can confuse the test logic
since both BSSes are ranked the same and either can be picked by
autoconnect.
Right now the --valgrind option logs to a static file named
'valgrind.log'. This means that for any test that run multiple
instances of iwd, output is lost for all invocations except the last.
Fix that by using a per-process log file and making sure that all log
files are printed to stdout when the test ends.
This approach isn't perfect since it is possible for the pid to be
reused, but better than the current behavior.
ap_reset() seems to be called whenever the AP is stopped or removed due
to interface shutdown. For some reason ap_reset did not remove the DHCP
server object, resulting in leaks:
==211== at 0x483879F: malloc (vg_replace_malloc.c:307)
==211== by 0x46B5AD: l_malloc (util.c:62)
==211== by 0x49B0E2: l_dhcp_server_new (dhcp-server.c:715)
==211== by 0x433AA3: ap_setup_dhcp (ap.c:2615)
==211== by 0x433AA3: ap_load_dhcp (ap.c:2645)
==211== by 0x433AA3: ap_load_config (ap.c:2753)
==211== by 0x433AA3: ap_start (ap.c:2885)
==211== by 0x434A96: ap_dbus_start_profile (ap.c:3329)
==211== by 0x482DA9: _dbus_object_tree_dispatch (dbus-service.c:1815)
==211== by 0x47A4D9: message_read_handler (dbus.c:285)
==211== by 0x4720EB: io_callback (io.c:120)
==211== by 0x47130C: l_main_iterate (main.c:478)
==211== by 0x4713DB: l_main_run (main.c:525)
==211== by 0x4713DB: l_main_run (main.c:507)
==211== by 0x4715EB: l_main_run_with_signal (main.c:647)
==211== by 0x403EE1: main (main.c:550)
==209== by 0x43E48A: netconfig_ipv4_select_and_install (netconfig.c:887)
==209== by 0x43E48A: netconfig_configure (netconfig.c:1025)
==209== by 0x41743C: station_connect_cb (station.c:2556)
==209== by 0x408E0D: netdev_connect_ok (netdev.c:1311)
==209== by 0x47549E: process_unicast (genl.c:994)
==209== by 0x47549E: received_data (genl.c:1102)
==209== by 0x4720EB: io_callback (io.c:120)
==209== by 0x47130C: l_main_iterate (main.c:478)
==209== by 0x4713DB: l_main_run (main.c:525)
==209== by 0x4713DB: l_main_run (main.c:507)
==209== by 0x4715EB: l_main_run_with_signal (main.c:647)
==209== by 0x403EE1: main (main.c:550)
Prior to the BSS blacklist a BSS based autoconnect list made
the most sense, but now station actually retries all BSS's upon
failure. This means that for each BSS in the autoconnect list
every other BSS under that SSID will be attempted to connect to
if there is a failure. Essentially this is a network based
autoconnect list, just an indirect way of doing it.
Intead the autoconnect list can be purely network based, using
the network rank for sorting. This avoids the need for a special
autoconnect_entry struct as well as ensures the last connected
network is chosen first (simply based on existing network ranking
logic).
It was observed that IWD's ranking for BSS's did not always
end up with the fastest being chosen. This was due to IWD's
heavy weight on signal strength. This is a decent way of ranking
but even better is calculating a theoretical data rate which
was also done and factored in. The problem is the data rate
factor was always outdone by the signal strength.
Intead remove signal strength entirely as this is already taken
into account with the data rate calculation. This also removes
the check for rate IEs. If no IEs are found the parser will
base the data rate soley on RSSI.
There were a few other factors removed which will be added back
when ranking *networks* rather than BSS's. WPA version (or open)
was removed as well as the privacy capability. These values really
should not differ between BSS's in the same SSID and as such
should be used for network ranking instead.
Both ext/supported rates IEs are obtained from scan results. These
IEs are passed to ie_tlv_init/ie_tlv_next, as well as direct length
checks (for supported rates at least, extended supported rates can
be as long as a single byte integer can hold, 1 - 255) which verifies
that the length in the IE matches the overall IE length that is
stored in scan_bss. Because of this, ie_parse_supported_rates_from_data
was doing double duty re-initializing a TLV iterator.
Intead, since we know the IE length is within bounds, the length/data
can simply be directly accessed out of the buffer. This avoids the need
for a wrapper function entirely.
The length parameters were also removed, since this is now obtained
directly from the IE.
The FT-over-DS procedure now authenticates with multiple BSS's
upon connecting. This causes list_sta() to return our address for
any authenticated APs. It has now been changed to work with this
new behavior, as well as a check that the station fully connected
to the expected AP initially.
Since netdev maintains the list of FT over DS info structs there is not
any need for station to get callbacks when the initial action frame
is received, or not. This removes the need for the callback handler,
user data, and response timeout.
Roam times can be slightly improved by sending out the FT-over-DS
action frames to any BSS in the mobility domain immediately after
connecting. This preauthenticates IWD to each AP which means
Reassociation can happen right away when a roam is needed.
When a roam is needed station_transition_start will first try
FT-over-DS (if supported) via netdev_fast_transtion_over_ds. The
return is checked and if netdev has no cached entries FT-over-Air
will be used instead.
The beauty of FT-over-DS is that a station can send and receive
action frames to many APs to prepare for a future roam. Each
AP authenticates the station and when a roam happens the station
can immediately move to reassociation.
To handle this a queue of netdev_ft_over_ds_info structs is used
instead of a single entry. Using the new ft.c parser APIs these
info structs can be looked up when responses come in. For now
the timeouts/callbacks are kept but these will be removed as it
really does not matter if the AP sends a response (keeps station
happy until the next patch).
This is to prepare for multiple concurrent FT-over-DS action frames.
A list will be kept in netdev and for lookup reasons it needs to
parse the start of the frame to grab the aa/spa addresses. In this
call the IEs are also returned and passed to the new
ft_over_ds_parse_action_response.
For now the address checks have been moved into netdev, but this will
eventually turn into a queue lookup.
test-runner will print out if files were left behind after a
test which lets the developer know something was not cleaned
up. But in this case test-runner should also remove these files
so they are not left, and printed, for each subsequent test.
This value sets the roaming threshold on 5GHz networks. The
threshold has been separated from 2.4GHz because in many cases
5GHz can perform much better at low RSSI than 2.4GHz.
In addition the BSS ranking logic was re-worked and now 5GHz is
much more preferred, even at low RSSI. This means we need a
lower floor for RSSI before roaming, otherwise IWD would end
up roaming immediately after connecting due to low RSSI CQM
events.
This is being added as a developer method and should not be used
in production. For testing purposes though, it is quite useful as
it forces IWD to roam to a provided BSS and bypasses IWD's roaming
and ranking logic for choosing a roam candidate.
To use this a BSSID is provided as the only parameter. If this
BSS is not in IWD's current scan results -EINVAL will be returned.
If IWD knows about the BSS it will attempt to roam to it whether
that is via FT, FT-over-DS, or Reassociation. These details are
still sorted out in IWDs station_transition_start() logic.
This will enable developer features to be used. Currently the
only user of this will be StationDiagnostics.Roam() method which
should only be exposed in this mode.
Expose the state directory/storage directory path on D-Bus because it
can't be known to clients until IWD runs, and client might need to
occasionally fiddle with the network config files. While there also
expose the IWD version string, similar to how some other D-Bus services
do.
Certain tests like testAP spawn two IWD process in separate
namespaces. When --valrind is used this eats up quite a bit
of RAM and causes the VM to run out of memory and start
killing off processes.
Similar to 06aa84cca set the operstate when AdHoc is started and
stopped as it is no longer always set by netdev (only for station/p2p
interface types)
Previously resp was a simple array of bytes allocated on the stack.
This was changed to a dynamically allocated array, but the sizeof(resp)
argument to ap_build_beacon_pr_head() was never changed appropriately.
Fix this by introducing a new resp_len variable that holds the number of
bytes allocated for resp. Also, move the allocation after the basic
sanity checks have been performed to avoid allocating/freeing memory
unnecessarily.
Fixes: 18a63f91fd44 ("ap: Write extra frame IEs from the user")
Commit 1fe5070 added a workaround for drivers which may send the
connect event prior to the connect callback/ack. This caused IWD
to fail to start eapol if reassociation was used due to
netdev_reassociate never setting netdev->connected = false.
netdev_reassociate uses the same code path as normal connections,
but when the connect callback came in connected was already set
to true which then prevents eapol from being registered. Then,
once the connect event comes in, there is no frame watch for
eapol and IWD doesn't respond to any handshake frames.
WEP networks are not supported by iwd. However, the only indication is the
message "Operation not supported" while trying to connect. It is not clear
enough that this is due to intentional lack of support (as opposed to some
kind of misconfiguration). This patch explicitly lists WEP networks shown
with get-networks as unsupported. Hopefully this will make it clearer for
those of us not as familiar with iwd.
Prior to this, an error sending the FT Reassociation was treated
as fatal, which is correct for FT-over-Air but not for FT-over-DS.
If the actual l_genl_family_send call fails for FT-over-DS the
existing connection can be maintained and there is no need to
call netdev_connect_failed.
Adding a return to the tx_associate function works for both FT
types. In the FT-over-Air case this return will ultimately get
sent back up to auth_proto_rx_authenticate in which case will
call netdev_connect_failed. For FT-over-DS tx_associate is
actually called from the 'start' operation which can fail and
still maintain the existing connection.
FT-over-DS was refactored to separate the FT action frame and
reassociation. From stations standpoint IWD needs to call
netdev_fast_transition_over_ds_action prior to actually roaming.
For now these two stages are being combined and the action
roam happens immediately after the action response callback.
FT-over-DS followed the same pattern as FT-over-Air which worked,
but really limited how the protocol could be used. FT-over-DS is
unique in that we can authenticate to many APs by sending out
FT action frames and parsing the results. Once parsed IWD can
immediately Reassociate, or do so at a later time.
To take advantage of this IWD need to separate FT-over-DS into
two stages: action frame and reassociation.
The initial action frame stage is started by netdev. The target
BSS is sent an FT action frame and a new cache entry is created
in ft.c. Once the response is received the entry is updated
with all the needed data to Reassociate. To limit the record
keeping on netdev each FT-over-DS entry holds a userdata pointer
so netdev doesn't need to maintain its own list of data for
callbacks.
Once the action response is parsed netdev will call back signalling
the action frame sequence was completed (either successfully or not).
At this point the 'normal' FT procedure can start using the
FT-over-DS auth-proto.
FT-over-DS is being separated into two independent stages. The
first of which is the processing of the action frame response.
This new class will hold all the parsed information from the action
frame and allowing it to be retrieved at a later time when IWD
needs to roam.
Initial info class should be created when the action frame is
being sent out. Once a response is received it can be parsed
with ft_over_ds_parse_action_response. This verifies the frame
and updates the ft_ds_info class with the parsed data.
ft_over_ds_prepare_handshake is the final step prior to
Reassociation. This sets all the stored IEs, anonce, and KH IDs
into the handshake and derives the new PTK.
This adds the RSNE verification to ft_parse_ies which will
be common between over-Air and over-DS. The MDE check was
also factored out into its own minimal function as to
retain the spec comment but allow reuse elsewhere.
Since using --valgrind actually runs IWD using the valgrind
process the --verbose flag would only work if 'valgrind' was
also specified. This was taken into account with is_verbose
but the actual logic enabling stdout did not use that helper.
This was due, in part, to logging since is_verbose will always
return true if --log is used. To fix this a new flag was added
to is_verbose which omits the --log check to handle this
specific case.
Prior to this the diagnostic interface was taken down when station
transitioned to DISCONNECTED. This worked but once station is in
a DISCONNECTING state it then calls netdev_disconnect(). Trying to
get any diagnostic data during this time may not work as its
unknown what state exactly the kernel is in. To be safe take the
interface down when station is DISCONNECTING.
The building of the FT IEs for Action/Authenticate
frames will need to be shared between ft and netdev
once FT-over-DS is refactored.
The building was refactored to work off the callers
buffer rather than internal stack buffers. An argument
'new_snonce' was included as FT-over-DS will generate
a new snonce for the initial action frame, hence the
handshakes snonce cannot be used.
Break up the rather large code block which parses out IEs,
verifies, and sets into the handshake. FT-over-DS needs these
steps broken up in order to parse the action frame response
without modifying the handshake.
Under very rare circumstances the roaming scan triggered might not be
canceled properly. This is because we issue the roam scan recursively
from within a scan callback and re-use the id of the scan for the
subsequent request. The destroy callback is invoked right after the
callback and resets the id. This leads to the scan not being canceled
properly in roam_state_clear().
src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
src/station.c:station_roam_trigger_cb() 37
src/station.c:station_roam_scan() ifindex: 37
src/station.c:station_roam_trigger_cb() Using cached neighbor report for roam
...
src/scan.c:get_scan_done() get_scan_done
src/station.c:station_roam_failed() 37
src/station.c:station_roam_scan() ifindex: 37
src/scan.c:scan_request_triggered() Active scan triggered for wdev 22
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan0[37]
src/device.c:device_free()
src/station.c:station_free()
...
Removing scan context for wdev 22
src/scan.c:scan_context_free() sc: 0x4a362a0
src/wiphy.c:wiphy_radio_work_done() Work item 14 done
==19542== Invalid write of size 4
==19542== at 0x411500: station_roam_scan_destroy (station.c:2010)
==19542== by 0x420B5B: scan_request_free (scan.c:156)
==19542== by 0x410BAC: destroy_work (wiphy.c:294)
==19542== by 0x410BAC: wiphy_radio_work_done (wiphy.c:1613)
==19542== by 0x46C66E: l_queue_clear (queue.c:107)
==19542== by 0x46C6B8: l_queue_destroy (queue.c:82)
==19542== by 0x420BAE: scan_context_free (scan.c:205)
==19542== by 0x424135: scan_wdev_remove (scan.c:2272)
==19542== by 0x408754: netdev_free (netdev.c:847)
==19542== by 0x40E18C: netdev_shutdown (netdev.c:5773)
==19542== by 0x404756: iwd_shutdown (main.c:78)
==19542== by 0x404756: iwd_shutdown (main.c:65)
==19542== by 0x470E21: handle_callback (signal.c:78)
==19542== by 0x470E21: signalfd_read_cb (signal.c:104)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== Address 0x4d81f98 is 200 bytes inside a block of size 288 free'd
==19542== at 0x48399CB: free (vg_replace_malloc.c:538)
==19542== by 0x47F3E5: interface_instance_free (dbus-service.c:510)
==19542== by 0x481DEA: _dbus_object_tree_remove_interface (dbus-service.c:1694)
==19542== by 0x481F1C: _dbus_object_tree_object_destroy (dbus-service.c:795)
==19542== by 0x40894F: netdev_free (netdev.c:844)
==19542== by 0x40E18C: netdev_shutdown (netdev.c:5773)
==19542== by 0x404756: iwd_shutdown (main.c:78)
==19542== by 0x404756: iwd_shutdown (main.c:65)
==19542== by 0x470E21: handle_callback (signal.c:78)
==19542== by 0x470E21: signalfd_read_cb (signal.c:104)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== by 0x47088C: l_main_iterate (main.c:478)
==19542== by 0x47095B: l_main_run (main.c:525)
==19542== by 0x47095B: l_main_run (main.c:507)
==19542== by 0x470B6B: l_main_run_with_signal (main.c:647)
==19542== Block was alloc'd at
==19542== at 0x483879F: malloc (vg_replace_malloc.c:307)
==19542== by 0x46AB2D: l_malloc (util.c:62)
==19542== by 0x416599: station_create (station.c:3448)
==19542== by 0x406D55: netdev_newlink_notify (netdev.c:5324)
==19542== by 0x46D4BC: l_hashmap_foreach (hashmap.c:612)
==19542== by 0x472F46: process_broadcast (netlink.c:158)
==19542== by 0x472F46: can_read_data (netlink.c:279)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== by 0x47088C: l_main_iterate (main.c:478)
==19542== by 0x47095B: l_main_run (main.c:525)
==19542== by 0x47095B: l_main_run (main.c:507)
==19542== by 0x470B6B: l_main_run_with_signal (main.c:647)
==19542== by 0x403EDB: main (main.c:490)
==19542==
Prior to this netdev_connect_ok set setting this which really
only applies to station mode. In addition this happens for each
new station that connects to the AP. Instead set the operstate /
link mode when AP starts and stops.
Change ap_start to load all of the AP configuration from a struct
l_settings, moving the 6 or so parameters from struct ap_config members
to the l_settings groups and keys. This extends the ap profile concept
used for the DHCP settings. ap_start callers create the l_settings
object and fill the values in it or read the settings in from a file.
Since ap_setup_dhcp and ap_load_profile_and_dhcp no longer do the
settings file loading, they needed to be refactored and some issues were
fixed in their logic, e.g. l_dhcp_server_set_ip_address() was never
called when the "IP pool" was used. Also the IP pool was previously only
used if the ap->config->profile was NULL and this didn't match what the
docs said:
"If [IPv4].Address is not provided and no IP address is set on the
interface prior to calling StartProfile the IP pool will be used."
The info struct is on the stack which leads to the potential
for uninitialized data access. Zero out the info struct prior
to calling the get station callback:
==141137== Conditional jump or move depends on uninitialised value(s)
==141137== at 0x458A6F: diagnostic_info_to_dict (diagnostic.c:109)
==141137== by 0x41200B: station_get_diagnostic_cb (station.c:3620)
==141137== by 0x405BE1: netdev_get_station_cb (netdev.c:4783)
==141137== by 0x4722F9: process_unicast (genl.c:994)
==141137== by 0x4722F9: received_data (genl.c:1102)
==141137== by 0x46F28B: io_callback (io.c:120)
==141137== by 0x46E5AC: l_main_iterate (main.c:478)
==141137== by 0x46E65B: l_main_run (main.c:525)
==141137== by 0x46E65B: l_main_run (main.c:507)
==141137== by 0x46E86B: l_main_run_with_signal (main.c:647)
==141137== by 0x403EA8: main (main.c:490)
It isn't safe to return a NULL from diagnostic_akm_suite_to_security()
since the value is used directly. Also, if the AKM suite is 0, this
implies that the network is an Open network and not some unknown AKM.
==17982== Invalid read of size 1
==17982== at 0x483BC92: strlen (vg_replace_strmem.c:459)
==17982== by 0x47DE60: _dbus1_builder_append_basic (dbus-util.c:981)
==17982== by 0x41ACB2: dbus_append_dict_basic (dbus.c:197)
==17982== by 0x412050: station_get_diagnostic_cb (station.c:3614)
==17982== by 0x405B19: netdev_get_station_cb (netdev.c:4801)
==17982== by 0x47436E: process_unicast (genl.c:994)
==17982== by 0x47436E: received_data (genl.c:1102)
==17982== by 0x470FBB: io_callback (io.c:120)
==17982== by 0x4701DC: l_main_iterate (main.c:478)
==17982== by 0x4702AB: l_main_run (main.c:525)
==17982== by 0x4702AB: l_main_run (main.c:507)
==17982== by 0x4704BB: l_main_run_with_signal (main.c:647)
==17982== by 0x403EDB: main (main.c:490)
==17982== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17982==
Aborting (signal 11) [/home/denkenz/iwd/src/iwd]
++++++++ backtrace ++++++++
0 0x488a550 in /lib64/libc.so.6
1 0x483bc92 in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so
2 0x47de61 in _dbus1_builder_append_basic() at ell/dbus-util.c:983
3 0x41acb3 in dbus_append_dict_basic() at src/dbus.c:197
4 0x412051 in station_get_diagnostic_cb() at src/station.c:3618
5 0x405b1a in netdev_get_station_cb() at src/netdev.c:4801
It is possible for the RTNL command callback to come after
netconfig_reset or netconfig_destroy has been called. Make sure that
any outstanding commands that might access the netconfig object are
canceled.
src/netconfig.c:netconfig_ipv4_dhcp_event_handler() DHCPv4 event 0
src/netconfig.c:netconfig_ifaddr_added() wlan0: ifaddr 192.168.1.55/24 broadcast 192.168.1.255
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan0[15]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
src/netconfig.c:netconfig_reset()
src/netconfig.c:netconfig_reset_v4() 16
src/netconfig.c:netconfig_reset_v4() Stopping client
Removing scan context for wdev c
src/scan.c:scan_context_free() sc: 0x4a3cc10
==12792== Invalid read of size 8
==12792== at 0x43BF5A: netconfig_route_add_cmd_cb (netconfig.c:600)
==12792== by 0x4727FA: process_message (netlink.c:181)
==12792== by 0x4727FA: can_read_data (netlink.c:289)
==12792== by 0x470F4B: io_callback (io.c:120)
==12792== by 0x47016C: l_main_iterate (main.c:478)
==12792== by 0x47023B: l_main_run (main.c:525)
==12792== by 0x47023B: l_main_run (main.c:507)
==12792== by 0x47044B: l_main_run_with_signal (main.c:647)
==12792== by 0x403EDB: main (main.c:490)
In case the netdev is brought down while we're trying to connect, try to
detect this and fail early instead of trying to send additional
commands.
src/station.c:station_enter_state() Old State: disconnected, new state: connecting
src/station.c:station_netdev_event() Associating
src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
src/netdev.c:netdev_connect_event()
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=4
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/eapol.c:eapol_handle_ptk_3_of_4() ifindex=4
src/netdev.c:netdev_set_gtk() 4
src/station.c:station_handshake_event() Setting keys
src/netdev.c:netdev_set_tk() 4
src/netdev.c:netdev_set_rekey_offload() 4
New Key for Group Key failed for ifindex: 4:Network is down
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/station.c:station_free()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is DE
src/wiphy.c:wiphy_radio_work_done() Work item 14 done
src/station.c:station_connect_cb() 4, result: 4
Segmentation fault
A prior commit refactored the AKM selection in wiphy.c. This
ended up breaking FILS tests due to the hard coding of a
false fils_hint in wiphy_select_akm. Since our FILS tests
only advertise FILS AKMs wiphy_can_connect would return false
for these networks.
Similar to wiphy_select_akm, add a fils hint parameter to
wiphy_can_connect and pass that down directly to wiphy_select_akm.
If PreSharedKey is set, the current logic does not validate the
Passphrase beyond its existence. This can lead to strange situations
where an invalid WPA3-PSK passphrase might get used. This can of course
only happen if the user (as root) or NetworkManager-iwd-backend writes
such a file incorrectly.
Move the WSC Primary Device Type parsing from p2p.c and eap-wsc.c to a
common function in wscutil.c supporting both formats so that it can be
used in ap.c too.
Logically this frame watch belongs in station. It was kept in device.c
for the purported reason that the station object was removed with
ifdown/ifup changes and hence the frame watch might need to be removed
and re-added unnecessarily. Since the kernel does not actually allow to
unregister a frame watch (only when the netdev is removed or its iftype
changes), re-adding a frame watch might trigger a -EALREADY or similar
error.
Avoid this by registering the frame watch when a new netdev is detected
in STATION mode, or when the interface type changes to STATION.
If a netdev iftype is changed, all frame registrations are removed.
Make sure to re-register for the appropriate frame notifications in case
our iftype is switched back to 'station'. In any other iftype, no frame
watches are registered and rrm_state object is effectively dormant.
Right now, RRM is created when a new netdev is detected and its iftype
is of type station. That means that any devices that start their life
as any other iftype cannot be changed to a station and have RRM function
properly. Fix that by always creating the RRM state regardless of the
initial iftype.
In the case that a netdev is powered down, or an interface type change
occurs, the station object will be removed and any watches will be
freed.
Since rrm is created when the netdev is created and persists across
iftype and power up/down changes, it should provide a destroy callback
to station_add_state_watch so that it can be notified when the watch is
removed.
If the iftype changes, kernel silently wipes out any frame registrations
we may have registered. Right now, frame registrations are only done when
the interface is created. This can result in frame watches not being
added if the interface type is changed between station mode to ap mode
and then back to station mode, e.g.:
device wlan0 set-property Mode ap
device wlan0 set-property Mode station
Make sure to re-add frame registrations according to the mode if the
interface type is changed.
Since netdev now keeps track of iftype changes, let it call
frame_watch_wdev_remove on netdevs that it manages to clear frame
registrations that should be cleared due to an iftype change.
Note that P2P_DEVICE wdevs are not managed by any netdev object, but
since their iftype cannot be changed, they should not be affected
by this change.
And set the interface type based on the event rather than the command
callback. This allows us to track interface type changes even if they
come from outside iwd (which shouldn't happen.)
The prepare_ft patch was an intermediate to a full patch
set and was not fully tested stand alone. Its placement
actually broke FT due to handshake->aa getting overwritten
prior to netdev->prev_bssid being copied out. This caused
FT to fail with "transport endpoint not connected (-107)"
The AuthCenter was still not being fully cleaned up in these
tests. It was being stopped but there was still a reference being
held which prevented __del__ from being called.
There was a bug with process output where the last bit of data would
never make it into stdout or log files. This was due to the IO watch
being cleaned up when the process was killed and never allowing it
to finish writing any pending data.
Now the IO watch implementation has been moved out into its own
function (io_process) which is now used to write the final bits of
data out on process exit.
The processes in the list ultimately get removed for each
kill() call. This causes strange behavior since the list is
being iterated and each iteration is removing items. Instead
iterate over a new temporary list so the actual process list
can be cleaned up.
- Make sure to print the cookie information
- Don't print messages for frames we're not interested in. This is
particularly helpful when running auto-tests since frame acks from
hostapd pollute the iwd log.
This file was not included when testNetconfig was introduced
and is required. My system was working fine as it was in my
local tree but has been missing and not passing for others.
IWD_GENL_DEBUG is not generally useful anymore as it just prints a
hexdump of the raw data on the socket. The messages are quite verbose
and spam test-runner logs for little utility.
Fix a regression where connection to an open network results in an
NotSupported error being returned.
Fixes: d79e883e93df ("netdev: Introduce connection types")
This makes conversions simpler. Also fixes a bug where P2P devices were
printed with an incorrect Mode value since dbus_iftype_to_string was
assuming that an iftype as defined in nl80211.h was being passed in,
while netdev was returning an enum value defined in netdev.h.
It was seen that some full mac cards/drivers do not include any
rate information with the NEW_STATION event. This was causing
the NEW_STATION event to be ignored, preventing AP mode from
working on these cards.
Since the full mac path does not even require sta->rates the
parsing can be removed completely.
It was found that if the user cancels/disconnects the agent prior to
entering credentials, IWD would get stuck and could no longer accept
any connect calls with the error "Operation already in progress".
For example exiting iwctl in the Password prompt would cause this:
iwctl
$ station wlan0 connect myssid
$ Password: <Ctrl-C>
This was due to the agent never calling the network callback in the
case of an agent disconnect. Network would wait indefinitely for the
credentials, and disallow any future connect attempts.
To fix this agent_finalize_pending can be called in agent_disconnect
with a NULL reply which behaves the same as if there was an
internal timeout and ultimately allows network to fail the connection
The 8021x offloading procedure still does EAP in userspace which
negotiates the PMK. The kernel then expects to obtain this PMK
from userspace by calling SET_PMK. This then allows the firmware
to begin the 4-way handshake.
Using __eapol_install_set_pmk_func to install netdev_set_pmk,
netdev now gets called into once EAP finishes and can begin
the final userspace actions prior to the firmware starting
the 4-way handshake:
- SET_PMK using PMK negotiated with EAP
- Emit SETTING_KEYS event
- netdev_connect_ok
One thing to note is that the kernel provides no way of knowing if
the 4-way handshake completed. Assuming SET_PMK/SET_STATION come
back with no errors, IWD assumes the PMK was valid. If not, or
due to some other issue in the 4-way, the kernel will send a
disconnect.
This adds a new type for 8021x offload as well as support in
building CMD_CONNECT.
As described in the comment, 8021x offloading is not particularly
similar to PSK as far as the code flow in IWD is concerned. There
still needs to be an eapol_sm due to EAP being done in userspace.
This throws somewhat of a wrench into our 'is_offload' cases. And
as such this connection type is handled specially.
802.1x offloading needs a way to call SET_PMK after EAP finishes.
In the same manner as set_tk/gtk/igtk a new 'install_pmk' function
was added which eapol can call into after EAP completes.
The timeout functionality was removed from the core SAE
implementation as it causes issues with kernel behavior.
Because of this the timeout tests are no longer valid,
nor is a few asserts in the end-to-end test.
The chances were extremely low, but using l_idle_oneshot
could end up causing a invalid memory access if the netdev
went down while waiting for the disconnect idle callback.
Instead netdev can keep track of the idle with l_idle_create
and remove it if the netdev goes down prior to the idle callback.
This fixes an infinite loop issue when authenticate frames time
out. If the AP is not responding IWD ends up retrying indefinitely
due to how SAE was handling this timeout. Inside sae_auth_timeout
it was actually sending another authenticate frame to reject
the SAE handshake. This, again, resulted in a timeout which called
the SAE timeout handler and repeated indefinitely.
The kernel resend behavior was not taken into account when writing
the SAE timeout behavior and in practice there is actually no need
for SAE to do much of anything in response to a timeout. The
kernel automatically resends Authenticate frames 3 times which mirrors
IWDs SAE behavior anyways. Because of this the authenticate timeout
handler can be completely removed, which will cause the connection
to fail in the case of an autentication timeout.
This crash was caused from the disconnect_cb being called
immediately in cases where send_disconnect was false. The
previous patch actually addressed this separately as this
flag was being set improperly which will, indirectly, fix
one of the two code paths that could cause this crash.
Still, there is a situation where send_disconnect could
be false and in this case IWD would still crash. If IWD
is waiting to queue the connect item and netdev_disconnect
is called it would result in the callback being called
immediately. Instead we can add an l_idle as to allow the
callback to happen out of scope, which is what station
expects.
Prior to this patch, the crashing behavior can be tested using
the following script (or some variant of it, your system timing
may not be the same as mine).
iwctl station wlan0 disconnect
iwctl station wlan0 connect <network1> &
sleep 0.02
iwctl station wlan0 connect <network2>
++++++++ backtrace ++++++++
0 0x7f4e1504e530 in /lib64/libc.so.6
1 0x432b54 in network_get_security() at src/network.c:253
2 0x416e92 in station_handshake_setup() at src/station.c:937
3 0x41a505 in __station_connect_network() at src/station.c:2551
4 0x41a683 in station_disconnect_onconnect_cb() at src/station.c:2581
5 0x40b4ae in netdev_disconnect() at src/netdev.c:3142
6 0x41a719 in station_disconnect_onconnect() at src/station.c:2603
7 0x41a89d in station_connect_network() at src/station.c:2652
8 0x433f1d in network_connect_psk() at src/network.c:886
9 0x43483a in network_connect() at src/network.c:1183
10 0x4add11 in _dbus_object_tree_dispatch() at ell/dbus-service.c:1802
11 0x49ff54 in message_read_handler() at ell/dbus.c:285
12 0x496d2f in io_callback() at ell/io.c:120
13 0x495894 in l_main_iterate() at ell/main.c:478
14 0x49599b in l_main_run() at ell/main.c:521
15 0x495cb3 in l_main_run_with_signal() at ell/main.c:647
16 0x404add in main() at src/main.c:490
17 0x7f4e15038b25 in /lib64/libc.so.6
The send_disconnect flag was being improperly set based only
on connect_cmd_id being zero. This does not take into account
the case of CMD_CONNECT having finished but not EAPoL. In this
case we do need to send a disconnect.
This adds a new connection type, TYPE_PSK_OFFLOAD, which
allows the 4-way handshake to be offloaded by the firmware.
Offloading will be used if the driver advertises support.
The CMD_ROAM event path was also modified to take into account
handshake offloading. If the handshake is offloaded we still
must issue GET_SCAN, but not start eapol since the firmware
takes care of this.
Until now FT was only supported via Auth/Assoc commands which barred
any fullmac cards from using FT AKMs. With PSK offload support these
cards can do FT but only when offloading is used.
In the FW scan callback eapol was being stared unconditionally which
isn't correct as roaming on open networks is possible. Instead check
that a SM exists just like is done in netdev_connect_event.
This should have been updated along with the connect and roam
event separation. Since netdev_connect_event is not being
re-used for CMD_ROAM the comment did not make sense anymore.
Still, there needs to be a check to ensure we were not disconnected
while waiting for GET_SCAN to come back.
netdev_connect_event was being reused for parsing of CMD_ROAM
attributes which made some amount of sense since these events
are nearly identical, but due to the nature of firmware roaming
there really isn't much IWD needs to parse from CMD_ROAM. In
addition netdev_connect_event was getting rather complicated
since it had to handle both CMD_ROAM and CMD_CONNECT.
The only bits of information IWD needs to parse from CMD_ROAM
is the roamed BSSID, authenticator IEs, and supplicant IEs. Since
this is so limited it now makes little sense to reuse the entire
netdev_connect_event function, and intead only parse what is
needed for CMD_ROAM.
station should be isolated as much as possible from the details of the
driver type and how a particular AKM is handled under the hood. It will
be up to wiphy to pick the best AKM for a given bss. netdev in turn
will pick how to drive the particular AKM that was picked.
Currently netdev handles SoftMac and FullMac drivers mostly in the same
way, by building CMD_CONNECT nl80211 commands and letting the kernel
figure out the details. Exceptions to this are FILS/OWE/SAE AKMs which
are only supported on SoftMac drivers by using
CMD_AUTHENTICATE/CMD_ASSOCIATE.
Recently, basic support for SAE (WPA3-Personal) offload on FullMac cards
was introduced. When offloaded, the control flow is very different than
under typical conditions and required additional logic checks in several
places. The logic is now becoming quite complex.
Introduce a concept of a connection type in order to make it clearer
what driver and driver features are being used for this connection. In
the future, connection types can be expanded with 802.1X handshake
offload, PSK handshake offload and CMD_EXTERNAL_AUTH based SAE
connections.
Commit 6e8b76527 added a switch statement for AKM suites which
was not correct as this is a bitmask and may contain multiple
values. Intead we can rely on wiphy_select_akm which is a more
robust check anyways.
Fixes: 6e8b7652788a ("wiphy: add check for CMD_AUTH/CMD_ASSOC support")
If there is an associate timeout, retry a few times in case
it was just a fluke. At this point SAE is fully negotiated
so it makes sense to attempt to save the connection.
Any auth proto which did not implement the assoc_timeout handler
could end up getting 'stuck' forever if there was an associate
timeout. This is because in the event of an associate timeout IWD
only sets a few flags and relies on the connect event to actually
handle the failure. The problem is a connect event never comes
if the failure was a timeout.
To fix this we can explicitly fail the connection if the auth
proto has not implemented assoc_timeout or if it returns false.
In the same vein as requesting a neighbor report after
connecting for the first time, it should also be done
after a roam to obtain the latest neighbor information.
Converts ie_rsn_akm_suite values (and WPA1 hint) into a more
human readable security string such as:
WPA2-Personal, WPA3-Personal, WPA2-Personal + FT etc.
When we cancel a quick scan that has already been triggered, the
Scanning property is never reset to false. This doesn't fully reflect
the actual scanning state of the hardware since we don't (yet) abort
the scan, but at least corrects the public API behavior.
{Network} [/net/connman/iwd/0/7/73706733_psk] Connected = False
{Station} [/net/connman/iwd/0/7] Scanning = True
{Station} [/net/connman/iwd/0/7] State = connecting
{Station} [/net/connman/iwd/0/7] ConnectedNetwork =
/net/connman/iwd/0/7/73706733_psk
{Network} [/net/connman/iwd/0/7/73706733_psk] Connected = True
If IWD is connecting to a SAE/WPA3 BSS and Auth/Assoc commands
are not supported the only option is SAE offload. At this point
network_connect should have verified that the extended feature
for SAE offload exists so we can simply enable offload if these
commands are not supported.
SAE offload support requires some minor tweaks to CMD_CONNECT
as well as special checks once the connect event comes in. Since
at this point we are fully connected.
After adding network_bss_update, network now has a match_addr
queue function which can be used to replace an unneeded
l_queue_get_entries loop with l_queue_find.
This will swap out a scan_bss object with a duplicate that may
exist in a networks bss_list. The duplicate will be removed by
since the object is owned by station it is assumed that it will
be freed elsewhere.
If the hardware roams automatically we want to be sure to not
react to CQM events and attempt to roam/disconnect on our own.
Note: this is only important for very new kernels where CQM
events were recently added to brcmfmac.
Roaming on a full mac card is quite different than soft mac
and needs to be specially handled. The process starts with
the CMD_ROAM event, which tells us the driver is already
roamed and associated with a new AP. After this it expects
the 4-way handshake to be initiated. This in itself is quite
simple, the complexity comes with how this is piped into IWD.
After CMD_ROAM fires its assumed that a scan result is
available in the kernel, which is obtained using a newly
added scan API scan_get_firmware_scan. The only special
bit of this is that it does not 'schedule' a scan but simply
calls GET_SCAN. This is treated special and will not be
queued behind any other pending scan requests. This lets us
reuse some parsing code paths in scan and initialize a
scan_bss object which ultimately gets handed to station so
it can update connected_bss/bss_list.
For consistency station must also transition to a roaming state.
Since this roam is all handled by netdev two new events were
added, NETDEV_EVENT_ROAMING and NETDEV_EVENT_ROAMED. Both allow
station to transition between roaming/connected states, and ROAMED
provides station with the new scan_bss to replace connected_bss.
Adds support for getting firmware scan results from the kernel.
This is intended to be used after the firmware roamed automatically
and the scan result is require for handshake initialization.
The scan 'request' is competely separate from the normal scan
queue, though scan_results, scan_request, and the scan_context
are all used for consistency and code reuse.
Register P2P group's vendor IE writers using the new API to build and
attach the necessary P2P IE and WFD IEs to the (Re)Association Response,
Probe Response and Beacon frames sent by the GO.
Roughly validate the IEs and save some information for use in our own
IEs. p2p_extract_wfd_properties and p2p_device_validate_conn_wfd are
being moved unchanged to be usable in p2p_group_event without forward
declarations and to be next to p2p_build_wfd_ie.
Make the WSC IE processing and writing more self-contained (i.e. so that
it can be more easily moved to a separate file if desired) by using the
new ap_write_extra_ies() mechanism.
Pass the string IEs from the incoming STA association frames to
the user in the AP event data. I drop
ap_event_station_added_data.rsn_ie because that probably wasn't
going to ever be useful and the RSN IE is included in the .assoc_ies
array in any case.
Since GET_STATION (and in turn GetDiagnostics) gets the most
current station info this attribute serves as a better indication
of the current signal strength. In addition full mac cards don't
appear to always have the average attribute.
No instances of this macro now exist. If future instances crop up, the
better approach would be to use pragma directives to quiet such warnings
and allow static analysis to catch any issues.
Expanded packets with a 0 vendor id need to be treated just like
non-expanded ones. This led to very nasty looking if statements
throughout this function. Fix that by introducing a nested function
to take care of the response type normalization. This also allows us to
drop uninitialized_var usage.
Expanded Nak packet contains (possibly multiple) 8 byte chunks that
contain the type (1 byte, always '254') vendor-id (3 bytes) and
vendor-type (4) bytes.
Unfortunately the current logic was reading the vendor-id at the wrong
offset (0 instead of 1) and so the extracted vendor-type was incorrect.
Fixes: 17c569ba4cdd ("eap: Add authenticator method logic and API")
If we received a Nak or an Expanded Nak packet, the intent was to print
our own method type. Instead we tried to print the Nak type contents.
Fix that by always passing in our method info to eap_type_to_str.
Fixes: 17c569ba4cdd ("eap: Add authenticator method logic and API")
The '__' prefix is meant for private, semi-private,
inner implementation or otherwise special APIs that
are typically exposed in a header. In the case of watchlist, these
functions were static and do not fit the above description. Remove the
__ prefix accordingly.
Process output was being duplicated when -v was used. This was
due to both stderr and stdout being appended to the write_fd list
as well as stderr being set to stdout in the Popen call.
To fix this only stdout should be appended to the write_fd list,
but then there comes a problem with closing the streams. stdout
cannot be closed, so instead it is special cased. A new
verbose boolean was added to Process which, if True, will
cause any output to be written to stdout explicitly.
The Namespace class was never being removed when tests finished.
This is fixed by unreffing the hwsim internal _radio object which
both cleans up the radio and allows the Namespace to be removed.
This moves all the de-init code into kill(), which fixes a few
reference issues causing processes to hang around longer than
desired. If the process terminates on its own and/or the last
reference is lost __del__ will kill the process and clean up.
There were a few issues with the cleanup of Hostapd. First the
process was only being killed, which did not actually remove
the process from the list.
In addition, with EAP-SIM/AKA tests, hostapd created sim_db
unix socket files which it does not clean up.
This is somewhat of an open issue/TODO but for now this avoids
the exception caused by trying to remove a radio that has been
moved to a namespace. Once a radio is moved hwsim loses that phy
and can no longer interact with it. This causes the Destroy()
method call to fail.
A cleanup parameter was added to __init__ which can be used
by processes which create any additional files or require more
a custom cleanup routine. Some additional house keeping was
done to make Process cleanup more robust.
Though multi-test processes seemed like a good idea in terms of
efficiency, the additional code/special cases was not worth it
for the only two multi-test processes (dbus/haveged). Intead this
concept was removed completely and TestContext/Namespaces will
now start all processes for each individual test. This also is
fair to all tests as a previous failed test could end up bleeding
into future tests.
The cls object is part of the unittest framework and its lifespan
is out of test-runner's control. Setting objects into the cls
object sometimes keeps those objects around longer than desired.
Its best to unset anything set in cls when the test is tore down.
Printing out processes was done manually but instead we can
make Process printing extendable by adding its own __str__
method. This now will print if the process is a multi-test
process as well.
Output files in namespaces were not handled differently and would
end up overwriting/duplicating files from the root namespace. These
are now named /tmp/<process>-<namespace>-out.
The process class was quite hard to understand, and somewhat
fragile when multiple output options were needed like verbose
and logging, and in some cases even an additional output file.
To make things simpler we can have all processes output to a
temporary file (/tmp/<name>-out) and set a GLib IO watch on
that file. When the IO watch callback fires any additional
files (stdout, log files, output files) can be written to.
For wait=True processes we do not use an IO watch, but do
the same thing once the process exits, write to any additional
output files using the process output we already have.
The log dir was never being cleaned out prior to a new logging
test run. This could leave old stale files around. Note that this
will remove any past log files so if you need them, you want to
make a copy before running test-runner with --log again.
This test fails randomly, and it appears to be due to excessive
scanning. Historically most autotests start a dbus scan right
away. The problem is that most likely a periodic scan is already
ongoing, meaning the dbus scan gets queued. If a Connect() call
comes in (which it always does), the dbus scan gets delayed and will
trigger once connected, at a time the test is not expecting. This
can cause problems with any assumed timing as well as offchannel
frames.
This patch removes the explicit DBus scanning and instead uses
scan_if_needed with get_ordered_networks. The 'all_blacklisted_test'
was also modified to wait for scanning to complete after failing
to connect to all BSS's. This lets all the networks fully come
up (after being blocked by hwsim) and appear in scan results.
When using iwd.conf:[General].EnableNetworkConfiguration=true, it is not
possible to configure systemd.network:[Network].MulticastDNS= as
systemd-networkd considers the link to be unmanaged. This patch allows
iwd to configure that setting on systemd-resolved directly.
WSC EAP method always results in failure, even if successful. Failed
eapol_sm sessions are auto-cleaned, so there's no need to do this
explicitly. Also eapol_exit() will clean up any left-over sessions, so
drop this to make the code a bit simpler.
If the extended feature for CQM levels was not supported no CQM
registration would happen, not even for a single level. This
caused IWD to completely lose the ability to roam since it would
only get notified when the kernel was disconnecting, around -90
dBm, not giving IWD enough time to roam.
Instead if the extended feature is not supported we can still
register for the event, just without multiple signal levels.
This fixes up a previous commit which breaks iwctl. The
check was added to satisfy static analysis but it ended
up preventing iwctl from starting. In this case mkdir
can fail (e.g. if the directory already exists) and only
if it fails should the history be read. Otherwise a
successful mkdir return indicates the history folder is
new and there is no reason to try reading it.
There is no functional change here but checking the return
value makes static analysis much happier. Checking the
return and setting the default inside the if clause is also
consistent with how IWD does it many other places.
Dbus should be started as a multi-test process from the
TestContext, which leaves the dbus address file around for
the full test run. For Namespaces dbus-daemon should be
closed when the Namespace closes.
Handle situations where the BSS we're trying to connect to is no longer
in the kernel scan result cache. Normally, the kernel will re-scan the
target frequency if this happens on the CMD_CONNECT path, and retry the
connection.
Unfortunately, CMD_AUTHENTICATE path used for WPA3, OWE and FILS does
not have this scanning behavior. CMD_AUTHENTICATE simply fails with
a -ENOENT error. Work around this by trying a limited scan of the
target frequency and re-trying CMD_AUTHENTICATE once.
Every single roaming test had one of two problems with watching the
state change between roaming --> connected. Either the test used
wait_for_object_condition to wait for 'connected' which could allow
other states in between. Or it simply used an assert. The assert
wouldn't allow other state changes, but at the cost of potentially
failing due to IWD not having made it to the 'connected' state yet.
Now we have wait_for_object_change which takes two conditions:
initial (from_str) and expected (to_str). This API will not allow
any other conditions except these, and will wait for the expected
condition before continuing. This allows roaming test to reliably
wait for the roaming --> connected state change.
This is similar to wait_for_object_condition, but will not allow
any intermediate state changes between the initial and expected
conditions. This is useful for roaming tests when the expected
state change is 'connected' --> 'roaming' with no changes in
between.
This test occationally failed due to a badly timed DBus scan
triggering right when hwsim tried sending out the spoofed frame.
This caused mac80211_hwsim to reject CMD_FRAME when the timing
was just right.
Rather than always starting a DBus scan we can rely on periodic
scans and only DBus scan if there are no networks in IWD's list.
A scanning check was also added prior to sending out the frame
and if true we wait for not scanning. This is more paranoia than
anything.
Sometimes scan results can come in with a MAC address which
should be in the first index of addrs[] (42:xx:xx:xx:xx:xx).
This causes a failure to lookup the radio path.
There was also a failure path added if the radio cannot be
found rather than rely on DBus to fail with a None path.
The arguments to SendFrame were also changed to use the
ByteArray DBus type rather than python's internal bytearray.
This shouldn't have any effect, but its more consistent with
how DBus arguments should be used.
After recent changes fixing wait_for_object_condition it was accidentally
made to only work with classes, not other types of objects. Instead
create a minimal class to hold _wait_timed_out so it doesnt rely on
'obj' holding the boolean.
An earlier patch fixed a problem where a queued quick scan would
be triggered and fail once already connected, resulting in a state
transition from connected --> autoconnect_full. This fixed the
Connect() path but this could also happen via autoconnect. Starting
from a connected state, the sequence goes:
- DBus scan is triggered
- AP disconnects IWD
- State transition from disconnected --> autoconnect_quick
- Queue quick scan
- DBus scan results come in and used to autoconnect
- A connect work item is inserted ahead of all others, transition
from autoconnect_quick --> connecting.
- Connect completes, transition from connecting --> connected
- Quick scan can finally get triggered, which the kernel fails to
do since IWD is connected, transition from connected -->
autoconnect_full.
This can be fixed by checking for a pending quick scan in the
autoconnect path.
Commit eac2410c8314 ("station: Take scanned frequencies into account")
has made it unnecessary to explicitly invoke station_set_scan_results
with the expire to true in case a dbus scan finished prematurely or a
subset was not able to be started. Remove this no-longer needed logic.
Fixes: eac2410c8314 ("station: Take scanned frequencies into account")
The diagnostic interface will now only come up when station is
connected. This avoids the need for display station to return
a 'connected' out parameter. We can instead just see that
the diagnostic interface doesn't exist.
The diagnostic interface returns an error anyways if station is
not connected so it makes more sense to only bring the interface
up when its actually usable. This also removes the interface
when station disconnects, which was never done before (the
interface stayed up indefinitely due to a forgotten remove call).
When we're auto-connecting and have hidden networks configured, use
active scans regardless of whether we see any hidden BSSes in our
existing scan results.
This allows us to more effectively see/connect to hidden networks
when first powering up or after suspend.
Kernel might report hidden BSSes that are reported from beacon frames
separately than ones reported due to probe responses. This may confuse
the station network collation logic since the scan_bss generated by the
probe response might be removed erroneously when processing the scan_bss
that was generated due to a beacon.
Make sure that bss_match also takes the SSID into account and only
matches scan_bss structures that have the same BSSID and SSID contents.
Instead of manually managing whether to expire BSSes or not, use the
scanned frequency set instead. This makes the API slightly easier to
understand (dropping two boolean arguments in a row) and also a bit more
future-proof.
Commit d372d59bea3e checks whether a hidden network had a previous
connection attempt and re-tries. However, it inadvertently dropped
handling of a condition where a non-hidden network SSID is provided to
ConnectHiddenNetwork. Fix that.
Fixes: d372d59bea3e ("station: Allow ConnectHiddenNetwork to be retried")
The diagnostic interface serves no purpose until the AP has
been started. Any calls on it will return an error so instead
it makes more sense to bring it up when the AP is started, and
down when the AP is stopped.
This will show some basic AP information like Started and
network Name. Some cleanup was done to make the AP interface
and client table columns line up.
Its useful being able to refer to the network Name/SSID once
an AP is started. For example opening an iwctl session with an
already started AP provides no way of obtaining the SSID.
In some cases the AP can send a deauthenticate frame right after
accepting our authentication. In this case the kernel never properly
sends a CMD_CONNECT event with a failure, even though CMD_COONNECT was
used to initiate the connection. Try to work around that by detecting
that a Deauthenticate event arrives prior to any Associte or Connect
events and handle this case as a connect failure.
To help understand scanning results a bit better and cut down on scan
output add an option to not print the contents of the IEs. Only the
SSID IE will be printed.
Now that ConnectHiddenNetwork can be invoked while we're connected, set
the mac randomization hint parameter properly. The kernel will reject
requests if randomization is enabled while we're connected to a network.
If we forget a hidden network, then make sure to remove it from the
network list completely. Otherwise it would be possible to still
issue a Network.Connect to that particular object, but the fact that the
network is hidden would be lost.
StartProfile was added to the AP interface but the required
command was never added to iwctl. This command requires a
profile exists in <configuration dir>/ap/. The syntax is as
follows:
ap <wlanX> start-profile <profile_name>
==17639== 72 (16 direct, 56 indirect) bytes in 1 blocks are definitely
lost in loss record 3 of 3
==17639== at 0x4C2F0CF: malloc (vg_replace_malloc.c:299)
==17639== by 0x4670AD: l_malloc (util.c:61)
==17639== by 0x4215AA: scan_freq_set_new (scan.c:1906)
==17639== by 0x412A9C: parse_neighbor_report (station.c:1910)
==17639== by 0x407335: netdev_neighbor_report_frame_event
(netdev.c:3522)
==17639== by 0x44BBE6: frame_watch_unicast_notify (frame-xchg.c:233)
==17639== by 0x470C04: dispatch_unicast_watches (genl.c:961)
==17639== by 0x470C04: process_unicast (genl.c:980)
==17639== by 0x470C04: received_data (genl.c:1101)
==17639== by 0x46D9DB: io_callback (io.c:118)
==17639== by 0x46CC0C: l_main_iterate (main.c:477)
==17639== by 0x46CCDB: l_main_run (main.c:524)
==17639== by 0x46CF01: l_main_run_with_signal (main.c:656)
==17639== by 0x403EDE: main (main.c:490)
In the case that ConnectHiddenNetwork scans successfully, but fails for
some other reason, the network object is left in the scan results until
it expires. This will prevent subsequent attempts to use
ConnectHiddenNetwork with a .NotHidden error. Fix that by checking
whether a found network is hidden, and if so, allow the request to
proceed.
Rework the logic slightly so that this function returns an error message
on error and NULL on success, just like other D-Bus method
implementations. This also simplifies the code slightly.
We used to not allow to connect to a different network while already
connected. One had to disconnect first. This also applied to
ConnectHiddenNetwork calls.
This restriction can be dropped now. station will intelligently
disconnect from the current AP when a station_connect_network() is
issued.
If the disconnect fails and station_disconnect_onconnect_cb is called
with an error, we reply to the original message accordingly.
Unfortunately pending_connect is not unrefed or cleared in this case.
Fix that.
Fixes: d0ee923dda0b ("station: Disconnect, if needed, on a new connection attempt")
An invalid known_network.freq file containing several UUID
groups which have the same 'name' key results in memory leaks
in IWD. This is because the file is loaded and the group's
are iterated without detecting duplicates. This leads to the
same network_info's known_frequencies being set/overridden
multiple times.
To fix this we just check if the network_info already has a
UUID set. If so remove the stale entry.
There may be other old, invalid, or stale entries from previous
versions of IWD, or a user misconfiguring the file. These will
now also be removed during load.
netdev_shutdown calls queue_destroy on the netdev_list, which in turn
calls netdev_free. netdev_free invokes the watches to notify them about
the netdev being removed. Those clients, or anything downstream can
still invoke netdev_find. Unfortunately queue_destroy is not re-entrant
safe, so netdev_find might return stale data. Fix that by using
l_queue_peek_head / l_queue_pop_head instead.
src/station.c:station_enter_state() Old State: connecting, new state:
connected
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan1[6]
src/device.c:device_free()
Removing scan context for wdev 100000001
src/scan.c:scan_context_free() sc: 0x4ae9ca0
src/netdev.c:netdev_free() Freeing netdev wlan0[48]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
==103174== Invalid read of size 8
==103174== at 0x467AA9: l_queue_find (queue.c:346)
==103174== by 0x43ACFF: netconfig_reset (netconfig.c:1027)
==103174== by 0x43AFFC: netconfig_destroy (netconfig.c:1123)
==103174== by 0x414379: station_free (station.c:3369)
==103174== by 0x414379: station_destroy_interface (station.c:3466)
==103174== by 0x47C80C: interface_instance_free (dbus-service.c:510)
==103174== by 0x47C80C: _dbus_object_tree_remove_interface
(dbus-service.c:1694)
==103174== by 0x47C99C: _dbus_object_tree_object_destroy
(dbus-service.c:795)
==103174== by 0x409A87: netdev_free (netdev.c:770)
==103174== by 0x4677AE: l_queue_clear (queue.c:107)
==103174== by 0x4677F8: l_queue_destroy (queue.c:82)
==103174== by 0x40CDC1: netdev_shutdown (netdev.c:5089)
==103174== by 0x404736: iwd_shutdown (main.c:78)
==103174== by 0x404736: iwd_shutdown (main.c:65)
==103174== by 0x46BD61: handle_callback (signal.c:78)
==103174== by 0x46BD61: signalfd_read_cb (signal.c:104)
In the case of module_init failing due to a module that comes after
netdev, the netdev module doesn't clean up netdev_list properly.
==6254== 24 bytes in 1 blocks are still reachable in loss record 1 of 1
==6254== at 0x483777F: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6254== by 0x4675ED: l_malloc (util.c:61)
==6254== by 0x46909D: l_queue_new (queue.c:63)
==6254== by 0x406AE4: netdev_init (netdev.c:5038)
==6254== by 0x44A7B3: iwd_modules_init (module.c:152)
==6254== by 0x404713: nl80211_appeared (main.c:171)
==6254== by 0x4713DE: process_unicast (genl.c:993)
==6254== by 0x4713DE: received_data (genl.c:1101)
==6254== by 0x46E00B: io_callback (io.c:118)
==6254== by 0x46D20C: l_main_iterate (main.c:477)
==6254== by 0x46D2DB: l_main_run (main.c:524)
==6254== by 0x46D2DB: l_main_run (main.c:506)
==6254== by 0x46D502: l_main_run_with_signal (main.c:656)
==6254== by 0x403EDB: main (main.c:490)
Rather than the previous hack which disabled group traffic it
was found that the GTK RSC could be manually set to zero which
allows group traffic. This appears to fix AP mode on brcmfmac
along with the previous fixes. This is not documented in
nl80211, but appears to work with this driver.
This is how a fullmac card tells userspace that a station has
left. This fixes the issue where the same client cannot re-connect
to the same AP multiple times. ap_new_station was renamed to
ap_handle_new_station for consistency.
Some fullmac cards were found to be buggy with getting the GTK
where it returns a BIP key for the GTK index, even after creating
a GTK with NEW_KEY explicitly. In an effort to get these cards
semi-working we can treat this just as a warning and continue with
the handshake without a GTK set which disables group traffic. A
warning is printed in this case so the user is not completely in
the dark.
Fix an issue with the recent changes to signal monitoring from commit
f456501b ("station: retry roaming unless notified of a high RSSI"):
1. driver sends NL80211_CQM_RSSI_THRESHOLD_EVENT_LOW
2. netdev->cur_rssi_low changes from FALSE to TRUE
3. netdev sends NETDEV_EVENT_RSSI_THRESHOLD_LOW to station
4. on roam reassociation, cur_rssi_low is reset to FALSE
5. station still assumes RSSI is low, periodically roams
until netdev sends NETDEV_EVENT_RSSI_THRESHOLD_HIGH
6. driver sends NL80211_CQM_RSSI_THRESHOLD_EVENT_HIGH
7. netdev->cur_rssi_low doesn't change (still FALSE)
8. netdev never sends NETDEV_EVENT_RSSI_THRESHOLD_HIGH
9. station remains stuck in an infinite roaming loop
The commit in question introduced the logic in (5). Previously the
assumption in station was - like in netdev - that if the signal was
still low, the driver would send a duplicate LOW event after
reassociation. This change makes netdev follow the same new logic as
station, i.e. assume the same signal state (LOW/HIGH) until told
otherwise by the driver.
The testAPRoam autotest was silently failing on my machine until I
realized that my distribution hostapd (Arch Linux) is not built with
CONFIG_WNM_AP=y. Indeed, it is also disabled by default in upstream
hostapd. This resulted in the send_bss_transition() function of
hostapd.py silently failing. With this change, throw an exception in
case the BSS_TM_REQ command does not succeed to hopefully save others
the time of debugging this problem.
Since fullmac cards handle auth/assoc in firmware IWD must
react differently while in AP mode just as it does in station.
For fullmac cards a NEW_STATION event is emitted post association
and from here the 4-way handshake can begin. In this NEW_STATION
handler a new sta_state is created and the needed members are
set in order to inject us back into the normal code execution
for softmac post association (i.e. creating group keys and
starting the 4-way handshake). From here everything works the
same as softmac.
After the test-runner re-write many tests were left with
stale options that are no longer used at all. These were
periodically getting removed as changes were made to
individual tests, but its apparent now that a tree wide
removal was needed.
The kvmguest shorthand was removed after the release of Linux 5.10. It
was just shorthand for kvm_guest.config anyway, so update the
test-runner documentation accordingly.
At some point the non-interactive client tests began failing.
This was due to a bug in station where it would transition from
'connected' to 'autoconnect' due to a failed scan request. This
happened because a quick scan got scheduled during an ongoing
scan, then a Connect() gets issued. The work queue treats the
Connect as a priority so it delays the quick scan until after the
connection succeeds. This results in a failed quick scan which
IWD does not expect to happen when in a 'connected' state. This
failed scan actually triggers a state transition which then
gets IWD into a strange state where its connected from the
kernel point of view but does not think it is:
src/station.c:station_connect_cb() 13, result: 0
src/station.c:station_enter_state() Old State: connecting, new state: connected
src/wiphy.c:wiphy_radio_work_done() Work item 6 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5
src/station.c:station_quick_scan_triggered() Quick scan trigger failed: -95
src/station.c:station_enter_state() Old State: connected, new state: autoconnect_full
To fix this IWD should simply cancel any pending quick scans
if/when a Connect() call comes in.
There were some major problems related to logging and process
output. Tests which required output from start_process would
break if used with '--log/--verbose'. This is because we relied
on 'communicate' to retrieve the process output, but Popen does
not store process output when stdout/stderr are anything other
than PIPE.
Intead, in the case of logging or outfiles, we can simply read
from the file we just wrote to.
For an explicit --verbose application we must handle things
slightly different. A keyword argument was added to Process,
'need_out' which will ensure the process output is kept
regardless of --log or --verbose.
Now a user should be able to use --log/--verbose without any
tests failing.
The verbose arguments come in from the QEMU command line as a
single string. This should have been split into an array immediately
but was not. This led to issues like hostapd debug being enabled
when "-v hostapd_cli" was passed in.
Since the list of files copied to /tmp was part of the return value from
pre_test(), if an exception occurred inside pre_test(), "copied" would
be undefined and the post_test(ctx, copied) call in the finally clause
cause another exception:
raceback (most recent call last):
File "/home/balrog/repos/iwd/tools/test-runner", line 1508, in <module>
run_tests()
File "/home/balrog/repos/iwd/tools/test-runner", line 1242, in run_tests
run_auto_tests(config.ctx, args)
File "/home/balrog/repos/iwd/tools/test-runner", line 1166, in run_auto_tests
post_test(ctx, copied)
UnboundLocalError: local variable 'copied' referenced before assignment
(apart from not being able to clean up the files). Pass "copied" as a
paremeter to pre_test instead.
Switch EAP-TLS-ClientCert and EAP-TLS-ClientKey to use
l_cert_load_container_file for file loading so that the file format is
autodetected. Add new setting EAP-TLS-ClientKeyBundle for loading both
the client certificate and private key from one file.
As requested move the client certificate and private key loading from
eap-tls-common.c to eap-tls.c. No man page change needed because those
two settings weren't documented in it in the first place.
After the re-write this was broken and not noticed until
recently. The issue appeared to be that the GLib timeout
callback retained no context of local variables. Previously
_wait_timed_out was set as a class variable, but this was
removed so multiple IWD instances could work. Without
_wait_timed_out being a class variable the GLib timeout
setting it had no effect on the wait loop.
To fix this we can set _wait_timed_out on the object being
passed in. This is preserved in the GLib timeout callback
and setting it gets honored in the wait loop.
This command uses GetDiagnostics to show a list of connected
clients and some information about them. The information
contained for each connected station nearly maps 1:1 with the
station diagnostics information shown in "station <wlan> show"
apart from "ConnectedBss" which is now "Address".
For now this module serves as a helper for printing diagnostic
dictionary values. The new API (diagnostic_display) takes a
Dbus iterator which has been entered into a dictionary and
prints out each key and value. A mapping struct was defined
which maps keys to types and units. For simple cases the mapping
will consist of a dbus type character and a units string,
e.g. dBm, Kbit/s etc. For more complex printing which requires
processing the value the 'units' void* cant be set to a
function which can be custom written to handle the value.
This adds a new AccessPointDiagnostic interface. This interface
provides similar low level functionality as StationDiagnostic, but
for when IWD is in AP mode. This uses netdev_get_all_stations
which will dump all stations, parse, and return each station in
an individual callback. Once the dump is complete the destroy is
called and all data is packaged as an array of dictionaries.
AP mode will use the same structure for its diagnostic interface
and mostly the same dictionary keys. Apart from ConnectedBss and
Address being different, the remainder are the same so the
diagnostic_station_info to DBus dictionary conversion has been made
common so both station and AP can use it to build its diagnostic
dictionaries.
With AP now getting its own diagnostic interface it made sense
to move the netdev_station_info struct definition into its own
header which eventually can be accompanied by utilities in
diagnostic.c. These utilities can then be shared with AP and
station as needed.
systemd specifies a special passive target unit 'network-pre.target'
which may be pulled in by services that want to run before any network
interface is brought up or configured. Correspondingly, network
management services such as iwd and ead should specify
After=network-pre.target to ensure a proper ordering with respect to
this special target. For more information on network-pre.target, see
systemd.special(7).
Two examples to explain the rationale of this change:
1. On one of our embedded systems running iwd, a oneshot service is
run on startup to configure - among other things - the MAC address of
the wireless network interface based on some data in an EEPROM.
Following the systemd documentation, the oneshot service specifies:
Before=network-pre.target
Wants=network-pre.target
... to ensure that it is run before any network management software
starts. In practice, before this change, iwd was starting up and
connecting to an AP before the service had finished. iwd would then
get kicked off by the AP when the MAC address got changed. By
specifying After=network-pre.target, systemd will take care to avoid
this situation.
2. An administrator may wish to use network-pre.target to ensure
firewall rules are applied before any network management software is
started. This use-case is described in the systemd documentation[1].
Since iwd can be used for IP configuration, it should also respect
the After=network-pre.target convention.
Note that network-pre.target is a passive unit that is only pulled in if
another unit specifies e.g. Wants=network-pre.target. If no such unit
exists, this change will have no effect on the order in which systemd
starts iwd or ead.
[1] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
Following a successful roaming sequence, schedule another attempt unless
the driver has sent a high RSSI notification. This makes the behaviour
analogous to a failed roaming attempt where we remained connected to the
same BSS.
This makes iwd compatible with wireless drivers which do not necessarily
send out a duplicate low RSSI notification upon reassociation. Without
this change, iwd risks getting indefinitely stuck to a BSS with low
signal strength, even though a better BSS might later become available.
In the case of a high RSSI notification, the minimum roam time will also
be reset to zero. This preserves the original behaviour in the case
where a high RSSI notification is processed after station_roamed().
Doing so also gives a chance for faster roaming action in the following
example scenario:
1. RSSI LOW
2. schedule roam in 5 seconds
(5 seconds pass)
3. try roaming
4. roaming fails, same BSS
5. schedule roam in 60 seconds
(20 seconds pass)
6. RSSI HIGH
7. cancel scheduled roam
(20 seconds pass)
8. RSSI LOW
9. schedule roam in 5 seconds or 20 seconds?
By resetting the minimum roam time, we can avoid waiting 20 seconds when
the station may have moved considerably. And since the high/low RSSI
notifications are configured with a hysteresis, we should still be
protected against too frequent spurious roaming attempts.
This takes a Dbus iterator which has been entered into a
dictionary and prints out each key and value. It requires
a mapping which maps keys to types and units. For simple
cases the mapping will consist of a dbus type character
and a units string, e.g. dBm, Kbit/s etc. For more complex
printing which requires processing the value the 'units'
void* cant be set to a function which can be custom written
to handle the value.
This is a nl80211 dump version of netdev_get_station aimed at
AP mode. This will dump all stations, parse into
netdev_station_info structs, and call the callback for each
individual station found. Once the dump is completed the destroy
callback is called.
Since commit 836beb1276d1bc77889462ae514f0c5b708a38d7 removed beacon
loss handling, the roam_no_orig_ap variable has no use and is always set
to false. This commit removes it.
The information requested with GetDiagnostics will now appear in
the "station <iface> show" command. If IWD is not connected, or
there is no diagnostic interface (older IWD version) 'show' will
behave as it always has, only showing scanning/connected.
Some elements, though unlikely, are not required to be included
with the GET_STATION call that GetDiagnostics relies on. mac80211
based drivers include most of these, but other drivers may not.
To be on the safe side all properties except ConnectedBss are now
optional and may not be included.
This adds a generalized API for GET_STATION. This API handles
calling and parsing the results into a new structure,
netdev_station_info. This results structure will hold any
data needed by consumers of netdev_get_station. A helper API
(netdev_get_current_station) was added as a convenience which
automatically passes handshake->aa as the MAC.
For now only the RSSI is parsed as this is already being
done for RSSI polling/events. Looking further more info will
be added such as rx/tx rates and estimated throughput.
Arrays of dictionaries are quite common, and for basic
types this API makes things much more convenient by
putting all the enter/append/leave calls in one place.
Retrieve the dependencies of readline through pkg-config (and fallback
to -lreadline) to avoid the following build failure:
/nvme/rc-buildroot-test/scripts/instance-0/output-1/host/opt/ext-toolchain/bin/../lib/gcc/x86_64-buildroot-linux-uclibc/8.3.0/../../../../x86_64-buildroot-linux-uclibc/bin/ld: /nvme/rc-buildroot-test/scripts/instance-0/output-1/host/bin/../x86_64-buildroot-linux-uclibc/sysroot/usr/lib/libreadline.a(display.o): in function `cr':
display.c:(.text+0x1ab): undefined reference to `tputs'
Fixes:
- http://autobuild.buildroot.org/results/8fb1341f2f5094c346456b43b4fc04996c2e1485
Add a parameter to station_set_scan_results to allow skipping the
removal of old BSSes. In the DBus-triggered scan only expire BSSes
after having gone through the full supported frequency set.
It should be safe to pass partial scan results to
station_set_scan_results() when not expiring BSSes so using this new
parameter I guess we could also call it for roam scan results.
A scan normally takes about 2 seconds on my dual-band wifi adapter when
connected. The drivers will normally probe on each supported channel in
some unspecified order and will have new partial results after each step
but the kernel sends NL80211_CMD_NEW_SCAN_RESULTS only when the full
scan request finishes, and for segmented scans we will wait for all
segments to finish before calling back from scan_active() or
scan_passive().
To improve user experience define our own channel order favouring the
2.4 channels 1, 6 and 11 and probe those as an individual scan request
so we can update most our DBus org.connman.iwd.Network objects more
quickly, before continuing with 5GHz band channels, updating DBus
objects again and finally the other 2.4GHz band channels.
The overall DBus-triggered scan on my wifi adapter takes about the same
time but my measurements were not very strict, and were not very
consistent with and without this change. With the change most Network
objects are updated after about 200ms though, meaning that I get most
of the network updates in the nm-applet UI 200ms from opening the
network list. The 5GHz band channels take another 1 to 1.5s to scan and
remaining 2.4GHz band channels another ~300ms.
Hopefully this is similar when using other drivers although I can easily
imagine a driver that parallelizes 2.4GHz and 5GHz channel probing using
two radios, or uses 2, 4 or another number of dual-band radios to probe
2, 4, ... channels simultanously. We'd then lose some of the
performance benefit. The faster scan results may be worth the longer
overall scan time anyway.
I'm also assuming that the wiphy's supported frequency list is exactly
what was scanned when we passed no frequency list to
NL80211_CMD_TRIGGER_SCAN and we won't get errors for passing some
frequency that shouldn't have been scanned.
Use the hwsim DBus API rather than command line. This both is
faster and more dynamic than doing so with the command line.
This also avoids tracking the radio ID since we can just hang
on to the radio Dbus object directly.
The Create() API was limited to only taking a Name and boolean
(for p2p enabling). The actual hwsim nl80211 API can take more
attributes than this (which are actually utilized when creating
from the command line). To get the DBus API up to the same
functionality the two arguments in Create were replaced with
a single dictionary. This allows for extending later if more
arguments are needed.
In the NEW_RADIO callback hwsim was assuming that DBus had no
yet replied to the Create() method. In some cases the NEW_RADIO
event fires before the actual callback which will respond to
DBus. This causes a crash in the create callback.
Starts hwsim but does not register to mac80211_hwsim. This is to
allow autotests to disable hwsim, while still having the ability
to create/destroy radios over DBus.
Readline uses the characters \001 and \002 to mark the start and end
of zero-length character sequnces in the prompt before prompt
expansion. Without these characters the input point can become offset
from the visual end of the prompt when performing some actions.
Tests netconfig with a static configuration, as well as tests ACD
functionality.
The test has two IWD radios which will eventually use the same IP.
One is configured statically, one will receive the IP via DHCP.
The static client sets its IP first and begins using it. Then the
DHCP client is started. Since ACD in a DHCP client is configured
to use its address indefinitely, the static client *should* give
up its address.
When the IP is configured to be static we can now use ACD in
order to check that the IP is available and not already in
use. If a conflict is found netconfig will be reset and no IP
will be set on the interface. The ACD client is left with
the default 'defend once' policy, and probes are not turned
off. This will increase connection time, but for static IP's
it is the best approach.
For better reliability the processor count is now set to qemu.
In cases of low CPU count (< 2) hosts the processor count is
limited to 1. Otherwise half of the host cores will be used for
the VM.
Certain classes were still using the default namespace. This
didn't matter yet since testAP was the only test using namespaces,
and the AP interface was the only one being used.
For an IWD station on a separate namespace all objects need to
be accessable, so the namespace is passed along to those as needed.
Allow the storage directory (default /tmp/iwd) to be configured
just like the state directory. This is in order to support multiple
IWD instances which require separate storage directories for network
provisioning files.
The docs just specified what a IP prefix looks like, not an
actual example. Though its not recommended to just copy paste
blindly, its still useful to have some value in the man pages
that actually works if someone just wants to get a DHCP server
working.
In the strange case that the dns list or the domain list are empty and
openresolv is being used, delete the openresolv entry instance instead
of trying to set it to an empty value
Make sure to erase the network_info of a known network that has been
removed before disconnecting any stations connected to it. This fixes
the following warning observed when forgetting a connected network:
WARNING: ../git/src/network.c:network_rank_update() condition n < 0 failed
This also fixes a bug where such a forgotten network would incorrectly
appear as the first element in the response to GetOrderedNetworks(). By
clearing the network_info, network_rank_update() properly negates the
rank of the now-unknown network.
2020-12-01 09:54:52 -06:00
663 changed files with 58308 additions and 19072 deletions
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.