If the interface isn't available by the time we acquire the well-known
name, clients can get confused when the expected interfaces are missing
during bus activation.
This makes it clear whether the BSS being selected for a
connection/roam has any quirks associated with its OUI(s), and that
IWD may behave differently based on these.
If a BSS is requesting IWD roam elsewhere but does not include a
preferred candidate list try getting a neighbor report before doing
a full scan.
If the limited scan based on the candidate list comes up empty this
would previously result in IWD giving up on the AP roam entirely.
This patch also improves that behavior slightly by doing a full
scan afterwards as a last ditch effort. If no BSS's are found after
that, IWD will give up on the AP roam.
This has been a long standing issue on Aruba APs where the scan
IEs differ from the IEs received during FT. For compatibility we
have been carrying a patch to disable the replay counter check but
this isn't something that was ever acceptable for upstream. Now
with the addition of vendor quirks this check can be disabled only
for the OUI of Aruba APs.
Reported-by: Michael Johnson <mjohnson459@gmail.com>
Co-authored-by: Michael Johnson <mjohnson459@gmail.com>
ignore_bss_tm_candidates:
When a BSS requests a station roam it can optionally include a
list of BSS's that can be roamed to. IWD uses this list and only
scans on those frequencies. In some cases though the AP's list
contains very poor options and it would be better for IWD to
request a full neighbor report.
replay_counter_mismatch:
On some Aruba APs there is a mismatch in the replay counters
between what is seen in scans versus authentications/associations.
This difference is not allowed in the spec, therefore IWD will
not connect. This quirk is intended to relax that check.
This module will provide a database for known issues or quirks with
wireless vendors.
The vendor_quirks_append_for_oui() API is intended to be called from
scan.c when parsing vendor attributes. This will lookup any quirks
associated with the OUI provided and combine them into an existing
vendor_quirk structure. This can be repeated against all the vendor
OUI's seen in the scan then referenced later to alter IWD behavior.
In the future more criteria could be added such as MAC address prefix
or more generalized IE matches e.g.
vendor_quirks_append_for_mac()
vendor_quirks_append_for_ie()
etc.
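The lookup-and-combine behavior described above can be sketched roughly
as follows. This is only an illustration: the struct layout, the
function signature, and the table contents are assumptions, and the
OUIs are placeholders rather than real vendor assignments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one flag per quirk described above. */
struct vendor_quirk {
	bool ignore_bss_tm_candidates;
	bool replay_counter_mismatch;
};

struct oui_quirk_entry {
	uint8_t oui[3];
	struct vendor_quirk quirks;
};

/* Placeholder OUIs for illustration only, not real vendor assignments. */
static const struct oui_quirk_entry oui_quirk_db[] = {
	{ { 0xaa, 0xbb, 0x01 }, { .ignore_bss_tm_candidates = true } },
	{ { 0xaa, 0xbb, 0x02 }, { .replay_counter_mismatch = true } },
};

/* OR any quirks known for this OUI into the caller's accumulated set. */
void vendor_quirks_append_for_oui(const uint8_t *oui,
					struct vendor_quirk *quirks)
{
	size_t i;

	assert(oui && quirks);

	for (i = 0; i < sizeof(oui_quirk_db) / sizeof(oui_quirk_db[0]); i++) {
		const struct oui_quirk_entry *e = &oui_quirk_db[i];

		if (memcmp(e->oui, oui, 3))
			continue;

		quirks->ignore_bss_tm_candidates |=
					e->quirks.ignore_bss_tm_candidates;
		quirks->replay_counter_mismatch |=
					e->quirks.replay_counter_mismatch;
	}
}
```

Calling this once per vendor OUI seen in a scan result leaves the
caller with the union of all matching quirks, which can be checked
later when deciding how to behave for that BSS.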
If there were no BSS candidates found after trying to roam make
sure the old roam_freqs list gets cleared so IWD doesn't end up
scanning potentially old frequencies on the next retry.
Some drivers do not handle the colocated scan flag very well and this
results in BSS's not being seen in scans. This of course results in
very poor behavior.
This has been seen on ath11k specifically but after some
conversations [1] on the linux-wireless mailing list others have
reported issues with iwlwifi acting similarly. Since there are many
hardware variants that use both ath11k and iwlwifi this new quirk
isn't being forced on those drivers. Instead, users can configure IWD
to disable the flag if needed.
[1] https://lore.kernel.org/linux-wireless/d1e75a08-047d-7947-d51a-2e486efead77@candelatech.com/
In kernel 6.8 a new CMD_ASSOCIATE failure path was added which checks
if the AP has a channel switch in progress. Earlier patches updated
IWD to handle this case better, and this new test exercises that.
After CSA IE parsing was added to the kernel this opened up the
possibility that associations could be rejected locally based on
the contents of this CSA IE in the AP's beacons. Overall, it was
always possible for a local rejection but this case was never
considered by IWD. The CSA-based rejection is something that can
and does happen out in the wild.
When this association rejection happens it desyncs IWD and the
kernel's state:
1. IWD begins an FT roam. Authenticates successfully, then proceeds
to calling netdev_ft_reassociate().
2. Immediately IWD transitions to a ft-roaming state and waits for
an association response.
3. CMD_ASSOCIATE is rejected by the kernel in the ACK which IWD
handles by sending a deauthenticate command to the kernel (since
we have a valid authentication to the new BSS).
4. Due to a bug IWD uses the target BSSID to deauthenticate which
the kernel rejects since it has no knowledge of this auth. This
error is not handled or logged.
5. IWD proceeds, assuming it's deauthenticated, and transitions to a
disconnected state. The kernel remains "connected" which of course
prevents any future connections.
A simple fix for this is to address the bug (4) in IWD that deauths
using the current BSS roam target. This is actually legacy behavior
from back when IWD used CMD_AUTHENTICATE. Today the kernel is unaware
that IWD authenticated so a deauth is not going to be effective.
Instead we can issue a CMD_DISCONNECT. This is somewhat of a large
hammer, but since the handshake and internal state has already been
modified to use the new target BSS we cannot go back and maintain the
existing connect (though it is _possible_, see the TODO in the
patch).
In an ideal world userspace should never be getting a channel switch
event unless connected to an AP, but alas this has been seen at least
with ath10k hardware. This causes IWD to crash since the logic
assumes netdev->handshake is set.
These two newly parsed station info params "inactive time" and the
"connected time" would be helpful to track the duration (in ms) for
which the station was last inactive and the total duration (in s) for
which the station is currently connected to the AP.
When the wlan device is in STA mode, these fields represent the info
of this station device. When the wlan device is in AP mode, these
fields represent the stations that are connected to this AP device.
Since netconfig is now part of the Connect() call from a DBus
perspective add a note indicating that this method has the potential
to take a very long time if there are issues with DHCP.
Since the method return to Connect() and ConnectBssid() come after
netconfig some tests needed to be updated since they were waiting
for the method return before continuing. For timeout-based tests
specifically this caused them to fail since before they expected
the return to come before the connection was actually completed.
Let the caller specify the method timeout if there is an expectation
that it could take a long time.
For the conventional connect call (not the "bssid" debug variant) let
them pass their own callback handlers. This is useful if we don't
want to wait for the connect call to finish, but later get some
indication that it did finish either successfully or not.
A netconfig failure results in a failed connection which restarts
autoconnect and prevents IWD from retrying the connection on any
other BSS's within the network as a whole. When autoconnect restarts
IWD will scan and choose the "best" BSS which is likely the same as
the prior attempt. If that BSS is somehow misconfigured as far as
DHCP goes, it will likely fail indefinitely and in turn cause IWD to
retry indefinitely.
To improve this netconfig has been adopted into the IWD's BSS retry
logic. If netconfig fails this will not result in IWD transitioning
to a disconnected state, and instead the BSS will be network
blacklisted and the next will be tried. Only once all BSS's have been
tried will IWD go into a disconnected state and start autoconnect
over.
When netconfig is enabled the DBus reply was being sent in
station_connect_ok(), before netconfig had even started. This would
result in a call to Connect() succeeding from a DBus perspective but
really netconfig still needed to complete before IWD transitioned
to a connected state.
Fixes: 72e7d3ceb83d ("station: Handle NETCONFIG_EVENT_FAILED")
This adds a new API network_clear_blacklist() and removes this
functionality from network_connected(). This is done to support BSS
iteration when netconfig is enabled. Since a call to
network_connected() will happen prior to netconfig completing we
cannot clear the blacklist until netconfig has either passed or
failed.
If this fails, in some cases, -EAGAIN would be returned up to netdev
which would then assume a retry would be done automatically. This
would not in fact happen since it was an internal SAE failure, which
would result in the connect method return never getting sent.
Now if sae_send_commit() fails, return -EPROTO which will cause
netdev to fail the connection.
A BSS can temporarily reject associations and provide a delay that
the station should wait for before retrying. This is useful when
sane values are used, but taking it to the extreme an AP could
potentially request the client wait UINT32_MAX TU's which equates
to roughly 51 days (one TU is 1024 microseconds).
Either due to a bug, or worse by design, the kernel will wait for
however long that timeout is. Luckily the kernel also sends an event
to userspace with the amount of time it will be waiting. To guard
against excessive timeouts IWD will now handle this event and enforce
a maximum allowed value. If the timeout exceeds this IWD will
deauthenticate.
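As a rough sketch of the check described above, the timeout can be
converted from TUs and compared against a cap. The function name and
the idea of passing the cap as a parameter are assumptions made for
this example, not the actual IWD code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One 802.11 time unit (TU) is 1024 microseconds. */
#define TU_TO_USEC(tu)	((uint64_t) (tu) * 1024)

/*
 * Returns true if waiting for the comeback delay is acceptable, false
 * if the delay exceeds the cap and we should deauthenticate instead.
 * The 64-bit conversion avoids overflow for large TU values.
 */
bool comeback_delay_allowed(uint32_t timeout_tu, uint64_t max_usec)
{
	assert(max_usec > 0);

	return TU_TO_USEC(timeout_tu) <= max_usec;
}
```

With a hypothetical cap of two seconds, a 1000 TU delay (about one
second) would be allowed while UINT32_MAX TUs would not.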
Specifically for the NO_MORE_STAS reason code, add the BSS to the
(now renamed) AP_BUSY blacklist to avoid roaming to this BSS for
the near future.
Since we are now handling individual reason codes differently the
whole IS_TEMPORARY_STATUS macro was removed and replaced with a
case statement.
The initial pass of this feature only envisioned BSS transition
management frames as the trigger to "roam blacklist" a BSS, hence
the original name. But some APs actually utilize status codes that
also indicate to the stations that they are busy, or not able to
handle more connections. This directly aligns with the original
motivation of the "roam blacklist" series and these events should
also trigger this type of blacklist.
First, since we will be applying this blacklist to cases other
than being told to roam, rename this reason code internally to
BLACKLIST_REASON_AP_BUSY. The config option is also being renamed
to [Blacklist].InitialAccessPointBusyTimeout while also supporting
the old config option, but warning that it is deprecated.
Some drivers/hardware are more strict about limiting use of
certain frequencies on startup until the regulatory domain has
been set. For most cards the only way to set the regulatory domain
is to scan and see BSS's nearby that advertise the country they
reside in.
This is particularly important for AP mode since AP's are always
emitting radiation from beacons and will not start until the desired
frequency is both enabled and allows IR. To make this process
seamless in IWD we will first check that the desired frequency is
enabled/IR and if not issue a scan to (hopefully) get the regulatory
domain set.
A prior patch broke this by checking the return of
l_dbus_message_iter_next_entry. This was really subtle but the logic
actually relied on _not_ checking that return in order to handle
empty lists.
Instead of reverting, the logic was adapted/commented to make it
clearer what the API expects from DBus. If the list contains at least
one value the first element path will get set, if it contains zero
values "new_path" will be set to NULL which will then cause the
list to be cleared later on.
This both fixes the regression, and makes it clear that a zero
element list is supported and handled.
The length of EncryptedSecurity was assumed to be at least 16 bytes
and anything less would underflow the length to l_malloc.
Fixes: 01cd8587606b ("storage: implement network profile encryption")
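The class of bug fixed here can be illustrated with a short sketch:
the length has to be validated before the fixed 16 bytes are
subtracted, otherwise the unsigned subtraction wraps. The function
name and the "prefix" layout are hypothetical, and plain malloc stands
in for l_malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed fixed overhead, matching the 16 bytes mentioned above. */
#define ENC_OVERHEAD_LEN	16

/*
 * Copy out the payload that follows the fixed overhead. Returns NULL
 * when the input is too short, rather than letting
 * "len - ENC_OVERHEAD_LEN" wrap around to a huge size_t.
 */
uint8_t *extract_payload(const uint8_t *enc, size_t len, size_t *out_len)
{
	uint8_t *out;

	assert(enc && out_len);

	if (len < ENC_OVERHEAD_LEN)
		return NULL;

	*out_len = len - ENC_OVERHEAD_LEN;

	/* Allocate at least one byte so an empty payload is not NULL */
	out = malloc(*out_len ? *out_len : 1);
	if (!out)
		return NULL;

	memcpy(out, enc + ENC_OVERHEAD_LEN, *out_len);

	return out;
}
```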
The MAC and version elements weren't super critical but the channel
and bootstrapping key elements would result in memory leaks if there
were duplicates.
This patch now will not allow duplicate elements in the URI.
Fixes: f7f602e1b1e7 ("dpp-util: add URI parsing")
The survey arrays were exactly the number of valid channels for a
given band (e.g. 14 for 2.4GHz) but since channels start at 1 this
means that the last channel for a band would overflow the array.
Fixes: 35808debaefd ("scan: use GET_SURVEY for SNR calculation in ranking")
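One way to avoid this kind of off-by-one, sketched below, is to size
the array with one extra slot when it is indexed directly by channel
number. The struct and function names are made up for this example and
the real fix may have been done differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2.4GHz channels are numbered 1..14, so index 14 must be valid. */
#define NUM_CHANNELS_2_4GHZ	14

struct survey_results {
	/* +1 because the array is indexed directly by channel number */
	int8_t noise_2_4[NUM_CHANNELS_2_4GHZ + 1];
};

/* Store a noise value, rejecting channels outside 1..14. */
bool survey_set_noise(struct survey_results *r, uint8_t channel,
				int8_t noise)
{
	assert(r);

	if (channel < 1 || channel > NUM_CHANNELS_2_4GHZ)
		return false;

	r->noise_2_4[channel] = noise;

	return true;
}
```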
Supporting PMKSA on fullmac drivers requires that we set the PMKSA
into the kernel as well as remove it. This can now be triggered
via the new PMKSA driver callbacks which are implemented and set
with this patch.
In order to support fullmac drivers the PMKSA entries must be added
and removed from the kernel. To accomplish this a set of driver
callbacks will be added to the PMKSA module. In addition a new
pmksa_cache_free API will be added whose only purpose is to handle
the removal from the kernel.
The iwd_notice function was more meant for special purpose events
not general debug prints. For these error conditions we should be
using l_warn. For the informational "External Auth to SSID" log
we already print this information when connecting from station. In
addition there are logs when performing external auth so it should
be very obvious external auth is being used without this log.
The netdev frame watches got cleaned up upon the interface going down
which works if the interface is simply being toggled but when IWD
shuts down it first shuts down the interface, then immediately frees
netdev. If a watched frame arrives immediately after that before the
interface shutdown callback it will reference netdev, which has been
freed.
Fix this by clearing out the frame watches in netdev_free.
==147== Invalid read of size 8
==147== at 0x408ADB: netdev_neighbor_report_frame_event (netdev.c:4772)
==147== by 0x467C75: frame_watch_unicast_notify (frame-xchg.c:234)
==147== by 0x4E28F8: __notifylist_notify (notifylist.c:91)
==147== by 0x4E2D37: l_notifylist_notify_matches (notifylist.c:204)
==147== by 0x4A1388: process_unicast (genl.c:844)
==147== by 0x4A1388: received_data (genl.c:972)
==147== by 0x49D82F: io_callback (io.c:105)
==147== by 0x49C93C: l_main_iterate (main.c:461)
==147== by 0x49CA0B: l_main_run (main.c:508)
==147== by 0x49CA0B: l_main_run (main.c:490)
==147== by 0x49CC3F: l_main_run_with_signal (main.c:630)
==147== by 0x4049EC: main (main.c:614)
If an AP directed roam frame comes in while IWD is roaming it's
still valuable to parse that frame and blacklist the BSS that
sent it.
This can happen most frequently during a roam scan while connected
to an overloaded BSS that is requesting IWD roams elsewhere.
If the BSS is requesting IWD roam elsewhere add this BSS to the
blacklist using BLACKLIST_REASON_ROAM_REQUESTED. This will lower
the chances of IWD roaming/connecting back to this BSS in the
future.
This then allows IWD to consider this blacklist state when picking
a roam candidate. It's undesirable to fully ban a roam blacklisted
BSS, so some additional sorting logic has been added. Prior to
comparing based on rank, BSS's will be sorted into two higher level
groups:
Above Threshold - BSS is above the RoamThreshold
Below Threshold - BSS is below the RoamThreshold
Within each of these groups the BSS may be roam blacklisted which
will position it at the bottom of the list within its respective
group.
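The ordering described above could be expressed as a comparator along
these lines. This is a sketch only: the candidate struct, its field
names, and the threshold value are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the configured RoamThreshold, value is arbitrary */
#define ROAM_THRESHOLD_DBM	(-70)

/* Hypothetical candidate record, fields named for this example only. */
struct roam_candidate {
	int signal_dbm;
	bool roam_blacklisted;
	uint16_t rank;
};

/* qsort() comparator: the best candidate sorts first. */
int roam_candidate_cmp(const void *a, const void *b)
{
	const struct roam_candidate *x = a;
	const struct roam_candidate *y = b;
	bool x_above, y_above;

	assert(a && b);

	x_above = x->signal_dbm > ROAM_THRESHOLD_DBM;
	y_above = y->signal_dbm > ROAM_THRESHOLD_DBM;

	/* 1. The above-threshold group beats the below-threshold group */
	if (x_above != y_above)
		return x_above ? -1 : 1;

	/* 2. Within a group, roam blacklisted entries sink to the bottom */
	if (x->roam_blacklisted != y->roam_blacklisted)
		return x->roam_blacklisted ? 1 : -1;

	/* 3. Otherwise fall back to rank, higher first */
	if (x->rank != y->rank)
		return x->rank > y->rank ? -1 : 1;

	return 0;
}
```

Note that a roam blacklisted BSS above the threshold still sorts ahead
of any BSS below it, so it is deprioritized rather than banned.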
This adds a new (less severe) blacklist reason as well as an option
to configure the timeout. This blacklist reason will be used in cases
where a BSS has requested IWD roam elsewhere. At that time a new
blacklist entry will be added which will be used along with some
other criteria to determine if IWD should connect/roam to that BSS
again.
Now that we have multiple blacklist reasons there may be situations
where a blacklist entry already exists but with a different reason.
This is going to be handled by the reason severity. Since we have
just two reasons we will treat a connection failure as most severe
and a roam requested as less severe. This leaves us with two
possible situations:
1. BSS is roam blacklisted, then gets connection blacklisted:
The reason will be "promoted" to connection blacklisted.
2. BSS is connection blacklisted, then gets roam blacklisted:
The blacklist request will be ignored
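The two situations above amount to a "never downgrade" rule, which can
be sketched as below. The entry struct and the connection failure enum
name are assumptions, and timeout handling is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Ordered by severity. The CONNECT_FAILED name is an assumption. */
enum blacklist_reason {
	BLACKLIST_REASON_ROAM_REQUESTED = 0,
	BLACKLIST_REASON_CONNECT_FAILED = 1,
};

/* Hypothetical entry, reduced to what the severity check needs */
struct blacklist_entry {
	bool in_use;
	enum blacklist_reason reason;
};

/* Returns true if the entry was added or promoted, false if ignored. */
bool blacklist_entry_update(struct blacklist_entry *e,
				enum blacklist_reason reason)
{
	assert(e);

	if (!e->in_use) {
		e->in_use = true;
		e->reason = reason;
		return true;
	}

	/* Never downgrade: an equal or less severe request is ignored */
	if (reason <= e->reason)
		return false;

	e->reason = reason;

	return true;
}
```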
When pruning the list check_if_expired was comparing to the maximum
amount of time a BSS can be blacklisted, not if the current time had
exceeded the expiration time. This results in blacklist entries
hanging around longer than they should, which would result in them
potentially being blacklisted even longer if there was another reason
to blacklist in the future.
Instead on prune check the actual expiration and remove the entry if
it's expired. Doing this removes the need to check any of the times
in blacklist_contains_bss since prune will remove any expired entries
correctly.
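To illustrate the difference, here is a small sketch contrasting a
check against the maximum timeout with a check against the entry's own
expiration. Both functions and the struct are hypothetical, and the
first one is only an approximation of the old comparison as described
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry holding absolute times in microseconds. */
struct timed_entry {
	uint64_t added_time;
	uint64_t expire_time;
};

/* Old style: compares the entry's age against the maximum timeout. */
bool expired_by_max_age(const struct timed_entry *e, uint64_t now,
				uint64_t max_timeout)
{
	assert(e && now >= e->added_time);

	return now - e->added_time > max_timeout;
}

/* New style: expired once "now" reaches the entry's own expiration. */
bool entry_expired(const struct timed_entry *e, uint64_t now)
{
	assert(e);

	return now >= e->expire_time;
}
```

An entry that expired shortly after being added is still reported as
live by the first check until the maximum timeout has elapsed, which is
the lingering behavior described above.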
To both prepare for some new blacklisting behavior and allow for
easier consolidation of the network-specific blacklist include a
reason enum for each entry. This allows IWD to differentiate
between multiple blacklist types. For now only the existing
"permanent" type is being added which prevents connections to that
BSS via autoconnect until it expires.
Allowing the timeout blacklist to be disabled has introduced a bug
where a failed connection will not result in the BSS list being
traversed. This causes IWD to retry the same BSS over and over, which
may either a) have some issue preventing a connection or b) simply
be unreachable/out of range.
This is because IWD was inherently relying on the timeout blacklist
to flag BSS's on failures. With it disabled there was nothing to tell
network_bss_select that we should skip the BSS and it would return
the same BSS indefinitely.
To fix this some of the blacklisting logic was re-worked in station.
Now, a BSS will always get network blacklisted upon a failure. This
allows network.c to traverse to the next BSS upon failure.
For auth/assoc failures we will then only timeout blacklist under
certain conditions, i.e. the status code was not in the temporary
list.
Fixes: 77639d2d452e ("blacklist: allow configuration to disable the blacklist")
Certain use cases may not need or want this feature so allowing it to
be disabled is a much cleaner way than doing something like setting
the timeouts very low.
Now [Blacklist].InitialTimeout can be set to zero which will prevent
any blacklisting.
In addition some other small changes were added:
- Warn if the multiplier is 0, and set to 1 if so.
- Warn if the initial timeout exceeds the maximum timeout.
- Log if the blacklist is disabled.
- Use L_USEC_PER_SEC instead of magic numbers.
SAE/WPA3 is completely broken on brcmfmac, at least without a custom
kernel patch which isn't included in many OS distributions. In order
to help with this add a driver quirk so devices with brcmfmac can
utilize WPA2 instead of WPA3 and at least connect to networks at
this capacity until the fix is more widely distributed.
Instead of just printing the PMKSA pointer separate this into two
separate debug messages, one for if the PMKSA exists and the other
if it does not. In addition print out the MAC of the AP so we have
a reference of which PMKSA this is.
With external auth there is no associate event meaning the auth proto
never gets freed, which prevents eapol from starting inside the
OCI callback. Check for this specific case and free the auth proto
after signaling that external auth has completed.
The user can now limit the size and count of PCAP files iwmon will
create. This allows iwmon to run for long periods of time without
filling up disk space.
This implements support for "rolling captures" by allowing iwmon to
limit the PCAP file size and number of PCAP's that are created.
This is a useful feature when long term monitoring is needed. If
there is some rare behavior requiring iwmon to run for days, months,
or longer the resulting PCAP file would become quite large and fill
up disk space.
When enabled (command line arguments in subsequent patch) the PCAP
file size is checked on each write. If it exceeds the limit a new
PCAP file will be created. Once the number of old PCAP files reaches
the set limit the oldest PCAP will be removed from disk.
For syncing iwmon captures with other logging it's useful to
timestamp in some absolute format like UTC. This adds an
option which allows the user to specify what time format to
show. For now support:
delta - (default) The time delta between the first packet
and the current packet.
utc - The packet time in UTC
The ath10k driver has shown some performance issues, specifically
packet loss, when frame watches are registered with the multicast
RX flag set. This is relevant for DPP which registers for these
when DPP starts (if the driver supports it). This has only been
observed when there are large groups of clients all using the same
wifi channel so it's unlikely to be much of an issue for those using
IWD/ath10k and DPP unless you run large deployments of clients.
But for large deployments with IWD/ath10k we need a way to disable
the multicast RX registrations. Now, with the addition of
wiphy_supports_multicast_rx we can both check that the driver
supports this as well as if it's been disabled by the driver quirk.
This driver quirk and associated helper API lets other modules both
check if multicast RX is supported, and if it's been disabled via
the driver quirk setting.
The actual connection piece of this is very minimal, and only
requires station to check if there is a PMKSA cached, and if so
include the PMKID in the RSNE. Netdev then takes care of the rest.
The remainder of this patch is the error handling if a PMKSA
connection fails with INVALID_PMKID. In this case IWD should retry
the same BSS without PMKSA.
An option was also added to disable PMKSA if a user wants to do
that. In theory PMKSA is actually less secure compared to SAE so
it could be something a user wants to disable. Going forward though
it will be enabled by default as it's a requirement from the WiFi
alliance for WPA3 certification.
To prepare for PMKSA support station needs access to the handshake
object. This is because if PMKSA fails due to an expired/missing
PMKSA on the AP station should retry using the standard association.
This poses a problem currently because netdev frees the handshake
prior to calling the connect callback.
This was quite simple and only required caching the PMKSA after a
successful handshake, and using the correct authentication type
for connections if we have a prior PMKSA cached.
This is only being added for initial SAE associations for now since
this is where we gain the biggest improvement, in addition to the
requirement by the WiFi alliance to label products as "WPA3 capable".
This is needed in order to clear the PMKSA from the handshake state
without actually putting it back into the cache. This is something
that will be needed in case the AP rejects the association due to
an expired (or forgotten) PMKSA.
The majority of this patch was authored by Denis Kenzior, but
I have appended setting the PMK inside handshake_state_set_pmksa
as well as checking if the pmkid exists in
handshake_state_steal_pmkid.
Authored-by: Denis Kenzior <denkenz@gmail.com>
Authored-by: James Prestwood <prestwoj@gmail.com>
There are quite a few tests here for various scenarios and PMKSA
throws a wrench into that. Rather than potentially breaking the
tests in an attempt to get them working with PMKSA, just disable PMKSA.
Since IWD doesn't utilize DBus signals in "normal" operations it's
fine to lazy initialize any of the DBus interfaces since properties
can be obtained as needed with Get/GetAll.
For test-runner though StationDebug uses signals for debug events
and until the StationDebug class is initialized (via a method call
or property access) all signals will be lost. Fix this by always
initializing the StationDebug interface when a Device class is
initialized.
This adds a ref count to the handshake state object (as well as
ref/unref APIs). Currently IWD is careful to ensure that netdev
holds the root reference to the handshake state. Other modules do
track it themselves, but ensure that it doesn't get referenced
after netdev frees it.
Future work related to PMKSA will require that station holds a
reference to the handshake state, specifically for retry logic,
after netdev is done with it so we need a way to delay the free
until station is also done.
The utilization rank factor already existed but was very rigid
and only checked a few values. This adds the (optional) ability
to start applying an exponentially decaying factor to both
utilization and station count after some threshold is reached.
This area needs to be re-worked in order to support very highly
loaded networks. If a network either doesn't support client
balancing or does it poorly it's left up to the clients to choose
the best BSS possible given all the information available. In
these cases connecting to a highly loaded BSS may fail, or result
in a disconnect soon after connecting. In these cases it's likely
better for IWD to choose a slightly lower RSSI/datarate BSS over
the conventionally 'best' BSS in order to aid in distributing
the network load.
The thresholds are currently optional and not enabled by default
but if set they behave as follows:
If the value is above the threshold it is mapped to an integer
between 0 and 30. (using a starting range of <value> - 255).
This integer is then used to index in the exponential decay table
to get a factor between 1 and 0. This factor is then applied to
the rank.
Note that as the value increases above the threshold the rank
will be increasingly affected, as is expected for an exponential
function. These options should be used with care as they may have
unintended consequences, especially with very high load networks.
i.e. you may see IWD roaming to BSS's with much lower signal if
there are high load BSS's nearby.
To maintain the existing behavior if there is no utilization
factor set in main.conf the legacy thresholds/factors will be
used.
This is copied from network.c that uses a static table to lookup
exponential decay values by index (generated from 1/pow(n, 0.3)).
network.c uses this for network ranking but it can be useful for
BSS ranking as well if you need to apply some exponential backoff
to a value.
This has been needed elsewhere but generally shortcuts could be
taken mapping with ranges starting/ending with zero. This is a
more general linear mapping utility to map values between any
two ranges.
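A general linear mapping of this kind might look like the sketch
below. The function name and the use of doubles are assumptions made
for this example, not necessarily what the utility in the patch looks
like.

```c
#include <assert.h>

/*
 * Linearly map "value" from the range [in_min, in_max] onto the range
 * [out_min, out_max]. Neither range needs to start or end at zero and
 * the output range may be descending.
 */
double map_range(double value, double in_min, double in_max,
			double out_min, double out_max)
{
	/* A zero-width input range would divide by zero */
	assert(in_max != in_min);

	return out_min + (value - in_min) * (out_max - out_min) /
						(in_max - in_min);
}
```

For example, mapping a value from a hypothetical threshold of 200 up to
255 onto 0..30 sends 200 to 0 and 255 to 30, which is the sort of index
the decay table lookup needs.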
gcc-15 switched to -std=c23 by default:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212
As a result `iwd` fails the build as:
../src/crypto.c:1215:24: error: incompatible types when returning type '_Bool' but 'struct l_ecc_point *' was expected
1215 | return false;
| ^~~~~
Signed-off-by: Rudi Heitbaum <rudi@heitbaum.com>
gcc-15 switched to -std=c23 by default:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212
As a result `iwd` fails the build as:
wired/ethdev.c: In function 'pae_open':
wired/ethdev.c:340:55:
error: passing argument 4 of 'l_io_set_read_handler'
from incompatible pointer type [-Wincompatible-pointer-types]
340 | l_io_set_read_handler(pae_io, pae_read, NULL, pae_destroy);
| ^~~~~~~~~~~
| |
| void (*)(void)
In file included from ...-ell-0.70-dev/include/ell/ell.h:19,
from wired/ethdev.c:38:
...-ell-0.70-dev/include/ell/io.h:33:68:
note: expected 'l_io_destroy_cb_t' {aka 'void (*)(void *)'}
but argument is of type 'void (*)(void)'
33 | void *user_data, l_io_destroy_cb_t destroy);
| ~~~~~~~~~~~~~~~~~~^~~~~~~
C23 changed the meaning of `void (*)()` from partially defined prototype
to `void (*)(void)`.
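A minimal sketch of the kind of change this calls for: the callback
spells out its parameter so it matches the expected function pointer
type. The typedef is a local stand-in mirroring the type shown in the
compiler note above, and the function bodies are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring 'void (*)(void *)' from the note above */
typedef void (*destroy_cb_t)(void *user_data);

static int destroyed;

/*
 * Before C23, "void pae_destroy()" left the parameters unspecified and
 * was accepted where a destroy_cb_t was expected. Under C23 an empty
 * list means "(void)", so the parameter has to be spelled out.
 */
void pae_destroy(void *user_data)
{
	(void) user_data;
	destroyed++;
}

/* Placeholder for an API that takes a destroy callback */
void run_destroy(destroy_cb_t cb, void *user_data)
{
	assert(cb);

	cb(user_data);
}
```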
The 3rd byte of the country code was being printed as ASCII but this
byte isn't always a printable character. Instead we can check what
the value is and describe what it means from the spec.
These frequencies were seen being advertised by a driver and IWD has
no operating class/channel mapping for them. Specifically 5960 was
causing issues due to a few bugs and mapping to channel 2 of the 6ghz
band. Those bugs have now been resolved.
If these frequencies can be supported in a clean manner we can remove
this test, but until then ensure IWD does not parse them.
After the band is established we check the e4 table for the channel
that matches. The problem here is we will end up checking all the
operating classes, even those that are not within the band that was
determined. This could result in false positives and return a
channel that doesn't make sense.
When the frequencies/channels were parsed there was no check that the
resulting band matched what was expected. Now, pass the band object
itself in which has the band set to what is expected.
If IPv6 is disabled or not supported at the kernel level writing the
sysfs settings will fail. A few of them had a support check but this
patch adds a supported bool to the remainder so we don't get errors
like:
Unable to write drop_unsolicited_na to /proc/sys/net/ipv6/conf/wlan0/drop_unsolicited_na
Similar to several other modules DPP registers for its frame
watches on init then ignores anything it receives unless DPP
is actually running.
Due to some recent issues surrounding ath10k and multicast frames
it was discovered that simply registering for multicast RX frames
causes a significant performance impact depending on the current
channel load.
Regardless of the impact to a single driver, it is actually more
efficient to only register for the DPP frames when DPP starts
rather than when IWD initializes. This prevents any of the frames
from hitting userspace which would otherwise be ignored.
Using the frame-xchg group ID's we can only register for DPP
frames when needed, then close that group and the associated
frame watches.
DPP optionally uses the multicast RX flag for frame registrations but
since frame-xchg did not support that, it used its own registration
internally. To avoid code duplication within DPP add a flag to
frame_watch_add in order to allow DPP to utilize frame-xchg.
The selection loop was choosing an initial candidate purely for
use of the "fallback_to_blacklist" flag. But we have a similar
case with OWE transitional networks where we avoid the legacy
open network in preference for OWE:
/* Don't want to connect to the Open BSS if possible */
if (!bss->rsne)
continue;
If no OWE network gets selected we may iterate all BSS's and end
the loop, which then returns NULL.
To fix this move the blacklist check earlier and still ignore any
BSS's in the blacklist. Also add a new flag in the selection loop
indicating an open network was skipped. If we then exhaust all
other BSS's we can return this candidate.
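The reworked loop can be sketched roughly as follows. The BSS struct
is a pared-down, hypothetical stand-in, the list is assumed to already
be sorted by rank, and the "fallback_to_blacklist" handling is left out
for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down BSS record for this example only */
struct bss {
	const char *name;
	bool has_rsne;		/* false for the legacy open BSS */
	bool blacklisted;
};

const struct bss *select_bss(const struct bss *list, size_t n,
				bool owe_transition)
{
	const struct bss *open_fallback = NULL;
	size_t i;

	assert(list || n == 0);

	for (i = 0; i < n; i++) {
		const struct bss *bss = &list[i];

		/* Blacklist check comes first and applies to every BSS */
		if (bss->blacklisted)
			continue;

		/* Prefer OWE, but remember the first open BSS skipped */
		if (owe_transition && !bss->has_rsne) {
			if (!open_fallback)
				open_fallback = bss;

			continue;
		}

		return bss;
	}

	/* No OWE BSS was usable, fall back to the open one if any */
	return open_fallback;
}
```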
Some drivers like brcmfmac don't support OWE but from userspace it's
not possible to query this information. Rather than completely
blacklist brcmfmac we can allow the user to configure this and
disable OWE in IWD.
The "UK" alpha2 code is not the official code for the United Kingdom
but is a "reserved" code for compatibility. The official alpha2 is
"GB" which is being added to the EU list. This fixes issues parsing
neighbor reports, for example:
src/station.c:parse_neighbor_report() Neighbor report received for xx:xx:xx:xx:xx:xx: ch 136 (oper class 3), MD not set
Failed to find band with country string 'GB 32' and oper class 3, trying fallback
src/station.c:station_add_neighbor_report_freqs() Ignored: unsupported oper class
Test handling of technically illegal but harmless cloned IEs.
Based on real traffic captured from retail APs.
As cloned IEs are now allowed the
"/IE order/Bad (Duplicate + Out of Order IE) 1"
test payload has been altered to be more-wrong so it still fails
verification as expected.
Prior to adding the polling fallback this code path was only used for
signal level list notifications and netdev_rssi_polling_update() was
structured as such, where if the RSSI list feature existed there was
nothing to be done as the kernel handled the notifications.
For certain mediatek cards this is broken, hence why the fallback was
added. But netdev_rssi_polling_update() was never changed to take
this into account which bypassed the timer cleanup on disconnections
resulting in a crash when the timer fired after IWD was disconnected:
iwd: ++++++++ backtrace ++++++++
iwd: #0 0x7b5459642520 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #1 0x7b54597aedf4 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #2 0x49f82d in l_netlink_message_append() at ome/jprestwood/iwd/ell/netlink.c:825
iwd: #3 0x4a0c12 in l_genl_msg_append_attr() at ome/jprestwood/iwd/ell/genl.c:1522
iwd: #4 0x405c61 in netdev_rssi_poll() at ome/jprestwood/iwd/src/netdev.c:764
iwd: #5 0x49cce4 in timeout_callback() at ome/jprestwood/iwd/ell/timeout.c:70
iwd: #6 0x49c2ed in l_main_iterate() at ome/jprestwood/iwd/ell/main.c:455 (discriminator 2)
iwd: #7 0x49c3bc in l_main_run() at ome/jprestwood/iwd/ell/main.c:504
iwd: #8 0x49c5f0 in l_main_run_with_signal() at ome/jprestwood/iwd/ell/main.c:632
iwd: #9 0x4049ed in main() at ome/jprestwood/iwd/src/main.c:614
iwd: #10 0x7b5459629d90 in /lib/x86_64-linux-gnu/libc.so.6
iwd: #11 0x7b5459629e40 in /lib/x86_64-linux-gnu/libc.so.6
iwd: +++++++++++++++++++++++++++
To fix this we need to add checks for the cqm_poll_fallback flag in
netdev_rssi_polling_update().
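As a rough model of the check described above (not the real function body), the early return has to take the fallback flag into account so that the timer is still cleaned up on disconnect. The struct and its field names below are hypothetical simplifications.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of the per-netdev polling state. */
struct rssi_poll_state {
	bool rssi_list_supported;	/* kernel delivers level list events */
	bool cqm_poll_fallback;		/* polling used as a driver workaround */
	bool connected;
	bool timer_armed;
};

static void rssi_polling_update(struct rssi_poll_state *s)
{
	/*
	 * Skip the timer handling only when the kernel handles the
	 * notifications AND the polling fallback is not in use.
	 */
	if (s->rssi_list_supported && !s->cqm_poll_fallback)
		return;

	/* Arm the timer while connected, tear it down otherwise */
	s->timer_armed = s->connected;
}
```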
Certain FullMAC drivers do not expose CMD_ASSOCIATE/CMD_AUTHENTICATE,
but lack the ability to fully offload SAE connections to the firmware.
Such connections can still be supported on such firmware by using
CMD_EXTERNAL_AUTH & CMD_FRAME. The firmware sets the
NL80211_FEATURE_SAE bit (which implies support for CMD_AUTHENTICATE, but
oh well), and no other offload extended features.
When CMD_CONNECT is issued, the firmware sends CMD_EXTERNAL_AUTH via
unicast to the owner of the connection. The connection owner is then
expected to send SAE frames with the firmware using CMD_FRAME and
receive authenticate frames using unicast CMD_FRAME notifications as
well. Once SAE authentication completes, userspace is expected to
send a final CMD_EXTERNAL_AUTH back to the kernel with the corresponding
status code. On failure, a non-0 status code should be used.
Note that for historical reasons, SAE AKM sent in CMD_EXTERNAL_AUTH is
given in big endian order, not CPU order as is expected!
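To illustrate the byte order note above, a small sketch of decoding the AKM explicitly as big endian instead of reading it in CPU order. The helper name get_be32() is an assumption for this example; 00-0F-AC:8 is the SAE AKM suite selector.

```c
#include <stdint.h>

/* SAE AKM suite selector 00-0F-AC:8 as a host-order integer */
#define AKM_SUITE_SAE 0x000FAC08u

/* Decode a 32-bit big endian value independent of the CPU byte order */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
		((uint32_t) p[2] << 8) | (uint32_t) p[3];
}
```

Comparing get_be32() of the attribute payload against AKM_SUITE_SAE works the same on little and big endian hosts, whereas a plain 32-bit read would only match on big endian machines.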
The TX or RX bitrate attributes can contain zero nested attributes.
This causes netdev_parse_bitrate() to fail, but this shouldn't then
cause the overall parsing to fail (we just don't have those values).
Fix this by continuing to parse attributes if either the TX/RX
bitrates fail to parse.
If the affinity watch is removed by setting an empty list the
disconnect callback won't be called, which was the only place
the watch ID was cleared. This caused the next SetProperty call
to think a watch existed and attempt to compare the sender address,
which would be NULL.
The watch ID should be cleared inside the destroy callback, not
the disconnect callback.
If we scan a huge number of frequencies the PKEX timeout can get
rather large. This was overlooked in a prior patch whose intent
was to reduce the PKEX time, but in these cases it increased it.
Now the timeout will be capped at 2 minutes, but will still be
as low as 10 seconds for a single frequency.
In addition there was no timer reset once PKEX was completed.
This could cause excessive waits if, for example, the peer left
the channel mid-authentication. IWD would just wait until the
long PKEX timeout to eventually reset DPP. Once PKEX completes
we can assume that this peer will complete authentication quickly
and if not, we can fail.
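The timeout cap could look something like the sketch below. Only the 10 second floor for a single frequency and the 2 minute cap come from the text above; the linear 10 seconds per frequency scaling and the names used are assumptions made for illustration.

```c
#define PKEX_SECS_PER_FREQ	10u	/* assumed linear scaling */
#define PKEX_SECS_MAX		120u	/* cap of 2 minutes */

static unsigned int pkex_timeout_secs(unsigned int n_freqs)
{
	/* Treat an empty list like a single frequency */
	if (n_freqs == 0)
		n_freqs = 1;

	/* Compare before multiplying so huge lists cannot overflow */
	if (n_freqs >= PKEX_SECS_MAX / PKEX_SECS_PER_FREQ)
		return PKEX_SECS_MAX;

	return n_freqs * PKEX_SECS_PER_FREQ;
}
```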
While there is proper handling for a regdom update during a
TRIGGER_SCAN scan (prior to NEW_SCAN_RESULTS), there is no such
handling if the regdom update comes in during a GET_SCAN or
GET_SURVEY.
In both the 6ghz and non-6ghz code paths we have some issues:
- For non-6ghz devices, or regdom updates that did not enable
  6ghz, the wiphy state watch callback will automatically issue
  another GET_SURVEY/GET_SCAN without checking if there was
  already one pending. It does this using the current scan request
  which gets freed by the prior GET_SCAN/GET_SURVEY calls when
  they complete, causing invalid reads when the subsequent calls
  finish.
- If 6ghz was enabled by the update we actually append another
  trigger command to the list and potentially run it if it's the
  current request. This also will end up in the same situation as
  the request is freed by the pending GET_SURVEY/GET_SCAN calls.
For the non-6ghz case there is little to no harm in ignoring the
regdom update because it's very unlikely it changed the allowed
frequencies.
For the 6ghz case we could potentially handle the new trigger scan
within get_scan_done, but that's beyond the scope of this change
and is likely quite intrusive.
Since surveys end up making driver calls in the kernel it's not
entirely known how they are implemented or how long they will
take. For this reason the survey will be skipped if getting the
results from an external scan.
Doing this also fixes a crash caused by external scans where the
scan request pointer is not checked and dereferenced:
0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059
0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0
0x0000749646a243cb in ?? () from /usr/lib/libell.so.0
0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0
0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0
0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0
0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614
Reported-by: Daniel Bond <danielbondno@gmail.com>
With the introduction of affinities the CQM threshold can be toggled
by a DBus call. There was no check if there was already a pending
call which would cause the command ID to be overwritten and lose any
potential to cancel it, e.g. if netdev went down.
Some drivers fail to set a CQM threshold and report not supported.
It's unclear exactly why but if this happens roaming is effectively
broken.
To work around this enable RSSI polling if -ENOTSUP is returned.
The polling callback has been changed to emit the HIGH/LOW signal
threshold events instead of just the RSSI level index, just as if
a CQM event came from the kernel.
When the affinity is set to the current BSS lower the roaming
threshold to loosely lock IWD to the current BSS. The lower
threshold is automatically removed upon roaming/disconnection
since the affinity array is also cleared out.
This property will hold an array of object paths for
BasicServiceSet (BSS) objects. For the purpose of this patch
only the setter/getter and client watch is implemented. The
purpose of this array is to guide or loosely lock IWD to certain
BSS's provided that some external client has more information
about the environment than what IWD takes into account for its
roaming decisions.
For the time being, the array is limited to only the connected
BSS path, and any roams or disconnects will clear the array.
The intended use case for this is if the device is stationary
an external client could reduce the likelihood of roaming by
setting the affinity to the current BSS.
This documents a new DBus property that exposes a bit more control
over how IWD roams.
Setting the affinity on the connected BSS effectively "locks" IWD to
that BSS (except at critical RSSI levels, explained below). This can
be useful for clients that have access to more information about the
environment than IWD. For example, if a client is stationary there
is likely no point in trying to roam until it has moved elsewhere.
A new main.conf option would also be added:
[General].CriticalRoamThreshold
This would be the new roam threshold set if the currently connected
BSS is in the Affinities list. If the RSSI continues to drop below
this level IWD will still attempt to roam.
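A minimal sketch of how the threshold choice could work, assuming the affinity list is a plain array of object path strings. The function name, parameters and the threshold values in the usage note are hypothetical and only meant to illustrate the behavior described above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the critical threshold if the connected BSS path is in the
 * affinity list, otherwise the normal roam threshold.
 */
static int roam_threshold(const char *connected_path,
				const char *const *affinities, size_t n,
				int normal, int critical)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(affinities[i], connected_path))
			return critical;

	return normal;
}
```

For example, with a normal threshold of -70 and a critical threshold of -80 (example values only), setting the affinity to the connected BSS moves the roam trigger from -70 to -80 dBm.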
A user reported a crash which was due to the roam trigger timeout
being overwritten, followed by a disconnect. Post-disconnect the
timer would fire and result in a crash. It's not clear exactly where
the overwrite was happening but upon code inspection it could
happen in the following scenario:
1. Beacon loss event, start roam timeout
2. Signal low event, no check if timeout is running and the timeout
gets overwritten.
The reported crash actually didn't appear to be from the above
scenario but something else, so this logic is being hardened and
improved.
Now if a roam timeout already exists and is being rearmed, IWD
will check the time remaining on the current timer and either keep
the active timer or reschedule it to the lesser of the two values
(current or new rearm time). This will avoid cases such as a long
roam timer being active (e.g. 60 seconds) followed by a beacon or
packet loss event which should trigger a more aggressive roam
schedule.
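The rearm rule can be expressed as a small helper. This is a sketch of the decision only; the function name and the use of whole seconds are assumptions.

```c
#include <stdbool.h>

/*
 * Decide how long the roam timer should run after a rearm request.
 * If no timer is active the requested value is used, otherwise the
 * lesser of the remaining time and the requested time wins.
 */
static unsigned int roam_rearm_secs(bool active, unsigned int remaining,
					unsigned int requested)
{
	if (!active)
		return requested;

	return remaining < requested ? remaining : requested;
}
```

So a 60 second timer with 45 seconds left that is rearmed for 2 seconds (e.g. after packet loss) fires in 2 seconds, while a timer with 3 seconds left is not pushed out by a 60 second rearm.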
This adds a secondary set of signal thresholds. The purpose of these
are to provide more flexibility in how IWD roams. The critical
threshold is intended to be temporary and is automatically reset
upon any connection changes: disconnects, roams, or new connections.
This prepares for the ability to toggle between two signal
thresholds in netdev. Since each netdev may not need/want the
same threshold store it in the netdev object rather than globally.
Since IWD enrollees can send unicast frames, a PKEX configurator could
still run without multicast support. Using this combination basically
allows any driver to utilize DPP/PKEX assuming the MAC address can
be communicated using some out of band mechanism.
The DPP spec leaves obtaining the frequency and MAC addresses up
to the implementation. IWD already takes advantage of this by
first scanning for nearby APs and using only those frequencies.
For further optimization an enrollee may be able to determine the
configurator's frequency and MAC ahead of time which would make
finding the configurator much faster.
This will help to get rid of magic number use throughout the project.
The definitions should be limited to global magic numbers that are used
throughout the project, for example SSID length, MAC address length,
etc.
Due to an unnoticed bug after adding the BasicServiceSet object into
network, it became clear that since station already owns the scan_bss
objects it makes sense for it to manage the associated DBus objects
as well. This way network doesn't have to jump through hoops to
determine if the scan_bss object was removed, added, or updated. It
can just manage its list as it did prior.
From the station side this makes things very easy. When scan results
come in we either update or add a new DBus object. And any time a
scan_bss is freed we remove the DBus object.
To reduce code duplication and prepare for moving the BSS interface
to station, add a new API so station can create a BSS path without
a network object directly.
src/eapol.c:1041:9: error: ‘buf’ may be used uninitialized [-Werror=maybe-uninitialized]
1041 | l_put_be16(0, &frame->header.packet_len);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This warning is bogus since the buffer is initialized through use of
eapol_frame members. EAPoL-Start is a very simple frame.
This was seemingly trivial at face value but doing so ended up
pointing out a bug with how group_retry is set when forcing
the default group. Since group_retry is initialized to -1 the
increment in the force_default_group block results in it being
set to zero, which is actually group 20, not 19. This did not
matter for hunt and peck, but H2E actually uses the retry value
to index its pre-generated points which then breaks SAE if
forcing the default group with H2E.
To handle H2E and force_default_group, the group selection
logic will always begin iterating the group array regardless of
SAE type.
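To illustrate why iterating the group array is safer than incrementing an index, consider the sketch below. The array contents are an assumption that matches the statement above that index zero is group 20; the names are hypothetical.

```c
#include <stddef.h>

/* Assumed preference order: index 0 is group 20, index 1 is group 19 */
static const unsigned int ecc_groups[] = { 20, 19 };

/* Look up the array index for a group, or -1 if it is not supported */
static int find_group_index(unsigned int wanted)
{
	size_t i;

	for (i = 0; i < sizeof(ecc_groups) / sizeof(ecc_groups[0]); i++)
		if (ecc_groups[i] == wanted)
			return (int) i;

	return -1;
}
```

Incrementing an index that starts at -1 yields 0, which is group 20 here. Searching for group 19 instead yields index 1, which is the value H2E needs to pick the matching pre-generated point.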
The property itself is an array of paths, but this is difficult
to fit nicely within a terminal. Instead just display the count
of BSS's. Displaying a detailed list of BSS's will be done via
a separate command.
There are certain cases where we may not want to display the entire
header for a given set of properties. For example displaying a list
of proxy interfaces. Add finer control by separating out the header
and the prop/value display into two functions.
This will tell network the BSS list is being updated and it can
act accordingly as far as the BSS DBus registrations/unregistration.
In addition any scan_bss object needing to be freed has to wait
until after network_bss_stop_update() because network has to be able
to iterate its old list and unregister any BSS's that were not seen
in the scan results. This is done by pushing each BSS needing to be
freed into a queue, then destroying them after the BSS's are all
added.
This adds a new DBus object/interface for tracking BSS's for
a given network. Since scanning replaces scan_bss objects some
new APIs were added to avoid tearing down the associated DBus
object for each BSS.
network_bss_start_update() should be called before any new BSS's
are added to the network object. This will keep track of the old
list and create a new network->bss_list where more entries can
be added. This is effectively replacing network_bss_list_clear,
except it keeps the old list around until...
network_bss_stop_update() is called when all BSS's have been
added to the network object. This will then iterate the old list
and lookup if any BSS DBus objects need to be destroyed. Once
completed the old list is destroyed.
iwd supports FILS only on softmac drivers. Ensure the capability check
is consistent between wiphy and netdev, both the softmac and the
relevant EXT_FEATURE bit must be checked.
CMD_EXTERNAL_AUTH could potentially be used for FILS for FullMAC cards,
but no hardware supporting this has been identified yet.
Somehow this ability was lost in the refactoring. OWE was intended to
be used on fullmac cards, but the state machine is only actually created
if the connection type ends up being softmac.
Fixes: 8b6ad5d3b9ec ("owe: netdev: refactor to remove OWE as an auth-proto")
Certain flags (for example, NLA_F_NESTED) are ORed with the netlink
attribute type identifier prior to being sent on the wire. Such flags
need to be masked off and not taken into consideration when attribute
type is being compared against known values.
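A short sketch of the masking described above. The constants mirror the values of NLA_F_NESTED, NLA_F_NET_BYTEORDER and NLA_TYPE_MASK from the kernel's linux/netlink.h but are redefined under local names so the example stands alone; attr_type() is a hypothetical helper.

```c
#include <stdint.h>

#define ATTR_F_NESTED		(1u << 15)
#define ATTR_F_NET_BYTEORDER	(1u << 14)
#define ATTR_TYPE_MASK		\
	(~(ATTR_F_NESTED | ATTR_F_NET_BYTEORDER) & 0xffffu)

/* Strip the flag bits so only the attribute type identifier remains */
static uint16_t attr_type(uint16_t nla_type)
{
	return (uint16_t) (nla_type & ATTR_TYPE_MASK);
}
```

Without the mask, an attribute sent as (type | NLA_F_NESTED) would never compare equal to the plain type value.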
The workaround for Cisco APs reporting an operating class of zero
is still a bug that remains in Cisco equipment. This is made even
worse with the introduction of 6GHz where the channel numbers
overlap with both 2.4 and 5GHz bands. This makes it impossible to
definitively choose a frequency given only a channel number.
To improve this workaround and cover the 6GHz band we can calculate
a frequency for each band and see what is successful. Then append
each frequency we get to the list. This will result in more
frequencies scanned, but this tradeoff is better than potentially
avoiding a roam to 6GHz or high order 5GHz channel numbers.
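The per-band calculation might be sketched as below. This is a simplified stand-in and not the actual lookup: it uses the standard channel to center frequency formulas with coarse range checks instead of operating class tables, it ignores the special 6GHz channel 2, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Given only a channel number, collect every frequency (in MHz) the
 * channel could map to. At most two bands can match, so an array of
 * three entries is always large enough.
 */
static size_t candidate_freqs(unsigned int channel, uint32_t out[3])
{
	size_t n = 0;

	/* 2.4GHz: channels 1-13 are spaced 5 MHz apart, 14 is special */
	if (channel >= 1 && channel <= 13)
		out[n++] = 2407 + 5 * channel;
	else if (channel == 14)
		out[n++] = 2484;

	/* 5GHz (coarse range check only) */
	if (channel >= 36 && channel <= 177)
		out[n++] = 5000 + 5 * channel;

	/* 6GHz: 20 MHz primary channels are 1, 5, 9, ... 233 */
	if (channel >= 1 && channel <= 233 && channel % 4 == 1)
		out[n++] = 5950 + 5 * channel;

	return n;
}
```

Channel 149, for instance, yields both 5745 (5GHz) and 6695 (6GHz), so both would be appended to the scan list.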
[denkenz@archdev ~]$ qemu-system-x86_64 --version
QEMU emulator version 9.0.1
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
QEMU now seems to complain that 'no-hpet' and 'no-acpi' command line
arguments are unrecognized.
l_genl class has nice ways of discovering and requesting families. The
genl functionality has been added after the iwmon skeleton was created,
but it is now time to migrate to using these APIs.
This attribute is actually an array of signed 32 bit integers and it
was being treated as a single integer. This would work until more
than one threshold was set, then it would fail to parse it.
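Parsing such an attribute as an array rather than a single integer could look like the following sketch. The function name and the error convention are assumptions; the payload is taken to be in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy an attribute payload holding one or more signed 32 bit values
 * into 'out'. Returns the number of values, or -1 if the length is
 * not a non-zero multiple of 4 or does not fit in 'out'.
 */
static int parse_s32_array(const void *data, size_t len,
				int32_t *out, size_t max)
{
	size_t count = len / sizeof(int32_t);

	if (len == 0 || len % sizeof(int32_t) != 0 || count > max)
		return -1;

	/* memcpy avoids unaligned reads from the attribute payload */
	memcpy(out, data, len);

	return (int) count;
}
```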
After the station state changes in DPP setting autoconnect=True was
causing DPP to stop prior to being able to scan for the network.
Instead we can start autoconnect earlier so we aren't toggling the
property while DPP is running.
Prior to now the DPP state was required to be disconnected before
DPP would start. This is inconvenient for the user since it requires
extra state checking and/or DBus method calls. Instead model this
case like WSC and issue a disconnect to station if DPP is requested
to start.
The other conditions on stopping DPP are also preserved and no
changes to the configurator role have been made, i.e. being
disconnected while configuring still stops DPP. Similarly any
connection made during enrolling will stop DPP.
It should also be noted that station's autoconnect setting is also
preserved and set back to its original value upon DPP completing.
Gets the current autoconnect setting. This is not the current
autoconnect state. Will be used in DPP to reset station's autoconnect
setting back to what it was prior to DPP, in case of failure.
In order to slightly rework the DPP state machine to handle
automatically disconnecting (for enrollees) functions need to be
created that isolate everything needed to start DPP/PKEX in case
a disconnect needs to be done first.
If valgrind didn't report any issues, don't dump the logs. This
makes the test run a lot easier to look at without having to scroll
through pages of valgrind logs that provide no value.
When the survey code was added it neglected to add the same
cancelation logic that existed for the GET_SCAN call, i.e. if
a scan is canceled while a GET_SURVEY to the kernel is pending,
that command needs to be canceled and the request cleaned up.
Fixes: 35808debae ("scan: use GET_SURVEY for SNR calculation in ranking")
IPv6 address entry was not updated to use display_table_row which led to
a shifted line in table, as shown below:
$ iwctl station wlan0 show | head | sed 's| |.|g'
.................................Station:.wlan0................................
--------------------------------------------------------------------------------
..Settable..Property..............Value..........................................
--------------------------------------------------------------------------------
............Scanning..............no...............................................
............State.................connected........................................
............Connected.network.....Clannad.Legacy...................................
............IPv4.address..........192.168.1.12.....................................
............IPv6.address........fdc3:541d:864f:0:96db:c9ff:fe36:b15............
............ConnectedBss..........cc:d8:43:77:91:0e................................
This patch aligns IPv6 address line with other lines in the table.
Fixes: 35dd2c08219a (client: update station to use display_table_row, 2022-07-07)
testEncryptedProfiles:
- This would occasionally fail because the test is expecting
to explicitly connect but after the first failed connection
autoconnect takes over and it's a race to connect.
testPSK-roam:
- Several rules were not being cleaned up which could cause
tests afterwards to fail
- The AP roam test started failing randomly because of the SNR
ranking changes. It appears that with hwsim _sometimes_ the
SNR is able to be determined which can affect the ranking. This
test assumed the two BSS's would be the same ranking but the
SNR sometimes causes this to not be true.
The wait_for_event() function allows past events to cause this
function to return immediately. This behavior is known, and
relied on for some tests. But in some cases you want to only
handle _new_ events, so we need a way to clear out prior events.
This event is not used anywhere and can be leveraged in autotesting.
Move the event to eapol_start() so it gets called unconditionally
when the 4-way handshake is started.
If a disconnect arrives at any point during the 4-way handshake or
key setting this would result in netdev sending a disconnect event
to station. If this is a reassociation this case is unhandled in
station and causes a hang as it expects any connection failure to
be handled via the reassociation callback, not a random disconnect
event.
To handle this case we can utilize netdev_disconnected() along with
the new NETDEV_RESULT_DISCONNECTED result to ensure the connect
callback gets called if it exists (indicating a pending connection).
Below are logs showing the "Unexpected disconnect event" which
prevents IWD from cleaning up its state and ultimately results in a
hang:
Jul 16 18:16:13: src/station.c:station_transition_reassociate()
Jul 16 18:16:13: event: state, old: connected, new: roaming
Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_done() Work item 65 done
Jul 16 18:16:13: src/wiphy.c:wiphy_radio_work_next() Starting work item 66
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Jul 16 18:16:13: src/netdev.c:netdev_deauthenticate_event()
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
Jul 16 18:16:13: src/station.c:station_netdev_event() Associating
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
Jul 16 18:16:13: src/netdev.c:netdev_authenticate_event()
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
Jul 16 18:16:13: src/netdev.c:netdev_associate_event()
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
Jul 16 18:16:13: src/netdev.c:netdev_connect_event()
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() aborting and ignore_connect_event not set, proceed
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() expect_connect_failure not set, proceed
Jul 16 18:16:13: src/netdev.c:parse_request_ies()
Jul 16 18:16:13: src/netdev.c:netdev_connect_event() Request / Response IEs parsed
Jul 16 18:16:13: src/netdev.c:netdev_get_oci()
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:13: src/netdev.c:netdev_get_oci_cb() Obtained OCI: freq: 5220, width: 3, center1: 5210, center2: 0
Jul 16 18:16:13: src/eapol.c:eapol_start()
Jul 16 18:16:13: src/netdev.c:netdev_unicast_notify() Unicast notification Control Port Frame(129)
Jul 16 18:16:13: src/netdev.c:netdev_control_port_frame_event()
Jul 16 18:16:13: src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=6
Jul 16 18:16:13: src/netdev.c:netdev_mlme_notify() MLME notification Control Port TX Status(139)
Jul 16 18:16:14: src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Jul 16 18:16:14: src/netdev.c:netdev_cqm_event() Signal change event (above=1 signal=-60)
Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Jul 16 18:16:17: src/netdev.c:netdev_deauthenticate_event()
Jul 16 18:16:17: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Jul 16 18:16:17: src/netdev.c:netdev_disconnect_event()
Jul 16 18:16:17: Received Deauthentication event, reason: 15, from_ap: true
Jul 16 18:16:17: src/wiphy.c:wiphy_radio_work_done() Work item 66 done
Jul 16 18:16:17: src/station.c:station_disconnect_event() 6
Jul 16 18:16:17: Unexpected disconnect event
Jul 16 18:16:17: src/netdev.c:netdev_link_notify() event 16 on ifindex 6
Jul 16 18:16:17: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
Jul 16 18:16:17: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
After adding the NETDEV_RESULT_DISCONNECTED enum, handshake failures
initiated by the AP come in via this result so the existing logic
to call network_connect_failed() was broken. We could still get a
handshake failure generated internally, so that has been preserved
(via NETDEV_RESULT_HANDSHAKE_FAILED) but a check for a 4-way
handshake timeout reason code was also added.
This new event is sent during a connection if netdev receives a
disconnect event. This patch cleans up station to handle this
case and leave the existing NETDEV_EVENT_DISCONNECTED_BY_{AP,SME}
handling only for CONNECTED, NETCONFIG, and FW_ROAMING states.
This new result is meant to handle cases where a disconnect
event (deauth/disassoc) was received during an ongoing connection.
Whether that's during authentication, association, the 4-way
handshake, or key setting.
src/p2putil.c: In function 'p2p_get_random_string':
src/p2putil.c:2641:37: error: initializer element is not constant
 2641 |         static const int set_size = strlen(CHARSET);
      |                                     ^~~~~~
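The error above comes from using a function call in a static initializer. The fix actually applied is not shown here; one conventional way to get a compile-time length of a string literal is sizeof, as in this sketch. The CHARSET contents and charset_size() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical contents, only the technique matters */
#define CHARSET "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

static size_t charset_size(void)
{
	/*
	 * sizeof on a string literal is a constant expression and
	 * includes the terminating NUL, hence the - 1. strlen() is a
	 * function call and is not allowed in a static initializer.
	 */
	static const size_t set_size = sizeof(CHARSET) - 1;

	return set_size;
}
```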
This code path is not exercised in the autotest but commonly does
happen in the real world. There is no associated bug with this, but
it's helpful to have this event triggered in case something gets
introduced in the future.
The authenticating event was not used anymore and the associating
event use was questionable (after the CMD_CONNECT callback).
No other modules actually utilize these events but they are useful
for autotests. Move these events around to map 1:1 when the kernel
sends the auth/assoc events.
There are a few values which are nice to see in debug logs. Namely
the BSS load and SNR. Both of these values may not be available
either due to the AP or local hardware limitations. Rather than print
dummy values for these, refactor the print to append the values only
if they are set in the scan result.
For ranking purposes the utilization was defaulted to a valid value (127)
which would not change the rank if that IE was not found in the
scan results. Historically this was printed (debug) as part of the
scan results but that was removed as it was somewhat confusing. i.e.
did the AP _really_ have a utilization of 127? or was the IE not
found?
Since it is useful to see the BSS load if that is advertised add a
flag to the scan_bss struct to indicate if the IE was present which
can be checked.
This issues a GET_SURVEY dump after scan results are available and
populates the survey information within the scan results. Currently
the only value obtained is the noise for a given frequency but the
survey results structure was created if in the future more values
need to be added.
From the noise, the SNR can be calculated. This is then used in the
ranking calculation to help lower BSS ranks that are on high noise
channels.
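The SNR step can be sketched as a lookup of the survey noise for the BSS frequency followed by a subtraction. The struct layout and function name are hypothetical, and how the result feeds into the rank is not covered here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frequency survey result */
struct survey_entry {
	uint32_t freq;		/* MHz */
	int8_t noise;		/* dBm */
};

/*
 * Compute SNR (dB) as signal minus noise for the given frequency.
 * Returns false if no survey data exists for that frequency.
 */
static bool snr_for_freq(const struct survey_entry *survey, size_t n,
				uint32_t freq, int signal_dbm, int *snr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (survey[i].freq != freq)
			continue;

		*snr = signal_dbm - survey[i].noise;
		return true;
	}

	return false;
}
```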
Parsing the flush flag for external scans was not done correctly
as it was not parsing the ATTR_SCAN_FLAGS but instead the flag
bitmap. Fix this by parsing the flags attribute, then checking if
the bit is set.
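Once the flags attribute has been read as a 32 bit value, the flush check is a simple bit test. The constant below mirrors the value of NL80211_SCAN_FLAG_FLUSH (1 << 1) from linux/nl80211.h under a local name; the helper is a hypothetical illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCAN_FLAG_FLUSH		(1u << 1)

/* 'flags' is the u32 payload of the scan flags attribute */
static bool scan_flags_has_flush(uint32_t flags)
{
	return (flags & SCAN_FLAG_FLUSH) != 0;
}
```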
Add a nested attribute parser. For the first supported attribute
add NL80211_ATTR_SURVEY_INFO.
This allows parsing of nested attributes in the same convenient
way as nl80211_parse_attrs but allows for support of any level of
nested attributes provided that a handler is added for each.
To prep for adding a _nested() variant of this function refactor
this to act on an l_genl_attr object rather than the message itself.
In addition a handler specific to the attribute being parsed is
now passed in, with the current "handler_for_type" being renamed to
"handler_for_nl80211" that corresponds to root level attributes.
This warning is guaranteed to happen for SAE networks where there are
multiple netdev_authenticate_events. This should just be a check so
we don't register eapol twice, not a warning.
EAP-TTLS Start packets are empty by default, but can still be sent with
the L flag set. When attempting to reassemble a message we should not
fail if the length of the message is 0, and just treat it as any other
unfragmented message with the L flag set.
This test uses the same country/country3 values seen by an AP vendor
which causes issues with IWD. The alpha2 is ES (Spain) and the 3rd
byte is 4, indicating to use Table E-4. The issue then comes when the
neighbor report claims the BSS is under operating class 3 which is
not part of E-4.
With the fallback implemented, this test will pass since it will
try to look up only on ES (the EU table), which operating class 3 is
part of.
It's been seen that some vendors incorrectly set the 3rd byte of the
country code which causes the band lookup to fail with the provided
operating class. This isn't compliant with the spec, but it's been
seen out in the wild and it causes IWD to behave poorly, specifically
with roaming since it cannot parse neighbor reports. This then
requires IWD to do a full scan on each roam.
Instead of a hard rejection, IWD can instead attempt to determine
the band by ignoring that 3rd byte and only use the alpha2 string.
This makes IWD slightly less strict but with the advantage of not being
crippled when exposed to poor AP configurations.
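The fallback can be illustrated with a toy lookup. The tables below are tiny excerpts chosen for the example (global classes 81 and 115, European classes 1-4), only "ES" is mapped to the European table, and all names are hypothetical. It is a sketch of the idea, not the real band lookup.

```c
#include <stdint.h>

enum band { BAND_NONE = 0, BAND_2_4GHZ, BAND_5GHZ };

/* Excerpt of the global operating class table */
static enum band lookup_global(uint8_t oper_class)
{
	switch (oper_class) {
	case 81:
		return BAND_2_4GHZ;
	case 115:
		return BAND_5GHZ;
	default:
		return BAND_NONE;
	}
}

/* Excerpt of the European operating class table */
static enum band lookup_eu(uint8_t oper_class)
{
	switch (oper_class) {
	case 1:
	case 2:
	case 3:
		return BAND_5GHZ;
	case 4:
		return BAND_2_4GHZ;
	default:
		return BAND_NONE;
	}
}

static enum band band_from_country(const uint8_t country[3],
					uint8_t oper_class)
{
	enum band band = BAND_NONE;

	/* A third byte of 4 selects the global table */
	if (country[2] == 4)
		band = lookup_global(oper_class);

	if (band != BAND_NONE)
		return band;

	/* Fallback: ignore the third byte and use only the alpha2 */
	if (country[0] == 'E' && country[1] == 'S')
		return lookup_eu(oper_class);

	return BAND_NONE;
}
```

With country "ES" plus a third byte of 4 and operating class 3, the global lookup fails but the alpha2-only fallback still resolves the band.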
This was added to support a single buggy AP model that failed to
negotiate the SAE group correctly. This may still be a problem but
since then the [Network].UseDefaultEccGroup option has been added
which accomplishes the same thing.
Remove the special handling for this specific OUI and rely on the
user setting the new option if they have problems.
Both ell/shared and ell/internal targets first create the ell/
directory within IWD. This apparently was just luck that one of
these always finished first in parallel builds. On my system at
least when building using dpkg-buildpackage IWD fails to build
due to the ell/ directory missing. From the logs it appears that
both the shared/internal targets were started but didn't complete
(or at least create the directory) before the ell/ell.h target:
make[1]: Entering directory '/home/jprestwood/tmp/iwd'
/usr/bin/mkdir -p ell
/usr/bin/mkdir -p ell
echo -n > ell/ell.h
/usr/bin/mkdir -p src
/bin/bash: line 1: ell/ell.h: No such file or directory
make[1]: *** [Makefile:4028: ell/ell.h] Error 1
Creating the ell/ directory within the ell/ell.h target solves
the issue. For reference this is the configure command dpkg
is using:
./configure --build=x86_64-linux-gnu \
--prefix=/usr \
--includedir=/usr/include \
--mandir=/usr/share/man \
--infodir=/usr/share/info \
--sysconfdir=/etc \
--localstatedir=/var \
--disable-option-checking \
--disable-silent-rules \
--libdir=/usr/lib/x86_64-linux-gnu \
--runstatedir=/run \
--disable-maintainer-mode \
--disable-dependency-tracking \
--enable-tools \
--enable-dbus-policy
Experimental AP-mode support for receiving a Confirm frame when in the
COMMITTED state. The AP will reply with a Confirm frame.
Note that when acting as an AP, on reception of a Commit frame, the AP
only replies with a Commit frame. The protocol also allows sending
the Confirm frame at the same time, but older clients may not support
simultaneously receiving a Commit and Confirm frame.
Don't mark either client as being the authenticator. In the current unit
tests, both instances act as clients to test functionality. This ensures
the unit does not show an error during the following commits where SAE
for AP mode is added.
This was overlooked in a prior patch and causes warnings to be
printed when the RSSI is too low to estimate an HE data rate or
due to incompatible local capabilities (e.g. MCS support).
Similar to the other estimations, return -ENETUNREACH if the IE
was valid but incompatible.
If the RSSI is too low or the local capabilities were not
compatible to estimate the rate don't warn but instead treat
this the same as -ENOTSUP and drop down to the next capability
set.
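The fall-through behavior described in the two changes above might be sketched like this. The estimator functions are fakes standing in for the HE, VHT and HT paths, and the rate value, names and warning counter are assumptions for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*rate_estimator)(int rssi, uint64_t *out_rate);

/* IE valid, but RSSI or local capabilities are incompatible */
static int fake_he(int rssi, uint64_t *out_rate)
{
	(void) rssi;
	(void) out_rate;
	return -ENETUNREACH;
}

/* IE not present at all */
static int fake_vht(int rssi, uint64_t *out_rate)
{
	(void) rssi;
	(void) out_rate;
	return -ENOTSUP;
}

static int fake_ht(int rssi, uint64_t *out_rate)
{
	(void) rssi;
	*out_rate = 65000000;	/* example value only */
	return 0;
}

/* IE present but malformed */
static int fake_bad(int rssi, uint64_t *out_rate)
{
	(void) rssi;
	(void) out_rate;
	return -EBADMSG;
}

/*
 * Try each capability set in order. -ENOTSUP and -ENETUNREACH drop
 * down to the next set silently, any other error is counted as a
 * warning before moving on.
 */
static int estimate_rate(const rate_estimator *list, size_t n, int rssi,
				uint64_t *out_rate, unsigned int *n_warnings)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int r = list[i](rssi, out_rate);

		if (r == 0)
			return 0;

		if (r != -ENOTSUP && r != -ENETUNREACH)
			(*n_warnings)++;
	}

	return -ENOTSUP;
}
```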
If we register the main EAPOL frame listener as late as the associate
event, it may not observe ptk_1_of_4. This defeats handling for early
messages in eapol_rx_packet, which only sees messages once it has been
registered.
If we move registration to the authenticate event, then the EAPOL
frame listeners should observe all messages, without any possible
races. Note that the messages are not actually processed until
eapol_start() is called, and we haven't moved that call site. All
that's changing here is how early EAPOL messages can be observed.
netdev_disconnect() was unconditionally sending CMD_DISCONNECT which
is not the right behavior when IWD has not associated. This means
that if a connection was started then immediately canceled with
the Disconnect() method the kernel would continue to authenticate.
Instead if IWD has not yet associated it should send a deauth
command which causes the kernel to correctly cleanup its state and
stop trying to authenticate.
Below are logs showing the behavior. Autoconnect is started followed
immediately by a DBus Disconnect call, yet the kernel continues
sending authenticate events.
event: state, old: autoconnect_quick, new: connecting (auto)
src/scan.c:scan_cancel() Trying to cancel scan id 1 for wdev 7d
src/wiphy.c:wiphy_radio_work_done() Work item 1 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/station.c:station_dbus_disconnect()
src/station.c:station_reset_connection_state() 85
src/station.c:station_roam_state_clear() 85
event: state, old: connecting (auto), new: disconnecting
src/wiphy.c:wiphy_radio_work_done() Work item 2 done
src/station.c:station_connect_cb() 85, result: 5
src/station.c:station_disconnect_cb() 85, success: 1
event: state, old: disconnecting, new: disconnected
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
In most cases any failure here is likely just due to the AP not
supporting the feature, whether it's HE/VHT/HT. This should result
in the estimation returning -ENOTSUP in which case we move down
the list. Any other non-zero return we will now warn to make it
clear the IEs did exist, but were not properly formatted.
All length check failures were changed to continue instead of
fail. This will now treat invalid lengths as if the IE did not
exist.
In addition HE specifically has an extra validation function which,
if failed, was bailing out of the estimation function entirely.
Instead this is now treated as if there was no HE capabilities and
the logic can move down to VHT, HT, or basic rates.
This was changed from too large of a mask (0xff) in an earlier
commit but was masking 5 bits instead of 6.
Fixes: 121c2c5653 ("monitor: properly mask HE capabilities bitfield")
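To illustrate the mask fix above: a 6-bit field needs the mask 0x3f,
while 0x1f only keeps 5 bits and silently drops the top bit of the
field. The macro and helper names below are hypothetical and the field
position (low bits of a byte) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define FIELD_MASK_5BIT	0x1f	/* too narrow: loses bit 5 */
#define FIELD_MASK_6BIT	0x3f	/* keeps all six low bits */

/* Extract a 6-bit field stored in the low bits of a byte */
static inline uint8_t low_6_bits(uint8_t byte)
{
	return byte & FIELD_MASK_6BIT;
}
```

With the value 0x2a (binary 101010) the 5-bit mask yields 0x0a, while
the 6-bit mask preserves 0x2a.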
Caught by static analysis, the dev->conn_peer pointer was being
dereferenced very early on without a NULL check, but further down it
was being NULL checked. If there is a possibility of it being NULL
the check should be done much earlier.
If the test needs to do something very specific it may be useful to
prevent IWD from managing all the radios. This can now be done
by setting a "reserve" option in the radio settings. The value of
this should be something other than iwd, hostapd, or wpa_supplicant.
For example:
[rad1]
reserve=false
Caught by static analysis, if ATTR_MAC was not in the message there
would be a memcpy with uninitialized bytes. In addition there is no
reason to memcpy twice. Instead 'mac' can be a const pointer which
both verifies it exists and removes the need for a second memcpy.
Static analysis complains that 'last' could be NULL which is true.
This really could only happen if every frequency was disabled which
likely is impossible but in any case, check before dereferencing
the pointer.
Since these are all stack variables they are not zero initialized.
If parsing fails there may be invalid pointers within the structures
which can get dereferenced by p2p_clear_*
The input queue pointer was being initialized unconditionally so if
parsing fails the out pointer is still set after the queue is
destroyed. This causes a crash during cleanup.
Instead use a temporary pointer while parsing and only after parsing
has finished do we set the out pointer.
Reported-By: Alex Radocea <alex@supernetworks.org>
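The temporary pointer pattern from the fix above can be sketched as
follows. This is a simplified stand-in, not the actual parser: the
item_list type and parse_digits() are hypothetical, and the "parsing"
just accepts ASCII digits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct item_list {
	size_t count;
	int *items;
};

/* Hypothetical parser: every input byte must be an ASCII digit */
static bool parse_digits(const char *in, struct item_list **out)
{
	struct item_list *tmp;
	size_t i, len = strlen(in);

	tmp = calloc(1, sizeof(*tmp));
	if (!tmp)
		return false;

	tmp->items = calloc(len ? len : 1, sizeof(int));
	if (!tmp->items) {
		free(tmp);
		return false;
	}

	for (i = 0; i < len; i++) {
		if (in[i] < '0' || in[i] > '9') {
			/* Failure: destroy the temporary, leave *out alone */
			free(tmp->items);
			free(tmp);
			return false;
		}

		tmp->items[tmp->count++] = in[i] - '0';
	}

	/* Only publish the result once parsing fully succeeded */
	*out = tmp;
	return true;
}
```

Since the out pointer is never written on failure, the caller's cleanup
path cannot see a pointer to an already destroyed object.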
When HUP is received the IO read callback was never completing which
caused it to block indefinitely until waited for. This didn't matter
for most transient processes but for IWD, hostapd, wpa_supplicant
it would cause test-runner to hang if the process crashed.
Detecting a crash is somewhat hacky because we have no process
management like systemd and the return code isn't reliable as some
processes return non-zero under normal circumstances. So to detect
a crash the process output is being checked for the string:
"++++++++ backtrace ++++++++". This isn't 100% reliable obviously
since it's dependent on how the binary is compiled, but even if the
crash itself isn't detected any test should still fail if written
correctly.
Doing this allows auto-tests to handle IWD crashes gracefully by
failing the test, printing the exception (even without debugging),
and continuing with other tests.
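The crash check described above is a plain substring search on the
process output. test-runner itself is not written in C; the sketch
below only illustrates the idea, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BACKTRACE_MARKER "++++++++ backtrace ++++++++"

/* Returns true if the captured output contains the backtrace marker */
static bool output_indicates_crash(const char *output)
{
	return output && strstr(output, BACKTRACE_MARKER) != NULL;
}
```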
The slaac_test was one that would occasionally fail, but very rarely,
due to the resolvconf log values appearing in an unexpected order.
This appears to be related to a typo in netconfig-commit which would
not set netconfig-domains and instead set dns_list. This was fixed
with a pending patch:
https://lore.kernel.org/iwd/20240227204242.1509980-1-denkenz@gmail.com/T/#u
But applying this now leads to testNetconfig failing slaac_test
100% of the time.
I'm not familiar enough with resolvconf to know if this test change
is ok, but based on the test behavior the expected log and disk logs
are the same, just in the incorrect order. I'm not sure if the
log order is deterministic so instead the check now iterates the
expected log and verifies each value appears once in the resolvconf
log.
Here is an example of the expected vs disk logs after running the
test:
Expected:
-a wlan1.dns
nameserver 192.168.1.2
nameserver 3ffe:501:ffff:100::10
nameserver 3ffe:501:ffff:100::50
-a wlan1.domain
search test1
search test2
Resolvconf log:
-a wlan1.domain
search test1
search test2
-a wlan1.dns
nameserver 192.168.1.2
nameserver 3ffe:501:ffff:100::10
nameserver 3ffe:501:ffff:100::50
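The order-independent check from this test change can be sketched like
this. The real test is part of the Python autotests; the C below only
shows the comparison logic, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count how many lines of 'log' are exactly equal to 'line' */
static size_t count_line(const char *log, const char *line)
{
	size_t matches = 0, want = strlen(line);
	const char *p = log;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t cur = eol ? (size_t) (eol - p) : strlen(p);

		if (cur == want && !memcmp(p, line, want))
			matches++;

		p += cur;
		if (*p == '\n')
			p++;
	}

	return matches;
}

/* True if every expected line appears exactly once, in any order */
static bool log_has_each_once(const char *log,
				const char *const *expected, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (count_line(log, expected[i]) != 1)
			return false;

	return true;
}
```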
Static analysis complains that authenticator is used uninitialized.
This isn't strictly true as a memory region is reserved for the
authenticator using the contents of the passed in structure. This
region is then overwritten once the authenticator is actually computed
by authenticator_put(). Silence this warning by explicitly setting
authenticator bytes to 0.
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
This shouldn't be possible in theory since the roam_bss_list being
iterated is a subset of the entire scan_bss list station/network has,
but to be safe, and to catch any issues due to future changes, warn on
this condition.
For some encrypt operations DPP passes no AD iovecs (both are
NULL/0). But since the iovec itself is on the stack 'ad' is a
valid pointer from within aes_siv_encrypt. This causes memcpy
to be called which coverity complains about. Since the copy
length is zero it was effectively a no-op, but check num_ad to
prevent the call.
Tests that the 3 possible options for UseDefaultEccGroup behave as
expected:
- When not provided, use the "auto" behavior
- When false, always use higher order groups
- When true, always use the default group
The SAE test made some assumptions on certain conditions due to
there being no way of checking if those conditions were met,
mainly the use of H2E/hunt-and-peck.
We assumed that when we told hostapd to use H2E or hunt/peck it
would but in reality it was not. Hostapd is apparently not very
good at swapping between the two with a simple "reload" command.
Once H2E is enabled it appears that it cannot be undone.
Similarly the vendor elements seem to carry over from test to
test, and sometimes not which causes unintended behavior.
To fix this create separate APs for the specific scenario being
tested:
- Hunt and peck
- H2E
- Special vendor_element simulating buggy APs
Another issue found was that if password identifiers are used
hostapd automatically chooses H2E which was not intended, at
least based on the test names (in reality it wasn't causing any
problems).
The tests have also been improved to use hostapd's "sta_status"
command which contains the group number used when authenticating,
so now that at least can be verified.
In order to complete the learned default group behavior station needs
to be aware of when an SAE/OWE connection was retried. This is all
handled within netdev/sae so add a new netdev event so station can
set the appropriate network flags to prevent trying the non-default
group again.
If either the settings specify it, or the scan_bss is flagged, set
the use_default_ecc_group flag in the handshake.
This also renames the flag to cover both OWE and SAE.
There is special handling for buggy OWE APs which set a network flag
to use the default OWE group. Utilize the more persistent setting
within known-networks as well as the network object (in case there
is no profile).
This also renames the get/set APIs to be generic to ECC groups rather
than only OWE.
This adds the option [Settings].UseDefaultEccGroup which allows a
network profile to specify the behavior when using an ECC-based
protocol. If unset (default) IWD will learn the behavior of the
network for the lifetime of its process.
Many APs do not support group 20 which IWD tries first by default.
This leads to an initial failure followed by a retry using group 19.
This option will allow the user to configure IWD to use group 19
first, or to learn the network capabilities. When learning, if the
authentication fails with group 20, IWD will always use group 19 for
the process lifetime.
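The three behaviors of UseDefaultEccGroup can be summarized with the
sketch below. The enum, the function names and the "learned" flag are
hypothetical; only the group numbers (19 as the default group, 20 as
the group tried first otherwise) come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

enum ecc_group_setting {
	ECC_GROUP_SETTING_UNSET,	/* option absent: learn ("auto") */
	ECC_GROUP_SETTING_FALSE,	/* always try higher order groups */
	ECC_GROUP_SETTING_TRUE,		/* always use the default group */
};

/*
 * 'learned_default' stands for the per-process flag that gets set once
 * an attempt with a higher order group has failed on this network.
 */
static bool use_default_ecc_group(enum ecc_group_setting setting,
					bool learned_default)
{
	switch (setting) {
	case ECC_GROUP_SETTING_TRUE:
		return true;
	case ECC_GROUP_SETTING_FALSE:
		return false;
	case ECC_GROUP_SETTING_UNSET:
	default:
		return learned_default;
	}
}

/* First group to attempt for the connection */
static unsigned int first_ecc_group(enum ecc_group_setting setting,
					bool learned_default)
{
	return use_default_ecc_group(setting, learned_default) ? 19 : 20;
}
```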
The information specific to auth/assoc/connect timeouts isn't
communicated to station so emit the notice events within netdev.
We could communicate this to station by adding separate netdev
events, but this does not seem worth it for this use case as
these notice events aren't strictly limited to station.
For anyone debugging or trying to identify network infrastructure
problems the IWD DBus API isn't all that useful and ultimately
requires going through debug logs to figure out exactly what
happened. Having a concise set of debug logs containing only
relevant information would be very useful. In addition, having
some kind of syntax for these logs to be parsed by tooling could
automate these tasks.
This is being done, starting with station, by using iwd_notice
which internally uses l_notice. The use of the notice log level
(5) in IWD will be strictly for the type of messages described
above.
iwd_notice is being added so modules can communicate internal
state or event information via the NOTICE log level. This log
level will be reserved in IWD for only these types of messages.
The iwd_notice macro aims to help enforce some formatting
requirements for these types of log messages. The messages
should be one or more comma-separated "key: value" pairs starting
with "event: <name>" and followed by any additional info that
pertains to that event.
iwd_notice only enforces the initial event key/value format and
additional arguments are left to the caller to be formatted
correctly.
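A minimal sketch of the message format described above, assuming a
plain helper function rather than the real iwd_notice macro (whose
definition is not shown here). format_notice() is a hypothetical name;
it enforces the leading "event: <name>" pair and leaves the remaining
pairs to the caller's format string.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes "event: <name>" followed by the caller formatted key/value
 * pairs. Returns the length that would have been written, or a
 * negative value on error, like snprintf().
 */
static int format_notice(char *buf, size_t len, const char *event,
				const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(buf, len, fmt ? "event: %s, " : "event: %s", event);
	if (n < 0 || (size_t) n >= len || !fmt)
		return n;

	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - n, fmt, ap);
	va_end(ap);

	return m < 0 ? m : n + m;
}
```

For example, format_notice(buf, sizeof(buf), "state", "old: %s, new: %s",
"connecting", "connected") produces
"event: state, old: connecting, new: connected".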
The --logger,-l flag can now be used to specify the logger type.
Unset (default) will set log output to stderr as it is today. The
other valid options are "syslog" and "journal".
basename use is considered harmful. There are two versions of
basename (see man 3 basename for details). The more intuitive version,
which is currently being used inside wiphy.c, is not supported by musl
libc implementation. Use of the libgen version is not preferred, so
drop use of basename entirely. Since wiphy.c is the only call site of
basename() inside iwd, open code the required logic.
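Open coding the logic amounts to taking everything after the last '/'.
The sketch below shows one way to do that; the function name is
hypothetical and this is not necessarily the exact code used in
wiphy.c.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the component after the last '/', or the whole string if
 * there is no '/'. The input is never modified.
 */
static const char *path_last_component(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}
```

Note that a path with a trailing slash yields an empty string here.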
ELL now has a setting to limit the number of DHCP attempts. This
will now be set in IWD and if reached will result in a failure
event, and in turn a disconnect.
IWD will set a maximum of 4 retries which should keep the maximum
DHCP time to ~60 seconds roughly.
The known frequency list is now a sorted list and the roam scan
results were not complying with this new requirement. The fix is
easy though since the iteration order of the scan results does
not matter (the roam candidates are inserted by rank). To fix
the known frequencies order we can simply reverse the scan results
list before iterating it.
When operating as an AP, drop message 4 of the 4-way handshake if the AP
has not yet received message 2. Otherwise an attacker can skip message 2
and immediately send message 4 to bypass authentication (the AP would be
using an all-zero ptk to verify the authenticity of message 4).
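The check boils down to tracking whether message 2 has been processed
before accepting message 4. A simplified sketch, with hypothetical type
and function names that do not correspond to the real eapol state
machine:

```c
#include <assert.h>
#include <stdbool.h>

struct ap_4way_state {
	bool msg2_verified;	/* set once a valid message 2 was handled */
};

enum frame_action {
	FRAME_DROP,
	FRAME_PROCESS,
};

/*
 * Until message 2 has been received no PTK has been derived, so a
 * message 4 arriving at that point must be dropped instead of being
 * verified against an all-zero key.
 */
static enum frame_action ap_on_msg_4_of_4(const struct ap_4way_state *s)
{
	return s->msg2_verified ? FRAME_PROCESS : FRAME_DROP;
}
```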
Modify the existing frequency test to check that the ordering
lines up with the ranking of the BSS.
Add a test to check that quick scans limit the number of known
frequencies.
In very large network deployments there could be a vast amount of APs
which could create a large known frequency list after some time once
all the APs are seen in scan results. This then increases the quick
scan time significantly, in the very worst case (but unlikely) just
as long as a full scan.
To help with this support in knownnetworks was added to limit the
number of frequencies per network. Station will now only get 5
recent frequencies per network making the maximum frequencies 25
in the worst case (~2.5s scan).
The magic values are now defines, and the recent roam frequencies
limit was also changed to use this define.
In order to support an ordered list of known frequencies the list
should be in order of last seen BSS frequencies with the highest
ranked ones first. To accomplish this without adding a lot of
complexity the frequencies can be pushed into the list as long as
they are pushed in reverse rank order (lowest rank first, highest
last). This ensures that very high ranked BSS's will always get
superseded by subsequent scans if not seen.
This adds a new network API to update the known frequency list
based on the current network->bss_list. This assumes that station
always wipes the BSS list on scans and populates with only fresh
BSS entries. After the scan this API can be called and it will
reverse the list, then add each frequency.
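The reverse push can be sketched with a small array-backed list. This
is an illustration only: the freq_list type, its fixed capacity and
the function names are assumptions, and the input array stands in for
the BSS list sorted from highest to lowest rank.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FREQS 16	/* arbitrary capacity for this sketch */

struct freq_list {
	uint32_t freqs[MAX_FREQS];
	size_t count;
};

/* Move 'freq' to the head, adding it if it was not yet known */
static void freq_list_push_front(struct freq_list *l, uint32_t freq)
{
	size_t i, pos = l->count;

	for (i = 0; i < l->count; i++) {
		if (l->freqs[i] == freq) {
			pos = i;
			break;
		}
	}

	if (pos == l->count) {
		if (l->count < MAX_FREQS)
			l->count++;

		/* New last slot, or the oldest entry when full */
		pos = l->count - 1;
	}

	memmove(&l->freqs[1], &l->freqs[0], pos * sizeof(l->freqs[0]));
	l->freqs[0] = freq;
}

/*
 * 'ranked' holds the scan result frequencies, highest rank first.
 * Pushing lowest rank first leaves the highest rank at the head.
 */
static void freq_list_update(struct freq_list *l, const uint32_t *ranked,
				size_t n)
{
	while (n--)
		freq_list_push_front(l, ranked[n]);
}
```

Starting from a known list of 5180, 2412 and a scan that ranks 5745
above 2412, the result is 5745, 2412, 5180: fresh results lead in rank
order and frequencies not seen in this scan sink towards the end.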
I've had connections to a WPA3-Personal only network fail with no log
message from iwd, and eventually figured out it was because the driver
would've required using CMD_EXTERNAL_AUTH. With the added log messages
the reason becomes obvious.
Additionally the fallback may happen even if the user explicitly
configured WPA3 in NetworkManager, so I believe a warning is
appropriate there.
There was an unhandled corner case if netconfig was running and
multiple roam conditions happened in sequence, all before netconfig
had completed. A single roam before netconfig was already handled
(23f0f5717c) but this did not take into account any additional roam
conditions.
If IWD is in this state, having started netconfig, then roamed, and
again restarted netconfig it is still in a roaming state which will
prevent any further roams. IWD will remain "stuck" on the current
BSS until netconfig completes or gets disconnected.
In addition the general state logic is wrong here. If IWD roams
prior to netconfig it should stay in a connecting state (from the
perspective of DBus).
To fix this a new internal station state was added (no changes to
the DBus API) to distinguish between a purely WiFi connecting state
(STATION_STATE_CONNECTING/AUTO) and netconfig
(STATION_STATE_NETCONFIG). This allows IWD to roam as needed if
netconfig is still running. Also, some special handling was added so
the station state property remains in a "connected" state until
netconfig actually completes, regardless of roams.
For some background this scenario happens if the DHCP server goes
down for an extended period, e.g. if it's being upgraded/serviced.
This fixes a build break on some systems, specifically the
Raspberry Pi 3 (ARM):
monitor/main.c: In function ‘open_packet’:
monitor/main.c:176:3: error: implicit declaration of function ‘close’; did you mean ‘pclose’? [-Werror=implicit-function-declaration]
176 | close(fd);
| ^~~~~
| pclose
This was caused by the unused hostapd instance running after being
re-enabled by mistake. This caused an additional scan result with the
same rank to be seen which would then be connected to by luck of the
draw.
This really needs to be done to many more autotests but since this
one seems to have random failures ensure that all the tests still
run if one fails. In addition add better cleanup for hwsim rules.
This gives the tests a lot more fine-tune control to wait for
specific state transitions rather than only what is exposed over
DBus.
The additional events for "ft-roam" and "reassoc-roam" were removed
since these are now covered by the more generic state change events
("ft-roaming" and "roaming" respectively).
To support multiple nlmon sources, move the logic that reads from iwmon
device into main.c instead of nlmon. nlmon.c now becomes agnostic of
how the packets are actually obtained. Packets are fed in via
high-level APIs such as nlmon_print_rtnl, nlmon_print_genl,
nlmon_print_pae.
The current implementation inside nlmon_receive is asymmetrical. RTNL
packets are printed using nlmon_print_rtnl while GENL packets are
printed using nlmon_message.
nlmon_print_genl and nlmon_print_rtnl already handle iterating over data
containing multiple messages, and are used by nlmon started in reader
mode. Use these for better symmetry inside nlmon_receive.
While here, move store_netlink() call into nlmon_print_rtnl. This makes
handling of PCAP output symmetrical for both RTNL and GENL packets.
This also fixes a possibility where only the first message of a
multi-RTNL packet would be stored.
nlmon_print_genl invokes genl_ctrl when a generic netlink control
message is encountered. genl_ctrl() tries to filter nl80211 family
appearance messages and setup nlmon->id with the extracted family id.
However, the id is already provided inside main.c by using nlmon_open,
and no control messages are processed by nlmon in 'capture' mode (-r
command line argument not passed) since all genl messages go through
nlmon_message() path instead.
configure scripts need to be runnable with a POSIX-compliant /bin/sh.
On many (but not all!) systems, /bin/sh is provided by Bash, so errors
like this aren't spotted. Notably Debian defaults to /bin/sh provided
by dash which doesn't tolerate such bashisms as '+='.
This retains compatibility with bash. Just copy the expanded append like
we do on the line above.
Fixes warnings like:
```
./configure: 13352: CFLAGS+= -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2: not found
```
In order to test that extra settings are applied prior to connecting
two tests were added for hidden networks as well as one testing if
there is already an existing profile after DPP.
The reason hidden networks were used was due to the requirement of
the "Hidden" settings in the profile. If this setting doesn't get
sync'ed to disk the connection will fail.
Before this change DPP was writing the credentials both to disk
and into the network object directly. This allowed the connection
to work fine but additional settings were not picked up due to
network_set_passphrase/psk loading the settings before they were
written.
Instead DPP can avoid setting the credentials to the network
object entirely and just write them to disk. Then, wait for
known networks to notify that the profile was either created
or updated, after which DPP can proceed to connecting. network_autoconnect()
will take care of loading the profile that DPP wrote and remove the
need for DPP to touch the network object at all.
One thing to note is that an idle callback is still needed from
within the known networks callback. This is because a new profile
requires network.c to set the network_info which is done in the
known networks callback. Rather than assume that network.c will be
called into before dpp.c an l_idle was added.
If a known network is modified on disk known networks does not have
any way of notifying other modules. This will be needed to support a
corner case in DPP if a profile exists but is overwritten after DPP
configuration. Add this event to known networks and handle it in
network.c (though nothing needs to be done in that case).
Without the change test-dpp fails on aarch64-linux as:
$ unit/test-dpp
TEST: DPP test responder-only key derivation
TEST: DPP test mutual key derivation
TEST: DPP test PKEX key derivation
test-dpp: unit/test-dpp.c:514: test_pkex_key_derivation: Assertion `!memcmp(tmp, __tmp, 32)' failed.
This happens due to int/size_t type mismatch passed to vararg
parameters to prf_plus():
bool prf_plus(enum l_checksum_type type, const void *key, size_t key_len,
void *out, size_t out_len,
size_t n_extra, ...)
{
// ...
va_start(va, n_extra);
for (i = 0; i < n_extra; i++) {
iov[i + 1].iov_base = va_arg(va, void *);
iov[i + 1].iov_len = va_arg(va, size_t);
// ...
Note that varargs here could only be a sequence of `void *` / `size_t`
values.
But in src/dpp-util.c `iwd` attempted to pass `int` there:
prf_plus(sha, prk, bytes, z_out, bytes, 5,
mac_i, 6, // <- here
mac_r, 6, // <- and here
m_x, bytes,
n_x, bytes,
key, strlen(key));
aarch64 stores only 32-bit value part of the register:
mov w7, #0x6
str w7, [sp, #...]
and loads full 64-bit form of the register:
ldr x3, [x3]
As a result higher bits of `iov[].iov_len` contain unexpected values and
sendmsg sends a lot more data than expected to the kernel.
The change fixes test-dpp test for me.
While at it, fixed an obvious `int` / `size_t` mismatch in src/erp.c.
Fixes: 6320d6db0f ("crypto: remove label from prf_plus, instead use va_args")
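The varargs rule behind the prf_plus() fix above can be shown in
isolation: the callee reads each length with va_arg(va, size_t), so the
caller has to pass a size_t, for example by casting integer literals.
total_len() below is a hypothetical stand-in for prf_plus() that only
sums the lengths.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Consumes n_pairs of (const void *, size_t) arguments */
static size_t total_len(size_t n_pairs, ...)
{
	va_list va;
	size_t i, total = 0;

	va_start(va, n_pairs);

	for (i = 0; i < n_pairs; i++) {
		(void) va_arg(va, const void *);
		total += va_arg(va, size_t);
	}

	va_end(va);

	return total;
}
```

A call such as total_len(2, mac_i, (size_t) 6, mac_r, sizeof(mac_r)) is
well defined. Passing a bare 6 is not, since default argument
promotions leave it as an int while the callee reads a size_t.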
The path argument was used purely for debugging. It is just as
informative to print only the SSID of the profile that failed to
parse the setting, without requiring callers to allocate a string to
call the function.
Certain tests may require external processes to work
(e.g. testNetconfig) and if missing the test will just hang until
the maximum test timeout. Check in start_process if the exe
actually exists and if not throw an exception.
In order to support identifiers the test profiles needed to be
reworked due to hostapd allowing multiple password entries. You
cannot just call set_value() with a new entry as the old ones
still exist. Instead use a unique password for the identifier and
non-identifier use cases.
After adding this test the failure_test started failing due to
hostapd not starting up. This was due to the group being unsupported
but oddly only when hostapd was reloaded (running the test
individually worked). To fix this the group number was changed to 21
which hostapd does support but IWD does not.
Adds a new network profile setting [Security].PasswordIdentifier.
When set (and the BSS enables SAE password identifiers) the network
and handshake object will read this and use it for the SAE
exchange.
Building the handshake will fail if:
- there is no password identifier set and the BSS sets the
"exclusive" bit.
- there is a password identifier set and the BSS does not set
the "in-use" bit.
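The two failure conditions listed above can be expressed as a small
predicate. This is a sketch with hypothetical names; the boolean
parameters stand for the profile setting and the two capability bits
advertised by the BSS.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * have_id:       the profile sets [Security].PasswordIdentifier
 * bss_in_use:    the BSS sets the "in-use" bit
 * bss_exclusive: the BSS sets the "exclusive" bit
 */
static bool sae_password_id_allowed(bool have_id, bool bss_in_use,
					bool bss_exclusive)
{
	/* The BSS only accepts identifiers but none is configured */
	if (!have_id && bss_exclusive)
		return false;

	/* An identifier is configured but the BSS does not use them */
	if (have_id && !bss_in_use)
		return false;

	return true;
}
```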
Using this will provide netdev with a connect callback and unify the
roaming result notification between FT and reassociation. Both paths
will now end up in station_reassociate_cb.
This also adds another return case for ft_handshake_setup which was
previously ignored by ft_associate. It's likely impossible to actually
happen but should be handled nevertheless.
Fixes: 30c6a10f28 ("netdev: Separate connect_failed and disconnected paths")
Essentially exposes (and renames) netdev_ft_tx_associate in order to
be called similarly to netdev_reassociate/netdev_connect where a
connect callback can be provided. This will fix the current bug where
if association times out during FT IWD will hang and never transition
to disconnected.
This also removes the calling of the FT_ROAMED event and instead just
calls the connect callback (since it's now set). This unifies the
callback path for reassociation and FT roaming.
This will be called from station after FT-authentication has
finished. It sets up the handshake object to perform reassociation.
This is essentially a copy-paste of ft_associate without sending
the actual frame.
In general only the authenticator FTE is used/validated but with
some FT refactoring coming there needs to be a way to build the
supplicant's FTE into the handshake object. Because of this there
needs to be separate FTE buffers for both the authenticator and
supplicant.
The default() method was added for convenience but was extending the
test times significantly when the hostapd config was lengthy. This
was because it called set_value for every value regardless if it
had changed. Instead store the current configuration and in default()
only reset values that differ.
This test ensures IWD disconnects after receiving an association
timeout event. This exposes a current bug where IWD does not
transition to disconnected after an association timeout when
FT-roaming.
If tests end in an unknown state it is sometimes required that IWD
be stopped manually in order for future tests to run. Add a stop()
method so test tearDown() methods can explicitly stop IWD.
The path for IWD to call this doesn't ever happen in autotests
but during debugging of the DPP agent it was noticed that the
DBus signature was incorrect and would always result in an error
when calling from IWD.
For adding SAE password identifiers the capability bits need to be
verified when loading the identifier from the profile. Pass the
BSS object in to network_load_psk rather than the 'need_passphrase'
boolean.
iov_ie_append assumed that a single IE was being added and thus the
length of the IE could be extracted directly from the element. However,
iov_ie_append was used on buffers which could contain multiple IEs
concatenated together, for example in handshake_state::vendor_ies. Most
of the time this was safe since vendor_ies was NULL or contained a
single element, but would result in incorrect behavior in the general
case. Fix that by changing iov_ie_append signature to take an explicit
length argument and have the caller specify whether the element is a
single IE or multiple.
Fixes: 7e9971661bcb ("netdev: Append any vendor IEs from the handshake")
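To make the single versus multiple IE distinction in the iov_ie_append
fix concrete: one IE occupies its length octet plus the two header
bytes, whereas a buffer of concatenated IEs has to be walked to find
its total length. The helpers below are hypothetical and are not the
new iov_ie_append signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of exactly one IE: tag byte + length byte + payload */
static size_t ie_single_len(const uint8_t *ie)
{
	return (size_t) ie[1] + 2;
}

/*
 * Total length of a run of concatenated IEs, or 0 if the buffer ends
 * in the middle of an element.
 */
static size_t ie_sequence_len(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	while (off < len) {
		if (len - off < 2 || len - off < (size_t) buf[off + 1] + 2)
			return 0;

		off += (size_t) buf[off + 1] + 2;
	}

	return off;
}
```

For a buffer holding two vendor IEs, ie_single_len() only covers the
first element, which is why the caller now has to pass the length
explicitly.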
Use an _auto_ variable to cleanup IEs allocated by
p2p_build_association_req(). While here, take out unneeded L_WARN_ON
since p2p_build_association_req cannot fail.
If the FT-Authenticate frame has been sent and then a deauth is received,
the work item for sending the FT-Associate frame is never canceled.
When this runs station->connected_network is NULL which causes a
crash:
src/station.c:station_try_next_transition() 7, target xx:xx:xx:xx:xx:xx
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5843
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5844
src/wiphy.c:wiphy_radio_work_done() Work item 5842 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5843
src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
src/ft.c:ft_send_authenticate()
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/netdev.c:netdev_link_notify() event 16 on ifindex 7
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_deauthenticate_event()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
Received Deauthentication event, reason: 7, from_ap: true
src/station.c:station_disconnect_event() 7
src/station.c:station_disassociated() 7
src/station.c:station_reset_connection_state() 7
src/station.c:station_roam_state_clear() 7
src/netconfig.c:netconfig_event_handler() l_netconfig event 2
src/netconfig-commit.c:netconfig_commit_print_addrs() removing address: yyy.yyy.yyy.yyy
src/resolve.c:resolve_systemd_revert() ifindex: 7
[DHCPv4] l_dhcp_client_stop:1264 Entering state: DHCP_STATE_INIT
src/station.c:station_enter_state() Old State: connected, new state: disconnected
src/station.c:station_enter_state() Old State: disconnected, new state: autoconnect_quick
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5845
src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
src/wiphy.c:wiphy_radio_work_done() Work item 5843 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5844
"Program terminated with signal SIGSEGV, Segmentation fault.",
"#0 0x0000565359ee3f54 in network_bss_find_by_addr ()",
"#0 0x0000565359ee3f54 in network_bss_find_by_addr ()",
"#1 0x0000565359ec9d23 in station_ft_work_ready ()",
"#2 0x0000565359ec0af0 in wiphy_radio_work_next ()",
"#3 0x0000565359f20080 in offchannel_mlme_notify ()",
"#4 0x0000565359f4416b in received_data ()",
"#5 0x0000565359f40d90 in io_callback ()",
"#6 0x0000565359f3ff4d in l_main_iterate ()",
"#7 0x0000565359f4001c in l_main_run ()",
"#8 0x0000565359f40240 in l_main_run_with_signal ()",
"#9 0x0000565359eb3888 in main ()"
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: d938d362b212 ("erp: ERP implementation and key cache move")
Fixes: 433373fe28a4 ("eapol: cache ERP keys on EAP success")
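The safe way to get a C string out of such a field is to copy with an
explicit length into a buffer that has room for the terminator. A
minimal sketch, with a hypothetical helper name and the 32 byte limit
taken from the description above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SSID_MAX_SIZE 32

/* Copy up to 32 SSID bytes and always nul-terminate the result */
static void ssid_to_cstr(const uint8_t *ssid, size_t ssid_len,
				char out[SSID_MAX_SIZE + 1])
{
	if (ssid_len > SSID_MAX_SIZE)
		ssid_len = SSID_MAX_SIZE;

	memcpy(out, ssid, ssid_len);
	out[ssid_len] = '\0';
}
```

Calling strlen() directly on the 32 byte field would read past its end
whenever the SSID uses all 32 bytes.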
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: 1f1478285725 ("wiphy: add _generate_address_from_ssid")
Fixes: 5a1b1184fca6 ("netdev: support per-network MAC addresses")
In netdev_retry_owe, if l_gen_family_send fails, the connect_cmd is
never freed or reset. Fix that.
While here, use a stack variable instead of netdev member, since the use
of such a member is unnecessary and confusing.
vendor_ies stored in handshake_state are already added as part of
netdev_populate_common_ies(), which is already invoked by
netdev_build_cmd_connect().
Normally vendor_ies is NULL for OWE connections, so no IEs are
duplicated as a result.
CC src/adhoc.o
In file included from src/adhoc.c:28:0:
/usr/include/linux/if.h:234:19: error: field ‘ifru_addr’ has incomplete type
struct sockaddr ifru_addr;
^
/usr/include/linux/if.h:235:19: error: field ‘ifru_dstaddr’ has incomplete type
struct sockaddr ifru_dstaddr;
^
/usr/include/linux/if.h:236:19: error: field ‘ifru_broadaddr’ has incomplete type
struct sockaddr ifru_broadaddr;
^
/usr/include/linux/if.h:237:19: error: field ‘ifru_netmask’ has incomplete type
struct sockaddr ifru_netmask;
^
/usr/include/linux/if.h:238:20: error: field ‘ifru_hwaddr’ has incomplete type
struct sockaddr ifru_hwaddr;
^
Very rarely on ath10k (potentially other ath cards), disabling
power save while the interface is down causes a timeout when
bringing the interface back up. This seems to be a race in the
driver or firmware but it causes IWD to never start up properly
since there is no retry logic on that path.
Retrying is an option, but a more straightforward approach is
to just reorder the logic to set power save off after the
interface is already up. If the power save setting fails we can
just log it, ignore the failure, and continue. From a user's point
of view there is no real difference in doing it this way as
PS still gets disabled prior to IWD connecting/sending data.
Changing behavior based on a buggy driver isn't something we
should be doing, but in this instance the change shouldn't have
any downside and actually isn't any different than how it has
been done prior to the driver quirks change (i.e. use network
manager, iw, or iwconfig to set power save after IWD starts).
For reference, this problem is quite rare and difficult to say
exactly how often but certainly <1% of the time:
iwd[1286641]: src/netdev.c:netdev_disable_ps_cb() Disabled power save for ifindex 54
kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
iwd[1286641]: Error bringing interface 54 up: Connection timed out
kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
After this IWD just sits idle as it has no interface to start using.
This is even reproducible outside of IWD if you loop and run:
ip link set <wlan> down
iw dev <wlan> set power_save off
ip link set <wlan> up
Eventually the 'up' command will fail with a timeout.
I've brought this to the linux-wireless/ath10k mailing list but
even if it's fixed in future kernels we'd still need to support
older kernels, so a workaround/change in IWD is still required.
This is done already for DPP, do the same for PKEX. Few drivers
(ath9k upstream, ath10k/11k in progress) support this, which is
unfortunate, but since a configurator will not work without this
capability it's best to fail early.
The DPP spec allows 3rd party fields in the DPP configuration
object (section 4.5.2). IWD can take advantage of this (when
configuring another IWD supplicant) to communicate additional
profile options that may be required for the network.
The new configuration member will be called "/net/connman/iwd"
and will be an object containing settings specific to IWD.
More settings could be added here if needed but for now only
the following are defined:
{
send_hostname: true/false,
hidden: true/false
}
These correspond to the following network profile settings:
[IPv4].SendHostname
[Settings].Hidden
The scan result handling was fragile because it assumed the kernel
would only give results matching the requested SSID. This isn't
something we should assume so instead keep the configuration object
around until after the scan and use the target SSID to lookup the
network.
Nearly every use of the ssid member first has to memcpy it to a
buffer and NUL-terminate it. Instead just store the ssid as a
string when creating/parsing from JSON.
The DPP-PKEX spec provides a very limited list of frequencies used
to discover configurators, only 3 on 2.4 and 5GHz bands. Since
configurators (at least in IWD's implementation) are only allowed
on the current operating frequency it's very unlikely an enrollee
will find a configurator on these frequencies out of the entire
spectrum.
The spec does mention that the 3 default frequencies should be used
"In lieu of specific channel information obtained in a manner outside
the scope of this specification, ...". This allows the implementation
some flexibility in using a broader range of frequencies.
To increase the chances of finding a configurator, shared code
enrollees will first issue a scan to determine what access points are
around, then iterate these frequencies. This is especially helpful
when the configurators are IWD-based since we know that they'll be
on the same channels as the APs in the area.
The post-DPP connection was never done quite right due to station's
state being unknown. The state is now tracked in DPP by a previous
patch but the scan path in DPP is still wrong.
It relies on station autoconnect logic which has the potential to
connect to a different network than what was configured with DPP.
It's unlikely but still could happen in theory. In addition the scan
was not selectively filtering results by the SSID that DPP
configured.
This fixes the above problems by first filtering the scan by the
SSID, then setting the scan results into station without triggering
autoconnect, and finally using network_autoconnect() directly
instead of relying on station to choose the SSID.
DPP (both DPP and PKEX) run the risk of odd behavior if station
decides to change state. DPP is completely unaware of this and
best case would just result in a protocol failure, worst case
duplicate calls to __station_connect_network.
Add a station watch and stop DPP if station changes state during
the protocol.
Commit c59669a366c5 ("netdev: disambiguate between disconnection types")
introduced different paths for different types of disconnection
notifications from netdev. Formalize this further by having
netdev_connect_failed only invoke connect_cb.
Disconnections that could be triggered outside of connection
related events are now handled on a different code path. For this
purpose, netdev_disconnected() is introduced.
When a roam event is received, iwd generates a firmware scan request and
notifies its event filter of the ROAMING condition. In cases where the
firmware scan could not be started successfully, netdev_connect_failed
is invoked. This is not a correct use of netdev_connect_failed since it
doesn't actually disconnect the underlying netdev and the reflected
state becomes de-synchronized from the underlying kernel device.
The firmware scan request could currently fail for two reasons:
1. nl80211 genl socket is in a bad state, or
2. the scan context does not exist
Since both reasons are highly unlikely, simply use L_WARN instead.
The other two cases where netdev_connect_failed is used could only occur
if the kernel message is invalid. The message is ignored in that case
and a warning is printed.
The situation described above also exists in netdev_get_fw_scan_cb. If
the scan could not be completed successfully, there's not much iwd can
do to recover. Have iwd remain in roaming state and print an error.
There are generally three scenarios where iwd generates a disconnection
command to the kernel:
1. Error conditions stemming from a connection related event. For
example if SAE/FT/FILS authentication fails during Authenticate or
Associate steps and the kernel doesn't disconnect properly.
2. Deauthentication after the connection has been established and not
related to a connection attempt in progress. For example, SA Query
processing that triggers a disconnect.
3. Disconnects that are triggered due to a handshake failure or if
setting keys resulting from the handshake fails. These disconnects
can be triggered as a result of a pending connection or when a
connection has been established (e.g. due to rekeying).
Distinguish between 1 and 2/3 by having the disconnect procedure take
different paths. For now there are no functional changes since all
paths end up in netdev_connect_failed(), but this will change in the
future.
While here, also get rid of netdev_del_station. The only user of this
function was in ap.c and it could easily be replaced by invoking the new
nl80211_build_del_station function. The callback used by
netdev_build_del_station only printed an error and didn't do anything
useful. Get rid of it for now.
netdev_begin_connection() already invokes netdev_connect_failed on
error. Remove any calls to netdev_connect_failed in callers of
netdev_begin_connection().
Fixes: 4165d9414f54 ("netdev: use wiphy radio work queue for connections")
If netdev_get_oci fails, a goto deauth is invoked in order to terminate
the current connection and return an error to the caller. Unfortunately
the deauth label builds CMD_DEAUTHENTICATE in order to terminate the
connection. This was fine because it used to handle authentication
protocols that ran over CMD_AUTHENTICATE and CMD_ASSOCIATE. However,
OCI can also be used on FullMAC hardware that does not support them.
Use CMD_DISCONNECT instead which works everywhere.
Fixes: 06482b811626 ("netdev: Obtain operating channel info")
The reason code field was being obtained as a uint8_t value, while it is
actually a uint16_t in little-endian byte order.
Fixes: f3cc96499c44 ("netdev: added support for SA Query")
The reason code from deauthentication frame was being obtained as a
uint8_t instead of a uint16_t. The value was only ever used in an
informational statement. Since the value was in little endian, only the
first 8 bits of the reason code were obtained. Fix that.
Fixes: 2bebb4bdc7ee ("netdev: Handle deauth frames prior to association")
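As a minimal sketch of the fix described in the two commits above, the
reason code can be read as a 16-bit little-endian value rather than a
single byte. The helper name and its error convention are hypothetical
and are not taken from netdev.c; it only assumes the reason code is the
first two bytes of the frame body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the IWD implementation: read the 16-bit
 * little-endian reason code from the start of a frame body.
 * Returns 0 on success, -1 if the body is too short.
 */
int reason_code_from_body(const uint8_t *body, size_t len,
				uint16_t *out_reason)
{
	assert(body && out_reason);

	if (len < 2)
		return -1;

	/* Low byte first, then the high byte */
	*out_reason = (uint16_t) (body[0] | (body[1] << 8));

	return 0;
}
```

Reading only body[0], as the old code effectively did, silently drops
the upper 8 bits of the reason code.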
Several tests do not pass due to some additional changes that have
not been merged. Remove these cases and add some hardening after
discovering some unfortunate wpa_supplicant behavior.
- Disable p2p in wpa_supplicant. With p2p enabled an extra device
is created which starts receiving DPP frames and printing
confusing messages.
- Remove extra asserts which don't make sense currently. These
will be added back later as future additions to PKEX are
upstreamed.
- Work around wpa_supplicant retransmit limitation. This is
described in detail in the comment in pkex_test.py
- wait_for_event was returning a list in certain cases, not the
event itself
- The configurator ID was not being printed (',' instead of '%')
- The DPP ID was not being properly waited for with PKEX
With the addition of DPP PKEX autotests some of the timeouts are
quite long and hit the test-runner's maximum timeouts. For UML we should
allow this since time-travel lets us skip idle waits. Move the test
timeout out of a global define and into the argument list so QEMU
and UML can define it differently.
The StartConfigurator() call was left out since there would be no
functional difference to the user in iwctl. It's expected that
human users of the shared code API provide the code/id ahead of
time, i.e. use ConfigureEnrollee/StartEnrollee.
Check that enough space for newline and 0-byte is left in line.
This fixes a buffer overflow on specific completion results.
Reported-By: Leona Maroni <dev@leona.is>
Adds a configurator variant to be used alongside an agent. When
called the configurator will start and wait for an initial PKEX
exchange message from an enrollee at which point it will request
the code from an agent. This provides more flexibility for
configurators that are capable of configuring multiple enrollees
with different identifiers/codes.
Note that the timing requirements per the DPP spec still apply
so this is not meant to be used with a human configurator but
within an automated agent which does a quick lookup of potential
identifiers/codes and can reply within the 200ms window.
The PKEX configurator role is currently limited to being a responder.
When started the configurator will listen on its current operating
channel for a PKEX exchange request. Once it is received and the
encrypted key is properly decrypted, it treats this peer as the
enrollee and won't allow configurations from other peers unless
PKEX is restarted. The configurator will send its encrypted
ephemeral key in the PKEX exchange response. The enrollee then
sends its encrypted bootstrapping key (as commit-reveal request)
and the configurator does the same (as commit-reveal response).
After this, PKEX authentication begins. The enrollee is expected to
send the authenticate request, since it's the initiator.
This is the initial support for PKEX enrollees acting as the
initiator. A PKEX initiator starts the protocol by broadcasting
the PKEX exchange request. This request contains a key encrypted
with the pre-shared PKEX code. If accepted the peer sends back
the exchange response with its own encrypted key. The enrollee
decrypts this and performs some crypto/hashing in order to establish
an ephemeral key used to encrypt its own bootstrapping key. The
bootstrapping key is encrypted and sent to the peer in the PKEX
commit-reveal request. The peer then does the same thing, encrypting
its own bootstrapping key and sending to the initiator as the
PKEX commit-reveal response.
After this, both peers have exchanged their bootstrapping keys
securely and can begin DPP authentication, then configuration.
For now the enrollee will only iterate the default channel list
from the Easy Connect spec. Future updates will need to include some
way of discovering non-default channel configurators, but the
protocol needs to be ironed out first.
Stop() will now return NotFound if DPP is not running. This causes
the DPP test to fail since it calls this regardless of whether the
protocol has already stopped. Ignore this exception since tests end in various
states, some stopped and some not.
PKEX and DPP will share the same state machine since the DPP protocol
follows PKEX. This does pose an issue with the DBus interfaces
because we don't want DPP initiated by the SharedCode interface to
start setting properties on the DeviceProvisioning interface.
To handle this a dpp_interface enum is being introduced which binds
the dpp_sm object to a particular interface, for the life of the
protocol run. Once the protocol finishes the dpp_sm can be unbound
allowing either interface to use it again later.
This misspelling was present in the configuration, so I retained parsing
of the legacy BandModifier*Ghz options for compatibility. Without this
change anyone spelling GHz correctly in their configs would be very
confused.
PKEX is part of the WFA EasyConnect specification and is
an additional bootstrapping method (like QR codes) for
exchanging public keys between a configurator and enrollee.
PKEX operates over wifi and requires a key/code be exchanged
prior to the protocol. The key is used to encrypt the exchange
of the bootstrapping information, then DPP authentication is
started immediately afterwards.
This can be useful for devices which don't have the ability to
scan a QR code, or even as a more convenient way to share
wireless credentials if the PSK is very secure (i.e. not a
human readable string).
PKEX would be used via the three DBus APIs on a new interface
SharedCodeDeviceProvisioning.
ConfigureEnrollee(a{sv}) will start a configurator with a
static shared code (optionally identifier) passed in as the
argument to this method.
StartEnrollee(a{sv}) will start a PKEX enrollee using a static
shared code (optionally identifier) passed as the argument to
the method.
StartConfigurator(o) will start a PKEX configurator and use the
agent specified by the path argument. The configurator will query
the agent for a specific code when an enrollee sends the initial
exchange message.
After the PKEX protocol is finished, DPP bootstrapping keys have
been exchanged and DPP Authentication will start, followed by
configuration.
Beacon loss handling was removed in the past because it was
determined that this event always resulted in a disconnect. This
was short-sighted and not always true. The default kernel behavior
waits for 7 lost beacons before emitting this event, then sends
either a few nullfuncs or probe requests to the BSS to determine
if it's really gone. If these come back successfully the connection
will remain alive. This can give IWD some time to roam in some
cases so we should be handling this event.
Since beacon loss indicates a very poor connection the roam scan
is delayed by a few seconds in order to give the kernel a chance
to send the nullfuncs/probes or receive more beacons. This may
result in a disconnect, but it would have happened anyways.
Attempting a roam mainly handles the case when the connection can
be maintained after beacon loss, but is still poor.
This is being done to allow the DPP module to work correctly. DPP
currently uses __station_connect_network incorrectly since it
does not (and cannot) change the state after calling. The only
way to connect with a state change is via station_connect_network
which requires a DBus method that triggered the connection; DPP
does not have this due to its potentially long run time.
To support DPP there are a few options:
1. Pass a state into __station_connect_network (this patch)
2. Support a NULL DBus message in station_connect_network. This
would require several NULL checks and adding all that to only
support DPP just didn't feel right.
3. A 3rd connect API in station which wraps
__station_connect_network and changes the state. And again, an
entirely new API for only DPP felt wrong (I guess we did this
for network_autoconnect though...)
It's about 50/50 between call sites that changed state after calling
and those that do not. Changing the state inside
__station_connect_network felt useful enough to cover the cases that
could benefit and the remaining cases could handle it easily enough:
- network_autoconnect(), and the state is changed by station after
calling so it more or less follows the same pattern just routes
through network. This will now pass the CONNECTING_AUTO state
from within network vs station.
- The disconnect/reconnect path. Here the state is changed to
ROAMING prior in order to avoid multiple state changes. Knowing
this the same ROAMING state can be passed which won't trigger a
state change.
- Retrying after a failed BSS. The state changes on the first call
then remains the same for each connection attempt. To support this
the current station->state is passed to avoid a state change.
Until now IWD only supported enrollees as responders (configurators
could do both). For PKEX it makes sense for the enrollee to be the
initiator because configurators in the area are already on their
operating channel and going off is inefficient. For PKEX, whoever
initiates also initiates authentication so for this reason the
authentication path is being opened up to allow enrollees to
initiate.
The check for the header was incorrect according to the spec.
Table 58 indicates that the "Query Response Info" should be set
to 0x00 for the configuration request. The frame handler was
expecting 0x7f which is the value for the config response frame.
Unfortunately wpa_supplicant also gets this wrong and uses 0x7f
in all cases which is likely why this value was set incorrectly
in IWD. The issue is that IWD's config request is correct which
means IWD<->IWD configuration is broken. (and wpa_supplicant as
a configurator likely doesn't validate the config request).
Fix this by checking both 0x7f and 0x00 to handle both
supplicants.
Stopping periodic scans and not restarting them prevents autoconnect
from working again if DPP (or the post-DPP connect) fails. Since
the DPP offchannel work is at a higher priority than scanning (and
since new offchannels are queued before canceling) there is no risk
of a scan happening during DPP so it's safe to leave periodic scans
running.
The packet loss handler puts a higher priority on roaming compared
to the low signal roam path. This is generally beneficial since this
event usually indicates some problem with the BSS and generally is
an indicator that a disconnect will follow sometime soon.
But by immediately issuing a scan we run the risk of causing many
successive scans if more packet loss events arrive following
the roam scans (and if no candidates are found). Logs are provided
further below.
To help with this, handle the first event with priority and
immediately issue a roam scan. If another event comes in within a
certain timeframe (2 seconds), don't scan immediately but rearm
the roam timer instead. This also handles
the case of a low signal roam scan followed by a packet loss
event. Delaying the roam will at least provide some time for packets
to get out in between roam scans.
Logs were snipped to be less verbose, but this cycle happened
5 times prior. In total 7 scans were issued in 5 seconds which may
very well have been the reason for the local disconnect:
Oct 27 16:23:46 src/station.c:station_roam_failed() 9
Oct 27 16:23:46 src/wiphy.c:wiphy_radio_work_done() Work item 29 done
Oct 27 16:23:47 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:47 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:47 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 30
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_next() Starting work item 30
Oct 27 16:23:47 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:47 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification New Scan Results(34)
Oct 27 16:23:47 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
... scan results ...
Oct 27 16:23:47 src/station.c:station_roam_failed() 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_done() Work item 30 done
Oct 27 16:23:47 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:47 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:47 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 31
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_next() Starting work item 31
Oct 27 16:23:47 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:47 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:48 src/scan.c:scan_notify() Scan notification New Scan Results(34)
Oct 27 16:23:48 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
... scan results ...
Oct 27 16:23:48 src/station.c:station_roam_failed() 9
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_done() Work item 31 done
Oct 27 16:23:48 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:48 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:48 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 32
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_next() Starting work item 32
Oct 27 16:23:48 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:48 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:48 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:49 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Oct 27 16:23:49 src/netdev.c:netdev_deauthenticate_event()
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Oct 27 16:23:49 src/netdev.c:netdev_disconnect_event()
Oct 27 16:23:49 Received Deauthentication event, reason: 4, from_ap: false
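The rate limiting described in this commit can be sketched as a small
decision function. This is only an illustration: the struct, enum and
function names are hypothetical, and tracking the time of the last
packet loss event (rather than, say, the last roam scan) is an
assumption, not necessarily how station.c does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOSS_HOLDOFF_MS 2000	/* the 2 second window from the text */

/* Hypothetical state; IWD's station object tracks this differently */
struct loss_state {
	bool seen_event;
	uint64_t last_event_ms;
};

enum loss_action {
	LOSS_SCAN_NOW,		/* issue a roam scan immediately */
	LOSS_REARM_TIMER,	/* only rearm the roam timer */
};

enum loss_action packet_loss_event(struct loss_state *s, uint64_t now_ms)
{
	enum loss_action action = LOSS_SCAN_NOW;

	assert(s);

	/* A second event inside the window must not trigger another scan */
	if (s->seen_event && now_ms - s->last_event_ms < LOSS_HOLDOFF_MS)
		action = LOSS_REARM_TIMER;

	s->seen_event = true;
	s->last_event_ms = now_ms;

	return action;
}
```

With this shape, the burst of "Packets lost event" lines in the log
above would produce one immediate scan followed by timer rearms
instead of back-to-back scans.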
Include a specific timeout value so different protocols can specify
different timeouts. For example the authentication timeout
should not be very long (even 10 seconds seems excessive) but
adding PKEX may warrant longer timeouts.
For example, when discovering a configurator IWD may want to wait
several minutes before ending the discovery. Similarly, when running
PKEX as a configurator we should put a hard limit on the time, but
again minutes rather than 10 seconds.
The memcpy in HEX2BUF was copying the length of the buffer that was
passed in, not the actual length of the converted hexstring. This
test was segfaulting in the Alpine CI which uses clang/musl.
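The class of bug fixed here can be avoided by bounding the copy with
the decoded length rather than the destination size. The sketch below
is a self-contained stand-in and not the test's HEX2BUF macro; the
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';

	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;

	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;

	return -1;
}

/*
 * Decode 'hex' into 'buf'. Only strlen(hex) / 2 bytes are written,
 * never buf_len bytes. Returns the number of bytes written, or -1 on
 * malformed input or if the decoded data would not fit.
 */
int hex_to_buf(const char *hex, uint8_t *buf, size_t buf_len)
{
	size_t hex_len, out_len, i;

	assert(hex && buf);

	hex_len = strlen(hex);
	if (hex_len % 2)
		return -1;

	out_len = hex_len / 2;
	if (out_len > buf_len)
		return -1;

	for (i = 0; i < out_len; i++) {
		int hi = hex_nibble(hex[2 * i]);
		int lo = hex_nibble(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;

		buf[i] = (uint8_t) ((hi << 4) | lo);
	}

	return (int) out_len;
}
```

Bytes in buf past the decoded length are left untouched, so a short
hex string never causes a read or write sized by the larger buffer.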
It's been seen (so far only in mac80211_hwsim + UML) that an
offchannel request's ACK comes after the ROC started event. This
causes the ROC started event to never call back to notify since
info->roc_cookie is unset and it appears to be coming from an
external process.
We can detect this situation in the ROC notify event by checking
if there is a pending ROC command and if info->roc_cookie does
not match. This can also be true for an external event so we just
set a new "early_cookie" member and return.
Then, when the ACK comes in for the ROC request, we can validate
if the prior event was associated with IWD or some external
process. If it was from IWD call the started callback, otherwise
the ROC notify event should come later and be handled under the
normal logic where the cookies match.
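The ordering problem and its fix can be modelled with a small state
sketch. Only the roc_cookie and early_cookie names come from the text
above; the struct layout, the function names, and the assumption that
a cookie of 0 means "unset" are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the offchannel info; 0 means "cookie unset" */
struct roc_info {
	bool pending;		/* ROC command sent, ACK not yet seen */
	uint64_t roc_cookie;	/* cookie delivered with the ACK */
	uint64_t early_cookie;	/* cookie from a notify seen before the ACK */
	bool started;		/* started callback has been invoked */
};

/* Called for a "ROC started" notify event */
void roc_notify(struct roc_info *info, uint64_t cookie)
{
	assert(info);

	/* Normal ordering: the ACK already told us our cookie */
	if (info->roc_cookie && info->roc_cookie == cookie) {
		info->started = true;
		return;
	}

	/* ACK still outstanding: could be ours or external, remember it */
	if (info->pending && !info->roc_cookie)
		info->early_cookie = cookie;
}

/* Called when the ACK for our ROC request arrives */
void roc_ack(struct roc_info *info, uint64_t cookie)
{
	assert(info);

	info->pending = false;
	info->roc_cookie = cookie;

	/* The earlier notify was for our request after all */
	if (info->early_cookie && info->early_cookie == cookie)
		info->started = true;

	info->early_cookie = 0;
}
```

If the early notify belonged to an external process, the cookies
differ in roc_ack() and the later notify takes the normal path.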
Instead of looking up by wdev, lookup by the ID itself. We
shouldn't ever have more than one info per wdev in the queue but
looking up the _exact_ info structure doesn't hurt in case things
change in the future.
If netconfig is canceled before completion (when roaming) the
settings are freed and never loaded again once netconfig is started
post-roam. Now after a roam make sure to re-load the settings and
start netconfig.
Commit 23f0f5717c did not correctly handle the reassociation
case where the state is set from within station_try_next_transition.
If IWD reassociates netconfig will get reset and DHCP will need to
be done over again after the roam. Instead get the state ahead of
station_try_next_transition.
Fixes: 23f0f5717ca0 ("station: allow roaming before netconfig finishes")
When using mutual authentication an additional value needs to
be hashed when deriving i/r_auth values. A NULL value indicates
no mutual authentication (zero length iovec is passed to hash).
DPP configurators are running the majority of the protocol on the
current operating channel, meaning no ROC work. The retry logic
was bailing out if !dpp->roc_started with the assumption that DPP
was in between requesting offchannel work and it actually starting.
For configurators, this may not be the case. The offchannel ID also
needs to be checked, and if no work is scheduled we can send the
frame.
The prf_plus API was a bit restrictive because it only took a
string label which isn't compatible with some specs (e.g. DPP
inputs to HKDF-Expand). In addition it took additional label
arguments which were appended to the HMAC call (and the
non-intuitive '\0' if there were extra arguments).
Instead the label argument has been removed and callers can pass
it in through va_args. This also lets the caller decide the length
and can include the '\0' or not, dependent on the spec the caller
is following.
Adds a handler for the HE capabilities element and reworks the way
the MCS/NSS support bits are printed.
Now if the MCS support is 3 (unsupported) it won't be printed. This
makes the logs a bit shorter to read.
Matched the printed function name with the actual function name.
The simple-agent test prints the function name to allow easier debugging.
One name was not set correctly (most likely through copy pasting).
SAE was also relying on the ELL bug which was incorrectly performing
a subtraction on the Y coordinate based on the compressed point type.
Correct this and make the point type more clear (rather than
something like "is_odd + 2").
EAP-PWD was incorrectly computing the PWE but due to the also
incorrect logic in ELL the point converted correctly. This is
being fixed, so both places need the reverse logic.
Also added a big comment explaining why this is and how
l_ecc_point_from_data behaves, since it's somewhat confusing:
EAP-PWD expects the pwd-seed to be compared to the actual Y
coordinate (which is handled automatically by ELL).
Add a test to show the incorrect ASN1 conversion to and from points.
This was due to the check if Y is odd/even being inverted which
incorrectly prefixes the X coordinate with the wrong byte.
The test itself was not fully correct because it was using compliant
points rather than full points, and the spec contains the entire
Y coordinate so the full point should be used.
This patch also adds ASN1 conversions to validate that
dpp_point_from_asn1 and dpp_point_to_asn1 work properly.
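For reference, the prefix byte in the standard compressed point
encoding is 0x02 when Y is even and 0x03 when Y is odd, which is the
check that was inverted. The sketch below illustrates only that rule
on raw big-endian coordinates; the function name is hypothetical and
it does not reproduce dpp_point_to_asn1 or any ELL call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POINT_COMPRESSED_EVEN	0x02	/* Y coordinate is even */
#define POINT_COMPRESSED_ODD	0x03	/* Y coordinate is odd */

/*
 * Hypothetical helper: 'x' and 'y' are big-endian coordinates of
 * 'coord_len' bytes each. Writes the prefix byte followed by X into
 * 'out'. Returns the number of bytes written, or 0 if 'out' is too
 * small.
 */
size_t point_compress(const uint8_t *x, const uint8_t *y, size_t coord_len,
			uint8_t *out, size_t out_len)
{
	assert(x && y && out);

	if (coord_len == 0 || out_len < coord_len + 1)
		return 0;

	/* Parity of a big-endian integer is the low bit of its last byte */
	out[0] = (y[coord_len - 1] & 1) ? POINT_COMPRESSED_ODD :
						POINT_COMPRESSED_EVEN;
	memcpy(out + 1, x, coord_len);

	return coord_len + 1;
}
```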
The previous attempt at working around this warning seems to no longer
work with gcc 13:
In function ‘eap_handle_response’,
inlined from ‘eap_rx_packet’ at src/eap.c:570:3:
src/eap.c:421:49: error: ‘vendor_id’ may be used uninitialized [-Werror=maybe-uninitialized]
421 | (type == EAP_TYPE_EXPANDED && vendor_id == (id) && vendor_type == (t))
| ~~~~~~~~~~^~~~~~~
src/eap.c:533:20: note: in expansion of macro ‘IS_EXPANDED_RESPONSE’
533 | } else if (IS_EXPANDED_RESPONSE(our_vendor_id, our_vendor_type))
| ^~~~~~~~~~~~~~~~~~~~
src/eap.c: In function ‘eap_rx_packet’:
src/eap.c:431:18: note: ‘vendor_id’ was declared here
431 | uint32_t vendor_id;
| ^~~~~~~~~
width must be initialized since it depends on best not being NULL. If
best passes the non-NULL check above, then width must be initialized
since both width and best are set at the same time.
For IWD to work correctly either 2.4GHz or 5GHz bands must be enabled
(even for 6GHz to work). Check this and don't allow IWD to initialize
if both 2.4 and 5GHz are disabled.
wiphy_get_allowed_freqs was only being used to see if 6GHz was disabled
or not. This is expensive and requires several allocations when there
already exists wiphy_band_is_disabled(). The prior patch modified
wiphy_band_is_disabled() to return -ENOTSUP which allows scan.c to
completely remove the need for wiphy_get_allowed_freqs.
scan_wiphy_watch was also slightly re-ordered to avoid allocating
freqs_6ghz if the scan request was being completed.
The function wiphy_band_is_disabled() return was a bit misleading
because if the band was not supported it would return true which
could be misunderstood as the band is supported, but disabled.
There was only one call site and because of this behavior
wiphy_band_is_disabled needed to be paired with checking if the
band was supported.
To be more descriptive to the caller, wiphy_band_is_disabled() now
returns an int and if the band isn't supported -ENOTSUP will be
returned, otherwise 1 is returned if the band is disabled and 0
otherwise.
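The three-way return convention described above can be sketched as
follows. The struct and function names are hypothetical stand-ins and
not the wiphy.c implementation; only the return values (-ENOTSUP, 1,
0) come from the text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-band state kept by wiphy */
struct band_info {
	bool supported;
	bool disabled;
};

/* -ENOTSUP if unsupported, 1 if supported but disabled, 0 otherwise */
int band_is_disabled(const struct band_info *band)
{
	assert(band);

	if (!band->supported)
		return -ENOTSUP;

	return band->disabled ? 1 : 0;
}

/* A caller no longer needs a separate "is it supported" check */
bool band_usable(const struct band_info *band)
{
	return band_is_disabled(band) == 0;
}
```

A caller that cares about the difference can test for a negative
return first, while one that only needs "can I scan here" compares
against 0.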
This adds support to allow users to disable entire bands, preventing
scanning and connecting on those frequencies. If the
[Rank].BandModifier* options are set to 0.0 it will imply those
bands should not be used for scanning, connecting or roaming. This
now applies to autoconnect, quick, hidden, roam, and dbus scans.
This is a station only feature meaning other modules like RRM, DPP,
WSC or P2P may still utilize those bands. Trying to limit bands in
those modules may sometimes conflict with the spec which is why it
was not added there. In addition modules like DPP/WSC are only used
in limited capacity for connecting so there is little benefit gained
from disallowing those bands.
To support user-disabled bands periodic scans need to specify a
frequency list filtered by any bands that are disabled. This was
needed in scan.c since periodic scans don't provide a frequency
list in the scan request.
If no bands are disabled the allowed freqs API should still
result in the same scan behavior as if a frequency list is left
out i.e. IWD just filters the frequencies as opposed to the kernel.
Currently the only way a scan can be split is if the request does
not specify any frequencies, implying the request should scan the
entire spectrum. This allows the scan logic to issue an extra
request if 6GHz becomes available during the 2.4 or 5GHz scans.
This restriction was somewhat arbitrary and done to let periodic
scans pick up 6GHz APs through a single scan request.
But now with the addition of allowing user-disabled bands
periodic scans will need to specify a frequency list in case a
given band has been disabled. This will break the scan splitting
code which is why this prep work is being done.
The main difference now is the original scan frequencies are
tracked with the scan request. The reason for this is so if a
request comes in with a limited set of 6GHz frequencies IWD won't
end up scanning the full 6GHz spectrum later on.
This is more or less copied from scan_get_allowed_freqs but is
going to be needed by station (basically just saves the need for
station to do the same clone/constrain sequence itself).
One slight alteration is now a band mask can be passed in which
provides more flexibility for additional filtering.
This exposes the [Rank].BandModifier* settings so other modules
can use them. Doing this will allow user-disabling of certain
bands by setting these modifier values to 0.0.
The loop iterating the frequency attributes list was not including
the entire channel set since it was stopping at i < band->freqs_len.
The freq_attrs array is allocated to include the last channel:
band->freq_attrs = l_new(struct band_freq_attrs, num_channels + 1);
band->freqs_len = num_channels;
So instead the for loop should use i <= band->freqs_len. (I also
changed this to start the loop at 1 since channel zero is invalid).
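A minimal sketch of the corrected iteration, assuming an array that is
indexed by channel number with index 0 unused. The struct and function
names are hypothetical and much simpler than the real band_freq_attrs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced version of a per-channel attribute entry */
struct freq_attr {
	bool supported;
};

/*
 * 'attrs' holds freqs_len + 1 entries indexed by channel number.
 * Index 0 is not a valid channel, so iteration runs from 1 to
 * freqs_len inclusive.
 */
size_t count_supported(const struct freq_attr *attrs, size_t freqs_len)
{
	size_t i, count = 0;

	assert(attrs);

	for (i = 1; i <= freqs_len; i++)
		if (attrs[i].supported)
			count++;

	return count;
}
```

With the old "i < freqs_len" bound the entry for the highest channel
would never be visited.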
The auth/action status is now tracked in ft.c. If an AP rejects the
FT attempt with "Invalid PMKID" we can now assume this AP is either
mis-configured for FT or is lagging behind getting the proper keys
from neighboring APs (e.g. was just rebooted).
If we see this condition IWD can now fall back to reassociation in
an attempt to still roam to the best candidate. The fallback decision
is still rank based: if a BSS fails FT it is marked as such, its
ranking is reset removing the FT factor and it is inserted back
into the queue.
The motivation behind this isn't necessarily to always force a roam,
but instead to handle two cases where IWD can either make a bad roam
decision or get 'stuck' and never roam:
1. If there is one good roam candidate and other bad ones. For
example say BSS A is experiencing this FT key pull issue:
Current BSS: -85dBm
BSS A: -55dBm
BSS B: -80dBm
The current logic would fail A, and roam to B. In this case
reassociation would have likely succeeded so it makes more sense
to reassociate to A as a fallback.
2. If there is only one candidate, but it's failing FT. IWD will
never try anything other than FT and repeatedly fail.
Both of the above have been seen on real network deployments and
result in either poor performance (1) or eventually lead to a full
disconnect due to never roaming (2).
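Case (1) can be sketched as follows. This is only an illustration
under stated assumptions: the rank formula, the FT_FACTOR value and
all names here are hypothetical and do not reflect IWD's real ranking
code. It only shows that once the FT bonus is removed from a failed
BSS, a much stronger BSS can still outrank a weak one and be retried
with reassociation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FT_FACTOR 1.3	/* hypothetical bonus for FT-capable candidates */

struct candidate {
	const char *name;
	int signal_dbm;
	bool ft_failed;	/* set once an FT attempt to this BSS failed */
};

/* Hypothetical rank: stronger signal ranks higher, FT adds a bonus. */
static double candidate_rank(const struct candidate *c)
{
	double rank = 100.0 + c->signal_dbm;

	if (!c->ft_failed)
		rank *= FT_FACTOR;

	return rank;
}

/* Pick the highest ranked candidate, or NULL if the list is empty. */
static struct candidate *best_candidate(struct candidate *list, size_t n)
{
	struct candidate *best = NULL;
	size_t i;

	assert(list || n == 0);

	for (i = 0; i < n; i++)
		if (!best || candidate_rank(&list[i]) > candidate_rank(best))
			best = &list[i];

	return best;
}

/* FT is only attempted while it has not failed for this BSS. */
static bool use_ft(const struct candidate *c)
{
	return !c->ft_failed;
}
```

With BSS A at -55dBm and BSS B at -80dBm, A is still the best
candidate after its FT attempt fails, but use_ft() now reports that
reassociation should be used instead.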
Certain return codes, though failures, can indicate that the AP is
just confused or booting up and treating it as a full failure may
not be the best route.
For example in some production deployments if an AP is rebooted it
may take some time for neighboring APs to exchange keys for
current associations. If a client roams during that time the AP
will reject the attempt, saying the PMKID is invalid.
Use the ft_associate call return to communicate the status (if any)
that was in the auth/action response. If there was a parsing error
or no response -ENOENT is still returned.
In many tests the hostapd configuration does not include all the
values that a test uses. It's expected that each individual test
will add the values required. In many cases each test must slightly
alter the configuration, and for each such change every other test
has to set the value back to either a default or its own setting.
This results in a ton of duplicated code mainly setting things
back to defaults.
To help with this problem the hostapd configuration is read in
initially and stored as the default. Tests can then simply call
.default() to set everything back. This significantly reduces or
completely removes a ton of set_value() calls.
This does require that each hostapd configuration file includes all
values any of the subtests will set, which is a small price for the
convenience.
Removed several debug prints which are very verbose and provide
little to no important information.
The get_scan_{done,callback} prints are pointless since all the
parsed scan results are printed by station anyways.
Printing the BSS load is also not that useful since it doesn't
include the BSSID. If anything the BSS load should be included
when station prints out each individual BSS (along with frequency,
rank, etc).
The advertisement protocol print was just left in there by
accident when debugging, and also provides basically no useful
information.
Some APs don't include the RSNE in the associate reply during
the OWE exchange. This causes IWD to be incompatible since it has
a hard requirement on the AKM being included.
This relaxes the requirement for the AKM and instead warns if it
is not included.
Below is an example of an association reply without the RSN element
IEEE 802.11 Association Response, Flags: ........
Type/Subtype: Association Response (0x0001)
Frame Control Field: 0x1000
.000 0000 0011 1100 = Duration: 60 microseconds
Receiver address: 64:c4:03:88:ff:26
Destination address: 64:c4:03:88:ff:26
Transmitter address: fc:34:97:2b:1b:48
Source address: fc:34:97:2b:1b:48
BSS Id: fc:34:97:2b:1b:48
.... .... .... 0000 = Fragment number: 0
0001 1100 1000 .... = Sequence number: 456
IEEE 802.11 wireless LAN
Fixed parameters (6 bytes)
Tagged parameters (196 bytes)
Tag: Supported Rates 6(B), 9, 12(B), 18, 24(B), 36, 48, 54, [Mbit/sec]
Tag: RM Enabled Capabilities (5 octets)
Tag: Extended Capabilities (11 octets)
Ext Tag: HE Capabilities (IEEE Std 802.11ax/D3.0)
Ext Tag: HE Operation (IEEE Std 802.11ax/D3.0)
Ext Tag: MU EDCA Parameter Set
Ext Tag: HE 6GHz Band Capabilities
Ext Tag: OWE Diffie-Hellman Parameter
Tag Number: Element ID Extension (255)
Ext Tag length: 51
Ext Tag Number: OWE Diffie-Hellman Parameter (32)
Group: 384-bit random ECP group (20)
Public Key: 14ba9d8abeb2ecd5d95e6c12491b16489d1bcc303e7a7fbd…
Tag: Vendor Specific: Broadcom
Tag: Vendor Specific: Microsoft Corp.: WMM/WME: Parameter Element
Reported-By: Wen Gong <quic_wgong@quicinc.com>
Tested-By: Wen Gong <quic_wgong@quicinc.com>
Handling these events notifies hwsim of address changes for interface
creation/removal outside the initial namespace as well as address
changes due to scanning address randomization.
Interfaces that hwsim already knows about are still handled via
nl80211. But any interfaces not known when ADD/DEL_MAC_ADDR events
come will be treated specially.
For ADD, a dummy interface object will be created and added to the
queue. This lets the frame processing match the destination address
correctly. This can happen both for scan randomization and interface
creation outside of the initial namespace.
For the DEL event we handle similarly and don't touch any interfaces
found via nl80211 (i.e. have a 'name') but need to also be careful
with the dummy interfaces that were created outside the initial
namespace. We want to keep these around but scanning MAC changes can
also delete them. This is why a reference count was added so scanning
doesn't cause a removal.
For example, the following sequence:
ADD_MAC_ADDR (interface creation)
ADD_MAC_ADDR (scanning started)
DEL_MAC_ADDR (scanning done)
Hostapd commit b6d3fd05e3 changed the PMKID derivation in accordance
with 802.11-2020 which then breaks PMKID validation in IWD. This
breaks the FT-8021x AKM in IWD if the AP uses this hostapd version
since the PMKID doesn't validate during EAPoL.
This updates the PMKID derivation to use the correct SHA hash for
this AKM and adds SHA1 based PMKID checking for interoperability
with older hostapd versions.
The PMKID derivation has gotten messy due to the spec
updating/clarifying the hash size for the FT-8021X AKM. This
has led to hostapd updating the derivation which leaves older
hostapd versions using SHA1 and newer versions using SHA256.
To support this the checksum type is being fed to
handshake_state_get_pmkid so the caller can decide what sha to
use. In addition handshake_state_pmkid_matches is being added
which uses get_pmkid() but handles sorting out the hash type
automatically.
This lets preauthentication use handshake_state_get_pmkid where
there is the potential that a new PMKID is derived and eapol
can use handshake_state_pmkid_matches which only derives the
PMKID to compare against the peers.
The existing API was limited to SHA1 or SHA256 and assumed a key
length of 32 bytes. Since other AKMs are planned, update
this to take the checksum/length directly for better flexibility.
This is consistent with the over-Air path, and makes it clear when
reading the logs if over-DS was used, if there was a response frame,
and if the frame failed to parse in some way.
The parsing code was breaking out of the loop on the first comment
which is incorrect and causes only part of the file to be parsed.
It's odd this hasn't popped up until now but it's likely due to
differing dhcpd versions, some of which add comments and others
that do not.
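The difference between stopping at a comment and skipping it can be
sketched as below. This is an illustration only: the actual parser is
not shown in this message, and the '#' comment marker and the
count_entries() helper are assumptions made for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the non-comment, non-empty lines in a NUL-terminated buffer. */
static size_t count_entries(const char *text)
{
	size_t count = 0;

	assert(text);

	while (*text) {
		size_t len = strcspn(text, "\n");

		/*
		 * Comments and blank lines are skipped, not treated as
		 * the end of the file, so later entries are still seen.
		 */
		if (len > 0 && text[0] != '#')
			count++;

		text += len;
		if (*text == '\n')
			text++;
	}

	return count;
}
```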
FILS rekeys were fixed in hostapd somewhat recently but older
versions will fail this test. Document that so we don't get
confused when running tests against older hostapd versions.
Disable power save if the wiphy indicates it's needed. Do this
before issuing GET_LINK so the netdev doesn't signal it's up until
power save is disabled.
This allows generating code and test coverage reports using lcov &
genhtml. Useful for understanding how much of the codebase is currently
covered by unit and autotests.
Certain drivers do not handle power save very well resulting in
missed frames, firmware crashes, or other bad behavior. It's easy
enough to disable power save via iw, iwconfig, etc but since IWD
removes and creates the interface on startup it blows away any
previous power save setting. The setting must be done *after* IWD
creates the interface which can be done, but needs to be via some
external daemon monitoring IWD's state. For minimal systems,
e.g. without NetworkManager, it becomes difficult and annoying to
persistently disable power save.
For this reason a new driver flag POWER_SAVE_DISABLE is being
added. This can then be referenced when creating the interfaces
and if set, disable power save.
The driver_infos list in wiphy.c is hard coded and, naturally,
not configurable from a user perspective. As drivers are updated
or added users may be left with their system being broken until the
driver is added, IWD released, and packaged.
This adds the ability to define driver flags inside main.conf under
the "DriverQuirks" group. Keys in this group correspond to values in
enum driver_flag and values are a list of glob matches for specific
drivers:
[DriverQuirks]
DefaultInterface=rtl81*,rtl87*,rtl88*,rtw_*,brcmfmac,bcmsdh_sdmmc
ForcePae=buggy_pae_*
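A rough sketch of matching a driver name against one such value,
assuming POSIX fnmatch() is used for the globs and that the list is
comma separated as in the example above. The driver_matches_list()
helper and its 64 byte pattern limit are hypothetical:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if 'driver' matches any glob in a comma-separated list. */
static bool driver_matches_list(const char *list, const char *driver)
{
	char pattern[64];

	assert(list && driver);

	while (*list) {
		size_t len = strcspn(list, ",");

		/* Empty or oversized patterns are ignored in this sketch. */
		if (len > 0 && len < sizeof(pattern)) {
			memcpy(pattern, list, len);
			pattern[len] = '\0';

			if (fnmatch(pattern, driver, 0) == 0)
				return true;
		}

		list += len;
		if (*list == ',')
			list++;
	}

	return false;
}
```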
Rather than keep a pointer to the driver_info entry copy the flags
into the wiphy object. This preps for supporting driver flags via
a configuration file, specifically allowing for entries that are a
subset of others. For example:
{ "rtl88*", DEFAULT_IF },
{ "rtl88x2bu", FORCE_PAE },
Before it was not possible to add entries like this since only the
last entry match would get set. Now DEFAULT_IF would get set to all
matches, and FORCE_PAE to only rtl88x2bu. This isn't especially
important for the static list since it could be modified to work
correctly, but will be needed when parsing flags from a
configuration file that may contain duplicates or subsets of the
static list.
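A minimal sketch of accumulating flags over every matching entry,
using the two example entries above. The enum values, the struct
layout and the driver_flags_for() helper are assumptions made for
illustration:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical bit values; only the OR-ing behavior matters here. */
enum driver_flag {
	DEFAULT_IF = 1 << 0,
	FORCE_PAE = 1 << 1,
};

struct driver_info {
	const char *prefix;
	unsigned int flags;
};

static const struct driver_info driver_infos[] = {
	{ "rtl88*", DEFAULT_IF },
	{ "rtl88x2bu", FORCE_PAE },
};

/* OR together the flags of every entry matching 'driver'. */
static unsigned int driver_flags_for(const char *driver)
{
	unsigned int flags = 0;
	size_t i;

	assert(driver);

	for (i = 0; i < sizeof(driver_infos) / sizeof(driver_infos[0]); i++)
		if (fnmatch(driver_infos[i].prefix, driver, 0) == 0)
			flags |= driver_infos[i].flags;

	return flags;
}
```

Because the result is a copy of the combined flags rather than a
pointer to a single entry, overlapping entries no longer override
each other.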
If there was some problem during the FT authenticate stage
it's nice to know more of what happened: whether the AP didn't
respond, rejected the attempt, or sent an invalid frame/IEs.
In some situations it's convenient for the same work item to be
inserted (rescheduled) while it's in progress. FT for example does
this now if a roam fails. The same ft_work item gets re-inserted
which, currently, is not safe to do since the item is modified
and removed once completed.
Fix this by introducing wiphy_radio_work_reschedule which is an
explicit API for re-inserting work items from within the do_work
callback.
The wiphy work logic was changed around slightly to remove the item
at the head of the queue prior to starting and note the ID going
into do_work. If do_work signaled done and ID changed we know it
was re-inserted and can skip the destroy logic and move onto the
next item. If the item is not done continue as normal but set the
priority to INT_MIN, as usual, to prevent other items from getting
to the head of the queue.
This adds another radio so IWD hits the FT failure path after
authentication to the first BSS fails. This causes a wiphy work
item to be rescheduled which previously was unsafe.
If IWD connects under bad RF conditions and netconfig takes
a while to complete (e.g. slow DHCP), the roam timeout
could fire before DHCP is done. Then, after the roam,
IWD would transition automatically to connected before
DHCP was finished. In theory DHCP could still complete after
this point but any process depending on IWD's connected
state would be uninformed and assume IP networking is up.
Fix this by stopping netconfig prior to a roam if IWD is not
in a connected state. Then, once the roam either failed or
succeeded, start netconfig again.
iwctl quit (running quit non-interactively) isn't a useful command,
but it shouldn't segfault. Let's avoid calling readline functions if
we haven't initialized readline in this run.
When acting as a configurator the enrollee can start on a different
channel than IWD is connected to. IWD will begin the auth process
on this channel but tell the enrollee to transition to the current
channel after the auth request. Since a configurator must be
connected (a requirement IWD enforces) we can assume a channel
transition will always be to the currently connected channel. This
allows us to simply cancel the offchannel request and wait for a
response (rather than start another offchannel).
Doing this improves the DPP performance and reduces the potential
for a lost frame during the channel transition.
This patch also addresses the comment that we should wait for the
auth request ACK before canceling the offchannel. Now a flag is
set and IWD will cancel the offchannel once the ACK is received.
If IWD gets a disconnect during FT the roaming state will be
cleared, as well as any ft_info's during ft_clear_authentications.
This includes canceling the offchannel operation which also
destroys any pending ft_info's if !info->parsed. This causes a
double free afterwards. In addition the l_queue_remove inside the
foreach callback is not a safe operation either.
To fix this don't remove the ft_info inside the offchannel
destroy callback. The info will get freed by ft_associate regardless
of the outcome (parsed or !parsed). This is also consistent with
how the onchannel logic works.
Log and crash backtrace below:
iwd[488]: src/station.c:station_try_next_transition() 5, target aa:46:8d:37:7c:87
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16668
iwd[488]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 16669
iwd[488]: src/wiphy.c:wiphy_radio_work_done() Work item 16667 done
iwd[488]: src/wiphy.c:wiphy_radio_work_next() Starting work item 16668
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
iwd[488]: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
iwd[488]: src/netdev.c:netdev_deauthenticate_event()
iwd[488]: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
iwd[488]: src/netdev.c:netdev_disconnect_event()
iwd[488]: Received Deauthentication event, reason: 6, from_ap: true
iwd[488]: src/station.c:station_disconnect_event() 5
iwd[488]: src/station.c:station_disassociated() 5
iwd[488]: src/station.c:station_reset_connection_state() 5
iwd[488]: src/station.c:station_roam_state_clear() 5
iwd[488]: double free or corruption (fasttop)
5 0x0000555b3dbf44a4 in ft_info_destroy ()
6 0x0000555b3dbf45b3 in remove_ifindex ()
7 0x0000555b3dc4653c in l_queue_foreach_remove ()
8 0x0000555b3dbd0dd1 in station_reset_connection_state ()
9 0x0000555b3dbd37e5 in station_disassociated ()
10 0x0000555b3dbc8bb8 in netdev_mlme_notify ()
11 0x0000555b3dc4e80b in received_data ()
12 0x0000555b3dc4b430 in io_callback ()
13 0x0000555b3dc4a5ed in l_main_iterate ()
14 0x0000555b3dc4a6bc in l_main_run ()
15 0x0000555b3dc4a8e0 in l_main_run_with_signal ()
16 0x0000555b3dbbe888 in main ()
Hostapd commit bc36991791 now properly sets the secure bit on
message 1/4. This was addressed in an earlier IWD commit, but that
commit neglected to allow for backwards compatibility. The check is
fatal which now breaks earlier hostapd versions (older than 2.10).
Instead warn on this condition rather than reject the rekey.
Fixes: 7fad6590bd ("eapol: allow 'secure' to be set on rekeys")
After adding prefix matching the rule structure contained allocated
memory which was not being cleaned up on exit if rules still
remained in the list (removing the rule via DBus was done correctly).
The following commit:
80db8fd86c0c ("build: Use -Wvariadic-macros warning")
added a warning about variadic-macros. But it isn't quite clear why
since variadic macros are used throughout iwd. GCC doesn't honor this
option, but clang does. Since there's no real reason to stop using
variadic macros at this time, drop this warning.
The HT40+/- flags were reversed when checking against the 802.11
behavior flags.
HT40+ means the secondary channel is above (+) the primary channel
therefore corresponds to the PRIMARY_CHANNEL_LOWER behavior. And
the opposite for HT40-.
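The corrected mapping can be sketched as follows. The enum names and
values are hypothetical stand-ins for the real flags; the 20MHz
offset between primary and secondary channel is the only arithmetic
involved:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enums for illustration only. */
enum ht40_flag {
	HT40_PLUS,
	HT40_MINUS,
};

enum channel_behavior {
	PRIMARY_CHANNEL_LOWER,
	PRIMARY_CHANNEL_UPPER,
};

/* HT40+ puts the secondary above, so the primary is the lower one. */
static enum channel_behavior ht40_to_behavior(enum ht40_flag flag)
{
	return flag == HT40_PLUS ? PRIMARY_CHANNEL_LOWER :
					PRIMARY_CHANNEL_UPPER;
}

/* Center frequency (MHz) of the secondary 20MHz channel. */
static uint32_t secondary_freq(uint32_t primary_mhz, enum ht40_flag flag)
{
	assert(primary_mhz >= 20);

	return flag == HT40_PLUS ? primary_mhz + 20 : primary_mhz - 20;
}
```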
Reported-By: Alagu Sankar <alagusankar@gmail.com>
Use a more appropriate printf conversion string in order to avoid
unnecessary implicit conversion which can lead to a buffer overflow.
Reasons similar to commit:
98b758f8934a ("knownnetworks: fix printing SSID in hex")
In the case that the FT target is on the same channel as we're currently
operating on, use ft_authenticate_onchannel instead of ft_authenticate.
Going offchannel in this case can confuse some drivers.
Currently when we try FT-over-Air, the Authenticate frame is always
sent via offchannel infrastructure. We request the driver to go
offchannel, then send the Authenticate frame. This works fine as long
as the target AP is on a different channel. On some networks some (or
all) APs might actually be located on the same channel. In this case
going offchannel will result in some drivers not actually sending the
Authenticate frame until after the offchannel operation completes.
Work around this by introducing a new ft_authenticate variant that will
not request an offchannel operation first.
Force conversion to unsigned char before printing to avoid sign
extension when printing SSID in hex. For example, if there are CJK
characters in SSID, it will generate a very long string like
/net/connman/iwd/ffffffe8ffffffaeffffffa1.
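A small sketch of the cast, assuming a hypothetical ssid_to_hex()
helper (not the function IWD uses). On platforms where char is
signed, passing a byte such as 0xe8 to "%02x" without the cast
promotes it to a negative int, which prints as ffffffe8:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hex-encode 'len' bytes of 'ssid' into 'out'. Returns false if the
 * result, including the terminating NUL, would not fit.
 */
static bool ssid_to_hex(const char *ssid, size_t len,
				char *out, size_t out_size)
{
	size_t i;

	assert(ssid && out);

	if (out_size < len * 2 + 1)
		return false;

	for (i = 0; i < len; i++)
		/* The cast keeps each byte in the range 0x00-0xff. */
		snprintf(out + i * 2, 3, "%02x", (unsigned char) ssid[i]);

	out[len * 2] = '\0';

	return true;
}
```

For the three bytes 0xe8 0xae 0xa1 this yields "e8aea1" rather than
the 24 character string shown above.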
If a very long SSID was used (e.g. CJK characters in SSID), an out
of bounds write to a static variable could occur because the position
was not checked before the last snprintf() call.
Seeing that some authenticators can't handle TLS session caching
properly, allow the EAP-TLS-based methods session caching support to be
disabled per-network using a method specific FastReauthentication setting.
Defaults to true.
With the previous commit, authentication should succeed at least every
other attempt. I'd also expect that EAP-TLS is not usually affected
because there's no phase2, unlike with EAP-PEAP/EAP-TTLS.
If we have a TLS session cached from this attempt or a previous
successful connection attempt but the overall EAP method fails, forget
the session to improve the chances that authentication succeeds on the
next attempt considering that some authenticators strangely allow
resumption but can't handle it all the way to EAP method success.
Logically the session resumption in the TLS layers on the server should
be transparent to the EAP layers so I guess those may be failed
attempts to further optimise phase 2 when the server thinks it can
already trust the client.
The extra IE length for the WMM IE was being set to 26 which is
the HT IE length, not WMM. Fix this and use the proper size for
the WMM IE of 50 bytes.
This shouldn't have caused any problems prior as the tail length
is always allocated with 256 or 512 extra bytes of headroom.
Since channel numbers are used as indexes into the array, and given
that channel numbers start at '1' instead of 0, make sure to allocate a
buffer large enough to not overflow when the max channel number for a
given band is accessed.
src/manager.c:manager_wiphy_dump_callback() New wiphy phy1 added (1)
==22290== Invalid write of size 2
==22290== at 0x4624B2: nl80211_parse_supported_frequencies (nl80211util.c:570)
==22290== by 0x417CA5: parse_supported_bands (wiphy.c:1636)
==22290== by 0x418594: wiphy_parse_attributes (wiphy.c:1805)
==22290== by 0x418E20: wiphy_update_from_genl (wiphy.c:1991)
==22290== by 0x464589: manager_wiphy_dump_callback (manager.c:564)
==22290== by 0x4CBDDA: process_unicast (genl.c:944)
==22290== by 0x4CC19C: received_data (genl.c:1056)
==22290== by 0x4C7140: io_callback (io.c:120)
==22290== by 0x4C5A97: l_main_iterate (main.c:476)
==22290== by 0x4C5BDC: l_main_run (main.c:523)
==22290== by 0x4C5F0F: l_main_run_with_signal (main.c:645)
==22290== by 0x40503B: main (main.c:600)
==22290== Address 0x4aa76ec is 0 bytes after a block of size 28 alloc'd
==22290== at 0x48417B5: malloc (vg_replace_malloc.c:393)
==22290== by 0x4BC4D1: l_malloc (util.c:62)
==22290== by 0x417BE4: parse_supported_bands (wiphy.c:1619)
==22290== by 0x418594: wiphy_parse_attributes (wiphy.c:1805)
==22290== by 0x418E20: wiphy_update_from_genl (wiphy.c:1991)
==22290== by 0x464589: manager_wiphy_dump_callback (manager.c:564)
==22290== by 0x4CBDDA: process_unicast (genl.c:944)
==22290== by 0x4CC19C: received_data (genl.c:1056)
==22290== by 0x4C7140: io_callback (io.c:120)
==22290== by 0x4C5A97: l_main_iterate (main.c:476)
==22290== by 0x4C5BDC: l_main_run (main.c:523)
==22290== by 0x4C5F0F: l_main_run_with_signal (main.c:645)
==22290==
This adds support for rekeys to AP mode. A single timer is used and
reset to the next station needing a rekey. A default rekey timer of
600 seconds is used unless the profile sets a timeout.
The only changes required were to set the secure bit for message 1,
reset the frame retry counter, and change the 2/4 verifier to use
the rekey flag rather than ptk_complete. This is because we must
set ptk_complete false in order to detect retransmissions of the
4/4 frame.
Initiating a rekey can now be done by simply calling eapol_start().
If IWD ends up dumping wiphy's twice (because of NEW_WIPHY event
soon after initial dump) it will also try and dump interfaces
twice leading to multiple DEL_INTERFACE calls. The second attempt
will fail with -ENODEV (since the interface was already deleted).
Just silently fail in this case and let the other DEL_INTERFACE
path handle the re-creation.
With really badly timed events a wiphy can be registered twice. This
happens when IWD starts and requests a wiphy dump. Immediately after
a NEW_WIPHY event comes in (presumably when the driver loads) which
starts another dump. The NEW_WIPHY event can't simply be ignored
since it could be a hotplug (e.g. USB card) so to fix this we can
instead just prevent it from being registered.
This does mean both dumps will happen but the information will just
be added to the same wiphy object.
Past commits should address any potential problems of the timer
firing during FT, but it's still good practice to cancel the timer
once it is no longer needed, i.e. once FT has started.
If station has already started FT ensure station_cannot_roam takes
that into account. Since the state has not yet changed it must also
check if the FT work ID is set.
Under the following conditions IWD can accidentally trigger a second
roam scan while one is already in progress:
- A low RSSI condition is met. This starts the roam rearm timer.
- A packet loss condition is met, which triggers a roam scan.
- The roam rearm timer fires and starts another roam scan while
also overwriting the first roam scan ID.
- Then, if IWD gets disconnected the overwritten roam scan gets
canceled, and the roam state is cleared which NULL's
station->connected_network.
- The initial roam scan results then come in with the assumption
that IWD is still connected which results in a crash trying to
reference station->connected_network.
This can be fixed by adding a station_cannot_roam check in the rearm
timer. If IWD is already doing a roam scan station->preparing_roam
should be set which will cause it to return true and stop any further
action.
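The guard can be sketched like this, with a heavily simplified
station structure. Only preparing_roam and a scan ID are modeled, and
the function bodies are assumptions made for illustration rather than
the real station.c logic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the state touched by roam scans. */
struct station {
	bool preparing_roam;
	uint32_t roam_scan_id;
};

static bool station_cannot_roam(const struct station *station)
{
	return station->preparing_roam;
}

/* Start a roam scan and remember its ID. */
static void station_start_roam_scan(struct station *station,
					uint32_t scan_id)
{
	assert(station);

	station->preparing_roam = true;
	station->roam_scan_id = scan_id;
}

/* Rearm timer callback: returns true if a new scan was started. */
static bool roam_rearm_timeout(struct station *station, uint32_t scan_id)
{
	/* A roam scan is already pending, don't overwrite its ID. */
	if (station_cannot_roam(station))
		return false;

	station_start_roam_scan(station, scan_id);

	return true;
}
```

If the packet loss path has already started scan 1, a later rearm
timeout leaves roam_scan_id untouched, so a disconnect cancels the
scan that is actually in flight.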
Aborting (signal 11) [/usr/libexec/iwd]
iwd[426]: ++++++++ backtrace ++++++++
iwd[426]: #0 0x7f858d7b2090 in /lib/x86_64-linux-gnu/libc.so.6
iwd[426]: #1 0x443df7 in network_get_security() at ome/locus/workspace/iwd/src/network.c:287
iwd[426]: #2 0x421fbb in station_roam_scan_notify() at ome/locus/workspace/iwd/src/station.c:2516
iwd[426]: #3 0x43ebc1 in scan_finished() at ome/locus/workspace/iwd/src/scan.c:1861
iwd[426]: #4 0x43ecf2 in get_scan_done() at ome/locus/workspace/iwd/src/scan.c:1891
iwd[426]: #5 0x4cbfe9 in destroy_request() at ome/locus/workspace/iwd/ell/genl.c:676
iwd[426]: #6 0x4cc98b in process_unicast() at ome/locus/workspace/iwd/ell/genl.c:954
iwd[426]: #7 0x4ccd28 in received_data() at ome/locus/workspace/iwd/ell/genl.c:1052
iwd[426]: #8 0x4c79c9 in io_callback() at ome/locus/workspace/iwd/ell/io.c:120
iwd[426]: #9 0x4c62e3 in l_main_iterate() at ome/locus/workspace/iwd/ell/main.c:476
iwd[426]: #10 0x4c6426 in l_main_run() at ome/locus/workspace/iwd/ell/main.c:519
iwd[426]: #11 0x4c6752 in l_main_run_with_signal() at ome/locus/workspace/iwd/ell/main.c:645
iwd[426]: #12 0x405987 in main() at ome/locus/workspace/iwd/src/main.c:600
iwd[426]: #13 0x7f858d793083 in /lib/x86_64-linux-gnu/libc.so.6
iwd[426]: +++++++++++++++++++++++++++
If the authenticator has already set an snonce then the packet must
be a retransmit. Handle this by sending 3/4 again but making sure
to not reset the frame counter.
Old wpa_supplicant versions do not set the secure bit on 2/4 during
rekeys which causes IWD to reject the message and eventually time out.
Modern versions do set it correctly but even Android 13 (Pixel 5a)
still uses an ancient version of wpa_supplicant which does not set the
bit.
Relax this check and instead just print a warning but allow the message
to be processed.
In try_handshake_complete() we return early if all the keys had
been installed before (initial associations). For rekeys we can
now emit the REKEY_COMPLETE event which lets AP mode reset the
rekey timer for that station.
When the TK is installed the 'ptk_installed' flag was never set to
zero. For initial associations this was fine (already zero) but for
rekeys the flag needs to be unset so try_handshake_complete knows
if the key was installed. This is consistent with how gtk/igtk keys
work as well.
Rekeys for station mode don't need to know when complete since
there is nothing to do once done. AP mode on the other hand needs
to know if the rekey was successful in order to reset/set the next
rekey timer.
The second handshake message was hard coded with the secure bit as
zero but for rekeys the secure bit should be set to 1. Fix this by
changing the 2/4 builder to take a boolean which will set the bit
properly.
It should be noted that hostapd doesn't check this bit so EAPoL
worked just fine, but IWD's checks are more strict.
The PEAP RFC wants implementations to enforce that Phase2 methods have
been successfully completed prior to accepting a successful result TLV.
However, when TLS session resumption is used, some servers will skip
phase2 methods entirely and simply send a Result TLV with a success
code. This results in iwd (erroneously) rejecting the authentication
attempt.
Fix this by marking phase2 method as successful if session resumption is
being used.
This adds a builder which sets the country IE in probes/beacons.
The IE will use the 'single subband triplet sequence' meaning
dot11OperatingClassesRequired is false. This is much easier to
build and doesn't require knowing an operating class.
The IE itself is variable in length and potentially could grow
large if the hardware has a weird configuration (many different
power levels or segmentation in supported channels) so the
overall builder was changed to take the length of the buffer and
warnings will be printed if any space issues are encountered.
IWD's channel/frequency conversions use simple math to convert and
have very minimal checks to ensure the input is valid. This can
lead to some channels/frequencies being calculated which are not
in IWD's E-4 table, specifically in the 5GHz band.
This is especially noticeable using mac80211_hwsim which includes
some obscure high 5GHz frequencies which are not part of the 802.11
spec.
To fix this calculate the frequency or channel then iterate E-4
operating classes to check that the value actually matches a class.
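As a rough sketch of "calculate, then validate", assuming a small
illustrative subset of 5GHz 20MHz channels instead of the full E-4
table (the table contents and the freq_to_channel_5ghz() helper are
hypothetical simplifications):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of 5GHz channels, not the complete E-4 table. */
static const uint8_t valid_5ghz_channels[] = {
	36, 40, 44, 48, 52, 56, 60, 64,
};

/* Return the channel for a 5GHz frequency, or 0 if it is not listed. */
static uint8_t freq_to_channel_5ghz(uint32_t freq)
{
	uint32_t channel;
	size_t i;

	/* The simple math alone accepts any multiple of 5MHz. */
	if (freq < 5005 || (freq - 5000) % 5)
		return 0;

	channel = (freq - 5000) / 5;

	/* Only accept results that appear in the table. */
	for (i = 0; i < sizeof(valid_5ghz_channels); i++)
		if (valid_5ghz_channels[i] == channel)
			return (uint8_t) channel;

	return 0;
}
```

A frequency such as 5925MHz passes the arithmetic (channel 185) but
is rejected because no table entry matches it.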
The 6GHz test was not incrementing the frequencies properly which
was resulting in invalid frequencies, but since the conversion
API was never linked to E-4 the test was still passing.
The country IE can sometimes have a zero pad byte at the end for
alignment. This was not being checked for which caused the loop
to go past the end of the IE and print an entry for channel 0
(the pad byte) plus some garbage data.
Fix this by checking for the pad byte explicitly which skips the
print and terminates the loop.
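A minimal sketch of the pad byte check, assuming the IE body is laid
out as a 3 byte country string followed by 3 byte subband triplets
and an optional single zero pad byte. The country_ie_count_triplets()
helper is hypothetical and only counts entries instead of printing
them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the subband triplets in a country IE body (the bytes after
 * the element ID and length). Returns -1 if the body is malformed.
 */
static int country_ie_count_triplets(const uint8_t *data, size_t len)
{
	size_t pos = 3;
	int count = 0;

	assert(data);

	if (len < 3)
		return -1;

	while (pos < len) {
		/* A lone trailing zero byte is padding, not a triplet. */
		if (len - pos == 1 && data[pos] == 0)
			break;

		if (len - pos < 3)
			return -1;

		count++;
		pos += 3;
	}

	return count;
}
```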
If supported this will include the HT capabilities and HT
operations elements in beacons/probes. Some shortcuts were taken
here since not all the information is currently parsed from the
hardware. Namely the HT operation element does not include the
basic MCS set. Still, this will at least show stations that the
AP is capable of more than just basic rates.
The builders themselves are structured similar to the basic rates
builder where they build only the contents and return the length.
The caller must set the type/length manually. This is to support
the two use cases of using with an IE builder vs direct pointer.
To include HT support a chandef needs to be created for whatever
frequency is being used. This allows IWD to provide a secondary
channel to the kernel in the case of 40MHz operation. Now the AP
will generate a chandef when starting based on the channel set
in the user profile (or default).
If HT is not supported the chandef width is set to 20MHz no-HT,
otherwise band_freq_to_ht_chandef is used.
The WMM parameter IE is expected by the linux kernel for any AP
supporting HT/VHT etc. IWD won't actually use WMM and it's not
clear exactly why the kernel uses this restriction, but regardless
it must be included to support HT.
For AP mode it's convenient for IWD to choose an appropriate
channel definition rather than require the user to provide very
low level parameters such as channel width, center1 frequency
etc. For now only HT is supported as VHT/HE etc. require
additional secondary channel frequencies.
The HT API tries to find an operating class using 40MHz which
complies with any hardware restrictions. If an operating class is
found that is supported/not restricted it is marked as 'best' until
a better one is found. In this case 'better' is a larger channel
width. Since this is HT only 20MHz and 40MHz widths are checked.
This adds some additional parsing to obtain the AMPDU parameter
byte as well as wiphy_get_ht_capabilities() which returns the
complete IE (combining the 3 separate kernel attributes).
The supported rates IE was being built in two places. This makes that
code common. Unfortunately it needs to support both an ie builder
and using a pointer directly which is why it only builds the contents
of the IE and the caller must set the type/length.
Move the l_netconfig_set_route_priority() and
l_netconfig_set_optimistic_dad_enabled() calls from netconfig_new, which
is called once for the l_netconfig object's lifetime, to
netconfig_load_settings, which is called before every connection attempt.
This is needed because we clean up the l_netconfig configuration by calling
l_netconfig_reset_config() at different points in connection setup and
teardown so we'd reset the route priority that we've set in netconfig_new,
back to 0 and never reload it.
The disabled_freqs list is being removed and replaced with a new
list in the band object. This completely removes the need for
the pending_freqs list as well since any regdom related dumps
can just overwrite the existing frequency list.
This adds two new APIs:
wiphy_get_frequency_info(): Used to get information about a given
frequency such as disabled/no-IR. This can also be used to check
if the frequency is supported (NULL return is unsupported).
wiphy_band_is_disabled(): Checks if a band is disabled. Note that
an unsupported band will also return true. Checking support should
be done with wiphy_get_supported_bands().
As additional frequency info is needed it doesn't make sense to
store a full list of frequencies for every attribute (i.e.
supported, disabled, no-IR, etc).
This changes nl80211_parse_supported_frequencies to take a list
of frequency attributes where each index corresponds to a channel,
and each value can be filled with flag bits to signal any
limitations on that frequency.
wiphy.c then had to be updated to use this rather than the existing
scan_freq_set lists. This, as-is, will break anything using
wiphy_get_disabled_freqs().
Currently the wiphy object keeps track of supported and disabled
frequencies as two separate scan_freq_set's. This is very expensive
and limiting since we have to add more sets in order to track
additional frequency flags (no-IR, no-HT, no-HE etc).
Instead we can refactor how frequencies are stored. They will now
be part of the band object and stored as a list of flag structures
where each index corresponds to a channel.
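
The layout described above can be modelled roughly as follows. This is a
Python sketch for illustration only, not IWD's C code; the FreqFlags bits,
the Band class and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntFlag
from typing import List, Optional


class FreqFlags(IntFlag):
    # Hypothetical flag bits; the real structure may differ.
    NONE = 0
    SUPPORTED = 1
    DISABLED = 2
    NO_IR = 4


@dataclass
class Band:
    # One entry per channel number; index 0 is unused.
    freq_attrs: List[FreqFlags] = field(default_factory=list)

    def get_frequency_info(self, channel: int) -> Optional[FreqFlags]:
        """Return the flags for a channel, or None if it is unsupported."""
        if channel <= 0 or channel >= len(self.freq_attrs):
            return None
        attr = self.freq_attrs[channel]
        return attr if attr & FreqFlags.SUPPORTED else None

    def is_disabled(self) -> bool:
        """True if no supported channel is usable (or the band has none)."""
        return not any(
            (a & FreqFlags.SUPPORTED) and not (a & FreqFlags.DISABLED)
            for a in self.freq_attrs
        )


band = Band([FreqFlags.NONE] * 15)
band.freq_attrs[1] = FreqFlags.SUPPORTED
band.freq_attrs[14] = FreqFlags.SUPPORTED | FreqFlags.DISABLED
```

Adding another attribute (no-HT, no-HE, ...) then only costs one more flag
bit instead of another full frequency set.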
IWD was optimizing FT-over-DS by authenticating to multiple BSS's
at the time of connecting which then made future roams slightly
faster since they could jump right into association. So far this
hasn't posed a problem but it was reported that some AP's actually
enforce a reassociation timeout (included in 4-way handshake).
Hostapd itself does no such enforcement but anything external to
hostapd could monitor FT events and clear the cache if any exceeded
this timeout.
For now remove the early action frames and treat FT-over-DS the
same as FT-over-Air. In the future we could parse the reassociation
timeout, batch out FT-Action frames and track responses but for the
time being this just fixes the issue at a small performance cost.
Queue the FT action just like we do with FT Authenticate which makes
it able to be used the same way, i.e. call ft_action() then queue
the ft_associate work right away.
A timer was added to end the work item in case the target never
responds.
If the regdom updates during a periodic scan the results will be
delayed until after the update in order to, potentially, add 6GHz
frequencies since they may become available. The delayed results
happen regardless of 6GHz support but scan_wiphy_watch() was
returning early if 6GHz was not supported causing the scan request
to never complete.
The blamed commit argues that the periodic scan callback doesn't do
anything useful in the event of an aborted scan, but this is not
entirely true. In particular, the callback is responsible for re-arming
the periodic scan timer. Make sure to call scan_finished() so that iwd's
periodic scanning logic continues unabated even when a periodic scan is
aborted.
Also remove the periodic boolean member of struct scan_request, as it
serves no purpose anymore.
Fixes: 6051a1495227 ("scan: Don't callback on SCAN_ABORTED")
This enables IWD to use 5GHz frequencies in AP mode. Currently
6GHz is not supported so we can assume a [General].Channel value
36 or above indicates the 5GHz band.
It should be noted that the system will probably need a regulatory
domain set in order for 5GHz to be allowed in AP mode. This is due
to world roaming (00) restricting any/all 5GHz frequencies. This
can be accomplished by setting main.conf [General].Country=CC to
the country this AP will operate in.
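
The channel-to-band assumption above can be sketched like this (Python,
illustrative only; ap_channel_to_freq is a hypothetical name, not an IWD
function). It relies on the standard 2.4GHz and 5GHz channel numbering and
on 6GHz not being in play.

```python
def ap_channel_to_freq(channel: int) -> int:
    """Map an AP channel number to a center frequency in MHz.

    Assumes only 2.4GHz and 5GHz are possible, so any channel of 36 or
    above is treated as 5GHz.
    """
    if channel >= 36:
        return 5000 + 5 * channel
    if channel == 14:
        return 2484
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    raise ValueError(f"unsupported channel {channel}")
```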
wiphy_get_supported_rates expected an enum defined in the nl80211
header but the argument type was an unsigned int, not exactly
intuitive to anyone using the API. Since the nl80211 enum value
was only used in a switch statement it could just as well be IWD's
internal enum band_freq.
This also allows modules which do not reference nl80211.h to use
wiphy_get_supported_rates().
Change wording to say that IPv6 support is enabled by default. No
functional changes.
Fixes: 00baa75e9633 ("netconfig: Enable IPV6 support by default")
Before this change, I noticed that some non-interactive commands
don't work,
$ iwctl version
$ iwctl help
while other ones do.
$ iwctl station wlan0 show
This seems to be a typo bug in the if clause checking for additional
arguments.
This file is a compilation command database used by clangd and ccls,
and can be generated by tools like https://github.com/rizsotto/Bear.
$ bear -- make clean all
If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes
that a scan is ongoing on the underlying wdev and will retry the same
command when that scan is complete. It gets notified of that completion
via the scan_notify() function, and kicks the scan logic to try again.
However, if there is another wdev on the same wiphy and that wdev has a
scan request in flight, the kernel will also return -EBUSY. In other
words, only one scan request per wiphy is permitted.
As an example, the brcmfmac driver can create an AP interface on the
same wiphy as the default station interface, and scans can be triggered
on that AP interface.
If -EBUSY is returned because another wdev is scanning, then iwd won't
know when it can retry the original trigger request because the relevant
netlink event will arrive on a different wdev. Indeed, if no scan
context exists for that other wdev, then scan_notify will return early
and the scan logic will stall indefinitely.
Instead, and in the event that no scan context matches, use it as a cue
to retry a pending scan request that happens to be destined for the same
wiphy.
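
A toy model of the retry logic described above (Python, illustrative only;
ScanModel and its methods are hypothetical and greatly simplified compared
to scan.c):

```python
from collections import defaultdict


class ScanModel:
    """Toy model: one pending-trigger queue per wdev, grouped by wiphy."""

    def __init__(self):
        self.contexts = {}                 # wdev_id -> wiphy_id
        self.pending = defaultdict(list)   # wdev_id -> requests awaiting retry
        self.retried = []

    def add_context(self, wdev_id, wiphy_id):
        self.contexts[wdev_id] = wiphy_id

    def trigger_busy(self, wdev_id, request):
        # The kernel returned -EBUSY; remember the request for later.
        self.pending[wdev_id].append(request)

    def scan_done(self, wdev_id, wiphy_id):
        """Handle a scan-complete event for any wdev on wiphy_id."""
        if wdev_id in self.contexts:
            targets = [wdev_id]
        else:
            # No context for this wdev: retry requests on the same wiphy.
            targets = [w for w, p in self.contexts.items() if p == wiphy_id]
        for w in targets:
            while self.pending[w]:
                self.retried.append((w, self.pending[w].pop(0)))
```

With a context only for wdev 1, a completion event for an unknown wdev 2 on
the same wiphy now retries wdev 1's pending request instead of being dropped.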
The previous commit added an invocation of known_networks_watch_add, but
never updated the module dependency graph.
Fixes: a793a41662b2 ("station, eapol: Set up eap-tls-common for session caching")
Use eap_set_peer_id() to set a string identifying the TLS server,
currently the hex-encoded SSID of the network, to be used as group name
and primary key in the session cache l_settings object. Provide pointers
to storage_eap_tls_cache_{load,sync} to eap-tls-common.c using
eap_tls_set_session_cache_ops(). Listen to Known Network removed
signals and call eap_tls_forget_peer() to have any session related to
the network also dropped from the cache.
Use l_tls_set_session_cache() to enable session cache/resume in the
TLS-based EAP methods. Sessions for all 802.1x networks are stored in
one l_settings object.
eap_{get,set}_peer_id() API is added for the upper layers to set the
identifier of the authenticator (or the supplicant if we're the
authenticator, if there's ever a use case for that.)
eap-tls-common.c can't call storage_eap_tls_cache_{load,sync}()
or known_networks_watch_add() (to handle known network removals) because
it's linked into some executables that don't have storage.o,
knownnetworks.o or common.o so an upper layer (station.c) will call
eap_tls_set_session_cache_ops() and eap_tls_forget_peer() as needed.
Minor changes to these two methods resulting from two rewrites of them.
Actual changes are:
* storage_tls_session_sync parameter is const,
* more specific naming,
* storage_tls_session_load will return an empty l_settings instead of
NULL so eap-tls-common.c doesn't have to handle this.
storage.c makes no assumptions about the group names in the l_settings
object and keeps no reference to that object, eap-tls-common.c is going
to maintain the memory copy of the cache since this cache and the disk
copy of it are reserved for EAP methods only.
A comma separated list as a string was ok for pure display purposes
but if any processing needed to be done on these values by external
consumers it really makes more sense to use a DBus array.
AP mode implements a few DBus methods/properties which are named
the same as station: Scan, Scanning, and GetOrderedNetworks. Allow
the Device object to work with these in AP mode by calling the
correct method if the Mode is 'ap'.
This wasn't being updated meaning the property is missing until a
scan is issued over DBus.
Rather than duplicate all the property changed calls they were all
factored out into a helper function.
Adds the MulticastDNS option globally to main.conf. If set all
network connections (when netconfig is enabled) will set mDNS
support into the resolver. Note that an individual network profile
can still override the global value if it sets MulticastDNS.
The AP mode device APIs were hacked together and only able to start
and stop an AP. Now that the AP interface has more functionality it's
best to use the DBus class template to access the full AP interface
capabilities.
The limitation of cipher selection in ap.c was added to allow p2p to
work. Now with the ability to specify ciphers in the AP config put the
burden on p2p to limit ciphers as it needs which is only CCMP according
to the spec.
These can now be optionally provided in an AP profile and provide a
way to limit what ciphers can be chosen. This still is dependent on
what the hardware supports.
The validation of these ciphers for station is done when parsing
the BSS RSNE but for AP mode there is no such validation and
potentially any supported cipher could be chosen, even if it's
incompatible with the type of key.
This change removes duplicate calls to display_table_footer(), in
station show.
Before this change, the bug caused an extra newline to be output every
time the table updated. This only occurred when the network was
disconnected.
$ iwctl
[iwd]# station wlan0 show
The netdev_copy_tk function was being hard coded with authenticator
set to false. This isn't important for any ciphers except TKIP but
now that AP mode supports TKIP it needs to be fixed.
The disabled cipher list contained a '.' instead of ',' which prevented
the subsequent ciphers from being disabled. This was only group management
ciphers so it didn't have any effect on the test.
The -F option is undocumented but allows you to pass an nl80211
family ID so iwmon doesn't ignore messages which don't match the
system's nl80211 family ID (i.e. pcaps from other systems).
This is somewhat of a pain to use since it's unclear what the other
system's family ID actually is until you run it through something
like wireshark. Instead iwmon can ignore the family ID when in
read mode which makes reading other systems' pcap files automatic.
Expand nlmon_create to be useful for both pcaps and monitoring. Doing
this also lets iwmon filter pcaps based on --no-ies,rtnl,scan etc
flags since they are part of the config.
Though TKIP is deprecated and insecure it's trivial to support it in
AP mode as we already do in station. This is only to allow AP mode
for old hardware that may only support TKIP. If the hardware supports
any higher level cipher that will be chosen automatically.
The __str__ function assumed station mode which throws an exception
if the device is in AP mode. Fix this as well as print out the mode
the device is in.
This API optimizes scanning to run tests quickly by only scanning
the frequencies which hostapd is using. But if a test doesn't use
hostapd this API raises an uncaught exception.
Check if hostapd is being used, and if not just do a full scan.
The key descriptor version was hard coded to HMAC_SHA1_AES which
is correct when using IE_RSN_AKM_SUITE_PSK + CCMP. ap.c hard
codes the PSK AKM but still uses wiphy to select the cipher. In
theory there could be hardware that only supports TKIP which
would then make IWD non-compliant since a different key descriptor
version should be used with PSK + TKIP (HMAC_MD5_ARC4).
Now use a helper to sort out which key descriptor should be used
given the AKM and cipher suite.
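
A minimal sketch of such a helper, limited to the PSK/802.1X AKMs and the
two ciphers discussed above (Python, illustrative only; the enum and
function names are hypothetical, the version numbers 1 and 2 are the
EAPoL-Key descriptor versions from IEEE 802.11):

```python
from enum import Enum, IntEnum


class Akm(Enum):
    PSK = "psk"
    IEEE_8021X = "8021x"


class Cipher(Enum):
    TKIP = "tkip"
    CCMP = "ccmp"


class KeyDescriptorVersion(IntEnum):
    HMAC_MD5_ARC4 = 1
    HMAC_SHA1_AES = 2


def key_descriptor_version(akm: Akm, pairwise: Cipher) -> KeyDescriptorVersion:
    """Pick the EAPoL-Key descriptor version for the legacy PSK/802.1X AKMs."""
    if akm in (Akm.PSK, Akm.IEEE_8021X):
        if pairwise is Cipher.TKIP:
            return KeyDescriptorVersion.HMAC_MD5_ARC4
        return KeyDescriptorVersion.HMAC_SHA1_AES
    # Other AKMs use different versions and are outside this sketch.
    raise NotImplementedError("other AKMs are outside this sketch")
```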
Similarly to l_netconfig track whether IWD's netconfig is active (from
the moment of netconfig_configure() till netconfig_reset()) using a
"started" flag and avoid handling or emitting any events after "started"
is cleared.
This fixes an occasional issue with the Netconfig Agent backend where
station would reset netconfig, netconfig would issue DBus calls to clear
addresses and routes, station would go into DISCONNECTING, perhaps
finish and go into DISCONNECTED and after a while the DBus calls would
come back with an error which would cause a NETCONFIG_EVENT_FAILED
causing station to call netdev_disconnect() for a second time and
transition to and get stuck in DISCONNECTING.
Add an additional optional PairwiseCipher property on
net.connman.iwd.StationDiagnostic interface that will hold the current
pairwise cipher in use for the connection.
Both CMD_ASSOCIATE and CMD_CONNECT paths were using very similar code to
build RSN specific attributes. Use a common function to build these
attributes to cut down on duplicated code.
While here, also start using ie_rsn_cipher_suite_to_cipher instead of
assuming that the pairwise / group ciphers can only be CCMP or TKIP.
Instead of copy-pasting the same basic operation (memcpy & assignment),
use a goto and a common path instead. This should also make it easier
for the compiler to optimize this function.
Commit c7640f8346 was meant to fix a sign compare warning
in clang because NLMSG_NEXT internally compares the length
with nlmsghdr->nlmsg_len which is a u32. The problem is that
NLMSG_NEXT can underflow an unsigned value, which is why it
expects an int type to be passed in.
To work around this we can instead pass a larger sized
int64_t which the compiler allows since it can upgrade the
unsigned nlmsghdr->nlmsg_len. There is no underflow risk
with an int64_t either because the buffer used is much
smaller than what can fit in an int64_t.
Fixes: c7640f8346 ("monitor: fix integer comparison error (clang)")
In nearly all cases the auto-line breaks can be on spaces in the
string. The only exception (so far) is DPP which displays a very
long URI without any spaces. When this is displayed a single
character was lost on each break. This was due to the current line
being NULL terminated prior to the next line string being returned.
To handle both cases the next line is copied prior to terminating
the current line. The offsets were modified slightly so the line
will be broken *after* the space, not on the space. This leaves
the space at the end of the current line which is invisible to the
user but allows the no-space case to also work without loss of
the last character (since that last character is where the space
normally would be).
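
The break-after-the-space idea can be illustrated with a short sketch
(Python, illustrative only; break_lines is a hypothetical stand-in for the
display code and ignores color escapes and wide characters):

```python
def break_lines(text: str, width: int) -> list:
    """Split text into chunks of at most `width` characters.

    A chunk ends *after* the last space that fits, so the space stays on
    the current line. With no space available the text is cut at `width`.
    """
    lines = []
    while len(text) > width:
        cut = text.rfind(" ", 0, width) + 1   # 0 when no space was found
        if cut == 0:
            cut = width
        lines.append(text[:cut])
        text = text[cut:]
    lines.append(text)
    return lines
```

Because no character is ever skipped at a break, joining the returned
chunks always reproduces the input, whether or not it contains spaces.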
Each color escape is tracked and the new_width is adjusted
accordingly. But if the color escape comes after a space which breaks
the line, the adjusted width ends up being too long since that escape
sequence isn't appearing on the current line. This causes the next
column to be shifted over.
The old 'max' parameter was being used both as an input and output
parameter which was confusing. Instead have next_line take the
column width as input, and output a new width which includes any
color escapes and wide characters.
In theory any input to this function should have valid utf-8 but
just in case the strings should be validated. This removes the
need to check the return of l_utf8_get_codepoint which is useful
since there is no graceful failure path at this point.
The known frequency list may include frequencies that once were
allowed but are now disabled due to regulatory restrictions. Don't
include these frequencies in the roam scan.
The utf-8 bytes were being counted as normal ascii so the
width maximum was not being increased to include
non-printable bytes like it is for color escape sequences.
This led to the row not printing enough characters which
affected the text further down the line.
Fix this by increasing 'max' when non-codepoint utf-8
characters are found.
This test was taking about 5 minutes to run, specifically
the requested scan test. One slight optimization is to
remove the duplicate hidden network, since there is no
need for two. In addition the requested scan test was
changed so it does not run a periodic scan and only issues a DBus
scan.
The CI was sometimes taking ~10-15 minutes to run just this
test. This is likely due to the test having 7 radios, which
is a lot of beacons/probes to process.
Disabling the unused hostapd instances drops the runtime down
to about 1 minute.
If a rule was disabled it would cause hwsim to not continue processing
frames using rules further in the queue. _Most_ tests only use one
rule so this shouldn't have changed their behavior but others which
use multiple rules may be affected and the tests have not been
running properly.
These events are sent if IWD fails to authenticate
(ft-over-air-roam-failed) or if it falls back to over air after
failing to use FT-over-DS (try-ft-over-air).
If IPv4 setup fails and the netconfig logic gives up, continue as if the
connection had failed at earlier stages so that autoconnect can try the
next available network.
Certain drivers support/require probe response offloading which
IWD did not check for or properly handle. If probe response
offloading is required the probe response frame watch will not
be added and instead the ATTR_PROBE_RESP will be included with
START_AP.
The head/tail builders were reused but slightly modified to check
if the probe request frame is NULL, since it will be for use with
START_AP.
Parse the AP probe response offload attribute during the dump. If
set this indicates the driver expects the probe response attribute
to be included with START_AP.
Clearing all authentications during ft_authenticate was a very large
hammer and may remove cached authentications that could be used if
the current auth attempt fails.
For example the best BSS may have a problem and fail to authenticate
early with FT-over-DS, then fail with FT-over-Air. But another BSS
may have succeeded early with FT-over-DS. If ft_authenticate clears
all ft_infos that successful authentication will be lost.
This tests the new behavior where the roam request does not
indicate disassociation is imminent. In this case if no
candidates are found IWD should not roam.
Instead of requiring the initial condition be met when calling
wait_for_object_change, wait for it.
This is how every caller of this function uses it, specifically
with roaming where we first wait for DeviceState.roaming, then
call wait_for_object_change. This can be simplified for the caller
so the initial condition is first waited for.
AP roaming was structured such that any AP roam request would
force IWD to roam (assuming BSS's were found in scan results).
This isn't always the best behavior since IWD may be connected
to the best BSS in range.
Only force a roam if the AP includes one of the 3 disassociation/
termination bits. Otherwise attempt to roam but don't set the
ap_directed_roaming flag, which allows IWD to stay with the
current BSS if no better candidates are found.
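
As a sketch of the check described above (Python, illustrative only;
must_roam is a hypothetical name). The bit positions are those of the
Request Mode field of a BSS Transition Management Request in IEEE 802.11;
that these are exactly the three bits IWD tests is an assumption.

```python
# Request Mode bits of a BSS Transition Management Request (IEEE 802.11).
PREFERRED_CANDIDATE_LIST_INCLUDED = 1 << 0
ABRIDGED = 1 << 1
DISASSOCIATION_IMMINENT = 1 << 2
BSS_TERMINATION_INCLUDED = 1 << 3
ESS_DISASSOCIATION_IMMINENT = 1 << 4

# Any of these means the current BSS is about to go away for this station.
FORCE_ROAM_MASK = (DISASSOCIATION_IMMINENT |
                   BSS_TERMINATION_INCLUDED |
                   ESS_DISASSOCIATION_IMMINENT)


def must_roam(request_mode: int) -> bool:
    """True if the request leaves no option of staying on the current BSS."""
    return bool(request_mode & FORCE_ROAM_MASK)
```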
There are a few checks that can be done prior to parsing the
request, in addition the explicit check for preparing_roam was
removed since this is taken care of by station_cannot_roam().
Once offchannel completes we can check if the info structure was
parsed, indicating authentication succeeded. If not there is no
reason to keep it around since IWD will either try another BSS or
fail.
This both adds proper handling to the new roaming logic and fixes
a potential bug with firmware roams.
The new way roaming works doesn't use a connect callback. This
means that any disconnect event or call to netdev_connect_failed
will result in the event handler being called, where before the
connect callback would. This means we need to handle the ROAMING
state in the station disconnect event so IWD properly disassociates
and station goes out of ROAMING.
With firmware roams netdev gets an event which transitions station
into ROAMING. Then netdev issues GET_SCAN. During this time a
disconnect event could come in which would end up in
station_disconnect_event since there is no connect callback. This
needs to be handled the same and let IWD transition out of the
ROAMING state.
This finalizes the refactor by moving all the handshake prep
into FT itself (most was already in there). The netdev-specific
flags and state were added into netdev_ft_tx_associate which
now avoids any need for a netdev API related to FT.
The NETDEV_EVENT_FT_ROAMED event is now emitted once FT completes
(netdev_connect_ok). This did require moving the 'in_ft' flag
setting until after the keys are set into the kernel otherwise
netdev_connect_ok has no context as to if this was FT or some
other connection attempt.
In addition the prev_snonce was removed from netdev. Restoring
the snonce has no value once association begins. If association
fails it will result in a disconnect regardless which requires
a new snonce to be generated.
This converts station to using ft_action/ft_authenticate and
ft_associate and dropping the use of the netdev-only/auth-proto
logic.
Doing this allows for more flexibility if FT fails by letting
IWD try another roam candidate instead of disconnecting.
Now the full action frame including the header is provided to ft
which breaks the existing parser since it assumes the buffer starts
at the body of the message.
This forwards Action, Authentication and Association frames to
ft.c via their new hooks in netdev.
Note that this will break FT-over-Air temporarily since the
auth-proto still is in use.
The current behavior is to only find the best roam candidate, which
generally is fine. But if for whatever reason IWD fails to roam it
would be nice having a few backup BSS's rather than having to
re-scan, or worse disassociate and reconnect entirely.
This patch doesn't change the roam behavior, just prepares for
using a roam candidate list. One difference though is any roam
candidates are added to station->bss_list, rather than just the
best BSS. This shouldn't affect any external behavior.
The candidate list is built based on scan_bss rank. First we establish
a base rank, the rank of the current BSS (or zero if AP roaming). Any
BSS in the results with a higher rank, excluding the current BSS, will
be added to the sorted station->roam_bss_list (as a new 'roam_bss'
entry) as well as station's overall BSS list. If the resulting list is
empty there were no better BSS's, otherwise station can now try to roam
starting with the best candidate (head of the roam list).
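
The candidate selection above can be sketched as follows (Python,
illustrative only; the Bss class and build_roam_list are hypothetical and
leave out everything except the rank comparison):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bss:
    addr: str
    rank: int


def build_roam_list(results: List[Bss], current: Optional[Bss],
                    ap_directed: bool = False) -> List[Bss]:
    """Return roam candidates, best first, excluding the current BSS."""
    # The base rank is zero for AP-directed roams, else the current rank.
    base_rank = 0 if ap_directed or current is None else current.rank
    candidates = [
        bss for bss in results
        if (current is None or bss.addr != current.addr)
        and bss.rank > base_rank
    ]
    return sorted(candidates, key=lambda b: b.rank, reverse=True)
```

An empty result means no better BSS was found; otherwise the first entry is
the one to try first and the rest serve as fallbacks.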
A new API was added, ft_authenticate, which will send an
authentication frame offchannel via CMD_FRAME. This bypasses
the kernel's authentication state allowing multiple auth
attempts to take place without disconnecting.
Currently netdev handles caching FT auth information and uses FT
parsers/auth-proto to manage the protocol. This sets up to remove
this state machine from netdev and isolate it into ft.c.
This does not break the existing auth-proto (hence the slight
modifications, which will be removed soon).
Eventually the auth-proto will be removed from FT entirely, replaced
just by an FT state machine, similar to how EAPoL works (netdev hooks
to TX/RX frames).
There may be situations (due to Multi-BSS operation) where an AP might
be advertising multiple SSIDs on the same BSSID. It is thus more
correct to lookup the preauthentication target on the network object
instead of the station bss_list. It used to be that the network list of
bsses was not updated when roam scan was performed. Hence the lookup
was always performed on the station bss_list. But this is no longer the
case, so it is safer to look up the target directly on the network
object.
The warnings in the authenticate and connect events were identical
so it could be difficult knowing which print it was if IWD is not
in debug mode (to see more context). The prints were changed to
indicate which event it was and for the connect event the reason
attribute is also parsed.
Note the resp_ies_len is also initialized to zero now. After making
the changes gcc was throwing a warning.
FT is special in that it really should not be interrupted. Since
FRAME/OFFCHANNEL have the highest priority we run the risk of
DPP or some other offchannel operation interfering with FT.
FT is now driven (mostly) by station which removes the connect
callback. Instead once FT is completed, keys set, etc. netdev
will send an event to notify station.
Since l_netconfig's DHCPv6 client instance no longer sets parameters on
the l_icmp6_client instance, call l_icmp6_client_set_nodelay() and
l_icmp6_client_set_debug() directly. Also enable optimistic DAD to
speed up IPv6 setup if available.
All uses of frame-xchg were for action frames, and the frame type
was hard coded. Soon other frame types will be needed so the type
must now be specified in the frame_xchg_prefix structure.
This will make the debug API more robust as well as fix issues
certain drivers have when trying to roam. Some of these drivers
may flush scan results after CMD_CONNECT which results in -ENOENT
when trying to roam with CMD_AUTHENTICATE unless you rescan
explicitly.
Now this will be taken care of automatically and station will first
scan for the BSS (or full scan if not already in results) and
attempt to roam once the BSS is seen in a fresh scan.
The logic to replace the old BSS object was factored out into its
own function to be shared by the non-debug roam scan. It was also
simplified to just update the network since this will remove the
old BSS if it exists.
Add a second netconfig-commit backend which, if enabled, doesn't
directly send any of the network configuration to the kernel or system
files but delegates the operation to an interested client's D-Bus
method as described in doc/agent-api.txt. This backend is switched to
when a client registers a netconfig agent object and is switched away
from when the client disconnects or unregisters the agent. Only one
netconfig agent can be registered at any given time.
Add netconfig_event_handler() that responds to events emitted by
the l_netconfig object by calling netconfig_commit, tracking whether
we're connected for either address family and emitting
NETCONFIG_EVENT_CONNECTED or NETCONFIG_EVENT_FAILED as necessary.
NETCONFIG_EVENT_FAILED is a new event as until now failures would cause
the netconfig state machine to stop but no event emitted so that
station.c could take action. As before, these events are only
emitted based on the IPv4 configuration state, not IPv6.
Add netconfig-commit.c whose main method, netconfig_commit actually sets
the configuration obtained by l_netconfig to the system netdev,
specifically it sets local addresses on the interface, adds routes to the
routing table, sets DNS related data and may add entries to the neighbor
cache. netconfig-commit.c uses a backend-ops type structure to allow
for switching backends. In this commit there's only a default backend
that uses l_netconfig_rtnl_apply() and a struct resolve object to write
the configuration.
netconfig_gateway_to_arp is moved from netconfig.c to netconfig-commit.c
(and renamed.) The struct netconfig definition is moved to netconfig.h
so that both files can access the settings stored in the struct.
To avoid repeated lookups by ifindex, replace the ifindex member in
struct netconfig with a struct netdev pointer. A struct netconfig
always lives shorter than the struct netdev.
* make the error handling simpler,
* make error messages more consistent,
* validate address families,
* for IPv4 skip l_rtnl_address_set_noprefixroute()
as l_netconfig will do this internally as needed.
* for IPv6 set the default prefix length to 64 as that's going to be
used for the local prefix route's prefix length and is a more
practical value.
Drop all the struct netconfig members where we were keeping the parsed
netconfig settings and add a struct l_netconfig object. In
netconfig_load_settings load all of the settings once parsed directly
into the l_netconfig object. Only preserve the mdns configuration and
save some boolean values needed to properly handle static configuration
and FILS. Update functions to use the new set of struct netconfig
members.
These booleans mirroring the l_netconfig state could be replaced by
adding l_netconfig getters for settings which currently only have
setters.
In anticipation of switching to use the l_netconfig API, which
internally handles DHCPv4, DHCPv6, ACD, etc., drop pointers to
instances of l_dhcp_client, l_dhcp6_client and l_acd from struct
netconfig. Also drop all code used for handling events from these
APIs, including code to commit the received configurations to the
system. Committing the final settings to the system netdevs is going to
be handled by a new set of utilities in a new file.
Update ConfigureIPv{4,6}() parameters to simplify mapping our sets of
addresses and routes directly to D-Bus dictionaries. Split Cancel()
into CancelIPv{4,6}().
The RRM module was blindly scanning using the requested
frequency which may or may not be possible given the hardware.
Instead check that the frequency will work and if not reject
the request.
This was reported by a user seeing the RRM scan fail which was
due to the AP requesting a scan on 5GHz when the adapter was
2.4GHz only.
The hostapd events for RRM come regardless of success
so they need to be checked as such. In addition one more
RRM request was added to scan on a frequency which is
disabled (operating class 82, channel 14).
Support for MAC address changes while powered was recently added to
mac80211. This avoids the need to power down the device which both
saves time as well as preserves any allowed frequencies which may
have been disabled if the device powered down.
The code path for changing the address was reused but now just the
'up' callback will be provided directly to l_rtnl_set_mac. Since
there aren't multiple stages of callbacks the rtnl_data structure
isn't strictly needed, but the code looks cleaner and more
consistent between the powered/non-powered code paths.
The comment/debug error print was also updated to be more general
between the two MAC change code paths.
Documentation for MulticastDNS setting suggests it should be part of the
main iwd configuration file. See man iwd.config. However, in reality
the setting was being pulled from the network provisioning file instead.
The latter actually makes more sense since systemd-resolved has its own
set of global defaults. Fix the documentation to reflect the actual
implementation.
Previously we had an ACD failure scenario where a new client forces its
IP to create an IP conflict and an already-connected client detects the
conflict and reacts. Now first test a scenario where a newly connecting
IWD client runs ACD before setting its statically configured IP, detects
a conflict and refuses to continue, then run the second scenario where
the newly connecting DHCP-configured client ignores the conflict and
starts ACD in defend-indefinitely mode and the older client in
defend-once mode gives up its IP.
Due to those variables being global (IWD class variables) calling either
unregister_psk_agent or del on one IWD class instance would unregister
all agents on all instances. Move .psk_agents and two other class
variables to the object. They were already referenced using "self."
as if they were object variables throughout the class.
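
The underlying Python behavior can be reproduced in isolation. The classes
below are hypothetical and only mirror the psk_agents naming:

```python
class SharedAgents:
    psk_agents = []          # class variable: one list for every instance

    def register(self, agent):
        self.psk_agents.append(agent)


class OwnAgents:
    def __init__(self):
        self.psk_agents = []  # instance variable: one list per object

    def register(self, agent):
        self.psk_agents.append(agent)


a, b = SharedAgents(), SharedAgents()
a.register("agent-a")
shared_seen_by_b = list(b.psk_agents)   # b sees the agent registered on a

c, d = OwnAgents(), OwnAgents()
c.register("agent-c")
own_seen_by_d = list(d.psk_agents)      # d is unaffected by c
```

Accessing the list through "self." does not make it per-object; only
assigning it in __init__ does.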
Part of static_test.py starts a second IWD instance and tries to make
it connect to the AP with the same IP address as the first IWD instance
which is already connected, to produce an IP conflict. For this, the
second instance uses DHCP and the test expects the DHCP server to offer
the address 192.168.1.10 to it. However in the current setup the DHCP
server manages to detect that 192.168.1.10 is in use and offers .11
instead. Break the DHCP server's conflict detection by disabling ICMP
ping replies in order to fix the test.
Previously this has worked because the AP's and the DHCP server's
network interface is in the same network namespace as the first IWD
instance's network interface meaning that pings between the two
interfaces shouldn't work (a known Linux kernel routing quirk...).
I am not sure why those pings currently do work but take no chances and
disable ICMP pings.
netdev does not keep any pointers to struct scan_bss arguments that are
passed in. Make this explicitly clear by modifying the API definitions
and mark these as const.
This adds a few utilities for setting up an FT environment. All the
roaming tests basically copy/paste the same code for setting up the
hostapd instances and this can cause problems if not done correctly.
set_address() sets the MAC address on the device, and restarts hostapd
group_neighbors() takes a list of HostapdCLI objects and makes each a
neighbor to the others.
The neighbor report element requires the operating class which isn't
advertised by hostapd. For this we assume operating class 81 but this
can be set explicitly if it differs. Currently no roaming tests use
5/6GHz frequencies, and just in case an exception will be thrown if
the channel is greater than 14 and the op_class didn't change.
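
The op_class guard described above amounts to something like the following
sketch (the function name and exception type are hypothetical, not the
actual test utility):

```python
def neighbor_op_class(channel: int, op_class: int = 81) -> int:
    """Return the operating class to advertise for a neighbor.

    Operating class 81 only covers 2.4GHz channels, so a higher channel
    with the default left in place is almost certainly a mistake.
    """
    if channel > 14 and op_class == 81:
        raise ValueError("channel %d needs an explicit op_class" % channel)
    return op_class
```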
The packet loss test had a few problems. First being that the RSSI for
the original BSS was not low enough to change the rank. This meant any
roam was just lucky that the intended BSS was first in the results.
The second problem is timing related, and only happens on UML. Disabling
the rules after the roaming condition sometimes allows IWD to fully
roam and connect before the next state change checks.
A new test which blocks all data frames once connected, then tries
to send 100 packets. This should result in the kernel sending a
packet loss event to userspace, which IWD should react to and roam.
This adds a new netdev event for packet loss notifications from
the kernel. Depending on the scenario a station may see packet
loss events without any other indications like low RSSI. In these
cases IWD should still roam since there is no data flowing.
The limitations of readline required that the autocompletion choose
a 'default' device. With multiple phys this doesn't work. Now the
readline limitation has been worked around and station can look up
the device for the command completion.
There is a limitation of libreadline where no context/userdata
can be passed to completion functions. This affects iwctl since
the entity value isn't known to completion functions.
Workarounds such as getting the default device are employed but
it's not a great solution.
Instead hack around this limitation by parsing the prompt to
extract the entity (second arg). Then use a generic match function
given to readline which can call the actual match function and
include the entity.
The ATTR_ARRAY type was quite limited, only supporting u16/u32 and
addresses. This changes the union to a struct so nested/function
can be defined along with array_type.
Some APs use an older hostapd OWE implementation which incorrectly
derives the PTK. To work around this group 19 should be used for
these APs. If there is a failure (reason=2) and the AKM is OWE
set force default group into network and retry. If this has been
done already the behavior is no different and the BSS will be
blacklisted.
If an OWE network is buggy and requires the default group this info
needs to be stored in network in order for it to set this into the
handshake on future connect attempts.
This functionality works around the kernel's behavior of allowing
6GHz only after a regulatory domain update. If the regdom updates
scan.c needs to be aware in order to split up periodic scans, or
insert 6GHz frequencies into an ongoing periodic scan. Doing this
allows any 6GHz BSS's to show up in the scan results rather than
needing to issue an entirely new scan to see these BSS's.
The kernel's regulatory domain updates after some number of beacons
are processed. This triggers a regulatory domain update (and wiphy
dump) but only after a scan request. This means a full scan started
prior to the regdom being set will not include any 6GHz BSS's even
if the regdom was unlocked during the scan.
This can be worked around by splitting up a large scan request into
multiple requests allowing one of the first commands to trigger a
regdom update. Once the regdom updates (and wiphy dumps) we are
hopefully still scanning and could append an additional request to
scan 6GHz.
In the case of an external scan, we won't have a scan_request object,
sr. Make sure to not crash in this case.
Also, since scan_request can no longer carry the frequency set in all
cases, add a new member to scan_results in order to do so.
Fixes: 27d8cf4ccc59 ("scan: track scanned frequencies for entire request")
The kernel handles setting the regulatory domain by receiving beacons
which set the country IE. Presumably since most regulatory domains
disallow 6GHz the default (world) domain also disables it. This means
until the country is set, 6GHz is disabled.
This poses a problem for IWD's quick scanning since it only scans a few
frequencies and this likely isn't enough beacons for the firmware to
update the country, leaving 6GHz inaccessible to the user without manual
intervention (e.g. iw scan passive, or periodic scans by IWD).
To try and work around this limitation the quick scan logic has been
updated to check if a 6GHz AP has been connected to before and if that
frequency is disabled (but supported). If this is the case IWD will opt
for a full passive scan rather than scanning a limited set of
frequencies.
For whatever reason the kernel will send regdom updates even if
the regdom didn't change. This ends up causing wiphy to dump
which isn't needed since there should be no changes in disabled
frequencies.
Now the previous country is checked against the new one, and if
they match the wiphy is not dumped again.
A change in regulatory domain can result in frequencies being
enabled or disabled depending on the domain. This affects the
frequencies stored in wiphy which other modules depend on
such as scanning, offchannel work etc.
When the regulatory domain changes re-dump the wiphy in order
to update any frequency restrictions.
A helper to check whether the country code corresponds to a
real country, or some special code indicating the country isn't
yet set. For now, the special codes are OO (world roaming) and
XX (unknown entity).
Events to indicate when a regulatory domain wiphy dump has
started and ended. This is important because certain actions
such as scanning need to be delayed until the dump has finished.
The NEW_SCAN_RESULTS handling was written to only parse the frequency
list if there were no additional scan commands to send. This results in
the scan callback containing frequencies of only the last CMD_TRIGGER.
Until now this worked fine because a) the queue is only used for hidden
networks and b) frequencies were never defined by any callers scanning
for hidden networks (e.g. dbus/periodic scans).
Soon the scan command queue will be used to break up scan requests
meaning only the last scan request frequencies would be used in the
callback, breaking the logic in station.
Now the NEW_SCAN_RESULTS case will parse the frequencies for each scan
command rather than only the last.
The compiler treated the '1' as an int type which was not big enough
to hold a bit shift of 31:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
Instead of doing the iftype check manually, refactor
wiphy_get_supported_iftypes by adding a subroutine which just parses
out iftypes from a mask into a char** list. This removes the need to
case each iftype into a string.
Add extra logging around CQM events to help track wifi status. This is
useful for headless systems that can only be accessed over the network
and so information in the logs is invaluable for debugging outages.
Prior to this change, the only log for CQM messages is saying one was
received. This adds details to what attributes were set and the
associated data with them.
The signal strength log format was chosen to roughly match
wpa_supplicant's which looks like this:
CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-60 noise=-96 txrate=6000
Provides useful information on why a roam might have failed, such as
failing to find the BSS or the BSS being ranked lower, and why that
might be.
The output format is the same as station_add_seen_bss for consistency.
If a frequency is disabled IWD should keep track and disallow any
operations on that channel such as scanning. A new list has been added
which contains only disabled frequencies.
The scan_passive API wasn't using a const struct scan_freq_set as it
should be since it's not modifying the contents. Changing this to
const did require some additional changes like making the scan_parameters
'freqs' member const as well.
After changing scan_parameters, p2p needed updating since it was using
scan_parameters.freqs directly. This was changed to using a separate
scan_freq_set pointer, then setting to scan_parameters.freqs when needed.
Similar to the HT/VHT APIs, this estimates the data rate based on the
HE Capabilities element, in addition to our own capabilities. The
logic is much the same as HT/VHT. The major difference being that HE
uses several MCS tables depending on the channel width. Each width
MCS set is checked (if supported) and the highest estimated rate out
of all the MCS sets is used.
There appears to be a compiler bug with gcc 11.2 which thinks the vht_mcs_set
is a zero length array, and the memset of size 8 is out of bounds. This is only
seen once an element is added to 'struct band'.
In file included from /usr/include/string.h:519,
from src/wiphy.c:34:
In function ‘memset’,
inlined from ‘band_new_from_message’ at src/wiphy.c:1300:2,
inlined from ‘parse_supported_bands’ at src/wiphy.c:1423:11,
inlined from ‘wiphy_parse_attributes’ at src/wiphy.c:1596:5,
inlined from ‘wiphy_update_from_genl’ at src/wiphy.c:1773:2:
/usr/include/bits/string_fortified.h:59:10: error: ‘__builtin_memset’ offset [0, 7] is out of the bounds [0, 0] [-Werror=array-bounds]
59 | return __builtin___memset_chk (__dest, __ch, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
60 | __glibc_objsize0 (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
In test-band the band object was allocated using l_malloc, but not
memset to zero. This will cause problems if allocated pointers are
included in struct band once band is freed.
This increases the maximum data rate which now is possible with HE.
A few comments were also updated, one to include 6G when adjusting
the rank for >4000MHz, and the other fixing a typo.
This is a general way of finding the best MCS/NSS values which will work
for HT, VHT, and HE by passing in the max MCS values for each value which
the MCS map could contain (0, 1, or 2).
The HE capabilities information is contained in
NL80211_BAND_ATTR_IFTYPE_DATA where each entry is a set of attributes
which define the rules for one or more interface types. This patch
specifically parses the HE PHY and HE MCS data which will be used for
data rate estimation.
Since the set of info is per-iftype(s) the data is stored in a queue
where each entry contains the PHY/MCS info, and a uint32 bit mask where
each bit index signifies an interface type.
With the addition of HE, the print function for MCS sets needs to change
slightly. The maps themselves are the same format, but the values indicate
different MCS ranges. Now the three MCS max values are passed in.
This queue will hold iftype(s) specific data for HE capabilities. Since
the capabilities may differ per-iftype the data is stored as such. Iftypes
may share a configuration so the band_he_capabilities structure has a
mask for each iftype using that configuration.
When PCI adapters are properly configured they should exist in the
vfio-pci system tree. It is assumed any devices configured as such
are used for test-runner.
This removes the need for a hw.conf file to be supplied, but one is still
required for USB adapters. Because of this the --hw option was
updated to allow no value, or a file path.
subprocess.Popen's wait() method was overwritten to be non-blocking but
in certain circumstances you do want to wait forever. Fix this to allow
timeout=None, which calls the parent wait() method directly.
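A rough sketch of the timeout=None behavior, assuming a hypothetical
Process subclass that polls instead of blocking. The real test-runner class
is not shown in this log and will differ; only the fall-through to the
parent wait() is taken from the description above.

```python
import subprocess
import time

class Process(subprocess.Popen):
    """Hypothetical subclass used only to illustrate the wait() override."""

    def wait(self, timeout=10):
        # timeout=None means "wait forever": defer to the blocking parent.
        if timeout is None:
            return super().wait()

        # Otherwise poll with a deadline so the caller is never stuck.
        deadline = time.monotonic() + timeout
        while self.poll() is None:
            if time.monotonic() > deadline:
                raise TimeoutError("process did not exit in time")
            time.sleep(0.05)
        return self.returncode
```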
Certain module dependencies were missing, which could cause a crash on
exit under (very unlikely) circumstances.
#0 l_queue_peek_head (queue=<optimized out>) at ../iwd-1.28/ell/queue.c:241
#1 0x0000aaaab752f2a0 in wiphy_radio_work_done (wiphy=0xaaaac3a129a0, id=6)
at ../iwd-1.28/src/wiphy.c:2013
#2 0x0000aaaab7523f50 in netdev_connect_free (netdev=netdev@entry=0xaaaac3a13db0)
at ../iwd-1.28/src/netdev.c:765
#3 0x0000aaaab7526208 in netdev_free (data=0xaaaac3a13db0) at ../iwd-1.28/src/netdev.c:909
#4 0x0000aaaab75a3924 in l_queue_clear (queue=queue@entry=0xaaaac3a0c800,
destroy=destroy@entry=0xaaaab7526190 <netdev_free>) at ../iwd-1.28/ell/queue.c:107
#5 0x0000aaaab75a3974 in l_queue_destroy (queue=0xaaaac3a0c800,
destroy=destroy@entry=0xaaaab7526190 <netdev_free>) at ../iwd-1.28/ell/queue.c:82
#6 0x0000aaaab7522050 in netdev_exit () at ../iwd-1.28/src/netdev.c:6653
#7 0x0000aaaab7579bb0 in iwd_modules_exit () at ../iwd-1.28/src/module.c:181
In this particular case, wiphy module was de-initialized prior to the
netdev module:
Jul 14 18:14:39 localhost iwd[2867]: ../iwd-1.28/src/wiphy.c:wiphy_free() Freeing wiphy phy0[0]
Jul 14 18:14:39 localhost iwd[2867]: ../iwd-1.28/src/netdev.c:netdev_free() Freeing netdev wlan0[45]
Since we use git ls-files to produce the list of all tests for -A, if
the source directory is owned by somebody other than root one might
get:
fatal: unsafe repository ('/home/balrog/repos/iwd' is owned by someone else)
To add an exception for this directory, call:
git config --global --add safe.directory /home/balrog/repos/iwd
Starting
/home/balrog/repos/iwd/tools/..//autotests/ threw an uncaught exception
Traceback (most recent call last):
File "/home/balrog/repos/iwd/tools/run-tests", line 966, in run_auto_tests
subtests = pre_test(ctx, test, copied)
File "/home/balrog/repos/iwd/tools/run-tests", line 814, in pre_test
raise Exception("No hw.conf found for %s" % test)
Exception: No hw.conf found for /home/balrog/repos/iwd/tools/..//autotests/
Mark args.testhome as a safe directory on every run.
Test that the DHCPv4 lease got renewed after the T1 timer runs out.
Then also simulate the DHCPREQUEST during renew being lost and
retransmitted and the lease eventually getting renewed T1 + 60s later.
The main downside is that this test will inevitably take a while if
running in Qemu without the time travel ability.
Update the test and some utility code to run hostapd in an isolated net
namespace for connection_test.py. We now need a second hostapd
instance though because in static_test.py we test ACD and we need to
produce an IP conflict. Moving the hostapd instance unexpectedly fixes
dhcpd's internal mechanism to avoid IP conflicts and it would no longer
assign 192.168.1.10 to the second client, it'd notice that address was
already in use and assign the next free address, or fail if there was
none. So add a second hostapd instance that runs in the main namespace
together with the statically-configured client, it turns out the test
relies on the kernel being unable to deliver IP traffic to interfaces on
the same system.
The kernel will not let us test some scenarios of communication between
two hwsim radios (e.g. STA and AP) if they're in the same net namespace.
For example, when connected, you can't add normal IPv4 subnet routes for
the same subnet on two different interfaces in one namespace (you'd
either get an EEXIST or you'd replace the other route), you can set
different metrics on the routes but that won't fix IP routing. For
testNetconfig the result is that communication works for DHCP before we
get the initial lease but renewals won't work because they're unicast.
Allow hostapd to run on a radio that has been moved to a different
namespace in hw.conf so we don't have to work around these issues.
The --start option was directly passed to the kernel init parameter,
preventing any environment setup from happening.
Instead always use 'run-tests' as the init process but detect --start
and execute that binary/script once inside the environment.
The table header needs to be adjusted to include spaces between
columns.
The 'adapter list' command was also updated to shorten a few
columns which were quite long for the data that is actually displayed.
The existing color code escape sequences required the user to set the
color, write the string, then unset with COLOR_OFF. Instead the macros
can be made to take the string itself and automatically terminate the
color with COLOR_OFF. This makes for much more concise strings.
ad-hoc, ap, and wsc all had descriptions longer than the max width but
this is now taken care of automatically. Remove the tabs and newlines
from the description.
The WSC pin command description was also changed to be more accurate
since the pin does not need to be 8 digits.
There was no easy to use API for printing the contents of a table, and it
was left up to the caller to handle manually. This adds display_table_row
which makes displaying tables much easier, including automatic support
for line truncation and continuation on the next line.
Lines which are too long will be truncated and displayed on the next
line while also taking into account any colored output. This works with any
number of columns.
This removes the need for the module to play games with encoding newlines
and tabs to make the output look nice.
As a start, this functionality was added to the command display.
This fixes a crash associated with toggling the iftype to AP mode
then calling GetDiagnostics. The diagnostic interface is never
cleaned up when netdev goes down so DBus calls can still be made
which ends up crashing since the AP interface objects are no longer
valid.
Running the following iwctl commands in a script (once or twice)
triggers this crash reliably:
iwctl device wlp2s0 set-property Mode ap
iwctl device wlp2s0 set-property Mode station
iwctl device wlp2s0 set-property Mode ap
iwctl ap wlp2s0 start myssid secret123
iwctl ap wlp2s0 show
++++++++ backtrace ++++++++
0 0x7f8f1a8fe320 in /lib64/libc.so.6
1 0x451f35 in ap_dbus_get_diagnostics() at src/ap.c:4043
2 0x4cdf5a in _dbus_object_tree_dispatch() at ell/dbus-service.c:1815
3 0x4bffc7 in message_read_handler() at ell/dbus.c:285
4 0x4b5d7b in io_callback() at ell/io.c:120
5 0x4b489b in l_main_iterate() at ell/main.c:476
6 0x4b49a6 in l_main_run() at ell/main.c:519
7 0x4b4cd9 in l_main_run_with_signal() at ell/main.c:645
8 0x404f5b in main() at src/main.c:600
9 0x7f8f1a8e8b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
The generic proxy property display was limited to a width for names/value
which makes the output look nice and uniform, but will cut off any values
which are longer than this limit.
This patch adds some logic to detect this, and continue displaying the
value on the next line.
The width arguments were also updated to be unsigned, which allows checking
the length without a cast.
testAgent had a few tests which weren't reliable, and one was not
actually testing anything, or at least not what the name implied it
should be testing.
The first issue was using iwctl in the first place. There is not a
reliable way to know when iwctl has registered its agent so relying on
that with a sleep, or waiting for the service to become available isn't
100% foolproof. To fix this use the updated PSKAgent which allows
multiple agents to be registered. This ensures the agent is ready for requests.
This test was also renamed to be consistent with what it's actually
testing: that IWD uses the first agent registered.
This removes test_connection_with_other_agent as well because this test
case is covered by the client test itself. There is no need to re-test
iwctl's agent functionality here.
By creating a new bus connection for each agent we can register multiple
with IWD. This did mean the agent interface needs to be unique for each
agent (removing _agent_manager_if) as well as tracking multiple agents
in a list.
IWD uses a few pragmas to ignore warnings which clang does
not support. For -Werror builds these cause build failures
but can be fixed by ignoring unknown warnings and pragmas.
The dbus proxy code assumes that every interface has a set of
properties registered in a 'proxy_interface_property' structure,
assuming the interface has any properties at all. If the interface
is assumed to have no properties (and no property table) but
actually does, the property table lookup fails but is assumed to
have succeeded and causes a crash.
This caused iwctl to crash after some properties were added to DPP
since the DPP interface previously had no properties.
Now, check that the property table was valid before accessing it. This
should allow properties to be added to new interfaces without crashing
older versions of iwctl.
About a month ago hostapd was changed to set the secure bit on
eapol frames during rekeys (bc36991791). The spec is ambiguous
about this and has conflicting info depending on the sections you
read (12.7.2 vs 12.7.6). According to the hostapd commit log TGme
is trying to clarify this and wants to set secure=1 in the case
of rekeys. Because of this, IWD is completely broken with rekeys
since it disallows secure=1 on PTK 1/4 and 2/4.
Now, a bool is passed to the verify functions which signifies if
the PTK has been negotiated already. If secure differs from this
the key frame is not verified.
The test here is verifying that a DBus Connect() call will still
work with 'other' agents registered. In this case it uses iwctl to
set a password, then call Connect() manually.
The problem here is that we have no way of knowing when iwctl fully
starts and registers its agent. There was a sleep in there but that
is unreliable and we occasionally were still getting past that without
iwctl having started fully.
To fix this properly we need to wait for iwctl's agent service to appear
on the bus. Since the bus name is unknown we must first find all names,
then cross reference their PID's against the iwctl PID. This is done
using ListNames, and GetConnectionUnixProcessID APIs.
The rekey/reauth logic was broken in a few different ways.
For rekeys the event list was not being reset so any past 4-way
handshake would allow the call to pass. This actually removes
the need for the sleep in the extended key ID test because the
actual handshake event is waited for correctly.
For both rekeys and reauths, just waiting for the EAP/handshake
events was not enough. Without checking if the client got
disconnected we essentially allow a full disconnect and reconnect,
meaning the rekey/reauth failed.
Now a 'disallow' array can be passed to wait_for_event which will
throw an exception if any events in that array are encountered
while waiting for the target event.
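The 'disallow' idea can be sketched as follows. This is a simplified
stand-in that walks an in-memory list of events; the real wait_for_event
reads from a hostapd socket, and the event strings used in the usage note
are only illustrative.

```python
def wait_for_event(events, target, disallow=()):
    """Hypothetical sketch: return once target is seen, in arrival order."""
    for event in events:
        # A disallowed event seen before the target is a hard failure.
        if event in disallow:
            raise Exception(
                "event '%s' seen while waiting for '%s'" % (event, target))
        if event == target:
            return event
    raise TimeoutError("event '%s' never arrived" % target)
```

With this shape, a rekey check that passes
disallow=["AP-STA-DISCONNECTED"] fails as soon as a disconnect shows up
ahead of the handshake event, instead of accepting a full reconnect.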
Yet another weird UML quirk. The intent of this test was to ensure
the profile gets encrypted, and to check this both the mtime and
contents of the profile were checked.
But under UML the profile is copied, IWD started, and the profile
is encrypted all without any time passing. The (same) mtime was
then updated without any changes which fails the mtime check.
This puts a sleep after copying the profile to ensure the system
time differs once IWD encrypts the profile.
In UML if any process dies while test-runner is waiting for the DBus
service or some socket to be available it will block forever. This
is due to the way the non_block_wait works.
It's not optimal but it essentially polls over some input function
until the conditions are met. And, depending on the input function,
this can cause UML to hang since it never has a chance to go idle
and advance the time clock.
This can be fixed, at least for services/sockets, by sleeping in
the input function allowing time to pass. This will then allow
test-runner to bail out with an exception.
This patch adds a new wait_for_service function which handles this
automatically, and wait_for_socket was refactored to behave
similarly.
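A minimal sketch of the sleeping poll described above, assuming a
hypothetical signature for wait_for_socket(); the refactored function in
test-runner may take different arguments.

```python
import os
import time

def wait_for_socket(path, timeout=10.0, interval=0.1):
    """Hypothetical sketch: poll for a path, sleeping between checks."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "%s did not appear within %ss" % (path, timeout))
        # Sleeping gives the system a chance to go idle so that time can
        # advance and the deadline can actually be reached.
        time.sleep(interval)
    return path
```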
This function was checking if the process object exists, which can
persist long after a process is killed, or dies unexpectedly. Check
that the actual PID exists by sending signal 0.
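For illustration, a PID liveness check using signal 0 might look like the
following. The function name pid_exists() is an assumption, not necessarily
the name used in test-runner.

```python
import os

def pid_exists(pid):
    """Return True if a process with this PID currently exists."""
    try:
        # Signal 0 performs error checking only; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True
```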
The man pages (iwd.network) have a section about how to name provisioning
files containing non-alphanumeric characters but not everyone reads the
entire man page.
Warning them that the provisioning file was not read and pointing to
'man iwd.network' should lead someone in the right direction.
EAP-Success might come in with an identifier that is incremented by 1
from the last Response packet. Since identifier field is a byte, the
value might overflow (from 255 -> 0.) This overflow isn't handled
properly resulting in EAP-Success/Failure packets with a 0 identifier
due to overflow being erroneously ignored. Fix that.
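The wraparound can be illustrated with a small sketch. The fix itself is
in C; this is only a model of the comparison, with a hypothetical function
name, showing why the increment must be truncated to one byte.

```python
def is_next_identifier(last_id, rx_id):
    """True if rx_id is last_id + 1, wrapping at the one-byte boundary."""
    return rx_id == (last_id + 1) & 0xFF

# Without the mask, 255 + 1 is 256, which no one-byte identifier can equal.
naive = (0 == 255 + 1)
wrapped = is_next_identifier(255, 0)
```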
test_decryption_failure is quite simple and only verifies that a known
network exists after starting. This causes the test to end before IWD can
fully start up leaving the DBus utilities in limbo having not fully
initialized.
Then, on the next test, stale InterfaceAdded signals arrive (for Station
and P2P) which throw exceptions when trying to get the bus (since IWD is
long gone). In addition the next IWD instance has started so any paths
included in the InterfaceAdded signals are bogus and cause additional
exceptions.
At the end of this test we can call list_devices() which will wait for
the InterfaceAdded signal, and cleanly exit afterwards.
An earlier commit fixed several options but ended up breaking others. The
result_parent/monitor_parent options are hidden from the user and only meant
to be passed to the kernel but they relied on the fact that the underscore
was present, not a dash. This updates the argument to use a dash:
--result-parent
--monitor-parent
Fixes: 00e41eb0ff ("test-runner: Fix parsing for some arguments")
The new regex match update was actually matching way more than it should
have due to how python's 'match' API works. 'match' will return successfully
if zero or more characters match from the beginning of the string. In this
case we actually need the entire regex to match otherwise we start matching
all prefixes, for example:
"--verbose iwd" will match iwd, iwd-dhcp, iwd-acd, iwd-genl and iwd-tls.
Instead use re.fullmatch which requires the entire string to match the
regex.
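The difference is easy to reproduce. In this sketch the process list and
the selected() helper are made up for illustration; only re.match and
re.fullmatch are the real APIs being compared.

```python
import re

processes = ["iwd", "iwd-dhcp", "iwd-acd", "iwd-genl", "iwd-tls", "hostapd"]

def selected(pattern, names, matcher):
    """Return the names for which matcher(pattern, name) succeeds."""
    return [name for name in names if matcher(pattern, name)]

# re.match only anchors at the start, so every "iwd" prefix is accepted.
prefix_hits = selected("iwd", processes, re.match)

# re.fullmatch requires the whole name to match the expression.
exact_hits = selected("iwd", processes, re.fullmatch)
```

Here prefix_hits contains all five iwd-prefixed names while exact_hits
contains only "iwd".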
This was copy pasted from the autoconnect test, and depending on
how the python module cache is ordered can incorrectly use the
wrong test class. This should not happen because we insert
the paths to the head of the list but for consistency the class
should be named something that reflects what the test is doing.
Enabling this ends up dumping so much logging and, at least with namespaces,
seems to break the logger module and cause really weird behavior, worst of
which is that all processes start dumping to stdout.
This can still be enabled explicitly with --verbose iwd-rtnl, but is turned
off by default when --log is used.
Add a fake resolvconf executable to verify that the right nameserver
addresses were actually committed by iwd. Again use unique nameserver
addresses to reduce the possibility that the test succeeds by pure luck.
In static_test.py add IPv6. Add comments on what we're actually testing
since it wasn't very clear. After the expected ACD conflict detection,
succeed if either the lost address was removed or the client disconnected
from the AP since this seems like a correct action for netconfig to
implement.
In iwd.py make sure all the static methods that touch IWD storage take the
storage_dir parameter instead of hardcoding IWD_STORAGE_DIR, and make
sure that parameter is actually used.
Create the directory if it doesn't exist before copying files into it.
This fixes a problem in testNetconfig where
`IWD.copy_to_storage('ssidTKIP.psk', '/tmp/storage')`
would result in /tmp/storage being created as a file, rather than a
directory containing a file, and resulting in IWD failing to start with:
`Failed to create /tmp/storage`
runner.py creates /tmp/iwd but that doesn't account for IWD sessions
with a custom storage dir path.
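A minimal sketch of the fix, assuming copy_to_storage() takes a source path
and a storage directory as in the call quoted above. The real method lives
on the IWD class and may do more than this.

```python
import os
import shutil

def copy_to_storage(source, storage_dir):
    """Sketch: make sure storage_dir is a directory before copying into it."""
    # Without this, shutil.copy() would create storage_dir as a plain file
    # when the path does not exist yet.
    os.makedirs(storage_dir, exist_ok=True)
    return shutil.copy(source, storage_dir)
```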
Extend test_ip_address_match to support IPv6 and to test the
netmask/prefix length while it reads the local address since those are
retrieved using the same API.
Modify testNetconfig to validate the prefix lengths, change the prefix
lengths to be less common values (not 24 bits for IPv4 or 64 for IPv6),
minor cleanup.
Currently the parameter values reach run-tests by first being parsed by
runner.py's RunnerArgParser, then the resulting object members being
encoded as a commandline string, then as environment variables, then the
environment being converted to a python string list and passed to
RunnerCoreArgParser again. Where argument names (like --sub-tests) had
dashes, the object members had underscores (.sub_tests), this wasn't
taken into account when building the python string list from environment
variables so convert all underscores to dashes and hope that all the
names match now.
Additionally some arguments used nargs='1' or nargs='*' which resulted
in their python values becoming lists. They were converted back to command
line arguments such as: --sub_tests ['static_test.py'], and when parsed
by RunnerCoreArgParser again, the values ended up being lists of lists.
In all three cases it seems the actual user of the parsed value actually
expects a single string with comma-separated substrings in it so just drop
the nargs= uses.
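The round trip can be reproduced with plain argparse. The parsers below are
stand-ins for RunnerArgParser/RunnerCoreArgParser, not the real classes, and
only show how nargs wraps a value in a list and how re-encoding that list as
a string compounds the problem.

```python
import argparse

with_nargs = argparse.ArgumentParser()
with_nargs.add_argument("--sub-tests", nargs=1)

without_nargs = argparse.ArgumentParser()
without_nargs.add_argument("--sub-tests")

# First pass: nargs wraps the single value in a list.
first = with_nargs.parse_args(["--sub-tests", "static_test.py"])

# Re-encoding with str() keeps the brackets and quotes.
reencoded = str(first.sub_tests)

# Second pass: the stringified list is wrapped in yet another list.
second = with_nargs.parse_args(["--sub-tests", reencoded])

# Without nargs the value stays a plain string on every pass.
plain = without_nargs.parse_args(["--sub-tests", "static_test.py"])

# The attribute name uses underscores; the flag needs dashes again.
flag = "--" + "sub_tests".replace("_", "-")
```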
Most users of storage_network_open don't log errors when the function
returns a NULL and fall back to defaults (empty l_settings).
storage_network_open() itself only logs errors if the file is encrypted.
Now also log an error when l_settings_load_from_file() fails to help track
down potential syntax errors.
Drop the wrong negation in the error check. Check that there are no extra
characters after prefix length suffix. Reset errno to 0 before the strtoul
call, as recommended by the manpage.
This is actually a false positive only because
p2p_device_validate_conn_wfd bails out if the IE is NULL which
avoids using wfd_data_len. But it's subtle and without inspecting
the code it does seem like the length could be used uninitialized.
src/p2p.c:940:7: error: variable 'wfd_data_len' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
if (dev->conn_own_wfd)
^~~~~~~~~~~~~~~~~
src/p2p.c:946:8: note: uninitialized use occurs here
wfd_data_len))
^~~~~~~~~~~~
src/p2p.c:940:3: note: remove the 'if' if its condition is always true
if (dev->conn_own_wfd)
^~~~~~~~~~~~~~~~~~~~~~
src/p2p.c:906:23: note: initialize the variable 'wfd_data_len' to silence this warning
ssize_t wfd_data_len;
^
= 0
Though the documentation for NLMSG_OK uses an int type for the length
the actual check is based on nlmsghdr->nlmsg_len which is a 32 bit
unsigned integer. Clang was complaining about one call in nlmon.c
because nlmsg_len was int type. Every other usage in nlmon.c uses
a uint32_t, so use that both for consistency and to fix the warning.
monitor/nlmon.c:7998:29: error: comparison of integers of different
signs: '__u32' (aka 'unsigned int') and 'int'
[-Werror,-Wsign-compare]
for (nlmsg = iov.iov_base; NLMSG_OK(nlmsg, nlmsg_len);
^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/linux/netlink.h:100:24: note: expanded from macro 'NLMSG_OK'
(nlh)->nlmsg_len <= (len))
On musl-gcc the compiler is giving a warning for igtk_key_index
and gtk_key_index being used uninitialized. This isn't possible
since they are only used if gtk/igtk are non-NULL so pragma to
ignore the warning.
src/fils.c: In function 'fils_rx_associate':
src/fils.c:580:17: error: 'igtk_key_index' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
580 | handshake_state_install_igtk(fils->hs,
igtk_key_index,igtk + 6,
igtk_len - 6, igtk);
(same error for gtk_key_index)
Since commit 922fa099721903b106a7bc1ccd1ffe8c4a7bce69 in hostap, our
setting of config_methods on P2P-client interface was ignored. Work
around that commit, in addition to the previous workaround we have in
this test, to again ensure the correct config_methods value is used.
This was lazily copied from UML but really made no sense in the context
of QEMU. First QEMU needs the virtfs option to define the mount tag and
in addition a 9p mount should be used rather than 'hostfs'.
The glob match was completely broken for --verbose because globs
are actually path matches, not generally for strings. Instead
match based on regular expressions.
First the verbose option was fixed to store it as an array as well
as write any list arguments into the kernel command line properly
(str() would include []). This has worked up until now because the
'in' keyword in python will work on strings just as well
as lists, for example:
>>> 'test' in 'this,is,a,test'
True
Then, the glob match was replaced with a regex match. Any exceptions
are caught and somewhat ignored (printed, but only seen with --debug).
This only guards against fatal exceptions from a user passing an
invalid expression.
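Both points can be sketched briefly. The is_verbose() helper is
hypothetical, and the use of re.fullmatch here is an assumption made so
that a pattern has to cover the whole process name.

```python
import re

# 'in' on a comma-joined string is a substring test, not a membership test.
joined = "this,is,a,test"
as_list = joined.split(",")
substring_hit = "tes" in joined   # True, although "tes" was never listed
list_hit = "tes" in as_list       # False

def is_verbose(process, patterns):
    """Hypothetical helper: True if any pattern matches the process name."""
    for pattern in patterns:
        try:
            if re.fullmatch(pattern, process):
                return True
        except re.error as e:
            # An invalid expression from the user is reported, not fatal.
            print("invalid verbose pattern %r: %s" % (pattern, e))
    return False
```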
For network configuration files the man pages (iwd.network) state
that [General].{AlwaysRandomizeAddress,AddressOverride} are only
used if main.conf has [General].AddressRandomization=network.
This actually was not being enforced and both iwd.network settings
were still taken into account regardless of what AddressRandomization
was set to (even disabled).
The handshake setup code now checks the AddressRandomization value
and if anything other than 'network' skips the randomization.
This bit of code was throwing exceptions if a test cleaned up files that
test-runner was expecting to clean up. Specifically testHotspot swaps out
main.conf and PSK files many times. This led to the exception being thrown,
caught, and ignored but further on test-runner would print:
"File _X_ not cleaned up!"
Now each file is checked for existence before test-runner tries to remove it.
There were a few places in dpp/dpp-util which passed a single byte that
was then read with va_arg(va, size_t). On some architectures this was
causing failures, presumably from the compiler using an integer type
smaller than size_t. As we do elsewhere, cast to size_t to force the
compiler to pass a properly sized integer to va_arg.
Similarly to ofono/phonesim allow tests to be skipped if wpa_supplicant
is not found on the system.
This required some changes to DPP/P2P where Wpas() should be called first
since this can now throw a SkipTest exception.
The Wpas class was also made to allow __del__ to be called without
throwing additional exceptions in case wpa_supplicant was not found.
This allows the EAP tests to pass, but the fix really needs to be in
hostapd itself. Hostapd currently tries to lookup the EAP session
immediately after receiving EAPOL_REAUTH. This uses the identity
it has stored which, in the case of PEAP/TTLS, will always be a phase2
identity. During this initial lookup hostapd hard codes the identity
to be phase1 which is not true for PEAP/TTLS, and the lookup fails.
Previously this was done by importing collections and using
collections.Mapping. This has been deprecated since python 3.3
but kept working until python 3.10. As of python 3.10 this
no longer works, and Mapping must be imported from collections.abc.
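The import change amounts to the following. The helper is_mapping is only
illustrative and does not exist in the test code.

```python
# Mapping must come from collections.abc; the alias in collections is deprecated
from collections.abc import Mapping

def is_mapping(obj):
    """Return True for dict-like objects."""
    return isinstance(obj, Mapping)
```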
This was passing IFNAME= along with EAPOL_REAUTH which does not work
in the context of a hostapd socket where the iface is already implied.
This fixes that issue as well as resets the events array and actually
waits for the required events afterwards.
After one of the eap-tls-common-based methods succeeds keep the TLS
tunnel instance until the method is freed, rather than free it the
moment the method succeeds. This fixes repeated method runs where until
now each next run would attempt to create a new TLS tunnel instance
but would have no authentication data (CA certificate, client
certificate, private key and private key passphrase) since those
were owned by the old l_tls object from the moment of the
l_tls_set_auth_data() call.
Use l_tls_reset() to reset the TLS state after method success, followed
by a new l_tls_start() when the reauthentication starts.
The signal agent notifications were changed which breaks this test.
Specifically commit ce227e7b94 sends a notification when connected
which breaks the 'agent.calls' check. Since this check is done both
after connecting and once already connected the initial value may
be 1 or 0. Because of this that check was removed entirely.
This test was just piping the PSK files into /tmp/iwd/ssidCCMP.psk
which is a bit fragile if the storage dir was ever to change. Instead
use copy_to_storage and the 'name' keyword to copy the file.
If the user specifies the same parent directory for several outfiles
skip mounting since it already exists. For example:
--monitor /outfiles/monitor.txt --result /outfiles/result.txt
Inside the virtual environments /tmp is mounted as its own FS and not
taken from the host. This poses issues if any output files are directly
under /tmp since test-runner tries to mount the parent directory (/tmp).
This can be fixed by ensuring these output files are either not under
/tmp or at least one folder down the tree (e.g. /tmp/outputs/outfile.txt).
Now this requirement is enforced and test-runner will not start if any
output file's parent directory is /tmp.
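A rough sketch of both rules (mount each parent directory once, and refuse
files directly under /tmp). The function names are hypothetical; the real
checks live in test-runner and may be structured differently.

```python
import os

def check_outfile(path):
    """Return the parent directory to mount, rejecting files directly under /tmp."""
    parent = os.path.dirname(os.path.abspath(path))
    if parent == '/tmp':
        raise ValueError("%s: output files cannot be directly under /tmp" % path)
    return parent

def parents_to_mount(paths):
    """Collect unique parent directories so each is mounted only once."""
    mounts = []
    for p in paths:
        parent = check_outfile(p)
        if parent not in mounts:
            mounts.append(parent)
    return mounts
```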
Usually the test home directory is a git repo somewhere e.g. under
/home. But if the home directory is located under /tmp this poses
a problem since UML remounts /tmp. To handle both cases mount
the home directory explicitly.
Certain aspects of QEMU like mounting host directories may still require
root access but for UML this is not the case. To handle both cases first
check if SUDO_UID/GID are set and use those to obtain the actual user's
IDs. Otherwise, if running as non-root, use the UID/GID of the user
directly.
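The lookup order can be sketched as below. real_ids is a hypothetical
helper, and the optional environ argument exists only to make the example
easy to exercise.

```python
import os

def real_ids(environ=None):
    """Return (uid, gid) of the invoking user, even when run through sudo."""
    environ = os.environ if environ is None else environ
    if 'SUDO_UID' in environ and 'SUDO_GID' in environ:
        return int(environ['SUDO_UID']), int(environ['SUDO_GID'])
    # Not run through sudo: the process IDs are already the user's own
    return os.getuid(), os.getgid()
```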
A user reported that IWD was failing to FT in some cases and this was
due to the AP setting the Retry bit in the frame type. This was
unexpected by IWD since it directly checks the frame type against
0x00b0 which does not account for any B8-B15 bits being set.
IWD doesn't need to verify the frame type field for two reasons:
first, mpdu_validate checks the management frame type; second, the kernel
checks prior to forwarding the event. Because of this the check was
removed completely.
Reported-By: Michael Johnson <mjohnson459@gmail.com>
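To illustrate why the exact comparison failed, here is a small sketch. The
constant and function names are made up for this example; the bit positions
follow the 802.11 Frame Control layout (type/subtype in B2-B7, Retry in
B11). Note that the actual fix removed the check rather than masking it.

```python
FC_TYPE_SUBTYPE_MASK = 0x00fc   # B2-B7: type and subtype
FC_RETRY = 0x0800               # B11: Retry
FC_AUTHENTICATION = 0x00b0      # management type, subtype 11

def is_auth_strict(frame_control):
    """The old style check: any flag in B8-B15 makes it fail."""
    return frame_control == FC_AUTHENTICATION

def is_auth_masked(frame_control):
    """Compare only the type/subtype bits, ignoring flags such as Retry."""
    return (frame_control & FC_TYPE_SUBTYPE_MASK) == FC_AUTHENTICATION
```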
station_signal_agent_notify() has been refactored so that its usage is
simpler. station_rssi_level_changed() has been replaced by an inlined
call to station_signal_agent_notify().
The call to netdev_rssi_level_init() in netdev_connect_common() is
currently a no-op, because netdev->connected has not yet been set at
this stage of the connection attempt. Because netdev_rssi_level_init()
is only used twice, it's been replaced by two inlined calls to
netdev_set_rssi_level_idx().
The SignalLevelAgent API is currently broken by the system bus's
security policy, which blocks iwd's outgoing method call messages. This
patch punches a hole for method calls on the
net.connman.iwd.SignalLevelAgent interface.
There may be situations where DNS information should not be set (for
example in auto-tests where the resolver daemon is not running) or if a
user wishes to ignore DNS information obtained.
Allows granularly specifying the DHCP logging level. This allows the
user to tailor the output to what they need. By default, always display
info, errors and warnings to match the rest of iwd.
Setting `IWD_DHCP_DEBUG` to "debug", "info", "warn", "error" will limit
the logging to that level or higher allowing the default logging
verbosity to be reduced.
Setting `IWD_DHCP_DEBUG` to "1" as per the current behavior will
continue to enable debug level logging.
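The level selection could be modeled like this. It is a Python sketch of
the described behavior, not the C implementation; treating an unset or
unrecognized value as the default is an assumption.

```python
LEVELS = ['debug', 'info', 'warn', 'error']  # least to most severe

def dhcp_log_level(value):
    """Map an IWD_DHCP_DEBUG value to the minimum level that gets logged."""
    if value == '1':
        return 'debug'      # legacy behavior: enable everything
    if value in LEVELS:
        return value
    return 'info'           # default: info, warnings and errors

def should_log(message_level, value):
    """Return True if a message of message_level passes the filter."""
    return LEVELS.index(message_level) >= LEVELS.index(dhcp_log_level(value))
```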
Use scapy library which allows one to easily construct and fudge various
network packets. This makes constructing spoofed packets much easier
and more readable compared to hex-encoded, hand-crafted frames.
The TA/BSSID addresses of spoofed disassociate frames were set
incorrectly. They should be using the 02:00:00:XX:XX:XX address, but
instead were being converted over to 42:00:00:XX:XX:XX address.
After the initial handshake, once the TK has been installed, all frames
coming to the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to various attacks.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted EAPoL frames
received after the initial handshake has been completed.
After the initial handshake, once the TK has been installed, all frames
coming from the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to a denial-of-service attack
where receipt of an invalid, unencrypted EAP-Failure frame generated by
an adversary results in iwd terminating an ongoing connection.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted EAP frames
received after the initial handshake has been completed.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
After the initial handshake, once the TK has been installed, all frames
coming from the AP should be encrypted. However, it seems that some
kernel/driver combinations allow unencrypted EAPoL frames to be received
and forwarded to userspace. This can lead to a denial-of-service attack
where receipt of an invalid, unencrypted EAPoL 1/4 frame generated by an
adversary results in iwd terminating an ongoing connection.
Some drivers can report whether the EAPoL frame has been received
unencrypted. Use this information to drop unencrypted PTK 1/4 frames
received after the initial handshake has been completed.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
Do not fail an ongoing handshake when an invalid EAPoL frame is
received. Instead, follow the intent of 802.11-2020 section 12.7.2:
"EAPOL-Key frames containing invalid field values shall be silently
discarded."
This prevents a denial-of-service attack where receipt of an invalid,
unencrypted EAPoL 1/4 frame generated by an adversary results in iwd
terminating an ongoing connection.
Reported-by: Domien Schepers <schepers.d@northeastern.edu>
Periodic scan requests are meant to be performed with a lower priority
than normal scan requests. They're thus given a different priority when
inserting them into the wiphy work queue. Unfortunately, the priority
is not taken into account when they are inserted into the
sr->requests queue. This can result in the scanning code being confused
since it assumes the top of the queue is always the next scheduled or
currently ongoing scan. As a result any further wiphy_work might never be
started properly.
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 3
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_next() Starting work item 3
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/scan.c:scan_periodic_timeout() 1
Apr 27 16:34:40 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 4
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_done() Work item 3 done
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/wiphy.c:wiphy_radio_work_next() Starting work item 5
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_request_triggered() Passive scan triggered for wdev 1
Apr 27 16:34:43 iwd[5117]: ../iwd-1.26/src/scan.c:scan_periodic_triggered() Periodic scan triggered for wdev 1
In the above log, scan request 5 (triggered by dbus) is started before
scan request 4 (periodic scan). Yet the scanning code thinks scan
request 4 was triggered.
Fix this by using the wiphy_work priority to sort the sr->requests queue
so that the scans are ordered in the same manner.
Reported-by: Alvin Šipraga <ALSI@bang-olufsen.dk>
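The ordering fix can be pictured with a small priority insert. This is a
sketch only: the dictionary layout and the convention that a lower
'priority' value runs first are assumptions made for the example.

```python
def insert_by_priority(queue, request):
    """Insert request after all entries of equal or higher priority.

    Lower 'priority' values are served first; FIFO order is kept
    among requests of the same priority.
    """
    pos = len(queue)
    for i, queued in enumerate(queue):
        if queued['priority'] > request['priority']:
            pos = i
            break
    queue.insert(pos, request)
```

With this, a periodic scan queued first still ends up behind a later
normal-priority scan, matching the order in which the work queue starts
them.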
update_config=1 lets wpa_supplicant write config changes
to the config file. In the real world this is what you want
so your DPP credentials are persistent. But for testing this
is not correct since multiple tests use the same config file
and expect it to be pristine.
Occasionally wpa_supplicant was connecting to the AP without
running DPP because the config already had the network
credentials.
The upstream code immediately retransmitted any no-ACK frames.
This would work in cases where the peer wasn't actively switching
channels (e.g. the ACK was simply lost) but caused unintended
behavior in the case of a channel switch when the peer was not
listening.
If either IWD or the peer needed to switch channels based on the
authenticate request the response may end up not getting ACKed
because the peer is idle, or in the middle of the hardware changing
channels. IWD would get no-ACK and immediately send the frame again
and most likely the same behavior would result. This would very
quickly increment frame_retry past the maximum and DPP would fail.
Instead when no ACK is received wait 1 second before sending out
the next frame. This can re-use the existing frame_pending buffer
and the same logic for re-transmitting.
There is a potential corner case of an offchannel frame callback
being called after ROC has ended.
This could happen in theory if a received frame is queued right as
the ROC session expires. If the ROC cancel event makes it to user
space before the frame, IWD will schedule another ROC and then receive
the frame. This doesn't prevent IWD from sending out another
frame since OFFCHANNEL_TX_OK is used, but it will prevent IWD from
receiving a response frame since no dwell duration is used with DPP.
To handle this an roc_started bool was added to the dpp_sm which
tracks the ROC state. If dpp_send_frame is called when roc_started
is false the frame will be saved and sent out once the ROC session
is started again.
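A toy model of the deferral logic, written in Python for brevity. Only
roc_started and frame_pending are names from the description above; the
class and callback names are hypothetical.

```python
class DppSm:
    """Toy model: defer frames while no ROC session is active."""

    def __init__(self):
        self.roc_started = False
        self.frame_pending = None
        self.sent = []

    def send_frame(self, frame):
        if not self.roc_started:
            # Offchannel work has not started: keep the frame for later
            self.frame_pending = frame
            return
        self.sent.append(frame)

    def roc_started_cb(self):
        self.roc_started = True
        if self.frame_pending is not None:
            frame, self.frame_pending = self.frame_pending, None
            self.send_frame(frame)

    def roc_ended_cb(self):
        self.roc_started = False
```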
ConnectHiddenNetwork creates a temporary network object and initiates a
connection with it. If the connection fails (due to an incorrect
passphrase or other reasons), then this temporary object is destroyed.
Delay its destruction until network_disconnected() since
network_connect_failed is called too early. Also, re-order the sequence
in station_reset_connection_state() in order to avoid using the network
object after it has been freed by network_disconnected().
Fixes: 85d9d6461f1f ("network: Hide hidden networks on connection error")
station_hide_network will remove and free the network object, so calling
network_close_settings will result in a crash. Make sure this is done
prior to network object's destruction.
Fixes: 85d9d6461f1f ("network: Hide hidden networks on connection error")
This only posed a problem oddly if the kernel binary was in the same
directory as test-runner. Resolving the absolute path with the
argument parser resolves the issue.
The TIOCSTTY ioctl was not shared between UML and QEMU which prevented
any console input from making it into UML. This fixes that, and now
ctrl-c can be used to stop UML test execution.
The MountInfo tuple was changed to explicitly take a source string. This
is redundant for UML and system mounts since the fstype/source are the same,
but it allows QEMU to specify the '9p' fstype and use MountInfo rather than
calling mount() explicitly.
This also moves logging cleanup into _prepare_mounts so both UML and QEMU
can use it.
Many processes are not long running (e.g. hostapd_cli, ip, iw, etc)
and the separators written to log files don't show up for these which
makes debugging difficult. This is even true for IWD/Hostapd for tests
with start_iwd=0.
After writing separators for long running processes write them out for
any additional log files too.
Way too many classes have a dependency on the TestContext class, in
most cases only for is_verbose. This patch removes the dependency from
Process and Namespace classes.
For Process, the test arguments can be parsed in the class itself which
will allow for this class to be completely isolated into its own file.
The Namespace class was already relatively isolated. Both were moved
into utils.py which makes 'run-tests' quite a bit nicer to look at and
more fitting to its name.
This commonizes some mounting code between QEMU and UML to allow exporting
of files to the host environment. UML does this with a hostfs mount while
QEMU still uses 9p.
The common code sanitizing the inputs has been put into _prepare_outfiles
and _prepare_mounts was modified to take an 'extra' argument containing
additional mount points.
The results and monitor parent directories are now passed into the environment
via arguments, and these are hidden from the help text (in addition to testhome).
If --help or unknown options were supplied to test-runner, python
would throw a maximum recursion depth exception. This was due to
the way ArgumentParser was subclassed.
To fix this, call ArgumentParser.__init__() rather than using the
super() method, and do the same for the RunnerCoreArgParse
subclass. In addition the namespace argument was removed
from parse_args since it's not used, and is instead supplied directly
to the parent's parse_args method.
There is really no reason to have hwsim create interfaces automatically
for test-runner. test-runner already does this for wpa_supplicant and
hostapd, and IWD can create the interface itself.
If a user connection fails on a freshly scanned psk or open hidden
network, during passphrase request or after, it shall be removed from
the network list. Otherwise, it would be possible to directly connect
to that known network, which will appear as not hidden.
The test was rekeying in a loop which ends up confusing hostapd
depending on the timing of when it gets the REKEY command and any
responses from IWD. UML seemed to handle this fine but not QEMU.
Instead delay the rekey a bit to allow it to fully complete before
sending another.
Similarly to hostapd.wait_for_event, IWD's variant needed to act on
an IO watch because events were being received prior to even calling
wait_for_event.
With how fast UML is hostapd events were being sent out prior to
ever calling wait_for_event. Instead set an IO watch on the control
socket and cache all events as they come. Then, when wait_for_event
is called, it can reference this list. If the event is found any
older events are purged from the list.
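The caching scheme can be sketched as follows. The class is hypothetical
and leaves out the IO watch itself; whether the matched event is purged
along with the older ones is an assumption.

```python
class EventCache:
    """Toy model of caching control socket events for wait_for_event."""

    def __init__(self):
        self.events = []

    def on_event(self, event):
        # Would be called from the IO watch as data arrives
        self.events.append(event)

    def find_event(self, name):
        """Return the first cached event starting with name, else None."""
        for i, event in enumerate(self.events):
            if event.startswith(name):
                # Purge the match and everything older than it
                self.events = self.events[i + 1:]
                return event
        return None
```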
The AP-ENABLED event needed a special case because hostapd gets
started before the IO watch can be registered. To fix this an
enabled property was added which queries the state directly. This
is checked first, and if not enabled wait_for_event continues normally.
This removes prints which were never supposed to make it upstream,
changes sleep() to wd.wait(), and increases the wait
period to fix issues with how fast UML runs the tests.
This allows the caller's condition to be checked immediately without
the mainloop running. In addition may_block=True allows the mainloop
to poll/sleep rather than immediately return back to the caller. This
handles async IO much better than may_block=False, at least for our
use-case.
Namespace process logs were appearing under 'ip' (and also overwriting
actual 'ip' logs) since they were executed with 'ip netns exec <namespace>'.
Instead special case this and append '-<namespace>' to the log file name.
In addition processes executed prior to any tests were being put under
a folder (name of testhome directory). Now this case is detected and these
logs are put at the top level log directory.
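The naming rule for namespaced processes might look like this. log_name is
a hypothetical helper and only covers the 'ip netns exec' special case
described above.

```python
def log_name(args):
    """Pick a log file name from a command line given as a list."""
    # 'ip netns exec <namespace> <program> ...' runs <program> in a namespace
    if args[:3] == ['ip', 'netns', 'exec'] and len(args) >= 5:
        namespace, program = args[3], args[4]
        return '%s-%s' % (program, namespace)
    return args[0]
```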
This allows test-runner to run inside a UML binary which has some
advantages, specifically time-travel/infinite CPU speed. This should
fix any scheduler related failures we have on slower systems.
Currently this runner does not support the same features as the Qemu
runner, specifically:
- No hardware passthrough
- No logging/monitor (UML -> host mounting isn't implemented yet)
In order to keep all test-runner dev scripts working and to work with
the new runner.py system some file renaming was required.
test-runner was renamed to run-tests
A new test-runner was added which only creates the Runner() class.
This (as well as subsequent commits) will separate test-runner into two
parts:
1. Environment setup
2. Running tests
Spurred by interest in adding UML/host support, test-runner was in need
of a refactor to separate out the environment setup and actually running
the tests.
The environment (currently only Qemu) requires quite a bit of special
handling (ctypes mounting/reboot, 9p mounts, tons of kernel options etc)
which nobody writing tests should need to see or care about. This has all
been moved into 'runner.py'.
Running the tests (inside test-runner) won't change much.
The new 'runner.py' module adds an abstraction class which allows different
Runners to be implemented, each setting up its own environment as it sees
fit. This is in preparation for UML and Host runners.
Any test using assertTrue(hostapd.list_sta()) improperly has been
replaced with wait_for_event(). There were a few places where this
was actually ok (i.e. IWD is already connected) but most needed to
be changed since the check was just after IWD connected and hostapd's
list_sta() API may not return a fully updated list.
- Setting the IP address was resulting in an error:
Error: any valid prefix is expected rather than "wln58".
This is fixed by reordering the arguments with the IP address first
- Remove the sleep, and use non_block_wait to wait for the IPv6 address
to be set.
Before setting the address, wait for the interface to go down. This
fixes somewhat rare cases where setting the address returns -EBUSY
and ultimately breaks the neighbor reports.
All tests which could avoid calling scan() directly have been
changed to use the 'full_scan' argument to get_ordered_network.
This was done because of unreliable scanning behavior on slower
systems, like VMs. If we get unlucky with the scheduler some beacons
are not received in time and in turn scan results are missing.
Using full_scan=True works around this issue by repeatedly scanning
until the SSID is found.
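The idea behind full_scan=True, reduced to its core. The function name,
the scan callable and the max_scans limit are all assumptions for this
sketch and do not reflect the real get_ordered_network signature.

```python
def find_network(scan, ssid, max_scans=5):
    """Repeat scan() until ssid shows up or max_scans is reached."""
    for _ in range(max_scans):
        results = scan()
        if ssid in results:
            return True
    return False
```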
It looks like some architectures' defconfigs were adding these in
automatically, but not others. Explicitly add these to make sure
the kernel is built correctly.
Base the root user check on os.getuid() instead of SUDO_GID so as not to
implicitly require sudo. SUDO_GID being set doesn't guarantee that the
effective user is root either since you can sudo to non-root accounts.
We check that config is not None but then access config.ctx outside of
that if block anyway. Then we do the same for config.ctx and
config.ctx.args. Nest the if blocks for the checks to be useful.
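The nested form looks roughly like this. get_args is a hypothetical
stand-in for the affected code; SimpleNamespace is used only to build a
sample object.

```python
from types import SimpleNamespace

def get_args(config):
    """Return config.ctx.args, or None if any link in the chain is missing."""
    if config is not None:
        if config.ctx is not None:
            if config.ctx.args is not None:
                return config.ctx.args
    return None

# Example: a fully populated config and one with a missing ctx
full = SimpleNamespace(ctx=SimpleNamespace(args=['--debug']))
partial = SimpleNamespace(ctx=None)
print(get_args(full), get_args(partial))
```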
p2p_peer_update_existing may be called with a scan_bss struct built from
a Probe Request frame so it can't access bss->p2p_probe_resp_info even
if peer->bss was built from a Probe Response. Check the source frame
type of the scan_bss struct before updating the Device Address.
This fixes one timing issue that would make the autotest fail often.
Since l_malloc is used the frame contents are not zero'ed automatically
which could result in random bytes being present in the frame which were
expected to be zero. This poses a problem when calculating the MIC as the
crypto operations are done on the entire frame with the expectation of
the MIC being zero.
Fixes: 83212f9b23d5 ("eapol: change eapol_create_common to support FILS")
explicit_bzero is used in src/storage.c since commit
01cd8587606bf2da1af245163150589834126c1c but src/missing.h is not
included, as a result build with uclibc fails on:
/home/buildroot/autobuild/instance-0/output-1/host/lib/gcc/powerpc-buildroot-linux-uclibc/10.3.0/../../../../powerpc-buildroot-linux-uclibc/bin/ld: src/storage.o: in function `storage_init':
storage.c:(.text+0x13a4): undefined reference to `explicit_bzero'
Fixes:
- http://autobuild.buildroot.org/results/2aff8d3d7c33c95e2c57f7c8a71e69939f0580a1
When configuring wpa_supplicant all we care about is that it
received the configuration object. wpa_supplicant takes quite a bit
of time to connect in some cases so waiting for that is unneeded.
This also increases the DPP timeout which may be required on slower
systems or if the timing is particularly unlucky when receiving
frames.
This is used to hold the current BSS frequency which will be
used after IWD receives a presence announcement. Since this was
not being set, the logic was always thinking there was a channel
mismatch (0 != current_freq) and attempting to go offchannel to
'0' which resulted in -EINVAL, and ultimately protocol termination.
Change a few critical checks that were failing sometimes:
- A few asserts were changed to wait_for_object_condition
- A 15 second timeout was removed (default used instead)
- Do a full scan at beginning of each test to clear any
cached BSS's. The second test run was getting stale results
and the RSSI values were not expected.
This was not being properly honored when existing networks were
already populated. This poses an issue for any test which uses
full_scan after setting radio values such as signal strength.
For quite a while test-runner has run into frequent OOM exceptions when
running many tests in a row. It's not completely known why, but it
seems to point to the 9p driver which is used for sharing the root fs
between the test-runner VM and the host.
With debugging enabled (-d) one can see the available memory stay
relatively stable. If a test fails it may spike ~3-4kb but this quickly
recovers as python garbage collects.
At some point the kernel faults failing to allocate which (usually) is
shown by a python OOM exception. At this point there is plenty of
available memory.
Dumping the kernel trace, it's seen that the 9p driver is involved:
[ 248.962949] test-runner: page allocation failure: order:7, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[ 248.962958] CPU: 2 PID: 477 Comm: test-runner Not tainted 5.16.0 #91
[ 248.962960] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
[ 248.962961] Call Trace:
[ 248.962964] <TASK>
[ 248.962965] dump_stack_lvl+0x34/0x44
[ 248.962971] warn_alloc.cold+0x78/0xdc
[ 248.962975] ? __alloc_pages_direct_compact+0x14c/0x1e0
[ 248.962979] __alloc_pages_slowpath.constprop.0+0xbfe/0xc60
[ 248.962982] __alloc_pages+0x2d5/0x2f0
[ 248.962984] kmalloc_order+0x23/0x80
[ 248.962988] kmalloc_order_trace+0x14/0x80
[ 248.962990] v9fs_alloc_rdir_buf.isra.0+0x1f/0x30
[ 248.962994] v9fs_dir_readdir+0x51/0x1d0
[ 248.962996] ? __handle_mm_fault+0x6e0/0xb40
[ 248.962999] ? inode_security+0x1d/0x50
[ 248.963009] ? selinux_file_permission+0xff/0x140
[ 248.963011] iterate_dir+0x16f/0x1c0
[ 248.963014] __x64_sys_getdents64+0x7b/0x120
[ 248.963016] ? compat_fillonedir+0x150/0x150
[ 248.963019] do_syscall_64+0x3b/0x90
[ 248.963021] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 248.963024] RIP: 0033:0x7fedd7c6d8c7
[ 248.963026] Code: 00 00 0f 05 eb b7 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 81 a5 0f 00 f7 d8 64 89 02 48
[ 248.963028] RSP: 002b:00007ffd06cd87e8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[ 248.963031] RAX: ffffffffffffffda RBX: 000056090d87dd20 RCX: 00007fedd7c6d8c7
[ 248.963032] RDX: 0000000000080000 RSI: 000056090d87dd50 RDI: 000000000000000f
[ 248.963033] RBP: 000056090d87dd50 R08: 0000000000000030 R09: 00007fedc7d37af0
[ 248.963035] R10: 00007fedc7d7d730 R11: 0000000000000293 R12: ffffffffffffff88
[ 248.963038] R13: 000056090d87dd24 R14: 0000000000000000 R15: 000056090d0485e8
Here it's seen that an allocation of 512k is being requested (order:7), but it faults.
In this run there was ~35MB of available memory on the system.
Available Memory: 35268 kB
Last Test Delta: -2624 kB
Per-test Usage:
[ 0] ** 37016
[ 1] ********* 41584
[ 2] * 36280
[ 3] ********* 41452
[ 4] ******** 40940
[ 5] ****** 39284
[ 6] **** 38348
[ 7] *** 37496
[ 8] **** 37892
[ 9] 35268
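The 512k figure follows from the allocation order: an order-N request asks
for 2^N physically contiguous pages, which can fail even with plenty of
free memory if that memory is fragmented. A quick check, assuming 4 KiB
pages (the usual x86-64 default):

```python
PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages

def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size of a page allocation of the given order: 2**order contiguous pages."""
    return (1 << order) * page_size

# order:7 from the trace above
print(order_to_bytes(7) // 1024, "KiB")
```

The result also matches the 0x80000 value in RDX in the register dump.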
This can be reproduced by running all autotests (changing the ram down to
~128MB helps trigger it faster):
./tools/test-runner -k <kernel> -d
After many attempts to fix this it was finally found that simply removing the
explicit 9p2000.u version from the kernel command line 'fixed' the problem.
This even allows decreasing the RAM down to 256MB from 384MB and so far no
OOMs have been seen.
In debug mode the test context is printed before each test. This
adds some additional information in there:
Available Memory: /proc/meminfo: MemAvailable
Last Test Delta: Change in usage between current and last test
Per-test Usage: Graph of usage relative to all past tests. This is
useful for seeing a trend down/up of usage.
This could fail and was not being checked. It was minimally changed to
take the ifindex directly (this was the only thing needed from the ethdev)
which allows checking prior to initializing the ethdev.
Running the tests inside a VM makes it difficult for the host to figure
out if the test actually failed or succeeded. For a human it's easy to
read the results table, but for an automated system parsing this would
be fragile. This adds a new option --result <file> which writes PASS/FAIL
to the provided file once all tests are completed. Any failure results in
'FAIL' being written to the file.
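A sketch of the result logic. The function names and the shape of the
results dictionary are assumptions; only the PASS/FAIL semantics come from
the description above.

```python
def overall_result(results):
    """Return 'PASS' only if every test passed.

    results maps a test name to True (passed) or False (failed).
    """
    return 'PASS' if all(results.values()) else 'FAIL'

def write_result(path, results):
    """Write the overall result to the file given with --result."""
    with open(path, 'w') as f:
        f.write(overall_result(results))
```

A host-side script then only needs to read one word from the file rather
than parse the results table.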
\e[1;30m is bold black, which is often, but not always, displayed as bright
black or bold bright black. When it is displayed as real black it is
invisible. \e[1;90m is explicit bold bright black.
\e[37m is white, therefore it is not suitable to be labeled as GREY,
which is \e[90m
The logic here assumed any BSS's in the roam scan were identical to
ones in station's bss_list with the same address. Usually this is true
but, for example, if the BSS changed frequency the one in station's
list is invalid.
Instead when a match is found remove the old BSS and re-insert the new
one.
With the addition of 6GHz '6000' is no longer the maximum frequency
that could be in .known_network.freq. For more robustness
band_freq_to_channel is used to validate the frequency.
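To show why a fixed upper bound of 6000 is no longer enough, here is a
simplified frequency-to-channel mapping. It is not a port of
band_freq_to_channel; the ranges cover only the common channels and any
multiple of 5 MHz inside a range is accepted.

```python
def freq_to_channel(freq):
    """Return the channel number for freq in MHz, or 0 if it is not valid.

    Simplified: covers 2.4 GHz channels 1-14, 5 GHz channels 36-177
    and 6 GHz channels 1-233 only.
    """
    if 2412 <= freq <= 2472 and (freq - 2407) % 5 == 0:
        return (freq - 2407) // 5
    if freq == 2484:
        return 14
    if 5180 <= freq <= 5885 and freq % 5 == 0:
        return (freq - 5000) // 5
    if 5955 <= freq <= 7115 and freq % 5 == 0:
        return (freq - 5950) // 5
    return 0
```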
Scanning while in AP mode is somewhat of an edge case, but it does
have some usefulness specifically with onboarding new devices, i.e.
a new device starts an AP, a station connects and provides the new
device with network credentials, the new device switches to station
mode and connects to the desired network.
In addition this could be used later for ACS (though this is a bit
overkill for IWD's access point needs).
Since AP performance is basically non-existent while scanning this
feature is meant to be used in a limited scope.
Two DBus APIs were added which mirror the station interface: Scan and
GetOrderedNetworks.
Scan is no different than the station variant, and will perform an active
scan on all channels.
GetOrderedNetworks diverges from station and simply returns an array of
dictionaries containing basic information about networks:
{
Name: <ssid>
SignalStrength: <mBm>
Security: <psk, open, or 8021x>
}
Limitations:
- Hidden networks are not supported. This isn't really possible since
the SSIDs are unknown from the AP perspective.
- Sharing scan results with station is not supported. This would be a
convenient improvement in the future with respect to onboarding new
devices. The scan could be performed in AP mode, then switch to
station and connect immediately without needing to rescan. A quick
hack could better this situation by not flushing scan results in
station (if the kernel retains these on an iftype change).
This was already implemented in station but with no dependency on
that module at all. AP will need this for a scanning API so it's
being moved into scan.c.
The 802.11ax standard adds some restrictions for the 6GHz band. In short,
stations must use SAE, OWE, or 8021x on this band and frame protection is
required.
All uses of this macro will work with a bitwise comparison which is
needed for 6GHz checks and somewhat more flexible since it can be
used to compare RSN info, not only single AKM values.
This adds checks if MFP is set to 0 or 1:
0 - Always fail if the frequency is 6GHz
1 - Fail if MFPC=0 and the frequency is 6GHz.
If HW is capable set MFPR=1 for 6GHz
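The two rules can be written as a small predicate. This is a sketch under
assumptions: mfp_setting stands for the configured MFP value, bss_mfpc for
the MFPC bit being checked, and a setting of 2 is not covered by the text
above, so it is simply accepted here.

```python
def mfp_allows_6ghz(mfp_setting, bss_mfpc, is_6ghz):
    """Return False when the MFP rules forbid connecting on 6 GHz."""
    if not is_6ghz:
        return True
    if mfp_setting == 0:
        return False          # MFP disabled: 6 GHz is never allowed
    if mfp_setting == 1 and not bss_mfpc:
        return False          # MFP optional but MFPC=0
    return True
```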
This is a new band defined in the WiFi 6E (ax) amendment. A completely
new value is needed due to channel reuse between 2.4/5 and 6GHz.
util.c needed minimal updating to prevent compile errors which will
be fixed later to actually handle this band. WSC also needed a case
added for 6GHz but the spec does not outline any RF Band value for
6GHz so the 5GHz value will be returned in this case.
sae.c was failing to build on some platforms:
error: implicit declaration of function 'reallocarray'; did you mean 'realloc'?
[-Werror=implicit-function-declaration]
In certain rare cases IWD gets a link down event before nl80211 ever sends
a disconnect event. Netdev notifies station of the link down which causes
station to be freed, but netdev remains in the same state. Then later the
disconnect event arrives and netdev still thinks it's connected, calls into
(the now freed) station object and causes a crash.
To fix this netdev_connect_free() is now called on any link down events
which will reset the netdev object to a proper state.
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_deauthenticate_event()
src/netdev.c:netdev_link_notify() event 16 on ifindex 16
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
src/resolve.c:resolve_systemd_revert() ifindex: 16
src/station.c:station_roam_state_clear() 16
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
Received Deauthentication event, reason: 3, from_ap: false
0 0x472fa4 in station_disconnect_event src/station.c:2916
1 0x472fa4 in station_netdev_event src/station.c:2954
2 0x43a262 in netdev_disconnect_event src/netdev.c:1213
3 0x43a262 in netdev_mlme_notify src/netdev.c:5471
4 0x6706eb in process_multicast ell/genl.c:1029
5 0x6706eb in received_data ell/genl.c:1096
6 0x65e630 in io_callback ell/io.c:120
7 0x65a94e in l_main_iterate ell/main.c:478
8 0x65b0b3 in l_main_run ell/main.c:525
9 0x65b0b3 in l_main_run ell/main.c:507
10 0x65b5cc in l_main_run_with_signal ell/main.c:647
11 0x4124d7 in main src/main.c:532
If an event is in response to some command which is returning an
unexpected value (unexpected with respect to wpas.py) handle_eow
would raise an exception.
Specifically with DPP this was being hit when the URI was being
returned.
The difference from the existing code is that IWD will send the
authentication request, making it the initiator.
This handles the use case where IWD is provided a peer's URI containing
its bootstrapping key rather than IWD always providing its own URI.
A new DBus API was added, ConfigureEnrollee().
Using ConfigureEnrollee() IWD will act as a configurator but begin by
traversing a channel list (URI provided or default) and waiting for
presence announcements (with one caveat). When an announcement is
received IWD will send an authentication request to the peer, receive
its reply, and send an authentication confirm.
As with being a responder, IWD only supports configuration to the
currently connected BSS and will request the enrollee switch to this
BSS's frequency to preserve network performance.
The caveat here is that only one driver (ath9k) supports multicast frame
registration, so on other drivers presence frames cannot be received. In
this case the peer URI is required to contain a MAC and channel
information. This is because IWD will jump right into sending auth
requests rather than waiting for a presence announcement.
The frame watch which covers the presence procedure (and most
frames for that matter) needs to support multicast frames for
presence to work. Doing this in frame-xchg seems like the right
choice but only ath9k supports multicast frame registration.
Because of this limited support DPP will register for these frames
manually.
Parses K (key), M (mac), C (class/channels), and V (version) tokens
into a new structure dpp_uri_info. H/I are not parsed since there
currently isn't any use for them.
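A rough sketch of this kind of token parsing is shown below. It assumes a
URI of the form "DPP:C:81/1;M:aabbccddeeff;V:2;K:<base64>;;" where each
token is "X:value;" and the URI ends with an extra ';'. The struct and
function names are hypothetical and do not reflect the real dpp_uri_info
layout; the key and channel list are kept as raw strings rather than
decoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result structure, not the real dpp_uri_info */
struct uri_sketch {
	char key[256];		/* K: base64 key, left undecoded */
	char channels[128];	/* C: raw class/channel list */
	uint8_t mac[6];		/* M: peer MAC address */
	bool has_mac;
	unsigned int version;	/* V: protocol version */
};

static bool copy_value(char *dst, size_t dst_len, const char *val, size_t len)
{
	if (len == 0 || len >= dst_len)
		return false;

	memcpy(dst, val, len);
	dst[len] = '\0';
	return true;
}

static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

static bool parse_mac(const char *val, size_t len, uint8_t mac[6])
{
	size_t i;

	if (len != 12)
		return false;

	for (i = 0; i < 6; i++) {
		int hi = hex_val(val[2 * i]);
		int lo = hex_val(val[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return false;

		mac[i] = (uint8_t) (hi << 4 | lo);
	}

	return true;
}

static bool parse_uri(const char *uri, struct uri_sketch *out)
{
	const char *pos;
	size_t i;

	assert(uri && out);
	memset(out, 0, sizeof(*out));

	if (strncmp(uri, "DPP:", 4) != 0)
		return false;

	/* Walk "X:value;" tokens until the terminating ';' */
	for (pos = uri + 4; *pos != ';';) {
		char id = pos[0];
		const char *end;
		size_t len;

		if (id == '\0' || pos[1] != ':')
			return false;

		pos += 2;
		end = strchr(pos, ';');
		if (!end)
			return false;

		len = (size_t) (end - pos);

		switch (id) {
		case 'K':
			if (!copy_value(out->key, sizeof(out->key), pos, len))
				return false;
			break;
		case 'C':
			if (!copy_value(out->channels, sizeof(out->channels),
						pos, len))
				return false;
			break;
		case 'M':
			if (!parse_mac(pos, len, out->mac))
				return false;
			out->has_mac = true;
			break;
		case 'V':
			if (len == 0 || len > 3)
				return false;
			for (out->version = 0, i = 0; i < len; i++) {
				if (pos[i] < '0' || pos[i] > '9')
					return false;
				out->version = out->version * 10 +
						(unsigned int) (pos[i] - '0');
			}
			break;
		default:	/* H, I and unknown tokens are skipped */
			break;
		}

		pos = end + 1;
	}

	/* Assumption: the key is the only mandatory token */
	return out->key[0] != '\0';
}
```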
This was caught by static analysis. As is common this should never
happen in the real world since the only way this can fail (apart from
extreme circumstances like OOM) is if the key size is incorrect, which
it will never be.
Static analysis flagged that 'path' was never being checked (which
should not ever be NULL) but during that review I noticed stat()
was being called, then fstat afterwards.
Adds a new wait argument which, if false, will call the DBus method
and return immediately. This allows the caller to create multiple
radios very quickly, simulating (as close as we can) a wifi card
with dual phy's which appear in the kernel simultaneously.
The name argument was also changed to be mandatory, which is now
required by hwsim.
Currently CreateRadio only allows a single outstanding DBus message
until the radio is fully created. 99% of the time this is just fine
but in order to test dual phy cards there needs to be support for
phy's appearing at the same time.
This required storing the pending DBus message inside the radio object
rather than a single static variable.
The code was refactored to handle the internal radio info objects better
for the various cases:
- Creation from CreateRadio()
- Radio already existed before hwsim started, or created externally
- Existing radio changed name, address, etc.
First, Name is now a required option to CreateRadio(). This allows
the radio info to be pushed to the queue immediately (also allowing the
pending DBus message to be tracked). Then, when the NEW_RADIO event
fires the pending radio can be looked up (by name) and filled with the
remaining info.
If the radio was not found by name but a matching ID was found this is
the 'changed' case and the radio is re-initialized with the changed
values.
If neither name nor ID matches the radio was created externally, or
prior to hwsim starting. A radio info object is created at this time
and initialized.
The ID was changed to a signed integer in order to initialize it to an
invalid number -1. Doing this was required since a pending uninitialized
radio ID (0) could match an existing radio ID. This required some
bounds checks in case the kernel's counter reaches an extremely high value.
This isn't likely to ever happen in practice.
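The three lookup cases and the reason for the -1 initial ID could be
sketched like this. All type and function names here are hypothetical,
and the array-based lookup only stands in for whatever queue hwsim
really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Marks a radio queued by CreateRadio() whose kernel ID is not known yet */
#define RADIO_ID_PENDING (-1)

/* Hypothetical radio info object */
struct radio_sketch {
	int32_t id;		/* signed so that -1 is never a valid ID */
	const char *name;
};

enum radio_match {
	RADIO_MATCH_NAME,	/* found by name, fill in remaining info */
	RADIO_MATCH_CHANGED,	/* same ID, name or address changed */
	RADIO_MATCH_EXTERNAL,	/* unknown radio, create a new object */
	RADIO_MATCH_INVALID,	/* kernel ID does not fit the signed ID */
};

static enum radio_match radio_classify(struct radio_sketch *radios, size_t n,
					const char *name, uint32_t kernel_id,
					struct radio_sketch **found)
{
	size_t i;

	assert(radios && name && found);
	*found = NULL;

	/* Bounds check needed because the stored ID is signed */
	if (kernel_id > (uint32_t) INT32_MAX)
		return RADIO_MATCH_INVALID;

	for (i = 0; i < n; i++) {
		if (strcmp(radios[i].name, name) == 0) {
			*found = &radios[i];
			return RADIO_MATCH_NAME;
		}
	}

	/* A pending radio has ID -1 and so can never match here */
	for (i = 0; i < n; i++) {
		if (radios[i].id == (int32_t) kernel_id) {
			*found = &radios[i];
			return RADIO_MATCH_CHANGED;
		}
	}

	return RADIO_MATCH_EXTERNAL;
}
```

Had the pending ID been left at 0, an external radio reported with
kernel ID 0 would have been mistaken for the 'changed' case.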
This tool will decrypt an IWD network profile which was previously
encrypted using a systemd provided key. Either a text passphrase
can be provided (--pass) or a file containing the secret (--file).
This can be useful for debugging, or recovering an encrypted
profile after enabling SystemdEncrypt.
Recently systemd added the ability to pass secret credentials to
services via LoadCredentialEncrypted/SetCredentialEncrypted. Once
set up the service is able to read the decrypted credentials from
a file. The file path is found in the environment variable
CREDENTIALS_DIRECTORY + an identifier. The value of SystemdEncrypt
should be set to the systemd key ID used when the credential was
created.
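Locating the credential file as described above could look roughly like
the following. The function names are hypothetical and this is only a
sketch of the path construction, not of how the secret is read or used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Joins the credentials directory and the key ID into one path */
static bool credential_path_build(const char *dir, const char *key_id,
					char *out, size_t out_len)
{
	int n;

	assert(key_id && out);

	/* Without a directory (or ID) there is nothing to read */
	if (!dir || strlen(dir) == 0 || strlen(key_id) == 0)
		return false;

	n = snprintf(out, out_len, "%s/%s", dir, key_id);

	/* Reject truncated results */
	return n > 0 && (size_t) n < out_len;
}

/* key_id would be the value of the SystemdEncrypt setting */
static bool credential_path_from_env(const char *key_id,
					char *out, size_t out_len)
{
	return credential_path_build(getenv("CREDENTIALS_DIRECTORY"),
					key_id, out, out_len);
}
```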
When SystemdEncrypt is set IWD will attempt to read the decrypted
secret from systemd. If at any point this fails warnings will be
printed but IWD will continue normally. It's expected that any failures
will result in the inability to connect to any networks which have
previously encrypted the passphrase/PSK without re-entering
the passphrase manually. This could happen, for example, if the
systemd secret was changed.
Once the secret is read in it is set into storage to be used for
profile encryption/decryption.
Using storage_decrypt() hotspot can also support profile encryption.
The hotspot consortium name is used as the 'ssid' since this stays
consistent between hotspot networks for any profile.
Some users don't like the idea of storing network credentials in
plaintext on the file system. This patch implements an option to
encrypt such profiles using a secret key. The origin of the key can in
theory be anything, but would typically be provided by systemd via
'LoadCredentialEncrypted' setting in the iwd unit file.
The encryption operates on the entire [Security] group as well as all
embedded groups. Once encrypted the [Security] group will be replaced
with two key/values:
EncryptedSalt - A random string of bytes used for the encryption
EncryptedSecurity - A string of bytes containing the encrypted
[Security] group, as well as all embedded groups.
After the profile has been encrypted these values should not be
modified. Note that any values added to [Security] after encryption
have no effect. Once the profile is encrypted there is no way to modify
[Security] without manually decrypting first, or just re-creating it
entirely, which is effectively treated as a 'new' profile.
The encryption/decryption is done using AES-SIV with a salt value and
the network SSID as the IV.
Once a key is set any profiles opened will automatically be encrypted
and re-written to disk. Modules using network_storage_open will be
provided the decrypted profile, and will be unaware it was ever
encrypted in the first place. Similarly when network_storage_sync is
called the profile will be automatically encrypted and written to disk
without the caller needing to do anything special.
A few private storage.c helpers were added to serve several purposes:
storage_init/exit():
This sets/cleans up the encryption key direct from systemd then uses
extract and expand to create a new fixed length key to perform
encryption/decryption.
__storage_decrypt():
Low level API to decrypt an l_settings object using a previously set
key and the SSID/name for the network. This returns a 'changed' out
parameter signifying that the settings need to be encrypted and
re-written to disk. The purpose of exposing this is for a standalone
decryption tool which does not re-write any settings.
storage_decrypt():
Wrapper around __storage_decrypt() that handles re-writing a new
profile to disk. This was exposed in order to support hotspot profiles.
__storage_encrypt():
Encrypts an l_settings object and returns the full profile as data
This got merged without a few additional fixes, in particular an
over 80 character line and incorrect length check.
Fixes: d8116e8828b4 ("dpp-util: add dpp_point_from_asn1()")
When we detect a new phy being added, we schedule a filtered dump of
the newly detected WIPHY and associated INTERFACEs. This code path and
related processing of the dumps was mostly shared with the un-filtered
dump of all WIPHYs and INTERFACEs which is performed when iwd starts.
This normally worked fine as long as a single WIPHY was created at a
time. However, if multiple new phys were detected in a short amount of
time, the logic would get confused and try to process phys that have not
been probed yet. This resulted in iwd trying to create devices or not
detecting devices properly.
Fix this by only processing the target WIPHY and related INTERFACEs
when the filtered dump is performed, and not any additional ones that
might still be pending.
While here, remove a misleading comment:
manager_wiphy_check_setup_done() would succeed only if iwd decided to
keep the default interfaces created by the kernel.
This simulates the conditions that trigger a use-after-free which was
fixed with:
2c355db7 ("scan: remove periodic scans from queue on abort")
This behavior can be reproduced reliably using this test with the above
patch reverted.
This debug print was before any checks which could bail out prior to
autoconnect starting. This was confusing because debug logs would
contain multiple "station_autoconnect_start()" prints making you think
autoconnect was started several times.
The periodic scan code was refactored to make normal scans and
periodic scans consistent by keeping both in the same queue. But
that change left out the abort path where periodic scans were not
actually removed from the queue.
This fixes a rare crash when a periodic scan has been triggered and
the device goes down. This path never removes the request from the
queue but still frees it. Then when the scan context is removed the
stale request is freed again.
0 0x4bb65b in scan_request_cancel src/scan.c:202
1 0x64313c in l_queue_clear ell/queue.c:107
2 0x643348 in l_queue_destroy ell/queue.c:82
3 0x4bbfb7 in scan_context_free src/scan.c:209
4 0x4c9a78 in scan_wdev_remove src/scan.c:2115
5 0x42fecd in netdev_free src/netdev.c:965
6 0x445827 in netdev_destroy src/netdev.c:6507
7 0x52beb9 in manager_config_notify src/manager.c:765
8 0x67084b in process_multicast ell/genl.c:1029
9 0x67084b in received_data ell/genl.c:1096
10 0x65e790 in io_callback ell/io.c:120
11 0x65aaae in l_main_iterate ell/main.c:478
12 0x65b213 in l_main_run ell/main.c:525
13 0x65b213 in l_main_run ell/main.c:507
14 0x65b72c in l_main_run_with_signal ell/main.c:647
15 0x4124e7 in main src/main.c:532
If netdev_connect_failed is called before netdev_get_oci_cb() the
netdev's handshake will be destroyed and ultimately crash when the
callback is called.
This patch moves the cancelation into netdev_connect_free rather than
netdev_free.
++++++++ backtrace ++++++++
0 0x7f4e1787d320 in /lib64/libc.so.6
1 0x42634c in handshake_state_set_chandef() at src/handshake.c:1057
2 0x40a11b in netdev_get_oci_cb() at src/netdev.c:2387
3 0x483d7b in process_unicast() at ell/genl.c:986
4 0x480d3c in io_callback() at ell/io.c:120
5 0x48004d in l_main_iterate() at ell/main.c:472 (discriminator 2)
6 0x4800fc in l_main_run() at ell/main.c:521
7 0x48032c in l_main_run_with_signal() at ell/main.c:649
8 0x403e95 in main() at src/main.c:532
9 0x7f4e17867b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
Commit 4d2176df2985 ("handshake: Allow event handler to free handshake")
introduced a re-entrancy guard so that handshake_state objects that are
destroyed as a result of the event do not cause a crash. It rightly
used a temporary object to store the passed in handshake. Unfortunately
this caused variable shadowing which resulted in crashes fixed by commit
d22b174a7318 ("handshake: use _hs directly in handshake_event").
However, since the temporary was no longer used, this fix itself caused
a crash:
#0 0x00005555f0ba8b3d in eapol_handle_ptk_1_of_4 (sm=sm@entry=0x5555f2b4a920, ek=0x5555f2b62588, ek@entry=0x16, unencrypted=unencrypted@entry=false) at src/eapol.c:1236
1236 handshake_event(sm->handshake,
(gdb) bt
#0 0x00005555f0ba8b3d in eapol_handle_ptk_1_of_4 (sm=sm@entry=0x5555f2b4a920, ek=0x5555f2b62588, ek@entry=0x16, unencrypted=unencrypted@entry=false) at src/eapol.c:1236
#1 0x00005555f0bab118 in eapol_key_handle (unencrypted=<optimized out>, frame=<optimized out>, sm=0x5555f2b4a920) at src/eapol.c:2343
#2 eapol_rx_packet (proto=<optimized out>, from=<optimized out>, frame=<optimized out>, unencrypted=<optimized out>, user_data=0x5555f2b4a920) at src/eapol.c:2665
#3 0x00005555f0bac497 in __eapol_rx_packet (ifindex=62, src=src@entry=0x5555f2b62574 "x\212 J\207\267", proto=proto@entry=34958, frame=frame@entry=0x5555f2b62588 "\002\003",
len=len@entry=121, noencrypt=noencrypt@entry=false) at src/eapol.c:3017
#4 0x00005555f0b8c617 in netdev_control_port_frame_event (netdev=0x5555f2b64450, msg=0x5555f2b62588) at src/netdev.c:5574
#5 netdev_unicast_notify (msg=msg@entry=0x5555f2b619a0, user_data=<optimized out>) at src/netdev.c:5613
#6 0x00007f60084c9a51 in dispatch_unicast_watches (msg=0x5555f2b619a0, id=<optimized out>, genl=0x5555f2b3fc80) at ell/genl.c:954
#7 process_unicast (nlmsg=0x7fff61abeac0, genl=0x5555f2b3fc80) at ell/genl.c:973
#8 received_data (io=<optimized out>, user_data=0x5555f2b3fc80) at ell/genl.c:1098
#9 0x00007f60084c61bd in io_callback (fd=<optimized out>, events=1, user_data=0x5555f2b3fd20) at ell/io.c:120
#10 0x00007f60084c536d in l_main_iterate (timeout=<optimized out>) at ell/main.c:478
#11 0x00007f60084c543e in l_main_run () at ell/main.c:525
#12 l_main_run () at ell/main.c:507
#13 0x00007f60084c5670 in l_main_run_with_signal (callback=callback@entry=0x5555f0b89150 <signal_handler>, user_data=user_data@entry=0x0) at ell/main.c:647
#14 0x00005555f0b886a4 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:532
This happens when the driver does not support rekeying, which causes iwd to
attempt a disconnect and re-connect. The disconnect action is
taken during the event callback and destroys the underlying eapol state
machine. Since a temporary isn't used, attempting to dereference
sm->handshake results in a crash.
Fix this by introducing a UNIQUE_ID macro which should prevent shadowing
and using a temporary variable as originally intended.
Fixes: d22b174a7318 ("handshake: use _hs directly in handshake_event")
Fixes: 4d2176df2985 ("handshake: Allow event handler to free handshake")
Reported-By: Toke Høiland-Jørgensen <toke@toke.dk>
Tested-by: Toke Høiland-Jørgensen <toke@toke.dk>
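The UNIQUE_ID idea above can be sketched as follows. This is not the
actual handshake_event macro: the struct, the notify() helper and the
macro names are hypothetical, and __COUNTER__ is a GCC/Clang extension
assumed to be available. The point is that the temporary gets a
generated name, so it cannot shadow a caller variable such as 'hs', and
the macro never touches the caller's expression again after the
callback.

```c
#include <assert.h>
#include <stdbool.h>

#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)

/* Produces a fresh identifier on every expansion (GCC/Clang) */
#define UNIQUE_ID(prefix) CONCAT(prefix, __COUNTER__)

/* Hypothetical stand-in for struct handshake_state */
struct hs_sketch {
	bool in_event;
	int events_seen;
};

/* Hypothetical stand-in for the event callback */
static void notify(struct hs_sketch *hs)
{
	hs->events_seen++;
}

/*
 * The unique name is generated once by EVENT() and passed in, so every
 * use of 'tmp' inside this body refers to the same variable.
 */
#define EVENT_IMPL(tmp, _hs)				\
	do {						\
		struct hs_sketch *tmp = (_hs);		\
		tmp->in_event = true;			\
		notify(tmp);				\
		tmp->in_event = false;			\
	} while (0)

#define EVENT(_hs) EVENT_IMPL(UNIQUE_ID(hs_tmp_), _hs)

/* The caller's variable is named 'hs', which no longer collides */
static int fire_twice(struct hs_sketch *hs)
{
	assert(hs);

	EVENT(hs);
	EVENT(hs);

	return hs->events_seen;
}
```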
There is no need to punch the holes for netdev/wheel groups to send to
the .Agent interface. This is only done by the iwd daemon itself and
the policy for user 'root' already takes care of this.
A select few drivers send this instead of SIGNAL_MBM. The docs say this
value is the signal 'in unspecified units, scaled to 0..100'. The range
for SIGNAL_MBM is -10000..0 so this can be scaled to the MBM range easily
enough...
Now, this isn't exactly correct because this value ultimately gets
returned from GetOrderedNetworks() and is documented as 100 * dBm where
in reality it's just a unit-less signal strength value. It's not ideal, but
this patch at least will fix BSS ranking for these few drivers.
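One way to do this scaling is a plain linear mapping of 0..100 onto
-10000..0. The helper below is a hypothetical sketch under that
assumption and is not necessarily the exact expression used in scan.c.

```c
#include <assert.h>
#include <stdint.h>

/* Maps an 'unspecified units' signal (0..100) onto the MBM range */
static int32_t signal_unspec_to_mbm(uint8_t unspec)
{
	int32_t mbm;

	/* Clamp in case a driver reports something above 100 */
	if (unspec > 100)
		unspec = 100;

	/* 0 -> -10000, 100 -> 0 */
	mbm = ((int32_t) unspec - 100) * 100;

	assert(mbm >= -10000 && mbm <= 0);
	return mbm;
}
```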
The 'at_console' D-Bus policy setting has been deprecated for more than
10 years and could be ignored at any time in the future. Moreover, while
the intent was to allow locally logged on users to interact with iwd, it
didn't actually do that.
More info at https://www.spinics.net/lists/linux-bluetooth/msg75267.html
and https://gitlab.freedesktop.org/dbus/dbus/-/issues/52
Therefore remove the 'at_console' setting block.
On Debian (based) systems, there is a standard defined group which is
allowed to manage network interfaces, and that is the 'netdev' group.
So add a D-Bus setting block to grant the 'netdev' group that access.
Building on GCC 8 resulted in this compiler error.
src/sae.c:107:25: error: implicit declaration of function 'reallocarray';
did you mean 'realloc'? [-Werror=implicit-function-declaration]
sm->rejected_groups = reallocarray(NULL, 2, sizeof(uint16_t));
src/erp.c:134:10: error: comparison of integer expressions of different
signedness: 'unsigned int' and 'int' [-Werror=sign-compare]
src/eap-ttls.c:378:10: error: comparison of integer expressions of different signedness: 'uint32_t' {aka 'unsigned int'} and 'int' [-Werror=sign-compare]
Fixes the following crash:
#0 0x000211c4 in netdev_connect_event (msg=<optimized out>, netdev=0x2016940) at src/netdev.c:2915
#1 0x76f11220 in process_multicast (nlmsg=0x7e8acafc, group=<optimized out>, genl=<optimized out>) at ell/genl.c:1029
#2 received_data (io=<optimized out>, user_data=<optimized out>) at ell/genl.c:1096
#3 0x76f0da08 in io_callback (fd=<optimized out>, events=1, user_data=0x200a560) at ell/io.c:120
#4 0x76f0ca78 in l_main_iterate (timeout=<optimized out>) at ell/main.c:478
#5 0x76f0cb74 in l_main_run () at ell/main.c:525
#6 l_main_run () at ell/main.c:507
#7 0x76f0cdd4 in l_main_run_with_signal (callback=callback@entry=0x18c94 <signal_handler>, user_data=user_data@entry=0x0)
at ell/main.c:647
#8 0x00018178 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:532
This crash was introduced in commit:
4d2176df2985 ("handshake: Allow event handler to free handshake")
The culprit seems to be that 'hs' is being used both in the caller and
in the macro. Since the macro defines a variable 'hs' in local block
scope, it overrides 'hs' from function scope. Yet (_hs) still evaluates
to 'hs' leading the local variable to be initialized with itself. Only
the 'handshake_event(hs, HANDSHAKE_EVENT_SETTING_KEYS)' is affected
since it is the only macro invocation that uses 'hs' from function
scope. Thus, the crash would only happen on hardware supporting handshake
offload (brcmfmac).
Fix this by removing the local scope variable declaration and evaluate
(_hs) instead.
Fixes: 4d2176df2985 ("handshake: Allow event handler to free handshake")
- Ensure that input isn't an empty string
- Ensure that EINVAL errno (which could be optionally returned by
strto{ul|l}) is also checked.
- Since strtoul allows '+' and '-' characters in input, ensure that
input which is expected to be an unsigned number doesn't start with
'-'
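Taken together, the checks above might look like the sketch below. The
function name is hypothetical. Rejecting leading whitespace and trailing
garbage goes slightly beyond the list above; whitespace is rejected
because strtoul would skip it and then accept a following '-'.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses a base 10 unsigned 32-bit value, rejecting anything suspicious */
static bool parse_u32(const char *str, uint32_t *out)
{
	unsigned long v;
	char *end;

	assert(str && out);

	/* Empty input is never a number */
	if (str[0] == '\0')
		return false;

	/* strtoul would happily negate the value, so refuse '-' up front */
	if (str[0] == '-' || isspace((unsigned char) str[0]))
		return false;

	errno = 0;
	v = strtoul(str, &end, 10);

	/* ERANGE on overflow, EINVAL may optionally be set on bad input */
	if (errno == ERANGE || errno == EINVAL)
		return false;

	/* No digits consumed, or trailing characters left over */
	if (end == str || *end != '\0')
		return false;

	if (v > UINT32_MAX)
		return false;

	*out = (uint32_t) v;
	return true;
}
```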
Given an ASN1 blob of the right form, parse and create
an l_ecc_point object. The form used is specific to DPP
which is why this isn't general purpose and is put into dpp-util.
Like in ap.c, allow the event callback to mark the handshake state as
destroyed, without causing invalid accesses after the callback has
returned. In this case the crash was because try_handshake_complete
needed to access members of handshake_state after emitting the event,
as well as access the netdev, which also has been destroyed:
==257707== Invalid read of size 8
==257707== at 0x408C85: try_handshake_complete (netdev.c:1487)
==257707== by 0x408C85: try_handshake_complete (netdev.c:1480)
(...)
==257707== Address 0x4e187e8 is 856 bytes inside a block of size 872 free'd
==257707== at 0x484621F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==257707== by 0x437887: ap_stop_handshake (ap.c:151)
==257707== by 0x439793: ap_del_station (ap.c:316)
==257707== by 0x43EA92: ap_station_disconnect (ap.c:3411)
==257707== by 0x43EA92: ap_station_disconnect (ap.c:3399)
==257707== by 0x454276: p2p_group_event (p2p.c:1006)
==257707== by 0x439147: ap_event (ap.c:281)
==257707== by 0x4393AB: ap_new_rsna (ap.c:390)
==257707== by 0x4393AB: ap_handshake_event (ap.c:1010)
==257707== by 0x408C7F: try_handshake_complete (netdev.c:1485)
==257707== by 0x408C7F: try_handshake_complete (netdev.c:1480)
(...)
Previously we added logic to defer doing anything in ap_free() to after
the AP event handler has returned so that ap_event() has a chance to
inform whoever called it that the ap_state has been freed. But there's
also a chance that the event handler is destroying both the AP and the
netdev it runs on, so after the handler has returned we can't even use
netdev_get_wdev_id or netdev_get_ifindex. The easiest solution seems to
be to call ap_reset() in ap_free() even if we're within an event handler
to ensure we no longer need any external objects. Also make sure
ap_reset() can be called multiple times.
Another option would be to watch for NETDEV_WATCH_EVENT_DEL and remove
our reference to the netdev (because there's no need to actually call
l_rtnl_ifaddr_delete or frame_watch_wdev_remove if the netdev was
destroyed -- frame_watch already tracks netdev removals), or to save
just the ifindex and the wdev id...
The purpose of this was to have a single utility to both cancel an
existing offchannel operation (if one exists) and start a new one.
The problem was the previous offchannel operation was being canceled
first which opened up the radio work queue to other items. This is
not desirable as, for example, a scan would end up breaking the
DPP protocol most likely.
Starting the new offchannel then canceling is the correct order of
operations but to do this required saving the new ID, canceling, then
setting offchannel_id to the new ID so dpp_presence_timeout wouldn't
overwrite the new ID to zero.
This also removes an explicit call to offchannel_cancel which is
already done by dpp_offchannel_start.
Several members are named based on initiator/responder (i/r)
terminology. Eventually both initiator and responder will be
supported so rename these members to use own/peer naming
instead.
ASN1 parsing will soon be required which will need some utilities in
asn1-private.h. To avoid duplication include this private header and
replace the OID's with the defined structures as well as remove the
duplicated macros.
station_set_scan_results takes an autoconnect flag which was being
set true in both regular/quick autoconnect scans. Since OWE networks
are processed after setting the scan results IWD could end up
connecting to a network before all the OWE hidden networks are
populated.
To fix this regular/quick autoconnect results will set the flag to
false, then process OWE networks, then start autoconnect. If any
OWE network scans are pending station_autoconnect_start will fail
but will pick back up after the hidden OWE scan.
During investigation another separate crash was found. The original is
caused by a disconnect event coming in after a neighbor report scan
was completed (roam failed) during the full roam scan.
The second crash is caused by a disconnect coming in during a full
roam scan when no neighbor report scan was ever issued.
scan_request_failed and scan_finished remove the finished scan_request
from the request queue right away, before calling the callback. This
breaks those clients that rely on scan_cancel working on such requests
(i.e. to force the destroy callback to be invoked synchronously, see
a0911ca77812 ("station: Make sure roam_scan_id is always canceled")).
Fix this by removing the scan_request from the request queue after
invoking the callback. Also provide a re-entrancy guard that will make
sure that the scan_request isn't removed in scan_cancel itself.
There are similar operations being performed but with different
callbacks and userdata, depending on whether 'sr' is NULL or not.
Optimize the function flow slightly to make if-else unnecessary.
While here, update the comment. periodic scans are now scheduled only
based on the periodic timeout timer.
If periodic scan is active and we receive a SCAN_ABORTED event, we would
still invoke the periodic scan callback with an error. This is rather
pointless since the periodic scan callback cannot do anything useful
with this information. Fix that.
We should never reach a point where NEW_SCAN_RESULTS or SCAN_ABORTED are
received before a corresponding TRIGGER_SCAN is received. Even if this
does happen, there's no harm from processing the commands anyway.
This makes it a little easier to book-keep the started variable. Since
scan_request already has a 'passive' bit-field, there should be no
storage penalty.
If scan_cancel is called on a scan_request that is 'finished' but with
the GET_SCAN command still in flight, it will trigger a crash as
follows:
Received Deauthentication event, reason: 2, from_ap: true
src/station.c:station_disconnect_event() 11
src/station.c:station_disassociated() 11
src/station.c:station_reset_connection_state() 11
src/station.c:station_roam_state_clear() 11
src/scan.c:scan_cancel() Trying to cancel scan id 6 for wdev 200000002
src/scan.c:scan_cancel() Scan is at the top of the queue, but not triggered
src/scan.c:get_scan_done() get_scan_done
Aborting (signal 11) [/home/denkenz/iwd-master/src/iwd]
++++++++ backtrace ++++++++
#0 0x7f9871aef3f0 in /lib64/libc.so.6
#1 0x41f470 in station_roam_scan_notify() at /home/denkenz/iwd-master/src/station.c:2285
#2 0x43936a in scan_finished() at /home/denkenz/iwd-master/src/scan.c:1709
#3 0x439495 in get_scan_done() at /home/denkenz/iwd-master/src/scan.c:1739
#4 0x4bdef5 in destroy_request() at /home/denkenz/iwd-master/ell/genl.c:676
#5 0x4c070b in l_genl_family_cancel() at /home/denkenz/iwd-master/ell/genl.c:1960
#6 0x437069 in scan_cancel() at /home/denkenz/iwd-master/src/scan.c:842
#7 0x41dc2e in station_roam_state_clear() at /home/denkenz/iwd-master/src/station.c:1594
#8 0x41dd2b in station_reset_connection_state() at /home/denkenz/iwd-master/src/station.c:1619
#9 0x41dea4 in station_disassociated() at /home/denkenz/iwd-master/src/station.c:1644
This happens because the get_scan_done callback is still called as a result
of l_genl_family_cancel. Add a re-entrancy guard in the form of 'canceled'
variable in struct scan_request. If set, get_scan_done will skip invoking
scan_finished.
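The shape of this guard, reduced to its essentials, is sketched below.
The names are hypothetical stand-ins: dump_done() plays the role of
get_scan_done, and the counter stands in for invoking scan_finished.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped down scan request */
struct req_sketch {
	bool canceled;
	int finished_calls;	/* how often the client was notified */
};

/* Runs when the dump completes, and also when the command is canceled */
static void dump_done(struct req_sketch *req)
{
	assert(req);

	/* The client asked to cancel, it must not be called back */
	if (req->canceled)
		return;

	req->finished_calls++;
}

/* Canceling the in-flight command still triggers its done callback */
static void request_cancel(struct req_sketch *req)
{
	assert(req);

	req->canceled = true;
	dump_done(req);
}
```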
It isn't clear what 'l_queue_peek_head() == results->sr' check was trying
to accomplish. If GET_SCAN dump was scheduled, then it should be
reported. Drop it.
results->sr is set to NULL for 'opportunistic' scans which were
triggered externally. See scan_notify() for details. However,
get_scan_done would invoke scan_finished (and thus the periodic
scan callback sc->sp.callback) only if the scan queue was empty. It
should do so in all cases.
The point type was being hard coded to 0x3 (BIT1) which may have resulted
in the peer subtracting Y from P when reading in the point (depending on
if Y was odd or not).
Instead set the compressed type to whatever avoids the subtraction which
both saves IWD from needing to do it, as well as the peer.
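For reference, in the usual compressed point encoding the leading type
byte is 0x02 when Y is even and 0x03 when Y is odd. A minimal sketch of
picking the type from Y, assuming Y is available as a big-endian byte
array (the helper name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Picks the compressed point type that matches the parity of Y */
static uint8_t compressed_point_type(const uint8_t *y, size_t len)
{
	assert(y && len > 0);

	/* Big-endian, so the parity bit is in the last byte */
	return (y[len - 1] & 1) ? 0x03 : 0x02;
}
```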
The intent of this check is to make sure that at least 2 bytes are
available for reading. However, the unintended consequence is that tags
with a zero length at the end of input would be rejected.
While here, rework the check to be more resistant to potential
overflow conditions.
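A sketch of a bounds check with both properties is shown below. It
assumes a 1 byte tag followed by a 1 byte length, which may not match
the format this patch touches, and all names are hypothetical. The
remaining length is computed by subtraction from the buffer size, so no
addition involving the length field can overflow, and a zero length tag
occupying the last 2 bytes is accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed element */
struct tlv_sketch {
	uint8_t tag;
	uint8_t len;
	const uint8_t *data;
};

/* Returns 1 if an element was read, 0 at end of input, -1 if malformed */
static int tlv_next(const uint8_t *buf, size_t buf_len, size_t *pos,
			struct tlv_sketch *out)
{
	size_t remaining;

	assert(buf && pos && out);

	if (*pos > buf_len)
		return -1;

	remaining = buf_len - *pos;
	if (remaining == 0)
		return 0;

	/* Exactly 2 bytes left is fine: a tag with zero length */
	if (remaining < 2)
		return -1;

	out->tag = buf[*pos];
	out->len = buf[*pos + 1];

	/* remaining >= 2 here, so the subtraction cannot wrap */
	if (out->len > remaining - 2)
		return -1;

	out->data = buf + *pos + 2;
	*pos += 2 + (size_t) out->len;

	return 1;
}
```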
First disconnect wpa_supplicant to make sure it won't miss frames if
it decides to connect. Also alter the order of things for the
configurator test so autoconnect doesn't start until after hostapd
is up (avoids additional scanning and delays)
The DPP spec says nothing about how to handle re-transmits but it
was found in testing this can happen relatively easily for a few
reasons.
If the configurator requests a channel switch but does not get onto
the new channel quick enough the enrollee may have already sent the
authenticate response and it was missed. Also by nature of how the
kernel goes offchannel there are moments in time between ROC when
the card is idle and not receiving any frames.
Only frames where there was no ACK will be retransmitted. If the
peer received the frame and dropped it resending the same frame won't
do any good.
Now the result is sent immediately. Previously a connect attempt or
scan could have started, potentially losing this frame. In addition
the offchannel operation is cancelled after sending the result
which will allow the subsequent connect or scan to happen much
faster since it doesn't have to wait for ROC to expire.
The previous (incorrect) else was removed since it ended up
printing in most cases since the if clause returned. This should
have been an else if conditional from the start and only print if the
station device was not found.
IWD may be in the middle of some long operation, e.g. scanning.
If the URI is returned before IWD is ready, a configurator could
start sending frames and IWD either won't receive them, or will
be unable to respond quickly.
Controlling wpa_supplicant/hostapd from a text based interface is
problematic in that there is no way of knowing if an event corresponds
to a request. In certain cases if wpa_s/hostapd is sending out multiple
events and we make a request, a random event may come back after the
request, but before the actual result.
To fix this, at least for this specific case, we can continue to read
from the socket until the result is numeric.
The offchannel priority was also changed to zero, which matches the
priority of frames. Currently there should be no interaction between
offchannel and connect (previous offchannel priority).
Periodic scans were handled specially where they were only
started if no other requests were pending in the scan queue.
This is fine, and what we want, but this can actually be
handled automatically by nature of the wiphy work queue rather
than needing to check the request queue explicitly.
Instead we can insert periodic scans at a lower priority than
other scans. This puts them at the end of the work queue, as
well as allows future requests to jump ahead if a periodic scan
has not yet started.
Eventually, once all pending scans are done, the periodic scan
may begin. This is no different than the previous behavior and
avoids the need for any special checks once scan requests
complete.
One check was added to address the problem of the periodic scan
timer firing before the scan could even start. Currently this
happened to be handled fine in scan_periodic_queue, as it checks
the queue length. Since this check was removed we must check
for this condition inside scan_periodic_timeout.
This adds a priority argument to scan_common rather than hard
coding it when inserting the work item and uses the newly
defined wiphy priority for scanning.
Work priority was never explicitly defined anywhere, and a module
using wiphy_radio_work APIs needed to ensure it was not inserting
at a priority that would interfere with other work.
Now all the types of work have been defined with their own priority
and future priorities can easily be added before, after, or in
between existing priorities.
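As a rough illustration of how per-type priorities order a work queue,
consider the sketch below. The enum names and values are hypothetical
(lower value meaning higher priority) and the linked list only stands in
for the real wiphy work queue. Items that already started are never
overtaken, and items of equal priority stay in insertion order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical priorities, lower values run first */
enum work_prio_sketch {
	PRIO_FRAME = 0,
	PRIO_OFFCHANNEL = 0,
	PRIO_CONNECT = 1,
	PRIO_SCAN = 2,
	PRIO_PERIODIC_SCAN = 3,
};

/* Hypothetical work item in a singly linked queue */
struct work_item {
	int priority;
	int id;
	bool started;
	struct work_item *next;
};

/* Inserts behind started items and behind equal or higher priority work */
static void work_insert(struct work_item **head, struct work_item *item)
{
	struct work_item **p = head;

	assert(head && item);

	while (*p && ((*p)->started || (*p)->priority <= item->priority))
		p = &(*p)->next;

	item->next = *p;
	*p = item;
}
```

With this ordering a periodic scan queued first still ends up behind any
normal scan inserted later, as long as it has not started yet.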
- Mostly problems with whitespace:
- Use of spaces instead of tabs
- Stray spaces before closing ')'
- Missing spaces
- Missing 'void' from function declarations & definitions that
take no arguments.
- Wrong indentation level
When this attribute is included, the initiator is requesting all
future frames be sent on this channel. There is no reason for a
configurator to act on this attribute (at least for now) so the
request frame will be dropped in this case. Enrollees will act
on it by switching to the new channel and sending the authentication
response.
While connected the driver ends up choosing quite small ROC
durations leading to excessive calls to ROC. This will also
negatively affect wireless performance for the current
network and possibly lead to missed DPP frames.
Currently the enrollee relied on autoconnect to handle connecting
to the newly configured network. This usually resulted in poor
performance since periodic scans are done at large intervals apart.
Instead first check if the newly configured network is already
in IWD's network queue. If so it can be connected to immediately.
If not, a full scan must be done and results given to station.
With better JSON support the configuration request object
can now be fully parsed. As stated in the previous comment
there really isn't much use from the configurator side apart
from verifying mandatory values are included.
This patch also modifies the configuration result to handle
sending non 'OK' status codes in case of JSON parsing errors.
json_iter_parse is only meant to work on objects while
json_iter_next is only meant to work on arrays.
This adds checks in both APIs to ensure they aren't being
used incorrectly.
Arrays can now be parsed using the JSON_ARRAY type (stored in
a struct json_iter) then iterated using json_iter_next. When
iterating the type can be checked with json_iter_get_type. For
each iteration the value can be obtained using any of the type
getters (int/uint/boolean/null).
This adds support for boolean, (unsigned) integers, and
null types. JSON_PRIMITIVE should be used as the type when
parsing and the value should be struct json_iter.
Once parsed the actual value can be obtained using one of
the primitive getters. If the type does not match they will
return false.
If using JSON_OPTIONAL with JSON_PRIMITIVE the resulting
iterator can be checked with json_iter_is_valid. If false
the key/value was not found or the type was not matching.
First, this was renamed to 'count_tokens_in_container' to be
more general purpose (i.e. include future array counting).
The way the tokens are counted also changed to be more intuitive.
While the previous way was correct, it was somewhat convoluted in
how it worked (finding the next parent of the object's parent).
Instead we can use the container token itself as the parent and
begin counting tokens. When we find a token with a parent index
less than the target we have reached the end of this container.
This also works for nested containers, including arrays since we
no longer rely on a key (which an array element would not have).
For example::
{
"first":{"foo":"bar"},
"second":{"foo2":"bar2"}
}
index 0 <overall object>
index 1 "first" with parent 0
index 2 {"foo":"bar"} with parent 1
Counting tokens inside "first"'s object we have:
index 3 "foo" with parent 2
index 4 "bar" with parent 3
If we continue counting we reach:
index 5 "second" with parent 0
This terminates the counting loop since the parent index is
less than '2' (the index of {"foo":"bar"} object).
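The counting rule can be sketched as follows. This is a Python model
for illustration only, not the C implementation: the token array is
reduced to a hypothetical list of parent indexes (one entry per token,
in the order the tokenizer emits them), which is the only information
the loop needs.

```python
def count_tokens_in_container(parents, container):
    """Count the tokens nested inside the token at index `container`.

    `parents[i]` is the parent index of token i. Counting stops at the
    first token whose parent index is less than the container's index.
    """
    count = 0
    for parent in parents[container + 1:]:
        if parent < container:
            break
        count += 1
    return count


# Parent indexes for {"first":{"foo":"bar"},"second":{"foo2":"bar2"}}
# index: 0   1  2  3  4  5  6  7  8
parents = [-1, 0, 1, 2, 3, 0, 5, 6, 7]

print(count_tokens_in_container(parents, 2))
```

For the object at index 2 the loop counts indexes 3 and 4, then stops at
index 5 because its parent (0) is less than 2.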
In file included from ./ell/ell.h:15,
from ../../src/dpp.c:29:
../../src/dpp.c: In function ‘authenticate_request’:
../../ell/log.h:79:22: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 8 has type ‘size_t’ {aka ‘unsigned int’} [-Wformat=]
79 | l_log(L_LOG_DEBUG, "%s:%s() " format, __FILE__, \
| ^~~~~~~~~~
../../ell/log.h:54:16: note: in definition of macro ‘l_log’
54 | __func__, format "\n", ##__VA_ARGS__)
| ^~~~~~
../../ell/log.h:103:31: note: in expansion of macro ‘L_DEBUG_SYMBOL’
103 | #define l_debug(format, ...) L_DEBUG_SYMBOL(__debug_desc, format, ##__VA_ARGS__)
| ^~~~~~~~~~~~~~
../../src/dpp.c:1235:3: note: in expansion of macro ‘l_debug’
1235 | l_debug("I-Nonce has unexpected length %lu", i_nonce_len);
| ^~~~~~~
Some wpa_cli utilities return some result which isn't possible to
get with wait_for_event unless you know what the result will be.
This adds wait_for_result which just returns the first event that
comes in.
wait_for_event was checking the event string presence in the rx_data
array which meant the event string had to match perfectly to any
received events. This poses problems with events that include additional
information which the caller may not be able to know or does not care
about. For example:
DPP-RX src=02:00:00:00:02:00 freq=2437 type=11
Waiting for this event previously would require the caller know src, freq,
and type. If the caller only wants to wait for DPP-RX, it can now do that.
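The difference between the two matching strategies can be sketched as
below. This is a simplified stand-in rather than the test utility
itself: find_event is a hypothetical helper, and rx_data is assumed to
be a plain list of received event strings.

```python
def find_event(rx_data, event):
    """Return the first received line containing `event`, else None."""
    for line in rx_data:
        if event in line:
            return line
    return None


rx_data = [
    "CTRL-EVENT-SCAN-STARTED",
    "DPP-RX src=02:00:00:00:02:00 freq=2437 type=11",
]

# Old behavior: the event had to equal a received line exactly.
exact_match = "DPP-RX" in rx_data

# New behavior: a substring of a received line is enough.
matched = find_event(rx_data, "DPP-RX")
print(exact_match, matched)
```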
This is created by the python interpreter for speed optimization
but poses problems if copied to /tmp since previous tests may
have already copied it leading to an exception.
Since a Device class can represent multiple modes (AP, AdHoc, station)
move StationDebug out of the init and only create this class when it
is used (presumably only when the device is in station mode).
The StationDebug class is now created in a property method consistent
with 'station_if'. If Device is not in station mode it is automatically
switched if the test tries any StationDebug methods.
If the Device mode is changed from 'station' the StationDebug class
instance is destroyed.
This file is meant as a sample and contains only the most typically
changed settings. For other settings users should refer to the
iwd.config manual page.
Right now hwsim blindly tries to forward broadcast/multicast frames to
all interfaces it knows about and relies on the kernel to reject the
forwarding attempt if the frequency does not match. This results in
multiple copies of the same message being added to the genl transmit
queue.
On slower systems this can cause a run-away memory consumption effect
where the queued messages are not processed in time prior to a new
message being received for forwarding. The likelihood of this effect
manifesting itself is directly related to the number of hostapd
instances that are created and are beaconing simultaneously.
Try to optimize frame forwarding by not sending beacon frames
to those interfaces that are in AP mode (i.e. pure hostapd instances)
since such interfaces are going to be operating on a different frequency
and would not be interested in processing beacon frames anyway.
This optimization cuts down peak memory use during certain tests by 30x
or more (~33MB to ~1MB) when profiled with 'valgrind --tool=massif'.
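The filtering idea can be sketched as follows. This is an illustrative
Python model, not the hwsim code: the function names and the
(name, mode) interface tuples are hypothetical, and only the first
frame control byte is inspected (management type 0, subtype 8 marks a
beacon).

```python
def is_beacon(frame):
    """Check the first frame control byte for type 0, subtype 8."""
    fc0 = frame[0]
    frame_type = (fc0 >> 2) & 0x3
    subtype = (fc0 >> 4) & 0xF
    return frame_type == 0 and subtype == 8


def forward_targets(frame, interfaces):
    """Return the interface names the frame should be forwarded to."""
    beacon = is_beacon(frame)
    return [name for name, mode in interfaces
            if not (beacon and mode == "ap")]


interfaces = [("wlan0", "ap"), ("wlan1", "ap"), ("wlan2", "station")]
beacon = bytes([0x80, 0x00])     # management frame, subtype 8
probe_req = bytes([0x40, 0x00])  # management frame, subtype 4

print(forward_targets(beacon, interfaces))
print(forward_targets(probe_req, interfaces))
```

Beacons skip the AP-mode interfaces while other frames are still
offered to every interface.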
Direct leak of 64 byte(s) in 1 object(s) allocated from:
#0 0x7fa226fbf0f8 in __interceptor_malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/9.4.0/libasan.so.5+0x10c0f8)
#1 0x688c98 in l_malloc ell/util.c:62
#2 0x6c2b19 in msg_alloc ell/genl.c:740
#3 0x6cb32c in l_genl_msg_new_sized ell/genl.c:1567
#4 0x424f57 in netdev_build_cmd_authenticate src/netdev.c:3285
#5 0x425b50 in netdev_sae_tx_authenticate src/netdev.c:3385
Direct leak of 7 byte(s) in 1 object(s) allocated from:
#0 0x7fd748ad00f8 in __interceptor_malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/9.4.0/libasan.so.5+0x10c0f8)
#1 0x688c21 in l_malloc ell/util.c:62
#2 0x4beec7 in handshake_state_set_vendor_ies src/handshake.c:324
#3 0x464e4e in station_handshake_setup src/station.c:1203
#4 0x472a2f in __station_connect_network src/station.c:2975
#5 0x473a30 in station_connect_network src/station.c:3078
#6 0x4ed728 in network_connect_8021x src/network.c:1497
Fixes: f24cfa481b0c ("handshake: Add setter for vendor IEs")
Passing the full argument list to StationDebug was removed
because any existing properties (for Device) were being
included and causing incorrect behavior.
This neglected to handle namespaces which should also be
passed to StationDebug. Unfortunately the arguments are not
named when Device() is initialized so they cannot easily be
sorted. Instead just define Device() arguments to match the
DBus abstraction and pass only the path and namespace to
StationDebug
Three commands were added:
dpp <iface> start-enrollee
dpp <iface> start-configurator
dpp <iface> stop
In addition there is support for using the qrencode utility for displaying
the QR code after DPP is started (enrollee or configurator). If qrencode is
found on the system the QR code will be displayed. Otherwise only the URI
will be printed to the console.
This implements a configurator in the responder role. Currently
configuring an enrollee is limited to only the connected network.
This is to avoid the need to go offchannel for any reason. But
because of this a roam, channel switch, or disconnect will cause
the configuration to fail as none of the frames are being sent
offchannel.
Added both enrollee and configurator roles, as well as the needed
logic inside the authentication protocol to verify role compatibility.
The dpp_sm's role will now be used when setting capability bits making
the auth protocol agnostic to enrollees or configurators.
This also allows the card to re-issue ROC if it ends in the middle of
authenticating or configuring as well as add a maximum timeout for
auth/config protocols.
IO errors were also handled as these sometimes can happen with
certain drivers but are not fatal.
Allows creating a new configuration object based on settings, ssid,
and akm suite (for configurator role) as well as converting a
configuration object to JSON.
Rather than hard coding ad0, use the actual frame data. There really
isn't a reason this would differ (only status attribute) but just
in case it's better to use the frame data directly.
This is a minimal implementation only supporting legacy network
configuration, i.e. only SSID and PSK/passphrase are supported.
Missing features include:
- Fragmentation/comeback delay support
- DPP AKM support
- 8021x/PKEX support
This implements the DPP protocol used to authenticate to a
DPP configurator.
Note this is not a full implementation of the protocol and
there are a few missing features which will be added as
needed:
- Mutual authentication (needed for BLE bootstrapping)
- Configurator support
- Initiator role
The presence procedure implemented is a far cry from what the spec
actually wants. There are two reasons for this: a) the kernel's offchannel
support is not at a level where it will work without rather annoying
workarounds, and b) doing the procedure outlined in the spec will
result in terrible discovery performance.
Because of this a simpler single channel announcement is done by default
and the full presence procedure is left out until/if it is needed.
This is a minimal wrapper around jsmn.h to make things a bit easier
for iterating through a JSON object.
To use, first parse the JSON and create a contents object using
json_contents_new(). This object can then be used to initialize a
json_iter object using json_iter_init().
The json_iter object can then be parsed with json_iter_parse by
passing in JSON_MANDATORY/JSON_OPTIONAL arguments. Currently only
JSON_STRING and JSON_OBJECT types are supported. Any JSON_MANDATORY
values that are not found will result in an error.
If a JSON_OPTIONAL string is not found, the pointer will be NULL.
If a JSON_OPTIONAL object is not found, this iterator will be
initialized but 'start' will be -1. This can be checked with a
convenience macro json_object_not_found().
Static analysis was not happy since this return can be negative and
it was being fed into an unsigned argument. In reality this cannot
happen since the key buffer is always set to the maximum size supported
by any curves.
This module provides a convenient wrapper around both
CMD_[CANCEL_]REMAIN_ON_CHANNEL APIs.
Certain protocols require going offchannel to send frames, and/or
wait for a response. The frame-xchg module somewhat does this but
has some limitations. For example you cannot just go offchannel;
an initial frame must be sent out to start the procedure. In addition
frame-xchg does not work for broadcasts since it expects an ACK.
This module is much simpler and only handles going offchannel for
a duration. During this time frames may be sent or received. After
the duration the caller will get a callback and any included error
if there was one. Any offchannel request can be cancelled prior to
the duration expiring if the offchannel work has finished early.
Make sure we wipe the leases file both for server and client, so that
dhclient doesn't try to re-use leases from previous tests (should really
happen) and waste time waiting for a reply. Extend the timeout from 1s
to 5s, as it sometimes takes dhclient 1s just to start. Disable verbose
mode if not needed to avoid dhclient stalling if the pipe is not being
read.
The disconnect event handler was mistakenly bailing out if FT or
reassociation was going on. This was done because a disconnect
event is sent by the kernel when CMD_AUTH/CMD_ASSOC is used.
The problem is an AP could also disconnect IWD which should never
be ignored.
To fix this always parse the disconnect event and, if issued by
the AP, always notify watchers of the disconnect.
Passing *args, **kwargs into StationDebug ended up initializing the
class with Station properties since devices can be initialized from
existing property dictionaries. Since the object path is all
StationDebug needs, pass args[0] instead.
LLD 13 and GNU ld 2.37 support -z start-stop-gc which allows garbage
collection of C identifier name sections despite the __start_/__stop_
references. GNU ld before 2015-10 had the behavior as well. Simply set
the retain attribute so that GCC 11 (if configure-time binutils is 2.36
or newer)/Clang 13 will set the SHF_GNU_RETAIN section attribute to
prevent garbage collection.
Without the patch, there are linker errors with -z start-stop-gc
(LLD default) when -Wl,--gc-sections is used:
```
ld.lld: error: undefined symbol: __start___eap
>>> referenced by eap.c
>>> src/eap.o:(eap_init)
```
The retain attribute will not be needed if the metadata sections are
referenced by code directly.
Document the new API that clients can use to get notified of new network
configuration and be responsible for committing it to the netdev, the
resolver, etc.
On some systems the default radvd pid file location is not accessible.
Specify it to be under /tmp instead.
While there, enable full radvd debug output so it is logged when
test-runner is invoked with the --log option.
ap.c has been mostly careful to call the event handler at the end of any
externally called function to allow methods like ap_free() to be called
within the handler, but that isn't enough. For example in
ap_del_station we may end up emitting two events: STATION_REMOVED and
DHCP_LEASE_EXPIRED. Use a slightly more complicated mechanism to
explicitly guard ap_free calls inside the event handler.
To make it easier, simplify cleanup in ap_assoc_reassoc with the use of
_auto_.
In ap_del_station reorder the actions to send the STATION_REMOVED event
first as the DHCP_LEASE_EXPIRED is a consequence of the former and it
makes sense for the handler to react to it first.
src/eap.c: In function 'eap_rx_packet':
src/eap.c:419:50: error: 'vendor_type' may be used uninitialized in this function [-Werror=maybe-uninitialized]
419 | (type == EAP_TYPE_EXPANDED && vendor_id == (id) && vendor_type == (t))
| ^~
src/eap.c:430:11: note: 'vendor_type' was declared here
430 | uint32_t vendor_type;
It isn't clear why GCC complains about vendor_type, but not vendor_id.
But in all cases if type == EAP_TYPE_EXPANDED, then vendor_type and
vendor_id are set. Silence this spurious warning.
There is an unchecked NULL pointer access in network_has_open_pair.
open_info can be NULL, when out of multiple APs in range that advertise
the same SSID some advertise OWE transition elements and some don't.
A user reported a crash in situations where there was an OWE transition
pair, with an extra open network using the same SSID but not advertising
the OWE transition IE:
++++++++ backtrace ++++++++
0x7f199cadf320 in /lib64/libc.so.6
0x418c08 in network_has_open_pair() at /home/jprestwo/iwd/src/station.c:712
0x4262ce in scan_finished() at /home/jprestwo/iwd/src/scan.c:1718
0x4273cd in get_scan_done() at /home/jprestwo/iwd/src/scan.c:1733
0x47cf7a in destroy_request() at /home/jprestwo/iwd/ell/genl.c:674
0x479f1c in io_callback() at /home/jprestwo/iwd/ell/io.c:120
0x47922d in l_main_iterate() at /home/jprestwo/iwd/ell/main.c:472 (discriminator 2)
0x4792dc in l_main_run() at /home/jprestwo/iwd/ell/main.c:521
0x47950c in l_main_run_with_signal() at /home/jprestwo/iwd/ell/main.c:649
0x403e97 in main() at /home/jprestwo/iwd/src/main.c:532
0x7f199cac9b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
The Hotspot 2.0 spec has some requirements that IWD was missing depending
on a few bits in extended capabilities and the HS2.0 indication element.
These requirements correspond to a few sysfs options that can be set in
the kernel which are now set on CONNECTED and unset on DISCONNECTED.
Netconfig was the only user of sysfs but now other modules will
also need it.
Adding existing API for IPv6 settings, an IPv4 and IPv6 'supports'
checker, and a setter for IPv4 settings.
If a beacon is lost testAP will fail since it did not utilize any
rescanning logic. Now it can use this feature by passing full_scan.
This is required since IWD APs are not known to test-runner like
hostapd APs are.
Certain scenarios coupled with lost beacons could result in OrderedNetwork
being initialized many times until the dbus library reached its maximum
signal registrations. This could happen where there are two networks,
IWD finds one in a scan but continues to scan for the other and the beacons
are lost. The way get_ordered_networks was written it returns early if any
networks are found. Since get_ordered_network (not plural) uses
get_ordered_networks() in a loop this caused OrderedNetwork's to be created
rapidly until python raises an exception.
To fix this, pass an optional list of networks being looked for to
get_ordered_networks. Only if all the networks in the list are found will
it return early, otherwise it will continue to scan.
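The early-return rule can be sketched as below. This is a hypothetical
helper written for illustration, not the actual get_ordered_networks
code; the `wanted` parameter stands in for the optional list of
networks being looked for.

```python
def should_return_early(found_ssids, wanted=None):
    """Decide whether a scan loop may stop with the current results.

    Without a `wanted` list any result is enough (the old behavior).
    With one, every requested network must be present.
    """
    if not found_ssids:
        return False
    if wanted is None:
        return True
    return set(wanted).issubset(found_ssids)


found = ["ssidA"]
print(should_return_early(found))
print(should_return_early(found, wanted=["ssidA", "ssidB"]))
```

With only one of two requested networks found, the second call reports
that scanning should continue instead of returning a partial list.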
In non-interactive mode, when a dbus method call returns the process
exits. This is true for all methods except agent requests since e.g.
Connect() call automatically requests credentials and the client must
wait for that to return before exiting. The new daemon interface must
also be treated in the same way and not exit.
It appears different versions of pyroute2 may or may not have
iwutil, and instead use pyroute2.IW() directly. Try the iwutil
way first, then pyroute2.IW()
The way a SA Query was done following a channel switch was slightly
incorrect. One because it is only needed when OCVC is set, and two
because IWD was not waiting a random delay between 0 and 5000us as
outlined by the spec. This patch fixes both these issues.
When station 'show' is invoked, parse and print any IP addresses
associated with the interface.
If iwd network configuration is disabled and no IP addresses were
configured, print a hint to the user that a DHCP client might need to be
configured.
Commit ed10b00afa3f ("unit: Fix eapol IP Allocation test failure")
did not convert all instances of IP allocation settings to network byte
order.
Fixes: 5c9de0cf23f9 ("eapol: Store IP address in network byte order")
Cache the latest v4 and v6 domain string lists in struct netconfig state
to be able to more easily detect changes in those values in future
commits. For that split netconfig_set_domains's code into this function,
which now only commits the values in netconfig->v{4,6}_domain{,s} to the
resolver, and netconfig_domains_update() which figures out the active
domains string list and saves it into netconfig->v{4,6}_domain{,s}. This
probably saves some cycles as the callers can now decide to only
recalculate the domains list which may have changed.
While there simplify netconfig_set_domains return type to void as the
result was always 0 anyway and was never checked by callers.
Cache the latest v4 and v6 DNS IP string lists in struct netconfig state
to be able to more easily detect changes in those values in future
commits. For that split netconfig_set_dns's code into this function,
which now only commits the values in netconfig->dns{4,6}_list to the
resolver, and netconfig_dns_list_update() which figures out the active
DNS IP address list and saves it in netconfig->dns{4,6}_list. This
probably saves some cycles as the callers can now decide to only
recalculate the dns_list which may have changed.
While there simplify netconfig_set_dns return type to void as the result
was always 0 anyway and was never checked by callers.
Cache the latest v4 and v6 gateway IP string in struct netconfig state
to be able to more easily detect changes in those values in future
commits and perhaps to simplify the ..._routes_install functions.
netconfig_ipv4_get_gateway's out_mac parameter can now be NULL. While
editing that function fix a small formatting annoyance.
Use a separate fils variable to make the code a bit prettier.
Also make sure that the out_mac parameter is not NULL prior to storing
the gateway_mac in it.
Add netconfig_enabled() and use that in all places that want to know
whether network configuration is enabled. Drop the enable_network_config
deprecated setting, which was only being handled in one of these 5 or so
places.
This code path was never tested and used to ensure an OWE transition
candidate gets selected over an open one (e.g. if all the BSS's are
blacklisted). But this logic was incorrect and the path was being
taken for BSS's that did not contain the owe_trans element, basically
all BSS's. For RSN's this was somewhat fine since the final check
would set a candidate, but for open BSS's the loop would start over
and potentially complete the loop without ever returning a candidate.
If fallback was false, NULL would be returned.
To fix this only take the OWE transition path if it's an OWE transition
BSS, i.e. invert the logic.
There was no open ssid provisioning file, which was fine as the
first test should have created one. But to be safe, include one
explicitly and use the proper setUp/tearDown functions.
Normally Beacon Reporting subelements are present only if repeated
measurements are requested. However, an all-zero Beacon Reporting
subelement is included by some implementations. Handle this case
similarly to the absent case.
Since Reporting Detail subelement is listed as 'extensible', make sure
that the length check is not overly restrictive. We only interpret the
first field.
It was seen during testing that several offload-capable cards
were not including the OCI in the 4-way handshake. This made
any OCV capable AP unconnectable.
To be safe disable OCV on any cards that support offloading.
802.11 requires an STA initiate the SA Query procedure on channel
switch events. This patch refactors sending the SA Query into its
own routine and starts the procedure when the channel switch event
comes in.
In addition the OCI needs to be verified, so the channel info is
parsed and set into the handshake's chandef.
There are several events for channel switching, and nl80211cmd was
naming two of them "Channel Switch Notify". Change
CH_SWITCH_STARTED_NOTIFY to "Channel Switch Started Notify" to
distinguish the two events.
By sleeping for 4 seconds IWD had plenty of time to fully disconnect
and reconnect in time to pass the final "connected" check. Instead
use wait_for_object_condition to wait for disconnected and expect
this to fail. This will let the test fail if IWD disconnects.
SA query is the final protocol that requires OCI inclusion and
verification. The OCI element is now included and verified in
both request and response frames as required by 802.11.
strcmp behavior is undefined if one of the parameters is NULL.
Server-id is a mandatory value and cannot be NULL. Gateway can be NULL
in DHCP, so check that explicitly.
Reported-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
In certain situations, it is possible for us to know the MAC of the
default gateway when DHCP finishes. This is quite typical on many home
network and small network setups. It is thus possible to pre-populate
the ARP cache with the gateway MAC address to save an extra round trip
at connection time.
Another advantage is during roaming. After version 4.20, the Linux kernel
flushes ARP caches by default whenever netdev encounters a no carrier
condition (as is the case during roaming). This can prevent packets
from going out after a roam for a significant amount of time due to
lost/delayed ARP responses.
This implements the new handshake callback for setting a TK with
an extended key ID. The procedure is different from legacy zero
index TKs.
First the new TK is set as RX only. Then message 4 should be sent
out (so it uses the existing TK). This poses a slight issue with
PAE sockets since message order is not guaranteed. In this case
the 4th message is stored and sent after the new TK is installed.
Then the new TK is modified using SET_KEY to both send and
receive.
In the case of control port over NL80211 the above can be avoided
and we can simply install the new key, send message 4, and modify
the TK as TX + RX all in sequence, without waiting for any callbacks.
When UseDefaultInterface is set, iwd doesn't attempt to destroy and
recreate any default interfaces it detects. However, only a single
default interface was ever remembered & initialized. This is fine for
most cases since the kernel would typically only create a single netdev
by default.
However, some drivers can create multiple netdevs by default, if
configured to do so. Other usecases, such as tethering, can also
benefit if iwd initialized & managed all default netdevs that were
detected at iwd start time or device hotplug.
oci variable is always set during handshake_util_find_kde. Do not
initialize it unnecessarily to help the compiler / static analysis find
potential issues.
If OCI is not used, then the oci array is never initialized. Do not try
to include it in our GTK 2_of_2 message.
Fixes: ad4d6398542b ("eapol: include OCI in GTK 2/2")
802.11 added Extended Key IDs which aim to solve the issue of PTK
key replacement during rekeys: swapping out the existing PTK may
result in data loss because there may be in-flight packets still
using the old PTK.
Extended Key IDs use two key IDs for the PTK, which toggle between
0 and 1. During a rekey a new PTK is derived which uses the key ID
not already taken by the existing PTK. This new PTK is added as RX
only, then message 4/4 is sent. This ensures message 4 is encrypted
using the previous PTK. Once sent, the new PTK can be modified to
both RX and TX and the rekey is complete.
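The ordering of the rekey can be sketched as below. This is a
simplified Python model of the sequence just described, not the eapol
or netdev code; rekey_steps and the action names are hypothetical.

```python
def rekey_steps(active_key_id):
    """Return the ordered rekey actions for a PTK using key ID 0 or 1."""
    if active_key_id not in (0, 1):
        raise ValueError("extended key IDs toggle between 0 and 1")

    # Use the key ID not already taken by the existing PTK.
    new_key_id = active_key_id ^ 1

    return [
        ("install_key", new_key_id, "rx_only"),
        # Message 4/4 still goes out under the previous PTK.
        ("send_message_4", active_key_id, "encrypted_with_old_ptk"),
        ("modify_key", new_key_id, "rx_tx"),
    ]


steps = rekey_steps(0)
for step in steps:
    print(step)
```

Starting from key ID 0, the new PTK is installed under ID 1 as RX only,
message 4 is sent with the old key, and only then is ID 1 enabled for
both RX and TX.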
To handle this in eapol the extended key ID KDE is parsed which
gives us the new PTK key index. Using the new handshake callback
(handshake_state_set_ext_tk) the new TK is installed. The 4th
message is also included as an argument which is taken care of by
netdev (in case waiting for NEW_KEY is required due to PAE sockets).
REKEY_GTK kicks off the GTK only handshake where REKEY_PTK does
both (via the 4-way). The way this utility was written was causing
hostapd some major issues since both REKEY_GTK and REKEY_PTK were
used.
Instead if address is set only do REKEY_PTK. This will also rekey
the GTK via the 4-way handshake.
If no address is set do REKEY_GTK which will only rekey the GTK.
This may not be required but setting the group key mode explicitly
to multicast makes things consistent, even if only to make iwmon
logs easier to read.
The procedure for setting extended key IDs is different from the
single PTK key. The key ID is toggled between 0 and 1 and the new
key is set as RX only, then set to RX/TX after message 4/4 goes
out.
Since netdev needs to set this new key before sending message 4,
eapol can include a built message which netdev will store if
required (i.e. using PAE).
ext_key_id_capable indicates the handshake has set the capability bit
in the RSN info. This will only be set if the AP also has the capability
set.
active_tk_index is the key index the AP chose in message 3. This is
now used for both legacy (always zero) and extended key IDs.
Move the reading of ControlPortOverNL80211 into wiphy itself and
rename wiphy_control_port_capable to wiphy_control_port_enabled.
This makes things easier for any modules interested in control
port support since they will only have to check this one API rather
than read the settings and check capability.
Expose the Device Address property for each peer. The spec doesn't say
much about how permanent the address or the name are, although the
device address by definition lives longer than the interface addresses.
However the device address is defined to be unique and the name is not
so the address can be used to differentiate devices with identical names.
Being unique also may imply that it's assigned globally and thus
permanent.
Network Manager uses the P2P device address when saving connection
profiles (and will need it from the backend) and in this case it seems
better justified than using the name.
The address is already in the object path but the object path also
includes the local phy index which may change for no reason even when
the peer's address hasn't changed so the path is not useful for
remembering which device we've connected to before. Looking at only
parts of the path is considered wrong.
Some drivers might not actually support control port properly even if
advertised by mac80211. Introduce a new method to wiphy that will take
care of looking up any driver quirks that override the presence of
NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211
Make consecutive calls to netconfig_load_settings() memory-leak safe by
introducing a netconfig_free_settings convenience method. This method
will free any settings that are allocated as a result of
netconfig_load_settings() and will be called from netconfig_free() to
ensure that any settings are freed as a result of netconfig_destroy().
For symmetry with IPv4, save the command id for this netlink command so
we can later add logic to the callback as well as be able to cancel the
command. No functional change in this commit alone.
The FT-over-DS test was allowed to fail as it stood. If FT-over-DS
failed it would just do a normal over-Air transition which satisfied
all the checks. To prevent this Authenticate frames are blocked after
the initial connection so if FT-over-DS fails there is no other way
to roam.
FT/FILS handle their own PMK derivation but rekeys still require
using the 4-way handshake. There is some ambiguity in the spec whether
or not the PMKID needs to be included in message 1/4 and it appears
that when rekeying after FT/FILS hostapd does not include a PMKID.
The handshake contains the current BSS's RSNE/WPA which may differ
from the FT-over-DS target. When verifying, the target BSS's RSNE/WPA
IE needs to be checked, not the current BSS's.
If the deauth path was triggered IWD would deauth but end up
calling the connect callback with whatever result netdev had
set, e.g. 'NETDEV_RESULT_OK'. This, of course, caused station
some confusion.
FT-over-DS cannot use OCV due to how the kernel works. This means
we could connect initially with OCVC set, but a FT-over-DS attempt
needs to unset OCVC. Set OCVC false when rebuilding the RSNE for
reassociation.
The FT-over-DS action stage builds an FT-Request which contains an
RSNE. Since FT-over-DS will not support OCV add a boolean to
ft_build_authenticate_ies so the OCVC bit can be disabled rather
than relying on the handshake setting.
This modifies the FT logic to first call get_oci() before
reassociation. This allows the OCI to be included in reassociation
and in the 4-way handshake later on.
The code path for getting the OCI had to be slightly changed to
handle an OCI that is already set. First the handshake chandef is
NULL'ed out for any new connection. This prevents a stale OCI from
being used. Then some checks were added for this case in
netdev_connect_event and if chandef is already set, start the 4-way
handshake.
netconfig_load_settings is called when establishing a new initial
association to a network. This function tries to update dhcp/dhcpv6
clients with the MAC address of the netdev being used. However, it is
too early to update the MAC here since netdev might need to powercycle
the underlying network device in order to update the MAC (i.e. when
AddressRandomization="network" is used).
If the MAC is set incorrectly, DHCP clients are unable to obtain the
lease properly and station is stuck in "connecting" mode indefinitely.
Fix this by delaying MAC address update until netconfig_configure() is
invoked.
Fixes: ad228461abbf ("netconfig: Move loading settings to new method, refactor")
If the AP advertises FT-over-DS support it likely wants us to use
it. Additionally signal_low is probably going to be true since IWD
has started a roam attempt.
When netdev goes down so does station, but prior to netdev calling
the neighbor report callback. The way the logic was written station
is dereferenced prior to checking for any errors, causing a use
after free.
Since -ENODEV is used in this case check for that early before
accessing station.
After namespaces were added, the dbus address was customized to
be /tmp/dbus{0..N}. This prevented any dbus applications started
in the shell from working properly.
Set DBUS_SYSTEM_BUS_ADDRESS to the environment prior to entering
the shell.
This adds a utility to convert a chandef obtained from the kernel into a
3 byte OCI element format containing the operating class, primary
channel and secondary channel center frequency index.
This changes scan_bss from using separate members for each
OWE transition element data type (ssid, ssid_len, and bssid)
to a structure that holds them all.
This is being done because OWE transition has optional operating
class and channel bytes which will soon be parsed. This would
end up needing 5 separate members in scan_bss which is a bit
much for a single IE that needs to be parsed.
This makes checking the presence of the IE more convenient
as well since it can be done with a simple NULL pointer check
rather than having to l_memeqzero the BSSID.
These members are currently stored in scan_bss but with the
addition of operating class/band info this will become 5
separate members. This is a bit excessive to store in scan_bss
separately so instead this structure can hold everything related
to the OWE transition IE.
Add a utility for setting the OCI obtained from the hardware (prior to
handshake starting) as well as a utility to validate the OCI obtained
from the peer.
This adds a utility that can convert an operating class + channel
combination to a frequency. Operating class is assumed to be a global
operating class from 802.11 Appendix E4.
This information can be found in Operating Channel Information (OCI) IEs,
as well as OWE Transition Mode IEs.
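A minimal sketch of the conversion, assuming only a small subset of the
global operating class table. The table contents, the names and the
return convention are illustrative and are not taken from IWD:

```python
# Hypothetical subset of the global operating classes.
# operating class -> (starting frequency in MHz, allowed channels)
GLOBAL_OPER_CLASSES = {
    81: (2407, range(1, 14)),        # 2.4 GHz, channels 1-13
    82: (2414, (14,)),               # 2.4 GHz, channel 14
    115: (5000, (36, 40, 44, 48)),   # 5 GHz, 20 MHz channels
}


def oper_class_to_freq(oper_class, channel):
    """Return the center frequency in MHz, or None if the pair is unknown."""
    entry = GLOBAL_OPER_CLASSES.get(oper_class)
    if entry is None:
        return None

    start, channels = entry
    if channel not in channels:
        return None

    # Channels are spaced 5 MHz apart from the starting frequency
    return start + 5 * channel
```

Rejecting a channel that is not listed for the class keeps a malformed
IE from being turned into a bogus frequency.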
Calling handshake_state_setup_own_ciphers from within
handshake_state_set_authenticator_ie was misleading. In all cases the
supplicant chooses the AKM. This worked since our AP code only ever
advertises a single AKM, but would not work in the general case.
Similarly, the supplicant would choose which authentication type to use
by either sending the WPA1 or WPA2 IE (or OSEN). Thus the setting of
the related variables in handshake_state_set_authenticator_ie was also
incorrect. In iwd, the supplicant_ie would be set after the
authenticator_ie, so these settings would be overwritten in most cases.
Refactor these two setters so that the supplicant's chosen rsn_info
would be used to drive the handshake.
reallocarray has been added to glibc relatively recently (version 2.26,
from 2017) and apparently not all users run new enough glibc. Moreover,
reallocarray is not available with uclibc-ng. So use realloc if
reallocarray is not available to avoid the following build failure
raised since commit 891b78e9e892a3bcd800eb3a298e6380e9a15dd1:
/home/giuliobenetti/autobuild/run/instance-3/output-1/host/lib/gcc/xtensa-buildroot-linux-uclibc/10.3.0/../../../../xtensa-buildroot-linux-uclibc/bin/ld: src/sae.o: in function `sae_rx_authenticate':
sae.c:(.text+0xd74): undefined reference to `reallocarray'
Fixes:
- http://autobuild.buildroot.org/results/c6d3f86282c44645b4f1c61882dc63ccfc8eb35a
This adds several tests for OWE transition networks. Hostapd
does have special options for these networks but currently their
implementation is incorrect as the IE is not ever added to the
OWE BSS. Besides that using vendor_elements provides a much easier
way to create invalid IEs to test.
There isn't much control station has with how BSS's are inserted to
a network object. The rank algorithm makes that decision. Because of
this we could end up in a situation where the Open BSS is preferred
over the OWE transition BSS.
In an attempt to better handle this any Open BSS in this type of network
will not be chosen unless it's the only candidate (e.g. no other BSSs,
inability to connect with OWE, or an improperly configured network).
OWE Transition is described in the WiFi Alliance OWE Specification
version 1.1. The idea behind it is to support both legacy devices
without any concept of OWE as well as modern ones which support the
OWE protocol.
OWE is a somewhat special type of network in that it advertises an
RSN element but is still "open". This apparently confuses older
devices so the OWE transition procedure was created.
The idea is simple: have two BSS's, one open, and one as a hidden
OWE network. Each network advertises a vendor IE which points to the
other. A device sees the open network and can connect (legacy) or
parse the IE, scan for the hidden OWE network, and connect to that
instead.
Care was taken to handle connections to hidden networks directly.
The policy is being set that any hidden network with the WFA OWE IE
is not connectable via ConnectHiddenNetwork(). These networks are
special, and can only be connected to via the network object for
the paired open network.
When scan results come in from any source (DBus, quick, autoconnect)
each BSS is checked for the OWE Transition IE. A few paths can be
taken here when the IE is found:
1. The BSS is open. The BSSID in the IE is checked against the
current scan results (excluding hidden networks). If a match is
found we should already have the hidden OWE BSS and nothing
else needs to be done (3).
2. The BSS is open. The BSSID in the IE is not found in the
current scan results, and the open network also has no OWE BSS
in it. This will be processed after scan results.
3. The BSS is not open and contains the OWE IE. This BSS will
automatically get added to the network object and nothing else
needs to be done.
After the scan results each network is checked for any non-paired
open BSS's. If found a scan is started for these BSS's per-network.
Once these scan results come in the network is notified.
From here network.c can detect that this is an OWE transition
network and connect to the OWE BSS rather than the open one.
Specifically OWE networks with multiple open/hidden BSS's are troublesome
to scan for with the current APIs. The scan parameters are limited to a
single SSID and even if that was changed we have the potential of hitting
the max SSID's per scan limit. In all, it puts the burden onto the caller
to sort out the SSIDs/frequencies to scan for.
Rather than requiring station to handle this a new scan API was added,
scan_owe_hidden() which takes a list of open BSS's and will automatically
scan for the SSIDs in the OWE transition IE for each.
It is slightly optimized to first check if all the hidden SSID's are the
same. This is the most likely case (e.g. single pair or single network)
and a single scan command can be used. Otherwise individual scan commands
are queued for each SSID/frequency combo.
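The optimization can be sketched roughly as follows. OpenBss and
plan_owe_hidden_scans() are hypothetical names used only for this
illustration, and the real API works on scan_bss objects rather than
tuples:

```python
from collections import namedtuple

# Hypothetical stand-in for an open BSS carrying an OWE transition IE
OpenBss = namedtuple("OpenBss", ["owe_ssid", "frequency"])


def plan_owe_hidden_scans(open_bss_list):
    """Return a list of (ssid, frequencies) scan requests."""
    ssids = {bss.owe_ssid for bss in open_bss_list}

    if len(ssids) == 1:
        # Common case: every IE points at the same hidden SSID, so a
        # single scan covering all the frequencies is enough.
        freqs = sorted({bss.frequency for bss in open_bss_list})
        return [(next(iter(ssids)), freqs)]

    # Otherwise queue one scan per SSID/frequency combination
    return [(bss.owe_ssid, [bss.frequency]) for bss in open_bss_list]
```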
handshake_util_ap_ie_matches() is used to make sure that the RSN element
received from the Authenticator during handshake / association response
is the same as the one advertised in Beacon/Probe Response frames. This
utility tries to bitwise compare the element first, and only if that
fails, compares RSN members individually.
For FT, bitwise comparison will always fail since the PMKID has to be
included by the Authenticator in any RSN IEs included in Authenticate
& Association Response frames.
Perform the bitwise comparison as an optimization only during processing
of eapol message 3/4. Also keep the parsed rsn information for future
use and to possibly avoid re-parsing it during later checks.
DBus scan is performed in several subsets. In certain corner-case
circumstances it would be possible for autoconnect to run after each
subset scan. Instead, trigger autoconnect only after the dbus scan
completes.
This also works around a condition where ANQP results could trigger
autoconnect too early.
Several invocations of station_set_scan_results() base the
'add_to_autoconnect' parameter on station_is_autoconnecting(). Simplify
the code by having station_set_scan_results() invoke that itself.
'add_to_autoconnect' now becomes an 'intent' parameter, specifying
whether autoconnect path should be invoked as a result of these scan
results or not when station is in an appropriate state. Rename
'add_to_autoconnect' parameter to make this clearer.
If the frequency of the bss is not in the list of frequencies for the
current scan, then this is a cached bss. It was likely already
processed for ANQP before, so skip it.
IWD has restricted SSIDs to only utf8 so they can be displayed but
with the addition of OWE transition networks this is an unneeded
restriction (for these networks). The SSID of an OWE transition
network is never displayed to the user so limiting to utf8 isn't
required.
Allow non-utf8 SSIDs to be scanned for by including the length in
the scan parameters and not relying on strlen().
This is a parser for the WFA OWE Transition element. For now the
optional band/channel bytes will not be parsed as hostapd does not
yet support these and would also require the 802.11 appendix E-1
to be added to IWD. Because of this OWE Transition networks are
assumed to be on the same channel as their open counterpart.
in6_addr.__in6_u.__u6_addr8 is glibc-specific and named differently in
the headers shipped with musl libc for example. The POSIX compliant and
universal way of accessing it is in6_addr.s6_addr.
This was actually broken if triggered because __network_connect
checks if network->connect_after_owe_hidden is set and returns
already in progress. We want to keep this behavior though for
obvious reasons.
To fix this station_connect_network can be called directly which
bypasses the check. This is essentially how ANQP avoids this
problem as well.
Similar to ANQP a connect call could come in while station is
scanning for OWE hidden networks. This is supported in the same
manner by saving away the dbus message and resuming the connection
after the hidden OWE scan.
With the addition of OWE transition, network needs to be notified
of the hidden OWE scan which is quite similar to how it is notified
of ANQP. The ANQP event watch can be made generic and reused to
allow other events besides ANQP.
This is being added to support OWE transition mode. For these
type of networks the OWE BSS may contain a different SSID than
that of the network, but the WFA spec requires this be hidden
from the user. This means we need to set the handshake SSID based
on the BSS rather than the network object.
Refactor netconfig_set_dns to be a bit easier to follow and remove use
of macros. Also bail out early if no DNS addresses are provided instead
of building an empty DNS list since resolve_set_dns() simply returns if
a NULL or empty DNS list is provided.
If set, a rule will start matching 'MatchBytes' some number of bytes
into the frame (MatchBytesOffset). This is useful since header
information, addresses, and sequence numbers may be unpredictable
between test runs.
To avoid unintended matches the Prefix property is left unchanged
and will match starting at the beginning of the frame.
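A rough sketch of how the two properties differ when matching a frame.
The function and its parameters are hypothetical and only model the
behavior described above, not the hwsim implementation:

```python
def rule_matches(frame, prefix=b"", match_bytes=b"", match_offset=0):
    """Return True if the frame satisfies both byte matchers."""
    # Prefix always matches from the very beginning of the frame
    if prefix and not frame.startswith(prefix):
        return False

    # MatchBytes is compared starting match_offset bytes into the frame
    if match_bytes:
        end = match_offset + len(match_bytes)
        if frame[match_offset:end] != match_bytes:
            return False

    return True
```

With an offset the rule can skip over the header, addresses and
sequence number and match only on the stable part of the payload.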
Since IWD tries group 20 first all other OWE tests are actually
triggering group negotiation where this test is not. Since this
code is exercised this test can be removed completely, as well
as the additional radio/network.
Kernel keeps transmitting authentication frames until told to stop or an
authentication frame the kernel considers 'final' is received. Detect
cases where the kernel would keep retransmitting, and if auth_proto
encounters a fatal protocol error, prevent these retransmissions from
occurring by sending a Deauthenticate command to the kernel.
Additionally, treat -EBADMSG/-ENOMSG return from auth_proto specially.
These error codes are meant to convey that a frame should be silently
dropped and retransmissions should continue.
This test simulates the scenario where IWD's commit is not acked which
exposes a hostapd bug that ultimately fails the connection. This behavior
can be seen by reverting the commit which works around this issue:
"sae: don't send commit in confirmed state"
With the above patch applied this test should pass.
Note: The existing timeout test was reused as it was not of much use
anyway. All it did was block auth/assoc frames and expect a failure
which didn't exercise any SAE logic.
This works around a hostapd bug (described more in the TODO comment)
which is exposed because of the kernel's overly aggressive re-transmit
behavior on missed ACKs. Combined this results in a failed connection if the
initial commit is not acked. This behavior has been identified in
consumer access points and likely won't ever be patched for older
devices. Because of this IWD must work around the problem which can
be eliminated by not sending out this commit message.
This bug was reported to the hostapd ML:
https://lists.infradead.org/pipermail/hostap/2021-September/039842.html
This change should not cause any compatibility problems to non-hostapd
access points and is identical to how wpa_supplicant treats this
scenario.
If a commit is received while in an accepted state the spec states
the scalar should be checked against the previous commit and if
equal the message should be silently dropped.
The hwsim rules did not treat frames and ACKs any differently which
can mislead the developer especially when setting a rule prefix.
If a prefix was used the frame ACK was actually being matched against
the original frame payload which seems wrong because the ACK is not
the original frame.
Though strange, matching the frame prefix on an ACK has its place if
the developer wants to block just the ACK rather than the frame so
to make this case more clear 'DropAck' was added as a rule property.
And only if this is true will an ACK be checked and potentially
dropped.
To maintain the current hwsim behavior DropAck will default to true.
This integer property can be set to only match a rule a number of
times rather than all packets. This is useful for testing behavior
of a single dropped frame or ack. Once the rule has been matched
'MatchTimes' the rules will no longer be applied (unless set again
to some integer greater than zero).
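The counting behavior can be modeled roughly like this. The Rule class
below is a hypothetical stand-in, and using None for "match every
frame" is an assumption of the sketch:

```python
class Rule:
    def __init__(self, match_times=None):
        # None means the rule applies to every matching frame
        # (an assumption made for this sketch).
        self.match_times = match_times

    def should_apply(self):
        """Consume one match and report whether the rule still applies."""
        if self.match_times is None:
            return True

        if self.match_times <= 0:
            # Exhausted until match_times is set above zero again
            return False

        self.match_times -= 1
        return True
```

Setting match_times back to a positive integer re-arms the rule, which
mirrors the "unless set again" behavior described above.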
Since Process.processes is a weak reference dictionary any process
put in this dict will disappear if all references are lost. This
is much better than keeping a list in the Namespace which will hold
the references forever until test-runner manually kills them all at
the end of the test. This does still need to be done for daemon
processes but everything else can just go away when it is no longer
needed.
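As an illustration of why this helps, a minimal sketch using
weakref.WeakValueDictionary. The Process class here is a stand-in, and
keying the dictionary by name is an assumption:

```python
import gc
import weakref


class Process:
    # Entries vanish once the last strong reference to a process is gone
    processes = weakref.WeakValueDictionary()

    def __init__(self, name):
        self.name = name
        Process.processes[name] = self


proc = Process("hostapd")
print("hostapd" in Process.processes)   # still referenced by 'proc'

del proc
gc.collect()                            # not needed on CPython, but harmless
print("hostapd" in Process.processes)   # entry was dropped automatically
```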
The test-runner logging is very basic and just dumps everything into files
per-test. This means any subtests are just appended to existing log files
which can be difficult to parse after the fact. This is especially hard
when IWD/Hostapd runs once for the entirety of the test (as opposed to
killing between tests).
This patch writes out a separator between each subtest in the form:
===== <file>:<function> =====
To do this all processes are now kept as weak references inside the
Process class itself. Process.write_separators() can be called which
will iterate through all running processes and write the provided
separator.
This also paves the way to remove the ctx.processes array which is more
trouble than it's worth due to reference issues.
Note: For tests which start IWD this will have no effect as the separator
is written prior to the test running. For these tests though, it is
much easier to read the log files because you can clearly see when
IWD starts and exits.
Processes which were not explicitly killed ended up staying around
forever because they internally held references to other objects
such as GLib IO watches or write FDs.
This shuffles some code so these objects get cleaned up both when
explicitly killed and after being waited for.
This was a placeholder at one point but modules grew to depend on it
being a string. Fix these dependencies and set the root namespace
name to None so there is no more special case needed to handle both
a named namespace and the original 'root' namespace.
In netconfig_load_settings apply the DNS overrides strings we've loaded
instead of leaking them.
Fixes: ad228461abbf ("netconfig: Move loading settings to new method, refactor")
With various versions of wpa_supplicant tested, after an IWD GO tears
the group down, the wpa_supplicant P2P client will not immediately
signal that the group has disappeared. It will first wait for the
lost beacon signal, wait some more and try reconnecting, all of which
takes 10s or a little longer. Possibly sending Deauthenticate frames
to clients first would improve this.
netdev now assumes the SSID was set in the handshake (normally via
network_handshake_setup) but WSC calls netdev_connect directly so
it also should set the SSID.
In order to support OWE in the CMD_CONNECT path the scan_bss parameter
needs to be removed since this is lost after netdev_connect returns.
Nearly everything needed is also stored in the handshake except the
privacy capability which is now being mirrored in the netdev object
itself.
Check whether verbose output is enabled for process name arg[0] before
prepending the "ip netns exec" part to arg since arg[0] is going to be
"ip" after that.
Use the MAC addresses for the gateways and DNS servers received in the
FILS IP Assignment IE together with the gateway IP and DNS server IP.
Commit the IP to MAC mappings directly to the ARP/NDP tables so that the
network stack can skip sending the corresponding queries over the air.
Send and receive the FILS IP Address Assignment IEs during association.
As implemented this would work independently of FILS although the only
AP software handling this mechanism without FILS is likely IWD itself.
No support is added for handling the IP assignment information sent from
the server after the initial Association Request/Response frames, i.e.
the information is only used if it is received directly in the
Association Response without the "response pending" bit, otherwise the
DHCP client will be started.
Add two methods that will allow station to implement FILS IP Address
Assignment, one method to decide whether to send the request during
association, and fill in the values to be used in the request IE, and
another to handle the response IE values received from the server and
apply them. The netconfig->rtm_protocol value used when the address is
assigned this way remains RTPROT_DHCP because from the user's point of
view this is automatic IP assignment by the server, a replacement for
DHCP.
Split loading settings out of netconfig_configure into a new method,
netconfig_load_settings. Make sure both consistently handle errors by
printing messages and informing the caller.
These modules only needed to be imported a single time for the entire
run of tests. This is significantly cheaper in terms of memory and
should prevent random OOM exceptions.
The Process class was doing quite a bit of what Popen already does like
storing the return code and process arguments. In addition the Process
class ended up storing a Popen object which was frequently accessed.
For both simplicity and memory savings have Process inherit Popen and
add the additional functionality test-runner needs like stdout
processing to output files and the console.
To do this Popen.wait() needed to be overridden to prevent blocking
as well as wait for the HUP signal so we are sure all the process
output was written. kill() was also overridden to perform cleanup.
The most intrusive change was removing wait as a kwarg, and instead
requiring the caller to call wait(). This doesn't change much in
terms of complexity to the caller, but simplifies the __init__
routine of Process.
Some convenient improvements:
- Separate multiple process instance output (Terminate: <args> will
be written to outfiles each time a process dies.)
- Append to outfile if the same process is started again
- Wait for HUP before returning from wait(). This allows any remaining
output to be written without the need to manually call process_io.
- Store ctx as a class variable so callers don't need to pass it in
(e.g. when using Process directly rather than start_process)
Setter which forces the use of group 19 rather than the group order
that ELL provides. Certain APs have been found to have buggy group
negotiation and only work if group 19 is tried first, and only. When
an AP like this is found (based on vendor OUI match) SAE will
use group 19 unconditionally, and fail if group 19 does not work.
Other groups could be tried upon failure but per the spec group 19
must be supported so there isn't much use in trying other, optional
groups.
mac80211_hwsim has a funny quirk with multiple addresses in
radios. Some operations require address index zero, some index
one. And these addresses (possibly a result of how test-runner
initializes radios) sometimes get mixed up. For example scan
results may show a BSS address as 02:00:00:00:00:00, while the
next test run shows 42:00:00:00:00:00.
Ultimately, sending out frames requires the first nibble of the
address to be 0x4 so to handle both variants of addresses described
above hwsim.py was updated to always bitwise OR the first byte
with 0x40.
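The address fixup amounts to something like the following sketch. The
helper name is hypothetical and the colon-separated string form of the
address is an assumption:

```python
def hwsim_tx_addr(addr):
    """Return addr with 0x40 OR'ed into the first octet."""
    octets = [int(octet, 16) for octet in addr.split(":")]
    octets[0] |= 0x40
    return ":".join("%02x" % octet for octet in octets)
```

Since the operation is a bitwise OR, an address which already starts
with 42 is left unchanged, so both variants end up identical.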
Handle the 802.11ai FILS IP Address Assignment IEs in Association
Request frames when netconfig is enabled. Only IPv4 is supported.
Like the P2P IP Allocation mechanism, since the payload format and logic
is independent from the rest of the FILS standard this is enabled
unconditionally for clients who want to use it even though we don't
actually do FILS in AP mode.
If netconfig is enabled tell the DHCP server to expire any leases owned
by the client that is disconnecting by using l_dhcp_server_expire_by_mac
to return the IPs to the IP pool. They're added to the expired list
so they'd only be used if there are no other addresses left in the pool
and can be reactivated if the client comes back before the address is
used by somebody else.
This should ensure that we're always able to offer an address to a new
client as long as there are fewer concurrent clients than addresses in
the configured subnet or IP range.
Use the struct handshake_state::support_ip_allocation field already
supported in eapol.c authenticator side to enable the P2P IP Allocation
mechanism in ap.c. Add the P2P_GROUP_CAP_IP_ALLOCATION bit in P2P group
capabilities to signal the feature is now supported.
There's no harm in enabling this feature in every AP (not just P2P Group
Owner) but the clients won't know whether we support it other than
through that P2P-specific group capability bit.
Add a handshake event for use by the AP side for mechanisms that
allocate client IPs during the handshake: P2P address allocation and
FILS address assignment. This is emitted only when EAPOL or the
auth_proto is actually about to send the network configuration data to
the client so that ap.c can skip allocating a DHCP lease altogether if
the client doesn't send the required KDE or IE.
This test was failing due to a change introduced in commit
5c9de0cf23f9 which changed handshake state storage of IPs from host
order to network byte order. Update the test to set IPs in network
byte-order.
Fixes: 5c9de0cf23f9 ("eapol: Store IP address in network byte order")
Some drivers ignore the initial IF_OPER_UP setting that was sent during
netdev_connect_ok(). Attempt to work around this by parsing New Link
events. If OperState setting is still not correct in a subsequent event,
retry setting OperState to IF_OPER_UP.
The idea of this test is valid but it is extremely timing dependent
which simply isn't testable on all machines. Remove this test, at
least until this can be tested reliably.
This was initially put in to solve an issue that was specific to
mac80211_hwsim where the connect callback would get queued and
delayed until after the connect event. This caused IWD to get very
confused.
Later it was found that "real" drivers can sometimes do this so
some code was added to IWD core to handle it.
Now there isn't much point to delay all frames unless a rule specifies
so change the behavior back to sending out frames immediately.
The hwsim Rule API was structured as properties so once a rule is
created it automatically starts being applied to frames. This happens
before anything has time to actually define the rule (source, destination
etc). This leads to every single frame being matched to the rule until
these other properties are added, which can result in unexpected behavior.
To fix this an "Enabled" property has been added and the rule will not
be applied until this is true.
The hotspot case can actually result in network being NULL which
ends up crashing when accessing "->secrets". In addition any
secrets on this network were never removed for hotspot networks
since everything happened in network_unset_hotspot.
testHotspot suffered from improper cleanup and if a single test failed
all subsequent tests would fail due to IWD still running since IWD()
was never cleaned up.
In addition the PSK agent and hwsim rules are now set onto the cls
object and removed in tearDownClass()
There are really no cases where a test wants to remove a single
rule. Most loop through and remove rules individually so this
is being added as a convenience.
Certain autotests coupled with slower test machines can result in lost
beacons and "Network not found" errors. In an attempt to help with this
the test can just rescan (30 seconds max) until the network is found.
Remove EAP-SIM from the generic PEAP test case since skipping
(if ofono is not on system) would skip the entire test rather
than just the EAP-SIM portion.
This tests all EAP methods in their standard configuration. Any
corner cases requiring changes to main.conf or other hostapd
options are not included and will be left as stand alone tests.
This was done because nearly all EAP tests are identical except
the IWD provisioning file and hostapd EAP users file. The IWD
provisioning file can be swapped out as needed for each individual
test without actually restarting IWD. And the EAP users file can
simply be written to include every possible EAP method that
is supported.
The -S/--sub-tests option allows the user to specify a test file
from inside an autotest. Inside this file there may also be many
test functions. This option is being extended to allow running
a single test function inside a test file. For example:
* Runs all test functions inside connection_test.py *
./test-runner -A some_test -S connection_test
* Runs only connection_test.py test_connect_success() *
./test-runner -A some_test -S connection_test.test_connect_success
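Splitting the argument could look roughly like this. The helper is
hypothetical and assumes, as in the examples above, that the file name
is given without its .py extension:

```python
def parse_sub_test(arg):
    """Split 'file[.function]' into (file, function or None)."""
    file_name, _, func = arg.partition(".")
    return file_name, func or None
```

A function of None would then mean "run every test function in the
file", which keeps the existing behavior for arguments without a dot.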
The destructor was trying to do more than the scope of a destructor
by trying to handle this single case of hostapd being restarted.
Instead we can simply pass a keyword argument 'reinit' to the
constructor to tell it to reinitialize everything. And as for killing
hostapd this can be done in ungraceful_restart itself rather than
trying to handle it in the destructor.
There was a race condition here where the GLib timeout could have
fired but the test function returned successfully prior to the
end of the while loop. This would end up causing source_remove to
print a warning that the source did not exist.
Instead check if the timeout fired prior to removing it.
This (hopefully) will make this test pass better on slower machines.
In addition the mechanism of copying over separate main.conf files
was changed (rather than echo'ing the option into /tmp/main.conf)
This addresses the TODO where HostapdCLI was creating separate
objects each time HostapdCLI was called. This was worked around
by manually setting the important members but instead the class
can be re-worked to act as somewhat of a singleton, per-config
at least.
If there is no HostapdCLI instance for a given config one is
created and initialized. Subsequent HostapdCLI calls (for the
same config) will be returned the same object rather than a
new one.
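One way to get this per-config behavior is to override __new__, as in
the sketch below. This is a stand-in class showing the pattern only and
is not the actual HostapdCLI code:

```python
class HostapdCLI:
    # One shared instance per config file name
    _instances = {}

    def __new__(cls, config=None):
        if config not in cls._instances:
            obj = super().__new__(cls)
            obj._initialized = False
            cls._instances[config] = obj

        return cls._instances[config]

    def __init__(self, config=None):
        # __init__ runs on every call, so only initialize once
        if self._initialized:
            return

        self.config = config
        self._initialized = True
```

Callers keep writing HostapdCLI(config="...") as before and simply get
the existing object back when one was already created for that config.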
Tests that called skipTest would result in an exception which would
halt execution as it was uncaught. In addition this wouldn't result
in a skipped test.
Now the actual test run is surrounded in a try/except block, skipped
exceptions are handled specifically, and a stack trace is printed if
some other exception occurs.
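The handling described above follows roughly this shape. run_one() and
its return strings are hypothetical; only unittest.SkipTest and
traceback.print_exc() are real library names:

```python
import traceback
import unittest


def run_one(test_func):
    """Run a single test callable and classify the outcome."""
    try:
        test_func()
    except unittest.SkipTest:
        # Raised by skipTest(); record a skip rather than aborting
        return "skipped"
    except Exception:
        # Anything else is a failure, print the stack trace for it
        traceback.print_exc()
        return "failed"

    return "passed"
```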
dmesg was being called at the very end of testing and dumped into
a log file. If many tests were run this could take quite a long
time and was timing out the default process wait. Instead --follow
can be used (basically like 'tail') which prints messages as they
come and avoids the time consuming full dump at the end.
The hotspot ANQP delay test was setting a global delay on all
packets which had some unintended consequences. At the time this
was the only way of simulating the test scenario but now hwsim
supports prefix matching so only the ANQP request/response will
be delayed.
This test was accessing the subprocess object and calling terminate
which ends up causing issues with test-runner's own process cleanup.
Instead kill() should be used.
Hostapd sometimes has trouble with specifying additional BSSs in
a single config file, at least in the test-runner environment.
Since all the BSS's specified were identical instead the test was
reworked to only have a single BSS and each subtest can connect
in its own unique way.
This test took quite a while to execute (~2 minutes on my machine)
because there was simply no other way to test this scenario but
waiting. Now the no-roam-candidates condition can be waited for
rather than just sleeping for 20 seconds. Additionally the default
RoamRetryInterval was being used which is 60 seconds. Instead
main.conf can set this to 5 seconds and really cut down on the time
to wait.
Part of a comment was also removed due to being incorrect. Even
with neighbor reports IWD still must scan, it's just that the
scan is more limited and, in theory, faster.
This is meant to be used as a generic notification to autotests. For
now 'no-roam-candidates' is the only event being sent. The idea
is to extend these events to signal conditions that are otherwise
undiscoverable in autotesting.
There was a common bit of code all over test-runner and utilities
which would wait for 'something' in a loop. At best these loops
would do the right thing and use the GLib.iteration call as to not
block the main loop, and at worst would not use it and just busy
wait.
Namespace.non_block_wait unifies all these into a single API to
a) do the wait correctly and b) prevent duplicate code.
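A minimal sketch of such a wait helper. The signature is an assumption,
and the 'iterate' callback stands in for running one iteration of the
GLib main loop so the example stays free of GLib:

```python
import time


def non_block_wait(func, timeout, *args,
                   iterate=lambda: time.sleep(0.01), exception=True):
    """Poll func(*args) until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout

    while True:
        result = func(*args)
        if result:
            return result

        if time.monotonic() >= deadline:
            if exception:
                raise TimeoutError("timed out waiting for condition")
            return None

        # Let the main loop make progress instead of busy waiting
        iterate()
```

Callers then only supply the condition and a timeout rather than
open-coding their own loop each time.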
Replace instances of the ap_del_station() +
ap_sta_free()/ap_remove_sta() with calls to ap_station_disconnect to
make sure we consistently remove the station from the ap->sta_states
queue before using ap_del_station(). ap_del_station() may generate an
event to the ap.h API user (e.g. P2P) and this may end up tearing down
the AP completely.
For that scenario we also don't want ap_sta_free() to access sta->ap so
we make sure ap_del_station() performs these cleanup steps so that
ap_sta_free() has nothing to do that accesses sta->ap.
client_frame is not valid for a beacon frame as beacons are not sent in
response to another frame. Move the access to client_frame->address_2
to the conditional blocks for Probe Response and Association Response
frames.
get_ordered_network() now scans automatically and has been updated
to use the StationDebug.Scan() API rather than doing a full
dbus scan (unless full_scan = True). The frequencies to be scanned
are picked automatically based on the current hostapd status
(hidden behind ctx.hostapd.get_frequency()).
This is to support the autotesting framework by allowing a smaller
scan subset. This will cut down on the amount of time spent scanning
via normal DBus scans (where the entire spectrum is scanned).
While losing the convenience of unittest this patch breaks out
each individual test function in order to run it manually and
get results. This vastly improves the user experience by seeing
which test file and function is being executed rather than simply
seeing "PASSED" for the entire test set.
In addition exceptions/failures are printed out as they happen
rather than at the end.
This changes all tests to use the default get_ordered_network behavior
rather than some custom or incorrect logic. Any use of
scan_if_needed=True has been removed since this is now the default.
Also any explicit scanning has been removed for tests which do not
require it (where the default behavior is good enough).
With the addition of connect_bssid/roam very few tests actually
require hwsim. Since hwsim can lead to problems with scan results
it's best to have it off by default and have each test that needs
it explicitly turn it on.
Tests which previously turned it off have had that option removed.
Tests that do require hwsim still are vulnerable to scan result
problems, so for these tests beacon_int was added to the hostapd
config which seems to help with reliability somewhat.
There is a common block of code in nearly every test which is incorrect,
most likely a copy-paste from long ago. It goes something like:
    wd.wait_for_object_condition(device, 'not obj.scanning')
    device.scan()
    wd.wait_for_object_condition(device, 'not obj.scanning')
    network = device.get_ordered_network("ssid")
The problem here is that sometimes the scanning property does not get
updated fast enough before device.scan() returns, meaning get_ordered_network
comes up with nothing. Some tests pass scan_if_needed=True which 'fixes'
this but ends up re-scanning after the original scan finishes.
To put this to rest scan_if_needed is now defaulted to True, and no
explicit scan should be needed.
Most autotests do not want autoconnect behavior so it is being
turned off by default. There are a few tests where it is needed
and in these few cases the test can enable autoconnect through
the new station debug property.
This adds the property "AutoConnect" to the station debug interface
which can be read/written to disable or enable autoconnect globally.
As one would expect this property is only going to be used for testing
hence why it was put on the debug interface. Most tests disable
autoconnect (or they should) because it leads to unexpected connections.
In addition the send_bss_transition call was updated to only send a
single BSS. By sending two BSS's IWD is left to pick whichever one
it wants which makes the test behavior undefined.
This will use the Roam() developer method to force a roam to
a certain BSS. This is particularly useful for any test requiring
roams that are not testing IWD's BSS selection logic. Rather than
creating hwsim rules, setting low RSSI values, and waiting for the
roam logic/scan to happen Roam() can be used to force the roam
logic immediately.
Several tests test for connectivity with the expectation that it
will fail. This ends up taking 30+ seconds because testutil retries
3 times, each with a 10 second timeout. By passing expect_fail=True
this lowers the timeout to zero, and skips any retries.
If a test has no hw.conf file test-runner was fully exiting and not
running any additional tests. This shouldn't happen in practice
since all upstreamed tests should run, but if any locally created
tests existed like this, it would cause the entire test run to exit
early.
Instead raise an exception which bails out of only that test, and
allows the rest to continue.
This method will initiate a connection to a specific BSS rather
than relying on a network based connection (where the user has
no control over which specific BSS is selected).
The only point of failure in netdev_connect_common was setting
up the handshake type. Moving this outside of netdev_connect_common
makes the code flow much better in netdev_{connect,reassociate} as
nothing needs to be reset upon failure.
Utilize 'storage_is_file' when readdir returns DT_UNKNOWN to ensure
features like autoconnect work on filesystems that don't return a d_type
(eg. XFS).
Add a function 'storage_is_file' which will use stat to verify a
file's existence given a path relative to the storage directory.
Not all filesystems provide a file type via readdir's d_type.
XFS is a notable system with optional d_type support.
When d_type is not supported stat must be used as a fallback.
If a stat fallback is not provided iwd will fail to load state files.
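The fallback logic can be illustrated with a small sketch. This is Python
rather than the C used in iwd, and the helper names are hypothetical; the
DT_* constants mirror the Linux <dirent.h> values and are passed in by hand
since Python does not expose d_type directly.

```python
import os
import stat

# d_type values as defined in <dirent.h> on Linux.
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8


def is_regular_file(storage_dir, name):
    """stat() the path relative to storage_dir and check for a regular file."""
    try:
        st = os.stat(os.path.join(storage_dir, name))
    except OSError:
        return False

    return stat.S_ISREG(st.st_mode)


def entry_is_file(storage_dir, name, d_type):
    """Trust d_type when the filesystem provides it, otherwise fall back to stat."""
    if d_type == DT_REG:
        return True

    if d_type == DT_UNKNOWN:
        # Filesystem gave no type information (e.g. XFS without d_type support).
        return is_regular_file(storage_dir, name)

    return False
```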
The preparing_roam flag is expected to be set by a few roam
routines and normally this is done prior to the roam scan.
The Roam() developer option was not doing this and would
cause failed roams in some cases.
This adds support in netdev_reassociate for all the auth
protocols (SAE/FILS/OWE) by moving the bulk of netdev_connect
into netdev_connect_common. In addition PREV_BSSID is set
in the associate message if 'in_reassoc' is true.
Some connections, like Hotspot, require additional IEs to be used during
the Association. These are now passed as 'extra_ies' when invoking
netdev_connect, however they are also needed during ReAssociation and FT
to such APs.
Additionally, it may be that Hotspot-enabled APs will start utilizing
FILS or SAE. In these cases the extra_ies need to be accounted for
somehow, either by making a copy in handshake_state, netdev, or the
auth_proto itself. Similarly, P2P which heavily uses vendor IEs can be
used over SAE in the future.
Since a copy of these IEs is needed, might as well store them in
handshake_state itself for easy book-keeping by network/station.
RM Enabled Capabilities and Extended Capabilities IEs were correctly
being sent when using CMD_CONNECT for initial connections and
re-associations. However, for SoftMac SAE, FT, FILS and OWE connections,
these additional IEs were not added properly during the Associate step.
If the driver supports RRM, then we might as well always send the RM
Enabled Capabilities IE (and use the USE_RRM flag). 802.11-2020
suggests that this IE can be sent whenever
dot11RadioMeasurementActivated is true, and this setting is independent
of whether the peer supports RRM. There's nothing to indicate that an
STA should not send these IEs if the AP is not RRM enabled.
While we correctly emit a NETDEV_EVENT_CHANNEL_SWITCHED event from
netdev for other modules to respond to, we fail to actually update the
frequency of the netdev object in question. Since the netdev frequency
is used elsewhere (e.g. to send action frames), it needs updating too.
Fixes: 5eb0b7ca8e04 ("netdev: add a channel switch event")
This variable ended up being used only on the fast-transition path. On
the re-associate path it was never used, but memcpy-ied nevertheless.
Since its only use is by auth_proto based protocols, move it to the
auth_proto object directly.
Due to how prepare_ft works (we need prev_bssid from the handshake, but
the handshake is reset), have netdev_ft_* methods take an 'orig_bss'
parameter, similar to netdev_reassociate.
IE elements in various management frames are ordered. This ordering is
outlined in 802.11, Section 9.3.3. The ordering is actually different
depending on the frame type. Instead of trying to implement the order
manually, add a utility function that will sort the IEs in the order
expected by the particular management frame type.
Since we already have IE ordering look up tables in the various
management frame type validation functions, move them to global level
and re-use these lookup tables for the sorting utility.
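A minimal sketch of table-driven IE sorting is shown below, in Python for
brevity. The order table is an abbreviated, illustrative subset for an
Association Request and should be checked against the standard; the function
name and the (tag, payload) tuple representation are assumptions, not the
iwd API.

```python
# Abbreviated, illustrative ordering for an Association Request frame:
# SSID, Supported Rates, Extended Supported Rates, RSN,
# RM Enabled Capabilities, Mobility Domain, HT Capabilities,
# Extended Capabilities, Vendor Specific.
ASSOC_REQ_ORDER = [0, 1, 50, 48, 70, 54, 45, 127, 221]


def sort_ies(ies, order):
    """Return the (tag, payload) tuples sorted by their position in `order`.

    Tags missing from the table are placed last. sorted() is stable, so
    multiple IEs with the same tag keep their relative order.
    """
    rank = {tag: index for index, tag in enumerate(order)}

    return sorted(ies, key=lambda ie: rank.get(ie[0], len(order)))


unsorted = [(221, b"vendor"), (48, b"rsn"), (0, b"ssid"), (127, b"ext"), (1, b"rates")]
in_order = sort_ies(unsorted, ASSOC_REQ_ORDER)
```

Keeping one table per frame type means the same data can drive both
validation and sorting.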
This refactors some code to eliminate getting the ERP entry twice
by simply returning it from network_has_erp_identity (now renamed
to network_get_erp_cache). In addition this code was moved into
station_build_handshake_rsn and properly cleaned up in case there
was an error or if a FILS AKM was not chosen.
The authorized macs pointer was being set to either the wsc_beacon
or wsc_probe_response structures, which were initialized out of
scope to where 'amacs' was being used. This resulted in an out of
scope read, caught by address sanitizers.
One of these message buffers was overflowing due to padding not
being taken into account (caught by sanitizers). Wrapped the length
of all message buffers with EAP_SIM_ROUND to account for any
padding that attributes may add.
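For illustration only, and assuming EAP_SIM_ROUND rounds a length up to the
next multiple of 4 (EAP-SIM attributes are sized in 4-byte units), the
arithmetic looks like this in Python. The function name is hypothetical.

```python
def round_up_to_4(length):
    """Round `length` up to the next multiple of 4 bytes."""
    return (length + 3) & ~3


# A 17 byte payload needs a 20 byte buffer once padding is included.
padded = round_up_to_4(17)
```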
The Process class requires the ability to write out any processes
output to stdout, logging, or an explicit file, as well as store
it inside python for processing by test utilities. To accomplish
this each process was given a temporary file to write to, and that
file had an IO watch set on it. Any data that was written was then
read, and re-written out to where it needed to go. This ended up
being very buggy and quite complex due to needing to mess with
read/write pointers inside the file.
Popen already creates pipes to stdout if told, and they are accessible
via p.stdout. It's then as simple as setting an IO watch on that
pipe and keeping the same code for reading out new data and writing
it to any files we want. This greatly reduces the complexity.
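The pipe-based approach can be sketched roughly as follows. This is not the
test-runner Process class: it uses the standard selectors module in place of
a GLib IO watch, assumes a Unix-like system where pipes can be polled, and
the run_and_capture name and 'sinks' parameter are made up for the example.

```python
import io
import os
import selectors
import subprocess
import sys


def run_and_capture(args, sinks=()):
    """Run `args`, copy stdout to every sink as it arrives, return all output."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    chunks = []

    with selectors.DefaultSelector() as sel:
        sel.register(fd, selectors.EVENT_READ)

        while True:
            sel.select()            # block until the pipe is readable
            data = os.read(fd, 4096)
            if not data:            # EOF: the child closed its stdout
                break

            chunks.append(data)     # keep a copy for test utilities
            for sink in sinks:      # and forward to log files, etc.
                sink.write(data)

    proc.stdout.close()
    proc.wait()

    return b"".join(chunks)


log = io.BytesIO()
output = run_and_capture([sys.executable, "-c", "print('hello')"], sinks=[log])
```

No temporary file or read/write pointer bookkeeping is needed since the
pipe only ever yields new data.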
After some code changes the FT-FILS AKM was no longer selectable
inside network_can_connect_bss. This normally shouldn't matter
since station ends up selecting the AKM explicitly, including
passing the fils_hint, but since the autotests only included
FT-FILS AKMs this caused the transition to fail with no available
BSS's.
To fix this the standard 8021x AKM was added to the hostapd
configs. This allows these BSS's to be selected when attempting
to roam, but since FT-FILS is the only other AKM it will be used
for the actual transition.
testScan was creating 10 separate hidden networks which
sometimes bogged down hostapd to the point that it would
not start up in time before test-runner's timeouts fired.
This appeared to be due to hostapd needing to create 10
separate interfaces which would sometimes fail with -ENFILE.
The test itself only needed two separate networks, so instead
the additional 8 can be completely removed.
Occasionally python will fatally terminate trying to load a test
using importlib with an out of memory exception. Increasing RAM
allows reliable execution of all tests.
When logging is enabled TLS debugging is turned on which creates
a PEM file during runtime. There is no way for IWD itself to clean
this up since it's meant to be there for debugging.
The network_config was not being copied to network_info when
updated. This caused any new settings to be lost if the network
configuration file was updated during runtime.
The RoamThreshold5G was never honored because it was being
set prior to any connections. This caused the logic inside
netdev_cqm_rssi_update to always choose the 2GHz threshold
(RoamThreshold) due to netdev->frequency being zero at this time.
Instead call netdev_cqm_rssi_update in all connect/transition
calls after netdev->frequency is updated. This will allow both
the 2G and 5G thresholds to be used depending on what frequency
the new BSS is.
The call to netdev_cqm_rssi_update in netdev_setup_interface
was also removed since it serves no purpose, at least now
that there are two thresholds to consider.
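To show why the ordering matters, here is a small Python sketch of the
selection logic. The 4000 MHz cutoff and the default threshold values are
assumptions for illustration, and the function name is hypothetical.

```python
def select_roam_threshold(frequency_mhz, threshold_2g=-70, threshold_5g=-76):
    """Pick the RSSI roam threshold based on the current frequency.

    The cutoff and default values here are illustrative assumptions.
    """
    if frequency_mhz > 4000:
        return threshold_5g

    return threshold_2g


# Before a connection the frequency is still zero, so the 2GHz value is
# always chosen regardless of which band is used later.
before_connect = select_roam_threshold(0)
after_connect_5g = select_roam_threshold(5180)
```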
Under certain conditions, access points with very low signal could be
detected. This signal is too low to estimate a data rate and causes
this L_WARN to fire. Fix this by returning a -ENETUNREACH error code in
case the signal is too low for any of the supported rates.
The scan ranking logic was previously changed to be based off a
theoretical calculated data rate rather than signal strength.
For HT/VHT networks there are many data points that can be used
for this calculation, but non HT/VHT networks are estimated based
on a simple table mapping signal strengths to data rates.
This table starts at a signal strength of -65 dBm and decreases from
there, meaning any signal strengths greater than -65 dBm will end up
getting the same ranking. This poses a problem for 3/4 blacklisting
tests as they set signal strengths ranging from -20 to -40 dBm.
IWD will then autoconnect to whatever network popped up first, which
may not be the expected network.
To fix this the signal strengths were changed to much lower values
which ensures IWD picks the expected network.
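The plateau can be demonstrated with a toy version of such a table. The
entries below are hypothetical and only chosen so that the table starts at
-65 dBm as described; they are not copied from the IWD source.

```python
# (minimum signal in dBm, estimated rate in Mbps), strongest first.
# Hypothetical values for illustration.
RATE_TABLE = [
    (-65, 54.0),
    (-66, 48.0),
    (-70, 36.0),
    (-74, 24.0),
    (-77, 18.0),
    (-79, 12.0),
    (-81, 9.0),
    (-82, 6.0),
]


def estimate_rate_mbps(signal_dbm):
    """Return the rate of the first entry the signal satisfies, else None."""
    for min_signal, rate in RATE_TABLE:
        if signal_dbm >= min_signal:
            return rate

    return None  # signal too low for any supported rate


# Both signals are above the top of the table, so they rank identically.
strong = estimate_rate_mbps(-20)
less_strong = estimate_rate_mbps(-40)
```

Any two networks stronger than the first entry map to the same rate, which
is why the tests needed weaker, more widely spaced signal values.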
Newer QEMU versions warn that msize is set too low and may result
in poor IO performance. The default is 8KiB which QEMU claims is
too low. Explicitly setting to 10KiB removes the warning:
qemu-system-x86_64: warning: 9p: degraded performance: a
reasonable high msize should be chosen on client/guest side
(chosen msize is <= 8192).
See https://wiki.qemu.org/Documentation/9psetup#msize for details.
Transition Disable indications and information stored in the network
profile need to be enforced. Since Transition Disable information is
now stored inside the network object, add a new method
'network_can_connect_bss' that will take this information into account.
wiphy_can_connect method is thus deprecated and removed.
Transition Disable can also result in certain AKMs and pairwise ciphers
being disabled, so wiphy_select_akm method's signature is changed and
takes the (possibly overridden) ie_rsn_info as input.
This indication can come in via EAPoL message 3 or during
FILS Association. It carries information as to whether certain
transition mode options should be disabled. See WPA3 Specification,
version 3 for more details.
Some network settings keys are set / parsed in multiple files. Add a
utility to parse all common network configuration settings in one place.
Also add some defines to make sure settings are always saved in the
expected group/key.
This returns the length of the actual contents, making the code a bit
easier to read and avoid the need to mask the KDE value which isn't
self-explanatory.
The SAE unit test was written when group 19 was preferred by default for
all SAE connections. However, we have now started to prefer higher
security groups. Trick the test into using group 19 by wrapping
l_ecc_supported_ike_groups implementation to return just curve 19 as a
supported curve.
ERROR: AddressSanitizer: global-buffer-overflow on address 0x000000512c08 at pc 0x00000041848d bp 0x7ffcdde71870 sp 0x7ffcdde71860
READ of size 8 at 0x000000512c08 thread T0
#0 0x41848c in print_attributes monitor/nlmon.c:6268
#1 0x42ac53 in print_message monitor/nlmon.c:6544
#2 0x438968 in nlmon_message monitor/nlmon.c:6698
#3 0x43d5e4 in nlmon_receive monitor/nlmon.c:7658
#4 0x4b3cd0 in io_callback ell/io.c:120
#5 0x4b085a in l_main_iterate ell/main.c:478
#6 0x4b0ee3 in l_main_run ell/main.c:525
#7 0x4b0ee3 in l_main_run ell/main.c:507
#8 0x4b13ac in l_main_run_with_signal ell/main.c:647
#9 0x4072fe in main monitor/main.c:811
Break up the SAE tests into two parts: testSAE and testSAE-AntiClogging.
testSAE is simplified to only use two radios and a single phy managed
by hostapd. hostapd configurations are changed via the new 'set_value'
method added to hostapd utils. This allows forcing hostapd to use a
particular sae group set, or force hostapd to use SAE H2E/Hunting and
Pecking Loop for key derivation. A separate test for IKE Group 20 is no
longer required and is folded into connection_test.py.
testSAE-AntiClogging is added with an environment for 5 radios instead
of 7, again with hostapd running on a single phy. 'sae_pwe' is used to
force hostapd to use SAE H2E or Hunting and Pecking for key derivation.
Both Anti-Clogging protocol variants are thus tested.
main.conf is added to both directories to force scan randomization off.
This seems to be required for hostapd to work properly on hwsim.
Instead of requiring each auth_proto to perform validation of the frames
received via rx_authenticate & rx_associate, have netdev itself perform
the mpdu validation. This is unlikely to happen anyway since the kernel
performs its own frame validation. Print a warning in case the
validation fails.
There's no reason why a change in groups would result in the
anti-clogging token becoming invalid. This might result in us needing
an extra round-trip if the peer is using countermeasures and our
requested group was deemed unsuitable.
We may receive multiple anti-clogging request messages. We memdup the
token every time, without checking whether memory for one has already
been allocated. Free the old token prior to allocating a new one.
The group was not checked at all. The specification doesn't
mention doing so specifically, but we are only likely to receive an Anti
Clogging Token Request message once we have sent our initial Commit. So
the group should be something we could have sent or might potentially be
able to use.
In case an exceptional condition occurs, handle this more consistently
by returning the following errors:
-ENOMSG -- If a message results in the retransmission timer t0 being
restarted without actually sending anything.
-EBADMSG -- If a received message is to be silently discarded without
affecting the t0 timer.
-ETIMEDOUT -- If SYNC_MAX has been exceeded
-EPROTO -- If a fatal protocol error occurred
Now that sae_verify_* methods no longer allow dropped frames through,
there's no reason to keep these checks. sae_process_commit and
sae_process_confirm will now always receive messages in their respective
state.
sae_verify_* functions were correctly marking frames to be dropped, but
were returning 0, which caused the to-be-dropped frames to be further
processed inside sae_rx_authenticate. Fix that by returning a proper
error.
Make sure to return -EAGAIN whenever a received frame from the peer
results in a retransmission. This also prevents the frame from being
mistakenly processed further in sae_rx_authenticate.
Do not try to transition to a new state from sae_send_commit /
sae_send_confirm since these methods can be called due to
retransmissions or other unexpected messages. Instead, transition to
the new state explicitly from sae_process_commit / sae_process_confirm.
SAE protocol is meant to authenticate peers simultaneously. Hence it
includes a tie-breaker provision in case both peers enter into the
Committed state and the Commit messages arrive at the respective peers
near simultaneously.
However, in the case of STA or Infrastructure mode, only one peer (STA)
would normally enter the Committed state (via Init) and the tie-breaker
provision is not needed. If this condition is detected, abort the
connection.
Also remove the unneeded group change check in process_commit.
sae_compute_pwe doesn't really depend on the state of sae_sm. Only the
curve to be used for the PWE calculation is needed. Rework the function
signature to reflect that and remove unneeded member of struct sae_sm.
ie_tlv_builder_init takes a size_t as input, yet for some reason
ie_tlv_builder_finalize takes an unsigned int argument as output. Fix
the latter to use size_t as well.
During processing of Connect events by netdev, some of these elements
might be updated even when already set. Instead of issuing
l_free/l_memdup each time, check and see whether the elements are
bitwise identical first.
Returns a template RSNX element that can be further modified by callers
to set any additional capabilities if required. wiphy will fill in
those capabilities that are driver / firmware dependent.
Most parameters set into the handshake object are actually known by the
network object itself and not station. This includes address
randomization settings, EAPoL settings, passphrase/psk/8021x settings,
etc. Since the number of these settings will only keep growing, move
the handshake setup into network itself. This also helps keep network
internals better encapsulated.
Refactor network_sync_psk to not require setting attributes into
multiple settings objects. This is in fact unnecessary as the parsed
security parameters are used everywhere else instead. Also make sure to
wipe the [Security] group first, in case any settings were invalid
during loading or otherwise invalidated.
Credentials obtained can now be either in passphrase or PSK form. Prior
to commit 7a9891dbef5b, passphrase credentials were always converted to
PSK form by invoking crypto_psk_from_passphrase. This was changed in
order to support WPA3 networks. Unfortunately the provisioning logic
was never properly updated. Fix that, and also try to not overwrite any
existing settings in case WSC is providing credentials for networks that
are already known.
Fixes: 7a9891dbef5b ("wsc: store plain text passphrase if available")
There will be additional security-related settings that will be
introduced for settings files. In particular, Hash-to-Curve PT
elements, Transition Disable settings and potentially others in the
future. Since PSK is now not the only element that would require
update, rename this function to better reflect this.
PRF+ from RFC 5295 is the more generic function in terms of which HKDF_Expand
is defined. Allow this function to take a vararg list of arguments to
be hashed (these are referred to as 'S' in the RFCs).
Implement hkdf_expand in terms of prf_plus and update all uses to the
new syntax.
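The relationship between the two can be sketched in Python with the standard
hmac module. This is an illustration of the construction only, with
HMAC-SHA256 assumed as the PRF; the function signatures are not those of the
C implementation.

```python
import hashlib
import hmac


def prf_plus(key, length, *args, digest=hashlib.sha256):
    """PRF+ as in RFC 5295: K1 = prf(K, S | 0x01), K2 = prf(K, K1 | S | 0x02), ...

    `args` are concatenated to form S.
    """
    s = b"".join(args)
    out = b""
    block = b""
    counter = 1

    while len(out) < length:
        if counter > 255:
            raise ValueError("requested output is too long")

        block = hmac.new(key, block + s + bytes([counter]), digest).digest()
        out += block
        counter += 1

    return out[:length]


def hkdf_expand(prk, info, length, digest=hashlib.sha256):
    """HKDF-Expand is PRF+ with S set to the single 'info' argument."""
    return prf_plus(prk, length, info, digest=digest)
```

Accepting S as a list of arguments lets callers pass several fields without
concatenating them into a temporary buffer first.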
This fixes an issue where the udp port was not being opened due to a
permission denied error. The result of this was the dhcp client would
fail to send the renewal request and so the dhcp lease would expire.
The addition of the CAP_NET_BIND_SERVICE capability allows the service
to open sockets in the restricted port range (<1024) which is required
for dhcp.
This is based on a previous patch by Roberto Santalla Fernández.
A new config is introduced into the network config file under IPv4
called SendHostname. If this is set to true then we add the hostname
into all DHCP requests. The default is false.
If the idea is that the interface should only be present when connected
then don't do this in the DISCONNECTING state as there are various
possible transitions from CONNECTED or ROAMING directly to DISCONNECTED.
The Changed() method did not actually return anything, and in fact the
no_reply flag for that message was set.
Similarly, the Release method does not expect a reply.
Don't require a gateway address from the settings file or from the DHCP
server when doing netconfig. Failing when the gateway address was
missing was breaking P2P but also small local networks.
Be paranoid and check that the prefix length in addresses from
used_addr4_list are not zero (they shouldn't be) and that address family
is AF_INET (it should be), mainly to quiet coverity warnings.
While there also fix one line's indentation.
At the end of ip_pool_select_addr4() we'd check if the selected address
is equal to the subnet address and increment it by 1 to produce a valid
host address for the AP. That check was always correct only with 24-bit
prefix, extend it to actually use the prefix-dependent mask instead of
0xff. Fixes a testAP failure triggered 50% of the time because the
netmask is 28 bits long there.
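The prefix-dependent check can be sketched in Python as below. The function
name is hypothetical and the addresses are example values; only the masking
arithmetic is the point.

```python
import ipaddress


def skip_subnet_address(addr, prefix_len):
    """If `addr` is the subnet address for `prefix_len`, return the next address."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    value = int(ipaddress.IPv4Address(addr))

    # Host bits all zero means this is the subnet address itself.
    if (value & ~mask & 0xFFFFFFFF) == 0:
        value += 1

    return str(ipaddress.IPv4Address(value))


# With a 28-bit prefix the low byte of the subnet address is not zero, so
# a check against 0xff would have missed it.
fixed = skip_subnet_address("192.168.80.16", 28)
```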
Don't signal the connected state until the client has obtained a DHCP
lease and we can set the ConnectedIP property. From now on that
property is always set when there's a connection.
p2p_parse_association_req() already extracts the P2P IE payload from the
IE sequence, there's no need to call ie_tlv_extract_p2p_payload before
it. Pass the IE sequence directly to p2p_parse_association_req().
Similarly to commit
27d302a0 ("band: Add a utility to estimate VHT rx data rate"), this
commit adds an RX data rate estimation utility for HT connections.
This function is meant to supersede a similar function in ie.c. The
current approach results in very optimistic data rate estimates since it
only takes into account the VHT/HT Capabilities IEs. It does not take
into account any local hardware limitations (such as no VHT/HT support),
limited RX MCS sets & number of spatial streams. It also does not take
into account that the AP might not be actually operating on higher
bandwidth channels.
This function is meant to address that by matching peer TX MCS sets with
the local hardware RX MCS set capability. It also takes into account
channel bandwidth capabilities of the local hardware, as well as whether
the AP is actually operating on a wider channel.
Move the band definition out of wiphy.c and into band.[ch]. This is
done to make certain utilities that depend on band information capable
of being tested from unit tests.
The band concept will most likely grow over time. For now, the only
user will be wiphy.c and unit tests, so the structures are kept public.
It is possible that the address set command succeeds just after a
netconfig object has been destroyed.
==6485== Invalid read of size 8
==6485== at 0x458A6D: netconfig_ipv4_routes_install (netconfig.c:629)
==6485== by 0x458D1C: netconfig_ipv4_ifaddr_add_cmd_cb (netconfig.c:689)
==6485== by 0x4A5E7B: process_message (netlink.c:181)
==6485== by 0x4A626A: can_read_data (netlink.c:289)
==6485== by 0x4A3E19: io_callback (io.c:120)
==6485== by 0x4A27B5: l_main_iterate (main.c:478)
==6485== by 0x4A28F6: l_main_run (main.c:525)
==6485== by 0x4A2C0E: l_main_run_with_signal (main.c:647)
==6485== by 0x404D27: main (main.c:542)
==6485== Address 0x4a47290 is 32 bytes inside a block of size 104 free'd
==6485== at 0x48399CB: free (vg_replace_malloc.c:538)
==6485== by 0x49998B: l_free (util.c:136)
==6485== by 0x457699: netconfig_free (netconfig.c:130)
==6485== by 0x45A038: netconfig_destroy (netconfig.c:1163)
==6485== by 0x41FD16: station_free (station.c:3613)
==6485== by 0x42020E: station_destroy_interface (station.c:3710)
==6485== by 0x4B990E: interface_instance_free (dbus-service.c:510)
==6485== by 0x4BC193: _dbus_object_tree_remove_interface (dbus-service.c:1694)
==6485== by 0x4BA22A: _dbus_object_tree_object_destroy (dbus-service.c:795)
==6485== by 0x4B078D: l_dbus_unregister_object (dbus.c:1537)
==6485== by 0x417ACB: device_netdev_notify (device.c:361)
==6485== by 0x4062B6: netdev_free (netdev.c:808)
==6485== Block was alloc'd at
==6485== at 0x483879F: malloc (vg_replace_malloc.c:307)
==6485== by 0x499857: l_malloc (util.c:62)
==6485== by 0x459DC0: netconfig_new (netconfig.c:1115)
==6485== by 0x41FC29: station_create (station.c:3592)
==6485== by 0x4207B3: station_netdev_watch (station.c:3864)
==6485== by 0x411A17: netdev_initial_up_cb (netdev.c:5588)
==6485== by 0x4A5E7B: process_message (netlink.c:181)
==6485== by 0x4A626A: can_read_data (netlink.c:289)
==6485== by 0x4A3E19: io_callback (io.c:120)
==6485== by 0x4A27B5: l_main_iterate (main.c:478)
==6485== by 0x4A28F6: l_main_run (main.c:525)
==6485== by 0x4A2C0E: l_main_run_with_signal (main.c:647)
==6485==
netdev_free relies on netdev->connected being set to detect whether a
connection is in progress. This variable is only set once the driver
has been connected however, so for situations where a CMD_CONNECT is
still 'in flight' or if the wiphy work is still pending, the ongoing
connection will not be canceled. Fix that by being more thorough when
trying to detect that a connection is in progress.
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
Terminate
src/netdev.c:netdev_free() Freeing netdev wlan0[9]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
Removing scan context for wdev c
src/scan.c:scan_context_free() sc: 0x4a44c80
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
==6356== Invalid write of size 4
==6356== at 0x40A253: netdev_cmd_connect_cb (netdev.c:2522)
==6356== by 0x4A8886: process_unicast (genl.c:986)
==6356== by 0x4A8C48: received_data (genl.c:1098)
==6356== by 0x4A3DFD: io_callback (io.c:120)
==6356== by 0x4A2799: l_main_iterate (main.c:478)
==6356== by 0x4A28DA: l_main_run (main.c:525)
==6356== by 0x4A2BF2: l_main_run_with_signal (main.c:647)
==6356== by 0x404D27: main (main.c:542)
==6356== Address 0x4a3e418 is 152 bytes inside a block of size 472 free'd
==6356== at 0x48399CB: free (vg_replace_malloc.c:538)
==6356== by 0x49996F: l_free (util.c:136)
==6356== by 0x406662: netdev_free (netdev.c:886)
==6356== by 0x4129C2: netdev_shutdown (netdev.c:5980)
==6356== by 0x403A14: iwd_shutdown (main.c:79)
==6356== by 0x403A7D: signal_handler (main.c:90)
==6356== by 0x4A2AFB: sigint_handler (main.c:612)
==6356== by 0x4A2F3B: handle_callback (signal.c:78)
==6356== by 0x4A3030: signalfd_read_cb (signal.c:104)
==6356== by 0x4A3DFD: io_callback (io.c:120)
==6356== by 0x4A2799: l_main_iterate (main.c:478)
==6356== by 0x4A28DA: l_main_run (main.c:525)
==6356== Block was alloc'd at
==6356== at 0x483879F: malloc (vg_replace_malloc.c:307)
==6356== by 0x49983B: l_malloc (util.c:62)
==6356== by 0x4121BD: netdev_create_from_genl (netdev.c:5776)
==6356== by 0x451F6F: manager_new_station_interface_cb (manager.c:173)
==6356== by 0x4A8886: process_unicast (genl.c:986)
==6356== by 0x4A8C48: received_data (genl.c:1098)
==6356== by 0x4A3DFD: io_callback (io.c:120)
==6356== by 0x4A2799: l_main_iterate (main.c:478)
==6356== by 0x4A28DA: l_main_run (main.c:525)
==6356== by 0x4A2BF2: l_main_run_with_signal (main.c:647)
==6356== by 0x404D27: main (main.c:542)
If the daemon is started and killed rapidly on startup, it is possible
for netdev_shutdown to be called prior to manager processing messages
that actually create the netdev itself. Since the netdev_list has
already been freed, the storage is lost. Fix that by destroying
netdev_list only when the module is unloaded.
If we're going down, make sure to notify any watches about EVENT_DEL
earlier. Not doing so might result in us not cleaning up requests that
might have been started as the result of this event.
station_free() is invoked when one of two possibilities happen:
- Device has been powered down, and EVENT_DOWN has been emitted
- Device has been removed, and EVENT_DEL has been emitted
In both cases there is not much point for netdev_disconnect to be
invoked as that tries to cleanly shut down an existing connection. The
only thing the ABORTED error accomplishes in this case is to send a
dbus_aborted_error for the pending_connect message, if it exists.
There's already code for doing this in station_free().
src/station.c:station_enter_state() Old State: autoconnect_quick, new state: connecting (auto)
src/scan.c:scan_cancel() Trying to cancel scan id 1 for wdev 7
src/wiphy.c:wiphy_radio_work_done() Work item 1 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
Terminate
src/netdev.c:netdev_free() Freeing netdev wlan0[9]
src/device.c:device_free()
src/station.c:station_free()
src/wiphy.c:wiphy_radio_work_done() Work item 2 done
src/station.c:station_connect_cb() 9, result: 5
src/netconfig.c:netconfig_destroy()
Removing scan context for wdev 7
src/scan.c:scan_context_free() sc: 0x4a39490
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is US
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_unicast_notify() Unicast notification 129
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
==20311== Invalid write of size 4
==20311== at 0x406E74: netdev_cmd_disconnect_cb (netdev.c:1130)
==20311== by 0x4A78A8: process_unicast (genl.c:986)
==20311== by 0x4A7C6A: received_data (genl.c:1098)
==20311== by 0x4A2E1F: io_callback (io.c:120)
==20311== by 0x4A17BB: l_main_iterate (main.c:478)
==20311== by 0x4A18FC: l_main_run (main.c:525)
==20311== by 0x4A1C14: l_main_run_with_signal (main.c:647)
==20311== by 0x404D27: main (main.c:542)
==20311== Address 0x4a37a0c is 156 bytes inside a block of size 472 free'd
==20311== at 0x48399CB: free (vg_replace_malloc.c:538)
==20311== by 0x498991: l_free (util.c:136)
==20311== by 0x406651: netdev_free (netdev.c:883)
==20311== by 0x412976: netdev_shutdown (netdev.c:5970)
==20311== by 0x403A14: iwd_shutdown (main.c:79)
==20311== by 0x403A7D: signal_handler (main.c:90)
==20311== by 0x4A1B1D: sigint_handler (main.c:612)
==20311== by 0x4A1F5D: handle_callback (signal.c:78)
==20311== by 0x4A2052: signalfd_read_cb (signal.c:104)
==20311== by 0x4A2E1F: io_callback (io.c:120)
==20311== by 0x4A17BB: l_main_iterate (main.c:478)
==20311== by 0x4A18FC: l_main_run (main.c:525)
The data rate estimation belongs in wiphy since it should take hardware
capabilities into account. Right now the data rate calculation simply
assumes the hardware is as capable as the AP. scan.c will be ported to
use this utility and the data rate estimation will be expanded to take
wiphy capabilities into account.
scan_parse_result used to parse the wdev and return this to the caller
where it was compared against the expected wdev. Simplify this by
extracting the wdev first, and proceeding with the bss parsing afterwards.
Right now a very limited set of band parameters are parsed into wiphy.
This includes the supported rates and the supported frequencies.
However, there is much more information that is given for each band.
Introduce a new band object that will store this information and can be
extended for future use.
[General].APRange is now [IPv4].APAddressPool and the netmask is changed
from 23 to 27 bits to make the test correctly assert that only two
default-sized subnets are allowed by IWD simultaneously (default has
changed from 24 to 28 bits)
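The subnet arithmetic behind this change can be checked with a small
sketch. It uses Python's standard ipaddress module; the 192.168.80.0 base
address is an arbitrary placeholder and is not taken from the test itself.

```python
import ipaddress

def subnets_in_pool(pool: str, subnet_prefix: int) -> int:
    """Count how many subnets of the given prefix length fit in the pool."""
    network = ipaddress.ip_network(pool)
    return sum(1 for _ in network.subnets(new_prefix=subnet_prefix))

# Old setup: a /23 pool carved into 24-bit default subnets.
old = subnets_in_pool("192.168.80.0/23", 24)
# New setup: a /27 pool carved into 28-bit default subnets.
new = subnets_in_pool("192.168.80.0/27", 28)
```

Both values come out as two, so the test keeps asserting the same limit
after the default prefix length changed.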
Change the char *addr_str and uint8_t prefix_len pair to an
l_rtnl_address object and use ell/rtnl.h utilities that use that
directly. Extend broadcast_from_ip to handle prefix_len.
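As a rough model of what handling prefix_len means here, the broadcast
address is the IP with all host bits set. This is a Python sketch of the
arithmetic only; it does not reproduce the C signature of
broadcast_from_ip, which the message does not show.

```python
import ipaddress

def broadcast_from_ip(addr: str, prefix_len: int) -> str:
    """Return the broadcast address of addr within its prefix_len-bit subnet."""
    ip = int(ipaddress.IPv4Address(addr))
    host_mask = (1 << (32 - prefix_len)) - 1  # bits that identify the host
    return str(ipaddress.IPv4Address(ip | host_mask))
```

For example, broadcast_from_ip("192.168.1.55", 24) gives 192.168.1.255,
while a 28-bit prefix on the same address gives 192.168.1.63.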
We generate the DBus error reply type from the errno only when
ap_start() was failing synchronously, now also send the errno through
the callbacks so that we can also return a specific DBus reply when
failing asynchronously. The AP autotest relies on receiving the
AlreadyExists DBus error.
Deprecate the global [General].APRanges setting in favour of
[IPv4].APAddressPool with an extended (but backwards-compatible) syntax.
Drop the existing address pool creation code.
The new APAddressPool setting has the same syntax as the profile-local
[IPv4].Address setting and the subnet selection code will fall back
to the global setting if it's missing, this way we use common code to
handle both settings.
Extend the [IPv4].Address setting's syntax to allow a new format: a list
of <IP>/<prefix_len> -form strings that define the address space from
which a subnet is selected. Rewrite the DHCP settings loading with
other notable changes:
* validate some of the settings more thoroughly,
* name all netconfig-related ap_state members with the netconfig_
prefix,
* make sure we always call l_dhcp_server_set_netmask(),
* allow netmasks other than 24-bit and change the default to 28 bits,
* as requested avoid using the l_net_ ioctl-based functions although
l_dhcp still uses them internally,
* as requested avoid touching the ap_state members until the end of
some functions so that on error they're basically a no-op (for
readability).
Add the ip_pool_select_addr4 function to select a random subnet of requested
size from an address space defined by a string list (for use with the
AP profile [IPv4].Address and the global [IPv4].APAddressPool settings),
avoiding those subnets that conflict with subnets in use. We take care
to give a similar weight to all subnets contained in the specified
ranges regardless of how many ranges contain each, basically so that
overlapping ranges don't affect the probabilities (debatable.)
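The selection idea can be modelled as follows. This is a simplified
Python sketch, not the ip_pool_select_addr4 implementation: the function
name select_subnet and its parameters are hypothetical, and it enumerates
every candidate subnet, which would be wasteful for very large ranges.

```python
import ipaddress
import random

def select_subnet(ranges, prefix_len, in_use, rng=random):
    """Pick a random free subnet of prefix_len bits from the given ranges."""
    candidates = set()
    for spec in ranges:
        # Entries are <IP>/<prefix_len> strings, so host bits may be set.
        space = ipaddress.ip_network(spec, strict=False)
        if space.prefixlen > prefix_len:
            continue  # this range is smaller than the requested subnet
        # The set de-duplicates subnets covered by more than one range,
        # so overlapping ranges do not change the probabilities.
        candidates.update(space.subnets(new_prefix=prefix_len))
    used = [ipaddress.ip_network(u, strict=False) for u in in_use]
    free = sorted(c for c in candidates
                  if not any(c.overlaps(u) for u in used))
    return rng.choice(free) if free else None
```

With ranges 192.168.0.0/27 and 192.168.0.0/28, a requested size of 28
bits and 192.168.0.1/28 already in use, the only remaining candidate is
192.168.0.16/28.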
Add the ip-pool submodule that tracks IPv4 addresses in use on the
system for use when selecting the address for a new AP. l_rtnl_address
is used internally because if we're going to return l_rtnl_address
objects it would be misleading if we didn't fill in all of their
properties like flags etc.
If the connected BSS changes channel, netdev will emit an event with the
new channel's frequency. In response, have station change the frequency
of the connected scan_bss struct and inform network about the update.
If the connected BSS announces that it is switching operating channel,
the kernel may emit the NL80211_CMD_CH_SWITCH_NOTIFY event when the
switch is complete. Add a new netdev event NETDEV_EVENT_CHANNEL_SWITCHED
to signal to interested modules that the connected BSS has changed
channel. The event carries a pointer to the new channel's frequency.
NL80211_BSS_LAST_SEEN_BOOTTIME is expressed in nanoseconds, while BSS
timestamps are expressed in microseconds internally. Convert the
attribute to microseconds when using it to timestamp a BSS. This makes
iwd expire absent BSSes within 30 seconds as intended.
Fixes: 454cee12d473 ("scan: Use kernel-reported time-stamp if provided")
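A minimal sketch of the unit mismatch, written in Python for brevity.
The constant and function names are illustrative and are not iwd
identifiers; only the 30 second window and the units come from the
message above.

```python
NSEC_PER_USEC = 1000
USEC_PER_SEC = 1_000_000
BSS_EXPIRY_USEC = 30 * USEC_PER_SEC  # the 30 second window mentioned above

def boottime_ns_to_us(last_seen_ns: int) -> int:
    """Convert a nanosecond boottime stamp to microseconds."""
    return last_seen_ns // NSEC_PER_USEC

def is_expired(now_us: int, last_seen_ns: int) -> bool:
    """A BSS expires once it has been absent for longer than the window."""
    return now_us - boottime_ns_to_us(last_seen_ns) > BSS_EXPIRY_USEC
```

Without the conversion the nanosecond stamp is roughly a thousand times
larger than the current time in microseconds, so the computed age is
negative and the BSS does not expire on schedule.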
Right now, if a connection to a network selected by auto-connect fails,
the entire autoconnect process is restarted. This means that scans are
kicked off again, auto-connect list is rebuilt, etc. This was due to
auto-connect reusing the same failure path as connections triggered via
D-Bus.
The above behavior can lead to weird situations in certain corner cases.
For example, a highly preferred network configured with the wrong
password would result in auto-connect entering an infinite loop.
Fix this by making sure that all auto-connect entries are tried and
exhausted prior to re-scanning again.
The temporary ban list is cleared when a network is connected to
successfully, and also in network_connect_failed. Unfortunately,
network_connect_failed is not called in all paths (i.e. during
autoconnect) since it messes with the state of secrets and passphrases.
Clear the list in network_disconnected() instead, since it is guaranteed
to be called in every circumstance.
This will be effectively the same as the CONNECTING state, but can be
used to enable differing behavior, depending on whether connection was
triggered by autoconnect or via D-Bus.
Code that walked the VHT TX/RX MCS maps seemed to assume that bit_field
operated on bits that start at '1'. But this utility actually operates
on bits that start at '0'. I.e. the least significant bit is at
position 0.
While we're at it, rename the mcs variable into bitoffset to make it
clearer how the maps are being iterated over. Supported MCS is actually
the value found in the map.
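The zero-based iteration can be sketched like this. It is a Python model
rather than the C code: bit_field here takes an arbitrary integer, and
max_mcs_per_stream is a hypothetical helper. The 2-bit encoding (0, 1, 2
for MCS 0-7, 0-8, 0-9 and 3 for "not supported") follows the 802.11 VHT
MCS map definition.

```python
def bit_field(value: int, start: int, num: int) -> int:
    """Extract num bits from value, where bit 0 is the least significant."""
    return (value >> start) & ((1 << num) - 1)

def max_mcs_per_stream(mcs_map: int) -> list:
    """Decode a 16-bit VHT MCS map into the highest MCS per spatial stream."""
    highest = {0: 7, 1: 8, 2: 9}  # a field value of 3 means "not supported"
    result = []
    for bitoffset in range(0, 16, 2):  # eight 2-bit fields, starting at bit 0
        supported_mcs = bit_field(mcs_map, bitoffset, 2)
        result.append(highest.get(supported_mcs))
    return result
```

A map of 0xfffa decodes to MCS 0-9 on the first two streams and no
support on the remaining six. Starting the walk at bit 1 instead would
read fields that straddle two streams.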
We seem to be not specifying the msize for the root filesystem, which
results in this warning being printed:
qemu-system-x86_64: warning: 9p: degraded performance: a reasonable high msize should be chosen on client/guest side (chosen msize is <= 8192). See https://wiki.qemu.org/Documentation/9psetup#msize for details.
There doesn't seem to be much performance difference in the end since
iwd does not process large files.
This option has not been used in a very long time, and is of limited
utility since the only thing D-Bus debugging does is hexdump the
content of D-Bus messages to the terminal.
The current calculation was giving erroneous results when it came to VHT
MCS index 4 and VHT MCS index 8 & 9.
Switch to a precomputed look up table and add a multiplication factor
for short GI.
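To illustrate the lookup-table approach, here is a reduced Python sketch
covering only 20 MHz with a single spatial stream. The table in the
actual patch is not shown in the message, so the layout and names below
are assumptions; the rates are the standard 802.11ac values, and the 10/9
factor reflects the 4.0 us versus 3.6 us symbol time.

```python
# 20 MHz, one spatial stream, long (800 ns) guard interval, in kbit/s.
# MCS 9 is not a valid combination for this width and stream count.
VHT20_1SS_KBPS = [6500, 13000, 19500, 26000, 39000,
                  52000, 58500, 65000, 78000]

def vht20_rate_kbps(mcs: int, short_gi: bool = False):
    """Look up the single-stream 20 MHz rate, or None if the MCS is invalid."""
    if not 0 <= mcs < len(VHT20_1SS_KBPS):
        return None
    rate = VHT20_1SS_KBPS[mcs]
    if short_gi:
        rate = rate * 10 // 9  # shorter symbols carry the same bits faster
    return rate
```

A table avoids recomputing modulation and coding factors per MCS, which
is where an off-by-one for a single index is easy to introduce.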
These test cases depend on setting up the existing hostapd instance to a
set of known addresses, which might be different from what test-runner
sets. During this time, any scans might result in the old and the new
addresses used by hostapd to be found in the scan results.
Fix that by using start_iwd=0 which tells test-runner that the test
wants to start iwd itself. This delays starting iwd until after the
setUpClass routine has been called and hostapd configured properly.
Also use more sensible rssi values for the 'non-preferred' bss.
Otherwise, ranking BSSes by throughput can confuse the test logic
since both BSSes are ranked the same and either can be picked by
autoconnect.
Right now the --valgrind option logs to a static file named
'valgrind.log'. This means that for any test that runs multiple
instances of iwd, output is lost for all invocations except the last.
Fix that by using a per-process log file and making sure that all log
files are printed to stdout when the test ends.
This approach isn't perfect since it is possible for the pid to be
reused, but better than the current behavior.
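One way to get a per-process file is valgrind's %p placeholder, which
--log-file expands to the PID of the traced process. The sketch below
assumes that mechanism and a "valgrind.log.<pid>" naming scheme; the
helper names are hypothetical and the exact file name used by
test-runner is not stated in the message.

```python
import glob
import os

def valgrind_args(iwd_path: str, log_dir: str) -> list:
    """Build a valgrind command line that writes one log file per process."""
    # valgrind replaces %p with the PID, so each iwd instance gets its own file.
    log_pattern = os.path.join(log_dir, "valgrind.log.%p")
    return ["valgrind", "--leak-check=full",
            "--log-file=" + log_pattern, iwd_path]

def collect_logs(log_dir: str) -> list:
    """Return all per-process log files so they can be printed after a test."""
    return sorted(glob.glob(os.path.join(log_dir, "valgrind.log.*")))
```

After the test finishes, the runner can loop over collect_logs() and
print each file to stdout before removing it.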
ap_reset() seems to be called whenever the AP is stopped or removed due
to interface shutdown. For some reason ap_reset did not remove the DHCP
server object, resulting in leaks:
==211== at 0x483879F: malloc (vg_replace_malloc.c:307)
==211== by 0x46B5AD: l_malloc (util.c:62)
==211== by 0x49B0E2: l_dhcp_server_new (dhcp-server.c:715)
==211== by 0x433AA3: ap_setup_dhcp (ap.c:2615)
==211== by 0x433AA3: ap_load_dhcp (ap.c:2645)
==211== by 0x433AA3: ap_load_config (ap.c:2753)
==211== by 0x433AA3: ap_start (ap.c:2885)
==211== by 0x434A96: ap_dbus_start_profile (ap.c:3329)
==211== by 0x482DA9: _dbus_object_tree_dispatch (dbus-service.c:1815)
==211== by 0x47A4D9: message_read_handler (dbus.c:285)
==211== by 0x4720EB: io_callback (io.c:120)
==211== by 0x47130C: l_main_iterate (main.c:478)
==211== by 0x4713DB: l_main_run (main.c:525)
==211== by 0x4713DB: l_main_run (main.c:507)
==211== by 0x4715EB: l_main_run_with_signal (main.c:647)
==211== by 0x403EE1: main (main.c:550)
==209== by 0x43E48A: netconfig_ipv4_select_and_install (netconfig.c:887)
==209== by 0x43E48A: netconfig_configure (netconfig.c:1025)
==209== by 0x41743C: station_connect_cb (station.c:2556)
==209== by 0x408E0D: netdev_connect_ok (netdev.c:1311)
==209== by 0x47549E: process_unicast (genl.c:994)
==209== by 0x47549E: received_data (genl.c:1102)
==209== by 0x4720EB: io_callback (io.c:120)
==209== by 0x47130C: l_main_iterate (main.c:478)
==209== by 0x4713DB: l_main_run (main.c:525)
==209== by 0x4713DB: l_main_run (main.c:507)
==209== by 0x4715EB: l_main_run_with_signal (main.c:647)
==209== by 0x403EE1: main (main.c:550)
Prior to the BSS blacklist a BSS based autoconnect list made
the most sense, but now station actually retries all BSS's upon
failure. This means that for each BSS in the autoconnect list
every other BSS under that SSID will be attempted to connect to
if there is a failure. Essentially this is a network based
autoconnect list, just an indirect way of doing it.
Instead, the autoconnect list can be purely network based, using
the network rank for sorting. This avoids the need for a special
autoconnect_entry struct as well as ensures the last connected
network is chosen first (simply based on existing network ranking
logic).
It was observed that IWD's ranking for BSS's did not always
end up with the fastest being chosen. This was due to IWD's
heavy weight on signal strength. This is a decent way of ranking
but even better is calculating a theoretical data rate which
was also done and factored in. The problem is the data rate
factor was always outdone by the signal strength.
Instead, remove signal strength entirely as this is already taken
into account with the data rate calculation. This also removes
the check for rate IEs. If no IEs are found the parser will
base the data rate solely on RSSI.
There were a few other factors removed which will be added back
when ranking *networks* rather than BSS's. WPA version (or open)
was removed as well as the privacy capability. These values really
should not differ between BSS's in the same SSID and as such
should be used for network ranking instead.
Both ext/supported rates IEs are obtained from scan results. These
IEs are passed to ie_tlv_init/ie_tlv_next, as well as direct length
checks (for supported rates at least, extended supported rates can
be as long as a single byte integer can hold, 1 - 255) which verifies
that the length in the IE matches the overall IE length that is
stored in scan_bss. Because of this, ie_parse_supported_rates_from_data
was doing double duty re-initializing a TLV iterator.
Instead, since we know the IE length is within bounds, the length/data
can simply be directly accessed out of the buffer. This avoids the need
for a wrapper function entirely.
The length parameters were also removed, since this is now obtained
directly from the IE.
The FT-over-DS procedure now authenticates with multiple BSS's
upon connecting. This causes list_sta() to return our address for
any authenticated APs. It has now been changed to work with this
new behavior, as well as a check that the station fully connected
to the expected AP initially.
Since netdev maintains the list of FT over DS info structs there is not
any need for station to get callbacks when the initial action frame
is received, or not. This removes the need for the callback handler,
user data, and response timeout.
Roam times can be slightly improved by sending out the FT-over-DS
action frames to any BSS in the mobility domain immediately after
connecting. This preauthenticates IWD to each AP which means
Reassociation can happen right away when a roam is needed.
When a roam is needed station_transition_start will first try
FT-over-DS (if supported) via netdev_fast_transition_over_ds. The
return is checked and if netdev has no cached entries FT-over-Air
will be used instead.
The beauty of FT-over-DS is that a station can send and receive
action frames to many APs to prepare for a future roam. Each
AP authenticates the station and when a roam happens the station
can immediately move to reassociation.
To handle this a queue of netdev_ft_over_ds_info structs is used
instead of a single entry. Using the new ft.c parser APIs these
info structs can be looked up when responses come in. For now
the timeouts/callbacks are kept but these will be removed as it
really does not matter if the AP sends a response (keeps station
happy until the next patch).
This is to prepare for multiple concurrent FT-over-DS action frames.
A list will be kept in netdev and for lookup reasons it needs to
parse the start of the frame to grab the aa/spa addresses. In this
call the IEs are also returned and passed to the new
ft_over_ds_parse_action_response.
For now the address checks have been moved into netdev, but this will
eventually turn into a queue lookup.
test-runner will print out if files were left behind after a
test which lets the developer know something was not cleaned
up. But in this case test-runner should also remove these files
so they are not left, and printed, for each subsequent test.
This value sets the roaming threshold on 5GHz networks. The
threshold has been separated from 2.4GHz because in many cases
5GHz can perform much better at low RSSI than 2.4GHz.
In addition the BSS ranking logic was re-worked and now 5GHz is
much more preferred, even at low RSSI. This means we need a
lower floor for RSSI before roaming, otherwise IWD would end
up roaming immediately after connecting due to low RSSI CQM
events.
This is being added as a developer method and should not be used
in production. For testing purposes though, it is quite useful as
it forces IWD to roam to a provided BSS and bypasses IWD's roaming
and ranking logic for choosing a roam candidate.
To use this a BSSID is provided as the only parameter. If this
BSS is not in IWD's current scan results -EINVAL will be returned.
If IWD knows about the BSS it will attempt to roam to it whether
that is via FT, FT-over-DS, or Reassociation. These details are
still sorted out in IWD's station_transition_start() logic.
This will enable developer features to be used. Currently the
only user of this will be StationDiagnostics.Roam() method which
should only be exposed in this mode.
Expose the state directory/storage directory path on D-Bus because it
can't be known to clients until IWD runs, and clients might need to
occasionally fiddle with the network config files. While there, also
expose the IWD version string, similar to how some other D-Bus services
do.
Certain tests like testAP spawn two IWD processes in separate
namespaces. When --valgrind is used this eats up quite a bit
of RAM and causes the VM to run out of memory and start
killing off processes.
Similar to 06aa84cca, set the operstate when AdHoc is started and
stopped as it is no longer always set by netdev (only for station/p2p
interface types)
Previously resp was a simple array of bytes allocated on the stack.
This was changed to a dynamically allocated array, but the sizeof(resp)
argument to ap_build_beacon_pr_head() was never changed appropriately.
Fix this by introducing a new resp_len variable that holds the number of
bytes allocated for resp. Also, move the allocation after the basic
sanity checks have been performed to avoid allocating/freeing memory
unnecessarily.
Fixes: 18a63f91fd44 ("ap: Write extra frame IEs from the user")
Commit 1fe5070 added a workaround for drivers which may send the
connect event prior to the connect callback/ack. This caused IWD
to fail to start eapol if reassociation was used due to
netdev_reassociate never setting netdev->connected = false.
netdev_reassociate uses the same code path as normal connections,
but when the connect callback came in connected was already set
to true which then prevents eapol from being registered. Then,
once the connect event comes in, there is no frame watch for
eapol and IWD doesn't respond to any handshake frames.
WEP networks are not supported by iwd. However, the only indication is the
message "Operation not supported" while trying to connect. It is not clear
enough that this is due to intentional lack of support (as opposed to some
kind of misconfiguration). This patch explicitly lists WEP networks shown
with get-networks as unsupported. Hopefully this will make it clearer for
those of us not as familiar with iwd.
Prior to this, an error sending the FT Reassociation was treated
as fatal, which is correct for FT-over-Air but not for FT-over-DS.
If the actual l_genl_family_send call fails for FT-over-DS the
existing connection can be maintained and there is no need to
call netdev_connect_failed.
Adding a return to the tx_associate function works for both FT
types. In the FT-over-Air case this return will ultimately get
sent back up to auth_proto_rx_authenticate, which will then
call netdev_connect_failed. For FT-over-DS tx_associate is
actually called from the 'start' operation which can fail and
still maintain the existing connection.
FT-over-DS was refactored to separate the FT action frame and
reassociation. From station's standpoint, IWD needs to call
netdev_fast_transition_over_ds_action prior to actually roaming.
For now these two stages are being combined and the action
roam happens immediately after the action response callback.
FT-over-DS followed the same pattern as FT-over-Air which worked,
but really limited how the protocol could be used. FT-over-DS is
unique in that we can authenticate to many APs by sending out
FT action frames and parsing the results. Once parsed IWD can
immediately Reassociate, or do so at a later time.
To take advantage of this IWD needs to separate FT-over-DS into
two stages: action frame and reassociation.
The initial action frame stage is started by netdev. The target
BSS is sent an FT action frame and a new cache entry is created
in ft.c. Once the response is received the entry is updated
with all the needed data to Reassociate. To limit the record
keeping on netdev each FT-over-DS entry holds a userdata pointer
so netdev doesn't need to maintain its own list of data for
callbacks.
Once the action response is parsed netdev will call back signalling
the action frame sequence was completed (either successfully or not).
At this point the 'normal' FT procedure can start using the
FT-over-DS auth-proto.
FT-over-DS is being separated into two independent stages. The
first of which is the processing of the action frame response.
This new class will hold all the parsed information from the action
frame and allow it to be retrieved at a later time when IWD
needs to roam.
Initial info class should be created when the action frame is
being sent out. Once a response is received it can be parsed
with ft_over_ds_parse_action_response. This verifies the frame
and updates the ft_ds_info class with the parsed data.
ft_over_ds_prepare_handshake is the final step prior to
Reassociation. This sets all the stored IEs, anonce, and KH IDs
into the handshake and derives the new PTK.
This adds the RSNE verification to ft_parse_ies which will
be common between over-Air and over-DS. The MDE check was
also factored out into its own minimal function so as to
retain the spec comment but allow reuse elsewhere.
Since using --valgrind actually runs IWD using the valgrind
process the --verbose flag would only work if 'valgrind' was
also specified. This was taken into account with is_verbose
but the actual logic enabling stdout did not use that helper.
This was due, in part, to logging since is_verbose will always
return true if --log is used. To fix this a new flag was added
to is_verbose which omits the --log check to handle this
specific case.
Prior to this the diagnostic interface was taken down when station
transitioned to DISCONNECTED. This worked but once station is in
a DISCONNECTING state it then calls netdev_disconnect(). Trying to
get any diagnostic data during this time may not work as it is
unknown what state exactly the kernel is in. To be safe take the
interface down when station is DISCONNECTING.
The building of the FT IEs for Action/Authenticate
frames will need to be shared between ft and netdev
once FT-over-DS is refactored.
The building was refactored to work off the caller's
buffer rather than internal stack buffers. An argument
'new_snonce' was included as FT-over-DS will generate
a new snonce for the initial action frame, hence the
handshake's snonce cannot be used.
Break up the rather large code block which parses out IEs,
verifies, and sets into the handshake. FT-over-DS needs these
steps broken up in order to parse the action frame response
without modifying the handshake.
Under very rare circumstances the roaming scan triggered might not be
canceled properly. This is because we issue the roam scan recursively
from within a scan callback and re-use the id of the scan for the
subsequent request. The destroy callback is invoked right after the
callback and resets the id. This leads to the scan not being canceled
properly in roam_state_clear().
src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
src/station.c:station_roam_trigger_cb() 37
src/station.c:station_roam_scan() ifindex: 37
src/station.c:station_roam_trigger_cb() Using cached neighbor report for roam
...
src/scan.c:get_scan_done() get_scan_done
src/station.c:station_roam_failed() 37
src/station.c:station_roam_scan() ifindex: 37
src/scan.c:scan_request_triggered() Active scan triggered for wdev 22
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan0[37]
src/device.c:device_free()
src/station.c:station_free()
...
Removing scan context for wdev 22
src/scan.c:scan_context_free() sc: 0x4a362a0
src/wiphy.c:wiphy_radio_work_done() Work item 14 done
==19542== Invalid write of size 4
==19542== at 0x411500: station_roam_scan_destroy (station.c:2010)
==19542== by 0x420B5B: scan_request_free (scan.c:156)
==19542== by 0x410BAC: destroy_work (wiphy.c:294)
==19542== by 0x410BAC: wiphy_radio_work_done (wiphy.c:1613)
==19542== by 0x46C66E: l_queue_clear (queue.c:107)
==19542== by 0x46C6B8: l_queue_destroy (queue.c:82)
==19542== by 0x420BAE: scan_context_free (scan.c:205)
==19542== by 0x424135: scan_wdev_remove (scan.c:2272)
==19542== by 0x408754: netdev_free (netdev.c:847)
==19542== by 0x40E18C: netdev_shutdown (netdev.c:5773)
==19542== by 0x404756: iwd_shutdown (main.c:78)
==19542== by 0x404756: iwd_shutdown (main.c:65)
==19542== by 0x470E21: handle_callback (signal.c:78)
==19542== by 0x470E21: signalfd_read_cb (signal.c:104)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== Address 0x4d81f98 is 200 bytes inside a block of size 288 free'd
==19542== at 0x48399CB: free (vg_replace_malloc.c:538)
==19542== by 0x47F3E5: interface_instance_free (dbus-service.c:510)
==19542== by 0x481DEA: _dbus_object_tree_remove_interface (dbus-service.c:1694)
==19542== by 0x481F1C: _dbus_object_tree_object_destroy (dbus-service.c:795)
==19542== by 0x40894F: netdev_free (netdev.c:844)
==19542== by 0x40E18C: netdev_shutdown (netdev.c:5773)
==19542== by 0x404756: iwd_shutdown (main.c:78)
==19542== by 0x404756: iwd_shutdown (main.c:65)
==19542== by 0x470E21: handle_callback (signal.c:78)
==19542== by 0x470E21: signalfd_read_cb (signal.c:104)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== by 0x47088C: l_main_iterate (main.c:478)
==19542== by 0x47095B: l_main_run (main.c:525)
==19542== by 0x47095B: l_main_run (main.c:507)
==19542== by 0x470B6B: l_main_run_with_signal (main.c:647)
==19542== Block was alloc'd at
==19542== at 0x483879F: malloc (vg_replace_malloc.c:307)
==19542== by 0x46AB2D: l_malloc (util.c:62)
==19542== by 0x416599: station_create (station.c:3448)
==19542== by 0x406D55: netdev_newlink_notify (netdev.c:5324)
==19542== by 0x46D4BC: l_hashmap_foreach (hashmap.c:612)
==19542== by 0x472F46: process_broadcast (netlink.c:158)
==19542== by 0x472F46: can_read_data (netlink.c:279)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== by 0x47088C: l_main_iterate (main.c:478)
==19542== by 0x47095B: l_main_run (main.c:525)
==19542== by 0x47095B: l_main_run (main.c:507)
==19542== by 0x470B6B: l_main_run_with_signal (main.c:647)
==19542== by 0x403EDB: main (main.c:490)
==19542==
Prior to this, netdev_connect_ok was setting this, which really
only applies to station mode. In addition, this happened for each
new station that connected to the AP. Instead, set the operstate /
link mode when the AP starts and stops.
Change ap_start to load all of the AP configuration from a struct
l_settings, moving the 6 or so parameters from struct ap_config members
to the l_settings groups and keys. This extends the ap profile concept
used for the DHCP settings. ap_start callers create the l_settings
object and fill the values in it or read the settings in from a file.
Since ap_setup_dhcp and ap_load_profile_and_dhcp no longer do the
settings file loading, they needed to be refactored and some issues were
fixed in their logic, e.g. l_dhcp_server_set_ip_address() was never
called when the "IP pool" was used. Also the IP pool was previously only
used if the ap->config->profile was NULL and this didn't match what the
docs said:
"If [IPv4].Address is not provided and no IP address is set on the
interface prior to calling StartProfile the IP pool will be used."
The info struct is on the stack which leads to the potential
for uninitialized data access. Zero out the info struct prior
to calling the get station callback:
==141137== Conditional jump or move depends on uninitialised value(s)
==141137== at 0x458A6F: diagnostic_info_to_dict (diagnostic.c:109)
==141137== by 0x41200B: station_get_diagnostic_cb (station.c:3620)
==141137== by 0x405BE1: netdev_get_station_cb (netdev.c:4783)
==141137== by 0x4722F9: process_unicast (genl.c:994)
==141137== by 0x4722F9: received_data (genl.c:1102)
==141137== by 0x46F28B: io_callback (io.c:120)
==141137== by 0x46E5AC: l_main_iterate (main.c:478)
==141137== by 0x46E65B: l_main_run (main.c:525)
==141137== by 0x46E65B: l_main_run (main.c:507)
==141137== by 0x46E86B: l_main_run_with_signal (main.c:647)
==141137== by 0x403EA8: main (main.c:490)
It isn't safe to return a NULL from diagnostic_akm_suite_to_security()
since the value is used directly. Also, if the AKM suite is 0, this
implies that the network is an Open network and not some unknown AKM.
==17982== Invalid read of size 1
==17982== at 0x483BC92: strlen (vg_replace_strmem.c:459)
==17982== by 0x47DE60: _dbus1_builder_append_basic (dbus-util.c:981)
==17982== by 0x41ACB2: dbus_append_dict_basic (dbus.c:197)
==17982== by 0x412050: station_get_diagnostic_cb (station.c:3614)
==17982== by 0x405B19: netdev_get_station_cb (netdev.c:4801)
==17982== by 0x47436E: process_unicast (genl.c:994)
==17982== by 0x47436E: received_data (genl.c:1102)
==17982== by 0x470FBB: io_callback (io.c:120)
==17982== by 0x4701DC: l_main_iterate (main.c:478)
==17982== by 0x4702AB: l_main_run (main.c:525)
==17982== by 0x4702AB: l_main_run (main.c:507)
==17982== by 0x4704BB: l_main_run_with_signal (main.c:647)
==17982== by 0x403EDB: main (main.c:490)
==17982== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17982==
Aborting (signal 11) [/home/denkenz/iwd/src/iwd]
++++++++ backtrace ++++++++
0 0x488a550 in /lib64/libc.so.6
1 0x483bc92 in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so
2 0x47de61 in _dbus1_builder_append_basic() at ell/dbus-util.c:983
3 0x41acb3 in dbus_append_dict_basic() at src/dbus.c:197
4 0x412051 in station_get_diagnostic_cb() at src/station.c:3618
5 0x405b1a in netdev_get_station_cb() at src/netdev.c:4801
It is possible for the RTNL command callback to come after
netconfig_reset or netconfig_destroy has been called. Make sure that
any outstanding commands that might access the netconfig object are
canceled.
src/netconfig.c:netconfig_ipv4_dhcp_event_handler() DHCPv4 event 0
src/netconfig.c:netconfig_ifaddr_added() wlan0: ifaddr 192.168.1.55/24 broadcast 192.168.1.255
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan0[15]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
src/netconfig.c:netconfig_reset()
src/netconfig.c:netconfig_reset_v4() 16
src/netconfig.c:netconfig_reset_v4() Stopping client
Removing scan context for wdev c
src/scan.c:scan_context_free() sc: 0x4a3cc10
==12792== Invalid read of size 8
==12792== at 0x43BF5A: netconfig_route_add_cmd_cb (netconfig.c:600)
==12792== by 0x4727FA: process_message (netlink.c:181)
==12792== by 0x4727FA: can_read_data (netlink.c:289)
==12792== by 0x470F4B: io_callback (io.c:120)
==12792== by 0x47016C: l_main_iterate (main.c:478)
==12792== by 0x47023B: l_main_run (main.c:525)
==12792== by 0x47023B: l_main_run (main.c:507)
==12792== by 0x47044B: l_main_run_with_signal (main.c:647)
==12792== by 0x403EDB: main (main.c:490)
In case the netdev is brought down while we're trying to connect, try to
detect this and fail early instead of trying to send additional
commands.
src/station.c:station_enter_state() Old State: disconnected, new state: connecting
src/station.c:station_netdev_event() Associating
src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
src/netdev.c:netdev_connect_event()
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=4
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/eapol.c:eapol_handle_ptk_3_of_4() ifindex=4
src/netdev.c:netdev_set_gtk() 4
src/station.c:station_handshake_event() Setting keys
src/netdev.c:netdev_set_tk() 4
src/netdev.c:netdev_set_rekey_offload() 4
New Key for Group Key failed for ifindex: 4:Network is down
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/station.c:station_free()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is DE
src/wiphy.c:wiphy_radio_work_done() Work item 14 done
src/station.c:station_connect_cb() 4, result: 4
Segmentation fault
A prior commit refactored the AKM selection in wiphy.c. This
ended up breaking FILS tests due to the hard coding of a
false fils_hint in wiphy_select_akm. Since our FILS tests
only advertise FILS AKMs wiphy_can_connect would return false
for these networks.
Similar to wiphy_select_akm, add a fils hint parameter to
wiphy_can_connect and pass that down directly to wiphy_select_akm.
If PreSharedKey is set, the current logic does not validate the
Passphrase beyond its existence. This can lead to strange situations
where an invalid WPA3-PSK passphrase might get used. This can of course
only happen if the user (as root) or NetworkManager-iwd-backend writes
such a file incorrectly.
Move the WSC Primary Device Type parsing from p2p.c and eap-wsc.c to a
common function in wscutil.c supporting both formats so that it can be
used in ap.c too.
Logically this frame watch belongs in station. It was kept in device.c
for the purported reason that the station object was removed with
ifdown/ifup changes and hence the frame watch might need to be removed
and re-added unnecessarily. Since the kernel does not actually allow a
frame watch to be unregistered (only when the netdev is removed or its iftype
changes), re-adding a frame watch might trigger a -EALREADY or similar
error.
Avoid this by registering the frame watch when a new netdev is detected
in STATION mode, or when the interface type changes to STATION.
If a netdev iftype is changed, all frame registrations are removed.
Make sure to re-register for the appropriate frame notifications in case
our iftype is switched back to 'station'. In any other iftype, no frame
watches are registered and rrm_state object is effectively dormant.
Right now, RRM is created when a new netdev is detected and its iftype
is of type station. That means that any devices that start their life
as any other iftype cannot be changed to a station and have RRM function
properly. Fix that by always creating the RRM state regardless of the
initial iftype.
In the case that a netdev is powered down, or an interface type change
occurs, the station object will be removed and any watches will be
freed.
Since rrm is created when the netdev is created and persists across
iftype and power up/down changes, it should provide a destroy callback
to station_add_state_watch so that it can be notified when the watch is
removed.
If the iftype changes, kernel silently wipes out any frame registrations
we may have registered. Right now, frame registrations are only done when
the interface is created. This can result in frame watches not being
added if the interface type is changed between station mode to ap mode
and then back to station mode, e.g.:
device wlan0 set-property Mode ap
device wlan0 set-property Mode station
Make sure to re-add frame registrations according to the mode if the
interface type is changed.
Since netdev now keeps track of iftype changes, let it call
frame_watch_wdev_remove on netdevs that it manages to clear frame
registrations that should be cleared due to an iftype change.
Note that P2P_DEVICE wdevs are not managed by any netdev object, but
since their iftype cannot be changed, they should not be affected
by this change.
And set the interface type based on the event rather than the command
callback. This allows us to track interface type changes even if they
come from outside iwd (which shouldn't happen.)
The prepare_ft patch was an intermediate to a full patch
set and was not fully tested stand alone. Its placement
actually broke FT due to handshake->aa getting overwritten
prior to netdev->prev_bssid being copied out. This caused
FT to fail with "transport endpoint not connected (-107)".
The AuthCenter was still not being fully cleaned up in these
tests. It was being stopped but there was still a reference being
held which prevented __del__ from being called.
There was a bug with process output where the last bit of data would
never make it into stdout or log files. This was due to the IO watch
being cleaned up when the process was killed and never allowing it
to finish writing any pending data.
Now the IO watch implementation has been moved out into its own
function (io_process) which is now used to write the final bits of
data out on process exit.
The processes in the list ultimately get removed for each
kill() call. This causes strange behavior since the list is
being iterated and each iteration is removing items. Instead
iterate over a new temporary list so the actual process list
can be cleaned up.
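The pitfall is easy to reproduce in a few lines. This is only a minimal sketch, not the test-runner code: the Process class, its kill() method and the registry lists are hypothetical stand-ins.

```python
class Process:
    """Hypothetical stand-in for a test-runner process."""

    def __init__(self, name, registry):
        self.name = name
        self.registry = registry
        registry.append(self)

    def kill(self):
        # Killing a process also removes it from the shared list.
        self.registry.remove(self)


def kill_all_buggy(registry):
    # Iterates the very list that kill() mutates, so entries get skipped.
    for p in registry:
        p.kill()


def kill_all(registry):
    # Iterate over a temporary copy so the real list can shrink safely.
    for p in list(registry):
        p.kill()


buggy = []
for n in ("dbus", "hostapd", "iwd", "hwsim"):
    Process(n, buggy)
kill_all_buggy(buggy)
leftover_buggy = [p.name for p in buggy]

fixed = []
for n in ("dbus", "hostapd", "iwd", "hwsim"):
    Process(n, fixed)
kill_all(fixed)
leftover_fixed = [p.name for p in fixed]
```

With the buggy loop every second entry survives, while the copy-based loop empties the list.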
- Make sure to print the cookie information
- Don't print messages for frames we're not interested in. This is
particularly helpful when running auto-tests since frame acks from
hostapd pollute the iwd log.
This file was not included when testNetconfig was introduced
and is required. My system was working fine as it was in my
local tree but has been missing and not passing for others.
IWD_GENL_DEBUG is not generally useful anymore as it just prints a
hexdump of the raw data on the socket. The messages are quite verbose
and spam test-runner logs for little utility.
Fix a regression where connection to an open network results in an
NotSupported error being returned.
Fixes: d79e883e93df ("netdev: Introduce connection types")
This makes conversions simpler. Also fixes a bug where P2P devices were
printed with an incorrect Mode value since dbus_iftype_to_string was
assuming that an iftype as defined in nl80211.h was being passed in,
while netdev was returning an enum value defined in netdev.h.
It was seen that some full mac cards/drivers do not include any
rate information with the NEW_STATION event. This was causing
the NEW_STATION event to be ignored, preventing AP mode from
working on these cards.
Since the full mac path does not even require sta->rates the
parsing can be removed completely.
It was found that if the user cancels/disconnects the agent prior to
entering credentials, IWD would get stuck and could no longer accept
any connect calls with the error "Operation already in progress".
For example exiting iwctl in the Password prompt would cause this:
iwctl
$ station wlan0 connect myssid
$ Password: <Ctrl-C>
This was due to the agent never calling the network callback in the
case of an agent disconnect. Network would wait indefinitely for the
credentials, and disallow any future connect attempts.
To fix this agent_finalize_pending can be called in agent_disconnect
with a NULL reply which behaves the same as if there was an
internal timeout and ultimately allows network to fail the connection.
The 8021x offloading procedure still does EAP in userspace which
negotiates the PMK. The kernel then expects to obtain this PMK
from userspace by calling SET_PMK. This then allows the firmware
to begin the 4-way handshake.
Using __eapol_install_set_pmk_func to install netdev_set_pmk,
netdev now gets called into once EAP finishes and can begin
the final userspace actions prior to the firmware starting
the 4-way handshake:
- SET_PMK using PMK negotiated with EAP
- Emit SETTING_KEYS event
- netdev_connect_ok
One thing to note is that the kernel provides no way of knowing if
the 4-way handshake completed. Assuming SET_PMK/SET_STATION come
back with no errors, IWD assumes the PMK was valid. If not, or
due to some other issue in the 4-way, the kernel will send a
disconnect.
This adds a new type for 8021x offload as well as support in
building CMD_CONNECT.
As described in the comment, 8021x offloading is not particularly
similar to PSK as far as the code flow in IWD is concerned. There
still needs to be an eapol_sm due to EAP being done in userspace.
This throws somewhat of a wrench into our 'is_offload' cases. And
as such this connection type is handled specially.
802.1x offloading needs a way to call SET_PMK after EAP finishes.
In the same manner as set_tk/gtk/igtk a new 'install_pmk' function
was added which eapol can call into after EAP completes.
The timeout functionality was removed from the core SAE
implementation as it causes issues with kernel behavior.
Because of this the timeout tests are no longer valid,
nor are a few asserts in the end-to-end test.
The chances were extremely low, but using l_idle_oneshot
could end up causing an invalid memory access if the netdev
went down while waiting for the disconnect idle callback.
Instead netdev can keep track of the idle with l_idle_create
and remove it if the netdev goes down prior to the idle callback.
This fixes an infinite loop issue when authenticate frames time
out. If the AP is not responding IWD ends up retrying indefinitely
due to how SAE was handling this timeout. Inside sae_auth_timeout
it was actually sending another authenticate frame to reject
the SAE handshake. This, again, resulted in a timeout which called
the SAE timeout handler and repeated indefinitely.
The kernel resend behavior was not taken into account when writing
the SAE timeout behavior and in practice there is actually no need
for SAE to do much of anything in response to a timeout. The
kernel automatically resends Authenticate frames 3 times which mirrors
IWD's SAE behavior anyways. Because of this the authenticate timeout
handler can be completely removed, which will cause the connection
to fail in the case of an authentication timeout.
This crash was caused by the disconnect_cb being called
immediately in cases where send_disconnect was false. The
previous patch actually addressed this separately as this
flag was being set improperly which will, indirectly, fix
one of the two code paths that could cause this crash.
Still, there is a situation where send_disconnect could
be false and in this case IWD would still crash. If IWD
is waiting to queue the connect item and netdev_disconnect
is called it would result in the callback being called
immediately. Instead we can add an l_idle so as to allow the
callback to happen out of scope, which is what station
expects.
Prior to this patch, the crashing behavior can be tested using
the following script (or some variant of it, your system timing
may not be the same as mine).
iwctl station wlan0 disconnect
iwctl station wlan0 connect <network1> &
sleep 0.02
iwctl station wlan0 connect <network2>
++++++++ backtrace ++++++++
0 0x7f4e1504e530 in /lib64/libc.so.6
1 0x432b54 in network_get_security() at src/network.c:253
2 0x416e92 in station_handshake_setup() at src/station.c:937
3 0x41a505 in __station_connect_network() at src/station.c:2551
4 0x41a683 in station_disconnect_onconnect_cb() at src/station.c:2581
5 0x40b4ae in netdev_disconnect() at src/netdev.c:3142
6 0x41a719 in station_disconnect_onconnect() at src/station.c:2603
7 0x41a89d in station_connect_network() at src/station.c:2652
8 0x433f1d in network_connect_psk() at src/network.c:886
9 0x43483a in network_connect() at src/network.c:1183
10 0x4add11 in _dbus_object_tree_dispatch() at ell/dbus-service.c:1802
11 0x49ff54 in message_read_handler() at ell/dbus.c:285
12 0x496d2f in io_callback() at ell/io.c:120
13 0x495894 in l_main_iterate() at ell/main.c:478
14 0x49599b in l_main_run() at ell/main.c:521
15 0x495cb3 in l_main_run_with_signal() at ell/main.c:647
16 0x404add in main() at src/main.c:490
17 0x7f4e15038b25 in /lib64/libc.so.6
The send_disconnect flag was being improperly set based only
on connect_cmd_id being zero. This does not take into account
the case of CMD_CONNECT having finished but not EAPoL. In this
case we do need to send a disconnect.
This adds a new connection type, TYPE_PSK_OFFLOAD, which
allows the 4-way handshake to be offloaded by the firmware.
Offloading will be used if the driver advertises support.
The CMD_ROAM event path was also modified to take into account
handshake offloading. If the handshake is offloaded we still
must issue GET_SCAN, but not start eapol since the firmware
takes care of this.
Until now FT was only supported via Auth/Assoc commands which barred
any fullmac cards from using FT AKMs. With PSK offload support these
cards can do FT but only when offloading is used.
In the FW scan callback eapol was being started unconditionally which
isn't correct as roaming on open networks is possible. Instead check
that a SM exists just like is done in netdev_connect_event.
This should have been updated along with the connect and roam
event separation. Since netdev_connect_event is not being
re-used for CMD_ROAM the comment did not make sense anymore.
Still, there needs to be a check to ensure we were not disconnected
while waiting for GET_SCAN to come back.
netdev_connect_event was being reused for parsing of CMD_ROAM
attributes which made some amount of sense since these events
are nearly identical, but due to the nature of firmware roaming
there really isn't much IWD needs to parse from CMD_ROAM. In
addition netdev_connect_event was getting rather complicated
since it had to handle both CMD_ROAM and CMD_CONNECT.
The only bits of information IWD needs to parse from CMD_ROAM
is the roamed BSSID, authenticator IEs, and supplicant IEs. Since
this is so limited it now makes little sense to reuse the entire
netdev_connect_event function, and instead only parse what is
needed for CMD_ROAM.
station should be isolated as much as possible from the details of the
driver type and how a particular AKM is handled under the hood. It will
be up to wiphy to pick the best AKM for a given bss. netdev in turn
will pick how to drive the particular AKM that was picked.
Currently netdev handles SoftMac and FullMac drivers mostly in the same
way, by building CMD_CONNECT nl80211 commands and letting the kernel
figure out the details. Exceptions to this are FILS/OWE/SAE AKMs which
are only supported on SoftMac drivers by using
CMD_AUTHENTICATE/CMD_ASSOCIATE.
Recently, basic support for SAE (WPA3-Personal) offload on FullMac cards
was introduced. When offloaded, the control flow is very different than
under typical conditions and required additional logic checks in several
places. The logic is now becoming quite complex.
Introduce a concept of a connection type in order to make it clearer
what driver and driver features are being used for this connection. In
the future, connection types can be expanded with 802.1X handshake
offload, PSK handshake offload and CMD_EXTERNAL_AUTH based SAE
connections.
Commit 6e8b76527 added a switch statement for AKM suites which
was not correct as this is a bitmask and may contain multiple
values. Instead we can rely on wiphy_select_akm which is a more
robust check anyways.
Fixes: 6e8b7652788a ("wiphy: add check for CMD_AUTH/CMD_ASSOC support")
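To illustrate why a switch statement is the wrong tool for a bitmask, here is a small sketch. The Akm flag values and both helper functions are made up for the example; they are not iwd's definitions, and the actual fix relies on wiphy_select_akm instead.

```python
from enum import IntFlag


class Akm(IntFlag):
    """Illustrative AKM bits; the values are not iwd's."""
    PSK = 0x1
    SAE = 0x2
    FT_PSK = 0x4


def needs_auth_assoc_switch(akm_suites):
    # Mimics a switch statement: only an exact single value matches.
    return akm_suites == Akm.SAE


def needs_auth_assoc_mask(akm_suites):
    # Treats the field as a bitmask: matches whenever the SAE bit is set.
    return bool(akm_suites & Akm.SAE)


# A BSS advertising more than one AKM at once.
mixed = Akm.PSK | Akm.SAE
```

The switch-style comparison misses `mixed` entirely, even though SAE is one of the advertised suites.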
If there is an associate timeout, retry a few times in case
it was just a fluke. At this point SAE is fully negotiated
so it makes sense to attempt to save the connection.
Any auth proto which did not implement the assoc_timeout handler
could end up getting 'stuck' forever if there was an associate
timeout. This is because in the event of an associate timeout IWD
only sets a few flags and relies on the connect event to actually
handle the failure. The problem is a connect event never comes
if the failure was a timeout.
To fix this we can explicitly fail the connection if the auth
proto has not implemented assoc_timeout or if it returns false.
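The decision described here can be sketched as follows. This is a simplified model, not the netdev code: the AuthProto classes, the retry counter and the returned strings are assumptions made for illustration.

```python
class AuthProto:
    """Hypothetical auth protocol; assoc_timeout is optional."""
    assoc_timeout = None


class RetryingProto(AuthProto):
    def __init__(self, retries):
        self.retries = retries

    def assoc_timeout(self):
        # Return True while another association attempt is being made.
        if self.retries == 0:
            return False
        self.retries -= 1
        return True


def handle_assoc_timeout(proto):
    """Return 'retrying' or 'failed'; never leave the connection pending."""
    handler = getattr(proto, "assoc_timeout", None)
    if handler is not None and handler():
        return "retrying"
    # No handler, or the handler gave up: fail explicitly, because no
    # connect event will arrive after a timeout.
    return "failed"


no_handler_result = handle_assoc_timeout(AuthProto())
proto = RetryingProto(retries=1)
first_result = handle_assoc_timeout(proto)
second_result = handle_assoc_timeout(proto)
```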
In the same vein as requesting a neighbor report after
connecting for the first time, it should also be done
after a roam to obtain the latest neighbor information.
Converts ie_rsn_akm_suite values (and WPA1 hint) into a more
human readable security string such as:
WPA2-Personal, WPA3-Personal, WPA2-Personal + FT etc.
When we cancel a quick scan that has already been triggered, the
Scanning property is never reset to false. This doesn't fully reflect
the actual scanning state of the hardware since we don't (yet) abort
the scan, but at least corrects the public API behavior.
{Network} [/net/connman/iwd/0/7/73706733_psk] Connected = False
{Station} [/net/connman/iwd/0/7] Scanning = True
{Station} [/net/connman/iwd/0/7] State = connecting
{Station} [/net/connman/iwd/0/7] ConnectedNetwork =
/net/connman/iwd/0/7/73706733_psk
{Network} [/net/connman/iwd/0/7/73706733_psk] Connected = True
If IWD is connecting to a SAE/WPA3 BSS and Auth/Assoc commands
are not supported the only option is SAE offload. At this point
network_connect should have verified that the extended feature
for SAE offload exists so we can simply enable offload if these
commands are not supported.
SAE offload support requires some minor tweaks to CMD_CONNECT
as well as special checks once the connect event comes in, since
at this point we are fully connected.
After adding network_bss_update, network now has a match_addr
queue function which can be used to replace an unneeded
l_queue_get_entries loop with l_queue_find.
This will swap out a scan_bss object with a duplicate that may
exist in a network's bss_list. The duplicate will be removed, but
since the object is owned by station it is assumed that it will
be freed elsewhere.
If the hardware roams automatically we want to be sure to not
react to CQM events and attempt to roam/disconnect on our own.
Note: this is only important for very new kernels where CQM
events were recently added to brcmfmac.
Roaming on a full mac card is quite different than soft mac
and needs to be specially handled. The process starts with
the CMD_ROAM event, which tells us the driver is already
roamed and associated with a new AP. After this it expects
the 4-way handshake to be initiated. This in itself is quite
simple, the complexity comes with how this is piped into IWD.
After CMD_ROAM fires it's assumed that a scan result is
available in the kernel, which is obtained using a newly
added scan API scan_get_firmware_scan. The only special
bit of this is that it does not 'schedule' a scan but simply
calls GET_SCAN. This is treated special and will not be
queued behind any other pending scan requests. This lets us
reuse some parsing code paths in scan and initialize a
scan_bss object which ultimately gets handed to station so
it can update connected_bss/bss_list.
For consistency station must also transition to a roaming state.
Since this roam is all handled by netdev two new events were
added, NETDEV_EVENT_ROAMING and NETDEV_EVENT_ROAMED. Both allow
station to transition between roaming/connected states, and ROAMED
provides station with the new scan_bss to replace connected_bss.
Adds support for getting firmware scan results from the kernel.
This is intended to be used after the firmware roamed automatically
and the scan result is required for handshake initialization.
The scan 'request' is completely separate from the normal scan
queue, though scan_results, scan_request, and the scan_context
are all used for consistency and code reuse.
Register P2P group's vendor IE writers using the new API to build and
attach the necessary P2P IE and WFD IEs to the (Re)Association Response,
Probe Response and Beacon frames sent by the GO.
Roughly validate the IEs and save some information for use in our own
IEs. p2p_extract_wfd_properties and p2p_device_validate_conn_wfd are
being moved unchanged to be usable in p2p_group_event without forward
declarations and to be next to p2p_build_wfd_ie.
Make the WSC IE processing and writing more self-contained (i.e. so that
it can be more easily moved to a separate file if desired) by using the
new ap_write_extra_ies() mechanism.
Pass the string IEs from the incoming STA association frames to
the user in the AP event data. I drop
ap_event_station_added_data.rsn_ie because that probably wasn't
going to ever be useful and the RSN IE is included in the .assoc_ies
array in any case.
Since GET_STATION (and in turn GetDiagnostics) gets the most
current station info this attribute serves as a better indication
of the current signal strength. In addition full mac cards don't
appear to always have the average attribute.
No instances of this macro now exist. If future instances crop up, the
better approach would be to use pragma directives to quiet such warnings
and allow static analysis to catch any issues.
Expanded packets with a 0 vendor id need to be treated just like
non-expanded ones. This led to very nasty looking if statements
throughout this function. Fix that by introducing a nested function
to take care of the response type normalization. This also allows us to
drop uninitialized_var usage.
Expanded Nak packet contains (possibly multiple) 8 byte chunks that
contain the type (1 byte, always '254'), vendor-id (3 bytes) and
vendor-type (4 bytes).
Unfortunately the current logic was reading the vendor-id at the wrong
offset (0 instead of 1) and so the extracted vendor-type was incorrect.
Fixes: 17c569ba4cdd ("eap: Add authenticator method logic and API")
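The chunk layout described above can be sketched as a small parser. This is an illustration in Python rather than the C code in eap.c; the function name is hypothetical, and the fields are read in network byte order as is usual for EAP.

```python
import struct

EAP_TYPE_EXPANDED = 254


def parse_expanded_nak(payload):
    """Return (vendor_id, vendor_type) pairs from 8-byte chunks."""
    if len(payload) % 8 != 0:
        raise ValueError("payload is not a multiple of 8 bytes")

    methods = []
    for off in range(0, len(payload), 8):
        chunk = payload[off:off + 8]
        if chunk[0] != EAP_TYPE_EXPANDED:
            raise ValueError("chunk does not start with type 254")
        # Vendor-Id occupies bytes 1..3, after the type byte at offset 0.
        vendor_id = int.from_bytes(chunk[1:4], "big")
        # Vendor-Type occupies bytes 4..7.
        (vendor_type,) = struct.unpack(">I", chunk[4:8])
        methods.append((vendor_id, vendor_type))
    return methods


# One chunk: type 254, vendor-id 0x00372A, vendor-type 1.
sample = bytes([254, 0x00, 0x37, 0x2A, 0, 0, 0, 1])
parsed = parse_expanded_nak(sample)
```

Reading the vendor-id from offset 0 instead of 1 would pull the type byte into the value, which is the kind of off-by-one the commit describes.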
If we received a Nak or an Expanded Nak packet, the intent was to print
our own method type. Instead we tried to print the Nak type contents.
Fix that by always passing in our method info to eap_type_to_str.
Fixes: 17c569ba4cdd ("eap: Add authenticator method logic and API")
The '__' prefix is meant for private, semi-private,
inner implementation or otherwise special APIs that
are typically exposed in a header. In the case of watchlist, these
functions were static and do not fit the above description. Remove the
__ prefix accordingly.
Process output was being duplicated when -v was used. This was
due to both stderr and stdout being appended to the write_fd list
as well as stderr being set to stdout in the Popen call.
To fix this only stdout should be appended to the write_fd list,
but then there comes a problem with closing the streams. stdout
cannot be closed, so instead it is special cased. A new
verbose boolean was added to Process which, if True, will
cause any output to be written to stdout explicitly.
The Namespace class was never being removed when tests finished.
This is fixed by unreffing the hwsim internal _radio object which
both cleans up the radio and allows the Namespace to be removed.
This moves all the de-init code into kill(), which fixes a few
reference issues causing processes to hang around longer than
desired. If the process terminates on its own and/or the last
reference is lost __del__ will kill the process and clean up.
There were a few issues with the cleanup of Hostapd. First the
process was only being killed, which did not actually remove
the process from the list.
In addition, with EAP-SIM/AKA tests, hostapd created sim_db
unix socket files which it does not clean up.
This is somewhat of an open issue/TODO but for now this avoids
the exception caused by trying to remove a radio that has been
moved to a namespace. Once a radio is moved hwsim loses that phy
and can no longer interact with it. This causes the Destroy()
method call to fail.
A cleanup parameter was added to __init__ which can be used
by processes which create any additional files or require a more
custom cleanup routine. Some additional housekeeping was
done to make Process cleanup more robust.
Though multi-test processes seemed like a good idea in terms of
efficiency, the additional code/special cases were not worth it
for the only two multi-test processes (dbus/haveged). Instead this
concept was removed completely and TestContext/Namespaces will
now start all processes for each individual test. This also is
fair to all tests as a previous failed test could end up bleeding
into future tests.
The cls object is part of the unittest framework and its lifespan
is out of test-runner's control. Setting objects into the cls
object sometimes keeps those objects around longer than desired.
It's best to unset anything set in cls when the test is torn down.
Printing out processes was done manually but instead we can
make Process printing extendable by adding its own __str__
method. This now will print if the process is a multi-test
process as well.
Output files in namespaces were not handled differently and would
end up overwriting/duplicating files from the root namespace. These
are now named /tmp/<process>-<namespace>-out.
The process class was quite hard to understand, and somewhat
fragile when multiple output options were needed like verbose
and logging, and in some cases even an additional output file.
To make things simpler we can have all processes output to a
temporary file (/tmp/<name>-out) and set a GLib IO watch on
that file. When the IO watch callback fires any additional
files (stdout, log files, output files) can be written to.
For wait=True processes we do not use an IO watch, but do
the same thing once the process exits, write to any additional
output files using the process output we already have.
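The fan-out idea can be sketched without GLib. This is a simplified model under stated assumptions: fan_out is a hypothetical helper, the sinks are in-memory buffers rather than stdout or log files, and in the real scheme the function would run from the IO watch callback (or once on exit for wait=True processes).

```python
import io
import os
import tempfile


def fan_out(tmp_path, offset, sinks):
    """Copy bytes written to tmp_path since offset into every sink.

    Returns the new offset so the next call only picks up fresh data.
    """
    with open(tmp_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    for sink in sinks:
        sink.write(data)
    return offset + len(data)


# The temporary file plays the role of /tmp/<name>-out.
fd, path = tempfile.mkstemp(suffix="-out")
log, extra = io.BytesIO(), io.BytesIO()
try:
    os.write(fd, b"first\n")
    pos = fan_out(path, 0, [log, extra])
    os.write(fd, b"second\n")
    pos = fan_out(path, pos, [log, extra])
finally:
    os.close(fd)
    os.unlink(path)
```

Because every sink is fed from the same temporary file, adding another output option only means adding another entry to the sink list.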
The log dir was never being cleaned out prior to a new logging
test run. This could leave old stale files around. Note that this
will remove any past log files so if you need them, you want to
make a copy before running test-runner with --log again.
This test fails randomly, and it appears to be due to excessive
scanning. Historically most autotests start a dbus scan right
away. The problem is that most likely a periodic scan is already
ongoing, meaning the dbus scan gets queued. If a Connect() call
comes in (which it always does), the dbus scan gets delayed and will
trigger once connected, at a time the test is not expecting. This
can cause problems with any assumed timing as well as offchannel
frames.
This patch removes the explicit DBus scanning and instead uses
scan_if_needed with get_ordered_networks. The 'all_blacklisted_test'
was also modified to wait for scanning to complete after failing
to connect to all BSS's. This lets all the networks fully come
up (after being blocked by hwsim) and appear in scan results.
When using iwd.conf:[General].EnableNetworkConfiguration=true, it is not
possible to configure systemd.network:[Network].MulticastDNS= as
systemd-networkd considers the link to be unmanaged. This patch allows
iwd to configure that setting on systemd-resolved directly.
WSC EAP method always results in failure, even if successful. Failed
eapol_sm sessions are auto-cleaned, so there's no need to do this
explicitly. Also eapol_exit() will clean up any left-over sessions, so
drop this to make the code a bit simpler.
If the extended feature for CQM levels was not supported no CQM
registration would happen, not even for a single level. This
caused IWD to completely lose the ability to roam since it would
only get notified when the kernel was disconnecting, around -90
dBm, not giving IWD enough time to roam.
Instead if the extended feature is not supported we can still
register for the event, just without multiple signal levels.
This fixes up a previous commit which breaks iwctl. The
check was added to satisfy static analysis but it ended
up preventing iwctl from starting. In this case mkdir
can fail (e.g. if the directory already exists) and only
if it fails should the history be read. Otherwise a
successful mkdir return indicates the history folder is
new and there is no reason to try reading it.
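The intended logic, sketched in Python rather than the C used by iwctl. The function name is hypothetical, and this sketch only treats the "already exists" failure as a reason to read the history, which is narrower than the commit's "if mkdir fails".

```python
import os
import tempfile


def should_read_history(history_dir):
    """Create history_dir; return True only if it already existed."""
    try:
        os.mkdir(history_dir)
    except FileExistsError:
        # mkdir failed because the directory is already there, so a
        # history file may exist and is worth reading.
        return True
    # mkdir succeeded: the directory is brand new, nothing to read.
    return False


with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "history")
    first_run = should_read_history(target)
    second_run = should_read_history(target)
```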
There is no functional change here but checking the return
value makes static analysis much happier. Checking the
return and setting the default inside the if clause is also
consistent with how IWD does it many other places.
Dbus should be started as a multi-test process from the
TestContext, which leaves the dbus address file around for
the full test run. For Namespaces dbus-daemon should be
closed when the Namespace closes.
Handle situations where the BSS we're trying to connect to is no longer
in the kernel scan result cache. Normally, the kernel will re-scan the
target frequency if this happens on the CMD_CONNECT path, and retry the
connection.
Unfortunately, CMD_AUTHENTICATE path used for WPA3, OWE and FILS does
not have this scanning behavior. CMD_AUTHENTICATE simply fails with
a -ENOENT error. Work around this by trying a limited scan of the
target frequency and re-trying CMD_AUTHENTICATE once.
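The workaround amounts to a single retry guarded by the error code. A minimal sketch follows, with callables standing in for the nl80211 operations; authenticate_with_rescan and the fake helpers are hypothetical names, not iwd functions.

```python
import errno


def authenticate_with_rescan(send_authenticate, scan_frequency):
    """Authenticate; on -ENOENT scan the target frequency and retry once."""
    err = send_authenticate()
    if err != -errno.ENOENT:
        return err
    # The BSS fell out of the kernel scan cache: refresh it, then retry.
    scan_frequency()
    return send_authenticate()


# Fakes that fail with -ENOENT once and then succeed.
calls = []
results = [-errno.ENOENT, 0]


def fake_auth():
    calls.append("auth")
    return results.pop(0)


def fake_scan():
    calls.append("scan")


outcome = authenticate_with_rescan(fake_auth, fake_scan)
```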
Every single roaming test had one of two problems with watching the
state change between roaming --> connected. Either the test used
wait_for_object_condition to wait for 'connected' which could allow
other states in between. Or it simply used an assert. The assert
wouldn't allow other state changes, but at the cost of potentially
failing due to IWD not having made it to the 'connected' state yet.
Now we have wait_for_object_change which takes two conditions:
initial (from_str) and expected (to_str). This API will not allow
any other conditions except these, and will wait for the expected
condition before continuing. This allows roaming tests to reliably
wait for the roaming --> connected state change.
This is similar to wait_for_object_condition, but will not allow
any intermediate state changes between the initial and expected
conditions. This is useful for roaming tests when the expected
state change is 'connected' --> 'roaming' with no changes in
between.
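The rule "only the initial and expected conditions are allowed" can be modelled on a plain list of observed states. This is a sketch, not the autotest helper: the real wait_for_object_change waits on D-Bus property changes, whereas check_object_change below is a hypothetical pure function.

```python
def check_object_change(observed, from_state, to_state):
    """Return True once to_state is reached directly from from_state.

    Raises if any state other than the two allowed ones is observed.
    Returns False if the sequence ends before to_state shows up, which
    a real implementation would treat as "keep waiting".
    """
    for state in observed:
        if state == to_state:
            return True
        if state != from_state:
            raise AssertionError("unexpected intermediate state: " + state)
    return False


clean = check_object_change(["roaming", "connected"], "roaming", "connected")
pending = check_object_change(["roaming"], "roaming", "connected")
try:
    check_object_change(["roaming", "disconnected", "connected"],
                        "roaming", "connected")
    rejected = False
except AssertionError:
    rejected = True
```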
This test occasionally failed due to a badly timed DBus scan
triggering right when hwsim tried sending out the spoofed frame.
This caused mac80211_hwsim to reject CMD_FRAME when the timing
was just right.
Rather than always starting a DBus scan we can rely on periodic
scans and only DBus scan if there are no networks in IWD's list.
A scanning check was also added prior to sending out the frame
and if true we wait for not scanning. This is more paranoia than
anything.
Sometimes scan results can come in with a MAC address which
should be in the first index of addrs[] (42:xx:xx:xx:xx:xx).
This causes a failure to lookup the radio path.
There was also a failure path added if the radio cannot be
found rather than rely on DBus to fail with a None path.
The arguments to SendFrame were also changed to use the
ByteArray DBus type rather than python's internal bytearray.
This shouldn't have any effect, but it's more consistent with
how DBus arguments should be used.
After recent changes fixing wait_for_object_condition it was accidentally
made to only work with classes, not other types of objects. Instead
create a minimal class to hold _wait_timed_out so it doesn't rely on
'obj' holding the boolean.
An earlier patch fixed a problem where a queued quick scan would
be triggered and fail once already connected, resulting in a state
transition from connected --> autoconnect_full. This fixed the
Connect() path but this could also happen via autoconnect. Starting
from a connected state, the sequence goes:
- DBus scan is triggered
- AP disconnects IWD
- State transition from disconnected --> autoconnect_quick
- Queue quick scan
- DBus scan results come in and are used to autoconnect
- A connect work item is inserted ahead of all others, transition
from autoconnect_quick --> connecting.
- Connect completes, transition from connecting --> connected
- Quick scan can finally get triggered, which the kernel fails to
do since IWD is connected, transition from connected -->
autoconnect_full.
This can be fixed by checking for a pending quick scan in the
autoconnect path.
Commit eac2410c8314 ("station: Take scanned frequencies into account")
has made it unnecessary to explicitly invoke station_set_scan_results
with expire set to true in case a dbus scan finished prematurely or a
subset was not able to be started. Remove this no longer needed logic.
Fixes: eac2410c8314 ("station: Take scanned frequencies into account")
The diagnostic interface will now only come up when station is
connected. This avoids the need for display station to return
a 'connected' out parameter. We can instead just see that
the diagnostic interface doesn't exist.
The diagnostic interface returns an error anyways if station is
not connected so it makes more sense to only bring the interface
up when it's actually usable. This also removes the interface
when station disconnects, which was never done before (the
interface stayed up indefinitely due to a forgotten remove call).
When we're auto-connecting and have hidden networks configured, use
active scans regardless of whether we see any hidden BSSes in our
existing scan results.
This allows us to more effectively see/connect to hidden networks
when first powering up or after suspend.
The kernel might report hidden BSSes that are reported from beacon frames
separately from ones reported due to probe responses. This may confuse
the station network collation logic since the scan_bss generated by the
probe response might be removed erroneously when processing the scan_bss
that was generated due to a beacon.
Make sure that bss_match also takes the SSID into account and only
matches scan_bss structures that have the same BSSID and SSID contents.
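As a rough illustration of the stricter match (a Python sketch, not the C implementation; the ScanBss type and its fields are stand-ins for struct scan_bss):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanBss:
    bssid: bytes  # 6-byte BSSID
    ssid: bytes   # raw SSID octets; a hidden beacon may carry an empty SSID


def bss_match(a, b):
    # Two entries refer to the same BSS only if both the BSSID and the
    # SSID contents are identical.
    return a.bssid == b.bssid and a.ssid == b.ssid
```

A beacon-derived entry with an empty SSID and a probe-response entry carrying the real SSID then no longer match, so neither replaces the other.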
Instead of manually managing whether to expire BSSes or not, use the
scanned frequency set instead. This makes the API slightly easier to
understand (dropping two boolean arguments in a row) and also a bit more
future-proof.
Commit d372d59bea3e checks whether a hidden network had a previous
connection attempt and re-tries. However, it inadvertently dropped
handling of a condition where a non-hidden network SSID is provided to
ConnectHiddenNetwork. Fix that.
Fixes: d372d59bea3e ("station: Allow ConnectHiddenNetwork to be retried")
The diagnostic interface serves no purpose until the AP has
been started. Any calls on it will return an error so instead
it makes more sense to bring it up when the AP is started, and
down when the AP is stopped.
This will show some basic AP information like Started and
network Name. Some cleanup was done to make the AP interface
and client table columns line up.
It's useful to be able to refer to the network Name/SSID once
an AP is started. For example opening an iwctl session with an
already started AP provides no way of obtaining the SSID.
In some cases the AP can send a deauthenticate frame right after
accepting our authentication. In this case the kernel never properly
sends a CMD_CONNECT event with a failure, even though CMD_CONNECT was
used to initiate the connection. Try to work around that by detecting
that a Deauthenticate event arrives prior to any Associate or Connect
events and handle this case as a connect failure.
To help understand scanning results a bit better and cut down on scan
output add an option to not print the contents of the IEs. Only the
SSID IE will be printed.
Now that ConnectHiddenNetwork can be invoked while we're connected, set
the mac randomization hint parameter properly. The kernel will reject
requests if randomization is enabled while we're connected to a network.
If we forget a hidden network, then make sure to remove it from the
network list completely. Otherwise it would be possible to still
issue a Network.Connect to that particular object, but the fact that the
network is hidden would be lost.
StartProfile was added to the AP interface but the required
command was never added to iwctl. This command requires that a
profile exists in <configuration dir>/ap/. The syntax is as
follows:
ap <wlanX> start-profile <profile_name>
==17639== 72 (16 direct, 56 indirect) bytes in 1 blocks are definitely
lost in loss record 3 of 3
==17639== at 0x4C2F0CF: malloc (vg_replace_malloc.c:299)
==17639== by 0x4670AD: l_malloc (util.c:61)
==17639== by 0x4215AA: scan_freq_set_new (scan.c:1906)
==17639== by 0x412A9C: parse_neighbor_report (station.c:1910)
==17639== by 0x407335: netdev_neighbor_report_frame_event
(netdev.c:3522)
==17639== by 0x44BBE6: frame_watch_unicast_notify (frame-xchg.c:233)
==17639== by 0x470C04: dispatch_unicast_watches (genl.c:961)
==17639== by 0x470C04: process_unicast (genl.c:980)
==17639== by 0x470C04: received_data (genl.c:1101)
==17639== by 0x46D9DB: io_callback (io.c:118)
==17639== by 0x46CC0C: l_main_iterate (main.c:477)
==17639== by 0x46CCDB: l_main_run (main.c:524)
==17639== by 0x46CF01: l_main_run_with_signal (main.c:656)
==17639== by 0x403EDE: main (main.c:490)
In the case that ConnectHiddenNetwork scans successfully, but fails for
some other reason, the network object is left in the scan results until
it expires. This will prevent subsequent attempts to use
ConnectHiddenNetwork with a .NotHidden error. Fix that by checking
whether a found network is hidden, and if so, allow the request to
proceed.
Rework the logic slightly so that this function returns an error message
on error and NULL on success, just like other D-Bus method
implementations. This also simplifies the code slightly.
We used to not allow connecting to a different network while already
connected. One had to disconnect first. This also applied to
ConnectHiddenNetwork calls.
This restriction can be dropped now. station will intelligently
disconnect from the current AP when a station_connect_network() is
issued.
If the disconnect fails and station_disconnect_onconnect_cb is called
with an error, we reply to the original message accordingly.
Unfortunately pending_connect is not unrefed or cleared in this case.
Fix that.
Fixes: d0ee923dda0b ("station: Disconnect, if needed, on a new connection attempt")
An invalid known_network.freq file containing several UUID
groups which have the same 'name' key results in memory leaks
in IWD. This is because the file is loaded and the groups
are iterated without detecting duplicates. This leads to the
same network_info's known_frequencies being set/overridden
multiple times.
To fix this we just check if the network_info already has a
UUID set. If so remove the stale entry.
There may be other old, invalid, or stale entries from previous
versions of IWD, or a user misconfiguring the file. These will
now also be removed during load.
netdev_shutdown calls queue_destroy on the netdev_list, which in turn
calls netdev_free. netdev_free invokes the watches to notify them about
the netdev being removed. Those clients, or anything downstream can
still invoke netdev_find. Unfortunately queue_destroy is not re-entrant
safe, so netdev_find might return stale data. Fix that by using
l_queue_peek_head / l_queue_pop_head instead.
src/station.c:station_enter_state() Old State: connecting, new state:
connected
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan1[6]
src/device.c:device_free()
Removing scan context for wdev 100000001
src/scan.c:scan_context_free() sc: 0x4ae9ca0
src/netdev.c:netdev_free() Freeing netdev wlan0[48]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
==103174== Invalid read of size 8
==103174== at 0x467AA9: l_queue_find (queue.c:346)
==103174== by 0x43ACFF: netconfig_reset (netconfig.c:1027)
==103174== by 0x43AFFC: netconfig_destroy (netconfig.c:1123)
==103174== by 0x414379: station_free (station.c:3369)
==103174== by 0x414379: station_destroy_interface (station.c:3466)
==103174== by 0x47C80C: interface_instance_free (dbus-service.c:510)
==103174== by 0x47C80C: _dbus_object_tree_remove_interface
(dbus-service.c:1694)
==103174== by 0x47C99C: _dbus_object_tree_object_destroy
(dbus-service.c:795)
==103174== by 0x409A87: netdev_free (netdev.c:770)
==103174== by 0x4677AE: l_queue_clear (queue.c:107)
==103174== by 0x4677F8: l_queue_destroy (queue.c:82)
==103174== by 0x40CDC1: netdev_shutdown (netdev.c:5089)
==103174== by 0x404736: iwd_shutdown (main.c:78)
==103174== by 0x404736: iwd_shutdown (main.c:65)
==103174== by 0x46BD61: handle_callback (signal.c:78)
==103174== by 0x46BD61: signalfd_read_cb (signal.c:104)
In the case of module_init failing due to a module that comes after
netdev, the netdev module doesn't clean up netdev_list properly.
==6254== 24 bytes in 1 blocks are still reachable in loss record 1 of 1
==6254== at 0x483777F: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6254== by 0x4675ED: l_malloc (util.c:61)
==6254== by 0x46909D: l_queue_new (queue.c:63)
==6254== by 0x406AE4: netdev_init (netdev.c:5038)
==6254== by 0x44A7B3: iwd_modules_init (module.c:152)
==6254== by 0x404713: nl80211_appeared (main.c:171)
==6254== by 0x4713DE: process_unicast (genl.c:993)
==6254== by 0x4713DE: received_data (genl.c:1101)
==6254== by 0x46E00B: io_callback (io.c:118)
==6254== by 0x46D20C: l_main_iterate (main.c:477)
==6254== by 0x46D2DB: l_main_run (main.c:524)
==6254== by 0x46D2DB: l_main_run (main.c:506)
==6254== by 0x46D502: l_main_run_with_signal (main.c:656)
==6254== by 0x403EDB: main (main.c:490)
Rather than the previous hack which disabled group traffic it
was found that the GTK RSC could be manually set to zero which
allows group traffic. This appears to fix AP mode on brcmfmac
along with the previous fixes. This is not documented in
nl80211, but appears to work with this driver.
This is how a fullmac card tells userspace that a station has
left. This fixes the issue where the same client cannot re-connect
to the same AP multiple times. ap_new_station was renamed to
ap_handle_new_station for consistency.
Some fullmac cards were found to be buggy with getting the GTK
where it returns a BIP key for the GTK index, even after creating
a GTK with NEW_KEY explicitly. In an effort to get these cards
semi-working we can treat this just as a warning and continue with
the handshake without a GTK set which disables group traffic. A
warning is printed in this case so the user is not completely in
the dark.
Fix an issue with the recent changes to signal monitoring from commit
f456501b ("station: retry roaming unless notified of a high RSSI"):
1. driver sends NL80211_CQM_RSSI_THRESHOLD_EVENT_LOW
2. netdev->cur_rssi_low changes from FALSE to TRUE
3. netdev sends NETDEV_EVENT_RSSI_THRESHOLD_LOW to station
4. on roam reassociation, cur_rssi_low is reset to FALSE
5. station still assumes RSSI is low, periodically roams
until netdev sends NETDEV_EVENT_RSSI_THRESHOLD_HIGH
6. driver sends NL80211_CQM_RSSI_THRESHOLD_EVENT_HIGH
7. netdev->cur_rssi_low doesn't change (still FALSE)
8. netdev never sends NETDEV_EVENT_RSSI_THRESHOLD_HIGH
9. station remains stuck in an infinite roaming loop
The commit in question introduced the logic in (5). Previously the
assumption in station was - like in netdev - that if the signal was
still low, the driver would send a duplicate LOW event after
reassociation. This change makes netdev follow the same new logic as
station, i.e. assume the same signal state (LOW/HIGH) until told
otherwise by the driver.
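The numbered sequence can be reproduced with a toy model. This is a Python sketch only; the Netdev class and the reset_on_roam switch are hypothetical and exist just to contrast the old and new behaviour.

```python
class Netdev:
    """Toy model of RSSI threshold tracking (names hypothetical)."""

    def __init__(self, reset_on_roam):
        self.cur_rssi_low = False
        self.reset_on_roam = reset_on_roam
        self.events = []  # events forwarded to station

    def driver_event(self, low):
        # Only forward an event when the tracked state actually changes.
        if low == self.cur_rssi_low:
            return
        self.cur_rssi_low = low
        self.events.append("LOW" if low else "HIGH")

    def roamed(self):
        # Old behaviour: assume the signal is high again after reassociation.
        # New behaviour: keep the last known state until the driver says
        # otherwise.
        if self.reset_on_roam:
            self.cur_rssi_low = False
```

With reset_on_roam=True the HIGH event in step 6 is swallowed as a duplicate; with reset_on_roam=False it reaches station.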
The testAPRoam autotest was silently failing on my machine until I
realized that my distribution hostapd (Arch Linux) is not built with
CONFIG_WNM_AP=y. Indeed, it is also disabled by default in upstream
hostapd. This resulted in the send_bss_transition() function of
hostapd.py silently failing. With this change, throw an exception in
case the BSS_TM_REQ command does not succeed to hopefully save others
the time of debugging this problem.
Since fullmac cards handle auth/assoc in firmware IWD must
react differently while in AP mode just as it does in station.
For fullmac cards a NEW_STATION event is emitted post association
and from here the 4-way handshake can begin. In this NEW_STATION
handler a new sta_state is created and the needed members are
set in order to inject us back into the normal code execution
for softmac post association (i.e. creating group keys and
starting the 4-way handshake). From here everything works the
same as softmac.
After the test-runner re-write many tests were left with
stale options that are no longer used at all. These were
periodically getting removed as changes were made to
individual tests, but it's apparent now that a tree-wide
removal was needed.
The kvmguest shorthand was removed after the release of Linux 5.10. It
was just shorthand for kvm_guest.config anyway, so update the
test-runner documentation accordingly.
At some point the non-interactive client tests began failing.
This was due to a bug in station where it would transition from
'connected' to 'autoconnect' due to a failed scan request. This
happened because a quick scan got scheduled during an ongoing
scan, then a Connect() gets issued. The work queue treats the
Connect as a priority so it delays the quick scan until after the
connection succeeds. This results in a failed quick scan which
IWD does not expect to happen when in a 'connected' state. This
failed scan actually triggers a state transition which then
gets IWD into a strange state where it's connected from the
kernel point of view but does not think it is:
src/station.c:station_connect_cb() 13, result: 0
src/station.c:station_enter_state() Old State: connecting, new state: connected
src/wiphy.c:wiphy_radio_work_done() Work item 6 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5
src/station.c:station_quick_scan_triggered() Quick scan trigger failed: -95
src/station.c:station_enter_state() Old State: connected, new state: autoconnect_full
To fix this IWD should simply cancel any pending quick scans
if/when a Connect() call comes in.
There were some major problems related to logging and process
output. Tests which required output from start_process would
break if used with '--log/--verbose'. This is because we relied
on 'communicate' to retrieve the process output, but Popen does
not store process output when stdout/stderr are anything other
than PIPE.
Instead, in the case of logging or outfiles, we can simply read
from the file we just wrote to.
For an explicit --verbose application we must handle things
slightly differently. A keyword argument was added to Process,
'need_out' which will ensure the process output is kept
regardless of --log or --verbose.
Now a user should be able to use --log/--verbose without any
tests failing.
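The Popen behaviour behind this can be seen in a short sketch. This is not the test-runner's Process class; run_and_capture and its logfile argument are made up for illustration.

```python
import subprocess


def run_and_capture(args, logfile=None):
    """Return the process output whether or not it is being logged."""
    if logfile is None:
        # With PIPE, communicate() hands the output back directly.
        proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        out, _ = proc.communicate()
        return out.decode()

    # With a file as stdout, communicate() returns None for the output,
    # so read back the file that was just written.
    with open(logfile, "w") as log:
        proc = subprocess.Popen(args, stdout=log, stderr=subprocess.STDOUT)
        out, _ = proc.communicate()
        assert out is None
    with open(logfile) as log:
        return log.read()
```

Both paths return the same text, so callers that need the output no longer depend on how logging is configured.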
The verbose arguments come in from the QEMU command line as a
single string. This should have been split into an array immediately
but was not. This led to issues like hostapd debug being enabled
when "-v hostapd_cli" was passed in.
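A sketch of why the unsplit string misbehaved, assuming the verbose values arrive comma-separated (the separator and both helper names are assumptions):

```python
def parse_verbose(raw):
    # Split the single string from the command line into a list of names.
    return [name for name in raw.split(",") if name]


def wants_debug(verbose, name):
    # On a str this is a substring test; on a list it is exact membership.
    return name in verbose
```

Checking the raw string "hostapd_cli" for "hostapd" succeeds as a substring match, while the same check against the parsed list does not.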
Since the list of files copied to /tmp was part of the return value from
pre_test(), if an exception occurred inside pre_test(), "copied" would
be undefined and the post_test(ctx, copied) call in the finally clause
would cause another exception:
Traceback (most recent call last):
File "/home/balrog/repos/iwd/tools/test-runner", line 1508, in <module>
run_tests()
File "/home/balrog/repos/iwd/tools/test-runner", line 1242, in run_tests
run_auto_tests(config.ctx, args)
File "/home/balrog/repos/iwd/tools/test-runner", line 1166, in run_auto_tests
post_test(ctx, copied)
UnboundLocalError: local variable 'copied' referenced before assignment
(apart from not being able to clean up the files). Pass "copied" as a
parameter to pre_test instead.
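A simplified sketch of the pattern (the real functions also take a ctx argument; the file name and the bodies here are placeholders):

```python
def pre_test(copied):
    # The caller owns 'copied', so whatever was recorded before a failure
    # is still visible to it afterwards.
    copied.append("/tmp/example.conf")
    raise RuntimeError("setup failed")


def post_test(copied):
    # Stand-in for cleanup: report which files would be removed.
    return list(copied)


def run_one_test():
    copied = []  # defined before the try block, so it is always bound
    cleaned = None
    try:
        pre_test(copied)
    except RuntimeError:
        pass
    finally:
        cleaned = post_test(copied)
    return cleaned
```

Because the list exists before pre_test() runs, the finally clause can always clean up, even when setup fails half way through.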
Switch EAP-TLS-ClientCert and EAP-TLS-ClientKey to use
l_cert_load_container_file for file loading so that the file format is
autodetected. Add new setting EAP-TLS-ClientKeyBundle for loading both
the client certificate and private key from one file.
As requested move the client certificate and private key loading from
eap-tls-common.c to eap-tls.c. No man page change needed because those
two settings weren't documented in it in the first place.
After the re-write this was broken and not noticed until
recently. The issue appeared to be that the GLib timeout
callback retained no context of local variables. Previously
_wait_timed_out was set as a class variable, but this was
removed so multiple IWD instances could work. Without
_wait_timed_out being a class variable the GLib timeout
setting it had no effect on the wait loop.
To fix this we can set _wait_timed_out on the object being
passed in. This is preserved in the GLib timeout callback
and setting it gets honored in the wait loop.
This command uses GetDiagnostics to show a list of connected
clients and some information about them. The information
contained for each connected station nearly maps 1:1 with the
station diagnostics information shown in "station <wlan> show"
apart from "ConnectedBss" which is now "Address".
For now this module serves as a helper for printing diagnostic
dictionary values. The new API (diagnostic_display) takes a
Dbus iterator which has been entered into a dictionary and
prints out each key and value. A mapping struct was defined
which maps keys to types and units. For simple cases the mapping
will consist of a dbus type character and a units string,
e.g. dBm, Kbit/s etc. For more complex printing which requires
processing the value the 'units' void* can be set to a
function which can be custom written to handle the value.
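The mapping idea can be sketched in Python as follows. This is illustrative only and not the C API: apart from ConnectedBss, the keys, the type characters chosen, and the format_uptime helper are hypothetical.

```python
def format_uptime(seconds):
    # Example of a custom handler for a value that needs processing.
    return f"{seconds // 60}m {seconds % 60}s"


# key -> (dbus type character, units string or formatting function)
MAPPING = {
    "ConnectedBss": ("s", ""),
    "RSSI": ("n", "dBm"),
    "TxBitrate": ("u", "Kbit/s"),
    "Uptime": ("u", format_uptime),
}


def display_diagnostics(values, mapping):
    lines = []
    for key, value in values.items():
        if key not in mapping:
            continue  # unknown keys are skipped in this sketch
        _type_char, units = mapping[key]
        if callable(units):
            text = units(value)
        else:
            text = f"{value} {units}".rstrip()
        lines.append(f"{key}: {text}")
    return lines
```

Simple entries only need a units string, while an entry whose value needs processing supplies a function in the same slot.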
This adds a new AccessPointDiagnostic interface. This interface
provides similar low level functionality as StationDiagnostic, but
for when IWD is in AP mode. This uses netdev_get_all_stations
which will dump all stations, parse, and return each station in
an individual callback. Once the dump is complete the destroy is
called and all data is packaged as an array of dictionaries.
AP mode will use the same structure for its diagnostic interface
and mostly the same dictionary keys. Apart from ConnectedBss and
Address being different, the remainder are the same so the
diagnostic_station_info to DBus dictionary conversion has been made
common so both station and AP can use it to build its diagnostic
dictionaries.
With AP now getting its own diagnostic interface it made sense
to move the netdev_station_info struct definition into its own
header which eventually can be accompanied by utilities in
diagnostic.c. These utilities can then be shared with AP and
station as needed.
systemd specifies a special passive target unit 'network-pre.target'
which may be pulled in by services that want to run before any network
interface is brought up or configured. Correspondingly, network
management services such as iwd and ead should specify
After=network-pre.target to ensure a proper ordering with respect to
this special target. For more information on network-pre.target, see
systemd.special(7).
Two examples to explain the rationale of this change:
1. On one of our embedded systems running iwd, a oneshot service is
run on startup to configure - among other things - the MAC address of
the wireless network interface based on some data in an EEPROM.
Following the systemd documentation, the oneshot service specifies:
Before=network-pre.target
Wants=network-pre.target
... to ensure that it is run before any network management software
starts. In practice, before this change, iwd was starting up and
connecting to an AP before the service had finished. iwd would then
get kicked off by the AP when the MAC address got changed. By
specifying After=network-pre.target, systemd will take care to avoid
this situation.
2. An administrator may wish to use network-pre.target to ensure
firewall rules are applied before any network management software is
started. This use-case is described in the systemd documentation[1].
Since iwd can be used for IP configuration, it should also respect
the After=network-pre.target convention.
Note that network-pre.target is a passive unit that is only pulled in if
another unit specifies e.g. Wants=network-pre.target. If no such unit
exists, this change will have no effect on the order in which systemd
starts iwd or ead.
[1] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
Following a successful roaming sequence, schedule another attempt unless
the driver has sent a high RSSI notification. This makes the behaviour
analogous to a failed roaming attempt where we remained connected to the
same BSS.
This makes iwd compatible with wireless drivers which do not necessarily
send out a duplicate low RSSI notification upon reassociation. Without
this change, iwd risks getting indefinitely stuck to a BSS with low
signal strength, even though a better BSS might later become available.
In the case of a high RSSI notification, the minimum roam time will also
be reset to zero. This preserves the original behaviour in the case
where a high RSSI notification is processed after station_roamed().
Doing so also gives a chance for faster roaming action in the following
example scenario:
1. RSSI LOW
2. schedule roam in 5 seconds
(5 seconds pass)
3. try roaming
4. roaming fails, same BSS
5. schedule roam in 60 seconds
(20 seconds pass)
6. RSSI HIGH
7. cancel scheduled roam
(20 seconds pass)
8. RSSI LOW
9. schedule roam in 5 seconds or 20 seconds?
By resetting the minimum roam time, we can avoid waiting 20 seconds when
the station may have moved considerably. And since the high/low RSSI
notifications are configured with a hysteresis, we should still be
protected against too frequent spurious roaming attempts.
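The example scenario can be traced with a small model. This is a Python sketch, not the station.c logic; the class and its fields are hypothetical, and the 5 and 60 second intervals are taken from the example above.

```python
ROAM_RETRY_SHORT = 5   # seconds, as in the example above
ROAM_RETRY_LONG = 60


class RoamTimer:
    def __init__(self):
        self.min_roam_time = 0  # earliest time another roam may start
        self.scheduled_at = None

    def rssi_low(self, now):
        self.scheduled_at = max(now + ROAM_RETRY_SHORT, self.min_roam_time)

    def roam_finished(self, now):
        # After a roam attempt, schedule another one unless RSSI HIGH
        # arrives first.
        self.min_roam_time = now + ROAM_RETRY_LONG
        self.scheduled_at = self.min_roam_time

    def rssi_high(self, now):
        # Cancel the scheduled roam and reset the minimum roam time.
        self.scheduled_at = None
        self.min_roam_time = 0
```

Running the scenario (LOW at 0s, roam attempt at 5s, HIGH at 25s, LOW at 45s) schedules the next roam at 50s, i.e. 5 seconds after the second LOW. Without the reset in rssi_high() it would be 65s, a 20 second wait.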
This takes a Dbus iterator which has been entered into a
dictionary and prints out each key and value. It requires
a mapping which maps keys to types and units. For simple
cases the mapping will consist of a dbus type character
and a units string, e.g. dBm, Kbit/s etc. For more complex
printing which requires processing the value the 'units'
void* can be set to a function which can be custom written
to handle the value.
This is a nl80211 dump version of netdev_get_station aimed at
AP mode. This will dump all stations, parse into
netdev_station_info structs, and call the callback for each
individual station found. Once the dump is completed the destroy
callback is called.
Since commit 836beb1276d1bc77889462ae514f0c5b708a38d7 removed beacon
loss handling, the roam_no_orig_ap variable has no use and is always set
to false. This commit removes it.
The information requested with GetDiagnostics will now appear in
the "station <iface> show" command. If IWD is not connected, or
there is no diagnostic interface (older IWD version) 'show' will
behave as it always has, only showing scanning/connected.
Some elements, though unlikely, are not required to be included
with the GET_STATION call that GetDiagnostics relies on. mac80211
based drivers include most of these, but other drivers may not.
To be on the safe side all properties except ConnectedBss are now
optional and may not be included.
This adds a generalized API for GET_STATION. This API handles
calling and parsing the results into a new structure,
netdev_station_info. This results structure will hold any
data needed by consumers of netdev_get_station. A helper API
(netdev_get_current_station) was added as a convenience which
automatically passes handshake->aa as the MAC.
For now only the RSSI is parsed as this is already being
done for RSSI polling/events. Going forward, more info will
be added such as rx/tx rates and estimated throughput.
Arrays of dictionaries are quite common, and for basic
types this API makes things much more convenient by
putting all the enter/append/leave calls in one place.
Retrieve the dependencies of readline through pkg-config (and fallback
to -lreadline) to avoid the following build failure:
/nvme/rc-buildroot-test/scripts/instance-0/output-1/host/opt/ext-toolchain/bin/../lib/gcc/x86_64-buildroot-linux-uclibc/8.3.0/../../../../x86_64-buildroot-linux-uclibc/bin/ld: /nvme/rc-buildroot-test/scripts/instance-0/output-1/host/bin/../x86_64-buildroot-linux-uclibc/sysroot/usr/lib/libreadline.a(display.o): in function `cr':
display.c:(.text+0x1ab): undefined reference to `tputs'
Fixes:
- http://autobuild.buildroot.org/results/8fb1341f2f5094c346456b43b4fc04996c2e1485
Add a parameter to station_set_scan_results to allow skipping the
removal of old BSSes. In the DBus-triggered scan only expire BSSes
after having gone through the full supported frequency set.
It should be safe to pass partial scan results to
station_set_scan_results() when not expiring BSSes so using this new
parameter I guess we could also call it for roam scan results.
A scan normally takes about 2 seconds on my dual-band wifi adapter when
connected. The drivers will normally probe on each supported channel in
some unspecified order and will have new partial results after each step
but the kernel sends NL80211_CMD_NEW_SCAN_RESULTS only when the full
scan request finishes, and for segmented scans we will wait for all
segments to finish before calling back from scan_active() or
scan_passive().
To improve user experience define our own channel order favouring the
2.4 channels 1, 6 and 11 and probe those as an individual scan request
so we can update most of our DBus org.connman.iwd.Network objects more
quickly, before continuing with 5GHz band channels, updating DBus
objects again and finally the other 2.4GHz band channels.
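The ordering can be sketched as follows. This is a Python illustration rather than the scan code itself; the function name is made up, and treating every frequency of 5000 MHz or above as the 5GHz band is a simplification.

```python
# Channels 1, 6 and 11 in the 2.4GHz band, as centre frequencies in MHz.
PRIORITY_2_4GHZ = (2412, 2437, 2462)


def split_scan_subsets(supported):
    """Split supported frequencies into the three scan requests, in order."""
    first = [f for f in supported if f in PRIORITY_2_4GHZ]
    second = [f for f in supported if f >= 5000]
    third = [f for f in supported
             if f < 5000 and f not in PRIORITY_2_4GHZ]
    # Skip subsets that end up empty, e.g. on a 2.4GHz-only adapter.
    return [subset for subset in (first, second, third) if subset]
```

Each subset would be issued as its own scan request, with the DBus objects updated after each one completes.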
The overall DBus-triggered scan on my wifi adapter takes about the same
time but my measurements were not very strict, and were not very
consistent with and without this change. With the change most Network
objects are updated after about 200ms though, meaning that I get most
of the network updates in the nm-applet UI 200ms from opening the
network list. The 5GHz band channels take another 1 to 1.5s to scan and
remaining 2.4GHz band channels another ~300ms.
Hopefully this is similar when using other drivers although I can easily
imagine a driver that parallelizes 2.4GHz and 5GHz channel probing using
two radios, or uses 2, 4 or another number of dual-band radios to probe
2, 4, ... channels simultaneously. We'd then lose some of the
performance benefit. The faster scan results may be worth the longer
overall scan time anyway.
I'm also assuming that the wiphy's supported frequency list is exactly
what was scanned when we passed no frequency list to
NL80211_CMD_TRIGGER_SCAN and we won't get errors for passing some
frequency that shouldn't have been scanned.
Use the hwsim DBus API rather than command line. This is both
faster and more dynamic than doing so with the command line.
This also avoids tracking the radio ID since we can just hang
on to the radio Dbus object directly.
The Create() API was limited to only taking a Name and boolean
(for p2p enabling). The actual hwsim nl80211 API can take more
attributes than this (which are actually utilized when creating
from the command line). To get the DBus API up to the same
functionality the two arguments in Create were replaced with
a single dictionary. This allows for extending later if more
arguments are needed.
In the NEW_RADIO callback hwsim was assuming that DBus had not
yet replied to the Create() method. In some cases the NEW_RADIO
event fires before the actual callback which will respond to
DBus. This causes a crash in the create callback.
Starts hwsim but does not register to mac80211_hwsim. This is to
allow autotests to disable hwsim, while still having the ability
to create/destroy radios over DBus.
Readline uses the characters \001 and \002 to mark the start and end
of zero-length character sequences in the prompt before prompt
expansion. Without these characters the input point can become offset
from the visual end of the prompt when performing some actions.
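For illustration, a Python sketch that wraps ANSI colour sequences in these markers. The helper name is made up, and only SGR colour escapes are handled.

```python
import re

RL_PROMPT_START_IGNORE = "\001"
RL_PROMPT_END_IGNORE = "\002"

# Matches ANSI SGR sequences such as "\x1b[1;34m" or "\x1b[0m".
_SGR = re.compile(r"\x1b\[[0-9;]*m")


def readline_safe_prompt(prompt):
    # Bracket every zero-width sequence so readline excludes it when
    # computing the visible prompt length.
    return _SGR.sub(
        lambda m: RL_PROMPT_START_IGNORE + m.group(0) + RL_PROMPT_END_IGNORE,
        prompt,
    )
```

The visible text of the prompt is unchanged; only the escape sequences gain the surrounding markers.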
Tests netconfig with a static configuration, as well as tests ACD
functionality.
The test has two IWD radios which will eventually use the same IP.
One is configured statically, one will receive the IP via DHCP.
The static client sets its IP first and begins using it. Then the
DHCP client is started. Since ACD in a DHCP client is configured
to use its address indefinitely, the static client *should* give
up its address.
When the IP is configured to be static we can now use ACD in
order to check that the IP is available and not already in
use. If a conflict is found netconfig will be reset and no IP
will be set on the interface. The ACD client is left with
the default 'defend once' policy, and probes are not turned
off. This will increase connection time, but for static IP's
it is the best approach.
For better reliability the processor count is now set to qemu.
In cases of low CPU count (< 2) hosts the processor count is
limited to 1. Otherwise half of the host cores will be used for
the VM.
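A sketch of the rule, assuming integer division when halving (the function name is hypothetical):

```python
import os


def qemu_cpu_count(host_cpus=None):
    """Number of processors to give the VM for a given host core count."""
    if host_cpus is None:
        host_cpus = os.cpu_count() or 1
    if host_cpus < 2:
        return 1
    return host_cpus // 2
```

For example, an 8-core host would give the VM 4 processors, while a single-core host gives it 1.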
Certain classes were still using the default namespace. This
didn't matter yet since testAP was the only test using namespaces,
and the AP interface was the only one being used.
For an IWD station on a separate namespace all objects need to
be accessible, so the namespace is passed along to those as needed.
Allow the storage directory (default /tmp/iwd) to be configured
just like the state directory. This is in order to support multiple
IWD instances which require separate storage directories for network
provisioning files.
The docs just specified what an IP prefix looks like, not an
actual example. Though it's not recommended to just copy paste
blindly, it's still useful to have some value in the man pages
that actually works if someone just wants to get a DHCP server
working.
In the strange case that the dns list or the domain list are empty and
openresolv is being used, delete the openresolv entry instance instead
of trying to set it to an empty value.
Make sure to erase the network_info of a known network that has been
removed before disconnecting any stations connected to it. This fixes
the following warning observed when forgetting a connected network:
WARNING: ../git/src/network.c:network_rank_update() condition n < 0 failed
This also fixes a bug where such a forgotten network would incorrectly
appear as the first element in the response to GetOrderedNetworks(). By
clearing the network_info, network_rank_update() properly negates the
rank of the now-unknown network.
Due to timing this test sometimes does not pass because it was
just asserting on the device state rather than waiting for a
change. This generally worked but not always.
==5279== 104 bytes in 2 blocks are definitely lost in loss record 1 of 1
==5279== at 0x4C2F0CF: malloc (vg_replace_malloc.c:299)
==5279== by 0x4655CD: l_malloc (util.c:61)
==5279== by 0x47116B: l_rtnl_address_new (rtnl.c:136)
==5279== by 0x438F4B: netconfig_get_dhcp4_address (netconfig.c:429)
==5279== by 0x438F4B: netconfig_ipv4_dhcp_event_handler
(netconfig.c:735)
==5279== by 0x491C77: dhcp_client_event_notify (dhcp.c:332)
==5279== by 0x491C77: dhcp_client_rx_message (dhcp.c:810)
==5279== by 0x492A88: _dhcp_default_transport_read_handler
(dhcp-transport.c:151)
==5279== by 0x46BECB: io_callback (io.c:118)
==5279== by 0x46B10C: l_main_iterate (main.c:477)
==5279== by 0x46B1DB: l_main_run (main.c:524)
==5279== by 0x46B3EA: l_main_run_with_signal (main.c:646)
==5279== by 0x403ECE: main (main.c:490)
Fix the AlwaysRandomizeAddress setting name.
Add the stricter specification of the extension syntax.
Clarify that GTC and MD5 can't be used as outer EAP methods with wifi.
Tracking of addresses that weren't set by us seemed a bit questionable.
Take this out for now. If this is ever needed, then a queue with
l_rtnl_address objects should be used.
Introduce a new v4_address member which will hold the currently
configured IPV4 address (static or obtained via DHCP). Use the new
l_rtnl_address class for this.
As a side-effect, lease expiration will now properly remove the
configured address.
This patch converts the code to use the new l_rtnl_address class. The
settings parsing code will now return an l_rtnl_address object which
can be installed directly.
Also, address removal path for static addresses has been removed, since
netconfig_reset() sets disable_ipv6 setting to '1', which will remove
all IPV6 addresses for the interface.
This patch converts the code to use the new l_rtnl_route class instead
of using l_rtnl_route6* utilities. The settings parsing code will now
return an l_rtnl_route object which can be installed directly.
Also, the route removal path has been removed since netconfig_reset()
sets disable_ipv6 setting to '1' which will remove all IPV6 routes and
addresses for the interface.
Our simulated environment was really only meant to test air-to-air
communication by using mac80211_hwsim. Protocols like DHCP use IP
communication which starts to fall apart when using hwsim radios.
Mainly unicast sockets do not work since there is no underlying
network infrastructure.
In order to simulate a more realistic environment network namespaces
are introduced in this patch. This allows wireless phy's to be added
to a network namespace and unique IWD instances manage those phys.
This is done automatically when 'NameSpaces' entries are configured
in hw.conf:
[SETUP]
num_radios=2
[NameSpaces]
ns0=rad1,...
This will create a namespace named ns0, and add rad1 to that
namespace. rad1 will not appear as a phy in what's being called the
'root' namespace (the default namespace).
As far as a test is concerned you can create a new IWD() class and
pass the namespace in. This will start a new IWD instance in that
namespace:
ns0 = ctx.get_namespace('ns0')
wd_ns0 = IWD(start_iwd=True, namespace=ns0)
'wd_ns0' can now be used to interact with IWD in that namespace, just
like any other IWD class object.
This also changes the resolve API a little bit to act as a 'set' API
instead of an incremental 'add' API. This is actually easier to manage
in the resolve module since both systemd and resolvconf want changes
wholesale and not incrementally.
Both these tests create many radios which sometimes causes timing
problems when hwsim is running. Since hwsim is not required for
these tests we can disable it and increase test reliability.
Waiting to request neighbor reports until we are in need of a roam
delays the roam time, and probably isn't as reliable since we are
most likely in a low RSSI state. Instead the neighbor report can
be requested immediately after connecting, saved, and used if/when
a roam is needed. The existing behavior is maintained if the early
neighbor report fails where a neighbor report is requested at the
time of the roam.
The code which parses the reports was factored out and shared
between the existing (late) neighbor report callback and the early
neighbor report callback.
Sometimes improperly written tests can end up causing future tests
to fail. For faster debugging you can now add a '+' after a given
autotest which will start that test and run all tests which come
alphabetically after it (as if you are running a full autotest suite).
Example:
./test-runner -A testWPA+
This will run testWPA, testWPA2, testWPA2-no-CCMP, testWPA2-SHA256,
and testWPA2withMFP.
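A minimal sketch of the '+' selection described above. The helper name select_from and the case-insensitive ordering are assumptions made for illustration and are not taken from test-runner itself:

```python
def select_from(all_tests, spec):
    """Return the tests to run for a spec such as 'testWPA' or 'testWPA+'."""
    if not spec.endswith('+'):
        return [spec]
    start = spec[:-1].lower()
    # Assumption: tests are ordered alphabetically, ignoring case
    ordered = sorted(all_tests, key=str.lower)
    # Keep the named test and everything that sorts after it
    return [t for t in ordered if t.lower() >= start]

tests = ['testAP', 'testWPA2', 'testWPA', 'testWPA2-SHA256',
         'testWPA2-no-CCMP', 'testWPA2withMFP', 'testSAE']
selected = select_from(tests, 'testWPA+')
```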
This can result in strange test results since there was no
less-than-zero check before subtracting the total tests from failed
tests. In case of an internal exception we can just set all values
to zero. This will be handled specially as we do for timeout
errors.
When network namespaces are introduced there may be multiple
IWD class instances. This makes IWD.get_instance ambiguous
when namespaces are involved. iwd.py has been refactored to
not use IWD.get_instance, but testutil still needs it since
it's purely based off interface names. Rather than remove it
and modify every test to pass the IWD object we can just
maintain the existing behavior for only the root namespace.
You can now specify a limited list of subtests to run out of a
full auto-test using --sub-tests,-S. This option is limited in
that it is only meant to be used with a single autotest (since
it doesn't make much sense otherwise).
The subtest can be specified both with or without the file
extension.
Example usage:
./test-runner -A testAP -S failure_test,dhcp_test.py
This will only run the two subtests and exclude any other *.py
tests present in the test directory.
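The matching of subtest names with or without the file extension could look roughly like this. filter_subtests is a hypothetical helper and the file list is made up for the example:

```python
import os

def filter_subtests(available, requested):
    """Keep only the *.py files named in a comma-separated list.

    Names may be given with or without the '.py' extension.
    """
    wanted = {os.path.splitext(name.strip())[0]
              for name in requested.split(',') if name.strip()}
    return [f for f in available
            if f.endswith('.py') and os.path.splitext(f)[0] in wanted]

files = ['connection_test.py', 'dhcp_test.py', 'failure_test.py', 'hw.conf']
selected = filter_subtests(files, 'failure_test,dhcp_test.py')
```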
handshake_state_set_authenticator_ie must be called to set group_cipher
in struct handshake_state before handshake_set_gtk_state, otherwise
handshake_set_gtk_state is unable to determine the key length to set
handshake state gtk.
Fixes: 4bc20a097965 ("ap: Start EAP-WSC authentication with WSC enrollees")
For now the RA client is run automatically when the DHCPv6 client starts.
RA takes care of installing / deleting prefix routes and installing the
default gateway. If Router Advertisements indicate support for DHCPv6, then
DHCPv6 transactions are kicked off and the address is set / removed
automatically.
Stateless configuration is not yet supported.
Modern kernels ~5.4+ have changed the way lost beacons are
reported and effectively make the lost beacon event useless
because it is immediately followed by a disconnect event. This
does not allow IWD enough time to do much of anything before
the disconnect comes in and we are forced to fully re-connect
to a different AP.
The agent path was generated based on the current time which
sometimes yielded duplicate paths if agents were created quickly
after one another. Instead a simple iterator removes any chance
of a duplicate path.
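A small sketch of the counter-based approach, assuming a hypothetical '/test/agent' prefix (the real path prefix is not stated here):

```python
import itertools

# A process-wide counter: every call hands out the next integer
_agent_ids = itertools.count()

def next_agent_path(prefix='/test/agent'):
    """Return an object path that is unique within this process."""
    return '%s/%d' % (prefix, next(_agent_ids))

paths = [next_agent_path() for _ in range(3)]
```

Unlike a timestamp, the counter cannot repeat no matter how quickly agents are created.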
If running multiple tests testNetconfig would fail due to the
hardcoded wln0 in the dhcpd.conf file. dhcpd can actually start
by passing in the interface to the run command rather than
inside the config file.
If EnableNetworkConfiguration was enabled ap.c required that
APRanges also be set. This prevents IWD from starting which
affects a perfectly valid station configuration. Instead if
APRanges is not provided IWD still allows ap_init to pass but
DHCP just will not be enabled.
Code was added with commit 04487f575b6 which passes a radio object
to the Interface class constructor and stores it in the Interface
object. The radio class also stores each Interface object which
creates a circular reference and causes the Radio to stick around
long after the test finishes.
I cannot see why the Interface needs to keep track of the Radio
object. None of the wpa_supplicant utilities use this so it has
been removed.
Users can now supply an AP provisioning file containing an [IPv4]
section and define various DHCP settings:
[IPv4]
Address=<address>
Netmask=<netmask>
Gateway=<gateway>
IPRange=<start_address>,<end_address>
DNSList=<dns1>,<dns2>,...<dnsN>
LeaseTime=<lease_time>
There are a few notes/requirements to keep in mind when using a
provisioning file:
- All settings are optional but [IPv4].Address is required if the
interface does not already have an address set.
- If no [IPv4].Address is defined in the provisioning file and the AP
interface does not already have an address set, StartWithConfig()
will fail with -EINVAL.
- If a provisioning file is provided it will take precedence, and the
AP will not pull from the IP pool.
- A provisioning file containing an IPv4 section assumes DHCP is being
enabled and will override [General].EnableNetworkConfiguration.
- Any address that AP sets on the interface will be deleted when the AP
is stopped.
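To illustrate the address requirement, here is a rough sketch that parses such a file with Python's configparser. IWD itself does not use this code. The example values and the check_profile helper are assumptions made purely for illustration:

```python
import configparser
import errno

# Made-up values; only the key names come from the description above
EXAMPLE = """
[IPv4]
Address=192.168.80.1
Netmask=255.255.255.0
Gateway=192.168.80.1
IPRange=192.168.80.10,192.168.80.100
DNSList=192.168.80.1,192.168.80.2
LeaseTime=3600
"""

def check_profile(text, iface_has_address):
    """Return 0 if an address is available, otherwise -EINVAL."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep key case, e.g. 'Address'
    cfg.read_string(text)
    has_addr = cfg.has_option('IPv4', 'Address')
    if not has_addr and not iface_has_address:
        return -errno.EINVAL
    return 0
```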
Users can now start an AP from settings based on a profile
on disk. The only argument is the SSID which will be used to
lookup the profile. If no profile is found a NotFound error
will be returned. Any invalid profiles will result in an
Invalid return.
This seems to happen occasionally with testAP (potentially others).
The invalid read appears to happen when the frame_xchg_tx_cb detects
an early status and no ACK. In this particular case there is no
retry interval so we reach the retry limit and 'done' the frame.
This frees the 'fx' data all before the destroy callback can get
called. Once we finally return and the destroy callback is called
'fx' has already been freed and we see the invalid write.
==206== Memcheck, a memory error detector
==206== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==206== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==206== Command: iwd -p rad1,rad2,rad3,rad4 -d
==206== Parent PID: 140
==206==
==206== Invalid write of size 4
==206== at 0x4493A0: frame_xchg_tx_destroy (frame-xchg.c:941)
==206== by 0x46DAF6: destroy_request (genl.c:673)
==206== by 0x46DAF6: process_unicast (genl.c:1002)
==206== by 0x46DAF6: received_data (genl.c:1101)
==206== by 0x46AA4B: io_callback (io.c:118)
==206== by 0x469D6C: l_main_iterate (main.c:477)
==206== by 0x469E1B: l_main_run (main.c:524)
==206== by 0x469E1B: l_main_run (main.c:506)
==206== by 0x46A02B: l_main_run_with_signal (main.c:646)
==206== by 0x403E78: main (main.c:490)
==206== Address 0x4c59c6c is 172 bytes inside a block of size 176 free'd
==206== at 0x483B9F5: free (vg_replace_malloc.c:538)
==206== by 0x40F14C: destroy_work (wiphy.c:248)
==206== by 0x40F14C: wiphy_radio_work_done (wiphy.c:1578)
==206== by 0x44A916: frame_xchg_tx_cb (frame-xchg.c:930)
==206== by 0x46DAD9: process_unicast (genl.c:993)
==206== by 0x46DAD9: received_data (genl.c:1101)
==206== by 0x46AA4B: io_callback (io.c:118)
==206== by 0x469D6C: l_main_iterate (main.c:477)
==206== by 0x469E1B: l_main_run (main.c:524)
==206== by 0x469E1B: l_main_run (main.c:506)
==206== by 0x46A02B: l_main_run_with_signal (main.c:646)
==206== by 0x403E78: main (main.c:490)
==206== Block was alloc'd at
==206== at 0x483A809: malloc (vg_replace_malloc.c:307)
==206== by 0x4643CD: l_malloc (util.c:61)
==206== by 0x44AF8C: frame_xchg_startv (frame-xchg.c:1155)
==206== by 0x44B2A4: frame_xchg_start (frame-xchg.c:1108)
==206== by 0x42BC55: ap_send_mgmt_frame (ap.c:709)
==206== by 0x42F513: ap_probe_req_cb (ap.c:1869)
==206== by 0x449752: frame_watch_unicast_notify (frame-xchg.c:233)
==206== by 0x46DA2F: dispatch_unicast_watches (genl.c:961)
==206== by 0x46DA2F: process_unicast (genl.c:980)
==206== by 0x46DA2F: received_data (genl.c:1101)
==206== by 0x46AA4B: io_callback (io.c:118)
==206== by 0x469D6C: l_main_iterate (main.c:477)
==206== by 0x469E1B: l_main_run (main.c:524)
==206== by 0x469E1B: l_main_run (main.c:506)
==206== by 0x46A02B: l_main_run_with_signal (main.c:646)
==206==
The existing AP tests needed to be modified to start IWD from
python since the DHCP test uses a different main.conf.
Also removed some stale hw.conf options that are no longer used.
The DHCP server can be enabled by enabling network configuration
with [General].EnableNetworkConfiguration. If an IP is not set
on the interface before the AP is started a valid IP range must
also be provided under [General].APRanges in IP prefix format e.g.
[General]
EnableNetworkConfiguration=true
APRanges=192.168.1.1/24
Each AP started will get assigned a new subnet within the range
specified by APRanges as to not conflict with other AP interfaces.
If there are no subnets left in the pool when an AP is started
it will fail with -EEXIST. Any APs that are stopped will release
their subnet back into the pool to be used with other APs.
The DHCP IP pool will be automatically chosen by the ELL DHCP
implementation (+1 the AP's IP to *.254). The remaining DHCP
settings will be defaults chosen by ELL (DNS, lease time, etc).
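The pool behavior can be sketched as follows. This is not the ap.c implementation: the SubnetPool class, the /24 subnet size and the choice of the first host address for the AP are all assumptions made for the example:

```python
import errno
import ipaddress

class SubnetPool:
    def __init__(self, ap_range, subnet_prefix=24):
        # strict=False accepts host bits, as in '192.168.1.1/24'
        net = ipaddress.ip_network(ap_range, strict=False)
        self.free = list(net.subnets(new_prefix=subnet_prefix))
        self.used = {}

    def start_ap(self, ifname):
        """Reserve a subnet for ifname; -EEXIST if the pool is empty."""
        if not self.free:
            return -errno.EEXIST
        self.used[ifname] = self.free.pop(0)
        return 0

    def stop_ap(self, ifname):
        """Release the subnet back into the pool."""
        self.free.append(self.used.pop(ifname))

    def dhcp_range(self, ifname):
        """Return (first, last) lease address: the AP's IP + 1 up to *.254."""
        hosts = list(self.used[ifname].hosts())
        ap_ip = hosts[0]  # assumption: the AP takes the first host address
        return ap_ip + 1, hosts[-1]

pool = SubnetPool('192.168.0.1/23')
results = [pool.start_ap(name) for name in ('wln0', 'wln1', 'wln2')]
```

With a /23 range only two /24 subnets exist, so the third start_ap() call returns -EEXIST until one of the first two APs is stopped.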
If the caller specifies the number of devices only return that many.
Some sub-tests may only need a subset of the total number of devices
for the test. If the number of devices expected is less than the total
being returned, python would throw an exception.
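A simplified sketch of why truncating the list matters. The list_devices function below is a stand-in with a hypothetical signature, not the iwd.py method:

```python
def list_devices(found, count=None):
    """Return at most `count` devices, or all of them if count is None."""
    devices = list(found)
    if count is None:
        return devices
    return devices[:count]

found = ['wln0', 'wln1', 'wln2']
# Unpacking into two names only works if exactly two items are returned
dev1, dev2 = list_devices(found, 2)
```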
If a test does not need any hostapd instances but still loads
hostapd.py for some reason we want to gracefully throw an
exception rather than fail in some other manner.
Add the new wpas.Wpas class roughly based on hostapd.HostapdCLI but only
adding methods for the P2P-related stuff.
Adding "wpa_supplicant" to -v will enable output from the wpa_supplicant
process to be printed and "wpa_supplicant-dbg" will make it more verbose
("wpa_supplicant" is not needed because it seems to be automatically
enabled because of the glob matching in ctx.is_verbose)
Add support for a WPA_SUPPLICANT section in hw.conf where
'radN=<config_path>' lines will only reserve radios and create
interfaces for the autotest to be able to start wpa_supplicant on them,
i.e. this prevents iwd or hostapd from being started on them but doesn't
start a wpa_supplicant instance by itself.
The host system's configuration directories for IWD/EAD were
being mounted in the virtual machine. This required that the
host create these directories beforehand. Instead we can
just set up the system and IWD/EAD to use directories in /tmp
that we create when we start the VM. This avoids the need for
any host configuration.
When the SignalLevelAgent doc blurb was moved to station-api.txt it
seems the interface was changed to Station.SignalLevelAgent in some
places but not in most and not in the code. Also fix the pointers to
the doc file.
periodic_scan_stop is called whenever we exit the autoscan state but a
periodic scan may not be running at the time. If we have a
user-triggered scan running, or the autoconnect_quick scan, and we reset
Scanning to false before that scan finished, a client could end up
calling GetOrderedNetworks too early and not receiving the scan results.
ConnectHiddenNetwork can be seen as triggering this sequence:
1. the active scan,
2. the optional agent request,
3. the Authentication/Association/4-Way Handshake/netconfig,
4. connected state
Currently Disconnect() interrupts 3 and 4; allow it to also interrupt
state 1. It's difficult to tell whether we're in state 2 from within
station.c.
Allow the "hwsim_medium=no" setting in hw.conf's SETUP section to
disable starting hwsim. It looks like the packets going through
userspace add enough latency that active scans don't work, probe
responses don't arrive within the "dwell time" or probe requests are not
ACKed on time. I've tried modifying tools/hwsim.c to respond with the
HWSIM_CMD_TX_INFO_FRAME cmd as the first thing after receiving a
HWSIM_CMD_FRAME and even skipping the queue in ell/genl.c by writing the
command synchronously, but neither helped enough to make the scans work.
This does not rule out that hwsim or the way our scans are done can be
fixed and that would obviously be better than what I did in this patch.
Since our DBus API and our use cases only support initiating connections
and not accepting incoming connections we don't really need to reply to
Probe Requests on the P2P-Device interface. Start doing it firstly so
that we can test the scenario where we get discovered and pre-authorized
to connect in an autotest (wpa_supplicant doesn't seem to have a way to
authorize everyone, which is probably why most Wi-Fi Display dongles
don't do it and instead reply with "Fail: Information not available" and
then restart connection from their side) and secondly because the spec
wants us to do it.
Make sure dev->peer_list is non-NULL before using l_queue_push_tail()
same as we do when the peer info comes from a Probe Response (active
scan in Find Phase). Otherwise peers discovered through Probe Requests
before any Probe Responses are received will be lost.
The device type category array is indexed by the category ID so if we're
skipping i == 0 in the iteration, we should also skip the 0'th element
in device_type_categories.
The callback for the FRAME command was causing a crash in
wiphy_radio_work_done when not cancelled when the wiphy was being
removed from the system. This was likely to happen if this radio work
item was waiting for another item to finish. When the first one was
being cancelled due to the wiphy being removed, this one would be
started and immediately stopped by the radio work queue.
Now this crash could be fixed by dropping all frame exchange instances
on an interface that is being removed which is easy to do, but properly
cancelling the commands saves us the headache of analysing whether
there's a race condition in other situations where a frame exchange is
being aborted.
We want to use this flag only on the interfaces with one of the three
P2P iftypes so set the flag automatically depending on the iftype from
the last 'config' notification.
This extends test-runner to also use iwmon if --log is enabled.
For this case the iwmon log will be found inside each test
log directory.
A new option, --monitor <file> was added in case full logging isn't
desired (potentially for timing issues) but an iwmon log is needed.
Be aware that when --monitor is used test-runner will mount the
entire parent directory. test-runner itself will only write to the
file specified, but just know that the parent directory is available
as read-write inside the VM.
--log takes precedence over --monitor, meaning the iwmon log will
be written to <logdir>/<test>/iwmon instead of the file specified
with --monitor if both options are provided.
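The precedence rule amounts to something like the following sketch. iwmon_log_path is a hypothetical helper and the paths are examples only:

```python
import posixpath

def iwmon_log_path(test, log_dir=None, monitor=None):
    """Pick the iwmon output file; --log wins over --monitor."""
    if log_dir:
        return posixpath.join(log_dir, test, 'iwmon')
    # May be None, meaning no iwmon log was requested
    return monitor

both = iwmon_log_path('testAP', log_dir='/tmp/logs', monitor='/tmp/mon.log')
only_monitor = iwmon_log_path('testAP', monitor='/tmp/mon.log')
```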
Convert ap_send_mgmt_frame() to use frame_xchg_start for sending frames,
this fixes among other things the ACK-received checks.
One side effect is that we're no longer sending Probe Responses with the
don't-wait-for-ack flag because frame-xchg doesn't support it, but other
AP implementations don't use that flag either.
Another side-effect is that we do use the no-cck-rate flag
unconditionally, something we may want to fix but would need to add
another parameter to frame-xchg.
The virtual environment changed slightly adding two network adapters
which are connected to the same backend so they can communicate with
each other (basically connected to a switch). The hostapd command
line was modified to allow no interfaces to be passed in which lets
us create zero radios but still specify a radius_config file.
This module is essentially a heavily stripped down version of iwd.py
to work with EAD. Class names were changed to match EAD but basically
the EAD, Adapter, and AdapterList classes map 1:1 to IWD, Device, and
DeviceList.
This is somewhat of a hack, but the IWDDBusAbstract is a very
convenient abstraction to DBus objects. The only piece that restricts
it to IWD is the hardcoded IWD_SERVICE. Instead we can pass in a
keyword argument which defaults to IWD_SERVICE. That way other modules
(like EAD) can utilize this abstraction with their own service simply
by changing that service argument.
The interface was hard coded to wln0 which works when running single
tests but not when running multiple. Instead use the actual ifname
that hostapd is using.
Add a "psk" setting to allow the user to pass the binary PSK directly
instead of generating it from the passphrase and the SSID. In that case
we'll only send the PSK to WSC enrollees.
There has been a desire to remove the ELL plugin dependency from
IWD which is the only consumer of the plugin API. This removes
the dependency and prepares the tree for converting the existing
ofono plugin into a regular module.
sim_hardcoded was removed completely. This was originally implemented
before full ofono support purely to test the IWD side of EAP-SIM/AKA.
Since the ofono plugin (module-to-be) is now fully implemented there
really isn't a need for sim_hardcoded.
Tests that DHCP using IWD's internal netconfig functions properly.
The actual IP address assignment is not verified, but since IWD does
not signal the connection as successful unless DHCP succeeds we
can assume it was successful by checking that the device is connected.
The process of actually starting dhcpd and configuring the interfaces
is quite simple so it was left in the autotest itself. If (or when)
more tests require IP capabilities (p2p, FILS, etc) this could be
moved into test-runner itself and be made common. The reason I did not
put it in there now is a) because this is the only test and b) more
complex DHCP cases are likely to develop and may require more than this
simplistic setup (like multiple APs/interfaces)
This is just a more concise/pythonic way of doing function arguments.
Since Process/start_process have basically the same argument names
we can simplify and use **kwargs which will pass the named arguments
directly to Process(). This also allows us to add arguments to Process
without touching start_process if we need.
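A stripped-down sketch of the forwarding pattern. The parameter names (wait, env, check) are placeholders and may not match the real Process class:

```python
class Process:
    def __init__(self, args, wait=False, env=None, check=False):
        self.args = args
        self.wait = wait
        self.env = env
        self.check = check

class TestContext:
    def __init__(self):
        self.processes = []

    def start_process(self, args, **kwargs):
        # Named arguments are forwarded untouched, so new Process
        # options need no change here
        p = Process(args, **kwargs)
        self.processes.append(p)
        return p

ctx = TestContext()
proc = ctx.start_process(['iwd', '-d'], wait=True)
```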
The AdHoc functionality in iwd.py was not consistent at all with
how all the other classes worked (my bad). Instead we can create
a very simple AdHocDevice class which inherits all the DBus magic
in the IWDDBusAbstract class.
The Started property was being set in the Join IBSS callback which
isn't really when the IBSS has been started. The kernel automatically
scans for IBSS networks which takes some time. It's better to wait
on setting Started until we get the Join IBSS event.
Commit 1f910f84b47d ("eapol: Use eapol_start in authenticator mode too")
introduced the requirement that authentication eapol_sm objects also had
to be started via eapol_start. Adhoc was never updated to do that.
Many tests waited on the network object 'connected' property after
issuing a Connect command. This is not correct as 'connected' is
set quite early in the connection process. The correct way of doing
this is waiting for the device state to change to connected.
This common code was replaced, hopefully putting to rest any random
failures that happen occasionally.
Some cleanup code got removed by mistake which cleared out any
hwsim rules before the next subtest. Without this the second test
would end up getting erroneous signal strength numbers in the scan
results causing a failure.
This got added in the re-write but a __del__ method was also
added to the Rule class as well. This caused problems if hwsim
cleaned up since it removed the rules, which caused each rule
to call __del__. Since the rule had already been removed there
was no longer a DBus interface which raised an exception.
For multi-bss networks it's nice to know which BSS is being connected
to. The ranking can hint at it, but blacklisting or network capabilities
could affect which network is actually chosen. An explicit debug print
makes debugging much easier.
Again the hs->support_ip_allocation flag is used for two purposes here,
first the user signals whether to support this mechanism through this
flag, then it reads the flag to find out if an IP was allocated.
Support IP allocation during the 4-Way Handshake as defined in the P2P
spec. This is the supplicant side implementation.
The API requires the user to set hs->support_ip_allocation true before
eapol_start(). On HANDSHAKE_EVENT_COMPLETE, if this same flag is still
set, we've received the IP lease, the netmask and the authenticator's
IP from the authenticator and there's no need to start DHCP. If the
flag is cleared, the user needs to use DHCP.
Allow the possibility of becoming the Group-owner when we parse the GO
Negotiation Request, build GO Negotiation Response and parse the GO
Negotiation Confirmation, i.e. if we're responding to a negotiation
initiated by the peer after it needed to request user action.
Until now the code assumed we can't become the GO or we'd report error.
Allow the possibility of becoming the Group-owner when we build the GO
Negotiation Request, parse GO Negotiation Response and build the GO
Negotiation Confirmation, i.e. if we're the initiator of the
negotiation.
Until now the code assumed we can't become the GO or we'd report error.
Add a utility to select random characters from the set defined in P2P
v1.7 Section 3.2.1. In this version the assumption is that we're only
actually using this for the two SSID characters.
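As an illustration only, the selection could be sketched like this. It assumes the character set is upper case letters, lower case letters and digits, which should be checked against the spec. p2p_random_chars is a hypothetical name:

```python
import secrets
import string

# Assumed character set: A-Z, a-z, 0-9
P2P_CHARSET = string.ascii_uppercase + string.ascii_lowercase + string.digits

def p2p_random_chars(n=2):
    """Pick n characters uniformly at random from the assumed set."""
    return ''.join(secrets.choice(P2P_CHARSET) for _ in range(n))

ssid = 'DIRECT-' + p2p_random_chars()
```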
explicit_bzero is used in src/ap.c since commit
d55e00b31d7bccdbb2ea1cdeb0a749df77a51e47 but src/missing.h is not
included, as a result build with uclibc fails on:
/srv/storage/autobuild/run/instance-1/output-1/host/lib/gcc/xtensa-buildroot-linux-uclibc/9.3.0/../../../../xtensa-buildroot-linux-uclibc/bin/ld: src/ap.o: in function `ap_probe_req_cb':
ap.c:(.text+0x23d8): undefined reference to `explicit_bzero'
Fixes:
- http://autobuild.buildroot.org/results/c7a0096a269bfc52bd8e23d453d36d5bfb61441d
Before the re-write there were interesting escapes being used for
set_neighbor. Curiously now hostapd fails to set the neighbor due
to these escapes so they have been removed.
Switched around hwsim rules with the IWD initializer to avoid
IWD periodically scanning before hwsim rules are in place. Removed
some unneeded code during teardown.
Changed to wait for DeviceState instead of network object as well
as moved hwsim rules ahead of the IWD initializer to avoid IWD
scanning before the rules are fully in place.
This test occasionally failed, and it uses the old style of waiting
for connected on the network object instead of the device object.
The hwsim rule was also moved ahead of the IWD() initializer which
ensures that IWD doesn't scan before the rule can be set/processed.
This test occasionally fails due to no hwsim rules. Basically we
were just expecting iwd to connect to one of 3 access points but
the ranking was equal, so it chose the first in the scan list.
Now a signal strength is assigned to each AP to steer IWD into
connecting to the expected AP.
As with other tests, wait on device state instead of the network
object. The connectivity test was also changed to not check for
group traffic since AP does not negotiate the IGTK at this time.
There were a number of fixes here. The waits were changed to wait
on the device state instead of the network state and hwsim rules
were removed after the test as to not interfere with future tests.
One of the rules was setting the signal to -10000 which was causing
the ranking to be zero.
Updated testFT-SAE-roam to use the TestContext APIs as well as
fixed the failure which was introduced after requiring stricter
AKM logic for SAE networks. The new failure was due to the hostapd
config not including the standard SAE AKM which is actually
required by the spec.
Slower systems may not be able to make some timeouts that tests
mandated. All timeouts were increased significantly to allow tests
to pass on slow systems.
It is not safe to assume that the python dbus implementation will
wait for a method to return. The documentation says this with
respect to reply_handler/error_handler:
"If both are None, the implementation may request that no reply is sent"
To stay on the safe side we should always include the error/reply
handlers and wait for the operation to complete.
Removed test-runner.c, and renamed py_runner to test-runner. Removed
tools/test-runner from .gitignore.
This was done as a separate commit to avoid a nasty diff between the
existing test runner, and the new python version
test-runner now supports interface name replacement inside hostapd
config files. Since a given test configuration doesn't know what
interface names there will be $ifaceN can be specified instead e.g.
rsn_preauth_interfaces=$iface0 $iface1
The $ifaceN values will be replaced with actual interface names when
the test is started.
This patch also removes ctrl_interface inside the hostapd config
files as this is no longer required.
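The $ifaceN substitution could be done along these lines. This is a sketch rather than the test-runner code, and replace_ifaces and the interface names are assumptions:

```python
import re

def replace_ifaces(config_text, ifnames):
    """Replace each $ifaceN with the N-th name in ifnames."""
    def lookup(match):
        # Raises IndexError if the config asks for a missing interface
        return ifnames[int(match.group(1))]
    return re.sub(r'\$iface(\d+)', lookup, config_text)

line = 'rsn_preauth_interfaces=$iface0 $iface1'
result = replace_ifaces(line, ['wln0', 'wln1'])
```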
This test was unreliable since it was assuming a periodic scan would
happen at just the right time. Instead since we are expecting autoconnect
we can just wait for DeviceState.connected then after we are connected
verify the network was correct.
This test was never 100% reliable, and after the test-runner re-write
it became extremely unreliable. The issue came down to the very common
block of code that's present in many tests where we wait for obj.scanning
then not obj.scanning. This is fine when a dbus scan() is explicitly
done before, otherwise it could lead to problems. Without a dbus scan
explicitly called we are assuming a periodic scan will happen. If it
has already happened, the initial wait for obj.scanning will never return and
time out.
This probably needs to be changed in several tests, but for this specific
case we can remove the waits completely. Since
check_autoconnect_hidden_network has a 30 second wait on
DeviceState.connected this will ultimately time out if anything goes
wrong. There isn't any great reason to wait for scanning (for this test
specifically).
A minor style change was also made when initializing IWD. The values
passed in this test are now the default, so no arguments need to be
passed.
iwd.py was updated to use the TestContext APIs to start/stop
IWD. This makes the process management consistent between starting
IWD from test-runner or from the IWD() constructor.
The psk agent is now tracked, and destroyed upon __del__. This is
to fix issues where a test throws an exception and never
unregisters the agent, causing future tests to fail.
The configuration directory was also changed to /tmp by
default. This was done since all tests which used this used /tmp
anyways.
The GLib mainloop was removed, and instead put into test-runner
itself. Now any mainloop operations can use ctx.mainloop instead
Before hostapd was initialized using the wiphy_map which has now
gone away. Instead we have a global config module which contains
a single 'ctx'. This is the central store for all test information.
This patch converts hostapd.py to lookup instances by already
initialized Hostapd object. The interface parameter was removed
since all tests have been converted to use config= instead.
In addition HostapdCLI was changed to allow no parameters if there
is only a single hostapd instance.
This patch completely re-writes test-runner in Python. This was done
because the existing C test-runner had some clunky workarounds and
maintaining or adding new features was starting to become a huge pain.
There were a few aspects of test-runner which continually had to
be dealt with when adding any new functionality:
* Argument parsing: Adding new arguments to test-runner wasn't so
bad, but if you wanted those arguments passed into the VM it
became a huge pain. Arguments needed to be parsed, then re-formatted
into the qemu command line, then re-parsed in a special order
(backwards) once in the VM. The burden for adding new arguments was
quite high so it was avoided (at least by me) at all costs.
* The separation between C and Python: The tests are all written in
python, but the executables, radios, and interfaces were all created
from C. The way we solved this was by encoding the required info as
environment variables, then parsing those from Python. It worked,
but it was, again, a huge pain.
* Process management: It started with all processes being launched
from C, but eventually tests required the ability to start IWD, or
kill hostapd ungracefully in order to test certain functionality.
Since the processes were tracked in C, Python had no way of
signalling that it killed a process and when it started one C had
no idea. This was mitigated (basically by killall), but it was
nowhere close to an elegant solution.
Re-writing test-runner in python solves all these problems and will
be much easier to maintain.
* Argument parsing: Now all arguments are forwarded automatically
to the VM. The ArgParse library takes care of parsing and each
argument is stored in a dictionary.
* Separation between C and Python: No more C, so no more separation.
* Process management: Python will now manage all processes. This
allows a test to kill, restart, or start a new process and not
have to remember the PID or to kill it after the test.
There are a few more important aspects of the python implementation
that should now be considered when writing new tests:
* The IWD constructor now has different default arguments. IWD
will always be started unless specified and the configuration
directory will always be /tmp
* Any non *.py file in the test directory will be copied to /tmp.
This avoids the need for 'tmpfs_extra_stuff' completely.
* ctrl_interface will automatically be appended to every hostapd
config. There is no need to include this in a config file from
now on.
* Test cleanup is extremely important. All tests get run in the
same interpreter now and the tests themselves are actually loaded
as python modules. This means e.g. if you somehow kept a reference
to IWD() any subsequent tests would not start since IWD is still
running.
* For debugging, the test context can be printed which shows running
processes, radios, and interfaces.
Three non-native python modules were used: PrettyTable, termcolor, and
pyroute2
$ pip3 install prettytable
$ pip3 install termcolor
$ pip3 install pyroute2
The tests basically remained the same with a few minor changes.
The wiphy_map and in turn hostapd_map are no longer used. This
was already partially converted a long time ago when the 'config'
parameter was added to HostapdCLI. This patch fully converts all
autotests to use 'config' rather than looking up by interface.
Some test scripts were named 'test.py' which was fine before but
the new rewrite actually loads each python test as a module. The
name 'test' is too ambiguous and causes issues due to a native
python module with the same name. All of these files were
renamed to 'connection_test.py'.
Add the special case "DIRECT-" SSID, called the P2P Wildcard SSID, in
ap_probe_req_cb so as not to reject those Probe Requests on the basis of
ssid mismatch. I'd have preferred to keep all the P2P-specific bits in
p2p.c but in this case there's little point in adding a generic
config setting for SSID-matching quirks.
Prefix all the struct p2p_device members that are part of the connection
state with the "conn_" string for consistency. If we needed to support
multiple client connections, these members are the ones that would
probably land in a separate structure, without that prefix.
For WSC we should have been sending our probe requests from the same
address we're going to be doing EAP-WSC with the GO. Somehow I was able
to connect to most devices without that but other implementations seem
to use the Interface Address (the P2P-Client's MAC), not the Device
Address (P2P-Device's MAC). We could switch the order to first create
the new interface and scan from it, but it is simpler to use the
scan_context we already have created on the device interface and set a
different MAC.
Check the conditions for PBC enrollee registration when we receive the
Association Request with WSC IE and indicate to the enrollee whether we
accept the association using a WSC IE in the Association Response.
After this, a NULL sta->assoc_rsne indicates that the station is not
establishing the RSNA and is a WSC enrollee.
Implement the caching of WSC probe requests -- when an Enrollee later
associates to start registration we need to have its Probe Request on
file. Also use this cache for PBC "Session Overlap" detection.
This adds the API for putting the AP in Push Button mode, which we'll
need on the P2P GO side but may be useful on its own too. A WSC IE is added
to our beacons and probe responses indicating whether the PBC mode is
active.
On a new association or re-association, in addition to forgetting a
complete RSN Association, also stop the EAPoL SM to stop any ongoing
handshake.
Do this in a new function ap_stop_handshake that is now used in a few
places that had copies of the same few lines. I'll be adding some more
lines to this function for WSC support.
Reuse this flag on the authenticator side with a slightly different
meaning: when it's true we're forced to wait for the EAPoL-Start before
sending the first EAPoL-EAP frame to the supplicant, such as is required
in a WSC enrollee registration when the Association Request didn't have
a v2.0 WSC IE.
Add the wfa_build_authorized_macs function (wfa_ prefix following the
wfa_extract_ naming) and use it in wsc_build_probe_response. The logic
is changed slightly to treat the first 6-zeros address in the array as
the end of the array.
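As a rough illustration of the "first all-zeros address ends the array"
rule, here is a minimal Python sketch. The function name and the list of
fixed-size slots are assumptions made for this example only; this is not
the wfa_build_authorized_macs code.

```python
ZERO_MAC = bytes(6)


def collect_authorized_macs(slots):
    """Concatenate MAC addresses up to, not including, the first all-zeros slot."""
    payload = bytearray()
    for mac in slots:
        if len(mac) != 6:
            raise ValueError("each slot must hold a 6-byte MAC address")
        if mac == ZERO_MAC:
            break  # terminator: ignore this slot and everything after it
        payload += mac
    return bytes(payload)
```

Any address stored after the first all-zeros slot is ignored, so callers
of such a helper would have to keep the array packed from the front.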
Setting 'match' false wouldn't do anything because it was already false.
If the frame is addressed to some other non-broadcast address ignore it
directly and exit ap_probe_req_cb.
To limit the number of ap_start parameters, group basic AP config
parameters in the ap_config struct that is passed as a pointer and owned
by the ap_state.
The intent was to read the UUID-E from the settings rather than generate
it from the enrollee's MAC, because it needs to match the UUID-E from the
enrollee's Probe Requests. Fix this. The UUID-E supplied in the unit
test was being ignored but the test still passed because the supplied
UUID-E was generated the same way we generated it in eap-wsc.c.
When we're sending our probe response to the same peer that we're
currently connected or connecting to, use current WSC Configuration
Methods, UUID-E and WFD IE selected for this connection attempt, not the
ones we'd use when discovering peers or being discovered by peers.
In the case of the WFD IE, the "Available for WFD Session" flag is going
to differ between the two cases -- we may be unavailable for other peers
but we're still available for the peer we're trying to start the WFD
session with.
When we send our GO Negotiation Response, send the Configuration Method
selected for the current connection rather than the accepted methods mask
that we hold in dev->device_info.
When building the scan IEs for our provisioning scans, use the UUID-E
based on the Interface Address, not the Device Address, as that is what
wsc.c will be using in the registration protocol.
Eventually we may have to base the UUID-E on the Device Address or
something else that is persistent, and pass the actual UUID-E to wsc.c,
as the Interface Address is randomly generated on every connect attempt.
IIRC the UUID-E is supposed to be persistent.
wsc_attr_builder_start_attr and wsc_attr_builder_free look at
builder->curlen to see whether the TLV's length needs to be updated to
include the previous attribute. If builder->curlen is 0
wsc_attr_builder_start_attr assumes there's no previous attribute and
starts writing at current builder->offset. If the previous attribute
length was 0 curlen would stay at 0 and that attribute would get
overwritten with the new one. To solve this add the 4 bytes of the T
and L to curlen as soon as a new attribute is started, and subtract
them when writing the L value. The alternative would be to set a flag
to say whether an attribute was started.
The spec explicitly allows 0-length attributes in section 12:
"The variable length string attributes, e.g., Device Name, are encoded
without null-termination, i.e., no 0x00 octets added to the end of the
value. If the string is empty, the attribute length is set to zero."
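The curlen bookkeeping described above can be sketched in Python as
follows. This is a toy model under stated assumptions: the TlvBuilder
class, its method names and the 2-byte type / 2-byte length big-endian
layout are chosen for illustration and do not mirror the
wsc_attr_builder code.

```python
import struct


class TlvBuilder:
    """Toy TLV builder where curlen == 0 means "no attribute started"."""

    HEADER_LEN = 4  # 2-byte type + 2-byte length

    def __init__(self, capacity=64):
        self.buf = bytearray(capacity)
        self.offset = 0  # where the attribute being built starts
        self.curlen = 0  # header + value bytes of the attribute being built

    def _close_current(self):
        if self.curlen == 0:
            return
        # Subtract the header again when writing the L value.
        value_len = self.curlen - self.HEADER_LEN
        struct.pack_into(">H", self.buf, self.offset + 2, value_len)
        self.offset += self.curlen
        self.curlen = 0

    def start_attr(self, attr_type):
        self._close_current()
        struct.pack_into(">HH", self.buf, self.offset, attr_type, 0)
        # Count the T and L right away so an empty attribute is not lost.
        self.curlen = self.HEADER_LEN

    def put_bytes(self, data):
        start = self.offset + self.curlen
        self.buf[start:start + len(data)] = data
        self.curlen += len(data)

    def finish(self):
        self._close_current()
        return bytes(self.buf[:self.offset])
```

With this scheme, starting a second attribute right after an empty one
advances offset by 4 instead of overwriting the empty attribute's header.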
Add ability to populate search domains for resolvconf based systems.
Search domains are written using the 'search' directive and registered
with resolvconf under the <ifname>.domain key.
Introduce a new resolvconf_invoke function that takes care of all the
details of invoking resolvconf and simplify the code a bit.
Introduce have_dns that tracks whether DNS servers were actually
provided. If no DNS info was provided, do not invoke resolvconf to
remove it.
Instead of interface index, resolvconf is now invoked with the printable
name of the interface and the dns entries are placed in the "dns"
protocol. This makes it a bit simpler to add additional info to
resolvconf instead of trying to generate a monolithic entry.
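A minimal Python sketch of how per-interface, per-protocol resolvconf
entries could be assembled. It only builds the command line and the
stdin text, it does not run anything. The function name is hypothetical,
and the exact text iwd writes is an assumption; "resolvconf -a <key>"
reading resolv.conf-style lines from stdin is the usual way to add an
entry.

```python
def resolvconf_add_request(ifname, kind, values):
    """Return (argv, stdin_text) for adding one entry to resolvconf.

    kind is "dns" for name servers or "domain" for search domains.
    """
    if kind == "dns":
        lines = [f"nameserver {server}" for server in values]
    elif kind == "domain":
        lines = ["search " + " ".join(values)]
    else:
        raise ValueError(f"unknown entry kind: {kind}")

    # The key is the printable interface name plus the protocol suffix.
    argv = ["resolvconf", "-a", f"{ifname}.{kind}"]
    return argv, "".join(line + "\n" for line in lines)
```

Keeping DNS servers and search domains under separate keys means each
can be added or removed without regenerating the other.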
Resolve module does not currently track any state that has been set on
a per ifindex basis. This was okay while the set of information we
supported was quite small. However, with dhcpv6 support being prepared,
a more flexible framework is needed.
Change the resolve API to allocate and return an instance for a given
ifindex that has the ability to track information that was provided.
Found using lsan:
==29896==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 9 byte(s) in 1 object(s) allocated from:
#0 0x7fcd41e0c710 in __interceptor_malloc /var/tmp/portage/sys-devel/gcc-8.2.0-r6/work/gcc-8.2.0/libsanitizer/asan/asan_malloc_linux.cc:86
#1 0x606abd in l_malloc ell/util.c:62
#2 0x460230 in ie_tlv_vendor_ie_concat src/ie.c:140
#3 0x4605d1 in ie_tlv_extract_wfd_payload src/ie.c:216
#4 0x4a8773 in scan_parse_bss_information_elements src/scan.c:1105
#5 0x4a94a8 in scan_parse_attr_bss src/scan.c:1181
#6 0x4a99f8 in scan_parse_result src/scan.c:1238
#7 0x4abe4e in get_scan_callback src/scan.c:1451
#8 0x6442d9 in process_unicast ell/genl.c:979
#9 0x6453ff in received_data ell/genl.c:1087
#10 0x62e1a4 in io_callback ell/io.c:126
#11 0x628fca in l_main_iterate ell/main.c:473
#12 0x6294e8 in l_main_run ell/main.c:520
#13 0x629d8b in l_main_run_with_signal ell/main.c:642
#14 0x40681b in main src/main.c:505
#15 0x7fcd40a55bdd in __libc_start_main (/lib64/libc.so.6+0x21bdd)
When the client is interrupted in the middle of user input entry and the
input is masked, the terminal might be left in a weird state. Make sure
to reset the prompt if the agent is being cleaned up in the middle of an
operation.
This commit has all the changes to extend and generalise the current
eap-wsc.c code to handle both the Enrollee and Registrar side of the
protocol, reusing existing functions and structures.
Alongside the current EAP-WSC enrollee side support, add the initial
part of registrar side. In the same file, register a new method with
the name string of "WSC-R". In this patch only the load_settings
method is added. validate_identity and handle_response are added in
later patches.
Handle EAPoL-EAP frames using our eap.c methods in authenticator mode
same as we do on the supplicant side. The user (ap.c) will only need to
set a valid 8021x_settings in the handshake object, same as on the
supplicant side.
The goal is to add specifically EAP-WSC registrar side and it looks like
extending our EAP and EAPoL code to support both supplicant and
authenticator-side methods is simpler than adding just EAP-WSC as a
special case.
Since EAP-WSC always ends in an EAP failure, I haven't actually tested
the success path.
On the supplicant side eapol_register would only register the eapol_sm
on a given netdev to start receiving frames and an eapol_start call is
required for the state machine to start executing. On the authenticator
side we shouldn't have the "early frame" problem but there's no reason
for the semantics of the two methods to be different. Somehow we were
doing everything in eapol_register and not using eapol_start if
hs->authenticator was true, so bring this in line with the supplicant
side and require eapol_start to be called also from ap.c.
Move the update of station->networks_sorted order to before we set
station->connected_network NULL to avoid a crash when we attempt to
use the NULL pointer.
Besides being undefined behaviour, signed integer overflow can cause
unexpected comparison results. In the case of network_rank_compare(),
a connected network with rank INT_MAX would cause newly inserted
networks with negative rank to be inserted earlier in the ordered
network list. This is reflected in the GetOrderedNetworks() DBus method
as can be seen in the following iwctl output:
[iwd]# station wlan0 get-networks
  Network name                    Security  Signal
----------------------------------------------------
  BEOLAN                          8021x     ****  }
  BeoBlue                         psk       ***   } all unknown,
  UI_Test_Network                 psk       ***   } hence assigned
  deneb_2G                        psk       ***   } negative rank
  BEOGUEST                        open      ****  }
> titan                           psk       ****
  Linksys05274_5GHz_dmt           psk       ****
  Lyngby-4G-4 5GHz                psk       ****
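To illustrate how a wrapped subtraction flips the ordering, here is a
small Python sketch. It assumes a subtraction-based comparator and
emulates 32-bit wraparound with ctypes; in C the overflow is undefined
behaviour, so wraparound is only the commonly observed outcome. Both
function names are hypothetical and this is not the
network_rank_compare() code.

```python
import ctypes

INT_MAX = 2**31 - 1


def compare_by_subtraction(new_rank, existing_rank):
    """Emulate `return new_rank - existing_rank;` with 32-bit wraparound."""
    return ctypes.c_int32(new_rank - existing_rank).value


def compare_safely(new_rank, existing_rank):
    """Return -1, 0 or 1 using comparisons only, so nothing can overflow."""
    return (new_rank > existing_rank) - (new_rank < existing_rank)
```

For a new network of rank -2 against a connected network of rank
INT_MAX, the subtraction wraps to a positive value and the new network
sorts ahead, whereas the comparison-based version reports it as lower.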
Doing so ensures that the currently connected network is always at the
beginning of the list. Previously, the list would only get updated after
a scan.
This fixes the documented behaviour of GetOrderedNetworks() DBus method,
which states that the currently connected network is always at the
beginning of the returned array.
Fix a logic error which prevented iwd from using SAE/WPA3 when
attempting to connect to APs that are in transition mode. The SAE/WPA3
check incorrectly required mfpr bit to be set, which is true for
APs in WPA3-Personal only mode, but is set to 0 for APs in
WPA3-Personal transition mode.
This patch also adds a bit more diagnostic output to help diagnose
causes for connections where WPA3 is not attempted even when advertised
by the AP.
Replace the usage of eap_send_response() in the method implementations
with a new eap_method_respond that skips the redundant "type" parameter.
The new eap_send_packet is used inside eap_method_respond and will be
reused for sending request packets in authenticator side EAP methods.
Throughout the supplicant mode we'd use the eapol_sm_write wrapper but
in the authenticator mode we'd call __eapol_tx_packet directly. Adapt
eapol_sm_write to use the right destination address and use it
consistently.
sm->handshake already contains our RSN/WPA IE so there's no need to
rebuild it for msg 3/4, especially since we hardcode the fact that we
only support one pairwise cipher. If we start declaring more supported
ciphers and need to include a second RSNE we can first parse
sm->hs->authenticator_ie into a struct ie_rsn_info, overwrite the cipher
and rebuild it from that struct.
This way we duplicate less code and we hardcode fewer facts about the AP
in eapol.c which also helps in adding EAP-WSC.
In both FT and FILS, EAPoL isn't used for the initial handshake, only
for the later re-keys. For FT we added the
eapol_sm_set_require_handshake mechanism to tell EAPoL to not require
the initial handshake and we can re-use it for FILS.
Currently an adversary can retransmit EAPOL Msg4/4 to make the AP
reinstall the PTK. Against older Linux kernels this can subsequently
be used to decrypt, replay, and possibly forge frames. See the
KRACK attacks research at krackattacks.com for attack scenarios.
In this case no machine-in-the-middle position is needed to trigger
the key reinstallation.
Fix this by using the ptk_complete boolean to track when the 4-way
handshake has completed (similar to its usage for clients). When
receiving a retransmitted Msg4/4 accept this frame but do not reinstall
the PTK.
Credits to Chris M. Stone, Sam Thomas, and Tom Chothia of Birmingham
University for helping discover this issue.
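The guard can be sketched in a few lines of Python. This is an
illustrative model only: the class, its method name and the install
callback are assumptions, and re-keying (which would need to clear the
flag again) is left out.

```python
class AuthenticatorHandshake:
    """Toy model of the authenticator side of the 4-way handshake."""

    def __init__(self, install_ptk):
        self._install_ptk = install_ptk
        self.ptk_complete = False

    def handle_msg4(self):
        if self.ptk_complete:
            # Retransmitted Msg4/4: accept the frame, keep the current key.
            return
        self._install_ptk()
        self.ptk_complete = True
```

However many times Msg4/4 is replayed, the install callback runs once
per completed handshake.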
Instead of creating the results->bss_list l_queue lazily, always create
one before sending the GET_SCAN command. This is to make sure that an
empty list is passed to the scan callback (e.g. in station.c) instead of
a NULL. Passing NULL has been causing difficult to debug crashes in
station.c, in fact I think I've been seeing them for over a year now
but can't be sure. station_set_scan_results has been taking ownership
of the new BSS list and, if station->connected_bss was not on the list,
it would try to add it not realizing that l_queue_push_tail() was doing
nothing. Always passing a valid list may help us prevent similar
problems in the future.
The crash might start with:
==120489== Invalid read of size 8
==120489== at 0x425D38: network_bss_select (network.c:709)
==120489== by 0x415BD1: station_try_next_bss (station.c:2263)
==120489== by 0x415E31: station_retry_with_status (station.c:2323)
==120489== by 0x415E31: station_connect_cb (station.c:2367)
==120489== by 0x407E66: netdev_connect_failed (netdev.c:569)
==120489== by 0x40B93D: netdev_connect_event (netdev.c:1801)
==120489== by 0x40B93D: netdev_mlme_notify (netdev.c:3678)
The Gtk.Switch representing the p2p.Device's Enabled property should use
the "delayed state change" logic as described in Gtk.Switch docs, i.e.
we need to use .set_state() instead of .set_active() when we get
confirmation of the property having changed its value in the
PropertiesChanged handler. The ::active property is automatically
changed by Gtk.Switch on user input.
This way the UI gives visual feedback of when the device enable/disable
op starts and ends (or fails).
Subscribe to InterfacesAdded/Removed/PropertiesChanged signals before
using GetManagedObjects. For some reason when iwd starts after the
client, we consistently get the managed objects list from before Adapter
interfaces are added but we miss the subsequent InterfacesAdded
signals. This probably has to do with the GetManagedObjects and the AddMatch
calls all being synchronous.
Secondly call self.populate_devices() on init as it won't be called if
IWD is not on the bus.
Incorporate the LGPL v2.1 licensed implementation of ARC4, taken from
the Nettle project (https://git.lysator.liu.se/nettle/nettle.git,
commit 3e7a480a1e351884), and tweak it a bit so we don't have to
operate on a skip buffer to fast forward the stream cipher, but can
simply invoke it with NULL dst or src arguments to achieve the same.
This removes the dependency [via libell] on the OS's implementation of
ecb(arc4), which may be going away, and which is not usually accelerated
in the first place.
Use a constant control flow in the derivation loop, to avoid leaking
which iteration successfully converted the password.
Increase number of iterations (20 to 30) to avoid issues with
passwords needing more iterations.
Define a bunch of stream parameters each with a getter and an optional
setter. In the right pane of the window show widgets for these
properties, some as just labels and some as editable controls depending
on the type of the property. Parse the EDID data.
With some devices the 10 seconds are not enough for the P2P Group Owner
to give us an address but I think we still want to use a timeout as
short as possible so that the user doesn't wait too long if the
connection isn't working.
p2p_connection_reset may be called as a result of a WFD service
unregistering and p2p_own_wfd is going to be NULL, don't update
p2p_own_wfd->available in this case.
With some WFD devices we occasionally get a Disconnect before or during
the DHCP setup on the first connection attempt to a newly formed group,
with the reason code MMPDU_REASON_CODE_PREV_AUTH_NOT_VALID. Retrying a
few times makes the connections consistently successful. Some
conditions are simplified/updated in this patch because
conn_dhcp_timeout now implies conn_wsc_bss, and both imply
conn_retry_count.
In 98cf2bf3ece070bfe7ff45670b95d24b34bf3e13 frame_xchg_stop was removed
and its use in p2p.c was changed to frame_xchg_cancel with the slight
complication that the ID returned by frame_xchg_start had to be stored.
Re-add frame_xchg_stop (renamed as frame_xchg_stop_wdev) to simplify
this bit in p2p.c.
Since there may now be multiple frame-xchg records for each wdev, when
we receive the TX Status event, make sure we find the record whose radio
work has started, as indicated by fx->retry_cnt > 0. Otherwise we're
relying on the ordering of the frames in the "frame_xchgs" queue and
constant priority.
The BSSID (address_3) in response frames was being checked to be the
same as in the request frame, or all-zeros for faulty drivers. At least
one Wi-Fi Display device sends a GO Negotiation Response with the BSSID
different from its Device Address (by 1 bit) and I didn't see an easy
way to obtain that address beforehand so we can "whitelist" it for this
check, so just drop that check for now.
ANQP didn't have this check before it started using frame-xchg so it
shouldn't be critical.
When a frame registered in a given group Id triggers a callback and that
callback ends up calling frame_watch_group_remove for that group Id,
that call will happen inside WATCHLIST_NOTIFY_MATCHES and will free the
memory used by the watchlist. watchlist.h has protection against the
watchlist being "destroyed" inside WATCHLIST_NOTIFY_MATCHES, but not
against its memory being freed -- the memory where it stores the in_notify
and destroy_pending flags. Free the group immediately after
WATCHLIST_NOTIFY_MATCHES to avoid reads/writes to those flags triggering
valgrind warnings.
frame_xchg_destroy is passed as the wiphy radio work's destroy callback
to wiphy.c. If it's also called directly in frame_xchg_exit, there's
going to be a use-after-free when it's called again from wiphy_exit, so
instead use wiphy_radio_work_done which will call frame_xchg_destroy and
forget the frame_xchg record.
This patch lets us establish WFD connections by parsing, validating and
acting on WFD IEs in received frames, and adding our own WFD IEs in the
GO Negotiation and Association frames. Applications should assume that
any connection to a WFD-capable peer, when we ourselves have a WFD
service registered, is a WFD connection and should handle RTSP and
other IP-based protocols on those connections.
When connecting to a WFD-capable peer and when we have a WFD service
registered, the connection will fail if there are any conflicting or
invalid WFD parameters during GO Negotiation.
If anyone's registered as implementing the WFD service, add the
net.connman.iwd.p2p.Display DBus interface on peer objects that are
WFD-capable and are available for a WFD Session.
The net.connman.iwd.p2p.ServiceManager interface on the /net/connman/iwd
object lets user applications register/unregister the Wi-Fi Display
service. In this commit all it does is it adds local WFD information
as given by the app, to the frames we send out during discovery.
Instead of accepting raw WFD IE contents from the app and exposing
peers' raw WFD IEs to the app, we build the WFD IEs in our code based on
the few meaningful DBus properties that we support and using default
values for the rest. If an app ever needs any of the other WFD
capabilities more properties can be added.
First, looking for DeviceState.connected gives a much better indication
if we are actually connected vs the connected property on the network
object. Second, it's good practice to also check that hostapd sees that
the station is connected.
Restarting hostapd from python was actually leaking memory and
causing the hostapd object to stay referenced in python. The
GLib timeout in wait_for_event was the ultimate cause, but this
had not come to light because no tests restarted hostapd then
used wait_for_event.
In addition, any use of wait_for_event after a restart would
cause an exception because the event socket was never re-attached
after hostapd restarted.
Now we properly clean up the timeout in wait_for_event and
re-initialize the hostapd object on restart.
These are useful for P2P service implementations to know unambiguously
which network interface a new P2P connection is on and the peer's IPv4
address if they need to initiate an IP connection or validate an
incoming connection's address from the peer.
This uses l_dhcp_lease_get_server_id to get the IP of the server that
offered us our current lease. l_dhcp_lease_get_server_id returns the
value of the L_DHCP_OPTION_SERVER_IDENTIFIER option, which is the address
that any unicast DHCP frames are supposed to be sent to so it seems to
be the best way to get the P2P group owner's IP address as a P2P-client.
peer->device_addr is a pointer to the Device Address contained in
one of two possible places in peer->bss. If during discovery we've
received a new beacon/probe response for an existing peer and we're
going to replace peer->bss, we also have to update peer->device_addr.
If we were in discovery only to be able to receive the target peer's
GO Negotiation Request (i.e. we have no users requesting discovery)
and we've received the frame and decided that the connection has
failed, exit discovery.
To use the wiphy radio work queue, scanning mostly remained the same.
start_next_scan_request was modified to be used as the work callback,
as well as not start the next scan if the current one was done
(since this is taken care of by wiphy work queue now). All
calls to start_next_scan_request were removed, and more or less
replaced with wiphy_radio_work_done.
scan_{suspend,resume} were both removed since radio management
priorities solve this for us. ANQP requests can be inserted ahead of
scan requests, which accomplishes the same thing.
Before connecting to a hidden network we must scan. During this scan
if another connection attempt comes in the expected behavior is to
abort the original connection. Rather than waiting for the scan to
complete, then canceling the original hidden connection we can just
cancel the hidden scan immediately, reply to dbus, and continue with
the new connection attempt.
The new frame-xchg module now handles a lot of what ANQP used to do. ANQP
now does not need to depend on nl80211/netdev for building and sending
frames. It also no longer needs any of the request lookups, frame watches
or to maintain a queue of requests because frame-xchg filters this for us.
From an API perspective:
- anqp_request() was changed to take the wdev_id rather than ifindex.
- anqp_cancel() was added so that station can properly clean up ANQP
requests if the device disappears.
During testing a bug was also fixed in station on the timeout path
where the request queue would get popped twice.
In order to first integrate frame-xchg some refactoring needed to
be done. First it is useful to allow queueing frames up rather than
requiring the module (p2p, anqp etc) to wait for the last frame to
finish. This can be aided by radio management but frame-xchg needed
some refactoring as well.
First was getting rid of this fx pointer re-use. It looks like this
was done to save a bit of memory but things get pretty complex
when needing to check if the pointer is stale or has been reset. Instead
of this we now just allocate a new pointer each frame-xchg. This
allows for the module to queue multiple requests as well as removes
the complexity of needing to check if the fx pointer is stale.
Next was adding the ability to track frame-xchgs by ID. If a module
can queue up multiple requests it also needs to be able to cancel
them individually vs per-wdev. This comes free with the wiphy work
queue since it returns an ID which can be given directly to the
caller.
Then radio management was simply piped in by adding the
insert/done APIs.
These APIs will handle fairness and order in any operations which
radios can only do sequentially (offchannel, scanning, connection etc.).
Both scan and frame-xchg are complex modules (especially scanning)
which is why the radio management APIs were implemented generic enough
where the changes to both modules will be minimal. Any module that
requires this kind of work can push a work item into the radio
management work queue (wiphy_radio_work_insert) and when the work
is ready to be started radio management will call back into the module.
Once the work is completed (and this may be some time later e.g. in
scan results or a frame watch) the module can signal back that the
work is finished (wiphy_radio_work_done). Wiphy will then pop the
queue and continue with the next work item.
A concept of priority was added in order to allow important offchannel
operations (e.g. ANQP) to take priority over other work items. The
priority is an integer, where lower values are of a higher priority.
The concept of priority cleanly solves a lot of the complexity that
was added in order to support ANQP queries (suspending scanning and
waiting for ANQP to finish before connecting).
Instead ANQP queries can be queued at a higher priority than scanning
which removes the need for suspending scans. In addition we can treat
connections as radio management work and insert them at a lower
priority than ANQP, but higher than scanning. This forces the
connection to wait for ANQP without having to track any state.
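The insert/done flow with priorities can be sketched in Python as
below. The RadioWorkQueue class and its method names are hypothetical
and only model the behaviour described here (one item running at a
time, lower value means higher priority, FIFO within a priority); they
are not the wiphy_radio_work_insert/wiphy_radio_work_done
implementation.

```python
import heapq
import itertools


class RadioWorkQueue:
    """Run one work item at a time, lowest priority value first."""

    def __init__(self):
        self._heap = []
        self._ids = itertools.count(1)
        self._current = None  # (priority, work_id, start_cb) while running

    def insert(self, priority, start_cb):
        work_id = next(self._ids)
        # work_id breaks ties, so equal priorities start in insertion order.
        heapq.heappush(self._heap, (priority, work_id, start_cb))
        if self._current is None:
            self._start_next()
        return work_id

    def done(self, work_id):
        if self._current is None or self._current[1] != work_id:
            raise ValueError("work item is not the one currently running")
        self._current = None
        self._start_next()

    def _start_next(self):
        if self._heap:
            self._current = heapq.heappop(self._heap)
            self._current[2]()  # call back into the owning module
```

If a scan is already running when a connection and then an ANQP query
are queued, the ANQP query starts first once the scan signals done, and
the connection follows it, without either module tracking the other's
state.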
Many tests force a reauth after the initial connection. When the tests
were written there was no way of ensuring the reauth completed except
waiting (IWD.wait()). Now we can wait for hostapd events in the tests,
which is faster and more reliable than busy waiting.
This test was not reliably passing. Busy waiting is not really reliable,
but in this specific case it's really the only option as the blacklist
must expire based on time.
When roaming, iwd tries to scan a limited number of frequencies to keep
the roaming latency down. Ideally the frequency list would come in from
a neighbor report, but if neighbor reports are not supported, we fall
back to our internal database for known frequencies of this network.
iwd tries to keep the number of scans down to a bare minimum, which
means that we might miss APs that are in range. This could happen
because the user might have moved physically and our frequency list is
no longer up to date, or if the AP frequencies have been reconfigured.
If a limited scan fails to find any good roaming candidates, re-attempt
a full scan right away.
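A minimal Python sketch of this fallback, assuming a hypothetical
scan(freqs) callable that returns the roam candidates found and treats
freqs=None as "scan everything":

```python
def find_roam_candidates(scan, known_frequencies):
    """Try a limited scan first, then fall back to a full scan."""
    if known_frequencies:
        candidates = scan(known_frequencies)
        if candidates:
            return candidates
    # Nothing usable on the known frequencies: retry across all of them.
    return scan(None)
```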
If the roam failed and we are no longer connected, station_disassociated
is called which ends up calling station_roam_state_clear. Thus
resetting the variables is not needed. Reflow the logic to make this a
bit more explicit.
If the roam attempt fails, do not reset this to false. Generally this
is set by the fact that we lost beacon and to not attempt neighbor
reports, etc. This hint should be preserved across roam attempts.
If an application has a bug and hangs on SIGTERM this causes
test-runner to hang as well. This is obviously an issue with
the application in question, but test-runner should have a way
of continuing onto the next test rather than hanging.
Instead we can use WNOHANG and a sleep to allow applications
some amount of time to exit, and if they haven't use SIGKILL
instead as well as print an error. Similar to how
wait_for_socket works. The timeout is hard coded to 2 seconds
(100ms sleep x 20 iterations).
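The polling approach can be sketched in Python as follows. This is an
illustration, not the test-runner code: the function name is
hypothetical, and it relies on subprocess.Popen.poll(), which on POSIX
performs a non-blocking waitpid() with WNOHANG.

```python
import signal
import subprocess
import sys
import time


def stop_process(proc: subprocess.Popen, iterations=20, interval=0.1):
    """Send SIGTERM, then escalate to SIGKILL after iterations * interval.

    Returns True if the process exited on its own, False if it was killed.
    """
    proc.send_signal(signal.SIGTERM)
    for _ in range(iterations):
        # Non-blocking check, so a hung child cannot hang the caller.
        if proc.poll() is not None:
            return True
        time.sleep(interval)

    print(f"process {proc.pid} ignored SIGTERM, sending SIGKILL",
          file=sys.stderr)
    proc.kill()
    proc.wait()
    return False
```

With the defaults the grace period is 20 x 100ms = 2 seconds, after
which the caller can move on to the next test.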
frame_xchg_startv was using sizeof(mmpdu) to check the minimum length
for a frame. Instead mmpdu_header_len should be used since this checks
fc.order and returns either 24 or 28 bytes, not 28 bytes always.
This change adds the requirement that the first iovec in the array
must contain at least the first 2 bytes (mmpdu_fc) of the header.
This really shouldn't be a problem since all current users of
frame-xchg put the entire header (or entire frame) into the first
iovec in the array.
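The length check can be illustrated with a short Python sketch. The
function name is hypothetical; it assumes the Order bit is bit 15 of the
little-endian Frame Control field, which in management frames signals
the 4 extra bytes that account for the 24 versus 28 byte difference.

```python
import struct


def mgmt_header_len(first_iov: bytes) -> int:
    """Return the management frame header length implied by Frame Control."""
    if len(first_iov) < 2:
        raise ValueError("first iovec must hold the 2-byte Frame Control")
    (frame_control,) = struct.unpack_from("<H", first_iov)
    order = (frame_control >> 15) & 1
    return 28 if order else 24
```

A caller would compare the total frame length against this value rather
than against a fixed structure size.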
explicit_bzero is used in src/p2p.c since commit
1675c765a3d66f3c626a1581a7bf514e1859b064 but src/missing.h is not
included. As a result, the build with uclibc fails on:
/home/naourr/work/instance-0/output-1/per-package/iwd/host/opt/ext-toolchain/bin/../lib/gcc/mips64el-buildroot-linux-uclibc/5.5.0/../../../../mips64el-buildroot-linux-uclibc/bin/ld: src/p2p.o: in function `p2p_connection_reset':
p2p.c:(.text+0x2cf4): undefined reference to `explicit_bzero'
/home/naourr/work/instance-0/output-1/per-package/iwd/host/opt/ext-toolchain/bin/../lib/gcc/mips64el-buildroot-linux-uclibc/5.5.0/../../../../mips64el-buildroot-linux-uclibc/bin/ld: p2p.c:(.text+0x2cfc): undefined reference to `explicit_bzero'
This logic was using l_hashmap_insert, which supports duplicates. Since
some entries were inserted multiple times, they ended up being printed
multiple times. Fix that by introducing a macro that uses
l_hashmap_replace instead.
Right now, if the connection fails, then network always thinks that the
password should be re-asked. Loosen this to only do so if the
connection failed at least in the handshake phase. If the connection
failed due to Association / Authentication timeout, it is likely that
something is wrong with the AP and it can't respond.
Using the new station ANQP watch network can delay the connection
request until after ANQP has finished. Since station may be
autoconnecting we must also add a check in network_autoconnect
which prevents it from autoconnecting if we have a pending Connect
request.
This is to allow network to watch for ANQP activity in order to
fix the race condition between scanning finishing and ANQP finishing.
Without this it is possible for a DBus Connect() to come in before
ANQP has completed, causing the network to return NotConfigured
when it's actually in the process of obtaining all the network info.
The watch was made globally in station due to network not having
a station object until each individual network is created. Adding a
watch during network creation would result in many watchers as well
as a lot of removal/addition as networks are found and lost.
Change signature of network_connect_new_hidden_network to take
reference to the caller's l_dbus_message struct. This allows setting
the caller's l_dbus_message struct to NULL after replying in
the case of a failure.
==201== at 0x467C15: l_dbus_message_unref (dbus-message.c:412)
==201== by 0x412A51: station_hidden_network_scan_results (station.c:2504)
==201== by 0x41EAEA: scan_finished (scan.c:1505)
==201== by 0x41EC10: get_scan_done (scan.c:1535)
==201== by 0x462592: destroy_request (genl.c:673)
==201== by 0x462987: process_unicast (genl.c:988)
==201== by 0x462987: received_data (genl.c:1087)
==201== by 0x45F5A2: io_callback (io.c:126)
==201== by 0x45E8FD: l_main_iterate (main.c:474)
==201== by 0x45E9BB: l_main_run (main.c:521)
==201== by 0x45EBCA: l_main_run_with_signal (main.c:643)
==201== by 0x403B15: main (main.c:512)
Introduce hidden_pending to keep reference to the dbus message object
while we wait for the scan results to be returned while trying to
connect to a hidden network. This simplifies the logic by separating it
into two independent logical units, scanning and connecting, and
eliminates a possibility of a memory leak in the case when
Network.Connect is initiated while Station.ConnectHiddenNetwork is in
progress.
If a connection is initiated (via dbus) while a quick scan is in
progress, the quick scan will be aborted. In this case,
station_quick_scan_results will always transition to the
AUTOCONNECT_FULL state regardless of whether it should or not.
Fix this by making sure that we only enter AUTOCONNECT_FULL if we're
still in the AUTOCONNECT_QUICK state.
Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
If start_next_scan_request() is called while a scan request
(NL80211_CMD_TRIGGER_SCAN) is still running, the same scan request will
be sent again. Add a check in the function to avoid sending a request if
one is already in progress. For consistency, check also that scan
results are not being requested (NL80211_CMD_GET_SCAN), before trying to
send the next scan request. Finally, remove similar checks at
start_next_scan_request() callsites to simplify the code.
This also fixes a crash that occurs if the following conditions are met:
- the duplicated request is the only request in the scan request
queue, and
- both scan requests fail with an error other than EBUSY.
In this case, the first callback to scan_request_triggered() will delete
the request from the scan request queue. The second callback will find
an empty queue and consequently pass a NULL scan_request pointer to
scan_request_failed(), causing a segmentation fault.
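The check can be pictured roughly as follows. This is a Python sketch
under assumed names (ScanContext, triggering, getting_results); the
real fields in scan.c may be named and tracked differently:

```python
class ScanContext:
    def __init__(self):
        self.requests = []            # queued scan requests
        self.triggering = False       # a TRIGGER_SCAN command is in flight
        self.getting_results = False  # a GET_SCAN command is in flight
        self.sent = []                # requests handed to the "kernel"

    def start_next_scan_request(self):
        # Do nothing if a trigger or a results dump is already running,
        # so the same request is never sent twice.
        if self.triggering or self.getting_results:
            return False
        if not self.requests:
            return False
        self.sent.append(self.requests[0])
        self.triggering = True
        return True
```

With the check inside the function, callers no longer need their own
guards before calling it.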
If scanning is suspended, have scan_common() queue its scan request
rather than issuing it immediately. This respects the assumption that
scans are not requested while sc->suspended is true.
This bug is caused by the following behavior:
1. Start a frame-xchg, wait for callback
2. From callback start a new frame-xchg, same prefix.
The new frame-xchg request will detect that there is a duplicate watch,
which is correct behavior. It will then remove this duplicate from the
watchlist. The issue here is that we are in the watchlist notify loop
from the original xchg. This causes that loop to read from the now
freed watchlist item, causing an invalid read.
Instead of freeing the item immediately, check if the notify loop is in
progress and only set 'id' to zero and 'stale_items' to true. This will
allow the notify loop to finish, then the watchlist code will prune out
any stale items. If not in the notify loop the item can be freed as it
was before.
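The deferred removal can be sketched like this. It is a simplified
Python model, not the watchlist.c implementation; only the 'id' and
'stale_items' names come from the description above:

```python
class Watchlist:
    def __init__(self):
        self.items = []           # each item: {"id": int, "cb": callable}
        self.next_id = 1
        self.in_notify = False
        self.stale_items = False

    def add(self, cb):
        item = {"id": self.next_id, "cb": cb}
        self.next_id += 1
        self.items.append(item)
        return item["id"]

    def remove(self, watch_id):
        for item in self.items:
            if item["id"] == watch_id:
                if self.in_notify:
                    # Defer: mark stale, prune once the loop is done.
                    item["id"] = 0
                    self.stale_items = True
                else:
                    self.items.remove(item)
                return True
        return False

    def notify(self, *args):
        self.in_notify = True
        for item in self.items:
            if item["id"] != 0:   # skip items removed during this loop
                item["cb"](*args)
        self.in_notify = False
        if self.stale_items:
            self.items = [i for i in self.items if i["id"] != 0]
            self.stale_items = False
```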
Don't match the default group's (group_id 0) wdev_id against the
provided wdev_id because the default group can be used on all wdevs and
its wdev_id is 0. Also match individual item's wdev_id in the group to
make up for this although it normally wouldn't matter.
802.11ai mandates that the RSN element is included during authentication
for FILS. This previously was happening by chance since supplicant_ie
was being included with CMD_AUTHENTICATE. This included more than just
the RSNE so that was removed in an earlier commit. Now FILS builds the
RSNE itself and includes this with CMD_AUTHENTICATE.
build_cmd_ft_authenticate and build_cmd_authenticate were virtually
identical. These have been unified into a single builder.
We were also incorrectly including ATTR_IE in every authenticate
command, which violates the spec for certain protocols. This was
removed and auth protocols will now add any IEs that they require.
In certain cases the autoconnect portion of each subtest was connecting
to the network so fast that the check for obj.scanning was never successful
since IWD was already connected (and in turn not scanning). Since the
autoconnect path will wait for the device to be connected there really isn't
a reason to wait for any scanning conditions. The normal connect path does
need to wait for scanning though, and for this we can now use the new
scan_if_needed parameter to get_ordered_networks.
There is a very common block of code inside many autotests
which goes something like:
device.scan()
condition = 'obj.scanning'
wd.wait_for_object_condition(device, condition)
condition = 'not obj.scanning'
wd.wait_for_object_condition(device, condition)
network = device.get_ordered_network('an-ssid')
When you see the same pattern in nearly all the tests this shows
we need a helper. Basic autotests which merely check that a
connection succeeded should not need to write the same code again
and again. This code ends up being copy-pasted which can lead to
bugs.
There is also a code pattern which attempts to get ordered
networks, and if this fails it scans and tries again. This, while
not optimal, does prevent unneeded scanning by first checking if
any networks already exist.
This patch solves both the code reuse issue as well as the recovery
if get_ordered_network(s) fails. A new optional parameter was
added to get_ordered_network(s) which is False by default. If True
get_ordered_network(s) will perform a scan if the initial call
yields no networks. Tests will now be able to simply call
get_ordered_network(s) without performing a scan beforehand.
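The behaviour of the new parameter can be sketched as below. FakeDevice
is a hypothetical stand-in for the autotest Device object, and the
helper is written as a free function for brevity; only the
scan_if_needed name and its False default come from the text above:

```python
class FakeDevice:
    """Hypothetical stand-in for the autotest Device object."""

    def __init__(self, visible_after_scan):
        self._visible = visible_after_scan
        self._known = []
        self.scan_count = 0

    def scan(self):
        # A real scan is asynchronous; this one completes immediately.
        self.scan_count += 1
        self._known = list(self._visible)

    def networks(self):
        return list(self._known)


def get_ordered_network(device, ssid, scan_if_needed=False):
    def lookup():
        return next((n for n in device.networks() if n == ssid), None)

    network = lookup()
    if network is None and scan_if_needed:
        # Nothing known yet: scan once and try again.
        device.scan()
        network = lookup()
    return network
```

A second call after a successful lookup does not scan again, which
keeps the "check first, scan only if needed" behaviour.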
These values were meant only to force IWD's BSS preference but
since the RSSI's were so low in some cases this caused a roam
immediately after connecting. This patch changes the RSSI values
to prevent a roam from happening.
Previously iwmon was running per-test, which would jumble any subtests
together into the same log file making it hard to parse. Now create
a separate directory for each subtest and put the monitor log and
pcap there.
In this situation the kernel is sending a low RSSI event which netdev
picks up, but since we set netdev->connected so early the event is
forwarded to station before IWD has fully connected. Station then
tries to get a neighbor report, which may fail and cause a known
frequency scan. If this is a new network the frequency scan tries to
get any known frequencies in network_info which will be unset and
cause a segfault.
This can be avoided by only sending RSSI events when netdev->operational
is set rather than netdev->connected.
Using mac80211_hwsim can sometimes result in out of order messages
coming from the kernel. Since mac80211_hwsim immediately sends out
frames and the kernel keeps command responses in a separate queue,
bad scheduling can result in these messages being out of order.
In some cases we receive Auth/Assoc frames before the response to
our original CMD_CONNECT. This causes autotests to fail randomly,
some more often than others.
To fix this we can introduce a small delay into hwsim. Just a 1ms
delay makes the random failures disappear in the tests. This delay
also makes hwsim more realistic since actual hardware will always
introduce some kind of delay when sending or receiving frames.
Some full mac cards don't like being given a FT AKM when connecting.
From an API perspective this should be supported, but in practice
these cards behave differently and some do not accept FT AKMs. Until
this becomes more stable any cards not supporting Auth/Assoc commands
(full mac) will not connect using FT AKMs.
This callback gets called way too many times to have a debug print
in the location that it was. Instead only print if a NEW wiphy is
found, and also print the name/id.
Save the value of the watchlist pointer at the beginning of the
WATCHLIST_NOTIFY_* macros as if it was a function. This will fix a
frame-xchg.c scenario in which one of the watch callbacks removes the
frame watch group and the memory where the watchlist pointer was
becomes unallocated but the macro still needs to access it once or
twice while it destroys the watchlist. Another option would be for
the pointer to be copied in frame-xchg.c itself.
Use netconfig.c functions to unconditionally run DHCP negotiation,
fail the connection setup if DHCP fails. Only report connection success
after netconfig returns.
Add the final two steps of the connection setup, and corresponding
disconnect logic:
* the WSC connection to the GO to do the client provisioning,
* the netdev_connect call to use the provisioned credentials for the
final WPA2 connection.
Once we've found the provisioning BSS create the P2P-Client interface
that we're going to use for the actual provisioning and the final P2P
connection.
Some devices (a Wi-Fi Display dongle in my case) will send us Probe
Requests and wait for a response before they send us the GO
Negotiation Request that we're waiting for after the peer initially
replied with "Fail: Information Not Available" to our GO Negotiation
attempt. Curiously this specific device I tested would even accept
a Probe Response with a mangled body such that the IE sequence couldn't
be parsed.
Handle the scenario where the peer's P2P state machine doesn't know
whether a connection has been authorized by the user and needs some time
to ask the user or a higher software layer whether to accept a
connection. In that case their GO Negotiation Response to our GO
Negotiation Request will have the status code "Fail: Information Not
Available" and we need to give the peer 120s to start a new GO
Negotiation with us. In this patch we handle the GO Negotiation
responder side where we parse the Request frame, build and send the
Response and finally parse the Confirmation. The existing code so far
only did the initiator side.
Parse the GO Negotiation Response frame and if no errors found send the
GO Negotiation Confirmation. If that gets ACKed wait for the GO to set
up the group.
Add net.connman.iwd.SimpleConfiguration interfaces to peer objects on
DBus and handle method calls. Building and transmitting the actual
action frames to start the connection sequence is done in the following
commits.
For wired authentication the protocol version used in the EAPOL
packets sent by ead is fixed to 802.1X-2004 (2) but some switches
implementing only 802.1X-2001 erroneously ignore these packets.
As ead only sends EAPOL-Start and EAP-Packet packets and these have
not changed between 802.1X-2001 and 802.1X-2004 there should be
no reason to use 802.1X-2004. Hence, this changes ead to always use
802.1X-2001 (1) instead.
Switches implementing newer versions of 802.1X should not have
problems responding to packets using the original version.
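For reference, the EAPOL header is a one-byte protocol version, a
one-byte packet type and a two-byte big-endian body length. The
following Python sketch builds such a header with the version set to
1; the function and constant names are illustrative and are not ead's:

```python
import struct

EAPOL_VERSION_2001 = 1
EAPOL_VERSION_2004 = 2
EAPOL_TYPE_EAP_PACKET = 0
EAPOL_TYPE_START = 1


def build_eapol(packet_type, body=b"", version=EAPOL_VERSION_2001):
    # Header: 1-byte version, 1-byte type, 2-byte big-endian body length.
    return struct.pack("!BBH", version, packet_type, len(body)) + body
```

Only the first byte differs between a 2001 and a 2004 frame of these
two types.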
Add some of the Device Discovery logic and the DBus API. Device
Discovery is documented as having three states: the Scan Phase, the Find
Phase and the Listen State.
This patch adds the Scan Phase and the next patch adds the Listen State,
which will happen sequentially in a loop until discovery is stopped.
The Find Phase, which is documented as happening at the beginning of the
Discovery Phase, is incorporated into the Scan Phases. The difference
between the two is that Find Phase scans all of the supported channels
while the Scan Phase only scans the three "social" channels. In
practical terms the Find Phase would discover existing groups, which may
operate on any channel, while the Scan Phase will only discover P2P
Devices -- peers that are not in a group yet. To cover existing groups,
we add a few "non-social" channels to each of our active scans
implementing the Scan Phases.
When a new wiphy is added query its regulatory domain and listen for
nl80211 regulatory notifications to be able to provide current
regulatory country code through the new wiphy_get_reg_domain_country().
Implement the Enabled property on device interface. The P2P device is
currently disabled on startup but automatically enabling the P2P device
can be considered.
With the previous commit, wscutil now depends on ie.h. Unfortunately,
wired also includes eap-wsc and wscutil in the build, but not ie, which
results in a link-time failure.
Fix this by dropping eap-wsc and wscutil from wired. There's no reason
that ethernet authentication would ever use the WiFi Protected Setup
authentication.
SOL_NETLINK is used since commit
87a198111af1ea67053895f7435fb99e3cdd2159 resulting in the following
build failure with glibc < 2.24:
src/frame-xchg.c: In function 'frame_watch_group_io_read':
src/frame-xchg.c:328:27: error: 'SOL_NETLINK' undeclared (first use in this function)
if (cmsg->cmsg_level != SOL_NETLINK)
^
This failure is due to glibc that doesn't support SOL_NETLINK before
version 2.24 and
f9b437d5ef
Fixes:
- http://autobuild.buildroot.org/results/3485088b84111c271bbcfaf025aa4103c6452072
'Connected' property of the network object is set before the connection
attempt is made and does not indicate a connection success. Therefore,
use device status property to identify the connection status of the device.
The display refresh is automatically enabled or disabled depending on
the width of the window. This avoids incorrect display on refresh in
small windows.
Instead of calling display(""), explicitly use the sequence of
commands to force readline to properly update its internal state
and re-display the prompt.
For PSK networks we have netdev.c taking care of setting the linkmode &
operstate. For open adhoc networks, netdev.c was never involved which
resulted in linkmode & operstate never being set. Fix this by invoking
the necessary magic when a connection is established.
adhoc_reset() destroys ssid and sta_states but leaves the pointers
around, although the adhoc_state structure is not always freed.
This causes a segfault when exiting iwd after a client has done
adhoc start and adhoc stop on a device since adhoc_reset() is called
from adhoc_sta_free although it was previously called from
adhoc_leave_cb().
The netdev_leave_adhoc() returns a negative errno on errors and zero
on success, but adhoc_dbus_stop() assumed the inverse when checking for
an error.
Also, the DBus message was not being referenced in adhoc->pending and
then adhoc_leave_cb() segfaulted attempting to dereference it.
Doing 'ad-hoc <wlan> start_open <"network name">' returned a
"No matching method found" error because start_open called
net.connman.iwd.AdHoc.Start instead of net.connman.iwd.AdHoc.StartOpen.
It seems some APs send the IGTK key index in big endian format (it is a
uint16). The kernel rightly reports an -EINVAL error when iwd issues a
NEW_KEY with such a value, resulting in the connection being aborted.
Work around this by trying to detect big-endian key indexes and 'fixing'
them up.
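One way such a fix-up could look, sketched in Python. The heuristic
here is an assumption: it relies on valid IGTK key IDs being 4 or 5,
so a value of 0x0400 or 0x0500 read as little-endian is treated as
byte-swapped. The actual check in iwd may differ:

```python
import struct


def igtk_key_index(kde_data):
    # The first two bytes of the IGTK KDE carry the key ID, which is
    # expected to be little-endian and to hold the value 4 or 5.
    (key_id,) = struct.unpack_from("<H", kde_data, 0)
    if key_id in (0x0400, 0x0500):
        # Looks byte-swapped: the AP most likely wrote it big-endian.
        key_id >>= 8
    return key_id
```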
When running test-runner as non-root the environment variables
SUDO_GID/SUDO_UID were unset, causing atoi to segfault. This replaces
atoi with strtol, and checks the existence of SUDO_GID/SUDO_UID
before trying to turn it into an integer. This patch also allows
the uid/gid to be read from the user if running as non-root.
Note: running as non-root does require the user's permissions to be
set up properly. Directories and files are created when running with
logging, so if the user running test-runner does not have these
permissions the creation of these files will fail.
The configuration value of iwd_config_dir was defaulting to /etc/iwd
which, in the context of test-runner, is probably not the best idea.
The system may have a main.conf file in /etc/iwd which could cause
tests to fail or behave unexpectedly.
In addition all tests which use iwd_config_dir set it to /tmp anyways.
Because of this, the new default value will be /tmp and no tests will
even need to bother setting this.
The configuration value itself is not being removed because it may be
useful to set arbitrary paths (e.g. /etc/iwd) for example when using
the shell functionality.
This bug has been in here since OWE was written, but a similar bug also
existed in hostapd which allowed the PTK derivation to be identical.
In January 2020 hostapd fixed this bug, which now makes IWD incompatible
when using group 20 or 21.
This patch fixes the bug for IWD, so now OWE should be compatible with
recent hostapd version. This will break compatibility with old hostapd
versions which still have this bug.
If the AP only supports an AKM which requires an auth protocol
CMD_AUTHENTICATE/CMD_ASSOCIATE must be supported or else the
auth protocol cannot be run. All the auth protocols are started
assuming that the card supports these commands, but the support
was never checked when parsing supported commands.
This patch will prevent any fullMAC cards from using
SAE/FILS/OWE. This was the same behavior as before, just an
earlier failure path.
This function was intended to catch socket errors and destroy the group
but it would leak the l_io object if that happened, and if called on
ordinary shutdown it could cause a crash. Since we're now assuming
that the netlink socket operations never fail just remove it.
Only add constants for parsing the Device Information subelement as that
is the main thing we care about in P2P code. And since our own WFD IEs
will likely only need to contain the Device Information subelement, we
don't need builder utilities. We do need iterator utilities because we
may receive WFD IEs with more subelements.
In some cases a P2P peer will ACK our frame but not reply on the first
attempt, and other implementations seem to handle this by going back to
retransmitting the frame at a high rate until it gets ACKed again, at
which point they will again give the peer a longer time to tx the
response frame. Implement the same logic here by adding a
retries_on_ack parameter that takes the number of additional times we
want to restart the normal retransmit counter after we received no
response frame on the first attempt. So passing 0 maintains the
current behaviour, 1 for 1 extra attempt, etc.
In effect we may retransmit a frame about 15 * (retries_on_ack + 1) *
<in-kernel retransmit limit> times. The kernel/driver retransmits a
frame a number of times if there's no ACK (I've seen about 20 normally)
at a high frequency, if that fails we retry the whole process 15 times
inside frame-xchg.c and if we still get no ACK at any point, we give up.
If we do get an ACK, we wait for a response frame and if we don't get
that we will optionally reset the retry counter and restart the whole
thing retries_on_ack times.
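As a worked example of the bound above, in Python. The kernel limit
of 20 used in the usage note is only the value observed in the text,
not a guaranteed constant:

```python
FRAME_XCHG_RETRIES = 15  # retries inside frame-xchg.c, per the text


def max_transmissions(retries_on_ack, kernel_retry_limit):
    # Upper bound on how often one frame may go out on the air.
    return FRAME_XCHG_RETRIES * (retries_on_ack + 1) * kernel_retry_limit
```

With a kernel limit of 20, retries_on_ack=0 gives at most 300
transmissions and retries_on_ack=1 gives at most 600.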
In order to support AlwaysRandomizeAddress and AddressOverride, station will
set the desired address into the handshake object. Then, netdev checks if
this was done and will use that address rather than generate one.
This patch adds two new options to a network provisioning file:
AlwaysRandomizeAddress={true,false}
If true, IWD will randomize the MAC address on each connection to this
network. The address does not persist between connections; any new
connection will result in a different MAC.
AddressOverride=<MAC>
If set, the MAC address will be set to <MAC>, assuming it's a valid MAC
address.
These two options should not be used together, and will only take effect
if [General].AddressRandomization is set to 'network' in the IWD
config file.
If neither of these options are set, and [General].AddressRandomization
is set to 'network', the default behavior remains the same; the MAC
will be generated deterministically on a per-network basis.
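The resulting decision can be summarised with a small Python sketch.
Two things here are assumptions rather than documented behaviour: the
override is checked before AlwaysRandomizeAddress (the text only says
the two should not be combined), and "valid" is reduced to a format
check:

```python
import re

MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def pick_address_mode(randomization, always_randomize=False, override=None):
    # The per-network options only matter in 'network' mode.
    if randomization != "network":
        return ("unchanged", None)
    if override is not None and MAC_RE.match(override):
        return ("override", override)
    if always_randomize:
        return ("random", None)
    # Default: deterministic address derived per network.
    return ("per-network", None)
```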
Since frame_watch_remove_by_handler only forgets a given function +
user data pointers, and doesn't remove the frame prefixes added in the
kernel, we can avoid later re-registering those prefixes with the
kernel by keeping them in our local watchlist, and only replacing the
handler pointer with a dummy function.
If during WATCHLIST_NOTIFY{,_MATCHES,_NO_ARGS} one of the watch
notify callbacks triggers a call to watchlist_destroy, give up calling
remaining watches and destroy the watchlist without crashing. This is
useful in frame-xchg.c (P2P use case) where a frame watch may trigger
a move to a new state after receiving a specific frame, and remove one
group of frame watches (including its watchlist) to create a different
group.
For privacy reasons it's advantageous to randomize or mask
the MAC address when connecting to networks, especially public
networks.
This patch allows netdev to generate a new MAC address on a
per-network basis. The generated MAC will remain the same when
connecting to the same network. This allows reauthentications
or roaming to work, and not have to fully re-connect (which would
be required if the MAC changed on every connection).
Changing the MAC requires bringing the interface down. This does
lead to potential race conditions with respect to external
processes. There are two potential conditions which are explained
in a TODO comment in this patch.
This API is being added to support per-network MAC address
generation. The MAC is generated based on the network SSID
and the adapter's permanent address using HMAC-SHA256. The
SHA digest is then constrained to make it MAC address
compliant.
Generating the MAC address like this will ensure that the
MAC remains the same each time a given SSID is connected to.
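A rough Python illustration of the idea. The text does not say which
input is the HMAC key and which is the message, nor exactly how the
digest is constrained, so both choices below are assumptions (SSID as
key, first six digest bytes, unicast and locally administered bits):

```python
import hashlib
import hmac


def address_from_ssid(ssid, permanent_addr):
    # Assumption: SSID is the HMAC key, permanent address the message.
    digest = hmac.new(ssid, permanent_addr, hashlib.sha256).digest()
    addr = bytearray(digest[:6])
    addr[0] &= 0xFE   # clear the multicast bit (unicast address)
    addr[0] |= 0x02   # set the locally administered bit
    return bytes(addr)
```

Because the inputs are fixed for a given adapter and SSID, the same
address comes out on every connection to that network.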
Make sure a frame callback is free to call frame_xchg_stop without
causing a crash. Frame callback here means the one that gets
called if our tx frame was ACKed and triggered a response frame that
matched one of the provided prefixes, within the given time.
All in all a frame callback is allowed to call either
frame_xchg_stop or frame_xchg_startv or neither. Same applies to
the final callback (called when no matching responses received).
Don't crash if the user calls frame_xchg_stop(wdev) from inside the
frame exchange's final callback. That call is going to be redundant but
it's convenient to do this inside a cleanup function for a given wdev
without having to check whether any frame exchange was actually running.
This key is special in hostapd, and was being treated as a normal hostapd
config file. This special radius config file needs to be kept unpaired from
any interfaces so now it's passed in as a separate argument and appended to
the end of the hostapd execute command.
Tests which use a standalone RADIUS server may crash due to
the wiphy array not taking into account the 'radius_server'
key which is skipped during setup.
The goto was jumping to a label which freed the wiphy list which
had not yet been initialized. This also fixes another similar issue
if chdir fails (in this case tmpfs_extra_stuff would get freed
before being allocated).
This API was updated to take an extra boolean which will
automatically power up the device while changing the MAC
address. Since this is what IWD does anyways we can avoid
the need for an intermediate callback and go right into
netdev_initial_up_cb.
iwd would fail to connect using EAP-TLS when no CA certificate was
provided as it checked for successful loading of the CA certificate
instead of the client certificate when attempting to load the client
certificate.
Ensure that the directory is created before it's written to.
This can cause a build race in a highly parallelised build where a
directory is not yet created but an output file is being written using
redirection, e.g.
rst2man.py --strict --no-raw --no-generator --no-datestamp < ../git/monitor/iwmon.rst > monitor/iwmon.1
/bin/sh: monitor/iwmon.1: No such file or directory
make[1]: *** [Makefile:3544: monitor/iwmon.1] Error 1
Signed-off-by: Khem Raj <raj.khem@gmail.com>
The password for EAP-GTC is directly used in an EAP response. The
response buffer is created on the stack so an overly large password
could cause a stack overflow.
mac80211 drivers seem to send the disconnect event which is triggered by
CMD_DISCONNECT prior to the CMD_DISCONNECT response. However, some
drivers, namely brcmfmac, send the response first and then send the
disconnect event. This confused iwd when a connection was immediately
triggered after a disconnection (network switch operation).
Fix this by making sure that connected variable isn't set until the
connect event is actually processed, and ignore disconnect events which
come after CMD_DISCONNECT has already succeeded.
Do agent registration as part of agent manager proxy creation.
This ensures that the registration call is made only after the agent
manager’s interface becomes available on the bus.
Add the newly created proxy objects into the queue before the
interface specific initialization logic takes place. This way the new
proxy objects can be used within the initialization procedures.
For nl80211 sockets other than our main l_genl object use socket io
directly, to avoid creating many instances of l_genl. The only reason
we use multiple sockets is to work around an nl80211 design quirk that
requires closing the socket to unregister management frame watches.
Normally there should not be a need to create multiple sockets in a
program.
Add a little state machine and a related API, to simplify sending out a
frame, receiving the Ack / No-ack status and (if acked) waiting for a
response frame from the target device, one of a list of possible
frame prefixes. The nl80211 API for this makes it complicated
enough that this new API seems to be justified, on top of that there's a
quirk when using the brcmfmac driver where the nl80211 response
(containing the operation's cookie), the Tx Status event and the response
Frame event are received from nl80211 in reverse order (not seen with
other drivers so far), further complicating what should be a pretty
simple task.
Try to better deduplicate the frame watches. Until now we'd check if
we'd already registered a given frame body prefix with the kernel, or a
matching more general prefix (shorter). Now also try to check if we
already have a watch with the same callback pointer and user_data
value, and:
* an identical or shorter (more general) prefix, in that case ignore
the new watch completely.
* a longer (more specific) prefix, in that case forget the existing
watch.
The use case for this is when we have a single callback for multiple
watches and multiple frame types, and inside that callback we're looking
at the frame body again and matching it to frame types. In that case
we don't want that function to be called multiple times for one frame
event.
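The two rules can be modelled as follows. This is a simplified Python
sketch that ignores the kernel-side registration and uses illustrative
names only:

```python
class FrameWatches:
    def __init__(self):
        self.watches = []   # list of (prefix, callback, user_data)

    def add(self, prefix, callback, user_data=None):
        kept = []
        for p, cb, ud in self.watches:
            same_handler = cb is callback and ud is user_data
            if same_handler and prefix.startswith(p):
                # Existing prefix is identical or more general: the new
                # watch would only cause duplicate calls, so ignore it.
                return False
            if same_handler and p.startswith(prefix):
                # Existing prefix is more specific: forget it.
                continue
            kept.append((p, cb, ud))
        kept.append((prefix, callback, user_data))
        self.watches = kept
        return True
```

Either way the callback ends up registered once per matching frame,
under the most general prefix requested so far.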
In frame_watch_group_remove I forgot to actually match the group to be
removed by both wdev_id and group_id. group_ids are unique only in the
scope of one wdev.
I forgot to actually add new groups being created in
frame_watch_group_get to the watch_groups queue, meaning that we'd
re-create the group every time a new watch was added to the group.
Previously, the parsing of the OM objects was done in one pass,
therefore, the proxy object's dependencies may not have been parsed at the
time when they were looked up for the dependency assignments. Now, the
parsing of the OM objects is done in two passes: 1) Create proxy objects -
one per interface and path, 2) Populate the proxy objects with properties
and assign dependencies. Therefore, we are guaranteed to have the proxy
objects created by the time they are looked up for the dependency
assignments.
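The two passes can be sketched in Python as below. The "depends_on"
key is a hypothetical way of expressing a dependency for this example;
the client derives dependencies from interface-specific properties:

```python
def build_proxies(managed_objects):
    """managed_objects maps path -> {interface: properties}."""
    proxies = {}
    # Pass 1: create an empty proxy for every (path, interface) pair.
    for path, interfaces in managed_objects.items():
        for interface in interfaces:
            proxies[(path, interface)] = {"props": {}, "parent": None}
    # Pass 2: fill in properties and resolve dependencies. Every proxy
    # already exists, so lookups cannot fail because of ordering.
    for path, interfaces in managed_objects.items():
        for interface, props in interfaces.items():
            proxy = proxies[(path, interface)]
            proxy["props"] = {k: v for k, v in props.items()
                              if k != "depends_on"}
            dep = props.get("depends_on")
            if dep is not None:
                proxy["parent"] = proxies[dep]
    return proxies
```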
Processing the duplicated TLVs while connecting to a malicious AP may lead
to overflow of the response buffer. This patch ensures that the
duplicated TLVs are not parsed.
The pending wiphy state 'use_default' variable was not set early enough
in some circumstances resulting in weird behavior for blacklisted
drivers. Fix this by adding a manager_wiphy_dump_done callback which
will properly initialize the use_default value.
Fixes: c4b2f10483e8 ("manager: Handle missing NEW_WIPHY events")
brcmfmac does not allow the removal of the default / primary interface.
So there isn't much point in having iwd attempt this.
Another issue is that brcmfmac _does_ allow the deletion of non-default
interfaces. So starting iwd on a system with a station & ap interface
active can result in iwd attempting to delete all the interfaces. Given
the above, it succeeds in deleting the ap interface but not the station
one. In strange circumstances it might end up thinking that the ap
interface is the 'default' and trying to use it, whereas it was just
successfully removed.
==192== Conditional jump or move depends on uninitialised value(s)
==192== at 0x4531D3: l_queue_find (queue.c:346)
==192== by 0x42F1F8: manager_config_notify (manager.c:667)
==192== by 0x45A895: process_multicast (genl.c:970)
==192== by 0x45A895: received_data (genl.c:1037)
==192== by 0x4577B2: io_callback (io.c:126)
==192== by 0x456B0D: l_main_iterate (main.c:473)
==192== by 0x456BCB: l_main_run (main.c:520)
==192== by 0x456DDA: l_main_run_with_signal (main.c:642)
==192== by 0x4034B0: main (main.c:497)
The kernel emits NEW_WIPHY events whenever a new wiphy is registered.
Unfortunately these events are emitted under the 'legacy' semantics and
have a hard size limit of 4096 bytes. It is possible for
a NEW_WIPHY message to exceed this limit (ath10k cards seem to be
affected in particular), which results in the kernel never sending these
messages out. This can lead to NEW_INTERFACE events being emitted with
a wiphy_id that had no corresponding NEW_WIPHY event emitted. Such a
sequence can confuse iwd's hardware detection logic, particularly during
hot-plug or system boot.
Fix this by re-dumping the wiphy if such a condition is detected. This
has some interaction with blacklisted wiphys, so the wiphy objects are
now always tracked and marked as blacklisted. Before, the blacklisted
wiphys were simply not added to the iwd list of tracked wiphys.
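A simplified Python model of the recovery path. The class, method and
return value names are invented for illustration and do not correspond
to manager.c:

```python
class Manager:
    def __init__(self):
        self.wiphys = {}          # wiphy_id -> {"blacklisted": bool}
        self.dump_requests = []   # wiphy ids we asked to be re-dumped

    def on_new_wiphy(self, wiphy_id, blacklisted=False):
        # Blacklisted wiphys are tracked too, just marked as such.
        self.wiphys[wiphy_id] = {"blacklisted": blacklisted}

    def on_new_interface(self, wiphy_id):
        wiphy = self.wiphys.get(wiphy_id)
        if wiphy is None:
            # NEW_WIPHY never arrived: ask for the wiphy explicitly.
            self.dump_requests.append(wiphy_id)
            return "pending"
        if wiphy["blacklisted"]:
            return "ignored"
        return "use"
```

Tracking blacklisted wiphys is what lets the handler tell "never seen"
apart from "seen but deliberately ignored".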
For the inner EAP methods that support generation of the key material
include it in the imck generation. This cryptographically binds
the inner method to the tunnel.
Windows Server 2008 - Network Policy Server (NPS) generates an invalid
Compound MAC for Cryptobinding TLV when used within PEAPv0 due to
incorrect parsing of the message containing TLS Client Hello.
Setting L bit and including TLS Message Length field, even for the
packets that do not require fragmentation, corrects the issue. The
redundant TLS Message Length field in unfragmented packets doesn't
seem to affect the other server implementations.
Sometimes, at least with brcmfmac, the default interface apparently
takes a moment to get created after the NEW_WIPHY event. We didn't
really consider this case in the NEW_WIPHY handler and we've got a race
condition. It fixes the following bug for me:
https://bugs.archlinux.org/task/63912 -- tested by removing and
re-modprobing the brcmfmac module rather than rebooting.
To work around this wait for the NEW_INTERFACE event and then retry the
setup. We still do the initial attempt directly after NEW_WIPHY to
handle cases like wiphys with no default interfaces and pre-existing
wiphys.
We track mtime as the 'LastConnectedTime' of the network, and also sort
the known network list according to the last connected time.
Unfortunately we were never reacting to ATTRIB changes, and so were
never updating the network_info->connected_time whenever a network was
connected to.
Rework the logic to address this. This also fixes a small bug where the
connected_time was not set properly prior to removal / re-insertion of
the network_info.
These arrays should have been declared extern in the first place.
Newer versions of gcc now complain about this:
/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: client/dbus-proxy.o:(.bss+0x0): multiple definition of `properties_yes_no_opts'; client/adapter.o:(.bss+0x0): first defined here
/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: client/dbus-proxy.o:(.bss+0x20): multiple definition of `properties_on_off_opts'; client/adapter.o:(.bss+0x20): first defined here
/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: client/device.o:(.bss+0x20): multiple definition of `properties_on_off_opts'; client/adapter.o:(.bss+0x20): first defined here
/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: client/device.o:(.bss+0x0): multiple definition of `properties_yes_no_opts'; client/adapter.o:(.bss+0x0): first defined here
/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: client/known-networks.o:(.bss+0x0): multiple definition of `properties_yes_no_opts'; client/adapter.o:(.bss+0x0): first defined here
/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: client/known-networks.o:(.bss+0x20): multiple definition of `properties_on_off_opts'; client/adapter.o:(.bss+0x20): first defined here
/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: client/properties.o:(.data.rel.local+0x0): multiple definition of `properties_yes_no_opts'; client/adapter.o:(.bss+0x0): first defined here
/usr/lib/gcc/x86_64-pc-linux-gnu/9.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: client/properties.o:(.data.rel.local+0x20): multiple definition of `properties_on_off_opts'; client/adapter.o:
NLMSG_OK and NLMSG_NEXT expect to operate on nlmsg_len which is an int
(signed type). The current code uses an unsigned type which means that
it cannot detect underflows. Such underflows can happen when NLMSG_NEXT
tries to advance nlmsg_len by a number of bytes (due to alignment) which
are greater than the current nlmsg_len itself. This causes iwmon to
crash on certain messages.
Reported-By: Daniel Wagner <wagi@monom.org>
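The underflow is easy to reproduce on paper. The Python sketch below
mimics the NLMSG_OK / NLMSG_NEXT arithmetic and emulates a 32-bit
unsigned counter with a mask; the helper names are made up for this
example:

```python
NLMSG_HDRLEN = 16  # sizeof(struct nlmsghdr)


def nlmsg_align(length):
    # Netlink messages are padded to a 4-byte boundary.
    return (length + 3) & ~3


def nlmsg_next_remaining(remaining, nlmsg_len, signed=True):
    remaining -= nlmsg_align(nlmsg_len)
    if not signed:
        remaining &= 0xFFFFFFFF   # emulate uint32_t wrap-around
    return remaining


def nlmsg_ok(remaining, nlmsg_len):
    return (remaining >= NLMSG_HDRLEN
            and NLMSG_HDRLEN <= nlmsg_len <= remaining)
```

For a 17-byte buffer holding one 17-byte message, the signed counter
ends at -3 and the loop stops, while the unsigned one wraps to
4294967293 and the next NLMSG_OK check passes on data that is not
there.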
We use the mtime on the network profile as the 'Last Connected Time'.
When we update any property and sync the file to disk, the mtime was not
preserved (since we were creating a new temporary file instead of
modifying the old one). As a result, LastConnectedTime changed and its
property-change signal was emitted incorrectly whenever a writable
property on the KnownNetwork interface was updated.
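
One way to keep the timestamp stable when replacing a file through a temporary copy is sketched below. This is not necessarily how the patch does it: replace_keep_mtime() is a hypothetical helper, and the sketch assumes Linux/glibc (struct stat with st_mtim, futimens()).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the contents of 'path' through a temporary
 * file while carrying over the old file's mtime.
 * Returns 0 on success, -1 on error.
 */
static int replace_keep_mtime(const char *path, const char *data)
{
	struct stat st;
	struct timespec times[2];
	char tmp[4096];
	size_t len = strlen(data);
	int fd, ok;

	if (stat(path, &st) < 0)
		return -1;

	if (snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path) >= (int) sizeof(tmp))
		return -1;

	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;

	times[0].tv_sec = 0;
	times[0].tv_nsec = UTIME_OMIT;  /* leave atime untouched */
	times[1] = st.st_mtim;          /* copy the old file's mtime */

	/* Write first, then stamp the mtime so the write cannot bump it. */
	ok = write(fd, data, len) == (ssize_t) len && futimens(fd, times) == 0;
	ok = (close(fd) == 0) && ok;    /* always close the descriptor */

	if (!ok || rename(tmp, path) < 0) {
		unlink(tmp);
		return -1;
	}

	return 0;
}
```

The rename() keeps the replacement atomic and does not touch the mtime of the file itself, so readers of the profile still see the original 'Last Connected Time'.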
Our design preference is to not call any callbacks in the _free/_destroy
method of a class (with the exception of explicit destroy callbacks
provided, if any).
Invoking the callback in this case was unnecessary: wsc_dbus_free was
already replying to pending connect / cancel messages. The only other
thing the callback would attempt to do is to set station back into
autoconnect mode. This was unnecessary as well since the netdev is
already down.
This change removes the callback invocation. Since wsc_enrollee_destroy
is now just calling wsc_enrollee_free, remove this from the API and
expose wsc_enrollee_free instead.
Split the WSC D-Bus interface class (struct wsc) into a base class
common to station mode and P2P mode (struct wsc_dbus) and station-
specific logic like scanning, saving the credentials as a known network
and triggering the station-mode connection (struct wsc_station_dbus).
Make the base class and its utilities public in wsc.h for P2P use.
Create struct wsc_enrollee which is allocated with wsc_enrollee_new,
taking a done callback as a parameter. The callback is always
called so there's no need for a separate destroy callback. The object
only lives until the done callback happens so wsc_enrollee_cancel/destroy
can only be used before this.
Looks like the rest of the file is simplified thanks to this.
This new API is independent of netdev.c and allows actually
unregistering from receiving notifications of frames, although with some
quirks. The current API only allowed the callback for a registration to
be forgotten but our process and/or the kernel would still be woken up
when matching frames were received because the kernel had no frame
unregister call. In the new API you can supply a group-id paramter when
registering frames. If it is non-zero the frame_watch_group_remove() call
can be used to remove all frame registrations that had a given group-id
by closing the netlink socket on which the notifications would be
received. This means though that it's a slightly costly operation.
The file is named frame-xchg.c because I'm thinking of also adding
utilities for sending frames and waiting for one of a number of replies
and handling the acked/un-acked information.
There are two changes to the example raw data in m8_encrypted_settings,
one is to change the Network Index value to 1 and the other is to drop
the Network Key Index attribute:
  Network Index      R  Deprecated - use fixed value 1 for
                        backwards compatibility.
  Network Key Index  O  Deprecated. Only included by WSC 1.0
                        devices. Ignored by WSC 2.0 or newer
                        devices.
Instead of taking the credentials from the wsc object directly, have the
caller pass these in. This makes it more consistent with how the
done_cb was done.
Split the interface-specific logic from the core WSC logic. The core
WSC code is the part that we can re-use between P2P and station and
doesn't include the D-Bus code, scanning for the target BSS or the
attempt to make a station mode connection.
Allow netdev_create_from_genl callers to draw a random or non-random MAC
and pass it in as a parameter, instead of passing a bool that tells us to
generate the MAC locally. In P2P we generate the MAC some time before
creating the netdev in order to pass it to the peer during negotiation.
Some test cases require (at least with recent hostapd versions) a
standalone RADIUS server. This is done using driver=none in the
hostapd config file. For this use case hostapd does not need any
radio since it's not doing anything wireless related.
Now inside the hw.conf file, under the HOSTAPD group, you can
specify a config file as the value of the 'radius_server' key. This
config file will be used without any associated radio when hostapd
is started.
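
A minimal hw.conf fragment could look like the following. The group and key names come from the description above; the file name radius.conf is only a placeholder for whatever hostapd config (containing driver=none) the test provides.

```
[HOSTAPD]
radius_server=radius.conf
```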