If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes
that a scan is ongoing on the underlying wdev and will retry the same
command when that scan is complete. It gets notified of that completion
via the scan_notify() function, and kicks the scan logic to try again.
However, if there is another wdev on the same wiphy and that wdev has a
scan request in flight, the kernel will also return -EBUSY. In other
words, only one scan request per wiphy is permitted.
As an example, the brcmfmac driver can create an AP interface on the
same wiphy as the default station interface, and scans can be triggered
on that AP interface.
If -EBUSY is returned because another wdev is scanning, then iwd won't
know when it can retry the original trigger request because the relevant
netlink event will arrive on a different wdev. Indeed, if no scan
context exists for that other wdev, then scan_notify will return early
and the scan logic will stall indefinitely.
Instead, and in the event that no scan context matches, use it as a cue
to retry a pending scan request that happens to be destined for the same
wiphy.
The previous commit added an invocation of known_networks_watch_add, but
never updated the module dependency graph.
Fixes: a793a41662 ("station, eapol: Set up eap-tls-common for session caching")
Use eap_set_peer_id() to set a string identifying the TLS server,
currently the hex-encoded SSID of the network, to be used as group name
and primary key in the session cache l_settings object. Provide pointers
to storage_eap_tls_cache_{load,sync} to eap-tls-common.c using
eap_tls_set_session_cache_ops(). Listen to Known Network removed
signals and call eap_tls_forget_peer() to have any session related to
the network also dropped from the cache.
Use l_tls_set_session_cache() to enable session cache/resume in the
TLS-based EAP methods. Sessions for all 802.1x networks are stored in
one l_settings object.
eap_{get,set}_peer_id() API is added for the upper layers to set the
identifier of the authenticator (or the supplicant if we're the
authenticator, if there's ever a use case for that.)
eap-tls-common.c can't call storage_eap_tls_cache_{load,sync}()
or known_networks_watch_add() (to handle known network removals) because
it's linked into some executables that don't have storage.o,
knownnetworks.o or common.o so an upper layer (station.c) will call
eap_tls_set_session_cache_ops() and eap_tls_forget_peer() as needed.
Minor changes to these two methods resulting from two rewrites of them.
Actual changes are:
* storage_tls_session_sync parameter is const,
* more specific naming,
* storage_tls_session_load will return an empty l_settings instead of
NULL so eap-tls-common.c doesn't have to handle this.
storage.c makes no assumptions about the group names in the l_settings
object and keeps no reference to that object, eap-tls-common.c is going
to maintain the memory copy of the cache since this cache and the disk
copy of it are reserved for EAP methods only.
A comma separated list as a string was ok for pure display purposes
but if any processing needed to be done on these values by external
consumers it really makes more sense to use a DBus array.
This wasn't being updated meaning the property is missing until a
scan is issued over DBus.
Rather than duplicate all the property changed calls they were all
factored out into a helper function.
Adds the MulticastDNS option globally to main.conf. If set all
network connections (when netconfig is enabled) will set mDNS
support into the resolver. Note that an individual network profile
can still override the global value if it sets MulticastDNS.
The limitation of cipher selection in ap.c was done so to allow p2p to
work. Now with the ability to specify ciphers in the AP config put the
burden on p2p to limit ciphers as it needs which is only CCMP according
to the spec.
These can now be optionally provided in an AP profile and provide a
way to limit what ciphers can be chosen. This still is dependent on
what the hardware supports.
The validation of these ciphers for station is done when parsing
the BSS RSNE but for AP mode there is no such validation and
potentially any supported cipher could be chosen, even if its
incompatible for the type of key.
The netdev_copy_tk function was being hard coded with authenticator
set to false. This isn't important for any ciphers except TKIP but
now that AP mode supports TKIP it needs to be fixed.
Though TKIP is deprecated and insecure its trivial to support it in
AP mode as we already do in station. This is only to allow AP mode
for old hardware that may only support TKIP. If the hardware supports
any higher level cipher that will be chosen automatically.
The key descriptor version was hard coded to HMAC_SHA1_AES which
is correct when using IE_RSN_AKM_SUITE_PSK + CCMP. ap.c hard
codes the PSK AKM but still uses wiphy to select the cipher. In
theory there could be hardware that only supports TKIP which
would then make IWD non-compliant since a different key descriptor
version should be used with PSK + TKIP (HMAC_MD5_ARC4).
Now use a helper to sort out which key descriptor should be used
given the AKM and cipher suite.
Similarly to l_netconfig track whether IWD's netconfig is active (from
the moment of netconfig_configure() till netconfig_reset()) using a
"started" flag and avoid handling or emitting any events after "started"
is cleared.
This fixes an occasional issue with the Netconfig Agent backend where
station would reset netconfig, netconfig would issue DBus calls to clear
addresses and routes, station would go into DISCONNECTING, perhaps
finish and go into DISCONNECTED and after a while the DBus calls would
come back with an error which would cause a NETCONFIG_EVENT_FAILED
causing station to call netdev_disconnct() for a second time and
transition to and get stuck in DISCONNECTING.
Both CMD_ASSOCIATE and CMD_CONNECT paths were using very similar code to
build RSN specific attributes. Use a common function to build these
attributes to cut down on duplicated code.
While here, also start using ie_rsn_cipher_suite_to_cipher instead of
assuming that the pairwise / group ciphers can only be CCMP or TKIP.
Instead of copy-pasting the same basic operation (memcpy & assignment),
use a goto and a common path instead. This should also make it easier
for the compiler to optimize this function.
The known frequency list may include frequencies that once were
allowed but are now disabled due to regulatory restrictions. Don't
include these frequencies in the roam scan.
These events are sent if IWD fails to authentiate
(ft-over-air-roam-failed) or if it falls back to over air after
failing to use FT-over-DS (try-ft-over-air)
If IPv4 setup fails and the netconfig logic gives up, continue as if the
connection had failed at earlier stages so that autoconnect can try the
next available network.
Certain drivers support/require probe response offloading which
IWD did not check for or properly handle. If probe response
offloading is required the probe response frame watch will not
be added and instead the ATTR_PROBE_RESP will be included with
START_AP.
The head/tail builders were reused but slightly modified to check
if the probe request frame is NULL, since it will be for use with
START_AP.
Parse the AP probe response offload attribute during the dump. If
set this indicates the driver expects the probe response attribute
to be included with START_AP.
Clearing all authentications during ft_authenticate was a very large
hammer and may remove cached authentications that could be used if
the current auth attempt fails.
For example the best BSS may have a problem and fail to authenticate
early with FT-over-DS, then fail with FT-over-Air. But another BSS
may have succeeded early with FT-over-DS. If ft_authenticate clears
all ft_infos that successful authentication will be lost.
AP roaming was structured such that any AP roam request would
force IWD to roam (assuming BSS's were found in scan results).
This isn't always the best behavior since IWD may be connected
to the best BSS in range.
Only force a roam if the AP includes one of the 3 disassociation/
termination bits. Otherwise attempt to roam but don't set the
ap_directed_roaming flag which will allows IWD to stay with the
current BSS if no better candidates are found.
There are a few checks that can be done prior to parsing the
request, in addition the explicit check for preparing_roam was
removed since this is taken care of by station_cannot_roam().
Once offchannel completes we can check if the info structure was
parsed, indicating authentication succeeded. If not there is no
reason to keep it around since IWD will either try another BSS or
fail.
This both adds proper handling to the new roaming logic and fixes
a potential bug with firmware roams.
The new way roaming works doesn't use a connect callback. This
means that any disconnect event or call to netdev_connect_failed
will result in the event handler being called, where before the
connect callback would. This means we need to handle the ROAMING
state in the station disconnect event so IWD properly disassociates
and station goes out of ROAMING.
With firmware roams netdev gets an event which transitions station
into ROAMING. Then netdev issues GET_SCAN. During this time a
disconnect event could come in which would end up in
station_disconnect_event since there is no connect callback. This
needs to be handled the same and let IWD transition out of the
ROAMING state.
This finalizes the refactor by moving all the handshake prep
into FT itself (most was already in there). The netdev-specific
flags and state were added into netdev_ft_tx_associate which
now avoids any need for a netdev API related to FT.
The NETDEV_EVENT_FT_ROAMED event is now emitted once FT completes
(netdev_connect_ok). This did require moving the 'in_ft' flag
setting until after the keys are set into the kernel otherwise
netdev_connect_ok has no context as to if this was FT or some
other connection attempt.
In addition the prev_snonce was removed from netdev. Restoring
the snonce has no value once association begins. If association
fails it will result in a disconnect regardless which requires
a new snonce to be generated
This converts station to using ft_action/ft_authenticate and
ft_associate and dropping the use of the netdev-only/auth-proto
logic.
Doing this allows for more flexibility if FT fails by letting
IWD try another roam candidate instead of disconnecting.
Now the full action frame including the header is provided to ft
which breaks the existing parser since it assumes the buffer starts
at the body of the message.
This forwards Action, Authentication and Association frames to
ft.c via their new hooks in netdev.
Note that this will break FT-over-Air temporarily since the
auth-proto still is in use.
The current behavior is to only find the best roam candidate, which
generally is fine. But if for whatever reason IWD fails to roam it
would be nice having a few backup BSS's rather than having to
re-scan, or worse disassociate and reconnect entirely.
This patch doesn't change the roam behavior, just prepares for
using a roam candidate list. One difference though is any roam
candidates are added to station->bss_list, rather than just the
best BSS. This shouldn't effect any external behavior.
The candidate list is built based on scan_bss rank. First we establish
a base rank, the rank of the current BSS (or zero if AP roaming). Any
BSS in the results with a higher rank, excluding the current BSS, will
be added to the sorted station->roam_bss_list (as a new 'roam_bss'
entry) as well as stations overall BSS list. If the resulting list is
empty there were no better BSS's, otherwise station can now try to roam
starting with the best candidate (head of the roam list).
A new API was added, ft_authenticate, which will send an
authentication frame offchannel via CMD_FRAME. This bypasses
the kernel's authentication state allowing multiple auth
attempts to take place without disconnecting.
Currently netdev handles caching FT auth information and uses FT
parsers/auth-proto to manage the protocol. This sets up to remove
this state machine from netdev and isolate it into ft.c.
This does not break the existing auth-proto (hence the slight
modifications, which will be removed soon).
Eventually the auth-proto will be removed from FT entirely, replaced
just by an FT state machine, similar to how EAPoL works (netdev hooks
to TX/RX frames).
There may be situations (due to Multi-BSS operation) where an AP might
be advertising multiple SSIDs on the same BSSID. It is thus more
correct to lookup the preauthentication target on the network object
instead of the station bss_list. It used to be that the network list of
bsses was not updated when roam scan was performed. Hence the lookup
was always performed on the station bss_list. But this is no longer the
case, so it is safer to lookup on the network object directly on the
network.
The warnings in the authenticate and connect events were identical
so it could be difficult knowing which print it was if IWD is not
in debug mode (to see more context). The prints were changed to
indicate which event it was and for the connect event the reason
attribute is also parsed.
Note the resp_ies_len is also initialized to zero now. After making
the changes gcc was throwing a warning.
FT is special in that it really should not be interrupted. Since
FRAME/OFFCHANNEL have the highest priority we run the risk of
DPP or some other offchannel operation interfering with FT.
FT is now driven (mostly) by station which removes the connect
callback. Instead once FT is completed, keys set, etc. netdev
will send an event to notify station.
Since l_netconfig's DHCPv6 client instance no longer sets parameters on
the l_icmp6_client instance, call l_icmp6_client_set_nodelay() and
l_icmp6_client_set_debug() directly. Also enable optimistic DAD to
speed up IPv6 setup if available.
All uses of frame-xchg were for action frames, and the frame type
was hard coded. Soon other frame types will be needed so the type
must now be specified in the frame_xchg_prefix structure.
This will make the debug API more robust as well as fix issues
certain drivers have when trying to roam. Some of these drivers
may flush scan results after CMD_CONNECT which results in -ENOENT
when trying to roam with CMD_AUTHENTICATE unless you rescan
explicitly.
Now this will be taken care of automatically and station will first
scan for the BSS (or full scan if not already in results) and
attempt to roam once the BSS is seen in a fresh scan.
The logic to replace the old BSS object was factored out into its
own function to be shared by the non-debug roam scan. It was also
simplified to just update the network since this will remove the
old BSS if it exists.
Add a second netconfig-commit backend which, if enabled, doesn't
directly send any of the network configuration to the kernel or system
files but delegates the operation to an interested client's D-Bus
method as described in doc/agent-api.txt. This backend is switched to
when a client registers a netconfig agent object and is swiched away
from when the client disconnects or unregisters the agent. Only one
netconfig agent can be registered any given time.
Add netconfig_event_handler() that responds to events emitted by
the l_netconfig object by calling netconfig_commit, tracking whether
we're connected for either address family and emitting
NETCONFIG_EVENT_CONNECTED or NETCONFIG_EVENT_FAILED as necessary.
NETCONFIG_EVENT_FAILED is a new event as until now failures would cause
the netconfig state machine to stop but no event emitted so that
station.c could take action. As before, these events are only
emitted based on the IPv4 configuration state, not IPv6.
Add netconfig-commit.c whose main method, netconfig_commit actually sets
the configuration obtained by l_netconfig to the system netdev,
specifically it sets local addresses on the interface, adds routes to the
routing table, sets DNS related data and may add entries to the neighbor
cache. netconfig-commit.c uses a backend-ops type structure to allow
for switching backends. In this commit there's only a default backend
that uses l_netconfig_rtnl_apply() and a struct resolve object to write
the configuration.
netconfig_gateway_to_arp is moved from netconfig.c to netconfig-commit.c
(and renamed.) The struct netconfig definition is moved to netconfig.h
so that both files can access the settings stored in the struct.
To avoid repeated lookups by ifindex, replace the ifindex member in
struct netconfig with a struct netdev pointer. A struct netconfig
always lives shorter than the struct netdev.
* make the error handling simpler,
* make error messages more consistent,
* validate address families,
* for IPv4 skip l_rtnl_address_set_noprefixroute()
as l_netconfig will do this internally as needed.
* for IPv6 set the default prefix length to 64 as that's going to be
used for the local prefix route's prefix length and is a more
practical value.
Drop all the struct netconfig members where we were keeping the parsed
netconfig settings and add a struct l_netconfig object. In
netconfig_load_settings load all of the settings once parsed directly
into the l_netconfig object. Only preserve the mdns configuration and
save some boolean values needed to properly handle static configuration
and FILS. Update functions to use the new set of struct netconfig
members.
These booleans mirroring the l_netconfig state could be replaced by
adding l_netconfig getters for settings which currently only have
setters.
In anticipation of switching to use the l_netconfig API, which
internally handles DHCPv4, DHCPv6, ACD, etc., drop pointers to
instances of l_dhcp_client, l_dhcp6_client and l_acd from struct
netconfig. Also drop all code used for handling events from these
APIs, including code to commit the received configurations to the
system. Committing the final settings to the system netdevs is going to
be handled by a new set of utilities in a new file.
The RRM module was blindly scanning using the requested
frequency which may or may not be possible given the hardware.
Instead check that the frequency will work and if not reject
the request.
This was reported by a user seeing the RRM scan fail which was
due to the AP requesting a scan on 5GHz when the adapter was
2.4GHz only.
Support for MAC address changes while powered was recently added to
mac80211. This avoids the need to power down the device which both
saves time as well as preserves any allowed frequencies which may
have been disabled if the device powered down.
The code path for changing the address was reused but now just the
'up' callback will be provided directly to l_rtnl_set_mac. Since
there aren't multiple stages of callbacks the rtnl_data structure
isn't strictly needed, but the code looks cleaner and more
consistent between the powered/non-powered code paths.
The comment/debug error print was also updated to be more general
between the two MAC change code paths.
Documentation for MulticastDNS setting suggests it should be part of the
main iwd configuration file. See man iwd.config. However, in reality
the setting was being pulled from the network provisioning file instead.
The latter actually makes more sense since systemd-resolved has its own
set of global defaults. Fix the documentation to reflect the actual
implementation.
netdev does not keep any pointers to struct scan_bss arguments that are
passed in. Make this explicitly clear by modifying the API definitions
and mark these as const.
This adds a new netdev event for packet loss notifications from
the kernel. Depending on the scenario a station may see packet
loss events without any other indications like low RSSI. In these
cases IWD should still roam since there is no data flowing.
Some APs use an older hostapd OWE implementation which incorrectly
derives the PTK. To work around this group 19 should be used for
these APs. If there is a failure (reason=2) and the AKM is OWE
set force default group into network and retry. If this has been
done already the behavior is no different and the BSS will be
blacklisted.
If a OWE network is buggy and requires the default group this info
needs to be stored in network in order for it to set this into the
handshake on future connect attempts.
This functionality works around the kernel's behavior of allowing
6GHz only after a regulatory domain update. If the regdom updates
scan.c needs to be aware in order to split up periodic scans, or
insert 6GHz frequencies into an ongoing periodic scan. Doing this
allows any 6GHz BSS's to show up in the scan results rather than
needing to issue an entirely new scan to see these BSS's.
The kernel's regulatory domain updates after some number of beacons
are processed. This triggers a regulatory domain update (and wiphy
dump) but only after a scan request. This means a full scan started
prior to the regdom being set will not include any 6Ghz BSS's even
if the regdom was unlocked during the scan.
This can be worked around by splitting up a large scan request into
multiple requests allowing one of the first commands to trigger a
regdom update. Once the regdom updates (and wiphy dumps) we are
hopefully still scanning and could append an additional request to
scan 6GHz.
In the case of an external scan, we won't have a scan_request object,
sr. Make sure to not crash in this case.
Also, since scan_request can no longer carry the frequency set in all
cases, add a new member to scan_results in order to do so.
Fixes: 27d8cf4ccc ("scan: track scanned frequencies for entire request")
The kernel handles setting the regulatory domain by receiving beacons
which set the country IE. Presumably since most regulatory domains
disallow 6GHz the default (world) domain also disables it. This means
until the country is set, 6GHz is disabled.
This poses a problem for IWD's quick scanning since it only scans a few
frequencies and this likely isn't enough beacons for the firmware to
update the country, leaving 6Ghz inaccessable to the user without manual
intervention (e.g. iw scan passive, or periodic scans by IWD).
To try and work around this limitation the quick scan logic has been
updated to check if a 6GHz AP has been connected to before and if that
frequency is disabled (but supported). If this is the case IWD will opt
for a full passive scan rather than scanning a limited set of
frequencies.
For whatever reason the kernel will send regdom updates even if
the regdom didn't change. This ends up causing wiphy to dump
which isn't needed since there should be no changes in disabled
frequencies.
Now the previous country is checked against the new one, and if
they match the wiphy is not dumped again.
A change in regulatory domain can result in frequencies being
enabled or disabled depending on the domain. This effects the
frequencies stored in wiphy which other modules depend on
such as scanning, offchannel work etc.
When the regulatory domain changes re-dump the wiphy in order
to update any frequency restrictions.
A helper to check whether the country code corresponds to a
real country, or some special code indicating the country isn't
yet set. For now, the special codes are OO (world roaming) and
XX (unknown entity).
Events to indicate when a regulatory domain wiphy dump has
started and ended. This is important because certain actions
such as scanning need to be delayed until the dump has finished.
The NEW_SCAN_RESULTS handling was written to only parse the frequency
list if there were no additional scan commands to send. This results in
the scan callback containing frequencies of only the last CMD_TRIGGER.
Until now this worked fine because a) the queue is only used for hidden
networks and b) frequencies were never defined by any callers scanning
for hidden networks (e.g. dbus/periodic scans).
Soon the scan command queue will be used to break up scan requests
meaning only the last scan request frequencies would be used in the
callback, breaking the logic in station.
Now the NEW_SCAN_RESULTS case will parse the frequencies for each scan
command rather than only the last.
The compiler treated the '1' as an int type which was not big enough
to hold a bit shift of 31:
runtime error: left shift of 1 by 31 places cannot be represented in
type 'int'
Instead of doing the iftype check manually, refactor
wiphy_get_supported_iftypes by adding a subroutine which just parses
out iftypes from a mask into a char** list. This removes the need to
case each iftype into a string.
Add extra logging around CQM events to help track wifi status. This is
useful for headless systems that can only be accessed over the network
and so information in the logs is invaluable for debugging outages.
Prior to this change, the only log for CQM messages is saying one was
received. This adds details to what attributes were set and the
associated data with them.
The signal strength log format was chosen to roughly match
wpa_supplicant's which looks like this:
CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-60 noise=-96 txrate=6000
Provides useful information on why a roam might have failed, such as
failing to find the BSS or the BSS being ranked lower, and why that
might be.
The output format is the same as station_add_seen_bss for consistency.
If a frequency is disabled IWD should keep track and disallow any
operations on that channel such as scanning. A new list has been added
which contains only disabled frequencies.
The scan_passive API wasn't using a const struct scan_freq_set as it
should be since it's not modifying the contents. Changing this to
const did require some additional changes like making the scan_parameters
'freqs' member const as well.
After changing scan_parameters, p2p needed updating since it was using
scan_parameters.freqs directly. This was changed to using a separate
scan_freq_set pointer, then setting to scan_parameters.freqs when needed.