This was added to support a single buggy AP model that failed to
negotiate the SAE group correctly. This may still be a problem but
since then the [Network].UseDefaultEccGroup option has been added
which accomplishes the same thing.
Remove the special handling for this specific OUI and rely on the
user setting the new option if they have problems.
Experimental AP-mode support for receiving a Confirm frame when in the
COMMITTED state. The AP will reply with a Confirm frame.
Note that when acting as an AP, on reception of a Commit frame, the AP
only replies with a Commit frame. The protocols allows to also already
send the Confirm frame, but older clients may not support simultaneously
receiving a Commit and Confirm frame.
This was overlooked in a prior patch and causes warnings to be
printed when the RSSI is too low to estimate an HE data rate or
due to incompatible local capabilities (e.g. MCS support).
Similar to the other estimations, return -ENETUNREACH if the IE
was valid but incompatible.
If the RSSI is too low or the local capabilities were not
compatible to estimate the rate don't warn but instead treat
this the same as -ENOTSUP and drop down to the next capability
set.
If we register the main EAPOL frame listener as late as the associate
event, it may not observe ptk_1_of_4. This defeats handling for early
messages in eapol_rx_packet, which only sees messages once it has been
registered.
If we move registration to the authenticate event, then the EAPOL
frame listeners should observe all messages, without any possible
races. Note that the messages are not actually processed until
eapol_start() is called, and we haven't moved that call site. All
that's changing here is how early EAPOL messages can be observed.
netdev_disconnect() was unconditionally sending CMD_DISCONNECT which
is not the right behavior when IWD has not associated. This means
that if a connection was started then immediately canceled with
the Disconnect() method the kernel would continue to authenticate.
Instead if IWD has not yet associated it should send a deauth
command which causes the kernel to correctly cleanup its state and
stop trying to authenticate.
Below are logs showing the behavior. Autoconnect is started followed
immediately by a DBus Disconnect call, yet the kernel continues
sending authenticate events.
event: state, old: autoconnect_quick, new: connecting (auto)
src/scan.c:scan_cancel() Trying to cancel scan id 1 for wdev 7d
src/wiphy.c:wiphy_radio_work_done() Work item 1 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/station.c:station_dbus_disconnect()
src/station.c:station_reset_connection_state() 85
src/station.c:station_roam_state_clear() 85
event: state, old: connecting (auto), new: disconnecting
src/wiphy.c:wiphy_radio_work_done() Work item 2 done
src/station.c:station_connect_cb() 85, result: 5
src/station.c:station_disconnect_cb() 85, success: 1
event: state, old: disconnecting, new: disconnected
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_authenticate_event()
Unexpected connection related event -- is another supplicant running?
In most cases any failure here is likely just due to the AP not
supporting the feature, whether its HE/VHT/HE. This should result
in the estimation returning -ENOTSUP in which case we move down
the list. Any other non-zero return we will now warn to make it
clear the IEs did exist, but were not properly formatted.
All length check failures were changed to continue instead of
fail. This will now treat invalid lengths as if the IE did not
exist.
In addition HE specifically has an extra validation function which,
if failed, was bailing out of the estimation function entirely.
Instead this is now treated as if there was no HE capabilities and
the logic can move down to VHT, HT, or basic rates.
Caught by static analysis, the dev->conn_peer pointer was being
dereferenced very early on without a NULL check, but further it
was being NULL checked. If there is a possibility of it being NULL
the check should be done much earlier.
Caught by static analysis, if ATTR_MAC was not in the message there
would be a memcpy with uninitialized bytes. In addition there is no
reason to memcpy twice. Instead 'mac' can be a const pointer which
both verifies it exists and removes the need for a second memcpy.
Static analysis complains that 'last' could be NULL which is true.
This really could only happen if every frequency was disabled which
likely is impossible but in any case, check before dereferencing
the pointer.
Since these are all stack variables they are not zero initialized.
If parsing fails there may be invalid pointers within the structures
which can get dereferenced by p2p_clear_*
The input queue pointer was being initialized unconditionally so if
parsing fails the out pointer is still set after the queue is
destroyed. This causes a crash during cleanup.
Instead use a temporary pointer while parsing and only after parsing
has finished do we set the out pointer.
Reported-By: Alex Radocea <alex@supernetworks.org>
static analysis complains that authenticator is used uninitialized.
This isn't strictly true as memory region is reserved for the
authenticator using the contents of the passed in structure. This
region is then overwritten once the authenticator is actually computed
by authenticator_put(). Silence this warning by explicitly setting
authenticator bytes to 0.
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
This shouldn't be possible in theory since the roam_bss_list being
iterated is a subset of entire scan_bss list station/network has
but to be safe, and catch any issues due to future changes warn on
this condition.
For some encrypt operations DPP passes no AD iovecs (both are
NULL/0). But since the iovec itself is on the stack 'ad' is a
valid pointer from within aes_siv_encrypt. This causes memcpy
to be called which coverity complains about. Since the copy
length is zero it was effectively a no-op, but check num_ad to
prevent the call.
In order to complete the learned default group behavior station needs
to be aware of when an SAE/OWE connection retried. This is all
handled within netdev/sae so add a new netdev event so station can
set the appropriate network flags to prevent trying the non-default
group again.
If either the settings specify it, or the scan_bss is flagged, set
the use_default_ecc_group flag in the handshake.
This also renames the flag to cover both OWE and SAE
There is special handling for buggy OWE APs which set a network flag
to use the default OWE group. Utilize the more persistent setting
within known-networks as well as the network object (in case there
is no profile).
This also renames the get/set APIs to be generic to ECC groups rather
than only OWE.
This adds the option [Settings].UseDefaultEccGroup which allows a
network profile to specify the behavior when using an ECC-based
protocol. If unset (default) IWD will learn the behavior of the
network for the lifetime of its process.
Many APs do not support group 20 which IWD tries first by default.
This leads to an initial failure followed by a retry using group 19.
This option will allow the user to configure IWD to use group 19
first or learn the network capabilities, if the authentication fails
with group 20 IWD will always use group 19 for the process lifetime.
The information specific to auth/assoc/connect timeouts isn't
communicated to station so emit the notice events within netdev.
We could communicate this to station by adding separate netdev
events, but this does not seem worth it for this use case as
these notice events aren't strictly limited to station.
For anyone debugging or trying to identify network infrastructure
problems the IWD DBus API isn't all that useful and ultimately
requires going through debug logs to figure out exactly what
happened. Having a concise set of debug logs containing only
relavent information would be very useful. In addition, having
some kind of syntax for these logs to be parsed by tooling could
automate these tasks.
This is being done, starting with station, by using iwd_notice
which internally uses l_notice. The use of the notice log level
(5) in IWD will be strictly for the type of messages described
above.
iwd_notice is being added so modules can communicate internal
state or event information via the NOTICE log level. This log
level will be reserved in IWD for only these type of messages.
The iwd_notice macro aims to help enforce some formatting
requirements for these type of log messages. The messages
should be one or more comma-separated "key: value" pairs starting
with "event: <name>" and followed by any additional info that
pertains to that event.
iwd_notice only enforces the initial event key/value format and
additional arguments are left to the caller to be formatted
correctly.
The --logger,-l flag can now be used to specify the logger type.
Unset (default) will set log output to stderr as it is today. The
other valid options are "syslog" and "journal".
basename use is considered harmful. There are two versions of
basename (see man 3 basename for details). The more intuitive version,
which is currently being used inside wiphy.c, is not supported by musl
libc implementation. Use of the libgen version is not preferred, so
drop use of basename entirely. Since wiphy.c is the only call site of
basename() inside iwd, open code the required logic.
ELL now has a setting to limit the number of DHCP attempts. This
will now be set in IWD and if reached will result in a failure
event, and in turn a disconnect.
IWD will set a maximum of 4 retries which should keep the maximum
DHCP time to ~60 seconds roughly.
The known frequency list is now a sorted list and the roam scan
results were not complying with this new requirement. The fix is
easy though since the iteration order of the scan results does
not matter (the roam candidates are inserted by rank). To fix
the known frequencies order we can simply reverse the scan results
list before iterating it.
When operating as an AP, drop message 4 of the 4-way handshake if the AP
has not yet received message 2. Otherwise an attacker can skip message 2
and immediately send message 4 to bypass authentication (the AP would be
using an all-zero ptk to verify the authenticity of message 4).
In very large network deployments there could be a vast amount of APs
which could create a large known frequency list after some time once
all the APs are seen in scan results. This then increases the quick
scan time significantly, in the very worst case (but unlikely) just
as long as a full scan.
To help with this support in knownnetworks was added to limit the
number of frequencies per network. Station will now only get 5
recent frequencies per network making the maximum frequencies 25
in the worst case (~2.5s scan).
The magic values are now defines, and the recent roam frequencies
was also changed to use this define as well.
In order to support an ordered list of known frequencies the list
should be in order of last seen BSS frequencies with the highest
ranked ones first. To accomplish this without adding a lot of
complexity the frequencies can be pushed into the list as long as
they are pushed in reverse rank order (lowest rank first, highest
last). This ensures that very high ranked BSS's will always get
superseded by subsequent scans if not seen.
This adds a new network API to update the known frequency list
based on the current newtork->bss_list. This assumes that station
always wipes the BSS list on scans and populates with only fresh
BSS entries. After the scan this API can be called and it will
reverse the list, then add each frequency.
I've had connections to a WPA3-Personal only network fail with no log
message from iwd, and eventually figured out to was because the driver
would've required using CMD_EXTERNAL_AUTH. With the added log messages
the reason becomes obvious.
Additionally the fallback may happen even if the user explicitly
configured WPA3 in NetworkManager, I believe a warning is appropriate
there.
There was an unhandled corner case if netconfig was running and
multiple roam conditions happened in sequence, all before netconfig
had completed. A single roam before netconfig was already handled
(23f0f5717c) but this did not take into account any additional roam
conditions.
If IWD is in this state, having started netconfig, then roamed, and
again restarted netconfig it is still in a roaming state which will
prevent any further roams. IWD will remain "stuck" on the current
BSS until netconfig completes or gets disconnected.
In addition the general state logic is wrong here. If IWD roams
prior to netconfig it should stay in a connecting state (from the
perspective of DBus).
To fix this a new internal station state was added (no changes to
the DBus API) to distinguish between a purely WiFi connecting state
(STATION_STATE_CONNECTING/AUTO) and netconfig
(STATION_STATE_NETCONFIG). This allows IWD roam as needed if
netconfig is still running. Also, some special handling was added so
the station state property remains in a "connected" state until
netconfig actually completes, regardless of roams.
For some background this scenario happens if the DHCP server goes
down for an extended period, e.g. if its being upgraded/serviced.
This gives the tests a lot more fine-tune control to wait for
specific state transitions rather than only what is exposed over
DBus.
The additional events for "ft-roam" and "reassoc-roam" were removed
since these are now covered by the more generic state change events
("ft-roaming" and "roaming" respectively).
Before this change DPP was writing the credentials both to disk
and into the network object directly. This allowed the connection
to work fine but additional settings were not picked up due to
network_set_passphrase/psk loading the settings before they were
written.
Instead DPP can avoid setting the credentials to the network
object entirely and just write them to disk. Then, wait for
known networks to notify that the profile was either created
or updated then DPP can proceed to connecting. network_autoconnect()
will take care of loading the profile that DPP wrote and remove the
need for DPP to touch the network object at all.
One thing to note is that an idle callback is still needed from
within the known networks callback. This is because a new profile
requires network.c to set the network_info which is done in the
known networks callback. Rather than assume that network.c will be
called into before dpp.c an l_idle was added.
If a known network is modified on disk known networks does not have
any way of notifying other modules. This will be needed to support a
corner case in DPP if a profile exists but is overwritten after DPP
configuration. Add this event to known networks and handle it in
network.c (though nothing needs to be done in that case).
Without the change test-dpp fails on aarch64-linux as:
$ unit/test-dpp
TEST: DPP test responder-only key derivation
TEST: DPP test mutual key derivation
TEST: DPP test PKEX key derivation
test-dpp: unit/test-dpp.c:514: test_pkex_key_derivation: Assertion `!memcmp(tmp, __tmp, 32)' failed.
This happens due to int/size_t type mismatch passed to vararg
parameters to prf_plus():
bool prf_plus(enum l_checksum_type type, const void *key, size_t key_len,
void *out, size_t out_len,
size_t n_extra, ...)
{
// ...
va_start(va, n_extra);
for (i = 0; i < n_extra; i++) {
iov[i + 1].iov_base = va_arg(va, void *);
iov[i + 1].iov_len = va_arg(va, size_t);
// ...
Note that varargs here could only be a sequence of `void *` / `size_t`
values.
But in src/dpp-util.c `iwd` attempted to pass `int` there:
prf_plus(sha, prk, bytes, z_out, bytes, 5,
mac_i, 6, // <- here
mac_r, 6, // <- and here
m_x, bytes,
n_x, bytes,
key, strlen(key));
aarch64 stores only 32-bit value part of the register:
mov w7, #0x6
str w7, [sp, #...]
and loads full 64-bit form of the register:
ldr x3, [x3]
As a result higher bits of `iov[].iov_len` contain unexpected values and
sendmsg sends a lot more data than expected to the kernel.
The change fixes test-dpp test for me.
While at it fixed obvious `int` / `size_t` mismatch in src/erp.c.
Fixes: 6320d6db0f ("crypto: remove label from prf_plus, instead use va_args")
The path argument was used purely for debugging. It can be just as
informational printing just the SSID of the profile that failed to
parse the setting without requiring callers allocate a string to
call the function.
Adds a new network profile setting [Security].PasswordIdentifier.
When set (and the BSS enables SAE password identifiers) the network
and handshake object will read this and use it for the SAE
exchange.
Building the handshake will fail if:
- there is no password identifier set and the BSS sets the
"exclusive" bit.
- there is a password identifier set and the BSS does not set
the "in-use" bit.
Using this will provide netdev with a connect callback and unify the
roaming result notification between FT and reassociation. Both paths
will now end up in station_reassociate_cb.
This also adds another return case for ft_handshake_setup which was
previously ignored by ft_associate. Its likely impossible to actually
happen but should be handled nevertheless.
Fixes: 30c6a10f28 ("netdev: Separate connect_failed and disconnected paths")
Essentially exposes (and renames) netdev_ft_tx_associate in order to
be called similarly to netdev_reassociate/netdev_connect where a
connect callback can be provided. This will fix the current bug where
if association times out during FT IWD will hang and never transition
to disconnected.
This also removes the calling of the FT_ROAMED event and instead just
calls the connect callback (since its now set). This unifies the
callback path for reassociation and FT roaming.
This will be called from station after FT-authentication has
finished. It sets up the handshake object to perform reassociation.
This is essentially a copy-paste of ft_associate without sending
the actual frame.
In general only the authenticator FTE is used/validated but with
some FT refactoring coming there needs to be a way to build the
supplicants FTE into the handshake object. Because of this there
needs to be separate FTE buffers for both the authenticator and
supplicant.
For adding SAE password identifiers the capability bits need to be
verified when loading the identifier from the profile. Pass the
BSS object in to network_load_psk rather than the 'need_passphrase'
boolean.
iov_ie_append assumed that a single IE was being added and thus the
length of the IE could be extracted directly from the element. However,
iov_ie_append was used on buffers which could contain multiple IEs
concatenated together, for example in handshake_state::vendor_ies. Most
of the time this was safe since vendor_ies was NULL or contained a
single element, but would result in incorrect behavior in the general
case. Fix that by changing iov_ie_append signature to take an explicit
length argument and have the caller specify whether the element is a
single IE or multiple.
Fixes: 7e9971661b ("netdev: Append any vendor IEs from the handshake")
Use an _auto_ variable to cleanup IEs allocated by
p2p_build_association_req(). While here, take out unneeded L_WARN_ON
since p2p_build_association_req cannot fail.
If the FT-Authenticate frame has been sent then a deauth is received
the work item for sending the FT-Associate frame is never canceled.
When this runs station->connected_network is NULL which causes a
crash:
src/station.c:station_try_next_transition() 7, target xx:xx:xx:xx:xx:xx
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5843
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5844
src/wiphy.c:wiphy_radio_work_done() Work item 5842 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5843
src/netdev.c:netdev_mlme_notify() MLME notification Remain on Channel(55)
src/ft.c:ft_send_authenticate()
src/netdev.c:netdev_mlme_notify() MLME notification Frame TX Status(60)
src/netdev.c:netdev_link_notify() event 16 on ifindex 7
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_deauthenticate_event()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
Received Deauthentication event, reason: 7, from_ap: true
src/station.c:station_disconnect_event() 7
src/station.c:station_disassociated() 7
src/station.c:station_reset_connection_state() 7
src/station.c:station_roam_state_clear() 7
src/netconfig.c:netconfig_event_handler() l_netconfig event 2
src/netconfig-commit.c:netconfig_commit_print_addrs() removing address: yyy.yyy.yyy.yyy
src/resolve.c:resolve_systemd_revert() ifindex: 7
[DHCPv4] l_dhcp_client_stop:1264 Entering state: DHCP_STATE_INIT
src/station.c:station_enter_state() Old State: connected, new state: disconnected
src/station.c:station_enter_state() Old State: disconnected, new state: autoconnect_quick
src/wiphy.c:wiphy_radio_work_insert() Inserting work item 5845
src/netdev.c:netdev_mlme_notify() MLME notification Cancel Remain on Channel(56)
src/wiphy.c:wiphy_radio_work_done() Work item 5843 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5844
"Program terminated with signal SIGSEGV, Segmentation fault.",
"#0 0x0000565359ee3f54 in network_bss_find_by_addr ()",
"#0 0x0000565359ee3f54 in network_bss_find_by_addr ()",
"#1 0x0000565359ec9d23 in station_ft_work_ready ()",
"#2 0x0000565359ec0af0 in wiphy_radio_work_next ()",
"#3 0x0000565359f20080 in offchannel_mlme_notify ()",
"#4 0x0000565359f4416b in received_data ()",
"#5 0x0000565359f40d90 in io_callback ()",
"#6 0x0000565359f3ff4d in l_main_iterate ()",
"#7 0x0000565359f4001c in l_main_run ()",
"#8 0x0000565359f40240 in l_main_run_with_signal ()",
"#9 0x0000565359eb3888 in main ()"
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: d938d362b2 ("erp: ERP implementation and key cache move")
Fixes: 433373fe28 ("eapol: cache ERP keys on EAP success")
ssid is declared as a 32 byte field in handshake_state, hence using it
as a string which is assumed to be nul-terminated will fail for SSIDs
that are 32 bytes long.
Fixes: 1f14782857 ("wiphy: add _generate_address_from_ssid")
Fixes: 5a1b1184fc ("netdev: support per-network MAC addresses")
In netdev_retry_owe, if l_gen_family_send fails, the connect_cmd is
never freed or reset. Fix that.
While here, use a stack variable instead of netdev member, since the use
of such a member is unnecessary and confusing.
vendor_ies stored in handshake_state are already added as part of
netdev_populate_common_ies(), which is already invoked by
netdev_build_cmd_connect().
Normally vendor_ies is NULL for OWE connections, so no IEs are
duplicated as a result.
CC src/adhoc.o
In file included from src/adhoc.c:28:0:
/usr/include/linux/if.h:234:19: error: field ‘ifru_addr’ has incomplete type
struct sockaddr ifru_addr;
^
/usr/include/linux/if.h:235:19: error: field ‘ifru_dstaddr’ has incomplete type
struct sockaddr ifru_dstaddr;
^
/usr/include/linux/if.h:236:19: error: field ‘ifru_broadaddr’ has incomplete type
struct sockaddr ifru_broadaddr;
^
/usr/include/linux/if.h:237:19: error: field ‘ifru_netmask’ has incomplete type
struct sockaddr ifru_netmask;
^
/usr/include/linux/if.h:238:20: error: field ‘ifru_hwaddr’ has incomplete type
struct sockaddr ifru_hwaddr;
^
Very rarely on ath10k (potentially other ath cards), disabling
power save while the interface is down causes a timeout when
bringing the interface back up. This seems to be a race in the
driver or firmware but it causes IWD to never start up properly
since there is no retry logic on that path.
Retrying is an option, but a more straight forward approach is
to just reorder the logic to set power save off after the
interface is already up. If the power save setting fails we can
just log it, ignore the failure, and continue. From a users point
of view there is no real difference in doing it this way as
PS still gets disabled prior to IWD connecting/sending data.
Changing behavior based on a buggy driver isn't something we
should be doing, but in this instance the change shouldn't have
any downside and actually isn't any different than how it has
been done prior to the driver quirks change (i.e. use network
manager, iw, or iwconfig to set power save after IWD starts).
For reference, this problem is quite rare and difficult to say
exactly how often but certainly <1% of the time:
iwd[1286641]: src/netdev.c:netdev_disable_ps_cb() Disabled power save for ifindex 54
kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
iwd[1286641]: Error bringing interface 54 up: Connection timed out
kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
After this IWD just sits idle as it has no interface to start using.
This is even reproducable outside of IWD if you loop and run:
ip link set <wlan> down
iw dev <wlan> set power_save off
ip link set <wlan> up
Eventually the 'up' command will fail with a timeout.
I've brought this to the linux-wireless/ath10k mailing list but
even if its fixed in future kernels we'd still need to support
older kernels, so a workaround/change in IWD is still required.
This is done already for DPP, do the same for PKEX. Few drivers
(ath9k upstream, ath10k/11k in progress) support this which is
unfortunate but since a configurator will not work without this
capability its best to fail early.
The DPP spec allows 3rd party fields in the DPP configuration
object (section 4.5.2). IWD can take advantage of this (when
configuring another IWD supplicant) to communicate additional
profile options that may be required for the network.
The new configuration member will be called "/net/connman/iwd"
and will be an object containing settings specific to IWD.
More settings could be added here if needed but for now only
the following are defined:
{
send_hostname: true/false,
hidden: true/false
}
These correspond to the following network profile settings:
[IPv4].SendHostname
[Settings].Hidden
The scan result handling was fragile because it assumed the kernel
would only give results matching the requested SSID. This isn't
something we should assume so instead keep the configuration object
around until after the scan and use the target SSID to lookup the
network.
Nearly every use of the ssid member first has to memcpy it to a
buffer and NULL terminate. Instead just store the ssid as a
string when creating/parsing from JSON.
The DPP-PKEX spec provides a very limited list of frequencies used
to discover configurators, only 3 on 2.4 and 5GHz bands. Since
configurators (at least in IWD's implementation) are only allowed
on the current operating frequency its very unlikely an enrollee
will find a configurator on these frequencies out of the entire
spectrum.
The spec does mention that the 3 default frequencies should be used
"In lieu of specific channel information obtained in a manner outside
the scope of this specification, ...". This allows the implementation
some flexibility in using a broader range of frequencies.
To increase the chances of finding a configurator shared code
enrollees will first issue a scan to determine what access points are
around, then iterate these frequencies. This is especially helpful
when the configurators are IWD-based since we know that they'll be
on the same channels as the APs in the area.
The post-DPP connection was never done quite right due to station's
state being unknown. The state is now tracked in DPP by a previous
patch but the scan path in DPP is still wrong.
It relies on station autoconnect logic which has the potential to
connect to a different network than what was configured with DPP.
Its unlikely but still could happen in theory. In addition the scan
was not selectively filtering results by the SSID that DPP
configured.
This fixes the above problems by first filtering the scan by the
SSID. Then setting the scan results into station without triggering
autoconnect. And finally using network_autoconnect() directly
instead of relying on station to choose the SSID.
DPP (both DPP and PKEX) run the risk of odd behavior if station
decides to change state. DPP is completely unaware of this and
best case would just result in a protocol failure, worst case
duplicate calls to __station_connect_network.
Add a station watch and stop DPP if station changes state during
the protocol.
Commit c59669a366 ("netdev: disambiguate between disconnection types")
introduced different paths for different types of disconnection
notifications from netdev. Formalize this further by having
netdev_connect_failed only invoke connect_cb.
Disconnections that could be triggered outside of connection
related events are now handled on a different code path. For this
purpose, netdev_disconnected() is introduced.
When a roam event is received, iwd generates a firmware scan request and
notifies its event filter of the ROAMING condition. In cases where the
firmware scan could not be started successfully, netdev_connect_failed
is invoked. This is not a correct use of netev_connect_failed since it
doesn't actually disconnect the underlying netdev and the reflected
state becomes de-synchronized from the underlying kernel device.
The firmware scan request could currently fail for two reasons:
1. nl80211 genl socket is in a bad state, or
2. the scan context does not exist
Since both reasons are highly unlikely, simply use L_WARN instead.
The other two cases where netdev_connect_failed is used could only occur
if the kernel message is invalid. The message is ignored in that case
and a warning is printed.
The situation described above also exists in netdev_get_fw_scan_cb. If
the scan could not be completed successfully, there's not much iwd can
do to recover. Have iwd remain in roaming state and print an error.
There are generally three scenarios where iwd generates a disconnection
command to the kernel:
1. Error conditions stemming from a connection related event. For
example if SAE/FT/FILS authentication fails during Authenticate or
Associate steps and the kernel doesn't disconnect properly.
2. Deauthentication after the connection has been established and not
related to a connection attempt in progress. For example, SA Query
processing that triggers an disconnect.
3. Disconnects that are triggered due to a handshake failure or if
setting keys resulting from the handshake fails. These disconnects
can be triggered as a result of a pending connection or when a
connection has been established (e.g. due to rekeying).
Distinguish between 1 and 2/3 by having the disconnect procedure take
different paths. For now there are no functional changes since all
paths end up in netdev_connect_failed(), but this will change in the
future.
While here, also get rid of netdev_del_station. The only user of this
function was in ap.c and it could easily be replaced by invoking the new
nl80211_build_del_station function. The callback used by
netdev_build_del_station only printed an error and didn't do anything
useful. Get rid of it for now.
netdev_begin_connection() already invokes netdev_connect_failed on
error. Remove any calls to netdev_connect_failed in callers of
netdev_begin_connection().
Fixes: 4165d9414f ("netdev: use wiphy radio work queue for connections")
If netdev_get_oci fails, a goto deauth is invoked in order to terminate
the current connection and return an error to the caller. Unfortunately
the deauth label builds CMD_DEAUTHENTICATE in order to terminate the
connection. This was fine because it used to handle authentication
protocols that ran over CMD_AUTHENTICATE and CMD_ASSOCIATE. However,
OCI can also be used on FullMAC hardware that does not support them.
Use CMD_DISCONNECT instead which works everywhere.
Fixes: 06482b8116 ("netdev: Obtain operating channel info")
The reason code field was being obtained as a uint8_t value, while it is
actually a uint16_t in little-endian byte order.
Fixes: f3cc96499c ("netdev: added support for SA Query")
The reason code from deauthentication frame was being obtained as a
uint8_t instead of a uint16_t. The value was only ever used in an
informational statement. Since the value was in little endian, only the
first 8 bits of the reason code were obtained. Fix that.
Fixes: 2bebb4bdc7 ("netdev: Handle deauth frames prior to association")
Adds a configurator variant to be used along side an agent. When
called the configurator will start and wait for an initial PKEX
exchange message from an enrollee at which point it will request
the code from an agent. This provides more flexibility for
configurators that are capable of configuring multiple enrollees
with different identifiers/codes.
Note that the timing requirements per the DPP spec still apply
so this is not meant to be used with a human configurator but
within an automated agent which does a quick lookup of potential
identifiers/codes and can reply within the 200ms window.
The PKEX configurator role is currently limited to being a responder.
When started the configurator will listen on its current operating
channel for a PKEX exchange request. Once received it and the
encrypted key is properly decrypted it treats this peer as the
enrollee and won't allow configurations from other peers unless
PKEX is restarted. The configurator will encrypt and send its
encrypted ephemeral key in the PKEX exchange response. The enrollee
then sends its encrypted bootstrapping key (as commit-reveal request)
then the same for the configurator (as commit-reveal response).
After this, PKEX authentication begins. The enrollee is expected to
send the authenticate request, since its the initiator.
This is the initial support for PKEX enrollees acting as the
initiator. A PKEX initiator starts the protocol by broadcasting
the PKEX exchange request. This request contains a key encrypted
with the pre-shared PKEX code. If accepted the peer sends back
the exchange response with its own encrypted key. The enrollee
decrypts this and performs some crypto/hashing in order to establish
an ephemeral key used to encrypt its own boostrapping key. The
boostrapping key is encrypted and sent to the peer in the PKEX
commit-reveal request. The peer then does the same thing, encrypting
its own bootstrapping key and sending to the initiator as the
PKEX commit-reveal response.
After this, both peers have exchanged their boostrapping keys
securely and can begin DPP authentication, then configuration.
For now the enrollee will only iterate the default channel list
from the Easy Connect spec. Future upates will need to include some
way of discovering non-default channel configurators, but the
protocol needs to be ironed out first.
PKEX and DPP will share the same state machine since the DPP protocol
follows PKEX. This does pose an issue with the DBus interfaces
because we don't want DPP initiated by the SharedCode interface to
start setting properties on the DeviceProvisioning interface.
To handle this a dpp_interface enum is being introduced which binds
the dpp_sm object to a particular interface, for the life of the
protocol run. Once the protocol finishes the dpp_sm can be unbound
allowing either interface to use it again later.
This mispelling was present in the configuration, so I retained parsing
of the legacy BandModifier*Ghz options for compatibility. Without this
change anyone spelling GHz correctly in their configs would be very
confused.
Beacon loss handling was removed in the past because it was
determined that this even always resulted in a disconnect. This
was short sighted and not always true. The default kernel behavior
waits for 7 lost beacons before emitting this event, then sends
either a few nullfuncs or probe requests to the BSS to determine
if its really gone. If these come back successfully the connection
will remain alive. This can give IWD some time to roam in some
cases so we should be handling this event.
Since beacon loss indicates a very poor connection the roam scan
is delayed by a few seconds in order to give the kernel a chance
to send the nullfuncs/probes or receive more beacons. This may
result in a disconnect, but it would have happened anyways.
Attempting a roam mainly handles the case when the connection can
be maintained after beacon loss, but is still poor.
This is being done to allow the DPP module to work correctly. DPP
currently uses __station_connect_network incorrectly since it
does not (and cannot) change the state after calling. The only
way to connect with a state change is via station_connect_network
which requires a DBus method that triggered the connection; DPP
does not have this due to its potentially long run time.
To support DPP there are a few options:
1. Pass a state into __station_connect_network (this patch)
2. Support a NULL DBus message in station_connect_network. This
would require several NULL checks and adding all that to only
support DPP just didn't feel right.
3. A 3rd connect API in station which wraps
__station_connect_network and changes the state. And again, an
entirely new API for only DPP felt wrong (I guess we did this
for network_autoconnect though...)
Its about 50/50 between call sites that changed state after calling
and those that do not. Changing the state inside
__station_connect_network felt useful enough to cover the cases that
could benefit and the remaining cases could handle it easily enough:
- network_autoconnect(), and the state is changed by station after
calling so it more or less follows the same pattern just routes
through network. This will now pass the CONNECTING_AUTO state
from within network vs station.
- The disconnect/reconnect path. Here the state is changed to
ROAMING prior in order to avoid multiple state changes. Knowing
this the same ROAMING state can be passed which won't trigger a
state change.
- Retrying after a failed BSS. The state changes on the first call
then remains the same for each connection attempt. To support this
the current station->state is passed to avoid a state change.
Until now IWD only supported enrollees as responders (configurators
could do both). For PKEX it makes sense for the enrollee to be the
initiator because configurators in the area are already on their
operating channel and going off is inefficient. For PKEX, whoever
initiates also initiates authentication so for this reason the
authentication path is being opened up to allow enrollees to
initiate.
The check for the header was incorrect according to the spec.
Table 58 indicates that the "Query Response Info" should be set
to 0x00 for the configuration request. The frame handler was
expecting 0x7f which is the value for the config response frame.
Unfortunately wpa_supplicant also gets this wrong and uses 0x7f
in all cases which is likely why this value was set incorrectly
in IWD. The issue is that IWD's config request is correct which
means IWD<->IWD configuration is broken. (and wpa_supplicant as
a configurator likely doesn't validate the config request).
Fix this by checking both 0x7f and 0x00 to handle both
supplicants.
Stopping periodic scans and not restarting them prevents autoconnect
from working again if DPP (or the post-DPP connect) fails. Since
the DPP offchannel work is at a higher priority than scanning (and
since new offchannels are queue'd before canceling) there is no risk
of a scan happening during DPP so its safe to leave periodic scans
running.
The packet loss handler puts a higher priority on roaming compared
to the low signal roam path. This is generally beneficial since this
event usually indicates some problem with the BSS and generally is
an indicator that a disconnect will follow sometime soon.
But by immediately issuing a scan we run the risk of causing many
successive scans if more packet loss events arrive following
the roam scans (and if no candidates are found). Logs provided
further.
To help with this handle the first event with priority and
immediately issue a roam scan. If another event comes in within a
certain timeframe (2 seconds) don't immediately scan, but instead
rearm the roam timer instead of issuing a scan. This also handles
the case of a low signal roam scan followed by a packet loss
event. Delaying the roam will at least provide some time for packets
to get out in between roam scans.
Logs were snipped to be less verbose, but this cycled happened
5 times prior. In total 7 scans were issued in 5 seconds which may
very well have been the reason for the local disconnect:
Oct 27 16:23:46 src/station.c:station_roam_failed() 9
Oct 27 16:23:46 src/wiphy.c:wiphy_radio_work_done() Work item 29 done
Oct 27 16:23:47 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:47 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:47 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 30
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_next() Starting work item 30
Oct 27 16:23:47 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:47 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification New Scan Results(34)
Oct 27 16:23:47 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
... scan results ...
Oct 27 16:23:47 src/station.c:station_roam_failed() 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_done() Work item 30 done
Oct 27 16:23:47 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:47 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:47 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 31
Oct 27 16:23:47 src/wiphy.c:wiphy_radio_work_next() Starting work item 31
Oct 27 16:23:47 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:47 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:47 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:48 src/scan.c:scan_notify() Scan notification New Scan Results(34)
Oct 27 16:23:48 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
... scan results ...
Oct 27 16:23:48 src/station.c:station_roam_failed() 9
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_done() Work item 31 done
Oct 27 16:23:48 src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
Oct 27 16:23:48 src/station.c:station_packets_lost() Packets lost event: 10
Oct 27 16:23:48 src/station.c:station_roam_scan() ifindex: 9
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_insert() Inserting work item 32
Oct 27 16:23:48 src/wiphy.c:wiphy_radio_work_next() Starting work item 32
Oct 27 16:23:48 src/station.c:station_start_roam() Using cached neighbor report for roam
Oct 27 16:23:48 src/scan.c:scan_notify() Scan notification Trigger Scan(33)
Oct 27 16:23:48 src/scan.c:scan_request_triggered() Active scan triggered for wdev a
Oct 27 16:23:49 src/netdev.c:netdev_link_notify() event 16 on ifindex 9
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Oct 27 16:23:49 src/netdev.c:netdev_deauthenticate_event()
Oct 27 16:23:49 src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Oct 27 16:23:49 src/netdev.c:netdev_disconnect_event()
Oct 27 16:23:49 Received Deauthentication event, reason: 4, from_ap: false
Include a specific timeout value so different protocols can specify
different timeouts. For example once the authentication timeout
should not take very long (even 10 seconds seems excessive) but
adding PKEX may warrant longer timeouts.
For example discovering a configurator IWD may want to wait several
minutes before ending the discovery. Similarly running PKEX as a
configurator we should put a hard limit on the time, but again
minutes rather than 10 seconds.
Its been seen (so far only in mac80211_hwsim + UML) where an
offchannel requests ACK comes after the ROC started event. This
causes the ROC started event to never call back to notify since
info->roc_cookie is unset and it appears to be coming from an
external process.
We can detect this situation in the ROC notify event by checking
if there is a pending ROC command and if info->roc_cookie does
not match. This can also be true for an external event so we just
set a new "early_cookie" member and return.
Then, when the ACK comes in for the ROC request, we can validate
if the prior event was associated with IWD or some external
process. If it was from IWD call the started callback, otherwise
the ROC notify event should come later and handled under the
normal logic where the cookies match.
Instead of looking up by wdev, lookup by the ID itself. We
shouldn't ever have more than one info per wdev in the queue but
looking up the _exact_ info structure doesn't hurt in case things
change in the future.
If netconfig is canceled before completion (when roaming) the
settings are freed and never loaded again once netconfig is started
post-roam. Now after a roam make sure to re-load the settings and
start netconfig.