ConnectHiddenNetwork can be seen a triggering this sequence:
1. the active scan,
2. the optional agent request,
3. the Authentication/Association/4-Way Handshake/netconfig,
4. connected state
Currently Disconnect() interrupts 3 and 4, allow it to also interrupt
state 1. It's difficult to tell whether we're in state 2 from within
station.c.
For multi-bss networks its nice to know which BSS is being connected
to. The ranking can hint at it, but blacklisting or network capabilities
could effect which network is actually chosen. An explicit debug print
makes debugging much easier.
Move the update of station->networks_sorted order to before we set
station->connected_network NULL to avoid a crash when we attempt to
use the NULL pointer.
Besides being undefined behaviour, signed integer overflow can cause
unexpected comparison results. In the case of network_rank_compare(),
a connected network with rank INT_MAX would cause newly inserted
networks with negative rank to be inserted earlier in the ordered
network list. This is reflected in the GetOrderedMethods() DBus method
as can be seen in the following iwctl output:
[iwd]# station wlan0 get-networks
Network name Security Signal
----------------------------------------------------
BEOLAN 8021x **** }
BeoBlue psk *** } all unknown,
UI_Test_Network psk *** } hence assigned
deneb_2G psk *** } negative rank
BEOGUEST open **** }
> titan psk ****
Linksys05274_5GHz_dmt psk ****
Lyngby-4G-4 5GHz psk ****
Doing so ensures that the currently connected network is always at the
beginning of the list. Previously, the list would only get updated after
a scan.
This fixes the documented behaviour of GetOrderedNetworks() DBus method,
which states that the currently connected network is always at the
beginning of the returned array.
To use the wiphy radio work queue, scanning mostly remained the same.
start_next_scan_request was modified to be used as the work callback,
as well as not start the next scan if the current one was done
(since this is taken care of by wiphy work queue now). All
calls to start_next_scan_request were removed, and more or less
replaced with wiphy_radio_work_done.
scan_{suspend,resume} were both removed since radio management
priorities solve this for us. ANQP requests can be inserted ahead of
scan requests, which accomplishes the same thing.
Before connecting to a hidden network we must scan. During this scan
if another connection attempt comes in the expected behavior is to
abort the original connection. Rather than waiting for the scan to
complete, then canceling the original hidden connection we can just
cancel the hidden scan immediately, reply to dbus, and continue with
the new connection attempt.
The new frame-xchg module now handles a lot of what ANQP used to do. ANQP
now does not need to depend on nl80211/netdev for building and sending
frames. It also no longer needs any of the request lookups, frame watches
or to maintain a queue of requests because frame-xchg filters this for us.
From an API perspective:
- anqp_request() was changed to take the wdev_id rather than ifindex.
- anqp_cancel() was added so that station can properly clean up ANQP
requests if the device disappears.
During testing a bug was also fixed in station on the timeout path
where the request queue would get popped twice.
When roaming, iwd tries to scan a limited number of frequencies to keep
the roaming latency down. Ideally the frequency list would come in from
a neighbor report, but if neighbor reports are not supported, we fall
back to our internal database for known frequencies of this network.
iwd tries to keep the number of scans down to a bare minimum, which
means that we might miss APs that are in range. This could happen
because the user might have moved physically and our frequency list is
no longer up to date, or if the AP frequencies have been reconfigured.
If a limited scan fails to find any good roaming candidates, re-attempt
a full scan right away.
If the roam failed and we are no longer connected, station_disassociated
is called which ends up calling station_roam_state_clear. Thus
resetting the variables is not needed. Reflow the logic to make this a
bit more explicit.
If the roam attempt fails, do not reset this to false. Generally this
is set by the fact that we lost beacon and to not attempt neighbor
reports, etc. This hint should be preserved across roam attempts.
Right now, if the connection fails, then network always thinks that the
password should be re-asked. Loosen this to only do so if the
connection failed at least in the handshake phase. If the connection
failed due to Association / Authentication timeout, it is likely that
something is wrong with the AP and it can't respond.
This is to allow network to watch for ANQP activity in order to
fix the race condition between scanning finishing and ANQP finishing.
Without this it is possible for a DBus Connect() to come in before
ANQP has completed and causing the network to return NotConfigured,
when its actually in the process of obtaining all the network info.
The watch was made globally in station due to network not having
a station object until each individual network is created. Adding a
watch during network creation would result in many watchers as well
as a lot of removal/addition as networks are found and lost.
Change signature of network_connect_new_hidden_network to take
reference to the caller's l_dbus_message struct. This allows to
set the caller's l_dbus_message struct to NULL after replying in
the case of a failure.
==201== at 0x467C15: l_dbus_message_unref (dbus-message.c:412)
==201== by 0x412A51: station_hidden_network_scan_results (station.c:2504)
==201== by 0x41EAEA: scan_finished (scan.c:1505)
==201== by 0x41EC10: get_scan_done (scan.c:1535)
==201== by 0x462592: destroy_request (genl.c:673)
==201== by 0x462987: process_unicast (genl.c:988)
==201== by 0x462987: received_data (genl.c:1087)
==201== by 0x45F5A2: io_callback (io.c:126)
==201== by 0x45E8FD: l_main_iterate (main.c:474)
==201== by 0x45E9BB: l_main_run (main.c:521)
==201== by 0x45EBCA: l_main_run_with_signal (main.c:643)
==201== by 0x403B15: main (main.c:512)
Introduce hidden_pending to keep reference to the dbus message object
while we wait for the scan results to be returned while trying to
connect to a hidden network. This simplifies the logic by separating it
into two independent logical units: scanning, connecting and eliminates
a possibility of a memory leak in the case when Network.Connect being
initiated while Station.ConnectHiddenNetwork is in progress.
If a connection is initiated (via dbus) while a quick scan is in
progress, the quick scan will be aborted. In this case,
station_quick_scan_results will always transition to the
AUTOCONNECT_FULL state regardless of whether it should or not.
Fix this by making sure that we only enter AUTOCONNECT_FULL if we're
still in the AUTOCONNECT_QUICK state.
Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
This patch adds two new options to a network provisioning file:
AlwaysRandomizeAddress={true,false}
If true, IWD will randomize the MAC address on each connection to this
network. The address does not persists between connections, any new
connection will result in a different MAC.
AddressOverride=<MAC>
If set, the MAC address will be set to <MAC> assuming its a valid MAC
address.
These two options should not be used together, and will only take effect
if [General].AddressRandomization is set to 'network' in the IWD
config file.
If neither of these options are set, and [General].AddressRandomization
is set to 'network', the default behavior remains the same; the MAC
will be generated deterministically on a per-network basis.
If a scan is requested during the middle of a connection we should
return busy instead of attempting the scan. The kernel ends up coming
back with not supported in this case, which is misleading and
difficult to debug.
For Radio Resource Management (RRM) we will need access to the currently
connected BSS as well as the last scan results in order to do certain
kinds of requested measurements.
On EAP events, call the handshake_event handler with the new event type
HANDSHAKE_EVENT_EAP_NOTIFY isntead of the eapol_event callback.
This allows the handler to be set before calling
netdev_connect/netdev_connect_wsc. It's also in theory more type-safe
because we don't need the cast in netdev_connect_wsc anymore.
Convert the handshake event callback type to use variable argument
list to allow for more flexibility in event-specific arguments
passed to the callbacks.
Note the uint16_t reason code is promoted to an int when using variable
arguments so va_arg(args, int) has to be used.
Previously, station state 'connected' used to identify an interface associated
with AP. With the introduction of netconfig, an interface is assumed to be
connected after the IP addresses have been assigned to it. If netconfig is
disabled, the behavior remains unchanged.
As a first step to enable the usage of netconfig in ead and
prospective transition to be a part of ell, the public API for
creation and destruction of the netconfig objects has been
renamed and changed. Instead of hiding the netconfig objects inside
of netconfig module, the object is now passed back to the caller.
The internal queue of netconfig objects remains untouched, due
to limitations in ell’s implementation of rtnl. After the proper
changes are done to ell, netconfig_list is expected to be removed
from netconfig module.
If neighbor reports are unavailable, or the report yielded no
results we can quickly scan for only known frequencies. This
changes the original behavior where we would do a full scan
in this case.
Station was building up the HS20 elements manually. Now we can
use this new API and let network take care of the complexity
of building network specific vendor IEs.
Since hotspot networks may require ANQP the autoconnect loop needed to
be delayed until after the ANQP results came back and the network
objects were updated. If there are hotspot networks in range ANQP will
be performed and once complete autoconnect will begin for all networks
including hotspots. If no hotspots are in range autoconnect will
proceed as it always has.
Note: Assuming hotspots are in range this will introduce some delay
in autoconnecting to any network since ANQP must come back. The full
plan is to intellegently decide when and when not to do ANQP in order
to minimize delays but since ANQP is disabled by default the behavior
introduced with this patch is acceptable.
For (Re)Association the HS20 indication element was passed exactly as
it was found in the scan results. The spec defines what bits can be
set and what cannot when this IE is used in (Re)Association. Instead
of assuming the AP's IE conforms to the spec, we now parse the IE and
re-build it for use with (Re)Association.
Since the full IE is no longer used, it was removed from scan_bss, and
replaced with a bit for HS20 support (hs20_capable). This member is
now used the same as hs20_ie was.
The version parsed during scan results is now used when building the
(Re)Association IE.
Previously, iwd used to throw net.connman.iwd.Busy when connection
attempt was made while connected. The new behavior allows iwd to
seamlessly disconnect from the connected network and attempt a new
connection.
The HS20 indication element should always be included during
(Re)Association per the spec. This removes the need for a
dedicated boolean, and now the hs20_ie can be used instead.
Per the hotspot 2.0 spec, if a matching roaming consortium OI is
found it should be added to the (Re)Association request. vendor_ies
can now be provided to netdev_connect, which get appended to the IE
attribute.
The ifindex is used to index the netdevs in the system (wlan, ethernet,
etc.) but we can also do wifi scanning on interfaces that have no
corresponding netdev object, like the P2P-device virtual interfaces.
Use the wdev id's to reference interfaces, the nl80211 api doesn't care
whether we use a NL80211_ATTR_IFINDEX or NL80211_ATTR_WDEV. Only
wireless interfaces have a wdev id.
After a scan, station can now pause future scans and start ANQP requests
to discover Hotspot's NAI realm. This lets us check if the AP's NAI realm
matches any stored hotspot configuration files. If so we can connect to
this network. If the network provides an HESSID and a matching one is
found in a hotspot provisioning file we can skip ANQP and directly connect
as this is expected to be our 'home network'
The handshake object had 4 setters for authenticator/supplicant IE.
Since the IE ultimately gets put into the same buffer, there really
only needs to be a single setter for authenticator/supplicant. The
handshake object can deal with parsing to decide what kind of IE it
is (WPA or RSN).
This adds some checks for the FT_OVER_FILS AKMs in station and netdev
allowing the FILS-FT AKMs to be selected during a connection.
Inside netdev_connect_event we actually have to skip parsing the IEs
because FILS itself takes care of this (needs to handle them specially)
FT-over-DS is a way to do a Fast BSS Transition using action frames for
the authenticate step. This allows a station to start a fast transition
to a target AP while still being connected to the original AP. This,
in theory, can result in less carrier downtime.
The existing ft_sm_new was removed, and two new constructors were added;
one for over-air, and another for over-ds. The internals of ft.c mostly
remain the same. A flag to distinguish between air/ds was added along
with a new parser to parse the action frames rather than authenticate
frames. The IE parsing is identical.
Netdev now just initializes the auth-proto differently depending on if
its doing over-air or over-ds. A new TX authenticate function was added
and used for over-ds. This will send out the IEs from ft.c with an
FT Request action frame.
The FT Response action frame is then recieved from the AP and fed into
the auth-proto state machine. After this point ft-over-ds behaves the
same as ft-over-air (associate to the target AP).
Some simple code was added in station.c to determine if over-air or
over-ds should be used. FT-over-DS can be beneficial in cases where the
AP is directing us to roam, or if the RSSI falls below a threshold.
It should not be used if we have lost communication to the AP all
(beacon lost) as it only works while we can still talk to the original
AP.
wiphy_select_akm needed to be updated to take a flag, which can be
set to true if there are known reauth keys for this connection. If
we have reauth keys, and FILS is available we will choose it.
Quick scan uses a set of frequencies associated with the
known networks. This allows to reduce the scan latency.
At this time, the frequency selection follows a very simple
logic by taking all known frequencies from the top 5 most
recently connected networks.
If connection isn't established after the quick scan attempt,
we fall back to the full periodic scan.
Previously, the scan results were disregarded once the new
ones were available. To enable the scan scenarios where the
new scan results are delivered in parts, we introduce a
concept of aging BSSs and will remove them based on
retention time.
The auto-connect state will now consist of the two phases:
STATION_STATE_AUTOCONNECT_QUICK and STATION_STATE_AUTOCONNECT_FULL.
The auto-connect will always start with STATION_STATE_AUTOCONNECT_QUICK
and then transition into STATION_STATE_AUTOCONNECT_FULL if no
connection has been established. During STATION_STATE_AUTOCONNECT_QUICK
phase we take advantage of the wireless scans with the limited number
of channels on which the known networks have been observed before.
This approach allows to shorten the time required for the network
sweeps, therefore decreases the connection latency if the connection
is possible. Thereafter, if no connection has been established after
the first phase we transition into STATION_STATE_AUTOCONNECT_FULL and
do the periodic scan just like we did before the split in
STATION_STATE_AUTOCONNECT state.
This is not used by any of the scan notify callback implementations and
for P2P we're going to need to scan on an interface without an ifindex
so without this the other changes should be mostly contained in scan.
Several Auth/Assoc failure status codes indicate that the connection
failed for reasons such as bandwidth issues, poor channel conditions
etc. These conditions should not result in the BSS being blacklisted
since its likely only a temporary issue and the AP is not actually
"broken" per-se.
This adds support in station.c to temporarily blacklist these BSS's
on a per-network basis. After the connection has completed we clear
out these blacklist entries.
Several netdev events benefit from including event data in the callback.
This is similar to how the connect callback works as well. The content
of the event data is documented in netdev.h (netdev_event_func_t).
By including event data for the two disconnect events, we can pass the
reason code to better handle the failure in station.c. Now, inside
station_disconnect_event, we still check if there is a pending connection,
and if so we can call the connect callback directly with HANDSHAKE_FAILED.
Doing it this way unifies the code path into a single switch statment to
handle all failures.
In addition, we pass the RSSI level index as event data to
RSSI_LEVEL_NOTIFY. This removes the need for a getter to be exposed in
netdev.h.
This change cleans up the mess of status vs reason codes. The two
types of codes have already been separated into different enumerations,
but netdev was still treating them the same (with last_status_code).
A new 'event_data' argument was added to the connect callback, which
has a different meaning depending on the result of the connection
(described inside netdev.h, netdev_connect_cb_t). This allows for the
removal of netdev_get_last_status_code since the status or reason
code is now passed via event_data.
Inside the netdev object last_status_code was renamed to last_code, for
the purpose of storing either status or reason. This is only used when
a disconnect needs to be emitted before failing the connection. In all
other cases we just pass the code directly into the connect_cb and do
not store it.
All ocurrences of netdev_connect_failed were updated to use the proper
code depending on the netdev result. Most of these simply changed from
REASON_CODE_UNSPECIFIED to STATUS_CODE_UNSPECIFIED. This was simply for
consistency (both codes have the same value).
netdev_[authenticate|associate]_event's were updated to parse the
status code and, if present, use that if their was a failure rather
than defaulting to UNSPECIFIED.
If we have a BSS list where all BSS's have been blacklisted we still
need a way to force a connection to that network, instead of having
to wait for the blacklist entry to expire. network_bss_select now
takes a boolean 'fallback_to_blacklist' which causes the selection
to still return a connectable BSS even if the entire list was
blacklisted.
In most cases this is set to true, as these cases are initiated by
DBus calls. The only case where this is not true is inside
station_try_next_bss, where we do want to honor the blacklist.
This both prevents an explicit connect call (where all BSS's are
blacklisted) from trying all the blacklisted BSS's, as well as the
autoconnect case where we simply should not try to connect if all
the BSS's are blacklisted.
There are is some implied behavior here that may not be obvious:
On an explicit DBus connect call IWD will attempt to connect to
any non-blacklisted BSS found under the network. If unsuccessful,
the current BSS will be blacklisted and IWD will try the next
in the list. This will repeat until all BSS's are blacklisted,
and in this case the connect call will fail.
If a connect is tried again when all BSS's are blacklisted IWD
will attempt to connect to the first connectable blacklisted
BSS, and if this fails the connect call will fail. No more
connection attempts will happen until the next DBus call.
If IWD fails to connect to a BSS we can attempt to connect to a different
BSS under the same network and blacklist the first BSS. In the case of an
incorrect PSK (MMPDU code 2 or 23) we will still fail the connection.
station_connect_cb was refactored to better handle the dbus case. Now the
netdev result switch statement is handled before deciding whether to send
a dbus reply. This allows for both cases where we are trying to connect
to the next BSS in autoconnect, as well as in the dbus case.
This makes __station_connect_network even less intelligent by JUST
making it connect to a network, without any state changes. This makes
the rekey logic much cleaner.
We were also changing dbus properties when setting the state to
CONNECTING, so those dbus property change calls were moved into
station_enter_state.
A new driver extended feature bit was added signifying if the driver
supports PTK replacement/rekeying. During a connect, netdev checks
for the driver feature and sets the handshakes 'no_rekey' flag
accordingly.
At some point the AP will decide to rekey which is handled inside
eapol. If no_rekey is unset we rekey as normal and the connection
remains open. If we have set no_rekey eapol will emit
HANDSHAKE_EVENT_REKEY_FAILED, which is now caught inside station. If
this happens our only choice is to fully disconnect and reconnect.
The changes to station.c are minor. Specifically,
station_build_handshake_rsn was modified to always build up the RSN
information, not just for SECURITY_8021X and SECURITY_PSK. This is
because OWE needs this RSN information, even though it is still
SECURITY_NONE. Since "regular" open networks don't need this, a check
was added (security == NONE && akm != OWE) which skips the RSN
building.
netdev.c needed to be changed in nearly the same manor as it was for
SAE. When connecting, we check if the AKM is for OWE, and if so create
a new OWE SM and start it. OWE handles all the ECDH, and netdev handles
sending CMD_AUTHENTICATE and CMD_ASSOCIATE when triggered by OWE. The
incoming authenticate/associate events just get forwarded to OWE as they
do with SAE.
During the handshake setup, if security != SECURITY_PSK then 8021x settings
would get set in the handshake object. This didn't appear to break anything
(e.g. Open/WEP) but its better to explicitly check that we are setting up
an 8021x network.
A sorted list of hidden network BSSs observed in the recent scan
is kept for the informational purposes of the clients. In addition,
it has deprecated the usage of seen_hidden_networks variable.
If there are Ad-hoc BSSes they should be present in the scan results
together with regular APs as far as scan.c is concerned. But in
station mode we can't connect to them -- the Connect method will fail and
autoconnect would fail. Since we have no property to indicate a
network is an IBSS just filter these results out for now. There are
perhaps better solutions but the benefit is very low.
This is a replacement for station's static select_akm_suite. This was
done because wiphy can make a much more intellegent decision about the
akm suite by checking the wiphy supported features e.g. SAE support.
This allows a connection to hybrid WPA2/WPA3 AP's if SAE is not
supported in the kernel.
As a consequence of the previous commit, netdev watches are always
called when the device object is valid. As a result, we can drop the
netdev_get_device calls and checks from individual AP/AdHoc/Station/WSC
netdev watches
Instead of creating the Station interface in device.c create it directly
on the netdev watch event the same way that the AP and AdHoc interfaces
are created and freed. This fixes some minor incosistencies, for
example station_free was previously called twice, once from device.c and
once from the netdev watch.
device.c would previously keep the pointer returned by station_create()
but that pointer was not actually useful so remove it. Autotests still
seem to pass.
Call netdev_disconnect() to make netdev forget any of station.c's
callbacks for connections or transitions in progress or established.
Otherwise station.c will crash as soon as we're connected and try to
change interface mode:
==17601== Invalid read of size 8
==17601== at 0x11DFA0: station_disconnect_event (station.c:775)
==17601== by 0x11DFA0: station_netdev_event (station.c:1570)
==17601== by 0x115D18: netdev_disconnect_event (netdev.c:868)
==17601== by 0x115D18: netdev_mlme_notify (netdev.c:3403)
==17601== by 0x14E287: l_queue_foreach (queue.c:441)
==17601== by 0x1558B4: process_multicast (genl.c:469)
==17601== by 0x1558B4: received_data (genl.c:532)
==17601== by 0x152888: io_callback (io.c:123)
==17601== by 0x151BCD: l_main_iterate (main.c:376)
==17601== by 0x151C9B: l_main_run (main.c:423)
==17601== by 0x10FE20: main (main.c:489)
Boiled down, FT over SAE is no different than FT over PSK, apart from
the different AKM suite. The bulk of this change fixes the current
netdev/station logic related to SAE by rebuilding the RSNE and adding
the MDE if present in the handshake to match what the PSK logic does.
A common function was introduced into station which will rebuild the
handshake rsne's for a target network. This is used for both new
network connections as well as fast transitions.
To prepare for FT over SAE, several case/if statements needed to include
IE_RSN_AKM_SUITE_FT_OVER_SAE. Also a new macro was introduced to remove
duplicate if statement code checking for both FT_OVER_SAE and SAE AKM's.
The ifindex as reported by netdev is unsigned, so make sure that it is
printed as such. It is astronomically unlikely that this causes any
actual issues, but lets be paranoid.
Move the roam initiation (signal loss, ap directed roaming) and scanning
details into station from device. Certain device functions have been
exposed temporarily to make this possible.