The diagnostic interface returns an error anyways if station is
not connected so it makes more sense to only bring the interface
up when its actually usable. This also removes the interface
when station disconnects, which was never done before (the
interface stayed up indefinitely due to a forgotten remove call).
When we're auto-connecting and have hidden networks configured, use
active scans regardless of whether we see any hidden BSSes in our
existing scan results.
This allows us to more effectively see/connect to hidden networks
when first powering up or after suspend.
Kernel might report hidden BSSes that are reported from beacon frames
separately than ones reported due to probe responses. This may confuse
the station network collation logic since the scan_bss generated by the
probe response might be removed erroneously when processing the scan_bss
that was generated due to a beacon.
Make sure that bss_match also takes the SSID into account and only
matches scan_bss structures that have the same BSSID and SSID contents.
Instead of manually managing whether to expire BSSes or not, use the
scanned frequency set instead. This makes the API slightly easier to
understand (dropping two boolean arguments in a row) and also a bit more
future-proof.
Commit d372d59bea checks whether a hidden network had a previous
connection attempt and re-tries. However, it inadvertently dropped
handling of a condition where a non-hidden network SSID is provided to
ConnectHiddenNetwork. Fix that.
Fixes: d372d59bea ("station: Allow ConnectHiddenNetwork to be retried")
Now that ConnectHiddenNetwork can be invoked while we're connected, set
the mac randomization hint parameter properly. The kernel will reject
requests if randomization is enabled while we're connected to a network.
If we forget a hidden network, then make sure to remove it from the
network list completely. Otherwise it would be possible to still
issue a Network.Connect to that particular object, but the fact that the
network is hidden would be lost.
==17639== 72 (16 direct, 56 indirect) bytes in 1 blocks are definitely
lost in loss record 3 of 3
==17639== at 0x4C2F0CF: malloc (vg_replace_malloc.c:299)
==17639== by 0x4670AD: l_malloc (util.c:61)
==17639== by 0x4215AA: scan_freq_set_new (scan.c:1906)
==17639== by 0x412A9C: parse_neighbor_report (station.c:1910)
==17639== by 0x407335: netdev_neighbor_report_frame_event
(netdev.c:3522)
==17639== by 0x44BBE6: frame_watch_unicast_notify (frame-xchg.c:233)
==17639== by 0x470C04: dispatch_unicast_watches (genl.c:961)
==17639== by 0x470C04: process_unicast (genl.c:980)
==17639== by 0x470C04: received_data (genl.c:1101)
==17639== by 0x46D9DB: io_callback (io.c:118)
==17639== by 0x46CC0C: l_main_iterate (main.c:477)
==17639== by 0x46CCDB: l_main_run (main.c:524)
==17639== by 0x46CF01: l_main_run_with_signal (main.c:656)
==17639== by 0x403EDE: main (main.c:490)
In the case that ConnectHiddenNetwork scans successfully, but fails for
some other reason, the network object is left in the scan results until
it expires. This will prevent subsequent attempts to use
ConnectHiddenNetwork with a .NotHidden error. Fix that by checking
whether a found network is hidden, and if so, allow the request to
proceed.
Rework the logic slightly so that this function returns an error message
on error and NULL on success, just like other D-Bus method
implementations. This also simplifies the code slightly.
We used to not allow to connect to a different network while already
connected. One had to disconnect first. This also applied to
ConnectHiddenNetwork calls.
This restriction can be dropped now. station will intelligently
disconnect from the current AP when a station_connect_network() is
issued.
If the disconnect fails and station_disconnect_onconnect_cb is called
with an error, we reply to the original message accordingly.
Unfortunately pending_connect is not unrefed or cleared in this case.
Fix that.
Fixes: d0ee923dda ("station: Disconnect, if needed, on a new connection attempt")
At some point the non-interactive client tests began failing.
This was due to a bug in station where it would transition from
'connected' to 'autoconnect' due to a failed scan request. This
happened because a quick scan got scheduled during an ongoing
scan, then a Connect() gets issued. The work queue treats the
Connect as a priority so it delays the quick scan until after the
connection succeeds. This results in a failed quick scan which
IWD does not expect to happen when in a 'connected' state. This
failed scan actually triggers a state transition which then
gets IWD into a strange state where its connected from the
kernel point of view but does not think it is:
src/station.c:station_connect_cb() 13, result: 0
src/station.c:station_enter_state() Old State: connecting, new state: connected
src/wiphy.c:wiphy_radio_work_done() Work item 6 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5
src/station.c:station_quick_scan_triggered() Quick scan trigger failed: -95
src/station.c:station_enter_state() Old State: connected, new state: autoconnect_full
To fix this IWD should simply cancel any pending quick scans
if/when a Connect() call comes in.
With AP now getting its own diagnostic interface it made sense
to move the netdev_station_info struct definition into its own
header which eventually can be accompanied by utilities in
diagnostic.c. These utilities can then be shared with AP and
station as needed.
Following a successful roaming sequence, schedule another attempt unless
the driver has sent a high RSSI notification. This makes the behaviour
analogous to a failed roaming attempt where we remained connected to the
same BSS.
This makes iwd compatible with wireless drivers which do not necessarily
send out a duplicate low RSSI notification upon reassociation. Without
this change, iwd risks getting indefinitely stuck to a BSS with low
signal strength, even though a better BSS might later become available.
In the case of a high RSSI notification, the minimum roam time will also
be reset to zero. This preserves the original behaviour in the case
where a high RSSI notification is processed after station_roamed().
Doing so also gives a chance for faster roaming action in the following
example scenario:
1. RSSI LOW
2. schedule roam in 5 seconds
(5 seconds pass)
3. try roaming
4. roaming fails, same BSS
5. schedule roam in 60 seconds
(20 seconds pass)
6. RSSI HIGH
7. cancel scheduled roam
(20 seconds pass)
8. RSSI LOW
9. schedule roam in 5 seconds or 20 seconds?
By resetting the minimum roam time, we can avoid waiting 20 seconds when
the station may have moved considerably. And since the high/low RSSI
notifications are configured with a hysteresis, we should still be
protected against too frequent spurious roaming attempts.
Add a parameter to station_set_scan_results to allow skipping the
removal of old BSSes. In the DBus-triggered scan only expire BSSes
after having gone through the full supported frequency set.
It should be safe to pass partial scan results to
station_set_scan_results() when not expiring BSSes so using this new
parameter I guess we could also call it for roam scan results.
A scan normally takes about 2 seconds on my dual-band wifi adapter when
connected. The drivers will normally probe on each supported channel in
some unspecified order and will have new partial results after each step
but the kernel sends NL80211_CMD_NEW_SCAN_RESULTS only when the full
scan request finishes, and for segmented scans we will wait for all
segments to finish before calling back from scan_active() or
scan_passive().
To improve user experience define our own channel order favouring the
2.4 channels 1, 6 and 11 and probe those as an individual scan request
so we can update most our DBus org.connman.iwd.Network objects more
quickly, before continuing with 5GHz band channels, updating DBus
objects again and finally the other 2.4GHz band channels.
The overall DBus-triggered scan on my wifi adapter takes about the same
time but my measurements were not very strict, and were not very
consistent with and without this change. With the change most Network
objects are updated after about 200ms though, meaning that I get most
of the network updates in the nm-applet UI 200ms from opening the
network list. The 5GHz band channels take another 1 to 1.5s to scan and
remaining 2.4GHz band channels another ~300ms.
Hopefully this is similar when using other drivers although I can easily
imagine a driver that parallelizes 2.4GHz and 5GHz channel probing using
two radios, or uses 2, 4 or another number of dual-band radios to probe
2, 4, ... channels simultanously. We'd then lose some of the
performance benefit. The faster scan results may be worth the longer
overall scan time anyway.
I'm also assuming that the wiphy's supported frequency list is exactly
what was scanned when we passed no frequency list to
NL80211_CMD_TRIGGER_SCAN and we won't get errors for passing some
frequency that shouldn't have been scanned.
Waiting to request neighbor reports until we are in need of a roam
delays the roam time, and probably isn't as reliable since we are
most likely in a low RSSI state. Instead the neighbor report can
be requested immediately after connecting, saved, and used if/when
a roam is needed. The existing behavior is maintained if the early
neighbor report fails where a neighbor report is requested at the
time of the roam.
The code which parses the reports was factored out and shared
between the existing (late) neighbor report callback and the early
neighbor report callback.
Modern kernels ~5.4+ have changed the way lost beacons are
reported and effectively make the lost beacon event useless
because it is immediately followed by a disconnect event. This
does not allow IWD enough time to do much of anything before
the disconnect comes in and we are forced to fully re-connect
to a different AP.
periodic_scan_stop is called whenever we exit the autoscan state but a
periodic scan may not be running at the time. If we have a
user-triggered scan running, or the autoconnect_quick scan, and we reset
Scanning to false before that scan finished, a client could en up
calling GetOrderedNetwork too early and not receiving the scan results.
ConnectHiddenNetwork can be seen a triggering this sequence:
1. the active scan,
2. the optional agent request,
3. the Authentication/Association/4-Way Handshake/netconfig,
4. connected state
Currently Disconnect() interrupts 3 and 4, allow it to also interrupt
state 1. It's difficult to tell whether we're in state 2 from within
station.c.
For multi-bss networks its nice to know which BSS is being connected
to. The ranking can hint at it, but blacklisting or network capabilities
could effect which network is actually chosen. An explicit debug print
makes debugging much easier.
Move the update of station->networks_sorted order to before we set
station->connected_network NULL to avoid a crash when we attempt to
use the NULL pointer.
Besides being undefined behaviour, signed integer overflow can cause
unexpected comparison results. In the case of network_rank_compare(),
a connected network with rank INT_MAX would cause newly inserted
networks with negative rank to be inserted earlier in the ordered
network list. This is reflected in the GetOrderedMethods() DBus method
as can be seen in the following iwctl output:
[iwd]# station wlan0 get-networks
Network name Security Signal
----------------------------------------------------
BEOLAN 8021x **** }
BeoBlue psk *** } all unknown,
UI_Test_Network psk *** } hence assigned
deneb_2G psk *** } negative rank
BEOGUEST open **** }
> titan psk ****
Linksys05274_5GHz_dmt psk ****
Lyngby-4G-4 5GHz psk ****
Doing so ensures that the currently connected network is always at the
beginning of the list. Previously, the list would only get updated after
a scan.
This fixes the documented behaviour of GetOrderedNetworks() DBus method,
which states that the currently connected network is always at the
beginning of the returned array.
To use the wiphy radio work queue, scanning mostly remained the same.
start_next_scan_request was modified to be used as the work callback,
as well as not start the next scan if the current one was done
(since this is taken care of by wiphy work queue now). All
calls to start_next_scan_request were removed, and more or less
replaced with wiphy_radio_work_done.
scan_{suspend,resume} were both removed since radio management
priorities solve this for us. ANQP requests can be inserted ahead of
scan requests, which accomplishes the same thing.
Before connecting to a hidden network we must scan. During this scan
if another connection attempt comes in the expected behavior is to
abort the original connection. Rather than waiting for the scan to
complete, then canceling the original hidden connection we can just
cancel the hidden scan immediately, reply to dbus, and continue with
the new connection attempt.
The new frame-xchg module now handles a lot of what ANQP used to do. ANQP
now does not need to depend on nl80211/netdev for building and sending
frames. It also no longer needs any of the request lookups, frame watches
or to maintain a queue of requests because frame-xchg filters this for us.
From an API perspective:
- anqp_request() was changed to take the wdev_id rather than ifindex.
- anqp_cancel() was added so that station can properly clean up ANQP
requests if the device disappears.
During testing a bug was also fixed in station on the timeout path
where the request queue would get popped twice.
When roaming, iwd tries to scan a limited number of frequencies to keep
the roaming latency down. Ideally the frequency list would come in from
a neighbor report, but if neighbor reports are not supported, we fall
back to our internal database for known frequencies of this network.
iwd tries to keep the number of scans down to a bare minimum, which
means that we might miss APs that are in range. This could happen
because the user might have moved physically and our frequency list is
no longer up to date, or if the AP frequencies have been reconfigured.
If a limited scan fails to find any good roaming candidates, re-attempt
a full scan right away.
If the roam failed and we are no longer connected, station_disassociated
is called which ends up calling station_roam_state_clear. Thus
resetting the variables is not needed. Reflow the logic to make this a
bit more explicit.
If the roam attempt fails, do not reset this to false. Generally this
is set by the fact that we lost beacon and to not attempt neighbor
reports, etc. This hint should be preserved across roam attempts.
Right now, if the connection fails, then network always thinks that the
password should be re-asked. Loosen this to only do so if the
connection failed at least in the handshake phase. If the connection
failed due to Association / Authentication timeout, it is likely that
something is wrong with the AP and it can't respond.
This is to allow network to watch for ANQP activity in order to
fix the race condition between scanning finishing and ANQP finishing.
Without this it is possible for a DBus Connect() to come in before
ANQP has completed and causing the network to return NotConfigured,
when its actually in the process of obtaining all the network info.
The watch was made globally in station due to network not having
a station object until each individual network is created. Adding a
watch during network creation would result in many watchers as well
as a lot of removal/addition as networks are found and lost.
Change signature of network_connect_new_hidden_network to take
reference to the caller's l_dbus_message struct. This allows to
set the caller's l_dbus_message struct to NULL after replying in
the case of a failure.
==201== at 0x467C15: l_dbus_message_unref (dbus-message.c:412)
==201== by 0x412A51: station_hidden_network_scan_results (station.c:2504)
==201== by 0x41EAEA: scan_finished (scan.c:1505)
==201== by 0x41EC10: get_scan_done (scan.c:1535)
==201== by 0x462592: destroy_request (genl.c:673)
==201== by 0x462987: process_unicast (genl.c:988)
==201== by 0x462987: received_data (genl.c:1087)
==201== by 0x45F5A2: io_callback (io.c:126)
==201== by 0x45E8FD: l_main_iterate (main.c:474)
==201== by 0x45E9BB: l_main_run (main.c:521)
==201== by 0x45EBCA: l_main_run_with_signal (main.c:643)
==201== by 0x403B15: main (main.c:512)
Introduce hidden_pending to keep reference to the dbus message object
while we wait for the scan results to be returned while trying to
connect to a hidden network. This simplifies the logic by separating it
into two independent logical units: scanning, connecting and eliminates
a possibility of a memory leak in the case when Network.Connect being
initiated while Station.ConnectHiddenNetwork is in progress.
If a connection is initiated (via dbus) while a quick scan is in
progress, the quick scan will be aborted. In this case,
station_quick_scan_results will always transition to the
AUTOCONNECT_FULL state regardless of whether it should or not.
Fix this by making sure that we only enter AUTOCONNECT_FULL if we're
still in the AUTOCONNECT_QUICK state.
Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
This patch adds two new options to a network provisioning file:
AlwaysRandomizeAddress={true,false}
If true, IWD will randomize the MAC address on each connection to this
network. The address does not persists between connections, any new
connection will result in a different MAC.
AddressOverride=<MAC>
If set, the MAC address will be set to <MAC> assuming its a valid MAC
address.
These two options should not be used together, and will only take effect
if [General].AddressRandomization is set to 'network' in the IWD
config file.
If neither of these options are set, and [General].AddressRandomization
is set to 'network', the default behavior remains the same; the MAC
will be generated deterministically on a per-network basis.
If a scan is requested during the middle of a connection we should
return busy instead of attempting the scan. The kernel ends up coming
back with not supported in this case, which is misleading and
difficult to debug.
For Radio Resource Management (RRM) we will need access to the currently
connected BSS as well as the last scan results in order to do certain
kinds of requested measurements.
On EAP events, call the handshake_event handler with the new event type
HANDSHAKE_EVENT_EAP_NOTIFY isntead of the eapol_event callback.
This allows the handler to be set before calling
netdev_connect/netdev_connect_wsc. It's also in theory more type-safe
because we don't need the cast in netdev_connect_wsc anymore.
Convert the handshake event callback type to use variable argument
list to allow for more flexibility in event-specific arguments
passed to the callbacks.
Note the uint16_t reason code is promoted to an int when using variable
arguments so va_arg(args, int) has to be used.
Previously, station state 'connected' used to identify an interface associated
with AP. With the introduction of netconfig, an interface is assumed to be
connected after the IP addresses have been assigned to it. If netconfig is
disabled, the behavior remains unchanged.
As a first step to enable the usage of netconfig in ead and
prospective transition to be a part of ell, the public API for
creation and destruction of the netconfig objects has been
renamed and changed. Instead of hiding the netconfig objects inside
of netconfig module, the object is now passed back to the caller.
The internal queue of netconfig objects remains untouched, due
to limitations in ell’s implementation of rtnl. After the proper
changes are done to ell, netconfig_list is expected to be removed
from netconfig module.
If neighbor reports are unavailable, or the report yielded no
results we can quickly scan for only known frequencies. This
changes the original behavior where we would do a full scan
in this case.
Station was building up the HS20 elements manually. Now we can
use this new API and let network take care of the complexity
of building network specific vendor IEs.
Since hotspot networks may require ANQP the autoconnect loop needed to
be delayed until after the ANQP results came back and the network
objects were updated. If there are hotspot networks in range ANQP will
be performed and once complete autoconnect will begin for all networks
including hotspots. If no hotspots are in range autoconnect will
proceed as it always has.
Note: Assuming hotspots are in range this will introduce some delay
in autoconnecting to any network since ANQP must come back. The full
plan is to intellegently decide when and when not to do ANQP in order
to minimize delays but since ANQP is disabled by default the behavior
introduced with this patch is acceptable.
For (Re)Association the HS20 indication element was passed exactly as
it was found in the scan results. The spec defines what bits can be
set and what cannot when this IE is used in (Re)Association. Instead
of assuming the AP's IE conforms to the spec, we now parse the IE and
re-build it for use with (Re)Association.
Since the full IE is no longer used, it was removed from scan_bss, and
replaced with a bit for HS20 support (hs20_capable). This member is
now used the same as hs20_ie was.
The version parsed during scan results is now used when building the
(Re)Association IE.
Previously, iwd used to throw net.connman.iwd.Busy when connection
attempt was made while connected. The new behavior allows iwd to
seamlessly disconnect from the connected network and attempt a new
connection.
The HS20 indication element should always be included during
(Re)Association per the spec. This removes the need for a
dedicated boolean, and now the hs20_ie can be used instead.
Per the hotspot 2.0 spec, if a matching roaming consortium OI is
found it should be added to the (Re)Association request. vendor_ies
can now be provided to netdev_connect, which get appended to the IE
attribute.
The ifindex is used to index the netdevs in the system (wlan, ethernet,
etc.) but we can also do wifi scanning on interfaces that have no
corresponding netdev object, like the P2P-device virtual interfaces.
Use the wdev id's to reference interfaces, the nl80211 api doesn't care
whether we use a NL80211_ATTR_IFINDEX or NL80211_ATTR_WDEV. Only
wireless interfaces have a wdev id.
After a scan, station can now pause future scans and start ANQP requests
to discover Hotspot's NAI realm. This lets us check if the AP's NAI realm
matches any stored hotspot configuration files. If so we can connect to
this network. If the network provides an HESSID and a matching one is
found in a hotspot provisioning file we can skip ANQP and directly connect
as this is expected to be our 'home network'
The handshake object had 4 setters for authenticator/supplicant IE.
Since the IE ultimately gets put into the same buffer, there really
only needs to be a single setter for authenticator/supplicant. The
handshake object can deal with parsing to decide what kind of IE it
is (WPA or RSN).
This adds some checks for the FT_OVER_FILS AKMs in station and netdev
allowing the FILS-FT AKMs to be selected during a connection.
Inside netdev_connect_event we actually have to skip parsing the IEs
because FILS itself takes care of this (needs to handle them specially)