The purpose of this was to have a single utility to both cancel an
existing offchannel operation (if one exists) and start a new one.
The problem was the previous offchannel operation was being canceled
first which opened up the radio work queue to other items. This is
not desireable as, for example, a scan would end up breaking the
DPP protocol most likely.
Starting the new offchannel then canceling is the correct order of
operations but to do this required saving the new ID, canceling, then
setting offchannel_id to the new ID so dpp_presence_timeout wouldn't
overwrite the new ID to zero.
This also removes an explicit call to offchannel_cancel which is
already done by dpp_offchannel_start.
Several members are named based on initiator/responder (i/r)
terminology. Eventually both initiator and responder will be
supported so rename these members to use own/peer naming
instead.
ASN1 parsing will soon be required which will need some utilities in
asn1-private.h. To avoid duplication include this private header and
replace the OID's with the defined structures as well as remove the
duplicated macros.
station_set_scan_results takes an autoconnect flag which was being
set true in both regular/quick autoconnect scans. Since OWE networks
are processed after setting the scan results IWD could end up
connecting to a network before all the OWE hidden networks are
populated.
To fix this regular/quick autoconnect results will set the flag to
false, then process OWE networks, then start autoconnect. If any
OWE network scans are pending station_autoconnect_start will fail
but will pick back up after the hidden OWE scan.
scan_request_failed and scan_finished remove the finished scan_request
from the request queue right away, before calling the callback. This
breaks those clients that rely on scan_cancel working on such requests
(i.e. to force the destroy callback to be invoked synchronously, see
a0911ca778 ("station: Make sure roam_scan_id is always canceled").
Fix this by removing the scan_request from the request queue after
invoking the callback. Also provide a re-entrancy guard that will make
sure that the scan_request isn't removed in scan_cancel itself.
There are similar operations being performed but with different
callbacks and userdata, depending on whether 'sr' is NULL or not.
Optimize the function flow slightly to make if-else unnecessary.
While here, update the comment. periodic scans are now scheduled only
based on the periodic timeout timer.
If periodic scan is active and we receive a SCAN_ABORTED event, we would
still invoke the periodic scan callback with an error. This is rather
pointless since the periodic scan callback cannot do anything useful
with this information. Fix that.
We should never reach a point where NEW_SCAN_RESULTS or SCAN_ABORTED are
received before a corresponding TRIGGER_SCAN is received. Even if this
does happen, there's no harm from processing the commands anyway.
This makes it a little easier to book-keep the started variable. Since
scan_request already has a 'passive' bit-field, there should be no
storage penalty.
If scan_cancel is called on a scan_request that is 'finished' but with
the GET_SCAN command still in flight, it will trigger a crash as
follows:
Received Deauthentication event, reason: 2, from_ap: true
src/station.c:station_disconnect_event() 11
src/station.c:station_disassociated() 11
src/station.c:station_reset_connection_state() 11
src/station.c:station_roam_state_clear() 11
src/scan.c:scan_cancel() Trying to cancel scan id 6 for wdev 200000002
src/scan.c:scan_cancel() Scan is at the top of the queue, but not triggered
src/scan.c:get_scan_done() get_scan_done
Aborting (signal 11) [/home/denkenz/iwd-master/src/iwd]
++++++++ backtrace ++++++++
#0 0x7f9871aef3f0 in /lib64/libc.so.6
#1 0x41f470 in station_roam_scan_notify() at /home/denkenz/iwd-master/src/station.c:2285
#2 0x43936a in scan_finished() at /home/denkenz/iwd-master/src/scan.c:1709
#3 0x439495 in get_scan_done() at /home/denkenz/iwd-master/src/scan.c:1739
#4 0x4bdef5 in destroy_request() at /home/denkenz/iwd-master/ell/genl.c:676
#5 0x4c070b in l_genl_family_cancel() at /home/denkenz/iwd-master/ell/genl.c:1960
#6 0x437069 in scan_cancel() at /home/denkenz/iwd-master/src/scan.c:842
#7 0x41dc2e in station_roam_state_clear() at /home/denkenz/iwd-master/src/station.c:1594
#8 0x41dd2b in station_reset_connection_state() at /home/denkenz/iwd-master/src/station.c:1619
#9 0x41dea4 in station_disassociated() at /home/denkenz/iwd-master/src/station.c:1644
The happens because get_scan_done callback is still called as a result of
l_genl_cancel. Add a re-entrancy guard in the form of 'canceled'
variable in struct scan_request. If set, get_scan_done will skip invoking
scan_finished.
It isn't clear what 'l_queue_peek_head() == results->sr' check was trying
to accomplish. If GET_SCAN dump was scheduled, then it should be
reported. Drop it.
results->sr is set to NULL for 'opportunistic' scans which were
triggered externally. See scan_notify() for details. However,
get_scan_done would only invoke scan_finished (and thus the periodic
scan callback sc->sp.callback) only if the scan queue was empty. It
should do so in all cases.
The point type was being hard coded to 0x3 (BIT1) which may have resulted
in the peer subtracting Y from P when reading in the point (depending on
if Y was odd or not).
Instead set the compressed type to whatever avoids the subtraction which
both saves IWD from needing to do it, as well as the peer.
The intent of this check is to make sure that at least 2 bytes are
available for reading. However, the unintended consequence is that tags
with a zero length at the end of input would be rejected.
While here, rework the check to be more resistant to potential
overflow conditions.
The DPP spec says nothing about how to handle re-transmits but it
was found in testing this can happen relatively easily for a few
reasons.
If the configurator requests a channel switch but does not get onto
the new channel quick enough the enrollee may have already sent the
authenticate response and it was missed. Also by nature of how the
kernel goes offchannel there are moments in time between ROC when
the card is idle and not receiving any frames.
Only frames where there was no ACK will be retransmitted. If the
peer received the frame and dropped it resending the same frame wont
do any good.
Now the result is sent immediately. Prior a connect attempt or
scan could have started, potentially losing this frame. In addition
the offchannel operation is cancelled after sending the result
which will allow the subsequent connect or scan to happen much
faster since it doesn't have to wait for ROC to expire.
The previous (incorrect) else was removed since it ended up
printing in most cases since the if clause returned. This should
have been an else if conditional from the start and only print if the
station device was not found.
IWD may be in the middle of some long operation, e.g. scanning.
If the URI is returned before IWD is ready, a configurator could
start sending frames and IWD either wont receive them, or will
be unable to respond quickly.
The offchannel priority was also changed to zero, which matches the
priority of frames. Currently there should be no interaction between
offchannel and connect (previous offchannel priority).
Periodic scans were handled specially where they were only
started if no other requests were pending in the scan queue.
This is fine, and what we want, but this can actually be
handled automatically by nature of the wiphy work queue rather
than needing to check the request queue explicitly.
Instead we can insert periodic scans at a lower priority than
other scans. This puts them at the end of the work queue, as
well as allows future requests to jump ahead if a periodic scan
has not yet started.
Eventually, once all pending scans are done, the peridoic scan
may begin. This is no different than the preivous behavior and
avoids the need for any special checks once scan requests
complete.
One check was added to address the problem of the periodic scan
timer firing before the scan could even start. Currently this
happened to be handled fine in scan_periodic_queue, as it checks
the queue length. Since this check was removed we must see check
for this condition inside scan_periodic_timeout.
This adds a priority argument to scan_common rather than hard
coding it when inserting the work item and uses the newly
defined wiphy priority for scanning.
Work priority was never explicitly defined anywhere, and a module
using wiphy_radio_work APIs needed to ensure it was not inserting
at a priority that would interfere with other work.
Now all the types of work have been defined with their own priority
and future priorities can easily be added before, after, or in
between existing priorities.
- Mostly problems with whitespace:
- Use of spaces instead of tabs
- Stray spaces before closing ')
- Missing spaces
- Missing 'void' from function declarations & definitions that
take no arguments.
- Wrong indentation level
When this attribute is included, the initiator is requesting all
future frames be sent on this channel. There is no reason for a
configurator to act on this attribute (at least for now) so the
request frame will be dropped in this case. Enrollees will act
on it by switching to the new channel and sending the authentication
response.
While connected the driver ends up choosing quite small ROC
durations leading to excessive calls to ROC. This also will
negatively effect any wireless performance for the current
network and possibly lead to missed DPP frames.
Currently the enrollee relied on autoconnect to handle connecting
to the newly configured network. This usually resulted in poor
performance since periodic scans are done at large intervals apart.
Instead first check if the newly configured network is already
in IWD's network queue. If so it can be connected to immediately.
If not, a full scan must be done and results given to station.
With better JSON support the configuration request object
can now be fully parsed. As stated in the previous comment
there really isn't much use from the configurator side apart
from verifying mandatory values are included.
This patch also modifies the configuration result to handle
sending non 'OK' status codes in case of JSON parsing errors.
json_iter_parse is only meant to work on objects while
json_iter_next is only meant to work on arrays.
This adds checks in both APIs to ensure they aren't being
used incorrectly.
Arrays can now be parsed using the JSON_ARRAY type (stored in
a struct json_iter) then iterated using json_iter_next. When
iterating the type can be checked with json_iter_get_type. For
each iteration the value can be obtained using any of the type
getters (int/uint/boolean/null).
This adds support for boolean, (unsigned) integers, and
null types. JSON_PRIMITIVE should be used as the type when
parsing and the value should be struct json_iter.
Once parsed the actual value can be obtained using one of
the primitive getters. If the type does not match they will
return false.
If using JSON_OPTIONAL with JSON_PRIMITIVE the resulting
iterator can be checked with json_iter_is_valid. If false
the key/value was not found or the type was not matching.
First, this was renamed to 'count_tokens_in_container' to be
more general purpose (i.e. include future array counting).
The way the tokens are counted also changed to be more intuitive.
While the previous way was correct, it was somewhat convoluted in
how it worked (finding the next parent of the objects parent).
Instead we can use the container token itself as the parent and
begin counting tokens. When we find a token with a parent index
less than the target we have reached the end of this container.
This also works for nested containers, including arrays since we
no longer rely on a key (which an array element would not have).
For example::
{
"first":{"foo":"bar"},
"second":{"foo2":"bar2"}
}
index 0 <overall object>
index 1 "first" with parent 0
index 2 {"foo":"bar"} with parent 1
Counting tokens inside "first"'s object we have:
index 3 "foo" with parent 2
index 4 "bar" with parent 3
If we continue counting we reach:
index 5 "second" with parent 0
This terminates the counting loop since the parent index is
less than '2' (the index of {"foo":"bar"} object).
In file included from ./ell/ell.h:15,
from ../../src/dpp.c:29:
../../src/dpp.c: In function ‘authenticate_request’:
../../ell/log.h:79:22: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 8 has type ‘size_t’ {aka ‘unsigned int’} [-Wformat=]
79 | l_log(L_LOG_DEBUG, "%s:%s() " format, __FILE__, \
| ^~~~~~~~~~
../../ell/log.h:54:16: note: in definition of macro ‘l_log’
54 | __func__, format "\n", ##__VA_ARGS__)
| ^~~~~~
../../ell/log.h:103:31: note: in expansion of macro ‘L_DEBUG_SYMBOL’
103 | #define l_debug(format, ...) L_DEBUG_SYMBOL(__debug_desc, format, ##__VA_ARGS__)
| ^~~~~~~~~~~~~~
../../src/dpp.c:1235:3: note: in expansion of macro ‘l_debug’
1235 | l_debug("I-Nonce has unexpected length %lu", i_nonce_len);
| ^~~~~~~
Direct leak of 64 byte(s) in 1 object(s) allocated from:
#0 0x7fa226fbf0f8 in __interceptor_malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/9.4.0/libasan.so.5+0x10c0f8)
#1 0x688c98 in l_malloc ell/util.c:62
#2 0x6c2b19 in msg_alloc ell/genl.c:740
#3 0x6cb32c in l_genl_msg_new_sized ell/genl.c:1567
#4 0x424f57 in netdev_build_cmd_authenticate src/netdev.c:3285
#5 0x425b50 in netdev_sae_tx_authenticate src/netdev.c:3385
Direct leak of 7 byte(s) in 1 object(s) allocated from:
#0 0x7fd748ad00f8 in __interceptor_malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/9.4.0/libasan.so.5+0x10c0f8)
#1 0x688c21 in l_malloc ell/util.c:62
#2 0x4beec7 in handshake_state_set_vendor_ies src/handshake.c:324
#3 0x464e4e in station_handshake_setup src/station.c:1203
#4 0x472a2f in __station_connect_network src/station.c:2975
#5 0x473a30 in station_connect_network src/station.c:3078
#6 0x4ed728 in network_connect_8021x src/network.c:1497
Fixes: f24cfa481b ("handshake: Add setter for vendor IEs")
This implements a configurator in the responder role. Currently
configuring an enrollee is limited to only the connected network.
This is to avoid the need to go offchannel for any reason. But
because of this a roam, channel switch, or disconnect will cause
the configuration to fail as none of the frames are being sent
offchannel.
Added both enrollee and configurator roles, as well as the needed
logic inside the authentication protocol to verify role compatibility.
The dpp_sm's role will now be used when setting capability bits making
the auth protocol agnostic to enrollees or configurators.
This also allows the card to re-issue ROC if it ends in the middle of
authenticating or configuring as well as add a maximum timeout for
auth/config protocols.
IO errors were also handled as these sometimes can happen with
certain drivers but are not fatal.
Allows creating a new configuration object based on settings, ssid,
and akm suite (for configurator role) as well as converting a
configuration object to JSON.
Rather than hard coding ad0, use the actual frame data. There really
isn't a reason this would differ (only status attribute) but just
in case its better to use the frame data directly.
This is a minimal implementation only supporting legacy network
configuration, i.e. only SSID and PSK/passphrase are supported.
Missing features include:
- Fragmentation/comeback delay support
- DPP AKM support
- 8021x/PKEX support
This implements the DPP protocol used to authenticate to a
DPP configurator.
Note this is not a full implementation of the protocol and
there are a few missing features which will be added as
needed:
- Mutual authentication (needed for BLE bootstrapping)
- Configurator support
- Initiator role
The presence procedure implemented is a far cry from what the spec
actually wants. There are two reason for this: a) the kernels offchannel
support is not at a level where it will work without rather annoying
work arounds, and b) doing the procedure outlined in the spec will
result in terrible discovery performance.
Because of this a simpler single channel announcement is done by default
and the full presence procedure is left out until/if it is needed.
This is a minimal wrapper around jsmn.h to make things a bit easier
for iterating through a JSON object.
To use, first parse the JSON and create a contents object using
json_contents_new(). This object can then be used to initialize a
json_iter object using json_iter_init().
The json_iter object can then be parsed with json_iter_parse by
passing in JSON_MANDATORY/JSON_OPTIONAL arguments. Currently only
JSON_STRING and JSON_OBJECT types are supported. Any JSON_MANDATORY
values that are not found will result in an error.
If a JSON_OPTIONAL string is not found, the pointer will be NULL.
If a JSON_OPTIONAL object is not found, this iterator will be
initialized but 'start' will be -1. This can be checked with a
convenience macro json_object_not_found();
Static analysis was not happy since this return can be negative and
it was being fed into an unsigned argument. In reality this cannot
happen since the key buffer is always set to the maximum size supported
by any curves.
This module provides a convenient wrapper around both
CMD_[CANCEL_]_REMAIN_ON_CHANNEL APIs.
Certain protocols require going offchannel to send frames, and/or
wait for a response. The frame-xchg module somewhat does this but
has some limitations. For example you cannot just go offchannel;
an initial frame must be sent out to start the procedure. In addition
frame-xchg does not work for broadcasts since it expects an ACK.
This module is much simpler and only handles going offchannel for
a duration. During this time frames may be sent or received. After
the duration the caller will get a callback and any included error
if there was one. Any offchannel request can be cancelled prior to
the duration expriring if the offchannel work has finished early.
The disconnect event handler was mistakenly bailing out if FT or
reassociation was going on. This was done because a disconnect
event is sent by the kernel when CMD_AUTH/CMD_ASSOC is used.
The problem is an AP could also disconnect IWD which should never
be ignored.
To fix this always parse the disconnect event and, if issued by
the AP, always notify watchers of the disconnect.
LLD 13 and GNU ld 2.37 support -z start-stop-gc which allows garbage
collection of C identifier name sections despite the __start_/__stop_
references. GNU ld before 2015-10 had the behavior as well. Simply set
the retain attribute so that GCC 11 (if configure-time binutils is 2.36
or newer)/Clang 13 will set the SHF_GNU_RETAIN section attribute to
prevent garbage collection.
Without the patch, there are linker errors with -z start-stop-gc
(LLD default) when -Wl,--gc-sections is used:
```
ld.lld: error: undefined symbol: __start___eap
>>> referenced by eap.c
>>> src/eap.o:(eap_init)
```
The remain attribute will not be needed if the metadata sections are
referenced by code directly.
ap.c has been mostly careful to call the event handler at the end of any
externally called function to allow methods like ap_free() to be called
within the handler, but that isn't enough. For example in
ap_del_station we may end up emitting two events: STATION_REMOVED and
DHCP_LEASE_EXPIRED. Use a slightly more complicated mechanism to
explicitly guard ap_free calls inside the event handler.
To make it easier, simplify cleanup in ap_assoc_reassoc with the use of
_auto_.
In ap_del_station reorder the actions to send the STATION_REMOVED event
first as the DHCP_LEASE_EXPIRED is a consequence of the former and it
makes sense for the handler to react to it first.
src/eap.c: In function 'eap_rx_packet':
src/eap.c:419:50: error: 'vendor_type' may be used uninitialized in this function [-Werror=maybe-uninitialized]
419 | (type == EAP_TYPE_EXPANDED && vendor_id == (id) && vendor_type == (t))
| ^~
src/eap.c:430:11: note: 'vendor_type' was declared here
430 | uint32_t vendor_type;
It isn't clear why GCC complains about vendor_type, but not vendor_id.
But in all cases if type == EAP_TYPE_EXPANDED, then vendor_type and
vendor_id are set. Silence this spurious warning.
There is an unchecked NULL pointer access in network_has_open_pair.
open_info can be NULL, when out of multiple APs in range that advertise
the same SSID some advertise OWE transition elments and some don't.
The Hotspot 2.0 spec has some requirements that IWD was missing depending
on a few bits in extended capabilities and the HS2.0 indication element.
These requirements correspond to a few sysfs options that can be set in
the kernel which are now set on CONNECTED and unset on DISCONNECTED.
Netconfig was the only user of sysfs but now other modules will
also need it.
Adding existing API for IPv6 settings, a IPv4 and IPv6 'supports'
checker, and a setter for IPv4 settings.
The way a SA Query was done following a channel switch was slightly
incorrect. One because it is only needed when OCVC is set, and two
because IWD was not waiting a random delay between 0 and 5000us as
lined out by the spec. This patch fixes both these issues.
Cache the latest v4 and v6 domain string lists in struct netconfig state
to be able to more easily detect changes in those values in future
commits. For that split netconfig_set_domains's code into this function,
which now only commits the values in netconfig->v{4,6}_domain{,s} to the
resolver, and netconfig_domains_update() which figures out the active
domains string list and saves it into netconfig->v{4,6}_domain{,s}. This
probably saves some cycles as the callers can now decide to only
recalculate the domains list which may have changed.
While there simplify netconfig_set_domains return type to void as the
result was always 0 anyway and was never checked by callers.
Cache the latest v4 and v6 DNS IP string lists in struct netconfig state
to be able to more easily detect changes in those values in future
commits. For that split netconfig_set_dns's code into this function,
which now only commit the values in netconfig->dns{4,6}_list to the
resolver, and netconfig_dns_list_update() which figures out the active
DNS IP address list and saves it in netconfig->dns{4,6} list. This
probably saves some cycles as the callers can now decide to only
recalculate the dns_list which may have changed.
While there simplify netconfig_set_dns return type to void as the result
was always 0 anyway and was never checked by callers.
Cache the latest v4 and v6 gateway IP string in struct netconfig state
to be able to more easily detect changes in those values in future
commits and perhaps to simplify the ..._routes_install functions.
netconfig_ipv4_get_gateway's out_mac parameter can now be NULL. While
editing that function fix a small formatting annoyance.
Use a separate fils variable to make the code a bit prettier.
Also make sure that the out_mac parameter is not NULL prior to storing
the gateway_mac in it.
Add netconfig_enabled() and use that in all places that want to know
whether network configuration is enabled. Drop the enable_network_config
deprecated setting, which was only being handled in one of these 5 or so
places.
This code path was never tested and used to ensure a OWE transition
candidate gets selected over an open one (e.g. if all the BSS's are
blacklisted). But this logic was incorrect and the path was being
taken for BSS's that did not contain the owe_trans element, basically
all BSS's. For RSN's this was somewhat fine since the final check
would set a candidate, but for open BSS's the loop would start over
and potentially complete the loop without ever returning a candidate.
If fallback was false, NULL would be returned.
To fix this only take the OWE transition path if its an OWE transition
BSS, i.e. inverse the logic.
Normally Beacon Reporting subelements are present only if repeated
measurements are requested. However, an all-zero Beacon Reporting
subelement is included by some implementations. Handle this case
similarly to the absent case.
Since Reporting Detail subelement is listed as 'extensible', make sure
that the length check is not overly restrictive. We only interpret the
first field.
It was seen during testing that several offload-capable cards
were not including the OCI in the 4-way handshake. This made
any OCV capable AP unconnectable.
To be safe disable OCV on any cards that support offloading.
802.11 requires an STA initiate the SA Query procedure on channel
switch events. This patch refactors sending the SA Query into its
own routine and starts the procedure when the channel switch event
comes in.
In addition the OCI needs to be verified, so the channel info is
parsed and set into the handshakes chandef.
There are several events for channel switching, and nl80211cmd was
naming two of them "Channel Switch Notify". Change
CH_SWITCH_STARTED_NOTIFY to "Channel Switch Started Notify" to
distinguish the two events.
SA query is the final protocol that requires OCI inclusion and
verification. The OCI element is now included and verified in
both request and response frames as required by 802.11.
strcmp behavior is undefined if one of the parameters is NULL.
Server-id is a mandatory value and cannot be NULL. Gateway can be NULL
in DHCP, so check that explicitly.
Reported-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
In certain situations, it is possible for us to know the MAC of the
default gateway when DHCP finishes. This is quite typical on many home
network and small network setups. It is thus possible to pre-populate
the ARP cache with the gateway MAC address to save an extra round trip
at connection time.
Another advantage is during roaming. After version 4.20, linux kernel
flushes ARP caches by default whenever netdev encounters a no carrier
condition (as is the case during roaming). This can prevent packets
from going out after a roam for a significant amount of time due to
lost/delayed ARP responses.
This implements the new handshake callback for setting a TK with
an extended key ID. The procedure is different from legacy zero
index TKs.
First the new TK is set as RX only. Then message 4 should be sent
out (so it uses the existing TK). This poses a slight issue with
PAE sockets since message order is not guaranteed. In this case
the 4th message is stored and sent after the new TK is installed.
Then the new TK is modified using SET_KEY to both send and
receive.
In the case of control port over NL80211 the above can be avoided
and we can simply install the new key, send message 4, and modify
the TK as TX + RX all in sequence, without waiting for any callbacks.
When UseDefaultInterface is set, iwd doesn't attempt to destroy and
recreate any default interfaces it detects. However, only a single
default interface was ever remembered & initialized. This is fine for
most cases since the kernel would typically only create a single netdev
by default.
However, some drivers can create multiple netdevs by default, if
configured to do so. Other usecases, such as tethering, can also
benefit if iwd initialized & managed all default netdevs that were
detected at iwd start time or device hotplug.
oci variable is always set during handshake_util_find_kde. Do not
initialize it unnecessarily to help the compiler / static analysis find
potential issues.
If OCI is not used, then the oci array is never initialized. Do not try
to include it in our GTK 2_of_2 message.
Fixes: ad4d639854 ("eapol: include OCI in GTK 2/2")
802.11 added Extended Key IDs which aim to solve the issue of PTK
key replacement during rekeys. Since swapping out the existing PTK
may result in data loss because there may be in flight packets still
using the old PTK.
Extended Key IDs use two key IDs for the PTK, which toggle between
0 and 1. During a rekey a new PTK is derived which uses the key ID
not already taken by the existing PTK. This new PTK is added as RX
only, then message 4/4 is sent. This ensure message 4 is encrypted
using the previous PTK. Once sent, the new PTK can be modified to
both RX and TX and the rekey is complete.
To handle this in eapol the extended key ID KDE is parsed which
gives us the new PTK key index. Using the new handshake callback
(handshake_state_set_ext_tk) the new TK is installed. The 4th
message is also included as an argument which is taken care of by
netdev (in case waiting for NEW_KEY is required due to PAE socekts).
This may not be required but setting the group key mode explicitly
to multicast makes things consistent, even if only for the benefit
of reading iwmon logs easier.
The procedure for setting extended key IDs is different from the
single PTK key. The key ID is toggled between 0 and 1 and the new
key is set as RX only, then set to RX/TX after message 4/4 goes
out.
Since netdev needs to set this new key before sending message 4,
eapol can include a built message which netdev will store if
required (i.e. using PAE).
ext_key_id_capable indicates the handshake has set the capability bit
in the RSN info. This will only be set if the AP also has the capability
set.
active_tk_index is the key index the AP chose in message 3. This is
now used for both legacy (always zero) and extended key IDs.
Move the reading of ControlPortOverNL80211 into wiphy itself and
renamed wiphy_control_port_capable to wiphy_control_port_enabled.
This makes things easier for any modules interested in control
port support since they will only have to check this one API rather
than read the settings and check capability.
Expose the Device Address property for each peer. The spec doesn't say
much about how permanent the address or the name are, although the
device address by definition lives longer than the interface addresses.
However the device address is defined to be unique and the name is not
so the address can be used to differentiate devices with identical name.
Being unique also may imply that it's assigned globally and thus
permanent.
Network Manager uses the P2P device address when saving connection
profiles (and will need it from the backend) and in this case it seems
better justified than using the name.
The address is already in the object path but the object path also
includes the local phy index which may change for no reason even when
the peer's address hasn't changed so the path is not useful for
remembering which device we've connected to before. Looking at only
parts of the path is considered wrong.
Some drivers might not actually support control port properly even if
advertised by mac80211. Introduce a new method to wiphy that will take
care of looking up any driver quirks that override the presence of
NL80211_EXT_FEATURE_CONTROL_PORT_OVER_NL80211
Make consecutive calls to netconfig_load_settings() memory-leak safe by
introducing a netconfig_free_settings convenience method. This method
will free any settings that are allocated as a result of
netconfig_load_settings() and will be called from netconfig_free() to
ensure that any settings are freed as a result of netconfig_destroy().
For symmetry with IPv4, save the command id for this netlink command so
we can later add logic to the callback as well as be able to cancel the
command. No functional change in this commit alone.
FT/FILS handle their own PMK derivation but rekeys still require
using the 4-way handshake. There is some ambiguity in the spec whether
or not the PMKID needs to be included in message 1/4 and it appears
that when rekeying after FT/FILS hostapd does not include a PMKID.
The handshake contains the current BSS's RSNE/WPA which may differ
from the FT-over-DS target. When verifying the target BSS's RSNE/WPA
IE needs to be checked, not the current BSS.
If the deauth path was triggered IWD would deauth but end up
calling the connect callback with whatever result netdev had
set, e.g. 'NETDEV_RESULT_OK'. This, of course, caused station
some confusion.
FT-over-DS cannot use OCV due to how the kernel works. This means
we could connect initially with OCVC set, but a FT-over-DS attempt
needs to unset OCVC. Set OCVC false when rebuilding the RSNE for
reassociation.
The FT-over-DS action stage builds an FT-Request which contains an
RSNE. Since FT-over-DS will not support OCV add a boolean to
ft_build_authenticate_ies so the OCVC bit can be disabled rather
than relying on the handshake setting.
This modifies the FT logic to fist call get_oci() before
reassociation. This allows the OCI to be included in reassociation
and in the 4-way handshake later on.
The code path for getting the OCI had to be slightly changed to
handle an OCI that is already set. First the handshake chandef is
NULL'ed out for any new connection. This prevents a stale OCI from
being used. Then some checks were added for this case in
netdev_connect_event and if chandef is already set, start the 4-way
handshake.
netconfig_load_settings is called when establishing a new initial
association to a network. This function tries to update dhcp/dhcpv6
clients with the MAC address of the netdev being used. However, it is
too early to update the MAC here since netdev might need to powercycle
the underlying network device in order to update the MAC (i.e. when
AddressRandomization="network" is used).
If the MAC is set incorrectly, DHCP clients are unable to obtain the
lease properly and station is stuck in "connecting" mode indefinitely.
Fix this by delaying MAC address update until netconfig_configure() is
invoked.
Fixes: ad228461ab ("netconfig: Move loading settings to new method, refactor")
If the AP advertises FT-over-DS support it likely wants us to use
it. Additionally signal_low is probably going to be true since IWD
has started a roam attempt.
When netdev goes down so does station, but prior to netdev calling
the neighbor report callback. The way the logic was written station
is dereferenced prior to checking for any errors, causing a use
after free.
Since -ENODEV is used in this case check for that early before
accessing station.
This adds a utility to convert a chandef obtained from the kernel into a
3 byte OCI element format containing the operating class, primary
channel and secondary channel center frequency index.
This changes scan_bss from using separate members for each
OWE transition element data type (ssid, ssid_len, and bssid)
to a structure that holds them all.
This is being done because OWE transition has option operating
class and channel bytes which will soon be parsed. This would
end up needing 5 separate members in scan_bss which is a bit
much for a single IE that needs to be parsed.
This makes checking the presense of the IE more convenient
as well since it can be done with a simple NULL pointer check
rather than having to l_memeqzero the BSSID.
These members are currently stored in scan_bss but with the
addition of operating class/band info this will become 5
separate members. This is a bit excessive to store in scan_bss
separately so instead this structure can hold everything related
to the OWE transition IE.
Add a utility for setting the OCI obtained from the hardware (prior to
handshake starting) as well as a utility to validate the OCI obtained
from the peer.
This adds a utility that can convert an operating class + channel
combination to a frequency. Operating class is assumed to be a global
operating class from 802.11 Appendix E4.
This information can be found in Operating Channel Information (OCI) IEs,
as well as OWE Transition Mode IEs.
Calling handshake_state_setup_own_ciphers from within
handshate_state_set_authenticator_ie was misleading. In all cases the
supplicant chooses the AKM. This worked since our AP code only ever
advertises a single AKM, but would not work in the general case.
Similarly, the supplicant would choose which authentication type to use
by either sending the WPA1 or WPA2 IE (or OSEN). Thus the setting of
the related variables in handshake_state_set_authenticator_ie was also
incorrect. In iwd, the supplicant_ie would be set after the
authenticator_ie, so these settings would be overwritten in most cases.
Refactor these two setters so that the supplicant's chosen rsn_info
would be used to drive the handshake.
reallocarray has been added to glibc relatively recently (version 2.26,
from 2017) and apparently not all users run new enough glibc. Moreover,
reallocarray is not available with uclibc-ng. So use realloc if
reallocarray is not available to avoid the following build failure
raised since commit 891b78e9e8:
/home/giuliobenetti/autobuild/run/instance-3/output-1/host/lib/gcc/xtensa-buildroot-linux-uclibc/10.3.0/../../../../xtensa-buildroot-linux-uclibc/bin/ld: src/sae.o: in function `sae_rx_authenticate':
sae.c:(.text+0xd74): undefined reference to `reallocarray'
Fixes:
- http://autobuild.buildroot.org/results/c6d3f86282c44645b4f1c61882dc63ccfc8eb35a
There isn't much control station has with how BSS's are inserted to
a network object. The rank algorithm makes that decision. Because of
this we could end up in a situation where the Open BSS is preferred
over the OWE transition BSS.
In attempt to better handle this any Open BSS in this type of network
will not be chosen unless its the only candidate (e.g. no other BSSs,
inability to connect with OWE, or an improperly configured network).
OWE Transition is described in the WiFi Alliance OWE Specification
version 1.1. The idea behind it is to support both legacy devices
without any concept of OWE as well as modern ones which support the
OWE protocol.
OWE is a somewhat special type of network. Where it advertises an
RSN element but is still "open". This apparently confuses older
devices so the OWE transition procedure was created.
The idea is simple: have two BSS's, one open, and one as a hidden
OWE network. Each network advertises a vendor IE which points to the
other. A device sees the open network and can connect (legacy) or
parse the IE, scan for the hidden OWE network, and connect to that
instead.
Care was taken to handle connections to hidden networks directly.
The policy is being set that any hidden network with the WFA OWE IE
is not connectable via ConnectHiddenNetwork(). These networks are
special, and can only be connected to via the network object for
the paired open network.
When scan results come in from any source (DBus, quick, autoconnect)
each BSS is checked for the OWE Transition IE. A few paths can be
taken here when the IE is found:
1. The BSS is open. The BSSID in the IE is checked against the
current scan results (excluding hidden networks). If a match is
found we should already have the hidden OWE BSS and nothing
else needs to be done (3).
2. The BSS is open. The BSSID in the IE is not found in the
current scan results, and the open network also has no OWE BSS
in it. This will be processed after scan results.
3. The BSS is not open and contains the OWE IE. This BSS will
automatically get added to the network object and nothing else
needs to be done.
After the scan results each network is checked for any non-paired
open BSS's. If found a scan is started for these BSS's per-network.
Once these scan results come in the network is notified.
From here network.c can detect that this is an OWE transition
network and connect to the OWE BSS rather than the open one.
Specifically OWE networks with multiple open/hidden BSS's are troublesome
to scan for with the current APIs. The scan parameters are limited to a
single SSID and even if that was changed we have the potential of hitting
the max SSID's per scan limit. In all, it puts the burden onto the caller
to sort out the SSIDs/frequencies to scan for.
Rather than requiring station to handle this a new scan API was added,
scan_owe_hidden() which takes a list of open BSS's and will automatically
scan for the SSIDs in the OWE transition IE for each.
It is slightly optimized to first check if all the hidden SSID's are the
same. This is the most likely case (e.g. single pair or single network)
and a single scan command can be used. Otherwise individual scan commands
are queued for each SSID/frequency combo.
handshake_util_ap_ie_matches() is used to make sure that the RSN element
received from the Authenticator during handshake / association response
is the same as the one advertised in Beacon/Probe Response frames. This
utility tries to bitwise compare the element first, and only if that
fails, compares RSN members individually.
For FT, bitwise comparison will always fail since the PMKID has to be
included by the Authenticator in any RSN IEs included in Authenticate
& Association Response frames.
Perform the bitwise comparison as an optimization only during processing
of eapol message 3/4. Also keep the parsed rsn information for future
use and to possibly avoid re-parsing it during later checks.
DBus scan is performed in several subsets. In certain corner-case
circumstances it would be possible for autoconnect to run after each
subset scan. Instead, trigger autoconnect only after the dbus scan
completes.
This also works around a condition where ANQP results could trigger
autoconnect too early.
Several invocations of station_set_scan_results() base the
'add_to_autoconnect' parameter on station_is_autoconnecting(). Simplify
the code by having station_set_scan_results() invoke that itself.
'add_to_autoconnect' now becomes an 'intent' parameter, specifying
whether autoconnect path should be invoked as a result of these scan
results or not when station is in an appropriate state. Rename
'add_to_autoconnect' parameter to make this clearer.
If the frequency of the bss is not in the list of frequencies for the
current scan, then this is a cached bss. It was likely already
processed for ANQP before, so skip it.
IWD has restricted SSIDs to only utf8 so they can be displayed but
with the addition of OWE transition networks this is an unneeded
restriction (for these networks). The SSID of an OWE transition
network is never displayed to the user so limiting to utf8 isn't
required.
Allow non-utf8 SSIDs to be scanned for by including the length in
the scan parameters and not relying on strlen().
This is a parser for the WFA OWE Transition element. For now the
optional band/channel bytes will not be parsed as hostapd does not
yet support these and would also require the 802.11 appendix E-1
to be added to IWD. Because of this OWE Transition networks are
assumed to be on the same channel as their open counterpart.
in6_addr.__in6_u.__u6_addr8 is glibc-specific and named differently in
the headers shipped with musl libc for example. The POSIX compliant and
universal way of accessing it is in6_addr.s6_addr.
This was actually broken if triggered because __network_connect
checks if network->connect_after_owe_hidden is set and returns
already in progress. We want to keep this behavior though for
obvious reasons.
To fix this station_connect_network can be called directly which
bypasses the check. This is essentially how ANQP avoids this
problem as well.
Similar to ANQP a connect call could come in while station is
scanning for OWE hidden networks. This is supported in the same
manor by saving away the dbus message and resuming the connection
after the hidden OWE scan.
With the addition of OWE transition network needs to be notified
of the hidden OWE scan which is quite similar to how it is notified
of ANQP. The ANQP event watch can be made generic and reused to
allow other events besides ANQP.
This is being added to support OWE transition mode. For these
type of networks the OWE BSS may contain a different SSID than
that of the network, but the WFA spec requires this be hidden
from the user. This means we need to set the handshake SSID based
on the BSS rather than the network object.
Refactor netconfig_set_dns to be a bit easier to follow and remove use
of macros. Also bail out early if no DNS addresses are provided instead
of building an empty DNS list since resolve_set_dns() simply returns if
a NULL or empty DNS list is provided.
Kernel keeps transmitting authentication frames until told to stop or an
authentication frame the kernel considers 'final' is received. Detect
cases where the kernel would keep retransmitting, and if auth_proto
encounters a fatal protocol error, prevent these retransmissions from
occuring by sending a Deauthenticate command to the kernel.
Additionally, treat -EBADMSG/-ENOMSG return from auth_proto specially.
These error codes are meant to convey that a frame should be silently
dropped and retransmissions should continue.
This works around a hostapd bug (described more in the TODO comment)
which is exposed because of the kernels overly agressive re-transmit
behavior on missed ACKs. Combined this results in a death if the
initial commit is not acked. This behavior has been identified in
consumer access points and likely won't ever be patched for older
devices. Because of this IWD must work around the problem which can
be eliminated by not sending out this commit message.
This bug was reported to the hostapd ML:
https://lists.infradead.org/pipermail/hostap/2021-September/039842.html
This change should not cause any compatibility problems to non-hostapd
access points and is identical to how wpa_supplicant treats this
scenario.
If a commit is received while in an accepted state the spec states
the scalar should be checked against the previous commit and if
equal the message should be silently dropped.
In netconfig_load_settings apply the DNS overrides strings we've loaded
instead of leaking them.
Fixes: ad228461ab ("netconfig: Move loading settings to new method, refactor")
netdev now assumes the SSID was set in the handshake (normally via
network_handshake_setup) but WSC calls netdev_connect directly so
it also should set the SSID.
In order to support OWE in the CMD_CONNECT path the scan_bss parameter
needs to be removed since this is lost after netdev_connect returns.
Nearly everything needed is also stored in the handshake except the
privacy capability which is now being mirrored in the netdev object
itself.
Use the MAC addresses for the gateways and DNS servers received in the
FILS IP Assigment IE together with the gateway IP and DNS server IP.
Commit the IP to MAC mappings directly to the ARP/NDP tables so that the
network stack can skip sending the corresponding queries over the air.
Send and receive the FILS IP Address Assignment IEs during association.
As implemented this would work independently of FILS although the only
AP software handling this mechanism without FILS is likely IWD itself.
No support is added for handling the IP assignment information sent from
the server after the initial Association Request/Response frames, i.e.
the information is only used if it is received directly in the
Association Response without the "response pending" bit, otherwise the
DHCP client will be started.
Add two methods that will allow station to implement FILS IP Address
Assigment, one method to decide whether to send the request during
association, and fill in the values to be used in the request IE, and
another to handle the response IE values received from the server and
apply them. The netconfig->rtm_protocol value used when the address is
assigned this way remains RTPROT_DHCP because from the user's point of
view this is automatic IP assigment by the server, a replacement for
DHCP.
Split loading settings out of network_configure into a new method,
network_load_settings. Make sure both consistently handle errors by
printing messages and informing the caller.
Setter which forces the use of group 19 rather than the group order
that ELL provides. Certain APs have been found to have buggy group
negotiation and only work if group 19 is tried first, and only. When
an AP like this this is found (based on vendor OUI match) SAE will
use group 19 unconditionally, and fail if group 19 does not work.
Other groups could be tried upon failure but per the spec group 19
must be supported so there isn't much use in trying other, optional
groups.
Handle the 802.11ai FILS IP Address Assignment IEs in Association
Request frames when netconfig is enabled. Only IPv4 is supported.
Like the P2P IP Allocation mechanism, since the payload format and logic
is independent from the rest of the FILS standard this is enabled
unconditionally for clients who want to use it even though we don't
actually do FILS in AP mode.
If netconfig is enabled tell the DHCP server to expire any leases owned
by the client that is disconnecting by using l_dhcp_server_expire_by_mac
to return the IPs to the IP pool. They're added to the expired list
so they'd only be used if there are no other addresses left in the pool
and can be reactivated if the client comes back before the address is
used by somebody else.
This should ensure that we're always able to offer an address to a new
client as long as there are fewer concurrent clients than addresses in
the configured subnet or IP range.
Use the struct handshake_state::support_ip_allocation field already
supported in eapol.c authenticator side to enable the P2P IP Allocation
mechanism in ap.c. Add the P2P_GROUP_CAP_IP_ALLOCATION bit in P2P group
capabilities to signal the feature is now supported.
There's no harm in enabling this feature in every AP (not just P2P Group
Owner) but the clients won't know whether we support it other than
through that P2P-specific group capability bit.
Add a handshake event for use by the AP side for mechanisms that
allocate client IPs during the handshake: P2P address allocation and
FILS address assignment. This is emitted only when EAPOL or the
auth_proto is actually about to send the network configuration data to
the client so that ap.c can skip allocating a DHCP leases altogether if
the client doesn't send the required KDE or IE.
Some drivers ignore the initial IF_OPER_UP setting that was sent during
netdev_connect_ok(). Attempt to work around this by parsing New Link
events. If OperState setting is still not correct in a subsequent event,
retry setting OperState to IF_OPER_UP.
The hotspot case can actually result in network being NULL which
ends up crashing when accessing "->secrets". In addition any
secrets on this network were never removed for hotspot networks
since everything happened in network_unset_hotspot.
This is meant to be used as a generic notification to autotests. For
now 'no-roam-candidates' is the only event being sent. The idea
is to extend these events to signal conditions that are otherwise
undiscoverable in autotesting.
Replace instances of the ap_del_station() +
ap_sta_free()/ap_remove_sta() with calls to ap_station_disconnect to
make sure we consistently remove the station from the ap->sta_states
queue before using ap_del_station(). ap_del_station() may generate an
event to the ap.h API user (e.g. P2P) and this may end up tearing down
the AP completely.
For that scenario we also don't want ap_sta_free() to access sta->ap so
we make sure ap_del_station() performs these cleanup steps so that
ap_sta_free() has nothing to do that accesses sta->ap.
client_frame is not valid for a beacon frame as beacons are not sent in
response to another frame. Move the access to client_frame->address_2
to the conditional blocks for Probe Response and Association Response
frames.
This is to support the autotesting framework by allowing a smaller
scan subset. This will cut down on the amount of time spent scanning
via normal DBus scans (where the entire spectrum is scanned).
Most autotests do not want autoconnect behavior so it is being
turned off by default. There are a few tests where it is needed
and in these few cases the test can enable autoconnect through
the new station debug property.
This adds the property "AutoConnect" to the station debug interface
which can be read/written to disable or enable autoconnect globally.
As one would expect this property is only going to be used for testing
hence why it was put on the debug interface. Mosts tests disable
autoconnect (or they should) because it leads to unexpected connections.
This method will initiate a connection to a specific BSS rather
than relying on a network based connection (which the user has
no control over which specific BSS is selected).
The only point of failure in netdev_connect_common was setting
up the handshake type. Moving this outside of netdev_connect_common
makes the code flow much better in netdev_{connect,reassociate} as
nothing needs to be reset upon failure.
Utilize 'storage_is_file' when readdir returns DT_UNKNOWN to ensure
features like autoconnect work on filesystems that don't return a d_type
(eg. XFS).
Utilize 'storage_is_file' when readdir returns DT_UNKNOWN to ensure
features like autoconnect work on filesystems that don't return a d_type
(eg. XFS).
Add a function 'storage_is_file' which will use stat to verify a
file's existence given a path relative to the storage directory.
Not all filesystems provide a file type via readdir's d_type.
XFS is a notable system with optional d_type support.
When d_type is not supported stat must be used as a fallback.
If a stat fallback is not provided iwd will fail to load state files.
The preparing_roam flag is expected to be set by a few roam
routines and normally this is done prior to the roam scan.
The Roam() developer option was not doing this and would
cause failed roams in some cases.
This adds support in netdev_reassociate for all the auth
protocols (SAE/FILS/OWE) by moving the bulk of netdev_connect
into netdev_connect_common. In addition PREV_BSSID is set
in the associate message if 'in_reassoc' is true.
Some connections, like Hotspot require additional IEs to be used during
the Association. These are now passed as 'extra_ies' when invoking
netdev_connect, however they are also needed during ReAssociation and FT
to such APs.
Additionally, it may be that Hotspot-enabled APs will start utilizing
FILS or SAE. In these cases the extra_ies need to be accounted for
somehow, either by making a copy in handshake_state, netdev, or the
auth_proto itself. Similarly, P2P which heavily uses vendor IEs can be
used over SAE in the future.
Since a copy of these IEs is needed, might as well store them in
handshake_state itself for easy book-keeping by network/station.
RM Enabled Capabilities and Extended Capabilities IEs were correctly
being sent when using CMD_CONNECT for initial connections and
re-associations. However, for SoftMac SAE, FT, FILS and OWE connections,
these additional IEs were not added properly during the Associate step.
If the driver supports RRM, then we might as well always send the RM
Enabled Capabilities IE (and use the USE_RRM flag). 802.11-2020
suggests that this IE can be sent whenever
dot11RadioMeasurementActivated is true, and this setting is independent
of whether the peer supports RRM. There's nothing to indicate that an
STA should not send these IEs if the AP is not RRM enabled.
While we correctly emit a NETDEV_EVENT_CHANNEL_SWITCHED event from
netdev for other modules to respond to, we fail to actually update the
frequency of the netdev object in question. Since the netdev frequency
is used elsewhere (e.g. to send action frames), it needs updating too.
Fixes: 5eb0b7ca8e ("netdev: add a channel switch event")
This variable ended up being used only on the fast-transition path. On
the re-associate path it was never used, but memcpy-ied nevertheless.
Since its only use is by auth_proto based protocols, move it to the
auth_proto object directly.
Due to how prepare_ft works (we need prev_bssid from the handshake, but
the handshake is reset), have netdev_ft_* methods take an 'orig_bss'
parameter, similar to netdev_reassociate.
IE elements in various management frames are ordered. This ordering is
outlined in 802.11, Section 9.3.3. The ordering is actually different
depending on the frame type. Instead of trying to implement the order
manually, add a utility function that will sort the IEs in the order
expected by the particular management frame type.
Since we already have IE ordering look up tables in the various
management frame type validation functions, move them to global level
and re-use these lookup tables for the sorting utility.
This refactors some code to eliminate getting the ERP entry twice
by simply returning it from network_has_erp_identity (now renamed
to network_get_erp_cache). In addition this code was moved into
station_build_handshake_rsn and properly cleaned up in case there
was an error or if a FILS AKM was not chosen.
The authorized macs pointer was being set to either the wsc_beacon
or wsc_probe_response structures, which were initialized out of
scope to where 'amacs' was being used. This resulted in an out of
scope read, caught by address sanitizers.
One of these message buffers was overflowing due to padding not
being taken into account (caught by sanitizers). Wrapped the length
of all message buffers with EAP_SIM_ROUND as to account for any
padding that attributes may add.
The network_config was not being copied to network_info when
updated. This caused any new settings to be lost if the network
configuration file was updated during runtime.
The RoamThreshold5G was never honored because it was being
set prior to any connections. This caused the logic inside
netdev_cqm_rssi_update to always choose the 2GHz threshold
(RoamThreshold) due to netdev->frequency being zero at this time.
Instead call netdev_cqm_rssi_update in all connect/transition
calls after netdev->frequency is updated. This will allow both
the 2G and 5G thresholds to be used depending on what frequency
the new BSS is.
The call to netdev_cqm_rssi_update in netdev_setup_interface
was also removed since it serves no purpose, at least now
that there are two thresholds to consider.
Under certain conditions, access points with very low signal could be
detected. This signal is too low to estimate a data rate and causes
this L_WARN to fire. Fix this by returning a -ENETUNREACH error code in
case the signal is too low for any of the supported rates.
Transition Disable indications and information stored in the network
profile needs to be enforced. Since Transition Disable information is
now stored inside the network object, add a new method
'network_can_connect_bss' that will take this information into account.
wiphy_can_connect method is thus deprecated and removed.
Transition Disable can also result in certain AKMs and pairwise ciphers
being disabled, so wiphy_select_akm method's signature is changed and
takes the (possibly overriden) ie_rsn_info as input.
This indication can come in via EAPoL message 3 or during
FILS Association. It carries information as to whether certain
transition mode options should be disabled. See WPA3 Specification,
version 3 for more details.
Some network settings keys are set / parsed in multiple files. Add a
utility to parse all common network configuration settings in one place.
Also add some defines to make sure settings are always saved in the
expected group/key.
This returns the length of the actual contents, making the code a bit
easier to read and avoid the need to mask the KDE value which isn't
self-explanatory.
Instead of requiring each auth_proto to perform validation of the frames
received via rx_authenticate & rx_associate, have netdev itself perform
the mpdu validation. This is unlikely to happen anyway since the kernel
performs its own frame validation. Print a warning in case the
validation fails.
There's no reason why a change in groups would result in the
anti-clogging token becoming invalid. This might result in us needing
an extra round-trip if the peer is using countermeasures and our
requested group was deemed unsuitable.
We may receive multiple anti-clogging request messages. We memdup the
token every time, without checking whether memory for one has already
been allocated. Free the old token prior to allocating a new one.
The group was not checked at all. The specification doesn't
mention doing so specifically, but we are only likely to receive an Anti
Clogging Token Request message once we have sent our initial Commit. So
the group should be something we could have sent or might potentially be
able to use.
In case an exceptional condition occurs, handle this more consistently
by returning the following errors:
-ENOMSG -- If a message results in the retransmission timer t0 being
restarted without actually sending anything.
-EBADMSG -- If a received message is to be silently discarded without
affecting the t0 timer.
-ETIMEDOUT -- If SYNC_MAX has been exceeded
-EPROTO -- If a fatal protocol error occurred
Now that sae_verify_* methods no longer allow dropped frames though,
there's no reason to keep these checks. sae_process_commit and
sae_process_confirm will now always receive messages in their respective
state.
sae_verify_* functions were correctly marking frames to be dropped, but
were returning 0, which caused the to-be-dropped frames to be further
processed inside sae_rx_authenticate. Fix that by returning a proper
error.
Make sure to return -EAGAIN whenever a received frame from the peer
results in a retransmission. This also prevents the frame from being
mistakenly processed further in sae_rx_authenticate.
Do not try to transition to a new state from sae_send_commit /
sae_send_confirm since these methods can be called due to
retransmissions or other unexpected messages. Instead, transition to
the new state explicitly from sae_process_commit / sae_process_confirm.
SAE protocol is meant to authenticate peers simultaneously. Hence it
includes a tie-breaker provision in case both peers enter into the
Committed state and the Commit messages arrive at the respective peers
near simultaneously.
However, in the case of STA or Infrastructure mode, only one peer (STA)
would normally enter the Committed state (via Init) and the tie-breaker
provision is not needed. If this condition is detected, abort the
connection.
Also remove the uneeded group change check in process_commit.
sae_compute_pwe doesn't really depend on the state of sae_sm. Only the
curve to be used for the PWE calculation is needed. Rework the function
signature to reflect that and remove unneeded member of struct sae_sm.
ie_tlv_builder_init takes a size_t as input, yet for some reason
ie_tlv_builder_finalize takes an unsigned int argument as output. Fix
the latter to use size_t as well.
During processing of Connect events by netdev, some of these elements
might be updated even when already set. Instead of issuing
l_free/l_memdup each time, check and see whether the elements are
bitwise identical first.
Returns a template RSNX element that can be further modified by callers
to set any additional capabilities if required. wiphy will fill in
those capabilities that are driver / firmware dependent.
Most parameters set into the handshake object are actually known by the
network object itself and not station. This includes address
randomization settings, EAPoL settings, passphrase/psk/8021x settings,
etc. Since the number of these settings will only keep growing, move
the handshake setup into network itself. This also helps keep network
internals better encapsulated.
Refactor network_sync_psk to not require setting attributes into
multiple settings objects. This is in fact unnecessary as the parsed
security parameters are used everywhere else instead. Also make sure to
wipe the [Security] group first, in case any settings were invalid
during loading or otherwise invalidated.
Credentials obtained can now be either in passphrase or PSK form. Prior
to commit 7a9891dbef, passphrase credentials were always converted to
PSK form by invoking crypto_psk_from_passphrase. This was changed in
order to support WPA3 networks. Unfortunately the provisioning logic
was never properly updated. Fix that, and also try to not overwrite any
existing settings in case WSC is providing credentials for networks that
are already known.
Fixes: 7a9891dbef ("wsc: store plain text passphrase if available")
There will be additional security-related settings that will be
introduced for settings files. In particular, Hash-to-Curve PT
elements, Transition Disable settings and potentially others in the
future. Since PSK is now not the only element that would require
update, rename this function to better reflect this.
PRF+ from RFC 5295 is the more generic function using which HKDF_Expand
is defined. Allow this function to take a vararg list of arguments to
be hashed (these are referred to as 'S' in the RFCs).
Implement hkdf_expand in terms of prf_plus and update all uses to the
new syntax.
This fixes an issue where the udp port was not being opened due to a
permission denied error. The result of this was the dhcp client would
fail to send the renewal request and so the dhcp lease would expire.
The addition of the CAP_NET_BIND_SERVICE capability allows the service
to open sockets in the restricted port range (<1024) which is required
for dhcp.
This is based on a previous patch by Roberto Santalla Fernández.
A new config is introduced into the network config file under IPv4
called SendHostname. If this is set to true then we add the hostname
into all DHCP requests. The default is false.
If the idea is that the interface should only be present when connected
then don't do this in the DISCONNECTING state as there are various
possible transitions from CONNECTED or ROAMING directly to DISCONNECTED.
Don't require a gateway address from the settings file or from the DHCP
server when doing netconfig. Failing when the gateway address was
missing was breaking P2P but also small local networks.
Be paranoid and check that the prefix length in addresses from
used_addr4_list are not zero (they shouldn't be) and that address family
is AF_INET (it should be), mainly to quiet coverity warnings:
While there also fix one line's indentation.
At the end of ip_pool_select_addr4() we'd check if the selected address
is equal to the subnet address and increment it by 1 to produce a valid
host address for the AP. That check was always correct only with 24-bit
prefix, extend it to actually use the prefix-dependent mask instead of
0xff. Fixes a testAP failure triggered 50% of the times because the
netmask is 28 bit long there.
Don't signal the connected state until the client has obtained a DHCP
lease and we can set the ConnectedIP property. From now on that
property is always set when there's a connection.
p2p_parse_association_req() already extracts the P2P IE payload from the
IE sequence, there's no need to call ie_tlv_extract_p2p_payload before
it. Pass the IE sequence directly to p2p_parse_association_req().
Similarly to commit
27d302a0 ("band: Add a utility to estimate VHT rx data rate"), this
commit adds an RX data rate estimation utility for HT connections.
This function is meant to supercede a similar function in ie.c. The
current approach results in very optimistic data rate estimates since it
only takes into account the VHT/HT Capabilities IEs. It does not take
into account any local hardware limitations (such as no VHT/HT support),
limited RX MCS sets & number of spatial streams. It also does not take
into account that the AP might not be actually operating on higher
bandwidth channels.
This function is meant to address that by matching peer TX MCS sets with
the local hardware RX MCS set capability. It also takes into account
channel bandwidth capabilities of the local hardware, as well as whether
the AP is actually operating on a wider channel.
Move the band definition out of wiphy.c and into band.[ch]. This is
done to make certain utilities that depend on band information capable
of being tested from unit tests.
The band concept will most likely grow over time. For now, the only
user will be wiphy.c and unit tests, so the structures are kept public.
It is possible that the address set command succeeds just after a
netconfig object has been destroyed.
==6485== Invalid read of size 8
==6485== at 0x458A6D: netconfig_ipv4_routes_install (netconfig.c:629)
==6485== by 0x458D1C: netconfig_ipv4_ifaddr_add_cmd_cb (netconfig.c:689)
==6485== by 0x4A5E7B: process_message (netlink.c:181)
==6485== by 0x4A626A: can_read_data (netlink.c:289)
==6485== by 0x4A3E19: io_callback (io.c:120)
==6485== by 0x4A27B5: l_main_iterate (main.c:478)
==6485== by 0x4A28F6: l_main_run (main.c:525)
==6485== by 0x4A2C0E: l_main_run_with_signal (main.c:647)
==6485== by 0x404D27: main (main.c:542)
==6485== Address 0x4a47290 is 32 bytes inside a block of size 104 free'd
==6485== at 0x48399CB: free (vg_replace_malloc.c:538)
==6485== by 0x49998B: l_free (util.c:136)
==6485== by 0x457699: netconfig_free (netconfig.c:130)
==6485== by 0x45A038: netconfig_destroy (netconfig.c:1163)
==6485== by 0x41FD16: station_free (station.c:3613)
==6485== by 0x42020E: station_destroy_interface (station.c:3710)
==6485== by 0x4B990E: interface_instance_free (dbus-service.c:510)
==6485== by 0x4BC193: _dbus_object_tree_remove_interface (dbus-service.c:1694)
==6485== by 0x4BA22A: _dbus_object_tree_object_destroy (dbus-service.c:795)
==6485== by 0x4B078D: l_dbus_unregister_object (dbus.c:1537)
==6485== by 0x417ACB: device_netdev_notify (device.c:361)
==6485== by 0x4062B6: netdev_free (netdev.c:808)
==6485== Block was alloc'd at
==6485== at 0x483879F: malloc (vg_replace_malloc.c:307)
==6485== by 0x499857: l_malloc (util.c:62)
==6485== by 0x459DC0: netconfig_new (netconfig.c:1115)
==6485== by 0x41FC29: station_create (station.c:3592)
==6485== by 0x4207B3: station_netdev_watch (station.c:3864)
==6485== by 0x411A17: netdev_initial_up_cb (netdev.c:5588)
==6485== by 0x4A5E7B: process_message (netlink.c:181)
==6485== by 0x4A626A: can_read_data (netlink.c:289)
==6485== by 0x4A3E19: io_callback (io.c:120)
==6485== by 0x4A27B5: l_main_iterate (main.c:478)
==6485== by 0x4A28F6: l_main_run (main.c:525)
==6485== by 0x4A2C0E: l_main_run_with_signal (main.c:647)
==6485==
netdev_free relies on netdev->connected being set to detect whether a
connection is in progress. This variable is only set once the driver
has been connected however, so for situations where a CMD_CONNECT is
still 'in flight' or if the wiphy work is still pending, the ongoing
connection will not be canceled. Fix that by being more thorough when
trying to detect that a connection is in progress.
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
Terminate
src/netdev.c:netdev_free() Freeing netdev wlan0[9]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
Removing scan context for wdev c
src/scan.c:scan_context_free() sc: 0x4a44c80
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
==6356== Invalid write of size 4
==6356== at 0x40A253: netdev_cmd_connect_cb (netdev.c:2522)
==6356== by 0x4A8886: process_unicast (genl.c:986)
==6356== by 0x4A8C48: received_data (genl.c:1098)
==6356== by 0x4A3DFD: io_callback (io.c:120)
==6356== by 0x4A2799: l_main_iterate (main.c:478)
==6356== by 0x4A28DA: l_main_run (main.c:525)
==6356== by 0x4A2BF2: l_main_run_with_signal (main.c:647)
==6356== by 0x404D27: main (main.c:542)
==6356== Address 0x4a3e418 is 152 bytes inside a block of size 472 free'd
==6356== at 0x48399CB: free (vg_replace_malloc.c:538)
==6356== by 0x49996F: l_free (util.c:136)
==6356== by 0x406662: netdev_free (netdev.c:886)
==6356== by 0x4129C2: netdev_shutdown (netdev.c:5980)
==6356== by 0x403A14: iwd_shutdown (main.c:79)
==6356== by 0x403A7D: signal_handler (main.c:90)
==6356== by 0x4A2AFB: sigint_handler (main.c:612)
==6356== by 0x4A2F3B: handle_callback (signal.c:78)
==6356== by 0x4A3030: signalfd_read_cb (signal.c:104)
==6356== by 0x4A3DFD: io_callback (io.c:120)
==6356== by 0x4A2799: l_main_iterate (main.c:478)
==6356== by 0x4A28DA: l_main_run (main.c:525)
==6356== Block was alloc'd at
==6356== at 0x483879F: malloc (vg_replace_malloc.c:307)
==6356== by 0x49983B: l_malloc (util.c:62)
==6356== by 0x4121BD: netdev_create_from_genl (netdev.c:5776)
==6356== by 0x451F6F: manager_new_station_interface_cb (manager.c:173)
==6356== by 0x4A8886: process_unicast (genl.c:986)
==6356== by 0x4A8C48: received_data (genl.c:1098)
==6356== by 0x4A3DFD: io_callback (io.c:120)
==6356== by 0x4A2799: l_main_iterate (main.c:478)
==6356== by 0x4A28DA: l_main_run (main.c:525)
==6356== by 0x4A2BF2: l_main_run_with_signal (main.c:647)
==6356== by 0x404D27: main (main.c:542)
If the daemon is started and killed rapidly on startup, it is possible
for netdev_shutdown to be called prior to manager processing messages
that actually create the netdev itself. Since the netdev_list has
already been freed, the storage is lost. Fix that by destroying
netdev_list only when the module is unloaded.
If we're going down, make sure to notify any watches about EVENT_DEL
earlier. Not doing so might result in us not cleaning up requests that
might have been started as the result of this event.
station_free() is invoked when one of two possibilities happen:
- Device has been powered down, and EVENT_DOWN has been emitted
- Device has been removed, and EVENT_DEL has been emitted
In both cases there is not much point for netdev_disconnect to be
invoked as that tries to cleanly shut down an existing connection. The
only thing the ABORTED error accomplishes in this case is to send a
dbus_aborted_error for the pending_connect message, if it exists.
There's already code for doing this in station_free().
src/station.c:station_enter_state() Old State: autoconnect_quick, new state: connecting (auto)
src/scan.c:scan_cancel() Trying to cancel scan id 1 for wdev 7
src/wiphy.c:wiphy_radio_work_done() Work item 1 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 2
Terminate
src/netdev.c:netdev_free() Freeing netdev wlan0[9]
src/device.c:device_free()
src/station.c:station_free()
src/wiphy.c:wiphy_radio_work_done() Work item 2 done
src/station.c:station_connect_cb() 9, result: 5
src/netconfig.c:netconfig_destroy()
Removing scan context for wdev 7
src/scan.c:scan_context_free() sc: 0x4a39490
src/netdev.c:netdev_mlme_notify() MLME notification New Station(19)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_mlme_notify() MLME notification Authenticate(37)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_mlme_notify() MLME notification Associate(38)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is US
src/netdev.c:netdev_link_notify() event 16 on ifindex 9
src/netdev.c:netdev_unicast_notify() Unicast notification 129
src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
==20311== Invalid write of size 4
==20311== at 0x406E74: netdev_cmd_disconnect_cb (netdev.c:1130)
==20311== by 0x4A78A8: process_unicast (genl.c:986)
==20311== by 0x4A7C6A: received_data (genl.c:1098)
==20311== by 0x4A2E1F: io_callback (io.c:120)
==20311== by 0x4A17BB: l_main_iterate (main.c:478)
==20311== by 0x4A18FC: l_main_run (main.c:525)
==20311== by 0x4A1C14: l_main_run_with_signal (main.c:647)
==20311== by 0x404D27: main (main.c:542)
==20311== Address 0x4a37a0c is 156 bytes inside a block of size 472 free'd
==20311== at 0x48399CB: free (vg_replace_malloc.c:538)
==20311== by 0x498991: l_free (util.c:136)
==20311== by 0x406651: netdev_free (netdev.c:883)
==20311== by 0x412976: netdev_shutdown (netdev.c:5970)
==20311== by 0x403A14: iwd_shutdown (main.c:79)
==20311== by 0x403A7D: signal_handler (main.c:90)
==20311== by 0x4A1B1D: sigint_handler (main.c:612)
==20311== by 0x4A1F5D: handle_callback (signal.c:78)
==20311== by 0x4A2052: signalfd_read_cb (signal.c:104)
==20311== by 0x4A2E1F: io_callback (io.c:120)
==20311== by 0x4A17BB: l_main_iterate (main.c:478)
==20311== by 0x4A18FC: l_main_run (main.c:525)
The data rate estimation belongs in wiphy since it should take hardware
capabilities into account. Right now the data rate calculation simply
assumes the hardware is as capable as the AP. scan.c will be ported to
use this utility and the data rate estimation will be expanded to take
wiphy capabilities into account.
scan_parse_result used to parse the wdev and return this to the caller
where it was compared against the expected wdev. Simplify this by
extract the wdev first, and proceeding with the bss parsing afterwards.
Right now a very limited set of band parameters are parsed into wiphy.
This includes the supported rates and the supported frequencies.
However, there is much more information that is given for each band.
Introduce a new band object that will store this information and can be
extended for future use.
Change the char *addr_str and uint8_t prefix_len pair to an
l_rtnl_address object and use ell/rtnl.h utilities that use that
directly. Extend broadcast_from_ip to handle prefix_len.
We generate the DBus error reply type from the errno only when
ap_start() was failing synchronously, now also send the errno through
the callbacks so that we can also return a specific DBus reply when
failing asynchronously. Thea AP autotest relies on receiving the
AlreadyExists DBus error.
Deprecate the global [General].APRanges setting in favour of
[IPv4].APAddressPool with an extended (but backwards-compatible) syntax.
Drop the existing address pool creation code.
The new APAddressPool setting has the same syntax as the profile-local
[IPv4].Address setting and the subnet selection code will fall back
to the global setting if it's missing, this way we use common code to
handle both settings.
Extend the [IPv4].Address setting's syntax to allow a new format: a list
of <IP>/<prefix_len> -form strings that define the address space from
which a subnet is selected. Rewrite the DHCP settings loading with
other notable changes:
* validate some of the settings more thoroughly,
* name all netconfig-related ap_state members with the netconfig_
prefix,
* make sure we always call l_dhcp_server_set_netmask(),
* allow netmasks other than 24-bit and change the default to 28 bits,
* as requested avoid using the l_net_ ioctl-based functions although
l_dhcp still uses them internally,
* as requested avoid touching the ap_state members until the end of
some functions so that on error they're basically a no-op (for
readability).
Add the ip_pool_select_addr4 function to select a random subnet of requested
size from an address space defined by a string list (for use with the
AP profile [IPv4].Address and the global [IPv4].APAddressPool settings),
avoiding those subnets that conflict with subnets in use. We take care
to give a similar weight to all subnets contained in the specified
ranges regardless of how many ranges contain each, basically so that
overlapping ranges don't affect the probabilities (debatable.)
Add the ip-pool submodule that tracks IPv4 addresses in use on the
system for use when selecting the address for a new AP. l_rtnl_address
is used internally because if we're going to return l_rtnl_address
objects it would be misleading if we didn't fill in all of their
properties like flags etc.
If the connected BSS changes channel, netdev will emit an event with the
new channel's frequency. In response, have station change the frequency
of the connected scan_bss struct and inform network about the update.
If the connected BSS announces that it is switching operating channel,
the kernel may emit the NL80211_CMD_CH_SWTICH_NOTIFY event when the
switch is complete. Add a new netdev event NETDEV_EVENT_CHANNEL_SWITCHED
to signal to interested modules that the connected BSS has changed
channel. The event carries a pointer to the new channel's frequency.
NL80211_BSS_LAST_SEEN_BOOTTIME is expressed in nanoseconds, while BSS
timestamps are expressed in microseconds internally. Convert the
attribute to microseconds when using it to timestamp a BSS. This makes
iwd expire absent BSSes within 30 seconds as intended.
Fixes: 454cee12d4 ("scan: Use kernel-reported time-stamp if provided")
Right now, if a connection to a network selected by auto-connect fails,
the entire autoconnect process is restarted. This means that scans are
kicked off again, auto-connect list is rebuilt, etc. This was due to
auto-connect reusing the same failure path as connections triggered via
D-Bus.
The above behavior can lead to weird situations in certain corner cases.
For example, a highly preferred network configured with the wrong
password would result in auto-connect entering an infinite loop.
Fix this by making sure that all auto-connect entries are tried and
exhausted prior to re-scanning again.
The temporary ban list is cleared when a network is connected to
successfully, and also in network_connect_failed. Unfortunately,
network_connect_failed is not called in all paths (i.e. during
autoconnect) since it messes with the state of secrets and passphrases.
Clear the list in network_disconnected() instead, since it is guaranteed
to be called in every circumstance.
This will be effectively the same as the CONNECTING state, but can be
used to enable differing behavior, depending on whether connection was
triggered by autoconnect or via D-Bus.
Code that walked the VHT TX/RX MCS maps seemed to assume that bit_field
operated on bits that start at '1'. But this utility actually operates
on bits that start at '0'. I.e. the least significant bit is at
position 0.
While we're at it, rename the mcs variable into bitoffset to make it
clearer how the maps are being iterated over. Supported MCS is actually
the value found in the map.
This option has not been used in a very long time, and is of limited
utility since the only thing D-Bus debugging does is hexdumps the
content of D-Bus messages to the terminal.
The current calculation was giving erroneous results when it came to VHT
MCS index 4 and VHT MCS index 8 & 9.
Switch to a precomputed look up table and add a multiplication factor
for short GI.
ap_reset() seems to be called whenever the AP is stopped or removed due
to interface shutdown. For some reason ap_reset did not remove the DHCP
server object, resulting in leaks:
==211== at 0x483879F: malloc (vg_replace_malloc.c:307)
==211== by 0x46B5AD: l_malloc (util.c:62)
==211== by 0x49B0E2: l_dhcp_server_new (dhcp-server.c:715)
==211== by 0x433AA3: ap_setup_dhcp (ap.c:2615)
==211== by 0x433AA3: ap_load_dhcp (ap.c:2645)
==211== by 0x433AA3: ap_load_config (ap.c:2753)
==211== by 0x433AA3: ap_start (ap.c:2885)
==211== by 0x434A96: ap_dbus_start_profile (ap.c:3329)
==211== by 0x482DA9: _dbus_object_tree_dispatch (dbus-service.c:1815)
==211== by 0x47A4D9: message_read_handler (dbus.c:285)
==211== by 0x4720EB: io_callback (io.c:120)
==211== by 0x47130C: l_main_iterate (main.c:478)
==211== by 0x4713DB: l_main_run (main.c:525)
==211== by 0x4713DB: l_main_run (main.c:507)
==211== by 0x4715EB: l_main_run_with_signal (main.c:647)
==211== by 0x403EE1: main (main.c:550)
==209== by 0x43E48A: netconfig_ipv4_select_and_install (netconfig.c:887)
==209== by 0x43E48A: netconfig_configure (netconfig.c:1025)
==209== by 0x41743C: station_connect_cb (station.c:2556)
==209== by 0x408E0D: netdev_connect_ok (netdev.c:1311)
==209== by 0x47549E: process_unicast (genl.c:994)
==209== by 0x47549E: received_data (genl.c:1102)
==209== by 0x4720EB: io_callback (io.c:120)
==209== by 0x47130C: l_main_iterate (main.c:478)
==209== by 0x4713DB: l_main_run (main.c:525)
==209== by 0x4713DB: l_main_run (main.c:507)
==209== by 0x4715EB: l_main_run_with_signal (main.c:647)
==209== by 0x403EE1: main (main.c:550)
Prior to the BSS blacklist a BSS based autoconnect list made
the most sense, but now station actually retries all BSS's upon
failure. This means that for each BSS in the autoconnect list
every other BSS under that SSID will be attempted to connect to
if there is a failure. Essentially this is a network based
autoconnect list, just an indirect way of doing it.
Intead the autoconnect list can be purely network based, using
the network rank for sorting. This avoids the need for a special
autoconnect_entry struct as well as ensures the last connected
network is chosen first (simply based on existing network ranking
logic).
It was observed that IWD's ranking for BSS's did not always
end up with the fastest being chosen. This was due to IWD's
heavy weight on signal strength. This is a decent way of ranking
but even better is calculating a theoretical data rate which
was also done and factored in. The problem is the data rate
factor was always outdone by the signal strength.
Intead remove signal strength entirely as this is already taken
into account with the data rate calculation. This also removes
the check for rate IEs. If no IEs are found the parser will
base the data rate soley on RSSI.
There were a few other factors removed which will be added back
when ranking *networks* rather than BSS's. WPA version (or open)
was removed as well as the privacy capability. These values really
should not differ between BSS's in the same SSID and as such
should be used for network ranking instead.
Both ext/supported rates IEs are obtained from scan results. These
IEs are passed to ie_tlv_init/ie_tlv_next, as well as direct length
checks (for supported rates at least, extended supported rates can
be as long as a single byte integer can hold, 1 - 255) which verifies
that the length in the IE matches the overall IE length that is
stored in scan_bss. Because of this, ie_parse_supported_rates_from_data
was doing double duty re-initializing a TLV iterator.
Intead, since we know the IE length is within bounds, the length/data
can simply be directly accessed out of the buffer. This avoids the need
for a wrapper function entirely.
The length parameters were also removed, since this is now obtained
directly from the IE.
Since netdev maintains the list of FT over DS info structs there is not
any need for station to get callbacks when the initial action frame
is received, or not. This removes the need for the callback handler,
user data, and response timeout.
Roam times can be slightly improved by sending out the FT-over-DS
action frames to any BSS in the mobility domain immediately after
connecting. This preauthenticates IWD to each AP which means
Reassociation can happen right away when a roam is needed.
When a roam is needed station_transition_start will first try
FT-over-DS (if supported) via netdev_fast_transtion_over_ds. The
return is checked and if netdev has no cached entries FT-over-Air
will be used instead.
The beauty of FT-over-DS is that a station can send and receive
action frames to many APs to prepare for a future roam. Each
AP authenticates the station and when a roam happens the station
can immediately move to reassociation.
To handle this a queue of netdev_ft_over_ds_info structs is used
instead of a single entry. Using the new ft.c parser APIs these
info structs can be looked up when responses come in. For now
the timeouts/callbacks are kept but these will be removed as it
really does not matter if the AP sends a response (keeps station
happy until the next patch).
This is to prepare for multiple concurrent FT-over-DS action frames.
A list will be kept in netdev and for lookup reasons it needs to
parse the start of the frame to grab the aa/spa addresses. In this
call the IEs are also returned and passed to the new
ft_over_ds_parse_action_response.
For now the address checks have been moved into netdev, but this will
eventually turn into a queue lookup.
This value sets the roaming threshold on 5GHz networks. The
threshold has been separated from 2.4GHz because in many cases
5GHz can perform much better at low RSSI than 2.4GHz.
In addition the BSS ranking logic was re-worked and now 5GHz is
much more preferred, even at low RSSI. This means we need a
lower floor for RSSI before roaming, otherwise IWD would end
up roaming immediately after connecting due to low RSSI CQM
events.
This is being added as a developer method and should not be used
in production. For testing purposes though, it is quite useful as
it forces IWD to roam to a provided BSS and bypasses IWD's roaming
and ranking logic for choosing a roam candidate.
To use this a BSSID is provided as the only parameter. If this
BSS is not in IWD's current scan results -EINVAL will be returned.
If IWD knows about the BSS it will attempt to roam to it whether
that is via FT, FT-over-DS, or Reassociation. These details are
still sorted out in IWDs station_transition_start() logic.
This will enable developer features to be used. Currently the
only user of this will be StationDiagnostics.Roam() method which
should only be exposed in this mode.
Expose the state directory/storage directory path on D-Bus because it
can't be known to clients until IWD runs, and client might need to
occasionally fiddle with the network config files. While there also
expose the IWD version string, similar to how some other D-Bus services
do.
Similar to 06aa84cca set the operstate when AdHoc is started and
stopped as it is no longer always set by netdev (only for station/p2p
interface types)
Previously resp was a simple array of bytes allocated on the stack.
This was changed to a dynamically allocated array, but the sizeof(resp)
argument to ap_build_beacon_pr_head() was never changed appropriately.
Fix this by introducing a new resp_len variable that holds the number of
bytes allocated for resp. Also, move the allocation after the basic
sanity checks have been performed to avoid allocating/freeing memory
unnecessarily.
Fixes: 18a63f91fd ("ap: Write extra frame IEs from the user")
Commit 1fe5070 added a workaround for drivers which may send the
connect event prior to the connect callback/ack. This caused IWD
to fail to start eapol if reassociation was used due to
netdev_reassociate never setting netdev->connected = false.
netdev_reassociate uses the same code path as normal connections,
but when the connect callback came in connected was already set
to true which then prevents eapol from being registered. Then,
once the connect event comes in, there is no frame watch for
eapol and IWD doesn't respond to any handshake frames.
Prior to this, an error sending the FT Reassociation was treated
as fatal, which is correct for FT-over-Air but not for FT-over-DS.
If the actual l_genl_family_send call fails for FT-over-DS the
existing connection can be maintained and there is no need to
call netdev_connect_failed.
Adding a return to the tx_associate function works for both FT
types. In the FT-over-Air case this return will ultimately get
sent back up to auth_proto_rx_authenticate in which case will
call netdev_connect_failed. For FT-over-DS tx_associate is
actually called from the 'start' operation which can fail and
still maintain the existing connection.
FT-over-DS was refactored to separate the FT action frame and
reassociation. From stations standpoint IWD needs to call
netdev_fast_transition_over_ds_action prior to actually roaming.
For now these two stages are being combined and the action
roam happens immediately after the action response callback.
FT-over-DS followed the same pattern as FT-over-Air which worked,
but really limited how the protocol could be used. FT-over-DS is
unique in that we can authenticate to many APs by sending out
FT action frames and parsing the results. Once parsed IWD can
immediately Reassociate, or do so at a later time.
To take advantage of this IWD need to separate FT-over-DS into
two stages: action frame and reassociation.
The initial action frame stage is started by netdev. The target
BSS is sent an FT action frame and a new cache entry is created
in ft.c. Once the response is received the entry is updated
with all the needed data to Reassociate. To limit the record
keeping on netdev each FT-over-DS entry holds a userdata pointer
so netdev doesn't need to maintain its own list of data for
callbacks.
Once the action response is parsed netdev will call back signalling
the action frame sequence was completed (either successfully or not).
At this point the 'normal' FT procedure can start using the
FT-over-DS auth-proto.
FT-over-DS is being separated into two independent stages. The
first of which is the processing of the action frame response.
This new class will hold all the parsed information from the action
frame and allowing it to be retrieved at a later time when IWD
needs to roam.
Initial info class should be created when the action frame is
being sent out. Once a response is received it can be parsed
with ft_over_ds_parse_action_response. This verifies the frame
and updates the ft_ds_info class with the parsed data.
ft_over_ds_prepare_handshake is the final step prior to
Reassociation. This sets all the stored IEs, anonce, and KH IDs
into the handshake and derives the new PTK.
This adds the RSNE verification to ft_parse_ies which will
be common between over-Air and over-DS. The MDE check was
also factored out into its own minimal function as to
retain the spec comment but allow reuse elsewhere.
Prior to this the diagnostic interface was taken down when station
transitioned to DISCONNECTED. This worked but once station is in
a DISCONNECTING state it then calls netdev_disconnect(). Trying to
get any diagnostic data during this time may not work as its
unknown what state exactly the kernel is in. To be safe take the
interface down when station is DISCONNECTING.
The building of the FT IEs for Action/Authenticate
frames will need to be shared between ft and netdev
once FT-over-DS is refactored.
The building was refactored to work off the callers
buffer rather than internal stack buffers. An argument
'new_snonce' was included as FT-over-DS will generate
a new snonce for the initial action frame, hence the
handshakes snonce cannot be used.
Break up the rather large code block which parses out IEs,
verifies, and sets into the handshake. FT-over-DS needs these
steps broken up in order to parse the action frame response
without modifying the handshake.
Under very rare circumstances the roaming scan triggered might not be
canceled properly. This is because we issue the roam scan recursively
from within a scan callback and re-use the id of the scan for the
subsequent request. The destroy callback is invoked right after the
callback and resets the id. This leads to the scan not being canceled
properly in roam_state_clear().
src/netdev.c:netdev_mlme_notify() MLME notification Notify CQM(64)
src/station.c:station_roam_trigger_cb() 37
src/station.c:station_roam_scan() ifindex: 37
src/station.c:station_roam_trigger_cb() Using cached neighbor report for roam
...
src/scan.c:get_scan_done() get_scan_done
src/station.c:station_roam_failed() 37
src/station.c:station_roam_scan() ifindex: 37
src/scan.c:scan_request_triggered() Active scan triggered for wdev 22
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan0[37]
src/device.c:device_free()
src/station.c:station_free()
...
Removing scan context for wdev 22
src/scan.c:scan_context_free() sc: 0x4a362a0
src/wiphy.c:wiphy_radio_work_done() Work item 14 done
==19542== Invalid write of size 4
==19542== at 0x411500: station_roam_scan_destroy (station.c:2010)
==19542== by 0x420B5B: scan_request_free (scan.c:156)
==19542== by 0x410BAC: destroy_work (wiphy.c:294)
==19542== by 0x410BAC: wiphy_radio_work_done (wiphy.c:1613)
==19542== by 0x46C66E: l_queue_clear (queue.c:107)
==19542== by 0x46C6B8: l_queue_destroy (queue.c:82)
==19542== by 0x420BAE: scan_context_free (scan.c:205)
==19542== by 0x424135: scan_wdev_remove (scan.c:2272)
==19542== by 0x408754: netdev_free (netdev.c:847)
==19542== by 0x40E18C: netdev_shutdown (netdev.c:5773)
==19542== by 0x404756: iwd_shutdown (main.c:78)
==19542== by 0x404756: iwd_shutdown (main.c:65)
==19542== by 0x470E21: handle_callback (signal.c:78)
==19542== by 0x470E21: signalfd_read_cb (signal.c:104)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== Address 0x4d81f98 is 200 bytes inside a block of size 288 free'd
==19542== at 0x48399CB: free (vg_replace_malloc.c:538)
==19542== by 0x47F3E5: interface_instance_free (dbus-service.c:510)
==19542== by 0x481DEA: _dbus_object_tree_remove_interface (dbus-service.c:1694)
==19542== by 0x481F1C: _dbus_object_tree_object_destroy (dbus-service.c:795)
==19542== by 0x40894F: netdev_free (netdev.c:844)
==19542== by 0x40E18C: netdev_shutdown (netdev.c:5773)
==19542== by 0x404756: iwd_shutdown (main.c:78)
==19542== by 0x404756: iwd_shutdown (main.c:65)
==19542== by 0x470E21: handle_callback (signal.c:78)
==19542== by 0x470E21: signalfd_read_cb (signal.c:104)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== by 0x47088C: l_main_iterate (main.c:478)
==19542== by 0x47095B: l_main_run (main.c:525)
==19542== by 0x47095B: l_main_run (main.c:507)
==19542== by 0x470B6B: l_main_run_with_signal (main.c:647)
==19542== Block was alloc'd at
==19542== at 0x483879F: malloc (vg_replace_malloc.c:307)
==19542== by 0x46AB2D: l_malloc (util.c:62)
==19542== by 0x416599: station_create (station.c:3448)
==19542== by 0x406D55: netdev_newlink_notify (netdev.c:5324)
==19542== by 0x46D4BC: l_hashmap_foreach (hashmap.c:612)
==19542== by 0x472F46: process_broadcast (netlink.c:158)
==19542== by 0x472F46: can_read_data (netlink.c:279)
==19542== by 0x47166B: io_callback (io.c:120)
==19542== by 0x47088C: l_main_iterate (main.c:478)
==19542== by 0x47095B: l_main_run (main.c:525)
==19542== by 0x47095B: l_main_run (main.c:507)
==19542== by 0x470B6B: l_main_run_with_signal (main.c:647)
==19542== by 0x403EDB: main (main.c:490)
==19542==
Prior to this netdev_connect_ok set setting this which really
only applies to station mode. In addition this happens for each
new station that connects to the AP. Instead set the operstate /
link mode when AP starts and stops.
Change ap_start to load all of the AP configuration from a struct
l_settings, moving the 6 or so parameters from struct ap_config members
to the l_settings groups and keys. This extends the ap profile concept
used for the DHCP settings. ap_start callers create the l_settings
object and fill the values in it or read the settings in from a file.
Since ap_setup_dhcp and ap_load_profile_and_dhcp no longer do the
settings file loading, they needed to be refactored and some issues were
fixed in their logic, e.g. l_dhcp_server_set_ip_address() was never
called when the "IP pool" was used. Also the IP pool was previously only
used if the ap->config->profile was NULL and this didn't match what the
docs said:
"If [IPv4].Address is not provided and no IP address is set on the
interface prior to calling StartProfile the IP pool will be used."
The info struct is on the stack which leads to the potential
for uninitialized data access. Zero out the info struct prior
to calling the get station callback:
==141137== Conditional jump or move depends on uninitialised value(s)
==141137== at 0x458A6F: diagnostic_info_to_dict (diagnostic.c:109)
==141137== by 0x41200B: station_get_diagnostic_cb (station.c:3620)
==141137== by 0x405BE1: netdev_get_station_cb (netdev.c:4783)
==141137== by 0x4722F9: process_unicast (genl.c:994)
==141137== by 0x4722F9: received_data (genl.c:1102)
==141137== by 0x46F28B: io_callback (io.c:120)
==141137== by 0x46E5AC: l_main_iterate (main.c:478)
==141137== by 0x46E65B: l_main_run (main.c:525)
==141137== by 0x46E65B: l_main_run (main.c:507)
==141137== by 0x46E86B: l_main_run_with_signal (main.c:647)
==141137== by 0x403EA8: main (main.c:490)
It isn't safe to return a NULL from diagnostic_akm_suite_to_security()
since the value is used directly. Also, if the AKM suite is 0, this
implies that the network is an Open network and not some unknown AKM.
==17982== Invalid read of size 1
==17982== at 0x483BC92: strlen (vg_replace_strmem.c:459)
==17982== by 0x47DE60: _dbus1_builder_append_basic (dbus-util.c:981)
==17982== by 0x41ACB2: dbus_append_dict_basic (dbus.c:197)
==17982== by 0x412050: station_get_diagnostic_cb (station.c:3614)
==17982== by 0x405B19: netdev_get_station_cb (netdev.c:4801)
==17982== by 0x47436E: process_unicast (genl.c:994)
==17982== by 0x47436E: received_data (genl.c:1102)
==17982== by 0x470FBB: io_callback (io.c:120)
==17982== by 0x4701DC: l_main_iterate (main.c:478)
==17982== by 0x4702AB: l_main_run (main.c:525)
==17982== by 0x4702AB: l_main_run (main.c:507)
==17982== by 0x4704BB: l_main_run_with_signal (main.c:647)
==17982== by 0x403EDB: main (main.c:490)
==17982== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17982==
Aborting (signal 11) [/home/denkenz/iwd/src/iwd]
++++++++ backtrace ++++++++
0 0x488a550 in /lib64/libc.so.6
1 0x483bc92 in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so
2 0x47de61 in _dbus1_builder_append_basic() at ell/dbus-util.c:983
3 0x41acb3 in dbus_append_dict_basic() at src/dbus.c:197
4 0x412051 in station_get_diagnostic_cb() at src/station.c:3618
5 0x405b1a in netdev_get_station_cb() at src/netdev.c:4801
It is possible for the RTNL command callback to come after
netconfig_reset or netconfig_destroy has been called. Make sure that
any outstanding commands that might access the netconfig object are
canceled.
src/netconfig.c:netconfig_ipv4_dhcp_event_handler() DHCPv4 event 0
src/netconfig.c:netconfig_ifaddr_added() wlan0: ifaddr 192.168.1.55/24 broadcast 192.168.1.255
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan0[15]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
src/netconfig.c:netconfig_reset()
src/netconfig.c:netconfig_reset_v4() 16
src/netconfig.c:netconfig_reset_v4() Stopping client
Removing scan context for wdev c
src/scan.c:scan_context_free() sc: 0x4a3cc10
==12792== Invalid read of size 8
==12792== at 0x43BF5A: netconfig_route_add_cmd_cb (netconfig.c:600)
==12792== by 0x4727FA: process_message (netlink.c:181)
==12792== by 0x4727FA: can_read_data (netlink.c:289)
==12792== by 0x470F4B: io_callback (io.c:120)
==12792== by 0x47016C: l_main_iterate (main.c:478)
==12792== by 0x47023B: l_main_run (main.c:525)
==12792== by 0x47023B: l_main_run (main.c:507)
==12792== by 0x47044B: l_main_run_with_signal (main.c:647)
==12792== by 0x403EDB: main (main.c:490)
In case the netdev is brought down while we're trying to connect, try to
detect this and fail early instead of trying to send additional
commands.
src/station.c:station_enter_state() Old State: disconnected, new state: connecting
src/station.c:station_netdev_event() Associating
src/netdev.c:netdev_mlme_notify() MLME notification Connect(46)
src/netdev.c:netdev_connect_event()
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/eapol.c:eapol_handle_ptk_1_of_4() ifindex=4
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/eapol.c:eapol_handle_ptk_3_of_4() ifindex=4
src/netdev.c:netdev_set_gtk() 4
src/station.c:station_handshake_event() Setting keys
src/netdev.c:netdev_set_tk() 4
src/netdev.c:netdev_set_rekey_offload() 4
New Key for Group Key failed for ifindex: 4:Network is down
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/station.c:station_free()
src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
src/netdev.c:netdev_disconnect_event()
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
src/netdev.c:netdev_link_notify() event 16 on ifindex 4
src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is DE
src/wiphy.c:wiphy_radio_work_done() Work item 14 done
src/station.c:station_connect_cb() 4, result: 4
Segmentation fault
A prior commit refactored the AKM selection in wiphy.c. This
ended up breaking FILS tests due to the hard coding of a
false fils_hint in wiphy_select_akm. Since our FILS tests
only advertise FILS AKMs wiphy_can_connect would return false
for these networks.
Similar to wiphy_select_akm, add a fils hint parameter to
wiphy_can_connect and pass that down directly to wiphy_select_akm.
If PreSharedKey is set, the current logic does not validate the
Passphrase beyond its existence. This can lead to strange situations
where an invalid WPA3-PSK passphrase might get used. This can of course
only happen if the user (as root) or NetworkManager-iwd-backend writes
such a file incorrectly.
Move the WSC Primary Device Type parsing from p2p.c and eap-wsc.c to a
common function in wscutil.c supporting both formats so that it can be
used in ap.c too.
Logically this frame watch belongs in station. It was kept in device.c
for the purported reason that the station object was removed with
ifdown/ifup changes and hence the frame watch might need to be removed
and re-added unnecessarily. Since the kernel does not actually allow to
unregister a frame watch (only when the netdev is removed or its iftype
changes), re-adding a frame watch might trigger a -EALREADY or similar
error.
Avoid this by registering the frame watch when a new netdev is detected
in STATION mode, or when the interface type changes to STATION.
If a netdev iftype is changed, all frame registrations are removed.
Make sure to re-register for the appropriate frame notifications in case
our iftype is switched back to 'station'. In any other iftype, no frame
watches are registered and rrm_state object is effectively dormant.
Right now, RRM is created when a new netdev is detected and its iftype
is of type station. That means that any devices that start their life
as any other iftype cannot be changed to a station and have RRM function
properly. Fix that by always creating the RRM state regardless of the
initial iftype.
In the case that a netdev is powered down, or an interface type change
occurs, the station object will be removed and any watches will be
freed.
Since rrm is created when the netdev is created and persists across
iftype and power up/down changes, it should provide a destroy callback
to station_add_state_watch so that it can be notified when the watch is
removed.
If the iftype changes, kernel silently wipes out any frame registrations
we may have registered. Right now, frame registrations are only done when
the interface is created. This can result in frame watches not being
added if the interface type is changed between station mode to ap mode
and then back to station mode, e.g.:
device wlan0 set-property Mode ap
device wlan0 set-property Mode station
Make sure to re-add frame registrations according to the mode if the
interface type is changed.
Since netdev now keeps track of iftype changes, let it call
frame_watch_wdev_remove on netdevs that it manages to clear frame
registrations that should be cleared due to an iftype change.
Note that P2P_DEVICE wdevs are not managed by any netdev object, but
since their iftype cannot be changed, they should not be affected
by this change.
And set the interface type based on the event rather than the command
callback. This allows us to track interface type changes even if they
come from outside iwd (which shouldn't happen.)
The prepare_ft patch was an intermediate to a full patch
set and was not fully tested stand alone. Its placement
actually broke FT due to handshake->aa getting overwritten
prior to netdev->prev_bssid being copied out. This caused
FT to fail with "transport endpoint not connected (-107)"
- Make sure to print the cookie information
- Don't print messages for frames we're not interested in. This is
particularly helpful when running auto-tests since frame acks from
hostapd pollute the iwd log.
Fix a regression where connection to an open network results in an
NotSupported error being returned.
Fixes: d79e883e93 ("netdev: Introduce connection types")
This makes conversions simpler. Also fixes a bug where P2P devices were
printed with an incorrect Mode value since dbus_iftype_to_string was
assuming that an iftype as defined in nl80211.h was being passed in,
while netdev was returning an enum value defined in netdev.h.
It was seen that some full mac cards/drivers do not include any
rate information with the NEW_STATION event. This was causing
the NEW_STATION event to be ignored, preventing AP mode from
working on these cards.
Since the full mac path does not even require sta->rates the
parsing can be removed completely.
It was found that if the user cancels/disconnects the agent prior to
entering credentials, IWD would get stuck and could no longer accept
any connect calls with the error "Operation already in progress".
For example exiting iwctl in the Password prompt would cause this:
iwctl
$ station wlan0 connect myssid
$ Password: <Ctrl-C>
This was due to the agent never calling the network callback in the
case of an agent disconnect. Network would wait indefinitely for the
credentials, and disallow any future connect attempts.
To fix this agent_finalize_pending can be called in agent_disconnect
with a NULL reply which behaves the same as if there was an
internal timeout and ultimately allows network to fail the connection
The 8021x offloading procedure still does EAP in userspace which
negotiates the PMK. The kernel then expects to obtain this PMK
from userspace by calling SET_PMK. This then allows the firmware
to begin the 4-way handshake.
Using __eapol_install_set_pmk_func to install netdev_set_pmk,
netdev now gets called into once EAP finishes and can begin
the final userspace actions prior to the firmware starting
the 4-way handshake:
- SET_PMK using PMK negotiated with EAP
- Emit SETTING_KEYS event
- netdev_connect_ok
One thing to note is that the kernel provides no way of knowing if
the 4-way handshake completed. Assuming SET_PMK/SET_STATION come
back with no errors, IWD assumes the PMK was valid. If not, or
due to some other issue in the 4-way, the kernel will send a
disconnect.
This adds a new type for 8021x offload as well as support in
building CMD_CONNECT.
As described in the comment, 8021x offloading is not particularly
similar to PSK as far as the code flow in IWD is concerned. There
still needs to be an eapol_sm due to EAP being done in userspace.
This throws somewhat of a wrench into our 'is_offload' cases. And
as such this connection type is handled specially.
802.1x offloading needs a way to call SET_PMK after EAP finishes.
In the same manner as set_tk/gtk/igtk a new 'install_pmk' function
was added which eapol can call into after EAP completes.
The chances were extremely low, but using l_idle_oneshot
could end up causing a invalid memory access if the netdev
went down while waiting for the disconnect idle callback.
Instead netdev can keep track of the idle with l_idle_create
and remove it if the netdev goes down prior to the idle callback.
This fixes an infinite loop issue when authenticate frames time
out. If the AP is not responding IWD ends up retrying indefinitely
due to how SAE was handling this timeout. Inside sae_auth_timeout
it was actually sending another authenticate frame to reject
the SAE handshake. This, again, resulted in a timeout which called
the SAE timeout handler and repeated indefinitely.
The kernel resend behavior was not taken into account when writing
the SAE timeout behavior and in practice there is actually no need
for SAE to do much of anything in response to a timeout. The
kernel automatically resends Authenticate frames 3 times which mirrors
IWDs SAE behavior anyways. Because of this the authenticate timeout
handler can be completely removed, which will cause the connection
to fail in the case of an autentication timeout.
This crash was caused from the disconnect_cb being called
immediately in cases where send_disconnect was false. The
previous patch actually addressed this separately as this
flag was being set improperly which will, indirectly, fix
one of the two code paths that could cause this crash.
Still, there is a situation where send_disconnect could
be false and in this case IWD would still crash. If IWD
is waiting to queue the connect item and netdev_disconnect
is called it would result in the callback being called
immediately. Instead we can add an l_idle as to allow the
callback to happen out of scope, which is what station
expects.
Prior to this patch, the crashing behavior can be tested using
the following script (or some variant of it, your system timing
may not be the same as mine).
iwctl station wlan0 disconnect
iwctl station wlan0 connect <network1> &
sleep 0.02
iwctl station wlan0 connect <network2>
++++++++ backtrace ++++++++
0 0x7f4e1504e530 in /lib64/libc.so.6
1 0x432b54 in network_get_security() at src/network.c:253
2 0x416e92 in station_handshake_setup() at src/station.c:937
3 0x41a505 in __station_connect_network() at src/station.c:2551
4 0x41a683 in station_disconnect_onconnect_cb() at src/station.c:2581
5 0x40b4ae in netdev_disconnect() at src/netdev.c:3142
6 0x41a719 in station_disconnect_onconnect() at src/station.c:2603
7 0x41a89d in station_connect_network() at src/station.c:2652
8 0x433f1d in network_connect_psk() at src/network.c:886
9 0x43483a in network_connect() at src/network.c:1183
10 0x4add11 in _dbus_object_tree_dispatch() at ell/dbus-service.c:1802
11 0x49ff54 in message_read_handler() at ell/dbus.c:285
12 0x496d2f in io_callback() at ell/io.c:120
13 0x495894 in l_main_iterate() at ell/main.c:478
14 0x49599b in l_main_run() at ell/main.c:521
15 0x495cb3 in l_main_run_with_signal() at ell/main.c:647
16 0x404add in main() at src/main.c:490
17 0x7f4e15038b25 in /lib64/libc.so.6
The send_disconnect flag was being improperly set based only
on connect_cmd_id being zero. This does not take into account
the case of CMD_CONNECT having finished but not EAPoL. In this
case we do need to send a disconnect.
This adds a new connection type, TYPE_PSK_OFFLOAD, which
allows the 4-way handshake to be offloaded by the firmware.
Offloading will be used if the driver advertises support.
The CMD_ROAM event path was also modified to take into account
handshake offloading. If the handshake is offloaded we still
must issue GET_SCAN, but not start eapol since the firmware
takes care of this.
Until now FT was only supported via Auth/Assoc commands which barred
any fullmac cards from using FT AKMs. With PSK offload support these
cards can do FT but only when offloading is used.
In the FW scan callback eapol was being stared unconditionally which
isn't correct as roaming on open networks is possible. Instead check
that a SM exists just like is done in netdev_connect_event.
This should have been updated along with the connect and roam
event separation. Since netdev_connect_event is not being
re-used for CMD_ROAM the comment did not make sense anymore.
Still, there needs to be a check to ensure we were not disconnected
while waiting for GET_SCAN to come back.
netdev_connect_event was being reused for parsing of CMD_ROAM
attributes which made some amount of sense since these events
are nearly identical, but due to the nature of firmware roaming
there really isn't much IWD needs to parse from CMD_ROAM. In
addition netdev_connect_event was getting rather complicated
since it had to handle both CMD_ROAM and CMD_CONNECT.
The only bits of information IWD needs to parse from CMD_ROAM
is the roamed BSSID, authenticator IEs, and supplicant IEs. Since
this is so limited it now makes little sense to reuse the entire
netdev_connect_event function, and intead only parse what is
needed for CMD_ROAM.
station should be isolated as much as possible from the details of the
driver type and how a particular AKM is handled under the hood. It will
be up to wiphy to pick the best AKM for a given bss. netdev in turn
will pick how to drive the particular AKM that was picked.
Currently netdev handles SoftMac and FullMac drivers mostly in the same
way, by building CMD_CONNECT nl80211 commands and letting the kernel
figure out the details. Exceptions to this are FILS/OWE/SAE AKMs which
are only supported on SoftMac drivers by using
CMD_AUTHENTICATE/CMD_ASSOCIATE.
Recently, basic support for SAE (WPA3-Personal) offload on FullMac cards
was introduced. When offloaded, the control flow is very different than
under typical conditions and required additional logic checks in several
places. The logic is now becoming quite complex.
Introduce a concept of a connection type in order to make it clearer
what driver and driver features are being used for this connection. In
the future, connection types can be expanded with 802.1X handshake
offload, PSK handshake offload and CMD_EXTERNAL_AUTH based SAE
connections.
Commit 6e8b76527 added a switch statement for AKM suites which
was not correct as this is a bitmask and may contain multiple
values. Intead we can rely on wiphy_select_akm which is a more
robust check anyways.
Fixes: 6e8b765278 ("wiphy: add check for CMD_AUTH/CMD_ASSOC support")
If there is an associate timeout, retry a few times in case
it was just a fluke. At this point SAE is fully negotiated
so it makes sense to attempt to save the connection.
Any auth proto which did not implement the assoc_timeout handler
could end up getting 'stuck' forever if there was an associate
timeout. This is because in the event of an associate timeout IWD
only sets a few flags and relies on the connect event to actually
handle the failure. The problem is a connect event never comes
if the failure was a timeout.
To fix this we can explicitly fail the connection if the auth
proto has not implemented assoc_timeout or if it returns false.
In the same vein as requesting a neighbor report after
connecting for the first time, it should also be done
after a roam to obtain the latest neighbor information.
Converts ie_rsn_akm_suite values (and WPA1 hint) into a more
human readable security string such as:
WPA2-Personal, WPA3-Personal, WPA2-Personal + FT etc.
When we cancel a quick scan that has already been triggered, the
Scanning property is never reset to false. This doesn't fully reflect
the actual scanning state of the hardware since we don't (yet) abort
the scan, but at least corrects the public API behavior.
{Network} [/net/connman/iwd/0/7/73706733_psk] Connected = False
{Station} [/net/connman/iwd/0/7] Scanning = True
{Station} [/net/connman/iwd/0/7] State = connecting
{Station} [/net/connman/iwd/0/7] ConnectedNetwork =
/net/connman/iwd/0/7/73706733_psk
{Network} [/net/connman/iwd/0/7/73706733_psk] Connected = True
If IWD is connecting to a SAE/WPA3 BSS and Auth/Assoc commands
are not supported the only option is SAE offload. At this point
network_connect should have verified that the extended feature
for SAE offload exists so we can simply enable offload if these
commands are not supported.
SAE offload support requires some minor tweaks to CMD_CONNECT
as well as special checks once the connect event comes in. Since
at this point we are fully connected.
After adding network_bss_update, network now has a match_addr
queue function which can be used to replace an unneeded
l_queue_get_entries loop with l_queue_find.
This will swap out a scan_bss object with a duplicate that may
exist in a networks bss_list. The duplicate will be removed by
since the object is owned by station it is assumed that it will
be freed elsewhere.
If the hardware roams automatically we want to be sure to not
react to CQM events and attempt to roam/disconnect on our own.
Note: this is only important for very new kernels where CQM
events were recently added to brcmfmac.
Roaming on a full mac card is quite different than soft mac
and needs to be specially handled. The process starts with
the CMD_ROAM event, which tells us the driver is already
roamed and associated with a new AP. After this it expects
the 4-way handshake to be initiated. This in itself is quite
simple, the complexity comes with how this is piped into IWD.
After CMD_ROAM fires its assumed that a scan result is
available in the kernel, which is obtained using a newly
added scan API scan_get_firmware_scan. The only special
bit of this is that it does not 'schedule' a scan but simply
calls GET_SCAN. This is treated special and will not be
queued behind any other pending scan requests. This lets us
reuse some parsing code paths in scan and initialize a
scan_bss object which ultimately gets handed to station so
it can update connected_bss/bss_list.
For consistency station must also transition to a roaming state.
Since this roam is all handled by netdev two new events were
added, NETDEV_EVENT_ROAMING and NETDEV_EVENT_ROAMED. Both allow
station to transition between roaming/connected states, and ROAMED
provides station with the new scan_bss to replace connected_bss.
Adds support for getting firmware scan results from the kernel.
This is intended to be used after the firmware roamed automatically
and the scan result is require for handshake initialization.
The scan 'request' is competely separate from the normal scan
queue, though scan_results, scan_request, and the scan_context
are all used for consistency and code reuse.
Register P2P group's vendor IE writers using the new API to build and
attach the necessary P2P IE and WFD IEs to the (Re)Association Response,
Probe Response and Beacon frames sent by the GO.
Roughly validate the IEs and save some information for use in our own
IEs. p2p_extract_wfd_properties and p2p_device_validate_conn_wfd are
being moved unchanged to be usable in p2p_group_event without forward
declarations and to be next to p2p_build_wfd_ie.
Make the WSC IE processing and writing more self-contained (i.e. so that
it can be more easily moved to a separate file if desired) by using the
new ap_write_extra_ies() mechanism.
Pass the string IEs from the incoming STA association frames to
the user in the AP event data. I drop
ap_event_station_added_data.rsn_ie because that probably wasn't
going to ever be useful and the RSN IE is included in the .assoc_ies
array in any case.
Since GET_STATION (and in turn GetDiagnostics) gets the most
current station info this attribute serves as a better indication
of the current signal strength. In addition full mac cards don't
appear to always have the average attribute.
No instances of this macro now exist. If future instances crop up, the
better approach would be to use pragma directives to quiet such warnings
and allow static analysis to catch any issues.
Expanded packets with a 0 vendor id need to be treated just like
non-expanded ones. This led to very nasty looking if statements
throughout this function. Fix that by introducing a nested function
to take care of the response type normalization. This also allows us to
drop uninitialized_var usage.
Expanded Nak packet contains (possibly multiple) 8 byte chunks that
contain the type (1 byte, always '254') vendor-id (3 bytes) and
vendor-type (4) bytes.
Unfortunately the current logic was reading the vendor-id at the wrong
offset (0 instead of 1) and so the extracted vendor-type was incorrect.
Fixes: 17c569ba4c ("eap: Add authenticator method logic and API")
If we received a Nak or an Expanded Nak packet, the intent was to print
our own method type. Instead we tried to print the Nak type contents.
Fix that by always passing in our method info to eap_type_to_str.
Fixes: 17c569ba4c ("eap: Add authenticator method logic and API")
The '__' prefix is meant for private, semi-private,
inner implementation or otherwise special APIs that
are typically exposed in a header. In the case of watchlist, these
functions were static and do not fit the above description. Remove the
__ prefix accordingly.
When using iwd.conf:[General].EnableNetworkConfiguration=true, it is not
possible to configure systemd.network:[Network].MulticastDNS= as
systemd-networkd considers the link to be unmanaged. This patch allows
iwd to configure that setting on systemd-resolved directly.
If the extended feature for CQM levels was not supported no CQM
registration would happen, not even for a single level. This
caused IWD to completely lose the ability to roam since it would
only get notified when the kernel was disconnecting, around -90
dBm, not giving IWD enough time to roam.
Instead if the extended feature is not supported we can still
register for the event, just without multiple signal levels.
There is no functional change here but checking the return
value makes static analysis much happier. Checking the
return and setting the default inside the if clause is also
consistent with how IWD does it many other places.
Handle situations where the BSS we're trying to connect to is no longer
in the kernel scan result cache. Normally, the kernel will re-scan the
target frequency if this happens on the CMD_CONNECT path, and retry the
connection.
Unfortunately, CMD_AUTHENTICATE path used for WPA3, OWE and FILS does
not have this scanning behavior. CMD_AUTHENTICATE simply fails with
a -ENOENT error. Work around this by trying a limited scan of the
target frequency and re-trying CMD_AUTHENTICATE once.
An earlier patch fixed a problem where a queued quick scan would
be triggered and fail once already connected, resulting in a state
transition from connected --> autoconnect_full. This fixed the
Connect() path but this could also happen via autoconnect. Starting
from a connected state, the sequence goes:
- DBus scan is triggered
- AP disconnects IWD
- State transition from disconnected --> autoconnect_quick
- Queue quick scan
- DBus scan results come in and used to autoconnect
- A connect work item is inserted ahead of all others, transition
from autoconnect_quick --> connecting.
- Connect completes, transition from connecting --> connected
- Quick scan can finally get triggered, which the kernel fails to
do since IWD is connected, transition from connected -->
autoconnect_full.
This can be fixed by checking for a pending quick scan in the
autoconnect path.
Commit eac2410c83 ("station: Take scanned frequencies into account")
has made it unnecessary to explicitly invoke station_set_scan_results
with the expire to true in case a dbus scan finished prematurely or a
subset was not able to be started. Remove this no-longer needed logic.
Fixes: eac2410c83 ("station: Take scanned frequencies into account")
The diagnostic interface returns an error anyways if station is
not connected so it makes more sense to only bring the interface
up when its actually usable. This also removes the interface
when station disconnects, which was never done before (the
interface stayed up indefinitely due to a forgotten remove call).
When we're auto-connecting and have hidden networks configured, use
active scans regardless of whether we see any hidden BSSes in our
existing scan results.
This allows us to more effectively see/connect to hidden networks
when first powering up or after suspend.
Kernel might report hidden BSSes that are reported from beacon frames
separately than ones reported due to probe responses. This may confuse
the station network collation logic since the scan_bss generated by the
probe response might be removed erroneously when processing the scan_bss
that was generated due to a beacon.
Make sure that bss_match also takes the SSID into account and only
matches scan_bss structures that have the same BSSID and SSID contents.
Instead of manually managing whether to expire BSSes or not, use the
scanned frequency set instead. This makes the API slightly easier to
understand (dropping two boolean arguments in a row) and also a bit more
future-proof.
Commit d372d59bea checks whether a hidden network had a previous
connection attempt and re-tries. However, it inadvertently dropped
handling of a condition where a non-hidden network SSID is provided to
ConnectHiddenNetwork. Fix that.
Fixes: d372d59bea ("station: Allow ConnectHiddenNetwork to be retried")
The diagnostic interface serves no purpose until the AP has
been started. Any calls on it will return an error so instead
it makes more sense to bring it up when the AP is started, and
down when the AP is stopped.
Its useful being able to refer to the network Name/SSID once
an AP is started. For example opening an iwctl session with an
already started AP provides no way of obtaining the SSID.
In some cases the AP can send a deauthenticate frame right after
accepting our authentication. In this case the kernel never properly
sends a CMD_CONNECT event with a failure, even though CMD_COONNECT was
used to initiate the connection. Try to work around that by detecting
that a Deauthenticate event arrives prior to any Associte or Connect
events and handle this case as a connect failure.
Now that ConnectHiddenNetwork can be invoked while we're connected, set
the mac randomization hint parameter properly. The kernel will reject
requests if randomization is enabled while we're connected to a network.
If we forget a hidden network, then make sure to remove it from the
network list completely. Otherwise it would be possible to still
issue a Network.Connect to that particular object, but the fact that the
network is hidden would be lost.
==17639== 72 (16 direct, 56 indirect) bytes in 1 blocks are definitely
lost in loss record 3 of 3
==17639== at 0x4C2F0CF: malloc (vg_replace_malloc.c:299)
==17639== by 0x4670AD: l_malloc (util.c:61)
==17639== by 0x4215AA: scan_freq_set_new (scan.c:1906)
==17639== by 0x412A9C: parse_neighbor_report (station.c:1910)
==17639== by 0x407335: netdev_neighbor_report_frame_event
(netdev.c:3522)
==17639== by 0x44BBE6: frame_watch_unicast_notify (frame-xchg.c:233)
==17639== by 0x470C04: dispatch_unicast_watches (genl.c:961)
==17639== by 0x470C04: process_unicast (genl.c:980)
==17639== by 0x470C04: received_data (genl.c:1101)
==17639== by 0x46D9DB: io_callback (io.c:118)
==17639== by 0x46CC0C: l_main_iterate (main.c:477)
==17639== by 0x46CCDB: l_main_run (main.c:524)
==17639== by 0x46CF01: l_main_run_with_signal (main.c:656)
==17639== by 0x403EDE: main (main.c:490)
In the case that ConnectHiddenNetwork scans successfully, but fails for
some other reason, the network object is left in the scan results until
it expires. This will prevent subsequent attempts to use
ConnectHiddenNetwork with a .NotHidden error. Fix that by checking
whether a found network is hidden, and if so, allow the request to
proceed.
Rework the logic slightly so that this function returns an error message
on error and NULL on success, just like other D-Bus method
implementations. This also simplifies the code slightly.
We used to not allow to connect to a different network while already
connected. One had to disconnect first. This also applied to
ConnectHiddenNetwork calls.
This restriction can be dropped now. station will intelligently
disconnect from the current AP when a station_connect_network() is
issued.
If the disconnect fails and station_disconnect_onconnect_cb is called
with an error, we reply to the original message accordingly.
Unfortunately pending_connect is not unrefed or cleared in this case.
Fix that.
Fixes: d0ee923dda ("station: Disconnect, if needed, on a new connection attempt")
An invalid known_network.freq file containing several UUID
groups which have the same 'name' key results in memory leaks
in IWD. This is because the file is loaded and the group's
are iterated without detecting duplicates. This leads to the
same network_info's known_frequencies being set/overridden
multiple times.
To fix this we just check if the network_info already has a
UUID set. If so remove the stale entry.
There may be other old, invalid, or stale entries from previous
versions of IWD, or a user misconfiguring the file. These will
now also be removed during load.
netdev_shutdown calls queue_destroy on the netdev_list, which in turn
calls netdev_free. netdev_free invokes the watches to notify them about
the netdev being removed. Those clients, or anything downstream can
still invoke netdev_find. Unfortunately queue_destroy is not re-entrant
safe, so netdev_find might return stale data. Fix that by using
l_queue_peek_head / l_queue_pop_head instead.
src/station.c:station_enter_state() Old State: connecting, new state:
connected
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan1[6]
src/device.c:device_free()
Removing scan context for wdev 100000001
src/scan.c:scan_context_free() sc: 0x4ae9ca0
src/netdev.c:netdev_free() Freeing netdev wlan0[48]
src/device.c:device_free()
src/station.c:station_free()
src/netconfig.c:netconfig_destroy()
==103174== Invalid read of size 8
==103174== at 0x467AA9: l_queue_find (queue.c:346)
==103174== by 0x43ACFF: netconfig_reset (netconfig.c:1027)
==103174== by 0x43AFFC: netconfig_destroy (netconfig.c:1123)
==103174== by 0x414379: station_free (station.c:3369)
==103174== by 0x414379: station_destroy_interface (station.c:3466)
==103174== by 0x47C80C: interface_instance_free (dbus-service.c:510)
==103174== by 0x47C80C: _dbus_object_tree_remove_interface
(dbus-service.c:1694)
==103174== by 0x47C99C: _dbus_object_tree_object_destroy
(dbus-service.c:795)
==103174== by 0x409A87: netdev_free (netdev.c:770)
==103174== by 0x4677AE: l_queue_clear (queue.c:107)
==103174== by 0x4677F8: l_queue_destroy (queue.c:82)
==103174== by 0x40CDC1: netdev_shutdown (netdev.c:5089)
==103174== by 0x404736: iwd_shutdown (main.c:78)
==103174== by 0x404736: iwd_shutdown (main.c:65)
==103174== by 0x46BD61: handle_callback (signal.c:78)
==103174== by 0x46BD61: signalfd_read_cb (signal.c:104)
In the case of module_init failing due to a module that comes after
netdev, the netdev module doesn't clean up netdev_list properly.
==6254== 24 bytes in 1 blocks are still reachable in loss record 1 of 1
==6254== at 0x483777F: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==6254== by 0x4675ED: l_malloc (util.c:61)
==6254== by 0x46909D: l_queue_new (queue.c:63)
==6254== by 0x406AE4: netdev_init (netdev.c:5038)
==6254== by 0x44A7B3: iwd_modules_init (module.c:152)
==6254== by 0x404713: nl80211_appeared (main.c:171)
==6254== by 0x4713DE: process_unicast (genl.c:993)
==6254== by 0x4713DE: received_data (genl.c:1101)
==6254== by 0x46E00B: io_callback (io.c:118)
==6254== by 0x46D20C: l_main_iterate (main.c:477)
==6254== by 0x46D2DB: l_main_run (main.c:524)
==6254== by 0x46D2DB: l_main_run (main.c:506)
==6254== by 0x46D502: l_main_run_with_signal (main.c:656)
==6254== by 0x403EDB: main (main.c:490)
Rather than the previous hack which disabled group traffic it
was found that the GTK RSC could be manually set to zero which
allows group traffic. This appears to fix AP mode on brcmfmac
along with the previous fixes. This is not documented in
nl80211, but appears to work with this driver.
This is how a fullmac card tells userspace that a station has
left. This fixes the issue where the same client cannot re-connect
to the same AP multiple times. ap_new_station was renamed to
ap_handle_new_station for consistency.
Some fullmac cards were found to be buggy with getting the GTK
where it returns a BIP key for the GTK index, even after creating
a GTK with NEW_KEY explicitly. In an effort to get these cards
semi-working we can treat this just as a warning and continue with
the handshake without a GTK set which disables group traffic. A
warning is printed in this case so the user is not completely in
the dark.
Fix an issue with the recent changes to signal monitoring from commit
f456501b ("station: retry roaming unless notified of a high RSSI"):
1. driver sends NL80211_CQM_RSSI_THRESHOLD_EVENT_LOW
2. netdev->cur_rssi_low changes from FALSE to TRUE
3. netdev sends NETDEV_EVENT_RSSI_THRESHOLD_LOW to station
4. on roam reassociation, cur_rssi_low is reset to FALSE
5. station still assumes RSSI is low, periodically roams
until netdev sends NETDEV_EVENT_RSSI_THRESHOLD_HIGH
6. driver sends NL80211_CQM_RSSI_THRESHOLD_EVENT_HIGH
7. netdev->cur_rssi_low doesn't change (still FALSE)
8. netdev never sends NETDEV_EVENT_RSSI_THRESHOLD_HIGH
9. station remains stuck in an infinite roaming loop
The commit in question introduced the logic in (5). Previously the
assumption in station was - like in netdev - that if the signal was
still low, the driver would send a duplicate LOW event after
reassociation. This change makes netdev follow the same new logic as
station, i.e. assume the same signal state (LOW/HIGH) until told
otherwise by the driver.
Since fullmac cards handle auth/assoc in firmware IWD must
react differently while in AP mode just as it does in station.
For fullmac cards a NEW_STATION event is emitted post association
and from here the 4-way handshake can begin. In this NEW_STATION
handler a new sta_state is created and the needed members are
set in order to inject us back into the normal code execution
for softmac post association (i.e. creating group keys and
starting the 4-way handshake). From here everything works the
same as softmac.
At some point the non-interactive client tests began failing.
This was due to a bug in station where it would transition from
'connected' to 'autoconnect' due to a failed scan request. This
happened because a quick scan got scheduled during an ongoing
scan, then a Connect() gets issued. The work queue treats the
Connect as a priority so it delays the quick scan until after the
connection succeeds. This results in a failed quick scan which
IWD does not expect to happen when in a 'connected' state. This
failed scan actually triggers a state transition which then
gets IWD into a strange state where its connected from the
kernel point of view but does not think it is:
src/station.c:station_connect_cb() 13, result: 0
src/station.c:station_enter_state() Old State: connecting, new state: connected
src/wiphy.c:wiphy_radio_work_done() Work item 6 done
src/wiphy.c:wiphy_radio_work_next() Starting work item 5
src/station.c:station_quick_scan_triggered() Quick scan trigger failed: -95
src/station.c:station_enter_state() Old State: connected, new state: autoconnect_full
To fix this IWD should simply cancel any pending quick scans
if/when a Connect() call comes in.