When handling a scan finished event for a scan we haven't started check
that we were not halfway through a scan request that would have its
results flushed by the external scan.
FT-over-DS is a way to do a Fast BSS Transition using action frames for
the authenticate step. This allows a station to start a fast transition
to a target AP while still being connected to the original AP. This,
in theory, can result in less carrier downtime.
The existing ft_sm_new was removed, and two new constructors were added;
one for over-air, and another for over-ds. The internals of ft.c mostly
remain the same. A flag to distinguish between air/ds was added along
with a new parser to parse the action frames rather than authenticate
frames. The IE parsing is identical.
Netdev now just initializes the auth-proto differently depending on if
its doing over-air or over-ds. A new TX authenticate function was added
and used for over-ds. This will send out the IEs from ft.c with an
FT Request action frame.
The FT Response action frame is then recieved from the AP and fed into
the auth-proto state machine. After this point ft-over-ds behaves the
same as ft-over-air (associate to the target AP).
Some simple code was added in station.c to determine if over-air or
over-ds should be used. FT-over-DS can be beneficial in cases where the
AP is directing us to roam, or if the RSSI falls below a threshold.
It should not be used if we have lost communication to the AP all
(beacon lost) as it only works while we can still talk to the original
AP.
To support FT-over-DS this API needed some slight modifications:
- Instead of setting the DA to netdev->handshake->aa, it is just set to
the same address as the 'to' parameter. The kernel actually requires
and checks for these addresses to match. All occurences were passing
the handshake->aa anyways so this change should have no adverse
affects; and its actually required by ft-over-ds to pass in the
previous BSSID, so hard coding handshake->aa will not work.
- The frequency is is also passed in now, as ft-over-ds needs to use
the frequency of the currently connected AP (netdev->frequency get
set to the new target in netdev_fast_transition. Previous frequency
is also saved now).
- A new vector variant (netdev_send_action_framev) was added as well
to support sending out the FT Request action frame since the FT
TX authenticate function provides an iovec of the IEs. The existing
function was already having to prepend the action frame header to
the body, so its not any more or less copying to do the same thing
with an iovec instead.
Since FT already handles processing the FT IE's (and building for
associate) it didn't make sense to have all the IE building inside
netdev_build_cmd_ft_authenticate. Instead this logic was moved into
ft.c, and an iovec is now passed from FT into
netdev_ft_tx_authenticate. This leaves the netdev command builder
unburdened by the details of FT, as well as prepares for FT-over-DS.
Blacklist some drivers known to crash when interfaces are deleted or
created so that we don't even attempt that before falling back to using
the default interface.
Read the driver name for each wiphy from sysfs if available. I didn't
find a better way to obtain the driver name for a phy than by reading
the dir name that the "driver" symlink points at. For an existing
netdev this can be done using the SIOCETHTOOL ioctl.
manager_interface_dump_done would use manager_create_interfaces() at the
end of the loop iterating over pending_wiphys. To prevent it from
crashing make sure manager_create_interfaces never frees the pending
wiphy state and instead make the caller check whether it needs to be
freed so it can be done safely inside loops.
Instead of having two separate types of scans make the periodic scan
logic a layer on top of the one-off scan requests, with minimum code to
account for the lower priority of those scans and the fact that periodic
scans also receive results from external scans. Also try to simplify
the code for both the periodic and one-off scans. In the SCAN_RESULTS
and SCAN_ABORT add more complete checks of the current request's state
so we avoid some existing crashes related to external scans.
scan_send_next_cmd and start_next_scan_request are now just one function
since their funcionality was similar and start_next_scan_request is used
everywhere. Also the state after the trigger command receives an EBUSY
is now the same as when a new scan is on top of the queue so we have
fewer situations to consider.
This code still does not account for fragmented scans where an external
scan between two or our fragments flushes the results and we lose some
of the results, or for fragmented scans that take over 30s and the
kernel expires some results (both situations are unlikely.)
In both netdev_{authenticate,associate}_event there is no need to check
for in_ft at the start since netdev->ap will always be set if in_ft is
set.
There was also no need to set eapol_sm_set_use_eapol_start, as setting
require_handshake implies this and achieves the same result when starting
the SM.
Since FT operates over Authenticate/Associate, it makes the most sense
for it to behave like the other auth-protos.
This change moves all the FT specific processing out of netdev and into
ft.c. The bulk of the changes were strait copy-pastes from netdev into
ft.c with minor API changes (e.g. remove struct netdev).
The 'in_ft' boolean unforunately is still required for a few reasons:
- netdev_disconnect_event relies on this flag so it can ignore the
disconnect which comes in when doing a fast transition. We cannot
simply check netdev->ap because this would cause the other auth-protos
to not handle a disconnect correctly.
- netdev_associate_event needs to correctly setup the eapol_sm when
in FT mode by setting require_handshake and use_eapol_start to false.
This cannot be handled inside eapol by checking the AKM because an AP
may only advertise a FT AKM, and the initial mobility association
does require the 4-way handshake.
Now the 'ft' module, previously ftutil, will be used to drive FT via
the auth-proto virtual class. This renaming is in preparation as
ftutil will become obsolete since all the IE building/processing is
going to be moved out of netdev. The new ft.c module will utilize
the existing ftutil functionality, but since this is now a full blown
auth protocol naming it 'ft' is better suited.
The duplicate/similar code in netdev_associate_event and
netdev_connect_event leads to very hard to follow code, especially
when you throw OWE/SAE/FILS or full mac cards into the mix.
Currently these protocols finish the connection inside
netdev_associate_event, and set ignore_connect_event. But for full
mac cards we must finish the connection in netdev_connect_event.
In attempt to simplify this, all connections will be completed
and/or the 4-way started in netdev_connect_event. This satisfies
both soft/full mac cards as well as simplifies the FT processing
in netdev_associate_event. Since the FT IEs can be processed in
netdev_connect_event (as they already are to support full mac)
we can assume that any FT processing inside netdev_associate_event
is for a fast transition, not initial mobility association. This
simplifies netdev_ft_process_associate by removing all the blocks
that would get hit if transition == false.
Handling FT this way also fixes FT-SAE which was broken after the
auth-proto changes since the initial mobility association was
never processed if there was an auth-proto running.
SAE was a bit trickier than OWE/FILS because the initial implementation
for SAE did not include parsing raw authenticate frames (netdev skipped
the header and passed just the authentication data). OWE/FILS did not
do this and parse the entire frame in the RX callbacks. Because of this
it was not as simple as just setting some RX callbacks. In addition,
the TX functions include some of the authentication header/data, but
not all (thanks NL80211), so this will require an overhaul to test-sae
since the unit test passes frames from one SM to another to test the
protocol end-to-end (essentially the header needs to be prepended to
any data coming from the TX functions for the end-to-end tests).
Since ERP is only used for FILS and not behaving in the 'normal' ERP
fashion (dealing with actual EAP data, timeouts etc.) we can structure
ERP as a more synchronous protocol, removing the need for a complete
callback.
Now, erp_rx_packet returns a status, so FILS can decide how to handle
any failures. The complete callback was also removed in favor of a
getter for the RMSK (erp_get_rmsk). This allows FILS to syncronously
handle ERP, and potentially fail directly in fils_rx_authenticate.
A new eapol API was added specifically for FILS (eapol_set_started). Since
either way is special cased for FILS, its a bit cleaner to just check the
AKM inside eapol_start and, if FILS, dont start any timeouts or start the
handshake (effectively what eapol_set_started was doing).
This is a new concept applying to any protocol working over authenticate
and/or associate frames (OWE/SAE/FILS). All these protocols behave
similarly enough that they can be unified into a handshake driver
structure.
Now, each protocol will initialize this auth_proto structure inside
their own internal data. The auth_proto will be returned from
the initializer which netdev can then use to manage the protocol by
forwarding authenticate/associate frames into the individual drivers.
The auth_proto consists only of function pointers:
start - starts the protocol
free - frees the driver data
rx_authenticate - receive authenticate frame
rx_associate - receive associate frame
auth_timeout - authenticate frame timed out
assoc_timeout - associate frame timed out
If the setting is true we'll not attempt to remove or create
interfaces on any wiphys and will only use the default interface
(if it exists). If false, force us managing the interfaces. Both
values override the auto logic.
An unexpected Associate event would cause iwd to crash when accessing
netdev->handshake->mde. netdev->handshake is only set if we're
attempting to connect or connected somewhere so check netdev->connected
first.
FILS needs to allocate an extra 16 bytes of key data for the AES-SIV
vector. Instead of leaving it up to the caller to figure this out (as
was done with the GTK builder) eapol_create_common can allocate the
extra space since it knows the MIC length.
This also updates _create_gtk_2_of_2 as it no longer needs to create
an extra data array.
Since FILS does not use a MIC, the 1/4 handler would always get called
for FILS PTK rekeys. We can use the fact that message 1/4 has no MIC as
well as no encrypted data to determine which packet it is. Both no MIC
and no encrypted data means its message 1/4. Anything else is 3/4.
For FILS rekeys, we still derive the PTK using the 4-way handshake.
And for FILS-SHA384 we need the SHA384 KDF variant when deriving.
This change adds both FILS-SHA256 and FILS-SHA384 to the checks
for determining the SHA variant.
crypto_derive_pairwise_ptk was taking a boolean to decide whether to
use SHA1 or SHA256, but for FILS SHA384 may also be required for
rekeys depending on the AKM.
crypto_derive_pairwise_ptk was changed to take l_checksum_type instead
of a boolean to allow for all 3 SHA types.
AP still relies on the get_data/set_length semantics. Its more convenient
to still use these since it avoids the need for extra temporary buffers
when building the rates IE.
The TLV builder APIs were not very intuative, and in some (or all)
cases required access to the builder structure directly, either to
set the TLV buffer or to get the buffer at the end.
This change adds a new API, ie_tlv_builder_set_data, which both sets
the length for the current TLV and copies the TLV data in one go.
This will avoid the need for memcpy(ie_tlv_builder_get_data(...),...)
ie_tlv_builder_finalize was also changed to return a pointer to the
start of the build buffer. This will eliminate the need to access
builder.tlv after building the TLVs.
ie_tlv_builder_init was changed to take an optional buffer to hold
the TLV data. Passing NULL/0 will build the TLV in the internal
buffer. Passing in a pointer and length will build into the passed
in buffer.
Let manager.c signal to wiphy.c when the wiphy parsing from the genl
messages is complete. When we query for existing wiphy using the
GET_WIPHY dump command we get many genl messages per wiphy, on a
notification we only get one message. So after wiphy_create there may
be one or many calls to wiphy_update_from_genl. wiphy_create_complete
is called after all of them, so wiphy.c can be sure it's done with
parsing the wiphy attributes when in prints the new wiphy summary log
message, like it did before manager.c was added.
I had wrongly assumed that all the important wiphy attributes were in
the first message in the dump, but NL80211_ATTR_EXT_FEATURES was not and
wasn't being parsed which was breaking at least testRSSIAgent.
SAE was behaving inconsitently with respect to freeing the state.
It was freeing the SM internally on failure, but requiring netdev
free it on success.
This removes the call to sae_sm_free in sae.c upon failure, and
instead netdev frees the SM in the complete callback in all cases
regardless of success or failure.
From netdev's prospective FILS works the same as OWE/SAE where we create
a fils_sm and forward all auth/assoc frames into the FILS module. The
only real difference is we do not start EAPoL once FILS completes.
FILS (Fast Initial Link Setup) allows a station to negotiate a PTK during
authentication and association. This allows for a faster connection as
opposed to doing full EAP and the 4-way. FILS uses ERP (EAP Reauth Protocol)
to achieve this, but encapsulates the ERP data into an IE inside
authenticate frames. Association is then used to verify both sides have
valid keys, as well as delivering the GTK/IGTK.
FILS will work similar to SAE/OWE/FT where netdev registers a fils_sm, and
then forwards all Auth/Assoc frame data to and from the FILS module.
Keeping the ERP cache on the handshake object allows station.c to
handle all the ERP details and encapsulate them into a handshake.
FILS can then use the ERP cache right from the handshake rather
than getting it itself.
wiphy_select_akm needed to be updated to take a flag, which can be
set to true if there are known reauth keys for this connection. If
we have reauth keys, and FILS is available we will choose it.
If the AP send an associate with an unsupported group status, OWE
was completely starting over and sending out an authenticate frame
when it could instead just resend the associate frame with a
different group.
With FILS support coming there needs to be a way to set the PTK directly.
Other AKMs derive the PTK via the 4-way handshake, but FILS computes the
PTK on its own.
This reverts commit 1e337259ce.
Using util_get_username was wrong in this context. MSCHAPv2 expects us
to only strip the domain name from identities of the form
domain\identity. util_get_username would also strip identities of the
form username@domain.com.
FILS-SHA384 got overlooked and the kek length was being hard coded
to 32 bytes when encrypting the key data. There was also one occurence
where the kek_len was just being set incorrectly.
If the input length is 16 bytes, this means aes_siv_decrypt should
only be verifying the 16 byte SIV and not decrypting any data. If
this is the case, we can skip over the whole AES-CTR portion of
AES-SIV and only verify the SIV.
In eapol_key_handle, 'have_snonce' is checked before decrypting the
key data. For FILS, there will be no snonce so this check can be
skipped if mic_len == 0.
The GTK handshake for FILS uses AES-SIV to encrypt the key data, and
does away with the MIC completely. Now, when finalizing the 2/2 GTK
packet we check the MIC length, and if zero we assume FILS is being
used and we use AES-SIV to encrypt the key data.
For FILS, there is no actual data being encrypted for GTK 2/2 (hence
why the input data length is zero). This results in only the SIV
being generated, which essentially serves the same purpose as a MIC.
FILS does not use a MIC, as well as requires encrypted data on GTK 2/2.
This updates eapol_create_gtk_2_of_2 to pass in extra data to
eapol_create_common, which will reserve room for this encrypted data.
Extra data is only reserved if mic_len == 0.
FILS does not use a MIC in EAPoL frames and also requires encrypted
data on all EAPoL frames. In the common builder the mic_len is now
checked and the flags are set appropriately.
FILS authentication does away with the MIC, so checking for key_mic
in the eapol key frame does not allow FILS to work. Now we pass in
the mic_len to eapol_verify_gtk_1_of_2, and if it is non-zero we can
check that the MIC is present in the frame.
FILS does not require an eapol_sm for authentication, but rekeys
are still performed using the 4-way handshake. Because of this
FILS needs to create a eapol_sm in a 'started' state, but without
calling eapol_start as this will initialize EAP and create handshake
timeouts.
This allows EAPoL to wait for any 4-way packets, and handle them
as rekeys.
ERP (EAP Reauthentication Protocol) allows a station to quickly
reauthenticate using keys from a previous EAP authentication.
This change both implements ERP as well as moves the key cache into
the ERP module.
ERP in its current form is here to only support FILS. ERP is likely not
widespread and there is no easy way to determine if an AP supports ERP
without trying it. Attempting ERP with a non-ERP enabled AP will actually
result in longer connection times since ERP must fail and then full EAP
is done afterwards. For this reason ERP was separated from EAP and a
separate ERP state machine must be created. As it stands now, ERP cannot
be used on its own, only with FILS.
Quick scan uses a set of frequencies associated with the
known networks. This allows to reduce the scan latency.
At this time, the frequency selection follows a very simple
logic by taking all known frequencies from the top 5 most
recently connected networks.
If connection isn't established after the quick scan attempt,
we fall back to the full periodic scan.
Instead of handling NEW_WIPHY events and WIHPY_DUMP events in a similar
fashion, split up the paths to optimize iwd startup time. There's
fundamentally no reason to wait a second (and eat up file-descriptor
resources for timers unnecessarily) when we can simply start an
interface dump right after the wiphy dump.
In case a new wiphy is added in the middle of a wiphy dump, we will
likely get a new wiphy event anyway, in which case a setup_timeout will
be created and we will ignore this phy during the interface dump
processing.
This also optimizes the case of iwd being re-started, in which case
there are no interfaces present.
Separate out the two types of NEW_WIPHY handlers into separate paths and
factor out the common code into a utility function.
Dumps of CMD_NEW_WIPHY can be split up over several messages, while
CMD_NEW_WIPHY events (generated when a new card is plugged in) are
stuffed into a single message.
This also prepares ground for follow-on commits where we will handle the
two types of events differently.
src/netdev.c:netdev_create_from_genl() Skipping duplicate netdev wlp2s0[3]
Aborting (signal 11) [/home/denkenz/iwd/src/iwd]
++++++++ backtrace ++++++++
#0 0x7fc4c7a4e930 in /lib64/libc.so.6
#1 0x40ea13 in netdev_getlink_cb() at src/netdev.c:4654
#2 0x468cab in process_message() at ell/netlink.c:183
#3 0x4690a3 in can_read_data() at ell/netlink.c:289
#4 0x46681d in io_callback() at ell/io.c:126
#5 0x4651cd in l_main_iterate() at ell/main.c:473
#6 0x46530e in l_main_run() at ell/main.c:516
#7 0x465626 in l_main_run_with_signal() at ell/main.c:642
#8 0x403df8 in main() at src/main.c:513
#9 0x7fc4c7a39bde in /lib64/libc.so.6
Mirror netdev.c white/blacklist logic. If either or both the whitelist
and the blacklist are given also fall back to not touching the existing
interface setup on the wiphy.
If we get an error during DEL_INTERFACE or NEW_INTERFACE we may be
dealing with a driver that doesn't implement virtual interfaces or
doesn't implement deleting the default interface. In this case fall
back to using the first usable interface that we've detected on this
wiphy.
There's at least one full-mac driver that doesn't implement the cfg80211
.del_virtual_intf and .add_virtual_intf methods and at least one that
only allows P2P interfaces to be manipulated. mac80211 drivers seem to
at least implement those methods but I didn't check to see if there are
driver where they'd eventually return EOPNOTSUPP.
This is probably the trickiest part in this patchset. I'm introducing a
new logic where instead of using the interfaces that we find present
when a wiphy is detected, which would normally be the one default
interface per wiphy but could be 0 or more than one, we create one
ourselves with the socket owner attribute and use exactly one for
Station, AP and Ad-Hoc modes. When IWD starts we delete all the
interfaces on existing wiphys that we're going to use (as determined by
the wiphy white/blacklists) or freshly hotplugged ones, and only then we
register the interface we're going to use meaning that the wiphy's
limits on the number of concurrent interfaces of each type should be at
0. Otherwise we'd be unlikely to be abe to create the station interface
as most adapters only allow one. After that we ignore any interfaces
that may be created by other processes as we have no use for multiple
station interfaces.
At this point manager.c only keeps local state for wiphys during
the interface setup although when we start adding P2P code we will be
creating and removing interfaces multiple times during the wiphy's
runtime and may need to track it here or in wiphy.c. We do not
specifically check the interface number limits received during the wiphy
dump, if we need to create any interfaces and we're over the driver's
maximum for that specific iftype we'll still attempt it and report error
if it fails.
I tested this and it seems to work with my laptop's intel card and some
USB hotplug adapters.
The latest refactoring ended up assuming that FT related elements would
be handled in netdev_associate_event. However, FullMac cards (that do
not generate netdev_associate_event) could still connect using FT AKMs
and perform the Initial mobility association. In such cases the FTE
element was required but ended up not being set into the handshake.
This caused the handshake to fail during PTK 1_of_4 processing.
Fix this by making sure that FTE + related info is set into the
handshake, albeit with a lower sanity checking level since the
elements have been processed by the firmware already.
Note that it is currently impossible for actual FTs to be performed on
FullMac cards, so the extra logic and sanity checking to handle these
can be skipped.
Add functionality to read and parse the known frequencies
from permanent storage on start of the service. On service
shutdown, we sync the known frequencies back to the permanent
storage.
Each known network (previously connected) will have a set
of known frequencies associated with it, e.g. a set of
frequencies from all BSSs observed. The list of known
frequencies is sorted with the most recently observed
frequency in the head.
Previously, the scan results were disregarded once the new
ones were available. To enable the scan scenarios where the
new scan results are delivered in parts, we introduce a
concept of aging BSSs and will remove them based on
retention time.
Add manager.c, a new file where the wiphy and interface creation/removal
will be handled and interface use policies will be implemented. Since
not all kernel-side nl80211 interfaces are tied to kernel-side netdevs,
netdev.c can't manage all of the interfaces that we will be using, so
the logic is being moved to a common place where all interfaces on a
wiphy will be managed according to the policy, device support for things
like P2P and user enabling/disabling/connecting with P2P which require
interfaces to be dynamically added and removed.
Add wiphy_create, wiphy_update_from_genl and wiphy_destroy that together
will let a new file command the wiphy creation, updates and deletion
with the same functionality the current config notification handler
implements in wiphy.c.
As mentioned in code comments the name is NUL-terminated so there's no
need to return the length path, which was ignored in some occasions
anyway. Consistently treat it as NUL-terminated but also validate.