Define minimum delay between roam attempts and add automatic retries.
This handles a few situations:
* roam attempt failing, then RSSI going above the threshold and below
again -- in that case we don't want to reattempt too soon, we'll only
reattempt after 60s.
* roam attempt failing then RSSI staying low for longer than 60 -- in
that case we want to reattempt after 60s too.
* signal being low from the moment we connected -- in that case we also
want to attempt a roam every some time.
Fix a leak of the MDE buffer. It is now only needed for the single call
to handshake_state_set_mde which copies the bytes anyway so use a buffer
on stack.
Since caab23f192085e6c8e47c41fc1ae9f795d1cbe86 hostapd is going to set
this bit to zero for RSN networks but both values will obviously be in
use. Only check the value if is_wpa is true - in this case check the
value is exactly 16, see hostapd commit:
commit caab23f192085e6c8e47c41fc1ae9f795d1cbe86
Author: Jouni Malinen <j@w1.fi>
Date: Sun Feb 5 13:52:43 2017 +0200
Set EAPOL-Key Key Length field to 0 for group message 1/2 in RSN
P802.11i/D3.0 described the Key Length as having value 16 for the group
key handshake. However, this was changed to 0 in the published IEEE Std
802.11i-2004 amendment (and still remains 0 in the current standard IEEE
Std 802.11-2016). We need to maintain the non-zero value for WPA (v1)
cases, but the RSN case can be changed to 0 to be closer to the current
standard.
Add sr NULL check before accessing sr->id. Call scan_request_free on
request structure and call the destroy callback. Cancel the netlink
TRIGGER_SCAN command if still running and try starting the next scan
in the queue. It'll probably still fail with EBUSY but it'll be
reattempted later.
Always call start_next_scan_request when a scan request has finished,
with a success or a failure, including a periodic scan attempt. Inside
that function check if there's any work to be done, either for one-off
scan requests or periodic scan, instead of having this check only inside
get_scan_done. Call start_next_scan_request in scan_periodic_start and
scan_periodic_timeout.
Also call the trigger callback with an error code when sending the
netlink command fails after the scan request has been queued because
another scan was in progress when the scan was requested.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000419d38 in scan_done (msg=0x692580, userdata=0x688250)
at src/scan.c:250
250 sc->state = sr->passive ? SCAN_STATE_PASSIVE : SCAN_STATE_ACTIVE;
(gdb) bt
0 0x0000000000419d38 in scan_done (msg=0x692580, userdata=0x688250)
at src/scan.c:250
1 0x000000000043cac0 in process_unicast (genl=0x686d60, nlmsg=0x7fffffffc3b0)
at ell/genl.c:390
2 0x000000000043ceb0 in received_data (io=0x686e60, user_data=0x686d60)
at ell/genl.c:506
3 0x000000000043967d in io_callback (fd=6, events=1, user_data=0x686e60)
at ell/io.c:120
4 0x000000000043824d in l_main_run () at ell/main.c:381
5 0x000000000040303c in main (argc=1, argv=0x7fffffffe668) at src/main.c:259
The reasoning is that the logic inside scan_common is reversed. Instead
of freeing the scan request on error, we always do it. This causes the
trigger_scan callback to receive invalid userdata.
Save the ids of the netlink trigger scan commands that we send and
cancel them in scan_ifindex_remove to fix a race leading to a
segfault. The segfault would happen every time if scan_ifindex_remove
was called in the same main loop iteration in which we sent the
command, on shutdown:
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan3[6]
src/device.c:device_disassociated() 6
src/device.c:device_enter_state() Old State: connected, new state:
disconnected
src/device.c:device_enter_state() Old State: disconnected, new state:
autoconnect
src/scan.c:scan_periodic_start() Starting periodic scan for ifindex: 6
src/device.c:device_free()
src/device.c:bss_free() Freeing BSS 02:00:00:00:00:00
src/device.c:bss_free() Freeing BSS 02:00:00:00:01:00
Removing scan context for ifindex: 6
src/scan.c:scan_context_free() sc: 0x5555557ca290
src/scan.c:scan_notify() Scan notification 33
src/netdev.c:netdev_operstate_down_cb() netdev: 6, success: 1
src/scan.c:scan_periodic_done()
src/scan.c:scan_periodic_done() Periodic scan triggered for ifindex:
1434209520
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000064 in ?? ()
(gdb) bt
#0 0x0000000000000064 in ?? ()
#1 0x0000555555583560 in process_unicast (nlmsg=0x7fffffffc1a0,
genl=0x5555557c1d60) at ell/genl.c:390
#2 received_data (io=<optimized out>, user_data=0x5555557c1d60)
at ell/genl.c:506
#3 0x0000555555580d45 in io_callback (fd=<optimized out>,
events=1, user_data=0x5555557c1e60) at ell/io.c:120
#4 0x000055555558005f in l_main_run () at ell/main.c:381
#5 0x00005555555599c1 in main (argc=<optimized out>, argv=<optimized out>)
at src/main.c:259
Parse the contents of the GTK and IGTK subelements in an FT IE instead
of working with buffers containing the whole subelement. Some more
validation of the subelement contents. Drop support for GTK / IGTK when
building the FTE (unused).
Don't start the handshake timeout in eapol_start if either
handshake->ptk_complete is set (handshake already done) or
handshake->have_snonce is set (steps 1&2 done). This accounts for
eapol_start being called after a Fast Transition when a 4-Way handshake
is not expected.
Add a flush flag to scan_parameters to tell the kernel to flush the
cache of scan results before the new scan. Use this flag in the
active scan during roaming.
Split the igtk parameter to handshake_state_install_igtk into one
parameter for the actual IGTK buffer and one for the IPN buffer instead
of requiring the caller to have them both in one continuous buffer.
With FT protocol, one is received encrypted and the other in plain text.
Make sure that the Neighbor Report timeout is cancelled when connection
breaks or device is being destroyed, and call the callback. Add an
errno parameter to the callback to indicate the cause.
With this patch an actual fast transition should happen when the signal
strength goes low but there are still various details to be fixed before
this becomes useful:
* the kernel tends to return cached scan results and won't update the
rssi values,
* there's no timer to prevent too frequent transition attempts or to
retry after some time if the signal is still low,
* no candidate other than the top ranked BSS is tried. With FT it
may be impossible to try another BSS anyway although there isn't
anything in the spec to imply this. It would require keeping the
handshake_state around after netdev gives up on the transition
attempt.
Trigger a scan of the selected channels or all channels if no useful
neighbor list was obtained, then process the scan results to select the
final target BSS.
The actual transition to the new BSS is not included in this patch for
readability.
Trigger a roam attempt when the RSSI level has been low for at least 5
seconds using the netdev RSSI LOW/HIGH events. See if neighbor reports
are supported and if so, request and process the neighbor reports list
to restrict the number of channels to be scanned. The scanning part is
not included in this patch for readability.
Validate the fourth message of the fast transition sequence and save the
new keys and state as current values in the netdev object. The
FT-specific IE validation that was already present in the initial MD
is moved to a new function.
Build and send the FT Authentication Request frame, the initial Fast
Transition message.
In this version the assumption is that once we start a transition attempt
there's no going back so the old handshake_state, scan_bss, etc. can be
replaced by the new objects immediately and there's no point at which both
the old and the new connection states are needed. Also the disconnect
event for the old connection is implicit. At netdev level the state
during a transition is almost the same with a new connection setup.
The first disconnect event on the netlink socket after the FT Authenticate
is assumed to be the one generated by the kernel for the old connection.
The disconnect event doesn't contain the AP bssid (unlike the
deauthenticate event preceding it), otherwise we could check to see if
the bssid is the one we are interested in or could check connect_cmd_id
assuming a disconnect doesn't happen before the connect command finishes.
This adds support for iwd.conf 'ManagementFrameProtection' setting.
This setting has the following semantics, with '1' being the default:
0 - MFP off, even if hardware is capable
1 - Use MFP if available
2 - MFP required. If the hardware is not capable, no connections will
be possible. Use at your own risk.
Despite RFC3748 mandating MSKs to be at least 256 bits some EAP methods
return shorter MSKs. Since we call handshake_failed when the MSK is too
short, EAP methods have to be careful with their calls to set_key_material
because it may result in a call to the method's .remove method.
EAP-TLS and EAP-TTLS can't handle that currently and would be difficult to
adapt because of the TLS internals but they always set msk_len to 64 so
handshake_failed will not be called.
Make sure that eap_set_key_material can free the whole EAP method and
EAP state machine before returning, by calling that function last. This
relies on eap_mschapv2_handle_success being the last call in about 5
stack frames above it too.
Action Frames are sent by nl80211 as unicast data. We're not receiving
any other unicast packets in iwd at this time so let netdev directly
handle all unicast data on the genl socket.
Add a version of scan_active that accepts a struct with the scan
parameters so we can more easily add new parameters. Since the genl
message is now built within scan_active_start the extra_ie memory
can be freed by the caller at any time.
clang complains about enum as var_arg type
because of the argument standard conversion.
In a small test I did neither clang nor gcc can
properly warn about out of range values, so it's
purely for documentation either way.
There are situations when a CMD_DISCONNECT or deauthenticate will be
issued locally because of an error detected locally where netdev would
not be able to emit a event to the device object. The CMD_DISCONNECT
handler can only send an event if the disconnect is triggered by the AP
because we don't have an enum value defined for other diconnects. We
have these values defined for the connect callback but those errors may
happen when the connect callback is already NULL because a connection
has been estabilshed. So add an event type for local errors.
These situations may occur in a transition negotiation or in an eapol
handshake failure during rekeying resulting in a call to
netdev_handshake_failed.
The kernel parses NL80211_ATTR_USE_MFP to mean an enumeration
nl80211_mfp. So instead of using a boolean, we should be using the
value NL80211_MFP_REQUIRED.
Make the use of EAPOL-Start the default and send it when configured for
8021x and either we receive no EAPOL-EAP from from the AP before
timeout, or if the AP tries to start a 4-Way Handshake.
On certain routers, the 4-Way handshake message 3 of 4 contains a key iv
field which is not zero as it is supposed to. This causes us to fail
the handshake.
Since the iv field is not utilized in this particular case, it is safe
to simply warn rather than fail the handshake outright.
Use the NLMSG_ALIGN macro on the family header size (struct ifinfomsg in
this case). The ascii graphics in include/net/netlink.h show that both
the netlink header and the family header should be padded. The netlink
header (nlmsghdr) is already padded in ell. To "document" this
requirementin ell what we could do is take two buffers, one for the
family header and one for the attributes.
This doesn't change anything for most people because ifinfomsg is
already 16-byte long on the usual architectures.
Remove the keys and other data from struct eapol_sm, update device.c,
netdev.c and wsc.c to use the handshake_state object instead of
eapol_sm. This also gets rid of eapol_cancel and the ifindex parameter
in some of the eapol functions where sm->handshake->ifindex can be
used instead.
struct handshake_state is an object that stores all the key data and other
authentication state and does the low level operations on the keys. Together
with the next patch this mostly just splits eapol.c into two layers
so that the key operations can also be used in Fast Transitions which don't
use eapol.
If device_select_akm_suite selects Fast Transition association then pass
the MD IE and other bits needed for eapol and netdev to do an FT
association and 4-Way Handshake.
If an MD IE is supplied to netdev_connect, pass that MD IE in the
associate request, then validate and handle the MD IE and FT IE in the
associate response from AP.
Add space in the eapol_sm struct for the pieces of information required
for the FT 4-Way Handshake and add setters for device.c and netdev.c to
be able to provide the data.
Don't decide on the AKM suite to use when the bss entries are received
and processed, instead select the suite when the connection is triggered
using a new function device_select_akm_suite, similar to
wiphy_select_cipher(). Describing the AKM suite through flags will be
more difficult when more than 2 suites per security type are supported.
Also handle the wiphy_select_cipher 0 return value when no cipher can be
selected.
The len parameter was only used so it could be validated against ie[1],
but since it was not checked to be > 2, it must have been validated
already, the check was redundant. In any case all users directly
passed ie[1] as len anyway. This makes it consistent with the ie
parsers and builders which didn't require a length.
In many cases the pairwise and group cipher information is not the only
information needed from the BSS RSN/WPA elements in order to make a
decision. For example, th MFPC/MFPR bits might be needed, or
pre-authentication capability bits, group management ciphers, etc.
This patch refactors bss_get_supported_ciphers into the more general
scan_bss_get_rsn_info function
Split eapol_start into two calls, one to register the state machine so
that the PAE read handler knows not to discard frames for that ifindex,
and eapol_start to actually start processing the frames. This is needed
because, as per the comment in netdev.c, due to scheduling the PAE
socket read handler may trigger before the CMD_CONNECT event handler,
which needs to parse the FTE from the Associate Response frame and
supply it to the eapol SM before it can do anything with the message 1
of 4 of the FT handshake.
Another issue is that depending on the driver or timing, the underlying
link might not be marked as 'ready' by the kernel. In this case, our
response to Message 1 of the 4-way Handshake is written and accepted by
the kernel, but gets dropped on the floor internally. Which leads to
timeouts if the AP doesn't retransmit.
This function takes an Operating Channel and a Country String to convert
it into a band. Using scan_oper_class_to_band and scan_channel_to_freq,
an Operating Channel, a Country String and a Channel Number together can
be converted into an actual frequency. EU and US country codes based on
wpa_supplicant's tables.
It doesn't matter for crypto_derive_pairwise_ptk in non-SHA256 mode
but in the FT PTK derivation function, as well as in SHA256 mode all
bytes of the output do actually change with the PTK size.
Fix autoconnect trying to connect to networks never used before as found
by Tim Kourt. Update the comments to be consistent with the use of the
is_known field and the docs, in that a Known Network is any network that
has a config file in the iwd storage, and an autoconnect candidate is a
network that has been connected to before.
If the handshake fails, we trigger a deauthentication prior to reporting
NETDEV_RESULT_HANDSHAKE_FAILED. If a netdev_disconnect is invoked in
the meantime, then the caller will receive -ENOTCONN. This is
incorrect, since we are in fact logically connected until the connect_cb
is notified.
Tweak the behavior to keep the connected variable as true, but check
whether disconnect_cmd_id has been issued in the netdev_disconnect_event
callback.
If the device is currently connected, we will initiate a disconnection
(or wait for the disconnection to complete) prior to starting the
WSC-EAP association.