Build and send the FT Authentication Request frame, the initial Fast
Transition message.
In this version the assumption is that once we start a transition attempt
there's no going back so the old handshake_state, scan_bss, etc. can be
replaced by the new objects immediately and there's no point at which both
the old and the new connection states are needed. Also the disconnect
event for the old connection is implicit. At netdev level the state
during a transition is almost the same with a new connection setup.
The first disconnect event on the netlink socket after the FT Authenticate
is assumed to be the one generated by the kernel for the old connection.
The disconnect event doesn't contain the AP bssid (unlike the
deauthenticate event preceding it), otherwise we could check to see if
the bssid is the one we are interested in or could check connect_cmd_id
assuming a disconnect doesn't happen before the connect command finishes.
Action Frames are sent by nl80211 as unicast data. We're not receiving
any other unicast packets in iwd at this time so let netdev directly
handle all unicast data on the genl socket.
There are situations when a CMD_DISCONNECT or deauthenticate will be
issued locally because of an error detected locally where netdev would
not be able to emit a event to the device object. The CMD_DISCONNECT
handler can only send an event if the disconnect is triggered by the AP
because we don't have an enum value defined for other diconnects. We
have these values defined for the connect callback but those errors may
happen when the connect callback is already NULL because a connection
has been estabilshed. So add an event type for local errors.
These situations may occur in a transition negotiation or in an eapol
handshake failure during rekeying resulting in a call to
netdev_handshake_failed.
The kernel parses NL80211_ATTR_USE_MFP to mean an enumeration
nl80211_mfp. So instead of using a boolean, we should be using the
value NL80211_MFP_REQUIRED.
Use the NLMSG_ALIGN macro on the family header size (struct ifinfomsg in
this case). The ascii graphics in include/net/netlink.h show that both
the netlink header and the family header should be padded. The netlink
header (nlmsghdr) is already padded in ell. To "document" this
requirementin ell what we could do is take two buffers, one for the
family header and one for the attributes.
This doesn't change anything for most people because ifinfomsg is
already 16-byte long on the usual architectures.
Remove the keys and other data from struct eapol_sm, update device.c,
netdev.c and wsc.c to use the handshake_state object instead of
eapol_sm. This also gets rid of eapol_cancel and the ifindex parameter
in some of the eapol functions where sm->handshake->ifindex can be
used instead.
If an MD IE is supplied to netdev_connect, pass that MD IE in the
associate request, then validate and handle the MD IE and FT IE in the
associate response from AP.
Split eapol_start into two calls, one to register the state machine so
that the PAE read handler knows not to discard frames for that ifindex,
and eapol_start to actually start processing the frames. This is needed
because, as per the comment in netdev.c, due to scheduling the PAE
socket read handler may trigger before the CMD_CONNECT event handler,
which needs to parse the FTE from the Associate Response frame and
supply it to the eapol SM before it can do anything with the message 1
of 4 of the FT handshake.
Another issue is that depending on the driver or timing, the underlying
link might not be marked as 'ready' by the kernel. In this case, our
response to Message 1 of the 4-way Handshake is written and accepted by
the kernel, but gets dropped on the floor internally. Which leads to
timeouts if the AP doesn't retransmit.
If the handshake fails, we trigger a deauthentication prior to reporting
NETDEV_RESULT_HANDSHAKE_FAILED. If a netdev_disconnect is invoked in
the meantime, then the caller will receive -ENOTCONN. This is
incorrect, since we are in fact logically connected until the connect_cb
is notified.
Tweak the behavior to keep the connected variable as true, but check
whether disconnect_cmd_id has been issued in the netdev_disconnect_event
callback.
We used to open a socket for each wireless interface. This patch uses a
single socket with an attached BPF to handle all EAPoL traffic via a
single file descriptor.
CMD_DISCONNECT fails on some occasions when CMD_CONNECT is still
running. When this happens the DBus disconnect command receives an
error reply but iwd's device state is left as disconnected even though
there's a connection at the kernel level which times out a few seconds
later. If the CMD_CONNECT is cancelled I couldn't reproduce this so far.
src/network.c:network_connect()
src/network.c:network_connect_psk()
src/network.c:network_connect_psk() psk:
69ae3f8b2f84a438cf6a44275913182dd2714510ccb8cbdf8da9dc8b61718560
src/network.c:network_connect_psk() len: 32
src/network.c:network_connect_psk() ask_psk: false
src/device.c:device_enter_state() Old State: disconnected, new state:
connecting
src/scan.c:scan_notify() Scan notification 33
src/device.c:device_netdev_event() Associating
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/device.c:device_dbus_disconnect()
src/device.c:device_connect_cb() 6, result: 5
src/device.c:device_enter_state() Old State: connecting, new state:
disconnecting
src/device.c:device_disconnect_cb() 6, success: 0
src/device.c:device_enter_state() Old State: disconnecting, new state:
disconnected
src/scan.c:scan_notify() Scan notification 34
src/netdev.c:netdev_mlme_notify() MLME notification 19
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 37
src/netdev.c:netdev_authenticate_event()
src/scan.c:get_scan_callback() get_scan_callback
src/scan.c:get_scan_done() get_scan_done
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 19
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 38
src/netdev.c:netdev_associate_event()
src/netdev.c:netdev_mlme_notify() MLME notification 46
src/netdev.c:netdev_connect_event()
<delay>
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 20
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 20
src/netdev.c:netdev_mlme_notify() MLME notification 39
src/netdev.c:netdev_deauthenticate_event()
This is to make sure device_remove and netdev_connect_free are called
early so we don't continue setting up a connection and don't let DBus
clients power device back up after we've called netdev_set_powered.
Prevents situations like this:
src/device.c:device_enter_state() Old State: connecting, new state:
connected
src/scan.c:scan_periodic_stop() Stopping periodic scan for ifindex: 3
src/device.c:device_dbus_disconnect()
src/device.c:device_connect_cb() 3
src/device.c:device_disassociated() 3
src/device.c:device_enter_state() Old State: connected, new state:
autoconnect
Try to make the connect and disconnect operations look more like a
transaction where the callback is always called eventually, also with a
clear indication if the operation is in profress. The connected state
lasts from the start of the connection attempt until the disconnect.
1. Non-null netdev->connected or disconnect_cb indicate that the operation
is active.
2. Every entry-point in netdev.c checks if connected is still set
before executing the next step of the connection setup. CMD_CONNECT and
the subsequent commands may succeed even if CMD_DISCONNECT is called
in the middle so they can't only rely on the error value for that.
3. netdev->connect_cb and other elements of the connection state are
reset by netdev_connect_free which groups the clean-up operations to
make sure we don't miss anything. Since the callback pointers are
reset device.c doesn't need to check that it receives a spurious
event in those callbacks for example after calling netdev_disconnect.
If initial bring up returns ERFKILL proceed and the inteface can be
explicitly brought up by the client once rfkill is disabled.
Also fix the error number returned to netdev_set_powered callback to be
negative as expected by netdev_initial_up_cb.
==3059== 7 bytes in 1 blocks are still reachable in loss record 1 of 2
==3059== at 0x4C2C970: malloc (vg_replace_malloc.c:296)
==3059== by 0x50BB319: strndup (in /lib64/libc-2.22.so)
==3059== by 0x417B4D: l_strndup (util.c:180)
==3059== by 0x417E1B: l_strsplit (util.c:311)
==3059== by 0x4057FC: netdev_init (netdev.c:1658)
==3059== by 0x402E26: nl80211_appeared (main.c:112)
==3059== by 0x41F577: get_family_callback (genl.c:1038)
==3059== by 0x41EE3F: process_unicast (genl.c:390)
==3059== by 0x41EE3F: received_data (genl.c:506)
==3059== by 0x41C6F4: io_callback (io.c:120)
==3059== by 0x41BAA9: l_main_run (main.c:381)
==3059== by 0x402B9C: main (main.c:234)
When a new wiphy is added, the kernel usually adds a default STA
interface as well. This interface is currently not signaled over
nl80211 in any way.
This implements a selective dump of the wiphy interfaces in order to
obtain the newly added netdev. Selective dump is currently not
supported by the kernel, so all netdevs will be returned. A patch on
linux-wireless is pending that implements the selective dump
functionality.
==24934== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1
==24934== at 0x4C2C970: malloc (vg_replace_malloc.c:296)
==24934== by 0x41675D: l_malloc (util.c:62)
==24934== by 0x4033B3: netdev_set_linkmode_and_operstate
(netdev.c:149)
==24934== by 0x4042B9: netdev_free (netdev.c:221)
==24934== by 0x41735D: l_queue_clear (queue.c:107)
==24934== by 0x4173A8: l_queue_destroy (queue.c:82)
==24934== by 0x40543D: netdev_exit (netdev.c:1459)
==24934== by 0x402D6F: nl80211_vanished (main.c:126)
==24934== by 0x41E607: l_genl_family_unref (genl.c:1057)
==24934== by 0x402B50: main (main.c:237)
When setting operstate to dormant or down, give it a callback for debug
purposes. It looks like that operstate down message does not have a
chance to go out currently.
The eapol state machine parameters are now built inside device.c when
the network connection is attempted. The reason is that the device
object knows about network settings, wiphy constraints and should
contain the main 'management' logic.
netdev now manages the actual low-level process of building association
messages, detecting authentication events, etc.
Turn netdev watches into device watches. The intent is to refactor out
netdev specific details into its own class and move device specific
logic into device.c away from wiphy.c