Instead of one global protocol_version, we now store it inside eapol_sm.
This allows us to use the same protocol version for our response as the
request from the authenticator.
For unit tests where we had protocol version mismatches, a new method is
introduced to explicitly set the protocol version to use.
CMD_DISCONNECT fails on some occasions when CMD_CONNECT is still
running. When this happens the DBus disconnect command receives an
error reply but iwd's device state is left as disconnected even though
there's a connection at the kernel level which times out a few seconds
later. If the CMD_CONNECT is cancelled I couldn't reproduce this so far.
src/network.c:network_connect()
src/network.c:network_connect_psk()
src/network.c:network_connect_psk() psk:
69ae3f8b2f84a438cf6a44275913182dd2714510ccb8cbdf8da9dc8b61718560
src/network.c:network_connect_psk() len: 32
src/network.c:network_connect_psk() ask_psk: false
src/device.c:device_enter_state() Old State: disconnected, new state:
connecting
src/scan.c:scan_notify() Scan notification 33
src/device.c:device_netdev_event() Associating
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/device.c:device_dbus_disconnect()
src/device.c:device_connect_cb() 6, result: 5
src/device.c:device_enter_state() Old State: connecting, new state:
disconnecting
src/device.c:device_disconnect_cb() 6, success: 0
src/device.c:device_enter_state() Old State: disconnecting, new state:
disconnected
src/scan.c:scan_notify() Scan notification 34
src/netdev.c:netdev_mlme_notify() MLME notification 19
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 37
src/netdev.c:netdev_authenticate_event()
src/scan.c:get_scan_callback() get_scan_callback
src/scan.c:get_scan_done() get_scan_done
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 19
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 38
src/netdev.c:netdev_associate_event()
src/netdev.c:netdev_mlme_notify() MLME notification 46
src/netdev.c:netdev_connect_event()
<delay>
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 20
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 20
src/netdev.c:netdev_mlme_notify() MLME notification 39
src/netdev.c:netdev_deauthenticate_event()
This is to make sure device_remove and netdev_connect_free are called
early so we don't continue setting up a connection and don't let DBus
clients power device back up after we've called netdev_set_powered.
Calling device_disassociated inside disconnect_cb was mostly pointless.
Most attributes were already cleared by device_disconnect() when
initiating the disconnection procedure.
This patch also modifies the logic for triggering the autoconnect. If
the user initiated the disconnect call, then autoconnect should not be
triggered. If the disconnect was triggered by other means, then iwd
will still enter autoconnect mode.
All of the abortion logic is invoked when device_disconnect is called.
So there's no point calling device_disassociated in this case. This
also prevents us from entering into autoconnect mode too early.
Prevents situations like this:
src/device.c:device_enter_state() Old State: connecting, new state:
connected
src/scan.c:scan_periodic_stop() Stopping periodic scan for ifindex: 3
src/device.c:device_dbus_disconnect()
src/device.c:device_connect_cb() 3
src/device.c:device_disassociated() 3
src/device.c:device_enter_state() Old State: connected, new state:
autoconnect
Also, remove the check for device->state == DEVICE_STATE_CONNECTING.
device_connect_cb should always called when the state is CONNECTING.
If this is not so, it indicates a bug inside the netdev layer.
This was introduced by commit f468fceb02.
However, after commit 2d78f51fac66b9beff03a56f12e5fb8456625f07, the
connect_cb is called from inside netdev_disconnect. This in turn causes
the dbus-reply to be sent out if needed. So by the time we get to the
code in question, connect_pending is always NULL.
Try to make the connect and disconnect operations look more like a
transaction where the callback is always called eventually, also with a
clear indication if the operation is in profress. The connected state
lasts from the start of the connection attempt until the disconnect.
1. Non-null netdev->connected or disconnect_cb indicate that the operation
is active.
2. Every entry-point in netdev.c checks if connected is still set
before executing the next step of the connection setup. CMD_CONNECT and
the subsequent commands may succeed even if CMD_DISCONNECT is called
in the middle so they can't only rely on the error value for that.
3. netdev->connect_cb and other elements of the connection state are
reset by netdev_connect_free which groups the clean-up operations to
make sure we don't miss anything. Since the callback pointers are
reset device.c doesn't need to check that it receives a spurious
event in those callbacks for example after calling netdev_disconnect.
If initial bring up returns ERFKILL proceed and the inteface can be
explicitly brought up by the client once rfkill is disabled.
Also fix the error number returned to netdev_set_powered callback to be
negative as expected by netdev_initial_up_cb.
map_wiphy made the assumption that phy names follow the "phyN" pattern
but phys created or renamed by the "iw" command can have arbitrary
names. It seems that /sys/class/rfkill/rfkill%u/name is not updated on
a phy rename, so we can't use it to subsequently read
/sys/class/ieee80211/<name>/index but both
/sys/class/rfkill/rfkill%u/../index and
/sys/class/rfkill/rfkill%u/device/index point to that file.
==3059== 7 bytes in 1 blocks are still reachable in loss record 1 of 2
==3059== at 0x4C2C970: malloc (vg_replace_malloc.c:296)
==3059== by 0x50BB319: strndup (in /lib64/libc-2.22.so)
==3059== by 0x417B4D: l_strndup (util.c:180)
==3059== by 0x417E1B: l_strsplit (util.c:311)
==3059== by 0x4057FC: netdev_init (netdev.c:1658)
==3059== by 0x402E26: nl80211_appeared (main.c:112)
==3059== by 0x41F577: get_family_callback (genl.c:1038)
==3059== by 0x41EE3F: process_unicast (genl.c:390)
==3059== by 0x41EE3F: received_data (genl.c:506)
==3059== by 0x41C6F4: io_callback (io.c:120)
==3059== by 0x41BAA9: l_main_run (main.c:381)
==3059== by 0x402B9C: main (main.c:234)
Previously device.c would remove the whole object at the path of the
Device and the WSC interfaces but now the watches are called without the
whole object appearing and disappearing.
Change the path for net.connman.iwd.Device objects to /phyX/Y and
register net.connman.iwd.Adapter at /phyX grouping devices of the same
wiphy.
Turns out no changes to the test/* scripts are needed.
The boolean property indicates if a scan is ongoing. Only the scans
triggered by device.c are reflected (not the ones from WSC) because only
those scans affect the list of networks seen by Dbus.
Add rfkill.c/rfkill.h to be used for watching per-wiphy RFkill state.
It uses both /dev/rfkill and /sys because /dev/rfkill is the recommended
way of interfacing with rfkill but at the same time it doesn't provide
the information on mapping to wiphy IDs.
Note that the autoconnect_list may still contain the network. Currently
only the top entry from the list is ever used and only on
new_scan_results(), i.e. at the same time the list is being created.
If at some point it becomes part of actual device state it needs to also
be reset when a network is being forgotten.
If Disconnect is called during an ongoing connection attempt send a
CMD_DEAUTHENTICATE command same as when we're already connected, and
send a reply to potential dbus Connect call.
When a new wiphy is added, the kernel usually adds a default STA
interface as well. This interface is currently not signaled over
nl80211 in any way.
This implements a selective dump of the wiphy interfaces in order to
obtain the newly added netdev. Selective dump is currently not
supported by the kernel, so all netdevs will be returned. A patch on
linux-wireless is pending that implements the selective dump
functionality.
==24934== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1
==24934== at 0x4C2C970: malloc (vg_replace_malloc.c:296)
==24934== by 0x41675D: l_malloc (util.c:62)
==24934== by 0x4033B3: netdev_set_linkmode_and_operstate
(netdev.c:149)
==24934== by 0x4042B9: netdev_free (netdev.c:221)
==24934== by 0x41735D: l_queue_clear (queue.c:107)
==24934== by 0x4173A8: l_queue_destroy (queue.c:82)
==24934== by 0x40543D: netdev_exit (netdev.c:1459)
==24934== by 0x402D6F: nl80211_vanished (main.c:126)
==24934== by 0x41E607: l_genl_family_unref (genl.c:1057)
==24934== by 0x402B50: main (main.c:237)
Instead of calling the device added or removed callback when the
interface is detected, call it when interface goes up or down. This
only affects the addition and removal of the WSC interface now.
During the network_info refactoring the adding of the connected BSS to
device->bss_list in case it is not in the scan results has moved to
after the l_hashmap_foreach_remove call meaning that the network could
be removed even though it is still pointed at by
device->connected networks. Reverse the order to what it was before.
Alternatively network_process network could take not of the fact the
network is connected and not call network_remove on it leaving it with
an empty bss_list.
It is probably rare that a disconnect should fail but if it happens the
device->state is not returned to CONNECTED and I'm not sure if it should
be, so the ConnectedNetwork property and other bits should probably be
reset at the start of the disconnection instead of at the end.
Also check if state is CONNECTED before calling network_disconnected
because network_connected may have not been called yet.
--interfaces (-i) tells iwd which interfaces to manage. If the option
is ommitted, all interfaces will be managed.
--nointerfaces (-I) tells iwd which interfaces to blacklist. If the
option is ommitted, no interfaces will be blacklisted.
When setting operstate to dormant or down, give it a callback for debug
purposes. It looks like that operstate down message does not have a
chance to go out currently.
knownnetworks.c/.h implements the KnownNetworks interface and loads the
known networks from storage on startup. The list of all the networks
including information on whether a network is known is managed in
network.c to avoid having two separate lists of network_info structures
and keeping them in sync. That turns out to be difficult because the
network.c list is sorted by connected_time and connected_time changes
can be triggered in both network.c or knownnetworks.c. Both can also
trigger a network_info to be removed completely.
network_info gets a is_known flag that is used for the
GetOrderedNetworks tracking and to implement the KnownNetworks
interface - loading of the list of known networks on startup and
forgetting networks.
For simplicity and future use (possibly performance), every struct network
gets a pointer to a network_info structure, there's one network_info for
every network being by any interface, not only known networks. The SSID
and security type information is removed from struct network because the
network_info holds that information.
network_info also gets a seen_count field to count how many references
from network.info fields it has, so as to fix the removal of
network_info structures. Previously, once they were added to the
networks list, they'd stay there forever possibly skewing the network
ranking results.
This also fixed the network ranking used by GetOrderedNetwork which
wasn't working due to a missing assignment of *index in
network_find_info also triggering valgrind alerts.
The eapol state machine parameters are now built inside device.c when
the network connection is attempted. The reason is that the device
object knows about network settings, wiphy constraints and should
contain the main 'management' logic.
netdev now manages the actual low-level process of building association
messages, detecting authentication events, etc.
Keep an updated sorted list of networks in addition to the "networks"
hashmap. The list can be queried through the GetOrderedNetworks dbus
method.
We also take advantage of that list to get rid of a single
l_hashmap_foreach in new_scan_results.
A function that calculates a new rank type to order all networks
currently seen by a netdev. The order is designed for displaying the
list to user so that the networks most likely to be wanted by the user
are first on the list.
Since the rankmod value only makes sense for autoconnectable networks,
change network_rankmod to return an indication of whether the rankmod is
valid as a boolean instead of as a double, as discussed before.
Do nothing in device_disassociated if device->connected_network
indicates we are not associated. This may happen if the device was
connected since before iwd was started, this should possibly be fixed
separately by querying device state when device is detected.
Make sure networks of all 4 security types have a settings file created
or updated with a new modification time on a successful connect so that
autoconnect and network sorting works for networks other than PSK too.
By doing this on storage_network_touch failure we make sure we don't
overwrite anything dropped into the settings directory while we were
connecting.
for network_seen and network_connected
Only accept a struct network pointer instead of separately the ssid and
security type. This is needed so we can do some more simplification in
the next patch by having access to the network struct.
It looks like with multiple netdev seeing the same networks we'd create
multiple network_info structures for each network. Since the
"networks" list (of network_info structs) is global that's probbaly not
the intention here.
Turn netdev watches into device watches. The intent is to refactor out
netdev specific details into its own class and move device specific
logic into device.c away from wiphy.c
Sometimes the periodic scan is started and stopped before the timeout
was created. If periodic_scan_stop was called before, the timeout
object was not reset to NULL, which can lead to a crash.
The lost beacon event can be received when iwd thinks netdev is
diconnected if it was connected before iwd started, and then
netdev_disassociated will segfault.
It seems until now dbus.c would always connect to dbus-1 (unless
DBUS_SESSION_BUS_ADDRESS pointed at kdbus) and passing -K only made
iwd create a kdbus bus and not use it. Now use ell to actually use
kdbus instead of dbus-1 with -K. Don't use the src/kdbus.c functions
that duplicate ell functionality. As a side effect the connection
description and the bloom sizes are now the ell defaults.
Instead of passing the user_data parameter in every __eapol_rx_packet
call to be used by EAPOL in all tx_packet calls, add
eapol_sm_set_tx_user_data function that sets the value of user_data for
all subsequent tx_packet calls. This way tx_packet can be called from
places that are not necessarily inside an __eapol_rx_packet call.
Only EAP as the inner authentication option is supported. According to
wikipedia this is the most popular EAP-TTLS use case, with MD5 as the
inner EAP's method.
Add the EAP-TLS authentication method. Currently, all the credentials
data is read from the provisioning file even though things like the
private key passphrase should possibly be obtained from the dbus agent.
Probe Response messages can contain additional attributes tucked away
into the WFA-Vendor specific attribute. Parse these attributes while
making sure the order is as expected.
==2469== 24 bytes in 1 blocks are still reachable in loss record 1 of 1
==2469== at 0x4C2B970: malloc (vg_replace_malloc.c:296)
==2469== by 0x40E6DD: l_malloc (util.c:62)
==2469== by 0x40F1CD: l_queue_new (queue.c:63)
==2469== by 0x40D534: scan_init (scan.c:796)
==2469== by 0x403AC3: nl80211_appeared (wiphy.c:2121)
==2469== by 0x415FF3: get_family_callback (genl.c:987)
==2469== by 0x415A4F: process_request (genl.c:381)
==2469== by 0x415A4F: received_data (genl.c:492)
==2469== by 0x413184: io_callback (io.c:120)
==2469== by 0x4127C2: l_main_run (main.c:346)
==2469== by 0x40253E: main (main.c:171)
Refactoring the entire scan code, and this part seems to not be
supported by the target kernels. Revisit / redo this functionality once
things become a bit clearer.
In the D-bus .Connect call return an error immediately if we
find that there's no common cipher supported between iwd, the
network adapter and the AP. This is to avoid asking the agent
for the passkey if we know the connection will fail.
An alternative would be to only show networks that we can connect
to in the scan results on D-bus but I suspect that would cause
more pain to users debugging their wifi setups on average.
For now, if a passphrase is needed we check once before querying
for passphrase and recheck afterwards when we're about to
associate.
Instead of passing in the RSN/WPA elements, simply pass in the
configured cipher. This will make the implementation of the install_gtk
callback much simpler.
Step 4 is always sent without encrypted Key Data according to Section
11.6.6.5. In the case of WPA, Encrypted Key Data field is reserved, and
should always be 0. Thus it is safe to drop the !is_wpa condition.
When handling repeated 4-Way Handshakes from the AP there will be no
.Connect() call pending so we need to check that netdev->connect_pending
is non-NULL. It may be a good idea to check this even during initial
handshake.
When disconnect is triggered locally, we do not clean up properly.
==4336== at 0x4C2B970: malloc (vg_replace_malloc.c:296)
==4336== by 0x40CEED: l_malloc (util.c:62)
==4336== by 0x40F46A: l_settings_new (settings.c:82)
==4336== by 0x40CE2E: storage_network_open (storage.c:180)
==4336== by 0x40498E: network_connect_psk (wiphy.c:307)
==4336== by 0x40498E: network_connect (wiphy.c:359)
==4336== by 0x41D7EE: _dbus_object_tree_dispatch (dbus-service.c:845)
==4336== by 0x416A16: message_read_handler (dbus.c:297)
==4336== by 0x411984: io_callback (io.c:120)
==4336== by 0x410FC2: l_main_run (main.c:346)
==4336== by 0x40253E: main (main.c:171)
This happens when connecting / disconnecting successfully multiple
times.
==4336== 64 bytes in 2 blocks are definitely lost in loss record 9 of 11
==4336== at 0x4C2B970: malloc (vg_replace_malloc.c:296)
==4336== by 0x40CEED: l_malloc (util.c:62)
==4336== by 0x40D6D9: l_util_from_hexstring (util.c:493)
==4336== by 0x4049C6: network_connect_psk (wiphy.c:315)
==4336== by 0x4049C6: network_connect (wiphy.c:359)
==4336== by 0x41D7EE: _dbus_object_tree_dispatch (dbus-service.c:845)
==4336== by 0x416A16: message_read_handler (dbus.c:297)
==4336== by 0x411984: io_callback (io.c:120)
==4336== by 0x410FC2: l_main_run (main.c:346)
==4336== by 0x40253E: main (main.c:171)
Instead of storing the network pointer for each BSS, store it on the
netdev object. This saves space inside struct bss and makes longer term
refactoring simpler.
==4249== 231 (32 direct, 199 indirect) bytes in 1 blocks are definitely
lost in loss record 10 of 10
==4249== at 0x4C2B970: malloc (vg_replace_malloc.c:296)
==4249== by 0x40CF5D: l_malloc (util.c:62)
==4249== by 0x40F4DA: l_settings_new (settings.c:82)
==4249== by 0x40CE9E: storage_network_open (storage.c:180)
==4249== by 0x40499E: network_connect_psk (wiphy.c:307)
==4249== by 0x40499E: network_connect (wiphy.c:359)
==4249== by 0x41D85E: _dbus_object_tree_dispatch (dbus-service.c:845)
==4249== by 0x416A86: message_read_handler (dbus.c:297)
==4249== by 0x4119F4: io_callback (io.c:120)
==4249== by 0x411032: l_main_run (main.c:346)
==4249== by 0x40253E: main (main.c:171)
This patch saves off the PSK generated based on the passphrase provided
by the agent/user. The PSK is saved only if the connection is
successful.
Subsequent connection attempts to the known AP use the PSK saved on the
filesystem (default /var/lib/iwd/<ssid>.psk). If the connection fails,
the agent will again be asked for the passphrase on the next attempt.
CMD_DEAUTHENTICATE seems to carry only the management frame pdu
information. CMD_DISCONNECT is carrying the information that is
actually needed by us:
> Event: Disconnect (0x30) len 28 1140.118545
Wiphy: 0 (0x00000000)
Interface Index: 3 (0x00000003)
Reason Code: 2 (0x0002)
Disconnect by AP: true
We will ignore non-UTF8 based SSIDs. Support for non-UTF8 SSIDs seems
to be of dubious value in the real world as the vast majority of
consumer devices would not even allow such SSIDs to be configured or
used.
There also seems to be no compelling argument to support such SSIDs, so
until that argument arrives, non-UTF8 SSIDs will be filtered out. This
makes the D-Bus API and implementation much easier.
We start a timer. This handles the case that the Authenticator does
not send us the first message of the 4-way handshake, or disappears
before sending us the 3rd message.
We need to set the linkmode and operstate after successful
authentication.
Initial value for linkmode is 1 (user space controlled) and
IF_OPER_DORMANT for opermode. After successful authentication,
the operstate is set to IF_OPER_UP.
More specific details can be seen in kernel sources at
https://www.kernel.org/doc/Documentation/networking/operstates.txt
CC src/wiphy.o
src/wiphy.c: In function ‘eapol_read’:
src/wiphy.c:172:24: error: argument to ‘sizeof’ in ‘memset’ call is the same expression as the destination; did you mean to remove the addressof? [-Werror=sizeof-pointer-memaccess]
memset(&sll, 0, sizeof(&sll));
^
We can give reply to connect DBus call in associating event only
when we are connecting to Open network. For PSK AP, the reply can
only be sent after we have finished 4-way handshaking.
Currently it supports Microsoft vendor specific information element
with version and type value 1 only. Typically it contains WPA security
related information.
The success or not of a scan command is found from the message directly.
There's no need to look for any attribute from the scan netlink answer.
The message is an error message or not, and that tells if the scan has
been started or not.
Modifying a bit how networks are stored inside the hashtable:
1 - instead of the network id, the network's object path is used
2 - network holds the pointer of the object path
3 - the hashtable does not free the key (network_free() will)
This permits to optimize on:
1 - one memory allocation used for 2 distinct things
2 - remove the need to re-compute the object path (and the id) when it's
needed, it can use dircetly the one stored in the network structure.
Request a passphrase via Agent if none is set at the time network is
being connected. When freeing a network, cancel any outstanding Agent
requests and free allocated memory.
l_genl_family_send only returns request id. If request
failed at low level, current implementation does not handle that.
In case of request failure clear pending dbus messages.
Open networks do not contain a RSN element, so storing a 256 byte buffer
was too expensive.
This patch also has the side-effect of fixing detection of Open
Networks. Prior to this, if the scan results did not contain an RSN IE,
the 'rsne' variable would be set to all zeros. scan_get_ssid_security
would then be called, but instead of a NULL struct ie_rsn_info, a
non-null, but zerod out ie_rsn_info would be passed in. This caused the
code to work, but for the wrong reasons.