EAP-PWD was hard coded to only work on LE architectures. This
adds 2 conversion functions to go from network byte order (BE)
to any native architecture, and vise versa.
The file, src/ecc.c was taken from the bluez project:
https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/src/shared/ecc.c
There were minor changes made, e.g. changing some functions to globals
for access in EAP-PWD as well as removing some unneeded code. There was
also some code appended which allows for point addition, modulus inverse
as well as a function to compute a Y value given an X.
If Control Port over NL80211 is not supported, open up a PAE socket and
stuff it into an l_io on the netdev object. Install a read handler on
the l_io and call __eapol_rx_packet as needed.
With the introduction of Control Port Over NL80211 feature, the
transport details need to be moved out of eapol and into netdev.c.
Whether a given WiFi hardware supports transfer of Control Port packets
over NL80211 is Wiphy and kernel version related, so the transport
decisions need to be made elsewhere.
On connect add any secrets we've received through the agent to the
l_settings objects which the EAP methods will process in load_settings.
The settings object is modified but is never written to storage. If
this was to change because some settings need to be saved to storage,
a new l_settings object might be needed with the union of the settings
from the file and the secrets so as to avoid saving the sensitive data.
These EAP methods do not store the identity inside the settings file
since it is obtained from the SIM card, then provided to IWD via
get_identity method. If the get_identity method is implemented, do
not fail the settings check when EAP-Identity is missing.
Use eap_check_settings directly from network.c before we start the
connection attempt at netdev.c level, to obtain all of the required
passwords/passphrases through the agent. This is in network.c because
here we can decide the policies for whether to call the agent in
autoconnect or only if we had a request from the user, also whether we
want to save any of that for later re-use (either password data or
kernel-side key serial), etc.
In this patch we save the credentials for the lifetime of the network
object in memory, and we skip the network if it requires any passphrases
we don't have while in autoconnect, same as with PSK networks where the
PSK isn't given in the settings. Note that NetworkManager does pop up
the password window for PSK or EAP passwords even in autoconnect.
If EAP needs multiple passwords we will call the agent sequentially for
each.
Confirm that the PEM file paths that we'll be passing to the l_tls
object are loading Ok and request/validate the private key passphrase
if needed. Then also call eap_check_settings to validate the inner
method's settings.
Confirm that the PEM file paths that we'll be passing to the l_tls
object are loading Ok and request/validate the private key passphrase
if needed. Then also call eap_check_settings to validate the inner
method's settings.
With the goal of requesting the required passwords/passphrases, such as
the TLS private key passphrase, from the agent, add a static method
eap_check_settings to validate the settings and calculate what passwords
are needed for those settings, if any. This is separate from
eap_load_settings because that can only be called later, once we've
got an eap state machine object. We need to get all the needed EAP
credentials from the user before we even start connecting.
While we do this, we also validate the settings and output any error
messages through l_error (this could be changed so the messages go
somewhere else in the future), so I removed the error messages from
eap_load_settings and that method now assumes that eap_check_settings
has been called before.
eap_check_settings calls the appropriate method's .check_settings method
if the settings are complete enough to contain the method name. The
policy is that any data can be provided inside the l_settings object
(from the network provisioning/config file), but some of the more
sensitive fields, like private key passwords, can be optionally omitted
and then the UI will ask for them and iwd will be careful with
caching them.
Within struct eap_secret_info, "id" is mainly for the EAP method to
locate the info in the list. "value" is the actual value returned
by agent. "parameter" is an optional string to be passed to the agent.
For a private key passphrase it may be the path to the key file, for a
password it may be the username for which the password is requested.
In agent_receive_reply we first call the callback for the pending
request (agent_finalize_pending) then try to send the next request
in the queue. Check that the next request has not been sent already
which could happen if it has been just queued by the callback.
The difference in the handlers was that in the
NETDEV_EVENT_DISCONNECT_BY_AP case we would make sure to reply
to a pending dbus Connect call. We also need to do that for
NETDEV_EVENT_DISCONNECT_BY_SME. This happens if another process
sends an nl80211 disconnect command while we're connecting.
The eapol handshake timeout can now be configured in main.conf
(/etc/iwd/main.conf) using the key eapol_handshake_timeout. This
allows the user to configure a long timeout if debugging.
After an EAP exchange rsn_info would be uninitialized and in the FT case
we'd use it to generate the step 2 IEs which would cause an RSNE
mismatch during FT handshake.
Until now we'd save the second 32 bytes of the MSK as the PMK and use
that for the PMK-R0 as well as the PMKID calculation. The PMKID
actually uses the first 32 bytes of the PMK while the PMK-R0's XXKey
input maps to the second 32 bytes. Add a pmk_len parameter to
handshake_state_set_pmk to handle that. Update the eapol_eap_results_cb
802.11 quotes to the 2016 version.
handshake_state_install_ptk triggers a call to
netdev_set_pairwise_key_cb which calls netdev_connect_ok, so don't call
netdev_connect_ok after handshake_state_install_ptk. This doesn't fix
any specific problem though.
If the request being cancelled by agent_request_cancel has already been
sent over dbus we need to reset pending_id, the timeout, call l_dbus_cancel
to avoid the agent_receive_reply callback (and crash) and perhaps start
the next request. Alternatively we could only reset the callback and not
free the request, then wait until the agent method to return before starting
the next request.
Move the cancelling of the eapol timeout from the end of step 1 to
step 3 to guard the whole handshake. At the end of step 1 stop the
EAPOL-Start timeout for the case of 802.1X authentication + a cached
PMKSA (not used yet.)
Some APs respond to Neighbor Report Requests with neighbor reports that
have a zero operating class value and a non-zero channel number. This
does not mean that the channel is in the same band that the reporting
AP operates in. Try to guess the band that the channel refers to out of
2.4 and 5GHz -- the bands supported by those APs.
wpa_supplicant also has this workaround in place.
SA Query procedure is used when an unprotected disassociate frame
is received (with frame protection enabled). There are two code
paths that can occur when this disassociate frame is received:
1. Send out SA Query and receive a response from the AP within a
timeout. This means that the disassociate frame was not sent
from the AP and can be ignored.
2. Send out SA Query and receive no response. In this case it is
assumed that the AP went down ungracefully and is now back up.
Since frame protection is enabled, you must re-associate with
the AP.
1. Enforce implementation of handle_request function
2. In case of unimplemented handle_retransmit try to use
handle_request instead and rely on method specific
mechanism to restart the conversation if necessary
3. Make method->free implementation unrequired
When we call scan_periodic_stop and a periodic scan is in progress (i.e.
the trigger callback has been called already) we get no new callback
from scan.c and the device Scanning property remains True forever so set
it to False.
The change from scan_periodic_stop to periodic_scan_stop looks silly but
it's consistent with our naming :)
This patch adds a watcher/parser for the frame event associated with
an AP directed BSS transition (AP roaming). When the AP sends a BSS
transition request, this will parse out the BSS candidate list
(neighbor report) and initiate a roam scan. After this point the
existing roaming code path is reused.
The identity retrieved from simauth was required to include the
prefix for SIM/AKA/AKA', but in reality a real SIM would not
include that prefix in the IMSI. Now the correct prefix is
prepended onto the identity depending on the EAP method.
If the SQN in AUTN is incorrect the simauth module will return
the AUTS parameter, which is sent back to the server and the
servers SQN number is updated.
Forcing a plugin to create and register simauth at once is sometimes
inconvenient. This patch separates the creation and registration
into two API's, and also adds several others to add the required simauth
data incrementally (identity, driver data, sim/aka support). This also
allows for the driver to unregister the auth provider without freeing
up the simauth object itself e.g. if the driver temporarily becomes
unavailable, but will come back sometime in the future.
The simauth watch API's were also renamed. Watchers will now get a
callback when the provider has been unregistered, so they have been
renamed to sim_auth_unregistered_watch_[add|remove].
src/simauth.c:163:6: error: no previous declaration for ‘sim_auth_cancel_request’ [-Werror=missing-declarations]
void sim_auth_cancel_request(struct iwd_sim_auth *auth, int id)
^~~~~~~~~~~~~~~~~~~~~~~
iwd now supports plugin loading, whitelisting and blacklisting. Both
the whitelist and the blacklist support multiple patterns separated by a
',' character.
Make sure device->autoconnect is set when entering the autoconnect state
after netdev UP event. Otherwise the next time
device_set_autoconnect(device, false) is called it will exit early seeing
that device->autoconnect is false and not switch the device state.
This is the core module that takes care of registering
authentication drivers. EAP-SIM/AKA will be able to acquire
a driver that supports the required algorithms. The driver
implementation (hardcoded/ofono etc.) is isolated into
separate plugin modules.
EAP-SIM/AKA/AKA' retrieve the EAP-Identity off the SIM card
not from the settings file. This adds a new EAP method API
which can optionally be implemented to retrieve the identity.
If get_identity is implemented, the EAP layer will use it to
retrieve the identity rather than looking in the settings file.
network_settings_load expects NULL value to be returned
on failed attempts to read the settings files inside of
storage_network_open. At the same time storage_network_open
used to always return an initialized l_settings
structure despite the outcome of the read operations,
indicating a success.
When the 4-Way Handshake is done eapol.c calls netdev_set_tk, then
optionally netdev_set_gtk and netdev_set_igtk. To support the no group
key option send the final SET STATION enabling the controlled port
inside the callback for the netdev_set_tk operation which always means
the end of a 4-Way Handshake rather than in the netdev_set_gtk callback.
The spec says exactly that the controlled port is enabled at the end of
the 4-Way Handshake.
The netlink operations will still be queued in the same order because
the netdev_set_tk/netdev_set_gtk/netdev_set_igtk calls happen in one
main loop iteration but even if the order changed it wouldn't matter.
On failure of any of the three operations netdev_setting_keys_failed
gets called and the remaining operations are cancelled.
Track the contents and size of the GTK and IGTK and if the Authenticator
(or an adversary) tries to set the same GTK/IGTK, process the packet
normally but do not resubmit the GTK/IGTK to the kernel.
GTK KDE was being checked for being a minimum of 6 bytes. Not quite
sure why since the minimum GTK key length is 16 bytes for CCMP.
Similarly make sure that the maximum length is not more than 32, which
is currently the largest key size (TKIP)
This is a bizarre case since MIC calculation succeeded for the incoming
packet. But just in case MIC calculation fails for the outgoing packet,
kill the handshake.
The comments quoted sections of the specification that indicated STA
behavior for verifying Message 3 of 4 or GTK 1 of 2. But in reality the
code directly below simply calculated the MIC for Message 4 of 4 or GTK
2 of 2.
Use eapol_frame_watch_add/eapol_frame_watch_remove in eapol_sm, while
there simplify the early_frame logic and confirm sender address for
received frames.
Set all the new field values into struct sta_state only after all the
error checks for better readabilty and fixing a possible issue if we
did "sta->rates = rates" and then detected en error and freed "rates".
Also update a comment which I think used the wording from 802.11-2012
instead of 802.11-2016.
DEL_KEY is not needed and will return errors right after NEW_STATION or
right after DEL_STATION. In both cases the kernel makes sure there are
no old keys for the station already.
As a temporary DBus API to switch between Station and Access Point
modes, add two methods on the Device interface. Add a new state
DEVICE_STATE_ACCESS_POINT which is in effect from the moment
StartAccessPoint is received (even before it returns) until
StopAccessPoint returns, there are no intermediate states when the
methods run for simplicity. Add checks across device.c to make sure
Station related functionality is disabled when in Access Point mode.
Add a utility to append a KDE to the key_data field in an EAPoL frame.
The KDE types enum is actually added to handshake.h because we've got
the utilities for finding those KDEs in a buffer there. The new
function is specific to EAPoL-Key frames though and perhaps to simple to
be split across handshake.c and eapol.c. Also it didn't seem useful to
use the ie_tlv_builder here.
Parse Association Request frames and send Association Responses, handle
Disassociation. With this we should be able to receive uncontrolled
port data frames since we register the STAs with the kernel.
In this version I don't register for Reassociation frames.
Validate the IE order for some of the cases. For other cases, as with
the Disassociation, Deauthentication and Action frame types in section
9.3 it's not even clear from the spec the fields are expected to be IEs
(in fact for Action frame we know they aren't). For the Shared Key
authentication type drop the union with the contents as they can be
easier parsed as an IE sequence. For SAE we are not expecting an IE
sequence apparently so this is where the union could come useful but
let's leave that until we want to support SAE.
Check the IE order for each frame type where we'd just do the body
minimum length check until now (and not always correctly). We do not
try to validate the contents of any IEs (may be doable for some) or the
minimum mandatory IEs presence. This is because which IEs are required
depend on the contents of other fields in the frame, on the
authentication state and STA config and even contents of a request frame
which we're validating the response to. Frame handlers have to do this
work anyway.
Declare the two missing frame subtype enum values for Action frames,
assume Action frames are valid. Once we have specific validation code
for any Action frames elsewhere, we can move it to mpdu_validate, but
right don't try to validate the frame body as there are many subtypes
and we don't use any of them except Neighbor Reports which are actually
really simple.
Since we use the special 0xffff value in the builder code, check that
the tag is not 0xffff in ie_tlv_builder_finalize before writing the
header. This is for consistency, not for a specific use case.
Make parsing TLVs using Extended Element IDs easier by returning the
extended tag value as listed in enum ie_type instead of just the 255
value, and not returning the pointer to the extended tag as the IE data
and instead the pointer to the next byte after the extended ID.
The l_queue_find() to find other watches matching the new prefix
needs to be before the watchlist_link(), otherwise the prefix will
match itself and "registered" is always true.
In WATCHLIST_NOTIFY_MATCHES pass pointer to the item instead of
item->notify_data to free item->notify_data to be the final watch user's
user_data. This is also what netdev expects.
The EAP-method's .probe methods only checked the method name so do that
in eap.c instead and allocate method state in .load_settings. Rename
method's .remove method to .free to improve the naming.
This can be used to selectively notify watchlist items. The match
function is called for each watchlist_item and match_data is passed
along. If the match function returns true, then the watch_item is
notified. The match function signature and semantics are identical
to l_queue_match_func_t.
Rename netdev_register_frame to netdev_frame_watch_add and expose to be
usable outside of netdev.c, add netdev_frame_watch_remove also. Update
the Neighbor Report handling which was the only user of
netdev_register_frame.
The handler is now simpler because we use a lookup list with all the
prefixes and individual frame handlers only see the frames matching the
right prefix. This is also useful for the future Access-Point mode.
src/mpdu.c: In function ‘mpdu_validate’:
src/mpdu.c:180:9: error: ‘mmpdu’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
mmpdu = (const struct mmpdu_header *) mmpdu;
^
Refactor management frame structures to take into account optional
presence of some parts of the header:
* drop the single structure for management header and body since
the body offset is variable.
* add mmpdu_get_body to locate the start of frame body.
* drop the union of different management frame type bodies.
* prefix names specific to management frames with "mmpdu" instead
of "mpdu" including any enums based on 802.11-2012 section 8.4.
* move the FC field to the mmpdu_header structure.
This EAP method uses nearly all the logic from EAP-AKA. The major
difference is it uses the new key derivation functions for AKA' as
well as the SHA256 MAC calculation.
EAP-AKA' uses SHA256 rather than SHA1 to generate the packet MAC's.
This updates the derive MAC API to take the EAP method type and
correctly use the right SHA variant to derive the MAC.
This is the core key generation code for the AKA' method which
follows RFC 5448. Two new functions are implemented, one for
deriving CK'/IK' and the other for deriving the encryption keys
using CK'/IK'.
If the kernel device driver or the kernel nl80211 version doesn't
support the new RSSI threshold list CQM monitoring, implement similar
logic in iwd with periodic polling. This is only active when an RSSI
agent is registered to receive the events. I tested this with the same
testRSSIAgent autotests that tests the driver-side rssi monitoring
except with all timeouts multiplied by ~20.
The AT_VERSION_LIST attribute length was not being properly
checked. The actual length check did not include possible padding
bytes, so align_len() was added to ensure it was padded properly.
The comment about the padding being included in the Master Key
generation was not correct (padding is NOT included), and was removed.
Function to allow netdev.c to explicitly tell eapol.c whether to expect
EAP / 4-Way handshake. This is to potentially make the code more
descriptive, until now we'd look at sm->handshake->ptk_complete to see
if a new PTK was needed.
A 4-Way handshake is required on association to an AP except after FT.
Modify netdev_get_iftype, which was until now unused, and add
netdev_set_iftype. Don't skip interfaces with types other than STATION
on startup, instead reset the type to STATION in device.c.
netdev_get_iftype is modified to use our own interface type enum to
avoid forcing users to include "nl80211.h".
Note that setting an interface UP and DOWN wouldn't generally reset the
iftype to STATION. Another process may still change the type while iwd
is running and iwd would not detect this as it would detect another
interface setting interface DOWN, not sure how far we want to go in
monitoring all of the properties this way.
If we're adding the BSS to the list only because it is the current BSS,
set the rank to 0 (lowest possible value) in case the list gets used in
the next Connect call.
Allow attempts to connect to a new AP using the Reassociation frame even
if netdev->operational is false. This is needed if we want to continue
an ongoing roam attempt after the original connection broke and will be
needed when we start using cached PMKSAs in the future.
Use beacon loss event to trigger a roam attempt in addition to the RSSI
monitoring. Due to the how well beacons are normally received compared
to data packets, a beacon loss indicates a serious problem with the
connection so act as soon as a first beacon loss event is seen.
Avoid roaming methods that involve the current AP: preauthentication,
neighbor report request and FT-over-the-DS (not supported)
There are situations including after beacon loss and during FT where the
cfg80211 will detect we're now disconnected (in some cases will send a
Deauthenticate frame too) and generate this event, or the driver may do
this. For example in ieee80211_report_disconnect in net/mac80211/mlme.c
will (through cfg80211) generate a CMD_DEAUTHENTICATE followed by a
CMD_DISCONNECT.
The kernel doesn't reset the netdev's state to disconnected when it
sends us a beacon loss event so we can't either unless we automatically
send a disconnect command to the kernel.
It seems the handling of beacon loss depends on the driver. For example
in mac80211 only after N beacon loss events (default 7) a probe request is
sent to the AP and a deauthenticate packet is sent if no probe reply is
receiver within T (default 500ms).
If an application initiates the Connect() operation and
that application has an agent registered, then that
application's agent will be called. Otherwise, the default
agent is called.
==40686== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==40686== at 0x5147037: sendmsg (in /usr/lib64/libc-2.24.so)
==40686== by 0x43957C: operate_cipher (cipher.c:354)
==40686== by 0x439C18: l_cipher_decrypt (cipher.c:415)
==40686== by 0x40FAB8: arc4_skip (crypto.c:181)
Initialize the skip buffer to 0s. This isn't strictly necessary, but
hides the above valgrind warning.
The aim of arc4 skip is simply to seed some data into the RC4 cipher so
it makes it harder for the attacker to decrypt. This 'initialization'
doesn't really care what data is fed.
CMD_DEAUTHENTICATE is not available for FullMAC based cards. We already
use CMD_CONNECT in the non-FT cases, which works on all cards. However,
for some reason we kept using CMD_DEAUTHENTICATE instead of CMD_DISCONNECT.
For FT (error) cases, keep using CMD_DEAUTHENTICATE.
Certain WiFi drivers do not support using CMD_SET_STATION (e.g.
mwifiex). It is not completely clear how such drivers handle the
AUTHORIZED state, but they don't seem to take it into account. So for
such drivers, ignore the -ENOTSUPP error return from CMD_SET_STATION.
These flags are documented in RFC2863 and kernel's
Documentation/networking/operstates.txt. Operstate doesn't have any
siginificant effect on normal connectivity or on our autotests because
it is not used by the kernel except in some rare cases but it is
supposed to affect some userspace daemons that watch for RTM_NEWLINK
events, so I believe we *should* set them according to this
documentation. Changes:
* There's no point setting link_mode or operstate of the netdev when
we're bringing the admin state DOWN as that overrides operstate.
* Instead of numerical values for link_mode use the if.h defines.
* Set IF_OPER_UP when association succeeds also in the Fast Transition
case. The driver will have set carrier off and then on so the
operstate should be IF_OPER_DORMANT at this point and needs to be
reset to UP.
Allow registering and unregistering agent object to receive RSSI level
notifications. The methods are similar to the ones related to the
password agent, including a Release method for the agent.
Add an methods and an event using the new
NL80211_EXT_FEATURE_CQM_RSSI_LIST kernel feature to request RSSI
monitoring with notifications only when RSSI moves from one of the N
intervals requested to another.
device.c will call netdev_set_rssi_report_levels to request
NETDEV_EVENT_RSSI_LEVEL_NOTIFY events every time the RSSI level changes,
level meaning one of the intervals delimited by the threshold values
passed as argument. Inside the event handler it can call
netdev_get_rssi_level to read the new level.
There's no fallback to periodic polling implemented in this patch for
the case of older kernels and/or the driver not supporting
NL80211_EXT_FEATURE_CQM_RSSI_LIST.
==27901== Conditional jump or move depends on uninitialised value(s)
==27901== at 0x41157A: handshake_util_find_pmkid_kde
(handshake.c:537)
==27901== by 0x40E03A: eapol_handle_ptk_1_of_4 (eapol.c:852)
==27901== by 0x40F3CD: eapol_key_handle (eapol.c:1417)
==27901== by 0x40F955: eapol_rx_packet (eapol.c:1607)
==27901== by 0x410321: __eapol_rx_packet (eapol.c:1915)
Agent implementation inside agent.c takes a reference of the trigger
message associated with the request. When the callback is called, the
message is passed as an argument. The callback is responsible for
taking the message reference if necessary. Once the callback returns,
agent releases its reference.
For error paths, our code was using dbus_pending_reply which in turn
uses dbus_message_unref. This caused the agent to try an unref
operation on an already freed object.
Move the calling of the *_shutdown functions from the signal handler to
a new public function, and use that function inside the DBus disconnect
handler to make sure resources are cleanly released.
Skip the matching of the PMKID KDE to the PMKID list in the RSNE if
we've seen a new EAP authentication before the step 1/4 was received.
That would mean that the server had not accepted the PMKIDs we submitted
and we performed a new 8021X authentication, producing a new PMKSA which
won't be on the list in the RSNE.
Currently we'd send EAPOL-Start whenever EAP was configured and we
received an EAPOL-Key before EAP negotiation. Instead only do that if
we know we can't respond to the 4-Way handshake because we don't have
a PMK yet or the PMKID doesn't match. Require a PMKID in step 1/4 if
we'd sent a list of PMKIDs in our RSNE.
Modify the packet filter to also accept frames with ethertype of 0x88c7
and pass the ethertype value to __eapol_rx_packet so it can filter out
the frames where this value doesn't match the sm->preauth flag.
Add a wrapper for eapol_start that sets the sm->preauth flag and sends
the EAPOL-Start frame immediately to skip the timeout since we know
that the supplicant has to initiate the authentication.
Use netdev_reassociate if FT is not available. device_select_akm_suite
is only moved up in the file and the reused code from device_connect is
moved to a separate function.
netdev_reassociate transitions to another BSS without FT. Similar to
netdev_connect but uses reassociation instead of association and
requires and an existing connection.
Pass an additional parameter to the scan results notify functions to
tell them whether the scan was successful. If it wasn't don't bother
passing an empty bss_list queue, pass NULL as bss_list. This way the
callbacks can tell whether the scan indicates there are no BSSes in
range or simply was aborted and the old scan results should be kept.
++++++++ backtrace ++++++++
0 0x7fc0b20ca370 in /lib64/libc.so.6
1 0x4497d5 in l_dbus_message_new_error_valist() at /home/denkenz/iwd/ell/dbus-message.c:372
2 0x44994d in l_dbus_message_new_error() at /home/denkenz/iwd/ell/dbus-message.c:394
3 0x41369b in dbus_error_not_supported() at /home/denkenz/iwd/src/dbus.c:148
4 0x40eaf5 in device_connect_network() at /home/denkenz/iwd/src/device.c:1282
5 0x41f61c in network_autoconnect() at /home/denkenz/iwd/src/network.c:424
6 0x40c1c1 in device_autoconnect_next() at /home/denkenz/iwd/src/device.c:172
7 0x40cabf in device_set_scan_results() at /home/denkenz/iwd/src/device.c:368
8 0x40cb06 in new_scan_results() at /home/denkenz/iwd/src/device.c:376
9 0x41be8a in scan_finished() at /home/denkenz/iwd/src/scan.c:1021
10 0x41bf9e in get_scan_done() at /home/denkenz/iwd/src/scan.c:1048
11 0x43d5ce in destroy_request() at /home/denkenz/iwd/ell/genl.c:136
12 0x43ded1 in process_unicast() at /home/denkenz/iwd/ell/genl.c:395
13 0x43e295 in received_data() at /home/denkenz/iwd/ell/genl.c:502
14 0x43aa62 in io_callback() at /home/denkenz/iwd/ell/io.c:120
15 0x439632 in l_main_run() at /home/denkenz/iwd/ell/main.c:375 (discriminator 2)
16 0x403074 in main() at /home/denkenz/iwd/src/main.c:261
17 0x7fc0b20b7620 in /lib64/libc.so.6
Also handle the case of a periodic scan when handling a
NL80211_CMD_SCAN_ABORTED. The goal is to make sure the supplied callback
is always called if .trigger was called before, but this should also fix
some other corner cases.
* I add a sp.triggered field for periodic scans since sc->state doesn't
tell us whether the scan in progress was triggered by ourselved o
someone else (in that case .trigger has not been called)
* Since the NL80211_CMD_SCAN_ABORTED becomes similar to get_scan_done I
move the common code to scan_finished
* I believe this fixes a situation where we weren't updating sc->state
if we'd not triggered the scan, because both get_scan_done and the
NL80211_CMD_SCAN_ABORTED would return directly.
If the current request is not freed when we receive the
NL80211_CMD_SCAN_ABORTED event, device.c will keep thinking that
we're still scanning and the scan.c logic also gets confused and may
resend the current request at some point and call sr->trigger again
causing a segfault in device.c.
I pass an empty bss_list to the callback, another possibility would be
to pass NULL to let the callback know not to replace old results yet.
The callbacks would need to handle a NULL first.
Handle the changes of interface address in RTNL New Link messages
similarly to the name changes, emit a NETDEV_WATCH_EVENT_ADDRESS_CHANGE
event and a propety change on dbus.
Note this can only happen when the interface is down so it doesn't
break anything but we need to handle it anyway.
DBus has certain rules on what constitutes a valid path. Since the
wiphy name is freeform, it is possible to set it such that the contents
do not contain a valid path.
We fall back to simply using the wiphy index as the path.
DBus strings must be valid utf8. The kernel only enforces that the
wiphy name is null terminated string. It does not validate or otherwise
check the contents in any way. Thus it is possible to have
non-printable or non-utf8 characters inside.
NL80211_CMD_SET_WIPHY can be used to set various attributes on the wiphy
object in the kernel. This includes ATTR_WIPHY_NAME among others. iwd
currently does not parse or store any of the other attributes, so we
react to changes in WIPHY_NAME only.
The wiphy attribute should never be repeated by the kernel, so this
check is ultimately not needed. This condition can also be easily
checked by looking at the iwmon output in case things do go terribly
wrong.
Fix 1a64c4b771 by setting use_eapol_start
by default only when 8021x authentication is configured. Otherwise we'd
be sending EAPOL-Start even for WPA2 Personal possibly after the 4-Way
Handshake success.
This implements very initial support of WPS PIN based connections. The
scanning logic attempts to find an AP in PIN mode and tries to connect
to that AP. We currently do not try multiple APs if available or
implement the WSC 1.0 connection logic.
Right now the code checks for is_rsn to wait for the 4-way handshake and
sends the NETDEV_EVENT_4WAY_HANDSHAKE. However, is_rsn condition is not
true for WSC connections since they do not set an RSN field. Still,
they are EAP based handshakes and should be treated in the same manner.
We relax the is_rsn check to instead check for netdev->sm. Currently
netdev->sm is only non-NULL if handshake->own_ie field is not NULL or in
the case of eap-wsc connections.
Define minimum delay between roam attempts and add automatic retries.
This handles a few situations:
* roam attempt failing, then RSSI going above the threshold and below
again -- in that case we don't want to reattempt too soon, we'll only
reattempt after 60s.
* roam attempt failing then RSSI staying low for longer than 60 -- in
that case we want to reattempt after 60s too.
* signal being low from the moment we connected -- in that case we also
want to attempt a roam every some time.
Fix a leak of the MDE buffer. It is now only needed for the single call
to handshake_state_set_mde which copies the bytes anyway so use a buffer
on stack.
Since caab23f192085e6c8e47c41fc1ae9f795d1cbe86 hostapd is going to set
this bit to zero for RSN networks but both values will obviously be in
use. Only check the value if is_wpa is true - in this case check the
value is exactly 16, see hostapd commit:
commit caab23f192085e6c8e47c41fc1ae9f795d1cbe86
Author: Jouni Malinen <j@w1.fi>
Date: Sun Feb 5 13:52:43 2017 +0200
Set EAPOL-Key Key Length field to 0 for group message 1/2 in RSN
P802.11i/D3.0 described the Key Length as having value 16 for the group
key handshake. However, this was changed to 0 in the published IEEE Std
802.11i-2004 amendment (and still remains 0 in the current standard IEEE
Std 802.11-2016). We need to maintain the non-zero value for WPA (v1)
cases, but the RSN case can be changed to 0 to be closer to the current
standard.
Add sr NULL check before accessing sr->id. Call scan_request_free on
request structure and call the destroy callback. Cancel the netlink
TRIGGER_SCAN command if still running and try starting the next scan
in the queue. It'll probably still fail with EBUSY but it'll be
reattempted later.
Always call start_next_scan_request when a scan request has finished,
with a success or a failure, including a periodic scan attempt. Inside
that function check if there's any work to be done, either for one-off
scan requests or periodic scan, instead of having this check only inside
get_scan_done. Call start_next_scan_request in scan_periodic_start and
scan_periodic_timeout.
Also call the trigger callback with an error code when sending the
netlink command fails after the scan request has been queued because
another scan was in progress when the scan was requested.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000419d38 in scan_done (msg=0x692580, userdata=0x688250)
at src/scan.c:250
250 sc->state = sr->passive ? SCAN_STATE_PASSIVE : SCAN_STATE_ACTIVE;
(gdb) bt
0 0x0000000000419d38 in scan_done (msg=0x692580, userdata=0x688250)
at src/scan.c:250
1 0x000000000043cac0 in process_unicast (genl=0x686d60, nlmsg=0x7fffffffc3b0)
at ell/genl.c:390
2 0x000000000043ceb0 in received_data (io=0x686e60, user_data=0x686d60)
at ell/genl.c:506
3 0x000000000043967d in io_callback (fd=6, events=1, user_data=0x686e60)
at ell/io.c:120
4 0x000000000043824d in l_main_run () at ell/main.c:381
5 0x000000000040303c in main (argc=1, argv=0x7fffffffe668) at src/main.c:259
The reasoning is that the logic inside scan_common is reversed. Instead
of freeing the scan request on error, we always do it. This causes the
trigger_scan callback to receive invalid userdata.
Save the ids of the netlink trigger scan commands that we send and
cancel them in scan_ifindex_remove to fix a race leading to a
segfault. The segfault would happen every time if scan_ifindex_remove
was called in the same main loop iteration in which we sent the
command, on shutdown:
^CTerminate
src/netdev.c:netdev_free() Freeing netdev wlan3[6]
src/device.c:device_disassociated() 6
src/device.c:device_enter_state() Old State: connected, new state:
disconnected
src/device.c:device_enter_state() Old State: disconnected, new state:
autoconnect
src/scan.c:scan_periodic_start() Starting periodic scan for ifindex: 6
src/device.c:device_free()
src/device.c:bss_free() Freeing BSS 02:00:00:00:00:00
src/device.c:bss_free() Freeing BSS 02:00:00:00:01:00
Removing scan context for ifindex: 6
src/scan.c:scan_context_free() sc: 0x5555557ca290
src/scan.c:scan_notify() Scan notification 33
src/netdev.c:netdev_operstate_down_cb() netdev: 6, success: 1
src/scan.c:scan_periodic_done()
src/scan.c:scan_periodic_done() Periodic scan triggered for ifindex:
1434209520
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000064 in ?? ()
(gdb) bt
#0 0x0000000000000064 in ?? ()
#1 0x0000555555583560 in process_unicast (nlmsg=0x7fffffffc1a0,
genl=0x5555557c1d60) at ell/genl.c:390
#2 received_data (io=<optimized out>, user_data=0x5555557c1d60)
at ell/genl.c:506
#3 0x0000555555580d45 in io_callback (fd=<optimized out>,
events=1, user_data=0x5555557c1e60) at ell/io.c:120
#4 0x000055555558005f in l_main_run () at ell/main.c:381
#5 0x00005555555599c1 in main (argc=<optimized out>, argv=<optimized out>)
at src/main.c:259
Parse the contents of the GTK and IGTK subelements in an FT IE instead
of working with buffers containing the whole subelement. Some more
validation of the subelement contents. Drop support for GTK / IGTK when
building the FTE (unused).
Don't start the handshake timeout in eapol_start if either
handshake->ptk_complete is set (handshake already done) or
handshake->have_snonce is set (steps 1&2 done). This accounts for
eapol_start being called after a Fast Transition when a 4-Way handshake
is not expected.
Add a flush flag to scan_parameters to tell the kernel to flush the
cache of scan results before the new scan. Use this flag in the
active scan during roaming.
Split the igtk parameter to handshake_state_install_igtk into one
parameter for the actual IGTK buffer and one for the IPN buffer instead
of requiring the caller to have them both in one continuous buffer.
With FT protocol, one is received encrypted and the other in plain text.
Make sure that the Neighbor Report timeout is cancelled when connection
breaks or device is being destroyed, and call the callback. Add an
errno parameter to the callback to indicate the cause.
With this patch an actual fast transition should happen when the signal
strength goes low but there are still various details to be fixed before
this becomes useful:
* the kernel tends to return cached scan results and won't update the
rssi values,
* there's no timer to prevent too frequent transition attempts or to
retry after some time if the signal is still low,
* no candidate other than the top ranked BSS is tried. With FT it
may be impossible to try another BSS anyway although there isn't
anything in the spec to imply this. It would require keeping the
handshake_state around after netdev gives up on the transition
attempt.
Trigger a scan of the selected channels or all channels if no useful
neighbor list was obtained, then process the scan results to select the
final target BSS.
The actual transition to the new BSS is not included in this patch for
readability.
Trigger a roam attempt when the RSSI level has been low for at least 5
seconds using the netdev RSSI LOW/HIGH events. See if neighbor reports
are supported and if so, request and process the neighbor reports list
to restrict the number of channels to be scanned. The scanning part is
not included in this patch for readability.
Validate the fourth message of the fast transition sequence and save the
new keys and state as current values in the netdev object. The
FT-specific IE validation that was already present in the initial MD
is moved to a new function.
Build and send the FT Authentication Request frame, the initial Fast
Transition message.
In this version the assumption is that once we start a transition attempt
there's no going back so the old handshake_state, scan_bss, etc. can be
replaced by the new objects immediately and there's no point at which both
the old and the new connection states are needed. Also the disconnect
event for the old connection is implicit. At netdev level the state
during a transition is almost the same with a new connection setup.
The first disconnect event on the netlink socket after the FT Authenticate
is assumed to be the one generated by the kernel for the old connection.
The disconnect event doesn't contain the AP bssid (unlike the
deauthenticate event preceding it), otherwise we could check to see if
the bssid is the one we are interested in or could check connect_cmd_id
assuming a disconnect doesn't happen before the connect command finishes.
This adds support for iwd.conf 'ManagementFrameProtection' setting.
This setting has the following semantics, with '1' being the default:
0 - MFP off, even if hardware is capable
1 - Use MFP if available
2 - MFP required. If the hardware is not capable, no connections will
be possible. Use at your own risk.
Despite RFC3748 mandating MSKs to be at least 256 bits some EAP methods
return shorter MSKs. Since we call handshake_failed when the MSK is too
short, EAP methods have to be careful with their calls to set_key_material
because it may result in a call to the method's .remove method.
EAP-TLS and EAP-TTLS can't handle that currently and would be difficult to
adapt because of the TLS internals but they always set msk_len to 64 so
handshake_failed will not be called.
Make sure that eap_set_key_material can free the whole EAP method and
EAP state machine before returning, by calling that function last. This
relies on eap_mschapv2_handle_success being the last call in about 5
stack frames above it too.
Action Frames are sent by nl80211 as unicast data. We're not receiving
any other unicast packets in iwd at this time so let netdev directly
handle all unicast data on the genl socket.
Add a version of scan_active that accepts a struct with the scan
parameters so we can more easily add new parameters. Since the genl
message is now built within scan_active_start the extra_ie memory
can be freed by the caller at any time.
clang complains about enum as var_arg type
because of the argument standard conversion.
In a small test I did neither clang nor gcc can
properly warn about out of range values, so it's
purely for documentation either way.
There are situations when a CMD_DISCONNECT or deauthenticate will be
issued locally because of an error detected locally where netdev would
not be able to emit a event to the device object. The CMD_DISCONNECT
handler can only send an event if the disconnect is triggered by the AP
because we don't have an enum value defined for other diconnects. We
have these values defined for the connect callback but those errors may
happen when the connect callback is already NULL because a connection
has been estabilshed. So add an event type for local errors.
These situations may occur in a transition negotiation or in an eapol
handshake failure during rekeying resulting in a call to
netdev_handshake_failed.
The kernel parses NL80211_ATTR_USE_MFP to mean an enumeration
nl80211_mfp. So instead of using a boolean, we should be using the
value NL80211_MFP_REQUIRED.
Make the use of EAPOL-Start the default and send it when configured for
8021x and either we receive no EAPOL-EAP from from the AP before
timeout, or if the AP tries to start a 4-Way Handshake.
On certain routers, the 4-Way handshake message 3 of 4 contains a key iv
field which is not zero as it is supposed to. This causes us to fail
the handshake.
Since the iv field is not utilized in this particular case, it is safe
to simply warn rather than fail the handshake outright.
Use the NLMSG_ALIGN macro on the family header size (struct ifinfomsg in
this case). The ascii graphics in include/net/netlink.h show that both
the netlink header and the family header should be padded. The netlink
header (nlmsghdr) is already padded in ell. To "document" this
requirementin ell what we could do is take two buffers, one for the
family header and one for the attributes.
This doesn't change anything for most people because ifinfomsg is
already 16-byte long on the usual architectures.
Remove the keys and other data from struct eapol_sm, update device.c,
netdev.c and wsc.c to use the handshake_state object instead of
eapol_sm. This also gets rid of eapol_cancel and the ifindex parameter
in some of the eapol functions where sm->handshake->ifindex can be
used instead.
struct handshake_state is an object that stores all the key data and other
authentication state and does the low level operations on the keys. Together
with the next patch this mostly just splits eapol.c into two layers
so that the key operations can also be used in Fast Transitions which don't
use eapol.
If device_select_akm_suite selects Fast Transition association then pass
the MD IE and other bits needed for eapol and netdev to do an FT
association and 4-Way Handshake.
If an MD IE is supplied to netdev_connect, pass that MD IE in the
associate request, then validate and handle the MD IE and FT IE in the
associate response from AP.
Add space in the eapol_sm struct for the pieces of information required
for the FT 4-Way Handshake and add setters for device.c and netdev.c to
be able to provide the data.
Don't decide on the AKM suite to use when the bss entries are received
and processed, instead select the suite when the connection is triggered
using a new function device_select_akm_suite, similar to
wiphy_select_cipher(). Describing the AKM suite through flags will be
more difficult when more than 2 suites per security type are supported.
Also handle the wiphy_select_cipher 0 return value when no cipher can be
selected.
The len parameter was only used so it could be validated against ie[1],
but since it was not checked to be > 2, it must have been validated
already, the check was redundant. In any case all users directly
passed ie[1] as len anyway. This makes it consistent with the ie
parsers and builders which didn't require a length.
In many cases the pairwise and group cipher information is not the only
information needed from the BSS RSN/WPA elements in order to make a
decision. For example, th MFPC/MFPR bits might be needed, or
pre-authentication capability bits, group management ciphers, etc.
This patch refactors bss_get_supported_ciphers into the more general
scan_bss_get_rsn_info function
Split eapol_start into two calls, one to register the state machine so
that the PAE read handler knows not to discard frames for that ifindex,
and eapol_start to actually start processing the frames. This is needed
because, as per the comment in netdev.c, due to scheduling the PAE
socket read handler may trigger before the CMD_CONNECT event handler,
which needs to parse the FTE from the Associate Response frame and
supply it to the eapol SM before it can do anything with the message 1
of 4 of the FT handshake.
Another issue is that depending on the driver or timing, the underlying
link might not be marked as 'ready' by the kernel. In this case, our
response to Message 1 of the 4-way Handshake is written and accepted by
the kernel, but gets dropped on the floor internally. Which leads to
timeouts if the AP doesn't retransmit.
This function takes an Operating Channel and a Country String to convert
it into a band. Using scan_oper_class_to_band and scan_channel_to_freq,
an Operating Channel, a Country String and a Channel Number together can
be converted into an actual frequency. EU and US country codes based on
wpa_supplicant's tables.
It doesn't matter for crypto_derive_pairwise_ptk in non-SHA256 mode
but in the FT PTK derivation function, as well as in SHA256 mode all
bytes of the output do actually change with the PTK size.
Fix autoconnect trying to connect to networks never used before as found
by Tim Kourt. Update the comments to be consistent with the use of the
is_known field and the docs, in that a Known Network is any network that
has a config file in the iwd storage, and an autoconnect candidate is a
network that has been connected to before.
If the handshake fails, we trigger a deauthentication prior to reporting
NETDEV_RESULT_HANDSHAKE_FAILED. If a netdev_disconnect is invoked in
the meantime, then the caller will receive -ENOTCONN. This is
incorrect, since we are in fact logically connected until the connect_cb
is notified.
Tweak the behavior to keep the connected variable as true, but check
whether disconnect_cmd_id has been issued in the netdev_disconnect_event
callback.
If the device is currently connected, we will initiate a disconnection
(or wait for the disconnection to complete) prior to starting the
WSC-EAP association.
Use the org.freedesktop.DBus.Properties interfaces on objects with
properties and drop the old style GetProperty/SetProperty methods on
individual interfaces. Agent and KnownNetworks have no properties at
this time so don't add org.freedesktop.DBus.Properties interfaces.
We send the scan results where we obtained a PushButton target over to
device object. If EAP-WSC transaction is successful, then the scan
results are searched to find a network/bss combination found in the
credentials obtained. If found, the network is connected to
automatically.
This also fixes a potential buffer overflow since the ssid was cast to a
string inside network_create. However, ssid is a buffer of 32 bytes,
and would not be null-terminated in the case of a 32-byte SSID.
==5362== Conditional jump or move depends on uninitialised value(s)
==5362== at 0x419B62: wsc_wfa_ext_iter_next (wscutil.c:52)
==5362== by 0x41B869: wsc_parse_probe_response (wscutil.c:1016)
==5362== by 0x41FD77: scan_results (wsc.c:218)
==5362== by 0x415669: get_scan_done (scan.c:892)
==5362== by 0x432932: destroy_request (genl.c:134)
==5362== by 0x433245: process_unicast (genl.c:394)
==5362== by 0x43361A: received_data (genl.c:506)
==5362== by 0x42FDC2: io_callback (io.c:120)
==5362== by 0x42EABE: l_main_run (main.c:381)
==5362== by 0x402F90: main (main.c:234)
This is used to get arbitrary information out of the EAP method. Needed
for EAP-WSC to signal credential information obtained from the peer.
Other uses include signaling why EAP-WSC failed (e.g. invalid PIN, etc)
and processing of M2D discovery messages. The information in M2Ds might
be useful to external clients.
We used to open a socket for each wireless interface. This patch uses a
single socket with an attached BPF to handle all EAPoL traffic via a
single file descriptor.
When parsing the EAPoL-Key key data field we don't strip the 0xdd /
0x00 padding from the decrypted data so there may be trailing padding
after the IE sequence and valgrind will report an invalid read of the
length byte. Same thing may happen if we're sent garbage.
When we send M5 & M7, we need to generate a random IV. For testing
purposes, the IV can be provided in settings, otherwise it will be
generated randomly.
We need quite a bit of attributes of M2 for the duration of the WSC
handshake. Most importantly, we need to use the peer's public key when
processing M4 and M6. RegistrarNonce is also needed for generating any
ACK/NACK messages as needed.
Also, peer's device attributes such as Model, Manufacturer, etc might be
useful to report upon successful handshake.
AuthKey is already uploaded into auth_key_hmac. KeyWrapKey is now
uploaded into the AES-CBC(128) cipher. We currently have no use for
EMSK.
So we no longer need to keep the wsc_session_key structure around.
Encrypted Settings TLVs are structured similarly to the various WSC
messages. However, they lack a version2 extension field and use a Key
Wrap Authenticator element instead of Authenticator.
DevicePassword is the PIN, either static, dynamically generated or
entered by the user. For PushButton mode, DevicePassword is set to
'00000000'. It can also be provided via external means, such as NFC.
This patch allows DevicePassword to be externally configured into the
EAP-WSC layer. Optionally, the secret nonce values can also be
provided for testing purposes. If omitted, they will be generated using
l_getrandom.
We use the load_settings method to bootstrap the internal state of the
EAP WSC state machine. We require certain information to be provided by
the higher layers, namely:
Global Device parameters
- Manufacturer
- Model Name
- Model Number
- Serial Number
- Device Name
- Primary Device Type
- OS Version
Session specific parameters
- MAC Address
- Configuration Methods
- RF Bands
The following parameters are auto-generated for each new session, but
can be over-ridden if desired
- Private Key
- Enrollee Nonce
Expanded EAP methods should get their packets for handling starting at
the op-code field. They're not really interested in
type/vendor-id/vendor-type fields.
Instead of one global protocol_version, we now store it inside eapol_sm.
This allows us to use the same protocol version for our response as the
request from the authenticator.
For unit tests where we had protocol version mismatches, a new method is
introduced to explicitly set the protocol version to use.
CMD_DISCONNECT fails on some occasions when CMD_CONNECT is still
running. When this happens the DBus disconnect command receives an
error reply but iwd's device state is left as disconnected even though
there's a connection at the kernel level which times out a few seconds
later. If the CMD_CONNECT is cancelled I couldn't reproduce this so far.
src/network.c:network_connect()
src/network.c:network_connect_psk()
src/network.c:network_connect_psk() psk:
69ae3f8b2f84a438cf6a44275913182dd2714510ccb8cbdf8da9dc8b61718560
src/network.c:network_connect_psk() len: 32
src/network.c:network_connect_psk() ask_psk: false
src/device.c:device_enter_state() Old State: disconnected, new state:
connecting
src/scan.c:scan_notify() Scan notification 33
src/device.c:device_netdev_event() Associating
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/device.c:device_dbus_disconnect()
src/device.c:device_connect_cb() 6, result: 5
src/device.c:device_enter_state() Old State: connecting, new state:
disconnecting
src/device.c:device_disconnect_cb() 6, success: 0
src/device.c:device_enter_state() Old State: disconnecting, new state:
disconnected
src/scan.c:scan_notify() Scan notification 34
src/netdev.c:netdev_mlme_notify() MLME notification 19
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 37
src/netdev.c:netdev_authenticate_event()
src/scan.c:get_scan_callback() get_scan_callback
src/scan.c:get_scan_done() get_scan_done
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 19
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 38
src/netdev.c:netdev_associate_event()
src/netdev.c:netdev_mlme_notify() MLME notification 46
src/netdev.c:netdev_connect_event()
<delay>
src/netdev.c:netdev_mlme_notify() MLME notification 60
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 20
MLME notification is missing ifindex attribute
src/netdev.c:netdev_mlme_notify() MLME notification 20
src/netdev.c:netdev_mlme_notify() MLME notification 39
src/netdev.c:netdev_deauthenticate_event()
This is to make sure device_remove and netdev_connect_free are called
early so we don't continue setting up a connection and don't let DBus
clients power device back up after we've called netdev_set_powered.
Calling device_disassociated inside disconnect_cb was mostly pointless.
Most attributes were already cleared by device_disconnect() when
initiating the disconnection procedure.
This patch also modifies the logic for triggering the autoconnect. If
the user initiated the disconnect call, then autoconnect should not be
triggered. If the disconnect was triggered by other means, then iwd
will still enter autoconnect mode.
All of the abortion logic is invoked when device_disconnect is called.
So there's no point calling device_disassociated in this case. This
also prevents us from entering into autoconnect mode too early.
Prevents situations like this:
src/device.c:device_enter_state() Old State: connecting, new state:
connected
src/scan.c:scan_periodic_stop() Stopping periodic scan for ifindex: 3
src/device.c:device_dbus_disconnect()
src/device.c:device_connect_cb() 3
src/device.c:device_disassociated() 3
src/device.c:device_enter_state() Old State: connected, new state:
autoconnect
Also, remove the check for device->state == DEVICE_STATE_CONNECTING.
device_connect_cb should always called when the state is CONNECTING.
If this is not so, it indicates a bug inside the netdev layer.
This was introduced by commit f468fceb02.
However, after commit 2d78f51fac66b9beff03a56f12e5fb8456625f07, the
connect_cb is called from inside netdev_disconnect. This in turn causes
the dbus-reply to be sent out if needed. So by the time we get to the
code in question, connect_pending is always NULL.
Try to make the connect and disconnect operations look more like a
transaction where the callback is always called eventually, also with a
clear indication if the operation is in profress. The connected state
lasts from the start of the connection attempt until the disconnect.
1. Non-null netdev->connected or disconnect_cb indicate that the operation
is active.
2. Every entry-point in netdev.c checks if connected is still set
before executing the next step of the connection setup. CMD_CONNECT and
the subsequent commands may succeed even if CMD_DISCONNECT is called
in the middle so they can't only rely on the error value for that.
3. netdev->connect_cb and other elements of the connection state are
reset by netdev_connect_free which groups the clean-up operations to
make sure we don't miss anything. Since the callback pointers are
reset device.c doesn't need to check that it receives a spurious
event in those callbacks for example after calling netdev_disconnect.
If initial bring up returns ERFKILL proceed and the inteface can be
explicitly brought up by the client once rfkill is disabled.
Also fix the error number returned to netdev_set_powered callback to be
negative as expected by netdev_initial_up_cb.
map_wiphy made the assumption that phy names follow the "phyN" pattern
but phys created or renamed by the "iw" command can have arbitrary
names. It seems that /sys/class/rfkill/rfkill%u/name is not updated on
a phy rename, so we can't use it to subsequently read
/sys/class/ieee80211/<name>/index but both
/sys/class/rfkill/rfkill%u/../index and
/sys/class/rfkill/rfkill%u/device/index point to that file.
==3059== 7 bytes in 1 blocks are still reachable in loss record 1 of 2
==3059== at 0x4C2C970: malloc (vg_replace_malloc.c:296)
==3059== by 0x50BB319: strndup (in /lib64/libc-2.22.so)
==3059== by 0x417B4D: l_strndup (util.c:180)
==3059== by 0x417E1B: l_strsplit (util.c:311)
==3059== by 0x4057FC: netdev_init (netdev.c:1658)
==3059== by 0x402E26: nl80211_appeared (main.c:112)
==3059== by 0x41F577: get_family_callback (genl.c:1038)
==3059== by 0x41EE3F: process_unicast (genl.c:390)
==3059== by 0x41EE3F: received_data (genl.c:506)
==3059== by 0x41C6F4: io_callback (io.c:120)
==3059== by 0x41BAA9: l_main_run (main.c:381)
==3059== by 0x402B9C: main (main.c:234)
Previously device.c would remove the whole object at the path of the
Device and the WSC interfaces but now the watches are called without the
whole object appearing and disappearing.
Change the path for net.connman.iwd.Device objects to /phyX/Y and
register net.connman.iwd.Adapter at /phyX grouping devices of the same
wiphy.
Turns out no changes to the test/* scripts are needed.
The boolean property indicates if a scan is ongoing. Only the scans
triggered by device.c are reflected (not the ones from WSC) because only
those scans affect the list of networks seen by Dbus.
Add rfkill.c/rfkill.h to be used for watching per-wiphy RFkill state.
It uses both /dev/rfkill and /sys because /dev/rfkill is the recommended
way of interfacing with rfkill but at the same time it doesn't provide
the information on mapping to wiphy IDs.
Note that the autoconnect_list may still contain the network. Currently
only the top entry from the list is ever used and only on
new_scan_results(), i.e. at the same time the list is being created.
If at some point it becomes part of actual device state it needs to also
be reset when a network is being forgotten.
If Disconnect is called during an ongoing connection attempt send a
CMD_DEAUTHENTICATE command same as when we're already connected, and
send a reply to potential dbus Connect call.
When a new wiphy is added, the kernel usually adds a default STA
interface as well. This interface is currently not signaled over
nl80211 in any way.
This implements a selective dump of the wiphy interfaces in order to
obtain the newly added netdev. Selective dump is currently not
supported by the kernel, so all netdevs will be returned. A patch on
linux-wireless is pending that implements the selective dump
functionality.
==24934== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1
==24934== at 0x4C2C970: malloc (vg_replace_malloc.c:296)
==24934== by 0x41675D: l_malloc (util.c:62)
==24934== by 0x4033B3: netdev_set_linkmode_and_operstate
(netdev.c:149)
==24934== by 0x4042B9: netdev_free (netdev.c:221)
==24934== by 0x41735D: l_queue_clear (queue.c:107)
==24934== by 0x4173A8: l_queue_destroy (queue.c:82)
==24934== by 0x40543D: netdev_exit (netdev.c:1459)
==24934== by 0x402D6F: nl80211_vanished (main.c:126)
==24934== by 0x41E607: l_genl_family_unref (genl.c:1057)
==24934== by 0x402B50: main (main.c:237)
Instead of calling the device added or removed callback when the
interface is detected, call it when interface goes up or down. This
only affects the addition and removal of the WSC interface now.
During the network_info refactoring the adding of the connected BSS to
device->bss_list in case it is not in the scan results has moved to
after the l_hashmap_foreach_remove call meaning that the network could
be removed even though it is still pointed at by
device->connected networks. Reverse the order to what it was before.
Alternatively network_process network could take not of the fact the
network is connected and not call network_remove on it leaving it with
an empty bss_list.
It is probably rare that a disconnect should fail but if it happens the
device->state is not returned to CONNECTED and I'm not sure if it should
be, so the ConnectedNetwork property and other bits should probably be
reset at the start of the disconnection instead of at the end.
Also check if state is CONNECTED before calling network_disconnected
because network_connected may have not been called yet.
--interfaces (-i) tells iwd which interfaces to manage. If the option
is ommitted, all interfaces will be managed.
--nointerfaces (-I) tells iwd which interfaces to blacklist. If the
option is ommitted, no interfaces will be blacklisted.
When setting operstate to dormant or down, give it a callback for debug
purposes. It looks like that operstate down message does not have a
chance to go out currently.
knownnetworks.c/.h implements the KnownNetworks interface and loads the
known networks from storage on startup. The list of all the networks
including information on whether a network is known is managed in
network.c to avoid having two separate lists of network_info structures
and keeping them in sync. That turns out to be difficult because the
network.c list is sorted by connected_time and connected_time changes
can be triggered in both network.c or knownnetworks.c. Both can also
trigger a network_info to be removed completely.
network_info gets a is_known flag that is used for the
GetOrderedNetworks tracking and to implement the KnownNetworks
interface - loading of the list of known networks on startup and
forgetting networks.
For simplicity and future use (possibly performance), every struct network
gets a pointer to a network_info structure, there's one network_info for
every network being by any interface, not only known networks. The SSID
and security type information is removed from struct network because the
network_info holds that information.
network_info also gets a seen_count field to count how many references
from network.info fields it has, so as to fix the removal of
network_info structures. Previously, once they were added to the
networks list, they'd stay there forever possibly skewing the network
ranking results.
This also fixed the network ranking used by GetOrderedNetwork which
wasn't working due to a missing assignment of *index in
network_find_info also triggering valgrind alerts.
The eapol state machine parameters are now built inside device.c when
the network connection is attempted. The reason is that the device
object knows about network settings, wiphy constraints and should
contain the main 'management' logic.
netdev now manages the actual low-level process of building association
messages, detecting authentication events, etc.
Keep an updated sorted list of networks in addition to the "networks"
hashmap. The list can be queried through the GetOrderedNetworks dbus
method.
We also take advantage of that list to get rid of a single
l_hashmap_foreach in new_scan_results.
A function that calculates a new rank type to order all networks
currently seen by a netdev. The order is designed for displaying the
list to user so that the networks most likely to be wanted by the user
are first on the list.
Since the rankmod value only makes sense for autoconnectable networks,
change network_rankmod to return an indication of whether the rankmod is
valid as a boolean instead of as a double, as discussed before.
Do nothing in device_disassociated if device->connected_network
indicates we are not associated. This may happen if the device was
connected since before iwd was started, this should possibly be fixed
separately by querying device state when device is detected.
Make sure networks of all 4 security types have a settings file created
or updated with a new modification time on a successful connect so that
autoconnect and network sorting works for networks other than PSK too.
By doing this on storage_network_touch failure we make sure we don't
overwrite anything dropped into the settings directory while we were
connecting.
for network_seen and network_connected
Only accept a struct network pointer instead of separately the ssid and
security type. This is needed so we can do some more simplification in
the next patch by having access to the network struct.
It looks like with multiple netdev seeing the same networks we'd create
multiple network_info structures for each network. Since the
"networks" list (of network_info structs) is global that's probbaly not
the intention here.
Turn netdev watches into device watches. The intent is to refactor out
netdev specific details into its own class and move device specific
logic into device.c away from wiphy.c
Sometimes the periodic scan is started and stopped before the timeout
was created. If periodic_scan_stop was called before, the timeout
object was not reset to NULL, which can lead to a crash.
The lost beacon event can be received when iwd thinks netdev is
diconnected if it was connected before iwd started, and then
netdev_disassociated will segfault.
It seems until now dbus.c would always connect to dbus-1 (unless
DBUS_SESSION_BUS_ADDRESS pointed at kdbus) and passing -K only made
iwd create a kdbus bus and not use it. Now use ell to actually use
kdbus instead of dbus-1 with -K. Don't use the src/kdbus.c functions
that duplicate ell functionality. As a side effect the connection
description and the bloom sizes are now the ell defaults.
Instead of passing the user_data parameter in every __eapol_rx_packet
call to be used by EAPOL in all tx_packet calls, add
eapol_sm_set_tx_user_data function that sets the value of user_data for
all subsequent tx_packet calls. This way tx_packet can be called from
places that are not necessarily inside an __eapol_rx_packet call.
Only EAP as the inner authentication option is supported. According to
wikipedia this is the most popular EAP-TTLS use case, with MD5 as the
inner EAP's method.
Add the EAP-TLS authentication method. Currently, all the credentials
data is read from the provisioning file even though things like the
private key passphrase should possibly be obtained from the dbus agent.
Probe Response messages can contain additional attributes tucked away
into the WFA-Vendor specific attribute. Parse these attributes while
making sure the order is as expected.
==2469== 24 bytes in 1 blocks are still reachable in loss record 1 of 1
==2469== at 0x4C2B970: malloc (vg_replace_malloc.c:296)
==2469== by 0x40E6DD: l_malloc (util.c:62)
==2469== by 0x40F1CD: l_queue_new (queue.c:63)
==2469== by 0x40D534: scan_init (scan.c:796)
==2469== by 0x403AC3: nl80211_appeared (wiphy.c:2121)
==2469== by 0x415FF3: get_family_callback (genl.c:987)
==2469== by 0x415A4F: process_request (genl.c:381)
==2469== by 0x415A4F: received_data (genl.c:492)
==2469== by 0x413184: io_callback (io.c:120)
==2469== by 0x4127C2: l_main_run (main.c:346)
==2469== by 0x40253E: main (main.c:171)
Refactoring the entire scan code, and this part seems to not be
supported by the target kernels. Revisit / redo this functionality once
things become a bit clearer.
In the D-bus .Connect call return an error immediately if we
find that there's no common cipher supported between iwd, the
network adapter and the AP. This is to avoid asking the agent
for the passkey if we know the connection will fail.
An alternative would be to only show networks that we can connect
to in the scan results on D-bus but I suspect that would cause
more pain to users debugging their wifi setups on average.
For now, if a passphrase is needed we check once before querying
for passphrase and recheck afterwards when we're about to
associate.
Instead of passing in the RSN/WPA elements, simply pass in the
configured cipher. This will make the implementation of the install_gtk
callback much simpler.
Step 4 is always sent without encrypted Key Data according to Section
11.6.6.5. In the case of WPA, Encrypted Key Data field is reserved, and
should always be 0. Thus it is safe to drop the !is_wpa condition.
When handling repeated 4-Way Handshakes from the AP there will be no
.Connect() call pending so we need to check that netdev->connect_pending
is non-NULL. It may be a good idea to check this even during initial
handshake.
When disconnect is triggered locally, we do not clean up properly.
==4336== at 0x4C2B970: malloc (vg_replace_malloc.c:296)
==4336== by 0x40CEED: l_malloc (util.c:62)
==4336== by 0x40F46A: l_settings_new (settings.c:82)
==4336== by 0x40CE2E: storage_network_open (storage.c:180)
==4336== by 0x40498E: network_connect_psk (wiphy.c:307)
==4336== by 0x40498E: network_connect (wiphy.c:359)
==4336== by 0x41D7EE: _dbus_object_tree_dispatch (dbus-service.c:845)
==4336== by 0x416A16: message_read_handler (dbus.c:297)
==4336== by 0x411984: io_callback (io.c:120)
==4336== by 0x410FC2: l_main_run (main.c:346)
==4336== by 0x40253E: main (main.c:171)
This happens when connecting / disconnecting successfully multiple
times.
==4336== 64 bytes in 2 blocks are definitely lost in loss record 9 of 11
==4336== at 0x4C2B970: malloc (vg_replace_malloc.c:296)
==4336== by 0x40CEED: l_malloc (util.c:62)
==4336== by 0x40D6D9: l_util_from_hexstring (util.c:493)
==4336== by 0x4049C6: network_connect_psk (wiphy.c:315)
==4336== by 0x4049C6: network_connect (wiphy.c:359)
==4336== by 0x41D7EE: _dbus_object_tree_dispatch (dbus-service.c:845)
==4336== by 0x416A16: message_read_handler (dbus.c:297)
==4336== by 0x411984: io_callback (io.c:120)
==4336== by 0x410FC2: l_main_run (main.c:346)
==4336== by 0x40253E: main (main.c:171)
Instead of storing the network pointer for each BSS, store it on the
netdev object. This saves space inside struct bss and makes longer term
refactoring simpler.
==4249== 231 (32 direct, 199 indirect) bytes in 1 blocks are definitely
lost in loss record 10 of 10
==4249== at 0x4C2B970: malloc (vg_replace_malloc.c:296)
==4249== by 0x40CF5D: l_malloc (util.c:62)
==4249== by 0x40F4DA: l_settings_new (settings.c:82)
==4249== by 0x40CE9E: storage_network_open (storage.c:180)
==4249== by 0x40499E: network_connect_psk (wiphy.c:307)
==4249== by 0x40499E: network_connect (wiphy.c:359)
==4249== by 0x41D85E: _dbus_object_tree_dispatch (dbus-service.c:845)
==4249== by 0x416A86: message_read_handler (dbus.c:297)
==4249== by 0x4119F4: io_callback (io.c:120)
==4249== by 0x411032: l_main_run (main.c:346)
==4249== by 0x40253E: main (main.c:171)
This patch saves off the PSK generated based on the passphrase provided
by the agent/user. The PSK is saved only if the connection is
successful.
Subsequent connection attempts to the known AP use the PSK saved on the
filesystem (default /var/lib/iwd/<ssid>.psk). If the connection fails,
the agent will again be asked for the passphrase on the next attempt.
CMD_DEAUTHENTICATE seems to carry only the management frame pdu
information. CMD_DISCONNECT is carrying the information that is
actually needed by us:
> Event: Disconnect (0x30) len 28 1140.118545
Wiphy: 0 (0x00000000)
Interface Index: 3 (0x00000003)
Reason Code: 2 (0x0002)
Disconnect by AP: true
We will ignore non-UTF8 based SSIDs. Support for non-UTF8 SSIDs seems
to be of dubious value in the real world as the vast majority of
consumer devices would not even allow such SSIDs to be configured or
used.
There also seems to be no compelling argument to support such SSIDs, so
until that argument arrives, non-UTF8 SSIDs will be filtered out. This
makes the D-Bus API and implementation much easier.
We start a timer. This handles the case that the Authenticator does
not send us the first message of the 4-way handshake, or disappears
before sending us the 3rd message.
We need to set the linkmode and operstate after successful
authentication.
Initial value for linkmode is 1 (user space controlled) and
IF_OPER_DORMANT for opermode. After successful authentication,
the operstate is set to IF_OPER_UP.
More specific details can be seen in kernel sources at
https://www.kernel.org/doc/Documentation/networking/operstates.txt
CC src/wiphy.o
src/wiphy.c: In function ‘eapol_read’:
src/wiphy.c:172:24: error: argument to ‘sizeof’ in ‘memset’ call is the same expression as the destination; did you mean to remove the addressof? [-Werror=sizeof-pointer-memaccess]
memset(&sll, 0, sizeof(&sll));
^
We can give reply to connect DBus call in associating event only
when we are connecting to Open network. For PSK AP, the reply can
only be sent after we have finished 4-way handshaking.
Currently it supports Microsoft vendor specific information element
with version and type value 1 only. Typically it contains WPA security
related information.
The success or not of a scan command is found from the message directly.
There's no need to look for any attribute from the scan netlink answer.
The message is an error message or not, and that tells if the scan has
been started or not.
Modifying a bit how networks are stored inside the hashtable:
1 - instead of the network id, the network's object path is used
2 - network holds the pointer of the object path
3 - the hashtable does not free the key (network_free() will)
This permits to optimize on:
1 - one memory allocation used for 2 distinct things
2 - remove the need to re-compute the object path (and the id) when it's
needed, it can use dircetly the one stored in the network structure.
Request a passphrase via Agent if none is set at the time network is
being connected. When freeing a network, cancel any outstanding Agent
requests and free allocated memory.
l_genl_family_send only returns request id. If request
failed at low level, current implementation does not handle that.
In case of request failure clear pending dbus messages.
Open networks do not contain a RSN element, so storing a 256 byte buffer
was too expensive.
This patch also has the side-effect of fixing detection of Open
Networks. Prior to this, if the scan results did not contain an RSN IE,
the 'rsne' variable would be set to all zeros. scan_get_ssid_security
would then be called, but instead of a NULL struct ie_rsn_info, a
non-null, but zerod out ie_rsn_info would be passed in. This caused the
code to work, but for the wrong reasons.
Instead of mallocing the ssid buffer, use a static array. This removes an
extra couple of malloc/free operations and should result in less memory
utilization on average.
Authenticate event on wiphy mlme notification does not provide
enough information on which network/bss authentication command
was sent. BSS and network information is required to send associate
command to AP. So cache bss pointer in netdev struct and retrieve
on wiphy mlme notifications.
Instead of storing multiple copies of the same BSS (one hanging off the
netdev object and one hanging off the network object), we instead store
the BSS list only on the netdev object.
The network object gets a pointer to the BSS structure on the netdev
list. As a side effect, the BSS list is always sorted properly.
Instead of always mallocing space for the ssid array, and then freeing
it in most circumstances, do the opposite. Only allocate the array once
it is actually needed. This has the side effect of removing an unneeded
variable and making the code simpler.
The idea here is that network object will contain a list of BSS
that have the same SSID and security setting. From user point of view,
we will connect to a certain SSID instead of connecting directly to
a BSS. We pick the best BSS automatically from the BSS list when
connecting to a SSID.
MLME notify function prints error if wiphy or netdev is missing.
The error text in this case talks about scan notification instead
of more proper MLME notification.
The frame which comes in is an EAPoL-key frame, thus changing the name
accordingly (as well as the parameter names).
Also, returning the cast pointer instead of a boolean is easier to
use as there won't be any need to perform the cast ourselves afterward
What comes in is a frame, and let's set it to uint8_t pointer, which is
semantically better than unsigned char.
Also, returning the cast pointer instead of a boolean is easier to
use as there won't be any need to perform the cast ourselves afterward
Use a static buffer for converting an SSID to an approximate string in
UTF8. Replace each char that is not UTF8 compatible with the UTF8
replacement symbol.
Make sure to print some errors if attributes cannot be appended
to a message. It is dangerous to ignore the return code from
l_genl_msg_append_attr() because the kernel might act weirdly
if some attribute is missing.
Add rudimentary support for mac80211 scheduled scan feature.
This is done so that kernel support for task called "Bind
NL80211_CMD_START_SCHED_SCAN to netlink socket" from TODO
file can be tested. The current scan interval is set to 60
seconds which is probably too fast for the final version.
Because none of the attributes are assigned until after the DeviceAdded
signal is emitted, the signal appears with invalid properties. For now,
move the netdev structure fill-out into the if statement.
If the netdev attributes can change, then these need to be handled
separately and appropriate signals to be sent.
==20758== Invalid read of size 1
==20758== at 0x401254: ie_tlv_iter_next (ie.c:55)
==20758== by 0x40104B: ie_test (test-ie.c:57)
==20758== by 0x4021C0: l_test_run (test.c:83)
==20758== by 0x4011B7: main (test-ie.c:123)
==20758== Address 0x51e10f3 is 0 bytes after a block of size 19 alloc'd
==20758== at 0x4C2C874: realloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==20758== by 0x4010CF: append_data (test-ie.c:101)
==20758== by 0x40118F: main (test-ie.c:119)
==20758==
==20758== Invalid read of size 1
==20758== at 0x401266: ie_tlv_iter_next (ie.c:56)
==20758== by 0x40104B: ie_test (test-ie.c:57)
==20758== by 0x4021C0: l_test_run (test.c:83)
==20758== by 0x4011B7: main (test-ie.c:123)
==20758== Address 0x51e10f4 is 1 bytes after a block of size 19 alloc'd
==20758== at 0x4C2C874: realloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==20758== by 0x4010CF: append_data (test-ie.c:101)
==20758== by 0x40118F: main (test-ie.c:119)