OWE processing can be completely taken care of inside
netdev_authenticate_event and netdev_associate_event. This removes
the need for OWE specific checks inside netdev_connect_event. We can
now return early out of the connect event if OWE is in progress.
Several Auth/Assoc failure status codes indicate that the connection
failed for reasons such as bandwidth issues, poor channel conditions
etc. These conditions should not result in the BSS being blacklisted
since its likely only a temporary issue and the AP is not actually
"broken" per-se.
This adds support in station.c to temporarily blacklist these BSS's
on a per-network basis. After the connection has completed we clear
out these blacklist entries.
Certain error conditions require that a BSS be blacklisted only for
the duration of the current connection. The existing blacklist
does not allow for this, and since this blacklist is shared between
all interfaces it doesnt make sense to use it for this purpose.
Instead, each network object can contain its own blacklist of
scan_bss elements. New elements can be added with network_blacklist_add.
The blacklist is cleared when the connection completes, either
successfully or not.
Now inside network_bss_select both the per-network blacklist as well as
the global blacklist will be checked before returning a BSS.
Several netdev events benefit from including event data in the callback.
This is similar to how the connect callback works as well. The content
of the event data is documented in netdev.h (netdev_event_func_t).
By including event data for the two disconnect events, we can pass the
reason code to better handle the failure in station.c. Now, inside
station_disconnect_event, we still check if there is a pending connection,
and if so we can call the connect callback directly with HANDSHAKE_FAILED.
Doing it this way unifies the code path into a single switch statment to
handle all failures.
In addition, we pass the RSSI level index as event data to
RSSI_LEVEL_NOTIFY. This removes the need for a getter to be exposed in
netdev.h.
On successful send, scan_send_start(..) used to set msg to NULL,
therefore the further management of the command by the caller was
impossible. This patch removes wrapper around l_genl_family_send()
and lets the callers to take responsibility for the command.
This change cleans up the mess of status vs reason codes. The two
types of codes have already been separated into different enumerations,
but netdev was still treating them the same (with last_status_code).
A new 'event_data' argument was added to the connect callback, which
has a different meaning depending on the result of the connection
(described inside netdev.h, netdev_connect_cb_t). This allows for the
removal of netdev_get_last_status_code since the status or reason
code is now passed via event_data.
Inside the netdev object last_status_code was renamed to last_code, for
the purpose of storing either status or reason. This is only used when
a disconnect needs to be emitted before failing the connection. In all
other cases we just pass the code directly into the connect_cb and do
not store it.
All ocurrences of netdev_connect_failed were updated to use the proper
code depending on the netdev result. Most of these simply changed from
REASON_CODE_UNSPECIFIED to STATUS_CODE_UNSPECIFIED. This was simply for
consistency (both codes have the same value).
netdev_[authenticate|associate]_event's were updated to parse the
status code and, if present, use that if their was a failure rather
than defaulting to UNSPECIFIED.
Even though .check_settings in our EAP method implementations does the
settings validation, .load_settings also has minimum sanity checks to
rule out segfaults if the settings have changed since the last
.check_settings call.
If OWE fails in association there is no reason to send a disconnect
since its already known that we failed. Instead we can directly
call netdev_connect_failed
Instead of sending a reason_code to netdev_setting_keys_failed, make it
take an errno (negative) instead. Since key setting failures are
entirely a system / software issue, and not a protocol issue, it makes
no sense to use a protocol error code.
Some users may need their own control over 2.4/5GHz preference. This
adds a new user option, 'rank_5g_factor', which allows users to increase
or decrease their 5G preference.
This adds support for parsing the VHT IE, which allows a BSS supporting
VHT (80211ac) to be ranked higher than a BSS supporting only HT/basic
rates. Now, with basic/HT/VHT parsing we can calculate the theoretical
maximum data rate for all three and rank the BSS based on that.
This adds HT IE parsing and data rate calculation for HT (80211n)
rates. Now, a BSS supporting HT rates will be ranked higher than
a basic rate BSS, assuming the RSSI is at an acceptable level.
The spec dictates RSSI thresholds for different modulation schemes, which
correlate to different data rates. Until now were were ranking a BSS with
only looking at its advertised data rate, which may not even be possible
if the RSSI does not meet the threshold.
Now, RSSI is taken into consideration and the data rate returned from
parsing (Ext) Supported Rates IE(s) will reflect that.
All over the place we do "ie[1] + 2" for getting the IE length. It
is much clearer to use a macro to do this. The macro also checks
for NULL, and returns zero in this case.
Supported rates will soon be parsed along with HT/VHT capabilities
to determine the best data rate. This will remove the need for the
supported_rates uintset element in scan_bss, as well as the single
API to only parse the supported rates IE. AP still does rely on
this though (since it only supports basic rates), so the parsing
function was moved into ap.c.
In the methods' check_settings do a more complete early check for
possible certificate / private key misconfiguration, including check
that the certificate and the private key are always present or absent
together and that they actually match each other. Do this by encrypting
and decrypting a small buffer because we have no better API for that.
A method's .check_settings method checks for inconsistent setting files
and prints readable errors so there's no need to do that again in
.load_settings, although at some point after removing the duplicate
error messages from the load_settings methods we agreed to keep minimum
checks that could cause a crash e.g. in a corner case like when the
setting file got modified between the check_settings and the
load_settings call. Some error messages have been re-added to
load_settings after that (e.g. in
bb4e1ebd4f) but they're incomplete and not
useful so remove them.
Previously, the storage dir has only been created after a successful
network connection, causing removal of Known Network interface from
Dbus and failure to register dir watcher until daemon is restarted.
A length check was still assuming the 256 bit ECC group. This
was updated to scale with the group. The commit buffer was also
not properly sized. This was changed to allow for the largest
ECC group supported.
SAE was hardcoded to work only with group 19. This change fixes up the
hard coded lengths to allow it to work with group 20 since ELL supports
it. There was also good amount of logic added to support negotiating
groups. Before, since we only supported group 19, we would just reject
the connection to an AP unless it only supported group 19.
This did lead to a discovery of a potential bug in hostapd, which was
worked around in SAE in order to properly support group negotiation.
If an AP receives a commit request with a group it does not support it
should reject the authentication with code 77. According to the spec
it should also include the group number which it is rejecting. This is
not the case with hostapd. To fix this we needed to special case a
length check where we would otherwise fail the connection.
Most of this work was already done after moving ECC into ELL, but
there were still a few places where the 256-bit group was assumed.
This allows the 384-bit group to be used, and theoretically any
other group added to ELL in the future.
If we have a BSS list where all BSS's have been blacklisted we still
need a way to force a connection to that network, instead of having
to wait for the blacklist entry to expire. network_bss_select now
takes a boolean 'fallback_to_blacklist' which causes the selection
to still return a connectable BSS even if the entire list was
blacklisted.
In most cases this is set to true, as these cases are initiated by
DBus calls. The only case where this is not true is inside
station_try_next_bss, where we do want to honor the blacklist.
This both prevents an explicit connect call (where all BSS's are
blacklisted) from trying all the blacklisted BSS's, as well as the
autoconnect case where we simply should not try to connect if all
the BSS's are blacklisted.
There are is some implied behavior here that may not be obvious:
On an explicit DBus connect call IWD will attempt to connect to
any non-blacklisted BSS found under the network. If unsuccessful,
the current BSS will be blacklisted and IWD will try the next
in the list. This will repeat until all BSS's are blacklisted,
and in this case the connect call will fail.
If a connect is tried again when all BSS's are blacklisted IWD
will attempt to connect to the first connectable blacklisted
BSS, and if this fails the connect call will fail. No more
connection attempts will happen until the next DBus call.
If IWD fails to connect to a BSS we can attempt to connect to a different
BSS under the same network and blacklist the first BSS. In the case of an
incorrect PSK (MMPDU code 2 or 23) we will still fail the connection.
station_connect_cb was refactored to better handle the dbus case. Now the
netdev result switch statement is handled before deciding whether to send
a dbus reply. This allows for both cases where we are trying to connect
to the next BSS in autoconnect, as well as in the dbus case.
This makes __station_connect_network even less intelligent by JUST
making it connect to a network, without any state changes. This makes
the rekey logic much cleaner.
We were also changing dbus properties when setting the state to
CONNECTING, so those dbus property change calls were moved into
station_enter_state.
A new driver extended feature bit was added signifying if the driver
supports PTK replacement/rekeying. During a connect, netdev checks
for the driver feature and sets the handshakes 'no_rekey' flag
accordingly.
At some point the AP will decide to rekey which is handled inside
eapol. If no_rekey is unset we rekey as normal and the connection
remains open. If we have set no_rekey eapol will emit
HANDSHAKE_EVENT_REKEY_FAILED, which is now caught inside station. If
this happens our only choice is to fully disconnect and reconnect.
If we receive handshake message 1/4 after we are already connected
the AP is attempting to rekey. This may not be allowed and if not
we do not process the rekey and emit HANDSHAKE_EVENT_REKEY_FAILED
so any listeners can handle accordingly.
The AP structure was getting cleaned up twice. When the DBus stop method came
in we do AP_STOP on nl80211. In this callback the AP was getting freed in
ap_reset. Also when the DBus interface was cleaned up it triggered ap_reset.
Since ap->started gets set to false in ap_reset, we now check this and bail
out if the AP is already stopped.
Fixes:
++++++++ backtrace ++++++++
0 0x7f099c11ef20 in /lib/x86_64-linux-gnu/libc.so.6
1 0x43fed0 in l_queue_foreach() at ell/queue.c:441 (discriminator 3)
2 0x423a6c in ap_reset() at src/ap.c:140
3 0x423b69 in ap_free() at src/ap.c:162
4 0x44ee86 in interface_instance_free() at ell/dbus-service.c:513
5 0x451730 in _dbus_object_tree_remove_interface() at ell/dbus-service.c:1650
6 0x405c07 in netdev_newlink_notify() at src/netdev.c:4449 (discriminator 9)
7 0x440775 in l_hashmap_foreach() at ell/hashmap.c:534
8 0x4455d3 in process_broadcast() at ell/netlink.c:158
9 0x4439b3 in io_callback() at ell/io.c:126
10 0x442c4e in l_main_iterate() at ell/main.c:473
11 0x442d1c in l_main_run() at ell/main.c:516
12 0x442f2b in l_main_run_with_signal() at ell/main.c:644
13 0x403ab3 in main() at src/main.c:504
14 0x7f099c101b97 in /lib/x86_64-linux-gnu/libc.so.6
+++++++++++++++++++++++++++
This will allow for blacklisting a BSS if the connection fails. The
actual blacklist module is simple and must be driven by station. All
it does is add BSS addresses, a timestamp, and a timeout to a queue.
Entries can also be removed, or checked if they exist. The blacklist
timeout is configuratble in main.conf, as well as the blacklist
timeout multiplier and maximum timeout. The multiplier is used after
a blacklisted BSS timeout expires but we still fail to connect on the
next connection attempt. We multiply the current timeout by the
multiplier so the BSS remains in the blacklist for a larger growing
amount of time until it reaches the maximum (24 hours by default).
Soon BSS blacklisting will be added, and in order to properly decide if
a BSS should be blacklisted we need the status code on a failed
connection. This change stores the status code when there is a failure
in netdev and adds a getter to retrieve later. In many cases we have
the actual status code from the AP, but in some corner cases its not
obtainable (e.g. an error sending an NL80211 command) in which case we
just default to MMPDU_REASON_CODE_UNSPECIFIED.
Rather than continue with the pattern of setting netdev->result and
now netdev->last_status_code, the netdev_connect_failed function was
redefined so its no longer used as both a NL80211 callback and called
directly. Instead a new function was added, netdev_disconnect_cb which
just calls netdev_connect_failed. netdev_disconnect_cb should not be
used for all the NL80211 disconnect commands. Now netdev_connect_failed
takes both a result and status code which it sets in the netdev object.
In the case where we were using netdev_connect_failed as a callback we
still need to set the result and last_status_code but at least this is
better than having to set those in all cases.
Remove an unneeded buffer and its memcpy, remove the now unneeded use of
l_checksum_digest_length and use l_checksum_reset instead of creating a
new l_checksum for each chunk.
ELL ECC supports group 20 (P384) so OWE can also support it. This also
adds group negotiation, where OWE can choose a different group than the
default if the AP requests it.
A check needed to be added in netdev in order for the negotiation to work.
The RFC says that if a group is not supported association should be rejected
with code 77 (unsupported finite cyclic group) and association should be
started again. This rejection was causing a connect event to be emitted by
the kernel (in addition to an associate event) which would result in netdev
terminating the connection, which we didn't want. Since OWE receives the
rejected associate event it can intelligently decide whether it really wants
to terminate (out of supported groups) or try the next available group.
This also utilizes the new MIC/KEK/KCK length changes, since OWE dictates
the lengths of those keys.
Rather than hard coding to SHA256, we can pass in l_checksum_type
and use that SHA. This will allow for OWE/SAE/PWD to support more
curves that use different SHA algorithms for hashing.
OWE defines KEK/KCK lengths depending on group. This change adds a
case into handshake_get_key_sizes. With OWE we can determine the
key lengths based on the PMK length in the handshake.
In preparation for OWE supporting multiple groups eapol needed some
additional cases to handle the OWE AKM since OWE dictates the KEK,
KCK and MIC key lengths (depending on group).
Right now the PMK is hard coded to 32 bytes, which works for the vast
majority of cases. The only outlier is OWE which can generate a PMK
of 32, 48 or 64 bytes depending on the ECC group used. The PMK length
is already stored in the handshake, so now we can just pass that to
crypto_derive_pairwise_ptk
The crypto_ptk was hard coded for 16 byte KCK/KEK. Depending on the
AKM these can be up to 32 bytes. This changes completely removes the
crypto_ptk struct and adds getters to the handshake object for the
kck and kek. Like before the PTK is derived into a continuous buffer,
and the kck/kek getters take care of returning the proper key offset
depending on AKM.
To allow for larger than 16 byte keys aes_unwrap needed to be
modified to take the kek length.
The MIC length was hard coded to 16 bytes everywhere, and since several
AKMs require larger MIC's (24/32) this needed to change. The main issue
was that the MIC was hard coded to 16 bytes inside eapol_key. Instead
of doing this, the MIC, key_data_length, and key_data elements were all
bundled into key_data[0]. In order to retrieve the MIC, key_data_len,
or key_data several macros were introduced which account for the MIC
length provided.
A consequence of this is that all the verify functions inside eapol now
require the MIC length as a parameter because without it they cannot
determine the byte offset of key_data or key_data_length.
The MIC length for a given handshake is set inside the SM when starting
EAPoL. This length is determined by the AKM for the handshake.
Non-802.11 AKMs can define their own key lengths. Currently only OWE does
this, and the MIC/KEK/KCK lengths will be determined by the PMK length so
we need to save it.
Make sure we don't pass NULLs to memcmp or l_memdup when the prefix
buffer is NULL. There's no point having callers pass dummy buffers if
they need to watch frames independent of the frame data.
Start using l_key_generate_dh_private and l_key_validate_dh_payload to
check for the disallowed corner case values in the DH private/public
values generated/received.
Some of the EAP methods don't require a clear-text identity to
be sent with the Identity Response packet. The mandatory identity
filed has resulted in unnecessary transmission of the garbage
values. This patch makes the Identity field to be optional and
shift responsibility to ensure its existence to the individual
methods if the field is required. All necessary identity checks
have been previously propagated to individual methods.
If a network is being forgotten, then make sure to reset connected_time.
Otherwise the rank logic thinks that the network is known which can
result in network_find_rank_index returning -1.
Found by sanitizer:
src/network.c:1329:23: runtime error: index -1 out of bounds for type
'double [64]'
==25412==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000000421ab0 at pc 0x000000402faf bp 0x7fffffffdb00 sp 0x7fffffffdaf0
READ of size 4 at 0x000000421ab0 thread T0
#0 0x402fae in validate_mgmt_ies src/mpdu.c:128
#1 0x403ce8 in validate_probe_request_mmpdu src/mpdu.c:370
#2 0x404ef2 in validate_mgmt_mpdu src/mpdu.c:662
#3 0x405166 in mpdu_validate src/mpdu.c:706
#4 0x402529 in ie_order_test unit/test-mpdu.c:156
#5 0x418f49 in l_test_run ell/test.c:83
#6 0x402715 in main unit/test-mpdu.c:171
#7 0x7ffff5d43ed9 in __libc_start_main (/lib64/libc.so.6+0x20ed9)
#8 0x4019a9 in _start (/home/denkenz/iwd-master/unit/test-mpdu+0x4019a9)
This fixes the valgrind warning:
==14804== Conditional jump or move depends on uninitialised value(s)
==14804== at 0x402E56: sae_is_quadradic_residue (sae.c:218)
==14804== by 0x402E56: sae_compute_pwe (sae.c:272)
==14804== by 0x402E56: sae_build_commit (sae.c:333)
==14804== by 0x402E56: sae_send_commit (sae.c:591)
==14804== by 0x401CC3: test_confirm_after_accept (test-sae.c:454)
==14804== by 0x408A28: l_test_run (test.c:83)
==14804== by 0x401427: main (test-sae.c:566)
The return from l_ecc_point_from_data was not being checked for NULL,
which would cause a segfault if the peer sent an invalid point.
This adds a check and fails the protocol if p_element is NULL, as the
spec defines.
src/eap-ttls.c:766:50: error: ‘Password’ directive output may be truncated writing 8 bytes into a region of size between 1 and 72 [-Werror=format-truncation=]
snprintf(password_key, sizeof(password_key), "%sPassword", prefix);
^~~~~~~~
In file included from /usr/include/stdio.h:862,
from src/eap-ttls.c:28:
/usr/include/bits/stdio2.h:64:10: note: ‘__builtin___snprintf_chk’ output between 9 and 80 bytes into a destination of size 72
return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Stop using l_pem_load_certificate which has been removed from ell, use
the same functions to load certificate files to validate them as those
used by the TLS implementation itself.
Check that the TLS logic has verified the server is trusted by the CA if
one was configured. This is more of an assert as ell intentionally only
allows empty certificate chains from the peer in server mode (if a CA
certficate is set) although this could be made configurable.
This should not change the behaviour except for fixing a rare crash
due to scan_cancel not working correctly when cancelling the first scan
request in the queue while a periodic scan was running, and potentially
other corner cases. To be able to better distinguish between a periodic
scan in progress and a scan request in progress add a sc->current_sr
field that points either at a scan request or is NULL when a periodic
scan is in ongoing. Move the triggered flag from scan_request and
scan_preiodic directly to scan_context so it's there together with
start_cmd_id. Hopefully make scan_cancel simpler/clearer.
Note sc->state and sc->triggered have similar semantics so one of them
may be easily removed. Also the wiphy_id parameter to the scan callback
is rather useless, note I temporarily pass 0 as the value on error but
perhaps it should be dropped.
In the name of failing earlier try to generate the PSK from the
passphrase as soon as we receive the passphrase or read it from the
file, mainly to validate it has the right number of characters.
The passphrase length currently gets validates inside
crypto_psk_from_passphrase which will be called when we receive a new
passphrase from the agent or when the config file has no PSK in it. We
do not do this when there's already both the PSK and the passphrase
available in the settings -- we can add that separately if needed.
The main difference with this is that scan_context removal will also
trigger the .destroy calls. Normally there won't be any requests left
during scan_context but if there were any we should call destroy on
them.
If we haven't sent a PMKID, and we're not running EAP, then ignore
whatever PMKID the AP sends us. Frequently the APs send us garbage in
this field. For PSK and related AKMs, if the PMK is wrong, then we
simply fail to generate a proper MIC and the handshake would fail at a
later stage anyway.
Fix incorrect usage of the caller’s scan triggered callback.
In case of a failure, destroy scan request and notify caller
about the issue by returning zero scan id instead of calling
callers’ scan triggered callback with an error code.
Using backtrace() is of no use when building with PIE (which most
distro compilers do by default) and prevents catching the coredump
for later retracing, which is needed since distros usually don't
install debug symbols by default either.
This patch thus only enables backtrace() when --enable-maintainer-mode
is passed and also tries to explicitly disable PIE.
ECDH was expecting the private key in LE, but the public key in BE byte ordering.
For consistency the ECDH now expect all inputs in LE byte ordering. It is up to
the caller to order the bytes appropriately.
This required adding some ecc_native2be/be2native calls in OWE
The changes to station.c are minor. Specifically,
station_build_handshake_rsn was modified to always build up the RSN
information, not just for SECURITY_8021X and SECURITY_PSK. This is
because OWE needs this RSN information, even though it is still
SECURITY_NONE. Since "regular" open networks don't need this, a check
was added (security == NONE && akm != OWE) which skips the RSN
building.
netdev.c needed to be changed in nearly the same manor as it was for
SAE. When connecting, we check if the AKM is for OWE, and if so create
a new OWE SM and start it. OWE handles all the ECDH, and netdev handles
sending CMD_AUTHENTICATE and CMD_ASSOCIATE when triggered by OWE. The
incoming authenticate/associate events just get forwarded to OWE as they
do with SAE.
This module is similar to SAE in that it communicates over authenticate
and associate frames. Creating a new OWE SM requires registering two TX
functions that handle sending the data out over CMD_AUTHENTICATE/ASSOCIATE,
as well as a complete function.
Once ready, calling owe_start will kick off the OWE process, first by
sending out an authenticate frame. There is nothing special here, since
OWE is done over the associate request/response.
After the authenticate response comes in OWE will send out the associate
frame which includes the ECDH public key, and then receive the AP's
public key via the associate response. From here OWE will use ECDH to
compute the shared secret, and the PMK/PMKID. Both are set into the
handshake object.
Assuming the PMK/PMKID are successfully computed the OWE complete callback
will trigger, meaning the 4-way handshake can begin using the PMK/PMKID
that were set in the handshake object.
The RFC (5869) for this implementation defines two functions,
HKDF-Extract and HKDF-Expand. The existing 'hkdf_256' was implementing
the Extract function, so it was renamed appropriately. The name was
changed for consistency when the Expand function will be added in the
future.
In the current version SECURITY_PSK was handled inside the is_rsn block
while the SECURITY_8021X was off in its own block. This was weird and a
bit misleading. Simplify the code flow through the use of a goto and
decrease the nesting level.
Also optimize out unnecessary use of scan_bss_get_rsn_info
In network_autoconnect, when the network was SECURITY_8021X there was no
check (for SECURITY_PSK) before calling network_load_psk. Since the
provisioning file was for an 8021x network neither PreSharedKey or
Passphrase existed so this would always fail. This fixes the 8021x failure
in testConnectAutoconnect.
During the handshake setup, if security != SECURITY_PSK then 8021x settings
would get set in the handshake object. This didn't appear to break anything
(e.g. Open/WEP) but its better to explicitly check that we are setting up
an 8021x network.
Check for HAVE_EXECINFO_H for all __iwd_backtrace_init usages.
Fixes:
src/main.o: In function `main':
main.c:(.text.startup+0x798): undefined reference to `__iwd_backtrace_init'
collect2: error: ld returned 1 exit status
A sorted list of hidden network BSSs observed in the recent scan
is kept for the informational purposes of the clients. In addition,
it has deprecated the usage of seen_hidden_networks variable.
Refactor the network->psk and network->passphrase loading and saving
logic to not require the PreSharedKey entry in the psk config file and
to generate network->psk lazily on request. Still cache the computed
PSK in memory and in the .psk file to avoid recomputing it which uses
many syscalls. While there update the ask_psk variable to
ask_passphrase because we're specifically asking for the passphrase.
According to the specification, Supported rates IE is supposed
to have a maximum length of eight rate bytes. In the wild an
Access Point is found to add 12 bytes of data instead of placing
excess rate bytes in an Extended Rates IE.
BSS: len 480
BSSID 44:39:C4:XX:XX:XX
Probe Response: true
TSF: 0 (0x0000000000000000)
IEs: len 188
...
Supported rates:
1.0(B) 2.0(B) 5.5(B) 6.0(B) 9.0 11.0(B) 12.0(B) 18.0 Mbit/s
24.0(B) 36.0 48.0 54.0 Mbit/s
82 84 8b 8c 12 96 98 24 b0 48 60 6c .......$.H`l
DSSS parameter set: channel 3
03
...
Any following IEs decode nicely, thus it seems that we can relax
Supported Rates IE length handling to support this thermostat.
After moving AP EAPoL code into eapol.c there were a few functions that
no longer needed to be public API's. These were changed to static's and
the header definition was removed.
Set an upper limit on a fragmented EAP-TLS request size similar to how
we do it in EAP-TTLS. While there make the code more similar to the
EAP-TTLS flag processing to keep them closer in sync. Note that the
spec suggests a 64KB limit but it's not clear if that is for the TLS
record or EAP request although it takes into account the whole TLS
negotiation so it might be good for both.
Some of the TTLS server implementations set the L flag in the fragment
packets other than the first one. To stay interoperable with such devices,
iwd is relaxing the L bit check.
Switch EAP-MD5 to use the common password setting key nomenclature.
The key name has been changed from PREFIX-MD5-Secret to PREFIX-Password.
Note: The old key name is supported.
In addition, this patch adds an ability to request Identity and/or
Password from user.
Adhoc was not waiting for BOTH handshakes to complete before adding the
new peer to the ConnectedPeers property. Actually waiting for the gtk/igtk
(in a previous commit) helps with this, but adhoc also needed to keep track
of which handshakes had completed, and only add the peer once BOTH were done.
This required a small change in netdev, where we memcmp the addresses from
both handshakes and only set the PTK on one.
Currently, netdev triggers the HANDSHAKE_COMPLETE event after completing
the SET_STATION (after setting the pairwise key). Depending on the timing
this may happen before the GTK/IGTK are set which will result in group
traffic not working initially (the GTK/IGTK would still get set, but group
traffic would not work immediately after DBus said you were connected, this
mainly poses a problem with autotests).
In order to fix this, several flags were added in netdev_handshake_state:
ptk_installed, gtk_installed, igtk_installed, and completed. Each of these
flags are set true when their respective keys are set, and in each key
callback we try to trigger the handshake complete event (assuming all the
flags are true). Initially the gtk/igtk flags are set to true, for reasons
explained below.
In the WPA2 case, all the key setter functions are called sequentially from
eapol. With this change, the PTK is now set AFTER the gtk/igtk. This is
because the gtk/igtk are optional and only set if group traffic is allowed.
If the gtk/igtk are not used, we set the PTK and can immediately trigger the
handshake complete event (since gtk_installed/igtk_installed are initialized
as true). When the gtk/igtk are being set, we immediately set their flags to
false and wait for their callbacks in addition to the PTK callback. Doing it
this way handles both group traffic and non group traffic paths.
WPA1 throws a wrench into this since the group keys are obtained in a
separate handshake. For this case a new flag was added to the handshake_state,
'wait_for_gtk'. This allows netdev to set the PTK after the initial 4-way,
but still wait for the gtk/igtk setters to get called before triggering the
handshake complete event. As a precaution, netdev sets a timeout that will
trigger if the gtk/igtk setters are never called. In this case we can still
complete the connection, but print a warning that group traffic will not be
allowed.
==1628== Invalid read of size 1
==1628== at 0x405E71: hardware_rekey_cb (netdev.c:1381)
==1628== by 0x444E5B: process_unicast (genl.c:415)
==1628== by 0x444E5B: received_data (genl.c:534)
==1628== by 0x442032: io_callback (io.c:126)
==1628== by 0x4414CD: l_main_iterate (main.c:387)
==1628== by 0x44158B: l_main_run (main.c:434)
==1628== by 0x403775: main (main.c:489)
==1628== Address 0x5475208 is 312 bytes inside a block of size 320 free'd
==1628== at 0x4C2ED18: free (vg_replace_malloc.c:530)
==1628== by 0x43D94D: l_queue_clear (queue.c:107)
==1628== by 0x43D998: l_queue_destroy (queue.c:82)
==1628== by 0x40B431: netdev_shutdown (netdev.c:4765)
==1628== by 0x403B17: iwd_shutdown (main.c:81)
==1628== by 0x4419D2: signal_callback (signal.c:82)
==1628== by 0x4414CD: l_main_iterate (main.c:387)
==1628== by 0x44158B: l_main_run (main.c:434)
==1628== by 0x403775: main (main.c:489)
==1628== Block was alloc'd at
==1628== at 0x4C2DB6B: malloc (vg_replace_malloc.c:299)
==1628== by 0x43CA4D: l_malloc (util.c:62)
==1628== by 0x40A853: netdev_create_from_genl (netdev.c:4517)
==1628== by 0x444E5B: process_unicast (genl.c:415)
==1628== by 0x444E5B: received_data (genl.c:534)
==1628== by 0x442032: io_callback (io.c:126)
==1628== by 0x4414CD: l_main_iterate (main.c:387)
==1628== by 0x44158B: l_main_run (main.c:434)
==1628== by 0x403775: main (main.c:489)
Adhoc requires 2 GTK's to be set, a single TX GTK and a per-mac RX GTK.
The per-mac RX GTK already gets set via netdev_set_gtk. The single TX GTK
is created the same as AP, where, upon the first station connecting a GTK
is generated and set in the kernel. Then any subsequent stations use
GET_KEY to retrieve the GTK and set it in the handshake.
AdHoc will also need the same functionality to verify and parse the
key sequence from GET_KEY. This block of code was moved from AP's
GET_KEY callback into nl80211_parse_get_key_seq.
Netdev/AP share several NL80211 commands and each has their own
builder API's. These were moved into a common file nl80211_util.[ch].
A helper was added to AP for building NEW_STATION to make the associate
callback look cleaner (rather than manually building NEW_STATION).
Check that netdev->device is not NULL before doing device_remove()
(which would crash) and emitting NETDEV_WATCH_EVENT_DEL. It may be
NULL if the initial RTM_SETLINK has failed to bring device UP.
If there are Ad-hoc BSSes they should be present in the scan results
together with regular APs as far as scan.c is concerned. But in
station mode we can't connect to them -- the Connect method will fail and
autoconnect would fail. Since we have no property to indicate a
network is an IBSS just filter these results out for now. There are
perhaps better solutions but the benefit is very low.
This is a replacement for station's static select_akm_suite. This was
done because wiphy can make a much more intellegent decision about the
akm suite by checking the wiphy supported features e.g. SAE support.
This allows a connection to hybrid WPA2/WPA3 AP's if SAE is not
supported in the kernel.
Set a default GTK cipher type same as our current PTK type, generate a
random GTK when the first STA connects and set it up in the kernel, then
pass the values that EAPoL is going to need to the handshake_state.
In netdev_set_powered also check that no NL80211_CMD_SET_INTERFACE is in
progress because once it returned we would overwrite
netdev->set_powered_cmd_id (could also add a check there but it seems
more logical to just disallow Powered property changes while Mode is
being changed, since we also disallow Mode changes while Powered is
being changed.)
Since device object no longer creates / destroys station objects, use
station_find inside ap directed roam events to direct these to the
station interface.
Add places to store the GTK data, index and RSC in struct
handshake_state and add a setter function for these fields. We may want
to also convert install_gtk to use these fields similar to install_ptk.
As a consequence of the previous commit, netdev watches are always
called when the device object is valid. As a result, we can drop the
netdev_get_device calls and checks from individual AP/AdHoc/Station/WSC
netdev watches
Instead of creating the Station interface in device.c create it directly
on the netdev watch event the same way that the AP and AdHoc interfaces
are created and freed. This fixes some minor incosistencies, for
example station_free was previously called twice, once from device.c and
once from the netdev watch.
device.c would previously keep the pointer returned by station_create()
but that pointer was not actually useful so remove it. Autotests still
seem to pass.
Call netdev_disconnect() to make netdev forget any of station.c's
callbacks for connections or transitions in progress or established.
Otherwise station.c will crash as soon as we're connected and try to
change interface mode:
==17601== Invalid read of size 8
==17601== at 0x11DFA0: station_disconnect_event (station.c:775)
==17601== by 0x11DFA0: station_netdev_event (station.c:1570)
==17601== by 0x115D18: netdev_disconnect_event (netdev.c:868)
==17601== by 0x115D18: netdev_mlme_notify (netdev.c:3403)
==17601== by 0x14E287: l_queue_foreach (queue.c:441)
==17601== by 0x1558B4: process_multicast (genl.c:469)
==17601== by 0x1558B4: received_data (genl.c:532)
==17601== by 0x152888: io_callback (io.c:123)
==17601== by 0x151BCD: l_main_iterate (main.c:376)
==17601== by 0x151C9B: l_main_run (main.c:423)
==17601== by 0x10FE20: main (main.c:489)
Since the interfaces are not supposed to exist when the device is DOWN
(we destroy the interfaces on NETDEV_WATCH_EVENT_DOWN too), don't
create the interfaces if the device hasn't been brought up yet.
When we detect a new device we either bring it down and then up or only
up. The IFF_UP flag in netdev->ifi_flags is updated before that, then
we send the two rtnl commands and then fire the NETDEV_WATCH_EVENT_NEW
event if either the bring up succeeded or -ERFKILL was returned, so the
device may either be UP or DOWN at that point.
It seems that a RTNL NEWLINK notification is usually received before
the RTNL command callback but I don't think this is guaranteed so update
the IFF_UP flag in the callbacks so that the NETDEV_WATCH_EVENT_NEW
handlers can reliably use netdev_get_is_up()
The NL80211_ATTR_KEY_DEFAULT_TYPES attribute is only parsed by the
kernel if either NL80211_ATTR_KEY_DEFAULT or
NL80211_ATTR_KEY_DEFAULT_MGMT are also present, however these are only
used with NL80211_CMD_SET_KEY and ignored for NEW_KEY. As far as I
understand the default key concept only makes sense for a Tx key because
on Rx all keys can be tried, so we don't need this for client mode. The
kernel decides whether the NEW_KEY is for unicast or multicast based on
whether NL80211_ATTR_KEY_MAC was supplied.
device password was read from settings using l_settings_get_string which
returns a newly-allocated string due to un-escape semantics. However,
when assigning wsc->device_password, we strdup-ed the password again
unnecessarily.
==1069== 14 bytes in 2 blocks are definitely lost in loss record 1 of 1
==1069== at 0x4C2AF0F: malloc (vg_replace_malloc.c:299)
==1069== by 0x16696A: l_malloc (util.c:62)
==1069== by 0x16B14B: unescape_value (settings.c:108)
==1069== by 0x16D12C: l_settings_get_string (settings.c:971)
==1069== by 0x149680: eap_wsc_load_settings (eap-wsc.c:1270)
==1069== by 0x146113: eap_load_settings (eap.c:556)
==1069== by 0x12E079: eapol_start (eapol.c:2022)
==1069== by 0x1143A5: netdev_connect_event (netdev.c:1728)
==1069== by 0x118751: netdev_mlme_notify (netdev.c:3406)
==1069== by 0x1734F1: notify_handler (genl.c:454)
==1069== by 0x168987: l_queue_foreach (queue.c:441)
==1069== by 0x173561: process_multicast (genl.c:469)
wsc_pin_is_valid allows two types of PINs through:
1. 4 digit numeric PIN
2. 8 digit numeric PIN
The current code always calls wsc_pin_is_checksum_valid to determine
whether a DEFAULT or USER_SPECIFIED PIN is used. However, this function
is not safe to call on 4 digit PINs and causes a buffer overflow.
Add simple checks to treat 4 digit PINs as DEFAULT PINs and do not call
wsc_pin_is_checksum_valid on these.
Reported-By: Matthias Gerstner <matthias.gerstner@suse.de>
EAP-WSC handles 4 digit, 8 digit and out-of-band Device passwords. The
latter in particular can be anything, so drop the mandatory minimum
password length check here.
This also has the effect of enabling 4-digit PINs to actually work as
they are intended.
The struct allows to support multiple types of the tunneled methods.
Previously, EAP-TTLS was supporting only the eap based ones.
This patch is also starts to move some of the phase 2 EAP
functionality into the new structure.
Boiled down, FT over SAE is no different than FT over PSK, apart from
the different AKM suite. The bulk of this change fixes the current
netdev/station logic related to SAE by rebuilding the RSNE and adding
the MDE if present in the handshake to match what the PSK logic does.
A common function was introduced into station which will rebuild the
handshake rsne's for a target network. This is used for both new
network connections as well as fast transitions.
To prepare for FT over SAE, several case/if statements needed to include
IE_RSN_AKM_SUITE_FT_OVER_SAE. Also a new macro was introduced to remove
duplicate if statement code checking for both FT_OVER_SAE and SAE AKM's.
All the watchlist notify macros were broken in that they did not check
that the watchlist item was still valid before calling it. This only
came into play when a watchlist was being notified and one of the notify
functions removed an item from the same watchlist. It appears this was
already thought of since watchlist_remove checks 'in_notify' and will
mark the item's id as stale (0), but that id never got checked in the
notify macros.
This fixes testAdHoc valgrind warning:
==3347== Invalid read of size 4
==3347== at 0x416612: eapol_rx_auth_packet (eapol.c:1871)
==3347== by 0x416DD4: __eapol_rx_packet (eapol.c:2334)
==3347== by 0x40725B: netdev_pae_read (netdev.c:3515)
==3347== by 0x440958: io_callback (io.c:123)
==3347== by 0x43FDED: l_main_iterate (main.c:376)
==3347== by 0x43FEAB: l_main_run (main.c:423)
==3347== by 0x40377A: main (main.c:489)
...
In the case of the open networks with hidden SSIDs
the settings object is already created.
Valgrind:
==4084== at 0x4C2EB6B: malloc (vg_replace_malloc.c:299)
==4084== by 0x43B44D: l_malloc (util.c:62)
==4084== by 0x43E3FA: l_settings_new (settings.c:83)
==4084== by 0x41D101: network_connect_new_hidden_network (network.c:1053)
==4084== by 0x4105B7: station_hidden_network_scan_results (station.c:1733)
==4084== by 0x419817: scan_finished (scan.c:1165)
==4084== by 0x419CAA: get_scan_done (scan.c:1191)
==4084== by 0x443562: destroy_request (genl.c:139)
==4084== by 0x4437F7: process_unicast (genl.c:424)
==4084== by 0x4437F7: received_data (genl.c:534)
==4084== by 0x440958: io_callback (io.c:123)
==4084== by 0x43FDED: l_main_iterate (main.c:376)
==4084== by 0x43FEAB: l_main_run (main.c:423)
Some of the PEAP server implementations set the L flag along with
redundant TLS Message Length field for the un-fragmented packets.
This patch allows to identify and handle such occasions.
EAP Extensions type 33 is used in PEAPv0 as a termination
mechanism for the tunneled EAP methods. In PEAPv1
the regular EAP-Success/Failure packets must be used to terminate
the method. Some of the server implementations of PEAPv1
rely on EAP Extensions method to terminate the conversation
instead of the required Success/Failure packets. This patch
makes iwd interoperable with such devices.
The "H" function used by SAE and EAP-PWD was effectively the same
function, EAP-PWD just used a zero key for its calls. This removes
the duplicate implementations and merges them into crypto.c as
"hkdf_256".
Since EAP-PWD always uses a zero'ed key, passing in a NULL key to
hkdf_256 will actually use a 32 byte zero'ed array as the key. This
avoids the need for EAP-PWD to store or create a zero'ed key for
every call.
Both the original "H" functions never called va_end, so that was
added to hkdf_256.
The ifindex as reported by netdev is unsigned, so make sure that it is
printed as such. It is astronomically unlikely that this causes any
actual issues, but lets be paranoid.
Move the roam initiation (signal loss, ap directed roaming) and scanning
details into station from device. Certain device functions have been
exposed temporarily to make this possible.
process_bss performs two main operations. It adds a seen BSS to a
network object (existing or new) and if the device is in the autoconnect
state, it adds an autoconnect entry as needed. Split this operation
into two separate & independent steps.