Besides being undefined behaviour, signed integer overflow can cause
unexpected comparison results. In the case of network_rank_compare(),
a connected network with rank INT_MAX would cause newly inserted
networks with negative rank to be inserted earlier in the ordered
network list. This is reflected in the GetOrderedMethods() DBus method
as can be seen in the following iwctl output:
[iwd]# station wlan0 get-networks
Network name Security Signal
----------------------------------------------------
BEOLAN 8021x **** }
BeoBlue psk *** } all unknown,
UI_Test_Network psk *** } hence assigned
deneb_2G psk *** } negative rank
BEOGUEST open **** }
> titan psk ****
Linksys05274_5GHz_dmt psk ****
Lyngby-4G-4 5GHz psk ****
Doing so ensures that the currently connected network is always at the
beginning of the list. Previously, the list would only get updated after
a scan.
This fixes the documented behaviour of GetOrderedNetworks() DBus method,
which states that the currently connected network is always at the
beginning of the returned array.
Fix a logic error which prevented iwd from using SAE/WPA3 when
attempting to connect to APs that are in transition mode. The SAE/WPA3
check incorrectly required mfpr bit to be set, which is true for
APs in WPA3-Personal only mode, but is set to 0 for APs in
WPA3-Personal transition mode.
This patch also adds a bit more diagnostic output to help diagnose
causes for connections where WPA3 is not attempted even when advertised
by the AP.
Replace the usage of eap_send_response() in the method implementations
with a new eap_method_respond that skips the redundant "type" parameter.
The new eap_send_packet is used inside eap_method_respond and will be
reused for sending request packets in authenticator side EAP methods.
Throughout the supplicant mode we'd use the eapol_sm_write wrapper but
in the authenticator mode we'd call __eapol_tx_packet directly. Adapt
eapol_sm_write to use the right destination address and use it
consistently.
sm->handshake already contains our RSN/WPA IE so there's no need to
rebuild it for msg 3/4, especially since we hardcode the fact that we
only support one pairwise cipher. If we start declaring more supported
ciphers and need to include a second RSNE we can first parse
sm->hs->authenticator_ie into a struct ir_rsn_info, overwrite the cipher
and rebuild it from that struct.
This way we duplicate less code and we hardcode fewer facts about the AP
in eapol.c which also helps in adding EAP-WSC.
In both FT or FILS EAPoL isn't used for the initial handshake and only
for the later re-keys. For FT we added the
eapol_sm_set_require_handshake mechanism to tell EAPoL to not require
the initial handshake and we can re-use it for FILS.
Currently an adversary can retransmit EAPOL Msg4/4 to make the AP
reinstall the PTK. Against older Linux kernels this can subsequently
be used to decrypt, replay, and possibly decrypt frames. See the
KRACK attacks research at krackattacks.com for attack scenarios.
In this case no machine-in-the-middle position is needed to trigger
the key reinstallation.
Fix this by using the ptk_complete boolean to track when the 4-way
handshake has completed (similar to its usage for clients). When
receiving a retransmitted Msg4/4 accept this frame but do not reinstall
the PTK.
Credits to Chris M. Stone, Sam Thomas, and Tom Chothia of Birmingham
University to help discover this issue.
Instead of creating the results->bss_list l_queue lazily, always create
one before sending the GET_SCAN command. This is to make sure that an
empty list is passed to the scan callback (e.g. in station.c) instead of
a NULL. Passing NULL has been causing difficult to debug crashes in
station.c, in fact I think I've been seeing them for over a year now
but can't be sure. station_set_scan_results has been taking ownership
of the new BSS list and, if station->connected_bss was not on the list,
it would try to add it not realizing that l_queue_push_tail() was doing
nothing. Always passing a valid list may help us prevent similar
problems in the future.
The crash might start with:
==120489== Invalid read of size 8
==120489== at 0x425D38: network_bss_select (network.c:709)
==120489== by 0x415BD1: station_try_next_bss (station.c:2263)
==120489== by 0x415E31: station_retry_with_status (station.c:2323)
==120489== by 0x415E31: station_connect_cb (station.c:2367)
==120489== by 0x407E66: netdev_connect_failed (netdev.c:569)
==120489== by 0x40B93D: netdev_connect_event (netdev.c:1801)
==120489== by 0x40B93D: netdev_mlme_notify (netdev.c:3678)
Incorporate the LGPL v2.1 licensed implementation of ARC4, taken from
the Nettle project (https://git.lysator.liu.se/nettle/nettle.git,
commit 3e7a480a1e351884), and tweak it a bit so we don't have to
operate on a skip buffer to fast forward the stream cipher, but can
simply invoke it with NULL dst or src arguments to achieve the same.
This removes the dependency [via libell] on the OS's implementation of
ecb(arc4), which may be going away, and which is not usually accelerated
in the first place.
Use a constant control flow in the derivation loop, avoiding leakage
in the iteration succesfuly converting the password.
Increase number of iterations (20 to 30) to avoid issues with
passwords needing more iterations.
With some devices the 10 seconds are not enough for the P2P Group Owner
to give us an address but I think we still want to use a timeout as
short as possible so that the user doesn't wait too long if the
connection isn't working.
p2p_connection_reset may be called as a result of a WFD service
unregistering and p2p_own_wfd is going to be NULL, don't update
p2p_own_wfd->available in this case.
With some WFD devices we occasionally get a Disconnect before or during
the DHCP setup on the first connection attempt to a newly formeg group,
with the reason code MMPDU_REASON_CODE_PREV_AUTH_NOT_VALID. Retrying a
a few times makes the connections consistently successful. Some
conditions are simplified/update in this patch because
conn_dhcp_timeout now implies conn_wsc_bss, and both imply
conn_retry_count.
In 98cf2bf3ec frame_xchg_stop was removed
and its use in p2p.c was changed to frame_xchg_cancel with the slight
complication that the ID returned by frame_xchg_start had do be stored.
Re-add frame_xchg_stop, (renamed as frame_xchg_stop_wdev) to simplify
this bit in p2p.c.
Since there may now be multiple frames-xchg record for each wdev, when
we receive the TX Status event, make sure we find the record who's radio
work has started, as indicated by fx->retry_cnt > 0. Otherwise we're
relying on the ordering of the frames in the "frame_xchgs" queue and
constant priority.
The BSSID (address_3) in response frames was being checked to be the
same as in the request frame, or all-zeros for faulty drivers. At least
one Wi-Fi Display device sends a GO Negotiation Response with the BSSID
different from its Device Address (by 1 bit) and I didn't see an easy
way to obtain that address beforhand so we can "whitelist" it for this
check, so just drop that check for now.
ANQP didn't have this check before it started using frame-xchg so it
shouldn't be critical.
When a frame registered in a given group Id triggers a callback and that
callback ends up calling frame_watch_group_remove for that group Id,
that call will happen inside WATCHLIST_NOTIFY_MATCHES and will free the
memory used by the watchlist. watchlist.h has protection against the
watchlist being "destroyed" inside WATCHLIST_NOTIFY_MATCHES, but not
against its memory being freed -- the memory where it stores the in_notify
and destroy_pending flags. Free the group immediately after
WATCHLIST_NOTIFY_MATCHES to avoid reads/writes to those flags triggering
valgrind warnings.
frame_xchg_destroy is passed as the wiphy radio work's destroy callback
to wiphy.c. If it's also called directly in frame_xchg_exit, there's
going to be a use-after-free when it's called again from wiphy_exit, so
instead use wiphy_radio_work_done which will call frame_xchg_destroy and
forget the frame_xchg record.
This patch lets us establish WFD connections by parsing, validating and
acting on WFD IEs in received frames, and adding our own WFD IEs in the
GO Negotiation and Association frames. Applications should assume that
any connection to a WFD-capable peer when we ourselves have a WFD
service registered, are WFD connections and should handle RTSP and
other IP-based protocols on those connections.
When connecting to a WFD-capable peer and when we have a WFD service
registered, the connection will fail if there are any conflicting or
invalid WFD parameters during GO Negotiation.
If anyone's registered as implementing the WFD service, add the
net.connman.iwd.p2p.Display DBus interface on peer objects that are
WFD-capable and are available for a WFD Session.
The net.connman.iwd.p2p.ServiceManager interface on the /net/connman/iwd
object lets user applications register/unregister the Wi-Fi Display
service. In this commit all it does is it adds local WFD information
as given by the app, to the frames we send out during discovery.
Instead of accepting raw WFD IE contents from the app and exposing
peers' raw WFD IEs to the app, we build the WFD IEs in our code based on
the few meaningful DBus properties that we support and using default
values for the rest. If an app ever needs any of the other WFD
capabilities more properties can be added.
The are useful for P2P service implementations to know unambiguously
which network interface a new P2P connection is on and the peer's IPv4
address if they need to initiate an IP connection or validate an
incoming connection's address from the peer.
This uses l_dhcp_lease_get_server_id to get the IP of the server that
offered us our current lease. l_dhcp_lease_get_server_id returns the
vaue of the L_DHCP_OPTION_SERVER_IDENTIFIER option, which is the address
that any unicast DHCP frames are supposed to be sent to so it seems to
be the best way to get the P2P group owner's IP address as a P2P-client.
peer->device_addr is a pointer to the Device Address contained in
one of two possible places in peer->bss. If during discovery we've
received a new beacon/probe response for an existing peer and we're
going to replace peer->bss, we also have to update peer->device_addr.
If we were in discovery only to be able to receive the target peer's
GO Negotiation Request (i.e. we have no users requesting discovery)
and we've received the frame and decided that the connection has
failed, exit discovery.
To use the wiphy radio work queue, scanning mostly remained the same.
start_next_scan_request was modified to be used as the work callback,
as well as not start the next scan if the current one was done
(since this is taken care of by wiphy work queue now). All
calls to start_next_scan_request were removed, and more or less
replaced with wiphy_radio_work_done.
scan_{suspend,resume} were both removed since radio management
priorities solve this for us. ANQP requests can be inserted ahead of
scan requests, which accomplishes the same thing.
Before connecting to a hidden network we must scan. During this scan
if another connection attempt comes in the expected behavior is to
abort the original connection. Rather than waiting for the scan to
complete, then canceling the original hidden connection we can just
cancel the hidden scan immediately, reply to dbus, and continue with
the new connection attempt.
The new frame-xchg module now handles a lot of what ANQP used to do. ANQP
now does not need to depend on nl80211/netdev for building and sending
frames. It also no longer needs any of the request lookups, frame watches
or to maintain a queue of requests because frame-xchg filters this for us.
From an API perspective:
- anqp_request() was changed to take the wdev_id rather than ifindex.
- anqp_cancel() was added so that station can properly clean up ANQP
requests if the device disappears.
During testing a bug was also fixed in station on the timeout path
where the request queue would get popped twice.
In order to first integrate frame-xchg some refactoring needed to
be done. First it is useful to allow queueing frames up rather than
requiring the module (p2p, anqp etc) to wait for the last frame to
finish. This can be aided by radio management but frame-xchg needed
some refactoring as well.
First was getting rid of this fx pointer re-use. It looks like this
was done to save a bit of memory but things get pretty complex
needed to check if the pointer is stale or has been reset. Instead
of this we now just allocate a new pointer each frame-xchg. This
allows for the module to queue multiple requests as well as removes
the complexity of needed to check if the fx pointer is stale.
Next was adding the ability to track frame-xchgs by ID. If a module
can queue up multiple requests it also needs to be able to cancel
them individually vs per-wdev. This comes free with the wiphy work
queue since it returns an ID which can be given directly to the
caller.
Then radio management was simply piped in by adding the
insert/done APIs.
These APIs will handle fairness and order in any operations which
radios can only do sequentially (offchannel, scanning, connection etc.).
Both scan and frame-xchg are complex modules (especially scanning)
which is why the radio management APIs were implemented generic enough
where the changes to both modules will be minimal. Any module that
requires this kind of work can push a work item into the radio
management work queue (wiphy_radio_work_insert) and when the work
is ready to be started radio management will call back into the module.
Once the work is completed (and this may be some time later e.g. in
scan results or a frame watch) the module can signal back that the
work is finished (wiphy_radio_work_done). Wiphy will then pop the
queue and continue with the next work item.
A concept of priority was added in order to allow important offchannel
operations (e.g. ANQP) to take priority over other work items. The
priority is an integer, where lower values are of a higher priority.
The concept of priority cleanly solves a lot of the complexity that
was added in order to support ANQP queries (suspending scanning and
waiting for ANQP to finish before connecting).
Instead ANQP queries can be queued at a higher priority than scanning
which removes the need for suspending scans. In addition we can treat
connections as radio management work and insert them at a lower
priority than ANQP, but higher than scanning. This forces the
connection to wait for ANQP without having to track any state.
When roaming, iwd tries to scan a limited number of frequencies to keep
the roaming latency down. Ideally the frequency list would come in from
a neighbor report, but if neighbor reports are not supported, we fall
back to our internal database for known frequencies of this network.
iwd tries to keep the number of scans down to a bare minimum, which
means that we might miss APs that are in range. This could happen
because the user might have moved physically and our frequency list is
no longer up to date, or if the AP frequencies have been reconfigured.
If a limited scan fails to find any good roaming candidates, re-attempt
a full scan right away.
If the roam failed and we are no longer connected, station_disassociated
is called which ends up calling station_roam_state_clear. Thus
resetting the variables is not needed. Reflow the logic to make this a
bit more explicit.
If the roam attempt fails, do not reset this to false. Generally this
is set by the fact that we lost beacon and to not attempt neighbor
reports, etc. This hint should be preserved across roam attempts.
frame_xchg_startv was using sizeof(mmpdu) to check the minimum length
for a frame. Instead mmpdu_header_len should be used since this checks
fc.order and returns either 24 or 28 bytes, not 28 bytes always.
This change adds the requirement that the first iovec in the array
must contain at least the first 2 bytes (mmpdu_fc) of the header.
This really shouldn't be a problem since all current users of
frame-xchg put the entire header (or entire frame) into the first
iovec in the array.
explicit_bzero is used in src/p2p.c since commit
1675c765a3 but src/missing.h is not
included, as a result build with uclibc fails on:
/home/naourr/work/instance-0/output-1/per-package/iwd/host/opt/ext-toolchain/bin/../lib/gcc/mips64el-buildroot-linux-uclibc/5.5.0/../../../../mips64el-buildroot-linux-uclibc/bin/ld: src/p2p.o: in function `p2p_connection_reset':
p2p.c:(.text+0x2cf4): undefined reference to `explicit_bzero'
/home/naourr/work/instance-0/output-1/per-package/iwd/host/opt/ext-toolchain/bin/../lib/gcc/mips64el-buildroot-linux-uclibc/5.5.0/../../../../mips64el-buildroot-linux-uclibc/bin/ld: p2p.c:(.text+0x2cfc): undefined reference to `explicit_bzero'
This logic was using l_hashmap_insert, which supports duplicates. Since
some entries were inserted multiple times, they ended up being printed
multiple times. Fix that by introducing a macro that uses
l_hashmap_replace instead.
Right now, if the connection fails, then network always thinks that the
password should be re-asked. Loosen this to only do so if the
connection failed at least in the handshake phase. If the connection
failed due to Association / Authentication timeout, it is likely that
something is wrong with the AP and it can't respond.
Using the new station ANQP watch network can delay the connection
request until after ANQP has finished. Since station may be
autoconnecting we must also add a check in network_autoconnect
which prevents it from autoconnecting if we have a pending Connect
request.
This is to allow network to watch for ANQP activity in order to
fix the race condition between scanning finishing and ANQP finishing.
Without this it is possible for a DBus Connect() to come in before
ANQP has completed and causing the network to return NotConfigured,
when its actually in the process of obtaining all the network info.
The watch was made globally in station due to network not having
a station object until each individual network is created. Adding a
watch during network creation would result in many watchers as well
as a lot of removal/addition as networks are found and lost.
Change signature of network_connect_new_hidden_network to take
reference to the caller's l_dbus_message struct. This allows to
set the caller's l_dbus_message struct to NULL after replying in
the case of a failure.
==201== at 0x467C15: l_dbus_message_unref (dbus-message.c:412)
==201== by 0x412A51: station_hidden_network_scan_results (station.c:2504)
==201== by 0x41EAEA: scan_finished (scan.c:1505)
==201== by 0x41EC10: get_scan_done (scan.c:1535)
==201== by 0x462592: destroy_request (genl.c:673)
==201== by 0x462987: process_unicast (genl.c:988)
==201== by 0x462987: received_data (genl.c:1087)
==201== by 0x45F5A2: io_callback (io.c:126)
==201== by 0x45E8FD: l_main_iterate (main.c:474)
==201== by 0x45E9BB: l_main_run (main.c:521)
==201== by 0x45EBCA: l_main_run_with_signal (main.c:643)
==201== by 0x403B15: main (main.c:512)
Introduce hidden_pending to keep reference to the dbus message object
while we wait for the scan results to be returned while trying to
connect to a hidden network. This simplifies the logic by separating it
into two independent logical units: scanning, connecting and eliminates
a possibility of a memory leak in the case when Network.Connect being
initiated while Station.ConnectHiddenNetwork is in progress.
If a connection is initiated (via dbus) while a quick scan is in
progress, the quick scan will be aborted. In this case,
station_quick_scan_results will always transition to the
AUTOCONNECT_FULL state regardless of whether it should or not.
Fix this by making sure that we only enter AUTOCONNECT_FULL if we're
still in the AUTOCONNECT_QUICK state.
Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
If start_scan_next_request() is called while a scan request
(NL80211_CMD_TRIGGER_SCAN) is still running, the same scan request will
be sent again. Add a check in the function to avoid sending a request if
one is already in progress. For consistency, check also that scan
results are not being requested (NL80211_CMD_GET_SCAN), before trying to
send the next scan request. Finally, remove similar checks at
start_next_scan_request() callsites to simplify the code.
This also fixes a crash that occurs if the following conditions are met:
- the duplicated request is the only request in the scan request
queue, and
- both scan requests fail with an error not EBUSY.
In this case, the first callback to scan_request_triggered() will delete
the request from the scan request queue. The second callback will find
an empty queue and consequently pass a NULL scan_request pointer to
scan_request_failed(), causing a segmentation fault.
If scanning is suspended, have scan_common() queue its scan request
rather than issuing it immediately. This respects the assumption that
scans are not requested while sc->suspended is true.
This bug is caused by the following behavior:
1. Start a frame-xchg, wait for callback
2. From callback start a new frame-xchg, same prefix.
The new frame-xchg request will detect that there is a duplicate watch,
which is correct behavior. It will then remove this duplicate from the
watchlist. The issue here is that we are in the watchlist notify loop
from the original xchg. This causes that loop to read from the now
freed watchlist item, causing an invalid read.
Instead of freeing the item immediately, check if the notify loop is in
progress and only set 'id' to zero and 'stale_items' to true. This will
allow the notify loop to finish, then the watchlist code will prune out
any stale items. If not in the notify loop the item can be freed as it
was before.
Don't match the default group's (group_id 0) wdev_id against the
provided wdev_id because the default group can be used on all wdevs and
its wdev_id is 0. Also match individual item's wdev_id in the group to
make up for this although it normally wouldn't matter.
802.11ai mandates that the RSN element is included during authentication
for FILS. This previously was happening by chance since supplicant_ie
was being included with CMD_AUTHENTICATE. This included more than just
the RSNE so that was removed in an earlier commit. Now FILS builds the
RSNE itself and includes this with CMD_AUTHENTICATE.
build_cmd_ft_authenticate and build_cmd_authenticate were virtually
identical. These have been unified into a single builder.
We were also incorrectly including ATTR_IE to every authenticate
command, which violates the spec for certain protocols, This was
removed and any auth protocols will now add any IEs that they require.
In this situation the kernel is sending a low RSSI event which netdev
picks up, but since we set netdev->connected so early the event is
forwarded to station before IWD has fully connected. Station then
tries to get a neighbor report, which may fail and cause a known
frequency scan. If this is a new network the frequency scan tries to
get any known frequencies in network_info which will be unset and
cause a segfault.
This can be avoided by only sending RSSI events when netdev->operational
is set rather than netdev->connected.
Some full mac cards don't like being given a FT AKM when connecting.
From an API perspective this should be supported, but in practice
these cards behave differently and some do no accept FT AKMs. Until
this becomes more stable any cards not supporting Auth/Assoc commands
(full mac) will not connect using FT AKMs.
This callback gets called way to many times to have a debug print
in the location that it was. Instead only print if a NEW wiphy is
found, and also print the name/id.
Save the value of the watchlist pointer at the beginning of the
WATCHLIST_NOTIFY_* macros as if it was a function. This will fix a
frame-xchg.c scenario in which one of the watch callback removes the
frame watch group and the memory where the watchlist pointer was
becomes unallocated but the macro still needs to access it ones or
twice while it destroys the watchlist. Another option would be for
the pointer to be copied in frame-xchg.c itself.
Use netconfig.c functions to unconditionally run DHCP negotiation,
fail the connection setup if DHCP fails. Only report connection success
after netconfig returns.
Add the final two steps of the connection setup, and corresponding
disconnect logic:
* the WSC connection to the GO to do the client provisioning,
* the netdev_connect call to use the provisioned credentials for the
final WPA2 connection.
Once we've found the provisioning BSS create the P2P-Client interface
that we're going to use for the actual provisioning and the final P2P
connection.
Some devices (a Wi-Fi Display dongle in my case) will send us Probe
Requests and wait for a response before they send us the GO
Negotiation Request that we're waiting for after the peer initially
replied with "Fail: Information Not Available" to our GO Negotiation
attempt. Curiously this specific device I tested would even accept
a Probe Response with a mangled body such that the IE sequence couldn't
be parsed.
Handle the scenario where the peer's P2P state machine doesn't know
whether a connection has been authorized by the user and needs some time
to ask the user or a higher software layer whether to accept a
connection. In that case their GO Negotiation Response to our GO
Negotiation Request will have the status code "Fail: Information Not
Available" and we need to give the peer 120s to start a new GO
Negotiation with us. In this patch we handle the GO Negotiation
responder side where we parse the Request frame, build and send the
Response and finally parse the Confirmation. The existing code so far
only did the initiator side.
Parse the GO Negotiation Response frame and if no errors found send the
GO Negotiation Confirmation. If that gets ACKed wait for the GO to set
up the group.
Add net.connman.iwd.SimpleConfiguration interfaces to peer objects on
DBus and handle method calls. Building and transmitting the actual
action frames to start the connection sequence is done in the following
commits.
Add some of the Device Discovery logic and the DBus API. Device
Discovery is documented as having three states: the Scan Phase, the Find
Phase and the Listen State.
This patch adds the Scan Phase and the next patch adds the Listen State,
which will happen sequentially in a loop until discovery is stopped.
The Find Phase, which is documented as happening at the beginning of the
Discovery Phase, is incorporated into the Scan Phases. The difference
between the two is that Find Phase scans all of the supported channels
while the Scan Phase only scans the three "social" channels. In
practical terms the Find Phase would discover existing groups, which may
operate on any channel, while the Scan Phase will only discover P2P
Devices -- peers that are not in a group yet. To cover existing groups,
we add a few "non-social" channels to each of our active scans
implementing the Scan Phases.
When a new wiphy is added query its regulatory domain and listen for
nl80211 regulatory notifications to be able to provide current
regulatory country code through the new wiphy_get_reg_domain_country().
Implement the Enabled property on device interface. The P2P device is
currently disabled on startup but automatically enabling the P2P device
can be considered.
SOL_NETLINK is used since commit
87a198111a resulting in the following
build failure with glibc < 2.24:
src/frame-xchg.c: In function 'frame_watch_group_io_read':
src/frame-xchg.c:328:27: error: 'SOL_NETLINK' undeclared (first use in this function)
if (cmsg->cmsg_level != SOL_NETLINK)
^
This failure is due to glibc that doesn't support SOL_NETLINK before
version 2.24 and
f9b437d5ef
Fixes:
- http://autobuild.buildroot.org/results/3485088b84111c271bbcfaf025aa4103c6452072
For PSK networks we have netdev.c taking care of setting the linkmode &
operstate. For open adhoc networks, netdev.c was never involved which
resulted in linkmode & operstate never being set. Fix this by invoking
the necessary magic when a connection is established.
adhoc_reset() destroys ssid and sta_states but leaves the pointers
around, athough the adhoc_state structure is not always freed.
This causes a segfault when exiting iwd after a client has done
adhoc start and adhoc stop on a device since adhoc_reset() is called
from adhoc_sta_free although it was previously called from
adhoc_leave_cb().
The netdev_leave_adhoc() returns a negative errno on errors and zero
on success, but adhoc_dbus_stop() assumed the inverse when checking for
an error.
Also, the DBus message was not being referenced in adhoc->pending and
then adhoc_leave_cb() segfaulted attempting to dereference it.
It seems some APs send the IGTK key in big endian format (it is a
uin16). The kernel rightly reports an -EINVAL error when iwd issues a
NEW_KEY with such a value, resulting in the connection being aborted.
Work around this by trying to detect big-endian key indexes and 'fixing'
them up.
This bug has been in here since OWE was written, but a similar bug also
existed in hostapd which allowed the PTK derivation to be identical.
In January 2020 hostapd fixed this bug, which now makes IWD incompatible
when using group 20 or 21.
This patch fixes the bug for IWD, so now OWE should be compatible with
recent hostapd version. This will break compatibility with old hostapd
versions which still have this bug.
If the AP only supports an AKM which requires an auth protocol
CMD_AUTHENTICATE/CMD_ASSOCIATE must be supported or else the
auth protocol cannot be run. All the auth protocols are started
assuming that the card supports these commands, but the support
was never checked when parsing supported commands.
This patch will prevent any fullMAC cards from using
SAE/FILS/OWE. This was the same behavior as before, just an
earlier failure path.
This function was intended to catch socket errors and destroy the group
but it would leak the l_io object if that happened, and if called on
ordinary shutdown it could cause a crash. Since we're now assuming
that the netlink socket operations never fail just remove it.
Only add constants for parsing the Device Information subelement as that
is the main thing we care about in P2P code. And since our own WFD IEs
will likely only need to contain the Device Information subelement, we
don't need builder utilities. We do need iterator utilities because we
may receive WFD IEs with more subelements.
In some cases a P2P peer will ACK our frame but not reply on the first
attempt, and other implementations seem to handle this by going back to
retransmitting the frame at a high rate until it gets ACKed again, at
which point they will again give the peer a longer time to tx the
response frame. Implement the same logic here by adding a
retries_on_ack parameter that takes the number of additional times we
want to restart the normal retransmit counter after we received no
response frame on the first attempt. So passing 0 maintains the
current behaviour, 1 for 1 extra attempt, etc.
In effect we may retransmit a frame about 15 * (retry_on_ack + 1) *
<in-kernel retransmit limit> times. The kernel/driver retransmits a
frame a number of times if there's no ACK (I've seen about 20 normally)
at a high frequency, if that fails we retry the whole process 15 times
inside frame-xchg.c and if we still get no ACK at any point, we give up.
If we do get an ACK, we wait for a response frame and if we don't get
that we will optionally reset the retry counter and restart the whole
thing retry_on_ack times.
In order to support AlwaysRandomizeAddress and AddressOverride, station will
set the desired address into the handshake object. Then, netdev checks if
this was done and will use that address rather than generate one.
This patch adds two new options to a network provisioning file:
AlwaysRandomizeAddress={true,false}
If true, IWD will randomize the MAC address on each connection to this
network. The address does not persists between connections, any new
connection will result in a different MAC.
AddressOverride=<MAC>
If set, the MAC address will be set to <MAC> assuming its a valid MAC
address.
These two options should not be used together, and will only take effect
if [General].AddressRandomization is set to 'network' in the IWD
config file.
If neither of these options are set, and [General].AddressRandomization
is set to 'network', the default behavior remains the same; the MAC
will be generated deterministically on a per-network basis.
Since frame_watch_remove_by_handler only forgets a given function +
user data pointers, and doesn't remove the frame prefixes added in the
kernel, we can avoid later re-registering those prefixes with the
kernel by keeping them in our local watchlist, and only replacing the
handler pointer with a dummy function.
If during WATCHLIST_NOTIFY{,_MATCHES,_NO_ARGS} one of the watch
notify callback triggers a call to watchlist_destroy, give up calling
remaining watches and destroy the watchlist without crashing. This is
useful in frame-xchg.c (P2P use case) where a frame watch may trigger
a move to a new state after receiving a specific frame, and remove one
group of frame watches (including its watchlist) to create a different
group.
For privacy reasons its advantageous to randomize or mask
the MAC address when connecting to networks, especially public
networks.
This patch allows netdev to generate a new MAC address on a
per-network basis. The generated MAC will remain the same when
connecting to the same network. This allows reauthentications
or roaming to work, and not have to fully re-connect (which would
be required if the MAC changed on every connection).
Changing the MAC requires bringing the interface down. This does
lead to potential race conditions with respect to external
processes. There are two potential conditions which are explained
in a TODO comment in this patch.
This API is being added to support per-network MAC address
generation. The MAC is generated based on the network SSID
and the adapters permanent address using HMAC-SHA256. The
SHA digest is then constrained to make it MAC address
compliant.
Generating the MAC address like this will ensure that the
MAC remains the same each time a given SSID is connected to.
Make sure a frame callback is free to call frame_xchg_stop without
causing a crash. Frame callback here means the one that gets
called if our tx frame was ACKed and triggered a respone frame that
matched one of the provided prefixes, within the given time.
All in all a frame callback is allowed to call either
frame_xchg_stop or frame_xchg_startv or neither. Same applies to
the final callback (called when no matching responses received).
Don't crash if the user calls frame_xchg_stop(wdev) from inside the
frame exchange's final callback. That call is going to be redundant but
it's convenient to do this inside a cleanup function for a given wdev
without having to check whether any frame exchange was actually running.
This API was updated to take an extra boolean which will
automatically power up the device while changing the MAC
address. Since this is what IWD does anyways we can avoid
the need for an intermediate callback and go right into
netdev_initial_up_cb.
iwd would fail to connect using EAP-TLS when no CA certificate was
provided as it checked for successful loading of the CA certificate
instead of the client certificate when attempting to load the client
certificate.
The password for EAP-GTC is directly used in an EAP response. The
response buffer is created on the stack so an overly large password
could cause a stack overflow.
mac80211 drivers seem to send the disconnect event which is triggered by
CMD_DISCONNECT prior to the CMD_DISCONNECT response. However, some
drivers, namely brcmfmac, send the response first and then send the
disconnect event. This confused iwd when a connection was immediately
triggered after a disconnection (network switch operation).
Fix this by making sure that connected variable isn't set until the
connect event is actually processed, and ignore disconnect events which
come after CMD_DISCONNECT has alredy succeeded.
For nl80211 sockets other than our main l_genl object use socket io
directly, to avoid creating many instances of l_genl. The only reason
we use multiple sockets is to work around an nl80211 design quirk that
requires closing the socket to unregister management frame watches.
Normally there should not be a need to create multiple sockets in a
program.
Add a little state machine and a related API, to simplify sending out a
frame, receiving the Ack / No-ack status and (if acked) waiting for a
response frame from the target device, one of a list of possible
frame prefixes. The nl80211 API for this makes it complicated
enough that this new API seems to be justified, on top of that there's a
quirk when using the brcmfmac driver where the nl80211 response
(containing the operation's cookie), the Tx Status event and the response
Frame event are received from nl80211 in reverse order (not seen with
other drivers so far), further complicating what should be a pretty
simple task.
Try to better deduplicate the frame watches. Until now we'd check if
we'd already registered a given frame body prefix with the kernel, or a
matching more general prefix (shorter). Now also try to check if we
have already have a watch with the same callback pointer and user_data
value, and:
* an identical or shorter (more general) prefix, in that case ignore
the new watch completely.
* a longer (more specific) prefix, in that case forget the existing
watch.
The use case for this is when we have a single callback for multiple
watches and multiple frame types, and inside that callback we're looking
at the frame body again and matching it to frame types. In that case
we don't want that function to be called multiple times for one frame
event.
In frame_watch_group_remove I forgot to actually match the group to be
removed by both wdev_id and group_id. group_ids are unique only in the
scope of one wdev.
I forgot to actually add new groups being created in
frame_watch_group_get to the watch_groups queue, meaning that we'd
re-create the group every time a new watch was added to the group.
Processing the duplicated TLVs while connecting to a malicious AP may lead
to overflow of the response buffer. This patch ensures that the
duplicated TLVs are not parsed.
The pending wiphy state 'use_default' variable was not set early enough
in some circumstances resulting in weird behavior for blacklisted
drivers. Fix this by adding a manager_wiphy_dump_done callback which
will properly initialize the use_default value.
Fixes: c4b2f10483 ("manager: Handle missing NEW_WIPHY events")
brcmfmac does not allow the removal of the default / primary interface.
So there isn't much point in having iwd attempt this.
Another issue is that brcmfmac _does_ allow the deletion of non-default
interfaces. So starting iwd on a system with a station & ap interface
active can result in iwd attempting to delete all the interfaces. Given
the above, it succeeds in deleting the ap interface but not the station
one. In strange circumstances it might end up thinking that the ap
interface is the 'default' and trying to use it, whereas it was just
successfully removed.
==192== Conditional jump or move depends on uninitialised value(s)
==192== at 0x4531D3: l_queue_find (queue.c:346)
==192== by 0x42F1F8: manager_config_notify (manager.c:667)
==192== by 0x45A895: process_multicast (genl.c:970)
==192== by 0x45A895: received_data (genl.c:1037)
==192== by 0x4577B2: io_callback (io.c:126)
==192== by 0x456B0D: l_main_iterate (main.c:473)
==192== by 0x456BCB: l_main_run (main.c:520)
==192== by 0x456DDA: l_main_run_with_signal (main.c:642)
==192== by 0x4034B0: main (main.c:497)
The kernel emits NEW_WIPHY events whenever a new wiphy is registered.
Unfortunately these events are emitted under the 'legacy' semantics and
have a hard size limit of 4096 bytes. Unfortunately, it is possible for
a NEW_WIPHY message to exceed this limit (ath10k cards seem to be
affected in particular), which results in the kernel never sending these
messages out. This can lead to NEW_INTERFACE events being emitted with
a wiphy_id that had no corresponding NEW_WIPHY event emitted. Such a
sequence can confuse iwd's hardware detection logic, particularly during
hot-plug or system boot.
Fix this by re-dumping the wiphy if such a condition is detected. This
has some interaction with blacklisted wiphys, so the wiphy objects are
now always tracked and marked as blacklisted. Before, the blacklisted
wiphys were simply not added to the iwd list of tracked wiphys.
For the inner EAP methods that support generation of the key material
include it into imck generation. This allows to cryptographically
bind the inner method with the tunnel.
Windows Server 2008 - Network Policy Server (NPS) generates an invalid
Compound MAC for Cryptobinding TLV when is used within PEAPv0 due to
incorrect parsing of the message containing TLS Client Hello.
Setting L bit and including TLS Message Length field, even for the
packets that do not require fragmentation, corrects the issue. The
redundant TLS Message Length field in unfragmented packets doesn't
seem to affect the other server implementations.
Sometimes, at least with brcmfmac, the default interface apparently
takes a moment to get created after the NEW_WIPHY event. We didn't
really consider this case in the NEW_WIPHY handler and we've got a race
condition. It fixes the following bug for me:
https://bugs.archlinux.org/task/63912 -- tested by removing and
re-modprobing the brcmfmac module rather than rebooting.
To work around this wait for the NEW_INTERFACE event and then retry the
setup. We still do the initial attempt directly after NEW_WIPHY to
handle cases like wiphys with no default interfaces and pre-existing
wiphys.
We track mtime as the 'LastConnectedTime' of the network, and also sort
the known network list according to the last connected time.
Unfortunately we were never reacting to ATTRIB changes, and so were
never updating the network_info->connected_time whenever a network was
connected to.
Rework the logic to address this. This also fixes a small bug where the
connected_time was not set properly prior to removal / re-insertion of
the network_info.
We use the mtime on the network profile as the 'Last Connected Time'.
When we update any property and sync the file to disk, the mtime was not
preserved (since we were creating a new temporary file instead of
modifying the old one). This led to LastConnectedTime property change
being emitted / updated incorrectly when a writable property on the
KnownNetwork interface was updated.
Our design preference is to not call any callbacks in the _free/_destroy
method of a class (with the exception of explicit destroy callbacks
provided, if any).
Invoking the callback in this case was unnecessary: wsc_dbus_free was
already replying to pending connect / cancel messages. The only other
thing the callback would attempt to do is to set station back into
autoconnect mode. This was unnecessary as well since the netdev is
already down.
This change removes the callback invocation. Since wsc_enrollee_destroy
is now just calling wsc_enrollee_free, remove this from the API and
expose wsc_enrollee_free instead.
Split the WSC D-Bus interface class (struct wsc) into a base class
common to station mode and P2P mode (struct wsc_dbus) and station-
specific logic like scanning, saving the credentials as a known network
and triggering the station-mode connection (struct wsc_station_dbus).
Make the base class and its utilities public in wsc.h for P2P use.
Create struct wsc_enrollee which is allocated with wsc_enrollee_new,
taking a done callback as a parameter. The callback is always
called so there's no need for a separate destroy callback. The object
only lives until the done callback happens so wsc_enrollee_cancel/destroy
can only be used before this.
Looks like the rest of the file is simplified thanks to this.
This new API is independent of netdev.c and allows actually
unregistering from receiving notifications of frames, although with some
quirks. The current API only allowed the callback for a registration to
be forgotten but our process and/or the kernel would still be woken up
when matching frames were received because the kernel had no frame
unregister call. In the new API you can supply a group-id paramter when
registering frames. If it is non-zero the frame_watch_group_remove() call
can be used to remove all frame registrations that had a given group-id
by closing the netlink socket on which the notifications would be
received. This means though that it's a slightly costly operation.
The file is named frame-xchg.c because I'm thinking of also adding
utilities for sending frames and waiting for one of a number of replies
and handling the acked/un-acked information.
Instead of taking the credentials from wsc object directly, have the
caller pass these in. This makes it more consistent with how the
done_cb was done.
Split the interface-specific logic from the core WSC logic. The core
WSC code is the part that we can re-use between P2P and station and
doesn't include the D-Bus code, scanning for the target BSS or the
attempt to make a station mode connection.
Allow netdev_create_from_genl callers to draw a random or non-random MAC
and pass it in the parameter instead of a bool to tell us to generating
the MAC locally. In P2P we are generating the MAC some time before
creating the netdev in order to pass it to the peer during negotiation.
Some server implementation don't seem to provide the valid compound MACs.
In the meantime, iwd will ignore the invalid Crypto-Binding TLVs as their
usage is optional.
The intent was to check for the presence of the add_domain_name
operation, not add_dns operation.
Fixes: 930528e35e ("resolve: Add systemd-resolved domain name installer")
It seems that the kernel uses -EOPNOTSUPP if the change_station
operation is not implemented by the driver. However, some drivers do
implement change_station and choose to report -ENOTSUPP instead of
-EOPNOTSUPP.
To add to the confusion, EOPNOTSUPP and -ENOTSUPP are the same on some
systems (e.g. Gentoo). Be paranoid and allow both errors to be ignored
when sending CMD_SET_STATION.
Fixes: 0238ffb8d9 ("netdev: Use -EOPNOTSUPP instead of -ENOTSUPP")
The first if case should be -10950, not 10950. Without the negative
this first case would get hit every time since signal strength values
are always negative.
The Crypto Binding TLV is used to ensure that the EAP peer and the
EAP server participated in both the inner and the outer EAP
authentications of a PEAP authentication by cryptographically associating
the phase 1 and phase 2 authentications.
The usage of Crypto-Binding in PEAPv0 is optional and is triggered by
the reception of the Crypto-Binding TLV from the server.
The handler for EAP Extensions has been modified to support multiple
TLV types instead of the single Result TLV. This will allow to handle
the other TLVs such as Crypto-Binding TLV.
There are some server implementations that send requests that are
not "Password" but still want us send password. This commit modify
the behavior to send a warning and still try to auth with password.
This makes me able to auth with server in my school which sends
"Enter Aruba Login".
wpa_supplicant does not check if it is "Password".
The kernel uses -EOPNOTSUPP in the case of change_station operation not
being provided. On most systems -EOPNOTSUPP is defined to be the same
as -ENOTSUPP, but seemingly not all systems.
Previously, the key was installed once the tunnel was created
despite the outcome of the second authentication phase. Now, the
key installation is delayed until the successful completion of
the second authentication phase. This excludes the unnecessary
operations in the case of a failure and key reinstallation with
cypro-binding in use.
Commit 1057d8aa74 changed the device interface creation logic
from being unconditional inside netdev.c to instead use NETDEV_WATCH_*
events. However, this broke the assumption that the device interface
was created before all others. The effect is that the scan_wdev_add
might no longer be called prior to station interface being created. Fix
this by moving scan_wdev_add/remove calls to netdev.c instead.
Fixes: 1057d8aa74 ("device: Move device creation from netdev.c to event watch")
#0 0x000055555558ee5d in scan_notify (msg=0x55555560b640, user_data=0x0) at src/scan.c:1706
#1 0x00007ffff7f2c78c in ?? () from /usr/lib/libell.so.0
#2 0x00007ffff7f299ec in ?? () from /usr/lib/libell.so.0
#3 0x00007ffff7f28e4a in l_main_iterate () from /usr/lib/libell.so.0
#4 0x00007ffff7f28efc in l_main_run () from /usr/lib/libell.so.0
#5 0x00007ffff7f290b9 in l_main_run_with_signal () from /usr/lib/libell.so.0
#6 0x00005555555639c4 in main (argc=1, argv=0x7fffffffec18) at src/main.c:497
Save the source frame type in struct scan_bss as it may affect how some
of the data in the struct will be parsed. Also replace the P2P IE
payload data in that struct with a union containing pre-parsed p2p
attributes corresponding to the frame type.
This means users don't have to call the parsers in p2putil.c on that
data, which wouldn't have worked anyway because those parsers assume
input is the raw IE sequence rather than just the "payload".
All these functions free up the resources used by the struct but don't
free the struct itself (allowing it to be static) so rename the
functions to avoid confusion.
The kernel sends NL80211_ATTR_SCAN_START_TIME_TSF with CMD_TRIGGER and
RRM requires this value for beacon measurement reports.
The start time is parsed during CMD_TRIGGER and set into the scan request.
A getter was added to obtain this time value for an already triggered
scan.
After making the change, the SCAN_ABORTED case was cleaned up a bit to
remove the local scan_request usage in favor of the one used for all the
other cases.