Like in ap.c, allow the event callback to mark the handshake state as
destroyed, without causing invalid accesses after the callback has
returned. In this case the crash was because try_handshake_complete
needed to access members of handshake_state after emitting the event,
as well as access the netdev, which also has been destroyed:
==257707== Invalid read of size 8
==257707== at 0x408C85: try_handshake_complete (netdev.c:1487)
==257707== by 0x408C85: try_handshake_complete (netdev.c:1480)
(...)
==257707== Address 0x4e187e8 is 856 bytes inside a block of size 872 free'd
==257707== at 0x484621F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==257707== by 0x437887: ap_stop_handshake (ap.c:151)
==257707== by 0x439793: ap_del_station (ap.c:316)
==257707== by 0x43EA92: ap_station_disconnect (ap.c:3411)
==257707== by 0x43EA92: ap_station_disconnect (ap.c:3399)
==257707== by 0x454276: p2p_group_event (p2p.c:1006)
==257707== by 0x439147: ap_event (ap.c:281)
==257707== by 0x4393AB: ap_new_rsna (ap.c:390)
==257707== by 0x4393AB: ap_handshake_event (ap.c:1010)
==257707== by 0x408C7F: try_handshake_complete (netdev.c:1485)
==257707== by 0x408C7F: try_handshake_complete (netdev.c:1480)
(...)
Previously we added logic to defer doing anything in ap_free() to after
the AP event handler has returned so that ap_event() has a chance to
inform whoever called it that the ap_state has been freed. But there's
also a chance that the event handler is destroying both the AP and the
netdev it runs on, so after the handler has returned we can't even use
netdev_get_wdev_id or netdev_get_ifindex. The easiest solution seems to
be to call ap_reset() in ap_free() even if we're within an event handler
to ensure we no longer need any external objects. Also make sure
ap_reset() can be called multiple times.
Another option would be to watch for NETDEV_WATCH_EVENT_DEL and remove
our reference to the netdev (because there's no need actually call
l_rtnl_ifaddr_delete or frame_watch_wdev_remove if the netdev was
destroyed -- frame_watch already tracks netdev removals), or to save
just the ifindex and the wdev id...
The purpose of this was to have a single utility to both cancel an
existing offchannel operation (if one exists) and start a new one.
The problem was the previous offchannel operation was being canceled
first which opened up the radio work queue to other items. This is
not desireable as, for example, a scan would end up breaking the
DPP protocol most likely.
Starting the new offchannel then canceling is the correct order of
operations but to do this required saving the new ID, canceling, then
setting offchannel_id to the new ID so dpp_presence_timeout wouldn't
overwrite the new ID to zero.
This also removes an explicit call to offchannel_cancel which is
already done by dpp_offchannel_start.
Several members are named based on initiator/responder (i/r)
terminology. Eventually both initiator and responder will be
supported so rename these members to use own/peer naming
instead.
ASN1 parsing will soon be required which will need some utilities in
asn1-private.h. To avoid duplication include this private header and
replace the OID's with the defined structures as well as remove the
duplicated macros.
station_set_scan_results takes an autoconnect flag which was being
set true in both regular/quick autoconnect scans. Since OWE networks
are processed after setting the scan results IWD could end up
connecting to a network before all the OWE hidden networks are
populated.
To fix this regular/quick autoconnect results will set the flag to
false, then process OWE networks, then start autoconnect. If any
OWE network scans are pending station_autoconnect_start will fail
but will pick back up after the hidden OWE scan.
During investigation another separate crash was found. The original is
caused by a disconnect event coming in after a neighbor report scan
was completed (roam failed) during the full roam scan.
The second crash is caused by a disconnect coming in during a full
roam scan when no neighbor report scan was ever issued.
scan_request_failed and scan_finished remove the finished scan_request
from the request queue right away, before calling the callback. This
breaks those clients that rely on scan_cancel working on such requests
(i.e. to force the destroy callback to be invoked synchronously, see
a0911ca778 ("station: Make sure roam_scan_id is always canceled").
Fix this by removing the scan_request from the request queue after
invoking the callback. Also provide a re-entrancy guard that will make
sure that the scan_request isn't removed in scan_cancel itself.
There are similar operations being performed but with different
callbacks and userdata, depending on whether 'sr' is NULL or not.
Optimize the function flow slightly to make if-else unnecessary.
While here, update the comment. periodic scans are now scheduled only
based on the periodic timeout timer.
If periodic scan is active and we receive a SCAN_ABORTED event, we would
still invoke the periodic scan callback with an error. This is rather
pointless since the periodic scan callback cannot do anything useful
with this information. Fix that.
We should never reach a point where NEW_SCAN_RESULTS or SCAN_ABORTED are
received before a corresponding TRIGGER_SCAN is received. Even if this
does happen, there's no harm from processing the commands anyway.
This makes it a little easier to book-keep the started variable. Since
scan_request already has a 'passive' bit-field, there should be no
storage penalty.
If scan_cancel is called on a scan_request that is 'finished' but with
the GET_SCAN command still in flight, it will trigger a crash as
follows:
Received Deauthentication event, reason: 2, from_ap: true
src/station.c:station_disconnect_event() 11
src/station.c:station_disassociated() 11
src/station.c:station_reset_connection_state() 11
src/station.c:station_roam_state_clear() 11
src/scan.c:scan_cancel() Trying to cancel scan id 6 for wdev 200000002
src/scan.c:scan_cancel() Scan is at the top of the queue, but not triggered
src/scan.c:get_scan_done() get_scan_done
Aborting (signal 11) [/home/denkenz/iwd-master/src/iwd]
++++++++ backtrace ++++++++
#0 0x7f9871aef3f0 in /lib64/libc.so.6
#1 0x41f470 in station_roam_scan_notify() at /home/denkenz/iwd-master/src/station.c:2285
#2 0x43936a in scan_finished() at /home/denkenz/iwd-master/src/scan.c:1709
#3 0x439495 in get_scan_done() at /home/denkenz/iwd-master/src/scan.c:1739
#4 0x4bdef5 in destroy_request() at /home/denkenz/iwd-master/ell/genl.c:676
#5 0x4c070b in l_genl_family_cancel() at /home/denkenz/iwd-master/ell/genl.c:1960
#6 0x437069 in scan_cancel() at /home/denkenz/iwd-master/src/scan.c:842
#7 0x41dc2e in station_roam_state_clear() at /home/denkenz/iwd-master/src/station.c:1594
#8 0x41dd2b in station_reset_connection_state() at /home/denkenz/iwd-master/src/station.c:1619
#9 0x41dea4 in station_disassociated() at /home/denkenz/iwd-master/src/station.c:1644
The happens because get_scan_done callback is still called as a result of
l_genl_cancel. Add a re-entrancy guard in the form of 'canceled'
variable in struct scan_request. If set, get_scan_done will skip invoking
scan_finished.
It isn't clear what 'l_queue_peek_head() == results->sr' check was trying
to accomplish. If GET_SCAN dump was scheduled, then it should be
reported. Drop it.
results->sr is set to NULL for 'opportunistic' scans which were
triggered externally. See scan_notify() for details. However,
get_scan_done would only invoke scan_finished (and thus the periodic
scan callback sc->sp.callback) only if the scan queue was empty. It
should do so in all cases.
The point type was being hard coded to 0x3 (BIT1) which may have resulted
in the peer subtracting Y from P when reading in the point (depending on
if Y was odd or not).
Instead set the compressed type to whatever avoids the subtraction which
both saves IWD from needing to do it, as well as the peer.
The intent of this check is to make sure that at least 2 bytes are
available for reading. However, the unintended consequence is that tags
with a zero length at the end of input would be rejected.
While here, rework the check to be more resistant to potential
overflow conditions.
First disconnect wpa_supplicant to make sure it wont miss frames if
it decides to connect. Also alter the order of things for the
configurator test so autoconnect doesn't start until after hostapd
is up (avoids additional scanning and delays)
The DPP spec says nothing about how to handle re-transmits but it
was found in testing this can happen relatively easily for a few
reasons.
If the configurator requests a channel switch but does not get onto
the new channel quick enough the enrollee may have already sent the
authenticate response and it was missed. Also by nature of how the
kernel goes offchannel there are moments in time between ROC when
the card is idle and not receiving any frames.
Only frames where there was no ACK will be retransmitted. If the
peer received the frame and dropped it resending the same frame wont
do any good.
Now the result is sent immediately. Prior a connect attempt or
scan could have started, potentially losing this frame. In addition
the offchannel operation is cancelled after sending the result
which will allow the subsequent connect or scan to happen much
faster since it doesn't have to wait for ROC to expire.
The previous (incorrect) else was removed since it ended up
printing in most cases since the if clause returned. This should
have been an else if conditional from the start and only print if the
station device was not found.
IWD may be in the middle of some long operation, e.g. scanning.
If the URI is returned before IWD is ready, a configurator could
start sending frames and IWD either wont receive them, or will
be unable to respond quickly.
Controlling wpa_supplicant/hostapd from a text based interface is
problematic in that there is no way of knowing if an event corresponds
to a request. In certain cases if wpa_s/hostapd is sending out multiple
events and we make a request, a random event may come back after the
request, but before the actual result.
To fix this, at least for this specific case, we can continue to read
from the socket until the result is numeric.