This simulates the conditions that trigger a free-after-use which was
fixed with:
2c355db7 ("scan: remove periodic scans from queue on abort")
This behavior can be reproduced reliably using this test with the above
patch reverted.
During investigation another separate crash was found. The original is
caused by a disconnect event coming in after a neighbor report scan
was completed (roam failed) during the full roam scan.
The second crash is caused by a disconnect coming in during a full
roam scan when no neighbor report scan was ever issued.
First disconnect wpa_supplicant to make sure it wont miss frames if
it decides to connect. Also alter the order of things for the
configurator test so autoconnect doesn't start until after hostapd
is up (avoids additional scanning and delays)
Controlling wpa_supplicant/hostapd from a text based interface is
problematic in that there is no way of knowing if an event corresponds
to a request. In certain cases if wpa_s/hostapd is sending out multiple
events and we make a request, a random event may come back after the
request, but before the actual result.
To fix this, at least for this specific case, we can continue to read
from the socket until the result is numeric.
Some wpa_cli utilities return some result which isn't possible to
get with wait_for_event unless you know what the result will be.
This adds wait_for_result which just returns the first event that
comes in.
wait_for_event was checking the event string presence in the rx_data
array which meant the event string had to match perfectly to any
received events. This poses problems with events that include additional
information which the caller may not be able to know or does not care
about. For example:
DPP-RX src=02:00:00:00:02:00 freq=2437 type=11
Waiting for this event previously would require the caller know src, freq,
and type. If the caller only wants to wait for DPP-RX, it can now do that.
Since a Device class can represent multiple modes (AP, AdHoc, station)
move StationDebug out of the init and only create this class when it
is used (presumably only when the device is in station mode).
The StationDebug class is now created in a property method consistent
with 'station_if'. If Device is not in station mode it is automatically
switched if the test tries any StationDebug methods.
If the Device mode is changed from 'station' the StationDebug class
instance is destroyed.
Passing the full argument list to StationDebug was removed
because any existing properties (for Device) were being
included and causing incorrect behavior.
This neglected to handle namespaces which should also be
passed to StationDebug. Unfortunately the arguments are not
named when Device() is initialized so they cannot easily be
sorted. Instead just define Device() arguments to match the
DBus abstraction and pass only the path and namespace to
StationDebug
Make sure we wipe the leases file both for server and client, so that
dhclient doesn't try to re-use leases from previous tests (should really
happen) and waste time waiting for a reply. Extend the timeout from 1s
to 5s, sometimes it takes dhclient 1s just to start. Disable verbose
mode if not needed to avoid dhclient stalling if the pipe is not being
read.
Passing *args, **kwargs into StationDebug ended up initializing the
class with Station properties since devices can be initialized from
existing property dictionaries. Since the object path is all
StationDebug needs, pass args[0] instead.
On some systems the default radvd pid file location is not accessible.
Specify it to be under /tmp instead.
While there, enable full radvd debug output so it is logged when
test-runner is invoked with the --log option.
A user reported a crash in situations where there was an OWE transition
pair, with an extra open network using the same SSID but not advertising
the OWE transition IE:
++++++++ backtrace ++++++++
0x7f199cadf320 in /lib64/libc.so.6
0x418c08 in network_has_open_pair() at /home/jprestwo/iwd/src/station.c:712
0x4262ce in scan_finished() at /home/jprestwo/iwd/src/scan.c:1718
0x4273cd in get_scan_done() at /home/jprestwo/iwd/src/scan.c:1733
0x47cf7a in destroy_request() at /home/jprestwo/iwd/ell/genl.c:674
0x479f1c in io_callback() at /home/jprestwo/iwd/ell/io.c:120
0x47922d in l_main_iterate() at /home/jprestwo/iwd/ell/main.c:472 (discriminator 2)
0x4792dc in l_main_run() at /home/jprestwo/iwd/ell/main.c:521
0x47950c in l_main_run_with_signal() at /home/jprestwo/iwd/ell/main.c:649
0x403e97 in main() at /home/jprestwo/iwd/src/main.c:532
0x7f199cac9b75 in /lib64/libc.so.6
+++++++++++++++++++++++++++
If a beacon is lost testAP will fail since it did not utilize any
rescanning logic. Now it can use this feature by passing full_scan.
This is required since IWD APs are not known to test-runner like
hostapd APs are.
Certain scenarios coupled with lost beacons could result in OrderedNetwork
being initialized many times until the dbus library reached its maximum
signal registrations. This could happen where there are two networks,
IWD finds one in a scan but continues to scan for the other and the beacons
are lost. The way get_ordered_networks was written it returns early if any
networks are found. Since get_ordered_network (not plural) uses
get_ordered_networks() in a loop this caused OrderedNetwork's to be created
rapidly until python raises an exception.
To fix this, pass an optional list of networks being looked for to
get_ordered_networks. Only if all the networks in the list are found will
it return early, otherwise it will continue to scan.
There was no open ssid provisioning file, which was fine as the
first test should have created one. But to be safe, include one
explicitly and use the proper setUp/tearDown functions.
By sleeping for 4 seconds IWD had plenty of time to fully disconnect
and reconnect in time to pass the final "connected" check. Instead
use wait_for_object_condition to wait for disconnected and expect
this to fail. This will let the test fail if IWD disconnects.
REKEY_GTK kicks off the GTK only handshake where REKEY_PTK does
both (via the 4-way). The way this utility was written was causing
hostapd some major issues since both REKEY_GTK and REKEY_PTK was
used.
Instead if address is set only do REKEY_PTK. This will also rekey
the GTK via the 4-way handshake.
If no address is set do REKEY_GTK which will only rekey the GTK.
The FT-over-DS test was allowed to fail as it stood. If FT-over-DS
failed it would just do a normal over-Air transition which satisfied
all the checks. To prevent this Authenticate frames are blocked after
the initial connection so if FT-over-DS fails there is no other way
to roam.
This adds several tests for OWE transition networks. Hostapd
does have special options for these networks but currently their
implementation is incorrect as the IE is not ever added to the
OWE BSS. Besides that using vendor_elements provides a much easier
way to create invalid IEs to test.
Since IWD tries group 20 first all other OWE tests are actually
triggering group negotiation where this test is not. Since this
code is exercised this test can be removed completely, as well
as the additional radio/network.
This test simulates the scenario where IWDs commit is not acked which
exposes a hostapd bug that ultimately fails the connection. This behavior
can be seen by reverting the commit which works around this issue:
"sae: don't send commit in confirmed state"
With the above patch applied this test should pass.
Note: The existing timeout test was reused as it was not of much use
anyways. All it did was block auth/assoc frames and expect a failure
which didn't exercise any SAE logic anyways.
This was a placeholder at one point but modules grew to depend on it
being a string. Fix these dependencies and set the root namespace
name to None so there is no more special case needed to handle both
a named namespace and the original 'root' namespace.
With various versions of wpa_supplicant tested, after an IWD GO tears
the group down, the wpa_supplicant P2P client will not immediately
signal that the group has disappeared but will at least wait for the
lost beacon signal, wait some more and try reconnecting and all that
takes it 10s or a little longer. Possibly sending Deauthenticate frames
to clients first would improve this.
mac80211_hwsim has a funny quirk with multiple addresses in
radios. Some operations require address index zero, some index
one. And these addresses (possibly a result of how test-runner
initializes radios) sometimes get mixed up. For example scan
results may show a BSS address as 02:00:00:00:00:00, while the
next test run shows 42:00:00:00:00:00.
Ultimately, sending out frames requires the first nibble of the
address to be 0x4 so to handle both variants of addresses described
above hwsim.py was updated to always bitwise OR the first byte
with 0x40.
The idea of this test is valid but it is extremely timing dependent
which simply isn't testable on all machines. Removing this test
at least until this can be tested reliably.
testHotspot suffered from improper cleanup and if a single test failed
all subsequent tests would fail due to IWD still running since IWD()
was never cleaned up.
In addition the PSK agent and hwsim rules are now set onto the cls
object and removed in tearDownClass()
There are really no cases where a test wants to remove a single
rule. Most loop through and remove rules individually so this
is being added as a convenience.
Certain autotests coupled with slower test machines can result in lost
beacons and "Network not found" errors. In attempt to help with this
the test can just rescan (30 seconds max) until the network is found.
Remove EAP-SIM from the generic PEAP test case since skipping
(if ofono is not on system) would skip the entire test rather
than just the EAP-SIM portion.
This tests all EAP methods in their standard configuration. Any
corner cases requiring changes to main.conf or other hostapd
options are not included and will be left as stand alone tests.
This was done because nearly all EAP tests are identical except
the IWD provisioning file and hostapd EAP users fine. The IWD
provisioning file can be swapped out as needed for each individual
test without actually restarting IWD. And the EAP users file can
simply be written to include every possible EAP method that
is supported.
The destructor was trying to do more than the scope of a destructor
by trying to handle this single case of hostapd being restarted.
Instead we can simply pass a keyword argument 'reinit' to the
constructor to tell it to reinitialize everything. And as for killing
hostapd this can be done in ungraceful_restart itself rather than
trying to handle it in the destructor.
This (hopefully) will make this test pass better on slower machines.
In addition the mechanism of copying over separate main.conf files
was changed (rather than echo'ing the option into /tmp/main.conf)
This addresses the TODO where HostapdCLI was creating separate
objects each time HostapdCLI was called. This was worked around
by manually setting the important members but instead the class
can be re-worked to act as somewhat of a singleton, per-config
at least.
If there is no HostapdCLI instance for a given config one is
created and initialized. Subsequent HostapdCLI calls (for the
same config) will be returned the same object rather than a
new one.
The hotspot ANQP delay test was setting a global delay on all
packets which had some unintended consequences. At the time this
was the only way of simulating the test scenario but now hwsim
supports prefix matching so only the ANQP request/response will
be delayed.
This test was accessing the subprocess object and calling terminate
which ends up causing issues with test-runners own process cleanup.
Instead kill() should be used.
Hostapd sometimes has trouble with specifying additional BSSs in
a single config file, at least in the test-runner environment.
Since all the BSS's specified were identical instead the test was
reworked to only have a single BSS and each subtest can connect
in its own unique way.
This test took quite a while to execute (~2 minutes on my machine)
because there was simply no other way to test this scenario but
waiting. Now the no-roam-candidates condition can be waited for
rather than just sleeping for 20 seconds. Additionally the default
RoamRetryInterval was being used which is 60 seconds. Instead
main.conf can set this to 5 seconds and really cut down on the time
to wait.
Part of a comment was also removed due to being incorrect. Even
with neighbor reports IWD still must scan, its just that the
scan is more limited and, in theory, faster.
get_ordered_network() now scans automatically and has been updated
to use the StationDebug.Scan() API rather than doing a full
dbus scan (unless full_scan = True). The frequencies to be scanned
are picked automatically based on the current hostapd status
(hidden behind ctx.hostapd.get_frequency()).
This changes all tests to use the default get_ordered_network behavior
rather than some custom or incorrect logic. Any use of
scan_if_needed=True has been removed since this is now the default.
Also any explicit scanning has been removed for tests which do not
require it (where the default behavior is good enough).
With the addition of connect_bssid/roam very few tests actually
require hwsim. Since hwsim can lead to problems with scan results
its best to have it off by default and have each test that needs
it explicitly turn it on.
Tests which previously turned it off have had that option removed.
Tests that do require hwsim still are vulnerable to scan result
problems, so for these tests beacon_int was added to the hostapd
config which seems to help with reliability somewhat.
There is a common block of code in nearly every test which is incorrect,
most likely a copy-paste from long ago. It goes something like:
wd.wait_for_object_condition(device, 'not obj.scanning')
device.scan()
wd.wait_for_object_condition(device, 'not obj.scanning')
network = device.get_ordered_network("ssid")
The problem here is that sometimes the scanning property does not get
updated fast enough before device.scan() returns, meaning get_ordered_network
comes up with nothing. Some tests pass scan_if_needed=True which 'fixes'
this but ends up re-scanning after the original scan finishes.
To put this to rest scan_if_needed is now defaulted to True, and no
explicit scan should be needed.
In addition the send_bss_transition call was updated to only send a
single BSS. By sending two BSS's IWD is left to pick whichever one
it wants which makes the test behavior undefined.
This will use the Roam() developer method to force a roam to
a certain BSS. This is particularly useful for any test requiring
roams that are not testing IWD's BSS selection logic. Rather than
creating hwsim rules, setting low RSSI values, and waiting for the
roam logic/scan to happen Roam() can be used to force the roam
logic immediately.
Several tests tests for connectivity with the expectation that it
will fail. This ends up taking 30+ seconds because testutil retries
3 times, each with a 10 second timeout. By passing expect_fail=True
this lowers the timeout to zero, and skips any retries.
After some code changes the FT-FILS AKM was no longer selectable
inside network_can_connect_bss. This normally shouldn't matter
since station ends up selecting the AKM explicitly, including
passing the fils_hint, but since the autotests only included
FT-FILS AKMs this caused the transition to fail with no available
BSS's.
To fix this the standard 8021x AKM was added to the hostapd
configs. This allows these BSS's to be selected when attempting
to roam, but since FT-FILS is the only other AKM it will be used
for the actual transition.
testScan was creating 10 separate hidden networks which
sometimes bogged down hostapd to the point that it would
not start up in time before test-runner's timeouts fired.
This appeared to be due to hostapd needing to create 10
separate interfaces which would sometimes fail with -ENFILE.
The test itself only needed two separate networks, so instead
the additional 8 can be completely removed.
The scan ranking logic was previously changed to be based off a
theoretical calculated data rate rather than signal strength.
For HT/VHT networks there are many data points that can be used
for this calculation, but non HT/VHT networks are estimated based
on a simple table mapping signal strengths to data rates.
This table starts at a signal strength of -65 dBm and decreases from
there, meaning any signal strengths greater than -65 dBm will end up
getting the same ranking. This poses a problem for 3/4 blacklisting
tests as they set signal strengths ranging from -20 to -40 dBm.
IWD will then autoconnect to whatever network popped up first, which
may not be the expected network.
To fix this the signal strengths were changed to much lower values
which ensures IWD picks the expected network.
Break up the SAE tests into two parts: testSAE and testSAE-AntiClogging
testSAE is simplified to only use two radios and a single phy managed
by hostapd. hostapd configurations are changed via the new 'set_value'
method added to hostapd utils. This allows forcing hostapd to use a
particular sae group set, or force hostapd to use SAE H2E/Hunting and
Pecking Loop for key derivation. A separate test for IKE Group 20 is no
longer required and is folded into connection_test.py
testSAE-AntiClogging is added with an environment for 5 radios instead
of 7, again with hostapd running on a single phy. 'sae_pwe' is used to
force hostapd to use SAE H2E or Hunting and Pecking for key derivation.
Both Anti-Clogging protocol variants are thus tested.
main.conf is added to both directories to force scan randomization off.
This seems to be required for hostapd to work properly on hwsim.
[General].APRange is now [IPv4].APAddressPool and the netmask is changed
from 23 to 27 bits to make the test correctly assert that only two
default-sized subnets are allowed by IWD simultaneously (default has
changed from 24 to 28 bits)
These test cases depend on setting up the existing hostapd instance to a
set of known addresses, which might be different from what test-runner
sets. During this time, any scans might result in the old and the new
addresses used by hostapd to be found in the scan results.
Fix that by using start_iwd=0 which tells test_runner that the test
wants to start iwd itself. This delays starting iwd until after the
setUpClass routine has been called and hostapd configured properly.
Also use more sensible rssi values for the 'non-preferred' bss.
Otherwise, ranking BSSes by throughput can confuse the test logic
since both BSSes are ranked the same and either can be picked by
autoconnect.
The FT-over-DS procedure now authenticates with multiple BSS's
upon connecting. This causes list_sta() to return our address for
any authenticated APs. It has now been changed to work with this
new behavior, as well as a check that the station fully connected
to the expected AP initially.
The AuthCenter was still not being fully cleaned up in these
tests. It was being stopped but there was still a reference being
held which prevented __del__ from being called.
This file was not included when testNetconfig was introduced
and is required. My system was working fine as it was in my
local tree but has been missing and not passing for others.
The cls object is part of the unittest framework and its lifespan
is out of test-runner's control. Setting objects into the cls
object sometimes keeps those objects around longer than desired.
Its best to unset anything set in cls when the test is tore down.
This test fails randomly, and it appears to be due to excessive
scanning. Historically most autotests start a dbus scan right
away. The problem is that most likely a periodic scan is already
ongoing, meaning the dbus scan gets queued. If a Connect() call
comes in (which it always does), the dbus scan gets delayed and will
trigger once connected, at a time the test is not expecting. This
can cause problems with any assumed timing as well as offchannel
frames.
This patch removes the explicit DBus scanning and instead uses
scan_if_needed with get_ordered_networks. The 'all_blacklisted_test'
was also modified to wait for scanning to complete after failing
to connect to all BSS's. This lets all the networks fully come
up (after being blocked by hwsim) and appear in scan results.
Every single roaming test had one of two problems with watching the
state change between roaming --> connected. Either the test used
wait_for_object_condition to wait for 'connected' which could allow
other states in between. Or it simply used an assert. The assert
wouldn't allow other state changes, but at the cost of potentially
failing due to IWD not having made it to the 'connected' state yet.
Now we have wait_for_object_change which takes two conditions:
initial (from_str) and expected (to_str). This API will not allow
any other conditions except these, and will wait for the expected
condition before continuing. This allows roaming test to reliably
wait for the roaming --> connected state change.
This is similar to wait_for_object_condition, but will not allow
any intermediate state changes between the initial and expected
conditions. This is useful for roaming tests when the expected
state change is 'connected' --> 'roaming' with no changes in
between.
This test occationally failed due to a badly timed DBus scan
triggering right when hwsim tried sending out the spoofed frame.
This caused mac80211_hwsim to reject CMD_FRAME when the timing
was just right.
Rather than always starting a DBus scan we can rely on periodic
scans and only DBus scan if there are no networks in IWD's list.
A scanning check was also added prior to sending out the frame
and if true we wait for not scanning. This is more paranoia than
anything.
Sometimes scan results can come in with a MAC address which
should be in the first index of addrs[] (42:xx:xx:xx:xx:xx).
This causes a failure to lookup the radio path.
There was also a failure path added if the radio cannot be
found rather than rely on DBus to fail with a None path.
The arguments to SendFrame were also changed to use the
ByteArray DBus type rather than python's internal bytearray.
This shouldn't have any effect, but its more consistent with
how DBus arguments should be used.
After recent changes fixing wait_for_object_condition it was accidentally
made to only work with classes, not other types of objects. Instead
create a minimal class to hold _wait_timed_out so it doesnt rely on
'obj' holding the boolean.
The testAPRoam autotest was silently failing on my machine until I
realized that my distribution hostapd (Arch Linux) is not built with
CONFIG_WNM_AP=y. Indeed, it is also disabled by default in upstream
hostapd. This resulted in the send_bss_transition() function of
hostapd.py silently failing. With this change, throw an exception in
case the BSS_TM_REQ command does not succeed to hopefully save others
the time of debugging this problem.
After the test-runner re-write many tests were left with
stale options that are no longer used at all. These were
periodically getting removed as changes were made to
individual tests, but its apparent now that a tree wide
removal was needed.
There were some major problems related to logging and process
output. Tests which required output from start_process would
break if used with '--log/--verbose'. This is because we relied
on 'communicate' to retrieve the process output, but Popen does
not store process output when stdout/stderr are anything other
than PIPE.
Intead, in the case of logging or outfiles, we can simply read
from the file we just wrote to.
For an explicit --verbose application we must handle things
slightly different. A keyword argument was added to Process,
'need_out' which will ensure the process output is kept
regardless of --log or --verbose.
Now a user should be able to use --log/--verbose without any
tests failing.
After the re-write this was broken and not noticed until
recently. The issue appeared to be that the GLib timeout
callback retained no context of local variables. Previously
_wait_timed_out was set as a class variable, but this was
removed so multiple IWD instances could work. Without
_wait_timed_out being a class variable the GLib timeout
setting it had no effect on the wait loop.
To fix this we can set _wait_timed_out on the object being
passed in. This is preserved in the GLib timeout callback
and setting it gets honored in the wait loop.
Tests netconfig with a static configuration, as well as tests ACD
functionality.
The test has two IWD radios which will eventually use the same IP.
One is configured statically, one will receive the IP via DHCP.
The static client sets its IP first and begins using it. Then the
DHCP client is started. Since ACD in a DHCP client is configured
to use its address indefinitely, the static client *should* give
up its address.
Certain classes were still using the default namespace. This
didn't matter yet since testAP was the only test using namespaces,
and the AP interface was the only one being used.
For an IWD station on a separate namespace all objects need to
be accessable, so the namespace is passed along to those as needed.
Due to timing this test sometimes does not pass because it was
just asserting on the device state rather than waiting for a
change. This generally worked but not always.
Both these tests create many radios which sometimes causes timing
problems when hwsim is running. Since hwsim is not required for
these tests we can disable it and increase test reliability.
When network namespaces are introduced there may be multiple
IWD class instances. This makes IWD.get_instance ambiguous
when namespaces are involved. iwd.py has been refactored to
not use IWD.get_instance, but testutil still needs it since
its purely based off interface names. Rather than remove it
and modify every test to pass the IWD object we can just
maintain the existing behavior for only the root namespace.
The agent path was generated based on the current time which
sometimes yielded duplicate paths if agents were created quickly
after one another. Instead a simple iterator removes any chance
of a duplicate path.