In order to keep all test-runner dev scripts working and to work with
the new runner.py system some file renaming was required.
test-runner was renamed to run-tests
A new test-runner was added which only creates the Runner() class.
This (as well as subsequent commits) will separate test-runner into two
parts:
1. Environment setup
2. Running tests
Spurred by interest in adding UML/host support, test-runner was in need
of a refactor to separate out the environment setup and actually running
the tests.
The environment (currently only Qemu) requires quite a bit of special
handling (ctypes mounting/reboot, 9p mounts, tons of kernel options etc)
which nobody writing tests should need to see or care about. This has all
been moved into 'runner.py'.
Running the tests (inside test-runner) won't change much.
The new 'runner.py' module adds an abstraction class which allows different
Runner's to be implemented, and setup their own environment as they see
fit. This is in preparation for UML and Host runners.
Any test using assertTrue(hostapd.list_sta()) improperly has been
replaced with wait_for_event(). There were a few places where this
was actually ok (i.e. IWD is already connected) but most needed to
be changed since the check was just after IWD connected and hostapd's
list_sta() API may not return a fully updated list.
- Setting the IP address was resulting in an error:
Error: any valid prefix is expected rather than "wln58".
This is fixed by reordering the arguments with the IP address first
- Remove the sleep, and use non_block_wait to wait for the IPv6 address
to be set.
Before setting the address, wait for the interface to go down. This
fixes somewhat rare cases where setting the address returns -EBUSY
and ultimately breaks the neighbor reports.
All tests which could avoid calling scan() directly have been
changed to use the 'full_scan' argument to get_ordered_network.
This was done because of unreliable scanning behavior on slower
systems, like VMs. If we get unlucky with the scheduler some beacons
are not received in time and in turn scan results are missing.
Using full_scan=True works around this issue by repeatedly scanning
until the SSID is found.
It looks like some architectures defconfig were adding these in
automatically, but not others. Explicitly add these to make sure
the kernel is built correctly.
Base the root user check on os.getuid() instead of SUDO_GID so as not to
implicitly require sudo. SUDO_GID being set doesn't guarantee that the
effective user is root either since you can sudo to non-root accounts.
We check that config is not None but then access config.ctx outside of
that if block anyway. Then we do the same for config.ctx and
config.ctx.args. Nest the if blocks for the checks to be useful.
p2p_peer_update_existing may be called with a scan_bss struct built from
a Probe Request frame so it can't access bss->p2p_probe_resp_info even
if peer->bss was built from a Probe Response. Check the source frame
type of the scan_bss struct before updating the Device Address.
This fixes one timing issue that would make the autotest fail often.
Since l_malloc is used the frame contents are not zero'ed automatically
which could result in random bytes being present in the frame which were
expected to be zero. This poses a problem when calculating the MIC as the
crypto operations are done on the entire frame with the expectation of
the MIC being zero.
Fixes: 83212f9b23 ("eapol: change eapol_create_common to support FILS")
explicit_bzero is used in src/storage.c since commit
01cd858760 but src/missing.h is not
included, as a result build with uclibc fails on:
/home/buildroot/autobuild/instance-0/output-1/host/lib/gcc/powerpc-buildroot-linux-uclibc/10.3.0/../../../../powerpc-buildroot-linux-uclibc/bin/ld: src/storage.o: in function `storage_init':
storage.c:(.text+0x13a4): undefined reference to `explicit_bzero'
Fixes:
- http://autobuild.buildroot.org/results/2aff8d3d7c33c95e2c57f7c8a71e69939f0580a1
When configuring wpa_supplicant all we care about is that it
received the configuration object. wpa_supplicant takes quite a bit
of time to connect in some cases so waiting for that is unneeded.
This also increases the DPP timeout which may be required on slower
systems or if the timing is particularly unlucky when receiving
frames.
This is used to hold the current BSS frequency which will be
used after IWD receives a presence announcement. Since this was
not being set, the logic was always thinking there was a channel
mismatch (0 != current_freq) and attempting to go offchannel to
'0' which resulted in -EINVAL, and ultimately protocol termination.
Change a few critical checks that were failing sometimes:
- A few asserts were changed to wait_for_object_condition
- A 15 second timeout was removed (default used instead)
- Do a full scan at beginning of each test to clear any
cached BSS's. The second test run was getting stale results
and the RSSI values were not expected.
This was not being properly honored when existing networks were
already populated. This poses an issue for any test which uses
full_scan after setting radio values such as signal strength.
For quite a while test-runner has run into frequent OOM exceptions when
running many tests in a row. Its not completely known exactly why, but
seems to point to the 9p driver which is used for sharing the root fs
between the test-runner VM and the host.
With debugging enabled (-d) one can see the available memory available
relatively stable. If a test fails it may spike ~3-4kb but this quickly
recovers as python garbage collects.
At some point the kernel faults failing to allocate which (usually) is
shown by a python OOM exception. At this point there is plenty of
available memory.
Dumping the kernel trace its seen that the 9p driver is involved:
[ 248.962949] test-runner: page allocation failure: order:7, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[ 248.962958] CPU: 2 PID: 477 Comm: test-runner Not tainted 5.16.0 #91
[ 248.962960] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
[ 248.962961] Call Trace:
[ 248.962964] <TASK>
[ 248.962965] dump_stack_lvl+0x34/0x44
[ 248.962971] warn_alloc.cold+0x78/0xdc
[ 248.962975] ? __alloc_pages_direct_compact+0x14c/0x1e0
[ 248.962979] __alloc_pages_slowpath.constprop.0+0xbfe/0xc60
[ 248.962982] __alloc_pages+0x2d5/0x2f0
[ 248.962984] kmalloc_order+0x23/0x80
[ 248.962988] kmalloc_order_trace+0x14/0x80
[ 248.962990] v9fs_alloc_rdir_buf.isra.0+0x1f/0x30
[ 248.962994] v9fs_dir_readdir+0x51/0x1d0
[ 248.962996] ? __handle_mm_fault+0x6e0/0xb40
[ 248.962999] ? inode_security+0x1d/0x50
[ 248.963009] ? selinux_file_permission+0xff/0x140
[ 248.963011] iterate_dir+0x16f/0x1c0
[ 248.963014] __x64_sys_getdents64+0x7b/0x120
[ 248.963016] ? compat_fillonedir+0x150/0x150
[ 248.963019] do_syscall_64+0x3b/0x90
[ 248.963021] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 248.963024] RIP: 0033:0x7fedd7c6d8c7
[ 248.963026] Code: 00 00 0f 05 eb b7 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 81 a5 0f 00 f7 d8 64 89 02 48
[ 248.963028] RSP: 002b:00007ffd06cd87e8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[ 248.963031] RAX: ffffffffffffffda RBX: 000056090d87dd20 RCX: 00007fedd7c6d8c7
[ 248.963032] RDX: 0000000000080000 RSI: 000056090d87dd50 RDI: 000000000000000f
[ 248.963033] RBP: 000056090d87dd50 R08: 0000000000000030 R09: 00007fedc7d37af0
[ 248.963035] R10: 00007fedc7d7d730 R11: 0000000000000293 R12: ffffffffffffff88
[ 248.963038] R13: 000056090d87dd24 R14: 0000000000000000 R15: 000056090d0485e8
Here its seen an allocation of 512k is being requested (order:7), but faults.
In this run it there was ~35MB of available memory on the system.
Available Memory: 35268 kB
Last Test Delta: -2624 kB
Per-test Usage:
[ 0] ** 37016
[ 1] ********* 41584
[ 2] * 36280
[ 3] ********* 41452
[ 4] ******** 40940
[ 5] ****** 39284
[ 6] **** 38348
[ 7] *** 37496
[ 8] **** 37892
[ 9] 35268
This can be reproduced by running all autotests (changing the ram down to
~128MB helps trigger it faster):
./tools/test-runner -k <kernel> -d
After many attempts to fix this it was finally found that simply removing the
explicit 9p2000.u version from the kernel command line 'fixed' the problem.
This even allows decreasing the RAM down to 256MB from 384MB and so far no
OOM's have been seen.
In debug mode the test context is printed before each test. This
adds some additional information in there:
Available Memory: /proc/meminfo: MemAvailable
Last Test Delta: Change in usage between current and last test
Per-test Usage: Graph of usage relative to all past tests. This is
useful for seeing a trend down/up of usage.
This could fail and was not being checked. It was minimally changed to
take the ifindex directly (this was the only thing needed from the ethdev)
which allows checking prior to initializing the ethdev.
Running the tests inside a VM makes it difficult for the host to figure
out if the test actually failed or succeeded. For a human its easy to
read the results table, but for an automated system parsing this would
be fragile. This adds a new option --result <file> which writes PASS/FAIL
to the provided file once all tests are completed. Any failures results in
'FAIL' being written to the file.
\e[1;30m is bold black, often, but not always displayed bright black or
bold bright black. In case it is displayed as real black it is
invisible. \e[1;90m is explicit bold bright black.
\e[37m is white, therefore it is not suitable to be labeled as GREY,
which is \e[90m
The logic here assumed any BSS's in the roam scan were identical to
ones in station's bss_list with the same address. Usually this is true
but, for example, if the BSS changed frequency the one in station's
list is invalid.
Instead when a match is found remove the old BSS and re-insert the new
one.
With the addition of 6GHz '6000' is no longer the maximum frequency
that could be in .known_network.freq. For more robustness
band_freq_to_channel is used to validate the frequency.
Scanning while in AP mode is somewhat of an edge case, but it does
have some usefulness specifically with onboarding new devices, i.e.
a new device starts an AP, a station connects and provides the new
device with network credentials, the new device switches to station
mode and connects to the desired network.
In addition this could be used later for ACS (though this is a bit
overkill for IWD's access point needs).
Since AP performance is basically non-existant while scanning this
feature is meant to be used in a limited scope.
Two DBus API's were added which mirror the station interface: Scan and
GetOrderedNetworks.
Scan is no different than the station variant, and will perform an active
scan on all channels.
GetOrderedNetworks diverges from station and simply returns an array of
dictionaries containing basic information about networks:
{
Name: <ssid>
SignalStrength: <mBm>
Security: <psk, open, or 8021x>
}
Limitations:
- Hidden networks are not supported. This isn't really possible since
the SSID's are unknown from the AP perspective.
- Sharing scan results with station is not supported. This would be a
convenient improvement in the future with respect to onboarding new
devices. The scan could be performed in AP mode, then switch to
station and connect immediately without needing to rescan. A quick
hack could better this situation by not flushing scan results in
station (if the kernel retains these on an iftype change).