[denkenz@archdev ~]$ qemu-system-x86_64 --version
QEMU emulator version 9.0.1
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
QEMU now seems to complain that 'no-hpet' and 'no-acpi' command line
arguments are unrecognized.
If valgrind didn't report any issues, don't dump the logs. This
makes the test run a lot easier to look at without having to scroll
through pages of valgrind logs that provide no value.
If the test needs to do something very specific it may be useful to
prevent IWD from managing all the radios. This can now be done
by setting a "reserve" option in the radio settings. The value of
this should be something other than iwd, hostapd, or wpa_supplicant.
For example:
[rad1]
reserve=false
When HUP is received the IO read callback was never completing which
caused it to block indefinitely until waited for. This didn't matter
for most transient processes but for IWD, hostapd, wpa_supplicant
it would cause test-runner to hang if the process crashed.
Detecting a crash is somewhat hacky because we have no process
management like systemd and the return code isn't reliable as some
processes return non-zero under normal circumstances. So to detect
a crash the process output is being checked for the string:
"++++++++ backtrace ++++++++". This isn't 100% reliable obviously
since its dependent on how the binary is compiled, but even if the
crash itself isn't detected any test should still fail if written
correctly.
Doing this allows auto-tests to handle IWD crashes gracefully by
failing the test, printing the exception (event without debugging)
and continue with other tests.
Certain tests may require external processes to work
(e.g. testNetconfig) and if missing the test will just hang until
the maximum test timeout. Check in start_process if the exe
actually exists and if not throw an exception.
With the addition of DPP PKEX autotests some of the timeouts are
quite long and hit test-runners maximum timeouts. For UML we should
allow this since time-travel lets us skip idle waits. Move the test
timeout out of a global define and into the argument list so QEMU
and UML can define it differently.
Handling these events notifies hwsim of address changes for interface
creation/removal outside the initial namespace as well as address
changes due to scanning address randomization.
Interfaces that hwsim already knows about are still handled via
nl80211. But any interfaces not known when ADD/DEL_MAC_ADDR events
come will be treated specially.
For ADD, a dummy interface object will be created and added to the
queue. This lets the frame processing match the destination address
correctly. This can happen both for scan randomization and interface
creation outside of the initial namespace.
For the DEL event we handle similarly and don't touch any interfaces
found via nl80211 (i.e. have a 'name') but need to also be careful
with the dummy interfaces that were created outside the initial
namespace. We want to keep these around but scanning MAC changes can
also delete them. This is why a reference count was added so scanning
doesn't cause a removal.
For example, the following sequence:
ADD_MAC_ADDR (interface creation)
ADD_MAC_ADDR (scanning started)
DEL_MAC_ADDR (scanning done)
After adding prefix matching the rule structure contained allocated
memory which was not being cleaned up on exit if rules still
remained in the list (removing the rule via DBus was done correctly)
If a rule was disabled it would cause hwsim to not continue processing
frames using rules further in the queue. _Most_ tests only use one
rule so this shouldn't have changed their behavior but others which
use multiple rules may be effected and the tests have not been
running properly.
When PCI adapters are properly configured they should exist in the
vfio-pci system tree. It is assumed any devices configured as such
are used for test-runner.
This removes the need for a hw.conf file to be supplied, but still
is required for USB adapters. Because of this the --hw option was
updated to allow no value, or a file path.
subprocess.Popen's wait() method was overwritten to be non-blocking but
in certain circumstances you do want to wait forever. Fix this to allow
timeout=None, which calls the parent wait() method directly.
Since we use git ls-files to produce the list of all tests for -A, if
the source directory is owned by somebody other than root one might
get:
fatal: unsafe repository ('/home/balrog/repos/iwd' is owned by someone else)
To add an exception for this directory, call:
git config --global --add safe.directory /home/balrog/repos/iwd
Starting
/home/balrog/repos/iwd/tools/..//autotests/ threw an uncaught exception
Traceback (most recent call last):
File "/home/balrog/repos/iwd/tools/run-tests", line 966, in run_auto_tests
subtests = pre_test(ctx, test, copied)
File "/home/balrog/repos/iwd/tools/run-tests", line 814, in pre_test
raise Exception("No hw.conf found for %s" % test)
Exception: No hw.conf found for /home/balrog/repos/iwd/tools/..//autotests/
Mark args.testhome as a safe directory on every run.
The kernel will not let us test some scenarios of communication between
two hwsim radios (e.g. STA and AP) if they're in the same net namespace.
For example, when connected, you can't add normal IPv4 subnet routes for
the same subnet on two different interfaces in one namespace (you'd
either get an EEXIST or you'd replace the other route), you can set
different metrics on the routes but that won't fix IP routing. For
testNetconfig the result is that communication works for DHCP before we
get the inital lease but renewals won't work because they're unicast.
Allow hostapd to run on a radio that has been moved to a different
namespace in hw.conf so we don't have to work around these issues.
The --start option was directly passed to the kernel init parameter,
preventing any environment setup from happening.
Intead always use 'run-tests' as the init process but detect --start
and execute that binary/script once inside the environment.
In UML if any process dies while test-runner is waiting for the DBus
service or some socket to be available it will block forever. This
is due to the way the non_block_wait works.
Its not optimal but it essentially polls over some input function
until the conditions are met. And, depending on the input function,
this can cause UML to hang since it never has a chance to go idle
and advance the time clock.
This can be fixed, at least for services/sockets, by sleeping in
the input function allowing time to pass. This will then allow
test-runner to bail out with an exception.
This patch adds a new wait_for_service function which handles this
automatically, and wait_for_socket was refactored to behave
similarly.
This function was checking if the process object exists, which can
persist long after a process is killed, or dies unexpectedly. Check
that the actual PID exists by sending signal 0.
An earlier commit fixed several options but ended up breaking others. The
result_parent/monitor_parent options are hidden from the user and only meant
to be passed to the kernel but they relied on the fact that the underscore
was present, not a dash. This updates the argument to use a dash:
--result-parent
--monitor-parent
Fixes: 00e41eb0ff ("test-runner: Fix parsing for some arguments")
The new regex match update was actually matching way more than it should
have due to how python's 'match' API works. 'match' will return successfully
if zero or more characters match from the beginning of the string. In this
case we actually need the entire regex to match otherwise we start matching
all prefixes, for example:
"--verbose iwd" will match iwd, iwd-dhcp, iwd-acd, iwd-genl and iwd-tls.
Instead use re.fullmatch which requires the entire string to match the
regex.
Enabling this ends up dumping so much logging and, at least with namespaces,
seems to break the logger module and cause really weird behavior, worst of
which is that all processes start dumping to stdout.
This can still be enabled explicitly with --verbose iwd-rtnl, but is turned
off by default when --log is used.
Currently the parameter values reach run-tests by first being parsed by
runner.py's RunnerArgParser, then the resulting object members being
encoded as a commandline string, then as environment variables, then the
environment being converted to a python string list and passed to
RunnerCoreArgParser again. Where argument names (like --sub-tests) had
dashes, the object members had underscores (.sub_tests), this wasn't
taken into account when building the python string list from environment
variables so convert all underscores to dashes and hope that all the
names match now.
Additionally some arguments used nargs='1' or nargs='*' which resulted
in their python values becoming lists. They were converted back to command
line arguments such as: --sub_tests ['static_test.py'], and when parsed
by RunnerCoreArgParser again, the values ended up being lists of lists.
In all three cases it seems the actual user of the parsed value actually
expects a single string with comma-separated substrings in it so just drop
the nargs= uses.
This was lazily copied from UML but really made no sense in the context
of QEMU. First QEMU needs the virtfs option to define the mount tag and
in addition a 9p mount should be used rather than 'hostfs'.
The glob match was completely broken for --verbose because globs
are actually path matches, not generally for strings. Instead
match based on regular expressions.
First the verbose option was fixed to store it as an array as well
as write any list arguments into the kernel command line properly
(str() would include []). This has worked up until now because the
'in' keyword in python will work on strings just as well
as lists, for example:
>>> 'test' in 'this,is,a,test'
True
Then, the glob match was replaced with a regex match. Any exceptions
are caught and somewhat ignored (printed, but only seen with --debug).
This only guards against fatal exceptions from a user passing an
invalid expression.
This bit of code was throwing exceptions if a test cleaned up files that
test-runner was expecting to clean up. Specifically testHotspot swaps out
main.conf and PSK files many times. This led to the exception being thrown,
caught, and ignored but further on test-runner would print:
"File _X_ not cleaned up!"
Now the files will be checked if they exist before trying to remove it.
Similarly to ofono/phonesim allow tests to be skipped if wpa_supplicant
is not found on the system.
This required some changes to DPP/P2P where Wpas() should be called first
since this can now throw a SkipTest exception.
The Wpas class was also made to allow __del__ to be called without
throwing additional exceptions in case wpa_supplicant was not found.
If the user specifies the same parent directory for several outfiles
skip mounting since it already exists. For example:
--monitor /outfiles/monitor.txt --result /outfiles/result.txt
Inside the virtual environments /tmp is mounted as its own FS and not
taken from the host. This poses issues if any output files are directly
under /tmp since test-runner tries to mount the parent directory (/tmp).
The can be fixed by ensuring these output files are either not under
/tmp or at least one folder down the tree (e.g. /tmp/outputs/outfile.txt).
Now this requirement is enforced and test-runner will not start if any
output files parent directory is /tmp.
Usually the test home directory is a git repo somewhere e.g. under
/home. But if the home directory is located under /tmp this poses
a problem since UML remounts /tmp. To handle both cases mount
the home directory explicity.
Certain aspects of QEMU like mounting host directories may still require
root access but for UML this is not the case. To handle both cases first
check if SUDO_UID/GID are set and use those to obtain the actual users
ID's. Otherwise if running as non-root use the UID/GID of the user
directly.
This only posed a problem oddly if the kernel binary was in the same
directory as test-runner. Resolving the absolute path with the
argument parser resolves the issue.