Commit Graph

468 Commits

Author SHA1 Message Date
James Prestwood 1ecadc3952 test-runner: fix UML blocking on wait_for_socket/service
In UML if any process dies while test-runner is waiting for the DBus
service or some socket to be available it will block forever. This
is due to the way the non_block_wait works.

Its not optimal but it essentially polls over some input function
until the conditions are met. And, depending on the input function,
this can cause UML to hang since it never has a chance to go idle
and advance the time clock.

This can be fixed, at least for services/sockets, by sleeping in
the input function allowing time to pass. This will then allow
test-runner to bail out with an exception.

This patch adds a new wait_for_service function which handles this
automatically, and wait_for_socket was refactored to behave
similarly.
2022-06-24 18:11:10 -05:00
James Prestwood 3e5ce99e82 test-runner: make is_process_running more accurate
This function was checking if the process object exists, which can
persist long after a process is killed, or dies unexpectedly. Check
that the actual PID exists by sending signal 0.
2022-06-24 18:11:02 -05:00
James Prestwood c10ade711d test-runner: remove reference to missing class member
The print statement was referencing self.name, which doesn't exist. Use
self.args[0] instead.
2022-06-24 18:10:54 -05:00
James Prestwood d43ec1b014 test-runner: fix result/monitor options
An earlier commit fixed several options but ended up breaking others. The
result_parent/monitor_parent options are hidden from the user and only meant
to be passed to the kernel but they relied on the fact that the underscore
was present, not a dash. This updates the argument to use a dash:

--result-parent
--monitor-parent

Fixes: 00e41eb0ff ("test-runner: Fix parsing for some arguments")
2022-06-22 18:41:21 -05:00
James Prestwood 8f42507641 test-runner: fix matching with --verbose
The new regex match update was actually matching way more than it should
have due to how python's 'match' API works. 'match' will return successfully
if zero or more characters match from the beginning of the string. In this
case we actually need the entire regex to match otherwise we start matching
all prefixes, for example:

"--verbose iwd" will match iwd, iwd-dhcp, iwd-acd, iwd-genl and iwd-tls.

Instead use re.fullmatch which requires the entire string to match the
regex.
2022-06-22 18:39:41 -05:00
James Prestwood 679cea02af test-runner: exclude 'iwd-rtnl' from being enabled with --log
Enabling this ends up dumping so much logging and, at least with namespaces,
seems to break the logger module and cause really weird behavior, worst of
which is that all processes start dumping to stdout.

This can still be enabled explicitly with --verbose iwd-rtnl, but is turned
off by default when --log is used.
2022-06-22 18:37:15 -05:00
Andrew Zaborowski 00e41eb0ff test-runner: Fix parsing for some arguments
Currently the parameter values reach run-tests by first being parsed by
runner.py's RunnerArgParser, then the resulting object members being
encoded as a commandline string, then as environment variables, then the
environment being converted to a python string list and passed to
RunnerCoreArgParser again.  Where argument names (like --sub-tests) had
dashes, the object members had underscores (.sub_tests), this wasn't
taken into account when building the python string list from environment
variables so convert all underscores to dashes and hope that all the
names match now.

Additionally some arguments used nargs='1' or nargs='*' which resulted
in their python values becoming lists.  They were converted back to command
line arguments such as: --sub_tests ['static_test.py'], and when parsed
by RunnerCoreArgParser again, the values ended up being lists of lists.
In all three cases it seems the actual user of the parsed value actually
expects a single string with comma-separated substrings in it so just drop
the nargs= uses.
2022-06-22 15:56:01 -05:00
Andrew Zaborowski 1aa418d098 test-runner: Support iwd-rtnl as a --verbose value 2022-06-17 14:13:30 -05:00
James Prestwood b2ed779ce9 test-runner: fix testhome mounting for QEMU
This was lazily copied from UML but really made no sense in the context
of QEMU. First QEMU needs the virtfs option to define the mount tag and
in addition a 9p mount should be used rather than 'hostfs'.
2022-06-03 18:20:55 -05:00
James Prestwood db3d6a3652 test-runner: allow regex for verbose option
The glob match was completely broken for --verbose because globs
are actually path matches, not generally for strings. Instead
match based on regular expressions.

First the verbose option was fixed to store it as an array as well
as write any list arguments into the kernel command line properly
(str() would include []). This has worked up until now because the
'in' keyword in python will work on strings just as well
as lists, for example:

>>> 'test' in 'this,is,a,test'
True

Then, the glob match was replaced with a regex match. Any exceptions
are caught and somewhat ignored (printed, but only seen with --debug).
This only guards against fatal exceptions from a user passing an
invalid expression.
2022-06-03 18:20:48 -05:00
James Prestwood c36f94a15a test-runner: only remove /tmp files if they exist
This bit of code was throwing exceptions if a test cleaned up files that
test-runner was expecting to clean up. Specifically testHotspot swaps out
main.conf and PSK files many times. This led to the exception being thrown,
caught, and ignored but further on test-runner would print:

"File _X_ not cleaned up!"

Now the files will be checked if they exist before trying to remove it.
2022-06-03 11:59:13 -05:00
James Prestwood df46776046 auto-t: allow skipping tests is wpa_supplicant is not found
Similarly to ofono/phonesim allow tests to be skipped if wpa_supplicant
is not found on the system.

This required some changes to DPP/P2P where Wpas() should be called first
since this can now throw a SkipTest exception.

The Wpas class was also made to allow __del__ to be called without
throwing additional exceptions in case wpa_supplicant was not found.
2022-06-02 16:47:02 -05:00
James Prestwood 87bb9a42b5 test-runner: skip mounting duplicate folders
If the user specifies the same parent directory for several outfiles
skip mounting since it already exists. For example:

--monitor /outfiles/monitor.txt --result /outfiles/result.txt
2022-05-25 15:00:05 -05:00
James Prestwood 1e6773d2a7 test-runner: disallow result/monitor/log directly under /tmp
Inside the virtual environments /tmp is mounted as its own FS and not
taken from the host. This poses issues if any output files are directly
under /tmp since test-runner tries to mount the parent directory (/tmp).
The can be fixed by ensuring these output files are either not under
/tmp or at least one folder down the tree (e.g. /tmp/outputs/outfile.txt).

Now this requirement is enforced and test-runner will not start if any
output files parent directory is /tmp.
2022-05-25 15:00:05 -05:00
James Prestwood 78c918c2c1 test-runner: mount testhome rather than assume location
Usually the test home directory is a git repo somewhere e.g. under
/home. But if the home directory is located under /tmp this poses
a problem since UML remounts /tmp. To handle both cases mount
the home directory explicity.
2022-05-25 15:00:05 -05:00
James Prestwood 641f558b3d test-runner: remove root user requirement from log/monitor/result
Certain aspects of QEMU like mounting host directories may still require
root access but for UML this is not the case. To handle both cases first
check if SUDO_UID/GID are set and use those to obtain the actual users
ID's. Otherwise if running as non-root use the UID/GID of the user
directly.
2022-05-25 15:00:05 -05:00
James Prestwood 9353c7748b test-runner: resolve --kernel absolute path
This only posed a problem oddly if the kernel binary was in the same
directory as test-runner. Resolving the absolute path with the
argument parser resolves the issue.
2022-04-06 17:22:06 -05:00
James Prestwood 5453f71a7c test-runner: fix check for phonesim
This got changed to use which, but not updated to remove the
list argument.
2022-04-06 17:22:06 -05:00
James Prestwood cec5fab9b3 test-runner: allow decode failure when writing IO
If this fails, e.g. an invalid utf-8 character it should not cause
the test to fail.
2022-04-06 14:47:58 -05:00
James Prestwood 9edb196395 test-runner: fix ctrl-c for UML
The TIOCSTTY ioctl was not shared between UML and QEMU which prevented
any console input from making it into UML. This fixes that, and now
ctrl-c can be used to stop UML test execution.
2022-04-06 14:47:52 -05:00
James Prestwood 5710cc097b test-runner: Unify QEMU/UML logging code
The MountInfo tuple was changed to explicitly take a source string. This
is redundant for UML and system mounts since the fstype/source are the same,
but it allows QEMU to specify the '9p' fstype and use MountInfo rather than
calling mount() explicitly.

This also moves logging cleanup into _prepare_mounts so both UML and QEMU
can use it.
2022-04-06 14:47:45 -05:00
James Prestwood e70d7e0857 test-runner: write separators for transient processes
Many processes are not long running (e.g. hostapd_cli, ip, iw, etc)
and the separators written to log files don't show up for these which
makes debugging difficult. This is even true for IWD/Hostapd for tests
with start_iwd=0.

After writing separators for long running processes write them out for
any additional log files too.
2022-04-06 14:47:24 -05:00
James Prestwood f199e3f40d test-runner: move BarChart into utils.py 2022-04-06 14:47:18 -05:00
James Prestwood bc9dfee7cd test-runner: remove unused class members 2022-04-06 14:47:13 -05:00
James Prestwood e993503c4d test-runner: simplify start_iwd handling
Replace two separate if blocks to handle the default value
with fallback=True.
2022-04-06 14:47:06 -05:00
James Prestwood b731e121c9 test-runner: remove path_exists/find_binary
These are already implemented in the shutil module
2022-04-06 14:46:59 -05:00
James Prestwood c74ac94c31 test-runner: isolate Process/Namespace into utils
Way too many classes have a dependency on the TestContext class, in
most cases only for is_verbose. This patch removes the dependency from
Process and Namespace classes.

For Process, the test arguments can be parsed in the class itself which
will allow for this class to be completely isolated into its own file.

The Namespace class was already relatively isolated. Both were moved
into utils.py which makes 'run-tests' quite a bit nicer to look at and
more fitting to its name.
2022-04-06 14:46:10 -05:00
James Prestwood 6003ad1527 test-runner: add logging, monitor, and results to UML
This commonizes some mounting code between QEMU and UML to allow exporting
of files to the host environment. UML does this with a hostfs mount while
QEMU still uses 9p.

The common code sanitizing the inputs has been put into _prepare_outfiles
and _prepare_mounts was modified to take an 'extra' arugment containing
additional mount points.

The results and monitor parent directories are now passed into the environment
via arguments, and these are hidden from the help text (in addition to testhome)
2022-04-05 17:49:55 -05:00
James Prestwood 309760cbab test-runner: use type=os.path.abspath for argparse
This is a convenient type which automatically resolves the argument
to an absolute path. Use this for any arguments expected to be
paths.
2022-04-05 17:49:55 -05:00
James Prestwood 96e8c0a3ab test-runner: fix --help and unknown options
If --help or unknown options were supplied to test-runner python
would thrown a maximum recusion depth exception. This was due to
the way ArgumentParser was subclassed.

To fix this call ArgumentParser.__init__() rather than using the
super() method. And do this also for the RunnerCoreArgParse
subclass as well. In addition the namespace argument was removed
from parse_args since its not used, and instead supplied directly
to the parents parse_args method.
2022-04-05 13:33:44 -05:00
James Prestwood 01efa0171d test-runner: enable scripts to be executed as init
This enables CONFIG_BINFMT_SCRIPT which allows the init process to
interpret #! and execute the script rather than requiring a binary
for init.
2022-04-05 13:33:41 -05:00
James Prestwood 5ef196f74d test-runner: resolve absolute path for --start argument 2022-04-05 13:33:38 -05:00
James Prestwood 7b5f640931 hwsim: add NoVirtualInterface option to DBus API
This was available with --no-interface but no such option existed
for the DBus API.
2022-04-05 13:33:21 -05:00
James Prestwood 25db380833 test-runner: fix kernel panic on exit for UML
UML requires RB_POWER_OFF rather than RB_AUTOBOOT (Qemu) in order
to avoid a kernel panic from killing init.
2022-04-04 09:12:50 -05:00
James Prestwood 31b5275c1f auto-t: hostapd.py: use IO watch for hostapd events
With how fast UML is hostapd events were being sent out prior to
ever calling wait_for_event. Instead set an IO watch on the control
socket and cache all events as they come. Then, when wait_for_event
is called, it can reference this list. If the event is found any
older events are purged from the list.

The AP-ENABLED event needed a special case because hostapd gets
started before the IO watch can be registered. To fix this an
enabled property was added which queries the state directly. This
is checked first, and if not enabled wait_for_event continues normally.
2022-03-31 18:12:59 -05:00
James Prestwood 9e2b0e75b1 test-runner: add time-travel to kernel config
This lets UML work with time-travel[=inf-cpu] options.
2022-03-31 18:12:46 -05:00
James Prestwood b342dfd8d5 test-runner: don't kill dmesg after individual tests
This prevents any kernel logging from being available after the first
test is finished.
2022-03-31 18:12:43 -05:00
James Prestwood 5a14daf9b8 test-runner: use may_block=True for context iteration (and move location)
This allows the callers condition to be checked immediately without
the mainloop running. In addition may_block=True allows the mainloop
to poll/sleep rather than immediately return back to the caller. This
handles async IO much better than may_block=False, at least for our
use-case.
2022-03-31 18:12:40 -05:00
James Prestwood 54552db7ba test-runner: fix logging for namespaces and pre-test processes
Namespace process logs were appearing under 'ip' (and also overwriting
actual 'ip' logs) since they were executed with 'ip netns exec <namespace>'.
Instead special case this and append '-<namespace>' to the log file name.

In addition processes executed prior to any tests were being put under
a folder (name of testhome directory). Now this case is detected and these
logs are put at the top level log directory.
2022-03-31 18:12:37 -05:00
James Prestwood b5df2e27be test-runner: add initial UmlRunner implementation
This allows test-runner to run inside a UML binary which has some
advantages, specifically time-travel/infinite CPU speed. This should
fix any scheduler related failures we have on slower systems.

Currently this runner does not suppor the same features as the Qemu
runner, specifically:

 - No hardware passthrough
 - No logging/monitor (UML -> host mounting isn't implemented yet)
2022-03-31 18:12:34 -05:00
James Prestwood 2894f2e3eb test-runner: rename test-runner, add run-tests
In order to keep all test-runner dev scripts working and to work with
the new runner.py system some file renaming was required.

test-runner was renamed to run-tests
A new test-runner was added which only creates the Runner() class.
2022-03-31 18:12:31 -05:00
James Prestwood 8fa2b7de45 test-runner: remove environment specific code
This removes all the Qemu/environment related code as this has been
moved into runner.py.
2022-03-31 18:11:13 -05:00
James Prestwood e753e867f3 test-runner: Move environment setup into own module
This (as well as subsequent commits) will separate test-runner into two
parts:

1. Environment setup
2. Running tests

Spurred by interest in adding UML/host support, test-runner was in need
of a refactor to separate out the environment setup and actually running
the tests.

The environment (currently only Qemu) requires quite a bit of special
handling (ctypes mounting/reboot, 9p mounts, tons of kernel options etc)
which nobody writing tests should need to see or care about. This has all
been moved into 'runner.py'.

Running the tests (inside test-runner) won't change much.

The new 'runner.py' module adds an abstraction class which allows different
Runner's to be implemented, and setup their own environment as they see
fit. This is in preparation for UML and Host runners.
2022-03-31 18:11:09 -05:00
James Prestwood f97b53608d tools: add UML specific options to the kernel config 2022-03-30 15:25:53 -05:00
James Prestwood 8d5e64e90d tools: add some required options to kernel config
It looks like some architectures defconfig were adding these in
automatically, but not others. Explicitly add these to make sure
the kernel is built correctly.
2022-03-30 15:25:51 -05:00
Andrew Zaborowski 45f86d7148 test-runner: Replace exit with sys.exit
exit comes from the site module which is "useful for the interactive
interpreter shell and should not be used in programs."
(https://docs.python.org/3/library/constants.html#constants-added-by-the-site-module)
Replace with sys.exit().  I for an undefined error for exit in
exit_vm().
2022-03-30 14:43:49 -05:00
Andrew Zaborowski 83299ef6aa test-runner: Don't require SUDO_GID to be set for logs
Base the root user check on os.getuid() instead of SUDO_GID so as not to
implicitly require sudo.  SUDO_GID being set doesn't guarantee that the
effective user is root either since you can sudo to non-root accounts.
2022-03-30 14:43:46 -05:00
Andrew Zaborowski 0201cde7ce test-runner: Fix checks in exit_vm
We check that config is not None but then access config.ctx outside of
that if block anyway.  Then we do the same for config.ctx and
config.ctx.args.  Nest the if blocks for the checks to be useful.
2022-03-30 14:43:44 -05:00
James Prestwood 2e173d4523 test-runner: fix OOM issues (hopefully)
For quite a while test-runner has run into frequent OOM exceptions when
running many tests in a row. Its not completely known exactly why, but
seems to point to the 9p driver which is used for sharing the root fs
between the test-runner VM and the host.

With debugging enabled (-d) one can see the available memory available
relatively stable. If a test fails it may spike ~3-4kb but this quickly
recovers as python garbage collects.

At some point the kernel faults failing to allocate which (usually) is
shown by a python OOM exception. At this point there is plenty of
available memory.

Dumping the kernel trace its seen that the 9p driver is involved:

[  248.962949] test-runner: page allocation failure: order:7, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[  248.962958] CPU: 2 PID: 477 Comm: test-runner Not tainted 5.16.0 #91
[  248.962960] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
[  248.962961] Call Trace:
[  248.962964]  <TASK>
[  248.962965]  dump_stack_lvl+0x34/0x44
[  248.962971]  warn_alloc.cold+0x78/0xdc
[  248.962975]  ? __alloc_pages_direct_compact+0x14c/0x1e0
[  248.962979]  __alloc_pages_slowpath.constprop.0+0xbfe/0xc60
[  248.962982]  __alloc_pages+0x2d5/0x2f0
[  248.962984]  kmalloc_order+0x23/0x80
[  248.962988]  kmalloc_order_trace+0x14/0x80
[  248.962990]  v9fs_alloc_rdir_buf.isra.0+0x1f/0x30
[  248.962994]  v9fs_dir_readdir+0x51/0x1d0
[  248.962996]  ? __handle_mm_fault+0x6e0/0xb40
[  248.962999]  ? inode_security+0x1d/0x50
[  248.963009]  ? selinux_file_permission+0xff/0x140
[  248.963011]  iterate_dir+0x16f/0x1c0
[  248.963014]  __x64_sys_getdents64+0x7b/0x120
[  248.963016]  ? compat_fillonedir+0x150/0x150
[  248.963019]  do_syscall_64+0x3b/0x90
[  248.963021]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  248.963024] RIP: 0033:0x7fedd7c6d8c7
[  248.963026] Code: 00 00 0f 05 eb b7 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 81 a5 0f 00 f7 d8 64 89 02 48
[  248.963028] RSP: 002b:00007ffd06cd87e8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[  248.963031] RAX: ffffffffffffffda RBX: 000056090d87dd20 RCX: 00007fedd7c6d8c7
[  248.963032] RDX: 0000000000080000 RSI: 000056090d87dd50 RDI: 000000000000000f
[  248.963033] RBP: 000056090d87dd50 R08: 0000000000000030 R09: 00007fedc7d37af0
[  248.963035] R10: 00007fedc7d7d730 R11: 0000000000000293 R12: ffffffffffffff88
[  248.963038] R13: 000056090d87dd24 R14: 0000000000000000 R15: 000056090d0485e8

Here its seen an allocation of 512k is being requested (order:7), but faults.
In this run it there was ~35MB of available memory on the system.

Available Memory: 35268 kB
Last Test Delta: -2624 kB
Per-test Usage:
[  0] **        			37016
[  1] ********* 			41584
[  2] *         			36280
[  3] ********* 			41452
[  4] ********  			40940
[  5] ******    			39284
[  6] ****      			38348
[  7] ***       			37496
[  8] ****      			37892
[  9]           			35268

This can be reproduced by running all autotests (changing the ram down to
~128MB helps trigger it faster):

./tools/test-runner -k <kernel> -d

After many attempts to fix this it was finally found that simply removing the
explicit 9p2000.u version from the kernel command line 'fixed' the problem.
This even allows decreasing the RAM down to 256MB from 384MB and so far no
OOM's have been seen.
2022-03-28 12:38:15 -05:00
James Prestwood 6ada150026 test-runner: add memory usage for debugging
In debug mode the test context is printed before each test. This
adds some additional information in there:

Available Memory: /proc/meminfo: MemAvailable
Last Test Delta: Change in usage between current and last test
Per-test Usage: Graph of usage relative to all past tests. This is
                useful for seeing a trend down/up of usage.
2022-03-28 12:38:15 -05:00