Ideas
Ideas for improving the debuggability of the kernel that depend less on being able to reproduce the problem. Most of these ideas would be appropriate only for -g (debug) builds, or would run only during regression testing.
Kerdump
A tool for the regression framework to run before it starts any test, to capture, and possibly validate, internal kernel state. For example:
- dump the numbers of free and in-use timers, processes, threads, and other kernel objects
- memory usage stats
- validate vectors and lists
The output format would be simple-to-parse ASCII, collected by the regular regression framework.
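A minimal sketch of two Kerdump pieces, assuming hypothetical structures (kobj_counts, dlnode) rather than the kernel's real tables: a formatter that emits one "class free=N inuse=N" line per object class, and a validator for a NULL-terminated doubly linked list that checks every back pointer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical snapshot of kernel object counts; the real kernel's
 * tables and field names will differ. */
struct kobj_counts {
    const char *name;   /* object class, e.g. "timer" */
    unsigned    free;   /* entries on the free list */
    unsigned    inuse;  /* entries currently allocated */
};

/* Hypothetical doubly linked list node used to illustrate validation. */
struct dlnode {
    struct dlnode *next;
    struct dlnode *prev;
};

/* Emit one "class free=N inuse=N" line per object class into buf.
 * Returns the number of characters that would have been written
 * (snprintf semantics), so the caller can detect truncation. */
int kerdump_format(char *buf, size_t len,
                   const struct kobj_counts *c, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        size_t off = (size_t)total < len ? (size_t)total : len;
        int w = snprintf(buf + off, len - off, "%s free=%u inuse=%u\n",
                         c[i].name, c[i].free, c[i].inuse);
        if (w < 0)
            return -1;
        total += w;
    }
    return total;
}

/* Validate a NULL-terminated doubly linked list. Returns the node
 * count, or -1 if the head has a predecessor or any node's successor
 * does not point back at it. */
long dlist_validate(const struct dlnode *head)
{
    long count = 0;

    if (head != NULL && head->prev != NULL)
        return -1;
    for (const struct dlnode *p = head; p != NULL; p = p->next) {
        if (p->next != NULL && p->next->prev != p)
            return -1;          /* broken back pointer or cycle */
        count++;
    }
    return count;
}
```

Because the back pointer is checked before each step, a cycle is caught as soon as the walk would re-enter a node whose prev points elsewhere, so the walk terminates even on a corrupted list.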
Process/Thread Watchdog
- A debug mode where the kernel would report if specified processes or threads did not run within a specified time window. The report might be a shutdown, a kernel dump, or a log message.
- it may be useful to extend the "on" command to start processes under the watchdog
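A sketch of the watchdog bookkeeping, assuming a hypothetical scheduler hook (wd_note_run) called whenever a watched thread is dispatched, and a periodic check (wd_check) driven by the clock. All names, and the use of nanosecond timestamps passed in by the caller, are illustrative; entries could be added by the extended "on" command.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action to take when a watched thread misses its window. */
enum wd_action { WD_LOG, WD_DUMP, WD_SHUTDOWN };

struct wd_entry {
    int            pid;          /* process being watched */
    int            tid;          /* thread within that process */
    uint64_t       window_ns;    /* max allowed time between runs */
    uint64_t       last_run_ns;  /* updated by the scheduler hook */
    enum wd_action action;       /* what to do on expiry */
    int            fired;        /* report each stall only once */
};

/* Hypothetical scheduler hook: record that this thread was dispatched. */
void wd_note_run(struct wd_entry *e, uint64_t now_ns)
{
    e->last_run_ns = now_ns;
    e->fired = 0;
}

/* Periodic check (e.g. from a clock tick). Calls report() once for each
 * entry whose thread has not run within its window and returns how many
 * entries newly expired on this call. Assumes a monotonic clock. */
size_t wd_check(struct wd_entry *tab, size_t n, uint64_t now_ns,
                void (*report)(const struct wd_entry *))
{
    size_t expired = 0;
    for (size_t i = 0; i < n; i++) {
        struct wd_entry *e = &tab[i];
        if (!e->fired && now_ns - e->last_run_ns > e->window_ns) {
            e->fired = 1;
            expired++;
            if (report != NULL)
                report(e);       /* log, dump or shut down per e->action */
        }
    }
    return expired;
}
```

The fired flag keeps a stalled thread from being reported on every tick; the next dispatch re-arms it.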
Kernel Watchdog
- A watchdog to detect kernel hangs, for example INTR_LOCK() looping forever, excessive kernel-call preemption, or a processor hanging while trying to enter the kernel.
- Would require a hardware timer.
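One possible structure, sketched with hypothetical names: each CPU bumps a heartbeat counter whenever it makes forward progress in the kernel, and the hardware timer handler declares a hang if a CPU's counter has not moved for a given number of consecutive ticks. Where the heartbeat is bumped, and how the timer handler gets to run on a CPU spinning with interrupts disabled, are assumptions.

```c
#include <stdint.h>

#define WD_MAX_CPUS 8   /* hypothetical limit for this sketch */

struct kwd_state {
    /* Bumped by each CPU; a real kernel would use atomic increments. */
    volatile uint32_t heartbeat[WD_MAX_CPUS];
    uint32_t seen[WD_MAX_CPUS];           /* value at last timer tick */
    unsigned stalled_ticks[WD_MAX_CPUS];  /* consecutive ticks, no progress */
    unsigned ncpus;
    unsigned limit;                       /* ticks before declaring a hang */
};

/* Called by a CPU whenever it makes forward progress in the kernel,
 * e.g. on each kernel exit (the placement is an assumption). */
static inline void kwd_pet(struct kwd_state *s, unsigned cpu)
{
    s->heartbeat[cpu]++;
}

/* Called from the hardware timer interrupt, which must be able to fire
 * even when a CPU is stuck with normal interrupts masked (e.g. as an
 * NMI). Returns the first hung CPU number, or -1 if all are progressing. */
int kwd_tick(struct kwd_state *s)
{
    int hung = -1;
    for (unsigned cpu = 0; cpu < s->ncpus; cpu++) {
        uint32_t hb = s->heartbeat[cpu];
        if (hb != s->seen[cpu]) {
            s->seen[cpu] = hb;            /* progress: reset the count */
            s->stalled_ticks[cpu] = 0;
        } else if (++s->stalled_ticks[cpu] >= s->limit && hung < 0) {
            hung = (int)cpu;
        }
    }
    return hung;
}
```

Requiring several stalled ticks, rather than one, avoids false alarms from a single long but legitimate kernel operation.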
Black Box recorder
- A logging system that preserves its logs across a reboot (by analogy with the flight recorders used in aircraft crash investigations).
- This would require a chunk of memory that is not reinitialized on reboot, which may not be possible with some BIOSes.
- Arranging for every crashed regression run to save a kernel dump would be preferable.
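A sketch of the recorder itself, assuming a hypothetical struct blackbox placed in a RAM region the firmware leaves alone across a reboot. A magic number decides whether the previous contents are trusted; the marker value and region size are illustrative, and a real version would probably add a checksum over the header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BB_MAGIC 0x424C4B42u   /* "BLKB": hypothetical marker value */
#define BB_SIZE  256           /* log bytes; real size is board-specific */

/* Layout placed at a fixed address that is not cleared on reboot. */
struct blackbox {
    uint32_t magic;            /* BB_MAGIC if contents are believed valid */
    uint32_t head;             /* total bytes written; head % BB_SIZE is next */
    char     data[BB_SIZE];
};

/* At boot: return 1 if a log from the previous run survived, else reset
 * the region and return 0. */
int bb_init(struct blackbox *bb)
{
    if (bb->magic == BB_MAGIC)
        return 1;
    memset(bb, 0, sizeof *bb);
    bb->magic = BB_MAGIC;
    return 0;
}

/* Append a message, overwriting the oldest bytes when the ring is full. */
void bb_log(struct blackbox *bb, const char *msg)
{
    for (size_t i = 0; msg[i] != '\0'; i++) {
        bb->data[bb->head % BB_SIZE] = msg[i];
        bb->head++;
    }
}

/* Copy the surviving log, oldest byte first, into out (which must hold
 * at least BB_SIZE + 1 bytes). Returns the number of log bytes copied. */
size_t bb_read(const struct blackbox *bb, char *out)
{
    size_t n = bb->head < BB_SIZE ? bb->head : BB_SIZE;
    size_t start = bb->head - n;
    for (size_t i = 0; i < n; i++)
        out[i] = bb->data[(start + i) % BB_SIZE];
    out[n] = '\0';
    return n;
}
```

Early in boot, the kernel would call bb_init() and, if it returns 1, copy the old log out with bb_read() before logging anything new.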
Improve current CRASH and shutdown messages
- backtrace on CRASH message
- use a bigger stack dump in shutdown messages
- make shutdown() check the pointers it is about to dereference, so that a second fault does not truncate the shutdown output
- run -g kernels routinely
- automate capture and storage of kernel dumps and .sym files for every crash. Archive them and host them on a web server.
- Heck, why not have the regression framework run gdb on crash?
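A sketch combining the backtrace and pointer-checking ideas, assuming frame pointers are enabled, the stack grows downward, and each frame record holds the saved frame pointer followed by the return address (common, but architecture-specific). ptr_ok and safe_backtrace are hypothetical names; the caller supplies the stack bounds.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed frame record: saved frame pointer, then return address. */
struct frame {
    const struct frame *prev;   /* caller's frame */
    uintptr_t           ret;    /* return address into the caller */
};

/* Check a pointer before dereferencing it: non-NULL, aligned, and the
 * whole object inside [lo, hi). */
static int ptr_ok(const void *p, size_t size, size_t align,
                  uintptr_t lo, uintptr_t hi)
{
    uintptr_t a = (uintptr_t)p;
    return p != NULL && a % align == 0 && a >= lo && a < hi &&
           hi - a >= size;
}

/* Walk the frame-pointer chain from fp, storing up to max return
 * addresses in out. Stops, instead of faulting, at the first frame that
 * is outside [lo, hi), misaligned, or not above the previous one, so a
 * corrupted stack truncates only the backtrace. Returns the count. */
size_t safe_backtrace(const struct frame *fp, uintptr_t lo, uintptr_t hi,
                      uintptr_t *out, size_t max)
{
    size_t n = 0;
    while (n < max &&
           ptr_ok(fp, sizeof *fp, _Alignof(struct frame), lo, hi)) {
        out[n++] = fp->ret;
        if ((uintptr_t)fp->prev <= (uintptr_t)fp)
            break;   /* stack grows down: callers must live higher */
        fp = fp->prev;
    }
    return n;
}
```

The monotonic-address check also guarantees the walk terminates, since a looping chain would have to point back down the stack.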
libmod_regress
- a great-sounding name for which we don't have a clear list of ideas yet
- It would be a way to optionally add intrusive debugging, logging, and fault-insertion code to the kernel without littering it with #ifdefs.
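One way to avoid #ifdefs, sketched with hypothetical names: the kernel carries a table of hook pointers that are NULL by default, and a libmod_regress module installs fault-insertion functions at load time. Here malloc stands in for the kernel allocator, and the "fail every Nth call" policy is just an example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook table. The kernel ships with every hook NULL; the
 * module fills them in, so each call site costs one pointer test. */
struct regress_hooks {
    int (*fail_alloc)(size_t size);   /* nonzero: pretend allocation fails */
};

struct regress_hooks regress_hooks;   /* zero-initialized: hooks off */

/* Example kernel call site: an allocator with a fault-insertion point. */
int kobj_alloc(size_t size, void **out)
{
    if (regress_hooks.fail_alloc != NULL && regress_hooks.fail_alloc(size))
        return ENOMEM;                /* injected failure */
    *out = malloc(size);
    return *out != NULL ? 0 : ENOMEM;
}

/* What the module might install: fail every Nth allocation so rarely
 * taken error paths get exercised during regression runs. */
static unsigned fail_every = 3, alloc_calls;

static int fail_every_nth(size_t size)
{
    (void)size;
    return ++alloc_calls % fail_every == 0;
}

/* Hypothetical module entry point. */
void regress_module_load(void)
{
    regress_hooks.fail_alloc = fail_every_nth;
}
```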
Misc
- Save the source-code line number whenever we set errno in the kernel or process manager, so we can tell which error path produced that mysterious SIGBUS.
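A sketch using a macro, since only a macro can capture the caller's __FILE__ and __LINE__. SET_ERRNO, set_errno_at, and last_err are hypothetical names; in the kernel the record would presumably live in per-thread state rather than a global.

```c
#include <stdio.h>

/* Hypothetical record of where an error was last set. */
struct err_origin {
    int         err;    /* errno value */
    const char *file;   /* __FILE__ of the setter */
    int         line;   /* __LINE__ of the setter */
};

static struct err_origin last_err;

/* Use instead of a bare assignment at every error return. The macro
 * expands at the call site, so it records the caller's location. */
#define SET_ERRNO(e) set_errno_at((e), __FILE__, __LINE__)

static int set_errno_at(int err, const char *file, int line)
{
    last_err.err = err;
    last_err.file = file;
    last_err.line = line;
    return -1;   /* conventional failure return */
}

/* Format the origin for a CRASH/shutdown message or a debugger session. */
int err_origin_format(char *buf, size_t len)
{
    return snprintf(buf, len, "errno %d set at %s:%d",
                    last_err.err, last_err.file, last_err.line);
}
```

Since __FILE__ is a string literal, storing the pointer costs nothing at runtime beyond the two extra stores.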
Discussion