Adaptive Partitioning Scheduler: Requirements#



What is it?#


It's a fair-share thread scheduler that guarantees groups of threads a user-specified percentage of CPU time when the system is overloaded. When the system is sufficiently underloaded, it schedules threads strictly by priority and therefore maintains realtime behavior. Even when overloaded, it provides realtime latencies to an engineerable set of critical threads.

We call a group of threads working for the same purpose a purpose group. Nominally, a purpose group is composed of the threads in a user-specified set of processes. But while a server thread does work on behalf of a client thread, the server thread is considered a temporary member of the client's purpose group. CPU usage and guarantees are tracked by purpose group.
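
The temporary-membership rule can be sketched in a few lines of C. This is only an illustration: the types `purpose_group_t` and `thread_t` and the functions `server_receive`, `server_reply` and `bill_tick` are hypothetical names, not the kernel's real structures. It also assumes that a server which is itself working for a client passes that client's group on to the next server.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the kernel's real structures. */
typedef struct {
    int id;
    unsigned long billed_ticks;    /* cpu time charged to this purpose group */
} purpose_group_t;

typedef struct {
    purpose_group_t *home;         /* group of the thread's own process */
    purpose_group_t *working_for;  /* client's group while serving, else NULL */
} thread_t;

/* The group that is billed for this thread right now. */
static purpose_group_t *effective_group(const thread_t *t)
{
    assert(t != NULL && t->home != NULL);
    return t->working_for != NULL ? t->working_for : t->home;
}

/* Server receives a client's message: it temporarily joins the client's group.
 * Assumption: membership is passed on transitively through chained servers. */
static void server_receive(thread_t *server, const thread_t *client)
{
    server->working_for = effective_group(client);
}

/* Server replies: it returns to its own group. */
static void server_reply(thread_t *server)
{
    server->working_for = NULL;
}

/* Charge one timeslice to whichever group the thread is working for. */
static void bill_tick(thread_t *running)
{
    effective_group(running)->billed_ticks++;
}
```

With this model, ticks a server consumes between `server_receive` and `server_reply` are charged to the client's group, and all other ticks are charged to the server's own group.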

Because competing schedulers use the word "partition" to mean a "group of threads" being scheduled together, the external name for our scheduler is the adaptive partition scheduler, to emphasise the variable membership of our partitions. Purpose group and adaptive partition are interchangeable terms.


High Level Requirements#


System Considerations #

The user's interface#

Operating principles#


Limitations, Design considerations and Questions #


Membership and inheritance#

Scheduling #

Accounting #

Security and Logging #

Bankruptcy Handling#

It's pretty much described by the Handling Bankruptcy section of sys/sched_aps.h:

 /* Handling Bankruptcy  
 * ===================
 *
 * Bankruptcy is when critical cpu time billed to a partition exceeds its critical budget. Bankruptcy is
 * always considered to be a design error on the part of the application, but the system's response is configurable.
 *
 *
 * If the system is not declaring bankruptcy when you expect it, note that bankruptcy can only be declared if critical
 * time is billed to your partition. Critical time is billed on those timeslices when these four conditions are
 * all met:
 *      1. the running partition has a critical budget greater than zero
 *      2. the top thread in the partition is marked as running critical, or has received the critical state by
 *         receiving a SIG_INTR, a sigevent marked as critical, or a message from a critical thread.
 *      3. the running partition must be out of percentage-cpu budget
 *      4. there must be at least one other partition that is competing for cpu time.
 *
 *      And then only if the billed critical time exceeds a partition's critical budget will the system declare bankruptcy.
 *
 * 
 * When the system detects bankruptcy it will always: 
 *
 * 1. cause that partition to be out of budget for the remainder of the current scheduling window. 
 * 2. If the user has set a sigevent for notify_bankrupcty with SCHED_APS_ATTACH_EVENTS, deliver the event.
 *    This occurs at most once per call to SCHED_APS_ATTACH_EVENTS.
 *
 * In addition the following responses are configurable. QNX recommends using SCHED_APS_BNKR_RECOMMENDED.
 */ 
 #define SCHED_APS_BNKR_BASIC            0x00000000
 /* This causes delivery of bankruptcy notification events and makes the partition out-of-budget for the rest of
 * the scheduling window (nominally 100ms). This is the default.  
 */ 
 
 #define SCHED_APS_BNKR_CANCEL_BUDGET 0x00000001
 /* Causes the system to set the offending partition's critical budget to zero, which forces the partition to
  * be scheduled by its percentage cpu budget only. That also means that a second bankruptcy cannot occur.
  * This persists until a restart or until SCHED_APS_MODIFY_PARTITION is called to set a new critical budget.
  */
 
 #define SCHED_APS_BNKR_LOG                      0x00000002
 /* Causes the system to log the occurrence of bankruptcy. To prevent a flood of logs, contiguous bankruptcies
  * which occur while the same process is running will be logged once.
  * 
  * NOTE: For now, this is logged only to the system console output. Output to slogger is scheduled for a later 
  * release.
  */ 
 
 #define SCHED_APS_BNKR_REBOOT           0x00000004
 /* The most severe response, suggested for use while testing a product to make sure bankruptcies will never be
  * ignored. Causes the system to crash with a brief message identifying the offending partition. Not recommended
  * for field use.
  */ 
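
The four billing conditions and the bankruptcy test in the header comment can be read as a simple predicate. The following is a minimal sketch under stated assumptions: `partition_state_t`, `bills_critical_time` and `bill_timeslice` are hypothetical names, time is tracked in whole milliseconds, and nothing here is the scheduler's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the running partition; field names are illustrative. */
typedef struct {
    unsigned critical_budget_ms;   /* configured critical budget (condition 1) */
    unsigned critical_billed_ms;   /* critical time billed so far */
    bool     top_thread_critical;  /* top thread is running critical (condition 2) */
    bool     out_of_cpu_budget;    /* percentage-cpu budget exhausted (condition 3) */
} partition_state_t;

/* True only when all four conditions from the header comment hold. */
static bool bills_critical_time(const partition_state_t *p,
                                bool other_partition_competing)
{
    assert(p != NULL);
    return p->critical_budget_ms > 0
        && p->top_thread_critical
        && p->out_of_cpu_budget
        && other_partition_competing;  /* condition 4 */
}

/* Bill one timeslice. Returns true if the partition is now bankrupt, i.e.
 * the billed critical time exceeds (not merely equals) the critical budget. */
static bool bill_timeslice(partition_state_t *p, unsigned slice_ms,
                           bool other_partition_competing)
{
    if (!bills_critical_time(p, other_partition_competing))
        return false;
    p->critical_billed_ms += slice_ms;
    return p->critical_billed_ms > p->critical_budget_ms;
}
```

In this model a partition with a zero critical budget, or one that runs with no competing partition, is never billed critical time and so can never go bankrupt.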

Bankruptcy notifications are throttled. There is only one notification per partition per registration, i.e. users must re-register to get another notification. If an offending partition is in a tight loop, the maximum rate of bankruptcy notification is determined by how quickly the thread to be informed can receive a notification and register for another one.
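
The one-notification-per-registration rule behaves like a one-shot latch. A minimal sketch, assuming a hypothetical `bankruptcy_notifier_t` per partition (the names below are illustrative and not part of sys/sched_aps.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-partition notification slot; names are illustrative. */
typedef struct {
    bool     armed;      /* true between a registration and the next delivery */
    unsigned delivered;  /* number of notifications actually delivered */
} bankruptcy_notifier_t;

/* Stands in for the user registering an event with SCHED_APS_ATTACH_EVENTS. */
static void notifier_register(bankruptcy_notifier_t *n)
{
    assert(n != NULL);
    n->armed = true;
}

/* Called on every bankruptcy; delivers at most once per registration. */
static bool notifier_on_bankruptcy(bankruptcy_notifier_t *n)
{
    assert(n != NULL);
    if (!n->armed)
        return false;    /* throttled: nobody has re-registered yet */
    n->armed = false;    /* disarm until the next registration */
    n->delivered++;
    return true;
}
```

However many bankruptcies occur between two registrations, only the first one produces a notification, so the receiving thread sets the maximum notification rate.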

An exception is if the user selects the SCHED_APS_BNKR_CANCEL_BUDGET policy. This causes the bankrupt partition's critical budget to be set to zero on its first bankruptcy. Because the critical budget is set to zero, no further bankruptcies can occur until the user uses SCHED_APS_MODIFY_PARTITION to set a new non-zero critical budget. However, to let the whole system stabilize after such an event, the scheduler gives *all* partitions a two-window grace period against declaring bankruptcy when one partition gets its budget cancelled.
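
The cancel-budget behavior and the grace period might be modelled as follows. This is a sketch, not the scheduler's implementation: `scheduler_t`, `partition_t`, `on_overrun` and `on_window_end` are hypothetical, the flag value is copied from the header excerpt above, and counting the grace period as two calls to `on_window_end` is an assumption about how "two-window" is measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value as quoted from sys/sched_aps.h above; redefined so the sketch stands alone. */
#define SCHED_APS_BNKR_CANCEL_BUDGET 0x00000001
#define GRACE_WINDOWS 2   /* the "two-window grace" described in the text */

typedef struct {
    unsigned critical_budget_ms;
} partition_t;

typedef struct {
    int      policy;              /* bitmask of SCHED_APS_BNKR_* flags */
    unsigned grace_windows_left;  /* system-wide: applies to all partitions */
} scheduler_t;

/* Handle a critical-budget overrun. Returns true if bankruptcy is declared. */
static bool on_overrun(scheduler_t *s, partition_t *p)
{
    assert(s != NULL && p != NULL);
    if (s->grace_windows_left > 0)
        return false;                          /* grace period: do not declare */
    if (s->policy & SCHED_APS_BNKR_CANCEL_BUDGET) {
        p->critical_budget_ms = 0;             /* no further bankruptcy for p */
        s->grace_windows_left = GRACE_WINDOWS; /* let every partition settle */
    }
    return true;
}

/* Called at the end of each scheduling window. */
static void on_window_end(scheduler_t *s)
{
    assert(s != NULL);
    if (s->grace_windows_left > 0)
        s->grace_windows_left--;
}
```

Here an overrun in any partition during the grace period is ignored, and the cancelled partition stays at a zero critical budget until something outside the sketch (SCHED_APS_MODIFY_PARTITION in the real system) sets a new one.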

Billing exceptions#

There are certain operations where the work will not be billed to the client.