SMP Support in the Kernel#

SMP support is provided only by the procnto-smp variant. This is built using source from:

The major changes for SMP are in:


inkernel and cpunum#


On a uniprocessor, inkernel holds only the following:

For a uniprocessor, the inkernel variable can be either in a memory location (i.e., a regular variable) or in a dedicated cpu register. Using a dedicated cpu register provides faster access for inkernel manipulation.

To hide the inkernel implementation, ker/CPU/kercpu.h provides a number of macros that the cpu-independent kernel code uses to manipulate inkernel:

For SMP, the inkernel value must be implemented as a variable in memory since multiple cpus can access it:
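A minimal sketch of what such a memory-based form might look like (illustrative only; bitclr_inkernel() and the atomic_* calls are assumptions, not the actual kercpu.h contents):

    /* Sketch: for SMP, inkernel is a shared memory variable, and updates
       that can race between cpus must use interlocked operations. */
    extern volatile unsigned inkernel;

    #define bitset_inkernel(bits)   atomic_set(&inkernel, (bits))   /* e.g. INKERNEL_LOCK, INKERNEL_EXIT */
    #define bitclr_inkernel(bits)   atomic_clr(&inkernel, (bits))   /* hypothetical counterpart */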

Kernel code occasionally needs to know what cpu it is running on. It uses two macros, depending on what it needs to do:

Note that these can be different: it is possible for one cpu to be executing a kernel call while other cpus are handling interrupts.

On a uniprocessor, both KERNCPU and RUNCPU are statically compiled to 0.
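For illustration, a sketch of how the two might be defined (the SMP branch uses hypothetical names; get_cpunum() and INKERNEL_CPUMASK are assumptions, not the actual kercpu.h contents):

    #if defined(VARIANT_smp)                           /* hypothetical configuration test */
        #define RUNCPU   (get_cpunum())                /* cpu this code is running on           */
        #define KERNCPU  (inkernel & INKERNEL_CPUMASK) /* cpu that is executing the kernel call */
    #else
        #define RUNCPU   0                             /* uniprocessor: statically 0 */
        #define KERNCPU  0
    #endif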


Initialisation#


The overall initialisation sequence for SMP is described here:

cpu_interrupt_init()#

This is a cpu-specific function, in ker/CPU/interrupt.c, used to dynamically generate the interrupt dispatch code. It glues together:

For SMP, cpu_interrupt_init() also needs to plumb in the support for IPI interrupts:

The intr_process_queue routine handles only IPI interrupts, so this special handling is an optimisation: the IPI vector is dispatched straight to intr_process_queue rather than going through the normal interrupt handler queue processing.
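The effect is roughly the following (a C sketch of what is actually generated as cpu-specific dispatch code; ipi_vector and dispatch_handlers() are illustrative names):

    void interrupt_dispatch(unsigned vector)
    {
        if (vector == ipi_vector) {
            intr_process_queue();        /* IPI fast path: invoked directly */
        } else {
            dispatch_handlers(vector);   /* normal processing of the handlers attached to this vector */
        }
    }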

smp_start()#

This is the kernel entry point for secondary cpus: each secondary cpu begins execution here once it is released from the spin loop in the smp_spin callout.

Its responsibilities are to:


Kernel Entry and Exit#



IPI Handling#


Interprocessor interrupts are used for a number of situations where an activity on one cpu requires something to occur on other cpus:

There are a couple of things to note:

The most common commands are:

intr_process_queue is invoked directly by the interrupt dispatch code generated by cpu_interrupt_init():

Although it is littered with cpu-specific details, the operation is essentially:

    release the intr_slock
    save_state = cpupageptr[RUNCPU]->state;
    cpupageptr[RUNCPU]->state = 1;
    cmd = _smp_xchg(&ipi_cmds[RUNCPU], 0);
    if (cmd & IPI_TLB_SAFE) {
        set_safe_aspace();
    }
    if (cmd & IPI_CLOCK_LOAD) {
        clock_load();
    }
    if (cmd & IPI_INTR_MASK) {
        interrupt_smp_sync(INTR_FLAG_SMP_BROADCAST_MASK);
    }
    if (cmd & IPI_INTR_UNMASK) {
        interrupt_smp_sync(INTR_FLAG_SMP_BROADCAST_UNMASK);
    }
    if (cmd & IPI_TLB_FLUSH) {
        cpu-specific actions for flushing TLBs
    }
    pending_async_flags = 0;
    if (cmd & IPI_TIMESLICE) {
        pending_async_flags |= _NTO_ATF_TIMESLICE;
    }
    if (cmd & IPI_RESCHED) {
        pending_async_flags |= _NTO_ATF_SMP_RESCHED;
    }
    if (cmd & IPI_CONTEXT_SAVE) {
        save FPU context in actives_fpu[RUNCPU]->fpudata
        clear BUSY/CPU bits in actives_fpu[RUNCPU]->fpudata
        actives_fpu[RUNCPU] = 0;
    }
    if (pending_async_flags) {
        if (interrupted user mode) {
            pending_async_flags |= _NTO_ATF_FORCED_KERNEL;
        }
        else if (interrupted __ker_entry code spinning while waiting to acquire kernel) {
            SETKIP(act, KIP(act) - KER_ENTRY_SIZE);
            set state so intr_done thinks we entered from user mode
            pending_async_flags |= _NTO_ATF_FORCED_KERNEL;
        }
        old_flags = _smp_xchg(&act->async_flags, pending_async_flags);
        if ((pending_async_flags & _NTO_ATF_FORCED_KERNEL) && !(old_flags & _NTO_ATF_FORCED_KERNEL)) {
            act->args.async.save_ip = KIP(act);
            act->args.async.save_type = KTYPE(act);
            SETKTYPE(act, __KER_NOP);
            SETKIP(act, kercallptr);
        }
    }
    cpupageptr[RUNCPU]->state = save_state;
    re-acquire the intr_slock

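The pseudocode above is the receiving side only. A sketch of how a command might be posted, inferred from the read-and-clear of ipi_cmds[] above and the SENDIPI() usage in the FPU handling below (send_ipi() and the trigger step are illustrative, not the actual kernel code):

    void send_ipi(unsigned cpu, unsigned cmd)
    {
        atomic_set(&ipi_cmds[cpu], cmd);   /* OR the command bit(s) into the target cpu's word */
        /* cpu-specific action: raise the IPI interrupt on the target cpu */
    }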
The handling of async_flags deserves a little more explanation. The intention of these IPI commands is to cause some form of rescheduling, which will be performed via __ker_exit when it checks the thread's async_flags.

However, the return-from-interrupt code for SMP does not necessarily return directly via __ker_exit, so the easiest way to ensure that we run through __ker_exit at some point is to force the current thread to make a null system call (__KER_NOP). We can only do this if:

If we interrupted any other kernel code, we simply set the thread's async_flags and expect to process them when the kernel code eventually returns to user mode.

When this forced __KER_NOP call returns through __ker_exit, the _NTO_ATF_FORCED_KERNEL flag will still be set, so the code knows that it must restore the KTYPE and KIP registers that were saved in the thread's args.async fields:
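A sketch of that restore, mirroring the save-side code in intr_process_queue above (illustrative only):

    if (act->async_flags & _NTO_ATF_FORCED_KERNEL) {
        SETKIP(act, act->args.async.save_ip);
        SETKTYPE(act, act->args.async.save_type);
        atomic_clr(&act->async_flags, _NTO_ATF_FORCED_KERNEL);
    }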


FPU Handling#


To avoid having to save/restore FPU context on each context switch, the kernel implements a lazy context switch mechanism:

This means that if the thread accesses the FPU, it will generate an exception:

For SMP, this lazy scheme means that it's possible for a thread to have used the FPU on one cpu but then be rescheduled onto another cpu. If the thread then uses the FPU again, the FPU context still held on the old cpu needs to be saved so that it can be restored on the new cpu.

Similarly, when a thread terminates and its save area needs to be freed, it may have active FPU context on another cpu that must be flushed.

To handle this, the thread's fpudata pointer has some additional information to indicate if the context is active on a cpu:
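One plausible encoding, inferred from how the FPUDATA_* macros are used in the handler below, packs a busy flag and the cpu number into the low bits of the (suitably aligned) save-area pointer; the bit values here are assumptions, not the real definitions:

    #define FPUDATA_CPUMASK     0x3fu    /* hypothetical: cpu that holds the live context */
    #define FPUDATA_BUSY        0x40u    /* hypothetical: context is live on that cpu     */
    #define FPUDATA_INUSE(f)    ((uintptr_t)(f) & FPUDATA_BUSY)
    #define FPUDATA_CPU(f)      ((uintptr_t)(f) & FPUDATA_CPUMASK)
    #define FPUDATA_PTR(f)      ((void *)((uintptr_t)(f) & ~(uintptr_t)(FPUDATA_BUSY | FPUDATA_CPUMASK)))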

So, overall, the context switch algorithm in the FPU exception handler is:

    if (thp->fpudata == 0) {
        atomic_set(&thp->async_flags, _NTO_ATF_FPUSAVE_ALLOC);
        return;
    }
    if (actives_fpu[RUNCPU] != 0) {
        save FPU context in FPUDATA_PTR(actives_fpu[RUNCPU]->fpudata);
        actives_fpu[RUNCPU] = 0;
    }
    if (FPUDATA_INUSE(thp->fpudata) && FPUDATA_CPU(thp->fpudata) != RUNCPU) {
        // send an IPI to the cpu with the FPU context and then wait until it has been saved
        SENDIPI(FPUDATA_CPU(thp->fpudata), IPI_CONTEXT_SAVE);
        bitset_inkernel(INKERNEL_EXIT);
        unlock_kernel();
        while (FPUDATA_INUSE(thp->fpudata))
            ;
        lock_kernel();
    }
    load FPU context from FPUDATA_PTR(thp->fpudata);

    // indicate FPU context is active on this cpu
    thp->fpudata |= FPUDATA_BUSY | RUNCPU;
    actives_fpu[RUNCPU] = thp;

Note that the kernel is typically locked (INKERNEL_LOCK is set) during exception handling, so we need to:


CPU Scheduling#



Message Passing (SMP_MSGOPT)#