Super Simple Tasker
Introduction
Contrary to popular opinion, I believe you can do a great deal with a very simple, homemade scheduler. In
this paper I will show you how a few simple tradeoffs will let you create a preemptive, prioritized scheduler
in only a few lines of code. This scheduler (SST, for Super Simple Tasker) is ideally suited for machines
with limited RAM and program space. It's not a good fit, though, for machines like the PIC that have a
limited, inaccessible stack. SST adapts well to machines with banked registers, like the Rabbit 2000 and
8051. Because SST is both preemptive and very simple, it offers excellent real-time performance. Finally,
the simplicity of the code actually encourages customization and optimization. It's eminently practical to
optimize this scheduler to the specifics of the problem and the machine.
More important to me and to this paper, though, SST makes a great vehicle for teaching basic embedded
systems real-time techniques and for demonstrating the impact that implementation details can have on total
system performance. Unlike in the application world, concurrency in embedded systems (at least in small
embedded systems) is seldom about multiple processes. Real-time embedded programming is about
generating responses in reaction to events. Typically the events arrive via interrupts, and the responses are
immediate and short-lived. SST's implementation is ideally suited to this kind of problem. In fact, one could
easily argue that SST can barely understand any other kind of problem.
SST is an exercise in design minimalism. At every opportunity, I have opted for simplicity, efficiency, and
ease of implementation, rather than generality. My goal is to demonstrate that, even in the most constrained
environments, one can still employ a scheduler and still benefit from the structure and predictability that a
scheduler can bring to a real-time program. The resulting scheduler, though, is only that: a scheduler. It
constitutes only the minimum plumbing necessary to dynamically schedule and switch between tasks.
Unlike more general solutions, SST makes no attempt to support any particular computational model. Unlike
UNIX processes and other thread models, SST doesn't create separate memory spaces, separate name spaces,
or the illusion that each thread runs in a separate process. SST supports task switching. The programmer is
responsible for adding scaffolding to create and preserve thread or task instance data and to implement basic
concurrency primitives like queues, events, semaphores, and rendezvous. None of this code is large or
particularly complex. Some of it depends on idioms unique to SST; most uses only techniques and idioms
common to concurrency anywhere. The "disadvantage" is that the programmer must thoroughly understand
exactly what they are doing. The "advantage" is that the programmer must thoroughly understand exactly
what they are doing -- and can exploit that understanding to tune the system for the problem and machine.
This minimalist approach is not without benefit, though. On most machines, SST requires only these
resources:
• Code Space: as little as 300-400 machine instructions. In cases where the application is already tracking
event priorities, it may literally add a couple of dozen lines of code. 1
• General purpose RAM: one byte for the scheduler; one byte plus one pointer (or index value) for each
task; and one pointer (or index value) for each task in the ready queue. A system with seven tasks
1 For example, see the “native” implementation of Miro Samek’s Quantum Framework.
External Characteristics
SST's view of a task and the restrictions it places on task switches are notably different from those of other
schedulers. These differences have the advantage of allowing SST to support preemptive scheduling with a
single run-time stack. While using the stack for all context information greatly simplifies the scheduler
implementation, it creates an environment where task, thread, and priority have different meanings and
interact differently than with other schedulers.
2 Like many CS terms, the meaning of non-blocking seems to depend on context -- and may be used inconsistently by
different authors. SST can exhibit the kind of "blocking" associated with priority inversions.
Scheduling
In SST scheduling decisions are only allowed at two junctures: when a task completes and when a task is
preempted. A task can be preempted by an interrupt or can preempt itself by voluntarily calling the
scheduler. In all cases, the correct operation of the scheduler depends upon the program observing certain
conventions. In particular, each time the program adds a task to the list of run requests, it must immediately
call the scheduler. Typically, tasks are added to the list of run requests (the "ready list") during ISRs, so
almost every ISR will call the scheduler (thereby effecting SST's preemptive behavior). If a normal task
requests a change to the ready list or to its own priority, then it too must call the scheduler.
When called, SST applies a remarkably simple scheduling algorithm:
If a task in the ready list has higher priority than the current task, start the higher priority task;
if not, resume the current task.
Whenever SST starts a new task, it selects the ready task with the highest priority.
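This rule can be sketched in a few lines (a simplified illustration of mine, not the real SST code; the actual Sst and Rdq structures appear later):

```cpp
#include <queue>

using Sst_priority_t = int;

// Simplified stand-ins for SST's real structures (illustration only).
static Sst_priority_t current_priority = 0;        // priority of the running task
static std::priority_queue<Sst_priority_t> ready;  // priorities of the ready tasks

// The entire scheduling decision: start a new task only when some
// ready task outranks the one that is currently running.
bool should_start_new_task() {
    return !ready.empty() && ready.top() > current_priority;
}
```

Note that ties go to the running task: a ready task of equal priority waits until the current task completes.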
Consequences
If the program always calls the scheduler as required by the above conventions, then at every point in time,
SST will be running the highest priority ready task. Unless preempted by a higher priority task, this task will
run to completion.
Because the scheduler will always run the current task to completion before starting another task of the same
priority, tasks with the same priority are automatically serialized and can share data without concern for
synchronization. Because of this trait, I find it useful to think of each SST priority level as the root of a
thread. Tasks with a given priority level represent the schedulable units of work in that thread. When
compared to other environments, an SST task is just a non-blocking segment of a thread.
Tasks with different priority levels must explicitly synchronize their access to shared data.
No task can run unless all eligible (ready or suspended) higher priority tasks have completed. It is helpful to
think of lower priority tasks as running in the "gaps" between the executions of higher priority tasks. These
gaps, though, are created by tasks terminating, not by some scheduler-imposed time-slicing discipline. The
scheduler is not fair; if the queue does not empty periodically, some tasks will never execute. Any task that
fails to terminate will block all tasks of equal or lower priority. Except for the lowest priority task, all SST
tasks must be transient.
public:
static const int max_queue_len = SST_MAX_Q_LEN;
private:
static Sst_priority_t current_priority;
Rdq * ready_queue;
};
Listing 2 -- The Task Interface
class Task {
friend class Sst;
public:
Task( Sst_priority_t pri = 0, void (* code )( void ) = 0 );
// assigns a priority and a code body to a task.
private:
void (* module )( void );
Sst_priority_t priority;
};
Using SST
Initialization and Protocols
To use SST, a program usually needs to create these four resources:
Scheduler Implementation
Design Motivation
This scheduler design evolved years ago from a pragmatic need to squeeze a multi-threaded design into an
8748 and from a naïve fascination with three (sometimes accurate) observations:
1. An interrupt service sequence has almost the same structure as an O/S task switch.
2. In a non-blocking system, scheduling decisions are only necessary at preemption and at task
completion.
3. In a prioritized system, task context can be forced to stack just like function context does.
These observations and the need to preemptively schedule several tasks at three or four different priority
levels led me to implement a very small, non-blocking scheduler.
Observation 1: Interrupts are a task switch. This was the most "forcing" of these observations. Consider
what happens when an interrupt is serviced:
• The current thread is interrupted,
• The ISR saves the context of the interrupted thread,
• The ISR performs whatever urgent service is required by the interrupting device,
• The original context is restored, and
• The ISR returns control to the original thread.
As Figure 1 shows, the ISR sequence accurately mirrors the sequence of events that occur when a
(prioritized, preemptive) RTOS preemptively starts a new higher priority task (as opposed to resuming some
task). The primary differences are:
• The RTOS always saves all context; sometimes the ISR will save only some context.
• The RTOS may need to manipulate stacks or memory mapping hardware to properly initialize the
new task environment; the ISR often runs in the same context as the interrupted routine.
[Figure 1: two side-by-side sequences. Left (Task / ISR): save context, service interrupt, restore context, resume. Right (Task 1 / O/S Task Switch / Task 2): save context, select task, launch Task 2; when Task 2 ends, restore context and resume Task 1.]
Figure 1 -- Except for the dynamic selection of the service task, an ISR performs the same sequence of tasks as
an O/S task switcher.
Observation 2: preemption can be limited to "normal" interrupt times. A prioritized scheduler strives
to run the highest priority ready task. Thus, a prioritizing scheduler only needs to perform a task switch when
something happens to change the relation between the set of tasks in the ready queue and the currently
running task. Because this is a non-blocking scheduler, a task switch can only be necessary if:
• A task has been added to the ready queue, or
• The current task has completed. 3
3 Here I'm assuming that task priority is not changed dynamically and that tasks are never removed from the queue
(except by being executed). In some situations it makes sense to relax the first restriction to allow priorities to be
raised temporarily. While allowing tasks to be arbitrarily removed from the ready queue doesn't cause any scheduling
issues, I don't see that it has any useful application that can't easily be addressed in other ways that have less effect on
latency.
4 This works if the original thread, dispatcher, and all tasks were all compiled to reside in the same context (primarily
an issue of address space, here.) If not, then it will be necessary to perform a context save, call the dispatcher, and
perform a context restore.
5 As always, this depends on the architecture and the quality of implementation. For example, on architectures with
several banks of registers, the best option might be to save context by switching banks for certain priorities, and use
the stack for all other priorities.
[Figure 2: UML class diagram. Sst (-current_priority, -ready_queue : Rdq; +run_next(), +add() : Boolean, +match_priority_of()) holds one Rdq (list of ready Tasks; +add(), +get_next() : Task), which in turn references many Tasks (-priority : int, -module*; +launch(), +get_priority() : int).]
Conceptually the scheduler is constructed from three classes of object: Sst (the dispatcher), Rdq (the ready
queue), and Task. Both the dispatcher and the ready queue should be singletons. There should be a distinct
instance of Task for each separately schedulable combination of code and priority level. Figure 2 shows the
interface details and class relationships. Listing 3 and Listing 5 are the entire implementation for the Sst and
Task classes. I'll describe various alternative implementations for the Rdq class later.
void
Task::launch( ) const
{
BSP_ENABLE_INTR( );
(*module)( );
}
Sst_priority_t
Task::get_priority( void ) const
{
return priority;
}
class Rdq {
private:
    Task ** queue;
    int q_count;
    Task ** q_ptr; // always points at next available
public:
    Rdq( int max_len );
    ~Rdq( );
    bool add( Task * item );
    Task * get_next( Sst_priority_t floor );
};
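The paper defers the alternative Rdq implementations, but a plausible unordered-array version of add( ) and get_next( ) might look like the host-side sketch below (my sketch, not the original code; a simple count takes the place of q_ptr):

```cpp
#include <cassert>

using Sst_priority_t = int;

struct Task {
    Sst_priority_t priority;
    Sst_priority_t get_priority() const { return priority; }
};

// Unordered-array ready queue (a sketch of mine; the original
// implementation may differ in detail).
class Rdq {
    Task** queue;
    int    q_count;   // number of tasks currently queued
    int    max_len;
public:
    explicit Rdq(int len) : queue(new Task*[len]), q_count(0), max_len(len) {}
    ~Rdq() { delete[] queue; }

    bool add(Task* item) {                 // O(1); fails when the queue is full
        if (q_count >= max_len) return false;
        queue[q_count++] = item;
        return true;
    }

    // O(n) scan: remove and return the highest-priority task with
    // priority strictly greater than floor, or nullptr if none.
    Task* get_next(Sst_priority_t floor) {
        int best = -1;
        for (int i = 0; i < q_count; ++i)
            if (queue[i]->get_priority() > floor &&
                (best < 0 ||
                 queue[i]->get_priority() > queue[best]->get_priority()))
                best = i;
        if (best < 0) return nullptr;
        Task* found = queue[best];
        queue[best] = queue[--q_count];    // swap-with-last removal
        return found;
    }
};
```

With only a handful of tasks, the O(n) scan is usually cheaper than maintaining a sorted structure.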
Sst_priority_t Sst::current_priority= 0;
Sst::Sst(int qsize){
current_priority = 0;
ready_queue = new Rdq (qsize);
}
inline bool
Sst::add(Task * item){
return ready_queue->add(item);
}
void
Sst::run_next(void){
Task * ready;
BSP_intr_state_t entry_state;
Sst_priority_t entry_priority;
void
Sst::match_priority_of(Task * task)
{
BSP_intr_state_t ival;
ival = BSP_HOLD_INTR( );
current_priority = task->get_priority( ); // can avoid locks by
BSP_RESTORE_INTR( ival); // using sig_atomic_t for priorities
}
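The dispatch loop that run_next( ) implements can be sketched on the host as follows (a reconstruction of mine from the description below; the BSP stubs and the linear-scan ready list are simplifications, not the original code):

```cpp
#include <cassert>
#include <vector>

// Host-side stand-ins for the BSP primitives and the ready queue
// (simplifications of mine, not the original code).
using BSP_intr_state_t = int;
static BSP_intr_state_t BSP_HOLD_INTR() { return 1; }    // would disable interrupts
static void BSP_RESTORE_INTR(BSP_intr_state_t) {}        // would restore them

using Sst_priority_t = int;

struct Task {
    Sst_priority_t priority;
    void (*module)();
    Sst_priority_t get_priority() const { return priority; }
    void launch() const { (*module)(); }   // real launch() re-enables interrupts first
};

static std::vector<Task*> ready_queue;     // linear-scan stand-in for Rdq
static Sst_priority_t current_priority = 0;

// Remove and return the highest-priority ready task above floor, or nullptr.
static Task* get_next(Sst_priority_t floor) {
    auto best = ready_queue.end();
    for (auto it = ready_queue.begin(); it != ready_queue.end(); ++it)
        if ((*it)->get_priority() > floor &&
            (best == ready_queue.end() ||
             (*it)->get_priority() > (*best)->get_priority()))
            best = it;
    if (best == ready_queue.end()) return nullptr;
    Task* t = *best;
    ready_queue.erase(best);
    return t;
}

// Reconstructed dispatch loop: remember the suspended task's priority,
// launch every ready task that outranks it, then restore that priority
// before falling back to the suspended task.
void run_next() {
    BSP_intr_state_t entry_state = BSP_HOLD_INTR();
    Sst_priority_t entry_priority = current_priority;
    Task* ready;
    while ((ready = get_next(entry_priority)) != nullptr) {
        current_priority = ready->get_priority();  // post priority for nested instances
        BSP_RESTORE_INTR(entry_state);
        ready->launch();                           // task runs to completion here
        entry_state = BSP_HOLD_INTR();
    }
    current_priority = entry_priority;
    BSP_RESTORE_INTR(entry_state);
}

// Demo tasks that record their execution order.
static std::vector<int> run_order;
static void low_body()  { run_order.push_back(2); }
static void high_body() { run_order.push_back(4); }
static Task t_low{2, low_body};
static Task t_high{4, high_body};
```

Note how the loop itself enforces run-to-completion: each launched task returns to the same run_next( ) instance, which only exits once no ready task outranks the one suspended beneath it.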
Procedurally, the key components in this implementation are Sst::run_next() and Rdq::get_next( ). Listing 5
shows the implementation of the main dispatch routine Sst::run_next( ). This routine is invoked each time a
scheduling decision is to be made. The operation of run_next( ) is much easier to understand if you
remember that every running or suspended task was launched by a call from a separate instance of
run_next(). Thus, there must be a separate suspended instance of run_next( ) waiting for the return of every
suspended or running task. run_next( ) not only launches the task, but also waits around to "catch" the exit
when the task completes. Figure 3 shows how the tasks and individual run_next( ) instances nest on the stack
after two levels of preemption. Notice that a separate instance of run_next is "layered" immediately above
each suspended task. This layering is the mechanism that allows the scheduler to recover control of the
processor when a task exits.
[Figure 3: stack diagram. From bottom to top: main()'s frame; the priority-0 instance of run_next() with main()'s return address; the SST(0) frame and launch(); the P2 task's frame; a second run_next() instance with its return address; and the P4 task's frame on top.]
Figure 3 -- This diagram shows how the stack looks after a priority 4 task interrupts a priority 2 task.
At each invocation, run_next() will decide whether to launch a higher priority task or resume the suspended
task. Thus, each invocation of run_next( ) must know the priority of the most recently suspended task and
must make all of its decisions relative to that priority. Each instance of run_next makes this information
available to subsequent instances of run_next( ) by posting a task's priority level to the global static
Sst::current_priority before launching the task. When first called, each new instance of run_next( ) copies
the task priority information from Sst::current_priority into a local stack variable6 entry_priority, so
that it can "remember" what priority task was executing when it was invoked. The value in
entry_priority sets a threshold for the particular instance of the dispatcher. Each dispatcher may only
launch tasks with priority greater than the value in entry_priority.
The Rdq::get_next(floor) method retrieves a pointer to the highest priority ready task with priority greater
than floor. If there are no ready tasks with the required priority, get_next returns a null pointer. In addition to
returning a pointer to the task, get_next( ) deletes the task from the ready queue.
Method run_next( ) will continue to loop, launching tasks until it has processed all ready tasks with priority
higher than the task under it in the runtime stack. Once it has exhausted the list of high-priority ready tasks,
run_next( ) simply returns, thereby releasing the suspended task immediately under it.
Code Details
BSP_XXX calls are processor-specific macros or inline functions developed as part of the board support
package. BSP_HOLD_INTR( ) captures the current interrupt enable state, disables interrupts, and then
returns the captured value. BSP_RESTORE_INTR( ) reverses this process, restoring the interrupt enable
state from a saved value. (These primitives are necessary if the scheduler can ever be called from within a
context, such as an ISR, where interrupts are already disabled.)
6 Many compilers for small processors optimize local variable storage and references by mapping locals to various
internal RAM or register storage. run_next( ), however, must be re-entrant, so it is important each instance have its
own storage for this variable. Depending on the compiler, you may need to mark run_next( ) reentrant, or explicitly
declare this local auto or stack to get the desired results.
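For host-side testing, the two BSP primitives can be modeled with a simple flag (a test double of mine, not a real port; a real implementation would read and write the processor's interrupt mask):

```cpp
// Host-side model of the BSP interrupt primitives (a test double of
// mine, not a real port): the "interrupt enable" state is just a flag.
using BSP_intr_state_t = bool;
static bool intr_enabled = true;

static BSP_intr_state_t BSP_HOLD_INTR() {
    BSP_intr_state_t was = intr_enabled;   // capture the current state...
    intr_enabled = false;                  // ...then disable interrupts
    return was;
}

static void BSP_RESTORE_INTR(BSP_intr_state_t saved) {
    intr_enabled = saved;                  // restore the captured state
}
```

The save/restore pairing, rather than a blind re-enable, is what makes these primitives safe to nest: an inner critical section restores "disabled" rather than wrongly enabling interrupts inside an outer one.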
Scheduler Optimizations
Optimizing the Dispatch Call
In the examples I've shown so far, the ISR does a full context save and then calls the dispatch routine. While
this mechanism is easy to implement and understand, on most machines, it has several disadvantages. First, it
requires the code for a context save and restore to be repeated in every interrupt handler. Second, because it
always performs a full context save before servicing the interrupt, it adds to interrupt latency. Third, the
direct call requires that the stack be large enough to supply every priority level with stack frames for the
interrupt, the dispatch call, the task call, and any functions the task invokes.
All of these disadvantages can be addressed by delaying the dispatch call until after the ISR completes. The
ISR can effect such a delayed call by manipulating the stack so that when the ISR exits, the CPU will
"return" to the dispatch routine. Because the ISR exits before calling the dispatch routine, the scheduler uses
one less stack frame per level of preemption. (Compare Figure 5 to Figure 3.) On some machines this pseudo
return can be accomplished by pushing the entry point of the dispatch routine just before exiting the ISR, as
in Figure 4. You can get nearly identical results by jumping to the dispatch entry at the ISR exit. A "call by
return" is usually more compiler-friendly and is more compatible with systems that prioritize interrupts in
hardware. On machines where condition flags are easily disturbed (for example where loading a literal
address or writing to the stack also affects status flags), the link can be accomplished by "poking" the return
address into the stack before the ISR begins its context restore protocol (see the pseudo code in Figure 6). A
few processors, like the PIC, place the stack in a separate, inaccessible address space. On these machines, a
"jump to dispatch" is the only workable technique for implementing this optimization.
Direct call:
Begin_ISR:
  Full Context Save
  Service Interrupt
  Clear & Enable Intr.
  Call Sst.run_next( )
  Restore Context
  Normal Return

Delayed call:
Begin_ISR:
  Minimum Context Save
  Service Interrupt
  Clear & Enable Intr.
  Min. Context Restore
  Push Dispatch Addr.
  Ret. From Intr.
Dispatch:
  Full Context Save
  Main dispatch loop
  Full Context Restore
  Normal Return
Figure 4 -- On many machines, the ISR can trigger a "call" at exit by pushing a return address before exit.
[Figure 5: stack diagram showing main()'s frame, the priority-0 run_next()/launch() frames with main()'s return address, and the P2 task's frame, with the second dispatch instance entered via the pseudo return rather than a nested call.]
Figure 5 -- Pushing an extra address onto the stack allows the ISR to "return" to the dispatch routine instead
of calling it and reduces the number of return addresses kept in the stack.
This delayed call allows one to write much tighter ISRs. Depending on the particular processor, with this
technique you may be able to achieve the minimum possible delay between interrupt recognition and
interrupt service. Delaying the dispatch call also allows all ISRs to share a single copy of the context save
and restore code. In most situations, however, you will wind up with two versions of run_next( ): one
designed to be executed by ISRs, and one (with normal compiler-compatible prolog and epilog, but without
context save and restore) that can be called from normal threads.
Begin_ISR:
  Save status and acc. to dedicated storage
  Push garbage/reserve stack space
  Do minimal context save
  Service Interrupt
  Clear & Enable Intr.
  Poke dispatch entry address into reserved space
  If needed poke "fake" frame
  Min. Context Restore
  Ret. From Intr.
Dispatch:
  Full Context Save
  Main dispatch loop
  Full Context Restore
  Normal Return
Figure 6 -- Pseudo code for poking the dispatch address into the reserved stack space before the ISR's context restore.
Conclusion
SST demonstrates that you only need a few lines of code and a little discipline to bring the structural
advantages of a preemptive, prioritized scheduler to almost any environment. While SST's oddities may be
unsettling at first, its scheduling policy can easily be exploited to greatly reduce the amount of explicit
synchronization needed in the typical reactive program. On balance, SST demands more design effort and
real-time design skill than other schedulers, but rewards that investment with cleanly structured programs
that can run in the most limited environments.
While SST is different, it isn't that different. The insights you gain through mastering task communication
and synchronization in SST will all carry over to other concurrent environments. Regardless of the
concurrency model, different-rate threads need to be decoupled by queues, separate S/R chains should be
separated, locked critical sections introduce jitter, and tasks must be synchronized. At the implementation
level, every RTOS or scheduler deals with these issues almost exactly as you do in SST at the application
level. In my opinion, knowing what's "under the hood" is always a good thing.