L Computer Science & Electrical Engineering

Real-Time Systems – Lab 2

A lightweight multi-threading kernel

The purpose of this lab assignment is to provide insights into the techniques by which multiple parallel threads can be implemented on a single sequential processor. To this end we will study and modify the internals of a small multi-threading kernel which we call tinythreads. The functionality offered by tinythreads will be limited to context-switching and mutual exclusion only, but as we shall see in a subsequent assignment, it still constitutes the basis on which a much more powerful system can be built.

The tinythreads kernel is defined in the files tinythreads.h and tinythreads.c. Copy these files, as well as the file mytest.c to your working directory and study the source code. In mytest.c you will find a small test example that runs two prime number generating threads in parallel (one spawned plus the main thread). Note: to make mytest.c complete you will need to copy and paste two functions from you solution to the first lab assignment:  writeChar(char ch, int pos) and  is_prime(long i). The provided code in mytest.c assumes that position 0 corresponds to the left-most position on the screen.

However, even the implementation of tinythreads is incomplete at this stage (despite its minimal API). The assignment is thus to complete this implementation, something which shall be performed in three steps.

Examination: it is sufficient to demonstrate that the final solution works.

Step 1

Compile and link tinythreads.c and mytest.c into one single program, download and run. As you will notice, only the main thread appears to be executing (i.e., numbers are only shown at position 3 and 4 of the display). This is because as the kernel currently stands, there is actually nothing that will cause any context-switching to occur.

Your first task is to address this shortcoming by implementing the body of kernel operation yield(), and inserting calls to this operation after each call to printAt() in mytest.c. A call to yield() shall place the calling thread in the last position of the ready-queue, and then pick the first element of that queue for execution (by means of the statement dispatch(dequeue(&readyQ)). There is a very moderate amount of code to be written in this step, the focus is instead on understanding the given sources. If yield() has been implemented correctly, the test program will now alternatingly generate printouts from both threads.

Step 2

The next step is to replace the explicit calls to yield() with periodic interrupts that do the same thing, although "behind the curtain". This means that the calls to yield() in mytest.c can be removed, since they are to be replaced by involuntary subroutine calls forced upon the currently executing code by the processor hardware. This technique of enforcing yields at regular intervals is called time-sharing, and is used extensively by multi-user operating systems.

As a stepping stone, start by installing a handler for the PCINT1_vect interrupt (vector no. 3), and let that handler cause the involuntary yield. To define an interrupt handler for PCINT1_vect, you write:
ISR(PCINT1_vect) {
//code for interrupt handler
The definition of the macro ISR uses some rather non-standard C extensions, check the file avr/interrupt.h if you are curious.

A PCINT1_vect interrupt can be generated by pressing the Butterfly joystick in the downward direction, if the logical interrupt source PCINT15 is enabled in mask registers EIMSK and PCMSK1. PCINT15 corresponds to a change on pin 7 of PORTB.

Note 1: to achieve proper operation of the joystick, a pull-up resistor must be activated by setting the corresponding output pin to 1; see lab 1.
Note 2: a PCINT1_vect interrupt will be generated each time the joystick downward switch changes state; you should make the current thread yield only each time the switch is actually depressed (and not when it is released), so you should test the current status of PORTB bit 7 before calling yield() from within your interrupt handler.

See pages 51-54 of the ATmega169 manual for more details on external interrupts, and the include file avr/iom169.h for a definition of the available interrupt sources. Any initalization code required in this step should be placed in the local funtion initialize() of tinythreads.c.

When the switching of contexts by means of the joystick works correctly, configure the 16-bit Timer/Counter1 to generate a TIMER1_COMPA_vect interrupt (vector no. 7) with a period of 50 ms. To do so you must make sure that OC1A is set high (1) on compare match, set the timer to Clear on Timer Compare (CTC) mode, and write a suitable value to the timer compare register OCR1A. With a system clock of 8 MHz, a timer prescaling factor of 1024 is recommended. It is also advisable to force the timer to start its first cycle from zero by clearing the TCNT1 register during initialization. Moreover, timer output compare A interrupts must be enabled by setting the corresponding bit in register TIMSK1. Documentation of the registers involved can be found on pages 117-123 of the ATmega169 manual.

Because the interrupts you have enabled may literally occur at any time, you should now take a moment to study the definition of the macros DISABLE() and ENABLE() and their placement inside the functions of tinythreads.c. What is their purpose? What could happen if a yield() call were injected at the worst place possible within functions like enqueue() or dispatch()? What about injecting such a call into yield() itself?

When the continuing output of the two prime number generators in mytest.c is "simultaneously" shown on the display, you are ready to proceed to step 3.

Step 3

We will now address the problem of mutual exclusion by implementing the functionality of mutex variables. To obtain clear evidence why mutual exclusion may be needed in a concurrent system, start by moving the declaration of variable pp in function printAt() to the global level of the file mytest.c. This turns the body of printAt() into a critical section, the same variable pp will now be shared between concurrent calls to the function.

Notice the effect on the display when running your modified test program. Should you have problems noticing any odd effect at all, you can experiment by inserting a loop that simply increments a local variable 1000 times right before where pp is incremented in printAt() – this delay will increase the possibility that a context switch occurs at the worst possible time. Then explain what you see!

To cure the critical section problem you will need the support of mutex variables. The concrete task is to implement the bodies of operations lock() and unlock() in tinythreads.c, and apply them in mytest.c to achieve well-behaved printouts of the prime numbers. The type mutex in tinythreads.c already contains the fields necessary to implement a mutex variable. A call to lock() shall set the locked flag of the mutex if it was previously unlocked, otherwise the running thread shall be placed in the waiting queue of the mutex and a new thread should be dispatched from the ready queue. Calling unlock() shall activate a thread in the waiting queue of the mutex if it is non-empty, otherwise the locked flag shall be reset.

Note: protecting the bodies of exported operations from interrupts is just as important here as it is in other parts of the kernel. Prioritize understanding of what is going on over eager hacking.

When your modified program – including the global declaration of pp and any delay-loop added – generates the same kind of output as in step 2 of this assignment, you are done! Last question to answer: why can't the sections of the kernel enclosed by DISABLE() and ENABLE() instead be protected by calls to the "proper" mutex operations you have just implemented?

Technical notes on the implementation

The thread implementation in tinythreads is of the kind to expect in a typical threading library; i.e., the threads share global memory and other resources but use private stacks. Execution using alternative stacks is achieved by means of a small platform-dependent trick in spawn(): when the newly created thread has had its execution context initialized by setjmp() (return value 0), the stored stack-pointer is overwritten with a suitable address within a separate memory block. The exact location within a jmp_buf where the stack-pointer is stored differs between platforms, hence the preprocessor macro SETSTACK(buf,a) has to be modified if tinythreads.c is to be moved to some other platform than the AVR. Note especially that on architectures where the stack grows towards lower addresses, the initial value must be set close to the highest address of the designated memory block.

The key to understanding this thread library is to come to peace with how the use of setjmp() and longjmp() works. You might therefore find it helpful to work through a sequence of spawn() and yield() calls with pen and paper, and notice how the underlying "thread of execution" jumps from context to context. Time slicing by the interrupt handler can then be understood simply as spontaneous calls to yield() from arbitrary code positions not protected from interrupts.

Finally, a note on memory management. The stack space for a thread is allocated as part of the control block for that thread, and a fixed amount of thread control blocks are defined in a global array, initially organized as a linked list of free blocks, from which memory can be allocated when a new thread is spawned. For simplicity, allocation and deallocation of thread memory is handled by the same operations that organize queues of threads in other parts of the implementation, although it should be noted that this is a slight overkill – there is no underlying need to treat the pool of free thread blocks as a queue. Both the stack spaces and the thread block array are of fixed size, which of course can be a limiting factor in practice. In the current implementation, execution simply halts if dequeing is attempted on an empty queue, whereas shortage of stack space will manifest itself as random memory corruption caused by out-of-bounds memory writes. In a production system, a more dynamic memory management scheme would be desirable, preferably coupled with an analysis method to statically predict the memory needs of a set of threads.