Implementing Semaphores on ARM Processors
Semaphores are used to manage access to a shared resource. Depending on the type of semaphore, one or more clients may be granted access.
Before accessing a resource, a client must read the semaphore value and check whether it indicates that the client can proceed or must wait. Before proceeding, the client must change the semaphore value to inform other clients.
A fundamental issue with semaphores is that they themselves are shared resources, which – as we just learned – must be protected by semaphores...
In order to implement a reliable semaphore, we must guarantee atomic access (atomos, Greek, uncuttable), i.e. reading the semaphore value, checking it and writing the modified value back must occur in an uninterruptible sequence. Otherwise, a second client might see the same semaphore value before the first one has had a chance to write back the new value.
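To make the race concrete, here is a minimal C11 sketch. It assumes a hosted compiler with <stdatomic.h>, and the function names and the encoding 1 = free, 0 = taken are hypothetical. The first function reads, checks and writes in separate steps, so two clients can both see "free". The second uses atomic_compare_exchange_strong, which performs all three steps as one indivisible operation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical binary semaphore encoding: 1 = free, 0 = taken. */

/* Racy version: read, check and write are three separate steps.
   Another client (ISR, task or core) can run between the load and
   the store and see the same "free" value. */
bool try_claim_racy(volatile int *sem)
{
    if (*sem == 1) {          /* read + check                        */
        /* <-- a second client may also read 1 at this point        */
        *sem = 0;             /* write back "taken"                  */
        return true;
    }
    return false;
}

/* Atomic version: compare-and-exchange replaces the value with 0 only
   if it is still 1, as a single indivisible operation. */
bool try_claim_atomic(atomic_int *sem)
{
    int expected = 1;         /* claim only if currently free        */
    return atomic_compare_exchange_strong(sem, &expected, 0);
}

void release_atomic(atomic_int *sem)
{
    atomic_store(sem, 1);     /* mark the semaphore free again       */
}
```

On ARMv6 and later, compilers typically implement atomic_compare_exchange_strong with the LDREX/STREX instructions described later in this article.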
In a simple microprocessor system, the clients of a semaphore could be interrupt service routines. In such a system we can easily avoid the issue by preventing any other interrupts from being served while we access (read–modify–write) the semaphore. On ARM processors this is achieved by setting the I (IRQ) and/or F (FIQ) bit in the CPSR (Current Program Status Register). Example:
I_bit   EQU   0x80               ; CPSR bit 7: IRQ disable
        MRS   r12, CPSR          ; read CPSR
        ORR   r12, r12, #I_bit   ; set I bit
        MSR   CPSR_c, r12        ; write back CPSR
From ARMv6 onward, there is a simpler way to do the same thing:
        CPSID i                  ; disable IRQ
If you are using a RealView C compiler (MDK or RVDS), the best way is to use the compiler intrinsic __disable_irq(). This intrinsic generates the proper instructions for whichever ARM architecture the code is compiled for. (The intrinsic can also return the current IRQ status, but that would get us too far off track.)
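The remark about returning the IRQ status matters once critical sections can nest. The following sketch is a host-side model only. irq_save() and irq_restore() are hypothetical helpers, not the RealView intrinsics, and a plain bool stands in for the CPSR I bit so the logic can be exercised on a PC. The sketch claims a single-core binary semaphore with interrupts masked. It then restores the previous mask state instead of unconditionally re-enabling IRQs.

```c
#include <stdbool.h>

/* Host-side stand-in for the CPSR I bit: true = IRQs masked.
   On a real target this is the hardware flag, changed with
   MRS/MSR, CPSID/CPSIE or a compiler intrinsic. */
static bool irq_masked = false;

/* Hypothetical helper (not a real API): mask IRQs and return the
   previous state so that critical sections can nest. */
static bool irq_save(void)
{
    bool was_masked = irq_masked;
    irq_masked = true;               /* corresponds to CPSID i */
    return was_masked;
}

/* Hypothetical helper: restore the state saved by irq_save(), so an
   inner section never unmasks IRQs that an outer section masked. */
static void irq_restore(bool was_masked)
{
    irq_masked = was_masked;
}

/* Binary semaphore shared by tasks and ISRs on ONE core:
   1 = free, 0 = taken. */
static volatile int sem = 1;

bool sem_try_claim(void)
{
    bool state = irq_save();         /* no ISR can run from here ... */
    bool claimed = false;
    if (sem == 1) {                  /* read and check               */
        sem = 0;                     /* write back "taken"           */
        claimed = true;
    }
    irq_restore(state);              /* ... to here                  */
    return claimed;
}

void sem_release(void)
{
    sem = 1;                         /* single aligned word store    */
}
```

As the next paragraph explains, this only protects against other code on the same core.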
Disabling interrupts, however, takes time, and it does not help at all if other bus masters are involved. As soon as, say, another processor core can access the semaphore, we need a mechanism that prevents the other masters from accessing the system bus while one task on one core carries out the read–modify–write sequence.
Aware of this issue, ARM created the SWP (SWaP) instruction, which is available in the ARM (not Thumb) instruction set of the classic ARM architectures. SWP can be used to implement a binary semaphore, also known as a mutex. To implement other types of semaphores, a mutex would have to protect the actual semaphore, which makes the process somewhat more complex. SWP carries out a read from memory followed by a write to memory. The instruction cannot be interrupted, and it blocks the system bus for the entire transaction so that no other master can be granted access between the read and the write.
LOCKED  EQU  0                   ; define value indicating "locked"
        LDR  r1, <addr>          ; load semaphore address
        LDR  r0, =LOCKED         ; preload "locked" value
spin_lock
        SWP  r0, r0, [r1]        ; swap register value with semaphore
        CMP  r0, #LOCKED         ; if semaphore was locked already
        BEQ  spin_lock           ; retry
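In portable C, the closest equivalent of SWP is C11 atomic_exchange. It writes a new value and returns the old one as a single indivisible step. The sketch below mirrors the assembly (0 means locked). As the text suggests, it also uses that mutex to protect a counting semaphore. It assumes a C11 compiler, and all names are hypothetical. On ARMv6 and later, a compiler will typically implement atomic_exchange with LDREX/STREX rather than SWP.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define LOCKED   0   /* same encoding as the assembly: 0 = locked  */
#define UNLOCKED 1   /* hypothetical: any non-zero value means free */

/* Spin until the old value returned by the swap is not LOCKED. */
void swp_lock(atomic_int *mutex)
{
    while (atomic_exchange(mutex, LOCKED) == LOCKED) {
        /* another client holds the mutex: retry */
    }
}

void swp_unlock(atomic_int *mutex)
{
    atomic_store(mutex, UNLOCKED);
}

/* Counting semaphore whose counter is protected by the mutex. */
typedef struct {
    atomic_int guard;   /* binary semaphore (mutex)       */
    int count;          /* number of free resource slots  */
} counting_sem;

bool counting_sem_try_take(counting_sem *s)
{
    bool ok = false;
    swp_lock(&s->guard);
    if (s->count > 0) {   /* read-modify-write under the mutex */
        s->count--;
        ok = true;
    }
    swp_unlock(&s->guard);
    return ok;
}

void counting_sem_give(counting_sem *s)
{
    swp_lock(&s->guard);
    s->count++;
    swp_unlock(&s->guard);
}
```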
In some systems, especially complex SoCs with fast application processors, SWP can create a critical performance bottleneck. In these systems memory latency is long compared to core cycle time. That means that while SWP executes, interrupts cannot be served for perhaps many cycles. It also means that while SWP blocks the system bus, no other master can carry out any access, even if it is completely unrelated.
Exclusive Load and Store
To avoid these performance issues, SWP has been deprecated in ARMv6 and later architectures. A new, more flexible, non-blocking method is now the preferred way of managing shared access. Exclusive load (LDREX) reads data from memory and at the same time tags the memory address. Exclusive store (STREX) stores data to memory only if the tag is still valid; otherwise memory is not modified. STREX writes a status to its destination register: 0 if the store succeeded, 1 if it failed. A write to the tagged address by another master between the LDREX and the STREX invalidates the tag. With this mechanism, other bus masters are never locked out of memory. A competing access to the same location merely makes the STREX fail, and the sequence is retried. Since any instruction sequence (preferably a short one) can be placed between the LDREX and the STREX, any type of semaphore can be implemented with this instruction pair. An example is shown below:
LOCKED  EQU     0                ; define value indicating "locked"
        LDR     r12, <addr>      ; preload semaphore address
        LDR     r1, =LOCKED      ; preload "locked" value
spin_lock
        LDREX   r0, [r12]        ; load semaphore value
        CMP     r0, #LOCKED      ; if semaphore was locked already
        STREXNE r0, r1, [r12]    ; otherwise try to claim (r0 = 0 on success)
        CMPNE   r0, #1           ; and check success
        BEQ     spin_lock        ; retry if claiming semaphore failed
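The same pattern can be written in C11. atomic_compare_exchange_weak may fail spuriously, just as STREX may fail, so it sits in a retry loop. This sketch assumes a C11 compiler, and its names are hypothetical. It implements a counting semaphore directly, with no separate mutex, which shows that the LDREX/STREX pair is not limited to binary semaphores. The acquire and release orderings ask the compiler for memory barriers. On multi-core ARMv7 systems, hand-written code such as the assembly above generally needs a DMB after a successful claim as well, a topic the ARM Architecture Reference Manual covers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Counting semaphore: the value is the number of free slots. */
bool sem_try_take(atomic_int *sem)
{
    int old = atomic_load_explicit(sem, memory_order_relaxed);
    do {
        if (old == 0)
            return false;   /* nothing free: report failure, don't spin */
        /* On failure, compare_exchange_weak reloads 'old' with the
           current value; like STREX it may also fail spuriously,
           hence the loop. */
    } while (!atomic_compare_exchange_weak_explicit(
                 sem, &old, old - 1,
                 memory_order_acquire, memory_order_relaxed));
    return true;
}

void sem_give(atomic_int *sem)
{
    atomic_fetch_add_explicit(sem, 1, memory_order_release);
}
```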
Note: The description above explains the situation for a special case. For all the details and possible scenarios, please consult the ARM Architecture Reference Manual (cf. sec. "Synchronization and semaphores").
ARM Cortex-M3 bit-banding
ARM's Cortex-M3 microcontroller core offers yet another way to implement semaphores. A write to a word in the bit-band alias region is translated into an atomic read–modify–write of the corresponding single bit in the bit-band region, carried out at system bus level.
How does that translate into semaphores? A variable in the bit-band region can serve as a container for semaphores, with every client "owning" one bit in that container. To claim the semaphore, a client sets its own bit by writing a 1 to the corresponding location in the bit-band alias region. It then reads the container (in the bit-band region) and checks that no other bits are set, which means it has successfully claimed the semaphore. If other bits are set, the client must clear its own bit again and retry (perhaps after waiting).
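Here is a sketch of both halves in C, under stated assumptions. For the Cortex-M3 SRAM bit-band region (bit-band base 0x20000000, alias base 0x22000000), the alias address is 0x22000000 + byte_offset * 32 + bit * 4. The claim protocol itself cannot run on a PC. The sketch therefore models the hardware's single-bit read-modify-write with C11 atomic_fetch_or/atomic_fetch_and on a hypothetical container word. On the target, each of those would be a plain store of 1 or 0 to the alias address. All function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SRAM_BB_BASE    0x20000000u   /* start of SRAM bit-band region */
#define SRAM_ALIAS_BASE 0x22000000u   /* start of its alias region     */

/* Alias word address for bit 'bit' of the byte/word at 'addr'. */
uint32_t bitband_alias_addr(uint32_t addr, unsigned bit)
{
    return SRAM_ALIAS_BASE + (addr - SRAM_BB_BASE) * 32u + bit * 4u;
}

/* Host-side model of the container word. On the target one would write
   *(volatile uint32_t *)bitband_alias_addr(container_addr, client) = 1u;
   and the bus would perform the read-modify-write atomically. */
static atomic_uint container = 0;

static void set_own_bit(unsigned client)
{
    atomic_fetch_or(&container, 1u << client);
}

static void clear_own_bit(unsigned client)
{
    atomic_fetch_and(&container, ~(1u << client));
}

/* One claim attempt by client number 0..31. */
bool bb_try_claim(unsigned client)
{
    set_own_bit(client);                      /* 1. announce          */
    uint32_t others = atomic_load(&container) & ~(1u << client);
    if (others == 0)                          /* 2. nobody else set?  */
        return true;                          /*    semaphore claimed */
    clear_own_bit(client);                    /* 3. back off, retry   */
    return false;
}

void bb_release(unsigned client)
{
    clear_own_bit(client);
}
```

If two clients announce at the same moment, both back off. This is why the text suggests waiting before retrying.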
Note that SWP and its byte variant SWPB have no Thumb encoding, so they are not available on the Thumb-only Cortex-M3. The exclusive load and store instructions (LDREX/STREX), on the other hand, are supported.