pre-Release 6
SC 111000 | base | rt | offset
6 | 5 | 5 | 16
Release 6
SPECIAL3 011111 | base | rt | offset | 0 | SC 100110
6 | 5 | 5 | 9 | 1 | 6
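The two field layouts above can be illustrated with a small packing sketch. It assumes the fields are laid out left to right starting at bit 31, in the order and widths listed; the function and constant names are hypothetical and are not part of the architecture definition.

```python
# Hypothetical helper names; field positions are inferred from the widths
# listed in the encoding tables, counted from bit 31 downward.
SC_OPCODE_PRE_R6 = 0b111000   # major opcode, bits 31..26 (pre-Release 6)
SPECIAL3_OPCODE = 0b011111    # major opcode, bits 31..26 (Release 6)
SC_FUNCTION_R6 = 0b100110     # function field, bits 5..0 (Release 6)


def _check_regs(base, rt):
    if not (0 <= base < 32 and 0 <= rt < 32):
        raise ValueError("register numbers must be in 0..31")


def encode_sc_pre_r6(base, rt, offset):
    """Pack SC using the 6|5|5|16 layout with a signed 16-bit offset."""
    _check_regs(base, rt)
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("offset does not fit in 16 signed bits")
    return (SC_OPCODE_PRE_R6 << 26) | (base << 21) | (rt << 16) | (offset & 0xFFFF)


def encode_sc_r6(base, rt, offset):
    """Pack SC using the 6|5|5|9|1|6 layout with a signed 9-bit offset."""
    _check_regs(base, rt)
    if not -(1 << 8) <= offset < (1 << 8):
        raise ValueError("offset does not fit in 9 signed bits")
    # Bit 6 is the single zero field between the offset and the function code.
    return ((SPECIAL3_OPCODE << 26) | (base << 21) | (rt << 16)
            | ((offset & 0x1FF) << 7) | SC_FUNCTION_R6)
```

For example, `encode_sc_r6(4, 2, 300)` raises `ValueError` because 300 is outside the signed 9-bit range, while the same operands are accepted by `encode_sc_pre_r6`.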
SC rt, offset(base) | MIPS32
Store Conditional Word
To store a word to memory to complete an atomic read-modify-write
if atomic_update then memory[GPR[base] + offset] = GPR[rt], GPR[rt] = 1 else GPR[rt] = 0
The LL and SC instructions provide primitives to implement atomic read-modify-write (RMW) operations on synchronizable memory locations. In Release 5, the behavior of SC is modified when Config5LLB=1.
Release 6 (with Config5ULS=1) formalizes support for uncached LL and SC sequences, whereas the pre-Release 6 LL and SC description applies to cached (coherent/non-coherent) memory types. (The description for uncached support does not modify the description for cached support and is written in a self-contained manner.)
The least-significant 32-bit word in GPR rt is conditionally stored in memory at the location specified by the aligned effective address. The signed offset is added to the contents of GPR base to form an effective address.
The SC completes the RMW sequence begun by the preceding LL instruction executed on the processor. To complete the RMW sequence atomically, the following occur:
The least-significant 32-bit word of GPR rt is stored to memory at the location specified by the aligned effective address.
A one, indicating success, is written into GPR rt.
Otherwise, memory is not modified and a 0, indicating failure, is written into GPR rt.
If any of the following events occurs between the execution of LL and SC, the SC fails:
A coherent store is completed by another processor or coherent I/O module into the block of synchronizable physical memory containing the word. The size and alignment of the block is implementation-dependent, but it is at least one word and at most the minimum page size.
A coherent store is executed between an LL and SC sequence on the same processor to the block of synchronizable physical memory containing the word (if Config5LLB=1; else whether such a store causes the SC to fail is not predictable).
An ERET instruction is executed. (Release 5 includes ERETNC, which will not cause the SC to fail.)
Furthermore, an SC must always compare its address against that of the LL. An SC will fail if the aligned address of the SC does not match that of the preceding LL.
A load executed on the processor executing the LL/SC sequence, to the block of synchronizable physical memory containing the word, will not cause the SC to fail (if Config5LLB=1; else such a load may cause the SC to fail).
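The success and failure rules above can be illustrated with a toy model. This is a minimal sketch, not a description of any implementation: it assumes the synchronizable block is exactly one word (the smallest size permitted), models only a single processor's link state, and uses hypothetical class and method names.

```python
class LLSCModel:
    """Toy model of the mandatory LL/SC success and failure rules."""

    def __init__(self):
        self.memory = {}      # aligned address -> 32-bit word
        self.llbit = False    # set by LL, cleared by SC and by breaking events
        self.ll_addr = None   # aligned address of the most recent LL

    def ll(self, addr):
        """Begin an RMW sequence and return the current word."""
        self.llbit = True
        self.ll_addr = addr
        return self.memory.get(addr, 0)

    def coherent_store(self, addr, value):
        """A coherent store into the block; breaks the link if it hits it."""
        self.memory[addr] = value & 0xFFFFFFFF
        if addr == self.ll_addr:
            self.llbit = False

    def eret(self):
        """ERET between LL and SC makes the SC fail."""
        self.llbit = False

    def sc(self, addr, value):
        """Store only if the link survived and the address matches the LL."""
        ok = self.llbit and addr == self.ll_addr
        if ok:
            self.memory[addr] = value & 0xFFFFFFFF
        self.llbit = False
        return 1 if ok else 0
```

In this model an undisturbed `ll` followed by `sc` to the same address returns 1 and updates memory; an intervening `coherent_store` to that address, an `eret`, or an `sc` to a different address returns 0 and leaves memory unmodified by the SC.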
If any of the events listed below occurs between the execution of LL and SC, the SC may fail where it could have succeeded, i.e., success is not predictable. Portable programs should not cause any of these events.
A load or store executed on the processor executing the LL and SC that is not to the block of synchronizable physical memory containing the word. (The load or store may cause a cache eviction between the LL and SC that results in SC failure. The load or store does not necessarily have to occur between the LL and SC.)
Any prefetch that is executed on the processor executing the LL and SC sequence (due to a cache eviction between the LL and SC).
A non-coherent store executed between an LL and SC sequence to the block of synchronizable physical memory containing the word.
The instructions executed starting with the LL and ending with the SC do not lie in a 2048-byte contiguous region of virtual memory. (The region does not have to be aligned, other than the alignment required for instruction words.)
CACHE operations that are local to the processor executing the LL/SC sequence will result in unpredictable behaviour of the SC if executed between the LL and SC, that is, they may cause the SC to fail where it could have succeeded. Non-local CACHE operations (address-type with coherent CCA) may cause an SC to fail on either the local processor or on the remote processor in multiprocessor or multi-threaded systems. This definition of the effects of CACHE operations is mandated if Config5LLB=1. If Config5LLB=0, then CACHE effects are implementation-dependent.
The following conditions must be true, or the result of the SC is not predictable, that is, the SC may fail or succeed (if Config5LLB=1, then either success or failure is mandated, else the result is UNPREDICTABLE):
Execution of SC must have been preceded by execution of an LL instruction.
An RMW sequence executed without intervening events that would cause the SC to fail must use the same address in the LL and SC. The address is the same if the virtual address, physical address, and cacheability & coherency attribute are identical.
Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LL/SC semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location:
Uniprocessor atomicity: To provide atomic RMW on a single processor, all accesses to the location must be made with memory access type of either cached noncoherent or cached coherent. All accesses must be to one or the other access type, and they may not be mixed.
MP atomicity: To provide atomic RMW among multiple processors, all accesses to the location must be made with a memory access type of cached coherent.
I/O System: To provide atomic RMW with a coherent I/O system, all accesses to the location must be made with a memory access type of cached coherent. If the I/O system does not use coherent memory operations, then atomic RMW cannot be provided with respect to the I/O reads and writes.
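The three rules above for cached accesses can be summarized as a small predicate. This sketch covers only the cached (pre-Release 6) rules, not the uncached Release 6 support; the scope labels, constants, and function name are hypothetical.

```python
CACHED_NONCOHERENT = "cached noncoherent"
CACHED_COHERENT = "cached coherent"
UNCACHED = "uncached"


def atomic_rmw_provided(scope, access_types):
    """Return True if atomic RMW is provided for the given scope.

    scope: "uniprocessor", "mp", or "io" (hypothetical labels).
    access_types: the memory access types used by all accesses to the location.
    """
    used = set(access_types)
    if len(used) != 1:
        return False              # no accesses listed, or access types are mixed
    (kind,) = used
    if scope == "uniprocessor":
        return kind in (CACHED_NONCOHERENT, CACHED_COHERENT)
    if scope in ("mp", "io"):
        return kind == CACHED_COHERENT
    raise ValueError("unknown scope: " + repr(scope))
```

For example, a location accessed only as cached noncoherent is synchronizable on a uniprocessor but not across multiple processors.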
Release 6 (with Config5ULS=1) formally defines support for uncached LL and SC with the following constraints.
Both LL and SC must be uncached, and the address must be defined as synchronizable in the system. If the address is non-synchronizable, then this may result in UNPREDICTABLE behavior. The recommended response is that the sub-system report a Bus Error to the processor.
The use of uncached LL and SC is applicable to any address within the supported address range of the system, or any system configuration, as long as the system implements means to monitor the sequence.
The SC that ends the sequence may fail locally, but never succeed locally within the processor. When it does not fail locally, the SC must be issued to a "monitor" which is responsible for monitoring the address. This monitor makes the final determination as to whether the SC fails or not, and communicates this to the processor that initiated the sequence.
It is implementation dependent as to what form the monitor takes. It is however differentiated from cached LL and SC which rely on a coherence protocol to make the determination as to whether the sequence succeeds.
Same processor uncached (but not cached) stores will cause the sequence to fail if the store address matches that of the sequence. A cached store to the same address will cause UNPREDICTABLE behavior.
Remote cached coherent stores to the same address will cause UNPREDICTABLE behavior.
Remote cached non-coherent or uncached stores may cause the sequence to fail if they address the external monitor and the monitor makes this determination.
As emphasized above, it is not recommended that software mix memory access types during LL and SC sequences. That is, all memory accesses must be of the same type; otherwise the result may be UNPREDICTABLE behavior.
Conditions that cause UNPREDICTABLE behavior for legacy cached LL and SC sequences may also cause such behavior for uncached sequences.
A PAUSE instruction is no-op'd when it is preceded by an uncached LL.
The semantics of an uncached LL/SC atomic operation applies to any uncached CCA including UCA (UnCached Accelerated). An implementation that supports UCA must guarantee that SC does not participate in store gathering and that it ends any gathering initiated by stores preceding the SC in program order when the SC address coincides with a gathering address.
The addressed location must have a memory access type of cached noncoherent or cached coherent; if it does not, the result is UNPREDICTABLE. Release 6 (with Config5ULS=1) extends support to uncached types.
The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
Providing misaligned support for Release 6 is not a requirement for this instruction.
Availability and Compatibility
This instruction has been recoded for Release 6.
vAddr = sign_extend(offset) + GPR[base]
if vAddr_1..0 != 0^2 then
    SignalException(AddressError)
endif
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
pAddr = pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
bytesel = vAddr_2..0 xor (BigEndianCPU || 0^2)
datadoubleword = GPR[rt]_63-8*bytesel..0 || 0^(8*bytesel)
if LLbit then
    StoreMemory (CCA, WORD, datadoubleword, pAddr, vAddr, DATA)
endif
GPR[rt] = 0^63 || LLbit
LLbit = 0 // if Config5LLB=1, SC always clears LLbit regardless of address match.
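The operation above can be approximated in executable form. The following is a simplified sketch under stated assumptions: address translation is treated as the identity, the endian-dependent byte-lane selection is omitted, memory is a dictionary keyed by word address, and all names other than those in the pseudocode are hypothetical.

```python
class AddressError(Exception):
    """Stands in for SignalException(AddressError)."""


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def sc_operation(gpr, memory, llbit, base, rt, offset, offset_bits=16):
    """Model one SC; returns the new LLbit (always False).

    gpr: list of 32 register values; memory: dict of word address -> word.
    offset_bits: 16 before Release 6, 9 in Release 6.
    """
    vaddr = (sign_extend(offset, offset_bits) + gpr[base]) & 0xFFFFFFFFFFFFFFFF
    if vaddr & 0x3:                            # vAddr bits 1..0 must be zero
        raise AddressError(hex(vaddr))
    if llbit:
        memory[vaddr] = gpr[rt] & 0xFFFFFFFF   # store the low 32-bit word
    gpr[rt] = 1 if llbit else 0                # success indicator replaces rt
    return False                               # SC clears LLbit
```

Note that the alignment check is made before the LLbit test, so a misaligned SC raises the exception even when it would otherwise have failed.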
TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
LL and SC are used to atomically update memory locations, as shown below.
L1:
    LL    T1, (T0)     # load counter
    ADDI  T2, T1, 1    # increment
    SC    T2, (T0)     # try to store, checking for atomicity
    BEQ   T2, 0, L1    # if not atomic (0), try again
    NOP                # branch-delay slot
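The retry behavior of the loop above can be simulated to show why the branch back to L1 is needed. This sketch models a single word and a single processor's link state, and injects stores from another agent between the LL and the SC; the class, the function, and the `interference` parameter are hypothetical.

```python
class Word:
    """One synchronizable word plus the link state of a single processor."""

    def __init__(self, value=0):
        self.value = value
        self.linked = False

    def ll(self):
        self.linked = True
        return self.value

    def sc(self, new_value):
        ok = self.linked
        if ok:
            self.value = new_value & 0xFFFFFFFF
        self.linked = False
        return 1 if ok else 0

    def remote_store(self, new_value):
        """A coherent store from another agent; breaks the link."""
        self.value = new_value & 0xFFFFFFFF
        self.linked = False


def atomic_increment(word, interference=()):
    """Run the LL/ADDI/SC loop; return the number of attempts needed."""
    pending = list(interference)   # values stored remotely, one per attempt
    attempts = 0
    while True:
        attempts += 1
        t1 = word.ll()                         # L1: LL T1, (T0)
        if pending:
            word.remote_store(pending.pop(0))  # interference before the SC
        t2 = t1 + 1                            # ADDI T2, T1, 1
        if word.sc(t2):                        # SC T2, (T0)
            return attempts                    # BEQ not taken: done
```

With no interference the loop completes in one attempt. With one interfering store, the first SC fails and the second attempt increments the value written by the other agent, so no update is lost.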
Exceptions between the LL and SC cause SC to fail, so persistent exceptions must be avoided. Some examples of these are arithmetic operations that trap, system calls, and floating point operations that trap or require software emulation assistance.
LL and SC function on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.
As shown in the instruction drawing above, Release 6 implements a 9-bit offset, whereas all release levels lower than Release 6 of the MIPS architecture implement a 16-bit offset.