pre-Release 6
SCD 111100 | base | rt | offset
6          | 5    | 5  | 16
Release 6
SPECIAL3 011111 | base | rt | offset | 0 | SCD 100111
6               | 5    | 5  | 9      | 1 | 6
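The two encodings above can be sketched as bit-packing routines. This is a minimal illustration, assuming the fields are laid out from the most-significant bit in the order listed, with the widths given in the bit-count row; the function names are hypothetical and not part of the architecture.

```python
# Sketch: assemble the 32-bit SCD instruction word from its fields.
# Function names are illustrative only (assumption, not from the manual).

def _check_reg(n):
    # base and rt are 5-bit register numbers
    if not 0 <= n <= 31:
        raise ValueError("register number must be in 0..31")


def encode_scd_pre_r6(base, rt, offset):
    """opcode SCD (111100) | base | rt | 16-bit signed offset."""
    _check_reg(base)
    _check_reg(rt)
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("offset must fit in 16 signed bits")
    return (0b111100 << 26) | (base << 21) | (rt << 16) | (offset & 0xFFFF)


def encode_scd_r6(base, rt, offset):
    """SPECIAL3 (011111) | base | rt | 9-bit signed offset | 0 | SCD (100111)."""
    _check_reg(base)
    _check_reg(rt)
    if not -(1 << 8) <= offset < (1 << 8):
        raise ValueError("offset must fit in 9 signed bits")
    return ((0b011111 << 26) | (base << 21) | (rt << 16)
            | ((offset & 0x1FF) << 7) | (0 << 6) | 0b100111)
```

Note that the Release 6 form carries only a 9-bit offset, so displacements that fit the pre-Release 6 encoding may need an extra address computation when recoded.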
SCD rt, offset(base) | MIPS64 | Store Conditional Doubleword
To store a doubleword to memory to complete an atomic read-modify-write.
if atomic_update then memory[GPR[base] + offset] = GPR[rt], GPR[rt] = 1 else GPR[rt] = 0
The LLD and SCD instructions provide primitives to implement atomic read-modify-write (RMW) operations for synchronizable memory locations.
Release 6 (with Config5ULS = 1) formalizes support for uncached LLD and SCD sequences, whereas the pre-Release 6 LLD and SCD description applies to cached (coherent/non-coherent) memory types. (The description for uncached support does not modify the description for cached support and is written in a self-contained manner.)
The 64-bit doubleword in GPR rt is conditionally stored in memory at the location specified by the aligned effective address. The signed offset is added to the contents of GPR base to form an effective address.
The SCD completes the RMW sequence begun by the preceding LLD instruction executed on the processor. If SCD completes the RMW sequence atomically, the following occurs:
The 64-bit doubleword of GPR rt is stored into memory at the location specified by the aligned effective address.
A 1, indicating success, is written into GPR rt.
Otherwise, memory is not modified and a 0, indicating failure, is written into GPR rt.
If either of the following events occurs between the execution of LLD and SCD, the SCD fails:
A coherent store is completed by another processor or coherent I/O module into the block of synchronizable physical memory containing the doubleword. The size and alignment of the block is implementation dependent, but it is at least one doubleword and at most the minimum page size.
An ERET instruction is executed.
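The success and failure rules above can be illustrated with a toy model of one processor's link state. This is only a sketch: the class and method names are hypothetical, the block size is assumed to be the smallest value the text permits (one doubleword), and clearing the link after every SCD is a modeling choice rather than something stated here.

```python
class LinkModel:
    """Toy model of one processor's link state for an LLD/SCD sequence."""

    BLOCK_BYTES = 8  # assumption: smallest permitted block, one doubleword

    def __init__(self, memory):
        self.memory = memory      # dict: aligned address -> 64-bit value
        self.llbit = False
        self.link_block = None

    def lld(self, addr):
        # Begin the RMW sequence and remember which block is being watched.
        self.llbit = True
        self.link_block = addr // self.BLOCK_BYTES
        return self.memory.get(addr, 0)

    def remote_coherent_store(self, addr, value):
        # A coherent store completed by another processor or I/O module.
        self.memory[addr] = value
        if self.llbit and addr // self.BLOCK_BYTES == self.link_block:
            self.llbit = False

    def eret(self):
        # Executing ERET between LLD and SCD makes the SCD fail.
        self.llbit = False

    def scd(self, addr, value):
        # Returns the value written to GPR rt: 1 on success, 0 on failure.
        success = self.llbit
        if success:
            self.memory[addr] = value & 0xFFFFFFFFFFFFFFFF
        self.llbit = False  # modeling choice, see lead-in
        return 1 if success else 0
```

In this model a remote store to an unrelated block leaves the link intact, while a store into the watched block or an ERET forces the next SCD to return 0 without modifying memory.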
If either of the following events occurs between the execution of LLD and SCD, the SCD may succeed or it may fail; the success or failure is not predictable. Portable programs should not cause the following events:
A memory access instruction (load, store, or prefetch) is executed on the processor executing the LLD/SCD.
The instructions executed starting with the LLD and ending with the SCD do not lie in a 2048-byte contiguous region of virtual memory. (The region does not have to be aligned, other than the alignment required for instruction words.)
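A tool that inspects an instruction trace could check the 2048-byte condition roughly as follows. This is a sketch under assumptions: instruction words are taken to be 4 bytes, the input is the list of virtual addresses of every instruction executed from the LLD through the SCD, and the function name is hypothetical.

```python
INSTRUCTION_BYTES = 4  # assumption: 32-bit instruction words


def sequence_fits_region(pcs, region_bytes=2048):
    """Return True if every executed instruction, LLD through SCD, lies
    within one contiguous region of region_bytes bytes of virtual memory."""
    if not pcs:
        raise ValueError("need at least one instruction address")
    lowest = min(pcs)
    highest_end = max(pcs) + INSTRUCTION_BYTES  # one past the last byte used
    return highest_end - lowest <= region_bytes
```

Because the region need not be aligned, only the span between the lowest and highest instruction bytes matters in this check.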
The following two conditions must be true or the result of the SCD is UNPREDICTABLE:
Execution of the SCD must be preceded by execution of an LLD instruction.
An RMW sequence executed without intervening events that would cause the SCD to fail must use the same address in the LLD and SCD. The address is the same if the virtual address, physical address, and cache-coherence algorithm are identical.
Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LL/SC semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location:
Uniprocessor atomicity: To provide atomic RMW on a single processor, all accesses to the location must be made with a memory access type of either cached noncoherent or cached coherent. All accesses must be to one or the other access type, and they may not be mixed.
MP atomicity: To provide atomic RMW among multiple processors, all accesses to the location must be made with a memory access type of cached coherent.
I/O System: To provide atomic RMW with a coherent I/O system, all accesses to the location must be made with a memory access type of cached coherent. If the I/O system does not use coherent memory operations, then atomic RMW cannot be provided with respect to the I/O reads and writes.
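The three rules above for cached memory can be summarized as a small predicate. This is an illustrative sketch only: the scope labels, constant names, and the `io_coherent` parameter are assumptions introduced here, and the Release 6 uncached extension is deliberately not modeled.

```python
CACHED_NONCOHERENT = "cached noncoherent"
CACHED_COHERENT = "cached coherent"
UNCACHED = "uncached"


def atomic_rmw_provided(scope, access_types, io_coherent=True):
    """scope: "uniprocessor", "mp", or "io".
    access_types: the access types used by ALL accesses to the location.
    io_coherent: whether the I/O system uses coherent memory operations."""
    used = set(access_types)
    if not used:
        raise ValueError("at least one access type is required")
    if scope == "uniprocessor":
        # Either cached type is acceptable, but the two may not be mixed.
        return used == {CACHED_NONCOHERENT} or used == {CACHED_COHERENT}
    if scope == "mp":
        return used == {CACHED_COHERENT}
    if scope == "io":
        return io_coherent and used == {CACHED_COHERENT}
    raise ValueError("unknown scope: " + repr(scope))
```

For example, a location accessed with both cached types fails the uniprocessor rule even though each type is acceptable on its own.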
Release 6 (with Config5ULS = 1) formally defines support for uncached LLD and SCD with the following constraints.
Both LLD and SCD must be uncached, and the address must be defined as synchronizable in the system. If the address is non-synchronizable, then this may result in UNPREDICTABLE behavior. The recommended response is that the sub-system report a Bus Error to the processor.
The use of uncached LLD and SCD is applicable to any address within the supported address range of the system, or any system configuration, as long as the system implements means to monitor the sequence.
The SCD that ends the sequence may fail locally, but never succeed locally within the processor. When it does not fail locally, the SCD must be issued to a "monitor" which is responsible for monitoring the address. This monitor makes the final determination as to whether the SCD fails or not, and communicates this to the processor that initiated the sequence.
It is implementation dependent as to what form the monitor takes. It is however differentiated from cached LLD and SCD which rely on a coherence protocol to make the determination as to whether the sequence succeeds.
Same processor uncached (but not cached) stores will cause the sequence to fail if the store address matches that of the sequence. A cached store to the same address will cause UNPREDICTABLE behavior.
Remote cached coherent stores to the same address will cause UNPREDICTABLE behavior.
Remote cached non-coherent or uncached stores may cause the sequence to fail if they address the external monitor and the monitor makes this determination.
As emphasized above, it is not recommended that software mix memory access types during LLD and SCD sequences. That is, all memory accesses must be of the same type; otherwise, UNPREDICTABLE behavior may result.
Conditions that cause UNPREDICTABLE behavior for legacy cached LLD and SCD sequences may also cause such behavior for uncached sequences.
A PAUSE instruction is no-op'd when it is preceded by an uncached LLD.
The semantics of an uncached LLD/SCD atomic operation applies to any uncached CCA including UCA (UnCached Accelerated). An implementation that supports UCA must guarantee that SCD does not participate in store gathering and that it ends any gathering initiated by stores preceding the SCD in program order when the SCD address coincides with a gathering address.
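The rules for stores that intervene in an uncached LLD/SCD sequence can be collected into a lookup table. The sketch below covers only stores to the same address as the sequence; the labels and function name are hypothetical, and the "may fail" outcome stands for the case where the external monitor makes the final determination.

```python
FAIL = "sequence fails"
MAY_FAIL = "may fail (monitor decides)"
UNPREDICTABLE = "UNPREDICTABLE"

# (origin of the store, memory access type of the store) -> outcome,
# for a store whose address matches that of the uncached sequence.
_OUTCOMES = {
    ("same", "uncached"): FAIL,
    ("same", "cached coherent"): UNPREDICTABLE,
    ("same", "cached noncoherent"): UNPREDICTABLE,
    ("remote", "cached coherent"): UNPREDICTABLE,
    ("remote", "cached noncoherent"): MAY_FAIL,
    ("remote", "uncached"): MAY_FAIL,
}


def intervening_store_outcome(origin, access_type):
    """origin: "same" (same processor) or "remote" (another processor)."""
    try:
        return _OUTCOMES[(origin, access_type)]
    except KeyError:
        raise ValueError("unknown combination: %r, %r" % (origin, access_type))
```

The table makes the asymmetry visible: only a same-processor uncached store is guaranteed to fail the sequence, while any cached store from the same processor and a remote cached coherent store are UNPREDICTABLE.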
The addressed location must have a memory access type of cached noncoherent or cached coherent; if it does not, the result is UNPREDICTABLE. Release 6 (with Config5ULS = 1) extends support to uncached types.
The effective address must be naturally aligned. If any of the 3 least-significant bits of the address is non-zero, an Address Error exception occurs.
Providing misaligned support for Release 6 is not a requirement for this instruction.
This instruction has been recoded for Release 6.
vAddr = sign_extend(offset) + GPR[base]
if vAddr[2..0] != 0^3 then
    SignalException(AddressError)
endif
(pAddr, CCA) = AddressTranslation(vAddr, DATA, STORE)
datadoubleword = GPR[rt]
if LLbit then
    StoreMemory(CCA, DOUBLEWORD, datadoubleword, pAddr, vAddr, DATA)
endif
GPR[rt] = 0^63 || LLbit
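The operation above can be transliterated into a small executable model. This is a sketch, not a reference implementation: address translation is assumed to be the identity, memory is a dictionary keyed by address, LLbit is passed in as a boolean, and all Python names are hypothetical. The `offset_bits` parameter defaults to the 16-bit pre-Release 6 offset; passing 9 models the Release 6 encoding.

```python
MASK64 = (1 << 64) - 1


class AddressError(Exception):
    """Stands in for the Address Error exception."""


def sign_extend(value, bits):
    # Interpret the low `bits` bits of value as a two's-complement number.
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def scd(gpr, memory, llbit, base, rt, offset, offset_bits=16):
    """Model one SCD: gpr is a list of 32 registers, memory a dict."""
    vaddr = (sign_extend(offset, offset_bits) + gpr[base]) & MASK64
    if vaddr & 0b111:                      # any of the 3 low bits non-zero
        raise AddressError(hex(vaddr))
    if llbit:
        memory[vaddr] = gpr[rt] & MASK64   # identity translation assumed
    gpr[rt] = 1 if llbit else 0            # 0^63 || LLbit
```

The alignment check happens before the LLbit test, so a misaligned address raises the exception whether or not the store would have succeeded.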
TLB Refill, TLB Invalid, TLB Modified, Address Error, Reserved Instruction, Watch
LLD and SCD are used to atomically update memory locations, as shown below.
L1:
    LLD   T1, (T0)     # load counter
    ADDI  T2, T1, 1    # increment
    SCD   T2, (T0)     # try to store, checking for atomicity
    BEQ   T2, 0, L1    # if not atomic (0), try again
    NOP                # branch-delay slot
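The retry structure of the loop above can also be seen in a higher-level sketch. The `Cell` class below is a hypothetical single-location stand-in for memory with LLD/SCD-like operations, and the `between` hook is an assumption added only so that interference between the load and the store can be demonstrated.

```python
class Cell:
    """Toy single-location memory with LLD/SCD-like operations."""

    def __init__(self, value=0):
        self.value = value
        self.linked = False

    def lld(self):
        self.linked = True
        return self.value

    def scd(self, new_value):
        ok = self.linked
        if ok:
            self.value = new_value
        self.linked = False
        return 1 if ok else 0

    def interfering_store(self, new_value):
        # Models a store by another agent, which breaks the link.
        self.value = new_value
        self.linked = False


def atomic_increment(cell, between=None):
    """Retry load / add / conditional store until the store succeeds.
    Returns the number of attempts. `between` is a hypothetical hook
    called between the load and the store with the attempt number."""
    attempts = 0
    while True:
        attempts += 1
        t1 = cell.lld()            # LLD  T1, (T0)
        t2 = t1 + 1                # ADDI T2, T1, 1
        if between is not None:
            between(attempts)
        if cell.scd(t2) == 1:      # SCD  T2, (T0); BEQ T2, 0, L1
            return attempts
```

When an interfering store lands during the first attempt, the loop reloads the new value and increments that instead, so the update is never lost.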
Exceptions between the LLD and SCD cause SCD to fail, so persistent exceptions must be avoided. Examples are arithmetic operations that trap, system calls, and floating point operations that trap or require software emulation assistance.
LLD and SCD function on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.