Bits: 31..26 | 25..21 | 20..16 | 15..12 | 11..9 | 8..4 | 3..0
Field: POOL32C 011000 | rt | base | ST-EVA 1010 | SCWPE 000 | rd | 0 0000
Width: 6 | 5 | 5 | 4 | 3 | 5 | 4
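The field layout in the encoding table can be exercised with a small encoder and decoder. This is a sketch that assumes the fields are packed most-significant first in the order listed (POOL32C in bits 31..26 down to the zero field in bits 3..0) and treats the instruction as a single 32-bit integer; `encode_scwpe` and `decode_scwpe` are hypothetical helper names, not part of any toolchain.

```python
# Fixed opcode fields from the encoding table (most-significant first).
# Assumption: the instruction is modeled as one 32-bit integer; how
# microMIPS lays the two halfwords out in memory is not modeled here.
POOL32C = 0b011000   # bits 31..26
ST_EVA = 0b1010      # bits 15..12
SCWPE_FUNC = 0b000   # bits 11..9


def encode_scwpe(rt: int, rd: int, base: int) -> int:
    """Pack SCWPE rt, rd, (base) into a 32-bit instruction word."""
    for name, reg in (("rt", rt), ("rd", rd), ("base", base)):
        if not 0 <= reg < 32:
            raise ValueError(f"{name} must be a 5-bit register number")
    return (
        (POOL32C << 26)
        | (rt << 21)          # bits 25..21
        | (base << 16)        # bits 20..16
        | (ST_EVA << 12)
        | (SCWPE_FUNC << 9)
        | (rd << 4)           # rd sits in the non-standard bits 8..4
    )                         # bits 3..0 stay zero


def decode_scwpe(word: int) -> dict:
    """Extract the register fields after checking the fixed fields."""
    fixed_ok = (
        ((word >> 26) & 0x3F) == POOL32C
        and ((word >> 12) & 0xF) == ST_EVA
        and ((word >> 9) & 0x7) == SCWPE_FUNC
        and (word & 0xF) == 0
    )
    if not fixed_ok:
        raise ValueError("not an SCWPE encoding")
    return {
        "rt": (word >> 21) & 0x1F,
        "base": (word >> 16) & 0x1F,
        "rd": (word >> 4) & 0x1F,
    }


word = encode_scwpe(rt=10, rd=11, base=8)
print(hex(word))           # 0x6148a0b0
print(decode_scwpe(word))  # {'rt': 10, 'base': 8, 'rd': 11}
```

Decoding rd from bits 8..4 rather than from one of the usual 5-bit register slots is what the description means by rd being positioned in a non-standard bit range.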
SCWPE rt, rd, (base)
microMIPS Release 6
Store Conditional Word Paired EVA
Conditionally store a paired word to memory to complete an atomic read-modify-write. The store is executed in kernel mode but accesses the user virtual address space.
if atomic_update then memory[GPR[base]] = {GPR[rd], GPR[rt]}, GPR[rt] = 1 else GPR[rt] = 0
The LLWPE and SCWPE instructions provide primitives to implement a paired word atomic read-modify-write (RMW) operation at a synchronizable memory location.
Release 6 (with Config5ULS =1) formalizes support for uncached LLWPE and SCWPE sequences. (The description for uncached support does not modify the description for cached support and is written in a self-contained manner.)
A paired word is formed from the concatenation of GPR rd and GPR rt. GPR rd is the most-significant word of the double-word, and GPR rt is the least-significant word of the double-word. The paired word is conditionally stored in memory at the location specified by the double-word aligned effective address from GPR base.
A paired word read or write occurs as a pair of word reads or writes that is double-word atomic.
The instruction has no offset. The effective address is equal to the contents of GPR base.
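How the stored value and its address are formed can be sketched in a few lines. This assumes Python integers stand in for register contents and that only the low 32 bits of each GPR contribute, as in the Operation pseudocode; `paired_word` and `effective_address` are hypothetical helper names, and raising `ValueError` for a misaligned address is only a placeholder for the exception the hardware would take (an assumption, since misaligned support is not required for this instruction).

```python
MASK32 = 0xFFFF_FFFF


def paired_word(rd_value: int, rt_value: int) -> int:
    """Concatenate GPR rd (most-significant word) and GPR rt (least-significant word)."""
    return ((rd_value & MASK32) << 32) | (rt_value & MASK32)


def effective_address(base_value: int) -> int:
    """SCWPE has no offset field: the effective address is GPR[base] itself.

    The address must be double-word aligned; the ValueError is a
    hypothetical stand-in for the processor's exception.
    """
    if base_value & 0x7:
        raise ValueError("effective address is not double-word aligned")
    return base_value


print(hex(paired_word(0x11223344, 0x55667788)))  # 0x1122334455667788
print(hex(effective_address(0x1000)))            # 0x1000
```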
rd is intentionally positioned in a non-standard bit-range.
The SCWPE completes the RMW sequence begun by the preceding LLWPE instruction executed on the processor.
If the SCWPE completes the RMW sequence atomically, the following occur:
The paired word formed from the concatenation of GPRs rd and rt is stored to memory at the location specified by the double-word aligned effective address.
A one, indicating success, is written into GPR rt.
Otherwise, memory is not modified and a 0, indicating failure, is written into GPR rt.
Although legal programming requires LLWPE to start the atomic read-modify-write sequence and SCWPE to end the same sequence, whether the SCWPE completes depends only on the state of LLbit and LLAddr, which are set by a preceding load-linked instruction of any type. Software must assume that pairing load-linked and store-conditional instructions inconsistently causes UNPREDICTABLE behavior.
The SCWPE must always compare its double-word aligned address against that of the preceding LLWPE. The SCWPE will fail if the address does not match that of the preceding LLWPE.
The SCWPE instruction functions the same as the SCWP instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Segmentation Control for additional information.
Events that occur between the execution of load-linked and store-conditional instruction types that must cause the sequence to fail are given in the legacy SC instruction definition.
Additional events that occur between the execution of load-linked and store-conditional instruction types that may cause success of the sequence to be UNPREDICTABLE are defined in the SC instruction definition.
A load to the block of synchronizable physical memory containing the paired word, executed on the same processor that is executing the LLWPE/SCWPE sequence, does not cause the SCWPE to fail.
The effects of CACHE operations, both local and remote, on a paired word atomic operation are defined in the SC instruction definition.
Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LL/SC semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location. Requirements for Uniprocessor, MP and I/O atomicity are given in the SC definition.
The definition for SCWPE is extended for uncached memory types in a manner identical to SC. The extension is defined in the SC instruction description.
Load-Linked and Store-Conditional instruction types require that the addressed location have a memory access type of cached noncoherent or cached coherent; that is, the processor must have a cache. If it does not, the result is UNPREDICTABLE. Release 6 (with Config5ULS =1) extends support to uncached types.
The architecture optionally allows support for Load-Linked and Store-Conditional instruction types in a cacheless processor. Support for cacheless operation is implementation dependent. In this case, LLAddr is optional.
Providing misaligned support is not a requirement for this instruction.
Availability and Compatibility
This instruction is introduced by Release 6. It is only present if Config5XNP=0 and Config5EVA=1.
Operation
vAddr = GPR[base]
(pAddr, CCA) = AddressTranslation(vAddr, DATA, STORE)
datadoubleword[31..0] = GPR[rt][31..0]
datadoubleword[63..32] = GPR[rd][31..0]
if (LLbit && (pAddr == LLAddr)) then
    // PAIREDWORD: two-word data type that is double-word atomic
    StoreMemory(CCA, PAIREDWORD, datadoubleword, pAddr, vAddr, DATA)
    GPR[rt] = 0^63 || 1'b1
else
    GPR[rt] = 0^64
endif
LLbit = 0
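The Operation pseudocode can be mirrored in a small executable model to show how LLbit and LLAddr gate the store and how GPR rt receives the result. This is a sketch under stated assumptions: `CpuState`, `llwpe` and `scwpe` are hypothetical names, address translation is treated as the identity (vAddr equals pAddr), CCA, exceptions and other processors are not modeled, and the LLWPE side is simplified (loaded words are zero-extended).

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFF_FFFF


@dataclass
class CpuState:
    """Toy processor state: GPRs, the LL reservation, and memory.

    Assumption: memory is a dict keyed by double-word address.
    """
    gpr: list = field(default_factory=lambda: [0] * 32)
    llbit: bool = False
    lladdr: int = 0
    memory: dict = field(default_factory=dict)


def llwpe(cpu: CpuState, rt: int, rd: int, base: int) -> None:
    """Hypothetical LLWPE model: load the pair and set the reservation."""
    paddr = cpu.gpr[base]
    data = cpu.memory.get(paddr, 0)
    cpu.gpr[rt] = data & MASK32             # least-significant word
    cpu.gpr[rd] = (data >> 32) & MASK32     # most-significant word
    cpu.llbit = True
    cpu.lladdr = paddr


def scwpe(cpu: CpuState, rt: int, rd: int, base: int) -> None:
    """Follows the Operation pseudocode step by step."""
    vaddr = cpu.gpr[base]
    paddr = vaddr                           # stand-in for AddressTranslation
    datadoubleword = ((cpu.gpr[rd] & MASK32) << 32) | (cpu.gpr[rt] & MASK32)
    if cpu.llbit and paddr == cpu.lladdr:
        cpu.memory[paddr] = datadoubleword  # one update: double-word atomic
        cpu.gpr[rt] = 1                     # 0^63 || 1
    else:
        cpu.gpr[rt] = 0                     # 0^64
    cpu.llbit = False                       # cleared on success and failure


cpu = CpuState()
cpu.gpr[8] = 0x1000                         # base register holds the address
cpu.memory[0x1000] = 0x0000_0001_FFFF_FFFF
llwpe(cpu, rt=10, rd=11, base=8)
cpu.gpr[10] = (cpu.gpr[10] + 1) & MASK32    # lower word wraps to 0
cpu.gpr[11] = (cpu.gpr[11] + 1) & MASK32    # carry into the upper word
scwpe(cpu, rt=10, rd=11, base=8)
first_result = cpu.gpr[10]                  # 1: reservation was intact
scwpe(cpu, rt=10, rd=11, base=8)            # no new LLWPE, so LLbit is 0
second_result = cpu.gpr[10]                 # 0: store not performed
print(first_result, second_result, hex(cpu.memory[0x1000]))  # 1 0 0x200000000
```

The second SCWPE fails because the first one cleared LLbit, which is why every SCWPE must be preceded by its own load-linked instruction.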
Exceptions
TLB Refill, TLB Invalid, TLB Modified, Reserved Instruction, Address Error, Watch, Coprocessor Unusable.
LLWPE and SCWPE are used to atomically update memory locations, as shown below.
L1:    LLWPE  T2, T3, (T0)   # load T2 (lower word) and T3 (upper word)
       ADDIU  T2, T2, 1      # increment lower word (ADDIU does not trap on overflow)
       BNEZC  T2, STORE      # lower word did not wrap: no carry needed
       ADDIU  T3, T3, 1      # lower word wrapped to 0: carry into upper word
STORE: SCWPE  T2, T3, (T0)   # store T2 and T3; T2 = 1 on success, 0 on failure
       BEQZC  T2, L1         # if not atomic (0), try again
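The per-attempt arithmetic of the example (increment the lower word, carry into the upper word) and its retry-on-failure loop can be modeled in isolation. This sketch assumes the paired word holds an unsigned 64-bit counter with T3 as the upper word and T2 as the lower word, so the carry is taken exactly when the lower word wraps to zero; `increment_pair`, `atomic_increment`, and the `load_linked`/`store_conditional` callables are hypothetical stand-ins for LLWPE and SCWPE.

```python
from typing import Callable, Tuple

MASK32 = 0xFFFF_FFFF


def increment_pair(lower: int, upper: int) -> Tuple[int, int]:
    """Increment a 64-bit counter held as two 32-bit words."""
    lower = (lower + 1) & MASK32
    if lower == 0:                   # carry-out from the lower word
        upper = (upper + 1) & MASK32
    return lower, upper


def atomic_increment(load_linked: Callable[[], Tuple[int, int]],
                     store_conditional: Callable[[int, int], int]) -> int:
    """Retry until the store-conditional reports success (1).

    Returns the number of attempts taken.
    """
    attempts = 0
    while True:
        attempts += 1
        lower, upper = increment_pair(*load_linked())
        if store_conditional(lower, upper) == 1:
            return attempts


# Simulated memory cell and a store that fails once, then succeeds.
cell = {"lower": 0xFFFF_FFFF, "upper": 0x0000_0001}
outcomes = iter([0, 1])


def ll() -> Tuple[int, int]:
    return cell["lower"], cell["upper"]


def sc(lower: int, upper: int) -> int:
    ok = next(outcomes)
    if ok:
        cell["lower"], cell["upper"] = lower, upper
    return ok


attempts = atomic_increment(ll, sc)
print(attempts, hex(cell["upper"]), hex(cell["lower"]))  # 2 0x2 0x0
```

Because a failed store leaves memory untouched, the loop simply reloads and recomputes; no partial update has to be undone.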
Exceptions between the LLWPE and SCWPE cause the SCWPE to fail, so persistent exceptions must be avoided. Examples include arithmetic operations that trap, system calls, and floating-point operations that trap or require software emulation assistance.
LLWPE and SCWPE function on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.