Encoding:

POOL32C 011000 | base | SCWP 1001 | 000 | 0000

Format:

SCWP rt, rd, (base)

microMIPS Release 6

Store Conditional Word Paired

Purpose:

Store Conditional Word Paired

Conditionally store a paired word to memory to complete an atomic read-modify-write.

Description:

 if atomic_update then memory[GPR[base]] = {GPR[rd],GPR[rt]}, GPR[rt] = 1 else GPR[rt] = 0

The LLWP and SCWP instructions provide primitives to implement a paired word atomic read-modify-write (RMW) operation at a synchronizable memory location.

Release 6 (with Config5_ULS=1) formalizes support for uncached LLWP and SCWP sequences. (The description of uncached support does not modify the description of cached support and is written in a self-contained manner.)

A paired word is formed from the concatenation of GPR rd and GPR rt. GPR rd is the most-significant word of the paired word, and GPR rt is the least-significant word of the paired word. The paired word is conditionally stored in memory at the location specified by the double-word aligned effective address from GPR base.
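As an illustration of the register ordering described above, the following Python sketch packs two 32-bit register values into a 64-bit paired word and splits it again. The helper names `pack_paired_word` and `unpack_paired_word` are hypothetical and not part of the architecture; register contents are modeled as plain integers.

```python
MASK32 = 0xFFFFFFFF

def pack_paired_word(rd_value, rt_value):
    """Concatenate rd (most-significant word) and rt (least-significant word)."""
    return ((rd_value & MASK32) << 32) | (rt_value & MASK32)

def unpack_paired_word(paired):
    """Split a 64-bit paired word back into (rd_value, rt_value)."""
    return (paired >> 32) & MASK32, paired & MASK32

paired = pack_paired_word(0x11223344, 0xAABBCCDD)
print(hex(paired))
```

Only the low 32 bits of each register value contribute to the paired word, which is why both inputs are masked before being combined.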

A paired word read or write occurs as a pair of word reads or writes that is double-word atomic.

The instruction has no offset. The effective address is equal to the contents of GPR base.

rd is intentionally positioned in a non-standard bit-range.

The SCWP completes the RMW sequence begun by the preceding LLWP instruction executed on the processor. If the SCWP completes the RMW sequence atomically, the following occur:

- The paired word formed from the concatenation of GPRs rd and rt is stored to memory at the location specified by the double-word aligned effective address.
- A 1, indicating success, is written into GPR rt.

Otherwise, memory is not modified and a 0, indicating failure, is written into GPR rt.

Though legal programming requires LLWP to start the atomic read-modify-write sequence and SCWP to end the same sequence, whether the SCWP completes depends only on the state of LLbit and LLAddr, which are set by a preceding load-linked instruction of any type. Software must assume that pairing load-linked and store-conditional instructions in an inconsistent manner causes UNPREDICTABLE behavior.

The SCWP must always compare its double-word aligned address against that of the preceding LLWP. The SCWP will fail if the address does not match that of the preceding LLWP.

Events that occur between the execution of load-linked and store-conditional instruction types that must cause the sequence to fail are given in the legacy SC instruction description.

Additional events that occur between the execution of load-linked and store-conditional instruction types that may cause success of the sequence to be UNPREDICTABLE are defined in the SC instruction description.

A load to the block of synchronizable physical memory containing the paired word, executed on the same processor that is executing the LLWP/SCWP sequence, will not cause the SCWP to fail.

The effects of CACHE operations, both local and remote, on a paired word atomic operation are defined in the SC instruction description.

Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LL/SC semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location. Requirements for Uniprocessor, MP and I/O atomicity are given in the SC definition.

The definition for SCWP is extended for uncached memory types in a manner identical to SC. The extension is defined in the SC instruction description.

Restrictions:

Load-Linked and Store-Conditional instruction types require that the addressed location have a memory access type of cached noncoherent or cached coherent; that is, the processor must have a cache. If it does not, the result is UNPREDICTABLE. Release 6 (with Config5_ULS=1) extends support to uncached types.

The architecture optionally allows support for Load-Linked and Store-Conditional instruction types in a cacheless processor. Support for cacheless operation is implementation dependent. In this case, LLAddr is optional.

Providing misaligned support is not a requirement for this instruction.

Availability and Compatibility

This instruction is introduced by Release 6. It is only present if Config5_XNP=0.

Operation:

vAddr = GPR[base]
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
datadoubleword_31..0 = GPR[rt]_31..0
datadoubleword_63..32 = GPR[rd]_31..0
if (LLbit && (pAddr == LLAddr)) then
// PAIREDWORD: two word data-type that is double-word atomic
   StoreMemory (CCA, PAIREDWORD, datadoubleword, pAddr, vAddr, DATA)
   GPR[rt] = 0⁶³|| 1'b1
else
   GPR[rt] = 0⁶⁴
endif
LLbit = 0
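The Operation pseudocode above can be explored with a small executable model. The following is a minimal Python sketch, not a definitive implementation: the `Core` class and its method names are hypothetical, address translation is modeled as identity, exceptions and alignment checks are omitted, and the `llwp` behavior (load the pair, then set LLbit and LLAddr) is an assumption, since LLWP is defined in its own instruction description.

```python
MASK32 = 0xFFFFFFFF

class Core:
    """Toy model of one processor: GPRs, LLbit, LLAddr, and a flat memory."""

    def __init__(self):
        self.gpr = [0] * 32
        self.memory = {}       # physical address -> 64-bit paired word
        self.llbit = False
        self.lladdr = None

    def llwp(self, rt, rd, base):
        """Assumed LLWP behavior: load the pair at GPR[base] and set the link."""
        addr = self.gpr[base]  # no offset; translation modeled as identity
        paired = self.memory.get(addr, 0)
        self.gpr[rt] = paired & MASK32
        self.gpr[rd] = (paired >> 32) & MASK32
        self.llbit, self.lladdr = True, addr

    def scwp(self, rt, rd, base):
        """Store {GPR[rd], GPR[rt]} only if the link is intact and addresses match."""
        addr = self.gpr[base]
        if self.llbit and addr == self.lladdr:
            self.memory[addr] = ((self.gpr[rd] & MASK32) << 32) | (self.gpr[rt] & MASK32)
            self.gpr[rt] = 1   # success
        else:
            self.gpr[rt] = 0   # failure: memory is not modified
        self.llbit = False     # the link is cleared on both paths

core = Core()
core.gpr[4] = 0x1000                       # base register
core.memory[0x1000] = 0x0000000200000005
core.llwp(2, 3, 4)
core.gpr[2] += 1                           # modify the least-significant word
core.scwp(2, 3, 4)
print(core.gpr[2], hex(core.memory[0x1000]))
```

Because LLbit is cleared unconditionally, a second `scwp` without an intervening `llwp` fails in this model, as does an `scwp` whose base address differs from the one recorded by `llwp`.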

Exceptions:

TLB Refill, TLB Invalid, TLB Modified, Reserved Instruction, Address Error, Watch

Programming Notes:

LLWP and SCWP are used to atomically update memory locations, as shown below.

L1:
   LLWP  T2, T3, (T0)   # load T2 and T3
   BOVC  T2, 1, U32     # check whether least-significant word may overflow
   ADDI  T2, T2, 1      # increment lower word only
   SCWP  T2, T3, (T0)   # store T2 and T3
   BEQC  T2, 0, L1      # if not atomic (0), try again
U32:
   ADDI  T2, T2, 1      # increment lower word
   ADDI  T3, T3, 1      # increment upper word
   SCWP  T2, T3, (T0)   # store T2 and T3
   BEQC  T2, 0, L1      # if not atomic (0), try again
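The retry pattern in the sequence above can be modeled in a few lines of Python. This is a hedged sketch under stated assumptions: `LinkedMemory`, `interfere`, and `atomic_increment` are hypothetical names, the model holds a single 64-bit location, and the carry into the upper word is handled with a simple wrap test rather than the BOVC check used in the assembly.

```python
MASK32 = 0xFFFFFFFF

class LinkedMemory:
    """Hypothetical single-location model: one 64-bit cell plus an LLbit."""

    def __init__(self, value=0):
        self.value = value
        self.llbit = False

    def llwp(self):
        """Return (low, high) words and set the link."""
        self.llbit = True
        return self.value & MASK32, (self.value >> 32) & MASK32

    def scwp(self, low, high):
        """Store the pair if the link is intact; return 1 on success, else 0."""
        ok = self.llbit
        if ok:
            self.value = ((high & MASK32) << 32) | (low & MASK32)
        self.llbit = False
        return 1 if ok else 0

    def interfere(self, value):
        """Model an intervening event that breaks the link, e.g. a remote store."""
        self.value = value
        self.llbit = False

def atomic_increment(cell, disturb_first_try=False):
    """Increment the 64-bit cell, retrying until the conditional store succeeds."""
    attempts = 0
    while True:
        attempts += 1
        low, high = cell.llwp()
        if disturb_first_try and attempts == 1:
            cell.interfere(cell.value + 10)   # forces the first store to fail
        low = (low + 1) & MASK32
        if low == 0:                          # carry out of the low word
            high = (high + 1) & MASK32
        if cell.scwp(low, high):
            return attempts

cell = LinkedMemory(0x00000000FFFFFFFF)
print(atomic_increment(cell), hex(cell.value))
```

When the first attempt is disturbed, the loop reloads the updated value and applies the increment to it, so the interfering write is never overwritten with stale data.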

Exceptions between the LLWP and SCWP cause the SCWP to fail, so persistent exceptions must be avoided. Examples include arithmetic operations that trap, system calls, and floating-point operations that trap or require software emulation assistance.

LLWP and SCWP function on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.