Encoding:

POOL32C   rt   base   ST-EVA   SCWPE   rd   0
011000                1010     000          0000
6         5    5      4        3       5    4
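As an illustration of the field layout above, the following Python sketch packs the fields into a 32-bit instruction word. It assumes the fields are listed from the most-significant bit downward, in the order and with the widths shown. The function name encode_scwpe and the constant names are hypothetical and are not part of the architecture specification.

```python
# Field order and widths as listed in the encoding table, most-significant
# field first: POOL32C(6) rt(5) base(5) ST-EVA(4) SCWPE(3) rd(5) zero(4).
# Assumption: the table is laid out from bit 31 down to bit 0.
POOL32C = 0b011000
ST_EVA = 0b1010
SCWPE_MINOR = 0b000


def encode_scwpe(rt: int, base: int, rd: int) -> int:
    """Pack SCWPE rt, rd, (base) into a 32-bit word (illustrative only)."""
    for name, reg in (("rt", rt), ("base", base), ("rd", rd)):
        if not 0 <= reg < 32:
            raise ValueError(f"{name} must be a 5-bit register number")
    fields = [
        (POOL32C, 6), (rt, 5), (base, 5),
        (ST_EVA, 4), (SCWPE_MINOR, 3), (rd, 5), (0, 4),
    ]
    word = 0
    for value, width in fields:
        # Shift the accumulated bits left and append the next field.
        word = (word << width) | value
    return word
```

Under these assumptions rd lands in the bit range just above the four low-order zero bits, which is the non-standard position the description mentions.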

Format:

SCWPE rt, rd, (base)

microMIPS Release 6

Store Conditional Word Paired EVA

Purpose:

Store Conditional Word Paired EVA

Conditionally store a paired word to memory to complete an atomic read-modify-write. The store occurs in kernel mode to user virtual address space.

Description:

 if atomic_update then memory[GPR[base]] = {GPR[rd], GPR[rt]}, GPR[rt] = 1 else GPR[rt] = 0

The LLWPE and SCWPE instructions provide primitives to implement a paired word atomic read-modify-write (RMW) operation at a synchronizable memory location.

Release 6 (with Config5ULS=1) formalizes support for uncached LLWPE and SCWPE sequences. (The description for uncached support does not modify the description for cached support and is written in a self-contained manner.)

A paired word is formed from the concatenation of GPR rd and GPR rt. GPR rd is the most-significant word of the double-word, and GPR rt is the least-significant word of the double-word. The paired word is conditionally stored in memory at the location specified by the double-word aligned effective address from GPR base.

A paired word read or write occurs as a pair of word reads or writes that is double-word atomic.

The instruction has no offset. The effective address is equal to the contents of GPR base.
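The paired-word formation and the alignment requirement described above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the helper names paired_word and is_doubleword_aligned are hypothetical, and a double-word is taken to be 8 bytes.

```python
MASK32 = 0xFFFFFFFF


def paired_word(rd_value: int, rt_value: int) -> int:
    """Concatenate the low 32 bits of rd (upper word) and rt (lower word)."""
    return ((rd_value & MASK32) << 32) | (rt_value & MASK32)


def is_doubleword_aligned(address: int) -> bool:
    """Assuming an 8-byte double-word, the low three address bits must be 0."""
    return (address & 0x7) == 0
```

Because the instruction has no offset, the address checked here would be the contents of GPR base itself.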

rd is intentionally positioned in a non-standard bit-range.

The SCWPE completes the RMW sequence begun by the preceding LLWPE instruction executed on the processor.

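To complete the RMW sequence atomically, the following occur: the paired word formed from GPR rd and GPR rt is stored to memory at the double-word aligned effective address, and a 1, indicating success, is written into GPR rt.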

Otherwise, memory is not modified and a 0, indicating failure, is written into GPR rt.

Though legal programming requires LLWPE to start the atomic read-modify-write sequence and SCWPE to end the same sequence, whether the SCWPE completes is only dependent on the state of LLbit and LLAddr, which are set by a preceding load-linked instruction of any type. Software must assume that pairing load-linked and store-conditional instructions in an inconsistent manner causes UNPREDICTABLE behavior.

The SCWPE must always compare its double-word aligned address against that of the preceding LLWPE. The SCWPE will fail if the address does not match that of the preceding LLWPE.

The SCWPE instruction functions the same as the SCWP instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Segmentation Control for additional information.

Events that occur between the execution of load-linked and store-conditional instruction types that must cause the sequence to fail are given in the legacy SC instruction definition.

Additional events that occur between the execution of load-linked and store-conditional instruction types that may cause success of the sequence to be UNPREDICTABLE are defined in the SC instruction definition.

A load that executes on the processor executing the LLWPE/SCWPE sequence to the block of synchronizable physical memory containing the paired word will not cause the SCWPE to fail.

The effects of CACHE operations, both local and remote, on a paired word atomic operation are defined in the SC instruction definition.

Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LL/SC semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location. Requirements for Uniprocessor, MP and I/O atomicity are given in the SC definition.

The definition for SCWPE is extended for uncached memory types in a manner identical to SC. The extension is defined in the SC instruction description.

Restrictions:

Load-Linked and Store-Conditional instruction types require that the addressed location have a memory access type of cached noncoherent or cached coherent; that is, the processor must have a cache. If it does not, the result is UNPREDICTABLE. Release 6 (with Config5ULS=1) extends support to uncached types.

The architecture optionally allows support for Load-Linked and Store-Conditional instruction types in a cacheless processor. Support for cacheless operation is implementation dependent. In this case, LLAddr is optional.

Providing misaligned support is not a requirement for this instruction.

Availability and Compatibility

This instruction is introduced by Release 6. It is only present if Config5XNP=0 and Config5EVA=1.

Operation:

vAddr = GPR[base]
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
datadoubleword_31..0 = GPR[rt]_31..0
datadoubleword_63..32 = GPR[rd]_31..0
if (LLbit && (pAddr == LLAddr)) then
   // PAIREDWORD: two word data-type that is double-word atomic
   StoreMemory (CCA, PAIREDWORD, datadoubleword, pAddr, vAddr, DATA)
   GPR[rt] = 0^63 || 1'b1
else
   GPR[rt] = 0^64
endif
endif
LLbit = 0
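The Operation pseudocode above can be modeled with a small Python sketch. It is a toy model under stated assumptions: address translation is treated as identity (no TLB, segmentation, or exceptions are modeled), memory is a dictionary keyed by physical address, and the Core class and scwpe function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF


@dataclass
class Core:
    """Toy single-core state; address translation is assumed to be identity."""
    gpr: list = field(default_factory=lambda: [0] * 32)
    memory: dict = field(default_factory=dict)  # pAddr -> 64-bit paired word
    llbit: bool = False
    lladdr: int = 0


def scwpe(core: Core, rt: int, rd: int, base: int) -> None:
    """Model of the SCWPE Operation section (illustrative only)."""
    paddr = core.gpr[base]  # assumption: vAddr == pAddr, no TLB modeled
    data = ((core.gpr[rd] & MASK32) << 32) | (core.gpr[rt] & MASK32)
    if core.llbit and paddr == core.lladdr:
        core.memory[paddr] = data  # both words are stored as one unit
        core.gpr[rt] = 1           # success
    else:
        core.gpr[rt] = 0           # failure, memory untouched
    core.llbit = False             # LLbit is cleared on either path
```

A second scwpe call without an intervening load-linked always fails in this model, because the first call clears LLbit.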

Exceptions:

TLB Refill, TLB Invalid, TLB Modified, Reserved Instruction, Address Error, Watch, Coprocessor Unusable.

Programming Notes:

LLWPE and SCWPE are used to atomically update memory locations, as shown below.

L1:
   LLWPE T2, T3, (T0)   # load T2 and T3
   BOVC  T2, 1, U32     # check whether least-significant word may overflow
   ADDI  T2, T2, 1      # increment lower - only
   SCWPE T2, T3, (T0)   # store T2 and T3
   BEQC  T2, 0, L1      # if not atomic (0), try again
   BC    Done           # success: skip the overflow path
U32:
   ADDI  T2, T2, 1      # increment lower
   ADDI  T3, T3, 1      # increment upper
   SCWPE T2, T3, (T0)   # store T2 and T3
   BEQC  T2, 0, L1      # if not atomic (0), try again
Done:

Exceptions between the LLWPE and SCWPE cause the SCWPE to fail, so persistent exceptions must be avoided. Some examples of these are arithmetic operations that trap, system calls, and floating point operations that trap or require software emulation assistance.

LLWPE and SCWPE function on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.
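The retry pattern used in the programming example can also be sketched in Python. This is a simplified illustration rather than a line-by-line translation of the assembly: it treats the lower word as unsigned and carries into the upper word on wrap-around. The PairedCell class, its load_linked and store_conditional methods, and the failures parameter (which forces a chosen number of store attempts to fail) are all hypothetical stand-ins for LLWPE, SCWPE, and intervening events.

```python
MASK32 = 0xFFFFFFFF


class PairedCell:
    """Hypothetical stand-in for one synchronizable paired-word location."""

    def __init__(self, lower: int, upper: int, failures: int = 0):
        self.lower, self.upper = lower, upper
        self.linked = False
        self.failures = failures  # number of store attempts forced to fail

    def load_linked(self):
        self.linked = True
        return self.lower, self.upper

    def store_conditional(self, lower: int, upper: int) -> int:
        ok = self.linked and self.failures == 0
        if self.failures > 0:
            self.failures -= 1
        self.linked = False  # the link is consumed on success and on failure
        if ok:
            self.lower, self.upper = lower, upper
        return 1 if ok else 0


def atomic_increment(cell: PairedCell) -> int:
    """Increment the 64-bit value; return the number of attempts used."""
    attempts = 0
    while True:
        attempts += 1
        lower, upper = cell.load_linked()
        lower = (lower + 1) & MASK32
        if lower == 0:  # lower word wrapped, so carry into the upper word
            upper = (upper + 1) & MASK32
        if cell.store_conditional(lower, upper):
            return attempts
```

As in the assembly, a failed store sends control back to the load-linked step so that the update is recomputed from freshly loaded values.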