Encoding:

POOL32C

011000

base

ST-EVA

1010

SCE

110

offset

Format:

SCE rt, offset(base)

microMIPS

Store Conditional Word EVA

Purpose:

Store Conditional Word EVA

To store a word to user mode virtual memory while operating in kernel mode to complete an atomic read-modifywrite.

Description:

 if atomic_update then memory[GPR[base] + offset] = GPR[rt], GPR[rt] = 1 else 
GPR[rt] = 0

The LL and SC instructions provide primitives to implement atomic read-modify-write (RMW) operations for synchronizable memory locations.

Release 6 (with Config5_ULS =1) formalizes support for uncached LLE and SCE sequences. (The description for uncached support does not modify the description for cached support and is written in a self-contained manner.)

The least-significant 32-bit word in GPR rt is conditionally stored in memory at the location specified by the aligned effective address. The 9-bit signed offset is added to the contents of GPR base to form an effective address.

The SCE completes the RMW sequence begun by the preceding LLE instruction executed on the processor. To complete the RMW sequence atomically, the following occurs:

The least-significant 32-bit word of GPR rt is stored to memory at the location specified by the aligned effective address.

A 1, indicating success, is written into GPR rt.

Otherwise, memory is not modified and a 0, indicating failure, is written into GPR rt.

If either of the following events occurs between the execution of LL and SC, the SC fails:

A coherent store is completed by another processor or coherent I/O module into the block of synchronizable physical memory containing the word. The size and alignment of the block is implementation dependent, but it is at least one word and at most the minimum page size.

An ERET instruction is executed.

If either of the following events occurs between the execution of LLE and SCE, the SCE may succeed or it may fail; the success or failure is not predictable. Portable programs should not cause one of these events.

A memory access instruction (load, store, or prefetch) is executed on the processor executing the LLE/SCE.
The instructions executed starting with the LLE and ending with the SCE do not lie in a 2048-byte contiguous region of virtual memory. (The region does not have to be aligned, other than the alignment required for instruction words.)

The following conditions must be true or the result of the SCE is UNPREDICTABLE:

Execution of SCE must have been preceded by execution of an LLE instruction.
An RMW sequence executed without intervening events that would cause the SCE to fail must use the same address in the LLE and SCE. The address is the same if the virtual address, physical address, and cacheability & coherency attribute are identical.

Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LLE/SCE semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location:

Uniprocessor atomicity: To provide atomic RMW on a single processor, all accesses to the location must be

made with memory access type of either cached non coherent or cached coherent. All accesses must be to one or the other access type, and they may not be mixed.

MP atomicity: To provide atomic RMW among multiple processors, all accesses to the location must be made

with a memory access type of cached coherent.

I/O System: To provide atomic RMW with a coherent I/O system, all accesses to the location must be made with

a memory access type of cached coherent. If the I/O system does not use coherent memory operations, then atomic RMW cannot be provided with respect to the I/O reads and writes.

The SCE instruction functions the same as the SC instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible.

Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is specified by the Config5_EVA field being set to 1.

The definition for SCE is extended for uncached memory types in a manner identical to SC. The extension is defined in the SC instruction description.

Restrictions:

The addressed location must have a memory access type of cached non coherent or cached coherent; if it does not, the result is UNPREDICTABLE. Release 6 (with Config5ULS =1) extends support to uncached types.

The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an

Address Error exception occurs.

Providing misaligned support for Release 6 is not a requirement for this instruction.

Operation:

vAddr = sign_extend(offset) + GPR[base]
if vAddr_1..0 != 0² then
   SignalException(AddressError)
endif
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
pAddr = pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0²))
bytesel = vAddr_2..0 xor (BigEndianCPU || 0²)
datadoubleword = GPR[rt]_{63-8*bytesel..0} || 0^8*bytesel
if LLbit then
   StoreMemory (CCA, WORD, datadoubleword, pAddr, vAddr, DATA)
endif
GPR[rt] = 0⁶³|| LLbit

Exceptions:

TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch, Reserved Instruction, Coprocessor Unusable

Programming Notes:

LLE and SCE are used to atomically update memory locations, as shown below.

L1:
   LLE   T1, (T0)  # load counter
   ADDI  T2, T1, 1 # increment
   SCE   T2, (T0)  # try to store, checking for atomicity
   BEQC  T2, 0, L1 # if not atomic (0), try again

Exceptions between the LLE and SCE cause SCE to fail, so persistent exceptions must be avoided. Examples are arithmetic operations that trap, system calls, and floating point operations that trap or require software emulation assistance.

LLE and SCE function on a single processor for cached non coherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.