Encoding:

pre-Release 6

Field   SC       base  rt  offset
Value   111000
Width   6        5     5   16

Release 6

Field   SPECIAL3  base  rt  offset  0  SC
Value   011111                         100110
Width   6         5     5   9       1  6
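As an illustration of the two layouts, the following sketch packs the fields into a 32-bit instruction word. The function and constant names are hypothetical (they do not come from the architecture manual), and the code assumes fields are laid out most-significant first in the order shown in the encoding.

```python
SC_OPCODE_PRE_R6 = 0b111000  # major opcode, bits 31..26 (pre-Release 6)
SPECIAL3_OPCODE = 0b011111   # major opcode, bits 31..26 (Release 6)
SC_FUNCT_R6 = 0b100110       # function field, bits 5..0 (Release 6)


def _check_regs(base, rt):
    # Both register fields are 5 bits wide.
    if not (0 <= base < 32 and 0 <= rt < 32):
        raise ValueError("register numbers must be in 0..31")


def encode_sc_pre_r6(base, rt, offset):
    """Pack SC rt, offset(base) with the 16-bit signed offset field."""
    _check_regs(base, rt)
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("offset does not fit in 16 signed bits")
    return (SC_OPCODE_PRE_R6 << 26) | (base << 21) | (rt << 16) | (offset & 0xFFFF)


def encode_sc_r6(base, rt, offset):
    """Pack SC rt, offset(base) with the 9-bit signed offset field."""
    _check_regs(base, rt)
    if not -(1 << 8) <= offset < (1 << 8):
        raise ValueError("offset does not fit in 9 signed bits")
    # Offset occupies bits 15..7; bit 6 is the single zero bit.
    return (SPECIAL3_OPCODE << 26) | (base << 21) | (rt << 16) | ((offset & 0x1FF) << 7) | SC_FUNCT_R6
```

For example, with base = 8 and rt = 10, `hex(encode_sc_pre_r6(8, 10, 0))` gives `0xe10a0000` and `hex(encode_sc_r6(8, 10, -4))` gives `0x7d0afe26`.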

Format:

SC rt, offset(base)

MIPS32

Store Conditional Word

Purpose:

Store Conditional Word

To store a word to memory to complete an atomic read-modify-write

Description:

 if atomic_update then memory[GPR[base] + offset] = GPR[rt], GPR[rt] = 1 else GPR[rt] = 0

The LL and SC instructions provide primitives to implement atomic read-modify-write (RMW) operations on synchronizable memory locations. In Release 5, the behavior of SC is modified when Config5LLB=1.

Release 6 (with Config5ULS=1) formalizes support for uncached LL and SC sequences, whereas the pre-Release 6 LL and SC description applies to cached (coherent/non-coherent) memory types. (The description for uncached support does not modify the description for cached support and is written in a self-contained manner.)

The least-significant 32-bit word in GPR rt is conditionally stored in memory at the location specified by the aligned effective address. The signed offset is added to the contents of GPR base to form an effective address.

The SC completes the RMW sequence begun by the preceding LL instruction executed on the processor. To complete the RMW sequence atomically, the following occur:

The least-significant 32-bit word of GPR rt is stored to memory at the location specified by the aligned effective address.

A one, indicating success, is written into GPR rt.

If either of the following events occurs between the execution of LL and SC, the SC fails:

Furthermore, an SC must always compare its address against that of the LL. An SC will fail if the aligned address of the SC does not match that of the preceding LL.
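The address-match rule can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `LLSCMonitor`, its fields, and the dictionary standing in for memory are hypothetical names, a single processor is modeled, and no coherence traffic or block granularity is represented. The link is cleared on every SC, as the Operation pseudocode does.

```python
class LLSCMonitor:
    """Toy single-processor link monitor (hypothetical, for illustration)."""

    def __init__(self):
        self.llbit = False      # models LLbit
        self.link_addr = None   # aligned address recorded by the last LL

    def ll(self, memory, addr):
        # LL starts the RMW sequence and remembers the address.
        self.llbit = True
        self.link_addr = addr
        return memory.get(addr, 0)

    def sc(self, memory, addr, value):
        # SC succeeds only if the link is intact and the addresses match.
        ok = self.llbit and addr == self.link_addr
        if ok:
            memory[addr] = value & 0xFFFFFFFF  # store the low 32-bit word
        self.llbit = False  # the link is consumed either way
        return 1 if ok else 0


mem = {0x100: 5}
mon = LLSCMonitor()
v = mon.ll(mem, 0x100)
mismatch = mon.sc(mem, 0x104, v + 1)  # different address: fails
v = mon.ll(mem, 0x100)
match = mon.sc(mem, 0x100, v + 1)     # same address: succeeds
```

In this run `mismatch` is 0 and memory is unchanged, while `match` is 1 and the word at 0x100 becomes 6.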

A load executed by the processor running the LL/SC sequence, to the block of synchronizable physical memory containing the word, will not cause the SC to fail if Config5LLB=1; otherwise such a load may cause the SC to fail.

If any of the events listed below occurs between the execution of LL and SC, the SC may fail where it could have succeeded, i.e., success is not predictable. Portable programs should not cause any of these events.

CACHE operations that are local to the processor executing the LL/SC sequence will result in unpredictable behaviour of the SC if executed between the LL and SC, that is, they may cause the SC to fail where it could have succeeded. Non-local CACHE operations (address-type with coherent CCA) may cause an SC to fail on either the local processor or on the remote processor in multiprocessor or multi-threaded systems. This definition of the effects of CACHE operations is mandated if Config5LLB=1. If Config5LLB=0, then CACHE effects are implementation-dependent.

The following conditions must be true or the result of the SC is not predictable, that is, the SC may fail or succeed (if Config5LLB=1, then either success or failure is mandated, else the result is UNPREDICTABLE):

Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LL/SC semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location:

made with memory access type of either cached noncoherent or cached coherent. All accesses must be to one or the other access type, and they may not be mixed.

with a memory access type of cached coherent.

a memory access type of cached coherent. If the I/O system does not use coherent memory operations, then atomic RMW cannot be provided with respect to the I/O reads and writes.

Release 6 (with Config5ULS=1) formally defines support for uncached LL and SC with the following constraints.

The form the monitor takes is implementation dependent. It is, however, distinct from the mechanism used by cached LL and SC, which rely on a coherence protocol to determine whether the sequence succeeds.

As emphasized above, it is not recommended that software mix memory access types during LL and SC sequences. That is, all memory accesses must be of the same type; otherwise the behavior may be UNPREDICTABLE.

Conditions that cause UNPREDICTABLE behavior for legacy cached LL and SC sequences may also cause such behavior for uncached sequences.

A PAUSE instruction is no-op'd when it is preceded by an uncached LL.

The semantics of an uncached LL/SC atomic operation applies to any uncached CCA including UCA (UnCached Accelerated). An implementation that supports UCA must guarantee that SC does not participate in store gathering and that it ends any gathering initiated by stores preceding the SC in program order when the SC address coincides with a gathering address.

Restrictions:

The addressed location must have a memory access type of cached noncoherent or cached coherent; if it does not, the result is UNPREDICTABLE. Release 6 (with Config5ULS=1) extends support to uncached types.

The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
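The alignment restriction can be sketched as follows. The helper name, the `AddressError` exception class, and the 32-bit address width are assumptions made for illustration; the offset argument is taken to be the raw 16-bit field of the pre-Release 6 encoding.

```python
class AddressError(Exception):
    """Stands in for the Address Error exception (hypothetical class)."""


def sc_effective_address(gpr_base, offset16, addr_bits=32):
    # Sign-extend the raw 16-bit offset field.
    signed = offset16 - 0x10000 if offset16 & 0x8000 else offset16
    # Add it to the base register, wrapping at the assumed address width.
    vaddr = (gpr_base + signed) & ((1 << addr_bits) - 1)
    # A word access requires the two least-significant bits to be zero.
    if vaddr & 0b11:
        raise AddressError(hex(vaddr))
    return vaddr
```

With a base of 0x1000, an offset field of 0xFFFC (that is, -4) yields 0xFFC, whereas an offset field of 0x0002 raises `AddressError` because bit 1 of the resulting address is set.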

Providing misaligned support for Release 6 is not a requirement for this instruction.

Availability and Compatibility

This instruction has been recoded for Release 6.

Operation:

vAddr = sign_extend(offset) + GPR[base]
if vAddr[1..0] != 0^2 then
   SignalException(AddressError)
endif
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
pAddr = pAddr[PSIZE-1..3] || (pAddr[2..0] xor (ReverseEndian || 0^2))
bytesel = vAddr[2..0] xor (BigEndianCPU || 0^2)
datadoubleword = GPR[rt][63-8*bytesel..0] || 0^(8*bytesel)
if LLbit then
   StoreMemory (CCA, WORD, datadoubleword, pAddr, vAddr, DATA)
endif
GPR[rt] = 0^63 || LLbit
LLbit = 0 // if Config5LLB=1, SC always clears LLbit regardless of address match.
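The byte-lane step of the pseudocode can be modeled as below. This is a sketch that assumes 64-bit GPRs, as the 64-bit quantities in the pseudocode imply; the function name and the boolean `big_endian_cpu` parameter are hypothetical. Address translation and the store itself are not modeled.

```python
MASK64 = (1 << 64) - 1


def sc_store_lane(gpr_rt, vaddr, big_endian_cpu):
    """Return (bytesel, datadoubleword) as computed before StoreMemory."""
    # BigEndianCPU || 0^2 is a 3-bit value with BigEndianCPU in bit 2.
    bytesel = (vaddr & 0b111) ^ ((1 if big_endian_cpu else 0) << 2)
    # Keep the low (64 - 8*bytesel) bits of GPR[rt] and append 8*bytesel
    # zero bits, which is a left shift truncated to 64 bits.
    datadoubleword = (gpr_rt << (8 * bytesel)) & MASK64
    return bytesel, datadoubleword
```

Because the address is word-aligned, `bytesel` is either 0 or 4 in this model, so the word lands in the low or the high half of the doubleword. For instance, `sc_store_lane(0x11223344, 0x1004, False)` returns `(4, 0x1122334400000000)`.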

Exceptions:

TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch

Programming Notes:

LL and SC are used to atomically update memory locations, as shown below.

L1:
   LL    T1, (T0)  # load counter
   ADDI  T2, T1, 1 # increment
   SC    T2, (T0)  # try to store, checking for atomicity
   BEQ   T2, 0, L1 # if not atomic (0), try again
   NOP             # branch-delay slot
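The control flow of the loop above can be mirrored in a small simulation. Everything here is a stand-in: `atomic_increment`, `fake_ll`, `fake_sc`, and the `counter` dictionary are hypothetical, the forced first failure imitates an interfering store from another processor, and the `max_attempts` bound is an addition that the assembly loop does not have.

```python
def atomic_increment(ll, sc, max_attempts=1000):
    """Mirror the L1 loop: LL, add 1, SC, retry while SC reports 0."""
    for attempt in range(1, max_attempts + 1):
        value = ll()                          # LL: load counter
        if sc((value + 1) & 0xFFFFFFFF) == 1:  # SC: 1 means atomic
            return attempt
    raise RuntimeError("SC kept failing")


# Stand-in memory word whose first SC is forced to fail.
counter = {"value": 41, "fail_next": 1}


def fake_ll():
    return counter["value"]


def fake_sc(new_value):
    if counter["fail_next"] > 0:
        counter["fail_next"] -= 1
        counter["value"] += 100  # hypothetical interfering store
        return 0
    counter["value"] = new_value
    return 1


attempts = atomic_increment(fake_ll, fake_sc)
```

Here the first SC fails after the simulated interference changes the counter to 141, so the loop reloads and the second attempt stores 142; `attempts` ends up as 2.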

Exceptions between the LL and SC cause SC to fail, so persistent exceptions must be avoided. Some examples of these are arithmetic operations that trap, system calls, and floating point operations that trap or require software emulation assistance.

LL and SC function on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.

As shown in the instruction drawing above, Release 6 implements a 9-bit offset, whereas all release levels lower than Release 6 of the MIPS architecture implement a 16-bit offset.
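The difference in offset width shows up when the field is extracted from an instruction word. The sketch below uses hypothetical helper names and assumes the field positions implied by the encoding (bits 15..0 before Release 6, bits 15..7 in Release 6).

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def sc_offset_pre_r6(word):
    # 16-bit offset in bits 15..0: range -32768..32767.
    return sign_extend(word & 0xFFFF, 16)


def sc_offset_r6(word):
    # 9-bit offset in bits 15..7: range -256..255.
    return sign_extend((word >> 7) & 0x1FF, 9)
```

For the words 0xE10AFFFC (pre-Release 6) and 0x7D0AFE26 (Release 6), both helpers return -4. A displacement outside -256..255 cannot be expressed in the Release 6 field, so code targeting Release 6 would presumably need to form such an address in a register first.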