Encoding:

pre-Release 6

  Field:  SCD (111100) | base | rt | offset
  Bits:   6            | 5    | 5  | 16

Release 6

  Field:  SPECIAL3 (011111) | base | rt | offset | 0 | SCD (100111)
  Bits:   6                 | 5    | 5  | 9      | 1 | 6
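
To make the two field layouts concrete, the following is a minimal sketch in C that packs the fields into a 32-bit instruction word. The function names `scd_encode_pre_r6` and `scd_encode_r6` are hypothetical helpers invented for illustration; only the opcode values and field widths come from the encoding above, and the fields are assumed to be laid out left to right from bit 31 down to bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. Pre-Release 6 layout:
   SCD (111100) | base (5) | rt (5) | offset (16) */
static uint32_t scd_encode_pre_r6(unsigned base, unsigned rt, int16_t offset)
{
    assert(base < 32 && rt < 32);
    return (0x3Cu << 26)                 /* major opcode SCD = 111100 */
         | ((uint32_t)base << 21)
         | ((uint32_t)rt << 16)
         | (uint16_t)offset;             /* signed 16-bit offset, low bits */
}

/* Hypothetical helper. Release 6 layout:
   SPECIAL3 (011111) | base (5) | rt (5) | offset (9) | 0 | SCD (100111) */
static uint32_t scd_encode_r6(unsigned base, unsigned rt, int offset)
{
    assert(base < 32 && rt < 32);
    assert(offset >= -256 && offset <= 255);   /* signed 9-bit range */
    return (0x1Fu << 26)                 /* major opcode SPECIAL3 = 011111 */
         | ((uint32_t)base << 21)
         | ((uint32_t)rt << 16)
         | (((uint32_t)offset & 0x1FFu) << 7)  /* 9-bit offset above bit 6 */
         | 0x27u;                        /* function field SCD = 100111 */
}
```

For example, with base = 4, rt = 5, and offset = 8, the sketch yields 0xF0850008 for the pre-Release 6 form and 0x7C850427 for the Release 6 form. Note that the Release 6 offset field is only 9 bits wide, so the reachable displacement is much smaller than in the pre-Release 6 encoding.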

Format:

SCD rt, offset(base)    MIPS64

Purpose:

Store Conditional Doubleword

To store a doubleword to memory to complete an atomic read-modify-write.

Description:

if atomic_update then memory[GPR[base] + offset] = GPR[rt], GPR[rt] = 1 else GPR[rt] = 0

The LLD and SCD instructions provide primitives to implement atomic read-modify-write (RMW) operations for synchronizable memory locations.

Release 6 (with Config5.ULS = 1) formalizes support for uncached LLD and SCD sequences, whereas the pre-Release 6 LLD and SCD description applies to cached (coherent/non-coherent) memory types. (The description of uncached support is self-contained and does not modify the description of cached support.)

The 64-bit doubleword in GPR rt is conditionally stored in memory at the location specified by the aligned effective address. The signed offset is added to the contents of GPR base to form an effective address.

The SCD completes the RMW sequence begun by the preceding LLD instruction executed on the processor. If SCD completes the RMW sequence atomically, the following occurs:

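The 64-bit doubleword of GPR rt is stored into memory at the location specified by the aligned effective address, and a 1, indicating success, is written into GPR rt.

Otherwise, memory is not modified and a 0, indicating failure, is written into GPR rt.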

If either of the following events occurs between the execution of LLD and SCD, the SCD fails:

If either of the following events occurs between the execution of LLD and SCD, the SCD may succeed or it may fail; the success or failure is not predictable. Portable programs should not cause the following events:

The following two conditions must be true or the result of the SCD is UNPREDICTABLE:

Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LL/SC semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location:

Uniprocessor atomicity: To provide atomic RMW on a single processor, all accesses to the location must be

MP atomicity: To provide atomic RMW among multiple processors, all accesses to the location must be made

I/O System: To provide atomic RMW with a coherent I/O system, all accesses to the location must be made with

Release 6 (with Config5.ULS = 1) formally defines support for uncached LLD and SCD with the following constraints.

The form the monitor takes is implementation dependent. It differs, however, from cached LLD and SCD, which rely on a coherence protocol to determine whether the sequence succeeds.

As emphasized above, software should not mix memory access types during LLD and SCD sequences. That is, all memory accesses must be of the same type; otherwise, the behavior may be UNPREDICTABLE.

Conditions that cause UNPREDICTABLE behavior for legacy cached LLD and SCD sequences may also cause such behavior for uncached sequences.

A PAUSE instruction is treated as a no-op when it is preceded by an uncached LLD.

The semantics of an uncached LLD/SCD atomic operation apply to any uncached CCA, including UCA (UnCached Accelerated). An implementation that supports UCA must guarantee that SCD does not participate in store gathering and that it ends any gathering initiated by stores preceding the SCD in program order when the SCD address coincides with a gathering address.

Restrictions:

The addressed location must have a memory access type of cached non coherent or cached coherent; if it does not, the result is UNPREDICTABLE. Release 6 (with Config5.ULS = 1) extends support to uncached types.

The effective address must be naturally aligned. If any of the 3 least-significant bits of the address is non-zero, an Address Error exception occurs.

Providing misaligned support for Release 6 is not a requirement for this instruction.

Availability and Compatibility:

This instruction has been recoded for Release 6.

Operation:

vAddr = sign_extend(offset) + GPR[base]
if vAddr[2..0] != 0^3 then
   SignalException(AddressError)
endif
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
datadoubleword = GPR[rt]
if LLbit then
   StoreMemory (CCA, DOUBLEWORD, datadoubleword, pAddr, vAddr, DATA)
endif
GPR[rt] = 0^63 || LLbit
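
The pseudocode above can be mirrored by a small software model. The following is a minimal sketch in C, not a description of any real implementation: `struct cpu`, `scd_execute`, and the status enum are hypothetical names, address translation is replaced by a flat 64-byte array, and the store uses host byte order. It also does not model register 0 or any change to LLbit, since the pseudocode shows neither.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum scd_status { SCD_DONE, SCD_ADDRESS_ERROR };

/* Hypothetical machine state. The flat array stands in for
   AddressTranslation and StoreMemory, which are not modeled. */
struct cpu {
    uint64_t gpr[32];
    bool     llbit;
    uint8_t  mem[64];
};

static enum scd_status scd_execute(struct cpu *c, unsigned base,
                                   unsigned rt, int16_t offset)
{
    /* vAddr = sign_extend(offset) + GPR[base] */
    uint64_t vaddr = c->gpr[base] + (uint64_t)(int64_t)offset;

    /* Any of the 3 least-significant bits set: Address Error. */
    if ((vaddr & 0x7u) != 0)
        return SCD_ADDRESS_ERROR;

    assert(vaddr + 8 <= sizeof c->mem);   /* limit of this toy memory */

    /* The store happens only if LLbit is set. */
    if (c->llbit)
        memcpy(&c->mem[vaddr], &c->gpr[rt], 8);

    /* GPR[rt] = 0^63 || LLbit: 1 on success, 0 on failure. */
    c->gpr[rt] = c->llbit ? 1 : 0;
    return SCD_DONE;
}
```

With LLbit set, the call stores the doubleword and leaves 1 in GPR rt; with LLbit clear, memory is untouched and GPR rt becomes 0. A misaligned effective address returns the address-error status before any store or register write takes place, matching the order of checks in the pseudocode.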

Exceptions:

TLB Refill, TLB Invalid, TLB Modified, Address Error, Reserved Instruction, Watch

Programming Notes:

LLD and SCD are used to atomically update memory locations, as shown below.

L1:
   LLD    T1, (T0)   # load counter
   DADDIU T2, T1, 1  # increment (64-bit add that does not trap on overflow)
   SCD    T2, (T0)   # try to store,
                     #  checking for atomicity
   BEQ    T2, 0, L1  # if not atomic (0), try again
   NOP               # branch-delay slot

Exceptions between the LLD and SCD cause SCD to fail, so persistent exceptions must be avoided. Examples are arithmetic operations that trap, system calls, and floating-point operations that trap or require software emulation assistance.

LLD and SCD function on a single processor for cached non coherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.
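
In portable C, the same retry pattern is usually written with the C11 atomics in `<stdatomic.h>` rather than with hand-written assembly. The following is a minimal sketch; `atomic_increment` is a hypothetical helper, and the assumption that a MIPS64 compiler lowers it to an LLD/SCD loop is not guaranteed by the C standard. Compare-and-swap is also not identical to LLD/SCD: it compares values, whereas SCD reacts to intervening events on the location.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically add 1 to a 64-bit counter and
   return the new value, retrying until the update succeeds. */
static uint64_t atomic_increment(_Atomic uint64_t *counter)
{
    uint64_t old = atomic_load_explicit(counter, memory_order_relaxed);

    /* The weak form may fail spuriously, much as SCD may fail, so it
       is used in a loop. On failure, 'old' is refreshed with the
       value currently in memory before the next attempt. */
    while (!atomic_compare_exchange_weak_explicit(
               counter, &old, old + 1,
               memory_order_seq_cst, memory_order_relaxed)) {
        /* retry with the refreshed value */
    }
    return old + 1;
}
```

The loop body is deliberately empty: as with the assembly sequence, keeping the work between the load and the conditional store minimal reduces the chance that the attempt fails and has to be repeated.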