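Encoding:

Fields are listed from the most significant bit to the least significant bit.

| Field  | Value   | Width (bits) |
|--------|---------|--------------|
| EXTEND | 11110   | 5            |
| stype  | stype   | 5            |
|        | 0       | 1            |
|        | 00000   | 5            |
| SHIFT  | 00110   | 5            |
|        | 000     | 3            |
|        | 000     | 3            |
| sel    | sel = 5 | 3            |
| SLL    | 00      | 2            |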
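The field layout above can be sketched as a small C helper that packs the fields into one 32-bit word. This is only an illustration: the function name `mips16e2_sync_encode` is hypothetical, and the sketch assumes that the EXTEND halfword occupies the upper 16 bits, that the fields are packed from the most significant bit downward in the order listed, and that the final SLL function field is two bits wide so that the fields total 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the manual): packs the fields listed in
 * the Encoding section into one 32-bit word, EXTEND halfword first.
 * Assumption: fields are laid out from bit 31 downward in the listed order. */
static uint32_t mips16e2_sync_encode(unsigned stype)
{
    uint32_t word = 0;

    word |= (uint32_t)0x1E << 27;             /* EXTEND = 11110, bits 31..27 */
    word |= (uint32_t)(stype & 0x1F) << 22;   /* stype, bits 26..22          */
    /* bit 21 = 0 and bits 20..16 = 00000 contribute nothing                */
    word |= (uint32_t)0x06 << 11;             /* SHIFT = 00110, bits 15..11  */
    /* bits 10..8 = 000 and bits 7..5 = 000 contribute nothing              */
    word |= (uint32_t)0x5 << 2;               /* sel = 5, bits 4..2          */
    /* bits 1..0: SLL function = 00                                         */

    return word;
}
```

Under these assumptions, `mips16e2_sync_encode(0)` yields the encoding of a plain SYNC, and other stype values change only bits 26..22.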

Format:

SYNC stype    MIPS16e2

Purpose:

Synchronize Shared Memory Extended

To order loads and stores for shared memory.

Description:

These types of ordering guarantees are available through the SYNC instruction:

Completion Barrier - Simple Description:

Completion Barrier - Detailed Description:

SYNC behavior when the stype field is zero:

Ordering Barrier - Simple Description:

Ordering Barrier - Detailed Description:

SYNC instruction in the instruction stream reaches the same stage in the load/store datapath.

As compared to the completion barrier, the ordering barrier is a lighter-weight operation as it does not require the specified instructions before the SYNC to be already completed. Instead it only requires that those specified instructions which are subsequent to the SYNC in the instruction stream are never re-ordered for processing ahead of the specified instructions which are before the SYNC in the instruction stream. This potentially reduces how many cycles the barrier instruction must stall before it completes.

The Acquire and Release barrier types are used to minimize the memory orderings that must be maintained and still have software synchronization work.

Implementations that do not use any of the non-zero values of stype to define different barriers, such as ordering barriers, must make those stype values act the same as stype zero.

For the purposes of this description, the CACHE, PREF and PREFX instructions are treated as loads and stores. That is, these instructions and the memory transactions sourced by these instructions obey the ordering and completion rules of the SYNC instruction.

The following table lists the completion barrier and ordering barrier behaviors that can be specified using the stype field.

| Code | Name | Older instructions which must reach the load/store ordering point before the SYNC instruction completes | Younger instructions which must reach the load/store ordering point only after the SYNC instruction completes | Older instructions which must be globally performed when the SYNC instruction completes | Compliance |
|------|------|---------------|---------------|---------------|----------|
| 0x0  | SYNC or SYNC 0          | Loads, Stores | Loads, Stores | Loads, Stores | Required |
| 0x4  | SYNC_WMB or SYNC 4      | Stores        | Stores        |               | Optional |
| 0x10 | SYNC_MB or SYNC 16      | Loads, Stores | Loads, Stores |               | Optional |
| 0x11 | SYNC_ACQUIRE or SYNC 17 | Loads         | Loads, Stores |               | Optional |
| 0x12 | SYNC_RELEASE or SYNC 18 | Loads, Stores | Stores        |               | Optional |
| 0x13 | SYNC_RMB or SYNC 19     | Loads         | Loads         |               | Optional |

0x1-0x3, 0x5-0xF: Implementation-Specific and Vendor Specific Sync Types.

0x14-0x1F: RESERVED. Reserved for MIPS Technologies for future extension of the architecture.
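The table of stype values, together with the rule that unimplemented non-zero stype values must act the same as stype zero, can be sketched as a lookup in C. All names here (`sync_barrier`, `sync_table`, `sync_effective`, the `SYNC_LOADS` and `SYNC_STORES` flags, and the `implemented` bit mask) are hypothetical and exist only for illustration. As a simplification, the sketch also maps vendor-specific and reserved codes to stype zero, although a real implementation may define its own behavior for the vendor-specific range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SYNC_LOADS = 1, SYNC_STORES = 2 };   /* illustrative bit flags */

struct sync_barrier {
    unsigned    stype;
    const char *name;
    unsigned    older;    /* must reach the ordering point before SYNC completes     */
    unsigned    younger;  /* must reach the ordering point only after SYNC completes */
    unsigned    global;   /* older accesses globally performed when SYNC completes   */
    bool        required;
};

/* Rows transcribed from the stype table; row 0 is the required barrier. */
static const struct sync_barrier sync_table[] = {
    { 0x00, "SYNC",         SYNC_LOADS | SYNC_STORES, SYNC_LOADS | SYNC_STORES,
                            SYNC_LOADS | SYNC_STORES, true  },
    { 0x04, "SYNC_WMB",     SYNC_STORES,              SYNC_STORES,              0, false },
    { 0x10, "SYNC_MB",      SYNC_LOADS | SYNC_STORES, SYNC_LOADS | SYNC_STORES, 0, false },
    { 0x11, "SYNC_ACQUIRE", SYNC_LOADS,               SYNC_LOADS | SYNC_STORES, 0, false },
    { 0x12, "SYNC_RELEASE", SYNC_LOADS | SYNC_STORES, SYNC_STORES,              0, false },
    { 0x13, "SYNC_RMB",     SYNC_LOADS,               SYNC_LOADS,               0, false },
};

/* Returns the behavior a hypothetical implementation exhibits for `stype`.
 * Bit n of `implemented` is set when the optional barrier with code n is
 * implemented. Anything else falls back to row 0 (stype zero). */
static const struct sync_barrier *sync_effective(unsigned stype, uint32_t implemented)
{
    size_t n = sizeof sync_table / sizeof sync_table[0];

    for (size_t i = 1; i < n; i++) {
        if (sync_table[i].stype == stype &&
            (implemented & (UINT32_C(1) << stype)) != 0)
            return &sync_table[i];
    }
    return &sync_table[0];
}
```

For example, with only SYNC_ACQUIRE implemented, `sync_effective(0x11, UINT32_C(1) << 0x11)` returns the SYNC_ACQUIRE row, while `sync_effective(0x04, UINT32_C(1) << 0x11)` returns the stype zero row.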

Terms:

Synchronizable: A load or store instruction is synchronizable if the load or store occurs to a physical location in shared memory using a virtual location with a memory access type of either uncached or cached coherent. Shared memory is memory that can be accessed by more than one processor or by a coherent I/O system module.

Performed load: A load instruction is performed when the value returned by the load has been determined. The result of a load on processor A has been determined with respect to processor or coherent I/O module B when a subsequent store to the location by B cannot affect the value returned by the load. The store by B must use the same memory access type as the load.

Performed store: A store instruction is performed when the store is observable. A store on processor A is observable with respect to processor or coherent I/O module B when a subsequent load of the location by B returns the value written by the store. The load by B must use the same memory access type as the store.

Globally performed load: A load instruction is globally performed when it is performed with respect to all processors and coherent I/O modules capable of storing to the location.

Globally performed store: A store instruction is globally performed when it is globally observable. It is globally observable when it is observable by all processors and I/O modules capable of loading from the location.

Coherent I/O module: A coherent I/O module is an Input/Output system component that performs coherent Direct Memory Access (DMA). It reads and writes memory independently as though it were a processor doing loads and stores to locations with a memory access type of cached coherent.

Load/Store Datapath: The portion of the processor which handles the load/store data requests coming from the processor pipeline and processes those requests within the cache and memory system hierarchy.

Restrictions:

Unpredictable prior to MIPS16e2. The effect of SYNC on the global order of loads and stores for memory access types other than uncached and cached coherent is UNPREDICTABLE.

Operation:

SyncOperation(stype)

Exceptions:

None

Programming Notes:

A processor executing load and store instructions observes the order in which loads and stores using the same memory access type occur in the instruction stream; this is known as program order.

A parallel program has multiple instruction streams that can execute simultaneously on different processors. In multiprocessor (MP) systems, the order in which the effects of loads and stores are observed by other processors (the global order of the loads and stores) determines the actions necessary to reliably share data in parallel programs.

When all processors observe the effects of loads and stores in program order, the system is strongly ordered. On such systems, parallel programs can reliably share data without explicit actions in the programs. For such a system, SYNC has the same effect as a NOP. Executing SYNC on such a system is not necessary, but neither is it an error.

If a multiprocessor system is not strongly ordered, the effects of load and store instructions executed by one processor may be observed out of program order by other processors. On such systems, parallel programs must take explicit actions to reliably share data. At critical points in the program, the effects of loads and stores from an instruction stream must occur in the same order for all processors. SYNC separates the loads and stores executed on the processor into two groups, and the effect of all loads and stores in one group is seen by all processors before the effect of any load or store in the subsequent group. In effect, SYNC causes the system to be strongly ordered for the executing processor at the instant that the SYNC is executed.

Many MIPS-based multiprocessor systems are strongly ordered or have a mode in which they operate as strongly ordered for at least one memory access type. The MIPS architecture also permits implementation of MP systems that are not strongly ordered; SYNC enables the reliable use of shared memory on such systems. A parallel program that does not use SYNC generally does not operate correctly on a system that is not strongly ordered. However, a program that does use SYNC works on both types of systems. (System-specific documentation describes the actions needed to reliably share data in parallel programs for that system.)

The behavior of a load or store using one memory access type is UNPREDICTABLE if a load or store was previously made to the same physical location using a different memory access type. The presence of a SYNC between the references does not alter this behavior.

SYNC affects the order in which the effects of load and store instructions appear to all processors; it does not generally affect the physical memory-system ordering or synchronization issues that arise in system programming. The effect of SYNC on implementation-specific aspects of the cached memory system, such as writeback buffers, is not defined.

# Processor A (writer)
# Conditions at entry:
# The value 0 has been stored in FLAG and that value is observable by B
      SW    R1, DATA        # change shared DATA value
      LI    R2, 1
      SYNC                  # Perform DATA store before performing FLAG store
      SW    R2, FLAG        # say that the shared DATA value is valid

# Processor B (reader)
      LI    R2, 1
   1: LW    R1, FLAG        # Get FLAG
      BNE   R2, R1, 1B      # if it says that DATA is not valid, poll again
      NOP
      SYNC                  # FLAG value checked before doing DATA read
      LW    R1, DATA        # Read (valid) shared DATA value

The code fragments above show how SYNC can be used to coordinate the use of shared data between separate writer and reader instruction streams in a multiprocessor environment. The FLAG location is used by the instruction streams to determine whether the shared data item DATA is valid. The SYNC executed by processor A forces the store of DATA to be performed globally before the store to FLAG is performed. The SYNC executed by processor B ensures that DATA is not read until after the FLAG value indicates that the shared data is valid.
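For readers working in C rather than assembly, the same writer and reader pattern can be approximated with C11 atomics. This is a sketch under stated assumptions, not the manual's method: the names `shared_data`, `shared_flag`, `writer_publish`, and `reader_consume` are hypothetical, and the sequentially consistent fences stand in for the two SYNC instructions. How a particular compiler lowers these fences to MIPS instructions is implementation-dependent and is not specified by this document.

```c
#include <stdatomic.h>

static int        shared_data;   /* plays the role of DATA                */
static atomic_int shared_flag;   /* plays the role of FLAG, initially 0   */

/* Processor A (writer): publish a value, then mark it valid. */
static void writer_publish(int value)
{
    shared_data = value;                                   /* store DATA      */
    atomic_thread_fence(memory_order_seq_cst);             /* stands in for SYNC */
    atomic_store_explicit(&shared_flag, 1,
                          memory_order_relaxed);           /* store FLAG      */
}

/* Processor B (reader): poll FLAG, then read the published value. */
static int reader_consume(void)
{
    while (atomic_load_explicit(&shared_flag,
                                memory_order_relaxed) != 1)
        ;                                                  /* poll FLAG again */
    atomic_thread_fence(memory_order_seq_cst);             /* stands in for SYNC */
    return shared_data;                                    /* read valid DATA */
}
```

In the C11 memory model, the fence in the writer orders the DATA store before the FLAG store, and the fence in the reader orders the FLAG load before the DATA load, so a reader that sees FLAG equal to 1 also sees the published DATA value. In a real program, `writer_publish` and `reader_consume` would run on different threads or processors.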

Software written to use a SYNC instruction with a non-zero stype value, expecting one type of barrier behavior, should only be run on hardware that actually implements the expected barrier behavior for that non-zero stype value or on hardware which implements a superset of the behavior expected by the software for that stype value. If the hardware does not perform the barrier behavior expected by the software, the system may fail.