Assembly:

SYNC stype

nanoMIPS

Sync

Purpose:

Sync.

Impose ordering constraints of type stype on prior and subsequent memory operations.

Availability:

nanoMIPS

Format:

Field:  100000  00000  stype  1100  x  0000  00110
Bits:   6       5      5      4     3  4     5
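As a rough illustration of the field layout above, the following C sketch assembles the 32-bit instruction word from its fields. The helper name `encode_sync` is hypothetical. The bit positions are derived by stacking the listed widths from bit 31 downward, and the ignored x field is assumed to be written as zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of any toolchain): builds the 32-bit SYNC
 * instruction word from the fields in the Format layout, placed from bit 31
 * downward using the listed widths (6, 5, 5, 4, 3, 4, 5). */
static uint32_t encode_sync(uint32_t stype)
{
    assert(stype <= 0x1Fu);      /* stype is a 5-bit field */

    return (0x20u << 26)         /* 100000, bits 31..26 */
         | (0x00u << 21)         /* 00000,  bits 25..21 */
         | (stype << 16)         /* stype,  bits 20..16 */
         | (0x0Cu << 12)         /* 1100,   bits 15..12 */
         | (0x00u << 9)          /* x,      bits 11..9 (assumed zero here) */
         | (0x00u << 5)          /* 0000,   bits 8..5 */
         | 0x06u;                /* 00110,  bits 4..0 */
}
```

Under these assumptions, `encode_sync(0x0)` yields 0x8000C006 and `encode_sync(0x10)` yields 0x8010C006.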

Operation:

sync_memory_access(stype)

The SYNC instruction is used to order loads and stores for shared memory, and also to order operations with respect to the global invalidate instructions GINVI and GINVT. The following types of ordering guarantees are available with different stypes.

Completion barrier: The specified memory instructions before the SYNC are completed and globally performed before any of the specified memory instructions after the SYNC are performed to any extent. Loads are completed when the destination register is written. Stores are completed when the stored value is visible to every other processor in the system.

Ordering barrier: The specified memory instructions before the SYNC are ordered before any of the specified memory instructions after the SYNC. The ordering SYNC is considered complete when the memory instructions before and after the SYNC are guaranteed thereafter to retain their order relative to the SYNC, i.e. when it is guaranteed that all specified memory instructions before the SYNC will be globally performed before any of the specified memory accesses after the SYNC are performed to any extent. It is helpful to think of a global ordering point in a coherence domain: a point such that, once an instruction reaches it, the instruction is guaranteed to retain its order relative to any memory instruction that reaches the point after it. The ordering SYNC thus cannot complete before all older specified memory instructions reach the global ordering point.

The following table shows the behavior of the SYNC instruction for each stype value. Operation types listed in the 'What reaches before' column are subject to a pre-SYNC ordering barrier: such operations, when older than the SYNC, must reach the global ordering point before the SYNC instruction completes. Operation types listed in the 'What reaches after' column are subject to a post-SYNC ordering barrier: such operations, when younger than the SYNC, must reach the global ordering point only after the SYNC instruction completes. Operation types listed in the 'What completes before' column are subject to a completion barrier, that is, they must be globally performed when the SYNC instruction completes.

stype      Name          What reaches   What reaches   What completes       Availability
                         before         after          before
0x0        SYNC          Loads, Stores  Loads, Stores  Loads, Stores        Required.
0x1-0x3    Impl./vendor specific.
0x4        SYNC_WMB      Stores         Stores         -                    Optional.
0x5-0xF    Impl./vendor specific.
0x10       SYNC_MB       Loads, Stores  Loads, Stores  -                    Optional.
0x11       SYNC_ACQUIRE  Loads          Loads, Stores  -                    Optional.
0x12       SYNC_RELEASE  Loads, Stores  Loads          -                    Optional.
0x13       SYNC_RMB      Loads          Loads          -                    Optional.
0x14       SYNC_GINV     Loads, Stores  Loads, Stores  GINVI, GINVT, SYNCI  Config5.GI=2,3.
0x15-0x1F  Reserved for Architecture.
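The stype ranges in the table can be summarized in a small C sketch. The enum `sync_kind` and the functions `classify_stype` and `stype_name` are hypothetical names introduced for illustration only. The grouping into "kinds" is one reading of the table, not terminology defined by the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative grouping of stype values, following the table rows. */
typedef enum {
    SYNC_KIND_COMPLETION,  /* 0x0: completion barrier for loads and stores */
    SYNC_KIND_ORDERING,    /* 0x4, 0x10-0x13: ordering barriers */
    SYNC_KIND_GINV,        /* 0x14: ordering, plus completion of GINVI/GINVT/SYNCI */
    SYNC_KIND_IMPL,        /* 0x1-0x3, 0x5-0xF: implementation/vendor specific */
    SYNC_KIND_RESERVED     /* 0x15-0x1F: reserved for the architecture */
} sync_kind;

static sync_kind classify_stype(uint32_t stype)
{
    assert(stype <= 0x1Fu);                 /* stype is a 5-bit field */
    if (stype == 0x00u) return SYNC_KIND_COMPLETION;
    if (stype == 0x04u) return SYNC_KIND_ORDERING;
    if (stype <= 0x0Fu) return SYNC_KIND_IMPL;      /* 0x1-0x3, 0x5-0xF */
    if (stype <= 0x13u) return SYNC_KIND_ORDERING;  /* 0x10-0x13 */
    if (stype == 0x14u) return SYNC_KIND_GINV;
    return SYNC_KIND_RESERVED;                      /* 0x15-0x1F */
}

/* Returns the name from the table, or "(unnamed)" for rows without one. */
static const char *stype_name(uint32_t stype)
{
    switch (stype) {
    case 0x00u: return "SYNC";
    case 0x04u: return "SYNC_WMB";
    case 0x10u: return "SYNC_MB";
    case 0x11u: return "SYNC_ACQUIRE";
    case 0x12u: return "SYNC_RELEASE";
    case 0x13u: return "SYNC_RMB";
    case 0x14u: return "SYNC_GINV";
    default:    return "(unnamed)";
    }
}
```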

SYNC barriers affect only uncached and cached coherent loads and stores and do not affect the order in which instruction fetches are performed. For the purposes of this description, the CACHE, PREF and SYNCI instructions are treated as loads and stores. In addition, the optional Global Invalidate instructions are synchronizable through SYNC (stype=0x14).

The effect of SYNC on the global order of loads and stores for memory access types other than uncached and cached coherent is UNPREDICTABLE.

A completion barrier may have an adverse impact on performance compared to an ordering barrier due to the constraint of completion. An implementation may optimize the ordering of memory instructions such that the ordering barrier completes before a completion barrier would under the same circumstances. The magnitude of the impact is implementation-dependent, but an implementation must ensure that an ordering barrier performs no worse than the equivalent completion barrier. Software thus needs to choose between completion and ordering barriers according to the conditions at hand.

An stype of 0 is used to define the SYNC instruction with completion barrier semantics. Non-zero values of stype may be defined by the architecture or specific implementations to perform synchronization behaviors that are less complete than that of stype=0. If an implementation does not use one of these non-zero values to define a different synchronization behavior, then that non-zero value of stype must map to a completion barrier. This allows software written for an implementation with a lighter-weight barrier to work on another implementation which only implements the stype=0 completion barrier.
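The fallback rule for non-zero stype values can be modeled in a few lines of C. This is only a sketch of the rule as stated: the function `effective_stype` and the `implemented_mask` parameter (bit n set when the implementation defines a distinct behavior for stype n) are hypothetical and do not correspond to any architectural register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the fallback rule: a non-zero stype for which the
 * implementation defines no distinct behavior acts as stype=0, the
 * completion barrier. Bit n of implemented_mask marks stype n as defined. */
static uint32_t effective_stype(uint32_t stype, uint32_t implemented_mask)
{
    assert(stype <= 0x1Fu);                      /* stype is a 5-bit field */
    if (stype != 0u && ((implemented_mask >> stype) & 1u) == 0u)
        return 0u;                               /* fall back to completion barrier */
    return stype;                                /* defined behavior is used as-is */
}
```

In this model, software that requests stype=0x11 on an implementation whose mask lacks bit 0x11 still gets a barrier at least as strong as the one it asked for, which is why such code remains correct there.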

The Acquire and Release barrier types are used to minimize the memory ordering that must be maintained and still have software synchronization work.

A completion barrier is required, potentially in conjunction with an EHB instruction, to guarantee that memory reference results are visible across operating mode changes. For example, a completion barrier is required on some implementations on entry to and exit from Debug Mode to guarantee that memory effects are handled correctly.

If Global Invalidate instructions are supported, then SYNC (stype=0x14) acts as a completion barrier with respect to any preceding GINVI or GINVT instructions. This SYNC instruction is globalized and only completes if all preceding GINVI or GINVT operations related to the same program have completed in the system. (Any references to GINVT also imply GINVGT, available in a virtualized MIPS system.)

A system that implements the Global Invalidates also requires that the completion of SYNC (stype=0x14) be constrained by legacy SYNCI operations. Thus SYNC (stype=0x14) can also be used to enforce synchronization of SYNCI instructions. In the typical use cases, a single GINVI is used by itself to invalidate caches and would be followed by a SYNC (stype=0x14). In the case of GINVT, multiple GINVT could be used to invalidate multiple TLB mappings, and the SYNC (stype=0x14) would be used to guarantee completion of any number of GINVTs preceding it.

Terms:

Synchronizable: A load or store instruction is synchronizable if the load or store occurs to a physical location in shared memory using a virtual address with a memory access type of either uncached or cached coherent.

Shared memory: Memory that can be accessed by more than one processor or by a coherent I/O system module.

Performed load: A load instruction is performed when the value returned by the load has been determined. The result of a load on processor A has been determined with respect to processor or coherent I/O module B when a subsequent store to the location by B cannot affect the value returned by the load. The store by B must use the same memory access type as the load.

Performed store: A store instruction is performed when the store is observable. A store on processor A is observable with respect to processor or coherent I/O module B when a subsequent load of the location by B returns the value written by the store. The load by B must use the same memory access type as the store.

Globally performed load: A load instruction is globally performed when it is performed with respect to all processors and coherent I/O modules capable of storing to the location.

Globally performed store: A store instruction is globally performed when it is globally observable. It is globally observable when it is observable by all processors and I/O modules capable of loading from the location.

Global ordering point: A point in the coherence domain such that, once a memory instruction reaches it, the instruction is guaranteed to retain its order relative to any memory instruction that reaches the point after it.

Coherent I/O module: A coherent I/O module is an Input/Output system component that performs coherent Direct Memory Access (DMA). It reads and writes memory independently as though it were a processor doing loads and stores to locations with a memory access type of cached coherent.

Programming Notes:

A processor executing load and store instructions observes the order in which loads and stores using the same memory access type occur in the instruction stream; this is known as program order.

A parallel program has multiple instruction streams that can execute simultaneously on different processors.

In multiprocessor (MP) systems, the order in which the effects of loads and stores are observed by other processors (the global order of the loads and stores) determines the actions necessary to reliably share data in parallel programs.

When all processors observe the effects of loads and stores in program order, the system is strongly ordered. On such systems, parallel programs can reliably share data without explicitly using a SYNC. Executing SYNC on such a system is not necessary and will not cause an error, but may reduce overall performance.

If a multiprocessor system is not strongly ordered, the effects of load and store instructions executed by one processor may be observed out of program order by other processors. On such systems, parallel programs must use SYNC to reliably share data at critical points in the program. SYNC separates the loads and stores executed on the processor into two groups, and the effect of all loads and stores in one group is seen by all processors before the effect of any load or store in the subsequent group. In effect, SYNC causes the system to be strongly ordered for the executing processor at the instant that the SYNC is executed.

The hardware ordering support provided in a MIPS-based multiprocessor system is implementation dependent. A parallel program that does not use SYNC generally does not operate correctly on a system that is not strongly ordered. However, a program that does use SYNC works on both types of systems. (System-specific documentation describes the actions needed to reliably share data in parallel programs for that system.)

The behavior of a load or store using one memory access type is UNPREDICTABLE if a load or store was previously made to the same physical location using a different memory access type. The presence of a SYNC between the references does not alter this behavior.

SYNC affects the order in which the effects of load and store instructions appear to all processors; it does not generally affect the physical memory-system ordering or synchronization issues that arise in system programming. The effect of SYNC on implementation-specific aspects of the cached memory system, such as writeback buffers, is not defined.

The code fragments below show how SYNC can be used to coordinate the use of shared data between separate writer and reader instruction streams in a multiprocessor environment. The FLAG location is used by the instruction streams to determine whether the shared data item DATA is valid. The SYNC executed by processor A forces the store of DATA to be performed globally before the store to FLAG is performed. The SYNC executed by processor B ensures that DATA is not read until after the FLAG value indicates that the shared data is valid.

# Processor A (writer)
# Conditions at entry:
# The value 0 has been stored in FLAG and that value is observable by B
SW     R1, DATA       # change shared DATA value
LI     R2, 1
SYNC                  # perform DATA store before performing FLAG store
SW     R2, FLAG       # say that the shared DATA value is valid

# Processor B (reader)
LI     R2, 1
1: LW     R1, FLAG    # get FLAG
BNEC   R2, R1, 1B     # if it says that DATA is not valid, poll again
NOP
SYNC                  # FLAG value checked before doing DATA read
LW     R1, DATA       # read (valid) shared DATA value
SYNC
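
For readers working in C rather than assembly, the same writer/reader handshake can be sketched with C11 atomics. This is an analogy and not part of the architecture definition: the names `writer_publish` and `reader_consume` are hypothetical, and which SYNC stype (if any) a compiler emits for each fence is toolchain-specific and is not stated in this description.

```c
#include <assert.h>
#include <stdatomic.h>

static int data;         /* DATA: the shared data item */
static atomic_int flag;  /* FLAG: 0 = DATA not valid, 1 = DATA valid */

/* Writer (processor A): store DATA, then mark it valid. */
static void writer_publish(int value)
{
    /* Entry condition from the fragment above: FLAG holds 0. */
    assert(atomic_load_explicit(&flag, memory_order_relaxed) == 0);

    data = value;                                           /* store DATA */
    atomic_thread_fence(memory_order_release);              /* role of the writer's SYNC */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);  /* store FLAG */
}

/* Reader (processor B): poll FLAG, then read DATA. */
static int reader_consume(void)
{
    while (atomic_load_explicit(&flag, memory_order_relaxed) != 1)
        ;                                       /* DATA not valid yet, poll again */
    atomic_thread_fence(memory_order_acquire);  /* role of the reader's SYNC */
    return data;                                /* read (valid) DATA */
}
```

In a real program the two functions would run on different threads or processors. Calling `writer_publish(42)` and then `reader_consume()` from a single thread only exercises the control flow, not the cross-processor ordering.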

Exceptions:

None.