SYNC stype - Sync

Assembly:

SYNC stype

nanoMIPS

Sync

Purpose:

Sync.

Impose ordering constraints of type stype on prior and subsequent memory operations.

Availability:

nanoMIPS

Format:

100000	00000	stype	1100	x	0000	00110
6	5	5	4	3	4	5
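
As an illustration of the layout above, the following C sketch packs an stype into the 32-bit instruction word and recognizes a SYNC word again. The helper names sync_encode and sync_decode and the macro names are hypothetical, the x field is assumed to be a don't-care that is written as zero, and fetching the word from instruction memory is not modeled.

```c
#include <stdint.h>

/* Field positions follow the Format table (leftmost field is bits 31:26). */
#define SYNC_FIXED_BITS  0x8000C006u /* 100000 00000 stype=0 1100 x=000 0000 00110 */
#define SYNC_STYPE_SHIFT 16          /* stype occupies bits 20:16 */
#define SYNC_STYPE_MASK  0x1Fu
#define SYNC_MATCH_MASK  0xFFE0F1FFu /* every bit except stype (20:16) and x (11:9) */

/* Build the instruction word for SYNC stype (hypothetical helper). */
static uint32_t sync_encode(unsigned stype)
{
    return SYNC_FIXED_BITS | ((stype & SYNC_STYPE_MASK) << SYNC_STYPE_SHIFT);
}

/* Return the stype field if word is a SYNC, or -1 otherwise (hypothetical helper). */
static int sync_decode(uint32_t word)
{
    if ((word & SYNC_MATCH_MASK) != SYNC_FIXED_BITS)
        return -1;
    return (int)((word >> SYNC_STYPE_SHIFT) & SYNC_STYPE_MASK);
}
```

Under these assumptions, SYNC with stype=0 encodes as 0x8000C006 and SYNC with stype=0x14 as 0x8014C006.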

Operation:

sync_memory_access(stype)

The SYNC instruction is used to order loads and stores for shared memory, and also to order operations with respect to the global invalidate instructions GINVI and GINVT. The following types of ordering guarantees are available with different stypes.

Completion Barriers: A completion barrier provides a guarantee that any of the specified memory instructions before the SYNC are completed and globally performed before any of the specified memory instructions after the SYNC are performed to any extent. Loads are completed when the destination register is written. Stores are completed when the stored value is visible to every other processor in the system.

Ordering Barriers: An ordering barrier provides a guarantee in the system that any specified memory instructions before the SYNC are ordered before any of the specified memory instructions after the SYNC. The ordering SYNC is considered complete when the memory instructions before and after the SYNC are guaranteed thereafter to retain their order relative to the SYNC, i.e. when it is guaranteed that all specified memory instructions before the SYNC will be globally performed before any of the specified memory accesses after the SYNC are performed to any extent. It is helpful to think of a global ordering point in a coherence domain: a point such that, once an instruction reaches it, the instruction is guaranteed to retain its order relative to any memory instruction that reaches the point after it. The ordering SYNC thus cannot complete before all older specified memory instructions reach the global ordering point.

The following table shows the behavior of the SYNC instruction for each stype value. Operation types listed in the 'What reaches before' column are subject to a pre-SYNC ordering barrier: such operations, when older than the SYNC, must reach the global ordering point before the SYNC instruction completes. Operation types listed in the 'What reaches after' column are subject to a post-SYNC ordering barrier: such operations, when younger than the SYNC, must reach the global ordering point only after the SYNC instruction completes. Operation types listed in the 'What completes before' column are subject to a completion barrier, that is, they must be globally performed when the SYNC instruction completes.

stype	Name	What reaches before	What reaches after	What completes before	Availability
0x0	SYNC	Loads, Stores	Loads, Stores	Loads, Stores	Required.
0x1-0x3	-	-	-	-	Impl./vendor specific.
0x4	SYNC_WMB	Stores	Stores	-	Optional.
0x5-0xF	-	-	-	-	Impl./vendor specific.
0x10	SYNC_MB	Loads, Stores	Loads, Stores	-	Optional.
0x11	SYNC_ACQUIRE	Loads	Loads, Stores	-	Optional.
0x12	SYNC_RELEASE	Loads, Stores	Stores	-	Optional.
0x13	SYNC_RMB	Loads	Loads	-	Optional.
0x14	SYNC_GINV	Loads, Stores	Loads, Stores	GINVI, GINVT, SYNCI	Config5.GI=2,3.
0x15-0x1F	-	-	-	-	Reserved for Architecture.
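
The stype ranges in the table can be captured in a small C sketch that names the defined values and classifies any 5-bit stype. The identifiers are hypothetical: the enumerators follow the Name column, except that stype 0x0 (listed simply as SYNC) is called SYNC_FULL here, and sync_classify is an illustrative helper rather than part of any toolchain.

```c
/* Named stype values; identifiers follow the Name column of the table. */
enum sync_stype {
    SYNC_FULL    = 0x00, /* listed as SYNC in the table; renamed here (assumption) */
    SYNC_WMB     = 0x04,
    SYNC_MB      = 0x10,
    SYNC_ACQUIRE = 0x11,
    SYNC_RELEASE = 0x12,
    SYNC_RMB     = 0x13,
    SYNC_GINV    = 0x14
};

enum sync_stype_class {
    STYPE_DEFINED,       /* has a named row in the table */
    STYPE_IMPL_SPECIFIC, /* 0x1-0x3 and 0x5-0xF */
    STYPE_RESERVED       /* 0x15-0x1F */
};

/* Classify a stype value according to the table (hypothetical helper). */
static enum sync_stype_class sync_classify(unsigned stype)
{
    stype &= 0x1Fu; /* stype is a 5-bit field */
    switch (stype) {
    case SYNC_FULL: case SYNC_WMB: case SYNC_MB: case SYNC_ACQUIRE:
    case SYNC_RELEASE: case SYNC_RMB: case SYNC_GINV:
        return STYPE_DEFINED;
    default:
        /* Remaining values at or below 0xF are implementation specific;
           remaining values above 0x14 are reserved for the architecture. */
        return stype <= 0x0Fu ? STYPE_IMPL_SPECIFIC : STYPE_RESERVED;
    }
}
```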

SYNC barriers affect only uncached and cached coherent loads and stores and do not affect the order in which instruction fetches are performed. For the purposes of this description, the CACHE, PREF and SYNCI instructions are treated as loads and stores. In addition, the optional Global Invalidate instructions are synchronizable through SYNC (stype=0x14).

The effect of SYNC on the global order of loads and stores for memory access types other than uncached and cached coherent is UNPREDICTABLE.

A completion barrier may have an adverse impact on performance compared to an ordering barrier due to the constraint of completion. An implementation may optimize the ordering of memory instructions such that the ordering barrier completes before a completion barrier would under the same circumstances. The magnitude of the impact is implementation-dependent, but an implementation must ensure that an ordering barrier performs no worse than the equivalent completion barrier. Software thus needs to use completion and ordering barriers for the appropriate conditions.

An stype of 0 is used to define the SYNC instruction with completion barrier semantics. Non-zero values of stype may be defined by the architecture or specific implementations to perform synchronization behaviors that are less complete than that of stype=0. If an implementation does not use one of these non-zero values to define a different synchronization behavior, then that non-zero value of stype must map to a completion barrier. This allows software written for an implementation with a lighter-weight barrier to work on another implementation which only implements the stype=0 completion barrier.
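
The fallback rule can be sketched in C as follows. The bitmask representation of which stype values an implementation defines, and the helper name sync_effective_stype, are assumptions made for illustration only.

```c
#include <stdint.h>

/*
 * implemented: bit n is set when the implementation gives stype n its own,
 * lighter-weight behavior (hypothetical representation).
 * Returns the stype whose behavior the hardware actually applies.
 */
static unsigned sync_effective_stype(uint32_t implemented, unsigned stype)
{
    stype &= 0x1Fu; /* stype is a 5-bit field */
    if (stype != 0 && (implemented & (UINT32_C(1) << stype)))
        return stype;
    return 0; /* any other value behaves as the stype=0 completion barrier */
}
```

For example, on an implementation that defines only stype=0x10, a SYNC with stype=0x11 would behave as the stype=0 completion barrier, which is stronger than what the software asked for and therefore still correct.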

The Acquire and Release barrier types minimize the memory ordering that must be maintained while still allowing software synchronization to work.
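
As a portable analogy, the C11 sketch below places an acquire fence after taking a lock and a release fence before dropping it. This is a sketch under stated assumptions, not a statement about any particular toolchain: how a compiler lowers these fences to a given stype is not specified here, and the demo_lock type and its functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool locked; } demo_lock; /* hypothetical type */

static void demo_lock_init(demo_lock *l)
{
    atomic_init(&l->locked, false);
}

static void demo_lock_acquire(demo_lock *l)
{
    /* Spin until the previous value was false, i.e. this caller took the lock. */
    while (atomic_exchange_explicit(&l->locked, true, memory_order_relaxed))
        ;
    /* Acquire barrier: accesses in the critical section stay after the lock read. */
    atomic_thread_fence(memory_order_acquire);
}

static void demo_lock_release(demo_lock *l)
{
    /* Release barrier: accesses in the critical section stay before the unlock store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&l->locked, false, memory_order_relaxed);
}
```

Neither fence requires the accesses it orders to be globally performed before execution continues, which is the saving relative to a completion barrier.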

A completion barrier is required, potentially in conjunction with an EHB instruction, to guarantee that memory reference results are visible across operating mode changes. For example, a completion barrier is required on some implementations on entry to and exit from Debug Mode to guarantee that memory effects are handled correctly.

If Global Invalidate instructions are supported, then SYNC (stype=0x14) acts as a completion barrier with respect to any preceding GINVI or GINVT instructions. This SYNC instruction is globalized and only completes if all preceding GINVI or GINVT operations related to the same program have completed in the system. (Any references to GINVT also imply GINVGT, available in a virtualized MIPS system.)

A system that implements the Global Invalidates also requires that the completion of SYNC (stype=0x14) be constrained by legacy SYNCI operations. Thus SYNC (stype=0x14) can also be used to enforce synchronization of SYNCI instructions. In the typical use cases, a single GINVI is used by itself to invalidate caches and would be followed by a SYNC (stype=0x14). In the case of GINVT, multiple GINVTs could be used to invalidate multiple TLB mappings, and the SYNC (stype=0x14) would be used to guarantee completion of any number of GINVTs preceding it.

Terms

Synchronizable: A load or store instruction is synchronizable if the load or store occurs to a physical location in shared memory using a virtual address with a memory access type of either uncached or cached coherent.

Shared memory: Memory that can be accessed by more than one processor or by a coherent I/O system module.

Performed load: A load instruction is performed when the value returned by the load has been determined. The result of a load on processor A has been determined with respect to processor or coherent I/O module B when a subsequent store to the location by B cannot affect the value returned by the load. The store by B must use the same memory access type as the load.

Performed store: A store instruction is performed when the store is observable. A store on processor A is observable with respect to processor or coherent I/O module B when a subsequent load of the location by B returns the value written by the store. The load by B must use the same memory access type as the store.

Globally performed load: A load instruction is globally performed when it is performed with respect to all processors and coherent I/O modules capable of storing to the location.

Globally performed store: A store instruction is globally performed when it is globally observable. It is globally observable when it is observable by all processors and I/O modules capable of loading from the location.

Global ordering point: A point in the coherence domain such that, once a memory instruction reaches it, the instruction is guaranteed to retain its order relative to any memory instruction that reaches the point after it.

Coherent I/O module: A coherent I/O module is an Input/Output system component that performs coherent Direct Memory Access (DMA). It reads and writes memory independently as though it were a processor doing loads and stores to locations with a memory access type of cached coherent.

Programming Notes

A processor executing load and store instructions observes the order in which loads and stores using the same memory access type occur in the instruction stream; this is known as program order.

A parallel program has multiple instruction streams that can execute simultaneously on different processors.

In multiprocessor (MP) systems, the order in which the effects of loads and stores are observed by other processors (the global order of the loads and stores) determines the actions necessary to reliably share data in parallel programs.

When all processors observe the effects of loads and stores in program order, the system is strongly ordered. On such systems, parallel programs can reliably share data without explicitly using a SYNC. Executing SYNC on such a system is not necessary and will not cause an error, but it may reduce overall performance.

If a multiprocessor system is not strongly ordered, the effects of load and store instructions executed by one processor may be observed out of program order by other processors. On such systems, parallel programs must use SYNC to reliably share data at critical points in the program. SYNC separates the loads and stores executed on the processor into two groups, and the effect of all loads and stores in one group is seen by all processors before the effect of any load or store in the subsequent group. In effect, SYNC causes the system to be strongly ordered for the executing processor at the instant that the SYNC is executed.

The hardware ordering support provided in a MIPS-based multiprocessor system is implementation dependent. A parallel program that does not use SYNC generally does not operate correctly on a system that is not strongly ordered. However, a program that does use SYNC works on both types of systems. (System-specific documentation describes the actions needed to reliably share data in parallel programs for that system.)

The behavior of a load or store using one memory access type is UNPREDICTABLE if a load or store was previously made to the same physical location using a different memory access type. The presence of a SYNC between the references does not alter this behavior.

SYNC affects the order in which the effects of load and store instructions appear to all processors; it does not generally affect the physical memory-system ordering or synchronization issues that arise in system programming. The effect of SYNC on implementation-specific aspects of the cached memory system, such as writeback buffers, is not defined.

The code fragments below show how SYNC can be used to coordinate the use of shared data between separate writer and reader instruction streams in a multiprocessor environment. The FLAG location is used by the instruction streams to determine whether the shared data item DATA is valid. The SYNC executed by processor A forces the store of DATA to be performed globally before the store to FLAG is performed. The SYNC executed by processor B ensures that DATA is not read until after the FLAG value indicates that the shared data is valid.

# Processor A (writer)
# Conditions at entry:
# The value 0 has been stored in FLAG and that value is observable by B
   SW     R1, DATA       # change shared DATA value
   LI     R2, 1
   SYNC                  # Perform DATA store before performing FLAG store
   SW     R2, FLAG       # say that the shared DATA value is valid

# Processor B (reader)
   LI     R2, 1
1: LW     R1, FLAG       # Get FLAG
   BNEC   R2, R1, 1B     # if it says that DATA is not valid, poll again
   NOP
   SYNC                  # FLAG value checked before doing DATA read
   LW     R1, DATA       # Read (valid) shared DATA value
   SYNC
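
The same writer and reader handshake can be sketched in portable C11, with a sequentially consistent fence standing in for each SYNC. This is an analogy under stated assumptions: the names shared_data, shared_flag, writer_publish and reader_consume are hypothetical, and the instruction a compiler emits for each fence is toolchain-dependent.

```c
#include <stdatomic.h>

static int shared_data;        /* DATA */
static atomic_int shared_flag; /* FLAG; static storage, so it starts at 0 (not valid) */

/* Processor A (writer) */
static void writer_publish(int value)
{
    shared_data = value;                       /* SW R1, DATA */
    atomic_thread_fence(memory_order_seq_cst); /* SYNC: DATA store before FLAG store */
    atomic_store_explicit(&shared_flag, 1, memory_order_relaxed); /* SW R2, FLAG */
}

/* Processor B (reader): spins until FLAG is 1, then returns DATA. */
static int reader_consume(void)
{
    while (atomic_load_explicit(&shared_flag, memory_order_relaxed) != 1)
        ;                                      /* poll FLAG */
    atomic_thread_fence(memory_order_seq_cst); /* SYNC: FLAG checked before DATA read */
    return shared_data;                        /* LW R1, DATA */
}
```

In use, writer_publish would run on one thread and reader_consume on another; the reader returns only after the writer has set the flag, and the pair of fences ensures it then sees the published value.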

Exceptions:

None.