SYNC stype |
nanoMIPS |
Sync |
Sync.
Impose ordering constraints of type stype on prior and subsequent memory operations.
nanoMIPS
100000 | 00000 | stype | 1100 | x | 0000 | 00110 |
6 | 5 | 5 | 4 | 3 | 4 | 5 |
sync_memory_access(stype)
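As an illustration of the encoding, the following is a minimal Python sketch that packs and unpacks the instruction word. It assumes that the fields are listed from bit 31 down to bit 0 with the widths given in the encoding, and that x is a don't-care field. The function names encode_sync and decode_stype are hypothetical and not part of any toolchain.

```python
# Sketch only: field positions are derived from the listed widths
# (6, 5, 5, 4, 3, 4, 5), assuming the leftmost field is the most significant.

def encode_sync(stype, x=0):
    """Build the 32-bit SYNC instruction word for a 5-bit stype."""
    if not 0 <= stype <= 0x1F:
        raise ValueError("stype is a 5-bit field")
    if not 0 <= x <= 0x7:
        raise ValueError("x is a 3-bit field")
    return (
        (0b100000 << 26)   # bits 31..26
        | (0b00000 << 21)  # bits 25..21
        | (stype << 16)    # bits 20..16
        | (0b1100 << 12)   # bits 15..12
        | (x << 9)         # bits 11..9, ignored by the decoder below
        | (0b0000 << 5)    # bits 8..5
        | 0b00110          # bits 4..0
    )

def decode_stype(word):
    """Return stype if word matches the SYNC pattern, else raise ValueError."""
    fixed_mask = 0xFFFFFFFF & ~(0x1F << 16) & ~(0x7 << 9)
    if (word & fixed_mask) != (encode_sync(0) & fixed_mask):
        raise ValueError("not a SYNC encoding")
    return (word >> 16) & 0x1F
```

Under these assumptions, encode_sync(0) yields 0x8000C006 and encode_sync(0x10) yields 0x8010C006.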
The SYNC instruction is used to order loads and stores for shared memory, and also to order operations with respect to the global invalidate instructions GINVI and GINVT. The following types of ordering
guarantees are available with different stypes.
Completion Barriers: A completion barrier provides a guarantee that any of the specified memory
instructions before the SYNC are completed and globally performed before any of the specified memory instructions after the SYNC are performed to any extent. Loads are completed when
the destination register is written. Stores are completed when the stored value is visible to every other processor in the system.
Ordering Barriers: An ordering barrier provides a guarantee in the system that any specified
memory instructions before the SYNC are ordered before any of the specified memory instructions after the SYNC. The ordering SYNC is considered complete when the memory instructions
before and after the SYNC are guaranteed thereafter to retain their order relative to the SYNC,
i.e. when it is guaranteed that all specified memory instructions before the SYNC will be globally performed before any of
the specified memory accesses after the SYNC are performed to
any extent. It is helpful to think of a global ordering point in a coherence domain, which is a point where, once an instruction reaches it, the instruction can be guaranteed to retain its order relative to any
memory instruction that reaches the point after it. The ordering SYNC thus cannot complete before all older specified memory instructions reach the global ordering point.
The following table shows the behavior of the SYNC instruction for each stype value. Operation types listed in the 'What reaches before' column are subject to a pre-SYNC ordering barrier: such operations,
when older than the SYNC, must reach the global ordering point before the SYNC instruction completes. Operation types listed in the 'What reaches after' column are subject to a post-SYNC ordering barrier: such
operations, when younger than the SYNC, must reach the global ordering point only after the SYNC instruction completes. Operation types listed in the 'What completes before' column are subject to a completion barrier, that
is, they must be globally performed when the SYNC instruction completes.
stype | Name | What reaches before | What reaches after | What completes before | Availability |
0x0 | SYNC | Loads, Stores | Loads, Stores | Loads, Stores | Required. |
0x1-0x3 | Impl./vendor specific. |
0x4 | SYNC_WMB | Stores | Stores | | Optional. |
0x5-0xF | Impl./vendor specific. |
0x10 | SYNC_MB | Loads, Stores | Loads, Stores | | Optional. |
0x11 | SYNC_ACQUIRE | Loads | Loads, Stores | | Optional. |
0x12 | SYNC_RELEASE | Loads, Stores | Loads | | Optional. |
0x13 | SYNC_RMB | Loads | Loads | | Optional. |
0x14 | SYNC_GINV | Loads, Stores | Loads, Stores | GINVI, GINVT, SYNCI | Config5.GI=2,3. |
0x15-0x1F | Reserved for Architecture. |
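The stype ranges in the table can be summarized with a small lookup. The sketch below is illustrative only: it records each named stype together with its 'What completes before' entries, and classifies the remaining values by range. The names NAMED_STYPES and classify_stype are hypothetical.

```python
# Illustrative transcription of the stype table: name and the
# 'What completes before' column only. Not an official data structure.
NAMED_STYPES = {
    0x00: ("SYNC", ("Loads", "Stores")),
    0x04: ("SYNC_WMB", ()),
    0x10: ("SYNC_MB", ()),
    0x11: ("SYNC_ACQUIRE", ()),
    0x12: ("SYNC_RELEASE", ()),
    0x13: ("SYNC_RMB", ()),
    0x14: ("SYNC_GINV", ("GINVI", "GINVT", "SYNCI")),
}

def classify_stype(stype):
    """Return (label, completes_before) for a 5-bit stype value."""
    if not 0 <= stype <= 0x1F:
        raise ValueError("stype is a 5-bit field")
    if stype in NAMED_STYPES:
        return NAMED_STYPES[stype]
    if stype >= 0x15:
        return ("reserved for architecture", ())
    # Remaining values are 0x1-0x3 and 0x5-0xF.
    return ("implementation/vendor specific", ())
```

An empty completes_before tuple indicates that the table lists only ordering guarantees for that stype.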
SYNC barriers affect only uncached and cached coherent loads and stores and do not affect the order in which instruction fetches are performed. For the purposes of this description, the CACHE, PREF
and SYNCI instructions are treated as loads and stores. In addition, the optional Global Invalidate instructions are synchronizable through SYNC (stype=0x14).
The effect of SYNC on the global order of loads and stores for memory access types other than uncached and cached coherent is UNPREDICTABLE.
A completion barrier may have an adverse impact on performance compared to an ordering barrier due to the constraint of completion. An implementation may optimize the ordering of memory instructions
such that the ordering barrier completes before a completion barrier under the same circumstance. The magnitude of the impact is implementation-dependent but an implementation must ensure that an
ordering barrier is not worse performing than the equivalent completion barrier. Software thus needs to use completion and ordering barriers for the appropriate conditions.
An stype of 0 is used to define the SYNC instruction with completion barrier semantics. Non-zero values of stype may be defined by the architecture or specific implementations to perform synchronization
behaviors that are less complete than that of stype=0. If an implementation does not use one of these non-zero values to define a different synchronization behavior, then that non-zero value of stype must
map to a completion barrier. This allows software written for an implementation with a lighter-weight barrier to work on another implementation which only implements the stype=0 completion barrier.
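The fallback rule for stype values can be sketched as follows. This is a minimal model, not a description of any particular implementation: the set of implemented stypes passed in is a hypothetical input, and effective_stype is an illustrative name.

```python
COMPLETION_BARRIER = 0x0  # stype=0 is required on every implementation

def effective_stype(requested, implemented):
    """Model which barrier applies when software requests a given stype.

    implemented: hypothetical set of non-zero stype values for which this
    implementation defines a distinct, lighter-weight behavior.
    """
    if not 0 <= requested <= 0x1F:
        raise ValueError("stype is a 5-bit field")
    if requested in implemented:
        return requested
    # Any stype without its own defined behavior maps to the completion barrier.
    return COMPLETION_BARRIER
```

For example, code that requests stype=0x11 on an implementation that defines only 0x10 is modeled as receiving the stype=0 completion barrier, which is stronger than requested and therefore still correct.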
The Acquire and Release barrier types are used to minimize the memory ordering that must be maintained and still have software synchronization work.
A completion barrier is required, potentially in conjunction with an EHB instruction, to guarantee that memory reference results are visible across operating mode changes. For example, a completion
barrier is required on some implementations on entry to and exit from Debug Mode to guarantee that memory effects are handled correctly.
If Global Invalidate instructions are supported, then SYNC (stype=0x14) acts as a completion barrier with respect to any preceding GINVI or GINVT instructions. This SYNC instruction is globalized and
only completes if all preceding GINVI or GINVT operations related to the same program have completed in the system. (Any references to GINVT also imply GINVGT, available in a virtualized MIPS system.)
A system that implements the Global Invalidates also requires that the completion of SYNC (stype=0x14) be constrained by legacy SYNCI operations. Thus SYNC (stype=0x14) can also be
used to enforce synchronization of SYNCI instructions. In the typical use cases, a single GINVI is used by itself to invalidate caches and would be followed by a SYNC (stype=0x14). In the case of GINVT,
multiple GINVT instructions could be used to invalidate multiple TLB mappings, and the SYNC (stype=0x14) would be used to guarantee completion of any number of GINVTs preceding it.
:
Synchronizable: A load or store instruction is synchronizable if the load or store occurs to a physical location in shared memory using a virtual address with a memory access type of either uncached or
cached coherent.
Shared memory: Memory that can be accessed by more than one processor or by a coherent I/O system module.
Performed load: A load instruction is performed when the value returned by the load has been determined. The result of a load on processor A has been determined with respect to processor or coherent
I/O module B when a subsequent store to the location by B cannot affect the value returned by the load. The store by B must use the same memory access type as the load.
Performed store: A store instruction is performed when the store is observable. A store on processor A is observable with respect to processor or coherent I/O module B when a subsequent load of the
location by B returns the value written by the store. The load by B must use the same memory access type as the store.
Globally performed load: A load instruction is globally performed when it is performed with respect to all processors and coherent I/O modules capable of storing to the location.
Globally performed store: A store instruction is globally performed when it is globally observable. It is globally observable when it is observable by all processors and I/O modules capable of loading from
the location.
Global ordering point: A point in the coherence domain such that, once a memory instruction reaches it, the instruction can be guaranteed to retain its order relative to any memory instruction that reaches the point after
it.
Coherent I/O module: A coherent I/O module is an Input/Output system component that performs coherent Direct Memory Access (DMA). It reads and writes memory independently as though it were
a processor doing loads and stores to locations with a memory access type of cached coherent.
:
A processor executing load and store instructions observes the order in which loads and stores using the same memory access type occur in the instruction stream; this is known as program order.
A parallel program has multiple instruction streams that can execute simultaneously on different processors.
In multiprocessor (MP) systems, the order in which the effects of loads and stores are observed by other processors (the global order of the loads and stores) determines the actions necessary
to reliably share data in parallel programs.
When all processors observe the effects of loads and stores in program order, the system is strongly ordered. On such systems, parallel programs can reliably share data without explicitly using a SYNC.
Executing SYNC on such a system is not necessary, will not cause an error, but may reduce overall performance.
If a multiprocessor system is not strongly ordered, the effects of load and store instructions executed by one processor may be observed out of program order by other processors. On such systems, parallel
programs must use SYNC to reliably share data at critical points in the program. SYNC separates the loads and stores executed on the processor into two groups, and the effect of all loads and stores in
one group is seen by all processors before the effect of any load or store in the subsequent group. In effect, SYNC causes the system to be strongly ordered for the executing processor at the instant that
the SYNC is executed.
The hardware ordering support provided in a MIPS-based multiprocessor system is implementation dependent. A parallel program that does not use SYNC generally does not operate on a system that is not
strongly ordered. However, a program that does use SYNC works on both types of systems. (System-specific documentation describes the actions needed to reliably share data in parallel programs for
that system.)
The behavior of a load or store using one memory access type is UNPREDICTABLE if a load or store was previously made to the same physical location using a different memory access type. The presence
of a SYNC between the references does not alter this behavior.
SYNC affects the order in which the effects of load and store instructions appear to all processors; it does not generally affect the physical memory-system ordering or synchronization issues that arise in
system programming. The effect of SYNC on implementation-specific aspects of the cached memory system, such as writeback buffers, is not defined.
The code fragments below show how SYNC can be used to coordinate the use of shared data between separate writer and reader instruction streams in a multiprocessor environment. The FLAG location is
used by the instruction streams to determine whether the shared data item DATA is valid. The SYNC executed by processor A forces the store of DATA to be performed globally before the store to FLAG
is performed. The SYNC executed by processor B ensures that DATA is not read until after the FLAG value indicates that the shared data is valid.
# Processor A (writer)
# Conditions at entry:
# The value 0 has been stored in FLAG and that value is observable by B
    SW   R1, DATA      # change shared DATA value
    LI   R2, 1
    SYNC               # Perform DATA store before performing FLAG store
    SW   R2, FLAG      # say that the shared DATA value is valid

# Processor B (reader)
    LI   R2, 1
1:  LW   R1, FLAG      # Get FLAG
    BNEC R2, R1, 1B    # if it says that DATA is not valid, poll again
    NOP
    SYNC               # FLAG value checked before doing DATA read
    LW   R1, DATA      # Read (valid) shared DATA value
    SYNC
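The effect of the writer's SYNC can be illustrated with a toy Python model. This is a sketch under simplifying assumptions, not a model of any real memory system: stores issued between two SYNCs are assumed to become globally visible in any order, no store is allowed to cross a SYNC, and the reader is assumed to observe memory after each store becomes visible. All function names and the value 42 are hypothetical.

```python
from itertools import permutations

def visible_orders(program):
    """List every order in which the writer's stores may become visible.

    program is a list of ("store", name, value) or ("sync",) tuples.
    Stores within one group (between SYNCs) may be reordered freely;
    groups themselves stay in program order.
    """
    groups, current = [], []
    for op in program:
        if op[0] == "sync":
            groups.append(current)
            current = []
        else:
            current.append(op)
    groups.append(current)

    orders = [[]]
    for group in groups:
        orders = [prefix + list(p) for prefix in orders
                  for p in permutations(group)]
    return orders

def reader_can_see_stale_data(program):
    """True if some order exposes FLAG == 1 while DATA still holds 0."""
    for order in visible_orders(program):
        memory = {"DATA": 0, "FLAG": 0}
        for _, name, value in order:
            memory[name] = value
            if memory["FLAG"] == 1 and memory["DATA"] == 0:
                return True
    return False

without_sync = [("store", "DATA", 42), ("store", "FLAG", 1)]
with_sync = [("store", "DATA", 42), ("sync",), ("store", "FLAG", 1)]
```

In this model, reader_can_see_stale_data(without_sync) is True and reader_can_see_stale_data(with_sync) is False. The model covers only the writer side; the reader's own SYNC, which orders its FLAG load before its DATA load, is not represented.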
None.