EXTEND 11110 |
stype |
0 |
00000 |
SHIFT 00110 |
000 |
000 |
sel = 5 |
SLL 00 |
5 |
5 |
1 |
5 |
5 |
3 |
3 |
3 |
5 |
SYNC stype |
MIPS16e2 |
Synchronize Shared Memory Extended |
Synchronize Shared Memory Extended
To order loads and stores for shared memory.
These types of ordering guarantees are available through the SYNC instruction:
Completion Barriers
Ordering Barriers
Completion Barrier - Simple Description:
The barrier affects only uncached and cached coherent loads and stores.
The specified memory instructions (loads or stores or both) that occur before the SYNC instruction must be completed before the specified memory instructions after the SYNC are allowed to start.
Loads are completed when the destination register is written. Stores are completed when the stored value is visible to every other processor in the system.
Completion Barrier - Detailed Description:
Every synchronizable specified memory instruction (loads or stores or both) that occurs in the instruction stream before the SYNC instruction must be already globally performed before any synchronizable specified memory instructions that occur after the SYNC are allowed to be performed, with respect to any other processor or coherent I/O module.
The barrier does not guarantee the order in which instruction fetches are performed.
A stype value of zero will always be defined such that it performs the most complete set of synchronization operations that are defined. This means stype zero always does a completion barrier that affects both loads and stores preceding the SYNC instruction and both loads and stores that are subsequent to the SYNC instruction. Non-zero values of stype may be defined by the architecture or specific implementations to perform synchronization behaviors that are less complete than that of stype zero. If an implementation does not use one of these non-zero values to define a different synchronization behavior, then that non-zero value of stype must act the same as stype zero completion barrier. This allows software written for an implementation with a lighter-weight barrier to work on another implementation which only implements the stype zero completion barrier.
A completion barrier is required, potentially in conjunction with EHB, to guarantee that memory reference results are visible across operating mode changes. For example, a completion barrier is required on some implementations on entry to and exit from Debug Mode to guarantee that memory effects are handled correctly.
SYNC behavior when the stype field is zero:
A completion barrier that affects preceding loads and stores and subsequent loads and stores.
Ordering Barrier - Simple Description:
The barrier affects only uncached and cached coherent loads and stores.
The specified memory instructions (loads or stores or both) that occur before the SYNC instruction must always be ordered before the specified memory instructions after the SYNC.
Memory instructions which are ordered before other memory instructions are processed by the load/store datapath first before the other memory instructions.
Ordering Barrier - Detailed Description:
Every synchronizable specified memory instruction (loads or stores or both) that occurs in the instruction stream before the SYNC instruction must reach a stage in the load/store datapath after which no instruction re-ordering is possible before any synchronizable specified memory instruction which occurs after the
SYNC instruction in the instruction stream reaches the same stage in the load/store datapath.
If any memory instruction before the SYNC instruction in program order, generates a memory request to the external memory and any memory instruction after the SYNC instruction in program order also generates a memory request to external memory, the memory request belonging to the older instruction must be globally performed before the time the memory request belonging to the younger instruction is globally performed.
The barrier does not guarantee the order in which instruction fetches are performed.
As compared to the completion barrier, the ordering barrier is a lighter-weight operation as it does not require the specified instructions before the SYNC to be already completed. Instead it only requires that those specified instructions which are subsequent to the SYNC in the instruction stream are never re-ordered for processing ahead of the specified instructions which are before the SYNC in the instruction stream. This potentially reduces how many cycles the barrier instruction must stall before it completes.
The Acquire and Release barrier types are used to minimize the memory orderings that must be maintained and still have software synchronization work.
Implementations that do not use any of the non-zero values of stype to define different barriers, such as ordering barriers, must make those stype values act the same as stype zero.
For the purposes of this description, the CACHE, PREF and PREFX instructions are treated as loads and stores. That is, these instructions and the memory transactions sourced by these instructions obey the ordering and completion rules of the SYNC instruction.
The following table lists the available completion barrier and ordering barriers behaviors that can be specified using the stype field.
Code |
Name |
Older instructions which must reach the load/store ordering point before the SYNC instruction completes. |
Younger instructions which must reach the load/store ordering point only after the SYNC instruction completes. |
Older instructions which must be globally performed when the SYNC instruction completes |
Compliance |
0x0 |
SYNC or SYNC 0 |
Loads, Stores |
Loads, Stores |
Loads, Stores |
Required |
0x4 |
SYNC_WMB or SYNC 4 |
Stores |
Stores |
Optional | |
0x10 |
SYNC_MB or SYNC 16 |
Loads, Stores |
Loads, Stores |
Optional | |
0x11 |
SYNC_ACQUIRE or SYNC 17 |
Loads |
Loads, Stores |
Optional | |
0x12 |
SYNC_RELEASE or SYNC 18 |
Loads, Stores |
Stores |
Optional | |
0x13 |
SYNC_RMB or SYNC 19 |
Loads |
Loads |
Optional | |
0x1-0x3, 0x5-0xF |
Implementation-Specific and Vendor Specific Sync Types | ||||
0x14 - 0x1F |
RESERVED |
Reserved for MIPS Technologies for future extension of the architecture. |
Synchronizable: A load or store instruction is synchronizable if the load or store occurs to a physical location in
shared memory using a virtual location with a memory access type of either uncached or cached coherent. Shared
memory is memory that can be accessed by more than one processor or by a coherent I/O system module.
Performed load: A load instruction is performed when the value returned by the load has been determined. The result
of a load on processor A has been determined with respect to processor or coherent I/O module B when a subsequent store to the location by B cannot affect the value returned by the load. The store by B must use the same memory access type as the load.
Performed store: A store instruction is performed when the store is observable. A store on processor A is observable
with respect to processor or coherent I/O module B when a subsequent load of the location by B returns the value written by the store. The load by B must use the same memory access type as the store.
Globally performed load: A load instruction is globally performed when it is performed with respect to all processors
and coherent I/O modules capable of storing to the location.
Globally performed store: A store instruction is globally performed when it is globally observable. It is globally observable when it is observable by all processors and I/O modules capable of loading from the location.
Coherent I/O module: A coherent I/O module is an Input/Output system component that performs coherent Direct
Memory Access (DMA). It reads and writes memory independently as though it were a processor doing loads and stores to locations with a memory access type of cached coherent.
Load/Store Datapath: The portion of the processor which handles the load/store data requests coming from the processor pipeline and processes those requests within the cache and memory system hierarchy.
Unpredictable prior to MIPS16e2. The effect of SYNC on the global order of loads and stores for memory access types other than uncached and cached coherent is UNPREDICTABLE.
SyncOperation(stype)
None
A processor executing load and store instructions observes the order in which loads and stores using the same memory access type occur in the instruction stream; this is known as program order.
A parallel program has multiple instruction streams that can execute simultaneously on different processors. In multiprocessor (MP) systems, the order in which the effects of loads and stores are observed by other processors-the
global order of the loads and store-determines the actions necessary to reliably share data in parallel programs.
When all processors observe the effects of loads and stores in program order, the system is strongly ordered. On such systems, parallel programs can reliably share data without explicit actions in the programs. For such a system, SYNC has the same effect as a NOP. Executing SYNC on such a system is not necessary, but neither is it an error.
If a multiprocessor system is not strongly ordered, the effects of load and store instructions executed by one processor may be observed out of program order by other processors. On such systems, parallel programs must take explicit actions to reliably share data. At critical points in the program, the effects of loads and stores from an instruction stream must occur in the same order for all processors. SYNC separates the loads and stores executed on the processor into two groups, and the effect of all loads and stores in one group is seen by all processors before the effect of any load or store in the subsequent group. In effect, SYNC causes the system to be strongly ordered for the executing processor at the instant that the SYNC is executed.
Many MIPS-based multiprocessor systems are strongly ordered or have a mode in which they operate as strongly ordered for at least one memory access type. The MIPS architecture also permits implementation of MP systems that are not strongly ordered; SYNC enables the reliable use of shared memory on such systems. A parallel program that does not use SYNC generally does not operate on a system that is not strongly ordered. However, a program that does use SYNC works on both types of systems. (System-specific documentation describes the actions needed to reliably share data in parallel programs for that system.)
The behavior of a load or store using one memory access type is UNPREDICTABLE if a load or store was previously made to the same physical location using a different memory access type. The presence of a SYNC between the references does not alter this behavior.
SYNC affects the order in which the effects of load and store instructions appear to all processors; it does not generally affect the physical memory-system ordering or synchronization issues that arise in system programming. The effect of SYNC on implementation-specific aspects of the cached memory system, such as writeback buffers, is not defined.
# Processor A (writer) # Conditions at entry: # The value 0 has been stored in FLAG and that value is observable by B SW R1, DATA # change shared DATA value LI R2, 1 SYNC # Perform DATA store before performing FLAG store SW R2, FLAG # say that the shared DATA value is valid # Processor B (reader) LI R2, 1 1: LW R1, FLAG # Get FLAG BNE R2, R1, 1B# if it says that DATA is not valid, poll again NOP SYNC # FLAG value checked before doing DATA read LW R1, DATA # Read (valid) shared DATA value
The code fragments above shows how SYNC can be used to coordinate the use of shared data between separate writer and reader instruction streams in a multiprocessor environment. The FLAG location is used by the instruction streams to determine whether the shared data item DATA is valid. The SYNC executed by processor A forces the store of
DATA to be performed globally before the store to FLAG is performed. The SYNC executed by processor B ensures that DATA is not read until after the FLAG value indicates that the shared data is valid.
Software written to use a SYNC instruction with a non-zero stype value, expecting one type of barrier behavior, should only be run on hardware that actually implements the expected barrier behavior for that non-zero stype value or on hardware which implements a superset of the behavior expected by the software for that stype value. If the hardware does not perform the barrier behavior expected by the software, the system may fail.