EXTEND 11110 | Imm[10:5] | Imm[15:11] | ADDIUSP 00000 | rx | sel = 1 | Imm[4:0]
5 | 6 | 5 | 5 | 3 | 3 | 5
ADDIU rx, gp, immediate | MIPS16e2
Add Immediate Unsigned Word (3-Operand, GP-Relative, Extended)
To add a constant to the global pointer.
GPR[rx] = GPR[gp] + immediate
The 16-bit immediate is sign-extended and then added to the contents of GPR 28 to form a 32-bit result. The result is placed in GPR rx.
No integer overflow exception occurs under any circumstances.
None
temp = GPR[28] + sign_extend(immediate)
GPR[XLat[rx]] = temp
None
The term "unsigned" in the instruction name is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow. It is appropriate for unsigned arithmetic, such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
EXTEND 11110 | Imm[10:5] | Imm[15:11] | LI 01101 | rx | sel = 3 | Imm[4:0]
5 | 6 | 5 | 5 | 3 | 3 | 5
ANDI rx, immediate | MIPS16e2
AND Immediate Extended
To do a bitwise logical AND with a constant.
GPR[rx] = GPR[rx] AND zero_extend(immediate)
The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rx in a bitwise logical AND operation. The result is placed back into GPR rx.
Unpredictable prior to MIPS16e2.
GPR[XLat[rx]] = GPR[XLat[rx]] and zero_extend(immediate)
None
EXTEND 11110 | 00 | Imm[8:5] | op[4:0] | SWPSP 11010 | rx | sel = 5 | Imm[4:0]
5 | 2 | 4 | 5 | 5 | 3 | 3 | 5
CACHE op, immediate(rx) | MIPS16e2
Perform Cache Operation Extended
To perform the cache operation specified by the op field.
The 9-bit immediate value is sign-extended and added to the contents of the base register to form an effective address.
A TLB Refill and TLB Invalid (both with cause code equal TLBL) exception can occur on any operation. For index operations (where the address is used to index the cache but need not match the cache tag), software must use unmapped addresses to avoid TLB exceptions. This instruction never causes TLB Modified exceptions nor TLB Refill exceptions with a cause code of TLBS. This instruction never causes Execute-Inhibit nor Read-Inhibit exceptions.
The effective address may be an arbitrarily-aligned byte address. The CACHE instruction never causes an Address Error Exception due to a non-aligned address.
A Cache Error exception may occur as a result of some operations performed by this instruction. For example, if a Writeback operation detects a cache or bus error during the processing of the operation, that error is reported via a Cache Error exception. Also, a Bus Error Exception may occur if a bus operation invoked by this instruction is terminated in an error. However, Cache Error exceptions must not be triggered by an Index Load Tag or Index Store Tag operation, as these operations are used for initialization and diagnostic purposes.
An Address Error Exception (with cause code equal AdEL) may occur if the effective address references a portion of
the kernel address space which would normally result in such an exception. It is implementation dependent whether
such an exception does occur.
It is implementation dependent whether a data watch is triggered by a cache instruction whose address matches the
Watch register address match conditions.
The CACHE instruction and the memory transactions which are sourced by the CACHE instruction, such as cache
refill or cache writeback, obey the ordering and completion rules of the SYNC instruction.
Bits [17:16] of the instruction specify the cache on which to perform the operation, as follows:
Encoding of Bits[17:16] of CACHE Instruction
Code | Name | Cache
0b00 | I | Primary Instruction
0b01 | D | Primary Data or Unified Primary
0b10 | T | Tertiary
0b11 | S | Secondary
Bits [20:18] of the instruction specify the operation to perform. To provide software with a consistent base of cache operations, certain encodings must be supported on all processors. The remaining encodings are recommended.
When multiple levels of caches are implemented and the hardware maintains the smaller cache as a proper subset of a larger cache (every address which is resident in the smaller cache is also resident in the larger cache; this is known as the inclusion property), it is recommended that a CACHE instruction which operates on the larger, outer-level cache first operate on the smaller, inner-level cache. For example, a Hit_Writeback_Invalidate operation targeting the Secondary cache should first operate on the primary data cache. If the CACHE instruction implementation does not follow this policy, then any software which flushes the caches must mimic this behavior: the software sequences must first operate on the inner cache, then operate on the outer cache. The software must place a SYNC instruction after the CACHE instruction whenever there are possible writebacks from the inner cache, to ensure that the writeback data is resident in the outer cache before operating on the outer cache. If neither the CACHE instruction implementation nor the software cache flush sequence follows this policy, then the inclusion property of the caches can be broken, which might be a condition that the cache management hardware cannot properly handle.
When implementing multiple levels of caches without the inclusion property, the use of a SYNC instruction after the CACHE instruction is still needed whenever writeback data has to be resident in the next level of the memory hierarchy.
For multiprocessor implementations that maintain coherent caches, some of the Hit type of CACHE instruction operations may optionally affect all coherent caches within the implementation. If the effective address uses a coherent Cache Coherency Attribute (CCA), then the operation is globalized, meaning it is broadcast to all of the coherent caches within the system. If the effective address does not use one of the coherent CCAs, there is no broadcast of the operation. If multiple levels of caches are to be affected by one CACHE instruction, all of the affected cache levels must be processed in the same manner: either all affected cache levels use the globalized behavior, or all affected cache levels use the non-globalized behavior.
Encoding of Bits [20:18] of the CACHE Instruction
Code |
Caches |
Name |
Effective Address Operand Type |
Operation |
Compliance Implemented |
0b000 |
I |
Index Invalidate |
Index |
Set the state of the cache block at the specified index to invalid. This required encoding may be used by software to invalidate the entire instruction cache by stepping through all valid indices. |
Required |
D |
Index Writeback Invalidate / Index Invalidate |
Index |
For a write-back cache: If the state of the cache block at the specified index is valid and dirty, write the block back to the memory address specified by the cache tag. After that operation |
Required | |
S, T |
Index Writeback Invalidate / Index Invalidate |
Index |
is completed, set the state of the cache block to invalid. If the block is valid but not dirty, set the state of the block to invalid. For a write-through cache: Set the state of the cache block at the specified index to invalid. This required encoding may be used by software to invalidate the entire data cache by stepping through all valid indices. The Index Store Tag must be used to initialize the cache at power up. |
Required if S, T cache is implemented | |
0b001 |
All |
Index Load Tag |
Index |
Read the tag for the cache block at the specified index into the TagLo and TagHi Coprocessor 0 registers. If the DataLo and DataHi registers are implemented, also read the data corresponding to the byte index into the DataLo and DataHi registers. This operation must not cause a Cache Error Exception. The granularity and alignment of the data read into the DataLo and DataHi registers is implementation-dependent, but is typically the result of an aligned access to the cache, ignoring the appropriate low-order bits of the byte index. |
Recommended |
0b010 |
All |
Index Store Tag |
Index |
Write the tag for the cache block at the specified index from the TagLo and TagHi Coprocessor 0 registers. This operation must not cause a Cache Error Exception. This required encoding may be used by software to initialize the entire instruction or data caches by stepping through all valid indices. Doing so requires that the TagLo and TagHi registers associated with the cache be initialized first. |
Required |
0b011 |
All |
Implementation Dependent |
Unspecified |
Available for implementation-dependent operation. |
Optional |
0b100 |
I, D |
Hit Invalidate |
Address |
If the cache block contains the specified address, set the state of the cache block to invalid. This required encoding may be used by software to invalidate a range of addresses from the |
Required (Instruction Cache Encoding Only), Recommended otherwise |
S, T |
Hit Invalidate |
Address |
instruction cache by stepping through the address range by the line size of the cache. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system. |
Optional, if Hit_Invalidate_D is implemented, the S and T variants are recommended. | |
0b101 |
I |
Fill |
Address |
Fill the cache from the specified address. |
Recommended |
D |
Hit Writeback Invalidate / Hit Invalidate |
Address |
For a write-back cache: If the cache block contains the specified address and it is valid and dirty, write the contents back to memory. After |
Required | |
S, T |
Hit Writeback Invalidate / Hit Invalidate |
Address |
that operation is completed, set the state of the cache block to invalid. If the block is valid but not dirty, set the state of the block to invalid. For a write-through cache: If the cache block contains the specified address, set the state of the cache block to invalid. This required encoding may be used by software to invalidate a range of addresses from the data cache by stepping through the address range by the line size of the cache. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system. |
Required if S, T cache is implemented | |
0b110 |
D |
Hit Writeback |
Address |
If the cache block contains the specified address and it is valid and dirty, write the contents back |
Recommended |
S, T |
Hit Writeback |
Address |
to memory. After the operation is completed, leave the state of the line valid, but clear the dirty state. For a write-through cache, this operation may be treated as a nop. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system. |
Optional, if Hit_Writeback_D is implemented, the S and T variants are recommended. | |
0b111 |
I, D |
Fetch and Lock |
Address |
If the cache does not contain the specified address, fill it from memory, performing a writeback if required. Set the state to valid and locked. If the cache already contains the specified address, set the state to locked. In set-associative or fully-associative caches, the way selected on a fill from memory is implementation dependent. The lock state may be cleared by executing an Index Invalidate, Index Writeback Invalidate, Hit Invalidate, or Hit Writeback Invalidate operation to the locked line, or via an Index Store Tag operation to the line that clears the lock bit. Clearing the lock state via Index Store Tag is dependent on the implementation-dependent cache tag and cache line organization, and the Index and Index Writeback Invalidate operations are dependent on cache line organization. Only Hit and Hit Writeback Invalidate operations are generally portable across implementations. It is implementation dependent whether a locked line is displaced as the result of an external invalidate or intervention that hits on the locked line. Software must not depend on the locked line remaining in the cache if an external invalidate or intervention would invalidate the line if it were not locked. It is implementation dependent whether a Fetch and Lock operation affects more than one line. For example, more than one line around the referenced address may be fetched and locked. It is recommended that only the single line containing the referenced address be affected. |
Recommended |
The operation of this instruction is UNDEFINED for any operation/cache combination that is not implemented.
The operation of this instruction is UNDEFINED if the operation requires an address, and that address is uncacheable.
The operation of the instruction is UNPREDICTABLE if the cache line that contains the CACHE instruction is the target of an invalidate or a writeback invalidate.
If this instruction is used to lock all ways of a cache at a specific cache index, the behavior of that cache to subsequent cache misses to that cache index is UNDEFINED.
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
Any use of this instruction that can cause cacheline writebacks should be followed by a subsequent SYNC instruction to avoid hazards where the writeback data is not yet visible at the next level of the memory hierarchy.
This instruction does not produce an exception for a misaligned memory address, since it has no memory access size.
vAddr = GPR[XLat[rx]] + sign_extend(immediate)
(pAddr, uncached) = AddressTranslation(vAddr, DataReadReference)
CacheOp(op, vAddr, pAddr)
TLB Refill Exception.
TLB Invalid Exception
Coprocessor Unusable Exception
Address Error Exception
Cache Error Exception
Bus Error Exception
For cache operations that require an index, it is implementation dependent whether the effective address or the translated physical address is used as the cache index. Therefore, the index value should always be converted to an unmapped address (such as a kseg0 address, by ORing the index with 0x80000000) before being used by the cache instruction. For example, the following code sequence performs a data cache Index Store Tag operation using the index passed in GPR a0:
li a1, 0x80000000 /* Base of kseg0 segment */
or a0, a1 /* Convert index to kseg0 address */
cache DCIndexStTag, 0(a0) /* Perform the index store tag operation */
EXTEND 11110 | CP0 000 | sel[2:0] 000 | CLRBIT_NORES 00110 (DI) | I8 01100 | MOVR32 111 | 000 | 01100
EXTEND 11110 | CP0 000 | sel[2:0] 000 | CLRBIT 00010 (DI ry) | I8 01100 | MOVR32 111 | ry | 01100
5 | 3 | 3 | 5 | 5 | 3 | 3 | 5
DI | MIPS16e2
DI ry | MIPS16e2
Disable Interrupts Extended
To return the previous value of the Status register and disable interrupts. If DI is specified without an argument, GPR r0 is implied, which discards the previous value of the Status register.
GPR[ry] = Status; StatusIE = 0
The current value of the Status register is loaded into general register ry. The Interrupt Enable (IE) bit in the Status register is then cleared.
Unpredictable prior to MIPS16e2. If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
Operation - DI:
The following operation pertains to the DI instruction.
StatusIE = 0
Operation - DI ry:
The following operation pertains to the DI ry instruction.
data = Status
GPR[XLat[ry]] = data
StatusIE = 0
Coprocessor Unusable
The effects of this instruction are identical to those accomplished by the sequence of reading Status into a GPR, clearing the IE bit, and writing the result back to Status. Unlike the multiple instruction sequence, however, the DI instruction cannot be aborted in the middle by an interrupt or exception.
This instruction creates an execution hazard between the change to the Status register and the point where the change to the interrupt enable takes effect. This hazard is cleared by the EHB, JALR.HB, JR.HB, or ERET instructions. Software must not assume that a fixed latency will clear the execution hazard.
EXTEND 11110 | CP0 000 | sel[2:0] 001 | CLRBIT_NORES 00110 | I8 01100 | MOVR32 111 | 000 | 00001
EXTEND 11110 | CP0 000 | sel[2:0] 001 | CLRBIT 00010 | I8 01100 | MOVR32 111 | ry | 00001
5 | 3 | 3 | 5 | 5 | 3 | 3 | 5
DMT | MIPS16e2
DMT ry | MIPS16e2
Disable Multi-Threaded Execution Extended
To return the previous value of the VPEControl register and disable multi-threaded execution. If DMT is specified without an argument, GPR r0 is implied, which discards the previous value of the VPEControl register.
GPR[ry] = VPEControl; VPEControlTE = 0
The current value of the VPEControl register is loaded into general register ry. The Threads Enable (TE) bit in the VPEControl register is then cleared, suspending concurrent execution of instruction streams other than that which issues the DMT. This is independent of any per-TC halted state.
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
In implementations that do not implement the MT Module, this instruction results in a Reserved Instruction Exception. Unpredictable prior to MIPS16e2.
Operation - DMT:
The following operation pertains to the DMT instruction.
VPEControlTE = 0
Operation - DMT ry:
The following operation pertains to the DMT ry instruction.
data = VPEControl
GPR[XLat[ry]] = sign_extend(data)
VPEControlTE = 0
Coprocessor Unusable
Reserved Instruction (Implementations that do not include the MT Module)
The effects of this instruction are identical to those accomplished by the sequence of reading VPEControl into a GPR, clearing the TE bit to create a temporary value in a second GPR, and writing that value back to VPEControl. Unlike the multiple instruction sequence, however, the DMT instruction does not consume a temporary register, and cannot be aborted by an interrupt or exception.
The effect of a DMT instruction may not be instantaneous. An instruction hazard barrier, e.g., JR.HB, is required to guarantee that all other threads have been suspended. If a DMT instruction is followed in the same instruction stream by an MFC0 or MFTR from the VPEControl register, a JALR.HB, JR.HB, EHB, or ERET instruction must be issued between the DMT and the read of VPEControl to guarantee that the new state of TE will be accessed by the read.
EXTEND 11110 | CP0 000 | sel[2:0] 001 | CLRBIT_NORES 00110 | I8 01100 | MOVR32 111 | 000 | 00000
EXTEND 11110 | CP0 000 | sel[2:0] 001 | CLRBIT 00010 | I8 01100 | MOVR32 111 | ry | 00000
5 | 3 | 3 | 5 | 5 | 3 | 3 | 5
DVPE | MIPS16e2
DVPE ry | MIPS16e2
Disable Virtual Processor Execution Extended
To return the previous value of the MVPControl register and disable multi-VPE execution. If DVPE is specified without an argument, GPR r0 is implied, which discards the previous value of the MVPControl register.
GPR[ry] = MVPControl; MVPControlEVP = 0
The current value of the MVPControl register is loaded into general register ry. The Enable Virtual Processors (EVP) bit in the MVPControl register is then cleared, suspending concurrent execution of instruction streams other than the instruction stream that issues the DVPE.
Unpredictable prior to MIPS16e2. If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled. If the VPE executing the instruction is not a Master VPE (that is, the MVP bit of the VPEConf0 register is not set), the EVP bit is unchanged by the instruction.
In implementations that do not implement the MT Module, this instruction results in a Reserved Instruction Exception.
Operation - DVPE:
The following operation pertains to the DVPE instruction.
if (VPEConf0MVP = 1) then
    MVPControlEVP = 0
endif
Operation - DVPE ry:
The following operation pertains to the DVPE ry instruction.
data = MVPControl
GPR[XLat[ry]] = data
if (VPEConf0MVP = 1) then
    MVPControlEVP = 0
endif
Coprocessor Unusable
Reserved Instruction (Implementations that do not include the MT Module)
The effects of this instruction are identical to those accomplished by the sequence of reading MVPControl into a GPR, clearing the EVP bit to create a temporary value in a second GPR, and writing that value back to MVPControl. Unlike the multiple instruction sequence, however, the DVPE instruction does not consume a temporary register, and cannot be aborted by an interrupt or exception, nor by the scheduling of a different instruction stream.
The effect of a DVPE instruction may not be instantaneous. An instruction hazard barrier, e.g., JR.HB, is required to guarantee that all other TCs have been suspended.
If a DVPE instruction is followed in the same instruction stream by an MFC0 or MFTR from the MVPControl register, a JALR.HB, JR.HB, EHB, or ERET instruction must be issued between the DVPE and the read of MVPControl to guarantee that the new state of EVP will be accessed by the read.
EXTEND 11110 | 00011 | 0 | 00000 | SHIFT 00110 | 000 | 000 | sel = 4 | SLL 00
5 | 5 | 1 | 5 | 5 | 3 | 3 | 3 | 2
EHB | MIPS16e2
Execution Hazard Barrier Extended
To stop instruction execution until all execution hazards have been cleared.
EHB is used to denote execution hazard barrier. The actual instruction is interpreted by the hardware as SLL r0, r0, 3.
This instruction alters the instruction issue behavior on a pipelined processor by stopping execution until all execution hazards have been cleared. Other than those that might be created as a consequence of setting StatusCU0, there are no execution hazards visible to an unprivileged program running in User Mode. All execution hazards created by previous instructions are cleared for instructions executed immediately following the EHB, even if the EHB is executed in the delay slot of a branch or jump. The EHB instruction does not clear instruction hazards; such hazards are cleared by the JALR.HB, JR.HB, and ERET instructions.
Unpredictable prior to MIPS16e2.
ClearExecutionHazards()
None
This instruction resolves all execution hazards.
EXTEND 11110 | CP0 000 | sel[2:0] 000 | SETBIT_NORES 00111 | I8 01100 | MOVR32 111 | 000 | 01100
EXTEND 11110 | CP0 000 | sel[2:0] 000 | SETBIT 00011 | I8 01100 | MOVR32 111 | ry | 01100
5 | 3 | 3 | 5 | 5 | 3 | 3 | 5
EI | MIPS16e2
EI ry | MIPS16e2
Enable Interrupts Extended
To return the previous value of the Status register and enable interrupts. If EI is specified without an argument, GPR r0 is implied, which discards the previous value of the Status register.
GPR[ry] = Status; StatusIE = 1
The current value of the Status register is loaded into general register ry. The Interrupt Enable (IE) bit in the Status register is then set.
Unpredictable prior to MIPS16e2. If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
Operation - EI:
The following operation pertains to the EI instruction.
StatusIE = 1
Operation - EI ry:
The following operation pertains to the EI ry instruction.
data = Status
GPR[XLat[ry]] = data
StatusIE = 1
Coprocessor Unusable
Reserved Instruction
The effects of this instruction are identical to those accomplished by the sequence of reading Status into a GPR, setting the IE bit, and writing the result back to Status. Unlike the multiple instruction sequence, however, the EI instruction cannot be aborted in the middle by an interrupt or exception.
This instruction creates an execution hazard between the change to the Status register and the point where the change to the interrupt enable takes effect. This hazard is cleared by the EHB, JALR.HB, JR.HB, or ERET instructions. Software must not assume that a fixed latency will clear the execution hazard.
EXTEND 11110 | CP0 000 | sel[2:0] 001 | SETBIT_NORES 00111 | I8 01100 | MOVR32 111 | 000 | 00001
EXTEND 11110 | CP0 000 | sel[2:0] 001 | SETBIT 00011 | I8 01100 | MOVR32 111 | ry | 00001
5 | 3 | 3 | 5 | 5 | 3 | 3 | 5
EMT | MIPS16e2
EMT ry | MIPS16e2
Enable Multi-Threaded Execution Extended
To return the previous value of the VPEControl register and to enable multi-threaded execution. If EMT is specified without an argument, GPR r0 is implied, which discards the previous value of the VPEControl register.
GPR[ry] = VPEControl; VPEControlTE = 1
The current value of the VPEControl register is loaded into general register ry. The Threads Enable (TE) bit in the VPEControl register is then set, allowing multiple instruction streams to execute concurrently.
Unpredictable prior to MIPS16e2. If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
In implementations that do not implement the MT Module, this instruction results in a Reserved Instruction Exception.
Operation - EMT:
The following operation pertains to the EMT instruction.
VPEControlTE = 1
Operation - EMT ry:
The following operation pertains to the EMT ry instruction.
data = VPEControl
GPR[XLat[ry]] = sign_extend(data)
VPEControlTE = 1
Coprocessor Unusable
Reserved Instruction (Implementations that do not include the MT Module)
The effects of this instruction are identical to those accomplished by the sequence of reading VPEControl into a GPR, setting the TE bit to create a temporary value in a second GPR, and writing that value back to VPEControl. Unlike the multiple instruction sequence, however, the EMT instruction does not consume a temporary register, and cannot be aborted by an interrupt or exception.
If an EMT instruction is followed in the same instruction stream by an MFC0 or MFTR from the VPEControl register, a JALR.HB, JR.HB, EHB, or ERET instruction must be issued between the EMT and the read of VPEControl to guarantee that the new state of TE will be accessed by the read.
EXTEND 11110 | CP0 000 | sel[2:0] 001 | SETBIT_NORES 00111 | I8 01100 | MOVR32 111 | 000 | 00000
EXTEND 11110 | CP0 000 | sel[2:0] 001 | SETBIT 00011 | I8 01100 | MOVR32 111 | ry | 00000
5 | 3 | 3 | 5 | 5 | 3 | 3 | 5
EVPE | MIPS16e2
EVPE ry | MIPS16e2
Enable Virtual Processor Execution Extended
To return the previous value of the MVPControl register and enable multi-VPE execution. If EVPE is specified without an argument, GPR r0 is implied, which discards the previous value of the MVPControl register.
GPR[ry] = MVPControl; MVPControlEVP = 1
The current value of the MVPControl register is loaded into general register ry. The Enable Virtual Processors (EVP) bit in the MVPControl register is then set, enabling concurrent execution of instruction streams on all non-inhibited Virtual Processing Elements (VPEs) on a processor.
Unpredictable prior to MIPS16e2. If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled. If the VPE executing the instruction is not a Master VPE (that is, the MVP bit of the VPEConf0 register is not set), the EVP bit is unchanged by the instruction.
In implementations that do not implement the MT Module, this instruction results in a Reserved Instruction Exception.
Operation - EVPE:
The following operation pertains to the EVPE instruction.
if (VPEConf0MVP = 1) then
    MVPControlEVP = 1
endif
Operation - EVPE ry:
The following operation pertains to the EVPE ry instruction.
data = MVPControl
GPR[XLat[ry]] = data
if (VPEConf0MVP = 1) then
    MVPControlEVP = 1
endif
Coprocessor Unusable
Reserved Instruction (Implementations that do not include the MT Module)
The effects of this instruction are identical to those accomplished by the sequence of reading MVPControl into a GPR, setting the EVP bit to create a temporary value in a second GPR, and writing that value back to MVPControl. Unlike the multiple instruction sequence, however, the EVPE instruction does not consume a temporary register, and cannot be aborted by an interrupt or exception, nor by the scheduling of a different instruction stream.
If an EVPE instruction is followed in the same instruction stream by an MFC0 or MFTR from the MVPControl register, a JALR.HB, JR.HB, EHB, or ERET instruction must be issued between the EVPE and the read of MVPControl to guarantee that the new state of EVP will be accessed by the read.
EXTEND 11110 | LSB (pos) | 1 | MSBD (size-1) | SHIFT 00110 | rx | ry | sel = 2 | SLL 00
5 | 5 | 1 | 5 | 5 | 3 | 3 | 3 | 2
EXT ry, rx, pos, size | MIPS16e2
Extract Bit Field Extended
To extract a bit field from GPR rx and store it right-justified into GPR ry.
GPR[ry] = ExtractField(GPR[rx], msbd, lsb)
The bit field starting at bit pos and extending for size bits is extracted from GPR rx and stored zero-extended and right-justified in GPR ry.
In implementations prior to MIPS16e2, this instruction yields UNPREDICTABLE results; it would typically be executed as an SLL instruction. The operation is UNPREDICTABLE if lsb + msbd > 31.
if (lsb + msbd) > 31 then
    UNPREDICTABLE
endif
temp = 032-(msbd+1) || GPR[XLat[rx]]msbd+lsb..lsb
GPR[XLat[ry]] = temp
None
EXTEND 11110 | LSB (pos) | 0 | MSB (pos+size-1) | SHIFT 00110 | 000 | ry | sel = 1 | SLL 00
5 | 5 | 1 | 5 | 5 | 3 | 3 | 3 | 2
INS ry, $0, pos, size | MIPS16e2
Insert Bit Field 0 Extended
To merge zero bits into a specified field of GPR ry.
GPR[ry] = InsertField(GPR[ry], msb, lsb)
A field of size bits of zero value is merged into the value from GPR ry starting at bit position pos. The result is placed back in GPR ry.
In implementations prior to MIPS16e2, this instruction yields UNPREDICTABLE results; it would typically be executed as an SLL instruction. The operation is UNPREDICTABLE if lsb > msb.
if lsb > msb then
    UNPREDICTABLE
endif
GPR[XLat[ry]] = GPR[XLat[ry]]31..msb+1 || 0msb-lsb+1 || GPR[XLat[ry]]lsb-1..0
None
EXTEND 11110 | LSB (pos) | 1 | MSB (pos+size-1) | SHIFT 00110 | rx | ry | sel = 1 | SLL 00
5 | 5 | 1 | 5 | 5 | 3 | 3 | 3 | 2
INS ry, rx, pos, size | MIPS16e2
Insert Bit Field Extended
To merge a right-justified bit field from GPR rx into a specified field of GPR ry.
GPR[ry] = InsertField(GPR[ry], GPR[rx], msbd, lsb)
The right-most size bits from GPR rx are merged into the value from GPR ry starting at bit position pos. The result is placed back in GPR ry.
In implementations prior to MIPS16e2, this instruction yields UNPREDICTABLE results; it would typically be executed as an SLL instruction. The operation is UNPREDICTABLE if lsb > msb.
if lsb > msb then
    UNPREDICTABLE
endif
GPR[XLat[ry]] = GPR[XLat[ry]]31..msb+1 || GPR[XLat[rx]]msb-lsb..0 || GPR[XLat[ry]]lsb-1..0
None
EXTEND 11110 | Imm[10:5] | Imm[15:11] | LWSP 10010 | rx | sel = 3 | Imm[4:0]
5 | 6 | 5 | 5 | 3 | 3 | 5
LB rx, immediate(gp) | MIPS16e2
Load Byte (GP-relative) Extended
To load a byte from memory as a signed value.
GPR[rx] = memory[GPR[gp] + immediate]
The 16-bit immediate value is sign-extended, then added to the contents of GPR 28 to form the effective address. The contents of the byte at the memory location specified by the effective address are sign-extended and loaded into GPR rx.
Unpredictable prior to MIPS16e2.
vAddr = sign_extend(immediate) + GPR[28]
(pAddr, CCA) = AddressTranslation (vAddr, DATA, LOAD)
pAddr = pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2)
memword = LoadMemory (CCA, BYTE, pAddr, vAddr, DATA)
byte = vAddr1..0 xor BigEndianCPU2
GPR[XLat[rx]] = sign_extend(memword7+8*byte..8*byte)
TLB Refill, TLB Invalid, Bus Error, Address Error
EXTEND 11110 | Imm[10:5] | Imm[15:11] | LWSP 10010 | rx | sel = 5 | Imm[4:0]
5 | 6 | 5 | 5 | 3 | 3 | 5
LBU rx, immediate(gp) | MIPS16e2
Load Byte Unsigned (GP-relative) Extended
To load a byte from memory as an unsigned value
GPR[rx] = memory[GPR[gp] + immediate]
The 16-bit immediate value is sign-extended, then added to the contents of GPR 28 to form the effective address. The contents of the byte at the memory location specified by the effective address are zero-extended and loaded into GPR
rx.
Unpredictable prior to MIPS16e2.
vAddr = sign_extend(immediate) + GPR[28] (pAddr, CCA) = AddressTranslation (vAddr, DATA, LOAD) pAddr = pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) memword = LoadMemory (CCA, BYTE, pAddr, vAddr, DATA) byte = vAddr1..0 xor BigEndianCPU2 GPR[Xlat(rx)] = zero_extend(memword7+8*byte..8*byte)
TLB Refill, TLB Invalid, Bus Error, Address Error
EXTEND 11110 |
Imm[10:5] |
Imm[15:11] |
LWSP 10010 |
rx |
sel = 2 |
Imm[4:0] |
5 |
5 |
5 |
5 |
5 |
3 |
3 |
LH rx, immediate(gp) |
MIPS16e2 |
Load Halfword (GP-relative) Extended |
Load Halfword (GP-relative) Extended
To load a halfword from memory as a signed value.
GPR[rx] = memory[GPR[gp] + immediate]
The 16-bit immediate value is sign-extended, then added to the contents of GPR 28 to form the effective address. The contents of the halfword at the memory location specified by the effective address are sign-extended and loaded into
GPR rx.
Unpredictable prior to MIPS16e2. The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
vAddr = sign_extend(immediate) + GPR[28] if vAddr0 != 0 then SignalException(AddressError) endif (pAddr, CCA) = AddressTranslation (vAddr, DATA, LOAD) pAddr = pAddrPSIZE-1..2 || (pAddr1..0 xor (ReverseEndian || 0)) memword = LoadMemory (CCA, HALFWORD, pAddr, vAddr, DATA) byte = vAddr1..0 xor (BigEndianCPU || 0) GPR[Xlat(rx)] = sign_extend(memword15+8*byte..8*byte)
TLB Refill, TLB Invalid, Bus Error, Address Error
EXTEND 11110 |
Imm[10:5] |
Imm[15:11] |
LWSP 10010 |
rx |
sel = 4 |
Imm[4:0] |
5 |
5 |
5 |
5 |
5 |
3 |
3 |
LHU rx, immediate(gp) |
MIPS16e2 |
Load Halfword Unsigned Extended |
Load Halfword Unsigned Extended
To load a halfword from memory as an unsigned value.
GPR[rx] = memory[GPR[gp] + immediate]
The 16-bit immediate value is sign-extended, then added to the contents of GPR 28 to form the effective address. The contents of the halfword at the memory location specified by the effective address are zero-extended and loaded into
GPR rx.
Unpredictable prior to MIPS16e2. The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
vAddr = sign_extend(immediate) + GPR[28] if vAddr0 != 0 then SignalException(AddressError) endif (pAddr, CCA) = AddressTranslation (vAddr, DATA, LOAD) pAddr = pAddrPSIZE-1..2 || (pAddr1..0 xor (ReverseEndian || 0)) memword = LoadMemory (CCA, HALFWORD, pAddr, vAddr, DATA) byte = vAddr1..0 xor (BigEndianCPU || 0) GPR[Xlat(rx)] = zero_extend(memword15+8*byte..8*byte)
TLB Refill, TLB Invalid, Bus Error, Address Error
EXTEND 11110 |
00 |
Imm[8:5] |
00 |
rb |
LWSP 10010 |
rx |
sel = 6 |
Imm[4:0] |
5 |
5 |
4 |
2 |
3 |
5 |
3 |
3 |
5 |
LL rx, immediate(rb) |
MIPS16e2 |
Load Linked Word Immediate |
Load Linked Word Immediate
To load a word from memory for an atomic read-modify-write.
GPR[rx] = memory[GPR[rb] + immediate]
The LL and SC instructions provide the primitives to implement atomic read-modify-write (RMW) operations for synchronizable memory locations.
The 9-bit signed immediate value is added to the contents of GPR rb to form an effective address. The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and written into GPR rx.
This begins a RMW sequence on the current processor. There can be only one active RMW sequence per processor.
When an LL is executed it starts an active RMW sequence replacing any other sequence that was active. The RMW sequence is completed by a subsequent SC instruction that either completes the RMW sequence atomically and succeeds, or does not and fails.
Executing LL on one processor does not cause an action that, by itself, causes an SC for the same block to fail on another processor.
An execution of LL does not have to be followed by execution of SC; a program is free to abandon the RMW sequence without attempting a write.
Unpredictable prior to MIPS16e2. The addressed location must be synchronizable by all processors and I/O devices sharing the location; if it is not, the result is UNPREDICTABLE. Which storage is synchronizable is a function of both CPU and system implementations. See the documentation of the SC instruction for the formal definition.
The effective address must be naturally-aligned. If either of the 2 least-significant bits of the effective address is nonzero, an Address Error exception occurs.
vAddr = sign_extend(immediate) + GPR[XLat[rb]] if vAddr1..0 != 02 then SignalException(AddressError) endif (pAddr, CCA) = AddressTranslation (vAddr, DATA, LOAD) memword = LoadMemory (CCA, WORD, pAddr, vAddr, DATA) GPR[XLat[rx]]= memword LLbit = 1
TLB Refill, TLB Invalid, Address Error, Watch
MIPS16e2 implements a 9-bit immediate value as the offset.
EXTEND 11110 |
Imm[10:5] |
Imm[15:11] |
LI 01101 |
rx |
sel = 1 |
Imm[4:0] |
5 |
5 |
5 |
5 |
5 |
3 |
3 |
LUI rx, immediate |
MIPS16e2 |
Load Upper Immediate Extended |
Load Upper Immediate Extended
To load a constant into the upper half of a word.
GPR[rx] = immediate || 016
The 16-bit immediate is shifted left 16 bits and concatenated with 16 bits of low-order zeros. The 32-bit result is placed into GPR rx.
Unpredictable prior to MIPS16e2.
GPR[XLat[rx]] = immediate || 016
None
EXTEND 11110 |
00 |
Imm[8:5] |
00 |
rb |
LWSP 10010 |
rx |
sel = 7 |
Imm[4:0] |
5 |
5 |
4 |
2 |
3 |
5 |
3 |
3 |
5 |
LWL rx, immediate(rb) |
MIPS16e2 |
Load Word Left Extended |
Load Word Left Extended
To load the most-significant part of a word as a signed value from an unaligned memory address
GPR[rx] = GPR[rx] MERGE memory[GPR[rb] + immediate]
The 9-bit signed immediate value is added to the contents of GPR rb to form an effective address (EffAddr). The most-significant bytes of the word containing EffAddr are loaded and merged into the most-significant part of GPR rx; the remaining bytes of GPR rx are unchanged.
Unpredictable prior to MIPS16e2.
vAddr = sign_extend(immediate) + GPR[XLat[rb]] (pAddr, CCA) = AddressTranslation (vAddr, DATA, LOAD) pAddr = pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) if BigEndianMem = 0 then pAddr = pAddrPSIZE-1..2 || 02 endif byte = vAddr1..0 xor BigEndianCPU2 memword = LoadMemory (CCA, byte, pAddr, vAddr, DATA) temp = memword7+8*byte..0 || GPR[XLat[rx]]23-8*byte..0 GPR[XLat[rx]] = temp
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch
The architecture provides no direct support for treating unaligned words as unsigned values, that is, zeroing bits
63..32 of the destination register when bit 31 is loaded.
EXTEND 11110 |
00 |
Imm[8:5] |
10 |
rb |
LWSP 10010 |
rx |
sel = 7 |
Imm[4:0] |
5 |
5 |
4 |
2 |
3 |
5 |
3 |
3 |
5 |
LWR rx, immediate(rb) |
MIPS16e2 |
Load Word Right Extended |
Load Word Right Extended
To load the least-significant part of a word as a signed value from an unaligned memory address
GPR[rx] = GPR[rx] MERGE memory[GPR[rb] + immediate]
The 9-bit signed immediate value is added to the contents of GPR rb to form an effective address (EffAddr). The least-significant bytes of the word containing EffAddr are loaded and merged into the least-significant part of GPR rx; the remaining bytes of GPR rx are unchanged.
Unpredictable prior to MIPS16e2.
vAddr = sign_extend(immediate) + GPR[XLat[rb]] (pAddr, CCA) = AddressTranslation (vAddr, DATA, LOAD) pAddr = pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian2) if BigEndianMem = 0 then pAddr = pAddrPSIZE-1..2 || 02 endif byte = vAddr1..0 xor BigEndianCPU2 memword = LoadMemory (CCA, byte, pAddr, vAddr, DATA) temp = memword31..32-8*byte || GPR[XLat[rx]]31-8*byte..0 GPR[XLat[rx]] = temp
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch
The architecture provides no direct support for treating unaligned words as unsigned values, that is, zeroing bits
63..32 of the destination register when bit 31 is loaded.
In the MIPS I architecture, the LWL and LWR instructions were exceptions to the load-delay scheduling restriction.
A LWL or LWR instruction which was immediately followed by another LWL or LWR instruction, and used the same destination register would correctly merge the 1 to 4 loaded bytes with the data loaded by the previous instruction. All such restrictions were removed from the architecture in MIPS II.
EXTEND 11110 |
Imm[10:5] |
Imm[15:11] |
LWSP 10010 |
rx |
sel = 1 |
Imm[4:0] |
5 |
5 |
5 |
5 |
5 |
3 |
3 |
LW rx, immediate(gp) |
MIPS16e2 |
Load Word (GP-Relative, Extended) |
Load Word (GP-Relative, Extended)
To load a GP-relative word from memory as a signed value.
GPR[rx] = memory[GPR[gp] + immediate]
The 16-bit immediate value is sign-extended, then added to the contents of GPR 28 to form the effective address. The contents of the word at the memory location specified by the effective address are loaded into GPR rx.
Unpredictable prior to MIPS16e2. The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
vAddr = sign_extend(immediate) + GPR[28] if vAddr1..0 != 02 then SignalException(AddressError) endif (pAddr, CCA) = AddressTranslation (vAddr, DATA, LOAD) memword = LoadMemory (CCA, WORD, pAddr, vAddr, DATA) GPR[Xlat(rx)] = memword
TLB Refill, TLB Invalid, Bus Error, Address Error
EXTEND 11110 |
CP0 000 |
sel[2:0] |
MFC0 00000 |
I8 01100 |
MOVR32 111 |
ry |
r32 |
5 |
3 |
3 |
5 |
5 |
3 |
3 |
5 |
MFC0 ry, r32, sel |
MIPS16e2 |
Move from Coprocessor 0 Extended |
Move from Coprocessor 0 Extended
To move the contents of a coprocessor 0 register to a general register.
GPR[ry] = CPR[0,r32,sel]
The contents of the coprocessor 0 register specified by the combination of r32 and sel are loaded into general register
ry. Not all coprocessor 0 registers support the sel field. In those instances, the sel field must be zero.
The results are UNDEFINED if coprocessor 0 does not contain a register as specified by r32 and sel.
reg = r32 if IsCoprocessorRegisterImplemented(0, reg, sel) then data = CPR[0, reg, sel] GPR[XLat[ry]] = data else if ArchitectureRevision() >= 6 then GPR[XLat[ry]] = 0 else UNDEFINED endif endif
Coprocessor Unusable, Reserved Instruction
EXTEND 11110 |
00000 |
0 |
00 |
000 |
SHIFT 00110 |
rx |
ry |
sel = 1 |
SRL 10 |
5 |
5 |
1 |
2 |
3 |
5 |
3 |
3 |
3 |
2 |
MOVN rx, $0, ry |
MIPS16e2 |
Move Conditional on Not Equal to Zero Extended |
Move Zero Conditional on Not Equal to Zero Extended
To conditionally zero a GPR after testing a GPR value.
if GPR[ry] != 0 then GPR[rx] = 0
If the value in GPR ry is not equal to zero, GPR rx is written with the value of 0.
In implementations prior to MIPS16e2, this instruction yielded unpredictable results. It would typically be executed as an SRL instruction.
if GPR[XLat[ry]] != 0 then GPR[XLat[rx]] = 0 endif
None
The non-zero value tested might be the condition true result from the SLT, SLTI, SLTU, and SLTIU comparison instructions or a boolean value read from memory.
EXTEND 11110 |
00000 |
1 |
00 |
rb |
SHIFT 00110 |
rx |
ry |
sel = 2 |
SRL 10 |
5 |
5 |
1 |
2 |
3 |
5 |
3 |
3 |
3 |
2 |
MOVN rx, rb, ry |
MIPS16e2 |
Move Conditional on Not Equal to Zero Extended |
Move Conditional on Not Equal to Zero Extended
To conditionally move a GPR after testing a GPR value.
if GPR[ry] != 0 then GPR[rx] = GPR[rb]
If the value in GPR ry is not equal to zero, then the contents of GPR rb are placed into GPR rx.
In implementations prior to MIPS16e2, this instruction yielded unpredictable results. It would typically be executed as an SRL instruction.
if GPR[XLat[ry]] != 0 then GPR[XLat[rx]] = GPR[XLat[rb]] endif
None
The non-zero value tested might be the condition true result from the SLT, SLTI, SLTU, and SLTIU comparison instructions or a boolean value read from memory.
EXTEND 11110 |
00000 |
0 |
00 |
000 |
SHIFT 00110 |
rx |
0 |
sel = 6 |
SRL 10 |
5 |
5 |
1 |
2 |
3 |
5 |
3 |
3 |
3 |
2 |
MOVTN rx, $0 |
MIPS16e2 |
Move Conditional on T Not Equal to Zero Extended |
Move Zero Conditional on T Not Equal to Zero Extended
To test special register T and then conditionally zero a GPR.
If T != 0, then GPR[rx] = 0
If the value in GPR[24] is not equal to 0, GPR rx is written with the value 0.
In implementations prior to MIPS16e2, this instruction yielded unpredictable results. It would typically be executed as an SRL instruction.
if GPR[24] != 0 then GPR[XLat[rx]] = 0 endif
None
The non-zero value tested might be the condition true result from the CMP or CMPI comparison instructions or a boolean value read from memory.
EXTEND 11110 |
00000 |
1 |
00 |
rb |
SHIFT 00110 |
rx |
0 |
sel = 6 |
SRL 10 |
5 |
5 |
1 |
2 |
3 |
5 |
3 |
3 |
3 |
2 |
MOVTN rx, rb |
MIPS16e2 |
Move Conditional on T Not Equal to Zero Extended |
Move Conditional on T Not Equal to Zero Extended
To test special register T and then conditionally move a GPR.
If T != 0, then GPR[rx] = GPR[rb]
If the value in GPR[24] is not equal to 0, the contents of GPR rb are placed into GPR rx.
In implementations prior to MIPS16e2, this instruction yielded unpredictable results. It would typically be executed as an SRL instruction.
if GPR[24] != 0 then GPR[XLat[rx]] = GPR[XLat[rb]] endif
None
The non-zero value tested might be the condition true result from the CMP or CMPI comparison instructions or a boolean value read from memory.
EXTEND 11110 |
00000 |
0 |
00 |
000 |
SHIFT 00110 |
rx |
0 |
sel = 5 |
SRL 10 |
5 |
5 |
1 |
2 |
3 |
5 |
3 |
3 |
3 |
2 |
MOVTZ rx, $0 |
MIPS16e2 |
Move Conditional on T Equal to Zero Extended |
Move Zero Conditional on T Equal to Zero Extended
To test special register T and then conditionally zero a GPR.
If T = 0, then GPR[rx] = 0
If the value in GPR[24] is equal to 0, GPR rx is written with the value 0.
In implementations prior to MIPS16e2, this instruction yielded unpredictable results. It would typically be executed as an SRL instruction.
if GPR[24] = 0 then GPR[XLat[rx]] = 0 endif
None
The zero value tested might be the condition false result from the CMP or CMPI comparison instructions or a boolean value read from memory.
EXTEND 11110 |
00000 |
1 |
00 |
rb |
SHIFT 00110 |
rx |
0 |
sel = 5 |
SRL 10 |
5 |
5 |
1 |
2 |
3 |
5 |
3 |
3 |
3 |
2 |
MOVTZ rx, rb |
MIPS16e2 |
Move Conditional on T Equal to Zero Extended |
Move Conditional on T Equal to Zero Extended
To test special register T and then conditionally move a GPR.
If T = 0, then GPR[rx] = GPR[rb]
If the value in GPR[24] is equal to 0, the contents of GPR rb are placed into GPR rx.
In implementations prior to MIPS16e2, this instruction yielded unpredictable results. It would typically be executed as an SRL instruction.
if GPR[24] = 0 then GPR[XLat[rx]] = GPR[XLat[rb]] endif
None
The zero value tested might be the condition false result from the CMP or CMPI comparison instructions or a boolean value read from memory.
EXTEND 11110 |
00000 |
0 |
00 |
000 |
SHIFT 00110 |
rx |
ry |
sel = 1 |
SRL 10 |
5 |
5 |
1 |
2 |
3 |
5 |
3 |
3 |
3 |
2 |
MOVZ rx, $0, ry |
MIPS16e2 |
Move Conditional on Equal to Zero Extended |
Move Zero Conditional on Equal to Zero Extended
To conditionally zero a GPR after testing a GPR value.
if GPR[ry] = 0 then GPR[rx] = 0
If the value in GPR ry is equal to zero, then GPR rx is written with the value of 0.
In implementations prior to MIPS16e2, this instruction yielded unpredictable results. It would typically be executed as an SRL instruction.
if GPR[XLat[ry]] = 0 then GPR[XLat[rx]] = 0 endif
None
The zero value tested might be the condition false result from the SLT, SLTI, SLTU, and SLTIU comparison instructions or a boolean value read from memory.
EXTEND 11110 |
00000 |
1 |
00 |
rb |
SHIFT 00110 |
rx |
ry |
sel = 1 |
SRL 10 |
5 |
5 |
1 |
2 |
3 |
5 |
3 |
3 |
3 |
2 |
MOVZ rx, rb, ry |
MIPS16e2 |
Move Conditional on Equal to Zero Extended |
Move Conditional on Equal to Zero Extended
To conditionally move a GPR after testing a GPR value.
if GPR[ry] = 0 then GPR[rx] = GPR[rb]
If the value in GPR ry is equal to zero, then the contents of GPR rb are placed into GPR rx.
In implementations prior to MIPS16e2, this instruction yielded unpredictable results. It would typically be executed as an SRL instruction.
if GPR[XLat[ry]] = 0 then GPR[XLat[rx]] = GPR[XLat[rb]] endif
None
The zero value tested might be the condition false result from the SLT, SLTI, SLTU, and SLTIU comparison instructions or a boolean value read from memory.
EXTEND 11110 |
CP0 000 |
sel[2:0] |
MTC0 00001 |
I8 01100 |
MOVR32 111 |
ry |
r32 |
5 |
3 |
3 |
5 |
5 |
3 |
3 |
5 |
MTC0 ry, r32, sel |
MIPS16e2 |
Move to Coprocessor 0 Extended |
Move to Coprocessor 0 Extended
To move the contents of a general register to a coprocessor 0 register.
CPR[0, r32, sel] = GPR[ry]
The contents of general register ry are loaded into the coprocessor 0 register specified by the combination of r32 and
sel. Not all coprocessor 0 registers support the sel field. In those instances, the sel field must be set to zero.
Unpredictable prior to MIPS16e2. The results are UNDEFINED if coprocessor 0 does not contain a register as specified by r32 and sel.
data = GPR[XLat[ry]]
reg = r32
if IsCoprocessorRegisterImplemented (0, reg, sel) then
    CPR[0,reg,sel] = data
    if (Config5MVH = 1) then
        // The most-significant bit may vary by register. Only supported
        // bits should be written 0. Extended LLAddr is not written with 0s,
        // as it is a read-only register. BadVAddr is not written with 0s,
        // as it is read-only.
        if (Config3LPA = 1) then
            if (reg,sel = EntryLo0 or EntryLo1) then CPR[0,reg,sel]63:32 = 032 endif
            if (reg,sel = MAAR) then CPR[0,reg,sel]63:32 = 032 endif
            // TagLo is zeroed only if the implementation-dependent bits
            // are writeable
            if (reg,sel = TagLo) then CPR[0,reg,sel]63:32 = 032 endif
            if (Config3VZ = 1) then
                if (reg,sel = EntryHi) then CPR[0,reg,sel]63:32 = 032 endif
            endif
        endif
    endif
endif
Coprocessor Unusable
Reserved Instruction
EXTEND 11110 |
Imm[10:5] |
Imm[15:11] |
LI 01101 |
rx |
sel = 2 |
Imm[4:0] |
5 |
5 |
5 |
5 |
5 |
3 |
3 |
ORI rx, immediate |
MIPS16e2 |
Or Immediate Extended |
Or Immediate Extended
To do a bitwise logical OR with a constant.
GPR[rx] = GPR[rx] OR immediate
The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rx in a bitwise logical OR operation. The result is placed back into GPR rx.
Unpredictable prior to MIPS16e2.
GPR[XLat[rx]] = GPR[Xlat[rx]] or zero_extend(immediate)
None
EXTEND 11110 |
00101 |
0 |
00000 |
SHIFT 00110 |
000 |
000 |
sel = 6 |
SLL 00 |
5 |
5 |
1 |
5 |
5 |
3 |
3 |
3 |
5 |
PAUSE |
MIPS16e2 |
Wait for the LLBit to Clear Extended |
Wait for the LLBit to Clear Extended
Locks implemented using the LL/SC instructions are a common method of synchronization between threads of control. A lock implementation does a load-linked instruction and checks the value returned to determine whether the software lock is set. If it is, the code branches back to retry the load-linked instruction, implementing an active busy-wait sequence. The PAUSE instruction is intended to be placed into the busy-wait sequence to block the instruction stream until such time as the load-linked instruction has a chance to succeed in obtaining the software lock.
The PAUSE instruction is implementation-dependent, but it usually involves descheduling the instruction stream until the LLBit is zero.
In a single-threaded processor, this may be implemented as a short-term WAIT operation which resumes at the next instruction when the LLBit is zero or on some other external event such as an interrupt.
On a multi-threaded processor, this may be implemented as a short term YIELD operation which resumes at the next instruction when the LLBit is zero.
In either case, it is assumed that the instruction stream which gives up the software lock does so via a write to the lock variable, which causes the processor to clear the LLBit as seen by this thread of execution.
Unpredictable prior to MIPS16e2. The operation of the processor is UNPREDICTABLE if a PAUSE instruction is placed in the delay slot of a branch or jump instruction.
if LLBit != 0 then EPC = PC + 4 /* Resume at the following instruction */ DescheduleInstructionStream() endif
None
The PAUSE instruction is intended to be inserted into the instruction stream after an LL instruction has set the LLBit and found the software lock set. The program may wait forever if a PAUSE instruction is executed and there is no possibility that the LLBit will ever be cleared.
An example use of the PAUSE instruction is included in the following example:
acquire_lock:
    ll    v0, 0(a0)              /* Read software lock, set hardware lock */
    bnez  v0, acquire_lock_retry /* Branch if software lock is taken */
    addiu v0, v0, 1              /* Set the software lock */
    sc    v0, 0(a0)              /* Try to store the software lock */
    bnez  v0, 10f                /* Branch if lock acquired successfully */
    sync
acquire_lock_retry:
    pause                        /* Wait for LLBIT to clear before retry */
    b     acquire_lock           /* and retry the operation */
10:
    Critical region code
release_lock:
    sync
    li    t1, 0                  /* Release software lock, clearing LLBIT */
    sw    t1, 0(a0)              /* for any PAUSEd waiters */
EXTEND 11110 |
00 |
Imm[8:5] |
hint[4:0] |
SWSP 11010 |
rx |
sel = 4 |
Imm[4:0] |
5 |
5 |
0 |
5 |
5 |
3 |
3 |
5 |
PREF hint,immediate(rx) |
MIPS16e2 |
Prefetch Extended |
Prefetch Extended
To move data between memory and cache.
prefetch_memory(GPR[rx] + immediate)
PREF adds the signed immediate to the contents of GPR rx to form an effective byte address. The hint field supplies information about the way that the data is expected to be used.
PREF enables the processor to take some action, typically causing data to be moved to or from the cache, to improve program performance. The action taken for a specific PREF instruction is both system and context dependent. Any action, including doing nothing, is permitted as long as it does not change architecturally visible state or alter the meaning of a program. Implementations are expected either to do nothing, or to take an action that increases the performance of the program. The PrepareForStore function is unique in that it may modify the architecturally visible state.
PREF does not cause addressing-related exceptions, including TLB exceptions. If the address specified would cause an addressing exception, the exception condition is ignored and no data movement occurs. However, even if no data is moved, some action that is not architecturally visible, such as write-back of a dirty cache line, can take place.
It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREF instruction.
PREF neither generates a memory operation nor modifies the state of a cache line for a location with an uncached memory access type, whether this type is specified by the address segment (e.g., kseg1), the programmed cacheability and coherency attribute of a segment (e.g., the use of the K0, KU, or K23 fields in the Config register), or the per-page cacheability and coherency attribute provided by the TLB.
If PREF results in a memory operation, the memory access type and cacheability and coherency attribute used for the operation are determined by the memory access type and cacheability and coherency attribute of the effective address, just as they would be if the memory operation had been caused by a load or store to the effective address.
For a cached location, the expected and useful action for the processor is to prefetch a block of data that includes the effective address. The size of the block and the level of the memory hierarchy it is fetched into are implementation specific.
In coherent multiprocessor implementations, if the effective address uses a coherent Cacheability and Coherency
Attribute (CCA), then the instruction causes a coherent memory transaction to occur. This means a prefetch issued on one processor can cause data to be evicted from the cache in another processor.
The PREF instruction and the memory transactions which are sourced by the PREF instruction, such as cache refill or cache writeback, obey the ordering and completion rules of the SYNC instruction.
Values of hint Field for PREF Instruction
Value 0 |
Name load |
Data Use and Desired Prefetch Action Use: Prefetched data is expected to be read (not modified). Action: Fetch data as if for a load. |
1 |
store |
Use: Prefetched data is expected to be stored or modified. Action: Fetch data as if for a store. |
2 |
L1 LRU hint |
Pre-Release 6: Reserved for Architecture. Release 6: Implementation-dependent. This hint code marks the line as LRU in the L1 cache and thus preferred for next eviction. Implementations can choose to writeback and/or invalidate as long as no architectural state is modified. |
3 |
Reserved |
Pre-Release 6: Reserved for Architecture. Release 6: Available for implementation-dependent use. |
4 |
load_streamed |
Use: Prefetched data is expected to be read (not modified) but not reused extensively; it "streams" through cache. Action: Fetch data as if for a load and place it in the cache so that it does not displace data prefetched as "retained." |
5 |
store_streamed |
Use: Prefetched data is expected to be stored or modified but not reused extensively; it "streams" through cache. Action: Fetch data as if for a store and place it in the cache so that it does not displace data prefetched as "retained." |
6 |
load_retained |
Use: Prefetched data is expected to be read (not modified) and reused extensively; it should be "retained" in the cache. Action: Fetch data as if for a load and place it in the cache so that it is not displaced by data prefetched as "streamed." |
7 |
store_retained |
Use: Prefetched data is expected to be stored or modified and reused extensively; it should be "retained" in the cache. Action: Fetch data as if for a store and place it in the cache so that it is not displaced by data prefetched as "streamed." |
8-15 |
L2 operation |
Pre-Release 6: Reserved for Architecture. In the Release 6 architecture, hint codes 8 - 15 are treated the same as hint codes 0 - 7 respectively, but operate on the L2 cache. |
16-23 |
L3 operation |
Pre-Release 6: Reserved for Architecture. In the Release 6 architecture, hint codes 16 - 23 are treated the same as hint codes 0 - 7 respectively, but operate on the L3 cache. |
24 |
Reserved |
Pre-Release 6: Unassigned by the Architecture - available for implementation-dependent use. Release 6: This hint code is not implemented in the Release 6 architecture and generates a Reserved Instruction exception (RI). |
25 |
writeback_invalidate (also known as "nudge") |
Pre-Release 6: Use: Data is no longer expected to be used. Action: For a writeback cache, schedule a writeback of any dirty data. At the completion of the writeback, mark the state of any cache lines written back as invalid. If the cache line is not dirty, it is implementation dependent whether the state of the cache line is marked invalid or left unchanged. If the cache line is locked, no action is taken. Release 6: This hint code is not implemented in the Release 6 architecture and generates a Reserved Instruction exception (RI). |
26-29 |
Reserved |
Pre-Release 6: Unassigned by the Architecture-available for implementation-dependent use. Release 6: These hints are not implemented in the Release 6 architecture and generate a Reserved Instruction exception (RI). |
30 |
PrepareForStore |
Pre-Release 6: Use: Prepare the cache for writing an entire line, without the overhead involved in filling the line from memory. Action: If the reference hits in the cache, no action is taken. If the reference misses in the cache, a line is selected for replacement, any valid and dirty victim is written back to memory, the entire line is filled with zero data, and the state of the line is marked as valid and dirty. Programming Note: Because the cache line is filled with zero data on a cache miss, software must not assume that this action, in and of itself, can be used as a fast bzero-type function. Release 6: This hint is not implemented in the Release 6 architecture and generates a Reserved Instruction exception (RI). |
31 |
Reserved |
Pre-Release 6: Unassigned by the Architecture-available for implementation-dependent use. Release 6: This hint is not implemented in the Release 6 architecture and generates a Reserved Instruction exception (RI). |
Unpredictable prior to MIPS16e2.
vAddr = GPR[Xlat[rx]] + sign_extend(immediate) (pAddr, CCA) = AddressTranslation(vAddr, DATA, LOAD) Prefetch(CCA, pAddr, vAddr, DATA, hint)
Bus Error, Cache Error
Prefetch does not take any TLB-related or address-related exceptions under any circumstances.
Prefetch cannot move data to or from a mapped location unless the translation for that location is present in the TLB.
Locations in memory pages that have not been accessed recently may not have translations in the TLB, so prefetch may not be effective for such locations.
Prefetch does not cause addressing exceptions. A prefetch may be issued using an address pointer before the validity of the pointer is determined, without worrying about an addressing exception.
It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREF instruction. Typically, this only occurs in systems which have high-reliability requirements.
Prefetch operations have no effect on cache lines that were previously locked with the CACHE instruction.
Hint field encodings whose function is described as "streamed" or "retained" convey usage intent from software to hardware. Software should not assume that hardware will always prefetch data in an optimal way. If data is to be truly retained, software should use the CACHE instruction to lock data into the cache.
EXTEND 11110 |
00000 |
0 |
HWR |
SHIFT 00110 |
000 |
ry |
sel = 3 |
SLL 00 |
5 |
5 |
1 |
5 |
5 |
3 |
3 |
3 |
2 |
RDHWR ry,HWR |
MIPS16e2 |
Read Hardware Register Extended |
Read Hardware Register Extended
To move the contents of a hardware register to a general purpose register (GPR) if that operation is enabled by privileged software.
The purpose of this instruction is to give user mode access to specific information that is otherwise only visible in kernel mode.
GPR[ry] = HWR[HWR]
If access is allowed to the specified hardware register, the contents of the register specified by HWR are loaded into general register ry. Access control for each register is selected by the bits in the coprocessor 0 HWREna register.
The available hardware registers, and the encoding of the HWR field for each, are shown below.
RDHWR Register Numbers
Register Number (HWR Value) |
Mnemonic |
Description |
0 |
CPUNum |
Number of the CPU on which the program is currently running. This register provides read access to the coprocessor 0 EBaseCPUNum field. |
1 |
SYNCI_Step |
Address step size to be used with the SYNCI instruction, or zero if no caches need be synchronized. See that instruction's description for the use of this value. |
2 |
CC |
High-resolution cycle counter. This register provides read access to the coprocessor 0 Count Register. |
3 |
CCRes |
Resolution of the CC register. This value denotes the number of cycles between updates of the register. For example: a CCRes value of 1 means the CC register increments every CPU cycle; 2 means every second CPU cycle; 3 means every third CPU cycle; and so on. |
4 |
Rsv |
Reserved. |
5 |
XNP |
Indicates support for the Release 6 Double-Width LLX/SCX family of instructions. If set to 1, the LLX/SCX family of instructions is not present; otherwise, it is present in the implementation. In the absence of hardware support for double-width or extended atomics, user software may emulate the instruction's behavior through other means. See Config5XNP. |
6-28 |
These register numbers are reserved for future architecture use. Access results in a Reserved Instruction Exception. | |
29 |
ULR |
User Local Register. This register provides read access to the coprocessor 0 UserLocal register, if it is implemented. In some operating environments, the UserLocal register is a pointer to a thread-specific storage block. |
30-31 |
These register numbers are reserved for implementation-dependent use. If they are not implemented, access results in a Reserved Instruction Exception. |
Unpredictable prior to MIPS16e2. Access to the specified hardware register is enabled if Coprocessor 0 is enabled, or if the corresponding bit is set in the HWREna register. If access is not allowed or the register is not implemented, a
Reserved Instruction Exception is signaled.
case HWR
  0: temp = EBaseCPUNum
  1: temp = SYNCI_StepSize()
  2: temp = Count
  3: temp = CountResolution()
  5: temp = XNP
  29: temp = UserLocal
  30: temp = Implementation-Dependent-Value
  31: temp = Implementation-Dependent-Value
  otherwise: SignalException(ReservedInstruction)
endcase
GPR[Xlat[ry]] = temp
Reserved Instruction
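The access-control rule above (kernel-mode access always permitted, user-mode access gated per register by HWREna, reserved registers always faulting) can be sketched in C. The function name and argument layout are illustrative, not part of the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the RDHWR access check: user-mode access to
 * hardware register 'hwr' is permitted only when the corresponding bit
 * of the CP0 HWREna register is set; registers 6-28 are architecturally
 * reserved and always signal Reserved Instruction. */
bool rdhwr_access_allowed(unsigned hwr, uint32_t hwrena, bool cp0_enabled)
{
    if (hwr > 31)
        return false;               /* no such register number */
    if (hwr >= 6 && hwr <= 28)
        return false;               /* reserved: always faults */
    if (cp0_enabled)
        return true;                /* kernel-mode access always allowed */
    return (hwrena >> hwr) & 1;     /* user mode: gated by HWREna */
}
```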
EXTEND 11110 |
Imm[10:5] |
Imm[15:11] |
SWSP 11010 |
rx |
sel = 3 |
Imm[4:0] |
5 |
5 |
5 |
5 |
5 |
3 |
3 |
SB rx, immediate(gp) |
MIPS16e2 |
Store Byte (GP-relative) Extended |
Store Byte (GP-relative) Extended
To store a byte to memory.
memory[GPR[gp] + immediate] = GPR[rx]
The 16-bit immediate value is sign-extended, then added to the contents of GPR 28 to form the effective address. The least-significant byte of GPR rx is stored at the effective address.
Unpredictable prior to MIPS16e2.
vAddr = sign_extend(immediate) + GPR[28]
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
pAddr = pAddr[PSIZE-1..2] || (pAddr[1..0] xor ReverseEndian^2)
bytesel = vAddr[1..0] xor BigEndianCPU^2
dataword = GPR[Xlat[rx]][31-8*bytesel..0] || 0^(8*bytesel)
StoreMemory (CCA, BYTE, dataword, pAddr, vAddr, DATA)
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error
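The GP-relative effective-address calculation described above (a 16-bit immediate, sign-extended and added modulo 2^32 to GPR 28) can be modeled directly; the function name is illustrative only.

```c
#include <stdint.h>

/* Sketch of the GP-relative effective-address calculation: the 16-bit
 * immediate is sign-extended, then added to gp (GPR 28) with 32-bit
 * modulo arithmetic. */
uint32_t sb_gprel_effaddr(uint32_t gp, uint16_t imm16)
{
    int32_t simm = (int16_t)imm16;   /* sign-extend the immediate */
    return gp + (uint32_t)simm;      /* 32-bit modulo add */
}
```

A negative immediate such as 0x8000 therefore addresses below gp, which is how GP-relative data on both sides of the global pointer is reached.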
EXTEND 11110 |
00 |
Imm[8:5] |
00 |
rb |
SWSP 11010 |
rx |
sel = 6 |
Imm[4:0] |
5 |
5 |
4 |
2 |
3 |
5 |
3 |
3 |
5 |
SC rx, immediate(rb) |
MIPS16e2 |
Store Conditional Word Extended |
Store Conditional Word Extended
To store a word to memory to complete an atomic read-modify-write.
if atomic_update then memory[GPR[rb] + immediate] = GPR[rx], GPR[rx] = 1 else GPR[rx] = 0
The LL and SC instructions provide primitives to implement atomic read-modify-write (RMW) operations on synchronizable memory locations.
The 32-bit word in GPR rx is conditionally stored in memory at the location specified by the aligned effective address. The signed immediate value is added to the contents of GPR rb to form an effective address.
The SC completes the RMW sequence begun by the preceding LL instruction executed on the processor. To complete the RMW sequence atomically, the following occur:
The 32-bit word of GPR rx is stored to memory at the location specified by the aligned effective address.
A one, indicating success, is written into GPR rx.
Otherwise, memory is not modified and a 0, indicating failure, is written into GPR rx.
If either of the following events occurs between the execution of LL and SC, the SC fails:
A coherent store is executed between an LL and SC sequence on the same processor to the block of synchronizable physical memory containing the word (if Config5LLB=1; else whether such a store causes the SC to fail is not predictable).
An ERET instruction is executed.
Furthermore, an SC must always compare its address against that of the LL. An SC will fail if the aligned address of the SC does not match that of the preceding LL.
A load that executes on the processor executing the LL/SC sequence to the block of synchronizable physical memory containing the word, will not cause the SC to fail (if Config5LLB=1; else such a load may cause the SC to fail).
If any of the events listed below occurs between the execution of LL and SC, the SC may fail where it could have succeeded, i.e., success is not predictable. Portable programs should not cause any of these events.
A load or store executed on the processor executing the LL and SC that is not to the block of synchronizable physical memory containing the word. (The load or store may cause a cache eviction between the LL and SC that results in SC failure. The load or store does not necessarily have to occur between the LL and SC.)
Any prefetch that is executed on the processor executing the LL and SC sequence (due to a cache eviction between the LL and SC).
A non-coherent store executed between an LL and SC sequence to the block of synchronizable physical memory containing the word.
The instructions executed starting with the LL and ending with the SC do not lie in a 2048-byte contiguous region of virtual memory. (The region does not have to be aligned, other than the alignment required for instruction words.)
CACHE operations that are local to the processor executing the LL/SC sequence will result in unpredictable behavior of the SC if executed between the LL and SC, that is, they may cause the SC to fail where it could have succeeded. Non-local CACHE operations (address-type with coherent CCA) may cause an SC to fail on either the local processor or on the remote processor in multiprocessor or multi-threaded systems. This definition of the effects of
CACHE operations is mandated if Config5LLB=1. If Config5LLB=0, then CACHE effects are implementation-dependent.
The following conditions must be true, or the result of the SC is not predictable; the SC may fail or succeed (if
Config5LLB=1, then either success or failure is mandated; otherwise the result is UNPREDICTABLE):
Execution of SC must have been preceded by execution of an LL instruction.
An RMW sequence executed without intervening events that would cause the SC to fail must use the same address in the LL and SC. The address is the same if the virtual address, physical address, and cacheability & coherency attribute are identical.
Unpredictable prior to MIPS16e2. The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
vAddr = sign_extend(immediate) + GPR[Xlat[rb]]
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
dataword = GPR[Xlat[rx]]
if LLbit then
  StoreMemory (CCA, WORD, dataword, pAddr, vAddr, DATA)
endif
GPR[Xlat[rx]] = 0^31 || LLbit
LLbit = 0 // if Config5LLB=1, SC always clears LLbit regardless of address match.
TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
LL and SC are used to atomically update memory locations, as shown below.
L1: LL    a1, (a0)      # load counter
    ADDIU v0, a1, 1     # increment
    SC    v0, (a0)      # try to store, checking for atomicity
    BEQ   v0, 0, L1     # if not atomic (0), try again
    NOP                 # branch-delay slot
Exceptions between the LL and SC cause SC to fail, so persistent exceptions must be avoided. Some examples of these are arithmetic operations that trap, system calls, and floating point operations that trap or require software emulation assistance.
LL and SC function on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types.
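The LL/ADDIU/SC retry loop shown above maps naturally onto a C11 compare-and-swap loop. This is an analogy to illustrate the retry structure, not a rendering of the hardware sequence: the weak compare-exchange, like SC, may fail spuriously and is therefore retried.

```c
#include <stdatomic.h>

/* C11 analog of the LL/SC counter-increment loop: load the old value
 * (like LL), attempt the conditional update (like SC), and retry on
 * failure.  Returns the incremented value. */
unsigned atomic_increment(_Atomic unsigned *counter)
{
    unsigned old = atomic_load(counter);                   /* LL  */
    while (!atomic_compare_exchange_weak(counter, &old, old + 1))
        ;                                                  /* SC failed: retry */
    return old + 1;
}
```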
EXTEND 11110 |
Imm[10:5] |
Imm[15:11] |
SWSP 11010 |
rx |
sel = 2 |
Imm[4:0] |
5 |
5 |
5 |
5 |
5 |
3 |
3 |
SH rx, immediate(gp) |
MIPS16e2 |
Store Halfword (GP-relative) Extended |
Store Halfword (GP-relative) Extended
To store a halfword to memory.
memory[GPR[gp] + immediate] = GPR[rx]
The 16-bit immediate value is sign-extended, and then added to the contents of GPR 28 to form the effective address.
The least-significant halfword of GPR rx is stored at the effective address.
Unpredictable prior to MIPS16e2. The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
vAddr = sign_extend(immediate) + GPR[28]
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
pAddr = pAddr[PSIZE-1..2] || (pAddr[1..0] xor (ReverseEndian || 0))
bytesel = vAddr[1..0] xor (BigEndianCPU || 0)
dataword = GPR[Xlat[rx]][31-8*bytesel..0] || 0^(8*bytesel)
StoreMemory (CCA, HALFWORD, dataword, pAddr, vAddr, DATA)
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error.
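The byte-lane selection in the pseudocode above, bytesel = vAddr[1..0] xor (BigEndianCPU || 0), can be sketched in C. On a big-endian CPU a halfword at offset 0 lands in the upper lanes of the memory word (bytesel = 2), so the register value is shifted left by 8*bytesel bits before the store.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the halfword byte-lane selection: (BigEndianCPU || 0) is the
 * 2-bit value 0b10 (= 2) on a big-endian CPU and 0b00 on little-endian. */
unsigned sh_bytesel(uint32_t vaddr, bool big_endian)
{
    return (vaddr & 3u) ^ (big_endian ? 2u : 0u);
}
```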
EXTEND 11110 |
00 |
Imm[8:5] |
00 |
rb |
SWSP 11010 |
rx |
sel = 7 |
Imm[4:0] |
5 |
5 |
4 |
2 |
3 |
5 |
3 |
3 |
5 |
SWL rx, immediate(rb) |
MIPS16e2 |
Store Word Left Extended |
Store Word Left Extended
To store the most-significant part of a word to an unaligned memory address.
memory[GPR[rb] + immediate] = GPR[rx]
The 9-bit signed immediate value is added to the contents of GPR rb to form an effective address (EffAddr). EffAddr is the address of the most-significant of 4 consecutive bytes forming a word (W) in memory starting at an arbitrary byte boundary.
A part of W (the most-significant 1 to 4 bytes) is in the aligned word containing EffAddr. The same number of the most-significant (left) bytes from the word in GPR rx are stored into these bytes of W.
The following figure illustrates this operation using big-endian byte ordering for 32-bit and 64-bit registers. The four consecutive bytes in 2..5 form an unaligned word starting at location 2. A part of W (2 bytes) is located in the aligned word containing the most-significant byte at 2.
1. SWL stores the most-significant 2 bytes of the low word from the source register into these 2 bytes in memory.
2. The complementary SWR stores the remainder of the unaligned word.
The bytes stored from the source register to memory depend on both the offset of the effective address within an aligned word (that is, the low 2 bits of the address, vAddr[1..0]) and the current byte-ordering mode of the processor
(big- or little-endian). The following figure shows the bytes stored for every combination of offset and byte ordering.
Unpredictable prior to MIPS16e2.
vAddr = sign_extend(immediate) + GPR[Xlat[rb]]
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
pAddr = pAddr[PSIZE-1..2] || (pAddr[1..0] xor ReverseEndian^2)
if BigEndianMem = 0 then
  pAddr = pAddr[PSIZE-1..2] || 0^2
endif
byte = vAddr[1..0] xor BigEndianCPU^2
dataword = 0^(24-8*byte) || GPR[Xlat[rx]][31..24-8*byte]
StoreMemory(CCA, byte, dataword, pAddr, vAddr, DATA)
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch
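The merge performed by SWL on the aligned memory word can be modeled in C. The sketch below assumes a little-endian CPU (an assumption for this example): at offset o, the most-significant o+1 bytes of the register replace the low o+1 bytes of the aligned word, and the remaining bytes of the word are untouched.

```c
#include <stdint.h>

/* Behavioral sketch of SWL on a little-endian CPU: merge the top
 * (offset+1) bytes of 'reg' into the low bytes of the aligned word
 * 'mem_word'.  'offset' is vAddr[1..0], i.e. 0..3. */
uint32_t swl_little_endian(uint32_t mem_word, uint32_t reg, unsigned offset)
{
    unsigned n = offset + 1;                     /* bytes stored: 1..4 */
    uint64_t mask = (n == 4) ? 0xFFFFFFFFu : ((1u << (8 * n)) - 1);
    return (uint32_t)((mem_word & ~mask) |
                      ((reg >> (8 * (3 - offset))) & mask));
}
```

At offset 3 the access is word-aligned and the whole register is stored; at offset 0 only the register's most-significant byte lands in memory.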
EXTEND 11110 |
00 |
Imm[8:5] |
10 |
rb |
SWSP 11010 |
rx |
sel = 7 |
Imm[4:0] |
5 |
5 |
4 |
2 |
3 |
5 |
3 |
3 |
5 |
SWR rx, immediate(rb) |
MIPS16e2 |
Store Word Right Extended |
Store Word Right Extended
To store the least-significant part of a word to an unaligned memory address.
memory[GPR[rb] + immediate] = GPR[rx]
The 9-bit signed immediate value is added to the contents of GPR rb to form an effective address (EffAddr). EffAddr is the address of the least-significant of 4 consecutive bytes forming a word (W) in memory starting at an arbitrary byte boundary.
A part of W (the least-significant 1 to 4 bytes) is in the aligned word containing EffAddr. The same number of the least-significant (right) bytes from the word in GPR rx are stored into these bytes of W.
The following figure illustrates this operation using big-endian byte ordering for 32-bit and 64-bit registers. The 4 consecutive bytes in 2..5 form an unaligned word starting at location 2. A part of W (2 bytes) is contained in the aligned word containing the least-significant byte at 5.
1. SWR stores the least-significant 2 bytes of the low word from the source register into these 2 bytes in memory.
2. The complementary SWL stores the remainder of the unaligned word.
The bytes stored from the source register to memory depend on both the offset of the effective address within an aligned word (that is, the low 2 bits of the address, vAddr[1..0]) and the current byte-ordering mode of the processor
(big- or little-endian). The following figure shows the bytes stored for every combination of offset and byte ordering.
Unpredictable prior to MIPS16e2.
vAddr = sign_extend(immediate) + GPR[Xlat[rb]]
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
pAddr = pAddr[PSIZE-1..2] || (pAddr[1..0] xor ReverseEndian^2)
if BigEndianMem = 0 then
  pAddr = pAddr[PSIZE-1..2] || 0^2
endif
byte = vAddr[1..0] xor BigEndianCPU^2
dataword = GPR[Xlat[rx]][31-8*byte..0] || 0^(8*byte)
StoreMemory(CCA, WORD-byte, dataword, pAddr, vAddr, DATA)
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch
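SWR performs the complementary merge: on a little-endian CPU (again an assumption for this sketch), at offset o the least-significant 4-o bytes of the register replace bytes o..3 of the aligned word, so an SWR at the unaligned address followed by an SWL at address+3 stores a complete unaligned word.

```c
#include <stdint.h>

/* Behavioral sketch of SWR on a little-endian CPU: merge the low
 * (4 - offset) bytes of 'reg' into the high bytes of the aligned word
 * 'mem_word', starting at byte 'offset' (vAddr[1..0], i.e. 0..3). */
uint32_t swr_little_endian(uint32_t mem_word, uint32_t reg, unsigned offset)
{
    /* bytes below 'offset' are kept from memory */
    uint64_t keep = (offset == 0) ? 0 : ((1u << (8 * offset)) - 1);
    return (uint32_t)((mem_word & keep) |
                      (((uint64_t)reg << (8 * offset)) & 0xFFFFFFFFu & ~keep));
}
```

At offset 0 the access is word-aligned and the whole register is stored; at offset 3 only the register's least-significant byte lands in the word's top byte.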
EXTEND 11110 |
Imm[10:5] |
Imm[15:11] |
SWSP 11010 |
rx |
sel = 1 |
Imm[4:0] |
5 |
5 |
5 |
5 |
5 |
3 |
3 |
SW rx, immediate(gp) |
MIPS16e2 |
Store Word (GP-relative) Extended |
Store Word (GP-relative) Extended
To store a word to memory.
memory[GPR[gp] + immediate] = GPR[rx]
The 16-bit immediate value is sign-extended, then added to the contents of GPR 28 to form the effective address. The contents of GPR rx are stored at the effective address.
Unpredictable prior to MIPS16e2. The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
vAddr = sign_extend(immediate) + GPR[28]
(pAddr, CCA) = AddressTranslation (vAddr, DATA, STORE)
dataword = GPR[Xlat[rx]]
StoreMemory (CCA, WORD, dataword, pAddr, vAddr, DATA)
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error
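The natural-alignment requirement stated above can be checked in a few lines; the function name is illustrative, and "ok" here simply means that no Address Error exception would be raised.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the SW alignment check: form the GP-relative effective
 * address, then require both low bits to be zero (word alignment). */
bool sw_address_ok(uint32_t gp, uint16_t imm16)
{
    uint32_t vaddr = gp + (uint32_t)(int32_t)(int16_t)imm16;
    return (vaddr & 3u) == 0;    /* non-zero low bits => Address Error */
}
```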
EXTEND 11110 |
stype |
0 |
00000 |
SHIFT 00110 |
000 |
000 |
sel = 5 |
SLL 00 |
5 |
5 |
1 |
5 |
5 |
3 |
3 |
3 |
5 |
SYNC stype |
MIPS16e2 |
Synchronize Shared Memory Extended |
Synchronize Shared Memory Extended
To order loads and stores for shared memory.
These types of ordering guarantees are available through the SYNC instruction:
Completion Barriers
Ordering Barriers
Completion Barrier - Simple Description:
The barrier affects only uncached and cached coherent loads and stores.
The specified memory instructions (loads or stores or both) that occur before the SYNC instruction must be completed before the specified memory instructions after the SYNC are allowed to start.
Loads are completed when the destination register is written. Stores are completed when the stored value is visible to every other processor in the system.
Completion Barrier - Detailed Description:
Every synchronizable specified memory instruction (loads or stores or both) that occurs in the instruction stream before the SYNC instruction must be already globally performed before any synchronizable specified memory instructions that occur after the SYNC are allowed to be performed, with respect to any other processor or coherent I/O module.
The barrier does not guarantee the order in which instruction fetches are performed.
A stype value of zero will always be defined such that it performs the most complete set of synchronization operations that are defined. This means stype zero always does a completion barrier that affects both loads and stores preceding the SYNC instruction and both loads and stores that are subsequent to the SYNC instruction. Non-zero values of stype may be defined by the architecture or specific implementations to perform synchronization behaviors that are less complete than that of stype zero. If an implementation does not use one of these non-zero values to define a different synchronization behavior, then that non-zero value of stype must act the same as stype zero completion barrier. This allows software written for an implementation with a lighter-weight barrier to work on another implementation which only implements the stype zero completion barrier.
A completion barrier is required, potentially in conjunction with EHB, to guarantee that memory reference results are visible across operating mode changes. For example, a completion barrier is required on some implementations on entry to and exit from Debug Mode to guarantee that memory effects are handled correctly.
SYNC behavior when the stype field is zero:
A completion barrier that affects preceding loads and stores and subsequent loads and stores.
Ordering Barrier - Simple Description:
The barrier affects only uncached and cached coherent loads and stores.
The specified memory instructions (loads or stores or both) that occur before the SYNC instruction must always be ordered before the specified memory instructions after the SYNC.
Memory instructions that are ordered before other memory instructions are processed by the load/store datapath before those other instructions.
Ordering Barrier - Detailed Description:
Every synchronizable specified memory instruction (loads or stores or both) that occurs in the instruction stream before the SYNC instruction must reach a stage in the load/store datapath after which no instruction re-ordering is possible before any synchronizable specified memory instruction which occurs after the
SYNC instruction in the instruction stream reaches the same stage in the load/store datapath.
If any memory instruction before the SYNC instruction in program order, generates a memory request to the external memory and any memory instruction after the SYNC instruction in program order also generates a memory request to external memory, the memory request belonging to the older instruction must be globally performed before the time the memory request belonging to the younger instruction is globally performed.
The barrier does not guarantee the order in which instruction fetches are performed.
As compared to the completion barrier, the ordering barrier is a lighter-weight operation as it does not require the specified instructions before the SYNC to be already completed. Instead it only requires that those specified instructions which are subsequent to the SYNC in the instruction stream are never re-ordered for processing ahead of the specified instructions which are before the SYNC in the instruction stream. This potentially reduces how many cycles the barrier instruction must stall before it completes.
The Acquire and Release barrier types are used to minimize the memory orderings that must be maintained and still have software synchronization work.
Implementations that do not use any of the non-zero values of stype to define different barriers, such as ordering barriers, must make those stype values act the same as stype zero.
For the purposes of this description, the CACHE, PREF and PREFX instructions are treated as loads and stores. That is, these instructions and the memory transactions sourced by these instructions obey the ordering and completion rules of the SYNC instruction.
The following table lists the available completion barrier and ordering barriers behaviors that can be specified using the stype field.
Code |
Name |
Older instructions which must reach the load/store ordering point before the SYNC instruction completes. |
Younger instructions which must reach the load/store ordering point only after the SYNC instruction completes. |
Older instructions which must be globally performed when the SYNC instruction completes |
Compliance |
0x0 |
SYNC or SYNC 0 |
Loads, Stores |
Loads, Stores |
Loads, Stores |
Required |
0x4 |
SYNC_WMB or SYNC 4 |
Stores |
Stores |
Optional | |
0x10 |
SYNC_MB or SYNC 16 |
Loads, Stores |
Loads, Stores |
Optional | |
0x11 |
SYNC_ACQUIRE or SYNC 17 |
Loads |
Loads, Stores |
Optional | |
0x12 |
SYNC_RELEASE or SYNC 18 |
Loads, Stores |
Stores |
Optional | |
0x13 |
SYNC_RMB or SYNC 19 |
Loads |
Loads |
Optional | |
0x1-0x3, 0x5-0xF |
Implementation-Specific and Vendor-Specific Sync Types | ||||
0x14 - 0x1F |
RESERVED |
Reserved for MIPS Technologies for future extension of the architecture. |
Synchronizable: A load or store instruction is synchronizable if the load or store occurs to a physical location in
shared memory using a virtual location with a memory access type of either uncached or cached coherent. Shared
memory is memory that can be accessed by more than one processor or by a coherent I/O system module.
Performed load: A load instruction is performed when the value returned by the load has been determined. The result
of a load on processor A has been determined with respect to processor or coherent I/O module B when a subsequent store to the location by B cannot affect the value returned by the load. The store by B must use the same memory access type as the load.
Performed store: A store instruction is performed when the store is observable. A store on processor A is observable
with respect to processor or coherent I/O module B when a subsequent load of the location by B returns the value written by the store. The load by B must use the same memory access type as the store.
Globally performed load: A load instruction is globally performed when it is performed with respect to all processors
and coherent I/O modules capable of storing to the location.
Globally performed store: A store instruction is globally performed when it is globally observable. It is globally observable when it is observable by all processors and I/O modules capable of loading from the location.
Coherent I/O module: A coherent I/O module is an Input/Output system component that performs coherent Direct
Memory Access (DMA). It reads and writes memory independently as though it were a processor doing loads and stores to locations with a memory access type of cached coherent.
Load/Store Datapath: The portion of the processor which handles the load/store data requests coming from the processor pipeline and processes those requests within the cache and memory system hierarchy.
Unpredictable prior to MIPS16e2. The effect of SYNC on the global order of loads and stores for memory access types other than uncached and cached coherent is UNPREDICTABLE.
SyncOperation(stype)
None
A processor executing load and store instructions observes the order in which loads and stores using the same memory access type occur in the instruction stream; this is known as program order.
A parallel program has multiple instruction streams that can execute simultaneously on different processors. In multiprocessor (MP) systems, the order in which the effects of loads and stores are observed by other processors (the
global order of the loads and stores) determines the actions necessary to reliably share data in parallel programs.
When all processors observe the effects of loads and stores in program order, the system is strongly ordered. On such systems, parallel programs can reliably share data without explicit actions in the programs. For such a system, SYNC has the same effect as a NOP. Executing SYNC on such a system is not necessary, but neither is it an error.
If a multiprocessor system is not strongly ordered, the effects of load and store instructions executed by one processor may be observed out of program order by other processors. On such systems, parallel programs must take explicit actions to reliably share data. At critical points in the program, the effects of loads and stores from an instruction stream must occur in the same order for all processors. SYNC separates the loads and stores executed on the processor into two groups, and the effect of all loads and stores in one group is seen by all processors before the effect of any load or store in the subsequent group. In effect, SYNC causes the system to be strongly ordered for the executing processor at the instant that the SYNC is executed.
Many MIPS-based multiprocessor systems are strongly ordered or have a mode in which they operate as strongly ordered for at least one memory access type. The MIPS architecture also permits implementation of MP systems that are not strongly ordered; SYNC enables the reliable use of shared memory on such systems. A parallel program that does not use SYNC generally does not operate on a system that is not strongly ordered. However, a program that does use SYNC works on both types of systems. (System-specific documentation describes the actions needed to reliably share data in parallel programs for that system.)
The behavior of a load or store using one memory access type is UNPREDICTABLE if a load or store was previously made to the same physical location using a different memory access type. The presence of a SYNC between the references does not alter this behavior.
SYNC affects the order in which the effects of load and store instructions appear to all processors; it does not generally affect the physical memory-system ordering or synchronization issues that arise in system programming. The effect of SYNC on implementation-specific aspects of the cached memory system, such as writeback buffers, is not defined.
# Processor A (writer)
# Conditions at entry:
#   The value 0 has been stored in FLAG and that value is observable by B
    SW   R1, DATA       # change shared DATA value
    LI   R2, 1
    SYNC                # perform DATA store before performing FLAG store
    SW   R2, FLAG       # say that the shared DATA value is valid

# Processor B (reader)
    LI   R2, 1
1:  LW   R1, FLAG       # get FLAG
    BNE  R2, R1, 1b     # if it says that DATA is not valid, poll again
    NOP
    SYNC                # FLAG value checked before doing DATA read
    LW   R1, DATA       # read (valid) shared DATA value
The code fragments above show how SYNC can be used to coordinate the use of shared data between separate writer and reader instruction streams in a multiprocessor environment. The FLAG location is used by the instruction streams to determine whether the shared data item DATA is valid. The SYNC executed by processor A forces the store of
DATA to be performed globally before the store to FLAG is performed. The SYNC executed by processor B ensures that DATA is not read until after the FLAG value indicates that the shared data is valid.
Software written to use a SYNC instruction with a non-zero stype value, expecting one type of barrier behavior, should only be run on hardware that actually implements the expected barrier behavior for that non-zero stype value or on hardware which implements a superset of the behavior expected by the software for that stype value. If the hardware does not perform the barrier behavior expected by the software, the system may fail.
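The writer/reader FLAG protocol above has a direct C11 analog: atomic_thread_fence(memory_order_release) plays the role of the writer's SYNC (compare SYNC_RELEASE) and memory_order_acquire the reader's (compare SYNC_ACQUIRE). This is an analogy for illustration, not the MIPS encoding; the variable names are invented for the example.

```c
#include <stdatomic.h>

static int data;            /* the shared DATA value   */
static atomic_int flag;     /* the FLAG location       */

void writer(int value)
{
    data = value;                                   /* SW R1, DATA */
    atomic_thread_fence(memory_order_release);      /* SYNC        */
    atomic_store_explicit(&flag, 1,
                          memory_order_relaxed);    /* SW R2, FLAG */
}

int reader(void)
{
    while (atomic_load_explicit(&flag,
                                memory_order_relaxed) != 1)
        ;                                           /* poll FLAG   */
    atomic_thread_fence(memory_order_acquire);      /* SYNC        */
    return data;                                    /* LW R1, DATA */
}
```

As with the assembly version, the fences guarantee only ordering between the two streams; they do not make the polling loop itself fair or bounded.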
EXTEND 11110 |
Imm[10:5] |
Imm[15:11] |
LI 01101 |
rx |
sel = 4 |
Imm[4:0] |
5 |
5 |
5 |
5 |
5 |
3 |
3 |
XORI rx, immediate |
MIPS16e2 |
Exclusive OR Immediate Extended |
Exclusive OR Immediate Extended
To do a bitwise logical Exclusive OR with a constant.
GPR[rx] = GPR[rx] XOR immediate
Combine the contents of GPR rx and the 16-bit zero-extended immediate in a bitwise logical Exclusive OR operation and place the result back into GPR rx.
Unpredictable prior to MIPS16e2.
GPR[XLat[rx]] = GPR[Xlat[rx]] xor zero_extend(immediate)
None
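Because the immediate is zero-extended rather than sign-extended, XORI can invert only bits in the low halfword of rx; the upper 16 bits always pass through unchanged. A one-line C model makes this visible (the function name is illustrative):

```c
#include <stdint.h>

/* Model of XORI: the 16-bit immediate is zero-extended, so only the
 * low 16 bits of rx can be flipped. */
uint32_t xori(uint32_t rx, uint16_t imm16)
{
    return rx ^ (uint32_t)imm16;   /* zero_extend(immediate) */
}
```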