ADD rd, rs, rt |
nanoMIPS, not available in NMS |
Add |
Add. Add two 32-bit signed integers in registers $rs and $rt, placing the 32-bit result in register $rd, and trapping on overflow.
nanoMIPS, not available in NMS
001000 | rt | rs | rd | x | 0100010 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

if C0.Config5.NMS == 1:
    raise exception('RI')
sum = GPR[rs] + GPR[rt]
if overflows(sum, nbits=32):
    raise exception('OV')
GPR[rd] = sign_extend(sum, from_nbits=32)
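For illustration, a minimal sketch of the overflow test used in the pseudocode above, assuming register values are modeled as unbounded Python integers; the overflows() helper itself is not defined in this excerpt:

def overflows(value, nbits=32):
    # A two's-complement add overflows when the mathematical result
    # does not fit in the signed nbits range.
    lo = -(1 << (nbits - 1))
    hi = (1 << (nbits - 1)) - 1
    return not (lo <= value <= hi)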
Overflow.
ADDIU rt, rs, imm |
nanoMIPS, availability varies by format. |
Add Immediate (Untrapped) |
Add Immediate (Untrapped). Add immediate value imm to the 32-bit integer value in register $rs, placing the 32-bit result in register $rt, and not trapping on overflow.
nanoMIPS, availability varies by format.
000000 | rt (rt!=0) | rs | u
6 | 5 | 5 | 16

imm = u

011000 | rt | 00001 | s[15:0] | s[31:16]
6 | 5 | 5 | 16 | 16

if C0.Config5.NMS == 1:
    raise exception('RI')
imm = sign_extend(s, from_nbits=32)
rs = rt

011000 | rt | 00010 | s[15:0] | s[31:16]
6 | 5 | 5 | 16 | 16

if C0.Config5.NMS == 1:
    raise exception('RI')
if pointers_are_64_bits():
    raise behaves_like('DADDIU[GP48]')
imm = sign_extend(s, from_nbits=32)
rs = 28
010001 | rt | 011 | u
6 | 5 | 3 | 18

if pointers_are_64_bits():
    raise behaves_like('DADDIU[GP.B]')
imm = u
rs = 28

010000 | rt | u[20:2] | 00
6 | 5 | 19 | 2

if pointers_are_64_bits():
    raise behaves_like('DADDIU[GP.W]')
imm = u
rs = 28

011100 | rt3 | 1 | u[7:2]
6 | 3 | 1 | 6

if pointers_are_64_bits():
    raise behaves_like('DADDIU[R1.SP]')
rt = decode_gpr(rt3, 'gpr3')
rs = 29
imm = u

100100 | rt3 | rs3 | 0 | u[4:2]
6 | 3 | 3 | 1 | 3

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
imm = u

100100 | rt (rt!=0) | s[3] | 1 | s[2:0]
6 | 5 | 1 | 1 | 3

rs = rt
imm = sign_extend(s, from_nbits=4)

ADDIU[RS5] with rt=0 is used to provide a 16 bit NOP instruction.

100000 | rt | rs | 1000 | u
6 | 5 | 5 | 4 | 12

imm = -u

sum = GPR[rs] + imm
GPR[rt] = sign_extend(sum, from_nbits=32)
Reserved Instruction for ADDIU[48] and ADDIU[GP48] formats on NMS cores.
ADDIUPC rt, imm |
nanoMIPS, availability varies by format. |
Add Immediate (Untrapped) to PC |
Add Immediate (Untrapped) to PC. Compute address by adding immediate value imm to the PC and placing the result in register $rt, not trapping on overflow.
nanoMIPS, availability varies by format.
000001 | rt | s[20:1] | s[21]
6 | 5 | 20 | 1

if pointers_are_64_bits():
    raise behaves_like('DADDIUPC[32]')
s = sign_extend(s[21] @ s[20:1] @ '0')
imm = s + 4

011000 | rt | 00011 | s[15:0] | s[31:16]
6 | 5 | 5 | 16 | 16

if C0.Config5.NMS == 1:
    raise exception('RI')
if pointers_are_64_bits():
    raise behaves_like('DADDIUPC[48]')
s = sign_extend(s[31:16] @ s[15:0])
imm = s + 6

GPR[rt] = effective_address(CPU.next_pc, s)
Reserved Instruction for ADDIUPC[48] format on NMS cores.
ADDU dst, src1, src2 |
nanoMIPS, availability varies by format. |
Add (Untrapped) |
Add (Untrapped). Add two 32-bit integers in registers $src1 and $src2, placing the 32-bit result in register $dst, and not trapping on overflow.
nanoMIPS, availability varies by format.
001000 | rt | rs | rd | x | 0101010 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

dst = rd
src1 = rs
src2 = rt
not_in_nms = False

101100 | rt3 | rs3 | rd3 | 0
6 | 3 | 3 | 3 | 1

dst = decode_gpr(rd3, 'gpr3')
src1 = decode_gpr(rs3, 'gpr3')
src2 = decode_gpr(rt3, 'gpr3')
not_in_nms = False

001111 | rt4[3] | 0 | rt4[2:0] | rs4[3] | 0 | rs4[2:0]
6 | 1 | 1 | 3 | 1 | 1 | 3

dst = decode_gpr(rt4, 'gpr4')
src1 = decode_gpr(rt4, 'gpr4')
src2 = decode_gpr(rs4, 'gpr4')
not_in_nms = True

if not_in_nms and C0.Config5.NMS:
    raise exception('RI')
sum = GPR[src1] + GPR[src2]
GPR[dst] = sign_extend(sum, from_nbits=32)
Reserved Instruction for ADDU[4X4] format on NMS cores.
ALIGN rd, rs, rt, bp |
Assembly alias |
Align |
Align. Concatenate the 32 bit values in registers $rt and $rs, extract the word at specified byte position bp, and place the result in register $rd.
Assembly alias
bp != 0: EXTW rd, rs, rt, (4-bp)<<3
bp == 0: MOVE rd, rt
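As a cross-check of the alias, here is a small Python model of ALIGN written in terms of the EXTW semantics defined later in this section; the function name and the modeling of registers as unsigned Python integers are illustrative assumptions:

def align(rt, rs, bp):
    # bp == 0 degenerates to MOVE rd, rt: an EXTW shift of 32 is not
    # encodable in the 5-bit shift field.
    if bp == 0:
        return rt & 0xFFFFFFFF
    # EXTW rd, rs, rt, (4-bp)<<3 takes the low 32 bits of {rt, rs} >> ((4-bp)*8).
    tmp = ((rt & 0xFFFFFFFF) << 32) | (rs & 0xFFFFFFFF)
    return (tmp >> ((4 - bp) * 8)) & 0xFFFFFFFF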
ALUIPC rt, %pcrel_hi(address) |
nanoMIPS |
Add aLigned Upper Immediate to PC |
Add aLigned Upper Immediate to PC. Compute a 4KB aligned PC relative address by adding an upper 20 bit immediate value to NextPC, discarding the lower 12 bits, and placing the result in register $rt.
nanoMIPS
111000 | rt | s[20:12] | s[30:21] | 1 | s[31]
6 | 5 | 9 | 10 | 1 | 1

offset = sign_extend(s, from_nbits=32)
address = effective_address(CPU.next_pc, offset) & ~0xfff
GPR[rt] = address
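As a worked illustration (not part of the manual text), a Python sketch of the computation, assuming 32-bit wrap-around arithmetic inside effective_address():

def aluipc(next_pc, s):
    # The low 12 bits of the sum are discarded, yielding a 4KB-aligned address.
    return ((next_pc + s) & 0xFFFFFFFF) & ~0xFFF

# aluipc(0x10000004, 0x00003000) == 0x10003000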
None.
AND rd, rs, rt |
nanoMIPS |
AND |
AND. Compute logical AND of registers $rs and $rt, placing the result in register $rd.
nanoMIPS
001000 | rt | rs | rd | x | 1001010 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

010100 | rt3 | rs3 | 10 | 0 | 0
6 | 3 | 3 | 2 | 1 | 1

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
rd = rt

GPR[rd] = GPR[rs] & GPR[rt]
None.
ANDI rt, rs, u |
nanoMIPS |
AND Immediate |
AND Immediate. Compute logical AND of register $rs and immediate u, placing the result in register $rt.
nanoMIPS
100000 | rt | rs | 0010 | u
6 | 5 | 5 | 4 | 12

111100 | rt3 | rs3 | eu
6 | 3 | 3 | 4

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
u = (0x00ff if eu == 12 else 0xffff if eu == 13 else eu)
GPR[rt] = GPR[rs] & u
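The 4-bit eu field of the 16-bit format cannot hold the two most common masks directly, so two encodings are repurposed for them; a sketch of that decode (the helper name is illustrative):

def decode_andi16_immediate(eu):
    # eu values 12 and 13 select the byte and halfword masks;
    # all other values are used literally.
    return {12: 0x00FF, 13: 0xFFFF}.get(eu, eu)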
None.
BALC address |
nanoMIPS |
Branch And Link, Compact |
Branch And Link, Compact. Unconditional PC relative branch to address, placing return address in register $31.
nanoMIPS
001010 | 1 | s[24:1] | s[25]
6 | 1 | 24 | 1

offset = sign_extend(s, from_nbits=26)

001110 | s[9:1] | s[10]
6 | 9 | 1

offset = sign_extend(s, from_nbits=11)

address = effective_address(CPU.next_pc, offset)
GPR[31] = CPU.next_pc
CPU.next_pc = address
None.
BALRSC rt, rs |
nanoMIPS |
Branch And Link Register Scaled, Compact |
Branch And Link Register Scaled, Compact. Unconditional branch to address NextPC + 2*$rs, placing return address in register $rt.
nanoMIPS
010010 | rt (rt!=0) | rs | 1000 | x
6 | 5 | 5 | 4 | 12

address = effective_address(CPU.next_pc, offset=GPR[rs]<<1)
GPR[rt] = CPU.next_pc
CPU.next_pc = address
None.
BBEQZC rt, bit, address |
nanoMIPS, not available in NMS |
Branch if Bit Equals Zero, Compact |
Branch if Bit Equals Zero, Compact. PC relative branch to address if bit bit of register $rt is equal to zero.
nanoMIPS, not available in NMS
110010 | rt | 001 | x | bit | s[10:1] | s[11]
6 | 5 | 3 | 1 | 6 | 10 | 1

offset = sign_extend(s, from_nbits=12)

if C0.Config5.NMS == 1:
    raise exception('RI')
if bit >= 32 and not Are64BitOperationsEnabled():
    raise exception('RI')
address = effective_address(CPU.next_pc, offset)
testbit = (GPR[rt] >> bit) & 1
if testbit == 0:
    CPU.next_pc = address
Reserved Instruction on NMS cores.
BBNEZC rt, bit, address |
nanoMIPS, not available in NMS |
Branch if Bit Not Equal to Zero, Compact |
Branch if Bit Not Equal to Zero, Compact. PC relative branch to address if bit bit of register $rt is not equal to zero.
nanoMIPS, not available in NMS
110010 | rt | 101 | x | bit | s[10:1] | s[11]
6 | 5 | 3 | 1 | 6 | 10 | 1

offset = sign_extend(s, from_nbits=12)

if C0.Config5.NMS == 1:
    raise exception('RI')
if bit >= 32 and not Are64BitOperationsEnabled():
    raise exception('RI')
address = effective_address(CPU.next_pc, offset)
testbit = (GPR[rt] >> bit) & 1
if testbit == 1:
    CPU.next_pc = address
Reserved Instruction on NMS cores.
BC address |
nanoMIPS |
Branch, Compact |
Branch, Compact. Unconditional PC relative branch to address.
nanoMIPS
001010 | 0 | s[24:1] | s[25]
6 | 1 | 24 | 1

offset = sign_extend(s, from_nbits=26)
address = effective_address(CPU.next_pc, offset)

000110 | s[9:1] | s[10]
6 | 9 | 1

offset = sign_extend(s, from_nbits=11)
address = effective_address(CPU.next_pc, offset)

CPU.next_pc = address
None.
BEQC rs, rt, address |
nanoMIPS, availability varies by format. |
Branch if Equal, Compact |
Branch if Equal, Compact. PC relative branch to address if registers $rs and $rt are equal.
nanoMIPS, availability varies by format.
100010 | rt | rs | 00 | s[13:1] | s[14]
6 | 5 | 5 | 2 | 13 | 1

offset = sign_extend(s, from_nbits=15)
address = effective_address(CPU.next_pc, offset)
not_in_nms = False

110110 | rt3 (rs3<rt3 && u!=0) | rs3 | u[4:1]
6 | 3 | 3 | 4

rs = decode_gpr(rs3, 'gpr3')
rt = decode_gpr(rt3, 'gpr3')
offset = u
address = effective_address(CPU.next_pc, offset)
not_in_nms = True

if not_in_nms and C0.Config5.NMS == 1:
    raise exception('RI')
if GPR[rs] == GPR[rt]:
    CPU.next_pc = address
Reserved Instruction for BEQC[16] format on NMS cores.
BEQIC rt, u, address |
nanoMIPS |
Branch if Equal to Immediate, Compact |
Branch if Equal to Immediate, Compact. PC relative branch to address if value of register $rt is equal to immediate value u.
nanoMIPS
110010 | rt | 000 | u | s[10:1] | s[11]
6 | 5 | 3 | 7 | 10 | 1

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] == u:
    CPU.next_pc = address
None.
BEQZC rt, address # when rt and address are in range |
nanoMIPS |
Branch if Equal to Zero, Compact |
Branch if Equal to Zero, Compact. PC relative branch to address if register $rt equals zero.
nanoMIPS
100110 | rt3 | s[6:1] | s[7]
6 | 3 | 6 | 1

rt = decode_gpr(rt3, 'gpr3')
offset = sign_extend(s, from_nbits=8)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] == 0:
    CPU.next_pc = address
None.
BGEC rs, rt, address |
nanoMIPS |
Branch if Greater than or Equal, Compact |
Branch if Greater than or Equal, Compact. PC relative branch to address if register $rs is greater than or equal to register $rt.
nanoMIPS
100010 | rt | rs | 10 | s[13:1] | s[14]
6 | 5 | 5 | 2 | 13 | 1

offset = sign_extend(s, from_nbits=15)
address = effective_address(CPU.next_pc, offset)
if GPR[rs] >= GPR[rt]:
    CPU.next_pc = address
None.
BGEIC rt, u, address |
nanoMIPS |
Branch if Greater than or Equal to Immediate, Compact |
Branch if Greater than or Equal to Immediate, Compact. PC relative branch to address if signed register value $rt is greater than or equal to immediate u.
nanoMIPS
110010 | rt | 010 | u | s[10:1] | s[11]
6 | 5 | 3 | 7 | 10 | 1

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] >= u:
    CPU.next_pc = address
None.
BGEIUC rt, u, address |
nanoMIPS |
Branch if Greater than or Equal to Immediate Unsigned, Compact |
Branch if Greater than or Equal to Immediate Unsigned, Compact. PC relative branch to address if unsigned register $rt is greater than or equal to immediate u.
nanoMIPS
110010 | rt | 011 | u | s[10:1] | s[11]
6 | 5 | 3 | 7 | 10 | 1

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if unsigned(GPR[rt]) >= u:
    CPU.next_pc = address
None.
BGEUC rs, rt, address |
nanoMIPS |
Branch if Greater than or Equal Unsigned, Compact |
Branch if Greater than or Equal Unsigned, Compact. PC relative branch to address if unsigned register $rs is greater than or equal to unsigned register $rt.
nanoMIPS
100010 | rt | rs | 11 | s[13:1] | s[14]
6 | 5 | 5 | 2 | 13 | 1

offset = sign_extend(s, from_nbits=15)
address = effective_address(CPU.next_pc, offset)
if unsigned(GPR[rs]) >= unsigned(GPR[rt]):
    CPU.next_pc = address
None.
BITREVB rt, rs |
Assembly alias, not available in NMS |
Bit Reverse in Bytes |
Bit Reverse in Bytes. Reverse bits in each byte of 32-bit value in register $rs, placing the result in register $rt.
Assembly alias, not available in NMS
ROTX rt, rs, 7, 8, 1
BITREVH rt, rs |
Assembly alias, not available in NMS |
Bit Reverse in Halfs |
Bit Reverse in Halfs. Reverse bits in each halfword of 32-bit value in register $rs, placing the result in register $rt.
Assembly alias, not available in NMS
ROTX rt, rs, 15, 16
BITREVW rt, rs |
Assembly alias, not available in NMS |
Bit Reverse in Word |
Bit Reverse in Word. Reverse all bits in 32 bit register $rs, placing the result in register $rt.
Assembly alias, not available in NMS
ROTX rt, rs, 31, 0
BITSWAP rt, rs |
Assembly alias, not available in NMS |
Bitswap |
Bitswap. Reverse bits in each byte of 32-bit value in register $rs, placing the result in register $rt.
Assembly alias, not available in NMS
ROTX rt, rs, 7, 8, 1
The assembly alias BITSWAP is provided for compatibility with MIPS32™. Its behavior is equivalent to the new assembly alias BITREVB, whose name is chosen to fit consistently with the naming of other reversing instructions in nanoMIPS™.
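A small Python model of the BITREVB/BITSWAP effect, assuming the 32-bit value is held as an unsigned Python integer (the underlying ROTX mechanics are not modeled):

def bitrevb(x):
    # Reverse the bit order within each of the four bytes independently.
    out = 0
    for byte in range(4):
        b = (x >> (8 * byte)) & 0xFF
        rev = int('{:08b}'.format(b)[::-1], 2)
        out |= rev << (8 * byte)
    return out

# bitrevb(0x01020380) == 0x8040C001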
BLTC rs, rt, address |
nanoMIPS |
Branch if Less Than, Compact |
Branch if Less Than, Compact. PC relative branch to address if signed register $rs is less than signed register $rt.
nanoMIPS
101010 | rt | rs | 10 | s[13:1] | s[14]
6 | 5 | 5 | 2 | 13 | 1

offset = sign_extend(s, from_nbits=15)
address = effective_address(CPU.next_pc, offset)
if GPR[rs] < GPR[rt]:
    CPU.next_pc = address
None.
BLTIC rt, u, address |
nanoMIPS |
Branch if Less Than Immediate, Compact |
Branch if Less Than Immediate, Compact. PC relative branch to address if signed register $rt is less than immediate u.
nanoMIPS
110010 | rt | 110 | u | s[10:1] | s[11]
6 | 5 | 3 | 7 | 10 | 1

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] < u:
    CPU.next_pc = address
None.
BLTIUC rt, u, address |
nanoMIPS |
Branch if Less Than Immediate Unsigned, Compact |
Branch if Less Than Immediate Unsigned, Compact. PC relative branch to address if unsigned register $rt is less than immediate u.
nanoMIPS
110010 | rt | 111 | u | s[10:1] | s[11]
6 | 5 | 3 | 7 | 10 | 1

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if unsigned(GPR[rt]) < u:
    CPU.next_pc = address
None.
BLTUC rs, rt, address |
nanoMIPS |
Branch if Less Than Unsigned, Compact |
Branch if Less Than Unsigned, Compact. PC relative branch to address if unsigned register $rs is less than unsigned register $rt.
nanoMIPS
101010 | rt | rs | 11 | s[13:1] | s[14]
6 | 5 | 5 | 2 | 13 | 1

offset = sign_extend(s, from_nbits=15)
address = effective_address(CPU.next_pc, offset)
if unsigned(GPR[rs]) < unsigned(GPR[rt]):
    CPU.next_pc = address
None.
BNEC rs, rt, address |
nanoMIPS, availability varies by format. |
Branch Not Equal, Compact |
Branch Not Equal, Compact. PC relative branch to address if register $rs is not equal to register $rt.
nanoMIPS, availability varies by format.
101010 | rt | rs | 00 | s[13:1] | s[14]
6 | 5 | 5 | 2 | 13 | 1

offset = sign_extend(s, from_nbits=15)

110110 | rt3 (rs3>=rt3 && u!=0) | rs3 | u[4:1]
6 | 3 | 3 | 4

if C0.Config5.NMS == 1:
    raise exception('RI')
rs = decode_gpr(rs3, 'gpr3')
rt = decode_gpr(rt3, 'gpr3')
offset = u

address = effective_address(CPU.next_pc, offset)
if GPR[rs] != GPR[rt]:
    CPU.next_pc = address
Reserved Instruction for BNEC[16] format on NMS cores.
BNEIC rt, u, address |
nanoMIPS |
Branch if Not Equal to Immediate, Compact |
Branch if Not Equal to Immediate, Compact. PC relative branch to address if register $rt is not equal to immediate u.
nanoMIPS
110010 | rt | 100 | u | s[10:1] | s[11]
6 | 5 | 3 | 7 | 10 | 1

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] != u:
    CPU.next_pc = address
None.
BNEZC rt, address |
nanoMIPS |
Branch if Not Equal to Zero, Compact |
Branch if Not Equal to Zero, Compact. PC relative branch to address if register $rt is not equal to zero.
nanoMIPS
101110 | rt3 | s[6:1] | s[7]
6 | 3 | 6 | 1

rt = decode_gpr(rt3, 'gpr3')
offset = sign_extend(s, from_nbits=8)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] != 0:
    CPU.next_pc = address
None.
BREAK code |
nanoMIPS |
Break |
Break. Cause a Breakpoint exception.
nanoMIPS
000000 | 00000 | 10 | code
6 | 5 | 2 | 19

000100 | 00000 | 10 | code
6 | 5 | 2 | 3
raise exception('BP')
Breakpoint.
BRSC rs |
nanoMIPS |
Branch Register Scaled, Compact |
Branch Register Scaled, Compact. Unconditional branch to address NextPC + 2*$rs.
nanoMIPS
010010 | 00000 | rs | 1000 | x
6 | 5 | 5 | 4 | 12

address = effective_address(CPU.next_pc, offset=GPR[rs]<<1)
CPU.next_pc = address
None.
BYTEREVH rt, rs |
Assembly alias, not available in NMS |
Byte Reverse in Halfs |
Byte Reverse in Halfs. Reverse bytes in each halfword of 32-bit value in register $rs, placing the result in register $rt.
Assembly alias, not available in NMS
ROTX rt, rs, 8, 24
BYTEREVW rt, rs |
Assembly alias, not available in NMS |
Byte Reverse in Word |
Byte Reverse in Word. Reverse each byte in word value in register $rs, placing the result in register $rt.
Assembly alias, not available in NMS
ROTX rt, rs, 24, 8
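For reference, a Python sketch of the two byte-reversal aliases above, assuming 32-bit unsigned values (again, the ROTX encoding itself is not modeled):

def byterevh(x):
    # Swap the two bytes within each halfword.
    return ((x & 0x00FF00FF) << 8) | ((x >> 8) & 0x00FF00FF)

def byterevw(x):
    # Swap all four bytes of the word (full endianness swap).
    return (((x & 0xFF) << 24) | ((x & 0xFF00) << 8) |
            ((x >> 8) & 0xFF00) | ((x >> 24) & 0xFF))

# byterevh(0xAABBCCDD) == 0xBBAADDCC
# byterevw(0xAABBCCDD) == 0xDDCCBBAA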
CACHE op, offset(rs) |
nanoMIPS. Requires CP0 privilege, availability varies by format. |
Cache operation/Cache operation using EVA addressing |
CACHEE op, offset(rs) |
nanoMIPS. Requires CP0 privilege, availability varies by format. |
Cache operation/Cache operation using EVA addressing |
Cache operation/Cache operation using EVA addressing. Perform cache operation of type op at address $rs + offset (register plus immediate). For CACHEE, translate the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS. Requires CP0 privilege, availability varies by format.
101001 | op | rs | s[8] | 0111 | 0 | 01 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

offset = sign_extend(s, from_nbits=9)
is_eva = False

101001 | op | rs | s[8] | 0111 | 0 | 10 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

offset = sign_extend(s, from_nbits=9)
is_eva = True

# NMS core without caches gives RI (not Coprocessor Unusable) exception.
if (C0.Config5.NMS and C0.Config1.DL == 0 and C0.Config1.IL == 0 and
        C0.Config2.SL == 0 and C0.Config2.TL == 0 and C0.Config5.L2C == 0):
    raise exception('RI')
if is_eva and not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Load', eva=is_eva)
# Behavior for index cacheops is unpredictable if address is not unmapped.
if op <= 11:  # Index cacheop
    translation_type, description, result_args = decode_va(va, eva=is_eva)
    if translation_type != 'unmapped':
        raise UNPREDICTABLE('Index cacheop unpredictable with VA not unmapped')
pa, cca = va2pa(va, 'Cacheop', eva=is_eva)
if cca == 2 or cca == 7:
    if C0.Config.AT >= 2:
        pass  # Cacheop to uncached address is a nop in R6
    else:
        raise UNPREDICTABLE('Cacheop to uncached address is unpredictable')
else:
    cacheop(va, pa, op)
The CACHE/CACHEE instructions perform the cache operation specified by argument ’op’ on the register plus immediate address $rs + offset. For CACHEE, the virtual address is translated as though the core is in user mode, although it is actually in kernel mode.
The ’op’ argument is a 5 bit value specifying one of the following possible cache operations, which are described in more detail below:
’op’ | Operation | Availability
0 | ICache Index Invalidate | Required (if ICache present)
1 | DCache Index Writeback Invalidate | Required (if DCache present)
2 | TCache Index Writeback Invalidate | Required (if TCache present)
3 | SCache Index Writeback Invalidate | Required (if SCache present)
4 | ICache Index Load Tag | Recommended (if ICache present)
5 | DCache Index Load Tag | Recommended (if DCache present)
6 | TCache Index Load Tag | Recommended (if TCache present)
7 | SCache Index Load Tag | Recommended (if SCache present)
8 | ICache Index Store Tag | Required (if ICache present)
9 | DCache Index Store Tag | Required (if DCache present)
10 | TCache Index Store Tag | Required (if TCache present)
11 | SCache Index Store Tag | Required (if SCache present)
12 | ICache Implementation Dependent Op | Optional (if ICache present)
13 | DCache Implementation Dependent Op | Optional (if DCache present)
14 | TCache Implementation Dependent Op | Optional (if TCache present)
15 | SCache Implementation Dependent Op | Optional (if SCache present)
16 | ICache Hit Invalidate | Required (if ICache present)
17 | DCache Hit Invalidate | Optional (if DCache present)
18 | TCache Hit Invalidate | Optional (if TCache present)
19 | SCache Hit Invalidate | Optional (if SCache present)
20 | ICache Fill | Recommended (if ICache present)
21 | DCache Hit Writeback Invalidate | Recommended (if DCache present)
22 | TCache Hit Writeback Invalidate | Recommended (if TCache present)
23 | SCache Hit Writeback Invalidate | Recommended (if SCache present)
24 | Unused |
25 | DCache Hit Writeback | Recommended (if DCache present)
26 | TCache Hit Writeback | Recommended (if TCache present)
27 | SCache Hit Writeback | Recommended (if SCache present)
28 | ICache Fetch and Lock | Recommended (if ICache present)
29 | DCache Fetch and Lock | Recommended (if DCache present)
30 | Unused |
31 | Unused |
Index cacheops (those with op <= 11 and optionally the implementation dependent cases 12 <= op <= 15) are operations where the input address is treated as an index into the target cache array. The rules for constructing the index are given in the cacheop() function pseudocode.
’Hit’ cacheops are operations where the input address is treated as a virtual memory address. The operation will target the cache line containing data for that virtual address, if it is present in the cache.
The operations listed above behave as follows:
ICache Index Invalidate (op=0): Set the state of the instruction cache line at the specified index to invalid.
D/T/S Cache Index Writeback Invalidate (op=1,2,3): If the cache line at the specified index is valid and dirty, write the line back to the memory address specified by the cache tag. Whether or not the line was dirty, set the state of the cache line to invalid. For a write-through cache, the writeback step is not required and this is effectively a Cache Index Invalidate operation. This cache operation is required and may be used by software to invalidate the entire data cache by stepping through all indices. Note that the Index Store Tag operation must be used to initialize the cache at power up.
I/D/T/S Cache Index Load Tag (op=4,5,6,7): Read the tag for the cache line at the specified index into the TagLo and TagHi Coprocessor 0 registers. If the DataLo and DataHi registers are implemented, also read the data corresponding to the byte index into the DataLo and DataHi registers. This operation must not cause a Cache Error Exception. The granularity and alignment of the data read into the DataLo and DataHi registers is implementation-dependent, but is typically the result of an aligned access to the cache, ignoring the appropriate low-order bits of the byte index.
I/D/T/S Cache Index Store Tag (op=8,9,10,11): Write the tag for the cache block at the specified index from the TagLo and TagHi Coprocessor 0 registers. This operation must not cause a Cache
Error Exception. This required encoding may be used by software to initialize the entire instruction or data caches by stepping through all valid indices. Doing so requires that the TagLo and
TagHi registers associated with the cache be initialized to zero first.
I/D/T/S Cache Implementation Dependent Op (op=12,13,14,15): Available for implementation dependent operation.
I/D/T/S Cache Hit Invalidate (op=16,17,18,19): If the cache block contains the specified address, set the state of the cache block to invalid. This required encoding may be used by software to invalidate a range of addresses from the instruction cache by stepping through the address range by the line size of the cache. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.
ICache Fill (op=20): Fill the cache from the specified virtual address.
D/T/S Hit Writeback Invalidate (op=21,22,23): For the cache line (if any) which contains the specified address: if the cache line is valid and dirty, write the line back to the memory address specified by the cache tag. Whether or not the line was dirty, set the state of the cache line to invalid. For a write-through cache, the writeback step is not required and this is effectively a Cache Hit Invalidate operation. This cache operation is required and may be used by software to invalidate a range of addresses from the data cache by stepping through the address range by the line size of the cache. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.
D/T/S Hit Writeback (op=25,26,27): If the cache block contains the specified address and it is valid and dirty, write the contents back to memory. After the operation is completed, leave the state of the line valid, but clear the dirty state. For a write-through cache, this operation may be treated as a nop. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.
I/D Fetch and Lock (op=28,29): If the cache does not contain the specified virtual address, fill it from memory, performing a write-back if required. Set the state to valid and locked. The way selected on a fill from memory is implementation dependent. The lock state may be cleared by executing an Index Invalidate, Index Writeback Invalidate, Hit Invalidate, or Hit Writeback Invalidate operation to the locked line, or via an Index Store Tag operation to the line that clears the lock bit. It is implementation dependent whether a locked line is displaced as the result of an external invalidate or intervention that hits on the locked line. Software must not depend on the locked line remaining in the cache if an external invalidate or intervention would invalidate the line if it were not locked. It is implementation dependent whether a Fetch and Lock operation affects more than one line. For example, more than one line around the referenced address may be fetched and locked. It is recommended that only the single line containing the referenced address be affected.
It is implementation dependent whether the input address for an Index cacheop is converted into a physical address by the MMU, so to avoid the possibility of generating a TLB exception, the index value should always be converted to an unmapped address (such as a kseg0 address by ORing the index with 0x80000000) before being used by the cache instruction. For example, the following code sequence performs a data cache Index Store Tag operation using the index passed in GPR a0:
li    a1, 0x80000000        /* Base of kseg0 segment */
or    a0, a0, a1            /* Convert index to kseg0 address */
cache DCIndexStTag, 0(a0)   /* Perform the index store tag operation */
Some CACHE/CACHEE operations may result in a Cache Error exception. For example, if a Writeback operation detects a cache or bus error during the processing of the operation, that error is reported via a Cache Error exception. Also, a Bus Error Exception may occur if a bus operation invoked by this instruction is terminated in an error. However, cache error exceptions must not be triggered by an Index Load Tag or Index Store Tag operation, as these operations are used for initialization and diagnostic purposes.
It is implementation dependent whether a data watch is triggered by a cache instruction whose address matches the Watch register address match conditions. The preferred implementation is not to match
on the CACHE/CACHEE instructions.
The operation of the instruction is UNPREDICTABLE if the cache line that contains the CACHE instruction is the target of an invalidate or a writeback invalidate operation.
If this instruction is used to lock all ways of a cache at a specific cache index, the behavior of that cache to subsequent cache misses to that cache index is UNDEFINED.
The effective address may be arbitrarily aligned. The CACHE/CACHEE instructions never cause an Address Error Exception due to a non-aligned address.
The CACHE instruction and the memory transactions which are sourced by the CACHE instruction, such as cache refill or cache writeback, obey the ordering and completion rules of the SYNC instruction.
Any use of this instruction that can cause cacheline writebacks should be followed by a subsequent SYNC instruction to avoid hazards where the writeback data is not yet visible at the next level of the
memory hierarchy.
For multiprocessor implementations that maintain coherent caches, some of the Hit type operations may optionally affect all coherent caches within the implementation. In this case, if the effective address uses a coherent Cache Coherency Attribute (CCA), then the operation is globalized, meaning it is broadcast to all of the coherent caches within the system. If the effective address does not use one of the coherent CCAs, there is no broadcast of the operation. If multiple levels of caches are to be affected by one CACHE instruction, all of the affected cache levels must be processed in the same manner - either all affected cache levels use the globalized behavior or all affected cache levels use the non-globalized behavior.
Address Error. Bus Error. Cache Error. Coprocessor Unusable. Reserved Instruction on NMS cores without caches. Reserved Instruction for CACHEE if EVA not implemented. TLB Invalid. TLB Refill.
CLO rt, rs |
nanoMIPS, not available in NMS |
Count Leading Ones |
Count Leading Ones. Count leading ones in 32-bit register value $rs, placing the result in register $rt.
nanoMIPS, not available in NMS
001000 | rt | rs | 0100101 | 100 | 111 | 111
6 | 5 | 5 | 7 | 3 | 3 | 3

if C0.Config5.NMS == 1:
    raise exception('RI')
input = GPR[rs]
i = 0
while i < 32:
    if input[31 - i] != 1:
        break
    i += 1
GPR[rt] = i
Reserved Instruction on NMS cores.
CLZ rt, rs |
nanoMIPS, not available in NMS |
Count Leading Zeros |
Count Leading Zeros. Count leading zeros in 32-bit register value $rs, placing the result in register $rt.
nanoMIPS, not available in NMS
001000 | rt | rs | 0101101 | 100 | 111 | 111
6 | 5 | 5 | 7 | 3 | 3 | 3

if C0.Config5.NMS == 1:
    raise exception('RI')
input = GPR[rs]
i = 0
while i < 32:
    if input[31 - i] != 0:
        break
    i += 1
GPR[rt] = i
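As a self-contained illustration of the CLO/CLZ loops above, a Python sketch operating on a 32-bit unsigned value (the function name is illustrative):

def count_leading(x, bit):
    # Count how many copies of `bit` lead the 32-bit value x;
    # returns 32 when every bit matches (e.g. CLZ of 0).
    n = 0
    while n < 32 and ((x >> (31 - n)) & 1) == bit:
        n += 1
    return n

# count_leading(0x0000FFFF, 0) == 16   (CLZ)
# count_leading(0xFF000000, 1) == 8    (CLO)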
Reserved Instruction on NMS cores.
CRC32B rt, rs |
nanoMIPS. Optional, present when Config5.CRCP=1. |
CRC32 Byte. |
CRC32 Byte. Generate a 32-bit CRC value $rt based on the reversed polynomial 0xEDB88320, using cumulative 32-bit CRC value $rt and right-justified byte-sized message $rs as inputs.
nanoMIPS. Optional, present when Config5.CRCP=1.
001000 | rt | rs | x | 000 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=8, poly=0xEDB88320)
GPR[rt] = sign_extend(result, from_nbits=32)
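The crc32() helper is not defined in this excerpt; the following Python sketch shows the conventional bit-serial update for a reflected (LSB-first) CRC, which is one plausible model of the operation described above, assuming unsigned 32-bit inputs:

def crc32(value, message, nbits, poly):
    # XOR the right-justified message into the running CRC, then run one
    # shift-and-conditional-XOR step per message bit.
    crc = value ^ (message & ((1 << nbits) - 1))
    for _ in range(nbits):
        crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
    return crc & 0xFFFFFFFF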
Reserved Instruction on cores without CRC support.
CRC32CB rt, rs |
nanoMIPS. Optional, present when Config5.CRCP=1. |
CRC32 (Castagnoli) Byte |
CRC32 (Castagnoli) Byte. Generate a 32-bit CRC value $rt based on the reversed polynomial 0x82F63B78, using cumulative 32-bit CRC value $rt and right-justified byte-sized message $rs as inputs.
nanoMIPS. Optional, present when Config5.CRCP=1.
001000 | rt | rs | x | 100 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=8, poly=0x82F63B78)
GPR[rt] = sign_extend(result, from_nbits=32)
Reserved Instruction on cores without CRC support.
CRC32CH rt, rs |
nanoMIPS. Optional, present when Config5.CRCP=1. |
CRC32 (Castagnoli) Half |
CRC32 (Castagnoli) Half. Generate a 32-bit CRC value $rt based on the reversed polynomial 0x82F63B78, using cumulative 32-bit CRC value $rt and right-justified halfword-sized message $rs as inputs.
nanoMIPS. Optional, present when Config5.CRCP=1.
001000 | rt | rs | x | 101 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=16, poly=0x82F63B78)
GPR[rt] = sign_extend(result, from_nbits=32)
Reserved Instruction on cores without CRC support.
CRC32CW rt, rs |
nanoMIPS. Optional, present when Config5.CRCP=1. |
CRC32 (Castagnoli) Word |
CRC32 (Castagnoli) Word. Generate a 32-bit CRC value $rt based on the reversed polynomial 0x82F63B78, using cumulative 32-bit CRC value $rt and right-justified word-sized message $rs as inputs.
nanoMIPS. Optional, present when Config5.CRCP=1.
001000 | rt | rs | x | 110 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=32, poly=0x82F63B78)
GPR[rt] = sign_extend(result, from_nbits=32)
Reserved Instruction on cores without CRC support.
CRC32H rt, rs |
nanoMIPS. Optional, present when Config5.CRCP=1. |
CRC32 Half. |
CRC32 Half. Generate a 32-bit CRC value $rt based on the reversed polynomial 0xEDB88320, using cumulative 32-bit CRC value $rt and right-justified halfword-sized message $rs as inputs.
nanoMIPS. Optional, present when Config5.CRCP=1.
001000 | rt | rs | x | 001 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=16, poly=0xEDB88320)
GPR[rt] = sign_extend(result, from_nbits=32)
Reserved Instruction on cores without CRC support.
CRC32W rt, rs |
nanoMIPS. Optional, present when Config5.CRCP=1. |
CRC32 Word. |
CRC32 Word. Generate a 32-bit CRC value $rt based on the reversed polynomial 0xEDB88320, using cumulative 32-bit CRC value $rt and right-justified word-sized message $rs as inputs.
nanoMIPS. Optional, present when Config5.CRCP=1.
001000 | rt | rs | x | 010 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=32, poly=0xEDB88320)
GPR[rt] = sign_extend(result, from_nbits=32)
Reserved Instruction on cores without CRC support.
DERET |
nanoMIPS. Optional, present when Debug implemented. |
Debug Exception Return |
Debug Exception Return. Return from a debug exception by jumping to the address in the DEPC register, and clearing Debug.DM.
nanoMIPS. Optional, present when Debug implemented.
001000 | x | 11 | 10001 | 101 | 111 | 111
6 | 10 | 2 | 5 | 3 | 3 | 3

if C0.Config1.EP == 0:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
if C0.Debug.DM == 0:
    raise exception('RI')
CPU.next_pc = sign_extend(Root.C0.DEPC)
C0.Debug.DM = 0
# If single stepping, forward progress is allowed on the next instruction.
CPU.debug_sst_progress_allowed = True
clear_execution_hazards()
clear_instruction_hazards()
The DERET instruction implements a software barrier that resolves all execution and instruction hazards. See the EHB and JALRC.HB instructions for an explanation of execution and instruction hazards respectively, and also the SYNCI/SYNCIE instruction for additional information on resolving instruction hazards created by writing to the instruction stream.
The effects of the DERET barrier are seen starting with the fetch and decode of the instruction at the PC to which the DERET returns. This means, for instance, that if C0.DEPC is modified by an MTC0 instruction prior to a DERET, an EHB is required between the MTC0 and the DERET to ensure that the DERET uses the correct DEPC value.
The DERET instruction is only legal in debug mode and will give a Coprocessor Unusable exception when executed in user mode or a Reserved Instruction exception when executed in kernel mode.
Coprocessor Unusable. Reserved Instruction when not in Debug Mode or on cores without Debug support.
DI rt |
nanoMIPS. Requires CP0 privilege. |
Disable Interrupts |
Disable Interrupts. Disable interrupts by setting Status.IE to 0, and return the previous value of the Status register in register $rt.
nanoMIPS. Requires CP0 privilege.
001000 | rt | x | 01 | 00011 | 101 | 111 | 111
6 | 5 | 5 | 2 | 5 | 3 | 3 | 3

if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
GPR[rt] = C0.Status
C0.Status.IE = 0
Coprocessor Unusable.
DIV rd, rs, rt |
nanoMIPS |
Divide |
Divide. Divide signed word $rs by signed word $rt and place the result in $rd.
nanoMIPS
001000 | rt | rs | rd | x | 0100011 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

numerator = GPR[rs]
denominator = GPR[rt]
if denominator == 0:
    quotient, remainder = (UNKNOWN, UNKNOWN)
else:
    quotient, remainder = divide_integers(numerator, denominator)
GPR[rd] = sign_extend(quotient, from_nbits=32)
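divide_integers() is not defined in this excerpt; a plausible model, assuming the conventional MIPS semantics of truncation toward zero with the remainder taking the sign of the numerator:

def divide_integers(numerator, denominator):
    # Truncating (C-style) division: divide_integers(-7, 2) == (-3, -1).
    quotient = abs(numerator) // abs(denominator)
    if (numerator < 0) != (denominator < 0):
        quotient = -quotient
    return quotient, numerator - quotient * denominator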
None.
DIVU rd, rs, rt |
nanoMIPS |
Divide Unsigned |
Divide Unsigned. Divide unsigned word $rs by unsigned word $rt and place the result in register $rd.
nanoMIPS
001000 | rt | rs | rd | x | 0110011 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

numerator = zero_extend(GPR[rs], from_nbits=32)
denominator = zero_extend(GPR[rt], from_nbits=32)
if denominator == 0:
    quotient, remainder = (UNKNOWN, UNKNOWN)
else:
    quotient, remainder = divide_integers(numerator, denominator)
GPR[rd] = sign_extend(quotient, from_nbits=32)
None.
DVP rt |
nanoMIPS. Optional, present when Config5.VP=1, otherwise NOP. Requires CP0 privilege. |
Disable Virtual Processors |
Disable Virtual Processors. Disable all virtual processors in a physical core other than the one that issued the instruction. Set VPControl.DIS to 1, and place the previous value of the VPControl CP0 register in register $rt.
nanoMIPS. Optional, present when Config5.VP=1, otherwise NOP. Requires CP0 privilege.
001000 | rt | x | 00000 | 0 | 1110010 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

if C0.Config5.VP == 0:
    pass  # No operation when VP not implemented
else:
    if not IsCoprocessor0Enabled():
        raise coprocessor_exception(0)
    GPR[rt] = C0.VPControl
    C0.VPControl.DIS = 1
    disable_virtual_processors()
The DVP instruction is used to halt instruction fetch for all virtual processors in a VP core, other than the one which issued the DVP instruction. Possible uses for DVP include:
Performing cache operations where the cache state must not be affected by the actions of other threads on the same core.
Reprogramming virtual processor scheduling priority.
All outstanding instructions for the affected virtual processors must be complete before the DVP itself is allowed to retire. Any outstanding events such as hardware instruction or data prefetch, or page-table walks, must also be terminated.
Memory ordering equivalent to that provided by SYNC(stype=0) is guaranteed between subsequent instructions on the virtual processor which issued the DVP, and instructions which have already graduated on the disabled virtual processors.
If a virtual processor is already disabled by another event, for instance, if it has executed a WAIT or a PAUSE instruction or has been halted by some external hardware event, then the disabled virtual processor will not be re-enabled until both an EVP instruction has been executed on the controlling thread, and an event which would otherwise have woken the virtual processor (such as an interrupt for a WAIT instruction or an interrupt or clearing of the LLBit for a PAUSE instruction) has also occurred.
The effect of a DVP instruction is undone by an EVP instruction, which causes execution to resume immediately (where applicable) on all other virtual processors. From the perspective of the disabled
virtual processors, after the EVP, execution continues as though the DVP had not occurred.
If an event occurs in between the DVP and EVP that renders state of a disabled virtual processor UNPREDICTABLE (such as power-gating), then the effect of EVP is UNPREDICTABLE.
A disabled virtual processor cannot be woken by an interrupt or a deferred exception, at least until execution is re-enabled by an EVP instruction on the controlling thread. The virtual processor that executes the DVP, however, continues to be interruptible.
A DVP which is executed when VPControl.DIS=1 will return the current value of the VPControl register but otherwise will leave the other virtual processors in a disabled state. Software should only re-enable
virtual processors (via the EVP instruction) if it has verified from the VPControl value returned by the DVP that virtual processors were previously enabled. Performing this check allows DVP/EVP pairs to
be safely nested.
In a core with multiple virtual processors, more than one virtual processor may execute a DVP simultaneously. The implementation should ensure that the selection of which virtual processor’s DVP successfully graduates is not biased towards any one virtual processor, in order to prevent the possibility
of live-lock.
The DVP instruction behaves like a NOP on cores which do not implement virtual processors (i.e. when Config5.VP=0). This behavior allows kernel code to enclose critical sequences within DVP/EVP blocks without first checking whether it is running on a VP core. The encoding of the DVP instruction is equivalent to a SLTU instruction targeting $0, i.e. a NOP, which leads to the correct behavior on non-VP cores with no additional hardware special casing.
Coprocessor Unusable.
EHB |
nanoMIPS |
Execution hazard barrier |
Execution hazard barrier. Clear all execution hazards before allowing any subsequent instructions to graduate.
nanoMIPS
100000 | 00000 | x | 1100 | x | 0000 | 00011
6 | 5 | 5 | 4 | 3 | 4 | 5
clear_execution_hazards()
The EHB instruction creates an execution hazard barrier, meaning that it ensures that subsequent instructions will be aware of changes to CP0 state caused by prior instructions. Examples of instructions which change CP0 state and which need an execution hazard barrier to ensure that subsequent instructions see those updates are MTC0, EI, DI, TLBR and CACHE/CACHEE.
In the absence of an execution hazard barrier, the CP0 register value used as input to an instruction may be out of date, since it may have been read before the write to the CP0 register by a prior instruction
has actually been committed.
An execution hazard barrier is sufficient to ensure that a fetched instruction is aware of all prior CP0 updates. However, it is not sufficient to ensure that the correct instruction is being fetched as a result of those CP0 updates. Ensuring that the correct instruction is fetched requires an instruction hazard barrier, which is provided by the JALRC.HB instruction, or any of the exception return instructions ERET/ERETNC or DERET.
None.
EI rt |
nanoMIPS. Requires CP0 privilege. |
Enable Interrupts |
Enable Interrupts. Enable interrupts by setting Status.IE to 1, and return the previous value of the Status register in register $rt.
nanoMIPS. Requires CP0 privilege.
001000 | rt | x | 01 | 01011 | 101 | 111 | 111
6 | 5 | 5 | 2 | 5 | 3 | 3 | 3

if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
GPR[rt] = C0.Status
C0.Status.IE = 1
Coprocessor Unusable.
ERET |
nanoMIPS, availability varies by format. |
Exception Return/Exception Return Not Clearing LLBit |
ERETNC |
nanoMIPS, availability varies by format. |
Exception Return/Exception Return Not Clearing LLBit |
Exception Return/Exception Return Not Clearing LLBit. Return from an exception: either by clearing Status.ERL if set and jumping to the address in ErrorEPC; otherwise by clearing Status.EXL, jumping to the address in EPC, and updating the current Shadow Register Set to SRSCtl.PSS if required.
nanoMIPS, availability varies by format.
001000 | x | 0 | 11 | 11001 | 101 | 111 | 111
6 | 9 | 1 | 2 | 5 | 3 | 3 | 3

nc = False

001000 | x | 1 | 11 | 11001 | 101 | 111 | 111
6 | 9 | 1 | 2 | 5 | 3 | 3 | 3

nc = True

if nc and C0.Config5.LLB == 0:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
if C0.Status.ERL == 1:
    effective_epc = sign_extend(C0.ErrorEPC)
    C0.Status.ERL = 0
else:
    effective_epc = sign_extend(C0.EPC)
    C0.Status.EXL = 0
    if C0.SRSCtl.HSS > 0 and C0.Status.BEV == 0:
        C0.SRSCtl.CSS = C0.SRSCtl.PSS
CPU.next_pc = effective_epc
# Clear LLBit unless this is an ERETNC
if not nc:
    C0.LLAddr.LLB = 0
clear_execution_hazards()
clear_instruction_hazards()
The ERET/ERETNC instructions implement a software barrier that resolves all execution and instruction hazards. See the EHB and JALRC.HB instructions for an explanation of execution and instruction hazards respectively, and also the SYNCI/SYNCIE instruction for additional information on resolving instruction hazards created by writing to the instruction stream.
The effects of the ERET/ERETNC barrier are seen starting with the fetch and decode of the instruction at the PC to which the ERET returns. This means, for instance, that if C0.EPC is modified by an MTC0
instruction prior to an ERET, an EHB is required between the MTC0 and the ERET to ensure that the ERET uses the correct EPC value.
Config5.LLB indicates support for the ERETNC instruction. It is always 1 for R6 cores, except for those implementing the nanoMIPS™ subset. In other words, ERETNC is required for nanoMIPS™ cores and optional for NMS cores.
Coprocessor Unusable. Reserved Instruction allowed for ERETNC on NMS cores.
EVP rt |
nanoMIPS. Optional, present when Config5.VP=1, otherwise NOP. Requires CP0 privilege. |
Enable Virtual Processors |
Enable Virtual Processors. Enable all virtual processors in a physical core. Set VPControl.DIS to 0, and place the previous value of the VPControl CP0 register in register $rt.
nanoMIPS. Optional, present when Config5.VP=1, otherwise NOP. Requires CP0 privilege.
001000 | rt | x | 00000 | 1 | 1110010 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

if C0.Config5.VP == 0:
    pass  # No operation when VP not implemented
else:
    if not IsCoprocessor0Enabled():
        raise coprocessor_exception(0)
    GPR[rt] = C0.VPControl
    C0.VPControl.DIS = 0
    enable_virtual_processors()
The EVP instruction is used on VP cores to undo the effect of a DVP instruction, and the reader should refer to the DVP description for details regarding its usage.
The EVP instruction behaves like a NOP on cores which do not implement virtual processors (i.e. when Config5.VP=0). This behavior allows kernel code to enclose critical sequences within DVP/EVP blocks without first checking whether it is running on a VP core. The encoding of the EVP instruction is equivalent to a SLTU instruction targeting $0, i.e. a NOP, which leads to the correct behavior on non-VP cores with no additional hardware special casing.
Coprocessor Unusable.
EXT rt, rs, pos, size |
nanoMIPS, not available in NMS |
Extract |
Extract. Extract a bit field of size size at position pos from register $rs and store it right justified into register $rt.
nanoMIPS, not available in NMS
100000 | rt | rs | 1111 | 0 | msbd | 0 | lsb
6 | 5 | 5 | 4 | 1 | 5 | 1 | 5

if C0.Config5.NMS == 1:
    raise exception('RI')
pos = lsb
size = msbd + 1
if pos + size > 32:
    raise UNPREDICTABLE()
result = zero_extend(GPR[rs] >> pos, from_nbits=size)
GPR[rt] = sign_extend(result, from_nbits=32)
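A short worked example of the extract semantics in Python, assuming unsigned 32-bit register values (the helper name is illustrative):

def ext(rs, pos, size):
    # Right-justify the size-bit field found at bit position pos.
    return (rs >> pos) & ((1 << size) - 1)

# ext(0x0000ABC0, 4, 8) == 0xBC   (bits 11..4 of the source)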
Reserved Instruction on NMS cores.
EXTW rd, rs, rt, shift |
nanoMIPS |
Extract Word |
Extract Word. Concatenate the 32 bit values in registers $rt and $rs, extract the word at specified bit position shift, and place the result in register $rd.
nanoMIPS
001000 | rt | rs | rd | shift | 011 | 111
6 | 5 | 5 | 5 | 5 | 3 | 3

tmp = GPR[rt][31:0] @ GPR[rs][31:0]
result = tmp >> shift
GPR[rd] = sign_extend(result, from_nbits=32)
None.
GINVI rs |
nanoMIPS. Optional, present when Config5.GI >= 2. Requires CP0 privilege. |
Globally Invalidate Instruction caches |
Globally Invalidate Instruction caches.
nanoMIPS. Optional, present when Config5.GI >= 2. Requires CP0 privilege.
001000 | x | rs | 00 | 01111 | 101 | 111 | 111
6 | 5 | 5 | 2 | 5 | 3 | 3 | 3

if C0.Config5.GI < 2:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
if GPR[rs] == 0:
    cores = get_all_cores_in_system()
else:
    cores = implementation_dependent_ginvi_cores(GPR[rs])
for core in cores:
    # Find encoded line size, sets, and associativity for the target cache.
    (L, S, A) = get_cache_parameters('I', core)
    num_sets = 2 ** (S + 6)
    num_ways = A + 1
    for way_index in range(num_ways):
        for set_index in range(num_sets):
            cache_line = get_cache_line('I', way_index, set_index, core)
            cache_line.valid = False
When $rs is 0, GINVI fully invalidates all instruction caches of all cores in the system, including the local instruction cache. For non-zero $rs values, GINVI invalidates the instruction cache of a specific, implementation dependent core in the system.
The GINVI instruction must be followed by a SYNC (stype=0x14) and an instruction hazard barrier (e.g. JRC.HB) to ensure that all instruction caches in the system have been invalidated.
Coprocessor Unusable. Reserved Instruction if Global Invalidate I-cache not implemented.
GINVT rs, type |
nanoMIPS. Optional, present when Config5.GI=3. Requires CP0 privilege. |
Globally invalidate TLBs |
Globally invalidate TLBs.
nanoMIPS. Optional, present when Config5.GI=3. Requires CP0 privilege.
001000 | x | type | rs | 00 | 00111 | 101 | 111 | 111
6 | 3 | 2 | 5 | 2 | 5 | 3 | 3 | 3

if C0.Config5.GI != 3:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
if not C0.Config5.MI:
    raise exception('RI', 'Config5.MI not set')
ginvt(type, va=GPR[rs])
Perform type invalidation of all TLBs in the system, where type is one of:
type=0: invALL - invalidate all non wired entries.
type=1: invVA - invalidate all entries which match the VA specified by $rs.
type=2: invMMID - invalidate all entries which match C0.MemoryMapID.MMID and are not global.
type=3: invVAMMID - invalidate all entries which match the VA specified by $rs and either match C0.MemoryMapID or are global.
The GINVT instruction must be followed by a SYNC (stype=0x14) and an instruction hazard barrier (e.g. JRC.HB) to ensure that matching entries have been removed from all TLBs in the system and that all instructions in the instruction stream can only access the new context.
invMMID and invVAMMID operations use the C0.MemoryMapID value of the currently running process. The kernel must save/restore C0.MemoryMapID appropriately before it modifies it for the invalidation
operation. Between the save and restore, it must utilize unmapped addresses.
Coprocessor Unusable. Reserved Instruction if Global Invalidate TLB not implemented. Reserved Instruction if MemoryMapID not enabled (i.e. Config5.MI==0).
INS rt, rs, pos, size |
nanoMIPS, not available in NMS |
Insert |
Insert. Merge a right justified bit field of size size from register $rs into position pos of register $rt.
nanoMIPS, not available in NMS
100000 | rt | rs | 1110 | 0 | msbd | 0 | lsb
6 | 5 | 5 | 4 | 1 | 5 | 1 | 5

if C0.Config5.NMS == 1:
    raise exception('RI')
pos = lsb
size = 1 + msbd - lsb
if size < 1:
    raise UNPREDICTABLE()
merge_mask = ((1 << size) - 1) << pos
result = (GPR[rt] & ~merge_mask | (GPR[rs] << pos) & merge_mask)
GPR[rt] = sign_extend(result, from_nbits=32)
The INS instruction is not available on NMS cores. It can be emulated using a sequence of three EXTW instructions:
INS rt, rs, pos, size
can be emulated using the following sequence of instructions (provided rt is not equal to rs):
EXTW rt, rt, rt, pos
EXTW rt, rt, rs, size
EXTW rt, rt, rt, 32-size-pos
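A Python cross-check of this emulation against the INS definition above; the helper mirrors the EXTW pseudocode (low 32 bits of the concatenation {rt, rs} shifted right), and registers are modeled as unsigned Python integers:

def extw(rs, rt, shift):
    # EXTW rd, rs, rt, shift: rt supplies the upper word, rs the lower.
    tmp = ((rt & 0xFFFFFFFF) << 32) | (rs & 0xFFFFFFFF)
    return (tmp >> shift) & 0xFFFFFFFF

def ins_via_extw(rt, rs, pos, size):
    rt = extw(rt, rt, pos)                # rotate rt right by pos
    rt = extw(rt, rs, size)               # shift the field in from rs
    return extw(rt, rt, 32 - size - pos)  # rotate the field into place

# ins_via_extw(0xFFFFFFFF, 0x00000000, 8, 8) == 0xFFFF00FF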
Reserved Instruction on NMS cores.
JALRC.HB rt, rs |
nanoMIPS |
Jump And Link Register, Compact, with Hazard Barrier |
Jump And Link Register, Compact, with Hazard Barrier. Unconditional jump to address in register $rs, placing the return address in register $rt. Clear all instruction and execution hazards before allowing any subsequent instructions to graduate.
nanoMIPS
010010 | rt | rs | 0001 | x
6 | 5 | 5 | 4 | 12

address = GPR[rs] + 0
GPR[rt] = CPU.next_pc
CPU.next_pc = address
clear_instruction_hazards()
clear_execution_hazards()
The JALRC.HB instruction creates an instruction hazard barrier, meaning that it ensures that subsequent instruction fetches will be aware of state changes caused by prior instructions. Examples of state changes which affect instruction fetch and which need an instruction hazard barrier to ensure that subsequent instructions see those updates are:
Writes to the instruction stream (which must also have been synchronized by a SYNCI/SYNCIE and a SYNC).
Updates to the TLB.
Changes in CP0 state which affect address mappings.
In the absence of an instruction hazard barrier, the state used as input to an instruction fetch may be out of date, since it may have been read before the updates to that state have actually completed.
JALRC.HB also provides an execution hazard barrier; see the EHB instruction definition for details. An instruction hazard barrier is also provided by any of the exception return instructions ERET/ERETNC or DERET, but those instructions are only available to privileged software, whereas JALRC.HB is available from all operating modes.
None.
JALRC dst, src |
nanoMIPS |
Jump And Link Register, Compact |
Jump And Link Register, Compact. Unconditional jump to address in register $src, placing the return address in register $dst.
nanoMIPS
010010 | rt | rs | 0000 | x
6 | 5 | 5 | 4 | 12

src = rs
dst = rt

110110 | rt | 1 | 0000
6 | 5 | 1 | 4

src = rt
dst = 31

address = GPR[src] + 0
GPR[dst] = CPU.next_pc
CPU.next_pc = address
None.
JRC rt |
nanoMIPS |
Jump Register, Compact |
Jump Register, Compact. Unconditional jump to address in register $rt.
nanoMIPS
110110 | rt | 0 | 0000
6 | 5 | 1 | 4

address = GPR[rt]
CPU.next_pc = address
None.
LAPC rt, address |
Assembly alias. NMS cores restricted to 21 bit signed offset from PC. |
Load Address, PC relative |
Load Address, PC relative. Load PC relative address to register $rt.
Assembly alias. NMS cores restricted to 21 bit signed offset from PC.
address = $PC + imm (imm in 21 bit signed range): ADDIUPC[32] rt, imm
address = $PC + imm (imm in 32 bit signed range): ADDIUPC[48] rt, imm
LAPC uses the ADDIUPC instruction to load a PC relative address into register $rt. In order to determine the correct immediate value for the ADDIUPC instruction, the assembler must assume a value for the PC that the instruction will be executed from. If the instruction is executed from a different PC, then the generated address will be shifted by a PC relative amount.
LB rt, offset(rs) |
nanoMIPS |
Load Byte |
Load Byte. Load signed byte to register $rt from memory address $rs + offset (register plus immediate).
nanoMIPS
100001 | rt | rs | 0000 | u
6 | 5 | 5 | 4 | 12

offset = u

010111 | rt3 | rs3 | 00 | u
6 | 3 | 3 | 2 | 2

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
offset = u

010001 | rt | 000 | u
6 | 5 | 3 | 18

rs = 28
offset = u

101001 | rt | rs | s[8] | 0000 | 0 | 00 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

offset = sign_extend(s, from_nbits=9)

va = effective_address(GPR[rs], offset, 'Load')
data = read_memory_at_va(va, nbytes=1)
GPR[rt] = sign_extend(data, from_nbits=8)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LBE rt, offset(rs) |
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege. |
Load Byte using EVA addressing |
Load Byte using EVA addressing. Load signed byte to register $rt from virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.
101001 | rt | rs | s[8] | 0000 | 0 | 10 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

offset = sign_extend(s, from_nbits=9)
if not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Load', eva=True)
data = read_memory_at_va(va, nbytes=1, eva=True)
GPR[rt] = sign_extend(data, from_nbits=8)
Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LBU rt, offset(rs) |
nanoMIPS |
Load Byte Unsigned |
Load Byte Unsigned. Load unsigned byte to register $rt from memory address $rs + offset (register plus immediate).
nanoMIPS
100001 | rt | rs | 0010 | u
6 | 5 | 5 | 4 | 12

offset = u

010111 | rt3 | rs3 | 10 | u
6 | 3 | 3 | 2 | 2

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
offset = u

010001 | rt | 010 | u
6 | 5 | 3 | 18

rs = 28
offset = u

101001 | rt | rs | s[8] | 0010 | 0 | 00 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

offset = sign_extend(s, from_nbits=9)

va = effective_address(GPR[rs], offset, 'Load')
GPR[rt] = read_memory_at_va(va, nbytes=1)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LBUE rt, offset(rs) |
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege. |
Load Byte Unsigned using EVA addressing |
Load Byte Unsigned using EVA addressing. Load unsigned byte to register $rt from virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.
101001 | rt | rs | s[8] | 0010 | 0 | 10 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

offset = sign_extend(s, from_nbits=9)
if not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Load', eva=True)
GPR[rt] = read_memory_at_va(va, nbytes=1, eva=True)
Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LBUX rd, rs(rt) |
nanoMIPS |
Load Byte Unsigned indeXed |
Load Byte Unsigned indeXed. Load unsigned byte to register $rd from memory address $rt + $rs (register plus register).
nanoMIPS
001000 |
rt |
rs |
rd |
0010 |
0 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
va = effective_address(GPR[rs], GPR[rt], 'Load') GPR[rd] = read_memory_at_va(va, nbytes=1)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LBX rd, rs(rt) |
nanoMIPS |
Load Byte indeXed |
Load Byte indeXed. Load signed byte to register $rd from memory address $rt + $rs (register plus register).
nanoMIPS
001000 |
rt |
rs |
rd |
0000 |
0 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
va = effective_address(GPR[rs], GPR[rt], 'Load') data = read_memory_at_va(va, nbytes=1) GPR[rd] = sign_extend(data, from_nbits=8)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LH rt, offset(rs) |
nanoMIPS |
Load Half |
Load Half. Load signed halfword to register $rt from memory address $rs + offset (register plus immediate).
nanoMIPS
100001 |
rt |
rs |
0100 |
u |
6 |
5 |
5 |
4 |
12 |
offset = u
011111 |
rt3 |
rs3 |
0 |
u[2:1] |
0 |
6 |
3 |
3 |
1 |
2 |
1 |
rt = decode_gpr(rt3, 'gpr3') rs = decode_gpr(rs3, 'gpr3') offset = u
010001 |
rt |
100 |
u[17:1] |
0 |
6 |
5 |
3 |
17 |
1 |
rs = 28 offset = u
101001 |
rt |
rs |
s[8] |
0100 |
0 |
00 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9)
va = effective_address(GPR[rs], offset, 'Load') data = read_memory_at_va(va, nbytes=2) GPR[rt] = sign_extend(data, from_nbits=16)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LHE rt, offset(rs) |
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege. |
Load Half using EVA addressing |
Load Half using EVA addressing. Load signed halfword to register $rt from virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.
101001 |
rt |
rs |
s[8] |
0100 |
0 |
10 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
if not C0.Config5.EVA: raise exception('RI') if not IsCoprocessor0Enabled(): raise coprocessor_exception(0) offset = sign_extend(s, from_nbits=9) va = effective_address(GPR[rs], offset, 'Load', eva=True) data = read_memory_at_va(va, nbytes=2, eva=True) GPR[rt] = sign_extend(data, from_nbits=16)
Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LHU rt, offset(rs) |
nanoMIPS |
Load Half Unsigned |
Load Half Unsigned. Load unsigned halfword to register $rt from memory address $rs + offset (register plus immediate).
nanoMIPS
100001 |
rt |
rs |
0110 |
u |
6 |
5 |
5 |
4 |
12 |
offset = u
011111 |
rt3 |
rs3 |
1 |
u[2:1] |
0 |
6 |
3 |
3 |
1 |
2 |
1 |
rt = decode_gpr(rt3, 'gpr3') rs = decode_gpr(rs3, 'gpr3') offset = u
010001 |
rt |
100 |
u[17:1] |
1 |
6 |
5 |
3 |
17 |
1 |
rs = 28 offset = u
101001 |
rt |
rs |
s[8] |
0110 |
0 |
00 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9)
va = effective_address(GPR[rs], offset, 'Load') GPR[rt] = read_memory_at_va(va, nbytes=2)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LHUE rt, offset(rs) |
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege. |
Load Half Unsigned using EVA addressing |
Load Half Unsigned using EVA addressing. Load unsigned halfword to register $rt from virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.
101001 |
rt |
rs |
s[8] |
0110 |
0 |
10 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
if not C0.Config5.EVA: raise exception('RI') if not IsCoprocessor0Enabled(): raise coprocessor_exception(0) offset = sign_extend(s, from_nbits=9) va = effective_address(GPR[rs], offset, 'Load', eva=True) GPR[rt] = read_memory_at_va(va, nbytes=2, eva=True)
Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LHUX rd, rs(rt) |
nanoMIPS |
Load Half Unsigned indeXed |
Load Half Unsigned indeXed. Load unsigned halfword to register $rd from memory address $rt + $rs (register plus register).
nanoMIPS
001000 |
rt |
rs |
rd |
0110 |
0 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
va = effective_address(GPR[rs], GPR[rt], 'Load') GPR[rd] = read_memory_at_va(va, nbytes=2)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LHUXS rd, rs(rt) |
nanoMIPS |
Load Half Unsigned indeXed Scaled |
Load Half Unsigned indeXed Scaled. Load unsigned halfword to register $rd from memory address $rt + 2*$rs (register plus scaled register).
nanoMIPS
001000 |
rt |
rs |
rd |
0110 |
1 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
va = effective_address(GPR[rs]<<1, GPR[rt], 'Load') GPR[rd] = read_memory_at_va(va, nbytes=2)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LHX rd, rs(rt) |
nanoMIPS |
Load Half indeXed |
Load Half indeXed. Load signed halfword to register $rd from memory address $rt + $rs(register plus register).
nanoMIPS
001000 |
rt |
rs |
rd |
0100 |
0 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
va = effective_address(GPR[rs], GPR[rt], 'Load') data = read_memory_at_va(va, nbytes=2) GPR[rd] = sign_extend(data, from_nbits=16)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LHXS rd, rs(rt) |
nanoMIPS |
Load Half indeXed Scaled |
Load Half indeXed Scaled. Load signed halfword to register $rd from memory address $rt + 2*$rs (register plus scaled register).
nanoMIPS
001000 |
rt |
rs |
rd |
0100 |
1 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
va = effective_address(GPR[rs]<<1, GPR[rt], 'Load') data = read_memory_at_va(va, nbytes=2) GPR[rd] = sign_extend(data, from_nbits=16)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LI rt, s |
nanoMIPS, availability varies by format. |
Load Immediate |
Load Immediate. Load immediate value s to register $rt.
nanoMIPS, availability varies by format.
110100 |
rt3 |
eu |
6 |
3 |
7 |
rt = decode_gpr(rt3, 'gpr3') s = -1 if eu == 127 else eu not_in_nms = False
011000 |
rt |
00000 |
s[15:0] |
s[31:16] |
6 |
5 |
5 |
16 |
16 |
s = sign_extend(s[31:16] @ s[15:0]) not_in_nms = True
if not_in_nms and C0.Config5.NMS == 1: raise exception('RI') GPR[rt] = s
Reserved Instruction for LI[48] format on NMS cores.
LL rt, offset(rs) |
nanoMIPS, availability varies by format. |
Load Linked word/Load Linked word using EVA addressing/Load Linked Word Pair/Load Linked Word Pair using EVA addressing |
LLE rt, offset(rs) |
nanoMIPS, availability varies by format. |
Load Linked word/Load Linked word using EVA addressing/Load Linked Word Pair/Load Linked Word Pair using EVA addressing |
LLWP rt, ru, (rs) |
nanoMIPS, availability varies by format. |
Load Linked word/Load Linked word using EVA addressing/Load Linked Word Pair/Load Linked Word Pair using EVA addressing |
LLWPE rt, ru, (rs) |
nanoMIPS, availability varies by format. |
Load Linked word/Load Linked word using EVA addressing/Load Linked Word Pair/Load Linked Word Pair using EVA addressing |
Load Linked word/Load Linked word using EVA addressing/Load Linked Word Pair/Load Linked Word Pair using EVA addressing. For LL/LLE, load word for atomic RMW to register $rt from address $rs + offset (register plus immediate). For LLWP/LLWPE, load words for atomic RMW to
registers $rt and $ru from address $rs. For LLE/LLWPE, translate the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS, availability varies by format.
101001 |
rt |
rs |
s[8] |
1010 |
0 |
01 |
s[7:2] |
00 |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
6 |
2 |
offset = sign_extend(s, from_nbits=9) nbytes = 4 is_eva = False
101001 |
rt |
rs |
s[8] |
1010 |
0 |
10 |
s[7:2] |
00 |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
6 |
2 |
offset = sign_extend(s, from_nbits=9) nbytes = 4 is_eva = True
101001 |
rt |
rs |
x |
1010 |
0 |
01 |
ru |
x |
01 |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
5 |
1 |
2 |
offset = 0 nbytes = 8 is_eva = False
101001 |
rt |
rs |
x |
1010 |
0 |
10 |
ru |
x |
01 |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
5 |
1 |
2 |
offset = 0 nbytes = 8 is_eva = True
if nbytes == 8 and C0.Config5.XNP: raise exception('RI', 'LLWP[E] requires word-paired support') if is_eva and not C0.Config5.EVA: raise exception('RI') va = effective_address(GPR[rs], offset, 'Load', eva=is_eva) # Linked access must be aligned. if va & (nbytes-1): raise exception('ADEL', badva=va) pa, cca = va2pa(va, 'Load', eva=is_eva) if (cca == 2 or cca == 7) and not C0.Config5.ULS: raise UNPREDICTABLE('uncached CCA not synchronizable when Config5.ULS=0') # (Preferred behavior for a non-synchronizable address is Bus Error). # Indicate that there is an active RMW sequence on this processor. C0.LLAddr.LLB = 1 # Save target address of active RMW sequence. record_linked_address(va, pa, cca, nbytes=nbytes) data = read_memory(va, pa, cca, nbytes=nbytes) if nbytes == 4: # LL/LLE GPR[rt] = sign_extend(data, from_nbits=32) else: # LLWP/LLWPE word0 = data[63:32] if C0.Config.BE else data[31:0] word1 = data[31:0] if C0.Config.BE else data[63:32] if rt == ru: raise UNPREDICTABLE() GPR[rt] = sign_extend(word0, from_nbits=32) GPR[ru] = sign_extend(word1, from_nbits=32)
The LL/LLE/LLWP/LLWPE instructions are used to initiate an atomic read-modify-write sequence. C0.LLAddr.LLB is set to 1, indicating that there is an active RMW sequence on the current processor,
and an implementation dependent set of state is saved which indicates the address and access type of the active RMW sequence. There can be only one active RMW sequence per processor.
The RMW sequence will be completed by a matching SC/SCE/SCWP/SCWPE instruction. The store-conditional instruction will only complete if the system can guarantee that the accessed memory location has not been modified since the load-linked instruction occurred, as discussed in more detail
in the SC/SCE/SCWP/SCWPE instruction description.
The address and CCA targeted by the LL/LLE/LLWP/LLWPE must be synchronizable by all processors and I/O devices sharing the location; if it is not, the result is UNPREDICTABLE. Which storage is
synchronizable is a function of both CPU and system implementations - see the SC/SCE/SCWP/SCWPE
instruction for the formal definition. The preferred behavior for a load-linked instruction which attempts to access an address which is not synchronizable is a Bus Error exception.
If Config5.ULS is set, then the system supports uncached load-linked/store-conditional accesses. Otherwise, the result of uncached accesses is unpredictable.
A LL/LLE/LLWP/LLWPE instruction on one processor must not take action that, by itself, causes a
store-conditional instruction for the same block on another processor to fail. For example, if an implementation depends on retaining the data in the cache during the RMW sequence, cache misses caused
by a load-linked instruction must not fetch data in the exclusive state, since that would remove it from another core's cache if it were present.
An execution of a load-linked instruction does not have to be followed by execution of a store-conditional instruction; a program is free to abandon the RMW sequence without attempting a write.
Support for the paired word instructions LLWP/LLWPE is indicated by the Config5.XNP bit. Paired word support is required for nanoMIPS™ cores, except for NMS cores, where it is optional.
The result of LLWP/LLWPE is unpredictable if $rt and $ru are the same register.
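As a non-normative illustration of the RMW protocol described above, the following Python sketch models a simplified link monitor: the load-linked records the linked address and sets LLB, any intervening store to that address breaks the link, and the store-conditional succeeds only if the link is still intact. Block granularity and the other failure conditions described in the SC/SCE/SCWP/SCWPE entry are abstracted away.

class LinkMonitor:
    """Toy model of one processor's LL/SC state (LLB plus linked address)."""
    def __init__(self, memory):
        self.memory = memory      # dict: address -> word
        self.llb = 0              # models C0.LLAddr.LLB
        self.linked_addr = None

    def ll(self, addr):
        self.llb = 1              # active RMW sequence begins
        self.linked_addr = addr
        return self.memory[addr]

    def snoop_store(self, addr, value):
        # A store (from any agent) to the linked location breaks the link.
        self.memory[addr] = value
        if self.llb and addr == self.linked_addr:
            self.llb = 0

    def sc(self, addr, value):
        success = self.llb and addr == self.linked_addr
        if success:
            self.memory[addr] = value
        self.llb = 0              # SC always clears LLB
        return 1 if success else 0

m = LinkMonitor({0x1000: 5})
old = m.ll(0x1000)
assert m.sc(0x1000, old + 1) == 1      # uncontended: store succeeds
old = m.ll(0x1000)
m.snoop_store(0x1000, 99)              # conflicting store breaks the link
assert m.sc(0x1000, old + 1) == 0      # store-conditional fails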
Address Error. Bus Error. Coprocessor Unusable for LLE/LLWPE. Reserved Instruction for LLE/LLWPE if EVA not implemented. Reserved Instruction for LLWP/LLWPE if load linked pair not implemented.
TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LSA rd, rs, rt, u2 |
nanoMIPS |
Load Scaled Address |
Load Scaled Address. Add register $rs scaled by a left shift u2 to register $rt and place the 32-bit result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
u2 |
x |
001 |
111 |
6 |
5 |
5 |
5 |
2 |
3 |
3 |
3 |
sum = (GPR[rs] << u2) + GPR[rt] GPR[rd] = sign_extend(sum, from_nbits=32)
In nanoMIPS™, the shift field directly encodes the shift amount, meaning that the supported shift values are in the range 0 to 3 (instead of 1 to 4 in MIPSR6™).
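For example, LSA is convenient for indexing arrays of power-of-two-sized elements. A minimal, non-normative Python sketch of the computation (the 32-bit wrap is shown explicitly; the values are illustrative):

def lsa(rs, rt, u2):
    # GPR[rd] = (GPR[rs] << u2) + GPR[rt], truncated to 32 bits
    return ((rs << u2) + rt) & 0xFFFFFFFF

base = 0x80001000            # illustrative start of a word array
index = 7
addr = lsa(index, base, 2)   # element address = base + 4*index
assert addr == base + 4 * index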
None.
LUI rt, %hi(imm) |
nanoMIPS |
Load Upper Immediate. |
Load Upper Immediate. Load upper 20 bits of immediate value imm to upper 20 bits of register $rt, and set the lower 12 bits to zero.
nanoMIPS
111000 |
rt |
s[20:12] |
s[30:21] |
0 |
s[31] |
6 |
5 |
9 |
10 |
1 |
1 |
imm = sign_extend(s, from_nbits=32)
GPR[rt] = imm
For backwards compatibility, instances of LUI which use a literal value for the immediate will be treated as containing a 16-bit immediate which should be loaded into the upper 16 bits of the target register.
To access the upper 20 bits of the register, the '%hi(imm)' form of the immediate must be used.
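As a sketch of that convention: %hi(imm) selects the upper 20 bits, which LUI places in bits 31..12, and the remaining 12 low bits can be filled in with ORI, which zero-extends its immediate. Illustrative Python, not normative:

def hi20(imm):
    # Value deposited by LUI rt, %hi(imm): upper 20 bits, low 12 bits zero.
    return imm & 0xFFFFF000

def lo12(imm):
    # 12-bit immediate for a following ORI.
    return imm & 0x00000FFF

imm = 0x12345ABC
assert hi20(imm) | lo12(imm) == imm   # LUI + ORI rebuilds the constant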
None.
LW rt, offset(rs) |
nanoMIPS, availability varies by format. |
Load Word |
Load Word. Load word to register $rt from memory address $rs + offset (register plus immediate).
nanoMIPS, availability varies by format.
100001 |
rt |
rs |
1000 |
u |
6 |
5 |
5 |
4 |
12 |
offset = u
000101 |
rt3 |
rs3 |
u[5:2] |
6 |
3 |
3 |
4 |
rt = decode_gpr(rt3, 'gpr3') rs = decode_gpr(rs3, 'gpr3') offset = u
011101 |
rt4[3] |
u[2] |
rt4[2:0] |
rs4[3] |
u[3] |
rs4[2:0] |
6 |
1 |
1 |
3 |
1 |
1 |
3 |
if C0.Config5.NMS == 1: raise exception('RI') rt = decode_gpr(rt4[3] @ rt4[2:0], 'gpr4') rs = decode_gpr(rs4[3] @ rs4[2:0], 'gpr4') offset = u
010101 |
rt3 |
u[8:2] |
6 |
3 |
7 |
rt = decode_gpr(rt3, 'gpr3') rs = 28 offset = u
010000 |
rt |
u[20:2] |
10 |
6 |
5 |
19 |
2 |
rs = 28 offset = u
101001 |
rt |
rs |
s[8] |
1000 |
0 |
00 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9)
001101 |
rt |
u[6:2] |
6 |
5 |
5 |
rs = 29 offset = u
va = effective_address(GPR[rs], offset, 'Load') data = read_memory_at_va(va, nbytes=4) GPR[rt] = sign_extend(data, from_nbits=32)
Address Error. Bus Error. Reserved Instruction for LW[4X4] format on NMS cores. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LWE rt, offset(rs) |
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege. |
Load Word using EVA addressing |
Load Word using EVA addressing. Load word to register $rt from virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.
101001 |
rt |
rs |
s[8] |
1000 |
0 |
10 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9) if not C0.Config5.EVA: raise exception('RI') if not IsCoprocessor0Enabled(): raise coprocessor_exception(0) va = effective_address(GPR[rs], offset, 'Load', eva=True) data = read_memory_at_va(va, nbytes=4, eva=True) GPR[rt] = sign_extend(data, from_nbits=32)
Address Error. Bus Error. Coprocessor unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LWM rt, offset(rs), count |
nanoMIPS, not available in NMS |
Load Word Multiple |
Load Word Multiple. Load count words of data to registers $rt, $(rt+1), ..., $(rt+count-1) from consecutive memory addresses starting at $rs + offset (register plus immediate).
nanoMIPS, not available in NMS
101001 |
rt |
rs |
s[8] |
count3 |
0 |
1 |
00 |
s[7:0] |
6 |
5 |
5 |
1 |
3 |
1 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9) count = 8 if count3 == 0 else count3
if C0.Config5.NMS == 1: raise exception('RI') i = 0 while i != count: this_rt = ( rt + i if rt + i < 32 else rt + i - 16 ) this_offset = offset + (i<<2) va = effective_address(GPR[rs], this_offset, 'Load') data = read_memory_at_va(va, nbytes=4) GPR[this_rt] = sign_extend(data, from_nbits=32) if this_rt == rs and i != count - 1: raise UNPREDICTABLE() i += 1
LWM loads count words to sequentially numbered registers from sequential memory addresses. After loading $31, the sequence of registers continues from $16. Some example encodings of the register list are (see the sketch below):
rt=15, count=3: loads [$15, $16, $17]
rt=31, count=3: loads [$31, $16, $17]
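The register sequence, including the wrap from $31 back to $16, can be computed as in this illustrative sketch, which mirrors the operation pseudocode above:

def lwm_register_list(rt, count):
    # Registers written by LWM: after $31 the sequence wraps to $16.
    return [rt + i if rt + i < 32 else rt + i - 16 for i in range(count)]

assert lwm_register_list(15, 3) == [15, 16, 17]
assert lwm_register_list(31, 3) == [31, 16, 17]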
The result is unpredictable if an LWM instruction updates the base register prior to the final load.
LWM must be implemented in such a way as to make the instruction restartable, but the implementation does not need to be fully atomic. For instance, it is allowable for a LWM instruction to be aborted by
an exception after a subset of the register updates have occurred. To ensure restartability, any write to GPR $rs (which may be used as the final output register) must be completed atomically, that is, the
instruction must graduate if and only if that write occurs.
Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LWPC rt, address |
nanoMIPS, not available in NMS |
Load Word PC relative |
Load Word PC relative. Load word to register $rt from PC relative address address.
nanoMIPS, not available in NMS
011000 |
rt |
01011 |
s[15:0] |
s[31:16] |
6 |
5 |
5 |
16 |
16 |
offset = sign_extend(s, from_nbits=32)
if C0.Config5.NMS == 1: raise exception('RI') address = effective_address(CPU.next_pc, offset) data = read_memory_at_va(address, nbytes=4) GPR[rt] = sign_extend(data, from_nbits=32)
Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LWX rd, rs(rt) |
nanoMIPS |
Load Word indeXed |
Load Word indeXed. Load word to register $rd from memory address $rt + $rs (register plus register).
nanoMIPS
001000 |
rt |
rs |
rd |
1000 |
0 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
va = effective_address(GPR[rs], GPR[rt], 'Load') data = read_memory_at_va(va, nbytes=4) GPR[rd] = sign_extend(data, from_nbits=32)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
LWXS rd, rs(rt) |
nanoMIPS |
Load Word indeXed Scaled |
Load Word indeXed Scaled. Load word to register $rd from memory address $rt + 4*$rs (register plus scaled register).
nanoMIPS
001000 |
rt |
rs |
rd |
1000 |
1 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
010100 |
rt3 |
rs3 |
rd3 |
1 |
6 |
3 |
3 |
3 |
1 |
rd = decode_gpr(rd3, 'gpr3') rs = decode_gpr(rs3, 'gpr3') rt = decode_gpr(rt3, 'gpr3')
va = effective_address(GPR[rs]<<2, GPR[rt], 'Load') data = read_memory_at_va(va, nbytes=4) GPR[rd] = sign_extend(data, from_nbits=32)
Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
MFC0 rt, c0s, sel |
nanoMIPS. Requires CP0 privilege. |
Move From Coprocessor 0 |
Move From Coprocessor 0. Write value of CP0 register indexed by c0s, sel to register $rt.
nanoMIPS. Requires CP0 privilege.
001000 |
rt |
c0s |
sel |
x |
0000110 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0) value = read_cp0_register(c0s, sel) GPR[rt] = sign_extend(value, from_nbits=32)
An MFC0 which targets a register which is not used on the current core will return zero.
Coprocessor Unusable.
MFHC0 rt, c0s, sel |
nanoMIPS, required (optional on NMS cores). Requires CP0 privilege. |
Move From High Coprocessor 0 |
Move From High Coprocessor 0. Write bits 63..32 (when present) of CP0 register indexed by c0s, sel to register $rt.
nanoMIPS, required (optional on NMS cores). Requires CP0 privilege.
001000 |
rt |
c0s |
sel |
x |
0000111 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
if C0.Config5.MVH == 0: raise exception('RI') if not IsCoprocessor0Enabled(): raise coprocessor_exception(0) value = read_cp0_register(c0s, sel, h=True) GPR[rt] = sign_extend(value, from_nbits=32)
For certain core configurations, specific nanoMIPS32™ CP0 registers may be extended to be 64 bits wide. The MFHC0 instruction is used to read the upper 32 bits of such registers. An MFHC0 which
targets a register for which the 'high' bits are not used will return zero.
This instruction is available when Config5.MVH=1, which is required on nanoMIPS™ cores, except for NMS cores where it is optional.
Coprocessor Unusable. Reserved Instruction on NMS cores without MVH support.
MOD rd, rs, rt |
nanoMIPS |
Modulo |
Modulo. Compute signed division of register $rs by register $rt, and place the remainder in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
0101011 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
numerator = GPR[rs] denominator = GPR[rt] if denominator == 0: quotient, remainder = (UNKNOWN, UNKNOWN) else: quotient, remainder = divide_integers(numerator, denominator) GPR[rd] = sign_extend(remainder, from_nbits=32)
None.
MODU rd, rs, rt |
nanoMIPS |
Modulo Unsigned |
Modulo Unsigned. Compute unsigned division of register $rs by register $rt, and place the remainder in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
0111011 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
numerator = zero_extend(GPR[rs], from_nbits=32) denominator = zero_extend(GPR[rt], from_nbits=32) if denominator == 0: quotient, remainder = (UNKNOWN, UNKNOWN) else: quotient, remainder = divide_integers(numerator, denominator) GPR[rd] = sign_extend(remainder, from_nbits=32)
None.
MOVE.BALC rd, rt, address |
nanoMIPS, not available in NMS |
Move and Branch and Link, Compact |
Move and Branch and Link, Compact. Copy value of register $rt to register $rd, and perform an unconditional PC relative branch to address, placing the return address in register $31.
nanoMIPS, not available in NMS
000010 |
rtz4[3] |
rd1 |
rtz4[2:0] |
s[20:1] |
s[21] |
6 |
1 |
1 |
3 |
20 |
1 |
if C0.Config5.NMS == 1: raise exception('RI') rd = decode_gpr(rd1, 'gpr1') rt = decode_gpr(rtz4[3] @ rtz4[2:0], 'gpr4.zero') offset = sign_extend(s, from_nbits=22) address = effective_address(CPU.next_pc, offset) GPR[rd] = GPR[rt] GPR[31] = CPU.next_pc CPU.next_pc = address
Although this instruction is called MOVE.BALC, the order of the updates to PC, $31 and $rd is invisible to software, and an implementation may choose any order for carrying out these steps.
Reserved Instruction on NMS cores.
MOVE rt, rs |
nanoMIPS |
Move |
Move. Copy value of register $rs to register $rt.
nanoMIPS
000100 |
rt!=0 rt |
rs |
6 |
5 |
5 |
GPR[rt] = GPR[rs]
None.
MOVEP dst1, dst2, src1, src2 |
nanoMIPS, not available in NMS |
Move Pair |
Move Pair. Copy value of register $src1 to register $dst1, and copy value of register $src2 to register $dst2.
nanoMIPS, not available in NMS
101111 |
rtz4[3] |
rd2[0] |
rtz4[2:0] |
rsz4[3] |
rd2[1] |
rsz4[2:0] |
6 |
1 |
1 |
3 |
1 |
1 |
3 |
dst1 = decode_gpr(rd2[1] @ rd2[0], 'gpr2.reg1') dst2 = decode_gpr(rd2[1] @ rd2[0], 'gpr2.reg2') src1 = decode_gpr(rsz4[3] @ rsz4[2:0], 'gpr4.zero') src2 = decode_gpr(rtz4[3] @ rtz4[2:0], 'gpr4.zero')
111111 |
rt4[3] |
rd2[0] |
rt4[2:0] |
rs4[3] |
rd2[1] |
rs4[2:0] |
6 |
1 |
1 |
3 |
1 |
1 |
3 |
dst1 = decode_gpr(rs4[3] @ rs4[2:0], 'gpr4') dst2 = decode_gpr(rt4[3] @ rt4[2:0], 'gpr4') src1 = decode_gpr(rd2[1] @ rd2[0], 'gpr2.reg1') src2 = decode_gpr(rd2[1] @ rd2[0], 'gpr2.reg2')
if C0.Config5.NMS == 1: raise exception('RI') if dst1 == src1 or dst1 == src2 or dst2 == src1 or dst2 == src2: GPR[dst1] = UNKNOWN GPR[dst2] = UNKNOWN else: GPR[dst1] = GPR[src1] GPR[dst2] = GPR[src2]
The output register values are unpredictable if either of the output registers is also used as an input.
Reserved Instruction on NMS cores.
MOVN rd, rs, rt |
nanoMIPS |
Move if Not zero |
Move if Not zero. Copy value of register $rs to register $rd if register $rt is not zero.
nanoMIPS
001000 |
rt |
rs |
rd |
1 |
1000010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
GPR[rd] = GPR[rs] if GPR[rt] != 0 else GPR[rd]
None.
MOVZ rd, rs, rt |
nanoMIPS |
Move if Zero |
Move if Zero. Copy value of register $rs to register $rd if register $rt is zero.
nanoMIPS
001000 |
rt |
rs |
rd |
0 |
1000010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
GPR[rd] = GPR[rs] if GPR[rt] == 0 else GPR[rd]
None.
MTC0 rt, c0s, sel |
nanoMIPS. Requires CP0 privilege. |
Move To Coprocessor 0 |
Move To Coprocessor 0. Write value of register $rt to CP0 register indexed by c0s, sel.
nanoMIPS. Requires CP0 privilege.
001000 |
rt |
c0s |
sel |
x |
0001110 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0) write_cp0_register(GPR[rt], c0s, sel)
An MTC0 to a register which is not used on the current core is ignored.
When a register is extended to have high bits for a specific configuration (see MTHC0), legacy software which is not aware of the existence of these high bits still needs to function correctly. In such cases,
the architecture may require that an MTC0 modifies the high 32 bits of the register as well as the low 32 bits to give the correct legacy behavior.
For this reason, when setting an extended CP0 register, the MTC0 to set the low 32 bits should always precede the MTHC0 to set the high 32 bits. Also, a read-modify-write sequence to set a specific bitfield
in the low 32 bits should read both the low 32 and high 32 bits, then do MTC0 followed by MTHC0 to write the modified value back, as sketched below.
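A sketch of that recommended sequence, written in terms of the pseudocode helpers used elsewhere in this manual; the field mask and value are placeholders, and c0s/sel identify the extended register:

# Read-modify-write of a bitfield in the low word of an extended CP0 register.
FIELD_MASK  = 0x0000000F   # placeholder bitfield
FIELD_VALUE = 0x00000003   # placeholder new field value

lo = read_cp0_register(c0s, sel)            # MFC0:  low 32 bits
hi = read_cp0_register(c0s, sel, h=True)    # MFHC0: high 32 bits
lo = (lo & ~FIELD_MASK) | FIELD_VALUE       # modify the low-word field
write_cp0_register(lo, c0s, sel)            # MTC0 first (may affect high bits)
write_cp0_register(hi, c0s, sel, h=True)    # MTHC0 last to set the high bits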
Coprocessor Unusable.
MTHC0 rt, c0s, sel |
nanoMIPS, required (optional on NMS cores). Requires CP0 privilege. |
Move To High Coprocessor 0 |
Move To High Coprocessor 0. Write value of register $rt to bits 63..32 (when present) of CP0 register indexed by c0s, sel.
nanoMIPS, required (optional on NMS cores). Requires CP0 privilege.
001000 |
rt |
c0s |
sel |
x |
0001111 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
if C0.Config5.MVH == 0: raise exception('RI') if not IsCoprocessor0Enabled(): raise coprocessor_exception(0) write_cp0_register(GPR[rt], c0s, sel, h=True)
For certain core configurations, specific nanoMIPS32™ CP0 registers may be extended to be 64 bits wide. The MTHC0 instruction is used to write the upper 32 bits of such registers. An MTHC0 to a
register for which the 'high' bits are not used will be ignored.
When a register is extended to have high bits for a specific configuration, legacy software which is not aware of the existence of these high bits still needs to function correctly. In such cases, the architecture
may require that an MTC0 modifies the high 32 bits of the register as well as the low 32 bits to give the correct legacy behavior.
For this reason, when setting an extended CP0 register, the MTC0 to set the low 32 bits should always precede the MTHC0 to set the high 32 bits. Also, a read-modify-write sequence to set a specific bitfield
in the low 32 bits should read both the low 32 and high 32 bits, then do MTC0 followed by MTHC0 to write the modified value back.
This instruction is available when Config5.MVH=1, which is required on nanoMIPS™ cores, except for NMS cores where it is optional.
Coprocessor Unusable. Reserved Instruction on NMS cores without MVH support.
MUH rd, rs, rt |
nanoMIPS |
Multiply High |
Multiply High. Multiply signed word values from registers $rs and $rt, and place bits 63..32 of the result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
0001011 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
result = GPR[rs] * GPR[rt] result_hi = result[63:32] GPR[rd] = sign_extend(result_hi, from_nbits=32)
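Because the operation above is defined on the 64-bit product of signed 32-bit values, a host-language model has to reproduce the sign handling explicitly. An illustrative, non-normative Python equivalent:

def to_signed32(x):
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def muh(rs, rt):
    # Bits 63..32 of the signed 64-bit product, as an unsigned 32-bit pattern.
    product = to_signed32(rs) * to_signed32(rt)
    return (product >> 32) & 0xFFFFFFFF

assert muh(0xFFFFFFFF, 2) == 0xFFFFFFFF   # (-1) * 2 = -2: high word all ones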
None.
MUHU rd, rs, rt |
nanoMIPS |
Multiply High Unsigned |
Multiply High Unsigned. Multiply unsigned word values in registers $rs and $rt, and place bits 63..32 of the result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
0011011 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
rs_unsigned = zero_extend(GPR[rs], from_nbits=32) rt_unsigned = zero_extend(GPR[rt], from_nbits=32) result = rs_unsigned * rt_unsigned result_hi = result[63:32] GPR[rd] = sign_extend(result_hi, from_nbits=32)
None.
MUL dst, src1, src2 |
nanoMIPS, availability varies by format. |
Multiply |
Multiply. Multiply signed word values in registers $src1 and $src2, and place bits 31..0 of the result in register $dst.
nanoMIPS, availability varies by format.
001000 |
rt |
rs |
rd |
x |
0000011 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
dst = rd src1 = rs src2 = rt not_in_nms = False
001111 |
rt4[3] |
0 |
rt4[2:0] |
rs4[3] |
1 |
rs4[2:0] |
6 |
1 |
1 |
3 |
1 |
1 |
3 |
dst = decode_gpr(rt4, 'gpr4') src1 = decode_gpr(rt4, 'gpr4') src2 = decode_gpr(rs4, 'gpr4') not_in_nms = True
if not_in_nms and C0.Config5.NMS == 1: raise exception('RI') result = GPR[src1] * GPR[src2] GPR[dst] = sign_extend(result, from_nbits=32)
Reserved Instruction for MUL[4X4] format on NMS cores.
MULU rd, rs, rt |
nanoMIPS |
Multiply Unsigned |
Multiply Unsigned. Multiply unsigned word values in registers $rs and $rt, and place bits 31..0 of the result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
0010011 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
rs_unsigned = zero_extend(GPR[rs], from_nbits=32) rt_unsigned = zero_extend(GPR[rt], from_nbits=32) result = rs_unsigned * rt_unsigned GPR[rd] = sign_extend(result, from_nbits=32)
None.
NOP |
nanoMIPS |
No Operation |
No Operation.
nanoMIPS
100000 |
00000 |
x |
1100 |
x |
0000 |
00000 |
6 |
5 |
5 |
4 |
3 |
4 |
5 |
100100 |
00000 |
x |
1 |
x |
6 |
5 |
1 |
1 |
3 |
pass # No operation
The NOP[32] encoding is equivalent to an SLL[32] instruction using $0 as output and a shift value of 0. The NOP[16] encoding is equivalent to an ADDIU[RS5] instruction using $0 as output. Therefore NOP
does not necessarily need any additional implementation in hardware beyond the normal behavior of the SLL[32] and ADDIU[RS5] instructions.
If software intentionally generates a NOP instruction, it should only generate these specific encodings, rather than other instructions writing to $0 which would also result in no operation.
If hardware implements a performance counter for nops, it can expect these specific instruction encodings to be used. Hardware should ignore the x field of the encoding, treating all values of x as representing
a valid NOP instruction. Software, on the other hand, should only generate NOP instructions with an x value of 0.
As for all instruction definitions containing x fields, this methodology allows for the possibility that the meaning of x values other than zero might be enhanced in the future, with the understanding that cores
prior to the enhanced definition will treat the x!=0 encodings as equivalent to the x==0 instruction.
None.
NOR rd, rs, rt |
nanoMIPS |
NOR |
NOR. Compute logical NOR of registers $rs and $rt, placing the result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
1011010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
GPR[rd] = ~(GPR[rs] | GPR[rt])
None.
NOT rt, rs |
nanoMIPS |
NOT |
NOT. Write logical inversion of register $rs to register $rt.
nanoMIPS
010100 |
rt3 |
rs3 |
00 |
0 |
0 |
6 |
3 |
3 |
2 |
1 |
1 |
rt = decode_gpr(rt3, 'gpr3') rs = decode_gpr(rs3, 'gpr3') GPR[rt] = ~GPR[rs]
None.
OR rd, rs, rt |
nanoMIPS |
OR |
OR. Compute logical OR of registers $rs and $rt, placing the result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
1010010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
010100 |
rt3 |
rs3 |
11 |
0 |
0 |
6 |
3 |
3 |
2 |
1 |
1 |
rt = decode_gpr(rt3, 'gpr3') rs = decode_gpr(rs3, 'gpr3') rd = rt
GPR[rd] = GPR[rs] | GPR[rt]
None.
ORI rt, rs, u |
nanoMIPS |
OR Immediate |
OR Immediate. Compute logical OR of register $rs with immediate u, placing the result in register $rt.
nanoMIPS
100000 |
rt |
rs |
0000 |
u |
6 |
5 |
5 |
4 |
12 |
GPR[rt] = GPR[rs] | u
None.
PAUSE |
nanoMIPS |
Pause |
Pause. Pause until the LLBit is cleared.
nanoMIPS
100000 |
00000 |
x |
1100 |
x |
0000 |
00101 |
6 |
5 |
5 |
4 |
3 |
4 |
5 |
if C0.LLAddr.LLB: CPU.in_pause_state = True
The purpose of the PAUSE instruction is to halt a thread (rather than entering a spin loop) when it is waiting to acquire an LL/SC lock. This is particularly useful on multi-threaded processors, since the
waiting thread may be using the same instruction pipeline as the thread which currently owns the lock, and hence entering a spin loop would delay the other thread from completing its task and freeing the
lock.
When a thread is in the paused state, it should not issue any instructions. The paused state will be cleared either if the LLBit for the thread gets cleared, or if the thread takes an interrupt. If an interrupt
occurs, it is implementation dependent whether C0.EPC points to the PAUSE instruction or the instruction after the PAUSE.
In LL/SC lock software, the LLBit of the waiting thread will always be cleared when the thread which owns the lock does a store instruction to the lock address in order to clear the lock. Thus the paused
thread will always be woken when it has another opportunity to acquire the lock. After the PAUSE instruction completes, software is expected to attempt to acquire the lock again by re-executing the
LL/SC sequence.
It is legal to implement PAUSE as a NOP instruction. In this case, the behavior of LL/SC lock software will be equivalent to executing a spin loop to acquire the lock. Software using PAUSE will still work,
but the benefit of having the waiting thread not consume instruction issue slots will be lost.
PAUSE is encoded as an SLL instruction with a shift value of 5, targeting GPR $0. Hence PAUSE will behave as a NOP instruction if no additional behavior beyond that of SLL is implemented.
The following assembly code example shows how the PAUSE instruction can be used to halt a thread while it is waiting to acquire an LL/SC lock.
acquire_lock:
  ll    t0, 0(a0)               /* Read software lock, set LLBit. */
  bnezc t0, acquire_lock_retry  /* Branch if software lock is taken. */
  addiu t0, t0, 1               /* Set the software lock. */
  sc    t0, 0(a0)               /* Try to store the software lock. */
  bnezc t0, 10f                 /* Branch if lock acquired successfully. */
  sync
acquire_lock_retry:
  pause                         /* Wait for LLBit to clear before retrying. */
  bc    acquire_lock            /* Now retry the operation. */
10:
  /* Critical Region Code */
  ...
release_lock:
  sync
  sw    zero, 0(a0)             /* Release software lock, clearing LLBit for any PAUSEd waiters. */
None.
PREF hint, offset(rs) |
nanoMIPS, availability varies by format. |
Prefetch/Prefetch using EVA addressing |
PREFE hint, offset(rs) |
nanoMIPS, availability varies by format. |
Prefetch/Prefetch using EVA addressing |
Prefetch/Prefetch using EVA addressing. Perform a prefetch operation of type hint at address $rs + offset (register plus immediate). For PREFE, translate the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS, availability varies by format.
101001 |
hint!=31 hint |
rs |
s[8] |
0011 |
0 |
00 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9) is_eva = False
100001 |
hint!=31 hint |
rs |
0011 |
u |
6 |
5 |
5 |
4 |
12 |
offset = u is_eva = False
101001 |
hint!=31 hint |
rs |
s[8] |
0011 |
0 |
10 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9) is_eva = True
if is_eva and not C0.Config5.EVA: raise exception('RI') if is_eva and not IsCoprocessor0Enabled(): raise coprocessor_exception(0) va = effective_address(GPR[rs], offset, 'Load', eva=is_eva) # Perform implementation dependent prefetch actions pref(va, hint, eva=is_eva)
The PREF and PREFE instructions request that the processor take some action to improve program performance in accordance with the intended data usage specified by the hint argument. This is
typically done by moving data to or from the cache at the specified address. The meanings of hint are as follows:
hint=0: load. Use: Prefetched data is expected to be read (not modified). Action: Fetch data as if for a load.
hint=1: store. Use: Prefetched data is expected to be stored or modified. Action: Fetch data as if for a store.
hint=2: L1 LRU hint. Mark the line as LRU in the L1 cache and thus preferred for next eviction. Implementations can choose to writeback and/or invalidate the line as long as no architectural state is modified.
hint=3: Reserved for Implementation.
hint=4: load_streamed. Use: Prefetched data is expected to be read (not modified) but not reused extensively; it "streams" through cache. Action: Fetch data as if for a load and place it in the cache so that it does not displace data prefetched as "retained".
hint=5: store_streamed. Use: Prefetched data is expected to be stored or modified but not reused extensively; it "streams" through cache. Action: Fetch data as if for a store and place it in the cache so that it does not displace data prefetched as "retained".
hint=6: load_retained. Use: Prefetched data is expected to be read (not modified) and reused extensively; it should be "retained" in the cache. Action: Fetch data as if for a load and place it in the cache so that it is not displaced by data prefetched as "streamed".
hint=7: store_retained. Use: Prefetched data is expected to be stored or modified and reused extensively; it should be "retained" in the cache. Action: Fetch data as if for a store and place it in the cache so that it is not displaced by data prefetched as "streamed".
hint=8..15: L2 operation. In the Release 6 architecture, hint codes 8..15 are treated the same as hint codes 0..7 respectively, but operate on the L2 cache.
hint=16..23: L3 operation. In the Release 6 architecture, hint codes 16..23 are treated the same as hint codes 0..7 respectively, but operate on the L3 cache.
hint=24..30: Reserved for Architecture. These hint codes are reserved in nanoMIPS and should act as a NOP. (This is not the same as the MIPSR6 behavior, where these hints give a Reserved Instruction exception.) Note that hint=31 is not listed as that encoding is decoded as a SYNCI instruction.
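The cache-level structure of the hint space, with codes 8..15 and 16..23 mirroring 0..7 for the L2 and L3 caches, can be summarized by this illustrative (non-normative) decode:

def decode_pref_hint(hint):
    # Returns (cache_level, base_hint) for the mirrored L1/L2/L3 ranges.
    if 0 <= hint <= 23:
        return hint // 8 + 1, hint % 8   # level 1..3, base hint 0..7
    if 24 <= hint <= 30:
        return None, None                # reserved: acts as a NOP in nanoMIPS
    raise ValueError('hint=31 decodes as SYNCI, not PREF')

assert decode_pref_hint(12) == (2, 4)    # L2 load_streamed
assert decode_pref_hint(16) == (3, 0)    # L3 load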
The action taken for a specific PREF instruction is both system and context dependent. Any action, including doing nothing, is permitted as long as it does not change architecturally visible state or alter
the meaning of a program.
PREF does not cause addressing-related exceptions, including TLB exceptions. If the address specified would cause an addressing exception, the exception condition is ignored and no data movement occurs.
For cached addresses, the expected and useful action is for the processor to prefetch a block of data that includes the effective address. The size of the block and the level of the memory hierarchy it is
fetched into are implementation specific.
PREF neither generates a memory operation nor modifies the state of a cache line for addresses with an uncached CCA.
Prefetch operations have no effect on cache lines that were previously locked with the CACHE instruction.
In coherent multiprocessor implementations, if the effective address uses a coherent CCA, then the instruction causes a coherent memory transaction to occur. This means a prefetch issued on one
processor can cause data to be evicted from the cache in another processor.
The memory transactions which occur as a result of a PREF instruction, such as cache refill or cache writeback, obey the same ordering and completion rules as other load or store instructions.
It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREF instruction. Implementations are
encouraged to report such errors only if there is a specific requirement for high-reliability. Note that
suppressing a bus or cache error in this case may require that the processor communicate to the system that the reference is speculative.
Hint field encodings whose function is described as "streamed" or "retained" convey usage intent from software to hardware. Software should not assume that hardware will always prefetch data in an
optimal way. If data is to be truly retained, software should use the CACHE instruction to lock data into the cache.
It is implementation dependent whether a data watch or EJTAG breakpoint exception is triggered by a prefetch instruction whose address matches the Watch register address match or EJTAG data
breakpoint conditions. The preferred implementation is not to match on the prefetch instruction.
Bus Error. Cache Error. Coprocessor Unusable for PREFE. Reserved Instruction for PREFE if EVA not implemented.
RDHWR rt, hs, sel |
nanoMIPS, not available in NMS |
Read Hardware Register |
Read Hardware Register. Read specific CP0 privileged state (identified by hs, sel) to register $rt. Kernel code can enable or disable user mode RDHWR accesses by programming the enable bits in the HWREna register.
nanoMIPS, not available in NMS
001000 |
rt |
hs |
sel |
x |
0111000 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
if C0.Config5.NMS == 1: raise exception('RI') if not IsCoprocessor0Enabled(): if not C0.HWREna & (1 << hs): raise exception('RI', 'Required HWREna bit not set') if sel and hs != 4: raise exception('RI', 'sel field not supported for this hs') if is_guest_mode(): check_gpsi('CP0') if hs == 0: GPR[rt] = C0.EBase.CPUNum elif hs == 1: GPR[rt] = synci_step() elif hs == 2: if is_guest_mode(): check_gpsi('GT') GPR[rt] = guest_count() else: GPR[rt] = C0.Count elif hs == 3: GPR[rt] = CPU.count_resolution elif hs == 4: if not C0.Config1.PC: raise exception('RI', 'Perf Counters not implemented') GPR[rt] = read_cp0_register(25, sel) # Performance counter register elif hs == 5: GPR[rt] = C0.Config5.XNP elif hs == 29: if not C0.Config3.ULRI: raise exception('RI') GPR[rt] = sign_extend(C0.UserLocal) else: raise exception('RI')
Coprocessor Unusable. Reserved Instruction for unsupported register numbers. Reserved Instruction on NMS cores.
RDPGPR rt, rs |
nanoMIPS. Requires CP0 privilege. |
Read Previous GPR |
Read Previous GPR. Write the value of register $rs from the previous shadow register set (SRSCtl.PSS) to register $rt in the current shadow register set (SRSCtl.CSS). If shadow register sets are not implemented, just copy the value from register $rs to register $rt.
nanoMIPS. Requires CP0 privilege.
001000 |
rt |
rs |
11 |
10000 |
101 |
111 |
111 |
6 |
5 |
5 |
2 |
5 |
3 |
3 |
3 |
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0) if C0.SRSCtl.HSS > 0: GPR[rt] = SRS[C0.SRSCtl.PSS][rs] else: GPR[rt] = GPR[rs]
Coprocessor Unusable.
RESTORE u[, dst1 [, dst2 [, ...]]] # jr=0 implied |
nanoMIPS, availability varies by format. |
Restore callee saved registers/Restore callee saved registers and Jump to Return address, Compact |
RESTORE.JRC u[, dst1 [, dst2 [, ...]]] # jr=1 implied |
nanoMIPS, availability varies by format. |
Restore callee saved registers/Restore callee saved registers and Jump to Return address, Compact |
Restore callee saved registers/Restore callee saved registers and Jump to Return address, Compact. Restore registers dst1 [, dst2 [, ...]] from addresses at the top of the local stack frame ($29 +
u - 4, $29 + u - 8, ...), then point register $29 back to the caller's stack frame by adding offset u. For RESTORE.JRC, return from the current subroutine by jumping to the address in $31.
nanoMIPS, availability varies by format.
100000 |
rt |
0 |
count |
0011 |
u[11:3] |
gp |
10 |
6 |
5 |
1 |
4 |
4 |
9 |
1 |
2 |
jr = 0
000111 |
rt1 |
1 |
u[7:4] |
count |
6 |
1 |
1 |
4 |
4 |
rt = 30 if rt1 == 0 else 31 gp = 0 jr = 1
100000 |
rt |
0 |
count |
0011 |
u[11:3] |
gp |
11 |
6 |
5 |
1 |
4 |
4 |
9 |
1 |
2 |
jr = 1
if gp and C0.Config5.NMS: raise exception('RI') i = 0 while i != count: this_rt = ( 28 if gp and (i + 1 == count) else rt + i if rt + i < 32 else rt + i - 16 ) this_offset = u - ( (i+1) << 2 ) va = effective_address(GPR[29], this_offset, 'Load') if va & 3: raise exception('ADEL', badva=va) data = read_memory_at_va(va, nbytes=4) GPR[this_rt] = sign_extend(data, from_nbits=32) if this_rt == 29: raise UNPREDICTABLE() i += 1 GPR[29] = effective_address(GPR[29], u) if jr: CPU.next_pc = GPR[31]
The purpose of the RESTORE and RESTORE.JRC instructions is to restore callee saved registers from the stack on exit from a subroutine, adjust the stack pointer register $29 to point to the caller’s stack
frame, and for RESTORE.JRC to return from the subroutine by jumping to the address in register $31. RESTORE/RESTORE.JRC will usually be paired with a matching SAVE instruction at the start of the
subroutine, and SAVE and RESTORE take the same arguments.
The arguments for RESTORE/RESTORE.JRC consist of the amount to increment the stack pointer by, and a list of registers to restore from the stack. The increment is a double word aligned immediate value u
in the range 0 to 4092. The register list can contain up to 16 consecutive registers. The count of the number of registers is encoded in the instruction's count field. The first register in the list is encoded
in the rt field of the instruction.
The register list is allowed to wrap around from register $31 back to register $16 and still be considered consecutive; this allows fp ($30) and ra ($31) and the saved temporary registers s0-s7 ($16 - $23) to
be restored in one instruction.
Additionally, $28 (the global pointer register) will be used in place of the last register in the sequence if the 'gp' bit in the instruction encoding is set. This feature (which is not available for NMS cores) makes it
possible to treat $28 as a callee saved register for environments such as Linux which require it.
The restored registers are read from memory addresses $29 + u - 4, $29 + u - 8, $29 + u - 12, ... etc., i.e. at the top of the local stack frame. The stack pointer is then adjusted by adding the size u of
the local stack frame, so that it points back to the caller's stack frame.
RESTORE.JRC with count=0 adjusts the stack pointer and jumps to the address in $31, but does not restore any registers from memory. Thus the RESTORE.JRC[16] instruction format can be used to
provide ADDIU $29, $29, u; JRC $31 behavior using a single 16-bit instruction.
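As a worked, non-normative example of the address computation (mirroring the operation pseudocode above, with illustrative arguments):

def restore_plan(rt, count, u, gp=0):
    # (register, offset from $29) pairs loaded by RESTORE, plus the final
    # $29 increment. Mirrors the operation pseudocode above.
    plan = []
    for i in range(count):
        this_rt = 28 if gp and i + 1 == count else \
                  rt + i if rt + i < 32 else rt + i - 16
        plan.append((this_rt, u - 4 * (i + 1)))
    return plan, u

# RESTORE 16, $30, $31: $30 <- [$29+12], $31 <- [$29+8], then $29 += 16
assert restore_plan(30, 2, 16) == ([(30, 12), (31, 8)], 16)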
The result of a RESTORE instruction is UNPREDICTABLE if the register list includes register $29.
RESTORE/RESTORE.JRC must be implemented in such a way as to make the instructions restartable, but the implementation does not need to be fully atomic. For instance, it is allowable for a
RESTORE/RESTORE.JRC instruction to be aborted by an exception after a subset of the register updates have occurred. To ensure restartability, the write to GPR $29 and the jump (for RESTORE.JRC) must
be completed atomically, that is, the instruction must graduate if and only if those writes occur.
Address Error. Bus Error. Reserved Instruction for gp=1 cases on NMS cores. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
ROTR rt, rs, shift |
nanoMIPS |
Rotate Right |
Rotate Right. Rotate the word value in register $rs by shift value shift, and place the result in register $rt.
nanoMIPS
100000 |
rt |
rs |
1100 |
x |
0110 |
shift |
6 |
5 |
5 |
4 |
3 |
4 |
5 |
tmp = GPR[rs][31:0] @ GPR[rs][31:0] result = tmp >> shift GPR[rt] = sign_extend(result, from_nbits=32)
None.
ROTRV rd, rs, rt |
nanoMIPS |
Rotate Right Variable |
Rotate Right Variable. Rotate the word value in register $rs by the shift value contained in register $rt, and place the result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
0011010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
shift = GPR[rt] & 0x1f tmp = GPR[rs][31:0] @ GPR[rs][31:0] result = tmp >> shift GPR[rd] = sign_extend(result, from_nbits=32)
None.
ROTX rt, rs, shift, shiftx, stripe |
nanoMIPS, not available in NMS |
Rotate and eXchange |
Rotate and eXchange. Rotate and exchange bits in the word value in register $rs and place the result in register $rt. Specific choices of the shift, shiftx and stripe arguments allow this instruction to perform bit and byte reordering operations including BYTEREVW, BYTEREVH, BITREVW, BITREVH and BITREVB.
nanoMIPS, not available in NMS
100000 |
rt |
rs |
1101 |
0 |
shiftx[4:1] |
stripe |
0 |
shift |
6 |
5 |
5 |
4 |
1 |
4 |
1 |
1 |
5 |
if C0.Config5.NMS: raise exception('RI') tmp0 = GPR[rs][31:0] @ GPR[rs][31:0] tmp1 = tmp0 for i in range(47): # 0..46 s = shift if (i & 0b01000) else shiftx if stripe and not (i & 0b00100): s = ~s if s[4]: tmp1[i] = tmp0[i+16] tmp2 = tmp1 for i in range(39): # 0..38 s = shift if (i & 0b00100) else shiftx if s[3]: tmp2[i] = tmp1[i+8] tmp3 = tmp2 for i in range(35): # 0..34 s = shift if (i & 0b00010) else shiftx if s[2]: tmp3[i] = tmp2[i+4] tmp4 = tmp3 for i in range(33): # 0..32 s = shift if (i & 0b00001) else shiftx if s[1]: tmp4[i] = tmp3[i+2] tmp5 = tmp4 for i in range(32): # 0..31 s = shift; if s[0]: tmp5[i] = tmp4[i+1] GPR[rt] = sign_extend(tmp5, from_nbits=32)
The ROTX instruction can be used to reverse elements of a selected size within blocks of a different selected size. Some example use cases are shown in the table below. The 'Result' column shows the output value
assuming an input value of abcdefgh ijklmnop qrstuvwx yz012345, where each character represents the value of a single bit.
Alias | Operation | Assembly | Result (from abcdefgh ijklmnop qrstuvwx yz012345)
BITREVW | Reverse all bits | ROTX rt, rs, 31, 0 | 543210zy xwvutsrq ponmlkji hgfedcba
BITREVH | Reverse bits in halves | ROTX rt, rs, 15, 16 | ponmlkji hgfedcba 543210zy xwvutsrq
BITREVB | Reverse bits in bytes | ROTX rt, rs, 7, 8, 1 | hgfedcba ponmlkji xwvutsrq 543210zy
BYTEREVW | Reverse all bytes | ROTX rt, rs, 24, 8 | yz012345 qrstuvwx ijklmnop abcdefgh
BYTEREVH | Reverse bytes in halves | ROTX rt, rs, 8, 24 | ijklmnop abcdefgh yz012345 qrstuvwx
- | Reverse all nibbles | ROTX rt, rs, 28, 4 | 2345yz01 uvwxqrst mnopijkl efghabcd
- | Reverse nibbles in halves | ROTX rt, rs, 12, 20 | mnopijkl efghabcd 2345yz01 uvwxqrst
- | Reverse nibbles in bytes | ROTX rt, rs, 4, 12, 1 | efghabcd mnopijkl uvwxqrst 2345yz01
- | Reverse all bit pairs | ROTX rt, rs, 30, 2 | 452301yz wxuvstqr opmnklij ghefcdab
- | Reverse pairs in halves | ROTX rt, rs, 14, 18 | opmnklij ghefcdab 452301yz wxuvstqr
- | Reverse pairs in bytes | ROTX rt, rs, 6, 10, 1 | ghefcdab opmnklij wxuvstqr 452301yz
Assembler aliases are provided for certain cases, as indicated in the table.
The MIPS32™ instructions BITSWAP and WSBH are equivalent to BITREVB and BYTEREVH respectively, and are also provided as assembler aliases to ROTX.
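A quick, non-normative way to check the alias table: the aliases correspond to standard bit/byte permutations. For instance, BYTEREVW and BYTEREVH can be modeled directly in Python:

def byterev_w(x):
    # BYTEREVW: reverse all four bytes (alias of ROTX rt, rs, 24, 8).
    return ((x & 0x000000FF) << 24 | (x & 0x0000FF00) << 8 |
            (x & 0x00FF0000) >> 8  | (x & 0xFF000000) >> 24)

def byterev_h(x):
    # BYTEREVH: reverse bytes within each halfword (alias of ROTX rt, rs, 8, 24).
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)

assert byterev_w(0x11223344) == 0x44332211
assert byterev_h(0x11223344) == 0x22114433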
The ROTX instruction is designed to be implementable with minimal overhead using existing logic for the ROTR instruction. ROTR can be implemented using a barrel shifter, where the select signals for
the multiplexers at each stage are the bits of the 'shift' argument. For ROTX, the mux select signals depend on the bit position as well as the stage of the shifter, and are a function of the 'shift', 'shiftx'
and 'stripe' arguments.
Reserved Instruction on NMS cores.
SAVE u[, src1 [, src2 [, ...]]] |
nanoMIPS, availability varies by format. |
Save callee saved registers |
Save callee saved registers. Save registers src1 [, src2 [, ...]] to addresses just below the current stack pointer ($29) address and adjust the stack pointer by subtracting offset u to accommodate the saved registers and the local stack frame.
nanoMIPS, availability varies by format.
000111 |
rt1 |
0 |
u[7:4] |
count |
6 |
1 |
1 |
4 |
4 |
rt = 30 if rt1 == 0 else 31 gp = 0
100000 |
rt |
0 |
count |
0011 |
u[11:3] |
gp |
00 |
6 |
5 |
1 |
4 |
4 |
9 |
1 |
2 |
if gp and C0.Config5.NMS: raise exception('RI') i = 0 while i != count: this_rt = ( 28 if gp and (i + 1 == count) else rt + i if rt + i < 32 else rt + i - 16 ) this_offset = - ( (i+1) << 2 ) va = effective_address(GPR[29], this_offset, 'Store') if va & 3: raise exception('ADES', badva=va) data = zero_extend(GPR[this_rt], from_nbits=32) write_memory_at_va(data, va, nbytes=4) i += 1 GPR[29] = effective_address(GPR[29], -u)
The purpose of the SAVE instruction is to save callee saved registers to the stack on entry to a subroutine, and adjust the stack pointer register ($29) to accommodate the saved registers and the subroutine’s local stack frame.
The instruction specification consists of the amount to decrement the stack by, and a list of registers to save to the stack. The stack decrement is a double word aligned immediate value u in the range
0 to 4092. The register list can contain up to 16 consecutive registers. The count of the number of registers in the register list is encoded in the instruction’s count field. The first register in the list is
encoded in the rt field of the instruction.
The register list is allowed to wrap around from register $31 back to register $16 and still be considered consecutive; this allows fp ($30) and ra ($31) and the saved temporary registers s0-s7 ($16 - $23) to
be saved in one instruction.
Additionally, $28 (the global pointer register) will be used in place of the last register in the sequence if the 'gp' bit in the instruction encoding is set. This feature (which is not available for NMS cores) makes it
possible to treat $28 as a callee saved register for environments such as Linux which require it.
The saved registers are written to memory addresses $29 - 4, $29 - 8, $29 - 12, ... etc., i.e. just below the current stack pointer address. The stack pointer is then adjusted by subtracting offset u, which
should be chosen to accommodate the saved registers and the current subroutine's local stack frame, while maintaining the required stack pointer alignment.
SAVE with count=0 adjusts the stack pointer but does not save any registers to memory. Thus the SAVE[16] instruction format can be used to provide ADDIU $29, $29, -u behavior.
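The store addresses are symmetric with RESTORE; a minimal, non-normative sketch with illustrative arguments:

def save_plan(rt, count, u, gp=0):
    # (register, offset from the incoming $29) pairs stored by SAVE,
    # followed by the final stack adjustment ($29 -= u).
    plan = []
    for i in range(count):
        this_rt = 28 if gp and i + 1 == count else \
                  rt + i if rt + i < 32 else rt + i - 16
        plan.append((this_rt, -4 * (i + 1)))
    return plan, -u

# SAVE 16, $30, $31: [$29-4] <- $30, [$29-8] <- $31, then $29 -= 16
assert save_plan(30, 2, 16) == ([(30, -4), (31, -8)], -16)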
SAVE must be implemented in such a way as to make the instruction restartable, but the implementation does not need to be fully atomic. For instance, it is allowable for a SAVE instruction to be aborted
by an exception after a subset of the memory updates have occurred. To ensure restartability, the write to GPR $29 must be completed atomically, that is, the instruction must graduate if and only if
that write occurs.
Address Error. Bus Error. Reserved Instruction for gp=1 cases on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
SB rt, offset(rs) |
nanoMIPS |
Store Byte |
Store Byte. Store byte from register $rt to memory address $rs + offset (register plus immediate).
nanoMIPS
100001 |
rt |
rs |
0001 |
u |
6 |
5 |
5 |
4 |
12 |
offset = u
010111 |
rtz3 |
rs3 |
01 |
u |
6 |
3 |
3 |
2 |
2 |
rt = decode_gpr(rtz3, 'gpr3.src.store') rs = decode_gpr(rs3, 'gpr3') offset = u
010001 |
rt |
001 |
u |
6 |
5 |
3 |
18 |
rs = 28 offset = u
101001 |
rt |
rs |
s[8] |
0001 |
0 |
00 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9)
va = effective_address(GPR[rs], offset, 'Store') data = zero_extend(GPR[rt], from_nbits=8) write_memory_at_va(data, va, nbytes=1)
Address Error. Bus Error. TLB Invalid. TLB Modified. TLB Refill. Watch.
SBE rt, offset(rs) |
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege. |
Store Byte using EVA addressing |
Store Byte using EVA addressing. Store byte from register $rt to virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.
101001 |
rt |
rs |
s[8] |
0001 |
0 |
10 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9) if not C0.Config5.EVA: raise exception('RI') if not IsCoprocessor0Enabled(): raise coprocessor_exception(0) va = effective_address(GPR[rs], offset, 'Store', eva=True) data = zero_extend(GPR[rt], from_nbits=8) write_memory_at_va(data, va, nbytes=1, eva=True)
Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Modified. TLB Refill. Watch.
SBX rd, rs(rt) |
nanoMIPS, not available in NMS |
Store Byte indeXed |
Store Byte indeXed. Store byte from register $rd to memory address $rt + $rs (register plus register).
nanoMIPS, not available in NMS
001000 |
rt |
rs |
rd |
0001 |
0 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
if C0.Config5.NMS == 1: raise exception('RI') va = effective_address(GPR[rs], GPR[rt], 'Store') data = zero_extend(GPR[rd], from_nbits=8) write_memory_at_va(data, va, nbytes=1)
Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
SC rt, offset(rs) |
nanoMIPS, availability varies by format. |
Store Conditional word/Store Conditional word using EVA addressing/Store Conditional Word Pair/Store Conditional Word Pair using EVA addressing |
SCE rt, offset(rs) |
nanoMIPS, availability varies by format. |
Store Conditional word/Store Conditional word using EVA addressing/Store Conditional Word Pair/Store Conditional Word Pair using EVA addressing |
SCWP rt, ru, (rs) |
nanoMIPS, availability varies by format. |
Store Conditional word/Store Conditional word using EVA addressing/Store Conditional Word Pair/Store Conditional Word Pair using EVA addressing |
SCWPE rt, ru, (rs) |
nanoMIPS, availability varies by format. |
Store Conditional word/Store Conditional word using EVA addressing/Store Conditional Word Pair/Store Conditional Word Pair using EVA addressing |
Store Conditional word/Store Conditional word using EVA addressing/Store Conditional Word Pair/Store Conditional Word Pair using EVA addressing. Store conditionally to complete atomic read-modify-write. For SC/SCE, store from register $rt to address $rs + offset (register plus offset).
For SCWP/SCWPE, store from registers $rt and $ru to address $rs. For SCE/SCWPE, translate the virtual address as though the core is in user mode, although it is actually in kernel mode.
Indicate success or failure by writing 1 or 0 respectively to $rt.
nanoMIPS, availability varies by format.
101001 |
rt |
rs |
s[8] |
1011 |
0 |
01 |
s[7:2] |
00 |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
6 |
2 |
offset = sign_extend(s, from_nbits=9) nbytes = 4 is_eva = False
101001 |
rt |
rs |
s[8] |
1011 |
0 |
10 |
s[7:2] |
00 |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
6 |
2 |
offset = sign_extend(s, from_nbits=9) nbytes = 4 is_eva = True
101001 |
rt |
rs |
x |
1011 |
0 |
01 |
ru |
x |
01 |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
5 |
1 |
2 |
offset = 0 nbytes = 8 is_eva = False
101001 |
rt |
rs |
x |
1011 |
0 |
10 |
ru |
x |
01 |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
5 |
1 |
2 |
offset = 0 nbytes = 8 is_eva = True
if nbytes == 8 and C0.Config5.XNP: raise exception('RI', 'SCWP[E] requires word-paired support')
if is_eva and not C0.Config5.EVA: raise exception('RI')
va = effective_address(GPR[rs], offset, 'Store', eva=is_eva)
# Linked access must be aligned.
if va & (nbytes-1): raise exception('ADES', badva=va)
pa, cca = va2pa(va, 'Store', eva=is_eva)
if (cca == 2 or cca == 7) and not C0.Config5.ULS:
    raise UNPREDICTABLE('uncached CCA not synchronizable when Config5.ULS=0')
    # (Preferred behavior for a non-synchronizable address is Bus Error.)
if nbytes == 4:  # SC/SCE
    data = zero_extend(GPR[rt], from_nbits=32)
else:  # SCWP/SCWPE
    word0 = GPR[rt][31:0]
    word1 = GPR[ru][31:0]
    data = word0 @ word1 if C0.Config.BE else word1 @ word0
# Write this data to memory, but only if it can be done atomically with
# respect to a prior linked load. The return value indicates whether the
# write occurred.
success = write_memory(data, va, pa, cca, nbytes=nbytes, atomic=True)
if success:
    GPR[rt] = 1
else:
    GPR[rt] = 0
C0.LLAddr.LLB = 0  # SC always clears LLbit regardless of address match.
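The '@' concatenation above pairs the two 32-bit values into a single 64-bit store whose layout depends on endianness. A minimal Python model of that pairing (pair_words is an illustrative helper, not an architectural function):

def pair_words(word0, word1, big_endian):
    # Big-endian: $rt occupies the most-significant word; little-endian: the reverse.
    return (word0 << 32) | word1 if big_endian else (word1 << 32) | word0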
The SC, SCE, SCWP and SCWPE instructions are used to complete the atomic read-modify-write (RMW) sequence begun by a prior matching LL/LLE/LLWP/LLWPE instruction respectively. If the system can guarantee that the write to memory can be completed prior to any other modification to the targeted data since it was read by the load-linked instruction which initiated the sequence, then the write will complete and register $rt will be set to 1, indicating success. Otherwise, the memory write will not occur, and register $rt will be set to 0, indicating failure.
If any of the following events occur between a load-linked and a store-conditional instruction, the store-conditional will fail:
The store-conditional will fail if a coherent store is completed (by either the current processor, another processor, or a coherent I/O module) into the block of synchronizable physical memory containing the load-linked data. The size and alignment of the block is implementation-dependent, but it is at least one word and at most the minimum page size. Typically, the synchronizable block size is the size of the largest cache line in use.
The store-conditional will fail if an ERET instruction has been executed since the preceding load-linked instruction. (Note that nanoMIPS™ also includes the ERETNC instruction, which will not cause the store-conditional instruction to fail.)
If any of the following events occur between a load-linked and a store-conditional instruction, the store-conditional may fail when it would otherwise have succeeded. Portable programs should not cause any of these events:
The store-conditional may fail if a load or store is executed on a processor executing a load-linked/store-conditional sequence, and that load or store is not to the block of synchronizable physical memory containing the load-linked data. This is because the load or store may cause the load-linked data to be evicted from the cache.
The store-conditional may fail if any PREF instruction is executed on a processor executing a load-linked/store-conditional sequence, due to the possibility of the PREF causing a cache eviction.
The store-conditional may fail on coherent multi-processor systems if a non-coherent store is
executed during a load-linked/store-conditional sequence and that store is to the block of synchronizable physical memory containing the linked data.
The store-conditional may fail if the instructions executed starting with the load-linked instruction and ending with the store-conditional instruction do not lie in a 2048-byte contiguous region of virtual memory. (The region does not have to be aligned, other than the alignment required for instruction words.)
The store-conditional may fail if a CACHE operation is carried out during the load-linked/store-conditional sequence, due to the possibility of modifying or evicting the line containing the linked data. In addition, non-local CACHE operations may cause a store-conditional instruction to fail on either the local processor or on the remote processor in multiprocessor or multi-threaded systems.
The store-conditional must not fail as a result of any of the following events:
The store-conditional must not fail as a result of a load that executes on the processor executing
a load-linked/store-conditional sequence if the load targets the block of synchronizable physical memory containing the load-linked data.
The outcome of the store-conditional is not predictable (it may succeed or fail) under any of the following conditions:
The store-conditional result is unpredictable if the store-conditional was not preceded by a matching load-linked instruction. SC must be preceded by LL, SCE must be preceded by LLE, SCWP
must be preceded by LLWP, and SCWPE must be preceded by LLWPE.
The store-conditional result is unpredictable if the load-linked and store-conditional instructions
do not target identical virtual addresses, physical addresses and CCAs.
The store-conditional result is unpredictable if the targeted memory location is not synchronizable. A synchronizable memory location is one that is associated with the state and logic necessary to track RMW atomicity. Whether a memory location is synchronizable depends on the
processor and system configurations, and on the memory access type used for the location.
The store-conditional result is unpredictable if the memory access does not use a CCA which supports atomic RMW for the targeted address.
For uniprocessor systems, a cached noncoherent or cached coherent CCA must be used; additionally, an uncached CCA can be used when Config5.ULS=1.
For multi-processor systems or systems containing coherent I/O devices, a cached coherent CCA must be used; additionally, an uncached CCA can be used when Config5.ULS=1.
When Config5.ULS=1, uncached load-linked/store-conditional operations are supported, with the following additional constraints:
The result of a store-conditional which is part of an uncached load-linked/store-conditional sequence is unpredictable if, during the sequence, a local or remote CPU accesses the block of memory containing the targeted data using any CCA other than that used by the load-linked and store-conditional instructions.
The result of an uncached load-linked/store-conditional sequence is only predictable if it targets an address in the system which supports uncached RMW accesses. In particular, the system must implement a "monitor", which is responsible for determining whether or not the address can be updated atomically with respect to the prior linked load. In response to a store-conditional instruction, the monitor updates memory where appropriate and communicates the result to the processor that initiated the sequence. It is implementation dependent what form the monitor takes. The recommended response for load-linked/store-conditional instructions which target a non-synchronizable uncached address is that the sub-system report a Bus Error to the processor.
Uncached stores by the same processor will cause an uncached load-linked/store-conditional sequence to fail if the store address matches that of the sequence.
A PAUSE instruction is no-op’d when it is preceded by an uncached load-linked instruction. This
is because the event which would wake the CPU from the paused state may only be visible to the external monitor, not to the local processor.
The rules for uncached load-linked/store-conditional atomic operation apply to any uncached CCA, including UCA (UnCached Accelerated). An implementation that supports UCA must guarantee that a store-conditional instruction does not participate in store gathering and that it ends any gathering initiated by stores preceding the SC in program order when the SC address coincides with a gathering address.
The effective address of a store-conditional operation must be naturally aligned, i.e. word-aligned for SC and SCE, and doubleword-aligned for SCWP and SCWPE; otherwise an Address Error exception occurs.
The following assembly code shows a possible usage of LL and SC to atomically update a memory location:
L1: ll    t1, 0(t0)   # Load counter.
    addiu t2, t1, 1   # Increment.
    sc    t2, 0(t0)   # Try to store, checking for atomicity.
    beqc  t2, 0, L1   # If not atomic (0), try again.
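The same retry loop can be modeled in Python. The memory dict and llbit flag below simulate the linked-load state; they are illustrative stand-ins for the LL/SC hardware, not architectural functions:

# Minimal sketch of the LL/SC retry loop, assuming a simulated memory.
memory = {0x1000: 41}
llbit = {'addr': None}

def load_linked(addr):
    llbit['addr'] = addr          # LL: record the linked address
    return memory[addr]

def store_conditional(addr, value):
    if llbit['addr'] != addr:     # SC fails if the link was lost
        return False
    memory[addr] = value
    llbit['addr'] = None          # SC always clears the link
    return True

def atomic_increment(addr):
    while True:                   # retry until the RMW completes atomically
        value = load_linked(addr)
        if store_conditional(addr, value + 1):
            return

atomic_increment(0x1000)
assert memory[0x1000] == 42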
Exceptions between the load-linked and store-conditional instructions cause the store-conditional to fail, so instructions which can cause persistent exceptions must not be used within the load-linked/store-conditional sequence. Examples of instructions which must be avoided are arithmetic operations that trap, system calls, and floating point operations that trap or require software emulation assistance.
Load-linked and store-conditional must function correctly on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached
coherent memory access types.
Support for the paired word instructions SCWP/SCWPE is indicated by the Config5.XNP bit. Paired word support is required for nanoMIPS™ cores, except for NMS cores, where it is optional.
Address Error. Bus Error. Coprocessor Unusable for SCE/SCWPE. Reserved Instruction for SCE/SCWPE if EVA not implemented. Reserved Instruction for SCWP/SCWPE if load-linked pair not implemented. TLB Invalid. TLB Modified. TLB Refill. Watch.
SDBBP code |
nanoMIPS. Optional, present when Debug implemented. |
Software Debug Breakpoint |
Software Debug Breakpoint. Cause a Software Debug Breakpoint exception.
nanoMIPS. Optional, present when Debug implemented.
000000 |
00000 |
11 |
code |
6 |
5 |
2 |
19 |
000100 |
00000 |
11 |
code |
6 |
5 |
2 |
3 |
if C0.Config1.EP == 0: raise exception('RI', 'Debug not implemented')
if C0.Config5.SBRI and EffectiveKSU() != 0: raise exception('RI', 'SBRI exception')
if Root.C0.Config5.SBRI and is_guest_mode(): raise exception('RI', 'Root SBRI exception', g=False)
debug_exception('BP')
Root.C0.Debug.DBp = 1
raise EXCEPTION()
Software Debug Breakpoint. Reserved Instruction if Debug not implemented.
SEB rt, rs |
nanoMIPS, not available in NMS |
Sign Extend Byte |
Sign Extend Byte. Take the lower byte of the value in register $rs, sign extend it, and place the result in register $rt.
nanoMIPS, not available in NMS
001000 |
rt |
rs |
x |
0000001 |
000 |
6 |
5 |
5 |
6 |
7 |
3 |
if C0.Config5.NMS == 1: raise exception('RI')
GPR[rt] = sign_extend(GPR[rs], from_nbits=8)
Reserved Instruction on NMS cores.
SEH rt, rs |
nanoMIPS |
Sign Extend Half |
Sign Extend Half. Take the lower halfword of the value in register $rs, sign extend it, and place the result in register $rt.
nanoMIPS
001000 |
rt |
rs |
x |
0001001 |
000 |
6 |
5 |
5 |
6 |
7 |
3 |
GPR[rt] = sign_extend(GPR[rs], from_nbits=16)
None.
SEQI rt, rs, u |
nanoMIPS |
Set on Equal to Immediate |
Set on Equal to Immediate. Set the register $rt to 1 if register $rs is equal to immediate value u, and 0 otherwise.
nanoMIPS
100000 |
rt |
rs |
0110 |
u |
6 |
5 |
5 |
4 |
12 |
GPR[rt] = 1 if GPR[rs] == u else 0
None.
SH rt, offset(rs) |
nanoMIPS |
Store Half |
Store Half. Store halfword from register $rt to memory address $rs + offset (register plus immediate).
nanoMIPS
100001 |
rt |
rs |
0101 |
u |
6 |
5 |
5 |
4 |
12 |
offset = u
011111 |
rtz3 |
rs3 |
0 |
u[2:1] |
1 |
6 |
3 |
3 |
1 |
2 |
1 |
rt = decode_gpr(rtz3, 'gpr3.src.store') rs = decode_gpr(rs3, 'gpr3') offset = u
010001 |
rt |
101 |
u[17:1] |
0 |
6 |
5 |
3 |
17 |
1 |
rs = 28 offset = u
101001 |
rt |
rs |
s[8] |
0101 |
0 |
00 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9)
va = effective_address(GPR[rs], offset, 'Store')
data = zero_extend(GPR[rt], from_nbits=16)
write_memory_at_va(data, va, nbytes=2)
Address Error. Bus Error. TLB Invalid. TLB Modified. TLB Refill. Watch.
SHE rt, offset(rs) |
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege. |
Store Half using EVA addressing |
Store Half using EVA addressing. Store halfword from register $rt to virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.
101001 |
rt |
rs |
s[8] |
0101 |
0 |
10 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9)
if not C0.Config5.EVA: raise exception('RI')
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Store', eva=True)
data = zero_extend(GPR[rt], from_nbits=16)
write_memory_at_va(data, va, nbytes=2, eva=True)
Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Modified. TLB Refill. Watch.
SHX rd, rs(rt) |
nanoMIPS, not available in NMS |
Store Half indeXed |
Store Half indeXed. Store halfword from register $rd to memory address $rt + $rs (register plus register).
nanoMIPS, not available in NMS
001000 |
rt |
rs |
rd |
0101 |
0 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
if C0.Config5.NMS == 1: raise exception('RI')
va = effective_address(GPR[rs], GPR[rt], 'Store')
data = zero_extend(GPR[rd], from_nbits=16)
write_memory_at_va(data, va, nbytes=2)
Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
SHXS rd, rs(rt) |
nanoMIPS, not available in NMS |
Store Half indeXed Scaled |
Store Half indeXed Scaled. Store halfword from register $rd to memory address $rt + 2*$rs (register plus scaled register).
nanoMIPS, not available in NMS
001000 |
rt |
rs |
rd |
0101 |
1 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
if C0.Config5.NMS == 1: raise exception('RI')
va = effective_address(GPR[rs]<<1, GPR[rt], 'Store')
data = zero_extend(GPR[rd], from_nbits=16)
write_memory_at_va(data, va, nbytes=2)
Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
SIGRIE code |
nanoMIPS |
Signal Reserved Instruction Exception |
Signal Reserved Instruction Exception.
nanoMIPS
000000 |
00000 |
00 |
code |
6 |
5 |
2 |
19 |
raise exception('RI')
Reserved Instruction.
SLL rt, rs, shift |
nanoMIPS |
Shift Left Logical |
Shift Left Logical. Left shift word value in register $rs by amount shift, and place the result in register $rt.
nanoMIPS
100000 |
rt |
rs |
1100 |
x |
0000 |
shift |
6 |
5 |
5 |
4 |
3 |
4 |
5 |
NOP[32], EHB, PAUSE, and SYNC instruction formats overlap SLL[32]. Opcodes matching those instruction formats should be processed according to the description of those instructions, not as SLL[32].
001100 |
rt3 |
rs3 |
0 |
shift3 |
6 |
3 |
3 |
1 |
3 |
rt = decode_gpr(rt3, 'gpr3') rs = decode_gpr(rs3, 'gpr3') shift = 8 if shift3 == 0 else shift3
result = GPR[rs] << shift
GPR[rt] = sign_extend(result, from_nbits=32)
None.
SLLV rd, rs, rt |
nanoMIPS |
Shift Left Logical Variable |
Shift Left Logical Variable. Left shift word value in register $rs by shift amount in register $rt, and place the result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
0000010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
shift = GPR[rt] & 0x1f
result = GPR[rs] << shift
GPR[rd] = sign_extend(result, from_nbits=32)
None.
SLT rd, rs, rt |
nanoMIPS |
Set on Less Than |
Set on Less Than. Set the register $rd to 1 if signed register $rs is less than signed register $rt, and 0 otherwise.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
1101010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
GPR[rd] = 1 if GPR[rs] < GPR[rt] else 0
None.
SLTI rt, rs, u |
nanoMIPS |
Set on Less Than Immediate |
Set on Less Than Immediate. Set the register $rt to 1 if the signed value in register $rs is less than immediate u, and 0 otherwise.
nanoMIPS
100000 |
rt |
rs |
0100 |
u |
6 |
5 |
5 |
4 |
12 |
GPR[rt] = 1 if GPR[rs] < u else 0
None.
SLTIU rt, rs, u |
nanoMIPS |
Set on Less Than Immediate, Unsigned |
Set on Less Than Immediate, Unsigned. Set the register $rt to 1 if the unsigned value in register $rs is less than immediate u, and 0 otherwise.
nanoMIPS
100000 |
rt |
rs |
0101 |
u |
6 |
5 |
5 |
4 |
12 |
GPR[rt] = 1 if unsigned(GPR[rs]) < u else 0
None.
SLTU rd, rs, rt |
nanoMIPS |
Set on Less Than, Unsigned |
Set on Less Than, Unsigned. Set the register $rd to 1 if unsigned register $rs is less than unsigned register $rt, and 0 otherwise.
nanoMIPS
001000 |
rt |
rs |
rd!=0 rd |
x |
1110010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
GPR[rd] = 1 if unsigned(GPR[rs]) < unsigned(GPR[rt]) else 0
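The unsigned() comparison used here (and by SLTIU above) can be read as reinterpreting a 32-bit two's-complement value as non-negative. A minimal sketch, assuming 32-bit register values:

def unsigned(x, nbits=32):
    # Reinterpret a two's-complement value as an unsigned integer.
    return x & ((1 << nbits) - 1)

# Example: -1 (0xFFFFFFFF) compares greater than any small positive value.
assert unsigned(-1) == 0xFFFFFFFF
assert not (unsigned(-1) < unsigned(1))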
SLTU encodings with rd=0 are used for the DVP and EVP instructions. DVP and EVP are required to behave as NOPs on cores without Virtual Processor (VP) support. This means that no DVP/EVP special casing is required in hardware for non-VP cores, since a SLTU instruction writing to $0 naturally behaves as a NOP.
None.
SOV rd, rs, rt |
nanoMIPS |
Set on Overflow |
Set on Overflow. Set the register $rd to 1 if the signed addition of registers $rs and $rt overflows 32 bits, and 0 otherwise.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
1111010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
sum = GPR[rs] + GPR[rt]
GPR[rd] = 1 if overflows(sum, nbits=32) else 0
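The overflows() predicate used here (and by the trapping ADD/SUB instructions) checks whether a result fits in signed 32-bit two's complement. A minimal sketch of one possible definition:

def overflows(value, nbits=32):
    # True if value is outside the signed nbits-bit range.
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    return not (lo <= value <= hi)

# Example: 0x7FFFFFFF + 1 overflows signed 32-bit arithmetic.
assert overflows(0x7FFFFFFF + 1)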
None.
SRA rt, rs, shift |
nanoMIPS |
Shift Right Arithmetic |
Shift Right Arithmetic. Right shift word value in register $rs by amount shift, duplicating the sign bit (bit 31) in the emptied bits. Place the result in register $rt.
nanoMIPS
100000 |
rt |
rs |
1100 |
x |
0100 |
shift |
6 |
5 |
5 |
4 |
3 |
4 |
5 |
GPR[rt] = GPR[rs] >> shift
None.
SRAV rd, rs, rt |
nanoMIPS |
Shift Right Arithmetic Variable |
Shift Right Arithmetic Variable. Right shift word value in register $rs by shift amount in register $rt, duplicating the sign bit (bit 31) in the emptied bits. Place the result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
0010010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
shift = GPR[rt] & 0x1f
GPR[rd] = GPR[rs] >> shift
None.
SRL rt, rs, shift |
nanoMIPS |
Shift Right Logical |
Shift Right Logical. Right shift word value in register $rs by amount shift, filling the emptied bits with zeroes. Place the result in register $rt.
nanoMIPS
100000 |
rt |
rs |
1100 |
x |
0010 |
shift |
6 |
5 |
5 |
4 |
3 |
4 |
5 |
001100 |
rt3 |
rs3 |
1 |
shift3 |
6 |
3 |
3 |
1 |
3 |
rt = decode_gpr(rt3, 'gpr3') rs = decode_gpr(rs3, 'gpr3') shift = 8 if shift3 == 0 else shift3
result = zero_extend(GPR[rs], from_nbits=32) >> shift
GPR[rt] = sign_extend(result, from_nbits=32)
None.
SRLV rd, rs, rt |
nanoMIPS |
Shift Right Logical Variable |
Shift Right Logical Variable. Right shift word value in register $rs by shift amount in register $rt, filling the emptied bits with zeros. Place the result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
0001010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
shift = GPR[rt] & 0x1f
result = zero_extend(GPR[rs], from_nbits=32) >> shift
GPR[rd] = sign_extend(result, from_nbits=32)
None.
SUB rd, rs, rt |
nanoMIPS, not available in NMS |
Subtract |
Subtract. Subtract the 32-bit signed integer in register $rt from the 32-bit signed integer in register $rs, placing the 32-bit result in register $rd, and trapping on overflow.
nanoMIPS, not available in NMS
001000 |
rt |
rs |
rd |
x |
0110010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
if C0.Config5.NMS == 1: raise exception('RI')
result = GPR[rs] - GPR[rt]
if overflows(result, nbits=32): raise exception('OV')
GPR[rd] = sign_extend(result, from_nbits=32)
Overflow. Reserved Instruction on NMS cores.
SUBU rd, rs, rt |
nanoMIPS |
Subtract (Untrapped) |
Subtract (Untrapped). Subtract the 32-bit integer in register $rt from the 32-bit integer in register $rs, placing the 32-bit result in register $rd, and not trapping on overflow.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
0111010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
101100 |
rt3 |
rs3 |
rd3 |
1 |
6 |
3 |
3 |
3 |
1 |
rd = decode_gpr(rd3, 'gpr3') rs = decode_gpr(rs3, 'gpr3') rt = decode_gpr(rt3, 'gpr3')
result = GPR[rs] - GPR[rt]
GPR[rd] = sign_extend(result, from_nbits=32)
None.
SW rt, offset(rs) |
nanoMIPS, availability varies by format. |
Store Word |
Store Word. Store word from register $rt to memory address $rs + offset (register plus immediate).
nanoMIPS, availability varies by format.
100001 |
rt |
rs |
1001 |
u |
6 |
5 |
5 |
4 |
12 |
offset = u
100101 |
rtz3 |
rs3 |
u[5:2] |
6 |
3 |
3 |
4 |
rt = decode_gpr(rtz3, 'gpr3.src.store') rs = decode_gpr(rs3, 'gpr3') offset = u
111101 |
rtz4[3] |
u[2] |
rtz4[2:0] |
rs4[3] |
u[3] |
rs4[2:0] |
6 |
1 |
1 |
3 |
1 |
1 |
3 |
if C0.Config5.NMS == 1: raise exception('RI')
rt = decode_gpr(rtz4[3] @ rtz4[2:0], 'gpr4.zero')
rs = decode_gpr(rs4[3] @ rs4[2:0], 'gpr4')
offset = u
010000 |
rt |
u[20:2] |
11 |
6 |
5 |
19 |
2 |
rs = 28 offset = u
110101 |
rtz3 |
u[8:2] |
6 |
3 |
7 |
rt = decode_gpr(rtz3, 'gpr3.src.store') rs = 28 offset = u
101001 |
rt |
rs |
s[8] |
1001 |
0 |
00 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9)
101101 |
rt |
u[6:2] |
6 |
5 |
5 |
rs = 29 offset = u
va = effective_address(GPR[rs], offset, 'Store')
data = zero_extend(GPR[rt], from_nbits=32)
write_memory_at_va(data, va, nbytes=4)
Address Error. Bus Error. Reserved Instruction for SW[4X4] format on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
SWE rt, offset(rs) |
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege. |
Store Word using EVA addressing |
Store Word using EVA addressing. Store word from register $rt to virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.
101001 |
rt |
rs |
s[8] |
1001 |
0 |
10 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9)
if not C0.Config5.EVA: raise exception('RI')
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Store', eva=True)
data = zero_extend(GPR[rt], from_nbits=32)
write_memory_at_va(data, va, nbytes=4, eva=True)
Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Modified. TLB Refill. Watch.
SWM rt, offset(rs), count |
nanoMIPS, not available in NMS |
Store Word Multiple |
Store Word Multiple. Store count words of data from registers $rt, $(rt+1), ..., $(rt+count-1) to consecutive memory addresses starting at $rs + offset (register plus immediate).
nanoMIPS, not available in NMS
101001 |
rt |
rs |
s[8] |
count3 |
1 |
1 |
00 |
s[7:0] |
6 |
5 |
5 |
1 |
3 |
1 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9) count = 8 if count3 == 0 else count3
if C0.Config5.NMS == 1: raise exception('RI')
i = 0
while i != count:
    this_rt = ( 0 if rt == 0
                else rt + i if rt + i < 32
                else rt + i - 16 )
    this_offset = offset + (i<<2)
    va = effective_address(GPR[rs], this_offset, 'Store')
    data = zero_extend(GPR[this_rt], from_nbits=32)
    write_memory_at_va(data, va, nbytes=4)
    i += 1
SWM stores count words from sequentially numbered registers to sequential memory addresses. After storing $31, the sequence of registers continues from $16. If rt=0, then $0 is stored for all count steps of the instruction. Some example encodings of the register list are (a small executable sketch follows the list):
rt=15, count=3: stores [$15, $16, $17]
rt=31, count=3: stores [$31, $16, $17]
rt=0, count=3: stores [$0, $0, $0].
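A minimal Python sketch of the register sequence, mirroring the this_rt computation in the pseudocode above; it reproduces the three example encodings:

def swm_register_list(rt, count):
    # Wrap from $31 back to $16; rt=0 stores $0 for every step.
    regs = []
    for i in range(count):
        if rt == 0:
            regs.append(0)
        elif rt + i < 32:
            regs.append(rt + i)
        else:
            regs.append(rt + i - 16)
    return regs

assert swm_register_list(15, 3) == [15, 16, 17]
assert swm_register_list(31, 3) == [31, 16, 17]
assert swm_register_list(0, 3) == [0, 0, 0]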
If a TLB exception or interrupt occurs during the execution of this instruction, a subset of the required memory updates may have occurred. A full restart of the instruction will be performed on return from
the exception.
Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
SWPC rt, address |
nanoMIPS, not available in NMS |
Store Word PC relative |
Store Word PC relative. Store word from register $rt to PC relative address address.
nanoMIPS, not available in NMS
011000 |
rt |
01111 |
s[15:0] |
s[31:16] |
6 |
5 |
5 |
16 |
16 |
offset = sign_extend(s, from_nbits=32)
if C0.Config5.NMS == 1: raise exception('RI')
address = effective_address(CPU.next_pc, offset, 'Store')
data = zero_extend(GPR[rt], from_nbits=32)
write_memory_at_va(data, address, nbytes=4)
Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
SWX rd, rs(rt) |
nanoMIPS, not available in NMS |
Store Word indeXed |
Store Word indeXed. Store word from register $rd to memory address $rt + $rs (register plus register).
nanoMIPS, not available in NMS
001000 |
rt |
rs |
rd |
1001 |
0 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
if C0.Config5.NMS == 1: raise exception('RI')
va = effective_address(GPR[rs], GPR[rt], 'Store')
data = zero_extend(GPR[rd], from_nbits=32)
write_memory_at_va(data, va, nbytes=4)
Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
SWXS rd, rs(rt) |
nanoMIPS, not available in NMS |
Store Word indeXed Scaled |
Store Word indeXed Scaled. Store word from register $rd to memory address $rt + 4*$rs (register plus scaled register).
nanoMIPS, not available in NMS
001000 |
rt |
rs |
rd |
1001 |
1 |
000 |
111 |
6 |
5 |
5 |
5 |
4 |
1 |
3 |
3 |
if C0.Config5.NMS == 1: raise exception('RI')
va = effective_address(GPR[rs]<<2, GPR[rt], 'Store')
data = zero_extend(GPR[rd], from_nbits=32)
write_memory_at_va(data, va, nbytes=4)
Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
SYNC stype |
nanoMIPS |
Sync |
Sync.
Impose ordering constraints of type stype on prior and subsequent memory operations.
nanoMIPS
100000 |
00000 |
stype |
1100 |
x |
0000 |
00110 |
6 |
5 |
5 |
4 |
3 |
4 |
5 |
sync_memory_access(stype)
The SYNC instruction is used to order loads and stores for shared memory, and also to order operations with respect to the global invalidate instructions GINVI and GINVT. The following types of ordering guarantees are available with different stypes.
Completion Barriers: A completion barrier provides a guarantee that any of the specified memory
instructions before the SYNC are completed and globally performed before any of the specified memory instructions after the SYNC are performed to any extent. Loads are completed when
the destination register is written. Stores are completed when the stored value is visible to every other processor in the system.
Ordering Barriers: An ordering barrier provides a guarantee in the system that any specified
memory instructions before the SYNC are ordered before any of the specified memory instructions after the SYNC. The ordering SYNC is considered complete when the memory instructions
before and after the SYNC are guaranteed thereafter to retain their order relative to the SYNC,
i.e. when it is guaranteed that all specified memory instructions before the SYNC will be globally performed before any of the specified memory accesses after the SYNC are performed to any extent. It is helpful to think of a global ordering point in a coherence domain: a point at which, once an instruction reaches it, the instruction is guaranteed to retain its order relative to any memory instruction that reaches the point after it. The ordering SYNC thus cannot complete before all older specified memory instructions reach the global ordering point.
The following table shows the behavior of the SYNC instruction for each stype value. Operation types listed in the 'What reaches before' column are subject to a pre-SYNC ordering barrier: such operations, when older, must reach the global ordering point before the SYNC instruction completes. Operation types listed in the 'What reaches after' column are subject to a post-SYNC ordering barrier: such operations, when younger, must reach the global ordering point only after the SYNC instruction completes. Operation types listed in the 'What completes before' column are subject to a completion barrier, that is, they must be globally performed when the SYNC instruction completes.
stype | Name | What reaches before | What reaches after | What completes before | Availability
0x0 | SYNC | Loads, Stores | Loads, Stores | Loads, Stores | Required.
0x1-0x3 | | | | | Impl./vendor specific.
0x4 | SYNC_WMB | Stores | Stores | | Optional.
0x5-0xF | | | | | Impl./vendor specific.
0x10 | SYNC_MB | Loads, Stores | Loads, Stores | | Optional.
0x11 | SYNC_ACQUIRE | Loads | Loads, Stores | | Optional.
0x12 | SYNC_RELEASE | Loads, Stores | Stores | | Optional.
0x13 | SYNC_RMB | Loads | Loads | | Optional.
0x14 | SYNC_GINV | Loads, Stores | Loads, Stores | GINVI, GINVT, SYNCI | Config5.GI=2,3.
0x15-0x1F | | | | | Reserved for Architecture.
SYNC barriers affect only uncached and cached coherent loads and stores and do not affect the order in which instruction fetches are performed. For the purposes of this description, the CACHE, PREF and SYNCI instructions are treated as loads and stores. In addition, the optional Global Invalidate instructions are synchronizable through SYNC (stype=0x14).
The effect of SYNC on the global order of loads and stores for memory access types other than uncached and cached coherent is UNPREDICTABLE.
A completion barrier may have an adverse impact on performance compared to an ordering barrier due to the constraint of completion. An implementation may optimize the ordering of memory instructions such that an ordering barrier completes before a completion barrier under the same circumstances. The magnitude of the impact is implementation-dependent, but an implementation must ensure that an ordering barrier performs no worse than the equivalent completion barrier. Software thus needs to use completion and ordering barriers for the appropriate conditions.
An stype of 0 is used to define the SYNC instruction with completion barrier semantics. Non-zero values of stype may be defined by the architecture or specific implementations to perform synchronization behaviors that are less complete than that of stype=0. If an implementation does not use one of these non-zero values to define a different synchronization behavior, then that non-zero value of stype must map to a completion barrier. This allows software written for an implementation with a lighter-weight barrier to work on another implementation which only implements the stype=0 completion barrier.
The Acquire and Release barrier types are used to minimize the memory ordering that must be maintained and still have software synchronization work.
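As an illustration of the acquire/release pattern, the sketch below places SYNC_ACQUIRE after taking a lock and SYNC_RELEASE before releasing it. The helpers sync(), try_take_lock() and store_word() are illustrative stand-ins for the corresponding instructions, not architectural functions:

SYNC_ACQUIRE = 0x11
SYNC_RELEASE = 0x12

def lock(lock_addr):
    while not try_take_lock(lock_addr):  # e.g. an LL/SC test-and-set loop
        pass
    sync(SYNC_ACQUIRE)   # critical-section accesses cannot move above the lock

def unlock(lock_addr):
    sync(SYNC_RELEASE)   # critical-section accesses cannot move below the unlock
    store_word(lock_addr, 0)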
A completion barrier is required, potentially in conjunction with an EHB instruction, to guarantee that memory reference results are visible across operating mode changes. For example, a completion barrier is required on some implementations on entry to and exit from Debug Mode to guarantee that memory effects are handled correctly.
If Global Invalidate instructions are supported, then SYNC (stype=0x14) acts as a completion barrier with respect to any preceding GINVI or GINVT instructions. This SYNC instruction is globalized and
only completes if all preceding GINVI or GINVT operations related to the same program have completed in the system. (Any references to GINVT also imply GINVGT, available in a virtualized MIPS system.)
A system that implements the Global Invalidates also requires that the completion of SYNC (stype=0x14) be constrained by legacy SYNCI operations. Thus SYNC (stype=0x14) can also be used to enforce synchronization of SYNCI instructions. In the typical use cases, a single GINVI is used by itself to invalidate caches and would be followed by a SYNC (stype=0x14). In the case of GINVT, multiple GINVT could be used to invalidate multiple TLB mappings, and the SYNC (stype=0x14) would be used to guarantee completion of any number of GINVTs preceding it.
The following terms are used in this description:
Synchronizable: A load or store instruction is synchronizable if the load or store occurs to a physical location in shared memory using a virtual address with a memory access type of either uncached or cached coherent.
Shared memory: Memory that can be accessed by more than one processor or by a coherent I/O system module.
Performed load: A load instruction is performed when the value returned by the load has been determined. The result of a load on processor A has been determined with respect to processor or coherent
I/O module B when a subsequent store to the location by B cannot affect the value returned by the load. The store by B must use the same memory access type as the load.
Performed store: A store instruction is performed when the store is observable. A store on processor A is observable with respect to processor or coherent I/O module B when a subsequent load of the location by B returns the value written by the store. The load by B must use the same memory access type as the store.
Globally performed load: A load instruction is globally performed when it is performed with respect to all processors and coherent I/O modules capable of storing to the location.
Globally performed store: A store instruction is globally performed when it is globally observable. It is globally observable when it is observable by all processors and I/O modules capable of loading from the location.
Global ordering point: A point in the coherence domain at which, once a memory instruction reaches it, the instruction is guaranteed to retain its order relative to any memory instruction that reaches the point after it.
Coherent I/O module: A coherent I/O module is an Input/Output system component that performs coherent Direct Memory Access (DMA). It reads and writes memory independently as though it were a processor doing loads and stores to locations with a memory access type of cached coherent.
The following notes describe the use of SYNC in parallel programs:
A processor executing load and store instructions observes the order in which loads and stores using the same memory access type occur in the instruction stream; this is known as program order.
A parallel program has multiple instruction streams that can execute simultaneously on different processors.
In multiprocessor (MP) systems, the order in which the effects of loads and stores are observed by other processors - the global order of the loads and stores - determines the actions necessary to reliably share data in parallel programs.
When all processors observe the effects of loads and stores in program order, the system is strongly ordered. On such systems, parallel programs can reliably share data without explicitly using a SYNC. Executing SYNC on such a system is not necessary, will not cause an error, but may reduce overall performance.
If a multiprocessor system is not strongly ordered, the effects of load and store instructions executed by one processor may be observed out of program order by other processors. On such systems, parallel
programs must use SYNC to reliably share data at critical points in the program. SYNC separates the loads and stores executed on the processor into two groups, and the effect of all loads and stores in one group is seen by all processors before the effect of any load or store in the subsequent group. In effect, SYNC causes the system to be strongly ordered for the executing processor at the instant that the SYNC is executed.
The hardware ordering support provided in a MIPS-based multiprocessor system is implementation-dependent. A parallel program that does not use SYNC generally does not operate correctly on a system that is not strongly ordered. However, a program that does use SYNC works on both types of systems. (System-specific documentation describes the actions needed to reliably share data in parallel programs for that system.)
The behavior of a load or store using one memory access type is UNPREDICTABLE if a load or store was previously made to the same physical location using a different memory access type. The presence
of a SYNC between the references does not alter this behavior.
SYNC affects the order in which the effects of load and store instructions appear to all processors; it does not generally affect the physical memory-system ordering or synchronization issues that arise in system programming. The effect of SYNC on implementation-specific aspects of the cached memory system, such as writeback buffers, is not defined.
The code fragments below show how SYNC can be used to coordinate the use of shared data between separate writer and reader instruction streams in a multiprocessor environment. The FLAG location is
used by the instruction streams to determine whether the shared data item DATA is valid. The SYNC executed by processor A forces the store of DATA to be performed globally before the store to FLAG
is performed. The SYNC executed by processor B ensures that DATA is not read until after the FLAG value indicates that the shared data is valid.
# Processor A (writer)
# Conditions at entry:
# The value 0 has been stored in FLAG and that value is observable by B.
      SW   R1, DATA     # change shared DATA value
      LI   R2, 1
      SYNC              # perform DATA store before performing FLAG store
      SW   R2, FLAG     # say that the shared DATA value is valid

# Processor B (reader)
      LI   R2, 1
1:    LW   R1, FLAG     # get FLAG
      BNEC R2, R1, 1B   # if it says that DATA is not valid, poll again
      NOP
      SYNC              # FLAG value checked before doing DATA read
      LW   R1, DATA     # read (valid) shared DATA value
      SYNC
None.
SYNCI offset(rs) |
nanoMIPS, availability varies by format. |
SYNChronize Instruction cache/SYNChronize Instruction cache using EVA addressing |
SYNCIE offset(rs) |
nanoMIPS, availability varies by format. |
SYNChronize Instruction cache/SYNChronize Instruction cache using EVA addressing |
SYNChronize Instruction cache/SYNChronize Instruction cache using EVA addressing. Synchronize the caches to make instruction writes at address $rs + offset (register plus immediate) effective. For SYNCIE, translate the virtual address as though the core is in user mode, although it is actually in kernel mode.
nanoMIPS, availability varies by format.
101001 |
11111 |
rs |
s[8] |
0011 |
0 |
00 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9) is_eva = False
100001 |
11111 |
rs |
0011 |
u |
6 |
5 |
5 |
4 |
12 |
offset = u is_eva = False
101001 |
11111 |
rs |
s[8] |
0011 |
0 |
10 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9) is_eva = True
if is_eva and not C0.Config5.EVA: raise exception('RI')
if is_eva and not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Load', eva=is_eva)
pa, cca = va2pa(va, 'Cacheop', eva=is_eva)
# Make data writes at address=PA visible to the instruction stream (for all
# coherent cores in the system)...
# The precise details of the operation are implementation dependent, and will
# depend on the cache hierarchy and coherency behavior of the system. The
# following code shows a sample implementation for a system where the memory
# hierarchy is unified beyond the L1 instruction and data caches.
# Find index where address is present in D cache, if any.
dcache_hit_index = cache_lookup_index('D', va, pa)
if dcache_hit_index:
    way_index, set_index = dcache_hit_index
    dcache_line = get_cache_line('D', way_index, set_index)
    if dcache_line.valid and dcache_line.dirty:
        dcache_line.write_back()
        # Implementation may or may not invalidate line too, see below.
for core in get_all_cores_in_system():
    # Find index where address is present in this core's I cache, if any.
    icache_hit_index = cache_lookup_index('I', va, pa, core)
    if icache_hit_index:
        way_index, set_index = icache_hit_index
        icache_line = get_cache_line('I', way_index, set_index, core)
        if not icache_line.locked:
            icache_line.valid = 0
SYNCI is a user privilege instruction for synchronizing the caches to make instruction writes to address
$rs + offset effective. SYNCI must be followed by a SYNC instruction and an instruction hazard barrier to guarantee that subsequent instruction fetches see the updated instructions. One SYNCI instruction
is required for every cache line that was written. The size of the cache line can be determined by the RDHWR instruction.
SYNCI can cause TLB Refill and TLB invalid exceptions (with cause code TLBL). It does not cause TLBRI exceptions. A Cache Error or Bus Error exception may occur as a result of a writeback triggered by
the instruction.
An Address Error Exception (with cause code equal ADEL) may occur if a SYNCI targets an address which is not accessible from the current operating mode. It is implementation dependent whether such an exception does occur, but the instruction should not affect cache lines which are not accessible from the current operating mode.
It is implementation dependent whether a data watch exception is triggered by a SYNCI instruction whose address matches the Watch register address match conditions. The preferred implementation
is not to match on the SYNCI instruction.
The operation of the processor is UNPREDICTABLE if the effective address of the SYNCI targets any instruction cache line that contains instructions to be executed between the SYNCI and the subsequent
JALRC.HB, JRC.HB, or ERET instruction required to clear the instruction hazard.
The SYNCI instruction has no effect on cache lines that were previously locked with the CACHE instruction. If correct software operation depends on the state of a locked line, the CACHE instruction must be used to synchronize the caches.
In multi-processor systems, a SYNCI to an address with a coherent CCA must guarantee synchronization of all coherent instruction caches in the system. (Prior to Release 6 of the MIPS™ Architecture, this behavior was recommended but not required.)
The manner in which SYNCI is implemented will depend on the cache hierarchy of the processor. Typically, all caches out to the point at which both instruction and data references become unified are processed. If no caches exist or if instruction cache coherency is already guaranteed, the instruction must be implemented as a NOP.
In a typical implementation in which only the L1 instruction and data caches are affected, this instruction would perform a Hit
Invalidate operation on the instruction cache and a Hit Writeback or Hit
Writeback Invalidate on the data cache. The decision to invalidate the data cache line is implementation dependent, but should be made under the assumption that the data will not be written again soon.
If a Hit Writeback Invalidate (as opposed to a Hit Writeback) would cause the line to be selected for replacement, the invalidate option might be selected.
The following example shows a routine which could be called after the new instruction stream is written to make those changes effective.
/*
 * This routine makes changes to the instruction stream effective to the
 * hardware. It should be called after the instruction stream is written.
 * On return, the new instructions are effective.
 *
 * Inputs:
 * a0 = Start address of new instruction stream
 * a1 = Size in bytes of new instruction stream
 */
    beqc  a1, zero, 20f      /* If size==0, branch around. */
    addu  a1, a0, a1         /* Calculate end address + 1. */
    rdhwr v0, HW_SYNCI_Step  /* Get step size for SYNCI. */
    beqc  v0, zero, 20f      /* Nothing to do if no caches. */
10: synci 0(a0)              /* Sync all caches around address. */
    addu  a0, a0, v0         /* Add step size. */
    sltu  v1, a0, a1         /* Not past the end address? */
    bnec  v1, zero, 10b      /* Branch if more to do. */
    sync                     /* Clear memory hazards. */
20: jrc.hb ra                /* Return, clearing instruction hazards. */
Address Error. Bus Error. Cache Error. Coprocessor Unusable for SYNCIE. Reserved Instruction for SYNCIE if EVA not implemented. TLB Invalid. TLB Refill.
SYSCALL code |
nanoMIPS |
System Call |
System Call. Cause a System Call exception.
nanoMIPS
000000 |
00000 |
01 |
0 |
code |
6 |
5 |
2 |
1 |
18 |
000100 |
00000 |
01 |
0 |
code |
6 |
5 |
2 |
1 |
2 |
raise exception('SYSCALL')
System Call.
TEQ rs, rt, code |
nanoMIPS, not available in NMS |
Trap if Equal |
Trap if Equal. Cause a Trap exception if registers $rs and $rt are equal.
nanoMIPS, not available in NMS
001000 |
rt |
rs |
code |
0 |
0000000 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
if C0.Config5.NMS == 1: raise exception('RI')
if GPR[rs] == GPR[rt]: raise exception('TRAP')
Trap.
TLBINV |
nanoMIPS. Required on TLB cores, unless Config4.IE<2. Requires CP0 privilege. |
TLB Invalidate |
TLB Invalidate.
Invalidate a set of TLB entries based on ASID match.
nanoMIPS. Required on TLB cores, unless Config4.IE<2. Requires CP0 privilege.
001000 |
x |
00 |
00011 |
101 |
111 |
111 |
6 |
10 |
2 |
5 |
3 |
3 |
3 |
if C0.Config4.IE < 2: raise exception('RI')
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
tlbinv()
Coprocessor Unusable. Reserved Instruction if TLB invalidate not implemented.
TLBINVF |
nanoMIPS. Required on TLB cores, unless Config4.IE<2. Requires CP0 privilege. |
TLB Invalidate Flush |
TLB Invalidate Flush.
Invalidate a set of TLB entries, ignoring ASID match.
nanoMIPS. Required on TLB cores, unless Config4.IE<2. Requires CP0 privilege.
001000 |
x |
00 |
01011 |
101 |
111 |
111 |
6 |
10 |
2 |
5 |
3 |
3 |
3 |
if C0.Config4.IE < 2: raise exception('RI')
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
tlbinv(flush=True)
Coprocessor Unusable. Reserved Instruction if TLB invalidate not implemented.
TLBP |
nanoMIPS. Required on TLB cores. Requires CP0 privilege. |
TLB Probe |
TLB Probe. Probe the TLB for an entry matching C0.EntryHi. If found, write the index of the matching entry to C0.Index, otherwise set C0.Index.P to 1.
nanoMIPS. Required on TLB cores. Requires CP0 privilege.
001000 |
x |
00 |
00001 |
101 |
111 |
111 |
6 |
10 |
2 |
5 |
3 |
3 |
3 |
if not got_tlb(): raise exception('RI')
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
tlbp()
Coprocessor Unusable. Reserved Instruction if TLB not implemented.
TLBR |
nanoMIPS. Required on TLB cores. Requires CP0 privilege. |
TLB Read |
TLB Read. Read the TLB entry indexed by C0.Index into the TLB CP0 registers EntryHi, EntryLo0, EntryLo1, PageMask.
nanoMIPS. Required on TLB cores. Requires CP0 privilege.
001000 |
x |
00 |
01001 |
101 |
111 |
111 |
6 |
10 |
2 |
5 |
3 |
3 |
3 |
if not got_tlb(): raise exception('RI')
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
tlbr()
Coprocessor Unusable. Reserved Instruction if TLB not implemented.
TLBWI |
nanoMIPS. Required on TLB cores. Requires CP0 privilege. |
TLB Write Indexed |
TLB Write Indexed. Write the TLB entry indexed by C0.Index using the values in the TLB CP0 registers EntryHi, EntryLo0, EntryLo1, PageMask.
nanoMIPS. Required on TLB cores. Requires CP0 privilege.
001000 |
x |
00 |
10001 |
101 |
111 |
111 |
6 |
10 |
2 |
5 |
3 |
3 |
3 |
if not got_tlb(): raise exception('RI')
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
tlbwi(C0.Index.Index)
Coprocessor Unusable. Reserved Instruction if TLB not implemented.
TLBWR |
nanoMIPS. Required on TLB cores. Requires CP0 privilege. |
TLB Write Random |
TLB Write Random. Write a randomly chosen TLB entry using the values in the TLB CP0 registers EntryHi, EntryLo0, EntryLo1, PageMask.
nanoMIPS. Required on TLB cores. Requires CP0 privilege.
001000 |
x |
00 |
11001 |
101 |
111 |
111 |
6 |
10 |
2 |
5 |
3 |
3 |
3 |
if not got_tlb(): raise exception('RI')
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
tlbwr()
Coprocessor Unusable. Reserved Instruction if TLB not implemented.
TNE rs, rt, code |
nanoMIPS, not available in NMS |
Trap if Not Equal |
Trap if Not Equal. Cause a Trap exception if registers $rs and $rt are not equal.
nanoMIPS, not available in NMS
001000 |
rt |
rs |
code |
1 |
0000000 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
if C0.Config5.NMS == 1: raise exception('RI')
if GPR[rs] != GPR[rt]: raise exception('TRAP')
Trap.
UALH rt, offset(rs) |
nanoMIPS, not available in NMS |
Unaligned Load Half |
Unaligned Load Half. Load signed halfword to register $rt from memory address $rs + offset (register plus immediate), guaranteeing that the operation completes even if the address is not halfword aligned.
nanoMIPS, not available in NMS
101001 |
rt |
rs |
s[8] |
0100 |
0 |
01 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
if C0.Config5.NMS == 1: raise exception('RI')
offset = sign_extend(s, from_nbits=9)
va = effective_address(GPR[rs], offset, 'Load')
data = read_memory_at_va(va, nbytes=2, unaligned_support='always')
GPR[rt] = sign_extend(data, from_nbits=16)
UALH will not cause an Address Error exception for unaligned addresses.
An unaligned load/store instruction may be implemented using more than one memory transaction. It is possible for a subset of these memory transactions to have completed and then for a TLB exception to occur on a remaining transaction. It is also possible that memory could be modified by another thread or device in between the completion of the memory transactions. This behavior is equivalent to what might occur if the unaligned load/store was carried out in software using a series of separate aligned instructions, for instance using LWL/LWR on a pre-R6 MIPS™ core. Software should take equivalent steps to accommodate this lack of guaranteed atomicity as it would for the multiple instruction case.
Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Refill. TLB Read Inhibit. Watch.
UALW rt, offset(rs) |
Assembly alias, not available in NMS |
Unaligned Load Word |
Unaligned Load Word. Load word to register $rt from memory address $rs + offset (register plus immediate), guaranteeing that the operation completes even if the address is not word aligned.
Assembly alias, not available in NMS
UALWM rt, offset(rs), 1
UALWM rt, offset(rs), count |
nanoMIPS, not available in NMS |
Unaligned Load Word Multiple |
Unaligned Load Word Multiple. Load count words of data to registers $rt, $(rt+1), ..., $(rt+count-1) from consecutive memory addresses starting at $rs + offset (register plus immediate). Guarantee that the operation completes even if the address is not word aligned.
nanoMIPS, not available in NMS
101001 |
rt |
rs |
s[8] |
count3 |
0 |
1 |
01 |
s[7:0] |
6 |
5 |
5 |
1 |
3 |
1 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9) count = 8 if count3 == 0 else count3
if C0.Config5.NMS == 1: raise exception('RI')
i = 0
while i != count:
    this_rt = rt + i if rt + i < 32 else rt + i - 16
    this_offset = offset + (i<<2)
    va = effective_address(GPR[rs], this_offset, 'Load')
    data = read_memory_at_va(va, nbytes=4, unaligned_support='always')
    GPR[this_rt] = sign_extend(data, from_nbits=32)
    if this_rt == rs and i != count - 1: raise UNPREDICTABLE()
    i += 1
UALWM loads count words to sequentially numbered registers from sequential memory addresses which are potentially unaligned. After loading $31, the sequence of registers continues from $16. See
LWM for example encodings of the register list.
UALWM will not cause an Address Error exception for unaligned addresses.
The result is unpredictable if an UALWM instruction updates the base register prior to the final load.
If a TLB exception or interrupt occurs during the execution of this instruction, a subset of the required register updates may have occurred.
An unaligned load/store instruction may be implemented using more than one memory transaction. It is possible for a subset of these memory transactions to have completed and then for a TLB exception to occur on a remaining transaction. It is also possible that memory could be modified by another thread or device in between the completion of the memory transactions. This behavior is equivalent to what might occur if the unaligned load/store was carried out in software using a series of separate aligned instructions, for instance using LWL/LWR on a pre-R6 MIPS™ core. Software should take equivalent steps to accommodate this lack of guaranteed atomicity as it would for the multiple instruction case.
UALWM must be implemented in such a way as to make the instruction restartable, but the implementation does not need to be fully atomic. For instance, it is allowable for a UALWM instruction to be aborted by an exception after a subset of the register updates have occurred. To ensure restartability, any write to GPR $rs (which may be used as the final output register) must be completed atomically, that is, the instruction must graduate if and only if that write occurs.
Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.
UASH rt, offset(rs) |
nanoMIPS, not available in NMS |
Unaligned Store Half |
Unaligned Store Half. Store halfword from register $rt to memory address $rs + offset (register plus immediate), guaranteeing that the operation completes even if the address is not halfword aligned.
nanoMIPS, not available in NMS
101001 |
rt |
rs |
s[8] |
0101 |
0 |
01 |
s[7:0] |
6 |
5 |
5 |
1 |
4 |
1 |
2 |
8 |
if C0.Config5.NMS == 1: raise exception('RI')
offset = sign_extend(s, from_nbits=9)
va = effective_address(GPR[rs], offset, 'Store')
data = zero_extend(GPR[rt], from_nbits=16)
write_memory_at_va(data, va, nbytes=2, unaligned_support='always')
UASH will not cause an Address Error exception for unaligned addresses.
An unaligned load/store instruction may be implemented using more than one memory transaction. It is possible for a subset of these memory transactions to have completed and then for a TLB exception to occur on a remaining transaction. It is also possible that memory could be modified by another thread or device in between the completion of the memory transactions. This behavior is equivalent to what might occur if the unaligned load/store was carried out in software using a series of separate aligned instructions, for instance using LWL/LWR on a pre-R6 MIPS™ core. Software should take equivalent steps to accommodate this lack of guaranteed atomicity as it would for the multiple instruction case.
Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
UASW rt, offset(rs) |
Assembly alias, not available in NMS |
Unaligned Store Word |
Unaligned Store Word. Store word from register $rt to memory address $rs + offset (register plus immediate), guaranteeing that the operation completes even if the address is not word aligned.
Assembly alias, not available in NMS
UASWM rt, offset(rs), 1
UASWM rt, offset(rs), count |
nanoMIPS, not available in NMS |
Unaligned Store Word Multiple |
Unaligned Store Word Multiple. Store count words of data from registers $rt, $(rt+1), ..., $(rt+count-1) to consecutive memory addresses starting at $rs + offset (register plus immediate). Guarantee that the operation completes even if the address is not word aligned.
nanoMIPS, not available in NMS
101001 |
rt |
rs |
s[8] |
count3 |
1 |
1 |
01 |
s[7:0] |
6 |
5 |
5 |
1 |
3 |
1 |
1 |
2 |
8 |
offset = sign_extend(s, from_nbits=9) count = 8 if count3 == 0 else count3
if C0.Config5.NMS == 1: raise exception('RI')
i = 0
while i != count:
    this_rt = ( 0 if rt == 0
                else rt + i if rt + i < 32
                else rt + i - 16 )
    this_offset = offset + (i<<2)
    va = effective_address(GPR[rs], this_offset, 'Store')
    data = zero_extend(GPR[this_rt], from_nbits=32)
    write_memory_at_va(data, va, nbytes=4, unaligned_support='always')
    i += 1
UASWM stores count words from sequentially numbered registers to sequential memory addresses which are potentially unaligned. After storing $31, the sequence of registers continues from $16. If rt=0, then $0 is stored for all count steps of the instruction. See SWM for example encodings of the register list.
UASWM will not cause an Address Error exception for unaligned addresses.
If a TLB exception or interrupt occurs during the execution of this instruction, a subset of the required memory updates may have occurred. A full restart of the instruction will be performed on return from
the exception.
An unaligned load/store instruction may be implemented using more than one memory transaction. It is possible for a subset of these memory transactions to have completed and then for a TLB exception to occur on a remaining transaction. It is also possible that memory could be modified by another thread or device in between the completion of the memory transactions. This behavior is equivalent to what might occur if the unaligned load/store was carried out in software using a series of separate aligned instructions, for instance using LWL/LWR on a pre-R6 MIPS™ core. Software should take equivalent steps to accommodate this lack of guaranteed atomicity as it would for the multiple instruction case.
Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.
WAIT code |
nanoMIPS |
Wait |
Wait. Enter wait state.
nanoMIPS
001000 |
code |
11 |
00001 |
101 |
111 |
111 |
6 |
10 |
2 |
5 |
3 |
3 |
3 |
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
CPU.in_wait_state = True
Coprocessor Unusable.
WRPGPR rt, rs |
nanoMIPS. Requires CP0 privilege. |
Write Previous GPR |
Write Previous GPR. Write the value of register $rs from the current shadow register set (SRSCtl.CSS) to register $rt in the previous shadow register set (SRSCtl.PSS). If shadow register sets are not implemented, just copy the value from register $rs to register $rt.
nanoMIPS. Requires CP0 privilege.
001000 |
rt |
rs |
11 |
11000 |
101 |
111 |
111 |
6 |
5 |
5 |
2 |
5 |
3 |
3 |
3 |
if not IsCoprocessor0Enabled(): raise coprocessor_exception(0)
if C0.SRSCtl.HSS > 0:
    SRS[C0.SRSCtl.PSS][rt] = GPR[rs]
else:
    GPR[rt] = GPR[rs]
Coprocessor Unusable.
WSBH rt, rs |
Assembly alias, not available in NMS |
Word Swap Byte Half |
Word Swap Byte Half. Swap the bytes within both halves of the word value in register $rs, and write the result to register $rt.
Assembly alias, not available in NMS
ROTX rt, rs, 8, 24
The assembly alias WSBH is provided for compatibility with MIPS32™. Its behavior is equivalent to the new assembly alias BYTEREVH, whose name is chosen to fit consistently with the naming of other reversing instructions in nanoMIPS™.
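For reference, the word transformation performed by this alias can be sketched in Python; the masks are the usual halfword byte-swap idiom, shown here as an illustration rather than taken from this manual:

def wsbh(x):
    # Swap the two bytes within each 16-bit half of a 32-bit word.
    x &= 0xFFFFFFFF
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)

# Example: 0x11223344 -> 0x22114433
assert wsbh(0x11223344) == 0x22114433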
XOR rd, rs, rt |
nanoMIPS |
XOR |
XOR. Compute logical XOR of registers $rs and $rt, placing the result in register $rd.
nanoMIPS
001000 |
rt |
rs |
rd |
x |
1100010 |
000 |
6 |
5 |
5 |
5 |
1 |
7 |
3 |
010100 |
rt3 |
rs3 |
01 |
0 |
0 |
6 |
3 |
3 |
2 |
1 |
1 |
rt = decode_gpr(rt3, 'gpr3') rs = decode_gpr(rs3, 'gpr3') rd = rt
GPR[rd] = GPR[rs] ^ GPR[rt]
None.
XORI rt, rs, u |
nanoMIPS |
XOR Immediate |
XOR Immediate. Compute logical XOR of register $rs with immediate u, placing the result in register $rt.
nanoMIPS
100000 |
rt |
rs |
0001 |
u |
6 |
5 |
5 |
4 |
12 |
GPR[rt] = GPR[rs] ^ u
None.