Assembly:

ADD rd, rs, rt

nanoMIPS, not available in NMS

Add

Purpose:

Add. Add two 32-bit signed integers in registers $rs and $rt, placing the 32-bit result in register $rd, and trapping on overflow.

Availability:

nanoMIPS, not available in NMS

Format:

001000 | rt | rs | rd | x | 0100010 | 000
6      | 5  | 5  | 5  | 1 | 7       | 3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
sum = GPR[rs] + GPR[rt]
if overflows(sum, nbits=32):
    raise exception('OV')
GPR[rd] = sign_extend(sum, from_nbits=32)
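The trapping behavior can be sketched in ordinary Python (the `sign_extend` helper mirrors the pseudocode above; `add_trapping` is an illustrative name, not part of the ISA):

```python
def sign_extend(value, from_nbits):
    # Interpret the low `from_nbits` bits of value as a signed integer.
    mask = (1 << from_nbits) - 1
    value &= mask
    sign = 1 << (from_nbits - 1)
    return (value ^ sign) - sign

def add_trapping(rs, rt):
    # 32-bit signed add; raises where the ADD instruction would take
    # an Integer Overflow exception.
    total = sign_extend(rs, 32) + sign_extend(rt, 32)
    if not (-2**31 <= total < 2**31):
        raise OverflowError('OV')
    return total
```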

Exceptions:

Overflow.

Assembly:

ADDIU rt, rs, imm

nanoMIPS, availability varies by format.

Add Immediate (Untrapped)

Purpose:

Add Immediate (Untrapped). Add immediate value imm to the 32-bit integer value in register $rs, placing the 32-bit result in register $rt, and not trapping on overflow.

Availability:

nanoMIPS, availability varies by format.

Format:

ADDIU[32], with rt!=0

000000 | rt | rs | u
6      | 5  | 5  | 16

imm = u

ADDIU[48], not available in NMS

011000 | rt | 00001 | s[15:0] | s[31:16]
6      | 5  | 5     | 16      | 16

if C0.Config5.NMS == 1:
    raise exception('RI')
imm = sign_extend(s, from_nbits=32)
rs = rt

ADDIU[GP48], not available in NMS, not available in P64 mode

011000 | rt | 00010 | s[15:0] | s[31:16]
6      | 5  | 5     | 16      | 16

if C0.Config5.NMS == 1:
    raise exception('RI')
if pointers_are_64_bits():
    raise behaves_like('DADDIU[GP48]')
imm = sign_extend(s, from_nbits=32)
rs = 28

ADDIU[GP.B], not available in P64 mode

010001 | rt | 011 | u
6      | 5  | 3   | 18

if pointers_are_64_bits():
    raise behaves_like('DADDIU[GP.B]')
imm = u
rs = 28

ADDIU[GP.W], not available in P64 mode

010000 | rt | u[20:2] | 00
6      | 5  | 19      | 2

if pointers_are_64_bits():
    raise behaves_like('DADDIU[GP.W]')
imm = u
rs = 28

ADDIU[R1.SP], not available in P64 mode

011100 | rt3 | 1 | u[7:2]
6      | 3   | 1 | 6

if pointers_are_64_bits():
    raise behaves_like('DADDIU[R1.SP]')
rt = decode_gpr(rt3, 'gpr3')
rs = 29
imm = u

ADDIU[R2]

100100 | rt3 | rs3 | 0 | u[4:2]
6      | 3   | 3   | 1 | 3

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
imm = u

ADDIU[RS5], with rt!=0

100100 | rt | s[3] | 1 | s[2:0]
6      | 5  | 1    | 1 | 3

rs = rt
imm = sign_extend(s, from_nbits=4)

ADDIU[RS5] with rt=0 is used to provide a 16-bit NOP instruction.

ADDIU[NEG]

100000 | rt | rs | 1000 | u
6      | 5  | 5  | 4    | 12

imm = -u

Operation:

sum = GPR[rs] + imm
GPR[rt] = sign_extend(sum, from_nbits=32)
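Unlike ADD, the sum here simply wraps. A minimal model of the untrapped 32-bit add (`addiu32` is an illustrative name):

```python
def addiu32(rs, imm):
    # Untrapped add: wrap modulo 2**32, then reinterpret as signed
    # (the sign_extend step of the pseudocode above).
    total = (rs + imm) & 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total
```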

Exceptions:

Reserved Instruction for ADDIU[48] and ADDIU[GP48] formats on NMS cores.

Assembly:

ADDIUPC rt, imm

nanoMIPS, availability varies by format.

Add Immediate (Untrapped) to PC

Purpose:

Add Immediate (Untrapped) to PC. Compute address by adding immediate value imm to the PC and placing the result in register $rt, not trapping on overflow.

Availability:

nanoMIPS, availability varies by format.

Format:

ADDIUPC[32], not available in P64 mode

000001 | rt | s[20:1] | s[21]
6      | 5  | 20      | 1

if pointers_are_64_bits():
    raise behaves_like('DADDIUPC[32]')
s = sign_extend(s[21] @ s[20:1] @ '0')
imm = s + 4

ADDIUPC[48], not available in NMS, not available in P64 mode

011000 | rt | 00011 | s[15:0] | s[31:16]
6      | 5  | 5     | 16      | 16

if C0.Config5.NMS == 1:
    raise exception('RI')
if pointers_are_64_bits():
    raise behaves_like('DADDIUPC[48]')
s = sign_extend(s[31:16] @ s[15:0])
imm = s + 6

Operation:

GPR[rt] = effective_address(CPU.next_pc, s)

Exceptions:

Reserved Instruction for ADDIUPC[48] format on NMS cores.

Assembly:

ADDU dst, src1, src2

nanoMIPS, availability varies by format.

Add (Untrapped)

Purpose:

Add (Untrapped). Add two 32-bit integers in registers $src1 and $src2, placing the 32-bit result in register $dst, and not trapping on overflow.

Availability:

nanoMIPS, availability varies by format.

Format:

ADDU[32]

001000 | rt | rs | rd | x | 0101010 | 000
6      | 5  | 5  | 5  | 1 | 7       | 3

dst = rd
src1 = rs
src2 = rt
not_in_nms = False

ADDU[16]

101100 | rt3 | rs3 | rd3 | 0
6      | 3   | 3   | 3   | 1

dst = decode_gpr(rd3, 'gpr3')
src1 = decode_gpr(rs3, 'gpr3')
src2 = decode_gpr(rt3, 'gpr3')
not_in_nms = False

ADDU[4X4], not available in NMS

001111 | rt4[3] | 0 | rt4[2:0] | rs4[3] | 0 | rs4[2:0]
6      | 1      | 1 | 3        | 1      | 1 | 3

dst = decode_gpr(rt4, 'gpr4')
src1 = decode_gpr(rt4, 'gpr4')
src2 = decode_gpr(rs4, 'gpr4')
not_in_nms = True

Operation:

if not_in_nms and C0.Config5.NMS:
    raise exception('RI')
sum = GPR[src1] + GPR[src2]
GPR[dst] = sign_extend(sum, from_nbits=32)

Exceptions:

Reserved Instruction for ADDU[4X4] format on NMS cores.

Assembly:

ALIGN rd, rs, rt, bp

Assembly alias

Align

Purpose:

Align. Concatenate the 32-bit values in registers $rt and $rs, extract the word at the specified byte position bp, and place the result in register $rd.

Availability:

Assembly alias

Expansion:

bp != 0:
EXTW rd, rs, rt, (4-bp)<<3
bp == 0:
MOVE rd, rt
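The expansion above can be modeled directly (a sketch assuming 32-bit registers; `align32` is an illustrative name, not the EXTW encoding itself):

```python
def align32(rs, rt, bp):
    # Model of ALIGN rd, rs, rt, bp: a 32-bit window into the 64-bit
    # concatenation {rt, rs}, taken (4-bp)*8 bits up from the bottom.
    assert 0 <= bp <= 3
    if bp == 0:
        return rt & 0xFFFFFFFF          # the MOVE rd, rt case
    shift = (4 - bp) << 3               # the EXTW rd, rs, rt, (4-bp)<<3 case
    concat = ((rt & 0xFFFFFFFF) << 32) | (rs & 0xFFFFFFFF)
    return (concat >> shift) & 0xFFFFFFFF
```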

Assembly:

ALUIPC rt, %pcrel_hi(address)

nanoMIPS

Add aLigned Upper Immediate to PC

Purpose:

Add aLigned Upper Immediate to PC. Compute a 4KB-aligned PC-relative address by adding an upper 20-bit immediate value to NextPC, discarding the lower 12 bits, and placing the result in register $rt.

Availability:

nanoMIPS

Format:

111000 | rt | s[20:12] | s[30:21] | 1 | s[31]
6      | 5  | 9        | 10       | 1 | 1

offset = sign_extend(s, from_nbits=32)
address = effective_address(CPU.next_pc, offset) & ~0xfff

Operation:

GPR[rt] = address
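A minimal model of the address computation, assuming 32-bit addresses that wrap (`aluipc` is an illustrative name, and `s` is the already-assembled 32-bit immediate):

```python
def aluipc(next_pc, s):
    # Add the upper immediate to NextPC and clear the low 12 bits,
    # giving a 4KB-aligned PC-relative address.
    return ((next_pc + s) & ~0xFFF) & 0xFFFFFFFF
```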

Exceptions:

None.

Assembly:

AND rd, rs, rt

nanoMIPS

AND

Purpose:

AND. Compute logical AND of registers $rs and $rt, placing the result in register $rd.

Availability:

nanoMIPS

Format:

AND[32]

001000 | rt | rs | rd | x | 1001010 | 000
6      | 5  | 5  | 5  | 1 | 7       | 3

AND[16]

010100 | rt3 | rs3 | 10 | 0 | 0
6      | 3   | 3   | 2  | 1 | 1

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
rd = rt

Operation:

GPR[rd] = GPR[rs] & GPR[rt]

Exceptions:

None.

Assembly:

ANDI rt, rs, u

nanoMIPS

AND Immediate

Purpose:

AND Immediate. Compute logical AND of register $rs and immediate u, placing the result in register $rt.

Availability:

nanoMIPS

Format:

ANDI[32]

100000 | rt | rs | 0010 | u
6      | 5  | 5  | 4    | 12

ANDI[16]

111100 | rt3 | rs3 | eu
6      | 3   | 3   | 4

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
u = (0x00ff if eu == 12 else
     0xffff if eu == 13 else
     eu)

Operation:

GPR[rt] = GPR[rs] & u
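The special `eu` code points above reserve the two most common masks. As a decoder sketch (`decode_andi16_eu` is an illustrative name):

```python
def decode_andi16_eu(eu):
    # eu is the 4-bit immediate field of ANDI[16].
    if eu == 12:
        return 0x00ff   # byte mask
    if eu == 13:
        return 0xffff   # halfword mask
    return eu           # small literal masks: 0..11, 14, 15
```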

Exceptions:

None.

Assembly:

BALC address

nanoMIPS

Branch And Link, Compact

Purpose:

Branch And Link, Compact. Unconditional PC relative branch to address, placing return address in register $31.

Availability:

nanoMIPS

Format:

BALC[32]

001010 | 1 | s[24:1] | s[25]
6      | 1 | 24      | 1

offset = sign_extend(s, from_nbits=26)

BALC[16]

001110 | s[9:1] | s[10]
6      | 9      | 1

offset = sign_extend(s, from_nbits=11)

Operation:

address = effective_address(CPU.next_pc, offset)
GPR[31] = CPU.next_pc
CPU.next_pc = address

Exceptions:

None.

Assembly:

BALRSC rt, rs

nanoMIPS

Branch And Link Register Scaled, Compact

Purpose:

Branch And Link Register Scaled, Compact. Unconditional branch to address NextPC + 2*$rs, placing return address in register $rt.

Availability:

nanoMIPS

Format:

with rt!=0

010010 | rt | rs | 1000 | x
6      | 5  | 5  | 4    | 12

Operation:

address = effective_address(CPU.next_pc, offset=GPR[rs]<<1)
GPR[rt] = CPU.next_pc
CPU.next_pc = address

Exceptions:

None.

Assembly:

BBEQZC rt, bit, address

nanoMIPS, not available in NMS

Branch if Bit Equals Zero, Compact

Purpose:

Branch if Bit Equals Zero, Compact. PC relative branch to address if bit bit of register $rt is equal to zero.

Availability:

nanoMIPS, not available in NMS

Format:

110010 | rt | 001 | x | bit | s[10:1] | s[11]
6      | 5  | 3   | 1 | 6   | 10      | 1

offset = sign_extend(s, from_nbits=12)

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
if bit >= 32 and not Are64BitOperationsEnabled():
    raise exception('RI')
address = effective_address(CPU.next_pc, offset)
testbit = (GPR[rt] >> bit) & 1
if testbit == 0:
    CPU.next_pc = address

Exceptions:

Reserved Instruction on NMS cores.

Assembly:

BBNEZC rt, bit, address

nanoMIPS, not available in NMS

Branch if Bit Not Equal to Zero, Compact

Purpose:

Branch if Bit Not Equal to Zero, Compact. PC relative branch to address if bit bit of register $rt is not equal to zero.

Availability:

nanoMIPS, not available in NMS

Format:

110010 | rt | 101 | x | bit | s[10:1] | s[11]
6      | 5  | 3   | 1 | 6   | 10      | 1

offset = sign_extend(s, from_nbits=12)

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
if bit >= 32 and not Are64BitOperationsEnabled():
    raise exception('RI')
address = effective_address(CPU.next_pc, offset)
testbit = (GPR[rt] >> bit) & 1
if testbit == 1:
    CPU.next_pc = address

Exceptions:

Reserved Instruction on NMS cores.

Assembly:

BC address

nanoMIPS

Branch, Compact

Purpose:

Branch, Compact. Unconditional PC relative branch to address.

Availability:

nanoMIPS

Format:

BC[32]

001010 | 0 | s[24:1] | s[25]
6      | 1 | 24      | 1

offset = sign_extend(s, from_nbits=26)
address = effective_address(CPU.next_pc, offset)

BC[16]

000110 | s[9:1] | s[10]
6      | 9      | 1

offset = sign_extend(s, from_nbits=11)
address = effective_address(CPU.next_pc, offset)

Operation:

CPU.next_pc = address

Exceptions:

None.

Assembly:

BEQC rs, rt, address

nanoMIPS, availability varies by format.

Branch if Equal, Compact

Purpose:

Branch if Equal, Compact. PC relative branch to address if registers $rs and $rt are equal.

Availability:

nanoMIPS, availability varies by format.

Format:

BEQC[32]

100010 | rt | rs | 00 | s[13:1] | s[14]
6      | 5  | 5  | 2  | 13      | 1

offset = sign_extend(s, from_nbits=15)
address = effective_address(CPU.next_pc, offset)
not_in_nms = False

BEQC[16], not available in NMS, with rs3<rt3 && u!=0

110110 | rt3 | rs3 | u[4:1]
6      | 3   | 3   | 4

rs = decode_gpr(rs3, 'gpr3')
rt = decode_gpr(rt3, 'gpr3')
offset = u
address = effective_address(CPU.next_pc, offset)
not_in_nms = True

Operation:

if not_in_nms and C0.Config5.NMS == 1:
    raise exception('RI')
if GPR[rs] == GPR[rt]:
    CPU.next_pc = address

Exceptions:

Reserved Instruction for BEQC[16] format on NMS cores.

Assembly:

BEQIC rt, u, address

nanoMIPS

Branch if Equal to Immediate, Compact

Purpose:

Branch if Equal to Immediate, Compact. PC relative branch to address if value of register $rt is equal to immediate value u.

Availability:

nanoMIPS

Format:

110010 | rt | 000 | u | s[10:1] | s[11]
6      | 5  | 3   | 7 | 10      | 1

Operation:

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] == u:
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BEQZC rt, address # when rt and address are in range

nanoMIPS

Branch if Equal to Zero, Compact

Purpose:

Branch if Equal to Zero, Compact. PC relative branch to address if register $rt equals zero.

Availability:

nanoMIPS

Format:

BEQZC[16]

100110 | rt3 | s[6:1] | s[7]
6      | 3   | 6      | 1

Operation:

rt = decode_gpr(rt3, 'gpr3')
offset = sign_extend(s, from_nbits=8)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] == 0:
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BGEC rs, rt, address

nanoMIPS

Branch if Greater than or Equal, Compact

Purpose:

Branch if Greater than or Equal, Compact. PC relative branch to address if register $rs is greater than or equal to register $rt.

Availability:

nanoMIPS

Format:

100010 | rt | rs | 10 | s[13:1] | s[14]
6      | 5  | 5  | 2  | 13      | 1

Operation:

offset = sign_extend(s, from_nbits=15)
address = effective_address(CPU.next_pc, offset)
if GPR[rs] >= GPR[rt]:
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BGEIC rt, u, address

nanoMIPS

Branch if Greater than or Equal to Immediate, Compact

Purpose:

Branch if Greater than or Equal to Immediate, Compact. PC relative branch to address if signed register value $rt is greater than or equal to immediate u.

Availability:

nanoMIPS

Format:

110010 | rt | 010 | u | s[10:1] | s[11]
6      | 5  | 3   | 7 | 10      | 1

Operation:

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] >= u:
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BGEIUC rt, u, address

nanoMIPS

Branch if Greater than or Equal to Immediate Unsigned, Compact

Purpose:

Branch if Greater than or Equal to Immediate Unsigned, Compact. PC relative branch to address if unsigned register $rt is greater than or equal to immediate u.

Availability:

nanoMIPS

Format:

110010 | rt | 011 | u | s[10:1] | s[11]
6      | 5  | 3   | 7 | 10      | 1

Operation:

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if unsigned(GPR[rt]) >= u:
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BGEUC rs, rt, address

nanoMIPS

Branch if Greater than or Equal Unsigned, Compact

Purpose:

Branch if Greater than or Equal Unsigned, Compact. PC relative branch to address if unsigned register $rs is greater than or equal to unsigned register $rt.

Availability:

nanoMIPS

Format:

100010 | rt | rs | 11 | s[13:1] | s[14]
6      | 5  | 5  | 2  | 13      | 1

Operation:

offset = sign_extend(s, from_nbits=15)
address = effective_address(CPU.next_pc, offset)
if unsigned(GPR[rs]) >= unsigned(GPR[rt]):
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BITREVB rt, rs

Assembly alias, not available in NMS

Bit Reverse in Bytes

Purpose:

Bit Reverse in Bytes. Reverse bits in each byte of 32-bit value in register $rs, placing the result in register $rt.

Availability:

Assembly alias, not available in NMS

Expansion:

ROTX rt, rs, 7, 8, 1
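The effect of this expansion, reversing the bit order within each byte, can be sketched as follows (`bitrevb` is an illustrative name modeling the result, not the ROTX implementation):

```python
def bitrevb(x):
    # Reverse the bit order within each byte of a 32-bit value.
    out = 0
    for byte in range(4):
        b = (x >> (8 * byte)) & 0xFF
        rev = int('{:08b}'.format(b)[::-1], 2)  # reverse the 8-bit string
        out |= rev << (8 * byte)
    return out
```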

Assembly:

BITREVH rt, rs

Assembly alias, not available in NMS

Bit Reverse in Halfs

Purpose:

Bit Reverse in Halfs. Reverse bits in each halfword of 32-bit value in register $rs, placing the result in register $rt.

Availability:

Assembly alias, not available in NMS

Expansion:

ROTX rt, rs, 15, 16

Assembly:

BITREVW rt, rs

Assembly alias, not available in NMS

Bit Reverse in Word

Purpose:

Bit Reverse in Word. Reverse all bits in 32-bit register $rs, placing the result in register $rt.

Availability:

Assembly alias, not available in NMS

Expansion:

ROTX rt, rs, 31, 0

Assembly:

BITSWAP rt, rs

Assembly alias, not available in NMS

Bitswap

Purpose:

Bitswap. Reverse bits in each byte of 32-bit value in register $rs, placing the result in register $rt.

Availability:

Assembly alias, not available in NMS

Expansion:

ROTX rt, rs, 7, 8, 1

The assembly alias BITSWAP is provided for compatibility with MIPS32™. Its behavior is equivalent to the new assembly alias BITREVB, whose name is chosen to fit consistently with the naming of other reversing instructions in nanoMIPS™.

Assembly:

BLTC rs, rt, address

nanoMIPS

Branch if Less Than, Compact

Purpose:

Branch if Less Than, Compact. PC relative branch to address if signed register $rs is less than signed register $rt.

Availability:

nanoMIPS

Format:

101010 | rt | rs | 10 | s[13:1] | s[14]
6      | 5  | 5  | 2  | 13      | 1

Operation:

offset = sign_extend(s, from_nbits=15)
address = effective_address(CPU.next_pc, offset)
if GPR[rs] < GPR[rt]:
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BLTIC rt, u, address

nanoMIPS

Branch if Less Than Immediate, Compact

Purpose:

Branch if Less Than Immediate, Compact. PC relative branch to address if signed register $rt is less than immediate u.

Availability:

nanoMIPS

Format:

110010 | rt | 110 | u | s[10:1] | s[11]
6      | 5  | 3   | 7 | 10      | 1

Operation:

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] < u:
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BLTIUC rt, u, address

nanoMIPS

Branch if Less Than Immediate Unsigned, Compact

Purpose:

Branch if Less Than Immediate Unsigned, Compact. PC relative branch to address if unsigned register $rt is less than immediate u.

Availability:

nanoMIPS

Format:

110010 | rt | 111 | u | s[10:1] | s[11]
6      | 5  | 3   | 7 | 10      | 1

Operation:

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if unsigned(GPR[rt]) < u:
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BLTUC rs, rt, address

nanoMIPS

Branch if Less Than Unsigned, Compact

Purpose:

Branch if Less Than Unsigned, Compact. PC relative branch to address if unsigned register $rs is less than unsigned register $rt.

Availability:

nanoMIPS

Format:

101010 | rt | rs | 11 | s[13:1] | s[14]
6      | 5  | 5  | 2  | 13      | 1

Operation:

offset = sign_extend(s, from_nbits=15)
address = effective_address(CPU.next_pc, offset)
if unsigned(GPR[rs]) < unsigned(GPR[rt]):
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BNEC rs, rt, address

nanoMIPS, availability varies by format.

Branch Not Equal, Compact

Purpose:

Branch Not Equal, Compact. PC relative branch to address if register $rs is not equal to register $rt.

Availability:

nanoMIPS, availability varies by format.

Format:

BNEC[32]

101010 | rt | rs | 00 | s[13:1] | s[14]
6      | 5  | 5  | 2  | 13      | 1

offset = sign_extend(s, from_nbits=15)

BNEC[16], not available in NMS, with rs3>=rt3 && u!=0

110110 | rt3 | rs3 | u[4:1]
6      | 3   | 3   | 4

if C0.Config5.NMS == 1:
    raise exception('RI')
rs = decode_gpr(rs3, 'gpr3')
rt = decode_gpr(rt3, 'gpr3')
offset = u

Operation:

address = effective_address(CPU.next_pc, offset)
if GPR[rs] != GPR[rt]:
    CPU.next_pc = address

Exceptions:

Reserved Instruction for BNEC[16] format on NMS cores.

Assembly:

BNEIC rt, u, address

nanoMIPS

Branch if Not Equal to Immediate, Compact

Purpose:

Branch if Not Equal to Immediate, Compact. PC relative branch to address if register $rt is not equal to immediate u.

Availability:

nanoMIPS

Format:

110010 | rt | 100 | u | s[10:1] | s[11]
6      | 5  | 3   | 7 | 10      | 1

Operation:

offset = sign_extend(s, from_nbits=12)
address = effective_address(CPU.next_pc, offset)
if GPR[rt] != u:
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BNEZC rt, address

nanoMIPS

Branch if Not Equal to Zero, Compact

Purpose:

Branch if Not Equal to Zero, Compact. PC relative branch to address if register $rt is not equal to zero.

Availability:

nanoMIPS

Format:

BNEZC[16]

101110 | rt3 | s[6:1] | s[7]
6      | 3   | 6      | 1

rt = decode_gpr(rt3, 'gpr3')
offset = sign_extend(s, from_nbits=8)

Operation:

address = effective_address(CPU.next_pc, offset)
if GPR[rt] != 0:
    CPU.next_pc = address

Exceptions:

None.

Assembly:

BREAK code

nanoMIPS

Break

Purpose:

Break. Cause a Breakpoint exception.

Availability:

nanoMIPS

Format:

BREAK[32]

000000 | 00000 | 10 | code
6      | 5     | 2  | 19

BREAK[16]

000100 | 00000 | 10 | code
6      | 5     | 2  | 3

Operation:

raise exception('BP')

Exceptions:

Breakpoint.

Assembly:

BRSC rs

nanoMIPS

Branch Register Scaled, Compact

Purpose:

Branch Register Scaled, Compact. Unconditional branch to address NextPC + 2*$rs.

Availability:

nanoMIPS

Format:

010010 | 00000 | rs | 1000 | x
6      | 5     | 5  | 4    | 12

Operation:

address = effective_address(CPU.next_pc, offset=GPR[rs]<<1)
CPU.next_pc = address

Exceptions:

None.

Assembly:

BYTEREVH rt, rs

Assembly alias, not available in NMS

Byte Reverse in Halfs

Purpose:

Byte Reverse in Halfs. Reverse bytes in each halfword of 32-bit value in register $rs, placing the result in register $rt.

Availability:

Assembly alias, not available in NMS

Expansion:

ROTX rt, rs, 8, 24
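The effect, swapping the two bytes within each 16-bit half, can be sketched as follows (`byterevh` is an illustrative name modeling the result, not the ROTX implementation):

```python
def byterevh(x):
    # Swap the two bytes within each halfword of a 32-bit value.
    return (((x & 0x00FF00FF) << 8) | ((x >> 8) & 0x00FF00FF)) & 0xFFFFFFFF
```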

Assembly:

BYTEREVW rt, rs

Assembly alias, not available in NMS

Byte Reverse in Word

Purpose:

Byte Reverse in Word. Reverse each byte in word value in register $rs, placing the result in register $rt.

Availability:

Assembly alias, not available in NMS

Expansion:

ROTX rt, rs, 24, 8

Assembly:

CACHE  op, offset(rs)

nanoMIPS. Requires CP0 privilege, availability varies by format.

Cache operation/Cache operation using EVA addressing

CACHEE op, offset(rs)

nanoMIPS. Requires CP0 privilege, availability varies by format.

Cache operation/Cache operation using EVA addressing

Purpose:

Cache operation/Cache operation using EVA addressing. Perform cache operation of type op at address $rs + offset (register plus immediate). For CACHEE, translate the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS. Requires CP0 privilege, availability varies by format.

Format:

CACHE

101001 | op | rs | s[8] | 0111 | 0 | 01 | s[7:0]
6      | 5  | 5  | 1    | 4    | 1 | 2  | 8

offset = sign_extend(s, from_nbits=9)
is_eva = False

CACHEE, present when Config5.EVA=1.

101001 | op | rs | s[8] | 0111 | 0 | 10 | s[7:0]
6      | 5  | 5  | 1    | 4    | 1 | 2  | 8

offset = sign_extend(s, from_nbits=9)
is_eva = True

Operation:

# NMS core without caches gives RI (not Coprocessor Unusable) exception.
if (C0.Config5.NMS and C0.Config1.DL == 0 and C0.Config1.IL == 0
                   and C0.Config2.SL == 0 and C0.Config2.TL == 0
                   and C0.Config5.L2C == 0):
    raise exception('RI')
if is_eva and not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Load', eva=is_eva)
# Behavior for index cacheops is unpredictable if address is not unmapped.
if op <= 11:  # Index cacheop
    translation_type, description, result_args = decode_va(va, eva=is_eva)
    if translation_type != 'unmapped':
        raise UNPREDICTABLE('Index cacheop unpredictable with VA not unmapped')
pa, cca = va2pa(va, 'Cacheop', eva=is_eva)
if cca == 2 or cca == 7:
    if C0.Config.AT >= 2:
        pass  # Cacheop to uncached address is a nop in R6
    else:
        raise UNPREDICTABLE('Cacheop to uncached address is unpredictable')
else:
    cacheop(va, pa, op)

The CACHE/CACHEE instructions perform the cache operation specified by argument 'op' on the register plus immediate address $rs + offset. For CACHEE, the virtual address is translated as though the core is in user mode, although it is actually in kernel mode.

The 'op' argument is a 5-bit value specifying one of the following possible cache operations, which are described in more detail below:

op | Operation                          | Availability
0  | ICache Index Invalidate            | Required (if ICache present)
1  | DCache Index Writeback Invalidate  | Required (if DCache present)
2  | TCache Index Writeback Invalidate  | Required (if TCache present)
3  | SCache Index Writeback Invalidate  | Required (if SCache present)
4  | ICache Index Load Tag              | Recommended (if ICache present)
5  | DCache Index Load Tag              | Recommended (if DCache present)
6  | TCache Index Load Tag              | Recommended (if TCache present)
7  | SCache Index Load Tag              | Recommended (if SCache present)
8  | ICache Index Store Tag             | Required (if ICache present)
9  | DCache Index Store Tag             | Required (if DCache present)
10 | TCache Index Store Tag             | Required (if TCache present)
11 | SCache Index Store Tag             | Required (if SCache present)
12 | ICache Implementation Dependent Op | Optional (if ICache present)
13 | DCache Implementation Dependent Op | Optional (if DCache present)
14 | TCache Implementation Dependent Op | Optional (if TCache present)
15 | SCache Implementation Dependent Op | Optional (if SCache present)
16 | ICache Hit Invalidate              | Required (if ICache present)
17 | DCache Hit Invalidate              | Optional (if DCache present)
18 | TCache Hit Invalidate              | Optional (if TCache present)
19 | SCache Hit Invalidate              | Optional (if SCache present)
20 | ICache Fill                        | Recommended (if ICache present)
21 | DCache Hit Writeback Invalidate    | Recommended (if DCache present)
22 | TCache Hit Writeback Invalidate    | Recommended (if TCache present)
23 | SCache Hit Writeback Invalidate    | Recommended (if SCache present)
24 | Unused                             |
25 | DCache Hit Writeback               | Recommended (if DCache present)
26 | TCache Hit Writeback               | Recommended (if TCache present)
27 | SCache Hit Writeback               | Recommended (if SCache present)
28 | ICache Fetch and Lock              | Recommended (if ICache present)
29 | DCache Fetch and Lock              | Recommended (if DCache present)
30 | Unused                             |
31 | Unused                             |

Index cacheops (those with op <= 11 and optionally the implementation dependent cases 12 <= op <= 15) are operations where the input address is treated as an index into the target cache array. The rules for constructing the index are given in the cacheop() function pseudocode.

'Hit' cacheops are operations where the input address is treated as a virtual memory address. The operation will target the cache line containing data for that virtual address, if it is present in the cache.

The operations listed above behave as follows:

Index Writeback Invalidate / Index Invalidate: If the cache line at the specified index is valid and dirty, write the line back to the memory address specified by the cache tag. Whether or not the line was dirty, set the state of the cache line to invalid. For a write-through cache, the writeback step is not required and this is effectively a Cache Index Invalidate operation. This cache operation is required and may be used by software to invalidate the entire data cache by stepping through all indices. Note that the Index Store Tag operation must be used to initialize the cache at power up.

Index Load Tag: Read the tag for the cache line at the specified index into the TagLo and TagHi Coprocessor 0 registers. If the DataLo and DataHi registers are implemented, also read the data corresponding to the byte index into the DataLo and DataHi registers. This operation must not cause a Cache Error Exception. The granularity and alignment of the data read into the DataLo and DataHi registers is implementation-dependent, but is typically the result of an aligned access to the cache, ignoring the appropriate low-order bits of the byte index.

Index Store Tag: Write the tag for the cache line at the specified index from the TagLo and TagHi Coprocessor 0 registers. This operation must not cause a Cache Error Exception. This required encoding may be used by software to initialize the entire instruction or data caches by stepping through all valid indices. Doing so requires that the TagLo and TagHi registers associated with the cache be initialized to zero first.

Hit Invalidate: If the cache block contains the specified address, set the state of the cache block to invalid. This required encoding may be used by software to invalidate a range of addresses from the instruction cache by stepping through the address range by the line size of the cache. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.

Hit Writeback Invalidate: If the cache block contains the specified address: if the cache line is valid and dirty, write the line back to the memory address specified by the cache tag. Whether or not the line was dirty, set the state of the cache line to invalid. For a write-through cache, the writeback step is not required and this is effectively a Cache Hit Invalidate operation. This cache operation is required and may be used by software to invalidate a range of addresses from the data cache by stepping through the address range by the line size of the cache. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.

Hit Writeback: If the cache block contains the specified address and it is valid and dirty, write the contents back to memory. After the operation is completed, leave the state of the line valid, but clear the dirty state. For a write-through cache, this operation may be treated as a nop. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.

Fetch and Lock: If the cache does not contain the specified virtual address, fill it from memory, performing a writeback if required. Set the state to valid and locked. The way selected on a fill from memory is implementation dependent. The lock state may be cleared by executing an Index Invalidate, Index Writeback Invalidate, Hit Invalidate, or Hit Writeback Invalidate operation to the locked line, or via an Index Store Tag operation to the line that clears the lock bit. It is implementation dependent whether a locked line is displaced as the result of an external invalidate or intervention that hits on the locked line. Software must not depend on the locked line remaining in the cache if an external invalidate or intervention would invalidate the line if it were not locked. It is implementation dependent whether a Fetch and Lock operation affects more than one line. For example, more than one line around the referenced address may be fetched and locked. It is recommended that only the single line containing the referenced address be affected.

It is implementation dependent whether the input address for an Index cacheop is converted into a physical address by the MMU, so to avoid the possibility of generating a TLB exception, the index value should always be converted to an unmapped address (such as a kseg0 address, by ORing the index with 0x80000000) before being used by the cache instruction. For example, the following code sequence performs a data cache Index Store Tag operation using the index passed in GPR a0:

        li      a1, 0x80000000       /* Base of kseg0 segment */
        or      a0, a0, a1           /* Convert index to kseg0 address */
        cache   DCIndexStTag, 0(a0)  /* Perform the index store tag operation */

Some CACHE/CACHEE operations may result in a Cache Error exception. For example, if a Writeback operation detects a cache or bus error during the processing of the operation, that error is reported via a Cache Error exception. Also, a Bus Error Exception may occur if a bus operation invoked by this instruction is terminated in an error. However, cache error exceptions must not be triggered by an Index Load Tag or Index Store Tag operation, as these operations are used for initialization and diagnostic purposes.

It is implementation dependent whether a data watch is triggered by a cache instruction whose address matches the Watch register address match conditions. The preferred implementation is not to match on the CACHE/CACHEE instructions.

The operation of the instruction is UNPREDICTABLE if the cache line that contains the CACHE instruction is the target of an invalidate or a writeback invalidate operation.

If this instruction is used to lock all ways of a cache at a specific cache index, the behavior of that cache to subsequent cache misses to that cache index is UNDEFINED.

The effective address may be arbitrarily aligned. The CACHE/CACHEE instructions never cause an Address Error Exception due to a non-aligned address.

The CACHE instruction and the memory transactions which are sourced by the CACHE instruction, such as cache refill or cache writeback, obey the ordering and completion rules of the SYNC instruction.

Any use of this instruction that can cause cacheline writebacks should be followed by a subsequent SYNC instruction to avoid hazards where the writeback data is not yet visible at the next level of the memory hierarchy.

For multiprocessor implementations that maintain coherent caches, some of the Hit type operations may optionally affect all coherent caches within the implementation. In this case, if the effective address uses a coherent Cache Coherency Attribute (CCA), then the operation is globalized, meaning it is broadcast to all of the coherent caches within the system. If the effective address does not use one of the coherent CCAs, there is no broadcast of the operation. If multiple levels of caches are to be affected by one CACHE instruction, all of the affected cache levels must be processed in the same manner - either all affected cache levels use the globalized behavior or all affected cache levels use the non-globalized behavior.

Exceptions:

Address Error. Bus Error. Cache Error. Coprocessor Unusable. Reserved Instruction on NMS cores without caches. Reserved Instruction for CACHEE if EVA not implemented. TLB Invalid. TLB Refill.

Assembly:

CLO rt, rs

nanoMIPS, not available in NMS

Count Leading Ones

Purpose:

Count Leading Ones. Count leading ones in 32-bit register value $rs, placing the result in register $rt.

Availability:

nanoMIPS, not available in NMS

Format:

001000 | rt | rs | 0100101 | 100 | 111 | 111
6      | 5  | 5  | 7       | 3   | 3   | 3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
input = GPR[rs]
i = 0
while i < 32:
    if input[31 - i] != 1: break
    i += 1
GPR[rt] = i
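The counting loop above translates directly to Python (`clo32` is an illustrative name):

```python
def clo32(x):
    # Count leading one bits in a 32-bit value, scanning from bit 31 down.
    x &= 0xFFFFFFFF
    n = 0
    while n < 32 and (x >> (31 - n)) & 1:
        n += 1
    return n
```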

Exceptions:

Reserved Instruction on NMS cores.

Assembly:

CLZ rt, rs

nanoMIPS, not available in NMS

Count Leading Zeros

Purpose:

Count Leading Zeros. Count leading zeros in 32-bit register value $rs, placing the result in register $rt.

Availability:

nanoMIPS, not available in NMS

Format:

001000 | rt | rs | 0101101 | 100 | 111 | 111
6      | 5  | 5  | 7       | 3   | 3   | 3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
input = GPR[rs]
i = 0
while i < 32:
    if input[31 - i] != 0: break
    i += 1
GPR[rt] = i
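Symmetrically to CLO, the zero-counting loop can be sketched as follows (`clz32` is an illustrative name):

```python
def clz32(x):
    # Count leading zero bits in a 32-bit value, scanning from bit 31 down.
    x &= 0xFFFFFFFF
    n = 0
    while n < 32 and not (x >> (31 - n)) & 1:
        n += 1
    return n
```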

Exceptions:

Reserved Instruction on NMS cores.

Assembly:

CRC32B rt, rs

nanoMIPS. Optional, present when Config5.CRCP=1.

CRC32 Byte

Purpose:

CRC32 Byte. Generate a 32-bit CRC value $rt based on the reversed polynomial 0xEDB88320, using cumulative 32-bit CRC value $rt and right-justified byte-sized message $rs as inputs.

Availability:

nanoMIPS. Optional, present when Config5.CRCP=1.

Format:

001000 | rt | rs | x | 000 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

Operation:

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=8, poly=0xEDB88320)
GPR[rt] = sign_extend(result, from_nbits=32)

Exceptions:

Reserved Instruction on cores without CRC support.
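As a sketch of the `crc32()` helper used in the Operation pseudocode, a bit-serial reflected (LSB-first) CRC update can be written as follows. The function name `crc32_step` and the init/final-xor convention used in the cross-check are assumptions, not part of the instruction definition:

```python
def crc32_step(value, message, nbits, poly):
    """Fold an nbits-wide message into the running CRC using the
    reflected polynomial, one bit at a time."""
    crc = (value ^ (message & ((1 << nbits) - 1))) & 0xFFFFFFFF
    for _ in range(nbits):
        crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
    return crc

# Cross-check the byte-sized form against conventional CRC-32
# (poly 0xEDB88320): with init/final-xor of 0xFFFFFFFF, the ASCII
# string "123456789" yields the well-known check value 0xCBF43926.
crc = 0xFFFFFFFF
for b in b"123456789":
    crc = crc32_step(crc, b, 8, 0xEDB88320)
print(hex(crc ^ 0xFFFFFFFF))  # 0xcbf43926
```

The same step with poly=0x82F63B78 models the Castagnoli (CRC32C*) variants.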

Assembly:

CRC32CB rt, rs

nanoMIPS. Optional, present when Config5.CRCP=1.

CRC32 (Castagnoli) Byte

Purpose:

CRC32 (Castagnoli) Byte. Generate a 32-bit CRC value $rt based on the reversed polynomial 0x82F63B78, using cumulative 32-bit CRC value $rt and right-justified byte-sized message $rs as inputs.

Availability:

nanoMIPS. Optional, present when Config5.CRCP=1.

Format:

001000 | rt | rs | x | 100 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

Operation:

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=8, poly=0x82F63B78)
GPR[rt] = sign_extend(result, from_nbits=32)

Exceptions:

Reserved Instruction on cores without CRC support.

Assembly:

CRC32CH rt, rs

nanoMIPS. Optional, present when Config5.CRCP=1.

CRC32 (Castagnoli) Half

Purpose:

CRC32 (Castagnoli) Half. Generate a 32-bit CRC value $rt based on the reversed polynomial 0x82F63B78, using cumulative 32-bit CRC value $rt and right-justified halfword-sized message $rs as inputs.

Availability:

nanoMIPS. Optional, present when Config5.CRCP=1.

Format:

001000 | rt | rs | x | 101 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

Operation:

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=16, poly=0x82F63B78)
GPR[rt] = sign_extend(result, from_nbits=32)

Exceptions:

Reserved Instruction on cores without CRC support.

Assembly:

CRC32CW rt, rs

nanoMIPS. Optional, present when Config5.CRCP=1.

CRC32 (Castagnoli) Word

Purpose:

CRC32 (Castagnoli) Word. Generate a 32-bit CRC value $rt based on the reversed polynomial 0x82F63B78, using cumulative 32-bit CRC value $rt and right-justified word-sized message $rs as inputs.

Availability:

nanoMIPS. Optional, present when Config5.CRCP=1.

Format:

001000 | rt | rs | x | 110 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

Operation:

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=32, poly=0x82F63B78)
GPR[rt] = sign_extend(result, from_nbits=32)

Exceptions:

Reserved Instruction on cores without CRC support.

Assembly:

CRC32H rt, rs

nanoMIPS. Optional, present when Config5.CRCP=1.

CRC32 Half

Purpose:

CRC32 Half. Generate a 32-bit CRC value $rt based on the reversed polynomial 0xEDB88320, using cumulative 32-bit CRC value $rt and right-justified halfword-sized message $rs as inputs.

Availability:

nanoMIPS. Optional, present when Config5.CRCP=1.

Format:

001000 | rt | rs | x | 001 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

Operation:

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=16, poly=0xEDB88320)
GPR[rt] = sign_extend(result, from_nbits=32)

Exceptions:

Reserved Instruction on cores without CRC support.

Assembly:

CRC32W rt, rs

nanoMIPS. Optional, present when Config5.CRCP=1.

CRC32 Word

Purpose:

CRC32 Word. Generate a 32-bit CRC value $rt based on the reversed polynomial 0xEDB88320, using cumulative 32-bit CRC value $rt and right-justified word-sized message $rs as inputs.

Availability:

nanoMIPS. Optional, present when Config5.CRCP=1.

Format:

001000 | rt | rs | x | 010 | 1111 | 1 | 01 | 000
6 | 5 | 5 | 3 | 3 | 4 | 1 | 2 | 3

Operation:

if C0.Config5.CRCP == 0:
    raise exception('RI')
result = crc32(value=GPR[rt], message=GPR[rs], nbits=32, poly=0xEDB88320)
GPR[rt] = sign_extend(result, from_nbits=32)

Exceptions:

Reserved Instruction on cores without CRC support.

Assembly:

DERET

nanoMIPS. Optional, present when Debug implemented.

Debug Exception Return

Purpose:

Debug Exception Return. Return from a debug exception by jumping to the address in the DEPC register, and clearing Debug.DM.

Availability:

nanoMIPS. Optional, present when Debug implemented.

Format:

001000 | x | 11 | 10001 | 101 | 111 | 111
6 | 10 | 2 | 5 | 3 | 3 | 3

Operation:

if C0.Config1.EP == 0:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
if C0.Debug.DM == 0:
    raise exception('RI')
CPU.next_pc = sign_extend(Root.C0.DEPC)
C0.Debug.DM = 0
# If single stepping, forward progress is allowed on the next instruction.
CPU.debug_sst_progress_allowed = True
clear_execution_hazards()
clear_instruction_hazards()

The DERET instruction implements a software barrier that resolves all execution and instruction hazards. See the EHB and JALRC.HB instructions for an explanation of execution and instruction hazards

respectively, and also the SYNCI/SYNCIE instruction for additional information on resolving instruction hazards created by writing to the instruction stream.

The effects of the DERET barrier are seen starting with the fetch and decode of the instruction at the PC to which the DERET returns. This means, for instance, that if C0.DEPC is modified by an MTC0

instruction prior to a DERET, an EHB is required between the MTC0 and the DERET to ensure that the DERET uses the correct DEPC value.

The DERET instruction is only legal in debug mode and will give a Coprocessor Unusable exception when executed in user mode or a Reserved Instruction exception when executed in kernel mode.

Exceptions:

Coprocessor Unusable. Reserved Instruction when not in Debug Mode or on cores without Debug support.

Assembly:

DI rt

nanoMIPS. Requires CP0 privilege.

Disable Interrupts

Purpose:

Disable Interrupts. Disable interrupts by setting Status.IE to 0, and return the previous value of the Status register in register $rt.

Availability:

nanoMIPS. Requires CP0 privilege.

Format:

001000 | rt | x | 01 | 00011 | 101 | 111 | 111
6 | 5 | 5 | 2 | 5 | 3 | 3 | 3

Operation:

if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
GPR[rt] = C0.Status
C0.Status.IE = 0

Exceptions:

Coprocessor Unusable.

Assembly:

DIV rd, rs, rt

nanoMIPS

Divide

Purpose:

Divide. Divide signed word $rs by signed word $rt and place the result in $rd.

Availability:

nanoMIPS

Format:

001000 | rt | rs | rd | x | 0100011 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

Operation:

numerator = GPR[rs]
denominator = GPR[rt]
if denominator == 0:
    quotient, remainder = (UNKNOWN, UNKNOWN)
else:
    quotient, remainder = divide_integers(numerator, denominator)
GPR[rd] = sign_extend(quotient, from_nbits=32)

Exceptions:

None.
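A sketch of the `divide_integers()` helper named in the Operation pseudocode, assuming the usual MIPS convention that the quotient truncates toward zero and the remainder takes the sign of the dividend (note Python's own `//` floors instead, so `-7 // 2 == -4`):

```python
def divide_integers(numerator, denominator):
    """Signed divide with truncation toward zero; the remainder
    satisfies numerator == quotient * denominator + remainder."""
    q = abs(numerator) // abs(denominator)
    if (numerator < 0) != (denominator < 0):
        q = -q
    r = numerator - q * denominator
    return q, r

print(divide_integers(-7, 2))   # (-3, -1)
print(divide_integers(7, -2))   # (-3, 1)
```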

Assembly:

DIVU rd, rs, rt

nanoMIPS

Divide Unsigned

Purpose:

Divide Unsigned. Divide unsigned word $rs by unsigned word $rt and place the result in register $rd.

Availability:

nanoMIPS

Format:

001000 | rt | rs | rd | x | 0110011 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

Operation:

numerator = zero_extend(GPR[rs], from_nbits=32)
denominator = zero_extend(GPR[rt], from_nbits=32)
if denominator == 0:
    quotient, remainder = (UNKNOWN, UNKNOWN)
else:
    quotient, remainder = divide_integers(numerator, denominator)
GPR[rd] = sign_extend(quotient, from_nbits=32)

Exceptions:

None.

Assembly:

DVP rt

nanoMIPS. Optional, present when Config5.VP=1, otherwise NOP. Requires CP0 privilege.

Disable Virtual Processors

Purpose:

Disable Virtual Processors. Disable all virtual processors in a physical core other than the one that issued the instruction. Set VPControl.DIS to 1, and place the previous value of the VPControl CP0 register in register $rt.

Availability:

nanoMIPS. Optional, present when Config5.VP=1, otherwise NOP. Requires CP0 privilege.

Format:

001000 | rt | x | 00000 | 0 | 1110010 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

Operation:

if C0.Config5.VP == 0:
    # No operation when VP not implemented
    pass
else:
    if not IsCoprocessor0Enabled():
        raise coprocessor_exception(0)
    GPR[rt] = C0.VPControl
    C0.VPControl.DIS = 1
    disable_virtual_processors()

The DVP instruction is used to halt instruction fetch for all virtual processors in a VP core, other than the one which issued the DVP instruction. A possible use for DVP is to protect a critical sequence from interference by other threads on the same core.

All outstanding instructions for the affected virtual processors must be complete before the DVP itself is allowed to retire. Any outstanding events such as hardware instruction or data prefetch, or page-table

walks, must also be terminated.

Memory ordering equivalent to that provided by SYNC (stype=0) is guaranteed between subsequent

instructions on the virtual processor which issued the DVP, and instructions which have already graduated on the disabled virtual processors.

If a virtual processor is already disabled by another event, for instance, if it has executed a WAIT or a PAUSE instruction or has been halted by some external hardware event, then the disabled virtual

processor will not be re-enabled until both an EVP instruction has been executed on the controlling thread, and an event which would otherwise have woken the virtual processor (such as an interrupt for

a WAIT instruction or an interrupt or clearing of the LLBit for a PAUSE instruction) has also occurred.

The effect of a DVP instruction is undone by an EVP instruction, which causes execution to resume immediately (where applicable) on all other virtual processors. From the perspective of the disabled

virtual processors, after the EVP, execution continues as though the DVP had not occurred.

If an event occurs in between the DVP and EVP that renders state of a disabled virtual processor UNPREDICTABLE (such as power-gating), then the effect of EVP is UNPREDICTABLE.

A disabled virtual processor cannot be woken by an interrupt or a deferred exception, at least until execution is re-enabled by an EVP instruction on the controlling thread. The virtual processor that

executes the DVP, however, continues to be interruptible.

A DVP which is executed when VPControl.DIS=1 will return the current value of the VPControl register but otherwise will leave the other virtual processors in a disabled state. Software should only re-enable

virtual processors (via the EVP instruction) if it has verified from the VPControl value returned by the DVP that virtual processors were previously enabled. Performing this check allows DVP/EVP pairs to

be safely nested.
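The nesting discipline described above can be sketched in a few lines. This is a hypothetical software model, not architectural code: `dvp` and `evp_if_owner` stand in for the instructions, and `VPControl` models only the DIS bit:

```python
class VPControl:
    def __init__(self):
        self.DIS = 0  # 1 = other virtual processors disabled

vpc = VPControl()

def dvp(vpc):
    """Model of DVP: disable other VPs, return the previous VPControl.DIS."""
    prev = vpc.DIS
    vpc.DIS = 1
    return prev

def evp_if_owner(vpc, prev_dis):
    """Issue EVP only if our own DVP found the VPs previously enabled."""
    if prev_dis == 0:
        vpc.DIS = 0

outer = dvp(vpc)          # outer critical section: outer == 0
inner = dvp(vpc)          # nested section: inner == 1, so its EVP is a no-op
evp_if_owner(vpc, inner)  # leaves VPs disabled
evp_if_owner(vpc, outer)  # re-enables
print(vpc.DIS)            # 0
```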

In a core with multiple virtual processors, more than one virtual processor may execute a DVP simultaneously. The implementation should ensure that the selection of which virtual processor’s DVP successfully graduates is not biased towards any one virtual processor, in order to prevent the possibility

of live-lock.

The DVP instruction behaves like a NOP on cores which do not implement virtual processors (i.e. when Config5.VP=0). This behavior allows kernel code to enclose critical sequences within DVP/EVP blocks

without first checking whether it is running on a VP core. The encoding of the DVP instruction is equivalent to a SLTU instruction targeting $0, i.e. a NOP, which leads to the correct behavior on

non-VP cores with no additional hardware special casing.

Exceptions:

Coprocessor Unusable.

Assembly:

EHB

nanoMIPS

Execution hazard barrier

Purpose:

Execution hazard barrier. Clear all execution hazards before allowing any subsequent instructions to graduate.

Availability:

nanoMIPS

Format:

100000 | 00000 | x | 1100 | x | 0000 | 00011
6 | 5 | 5 | 4 | 3 | 4 | 5

Operation:

clear_execution_hazards()

The EHB instruction creates an execution hazard barrier, meaning that it ensures that subsequent

instructions will be aware of changes to CP0 state caused by prior instructions. Examples of instructions which change CP0 state and which need an execution hazard barrier to ensure that subsequent

instructions see those updates are MTC0, EI, DI, TLBR and CACHE/CACHEE.

In the absence of an execution hazard barrier, the CP0 register value used as input to an instruction may be out of date, since it may have been read before the write to the CP0 register by a prior instruction

has actually been committed.

An execution hazard barrier is sufficient to ensure that a fetched instruction is aware of all prior CP0 updates. However, it is not sufficient to ensure that the correct instruction is being fetched as a result

of those CP0 updates. Ensuring that the correct instruction is fetched requires an instruction hazard barrier, which is provided by the JALRC.HB instruction, or any of the exception return instructions

ERET/ERETNC or DERET.

Exceptions:

None.

Assembly:

EI rt

nanoMIPS. Requires CP0 privilege.

Enable Interrupts

Purpose:

Enable Interrupts.

Enable interrupts by setting Status.IE to 1, and return the previous

value of Status register in register $rt.

Availability:

nanoMIPS. Requires CP0 privilege.

Format:

001000 | rt | x | 01 | 01011 | 101 | 111 | 111
6 | 5 | 5 | 2 | 5 | 3 | 3 | 3

Operation:

if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
GPR[rt] = C0.Status
C0.Status.IE = 1

Exceptions:

Coprocessor Unusable.

Assembly:

ERET

nanoMIPS, availability varies by format.

Exception Return/Exception Return Not Clearing LLBit

ERETNC

nanoMIPS, availability varies by format.

Exception Return/Exception Return Not Clearing LLBit

Purpose:

Exception Return/Exception Return Not Clearing LLBit. Return from an exception: either by clearing Status.ERL if set and jumping to the address in ErrorEPC; otherwise by clearing Status.EXL,

jumping to the address in EPC, and updating the current Shadow Register Set to SRSCtl.PSS if required.

Availability:

nanoMIPS, availability varies by format.

Format:

ERET, requires CP0 privilege.

001000 | x | 0 | 11 | 11001 | 101 | 111 | 111
6 | 9 | 1 | 2 | 5 | 3 | 3 | 3

nc = False

ERETNC, present when Config5.LLB=1. Requires CP0 privilege.

001000 | x | 1 | 11 | 11001 | 101 | 111 | 111
6 | 9 | 1 | 2 | 5 | 3 | 3 | 3

nc = True

Operation:

if nc and C0.Config5.LLB == 0:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
if C0.Status.ERL == 1:
    effective_epc = sign_extend(C0.ErrorEPC)
    C0.Status.ERL = 0
else:
    effective_epc = sign_extend(C0.EPC)
    C0.Status.EXL = 0
    if C0.SRSCtl.HSS > 0 and C0.Status.BEV == 0:
        C0.SRSCtl.CSS = C0.SRSCtl.PSS
CPU.next_pc = effective_epc
# clear LLbit unless this is an ERETNC
if not nc:
   C0.LLAddr.LLB = 0
clear_execution_hazards()
clear_instruction_hazards()

The ERET/ERETNC instructions implement a software barrier that resolves all execution and instruction hazards. See the EHB and JALRC.HB instructions for an explanation of execution and instruction

hazards respectively, and also the SYNCI/SYNCIE instruction for additional information on resolving instruction hazards created by writing to the instruction stream.

The effects of the ERET/ERETNC barrier are seen starting with the fetch and decode of the instruction at the PC to which the ERET returns. This means, for instance, that if C0.EPC is modified by an MTC0

instruction prior to an ERET, an EHB is required between the MTC0 and the ERET to ensure that the ERET uses the correct EPC value.

Config5.LLB indicates support for the ERETNC instruction. It is always 1 for R6 cores, except for those implementing the nanoMIPS™ subset. In other words, ERETNC is required for nanoMIPS™ cores and

optional for NMS cores.

Exceptions:

Coprocessor Unusable. Reserved Instruction allowed for ERETNC on NMS cores.

Assembly:

EVP rt

nanoMIPS. Optional, present when Config5.VP=1, otherwise NOP. Requires CP0 privilege.

Enable Virtual Processors

Purpose:

Enable Virtual Processors. Enable all virtual processors in a physical core. Set

VPControl.DIS to 0, and place the previous value of the VPControl CP0 register in register $rt.

Availability:

nanoMIPS. Optional, present when Config5.VP=1, otherwise NOP. Requires CP0 privilege.

Format:

001000 | rt | x | 00000 | 1 | 1110010 | 000
6 | 5 | 5 | 5 | 1 | 7 | 3

Operation:

if C0.Config5.VP == 0:
    # No operation when VP not implemented
    pass
else:
    if not IsCoprocessor0Enabled():
        raise coprocessor_exception(0)
    GPR[rt] = C0.VPControl
    C0.VPControl.DIS = 0
    enable_virtual_processors()

The EVP instruction is used on VP cores to undo the effect of a DVP instruction, and the reader should refer to the DVP description for details regarding its usage.

The EVP instruction behaves like a NOP on cores which do not implement virtual processors (i.e. when Config5.VP=0). This behavior allows kernel code to enclose critical sequences within DVP/EVP blocks

without first checking whether it is running on a VP core. The encoding of the EVP instruction is equivalent to a SLTU instruction targeting $0, i.e. a NOP, which leads to the correct behavior on

non-VP cores with no additional hardware special casing.

Exceptions:

Coprocessor Unusable.

Assembly:

EXT rt, rs, pos, size

nanoMIPS, not available in NMS

Extract

Purpose:

Extract. Extract a bit field of size size at position pos from register $rs and store it right-justified into register $rt.

Availability:

nanoMIPS, not available in NMS

Format:

100000 | rt | rs | 1111 | 0 | msbd | 0 | lsb
6 | 5 | 5 | 4 | 1 | 5 | 1 | 5

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
pos = lsb
size = msbd + 1
if pos + size > 32:
    raise UNPREDICTABLE()
result = zero_extend(GPR[rs] >> pos, from_nbits=size)
GPR[rt] = sign_extend(result, from_nbits=32)

Exceptions:

Reserved Instruction on NMS cores.
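The extraction in the Operation pseudocode can be sketched as a one-line mask-and-shift. The `ext` helper name is an assumption; on a 32-bit implementation the final sign extension in the pseudocode is the identity:

```python
def ext(rs_val, pos, size):
    """Model of EXT: take `size` bits of rs starting at bit `pos`,
    right-justified; pos + size must not exceed 32 (else UNPREDICTABLE)."""
    assert 0 < size and pos + size <= 32
    return (rs_val >> pos) & ((1 << size) - 1)

print(hex(ext(0xDEADBEEF, 8, 8)))  # 0xbe
```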

Assembly:

EXTW rd, rs, rt, shift

nanoMIPS

Extract Word

Purpose:

Extract Word. Concatenate the 32-bit values in registers $rt and $rs, extract the word at specified bit position shift, and place the result in register $rd.

Availability:

nanoMIPS

Format:

001000 | rt | rs | rd | shift | 011 | 111
6 | 5 | 5 | 5 | 5 | 3 | 3

Operation:

tmp = GPR[rt][31:0] @ GPR[rs][31:0]
result = tmp >> shift
GPR[rd] = sign_extend(result, from_nbits=32)

Exceptions:

None.
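EXTW is a 64-bit funnel shift: $rt supplies the high word, $rs the low word. A reference sketch (the `extw` helper name is an assumption):

```python
def extw(rt_val, rs_val, shift):
    """Model of EXTW: concatenate rt (high word) with rs (low word),
    shift right by the 5-bit shift amount, keep the low 32 bits."""
    tmp = ((rt_val & 0xFFFFFFFF) << 32) | (rs_val & 0xFFFFFFFF)
    return (tmp >> (shift & 31)) & 0xFFFFFFFF

# With rt == rs, EXTW performs a 32-bit rotate right:
print(hex(extw(0x12345678, 0x12345678, 4)))  # 0x81234567
```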

Assembly:

GINVI rs

nanoMIPS. Optional, present when Config5.GI >= 2. Requires CP0 privilege.

Globally Invalidate Instruction caches

Purpose:

Globally Invalidate Instruction caches.

Availability:

nanoMIPS. Optional, present when Config5.GI >= 2. Requires CP0 privilege.

Format:

001000 | x | rs | 00 | 01111 | 101 | 111 | 111
6 | 5 | 5 | 2 | 5 | 3 | 3 | 3

Operation:

if C0.Config5.GI < 2:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
if GPR[rs] == 0:
    cores = get_all_cores_in_system()
else:
    cores = implementation_dependent_ginvi_cores(GPR[rs])
for core in cores:
    # Find encoded line size, sets, and associativity for the target cache.
    (L, S, A) = get_cache_parameters('I', core)
    num_sets = 2 ** (S + 6)
    num_ways = A + 1
    for way_index in range(num_ways):
        for set_index in range(num_sets):
            cache_line = get_cache_line('I', way_index, set_index, core)
            cache_line.valid = False

When $rs is 0, GINVI fully invalidates all instruction caches of all cores in the system, including the local instruction cache. For non-zero $rs values, GINVI invalidates the instruction cache of a specific,

implementation dependent core in the system.

The GINVI instruction must be followed by a SYNC (stype=0x14) and an instruction hazard barrier (e.g. JRC.HB) to ensure that all instruction caches in the system have been invalidated.

Exceptions:

Coprocessor Unusable. Reserved Instruction if Global Invalidate I-cache not implemented.

Assembly:

GINVT rs, type

nanoMIPS. Optional, present when Config5.GI=3. Requires CP0 privilege.

Globally invalidate TLBs

Purpose:

Globally invalidate TLBs.

Availability:

nanoMIPS. Optional, present when Config5.GI=3. Requires CP0 privilege.

Format:

001000 | x | type | rs | 00 | 00111 | 101 | 111 | 111
6 | 3 | 2 | 5 | 2 | 5 | 3 | 3 | 3

Operation:

if C0.Config5.GI != 3:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
if not C0.Config5.MI:
    raise exception('RI', 'Config5.MI not set')
ginvt(type, va=GPR[rs])

Perform type invalidation of all TLBs in the system, where type is one of:

invALL - invalidate all non-wired entries.

invVA - invalidate all entries which match the VA specified by $rs.

invMMID - invalidate all entries which match C0.MemoryMapID.MMID and are not global.

invVAMMID - invalidate all entries which match the VA specified by $rs and either match C0.MemoryMapID or are global.

The GINVT instruction must be followed by a SYNC (stype=0x14) and an instruction hazard barrier (e.g. JRC.HB) to ensure that matching entries have been removed from all TLBs in the system and that

all instructions in the instruction stream can only access the new context.

invMMID and invVAMMID operations use the C0.MemoryMapID value of the currently running process. The kernel must save/restore C0.MemoryMapID appropriately before it modifies it for the invalidation

operation. Between the save and restore, it must utilize unmapped addresses.

Exceptions:

Coprocessor Unusable. Reserved Instruction if Global Invalidate TLB not implemented. Reserved Instruction if MemoryMapID not enabled (i.e. Config5.MI==0).

Assembly:

INS rt, rs, pos, size

nanoMIPS, not available in NMS

Insert

Purpose:

Insert. Merge a right justified bit field of size size from register $rs into position pos of

register $rt.

Availability:

nanoMIPS, not available in NMS

Format:

100000 | rt | rs | 1110 | 0 | msbd | 0 | lsb
6 | 5 | 5 | 4 | 1 | 5 | 1 | 5

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
pos = lsb
size = 1 + msbd - lsb
if size < 1:
    raise UNPREDICTABLE()
merge_mask = ((1<<size) - 1) << pos
result = (GPR[rt] & ~merge_mask
          | (GPR[rs] << pos) & merge_mask)
GPR[rt] = sign_extend(result, from_nbits=32)

The INS instruction is not available on NMS cores. It can be emulated using a sequence of three EXTW instructions:

       INS      rt, rs, pos, size

can be emulated using the following sequence of instructions (provided rt is not equal to rs):

       EXTW     rt, rt, rt, pos
       EXTW     rt, rt, rs, size
       EXTW     rt, rt, rt, 32-size-pos
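The equivalence of this three-instruction sequence to INS (reading the final shift amount as 32-size-pos, modulo 32) can be checked with a small model. `extw`, `ins`, and `ins_via_extw` are hypothetical helper names used only for this sketch:

```python
def extw(high, low, shift):
    # EXTW rd, rs, rt, shift concatenates rt (high word) with rs (low word)
    # and keeps the low 32 bits of the right shift
    tmp = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
    return (tmp >> (shift & 31)) & 0xFFFFFFFF

def ins(rt_val, rs_val, pos, size):
    # direct model of the INS Operation pseudocode
    mask = ((1 << size) - 1) << pos
    return ((rt_val & ~mask) | ((rs_val << pos) & mask)) & 0xFFFFFFFF

def ins_via_extw(rt_val, rs_val, pos, size):
    # the three-EXTW sequence (valid for 1 <= size <= 31, pos + size <= 32)
    t = extw(rt_val, rt_val, pos)              # EXTW rt, rt, rt, pos
    t = extw(rs_val, t, size)                  # EXTW rt, rt, rs, size
    return extw(t, t, (32 - size - pos) % 32)  # EXTW rt, rt, rt, 32-size-pos

for args in [(0xDEADBEEF, 0x1234, 4, 8), (0xFFFFFFFF, 0, 0, 16)]:
    assert ins_via_extw(*args) == ins(*args)
print("ok")
```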

Exceptions:

Reserved Instruction on NMS cores.

Assembly:

JALRC.HB rt, rs

nanoMIPS

Jump And Link Register, Compact, with Hazard Barrier

Purpose:

Jump And Link Register, Compact, with Hazard Barrier. Unconditional jump to address in

register $rs, placing the return address in register $rt. Clear all instruction and execution hazards before allowing any subsequent instructions to graduate.

Availability:

nanoMIPS

Format:

010010 | rt | rs | 0001 | x
6 | 5 | 5 | 4 | 12

Operation:

address = GPR[rs] + 0
GPR[rt] = CPU.next_pc
CPU.next_pc = address
clear_instruction_hazards()
clear_execution_hazards()

The JALRC.HB instruction creates an instruction hazard barrier, meaning that it ensures that subsequent

instruction fetches will be aware of state changes caused by prior instructions. An example of a state change which affects instruction fetch, and which needs an instruction hazard barrier to ensure that subsequent instructions see the update, is a write to the instruction stream (which additionally requires a SYNCI and a SYNC).

In the absence of an instruction hazard barrier, the state used as input to an instruction fetch may be out of date, since it may have been read before the updates to that state have actually completed.

JALRC.HB also provides an execution hazard barrier, see the EHB instruction definition for details. An instruction hazard barrier is also provided by any of the exception return instructions ERET/ERETNC,

or DERET, but those instructions are only available to privileged software, whereas JALRC.HB is available from all operating modes.

Exceptions:

None.

Assembly:

JALRC dst, src

nanoMIPS

Jump And Link Register, Compact

Purpose:

Jump And Link Register, Compact. Unconditional jump to address in register $src, placing

the return address in register $dst.

Availability:

nanoMIPS

Format:

JALRC[32]

010010 | rt | rs | 0000 | x
6 | 5 | 5 | 4 | 12

src = rs
dst = rt

JALRC[16]

110110 | rt | 1 | 0000
6 | 5 | 1 | 4

src = rt
dst = 31

Operation:

address = GPR[src] + 0
GPR[dst] = CPU.next_pc
CPU.next_pc = address

Exceptions:

None.

Assembly:

JRC rt

nanoMIPS

Jump Register, Compact

Purpose:

Jump Register, Compact. Unconditional jump to address in register $rt.

Availability:

nanoMIPS

Format:

110110 | rt | 0 | 0000
6 | 5 | 1 | 4

Operation:

address = GPR[rt]
CPU.next_pc = address

Exceptions:

None.

Assembly:

LAPC rt, address

Assembly alias. NMS cores restricted to 21 bit signed offset from PC.

Load Address, PC relative

Purpose:

Load Address, PC relative. Load PC relative address to register $rt.

Availability:

Assembly alias. NMS cores restricted to 21 bit signed offset from PC.

Expansion:

address = $PC + imm (imm in 21 bit signed range):
ADDIUPC[32] rt, imm
address = $PC + imm (imm in 32 bit signed range):
ADDIUPC[48] rt, imm

LAPC uses the ADDIUPC instruction to load a PC relative address into register $rt. In order to determine the correct immediate value for the ADDIUPC instruction, the assembler must assume a value

for the PC that the instruction will be executed from. If the instruction is executed from a different PC, then the generated address will be shifted by the same amount.

Assembly:

LB rt, offset(rs)

nanoMIPS

Load Byte

Purpose:

Load Byte. Load signed byte to register $rt from memory address $rs + offset (register plus immediate).

Availability:

nanoMIPS

Format:

LB[U12]

100001 | rt | rs | 0000 | u
6 | 5 | 5 | 4 | 12

offset = u

LB[16]

010111 | rt3 | rs3 | 00 | u
6 | 3 | 3 | 2 | 2

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
offset = u

LB[GP]

010001 | rt | 000 | u
6 | 5 | 3 | 18

rs = 28
offset = u

LB[S9]

101001 | rt | rs | s[8] | 0000 | 0 | 00 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

offset = sign_extend(s, from_nbits=9)

Operation:

va = effective_address(GPR[rs], offset, 'Load')
data = read_memory_at_va(va, nbytes=1)
GPR[rt] = sign_extend(data, from_nbits=8)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LBE rt, offset(rs)

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Load Byte using EVA addressing

Purpose:

Load Byte using EVA addressing. Load signed byte to register $rt from virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Format:

101001 | rt | rs | s[8] | 0000 | 0 | 10 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

Operation:

offset = sign_extend(s, from_nbits=9)
if not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Load', eva=True)
data = read_memory_at_va(va, nbytes=1, eva=True)
GPR[rt] = sign_extend(data, from_nbits=8)

Exceptions:

Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LBU rt, offset(rs)

nanoMIPS

Load Byte Unsigned

Purpose:

Load Byte Unsigned. Load unsigned byte to register $rt from memory address $rs + offset (register plus immediate).

Availability:

nanoMIPS

Format:

LBU[U12]

100001 | rt | rs | 0010 | u
6 | 5 | 5 | 4 | 12

offset = u

LBU[16]

010111 | rt3 | rs3 | 10 | u
6 | 3 | 3 | 2 | 2

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
offset = u

LBU[GP]

010001 | rt | 010 | u
6 | 5 | 3 | 18

rs = 28
offset = u

LBU[S9]

101001 | rt | rs | s[8] | 0010 | 0 | 00 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

offset = sign_extend(s, from_nbits=9)

Operation:

va = effective_address(GPR[rs], offset, 'Load')
GPR[rt] = read_memory_at_va(va, nbytes=1)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LBUE rt, offset(rs)

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Load Byte Unsigned using EVA addressing

Purpose:

Load Byte Unsigned using EVA addressing. Load unsigned byte to register $rt from virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Format:

101001 | rt | rs | s[8] | 0010 | 0 | 10 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

Operation:

offset = sign_extend(s, from_nbits=9)
if not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Load', eva=True)
GPR[rt] = read_memory_at_va(va, nbytes=1, eva=True)

Exceptions:

Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LBUX rd, rs(rt)

nanoMIPS

Load Byte Unsigned indeXed

Purpose:

Load Byte Unsigned indeXed. Load unsigned byte to register $rd from memory address $rt + $rs (register plus register).

Availability:

nanoMIPS

Format:

001000 | rt | rs | rd | 0010 | 0 | 000 | 111
6 | 5 | 5 | 5 | 4 | 1 | 3 | 3

Operation:

va = effective_address(GPR[rs], GPR[rt], 'Load')
GPR[rd] = read_memory_at_va(va, nbytes=1)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LBX rd, rs(rt)

nanoMIPS

Load Byte indeXed

Purpose:

Load Byte indeXed. Load signed byte to register $rd from memory address $rt + $rs (register plus register).

Availability:

nanoMIPS

Format:

001000 | rt | rs | rd | 0000 | 0 | 000 | 111
6 | 5 | 5 | 5 | 4 | 1 | 3 | 3

Operation:

va = effective_address(GPR[rs], GPR[rt], 'Load')
data = read_memory_at_va(va, nbytes=1)
GPR[rd] = sign_extend(data, from_nbits=8)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LH rt, offset(rs)

nanoMIPS

Load Half

Purpose:

Load Half. Load signed halfword to register $rt from memory address $rs + offset (register plus immediate).

Availability:

nanoMIPS

Format:

LH[U12]

100001 | rt | rs | 0100 | u
6 | 5 | 5 | 4 | 12

offset = u

LH[16]

011111 | rt3 | rs3 | 0 | u[2:1] | 0
6 | 3 | 3 | 1 | 2 | 1

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
offset = u

LH[GP]

010001 | rt | 100 | u[17:1] | 0
6 | 5 | 3 | 17 | 1

rs = 28
offset = u

LH[S9]

101001 | rt | rs | s[8] | 0100 | 0 | 00 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

offset = sign_extend(s, from_nbits=9)

Operation:

va = effective_address(GPR[rs], offset, 'Load')
data = read_memory_at_va(va, nbytes=2)
GPR[rt] = sign_extend(data, from_nbits=16)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LHE rt, offset(rs)

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Load Half using EVA addressing

Purpose:

Load Half using EVA addressing. Load signed halfword to register $rt from virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Format:

101001 | rt | rs | s[8] | 0100 | 0 | 10 | s[7:0]
6 | 5 | 5 | 1 | 4 | 1 | 2 | 8

Operation:

if not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
offset = sign_extend(s, from_nbits=9)
va = effective_address(GPR[rs], offset, 'Load', eva=True)
data = read_memory_at_va(va, nbytes=2, eva=True)
GPR[rt] = sign_extend(data, from_nbits=16)

Exceptions:

Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LHU rt, offset(rs)

nanoMIPS

Load Half Unsigned

Purpose:

Load Half Unsigned. Load unsigned halfword to register $rt from memory address $rs + offset (register plus immediate).

Availability:

nanoMIPS

Format:

LHU[U12]

100001 | rt | rs | 0110 | u
6 | 5 | 5 | 4 | 12

offset = u

LHU[16]

011111 | rt3 | rs3 | 1 | u[2:1] | 0
6 | 3 | 3 | 1 | 2 | 1

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
offset = u

LHU[GP]

010001

rt

100

u[17:1]

1

6

5

3

17

1

rs = 28
offset = u

LHU[S9]

101001

rt

rs

s[8]

0110

0

00

s[7:0]

6

5

5

1

4

1

2

8

offset = sign_extend(s, from_nbits=9)

Operation:

va = effective_address(GPR[rs], offset, 'Load')
GPR[rt] = read_memory_at_va(va, nbytes=2)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LHUE rt, offset(rs)

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Load Half Unsigned using EVA addressing

Purpose:

Load Half Unsigned using EVA addressing. Load unsigned halfword to register $rt from virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Format:

101001

rt

rs

s[8]

0110

0

10

s[7:0]

6

5

5

1

4

1

2

8

Operation:

if not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
offset = sign_extend(s, from_nbits=9)
va = effective_address(GPR[rs], offset, 'Load', eva=True)
GPR[rt] = read_memory_at_va(va, nbytes=2, eva=True)

Exceptions:

Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LHUX rd, rs(rt)

nanoMIPS

Load Half Unsigned indeXed

Purpose:

Load Half Unsigned indeXed. Load unsigned halfword to register $rd from memory address $rt + $rs (register plus register).

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

0110

0

000

111

6

5

5

5

4

1

3

3

Operation:

va = effective_address(GPR[rs], GPR[rt], 'Load')
GPR[rd] = read_memory_at_va(va, nbytes=2)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LHUXS rd, rs(rt)

nanoMIPS

Load Half Unsigned indeXed Scaled

Purpose:

Load Half Unsigned indeXed Scaled. Load unsigned halfword to register $rd from memory address $rt + 2*$rs (register plus scaled register).

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

0110

1

000

111

6

5

5

5

4

1

3

3

Operation:

va = effective_address(GPR[rs]<<1, GPR[rt], 'Load')
GPR[rd] = read_memory_at_va(va, nbytes=2)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LHX rd, rs(rt)

nanoMIPS

Load Half indeXed

Purpose:

Load Half indeXed. Load signed halfword to register $rd from memory address $rt + $rs (register plus register).

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

0100

0

000

111

6

5

5

5

4

1

3

3

Operation:

va = effective_address(GPR[rs], GPR[rt], 'Load')
data = read_memory_at_va(va, nbytes=2)
GPR[rd] = sign_extend(data, from_nbits=16)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LHXS rd, rs(rt)

nanoMIPS

Load Half indeXed Scaled

Purpose:

Load Half indeXed Scaled. Load signed halfword to register $rd from memory address $rt + 2*$rs (register plus scaled register).

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

0100

1

000

111

6

5

5

5

4

1

3

3

Operation:

va = effective_address(GPR[rs]<<1, GPR[rt], 'Load')
data = read_memory_at_va(va, nbytes=2)
GPR[rd] = sign_extend(data, from_nbits=16)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LI rt, s

nanoMIPS, availability varies by format.

Load Immediate

Purpose:

Load Immediate. Load immediate value s to register $rt.

Availability:

nanoMIPS, availability varies by format.

Format:

LI[16]

110100

rt3

eu

6

3

7

rt = decode_gpr(rt3, 'gpr3')
s = -1 if eu == 127 else eu
not_in_nms = False
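The eu field decoding above can be exercised directly; a minimal sketch (the function name is illustrative, not from the manual):

```python
def decode_li16_immediate(eu):
    # The 7-bit eu field encodes 0..126 directly; the all-ones
    # pattern 127 is re-purposed to mean -1.
    assert 0 <= eu <= 127
    return -1 if eu == 127 else eu

print(decode_li16_immediate(127))  # -1
print(decode_li16_immediate(50))   # 50
```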

LI[48], not available in NMS

011000

rt

00000

s[15:0]

s[31:16]

6

5

5

16

16

s = sign_extend(s[31:16] @ s[15:0], from_nbits=32)
not_in_nms = True

Operation:

if not_in_nms and C0.Config5.NMS == 1:
    raise exception('RI')
GPR[rt] = s

Exceptions:

Reserved Instruction for LI[48] format on NMS cores.

Assembly:

LL    rt, offset(rs)

nanoMIPS, availability varies by format.

Load Linked word/Load Linked word using EVA addressing/Load Linked Word Pair/Load

LLE   rt, offset(rs)

nanoMIPS, availability varies by format.

Load Linked word/Load Linked word using EVA addressing/Load Linked Word Pair/Load

LLWP  rt, ru, (rs)

nanoMIPS, availability varies by format.

Load Linked word/Load Linked word using EVA addressing/Load Linked Word Pair/Load

LLWPE rt, ru, (rs)

nanoMIPS, availability varies by format.

Load Linked word/Load Linked word using EVA addressing/Load Linked Word Pair/Load

Purpose:

Load Linked word/Load Linked word using EVA addressing/Load Linked Word Pair/Load Linked Word Pair using EVA addressing. For LL/LLE, load word for atomic RMW to register $rt from address $rs + offset (register plus immediate). For LLWP/LLWPE, load words for atomic RMW to registers $rt and $ru from address $rs. For LLE/LLWPE, translate the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS, availability varies by format.

Format:

LL

101001

rt

rs

s[8]

1010

0

01

s[7:2]

00

6

5

5

1

4

1

2

6

2

offset = sign_extend(s, from_nbits=9)
nbytes = 4
is_eva = False

LLE, present when Config5.EVA=1, requires CP0 privilege.

101001

rt

rs

s[8]

1010

0

10

s[7:2]

00

6

5

5

1

4

1

2

6

2

offset = sign_extend(s, from_nbits=9)
nbytes = 4
is_eva = True

LLWP, required (optional on NMS cores).

101001

rt

rs

x

1010

0

01

ru

x

01

6

5

5

1

4

1

2

5

1

2

offset = 0
nbytes = 8
is_eva = False

LLWPE, present when Config5.EVA=1. Requires CP0 privilege.

101001

rt

rs

x

1010

0

10

ru

x

01

6

5

5

1

4

1

2

5

1

2

offset = 0
nbytes = 8
is_eva = True

Operation:

if nbytes == 8 and C0.Config5.XNP:
    raise exception('RI', 'LLWP[E] requires word-paired support')
if is_eva and not C0.Config5.EVA:
    raise exception('RI')
va = effective_address(GPR[rs], offset, 'Load', eva=is_eva)
# Linked access must be aligned.
if va & (nbytes-1):
    raise exception('ADEL', badva=va)
pa, cca = va2pa(va, 'Load', eva=is_eva)
if (cca == 2 or cca == 7) and not C0.Config5.ULS:
    raise UNPREDICTABLE('uncached CCA not synchronizable when Config5.ULS=0')
    # (Preferred behavior for a non-synchronizable address is Bus Error.)
# Indicate that there is an active RMW sequence on this processor.
C0.LLAddr.LLB = 1
# Save target address of active RMW sequence.
record_linked_address(va, pa, cca, nbytes=nbytes)
data = read_memory(va, pa, cca, nbytes=nbytes)
if nbytes == 4: # LL/LLE
    GPR[rt] = sign_extend(data, from_nbits=32)
else:  # LLWP/LLWPE
    word0 = data[63:32] if C0.Config.BE else data[31:0]
    word1 = data[31:0] if C0.Config.BE else data[63:32]
    if rt == ru:
        raise UNPREDICTABLE()
    GPR[rt] = sign_extend(word0, from_nbits=32)
    GPR[ru] = sign_extend(word1, from_nbits=32)
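The word selection in the LLWP/LLWPE case depends on the configured endianness (Config.BE); a small Python sketch of the split (helper name assumed):

```python
def llwp_split(data64, big_endian):
    # Split a 64-bit linked load into (word0 -> GPR[rt], word1 -> GPR[ru]).
    hi = (data64 >> 32) & 0xFFFFFFFF
    lo = data64 & 0xFFFFFFFF
    # On a big-endian core the high half is the lower-addressed word.
    return (hi, lo) if big_endian else (lo, hi)

w0, w1 = llwp_split(0x1122334455667788, big_endian=True)
# word0 goes to GPR[rt], word1 to GPR[ru]
```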

The LL/LLE/LLWP/LLWPE instructions are used to initiate an atomic read-modify-write sequence. C0.LLAddr.LLB is set to 1, indicating that there is an active RMW sequence on the current processor, and an implementation dependent set of state is saved which indicates the address and access type of the active RMW sequence. There can be only one active RMW sequence per processor.

The RMW sequence will be completed by a matching SC/SCE/SCWP/SCWPE instruction. The store-conditional instruction will only complete if the system can guarantee that the accessed memory location has not been modified since the load-linked instruction occurred, as discussed in more detail in the SC/SCE/SCWP/SCWPE instruction description.

The address and CCA targeted by the LL/LLE/LLWP/LLWPE must be synchronizable by all processors and I/O devices sharing the location; if it is not, the result is UNPREDICTABLE. Which storage is synchronizable is a function of both CPU and system implementations - see the SC/SCE/SCWP/SCWPE instruction for the formal definition. The preferred behavior for a load-linked instruction which attempts to access an address which is not synchronizable is a Bus Error exception.

If Config5.ULS is set, then the system supports uncached load-linked/store-conditional accesses. Otherwise, the result of uncached accesses is unpredictable.

A LL/LLE/LLWP/LLWPE instruction on one processor must not take action that, by itself, causes a store-conditional instruction for the same block on another processor to fail. For example, if an implementation depends on retaining the data in the cache during the RMW sequence, cache misses caused by a load-linked instruction must not fetch data in the exclusive state, since that would remove it from another core’s cache if it were present.

An execution of a load-linked instruction does not have to be followed by execution of store-conditional instruction; a program is free to abandon the RMW sequence without attempting a write.

Support for the paired word instructions LLWP/LLWPE is indicated by the Config5.XNP bit. Paired word support is required for nanoMIPS™ cores, except for NMS cores, where it is optional.

The result of LLWP/LLWPE is unpredictable if $rt and $ru are the same register.

Exceptions:

Address Error. Bus Error. Coprocessor Unusable for LLE/LLWPE. Reserved Instruction for LLE/LLWPE if EVA not implemented. Reserved Instruction for LLWP/LLWPE if load linked pair not implemented.

TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LSA rd, rs, rt, u2

nanoMIPS

Load Scaled Address

Purpose:

Load Scaled Address. Add register $rs scaled by a left shift u2 to register $rt and place the 32-bit result in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

u2

x

001

111

6

5

5

5

2

3

3

3

Operation:

sum = (GPR[rs] << u2) + GPR[rt]
GPR[rd] = sign_extend(sum, from_nbits=32)

In nanoMIPS™, the shift field directly encodes the shift amount, meaning that the supported shift values are in the range 0 to 3 (instead of 1 to 4 in MIPSR6™).
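The scaled-add semantics, including the direct 0..3 shift encoding, can be sketched as follows (32-bit wraparound assumed; function name is illustrative):

```python
MASK32 = 0xFFFFFFFF

def lsa(rs_val, rt_val, u2):
    # Shift amount is used directly (0..3), unlike MIPSR6 where the
    # encoded field is the shift amount minus one (range 1..4).
    assert 0 <= u2 <= 3
    return ((rs_val << u2) + rt_val) & MASK32

print(hex(lsa(0x1000, 0x20, 2)))  # 0x4020
```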

Exceptions:

None.

Assembly:

LUI rt, %hi(imm)

nanoMIPS

Load Upper Immediate.

Purpose:

Load Upper Immediate. Load upper 20 bits of immediate value imm to upper 20 bits of register $rt, and set the lower 12 bits to zero.

Availability:

nanoMIPS

Format:

111000

rt

s[20:12]

s[30:21]

0

s[31]

6

5

9

10

1

1

imm = sign_extend(s, from_nbits=32)

Operation:

GPR[rt] = imm

For backwards compatibility, instances of LUI which use a literal value for the immediate will be treated as containing a 16-bit immediate which should be loaded into the upper 16 bits of the target register.

To access the upper 20 bits of the register, the ’%hi(imm)’ form of the immediate must be used.
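A sketch of the resulting register value for the %hi form (helper name is illustrative):

```python
def lui_result(imm):
    # Bits 31..12 of imm land in bits 31..12 of the register;
    # bits 11..0 are cleared.
    return imm & 0xFFFFF000

print(hex(lui_result(0x12345678)))  # 0x12345000
```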

Exceptions:

None.

Assembly:

LW rt, offset(rs)

nanoMIPS, availability varies by format.

Load Word

Purpose:

Load Word. Load word to register $rt from memory address $rs + offset (register plus immediate).

Availability:

nanoMIPS, availability varies by format.

Format:

LW[U12]

100001

rt

rs

1000

u

6

5

5

4

12

offset = u

LW[16]

000101

rt3

rs3

u[5:2]

6

3

3

4

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
offset = u

LW[4X4], not available in NMS

011101

rt4[3]

u[2]

rt4[2:0]

rs4[3]

u[3]

rs4[2:0]

6

1

1

3

1

1

3

if C0.Config5.NMS == 1:
    raise exception('RI')
rt = decode_gpr(rt4[3] @ rt4[2:0], 'gpr4')
rs = decode_gpr(rs4[3] @ rs4[2:0], 'gpr4')
offset = u

LW[GP16]

010101

rt3

u[8:2]

6

3

7

rt = decode_gpr(rt3, 'gpr3')
rs = 28
offset = u

LW[GP]

010000

rt

u[20:2]

10

6

5

19

2

rs = 28
offset = u

LW[S9]

101001

rt

rs

s[8]

1000

0

00

s[7:0]

6

5

5

1

4

1

2

8

offset = sign_extend(s, from_nbits=9)

LW[SP]

001101

rt

u[6:2]

6

5

5

rs = 29
offset = u

Operation:

va = effective_address(GPR[rs], offset, 'Load')
data = read_memory_at_va(va, nbytes=4)
GPR[rt] = sign_extend(data, from_nbits=32)

Exceptions:

Address Error. Bus Error. Reserved Instruction for LW[4X4] format on NMS cores. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LWE rt, offset(rs)

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Load Word using EVA addressing

Purpose:

Load Word using EVA addressing. Load word to register $rt from virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Format:

101001

rt

rs

s[8]

1000

0

10

s[7:0]

6

5

5

1

4

1

2

8

Operation:

offset = sign_extend(s, from_nbits=9)
if not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Load', eva=True)
data = read_memory_at_va(va, nbytes=4, eva=True)
GPR[rt] = sign_extend(data, from_nbits=32)

Exceptions:

Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LWM rt, offset(rs), count

nanoMIPS, not available in NMS

Load Word Multiple

Purpose:

Load Word Multiple. Load count words of data to registers $rt, $(rt+1), ..., $(rt+count-1) from consecutive memory addresses starting at $rs + offset (register plus immediate).

Availability:

nanoMIPS, not available in NMS

Format:

101001

rt

rs

s[8]

count3

0

1

00

s[7:0]

6

5

5

1

3

1

1

2

8

offset = sign_extend(s, from_nbits=9)
count = 8 if count3 == 0 else count3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
i = 0
while i != count:
    this_rt = ( rt + i      if rt + i < 32 else
                rt + i - 16                    )
    this_offset = offset + (i<<2)
    va = effective_address(GPR[rs], this_offset, 'Load')
    data = read_memory_at_va(va, nbytes=4)
    GPR[this_rt] = sign_extend(data, from_nbits=32)
    if this_rt == rs and i != count - 1:
        raise UNPREDICTABLE()
    i += 1

LWM loads count words to sequentially numbered registers from sequential memory addresses. After loading $31, the sequence of registers continues from $16. For example, rt=15 with count=3 loads [$15, $16, $17], while rt=31 with count=3 loads [$31, $16, $17].

The result is unpredictable if an LWM instruction updates the base register prior to the final load.
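The register-number wrap-around described above can be sketched as (function name is illustrative):

```python
def lwm_register_list(rt, count):
    # Registers advance sequentially; after $31 the list wraps to $16.
    regs = []
    for i in range(count):
        r = rt + i
        regs.append(r if r < 32 else r - 16)
    return regs

print(lwm_register_list(15, 3))  # [15, 16, 17]
print(lwm_register_list(31, 3))  # [31, 16, 17]
```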

LWM must be implemented in such a way as to make the instruction restartable, but the implementation does not need to be fully atomic. For instance, it is allowable for a LWM instruction to be aborted by an exception after a subset of the register updates have occurred. To ensure restartability, any write to GPR $rs (which may be used as the final output register) must be completed atomically, that is, the instruction must graduate if and only if that write occurs.

Exceptions:

Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LWPC rt, address

nanoMIPS, not available in NMS

Load Word PC relative

Purpose:

Load Word PC relative. Load word to register $rt from PC relative address address.

Availability:

nanoMIPS, not available in NMS

Format:

LWPC[48]

011000

rt

01011

s[15:0]

s[31:16]

6

5

5

16

16

offset = sign_extend(s, from_nbits=32)

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
address = effective_address(CPU.next_pc, offset)
data = read_memory_at_va(address, nbytes=4)
GPR[rt] = sign_extend(data, from_nbits=32)
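The PC-relative address computation can be sketched as follows (32-bit wraparound assumed; function name is illustrative):

```python
def lwpc_address(next_pc, s):
    # s is a 32-bit field interpreted as signed and added to the
    # address of the next instruction.
    offset = (s ^ 0x80000000) - 0x80000000  # sign-extend from 32 bits
    return (next_pc + offset) & 0xFFFFFFFF

print(hex(lwpc_address(0x1000, 0xFFFFFFFC)))  # 0xffc (offset is -4)
```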

Exceptions:

Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LWX rd, rs(rt)

nanoMIPS

Load Word indeXed

Purpose:

Load Word indeXed. Load word to register $rd from memory address $rt + $rs (register plus register).

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

1000

0

000

111

6

5

5

5

4

1

3

3

Operation:

va = effective_address(GPR[rs], GPR[rt], 'Load')
data = read_memory_at_va(va, nbytes=4)
GPR[rd] = sign_extend(data, from_nbits=32)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

LWXS rd, rs(rt)

nanoMIPS

Load Word indeXed Scaled

Purpose:

Load Word indeXed Scaled. Load word to register $rd from memory address $rt + 4*$rs (register plus scaled register).

Availability:

nanoMIPS

Format:

LWXS[32]

001000

rt

rs

rd

1000

1

000

111

6

5

5

5

4

1

3

3

LWXS[16]

010100

rt3

rs3

rd3

1

6

3

3

3

1

rd = decode_gpr(rd3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
rt = decode_gpr(rt3, 'gpr3')

Operation:

va = effective_address(GPR[rs]<<2, GPR[rt], 'Load')
data = read_memory_at_va(va, nbytes=4)
GPR[rd] = sign_extend(data, from_nbits=32)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

MFC0 rt, c0s, sel

nanoMIPS. Requires CP0 privilege.

Move From Coprocessor 0

Purpose:

Move From Coprocessor 0. Write value of CP0 register indexed by c0s, sel to register $rt.

Availability:

nanoMIPS. Requires CP0 privilege.

Format:

001000

rt

c0s

sel

x

0000110

000

6

5

5

5

1

7

3

Operation:

if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
value = read_cp0_register(c0s, sel)
GPR[rt] = sign_extend(value, from_nbits=32)

An MFC0 which targets a register which is not used on the current core will return zero.

Exceptions:

Coprocessor Unusable.

Assembly:

MFHC0 rt, c0s, sel

nanoMIPS, required.

Move From High Coprocessor 0

Purpose:

Move From High Coprocessor 0. Write bits 63..32 (when present) of CP0 register indexed by c0s, sel to register $rt.

Availability:

nanoMIPS, required.

(Optional on NMS cores). Requires CP0 privilege.

Format:

001000

rt

c0s

sel

x

0000111

000

6

5

5

5

1

7

3

Operation:

if C0.Config5.MVH == 0:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
value = read_cp0_register(c0s, sel, h=True)
GPR[rt] = sign_extend(value, from_nbits=32)

For certain core configurations, specific nanoMIPS32™ CP0 registers may be extended to be 64 bits wide. The MFHC0 instruction is used to read the upper 32 bits of such registers. An MFHC0 which targets a register for which the ’high’ bits are not used will return zero.

This instruction is available when Config5.MVH=1, which is required on nanoMIPS™ cores, except for NMS cores where it is optional.

Exceptions:

Coprocessor Unusable. Reserved Instruction on NMS cores without MVH support.

Assembly:

MOD rd, rs, rt

nanoMIPS

Modulo

Purpose:

Modulo. Compute signed division of register $rs by register $rt, and place the remainder in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

0101011

000

6

5

5

5

1

7

3

Operation:

numerator = GPR[rs]
denominator = GPR[rt]
if denominator == 0:
    quotient, remainder = (UNKNOWN, UNKNOWN)
else:
    quotient, remainder = divide_integers(numerator, denominator)
GPR[rd] = sign_extend(remainder, from_nbits=32)
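MIPS signed division truncates toward zero, so the remainder takes the sign of the numerator; Python's % operator rounds toward negative infinity instead, hence the explicit construction in this sketch (assumption: divide_integers truncates toward zero, as in other MIPS ISAs):

```python
def mod_signed(numerator, denominator):
    # Division by zero gives an UNKNOWN result; model it as None.
    if denominator == 0:
        return None
    quotient = abs(numerator) // abs(denominator)
    if (numerator < 0) != (denominator < 0):
        quotient = -quotient  # truncate toward zero
    return numerator - quotient * denominator

print(mod_signed(-7, 2))  # -1 (not +1, which is what Python's -7 % 2 gives)
print(mod_signed(7, -2))  # 1
```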

Exceptions:

None.

Assembly:

MODU rd, rs, rt

nanoMIPS

Modulo Unsigned

Purpose:

Modulo Unsigned. Compute unsigned division of register $rs by register $rt, and place the remainder in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

0111011

000

6

5

5

5

1

7

3

Operation:

numerator = zero_extend(GPR[rs], from_nbits=32)
denominator = zero_extend(GPR[rt], from_nbits=32)
if denominator == 0:
    quotient, remainder = (UNKNOWN, UNKNOWN)
else:
    quotient, remainder = divide_integers(numerator, denominator)
GPR[rd] = sign_extend(remainder, from_nbits=32)

Exceptions:

None.

Assembly:

MOVE.BALC rd, rt, address

nanoMIPS, not available in NMS

Move and Branch and Link, Compact

Purpose:

Move and Branch and Link, Compact. Copy value of register $rt to register $rd, and perform an unconditional PC relative branch to address, placing the return address in register $31.

Availability:

nanoMIPS, not available in NMS

Format:

000010

rtz4[3]

rd1

rtz4[2:0]

s[20:1]

s[21]

6

1

1

3

20

1

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
rd = decode_gpr(rd1, 'gpr1')
rt = decode_gpr(rtz4[3] @ rtz4[2:0], 'gpr4.zero')
offset = sign_extend(s, from_nbits=22)
address = effective_address(CPU.next_pc, offset)
GPR[rd] = GPR[rt]
GPR[31] = CPU.next_pc
CPU.next_pc = address

Although this instruction is called MOVE.BALC, the order of the updates to PC, $31 and $rd is invisible to software, and an implementation may choose any order for carrying out these steps.

Exceptions:

Reserved Instruction on NMS cores.

Assembly:

MOVE rt, rs

nanoMIPS

Move

Purpose:

Move. Copy value of register $rs to register $rt.

Availability:

nanoMIPS

Format:

000100

rt!=0

rt

rs

6

5

5

Operation:

GPR[rt] = GPR[rs]

Exceptions:

None.

Assembly:

MOVEP dst1, dst2, src1, src2

nanoMIPS, not available in NMS

Move Pair

Purpose:

Move Pair. Copy value of register $src1 to register $dst1, and copy value of register $src2 to register $dst2.

Availability:

nanoMIPS, not available in NMS

Format:

MOVEP

101111

rtz4[3]

rd2[0]

rtz4[2:0]

rsz4[3]

rd2[1]

rsz4[2:0]

6

1

1

3

1

1

3

dst1 = decode_gpr(rd2[1] @ rd2[0], 'gpr2.reg1')
dst2 = decode_gpr(rd2[1] @ rd2[0], 'gpr2.reg2')
src1 = decode_gpr(rsz4[3] @ rsz4[2:0], 'gpr4.zero')
src2 = decode_gpr(rtz4[3] @ rtz4[2:0], 'gpr4.zero')

MOVEP[REV]

111111

rt4[3]

rd2[0]

rt4[2:0]

rs4[3]

rd2[1]

rs4[2:0]

6

1

1

3

1

1

3

dst1 = decode_gpr(rs4[3] @ rs4[2:0], 'gpr4')
dst2 = decode_gpr(rt4[3] @ rt4[2:0], 'gpr4')
src1 = decode_gpr(rd2[1] @ rd2[0], 'gpr2.reg1')
src2 = decode_gpr(rd2[1] @ rd2[0], 'gpr2.reg2')

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
if dst1 == src1 or dst1 == src2 or dst2 == src1 or dst2 == src2:
    GPR[dst1] = UNKNOWN
    GPR[dst2] = UNKNOWN
else:
    GPR[dst1] = GPR[src1]
    GPR[dst2] = GPR[src2]

The output register values are unpredictable if either of the output registers is also used as an input.

Exceptions:

Reserved Instruction on NMS cores.

Assembly:

MOVN rd, rs, rt

nanoMIPS

Move if Not zero

Purpose:

Move if Not zero. Copy value of register $rs to register $rd if register $rt is not zero.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

1

1000010

000

6

5

5

5

1

7

3

Operation:

GPR[rd] = GPR[rs] if GPR[rt] != 0 else GPR[rd]

Exceptions:

None.

Assembly:

MOVZ rd, rs, rt

nanoMIPS

Move if Zero

Purpose:

Move if Zero. Copy value of register $rs to register $rd if register $rt is zero.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

0

1000010

000

6

5

5

5

1

7

3

Operation:

GPR[rd] = GPR[rs] if GPR[rt] == 0 else GPR[rd]

Exceptions:

None.

Assembly:

MTC0 rt, c0s, sel

nanoMIPS. Requires CP0 privilege.

Move To Coprocessor 0

Purpose:

Move To Coprocessor 0. Write value of register $rt to CP0 register indexed by c0s, sel.

Availability:

nanoMIPS. Requires CP0 privilege.

Format:

001000

rt

c0s

sel

x

0001110

000

6

5

5

5

1

7

3

Operation:

if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
write_cp0_register(GPR[rt], c0s, sel)

An MTC0 to a register which is not used on the current core is ignored.

When a register is extended to have high bits for a specific configuration (see MTHC0), legacy software which is not aware of the existence of these high bits still needs to function correctly. In such cases, the architecture may require that an MTC0 modifies the high 32 bits of the register as well as the low 32 bits to give the correct legacy behavior.

For this reason, when setting an extended CP0 register, the MTC0 to set the low 32 bits should always precede the MTHC0 to set the high 32 bits. Also, a read-modify-write sequence to set a specific bitfield in the low 32 bits should read both the low 32 and high 32 bits, then do MTC0 followed by MTHC0 to write the modified value back.
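A toy model of why the ordering matters, assuming a core where a legacy-compatible MTC0 clears the high half of an extended register (that clearing behavior is the assumption here, not a statement about every core):

```python
class ExtendedCp0Register:
    def __init__(self):
        self.value = 0  # 64-bit architectural state

    def mtc0(self, lo32):
        # Legacy-compatible write: the high half is cleared.
        self.value = lo32 & 0xFFFFFFFF

    def mthc0(self, hi32):
        self.value = ((hi32 & 0xFFFFFFFF) << 32) | (self.value & 0xFFFFFFFF)

r = ExtendedCp0Register()
r.mtc0(0x89ABCDEF)   # low half first...
r.mthc0(0x01234567)  # ...so the high half written here survives
print(hex(r.value))  # 0x123456789abcdef
```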

Exceptions:

Coprocessor Unusable.

Assembly:

MTHC0 rt, c0s, sel

nanoMIPS, required.

Move To High Coprocessor 0

Purpose:

Move To High Coprocessor 0. Write value of register $rt to bits 63..32 (when present) of CP0 register indexed by c0s, sel.

Availability:

nanoMIPS, required.

(Optional on NMS cores). Requires CP0 privilege.

Format:

001000

rt

c0s

sel

x

0001111

000

6

5

5

5

1

7

3

Operation:

if C0.Config5.MVH == 0:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
write_cp0_register(GPR[rt], c0s, sel, h=True)

For certain core configurations, specific nanoMIPS32™ CP0 registers may be extended to be 64 bits wide. The MTHC0 instruction is used to write the upper 32 bits of such registers. An MTHC0 to a register for which the ’high’ bits are not used will be ignored.

When a register is extended to have high bits for a specific configuration, legacy software which is not aware of the existence of these high bits still needs to function correctly. In such cases, the architecture may require that an MTC0 modifies the high 32 bits of the register as well as the low 32 bits to give the correct legacy behavior.

For this reason, when setting an extended CP0 register, the MTC0 to set the low 32 bits should always precede the MTHC0 to set the high 32 bits. Also, a read-modify-write sequence to set a specific bitfield in the low 32 bits should read both the low 32 and high 32 bits, then do MTC0 followed by MTHC0 to write the modified value back.

This instruction is available when Config5.MVH=1, which is required on nanoMIPS™ cores, except for NMS cores where it is optional.

Exceptions:

Coprocessor Unusable. Reserved Instruction on NMS cores without MVH support.

Assembly:

MUH rd, rs, rt

nanoMIPS

Multiply High

Purpose:

Multiply High. Multiply signed word values from registers $rs and $rt, and place bits 63..32 of the result in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

0001011

000

6

5

5

5

1

7

3

Operation:

result = GPR[rs] * GPR[rt]
result_hi = result[63:32]
GPR[rd] = sign_extend(result_hi, from_nbits=32)
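The high-word extraction above can be sketched on 32-bit register values as follows (function names are illustrative):

```python
def muh(rs_val, rt_val):
    # Interpret both 32-bit values as signed, form the full 64-bit
    # product, and return bits 63..32.
    def s32(x):
        return (x ^ 0x80000000) - 0x80000000
    product = s32(rs_val) * s32(rt_val)
    return (product >> 32) & 0xFFFFFFFF

print(hex(muh(0x40000000, 4)))  # 0x1
print(hex(muh(0xFFFFFFFF, 2)))  # 0xffffffff (the product is -2)
```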

Exceptions:

None.

Assembly:

MUHU rd, rs, rt

nanoMIPS

Multiply High Unsigned

Purpose:

Multiply High Unsigned. Multiply unsigned word values in registers $rs and $rt, and place bits 63..32 of the result in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

0011011

000

6

5

5

5

1

7

3

Operation:

rs_unsigned = zero_extend(GPR[rs], from_nbits=32)
rt_unsigned = zero_extend(GPR[rt], from_nbits=32)
result = rs_unsigned * rt_unsigned
result_hi = result[63:32]
GPR[rd] = sign_extend(result_hi, from_nbits=32)

Exceptions:

None.

Assembly:

MUL dst, src1, src2

nanoMIPS, availability varies by format.

Multiply

Purpose:

Multiply. Multiply signed word values in registers $src1 and $src2, and place bits 31..0 of the result in register $dst.

Availability:

nanoMIPS, availability varies by format.

Format:

MUL[32]

001000

rt

rs

rd

x

0000011

000

6

5

5

5

1

7

3

dst = rd
src1 = rs
src2 = rt
not_in_mms = False

MUL[4X4], not available in NMS

001111

rt4[3]

0

rt4[2:0]

rs4[3]

1

rs4[2:0]

6

1

1

3

1

1

3

dst = decode_gpr(rt4, 'gpr4')
src1 = decode_gpr(rt4, 'gpr4')
src2 = decode_gpr(rs4, 'gpr4')
not_in_mms = True

Operation:

if not_in_mms and C0.Config5.NMS == 1:
    raise exception('RI')
result = GPR[src1] * GPR[src2]
GPR[dst] = sign_extend(result, from_nbits=32)

Exceptions:

Reserved Instruction for MUL[4X4] format on NMS cores.

Assembly:

MULU rd, rs, rt

nanoMIPS

Multiply Unsigned

Purpose:

Multiply Unsigned. Multiply unsigned word values in registers $rs and $rt, and place bits 31..0 of the result in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

0010011

000

6

5

5

5

1

7

3

Operation:

rs_unsigned = zero_extend(GPR[rs], from_nbits=32)
rt_unsigned = zero_extend(GPR[rt], from_nbits=32)
result = rs_unsigned * rt_unsigned
GPR[rd] = sign_extend(result, from_nbits=32)

Exceptions:

None.

Assembly:

NOP

nanoMIPS

No Operation

Purpose:

No Operation.

Availability:

nanoMIPS

Format:

NOP[32]

100000

00000

x

1100

x

0000

00000

6

5

5

4

3

4

5

NOP[16]

100100

00000

x

1

x

6

5

1

1

3

Operation:

# No operation
pass

The NOP[32] encoding is equivalent to an SLL[32] instruction using $0 as output and a shift value of 0. The NOP[16] encoding is equivalent to an ADDIU[RS5] instruction using $0 as output. Therefore NOP does not necessarily need any additional implementation in hardware beyond the normal behavior of the SLL[32] and ADDIU[RS5] instructions.

If software intentionally generates a NOP instruction, it should only generate these specific encodings, rather than other instructions writing to $0 which would also result in no operation.

If hardware implements a performance counter for nops, it can expect these specific instruction encodings to be used. It should ignore the x field of the encoding, treating all values of x as representing a valid NOP instruction. Software on the other hand should only generate NOP instructions with an x value of 0.

As for all instruction definitions containing x fields, this methodology allows for the possibility that the meaning of x values other than zero might be enhanced in the future, with the understanding that cores prior to the enhanced definition will treat the x!=0 encodings as equivalent to the x==0 instruction.

Exceptions:

None.

Assembly:

NOR rd, rs, rt

nanoMIPS

NOR

Purpose:

NOR. Compute logical NOR of registers $rs and $rt, placing the result in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

1011010

000

6

5

5

5

1

7

3

Operation:

GPR[rd] = ~(GPR[rs] | GPR[rt])

Exceptions:

None.

Assembly:

NOT rt, rs

nanoMIPS

NOT

Purpose:

NOT. Write logical inversion of register $rs to register $rt.

Availability:

nanoMIPS

Format:

010100

rt3

rs3

00

0

0

6

3

3

2

1

1

Operation:

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
GPR[rt] = ~GPR[rs]

Exceptions:

None.

Assembly:

OR rd, rs, rt

nanoMIPS

OR

Purpose:

OR. Compute logical OR of registers $rs and $rt, placing the result in register $rd.

Availability:

nanoMIPS

Format:

OR[32]

001000

rt

rs

rd

x

1010010

000

6

5

5

5

1

7

3

OR[16]

010100

rt3

rs3

11

0

0

6

3

3

2

1

1

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
rd = rt

Operation:

GPR[rd] = GPR[rs] | GPR[rt]

Exceptions:

None.

Assembly:

ORI rt, rs, u

nanoMIPS

OR Immediate

Purpose:

OR Immediate. Compute logical OR of register $rs with immediate u, placing the result in register $rt.

Availability:

nanoMIPS

Format:

100000

rt

rs

0000

u

6

5

5

4

12

Operation:

GPR[rt] = GPR[rs] | u

Exceptions:

None.

Assembly:

PAUSE

nanoMIPS

Pause

Purpose:

Pause. Pause until LLBit is cleared.

Availability:

nanoMIPS

Format:

100000

00000

x

1100

x

0000

00101

6

5

5

4

3

4

5

Operation:

if C0.LLAddr.LLB:
    CPU.in_pause_state = True

The purpose of the PAUSE instruction is to halt a thread (rather than entering a spin loop) when it is waiting to acquire an LL/SC lock. This is particularly useful on multi-threaded processors, since the waiting thread may be using the same instruction pipeline as the thread which currently owns the lock, and hence entering a spin loop will delay the other thread from completing its task and freeing the lock.

When a thread is in the paused state, it should not issue any instructions. The paused state will be cleared either if the LLBit for the thread gets cleared, or if the thread takes an interrupt. If an interrupt occurs, it is implementation dependent whether C0.EPC points to the PAUSE instruction or the instruction after the PAUSE.

In LL/SC lock software, the LLBit of the waiting thread will always be cleared when the thread which owns the lock does a store instruction to the lock address in order to clear the lock. Thus the paused thread will always be woken when it has another opportunity to acquire the lock. After the PAUSE instruction completes, software is expected to attempt to acquire the lock again by re-executing the LL/SC sequence.

It is legal to implement PAUSE as a NOP instruction. In this case, the behavior of LL/SC lock software will be equivalent to executing a spin loop to acquire the lock. Software using PAUSE will still work, but the benefit of having the waiting thread not consume instruction issue slots will be lost.

PAUSE is encoded as an SLL instruction with a shift value of 5, targeting GPR $0. Hence PAUSE will behave as a NOP instruction if no additional behavior beyond that of SLL is implemented.

The following assembly code example shows how the PAUSE instruction can be used to halt a thread while it is waiting to acquire an LL/SC lock.

acquire_lock:
        ll      t0, 0(a0)    /* Read software lock, set LLBit. */
        bnezc   t0, acquire_lock_retry /* Branch if software lock is taken. */
        addiu   t0, t0, 1    /* Set the software lock. */
        sc      t0, 0(a0)    /* Try to store the software lock. */
        bnezc   t0, 10f      /* Branch if lock acquired successfully. */
        sync
acquire_lock_retry:
        pause                /* Wait for LLBit to clear before retrying. */
        bc      acquire_lock /* Now retry the operation. */
10:
        /* Critical Region Code */
        ...
release_lock:
        sync
        sw      zero, 0(a0)  /* Release software lock, clearing LLBit
                                for any PAUSEd waiters */

Exceptions:

None.

Assembly:

PREF hint, offset(rs)

nanoMIPS, availability varies by format.

Prefetch/Prefetch using EVA addressing

PREFE hint, offset(rs)

nanoMIPS, availability varies by format.

Prefetch/Prefetch using EVA addressing

Purpose:

Prefetch/Prefetch using EVA addressing. Perform a prefetch operation of type hint at address $rs + offset (register plus immediate). For PREFE, translate the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS, availability varies by format.

Format:

PREF[S9]

101001

hint!=31

hint

rs

s[8]

0011

0

00

s[7:0]

6

5

5

1

4

1

2

8

offset = sign_extend(s, from_nbits=9)
is_eva = False

PREF[U12]

100001

hint!=31

hint

rs

0011

u

6

5

5

4

12

offset = u
is_eva = False

PREFE, present when Config5.EVA=1, requires CP0 privilege.


with hint!=31

101001

hint

rs

s[8]

0011

0

10

s[7:0]

6

5

5

1

4

1

2

8

offset = sign_extend(s, from_nbits=9)
is_eva = True

Operation:

if is_eva and not C0.Config5.EVA:
    raise exception('RI')
if is_eva and not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Load', eva=is_eva)
# Perform implementation dependent prefetch actions
pref(va, hint, eva=is_eva)

The PREF and PREFE instructions request that the processor take some action to improve program performance in accordance with the intended data usage specified by the hint argument. This is typically done by moving data to or from the cache at the specified address. The meanings of hint are as follows:

load

Use: Prefetched data is expected to be read (not modified).

Action: Fetch data as if for a load.

store

Use: Prefetched data is expected to be stored or modified.

Action: Fetch data as if for a store.

Mark the line as LRU in the L1 cache and thus preferred for next eviction. Implementations can choose to writeback and/or invalidate the line as long as no architectural state is modified.

load_streamed

Use: Prefetched data is expected to be read (not modified) but not reused extensively; it "streams" through the cache.

Action: Fetch data as if for a load and place it in the cache so that it does not displace data prefetched as "retained".

store_streamed

Use: Prefetched data is expected to be stored or modified but not reused extensively; it "streams" through the cache.

Action: Fetch data as if for a store and place it in the cache so that it does not displace data prefetched as "retained".

load_retained

Use: Prefetched data is expected to be read (not modified) and reused extensively; it should be "retained" in the cache.

Action: Fetch data as if for a load and place it in the cache so that it is not displaced by data prefetched as "streamed".

store_retained

Use: Prefetched data is expected to be stored or modified and reused extensively; it should be "retained" in the cache.

Action: Fetch data as if for a store and place it in the cache so that it is not displaced by data prefetched as "streamed".

In the Release 6 architecture, hint codes 8..15 are treated the same as hint codes 0..7 respectively, but operate on the L2 cache.

In the Release 6 architecture, hint codes 16..23 are treated the same as hint codes 0..7 respectively, but operate on the L3 cache.

These hint codes are reserved in nanoMIPS and should act as a NOP. (This is not the same as the MIPS R6 behavior, where these hints give a Reserved Instruction exception.) Note that hint=31 is not listed as that encoding is decoded as a SYNCI instruction.

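In Release 6 terms, the relationship between a hint code, its base action, and the targeted cache level can be sketched as follows (an illustrative model, not part of the specification; recall that in nanoMIPS hints 8..23 simply act as NOPs):

```python
def split_hint(hint: int):
    """Map a PREF hint code to (base action, cache level) per the
    Release 6 convention: 0..7 -> L1, 8..15 -> L2, 16..23 -> L3."""
    assert 0 <= hint < 24
    return hint % 8, 1 + hint // 8

# hint 12 maps to base action 4 applied to the L2 cache.
assert split_hint(12) == (4, 2)
```
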
The action taken for a specific PREF instruction is both system and context dependent. Any action, including doing nothing, is permitted as long as it does not change architecturally visible state or alter

the meaning of a program.

PREF does not cause addressing-related exceptions, including TLB exceptions. If the address specified would cause an addressing exception, the exception condition is ignored and no data movement occurs.

For cached addresses, the expected and useful action is for the processor to prefetch a block of data that includes the effective address. The size of the block and the level of the memory hierarchy it is

fetched into are implementation specific.

PREF neither generates a memory operation nor modifies the state of a cache line for addresses with an uncached CCA.

Prefetch operations have no effect on cache lines that were previously locked with the CACHE instruction.

In coherent multiprocessor implementations, if the effective address uses a coherent CCA, then the instruction causes a coherent memory transaction to occur. This means a prefetch issued on one processor can cause data to be evicted from the cache in another processor.

The memory transactions which occur as a result of a PREF instruction, such as cache refill or cache writeback, obey the same ordering and completion rules as other load or store instructions.

It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREF instruction. Implementations are encouraged to report such errors only if there is a specific requirement for high reliability. Note that suppressing a bus or cache error in this case may require that the processor communicate to the system that the reference is speculative.

Hint field encodings whose function is described as "streamed" or "retained" convey usage intent from software to hardware. Software should not assume that hardware will always prefetch data in an optimal way. If data is to be truly retained, software should use the CACHE instruction to lock data into the cache.

It is implementation dependent whether a data watch or EJTAG breakpoint exception is triggered by a prefetch instruction whose address matches the Watch register address match or EJTAG data breakpoint conditions. The preferred implementation is not to match on the prefetch instruction.

Exceptions:

Bus Error. Cache Error. Coprocessor Unusable for PREFE. Reserved Instruction for PREFE if EVA not implemented.

Assembly:

RDHWR rt, hs, sel

nanoMIPS, not available in NMS

Read Hardware Register

Purpose:

Read Hardware Register. Read specific CP0 privileged state (identified by hs, sel) to register $rt. Kernel code can enable or disable user mode RDHWR accesses by programming the enable bits in the HWREna register.

Availability:

nanoMIPS, not available in NMS

Format:

001000

rt

hs

sel

x

0111000

000

6

5

5

5

1

7

3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    if not C0.HWREna & (1 << hs):
        raise exception('RI', 'Required HWREna bit not set')
if sel and hs != 4:
    raise exception('RI', 'sel field not supported for this hs')
if is_guest_mode():
    check_gpsi('CP0')
if hs == 0:
    GPR[rt] = C0.EBase.CPUNum
elif hs == 1:
    GPR[rt] = synci_step()
elif hs == 2:
    if is_guest_mode():
        check_gpsi('GT')
        GPR[rt] = guest_count()
    else:
        GPR[rt] = C0.Count
elif hs == 3:
    GPR[rt] = CPU.count_resolution
elif hs == 4:
    if not C0.Config1.PC:
        raise exception('RI', 'Perf Counters not implemented')
    GPR[rt] = read_cp0_register(25, sel)  # Performance counter register
elif hs == 5:
    GPR[rt] = C0.Config5.XNP
elif hs == 29:
    if not C0.Config3.ULRI:
        raise exception('RI')
    GPR[rt] = sign_extend(C0.UserLocal)
else:
    raise exception('RI')

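The HWREna gating in the pseudocode above can be sketched in isolation (an illustrative helper, not from the specification):

```python
def user_rdhwr_allowed(hwrena: int, hs: int) -> bool:
    """User-mode RDHWR of hardware register hs is permitted only
    when the corresponding HWREna enable bit is set."""
    return bool(hwrena & (1 << hs))

# With only bit 2 enabled, hs=2 is readable from user mode, hs=0 is not.
assert user_rdhwr_allowed(0b100, 2) is True
assert user_rdhwr_allowed(0b100, 0) is False
```
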
Exceptions:

Coprocessor Unusable. Reserved Instruction for unsupported register numbers. Reserved Instruction on NMS cores.

Assembly:

RDPGPR rt, rs

nanoMIPS. Requires CP0 privilege.

Read Previous GPR

Purpose:

Read Previous GPR. Write the value of register $rs from the previous shadow register set (SRSCtl.PSS) to register $rt in the current shadow register set (SRSCtl.CSS). If shadow register sets are not implemented, just copy the value from register $rs to register $rt.

Availability:

nanoMIPS. Requires CP0 privilege.

Format:

001000

rt

rs

11

10000

101

111

111

6

5

5

2

5

3

3

3

Operation:

if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
if C0.SRSCtl.HSS > 0:
    GPR[rt] = SRS[C0.SRSCtl.PSS][rs]
else:
    GPR[rt] = GPR[rs]

Exceptions:

Coprocessor Unusable.

Assembly:

RESTORE     u[, dst1 [, dst2 [, ...]]] # jr=0 implied

nanoMIPS, availability varies by format.

Restore callee saved registers/Restore callee saved registers and Jump to Return address, Compact

RESTORE.JRC u[, dst1 [, dst2 [, ...]]]  # jr=1 implied

nanoMIPS, availability varies by format.

Restore callee saved registers/Restore callee saved registers and Jump to Return address, Compact

Purpose:

Restore callee saved registers/Restore callee saved registers and Jump to Return address, Compact. Restore registers dst1[, dst2, ...] from addresses at the top of the local stack frame ($29 + u - 4, $29 + u - 8, ...), then point register $29 back to the caller's stack frame by adding offset u. For RESTORE.JRC, return from the current subroutine by jumping to the address in $31.

Availability:

nanoMIPS, availability varies by format.

Format:

RESTORE[32]

100000

rt

0

count

0011

u[11:3]

gp

10

6

5

1

4

4

9

1

2

jr = 0

RESTORE.JRC[16]

000111

rt1

1

u[7:4]

count

6

1

1

4

4

rt = 30 if rt1 == 0 else 31
gp = 0
jr = 1

RESTORE.JRC[32], gp case not available in NMS

100000

rt

0

count

0011

u[11:3]

gp

11

6

5

1

4

4

9

1

2

jr = 1

Operation:

if gp and C0.Config5.NMS:
    raise exception('RI')
i = 0
while i != count:
    this_rt = ( 28          if gp and (i + 1 == count) else
                rt + i      if rt + i < 32             else
                rt + i - 16                                 )
    this_offset = u - ( (i+1) << 2 )
    va = effective_address(GPR[29], this_offset, 'Load')
    if va & 3:
        raise exception('ADEL', badva=va)
    data = read_memory_at_va(va, nbytes=4)
    GPR[this_rt] = sign_extend(data, from_nbits=32)
    if this_rt == 29:
        raise UNPREDICTABLE()
    i += 1
GPR[29] = effective_address(GPR[29], u)
if jr:
    CPU.next_pc = GPR[31]

The purpose of the RESTORE and RESTORE.JRC instructions is to restore callee saved registers from the stack on exit from a subroutine, adjust the stack pointer register $29 to point to the caller’s stack

frame, and for RESTORE.JRC to return from the subroutine by jumping to the address in register $31. RESTORE/RESTORE.JRC will usually be paired with a matching SAVE instruction at the start of the

subroutine, and SAVE and RESTORE take the same arguments.

The arguments for RESTORE/RESTORE.JRC consist of the amount to increment the stack pointer by, and a list of registers to restore from the stack. The increment is a double word aligned immediate value u in the range 0 to 4092. The register list can contain up to 16 consecutive registers. The count of the number of registers is encoded in the instruction's count field. The first register in the list is encoded in the rt field of the instruction.

The register list is allowed to wrap around from register $31 back to register $16 and still be considered consecutive; this allows fp ($30) and ra ($31) and the saved temporary registers s0-s7 ($16 - $23) to

be restored in one instruction.

Additionally, $28 (the global pointer register) will be used in place of the last register in the sequence if the 'gp' bit in the instruction encoding is set. This feature (which is not available on NMS cores) makes it possible to treat $28 as a callee saved register for environments such as Linux which require it.

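The register-list decoding described above, including the wrap-around from $31 back to $16 and the gp substitution, can be sketched as follows (an illustrative helper mirroring the this_rt computation in the Operation pseudocode):

```python
def restore_reg_list(rt: int, count: int, gp: bool):
    """Expand (rt, count, gp) into the list of GPR numbers restored."""
    regs = []
    for i in range(count):
        if gp and i + 1 == count:
            regs.append(28)          # gp replaces the last register
        elif rt + i < 32:
            regs.append(rt + i)
        else:
            regs.append(rt + i - 16) # wrap around from $31 to $16
    return regs

# fp/ra followed by a wrap to s0, s1:
assert restore_reg_list(30, 4, gp=False) == [30, 31, 16, 17]
# With gp set, $28 replaces the final register in the sequence:
assert restore_reg_list(30, 4, gp=True) == [30, 31, 16, 28]
```
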
The restored registers are read from memory addresses $29 + u - 4, $29 + u - 8, $29 + u - 12, ... etc, i.e. at the top of the local stack frame. The stack pointer is then adjusted by adding the size u of the local stack frame, so that it points back to the caller's stack frame.

RESTORE.JRC with count=0 adjusts the stack pointer and jumps to the address in $31, but does not restore any registers from memory. Thus the RESTORE.JRC[16] instruction format can be used to provide ADDIU $29, $29, u; JRC $31 behavior using a single 16-bit instruction.

The result of a RESTORE instruction is UNPREDICTABLE if the register list includes register $29.

RESTORE/RESTORE.JRC must be implemented in such a way as to make the instructions restartable, but the implementation does not need to be fully atomic. For instance, it is allowable for a RESTORE/RESTORE.JRC instruction to be aborted by an exception after a subset of the register updates have occurred. To ensure restartability, the write to GPR $29 and the jump (for RESTORE.JRC) must be completed atomically, that is, the instruction must graduate if and only if those writes occur.

Exceptions:

Address Error. Bus Error. Reserved Instruction for gp=1 cases on NMS cores. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

ROTR rt, rs, shift

nanoMIPS

Rotate Right

Purpose:

Rotate Right. Rotate the word value in register $rs by shift value shift, and place the result in register $rt.

Availability:

nanoMIPS

Format:

100000

rt

rs

1100

x

0110

shift

6

5

5

4

3

4

5

Operation:

tmp = GPR[rs][31:0] @ GPR[rs][31:0]
result = tmp >> shift
GPR[rt] = sign_extend(result, from_nbits=32)

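The doubled-word trick in the pseudocode translates directly into Python (a sketch for 32-bit values, not part of the specification):

```python
def rotr(word: int, shift: int) -> int:
    """Rotate a 32-bit word right by 'shift' by concatenating the word
    with itself and shifting the 64-bit result, as in the pseudocode."""
    assert 0 <= word < (1 << 32) and 0 <= shift < 32
    tmp = (word << 32) | word          # word @ word
    return (tmp >> shift) & 0xFFFFFFFF

# The bit rotated out of the LSB reappears at bit 31:
assert rotr(0x80000001, 1) == 0xC0000000
```
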
Exceptions:

None.

Assembly:

ROTRV rd, rs, rt

nanoMIPS

Rotate Right Variable

Purpose:

Rotate Right Variable. Rotate the word value in register $rs by the shift value contained in register $rt, and place the result in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

0011010

000

6

5

5

5

1

7

3

Operation:

shift = GPR[rt] & 0x1f
tmp = GPR[rs][31:0] @ GPR[rs][31:0]
result = tmp >> shift
GPR[rd] = sign_extend(result, from_nbits=32)

Exceptions:

None.

Assembly:

ROTX rt, rs, shift, shiftx, stripe

nanoMIPS, not available in NMS

Rotate and eXchange

Purpose:

Rotate and eXchange. Rotate and exchange bits in the word value in register $rs and place the result in register $rt. Specific choices of the shift, shiftx and stripe arguments allow this instruction to perform bit and byte reordering operations including BYTEREVW, BYTEREVH, BITREVW, BITREVH and BITREVB.

Availability:

nanoMIPS, not available in NMS

Format:

100000

rt

rs

1101

0

shiftx[4:1]

stripe

0

shift

6

5

5

4

1

4

1

1

5

Operation:

if C0.Config5.NMS:
    raise exception('RI')
tmp0 = GPR[rs][31:0] @ GPR[rs][31:0]
tmp1 = tmp0
for i in range(47):  # 0..46
    s = shift if (i & 0b01000) else shiftx
    if stripe and not (i & 0b00100): s = ~s
    if s[4]: tmp1[i] = tmp0[i+16]
tmp2 = tmp1
for i in range(39):  # 0..38
    s = shift if (i & 0b00100) else shiftx
    if s[3]: tmp2[i] = tmp1[i+8]
tmp3 = tmp2
for i in range(35):  # 0..34
    s = shift if (i & 0b00010) else shiftx
    if s[2]: tmp3[i] = tmp2[i+4]
tmp4 = tmp3
for i in range(33):  # 0..32
    s = shift if (i & 0b00001) else shiftx
    if s[1]: tmp4[i] = tmp3[i+2]
tmp5 = tmp4
for i in range(32):  # 0..31
    s = shift
    if s[0]: tmp5[i] = tmp4[i+1]
GPR[rt] = sign_extend(tmp5, from_nbits=32)

The ROTX instruction can be used to reverse elements of a selected size within blocks of a different selected size. Some example use cases are shown in the table below. The 'Result' column shows the output value assuming an input value of abcdefgh ijklmnop qrstuvwx yz012345, where each character represents the value of a single bit.

Alias      Operation                  Assembly                 Result (from abcdefgh ijklmnop qrstuvwx yz012345)

BITREVW    Reverse all bits           ROTX rt, rs, 31, 0       543210zy xwvutsrq ponmlkji hgfedcba
BITREVH    Reverse bits in halves     ROTX rt, rs, 15, 16      ponmlkji hgfedcba 543210zy xwvutsrq
BITREVB    Reverse bits in bytes      ROTX rt, rs, 7, 8, 1     hgfedcba ponmlkji xwvutsrq 543210zy
BYTEREVW   Reverse all bytes          ROTX rt, rs, 24, 8       yz012345 qrstuvwx ijklmnop abcdefgh
BYTEREVH   Reverse bytes in halves    ROTX rt, rs, 8, 24       ijklmnop abcdefgh yz012345 qrstuvwx
           Reverse all nibbles        ROTX rt, rs, 28, 4       2345yz01 uvwxqrst mnopijkl efghabcd
           Reverse nibbles in halves  ROTX rt, rs, 12, 20      mnopijkl efghabcd 2345yz01 uvwxqrst
           Reverse nibbles in bytes   ROTX rt, rs, 4, 12, 1    efghabcd mnopijkl uvwxqrst 2345yz01
           Reverse all bit pairs      ROTX rt, rs, 30, 2       452301yz wxuvstqr opmnklij ghefcdab
           Reverse pairs in halves    ROTX rt, rs, 14, 18      opmnklij ghefcdab 452301yz wxuvstqr
           Reverse pairs in bytes     ROTX rt, rs, 6, 10, 1    ghefcdab opmnklij wxuvstqr 452301yz

Assembler aliases are provided for certain cases, as indicated in the table.

The MIPS32™ instructions BITSWAP and WSBH are equivalent to BITREVB and BYTEREVH respectively, and are also provided as assembler aliases to ROTX.

The ROTX instruction is designed to be implementable with minimal overhead using existing logic for the ROTR instruction. ROTR can be implemented using a barrel shifter, where the select signals for the multiplexers at each stage are the bits of the 'shift' argument. For ROTX, the mux select signals depend on the bit position as well as the stage of the shifter, and are a function of the 'shift', 'shiftx' and 'stripe' arguments.

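As a cross-check of the alias table, the Operation pseudocode can be transliterated into Python; this is an illustrative model only, not a normative implementation:

```python
def rotx(word, shift, shiftx, stripe=0):
    """Bit-level model of ROTX on a 32-bit word, following the staged
    shifter in the Operation pseudocode (bit 0 is the LSB)."""
    bits = [(word >> (i % 32)) & 1 for i in range(64)]  # word @ word

    def stage(cur, limit, sel_mask, s_bit, dist, first=False, last=False):
        out = cur[:]
        for i in range(limit):
            s = shift if (i & sel_mask) else shiftx
            if first and stripe and not (i & 0b00100):
                s = (~s) & 0x1F      # stripe inverts the selector
            if last:
                s = shift            # final stage always uses 'shift'
            if (s >> s_bit) & 1:
                out[i] = cur[i + dist]
        return out

    bits = stage(bits, 47, 0b01000, 4, 16, first=True)
    bits = stage(bits, 39, 0b00100, 3, 8)
    bits = stage(bits, 35, 0b00010, 2, 4)
    bits = stage(bits, 33, 0b00001, 1, 2)
    bits = stage(bits, 32, 0b00000, 0, 1, last=True)
    return sum(b << i for i, b in enumerate(bits[:32]))

# BYTEREVW and BYTEREVH from the alias table:
assert rotx(0x12345678, 24, 8) == 0x78563412
assert rotx(0x12345678, 8, 24) == 0x34127856
```
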
Exceptions:

Reserved Instruction on NMS cores.

Assembly:

SAVE u[, src1 [, src2 [, ...]]]

nanoMIPS, availability varies by format.

Save callee saved registers

Purpose:

Save callee saved registers. Save registers src1[, src2, ...] to addresses just below the current stack pointer ($29) address and adjust the stack pointer by subtracting offset u to accommodate the saved registers and the local stack frame.

Availability:

nanoMIPS, availability varies by format.

Format:

SAVE[16]

000111

rt1

0

u[7:4]

count

6

1

1

4

4

rt = 30 if rt1 == 0 else 31
gp = 0

SAVE[32], gp case not available in NMS

100000

rt

0

count

0011

u[11:3]

gp

00

6

5

1

4

4

9

1

2

Operation:

if gp and C0.Config5.NMS:
    raise exception('RI')
i = 0
while i != count:
    this_rt = ( 28          if gp and (i + 1 == count) else
                rt + i      if rt + i < 32             else
                rt + i - 16                                 )
    this_offset = - ( (i+1) << 2 )
    va = effective_address(GPR[29], this_offset, 'Load')
    if va & 3:
        raise exception('ADES', badva=va)
    data = zero_extend(GPR[this_rt], from_nbits=32)
    write_memory_at_va(data, va, nbytes=4)
    i += 1
GPR[29] = effective_address(GPR[29], -u)

The purpose of the SAVE instruction is to save callee saved registers to the stack on entry to a subroutine, and adjust the stack pointer register ($29) to accommodate the saved registers and the subroutine’s local stack frame.

The instruction specification consists of the amount to decrement the stack by, and a list of registers to save to the stack. The stack decrement is a double word aligned immediate value u in the range

0 to 4092. The register list can contain up to 16 consecutive registers. The count of the number of registers in the register list is encoded in the instruction’s count field. The first register in the list is

encoded in the rt field of the instruction.

The register list is allowed to wrap around from register $31 back to register $16 and still be considered consecutive; this allows fp ($30) and ra ($31) and the saved temporary registers s0-s7 ($16 - $23) to

be saved in one instruction.

Additionally, $28 (the global pointer register) will be used in place of the last register in the sequence if the 'gp' bit in the instruction encoding is set. This feature (which is not available on NMS cores) makes it possible to treat $28 as a callee saved register for environments such as Linux which require it.

The saved registers are written to memory addresses $29 - 4, $29 - 8, $29 - 12, ... etc, i.e. just below the current stack pointer address. The stack pointer is then adjusted by subtracting offset u, which should be chosen to accommodate the saved registers and the current subroutine's local stack frame, while maintaining the required stack pointer alignment.

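The store addresses and resulting stack pointer described above can be sketched as follows (an illustrative helper, not from the specification):

```python
def save_plan(sp: int, u: int, count: int):
    """Return the word addresses written by SAVE and the resulting
    stack pointer: stores go just below $29, then $29 -= u."""
    stores = [sp - 4 * (i + 1) for i in range(count)]
    return stores, sp - u

# Saving 3 registers with a 32-byte total frame adjustment:
assert save_plan(0x1000, 32, 3) == ([0xFFC, 0xFF8, 0xFF4], 0xFE0)
```
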
SAVE with count=0 adjusts the stack pointer but does not save any registers to memory. Thus the SAVE[16] instruction format can be used to provide ADDIU $29, $29, -u behavior using a single 16-bit instruction.

SAVE must be implemented in such a way as to make the instruction restartable, but the implementation does not need to be fully atomic. For instance, it is allowable for a SAVE instruction to be aborted by an exception after a subset of the memory updates have occurred. To ensure restartability, the write to GPR $29 must be completed atomically, that is, the instruction must graduate if and only if that write occurs.

Exceptions:

Address Error. Bus Error. Reserved Instruction for gp=1 cases on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SB rt, offset(rs)

nanoMIPS

Store Byte

Purpose:

Store Byte. Store byte from register $rt to memory address $rs + offset (register plus immediate).

Availability:

nanoMIPS

Format:

SB[U12]

100001

rt

rs

0001

u

6

5

5

4

12

offset = u

SB[16]

010111

rtz3

rs3

01

u

6

3

3

2

2

rt = decode_gpr(rtz3, 'gpr3.src.store')
rs = decode_gpr(rs3, 'gpr3')
offset = u

SB[GP]

010001

rt

001

u

6

5

3

18

rs = 28
offset = u

SB[S9]

101001

rt

rs

s[8]

0001

0

00

s[7:0]

6

5

5

1

4

1

2

8

offset = sign_extend(s, from_nbits=9)

Operation:

va = effective_address(GPR[rs], offset, 'Store')
data = zero_extend(GPR[rt], from_nbits=8)
write_memory_at_va(data, va, nbytes=1)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SBE rt, offset(rs)

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Store Byte using EVA addressing

Purpose:

Store Byte using EVA addressing. Store byte from register $rt to virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Format:

101001

rt

rs

s[8]

0001

0

10

s[7:0]

6

5

5

1

4

1

2

8

Operation:

offset = sign_extend(s, from_nbits=9)
if not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Store', eva=True)
data = zero_extend(GPR[rt], from_nbits=8)
write_memory_at_va(data, va, nbytes=1, eva=True)

Exceptions:

Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SBX rd, rs(rt)

nanoMIPS, not available in NMS

Store Byte indeXed

Purpose:

Store Byte indeXed. Store byte from register $rd to memory address $rs + $rt (register plus register).

Availability:

nanoMIPS, not available in NMS

Format:

001000

rt

rs

rd

0001

0

000

111

6

5

5

5

4

1

3

3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
va = effective_address(GPR[rs], GPR[rt], 'Store')
data = zero_extend(GPR[rd], from_nbits=8)
write_memory_at_va(data, va, nbytes=1)

Exceptions:

Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SC   rt, offset(rs)

nanoMIPS, availability varies by format.

Store Conditional word/Store Conditional word using EVA addressing/Store Conditional

SCE   rt, offset(rs)

nanoMIPS, availability varies by format.

Store Conditional word/Store Conditional word using EVA addressing/Store Conditional

SCWP  rt, ru, (rs)

nanoMIPS, availability varies by format.

Store Conditional word/Store Conditional word using EVA addressing/Store Conditional

SCWPE rt, ru, (rs)

nanoMIPS, availability varies by format.

Store Conditional word/Store Conditional word using EVA addressing/Store Conditional

Purpose:

Store Conditional word/Store Conditional word using EVA addressing/Store Conditional Word Pair/Store Conditional Word Pair using EVA addressing. Store conditionally to complete an atomic read-modify-write. For SC/SCE, store from register $rt to address $rs + offset (register plus offset). For SCWP/SCWPE, store from registers $rt and $ru to address $rs. For SCE/SCWPE, translate the virtual address as though the core is in user mode, although it is actually in kernel mode. Indicate success or failure by writing 1 or 0 respectively to $rt.

Availability:

nanoMIPS, availability varies by format.

Format:

SC

101001

rt

rs

s[8]

1011

0

01

s[7:2]

00

6

5

5

1

4

1

2

6

2

offset = sign_extend(s, from_nbits=9)
nbytes = 4
is_eva = False

SCE, present when Config5.EVA=1, requires CP0 privilege.

101001

rt

rs

s[8]

1011

0

10

s[7:2]

00

6

5

5

1

4

1

2

6

2

offset = sign_extend(s, from_nbits=9)
nbytes = 4
is_eva = True

SCWP, required (optional on NMS cores).

101001

rt

rs

x

1011

0

01

ru

x

01

6

5

5

1

4

1

2

5

1

2

offset = 0
nbytes = 8
is_eva = False

SCWPE, present when Config5.EVA=1. Requires CP0 privilege.

101001

rt

rs

x

1011

0

10

ru

x

01

6

5

5

1

4

1

2

5

1

2

offset = 0
nbytes = 8
is_eva = True

Operation:

if nbytes == 8 and C0.Config5.XNP:
    raise exception('RI', 'SCWP[E] requires word-paired support')
if is_eva and not C0.Config5.EVA:
    raise exception('RI')
va = effective_address(GPR[rs], offset, 'Store', eva=is_eva)
# Linked access must be aligned.
if va & (nbytes-1):
    raise exception('ADES', badva=va)
pa, cca = va2pa(va, 'Store', eva=is_eva)
if (cca == 2 or cca == 7) and not C0.Config5.ULS:
    raise UNPREDICTABLE('uncached CCA not synchronizable when Config5.ULS=0')
    # (Preferred behavior for a non-synchronizable address is Bus Error.)
if nbytes == 4: # SC/SCE
    data = zero_extend(GPR[rt], from_nbits=32)
else:  # SCWP/SCWPE
    word0 = GPR[rt][31:0]
    word1 = GPR[ru][31:0]
    data = word0 @ word1 if C0.Config.BE else word1 @ word0
# Write this data to memory, but only if it can be done atomically with
# respect to a prior linked load. The return value indicates whether the write
# occurred.
success = write_memory(data, va, pa, cca, nbytes=nbytes, atomic=True)
if success:
    GPR[rt] = 1
else:
    GPR[rt] = 0
C0.LLAddr.LLB = 0  # SC always clears the LLBit regardless of address match.

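The endianness-dependent word pairing for SCWP/SCWPE in the pseudocode above can be sketched as follows (illustrative only):

```python
def pack_scwp(word0: int, word1: int, big_endian: bool) -> int:
    """Form the 64-bit value stored by SCWP: word0 @ word1 when the
    core is big-endian, word1 @ word0 when little-endian."""
    return (word0 << 32) | word1 if big_endian else (word1 << 32) | word0

assert pack_scwp(0x11111111, 0x22222222, True)  == 0x1111111122222222
assert pack_scwp(0x11111111, 0x22222222, False) == 0x2222222211111111
```
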
The SC, SCE, SCWP and SCWPE instructions are used to complete the atomic read-modify-write (RMW) sequence begun by a prior matching LL/LLE/LLWP/LLWPE instruction respectively. If the system can guarantee that the write to memory can be completed prior to any other modification to the targeted data since it was read by the load-linked instruction which initiated the sequence, then the write will complete and register $rt will be set to 1, indicating success. Otherwise, the memory write will not occur, and register $rt will be set to 0, indicating failure.

If any of the following events occur between a load-linked and a store-conditional instruction, the store-conditional will fail:

but it is at least one word and at most the minimum page size. Typically, the synchronizable block size is the size of the largest cache line in use.

(Note that nanoMIPS™ also includes the ERETNC instruction, which will not cause the store-conditional instruction to fail.)

If any of the following events occur between a load-linked and a store-conditional instruction, the store-conditional may fail when it would otherwise have succeeded. Portable programs should not cause any of these events:

if a load or store is executed on a processor executing a load-linked/store-conditional sequence, and that load or store is not to the block of synchronizable physical memory containing the load-linked data. This is because the load or store may cause the load-linked data to be evicted from the cache.

if any PREF instruction is executed on a processor executing a load-linked/store-conditional sequence, due to the possibility of the PREF causing a cache eviction.

executed during a load-linked/store-conditional sequence and that store is to the block of synchronizable physical memory containing the linked data.

if the instructions executed starting with the load-linked instruction and ending with the store-conditional instruction do not lie in a 2048-byte contiguous region of virtual memory. (The region does not have to be aligned, other than the alignment required for instruction words.)

if a CACHE operation is carried out during the load-linked/store-conditional sequence, due to the possibility of modifying or evicting the line containing the linked data. In addition, non-local CACHE operations may cause a store-conditional instruction to fail on either the local processor or on the remote processor in multiprocessor or multi-threaded systems.

The store-conditional must not fail as a result of any of the following events:

a load-linked/store-conditional sequence if the load targets the block of synchronizable physical memory containing the load-linked data.

The outcome of the store-conditional is not predictable (it may succeed or fail) under any of the following conditions:

must be preceded by LLWP, and SCWPE must be preceded by LLWPE.

do not target identical virtual addresses, physical addresses and CCAs.

processor and system configurations, and on the memory access type used for the location.

is unpredictable if the memory access does not use a CCA which supports atomic RMW for the targeted address.

For uniprocessor systems, a cached noncoherent or cached coherent CCA must be used, or

additionally an uncached CCA can be used in the case that Config5.ULS=1.

For multi-processor systems or systems containing coherent I/O devices, a cached coherent CCA must be used, or additionally an uncached CCA can be used in the case that Config5.ULS=1.

When Config5.ULS=1, uncached load-linked/store-conditional operations are supported, with the following additional constraints:

memory containing the targeted data using any other CCA than that used by the load-linked and store-conditional instructions.

an address in the system which supports uncached RMW accesses. In particular, the system must implement a "monitor", which is responsible for determining whether or not the address can

be updated atomically with respect to the prior linked load. In response to a store-conditional instruction, the monitor updates memory where appropriate and communicates the result to

the processor that initiated the sequence. It is implementation dependent as to what form the monitor takes. The recommended response for load-linked/store-conditional instructions which

target a non-synchronizable uncached address is that the sub-system report a Bus Error to the processor.

to fail if the store address matches that of the sequence.

is because the event which would wake the CPU from the paused state may only be visible to the external monitor, not to the local processor.

including UCA (UnCached Accelerated). An implementation that supports UCA must guarantee that a store-conditional instruction does not participate in store gathering and that it ends any

gathering initiated by stores preceding the SC in program order when the SC address coincides with a gathering address.

The effective address of a store-conditional operation must be naturally aligned, i.e. word aligned for SC and SCE, and doubleword aligned for SCWP and SCWPE; otherwise an address exception occurs.

The following assembly code shows a possible usage of LL and SC to atomically update a memory location:

L1:
ll       t1, 0(t0)     # Load counter.
addiu    t2, t1, 1     # Increment.
sc       t2, 0(t0)     # Try to store, checking for atomicity.
beqc     t2, 0, L1     # If not atomic (0), try again.
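In higher-level terms, the LL/SC retry loop above behaves like a compare-and-swap loop: the store succeeds only if no other agent has updated the location since the linked load. The following Python model is an illustrative sketch only (the `Word` class and method names are hypothetical); it models the *effect* of the sequence, not the hardware link/monitor mechanism.

```python
import threading

class Word:
    """Toy model of a memory word updated with an LL/SC-style retry loop."""
    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def ll_sc_increment(self):
        while True:
            linked = self.value                # ll: read and link the location
            new = (linked + 1) & 0xFFFFFFFF    # addiu: increment the counter
            with self._lock:                   # sc: succeeds only if no
                if self.value == linked:       # intervening store occurred
                    self.value = new
                    return new                 # store-conditional succeeded
            # store-conditional failed (value changed): retry, like beqc ... L1

counter = Word(0)
counter.ll_sc_increment()
counter.ll_sc_increment()
assert counter.value == 2
```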

Exceptions between the load-linked and store-conditional instructions cause the store-conditional to

fail, so instructions which can cause persistent exceptions must not be used within the load-linked/store-conditional sequence. Examples of instructions which must be avoided are arithmetic operations

that trap, system calls, and floating point operations that trap or require software emulation assistance.

Load-linked and store-conditional must function correctly on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached

coherent memory access types.

Support for the paired word instructions SCWP/SCWPE is indicated by the Config5.XNP bit. Paired word support is required for nanoMIPS™ cores, except for NMS cores, where it is optional.

Exceptions:

Address Error. Bus Error. Coprocessor Unusable for SCE/SCWPE. Reserved Instruction for SCE/SCWPE if EVA not implemented. Reserved Instruction for SCWP/SCWPE if load linked pair

not implemented. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SDBBP code

nanoMIPS. Optional, present when Debug implemented.

Software Debug Breakpoint

Purpose:

Software Debug Breakpoint. Cause a Software Debug Breakpoint exception.

Availability:

nanoMIPS. Optional, present when Debug implemented.

Format:

SDBBP[32]

000000

00000

11

code

6

5

2

19

SDBBP[16]

000100

00000

11

code

6

5

2

3

Operation:

if C0.Config1.EP == 0:
    raise exception('RI', 'Debug not implemented')
if C0.Config5.SBRI and EffectiveKSU() != 0:
    raise exception('RI', 'SBRI exception')
if Root.C0.Config5.SBRI and is_guest_mode():
    raise exception('RI', 'Root SBRI exception', g=False)
debug_exception('BP')
Root.C0.Debug.DBp = 1
raise EXCEPTION()

Exceptions:

Software Debug Breakpoint. Reserved Instruction if Debug not implemented.

Assembly:

SEB rt, rs

nanoMIPS, not available in NMS

Sign Extend Byte

Purpose:

Sign Extend Byte. Take the lower byte of the value in register $rs, sign extend it, and place the result in register $rt.

Availability:

nanoMIPS, not available in NMS

Format:

001000

rt

rs

x

0000001

000

6

5

5

6

7

3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
GPR[rt] = sign_extend(GPR[rs], from_nbits=8)
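The `sign_extend` helper used throughout these operation listings can be modeled in Python as follows (a sketch; the name and signature mirror the pseudocode, not any real library):

```python
def sign_extend(value, from_nbits):
    # Interpret the low from_nbits bits of value as a two's-complement number.
    mask = (1 << from_nbits) - 1
    value &= mask
    sign_bit = 1 << (from_nbits - 1)
    return value - (1 << from_nbits) if value & sign_bit else value

# SEB-style byte extension:
assert sign_extend(0x7F, 8) == 127
assert sign_extend(0x80, 8) == -128
assert sign_extend(0xFFFF, 16) == -1
```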

Exceptions:

Reserved Instruction on NMS cores.

Assembly:

SEH rt, rs

nanoMIPS

Sign Extend Half

Purpose:

Sign Extend Half. Take the lower halfword of the value in register $rs, sign extend it, and place the result in register $rt.

Availability:

nanoMIPS

Format:

001000

rt

rs

x

0001001

000

6

5

5

6

7

3

Operation:

GPR[rt] = sign_extend(GPR[rs], from_nbits=16)

Exceptions:

None.

Assembly:

SEQI rt, rs, u

nanoMIPS

Set on Equal to Immediate

Purpose:

Set on Equal to Immediate. Set the register $rt to 1 if register $rs is equal to immediate value u, and 0 otherwise.

Availability:

nanoMIPS

Format:

100000

rt

rs

0110

u

6

5

5

4

12

Operation:

GPR[rt] = 1 if GPR[rs] == u else 0

Exceptions:

None.

Assembly:

SH rt, offset(rs)

nanoMIPS

Store Half

Purpose:

Store Half. Store halfword from register $rt to memory address $rs + offset (register plus immediate).

Availability:

nanoMIPS

Format:

SH[U12]

100001

rt

rs

0101

u

6

5

5

4

12

offset = u

SH[16]

011111

rtz3

rs3

0

u[2:1]

1

6

3

3

1

2

1

rt = decode_gpr(rtz3, 'gpr3.src.store')
rs = decode_gpr(rs3, 'gpr3')
offset = u

SH[GP]

010001

rt

101

u[17:1]

0

6

5

3

17

1

rs = 28
offset = u

SH[S9]

101001

rt

rs

s[8]

0101

0

00

s[7:0]

6

5

5

1

4

1

2

8

offset = sign_extend(s, from_nbits=9)

Operation:

va = effective_address(GPR[rs], offset, 'Store')
data = zero_extend(GPR[rt], from_nbits=16)
write_memory_at_va(data, va, nbytes=2)

Exceptions:

Address Error. Bus Error. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SHE rt, offset(rs)

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Store Half using EVA addressing

Purpose:

Store Half using EVA addressing. Store halfword from register $rt to virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Format:

101001

rt

rs

s[8]

0101

0

10

s[7:0]

6

5

5

1

4

1

2

8

Operation:

offset = sign_extend(s, from_nbits=9)
if not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Store', eva=True)
data = zero_extend(GPR[rt], from_nbits=16)
write_memory_at_va(data, va, nbytes=2, eva=True)

Exceptions:

Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SHX rd, rs(rt)

nanoMIPS, not available in NMS

Store Half indeXed

Purpose:

Store Half indeXed. Store halfword from register $rd to memory address $rt + $rs (register plus register).

Availability:

nanoMIPS, not available in NMS

Format:

001000

rt

rs

rd

0101

0

000

111

6

5

5

5

4

1

3

3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
va = effective_address(GPR[rs], GPR[rt], 'Store')
data = zero_extend(GPR[rd], from_nbits=16)
write_memory_at_va(data, va, nbytes=2)

Exceptions:

Address Error. Bus Error. Reserved Instruction on NMS Cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SHXS rd, rs(rt)

nanoMIPS, not available in NMS

Store Half indeXed Scaled

Purpose:

Store Half indeXed Scaled. Store halfword from register $rd to memory address $rt + 2*$rs (register plus scaled register).

Availability:

nanoMIPS, not available in NMS

Format:

001000

rt

rs

rd

0101

1

000

111

6

5

5

5

4

1

3

3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
va = effective_address(GPR[rs]<<1, GPR[rt], 'Store')
data = zero_extend(GPR[rd], from_nbits=16)
write_memory_at_va(data, va, nbytes=2)

Exceptions:

Address Error. Bus Error. Reserved Instruction on NMS Cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SIGRIE code

nanoMIPS

Signal Reserved Instruction Exception

Purpose:

Signal Reserved Instruction Exception.

Availability:

nanoMIPS

Format:

000000

00000

00

code

6

5

2

19

Operation:

raise exception('RI')

Exceptions:

Reserved Instruction.

Assembly:

SLL rt, rs, shift

nanoMIPS

Shift Left Logical

Purpose:

Shift Left Logical. Left shift word value in register $rs by amount shift, and place the resultin register $rt.

Availability:

nanoMIPS

Format:

SLL[32]

100000

rt

rs

1100

x

0000

shift

6

5

5

4

3

4

5

NOP[32], EHB, PAUSE, and SYNC instruction formats overlap SLL[32]. Opcodes matching those instruction formats should be processed according to the description of those instructions, not as

SLL[32].

SLL[16]

001100

rt3

rs3

0

shift3

6

3

3

1

3

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
shift = 8 if shift3 == 0 else shift3

Operation:

result = GPR[rs] << shift
GPR[rt] = sign_extend(result, from_nbits=32)
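On an implementation with registers wider than 32 bits, the pseudocode's final `sign_extend` propagates bit 31 of the shifted word into the upper register bits. A minimal Python sketch of the word-shift semantics (the function name is illustrative, not from the specification):

```python
def sll_word(rs_val, shift):
    # Shift the 32-bit word value left, discard bits above bit 31,
    # then sign-extend bit 31 (models sign_extend(result, from_nbits=32)).
    result = (rs_val << shift) & 0xFFFFFFFF
    return result - (1 << 32) if result & 0x80000000 else result

assert sll_word(1, 4) == 16
assert sll_word(0x40000000, 1) == -0x80000000   # bit 31 set: negative word
assert sll_word(0xFFFFFFFF, 0) == -1
```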

Exceptions:

None.

Assembly:

SLLV rd, rs, rt

nanoMIPS

Shift Left Logical Variable

Purpose:

Shift Left Logical Variable. Left shift word value in register $rs by shift amount in register $rt, and place the result in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

0000010

000

6

5

5

5

1

7

3

Operation:

shift = GPR[rt] & 0x1f
result = GPR[rs] << shift
GPR[rd] = sign_extend(result, from_nbits=32)

Exceptions:

None.

Assembly:

SLT rd, rs, rt

nanoMIPS

Set on Less Than

Purpose:

Set on Less Than. Set the register $rd to 1 if signed register $rs is less than signed register $rt, and 0 otherwise.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

1101010

000

6

5

5

5

1

7

3

Operation:

GPR[rd] = 1 if GPR[rs] < GPR[rt] else 0

Exceptions:

None.

Assembly:

SLTI rt, rs, u

nanoMIPS

Set on Less Than Immediate

Purpose:

Set on Less Than Immediate. Set the register $rt to 1 if the signed value in register $rs is less than immediate u, and 0 otherwise.

Availability:

nanoMIPS

Format:

100000

rt

rs

0100

u

6

5

5

4

12

Operation:

GPR[rt] = 1 if GPR[rs] < u else 0

Exceptions:

None.

Assembly:

SLTIU rt, rs, u

nanoMIPS

Set on Less Than Immediate, Unsigned

Purpose:

Set on Less Than Immediate, Unsigned. Set the register $rt to 1 if the unsigned value in register $rs is less than immediate u, and 0 otherwise.

Availability:

nanoMIPS

Format:

100000

rt

rs

0101

u

6

5

5

4

12

Operation:

GPR[rt] = 1 if unsigned(GPR[rs]) < u else 0
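The `unsigned` helper in the pseudocode reinterprets a two's-complement register value as an unsigned integer before comparison. A sketch of that reinterpretation (helper name taken from the pseudocode; the width parameter is an assumption for the 32-bit case):

```python
def unsigned(value, nbits=32):
    # Reinterpret a (possibly negative) register value as an
    # nbits-wide unsigned integer.
    return value & ((1 << nbits) - 1)

assert unsigned(-1) == 0xFFFFFFFF
assert unsigned(5) == 5
# SLTIU-style compare: -1 reinterprets as 0xFFFFFFFF, which is not < 10.
assert (1 if unsigned(-1) < 10 else 0) == 0
```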

Exceptions:

None.

Assembly:

SLTU rd, rs, rt

nanoMIPS

Set on Less Than, Unsigned

Purpose:

Set on Less Than, Unsigned. Set the register $rd to 1 if unsigned register $rs is less than unsigned register $rt, and 0 otherwise.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd!=0

rd

x

1110010

000

6

5

5

5

1

7

3

Operation:

GPR[rd] = 1 if unsigned(GPR[rs]) < unsigned(GPR[rt]) else 0

SLTU encodings with rd=0 are used for the DVP and EVP instructions. DVP and EVP are required to behave as NOPs on cores without Virtual Processor (VP) support. This means that no DVP/EVP special

casing is required in hardware for non-VP cores, since a SLTU instruction writing to $0 naturally behaves as a NOP.

Exceptions:

None.

Assembly:

SOV rd, rs, rt

nanoMIPS

Set on Overflow

Purpose:

Set on Overflow. Set the register $rd to 1 if the signed addition of registers $rs and $rt overflows 32 bits, and 0 otherwise.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

1111010

000

6

5

5

5

1

7

3

Operation:

sum = GPR[rs] + GPR[rt]
GPR[rd] = 1 if overflows(sum, nbits=32) else 0
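The `overflows` predicate used here (and in ADD/SUB) checks whether a result fits in the signed 32-bit range. A sketch mirroring the pseudocode's name and signature:

```python
def overflows(value, nbits=32):
    # True if value does not fit in an nbits-bit two's-complement range.
    lo = -(1 << (nbits - 1))
    hi = (1 << (nbits - 1)) - 1
    return not (lo <= value <= hi)

assert overflows(0x7FFFFFFF + 1)       # INT_MAX + 1 overflows
assert not overflows(-0x80000000)      # INT_MIN fits
assert not overflows(0x7FFFFFFF)       # INT_MAX fits
```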

Exceptions:

None.

Assembly:

SRA rt, rs, shift

nanoMIPS

Shift Right Arithmetic

Purpose:

Shift Right Arithmetic. Right shift word value in register $rs by amount shift, duplicating the sign bit (bit 31) in the emptied bits. Place the result in register $rt.

Availability:

nanoMIPS

Format:

100000

rt

rs

1100

x

0100

shift

6

5

5

4

3

4

5

Operation:

GPR[rt] = GPR[rs] >> shift

Exceptions:

None.

Assembly:

SRAV rd, rs, rt

nanoMIPS

Shift Right Arithmetic Variable

Purpose:

Shift Right Arithmetic Variable. Right shift word value in register $rs by shift amount in register $rt, duplicating the sign bit (bit 31) in the emptied bits. Place the result in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

0010010

000

6

5

5

5

1

7

3

Operation:

shift = GPR[rt] & 0x1f
GPR[rd] = GPR[rs] >> shift

Exceptions:

None.

Assembly:

SRL rt, rs, shift

nanoMIPS

Shift Right Logical.

Purpose:

Shift Right Logical. Right shift word value in register $rs by amount shift, filling the emptied bits with zeroes. Place the result in register $rt.

Availability:

nanoMIPS

Format:

SRL[32]

100000

rt

rs

1100

x

0010

shift

6

5

5

4

3

4

5

SRL[16]

001100

rt3

rs3

1

shift3

6

3

3

1

3

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
shift = 8 if shift3 == 0 else shift3

Operation:

result = zero_extend(GPR[rs], from_nbits=32) >> shift
GPR[rt] = sign_extend(result, from_nbits=32)

Exceptions:

None.

Assembly:

SRLV rd, rs, rt

nanoMIPS

Shift Right Logical Variable

Purpose:

Shift Right Logical Variable. Right shift word value in register $rs by shift amount in register $rt, filling the emptied bits with zeros. Place the result in register $rd.

Availability:

nanoMIPS

Format:

001000

rt

rs

rd

x

0001010

000

6

5

5

5

1

7

3

Operation:

shift = GPR[rt] & 0x1f
result = zero_extend(GPR[rs], from_nbits=32) >> shift
GPR[rd] = sign_extend(result, from_nbits=32)

Exceptions:

None.

Assembly:

SUB rd, rs, rt

nanoMIPS, not available in NMS

Subtract

Purpose:

Subtract. Subtract the 32-bit signed integer in register $rt from the 32-bit signed integer in register $rs, placing the 32-bit result in register $rd, and trapping on overflow.

Availability:

nanoMIPS, not available in NMS

Format:

001000

rt

rs

rd

x

0110010

000

6

5

5

5

1

7

3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
result = GPR[rs] - GPR[rt]
if overflows(result, nbits=32):
    raise exception('OV')
GPR[rd] = sign_extend(result, from_nbits=32)

Exceptions:

Overflow. Reserved Instruction on NMS cores.

Assembly:

SUBU rd, rs, rt

nanoMIPS

Subtract (Untrapped)

Purpose:

Subtract (Untrapped). Subtract the 32-bit integer in register $rt from the 32-bit integer in register $rs, placing the 32-bit result in register $rd, and not trapping on overflow.

Availability:

nanoMIPS

Format:

SUBU[32]

001000

rt

rs

rd

x

0111010

000

6

5

5

5

1

7

3

SUBU[16]

101100

rt3

rs3

rd3

1

6

3

3

3

1

rd = decode_gpr(rd3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
rt = decode_gpr(rt3, 'gpr3')

Operation:

result = GPR[rs] - GPR[rt]
GPR[rd] = sign_extend(result, from_nbits=32)

Exceptions:

None.

Assembly:

SW rt, offset(rs)

nanoMIPS, availability varies by format.

Store Word

Purpose:

Store Word. Store word from register $rt to memory address $rs + offset (register plus immediate).

Availability:

nanoMIPS, availability varies by format.

Format:

SW[U12]

100001

rt

rs

1001

u

6

5

5

4

12

offset = u

SW[16]

100101

rtz3

rs3

u[5:2]

6

3

3

4

rt = decode_gpr(rtz3, 'gpr3.src.store')
rs = decode_gpr(rs3, 'gpr3')
offset = u

SW[4X4], not available in NMS

111101

rtz4[3]

u[2]

rtz4[2:0]

rs4[3]

u[3]

rs4[2:0]

6

1

1

3

1

1

3

if C0.Config5.NMS == 1:
    raise exception('RI')
rt = decode_gpr(rtz4[3] @ rtz4[2:0], 'gpr4.zero')
rs = decode_gpr(rs4[3] @ rs4[2:0], 'gpr4')
offset = u

SW[GP]

010000

rt

u[20:2]

11

6

5

19

2

rs = 28
offset = u

SW[GP16]

110101

rtz3

u[8:2]

6

3

7

rt = decode_gpr(rtz3, 'gpr3.src.store')
rs = 28
offset = u

SW[S9]

101001

rt

rs

s[8]

1001

0

00

s[7:0]

6

5

5

1

4

1

2

8

offset = sign_extend(s, from_nbits=9)

SW[SP]

101101

rt

u[6:2]

6

5

5

rs = 29
offset = u

Operation:

va = effective_address(GPR[rs], offset, 'Store')
data = zero_extend(GPR[rt], from_nbits=32)
write_memory_at_va(data, va, nbytes=4)

Exceptions:

Address Error. Bus Error. Reserved Instruction for SW[4X4] format on NMS Cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SWE rt, offset(rs)

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Store Word using EVA addressing

Purpose:

Store Word using EVA addressing. Store word from register $rt to virtual address $rs + offset, translating the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS. Optional, present when Config5.EVA=1. Requires CP0 privilege.

Format:

101001

rt

rs

s[8]

1001

0

10

s[7:0]

6

5

5

1

4

1

2

8

Operation:

offset = sign_extend(s, from_nbits=9)
if not C0.Config5.EVA:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Store', eva=True)
data = zero_extend(GPR[rt], from_nbits=32)
write_memory_at_va(data, va, nbytes=4, eva=True)

Exceptions:

Address Error. Bus Error. Coprocessor Unusable. Reserved Instruction if EVA not implemented. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SWM rt, offset(rs), count

nanoMIPS, not available in NMS

Store Word Multiple.

Purpose:

Store Word Multiple. Store count words of data from registers $rt, $(rt+1), ..., $(rt+count-1) to consecutive memory addresses starting at $rs + offset (register plus immediate).

Availability:

nanoMIPS, not available in NMS

Format:

101001

rt

rs

s[8]

count3

1

1

00

s[7:0]

6

5

5

1

3

1

1

2

8

offset = sign_extend(s, from_nbits=9)
count = 8 if count3 == 0 else count3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
i = 0
while i != count:
    this_rt = ( 0           if rt == 0    else
                rt + i      if rt + i < 32 else
                rt + i - 16                    )
    this_offset = offset + (i<<2)
    va = effective_address(GPR[rs], this_offset, 'Store')
    data = zero_extend(GPR[this_rt], from_nbits=32)
    write_memory_at_va(data, va, nbytes=4)
    i += 1

SWM stores count words from sequentially numbered registers to sequential memory addresses. After storing $31, the sequence of registers continues from $16. If rt=0, then $0 is stored for all count steps

of the instruction. Some example encodings of the register list are:

rt=15, count=3: stores [$15, $16, $17]
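The register-list rule can be sketched as a small helper mirroring the pseudocode's `this_rt` computation (the function name is illustrative, not from the specification):

```python
def swm_register_list(rt, count):
    # Registers stored by SWM: sequential from rt, wrapping from $31 to $16;
    # rt == 0 stores $0 for every step.
    regs = []
    for i in range(count):
        if rt == 0:
            regs.append(0)
        elif rt + i < 32:
            regs.append(rt + i)
        else:
            regs.append(rt + i - 16)
    return regs

assert swm_register_list(15, 3) == [15, 16, 17]
assert swm_register_list(30, 4) == [30, 31, 16, 17]   # wraps $31 -> $16
assert swm_register_list(0, 2) == [0, 0]
```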

If a TLB exception or interrupt occurs during the execution of this instruction, a subset of the required memory updates may have occurred. A full restart of the instruction will be performed on return from

the exception.

Exceptions:

Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SWPC rt, address

nanoMIPS, not available in NMS

Store Word PC relative

Purpose:

Store Word PC relative. Store word from register $rt to PC relative address address.

Availability:

nanoMIPS, not available in NMS

Format:

SWPC[48]

011000

rt

01111

s[15:0]

s[31:16]

6

5

5

16

16

offset = sign_extend(s, from_nbits=32)

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
address = effective_address(CPU.next_pc, offset, 'Store')
data = zero_extend(GPR[rt], from_nbits=32)
write_memory_at_va(data, address, nbytes=4)

Exceptions:

Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SWX rd, rs(rt)

nanoMIPS, not available in NMS

Store Word indeXed

Purpose:

Store Word indeXed. Store word from register $rd to memory address $rt + $rs (register plus register).

Availability:

nanoMIPS, not available in NMS

Format:

001000

rt

rs

rd

1001

0

000

111

6

5

5

5

4

1

3

3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
va = effective_address(GPR[rs], GPR[rt], 'Store')
data = zero_extend(GPR[rd], from_nbits=32)
write_memory_at_va(data, va, nbytes=4)

Exceptions:

Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SWXS rd, rs(rt)

nanoMIPS, not available in NMS

Store Word indeXed Scaled

Purpose:

Store Word indeXed Scaled. Store word from register $rd to memory address $rt + 4*$rs (register plus scaled register).

Availability:

nanoMIPS, not available in NMS

Format:

001000

rt

rs

rd

1001

1

000

111

6

5

5

5

4

1

3

3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
va = effective_address(GPR[rs]<<2, GPR[rt], 'Store')
data = zero_extend(GPR[rd], from_nbits=32)
write_memory_at_va(data, va, nbytes=4)

Exceptions:

Address Error. Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

SYNC stype

nanoMIPS

Sync

Purpose:

Sync.

Impose ordering constraints of type stype on prior and subsequent memory operations.

Availability:

nanoMIPS

Format:

100000

00000

stype

1100

x

0000

00110

6

5

5

4

3

4

5

Operation:

sync_memory_access(stype)

The SYNC instruction is used to order loads and stores for shared memory, and also to order operations with respect to the global invalidate instructions GINVI and GINVT. The following types of ordering

guarantees are available with different stypes.

instructions before the SYNC are completed and globally performed before any of the specified memory instructions after the SYNC are performed to any extent. Loads are completed when

the destination register is written. Stores are completed when the stored value is visible to every other processor in the system.

memory instructions before the SYNC are ordered before any of the specified memory instructions after the SYNC. The ordering SYNC is considered complete when the memory instructions

before and after the SYNC are guaranteed thereafter to retain their order relative to the SYNC, i.e. when it is guaranteed that all specified memory instructions before the SYNC will be globally performed before any of the specified memory accesses after the SYNC are performed to

any extent. It is helpful to think of a global ordering point in a coherence domain: a point such that once an instruction reaches it, the instruction is guaranteed to retain its order relative to any

memory instruction that reaches the point after it. The ordering SYNC thus cannot complete before all older specified memory instructions reach the global ordering point.

The following table shows the behavior of the SYNC instruction for each stype value. Operation types listed in the 'What reaches before' column are subject to a pre-SYNC ordering barrier: such operations,

when younger, must reach the global ordering point before the SYNC instruction completes. Operation types listed in the 'What reaches after' column are subject to a post-SYNC ordering barrier: such

operations, when older, must reach the global ordering point only after the SYNC instruction completes. Operation types listed in the 'What completes before' column are subject to a completion barrier, that

is, they must be globally performed when the SYNC instruction completes.

stype      Name           What reaches    What reaches    What completes  Availability
                          before          after           before
0x0        SYNC           Loads, Stores   Loads, Stores   Loads, Stores   Required.
0x1-0x3                                                                   Impl./vendor specific.
0x4        SYNC_WMB       Stores          Stores                          Optional.
0x5-0xF                                                                   Impl./vendor specific.
0x10       SYNC_MB        Loads, Stores   Loads, Stores                   Optional.
0x11       SYNC_ACQUIRE   Loads           Loads, Stores                   Optional.
0x12       SYNC_RELEASE   Loads, Stores   Stores                          Optional.
0x13       SYNC_RMB       Loads           Loads                           Optional.
0x14       SYNC_GINV      Loads, Stores   Loads, Stores   GINVI, GINVT,   Config5.GI=2,3.
                                                          SYNCI
0x15-0x1F                                                                 Reserved for Architecture.

SYNC barriers affect only uncached and cached coherent loads and stores and do not affect the order in which instruction fetches are performed. For the purposes of this description, the CACHE, PREF

and SYNCI instructions are treated as loads and stores. In addition, the optional Global Invalidate instructions are synchronizable through SYNC (stype=0x14).

The effect of SYNC on the global order of loads and stores for memory access types other than uncached and cached coherent is UNPREDICTABLE.

A completion barrier may have an adverse impact on performance compared to an ordering barrier due to the constraint of completion. An implementation may optimize the ordering of memory instructions

such that the ordering barrier completes before a completion barrier under the same circumstance. The magnitude of the impact is implementation-dependent but an implementation must ensure that an

ordering barrier is not worse performing than the equivalent completion barrier. Software thus needs to use completion and ordering barriers for the appropriate conditions.

An stype of 0 is used to define the SYNC instruction with completion barrier semantics. Non-zero values of stype may be defined by the architecture or specific implementations to perform synchronization

behaviors that are less complete than that of stype=0. If an implementation does not use one of these non-zero values to define a different synchronization behavior, then that non-zero value of stype must

map to a completion barrier. This allows software written for an implementation with a lighter-weight barrier to work on another implementation which only implements the stype=0 completion barrier.

The Acquire and Release barrier types are used to minimize the memory ordering that must be maintained and still have software synchronization work.

A completion barrier is required, potentially in conjunction with an EHB instruction, to guarantee that memory reference results are visible across operating mode changes. For example, a completion

barrier is required on some implementations on entry to and exit from Debug Mode to guarantee that memory effects are handled correctly.

If Global Invalidate instructions are supported, then SYNC (stype=0x14) acts as a completion barrier with respect to any preceding GINVI or GINVT instructions. This SYNC instruction is globalized and

only completes if all preceding GINVI or GINVT operations related to the same program have completed in the system. (Any references to GINVT also imply GINVGT, available in a virtualized MIPS system.)

A system that implements the Global Invalidates also requires that the completion of SYNC (stype=0x14) be constrained by legacy SYNCI operations. Thus SYNC (stype=0x14) can also be

used to enforce synchronization of SYNCI instructions. In the typical use cases, a single GINVI is used by itself to invalidate caches and would be followed by a SYNC (stype=0x14). In the case of GINVT,

multiple GINVT could be used to invalidate multiple TLB mappings, and the SYNC (stype=0x14) would be used to guarantee completion of any number of GINVTs preceding it.

Terms

:

Synchronizable: A load or store instruction is synchronizable if the load or store occurs to a physical location in shared memory using a virtual address with a memory access type of either uncached or

cached coherent.

Shared memory: Memory that can be accessed by more than one processor or by a coherent I/O system module.

Performed load: A load instruction is performed when the value returned by the load has been determined. The result of a load on processor A has been determined with respect to processor or coherent

I/O module B when a subsequent store to the location by B cannot affect the value returned by the load. The store by B must use the same memory access type as the load.

Performed store: A store instruction is performed when the store is observable. A store on processor A is observable with respect to processor or coherent I/O module B when a subsequent load of the

location by B returns the value written by the store. The load by B must use the same memory access type as the store.

Globally performed load: A load instruction is globally performed when it is performed with respect to all processors and coherent I/O modules capable of storing to the location.

Globally performed store: A store instruction is globally performed when it is globally observable. It is globally observable when it is observable by all processors and I/O modules capable of loading from

the location.

Global ordering point: A point in the coherence domain such that once a memory instruction reaches it, the instruction is guaranteed to retain its order relative to any memory instruction that reaches the point after

it.

Coherent I/O module: A coherent I/O module is an Input/Output system component that performs coherent Direct Memory Access (DMA). It reads and writes memory independently as though it were

a processor doing loads and stores to locations with a memory access type of cached coherent.

Programming Notes

:

A processor executing load and store instructions observes the order in which loads and stores using the same memory access type occur in the instruction stream; this is known as program order.

A parallel program has multiple instruction streams that can execute simultaneously on different processors.

In multiprocessor (MP) systems, the order in which the effects of loads and stores are observed by other processors - the global order of the loads and stores - determines the actions necessary

to reliably share data in parallel programs.

When all processors observe the effects of loads and stores in program order, the system is strongly ordered. On such systems, parallel programs can reliably share data without explicitly using a SYNC.

Executing SYNC on such a system is not necessary, will not cause an error, but may reduce overall performance.

If a multiprocessor system is not strongly ordered, the effects of load and store instructions executed by one processor may be observed out of program order by other processors. On such systems, parallel

programs must use SYNC to reliably share data at critical points in the program. SYNC separates the loads and stores executed on the processor into two groups, and the effect of all loads and stores in

one group is seen by all processors before the effect of any load or store in the subsequent group. In effect, SYNC causes the system to be strongly ordered for the executing processor at the instant that

the SYNC is executed.

The hardware ordering support provided in a MIPS-based multiprocessor system is implementation dependent. A parallel program that does not use SYNC generally does not operate correctly on a system that is not

strongly ordered. However, a program that does use SYNC works on both types of systems. (System-specific documentation describes the actions needed to reliably share data in parallel programs for

that system.)

The behavior of a load or store using one memory access type is UNPREDICTABLE if a load or store was previously made to the same physical location using a different memory access type. The presence

of a SYNC between the references does not alter this behavior.

SYNC affects the order in which the effects of load and store instructions appear to all processors; it does not generally affect the physical memory-system ordering or synchronization issues that arise in

system programming. The effect of SYNC on implementation-specific aspects of the cached memory system, such as writeback buffers, is not defined.

The code fragments below show how SYNC can be used to coordinate the use of shared data between separate writer and reader instruction streams in a multiprocessor environment. The FLAG location is

used by the instruction streams to determine whether the shared data item DATA is valid. The SYNC executed by processor A forces the store of DATA to be performed globally before the store to FLAG

is performed. The SYNC executed by processor B ensures that DATA is not read until after the FLAG value indicates that the shared data is valid.

# Processor A (writer)
# Conditions at entry:
# The value 0 has been stored in FLAG and that value is observable by B
SW     R1, DATA       # change shared DATA value
LI     R2, 1
SYNC                  # Perform DATA store before performing FLAG store
SW     R2, FLAG       # say that the shared DATA value is valid

# Processor B (reader)
LI     R2, 1
1: LW     R1, FLAG    # Get FLAG
BNEC   R2, R1, 1B     # if it says that DATA is not valid, poll again
NOP
SYNC                  # FLAG value checked before doing DATA read
LW     R1, DATA       # Read (valid) shared DATA value
SYNC

Exceptions:

None.

Assembly:

SYNCI  offset(rs)

nanoMIPS, availability varies by format.

SYNChronize Instruction cache/SYNChronize Instruction cache using EVA addressing

SYNCIE offset(rs)

nanoMIPS, availability varies by format.

SYNChronize Instruction cache/SYNChronize Instruction cache using EVA addressing

Purpose:

SYNChronize Instruction cache/SYNChronize Instruction cache using EVA addressing. Synchronize the caches to make instruction writes at address $rs + offset (register plus immediate) effective. For SYNCIE, translate the virtual address as though the core is in user mode, although it is actually in kernel mode.

Availability:

nanoMIPS, availability varies by format.

Format:

SYNCI[S9]

101001

11111

rs

s[8]

0011

0

00

s[7:0]

6

5

5

1

4

1

2

8

offset = sign_extend(s, from_nbits=9)
is_eva = False

SYNCI[U12]

100001

11111

rs

0011

u

6

5

5

4

12

offset = u
is_eva = False

SYNCIE, present when Config5.EVA=1, requires CP0 privilege.

101001

11111

rs

s[8]

0011

0

10

s[7:0]

6

5

5

1

4

1

2

8

offset = sign_extend(s, from_nbits=9)
is_eva = True

Operation:

if is_eva and not C0.Config5.EVA:
    raise exception('RI')
if is_eva and not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
va = effective_address(GPR[rs], offset, 'Load', eva=is_eva)
pa, cca = va2pa(va, 'Cacheop', eva=is_eva)
# Make data writes at address=PA visible to the instruction stream (for all
# coherent cores in the system)...
# The precise details of the operation are implementation dependent, and will
# depend on the cache hierarchy and coherency behavior of the system. The
# following code shows a sample implementation for a system where the memory
# hierarchy is unified beyond the L1 instruction and data caches.
# Find index where address is present in D cache, if any.
dcache_hit_index = cache_lookup_index('D', va, pa)
if dcache_hit_index:
    way_index, set_index = dcache_hit_index
    dcache_line = get_cache_line('D', way_index, set_index)
    if dcache_line.valid and dcache_line.dirty:
        dcache_line.write_back()
        # Implementation may or may not invalidate line too, see below.
for core in get_all_cores_in_system():
    # Find index where address is present in this core's I cache, if any.
    icache_hit_index = cache_lookup_index('I', va, pa, core)
    if icache_hit_index:
        way_index, set_index = icache_hit_index
        icache_line = get_cache_line('I', way_index, set_index, core)
        if not icache_line.locked:
            icache_line.valid = 0

SYNCI is a user privilege instruction for synchronizing the caches to make instruction writes to address

$rs + offset effective. SYNCI must be followed by a SYNC instruction and an instruction hazard barrier to guarantee that subsequent instruction fetches see the updated instructions. One SYNCI instruction

is required for every cache line that was written. The size of the cache line can be determined by the RDHWR instruction.
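The per-cache-line loop that software performs can be modelled as follows. This is a sketch, not part of the architecture definition: `synci` stands in for issuing the SYNCI instruction at an address, and `step` stands in for the value read from the RDHWR HW_SYNCI_Step register (a step of 0 indicates that no cache synchronization is needed).

```python
def sync_icache(synci, start, size, step):
    """Issue one SYNCI per cache line covering [start, start+size).

    synci: callable standing in for the SYNCI instruction.
    step:  cache-line step size as reported by RDHWR HW_SYNCI_Step;
           a step of 0 means no caches need synchronizing.
    Returns the number of SYNCI operations issued.
    """
    if size == 0 or step == 0:
        return 0
    addr = start & ~(step - 1)  # round down to a line boundary
    end = start + size
    count = 0
    while addr < end:
        synci(addr)             # one SYNCI per line that was written
        addr += step
        count += 1
    return count

# With an assumed 32-byte line, a 100-byte region starting at 0x1004
# touches 4 lines (0x1000, 0x1020, 0x1040, 0x1060).
lines = sync_icache(lambda a: None, start=0x1004, size=100, step=32)
```

A real routine must also execute SYNC and an instruction hazard barrier afterwards, as the assembly example later in this section shows.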

SYNCI can cause TLB Refill and TLB invalid exceptions (with cause code TLBL). It does not cause TLBRI exceptions. A Cache Error or Bus Error exception may occur as a result of a writeback triggered by

the instruction.

An Address Error Exception (with cause code equal ADEL) may occur if a SYNCI targets an address which is not accessible from the current operating mode. It is implementation dependent whether such an exception does occur, but the instruction should not affect cache lines which are not accessible from the current operating mode.

It is implementation dependent whether a data watch exception is triggered by a SYNCI instruction whose address matches the Watch register address match conditions. The preferred implementation

is not to match on the SYNCI instruction.

The operation of the processor is UNPREDICTABLE if the effective address of the SYNCI targets any instruction cache line that contains instructions to be executed between the SYNCI and the subsequent

JALRC.HB, JRC.HB, or ERET instruction required to clear the instruction hazard.

The SYNCI instruction has no effect on cache lines that were previously locked with the CACHE instruction. If correct software operation depends on the state of a locked line, the CACHE instruction must be used to synchronize the caches.

In multi-processor systems, a SYNCI to an address with a coherent CCA must guarantee synchronization of all coherent instruction caches in the system. (Prior to Release 6 of the MIPS™ Architecture, this behavior was recommended but not required.)

The manner in which SYNCI is implemented will depend on the cache hierarchy of the processor. Typically, all caches out to the point at which both instruction and data references become unified are processed. If no caches exist or if instruction cache coherency is already guaranteed, the instruction must be implemented as a NOP.

In a typical implementation in which only the L1 instruction and data caches are affected, this instruction would perform a Hit Invalidate operation on the instruction cache and a Hit Writeback or Hit Writeback Invalidate on the data cache. The decision to invalidate the data cache line is implementation dependent, but should be made under the assumption that the data will not be written again soon. If a Hit Writeback Invalidate (as opposed to a Hit Writeback) would cause the line to be selected for replacement, the invalidate option might be selected.

The following example shows a routine which could be called after the new instruction stream is written to make those changes effective.

/*
 * This routine makes changes to the instruction stream effective to the
 * hardware. It should be called after the instruction stream is written.
 * On return, the new instructions are effective.
 *
 * Inputs:
 *   a0 = Start address of new instruction stream
 *   a1 = Size in bytes of new instruction stream
 */
        beqc    a1, zero, 20f     /* If size==0, branch around. */
        addu    a1, a0, a1        /* Calculate end address + 1. */
        rdhwr   v0, HW_SYNCI_Step /* Get step size for SYNCI. */
        beqc    v0, zero, 20f     /* Nothing to do if no caches. */
10:     synci   0(a0)             /* Sync all caches around address. */
        addu    a0, a0, v0        /* Add step size. */
        sltu    v1, a0, a1        /* Not past the end address? */
        bnec    v1, zero, 10b     /* Branch if more to do. */
        sync                      /* Clear memory hazards. */
20:     jrc.hb  ra                /* Return, clearing instruction hazards. */

Exceptions:

Address Error. Bus Error. Cache Error. Coprocessor Unusable for SYNCIE. Reserved Instruction for SYNCIE if EVA not implemented. TLB Invalid. TLB Refill.

Assembly:

SYSCALL code

nanoMIPS

System Call

Purpose:

System Call. Cause a System Call exception.

Availability:

nanoMIPS

Format:

SYSCALL[32]

000000

00000

01

0

code

6

5

2

1

18

SYSCALL[16]

000100

00000

01

0

code

6

5

2

1

2

Operation:

raise exception('SYSCALL')

Exceptions:

System Call.

Assembly:

TEQ rs, rt, code

nanoMIPS, not available in NMS

Trap if Equal

Purpose:

Trap if Equal. Cause a Trap exception if registers $rs and $rt are equal.

Availability:

nanoMIPS, not available in NMS

Format:

001000

rt

rs

code

0

0000000

000

6

5

5

5

1

7

3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
if GPR[rs] == GPR[rt]:
    raise exception('TRAP')

Exceptions:

Trap.

Assembly:

TLBINV

nanoMIPS. Required on TLB cores, unless Config4.IE<2. Requires CP0 privilege.

TLB Invalidate

Purpose:

TLB Invalidate.

Invalidate a set of TLB entries based on ASID match.

Availability:

nanoMIPS. Required on TLB cores, unless Config4.IE<2. Requires CP0 privilege.

Format:

001000

x

00

00011

101

111

111

6

10

2

5

3

3

3

Operation:

if C0.Config4.IE < 2:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
tlbinv()

Exceptions:

Coprocessor Unusable. Reserved Instruction if TLB invalidate not implemented.

Assembly:

TLBINVF

nanoMIPS. Required on TLB cores, unless Config4.IE<2. Requires CP0 privilege.

TLB Invalidate Flush

Purpose:

TLB Invalidate Flush.

Invalidate a set of TLB entries, ignoring ASID match.

Availability:

nanoMIPS. Required on TLB cores, unless Config4.IE<2. Requires CP0 privilege.

Format:

001000

x

00

01011

101

111

111

6

10

2

5

3

3

3

Operation:

if C0.Config4.IE < 2:
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
tlbinv(flush=True)

Exceptions:

Coprocessor Unusable. Reserved Instruction if TLB invalidate not implemented.

Assembly:

TLBP

nanoMIPS. Required on TLB cores. Requires CP0 privilege.

TLB Probe

Purpose:

TLB Probe. Probe the TLB for an entry matching C0.EntryHi. If found, write the index of the matching entry to C0.Index, otherwise set C0.Index.P to 1.

Availability:

nanoMIPS. Required on TLB cores. Requires CP0 privilege.

Format:

001000

x

00

00001

101

111

111

6

10

2

5

3

3

3

Operation:

if not got_tlb():
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
tlbp()

Exceptions:

Coprocessor Unusable. Reserved Instruction if TLB not implemented.

Assembly:

TLBR

nanoMIPS. Required on TLB cores. Requires CP0 privilege.

TLB Read

Purpose:

TLB Read. Read the TLB entry indexed by C0.Index into the TLB CP0 registers EntryHi, EntryLo0, EntryLo1, PageMask.

Availability:

nanoMIPS. Required on TLB cores. Requires CP0 privilege.

Format:

001000

x

00

01001

101

111

111

6

10

2

5

3

3

3

Operation:

if not got_tlb():
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
tlbr()

Exceptions:

Coprocessor Unusable. Reserved Instruction if TLB not implemented.

Assembly:

TLBWI

nanoMIPS. Required on TLB cores. Requires CP0 privilege.

TLB Write Indexed

Purpose:

TLB Write Indexed. Write the TLB entry indexed by C0.Index using the values in the TLB CP0 registers EntryHi, EntryLo0, EntryLo1, PageMask.

Availability:

nanoMIPS. Required on TLB cores. Requires CP0 privilege.

Format:

001000

x

00

10001

101

111

111

6

10

2

5

3

3

3

Operation:

if not got_tlb():
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
tlbwi(C0.Index.Index)

Exceptions:

Coprocessor Unusable. Reserved Instruction if TLB not implemented.

Assembly:

TLBWR

nanoMIPS. Required on TLB cores. Requires CP0 privilege.

TLB Write Random

Purpose:

TLB Write Random. Write a randomly chosen TLB entry using the values in the TLB CP0 registers EntryHi, EntryLo0, EntryLo1, PageMask.

Availability:

nanoMIPS. Required on TLB cores. Requires CP0 privilege.

Format:

001000

x

00

11001

101

111

111

6

10

2

5

3

3

3

Operation:

if not got_tlb():
    raise exception('RI')
if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
tlbwr()

Exceptions:

Coprocessor Unusable. Reserved Instruction if TLB not implemented.

Assembly:

TNE rs, rt, code

nanoMIPS, not available in NMS

Trap if Not Equal

Purpose:

Trap if Not Equal. Cause a Trap exception if registers $rs and $rt are not equal.

Availability:

nanoMIPS, not available in NMS

Format:

001000

rt

rs

code

1

0000000

000

6

5

5

5

1

7

3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
if GPR[rs] != GPR[rt]:
    raise exception('TRAP')

Exceptions:

Trap.

Assembly:

UALH rt, offset(rs)

nanoMIPS, not available in NMS

Unaligned Load Half

Purpose:

Unaligned Load Half. Load signed halfword to register $rt from memory address $rs + offset (register plus immediate), guaranteeing that the operation completes even if the address is not halfword aligned.

Availability:

nanoMIPS, not available in NMS

Format:

101001

rt

rs

s[8]

0100

0

01

s[7:0]

6

5

5

1

4

1

2

8

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
offset = sign_extend(s, from_nbits=9)
va = effective_address(GPR[rs], offset, 'Load')
data = read_memory_at_va(va, nbytes=2, unaligned_support='always')
GPR[rt] = sign_extend(data, from_nbits=16)

UALH will not cause an Address Error exception for unaligned addresses.

An unaligned load/store instruction may be implemented using more than one memory transaction. It is possible for a subset of these memory transactions to have completed and then for a TLB exception to occur on a remaining transaction. It is also possible that memory could be modified by another thread or device in between the completion of the memory transactions. This behavior is equivalent to what might occur if the unaligned load/store was carried out in software using a series of separate aligned instructions, for instance using LWL/LWR on a pre-R6 MIPS™ core. Software should take equivalent steps to accommodate this lack of guaranteed atomicity as it would for the multiple instruction case.
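The lack of atomicity can be pictured by modelling UALH as two independent byte reads of a little-endian byte-addressed memory. This is only an illustrative sketch, not the mandated implementation; an exception or a concurrent store could fall between the two reads.

```python
def ualh_model(memory, addr):
    """Model UALH on a little-endian memory (a dict of address -> byte).

    The halfword is assembled from two separate byte reads, then
    sign-extended from 16 bits, mirroring the operation pseudocode.
    """
    lo = memory[addr]      # this read may complete...
    hi = memory[addr + 1]  # ...before this one faults or observes
                           # a concurrent store
    half = (hi << 8) | lo
    return half - 0x10000 if half & 0x8000 else half

# An unaligned halfword 0xFFFE at odd address 0x2001 loads as -2.
mem = {0x2001: 0xFE, 0x2002: 0xFF}
value = ualh_model(mem, 0x2001)
```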

Exceptions:

Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Refill. TLB Read Inhibit. Watch.

Assembly:

UALW rt, offset(rs)

Assembly alias, not available in NMS

Unaligned Load Word

Purpose:

Unaligned Load Word. Load word to register $rt from memory address $rs + offset (register plus immediate), guaranteeing that the operation completes even if the address is not word aligned.

Availability:

Assembly alias, not available in NMS

Expansion:

UALWM rt, offset(rs), 1

Assembly:

UALWM rt, offset(rs), count

nanoMIPS, not available in NMS

Unaligned Load Word Multiple

Purpose:

Unaligned Load Word Multiple. Load count words of data to registers $rt, $(rt+1), ..., $(rt+count-1) from consecutive memory addresses starting at $rs + offset (register plus immediate). Guarantee that the operation completes even if the address is not word aligned.

Availability:

nanoMIPS, not available in NMS

Format:

101001

rt

rs

s[8]

count3

0

1

01

s[7:0]

6

5

5

1

3

1

1

2

8

offset = sign_extend(s, from_nbits=9)
count = 8 if count3 == 0 else count3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
i = 0
while i != count:
    this_rt = ( rt + i      if rt + i < 32 else
                rt + i - 16                    )
    this_offset = offset + (i<<2)
    va = effective_address(GPR[rs], this_offset, 'Load')
    data = read_memory_at_va(va, nbytes=4, unaligned_support='always')
    GPR[this_rt] = sign_extend(data, from_nbits=32)
    if this_rt == rs and i != count - 1:
        raise UNPREDICTABLE()
    i += 1

UALWM loads count words to sequentially numbered registers from sequential memory addresses which are potentially unaligned. After loading $31, the sequence of registers continues from $16. See

LWM for example encodings of the register list.
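The register numbering rule, including the wrap from $31 back to $16, can be sketched directly from the operation pseudocode:

```python
def ualwm_regs(rt, count):
    """Return the GPR numbers written by UALWM rt, offset(rs), count.

    After $31 the sequence continues from $16, matching the
    'rt + i - 16' branch in the operation pseudocode.
    """
    regs = []
    for i in range(count):
        r = rt + i
        regs.append(r if r < 32 else r - 16)
    return regs

# Starting at $30 with count 4: $30, $31, then wrap to $16, $17.
regs = ualwm_regs(30, 4)
```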

UALWM will not cause an Address Error exception for unaligned addresses.

The result is unpredictable if an UALWM instruction updates the base register prior to the final load.

If a TLB exception or interrupt occurs during the execution of this instruction, a subset of the required register updates may have occurred.

An unaligned load/store instruction may be implemented using more than one memory transaction. It is possible for a subset of these memory transactions to have completed and then for a TLB exception to occur on a remaining transaction. It is also possible that memory could be modified by another thread or device in between the completion of the memory transactions. This behavior is equivalent to what might occur if the unaligned load/store was carried out in software using a series of separate aligned instructions, for instance using LWL/LWR on a pre-R6 MIPS™ core. Software should take equivalent steps to accommodate this lack of guaranteed atomicity as it would for the multiple instruction case.

UALWM must be implemented in such a way as to make the instruction restartable, but the implementation does not need to be fully atomic. For instance, it is allowable for a UALWM instruction to be aborted by an exception after a subset of the register updates have occurred. To ensure restartability, any write to GPR $rs (which may be used as the final output register) must be completed atomically, that is, the instruction must graduate if and only if that write occurs.

Exceptions:

Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Read Inhibit. TLB Refill. Watch.

Assembly:

UASH rt, offset(rs)

nanoMIPS, not available in NMS

Unaligned Store Half

Purpose:

Unaligned Store Half. Store halfword from register $rt to memory address $rs + offset (register plus immediate), guaranteeing that the operation completes even if the address is not halfword aligned.

Availability:

nanoMIPS, not available in NMS

Format:

101001

rt

rs

s[8]

0101

0

01

s[7:0]

6

5

5

1

4

1

2

8

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
offset = sign_extend(s, from_nbits=9)
va = effective_address(GPR[rs], offset, 'Store')
data = zero_extend(GPR[rt], from_nbits=16)
write_memory_at_va(data, va, nbytes=2, unaligned_support='always')

UASH will not cause an Address Error exception for unaligned addresses.

An unaligned load/store instruction may be implemented using more than one memory transaction. It is possible for a subset of these memory transactions to have completed and then for a TLB exception to occur on a remaining transaction. It is also possible that memory could be modified by another thread or device in between the completion of the memory transactions. This behavior is equivalent to what might occur if the unaligned load/store was carried out in software using a series of separate aligned instructions, for instance using LWL/LWR on a pre-R6 MIPS™ core. Software should take equivalent steps to accommodate this lack of guaranteed atomicity as it would for the multiple instruction case.

Exceptions:

Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

UASW rt, offset(rs)

Assembly alias, not available in NMS

Unaligned Store Word

Purpose:

Unaligned Store Word. Store word from register $rt to memory address $rs + offset (register plus immediate), guaranteeing that the operation completes even if the address is not word aligned.

Availability:

Assembly alias, not available in NMS

Expansion:

UASWM rt, offset(rs), 1

Assembly:

UASWM rt, offset(rs), count

nanoMIPS, not available in NMS

Unaligned Store Word Multiple

Purpose:

Unaligned Store Word Multiple. Store count words of data from registers $rt, $(rt+1), ..., $(rt+count-1) to consecutive memory addresses starting at $rs + offset (register plus immediate). Guarantee that the operation completes even if the address is not word aligned.

Availability:

nanoMIPS, not available in NMS

Format:

101001

rt

rs

s[8]

count3

1

1

01

s[7:0]

6

5

5

1

3

1

1

2

8

offset = sign_extend(s, from_nbits=9)
count = 8 if count3 == 0 else count3

Operation:

if C0.Config5.NMS == 1:
    raise exception('RI')
i = 0
while i != count:
    this_rt = ( 0           if rt == 0    else
                rt + i      if rt + i < 32 else
                rt + i - 16                    )
    this_offset = offset + (i<<2)
    va = effective_address(GPR[rs], this_offset, 'Store')
    data = zero_extend(GPR[this_rt], from_nbits=32)
    write_memory_at_va(data, va, nbytes=4, unaligned_support='always')
    i += 1

UASWM stores count words from sequentially numbered registers to sequential memory addresses which are potentially unaligned. After storing $31, the sequence of registers continues from $16. If rt=0, then $0 is stored for all count steps of the instruction. See SWM for example encodings of the register list.
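The source-register sequence, including the special rt=0 case, can be sketched from the operation pseudocode:

```python
def uaswm_regs(rt, count):
    """Return the GPR numbers read by UASWM rt, offset(rs), count.

    If rt == 0, $0 is stored for every step; otherwise the sequence
    wraps from $31 back to $16, as for UALWM.
    """
    if rt == 0:
        return [0] * count
    regs = []
    for i in range(count):
        r = rt + i
        regs.append(r if r < 32 else r - 16)
    return regs

# rt=0 stores zero for all steps; rt=31 wraps immediately to $16.
zeros = uaswm_regs(0, 3)
wrapped = uaswm_regs(31, 2)
```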

UASWM will not cause an Address Error exception for unaligned addresses.

If a TLB exception or interrupt occurs during the execution of this instruction, a subset of the required memory updates may have occurred. A full restart of the instruction will be performed on return from

the exception.

An unaligned load/store instruction may be implemented using more than one memory transaction. It is possible for a subset of these memory transactions to have completed and then for a TLB exception to occur on a remaining transaction. It is also possible that memory could be modified by another thread or device in between the completion of the memory transactions. This behavior is equivalent to what might occur if the unaligned load/store was carried out in software using a series of separate aligned instructions, for instance using LWL/LWR on a pre-R6 MIPS™ core. Software should take equivalent steps to accommodate this lack of guaranteed atomicity as it would for the multiple instruction case.

Exceptions:

Bus Error. Reserved Instruction on NMS cores. TLB Invalid. TLB Modified. TLB Refill. Watch.

Assembly:

WAIT code

nanoMIPS

Wait

Purpose:

Wait. Enter wait state.

Availability:

nanoMIPS

Format:

001000

code

11

00001

101

111

111

6

10

2

5

3

3

3

Operation:

if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
CPU.in_wait_state = True

Exceptions:

Coprocessor Unusable.

Assembly:

WRPGPR rt, rs

nanoMIPS. Requires CP0 privilege.

Write Previous GPR

Purpose:

Write Previous GPR. Write the value of register $rs from the current shadow register set (SRSCtl.CSS) to register $rt in the previous shadow register set (SRSCtl.PSS). If shadow register sets are not implemented, just copy the value from register $rs to register $rt.

Availability:

nanoMIPS. Requires CP0 privilege.

Format:

001000

rt

rs

11

11000

101

111

111

6

5

5

2

5

3

3

3

Operation:

if not IsCoprocessor0Enabled():
    raise coprocessor_exception(0)
if C0.SRSCtl.HSS > 0:
    SRS[C0.SRSCtl.PSS][rt] = GPR[rs]
else:
    GPR[rt] = GPR[rs]

Exceptions:

Coprocessor Unusable.

Assembly:

WSBH rt, rs

Assembly alias, not available in NMS

Word Swap Byte Half

Purpose:

Word Swap Byte Half. Swap the bytes within both halves of the word value in register $rs, and write the result to register $rt.

Availability:

Assembly alias, not available in NMS

Expansion:

ROTX rt, rs, 8, 24

The assembly alias WSBH is provided for compatibility with MIPS32™. Its behavior is equivalent to the new assembly alias BYTEREVH, whose name is chosen to fit consistently with the naming of other reversing instructions in nanoMIPS™.
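The effect of the alias, swapping the bytes within each halfword of a 32-bit value, can be modelled as:

```python
def wsbh(word):
    """Swap the bytes within each halfword of a 32-bit value, i.e. the
    effect of the ROTX rt, rs, 8, 24 expansion used by the WSBH alias."""
    b0 = (word >> 0) & 0xFF
    b1 = (word >> 8) & 0xFF
    b2 = (word >> 16) & 0xFF
    b3 = (word >> 24) & 0xFF
    # Upper halfword 0x{b3}{b2} becomes 0x{b2}{b3}; lower likewise.
    return (b2 << 24) | (b3 << 16) | (b0 << 8) | b1

# 0x11223344 -> 0x22114433 (each halfword's bytes exchanged).
result = wsbh(0x11223344)
```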

Assembly:

XOR rd, rs, rt

nanoMIPS

XOR

Purpose:

XOR. Compute logical XOR of registers $rs and $rt, placing the result in register $rd.

Availability:

nanoMIPS

Format:

XOR[32]

001000

rt

rs

rd

x

1100010

000

6

5

5

5

1

7

3

XOR[16]

010100

rt3

rs3

01

0

0

6

3

3

2

1

1

rt = decode_gpr(rt3, 'gpr3')
rs = decode_gpr(rs3, 'gpr3')
rd = rt

Operation:

GPR[rd] = GPR[rs] ^ GPR[rt]

Exceptions:

None.

Assembly:

XORI rt, rs, u

nanoMIPS

XOR Immediate

Purpose:

XOR Immediate. Compute logical XOR of register $rs with immediate u, placing the result in register $rt.

Availability:

nanoMIPS

Format:

100000

rt

rs

0001

u

6

5

5

4

12

Operation:

GPR[rt] = GPR[rs] ^ u

Exceptions:

None.