COP1 010001 | fmt 10110 | ft | fs | fd | ADDR.PS 011000
6 | 5 | 5 | 5 | 5 | 6
ADDR.PS fd, fs, ft | MIPS-3D
Floating Point Reduction Add
To perform a reduction add on two paired-single floating point values
FPR[fd].PL = FPR[ft].PU + FPR[ft].PL; FPR[fd].PU = FPR[fs].PU + FPR[fs].PL
The paired-single values in FPR ft are added together and the result is put in the lower paired-single position of FPR fd. Similarly, the paired-single values in FPR fs are added together and the result is put in the upper paired-single position of FPR fd. The two results are calculated to infinite precision and rounded by using the current rounding mode in FCSR. The operands and result are values in format PS.
Any generated exceptions in the two independent adds are ORed together. Cause bits are ORed into the Flag bits if no exception is taken.
The fields fs, ft, and fd must specify FPRs valid for operands of type PS. If they are not valid, the result is UNPREDICTABLE.
The operands must be values in format PS. If they are not, the result is UNPREDICTABLE and the values in the operand FPRs become UNPREDICTABLE.
The result of ADDR.PS is UNPREDICTABLE if the processor is executing in 16 FP registers mode.
lower = ValueFPR(ft, PS)_{31..0} + ValueFPR(ft, PS)_{63..32}
upper = ValueFPR(fs, PS)_{31..0} + ValueFPR(fs, PS)_{63..32}
StoreFPR(fd, PS, upper || lower)
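The operation above can be sketched in Python as follows. This is only an illustrative model, not the architecture's definition: the helper names `to_single` and `addr_ps` are hypothetical, operands are passed as (upper, lower) tuples, only round-to-nearest is modeled, and no exceptions are raised.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def addr_ps(fs, ft):
    """Sketch of ADDR.PS; fs and ft are (upper, lower) tuples of singles.

    The upper half of the result is the sum of the two halves of fs,
    the lower half is the sum of the two halves of ft.
    """
    fs_upper, fs_lower = fs
    ft_upper, ft_lower = ft
    upper = to_single(fs_upper + fs_lower)
    lower = to_single(ft_upper + ft_lower)
    return (upper, lower)

fd = addr_ps(fs=(1.0, 2.0), ft=(3.0, 4.0))
print(fd)
```

Adding the two singles in double precision and then rounding once to single gives the same value as a correctly rounded single-precision add in round-to-nearest mode, which is why the sketch needs no special handling beyond `to_single`.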
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation, Invalid Operation, Overflow, Inexact, Underflow
COP1 010001 | BC1ANY2 01001 | cc xx0 | nd 0 | tf 0 | offset
6 | 5 | 3 | 1 | 1 | 16
BC1ANY2F cc,offset | MIPS-3D
Branch on Any of Two Floating Point Condition Codes False
To test two consecutive floating point condition codes and do a PC-relative conditional branch
If FPConditionCode(CCn+1) = 0 or FPConditionCode(CCn) = 0, then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If either one of the two FP condition code bits CC is false (0), the program branches to the effective target address after the instruction in the delay slot is executed.
The CC specified must align to 2, so bit 18 must always be zero. For example, specifying a value of 4 will check if either one of CC5 or CC4 is 0 and branch accordingly. Specifying an illegally aligned CC will result in UNPREDICTABLE behavior.
An FP condition code is set by an FP compare instruction, C.cond.fmt, and the MIPS-3D compare absolute instruction CABS.cond.fmt.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
This operation specification is for the general Branch On Any Two Condition operation with tf (true/false) as a variable. The individual instructions BC1ANY2F and BC1ANY2T have specific values for tf.
I:   condition = (FPConditionCode(cc) = 0) or (FPConditionCode(cc+1) = 0)
     target_offset = (offset_15)^{GPRLEN-(16+2)} || offset || 0^2
I+1: if condition then
         PC = PC + target_offset
     endif
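The condition test and the target calculation can be sketched in Python. This is an illustrative model under stated assumptions: the function names are hypothetical, the eight condition codes are held in a plain list, instructions are assumed to be 4 bytes long, and the UNPREDICTABLE case of an odd cc is turned into an error.

```python
def sign_extend_offset(offset16):
    """Sign-extend the 16-bit offset field and shift it left 2 bits."""
    offset16 &= 0xFFFF
    if offset16 & 0x8000:
        offset16 -= 0x10000
    return offset16 << 2

def bc1any2(fcc, cc, tf, offset16, branch_pc):
    """Sketch of the general Branch On Any Two Condition operation.

    fcc       -- list of eight FP condition code bits (0 or 1)
    cc        -- even CC number; CC cc and CC cc+1 are tested
    tf        -- 0 for BC1ANY2F, 1 for BC1ANY2T
    branch_pc -- address of the branch instruction itself
    Returns the address executed after the delay slot instruction.
    """
    if cc % 2 != 0:
        raise ValueError("cc must be even; the architecture leaves this UNPREDICTABLE")
    condition = fcc[cc] == tf or fcc[cc + 1] == tf
    delay_slot_pc = branch_pc + 4      # the offset is relative to this address
    if condition:
        return delay_slot_pc + sign_extend_offset(offset16)
    return delay_slot_pc + 4           # fall through past the delay slot

fcc = [1, 1, 1, 1, 0, 1, 1, 1]         # only CC4 is false
print(hex(bc1any2(fcc, cc=4, tf=0, offset16=0x0010, branch_pc=0x1000)))
```

The most negative field value, 0x8000, yields an offset of -131072 bytes, which matches the stated branch range of ± 128 KBytes.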
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
COP1 010001 | BC1ANY2 01001 | cc xx0 | nd 0 | tf 1 | offset
6 | 5 | 3 | 1 | 1 | 16
BC1ANY2T cc,offset | MIPS-3D
Branch on Any of Two Floating Point Condition Codes True
To test two consecutive FP condition codes and do a PC-relative conditional branch
If FPConditionCode(CCn+1) = 1 or FPConditionCode(CCn) = 1, then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If either one of the two FP condition code bits CC is true (1), the program branches to the effective target address after the instruction in the delay slot is executed.
The CC specified must align to 2, so bit 18 must always be zero. For example, specifying a value of 2 will check if either one of CC3 or CC2 is 1 and branch accordingly. Specifying an illegally aligned CC will result in UNPREDICTABLE behavior.
An FP condition code is set by an FP compare instruction, C.cond.fmt, and the MIPS-3D compare absolute instruction CABS.cond.fmt.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
This operation specification is for the general Branch On Any Two Condition operation with tf (true/false) as a variable. The individual instructions BC1ANY2F and BC1ANY2T have specific values for tf.
I:   condition = (FPConditionCode(cc) = 1) or (FPConditionCode(cc+1) = 1)
     target_offset = (offset_15)^{GPRLEN-(16+2)} || offset || 0^2
I+1: if condition then
         PC = PC + target_offset
     endif
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
COP1 010001 | BC1ANY4 01010 | cc x00 | nd 0 | tf 0 | offset
6 | 5 | 3 | 1 | 1 | 16
BC1ANY4F cc,offset | MIPS-3D
Branch on Any of Four Floating Point Condition Codes False
To test four consecutive FP condition codes and do a PC-relative conditional branch
If FPConditionCode(CCn+3) = 0 or FPConditionCode(CCn+2) = 0 or FPConditionCode(CCn+1) = 0 or FPConditionCode(CCn) = 0, then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If any of the four FP condition code bits CC is false (0), the program branches to the effective target address after the instruction in the delay slot is executed.
The CC specified must align to 4, so bits 18 and 19 must always be zero. For example, specifying a value of 0 will check if any one of CC3..0 is 0 and branch accordingly. Specifying an illegally aligned CC will result in UNPREDICTABLE behavior.
An FP condition code is set by an FP compare instruction, C.cond.fmt, and the MIPS-3D compare absolute instruction CABS.cond.fmt.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
This operation specification is for the general Branch On Any Four Condition operation with tf (true/false) as a variable. The individual instructions BC1ANY4F and BC1ANY4T have specific values for tf.
I:   condition = (FPConditionCode(cc) = 0) or (FPConditionCode(cc+1) = 0) or
                 (FPConditionCode(cc+2) = 0) or (FPConditionCode(cc+3) = 0)
     target_offset = (offset_15)^{GPRLEN-(16+2)} || offset || 0^2
I+1: if condition then
         PC = PC + target_offset
     endif
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
COP1 010001 | BC1ANY4 01010 | cc x00 | nd 0 | tf 1 | offset
6 | 5 | 3 | 1 | 1 | 16
BC1ANY4T cc,offset | MIPS-3D
Branch on Any of Four Floating Point Condition Codes True
To test four consecutive FP condition codes and do a PC-relative conditional branch
If FPConditionCode(CCn+3) = 1 or FPConditionCode(CCn+2) = 1 or FPConditionCode(CCn+1) = 1 or FPConditionCode(CCn) = 1, then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If any of the four FP condition code bits CC is true (1), the program branches to the effective target address after the instruction in the delay slot is executed.
The CC specified must align to 4, so bits 18 and 19 must always be zero. For example, specifying a value of 4 will check if any of the bits CC7..4 is 1 and branch accordingly. Specifying an illegally aligned CC will result in UNPREDICTABLE behavior.
An FP condition code is set by an FP compare instruction, C.cond.fmt, and the MIPS-3D compare absolute instruction CABS.cond.fmt.
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
This operation specification is for the general Branch On Any Four Condition operation with tf (true/false) as a variable. The individual instructions BC1ANY4F and BC1ANY4T have specific values for tf.
I:   condition = (FPConditionCode(cc) = 1) or (FPConditionCode(cc+1) = 1) or
                 (FPConditionCode(cc+2) = 1) or (FPConditionCode(cc+3) = 1)
     target_offset = (offset_15)^{GPRLEN-(16+2)} || offset || 0^2
I+1: if condition then
         PC = PC + target_offset
     endif
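The alignment rule follows from the field layout: with field widths 6 | 5 | 3 | 1 | 1 | 16, the cc field occupies bits 20..18, so a cc value aligned to 4 leaves bits 18 and 19 clear. A minimal Python sketch of packing the fields into an instruction word, assuming a hypothetical helper named `encode_bc1any4` (it is not an assembler API):

```python
COP1 = 0b010001      # major opcode, bits 31..26
BC1ANY4 = 0b01010    # bits 25..21

def encode_bc1any4(cc, tf, offset16, nd=0):
    """Pack the BC1ANY4F (tf=0) or BC1ANY4T (tf=1) fields into a 32-bit word."""
    if cc not in (0, 4):
        raise ValueError("cc must be aligned to 4 (bits 18 and 19 zero)")
    if tf not in (0, 1) or nd not in (0, 1):
        raise ValueError("tf and nd are single-bit fields")
    return (
        (COP1 << 26)
        | (BC1ANY4 << 21)
        | (cc << 18)             # cc field, bits 20..18
        | (nd << 17)
        | (tf << 16)
        | (offset16 & 0xFFFF)    # offset field, bits 15..0
    )

word = encode_bc1any4(cc=4, tf=1, offset16=0x0010)
print(f"{word:#010x}")
```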
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
COP1 010001 | fmt | ft | fs | cc | 0 | A 1 | FC 11 | cond
6 | 5 | 5 | 5 | 3 | 1 | 1 | 2 | 4
CABS.cond.fmt | Floating Point Absolute Compare
CABS.cond.S cc,fs,ft | MIPS-3D
CABS.cond.D cc,fs,ft | MIPS-3D
CABS.cond.PS cc,fs,ft | MIPS-3D
To compare FP values and record the boolean result in one or more condition codes
FPConditionCode(cc) = FPR[fs] compare_absolute_cond FPR[ft]
The absolute value in FPR fs is compared to the absolute value in FPR ft; the values are in format fmt. The comparison is exact and neither overflows nor underflows.
If the comparison specified by cond_{2..1} is true for the operand values, the result is true; otherwise, the result is false. If no exception is taken, the result is written into condition code CC; true is 1 and false is 0.
CABS.cond.PS compares the upper and lower halves of FPR fs and FPR ft independently and writes the results into condition codes CC+1 and CC respectively. The CC number must be even. If the number is not even the operation of the instruction is UNPREDICTABLE.
See the description of the C.cond.fmt instruction in Volume II of this multi-volume set for a complete description of the cond value and the behavior of the compare instruction.
The fields fs and ft must specify FPRs valid for operands of type fmt; if they are not valid, the result is UNPREDICTABLE.
The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.
The result of CABS.cond.PS is UNPREDICTABLE if the processor is executing in 16 FP registers mode, or if the condition code number is odd.
if SNaN(ValueFPR(fs, fmt)) or SNaN(ValueFPR(ft, fmt)) or
   QNaN(ValueFPR(fs, fmt)) or QNaN(ValueFPR(ft, fmt)) then
    less = false
    equal = false
    unordered = true
    if (SNaN(ValueFPR(fs, fmt)) or SNaN(ValueFPR(ft, fmt))) or
       (cond_3 and (QNaN(ValueFPR(fs, fmt)) or QNaN(ValueFPR(ft, fmt)))) then
        SignalException(InvalidOperation)
    endif
else
    less = AbsoluteValue(ValueFPR(fs, fmt)) <_fmt AbsoluteValue(ValueFPR(ft, fmt))
    equal = AbsoluteValue(ValueFPR(fs, fmt)) =_fmt AbsoluteValue(ValueFPR(ft, fmt))
    unordered = false
endif
condition = (cond_2 and less) or (cond_1 and equal) or (cond_0 and unordered)
SetFPConditionCode(cc, condition)
For CABS.cond.PS, the pseudo code above is repeated for both halves of the operand registers, treating each half as an independent single-precision value. Exceptions on the two halves are logically ORed and reported together. The result of the lower half comparison is written to condition code CC; the result of the upper half comparison is written to condition code CC+1.
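The predicate logic can be sketched in Python. This is an illustrative model with hypothetical function names. One loudly stated simplification: Python does not readily distinguish signaling from quiet NaNs, so every NaN is treated as quiet and Invalid Operation is reported only through cond bit 3.

```python
import math

def cabs_cond(cond, fs, ft):
    """Sketch of one absolute-value comparison.

    cond is the 4-bit cond field; fs and ft are Python floats.
    Returns (condition, invalid).
    """
    cond3, cond2, cond1, cond0 = ((cond >> i) & 1 for i in (3, 2, 1, 0))
    if math.isnan(fs) or math.isnan(ft):
        less, equal, unordered = False, False, True
        invalid = bool(cond3)          # assumption: all NaNs are quiet
    else:
        less = abs(fs) < abs(ft)
        equal = abs(fs) == abs(ft)
        unordered = False
        invalid = False
    condition = bool((cond2 and less) or (cond1 and equal) or (cond0 and unordered))
    return condition, invalid

def cabs_cond_ps(cond, fs, ft):
    """Paired-single form; fs and ft are (upper, lower) tuples.

    Returns (value for CC+1, value for CC, invalid), with the invalid
    indications of the two halves ORed together.
    """
    upper, invalid_upper = cabs_cond(cond, fs[0], ft[0])
    lower, invalid_lower = cabs_cond(cond, fs[1], ft[1])
    return upper, lower, invalid_upper or invalid_lower

# cond = 0b0100 selects "less": |-1.0| < |2.0| holds even though -1.0 < 2.0 is not the point.
print(cabs_cond(0b0100, -1.0, 2.0))
```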
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation, Invalid Operation
COP1 010001 | fmt 10100 | 0 00000 | fs | fd | CVT.PS.PW 100110
6 | 5 | 5 | 5 | 5 | 6
CVT.PS.PW fd,fs | MIPS-3D
Floating Point Convert Paired Word to Paired Single
To convert a pair of 32-bit fixed point words to an FP paired-single value
FPR[fd] = convert_and_round(FPR[fs]_{63..32}) || convert_and_round(FPR[fs]_{31..0})
The value in FPR fs, in format PW, is converted to a value in paired-single floating point format and rounded according to the current rounding mode in FCSR. The result is placed in FPR fd.
The fields fs and fd must specify valid FPRs: fs for type PW and fd for type PS. If they are not valid, the result is UNPREDICTABLE. The operand in register fs must be a value in format type PW; if it is not, the result is UNPREDICTABLE and the value in the operand FPR becomes UNPREDICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in 16 FP registers mode.
StoreFPR(fd, PS, ConvertFmt(ValueFPR(fs, PW)_{63..32}, W, S) ||
                 ConvertFmt(ValueFPR(fs, PW)_{31..0}, W, S))
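A minimal Python sketch of the conversion on a 64-bit register image, assuming round-to-nearest only. The function name `cvt_ps_pw` is hypothetical, and Python's `struct` module stands in for the hardware conversion.

```python
import struct

def cvt_ps_pw(fs_bits):
    """Sketch of CVT.PS.PW on a 64-bit register image.

    fs_bits holds two signed 32-bit words (upper in bits 63..32, lower in
    bits 31..0). Returns the 64-bit image of the paired-single result.
    """
    upper_word, lower_word = struct.unpack(">ii", struct.pack(">Q", fs_bits))
    # Packing with the 'f' format rounds each value to the nearest single.
    packed = struct.pack(">ff", float(upper_word), float(lower_word))
    return struct.unpack(">Q", packed)[0]

# Upper word 2**24 + 1 is not representable as a single; lower word is -2.
source = ((2**24 + 1) << 32) | 0xFFFFFFFE
print(f"{cvt_ps_pw(source):#018x}")
```

The upper word shows why rounding matters: 2^24 + 1 needs 25 significant bits, one more than a single holds, so the result is rounded (here to 2^24).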
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation
COP1 010001 | fmt 10110 | 0 00000 | fs | fd | CVT.PW.PS 100100
6 | 5 | 5 | 5 | 5 | 6
CVT.PW.PS fd,fs | MIPS-3D
Floating Point Convert Paired Single to Paired Word
To convert an FP paired-single value to a pair of 32-bit fixed point words
FPR[fd].PU = convert_and_round(FPR[fs].PU); FPR[fd].PL = convert_and_round(FPR[fs].PL)
The values in FPR fs, in format PS, are converted to a pair of values in 32-bit word fixed point format and rounded according to the current rounding mode in FCSR. The result is placed in FPR fd. The conversions of the two halves are done independently.
When either source value is Infinity, NaN, or rounds to an integer outside the range -2^31 to 2^31-1, the result cannot be represented correctly, an IEEE Invalid Operation condition exists, and the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, the default result, 2^31-1, is written to the corresponding half of FPR fd that caused the exception.
The fields fs and fd must specify valid FPRs: fs for type PS and fd for type PW. If they are not valid, the result is UNPREDICTABLE. The operand in register fs must be a value in format PS; if it is not, the result is UNPREDICTABLE and the value in the operand FPR becomes UNPREDICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in 16 FP registers mode.
StoreFPR(fd, PW, ConvertFmt(ValueFPR(fs, PS)_{63..32}, S, W) ||
                 ConvertFmt(ValueFPR(fs, PS)_{31..0}, S, W))
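The conversion and its default result on an invalid input can be sketched in Python. This is an illustrative model under assumptions: the function names are hypothetical, only round-to-nearest-even is modeled, and the Invalid Operation exception is assumed to be disabled so that the default result is written.

```python
import math

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1

def cvt_w_s(value):
    """Convert one single-precision value to a 32-bit word.

    Returns (word, invalid). On Infinity, NaN, or an out-of-range result,
    the default result 2**31 - 1 is returned and invalid is True.
    """
    if math.isnan(value) or math.isinf(value):
        return INT32_MAX, True
    rounded = round(value)             # Python's round() rounds halves to even
    if rounded < INT32_MIN or rounded > INT32_MAX:
        return INT32_MAX, True
    return rounded, False

def cvt_pw_ps(fs):
    """fs is an (upper, lower) tuple; each half is converted independently."""
    upper, invalid_upper = cvt_w_s(fs[0])
    lower, invalid_lower = cvt_w_s(fs[1])
    return (upper, lower), invalid_upper or invalid_lower

print(cvt_pw_ps((3e9, -7.25)))
```

In the usage line, only the upper half is out of range, so only that half receives the default result while the lower half converts normally.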
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions
Unimplemented Operation, Invalid Operation, Overflow, Inexact
COP1 010001 | fmt 10110 | ft | fs | fd | MULR.PS 011010
6 | 5 | 5 | 5 | 5 | 6
MULR.PS fd, fs, ft | MIPS-3D
Floating Point Reduction Multiply
To perform a reduction multiply on two paired-single floating point values
FPR[fd].PL = FPR[ft].PU ∗ FPR[ft].PL; FPR[fd].PU = FPR[fs].PU ∗ FPR[fs].PL
The paired-single values in FPR ft are multiplied together and the result is put in the lower paired-single position of FPR fd. Similarly, the paired-single values in FPR fs are multiplied together and the result is put in the upper paired-single position of FPR fd. The two results are calculated to infinite precision and rounded by using the current rounding mode in FCSR. The operands and result are values in format PS.
Any generated exceptions in the two independent multiplies are ORed together. Cause bits are ORed into the Flag bits if no exception is taken.
The fields fs, ft, and fd must specify FPRs valid for operands of type PS. If they are not valid, the result is UNPREDICTABLE.
The operands must be values in format PS. If they are not, the result is UNPREDICTABLE and the values in the operand FPRs become UNPREDICTABLE.
The result of MULR.PS is UNPREDICTABLE if the processor is executing in 16 FP registers mode.
lower = ValueFPR(ft, PS)_{31..0} × ValueFPR(ft, PS)_{63..32}
upper = ValueFPR(fs, PS)_{31..0} × ValueFPR(fs, PS)_{63..32}
StoreFPR(fd, PS, upper || lower)
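A minimal Python sketch of the operation on 64-bit register images, assuming round-to-nearest and no exception reporting. The helper names `unpack_ps`, `pack_ps`, and `mulr_ps` are hypothetical.

```python
import struct

def unpack_ps(bits):
    """Split a 64-bit register image into (upper, lower) singles."""
    return struct.unpack(">ff", struct.pack(">Q", bits))

def pack_ps(upper, lower):
    """Build a 64-bit register image, rounding each half to single precision."""
    return struct.unpack(">Q", struct.pack(">ff", upper, lower))[0]

def mulr_ps(fs_bits, ft_bits):
    """Sketch of MULR.PS: each source register is reduced to one product."""
    fs_upper, fs_lower = unpack_ps(fs_bits)
    ft_upper, ft_lower = unpack_ps(ft_bits)
    # The product of two singles is exact in a double, so pack_ps performs
    # the only rounding step.
    return pack_ps(fs_upper * fs_lower, ft_upper * ft_lower)

fd_bits = mulr_ps(pack_ps(2.0, 3.0), pack_ps(0.5, 8.0))
print(unpack_ps(fd_bits))
```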
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation, Invalid Operation, Overflow, Inexact, Underflow
COP1 010001 | fmt | 0 00000 | fs | fd | RECIP1 011101
6 | 5 | 5 | 5 | 5 | 6
RECIP1.fmt | Floating Point Reduced Precision Reciprocal (Sequence Step 1)
RECIP1.S fd,fs | MIPS-3D
RECIP1.D fd,fs | MIPS-3D
RECIP1.PS fd,fs | MIPS-3D
Generate a reduced-precision reciprocal of one or two FP values
FPR[fd] = 1.0 / FPR[fs]
The reciprocal of the value in FPR fs is approximated and placed in FPR fd. The operand and result are values in format S, D, or PS.
The numeric accuracy of this operation is implementation dependent; it does not meet the accuracy specified by the IEEE 754 Floating Point standard. A minimum accuracy of 14 bits is recommended for both the S and D input data formats.
It is implementation dependent whether the result is affected by the current rounding mode in FCSR. This instruction is meant to operate in RN (round to nearest) mode for the best accuracy. It is also meant to operate in the Flush to Zero (FS=0) mode. In this mode, if the incoming data is in the denormalized range, it is assumed to be zero, and if the output is in the denormalized range, it is forced to zero.
In addition, if the input to this instruction is zero, the output is not infinity, but the maximum normalized value. This property is useful for 3D graphics applications. If the input is infinity, the output is zero.
This instruction is used as the first step of an instruction sequence that can be used to produce a full precision reciprocal value. See the description of RECIP2.fmt for an example of how to use this instruction in a code sequence to produce a full precision reciprocal result.
The fields fs and fd must specify FPRs valid for operands of type fmt. If they are not valid, the result is UNPREDICTABLE. The format of the data in the specified operand register fs must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
StoreFPR(fd, fmt, (1.0 / ValueFPR(fs, fmt))_{ReducedPrecision})
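The special cases described for RECIP1 (zero input, infinite input, flushing of denormals) can be illustrated with a Python model of the single-precision form. Everything about how the reduced precision is produced here is an assumption: the sketch simply truncates the exact reciprocal to about 14 significant bits, which is not how any particular implementation works. The sign given to the results for zero and infinite inputs is also an assumption, and the function name is hypothetical.

```python
import math
import struct

MAX_NORMAL_S = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
MIN_NORMAL_S = struct.unpack("<f", struct.pack("<I", 0x00800000))[0]

def recip1_s(x, bits=14):
    """Hypothetical model of RECIP1.S returning a seed of roughly `bits` bits."""
    if math.isnan(x):
        return x
    if math.isinf(x):
        return math.copysign(0.0, x)               # infinity -> zero
    if abs(x) < MIN_NORMAL_S:                      # zero or denormal input counts as zero
        return math.copysign(MAX_NORMAL_S, x)      # zero -> maximum normalized value
    mantissa, exponent = math.frexp(1.0 / x)       # 0.5 <= |mantissa| < 1
    scale = 1 << bits
    mantissa = math.trunc(mantissa * scale) / scale  # keep only `bits` bits (assumption)
    approx = math.ldexp(mantissa, exponent)
    if abs(approx) < MIN_NORMAL_S:                 # denormal output is forced to zero
        return math.copysign(0.0, x)
    return approx

print(recip1_s(0.0) == MAX_NORMAL_S, recip1_s(math.inf))
```

Returning the maximum normalized value instead of infinity keeps later arithmetic finite, which is the property the text describes as useful for 3D graphics.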
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation, Invalid Operation, Overflow, Inexact, Underflow, Division-by-zero
COP1 010001 | fmt | ft | fs | fd | RECIP2 011100
6 | 5 | 5 | 5 | 5 | 6
RECIP2.fmt | Floating Point Reduced Precision Reciprocal (Sequence Step 2)
RECIP2.S fd,fs,ft | MIPS-3D
RECIP2.D fd,fs,ft | MIPS-3D
RECIP2.PS fd,fs,ft | MIPS-3D
Take the result of RECIP1.fmt and iterate towards obtaining a full precision reciprocal FP value
FPR[fd] = iterate with FPR[fs] and FPR[ft]
This is the second step in the instruction sequence used to generate a full precision reciprocal result. (RECIP1.fmt instruction is the first step). The operand and result are values in format S, D, or PS.
The numeric accuracy of this operation is implementation dependent; it does not meet the accuracy specified by the IEEE 754 Floating Point standard.
It is implementation dependent whether the result is affected by the current rounding mode in FCSR. This instruction is meant to operate in RN (round to nearest) mode for the best accuracy. It is also meant to operate in the Flush to Zero (FS=0) mode. In this mode, if the incoming data is in the denormalized range, it is assumed to be zero, and if the output is in the denormalized range, it is forced to zero.
The example below shows how a full precision reciprocal result can be obtained using the RECIP1 and RECIP2 instructions. Assume that a value b is in register f0 in format S. Assume that RECIP1.fmt produces a 16-bit result. At the end of the three-instruction sequence shown below, register f3 contains the full precision 24-bit reciprocal 1/b.
    RECIP1.S  f1, f0            /* reduced precision 16-bit 1/b */
    RECIP2.S  f2, f1, f0        /* -(b * f1 - 1.0) */
    MADD.S    f3, f1, f1, f2    /* 24-bit 1/b */
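Read together with its comments, the sequence above is one Newton-Raphson step for the reciprocal: RECIP2 produces the correction term 1 - b*x and MADD forms x + x*(1 - b*x). A minimal Python sketch of that arithmetic follows. It uses Python doubles rather than hardware single precision, the seed value merely stands in for a RECIP1 result, and the function names are hypothetical.

```python
def recip2(x, b):
    """RECIP2 step as annotated in the comments: -(b * x - 1.0)."""
    return -(b * x - 1.0)

def madd(fr, fs, ft):
    """MADD fd, fr, fs, ft computes fs * ft + fr."""
    return fs * ft + fr

def refine_reciprocal(seed, b):
    """One RECIP2 + MADD pair applied to an approximation of 1/b."""
    return madd(seed, seed, recip2(seed, b))

b = 3.0
seed = 0.333                       # hypothetical low-precision seed for 1/3
x1 = refine_reciprocal(seed, b)    # relative error is roughly squared
x2 = refine_reciprocal(x1, b)      # and squared again
print(abs(1.0 - b * seed), abs(1.0 - b * x1), abs(1.0 - b * x2))
```

Each step roughly squares the relative error, so the number of correct bits about doubles per iteration. That is consistent with one step taking a 16-bit seed to a full single-precision result, and two steps being used for double precision.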
The instruction sequence to produce a double, 52-bit result is as follows:
    RECIP1.D  f1, f0            /* reduced precision 16-bit 1/b */
    RECIP2.D  f2, f1, f0        /* -(b * f1 - 1.0) */
    MADD.D    f3, f1, f1, f2    /* 32-bit 1/b */
    RECIP2.D  f4, f3, f0        /* -(b * f3 - 1.0) */
    MADD.D    f5, f3, f3, f4    /* 53-bit 1/b */
The instruction sequence to take a paired single value and produce a paired single result is as follows. Assume that register f0 holds two single values a and b in a paired single format, i.e., f0 = a | b.
    RECIP1.PS f1, f0            /* ( reduced precision 16-bit 1/a and 1/b ) */
    RECIP2.PS f2, f1, f0        /* ( -(a*f1-1.0) and -(b*f1-1.0) ) */
    MADD.PS   f3, f1, f1, f2    /* ( 24-bit 1/a and 1/b ) */
If the hardware does not implement the RECIP1.PS instruction, it is still possible to obtain a paired single result, but this takes three more instructions in the required sequence. Assume that register f0 holds a single value a and register f1 holds a single value b.
    RECIP1.S  f2, f0            /* ( f2 gets reduced precision 1/a ) */
    RECIP1.S  f3, f1            /* ( f3 gets reduced precision 1/b ) */
    CVT.PS.S  f4, f1, f0        /* ( f4 now holds the PS values b | a ) */
    CVT.PS.S  f5, f3, f2        /* ( f5 holds PS seed 1/b | 1/a ) */
    RECIP2.PS f6, f5, f4        /* ( f6 holds intermediate 1/b | 1/a ) */
    MADD.PS   f7, f5, f5, f6    /* ( f7 holds full precision PS 1/b | 1/a ) */
The fields fs, ft, and fd must specify FPRs valid for operands of type fmt. If they are not valid, the result is UNPREDICTABLE. The format of the data in the specified operand register fs must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value in the operand FPR becomes UNPREDICTABLE.
The result of RECIP2.PS is UNPREDICTABLE if the processor is executing in 16 FP registers mode.
StoreFPR(fd, fmt, RECIP_iteration(ValueFPR(fs, fmt), ValueFPR(ft, fmt)))
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation, Inexact, Invalid Operation, Overflow, Underflow
COP1 010001 | fmt | 0 00000 | fs | fd | RSQRT1 011110
6 | 5 | 5 | 5 | 5 | 6
RSQRT1.fmt | Floating Point Reduced Precision Reciprocal Square Root (Sequence Step 1)
RSQRT1.S fd, fs | MIPS-3D
RSQRT1.D fd, fs | MIPS-3D
RSQRT1.PS fd, fs | MIPS-3D
To produce a reduced-precision reciprocal of the square root of one or two FP values
FPR[fd] = 1.0 / sqrt (FPR[fs])
The reciprocal of the positive square root of the value in FPR fs is approximated and placed in FPR fd. The operand and result are values in format S, D, or PS.
The numeric accuracy of this operation is implementation dependent; it does not meet the accuracy specified by the IEEE 754 Floating Point standard. A minimum accuracy of 14 bits is recommended for the S input data format, and 23 bits for the D data format.
It is implementation dependent whether the result is affected by the current rounding mode in FCSR.
In addition, if the input to this instruction is zero, the output is not infinity, but the maximum normalized value. This property is useful for 3D graphics applications. If the input is infinity, the output is zero.
This instruction is used as the first step of an instruction sequence that can be used to produce a full precision reciprocal square root value. See the description of RSQRT2.fmt for an example of how to use this instruction in a code sequence to produce a full precision reciprocal square root result.
The fields fs and fd must specify FPRs valid for operands of type fmt. If they are not valid, the result is UNPREDICTABLE. The format of the data in the specified operand register fs must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value in the operand FPR becomes UNPREDICTABLE.
StoreFPR(fd, fmt, (1.0 / SquareRoot(ValueFPR(fs, fmt)))_{ReducedPrecision})
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation, Invalid Operation, Overflow, Inexact, Underflow, Division-by-zero
COP1 010001 | fmt | ft | fs | fd | RSQRT2 011111
6 | 5 | 5 | 5 | 5 | 6
RSQRT2.fmt | Floating Point Reduced Precision Reciprocal Square Root (Sequence Step 2)
RSQRT2.S fd, fs, ft | MIPS-3D
RSQRT2.D fd, fs, ft | MIPS-3D
RSQRT2.PS fd, fs, ft | MIPS-3D
Iterate towards obtaining a full precision reciprocal square root FP value
FPR[fd] = iterate with FPR[fs] and FPR[ft]
This is a step of iteration towards generating the full precision reciprocal square root value. The operand and result are values in format S, D, or PS.
The numeric accuracy of this operation is implementation dependent; it does not meet the accuracy specified by the IEEE 754 Floating Point standard.
It is implementation dependent whether the result is affected by the current rounding mode in FCSR.
A full precision reciprocal square root result is obtained by using the instruction sequence shown below. Assume that a value b is in register f0 in format S. Assume that RSQRT1.fmt has a 16-bit precision in the example implementation. At the end of the four-instruction sequence shown below, register f4 contains the full precision 24-bit reciprocal square root 1/sqrt(b).
    RSQRT1.S  f1, f0            /* 16-bit 1/sqrt(b) */
    MUL.S     f2, f1, f0        /* b * f1 */
    RSQRT2.S  f3, f2, f1        /* -(f1 * f2 - 1.0)/2 */
    MADD.S    f4, f1, f1, f3    /* 24-bit 1/sqrt(b) */
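Following the comments in the sequence above, MUL, RSQRT2, and MADD together perform one Newton-Raphson step for the reciprocal square root: with x as the current approximation, the result is x + x*(1 - b*x*x)/2. A minimal Python sketch of that arithmetic follows. It uses Python doubles rather than hardware single precision, the seed value merely stands in for an RSQRT1 result, and the function names are hypothetical.

```python
def rsqrt2(fs, ft):
    """RSQRT2 step as annotated in the comments: -(fs * ft - 1.0) / 2."""
    return -(fs * ft - 1.0) / 2.0

def madd(fr, fs, ft):
    """MADD fd, fr, fs, ft computes fs * ft + fr."""
    return fs * ft + fr

def refine_rsqrt(seed, b):
    """One MUL + RSQRT2 + MADD group applied to an approximation of 1/sqrt(b)."""
    product = seed * b                     # MUL:    b * seed
    half_error = rsqrt2(product, seed)     # RSQRT2: (1 - b * seed * seed) / 2
    return madd(seed, seed, half_error)    # MADD:   seed + seed * half_error

b = 2.0
exact = 2.0 ** -0.5
seed = 0.70                                # hypothetical low-precision seed
x1 = refine_rsqrt(seed, b)
x2 = refine_rsqrt(x1, b)
print(abs(seed - exact), abs(x1 - exact), abs(x2 - exact))
```

As with the reciprocal sequence, each group roughly squares the relative error, which is why one group suffices for single precision and two are used for double precision.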
The instruction sequence to produce a 52-bit result is as follows:
    RSQRT1.D  f1, f0            /* 16-bit 1/sqrt(b) */
    MUL.D     f2, f1, f0        /* b * f1 */
    RSQRT2.D  f3, f2, f1        /* -(f1 * f2 - 1.0)/2 */
    MADD.D    f4, f1, f1, f3    /* 31-bit 1/sqrt(b) */
    MUL.D     f5, f0, f4        /* b * f4 */
    RSQRT2.D  f6, f5, f4        /* -(f4 * f5 - 1.0)/2 */
    MADD.D    f7, f4, f4, f6    /* 53-bit 1/sqrt(b) */
The instruction sequence to take a paired single value and produce a paired single result is as follows. Assume that register f0 holds two single values a and b in a paired single format, i.e., f0 = a | b.
    RSQRT1.PS f1, f0            /* ( 16-bit 1/sqrt(a) and 1/sqrt(b) ) */
    MUL.PS    f2, f1, f0        /* ( a * f1 and b * f1 ) */
    RSQRT2.PS f3, f2, f1        /* ( -(f1*f2-1.0)/2 ) */
    MADD.PS   f4, f1, f1, f3    /* ( 24-bit 1/sqrt(a) and 1/sqrt(b) ) */
If the hardware does not implement the RSQRT1.PS instruction, it is still possible to obtain a paired single result, but this takes three more instructions in the required sequence. Assume that register f0 holds a single value a and register f1 holds a single value b.
    RSQRT1.S  f2, f0            /* ( f2 gets reduced precision 1/sqrt(a) ) */
    RSQRT1.S  f3, f1            /* ( f3 gets reduced precision 1/sqrt(b) ) */
    CVT.PS.S  f4, f1, f0        /* ( f4 now holds the PS values b | a ) */
    CVT.PS.S  f5, f3, f2        /* ( f5 holds PS seed 1/sqrt(b) | 1/sqrt(a) ) */
    MUL.PS    f6, f5, f4        /* ( f6 holds intermediate1 results ) */
    RSQRT2.PS f7, f6, f5        /* ( f7 holds intermediate2 results ) */
    MADD.PS   f8, f5, f5, f7    /* ( f8 holds full precision PS 1/sqrt(b) | 1/sqrt(a) ) */
The fields fs, ft, and fd must specify FPRs valid for operands of type fmt. If they are not valid, the result is UNPREDICTABLE. The format of the data in the specified operand register fs must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
StoreFPR(fd, fmt, RSQRT_iteration(ValueFPR(fs, fmt), ValueFPR(ft, fmt)))
Coprocessor Unusable, Reserved Instruction
Unimplemented Operation, Invalid Operation, Overflow, Inexact, Underflow