VFMSUBADD132PH/VFMSUBADD213PH/VFMSUBADD231PH - Fused Multiply-Alternating Subtract/Add of Packed FP16 Values

Opcode/ Instruction

Op/ En

64/32 bit Mode Support

CPUID Feature Flag

Description

EVEX.128.66.MAP6.W0 97 /r VFMSUBADD132PH xmm1{k1}{z}, xmm2, xmm3/m128/m16bcst

A

V/V

AVX512-FP16 AVX512VL

Multiply packed FP16 values from xmm1 and xmm3/m128/m16bcst, subtract/add elements in xmm2, and store the result in xmm1 subject to writemask k1.

EVEX.256.66.MAP6.W0 97 /r VFMSUBADD132PH ymm1{k1}{z}, ymm2, ymm3/m256/m16bcst

A

V/V

AVX512-FP16 AVX512VL

Multiply packed FP16 values from ymm1 and ymm3/m256/m16bcst, subtract/add elements in ymm2, and store the result in ymm1 subject to writemask k1.

EVEX.512.66.MAP6.W0 97 /r VFMSUBADD132PH zmm1{k1}{z}, zmm2, zmm3/m512/m16bcst {er}

A

V/V

AVX512-FP16

Multiply packed FP16 values from zmm1 and zmm3/m512/m16bcst, subtract/add elements in zmm2, and store the result in zmm1 subject to writemask k1.

EVEX.128.66.MAP6.W0 A7 /r VFMSUBADD213PH xmm1{k1}{z}, xmm2, xmm3/m128/m16bcst

A

V/V

AVX512-FP16 AVX512VL

Multiply packed FP16 values from xmm1 and xmm2, subtract/add elements in xmm3/m128/ m16bcst, and store the result in xmm1 subject to writemask k1.

EVEX.256.66.MAP6.W0 A7 /r VFMSUBADD213PH ymm1{k1}{z}, ymm2, ymm3/m256/m16bcst

A

V/V

AVX512-FP16 AVX512VL

Multiply packed FP16 values from ymm1 and ymm2, subtract/add elements in ymm3/m256/ m16bcst, and store the result in ymm1 subject to writemask k1.

EVEX.512.66.MAP6.W0 A7 /r VFMSUBADD213PH zmm1{k1}{z}, zmm2, zmm3/m512/m16bcst {er}

A

V/V

AVX512-FP16

Multiply packed FP16 values from zmm1 and zmm2, subtract/add elements in zmm3/m512/ m16bcst, and store the result in zmm1 subject to writemask k1.

EVEX.128.66.MAP6.W0 B7 /r VFMSUBADD231PH xmm1{k1}{z}, xmm2, xmm3/m128/m16bcst

A

V/V

AVX512-FP16 AVX512VL

Multiply packed FP16 values from xmm2 and xmm3/m128/m16bcst, subtract/add elements in xmm1, and store the result in xmm1 subject to writemask k1.

EVEX.256.66.MAP6.W0 B7 /r VFMSUBADD231PH ymm1{k1}{z}, ymm2, ymm3/m256/m16bcst

A

V/V

AVX512-FP16 AVX512VL

Multiply packed FP16 values from ymm2 and ymm3/m256/m16bcst, subtract/add elements in ymm1, and store the result in ymm1 subject to writemask k1.

EVEX.512.66.MAP6.W0 B7 /r VFMSUBADD231PH zmm1{k1}{z}, zmm2, zmm3/m512/m16bcst {er}

A

V/V

AVX512-FP16

Multiply packed FP16 values from zmm2 and zmm3/m512/m16bcst, subtract/add elements in zmm1, and store the result in zmm1 subject to writemask k1.

Instruction Operand Encoding

Op/En

Tuple

Operand 1

Operand 2

Operand 3

Operand 4

A

Full

ModRM:reg (r, w)

VEX.vvvv (r)

ModRM:r/m (r)

N/A

Description

This instruction performs a packed multiply-add (even elements) or multiply-subtract (odd elements) computation on FP16 values using three source operands and writes the results in the destination operand. The destination operand is also the first source operand. The notation "132", "213" and "231" indicate the use of the operands in A * B +- C, where each digit corresponds to the operand number, with the destination being operand 1; see Table 5-7.

The destination elements are updated according to the writemask.

Table 5-7. VFMSUBADD[132,213,231]PH Notation for Odd and Even Elements

Notation

Odd Elements

Even Elements

132

dest = dest*src3-src2

dest = dest*src3+src2

231

dest = src2*src3-dest

dest = src2*src3+dest

213

dest = src2*dest-src3

dest = src2*dest+src3

Operation

VFMSUBADD132PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a register

VL = 128, 256 or 512
KL := VL/16
IF (VL = 512) AND (EVEX.b = 1):
   SET_RM(EVEX.RC)
ELSE
   SET_RM(MXCSR.RC)
FOR j := 0 TO KL-1:
   IF k1[j] OR *no writemask*:
       IF *j is even*:
            DEST.fp16[j] := RoundFPControl(DEST.fp16[j]*SRC3.fp16[j] + SRC2.fp16[j])
       ELSE:
            DEST.fp16[j] := RoundFPControl(DEST.fp16[j]*SRC3.fp16[j] - SRC2.fp16[j])
   ELSE IF *zeroing*:
       DEST.fp16[j] := 0
   // else dest.fp16[j] remains unchanged
DEST[MAXVL-1:VL] := 0 

VFMSUBADD132PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a memory source

VL = 128, 256 or 512
KL := VL/16
FOR j := 0 TO KL-1:
   IF k1[j] OR *no writemask*:
       IF EVEX.b = 1:
            t3 := SRC3.fp16[0]
       ELSE:
            t3 := SRC3.fp16[j]
       IF *j is even*:
            DEST.fp16[j] := RoundFPControl(DEST.fp16[j] * t3 + SRC2.fp16[j])
       ELSE:
            DEST.fp16[j] := RoundFPControl(DEST.fp16[j] * t3 - SRC2.fp16[j])
   ELSE IF *zeroing*:
       DEST.fp16[j] := 0
   // else dest.fp16[j] remains unchanged
DEST[MAXVL-1:VL] := 0:

VFMSUBADD213PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a register

VL = 128, 256 or 512
KL := VL/16
IF (VL = 512) AND (EVEX.b = 1):
   SET_RM(EVEX.RC)
ELSE
   SET_RM(MXCSR.RC)
FOR j := 0 TO KL-1:
   IF k1[j] OR *no writemask*:
       IF *j is even*:
            DEST.fp16[j] := RoundFPControl(SRC2.fp16[j]*DEST.fp16[j] + SRC3.fp16[j])
       ELSE
            DEST.fp16[j] := RoundFPControl(SRC2.fp16[j]*DEST.fp16[j] - SRC3.fp16[j])
   ELSE IF *zeroing*:
       DEST.fp16[j] := 0
   // else dest.fp16[j] remains unchanged
DEST[MAXVL-1:VL] := 0

VFMSUBADD213PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a memory source

VL = 128, 256 or 512
KL := VL/16
FOR j := 0 TO KL-1:
   IF k1[j] OR *no writemask*:
       IF EVEX.b = 1:
            t3 := SRC3.fp16[0]
       ELSE:
            t3 := SRC3.fp16[j]
       IF *j is even*:
            DEST.fp16[j] := RoundFPControl(SRC2.fp16[j] * DEST.fp16[j] + t3 )
       ELSE:
            DEST.fp16[j] := RoundFPControl(SRC2.fp16[j] * DEST.fp16[j] - t3 )
   ELSE IF *zeroing*:
       DEST.fp16[j] := 0
   // else dest.fp16[j] remains unchanged
DEST[MAXVL-1:VL] := 0:

VFMSUBADD231PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a register

VL = 128, 256 or 512
KL := VL/16
IF (VL = 512) AND (EVEX.b = 1):
   SET_RM(EVEX.RC)
ELSE
   SET_RM(MXCSR.RC)
FOR j := 0 TO KL-1:
   IF k1[j] OR *no writemask*:
       IF *j is even:
            DEST.fp16[j] := RoundFPControl(SRC2.fp16[j]*SRC3.fp16[j] + DEST.fp16[j])
       ELSE:
            DEST.fp16[j] := RoundFPControl(SRC2.fp16[j]*SRC3.fp16[j] - DEST.fp16[j])
   ELSE IF *zeroing*:
       DEST.fp16[j] := 0
   // else dest.fp16[j] remains unchanged
DEST[MAXVL-1:VL] := 0

VFMSUBADD231PH DEST, SRC2, SRC3 (EVEX encoded versions) when src3 operand is a memory source

VL = 128, 256 or 512
KL := VL/16
FOR j := 0 TO KL-1:
   IF k1[j] OR *no writemask*:
       IF EVEX.b = 1:
            t3 := SRC3.fp16[0]
       ELSE:
            t3 := SRC3.fp16[j]
       IF *j is even*:
            DEST.fp16[j] := RoundFPControl(SRC2.fp16[j] * t3 + DEST.fp16[j] )
       ELSE:
            DEST.fp16[j] := RoundFPControl(SRC2.fp16[j] * t3 - DEST.fp16[j] )
   ELSE IF *zeroing*:
       DEST.fp16[j] := 0
   // else dest.fp16[j] remains unchanged
DEST[MAXVL-1:VL] := 0

Intel C/C++ Compiler Intrinsic Equivalent

VFMSUBADD132PH, VFMSUBADD213PH, and VFMSUBADD231PH:
__m128h _mm_fmsubadd_ph (__m128h a, __m128h b, __m128h c);
__m128h _mm_mask_fmsubadd_ph (__m128h a, __mmask8 k, __m128h b, __m128h c);
__m128h _mm_mask3_fmsubadd_ph (__m128h a, __m128h b, __m128h c, __mmask8 k);
__m128h _mm_maskz_fmsubadd_ph (__mmask8 k, __m128h a, __m128h b, __m128h c);
__m256h _mm256_fmsubadd_ph (__m256h a, __m256h b, __m256h c);
__m256h _mm256_mask_fmsubadd_ph (__m256h a, __mmask16 k, __m256h b, __m256h c);
__m256h _mm256_mask3_fmsubadd_ph (__m256h a, __m256h b, __m256h c, __mmask16 k);
__m256h _mm256_maskz_fmsubadd_ph (__mmask16 k, __m256h a, __m256h b, __m256h c);
__m512h _mm512_fmsubadd_ph (__m512h a, __m512h b, __m512h c);
__m512h _mm512_mask_fmsubadd_ph (__m512h a, __mmask32 k, __m512h b, __m512h c);
__m512h _mm512_mask3_fmsubadd_ph (__m512h a, __m512h b, __m512h c, __mmask32 k);
__m512h _mm512_maskz_fmsubadd_ph (__mmask32 k, __m512h a, __m512h b, __m512h c);
__m512h _mm512_fmsubadd_round_ph (__m512h a, __m512h b, __m512h c, const int rounding);
__m512h _mm512_mask_fmsubadd_round_ph (__m512h a, __mmask32 k, __m512h b, __m512h c, const int rounding);
__m512h _mm512_mask3_fmsubadd_round_ph (__m512h a, __m512h b, __m512h c, __mmask32 k, const int rounding);
__m512h _mm512_maskz_fmsubadd_round_ph (__mmask32 k, __m512h a, __m512h b, __m512h c, const int rounding);

SIMD Floating-Point Exceptions

Invalid, Underflow, Overflow, Precision, Denormal.

Other Exceptions

EVEX-encoded instructions, see Table 2-46, "Type E2 Class Exception Conditions."