Talk:Advanced Vector Extensions

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Low-importance on the project's importance scale.
This article is supported by WikiProject Software (assessed as Low-importance).
This article is supported by the Computer hardware task force (assessed as Low-importance).

Introduction

The intro doesn't explain what AVX is supposed to do, i.e. its purpose. It refers only vaguely to "new features".

AVX-512

The section on AVX-512 looks like it has been copied from a news release. Maybe it should be a separate article. It needs to point out the distinction between the unnamed instruction set of Knights Corner and the AVX-512 instruction set of Knights Landing. The former uses an MVEX prefix and the latter uses an EVEX prefix. These two prefixes differ by a single bit, even for otherwise identical instructions. Therefore the two instruction sets are not mutually compatible, but both are backwards compatible with AVX2. Does anybody have info about the fate of the Knights Corner instruction set? Is it obsolete or will both lines be continued? Afog 09:48, 2 October 2013 (UTC)

Completely rewritten

I have rewritten the whole page to fix the issues discussed below and because the article had the tag:

AES, PCLMUL and FMA are separate instruction sets which I have put into separate articles. I have added information about AMD support, operating system support and many technical details. Afog (talk) 11:50, 4 June 2009 (UTC)

extended precision?

I want to clarify that AVX does 4x64-bit (double precision) and 8x32-bit (single precision) but NOT 2x128-bit (extended precision). The docs seem to indicate this, as nothing is mentioned about 128-bit floating-point numbers. Can an expert please verify this statement, as it is critical to understanding AVX?

Gene Thomas (talk) 03:08, 22 May 2009 (UTC)

Note: Extended double precision is 80 bits in size, but is often stored as 128-bit for alignment. True 128-bit floating point numbers are "quadruple precision". I have no idea if AVX supports either of these things. Aaronfranke (talk) 19:40, 18 September 2019 (UTC)

AVX does not support new data types, i.e. only the 32 and 64-bit floating-point numbers remain supported. F16C adds support for converting to/from 16-bit floating-point numbers. 80 or 128-bit floating-point numbers are not natively supported. 2A02:2168:84E0:CE00:D38C:F7B5:4F24:FDE5 (talk) 16:06, 1 November 2024 (UTC)
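The element counts discussed in this thread follow directly from dividing the 256-bit register width by the element width. A minimal sketch of that arithmetic, in which the helper name `lanes` is made up for illustration and is not part of any AVX API:

```python
# Width in bits of one AVX (YMM) register.
YMM_BITS = 256


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of packed elements of element_bits that fit in one register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits


single = lanes(YMM_BITS, 32)  # packed single precision (32-bit floats)
double = lanes(YMM_BITS, 64)  # packed double precision (64-bit floats)
print(single, double)
```

This only models how many elements fit in a register; as noted above, AVX defines floating-point arithmetic for the 32-bit and 64-bit formats only.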

why does AVX look like altivec?

IBM calls AltiVec VMX, and never really believed in AltiVec until Apple and Nintendo insisted on it for the Gecko and the later G5 design, which was upgraded to a 256-bit SIMD on the G5 and a 128-bit SIMD on the GameCube and Wii. Personally I think Apple might have had something to do with AVX, but that's just speculation. Markthemac (talk) 01:45, 9 June 2008 (UTC)

FMA

The article states that 1) the published AVX instruction set includes FMA instructions, and 2) FMA will appear in a future extension of the instruction set. There's a contradiction here, please clarify. --85.140.239.250 (talk) 20:22, 5 December 2008 (UTC)

Power efficient, idle power usage is insignificant

This is supposedly due to the new instructions? I think a link to source material is needed here, or at least a rewrite, as I generally understand the term idle to imply that the processor is doing nothing! As I understand it, power usage during idle NOP instructions is a function of the power control unit within the CPU and not due to an instruction, which in an Intel CPU is a microcode op or series of microcode ops.

I would suggest that the intended meaning is that a future CPU implementing AVX will have enhanced power control over these new instructions, shutting off unused units during AVX instruction execution. - Preceding unsigned comment added by 86.13.171.234 (talk) 00:16, 10 January 2009 (UTC)

The phrase "idle power usage is insignificant" refers to the power usage caused by leakage current of the extra transistors added to implement the AVX logic in the processor. Transistors use power even when they are not switching. All modern x86 processors shut off idle units by gating the clock signal into the unit. When the clock signal is turned off, power consumption of the unit will be due solely to transistor leakage current. The power consumed by the AVX unit when the AVX unit is idle is an insignificant portion of the total power consumed by the processor. Typically, this would mean below 1% of total power. Ksavage9 (talk) 20:50, 20 April 2009 (UTC)

AMD reaction

Please incorporate in article info from http://forums.amd.com/devblog/blogpost.cfm?catid=208&threadid=112934 - Preceding unsigned comment added by 83.167.112.66 (talk) 14:11, 6 May 2009 (UTC)

Compiler support

Free Pascal is developing support for AVX within their SVN repository. I don't know if that's applicable or not to mention. PrincessSchala (talk) 08:09, 14 March 2012 (UTC)

three-operand instruction for sqrt?

Maybe someone can explain what sqrt with 3 operands means? xmm1=xmm2=sqrt(xmm3)?

 vsqrtsd xmm1, xmm2, xmm3

I'm disappointed - no fast exp(), cos() etc. Unlike CUDA. Intel really has a problem. Oh, and have to get a new operating system to use the thing. Oh, really. - Preceding unsigned comment added by 113.190.231.252 (talk) 11:51, 21 July 2012 (UTC)

It computes a square root of the FP64 element in the lower half of xmm3, then combines it with the FP64 element in the upper half of xmm2 and puts the result in xmm1. SSE sqrtsd used to do the same, only the first two arguments were the same register (that is, the square root was placed in the lower element of the target register, upper element unaffected). See SDM for instruction descriptions.

More complex math functions are more expensive to implement in hardware, even division and square root are difficult. And no, the fact that there are such functions in CUDA runtime does not mean they are actually implemented as dedicated hardware instructions. 188.32.106.30 (talk) 22:56, 9 July 2022 (UTC)
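The operand roles described in the reply above can be modelled in a few lines. This is only an illustrative sketch: registers are represented as hypothetical `(low, high)` tuples of Python floats, only the low 128 bits are modelled, and the function names simply mirror the mnemonics rather than any real library.

```python
import math


def vsqrtsd(src1, src2):
    """Model of `vsqrtsd dst, src1, src2`; returns the new value of dst.

    Registers are modelled as (low, high) tuples of 64-bit floats.
    """
    low = math.sqrt(src2[0])  # square root of the low element of the third operand
    high = src1[1]            # high element copied from the second operand
    return (low, high)


def sqrtsd(dst, src):
    """Model of legacy SSE2 `sqrtsd dst, src`: dst doubles as the first source,
    so its high element is left unchanged."""
    return vsqrtsd(dst, src)


xmm2 = (1.0, 7.0)
xmm3 = (9.0, 100.0)
xmm1 = vsqrtsd(xmm2, xmm3)  # low = sqrt(9.0), high taken from xmm2
print(xmm1)
```

So `vsqrtsd xmm1, xmm2, xmm3` is not `xmm1 = xmm2 = sqrt(xmm3)`: the second operand only supplies the upper element of the result.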

Windows XP and AVX

Does Windows XP support the AVX instructions? If not, what is the minimum Microsoft OS needed to support AVX? - Preceding unsigned comment added by 222.165.42.62 (talk) 10:49, 25 October 2012 (UTC)

Windows XP does not support AVX for the obvious reason that it is 32-bit. To use modern features of 64-bit processors, you need to use a 64-bit operating system made for 64-bit processors. Windows 7 64-bit supports AVX, but only with Service Pack 1. Aaronfranke (talk) 19:47, 18 September 2019 (UTC)

32 vs 64-bit is irrelevant. The OS needs to save and restore extra state (upper vector register halves) on context switching, and this needs to be done regardless of bitness. Windows XP does not support AVX because Microsoft did not update the kernel to support this extra state, but they did update Windows 7 SP1 (both 32 and 64-bit versions of it). 188.32.106.30 (talk) 22:47, 9 July 2022 (UTC)
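The point that OS support is about saving extra register state, not about bitness, is reflected in how software usually detects AVX: it checks the CPU feature flags and then the XCR0 register, in which the OS declares which state it saves and restores. A minimal sketch of that decision logic follows. Python cannot execute CPUID or XGETBV itself, so the raw register values are assumed to be passed in as plain integers, and the function name `avx_usable` is hypothetical.

```python
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XGETBV
CPUID1_ECX_AVX = 1 << 28      # CPU implements AVX
XCR0_SSE = 1 << 1             # OS saves and restores XMM state
XCR0_AVX = 1 << 2             # OS saves and restores the upper YMM halves


def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """Return True if AVX instructions can be used safely.

    cpuid1_ecx: ECX as returned by CPUID leaf 1 (assumed input).
    xcr0: value of XCR0 as returned by XGETBV with ECX=0 (assumed input).
    """
    if not (cpuid1_ecx & CPUID1_ECX_AVX):
        return False  # the CPU has no AVX at all
    if not (cpuid1_ecx & CPUID1_ECX_OSXSAVE):
        return False  # the OS does not manage extended state
    needed = XCR0_SSE | XCR0_AVX
    return (xcr0 & needed) == needed


cpu_with_avx = CPUID1_ECX_AVX | CPUID1_ECX_OSXSAVE
print(avx_usable(cpu_with_avx, 0b111))  # OS saves x87, SSE and AVX state
print(avx_usable(cpu_with_avx, 0b011))  # OS saves x87 and SSE state only
```

Nothing in this check depends on whether the operating system is 32-bit or 64-bit.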

AVX the successor to SSE?

Would it be correct to call AVX the successor to SSE (more precisely SSE4)? - Preceding unsigned comment added by 222.165.42.62 (talk) 04:01, 30 October 2012 (UTC)

No, but AVX2 could fulfil that role. The article states

* Suitable for floating point-intensive calculations in multimedia, scientific and financial applications (AVX2 adds support for integer operations).

and

* Improves Linux RAID software performance (required AVX2, AVX is not sufficient)[32]

To me that means that AVX isn't capable of doing integer operations, but SSE4 and AVX2 are. Thus only AVX2 can be a successor of SSE4. --91.89.138.29 (talk) 22:47, 6 October 2019 (UTC)

AVX was also able to do integer operations, but only 128 bits at a time. AVX128 is still faster than SSE though, because it has three-operand instructions. Carewolf (talk) 20:38, 8 October 2019 (UTC)

Split off AVX-512

I plan to split AVX-512 off to a separate article. The EVEX based AVX-512 has many new details and consists of several new separate extensions, so it will be better dealt with separately. This can leave this article to deal with 128/256-bit VEX encoded AVX extensions. Any comments or objections? Carewolf (talk) 20:18, 23 February 2014 (UTC)

If you ask me, go ahead. - Dsimic (talk | contribs) 23:11, 23 February 2014 (UTC)

Knights Landing out date

"Knights Landing processor scheduled to ship in 2015"

The reference provided for this sentence doesn't talk about the shipping date. I couldn't find this date yet on the Internet. - Preceding unsigned comment added by 2A00:FE00:4103:1:0:0:0:300 (talk) 08:28, 22 August 2014 (UTC)

I've resolved this issue. Aaronfranke (talk) 19:48, 18 September 2019 (UTC)

AVX-128

This article states that it's safe to use AVX-128 on OSes that support only SSE and don't support AVX.

In Intel(R) Advanced Vector Extensions Programming Reference or Intel® Architecture Instruction Set Extensions Programming Reference, section "3.2: YMM STATE MANAGEMENT" it is clearly stated that

"An OS must enable its YMM state management to support AVX and FMA extensions. Otherwise, an attempt to execute an instruction in AVX or FMA extensions (including an enhanced 128-bit SIMD instructions using VEX encoding) will cause a #UD exception."

Also, according to AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions, even 128-bit XOP instructions cause a #UD exception if an OS doesn't support YMM save/restore.

So I believe that the statement that "AVX-128 instructions that do not use YMM registers are also safe to use on operating systems without AVX-support[citation needed], since AVX-support in operating systems refers to handling YMM register state.[3]" is totally incorrect. - Preceding unsigned comment added by 109.188.120.144 (talk) 00:46, 25 October 2015 (UTC)

P.S. it makes sense, since AVX-128 instructions clear the upper half of a destination YMM register, so even AVX-128 touches all 256 bits of YMM reg. - Preceding unsigned comment added by 109.188.120.144 (talk) 09:22, 25 October 2015 (UTC)

It lacks a citation anyway, so feel free to remove it. I think it was based on the idea that the 128-bit forms don't use YMM registers. Carewolf (talk) 09:19, 26 October 2015 (UTC)
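The difference raised in the P.S. above, namely that a VEX-encoded 128-bit instruction still modifies the whole 256-bit register, can be illustrated with plain integers standing in for register contents. The two function names below are hypothetical and only model the write-back step, not any particular instruction.

```python
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1


def write_legacy_sse(ymm: int, result128: int) -> int:
    """Legacy SSE write-back: replace bits 0..127, keep bits 128..255."""
    upper = ymm & MASK256 & ~MASK128
    return upper | (result128 & MASK128)


def write_vex128(ymm: int, result128: int) -> int:
    """VEX-encoded 128-bit write-back: set bits 0..127, zero bits 128..255.

    The old register value is ignored because every bit is overwritten.
    """
    return result128 & MASK128


ymm0 = (0xABCD << 128) | 0x1234           # upper half holds live data
after_sse = write_legacy_sse(ymm0, 0x5678)
after_vex = write_vex128(ymm0, 0x5678)
print(hex(after_sse >> 128), hex(after_vex >> 128))
```

Because the VEX form changes the upper half, an OS that does not save and restore that half could not preserve it correctly across context switches, which is consistent with the manuals quoted above requiring YMM state management even for the 128-bit forms.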

Disadvantages - Frequency drop because of using some AVX instructions

Using AVX2 registers in 256-bit mode and AVX-512 can slow down the program, because the overheating protection will drop the frequency when some heavy-load AVX2 and AVX-512 instructions are used. See this article for more information. This information should be added to the Wikipedia article. --91.89.138.29 (talk) 22:53, 6 October 2019 (UTC)

You're correct on it being the over-temp and TDP limits controlling this. Everyone is using the Gold 5120 as an example; it has a TDP of 105W. The data from that is meaningless by itself as it's already a very low power chip for the core count and clocks. A comparison vs. the 5117F (400MHz lower turbo and 200MHz lower base but 8W higher TDP) and the performance chip: 6132 (Base 2.6GHz / Turbo 3.7GHz, TDP 140W) which are both 14 cores released at the same time as the 5120 would probably show that heavy curve at 14 cores utilized evening out or disappearing. With heavy enough cooling (workstation with water cooling rather than regular low profile heat sink you'd slap on a 105W chip), the 140W chip can probably handle running all cores at the fixed frequency drop ratio of 300MHz without exceeding the turbo TDP (likely 180W or in that range) or with very little change at all cores utilized. It's impossible to tell though since everybody apparently took one set of results and jumped to conclusions without testing anything else. Pretty soon they'll probably start showing up cheap enough on eBay to do actual tests but it's understandable why people weren't rushing out to go buy multiple $1500-$2000 processors to get real results. I'm just basing this on how the whole thing actually works, and it doesn't take into account the number of cores used except as the basis for max turbo clock with that many cores and the offset. That's purely a power issue and unless somebody does the microcode rollback and write-out of the adjustable clock config bits on a motherboard that will enable overclock settings with this flipped on so the TDP can be set higher they won't get that data out of a 5120 by itself. --A Shortfall Of Gravitas (talk) 09:07, 18 July 2021 (UTC)

AVX's three-operand format and instructions with general purpose registers in AVX2?

The article claims "AVX's three-operand format is limited to the instructions with SIMD operands (YMM), and does not include instructions with general purpose registers (e.g. EAX). Such support will first appear in AVX2.[5]". Is this valid? Does AVX2 support instructions in 3-operand format with general purpose registers? --91.89.138.29 (talk) 22:57, 6 October 2019 (UTC)

Indeed, neither AVX nor AVX2 provides 3-operand format instructions for general purpose registers. I have removed the last sentence to correct it. 110.22.247.167 (talk) 09:20, 21 April 2021 (UTC)

Are "XMM"/etc acronyms for something?

Do XMM and YMM stand for something? Jimw338 (talk) 22:04, 19 August 2020 (UTC)

AFAIK it is a variant on the former MMX(0-9) registers, that stood for multimedia extension registers. XMM doesn't stand for anything, but is just supposed to look similar but different. And similarly with YMM and ZMM. Carewolf (talk) 17:40, 21 August 2020 (UTC)

Actually, MMX registers were called MM(0-7). So XMM(0-7) are eXtended versions of those, being double the width. YMM and ZMM simply reiterate on that. 188.32.106.30 (talk) 22:38, 9 July 2022 (UTC)

Death Stranding&AVX?

I saw on Steam a comment re AVX being required for the final boss fight (???) for Death Stranding on PC. Anyone know more about this? E.g. should it be included in the list, and if so for which version?

No, it shouldn't. AVX has been available in the majority of processors for almost 10 years, I'd be shocked if everybody isn't enabling at least the base version of it which doesn't have any clock penalties, and Death Stranding's bare minimum processor requirement supports AVX. I have no idea why or what they were mentioning it for, aside from probably being mad that their 10 year old hardware didn't mess up on them until the last boss when the minimum was there for a reason. If it required AVX-512 (but doesn't list Skylake as a hard requirement) that would be a different matter but still one that belongs on the page for that game. --A Shortfall Of Gravitas (talk) 07:39, 18 July 2021 (UTC)

Re: Downclocking

The percentages listed in that section aren't a reflection of how the processors actually handle things. A fixed ratio multiplier (multiplied by the processor bclk reference, so usually 100 MHz unless somebody messed with it) is used to calculate the frequency drop, so any percentages are only relevant for whatever specific speed processor the person testing them tested on and at unmodified clock settings. See the link to the XTU guide I posted in that section. I'm not sure how to re-word it correctly because I don't feel like digging through somebody's findings for their specific model and figuring out the pre-set ratios it used, but for example on Broadwell-E the AVX2 drop ratio is 2x by default, which results in a drop of 200 MHz below the TB3 frequency, and it can be changed in either the BIOS or XTU given specific cooling and possibly slight core voltage increase if the processor is being overclocked to begin with. There is no curve built in of "more cores = lower speed" with AVX. The 5117 has the same 105W TDP but is clocked at 2.0 / 2.8 instead of 2.2 / 3.2, which is kind of suspicious.

Anyway, on an i7-6950x Broadwell-E with the AVX2 ratio set to the default of 2x, the processor TDP ignored because of sufficient water-cooling, and all cores set to a 4.2GHz turbo ratio, every core runs at 4.0GHz because there's no TDP to exceed. Even with TDP left on the defaults there's no real drop because the turbo max defaults to 185W and is heat-limited at that clock speed. Raising it even slightly would produce different results either from heat or exceeding max turbo TDP since voltages don't scale linearly with clock increase. Changing that ratio to 0x results in instability unless core voltages are raised, but since running all cores at 4.2GHz is much higher than the intended one core max turbo for that processor in the first place there's no real point in doing so. Likewise I suspect if I lowered my TDP to 105W via BIOS or XTU I'd immediately start seeing that kind of rapid tanking of performance with high all-core AVX2 usage regardless of whether the ratio offset was set or not. I'm too lazy to do it and original research is useless here anyway (unless you post it on stackexchange apparently), but that's just how the TDP and thermal limiting work on these things. It's a huge part of why everyone can sell these processors with massive turbo clocks and unlocked ratios and not worry about anyone frying them even though (mostly) nobody really understands what exactly it is they're changing or how any of it works. If they do, well, running your memory at the XMP speed it and the processor itself in the case of AMD was advertised at instead of its base frequency already voided the warranty on the processor so Intel / AMD don't really need to worry too much.

It doesn't have AVX512 so I can't test that, but the behavior of both ratios is the same: a constant offset from the max ratio. This kind of test can't really be done on a "true" Xeon aside from some earlier v4 and possibly Skylake parts which could be forced to allow overclocking if their microcode wasn't updated, but Broadwell-E was a Xeon anyway and the same applies.

The defaults can be different between processors. On the linked Ice Lake, the information is almost certainly wrong in that the L1 downclock still exists but has the ratio set to 0x for that model and the L2 ratio is set to 1x. I suspect in XTU if someone was overclocking it they could adjust both ratios to be non-zero to keep voltages and thermals within a reasonable range while AVX code was running or set the L2 ratio to zero and increase allowed power / TDP a bit if they really needed the extra 100MHz of speed in AVX512 code.

AFAIK the whole notion of downclocking in the first place came about when the unlocked higher core count Haswell-Es were released and didn't implement this, and someone turned off both the thermal and power limits in their BIOS then tried to run heavy AVX2 code and fried their processor which now had everything telling it to turn off before it caught on fire disabled. Then they whined about it up and down the entire internet for 2 years, so Intel implemented the downclocking to avoid that in Broadwell-E and it became more necessary when power management changed with Skylake and TDP was followed less strictly anyway and AVX512 was even more of a power hog.

In any case my point is that if any hard numbers are to be left in that section they should probably be specified as examples, specific to those single processors, and in terms of base clock multipliers since percentages make little / no sense even on similar speed chips as seen from the 14C Skylake server chip downclocking immensely to keep within the low TDP. --A Shortfall Of Gravitas (talk) 08:46, 18 July 2021 (UTC)

The essential point this section should get across is that there is a separate downclocking mechanism that is independent of thermal and power regulation. In other words, the downclocking that is being discussed in this section happens solely based on execution of certain instructions. The Ice Lake reference describes that in detail. The exact amount of downclocking is configurable, but there are default clock multiplier reductions. Changing these settings is considered overclocking and is beside the point of this section. Though I agree that it would be more accurate to use clock multiplier reductions instead of percentages. 5.142.43.254 (talk) 21:02, 8 January 2022 (UTC)
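The fixed-offset arithmetic discussed in this thread can be sketched as follows. It assumes the simple model described above (frequency = ratio times bclk, with the AVX offset subtracted from the turbo ratio) and deliberately ignores thermal and power limits; the function names are illustrative only and do not correspond to any BIOS or XTU setting name.

```python
def avx_frequency_mhz(turbo_ratio: int, avx_offset: int, bclk_mhz: float = 100.0) -> float:
    """Core frequency while AVX code runs, under a fixed ratio offset."""
    if avx_offset < 0 or avx_offset > turbo_ratio:
        raise ValueError("offset must be between 0 and the turbo ratio")
    return (turbo_ratio - avx_offset) * bclk_mhz


def drop_percent(turbo_ratio: int, avx_offset: int) -> float:
    """The same drop expressed as a percentage of the non-AVX frequency."""
    return 100.0 * avx_offset / turbo_ratio


# Example from the thread: 4.2 GHz turbo ratio with the default 2x AVX2 offset.
print(avx_frequency_mhz(42, 2))

# The same 2x offset gives different percentages at different turbo ratios.
print(drop_percent(42, 2), drop_percent(32, 2))
```

The offset is a constant number of multiplier steps, so the percentage drop depends on the clock it is applied to, which is why clock multiplier reductions are a more portable way to state it than percentages.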

Introduction

It says "AVX provides new features..."

Would it be possible to provide one or two examples of the functions that get the most benefit from this instruction set(s)? Thanks Nei1 (talk) 15:17, 23 February 2023 (UTC)

There is an entire section listing software that benefits from AVX. 2A02:2168:84D9:F00:FA48:DA12:73B7:376E (talk) 17:03, 9 June 2023 (UTC)