As the name suggests, adding the alwaysinline attribute to a function or call site always causes the function to be inlined. However, LLVM seems to ignore cases where the function cannot legally be inlined, for example when the caller and the callee have conflicting attributes, and inlines it anyway.

I believe this behaviour is currently broken.
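To make the ordering concrete, here is a simplified C++ model of the decision as described above. All names (`CallSite`, `Decision`, `decideCurrent`) are hypothetical and are not LLVM's actual inliner API. The only point is that the alwaysinline check short-circuits before attribute compatibility is ever consulted.

```cpp
#include <cassert>

// Hypothetical summary of what the inliner knows about one call site.
struct CallSite {
  bool alwaysInline;          // callee or call site carries alwaysinline
  bool viable;                // callee body is inlinable at all
  bool attributesCompatible;  // caller/callee attributes do not conflict
  bool withinCostThreshold;   // normal size/cost heuristics say yes
};

enum class Decision { Inline, DontInline };

// Models the behaviour described in this post: for alwaysinline calls,
// attribute conflicts are never looked at.
Decision decideCurrent(const CallSite &cs) {
  if (!cs.viable)
    return Decision::DontInline;
  if (cs.alwaysInline)
    return Decision::Inline;  // conflicts ignored here
  if (!cs.attributesCompatible)
    return Decision::DontInline;
  return cs.withinCostThreshold ? Decision::Inline : Decision::DontInline;
}
```

With this ordering, a call whose attributes conflict is still inlined as soon as alwaysinline is present, which is the behaviour being questioned.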

The LangRef documentation [1] for alwaysinline says:

This attribute indicates that the inliner should attempt to inline this function into callers whenever possible, ignoring any active inlining size threshold for this caller.

The Clang documentation [2] for __attribute__((always_inline)) says:

Inlining heuristics are disabled and inlining is always attempted regardless of optimization level.
[…]
This attribute does not guarantee that inline substitution actually occurs.

This is more lenient than the description in the GNU documentation of the attribute [3], which says:

Failure to inline such a function is diagnosed as an error

The above suggests that Clang/LLVM is allowed not to inline when inlining would be incorrect, and possibly also to emit an error, but LLVM doesn’t take this approach.

For some attributes, inlining across a mismatch would result in broken code: for example conflicting target-features, conflicting strictfp attributes, or conflicting pointer-authentication attributes. My actual motivation is to avoid inlining functions with a different ‘streaming-mode’, which is a property specific to AArch64 SME with its own set of function attributes. Whether such a function can be inlined requires knowledge of the body of the to-be-inlined function, which isn’t easily analyzed in the front-end.

For some of the attributes, I can imagine that we may still want to inline even when they’re not compatible, such as noprofile or even the sanitize_* attributes, as there might be a desire to balance runtime performance and functionality. But I’m still not entirely sure this is desired/correct behaviour.

So I’d like to collect some input on what the desired functionality actually is.

  • Do we want to change the behaviour such that all attributes must be compatible for a function to be inlined, regardless of the alwaysinline attribute? Or do we want to distinguish between conflicting attributes that always prevent inlining and conflicting attributes that can be overridden when the alwaysinline attribute is set?

  • Orthogonal to this is whether to emit an error diagnostic to the user when LLVM cannot inline. Do we want to do this? Or do we want to silently avoid inlining (possibly with an OptimizationRemark)?
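The two options in the first question can be sketched as a policy over a per-attribute classification. The split below into “hard” conflicts (target-features, strictfp, pointer authentication, SME streaming mode) and “soft” conflicts (noprofile, sanitize_*) follows the tentative examples in this post. The enums and `mayInline` are hypothetical, not an existing LLVM interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical categories of caller/callee attribute mismatches.
enum class Conflict {
  TargetFeatures,  // hard: callee may use instructions the caller lacks
  StrictFP,        // hard: floating-point environment assumptions differ
  PointerAuth,     // hard: pointer signing schemes differ
  StreamingMode,   // hard: AArch64 SME streaming vs non-streaming
  NoProfile,       // soft: affects instrumentation only (per the post)
  Sanitizer        // soft: affects sanitizer instrumentation only (per the post)
};

// Would inlining despite this conflict produce broken code?
bool isHard(Conflict c) {
  switch (c) {
  case Conflict::TargetFeatures:
  case Conflict::StrictFP:
  case Conflict::PointerAuth:
  case Conflict::StreamingMode:
    return true;
  case Conflict::NoProfile:
  case Conflict::Sanitizer:
    return false;
  }
  return true;  // unreachable; stay conservative
}

enum class Policy {
  AllMustMatch,              // option 1: any conflict blocks inlining
  AlwaysInlineOverridesSoft  // option 2: alwaysinline overrides soft conflicts only
};

bool mayInline(Policy p, bool alwaysInline, const std::vector<Conflict> &conflicts) {
  for (Conflict c : conflicts) {
    if (isHard(c))
      return false;  // never legal, whatever alwaysinline says
    if (p == Policy::AllMustMatch || !alwaysInline)
      return false;  // soft conflict that is not overridden
  }
  return true;
}
```

Under either policy, a hard conflict such as a streaming-mode mismatch blocks inlining even with alwaysinline; the policies differ only in how soft conflicts are treated.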

Any thoughts?

[1] LLVM Language Reference Manual — LLVM 19.0.0git documentation
[2] Attributes in Clang — Clang 19.0.0git documentation
[3] Common Function Attributes (Using the GNU Compiler Collection (GCC))

2 Likes

Generally speaking, yes, alwaysinline should be ignored if the inlining would not be legal due to attributes.

In practice, actually getting there will take some work. This change has been attempted (I think in the LLVM 17 cycle?) and was reverted due to a large amount of fallout that was mostly related to target feature mismatches: Failing to inline SIMD builtins will usually have a catastrophic performance effect. The fact that these builtins tend to be marked alwaysinline papers over cases where our analysis of whether inlining is valid (based on target features) is not sufficiently accurate. We should definitely move away from relying on that, but we also need to make sure we don’t break inlining in valid cases in the process (e.g. because some targets don’t implement subset handling for target features and only allow inlining for strict equality).
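The difference between subset handling and strict equality for target features can be illustrated with a small model. Features are plain strings here (e.g. "+avx2"), and both functions are hypothetical. They do not reproduce any target’s real compatibility hook; they only mirror the two behaviours described. Only the “callee is a subset of the caller” direction is safe, because the inlined code then uses only instructions the caller is already allowed to use.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

using FeatureSet = std::set<std::string>;

// Strict equality: what a target without subset handling effectively does.
bool compatibleStrict(const FeatureSet &caller, const FeatureSet &callee) {
  return caller == callee;
}

// Subset rule: the callee may be inlined if every feature it relies on is
// also enabled in the caller. std::includes needs sorted ranges, which
// std::set provides.
bool compatibleSubset(const FeatureSet &caller, const FeatureSet &callee) {
  return std::includes(caller.begin(), caller.end(), callee.begin(), callee.end());
}
```

With a generic helper (`{"+sse2"}`) called from an AVX-512 function (`{"+sse2", "+avx2", "+avx512f"}`), the subset rule allows inlining while strict equality rejects it. That is the kind of valid case that alwaysinline currently papers over.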

1 Like

On the first question, +1 to Nikita’s answer. The alwaysinline should be ignored when inlining is not legal, but there’s work required to ensure inlining is not missed in performance critical cases.

On the second question, I am not in favor of an error diagnostic. It’s possible for the same alwaysinline function to be inlined in one context and not in another. An optimization remark (ORE) is great to have, though.

Is the intent to treat alwaysinline on a function declaration and on a call site the same? I don’t think there is a way to set it per call site in current frontends (correct me if I’m wrong), so it might make sense to treat the per-call-site attribute as a “stronger” version of the declaration’s attribute, overriding whatever attributes the declaration has. For example, it might be desirable to have a noinline function that is marked alwaysinline at some call sites, although this might be achievable in other ways.

The statement-level attribute [[clang::always_inline]] can be used to mark all call sites within a statement: Attributes in Clang — Clang 19.0.0git documentation
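For reference, the two spellings look roughly like this in C++. The function-level attribute applies to every call to `scale`. Clang’s statement attribute `[[clang::always_inline]]` applies only to calls inside the statement it is attached to. The `CALLSITE_ALWAYS_INLINE` macro and the Clang 15 version check are assumptions, added so that the sketch also builds with compilers that lack the statement attribute. With those compilers, the call falls back to the normal heuristics.

```cpp
#include <cassert>

// Assumption: the statement attribute is available from Clang 15 onwards.
#if defined(__clang__) && __clang_major__ >= 15
#define CALLSITE_ALWAYS_INLINE [[clang::always_inline]]
#else
#define CALLSITE_ALWAYS_INLINE
#endif

// Declaration-level: every call to scale() requests forced inlining.
__attribute__((always_inline)) inline int scale(int x) { return x * 3; }

// No attribute on the declaration; inlining is left to the heuristics...
int offset(int x) { return x + 7; }

int compute(int x) {
  int a = scale(x);                      // requested by the declaration
  int b;
  CALLSITE_ALWAYS_INLINE b = offset(x);  // ...except at this call site
  return a + b;
}
```

The call-site form is the natural place for the “stronger” semantics suggested above, because it expresses intent for one specific caller rather than for all of them.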

If we just ignore alwaysinline in cases where inlining is determined to be prohibited, what happens in cases such as a target mismatch, where the call itself might be problematic? In Clang this is diagnosed in the front end (example), but it feels like silently ignoring the attribute could mask a problem.

In the case of a target mismatch, being able to call a function with a different target is useful (such as when the calling function performs a runtime CPU check), but seeing that in combination with alwaysinline seems like an indication of a possible problem.
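The runtime-dispatch pattern mentioned above might look like this on x86-64 with GCC or Clang. `sum_avx2` is compiled for AVX2 and is only called after `__builtin_cpu_supports` has confirmed the feature at run time. Because it is not marked always_inline, calling it from a caller compiled without AVX2 is legal. Adding `__attribute__((always_inline))` to `sum_avx2` would create exactly the combination that Clang diagnoses in the front end. The function names and the non-x86 fallback are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Portable baseline implementation.
static int sum_generic(const int *v, std::size_t n) {
  int s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += v[i];
  return s;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// Compiled with AVX2 enabled; the compiler may auto-vectorize the loop.
// Deliberately not always_inline: inlining it into a non-AVX2 caller is invalid.
__attribute__((target("avx2"))) static int sum_avx2(const int *v, std::size_t n) {
  int s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += v[i];
  return s;
}
#endif

int sum(const int *v, std::size_t n) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  if (__builtin_cpu_supports("avx2"))  // runtime CPU check guards the call
    return sum_avx2(v, n);
#endif
  return sum_generic(v, n);
}
```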

1 Like

Library functions for some ML cores often have the always_inline attribute because such targets have no call instruction. Failure to inline in this case is an error.

It looks like always_inline is treated as “inline aggressively” in many cases, rather than “inline or die”. These are different use cases, however; maybe they need different attributes.
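The distinction between “inline aggressively” and “inline or die” can be modelled as two strengths of the same request. This also speaks to the diagnostic question from the first post. An aggressive request that cannot be honoured is dropped with an optimization remark. A mandatory one becomes an error. `InlineRequest`, `Outcome` and `resolve` are hypothetical and sketch the idea in this post, not an existing LLVM interface.

```cpp
#include <cassert>
#include <string>

enum class InlineRequest { None, Aggressive, Mandatory };

enum class Outcome { Inlined, SkippedWithRemark, Error, HeuristicDecision };

struct Result {
  Outcome outcome;
  std::string message;  // text of the remark or error; empty otherwise
};

// `legal` means inlining is permitted (body viable, no hard attribute conflict).
Result resolve(InlineRequest req, bool legal, const std::string &callee) {
  switch (req) {
  case InlineRequest::None:
    return {Outcome::HeuristicDecision, ""};  // left to the cost model
  case InlineRequest::Aggressive:
    if (legal)
      return {Outcome::Inlined, ""};
    return {Outcome::SkippedWithRemark,
            "'" + callee + "' not inlined: incompatible attributes"};
  case InlineRequest::Mandatory:
    if (legal)
      return {Outcome::Inlined, ""};
    return {Outcome::Error, "'" + callee + "' must be inlined but cannot be"};
  }
  return {Outcome::Error, "unknown request"};
}
```

Because the same function can be inlinable in one context and not in another, the mandatory strength would be an explicit opt-in, for example for targets without a call instruction, rather than the default meaning of alwaysinline.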

2 Likes

I would also prefer an option to make alwaysinline produce an error if it can’t inline. The embedded language/object format I target doesn’t support adding new function target slots at module compile time, but it does have always-inline function embeds. That said, I currently detect this after the pass manager has run, when I write out the slot calls, and I wouldn’t be surprised if other projects/backends with similar needs could do something like that.

1 Like

I think there are also some trickier cases that are technically not legal to inline, but where the violation does not matter in practice, so they could be inlined. Clang produces ABI violation errors in these cases, while GCC happily inlines the function, so the compilers are inconsistent here.

Related to @nikic’s comment about missing inlining opportunities, an example I just came across in my own code is when the caller has, e.g., target("avx512f"), and the alwaysinline function has no target-specific information (see godbolt example). If the function to inline has hidden visibility and alwaysinline, my understanding is that no callable function is ever actually produced, so the ABI should not matter in the frontend. GCC happily accepts this code; Clang does not. I think Clang is pessimistically refusing to inline (and erroring) even where inlining is possible.

So less specific code could simply be inlined and then compiled with the caller’s target-specific information. I guess the other direction is more problematic.