| |

Subscribe / Log in / New account

Safer flexible arrays for the kernel

By Jake Edge
September 22, 2022

At the 2022 Linux Security Summit Europe (LSS EU), Gustavo A. R. Silva reported in on work he has been doing on "flexible" arrays in the kernel. While these arrays provide some ... flexibility ... they are also a source of bugs, which can often result in security vulnerabilities. He has been working on ways to make the use of flexible arrays safer in the kernel.

Silva has a background in embedded systems, working with both realtime operating systems (RTOS) and embedded Linux. For the last six years, he has has been working as an upstream kernel engineer. He collaborates with the Kernel Self Protection Project (KSPP) and the Linux kernel division of the Google open-source security team.

Trailing and flexible arrays

He began with an introduction to C arrays, starting with the simplest, int happy_array[10];, which declares an array holding ten elements of type int that can be indexed using the values zero to nine. happy_array is "going to remain happy as long as we access it within its boundaries". But C does not enforce those limits, so developers must do so; if they do not, they end up in what he likes to call "The Land of Possibilities", also known as undefined behavior.

A "trailing" array is one that is declared as the last field in a structure. They can have a concrete size, as with happy_array, or they can represent a "blob" of data that is tacked onto the structure at run time. For example:

    struct blob_holder {
        ...
        size_t count;
        unsigned char blob[];
    }

Typically, some element in the structure holds the length of the blob, such as count here. In this way, trailing arrays are used to build variable-length objects (VLOs) in C. So a flexible array is simply a trailing array used as a VLO; its size is determined at run time. A flexible structure is a struct that contains a flexible array as its last element. There are three ways to declare flexible arrays, Silva said. Two of those are designated as "fake" flexible arrays because they do not use the C99 empty-bracket declaration (i.e. "true" flexible arrays) as above. Many of those fake uses predate C99 entirely and they declare either zero- or one-element arrays to use as flexible arrays. This usage leads to bugs.

Declaring a one-element flexible array is a "buggy hack". The problem is that the single element is counted toward the size of the array (and enclosing structure), which can easily lead to off-by-one errors. The count field in the structure is one larger than what should be allocated, so count - 1 needs to be used consistently. When analyzing existing code that uses flexible structures of that sort, one must always consider the uses of sizeof() for the array and structure. Often that analysis will find existing off-by-one and other bugs in the code.

A zero-element fake flexible array is a GNU extension that was added to work around the lack of true flexible arrays in the language at that time. They are somewhat less buggy than the one-element arrays, since they do not contribute to the size of the enclosing structure. True flexible arrays must appear last in the structure, which is enforced by the compiler. Either of the fake flexible array variants can appear anywhere in a structure, though, which can lead to other kinds of problems, of course.

Problems

The sizeof() operator returns different values for the three variants. For the one-element variant, the array's size is that of one element of the type of the array; it is zero for the zero-element variant. But for true flexible arrays, sizeof() gives a compile-time error because the size is not known.

The first flexible-array-transformation fix that he did as part of his KSPP work shows the kind of problem that can stem from fake flexible arrays. A zero-length array was declared at the end of a structure, but later someone added a field for read-copy-update (RCU) after the flexible array. The compiler did not complain, so the bug persisted from 2011 until he fixed it in 2019. He used a true flexible array declaration (and moved it to the end); now if someone adds a new structure member at the end, the compiler will report an error.

There has been an effort to enable array-bounds checking in the compiler with the -Warray-bounds option, but the fake flexible arrays were causing too many false-positives (along with finding some real bugs). It is not uncommon for a flexible array to be indexed directly with a value that is beyond the "end" of the array. Those needed to be fixed before bounds checking could be turned on.

He fixed a simple example in mid-2021. A one-element array was being accessed with [1], which is obviously one element too far; changing it to a true flexible array got rid of the warning. Others are a bit more elaborate, but boil down to a switch to a true flexible array; removing the one-element array also got rid of a few count - 1 calculations for allocation sizes.

Flexible arrays can be the source or target of memcpy() operations and it would be nice to have them participate in the hardened memcpy() effort. When CONFIG_FORTIFY_SOURCE is enabled for the kernel, memcpy() uses the __builtin_object_size() function (with a type argument of 1) to calculate the sizes of the source and destination at run time.

For true flexible arrays, though, that function returns -1 because it cannot determine the size. Fake flexible arrays do have a size, but it turns out that __builtin_object_size() still returns -1 for those. Combining that with the behavior of sizeof() makes things all a bit confusing as he showed in his slides:

    __builtin_object_size(flex_struct->one_element_array, 1) == -1
    __builtin_object_size(flex_struct->zero_length_array, 1) == -1
    __builtin_object_size(flex_struct->flex_array_member, 1) == -1

    sizeof(flex_struct->one_element_array) == size-of-element-type
    sizeof(flex_struct->zero_length_array) == 0
    sizeof(flex_struct->flex_array_member) == ? /* Error */

Because __builtin_object_size() cannot determine a size for trailing arrays, no bounds checking is done in memcpy() (with CONFIG_FORTIFY_SOURCE) for those arrays today. What's even stranger, perhaps, is that __builtin_object_size() returns -1 for any trailing array, even if it has a specified size greater than one. Because __builtin_object_size() does not return a size for trailing arrays, even those that it ostensibly could determine, no bounds checking is done in memcpy() (with CONFIG_FORTIFY_SOURCE) for those arrays today. The reason for this __builtin_object_size() behavior is legacy code that declares trailing arrays with a fixed length—but treats them as flexible arrays. He showed a BSD version of struct sockaddr with a trailing array, char sa_data[14], that can actually hold up to 255 bytes at run time.

In order to allow memcpy() to do sanity checking on trailing arrays, this ambiguity in declarations for flexible arrays needs to be eliminated. All arrays that are meant to be used as flexible arrays should be declared as true flexible arrays using []; then, compilers can be instructed to treat fixed-length trailing arrays as regular fixed-length arrays. He pointed to a GCC bug report for a compiler change to address the problem.

Compiler flag

There is a new compiler flag in the upcoming GCC 13 and Clang 16 releases that will allow developers to set the level of strictness for flexible arrays: ‑fstrict‑flex‑arrays[=n] (often abbreviated as -fsfa). The default setting for n is 0, which means no change from today and all trailing arrays are treated as flexible by __builtin_object_size(). Values of n from 1 to 3 increase the strictness of enforcement by changing the behavior of __builtin_object_size():

-fsfa=1: only trailing arrays that are declared with [1], [0], and [] are treated as flexible arrays; __builtin_object_size() returns the proper length for others.
-fsfa=2: only trailing arrays that are declared with [0] and [] are treated as flexible arrays; __builtin_object_size() returns the proper length for others.
-fsfa=3: only trailing arrays that are declared with [] are treated as flexible arrays; __builtin_object_size() returns the proper length for any with a concrete size.

Unfortunately, the Clang developers have not (yet?) been convinced to add -fsfa=3; there is an ongoing discussion about it, Silva said. The work to transform the flexible arrays in the kernel to true flexible arrays had been going on for several years and there is still more to do. Transforming uses of zero-element arrays is fairly straightforward, but one-element arrays are more difficult to transform because they require more code inspection to look for off-by-one problems. Once that is done, and the compilers are available, memcpy() will be able to bounds-check all trailing (non-flexible) arrays, so all arrays of fixed size will finally be bounds-checked in the kernel.

So there is a path toward getting all of those arrays bounds-checked, what about checking for actual flexible arrays? It is a more challenging case, Silva said, but there are proposals for ways to handle it. The key is to identify the structure member that holds the length of the array. That could be done with an attribute on the array like the following:

    struct bounded_flex_struct {
	...
	size_t elements;
	struct foo flex_array[]
	  __attribute__((__element_count__(elements)));
    };

There are some user-space API issues to work out, however, when switching from one-element flexible arrays to true flexible arrays. The first attempt at supporting both the existing API and the new way of doing things duplicated the fields in user-facing structures and placed them inside a union so that user space could use the array one way and the kernel could use it the other:

    struct farray {
        union {
            struct {
                ... /* renamed versions of the members */
                size_t renamed_count;
                int orig_array_name[1];
            };
            struct {
                ... /* members with existing names */
                size_t count;
                int orig_array_name_flex[];
            };
        };
    };

Doing that caused a lot of code churn, so the __DECLARE_FLEX_ARRAY() helper macro was added that would go in a union that just contained the arrays:

    struct farray {
        ...
        size_t count;
        union {
            int orig_array_name[1];
            __DECLARE_FLEX_ARRAY(int, orig_array_name_flex);
        };
    };

In both cases, user space will continue to use orig_array_name, while the kernel will use orig_array_name_flex. One thing to note is that the size of the structure does not change; the one-element array will still contribute to the size of the structure.

Status, conclusions, and questions

At this point, most of the zero-length arrays in the kernel have been transformed, including handling any user-space API issues. But there is nothing stopping new ones from being added, so he asked kernel developers not to introduce new uses. Transformations for one-element arrays are still a work in progress; that work is more challenging and there is a need to ensure that the maintainers of the code being changed feel comfortable that the changes have not broken anything. To that end, he is using a variety of diff-like tools to try to verify that no significant changes have been made by the transformation process.

It is important to turn all of the kernel uses of flexible arrays into true flexible arrays—and then to ensure that no new uses of zero- or one-length flexible arrays are added, he reiterated. The security of the kernel can be significantly improved with ‑fstrict‑flex‑arrays=3, which means it is important to convince the Clang developers to support that setting. This work has already found vulnerabilities in the kernel and will surely find more as it progresses. It is going to take some time but there is a clear vision of how we get to the point where all trailing arrays, fixed-size or flexible, will be bounds-checked in memcpy().

Silva took a few comments and questions at the end of the talk. LSS EU organizer Elena Reshetova noted that when the conversion from atomic_t to recount_t was done, those developers faced a similar problem with stopping developers from adding new uses of the types they were trying to convert. They ended up integrating a test into the 0-day test robot to catch those introductions and send email. That worked well and she encouraged Silva to try something like that.

I asked what reasons the Clang developers had for opposing the strictest setting on the new compiler flag. Silva said that their position is "just don't use zero-length arrays" but he deferred to Kees Cook, who said he could speak to the "minutiae of that". The Clang folks point out that zero-length arrays are not legal C, according to the standard, so if the GNU extension allowing them is removed, zero-length arrays do not exist so the =2 level is sufficient. Adding another option to support having zero-length arrays that are not flexible arrays seems pointless within that community, Cook said.

"Unfortunately, that's not the reality of our world." When the GNU extension was added, some code used zero-length arrays as flexible arrays, while other code used them as actual arrays with no elements in order to place markers inside structures, for example. In addition, there are arrays in the kernel that typically have some fixed size but that size may fall to zero in certain configurations.

There are probably ways to work around the lack of that option for Clang, Cook said, but it would be much easier for the Clang developers to accept the reality that zero-length arrays exist and that the kernel (at least) wants to be able to stop treating them as flexible arrays. There is a flag available to warn on the use of zero-length arrays, but it produces 60,000 warnings on the kernel code, so that is not a sensible path either, Silva said. It is clear that the hope is for the Clang folks to relent on this particular point.

[I would like to thank LWN subscribers for supporting my travel to Dublin for the Linux Security Summit Europe.]

Index entries for this article
Kernel	Variable-length arrays
Security	C language
Security	Linux kernel/Hardening
Conference	Linux Security Summit Europe/2022

Safer flexible arrays for the kernel

Posted Sep 23, 2022 0:06 UTC (Fri) by ndesaulniers (subscriber, #110768) [Link] (2 responses)

https://reviews.llvm.org/D126864 for the review comments in the initial implementation of -fstrict-flex-arrays in clang.

Safer flexible arrays for the kernel

Posted Sep 23, 2022 0:09 UTC (Fri) by ndesaulniers (subscriber, #110768) [Link]

And additional color: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836#c32.

clang will implement -fstrict-flex-arrays=3

Posted Feb 1, 2023 16:28 UTC (Wed) by david.a.wheeler (guest, #72896) [Link]

FYI: After this article was published the clang folks decided to add -fstrict-flex-arrays=3. Here's the commit:
https://github.com/llvm/llvm-project/commit/7f93ae808634e...

This is expected to be in LLVM 16, at the time of this writing that should be out in a few months.
So there *is* movement on this front!

(My thanks to Nathan Chancellor for letting me know about this upcoming change to clang.)

Safer flexible arrays for the kernel

Posted Sep 23, 2022 0:53 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (6 responses)

Clearly the way forward is for all the actual zero size arrays in Linux to be written in Rust since in Rust types with zero size are a completely unremarkable idea we use all the time :D

But yes, I think Clang relenting makes the most sense. Zero Size Types are good. Standard C and C++ can't have them without ripping the mask off when it comes to pointers, but that's a price the kernel is certainly willing to pay. "Flexible" arrays are less obviously good, but they're a pragmatic extension to C, and they aren't somehow the same thing as ZSTs, so it makes little sense to implement both but then insist people can only have one or the other.

Safer flexible arrays for the kernel

Posted Sep 23, 2022 3:07 UTC (Fri) by wahern (subscriber, #37304) [Link] (4 responses)

I think there's still an active proposal for adding better array semantics to C. Two actually, but they're related. I believe both have missed the C2x cut-off, but judging by straw poll reporting still under consideration for post-C2x changes. They effectively add dependent typing, extending and improving variably-length array (VLA) semantics. See, e.g., https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2660.pdf and follow-ups N2779, N2905, N2906.

VLAs--more precisely, variably-modified types (VMTs)--are what permit C99 and later code to declare arrays like,

  size_t n = 10;
  int a[n];

with sizeof working as expected,

  printf("%zu\n", sizeof a / sizeof a[0]); // prints 10

C99 also added VLA syntax for function arguments, but it's useless on account of retained array-to-pointer decay semantics, despite this being infinitely higher on the wish list for most C programmers (IMO), who also mostly (and rightly) shun VMTs. So this is valid code,

  foo(size_t n, int a[n]) {
      ...
  }

but sizeof doesn't work as expected as `a` has still decayed to a pointer. To make things nominally consistent--but just as useless--the standard says that `int a[n]` in prototypes is treated as-if written `int a[*]`, where `*` means unspecified length. So the most obvious fix here is making VLAs in function arguments work sanely. While conceptually simple, the required changes would be backward incompatible. One proposed workaround is adding a new operator, lengthof, to complement sizeof. sizeof could still work as before, but compilers would be permitted (or required) to complain and suggest lengthof. There are still some other potential incompatibilities regarding function declaration semantics, though, that either need to be worked around or swallowed as necessary medicine.

The second prong to better array types marry VLAs and flexible array members (FAMs). At least, that's how I conceptualize of it in my head. Similar to how automatic storage VLAs (aka VMTs) *do* work, and to how function declaration VLAs *should* work, improved FMAs would look like:

  struct foo {
    size_t n;
    int a[n];
  }

This informs the compiler that the size of FAM `a` is a function of the value of member `n`. Correct me if I'm wrong, but this is literally dependent typing, albeit simple and strictly limited to this kind of declaration. In principal this feature shouldn't be too onerous to implement as it's effectively identical to how automatic variable VLAs behave today--sizeof must be computed at runtime rather than being a compile-time constant. And there aren't any backwards compatibility worries, except for sizeof (arguable as sizeof wouldn't work before, but lengthof workaround would suffice nonetheless), or regarding some proprietary extensions (IIRC, GCC might support this syntax, but not the semantics--`n` is ignored as a side-effect of some other GCC feature). One ugly issue to resolve is how to *initialize* `n`. AFAIU this is not unrelated to a similar issue in Rust regarding array initializations, though the C community would likely more be accepting of a solution that still leaves a potential foot-gun lying around (e.g. sizeof/lengthof on `a` undefined if `n' not explicitly and correctly assigned beforehand), whereas the Rust community (or at least the current Rust developers) are adamant that no gaps in soundness guarantees be introduced.

These two language changes don't per se get us to strict array bounds checking in C, but they're necessary prerequisites for being able to apply Fortify-like bounds checking ubiquitously and comprehensively. Personally I've never found __builtin_object_size very useful precisely because I have no way to comprehensively and reliably declare my array types appropriately; it's currently all just best effort and very hand wavy. I would gladly contribute hard cash (maybe even 4-figure cash, though the work is worthy of at least 6-figure renumeration) if it helped redirect GCC and clang efforts to first implementing the above extensions to VLAs. And having a proper implementation of these proposals in GCC or clang would go a long way toward their ultimate acceptance by the C committee. Contrast that with Fortify and __builtin_object_size, where they're so obviously deficient--at least, absent the above changes--that few would even seriously question why they wouldn't be under consideration by the C committee.

Safer flexible arrays for the kernel

Posted Sep 23, 2022 5:20 UTC (Fri) by kees (subscriber, #27264) [Link] (1 responses)

Yup, this is the purpose of the proposed "element_count" attribute, which can entirely bypass needing a C language spec change, letting us just get it working. With __builtin_dynamic_object_size this will significantly improve bounds checking coverage. (And also adding the "access" function attribute.)

See https://lore.kernel.org/linux-hardening/e2a0debe-e99f-225...

Safer flexible arrays for the kernel

Posted Sep 23, 2022 6:15 UTC (Fri) by jrtc27 (subscriber, #107748) [Link]

Such standard attributes would also be useful for CHERI; by default, bounds are based on the allocation, but have an opt-in mode (in fact, more than just one, but they get increasingly aggressive and standard breaking) to add subobject bounds which we enable for the pure-capability CheriBSD kernel configuration, and then have to go annotate things that don't play nicely with it to opt-out on a per-member/type basis. We have various variants in https://github.com/CTSRD-CHERI/cheribsd/blob/6a9e74b4bd73..., but for the dependent type case we just have to make it be treated like a flexible array member, so a standardised annotation for that would give us additional expressivity (and maybe some of what we had might be usefully fed back into standardised GNU attributes, since there's a lot of overlap with the goals?).

Safer flexible arrays for the kernel

Posted Feb 1, 2023 13:18 UTC (Wed) by syrjala (subscriber, #47399) [Link] (1 responses)

I wouldn't call using the array notation for function arguments entirely useless. It at last conveys to the reader the fact that the argument is supposed to be an array rather than just a pointer. I've wasted far too much time sifting through code trying to figure out exactly what one is supposed to pass in to random functions that apparently take just pointers.

OTOH it is a bit of a footgun for C newbies. Maybe the compiler could warn about sizeof(foo) when foo was declared as one of these "array that's actually a pointer" things?

I've also made an unfortunate observation about forward declarations vs. array function arguments:
struct foo;
void whatever(struct foo *stuff);
-> fine

struct foo;
void whatever(struct foo stuff[]);
-> fails to build :(

Personally I would call that a bug, but maybe it's because of some specific stupid clause in the standard.

Safer flexible arrays for the kernel

Posted Feb 1, 2023 13:30 UTC (Wed) by excors (subscriber, #95769) [Link]

> Maybe the compiler could warn about sizeof(foo) when foo was declared as one of these "array that's actually a pointer" things?

GCC and Clang already do this by default:

> warning: ‘sizeof’ on array function parameter ‘stuff’ will return size of ‘struct foo *’ [-Wsizeof-array-argument]

> warning: sizeof on array function parameter will return size of 'struct foo *' instead of 'struct foo [n]' [-Wsizeof-array-argument]

Safer flexible arrays for the kernel

Posted Sep 23, 2022 17:04 UTC (Fri) by bluss (subscriber, #47454) [Link]

ZST work fine in Rust but I would say from experience that unsafe code has to handle it carefully and a lot of corner cases in Rust and Rust libraries have to fixed through history due to this.

tangentially-related rant about flexible arrays

Posted Sep 26, 2022 19:11 UTC (Mon) by floppus (guest, #137245) [Link] (2 responses)

It's kind of crazy that it's legal to take the "sizeof" a structure with a flexible array in it. If I have

    struct foo {
        int i;
        char c[];
    } *x;

then "sizeof(x->c)" is an error (as the article says), but "sizeof(*x)" is perfectly acceptable and not even a warning.

(I've been thinking about this recently because of a bug in my own code that would have been caught by such a warning.)

I can't think of any scenario where "sizeof(*x)" or "sizeof(struct foo)" is the correct thing to write - what you probably mean to write is "offsetof(struct foo, c)". Even if "sizeof" and "offsetof" happen to give the same answer (which is hard to guarantee), "offsetof" has the advantages of being compatible with the pre-C99 hacks, AND making it apparent to the reader that you're doing something funky with memory.

tangentially-related rant about flexible arrays

Posted Feb 1, 2023 7:38 UTC (Wed) by WolfWings (subscriber, #56790) [Link] (1 responses)

...that's because sizeof(*x) is asking for the size of what's pointed to by x, which in the case of structures is the first element of the structure. So it's actually asking for sizeof(x->i) in your example.

struct foo { short int i; char c[]; } *x;

The above would cause sizeof(x[0]) or sizeof(*x) to return 2 on most platforms.

tangentially-related rant about flexible arrays

Posted Feb 1, 2023 8:14 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> ...that's because sizeof(*x) is asking for the size of what's pointed to by x, which in the case of structures is the first element of the structure. So it's actually asking for sizeof(x->i) in your example.

No, x points to the structure, the expression *x has a structure type (not short int), and sizeof(*x) is the size of that structure. A pointer to a structure is distinct from a pointer to its first field; they have different types, even though the addresses are equal. If the array member were a normal non-flexible array (char c[20]) it would be included in sizeof(*x). Likewise, if there were fields besides i and c in the structure they would be counted in sizeof(*x) with or without the flexible array. The size of the flexible array member is not known statically, but for the purpose of the sizeof operator it's treated as it if were zero-length. (But not as if it were missing; it still affects alignment.)

Safer flexible arrays for the kernel

Posted Sep 30, 2022 2:25 UTC (Fri) by mirabilos (subscriber, #84359) [Link]

You *never* use sizeof on a struct with a flexible array member. You use offsetof of that member plus the size of the extra area to allocate.

Safer flexible arrays for the kernel

Posted Feb 1, 2023 19:46 UTC (Wed) by andresfreund (subscriber, #69562) [Link]

Is there ongoing towards the the __element_count__ attribute? I didn't find anything in my initial search.

I would be quite interested in using it in postgres...

Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds