Rust UB versus C UB - and why there's less discussion of Rust UB


Posted Jul 10, 2024 11:26 UTC (Wed) by Wol (subscriber, #4433)
In reply to: Rust UB versus C UB - and why there's less discussion of Rust UB by farnz
Parent article: New features in C++26

> And, as an aside, I'd like to be picky about signed overflow; C++ could get the same general effect with unspecified behaviour, where the only permissible behaviours are wraparound or an exception being thrown.

I think this is the point I'm getting at, which is why I get so frustrated at the O_PONIES argument. You are executing code which assumes infinite precision, on hardware which can only do fixed precision. And then you place the onus on the PROGRAMMER to ensure that there are no bugs!!!

Rust may make the same "you want infinite precision" assumptions, but it also says "if you get it wrong, either the program will fail to compile, or it will crash at runtime". C/C++ just says "all bets are off" including deleting loads of code that will only ever execute if the programmer screws up - and was probably put there for the explicit purpose of detecting said screw up!!!
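A minimal sketch of what that looks like in practice (plain std Rust, nothing beyond the standard integer methods): in debug builds an overflowing `+` panics rather than silently misbehaving, and the `checked_*` family reports the failure explicitly in every build mode.

```rust
fn main() {
    // In a debug build, `i32::MAX + 1` panics with "attempt to add with
    // overflow"; in a release build it wraps unless overflow checks are
    // enabled. Either way, it is defined behavior, not UB.

    // The checked_* methods make the failure visible in all build modes:
    assert_eq!(i32::MAX.checked_add(1), None);    // overflow is reported
    assert_eq!(100i32.checked_add(1), Some(101)); // the normal case
}
```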

At the end of the day, all I want (thinking as an engineer) is that when hardware reality and programming mathematical purity collide, I get alarms going off. And if the language purists say I'm demanding O_PONIES, then that is a massive alarm going off telling anyone writing real-world systems they need to get the hell out of there.

Which is why, imnsho, I think C/C++ is rapidly heading into the Pascal world - a great language for teaching, but totally useless in real life without loads of hacks that sully its mathematical purity. The tragedy is that it didn't start out that way - as originally designed it matched reality pretty exactly - "x = a + b" just fed a and b into the cpu ADD instruction and gave you the result. If it wasn't what you expected because it had overflowed, so be it!

Rust is a real world language. Safe code is basically defined as "it does what the man on the Clapham Omnibus would expect". Code that can misbehave MUST be tucked inside the "unsafe" keyword, and then the programmer is placed on notice that the compiler will be unable to help them if they screw up. And equally importantly, the programmer is placed on notice if the code outside of "unsafe" screws up, that is a serious language bug that will get fixed, even if it breaks other code.
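The "MUST be tucked inside unsafe" part is enforced by the compiler, not by convention. A small illustrative example: dereferencing a raw pointer is one of the operations the compiler refuses to perform in safe code.

```rust
fn main() {
    let x = 42i32;
    let p = &x as *const i32; // creating a raw pointer is safe

    // Dereferencing it can misbehave (dangling, unaligned, ...), so the
    // compiler rejects `*p` outside an `unsafe` block with error E0133.
    // Inside the block, the programmer is "on notice":
    let y = unsafe { *p };
    assert_eq!(y, 42);
}
```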

Cheers,
Wol



Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 13:13 UTC (Wed) by farnz (subscriber, #17727)

The reason for "O_PONIES" is that what people ask for is not a well-defined semantic, but "do what the hardware does". When the language is meant to be portable to arbitrary hardware, "do what the hardware does" is meaningless, since it depends on what the hardware is.

And Rust is not defined as "it does what the man on the Clapham Omnibus would expect" - there are explicit semantics defined for all of the operations, and it's expected that if the Rust operation doesn't quite match the hardware operation, the compiler will fix up the difference.

C has never worked the way you describe it working - even early C compilers could do things like strength reduction, converting "x = a * 5" into "x = (a << 2) + a". The problem is that some of these changes are liked (nobody complains that the compiler output a shift and an add instead of a multiply), and some are disliked - but nobody can get even POSIX to define the meaning of the UBs that people hate, like signed overflow.
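The identity behind that particular strength reduction can be checked directly (sketch in Rust, where the operator precedence matches C's):

```rust
fn main() {
    // a * 5 == a * 4 + a == (a << 2) + a.
    // The parentheses matter: in both C and Rust, `<<` binds more loosely
    // than `+`, so `a << 2 + a` would actually parse as `a << (2 + a)`.
    for a in [0i32, 1, 7, 123, -9] {
        assert_eq!((a << 2) + a, a * 5);
    }
}
```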

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 14:23 UTC (Wed) by khim (subscriber, #9252)

> When the language is meant to be portable to arbitrary hardware, "do what the hardware" does is meaningless, since it depends on what the hardware is.

It doesn't even work for a single architecture. Because to even say “what the hardware is doing” you need a precise and unambiguous mapping from the source code to the machine code.

This essentially turns your language into an assembler and, worse, into a non-optimizing assembler, since even minor changes to the generated code may blow your program to pieces. I have even written such code myself, when working with a microcontroller small enough that it had 256-byte pages and I tried to squeeze all my code into one such page.

> And Rust is not defined as "it does what the man on the Clapham Omnibus would expect" - there are explicit semantics defined for all of the operations, and it's expected that if the Rust operation doesn't quite match the hardware operation, the compiler will fix up the difference.

Precisely. Rust program behavior is still described in terms of an abstract Rust machine, and Rust fully embraces the fact that the question of whether two such programs are equivalent cannot be answered in 100% of cases.

Rust works by embracing the fact that undecidable problems exist, while “we code for the hardware” guys just ignore that fact entirely.

> but nobody can get even POSIX to define the meaning of UBs that people hate, like signed overflow

Why do you say that? Both GCC and clang have supported the -fwrapv option for years. Signed overflow can be defined if you want it to be defined — that's not what the “we code for the hardware” guys complain about. They complain about the need to know about these things in addition to knowing what the hardware is doing — and that part cannot be solved by tweaks to the language definition: of course if you want to use the language you need to know how it works! What other alternative is there?
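The same opt-in exists in Rust, just per value rather than per translation unit: the `wrapping_*` methods (and the `std::num::Wrapping` type) define signed overflow as two's-complement wraparound, much as -fwrapv does for a whole C compilation. A sketch:

```rust
use std::num::Wrapping;

fn main() {
    // wrapping_add: overflow is defined as two's-complement wraparound.
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN);

    // Wrapping<i32> bakes the same semantics into the ordinary `+` operator.
    let x = Wrapping(i32::MAX);
    assert_eq!((x + Wrapping(1)).0, i32::MIN);
}
```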

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 14:10 UTC (Wed) by khim (subscriber, #9252)

> Rust may make the same "you want infinite precision" assumptions, but it also says "if you get it wrong, either the program will fail to compile, or it will crash at runtime". C/C++ just says "all bets are off" including deleting loads of code that will only ever execute if the programmer screws up - and was probably put there for the explicit purpose of detecting said screw up!!!

You are misinterpreting things. Again. If you want to get Rust to behave like C/C++… just write x.unchecked_add(y) and you are back in the world of undefined overflow.

And if you want you may write __builtin_add_overflow(x, y, &res); in C/C++ and voila: no UB when something overflows! It's doable, ma!
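Both directions can be shown side by side in Rust (a sketch, assuming a toolchain recent enough that `unchecked_add` is stable): `overflowing_add` plays the role of C's `__builtin_add_overflow`, while `unchecked_add` opts back into C's undefined-overflow default.

```rust
fn main() {
    // overflowing_add always yields the wrapped result plus an
    // "it overflowed" flag -- no UB on overflow, like __builtin_add_overflow.
    let (res, overflowed) = i32::MAX.overflowing_add(1);
    assert_eq!((res, overflowed), (i32::MIN, true));

    // unchecked_add is the opposite trade: overflow is UB, so the call is
    // only reachable through `unsafe` -- the Rust spelling of the C default.
    let sum = unsafe { 2i32.unchecked_add(3) }; // fine: no overflow here
    assert_eq!(sum, 5);
}
```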

> At the end of the day, all I want (thinking as an engineer) is that when hardware reality and programming mathematical purity collide, I get alarms going off.

Yup. You are asking for the one thing that compilers have never delivered and could, in fact, never deliver.

> And if the language purists say I'm demanding O_PONIES, then that is a massive alarm going off telling anyone writing real-world systems they need to get the hell out of there.

Why? You are, quite literally, asking for the impossible. At some point this becomes a source of amusement, but it's obvious who would leave, in the end: the guys who use things without crying about something they could never have will learn to deal with the fact that the compiler is always an adversary (but sometimes a friend, too) and that dreams of “benign” compilers will never materialize, while the people who want to “program for the hardware” will just eventually die off.

When the Rust developers realized that providing what you are asking for is fundamentally impossible, they divided the language in two: in the normal, safe, realm all “bad” programs are detected and rejected, and in the unsafe realm the programmer is left one-on-one with a formidable adversary, the modern optimizing compiler.

They haven't solved the impossible issue, but they have sidestepped it. And they are doing the same thing in other places, too. They accept that the existing state of affairs is not ideal, but they compromise to be able to write real-world programs using real-world compilers anyway.

While in the C/C++ realm enough developers (both C/C++ developers who use the compilers and C/C++ compiler developers who write them) don't even think about compromise.

In particular, your rants never admit that you are asking for the impossible, and never stop to consider whether some way forward other than O_PONIES may exist.

> The tragedy is that it didn't start out that way

Seriously? Because from my POV it's the exact opposite.

> "x = a + b" just fed a and b into the cpu ADD instruction and gave you the result.

Sure. And that's where the problem lies. If you define your language in terms of the instructions executed, then you have to guarantee the precise sequence of instructions generated! And that's only feasible when your compiler is primitive enough that you can predict exactly what output a given sequence of input characters will produce.

Not even assemblers always work that way: self-modifying code often needs a very specific version of the assembler to be compiled correctly.

Thinking that you may create a high-level language on that basis is sheer lunacy.

> Rust is a real world language.

Sure. But it's important to understand how the decisions that went into Rust came to be. Rust is very much the result of a compromise: if we couldn't solve some problem perfectly, then let's solve it to the best of our abilities and, maybe, revisit the decision later. There's an interesting list of things that Graydon Hoare had to give up on to ensure that Rust would succeed.

That's more-or-less the polar opposite of what the C/C++ world usually does: compromises there very rarely happen; instead, things where people couldn't agree are just thrown away (take the drama discussed there with a grain of salt, because JeanHeyd is a bit of a drama queen, but facts are facts).

> Safe code is basically defined as "it does what the man on the Clapham Omnibus would expect". Code that can misbehave MUST be tucked inside the "unsafe" keyword, and then the programmer is placed on notice that the compiler will be unable to help them if they screw up. And equally importantly, the programmer is placed on notice if the code outside of "unsafe" screws up, that is a serious language bug that will get fixed, even if it breaks other code.

Sure. And it works pretty well. But this very structure is an explicit admission that O_PONIES are impossible, that the separation of the wheat from the chaff will never happen, and that the “when hardware reality and programming mathematical purity collide, I get alarms going off” demand can never be fulfilled.

Rust's design is built on top of that acceptance. If we can't precisely separate “good” programs from “bad” programs, then let's separate them imprecisely, and let's give you two answers instead of one: with false positives in one part of your program and false negatives in the other.

And if you are willing to compromise, then the line where the compromise is drawn can be debated; but if you are not willing to compromise and just scream “gimme O_PONIES, only O_PONIES are acceptable”, then obviously a compromise will never be reached, for you are not seeking it.


Copyright © 2024, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds