L01-slides-Programming For Performance
January 9, 2023
The source material for the ECE 459 notes & slides is open-sourced via Github.
You can submit a pull request (changes) for me to look at and incorporate!
I’m certain you know what “programming” means. But define “performance”.
Improving on either of these will make your program “faster” in some sense.
Hopefully we could improve both metrics; sometimes we’ll have to pick one.
“Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” — Andrew S. Tanenbaum
This measures how much time it takes to do any one particular task.
Google cares, which is why they provide the 8.8.8.8 DNS servers.
Say you need to make 100 paper airplanes. What’s the fastest way of doing this?
high bandwidth
high latency
The above example makes the difference between bandwidth and latency clear.
Any improvements here may also help with the parallelized version.
On the other hand, faster sequential algorithms may not parallelize as well.
You can’t successfully make your code faster if you don’t know why it’s slow.
Chances are that you got some right and some wrong... and the ones that were
wrong were not just a little wrong, but off by several orders of magnitude.
Moral of the story is: don’t just guess at what the slow parts of your code are.
It’s okay to have a theory as a starting point, but test your theory.
If you know something that the user is going to ask for in advance, you can have
it at the ready to provide upon request.
Then putting it into Excel is simple and the report is available quickly.
Compiler optimizations (which we’ll discuss in this course) help with getting
smaller constant factors.
Sometimes you can find this type of improvement in your choice of libraries.
Use a more specialized library which does the task you need more quickly.
Libraries may be better and more reliable than the code you can write yourself.
Once upon a time, it was okay to write code with terrible performance on the
theory that next year’s CPUs would make it run acceptably.
Spending a ton of time optimizing your code to run on today’s processors was a
waste of time.
Well, those days seem to be over; CPUs are not getting much faster these days
(evolutionary rather than revolutionary change).
What if the CPU is not the limiting factor? Your code might be I/O-bound.
Buy some SSDs!
Profiling is key here, to find out what the slow parts of execution are.
Furthermore, CPUs may accept commands in x86 assembly (or whatever your
platform uses), but internally they don’t operate on those commands directly.
“The report generation has been running for three hours; I think it’s stuck.”
How do I speed up this task to get it under the 30 minute time limit?
Often, it is easier to just throw more resources at the problem: use a bunch of
CPUs at the same time.
On the other hand, it’s hard to parallelize a linked list traversal. (Why?)
All modern CPUs do this, but you can do it in your code too.
Think of an assembly line: you can split a task into a set of subtasks and
execute these subtasks in parallel.
We can also use more exotic hardware, like graphics processing units (GPUs).
You may have noticed that it is easier to do a project when it’s just you
rather than you and a team.
It’s easy to communicate the problem to all of the processors and to get the
answer back.
First, a task can’t start processing until it knows what it is supposed to process.
Also, the task needs to combine its result with the other tasks.
This is known as Amdahl’s Law, and we’ll talk about this soon.
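For reference, Amdahl’s Law: if a fraction $p$ of the work can be parallelized across $N$ processors, the overall speedup is

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

so the serial fraction $1 - p$ puts a hard ceiling on the speedup no matter how many processors you add.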
It’s already quite difficult to make sure that sequential programs work right.
Making sure that a parallel program works right is even more difficult.
Deadlock occurs when none of the threads or processes can make progress.
It gets worse. Performance is great, but it’s not the only thing.
We also care about scalability: the trend of performance with increasing load.
If the performance deteriorates rapidly with increasing load (that is, the
number of operations to do), we say it is not scalable.
Even the most scalable systems have their limits, of course, and while higher is
better, nothing is infinite.
The nature of these languages makes it hard, or even impossible, to write
code that is fast, correct, and secure.
But in many cases, writing insecure fast code isn’t the right thing.
Robert O’Callahan¹: “I cannot consistently write safe C/C++ code.” (17 July
2017)
¹ Holds a PhD in CS from Carnegie Mellon University; was Distinguished Engineer at Mozilla for
Google implements best practices, and has all the tools and developers that
money can buy!
“Try Harder”?
Expecting people to be perfect and make no mistakes is unrealistic.
What we want is to make mistakes impossible.
Wait, we know this...
A lot of the problems we frequently encounter are the kind that can be found by
Valgrind, such as memory errors or race conditions.
Other tools like code reviews and Coverity (static analysis defect-finding tool)
exist.
At compile time?
A design goal of this language is to avoid issues with memory allocation and
concurrency.
It does so by checking things at compile time that most languages don’t check
at all, or check only at runtime.