64-Bit Insider Volume 1 Issue 13
Software optimization techniques can require changes to any level of your application—
from the high-level architecture of a multi-component system, to the algorithms you use
to implement small functional units, down to the specific machine-code instructions
you use to execute simple statements. Although it can be a mistake to focus too much
on optimization when functionality is still immature, optimization should never be too
far from your mind. Commercial software and custom solutions routinely include
performance benchmarks in their requirements specifications.
Enhancing Hardware
Adding processors is the fastest way to gain short-term performance increases. But this
option only works if the processor is the cause of the bottleneck in your application,
which is not always the case. However, assuming that your
algorithm is optimal and all the other alternatives discussed in this newsletter have been
applied, adding processors is a valid way to increase the performance level of your
application.
A basic axiom of system performance is that Random Access Memory (RAM) storage is
more expensive, and faster, than disk storage. A common technique used to improve
performance is to eliminate or reduce disk access by expanding available memory and
keeping everything stored in RAM. 64-bit Windows® makes this technique feasible by
greatly increasing the amount of RAM that is available to an application. This
technique is a cheap but effective way to speed certain applications. For example,
databases and Web servers can make significant performance gains by moving to 64-bit
systems that have large amounts of memory.
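As a minimal sketch of this technique (the class and its behavior are illustrative, not taken from the newsletter), the code below caches file contents in RAM so that repeated reads never touch the disk. On 64-bit Windows such a cache can grow far beyond the 2 GB or 3 GB of address space available to a 32-bit process.

// Hedged sketch: a simple read-through file cache that keeps whole files in RAM.
// On a 64-bit system the cache is limited by physical memory and the paging file,
// not by the 32-bit virtual address space.
#include <fstream>
#include <iterator>
#include <map>
#include <string>
#include <vector>

class FileCache {
public:
    // Returns the file's contents, reading from disk only on the first request.
    const std::vector<char>& Get(const std::string& path) {
        std::map<std::string, std::vector<char> >::iterator it = cache_.find(path);
        if (it != cache_.end()) {
            return it->second;                      // served from RAM; no disk access
        }
        std::ifstream file(path.c_str(), std::ios::binary);
        std::vector<char> data((std::istreambuf_iterator<char>(file)),
                               std::istreambuf_iterator<char>());
        return cache_.insert(std::make_pair(path, data)).first->second;
    }

private:
    std::map<std::string, std::vector<char> > cache_;
};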
Compiler Switches
Table 1 identifies some of the compiler switches related to optimization that are
available in the C++ compilers from Microsoft and Intel.
Note: Please review the documentation for both compilers to learn the specific details
and associated caveats for each switch. The Intel compiler has additional options for
high-level and floating-point optimization. Please refer to the Intel documentation for
more information about these options.
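As a hedged, representative example (switch names are from the Microsoft C++ compiler; Table 1 and the vendor documentation remain the authoritative reference), a speed-optimized build might be invoked as shown in the comments of this trivial program:

// Hedged example: representative Microsoft C++ compiler optimization switches.
//
//   cl /O2 /GL optdemo.cpp /link /LTCG
//
//   /O2    favor speed over size
//   /O1    favor size over speed (an alternative to /O2)
//   /GL    whole-program optimization; emits the intermediate format used by LTCG
//   /LTCG  link-time code generation (passed to the linker)
#include <cstdio>

int main() {
    std::printf("built with optimization switches enabled\n");
    return 0;
}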
Both link-time code generation (LTCG) and profile-guided optimization (PGO) are
techniques that allow you to optimize your application without making changes to your
code.
With LTCG, the compiler emits object files in a proprietary intermediate format rather
than standard object code. This format allows the optimizer to consider all object files
together during optimization, which enables more effective inlining, memory
disambiguation, and other inter-procedural optimizations. Also, the executable can be
better arranged to reduce offsets for things like Thread Local Storage and to reduce
paging in large executables. Refer to the compiler documentation for a full description
of the optimizations that are enabled by LTCG.
[Figure: Linking with and without /LTCG. With /LTCG, the linker invokes the optimizer before producing the final EXE.]
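As a hedged sketch (the file and function names are hypothetical), the two translation units below illustrate the kind of opportunity LTCG creates: compiled separately, the compiler cannot inline add_tax into main because it sees only one source file at a time; built with /GL and linked with /LTCG, the optimizer sees both object files and can inline across the boundary.

// tax.cpp -- a small function defined in a separate translation unit.
double add_tax(double price) {
    return price * 1.08;   // hypothetical tax rate
}

// main.cpp -- calls add_tax in a tight loop.
// Hedged build commands (Microsoft C++ compiler):
//   cl /O2 /GL main.cpp tax.cpp /link /LTCG
// Without /GL and /LTCG, the call below stays an out-of-line call; with them,
// the optimizer can inline it even though it lives in another object file.
#include <cstdio>

double add_tax(double price);   // declared here, defined in tax.cpp

int main() {
    double total = 0.0;
    for (int i = 0; i < 1000000; ++i) {
        total += add_tax(static_cast<double>(i));
    }
    std::printf("total: %f\n", total);
    return 0;
}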
Profile-Guided Optimization
Many optimization techniques are heuristic in nature and many involve trade-offs
between image size and speed. For example, choosing whether to inline a function
depends on how large the function is and how often it is called. Small functions that are
called many times should be inlined. Large functions that are called from only a few
locations should not be inlined. At least, this practice is usually a safe bet. But you
must also consider situations that fall between these two extremes.
Without profile information, the compiler must rely on generic heuristics to decide
whether or not to inline a function, and frequently it is not clear how often a function
will be called; for example, the call may be guarded by a condition. Similar problems
exist for branch prediction (which determines the order of switch and other conditional
statements).
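As a hedged illustration (the functions are hypothetical), the call frequency in the fragment below depends entirely on the data fed to process: the compiler cannot know at build time whether the guarded call to expensive_repair is rare or dominates the run, which is exactly the kind of question a runtime profile answers.

// Hedged sketch: call frequency that is invisible to static heuristics.
#include <vector>

// Small helper; a good inlining candidate if it turns out to be called often.
static int cheap_check(int value) { return value % 7 == 0; }

// Stands in for a large, expensive routine; inlining it pays off only if it is hot.
static void expensive_repair(int& value) { value = 0; }

void process(std::vector<int>& values) {
    for (size_t i = 0; i < values.size(); ++i) {
        if (cheap_check(values[i])) {      // how often is this branch taken?
            expensive_repair(values[i]);   // only a profile reveals whether this call is hot
        }
    }
}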
“PGO provides additional information about your application’s behavior that is
gathered during a runtime analysis of the application.”
PGO is a technique whereby your program can be executed with a representative set of
data. Your program’s behavior will be monitored to determine how often pieces of code
have been executed and how often certain branches have been taken. This technique
produces a profile that the optimizer can use to make better decisions during
optimization.
Using PGO is, in general, a three-step process:
1. Build an instrumented version of your application by compiling for LTCG and
linking with the PGO instrumentation option.
2. Execute your application and feed it data or user input that represents data that
would be expected from a target user. It is important to choose this data
carefully; otherwise, you will be optimizing your program for irrelevant
scenarios.
3. Relink your application, this time letting the optimizer use the collected profile to
produce the final, optimized executable. (A command-line sketch of this workflow
follows the figure below.)
[Figure: The PGO build flow. The instrumented EXE is run against sample data to produce a profile; the linker (/LTCG:PGO) then passes the profile to the optimizer, which produces the optimized EXE.]
Again, please consult the compiler documentation for a full description of the kinds of
optimizations that can be performed during PGO.
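As a hedged sketch of the three steps above (file names are hypothetical, and the exact linker options vary by compiler version, so check your compiler’s documentation), the comments in the program below show one representative Visual C++ command sequence; the program itself is simply something worth profiling, because its hot path depends on the input data.

// Hedged example of the PGO workflow (Visual C++ style; option names may differ
// by compiler version -- consult the documentation):
//
//   1. cl /O2 /GL pgodemo.cpp /link /LTCG:PGINSTRUMENT   (build instrumented EXE)
//   2. pgodemo.exe < representative_input.txt            (run with training data)
//   3. link pgodemo.obj /LTCG:PGOPTIMIZE                 (relink using the profile)
//
// The branch below is taken at a rate that depends entirely on the input, so the
// profile gathered in step 2 tells the optimizer which path is hot.
#include <cstdio>

int main() {
    long long evens = 0, odds = 0;
    int value;
    while (std::scanf("%d", &value) == 1) {
        if (value % 2 == 0) {   // hot or cold? only the training run can say
            ++evens;
        } else {
            ++odds;
        }
    }
    std::printf("evens: %lld, odds: %lld\n", evens, odds);
    return 0;
}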
Summary
Optimization means maximizing your application’s performance by reducing its use of
expensive or slow resources without reducing the amount of work the application can
do. Although spending money on extra hardware can help, appropriate changes to how
your application is written or built can yield substantial performance gains as well.
Assuming that your algorithms are sound, compiler flags should be the first place you
look to increase your application’s performance. Both LTCG and PGO can be enabled
by using compiler flags and can substantially improve performance without changing a
single line of code.