Intel® Parallel Composer

Product Brief
Build Serial and Parallel C and C++ Applications


Intel® Parallel Composer for Multicore Systems
Intel® Parallel Composer is a comprehensive set of Intel® C++ compilers,
libraries, and debugging capabilities for developers bringing parallelism to
their Windows*-based client applications. It integrates with Microsoft Visual
Studio* 2005 and 2008, is compatible with Microsoft Visual C++*, and supports
the way developers work, protecting IDE investments while delivering an
unprecedented breadth of parallelism development capabilities, including
parallel debugging. Intel Parallel Composer is available as a stand-alone product or as part of Intel® Parallel Studio, which also includes Intel® Parallel Inspector, for analyzing threading and memory errors, and Intel® Parallel Amplifier, for performance analysis of parallel applications.

Parallel Composer Components


• Intel C++ Compilers for 32-bit processors and a cross-compiler to create 64-bit applications on 32-bit systems
• Intel® Parallel Debugger Extension, which integrates with the Microsoft Visual Studio debugger
• Intel® Threading Building Blocks (Intel® TBB), an award-winning C++ template library that abstracts threads to tasks to create reliable, portable, and scalable parallel applications. It can also be used with Visual C++.
• Intel® Integrated Performance Primitives (Intel® IPP), an extensive library of multicore-ready, highly optimized software functions for multimedia, data processing, and communications applications. Intel IPP includes both hand-optimized primitive-level functions and high-level threaded solutions such as codecs. It can be used for both Visual C++ and .NET development, in addition to working with the Intel C++ Compiler.
• Sample code and a great Getting Started Guide to get you going quickly

"Here at Trading Systems Lab, we got a 10% to 20% performance boost in the multimode trading simulator that's used in our TSL Algo Auto-Design Platform by using the C++ compiler in Intel Parallel Studio. The compatibility with Microsoft Visual C++* is great, and we're looking forward to using more parallelism features in Parallel Studio."
Mike Barna
President
Trading Systems Lab
Microsoft Visual Studio Integration and Compatibility
All Intel Parallel Composer features are seamlessly integrated into Microsoft Visual Studio 2005 and 2008. The sections below discuss and depict integration features of major capabilities, such as the Intel C++ Compiler, the Intel Parallel Debugger Extension, Intel Threading Building Blocks, and Intel Integrated Performance Primitives.

Figure 1: Intel® Parallel Composer integrates into Visual Studio*. The solution on display is from the sample code used in the Parallel Studio Getting Started Guide.

Intel C++ Compiler
The compiler offers native 32-bit development and a cross-compilation environment (32-bit host to develop 64-bit applications). You have the option of installing only the 32-bit capability, only the 64-bit capability, or both.

The Intel C++ Compiler is binary compatible with Visual C++ and, in many cases, can offer a significant performance advantage to modules or full applications built with the compiler. Another advantage of using the Intel compiler is the associated Intel Parallel Debugger Extension, which can speed debugging of parallelized code, especially code built with OpenMP. Using Intel C++ offers developers a number of advantages, but it is not required for the use of other components in Parallel Composer or the full Parallel Studio. You can use Intel Threading Building Blocks and Intel Integrated Performance Primitives with the Visual C++ compiler or with the Intel C++ compiler. You can also use the Parallel Studio memory leak, concurrency checking, and other capabilities on applications built with Visual C++ or the Intel C++ compiler. In short, there are good reasons for you to use the full Intel Parallel Studio, including a powerful, easy-to-use, compatible Intel C++ compiler, but you can continue to use Visual C++.

Easy to Get Started and Stay Connected with a Growing Parallelism Community
Intel Parallel Composer includes sample source code that is featured in an easy-to-use Getting Started Guide. There are also short videos on the Web site and referenced in the Getting Started Guide, so there is no doubt about how to use the parallelism features in Parallel Composer. Users of Parallel Composer find the guide to be worth the few minutes it takes to go through it, and they find the sample code useful in introducing parallelism concepts and techniques, which leads to productive use of the tools.

The Getting Started Guide is available from several places. For example, you can find it on our Web pages, from the Visual Studio Help menu (along with in-depth documentation), and from the Intel Parallel Studio or Intel Parallel Composer tree structure available from the Windows "Start" button. There is even a prompt for it upon completion of installation. Whether you are a beginner, a parallelism pro, or somewhere in between, it's worth a few minutes to go through the Getting Started Guide.

Once you get going, you will find it useful to join the growing community of developers taking advantage of systems based on Intel® multicore processors. Intel provides a dynamic forum for developers to exchange ideas, post comments and questions, and earn points to become an Intel® Black Belt Software Developer. We also provide a large and growing knowledge base presenting a variety of topics to developers interested in parallelism. Join the community today. Visit the parallel programming and multicore community at http://software.intel.com/en-us/articles/intel-parallel-studio/. From there, you can tap into all of its resources, including blogs, knowledge bases, downloads, and more. Feel free to explore, and don't forget to bookmark it for future reference.
Support for Lambda Functions
The Intel Compiler is the first C++ compiler to implement lambda functions in support of the working draft of the next C++ standard, C++0x. A lambda construct is almost the same as a function object in C++ or a function pointer in C. Together with closures, they represent a powerful concept, because they combine code with a scope. A closure is a function that can refer to and alter the values of bindings established by binding forms that textually include the function definition. In short, a lambda function, together with a closure, can be seen as syntactic sugar around function objects and function pointers that offers a convenient way to write function objects, or lambdas, right at the point of use.

The source code in Figure 2 below is an example of a function object created by a lambda expression. Tighter C++ and Intel TBB integration allows the simplification of the functor operator() concept by using lambda functions and closures to pass code as parameters.

void solve() {
    parallel_for(blocked_range<size_t>(0, size, 1),
        [](const blocked_range<size_t> &r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                setQueen(new int[size], 0, (int)i);
        });
}

Figure 2: Source code example of a lambda function
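For a self-contained illustration of the same pattern (a sketch with hypothetical data, not part of the brief's sample code), the snippet below passes a lambda to tbb::parallel_for to apply log() to every element of a std::vector:

#include <cmath>
#include <vector>
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

// Apply log() to each element of 'data' in parallel. Intel TBB splits the
// index range into chunks and invokes the lambda once per chunk.
void parallel_log(std::vector<float> &data) {
    tbb::parallel_for(
        tbb::blocked_range<size_t>(0, data.size()),
        [&data](const tbb::blocked_range<size_t> &r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                data[i] = std::log(data[i]);
        });
}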
OpenMP 3.0
OpenMP is an industry standard for portable multithreaded application development. It is effective at fine-grain (loop-level) and large-grain (function-level) threading. OpenMP 3.0 supports both data and, now, task parallelism, using a directives approach, which provides an easy and powerful way to convert serial applications into parallel applications, enabling potentially big performance gains from parallel execution on multicore and symmetric multiprocessor systems.

When an application that has been written and built using OpenMP is run on a system with just one processor, the results are the same as those you would get from unmodified, serial-execution code. This makes it easier for you to make incremental code changes while maintaining serial consistency. Because only directives are inserted into the code, it is possible to make incremental code changes and still maintain a common code base for your software as it runs on systems that still have only one processor.

OpenMP is a single source code solution that supports multiple platforms and operating systems. There is also no need to "hard-code" the number of cores into your application, because the OpenMP runtime chooses the right number for you.
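To make the directives approach concrete, here is a minimal sketch (not from the brief; the function and data are hypothetical). Built with /Qopenmp using the Intel C++ Compiler (or /openmp with Visual C++), the loop runs across all available cores; built without OpenMP support, the pragma is ignored and the same results are produced serially.

// Scale every element of 'a' by 'factor'. Iterations are independent,
// so a single OpenMP directive is enough to split the loop across threads.
void scale(float *a, int n, float factor) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] *= factor;
}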
same as those you would get from unmodified, serial-execution code.
This makes it easier for you to make incremental code changes while
maintaining serial consistency. Because only directives are inserted
into the code, it is possible to make incremental code changes and still
maintain a common code base for your software as it runs on systems
that still have only one processor.

OpenMP is a single source code solution that supports multiple


platforms and operating systems. There is also no need to “hard-
code” the number of cores into your application because the OpenMP
runtime chooses the right number for you.
Simple Concurrency Functions
The Intel® C++ Compiler in Parallel Studio offers four new keywords to help make parallel programming easier: __taskcomplete, __task, __par, and __critical. For your application to benefit from the parallelism made possible by these keywords, you specify the /Qopenmp compiler option and recompile, which links in the appropriate runtime support libraries that manage the actual degree of parallelism. These new keywords use the OpenMP 3.0* runtime library to deliver the parallelism, but free you from expressing it with OpenMP* pragma and directive syntax. This keeps your code more naturally written in C or C++.

The keywords are used as statement prefixes. For example, we can parallelize the function solve() using __par. Assuming that there is no overlap among the arguments, the solve() function is modified with the addition of the __par keyword. With no change to the way the function is called, the computation is parallelized. An example is presented in Figure 4:

void solve() {
    __par for (int i = 0; i < size; i++) {
        // try all positions in first row;
        // create a separate array for each recursion started here
        setQueen(new int[size], 0, i);
    }
}

Figure 4: Example of __par, one of four simple concurrency functions, new in the Intel® C++ Compiler in Intel® Parallel Studio
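The remaining keywords can be combined along these lines. This is only a rough sketch based on the statement-prefix description above (the sum() routine and the array sizes are hypothetical); consult the compiler reference for the exact syntax and semantics of __task and __taskcomplete.

void sum(int length, int *a, int *b, int *c) {
    __par for (int i = 0; i < length; i++)
        c[i] = a[i] + b[i];
}

void sum_both_halves(int *a, int *b, int *c) {
    __taskcomplete {                       // block until both enclosed tasks finish
        __task sum(500, a, b, c);
        __task sum(500, a + 500, b + 500, c + 500);
    }
}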

Intel Threading Building Blocks
Intel Threading Building Blocks (Intel TBB) is an award-winning C++ template library that abstracts threads to tasks to create reliable, portable, and scalable parallel applications. Included in Parallel Composer, Intel TBB is an STL-style template library that can be used with the Intel C++ Compiler or with Microsoft Visual C++.

Intel TBB solves three key challenges for parallel programming:
• Productivity: Simplifies the implementation of parallelism
• Correctness: Helps eliminate parallel synchronization issues
• Maintenance: Aids in the creation of applications ready for tomorrow, not just today

Advantages of using Intel TBB:
• Future-proof applications: As the number of cores (and threads) increases, application speedup increases using Intel TBB's sophisticated task scheduler
• Portability: Implement parallelism once to execute threaded code on multiple platforms
• Interoperability: Commitment to work with a variety of threading methods, hardware, and operating systems
• Active open source community: Intel TBB is also available in an open source version; opentbb.org is an active site with forums, blogs, code samples, and much more

Intel TBB offers comprehensive, abstracted templates, containers, and classes for parallelism. Figure 5 highlights the major functional groups within Intel TBB.

Figure 5: Major function groups within Intel® TBB

Intel TBB can be used to solve a wide variety of threading tasks. Figure 6 presents three common parallelism problems addressed by Intel TBB.

Problem: How to add parallelism easily
Solution: Intel TBB parallel_for command
• Straightforward replacement of for/next loops to get the advantages of parallelism
• Load-balanced parallel execution of a fixed number of independent loop iterations

Problem: Management of threads to get best scalability
Solution: Intel TBB task scheduler
• Manages the thread pool and hides the complexity of native threads
• Designed to address common performance issues of parallel programming:
  - Oversubscription: one scheduler thread per hardware thread
  - High overhead: programmer specifies tasks, not threads
  - Load imbalance: work-stealing balances the load

Problem: Memory allocation is a bottleneck in a concurrent environment
Solution: Intel TBB provides a tested, tuned, and scalable memory allocator based on a per-thread memory management algorithm
• As an allocator argument to STL template classes
• As a replacement for malloc/realloc/free calls (C programs)
• As a replacement for global new and delete operators (C++ programs)

Figure 6: Intel® TBB addresses three major parallelism issues
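As one concrete illustration of the memory-allocation row above (a sketch, assuming the Intel TBB headers are on the include path; the container and fill routine are hypothetical), the scalable allocator can simply be supplied to an STL container as its allocator template argument:

#include <vector>
#include "tbb/scalable_allocator.h"

// A vector whose storage comes from Intel TBB's scalable allocator, which
// uses per-thread memory pools to reduce contention on the global heap.
typedef std::vector<float, tbb::scalable_allocator<float> > SampleBuffer;

void fill(SampleBuffer &buf, int n) {
    buf.reserve(n);                    // allocation goes through the scalable allocator
    for (int i = 0; i < n; ++i)
        buf.push_back(static_cast<float>(i));
}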

Intel Integrated Performance Primitives (Intel IPP)
Parallel Composer includes the Intel Integrated Performance Primitives. Intel IPP is an extensive library of multicore-ready, highly optimized software functions for multimedia, data processing, and communications applications. It offers thousands of optimized functions covering frequently used fundamental algorithms in image processing, image compression, data compression, video coding, string processing, cryptography, signal processing, audio coding, speech coding, speech recognition, computer vision, image color conversion, and vector/matrix mathematics.

Intel IPP includes both hand-optimized primitive-level functions and high-level threaded solutions, such as codecs, and can be used for both Visual C++ and .NET development. All of these functions and solutions are fully thread-safe, and many are internally threaded, to help you get the most out of today's multicore processors and scale to future manycore processors.

Figure 7: Intel® Integrated Performance Primitives is included in Intel® Parallel Composer, part of Intel® Parallel Studio, and features threaded and thread-safe library functions over the wide variety of domains listed above.

Intel IPP Performance
Depending on the application and workload, Intel IPP functions can perform many times faster than the equivalent compiled C code. In the image resize example shown in Figure 8, the same operation that required 338 microseconds to execute in compiled C++ code required only 111 microseconds when Intel IPP image processing functions were used. That is roughly a 3x speedup.

Using Intel IPP in Visual Studio
It's easy to add Intel IPP support to a Microsoft Visual Studio project. Parallel Composer includes menus and dialog boxes to add Intel IPP library names and paths to a Visual Studio project. Simply click on the project name in the Solution Explorer, select the Intel Build Components Selection menu item, and use the Build Components dialog to add Intel IPP. Then just add Intel IPP code to your project, including the header and functional code. You'll notice that the Build Components dialog automatically adds the Intel IPP library names to the linker settings and a path to the Intel IPP libraries.

Figure 9: It's easy to incorporate Intel® IPP library calls into your Visual Studio* code
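For illustration only, here is a minimal sketch of what an Intel IPP call looks like once the build components are in place (the wrapper function is hypothetical; ippsAdd_32f is one of the signal-processing primitives):

#include <ipp.h>   // Intel IPP declarations; include and library paths are set by the Build Components dialog

// Add two float vectors element by element using a hand-optimized IPP primitive.
// Returns ippStsNoErr on success.
IppStatus add_vectors(const Ipp32f *a, const Ipp32f *b, Ipp32f *sum, int len) {
    return ippsAdd_32f(a, b, sum, len);
}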

In addition to C++ projects, Intel IPP can also be used in C# projects using the included wrapper classes, which support calls from C# to Intel IPP functions in the string processing, image processing, signal processing, color conversion, cryptography, data compression, JPEG, matrix, and vector math domains.

Figure 8: In this image resizing example (from 256 x 256 pixels to 460 x 332 pixels), the Intel® IPP-powered application ran in 111 msec vs. 338 msec for compiled C++ code (system configuration: Intel® Xeon®, 2.9 GHz, 2 processors, 4 cores/processor, 2 threads/processor).
Optimize Embarrassingly Parallel Loops
Algorithms that display data parallelism with iteration independence lend themselves to loops that exhibit "embarrassingly parallel" code. Parallel Composer supports three techniques to maximize the performance of such loops with minimal effort: auto-vectorization, use of Intel-optimized valarray containers, and auto-parallelization.

Parallel Composer can automatically detect loops that lend themselves to auto-vectorization. This includes explicit for loops with static or dynamic arrays, vector and valarray containers, or user-defined C++ classes with explicit loops. As a special case, implicit valarray loops can either be auto-vectorized or directed to invoke optimized Intel Integrated Performance Primitives (Intel IPP) library primitives. Auto-vectorization and use of the optimized valarray headers tune the performance of your application to take full advantage of processors that support the Streaming SIMD Extensions.

In a moment, we'll look at how to enable the Intel optimized valarray headers. But first, Figure 10 shows examples of an explicit valarray loop, an explicit vector loop, and an implicit valarray loop.

valarray<float> vf(size), vfr(size);
vector<float> vecf(size), vecfr(size);

// log function, vector, explicit loop
for (int j = 0; j < size-1; j++) {
    vecfr[j] = log(vecf[j]);
}

// log function, valarray, explicit loop
for (int j = 0; j < size-1; j++) {
    vfr[j] = log(vf[j]);
}

// log function, valarray, implicit loop
vfr = log(vf);

Figure 10: The source code above shows examples of an explicit valarray loop, an explicit vector loop, and an implicit valarray loop.

To use the optimized valarray headers, you need to specify the use of Intel Integrated Performance Primitives as a Build Component Selection and set a command-line option. To do this, first load your project into Visual Studio and bring up the project properties pop-up window. In the "Additional Options" box, simply add "/Quse-intel-optimized-headers" and click "OK." Figure 11 shows how to do this.

Figure 11: Adding the option to use the optimized header files to a command line in Visual C++

Next, from the Project menu, select Intel Parallel Composer, then select Build Components. In the resulting pop-up box, check "Use IPP" and click "OK." Figure 12 shows this step. With this done, you can rebuild your application and check its performance and behavior as you would after any change to your application.

Figure 12: Directing Visual Studio to use Intel® IPP

Auto-parallelization improves application performance by finding parallel loops capable of being executed safely in parallel and automatically generating multithreaded code, allowing you to take advantage of multicore processors. It relieves you from having to deal with the low-level details of iteration partitioning, data sharing, thread scheduling, and synchronization.

Auto-parallelization complements auto-vectorization and the optimized valarray headers, giving you optimal performance on multicore systems that support SSE. For more information on multithreaded application support, see the user guide (http://software.intel.com/en-us/intel-parallel-composer/, then click the documentation link).
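For example (a sketch, not from the brief's sample code), a loop whose iterations are fully independent, such as the one below, is a natural candidate for these optimizations; with the Intel C++ Compiler, auto-parallelization is requested with the /Qparallel option, in addition to the valarray header option described above.

// y[i] depends only on x[i] and y[i], so iterations are independent: the
// compiler can vectorize this loop with SSE and, with /Qparallel, also thread it.
void saxpy(float *y, const float *x, float a, int n) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}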
Intel Parallel Debugger Extension
Intel Parallel Composer includes the Intel Parallel Debugger Extension which, after installation, can be accessed through the Visual Studio Debug pull-down menu (see Figure 13 below).

Figure 13: Intel® Parallel Debugger Extension is accessible from the Debug pull-down menu in Microsoft Visual Studio*

The Intel Parallel Debugger Extension provides you with additional insight into, and access to, shared data and data dependencies in your parallel application. This facilitates faster development cycles and early detection of potential data access conflicts that can lead to serious runtime issues. After installing Parallel Composer and starting Visual Studio, you can use the Intel Parallel Debugger Extension whenever your application takes advantage of Single Instruction Multiple Data (SIMD) execution, and you can get additional insight into the execution flow and possible runtime conflicts if your parallelized application uses OpenMP threading.

To take advantage of the advanced features of the Intel Parallel Debugger Extension, such as shared data event detection, function re-entrancy detection, and OpenMP awareness (including serialized execution of parallelized code), compile your code with the Intel Compiler using the /debug:parallel option for debug info instrumentation.
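For example, a debug build with OpenMP support and the parallel debug instrumentation might be compiled from the command line along these lines (a sketch; icl is the Intel C++ Compiler driver, and the same options can be set in the Visual Studio project properties):

icl /Zi /Qopenmp /debug:parallel myapp.cpp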

For more information, see the Intel Parallel Debugger Extension white paper at http://software.intel.com/en-us/articles/parallel-debugger-extension/. The paper goes into many more of the details and benefits the debugger extension can bring to you, and how to best take advantage of them.

If you are evaluating debugging products for your parallel applications, consider the larger Parallel Studio product suite. It includes Intel Parallel Inspector, which features memory leak analysis and thread checking tools. It also includes Intel Parallel Amplifier, which provides hotspot (performance bottleneck) analysis and concurrency checking tools to debug for code correctness with added awareness of parallelized code and data. Intel Parallel Studio provides all of these capabilities, including the Intel Parallel Debugger Extension.

System Requirements
More specific information on installation requirements is available in the Release Notes but, at a glance, Intel Parallel Composer runs on, and develops code for, Intel® processors and compatible processors from the Intel® Pentium® 4 processor onward. Supported operating systems include Windows XP*, Windows Vista*, and Windows Server* 2003 or 2008. Parallel Composer also requires Microsoft Visual Studio 2005 or 2008, but is not supported for use with the Visual Studio Express Edition. For more information, visit http://software.intel.com/en-us/intel-parallel-composer/ and click "Release Notes."

Support
Intel Parallel Studio products include access to community forums and a knowledge base for all your technical support needs, including technical notes, application notes, documentation, and all product updates.

For more information, go to http://software.intel.com/en-us/articles/intel-parallel-studio/
Features

• Seamlessly upgrades Microsoft Visual Studio* for C/C++ parallelism. Parallel Composer integrates into Visual Studio and preserves your IDE investment, while adding parallelism capabilities.
• The Intel C++ Compiler is part of Parallel Composer and is compatible with Microsoft Visual C++*.
• The Intel® Parallel Debugger Extension integrates with the Microsoft debugger, enhancing Visual Studio to help find and address parallelism issues and saving time in getting applications ready for use.
• Intel® Parallel Composer includes simple concurrency functions, data-parallel arrays, and thousands of threaded library functions, which simplify threading tasks and speed application development.
• Auto-parallelization and auto-vectorization options are included, which simplify development and save time.
• Integrated array notation and data-parallel Intel® IPP functions are included, which speed audio, video, signal analysis, and other application classes.
• Intel® TBB is included, which is the most efficient way to implement parallel applications and take advantage of multicore platforms.
• Extensive documentation is provided, including code samples for getting started with parallelism and a short Getting Started Guide to get you going in just a few minutes.
• Community support. You're not alone out there. Join the growing community of developers adding parallelism to their code. Draw on the experience of others, contribute your own knowledge and experience, and win prizes while doing it.

Download a Trial Version Today


Evaluation copy available at:
www.intel.com/software/products/ParallelStudio/

Intel® Parallel Studio

Designed for today’s serial applications and tomorrow’s software innovators.


Intel brings simplified parallelism to Microsoft Visual Studio* C++ developers with a complete productivity solution designed to optimize
serial and new parallel applications for multicore and scale for manycore.

Intel® Parallel Studio: Create optimized serial and parallel applications with the ultimate all-in-one parallelism toolkit

Intel® Parallel Composer: Develop effective applications with a C/C++ compiler and advanced threaded libraries

Intel® Parallel Inspector: Ensure application reliability with proactive parallel memory and threading error checking

Intel® Parallel Amplifier: Quickly find bottlenecks and tune parallel applications for scalable multicore performance

© 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
0409/BLA/CMD/PDF 321554-001
