Composer Brief
Once you get going, you will find it useful to join the growing
community of developers taking advantage of systems based
on Intel® multicore processors. Intel provides a dynamic forum for
developers to exchange ideas, post comments and questions, and
earn points to become an Intel® Black Belt Software Developer. We
also provide a large and growing knowledge base presenting a variety
of topics to developers interested in parallelism. Join the community
today. Visit the parallel programming and multicore community at http://software.intel.com/en-us/articles/intel-parallel-studio/. From there, you can tap into all of its resources, including blogs, knowledge bases, downloads, and more. Feel free to explore, and don’t forget to bookmark it for future reference.

Figure 1: Intel® Parallel Composer integrates into Visual Studio*. The solution on display is from the sample code used in the Parallel Studio Getting Started Guide.
Support for Lambda Functions
The Intel Compiler is the first C++ compiler to implement lambda functions in support of the working draft of the next C++ standard, C++0x. A lambda construct is almost the same as a function object in C++ or a function pointer in C. Together with closures, they represent a powerful concept, because they combine code with a scope. A closure is a function that can refer to, and alter, the values of bindings established by binding forms that textually include the function definition. In short, a lambda function, together with a closure, can be seen as syntactic sugar around function objects and function pointers that offers a convenient way to write function objects, or lambdas, right at the point of use.
The source code in Figure 2 below is an example of a function object created by a lambda expression. Tighter C++ and Intel TBB integration allows the simplification of the functor operator() concept by using lambda functions and closures to pass code as parameters.

OpenMP 3.0 Task Queuing
Programs with irregular patterns of dynamic data or complicated control structures, such as recursion, are sometimes hard to parallelize efficiently. The work queuing model in OpenMP 3.0 allows you to exploit irregular parallelism beyond what is possible with OpenMP 2.0 or 2.5.
The task pragma specifies the environment within which the enclosed units of work (tasks) are to be executed. When a task pragma is encountered, the code inside the task block is conceptually queued into the queue associated with the task. To preserve sequential semantics, there is an implicit barrier at the completion of the task. The developer is responsible for ensuring that no dependencies exist, or that dependencies are appropriately synchronized, either between the task blocks or between code in a task block and code outside of the task blocks. An example is presented below in Figure 3.
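Figure 3 itself is not reproduced in this extract. As a minimal, hedged sketch of the same idea, the classic recursive Fibonacci computation below uses the OpenMP 3.0 task pragma: each recursive call is queued as an independent unit of work, and the taskwait pragma provides the synchronization point before the results are combined. The function, the values, and the /Qopenmp build switch mentioned in the comment are illustrative assumptions, not content from the brief.

#include <cstdio>

// Illustrative only: recursive Fibonacci parallelized with OpenMP 3.0 tasks
// (built with the Intel compiler's OpenMP switch, /Qopenmp on Windows*).
long fib(int n)
{
    if (n < 2)
        return n;
    long x, y;
    #pragma omp task shared(x)   // enclosed unit of work, queued for execution
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait         // wait for both child tasks before combining
    return x + y;
}

int main()
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single           // one thread creates the root tasks
    result = fib(30);
    std::printf("fib(30) = %ld\n", result);
    return 0;
}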
Problem: How to add parallelism easily
Solution: Intel TBB parallel_for command
• Straightforward replacement of for/next loops to get the advantages of parallelism
• Load-balanced parallel execution of a fixed number of independent loop iterations
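As a hedged illustration of the lambda-plus-TBB combination described above, the sketch below replaces a plain for loop with tbb::parallel_for and passes the loop body as a lambda instead of a hand-written functor class. The scale() function, its parameters, and the array names are assumptions made for this example, not code from the brief.

#include <cstddef>
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

// Serial version: for (size_t i = 0; i < n; ++i) out[i] = in[i] * factor;
// Parallel version: the lambda captures out, in, and factor, and TBB
// load-balances the independent iterations across cores.
void scale(float* out, const float* in, std::size_t n, float factor)
{
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n),
        [=](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                out[i] = in[i] * factor;
        });
}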
Figure 8: In this image resizing example (from 256 x 256 pixels to 460 x 332 pixels), the Intel® IPP-powered application ran in 111 msec vs. 338 msec for compiled C++ code (system configuration: Intel® Xeon® processor, 2.9 GHz, 2 processors, 4 cores/processor, 2 threads/processor).

Figure 9: It’s easy to incorporate Intel® IPP library calls into your Visual Studio* code. The screenshot highlights the header that enables IPP calls and the IPP function calls themselves.
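The Figure 9 screenshot is not reproduced here. As a rough, hedged sketch of what incorporating IPP calls looks like in source, the fragment below includes the IPP umbrella header and calls a single signal-processing primitive; the particular function (ippsAdd_32f), buffer sizes, and error handling are illustrative choices, not the calls shown in the original figure.

#include <cstdio>
#include "ipp.h"     // header to enable IPP calls

int main()
{
    const int len = 1024;
    Ipp32f a[1024], b[1024], sum[1024];
    for (int i = 0; i < len; ++i) { a[i] = (Ipp32f)i; b[i] = 1.0f; }

    // IPP function call: element-wise vector add, optimized internally for SSE
    IppStatus st = ippsAdd_32f(a, b, sum, len);
    if (st != ippStsNoErr)
        std::printf("IPP error: %d\n", (int)st);
    return 0;
}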
Optimize Embarrassingly Parallel Loops
Algorithms that display data parallelism with iteration independence
lend themselves to loops that exhibit “embarrassingly parallel”
code. Parallel Composer supports three techniques to maximize the
performance of such loops with minimal effort: auto-vectorization,
use of Intel-optimized valarray containers and auto-parallelization.
Parallel Composer can automatically detect loops that lend
themselves to auto-vectorization. This includes explicit for loops
with static or dynamic arrays, vector and valarray containers, or user-
defined C++ classes with explicit loops. As a special case, implicit
valarray loops can either be auto-vectorized or directed to invoke
optimized Intel Integrated Performance Primitives (IPP) library
primitives. Auto-vectorization and use of optimized valarray headers
optimize the performance of your application to take full advantage of processors that support the Streaming SIMD Extensions.
In a moment, we’ll look at how to enable the Intel optimized valarray headers. But first, let’s look at Figure 10, which shows an example of an explicit valarray loop, vector loops, and an implicit valarray loop.
valarray<float> vf(size), vfr(size);
vector<float> vecf(size), vecfr(size);
//log function, valarray, implicit loop
vfr=log(vf);

Figure 10: The source code above shows examples of explicit valarray, vector loops, and an implicit valarray loop.
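Only the implicit valarray loop survives in the extract above; the explicit valarray and vector loops that the Figure 10 caption mentions are not visible here. The lines below are a hedged reconstruction of what such explicit loops typically look like, reusing the same variable names as an assumption; they are not the original figure content.

#include <cmath>
#include <cstddef>
#include <valarray>
#include <vector>

void log_loops(std::size_t size)
{
    std::valarray<float> vf(size), vfr(size);
    std::vector<float>   vecf(size), vecfr(size);
    for (std::size_t i = 0; i < size; ++i) { vf[i] = vecf[i] = float(i + 1); }

    // explicit valarray loop (a candidate for auto-vectorization)
    for (std::size_t i = 0; i < size; ++i)
        vfr[i] = std::log(vf[i]);

    // explicit vector loop (also a candidate for auto-vectorization)
    for (std::size_t i = 0; i < size; ++i)
        vecfr[i] = std::log(vecf[i]);
}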
To use the optimized valarray headers, you need to specify the use of Intel Integrated Performance Primitives as a Build Component Selection and set a command-line option. To do this, first load your project into Visual Studio and bring up the project properties pop-up window. In the “Additional Options” box, simply add “/Quse-intel-optimized-headers” and click “OK.” Figure 11 presents a picture of how to do this.

Figure 11: Adding the command to use optimized header files to a command line in Visual C++*

Next, from the Project menu, select Intel Parallel Composer, then select Build Components. In the resulting pop-up box, check “Use IPP” and click “OK.” Figure 12 presents a picture of this. With this done, you can rebuild your application and check it for performance and behavior as you would when you make any change to your application.

Auto-parallelization improves application performance by finding loops capable of being executed safely in parallel and automatically generating multithreaded code, allowing you to take advantage of multicore processors. It relieves the user from having to deal with the low-level details of iteration partitioning, data sharing, thread scheduling, and synchronization.

Auto-parallelization complements auto-vectorization and the use of optimized valarray headers, giving you optimal performance on multicore systems that support SSE. For more information on multithreaded application support, see the user guide (http://software.intel.com/en-us/intel-parallel-composer/, then click the documentation link).
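As a hedged sketch (the switch is not named in this brief), a loop like the one below is a typical candidate for the auto-parallelizer, which is normally enabled on Windows* with the Intel compiler’s /Qparallel option; the function and parameter names are illustrative assumptions.

// Independent iterations with no cross-iteration dependencies: the compiler
// can partition them across threads automatically (e.g., icl /Qparallel).
void saxpy(float* y, const float* x, float a, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}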
Intel Parallel Debugger Extension
Intel Parallel Composer includes the Intel Parallel Debugger Extension which, after installation, can be accessed through the Visual Studio Debug pull-down menu (see Figure 13 below).

Figure 13: Intel® Parallel Debugger Extension is accessible from the Debug pull-down menu in Microsoft Visual Studio*

For more information, see the Intel Parallel Debugger Extension white paper at http://software.intel.com/en-us/articles/parallel-debugger-extension/. This paper goes into many more of the details and benefits the debugger extension can bring you, and how best to take advantage of them.

System Requirements
More specific information on installation requirements is available in the Release Notes but, at a glance, Intel Parallel Composer can be used on, and can develop code for, Intel® processors and compatible processors, from the Intel® Pentium® 4 processor onward. Supported operating systems include Windows XP*, Windows Vista*, and Windows Server* 2003 or 2008. Parallel Composer also requires Microsoft Visual Studio 2005 or 2008, but is not supported for use with Visual Studio Express Edition. For more information, visit http://software.intel.com/en-us/intel-parallel-composer/ and click “Release Notes.”

Support
Intel Parallel Studio products include access to community forums and a knowledge base for all your technical support needs, including technical notes, application notes, documentation, and all product updates.
For more information, go to http://software.intel.com/en-us/articles/intel-parallel-studio/
• Seamlessly upgrades Microsoft Visual Studio* for C/C++ parallelism. It integrates into Visual Studio and preserves your IDE investment, while adding parallelism capabilities.
• The Intel C++ Compiler is part of Parallel Composer and is compatible with Microsoft Visual C++*.
• Intel® Parallel Debugger Extension integrates with the Microsoft debugger, enhancing Visual Studio to help find and address parallelism issues. It saves time in getting applications ready to be used.
• Intel® Parallel Composer includes simple concurrency functions, data-parallel arrays, and thousands of threaded library functions, which simplify threading tasks and speed application development.
• Integrated array notation and data-parallel Intel® IPP functions are included, which speed audio, video, signal analysis, and other application classes.
• Intel® TBB is included, which is the most efficient way to implement parallel applications and take advantage of multicore platforms.
• There is extensive documentation, including code samples, for getting started with parallelism, and a short Getting Started Guide to get you going in just a few minutes.
• Community support. You’re not alone out there. Join the growing community of developers adding parallelism to their code. Draw on the experience of others, contribute your own knowledge and experience, and win prizes while doing it.
Intel® Parallel Studio: Create optimized serial and parallel applications with the ultimate all-in-one parallelism toolkit
Intel® Parallel Composer: Develop effective applications with a C/C++ compiler and advanced threaded libraries
Intel® Parallel Inspector: Ensure application reliability with proactive parallel memory and threading error checking
Intel® Parallel Amplifier: Quickly find bottlenecks and tune parallel applications for scalable multicore performance
© 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
0409/BLA/CMD/PDF 321554-001