
Programming in Cilk Plus

This document discusses parallel programming with Intel Cilk Plus. It provides an overview of Cilk Plus and its key concepts including task parallelism using cilk_spawn and cilk_sync, work stealing for load balancing, and data parallelism using SIMD instructions. It also covers loop parallelization with cilk_for and modeling parallelism with task graphs.

Uploaded by

Gabe
Copyright
© All Rights Reserved

Parallel Programming with Intel Cilk Plus: part I


Advanced Research Computing
Outline
• A tour of Python
– Data types, exceptions
– Functions, modules, generators
– Functional programming
– The Map-reduce data-parallel pattern
• Linear algebra in Python
– Saxpy as a map operation
– Using C extensions from Python
• Data-parallel Python

WHY DO WE NEED INTEL CILK PLUS?


What is Python?
• Python is a high-level, multi-paradigm, scripting
language
• Scripting languages trade performance for
productivity
– dynamic typing improves productivity
– Interpreted nature supports interactive development
• Multi-paradigm
– Procedural programming: uni- and multiprocessing
– Functional programming: data parallel primitives
– Object oriented programming



Python interpreters
• Two interpreters are widely used
– Python
– IPython
• Both support an interactive mode (also known as
the Python shell)
– the user enters Python statements and expressions
that are evaluated as they are entered
– improves productivity
• Both support the help() function
• IPython has sophisticated line completion
and context-sensitive help

Multi-core architecture (x)


Multi-core architecture (1)


Computer Architecture
• Multi-core and many-core processors
– Performance grows with the number of cores
• Cache hierarchy
– Private and shared
• On-die interconnect mode
– high bandwidth, low latency
• Wider vector units


Chip Multiprocessing


Intel Many Integrated Cores


Multi-core architecture (x)


Harnessing parallelism

• Hardware feature: Cilk Plus construct
– multi or many cores: tasks
– hardware threads: tasks
– SIMD instructions: array notation and elemental functions
• With Cilk Plus one can create parallel tasks with SIMD kernels


cilk_spawn


CILK PLUS CONCEPTS


What is Intel Cilk Plus?
• Extension to C/C++ that makes task and vector
parallelism first-class citizens in the language
• Task parallelism uses a fork-join model:
– three keywords: cilk_spawn, cilk_sync, and cilk_for
– hyperobjects prevent data races
• Vectorization of loops and array operations using:
– array notation, elemental functions, pragmas
• Cilk Plus is a faithful extension of C/C++
– the serial elision of a Cilk Plus program is a valid
serial implementation

What is Intel Cilk Plus? (2)


What is Cilk Plus (3)


TASK PARALLELISM


Task Parallelism in Cilk Plus
• Three Cilk Plus keywords
– cilk_spawn and cilk_sync support the fork-join model
– cilk_for for loop parallelization
• The keywords grant permission for parallel
execution but do not mandate parallel execution
• Task execution is managed by the Cilk Plus task
scheduler using work stealing
– The continuation (execution in the caller) is stolen, not
the spawned function



cilk_spawn keyword
• The cilk_spawn keyword modifies a function call to
tell the Cilk Plus runtime system that the function
may (but is not required to) run in parallel with the
caller
– the function that executes cilk_spawn is called the parent
and the spawned function is called the child
– the caller must ensure that pass-by-reference arguments
have lifetimes that extend until the cilk_sync matching
the cilk_spawn
• Example
int rc;
int arg = 13;
rc = cilk_spawn f(arg);


cilk_for keyword


cilk_sync keyword
• The cilk_sync keyword indicates that the
spawning function must wait for all spawned
children to complete before proceeding to
the statement that follows cilk_sync
• cilk_sync only syncs with children spawned by
the function in which it is located
– Children of other functions are not affected
• There is an implicit cilk_sync at the end of
every function and every try block that
contains a cilk_spawn and at the end of a
cilk_for block
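
The fork-join pattern behind cilk_spawn and cilk_sync can be imitated in Python. The sketch below is only an analogy, written for Python 3 with the standard concurrent.futures module; it is not Cilk Plus and involves no work stealing. The names fib and fork_join_fib are illustrative assumptions, not part of the slides.

```python
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    """Serial Fibonacci, used as the work done by each task."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def fork_join_fib(n, pool):
    """Fork fib(n-1) as a child task, run fib(n-2) in the caller, then join."""
    if n < 2:
        return n
    child = pool.submit(fib, n - 1)   # plays the role of cilk_spawn
    y = fib(n - 2)                    # the continuation runs in the parent
    x = child.result()                # plays the role of cilk_sync
    return x + y

with ThreadPoolExecutor(max_workers=2) as pool:
    result = fork_join_fib(20, pool)
print(result)  # 6765
```

Only the top-level call forks here; a faithful recursive version would submit tasks from inside tasks, which a fixed-size thread pool handles poorly and which the Cilk Plus scheduler is designed to make cheap.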

Implicit synchronization


Composability


Strands and Knots
• Strand: longest sequence of
instructions without a parallel
control structure
• Knot: a point at which three or
more strands meet
• Two types of knots
– spawn knots (A)
– sync knots (B)
• Program is represented as a
directed acyclic graph (DAG)
• Continuation, i.e., strand 2,
may be stolen


Strands, nodes, and dependencies


Fibonacci Numbers


Fibonacci Numbers (3)


Execution Flow


Specifying Parallel Tasks


cilk_sync


SERIALIZATION AND DETERMINISM


Serialization


Fibonacci function serialization


Computation DAG


Deterministic Cilk Plus program


MODELING TASK PARALLELISM


Task Graph
• Tasks are pieces of work
at application level
– loop bodies, functions
• Cilk Plus keywords define
dynamic tasks
• The Cilk Plus runtime
executes the tasks


Modeling Task Parallelism (1)


Modeling Task Parallelism (2)


Modeling Task Parallelism (3)


Modeling Task Parallelism (4)


Modeling Task Parallelism (5)


WORK STEALING


Cilk Plus Runtime


Work Stealing
• After a spawning function calls cilk_spawn
– the child function is executed by the worker bound to
the parent function
– the continuation in the parent function might be
stolen from the worker bound to the parent function
– The Cilk Plus task scheduler decides whether or not to
steal the continuation from the worker bound to the
parent function
• Spawning is cheap, stealing is expensive
– Stealing is efficient if enough work in the continuation
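
The bookkeeping behind continuation stealing can be pictured with a toy, single-threaded model. This is a sketch under stated assumptions, written for Python 3: the Worker class and its methods are hypothetical names, and the real Cilk Plus runtime is considerably more involved. The one property it illustrates is that the owner resumes its newest continuation while a thief takes the oldest one from the other end of the deque.

```python
from collections import deque

class Worker:
    """Toy worker: a name plus a deque of pending continuations."""
    def __init__(self, name):
        self.name = name
        self.ready = deque()

    def spawn(self, continuation):
        # On a spawn the worker runs the child itself and leaves the
        # parent's continuation on its deque, where it can be stolen.
        self.ready.append(continuation)

    def resume(self):
        # The owner takes the most recently pushed continuation.
        return self.ready.pop() if self.ready else None

    def steal_from(self, victim):
        # A thief takes the oldest continuation from the other end.
        return victim.ready.popleft() if victim.ready else None

w0, w1 = Worker("w0"), Worker("w1")
for label in ["cont-A", "cont-B", "cont-C"]:
    w0.spawn(label)

stolen = w1.steal_from(w0)   # w1 is idle and steals from w0
resumed = w0.resume()        # w0 continues with its newest work
print(stolen, resumed, list(w0.ready))
```

Stealing the oldest continuation tends to hand the thief a large piece of remaining work, which fits the remark that stealing pays off only when the continuation contains enough work.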


Work Stealing (1)


Work Stealing (2)


Work Stealing (3)


Work Stealing (4)


Work Stealing (5)


Work Stealing (6)


LOOP PARALLELISM


cilk_for keyword

• cilk_for is the same as for if


Loop Parallelization (1)


cilk_for grain size


Loop Parallelization (2)


Loop parallelization (3)


Work stealing for snippet A


Work stealing for Snippet B


Task Scheduler Overhead (1)


Task Scheduler Overhead (2)


DATA PARALLELISM


Data Parallelism (SIMD)
• Array notation to express data parallelism
a[:] = b[:] + c[:]
• Array reductions to merge array elements into a scalar
result
add, mult, max, min
• Elemental functions: map scalar functions across
multiple array elements
__declspec(vector)<function signature>
• Array notation and Elemental functions implement the
map parallel pattern

Data Parallelism


ARRAY NOTATION


Array Notation (1)


Array (SIMD)


Array Notation (2)


Array Notation (3)


Adding two arrays


Enabling SIMD Code Generation


Conditional Select


Specifying the Array Shape


Avoid runtime overhead
• By default, the compiler checks for data alignment
and aliasing which incurs run-time overhead
• The restrict keyword and align assertions eliminate
the overhead


REDUCTIONS


Reductions


Custom Reductions


ELEMENTAL FUNCTIONS


Function Maps


Elemental Functions (x)


Elemental Functions (y)


Elemental Functions (z)


Elemental Functions (1)


Elemental Functions (2)


Elemental Functions (4)


Elemental Functions Limitations


Array Notation (3)


Array Notation Summary


REDUCER HYPEROBJECTS


Serial code


Adding cilk_for create a race


Locks eliminate race


Reducers have lower overhead


Hyperobjects


Reducers (1)


Reducing over list concatenation (1)


Reducing over list concatenation (2)


Hyperobject Library


Hyperobject pros and cons


WORK STEALING SCHEDULER


Implementing Parallel Tasks (1)


Implementing Parallel Tasks (2)


Implementing Parallel Tasks (3)


Implementing Parallel Tasks (4)


Implementing Parallel Tasks (5)


Loop Parallelization (2)


Data Parallelism


Tuples
• Tuple: an immutable, heterogeneous sequence
– A comma-separated list of values
– The values can be of any type and are indexed
starting from zero
– Slices can be extracted using indexing

t1 = (); t2 = (1, )
t3 = tuple('hello'); t3[0]; t3[-1]
a, b = 1, 2; a, b = (b, a)
l3 = list(t3)

Lists
• Lists are mutable, heterogeneous sequences
l0= [];
l1 = [1, 2, 3.14, 9.81]; l2 = range(4);
l1.index(3.14); l1.count(2)
del l1[3]; l1.remove(3.14); l1.append(4.5)
# okay for list, but not for tuple
l1[2] = 3; l1[2:] = (3,4)
pairs = zip(l1, l2)
l1_l2 = l1 + l2; l2x2 = l2 * 2
l3 = list(t3); l3[0]; l3[-1]; l3[1:]
len(l3); type(l3)
Sets
• Sets are unordered collections of immutable
objects (ints, floats, strings, tuples)

s1 = set([1, 2, 3, 4 ])
s1.add(6); s1.add(5)

s2 = set([1, 2, 3, 4, 5, 4, 3, 2, 1])
s1 | s2; s1 & s2; s1 ^ s2

s3 = frozenset( set([5]) )
s1.add(s3)
Dictionaries
• Python’s associative arrays
e2f= {}
e2f = { 'blue': 'bleu', 'red': 'rouge' }
e2f['green'] = 'vert'
e2f['blue']; e2f.get('blue')
'red' in e2f; del e2f['green']

list(e2f.keys()); list(e2f.values());
list(e2f.items()); len(e2f)
num = { 1: 'one', 2: 'two' }; num1 = num.copy()

Iterables
• An iterable is an object that has an iterator
– The iterator is obtained with the __iter__()
method or with the iter() function
– can be iterated through with the construct:
for item in iterable
• Lists and tuples are iterable objects, but there
are others
– NumPy arrays
– Generator objects
• Some functions (e.g., sum) require an iterable
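
The iterator protocol described above can be exercised directly. This is a minimal sketch for Python 3; the names numbers and squares are illustrative assumptions, not taken from the slides.

```python
numbers = [10, 20, 30]

it = iter(numbers)        # same as numbers.__iter__()
first = next(it)          # 10
rest = list(it)           # the iterator resumes where it stopped

def squares(n):
    """Generator: yields the squares of 0, 1, ..., n - 1."""
    for k in range(n):
        yield k * k

total = sum(squares(4))   # sum() accepts any iterable: 0 + 1 + 4 + 9
print(first, rest, total)
```

An iterator is consumed as it is read, which is why rest holds only the two elements that follow the first next() call.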
Control Flow
• if, for, and while
x = [-1, 2, 3]; i = 0
for el in x:
    if el < 0:
        print 'negative'
    elif el == 0:
        print 'zero'
    else:
        print 'positive'
while (i < len(x)):
    print x[i]; i += 1

Functions
• Functions have dynamically typed arguments
• The number of function arguments can be
fixed or variable
– The function definition indicates whether it
accepts a variable number of arguments
• Recursive vs iterative function implementation
– Cost of function call + if statement vs cost of for
loop
– Example: the factorial function


Factorial: recursive version
• Recur until the base case n<= 1 is reached

def factorial_rec(n):
    if (n <= 1):
        return 1
    else:
        return n * factorial_rec(n - 1)

print factorial_rec(4)


Factorial: iterative version
• Compute iteratively the factorial function for
1, 2, …, n using a while loop over 1, 2, … , n.
def factorial_iter(n):
    res = 1
    i = 1
    while (i <= n):
        res *= i
        i += 1
    return res
print factorial_iter(4)

Mutable Objects as Arguments
• Arguments are passed by object reference
– the parameter becomes a new reference to the object
• Passing an immutable object (number, string,
tuple) means that the function cannot modify the
object
• Passing a mutable object (list, dictionary) means
that any change made by the function to the
corresponding parameter will change the
argument in the caller’s environment
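
A short experiment makes the difference between mutable and immutable arguments concrete. This is a sketch for Python 3; the functions append_item and increment are illustrative assumptions.

```python
def append_item(seq, item):
    """Mutates the list object that the caller passed in."""
    seq.append(item)

def increment(n):
    """Rebinds the local name only; the caller's int is unchanged."""
    n += 1
    return n

values = [1, 2]
count = 5
append_item(values, 3)
new_count = increment(count)
print(values, count, new_count)
```

The list changes in the caller because both names refer to the same object, while the integer does not because n += 1 binds the parameter to a new object.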



Function with variable number of parameters
• The unknown number of arguments is indicated by *args
def show_list(lst, *args):
    # lst replaces the name list, which shadows the built-in type
    last = len(lst); step = 1
    if (len(args) >= 1):
        last = args[0]
    if (len(args) >= 2):
        step = args[1]
    print 'stop_idx =', last, 'step =', step
    return [lst[i] for i in range(0, last, step)]

lst = range(8)
show_list(lst); show_list(lst, 4); show_list(lst, 4, 2)



Generator Functions
• A generator is a special function that contains
one or more yield statements
– when called, it returns a generator object
– the object is iterable: calling next() on the
generator object executes the function up to and
including the next yield statement
• Generators implement state machines
– Functions that are not generators should be
stateless



Modules
• Modules are Python files (possibly in a directory tree)
that are loaded using the import statement
– import compiles and executes the module and
loads its namespace
– a namespace is a map from names to objects
– the different forms of import control which names
are loaded into the caller's namespace


Importing a Module
• Access the entire module namespace prefixed
with the module name:
import module_name
• Access one name from the module namespace,
without prefix
from module_name import other_name
• Access all names from the module namespace,
without prefix
from module_name import *
– must be used with caution to avoid name collisions
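
The import forms can be compared using the standard math module. This is a sketch for Python 3; the alias PI and the local variable pi are illustrative assumptions, and the as form is an addition that the slides do not cover.

```python
import math                      # names are reached through the prefix
from math import sqrt            # one name, no prefix
from math import pi as PI        # optional alias to avoid a collision

pi = "a local string"            # would be overwritten by: from math import *

area = math.pi * 2.0 ** 2
root = sqrt(16.0)
print(area, root, PI == math.pi, pi)
```

Because the local name pi already exists, a later from math import * would silently rebind it, which is the kind of collision the warning refers to.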



Generator for a finite sequence
• If next() is called after the last yield has been executed, a
StopIteration exception is generated
def hello_gen():
    yield 'hello'
    yield 'world'
hello = hello_gen()
list(hello)
hello = hello_gen()
while 1:
    try:
        hello.next()
    except StopIteration:
        print 'DONE'
        break

Generator for an infinite series
• Generate 1-1/3+1/5 … which converges to pi/4
def pi_series():
    sum = 0.0; i = 1.0; j = 1
    while (1):
        sum = sum + j/i
        yield 4*sum
        i = i + 2; j = j * -1
series = pi_series()
pi = 0.0; n = 0
for elem in series:
    n += 1
    if abs(elem - pi) < 0.001:
        print "PI =", pi, "for", n, "terms in the series 1 - 1/3 + 1/5 - ..."
        break
    else:
        pi = elem



Exceptions (1)
• Python, like Java and C++, indicates that an
error has occurred during execution by raising
an exception rather than by setting an error
code
– exceptions are typed objects and they can be
caught by type
– example exception types: ZeroDivisionError,
TypeError



Exceptions (2)
• Exception types caught are listed from specific to
general
def reciprocals(n):
    try:
        print "reciprocal of ", n, " is", 1./n
        reciprocals(n-1)
    except ZeroDivisionError:
        print "ZeroDivisionError for reciprocal of n =", n
    except:
        print "Error in computing reciprocal of n =", n

reciprocals(3)

Classes
• Every class has an __init__() method, which acts as the constructor
• Every method has self as its first parameter, which is a
reference to the class instance
• Private instance variables (names starting with two underscores)
are meant to be accessed only from inside the object; Python
supports this through name mangling
class Circle:
    def __init__(self, radius=1):
        self.__radius = radius
    def get_radius(self):
        return self.__radius
    def set_radius(self, radius):
        self.__radius = radius
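
How private a double-underscore attribute really is can be checked with a short experiment. This is a sketch for Python 3 that repeats a trimmed Circle class so it runs on its own; the variable names inside, direct_access, and mangled are illustrative assumptions.

```python
class Circle:
    def __init__(self, radius=1):
        self.__radius = radius      # stored as _Circle__radius

    def get_radius(self):
        return self.__radius

c = Circle(2.5)
inside = c.get_radius()             # access from inside the class works

try:
    c.__radius                      # no mangling outside the class body
    direct_access = True
except AttributeError:
    direct_access = False

mangled = c._Circle__radius         # still reachable: privacy is a convention
print(inside, direct_access, mangled)
```

The attribute is hidden from casual access but not protected, so accessor methods such as get_radius remain the intended interface.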
Functional Programming (1)
• Imperative programming
– Program seen as a sequence of computation steps
– Focus is on how to compute the solution of the
problem
• Functional programming
– Program seen as evaluations of mathematical
functions
– Focus is on what to compute to solve the problem



Functional Programming (2)
• Functions are first class objects
– Closure: functions can access variables in the
referencing environment
– No side effects: (pure) functions do not store data in
memory, just return the results
• Stateless programs
• Lambda functions are anonymous functions
lambda i : i %2 == 0
• Using functions instead of imperative constructs
(e.g., for loops) enables parallel implementation
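
Closures, lambdas, and side-effect-free functions can be combined in a few lines. This is a sketch for Python 3; make_scaler, double, and is_even are illustrative names, and filter() is a built-in that the slides do not mention.

```python
def make_scaler(factor):
    """Returns a closure that remembers factor from the enclosing call."""
    return lambda x: factor * x

double = make_scaler(2)
is_even = lambda i: i % 2 == 0

data = [1, 2, 3, 4]
doubled = list(map(double, data))        # replaces an explicit for loop
evens = list(filter(is_even, data))
print(doubled, evens, data)
```

Neither call modifies data, so the two results could in principle be computed in any order or in parallel.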



Functional Programming (3)
• Python is a multi-paradigm language, not a
functional language
• Python includes some of the features of
functional languages
– Passing functions to other functions
– Lambda functions
– A set of (pure) functions: map(), sum(), all(), any()



Map function
• Apply a function func to all the elements of a
sequence: map(func, iterable)
• Example: convert the elements of a list from
integer to float
L1 = [1, 2, 3, 4]
# using a for loop (written as a list comprehension)
L2 = [ float(elem) for elem in L1 ]
# using a map
L3 = map(float, L1)



Map: recursive version
• Recur until an empty list is reached
def map_rec(func, lst):
    if lst == []:
        return []
    else:
        return [func(lst[0])] + map_rec(func, lst[1:])

input = [-2, -4, 6]
print input
output = map_rec(abs, input)
print output

Map: iterative version
• Evaluate the function for all the elements of
the list using a for loop over the list
def map_iter(func, lst):
    return [func(item) for item in lst]

input = [-2, -4, 6]
print input
output = map_iter(abs, input)
print output


Reduce function
• Reduce function
reduce ( function, iterable, initializer=None)
• Converts an iterable to one value
• Sum as a reduction operation:
def sum_red(seq):
    return reduce(lambda x, y: x + y, seq)
x = range(10)
print sum_red(x)
• Sum can also be cast as a map-reduce operation



Map-Reduce Example (1)
• Consider the problem of determining whether all
the elements of a list are even numbers
• Imperative programming solution uses a function
that
– Initializes the partial result to True
– iterates over the elements of the list and computes a
flag that is True if the element is even and False
otherwise
– Computes the logical AND between the flag and the
partial result
– Returns the results after the list has been traversed



Map-reduce Example (2)
• Imperative style of determining whether the
elements of a list are even numbers
def even_iter(iterable):
    res = True
    for item in iterable:
        res = res and (item % 2 == 0)
    return res
even_iter([2, 4, 6])

• The for loop imposes an unnecessary order of operations
– Goal: express the computation in a way that does
not impose an unnecessary order

Map-reduce Example (3)
• Use the map-reduce pattern
– Map: compute for all items (item%2==0)
– Reduce: call all() on the list built with map()
even = lambda item: item % 2 == 0
def even_map_red(iterable):
    return all(map(even, iterable))
even_map_red([2, 4, 6])

• No unnecessary ordering of operations
– Enables data-parallel implementation using
frameworks such as Copperhead (UC Berkeley)

Sum using map-reduce (1)
• Sum as a reduce operation
[Figure: sum]
• Sum as a map-reduce operation
[Figure: map, then reduce]
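
One way to read the two formulations of sum is sketched below for Python 3. The chunking scheme, the chunk size, and the names chunks, sum_reduce, and sum_map_reduce are assumptions made for illustration; the original figure may have intended a different decomposition.

```python
from functools import reduce
from operator import add

def chunks(seq, size):
    """Split seq into consecutive pieces of at most size elements."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def sum_reduce(seq):
    """Sum as a single reduction over all the elements."""
    return reduce(add, seq, 0)

def sum_map_reduce(seq, size=4):
    """Map: sum each chunk independently. Reduce: add the partial sums."""
    partial = map(sum_reduce, chunks(seq, size))
    return reduce(add, partial, 0)

x = list(range(10))
print(sum_reduce(x), sum_map_reduce(x))
```

The map step has no dependencies between chunks, so the partial sums are the part that a data-parallel runtime could compute concurrently.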

LINEAR ALGEBRA IN PYTHON


Using BLAS from Python
• BLAS can be used either:
– Directly from Python, as an external module
– Via a Python package, scipy.linalg.blas
• First, we look at invoking BLAS directly from Python
– We can invoke any BLAS implementation that is
packaged as a shared library (.so) file
– We will use the Intel Math Kernel Library (MKL), which
includes BLAS


BLAS Functions
• BLAS function prefix
– Real: s = single precision, d = double precision
– Complex: c = single precision, z = double precision
• BLAS1: vector operations
– ddot: dot product of two vectors, r = x^T y
– daxpy: scaled vector sum, y = a x + y
• BLAS2: matrix-vector operations
– dgemv: matrix-vector product
• BLAS3: matrix-matrix operations
– dgemm: matrix-matrix multiplication
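
The two BLAS1 operations can be written out as plain Python reference definitions. This is a sketch for Python 3 that only mirrors the mathematics; it does not call BLAS, and unlike the BLAS routine this daxpy returns a new list instead of overwriting y.

```python
def ddot(x, y):
    """Dot product r = x^T y."""
    return sum(xi * yi for xi, yi in zip(x, y))

def daxpy(a, x, y):
    """Scaled vector sum; returns a new list holding a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
print(ddot(x, y))          # 1*4 + 2*5 + 3*6
print(daxpy(2.0, x, y))
```

Definitions like these are useful mainly as a correctness check against the results returned by an optimized library.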



MKL BLAS Functions
• Fortran function prototypes are defined in
– mkl_blas.fi for Fortran 77
– blas.f90, blas95.mod for Fortran 95
• C prototypes defined in
– For Fortran 77 BLAS : mkl_blas.h
– For CBLAS: mkl_cblas.h
– mkl_blas.h and mkl_cblas.h are included by mkl.h
• When calling BLAS from C, note that Fortran
assumes column-major order of matrix elements
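
The effect of column-major storage is easiest to see from the index arithmetic. This is a sketch for Python 3; the helper names and the 2 x 3 example matrix are illustrative assumptions.

```python
def col_major_index(i, j, nrows):
    """Offset of element (i, j) in a column-major (Fortran) layout."""
    return i + j * nrows

def row_major_index(i, j, ncols):
    """Offset of element (i, j) in a row-major (C) layout."""
    return i * ncols + j

# The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored both ways.
nrows, ncols = 2, 3
fortran_buf = [1, 4, 2, 5, 3, 6]
c_buf = [1, 2, 3, 4, 5, 6]

i, j = 1, 2    # element in the second row, third column
print(fortran_buf[col_major_index(i, j, nrows)],
      c_buf[row_major_index(i, j, ncols)])
```

Passing a row-major C array to a Fortran-style routine without accounting for this makes the routine see the transpose of the intended matrix.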



The CBLAS saxpy function
• Prototype of the C saxpy function
$ grep -A1 'cblas_saxpy(' $MKL_DIR/include/mkl_cblas.h
void cblas_saxpy(
    const MKL_INT N,
    const float alpha,
    const float *X, const MKL_INT incX,
    float *Y, const MKL_INT incY);



Calling BLAS saxpy from Python
• Pass arguments of the correct ctypes types to the saxpy function
from ctypes import *

mkl = cdll.LoadLibrary("libmkl_rt.so")
cblas_saxpy = mkl.cblas_saxpy

n = 8; alpha = 1.0
xp = c_float * n
yp = c_float * n
x = xp(0.0); y = yp(0.0)
for i in range(len(x)):
    x[i] = 1.0; y[i] = 1.0
cblas_saxpy(c_int(n), c_float(alpha), \
            byref(x), c_int(1), byref(y), c_int(1))



BLAS and MKL links
• Intel MKL page
• http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/
• Intel MKL User Guide
• http://software.intel.com/sites/products/documentation/hpc/mkl/mkl_userguide_lnx/index.htm
• Explore the BLAS functions
• http://www.netlib.org/lapack/explore-html



Calling SciPy saxpy from Python
• Pass to the saxpy function the types expected by
the Python function
import scipy.linalg.blas
import numpy as np

n = 8; a = 1.0
x = np.ones(n, 'f')
y = np.ones(n, 'f')

saxpy = scipy.linalg.blas.saxpy
# use print(saxpy.__doc__) to get the signature
z = saxpy(x, y, a=a)

DATA-PARALLEL PYTHON



SAXPY as a map operation
• Data parallelism: each output element a*x_i + y_i is
computed independently from the input pair (x_i, y_i)
• The specification of saxpy must not impose
an unnecessary order of operations


Imperative specification of saxpy
The following code inhibits parallelization
because
– Parallelization requires a priori knowledge of the
sequence range(len(y))
– The function has side effects: it changes y

def saxpy(a, x, y):
    for i in range(len(y)):
        y[i] = a*x[i] + y[i]
    return y


Declarative specification of saxpy
The code below enables parallelization
– No side effects, no unnecessary order, no unknown
indices
– Closure: lambda gets a from caller’s environment

def saxpy(a, x, y):
    # or return [ a*xi + yi for xi,yi in zip(x,y) ]
    return map(lambda xi, yi: a*xi + yi, x, y)
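
Because the declarative form depends only on a map-like operation, the mapping strategy can be swapped without touching the arithmetic. This is a sketch for Python 3; the mapper parameter is a hypothetical addition, and the thread pool is used only to show that the evaluation order may be relaxed, not as a claim about speed.

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy(a, x, y, mapper=map):
    """Declarative saxpy: mapper may be any map-like callable."""
    return list(mapper(lambda xi, yi: a * xi + yi, x, y))

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

serial = saxpy(2.0, x, y)
with ThreadPoolExecutor(max_workers=2) as pool:
    # Executor.map returns results in input order even though the
    # individual calls may run in any order.
    threaded = saxpy(2.0, x, y, mapper=pool.map)
print(serial == threaded, serial)
```

The inputs x and y are left unchanged in both cases, which is the absence of side effects that the slide lists as a precondition for parallelization.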



Relaxing the order
• Python orders the computation sequentially
– top to bottom for statements
– left to right over the elements in for i in iterable:
– inside to outside and left to right for expressions
• Relaxing this order makes it possible to
– evaluate expressions out of order
– execute map() in arbitrary order
• Annotated saxpy in Python


Annotated saxpy
The annotation @cu indicates that this is
Copperhead code that will be compiled to a
GPU executable using Just In Time
Specialization (JITS)
@cu
def saxpy(a, x, y):
    return map(lambda xi, yi: a*xi + yi, x, y)
res_gpu = saxpy(2.0, x, y)
res_cpu = saxpy(2.0, x, y, cuEntry=False)

Productivity AND efficiency
• Data-parallel Python, e.g., Copperhead, combines
productivity and efficiency



Thank you.

Questions?
