C Code Example
We shall now look at the C code of a coprocessing system consisting of the three components
mentioned earlier. First, a C source file with a very simple example of a synthesized function
and its wrapper is shown. Next, a host program outlines how the host communicates with the
logic. This host program is then improved for performance and reliability, showing the
recommended programming techniques.
Sample code for HLS synthesis
To clarify how HLS works with Xillybus, let's consider a simple example that calculates a
trigonometric sine and performs a simple integer operation, both contained in a custom
function, mycalc(). This is a very simple function, but as Xilinx's guide to Vivado HLS
shows, the possibilities go far beyond this. mycalc() takes the role of the synthesized
function.
This function is called by a wrapper function, xillybus_wrapper(), which is responsible for
the interface with the host. It accepts an integer and a floating-point number from the host
through a data pipe, which is represented by the in argument. Through the out argument, it
returns the integer plus one and the (trigonometric) sine of the floating-point number.
How the *in++ and *out++ operations transport data from and to the host application is
explained below. A walkthrough of the code follows immediately after its listing.
#include <math.h>
#include <stdint.h>
#include "xilly_debug.h"
// Debug output
xilly_puts("x1=");
xilly_decprint(x1, 1);
xilly_puts("\n");
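#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
int main(void) {
  int fdr, fdw;
  struct {
    uint32_t v1;
    float v2;
  } tologic, fromlogic;
  // Device file names are assumed; use those found in /dev
  fdr = open("/dev/xillybus_read_32", O_RDONLY);
  fdw = open("/dev/xillybus_write_32", O_WRONLY);
  if ((fdr < 0) || (fdw < 0)) {
    perror("Failed to open Xillybus device file(s)");
    exit(1);
  }
  tologic.v1 = 123;
  tologic.v2 = 0.78539816; // ~ pi/4
  // Not checking return values of write() and read(). This must be done
  // in a real-life program to ensure reliability.
  write(fdw, (void *) &tologic, sizeof(tologic));
  read(fdr, (void *) &fromlogic, sizeof(fromlogic));
  printf("%u %f\n", (unsigned) fromlogic.v1, fromlogic.v2);
  close(fdr);
  close(fdw);
  return 0;
}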
$ ./hlsdemo
$ cat /dev/xillybus_read_8
Hello, world
x1=123
Hello, world
The origins of the first two lines are easily found in the wrapper function above. The third
line, a second "Hello, world", may come as a slight surprise, and may not appear in some
cases. It's a result of the HLS compiler's attempt to promote data flow: the logic's state
machine always assumes that new input data is on its way, and attempts to move things forward
as much as possible to save processing time once the data arrives.
Since no input data is needed for the second "Hello, world", it's sent out as soon as possible.
In this case, that's immediately after x1=123, which depends on input data. In theory, it could
go on printing out the x1= part as well, but the compiler didn't optimize things this far.
A practical host program
The code above outlines the way data is exchanged, but two changes are necessary in a
practical system:
Sending a single set of data for processing is extremely inefficient, making I/O overhead
a major delay component. It's also wrong to wait for the outcome of a single execution
before sending the next set.
The return values from read() and write() aren't checked, so partial completions and UNIX
signals aren't handled properly. This is a negligible issue when a single 8-byte chunk
goes back and forth, but it may cause weird problems in real-life applications.
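The loop pattern that handles partial completions and signal interruptions can be factored into two small helpers. This is a minimal sketch for POSIX systems; write_all() and read_all() are hypothetical names (not part of Xillybus or libc), and their return conventions are a choice made here.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Writes exactly len bytes unless an error occurs. Partial writes are
 * continued, and calls interrupted by a signal (EINTR) are retried.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const void *data, size_t len)
{
    const char *p = data;
    size_t done = 0;

    while (done < len) {
        ssize_t rc = write(fd, p + done, len - done);

        if (rc < 0 && errno == EINTR)
            continue;          /* interrupted by a signal: retry */
        if (rc <= 0)
            return -1;         /* real error (a zero return is treated as one) */
        done += (size_t) rc;
    }
    return 0;
}

/* Reads exactly len bytes. Returns 0 on success, 1 if end-of-file was
 * reached before len bytes arrived, -1 on error. */
static int read_all(int fd, void *data, size_t len)
{
    char *p = data;
    size_t done = 0;

    while (done < len) {
        ssize_t rc = read(fd, p + done, len - done);

        if (rc < 0 && errno == EINTR)
            continue;          /* interrupted by a signal: retry */
        if (rc < 0)
            return -1;
        if (rc == 0)
            return 1;          /* EOF: the other side was closed */
        done += (size_t) rc;
    }
    return 0;
}
```

With these, a bare write(fdw, &tologic, sizeof(tologic)) becomes write_all(fdw, &tologic, sizeof(tologic)), and any nonzero return can be treated as fatal.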
The program below shows a suggested practical, Linux-style implementation that uses the
logic for coprocessing. This is a throughput-oriented implementation, focused on keeping the
data flowing rather than completing rounds of requests and responses.
The following differences are most notable:
Rather than generating a single set of data for processing, an array of structures is
allocated and sent. Likewise, an array of data is received from the logic. This reduces the
I/O overhead and the impact of software and hardware latencies.
The program forks into two processes, one for writing and one for reading data. Making
these two tasks independent prevents the processing from stalling due to a lack of data to
process, or due to output data waiting to be consumed. This independence can also be
achieved with threads (in particular on Windows) or with the select() call.
The read() and write() calls are repeated as necessary to ensure reliable I/O. These while
loops may appear cumbersome, but they are necessary to respond correctly to partial
completions of these calls (not all bytes read or written), which are frequent under
load. The EINTR error is also handled, so that the program reacts properly to POSIX signals,
which may be sent to the running processes, possibly by unrelated software.
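Since select() is mentioned as an alternative to fork(), here is a minimal single-process sketch of that approach. It assumes the device files can be switched to non-blocking mode with fcntl() (the pipes used to test it can); duplex_transfer() and set_nonblocking() are hypothetical helper names.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Sets O_NONBLOCK on fd, so that write() or read() never blocks after
 * select() has reported the descriptor ready. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return (flags < 0) ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Sends wlen bytes to fdw and collects rlen bytes from fdr in a single
 * process, writing whenever fdw can accept data and reading whenever fdr
 * has data, so neither direction stalls the other.
 * Returns 0 on success, -1 on error or premature EOF. */
static int duplex_transfer(int fdw, const char *src, size_t wlen,
                           int fdr, char *dst, size_t rlen)
{
    size_t wdone = 0, rdone = 0;

    if (set_nonblocking(fdw) < 0 || set_nonblocking(fdr) < 0)
        return -1;

    while (wdone < wlen || rdone < rlen) {
        fd_set wset, rset;
        int maxfd = -1;

        FD_ZERO(&wset);
        FD_ZERO(&rset);
        if (wdone < wlen) {
            FD_SET(fdw, &wset);
            maxfd = fdw;
        }
        if (rdone < rlen) {
            FD_SET(fdr, &rset);
            if (fdr > maxfd)
                maxfd = fdr;
        }

        if (select(maxfd + 1, &rset, &wset, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;      /* a signal arrived: wait again */
            return -1;
        }

        if (wdone < wlen && FD_ISSET(fdw, &wset)) {
            ssize_t rc = write(fdw, src + wdone, wlen - wdone);
            if (rc > 0)
                wdone += (size_t) rc;
            else if (rc < 0 && errno != EINTR && errno != EAGAIN)
                return -1;
        }

        if (rdone < rlen && FD_ISSET(fdr, &rset)) {
            ssize_t rc = read(fdr, dst + rdone, rlen - rdone);
            if (rc > 0)
                rdone += (size_t) rc;
            else if (rc == 0)
                return -1;     /* EOF before all expected data arrived */
            else if (errno != EINTR && errno != EAGAIN)
                return -1;
        }
    }
    return 0;
}
```

A pipe whose write end is passed as fdw and read end as fdr acts as a loopback. With a transfer larger than the pipe's buffer, writing everything before reading would block forever in a single process, while this loop completes.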
Note that for real use, the debug messages must be removed from the synthesized and
wrapper functions, as they may slow down execution dramatically, in particular by forcing
sequential execution where parallel execution would otherwise speed things up.
The program's listing follows.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#define N 1000
struct packet {
uint32_t v1;
float v2;
};
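int main(void) {
  int fdr, fdw, i;
  ssize_t rc;
  size_t donebytes;
  const size_t total = sizeof(struct packet) * N;
  pid_t pid;
  struct packet *tologic, *fromlogic;
  // Device file names are assumed; use those found in /dev
  fdr = open("/dev/xillybus_read_32", O_RDONLY);
  fdw = open("/dev/xillybus_write_32", O_WRONLY);
  if ((fdr < 0) || (fdw < 0)) {
    perror("Failed to open Xillybus device file(s)");
    exit(1);
  }
  pid = fork();
  if (pid < 0) {
    perror("Failed to fork()");
    exit(1);
  }
  if (pid) {
    // Parent process: sends data to the logic
    close(fdr);
    tologic = malloc(total);
    if (!tologic) {
      fprintf(stderr, "Failed to allocate memory\n");
      exit(1);
    }
    for (i = 0; i < N; i++) { // Arbitrary test data
      tologic[i].v1 = i;
      tologic[i].v2 = i * 0.001f;
    }
    donebytes = 0;
    while (donebytes < total) {
      rc = write(fdw, (char *) tologic + donebytes, total - donebytes);
      if ((rc < 0) && (errno == EINTR))
        continue; // Interrupted by a signal: retry
      if (rc <= 0) {
        perror("write() failed");
        exit(1);
      }
      donebytes += rc;
    }
    close(fdw);
    return 0;
  } else {
    // Child process: collects the results from the logic
    close(fdw);
    fromlogic = malloc(total);
    if (!fromlogic) {
      fprintf(stderr, "Failed to allocate memory\n");
      exit(1);
    }
    donebytes = 0;
    while (donebytes < total) {
      rc = read(fdr, (char *) fromlogic + donebytes, total - donebytes);
      if ((rc < 0) && (errno == EINTR))
        continue; // Interrupted by a signal: retry
      if (rc < 0) {
        perror("read() failed");
        exit(1);
      }
      if (rc == 0) {
        fprintf(stderr, "Reached read EOF!? Should never happen.\n");
        exit(0);
      }
      donebytes += rc;
    }
    for (i = 0; i < N; i++)
      printf("%u: %f\n", (unsigned) fromlogic[i].v1, fromlogic[i].v2);
    close(fdr);
    return 0;
  }
}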