## High-Level Synthesis: Xilinx Vivado HLS

#### Hao Zheng, Comp Sci & Eng, University of South Florida



- → The Zynq Book, chapters 14 and 15
- → Vivado Design Suite Tutorial: High-Level Synthesis

#### **Overview**





*Figure 14.4: Clarification of the algorithm and interface, and showing a subset of interface types* 





*Figure 14.6: C functional verification and C/RTL cosimulation in Vivado HLS* 

#### **Implementation Considerations**

- → Resources / area
- → Throughput
- → Clock frequency
- → Latency
- → Power consumption
- → I/O requirements

These trade-offs are controlled by synthesis directives.



*Figure 14.8: Comparison of three possible outcomes from HLS for an example function*

#### Native Types in C/C++

| Type                   | Description                                                 | Number of Bits <sup>a</sup> | Range <sup>b</sup>                                      |
|------------------------|-------------------------------------------------------------|-----------------------------|---------------------------------------------------------|
| char                   | Representation of the basic character set.                  | 8                           | -128 to 127                                             |
| signed char            |                                                             | 8                           | -128 to 127                                             |
| unsigned char          |                                                             | 8                           | 0 to 255                                                |
| short int              | A reduced precision version of int, requiring less storage. | 16                          | -32,768 to 32,767                                       |
| unsigned short int     |                                                             | 16                          | 0 to 65,535                                             |
| int                    | The basic integer data type.                                | 32                          | -2,147,483,648 to 2,147,483,647                         |
| unsigned int           |                                                             | 32                          | 0 to 4,294,967,295                                      |
| long int               | An extended precision integer type.                         | 32                          | -2,147,483,648 to 2,147,483,647                         |
| unsigned long int      |                                                             | 32                          | 0 to 4,294,967,295                                      |
| long long int          |                                                             | 64                          | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| unsigned long long int |                                                             | 64                          | 0 to 18,446,744,073,709,551,615                         |
| float                  | Single precision floating point (IEEE 754)                  | 32                          | -3.403e+38 to 3.403e+38                                 |
| double                 | Double precision floating point (IEEE 754)                  | 64                          | -1.798e+308 to 1.798e+308                               |

#### **Arbitrary Precision – Integer**

*Table 15.2: Arbitrary precision integer data types for use in C and C++ Vivado HLS designs* 

| Language | Integer Data Type                     | Description                            | Required Header         |
|----------|---------------------------------------|----------------------------------------|-------------------------|
| C        | `intN`<br>(e.g. `int7`)               | signed integer of *N* bits precision   | `#include "ap_cint.h"`  |
|          | `uintN`<br>(e.g. `uint7`)             | unsigned integer of *N* bits precision |                         |
| C++      | `ap_int<N>`<br>(e.g. `ap_int<7>`)     | signed integer of *N* bits precision   | `#include "ap_int.h"`   |
|          | `ap_uint<N>`<br>(e.g. `ap_uint<7>`)   | unsigned integer of *N* bits precision |                         |
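The `ap_cint.h` and `ap_int.h` headers ship with Vivado HLS and are not available to an ordinary compiler, so the sketch below only emulates the storage behaviour of a 7-bit unsigned value (like `uint7`) in plain C with a bit mask. The helper names `wrap_uint7` and `add_uint7` are hypothetical, and the sketch assumes that bits above the declared width are simply discarded on assignment.

```c
#include <stdint.h>

/* Hypothetical helper: keeps only the low 7 bits, mimicking how a
 * 7-bit unsigned type would store a value that is too wide. */
static uint8_t wrap_uint7(unsigned value)
{
    return (uint8_t)(value & 0x7Fu);   /* 0x7F = 2^7 - 1 */
}

/* Adds two values as if operands and result were all 7 bits wide. */
static uint8_t add_uint7(uint8_t a, uint8_t b)
{
    return wrap_uint7((unsigned)a + (unsigned)b);
}
```

A 7-bit unsigned value covers 0 to 127, so in this model `add_uint7(127, 1)` wraps to 0. In hardware, the narrower type means a 7-bit adder rather than a 32-bit one.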

#### **Typical C/C++ Construct to RTL Mapping**

| C Construct  | HW Component       |
|--------------|--------------------|
| Arguments    | Input/output ports |
| Operators    | Functional units   |
| Arrays       | Memories           |
| Control flow | Control logic      |
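To make the mapping concrete, here is a minimal sketch of a small function with each construct annotated by the hardware it would typically turn into. The function `weighted_sum` and the size `N` are hypothetical, chosen only for illustration.

```c
#define N 8

/* Hypothetical function used only to illustrate the mapping.
 * Arguments -> ports; the array argument -> a memory interface. */
int weighted_sum(const int data[N], int weight)
{
    int acc = 0;
    /* Control flow -> control logic (state machine and loop counter). */
    for (int i = 0; i < N; i++) {
        /* Operators '*' and '+' -> functional units (multiplier, adder). */
        acc += data[i] * weight;
    }
    /* Return value -> output port. */
    return acc;
}
```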

#### **Function Hierarchy**

- → Each function is synthesized to an RTL module
- → Function inlining eliminates hierarchy
- → The function main() cannot be synthesized
  - → It is used to develop the C testbench
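The points above can be sketched with a two-level hierarchy. All function names here are hypothetical. The `INLINE` directive is written as a Vivado HLS pragma, which an ordinary C compiler ignores. The testbench is shown as a separate function `run_testbench` so that the snippet stays a plain library; in a real project its body would sit in `main()`.

```c
/* Sub-function: without inlining it would become its own RTL module. */
static int square(int x)
{
    /* Assumed directive usage: dissolve this function into its caller. */
#pragma HLS INLINE
    return x * x;
}

/* Hypothetical top-level function selected for synthesis. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* Testbench body. In a real project this code would live in main(),
 * which is compiled for C simulation but never synthesized.
 * Returns 0 on success and 1 on failure. */
int run_testbench(void)
{
    return (sum_of_squares(3, 4) == 25) ? 0 : 1;
}
```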

#### Source code







#### **Function Arguments**

- → Function arguments become module ports
- → The interface follows a certain protocol to synchronize data exchange



#### **Expressions**

- → Expressions and operations are synthesized to datapath components
- → Timing constraints influence the degree of registering



#### Arrays

- → An array is typically implemented by a memory block
- → Read/write array → RAM; constant array → ROM
- → An array can be partitioned and mapped to multiple RAMs
- → Multiple arrays can be merged and mapped to one RAM
- → An array can be partitioned into individual elements and mapped to registers
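As a rough sketch of these cases, the hypothetical function below contains a constant array (a ROM candidate) and a read/write array (a RAM candidate). The `ARRAY_PARTITION` pragma with the `complete` option is the Vivado HLS directive for splitting an array into individual elements; an ordinary C compiler ignores it, so the code behaves the same either way.

```c
#define N 8

/* Hypothetical function: adds a fixed offset table to the input. */
void add_offsets(const int in[N], int out[N])
{
    /* Constant array: a candidate for ROM. */
    static const int offsets[N] = {1, 2, 3, 4, 5, 6, 7, 8};

    /* Read/write array: a candidate for RAM by default. */
    int buf[N];
    /* Assumed directive usage: split buf into N individual registers. */
#pragma HLS ARRAY_PARTITION variable=buf complete dim=1

    for (int i = 0; i < N; i++)
        buf[i] = in[i] + offsets[i];
    for (int i = 0; i < N; i++)
        out[i] = buf[i];
}
```

Partitioning removes the port limit of a single memory, so several elements can be accessed in the same cycle, at the cost of more registers.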



#### Loops

- → By default, loops are rolled
  - → Each loop iteration corresponds to a "sequence" of states (possibly a DAG)
  - → This state sequence is repeated multiple times based on the loop trip count



#### **Loop Unrolling**

- → Used to expose higher parallelism and achieve shorter latency
- → Pros
  - → Decreases loop overhead
  - → Increases parallelism for scheduling
  - → Facilitates constant propagation and array-to-scalar promotion
- → Cons
  - → Increases operation count, which may negatively impact area

```
// Rolled loop
for (int i = 0; i < N; i++)
    A[i] = C[i] + D[i];

// Unrolled equivalent for iterations i = 0, 1, 2
A[0] = C[0] + D[0];
A[1] = C[1] + D[1];
A[2] = C[2] + D[2];
```
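In Vivado HLS, unrolling is normally requested with a directive instead of being written out by hand. The sketch below places an `UNROLL` pragma with `factor=2` inside a hypothetical `vec_add` loop, next to a hand-unrolled version that shows what a factor of 2 means. The function names and the size `N` are assumptions made for illustration, and an ordinary C compiler ignores the pragma.

```c
#define N 8

/* Hypothetical vector add; the directive asks for the loop body to be
 * replicated twice per iteration (partial unrolling). */
void vec_add(int A[N], const int C[N], const int D[N])
{
    for (int i = 0; i < N; i++) {
#pragma HLS UNROLL factor=2
        A[i] = C[i] + D[i];
    }
}

/* The same computation unrolled by hand with a factor of 2.
 * This version assumes N is even. */
void vec_add_unrolled2(int A[N], const int C[N], const int D[N])
{
    for (int i = 0; i < N; i += 2) {
        A[i]     = C[i]     + D[i];
        A[i + 1] = C[i + 1] + D[i + 1];
    }
}
```

Both versions compute the same result. The unrolled form halves the number of loop iterations and exposes two independent additions that can be scheduled in parallel, provided the arrays can supply two elements per cycle.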

#### **Loop Pipelining**

- → Loop pipelining is one of the most important optimizations for high-level synthesis
  - → Allows a new iteration to begin processing before the previous iteration is complete
  - → Key metric: Initiation Interval (II), the number of clock cycles between the starts of consecutive iterations
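The effect of II can be estimated with a simple, idealized model that ignores loop entry and exit overhead: a rolled loop needs roughly trip count times iteration latency, while a pipelined loop needs the latency of the first iteration plus II cycles for each further one. The sketch below encodes that model and shows a hypothetical loop carrying the Vivado HLS `PIPELINE` pragma. All function names are assumptions, and an ordinary C compiler ignores the pragma.

```c
#define N 8

/* Idealized cycle count for a rolled, non-pipelined loop:
 * iterations run strictly one after another. */
unsigned sequential_cycles(unsigned trip_count, unsigned iter_latency)
{
    return trip_count * iter_latency;
}

/* Idealized cycle count for a pipelined loop: the first iteration takes
 * iter_latency cycles, each further one starts ii cycles after the last. */
unsigned pipelined_cycles(unsigned trip_count, unsigned iter_latency,
                          unsigned ii)
{
    if (trip_count == 0)
        return 0;
    return iter_latency + (trip_count - 1) * ii;
}

/* Hypothetical loop with a directive requesting II = 1. */
void scale(int out[N], const int in[N], int k)
{
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] * k;
    }
}
```

Under this model, 8 iterations with an assumed iteration latency of 3 cycles take 24 cycles when rolled and 10 cycles when pipelined with II = 1.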



#### **Synthesis of Loops – Case Study**

By default, Vivado HLS optimizes for area, so loops are kept rolled.




*Figure 15.23: Extraction of addition loop into datapath and control logic* 

#### **Merging Loops**



*Figure 15.24: Consecutive loops for addition and multiplication within a function*




*Figure 15.25: Merged addition and multiplication loops*
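Since the figures are not reproduced here, the following is a hypothetical reconstruction of the idea, not the code from the original figures: two consecutive loops with the same trip count, followed by a hand-merged version that performs both operations in a single loop. Function names and the size `N` are assumptions. Vivado HLS also offers a `LOOP_MERGE` directive that asks the tool to perform this transformation.

```c
#define N 4

/* Two consecutive loops: an addition loop, then a multiplication loop. */
void add_mult_separate(const int a[N], const int b[N],
                       int sum[N], int prod[N])
{
    for (int i = 0; i < N; i++)
        sum[i] = a[i] + b[i];
    for (int i = 0; i < N; i++)
        prod[i] = a[i] * b[i];
}

/* The same computation with the two loops merged by hand:
 * one loop counter, and both operations available in each iteration. */
void add_mult_merged(const int a[N], const int b[N],
                     int sum[N], int prod[N])
{
    for (int i = 0; i < N; i++) {
        sum[i]  = a[i] + b[i];
        prod[i] = a[i] * b[i];
    }
}
```

Merging removes the control states of the second loop and lets the adder and multiplier work in the same iteration, which tends to reduce overall latency.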

#### **Interface Synthesis**

```
void find_average_of_best_X (int *average, int samples[8], int X)
{
    // body of function (statements, sub-function calls, etc.)
}
```



#### **Port Directions**

*Table 15.6: Synthesis of port directions*

| C/C++ Function Argument                             | RTL Port Type         |
|-----------------------------------------------------|-----------------------|
| An argument which is read from and never written to | in                    |
| An argument which is written to and never read from | out                   |
| A value output by the function return statement     | out                   |
| An argument which is both written to and read from  | inout (bidirectional) |
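A minimal sketch of the four cases in the table, using a hypothetical function `running_total` with one argument of each kind:

```c
/* Hypothetical function with one argument of each direction. */
int running_total(int sample, int *last_sample, int *total)
{
    /* sample: only read -> in. */
    /* last_sample: written, never read -> out. */
    *last_sample = sample;

    /* total: read and written -> inout. */
    *total += sample;

    /* Return value -> out. */
    return *total > 100;
}
```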

#### **Port Protocols**

- → Simple: ap\_none, ap\_stable, ap\_ack
- → Ports with validation: ap\_vld, ap\_ovld, ap\_hs
- → Memory interface: ap\_memory, bram
- → FIFO: ap\_fifo
- → Bus: ap\_bus
- → AXI: axis, s\_axilite, m\_axi
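Protocols are assigned per port with the `INTERFACE` directive. The sketch below is a hypothetical function, not taken from the source, with one protocol from three of the groups above. The function name, port names, and protocol choices are illustrative assumptions, and an ordinary C compiler ignores the pragmas.

```c
#define N 8

/* Hypothetical function: scales the samples and reports the largest. */
void scale_samples(const int samples[N], int gain, int *peak)
{
    /* Array argument accessed through a memory interface. */
#pragma HLS INTERFACE ap_memory port=samples
    /* Scalar input with no handshake. */
#pragma HLS INTERFACE ap_none port=gain
    /* Output accompanied by a valid signal. */
#pragma HLS INTERFACE ap_vld port=peak

    int best = samples[0] * gain;
    for (int i = 1; i < N; i++) {
        int v = samples[i] * gain;
        if (v > best)
            best = v;
    }
    *peak = best;
}
```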

