Chapter 3: First Steps

This is the third part of the Peripheral Control With HLS series of posts. Head to Chapter 2 for installation instructions and environment setup. Chapter 4 will begin working with PYNQ in earnest.

In this chapter, we'll decide on what constitutes a good starting point. In firmware, the classical "Hello, world" involves the toggling of LEDs using a button or dipswitch. For the sake of tradition, we'll be doing likewise, although we'll build up to flashing LEDs incrementally since there's a lot to digest in just getting the LEDs turned on. A word of warning, however: HLS isn't intended for this type of "real-time" application; it's meant to produce black-box IP cores that work, as opposed to meeting very specific latency requirements. We won't be pushing the "real-time" constraint too hard, but it's something to keep in mind nevertheless.

All of the source code for this post can be found in the leds subdirectory of the hls repo.

Turning The LEDs On

High-Level Synthesis

First of all, we're going to write our HLS function that'll be synthesised into something which simply turns on a few LEDs. The Pynq-Z2 has four LEDs, and we'll be turning on numbers zero and two.

Navigate to leds/static/vitis_hls where you'll see a few directories and a tcl script. The tcl script is used to control Vitis HLS without having to go via the GUI, while the directories contain C++ code much like any regular software project. We'll ignore the workflow for now and instead concentrate on the source code that'll be synthesised. Let's start by taking a look at src/leds.cpp. The HLS function is, for the most part, self-explanatory:

#include "../include/leds.hpp"

void leds_static(ap_uint<4>& leds) {
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE ap_none port=leds
    leds = 0b0101;
}

First of all, we see the inclusion of our own header, leds.hpp, which must in turn pull in ap_int.h (that's where ap_uint comes from), a header that certainly isn't in the C++ standard library. ap_int.h defines arbitrary precision (ap) datatypes that are convenient for firmware. In our case, we're interested in the ap_uint<N> class template, which provides us with an arbitrary-width unsigned integer datatype whose template parameter N is the number of bits. Since we have four LEDs, we simply need four bits to toggle whether the corresponding LED is on or off, hence the function argument.

It's worth clarifying from the outset that the function arguments aren't like function arguments in regular software that are placed on a stack and manipulated during runtime. Rather, arguments to this function are the interfaces that the core needs to expose -- in this case we need to be able to interact with four bits that toggle LED states. The synthesised core is specified with RTL, and has no idea about references to variables or whatever else we might encounter in software. Think of the synthesised core as a black box whose inputs and outputs must be included as function arguments.

Next up we have a few preprocessor directives that look a little curious. Clearly they tell the HLS tools about the inputs and outputs to the core, albeit using some unknown nomenclature. A full exposition on this topic is well outside the remit of this post -- the reader should consult the "pragma HLS interface" section of UG1399 for further details. However, we can try to give a bit of a TL;DR. Synthesised cores have two types of interface: block level and port level interfaces. The block level interface exposes various control signals to the core, such as resets and interrupts. For our example, we're just statically assigning the LED states, and clearly we have no use for a block level interface since we just instantiate the core and let it run without having to interact with anything else. Hence our designating the return port as ap_ctrl_none. The port level interfaces allow us to specify how data, i.e. the "function arguments", are passed into and out of the core, including protocols (whether to use things like acknowledgements, ready/valid handshakes, etc.) and properties such as bus widths and what form the ports take (streams from DDR, registers, BRAMs, etc.). For our case, the leds port is just going to be fed with some static bitmap, and so we don't need anything fancy -- we just designate it as ap_none.

The final part of our code simply assigns the leds variable the bitmap 0101, i.e. the zeroth and second LEDs should be activated and the first and third deactivated. And that's it -- we're ready to synthesise the core.
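To make the bitmap convention concrete, here's a small sketch in plain C++ that lists which LEDs a given four-bit value switches on. It's only an illustration: std::bitset<4> stands in for ap_uint<4> so that it compiles without the Xilinx headers, and the lit_leds helper is a hypothetical name that doesn't appear in the repo.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// Return the indices of the LEDs that a 4-bit bitmap switches on.
// std::bitset<4> is an assumed stand-in for ap_uint<4>.
std::vector<int> lit_leds(unsigned bitmap) {
    assert(bitmap < 16u && "only four LEDs are modelled");
    const std::bitset<4> bits(bitmap);
    std::vector<int> lit;
    for (int i = 0; i < 4; ++i) {
        if (bits[i]) {
            lit.push_back(i);  // bit i drives LED i
        }
    }
    return lit;
}
```

Calling lit_leds(0b0101) yields the indices 0 and 2, matching the two LEDs we expect to see lit on the board.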

There's also a header in the include/ directory and a testbench in test/, but these are very much boilerplate that shouldn't require a great deal of explanation. Instead, let's take a look at that tcl script we mentioned earlier to drive the synthesis. You'll see that there's quite a bit of boilerplate to appease the Vitis HLS tools -- setting up projects, targeting hardware, etc. The script is commented to explain what's happening, but for the most part is uninteresting until we get to the final four lines. HLS comprises four main stages:

Simulation (csim_design): Pre-synthesis C++ simulation with the provided testbench, generating a simulation binary; verifies that the behaviour of the code is correct.
Synthesis (csynth_design): Synthesis of RTL from the kernel code.
Cosimulation (cosim_design): Verification of the RTL using hardware emulation and the testbench used in step 1.
Export (export_design): Package the synthesised RTL as an IP core that can be imported into Vivado and implemented on the FPGA.
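Since the simulation and cosimulation stages both lean on the testbench, it may help to see roughly what one does. The following is a minimal sketch rather than the testbench in the repo: leds_static_model is a hypothetical stand-in for the synthesisable function (with std::uint8_t replacing ap_uint<4>), and run_testbench plays the role that main() would play in a real testbench, returning zero on success and non-zero on failure.

```cpp
#include <cstdint>

// Assumed stand-in for leds_static: std::uint8_t replaces ap_uint<4>
// so that the sketch builds without the Vitis HLS headers.
void leds_static_model(std::uint8_t& leds) {
    leds = 0b0101;
}

// Plays the part of the testbench's main(): call the function under
// test and compare against the expected bitmap.
// Returns 0 on success and 1 on failure.
int run_testbench() {
    std::uint8_t leds = 0;
    leds_static_model(leds);
    return (leds == 0b0101) ? 0 : 1;
}
```

The return value matters because the HLS tools treat a non-zero exit code from the testbench as a failed simulation.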

That's it! Now, let's actually build something:

foo@bar~$ vitis_hls -f run_hls.tcl

You'll be met with an overwhelming amount of terminal output, but after a short while should be told that everything has completed successfully; the final core that can be instantiated in Vivado can be found in static_leds/impl/.

Generating FPGA Image

From the leds/static directory we're going to open Vivado and create a new project that we'll call vivado (Vivado will then obligingly create a leds/static/vivado directory to place the build when we check the "Create project subdirectory" box). We want to select the "RTL Project" box but don't want to specify any sources at this time (since we don't have any HDL to import). Next up we'll be asked to specify a part, so head to the "Boards" tab and select the Pynq-Z2.

chapter_03_create_project

Now that we have a project created, we need to import the IP that we generated with Vitis HLS. This is done through inclusion of a repository containing the synthesised IP core, and can be configured through the "Settings" as seen in the figure below. You'll see that inclusion of leds/static/vitis_hls as a repository makes the Static_leds IP available for us to use in Vivado.

chapter_03_add_repository

We can include the IP block that we've just imported through the "Create Block Design" option. We're not too concerned with the design name so just continue through until Vivado provides you with an empty canvas, prompting you to add a block. Press the plus sign and you'll be presented with a list of available blocks that can be included in the design; in this case, we're after the Static_leds block, so select this and Vivado will import the block. The output from the block, leds, needs to be routed to the LED pins on the PL, so we'll right-click the port and select "Make External".

chapter_03_add_block

We finish by renaming the leds_0 port to leds to match a particular convention we've chosen in our "Constraints" file, a file which tells Vivado which outputs map to which physical pins on the FPGA. We've included a constraints.xdc file in leds/static, which we'll now import into our project. Opening this file, we see that each bit of the leds port maps to a physical pin (R14, P14, etc.) specified by the Pynq-Z2 schematics.

chapter_03_constraints

The final thing we need to do now is generate a HDL wrapper which reflects what we've just done in the block designer, i.e. instantiate our synthesised Static_leds core and make the connections to the outside world. To this end, we need to right-click on the "design_1" in "Design Sources" and select "Generate HDL wrapper".

chapter_03_hdl_wrapper

All that remains is synthesising our design (i.e. generating the netlist from the RTL in the HDL that we've generated), implementing it (i.e. doing the place-and-route of the netlist on the hardware we've selected) and finally generating the bitstream that can be used to program the FPGA. This is all helpfully signposted in Vivado by big green arrowheads, so select "Run Synthesis" and click your way through the dialog boxes that are presented to you on completion of each step. This will likely take a few minutes, but once done we're ready to deploy our bitstream on the FPGA. So switch on the Pynq-Z2 and use the "Hardware Manager" within Vivado to program your device.

chapter_03_hardware_manager

If you've done everything right, you should see the zeroth and second LEDs on your board light up. Congratulations! You've gone from writing a small C++ function to deploying a functional FPGA configuration!

Connecting LED State To Buttons

All of the previous section was mightily impressive, but naturally you'll be a little underwhelmed after a moment or so. So let's try to spice the example up a little -- let's make the LED flashing a little more interactive. Conveniently enough, there's a push button beneath each LED, so let's make it so that each push button toggles the LED that's above it. Head over to leds/button/vitis_hls/src and you'll see another synthesisable function.

#include "../include/leds.hpp"

void button_leds(const ap_uint<4>& btns, ap_uint<4>& leds) {
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE ap_none port=leds
#pragma HLS INTERFACE ap_none port=btns

    static int timeout[4] = { 0 };
    static ap_uint<4> led_state = 0b0000;

    for (int i=0; i<4; ++i) {
#pragma HLS unroll
        if (btns[i] == 0b1 && timeout[i] == 0) {
            led_state[i] = !led_state[i];
            timeout[i] = 10000000;
        } else if (timeout[i] > 0) {
            --timeout[i];
        }
    }

    leds = led_state;
}

This hopefully won't present too much of a leap in complexity. We have a new input to our function, btns, representing the state of each button beneath the LEDs; when the bit is high, it means the button has been pressed. So we need to iterate over each button-LED pair and check the button state, toggling the corresponding LED if the button has been pressed. This all necessitates a couple of static variables (i.e. variables which retain state between function invocations): timeout and led_state.

timeout has a physical justification. Our design is going to be clocked at 100MHz or so, and so the synthesised core will be checking the state of the button every 10ns. Now, unless you happen to have reflexes which allow you to press and release a button within 10ns, we're going to need to stop the core from toggling the LED millions of times when we press the button. timeout is set to some large number once the button has been pressed; unless timeout is zero, the LED state won't be toggled. Otherwise, timeout will just be decremented on each clock cycle, without affecting the LED state. In this way, we can adjust the value that timeout is reset with to effectively make the corresponding LED unresponsive until the timeout has expired. Since our clock period will be roughly 10ns, we'll set the maximum timeout to ten million, thereby giving us 100ms to press the button and release it.
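The arithmetic behind that ten million can be checked with a couple of lines of plain C++. This is just a sketch of the calculation: timeout_cycles and bits_needed are hypothetical helper names, not something from the repo or the HLS libraries.

```cpp
#include <cstdint>

// Number of clock cycles spanning duration_ms milliseconds at clock_hz.
// Assumes clock_hz is a multiple of 1000; otherwise the division truncates.
constexpr std::uint64_t timeout_cycles(std::uint64_t clock_hz,
                                       std::uint64_t duration_ms) {
    return clock_hz / 1000 * duration_ms;
}

// Smallest number of bits that can hold value as an unsigned integer
// (returns 0 for a value of 0).
constexpr int bits_needed(std::uint64_t value) {
    int bits = 0;
    while (value > 0) {
        value >>= 1;
        ++bits;
    }
    return bits;
}
```

With a 100MHz clock, timeout_cycles(100000000, 100) gives 10,000,000, which is the reload value used in the function. bits_needed(10000000) gives 24, so a 24-bit counter would be enough to hold it; the int in the source is wider than strictly necessary but is the convenient choice.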

led_state is a little more subtle. Think about eventually drawing this block in Vivado; to toggle the LED state, we need to know its previous value, which means the LED state is both an input and an output to the block. This is a little problematic when the port maps directly to an external pin, and Vivado will complain about loops in our design. As a result, we create a static element in our function that retains the LED state within our block; it's the led_state that gets toggled in our function, and the leds output is just assigned the value from led_state.
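The interplay between timeout and led_state can be tried out on a desktop with a plain C++ model of the function. This is a sketch under a few assumptions: std::uint8_t replaces ap_uint<4>, the static variables become class members so that each instance carries its own state, one call to step is taken to represent one clock cycle, and the reload value is a constructor parameter so that we don't have to simulate ten million cycles. ButtonLedsModel and hold_button0 are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Plain C++ model of button_leds (not the synthesisable source).
class ButtonLedsModel {
public:
    explicit ButtonLedsModel(int timeout_reload) : reload_(timeout_reload) {
        assert(timeout_reload >= 0);
    }

    // One call models one invocation of the core; returns the LED bitmap.
    std::uint8_t step(std::uint8_t btns) {
        for (int i = 0; i < 4; ++i) {
            const bool pressed = (btns >> i) & 1u;
            if (pressed && timeout_[i] == 0) {
                led_state_ ^= static_cast<std::uint8_t>(1u << i);  // toggle LED i
                timeout_[i] = reload_;                               // start the dead time
            } else if (timeout_[i] > 0) {
                --timeout_[i];                                       // count down, LED untouched
            }
        }
        return led_state_;
    }

private:
    int reload_;
    std::array<int, 4> timeout_{};  // zero-initialised, like the static array
    std::uint8_t led_state_ = 0;    // retained between calls, like the static
};

// Hold button 0 down for the given number of calls; returns the final bitmap.
inline std::uint8_t hold_button0(ButtonLedsModel& core, int cycles) {
    std::uint8_t leds = 0;
    for (int c = 0; c < cycles; ++c) {
        leds = core.step(0b0001);
    }
    return leds;
}
```

With a reload value of 100, holding button 0 for 50 calls toggles LED 0 exactly once. The model also shows a consequence of the logic as written: if the button is still held when the timeout reaches zero, the LED toggles again, so a press longer than the timeout window flips the LED back.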

The final bit of syntax we've introduced is another preprocessor directive: #pragma HLS unroll. This asks the HLS tools to unroll the loop, i.e. to instantiate a separate copy of the loop body for each iteration rather than reusing a single copy four times in sequence. That works well here because each iteration is independent of the others: each LED state depends only on its corresponding button state. The four copies can therefore run side by side in the FPGA, resulting in a reduced latency of the core. We're starting to stray into optimisation territory here though, so we'll defer further discussion of this to another chapter.
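Conceptually, unrolling replaces the loop with one copy of its body per iteration. The sketch below shows that equivalence in plain C++; it's an illustration of the idea rather than what the HLS tools literally emit. The timeout logic is left out to keep it short, std::uint8_t stands in for ap_uint<4>, and all three function names are hypothetical.

```cpp
#include <cstdint>

// Loop body for one button/LED pair: flip bit i of state if button i is pressed.
constexpr std::uint8_t toggle_if_pressed(std::uint8_t state, std::uint8_t btns, int i) {
    return ((btns >> i) & 1u) ? static_cast<std::uint8_t>(state ^ (1u << i)) : state;
}

// Rolled form: a single body, executed four times in sequence.
constexpr std::uint8_t update_rolled(std::uint8_t state, std::uint8_t btns) {
    for (int i = 0; i < 4; ++i) {
        state = toggle_if_pressed(state, btns, i);
    }
    return state;
}

// Hand-unrolled form: four copies of the body, one per button/LED pair.
// Each copy touches a different bit, so none depends on another's result.
constexpr std::uint8_t update_unrolled(std::uint8_t state, std::uint8_t btns) {
    state = toggle_if_pressed(state, btns, 0);
    state = toggle_if_pressed(state, btns, 1);
    state = toggle_if_pressed(state, btns, 2);
    state = toggle_if_pressed(state, btns, 3);
    return state;
}
```

Both forms compute the same result for every input. The difference only matters in hardware, where four independent copies of the body can be laid out side by side.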

Perform the synthesis using the included tcl script and fire up Vivado like we did in the previous section. We want to repeat all of the instructions up to and including the point where we make external connections, this time for both btns and leds. At this point, you'll notice that Vivado is suggesting we do something called "Run Connection Automation" in a green bar above the block design area. You'll also see that there are two ports on our block, ap_clk and ap_rst, that we certainly didn't include in the HLS source, so where did they come from? The HLS tools recognised that the latency of our synthesised block would be non-zero; the static variables have to be stored between invocations, and there's some data dependency in the function (timeout needs to be compared against zero, after which led_state can be inverted if btns is high, then leds needs to be assigned from led_state -- all of this can't happen in a single cycle). As such, a clock must be supplied to drive the block forwards. Contrast this with our previous block where we just assigned leds some fixed value -- Vitis HLS deduced there was nothing dynamic going on, so no clock was required.

If we run the connection automation, Vivado will include a clocking module in the design, generating a clock source to provide to our synthesised block. We'll then be met with another green banner asking us to "Run Connection Automation" once more to try and connect the clock module to our block. Proceed with this and the tools will do all of the hard work for us. You'll notice that both a sys_clock and an rtl_reset external connection will appear in the block design. The sys_clock is just the system clock that the Zynq chip provides (for our chip, it's at 125MHz, which is why we need the clocking module to slow this down to 100MHz for our design). The rtl_reset is an accompanying reset signal for the clocking module that we don't particularly care about; you'll see in the provided constraints.xdc that this port is mapped to one of the dipswitches on the Pynq-Z2. The reset needs to be mapped to something -- it's active high (i.e. when the line is high, the clocking module will be continually reset) -- and the dipswitch can be kept low. You can play with this later and see that if you turn the dipswitch on, the behaviour of the LEDs will be all screwy.

chapter_03_automation

Now we're all set to synthesise, implement and generate a bitstream to program the device with as before. So go ahead and do that, then marvel at your ability to turn LEDs on and off!

Final Remarks

That should suffice for this chapter. We've used both Vitis HLS and Vivado to play with some LEDs and buttons on the Pynq-Z2 board, which is typically where hardware tutorials end for a first effort. In the next chapter, we'll get onto interfacing with hardware components using the PS of the Zynq chip.
