This is the fifth part of the Peripheral Control With HLS series of posts. You can head to Chapter 4 to see how we interact with our IP cores using PYNQ or Chapter 6 TBD!
In this chapter, we'll be finally looking at doing some numerical computing. We've already designed HLS kernels and kind of understand how all of this works, so we'll try to complicate things a little more by asynchronously launching our IP core on the PL. Then, rather than just idling the PS, we'll try to do some useful work there at the same time and utilise the dual-core ARM processor that's also at our disposal.
Interrupts
An Example
You'll recall in the previous chapter that when we added a block protocol to our
IP core, we had an interrupt signal that was output from the core. Now it's
time to actually use that signal. For our block using the ap_ctrl_hs protocol,
the interrupt will go high when processing is complete, i.e. when AP_DONE is
asserted.
Let's use an illuminating example, where our core will wait until a button is pressed, then raise an interrupt and light up the corresponding LED:
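#include "ap_int.h"

void button_interrupt(const ap_uint<4>& btns, ap_uint<4>& leds) {
    // Block-level handshake, with its control registers exposed over AXI-Lite
    #pragma HLS INTERFACE port=return mode=ap_ctrl_hs bundle=bus_a
    #pragma HLS INTERFACE port=return mode=s_axilite bundle=bus_a
    // The button and LED ports are plain wires, so use the port-level ap_none
    // protocol (ap_ctrl_none is a block-level protocol for port=return only)
    #pragma HLS INTERFACE port=btns mode=ap_none
    #pragma HLS INTERFACE port=leds mode=ap_none
    static ap_uint<4> led_state = 0b0000;
    static bool button_pressed = false;
    // Spin until at least one button is pressed
    while (!button_pressed) {
        for (int idx=0; idx<4; ++idx) {
            #pragma HLS PIPELINE
            if (btns[idx] == 0b1) {
                led_state[idx] = 0b1;
                button_pressed = true;
            }
        }
    }
    leds = led_state;
    return;
}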
Build this as usual using Vitis HLS, and add both it and the ZYNQ7 Processing System blocks into the Vivado block designer. However, we'll need to add another block to convert the interrupt signal from the core into a form that's usable by the ZYNQ7 Processing System, not to mention enable interrupts on the ZYNQ7 Processing System.
First of all, double-click on the ZYNQ7 Processing System block and head over to the "Interrupts" section. There you'll want to check the "Fabric Interrupts" box and in the "PL-PS Interrupt Ports", check the "IRQ_F2P" box. This will enable a port from the PL to a shared register with the PS used for general interrupts. For PYNQ, we're only ever allowed to use a single interrupt, so only a single bit of the 16-bit register will be used for interrupts.
Next, we'll want to add an "AXI Interrupt Controller" to our block design. This
takes some number of interrupts and performs some internal logic to create a
single interrupt output that can be configured in particular ways. Double-click
on this block once you've added it and in the "Interrupt Output Connection"
option, select "Single" (we only want a single output interrupt). In addition,
in the "Peripheral Interrupts Type" subsection, toggle the
"Interrupts Type -- Edge or Level" option and set the corresponding value to
0xFFFFFFFF (you can find a justification for this setting here).
Now we're going to have to make a few manual connections in our design here.
Connect the interrupt output from our HLS block to the intr input on the
AXI Interrupt Controller, and then connect the irq output from this block to
the IRQ_F2P input of the ZYNQ7 Processing System. Once that's done, go ahead
and run the suggested automations and generate the bitstream as usual.
One thing that we need to know a bit more about is the memory-mapping that our core will use for its control signals. Vitis HLS generates a control register map that's documented in the "S_AXILITE Control Register Map" section of UG1399. Of particular importance to us:
0x04 : Global Interrupt Enable Register
- bit 0 - Global Interrupt Enable (Read/Write)
- others - reserved
0x08 : IP Interrupt Enable Register (Read/Write)
- bit 0 - enable ap_done interrupt (Read/Write)
- bit 1 - enable ap_ready interrupt (Read/Write)
- others - reserved
0x0c : IP Interrupt Status Register (Read/TOW)
- bit 0 - ap_done (COR/TOW)
- bit 1 - ap_ready
The "Global Interrupt Enable Register" sets whether any interrupts are enabled
from the core. The "IP Interrupt Enable Register", on the other hand, sets what
triggers the interrupt from the core, in this case either the
AP_DONE or AP_READY signal that's used in the block-level handshake
protocol. The "IP Interrupt Status Register" can be read from to determine
whether an interrupt has been raised from either of the interrupt modes in
the "IP Interrupt Enable Register".
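As a rough illustration of how these registers might be handled from Python, the sketch below names the offsets and bit masks from the map above and decodes a raw status word into flags. The constant and function names (IP_ISR, AP_DONE_BIT, decode_isr and so on) are made up for this example; they aren't part of PYNQ or Vitis HLS.

```python
# Offsets and bit masks copied from the control register map above.
# All names here are illustrative, not part of any library.
G_IER = 0x04   # Global Interrupt Enable Register
IP_IER = 0x08  # IP Interrupt Enable Register
IP_ISR = 0x0C  # IP Interrupt Status Register

AP_DONE_BIT = 1 << 0   # bit 0: ap_done
AP_READY_BIT = 1 << 1  # bit 1: ap_ready


def decode_isr(value):
    """Split a raw IP Interrupt Status Register value into named flags."""
    return {
        "ap_done": bool(value & AP_DONE_BIT),
        "ap_ready": bool(value & AP_READY_BIT),
    }


if __name__ == "__main__":
    # A status word with only bit 0 set means ap_done raised the interrupt.
    print(decode_isr(0b01))
```

With an overlay loaded, the raw value would presumably come from reading offset IP_ISR on the core, in the same way the control registers were read in the previous chapter.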
Using asyncio
Transfer the HWH file and bitstream over to the PYNQ-Z2, and instantiate an overlay as usual:
from pynq import Overlay
ol = Overlay("./button_interrupt.bit")
Now we've got to prime the core in a few ways. First up, we've got to start
it up by setting the AP_START bit in the control register; we encountered this
in the previous chapter. Next we've got to enable interrupts as per our
discussion above. Clearly we're going to want to set the enable bit in the
"Global Interrupt Enable Register", and we're going to want to trigger an
interrupt on the "AP_DONE" signal, so we'll set the first bit in the
"IP Interrupt Enable Register" too:
AP_START = 0x0
G_IER = 0x04
IP_IER = 0x08
ol.button_interrupt.write(AP_START, 1)
ol.button_interrupt.write(G_IER, 1)
ol.button_interrupt.write(IP_IER, 1)
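If you find yourself repeating these three writes, they could be bundled into a small helper. The sketch below is one way to do that; prime_core and RecordingIP are hypothetical names invented for this example, and the stand-in class only records writes so that the helper can be tried without a board. Note that it enables interrupts before starting the core, which is a different order to the snippet above: the assumption is that a fast core could otherwise finish before its interrupt is enabled.

```python
AP_START = 0x00  # control register; bit 0 is ap_start
G_IER = 0x04     # Global Interrupt Enable Register
IP_IER = 0x08    # IP Interrupt Enable Register


def prime_core(ip):
    """Enable the ap_done interrupt, then start the core.

    `ip` is any object with a write(offset, value) method.
    """
    ip.write(G_IER, 1)     # allow the core to raise interrupts at all
    ip.write(IP_IER, 1)    # bit 0: raise the interrupt on ap_done
    ip.write(AP_START, 1)  # start the core last


class RecordingIP:
    """Stand-in for an overlay IP that just records register writes."""

    def __init__(self):
        self.writes = []

    def write(self, offset, value):
        self.writes.append((offset, value))


if __name__ == "__main__":
    fake = RecordingIP()
    prime_core(fake)
    print(fake.writes)
```

On the board, the same helper would be called with ol.button_interrupt in place of the stand-in.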
PYNQ conveniently provides an Interrupt class which exposes a co-routine,
wait, to us. This co-routine yields from an asyncio event, such that
it will return once an interrupt has been raised. We can wrap this co-routine
in another co-routine that awaits the interrupt and then does something else
once it fires (in our case, it just prints that an interrupt has been
received). We then wrap our co-routine in a future so that we can designate
where in our code we need the result.
import asyncio

ip = ol.button_interrupt

async def handler(interrupt):
    # Suspend here until the core raises its interrupt
    await interrupt.wait()
    print("Received interrupt!")

handler_task = asyncio.ensure_future(handler(ip.interrupt))
In our case the future is a little unnecessary since we're not going to be doing
anything besides wait for the interrupt, but this serves as a template that
can be used for more complex flows. Finally, we run the asyncio event loop,
waiting for the future we've just created.
import asyncio
from psutil import cpu_percent
loop = asyncio.get_event_loop()
cpu_percent(percpu=True)
loop.run_until_complete(handler_task)
cpu_used = cpu_percent(percpu=True)
print('CPU Utilization = {cpu_used}'.format(**locals()))
Note that we've also added a little bit of extra code to check how much
each of the two CPUs is being used. I'm not entirely sure about the
low-level details of how asyncio or the Interrupt class is implemented,
but I'm guessing it's just polling the "IP Interrupt Status Register" we saw
in the control memory map above, hence the high CPU utilisation on one of the
cores you'll see in this example (it's basically just spin-locking).
That's about all we have to say about that. Run the code and you'll see nothing
happens to the board. Go ahead and press a button -- the corresponding LED
should light up, and the handler routine will progress past the await,
printing to STDOUT that an interrupt was received. At this point, the board
will be unresponsive (since we're not clearing any of the registers, nor
doing anything generally once the interrupt is triggered) until the overlay
is reloaded. Neat!
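To see why wrapping the handler in a future is useful, here's a sketch of the "more complex flow" hinted at above, where some work carries on while we wait for the interrupt. It runs without a board: FakeInterrupt is a hypothetical stand-in that mimics only the wait co-routine, and busy_work is a placeholder for whatever the PS would actually be computing.

```python
import asyncio


class FakeInterrupt:
    """Hypothetical stand-in for an interrupt object with a wait() co-routine."""

    def __init__(self):
        self._event = asyncio.Event()

    def trigger(self):
        # Plays the role of the hardware raising the interrupt.
        self._event.set()

    async def wait(self):
        await self._event.wait()


async def handler(interrupt):
    await interrupt.wait()
    return "interrupt received"


async def busy_work(n):
    """Placeholder for useful work; yields regularly so the handler can run."""
    total = 0
    for i in range(n):
        total += i * i
        if i % 1000 == 0:
            await asyncio.sleep(0)  # hand control back to the event loop
    return total


async def main():
    interrupt = FakeInterrupt()
    handler_task = asyncio.ensure_future(handler(interrupt))
    # Simulate the button press arriving a little later.
    asyncio.get_running_loop().call_later(0.01, interrupt.trigger)
    work = await busy_work(10_000)  # do something useful in the meantime
    message = await handler_task    # only now do we need the interrupt
    return work, message


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The key point is that the future is created early but only awaited once its result is actually needed, so the event loop is free to make progress on other co-routines in between.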
Estimating Pi
OK, we've talked enough about hardware problems now. Let's try to apply everything that we've learned to a numerical example. Estimating pi seems like a good place to start since it's a little more complex than the standard examples such as vector addition. Recall that since the area of a circle is \(\pi r^2\) and the area of the square that encloses it is \(4r^2\), \(\pi\) is nothing more than four times the ratio of the areas of the circle and the square. The classic way to estimate \(\pi\) is by randomly generating \((x,y)\) co-ordinates on a square, then counting the fraction of them that land inside the inscribed circle.
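Before moving to hardware, it can help to have a software reference to compare against. The sketch below is a plain Python version of the Monte Carlo estimate: it samples points in the unit square and counts how many fall inside the quarter circle of radius one, which gives the same area ratio as the full circle and square. The function name estimate_pi and the use of Python's random module are choices made for this example only.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Monte Carlo estimate of pi from points in the unit square."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    inside = 0
    for _ in range(num_samples):
        x = rng.random()
        y = rng.random()
        # Inside the quarter circle of radius 1 centred on the origin?
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    for n in (1_000, 100_000):
        print(n, estimate_pi(n))
```

The error shrinks only with the square root of the number of samples, so a great many samples are needed for each extra digit of accuracy.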
Random Number Generator
Sadly, rand is not synthesisable, nor are any of the random number generators
in the STL. As such, we're going to have to write our own random number
generator. Mercifully, since we don't need a cryptographically-secure random
number generator, we can use a linear-feedback shift register (LFSR). LFSRs are
discussed in great detail elsewhere (see here for example), so there's no point
in our replicating content here. Rather, we'll simply provide a synthesisable
implementation of the LFSR here:
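#include <cstdint>
#include "ap_int.h"

uint16_t lfsr(uint16_t seed) {
    // Load the seed on the first call only; the state persists between calls
    static bool initialised = false;
    static ap_uint<16> state = 0;
    if (!initialised) {
        initialised = true;
        state = seed;
    }
    // XOR the tapped bits to form the new top bit, then shift right by one
    ap_uint<1> new_bit = state[0] ^ state[1] ^ state[15];
    state = state >> 1;
    state[15] = new_bit;
    return state;
}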
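Since an LFSR is easy to get subtly wrong, it may be worth checking a software model before synthesising anything. The sketch below mirrors the shift-and-feedback step of the LFSR above in plain Python and measures how many steps a given tap set takes to return to its seed. The names lfsr_step and lfsr_period are invented for this example, and the tap set (0, 2, 3, 5) is not from this post: it's a commonly cited maximal-length configuration for 16 bits, included here purely for comparison.

```python
def lfsr_step(state, taps):
    """Advance a 16-bit right-shifting LFSR by one step.

    `taps` lists the bit positions XORed together to form the new top bit.
    """
    new_bit = 0
    for tap in taps:
        new_bit ^= (state >> tap) & 1
    return (state >> 1) | (new_bit << 15)


def lfsr_period(seed, taps):
    """Count the steps until the state first returns to `seed`.

    Assumes bit 0 is one of the taps, so that every step is reversible and
    the sequence is guaranteed to come back to the seed.
    """
    state = lfsr_step(seed, taps)
    steps = 1
    while state != seed:
        state = lfsr_step(state, taps)
        steps += 1
    return steps


if __name__ == "__main__":
    # Taps used by the HLS function above, then the comparison tap set.
    print(lfsr_period(0xACE1, (0, 1, 15)))
    print(lfsr_period(0xACE1, (0, 2, 3, 5)))
```

A maximal-length 16-bit LFSR visits all 65535 non-zero states before repeating, so comparing the two printed periods gives a quick feel for how good a given tap choice is. Note also that a seed of zero never leaves zero, whatever the taps.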