In this assignment you will use Xilinx Vivado in non-project mode
(batch mode) to optimize, place, route, and produce a bitstream. The src/ directory contains the simple_module.v source code. The module’s behavior is very similar to the previous assignment:

- When enabled (en == 1), the output q is assigned (<=) the input d
- When not enabled (en == 0), the output q is not changed
- q is initially set to 0

The src/ directory also contains simulation tests in the file testbench.v. This testbench illustrates an alternative method for generating test patterns with case statements. These lines make signal changes after the indicated number of clock cycles:
case(clk_count)
  2: begin
     d <= 1;
  end
  5: begin
     en <= 1;
     d <= 0;
  end
  8: begin
     d <= 1;
  end
endcase // case (clk_count)
These lines make signal changes when the clk_count
equals 2, 5, and 8. The
signals stay constant during other clock cycles, which will allow us to
observe the delay between input and output signal changes.
Run the simulation using make. You should notice that when both en and d change to 1, q changes to 1 after a delay of one clock cycle. The results ought to look like this:
clk: 0 en: 0 d: 0 q: 0
clk: 1 en: 0 d: 0 q: 0
clk: 2 en: 0 d: 0 q: 0
clk: 3 en: 0 d: 1 q: 0
clk: 4 en: 0 d: 1 q: 0
clk: 5 en: 0 d: 1 q: 0
clk: 6 en: 1 d: 0 q: 0
clk: 7 en: 1 d: 0 q: 0
clk: 8 en: 1 d: 0 q: 0
clk: 9 en: 1 d: 1 q: 0
clk: 10 en: 1 d: 1 q: 1
clk: 11 en: 1 d: 1 q: 1
Notice that the assignment d <= 1 is applied when clk_count equals 2, but the change doesn’t take effect until clk_count is 3. A clocked assignment always has a 1-cycle delay. So when d is raised at cycle 8 (en has already been high since cycle 6), the assignment takes effect at cycle 9, where both en and d are 1. The change in q appears at cycle 10, since the assignment in simple_module adds another cycle of delay.
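The two stacked one-cycle delays can be reproduced in a small standalone simulation. The following is a minimal sketch, not the contents of the actual src/ files: the body of simple_module is inferred from the behavior described above, and the clock generation, the DUT instance name, and the stopping condition in the testbench are assumptions made for illustration.

```verilog
// Hypothetical sketch of simple_module (the real src/simple_module.v may differ).
module simple_module (
    input  wire clk,
    input  wire en,
    input  wire d,
    output reg  q
);
    initial q = 0;                 // q is initially set to 0

    always @(posedge clk) begin
        if (en)
            q <= d;                // enabled: q takes the value of d
        // not enabled: no assignment, so q keeps its value
    end
endmodule

// Hypothetical sketch of a testbench built around the case-statement stimulus.
module testbench;
    reg clk = 0;
    reg en  = 0;
    reg d   = 0;
    wire q;
    integer clk_count = 0;

    simple_module DUT (.clk(clk), .en(en), .d(d), .q(q));

    always #5 clk = ~clk;          // assumed 10-unit clock period

    always @(posedge clk) begin
        // $display reads the values held before this edge's nonblocking updates
        $display("clk: %0d en: %b d: %b q: %b", clk_count, en, d, q);
        clk_count <= clk_count + 1;
        case (clk_count)
            2: begin
                d <= 1;
            end
            5: begin
                en <= 1;
                d <= 0;
            end
            8: begin
                d <= 1;
            end
        endcase
        if (clk_count == 11) $finish;
    end
endmodule
```

Because every assignment is nonblocking, the testbench's change to d at count 8 is first visible at count 9, and the module's q <= d at count 9 is first visible at count 10. Under these assumptions the printed lines should follow the same pattern as the results listed above.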
When a design is implemented on a physical FPGA chip, a constraint file must be provided. In the Xilinx platform, the constraints are specified in an “XDC file”, which serves several purposes.
In this design, we use the XDC file to define the clock signal and
associate pins. Open the file named simple_module.xdc
and look carefully
at each line.
These lines set up the clock:
set_property PACKAGE_PIN W5 [get_ports clk]
set_property IOSTANDARD LVCMOS33 [get_ports clk]
create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports clk]
The next few lines associate switches 0 and 1 to inputs en and d:
set_property PACKAGE_PIN V17 [get_ports en]
set_property IOSTANDARD LVCMOS33 [get_ports en]
set_property PACKAGE_PIN V16 [get_ports d]
set_property IOSTANDARD LVCMOS33 [get_ports d]
Note that there are two lines for each signal, one to associate the FPGA pin, and a second line to configure the voltage.
The last few lines associate LED 0 to signal q:
set_property PACKAGE_PIN U16 [get_ports q]
set_property IOSTANDARD LVCMOS33 [get_ports q]
These lines are all edited from the Basys3_master.xdc
file located in the
main 3700 directory. All of the Basys3 features and their pin
connections are defined in that file.
You will implement the design using a TCL script (pronounced
“tickle”). TCL is short for “Tool Command Language” and is widely used
in Electronic Design Automation (EDA). Open the file build.tcl
and examine each line.
The first few lines load sources and constraint files:
# Load sources
read_verilog [ glob src/*.v ]
read_xdc simple_module.xdc
The glob keyword is a TCL command that returns a list of matching files. The other commands, read_verilog and read_xdc, are specific to Vivado and are self-explanatory.
Next we synthesize the design using the synth_design
command. The synthesis
process converts your behavioral RTL design to a
structural Verilog design that uses only cell
primitive types that exist on the FPGA.
# Run Synthesis
synth_design -top simple_module -part xc7a35tcpg236-1
write_verilog -force post_synth.v
The options shown here specify the top module and the FPGA part used in the Basys3 board. You always need to specify the correct top module and FPGA part identifier.
Next comes the place procedure, which assigns a specific FPGA cell to every primitive cell instance in the synthesized design. The route procedure solves the switch patterns needed to interconnect all the assigned cells so that the final product matches the specified design.
# Implement (optimize, place, route)
opt_design
place_design
route_design
After the place-and-route procedure, we should always generate timing and utilization reports:
# Generate Reports
report_timing_summary -file post_route_timing.rpt
report_utilization -file post_route_utilization.rpt
These reports indicate whether the implementation was successful. We’ll discuss the details later in this assignment.
The last line creates a bitstream which can be used to program the actual FPGA.
# Make bitstream
write_bitstream -force simple_module.bit
This should create a bitstream file called simple_module.bit. The -force directive indicates that it’s okay to overwrite any existing bitstream file (if -force is not specified, the script halts with an error rather than overwrite an old version of the file).
You can use the Vivado Hardware Manager to program the bitstream file onto the Basys3 board. Alternatively, you can save the bitstream onto a USB thumb drive (it should be the only bitstream file in the thumb drive’s root directory), then plug the thumb drive into the Basys3’s right-side USB port. Change the adjacent jumper setting to “USB” and the board will load the USB bitstream file on power-up.
Now run the build process by typing make implement
in the terminal. It may
take a few minutes to complete. When it finishes, a directory listing
should reveal the two report files and the bitstream file.
Use a text viewer/editor to open the synthesized
netlist file. (“Netlist” is another term for a
structural hardware description; a “net” is a wire and a “netlist” is a
list of wire connections between components). You can read the file
directly in the terminal using the less
command.
In the netlist, note that there are no always or reg statements. Everything is either a wire or a primitive cell. Some example primitive cells are:
With further exploration, you can see that the LUT implements the logic operation on d and en. Here is the LUT instance:
LUT2 #(
  .INIT(4'h8))
  q_i_1
     (.I0(d_IBUF),
      .I1(en_IBUF),
      .O(q_i_1_n_0));
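This module is of type LUT2 (it has 2 inputs). The logic function is defined by the INIT parameter, which is set to 4'h8 (hexadecimal 8), or binary 1000. In this number, the Most Significant Bit (MSB) is a 1. All other bits, including the Least Significant Bit (LSB), are 0.
The INIT parameter defines a logic truth table. The two inputs, read together as a 2-bit number with en (I1) as the upper bit and d (I0) as the lower bit, select one bit position of INIT, and that bit is the output. Reading the binary 1000 from LSB to MSB therefore gives the output values of the table from top to bottom: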
en | d | q | INIT bit |
---|---|---|---|
0 | 0 | 0 | bit 0 (LSB) |
0 | 1 | 0 | bit 1 |
1 | 0 | 0 | bit 2 |
1 | 1 | 1 | bit 3 (MSB) |
As you might imagine, the LUT is a powerful logic cell since it can be configured to implement any logic function of its inputs, simply by choosing a different INIT value.
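The way INIT acts as a truth table can be sketched with a small behavioral model. This is an illustrative stand-in, not the Xilinx library definition of LUT2: the module names lut2_model and lut2_model_tb and the internal wire table_bits are made up for this example.

```verilog
// Hypothetical behavioral model of a 2-input LUT (not the Xilinx library source).
module lut2_model #(
    parameter [3:0] INIT = 4'h8
) (
    input  wire I0,
    input  wire I1,
    output wire O
);
    wire [3:0] table_bits = INIT;

    // {I1, I0} forms a 2-bit index; O is the INIT bit at that position.
    assign O = table_bits[{I1, I0}];
endmodule

// Walks through all four input combinations and prints one table row each.
module lut2_model_tb;
    reg  en, d;
    wire q;
    integer i;

    lut2_model #(.INIT(4'h8)) U (.I0(d), .I1(en), .O(q));

    initial begin
        for (i = 0; i < 4; i = i + 1) begin
            {en, d} = i[1:0];      // en is the upper bit, d the lower bit
            #1 $display("en: %b d: %b q: %b", en, d, q);
        end
    end
endmodule
```

With INIT set to 4'h8 only the row with en and d both 1 selects a 1 bit, which is the AND function shown in the table. Changing the parameter, for example to 4'hE, would turn the same cell into an OR gate.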
post_route_timing.rpt
Use a text viewer/editor to open the timing report file. The most important part of the report is the timing summary table. Scroll down to find it. Here is a portion of the table:
------------------------------------------------------------------------------------------------
| Design Timing Summary
| ---------------------
------------------------------------------------------------------------------------------------
WNS(ns) TNS(ns) TNS Failing Endpoints TNS Total Endpoints WHS(ns) THS(ns)
------- ------- --------------------- ------------------- ------- -------
NA NA NA NA NA NA
All user specified timing constraints are met.
You should look first at the Worst Negative Slack (WNS). Slack is the time left over between a signal arriving at a register and the clock edge that captures it, and WNS is the smallest slack found on any register-to-register path in the design. If the slack is positive, then the delay is short enough to avoid timing faults. If the slack is negative, then the delay is too long, which means the next clock edge will likely occur before the logic signal arrives, causing an error.
In this design, we only have one register, so the WNS cannot be computed. For large designs, WNS often becomes the central focus of design effort. Later in this assignment, you will modify the design to make a signal pipeline in which a logic value is passed from one register to another. The timing slack measures the reliability of that pipeline: a positive slack means the signal beats the clock. A negative slack means the signal doesn’t get there in time.
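The idea of slack can be written as a short calculation. The following is a simplified sketch that ignores clock skew and clock uncertainty. The symbol names and the three delay values are illustrative assumptions, not numbers from the report; only the 10 ns clock period comes from the create_clock line in the XDC file.

```latex
% Simplified setup slack for one register-to-register path
\[
\text{slack} = T_{\text{clk}} - \left( t_{\text{clk-to-q}} + t_{\text{path}} + t_{\text{setup}} \right)
\]

% Illustrative values: T_clk = 10 ns (from -period 10.00); the delays are made up
\[
\text{slack} = 10.0 - (0.5 + 1.2 + 0.1) = 8.2\ \text{ns} > 0
\]
```

Here the signal would arrive 8.2 ns before it is needed, so the path meets timing. If the three delays added up to more than 10 ns, the result would be negative and the path would fail.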
post_route_utilization.rpt
Use a text viewer/editor to open the utilization report file. This file reports how many of each primitive cell type are used in the design. It also reports percent utilization for each cell type. If the utilization exceeds 100% for any resource, then the design cannot be programmed onto the target FPGA part.
The different resource categories are reported in a collection of tables, like this one:
+-------------------------+------+-------+-----------+-------+
| Site Type | Used | Fixed | Available | Util% |
+-------------------------+------+-------+-----------+-------+
| Slice LUTs | 1 | 0 | 20800 | <0.01 |
| LUT as Logic | 1 | 0 | 20800 | <0.01 |
| LUT as Memory | 0 | 0 | 9600 | 0.00 |
| Slice Registers | 1 | 0 | 41600 | <0.01 |
| Register as Flip Flop | 1 | 0 | 41600 | <0.01 |
| Register as Latch | 0 | 0 | 41600 | 0.00 |
| F7 Muxes | 0 | 0 | 16300 | 0.00 |
| F8 Muxes | 0 | 0 | 8150 | 0.00 |
+-------------------------+------+-------+-----------+-------+
In the FPGA, the LUTs and Registers are arranged into Logic Slices, which are further grouped into Configurable Logic Blocks (CLBs). A number of multiplexers (Muxes) serve as configurable switches to control how the slices are inter-connected. In most designs, these are the most heavily used resources.
Scrolling further down, you will see a list of other resources that are not used in this design, but may be valuable in future projects:
These resources provide specialized functionality beyond the “programmable gate array” features, allowing for designs with better performance. For example, the DSP module contains a dedicated multiplier. Because it is designed to work solely as a multiplier, it is much faster and more compact than an equivalent multiplier built from all-purpose logic slices. We will explore some of these special-purpose modules in future assignments.
On your local machine, plug in the Basys3 board, turn it on, and launch the Vivado hardware manager. To access the Hardware Manager, launch Vivado from the Applications menu or from a local terminal. It should present you with a window showing various options, where you should select Open Hardware Manager.
Next, the Hardware Manager window should appear. Near the top left of
the window, click Open Target and select Auto
Connect. It should identify your Basys3 device and open a
connection to it. Then click Program Device and provide
the path to your simple_module.bit
file.
Copy your bitstream file onto a thumb drive with a USB Type-A
connector (Type-A is the most common type). Make sure there is only one
.bit
file in the top folder of
your thumb drive. Eject the thumbdrive from your computer. Turn off the
Basys3. Plug your thumbdrive into the USB port on the right side, above
the pushbuttons. Move the JP1
jumper to the USB
setting. Turn on the Basys3. The amber light should “throb” for
a moment while the bitstream is retrieved from the thumb drive. Once
complete, the amber light will turn off and your design should be
active.
Once the board is programmed, verify the truth table:
sw0 | sw1 | LED0 |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
Modify the design to add a second register, so that we can properly evaluate the WNS. This modification will create a logic pipeline between two registers:
In this pipeline, the signal _q is connected between two registers: it is the output of D Register 1, and it is the input of D Register 2. Both of these registers are controlled by the same clock signal. When the external input d changes, Register 1 locks in the new value upon the rising edge of clk. There is some physical delay before the new value appears on _q. Register 2 will lock in the new value upon the next rising edge of clk, so _q has to stabilize before the clock rises. If _q stabilizes early, the extra time is called the slack.
To change your design and implement the signal pipeline, follow these steps:

1. Create a new file named src/top.v
2. In src/top.v, do the following:
   - Declare a wire named _q to make an internal connection
   - Instantiate simple_module with the instance name SM1
   - Connect clk, en, and d to SM1
   - Connect SM1.q to _q
   - Add a clocked assignment q <= _q
3. Edit src/testbench.v so that it instantiates top instead of simple_module
4. In build.tcl, change the synth_design line to replace simple_module with top:

   synth_design -top top -part xc7a35tcpg236-1
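One possible shape for the new top-level module is sketched below. It is only an illustration of the steps above, not a reference solution. It assumes that simple_module has ports named clk, en, d, and q, that the top-level ports keep the same names so the existing XDC pin assignments still apply, and that src/simple_module.v is compiled alongside it. The initial value of q is also an assumption.

```verilog
// Hypothetical sketch of src/top.v; requires simple_module from src/simple_module.v.
module top (
    input  wire clk,
    input  wire en,
    input  wire d,
    output reg  q
);
    wire _q;                       // internal connection between the two registers

    // Register 1: the existing module, instantiated as SM1
    simple_module SM1 (
        .clk(clk),
        .en (en),
        .d  (d),
        .q  (_q)
    );

    initial q = 0;                 // assumed starting value, matching simple_module

    // Register 2: passes _q to the output one clock cycle later
    always @(posedge clk) begin
        q <= _q;
    end
endmodule
```

The path from SM1's register through _q to the second register is the register-to-register path that the timing report can now measure.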
Then simulate the design by running make. You should observe a two-cycle delay in the output signal.
Once verified, run make implement to build the design.
Then open the .rpt files and observe:
You do not need to program the design onto your board.
To turn in your work, run these commands:
git add *.rpt
git add src/top.v
git commit . -m "Complete."
git push origin main
Then indicate on Canvas that your work is done.