Build and Program a Simple Module

Examine source files and run simulations

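In this assignment you will use Xilinx Vivado in non-project mode (batch mode) to synthesize, optimize, place, and route a design and produce a bitstream. The src/ directory contains the simple_module.v source code. The module’s behavior is very similar to that of the previous assignment: on each rising edge of clk, the output q is updated with the logical AND of the inputs en and d.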

The src/ directory also contains simulation tests in the file testbench.v. This testbench illustrates an alternative method for generating test patterns with case statements. These lines make signal changes after the indicated number of clock cycles:

case(clk_count)
   2: begin
      d <= 1;
   end
   5: begin
      en <= 1;
      d  <= 0;
   end
   8: begin
      d  <= 1;
   end
endcase // case (clk_count)

These lines make signal changes when clk_count equals 2, 5, and 8. The signals stay constant during the other clock cycles, which allows us to observe the delay between input and output signal changes.

Run the simulation using make. You should notice that once en and d are both 1, q changes to 1 after a delay of one clock cycle. The results should look like this:

clk:           0 en: 0   d: 0 q: 0
clk:           1 en: 0   d: 0 q: 0
clk:           2 en: 0   d: 0 q: 0
clk:           3 en: 0   d: 1 q: 0
clk:           4 en: 0   d: 1 q: 0
clk:           5 en: 0   d: 1 q: 0
clk:           6 en: 1   d: 0 q: 0
clk:           7 en: 1   d: 0 q: 0
clk:           8 en: 1   d: 0 q: 0
clk:           9 en: 1   d: 1 q: 0
clk:          10 en: 1   d: 1 q: 1
clk:          11 en: 1   d: 1 q: 1

Notice that the assignment d <= 1 is applied when clk_count equals 2, but the change doesn’t take effect until clk_count is 3: the clocked (nonblocking) assignment always has a 1-cycle delay. So when d is raised at cycle 8 (with en already 1 since cycle 6), the assignment takes effect at cycle 9. The change in q appears at cycle 10, since the register in simple_module adds another cycle of delay.
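The Python sketch below (not part of the course files) steps through the clock cycles the same way. It makes the two sources of delay concrete. It assumes that simple_module registers q <= en & d, which matches the LUT2 netlist and truth table examined later. The function name simulate is hypothetical.

```python
# Cycle-level sketch of testbench.v driving simple_module.
# Assumption: simple_module does q <= en & d on each rising edge of clk.

# clk_count -> nonblocking assignments made by the testbench's case statement
STIMULUS = {2: {"d": 1}, 5: {"en": 1, "d": 0}, 8: {"d": 1}}


def simulate(cycles=12):
    """Return a list of (clk_count, en, d, q) tuples, one per clock cycle."""
    en, d, q = 0, 0, 0
    trace = []
    for clk_count in range(cycles):
        trace.append((clk_count, en, d, q))
        # At the rising edge, every right-hand side uses the values from
        # before the edge ...
        q_next = en & d
        changes = STIMULUS.get(clk_count, {})
        en_next = changes.get("en", en)
        d_next = changes.get("d", d)
        # ... and all registers update together, like Verilog's <= operator.
        en, d, q = en_next, d_next, q_next
    return trace


for clk_count, en, d, q in simulate():
    print(f"clk: {clk_count:10d} en: {en}   d: {d} q: {q}")
```

The model computes every next value before updating any register, which mimics nonblocking assignment. With these stimulus values it reproduces the en, d, and q columns shown above.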

“Implement”: Synthesize, Elaborate, Place/Route

Examine the XDC constraint file

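When a design is implemented on a physical FPGA chip, a constraint file must be provided. On the Xilinx platform, the constraints are specified in an “XDC file” (Xilinx Design Constraints). It serves several purposes:

- defining clocks and other timing constraints
- assigning design ports to physical package pins
- setting the electrical I/O standard of each pin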

In this design, we use the XDC file to define the clock signal and to associate each port with an FPGA pin. Open the file named simple_module.xdc and look carefully at each line.

These lines set up the clock:

set_property PACKAGE_PIN W5 [get_ports clk]                         
set_property IOSTANDARD LVCMOS33 [get_ports clk]
create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports clk]
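In the create_clock line, -period 10.00 specifies a 10 ns period, which is a 100 MHz clock. The option -waveform {0 5} places the rising edge at 0 ns and the falling edge at 5 ns, giving a 50% duty cycle. The Python sketch below does that arithmetic. The helper parse_create_clock is hypothetical, and it is a toy parser for this one line format, not a Tcl interpreter.

```python
import shlex


def parse_create_clock(line):
    """Return (period_ns, frequency_mhz, duty_cycle) from a simple create_clock line.

    Minimal sketch: only understands -period and -waveform {rise fall}.
    """
    # Turn the Tcl braces into quotes so shlex keeps "0 5" as a single token
    tokens = shlex.split(line.replace("{", '"').replace("}", '"'))
    period = float(tokens[tokens.index("-period") + 1])
    rise, fall = (float(t) for t in tokens[tokens.index("-waveform") + 1].split())
    frequency_mhz = 1000.0 / period      # period in ns -> frequency in MHz
    duty_cycle = (fall - rise) / period  # fraction of each period spent high
    return period, frequency_mhz, duty_cycle


xdc_line = "create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports clk]"
print(parse_create_clock(xdc_line))  # (10.0, 100.0, 0.5)
```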

The next few lines associate switches 0 and 1 with inputs en and d:

set_property PACKAGE_PIN V17 [get_ports en]                 
    set_property IOSTANDARD LVCMOS33 [get_ports en]
set_property PACKAGE_PIN V16 [get_ports d]                  
    set_property IOSTANDARD LVCMOS33 [get_ports d]

Note that there are two lines for each signal: one to assign the FPGA pin, and a second to configure the I/O standard (LVCMOS33 selects 3.3 V CMOS signaling).

The last two lines associate LED 0 with output q:

set_property PACKAGE_PIN U16 [get_ports q]                  
set_property IOSTANDARD LVCMOS33 [get_ports q]

These lines are all adapted from the Basys3_master.xdc file located in the main 3700 directory. That file defines all of the Basys3 features and their pin connections.
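Because every signal follows the same two-line pattern, pin constraints are easy to generate from a table. The Python sketch below is illustrative only. The PINS dictionary and the xdc_pin_lines function are hypothetical, and the pin values are copied from simple_module.xdc. It emits only the pin and I/O standard lines, so the create_clock constraint would still be written separately.

```python
# Hypothetical port-to-pin table; values copied from simple_module.xdc
PINS = {"clk": "W5", "en": "V17", "d": "V16", "q": "U16"}


def xdc_pin_lines(pins, iostandard="LVCMOS33"):
    """Return the two XDC lines used for each port: pin assignment, then I/O standard."""
    lines = []
    for port, pin in pins.items():  # dicts keep insertion order (Python 3.7+)
        lines.append(f"set_property PACKAGE_PIN {pin} [get_ports {port}]")
        lines.append(f"set_property IOSTANDARD {iostandard} [get_ports {port}]")
    return lines


print("\n".join(xdc_pin_lines(PINS)))
```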

Study the build script

You will implement the design using a TCL script (pronounced “tickle”). TCL is short for “Tool Command Language” and is widely used in Electronic Design Automation (EDA). Open the file build.tcl and examine each line.

The first few lines load the source and constraint files:

# Load sources
read_verilog [ glob src/*.v ]
read_xdc simple_module.xdc

glob is a Tcl command that returns a list of the file names matching a pattern (here, every .v file in src/). The other commands, read_verilog and read_xdc, are specific to Vivado and are self-explanatory.

Next we synthesize the design using the synth_design command. Synthesis converts your behavioral RTL design into a structural netlist built only from primitive cell types that exist on the FPGA.

# Run Synthesis

synth_design -top simple_module -part xc7a35tcpg236-1
write_verilog -force post_synth.v

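The options shown here specify the top module and the FPGA part used on the Basys3 board. You always need to specify the correct top module and FPGA part identifier. The write_verilog command saves the synthesized netlist as post_synth.v, which you will examine later in this assignment.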

Next, opt_design performs logic optimization on the synthesized netlist. The place procedure (place_design) then assigns a specific physical location on the FPGA to every primitive cell instance in the design. Finally, the route procedure (route_design) determines the switch settings needed to interconnect all the placed cells so that the final product matches the specified design.

# Implement (optimize, place, route)
opt_design
place_design
route_design

After the place-and-route procedure, we should always generate timing and utilization reports:

# Generate Reports
report_timing_summary -file post_route_timing.rpt
report_utilization -file post_route_utilization.rpt

These reports indicate whether the implementation was successful. We’ll discuss the details later in this assignment.

The last line creates a bitstream which can be used to program the actual FPGA.

# Make bitstream
write_bitstream -force simple_module.bit

This should create a bitstream file called simple_module.bit. The -force option allows Vivado to overwrite an existing file with the same name. Without -force, the script halts with an error rather than overwriting an old version of the file.

Programming the Basys3 Board

You can use the Vivado Hardware Manager to program the bitstream file onto the Basys3 board. Alternatively, you can save the bitstream onto a USB thumb drive (it should be the only bitstream file in the thumb drive’s root directory), then plug the thumb drive into the Basys3’s right-side USB port. Set the adjacent jumper to “USB” and the board will load the bitstream from the thumb drive on power-up.

Assigned Tasks

Run the build script

Now run the build process by typing make implement in the terminal. It may take a few minutes to complete. When it finishes, a directory listing should reveal the two report files and the bitstream file.

Examine post_synth.v

Use a text viewer/editor to open the synthesized netlist file. (“Netlist” is another term for a structural hardware description; a “net” is a wire and a “netlist” is a list of wire connections between components). You can read the file directly in the terminal using the less command.

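In the netlist, note that there are no always or reg statements. Everything is either a wire or an instance of a primitive cell. Examples of primitive cells include:

- input and output buffers (IBUF, OBUF)
- a global clock buffer (BUFG)
- lookup tables (LUT2)
- flip-flops (such as FDRE)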

With further exploration, you can see that the LUT implements the logic operation on d and en. Here is the LUT instance:

  LUT2 #(
    .INIT(4'h8)) 
    q_i_1
       (.I0(d_IBUF),
        .I1(en_IBUF),
        .O(q_i_1_n_0));

This cell is of type LUT2 (it has two inputs). Its logic function is defined by the INIT parameter, which is set to 4'h8: hexadecimal 8, or binary 1000. In this number, the Most Significant Bit (MSB) is 1, and all other bits, including the Least Significant Bit (LSB), are 0.

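The INIT parameter defines a logic truth table. The LUT treats its inputs {I1, I0} = {en, d} as a binary row number, and bit k of INIT is the output for row k. Reading 1000 from LSB to MSB therefore gives the output values of the table rows in order: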

en d q
0 0 0 <– LSB
0 1 0
1 0 0
1 1 1 <– MSB
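The mapping from INIT to truth table can be checked mechanically. In the Python sketch below, row index k is the input vector {I1, I0} read as a binary number, and the output for that row is bit k of INIT. The function name lut_truth_table is hypothetical.

```python
def lut_truth_table(init, num_inputs):
    """Expand a LUT INIT value into rows of (inputs..., output).

    The row index is the input vector {I(n-1), ..., I1, I0} read as a binary
    number, and the output for that row is bit <index> of INIT.
    """
    rows = []
    for index in range(2 ** num_inputs):
        # Input bits, most significant first: I(n-1) ... I0
        inputs = tuple((index >> bit) & 1 for bit in reversed(range(num_inputs)))
        output = (init >> index) & 1
        rows.append(inputs + (output,))
    return rows


# LUT2 from post_synth.v: INIT = 4'h8, I1 = en_IBUF, I0 = d_IBUF
for en, d, out in lut_truth_table(0x8, 2):
    print(en, d, out)
```

Trying other values shows how the same cell implements other functions. For example, 0xE (binary 1110) yields OR and 0x6 (binary 0110) yields XOR.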

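As you might imagine, the LUT is a powerful logic cell because it can be configured to implement any Boolean function of its inputs. By choosing its four INIT bits, a LUT2 can implement any of the 16 possible two-input functions.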

Examine post_route_timing.rpt

Use a text viewer/editor to open the timing report file. The most important part of the report is the timing summary table. Scroll down to find it. Here is a portion of the table:

------------------------------------------------------------------------------------------------
| Design Timing Summary
| ---------------------
------------------------------------------------------------------------------------------------

    WNS(ns)      TNS(ns)  TNS Failing Endpoints  TNS Total Endpoints      WHS(ns)      THS(ns)  
    -------      -------  ---------------------  -------------------      -------      -------  
         NA           NA                     NA                   NA           NA           NA  


All user specified timing constraints are met.

You should look first at the Worst Negative Slack (WNS), which is the smallest (worst) setup slack among all timed paths, such as register-to-register paths. If the slack is positive, the delay is short enough to avoid timing faults. If the slack is negative, the delay is too long. The next clock edge can then arrive before the logic signal settles, so a register may capture the wrong value.

In this design there is only one register, so there are no register-to-register paths and the WNS cannot be computed (the table shows NA). For large designs, WNS often becomes the central focus of design effort. Later in this assignment, you will modify the design to make a signal pipeline in which a logic value is passed from one register to another. The timing slack measures the reliability of that pipeline: a positive slack means the signal beats the clock, and a negative slack means the signal doesn’t get there in time.
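The summary columns follow simple rules:

- WNS is the worst (minimum) setup slack over all timed endpoints.
- TNS adds up only the negative slacks.
- The failing-endpoint count is the number of negative slacks.

The Python sketch below applies those rules to a list of hypothetical endpoint slacks in picoseconds; integers avoid floating-point rounding. It only summarizes slacks and does not show how Vivado computes the slacks themselves. The function name timing_summary is hypothetical.

```python
def timing_summary(endpoint_slacks_ps):
    """Summarize per-endpoint setup slacks (ps) into WNS/TNS-style numbers."""
    if not endpoint_slacks_ps:
        return None  # no timed endpoints, comparable to the NA entries in this design
    failing = [s for s in endpoint_slacks_ps if s < 0]
    return {
        "WNS": min(endpoint_slacks_ps),  # worst slack; may be positive
        "TNS": sum(failing),             # 0 when every endpoint meets timing
        "TNS Failing Endpoints": len(failing),
        "TNS Total Endpoints": len(endpoint_slacks_ps),
    }


# Hypothetical slacks for four endpoints, in picoseconds
print(timing_summary([7800, 1200, -300, -50]))  # WNS -300, TNS -350, 2 of 4 failing
```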

Examine post_route_utilization.rpt

Use a text viewer/editor to open the utilization report file. This file reports how many of each primitive cell type are used in the design, along with the percent utilization of each cell type. If the utilization exceeds 100% for any resource, the design cannot be programmed onto the target FPGA part.

The different resource categories are reported in a collection of tables, like this one:

+-------------------------+------+-------+-----------+-------+
|        Site Type        | Used | Fixed | Available | Util% |
+-------------------------+------+-------+-----------+-------+
| Slice LUTs              |    1 |     0 |     20800 | <0.01 |
|   LUT as Logic          |    1 |     0 |     20800 | <0.01 |
|   LUT as Memory         |    0 |     0 |      9600 |  0.00 |
| Slice Registers         |    1 |     0 |     41600 | <0.01 |
|   Register as Flip Flop |    1 |     0 |     41600 | <0.01 |
|   Register as Latch     |    0 |     0 |     41600 |  0.00 |
| F7 Muxes                |    0 |     0 |     16300 |  0.00 |
| F8 Muxes                |    0 |     0 |      8150 |  0.00 |
+-------------------------+------+-------+-----------+-------+
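The Util% column is Used divided by Available, times 100. The 100% limit mentioned above amounts to checking Used against Available for every row. In the Python sketch below, the helper names are hypothetical. Showing tiny nonzero values as <0.01 is an assumption made to mimic the table above.

```python
def util_percent(used, available):
    """Util% column: percentage of the available sites that the design uses."""
    return 100.0 * used / available


def util_label(used, available):
    """Format like the table above: tiny nonzero values are shown as <0.01."""
    pct = util_percent(used, available)
    if 0 < pct < 0.01:
        return "<0.01"
    return f"{pct:.2f}"


def design_fits(rows):
    """A design fits only if no resource exceeds 100% utilization."""
    return all(used <= available for _, used, available in rows)


rows = [("Slice LUTs", 1, 20800), ("Slice Registers", 1, 41600), ("F7 Muxes", 0, 16300)]
for name, used, available in rows:
    print(f"{name:<16} {util_label(used, available):>6}")
print("fits:", design_fits(rows))
```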

In the FPGA, the LUTs and registers are arranged into logic slices, which are further grouped into Configurable Logic Blocks (CLBs). The F7 and F8 multiplexers (Muxes) inside the slices combine LUT outputs to build wider logic functions (up to 7 and 8 inputs, respectively). In most designs, these slice resources are the most heavily used.

Scrolling further down, you will see tables for other resources, such as block RAM and DSP blocks, that are not used in this design but may be valuable in future projects.

These resources provide specialized functionality beyond the general-purpose “programmable gate array” features, allowing designs to achieve better performance. For example, the DSP module contains a dedicated multiplier. Because that hardware is optimized for a single job, it is much faster and more compact than a multiplier built from general-purpose logic slices. We will explore some of these special-purpose modules in future assignments.

Program the Basys3

Using the Vivado Hardware Manager

On your local machine, plug in the Basys3 board, turn it on, and launch Vivado from the Applications menu or from a local terminal. Vivado should present a window showing various options; select Open Hardware Manager.

Next, the Hardware Manager window should appear. Near the top left of the window, click Open Target and select Auto Connect. It should identify your Basys3 device and open a connection to it. Then click Program Device and provide the path to your simple_module.bit file.

Using a Thumb Drive

Copy your bitstream file onto a thumb drive with a USB Type-A connector (Type-A is the most common type). Make sure there is only one .bit file in the root folder of the thumb drive. Eject the thumb drive from your computer and turn off the Basys3. Plug the thumb drive into the USB port on the right side of the board, above the pushbuttons. Move the JP1 jumper to the USB setting, then turn on the Basys3. The amber light should “throb” for a moment while the bitstream is retrieved from the thumb drive. Once loading is complete, the amber light turns off and your design should be active.

Test the Module

Once the board is programmed, verify the truth table:

sw0 sw1 LED0
0 0 0
0 1 0
1 0 0
1 1 1

Design Exercise

Modify the design to add a second register, so that we can properly evaluate the WNS. This modification will create a logic pipeline between two registers:

Diagram of a simple logic pipeline.

In this pipeline, the signal _q connects the two registers: it is the output of D Register 1 and the input of D Register 2. Both registers are controlled by the same clock signal. When the external input d changes, Register 1 locks in the new value on the rising edge of clk. There is some physical delay before the new value appears on _q. Register 2 locks in the value of _q on the next rising edge of clk, so _q has to stabilize (at least one setup time) before that clock edge. If _q stabilizes early, the extra time is called the slack:

Illustration of timing slack in a signal pipeline.
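Here is a simplified model that ignores clock skew, clock uncertainty, and the other corrections Vivado applies. In it, the setup slack of this pipeline is the clock period minus two terms: the time it takes _q to settle, and Register 2's setup time. The Python sketch below expresses that model in picoseconds. The function name and all delay values are hypothetical, chosen only to illustrate the arithmetic; the 10,000 ps period matches the 10 ns clock in simple_module.xdc.

```python
def setup_slack_ps(period_ps, clk_to_q_ps, path_delay_ps, setup_ps):
    """Simplified setup slack for a register-to-register path (ideal clock, no skew).

    Register 1 launches data at one rising edge; Register 2 captures it at the next.
    """
    arrival = clk_to_q_ps + path_delay_ps  # when _q has settled after the launch edge
    required = period_ps - setup_ps        # latest arrival Register 2 can accept
    return required - arrival              # positive: met; negative: violated


# Hypothetical delays, with the 10 ns (10,000 ps) clock from simple_module.xdc
print(setup_slack_ps(10_000, clk_to_q_ps=450, path_delay_ps=1_200, setup_ps=100))  # 8250
```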

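To change your design and implement the signal pipeline, create a new top-level module named top in src/top.v that adds the second register. Then change the synth_design line in build.tcl so that it specifies the new top module: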

synth_design -top top -part xc7a35tcpg236-1

Then simulate the design by running make. You should observe a two-cycle delay in the output signal.
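You can predict the extra cycle with a small behavioral model before running make. The Python sketch below steps through the clock cycles one at a time with a chain of registers. It makes several assumptions:

- The testbench stimulus is unchanged.
- Register 1 captures en & d, as simple_module does.
- Register 2 simply copies _q, which may differ from how you structure top.v.

The function simulate_pipeline is hypothetical.

```python
# Assumed stimulus, same as testbench.v: clk_count -> assignments made at that edge
STIMULUS = {2: {"d": 1}, 5: {"en": 1, "d": 0}, 8: {"d": 1}}


def simulate_pipeline(cycles=12, stages=2):
    """Return (clk_count, en, d, q) per cycle for a chain of `stages` registers.

    Assumed structure: stage 1 (Register 1, output _q) captures en & d,
    each later stage copies the previous one, and q is the last stage.
    """
    en, d = 0, 0
    regs = [0] * stages
    trace = []
    for clk_count in range(cycles):
        trace.append((clk_count, en, d, regs[-1]))
        # Compute every register's next value from the current values first ...
        next_regs = [en & d] + regs[:-1]
        changes = STIMULUS.get(clk_count, {})
        # ... then update all signals together, like nonblocking assignments.
        en = changes.get("en", en)
        d = changes.get("d", d)
        regs = next_regs
    return trace


for row in simulate_pipeline():
    print("clk: %2d en: %d d: %d q: %d" % row)
```

With stages=1 the same model gives the single-register behavior. Comparing the two runs shows q rising at cycle 10 versus cycle 11: one extra cycle for the added register, or two cycles after en and d are both 1.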

Once verified, run make implement to build the design. Then open the .rpt files and observe the results, in particular the WNS in the timing summary, which can now be computed.

You do not need to program the design onto your board.

Turn in Your Work

To turn in your work, run these commands:

git add *.rpt
git add src/top.v
git commit . -m "Complete."
git push origin main

Then indicate on Canvas that your work is done.