Modeling Random Access Memory (RAM)

RAM basics

Most digital systems use general-purpose data storage known as Random Access Memory, or RAM. FPGAs typically provide dedicated Block RAM resources. In Xilinx FPGAs these are organized as configurable memory segments that can be combined to build memories with different bit widths and storage capacities.

Memory Terminology

From the vantage point of RTL design, a RAM is an array. Each element of the array is called a word, and the words all have the same number of bits, called the bit width W or word size. Common word sizes are 8, 16, 32, or 64 bits.

The length of the array is called the depth D. Usually the depth is a power of 2, e.g. 128 words, or 256, 512, 1024, and so on.

A particular position in the array is called the address a, and the number of address bits is N >= log2(D). Usually N = log2(D), but sometimes the memory depth is less than the total addressable range.
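As a small illustration of the relation N >= log2(D), the sketch below derives the address width from the depth with the $clog2 system function (Verilog-2005), which returns the ceiling of log2. The module name ram_by_depth and the depth of 200 are hypothetical, chosen only to show a depth that is not a power of 2.

```verilog
module ram_by_depth
  #(
     parameter DATA_WIDTH = 8,
     parameter DEPTH      = 200,            // hypothetical: not a power of 2
     parameter ADDR_WIDTH = $clog2(DEPTH)   // N = ceil(log2(200)) = 8
  )
  (
     input clk,
     input wr,
     input      [ADDR_WIDTH-1:0] addr,
     input      [DATA_WIDTH-1:0] d_in,
     output reg [DATA_WIDTH-1:0] d_out
  );

  // Only DEPTH words exist, although ADDR_WIDTH bits can address 2**ADDR_WIDTH.
  reg [DATA_WIDTH-1:0] ram [0:DEPTH-1];

  always @(posedge clk) begin
     d_out <= ram[addr];
     if (wr)
       ram[addr] <= d_in;
  end
endmodule
```

Here addresses 200 through 255 fall outside the array. In simulation, a read from one of them returns unknown (x) values, which is one way to spot an out-of-range access.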

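At minimum, a RAM needs these I/O ports (names as used in the Verilog template later in this document):

  addr   input,  N bits   address of the word to access
  d_in   input,  W bits   data word to be written
  d_out  output, W bits   data word read from the memory
  wr     input,  1 bit    write enable
  rd     input,  1 bit    read enable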

Most RAMs will also require clk and rst_l ports.

Reading and Writing

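A RAM array supports both READ and WRITE operations:

  READ:  the word stored at the given address is placed on the data output.
  WRITE: the input data word is stored at the given address, replacing the previous contents.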

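An ambiguity occurs when a READ and a WRITE target the same address in the same clock cycle. In this situation, there are two protocols:

  READ-FIRST (read-before-write): the output receives the old word stored at the address, and then the new word is written.
  WRITE-FIRST (write-before-read): the new word is written, and the output receives that new word.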

Many RAM circuits (including the Xilinx Block RAM) can select between these protocols. The distinction can be critical for memory-intensive algorithms, especially in high-speed real-time applications like signal processing and graphics.
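To make the difference concrete, here is a minimal sketch of the alternative to READ-FIRST: a RAM in which a simultaneous READ and WRITE returns the newly written word (commonly called write-first). The module name single_port_RAM_wf is hypothetical, and the sketch omits a separate read enable for brevity. Whether a given synthesis tool maps this exact form onto Block RAM is an assumption to check in the synthesis report.

```verilog
module single_port_RAM_wf   // hypothetical name
  #(
     parameter DATA_WIDTH=8,
     parameter ADDR_WIDTH=8
  )
  (
     input clk,
     input wr,
     input      [ADDR_WIDTH-1:0] addr,
     input      [DATA_WIDTH-1:0] d_in,
     output reg [DATA_WIDTH-1:0] d_out
  );

  reg [DATA_WIDTH-1:0] ram[2**ADDR_WIDTH-1:0];

  always @(posedge clk) begin
     if (wr) begin
        ram[addr] <= d_in;
        d_out     <= d_in;        // write-first: output shows the new word
     end
     else
        d_out <= ram[addr];       // plain READ when not writing
  end
endmodule
```

With read-first behavior the output would instead receive the old contents of ram[addr] on the same clock edge.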

Buffering

RAMs can use a register to buffer the output data. Buffering is important if the RAM lies in the system’s critical path, meaning the clock rate of the whole system is limited by the RAM speed. The output register shortens the longest combinational path, because downstream logic no longer has to wait for the memory access within the same clock cycle, and this allows a faster clock.

The drawback to buffering is that the RAM’s output data is delayed by a full clock cycle: the RAM (and the whole system) can run at a faster clock rate, but each result arrives one cycle later. To visualize this, suppose we perform 100 consecutive READ operations. The data from the first READ will be delayed by a clock cycle, but after that initial delay the RAM will deliver one word per clock cycle. The initial delay is called the latency, and the average output rate is called the throughput.
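For contrast, the sketch below shows an unbuffered RAM, in which the output is a combinational function of the address and therefore has no added clock-cycle delay. The module name unbuffered_RAM is hypothetical. Xilinx Block RAM reads synchronously, so a synthesis tool would be expected to build this version from distributed (LUT-based) memory instead; treat that as an assumption to confirm in the utilization report.

```verilog
module unbuffered_RAM   // hypothetical name
  #(
     parameter DATA_WIDTH=8,
     parameter ADDR_WIDTH=8
  )
  (
     input clk,
     input wr,
     input  [ADDR_WIDTH-1:0] addr,
     input  [DATA_WIDTH-1:0] d_in,
     output [DATA_WIDTH-1:0] d_out
  );

  reg [DATA_WIDTH-1:0] ram[2**ADDR_WIDTH-1:0];

  // WRITE is still synchronous.
  always @(posedge clk) begin
     if (wr)
       ram[addr] <= d_in;
  end

  // Unbuffered READ: d_out follows addr with no register in between,
  // so the memory access time adds to the downstream logic delay.
  assign d_out = ram[addr];
endmodule
```

Replacing the assign with a registered assignment inside the clocked block gives the buffered form, at the cost of one cycle of latency.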

Assigned Tasks

You will design and implement a READ-FIRST Buffered RAM and demonstrate it using the Basys3 board.

Verilog RAM Template

The Vivado synthesis tool is able to infer a RAM if the module is written in the format shown here:

module single_port_RAM
  #(
     parameter DATA_WIDTH=8,
     parameter ADDR_WIDTH=8
  )
  (
     input clk,
     
     input rd,
     input wr,

     input      [ADDR_WIDTH-1:0] addr,
     input      [DATA_WIDTH-1:0] d_in,
     output reg [DATA_WIDTH-1:0] d_out
  );

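  // The memory array: 2**ADDR_WIDTH words, each DATA_WIDTH bits wide
  reg [DATA_WIDTH-1:0] ram[2**ADDR_WIDTH-1:0];

  always @(posedge clk) begin
     // Buffered READ: d_out is a register, updated on the clock edge where rd is high
     if (rd)
       d_out <= ram[addr];
     // WRITE uses a non-blocking assignment, so when rd and wr are both high
     // d_out still receives the old contents of ram[addr] (READ-FIRST)
     if (wr)
       ram[addr] <= d_in;
  end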
endmodule

In this code segment, the bit width W is declared as parameter DATA_WIDTH, and the address bit width N is declared as parameter ADDR_WIDTH. The memory depth D is 2^N; in Verilog the exponentiation operator is a double asterisk, as in 2**ADDR_WIDTH.

The RAM itself is represented as a Verilog array:

reg [DATA_WIDTH-1:0] ram[2**ADDR_WIDTH-1:0];

This line declares an array of bit vectors. Each bit vector has width DATA_WIDTH. There are a total of 2**ADDR_WIDTH vectors in the array. The individual vectors are addressed using C-style array indexing:

     d_out <= ram[addr];

The above line returns DATA_WIDTH bits from the memory at the location pointed to by addr.
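Because the width and depth are parameters, the same module can be reused for other memory sizes. The sketch below instantiates single_port_RAM as a 1024-word by 16-bit memory; the wrapper name ram_1k_x16_example, the instance name, and the chosen sizes are hypothetical.

```verilog
module ram_1k_x16_example   // hypothetical wrapper
  (
     input         clk,
     input         rd,
     input         wr,
     input  [9:0]  addr,    // N = 10 address bits, so D = 2**10 = 1024 words
     input  [15:0] d_in,    // W = 16 bits per word
     output [15:0] d_out
  );

  // Override the default parameters (8 and 8) at instantiation.
  single_port_RAM
    #(
       .DATA_WIDTH(16),
       .ADDR_WIDTH(10)
    )
  ram_inst
    (
       .clk(clk),
       .rd(rd),
       .wr(wr),
       .addr(addr),
       .d_in(d_in),
       .d_out(d_out)
    );
endmodule
```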

Simulate the RAM

Make a Verilog module for the RAM design. Create a testbench to verify the RAM. For every address 0 to 255, do these steps, each in a separate clock cycle:

  1. Generate a random value for d_in
  2. WRITE to the memory (wr <= 1, rd <= 0)
  3. READ from the memory at the same address (wr <= 0, rd <= 1)
  4. Log the values of d_in and d_out (using $write)
  5. Generate a new random value for d_in
  6. Simultaneously READ and WRITE (wr <= 1, rd <= 1)
  7. Log the values of d_in and d_out
  8. Do one more READ operation (wr <= 0, rd <= 1)
  9. Log d_in and d_out again

You will need to make a state machine for these tests, since the steps unfold across several clock cycles. Note that wr and rd should be assigned 0 except when performing a WRITE or READ.
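The test sequence above can be sketched as follows. This is one possible starting point, not the required solution: it sequences the steps procedurally with a task instead of an explicit state machine, and it leaves an idle clock cycle between operations. The testbench name, the task do_op, and the register old are hypothetical, and the sketch assumes the single_port_RAM template is compiled alongside it.

```verilog
`timescale 1ns / 1ps

module single_port_RAM_tb;   // hypothetical testbench name
  localparam DATA_WIDTH = 8;
  localparam ADDR_WIDTH = 8;

  reg clk = 0;
  reg rd  = 0;
  reg wr  = 0;
  reg  [ADDR_WIDTH-1:0] addr = 0;
  reg  [DATA_WIDTH-1:0] d_in = 0;
  reg  [DATA_WIDTH-1:0] old  = 0;   // word stored before the simultaneous READ/WRITE
  wire [DATA_WIDTH-1:0] d_out;
  integer i;

  single_port_RAM #(.DATA_WIDTH(DATA_WIDTH), .ADDR_WIDTH(ADDR_WIDTH)) dut
    (.clk(clk), .rd(rd), .wr(wr), .addr(addr), .d_in(d_in), .d_out(d_out));

  always #5 clk = ~clk;   // 10 ns period

  // Hold wr/rd across exactly one rising clock edge, then return both to 0.
  // Inputs change on the falling edge to stay clear of the sampling edge.
  task do_op(input w, input r);
    begin
      @(negedge clk); wr = w; rd = r;
      @(negedge clk); wr = 0; rd = 0;
    end
  endtask

  initial begin
    for (i = 0; i < 2**ADDR_WIDTH; i = i + 1) begin
      addr = i;
      d_in = $random;                                       // step 1
      do_op(1, 0);                                          // step 2: WRITE
      do_op(0, 1);                                          // step 3: READ
      $write("Address %h\n", addr);
      $write("   1: d_in = %h d_out = %h\n", d_in, d_out);  // step 4
      if (d_out !== d_in) $write("   ERROR: read-back mismatch\n");

      old  = d_in;
      d_in = $random;                                       // step 5
      do_op(1, 1);                                          // step 6: READ and WRITE together
      $write("   2: d_in = %h d_out = %h\n", d_in, d_out);  // step 7
      if (d_out !== old) $write("   ERROR: expected old word (read-first)\n");

      do_op(0, 1);                                          // step 8: READ
      $write("   3: d_in = %h d_out = %h\n\n", d_in, d_out); // step 9
      if (d_out !== d_in) $write("   ERROR: new word was not stored\n");
    end
    $finish;
  end
endmodule
```

The three checks compare d_out against the expected word with !==, so unknown (x) values are also reported as errors.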

Here is a partial output from my testbench:

Address e3
   1: d_in = 90 d_out = 90  <-- write then read
   2: d_in = 51 d_out = 90  <-- write/read simultaneous
   3: d_in = 51 d_out = 51 <-- read

Address e4
   1: d_in = 69 d_out = 69  <-- write then read
   2: d_in = 77 d_out = 69  <-- write/read simultaneous
   3: d_in = 77 d_out = 77 <-- read

Address e5
   1: d_in = 4a d_out = 4a  <-- write then read
   2: d_in = d8 d_out = 4a  <-- write/read simultaneous
   3: d_in = d8 d_out = d8 <-- read

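This output demonstrates that the data read back matches the data written, and that the RAM carries out read-before-write when a READ and a WRITE occur simultaneously: in line 2 of each group, d_out still holds the old word even though a new d_in is being written.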

Implement the Design

Create XDC and build.tcl files with these mappings:

After synthesis is complete, open the utilization report and note the use of Block RAM (BRAM) resources.
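If the utilization report shows no Block RAM, one option is to attach a synthesis attribute to the memory array. Vivado recognizes a ram_style attribute whose value "block" requests Block RAM. The sketch below repeats the template with that attribute added; the module name single_port_RAM_block is hypothetical, and whether the hint is needed for a memory of this size is an assumption to check against the report.

```verilog
module single_port_RAM_block   // hypothetical name
  #(
     parameter DATA_WIDTH=8,
     parameter ADDR_WIDTH=8
  )
  (
     input clk,
     input rd,
     input wr,
     input      [ADDR_WIDTH-1:0] addr,
     input      [DATA_WIDTH-1:0] d_in,
     output reg [DATA_WIDTH-1:0] d_out
  );

  // Synthesis hint: ask the tool to implement this array in Block RAM.
  (* ram_style = "block" *)
  reg [DATA_WIDTH-1:0] ram[2**ADDR_WIDTH-1:0];

  always @(posedge clk) begin
     if (rd)
       d_out <= ram[addr];
     if (wr)
       ram[addr] <= d_in;
  end
endmodule
```

Simulators ignore the attribute, so the behavior in the testbench is unchanged.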

Demonstrate the Design

Implement the design and demonstrate it on the Basys3 board. Record a short video demonstrating your design. In the video, you should:

Upload the video to Canvas.

Turn in your work using git:

git add case* src/*.v *.v *.rpt *.txt *.tcl *.bit *.xdc
git commit . -m "Complete"
git push origin master