PRISM model: RFB circuit

Winstead via FLUENT Verification Project

Wed, 28 Aug 2019 11:23:46 -0600

!!FLUENT Verification Project

Yesterday we had our USU group meeting between myself, Zhen and Thakur. I explained the "Restorative Feedback" (RFB) concept and they helped me correct some issues in my PRISM model of the circuit. Here is a summary of that information for the larger group's benefit.

RFB is a method of triple-modular redundancy (TMR) that was inspired by the genetic "quorum trigger" circuit we developed under our previous grant. It is applicable to a "fully triplicated" TMR system described below. Where a TMR system uses majority-vote gates to correct or suppress errors, the RFB system uses Muller C-elements. Some motivations for using RFB over TMR are:

RFB can correct single "hard faults" (i.e. long-lasting or permanent errors) in addition to double "soft faults" (momentary glitches). TMR can only correct a single fault at any time.
RFB is well defined for multi-level logics, and a ternary-CMOS implementation was demonstrated. TMR does not have well-defined behavior for non-binary signals.
RFB can mask or self-correct voter faults.

Now some detailed definitions:

TMR System

A TMR system has three copies of all components and signals. Within the system, majority voters are periodically inserted to correct errors. In a fully-triplicated system, there are always three voters so that voter errors have a change to be corrected "downstream" from the faulty voter. At any point in the system, a bundle error occurs if there are two or three signal errors within the same triplet (I prefer "information error" instead of "bundle error" but no one else says this).

A majority voter is able to correct any type of single error within a bundle. Suppose we have three signals [tex]$x_0, x_1, x_2$[/tex], representing copies of an information signal [tex]$x$[/tex]. If there are no signal errors, then [tex]$x_i=x$[/tex] for all [tex]$i$[/tex]. Suppose an error occurs only on [tex]$x_0$[/tex], due to faulty logic, interference, random noise, cosmic rays or another error source. Then [tex]$\text{Maj}(x_0,x_1,x_2)=x$[/tex].

If there are errors on both [tex]$x_0$[/tex] and [tex]$x_1$[/tex], then the voter cannot correct it. Now consider a system with a mixture of hard and soft faults: [tex]$x_0$[/tex] is a permanent fault, and a momentary fault may appear on [tex]$x_1$[/tex] or [tex]$x_2$[/tex]. In this situation, the voter is transparent to errors in [tex]$x_1$[/tex] and [tex]$x_2$[/tex]: it just propagates the errors into the rest of the system. In a fully-triplicated system, this results in a bundle error.

RFB System

The RFB concept is based on the C-element's ability to block momentary upsets:

C-element: output q and inputs a, b.
Behavior: if (a==b) then q<=a

The C-element acts as a latch, so that it's state is unchanged when a!=b. Suppose the C-element's inputs are [tex]$x_0$[/tex] and [tex]$x_1$[/tex], as defined in the TMR discussion. Then a single momentary fault is masked by the C-element, since it's output will only change when [tex]$x_0==x_1$[/tex].

The RFB circuit is comprised of three C-elements, as shown in the attached figure. For each C-element, the inputs are defined by multiplexers that select between setup ([tex]$\phi=0$[/tex]) and hold ([tex]$\phi=1$[/tex]) states.

Setup: C-element [tex]$C_i$[/tex] is set to [tex]$z_i<=x_{(i-1)}$[/tex], indicated by the red wires in the schematic.
Hold: C-element [tex]$C_i$[/tex] has inputs [tex]$x_i$[/tex] and [tex]$z_{(i-1)}$[/tex].

In the hold phase, since the initial values at C-element [tex]$C_i$[/tex] are [tex]$z_{(i-1)}==x_{(i+1)}$[/tex] and [tex]$z_{i}==x_{(i-1)}$[/tex], and the remaining input is [tex]$x_i$[/tex], the C-element settles to a state determined by all three of the input signals. It can be shown that each C-element's hold state is equal to [tex]$\text{Maj}(x_0,x_1,x_2)}$[/tex], so RFB is initialized to the TMR behavior.

The benefit of RFB appears in the hold state:

C-element faults (state upsets) are physically suppressed by the feedback signals. This part emulates the quorum trigger.
Input momentary faults are masked by the C-elements.
Correct input signals suppress C-element state faults, since they maintain active drive within the C-element circuit.
Incorrect input signals leave the C-element state floating, which makes a fault more likely, however the fault is not directly propagated.

PRISM Model

I represented the RFB circuit as a DTMC model in PRISM. Logic signal values are 0, 1 and X (encoded as 2). There are four modules:

Clock Generator: I used a stochastic clock generator that starts in state 0 and switches to state 1 at a random time.
Signal Source: initialized to value 0, flips to X with probability epsilon. From X, it can flip to 1 or 0 with equal probability. From 1, it flips back to 0.
MIUX: assumed to be error-free.
C-element: behaves like an ideal C-element when inputs are 00 or 11. Can flip to X when inputs are mixed. From X, can flip to 1 or 0 with equal probability.

I triplicated components 2--4 using the variable-renaming syntax. The model builds in PRISM without errors or warnings. I'm now running some CSL properties to see what I can do with it.


// Restorative Feedback (RFB) model 
// based on C-element with MUX initialization
// logic values: 0, 1, X (X == 2)

dtmc

const double epsilon=0.005;
const double alpha;

//------------------
// Clock Source
//------------------
module clkgen

clk : [0..1] init 0;

[] (clk=0) -> 0.25:(clk'=1) + 0.75:(clk'=0);

endmodule

//------------------
// C-elements
//------------------
module C0
 z0 : [0 .. 2] init 2;

// Primary deterministic function:
 [] a0=1 & b0=1 -> (z0'=1);
 [] a0=0 & b0=0 -> (z0'=0);

// Effect of indeterminate inputs:
 [] (b0=2)&(a0!=2) -> 0.5:(z0'=a0) + 0.5:(z0'=z0);
 [] (a0=2)&(b0!=2) -> 0.5:(z0'=b0) + 0.5:(z0'=z0);
 [] (a0=2)&(b0=2) -> (z0'=2);

// State upsets:
 [] ((a0=0&b0=1)|(a0=1&b0=0))&(z0!=2) -> alpha:(z0'=2) + (1-alpha):(z0'=z0);
 [] (z0=2)&(!(a0=0 & b0=0))&(!(a0=1 & b0=1)&a0!=2 & b0!=2) -> 0.5:(z0'=0) + 0.5:(z0'=1);

endmodule

module C1 = C0 [ z0=z1, a0=a1, b0=b1 ] endmodule
module C2 = C0 [ z0=z2, a0=a2, b0=b2 ] endmodule

//------------------
// MUX switches
//------------------
module MUXA0

a0 : [0..2] init 0;

[] (clk=1) -> (a0'=x2);
 [] (clk=0) -> (a0'=z2);

endmodule

module MUXB0 = MUXA0 [ a0=b0, z2=x0, x2=x2 ] endmodule

module MUXA1 = MUXA0 [ a0=a1, z2=z0, x2=x0 ] endmodule
module MUXB1 = MUXA0 [ a0=b1, z2=x1, x2=x0 ] endmodule

module MUXA2 = MUXA0 [ a0=a2, z2=z1, x2=x1 ] endmodule
module MUXB2 = MUXA0 [ a0=b2, z2=x2, x2=x1 ] endmodule

//------------------
// Signal Sources
//------------------
module Source0

x0 : [0..2] init 0;

[] (x0=0) -> epsilon:(x0'=2) + (1-epsilon):(x0'=0);
 [] (x0=2) -> 0.5:(x0'=1) + 0.5:(x0'=0);
 [] (x0=1) -> (x0'=0);

endmodule

module Source1 = Source0 [ x0=x1 ] endmodule
module Source2 = Source0 [ x0=x2 ] endmodule

#PRISM #RFB #models #TMR #nanotech #FaultTolerance #Celement

Attachments:

RFB schematic (PNG)
RFB schematic (TikZ source)
RFB model (PRISM)

meetings models

Chris Myers

Wed, 28 Aug 2019 19:35:20 -0600

Last image not showing up. Maybe permissions?

Winstead

Wed, 28 Aug 2019 21:39:51 -0600

@Chris Myers it could be due to permissions. I opened the image to public access now.

Chris Myers

Thu, 29 Aug 2019 07:22:03 -0600

Cool. I can see it now.