!!
FLUENT Verification ProjectYesterday we had our USU group meeting between myself, Zhen and Thakur. I explained the
"Restorative Feedback" (RFB) concept and they helped me correct some issues in my PRISM model of the circuit. Here is a summary of that information for the larger group's benefit.
RFB is a method of
triple-modular redundancy (TMR) that was inspired by the genetic "quorum trigger" circuit we developed under our previous grant. It is applicable to a "fully triplicated" TMR system described below. Where a TMR system uses majority-vote gates to correct or suppress errors, the RFB system uses Muller C-elements. Some motivations for using RFB over TMR are:
- RFB can correct single "hard faults" (i.e. long-lasting or permanent errors) in addition to double "soft faults" (momentary glitches). TMR can only correct a single fault at any time.
- RFB is well defined for multi-level logics, and a ternary-CMOS implementation was demonstrated. TMR does not have well-defined behavior for non-binary signals.
- RFB can mask or self-correct voter faults.
Now some detailed definitions:
TMR System
A TMR system has three copies of all components and signals. Within the system, majority voters are periodically inserted to correct errors. In a fully-triplicated system, there are always three voters so that voter errors have a change to be corrected "downstream" from the faulty voter. At any point in the system, a
bundle error occurs if there are two or three signal errors within the same triplet (I prefer "information error" instead of "bundle error" but no one else says this).
A majority voter is able to correct any type of single error within a bundle. Suppose we have three signals [tex]$x_0, x_1, x_2$[/tex], representing copies of an information signal [tex]$x$[/tex]. If there are no signal errors, then [tex]$x_i=x$[/tex] for all [tex]$i$[/tex]. Suppose an error occurs only on [tex]$x_0$[/tex], due to faulty logic, interference, random noise, cosmic rays or another error source. Then [tex]$\text{Maj}(x_0,x_1,x_2)=x$[/tex].
If there are errors on both [tex]$x_0$[/tex] and [tex]$x_1$[/tex], then the voter cannot correct it. Now consider a system with a mixture of hard and soft faults: [tex]$x_0$[/tex] is a permanent fault, and a momentary fault may appear on [tex]$x_1$[/tex] or [tex]$x_2$[/tex]. In this situation, the voter is
transparent to errors in [tex]$x_1$[/tex] and [tex]$x_2$[/tex]: it just propagates the errors into the rest of the system. In a fully-triplicated system, this results in a bundle error.
RFB System
The RFB concept is based on the C-element's ability to block momentary upsets:
C-element: output q and inputs a, b.
Behavior: if (a==b) then q<=a
The C-element acts as a latch, so that it's state is unchanged when a!=b. Suppose the C-element's inputs are [tex]$x_0$[/tex] and [tex]$x_1$[/tex], as defined in the TMR discussion. Then a single momentary fault is masked by the C-element, since it's output will only change when [tex]$x_0==x_1$[/tex].
The RFB circuit is comprised of three C-elements, as shown in the attached figure. For each C-element, the inputs are defined by multiplexers that select between
setup ([tex]$\phi=0$[/tex]) and
hold ([tex]$\phi=1$[/tex]) states.
- Setup: C-element [tex]$C_i$[/tex] is set to [tex]$z_i<=x_{(i-1)}$[/tex], indicated by the red wires in the schematic.
- Hold: C-element [tex]$C_i$[/tex] has inputs [tex]$x_i$[/tex] and [tex]$z_{(i-1)}$[/tex].
In the hold phase, since the initial values at C-element [tex]$C_i$[/tex] are [tex]$z_{(i-1)}==x_{(i+1)}$[/tex] and [tex]$z_{i}==x_{(i-1)}$[/tex], and the remaining input is [tex]$x_i$[/tex], the C-element settles to a state determined by all three of the input signals.
It can be shown that each C-element's hold state is equal to [tex]$\text{Maj}(x_0,x_1,x_2)}$[/tex],
so RFB is initialized to the TMR behavior.
The benefit of RFB appears in the hold state:
- C-element faults (state upsets) are physically suppressed by the feedback signals. This part emulates the quorum trigger.
- Input momentary faults are masked by the C-elements.
- Correct input signals suppress C-element state faults, since they maintain active drive within the C-element circuit.
- Incorrect input signals leave the C-element state floating, which makes a fault more likely, however the fault is not directly propagated.
PRISM Model
I represented the RFB circuit as a DTMC model in PRISM. Logic signal values are 0, 1 and X (encoded as 2). There are four modules:
- Clock Generator: I used a stochastic clock generator that starts in state 0 and switches to state 1 at a random time.
- Signal Source: initialized to value 0, flips to X with probability epsilon. From X, it can flip to 1 or 0 with equal probability. From 1, it flips back to 0.
- MIUX: assumed to be error-free.
- C-element: behaves like an ideal C-element when inputs are 00 or 11. Can flip to X when inputs are mixed. From X, can flip to 1 or 0 with equal probability.
I triplicated components 2--4 using the variable-renaming syntax. The model builds in PRISM without errors or warnings. I'm now running some CSL properties to see what I can do with it.
// Restorative Feedback (RFB) model
// based on C-element with MUX initialization
// logic values: 0, 1, X (X == 2)
dtmc
const double epsilon=0.005;
const double alpha;
//------------------
// Clock Source
//------------------
module clkgen
clk : [0..1] init 0;
[] (clk=0) -> 0.25:(clk'=1) + 0.75:(clk'=0);
endmodule
//------------------
// C-elements
//------------------
module C0
z0 : [0 .. 2] init 2;
// Primary deterministic function:
[] a0=1 & b0=1 -> (z0'=1);
[] a0=0 & b0=0 -> (z0'=0);
// Effect of indeterminate inputs:
[] (b0=2)&(a0!=2) -> 0.5:(z0'=a0) + 0.5:(z0'=z0);
[] (a0=2)&(b0!=2) -> 0.5:(z0'=b0) + 0.5:(z0'=z0);
[] (a0=2)&(b0=2) -> (z0'=2);
// State upsets:
[] ((a0=0&b0=1)|(a0=1&b0=0))&(z0!=2) -> alpha:(z0'=2) + (1-alpha):(z0'=z0);
[] (z0=2)&(!(a0=0 & b0=0))&(!(a0=1 & b0=1)&a0!=2 & b0!=2) -> 0.5:(z0'=0) + 0.5:(z0'=1);
endmodule
module C1 = C0 [ z0=z1, a0=a1, b0=b1 ] endmodule
module C2 = C0 [ z0=z2, a0=a2, b0=b2 ] endmodule
//------------------
// MUX switches
//------------------
module MUXA0
a0 : [0..2] init 0;
[] (clk=1) -> (a0'=x2);
[] (clk=0) -> (a0'=z2);
endmodule
module MUXB0 = MUXA0 [ a0=b0, z2=x0, x2=x2 ] endmodule
module MUXA1 = MUXA0 [ a0=a1, z2=z0, x2=x0 ] endmodule
module MUXB1 = MUXA0 [ a0=b1, z2=x1, x2=x0 ] endmodule
module MUXA2 = MUXA0 [ a0=a2, z2=z1, x2=x1 ] endmodule
module MUXB2 = MUXA0 [ a0=b2, z2=x2, x2=x1 ] endmodule
//------------------
// Signal Sources
//------------------
module Source0
x0 : [0..2] init 0;
[] (x0=0) -> epsilon:(x0'=2) + (1-epsilon):(x0'=0);
[] (x0=2) -> 0.5:(x0'=1) + 0.5:(x0'=0);
[] (x0=1) -> (x0'=0);
endmodule
module Source1 = Source0 [ x0=x1 ] endmodule
module Source2 = Source0 [ x0=x2 ] endmodule
#
PRISM #
RFB #
models #
TMR #
nanotech #
FaultTolerance #
CelementAttachments:
RFB schematic (PNG)
RFB schematic (TikZ source)
RFB model (PRISM)