fancy graphic of a placed chip

Why Do Constrained Random Verification

Michael Green

--

TL;DR:

Over time chips became too complex to profitably design and manufacture.[¹]

By getting computers to do some “guided guessing” (i.e., constrained random generation of tests), chip design engineers were able to get luckier than before at finding critical bugs before a chip went into high-volume manufacturing, thus preventing most chip design projects from experiencing otherwise certain failure.

Longer Form

Over time chips became too complex to profitably design and manufacture.[¹]

The diagram below illustrates this phenomenon:

Graph showing chip design complexity versus design productivity
Chip Design Complexity vs Chip Designer Productivity[¹]

What you should glean from this diagram is that between 1980 and 2010 the number of logic gates per chip grew by 58% per year, while the number of transistors a design engineer could successfully utilize per month grew by only 21% per year.

Clearly there is a gap. It is known in the semiconductor industry as the design/verification gap. As a consequence of this gap, chip design projects exhibit the following[²]:

  • 61% of all chip design projects complete behind schedule.
  • 67% of all chip design projects are not successful with first silicon.
  • The number of design verification engineers on a chip design team is growing by 12.5% CAGR versus 3.7% for design engineers.

But the chip design industry did not let this challenge go unaddressed. Over time we have adopted many design practices and technologies to close this gap.

Generation of test stimulus via constrained random techniques is a key technology adopted by design engineers to address the design/verification gap challenge.

Follow me: I will explain how it solves the problem of exploding complexity in modern chip designs.

Closing the Design/Verification Gap: The Journey from Directed Testing to Constrained Random Verification

Take the diagram below to represent the entire state space of a modern chip design as depicted by the big circle; within that circle exists a design defect (i.e. a bug), represented by the small red circle. For the purposes of this post, I assume that the test space (the set of tests required to cover a given state space) is the same as the state space itself.

Complete Test/State Space of a Chip
Complete Test/State Space of a Chip

It used to be (back in the day) that this defect could be caught via exhaustive testing, as shown below:

How the bug could be found via exhaustive testing
How the bug could be found via exhaustive testing

In the diagram above the gray circle represents the state space of the design completely covered by a collection of tests that exhaustively exercises all functionality as presented by the state space of the design. Because the testing essentially covers everything, we are able to discover the bug represented by the small red circle.

Because of the design/verification gap, this design verification strategy soon became infeasible. Why? When exhaustive testing was feasible, chip designs were composed of on the order of just a few thousand transistors. Over time the transistor count of chip designs grew to 1 billion or more transistors. This resulted in the state space of chip designs growing to hundreds of billions (if not trillions) of testable states. Even if an engineer could write and run one test per state in the design every second, it would take 3,000+ years for that engineer to exhaustively test a design with 100 billion states.
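For what it’s worth, here is a quick back-of-the-envelope check of that figure (a Python sketch, assuming 100 billion states and one test per second):

```python
# Rough time to exhaustively test a design with 100 billion reachable states,
# assuming one test can be written and run per state per second.
states = 100e9                        # 100 billion testable states
seconds_per_year = 60 * 60 * 24 * 365
years = states / seconds_per_year
print(f"{years:,.0f} years")          # roughly 3,171 years
```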

As an illustration, the list below shows a few generations of NVIDIA GPUs and their approximate transistor counts [³]:

  • GeForce 8800 GTX (2006): ~0.7 billion transistors
  • GeForce GTX 480 (2010): ~3.0 billion transistors
  • GeForce GTX 1080 (2016): ~7.2 billion transistors
  • Tesla V100 (2017): ~21 billion transistors

As chips grew in size, some chip design teams then opted for directed testing as shown below:

How the bug could (or could not) be found using a directed testing strategy

With directed testing, an engineer would read the design specification and other relevant material to determine the set of critical features that required testing to ensure a successful chip design project. Often these plans would be reviewed with the team to sanity-check the engineer’s prioritization. Once the plan (otherwise known as a test plan) was approved, the engineer would then write tests. Each test would perform a dedicated and fixed verification of critical features laid out in the test plan. In the diagram above the small grey circles represent the portion of the entire chip’s state space/functionality exercised by each test, one test per circle.

As shown in the same diagram above, a bug (the red circle) would often go undetected by this method of verification. Sometimes these bugs resulted in product failure (e.g., the Pentium FDIV bug); other times these bugs were considered to be “errata” (i.e., documented defects not to be fixed until a later generation of the chip, or never to be fixed at all). Here’s a link to errata for a recent Intel processor: Section 4 of the linked PDF

The lesson learned from the directed testing era was this: the bugs exist where you least expect them. So we needed a technology that would enable testing to cover not only the cases we knew we cared about, but also increase the chances of exercising the relevant cases that we didn’t know we should care about.

As a consequence of this new directive in design verification, the next step in the journey to close the design/verification gap involved the adoption of random testing. The diagram below illustrates how this worked:

Purely Random Testing (a.k.a “spray and pray”)

The grey circles that represented directed tests in the previous diagram have been replaced with a grey cloud that represents a randomly generated test in the diagram immediately above.

What do I mean by randomly generated? Directed tests are written to run a fixed set of stimulus patterns representing a single scenario or an explicitly enumerated set of desired scenarios. When you run a directed test, you always get the same stimulus, no matter how many times you run that test. On the other hand, randomly generated tests generate a different stimulus pattern each time the test is run. Each generated stimulus pattern represents one scenario out of the set of possible scenarios permitted by the physical characteristics of the design.
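To make the distinction concrete, here is a minimal Python sketch (illustrative only; real testbenches use hardware verification languages, and the bus widths below are assumptions made up for the example). The directed test replays the same fixed stimulus on every run, while the random test draws fresh stimulus on every run, bounded only by what the address and data buses can physically carry.

```python
import random

ADDR_BITS = 32   # assumed address bus width (illustrative)
DATA_BITS = 64   # assumed data bus width (illustrative)

def directed_test():
    # A fixed, explicitly enumerated scenario: identical stimulus every run.
    return [(0x0000_1000, 0xDEAD_BEEF), (0x0000_1004, 0xCAFE_F00D)]

def random_test(num_txns=5, seed=None):
    # Different stimulus every run: any value the buses can physically carry.
    rng = random.Random(seed)
    return [(rng.getrandbits(ADDR_BITS), rng.getrandbits(DATA_BITS))
            for _ in range(num_txns)]

print(directed_test())   # same output no matter how many times you run it
print(random_test())     # different output on every run (unless you fix the seed)
```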

But as you can see, there are two problems with randomly generated stimulus:

  1. When you randomize stimulus based only on the set of possibilities limited by the physical characteristics of the design (e.g., based solely on the width of the data and address buses), you will generate both valid stimulus (stimulus inside the big circle) and invalid stimulus (stimulus outside of the big circle).
  2. Given the size of the state space, you may spend too much time generating stimulus that is possible, and even valid within the bounds of the state space of the design, but that does not ultimately lead to the important scenarios that would expose a latent design bug.

One possible and insidiously painful outcome is that you waste both engineering time and compute resources simulating and then debugging tests that yield false failures. By that I mean tests that end in errors that are not indicative of a design issue, but rather of an invalid test.
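Here is a hypothetical illustration of that waste in Python (the address map below is made up for the example): if the only constraint is the physical width of the address bus, almost none of the generated stimulus lands in the regions the design actually implements.

```python
import random

ADDR_BITS = 32
# Hypothetical address map: only two small regions are actually implemented.
VALID_REGIONS = [(0x0000_0000, 0x0000_FFFF),   # 64 KB of SRAM
                 (0x4000_0000, 0x4000_0FFF)]   # 4 KB of control registers

def is_valid(addr):
    return any(lo <= addr <= hi for lo, hi in VALID_REGIONS)

rng = random.Random(0)
trials = 1_000_000
hits = sum(is_valid(rng.getrandbits(ADDR_BITS)) for _ in range(trials))
print(f"valid stimulus: {hits} of {trials} ({100 * hits / trials:.4f}%)")
# Nearly every generated address falls outside the implemented regions, so most
# simulation (and debug) effort goes into invalid stimulus and false failures.
```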

In light of these problems, engineers sought to combine the strengths of directed stimulus with those of purely random stimulus.

This led to the advent of constrained random stimulus[⁴].

Constrained random stimulus enables engineers to achieve two goals in a single test:

  1. Prioritize test scenarios based on the tester’s understanding of the design and her assessment of where the bugs may most likely reside within the design.
  2. Leverage automation to generate tests that the tester may not have presumed to be important, yet that unexpectedly uncover critical bugs in the design.

This is what constrained random stimulus looks like in relationship to the state space of the design:

constrained random testing
Constrained Random Testing (“Are we there yet? Not quite.”)

As you can see, each test no longer covers just a very well defined and fixed subset of the design’s state space. Also, because the tests are constrained based both on the physical constraints of the design and on the engineer’s understanding of the functionality of the design, no test ever goes outside the bounds of the state space of the design.
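In practice, constrained random stimulus is usually expressed in a hardware verification language such as SystemVerilog; the Python sketch below only illustrates the idea, and the regions, weights, and alignment rule are assumptions made up for the example. The constraints keep every transaction inside the implemented address map (the big circle), the weights encode the engineer’s prioritization, and the remaining randomness still explores scenarios the engineer never explicitly wrote down.

```python
import random

# Hypothetical constraints: stimulus must stay inside implemented regions, and
# the region the engineer suspects is riskier gets a higher weight.
REGIONS = [
    # (name, low, high, weight)
    ("sram",      0x0000_0000, 0x0000_FFFF, 1),
    ("registers", 0x4000_0000, 0x4000_0FFF, 4),   # suspected bug-rich area
]

def constrained_random_txn(rng):
    # Goal 1: weighting steers generation toward scenarios the engineer cares about.
    name, lo, hi, _ = rng.choices(REGIONS, weights=[r[3] for r in REGIONS])[0]
    # Goal 2: within the chosen region, address and data stay random, so cases
    # the engineer never thought to enumerate still get exercised.
    addr = rng.randrange(lo, hi + 1) & ~0x3   # keep addresses word-aligned
    data = rng.getrandbits(32)
    return name, hex(addr), hex(data)

rng = random.Random()                 # a new seed each run -> a new permutation
for _ in range(5):
    print(constrained_random_txn(rng))
```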

However, a problem yet remains: we still have tests that do not hit the bug that exists within the design. (The little red circle isn’t touching any of the grey clouds.)

In this situation, the power of constrained random testing can be wielded: if the constraints that define the possible permutations of each test will eventually produce at least one test that covers the part of the state space where the bug exists, then the engineer can hit that location in the state space merely by running more random permutations of that test.

The diagram below shows an example where we just run more permutations of the last test and finally uncover the bug:

“Directed Random Testing” == “Constrained Random Testing” == “Lucked Out”

The point I want to make is that we found the bug for free. In other words, we found the bug not by making the engineer work harder dreaming up new test scenarios; we essentially “lucked out” by making a computer run more permutations of the random test.
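Here is a toy Python version of that “run more permutations” step (the buggy address and the constraints are invented for the example): the same constrained test is rerun with a new seed each time, and sooner or later one permutation lands on the state that exposes the bug, with no additional test-writing effort from the engineer.

```python
import random

# Hypothetical buggy corner: the design misbehaves when a word-aligned write
# lands on the very last register address (invented for this example).
BUG_ADDR = 0x4000_0FFC

def one_permutation(seed, num_txns=100):
    """One permutation (seed) of the same constrained random test."""
    rng = random.Random(seed)
    for _ in range(num_txns):
        # Same constraints every run (register region, word-aligned);
        # only the random choices differ from seed to seed.
        addr = rng.randrange(0x4000_0000, 0x4000_1000) & ~0x3
        if addr == BUG_ADDR:
            return True               # this permutation reached the buggy state
    return False

for seed in range(1, 10_000):
    if one_permutation(seed):
        print(f"bug exposed by seed {seed}: no new test writing required")
        break
```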

Another possible situation, with respect to the transition from the diagram two figures back (“Are we there yet?”) to the diagram above (“Lucked Out”), is that a test is over-constrained (i.e., the constraints that determine the possible permutations of a given test were defined too strictly and left out possibilities supported in the design). This problem is easier to spot and correct with constrained random stimulus than with purely random stimulus, because adjusting already valid constraints to incorporate newly realized valid possibilities is more likely to yield novel valid permutations than adjusting constraints that sometimes yield valid test stimulus and sometimes yield invalid test stimulus. That’s my experience; I’m still looking for some literature on that.
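As a hypothetical illustration of over-constraining (reusing the made-up address map from the earlier sketches): the first constraint below was written as if only the SRAM region existed, so the register region, where a bug might hide, can never be reached. Loosening an already valid constraint, rather than starting over from unconstrained randomness, adds back the missing valid possibilities without ever generating invalid stimulus.

```python
import random

rng = random.Random(0)

def over_constrained_addr():
    # Over-constrained: the engineer forgot the register region exists,
    # so that part of the valid state space can never be exercised.
    return rng.randrange(0x0000_0000, 0x0001_0000) & ~0x3

def corrected_addr():
    # Corrected: the constraint is loosened to also cover the register region.
    # Every generated address is still valid; we only added back possibilities
    # the design actually supports.
    lo, hi = rng.choice([(0x0000_0000, 0x0001_0000),   # SRAM
                         (0x4000_0000, 0x4000_1000)])  # control registers
    return rng.randrange(lo, hi) & ~0x3

print(hex(over_constrained_addr()), hex(corrected_addr()))
```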

Nevertheless, constrained random stimulus leaves a few problems unsolved and has a limitation:

The limitation:

Constrained random stimulus still suffers from the Coupon Collector’s Problem: as the number of distinct scenarios you need to hit grows, the number of random tests required to stumble across all of them grows even faster.
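A rough sense of why this bites (a sketch, assuming every coverage target is equally likely to be hit by a random test, which real designs are not): the expected number of uniformly random tests needed to hit all N targets is N times the N-th harmonic number, on the order of N times the natural log of N, so the last few targets cost far more than the first ones.

```python
def expected_draws(n):
    """Coupon collector: expected random tests to hit all n equally likely targets."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return n * harmonic

for n in (10, 1_000, 1_000_000):
    print(f"{n:>9,} targets -> ~{expected_draws(n):,.0f} random tests "
          f"(vs {n:,} if every test could be aimed)")
```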

The problems:

  • How do I know if certain stimulus patterns that I think are important were ever generated?
  • How do I know if the chip design’s response to any given randomly generated test stimulus is correct?

I’ll discuss the two problems listed above next. I’ll leave the limitation for sometime later.

I learned the concepts I wrote about above from experience working on projects like the i960 family of processors and the Intel 80200 and other XScale-based products; however, I want to give credit to The Art of Verification with Vera which I think does a very good job describing constrained random verification.

[¹]: Rowen, Chris. Engineering the Complex SOC: Fast, Flexible Design with Configurable Processors. Upper Saddle River, NJ: Prentice Hall Professional Technical Reference, 2004. Figure 8–2.

[²]: Wilson Research Group and Mentor Graphics, 2014 Functional Verification Survey

[³]: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units

[⁴]: N. Kitchen and A. Kuehlmann, “Stimulus generation for constrained random simulation,” 2007 IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, 2007, pp. 258–265.

--


Michael Green

Data Scientist and Computer Architect. My views are my own.