---
title: Verification and Test Talk
date: November 8th, 2023
author:
  - Presented by Harald Schnirzer and Thomas Fromherz
  - Summary by Christian Töpfl, Norbert Tremurici
---

Presentation

Motivation

In 1993 Intel released the Pentium lineup, and in 1994 Professor Thomas Nicely found a bug in its FDIV (floating-point division) unit. This bug cost Intel about 475 million dollars. The incident showed how important verification is: it is used to find bugs like this before release, to ensure that the hardware works as intended, and to reduce the number of remaining defects.

Introduction

In hardware development in general, we want to verify that components adhere to their specification, how they react to various inputs, and whether they are correct with respect to relevant standards. The difference between verification and validation is that verification ensures the design is built right, i.e. that it meets the specification, while validation checks whether it is the right design, i.e. whether the specification captures what is actually wanted. In either case the specification must be interpreted correctly and implemented correctly.

A fault is the cause of an error, an error is an incorrect state which may lead to failure, and a failure is a deviation from specified/desired behaviour.

Types

In general there are a few methods to check whether a system works. They range from simply running the software and checking that it works, up to functional and formal verification, which operate on a model using mathematical abstraction.

Software verification

In software it is only possible to show the presence of failures, never their complete absence. An important thing to keep in mind in software verification is coverage of all aspects, with criteria such as input space partitioning, graph coverage, logic coverage and syntax-based coverage. A big help for verification in larger projects is the commonly used V-model (slide 11).
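
As a minimal sketch of input space partitioning (the function and its partitions are chosen here purely for illustration, not taken from the talk): for an integer absolute-value function, the input space splits into negative, zero and positive partitions, and one representative per partition plus the boundary values is tested.

```python
def int_abs(x: int) -> int:
    # Function under test: integer absolute value.
    return -x if x < 0 else x

# Input space partitioning: integer inputs fall into three partitions
# (negative, zero, positive); test one representative per partition
# plus the boundary values around zero.
cases = {
    "negative partition": (-17, 17),
    "lower boundary": (-1, 1),
    "zero boundary": (0, 0),
    "upper boundary": (1, 1),
    "positive partition": (23, 23),
}
for name, (given, expected) in cases.items():
    assert int_abs(given) == expected, name
print("all partitions covered")
```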

Current trends in software verification are Test Driven Development (TDD), where the test is written first and the implementation afterwards (see the sketch below); automated testing, where tools and scripts are used to create tests for the system; and machine learning approaches, which are a hot research topic right now.
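
A minimal TDD sketch (the `clamp` function and its test are illustrative assumptions): the test is written first and fails ("red") until the simplest implementation that satisfies it is added ("green").

```python
# Step 1 (red): write the test first; it fails because clamp does not exist yet.
def test_clamp():
    assert clamp(5, lo=0, hi=10) == 5    # value inside the range is unchanged
    assert clamp(-3, lo=0, hi=10) == 0   # values below the range snap to lo
    assert clamp(42, lo=0, hi=10) == 10  # values above the range snap to hi

# Step 2 (green): write the simplest implementation that makes the test pass.
def clamp(x, lo, hi):
    return max(lo, min(x, hi))

test_clamp()  # a runner such as pytest would collect this automatically
print("all TDD assertions passed")
```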

Hardware verification

In hardware verification we want the design to adhere to the specification and the safety required by the specification to be met by the design. The goal is to show that the design behaves as specified, the behaviour being the desired outputs for given inputs. A problem we run into here is the state space explosion problem: in larger systems not every input and output combination can be checked, as there are far too many.

Another approach is formal verification, where a mathematical model is used to guarantee that the system works for every possible input/output without explicit testing. This can be used for smaller circuits, but it still does not solve the state explosion problem.

Model checking is an effective technique for hardware with a finite state space. It involves translating system specifications into temporal logic (e.g., CTL, LTL) to systematically explore and verify properties in all states. While state space explosion remains a challenge, smart exploration techniques help mitigate it.
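
As an illustration of such properties (a standard request/grant pattern, not taken from the slides), the requirement "every request is eventually granted" reads in LTL and in CTL as:

$$\mathbf{G}\,(\mathit{req} \rightarrow \mathbf{F}\,\mathit{grant}) \qquad\qquad \mathbf{AG}\,(\mathit{req} \rightarrow \mathbf{AF}\,\mathit{grant})$$

Here G/AG quantify over all time points (and, in CTL, over all execution paths), while F/AF require the grant to occur at some future point.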

Bounded Model Checking (BMC)

Bounded Model Checking (BMC) efficiently examines a limited number of clock cycles (k) using Boolean logic and SAT transformation. It's excellent for quickly detecting bugs in smaller hardware blocks over a limited timeframe, but it doesn't address state space explosion.
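
A minimal BMC sketch in Python using the Z3 solver (the toy counter design, the property, and the bound k are assumptions made for illustration): the transition relation is unrolled k times, and the solver searches for a property violation within the bound.

```python
from z3 import BitVec, Or, Solver, sat

K = 8  # bound: number of clock cycles to unroll

# Toy design: a 3-bit counter that starts at 0 and increments each cycle.
# Property under check: "the counter never reaches 7" (deliberately false,
# so BMC finds a counterexample trace within the bound).
step = [BitVec(f"cnt_{i}", 3) for i in range(K + 1)]

s = Solver()
s.add(step[0] == 0)                    # initial state
for i in range(K):
    s.add(step[i + 1] == step[i] + 1)  # unrolled transition relation

# Ask the solver for a state within the bound that violates the property.
s.add(Or([step[i] == 7 for i in range(K + 1)]))

if s.check() == sat:
    m = s.model()
    print("counterexample trace:", [m[v] for v in step])
else:
    print(f"property holds up to bound k={K}")
```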

Simulation

Simulation offers a practical alternative to exhaustive testing, relying on random testing with tools, testbenches, and high-level HDL concepts. However, manually verifying waveforms for correctness can be labor-intensive for complex systems.
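
A minimal sketch of (constrained) random stimulus generation in Python (the 8-bit ALU interface is an assumption, not from the talk); in practice a testbench would drive these vectors into the design and record the resulting waveforms.

```python
import random

# Constrained-random stimulus for a hypothetical 8-bit ALU testbench.
OPS = ["ADD", "SUB", "AND", "OR"]

def random_stimulus(n: int, seed: int = 42):
    rng = random.Random(seed)        # seeded so runs are reproducible
    for _ in range(n):
        yield {
            "op": rng.choice(OPS),
            "a": rng.randrange(256),  # 8-bit operands
            "b": rng.randrange(256),
        }

for vec in random_stimulus(3):
    print(vec)
```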

Reference Model

Reference models automatically check hardware results for correctness by providing a higher-level implementation of the system specification. The reference model is taken to be correct (a golden model) and can simplify the verification process considerably; it can potentially even be generated from the specification.
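
A minimal sketch of checking a design against a reference model (both implementations are illustrative stand-ins: a bit-twiddling "hardware-style" average versus the obvious specification-level one), driven by random stimuli as in the previous section.

```python
import random

def avg_dut(a: int, b: int) -> int:
    # "Hardware-style" design under test: averages two 8-bit values without
    # a 9-bit intermediate, using the identity (a & b) + ((a ^ b) >> 1).
    return (a & b) + ((a ^ b) >> 1)

def avg_ref(a: int, b: int) -> int:
    # Reference model: straightforward specification-level implementation.
    return (a + b) // 2

rng = random.Random(1)  # seeded so failures are reproducible
for _ in range(10_000):
    a, b = rng.randrange(256), rng.randrange(256)
    assert avg_dut(a, b) == avg_ref(a, b), f"mismatch for a={a}, b={b}"
print("DUT matches reference model on 10000 random stimuli")
```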

RTL to TLM

Comparing waveform output to a reference model is simplified by transforming the waveform to the abstraction level of the reference model (e.g., RTL to Transaction Level Modeling).
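
A minimal sketch of lifting a waveform to transaction level (the signal names and the valid/ready handshake are assumptions for illustration): per-cycle pin samples are collapsed into one transaction per completed handshake, which can then be compared against a TLM reference model directly.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    cycle: int  # cycle in which the handshake completed
    data: int   # payload transferred in that handshake

def waveform_to_transactions(samples):
    # samples: per-cycle dicts of pin values,
    # e.g. {"valid": 1, "ready": 1, "data": 0xA5}.
    # A transaction is emitted whenever valid and ready are both high.
    return [
        Transaction(cycle, pins["data"])
        for cycle, pins in enumerate(samples)
        if pins["valid"] and pins["ready"]
    ]

wave = [
    {"valid": 0, "ready": 1, "data": 0x00},
    {"valid": 1, "ready": 0, "data": 0xA5},  # stalled: ready is low
    {"valid": 1, "ready": 1, "data": 0xA5},  # handshake completes here
    {"valid": 1, "ready": 1, "data": 0x3C},
]
print(waveform_to_transactions(wave))
```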

\pagebreak

Discussion

  • On the slide about hardware testing and chaining flip-flops, does it test programmable logic in general or is it only meant to be used at the manufacturing point?
    • Usually there is a separate input (scan enable)
    • All internal flip-flops are chained together, so test vectors can be shifted in
  • Do you have a sense how many resources are used in hardware chips for testing purposes? (like BIST, ...), would it need one multiplexer for every flip-flop?
    • There is roughly 10% - 20% overhead
  • How does testing interact with optimization? For example, the optimizer might optimize test resources away. How do you make sure your testbench stays synchronized with the changes brought about by optimization?
    • automatic test vector generation works against the actual (post-optimization) hardware, so the test target is well-defined
    • for functional tests the inputs/outputs remain the same, so this test target is also well-defined
  • Why was the FDIV bug not caught, even though scan testing exists with almost 100% coverage?
    • scan test assumes the design is correct and tests whether the hardware works, but the assumptions were wrong
    • scan test is not a golden catch-all solution
  • What's often also forgotten: documentation, rarely is the implementation tested against the documentation.
    • a classic: outdated, wrong, imprecise or ambiguous documentation (that's where formal specification on the other hand shines!)
  • Do you have a sense of how many protocols/specifications offer a reference model (see the example of Spike, the RISC-V ISA simulator)?
    • extra effort that often cannot be expended
  • Why do we keep accepting faults in software, but complain about every hardware fault?
    • our experience of the density of errors is different, we expect software errors, but not hardware errors
    • it is a bit unfair to ask this question, software rests on hardware and deals with a lot of complexities (though there is a neverending struggle to abstract away those complexities)
    • often not a lot of choice: we accept software faults because big companies often don't care, although if they accumulate too much of a bad reputation, they can crumble under the weight
  • Software is much more accessible, beginner friendly and cheaper
    • level of expertise: everyone feels like they can write software, which lowers the bar, but hardware...
  • Is it really more difficult to verify/test software than hardware?
    • hardware rests on a deeply complex foundation (PVT variations, ...), which is a reason for hardware verification being more difficult
    • an argument for the difficulty of software testing: in hardware there are more fundamental blocks that don't change much, whereas software has entire structures that change more frequently
      • the exploitation of the fact that common hardware structures don't change is reflected in the industry hype about (verified) IP blocks
  • Are hardware verification techniques also used in software?
    • model checking came from the hardware side first, used for software now
    • similar story for random testing
  • The Byzantine-tolerant model story: a system was formally proven to be tolerant against Byzantine faults, but when the system was built it failed. Why?
    • only digital voltage levels were assumed, but metastability brought the system down
    • there is always a residual gap between what is formally proven and actual reality