First published in Verification Horizons, July 2022
As a Doulos ‘techie’, I train over 100 engineers in SystemVerilog and UVM each year. I firmly believe that simulation verification is an art, supported by the language. So, regardless of the language, I keep a ready list of useful testbench coding strategies for achieving faster regression CPU-cycle execution. This means more regression tests executed in the same amount of ‘wall-clock’ time!
Often, an engineer wants to write testbench code by looking at online examples. However, these examples are usually written to indicate what you could do, rather than what you should do, in the interests of CPU-cycle usage. Starting at the unit-test level is excellent because you get to wiggle ALL the inputs (appropriately) and thoroughly test the unit design behavior; it also gives you excellent monitors and scoreboards that help, in the larger system, to isolate errors introduced within that design block.
1. Use SVA (SystemVerilog Assertion) properties in the design and the testbench
The fastest route to debug is the detection of an error where the error occurred. That seems an obvious statement (similar to finding your keys in the last place you looked), but many engineers do not realize that SVA can help in this regard. Adding SVA properties to check the waveforms of FSM control inputs or protocol "handshake" signals supports the concept of detecting problems at the source. A big plus? SVA is a formal verification language better suited than the SV (SystemVerilog) procedural language to describing signal shapes and relationships (handshakes), resulting in less code and more efficient simulation execution. SVA can easily be used with VHDL-based designs, by the way.
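As a minimal sketch of the idea (the module name, the req/ack signal names, and the 8-cycle timing bound are illustrative assumptions, not from any particular design), a handshake checker might look like:

```systemverilog
// Hypothetical req/ack handshake checks, bindable to the design or testbench.
// Signal names and the [1:8] cycle window are illustrative only.
module handshake_checks (input logic clk, rst_n, req, ack);

  // Once asserted, req must hold until ack arrives
  property p_req_stable;
    @(posedge clk) disable iff (!rst_n)
      req && !ack |=> req;
  endproperty

  // Every rising req must be acknowledged within 1 to 8 cycles
  property p_ack_timely;
    @(posedge clk) disable iff (!rst_n)
      $rose(req) |-> ##[1:8] ack;
  endproperty

  a_req_stable : assert property (p_req_stable)
    else $error("req dropped before ack");
  a_ack_timely : assert property (p_ack_timely)
    else $error("ack did not arrive within 8 cycles of req");

endmodule

// Attach non-intrusively to the DUT instance (many tools also support
// mixed-language bind onto a VHDL design):
// bind dut_top handshake_checks u_handshake_checks (.clk, .rst_n, .req, .ack);
```

Because the checks fire at the signals themselves, the failure message points straight at the source of the problem.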
2. Assiduously Control File IO
Although SV and UVM reporting allows an engineer to include even the most detailed information throughout the testbench in multiple logfiles, file IO is the most egregious waste of CPU cycles during regression simulations, and there are specific coding strategies to avoid wasting those cycles. When a testbench is run with a new seed, the test will likely pass, so reporting anything beyond two file IO messages (“began testing” and “passed”/“failed”) is a waste of CPU time.
The coding rule/strategy is to set the verbosity of those two file IO (info) statements to UVM_NONE and the verbosity of all other messages to something higher (such as UVM_DEBUG).
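As a sketch of this rule inside a uvm_test (the message ID, the test_passed flag, and the message text are illustrative assumptions):

```systemverilog
// Only these two messages use UVM_NONE, so they always reach the logfile.
task run_phase(uvm_phase phase);
  // Always printed: UVM_NONE verbosity cannot be filtered out
  `uvm_info("TEST", "began testing", UVM_NONE)

  // ... stimulus runs here ...

  // Filtered out at default verbosity (UVM_MEDIUM); only seen when a
  // failing seed is rerun with +UVM_VERBOSITY=UVM_DEBUG on the command line
  `uvm_info("TEST", "detailed trace of sequence progress", UVM_DEBUG)

  // Always printed (test_passed is an assumed pass/fail flag)
  `uvm_info("TEST", test_passed ? "passed" : "failed", UVM_NONE)
endtask
```

On a passing seed, the log contains exactly two lines of testbench output; on a failure rerun, raising the verbosity switch reveals the detail without recompiling.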
Referencing regression runs reminds me to recommend that the team have a system in place for ‘sorting’ the previous regression runs (usually those run the previous night) to find the ten seeds with the highest coverage scores that DID run successfully. Collecting these seeds provides an easy baseline for the next regression session. And, yes, if there are significant changes in the testbench hierarchy or the design, none of those seeds may remain in the top 10 of the current regression session. However, as the design and verification schedule advances, the testbench matures and the design becomes stable. At that point, those 10 high-coverage seeds become the ‘mini’ regression run needed to check small changes in the design, even in the ‘slush’ or ‘freeze’ stage.
Oh, yes, never ‘count’ the coverage score of regression simulation runs that had errors. It makes no sense for a human to wade through the coverage results of a failed seed, working out where the deviation from correct behavior occurred, just to ‘salvage’ the coverage from that run. It is more efficient to fix the testbench code or the design code and get the seed running successfully; no one has that much time in a schedule unless there is no product to ship!
Finally, under the strategy to control testbench file IO, have a rule that no model uses INFO with a verbosity of UVM_NONE (‘0’) – since it is NOT filterable. There should only be two INFO statements using a verbosity of UVM_NONE at the testbench level, as mentioned above, test started and test completed – pass/fail.
Of course, if the regression simulation run is for a seed that failed, turning up the verbosity is what is necessary to help the engineer quickly debug the failure using file IO statements and waveforms, or transaction recording (see next bullet).
Also, for speed of execution, the testbench used for regression simulation runs (either the first run of a seed or expected successful runs of a known seed) is the one with data checking only at the outside edges of the design, using none of the unit/block-level monitors and scoreboards. Only use the fully populated testbench for reruns of failed (seed) simulations, while setting a higher INFO verbosity. Those extra monitors and data checkers contribute file IO of their own, which is exactly what helps isolate and detect the point of error!
3. Control Signal Capture File IO for debugging: transaction recording
The ability to display design bus behavior as ‘boxes’ on a ‘thread of execution’ in the waveform display is a huge data collection time saver. Instead of collecting individual signals with all their transitions, and then having to understand the signal handshake behavior (which can slow both the simulation and the debug time considerably), transaction recording is basically a bunch of ASCII text and boxes.
With very little extra code in the BFM or the monitor, transaction recording (and tracking the transaction ‘box’ id (handle)) can speed debug and assist the test engineer in observing behavior on less familiar interfaces. This can better inform the bug report that is filed, as well!
Transaction recordings can be nested – a sequence can relate to its generated transactions and on the waveform view, the test engineer can read “Ethernet packet” (you can always include the file and line number), and nested, the test engineer can read the ‘action’ – ‘header’, ‘payload’, ‘CRC’ and the related attributes – like address and mode and anything else pertinent to the bus behavior. If the data is delayed or results in a separate execution, keeping the transaction ‘box’ id (handle) allows you to ‘attach’ the data attribute at a later simulation time.
No more ‘reading signals’ to discern the action on a bus and no more collecting gigabytes of signal data for debug purposes.
Even more debugging time can be saved by ‘associating’ the data exiting the design with the data entering the design, via the transactions ‘checked’ by the checker: the code simply needs to ‘relate’ the two transactions, using the transaction (box) id (handle). It makes debug much cleaner, as well.
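As a sketch of how little code this takes in a monitor (the class names, stream name, and parent-handle plumbing are illustrative; the begin_tr arguments follow the UVM 1.2 uvm_component API):

```systemverilog
// Hypothetical monitor recording each bus transfer as a 'box' on a stream.
class bus_monitor extends uvm_monitor;
  `uvm_component_utils(bus_monitor)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  // A nonzero parent_handle nests this transaction's box under the box of
  // the sequence (or transfer) that generated it
  task record_packet(bus_tx tx, int parent_handle = 0);
    int tr_h;
    tr_h = begin_tr(tx, "bus_stream", "", "", 0, parent_handle);
    // ... observe the bus, filling tx attributes as the transfer completes ...
    end_tr(tx);  // closes the box; attributes become readable in the viewer
  endtask
endclass
```

Keeping the returned handle (tr_h) is what allows late-arriving data, or a related transaction on another interface, to be attached to the same box at a later simulation time.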
Having talked about one of the more egregious time-wasters (file IO), let’s start looking at the nitty-gritty contributors to regression CPU-cycle consumption. These next few improvements will be unmeasurable, however, if the file IO is not carefully under control!
4. Engineering the coverage model
Everyone knows that when a team switches to constrained-random stimulus generation, a coverage model is the ONLY way each simulation run is evaluated. There is no longer a test plan that says “write a test that exercises this function/feature of the design”. Now the test plan, written by dissecting the Design Specification, outlines the coverage groups and the ‘cover’ properties, as well as taking advantage of the code coverage (of the design code) tool. Together, these identify whether the functions and features required in the design have been exercised.
Having described a coverage group requirement, one of the surest ways to speed coverage collection, and to minimize the effort of identifying from a coverage report what coverage/test behavior is still needed, is to require every bin to be labeled. The payback is two-fold. One, a bin is up-ticked when a value in its range occurs, so the engineer has to consider corner-case values; and if the engineer simply doesn’t want to label a LOT of bins, some effort must be expended to carefully strategize the valid and useful value groupings, ranges, and corner cases. Two, since each bin must be labeled, which limits the number of values and value ranges the engineer is willing to generate, the number of bins will simply be smaller than the equivalent results from autobinning, especially if any cross-coverage bins need to be established to meet the coverage model. Of course, this means the testbench engineer does NOT use autobinning. The rule here is option.auto_bin_max = 1; in every covergroup! If there is no distinction in the values, one bin should suffice.
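A minimal covergroup following this rule might look like the sketch below (the addr/mode coverpoints, the value ranges, and the enum names are illustrative assumptions):

```systemverilog
// Every bin labeled; auto_bin_max = 1 ensures any coverpoint accidentally
// left unlabeled collapses to one bin instead of 64 auto bins.
covergroup cg_addr_mode @(posedge clk);
  option.auto_bin_max = 1;

  cp_addr : coverpoint addr {
    bins zero       = {0};            // corner case
    bins low_range  = {[1:255]};
    bins high_range = {[256:1022]};
    bins max        = {1023};         // corner case
  }

  cp_mode : coverpoint mode {
    bins read  = {MODE_READ};
    bins write = {MODE_WRITE};
  }

  // Labeled source bins keep this cross small and readable in reports
  cx_addr_mode : cross cp_addr, cp_mode;
endgroup
```

Eight cross bins with meaningful names, rather than a hundred-plus auto bins, is the difference between a coverage report you can act on and one you have to decode.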
Final note, the naming of bins actually speeds the recognition of why there are holes in the coverage – making for shorter review meetings with management.
5. Once the coverage model is engineered, constraints for the stimulus can be constructed
The data stimuli, controlled by constraints, are best organized around what the engineer learned from the design specification while building the coverage model; it makes sense for the constraints to mimic the coverage model’s needs. Of course, an engineering strategy is needed here, especially where the stimulus must include injected errors. Soft constraints are a way of establishing non-error behavior while still allowing added, sometimes contravening, constraints that inject errors.
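As a sketch of the soft-constraint approach (the transaction class and its len/crc_ok fields are illustrative assumptions):

```systemverilog
// Soft constraints describe legal, non-error behavior; an error-injecting
// sequence overrides them with hard constraints, no class changes needed.
class packet_tx extends uvm_sequence_item;
  rand int unsigned len;
  rand bit          crc_ok;
  `uvm_object_utils(packet_tx)

  constraint c_legal {
    soft len inside {[1:1500]};  // normal payload size
    soft crc_ok == 1;            // normal packets carry a good CRC
  }

  function new(string name = "packet_tx");
    super.new(name);
  endfunction
endclass

// In an error-injection sequence, hard in-line constraints quietly win
// over the soft defaults:
//   assert(tx.randomize() with { crc_ok == 0; len > 1500; });
```

The default sequences keep generating legal traffic untouched, while the error sequences state only the deviation they need.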
One of the major strengths of UVM is the factory. With other methodologies, the engineer dealt with testbench upgrades in source-code-control ‘branches’ while simultaneously maintaining the ‘production’ version, the ‘trunk’ of the testbench used by the verification test team and by the design team (to check for issues before design source-code check-in). With factory replacements, however, whether by type or by specific instance pathname, the testbench engineer can leave the ‘production’ testbench untouched and still run some, or all, of the regression tests to make sure the new, more sophisticated testbench models work well. This largely removes the time spent on integration testing when swapping out the old models for the new, fully tested, upgraded testbench models. Because of the factory, the production testbench functions immediately upon integration, providing a usable and stable testbench for regression testing as well as design check-in.
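A sketch of such a factory swap (the bus_driver/new_bus_driver/bus_env class names and the instance path are illustrative assumptions):

```systemverilog
// Upgraded driver, derived from the production one
class new_bus_driver extends bus_driver;
  `uvm_component_utils(new_bus_driver)
  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction
  // ... upgraded behavior goes here ...
endclass

// A test that runs the untouched production env with the new driver
class bus_upgrade_test extends base_test;
  `uvm_component_utils(bus_upgrade_test)
  bus_env env;

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    // Replace every bus_driver in the testbench with the upgrade...
    bus_driver::type_id::set_type_override(new_bus_driver::get_type());
    // ...or just one specific instance:
    // bus_driver::type_id::set_inst_override(new_bus_driver::get_type(),
    //                                        "env.agent1.driver", this);
    env = bus_env::type_id::create("env", this); // builds with override in place
  endfunction
endclass
```

Only the test differs; the production environment source is never branched or edited.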
Don’t forget, when injecting a stimulus error, it is necessary to create an intuitive way to add the error into the data, in a way that the design engineer can understand and exercise. It is also imperative to change the monitor and scoreboard response to an injected error. The monitor will most likely have a ‘covergroup’ for recognizing the error, and for recognizing that the design did the right thing with the error (possibly an uptick in a QoS (Quality of Service) register or setting a ‘poison’ bit in a packet or some overt, expected, behavior). This means eventually recognizing the error response behavior is ‘normal’ behavior for the design.
Despite the ease of upgrading that the factory provides, any types in the testbench that are being overridden by the factory need to be integrated as part of the testbench. This effort needs to be accomplished over and over, during the regression testing schedule, and especially before the testbench is shared with the next project. It is too difficult to inherit a testbench for production that has bits and pieces, often not very well documented, being added in this ad hoc, though effective, manner.
6. Use configuration objects rather than individual config_db entries
Every time the configuration database is accessed, there is a search for the right ‘pair’, often using a uniquifying string such as the topological pathname of the model: the ‘dot-separated’ strings provided in the code generating each instance using the factory’s create function.
Actually, for every agent instance, the uvm_active_passive_enum value is stored individually in the configuration database. To avoid working with the 4k-bit item stored automatically (an OVM artifact), the recommendation is that each model, that is, virtual sequences, sequences, agents, monitors, sequencers, drivers, and scoreboards (checkers), have one configuration object. There are also coordination strategies where the engineer creates just one configuration object for the agent and passes the control information down to the agent’s children in the connect_phase.
Whichever strategy the engineer uses for the agent’s children, the resulting size of the configuration database becomes MUCH smaller, since the database stores only a reference to each configuration object: a word-sized value representing the address of the object.
Although the config_db accessors match types stringently, the types of the individual data members within a configuration object are not an issue; only the type of the configuration object itself matters. Within the configuration object, then, the engineer can include the uvm_active_passive_enum and, where needed, the virtual interface for the agent and/or the agent’s children (monitor, sequencer, and driver), and any other type needed, without concern about the strictness of config_db typing.
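A sketch of one such configuration object and its single set/get pair (the class name, field names, and instance path are illustrative assumptions):

```systemverilog
// One configuration object per agent: one config_db entry instead of many.
class bus_agent_config extends uvm_object;
  `uvm_object_utils(bus_agent_config)

  uvm_active_passive_enum is_active = UVM_ACTIVE;
  virtual bus_if          vif;   // shared by monitor, driver, and sequencer

  function new(string name = "bus_agent_config");
    super.new(name);
  endfunction
endclass

// In the test or env: one set() stores a word-sized object handle
//   bus_agent_config cfg = bus_agent_config::type_id::create("cfg");
//   cfg.vif = ...;  cfg.is_active = UVM_PASSIVE;
//   uvm_config_db #(bus_agent_config)::set(this, "env.agent1", "cfg", cfg);

// In the agent's build_phase: one get() retrieves everything at once
//   if (!uvm_config_db #(bus_agent_config)::get(this, "", "cfg", cfg))
//     `uvm_fatal("CFG", "no bus_agent_config found")
```

One database lookup per agent replaces a string-matched lookup per field, per component.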
7. Use the constructor to avoid RTTI
Many examples return a base-class handle that, by dint of being a base class, must be appropriately cast to the desired derived class. This is referred to as Run-Time Type Identification, or RTTI.
For instance, get_parent() returns a uvm_component handle, which needs to be $cast to the actual derived class type, making it a runtime check. A few RTTI checks amount to very little CPU-cycle expenditure, but if one happens for every transaction, the cost does start adding up.
A better use model is to include a class data member referencing the parent’s derived type, set from the same self-referential ‘this’ reference that the parent provides in the factory’s ‘create’ call. Just set the local reference to the parent in the constructor of the child; this also allows the engineer to use ‘local’ instead of ‘const’ to protect the parent reference, another small execution saving.
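A sketch of this pattern (the bus_agent/bus_driver/bus_tx names are illustrative assumptions):

```systemverilog
// One $cast at construction time replaces a get_parent()/$cast pair on
// every transaction.
class bus_driver extends uvm_driver #(bus_tx);
  `uvm_component_utils(bus_driver)

  local bus_agent m_parent;  // typed reference; 'local' protects it

  function new(string name, uvm_component parent);
    super.new(name, parent);
    if (!$cast(m_parent, parent))   // the single RTTI check, at build time
      `uvm_fatal("NOPARENT", "parent is not a bus_agent")
  endfunction

  task run_phase(uvm_phase phase);
    forever begin
      seq_item_port.get_next_item(req);
      // m_parent.cfg etc. are reachable with no per-transaction $cast
      seq_item_port.item_done();
    end
  endtask
endclass
```

The derived-type handle is fixed once, so every later access is a plain member reference.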
8. Don’t use the UVM Sequence Library
Although it is an interesting idea to have a library of sequences, it really is a trap: if an engineer has a container (a library for sequences) the engineer might feel the need to generate sequences to fill the container.
The number one sequence to generate is the one that simply randomizes a transaction and sends it to the driver. This is specific to non-framed data. Done with the correct kind of constraints and enough disparate seeds, the transaction-to-transaction behavior should generate some very interesting behaviors in the design. At the frame level, the randomization will have to occur within the data of the frame, of course.
However, using a Sequence Library limits your ability to organize the sequence execution easily; it is better to have a single ‘parent’ sequence with a distribution whose members include all sequences, each with an appropriate selection value (using dist). What you want to consider is the number of possible simulation tests in one regression session, including weekend regressions (often a larger group), and what percentage of runs each sequence deserves. For instance, after the simple constrained-random generation of transactions, a design bug might have been found and resolved. Given the opportunity for human error in source-code control (committing an old version of the design), it behooves the verification team to create a sequence that would regenerate the error if the design were to regress. These types of sequences do not need to be run during every regression session; once a month is quite often enough, though I’ve known concerned engineers to run them once a week. More often is a severe waste of regression-testing CPU cycles! The reason to avoid the UVM Sequence Library is that all the built-in policies treat the sequences as having equal runtime value. The engineer could create a user-defined policy, but this is just extra work when a local parent sequence with a weighted distribution will suffice: less code and nothing indirect to understand or explain!
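A sketch of such a weighted parent sequence (the child sequence names, the weights, and the repeat count are illustrative assumptions):

```systemverilog
// A single parent sequence with a dist constraint replaces the sequence
// library's equal-weight selection policies.
class top_seq extends uvm_sequence #(bus_tx);
  `uvm_object_utils(top_seq)

  rand int unsigned pick;
  constraint c_pick {
    pick dist { 0 := 80,   // plain constrained-random traffic: most runs
                1 := 15,   // error-injection traffic
                2 := 5  }; // rare bug-reproduction sequence
  }

  function new(string name = "top_seq");
    super.new(name);
  endfunction

  task body();
    repeat (200) begin
      assert(this.randomize());
      case (pick)
        0 : begin random_seq  s = random_seq::type_id::create("s");
                  s.start(m_sequencer, this); end
        1 : begin err_inj_seq s = err_inj_seq::type_id::create("s");
                  s.start(m_sequencer, this); end
        2 : begin bug123_seq  s = bug123_seq::type_id::create("s");
                  s.start(m_sequencer, this); end
      endcase
    end
  endtask
endclass
```

Adjusting the regression mix is then a one-line weight change rather than a new library policy.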
9. Those last few weeks of schedule / those last few incomplete testbench structural items
In the sequence world, there are a few disruptive capabilities. One is priority, and the other is lock/grab. Of course, neither is of use if the number of sequences being run on a sequencer is 1. This discussion pertains to overriding the current testbench behavior to eke out those last tests that the testbench structure is not yet mature enough to support.
Going back to the requirement of more than one sequence running on the sequencer: it is not well understood that the sequencer’s queue only holds one transaction at a time from each sequence running on it, a result of the ‘start_item’/‘finish_item’ blocking handshake. Therefore, unless the testbench team has a very good reason for using priority between sequences running on the same sequencer, they should save priority for the disruptive needs of those last few weeks. Of course, actual use of priority also requires changing the arbitration policy setting of the sequencer.
Grab and lock capabilities are what I refer to as ‘bandaids’ or ‘plasters.’ Unfortunately, one sometimes inherits a testbench with these bandaids in place, which requires unraveling the disruptive behavior and properly integrating those test capabilities into the testbench, since the prior project didn’t fully complete it.
Grab and lock capabilities of the sequences also favor the ‘parent’ sequence approach.
10. The usuals – naming conventions, bleedingly obvious code, and summary remarks
As was mentioned earlier, detection at source resolves to the most efficient debug.
Naming conventions help equally in creating and maintaining testbench code as the team moves to the next project. This includes agreeing on standard names for modports in interfaces and clocking block names. Having to look up an interface to determine how the ‘monitor’ modport was named is just wasting time. Simply name the agent ‘children’ as monitor, driver, sequencer – and use the full ‘sequencer’ in the typedef and variable name for any virtual sequencer. It may seem trivial when you are first building a testbench, but by the next project having naming conventions clarifies so very much!
Using a trailing ‘_t’ for any typedefs like ‘typedef uvm_sequencer #(APB_transaction) APB_sequencer_t’ then leads to easier code checking of the following:
`uvm_declare_p_sequencer(APB_sequencer_t), in that the user-defined argument must be a ‘type’, not a ‘variable name’! You can even write a rule for this if you use a design rule checker instead of just a linter.
When using UVM and setting the string of the ‘create’ function for the factory, make the string exactly the same as the variable name – then the debug statements will not require you to keep a decoder ring to translate the provided ‘dot-separated’ pathname.
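A sketch of this convention inside a build_phase (the component class names are illustrative assumptions):

```systemverilog
// The create() string matches the variable name exactly, so reported
// pathnames map one-to-one onto the source code.
monitor   = bus_monitor::type_id::create("monitor", this);
sequencer = bus_sequencer_t::type_id::create("sequencer", this);
driver    = bus_driver::type_id::create("driver", this);
// A report from "uvm_test_top.env.agent1.monitor" now needs no decoder ring.
```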
The engineer should always write bleedingly obvious code – the maintenance engineer may be that selfsame engineer in a couple of years when clever details are forgotten.
Make sure you use source code control!
And finally, don’t pass on code reviews but make sure that the engineer has either a reason for any warnings that are waived or design rule checks that haven’t been resolved before the code review. Engineers don’t need to waste time on computer-recognizable coding issues that linting or design rule checkers can elucidate.
Enjoy writing the testbench code! It takes a special talent and significant hardware knowledge, but that’s why efficient verification engineers can demand the big bucks! Eileen.