Friday 28 July 2017



SDF File Patching Using Perl

Text files sometimes have to be modified in ways not supported by EDA tools. Standard Delay Format (SDF) is one of the many types of text files used by EDA engineers. Manual editing with a text editor is a good one-off solution when there aren't too many changes to make. However, when the same modifications or patches must be applied repeatedly, a little help from Perl can save many hours per project.

About SDF Files

Here we present a Perl script showing some simple but powerful time-saving tricks to patch an SDF file. It was inspired by a delegate attending the Doulos Essential Perl course. He had a script to patch an SDF file, zeroing delays in specific parts of his gate-level simulation. His script worked but took 30 minutes to process a typical 70MB SDF file. After the course his script was re-written to do the same job in just 5 seconds!

In this example we're going to copy the SDF file, zeroing the IOPATH timing data fields whenever the INSTANCE name matches one listed in a separate file. Many patching operations are possible but they're all variations on this theme. Here's the script. We'll break it down and explain each chunk further down the page:

#!/usr/local/bin/perl

#                                   patch_sdf.pl
#                                   Version 1.0
#                                   8 Jan 2003, SD
#                                   www.doulos.com
#
# Copyright (c) 2003, Doulos Limited
# Training and consultancy for hardware engineers.
#
# Perl script for patching Standard Delay Format files.

unless( @ARGV == 3 ) {
  die <<EOF;

  This Perl script patches Standard Delay Format (SDF) files. It
  accepts three filenames as command line arguments:

    1. SDF file for reading and patching
    2. File containing a list of INSTANCEs for patching
    3. SDF file for writing patched file

  Call the script like this:

    perl patch_sdf.pl in.sdf celltypes.txt out.sdf

EOF
}

# Read three command line argument filenames
( $fn_in, $fn_instances, $fn_out ) = @ARGV;

open( IN,        $fn_in        ) or die "Cannot open $fn_in\n";
open( INSTANCES, $fn_instances ) or die "Cannot open $fn_instances\n";
open( OUT,       ">$fn_out"    ) or die "Cannot create $fn_out\n";

# Undefine $INPUT_RECORD_SEPARATOR to slurp the whole file
# Store INSTANCE names in %instances for fast lookups
# Every string of non-white space is an INSTANCE name

undef $/;
foreach( <INSTANCES> =~ /\S+/g ) { $instances{$_} = 1 }

# Here's a neat trick to save parsing every line
# Set $/, the $INPUT_RECORD_SEPARATOR, to '(INSTANCE'
# Now the record input operator loads one CELL entry each time.
# The first field is the INSTANCE name

$/ = '(INSTANCE';
print OUT scalar( <IN> ); # First record only - force scalar context

while( <IN> ) {
  # Match the instance name and check if it's in %instances
  if( /([^\s)]+)/o and $instances{$1} ) {
    print "Found INSTANCE $1\n";
    delete( $instances{$1} ); # So instances not found can be reported

    # Match this:
    #    IOPATH SET O (6500:6500:6500)(6500:6500:6500)
    #    -----$1------
    # End up with this:
    #    IOPATH SET O (0:0:0)(0:0:0)
    s/
      (
        IOPATH[^(]+    # Find "IOPATH" and everything before "("
      )                # Back reference in $1

      \([^)]+\)        # Match first timing data field eg. (4:5:6)
      \s*              # Possible white space
      \([^)]+\)        # Match second timing data field

     /$1(0:0:0)(0:0:0)/gox; # Substitute globally

#     Could have written this but it's not so easy to read
#     s/(IOPATH[^(]+)\([^)]+\)\s*\([^)]+\)/$1(0:0:0)(0:0:0)/go;
  }
  print OUT;
}

# Report instances that were not found
# These will be the labels remaining in the %instances hash
foreach( keys %instances ) {
  print "Not found: INSTANCE $_\n";
}
print "$fn_out generated\n";

How It Works

Download The Demo

You may copy/modify and use this script with your own SDF files. Download the demo then unzip it:

gzip -d doulos_demo_sdf.pl

This file, doulos_demo_sdf_pa.pl, is a self-extracting Perl archive. Execute it to create a directory called doulos_demo_sdf/ containing all the files needed to experiment with patch_sdf.pl. The example works on both Unix and Windows.

perl doulos_demo_sdf_pa.pl

Checking Command Line Arguments

This script must be called with three arguments: the name of an SDF file to read, the name of a text file containing a list of instance names to patch and the name of a file to write the patched SDF file to. In a list context the @ARGV array returns all command line arguments. But, in a scalar context, provided here by the == operator, an array returns its number of elements. Unless there are exactly three we want our script to die and print an explanation.

unless( @ARGV == 3 ) {
  die <<EOF;

  This Perl script patches Standard Delay Format (SDF) files. It
  accepts three filenames as command line arguments:

    1. SDF file for reading and patching
    2. File containing a list of INSTANCEs for patching
    3. SDF file for writing patched file

  Call the script like this:

    perl patch_sdf.pl in.sdf celltypes.txt out.sdf

EOF
}

Here-document syntax like this, <<EOF; . . . . EOF, tells Perl to interpret the enclosed lines as an interpolated string. That is, a string that would otherwise have been enclosed in double quotes, allowing variable interpolation (though none is used here). It's a neat syntax for clearly presenting a chunk of text inside a Perl script. It also self-documents the script.
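To make the two ideas above concrete, here is a minimal sketch showing an array evaluated in scalar context and a here-document that does interpolate a variable. The array @args and the names $ok, $count and $usage are hypothetical stand-ins and are not part of patch_sdf.pl.

```perl
use strict;
use warnings;

# Hypothetical argument list standing in for @ARGV
my @args = ( 'in.sdf', 'instances.txt', 'out.sdf' );

# The == operator supplies scalar context, so @args yields its length
my $ok = ( @args == 3 );

# scalar() makes the same conversion explicit
my $count = scalar @args;

# An unquoted terminator (<<EOF) behaves like a double-quoted string,
# so $count is interpolated; <<'EOF' would leave it as literal text
my $usage = <<EOF;
  Expected 3 filenames, got $count
EOF

print $usage;
```

Quoting the terminator as <<'EOF' is the safer choice when the text contains $ or @ characters that should be printed literally.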

Open Three Files

There are three files, so we'll open them. Being engineers and knowing this script is a point solution never extending beyond 30 lines of code we'll dispense with unnecessary formalities like declaring and scoping variables. However, meaningful variable names will save time when reusing and extending scripts later. Don't be afraid to type longer names. (If you can't touch type - learn quickly!)

Capitalised identifiers are file handles by convention in Perl so the name only has to tell us which file handle it is. IN, INSTANCES and OUT are succinct and self-documenting.

( $fn_in, $fn_instances, $fn_out ) = @ARGV;

open( IN,        $fn_in        ) or die "Cannot open $fn_in\n";
open( INSTANCES, $fn_instances ) or die "Cannot open $fn_instances\n";
open( OUT,       ">$fn_out"    ) or die "Cannot create $fn_out\n";

In Perl, open or die is a great self-documenting construction. It pinpoints problems very quickly if you add a meaningful message. Consider adding $!, the $OS_ERROR special variable, to the error string. It will contain the file system error message.

To save time, complexity and typing we deliberately chose not to close these file handles explicitly. Perl tidies up for us once it runs off the end of the script, closing all open file handles. Of course there are occasions when closing file handles explicitly part way through a script is necessary.
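As a sketch of the $! suggestion, the fragment below builds an error string that includes the file system message. It is a variation rather than the script's own code: it assumes the three-argument form of open with a lexical file handle, and the helper try_open and the file name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or return the error text
sub try_open {
  my( $fn ) = @_;
  open( my $fh, '<', $fn )
    or return ( undef, "Cannot open $fn: $!\n" );
  return ( $fh, '' );
}

# This path is assumed not to exist, so the open fails
my( $fh, $err ) = try_open( 'no_such_dir/no_such_file.sdf' );
print $err unless $fh;
```

In the original script the same effect comes from writing die "Cannot open $fn_in: $!\n" after the or.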

Reading The Instance List

The record input operator, <FILE_HANDLE>, reads everything up to and including the first occurrence of the string stored in $/, the special variable $INPUT_RECORD_SEPARATOR. Undefining $/ causes the record input operator to read the whole file in one go because undef doesn't occur before the end of file.

$/ always has the initial value "\n". Changing $/ may cause trouble if another part of the Perl script is going to assume it still has its initial value. Assigning "\n" back to $/ afterwards might save frustration in longer scripts.
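One way to avoid restoring $/ by hand is to change it with local inside a block or subroutine, so the old value returns automatically on exit. The sketch below assumes a hypothetical helper called slurp and reads from an in-memory file so that it is self-contained; the script itself uses the simpler global undef.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything left on a file handle as one string
sub slurp {
  my( $fh ) = @_;
  local $/;            # undefined only until this sub returns
  my $text = <$fh>;    # scalar assignment, so a single read
  return $text;
}

# In-memory file standing in for the instance list
my $data = "top.u1\ntop.u2\n";
open( my $fh, '<', \$data ) or die "Cannot open in-memory file: $!\n";

my $whole = slurp( $fh );
print "Separator restored\n" if $/ eq "\n";
```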

undef $/;
foreach( <INSTANCES> =~ /\S+/g ) { $instances{$_} = 1 }

The instances.txt file contains a list of names separated by white space. Keys for each instance name are added to the %instances hash. True values, 1s, are associated with each key.

Perl interprets this last line by first reading a record from INSTANCES, the whole file into one string. Next this string is bound to the matching operator, /..../, using the binding operator =~. The regular expression \S+ greedily matches the first substring of non-white space characters.

The whole matching operation is evaluated in a list context provided by foreach and with the global modifier g so returns a list of every instance name. No loop variable is specified so Perl assumes the default $_.

Inside the loop $_ is used to add a key/value pair to the %instances hash.
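The list-context behaviour described above can be tried in isolation. This sketch uses a hypothetical string in place of the slurped file and, as an alternative to the foreach loop, fills the hash with a hash slice. The names $text and @names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical instance list; names are separated by any white space
my $text = "top.u1  top.u2\n\ttop.u3\n";

# In list context a /g match with no capture groups returns every match
my @names = $text =~ /\S+/g;

# Hash slice: create one key per name, each with the value 1
my %instances;
@instances{ @names } = (1) x @names;

print join( ',', sort keys %instances ), "\n";
```

The foreach form in the script avoids the intermediate array, which matters little for a short list of instance names.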

Preparing To Read, Patch And Write Each Record

Speed is very important when patching files that could be tens of MB long. For best performance we need to select a pragmatic approach to parsing the SDF file. We could slurp the whole file into a scalar variable but this could overstretch our computer's resources. We could process the file one line at a time but each record spans several lines and would entail extra work keeping track of where we are. Ideally we need to read one complete record at a time. There's a really neat trick to do this with Perl.

Setting $/ to the literal string '(INSTANCE' makes this patching process very efficient. Each use of the record input operator reads in exactly one record. It does split the records part way down, so each string read starts just after one '(INSTANCE', with the white space and instance name of one record, and ends with the '(INSTANCE' of the following record.

But who cares? Excepting the very first read, each read contains the right amount of information to process, i.e. an instance name and the IOPATH lines that may need patching.

$/ = '(INSTANCE';
print OUT scalar( <IN> ); # First record only - force scalar context

The first read gets everything up to the first instance name. We'll print this straight out. The record input operator would read every record in one go inside print's list context, so the scalar function is used to force only one read.

Subsequent record input operations read a string ending in '(INSTANCE' like this one:

 Data_0_INBLOCK_IBUF)
      (DELAY
        (ABSOLUTE
          (IOPATH I O (3000:3000:3000)(3000:3000:3000))
        )
      )
  )
  (CELL (CELLTYPE "X_BUF")
    (INSTANCE

For this record the instance name is Data_0_INBLOCK_IBUF which is one of those listed in the demonstration so all six 3000s need changing to 0s.
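The record splitting can be seen on a small scale with the sketch below. It assumes a hypothetical two-cell SDF fragment held in memory, with made-up instance names u1 and u2, and simply collects the instance name found at the start of each record.

```perl
use strict;
use warnings;

# Hypothetical two-cell SDF fragment held in memory
my $sdf = <<'EOF';
(DELAYFILE
  (CELL (CELLTYPE "X_BUF")
    (INSTANCE u1)
    (DELAY (ABSOLUTE (IOPATH I O (3:3:3)(3:3:3))))
  )
  (CELL (CELLTYPE "X_BUF")
    (INSTANCE u2)
    (DELAY (ABSOLUTE (IOPATH I O (4:4:4)(4:4:4))))
  )
)
EOF

open( my $in, '<', \$sdf ) or die "Cannot open in-memory file: $!\n";

my( $header, @names );
{
  local $/ = '(INSTANCE';
  $header = <$in>;                   # scalar assignment: one record only
  while( my $record = <$in> ) {
    # Each later record starts with white space and the instance name
    push @names, $1 if $record =~ /([^\s)]+)/;
  }
}
print "Instances seen: @names\n";
```

The header read ends with the first '(INSTANCE', and the final record runs to the end of the file because no further separator is found.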

Record Patching

Here's the SDF record patching bit:

while( <IN> ) {
  # Match the instance name and check if it's in %instances
  if( /([^\s)]+)/o and $instances{$1} ) {
    print "Found INSTANCE $1\n";
    delete( $instances{$1} ); # So instances not found can be reported

    # Match this:
    #    IOPATH SET O (6500:6500:6500)(6500:6500:6500)
    #    -----$1------
    # End up with this:
    #    IOPATH SET O (0:0:0)(0:0:0)
    s/
      (
        IOPATH[^(]+    # Find "IOPATH" and everything before "("
      )                # Back reference in $1

      \([^)]+\)        # Match first timing data field eg. (4:5:6)
      \s*              # Possible white space
      \([^)]+\)        # Match second timing data field

     /$1(0:0:0)(0:0:0)/gox; # Substitute globally

#     Could have written this but it's not so easy to read
#     s/(IOPATH[^(]+)\([^)]+\)\s*\([^)]+\)/$1(0:0:0)(0:0:0)/go;
  }
  print OUT;
}

Comments may or may not help. There isn't much here without comments. It's just harder to see what's happening:

while( <IN> ) {
  if( /([^\s)]+)/o and $instances{$1} ) {
    print "Found INSTANCE $1\n";
    delete( $instances{$1} );
    s/(IOPATH[^(]+)\([^)]+\)\s*\([^)]+\)/$1(0:0:0)(0:0:0)/go;
  }
  print OUT;
}

Some special Perl magic happens when the record input operator and nothing else appears as the condition to a while loop. Perl automagically assigns each record to $_ and tests that it is defined, like this: while( defined( $_ = <IN> ) ) {.
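The implicit test is for definedness rather than truth, which matters when a record is a false value such as "0". The following sketch assumes a hypothetical in-memory file whose last line is "0" with no trailing newline.

```perl
use strict;
use warnings;

# Hypothetical input whose last line is "0" with no trailing newline
my $data = "1\n0";
open( my $fh, '<', \$data ) or die "Cannot open in-memory file: $!\n";

my $count = 0;
while( <$fh> ) {   # same as: while( defined( $_ = <$fh> ) )
  $count++;
}
print "Read $count lines\n";
```

Both lines are counted, because the final "0" is a defined value even though it is false.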

Before printing every record read into $_ a substitution is done conditionally. Statement print OUT; defaults to printing $_ to file handle OUT in the absence of an argument.

The if statement captures the first identifier in the record into $1. The regular expression [^\s)]+ matches the leftmost substring of characters not including white space or literal ) characters. That is, the instance name, which may be followed by a space or ).

The instance name, now back referenced in $1, is looked up in the %instances hash to see if it's one of the records to modify. If it is then print a comment to say we've found an instance name from the list and delete its key/value pair from the hash. This is necessary to produce a report of instance names not found. Finally use a substitution operation to find each IOPATH entry and zero its timing data fields.

Modifier g finds and substitutes every IOPATH entry. Without it only the first entry would be substituted. Modifier o means compile once. This is a regular expression engine optimisation that tells Perl not to recompile the regular expression every time through the loop. It only makes a difference when the regular expression contains interpolated variables; a pattern with none is compiled once in any case. Sometimes recompiling is necessary because interpolated variables in the regular expression change each time through the loop.

Modifier x directs Perl's regular expression engine to ignore white space and end of line comments following # symbols. It's a neat way to comment regular expressions so others can read your code and, perhaps more importantly, you can understand your code when you come back to it later. Adding comments with a sample showing which bit is matched and which bit back referenced may help.
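Here is the commented substitution applied to a single record, as a sketch that can be run on its own. The record text and the names $record and $patched are hypothetical, and the result is copied into a new variable rather than changing $_ in place. Note that the comments inside the pattern avoid $ and @ characters, because variables are still interpolated inside a regular expression comment.

```perl
use strict;
use warnings;

# Hypothetical record as read with $/ set to '(INSTANCE'
my $record = <<'EOF';
 u1)
      (DELAY
        (ABSOLUTE
          (IOPATH I O (3000:3000:3000)(3000:3000:3000))
          (IOPATH SET O (6500:6500:6500) (6500:6500:6500))
        )
      )
EOF

# Copy the record, then patch the copy
( my $patched = $record ) =~ s/
    ( IOPATH [^(]+ )   # "IOPATH" and everything before "(", first capture
    \( [^)]+ \)        # first timing data field
    \s*                # possible white space
    \( [^)]+ \)        # second timing data field
  /$1(0:0:0)(0:0:0)/gx;

print $patched;
```

Under x the spaces between pattern elements are ignored, so the layout is free to follow the structure of the text being matched.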

Negated character classes can be the fastest way to match strings using Perl's regular expression engine. Consider [^(]+, which matches a string of characters up to but not including the next (. In this case everything between IOPATH and the first (. For each character Perl only has to check it's not a (. That's much faster than say [\w\s]+ where Perl would have to compare each character against more than 60 possibilities.
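Timing claims like this are easy to check on your own machine with the Benchmark module that ships with Perl. The sketch below is only one way to set up the comparison. The test line and the labels negated_class and word_or_space are hypothetical, and no particular result is assumed, since it will vary with the Perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical IOPATH line used as the test subject
my $line = '(IOPATH SET O (6500:6500:6500)(6500:6500:6500))';

# Run each match for about one CPU second and print a comparison table
cmpthese( -1, {
  negated_class => sub { my @m = $line =~ /IOPATH([^(]+)/   },
  word_or_space => sub { my @m = $line =~ /IOPATH([\w\s]+)/ },
} );
```

For a realistic measurement, substitute a record taken from one of your own SDF files for the test line.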

For many scripts these optimisations don't matter. We're paying particular attention to them here because the same operation may be done thousands or even millions of times per run.

Missed Instances

foreach( keys %instances ) {
  print "Not found: INSTANCE $_\n";
}
print "$fn_out generated\n";

Finally, if there are any entries not deleted from the %instances hash then we need to report their absence from this SDF file. Then remind the user of the name of the new SDF file just generated.
