####################################################################
#
#   DOULOS REGEXP LIBRARY
#   ---------------------
#
#   This file contains a collection of Regular Expressions (REs)
#   in the RE dialect supported by Tcl version 8.1 and above.
#   Commentary text is prefixed with a sharp-sign '#' at the
#   beginning of each line, to make it easier to paste these
#   comments into your own Tcl scripts.  If you wish to use these
#   REs with any other regular expression processor (such as Perl
#   or awk) you may need to modify some of the syntax to suit your
#   tool's specific RE dialect.
#
#   The Regular Expressions contained in this file are offered
#   freely to the EDA community and whilst Doulos Ltd has taken
#   reasonable care to ensure that they perform their stated
#   function, no guarantee of any kind is offered with them.
#   You are free to use and modify them in any way you see
#   fit, but Doulos Ltd accepts no liability whatsoever for
#   their use.
#
#   User contributions to this library are most welcome, and
#   will be added to the downloadable file with an acknowledgement
#   of authorship.  Please email your contributions to
#      info@doulos.com
#   with "regexp contribution" in the subject line.
#
####################################################################

# 1. In arbitrarily long text, find two successive duplicate words.
#    Created 21-Nov-2002, Jonathan Bromley.
#
#    \1 contains the first instance of the duplicated word.
#
(\m\w+\M)\W+\1\M

# 2. Modification of 1. to find three or more successive words the same.
#    Created 21-Nov-2002, Jonathan Bromley.
#
(\m\w+\M)(\W+\1\M){2,}

# 3. In arbitrarily long text, find a sentence with a duplicate word.
#    Created 21-Nov-2002, Jonathan Bromley.
#
#    Change "\w+" to "\w{n,}" to avoid matching duplicates < n chars,
#    because (for example) it's probably OK to have 'a' or 'an' twice
#    in the same sentence.
#
#    \2 contains the whole offending sentence.
#    \3 contains just the fragment starting and ending with the word.
#    \4 contains the duplicated word.
#
#    Probably best used with -nocase.
#
(^|\.)\W*(\m[^.]*\m((\w+)\M[^.]*\m\4\M)[^.]*)(\.|\W*$)

# 4. Locate the comment, if any, in a VHDL source line.
#    Created for Tcl regexp course, SD, Feb 2002
#    Modified 22-Nov-2002, Jonathan Bromley:
#      make the comment optional; added lazy match to the non-comment part.
#
#    Don't forget to use -lineanchor and -linestop if you are scanning
#    a multi-line string.
#
#    No letter matches, so -nocase is unnecessary.
#    Won't work correctly if the VHDL source code contains
#    an escaped identifier that includes an odd number of
#    double-quotes or two consecutive hyphens.
#
#    \1 contains the line, with the comment stripped away.
#    \3 contains just the comment.
#
^(("[^"]*"|[^"])*?)(--.*)?$
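
# Usage sketch (an illustration, not part of the contributed REs).
#    A minimal example, assuming Tcl 8.1 or later, of how REs 1 and 4
#    might be applied with the regexp command.  The variable names and
#    the sample strings are made up for this sketch.  Each RE is
#    brace-quoted so that its backslashes, brackets and dollar signs
#    reach the RE engine unchanged.
#
```tcl
# RE 1: find two successive duplicate words.
set dupRE {(\m\w+\M)\W+\1\M}
set text "This is is a test"
if {[regexp -- $dupRE $text match word]} {
    # match holds the whole duplicated run, word holds \1
    puts "duplicate word '$word' found in '$match'"
}

# RE 4: split a single VHDL source line into code and comment.
# The "--" inside the string literal should not be taken as a comment.
set vhdlRE {^(("[^"]*"|[^"])*?)(--.*)?$}
set line {s <= "a--b"; -- drive s}
if {[regexp -- $vhdlRE $line whole code inner comment]} {
    # code holds \1, inner holds \2 (unused here), comment holds \3
    puts "code:    '$code'"
    puts "comment: '$comment'"
}
```
#
#    The sketch handles one line at a time, so -lineanchor and
#    -linestop are not needed; add them if the whole source text is
#    held in a single multi-line string.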