We are going to extend the sequential counter to implement a Gray code sequence rather than a simple linear increment.

The sequence we will implement is as follows:

| Nth Step after Reset (in binary) | Gray Code |
|----------------------------------|-----------|
| 000                              | 000       |
| 001                              | 001       |
| 010                              | 011       |
| 011                              | 010       |
| 100                              | 110       |
| 101                              | 111       |
| 110                              | 101       |
| 111                              | 100       |
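As a quick sanity check (Python, illustrative only, not part of the design), we can verify that the listed sequence really has the defining Gray property: adjacent codes, including the wrap-around from the last step back to the first, differ in exactly one bit.

```python
# 3-bit Gray sequence, copied from the table above.
gray = [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]

# Defining property: successive codes differ in exactly one bit.
# XOR adjacent codes and count the set bits; wrap around to check the full cycle.
for prev, cur in zip(gray, gray[1:] + gray[:1]):
    assert bin(prev ^ cur).count("1") == 1

print("single-bit transitions verified")  # → single-bit transitions verified
```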

which yields the following always block for a Gray code counter, derived directly from the above truth table:

```verilog
always @(negedge reset or posedge clock)
  if (~reset)
    q <= 3'b000;
  else
    case (q)
      3'b000: q <= 3'b001;
      3'b001: q <= 3'b011;
      3'b011: q <= 3'b010;
      3'b010: q <= 3'b110;
      3'b110: q <= 3'b111;
      3'b111: q <= 3'b101;
      3'b101: q <= 3'b100;
      3'b100: q <= 3'b000;
      default: q <= 3'bx;
    endcase
```

Obviously, this Verilog doesn't look like a normal counter always block. Perhaps more worrying is that this coding approach becomes unwieldy for counters wider than 3 or 4 bits: the code space grows as O(2^{N}) in the counter width N. In the same way that we try to avoid algorithms whose execution time or output data set grows at anything greater than O(N^{3/2}), it's also wise to avoid such growth from a coding standpoint.

As an example of algorithm choice, consider the options for sorting a set of data. Various algorithms exist: ye olde BubbleSort, Williams' HeapSort and Hoare's QuickSort spring to mind. The HeapSort algorithm executes in time O(N log_{2} N), whereas BubbleSort executes in time O(N^{2}). QuickSort is generally faster than HeapSort by about 25% on typical (< 100 items) random data sets, provided the data set is not already ordered. As an aside, if you don't know how to code the BubbleSort algorithm, don't bother; go straight to the HeapSort algorithm (do not pass GO, but do collect 200 coding brownie points!).
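To make the aside concrete, here is a minimal HeapSort sketch in Python (illustrative only, leaning on the standard library's heapq rather than a hand-rolled heap):

```python
import heapq

def heap_sort(items):
    """Sort a list in O(N log N) time using a binary heap."""
    heap = list(items)
    heapq.heapify(heap)  # O(N) heap construction
    # N pops, each O(log N), yielding the elements in ascending order.
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([5, 3, 8, 1, 9, 2]))  # → [1, 2, 3, 5, 8, 9]
```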

So let's find a way to code the Gray code sequence in as succinct a way as possible. By examining the sequence in the table, we can see that as we *increment* through each step, Gray code bit N is simply binary bit N inverted whenever binary bit N+1 is 1. This looks just like an XOR operation, and so it proves. Note that this inversion rule applies because the underlying sequence is incrementing, and thus an incrementer is required too.

Thus, the horrid case statement in our first code snippet becomes somewhat simpler:

```verilog
q <= q + 1;  // the incrementer
```

and for the inversion (extending the counter to four bits),

```verilog
q_gray = {q[3], q[3]^q[2], q[2]^q[1], q[1]^q[0]};
```

so the Verilog code now becomes:

```verilog
always @(negedge reset or posedge clock)
  if (~reset)
    q <= 0;
  else
    q <= q + 1;

assign q_gray = {q[3], ^q[3:2], ^q[2:1], ^q[1:0]};
```
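As a quick software cross-check (Python, illustrative only), the bit-slice concatenation above is equivalent to the well-known closed form g = b ^ (b >> 1), since each Gray bit is the XOR of a binary bit with its neighbour one position up:

```python
def gray_from_slices(q):
    """Mirror the Verilog concatenation {q[3], q[3]^q[2], q[2]^q[1], q[1]^q[0]}."""
    b = [(q >> i) & 1 for i in range(4)]  # b[i] is bit q[i]
    bits = [b[3], b[3] ^ b[2], b[2] ^ b[1], b[1] ^ b[0]]  # MSB first
    return (bits[0] << 3) | (bits[1] << 2) | (bits[2] << 1) | bits[3]

# Equivalent closed form: XOR the value with itself shifted right by one.
assert all(gray_from_slices(q) == q ^ (q >> 1) for q in range(16))
```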

Yes, we took advantage of Verilog's unary XOR operator to make the code even shorter, but well, that's what I thought it was there for! So, a summary of the key points:

- avoid any kind of complexity greater than O(N^{3/2}), whether it be code space, execution time or data set
- always try to recognise patterns in your code (look at the Synchronizer Scaler model for yet another example of this cognitive approach to hardware design)
- use neat language features where they fit, such as Verilog's unary ^ operator

One final point: if the code looks neat from a coding, performance (execution and real-time) and data set perspective, you're probably not going to do much better. Unless, of course, you use a *better* algorithm...