All engineers who attend our courses are invited to send their questions to us in the weeks and months that follow the course, as they learn to apply the knowledge they gained during the few days of the course. The issue we're looking at here came from an email from an attendee.
The engineer was looking for an efficient way to implement a large multiplexer in an FPGA, written using parameterized HDL for flexibility. His design needed to select one 32-bit word from a maximum of 24 words, using five select bits.
There is an apparent contradiction here: although an FPGA is largely constructed from multiplexers, it can be hard to implement multiplexed logic efficiently.
Consider the very common four-input look-up table (LUT). A two-input multiplexer needs three LUT inputs (two data and one select), so it fits in a single LUT. A four-input multiplexer needs six (four data and two select), so the biggest multiplexer one four-input LUT can implement is a two-input one. Wider multiplexers need more LUTs, and by the time we are selecting one 32-bit word from 24, we have several levels of logic, which degrades circuit performance.
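To put rough numbers on this, here is a small Python sketch. It is a back-of-envelope software model, not HDL. It assumes the naive case where every four-input LUT implements exactly one two-input multiplexer and the tool builds a balanced tree of them. It ignores the dedicated multiplexer primitives some FPGA families provide, and any cleverer decomposition your synthesis tool might find. The function name `mux_tree_cost` is hypothetical.

```python
def mux_tree_cost(n_inputs: int, width: int) -> tuple[int, int]:
    """Estimate (total LUTs, logic levels) for an n_inputs:1 mux that is `width` bits wide.

    Assumption: each 4-input LUT implements exactly one 2:1 mux
    (2 data inputs + 1 select input), arranged as a balanced binary tree.
    """
    if n_inputs < 2:
        return 0, 0  # a single input needs no multiplexing
    luts_per_bit = n_inputs - 1             # a binary tree of 2:1 muxes has n-1 nodes
    levels = (n_inputs - 1).bit_length()    # ceil(log2(n)): one tree level per select bit
    return luts_per_bit * width, levels


print(mux_tree_cost(24, 32))  # (736, 5)
```

Under those assumptions, each of the 32 output bits needs 23 LUTs arranged five levels deep, one level per select bit, for 736 LUTs in total.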
A multiplexer is a very common thing to have in a circuit. Indeed, any time you write any conditional construct in your HDL you may be inferring a multiplexer. Whether it will remain after optimization is down to your synthesis tool, but you should be aware that it might. Some VHDL and Verilog examples:
-- VHDL
if sel = '1' then
  output <= a;
else
  output <= b;
end if;

// Verilog
assign data_out = data[sel];
assign sig = sel ? A : B;
Use these and related constructs with care, particularly when the structure of your silicon is already fixed, as it is in an FPGA. Coding guidelines for some FPGA families specifically recommend avoiding nested conditional constructs because they can produce complex, slow implementations.
So what can be done? There is no solution as such: nothing will magically make your circuits faster. This is mostly an example of where knowledge of your FPGA's fabric is essential to making efficient use of its resources in your application. However, there is something you can try.
If you are writing code that selects from a small number of alternatives then it's probably not worth changing the way you write the code, as any improvements from different code will be tiny. If you are looking at the complexity of our original example then a little creative thinking may pay dividends.
Think about how a multiplexer actually works. It is basically an OR of several AND terms, each formed from one data input and one decoded value of the select inputs. In its simplest form:
Op = (!Sel & A) | (Sel & B)
This is the logic for a two-input multiplexer, but the pattern is much the same for any number of inputs. We may be able to give the synthesis tool a leg-up by writing our multiplexer as an explicit AND-OR instead of in the more usual form. Instead of this sort of thing:
// Verilog
assign data_out = data[sel];

-- VHDL (assuming sel is a std_logic_vector; needs use ieee.numeric_std.all)
data_out <= data(to_integer(unsigned(sel)));
try this:
always @* begin : wide_ANDOR_mux
  integer i;
  data_bus = 32'b0;
  for (i = 0; i < NUM_INPUTS; i = i + 1)
    if (i == sel)
      data_bus = data_bus | data[i];  // OR in the word whose index matches sel
end // wide_ANDOR_mux
In VHDL:
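-- Assumes sel is a std_logic_vector and the unit has: use ieee.numeric_std.all;
process (data, sel) is
  variable tmp : std_logic_vector(data_bus'range);
begin
  tmp := (others => '0');  -- variable assignment uses :=, not <=
  for I in 0 to NUM_INPUTS-1 loop
    if I = to_integer(unsigned(sel)) then
      tmp := tmp or data(I);  -- OR in the word whose index matches sel
    end if;
  end loop;
  data_bus <= tmp;
end process;
In this code, data is the array holding the NUM_INPUTS input words, sel is the select value and data_bus is the selected word.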
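The AND-OR idea can be checked with a small bit-level software model. This Python sketch is not HDL, and the names `mux2` and `and_or_mux` are hypothetical. It first checks the two-input equation against a plain select. It then generalizes that equation: every input word is ANDed with a mask that is all ones only when its index matches the select value, and the results are ORed together. This mirrors the loop in the Verilog and VHDL processes.

```python
from typing import Sequence


def mux2(sel: int, a: int, b: int) -> int:
    """Single-bit model of Op = (!Sel & A) | (Sel & B)."""
    return ((sel ^ 1) & a) | (sel & b)


def and_or_mux(data: Sequence[int], sel: int, width: int = 32) -> int:
    """Word-level AND-OR multiplexer model.

    Each word is ANDed with its decoded select line replicated across
    the word (like {32{i == sel}} in Verilog), then all terms are ORed.
    """
    full = (1 << width) - 1
    result = 0
    for i, word in enumerate(data):
        enable_mask = full if i == sel else 0  # decoded select for input i
        result |= word & enable_mask
    return result


# Usage: 24 distinct 32-bit words, selected with a 5-bit select value
data = [i * 0x01010101 for i in range(24)]
assert and_or_mux(data, 7) == data[7]
assert and_or_mux(data, 30) == 0  # unused select code: no term is enabled
```

With 24 inputs and five select bits, the eight unused select codes (24 to 31) make the AND-OR form return all zeros. An out-of-range array read in Verilog simulation returns X instead.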
Is this guaranteed to help? Absolutely not. There are so many factors involved that it's impossible to say "Yes, this will give you x% improvement in timing/area/utilization/etc". In the case of the original correspondent, representing his word multiplexer in this way gave him a small improvement in timing: not enough, alas, for him to meet timing, but it got him closer.
In his case, the next step was to pipeline the multiplexer. This is a technique that is well worth considering, especially on FPGA architectures that are relatively rich in flip-flops. Doing this gave our correspondent the improvements he needed to meet timing.
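Pipelining splits the wide multiplexer into smaller ones with a register between them. The Python sketch below is a cycle-level software model of one possible split. The split is an assumption for illustration, not the correspondent's actual design. Stage 1 uses the low three select bits to pick one word from each group of eight inputs. Stage 2 uses the registered high select bits to pick among the three group winners. The class `PipelinedMux` and its `group_bits` parameter are hypothetical.

```python
from typing import List, Sequence


class PipelinedMux:
    """Cycle-level model of a two-stage pipelined N:1 word multiplexer.

    Stage 1: the low `group_bits` select bits pick one word inside each
             group of 2**group_bits inputs; winners and high select bits are registered.
    Stage 2: the registered high select bits pick among the group winners
             into an output register. All registers reset to 0.
    """

    def __init__(self, num_inputs: int, group_bits: int = 3) -> None:
        self.num_inputs = num_inputs
        self.group_bits = group_bits
        self.group_size = 1 << group_bits
        self.num_groups = -(-num_inputs // self.group_size)  # ceiling division
        # Pipeline registers in their reset state
        self.stage1_words: List[int] = [0] * self.num_groups
        self.stage1_hi_sel = 0
        self.out = 0

    def clock(self, data: Sequence[int], sel: int) -> int:
        """Model one clock edge and return the output register after the edge."""
        # Stage 2 reads the stage-1 registers before they update,
        # like non-blocking assignments in a clocked HDL process.
        hi = self.stage1_hi_sel
        self.out = self.stage1_words[hi] if hi < self.num_groups else 0

        # Stage 1: one small mux per group, driven by the low select bits only
        lo = sel & (self.group_size - 1)
        new_words = []
        for g in range(self.num_groups):
            idx = g * self.group_size + lo
            new_words.append(data[idx] if idx < self.num_inputs else 0)
        self.stage1_words = new_words
        self.stage1_hi_sel = sel >> self.group_bits
        return self.out


# Usage: each result appears one clock call after its select value is applied
data = [0x1000 + i for i in range(24)]
mux = PipelinedMux(24)
outputs = [mux.clock(data, s) for s in (5, 17, 23, 0)]
print([hex(o) for o in outputs])  # ['0x0', '0x1005', '0x1011', '0x1017']
```

The cost is one extra clock cycle of latency. In exchange, each stage is only an 8:1 or 3:1 multiplexer. Built as a tree of two-input multiplexers, those are three and two levels deep, compared with five levels for a single 24:1 tree.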
The basic point is that you can't necessarily rely on your synthesis tool to give you the best possible circuit no matter what code you write. If you're working with FPGAs you need to have some knowledge of what chip resources are available to you and how your desired function can best be implemented using those resources. If you're working on an ASIC your silicon "canvas" is pretty much blank, but you still need to have a reasonable idea of what your synthesis tool is able to do with different coding styles, and what circuit is liable to result from those styles.