-
Notifications
You must be signed in to change notification settings - Fork 43
/
Copy pathCDC_FIFO_Repacker.html
553 lines (488 loc) · 23.9 KB
/
CDC_FIFO_Repacker.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
<html>
<head>
<link rel="shortcut icon" href="./favicon.ico">
<link rel="stylesheet" type="text/css" href="./style.css">
<link rel="canonical" href="./CDC_FIFO_Repacker.html">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Takes in a ready/valid handshake of a given word width, buffers it into a FIFO and passes it into another clock domain with full throughput, and outputs a ready/valid handshake with a *different* word width. The data is repacked without gaps into the new word width. The words widths are arbitrary and need not be multiples of eachother.">
<title>CDC FIFO Repacker</title>
</head>
<body>
<p class="inline bordered"><b><a href="./CDC_FIFO_Repacker.v">Source</a></b></p>
<p class="inline bordered"><b><a href="./legal.html">License</a></b></p>
<p class="inline bordered"><b><a href="./index.html">Index</a></b></p>
<h1>CDC (Clock Domain Crossing) FIFO Repacker</h1>
<p>Takes in a ready/valid handshake of a given word width, buffers it into
a FIFO and passes it into another clock domain with full throughput, and
outputs a ready/valid handshake with a <em>different</em> word width. The data is
repacked without gaps into the new word width. The words widths are
arbitrary and need not be multiples of eachother.</p>
<p>The minimum input-to-output latency is 7 cycles when both
clocks are plesiochronous.</p>
<h2>Asynchronous Clocks</h2>
<p>This FIFO Repacker supports having the input and output interfaces each on
their own mutually asynchronous clock (without knowledge or restriction on
their relative clock frequencies or phase). Using a FIFO for data tranfer
between asynchronous clock domains allows multiple transfers to overlap
with the CDC synchronization overhead. Once the CDC synchronization is done,
all the previously written data in one clock domain can be freely read out
in the other clock domain without further overhead.</p>
<h2>Trading Off Size for Simplicity</h2>
<p>Usually, a repacker is implemented with a minimally sized ring buffer with
a pre-calculated read/write schedule which covers the whole sequence of
possible amounts of data in the buffer at a given time. This is a complex
schedule which resembles pseudo-random number generation and is likely
unique for every pair of input/output widths. This also means the read side
must be able to read into every possible offset into the ring buffer as
given by the schedule, which can create wide multiplexers. The minimum size
of the buffer is also a function of the computed schedule.</p>
<p>Instead, let's trade-off using more storage to obtain the implicit simplest
schedule: we use the Least Common Multiple of the input and output widths,
which is the smallest buffer which holds an integer number of input and
output entries at the same time. Then the behaviour of the buffer reduces
to that of a FIFO buffer, albeit with different input and output word
widths. This approach tends to minimize the amount of multiplexing and does
not require computing any schedules (only plain counting). We only need to
specify the input and output widths and everything is computed for us.</p>
<p><strong>NOTE</strong>: Using the LCM depth does not guarantee full throughput. A multiple
of the LCM may be needed. See below.</p>
<h2>Resets</h2>
<p>Since the input and output interfaces are asynchronous to eachother, they
each have their own locally-synchronous <code>clear</code> reset signal. However, it
makes no sense to reset one interface only, as that would corrupt the
read/write addresses and duplicate or lose data. <em>Both interfaces must be reset
together.</em> Pick one interface reset as the primary reset, then synchronize
it into the clock domain of the other interface using a <a href="./Reset_Synchronizer.html">Reset
Synchronizer</a>.</p>
<p><strong>NOTE</strong>: Both interfaces must be out of reset before beginning operation,
otherwise a CDC synchronization from one domain into another domain which
is still under reset will be lost, and the system state becomes
inconsistent.</p>
<h2>Parameters, Ports, and Constants</h2>
<pre>
`default_nettype none
module <a href="./CDC_FIFO_Repacker.html">CDC_FIFO_Repacker</a>
#(
parameter WORD_WIDTH_INPUT = 0,
parameter WORD_WIDTH_OUTPUT = 0,
parameter CDC_EXTRA_STAGES = 0
)
(
input wire input_clock,
input wire input_clear,
input wire input_valid,
output reg input_ready,
input wire [WORD_WIDTH_INPUT-1:0] input_data,
input wire output_clock,
input wire output_clear,
output wire output_valid,
input wire output_ready,
output reg [WORD_WIDTH_OUTPUT-1:0] output_data
);
</pre>
<h3>Buffer Depth Adjustment</h3>
<p>First, we have to calculate the Least Common Multiple (LCM) of both
<code>WORD_WIDTH_INPUT</code> and <code>WORD_WIDTH_OUTPUT</code>, which gives us the minimum
buffer depth which will hold a whole number of input and output items at
the same time.</p>
<pre>
`include "<a href="./lcm_function.html">lcm_function</a>.vh"
localparam BUFFER_DEPTH_MIN = lcm(WORD_WIDTH_OUTPUT, WORD_WIDTH_INPUT); // Least Common Multiple
</pre>
<p>However, the LCM depth is not necessarily sufficient to ensure full
throughput, as it may contain too few of <code>WORD_WIDTH_INPUT</code> or
<code>WORD_WIDTH_OUTPUT</code> entries to allow reads and writes to continue without
stalls while the read and write addresses are synchronized across the input
and output clock domains.</p>
<p>Based on the design of the [CDC Word Synchronizer]
(./CDC_Word_Synchronizer.html), the absolute worst-case latency for
completing a transfer from one clock domain to the other is 8 clock cycles.
Thus, we have to contain more than <em>twice</em> that number of entries (17) in
the FIFO buffer to guarantee that there is always a free entry to read and
write at any time, thus avoiding a stall.</p>
<p>We calculate the final <code>BUFFER_DEPTH</code> by dividing it by the larger of
<code>WORD_WIDTH_OUTPUT</code> and <code>WORD_WIDTH_INPUT</code>, taking the ratio of that
fraction with 17 (fudged up by +1 to compensate for integer division), and
then multiplying the <code>BUFFER_DEPTH_MIN</code> by that amount, thus guaranteeing
that <code>BUFFER_DEPTH</code> >= <code>17 * max(WORD_WIDTH_INPUT, WORD_WIDTH_OUTPUT)</code>.</p>
<p>I do integer division with a +1 fudge factor to avoid potential problems as
it's unclear if casting a real to an integer truncates or rounds. So lets
pay the price of possible <code>BUFFER_DEPTH_MIN</code> more entries than we really
need to guarantee we meet our constraint. <em>This can double the buffer depth
when the LCM is a large number close to <code>17 * max(WORD_WIDTH_OUTPUT,
WORD_WIDTH_INPUT)</code>.</em></p>
<pre>
`include "<a href="./max_function.html">max_function</a>.vh"
localparam WORD_WIDTH_MAX = max(WORD_WIDTH_OUTPUT, WORD_WIDTH_INPUT);
localparam ITEM_COUNT_MIN = BUFFER_DEPTH_MIN / WORD_WIDTH_MAX;
localparam BUFFER_DEPTH_MULTIPLIER = (17 / ITEM_COUNT_MIN) + 1;
localparam BUFFER_DEPTH = BUFFER_DEPTH_MIN * BUFFER_DEPTH_MULTIPLIER;
`include "<a href="./clog2_function.html">clog2_function</a>.vh"
localparam ADDR_WIDTH = clog2(BUFFER_DEPTH);
localparam ADDR_ZERO = {ADDR_WIDTH{1'b0}};
localparam ADDR_LAST = BUFFER_DEPTH-1;
localparam BUFFER_ZERO = {BUFFER_DEPTH{1'b0}};
localparam OUTPUT_ZERO = {WORD_WIDTH_OUTPUT{1'b0}};
// A little contortion to please the linter.
// (doing (foo-1)[width-1:0], or similar, isn't <a href="./legal.html">legal</a> in Verilog-2001)
localparam [ADDR_WIDTH-1:0] ADDR_INITIAL_INPUT = WORD_WIDTH_INPUT - 1;
localparam [ADDR_WIDTH-1:0] ADDR_INITIAL_OUTPUT = WORD_WIDTH_OUTPUT - 1;
initial begin
input_ready = 1'b1; // Empty at start, so accept data
output_data = OUTPUT_ZERO;
end
</pre>
<h2>Data Path</h2>
<h3>FIFO Buffer Registers</h3>
<p>We have to access arbitrary word subsets of the entire FIFO storage so we
can read and write word of different width <em>without introducing gaps in the
data!</em> Block RAMs only support limited and fixed word sizes, so are not
usable here. Thus, we must implement using registers and read/write the
necessary exact word subsets as needed.</p>
<p>We also need to do CDC, so we must be careful to not have simultaneous
reads and write to the same register.</p>
<pre>
reg [BUFFER_DEPTH-1:0] buffer = BUFFER_ZERO;
</pre>
<h3>Read/Write Address Counters (Head and Tail)</h3>
<p>To define a word subset inside the <code>buffer</code>, the input and output each use
two counters: head and tail. The tail counter starts at <code>ADDR_ZERO</code>, and
the head counter starts at <code>WORD_WIDTH_OUTPUT-1</code> or <code>WORD_WIDTH_INPUT-1</code>,
and both can only increment by <code>WORD_WIDTH_OUTPUT</code> or <code>WORD_WIDTH_INPUT</code>,
wrapping around to their start value if incremented past <code>ADDR_LAST</code>.</p>
<p>Since <code>BUFFER_DEPTH</code> is the Least Common Multiple of the input/output word
widths, the counters always index a whole word without residue or having to
wrap around the end of the buffer.</p>
<p>We use two counters rather than a counter and an adder to calculate the
next head and tail concurrently.</p>
<p>From these counter values we can perform range checks (e.g.: by seeing if
a write head counter is past a read tail counter, and thus there is word
overlap and no room to write) to test if there is enough input data to
compose an output data word, if we have reached the end of the buffer, and
calculate vector part selects to read/write the buffer.</p>
<h4>Write Address Counters</h4>
<pre>
reg increment_buffer_write_addr = 1'b0;
reg load_buffer_write_addr = 1'b0;
wire [ADDR_WIDTH-1:0] buffer_write_addr_tail;
<a href="./Counter_Binary.html">Counter_Binary</a>
#(
.WORD_WIDTH (ADDR_WIDTH),
.INCREMENT (WORD_WIDTH_INPUT),
.INITIAL_COUNT (ADDR_ZERO)
)
write_address_tail
(
.clock (input_clock),
.clear (input_clear),
.up_down (1'b0), // 0/1 --> up/down
.run (increment_buffer_write_addr),
.load (load_buffer_write_addr),
.load_count (ADDR_ZERO),
.carry_in (1'b0),
// verilator lint_off PINCONNECTEMPTY
.carry_out (),
.carries (),
.overflow (),
// verilator lint_on PINCONNECTEMPTY
.count (buffer_write_addr_tail)
);
wire [ADDR_WIDTH-1:0] buffer_write_addr_head;
<a href="./Counter_Binary.html">Counter_Binary</a>
#(
.WORD_WIDTH (ADDR_WIDTH),
.INCREMENT (WORD_WIDTH_INPUT),
.INITIAL_COUNT (ADDR_INITIAL_INPUT)
)
write_address_head
(
.clock (input_clock),
.clear (input_clear),
.up_down (1'b0), // 0/1 --> up/down
.run (increment_buffer_write_addr),
.load (load_buffer_write_addr),
.load_count (ADDR_INITIAL_INPUT),
.carry_in (1'b0),
// verilator lint_off PINCONNECTEMPTY
.carry_out (),
.carries (),
.overflow (),
// verilator lint_on PINCONNECTEMPTY
.count (buffer_write_addr_head)
);
</pre>
<h4>Read Address Counters</h4>
<pre>
reg increment_buffer_read_addr = 1'b0;
reg load_buffer_read_addr = 1'b0;
wire [ADDR_WIDTH-1:0] buffer_read_addr_tail;
<a href="./Counter_Binary.html">Counter_Binary</a>
#(
.WORD_WIDTH (ADDR_WIDTH),
.INCREMENT (WORD_WIDTH_OUTPUT),
.INITIAL_COUNT (ADDR_ZERO)
)
read_address_tail
(
.clock (output_clock),
.clear (output_clear),
.up_down (1'b0), // 0/1 --> up/down
.run (increment_buffer_read_addr),
.load (load_buffer_read_addr),
.load_count (ADDR_ZERO),
.carry_in (1'b0),
// verilator lint_off PINCONNECTEMPTY
.carry_out (),
.carries (),
.overflow (),
// verilator lint_on PINCONNECTEMPTY
.count (buffer_read_addr_tail)
);
wire [ADDR_WIDTH-1:0] buffer_read_addr_head;
<a href="./Counter_Binary.html">Counter_Binary</a>
#(
.WORD_WIDTH (ADDR_WIDTH),
.INCREMENT (WORD_WIDTH_OUTPUT),
.INITIAL_COUNT (ADDR_INITIAL_OUTPUT)
)
read_address_head
(
.clock (output_clock),
.clear (output_clear),
.up_down (1'b0), // 0/1 --> up/down
.run (increment_buffer_read_addr),
.load (load_buffer_read_addr),
.load_count (ADDR_INITIAL_OUTPUT),
.carry_in (1'b0),
// verilator lint_off PINCONNECTEMPTY
.carry_out (),
.carries (),
.overflow (),
// verilator lint_on PINCONNECTEMPTY
.count (buffer_read_addr_head)
);
</pre>
<h3>Wrap-Around Bits</h3>
<p>To distinguish between the empty and full cases, which both identically
show as equal read and write buffer addresses, we keep track of each time
an address wraps around to zero by toggling a bit. <em>The addresses never
pass eachother.</em> </p>
<p>If the write address runs ahead of the read address enough to wrap-around
and reach the read address from behind, the buffer is full (or has less
free space than one write word) and all writes to the buffer halt until
after we've read out more than the width of a write word. We detect this
because the write address will have wrapped-around one more time than the
read address, so their wrap-around bits will be different.</p>
<p>Conversely, if the read address catches up to the write address from behind,
the buffer is empty (or containing less than one read word of data) and all
reads halt until after we've written enough data to have more than the
width of a read word in the buffer. In this case, the wrap-around bits are
identical.</p>
<pre>
reg toggle_buffer_write_addr_wrap_around = 1'b0;
wire buffer_write_addr_wrap_around;
<a href="./Register_Toggle.html">Register_Toggle</a>
#(
.WORD_WIDTH (1),
.RESET_VALUE (1'b0)
)
write_wrap_around_bit
(
.clock (input_clock),
.clock_enable (1'b1),
.clear (input_clear),
.toggle (toggle_buffer_write_addr_wrap_around),
.data_in (buffer_write_addr_wrap_around),
.data_out (buffer_write_addr_wrap_around)
);
reg toggle_buffer_read_addr_wrap_around = 1'b0;
wire buffer_read_addr_wrap_around;
<a href="./Register_Toggle.html">Register_Toggle</a>
#(
.WORD_WIDTH (1),
.RESET_VALUE (1'b0)
)
read_wrap_around_bit
(
.clock (output_clock),
.clock_enable (1'b1),
.clear (output_clear),
.toggle (toggle_buffer_read_addr_wrap_around),
.data_in (buffer_read_addr_wrap_around),
.data_out (buffer_read_addr_wrap_around)
);
</pre>
<h3>Read/Write Address CDC Transfer</h3>
<p>We need to compare the read and write addresses, along with their
associated wrap-around bits, to detect if the buffer is holding any items.
Therefore, we need to transfer the read address into the <code>output_clock</code>
domain, and the write address into the <code>input_clock</code> domain. We do this
with two <a href="./CDC_Word_Synchronizer.html">CDC Word Synchronizers</a>.</p>
<p>A read/write address is always valid, so we tie <code>sending_valid</code> high and
ignore <code>sending_ready</code>. Then, we loop <code>receiving_valid</code> into
<code>receiving_ready</code> so we start a new CDC word transfer as soon as the
current CDC word transfer completes. This configuration samples the address
continuously, as fast as the CDC word transfer happens. The synchronized
address at <code>receiving_data</code> remains steady between CDC word transfers.</p>
<p>It takes a few cycles to do the CDC word transfer, so when comparing the
local read or write address with the synchronized counterpart from the
other clock domain, we are comparing to a slightly stale version, lagging
behind the actual value. However, since the addresses never pass eachother,
this does not cause any corruption. The synchronized value eventually
catches up, and the actual buffer condition is updated. <em>At worst, this lag
means having to specify a somewhat deeper FIFO to achieve the expected peak
capacity, depending on input/output rates.</em> However, not being restricted
to powers-of-two FIFO depths minimizes this overhead.</p>
<pre>
wire [ADDR_WIDTH-1:0] buffer_write_addr_tail_synced;
wire buffer_write_addr_wrap_around_synced;
wire buffer_write_addr_synced_valid;
<a href="./CDC_Word_Synchronizer.html">CDC_Word_Synchronizer</a>
#(
.WORD_WIDTH (1 + ADDR_WIDTH),
.EXTRA_CDC_DEPTH (CDC_EXTRA_STAGES),
.OUTPUT_BUFFER_TYPE ("HALF"), // "HALF", "SKID", "FIFO"
.OUTPUT_BUFFER_CIRCULAR (0),
.FIFO_BUFFER_DEPTH (), // Only for "FIFO"
.FIFO_BUFFER_RAMSTYLE () // Only for "FIFO"
)
write_to_read
(
.sending_clock (input_clock),
.sending_clear (input_clear),
.sending_data ({buffer_write_addr_wrap_around, buffer_write_addr_tail}),
.sending_valid (1'b1),
// verilator lint_off PINCONNECTEMPTY
.sending_ready (),
// verilator lint_on PINCONNECTEMPTY
.receiving_clock (output_clock),
.receiving_clear (output_clear),
.receiving_data ({buffer_write_addr_wrap_around_synced, buffer_write_addr_tail_synced}),
.receiving_valid (buffer_write_addr_synced_valid),
.receiving_ready (buffer_write_addr_synced_valid)
);
wire [ADDR_WIDTH-1:0] buffer_read_addr_tail_synced;
wire buffer_read_addr_wrap_around_synced;
wire buffer_read_addr_synced_valid;
<a href="./CDC_Word_Synchronizer.html">CDC_Word_Synchronizer</a>
#(
.WORD_WIDTH (1+ ADDR_WIDTH),
.EXTRA_CDC_DEPTH (CDC_EXTRA_STAGES),
.OUTPUT_BUFFER_TYPE ("HALF"), // "HALF", "SKID", "FIFO"
.OUTPUT_BUFFER_CIRCULAR (0),
.FIFO_BUFFER_DEPTH (), // Only for "FIFO"
.FIFO_BUFFER_RAMSTYLE () // Only for "FIFO"
)
read_to_write
(
.sending_clock (output_clock),
.sending_clear (output_clear),
.sending_data ({buffer_read_addr_wrap_around, buffer_read_addr_tail}),
.sending_valid (1'b1),
// verilator lint_off PINCONNECTEMPTY
.sending_ready (),
// verilator lint_on PINCONNECTEMPTY
.receiving_clock (input_clock),
.receiving_clear (input_clear),
.receiving_data ({buffer_read_addr_wrap_around_synced, buffer_read_addr_tail_synced}),
.receiving_valid (buffer_read_addr_synced_valid),
.receiving_ready (buffer_read_addr_synced_valid)
);
</pre>
<h2>Control Path</h2>
<h3>Buffer States</h3>
<p>We describe the state of the buffer itself as the number of items currently
stored in the buffer, as indicated by the read and write addresses and
their wrap-around bits. We only care about the extremes: </p>
<ul>
<li>if the buffer holds no read words, or less than one read word, thus we cannot read</li>
<li>if the buffer holds its maximum number of write words, or has less free space than one write word, thus we cannot write</li>
</ul>
<pre>
reg cannot_read = 1'b0;
reg cannot_write = 1'b0;
always @(*) begin
cannot_read = (buffer_read_addr_head > buffer_write_addr_tail_synced) && (buffer_read_addr_wrap_around == buffer_write_addr_wrap_around_synced);
cannot_write = (buffer_read_addr_tail_synced < buffer_write_addr_head) && (buffer_read_addr_wrap_around_synced != buffer_write_addr_wrap_around);
end
</pre>
<h3>Input Interface (Insert)</h3>
<p>The input interface is simple: if the buffer isn't at its maximum capacity,
signal the input is ready, and when an input handshake completes, write the
data directly into the buffer and increment the write address, wrapping
around as necessary.</p>
<pre>
reg insert = 1'b0;
always @(*) begin
input_ready = (cannot_write == 1'b0);
insert = (input_valid == 1'b1) && (input_ready == 1'b1);
increment_buffer_write_addr = (insert == 1'b1);
load_buffer_write_addr = (increment_buffer_write_addr == 1'b1) && (buffer_write_addr_head == ADDR_LAST [ADDR_WIDTH-1:0]);
toggle_buffer_write_addr_wrap_around = (load_buffer_write_addr == 1'b1);
end
</pre>
<p>Normally this would be a Register module, but we need to describe a clock
enable to <em>part</em> of the registers here, not a data mux. So it's a rare use
of an if-statement outside of a generate block.</p>
<pre>
always @(posedge input_clock) begin
if (insert == 1'b1) begin
buffer [buffer_write_addr_tail +: WORD_WIDTH_INPUT] <= input_data;
end
end
</pre>
<h3>Output Interface (Remove)</h3>
<p>The output interface is not so simple because the output is registered, and
so holds data independently of the buffer. We signal the output holds valid
data whenever we can remove an item from the buffer and load it into the
output register. We meet this condition if an output handshake completes,
or if the buffer holds an item but the output register is not holding any
valid data. Also, we do not increment/wrap the read address if the
previous item removed from the buffer and loaded into the output register
was the last one.</p>
<pre>
reg remove = 1'b0;
reg output_leaving_idle = 1'b0;
reg load_output_register = 1'b0;
always @(*) begin
remove = (output_valid == 1'b1) && (output_ready == 1'b1);
output_leaving_idle = (output_valid == 1'b0) && (cannot_read == 1'b0);
load_output_register = (remove == 1'b1) || (output_leaving_idle == 1'b1);
increment_buffer_read_addr = (load_output_register == 1'b1) && (cannot_read == 1'b0);
load_buffer_read_addr = (increment_buffer_read_addr == 1'b1) && (buffer_read_addr_head == ADDR_LAST [ADDR_WIDTH-1:0]);
toggle_buffer_read_addr_wrap_around = (load_buffer_read_addr == 1'b1);
end
</pre>
<p>Normally this would be a Register module, but we need to describe a clock
enable to the <code>output_data</code> register, and a mux from the buffer. So it's
a rare use of an if-statement outside of a generate block.</p>
<pre>
always @(posedge output_clock) begin
if (load_output_register == 1'b1) begin
output_data <= buffer [buffer_read_addr_tail +: WORD_WIDTH_OUTPUT];
end
end
</pre>
<p><code>output_valid</code> must be registered to match the latency of the buffer output
register.</p>
<pre>
<a href="./Register.html">Register</a>
#(
.WORD_WIDTH (1),
.RESET_VALUE (1'b0)
)
output_data_valid
(
.clock (output_clock),
.clock_enable (load_output_register == 1'b1),
.clear (output_clear),
.data_in (cannot_read == 1'b0),
.data_out (output_valid)
);
endmodule
</pre>
<hr>
<p><a href="./index.html">Back to FPGA Design Elements</a>
<center><a href="https://fpgacpu.ca/">fpgacpu.ca</a></center>
</body>
</html>