This is the latest (and greatest!) in a long series of fully-unrolled
CORDIC processors written by Larry Doolittle, with occasional help
from Ming Choy and Gang Huang. It is written in portable Verilog,
synthesizable for pretty much any FPGA; part of the Verilog is in turn
composed by a Python program.
The license for all these files is based on 3-clause BSD.
See LICENSE.md, enclosed.
Good reference material on CORDIC hardware in general is given by
Ray Andraka at
http://www.andraka.com/cordic.php
and of course see the general and long-winded background at
https://en.wikipedia.org/wiki/CORDIC
The module cordicg_b22 (in cordicg_b22.v) can be instantiated
in application code. One of its parameters (width) sets the
bit width of the X and Y input and output ports. The phase input
and output are one bit wider than the X and Y ports. See below
for additional configuration.
The "_b22" above refers to the internal data path width of the CORDIC
computations, necessarily larger than the port width. Note that the
file cordicg_b22.v is not included in the source tarball; rather, it is
generated by the Python program cordicgx.py using
  python cordicgx.py 22 > cordicg_b22.v
This rule is embedded in the Makefile, so it's equivalent to say
  make cordicg_b22.v DPW=22
Of course, there's nothing magic about 22; the choice is yours, trading
DSP accuracy against resources. See perf.png, another
Makefile target.
The complete parameter list for cordicg_b$DPW is:
  width    [2, $DPW-1]   port width
  nstg     [2, $DPW]     number of CORDIC stages
  def_op   [0, 3]        see below
The number of logic elements scales approximately as 3*DPW*nstg.
Latency is nstg cycles.
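
As a rough illustration of the scaling rule above, the following Python
sketch evaluates 3*DPW*nstg and the latency for one configuration. The
function name and the optional clock-period argument are assumptions made
for this example; they are not part of the CORDIC code base, and real
resource counts depend on the synthesizer and the target part.

```python
def cordic_estimate(dpw, nstg, clk_period_ns=None):
    """Rule-of-thumb resource and latency estimate for a DPW/nstg choice."""
    if not 2 <= nstg <= dpw:
        raise ValueError("nstg must lie in [2, DPW]")
    est = {"logic_elements": 3 * dpw * nstg, "latency_cycles": nstg}
    if clk_period_ns is not None:
        # Latency in time units is just cycles times the clock period.
        est["latency_ns"] = nstg * clk_period_ns
    return est

print(cordic_estimate(22, 20))
```

For DPW=22 and nstg=20 the estimate is 1320 logic elements, the same
order of magnitude as the synthesized LUT counts tabulated later in this
file.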
Phase ports are in "natural" binary units, such that wrapping around the
finite-width digital word is a 2*pi wrap of conventional angle. As such,
it can be interpreted equally well as signed or unsigned.
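
The phase convention above can be sketched in Python as follows. The
helper names are hypothetical, and the 19-bit phase width is an assumption
that corresponds to width=18 (phase ports are one bit wider than X and Y).

```python
import math

def angle_to_phase(angle_rad, pw):
    """Convert an angle in radians to a pw-bit phase word (unsigned view)."""
    return int(round(angle_rad / (2 * math.pi) * (1 << pw))) % (1 << pw)

def phase_to_angle(word, pw, signed=True):
    """Convert a pw-bit phase word back to radians."""
    word %= 1 << pw
    if signed and word >= 1 << (pw - 1):
        word -= 1 << pw          # reinterpret the same bits as two's complement
    return word * 2 * math.pi / (1 << pw)

PW = 19  # assumed: width=18, so phase ports are 19 bits
print(angle_to_phase(math.pi / 2, PW))    # quarter turn
print(angle_to_phase(-math.pi / 2, PW))   # same bits as a three-quarter turn
```

The same 19-bit pattern reads as -pi/2 when taken as signed and as 3*pi/2
when taken as unsigned; both describe the same direction.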
X and Y ports are always signed, even when X is used as a radius output
in R->P mode. You are expected to know something about CORDIC when
setting up their scaling: a CORDIC engine has an intrinsic gain of
about 1.64676 (asymptotic value for a large number of stages). Also be
aware that a full-scale input on both X and Y has a radius sqrt(2) larger
than just full scale in one axis. This module does not detect or saturate
overflows; it just wraps, which is useless, so don't let that happen.
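
To make the scaling concrete, here is a minimal Python sketch of the
textbook CORDIC gain and the largest input radius that still fits the
output port. It assumes the usual stage gains sqrt(1 + 2^(-2i)) for
i = 0 .. nstg-1; the exact stage indexing and rounding inside the
generated Verilog are not described here, so treat the result as a
guideline and leave some margin. The function names are hypothetical.

```python
import math

def cordic_gain(nstg):
    """Textbook CORDIC gain: product of sqrt(1 + 2**(-2*i)) over the stages."""
    gain = 1.0
    for i in range(nstg):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain

def max_safe_radius(width, nstg):
    """Largest input radius whose scaled output fits a signed width-bit port."""
    full_scale = (1 << (width - 1)) - 1
    return int(full_scale / cordic_gain(nstg))

print("gain for 20 stages: %.5f" % cordic_gain(20))
print("max input radius for width=18: %d" % max_safe_radius(18, 20))
```

If X and Y can both reach their limits at the same time, each one has to
stay below the safe radius divided by sqrt(2).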
The 2-bit op input selects the operation mode as follows:
  0   Polar->Rectangular "rotation"   (phaseout will be close to zero)
  1   Rectangular->Polar "vectoring"  (yout will be close to zero)
  2   not used
  3   Follow
All three data inputs are used in all modes. To get an ordinary P->R
computation with op=0, set yin to zero. It's also possible to use that
mode for general vector rotation of the input (x,y) vector by angle
phasein. To get an ordinary R->P computation with op=1, set phasein
to zero. A non-zero phasein in that mode will simply be added to the
answer. See below for info on follow mode.
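
The behavior of op=0 and op=1 described above can be illustrated with a
textbook floating-point CORDIC in Python. This is only a sketch: it is
not bit-accurate, it works in radians rather than binary phase units, its
half-turn pre-rotation is an assumption rather than a description of the
Verilog, and follow mode is not modeled. The function name is
hypothetical.

```python
import math

def cordic_model(x, y, phase, op, nstg=20):
    """Textbook floating-point CORDIC. phase is in radians.
    op=0: rotate (x, y) by phase.  op=1: rotate (x, y) onto the +x axis."""
    if op not in (0, 1):
        raise ValueError("only op=0 and op=1 are modeled here")
    if op == 0:
        # Wrap to [-pi, pi), then take out a half turn if needed so the
        # remaining angle is within reach of the iterations below.
        phase = (phase + math.pi) % (2 * math.pi) - math.pi
        if abs(phase) > math.pi / 2:
            phase -= math.copysign(math.pi, phase)
            x, y = -x, -y
    elif x < 0:
        # Half turn into the right half plane, credited to the phase.
        phase += math.pi if y >= 0 else -math.pi
        x, y = -x, -y
    for i in range(nstg):
        if op == 0:
            d = 1.0 if phase >= 0 else -1.0   # drive phase toward zero
        else:
            d = -1.0 if y >= 0 else 1.0       # drive y toward zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        phase -= d * math.atan(2.0 ** -i)
    return x, y, phase

x, y, p = cordic_model(1.0, 0.0, 0.5, op=0)         # P->R: yin = 0
r, y0, theta = cordic_model(-3.0, 4.0, 0.0, op=1)   # R->P: phasein = 0
print(x, y, p)
print(r, y0, theta)
```

In the first call the outputs approximate gain*cos(0.5) and gain*sin(0.5)
with a residual phase near zero; in the second, r approximates gain*5,
y0 is near zero, and theta approximates atan2(4, -3). A non-zero phase
argument in the second call is simply added to theta.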
The op input is allowed to vary cycle-by-cycle. Feel free to interleave
R->P with P->R computations. One pipelined CORDIC computation, based on the
three data inputs and the op control input, starts on every (posedge clk).
The def_op parameter sets the initial value of the op port; in use cases
where op is constant, setting def_op to match can help the synthesizer
optimize away unused resources.
Version 27 has an important API name change: the module you instantiate
now has its data path bit-width encoded in the name, as described above.
The generator and testbenches were rewritten from Matlab/Octave and awk to
Python, tested compatible with both python2 and python3. The resulting ports
and functionality of the Verilog module are unchanged. This version also gets
rid of the Verilog include file and its hidden configuration state, which
made previous versions needlessly hard to incorporate in larger projects.
The previously hidden data path width now shows up as part of the module
name, and the previously hidden number of stages is now a parameter, as
described above. Thus if you previously instantiated a cordicg(), that
in turn included cordicg.vh generated with o=22 and s=20, you would now
instantiate cordicg_b22() and set its parameter nstg to 20. The code base
is now drawn from the LBNL-ATG Bedrock project, with an actual license!
Version 26 has no changes to the core synthesizable code; it improves the
documentation, fixes the generated Verilog code for 33 < o < 56, and adds
provision to check synthesis on three generations of Xilinx chips.
It is interesting to compare and contrast the maximum speed this CORDIC
engine can run (in its default configuration with 18-bit ports, 22-bit internal
data path width, and 20 stages) on the various architectures. Summary:
  part                 speed    LUTs   chip price   chip LUTs   CORDIC price
  xc3s1000-ft256-5     7.2 ns   1686      46.60        15360        4.87
  xc6slx45t-fgg484-3   5.2 ns   1596      84.39        27288        4.78
  xc7a100t-fgg484-2    3.8 ns   1564     141.25        63400        3.31
  xc7k70t-fbg484-1     3.1 ns   1629     127.21        41000        4.86
  5CSXFC6D6F31C8N      4.4 ns   2320     226.89       110000        4.78
where the price was as of 2014-05-25 at Digi-Key. The Xilinx synthesizer
is XST 14.7. The CORDIC price is an upper limit, since it assumes all the
non-LUT resources on the chip are valueless.
Semi-incompatible change between versions 24 and 25: the op input
is two bits instead of one. Just pad on the left with zero (as will be
performed by default in Verilog), and there will be no change in function.
The new bit enables follow mode, where the rotation phase is the negative
of the previous operation. For such cycles, the input phase is ignored.
The hardware required to implement this new mode is small, about one logic
cell per stage, and even that should be stripped away by the synthesizer
if op[1] is hard wired to zero. The follow mode's computation can also be
performed by two successive passes through the CORDIC engine; using follow
mode saves a factor of two in latency, and reduces round-off error.
Incompatible change between versions 23 and 24: the phase of the
rectangular to polar conversion has been changed by pi. That means
that when op==1, the angle output is truly atan2(y,x), and the x (R)
output in that mode is now positive.
The other new feature of version 24, besides an additional test
bench mode, is the parameter def_op. Use cases with constant op
input can set this parameter to match, and might save some gates
and/or timing when synthesizing for Xilinx.
The test platform is Icarus Verilog and Xilinx XST, using Debian GNU/Linux
as the operating system. The code is generally standards-based and not
version-specific.
As shipped, results of the regression test fired off by "make" are:
python3 cordicgx.py 22 > cordicg_b22.v
iverilog -Wall -DSIMULATE -Wno-timescale -DDPW=22 -pnstg=20 -o cordicg_tb cordicg_tb.v cordicg_b22.v cstageg.v addsubg.v
vvp -N cordicg_tb +op=0 > cordic_ptor.dat
Check of x,y,theta->x,y
python3 cordic_check.py cordic_ptor.dat
test covers 15958 points, maximum amplitude is 90325 counts
peak error 1.25 bits, 0.0010 %
rms error 0.36 bits, 0.0003 %
PASS
vvp -N cordicg_tb +op=1 > cordic_rtop.dat
Check of x,y,theta->r,theta
python3 cordic_check.py cordic_rtop.dat
test covers 7979 points, maximum amplitude is 129001 counts
peak error 1.06 bits, 0.0008 %
rms error 0.36 bits, 0.0003 %
PASS
vvp -N cordicg_tb +rmix=1 > cordic_bias.dat
Check of downconversion bias
python3 cordic_check.py bias cordic_bias.dat
test covers 6102 points
averages 0.027 -0.007
PASS
vvp -N cordicg_tb +op=3 > cordic_slve.dat
Check of follow mode
python3 cordic_check.py cordic_slve.dat
test covers 11968 points, maximum amplitude is 129001 counts
peak error 3.27 bits, 0.0025 %
rms error 0.41 bits, 0.0003 %
PASS
Note that the theoretical lower limit for peak error is 0.5 bits, and for
rms error is 1/sqrt(12) = 0.29 bits. More information about the accuracy
behavior is given in a plot you can create with "make perf.png".
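
The two limits quoted above are those of ideal rounding to the nearest
integer, which can be checked numerically with a short Python sketch.
It assumes the error is spread uniformly over one count; the function
name and the sample count are arbitrary choices for this example and
have nothing to do with cordic_check.py.

```python
import math

def rounding_error_stats(n=100000):
    """Peak and rms error from rounding n evenly spaced values to integers."""
    peak = 0.0
    sum_sq = 0.0
    for k in range(n):
        v = 10.0 * (k + 0.5) / n     # evenly spread across 10 counts
        err = v - round(v)
        peak = max(peak, abs(err))
        sum_sq += err * err
    return peak, math.sqrt(sum_sq / n)

peak, rms = rounding_error_stats()
print("peak %.3f bits, rms %.3f bits" % (peak, rms))
```

The peak comes out just under 0.5 and the rms close to 0.289, so the rms
figures in the regression log above sit within about 0.1 bit of the ideal.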
Happy computing!
Larry Doolittle <[email protected]> March 10, 2020