Copyright (C) 2020-2022 University of Zurich
The documentation for ReckOn is under a Creative Commons Attribution 4.0 International License (see doc/LICENSE file or http://creativecommons.org/licenses/by/4.0/), while the ReckOn HDL source code is under a Solderpad Hardware License v2.1 (see LICENSE file or https://solderpad.org/licenses/SHL-2.1/).
Before reading the documentation, it is strongly advised to read our ISSCC 2022 paper in order to have a clear overview of the ReckOn online-learning spiking RNN processor.
Part of the documentation structure, formatting and contents is adapted from the documentation of the ODIN SNN processor.
Current documentation revision: v1.0. It only contains basic descriptions of the chip communication buses, memory addressing schemes and main configuration registers. Unpublished parts of the chip have been removed from the currently released HDL code, as well as from the documentation. A documentation and HDL update will take place upon publication of the omitted parts.
- Architecture
- Interfaces and commands
- Global configuration registers
- Testbench
- Implementation tips
- Citing ReckOn
- Revision history
ReckOn is a spiking recurrent neural network (RNN) processor enabling on-chip learning over second-long timescales based on a modified version of the e-prop algorithm (a PyTorch implementation of the vanilla e-prop algorithm for leaky integrate-and-fire neurons has also been released). It was prototyped and measured in 28-nm FDSOI CMOS at the Institute of Neuroinformatics, University of Zurich and ETH Zurich, and published at the 2022 IEEE International Solid-State Circuits Conference (ISSCC) with the following three main claims:
- ReckOn demonstrates end-to-end on-chip learning over second-long timescales while keeping a millisecond temporal resolution,
- it provides a low-cost solution with a 0.45-mm² core area, 5.3 pJ/SOP at 0.5 V, and a memory overhead of only 0.8% compared to the equivalent inference-only network,
- it exploits a spike-based representation for task-agnostic learning toward user customization and chip repurposing at the edge.
ReckOn implements a (256)-r256-16 network topology with 256 virtual input neurons, 256 recurrent leaky integrate-and-fire (LIF) neurons with all-to-all connectivity and 16 output leaky integrator (LI) neurons. A future revision of the documentation will extensively cover architectural details of ReckOn, including the embedded FSMs. For the time being, we briefly describe below how the main SRAM resources of ReckOn are accessed. We refer the reader to the paper for more information on the network architecture, as well as for block diagrams of the complete system and of the e-prop-based learning scheme.
- Neuron SRAM: This 2-kB SRAM contains 128 words of 128 bits. Each word contains the individual state data of two neurons (i.e. current membrane potential and eligibility trace values) and their shared parameters (i.e. leakage decay factor alpha and firing threshold) as follows:
Word bit range | Description (N represents the 7-bit word address) |
---|---|
<127:116> | 12 LSBs of the fractional part of the 16-bit leakage decay factor alpha (see Section 3 for the 4 MSBs). Shared parameter between neurons 2N and 2N+1. |
<115:100> | 16-bit firing threshold. Shared parameter between neurons 2N and 2N+1. |
<99:90> | 10-bit output eligibility trace associated to neuron 2N+1. |
<89:78> | 12-bit recurrent eligibility trace associated to neuron 2N+1. |
<77:66> | 12-bit input eligibility trace associated to neuron 2N+1. |
<65:50> | 16-bit membrane potential associated to neuron 2N+1. |
<49:40> | 10-bit output eligibility trace associated to neuron 2N. |
<39:28> | 12-bit recurrent eligibility trace associated to neuron 2N. |
<27:16> | 12-bit input eligibility trace associated to neuron 2N. |
<15:0> | 16-bit membrane potential associated to neuron 2N. |
- Input/recurrent weight SRAMs: These 64-kB SRAMs contain 4k words of 128 bits for the storage of 8-bit input/recurrent weights. The 8 MSBs of the 12-bit word address contain the pre-synaptic neuron index, the 4 LSBs of the 12-bit word address contain the 4 MSBs of the post-synaptic neuron index. The 4 LSBs of the post-synaptic neuron index represent the byte address of the target weight in the accessed 128-bit word.
- Output weight SRAM: This 8-kB SRAM contains 512 words of 128 bits for the storage of 8-bit output weights. Only the first 256 words are used: the MSB of the 9-bit word address is thus fixed to 0 and the 8 LSBs represent the pre-synaptic neuron index. Each accessed 128-bit word contains the output weights of all 16 output neurons.
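
As an illustration of the addressing schemes above, the following Python sketch splits a 128-bit neuron SRAM word into its documented bit fields and computes the word and byte addresses of an input/recurrent weight. The function and key names are hypothetical helpers for host-side scripts and are not part of the ReckOn HDL. Fields are returned as raw unsigned integers, since their numerical interpretation is not covered here.

```python
def unpack_neuron_word(word):
    """Split a 128-bit neuron SRAM word (given as an int) into its bit fields."""

    def bits(hi, lo):
        # Extract word<hi:lo> as an unsigned integer.
        return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

    def neuron(base):
        # Per-neuron state occupies 50 bits, starting at bit `base`.
        return {
            "membrane_potential": bits(base + 15, base),
            "input_trace": bits(base + 27, base + 16),
            "recurrent_trace": bits(base + 39, base + 28),
            "output_trace": bits(base + 49, base + 40),
        }

    return {
        "alpha_lsbs": bits(127, 116),   # shared by neurons 2N and 2N+1
        "threshold": bits(115, 100),    # shared by neurons 2N and 2N+1
        "neuron_2n": neuron(0),
        "neuron_2n_plus_1": neuron(50),
    }


def weight_location(pre, post):
    """Return (12-bit word address, byte address) of an input/recurrent weight.

    `pre` and `post` are the 8-bit pre- and post-synaptic neuron indices.
    """
    word_addr = ((pre & 0xFF) << 4) | ((post & 0xFF) >> 4)
    byte_addr = post & 0xF
    return word_addr, byte_addr


def output_weight_word_addr(pre):
    """9-bit output weight SRAM word address: MSB fixed to 0, 8 LSBs = pre index."""
    return pre & 0xFF
```

For instance, `weight_location(3, 0x25)` gives word address `0x32` and byte address `5`.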
The top-level file reckon.v contains three main interfaces: the SPI bus (Section 2.1), the input AER bus (Section 2.2) and an output bus (Section 2.3). Other I/O pins are described as follows:
Pin | Direction | Description |
---|---|---|
CLK_EXT | Input | External clock. |
CLK_INT_EN | Input | Enable signal for the internal clock generator. |
RST | Input | Global reset signal. |
SAMPLE | Input | Signals the start and the end of an input data sample. |
TIME_TICK | Input | Signals the start of a new timestep (rising edge). |
TARGET_VALID | Input | Signals timesteps for which a target is expected for e-prop updates. |
INFER_ACC | Input | Signals timesteps for which counts of the winning output neurons are to be updated (for classification tasks, over the course of timesteps during which INFER_ACC was enabled, the label of the output neuron with the highest output over most timesteps will represent the network inference). |
SPI_RDY | Output | [For debug/monitoring purposes] Signals when the global FSM enters the CONFIG state, during which the network state is frozen and can be safely read/written through SPI. |
TIMING_ERROR_RDY | Output | If SPI_TIMING_MODE=1, signals the occurrence of a timing error (TIME_TICK was asserted before the global FSM finished processing the current timestep). If SPI_TIMING_MODE=0, signals when the global FSM finished processing the current timestep and TIME_TICK can be safely asserted. |
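
The winner counting described for the INFER_ACC pin can be pictured with the behavioral Python sketch below. It is not derived from the HDL: the function name is hypothetical, and breaking ties toward the lowest neuron index is an assumption, as the tie-breaking rule is not specified in this documentation.

```python
def infer_label(outputs_per_timestep, infer_acc):
    """Behavioral sketch of winner counting over INFER_ACC-enabled timesteps.

    outputs_per_timestep: list of per-timestep lists of output neuron values.
    infer_acc: list of booleans, one per timestep (state of the INFER_ACC pin).
    """
    num_out = len(outputs_per_timestep[0])
    counts = [0] * num_out
    for outputs, acc in zip(outputs_per_timestep, infer_acc):
        if acc:
            # Winner of this timestep (assumption: lowest index wins ties).
            winner = max(range(num_out), key=lambda i: outputs[i])
            counts[winner] += 1
    # Inference: the neuron that won the most timesteps (same tie assumption).
    return max(range(num_out), key=lambda i: counts[i])
```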
Fig. 1 - 32-bit SPI timing diagram for (a) write and (b) read operations.
ReckOn implements a standard 32-bit SPI slave bus with the following interface:
Pin | Direction | Width | Description |
---|---|---|---|
SCK | Input | 1-bit | SPI clock generated by the SPI master. |
MOSI | Input | 1-bit | Master output, slave input. |
MISO | Output | 1-bit | Master input, slave output. As the fabricated chip was pad-limited, other signals can be displayed on the MISO pin (see Section 3). |
When using the SPI bus, the SPI_EN_CONF configuration register should be asserted first (Section 3). To ensure proper operation, the SCK SPI clock frequency should be at least 4x lower than the clock frequency of ReckOn. The SPI write and read operations follow the timing diagrams shown in Figs. 1(a) and 1(b), respectively: a 32-bit address field is first transmitted by the SPI master, before the data associated with this address is sent by the master (write) or received from ReckOn (read). Bit a<31> of the 32-bit address field a indicates whether a write (0) or read (1) operation should be performed. Depending on the remaining contents of a, the SPI bus can be used to access the configuration registers or the on-chip SRAM / register file contents as follows:
code<2:0> (a<30:28>) | addr<15:0> (a<15:0>) | Description |
---|---|---|
3'b000 | {addr_conf<15:0>} | Write to configuration register at address addr_conf (not readable). |
3'b001 | {n/a,addr_word<6:0>,addr_32b<1:0>} | Read/write to the neuron SRAM (128 128-bit words). 4-byte chunk addr_32b from word address addr_word of the SRAM is read/written. |
3'b010 | {n/a,addr_oneur<3:0>} | Read/write to the 16-bit membrane potential of output neuron addr_oneur (register-file-based storage). |
3'b011 | {n/a,addr_word<11:0>,addr_32b<1:0>} | Read/write to the input weight SRAM (4096 128-bit words). 4-byte chunk addr_32b from word address addr_word of the SRAM is read/written. |
3'b100 | {n/a,addr_word<11:0>,addr_32b<1:0>} | Read/write to the recurrent weight SRAM (4096 128-bit words). 4-byte chunk addr_32b from word address addr_word of the SRAM is read/written. |
3'b101 | {n/a,addr_word<8:0>,addr_32b<1:0>} | Read/write to the output weight SRAM (512 128-bit words, of which only the first 256 words are used). 4-byte chunk addr_32b from word address addr_word of the SRAM is read/written. |
In order to accelerate initialization/readback of the ReckOn SRAMs, grouped data read/write operations can be performed over SPI from a single address field. To do so, a<27:16> contains the number of SPI data transactions to be performed from a starting address of a<15:0>, which will then be incremented internally in the SPI module. If a<27:16> is given a value of 1, a standard SPI transaction with a single data field is performed.
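
The 32-bit address field described above can be assembled on the host side as in the following Python sketch. The constant and function names are hypothetical. Rejecting a transaction count of 0 is an assumption, since only the behavior for a count of 1 or more is described here.

```python
# Values of code<2:0> (a<30:28>), following the table above.
CODE_CONFIG, CODE_NEURON, CODE_OUT_NEURON, CODE_W_INP, CODE_W_REC, CODE_W_OUT = range(6)


def sram_addr(addr_word, addr_32b):
    """Build addr<15:0> for SRAM accesses: {addr_word, addr_32b<1:0>}."""
    if not 0 <= addr_32b <= 3:
        raise ValueError("addr_32b selects one of four 32-bit chunks")
    return (addr_word << 2) | addr_32b


def spi_address_field(read, code, addr, count=1):
    """Assemble the 32-bit SPI address field a<31:0>.

    read:  False for a write (a<31>=0), True for a read (a<31>=1).
    code:  a<30:28>, selects the target resource.
    count: a<27:16>, number of SPI data transactions (1 = single transaction).
    addr:  a<15:0>, starting address.
    """
    if not 0 <= code <= 5:
        raise ValueError("code must be in 0..5")
    if read and code == CODE_CONFIG:
        raise ValueError("configuration registers are not readable")
    if not 1 <= count < (1 << 12):
        raise ValueError("count must fit in a<27:16> (0 rejected by assumption)")
    if not 0 <= addr < (1 << 16):
        raise ValueError("addr must fit in 16 bits")
    return (int(read) << 31) | (code << 28) | (count << 16) | addr
```

For example, `spi_address_field(False, CODE_W_REC, sram_addr(0xFFF, 3), count=16)` evaluates to `0x40103FFF`.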
Fig. 2 - Input AER four-phase handshake timing diagram.
Address-event representation (AER) buses follow a four-phase-handshake protocol for asynchronous communication between neuromorphic chips. As ReckOn follows a synchronous digital IC design flow, a double-latching barrier is placed on the REQ line of the input AER bus in order to limit metastability issues.
The input AER bus has the following interface:
Pin | Direction | Width | Description |
---|---|---|---|
AERIN_ADDR | Input | 8-bit | AER address field. |
AERIN_TAR_EN | Input | 1-bit | Indicates whether data in AERIN_ADDR represents the address of a virtual input neuron (0) or target data for e-prop-based learning (1). |
AERIN_REQ | Input | 1-bit | AER request handshake line. |
AERIN_ACK | Output | 1-bit | AER acknowledge handshake line. |
The output bus interface has been simplified compared to the one in the fabricated chip in order to remove unpublished blocks. The output bus in the currently-released version of the HDL has a format similar to the AER interface described in Section 2.2 and its sole purpose is to transmit inference results.
Pin | Direction | Width | Description |
---|---|---|---|
OUT_DATA | Output | 8-bit | Data field. |
OUT_REQ | Output | 1-bit | Request handshake line. |
OUT_ACK | Input | 1-bit | Acknowledge handshake line. |
For classification setups (see Section 3 for the associated configuration registers), only one transaction takes place at the end of the sample and the 4 LSBs of OUT_DATA contain the index of the output neuron with the highest output averaged over the timesteps during which the INFER_ACC pin was asserted. For regression setups, a series of output transactions takes place at each timestep, where the 16-bit membrane potential values of active output neurons are successively transmitted (the 8 LSBs first, followed by the 8 MSBs).
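
On the receiving side, the OUT_DATA transactions can be decoded as in the Python sketch below. Function names are hypothetical, and membrane potentials are returned as raw 16-bit values without any assumption on their signedness.

```python
def winning_label(out_data):
    """Classification: the 4 LSBs of OUT_DATA hold the winning output neuron index."""
    return out_data & 0xF


def assemble_potentials(out_bytes):
    """Regression: rebuild raw 16-bit membrane potentials from successive
    8-bit OUT_DATA transactions (8 LSBs first, followed by the 8 MSBs)."""
    if len(out_bytes) % 2:
        raise ValueError("expected an even number of 8-bit transactions")
    return [
        (out_bytes[i] & 0xFF) | ((out_bytes[i + 1] & 0xFF) << 8)
        for i in range(0, len(out_bytes), 2)
    ]
```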
Configuration registers can be written through the SPI bus (no readback operation is available) and are defined as follows:
Register Name | Addr<15:0> | Width | Reset value | Description |
---|---|---|---|---|
SPI_EN_CONF | 0 | 1-bit | 1'b1 | Enables access to the network internal state through SPI and ensures the control FSM goes into a safe state to do so, which will be signalled through the SPI_RDY pin. |
SPI_RO_STAGE_SEL | 1 | 9-bit | / | Selects the stage of the ring-oscillator-based local clock generator (not used in the released HDL code as technology-specific blocks, incl. clock gen and frequency divider, were removed). |
SPI_GET_CLKINT_OUT | 2 | 1-bit | 1'b0 | Enables a frequency-divided copy of the locally generated clock to be displayed on the SPI MISO pin for monitoring purposes (not used in the released HDL code as technology-specific blocks, incl. clock gen and frequency divider, were removed). |
SPI_GET_TAR_REQ_OUT | 3 | 1-bit | 1'b1 | Enables the target request signal to be displayed on the MISO pin. |
SPI_RST_MODE | 8 | 1-bit | 1'b0 | Selects the spike reset mode of LIF neurons (1: reset to zero, 0: reset by subtraction). |
SPI_DO_EPROP | 9 | 3-bit | 3'b111 | Enables e-prop updates (bit 0: input weight updates, bit 1: recurrent weight updates, bit 2: output weight updates). Input/recurrent/output weights can be independently configured in any plastic/frozen configuration. |
SPI_LOCAL_TICK | 10 | 1-bit | 1'b0 | Enables local generation of timestep ticks (see SPI_CYCLES_PER_TICK for the timestep duration). If configured to 0, timestep ticks are provided externally through the TIME_TICK pin. |
SPI_ERROR_HALT | 11 | 1-bit | 1'b1 | Enables halting the network operation if a timing error takes place (i.e. a timestep tick occurred before the global FSM finished processing the current timestep) for debugging purposes. A network reset will be necessary. |
SPI_FP_LOC_WINP | 12 | 3-bit | 3'd0 | Input weight scaling parameter. The stored 8-bit input weights are sign-extended to 16 bits and left-shifted by the value of SPI_FP_LOC_WINP before being added to the neuron membrane potentials. |
SPI_FP_LOC_WREC | 13 | 3-bit | 3'd0 | Recurrent weight scaling parameter. The stored 8-bit recurrent weights are sign-extended to 16 bits and left-shifted by the value of SPI_FP_LOC_WREC before being added to the neuron membrane potentials. |
SPI_FP_LOC_WOUT | 14 | 3-bit | 3'd0 | Output weight scaling parameter. The stored 8-bit output weights are sign-extended to 16 bits and left-shifted by the value of SPI_FP_LOC_WOUT before being added to the neuron membrane potentials. |
SPI_FP_LOC_TINP | 15 | 3-bit | 3'd0 | Radix point location of input traces (left-shifted by the value of SPI_FP_LOC_TINP). |
SPI_FP_LOC_TREC | 16 | 3-bit | 3'd0 | Radix point location of recurrent traces (left-shifted by the value of SPI_FP_LOC_TREC). |
SPI_FP_LOC_TOUT | 17 | 3-bit | 3'd0 | Radix point location of output traces (left-shifted by the value of SPI_FP_LOC_TOUT). |
SPI_LEARN_SIG_SCALE | 18 | 4-bit | 4'd0 | Learning signal scaling parameter. Learning signals are left-shifted by the value of SPI_LEARN_SIG_SCALE. |
SPI_REGUL_MODE | 19 | 3-bit | 3'b000 | Selects the weight regularization mode (bit 0: multiplicative regularization, bit 1: additive regularization). If bit 2 is asserted, regularization is enabled during all timesteps, not only when the TARGET_VALID pin is asserted (for use only with additive regularization). |
SPI_REGUL_W | 20 | 2-bit | 2'b00 | Enables weight regularization (bit 0: input weights, bit 1: recurrent weights). Input/recurrent weights can be independently configured in any regularized/non-regularized configuration. |
SPI_EN_STOCH_ROUND | 21 | 1-bit | 1'b0 | Enables stochastic rounding in the eligibility traces and neuron membrane potentials. |
SPI_SRAM_SPEEDMODE | 22 | 8-bit | 8'h00 | Configuration of the SRAM macro speed modes (not used in the released HDL code as technology-specific blocks were removed). |
SPI_TIMING_MODE | 23 | 1-bit | 1'b0 | Controls the pin function of TIMING_ERROR_RDY (see pin description in Section 2). |
SPI_REGRESSION | 25 | 1-bit | 1'b0 | Should be programmed to 1 for regression tasks and 0 for classification tasks. |
SPI_SINGLE_LABEL | 26 | 1-bit | 1'b1 | Should be programmed to 1 for classification tasks in order to provide the classification label only once per sample, instead of at every timestep. |
SPI_NO_OUT_ACT | 27 | 1-bit | 1'b0 | Disables the hard-sigmoid non-linearity applied to the membrane potential of output neurons. |
SPI_SEND_PER_TIMESTEP | 30 | 1-bit | 1'b0 | Enables sending the network output (format conditioned by SPI_SEND_LABEL_ONLY) at every timestep instead of once at the end of the sample. Typically for use in regression tasks. |
SPI_SEND_LABEL_ONLY | 31 | 1-bit | 1'b1 | Configures the network output contents sent over the output bus (1: winning neuron label, 0: membrane potential values of all enabled output neurons). Typically configured to 1 for classification tasks and 0 for regression tasks. |
SPI_NOISE_EN | 32 | 1-bit | 1'b0 | Enables the addition of random noise to membrane potential updates of LIF neurons (noise magnitude configured with SPI_NOISE_STR). |
SPI_FORCE_TRACES | 33 | 1-bit | 1'b0 | Forces eligibility trace computation even if e-prop updates are disabled (for monitoring purposes). |
SPI_CYCLES_PER_TICK | 64 | 32-bit | / | Number of clock cycles per locally generated timestep tick (used only if SPI_LOCAL_TICK is enabled). |
SPI_ALPHA_CONF | 65-68 | 128-bit | 128'h0 | Each bit of SPI_ALPHA_CONF selects the 4 MSBs of the 16-bit leakage decay factor alpha associated with a pair of two LIF neurons. Alpha consists of a single-bit integer part and a 15-bit fractional part (1: the integer part bit is 1 and the three MSBs of the fractional part are 3'b000, 0: the integer part bit is 0 and the three MSBs of the fractional part are 3'b111). The 12 LSBs of alpha for every pair of two LIF neurons are defined in the neuron memory (Section 1). |
SPI_KAPPA | 69 | 8-bit | 8'h7A | Defines the value of the 8-bit leakage factor kappa shared among all output LI neurons, which consists of a single-bit integer part and a 7-bit fractional part. |
SPI_THR_H_0 | 70 | 16-bit | / | Defines the membrane potential threshold separating the first and second segments of the straight-through-estimator (STE) function. |
SPI_THR_H_1 | 71 | 16-bit | / | Defines the membrane potential threshold separating the second and third segments of the straight-through-estimator (STE) function. |
SPI_THR_H_2 | 72 | 16-bit | / | Defines the membrane potential threshold separating the third and fourth segments of the straight-through-estimator (STE) function. |
SPI_THR_H_3 | 73 | 16-bit | / | Defines the membrane potential threshold separating the fourth and fifth segments of the straight-through-estimator (STE) function. |
SPI_H_0 | 74 | 5-bit | / | Defines the value of the first segment of the straight-through-estimator (STE) function. |
SPI_H_1 | 75 | 5-bit | / | Defines the value of the second segment of the straight-through-estimator (STE) function. |
SPI_H_2 | 76 | 5-bit | / | Defines the value of the third segment of the straight-through-estimator (STE) function. |
SPI_H_3 | 77 | 5-bit | / | Defines the value of the fourth segment of the straight-through-estimator (STE) function. |
SPI_H_4 | 78 | 5-bit | / | Defines the value of the fifth segment of the straight-through-estimator (STE) function. |
SPI_LR_R_WINP | 79 | 5-bit | / | Input weight update probability scaling parameter (applies a right shift by the value of SPI_LR_R_WINP). |
SPI_LR_P_WINP | 80 | 5-bit | / | Input weight update probability scaling parameter (applies a left shift by the value of SPI_LR_P_WINP). |
SPI_LR_R_WREC | 81 | 5-bit | / | Recurrent weight update probability scaling parameter (applies a right shift by the value of SPI_LR_R_WREC). |
SPI_LR_P_WREC | 82 | 5-bit | / | Recurrent weight update probability scaling parameter (applies a left shift by the value of SPI_LR_P_WREC). |
SPI_LR_R_WOUT | 83 | 5-bit | / | Output weight update probability scaling parameter (applies a right shift by the value of SPI_LR_R_WOUT). |
SPI_LR_P_WOUT | 84 | 5-bit | / | Output weight update probability scaling parameter (applies a left shift by the value of SPI_LR_P_WOUT). |
SPI_SEED_INP | 85 | 25-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic input weight updates. |
SPI_SEED_REC | 86 | 25-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic recurrent weight updates. |
SPI_SEED_OUT | 87 | 22-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic output weight updates. |
SPI_SEED_STRND_NEUR | 88 | 30-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic rounding of the LIF neuron membrane potentials. |
SPI_SEED_STRND_ONEUR | 89 | 15-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic rounding of the output neuron membrane potentials. |
SPI_SEED_STRND_TINP | 90 | 30-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic rounding of the input eligibility traces. |
SPI_SEED_STRND_TREC | 91 | 30-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic rounding of the recurrent eligibility traces. |
SPI_SEED_STRND_TOUT | 92 | 30-bit | / | Seed of the unfolded LFSR generating random numbers for stochastic rounding of the output eligibility traces. |
SPI_SEED_NOISE_NEUR | 93 | 17-bit | / | Seed of the unfolded LFSR generating random numbers for the configurable amount of noise added to the LIF neuron membrane potentials. |
SPI_NUM_INP_NEUR | 94 | 8-bit | 8'hFF | Number of input neurons enabled in the network (should be configured to the target number of neurons -1). |
SPI_NUM_REC_NEUR | 95 | 8-bit | 8'hFF | Number of recurrent neurons enabled in the network (should be configured to the target number of neurons -1). |
SPI_NUM_OUT_NEUR | 96 | 4-bit | 4'hF | Number of output neurons enabled in the network (should be configured to the target number of neurons -1). |
SPI_REGUL_F0 | 98 | 12-bit | / | Value of the post-synaptic recurrent eligibility traces above which regularization on the pre-synaptic weights is turned on. |
SPI_REGUL_K_INP_R | 99 | 5-bit | / | Input weight additive regularization scaling parameter (applies a right shift by the value of SPI_REGUL_K_INP_R). |
SPI_REGUL_K_INP_P | 100 | 5-bit | / | Input weight additive regularization scaling parameter (applies a left shift by the value of SPI_REGUL_K_INP_P). |
SPI_REGUL_K_REC_R | 101 | 5-bit | / | Recurrent weight additive regularization scaling parameter (applies a right shift by the value of SPI_REGUL_K_REC_R). |
SPI_REGUL_K_REC_P | 102 | 5-bit | / | Recurrent weight additive regularization scaling parameter (applies a left shift by the value of SPI_REGUL_K_REC_P). |
SPI_REGUL_K_MUL | 103 | 5-bit | / | Input and recurrent weight multiplicative regularization scaling parameter (applies a right shift by the value of SPI_REGUL_K_MUL). |
SPI_NOISE_STR | 104 | 4-bit | / | Neuron noise scaling parameter. Noise is generated as pseudo-random 16-bit words to be added to the LIF neuron membrane potentials, right-shifted by the value of SPI_NOISE_STR. |
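
The fixed-point formats used by some of the registers above can be made concrete with the following Python sketch. It decodes alpha and kappa into real values and applies the documented weight scaling. Function names are hypothetical, and interpreting the stored 8-bit weights as two's complement is an assumption implied by the sign extension.

```python
def alpha_value(alpha_conf_bit, alpha_lsbs):
    """Decode the 16-bit leakage factor alpha (1 integer bit, 15 fractional bits).

    alpha_conf_bit: the SPI_ALPHA_CONF bit of the neuron pair (selects the 4 MSBs).
    alpha_lsbs:     the 12 LSBs stored in the neuron SRAM word.
    """
    msbs = 0b1000 if alpha_conf_bit else 0b0111
    raw = (msbs << 12) | (alpha_lsbs & 0xFFF)
    return raw / (1 << 15)


def kappa_value(raw):
    """Decode the 8-bit leakage factor kappa (1 integer bit, 7 fractional bits)."""
    return (raw & 0xFF) / (1 << 7)


def scaled_weight(stored, shift):
    """Sign-extend a stored 8-bit weight and left-shift it by an SPI_FP_LOC_W* value.

    Assumption: stored weights use two's complement representation.
    """
    signed = stored - 256 if stored & 0x80 else stored
    return signed << shift
```

With these definitions, the reset value of SPI_KAPPA (8'h7A) corresponds to `kappa_value(0x7A) == 0.953125`.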
A simple testbench file demonstrating online learning is provided (see the testbench.sv file; you may want to update the paths listed in lines 33-37). The data folder contains two single 50-sample batches (one for training, one for test) of the delayed-supervision navigation task, whose size and complexity are suitable for RTL simulations (see paper for details). The first line of these dataset files indicates the number of samples, while the subsequent lines represent events: the first tuple element contains the input neuron index, the second one the time in milliseconds. Special events are used to represent the end of the sample (the first element has index -1) and the start of the delayed supervision (the first element has index -2, the second represents the target label). Random input, recurrent and output weight initialization files are provided as well.
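
The event convention of the dataset files can be illustrated with the Python sketch below, which groups already-parsed event tuples into samples. The function name and the returned dictionary layout are hypothetical. The on-disk syntax of each line is not assumed here, so reading the file and skipping its first line (the number of samples) is left to the caller.

```python
def split_samples(events):
    """Group (first, second) event tuples into samples.

    first == -1: end of the current sample.
    first == -2: start of the delayed supervision, `second` is the target label.
    otherwise:   input spike, (input neuron index, time in milliseconds).
    Events after the last end-of-sample marker are discarded.
    """
    samples = []
    current = {"spikes": [], "target": None}
    for first, second in events:
        if first == -1:
            samples.append(current)
            current = {"spikes": [], "target": None}
        elif first == -2:
            current["target"] = second
        else:
            current["spikes"].append((first, second))
    return samples
```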
When implementing ReckOn or adapting it for a particular application, care should be taken with the following points:
- The provided Verilog HDL code can directly be used for behavioral simulation. For implementation with a specific technology node, the behavioral descriptions of the neuron and weight SRAMs in srnn.v need to be replaced with SRAM macros. Block RAM (BRAM) instances can be used for FPGA implementations.
- The different Verilog modules take N and M as parameters, where N represents the number of neurons in the input and recurrent layers and M is log2(N). These parameters help rescale ReckOn to different network dimensions. However, the datapath is not entirely generic in N and M, and further adaptations are needed if the dimensions are changed.
When using the documentation or source code, please cite the paper associated with ReckOn:
[C. Frenkel and G. Indiveri, "ReckOn: A 28nm sub-mm² task-agnostic spiking recurrent neural network processor enabling on-chip learning over second-long timescales," IEEE International Solid-State Circuits Conference (ISSCC), 2022]
Rev | Date | Author | Description |
---|---|---|---|
1.0 | 18 Feb. 2022 | C. Frenkel | Basic doc without unpublished parts |