MJ Logic Design
Support Processor Sub-System
This block performed chip-level configuration/supervision as well as “slow-
path” packet processing in a next-generation storage processor ASIC. The
block consisted of a BVCI-based infrastructure necessary to connect an ARC
processor core with various on-chip blocks (e.g., PCIe interface, buffer
manager, DMA) and internal/external memories (DDR SRAM, DDR
SDRAM, SRAM). Some of the more extensive BVCI-based interface and
peripheral blocks designed were as follows:
- BVCI Initiator & Target Blocks: The various BVCI initiator and target
blocks were simplified via the use of parameterized BVCI bus initiator
and bus target sub-blocks. These sub-blocks were used throughout
the sub-system to insulate the “user logic” from the BVCI protocol. For
instance, the standard bus initiator sub-block allowed multiple
outstanding burst reads/writes, and then reordered the subsequent
responses as needed, so that the “user logic” only received in-order
responses.
- BVCI Arbiter: The 8x8 BVCI arbiter allowed 8 initiators to share
access to 8 targets. Specific block features included: parameterized
datapath width, programmable per-target address windows
implemented with strict priority to allow window overlap, and per-target
round-robin arbitration that occurred on burst boundaries and
provided full bandwidth simultaneously to all 8 targets. This arbiter
was instantiated twice within the sub-system.
- Frame Loader: This BVCI initiator block accepted variable-length
frames arriving on 4 different channels, and wrote them into memory
according to buffer descriptors arranged in a ring. As this block
consumed a series of buffers to store a frame, it updated descriptor
fields to indicate the presence of the start-of-frame (SOF), end-of-
frame (EOF), and in the case of the EOF descriptor, how much of the
buffer was consumed. Software was interrupted on a per-channel
basis each time a complete frame was available in memory.
- Frame Unloader: This block provided basic multi-channel, descriptor-
based DMA capability between various memories and the cell-based
portions of the client’s ASIC. Byte-level packing and knowledge of
frame header/payload boundaries allowed full frame construction
across multiple descriptors. Frames were then parsed into cells and
passed through per-channel output FIFOs.
- DMA Engine: This 4-channel, descriptor-based, scatter/gather,
general-purpose DMA engine utilized two BVCI initiators, one for
source reads, and another for destination writes, which enabled full
BVCI bus speed transfers from one memory to another. A per-channel
transfer buffer made efficient use of the BVCI bus bandwidth by letting
one BVCI burst accumulate while another was forwarded.
- PCIe Initiator/Target: These blocks provided a bridge between BVCI
and PCIe. The Initiator block provided multiple programmable
aperture windows that were used to convert internal BVCI burst
commands into posted/non-posted PCIe request TLPs, and also
converted subsequent non-posted completion TLPs back into a
corresponding BVCI response. The Target block converted PCIe
request TLPs into BVCI bursts and converted the subsequent BVCI
response for non-posted commands into a corresponding PCIe
completion TLP.
- Main Memory Controller: This BVCI target block translated each
variable-sized BVCI access into one fixed-sized access to the ASIC’s
central DDR Memory Controller, applying the appropriate byte masks
for writes and discarding the unneeded read data for reads. To overcome
the DDR Controller latency, this block maintained enough context and
response completion memory to allow up to 8 outstanding reads and
writes. One of the interesting challenges in this block was handling
wrap-around addressing for BVCI bursts, since the DDR Controller
supported only standard linear addressing.
- DDR SRAM Interface: This block utilized inter-clock FIFOs in both the
command and response directions to interface the 8-byte BVCI bus
with an effective 4-byte DDR SRAM that operated in a different clock
domain.
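
The in-order response delivery described for the bus initiator sub-block can be illustrated with a small behavioral model. This is only a sketch in Python, not the RTL that was built: the class name, the sequential tag scheme, and the method names are all assumptions made for illustration.

```python
class ReorderBuffer:
    """Behavioral sketch: responses may arrive in any order, but the
    user logic only ever sees them in issue order.
    (Hypothetical model; tags are assumed to be sequential integers.)"""

    def __init__(self):
        self.next_issue = 0      # tag given to the next request
        self.next_deliver = 0    # tag the user logic is waiting for
        self.pending = {}        # tag -> response data held back

    def issue(self):
        """Allocate a tag for a new outstanding request."""
        tag = self.next_issue
        self.next_issue += 1
        return tag

    def respond(self, tag, data):
        """Accept a response and return every response that can now be
        released in order (possibly none)."""
        self.pending[tag] = data
        released = []
        while self.next_deliver in self.pending:
            released.append(self.pending.pop(self.next_deliver))
            self.next_deliver += 1
        return released
```

A response for a later request is simply held until all earlier responses have arrived, at which point the whole run is released together.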
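
The two arbiter mechanisms named above, strict-priority address windows and per-target round-robin, can be sketched as follows. The window tuple layout, the function and class names, and the choice of which initiator goes first are assumptions for illustration; burst-boundary tracking and the datapath are left out.

```python
def decode_target(addr, windows):
    """Return the target of the first window containing addr.
    `windows` is a list of (base, size, target) in priority order, so an
    earlier window wins wherever two windows overlap. (Assumed layout.)"""
    for base, size, target in windows:
        if base <= addr < base + size:
            return target
    return None  # no window matched


class RoundRobin:
    """Round-robin grant among n initiators for one target."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # assumption: initiator 0 is first in line

    def grant(self, requests):
        """`requests` is a set of requesting initiator indices. Returns the
        granted index, or None if nobody is requesting. In hardware this
        would be evaluated only on burst boundaries."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None
```

Giving each target its own `RoundRobin` instance is what lets all targets be granted in the same cycle, provided their requesters are distinct.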
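
The Frame Loader's descriptor updates can be modeled in a few lines. The field names (`sof`, `eof`, `used`), the fixed buffer size, and the function name are hypothetical; descriptor ownership, ring-full handling, and the per-channel interrupt are omitted from this sketch.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    sof: bool = False   # buffer holds the start of a frame
    eof: bool = False   # buffer holds the end of a frame
    used: int = 0       # bytes consumed; meaningful only when eof is set


def load_frame(ring, head, frame_len, buf_size):
    """Consume descriptors from ring[head] onward for one frame of
    frame_len bytes, marking SOF/EOF as it goes. Returns the new head.
    (Sketch only: assumes every buffer is buf_size bytes and free.)"""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    remaining = frame_len
    first = True
    while remaining > 0:
        desc = ring[head]
        chunk = min(remaining, buf_size)
        desc.sof = first
        desc.eof = remaining <= buf_size
        desc.used = chunk if desc.eof else 0
        remaining -= chunk
        first = False
        head = (head + 1) % len(ring)  # wrap around the ring
    return head
```

A 150-byte frame with 64-byte buffers would consume three descriptors, with only the last one marked EOF and reporting 22 bytes used.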
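
One plausible way to map a wrapping burst onto a linear-only memory controller is to split it into at most two linear segments. The sketch below shows that idea only and is not necessarily how the Main Memory Controller block handled it. It assumes the burst wraps within a naturally aligned window equal to its total size, and the function name is hypothetical.

```python
def split_wrap_burst(start, total_bytes):
    """Split a wrapping burst into linear (address, length) segments.
    Assumes total_bytes is a power of two and the burst wraps inside the
    naturally aligned window of that size."""
    if total_bytes <= 0 or total_bytes & (total_bytes - 1):
        raise ValueError("total_bytes must be a positive power of two")
    base = start & ~(total_bytes - 1)        # start of the wrap window
    first_len = base + total_bytes - start   # bytes before the wrap point
    segments = [(start, first_len)]
    if first_len < total_bytes:
        # The remainder restarts at the bottom of the window.
        segments.append((base, total_bytes - first_len))
    return segments
```

A burst that starts on the window boundary never wraps and yields a single segment, so the linear case falls out of the same code path.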


