
CHAPTER 4. LAB TASK 2 - DESIGN A JPEG ACCELERATOR
...
[Figure 4.1, block diagram (only the labels survive): DCT2 Control Unit with csr and counters; WB Ctrl (wb.adr, wb.stb, wb.ack, wb.dat_i, wb.dat_o); Block RAM in; DCT; Transpose Memory (t_wr, t_rd); Q2; Block RAM out. Bus-width labels: 32, 64, 8x12=96, 8x16=128; some ports are marked NC.]
Figure 4.1: Proposed architecture for the 2-D DCT-accelerator. csr is a Control/Status
register. Not all wires are shown.
2. A row of the image is read from the in RAM in 2 clock cycles. This is repeated
8 times.
3. The rows are transformed in DCT, giving 12-bit signed numbers, and written to the
transpose memory at the same rate.
4. When all rows have been written to T, columns, 8 × 12 bits, can be read from
T and fed into the DCT again. A complete column can be read per clock cycle.
After the second DCT the values are 16-bit signed numbers.
5. Finally 2 × 16 bits per clock cycle are quantized in Q2 and written to the out
RAM. When all columns have been written, the RDY bit in csr is set.
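The data flow in steps 2 to 5 can be sketched as a small behavioural model. This is only an illustrative floating-point reference, not the supplied hardware: the function names (dct8_scaled, transpose, dct2d_row_column, fits_signed) are made up here, the 1-D transform is assumed to be the orthonormal 8-point DCT-II multiplied by √8, the pixels are assumed to be 8-bit signed values (a 64-bit row read as 2 × 32 bits), and the quantization in Q2 is left out.

```python
import math

N = 8  # 8x8 blocks, as in the steps above


def dct8_scaled(x):
    """Reference 8-point DCT-II multiplied by sqrt(8) (assumed definition)."""
    out = []
    for k in range(N):
        # sqrt(8) * c(k) / 2 equals 1 for k = 0 and sqrt(2) otherwise
        scale = 1.0 if k == 0 else math.sqrt(2.0)
        acc = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                  for n in range(N))
        out.append(scale * acc)
    return out


def transpose(block):
    """Model of the transpose memory T: written by rows, read by columns."""
    return [list(col) for col in zip(*block)]


def dct2d_row_column(block):
    """Row pass, transpose, column pass; the result is indexed [u][v]."""
    rows = [dct8_scaled(r) for r in block]      # steps 2 and 3
    columns = transpose(rows)                   # columns read from T
    second = [dct8_scaled(c) for c in columns]  # step 4
    return transpose(second)


def fits_signed(value, bits):
    """True if round(value) fits in a two's complement word of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return lo <= round(value) <= hi
```

With this scaling the DC term of a row is simply the sum of its eight samples, so a block where every pixel is -128 gives -1024 after the first pass and -8192 after the second. Both values fit in the 12-bit and 16-bit words mentioned in steps 3 and 4.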
You will receive a Verilog module, dct.v, that computes a 1-D DCT multiplied
by √8. This file is a straightforward implementation in Verilog of the computation
schemes (modified Loeffler) in Figures 3.7 and 3.8 in Chapter 3.
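For orientation, the effect of the √8 factor can be written out explicitly. This assumes that the 1-D DCT in Chapter 3 is the orthonormal 8-point DCT-II; the symbols x(n), X(k), y(k) and c(k) are notation chosen here, not taken from dct.v.

```latex
X(k) = \frac{c(k)}{2} \sum_{n=0}^{7} x(n) \cos\frac{(2n+1)k\pi}{16},
\qquad c(0) = \frac{1}{\sqrt{2}}, \quad c(k) = 1 \ \text{for } k = 1, \dots, 7

y(k) = \sqrt{8}\, X(k) = \sqrt{2}\, c(k) \sum_{n=0}^{7} x(n) \cos\frac{(2n+1)k\pi}{16}
```

Under this assumption the DC output is just y(0) = x(0) + ... + x(7), and two passes (rows, then columns) scale the orthonormal 2-D DCT by a factor of 8.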
Preparation task 6
Open the file dct.v and have a look at what it does. What are the inputs and what are
the outputs? How many clock cycles does a computation take? Is it pipelined?
4.2.1 Block RAMs in VirtexII
The VirtexII-4000 FPGA contains 120 block RAMs of 18 kbit each. They are dual-ported, with
two completely independent sets of synchronous read and write ports. The easiest way