Abstract
The FFT computation involves global data transfer, so implementing large FFTs on mesh arrays with small local storage can be inefficient. Several factors must be considered when mapping an FFT computation onto a mesh array: the amount of available local storage, the I/O bandwidth, and the degree to which I/O, arithmetic/logic operations within the processing elements (PEs), and interprocessor communication can execute concurrently. The mapping is further complicated by hardware features such as multi-function pipelines and multi-port memories in the PEs. Several mappings of the FFT computation, obtained by modifying the FFT signal flow graph, are evaluated with respect to I/O time, computation time, and communication time on a p × p Systolic/Cellular Array Processor developed at Hughes Research Labs.
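To illustrate why the FFT's data transfer is global on a mesh, the following minimal sketch computes how many mesh hops separate the two operands of a butterfly at each stage of a radix-2 FFT. It rests on assumptions that are not taken from the paper: N = p*p points, one point per PE, stored in row-major order, with p a power of two. The function name butterfly_distances is hypothetical, and the sketch does not reproduce any of the mappings evaluated here.

```python
def butterfly_distances(p):
    """Mesh hop distance between butterfly partners at each radix-2 FFT stage.

    Assumptions (illustrative only, not from the paper): N = p*p points,
    one point per PE, row-major placement on a p x p mesh, p a power of two.
    """
    n = p * p
    stages = n.bit_length() - 1  # log2(n) for a power-of-two n
    distances = []
    for s in range(stages):
        worst = 0
        for i in range(n):
            j = i ^ (1 << s)          # butterfly partner at stage s
            ri, ci = divmod(i, p)     # (row, column) of point i
            rj, cj = divmod(j, p)     # (row, column) of its partner
            worst = max(worst, abs(ri - rj) + abs(ci - cj))
        distances.append(worst)
    return distances


if __name__ == "__main__":
    for p in (4, 8):
        d = butterfly_distances(p)
        print(f"p={p}: per-stage hops={d}, total={sum(d)}")
```

Under these assumptions the partner distance doubles from stage to stage within the column bits and again within the row bits, so some stages require transfers spanning half the array width. This is the kind of communication cost that alternative mappings of the signal flow graph aim to reduce.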