The 38th Annual IEEE/ACM International Symposium on Microarchitecture, 2005
Exploiting parallelism with the Cell heterogeneous architecture
Abstract

We introduce the Cell Broadband Engine Architecture, a next-generation system architecture designed to address next-generation architecture challenges. Cell addresses the most pressing issues in modern architecture -- the memory wall, the power wall, and the frequency wall -- by shifting the compute paradigm from a uniprocessor design to an efficient chip multiprocessor. The Cell Broadband Engine implements a heterogeneous chip multiprocessor to deliver a quantum leap in performance by exploiting application parallelism at all levels, i.e., thread-level parallelism, instruction-level parallelism, and data-level parallelism. By synergistically exploiting the trade-offs possible in a heterogeneous multiprocessor, Cell delivers unsurpassed performance with 9 processor cores and 10 threads on a single chip. Cell addresses the compute density challenge of increasing performance per unit of area and power, and delivers high performance for compute-intensive codes by exploiting chip multiprocessing to address inherent power and memory latency issues.

The first-generation Cell Broadband Engine provides an unparalleled level of compute performance on a single chip with a collection of heterogeneous processors: a Power architecture core with two levels of cache, and eight attached Synergistic Processing Elements (SPEs), each with its own local memory and a globally consistent Synergistic Memory Flow Controller, integrated into the system across a high-performance Element Interconnect Bus. To maximize performance, the SPEs implement a new streamlined architecture built around pervasively data-parallel computing. We will describe the Cell architecture and provide an in-depth discussion of the various design challenges and trade-offs. In addition to the Cell system architecture, we will also describe the novel pervasively data-parallel computing architecture embodied in the Synergistic Processor Architecture, its motivation, and its integration in the Cell system.
We will describe the compilation challenges involved in generating high performance code from this novel architecture. This will include issues such as how to generate good scalar code in a SIMD engine where significant hardware function has been offloaded to software, how to exploit the multiple levels of parallelism, and how the compiler can provide a single system abstraction of the underlying heterogeneous processors with attached local memories. The discussion will be in the context of a functioning prototype compiler for the Cell Broadband Engine Architecture.
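As an illustration of the scalar-code issue, consider a SIMD engine that can only load aligned 16-byte quadwords, so that a scalar load must be composed in software from a quadword load followed by a rotate. The sketch below models this in portable C. It is not taken from the prototype compiler: the type `quad_t` and the helper names are hypothetical, and placing the scalar in bytes 0..3 of the register with big-endian byte order is an assumption made for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 16-byte SIMD register (one memory quadword). */
typedef struct { uint8_t b[16]; } quad_t;

/* Quadword-only load: the low four address bits are ignored,
   so the access always returns the enclosing aligned 16 bytes. */
static quad_t load_quad(const uint8_t *mem, uint32_t addr) {
    quad_t q;
    memcpy(q.b, mem + (addr & ~15u), 16);
    return q;
}

/* Rotate the register left by n bytes (bytes wrap around). */
static quad_t rotate_left_bytes(quad_t q, unsigned n) {
    quad_t r;
    for (unsigned i = 0; i < 16; i++) {
        r.b[i] = q.b[(i + n) & 15u];
    }
    return r;
}

/* Software scalar load of a 32-bit word:
   1. load the aligned quadword that contains the word,
   2. rotate so the word lands in bytes 0..3 (assumed scalar slot),
   3. read those four bytes in big-endian order.
   Assumes addr is 4-byte aligned, so the word never straddles
   two quadwords. */
static uint32_t load_scalar_u32(const uint8_t *mem, uint32_t addr) {
    quad_t q = rotate_left_bytes(load_quad(mem, addr), addr & 15u);
    return ((uint32_t)q.b[0] << 24) | ((uint32_t)q.b[1] << 16) |
           ((uint32_t)q.b[2] << 8)  |  (uint32_t)q.b[3];
}
```

With a 32-byte buffer in which byte `i` holds the value `i`, `load_scalar_u32(mem, 20)` fetches the quadword at offset 16, rotates it by 4 bytes, and yields `0x14151617`. A compiler that can prove the alignment of the address at compile time can replace the run-time rotate amount with a constant, or drop the rotate altogether, which is one way scalar code on such an engine can be made cheaper.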