In this point-counterpoint discussion, Trevor Mudge argues for combining near-threshold voltage processors with techniques such as boosting to address the needs of datacenter workloads. Urs Hölzle offers a cautionary note on the wisdom of giving up too much single-threaded performance to achieve energy efficiency in large Internet service applications.
Programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-coded assembly. Researchers have recently proposed high-level programming languages for packet processing, but the challenges of compiling these languages into code that is competitive with hand-tuned assembly remain unanswered. This paper describes the Shangri-La compiler, which accepts a packet program written in a C-like high-level language and applies scalar and specialized optimizations to generate a highly optimized binary. Hot code paths identified by profiling are mapped across processing elements to maximize processor utilization. Since our compilation target has no hardware caches, software-controlled caches are generated for frequently accessed application data structures. Packet handling optimizations significantly reduce per-packet memory access and instruction counts. Finally, a custom stack model maps stack frames to the fastest levels of the target processor's heterogeneous memory hierarchy. Binaries generated by the compiler were evaluated on the Intel IXP2400 network processor with eight packet processing cores and eight threads per core. Our results show the importance of both traditional and specialized optimization techniques for achieving the maximum forwarding rates on three network applications, L3-Switch, MPLS and Firewall.