Intel's Xeon roadmap includes package-integrated FPGAs in every new generation. In this talk, we will dissect why this is such a powerful combination at this time of great change in datacenter workloads. We will s...
ISBN: 9781450318877 (print)
The proceedings contain 29 papers. The topics discussed include: accelerating subsequence similarity search based on dynamic time warping distance with FPGA; video-rate stereo matching using Markov random field TRW-S inference on a hybrid CPU+FPGA computing platform; fully-functional FPGA prototype with fine-grain programmable body biasing; sensing nanosecond-scale voltage attacks and natural transients in FPGAs; word-length optimization beyond straight-line code; embedding-based placement of processing element networks on FPGAs for physical model simulation; a remote memory access infrastructure for global address space programming models in FPGAs; architecture support for custom instructions with memory operations; high-throughput and programmable online traffic classifier on FPGA; and indirect-connection-aware attraction for FPGA clustering.
ISBN: 9781450345354 (print)
Multi-FPGA platforms are a popular choice today for complex system prototyping because they offer high execution speed, low cost, and real-world testing experience. However, the performance of multi-FPGA systems is severely affected by the widening logic-to-I/O gap in FPGAs. To address this performance issue, we propose an exploration and optimization flow for multi-FPGA prototyping. The flow provides an end-to-end experience, from benchmark generation to optimized inter-FPGA routing. Using the generic tools of the flow, we generate ten large benchmarks. Then, using a novel generic inter-FPGA routing environment, we explore how varying the number of FPGAs and the number of inter-FPGA tracks affects the performance of a target design. For performance exploration and optimization, we use five different FPGA boards, with the number of FPGAs per board ranging from two to six. For each board, we also use four different inter-FPGA track combinations. Experimental results reveal that multi-FPGA boards whose inter-FPGA tracks optimally match the cut-net requirements of the benchmarks under consideration give the best frequency results. Furthermore, a frequency comparison across boards shows that the board with six FPGAs gives, on average, the best frequency results. Finally, a frequency-price analysis shows that the board with four FPGAs offers a better frequency-price tradeoff than the other boards under consideration.
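The idea of matching inter-FPGA tracks to cut-net requirements can be illustrated with a toy model. This is a hypothetical sketch, not the authors' flow. It assumes the following:

- nets are sets of cell names;
- a placement maps each cell to an FPGA index;
- every FPGA pair that a cut net spans consumes one direct inter-FPGA track.

Real multi-FPGA routers may also hop through intermediate FPGAs or multiplex signals. All function names here are illustrative.

```python
from collections import Counter
from itertools import combinations


def fpgas_spanned(net, placement):
    """Set of FPGA indices touched by a net (a collection of cell names)."""
    return {placement[cell] for cell in net}


def cut_nets(nets, placement):
    """Nets whose cells are placed on more than one FPGA."""
    return [net for net in nets if len(fpgas_spanned(net, placement)) > 1]


def pairwise_demand(nets, placement):
    """For each FPGA pair, count the cut nets that touch both FPGAs.

    Simplifying assumption: each spanned pair needs one direct track.
    """
    demand = Counter()
    for net in nets:
        for a, b in combinations(sorted(fpgas_spanned(net, placement)), 2):
            demand[(a, b)] += 1
    return demand


def unmet_demand(demand, tracks):
    """FPGA pairs whose demand exceeds the available direct tracks."""
    return {pair: need - tracks.get(pair, 0)
            for pair, need in demand.items()
            if need > tracks.get(pair, 0)}


if __name__ == "__main__":
    # Hypothetical toy design: cells a..d placed on three FPGAs.
    nets = [{"a", "b"}, {"b", "c"}, {"a", "c", "d"}]
    placement = {"a": 0, "b": 0, "c": 1, "d": 2}
    tracks = {(0, 1): 1, (0, 2): 1, (1, 2): 1}
    demand = pairwise_demand(nets, placement)
    print("cut nets:", len(cut_nets(nets, placement)))
    print("demand:", dict(demand))
    print("unmet:", unmet_demand(demand, tracks))
```

In this toy example, the FPGA pair (0, 1) needs one more track than it has. In the abstract's terms, a board whose track distribution avoids such shortfalls is a better fit for the design.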
ISBN: 9781450341851 (print)
Over the past ten years, many new applications have emerged, such as AI, big data, and cloud computing. Although the workloads of these applications are very diverse, they all demand huge data center resources. Meanwhile, silicon technology is advancing ever more slowly as Moore's law nears its end. Consequently, data centers built from commodity hardware cannot provide enough cost efficiency and power efficiency. To meet the growing resource needs of emerging applications, data centers keep getting larger, consuming huge amounts of power and hardware cost. From a business perspective, the slow progress of hardware technology limits the value that emerging applications can create. We at Baidu, the largest search engine in China, faced this challenge several years ago. We found that the number of servers was growing much faster than the scale of the business. This situation is common among Internet companies because general-purpose processors are iterating ever more slowly. For example, early this year Intel announced that its Tick-Tock production strategy was out of date. This problem drove us to look for new ways to boost the business. From an Internet company's perspective, it makes sense to build new chips or new architectures based on its applications' characteristics. This approach can break the limitations of commodity chips and commodity hardware. According to academic and industry experience, domain-specific architectures can achieve much better performance and power efficiency than general-purpose architectures. Consequently, we are exploring new architectures to extend Moore's law. In this paper, we present our work on exploring new architectures for the data center. Data center resources include storage, memory, computing, and networking, so we focus on these four areas. First, we implemented SDF for a large-scale distributed storage system; SDF aims to provide a low-cost, high-performance flash storage system. Second, we implemented SDA for dee...
RapidSmith is an open-source framework that allows for the exploration of novel approaches to the FPGA CAD flow for Xilinx devices. However, RapidSmith has poor support for manipulating designs below the slice level...
Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. FPGAs are an attractive choice for DNNs since they offer a programmable substrate for accelerati...
ISBN: 9781450333153 (print)
Combining multiprocessing with the high level of configurability possible with FPGA-based soft processors, this paper presents a multiprocessing framework based on the MicroBlaze soft processor. The framework provides multicore support, including Linux multicore support, and fully coherent, independently configurable Level 1 caches. This architecture allows fine-grain configurability of the system, so that FPGA resources can be better optimized for a specific embedded application. We use our framework to explore the L1 data cache configuration and develop an efficiency metric based on resource usage and static application runtime. We find that a pseudo-random replacement policy is consistently the more efficient choice for FPGA systems.
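The abstract does not say how the framework implements pseudo-random replacement. A common low-cost hardware approach is to pick the victim way from a linear-feedback shift register (LFSR) instead of maintaining per-set recency state, as LRU requires.

The sketch below is a hypothetical software model, not the paper's implementation. It assumes a 16-bit Fibonacci LFSR and a simple set-associative cache, with LRU included only as a baseline for comparison. It does not reproduce the paper's efficiency metric, which the abstract does not define.

```python
class LFSR16:
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (maximal period 65535)."""

    def __init__(self, seed: int = 0xACE1) -> None:
        if seed & 0xFFFF == 0:
            raise ValueError("LFSR seed must be non-zero")
        self.state = seed & 0xFFFF

    def next(self) -> int:
        s = self.state
        # Feedback bit is the XOR of the tapped bits.
        bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
        self.state = (s >> 1) | (bit << 15)
        return self.state


class SetAssocCache:
    """Tiny set-associative cache model that only counts hits and misses."""

    def __init__(self, num_sets: int, ways: int, line_bytes: int,
                 policy: str = "random", seed: int = 0xACE1) -> None:
        if policy not in ("random", "lru"):
            raise ValueError("policy must be 'random' or 'lru'")
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        self.policy = policy
        # Each set holds tags; under LRU the list is kept in recency order (MRU last).
        self.sets = [[] for _ in range(num_sets)]
        self.lfsr = LFSR16(seed)
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        s = self.sets[index]
        if tag in s:
            self.hits += 1
            if self.policy == "lru":
                s.remove(tag)
                s.append(tag)
            return True
        self.misses += 1
        if len(s) < self.ways:
            s.append(tag)  # fill an empty way
        elif self.policy == "lru":
            s.pop(0)  # evict the least recently used tag
            s.append(tag)
        else:
            # Pseudo-random victim: LFSR output modulo the associativity picks a way.
            s[self.lfsr.next() % self.ways] = tag
        return False


if __name__ == "__main__":
    # Hypothetical trace: a strided sweep repeated four times.
    trace = [i * 4 for i in range(256)] * 4
    for policy in ("lru", "random"):
        cache = SetAssocCache(num_sets=8, ways=2, line_bytes=16, policy=policy)
        for addr in trace:
            cache.access(addr)
        print(policy, "hits:", cache.hits, "misses:", cache.misses)
```

In hardware, the random policy needs only one shared shift register rather than per-set ordering bits. That saving is one plausible reason it can score well on a metric that weighs FPGA resource usage.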
ISBN: 9781450333153 (print)
A low-power nonvolatile programmable-logic cell array is proposed for energy-constrained applications such as wireless sensor nodes and mobile devices. A 64 × 64 programmable-logic cell array includes 9.2 Mbit of nonvolatile switches, namely atom switches, which serve as both the routing switches and the configuration memory. A 16-bit arithmetic logic unit, a building block of a microcontroller unit, was implemented to compare speed and power consumption with a state-of-the-art low-power field-programmable gate array. The proposed programmable-logic array exhibited 30% dynamic power savings and 2.5× faster operation in the low-voltage region. Zero sleep power was also demonstrated.
This tutorial describes tools for efficiently implementing floating-point applications on FPGAs. We present both the SDK for OpenCL and DSP Builder Advanced Blockset and show that they can be effectively used to imple...