Quantized Neural Networks (QNNs), which use low bitwidth numbers for representing parameters and performing computations, have been proposed to reduce the computation complexity, storage size and memory usage. In QNNs...
Accurately synthesizing talking face videos and capturing fine facial features for individuals with long hair presents a significant challenge. To tackle these challenges in existing methods, we propose a decomposed p...
Person re-identification, as the basic task of a multi-camera surveillance system, plays an important role in a variety of surveillance applications. However, the current mainstream person re-identification model base...
ISBN: 9781479956234 (print)
Growth of NaCl and Fe/NaCl/Fe magnetic tunneling junctions on Si(100) has been achieved using a high-vacuum electron-beam deposition system. Epitaxial tunnel junctions turn out to be prone to pinholes as well as electrode oxidation. Instead, the best tunneling magnetoresistance we have achieved in this system, 22.3% at room temperature, is obtained on polycrystalline tunnel barriers with a thin Mg insertion.
Considering the clustering structure of wireless sensor networks, a new big data collection method based on compressive sensing is proposed. The collection process is as follows. Within a cluster, the sink node sets the corresponding seed vector based on the distribution of the network and then sends it to each cluster head. Each cluster head generates its own random spacing sparse matrix from the received seed vector and collects data using compressive sensing. Among clusters, clusters forward measurement values to the sink node along a previously built multi-hop routing tree. Performance analysis and comparison of results show that this method is superior to other methods both within a cluster and between clusters.
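The seed-sharing step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify how the "random spacing sparse matrix" is built, so the construction below (a fixed number of random ±1 entries per column) is an assumption, and the names `sparse_measurement_matrix` and `measure` are hypothetical. Signal reconstruction at the sink is omitted.

```python
import random


def sparse_measurement_matrix(seed, m, n, nonzeros_per_column=2):
    """Build an m x n sparse +/-1 matrix deterministically from a seed.

    Assumed construction, not the paper's: each column receives a fixed
    number of nonzero entries at pseudo-random rows.
    """
    rng = random.Random(seed)
    phi = [[0.0] * n for _ in range(m)]
    for col in range(n):
        for row in rng.sample(range(m), nonzeros_per_column):
            phi[row][col] = rng.choice((-1.0, 1.0))
    return phi


def measure(phi, readings):
    """Compress n sensor readings into m measurements: y = Phi * x."""
    return [sum(p * x for p, x in zip(row, readings)) for row in phi]


# The sink and a cluster head derive the same matrix from the shared seed,
# so only the seed and the m measurements need to cross the network.
seed = 42
readings = [0.0] * 16
readings[3], readings[11] = 5.0, -2.0  # sparse signal: 2 nonzeros out of 16

phi_head = sparse_measurement_matrix(seed, m=8, n=16)
phi_sink = sparse_measurement_matrix(seed, m=8, n=16)
y = measure(phi_head, readings)
```

Because both sides regenerate the matrix from the seed, the matrix itself never has to be transmitted. That is the property the abstract relies on when the sink sends only a seed vector to each cluster head.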
ISBN: 9781467392280 (print)
With the recently proposed redundancy-based core salvaging technology, resilient processors can survive the threat of severe timing violations induced by near-threshold Vdd and function correctly at aggressive clock rates. We observe that proactively disabling the weakest components that limit the core frequency can still maintain a higher throughput at Near Threshold Voltage (NTV) supply if the cores with defective components are salvaged at low cost. In this work, we propose a resilience-aware frequency scaling and mapping strategy that considers defective processor states in scheduling, in order to exploit fault-tolerant architectures for higher energy efficiency. Our evaluation shows that typical resilient multi-core processors achieve significantly higher performance per watt than under a conventional scheduling policy.
Atmospheric modeling is an essential issue in the study of climate change. However, due to the complicated algorithmic and communication models, scientists and researchers face tough challenges in finding efficient solutions to the atmospheric equations. In this paper, we accelerate a solver for the three-dimensional Euler atmospheric equations through reconfigurable data flow engines. We first propose a hybrid design that achieves efficient resource allocation and data reuse. Furthermore, through algorithmic offsetting, a fast memory table, and customizable-precision arithmetic, we map a complex Euler kernel onto a single FPGA chip, which can perform 956 floating point operations per cycle. In a 1U chassis, our CPU-DFE unit with 8 FPGA chips is 18.5 times faster and 8.3 times more power efficient than a multicore system based on two 12-core Intel E5-2697 (Ivy Bridge) CPUs, and is 6.2 times faster and 5.2 times more power efficient than a hybrid unit equipped with two 12-core Intel E5-2697 (Ivy Bridge) CPUs and three Intel Xeon Phi 5120d (MIC) cards.
Remote Photoplethysmography (rPPG) is a non-contact method that uses facial video to predict changes in blood volume, enabling physiological metrics measurement. Traditional rPPG models often struggle with poor genera...
AI benchmarking provides yardsticks for benchmarking, measuring and evaluating innovative AI algorithms, architecture, and systems. Coordinated by BenchCouncil, this paper presents our joint research and engineering e...
ISBN: 9781509015047 (print)
This paper presents a novel reconfigurable framework for training Convolutional Neural Networks (CNNs). The proposed framework is based on reconfiguring a streaming datapath at runtime to cover the training cycle for the various layers in a CNN. The streaming datapath supports various parameterized modules, which can be customized to produce implementations with different trade-offs in performance and resource usage. The modules follow the same input and output data layout, which simplifies configuration scheduling. For different layers, instances of the modules contain different computation kernels in parallel, which can be customized with different layer configurations and data precision. The associated models of performance, resource usage, and bandwidth can be used to derive parameters for the datapath and to guide the analysis of design trade-offs so as to meet application requirements or platform constraints. They enable estimation of the implementation specifications for different layer configurations, in order to maximize performance under constraints on bandwidth and hardware resources. Experimental results indicate that the proposed module design targeting Maxeler technology can achieve a performance of 62.06 GFLOPS for 32-bit floating-point arithmetic, outperforming existing accelerators. Further evaluation based on training LeNet-5 shows that the proposed framework is about 4 times faster than the CPU implementation of Caffe and about 7.5 times more energy efficient than the GPU implementation of Caffe.