In recent years, many studies on the optimization of energy consumption have focused on heterogeneous processor architectures. The heterogeneous computing model composed of CPUs and GPUs has developed from co-processing between...
Most processors employ hardware data prefetching techniques to hide memory access latencies. However, the prefetching requests from different threads on a multicore processor can cause severe interference with the prefetching and/or demand requests of others. Data prefetching can thus lead to significant performance degradation due to shared resource contention on shared-memory multicore systems. This article proposes a thread-aware data prefetching mechanism based on low-overhead runtime information to tune prefetching modes and aggressiveness, mitigating resource contention in the memory system. Our solution has three new components: (1) a self-tuning prefetcher that uses runtime feedback to dynamically adjust the data prefetching modes and arguments of each thread, (2) a filtering mechanism that informs the hardware about which prefetching requests can cause shared data invalidation and should be discarded, and (3) a limiter thread acceleration mechanism to estimate and accelerate the critical thread, i.e., the one with the longest completion time in the parallel region of execution. On a set of multithreaded parallel benchmarks, our thread-aware data prefetching mechanism improves the overall performance of a 64-core system by 13% over a multimode prefetch baseline system with a two-level cache organization and a conventional modified, exclusive, shared, and invalid (MESI)-based directory coherence protocol. We also compare our approach with the feedback-directed prefetching technique and find that it provides a 9% performance improvement on multicore systems, while reducing memory bandwidth consumption.
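The feedback-driven tuning described in component (1) can be sketched as a simple control loop over per-interval hardware counters. The thresholds, metric names, and levels below are illustrative assumptions, not values taken from the paper:

```python
# Hypothetical sketch of feedback-directed prefetch aggressiveness tuning.
# All thresholds and the 1..5 level scale are assumptions for illustration.

def adjust_aggressiveness(level, accuracy, late_ratio, pollution):
    """Return a new prefetch aggressiveness level (1 = conservative,
    5 = aggressive) from runtime feedback sampled over an interval.

    accuracy   -- fraction of prefetched lines actually used
    late_ratio -- fraction of useful prefetches arriving after the demand miss
    pollution  -- fraction of evictions that displaced useful shared data
    """
    if accuracy > 0.75 and late_ratio > 0.4:
        # Prefetches are useful but arrive too late: fetch further ahead.
        level = min(level + 1, 5)
    elif accuracy < 0.4 or pollution > 0.25:
        # Prefetches mostly waste bandwidth or evict shared data: back off.
        level = max(level - 1, 1)
    return level
```

In hardware, a loop like this would run per thread at interval boundaries, which is what lets each thread's prefetch mode diverge from its neighbors'.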
ISBN: (Print) 9781509036837
Nowadays the digital universe grows larger and larger; the data created every year has reached the zettabyte (ZB) level. How to store this data across many commodity servers is a critical issue. Although a distributed file system alleviates the storage problem, there is still a need to reduce the storage footprint and speed up the transmission of large-scale data. A distributed file system like HDFS already offers compression schemes to meet this need; however, when the workload and data format change, configuring compression with only one algorithm is not always effective. In this paper, we propose a model called PACM (Prediction-based Auto-adaptive Compression Model) to optimize storage and performance by using different algorithms (e.g., quicklz, zlib, and snappy) according to the varying data formats and workloads. We also implemented the model in Hadoop, and our empirical evaluation shows that by using PACM, write throughput improves by 2-5 times.
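The core idea of picking a compressor per data block can be sketched as below. PACM itself predicts from data format and workload; this simplified stand-in just scores candidates on a sample, and uses zlib levels in place of quicklz/snappy (which require third-party bindings):

```python
import zlib

# Simplified sketch of auto-adaptive compressor selection in the spirit of
# PACM. The candidate set and scoring are illustrative assumptions: zlib at
# different levels stands in for distinct codecs such as quicklz and snappy.

def pick_codec(sample: bytes, candidates=None):
    """Return the name of the candidate codec that compresses a sampled
    block of the incoming data the best (smallest output)."""
    if candidates is None:
        candidates = {"fast": 1, "balanced": 6, "best": 9}  # zlib levels
    sizes = {name: len(zlib.compress(sample, lvl))
             for name, lvl in candidates.items()}
    return min(sizes, key=sizes.get)
```

A production system would also weigh compression speed against ratio, since for write-heavy workloads a faster, weaker codec often yields higher end-to-end throughput.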
ISBN: (Print) 9781509028245
As one of the most important deep learning models, convolutional neural networks (CNNs) have achieved great success in a number of applications such as image classification, speech recognition, and natural language understanding. Training CNNs on large data sets is computationally expensive, leading to a flurry of research and development of open-source parallel implementations on GPUs. However, few studies have evaluated the performance characteristics of those implementations. In this paper, we conduct a comprehensive comparison of these implementations over a wide range of parameter configurations, investigate potential performance bottlenecks, and point out a number of opportunities for further optimization.
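Comparisons like this hinge on a fair timing harness: warm up first (to exclude JIT/autotuning and cache effects), repeat, and take the best run. A minimal sketch, with a stand-in workload rather than a real framework kernel:

```python
import time

# Minimal timing harness of the kind used to compare implementations across
# parameter configurations. The measured function here is an assumption; a
# real study would time CNN layer kernels from each framework.

def time_op(fn, *args, warmup=2, reps=5):
    """Run fn(*args) a few times after warmup and return the best
    wall-clock time in seconds (best-of-N reduces scheduler noise)."""
    for _ in range(warmup):
        fn(*args)
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

For GPU kernels, the harness must also synchronize the device before reading the clock, or the asynchronous launch makes the kernel appear nearly free.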
Based on the attribute-based encryption (ABE) scheme proposed by Brakerski and constructed on the LWE problem, an RLWE-based key-policy ABE scheme is presented. The efficiency and key size of this scheme overtake...
ISBN: (Print) 9781509053827
SURF (Speeded-Up Robust Features) detection is used extensively in object detection, tracking, and matching. However, due to its high complexity, it is usually a challenge to perform such detection in real time on a general-purpose processor. This paper proposes a parallel computing algorithm for the fast computation of SURF, specially designed for FPGAs. By efficiently exploiting the advantages of the FPGA architecture, and by appropriately handling the inherent parallelism of the SURF computation, the proposed algorithm significantly reduces the computation time. Our experimental results show that, for an image with a resolution of 640x480, the SURF processing time is only 0.047 seconds on an FPGA (XC6SLX150T, 66.7 MHz), which is 13 times faster than on a typical i3-3240 CPU (with a 3.4 GHz clock frequency) and 249 times faster than on a traditional ARM system (Cortex-A8, 1 GHz).
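The parallelism exploited here comes largely from SURF's use of box filters over an integral image, where any rectangular sum costs four lookups regardless of size. A minimal pure-Python sketch of that building block (the paper's FPGA pipeline itself is not shown):

```python
# Integral image (summed-area table), the data structure behind SURF's
# constant-time box filters. Pure-Python reference, not the FPGA design.

def integral_image(img):
    """img: 2-D list of numbers. Returns a table ii where ii[y][x] is the
    sum of img over the rectangle from (0, 0) to (x, y) inclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of the image over the inclusive rectangle [x0..x1] x [y0..y1],
    using only four table lookups."""
    a = ii[y0 - 1][x0 - 1] if x0 > 0 and y0 > 0 else 0
    b = ii[y0 - 1][x1] if y0 > 0 else 0
    c = ii[y1][x0 - 1] if x0 > 0 else 0
    return ii[y1][x1] - b - c + a
```

On an FPGA the row-wise prefix sums and the four-lookup filter evaluations map naturally onto deep pipelines, which is where the reported speedup originates.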
Current intrusion detection systems mostly target external attacks, but internal staff can sometimes cause greater harm to an organization's information security. Traditional insider threat detection met...
ISBN: (Print) 9781450333153
Although FPGAs' power and performance advantages are widely recognized, designing applications on FPGA-based systems has traditionally been a task for hardware experts. It is important to allow application-level programmers, who have less system-level but more algorithmic knowledge, to realize their applications conveniently on FPGAs. In this paper, an embedded FPGA operating system is proposed to help application-level programmers use FPGAs. First, it builds specific I/Os and optimizes the bus interconnection among I/Os, DDR memory, user IPs, etc., within the FPGA for vision computing. Second, it manages FPGA resources such as I/Os, DDR memory, and communication, freeing users from low-level details. Third, it schedules tasks (IPs) executed on the FPGA dynamically at runtime, which allows the FPGA to be multiplexed when necessary. After porting the FPGA operating system to different FPGA platforms and implementing vision algorithms on top of it, we show that it simplifies algorithm development on FPGA platforms and improves the portability of user applications. Furthermore, implementation results for several popular vision algorithms show that the FPGA operating system is efficient and effective for vision computing. Finally, experimental results show that for multiple algorithms requiring more FPGA resources, runtime task scheduling of multiple IPs is more efficient than a fixed IP when an FPGA SoC is considered.
We formalize the security notions of non-malleability under selective opening attacks (NM-SO security) in two approaches: the indistinguishability-based approach and the simulation-based approach. We explore the relat...