ISBN: 9781538649756 (print)
In this work, we address the challenge of designing an efficient warp scheduler for throughput processors by proposing SAWS (Simple and Adaptive Warp Scheduler). Unlike previous approaches, which target a particular type of application, SAWS considers several simple scheduling algorithms and tries to use the one that best fits each application or each phase within an application. Through detailed simulations, we demonstrate that a practical implementation of SAWS can obtain IPC values that closely match those of the best scheduling algorithm in each case.
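The abstract does not say how SAWS chooses among its candidate algorithms. As a rough illustration of the general idea only, the sketch below assumes a sample-then-commit scheme: each candidate policy is tried briefly in a phase, its IPC is recorded, and the best one is used for the rest of that phase. The policy names `lrr` and `gto` and the IPC numbers are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List


def pick_best_policy(measure_ipc: Callable[[str], float],
                     policies: List[str]) -> str:
    """Sample every candidate policy once and return the one with the highest IPC.

    Assumption: `measure_ipc` stands in for running a short sampling epoch
    under the given policy; it is not the mechanism described in the paper.
    """
    samples = {policy: measure_ipc(policy) for policy in policies}
    return max(samples, key=samples.get)


def adaptive_schedule(phases: List[Dict[str, float]],
                      policies: List[str]) -> List[str]:
    """Choose a policy per phase from that phase's (hypothetical) IPC samples."""
    return [pick_best_policy(phase_ipc.__getitem__, policies)
            for phase_ipc in phases]


# Hypothetical per-phase IPC measurements for two made-up policy labels.
example_phases = [
    {"lrr": 1.0, "gto": 1.4},  # phase 0: "gto" samples better
    {"lrr": 1.2, "gto": 0.9},  # phase 1: "lrr" samples better
]
example_choice = adaptive_schedule(example_phases, ["lrr", "gto"])
```

With these made-up samples, a different policy is selected in each phase, which is the behavior the abstract describes at a high level.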
ISBN: 0780312813 (print)
Wire routing has always been a very compute-bound phase in the physical design of Very Large Scale Integration (VLSI) circuits. Some software solutions to this problem use divide-and-conquer methods, such as hierarchical routing, to reduce its time complexity. Recently, hardware accelerators have been employed to speed up this process further. In this paper, implementation aspects of a reduced array architecture (RAA) for hardware acceleration of the cut and paste hierarchical routing algorithm are detailed. Several macros have been defined to implement the algorithm in hardware. The architecture has been implemented in double-metal 2 µm CMOS technology.
ISBN: 9781479968909 (print)
The development of complex networked multi-core systems, such as compute nodes in the Internet of Things, requires new simulation and design concepts. In this paper we present an environment for the asynchronous simulation of networked multi-core systems, based on SystemC. Combined with the open-source machine emulator and virtualizer QEMU, a virtual network is created. The compute nodes behave similarly to recent Systems-on-Chip from Xilinx and Altera. Combining an ARM processing system with programmable logic provides high flexibility. As an example, we simulate these systems by extending QEMU, following its device model abstraction qdev. The resulting network benefits from execution on different host systems. It is highly scalable and designed for the development of complex networked multi-core systems. For non-distributed execution on one processor, we implemented an alternative communication method that takes only two-thirds of the time required by the networked simulation.
This paper presents work-in-progress towards a C++ source-to-source translator that automatically seeks parallelisable code fragments and replaces them with code for a graphics co-processor. We report on our experienc...
ISBN: 1424407281 (print)
A network of (wireless smart) cameras can analyse a scene from different views. Wireless smart cameras place demanding requirements on the hardware: low power consumption and high imaging performance. In this paper we introduce a wireless smart camera based on an SIMD video-analysis processor and an 8051 microcontroller as a local host. Wireless communication uses the IEEE 802.15.4 standard. The camera constructed in this paper is intended to enable application research into distributed smart camera systems.
ISBN: 9781479970612 (print)
In this paper, we present a new distributed algorithm for minimizing a sum of not necessarily differentiable convex functions composed with arbitrary linear operators. The overall cost function is assumed to be strongly convex. Each function involved is associated with a node of a hypergraph that can communicate with neighboring nodes sharing the same hyperedge. Our algorithm relies on a primal-dual splitting strategy with established convergence guarantees. We show how it can be implemented efficiently to take full advantage of a multicore architecture. The good numerical performance of the proposed approach is illustrated on a video sequence denoising problem, where a significant speedup is achieved.
Directed Acyclic Graphs (DAGs) are often used to model circuits and networks. The path length in such DAGs represents circuit or network delays. In the vertex splitting problem, the objective is to determine a minimum...
ISBN: 9781538627044 (print)
Similarity-oriented services serve as a foundation for a wide range of data analytics applications such as machine learning, targeted advertising, and real-time decisions. Both industry and academia strive for efficient and scalable similarity discovery and querying techniques to handle massive, complex data records in the real world. In addition to performance, data security and privacy have become indispensable criteria for quality of service because of the growing number of data breaches. To address this serious concern, in this paper we propose and implement "EncSIM", an encrypted and scalable similarity search service. The architecture of EncSIM enables parallel query processing over distributed, encrypted data records. To reduce client overhead, EncSIM resorts to a variant of the state-of-the-art similarity search algorithm called all-pairs locality-sensitive hashing (LSH). We describe a novel encrypted index construction for EncSIM based on searchable encryption to guarantee the security of the service while preserving the performance benefits of all-pairs LSH. Moreover, EncSIM supports data record addition with a strong security notion. Intensive evaluations on a Redis cluster demonstrate low client cost, linear scalability, and satisfactory query performance of EncSIM.
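For readers unfamiliar with locality-sensitive hashing, the sketch below shows the basic, unencrypted idea using random-hyperplane signatures: vectors pointing in similar directions tend to receive the same bit signature and land in the same bucket. This is a generic illustration and an assumption on our part; it is neither the all-pairs LSH variant nor the encrypted index that EncSIM uses, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Signature = Tuple[int, ...]


def make_hyperplanes(dim: int, bits: int, seed: int = 0) -> List[List[float]]:
    """Draw `bits` random hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def signature(vec: Sequence[float], planes: List[List[float]]) -> Signature:
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(v * w for v, w in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def build_index(records: Dict[str, Sequence[float]],
                planes: List[List[float]]) -> Dict[Signature, List[str]]:
    """Group record identifiers into buckets keyed by their signature."""
    index: Dict[Signature, List[str]] = {}
    for record_id, vec in records.items():
        index.setdefault(signature(vec, planes), []).append(record_id)
    return index


def query(vec: Sequence[float],
          index: Dict[Signature, List[str]],
          planes: List[List[float]]) -> List[str]:
    """Return candidate record identifiers sharing the query's bucket."""
    return index.get(signature(vec, planes), [])


planes = make_hyperplanes(dim=3, bits=8)
index = build_index({"a": [1.0, 2.0, 3.0], "b": [-1.0, -2.0, -3.0]}, planes)
candidates = query([2.0, 4.0, 6.0], index, planes)  # same direction as "a"
```

Because the signature depends only on the direction of a vector, the query above (record "a" scaled by two) falls into the same bucket as "a". A real service would use several hash tables and verify candidates with an exact similarity measure.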
Global Computing (GC) platforms such as BOINC [1] are nowadays considered as the most powerful distributed computing systems worldwide. Based on volunteer computing and various forms of incentives, such architecture a...
ISBN: 9781581130102 (print)
Computing radiosity is a very expensive problem in computer graphics. Recent hierarchical methods have greatly sped up the computation, first of diffuse and now also of specular radiosity. We present a parallel algorithm for computing both diffuse and specular radiosity together, and discuss the techniques we used to improve its performance. The algorithm is both irregular and highly unpredictable. Despite this, by carefully designing a parallel algorithm that minimizes synchronization and memory access overhead, and by identifying and correcting several synchronization bottlenecks that we did not anticipate, we were able to obtain speedups of 26.3 on a 32-processor machine with distributed memory and 14.2 on a 16-processor machine with centralized memory. We demonstrate how execution profiles obtained at runtime, for example the time spent waiting at different locks, can be used to significantly improve the performance of complex, irregular parallel applications.