The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of latency, ...
Technological advances have created a need for effective teaching of GPU computing. This need stems from the growing adoption of parallel computing and the increasing use of GPUs for computationally intensive tasks. This paper addresses that need by describing a semester-long course on CUDA programming. The course places significant emphasis on developing practical hands-on skills, building skills for parallel algorithm design and implementation, and using GPUs to solve computationally expensive problems. The paper explains the goals of the course and elaborates on course contents and student assessments. Student feedback indicates effective learning and improved utilization of GPUs by students. This paper is useful for community members who would like to teach GPU programming as an elective course in parallel computing. The course can be offered at either the senior undergraduate level or the graduate level.
In this paper, we consider the convergence of a very general asynchronous-parallel algorithm called ARock [1], that takes many well-known asynchronous algorithms as special cases (gradient descent, proximal gradient, ...
ISBN (print): 9781538626276
The core of a distributed system can be characterised by autonomously acting agents, where each agent executes its own program, uses shared resources and communicates with the others, but otherwise is totally oblivious to the behaviour of the other agents. In a distributed adaptive system (DAS), agents may change their programs and may enter or leave the collection at any time, thereby changing the behaviour of the overall system. The behavioural theory of DAS provides an axiomatic definition plus a proof that concurrent reflective abstract state machines (crASMs) capture all systems stipulated by the axioms. In this paper we take a closer look at crASMs, emphasising the tree background structure that is needed for handling the manipulation of self-representations.
The explosive growth of data collection in business and scientific areas has created a need to analyze and mine the useful knowledge residing in these data. Recourse to data mining techniques seems inescapable in order to extract useful and novel patterns/models from large datasets. In this context, frequent itemsets (patterns) play an essential role in many data mining tasks that try to find interesting patterns in datasets. However, conventional approaches to mining frequent itemsets in the Big Data era encounter significant challenges when computing power and memory space are limited. This paper proposes an efficient distributed frequent itemset mining algorithm, called parallelCharMax, that is based on a powerful sequential algorithm, called Charm, and computes the maximal frequent itemsets, which are considered perfect summaries of the frequent ones. The proposed algorithm has been implemented using the MapReduce framework. The experimental component of the study shows the efficiency and performance of the proposed algorithm compared with well-known algorithms such as MineWithRounds and HMBA.
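To make the notion of a maximal frequent itemset concrete, here is a minimal brute-force sketch in Python. It is not Charm, not parallelCharMax, and does not use MapReduce; the function names, the toy transactions, and the support threshold are all assumptions chosen for illustration. It only shows what "maximal" means: a frequent itemset with no frequent proper superset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets whose support >= min_support.

    Illustrative only: real miners such as Charm avoid this exhaustive search.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = number of transactions containing the candidate.
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
                found_any = True
        if not found_any:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
    return result


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy dataset (not from the paper).
transactions = [
    frozenset(t)
    for t in (("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c"))
]
frequent = frequent_itemsets(transactions, min_support=3)
maximal = maximal_itemsets(frequent)
```

In this toy dataset every single item and every pair is frequent, but only the three pairs are maximal, so three itemsets summarize all six frequent ones.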
The HEVC video coding standard designed by the Joint Collaborative Team on Video Coding requires nearly 70% more time than the previous standard H.264/AVC to encode a video sequence, because it is computationally more...
ISBN (print): 9781467389297
The communication radius of an automatic light trap surveillance network characterizes how well an area is monitored or tracked by automatic light traps. Connectivity is an important requirement that shows how effectively nodes in an automatic BPH light trap surveillance network can communicate. In this paper, we propose a new approach to determining the communication radius of an automatic light trap based on a balltree structure. The approach comprises a parallel algorithm for implementing the balltree structure (CudaBalltree) and for determining the communication radius of an automatic light trap on the NVIDIA CUDA platform.
Bayesian networks (BN) are probabilistic graphical models which are widely utilized in modeling complex biological interactions in the cell. Learning the structure of a BN is an NP-hard problem and existing exact and heuristic solutions do not scale to large enough domains to allow for meaningful modeling of many biological processes. In this work, we present efficient parallel algorithms which push the scale of both exact and heuristic BN structure learning. We demonstrate the applicability of our methods by implementations on an IBM Blue Gene/L and an AMD Opteron cluster, and discuss their significance for future applications to systems biology.
We present two fundamentally different approaches to detecting collisions between two point clouds and compare their performance on multiple datasets. A collision between points happens if they are closer to each other than a given threshold radius. One approach uses the main CPU with a k-d tree data structure to efficiently carry out fixed-range searches around points in 3D, while the other executes mainly on a GPU using a regular grid decomposition technique implemented in the CUDA framework. We show that massively parallel 3D range searches on a grid-based data structure on a GPU perform about as well as a tree-based approach on the CPU that uses orders of magnitude less parallelization. We also show how each method scales with varying input sizes and how their relative performance depends on the spatial structure of the input data. (C) 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
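The regular grid decomposition mentioned above can be sketched sequentially in Python. This is a CPU-only illustration under stated assumptions, not the authors' CUDA implementation: the cell size is assumed equal to the threshold radius, and the names cell_of, build_grid, and colliding_pairs are hypothetical. With that cell size, any point closer than the radius must lie in one of the 27 cells surrounding the query point's cell, so only those cells are searched.

```python
from collections import defaultdict
from itertools import product
from math import floor


def cell_of(point, radius):
    """Map a 3D point to the integer coordinates of its grid cell (cell size = radius)."""
    return tuple(floor(c / radius) for c in point)


def build_grid(points, radius):
    """Bucket point indices by grid cell."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell_of(p, radius)].append(idx)
    return grid


def colliding_pairs(cloud_a, cloud_b, radius):
    """Return sorted (i, j) index pairs with |cloud_a[i] - cloud_b[j]| < radius."""
    grid = build_grid(cloud_b, radius)
    r2 = radius * radius
    pairs = []
    for i, p in enumerate(cloud_a):
        cx, cy, cz = cell_of(p, radius)
        # Only the 3 x 3 x 3 block of cells around p can hold a colliding point.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                q = cloud_b[j]
                d2 = sum((a - b) ** 2 for a, b in zip(p, q))
                if d2 < r2:  # strictly closer than the threshold radius
                    pairs.append((i, j))
    return sorted(pairs)


# Hypothetical toy clouds (not from the paper).
cloud_a = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
cloud_b = [(0.5, 0.0, 0.0), (5.0, 5.0, 6.5), (-0.3, -0.3, 0.0)]
pairs = colliding_pairs(cloud_a, cloud_b, radius=1.0)
```

Each query point is processed independently of the others, which is what makes this kind of search a natural fit for one-thread-per-point execution on a GPU.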
Advances in engine control increase the amount of computation required. Production ECUs (Electronic Control Units), which use a single-core architecture, cannot run at higher clock speeds. Using a multi-/many-core architecture is the only way to decrease execution time. However, various problems arise when implementing engine control software on a multi-/many-core ECU. One of the biggest is the sequential structure of control software, which lets the software execute on only one core of the multi-/many-core ECU. The purpose of this paper is to describe a parallelized control design method that decomposes the sequential structure and decreases execution time on embedded multi-/many-core production ECUs. (C) 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.