ISBN: 9783642143892 (print)
GPUs have become extremely promising multi/many-core architectures for a wide range of demanding applications. Basic features of these architectures include the use of a large number of relatively simple processing units that operate in SIMD fashion, as well as hardware-supported, advanced multithreading. However, the use of GPUs in everyday practice is still limited, mainly because implemented algorithms must be deeply adapted to the target architecture. In this work, we propose how to perform such an adaptation to achieve an efficient parallel implementation of the conjugate gradient (CG) algorithm, which is widely used for solving large sparse linear systems of equations arising, e.g., in FEM problems. Aiming at an efficient implementation of the main operation of the CG algorithm, sparse matrix-vector multiplication (SpMV), different techniques for optimizing access to the hierarchical memory of GPUs are proposed and studied. The experimental investigation of the proposed CUDA-based implementation of the CG algorithm is carried out on two GPU architectures: GeForce 8800 and Tesla C1060. It is shown that optimizing access to GPU memory considerably reduces the execution time of the SpMV operation and consequently yields a significant speedup over CPUs for the whole CG algorithm.
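To make the role of SpMV inside CG concrete, the following is a minimal CPU-side sketch in Python, not the CUDA implementation described in the abstract. The compressed sparse row (CSR) layout, the function names `spmv_csr` and `conjugate_gradient`, and the small test matrix are all assumptions chosen for illustration; the abstract does not state which storage format the authors use.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format (assumed layout)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # On a GPU, each row (or group of rows) would typically map to a
        # thread or warp; here the rows are simply visited in sequence.
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given in CSR format."""
    x = [0.0] * len(b)
    r = list(b)   # residual b - A x for the initial guess x = 0
    p = list(r)   # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = spmv_csr(row_ptr, col_idx, values, p)  # the one SpMV per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Illustrative 3x3 SPD matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]] in CSR form.
ROW_PTR = [0, 2, 5, 7]
COL_IDX = [0, 1, 0, 1, 2, 1, 2]
VALUES = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
```

Each CG iteration performs exactly one SpMV plus a few vector operations, and the SpMV reads `x` through the indirect index `col_idx[k]`. That irregular access pattern is the kind of memory traffic the abstract's optimizations target.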
ISBN: 9783642131189 (print)
With the fast development of Bluetooth networks and wireless communications, mobile devices share information with each other more easily than ever before. However, this convenient communication technology brings privacy and security issues. Currently, Bluetooth adopts peer-to-peer and Frequency Hopping Spread Spectrum (FHSS) mechanisms to avoid data disclosure, but a malicious attacker can collect the data transmitted through a relay station over a long period of time and then break into the system. In this study, we treat a Piconet as a cube and transform a Scatternet into a cluster (N-cube) structure. We then use the Elliptic Curve Diffie-Hellman (ECDH) [1] and Conference Key (CK) schemes to perform session key agreement and secure data transmission. The proposed scheme needs only a short 160-bit key to achieve a security level comparable to 1024-bit Diffie-Hellman (DH) [2], and each node uses little CPU, memory, and bandwidth to complete security operations. As a result, the proposed fault-tolerant routing algorithm with secure data transmission runs rapidly and efficiently and is well suited to Bluetooth networks with limited resources.
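The ECDH key agreement mentioned above can be illustrated with a toy sketch. The curve below (y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1)) is a tiny textbook-sized example chosen as an assumption for readability; it offers no security and is not the 160-bit curve the abstract refers to. The function names and the private keys 3 and 10 are likewise hypothetical.

```python
# Toy curve parameters (assumed for illustration only, NOT secure).
P_MOD = 17   # prime field modulus
A = 2        # curve coefficient a in y^2 = x^3 + a*x + b
B = 2        # curve coefficient b
G = (5, 1)   # base point on the curve


def point_add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p and q are inverses of each other
    if p == q:
        # Tangent slope for point doubling.
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope for adding two distinct points.
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with the double-and-add method."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


# Each side keeps a private scalar and publishes scalar * G.
alice_private, bob_private = 3, 10
alice_public = scalar_mult(alice_private, G)
bob_public = scalar_mult(bob_private, G)

# Both sides derive the same shared point without sending a private key.
alice_shared = scalar_mult(alice_private, bob_public)
bob_shared = scalar_mult(bob_private, alice_public)
```

Only the public points cross the link, which is why an eavesdropper on a relay station learns neither private scalar directly. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.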
The Brazilian middleware for Digital TV, known as Ginga, is currently divided into two subsystems: the declarative Ginga Nested Context Language (Ginga-NCL) and the procedural Ginga-Java (Ginga-J). The Ginga Dev...
ISBN: 9783642131189 (print)
Finite-Difference Time-Domain (FDTD) has proved to be a very useful computational electromagnetics algorithm. However, implementations on traditional general-purpose processors can be computationally prohibitive, requiring thousands of CPU hours, which hinders the large-scale application of FDTD. With rapid progress in GPU hardware capability and programmability, we propose in this paper a novel scheme in which a GPU is used to accelerate three-dimensional FDTD with UPML absorbing boundary conditions. This GPU-based scheme reduces the computation time significantly while retaining high accuracy compared with the CPU-based scheme. With only one AMD ATI HD4850 GPU and a computational domain of up to 180x80x180, our GPU-based FDTD implementation runs approximately 93 times faster than the one running on an Intel E2180 dual-core CPU.
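The leapfrog structure that makes FDTD attractive for GPUs can be seen in a deliberately simplified one-dimensional sketch. This is not the three-dimensional UPML scheme of the abstract: the grid size, the Courant number of 0.5, the Gaussian source, and the function name `fdtd_1d` are all assumptions, and the grid ends are left as simple reflecting boundaries rather than absorbing layers.

```python
import math


def fdtd_1d(n_cells=200, n_steps=150, source_cell=100, courant=0.5):
    """Leapfrog update of Ez and Hy on a 1D Yee grid in normalized units."""
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    for step in range(n_steps):
        # Electric field update: every cell depends only on its two
        # neighbouring H values, so all cells could be updated in parallel.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k - 1] - hy[k])
        # Additive (soft) Gaussian source, peaking at step 40.
        ez[source_cell] += math.exp(-0.5 * ((step - 40) / 12.0) ** 2)
        # Magnetic field update, again independent per cell.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k] - ez[k + 1])
    return ez, hy


ez_final, hy_final = fdtd_1d()
peak_right = max(range(101, 200), key=lambda k: ez_final[k])
```

Within each half step, every cell reads only values from the previous half step, so the inner loops carry no dependencies between cells. In a GPU version each loop body would become the work of one thread, which is the property a scheme like the one in the abstract relies on.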
This paper argues that "broken layers" and "application-driven" will be new trends in microprocessor architecture. After discussion of parallel technologies at several levels, and schemes to man...
ISBN: 9788988678299 (print)
In many image processing systems, an input gray-scale image is first reformed to produce a clearer image. After this reformation, the resulting gray-scale image is transformed into a binary image. Some gray-scale and binary image processing programs are implemented on a linearly connected parallel processor. After an image is partitioned into rectangular regions, each region is loaded into the main memory of a processing element (PE) as a partial image of the input image. PEs are connected through two communication memories. By accessing different memories, a PE accesses pixels stored on adjacent PEs without memory access conflicts. Despite the simple architecture of this linearly connected parallel processor, the results of implementing some programs indicate that execution times improve, depending on the number of PEs.
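The partitioning idea can be sketched sequentially as follows. The choice of horizontal strips, the 3-row vertical mean as the gray-scale step, the threshold of 128, and all function names are assumptions made for illustration; the abstract does not specify the actual operations. The halo rows passed to each strip stand in for the pixels a PE would read from its neighbours through the communication memories. The sketch assumes the number of PEs does not exceed the image height.

```python
def strip_bounds(height, n_pe):
    """Split `height` rows into n_pe contiguous, nearly equal strips."""
    base, extra = divmod(height, n_pe)
    bounds, start = [], 0
    for pe in range(n_pe):
        stop = start + base + (1 if pe < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def process_strip(rows, top_halo, bottom_halo, threshold):
    """Vertical 3-row mean followed by thresholding for one PE's strip."""
    # At the image edge there is no neighbouring PE, so replicate the edge row.
    first = top_halo if top_halo is not None else rows[0]
    last = bottom_halo if bottom_halo is not None else rows[-1]
    padded = [first] + rows + [last]
    out = []
    for i in range(1, len(padded) - 1):
        above, here, below = padded[i - 1], padded[i], padded[i + 1]
        out.append([1 if (a + b + c) / 3 >= threshold else 0
                    for a, b, c in zip(above, here, below)])
    return out


def process_image(image, n_pe, threshold=128):
    """Process each strip as a separate PE would, then reassemble the result."""
    height = len(image)
    result = []
    for start, stop in strip_bounds(height, n_pe):
        # Halo rows model pixels fetched from the adjacent PEs.
        top = image[start - 1] if start > 0 else None
        bottom = image[stop] if stop < height else None
        result.extend(process_strip(image[start:stop], top, bottom, threshold))
    return result


# Small illustrative gray-scale image (6 rows, 4 columns).
IMAGE = [
    [0, 50, 200, 255],
    [10, 60, 210, 250],
    [20, 200, 220, 90],
    [255, 255, 0, 0],
    [128, 128, 128, 128],
    [0, 0, 255, 255],
]
```

Because each strip needs only one row from each neighbour, the binary result is identical for any number of PEs, while the work per PE shrinks as more PEs are used.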
The web browser is a CPU-intensive program. Especially on mobile devices, webpages load too slowly, spending significant time processing a document's appearance. Due to power constraints, most hardware-driven ...
ISBN: 9788988678299 (print)
The purpose of this paper is to propose a method for constructing correct parallel processing programs from a problem description. The framework we adopt for this purpose is the Equivalent Transformation Framework (ETF), which regards computation as the transformation of definite clauses. In this framework, a problem's domain knowledge and a query are described as definite clauses, and their meaning is defined by a model of the set of definite clauses. Meaning-preserving transformation rules for the query are then generated. We propose a parallel processing method based on "specialization", one of the operations used in the transformations, and discuss how this method maintains the correctness of the computation. Specialization is a generalization of substitution in logic programming, and it allows richer representation. We demonstrate the advantage of using specialization rather than substitution in constraint satisfaction problem solving.
ISBN: 3642152767 (print)
The proceedings contain 98 papers. The topics discussed include: StarSscheck: a tool to find errors in task-based parallel programs; automated tuning in parallel sorting on multi-core architectures; estimating and exploiting potential parallelism by source-level dependence profiling; efficient graph partitioning algorithms for collaborative grid workflow developer environments; profile-driven selective program loading; characterizing the impact of using spare-cores on application performance; a model for space-correlated failures in large-scale distributed systems; architecture exploration for efficient data transfer and storage in data-parallel applications; non-clairvoyant scheduling of multiple bag-of-tasks applications; extremal optimization approach applied to initial mapping of distributed Java programs; a parallel implementation of the Jacobi-Davidson eigensolver and its application in a plasma turbulence code; and exploiting fine-grained parallelism on Cell processors.
ISBN: 3642152902 (print)
The proceedings contain 98 papers. The topics discussed include: StarSscheck: a tool to find errors in task-based parallel programs; automated tuning in parallel sorting on multi-core architectures; estimating and exploiting potential parallelism by source-level dependence profiling; efficient graph partitioning algorithms for collaborative grid workflow developer environments; profile-driven selective program loading; characterizing the impact of using spare-cores on application performance; a model for space-correlated failures in large-scale distributed systems; architecture exploration for efficient data transfer and storage in data-parallel applications; non-clairvoyant scheduling of multiple bag-of-tasks applications; extremal optimization approach applied to initial mapping of distributed Java programs; a parallel implementation of the Jacobi-Davidson eigensolver and its application in a plasma turbulence code; and exploiting fine-grained parallelism on Cell processors.