Author: Sun, JC
Chinese Acad Sci, Inst Software, R&D Ctr Parallel Software, Beijing 100080, Peoples R China
ISBN: 3540292357 (print)
In this paper, the problem of partitioning parallel dodecahedrons in 3D is examined. Two schemes are introduced and their convergence rates are discussed. A parallel fast solver was implemented and tested experimentally, with the performance results presented.
ISBN: 9781538655023 (print)
We propose two extensions for a state-of-the-art method of rollback-recovery in distributed CEP (complex event processing). In CEP, an operator network is used to search for patterns in event streams. Sometimes these operators fail and lose their state. Rollback-recovery is a method for dealing with such state losses. The type of rollback-recovery we consider is upstream backup, where the state of a failed operator is recovered by replaying to it the input events that led it to that state. These events are kept in the memory buffers of upstream operators, which are trimmed continuously as the downstream operator progresses. The first extension we propose saves memory and speeds up recovery by avoiding the storage and retransmission of unnecessary events. The second extension makes the base method of upstream backup compatible with data-parallel CEP, allowing the windows into which operators partition their input to be processed in parallel. We evaluated the proposed extensions through experiments that showed a significant reduction in memory usage and recovery time at the expense of a negligible processing overhead during normal operation.
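The base mechanism described in this abstract (buffering in the upstream operator, trimming as the downstream operator progresses, replay after a failure) can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover either proposed extension. The class names, the tumbling-window sum operator, and the rule of trimming up to the last closed window are all assumptions made for illustration.

```python
from collections import deque


class UpstreamOperator:
    """Hypothetical upstream operator that buffers events for possible replay."""

    def __init__(self):
        self.buffer = deque()  # (sequence number, event) pairs still needed for recovery
        self.next_seq = 0

    def send(self, event, downstream):
        seq = self.next_seq
        self.next_seq += 1
        self.buffer.append((seq, event))
        downstream.process(seq, event)

    def trim(self, acked_seq):
        # Drop every event the downstream operator no longer needs for recovery.
        while self.buffer and self.buffer[0][0] <= acked_seq:
            self.buffer.popleft()

    def replay(self, downstream):
        # Re-send the buffered events, in order, to a restarted operator.
        for seq, event in self.buffer:
            downstream.process(seq, event)


class SumWindowOperator:
    """Hypothetical downstream operator: sums events in tumbling windows."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = []           # volatile state, lost on failure
        self.results = []
        self.last_closed_seq = -1  # last event that belongs to a closed window

    def process(self, seq, event):
        self.window.append(event)
        if len(self.window) == self.window_size:
            self.results.append(sum(self.window))
            self.window = []
            self.last_closed_seq = seq


if __name__ == "__main__":
    upstream = UpstreamOperator()
    operator = SumWindowOperator(window_size=3)
    for value in [1, 2, 3, 4, 5]:
        upstream.send(value, operator)
        upstream.trim(operator.last_closed_seq)

    # Simulate a failure: the open window [4, 5] is lost, then rebuilt by replay.
    recovered = SumWindowOperator(window_size=3)
    upstream.replay(recovered)
    print(recovered.window)
```

In this sketch only the events of the still-open window remain buffered, so recovery replays two events rather than all five.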
ISBN: 9781509021758 (print)
Rate-constrained motion estimation (RCME) is the most computationally intensive task of H.265/HEVC encoding. Massively parallel architectures, such as graphics processing units (GPUs), used in combination with a multi-core central processing unit (CPU), provide a promising computing platform to achieve fast encoding. However, the dependencies in deriving motion vector predictors (MVPs) prevent the parallelization of prediction unit (PU) processing at a frame level. Moreover, the conditional execution structure of typical fast search algorithms is not suitable for GPUs designed for data-intensive parallel problems. In this paper, we propose a novel highly parallel RCME method based on multiple temporal motion vector (MV) predictors and a new fast nested diamond search (NDS) algorithm well-suited for a GPU. The proposed framework provides fine-grained encoding parallelism. Experimental results show that our approach provides reduced GPU load with better BD-Rate compared to prior full search parallel methods based on a single MV predictor.
In the field of parallel computing, Coarse-Grained Reconfigurable Architecture (CGRA) is a promising technique for processing parallel applications. Application kernels are mapped on CGRA through the calculation of ma...
ISBN: 9783319495835; 9783319495828 (print)
In the last few years, we have seen a significant increase in research on the energy efficiency of hardware and software components in both centralized and parallel platforms. In data centers, DBMSs are among the major energy consumers, since large amounts of data are queried daily by complex queries. Having green nodes is a precondition for designing an energy-aware parallel database cluster. Generally, most existing DBMSs focus on high performance during the query optimization phase, while usually ignoring the energy consumption of the queries. In this paper, we propose a methodology, supported by a tool called EnerQuery, that makes the nodes of parallel database clusters save energy when optimizing queries. To show its effectiveness, we implement our proposal on top of the PostgreSQL DBMS query optimizer. A mathematical cost model based on a machine learning technique is defined and used to estimate the energy consumption of SQL queries.
ISBN: 9781424421787 (print)
This paper considers the problem of digital predistortion of parallel Wiener-type systems using the Recursive Prediction Error Method (RPEM) and the Nonlinear Filtered-x Least Mean Squares (NFxLMS) algorithms. The RPEM algorithm is used to identify the parallel Wiener-type system and the FIR filter that represents the inverse of the linear kernels. The estimate of the nonlinear kernels and the inverse of the linear kernels are then used to construct the predistorter, as done in [1]. The NFxLMS algorithm, on the other hand, is used to directly estimate the coefficients of the predistorter, which is modeled using a Volterra series. A comparative simulation study of the two algorithms is given in this paper.
ISBN: 9781509052523 (print)
It is now a trend that computing power for High Performance Computing (HPC) and scientific computing is provided through parallelism, by multi-core systems or heterogeneous architectures. Although many algorithms have been proposed and implemented using sequential computing, parallel alternatives provide more suitable, higher-performance solutions to the same problems. In this paper, three parallelization strategies are proposed and implemented for a dynamic-programming-based cloud smoothing application, using both shared memory and non-shared memory approaches. The experiments are performed on NVIDIA GeForce GT750m and Tesla K20m, two GPU accelerators of the Kepler architecture. A detailed performance analysis is presented on partition granularity at block and thread levels, memory access efficiency and computational complexity. The evaluations show that the parallel implementations produce closely approximated results with high efficiency, and these strategies can be adopted in similar data analysis and processing applications.
ISBN: 9783642030949 (print)
In the sequential model of programming, instructions in a program are executed sequentially. Existing programming languages are mainly designed for the sequential model. As the programming paradigm shifts from sequential to distributed computing, existing sequential programming languages show their limitations. Nevertheless, sequential languages are the languages with which most programmers are most familiar. One of the motivations of this research is to implement a framework that supports the implementation of distributed applications using sequential programming languages such as C/C++, COBOL, and Java. In this paper, we present an implementation of a framework for open distributed programming. Allowing programmers to write distributed programs in their favorite sequential programming languages distinguishes this programming paradigm from existing ones.
ISBN: 0780375963 (print)
Parallel processing is a vital tool for many scientific and industrial applications where real-time constraints apply; in many applications the use of parallel processing and multiprocessor platforms seems to be the favourable solution for achieving acceptable throughput. Hence parallel processing algorithms are vital tools for achieving a good trade-off between hardware cost, system efficiency and power. In this paper, the one-dimensional generalised parallel block filter algorithm based on the overlap-add approach is implemented on a multi-DSP platform. The mathematical concepts of the input stage, the output stage and the generalised direct filter equation are given. The 1-D parallel algorithm is also shown and a suitable parallel architecture is presented.
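The overlap-add approach named in this abstract can be sketched as follows. This is a generic textbook form of overlap-add FIR filtering in plain Python, not the generalised parallel block filter of the paper. The function names and the `block_len` parameter are assumptions made for illustration.

```python
def convolve(a, b):
    """Direct linear convolution of two non-empty sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_value in enumerate(a):
        for j, b_value in enumerate(b):
            out[i + j] += a_value * b_value
    return out


def overlap_add_filter(x, h, block_len):
    """Filter signal x with FIR impulse response h, one input block at a time."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_len):
        # Each block is convolved independently of the others, so on a
        # multiprocessor platform the blocks could be handled by different processors.
        partial = convolve(x[start:start + block_len], h)
        # Tails of neighbouring partial results overlap and are added together.
        for k, value in enumerate(partial):
            y[start + k] += value
    return y


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    taps = [1, -1, 0.5]
    print(overlap_add_filter(signal, taps, block_len=3))
```

Because convolution is linear, summing the shifted per-block results gives the same output as convolving the whole signal at once, which is what makes the per-block work independent.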
ISBN: 9783319495835; 9783319495828 (print)
The optimal directed acyclic graph search problem consists of searching for a DAG with a minimum score, where the score of a DAG is defined on its structure. This problem is known to be NP-hard, and the state-of-the-art algorithm requires exponential time and space. It is thus not feasible to solve large instances using a single processor. Some parallel algorithms have therefore been developed to solve larger instances. A recently proposed parallel algorithm can solve an instance of 33 vertices, and this is the largest solved size reported thus far. In the study presented in this paper, we developed a novel parallel algorithm designed specifically to operate on a parallel computer with a torus network. Our algorithm crucially exploits the torus network structure, thereby obtaining good scalability. Through computational experiments, we confirmed that a run of our proposed method using up to 20,736 cores showed a parallelization efficiency of 0.94 as compared to a 1296-core run. Finally, we successfully computed an optimal DAG structure for an instance of 36 vertices, which is the largest solved size reported in the literature.
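To put the reported efficiency figure in context, the short calculation below converts it into a relative speedup. It assumes the common definition of relative parallel efficiency, E = (T_base × P_base) / (T_scaled × P_scaled); the abstract does not state which definition the authors use, and the function name is hypothetical.

```python
def relative_speedup(efficiency, base_cores, scaled_cores):
    """Speedup T_base / T_scaled implied by a relative parallel efficiency.

    Assumes efficiency = (T_base * base_cores) / (T_scaled * scaled_cores).
    """
    return efficiency * scaled_cores / base_cores


if __name__ == "__main__":
    # Figures from the abstract: efficiency 0.94, 20,736 cores versus 1296 cores.
    print(relative_speedup(0.94, 1296, 20736))
```

Under that assumption, using 16 times as many cores would correspond to a run roughly 15 times faster than the 1296-core baseline.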