This paper describes a parallel implementation of cone-beam CT reconstruction using MPI (Message Passing Interface) on workstations; it also analyzes the FDK algorithm and its parallel implementation.
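The usual way to parallelize FDK over MPI is to split the projection angles across ranks, filter and back-project each subset locally, and then sum the partial volumes with a reduction. A minimal sketch of the load-partitioning step (illustrative only, not the paper's code; the function name is assumed):

```python
# Illustrative sketch: balanced split of cone-beam projection angles
# across MPI ranks, a typical decomposition for parallel FDK.
def partition_projections(n_proj: int, n_ranks: int) -> list[range]:
    """Return one contiguous range of projection indices per rank,
    with sizes differing by at most one."""
    base, extra = divmod(n_proj, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

# Each rank filters and back-projects only its own angles; the partial
# volumes are then combined with an MPI reduction (e.g. MPI_Reduce).
print(partition_projections(10, 3))  # → [range(0, 4), range(4, 7), range(7, 10)]
```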
ISBN:
(Print) 9781509015948
The emergence of the new standard HEVC (High Efficiency Video Coding) is accompanied by serious problems related to resource consumption and encoding time. New tools and optimizations are strongly recommended to ensure the integration of this new encoder in various platforms and multimedia applications. In this context, the Kvazaar HEVC encoder was introduced to overcome the problems of the HEVC test model (HM) reference software. This academic open-source encoder is tailored to fit the programmer's needs by enabling high-level parallel processing. This paper presents different parallel implementations of the Kvazaar HEVC encoder on a powerful octa-core CubieBoard4 platform comprising a quad-core ARM A7 and a quad-core ARM A15 for power efficiency and high performance in a single chip. A performance comparison of different parallelization strategies is performed. For the single-threaded implementation, experimental results show that the high-speed preset (RD1) can save up to 48% and 91% of encoding time for the Random Access (RA) and All-Intra (AI) configurations, respectively. When moving to the multi-threaded implementation, the time saving is about 65% to 75% for the AI configuration. Moreover, experiments show that Wavefront Parallel Processing (WPP) outperforms tile-level parallelization in terms of encoding speed without inducing video quality degradation or bitrate increase.
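The WPP scheme mentioned above exploits the fact that a CTU row may start encoding once the row above is two CTUs ahead (the offset required for CABAC context inheritance). This yields the classic diagonal wavefront schedule, sketched below (a conceptual illustration, not Kvazaar code):

```python
def wpp_start_step(row: int, col: int) -> int:
    """Earliest step at which CTU (row, col) can start under Wavefront
    Parallel Processing: each CTU row lags the one above by two CTUs,
    giving the 2*row + col diagonal schedule."""
    return 2 * row + col

# CTUs sharing a step value are independent and can run on different threads.
schedule = [[wpp_start_step(r, c) for c in range(4)] for r in range(3)]
print(schedule)  # → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

The diagonal structure explains why WPP ramps up and drains gradually: full parallelism is only available in the middle of the frame.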
The parallel implementation of a novel mesh simplification method, based on a Beowulf cluster, is introduced in detail in this paper. Taking full advantage of the distributed memory and high-performance network, we can simplify out-of-core models quickly and avoid thrashing the virtual memory. In addition, file I/O and load balancing are also considered to ensure near-optimal utilization of the computational resources as well as high-quality output. A set of numerical experiments has demonstrated that our parallel implementation can not only reduce the execution time greatly but also obtain high parallel efficiency.
We present a parallel and linear-scaling implementation of the calculation of the electrostatic potential arising from an arbitrary charge distribution. The approach makes use of the multi-resolution basis of multiwavelets. The potential is obtained as the direct solution of the Poisson equation in its Green's function integral form. In the multiwavelet basis, the formally non-local integral operator decays rapidly to negligible values away from the main diagonal, yielding an effectively banded structure whose bandwidth is dictated only by the requested accuracy. This sparse operator structure has been exploited to achieve linear scaling and a parallel implementation, using both shared memory (OpenMP) and the Message Passing Interface (MPI). The implementation has been tested by computing the electrostatic potential of the electronic density of long-chain alkanes and diamond fragments, showing (sub)linear scaling with the system size and efficient parallelization.
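The linear scaling claimed above follows directly from the banded structure: applying a banded operator costs O(n · b) rather than O(n²), with b fixed by the accuracy rather than the system size. A generic sketch (not the paper's multiwavelet code; `band(i, j)` is an assumed callback returning the matrix element):

```python
def apply_banded(band, n, b, x):
    """y = A x for an effectively banded operator: band(i, j) returns
    A[i][j], assumed negligible for |i - j| > b.  Cost is O(n * b),
    i.e. linear in n for fixed bandwidth b."""
    y = [0.0] * n
    for i in range(n):
        for j in range(max(0, i - b), min(n, i + b + 1)):
            y[i] += band(i, j) * x[j]
    return y

# Tridiagonal all-ones operator on [1, 1, 1]:
print(apply_banded(lambda i, j: 1.0, 3, 1, [1.0, 1.0, 1.0]))  # → [2.0, 3.0, 2.0]
```

Rows are independent, so the outer loop also parallelizes trivially over OpenMP threads or MPI ranks.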
This work focuses on the control of a camera mounted on a differential drive robot via a VPC (Visual Predictive Control) scheme. First, an exact model of the visual feature prediction is presented for this robotic system. Next, relying on the equivalent command vector concept, a parallel implementation on a GPU (Graphics Processing Unit) of the computation of the cost function and its gradient is presented. Finally, results show that the proposed approach is more accurate than the ones classically used and can be up to six times faster than the CPU-based (Central Processing Unit) one for large prediction horizons and numerous visual features. It then becomes possible to implement a VPC controller running fast enough to perform navigation tasks, while guaranteeing closed-loop stability by relying on large prediction horizons.
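The cost function being parallelized is, in essence, a sum of squared visual-feature errors over the prediction horizon; every per-step, per-feature term is independent, which is what makes a one-thread-per-term GPU mapping possible. A minimal CPU reference of that structure (an assumed quadratic cost, not the paper's exact formulation):

```python
def vpc_cost(predicted, reference):
    """Sum of squared visual-feature errors over the prediction horizon.
    predicted/reference: list (per horizon step) of lists of feature values.
    Each term is independent, hence trivially parallelizable (one GPU
    thread per term, followed by a reduction)."""
    return sum((p - r) ** 2
               for feats_p, feats_r in zip(predicted, reference)
               for p, r in zip(feats_p, feats_r))

print(vpc_cost([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [2.0, 2.0]]))  # → 6.0
```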
The watershed transform has been used as a powerful morphological segmentation tool in a variety of image processing applications. This is because it gives a good segmentation result if a topographical relief and markers are suitably chosen for different types of images. This paper proposes a parallel implementation of the watershed transform on the cellular neural network (CNN) universal machine, called cellular watersheds. Owing to its fine-grain architecture, the watershed transform can be parallelized using local information. Our parallel implementation is based on a simulated immersion process. To evaluate our implementation, we have experimented on the CNN universal chip, ACE16k, with synthetic and real images.
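The simulated-immersion idea can be illustrated on a 1-D signal: pixels are flooded in increasing gray-level order, each joining the basin of an already-flooded neighbor, while unflooded pixels with no labeled neighbor seed new basins at minima. This toy sketch ignores plateau and tie handling, which the CNN implementation must treat properly:

```python
def watershed_1d(signal):
    """Toy immersion watershed on a 1-D signal (simplified: ties and
    plateaus resolved arbitrarily; boundary pixels joining two basins
    are assigned to one of them rather than marked as watershed lines)."""
    n = len(signal)
    label = [0] * n                      # 0 = not yet flooded
    next_label = 1
    for _, i in sorted((v, i) for i, v in enumerate(signal)):
        flooded = [label[j] for j in (i - 1, i + 1) if 0 <= j < n and label[j]]
        if flooded:
            label[i] = flooded[0]        # join an existing basin
        else:
            label[i] = next_label        # new minimum -> new basin
            next_label += 1
    return label
```

Only neighbor labels are inspected at each step, which is exactly the locality the abstract exploits on the fine-grain CNN architecture.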
Modeling complex material failure with competing mechanisms is a difficult task that often leads to mathematical and numerical challenges. This work contributes to the study of localized failure mechanisms by means of phase fields in a variational framework: in addition to the treatment of brittle and ductile fracture, done in previous work, we consider the case of shear band formation followed by ductile fracture. To achieve this, a new degradation function is introduced, which distinguishes between two successive failure mechanisms: (i) plastic strain localization and (ii) ductile fracture. Specifically, the onset of elastic damage is delayed to allow for the formation of shear bands driven by plastic deformations, thus accounting for the mechanisms that precede the coalescence of voids and microcracks into macroscopic ductile fractures. Once a critical degradation value has been reached, a phase-field model is introduced to capture the (regularized) kinematics of macroscopic cracks. To tackle the issue of potentially high computational cost, we propose a parallel implementation of the phase-field approach based on an iterative algorithm. The algorithm was implemented within the Alya system, a high performance computational mechanics code. Several examples show the capabilities of our implementation. We pay special attention to the ability to capture different failure mechanisms.
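The key ingredient described above is a degradation function that stays inert while plastic strain localization develops and only degrades stiffness past a critical value. A hypothetical sketch of such a two-stage function (the threshold name, rescaling, and quadratic form are illustrative assumptions, not the paper's exact definition):

```python
def degradation(d, d_crit=0.5):
    """Hypothetical two-stage degradation function g(d):
    g = 1 while d < d_crit, so shear bands can form first by plastic
    localization; beyond d_crit, a standard quadratic phase-field
    degradation of the remaining range drives ductile fracture."""
    if d < d_crit:
        return 1.0
    s = (d - d_crit) / (1.0 - d_crit)    # rescale [d_crit, 1] to [0, 1]
    return (1.0 - s) ** 2

print(degradation(0.2), degradation(0.75), degradation(1.0))  # → 1.0 0.25 0.0
```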
Parallel implementations of RLS algorithms over systolic architectures are considered and their efficiency in terms of estimate updating rate is discussed. New implementations are proposed, which allow higher throughputs (up to O(1) estimate updates per time unit). Since some of them introduce a distortion with respect to exact RLS, their performance is investigated both analytically and experimentally. Tradeoffs between complexity and performance are discussed.
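For reference, the recursion that the systolic architectures pipeline is the standard RLS update. A minimal scalar (single-weight) version, shown only to fix notation; the paper's contribution is in distributing these updates across processing cells, not in the recursion itself:

```python
def rls_scalar(xs, ds, lam=0.99, delta=100.0):
    """Minimal scalar RLS with forgetting factor lam and
    initialization p = delta (standard textbook recursion)."""
    w, p = 0.0, delta                       # estimate, inverse correlation
    for x, d in zip(xs, ds):
        g = p * x / (lam + x * p * x)       # gain
        e = d - w * x                       # a priori error
        w += g * e
        p = (p - g * x * p) / lam
    return w
```

With a constant input x = 1 and desired output d = 2, the estimate converges quickly to the true weight 2.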
Background: The huge quantity of data produced in biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard-to-solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general-purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed iteratively by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), forming a probability distribution that guides the selection of the term literals. Its great versatility makes U-BRAIN applicable in many fields in which there are data to be analyzed. However, the memory and execution time required are of order O(n^3) and O(n^5), respectively, so the algorithm is unaffordable for huge data sets. Results: We find mathematical and programming solutions able to lead us towards the implementation of the U-BRAIN algorithm on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to r
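The relevance-driven selection step described above can be sketched as follows: per-literal relevances are normalized into a probability distribution, and the most relevant literal is added to the current conjunctive term. This is a simplified illustration of the selection step only, with assumed names; it omits the condition sets and missing-bit handling:

```python
def select_literal(relevances):
    """Normalize per-literal relevances into a probability distribution
    and pick the index of the most relevant literal (greedy choice)."""
    total = sum(relevances)
    probs = [r / total for r in relevances]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

print(select_literal([1.0, 3.0, 1.0]))  # → (1, [0.2, 0.6, 0.2])
```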
Satellite onboard processing for hyperspectral imaging applications is characterized by large data sets, limited processing resources and limited bandwidth of communication links. The CCSDS-123 algorithm is a specialized compression standard assembled for space-related applications. In this paper, a parallel FPGA implementation of CCSDS-123 compression algorithm is presented. The proposed design can compress any number of samples in parallel allowed by resource and I/O bandwidth constraints. The CCSDS-123 processing core has been placed on Zynq-7035 SoC and verified against the existing reference software. The estimated power use scales approximately linearly with the number of samples processed in parallel. Finally, the proposed implementation outperforms the state-of-the-art implementations in terms of both throughput and power.
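The CCSDS-123 predictor works on a causal neighborhood of each sample. As a flavor of the kind of per-sample arithmetic the FPGA pipelines in parallel, here is a neighbor-oriented local sum in the style of the standard, restricted to interior samples (edge handling and the full local-difference/weight machinery of the standard are omitted, so this is a sketch, not a conformant implementation):

```python
def local_sum(img, y, x):
    """Neighbor-oriented local sum of the W, NW, N, NE neighbors of
    sample (y, x), valid for interior samples only (simplified sketch
    in the style of CCSDS-123; the standard defines edge cases too)."""
    return img[y][x - 1] + img[y - 1][x - 1] + img[y - 1][x] + img[y - 1][x + 1]

img = [[1, 2, 3],
       [4, 5, 6]]
print(local_sum(img, 1, 1))  # → 10  (4 + 1 + 2 + 3)
```

Because each sample's neighborhood is small and causal, independent samples can be streamed through replicated hardware pipelines, which is what lets the design scale throughput with the number of parallel samples.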