ISBN: 9783642144028 (print)
Parallel algorithms for particle tracking are central to the modeling of a wide range of physical processes, including cloud formation, spray combustion, flows of ash from wildfires, and reactions in nuclear systems. Here we focus on tracking the motion of cloud droplets with radii in the range from 10 to 60 μm that are suspended in a turbulent flow field. Gravity and droplet inertia are considered simultaneously. Our codes for turbulent flow and droplet motion are fully parallelized in MPI (Message Passing Interface), allowing efficient computation of dynamic and kinematic properties of a polydisperse suspension with more than 10^7 droplets. Previous direct numerical simulations (DNS) of turbulent collision, due to their numerical complexity, are typically limited to small Taylor-microscale flow Reynolds numbers (~100), or equivalently to a small physical domain size at a given flow dissipation rate in a turbulent cloud. The difficulty lies in the need to treat simultaneously a field representation of the turbulent flow and the free movement of particles. We demonstrate here how particle tracking and collision can be handled within the framework of a specific domain decomposition. Our newly developed MPI code can be run on computers with distributed memory and as such can take full advantage of available computational resources. We discuss the scalability of five major computational tasks in our code, including collision detection, advancing particle position, fluid velocity interpolation at particle location, and implementation of the periodic boundary condition, using up to 128 CPUs. In most tested cases we achieved parallel efficiency above 100% due to a reduction in effective memory usage. Finally, our MPI results for pair statistics are validated against a previous OpenMP implementation.
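Two of the tasks named in this abstract, advancing particle position and applying the periodic boundary condition, can be illustrated with a minimal single-process sketch. It is not the authors' MPI code: the explicit Euler step (which ignores droplet inertia and gravity), the cubic box of side `box`, the 1D slab decomposition along z, and the function names `advance_periodic` and `owner_rank` are all assumptions made for illustration.

```python
def advance_periodic(positions, velocities, dt, box):
    """Explicit Euler step, then wrap every coordinate back into [0, box).

    Assumption: a plain Euler update stands in for the real droplet
    equation of motion, which would also include inertia and gravity.
    """
    new_positions = []
    for (x, y, z), (u, v, w) in zip(positions, velocities):
        # Python's % returns a non-negative result for a positive divisor,
        # so particles leaving through either face re-enter on the other side.
        new_positions.append((
            (x + u * dt) % box,
            (y + v * dt) % box,
            (z + w * dt) % box,
        ))
    return new_positions


def owner_rank(position, box, n_ranks):
    """Rank that owns a particle under an assumed 1D slab decomposition in z."""
    slab = box / n_ranks
    # min() guards against a coordinate that rounds to exactly `box`.
    return min(int(position[2] // slab), n_ranks - 1)
```

In a distributed-memory code, a particle whose `owner_rank` changes after a step would be sent to the new owner; that communication step is omitted here.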
ISBN: 9789623676960 (print)
We investigate a scheduling problem for parallel batch-processing machines with deterioration and non-identical job release times. The objective is to minimize the makespan. A linear programming model is formulated. Considering the complexity of the problem, a filter-and-fan based heuristic is proposed. Three neighborhoods based on the characteristics of the problem are designed and used in the filter-and-fan method. To reduce the risk of the search being trapped in a local optimum, a reconstructive strategy is proposed to strengthen the diversity of solutions in the search process. A speedup strategy is also proposed to shorten the run time of the proposed method. The computational results show that for most small-scale instances the heuristic obtains optimal solutions. For all large-scale instances, the heuristic significantly outperforms a standard commercial solver in both run time and solution quality.
Video processing is computationally intensive and often has accompanying real-time or super-real-time requirements. For example, video tagging and surveillance systems need to robustly analyze video and automatically recognize faces in real time. The semiconductor industry has shifted from increasing clock speeds to a strategy of growth through increasing core counts. This shift from single-core to multi-core presents a major challenge to application developers, who must exploit sufficient parallelism in performance-sensitive applications. This gives rise to a new computation paradigm for developing more advanced algorithms. In this paper, we present a method to efficiently parallelize face detection, which can be extended to any object detection algorithm for SMP architectures. We also show that well-designed parallel code for a face detection algorithm yields a performance gain in excess of 2X on dual-core systems.
Many hardware-efficient algorithms exist for hardware signal processing architectures. Among these is a set of shift-add algorithms collectively known as CORDIC (Coordinate Rotation Digital Computer) for computing a wide range of functions, including certain trigonometric, hyperbolic, linear and logarithmic functions. The paper compares different CORDIC architectures with respect to their area, speed, and data throughput, in three major styles: iterative, parallel and pipelined structures. All three designs were written in VHDL, simulated using the ModelSim simulator, and implemented using Xilinx FPGA synthesis and Synopsys ASIC synthesis tools.
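The shift-add idea behind CORDIC can be sketched in software as follows. This is an illustrative model of the iterative style in rotation mode, not the VHDL designs compared in the paper. The 30-bit fixed-point format (`FRAC_BITS`), the iteration count, and the function name `cordic_sin_cos` are assumptions; the angle table and gain constant are computed here with floating point, where hardware would store them in a small ROM.

```python
import math

FRAC_BITS = 30           # assumed fixed-point format: 30 fractional bits
ONE = 1 << FRAC_BITS


def cordic_sin_cos(theta, iterations=24):
    """Rotation-mode CORDIC for theta in [-pi/2, pi/2]; returns (sin, cos)."""
    # Precomputed constants: elementary angles atan(2^-i) and the CORDIC gain.
    angles = [round(math.atan(2.0 ** -i) * ONE) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start on the x axis, pre-scaled by 1/gain so the result has unit length.
    x, y, z = round(ONE / gain), 0, round(theta * ONE)
    for i in range(iterations):
        # Each micro-rotation uses only shifts and additions.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - angles[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + angles[i]
    return y / ONE, x / ONE
```

A parallel or pipelined structure would unroll this loop into one hardware stage per iteration, trading area for throughput.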
ISBN: 9783642142451 (print)
Traditional techniques for processing continuous queries on moving objects reduce query re-computation through single-threaded, shared execution between multiple queries, and do not make use of the parallel computing capabilities of the ubiquitous multi-core CPUs. To exploit this kind of parallelism, a Multi-threading based Framework for Continuous Queries (MFCQ) is proposed, which adopts a strategy of re-computing all of the queries periodically. The framework divides query processing into three phases: updating, optimization and execution. Multi-threading based methods are used in each phase. Moreover, the framework is general, because it is compatible with various index techniques and query algorithms. Using the framework, a query-index-based KNN algorithm and an object-index-based KNN algorithm are proposed. Experimental results show that the multi-threading framework executed on a multi-core platform outperforms the traditional YPK-CNN algorithm.
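The strategy of periodically re-computing every query can be sketched for the execution phase alone. This is not the MFCQ implementation: it uses brute-force search instead of a query or object index, omits the updating and optimization phases, and the names `knn` and `recompute_all` are hypothetical. Note also that CPython threads do not run CPU-bound work in parallel, so this shows the structure of the approach rather than its speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def knn(query, objects, k):
    """Brute-force k nearest neighbours of one 2D query point (object ids)."""
    qx, qy = query
    candidates = (((x - qx) ** 2 + (y - qy) ** 2, oid)
                  for oid, (x, y) in objects.items())
    return [oid for _, oid in heapq.nsmallest(k, candidates)]


def recompute_all(queries, objects, k, workers=4):
    """Re-evaluate every continuous query against one snapshot of positions."""
    snapshot = dict(objects)  # freeze positions for this evaluation period
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Queries are independent, so each one can be handed to a worker thread.
        results = pool.map(lambda q: knn(q, snapshot, k), queries.values())
        return dict(zip(queries.keys(), results))
```

Calling `recompute_all` once per period, after object positions have been updated, corresponds to the periodic re-computation described in the abstract.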
This paper presents an evaluation of radix-2, radix-4 and radix-8 algorithms for N-point FFTs on a homogeneous Multi-Processor System-on-Chip prototyped on an FPGA device. The algorithms were evaluated by profiling them and comparing the results against a single-processor architecture. Performance was evaluated in terms of required clock cycles, achieved speed-up and parallelization efficiency. The analysis showed, for each algorithm, how the parallelization efficiency grows when moving from small to larger FFTs. Moreover, the comparison between the different implementations showed the parallelization properties of each algorithm. The radix-2 algorithm shows the best speed-up and parallelization efficiency, while radix-4 gives the best performance in terms of required clock cycles.
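For reference, the radix-2 decomposition and the two metrics mentioned above can be written down in a few lines. This is a textbook recursive decimation-in-time sketch, not the multi-processor implementation evaluated in the paper. The metric definitions (speed-up as the ratio of single-processor to parallel cycle counts, efficiency as speed-up divided by the number of processors) are the usual ones and are assumed to match the paper's; the function names are hypothetical.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def speedup_and_efficiency(cycles_single, cycles_parallel, n_processors):
    """Assumed standard definitions of speed-up and parallelization efficiency."""
    speedup = cycles_single / cycles_parallel
    return speedup, speedup / n_processors
```

The two recursive calls are independent of each other, which is the property a multi-processor mapping can exploit.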
ISBN: 9781424473359 (print)
This paper presents preliminary PhD research towards developing a framework to evaluate and optimize application mapping algorithms for Network-on-Chip (NoC) architectures. Several such algorithms have been proposed for mapping the threads of a parallel application onto a NoC architecture. However, the performance of those algorithms has been evaluated only on some specific NoC designs. A unified approach for evaluating such algorithms allows a better comparison of their performance and can potentially lead to some optimizations. The proposed framework is intended to be flexible so that the algorithms can be tested on different NoC designs. To this end, a scalable and flexible Network-on-Chip simulator is proposed. Some preliminary results obtained with this simulator are also presented. They show the flexibility of the simulator and that it is suitable for addressing the application mapping problem in a unified manner.
This work presents a new method to numerically calculate the signal to jitter distortion ratio (SDjR) for any modulated input that is bandpass sampled in the presence of jitter. The numerical method matches the derived analytical equations. It is shown that the impact of white phase noise depends on the signal center frequency and the subsampling factor, while the impact of correlated jitter depends mostly on the signal center frequency. By separating the impact of both phase noise types on the SDjR, specifications for PLL phase noise are derived for a given sampling frequency.
In this paper, a hybrid parallel computing framework is proposed for video understanding and retrieval. It is a unified computing architecture based on the Map-Reduce programming model, which supports multi-core and GPU architectures. A key task scheduler is designed for the parallelization of computation tasks. The SVM method is used to train models for video understanding purposes. To effectively shorten the training and processing time, the hybrid computing framework is used to train large-scale SVM models. The TRECVID database is used as the basic experimental content for video understanding and retrieval. Experiments were conducted on two 8-core servers, each equipped with an NVIDIA Quadro FX 4600 graphics card. Results showed that the proposed parallel computing framework works well for the video understanding and retrieval system, speeding up system development and providing better performance.