ISBN (digital): 9783642551956
ISBN (print): 9783642551956
A hybrid genetic algorithm for global multi-objective optimization is parallelized and applied to competitive facility location problems. The impact of local search on the performance of the parallel algorithm is investigated. An asynchronous version of the parallel genetic algorithm with local search is proposed and evaluated on a competitive facility location problem, using a hybrid distributed- and shared-memory parallel programming model on a high-performance computing system.
ISBN (print): 9783319111971; 9783319111964
Hybrid parallel file systems (PFSs), which consist of both HDD and SSD servers, provide a promising solution for data-intensive applications. In this study, we propose a performance-aware data placement (PADP) strategy to enable efficient data layout in hybrid PFSs. The basic idea of PADP is to distribute data across file servers using adaptive, varied-size file stripes based on each server's storage performance. Using a data access cost model and a linear programming optimization method, PADP determines the appropriate stripe size for each file server. We have implemented PADP within OrangeFS, a widely used parallel file system in the HPC domain. Experimental results with a representative benchmark show that PADP can significantly improve the I/O performance of hybrid PFSs.
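The idea of varied-size stripes can be illustrated with a much simpler rule than the cost model and linear program the abstract describes. The sketch below is an assumption made for illustration only, not PADP itself: it splits one striping round in proportion to each server's bandwidth, so that faster (SSD) servers receive larger stripes. The function name, the bandwidth figures, and the 4 KiB alignment are all hypothetical.

```python
def proportional_stripes(round_kib, bandwidths_mib_s, align_kib=4):
    """Split one striping round across servers in proportion to bandwidth.

    A simplification for illustration: the abstract's method uses a data
    access cost model and linear programming, which this does not model.
    """
    total_bw = sum(bandwidths_mib_s)
    # Ideal proportional share per server, rounded down to the alignment unit.
    stripes = [
        int(round_kib * bw / total_bw) // align_kib * align_kib
        for bw in bandwidths_mib_s
    ]
    # Hand the rounding leftover to the fastest server so the round is fully covered.
    fastest = max(range(len(bandwidths_mib_s)), key=bandwidths_mib_s.__getitem__)
    stripes[fastest] += round_kib - sum(stripes)
    return stripes


# Hypothetical setup: two HDD servers (100 MiB/s) and two SSD servers (400 MiB/s).
stripes = proportional_stripes(1024, [100, 100, 400, 400])
print(stripes)
```

With equal stripes, the slow servers would dominate the completion time of each round; a proportional split aims to let all servers finish at roughly the same time.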
ISBN (print): 9781479960163
Quantum computer simulators play an important role in evaluating quantum algorithms. Quantum computation can be regarded as parallel computation in some sense, so it is well suited to simulation on hardware that can process many operations in parallel. In this paper, we propose a processor architecture dedicated to simulating quantum algorithms. The proposed architecture is based on a register reordering method that shifts and swaps the registers containing probability amplitudes so that the probability amplitudes of the target basis states can be selected quickly. This reduces the number of large multiplexers and improves the clock frequency. We implemented the processor on an FPGA. Experimental results show that the proposed processor scales with the number of quantum bits and can simulate quantum algorithms faster than software simulators.
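To make the amplitude-selection step concrete, here is a minimal software sketch of how a state-vector simulator applies a single-qubit gate: it pairs up the amplitudes of basis states that differ only in the target bit and mixes each pair with the 2x2 gate matrix. This is a generic textbook procedure written for illustration, not the proposed hardware architecture; the function name and the bit-ordering convention (qubit 0 is the least significant bit) are assumptions.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector (list of amplitudes)."""
    stride = 1 << target
    new_state = list(state)
    for i in range(len(state)):
        if i & stride == 0:
            j = i | stride  # basis state that differs from i only in the target bit
            a0, a1 = state[i], state[j]
            new_state[i] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[j] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


h = 1 / math.sqrt(2)
HADAMARD = [[h, h], [h, -h]]

# Two qubits initialised to |00>, then a Hadamard on qubit 0.
state = apply_single_qubit_gate([1, 0, 0, 0], HADAMARD, 0)
print(state)
```

Every pair `(i, j)` is independent of the others, which is the parallelism a dedicated processor can exploit; fetching the two amplitudes of each pair quickly is the selection problem that the register reordering method addresses.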
Many-core architectures play an important role in HPC systems, but they deliver high performance at the cost of high electrical power consumption. On the Tianhe-2 supercomputer, the Xeon Phi many-core pro...
Super-resolution (SR) is a technique for recovering a high-resolution (HR) image from different noisy low-resolution (LR) images. The high-frequency components missing from the LR images should be restored correctly in the HR image....
Because of the power and size constraints of wireless capsule endoscopy, the principal challenge is to reduce area and power consumption while preserving acceptable image reconstruction and coding quality. In this paper, we present a low-complexity, efficient architecture for a 1D-DCT based on the CORDIC-Loeffler technique for wireless capsule endoscopy. Our improvement over the original algorithm is in the CORDIC part, where the number of addition operations is reduced from 18 to 10. As a result, the number of additions in the main algorithm is reduced from 38 to 30. To improve the results further, we use a Modified Carry Look-Ahead adder (MCLA) and a Carry Save Adder (CSA), which offer lower power and higher speed than the classical Carry Look-Ahead adder (CLA). Our aim is to provide an architecture optimized for area and power consumption. The proposed design has been implemented on an FPGA. Compared with other architectures, the proposed architecture reduces not only the computational complexity but also the area and power consumption. The proposed DCT architecture is well suited to low-power, high-quality codecs, especially in battery-powered systems.
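For readers unfamiliar with CORDIC, the sketch below shows the standard rotation-mode iteration, which performs a plane rotation using only additions and multiplications by powers of two (shifts, in hardware). It is the textbook algorithm, not the modified CORDIC stage of the paper, and it uses floating point for readability where a hardware design would use fixed-point arithmetic. The function name, the iteration count, and the example angle are assumptions chosen for illustration.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians with rotation-mode CORDIC.

    Converges for |angle| up to roughly 1.74 rad.
    """
    # Accumulated scale factor of the micro-rotations, compensated at the end.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)  # read from a small lookup table in hardware
    return x / gain, y / gain


# Example: rotate the unit vector by 3*pi/16 (angle chosen for illustration).
theta = 3 * math.pi / 16
cx, sy = cordic_rotate(1.0, 0.0, theta)
print(cx, sy, math.cos(theta), math.sin(theta))
```

Each iteration costs two additions for the vector update plus one for the angle, which is why reducing the number of CORDIC additions translates directly into smaller area and lower power.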
ISBN (print): 9789380544120
GPUs (Graphics Processing Units) are designed to solve large data-parallel problems encountered in image processing, scene rendering, video playback, and gaming. GPUs are therefore designed to handle a higher degree of parallelism than conventional CPUs. GPGPU (General-Purpose computing on Graphics Processing Units) enables users to do parallel computing on the graphics hardware commonly available in current personal computers. Present-day systems ship with multi-core GPUs that provide the necessary hardware infrastructure, enabling high-performance computing on personal computers. NVIDIA's CUDA (Compute Unified Device Architecture) and the industry-standard OpenCL (Open Computing Language) provide the software platform required to use the graphics hardware to solve computational problems with parallel algorithms, problems otherwise solvable mostly in supercomputing environments. This paper presents two parallel CREW (Concurrent Read Exclusive Write) PRAM algorithms for optimal coloring of general graphs on stream processing architectures such as the GPU. The algorithms are implemented using OpenCL. The first algorithm presents techniques for computing vertex independent sets on the GPU and then assigns colors to them. The second algorithm focuses on optimizing the vertex independent set computation for edge-transitive graphs by taking advantage of the structure of such graphs, and then assigns a color to each of the normalized independent sets.
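The relationship between independent sets and coloring can be sketched sequentially: repeatedly extract a maximal independent set from the uncolored vertices and give the whole set one color. The code below is a plain greedy illustration of that idea under stated assumptions; it is neither the paper's CREW PRAM algorithms nor an OpenCL implementation, and unlike the optimal coloring the abstract describes, a greedy pass is not guaranteed to use the minimum number of colors. The function name and the adjacency-dictionary input format are hypothetical.

```python
def color_by_independent_sets(adjacency):
    """Color a graph by peeling off one maximal independent set per color.

    `adjacency` maps each vertex to the set of its neighbours.
    """
    uncolored = set(adjacency)
    colors = {}
    color = 0
    while uncolored:
        independent = set()
        # Greedily grow a maximal independent set among the uncolored vertices.
        for v in sorted(uncolored):
            if not (adjacency[v] & independent):
                independent.add(v)
        # No two vertices in the set are adjacent, so they can share a color.
        for v in independent:
            colors[v] = color
        uncolored -= independent
        color += 1
    return colors


# Example: a 4-cycle 0-1-2-3-0.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
colors = color_by_independent_sets(cycle)
print(colors)
```

On a GPU, the membership test for each candidate vertex is the part that can run in parallel, since each vertex only needs to read its neighbours' status.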
ISBN (digital): 9783319111940
ISBN (print): 9783319111940; 9783319111933
Conventional software speculative parallel models face challenges from the increasing number of processor cores and the diversification of applications. Speculation accuracy is one of the key factors in the performance of a software speculative parallel model. In this paper, we propose a novel value prediction mechanism named Inter-thread Fetching Value Prediction (IFVP). It allows a speculative thread to speculatively read the values of conflicting variables from another speculative thread. This method can markedly reduce the misspeculation rate when parallelizing a loop with cross-iteration dependencies. We show that IFVP improves speculation accuracy by about 19.1% on average and performance by about 37.1% on average, compared with conventional models without value prediction.