ISBN: 9781479938445 (print)
This paper proposes a novel approach to task scheduling in an automatically parallelized XQuery implementation. The approach addresses the scheduling problem in a shared-memory multithreaded environment and includes three strategies: task parallelism, data parallelism, and pipeline parallelism. An automaton model is established for pipeline parallelism and is used to reduce the idle time between pipeline stages. Experimental results show that the approach improves performance and has good memory efficiency.
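As a rough illustration of the pipeline parallelism mentioned in this abstract, the following minimal sketch runs each stage in its own thread and connects stages with queues. It is a generic producer-consumer pipeline, not the paper's automaton model, and the names `stage`, `run_pipeline`, and `_DONE` are hypothetical.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of the item stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # forward the sentinel so the next stage stops too
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Push items through funcs with one thread per stage, preserving item order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

A stage that blocks on `inbox.get()` is idle in the sense the abstract describes; the paper's automaton model targets that idle time, which this sketch does not attempt to reduce.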
ISBN: 9781479938445 (print)
We present new parallel algorithms for testing pattern involvement for all permutations of length 4. Our algorithms take O(log n) time with n/log n processors on the CREW PRAM model, and O(log log log n) time with n/log log log n processors, or constant time with n log^3 n processors, on the CRCW PRAM model. No parallel algorithms had previously been designed for some of these patterns, and for the other patterns the best previous algorithms require O(log n) time and n processors on the CREW PRAM model.
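To make the problem statement concrete: a permutation involves a pattern if some subsequence of it has the same relative order as the pattern. The sketch below is a sequential brute-force check that tries every subsequence of the pattern's length. It is only a reference definition, not the PRAM algorithms of the paper, and the function names are hypothetical.

```python
from itertools import combinations


def order_pattern(values):
    """Return the relative-order pattern of distinct values, e.g. (30, 10, 20) -> (3, 1, 2)."""
    ranks = {v: r for r, v in enumerate(sorted(values), start=1)}
    return tuple(ranks[v] for v in values)


def involves(perm, pattern):
    """Brute-force test: does perm contain a subsequence order-isomorphic to pattern?"""
    target = tuple(pattern)
    # combinations() keeps elements in their original positions' order,
    # so each tuple it yields is a genuine subsequence of perm.
    return any(order_pattern(sub) == target for sub in combinations(perm, len(target)))
```

For a length-4 pattern this check examines on the order of n^4 subsequences, which is the kind of cost that efficient sequential and parallel algorithms are designed to avoid.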
ISBN: 9781479938445 (print)
The traditional multithreaded programming model offers no dedicated performance optimization strategy for Many Integrated Core (MIC) heterogeneous systems. To fully exploit the high computing power of the MIC processor, this paper discusses specific program porting and performance optimization strategies for MIC heterogeneous parallel systems, using a k-means application as the case study. Experimental results show that the proposed porting and optimization strategies are effective and can guide programmers in porting and optimizing applications for MIC heterogeneous parallel systems.
ISBN: 9781479938445 (print)
Based on GPU parallel technology, this paper proposes a parallel SRM feature extraction algorithm to accelerate the extraction of SRM features for steganalysis of HUGO images. Using the OpenCL parallel programming framework for GPUs, we parallelize a serial algorithm and apply several optimization techniques to accelerate the extraction process. The techniques include convolution unrolling, combined memory access, and avoidance of bank conflicts. Experimental results show that, for images of different sizes, the proposed parallel extraction algorithm is 25 to 55 times faster than the original serial algorithm and 2 to 4.2 times faster than running the parallel method on a quad-core CPU.
ISBN: 9781479938445 (print)
Three-dimensional wave propagation models of the parabolic approximation type are widely used in ocean exploration. For this application, FOR3D is one of the most widely used models because it takes azimuthal coupling into account and therefore yields better precision. When the precision requirement is high, the computation becomes large-scale and cannot be completed on a single computer or within a constrained time. In this paper, we propose a parallel method that decomposes the computation task into small pieces. Each processor then takes one piece and runs FOR3D independently. Our method was implemented on Windows Azure, and the results show nearly linear speed-up.
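The decomposition idea in this abstract (split the work into pieces, run each piece independently, gather the results) can be sketched as follows. This is a minimal illustration under stated assumptions: `run_piece` is a hypothetical stand-in for one independent model run, the even contiguous split is an assumed policy, and a thread pool is used only to keep the example self-contained, whereas the paper runs on separate Windows Azure processors.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_pieces(items, n_pieces):
    """Split items into n_pieces contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_pieces)
    pieces, start = [], 0
    for i in range(n_pieces):
        size = base + (1 if i < extra else 0)
        pieces.append(items[start:start + size])
        start += size
    return pieces


def run_piece(piece):
    """Hypothetical stand-in for one independent model run over a piece of the task."""
    return [x * x for x in piece]


def run_decomposed(items, n_workers):
    """Run every piece independently and concatenate the results in input order."""
    pieces = split_into_pieces(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_piece, pieces))  # map() preserves piece order
    return [y for part in results for y in part]
```

Because the pieces share no state, the only coordination is the final gather, which is the property that makes near-linear speed-up plausible for this kind of decomposition.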
ISBN: 9781479938445 (print)
In recent years, multi-core digital signal processors (DSPs) have been widely used to improve execution efficiency in a variety of applications. To fully exploit the parallel processing capacity of DSPs, a well-designed parallel programming model is essential for programmers. In this paper, a parallel programming model (PPMA) for a self-designed multi-core audio DSP (MAD) is proposed, based on both shared-memory and message-passing communication mechanisms. PPMA provides a set of application program interfaces (APIs) for efficient inter-core data transmission and synchronization control. To evaluate the performance improvement of audio applications using PPMA, a low bit-rate speech codec application is ported to the MAD. With the help of PPMA, task scheduling of the speech codec can be implemented conveniently. Experimental results also show that the overhead of inter-core communication in MAD is negligible compared to the parallel speedup achieved by PPMA.
ISBN: 9781479938445 (print)
The proceedings contain 56 papers. The topics discussed include: design and evaluation of dynamically-allocated multi-queue buffers with multiple packets for NoC routers; algorithmic aspects for bi-objective multiple-choice hardware/software partitioning; fault-tolerant distributed publish/subscribe using self-stabilization; a runtime framework for GPGPU; a sensitive and robust grid reputation system based on rating of recommenders; efficient FPGA-mapping of 1024 point FFT pipeline SDF processor; multi-parameter online identification algorithm of induction motor for hybrid electric vehicle applications; wide area power system fault detection using compressed sensing to reduce the WAN data traffic; PSO applied to optimal operation of a micro-grid with wind power; and a fuzzy multi-objective optimization method solving the output of energy storage system.
ISBN: 9781665496391 (print)
6G is a space-terrestrial integrated network that combines the Internet of Things (IoT) with satellite networks, and data security in 6G networks has become a research hotspot. Introducing blockchain technology into 6G networks can effectively improve the safety of data storage. However, with the continuous addition of IoT nodes and satellite nodes, more and more data are stored in blocks, which makes the problems of large storage space and low storage efficiency prominent. Therefore, this paper proposes a lightweight block storage security scheme for 6G networks based on LT coding. In the proposed scheme, satellite nodes are responsible for the complete storage of the blockchain: blocks are encoded with an LT code and stored at the satellite nodes in the form of slices, while the latest blocks are stored in IoT nodes. This greatly reduces the storage burden of the 6G network. To suit the large number of nodes in a 6G network, the traditional blockchain indexing method is improved: IoT nodes can calculate the jump block number from their own block number, enabling jump indexing and verification of blocks. Finally, simulation results show that the proposed storage scheme has superior performance in block recovery success rate and block storage occupancy.
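The LT coding step in this abstract can be illustrated with a minimal sketch: each encoded slice is the XOR of a randomly chosen subset of source blocks, and a peeling decoder recovers blocks from slices that have exactly one unknown block left. The ideal soliton degree distribution, the equal-length byte blocks, and all function names here are assumptions for illustration, since the abstract does not specify them.

```python
import random


def ideal_soliton(k):
    """Ideal soliton weights for degrees 1..k (an assumed degree distribution)."""
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]


def xor_bytes(a, b):
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def lt_encode(blocks, n_symbols, seed=0):
    """Produce n_symbols slices; each is (set of source indices, XOR of those blocks)."""
    rng = random.Random(seed)
    k = len(blocks)
    degrees = list(range(1, k + 1))
    weights = ideal_soliton(k)
    symbols = []
    for _ in range(n_symbols):
        d = rng.choices(degrees, weights=weights)[0]
        idx = rng.sample(range(k), d)
        payload = blocks[idx[0]]
        for i in idx[1:]:
            payload = xor_bytes(payload, blocks[i])
        symbols.append((frozenset(idx), payload))
    return symbols


def lt_decode(symbols, k):
    """Peeling decoder; returns {index: block} for every block it could recover."""
    pending = [(set(idx), payload) for idx, payload in symbols]
    recovered = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for pos, (idx, payload) in enumerate(pending):
            # Remove already recovered blocks from this slice.
            for i in list(idx):
                if i in recovered:
                    payload = xor_bytes(payload, recovered[i])
                    idx.discard(i)
            pending[pos] = (idx, payload)
            if len(idx) == 1:
                (i,) = idx
                recovered[i] = payload  # slice now equals exactly one source block
                progress = True
    return recovered
```

Decoding is probabilistic: with too few slices, `lt_decode` returns only a partial mapping, which corresponds to the block recovery success rate that the abstract reports as an evaluation metric.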
The biggest problem to be solved in grid computing is the efficiency and stability of data transmission and communication. For computing grids composed of small local area networks (LANs), this problem can be solved by properly ...
ISBN: 9781479961238 (print)
Biomedical imaging, digital communications, acoustic signal processing, and many other digital signal processing (DSP) applications are increasingly present in the digital world. They process growing data volumes represented with increasing accuracy, and they use complex algorithms that must satisfy time constraints. Consequently, they demand high computing power. To meet this need, it is now unavoidable to use parallel and heterogeneous architectures to speed up processing; the best examples are today's supercomputers such as "Tianhe-2" and "Titan" in the Top500 ranking. These architectures, with their multi-core nodes supported by many-core accelerators, offer a good response to this problem. However, they remain hard to program for performance, for many reasons: expressing parallelism, task synchronization, memory management, handling hardware specifics, load balancing, and so on. In the present work, we characterize DSP applications and propose a programming model based on their distinctive features in order to implement them easily and efficiently on heterogeneous clusters.