ISBN: 9783540744658 (print)
Using grid resources to execute scientific applications requiring a large amount of computing power is attractive but not easy from the user's point of view. Vigne is a Grid system designed to provide users with a simplified view of a grid. This paper presents a set of system services that allow users to run a wide range of distributed applications in a simple and efficient manner. A running prototype has been implemented as a proof of concept, and experiments on the Grid'5000 testbed show the efficiency of our approach.
ISBN: 9781538625880 (print)
The overheads of manually tuning a loop's parameters, such as chunk size, might prevent an application from reaching its maximum parallel performance. In this paper, we address this challenge by implementing a multinomial logistic regression model on an HPX loop. We present a framework that captures both the static and dynamic information of the runtime environment and feeds this information to a learning model, which assigns an efficient loop chunk size automatically. Our execution results show that the proposed technique improves the performance of an NBody application by an average of about 33%, 17%, and 19% compared to using existing HPX auto-parallelization tools when run with problem sizes of 10^5, 10^6, and 10^7 particles, respectively.
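To make the tuned parameter concrete, the following is a minimal sketch of what a chunk size controls in a parallel loop: the iteration range is split into blocks of `chunk_size` iterations, and each block is handed to a worker as one task. It uses Python's standard thread pool for illustration only; it is not HPX, it does not reproduce the learning model from the paper, and the names `chunked_ranges` and `parallel_sum_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_ranges(n, chunk_size):
    """Split the iteration range [0, n) into half-open blocks of chunk_size."""
    return [(start, min(start + chunk_size, n)) for start in range(0, n, chunk_size)]


def parallel_sum_squares(n, chunk_size, workers=4):
    """Sum i*i for i in [0, n), scheduling one task per chunk."""

    def work(bounds):
        lo, hi = bounds
        return sum(i * i for i in range(lo, hi))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk becomes one task submitted to the pool.
        return sum(pool.map(work, chunked_ranges(n, chunk_size)))
```

A small chunk size yields many short tasks and more scheduling overhead, while a large one yields few tasks and can leave workers idle. That trade-off is what makes the parameter worth tuning.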
ISBN: 9783642400476 (print)
Parallelism permeates all levels of current computing systems, from single CPU machines, to large server farms, to geographically dispersed “volunteers” who collaborate over the Internet. The effective use of parallelism depends crucially on the availability of faithful, yet tractable, computational models for algorithm design and analysis and models of efficient strategies for solving key computational problems on prominent classes of computing platforms. Equally important are good algorithmic models of the way the different system components are interconnected. With the development of new genres of computing platforms, such as multicore parallel machines, desktop grids, clouds, and hybrid GPU/CPU-based systems, new computational models and paradigms are needed that will allow parallel programming to advance into mainstream computing. Topic 12 focuses on contributions providing new results on foundational issues regarding parallelism in computing and/or proposing improved approaches to the solution of specific algorithmic problems.
ISBN: 354040788X (print)
A program system for manipulating multimedia program skeletons ("films") and translating them into parallel programs is considered. The main goal of this system is to make it easier to create parallel programs for various parallel computing systems.
ISBN: 9783540854500 (print)
Finding the Longest Common Subsequence (LCS) is a traditional and well-studied problem in bioinformatics and text editing. In this paper, a customized parallel algorithm based on the Partitioned Global Address Space (PGAS) programming model to compute the LCS is presented. The algorithm is based on two related parameters that balance the communication and synchronization needs in order to find the best data and workload distributions. The basic design of the algorithm and its complexity analysis are discussed together with experimental results. These results show the impact of those parameters on PGAS algorithm performance.
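As a point of reference for the problem itself, here is a minimal sequential sketch of the standard LCS-length dynamic program, swept by anti-diagonals. Cells on one anti-diagonal depend only on the two preceding anti-diagonals, which is the property that parallel formulations commonly exploit. This is not the PGAS algorithm of the paper, whose two tuning parameters are not specified in the abstract, and the function name `lcs_length` is hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Sweep anti-diagonals d = i + j. All cells with the same d are
    # mutually independent, so they could be computed concurrently.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

In a distributed setting the table would be partitioned among processes, and the block sizes would determine how often neighbours must exchange boundary cells and synchronize.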
ISBN: 9783642328206 (print)
The solution of large-scale problems in Computational Science and Engineering relies on the availability of accurate, robust and efficient numerical algorithms and software that are able to exploit the power offered by modern computer architectures. Such algorithms and software provide building blocks for prototyping and developing novel applications, and for improving existing ones, by relieving developers of details concerning numerical methods as well as their implementation in new computing environments.
ISBN: 9780769550886 (print)
Foreground object detection is an important area in computer vision research. It aims to detect, classify, and recognize objects in video images. Real-time video processing needs to deal with large amounts of data and is also computationally intensive. This paper presents the parallelization of foreground object detection algorithms on a polymorphous computer that is capable of data-parallel, thread-parallel and instruction-parallel computation. A modified approach using background differentiation and image integration is adopted in this research, and a parallel implementation is realized on the PAAG polymorphous array processor. The parallel implementation is able to perform effective real-time foreground object detection in fast video frames. Simulation results show that this approach is highly effective.
ISBN: 9783031506833; 9783031506840 (print)
This paper proposes a scalable and efficient architecture to accelerate random forest computation on FPGA devices targeting edge computing platforms. The proposed architecture, with efficient decision tree units (DTUs), executes samples in a pipeline model to improve performance. Moreover, a size-effective memory organization is introduced with the architecture to save on-chip block RAM, reducing the latency and improving the working frequency of the implemented system on FPGA devices. We target edge computing platforms that suffer from limited resources and power budgets. Therefore, the proposed architecture can reconfigure the number of DTUs according to the target platform's available resources. We build a system with a PYNQ Z2 FPGA board for testing, validating, and evaluating the proposed architecture. In this system, we use different numbers of DTUs, from 1 to 15, to test scalability. Experimental results with certified datasets show that we achieve speed-ups of up to 170.39x and 90.27x compared to Intel Core i7 desktop and Core i9 high-performance computing processors, respectively.
ISBN: 9781467351652 (print)
Derived from a parallel multiplier, a parallel-serial decimal multiplier using BCD digits is proposed, in which the multiplicand is assumed to be available in parallel whereas the multiplier is in digit-serial form. The values of the digit products in the successive columns of the product array are added in binary and converted to decimal. Their decimal alignment generates a set of three or four serial decimal numbers whose sum is the product. The parallel-serial proposal substantially reduces complexity and exploits overlapping updates to speed up the pipeline. A basic FPGA implementation is compared against another embedded multiplier approach, showing that the advantage of the proposed scheme grows as the input size increases.
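The arithmetic behind the column-sum step can be sketched in software. The following is one reading of the abstract, not the hardware design: the digit products in each column of the product array are summed as ordinary (binary) integers, each column sum is split into its decimal digits, and the digits of equal rank are aligned into one decimal number per rank. With up to 12 digits per operand, a column sum is at most 12 × 81 = 972, giving three numbers; longer operands can need a fourth. All function names are hypothetical, and digit lists are least significant digit first.

```python
def column_sums(x_digits, y_digits):
    """Sum the digit products x_i * y_j in each column k = i + j."""
    sums = [0] * (len(x_digits) + len(y_digits) - 1)
    for i, x in enumerate(x_digits):
        for j, y in enumerate(y_digits):
            sums[i + j] += x * y
    return sums


def aligned_rows(sums):
    """Align the decimal digits of the column sums into one number per rank.

    rows[0] collects the units digits, rows[1] the tens digits, and so on;
    the digit of rank p from column k carries the weight 10**(k + p).
    """
    width = max(len(str(s)) for s in sums)
    rows = [0] * width
    for k, s in enumerate(sums):
        for p in range(width):
            digit = (s // 10**p) % 10
            rows[p] += digit * 10 ** (k + p)
    return rows


def multiply(x, y):
    """Multiply two non-negative integers via aligned column sums."""
    x_digits = [int(c) for c in reversed(str(x))]
    y_digits = [int(c) for c in reversed(str(y))]
    return sum(aligned_rows(column_sums(x_digits, y_digits)))
```

For example, `multiply(1234, 5678)` forms seven column sums, aligns them into two or three decimal rows, and adds those rows to obtain the product. In a digit-serial design the columns would become available one multiplier digit at a time rather than all at once as here.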