ISBN: 9783319499567; 9783319499550 (print)
OpenACC has been under development for a few years now. The OpenACC 2.5 specification was recently made public, and there are some initiatives for developing full implementations of the standard to make use of accelerator capabilities. There is much to be done yet, but currently, OpenACC for GPUs is reaching a good maturity level in various implementations of the standard, using CUDA and OpenCL as backends. Nvidia is investing in this project and has released an OpenACC Toolkit, including the PGI Compiler. There are, however, more developments out there. In this work, we analyze different available OpenACC compilers that have been developed by companies or universities in recent years. We check their performance and maturity, keeping in mind that OpenACC is designed to be used without extensive knowledge of parallel programming. Our results show that the compilers are on their way to a reasonable maturity, presenting different strengths and weaknesses.
ISBN: 9783319495835; 9783319495828 (print)
For particular real-world combinatorial optimization problems, e.g. the longest common subsequence problem (LCSSP) from bioinformatics, determining multiple optimal solutions (DMOS) is quite useful for experts. However, for large problems this may be too time-consuming, hence the resort to parallel computing. We address here the parallelization of an algorithm for DMOS for the LCSSP. Starting from the dynamic programming algorithm that solves it, we derive a generic algorithm for DMOS (A-DMOS). Since the latter is a non-perfect DO-loop nest, we adopt a three-step approach. The first step transforms the A-DMOS into a perfect nest. The second chooses the granularity, and the third carries out a dependency analysis in order to determine the type of each loop, i.e. either parallel or serial. The practical performance of our approach is evaluated through experiments on input benchmarks and random DNA sequences, targeting a parallel multicore machine.
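As a rough illustration of what determining multiple optimal solutions means for the LCSSP, the following is a minimal sequential sketch: it fills the usual dynamic programming table and then backtracks through every tie to collect all distinct longest common subsequences. It is not the A-DMOS loop nest or its parallelization described in the abstract; the function name `all_lcs` is a hypothetical choice for this example.

```python
from functools import lru_cache


def all_lcs(a: str, b: str) -> list[str]:
    """Return every distinct longest common subsequence of a and b, sorted."""
    n, m = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j] (classic DP table).
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    @lru_cache(maxsize=None)
    def back(i: int, j: int) -> frozenset:
        # Set of all optimal subsequences for the prefixes a[:i], b[:j].
        if i == 0 or j == 0:
            return frozenset({""})
        if a[i - 1] == b[j - 1]:
            # On a match, every optimal solution ends with this character.
            return frozenset(s + a[i - 1] for s in back(i - 1, j - 1))
        out = set()
        # Follow every neighbour that preserves the optimal length (ties too).
        if L[i - 1][j] == L[i][j]:
            out |= back(i - 1, j)
        if L[i][j - 1] == L[i][j]:
            out |= back(i, j - 1)
        return frozenset(out)

    return sorted(back(n, m))


if __name__ == "__main__":
    print(all_lcs("ABCBDAB", "BDCABA"))
```

The number of optimal solutions can grow exponentially with the input length, which is one reason the enumeration becomes expensive for large instances.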
ISBN: 9782839918442 (print)
In hardware implementations of several widely used data and signal processing algorithms, data permutations need to be performed between consecutive computation stages consisting of parallel computational units. Recently, some highly data-parallel streaming architectures for data permutation have been proposed to achieve high throughput. However, the interconnection complexity of these designs increases dramatically with the problem size and data parallelism. In this paper, we develop a hardware structure to perform data permutation with optimized interconnection complexity, defined as the interconnection area per throughput. We propose a novel design technique that greatly reduces the interconnection logic required to realize a fixed permutation on streaming data. Our experimental results show that the proposed design technique reduces interconnection complexity by 27.3% to 75.8%, and improves the throughput by 5.3% to 129% and the energy efficiency by 1.2x to 3.5x compared with the state of the art.
Parallel Computing Technologies: 5th International Conference, PaCT-99, St. Petersburg, Russia, September 6-10, 1999: Proceedings, by International Conference on Parallel Computing Technologies (5th: 1999: Saint Petersburg, Russia); Malyshkin, V. Ė. (Viktor Ėmmanuilovich); published by Berlin; New York: Springer
The algorithm "clustering by fast search and find of density peaks" shows good efficiency and accuracy, but the space complexity of the algorithm is too high since it has to keep a global distance matrix in memory, so i...
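To make the memory concern concrete, here is a minimal sketch of the two per-point quantities used by density-peaks clustering, computed from a full n × n distance matrix: the local density rho (number of neighbours closer than a cutoff) and delta (distance to the nearest point of higher density). It assumes Euclidean distance and a simple hard cutoff; the names `density_peaks` and `d_c` are hypothetical, and this is not the space-reduced method of the truncated abstract above.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) for each point, using a full distance matrix."""
    n = len(points)
    # The global distance matrix: n * n floats held in memory at once.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho[i]: number of other points within the cutoff distance d_c.
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]

    # delta[i]: distance to the nearest point with strictly higher density;
    # for a point with no denser neighbour, use its largest distance instead.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10)]
    print(density_peaks(pts, d_c=2.0))
```

Points with both large rho and large delta are the candidates for cluster centres; the quadratic `dist` structure is what dominates memory use in this naive form.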
ISBN: 9783319495835; 9783319495828 (print)
In many application scenarios of distributed stream processing, there may be partial order relations among the requests. However, existing stream processing systems cannot directly handle partially ordered requests, while indirect mechanisms are usually strongly coupled with business logic, which limits their flexibility and performance. We propose Pork, a novel distributed stream processing system targeting partially ordered requests. In the experiments, the new system achieves greater parallelism and request throughput than the traditional mechanism in the presented example, and the performance overhead due to parallelism is small. The scalability characteristics of the new system are then discussed. Moreover, the experimental results show that the new system has a more flexible load-balancing ability.
ISBN: 9783319495835; 9783319495828 (print)
The Gauss-Huard algorithm (GHA) is a specialized version of Gauss-Jordan elimination for the solution of linear systems that, enhanced with column pivoting, exhibits numerical stability and computational cost close to those of the conventional solver based on the LU factorization with row pivoting. Furthermore, the GHA can be formulated as a procedure rich in matrix multiplications, so that high performance can be expected on current architectures with multi-layered memories. Unfortunately, in principle the GHA does not admit the introduction of look-ahead, a technique that has been demonstrated to be rather useful for improving the performance of the LU factorization on multi-threaded platforms with high levels of hardware concurrency. In this paper we analyze the effect of this drawback on the implementation of the GHA on systems accelerated with graphics processing units (GPUs), exposing the roles of the CPU-to-GPU and single-precision-to-double-precision performance ratios, as well as the contribution from the operations in the algorithm's critical path.
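For readers unfamiliar with the method, the following is a minimal unblocked sketch of Gauss-Huard elimination with column pivoting, written in plain Python on an augmented matrix. It follows the textbook formulation (row elimination, column pivoting, row scaling, column elimination above the pivot) and is only an illustration; it is not the blocked, GPU-accelerated implementation analyzed in the abstract, and the function name `gauss_huard_solve` is a hypothetical choice.

```python
def gauss_huard_solve(A, b):
    """Solve A x = b with unblocked Gauss-Huard and column pivoting."""
    n = len(A)
    # Augmented matrix [A | b], stored as rows of floats.
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    perm = list(range(n))  # perm[k] = original index of the unknown in column k

    for k in range(n):
        # 1. Row elimination: clear row k to the left of the diagonal using
        #    the already reduced rows 0..k-1 (only columns k..n are updated).
        for j in range(k, n + 1):
            M[k][j] -= sum(M[k][i] * M[i][j] for i in range(k))

        # 2. Column pivoting: pick the largest entry of row k in columns k..n-1.
        p = max(range(k, n), key=lambda j: abs(M[k][j]))
        if M[k][p] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            for row in M:
                row[k], row[p] = row[p], row[k]
            perm[k], perm[p] = perm[p], perm[k]

        # 3. Scale row k so that the pivot becomes (implicitly) 1.
        pivot = M[k][k]
        for j in range(k + 1, n + 1):
            M[k][j] /= pivot

        # 4. Column elimination: clear column k in the rows above the pivot.
        for i in range(k):
            f = M[i][k]
            for j in range(k + 1, n + 1):
                M[i][j] -= f * M[k][j]

    # The last column holds the solution in permuted order; undo the swaps.
    x = [0.0] * n
    for k in range(n):
        x[perm[k]] = M[k][n]
    return x


if __name__ == "__main__":
    print(gauss_huard_solve([[2, 1], [1, 3]], [3, 5]))
```

Note that at step k the method touches only row k and the rows above it, whereas LU-based elimination updates the trailing submatrix below the pivot.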
ISBN: 9783319499567; 9783319499550 (print)
One important approach to high-performance computing has a (relatively) simple physical computer architecture emulate virtual algorithmic architectures (VAAs) that are highly optimized for important application domains. We expose the Cellular ANTomaton (CAnt) computing model, cellular automata enhanced with mobile FSMs (Ants), as a highly efficient VAA for a variety of pattern-processing problems that are inspired by biocomputing applications. We illustrate the CAnt model via a scalable design for an n × n CAnt that solves the following bio-inspired problem in linear time. The Pattern-Assembly Problem. Inputs: a length-n master pattern Π and r test patterns π_0, ..., π_{r-1}, of respective lengths m_0 ≥ ... ≥ m_{r-1}. The problem: find every sequence of π_k's, possibly with repetitions, that "assemble" (i.e., concatenate) to produce Π; i.e., π_{j_0} ... π_{j_{s-1}} = Π. Timing: m(1) + ... + m(r) + O(n) steps, with a quite small big-O constant.
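The Pattern-Assembly Problem itself can be stated compactly in code. The sketch below is a plain sequential backtracking enumeration, included only to pin down the input and output of the problem; it is not the CAnt design and does not reflect its linear-time bound. The function name `assemblies` and the choice to report solutions as lists of pattern indices are assumptions made for this example.

```python
def assemblies(master: str, patterns: list[str]) -> list[list[int]]:
    """Return every index sequence whose patterns concatenate to master."""
    results = []

    def extend(pos: int, chosen: list[int]) -> None:
        # pos = number of characters of master already covered.
        if pos == len(master):
            results.append(list(chosen))
            return
        for k, p in enumerate(patterns):
            # Skip empty patterns: they would never advance pos.
            if p and master.startswith(p, pos):
                chosen.append(k)
                extend(pos + len(p), chosen)
                chosen.pop()

    extend(0, [])
    return results


if __name__ == "__main__":
    # Patterns may repeat within a solution, as the problem statement allows.
    print(assemblies("abab", ["ab", "a", "b"]))
```

For the master pattern "abab" and test patterns "ab", "a", "b", there are four assemblies: ab·ab, ab·a·b, a·b·ab, and a·b·a·b.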
The European market for smart home systems is expected to grow by 20% to over 4.3 billion USD by 2017. This growth is mainly due to the shift from luxury and premium markets towards a mass market. In 2014, more than one billion mobile devices running Google's Android operating system were sold, corresponding to a market share of more than 80%. Thus, mobile devices have already reached the mass market. The main idea behind this work is the reuse of mobile devices as smart home systems, to lower the market entry barrier for end users and to contribute to a more holistic use of mobile devices. We therefore developed a modular architecture concept for smart home systems based on Android devices. To ensure the usefulness of this concept, the characteristics of existing systems and the potential of mobile devices were taken into account. Our primary goal was the development of a generic software architecture for various smart home applications. To this end, we present a plugin framework concept that allows modular system design. Our framework is implemented using components, without modifying the Android operating system itself.
ISBN: 9783319449449 (digital)
ISBN: 9783319449432 (print)
This book constitutes the refereed proceedings of the 12th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2016, and three parallel workshops, held in Thessaloniki, Greece, in September 2016. The workshops are the Third Workshop on New Methods and Tools for Big Data, MT4BD 2016, the 5th Mining Humanistic Data Workshop, MHDW 2016, and the First Workshop on 5G - Putting Intelligence to the Network Edge, 5G-PINE 2016. The 30 revised full papers and 8 short papers presented at the main conference were carefully reviewed and selected from 65 submissions. The 17 revised full papers and 7 short papers presented at the 3 parallel workshops were selected from 33 submissions. The papers cover a broad range of topics such as artificial neural networks, classification, clustering, control systems - robotics, data mining, engineering application of AI, environmental applications of AI, feature reduction, filtering, financial-economics modeling, fuzzy logic, genetic algorithms, hybrid systems, image and video processing, medical AI applications, multi-agent systems, ontology, optimization, pattern recognition, support vector machines, text mining, and Web-social media data AI modeling.