The indirect solution of constrained optimal control problems gives rise to two-point boundary value problems (BVPs) that involve index-1 differential-algebraic equations (DAEs) and inequality constraints. This paper presents a parallel collocation algorithm for the solution of these inequality-constrained index-1 BVP-DAEs. The numerical algorithm is based on approximating the DAEs using piecewise polynomials on a nonuniform mesh. The collocation method is realized by requiring that the BVP-DAE be satisfied at Lobatto points within each interval of the mesh. A Newton interior-point method is used to solve the collocation equations and to maintain feasibility of the inequality constraints. The implementation of the algorithm involves: (i) parallel evaluation of the collocation equations; (ii) parallel evaluation of the system Jacobian; and (iii) parallel solution of a bordered almost block diagonal (BABD) system to obtain the Newton search direction. Numerical examples show that the parallel implementation provides significant speedup when compared to a sequential version of the algorithm.
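As a rough illustration of the collocation conditions described in this abstract, the block below writes them out for a semi-explicit index-1 DAE. The semi-explicit form and the symbols $x$, $y$, $f$, $g$, $h$, $b$, $p_i$, $q_i$, and $\tau_{ij}$ are assumptions introduced here for illustration; the abstract does not state the paper's exact formulation.

```latex
% Assumed problem form on [t_0, t_f] (semi-explicit index-1 DAE with inequality constraints):
\begin{aligned}
\dot{x}(t) &= f\bigl(x(t), y(t), t\bigr), \\
0 &= g\bigl(x(t), y(t), t\bigr), \qquad \partial g / \partial y \ \text{nonsingular (index 1)}, \\
0 &\le h\bigl(x(t), y(t), t\bigr), \\
0 &= b\bigl(x(t_0), x(t_f)\bigr).
\end{aligned}

% Collocation on mesh interval [t_i, t_{i+1}] with Lobatto points \tau_{ij}, j = 1, \dots, m,
% where p_i and q_i are the piecewise polynomial approximations of x and y:
\begin{aligned}
\dot{p}_i(\tau_{ij}) &= f\bigl(p_i(\tau_{ij}), q_i(\tau_{ij}), \tau_{ij}\bigr), \\
0 &= g\bigl(p_i(\tau_{ij}), q_i(\tau_{ij}), \tau_{ij}\bigr), \\
p_i(t_{i+1}) &= p_{i+1}(t_{i+1}) \quad \text{(continuity between intervals)}.
\end{aligned}
```

Under these assumptions, the conditions for one interval involve only that interval's unknowns and the shared endpoint values, which is consistent with the interval-wise parallel evaluation of the equations and Jacobian mentioned in the abstract.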
Author:
Zafari, Afshin (Uppsala University)
Department of Information Technology, Division of Scientific Computing, Lägerhyddsvägen 2, Uppsala 752 37, Sweden
Task-based parallel programming has shown competitive outcomes in many aspects of parallel programming such as efficiency, performance, productivity and scalability. Different approaches are used by different software...
DAG-based scheduling models have been shown to effectively express the parallel execution of current many-core heterogeneous architectures. However, their applicability to real-time settings is limited by the difficulty of finding tight estimates of the worst-case timing parameters of tasks that may be arbitrarily preempted or migrated at any instruction. An efficient approach to increasing system predictability is to limit task preemptions to a set of pre-defined points. This limited preemption model supports two different preemption approaches, eager and lazy, which have been analyzed only for sequential task-sets. This paper proposes a new response time analysis that computes an upper bound on the lower-priority blocking that each task may incur with eager and lazy preemptions. We evaluate our analysis with both synthetic DAG-based task-sets and a real case study from the automotive domain. Results from the analysis demonstrate that, although the eager approach generates a higher number of priority inversions, its blocking impact is generally smaller than that of the lazy approach, leading to better schedulability performance.
Large-scale parallel programming environments and algorithms require efficient group-communication on computing systems with failing nodes. Existing reliable broadcast algorithms either cannot guarantee that all nodes are reached or are very expensive in terms of the number of messages and latency. This paper proposes Corrected-Gossip, a method that combines Monte Carlo style gossiping with a deterministic correction phase, to construct a Las Vegas style reliable broadcast that guarantees reaching all the nodes at low cost. We analyze the performance of this method both analytically and by simulations and show how it reduces the latency and network load compared to existing algorithms. Our method improves the latency by 20% and the network load by 53% compared to the fastest known algorithm on 4,096 nodes. We believe that the principle of corrected-gossip opens an avenue for many other reliable group communication operations.
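The two-phase idea in this abstract (randomized gossip followed by a deterministic correction) can be sketched with a toy simulation. This is a simplified illustration, not the paper's protocol: the ring-based correction sweep, the absence of node failures, the message accounting, and the names `corrected_gossip` and `gossip_rounds` are all assumptions made here.

```python
import random


def corrected_gossip(n: int, gossip_rounds: int, seed: int = 0) -> tuple[int, int, int]:
    """Toy two-phase broadcast from node 0 over n failure-free nodes.

    Returns (informed_after_gossip, informed_after_correction, messages_sent).
    """
    rng = random.Random(seed)
    informed = [False] * n
    informed[0] = True  # node 0 is the broadcast root
    messages = 0

    # Phase 1 (Monte Carlo style): in each synchronous round, every node that
    # was informed at the start of the round pushes to one random node.
    for _ in range(gossip_rounds):
        senders = [i for i in range(n) if informed[i]]
        for _sender in senders:
            target = rng.randrange(n)
            informed[target] = True
            messages += 1

    after_gossip = sum(informed)

    # Phase 2 (deterministic correction, assumed ring layout): every node
    # informed by gossip forwards along the ring until it meets a node that
    # already has the message. Only sends to uninformed nodes are counted.
    starters = [i for i in range(n) if informed[i]]
    for start in starters:
        j = (start + 1) % n
        while not informed[j]:
            informed[j] = True
            messages += 1
            j = (j + 1) % n

    return after_gossip, sum(informed), messages


if __name__ == "__main__":
    after, total, msgs = corrected_gossip(n=4096, gossip_rounds=10)
    print(f"informed after gossip: {after}, after correction: {total}, messages: {msgs}")
```

In this sketch the gossip phase may miss nodes, while the correction sweep always fills the remaining gaps, so the outcome is always complete and only the cost varies with the random choices. That is the Las Vegas property the abstract refers to.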
Homology detection has evolved over time from heavy algorithms based on dynamic programming approaches to lightweight alternatives based on different heuristic models. However, the main problem with these algorithms is that they use complex statistical models, which makes it difficult to achieve a relevant speedup and find exact matches with the original results. Thus, their acceleration is essential. The aim of this article was to prefilter a sequence database. To make this work, we have implemented a groundbreaking heuristic model based on NVIDIA's graphics processing units (GPUs) and multicore processors. Depending on the sensitivity settings, this makes it possible to quickly reduce the sequence database by between 50% and 95%, while rejecting no significant sequences. Furthermore, this prefiltering application can be used together with multiple homology detection algorithms as a part of a next-generation sequencing system. Extensive performance and accuracy tests have been carried out in the Spanish National Centre for Biotechnology (NCB). The results show that GPU hardware can accelerate the execution times of former homology detection applications, such as the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool for Proteins (BLASTP), by up to a factor of 4.
MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. This paper proposes an approach to exploit data parallelism in XML processing using MapReduce in Hadoop. The authors' solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process a massive amount of XML data. Specifically, the authors introduce an SDN labeling algorithm and a distributed hierarchical index using DHTs. More importantly, an advanced two-phase MapReduce solution is designed that efficiently addresses the issues of labeling, indexing, and query processing on big XML data. The experimental results show the efficiency and effectiveness of the proposed parallel XML data approach using Hadoop.
This paper focuses on automated synthesis of divide-and-conquer parallelism, which is a common parallel programming skeleton supported by many cross-platform multithreaded libraries. The challenges of producing (manual...
While multi-core platforms are now ubiquitous in all areas of information technology, from enterprise software engineering to mobile app development, parallel computing education is still lagging behind the demand for skilled parallel programmers. At many universities today, parallel and concurrent computing is still not part of the core curriculum because of resistance to major curriculum changes. Many other universities lack the necessary educators or infrastructure to teach a comprehensive parallel computing course. Furthermore, even addressing these issues would do nothing towards supporting software professionals who have already entered the workforce and have no plans to return to school. To address this broad need for a standalone, publicly available, comprehensive, and easily accessible course on parallel computing, we have developed an online offering packaged as a Coursera Specialization on Parallel, Concurrent, and Distributed Computing in Java. In this paper, we describe the preparations for this online course and the unique challenges we encountered in terms of both curriculum development and technical infrastructure. We describe how lessons learned from an on-campus parallelism course at Rice University helped to shape the Coursera specialization, and summarize our experience with implementing this specialization on the Coursera platform at scale.
String matching refers to the search for every occurrence of a string in another string. Nowadays, this problem arises in many areas, from standard programs for text editing and processing, through databases, all the way to various applications in other sciences. There are numerous efficient algorithms to solve this problem. One of them is the Rabin-Karp algorithm, which has a complexity of O(m(n-m+1)), whereas the complexity of the proposed advanced Rabin-Karp algorithm is O(n-m). However, the main focus of this research is to apply the concepts of parallelism to improve the performance of the algorithm. There are many parallel processing application programming interfaces (APIs) available, such as OpenMP, MPI, CUDA, and MapReduce; of these, we have chosen OpenMP and CUDA to achieve parallelism. Comparison of the results of both serial and parallel implementations will give us insights into how performance and efficiency are achieved through various techniques of parallelism.
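For reference, a minimal serial sketch of the classic Rabin-Karp algorithm follows. It is the textbook rolling-hash version, not the "advanced" variant or the OpenMP/CUDA implementations the abstract describes; the function name `rabin_karp` and the `base` and `mod` defaults are choices made here for illustration.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window of the text.
    p_hash = 0
    w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; compare the characters to rule
        # out collisions. This check is the source of the O(m(n-m+1)) bound.
        if w_hash == p_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Roll the hash: drop text[s], append text[s + m].
            w_hash = ((w_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches


if __name__ == "__main__":
    print(rabin_karp("abracadabra", "abra"))
```

Each window's hash depends only on its own characters, so one natural way to parallelize the search is to split the text into overlapping chunks and scan them independently; whether the paper does this is not stated in the abstract.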
In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their d...