Search Results - Inner Mongolia University Library

Parallel collocation solution of index-1 BVP-DAEs arising from constrained optimal control problems

NUMERICAL ALGORITHMS, 2016, Vol. 71, No. 2, pp. 311-335

Author: Fabien, Brian C. Affiliation: Univ Washington, 322 MEB, Box 352600, Seattle, WA 98195, USA.

The indirect solution of constrained optimal control problems gives rise to two-point boundary value problems (BVPs) that involve index-1 differential-algebraic equations (DAEs) and inequality constraints. This paper presents a parallel collocation algorithm for the solution of these inequality constrained index-1 BVP-DAEs. The numerical algorithm is based on approximating the DAEs using piecewise polynomials on a nonuniform mesh. The collocation method is realized by requiring that the BVP-DAE be satisfied at Lobatto points within each interval of the mesh. A Newton interior-point method is used to solve the collocation equations and to maintain feasibility of the inequality constraints. The implementation of the algorithm involves: (i) parallel evaluation of the collocation equations; (ii) parallel evaluation of the system Jacobian; and (iii) parallel solution of a bordered almost block diagonal (BABD) system to obtain the Newton search direction. Numerical examples show that the parallel implementation provides significant speedup when compared to a sequential version of the algorithm.

Keywords: Collocation method; Boundary value problem; Index-1 differential-algebraic equations; Parallel programming; Optimal control

TaskUniVerse: A Task-Based Unified Interface for Versatile Parallel Execution

arXiv, 2017

Author: Zafari, Afshin. Affiliation: Uppsala University, Department of Information Technology, Division of Scientific Computing, Lägerhyddsvägen 2, Uppsala 752 37, Sweden.

Task based parallel programming has shown competitive outcomes in many aspects of parallel programming such as efficiency, performance, productivity and scalability. Different approaches are used by different software development frameworks to provide these outcomes to the programmer, while making the underlying hardware architecture transparent to her. However, since programs are not portable between these frameworks, using one framework or the other is still a vital decision by the programmer whose concerns are expandability, adaptivity, maintainability and interoperability of the programs. In this work, we propose a unified programming interface that a programmer can use for working with different task based parallel frameworks transparently. In this approach we abstract the common concepts of task based parallel programming and provide them to the programmer in a single programming interface uniformly for all frameworks. We have tested the interface by running programs which implement matrix operations within frameworks that are optimized for shared and distributed memory architectures and accelerators, while the cooperation between frameworks is configured externally with no need to modify the programs. Further possible extensions of the interface and future potential research are also described. Copyright © 2017, The Authors. All rights reserved.

Keywords: Parallel programming

An Analysis of Lazy and Eager Limited Preemption Approaches under DAG-Based Global Fixed Priority Scheduling

International Symposium on Object-Oriented Real-Time Distributed Computing

Authors: Maria A. Serrano; Alessandra Melani; Sebastian Kehr; Marko Bertogna; Eduardo Quiñones. Affiliations: Universitat Politecnica de Catalunya (UPC), Barcelona, Spain; Barcelona Supercomputing Center (BSC), Barcelona, Spain; Scuola Superiore Sant'Anna, Pisa, Italy; DENSO AUTOMOTIVE Deutschland GmbH; University of Modena and Reggio Emilia, Modena, Italy.

DAG-based scheduling models have been shown to effectively express the parallel execution of current many-core heterogeneous architectures. However, their applicability to real-time settings is limited by the difficulty of finding tight estimates of the worst-case timing parameters of tasks that may be arbitrarily preempted or migrated at any instruction. An efficient approach to increase system predictability is to limit task preemptions to a set of pre-defined points. This limited preemption model supports two different preemption approaches, eager and lazy, which have been analyzed only for sequential task-sets. This paper proposes a new response time analysis that computes an upper bound on the lower priority blocking that each task may incur with eager and lazy preemptions. We evaluate our analysis with both synthetic DAG-based task-sets and a real case study from the automotive domain. Results from the analysis demonstrate that, although the eager approach generates a higher number of priority inversions, its blocking impact is generally smaller than in the lazy approach, leading to better schedulability performance.

Keywords: Computational modeling; Interference; Real-time systems; Processor scheduling; Computer architecture; Analytical models; Parallel programming

Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems

International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Torsten Hoefler; Amnon Barak; Amnon Shiloh; Zvi Drezner. Affiliations: Department of Computer Science, ETH Zurich, Zurich, Switzerland; Department of Computer Science, The Hebrew University of Jerusalem, Jerusalem, Israel; College of Business and Economics, California State University, Fullerton, CA, USA.

Large-scale parallel programming environments and algorithms require efficient group-communication on computing systems with failing nodes. Existing reliable broadcast algorithms either cannot guarantee that all nodes are reached or are very expensive in terms of the number of messages and latency. This paper proposes Corrected-Gossip, a method that combines Monte Carlo style gossiping with a deterministic correction phase, to construct a Las Vegas style reliable broadcast that guarantees reaching all the nodes at low cost. We analyze the performance of this method both analytically and by simulations and show how it reduces the latency and network load compared to existing algorithms. Our method improves the latency by 20% and the network load by 53% compared to the fastest known algorithm on 4,096 nodes. We believe that the principle of corrected-gossip opens an avenue for many other reliable group communication operations.

Keywords: Reliability; Protocols; Computer network reliability; Algorithm design and analysis; Parallel programming; Image color analysis
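
The two-phase idea in this abstract (randomized gossip followed by a deterministic correction) can be illustrated with a toy simulation. This is only a sketch under strong assumptions: the push-gossip rule, the ring-sweep correction, the absence of node failures, and the function name `corrected_gossip` are all illustrative choices, not the algorithm from the paper.

```python
import random


def corrected_gossip(n, rounds, seed=0):
    """Toy two-phase broadcast on n nodes; node 0 is the root.

    Returns (nodes reached by gossip alone, nodes reached in total, messages sent).
    Assumption: no node fails, so the correction sweep always completes.
    """
    rng = random.Random(seed)
    informed = [False] * n
    informed[0] = True
    messages = 0

    # Phase 1 (Monte Carlo style): in each round, every informed node
    # pushes the message to one uniformly random node.
    for _ in range(rounds):
        senders = [i for i in range(n) if informed[i]]
        for _sender in senders:
            informed[rng.randrange(n)] = True
            messages += 1
    reached_by_gossip = sum(informed)

    # Phase 2 (deterministic correction, hypothetical ring sweep):
    # each node informed by gossip forwards along the ring until it
    # meets a node that is already informed, so every gap is filled.
    starters = [i for i in range(n) if informed[i]]
    for start in starters:
        j = (start + 1) % n
        while not informed[j]:
            informed[j] = True
            messages += 1
            j = (j + 1) % n

    return reached_by_gossip, sum(informed), messages
```

Gossip alone may miss nodes, which is the Monte Carlo part; the sweep turns the result into a guaranteed full broadcast at the cost of extra messages, which is the Las Vegas flavor the abstract describes.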

Prefiltering Model for Homology Detection Algorithms on GPU

EVOLUTIONARY BIOINFORMATICS, 2016, Vol. 12, Issue 2016, pp. 313-322

Authors: Retamosa, German; de Pedro, Luis; Gonzalez, Ivan; Tamames, Javier. Affiliations: Univ Autonoma Madrid, High Performance Comp & Networking Dept, Madrid, Spain; CSIC, Natl Biotechnol Ctr, Madrid, Spain.

Homology detection has evolved over time from heavy algorithms based on dynamic programming approaches to lightweight alternatives based on different heuristic models. However, the main problem with these algorithms is that they use complex statistical models, which makes it difficult to achieve a relevant speedup and find exact matches with the original results. Thus, their acceleration is essential. The aim of this article was to prefilter a sequence database. To make this work, we have implemented a groundbreaking heuristic model based on NVIDIA's graphics processing units (GPUs) and multicore processors. Depending on the sensitivity settings, this makes it possible to quickly reduce the sequence database by factors between 50% and 95%, while rejecting no significant sequences. Furthermore, this prefiltering application can be used together with multiple homology detection algorithms as a part of a next-generation sequencing system. Extensive performance and accuracy tests have been carried out in the Spanish National Centre for Biotechnology (NCB). The results show that GPU hardware can accelerate the execution times of former homology detection applications, such as National Centre for Biotechnology Information (NCBI), Basic Local Alignment Search Tool for Proteins (BLASTP), up to a factor of 4.

Keywords: computational biology; next-generation sequencing; parallel programming; performance analysis; NCBI BLAST; NVIDIA CUDA

Efficient Querying Distributed Big-XML Data using MapReduce

INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2016, Vol. 8, No. 3, pp. 70-79

Authors: Song Kunfang; Hongwei Lu. Affiliation: Huazhong Univ Sci & Technol, Wuhan, Peoples R China.

MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. This paper proposes an approach to exploit data parallelism in XML processing using MapReduce in Hadoop. The authors' solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process a massive amount of XML data. Specifically, the authors introduce an SDN labeling algorithm and a distributed hierarchical index using DHTs. More importantly, an advanced two-phase MapReduce solution is designed that efficiently addresses the issues of labeling, indexing, and query processing on big XML data. The experimental results show the efficiency and effectiveness of the proposed parallel XML data approach using Hadoop.

Keywords: B-SLCA; Big XML; Distributed programming; MapReduce; Parallel programming
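
For readers unfamiliar with the MapReduce pattern this abstract builds on, the map, shuffle, and reduce steps can be sketched in a single process. This is a minimal illustration that counts element tags across XML fragments; it does not use Hadoop and does not implement the paper's labeling, indexing, or query scheme, and the function names are hypothetical.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def map_phase(fragment):
    """Map: emit a (tag, 1) pair for every element in one XML fragment."""
    root = ET.fromstring(fragment)
    return [(elem.tag, 1) for elem in root.iter()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group of values to a single count."""
    return {key: sum(values) for key, values in groups.items()}


def count_tags(fragments):
    # In a real cluster each fragment would be mapped on a different node;
    # here the map calls simply run one after another.
    pairs = [pair for frag in fragments for pair in map_phase(frag)]
    return reduce_phase(shuffle(pairs))
```

Because each fragment is mapped independently, the map calls are the part a framework such as Hadoop would distribute across machines.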

Automated synthesis of divide and conquer parallelism

arXiv, 2017

Authors: Farzan, Azadeh; Nicolet, Victor. Affiliations: University of Toronto; Ecole Polytechnique.

This paper focuses on automated synthesis of divide-and-conquer parallelism, which is a common parallel programming skeleton supported by many cross-platform multithreaded libraries. The challenges of producing (manually or automatically) a correct divide-and-conquer parallel program from a given sequential code are two-fold: (1) assuming that individual worker threads execute a code identical to the sequential code, the programmer has to provide the extra code for dividing the tasks and combining the computation results, and (2) sometimes, the sequential code may not be usable as is, and may need to be modified by the programmer. We address both challenges in this paper. We present an automated synthesis technique for the case where no modifications to the sequential code are required, and we propose an algorithm for modifying the sequential code to make it suitable for parallelization when some modification is necessary. The paper presents theoretical results for when this modification is efficiently possible, and experimental evaluation of the technique and the quality of the produced parallel programs. Copyright © 2017, The Authors. All rights reserved.

Keywords: Parallel programming
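
The two challenges named in this abstract (writing the combine step, and returning extra information so that partial results can be combined at all) can be seen in a small hand-written example. The sketch below computes a maximum prefix sum; the choice of problem, the `join` function, and the chunking strategy are assumptions made for illustration and are not produced by the paper's synthesis technique.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def summarize(xs):
    """The sequential loop, returning (total sum, maximum prefix sum).

    The maximum prefix sum alone cannot be combined across chunks,
    so the loop also returns the running total it already computes.
    """
    total = best = 0
    for x in xs:
        total += x
        best = max(best, total)
    return total, best


def join(left, right):
    """Combine the summaries of two adjacent chunks."""
    l_sum, l_best = left
    r_sum, r_best = right
    # A best prefix either ends inside the left chunk or spans all of
    # the left chunk plus a best prefix of the right chunk.
    return l_sum + r_sum, max(l_best, l_sum + r_best)


def max_prefix_sum(xs, workers=4):
    size = max(1, -(-len(xs) // workers))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize, chunks))
    return reduce(join, partials, (0, 0))[1]
```

Each worker runs the unchanged sequential loop on its chunk; only the divide step and `join` are extra code, which corresponds to the first challenge the abstract lists.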

Preparing an Online Java Parallel Computing Course

IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW)

Authors: Vivek Sarkar; Max Grossman; Zoran Budimlić; Shams Imam. Affiliations: Rice University, Houston, TX, USA; Two Sigma, Houston, TX, USA.

While multi-core platforms are now ubiquitous in all areas of information technology, from enterprise software engineering to mobile app development, parallel computing education is still lagging behind the demand for skilled parallel programmers. At many universities today, parallel and concurrent computing is still not part of the core curriculum because of resistance to major curriculum changes. Many other universities lack the necessary educators or infrastructure to teach a comprehensive parallel computing course. Furthermore, even addressing these issues would do nothing towards supporting software professionals who have already entered the work force and have no plans to return to school. To address this broad need for a standalone, publicly available, comprehensive, and easily accessible course on parallel computing, we have developed an online offering packaged as a Coursera Specialization on Parallel, Concurrent, and Distributed Computing in Java. In this paper, we describe the preparations for this online course and the unique challenges we encountered in terms of both curriculum development and technical infrastructure. We describe how lessons learned from an on-campus parallelism course at Rice University helped to shape the Coursera specialization, and summarize our experience with implementing this specialization on the Coursera platform at scale.

Keywords: Parallel processing; Parallel programming; Java; Education; Concurrent computing; Programming profession

Parallelized Advanced Rabin-Karp Algorithm for String Matching

International Conference on Computing Communication Control and Automation (ICCUBEA)

Authors: Omkar Sunil Joshi; Bhargavi R. Upadhvay; M. Supriya. Affiliation: Department of Computer Science and Engineering, Amrita University, Benguluru, India.

String matching is the search for every occurrence of one string within another. The problem arises in many settings, from standard programs for text editing and processing, through databases, to applications in other sciences. Numerous efficient algorithms solve this problem. One of them is the Rabin-Karp algorithm, which has a complexity of O(m(n-m+1)), whereas the complexity of the proposed advanced Rabin-Karp algorithm is O(n-m). However, the main focus of this research is to apply the concepts of parallelism to improve the performance of the algorithm. Many parallel processing Application Programming Interfaces (APIs) are available, such as OpenMP, MPI, CUDA, and MapReduce; of these, we have chosen OpenMP and CUDA to achieve parallelism. Comparing the results of the serial and parallel implementations gives insight into how performance and efficiency are achieved through various techniques of parallelism.

Keywords: Graphics processing units; Pattern matching; Parallel programming; Parallel processing; Time complexity; Heuristic algorithms
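
As background for this abstract, the following is a minimal sketch of the standard serial Rabin-Karp algorithm with a rolling hash. It is not the paper's "advanced" variant and contains no OpenMP or CUDA parallelism; the `base` and `mod` parameters are arbitrary illustrative choices.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, mod)  # weight of the leading character of a window
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for s in range(n - m + 1):
        # Compare characters only when the hashes agree, to rule out collisions.
        if w_hash == p_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Roll the window: drop text[s], append text[s + m].
            w_hash = ((w_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches
```

The loop visits the n-m+1 shifts, and each character comparison costs up to m steps, which is where the O(m(n-m+1)) worst case quoted in the abstract comes from. A parallel version would typically split the range of shifts among workers, with each worker computing its own initial window hash.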

Well-structured futures and cache locality

ACM Transactions on Parallel Computing, 2016, Vol. 2, No. 4, pp. 1–20

Authors: Herlihy, Maurice; Liu, Zhiyu. Affiliation: Computer Science Department, Brown University, Providence, RI 02912, United States.

In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that although futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur Ω(PT∞ + tT∞) deviations, which implies Ω(CPT∞ + CtT∞) additional cache misses, where C is the number of cache lines, P is the number of processors, t is the number of touches, and T∞ is the computation span. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this article, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it or by a later descendant of the thread that created it, then parallel executions with work stealing can incur at most O(CPT∞²) additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications. © 2016 ACM.

Keywords: Parallel programming
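
The disciplined usage pattern described in this abstract, where each future is touched exactly once by the thread that created it, looks roughly like the sketch below. It uses Python's `concurrent.futures` purely to illustrate the create and touch steps; that executor is a plain thread pool rather than a work-stealing scheduler, so the cache-miss bounds from the article do not apply to it, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_squares(chunk):
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, parts=4):
    size = max(1, -(-len(data) // parts))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # Create: the calling thread creates one future per chunk.
        futures = [pool.submit(sum_squares, chunk) for chunk in chunks]
        # Touch: the same thread reads each future exactly once,
        # blocking until that result is ready.
        return sum(f.result() for f in futures)
```

An undisciplined variant would pass the future objects to other threads and call `result()` on the same future from several places, which is the kind of usage the single-touch condition excludes.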
