Search Results - Inner Mongolia University Library

International Conference on Parallel Processing Workshops (ICPPW)

Authors: Zahra Khatami, Hartmut Kaiser, J. Ramanujam (The STE∥AR Group, Baton Rouge, Louisiana, USA; Center for Computation and Technology, Louisiana State University)

ISBN: 9781509028269 (print)

Computer scientists and programmers face the difficulty of improving the scalability of their applications while using only conventional programming techniques. As a baseline hypothesis of this paper we assume that an advanced runtime system can be used to take full advantage of the available parallel resources of a machine in order to achieve the highest parallelism possible. In this paper we present the capabilities of HPX - a distributed runtime system for parallel applications of any scale - to achieve the best possible scalability through asynchronous task execution [1]. OP2 is an active library which provides a framework for the parallel execution of unstructured grid applications on different multi-core/many-core hardware architectures [2]. OP2 generates code which uses OpenMP for loop parallelization within an application code for both single-threaded and multi-threaded machines. In this work we modify the OP2 code generator to target HPX instead of OpenMP, i.e. port the parallel simulation backend of OP2 to utilize HPX. We compare the performance results of the different parallelization methods using HPX and OpenMP for loop parallelization within the Airfoil application. The results of strong scaling and weak scaling tests for the Airfoil application on one node with up to 32 threads are presented. Using HPX for parallelization of OP2 gives an improvement in performance of 5%-21%. By modifying the OP2 code generator to use HPX's parallel algorithms, we observe scaling improvements of about 5% compared to OpenMP. To fully exploit the potential of HPX, we adapted the OP2 API to expose a future- and dataflow-based programming model and applied this technique to parallelize the same Airfoil application. We show that the dataflow-oriented programming model, which automatically creates an execution tree representing the algorithmic data dependencies of our application, improves the overall scaling results by about 21% compared to OpenMP. Our results show

Keywords: Automotive components; Parallel processing; Scalability; Programming; Instruction sets; Runtime; Hardware

Retargetable Communication for Distributed Programs

International ACM SIGSOFT Conference on Quality of Software Architectures (QoSA)

Authors: Oren Freiberg, Jens Palsberg, Mahdi Eslamimehr (Microsoft; Viewpoints Research Institute)

ISBN: 9781509025688 (print)

The emergence of clusters of multi-core multiprocessors has created a challenge for software developers who use concurrency to gain performance. The challenge lies in the application's dependence on both the hardware and the deeply integrated communication infrastructure for performance improvements. This integration of the communication and parallelism in the user's application reduces flexibility by adding complexity when switching to different communication and parallel infrastructures. In this paper, we present a retargetable compiler framework for a subset of X10 that abstracts the hardware details, parallelism, and communication away from the application, allowing for portability and easier retargeting of the communication and parallelism. The retargetable compiler framework uses asynchronous computation and communication, as well as the concept of places to abstract away hardware details and to provide scalability. The framework offers performance, functionality, and flexibility because of our separation of tasks into layers and because of source code level serialization. To illustrate the ease of retargeting the communication and the patterns of parallelism, our framework is implemented with two different communication APIs (DUP and MPI-2) and two different patterns of parallelism (thread pooling and thread spawning). Retargeting the communication infrastructure using our framework required fewer code changes than changing the pattern of parallelism. The minimal code change needed to retarget these components offers developers a reasonable way to retarget without recompiling their application or sacrificing performance.

Keywords: Message systems; Parallel processing; Libraries; Hardware; Benchmark testing; Switches; Synchronization

Compiler for a Simplified Programming Language Aiming on Multi Core Students' Experimental Processor

10th IEEE International Conference on Industrial and Information Systems (ICIIS)

Authors: Wepathana, Y. M. R. D.; Anthonys, G.; Udugama, L. S. K. (Open Univ Sri Lanka, Fac Engn Technol, Dept Elect & Comp Engn, Nugegoda, Sri Lanka)

ISBN: 9781479918768 (print)

Knowledge of parallel programming is an essential requirement in the multicore era. To meet this requirement, teaching parallel programming at university level is important. Students should also be exposed to different parallel architectures and programming models. To achieve this objective, it is appropriate to use an integrated system that offers different parallel architectures and supporting programming languages. Though such a system is difficult to find, the Multi Core Students Experimental Processor (MCSEP), designed on the basis of the Students Experimental Processor, provides an opportunity to develop one. The MCSEP can be configured to one of five architectures: SISD, SIMD, MIMD, Multiple-SIMD, and Multiple-MIMD. Each architecture can further be configured to one of six instruction set architectures: Memory-Memory, Accumulator, Extended Accumulator, Stack, Register Memory, and Load Store. As there are no programming tools for the MCSEP, a compiler and a simplified programming language, SEPCom, have been developed to use all the features of the multicore processor MCSEP. SEPCom is a Java-like programming language with parallel programming features. The test results show that SEPCom performs well on all architectures available in the MCSEP. Therefore SEPCom can be used for writing parallel programs for different parallel architectures. Consequently, students can develop appropriate programs for their experiments and, moreover, analyze and measure performance on different parallel architectures. Further, students can also use it as a case study for learning compiler design.

Keywords: compiler design; multicore programming; MIMD; SIMD; SISD; teaching parallel programming

Applying K-means clustering and genetic algorithm for solving MTSP

11th International Conference on Bio-inspired Computing – Theories and Applications, BIC-TA 2016

Authors: Lu, Zhanqing; Zhang, Kai; He, Juanjuan; Niu, Yunyun (School of Computer Science, Wuhan University of Science and Technology, Wuhan 430081, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430081, China; School of Information Engineering, China University of Geosciences, Beijing 100083, China)

ISBN: 9789811036132 (print)

In this paper, a new algorithm is designed to solve the Multiple Traveling Salesman Problem (MTSP) while avoiding path intersections among the traveling salesmen. The problem has three objectives: the shortest path for every salesman, a balanced workload across salesmen, and no crossings between routes. We combine the K-means algorithm and a genetic algorithm (GA). K-means divides all points into several subsets and chooses the start city for the genetic algorithm; the GA then processes every subset in parallel. This method not only achieves these multiple objectives but also uses much less time, since the points are divided into several parts that are computed at the same time. © Springer Nature Singapore Pte Ltd. 2016.

Keywords: Genetic algorithms
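The partition-then-solve structure described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the per-cluster genetic algorithm is replaced by a greedy nearest-neighbour tour to keep the example short, the start city is assumed to be the cluster member closest to the centroid, and a thread pool stands in for whatever parallel machinery the paper uses. All function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, clusters


def nearest_neighbour_tour(start, cities):
    """Stand-in for the per-cluster GA: greedy tour from the start city."""
    remaining = list(cities)
    remaining.remove(start)
    tour = [start]
    while remaining:
        nxt = min(remaining, key=lambda c: math.dist(tour[-1], c))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour


def solve_mtsp(points, salesmen):
    """One cluster per salesman; each cluster is routed as a separate task."""
    centroids, clusters = kmeans(points, salesmen)
    jobs = []
    with ThreadPoolExecutor(max_workers=salesmen) as pool:
        for centroid, members in zip(centroids, clusters):
            if not members:
                continue
            # Assumed start-city rule: the member closest to the centroid.
            start = min(members, key=lambda c: math.dist(c, centroid))
            jobs.append(pool.submit(nearest_neighbour_tour, start, members))
        return [job.result() for job in jobs]


points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
routes = solve_mtsp(points, salesmen=2)
```

Because every city belongs to exactly one cluster, the per-cluster tasks share no data, which is what makes running them side by side straightforward.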

Coordination of parallel tasks in access to resource groups by adaptive conflictless scheduling

Communications in Computer and Information Science, 2016, vol. 613, pp. 272-282

Authors: Smolinski, Mateusz (Institute of Information Technology, Lodz University of Technology, Wolczanska 215, Lodz 90-924, Poland)

ISBN: 9783319340982 (print)

Conflictless task scheduling is intended for parallel task-processing environments with high contention for a limited number of resources. For tasks that each require a group of resources, the presented solution can prepare a task execution schedule in which no resource conflict occurs. A task can be any selected sequence of operations whose execution requires access to a resource group, where that access is controlled by conflictless scheduling. Each resource group required by a task has its own FIFO queue, in which tasks wait for access to those resources. Queues are emptied according to the prepared conflictless schedule in such a way that waiting tasks do not starve. The presented scheduling concept for tasks and resource groups is based on a resource representation model that allows resource conflicts to be detected efficiently using dedicated data structures, such as task classes and a conflict matrix, together with algorithms that prepare an adaptive conflictless schedule. The prepared schedule adapts to the current state of the environment, such as the number of resource groups, the number of tasks in their queues, and the waiting times of tasks. It ensures task execution without resource conflicts, and therefore no task deadlock occurs. Examples of environments where conflictless scheduling can be applied are transaction processing in databases or OLTP systems, and processes or threads competing for resources. In a transaction processing environment, eliminating deadlock with the proposed conflictless scheduling reduces the number of transaction rollbacks. © Springer International Publishing Switzerland 2016.

Keywords: Concurrency control
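To make the idea of a conflict matrix over task classes more concrete, here is a small sketch. It is an assumption-laden illustration rather than the published algorithm: resource groups are modelled as Python sets, two task classes are taken to conflict when their sets intersect, and the batch-selection rule (oldest queue head first, skip anything that conflicts with an already chosen class) is a simplified stand-in chosen for brevity. The names `conflict_matrix` and `next_batch` are hypothetical.

```python
from collections import deque


def conflict_matrix(groups):
    """groups[i] is the set of resource ids needed by task class i."""
    n = len(groups)
    return [[i != j and bool(groups[i] & groups[j]) for j in range(n)]
            for i in range(n)]


def next_batch(queues, conflicts):
    """Pick queue heads that can run together without sharing a resource.

    queues[i] is a FIFO deque of (task_name, arrival_time) for class i.
    Heads are considered oldest-first, so the longest-waiting head is
    always admitted. This greedy rule is an illustrative assumption.
    """
    ready = sorted((q[0][1], i) for i, q in enumerate(queues) if q)
    chosen = []
    for _, i in ready:
        if not any(conflicts[i][j] for j in chosen):
            chosen.append(i)
    return [queues[i].popleft()[0] for i in chosen]


# Three task classes; classes 0 and 1 both need resource "B".
groups = [{"A", "B"}, {"B", "C"}, {"D"}]
conflicts = conflict_matrix(groups)
queues = [
    deque([("t1", 0), ("t4", 3)]),
    deque([("t2", 1)]),
    deque([("t3", 2)]),
]
first = next_batch(queues, conflicts)   # t1 and t3 share no resource
second = next_batch(queues, conflicts)  # t2 is now the oldest head
```

Since a batch never contains two tasks that share a resource, no task in it waits on a resource held by another task in the same batch, which is the property the abstract ties to the absence of deadlock.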

High-Order Finite-Differences on Multi-threaded Architectures Using OCCA

10th International Conference on Spectral and High-Order Methods (ICOSAHOM)

Authors: Medina, David; St-Cyr, Amik; Warburton, Timothy (Rice Univ, Computat & Appl Math, Houston, TX 77005, USA; Royal Dutch Shell, Seism Applicat Team, Rijswijk, Netherlands)

ISBN: 9783319198002; 9783319197999 (print)

High-order finite-difference methods are commonly used in wave propagators for industrial subsurface imaging algorithms. Computational aspects of the reduced linear elastic vertical transversely isotropic propagator are considered. Thread-parallel algorithms suitable for implementing this propagator on multi-core and many-core processing devices are introduced. Portability is addressed through the use of the OCCA runtime programming interface. Finally, performance results are shown for various architectures on a representative synthetic test case.

Keywords: Finite difference method
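As a reminder of what a high-order finite-difference stencil looks like, the sketch below applies the standard fourth-order central approximation of the second derivative on a 1-D grid. It is plain Python for illustration only and makes no attempt to reproduce the paper's VTI propagator or the OCCA interface; the function name and the test signal are assumptions.

```python
import math

# Standard fourth-order central coefficients for the second derivative.
COEFFS = (-1 / 12, 4 / 3, -5 / 2, 4 / 3, -1 / 12)


def second_derivative(u, h):
    """Apply the 5-point stencil to the interior points of a 1-D grid."""
    out = [0.0] * len(u)
    for i in range(2, len(u) - 2):  # each i is independent of the others
        out[i] = sum(c * u[i + k - 2] for k, c in enumerate(COEFFS)) / (h * h)
    return out


n = 201
h = 2 * math.pi / (n - 1)
grid = [i * h for i in range(n)]
u = [math.sin(x) for x in grid]
d2u = second_derivative(u, h)

# For u = sin(x) the exact second derivative is -sin(x) = -u.
max_error = max(abs(d2u[i] + u[i]) for i in range(2, n - 2))
```

Each output point reads only a fixed neighbourhood of the input, so the loop iterations are independent; that independence is what thread-parallel implementations of such stencils rely on.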

Mimer and Schedeval: Tools for Comparing Static Schedulers for Streaming Applications on Manycore Architectures

44th Annual International Conference on Parallel Processing Workshops (ICPPW)

Authors: Melot, Nicolas; Janzen, Johan; Kessler, Christoph (Linkoping Univ, Linkoping, Sweden; Uppsala Univ, Uppsala, Sweden)

ISBN: 9781467375894 (print)

Scheduling algorithms published in the scientific literature are often difficult to evaluate or compare due to differences between the experimental evaluations in any two papers on the topic. Very few researchers share the details of the scheduling problem instances they use in their evaluation section, the code that allows them to transform the numbers they collect into the results and graphs they show, or the raw data produced in their experiments. Also, many published scheduling algorithms are not tested against a real processor architecture to evaluate their efficiency in a realistic setting. In this paper, we describe Mimer, a modular evaluation tool-chain for static schedulers that enables the sharing of evaluation and analysis tools employed to elaborate scheduling papers. We propose Schedeval, which integrates into Mimer to evaluate static schedules of streaming applications under throughput constraints on actual target execution platforms. We evaluate the performance of Schedeval at running streaming applications on the Intel Single-Chip Cloud computer (SCC), and we demonstrate the usefulness of our tool-chain to compare existing scheduling algorithms. We conclude that Mimer and Schedeval are useful tools to study static scheduling and to observe the behavior of streaming applications when running on manycore architectures.

Keywords: Benchmark testing; Energy consumption; Processor scheduling; Schedules; Switches; Throughput; Time-frequency analysis; benchmark; crown scheduling; energy; frequency; many-core; scaling; SCC; scheduling; streaming; voltage; scheduling of multiprocessor; Scheduling algorithms; Benchmarks; Stress corrosion cracking; Scale formation

Improving Data Transfer Throughput with Direct Search Optimization

International Conference on Parallel Processing (ICPP)

Authors: Prasanna Balaprakash, Vitali Morozov, Rajkumar Kettimuthu, Kalyan Kumaran, Ian Foster (Leadership Computing Facility, Argonne National Laboratory, Argonne, IL, USA; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA)

ISBN: 9781509028245 (print)

Improving data transfer throughput over high-speed long-distance networks has become increasingly difficult. Numerous factors such as nondeterministic congestion, dynamics of the transfer protocol, and multiuser and multitask source and destination endpoints, as well as interactions among these factors, contribute to this difficulty. A promising approach to improving throughput consists in using parallel streams at the application layer. We formulate and solve the problem of choosing the number of such streams from a mathematical optimization perspective. We propose the use of direct search methods, a class of easy-to-implement and light-weight mathematical optimization algorithms, to improve the performance of data transfers by dynamically adapting the number of parallel streams in a manner that does not require domain expertise, instrumentation, analytical models, or historic data. We apply our method to transfers performed with the GridFTP protocol, and illustrate the effectiveness of the proposed algorithm when used within Globus, a state-of-the-art data transfer tool, on production WAN links and servers. We show that when compared to user default settings our direct search methods can achieve up to 10x performance improvement under certain conditions. We also show that our method can overcome performance degradation due to external compute and network load on source end points, a common scenario at high performance computing facilities.

Keywords: Throughput; Data transfer; Tuning; Optimization; Protocols; Concurrent computing; Parallel processing
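Direct search methods choose the next trial point from measured objective values alone, without derivatives or an analytical model. A minimal sketch of one such method, an integer compass search over the number of parallel streams, is shown below. It is not the algorithm from the paper: the step schedule, the bounds, and the `synthetic_throughput` function (a made-up curve that rises, flattens, and then degrades) are all assumptions for illustration. In a real transfer, `measure` would be a noisy throughput observation rather than a formula.

```python
def compass_search(measure, start, lower, upper, step=8):
    """Integer compass search that maximizes measure(n) on [lower, upper]."""
    best_n, best_value = start, measure(start)
    while step >= 1:
        improved = False
        for candidate in (best_n + step, best_n - step):
            if lower <= candidate <= upper:
                value = measure(candidate)
                if value > best_value:
                    best_n, best_value = candidate, value
                    improved = True
                    break
        if not improved:
            step //= 2  # shrink the step when neither neighbour helps
    return best_n, best_value


def synthetic_throughput(streams):
    """Hypothetical model: gains flatten out, then overhead dominates."""
    return 100 * streams / (streams + 4) - 1.5 * streams


best_streams, best_rate = compass_search(
    synthetic_throughput, start=2, lower=1, upper=64
)
```

The search needs only comparisons between observed values, which matches the abstract's point that no domain expertise, instrumentation, analytical model, or historic data is required.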

10th International Conference on Parallel Processing and Applied Mathematics, PPAM 2013

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, vol. 27, no. 4, pp. 882-884

Authors: Wyrzykowski, Roman; Tudruj, Marek (Czestochowa Tech Univ, PL-42200 Czestochowa, Poland; Polish Acad Sci, Inst Comp Sci, PL-01248 Warsaw, Poland; Polish Japanese Inst Informat Technol, PL-01248 Warsaw, Poland)

Performance-vetted 3-D MAC Processors for Parallel Volumetric Convolution Algorithm: A 256×256×20 MRI Filtering Case Study

Al-Sadeq International Conference on Multidisciplinary in IT and Communication Science and Applications

Authors: Sami Hasan (System Engineering Department, College of Information Engineering, Al-Nahrain University, Baghdad, Iraq)

ISBN: 9781509032488 (print)

Raw 3-D data collections contain noise and artifacts, so the data must be recovered from this degradation by an automated filtering system before further machine analysis. To this end, five performance-efficient FPGA-prototyped processors are devised to realize a parallel 3-D filtering algorithm. These parallel processors tackle the major bottlenecks and limitations of existing multiprocessor systems in input volumetric data, processing word length, output boundary conditions, and inter-processor communication. Then, a greyscale 256×256×20 MRI case study is efficiently filtered and improved by a class of common convolution operators and their developed counterparts, respectively. Analytically, the performance of the five implemented processors is evaluated in terms of area, speed, dynamic power, and throughput. All five processors perform efficiently, with real-time throughput of up to 114 VPS and power consumption as low as 64 mW at the maximum operating frequency. The devised processors can be embedded in mobile MRI or fMRI scanners and used as a pre-filtering stage in any portable automated fMRI system.

Keywords: 3-D MRI; Neuroscience; FPGA; Parallel algorithms; Architectures; Power; Throughput; Filtering
