Search Results - Inner Mongolia University Library

33rd International Conference on High Performance Computing (ISC High Performance)

Authors: Chronaki, Kallia; Casas, Marc; Moreto, Miquel; Bosch, Jaume; Badia, Rosa M.

Affiliations: BSC, Barcelona, Spain; UPC, Barcelona, Spain; CSIC Spanish Natl Res Council, Bellaterra, Spain

ISBN: (print) 9783319920405; 9783319920399

As chip multi-processors (CMPs) become more and more complex, software solutions such as parallel programming models are attracting a lot of attention. Task-based parallel programming models offer an appealing approach to utilizing complex CMPs. However, the increasing number of cores on modern CMPs is pushing research towards the use of fine-grained parallelism. Task-based programming models need to be able to handle such workloads and offer performance and scalability. Using specialized hardware to boost the performance of task-based programming models is a common practice in the research community. Our paper makes the observation that task creation becomes a bottleneck when we execute fine-grained parallel applications with many task-based programming models. As the number of cores increases, the time spent generating the tasks of the application becomes more critical to the entire execution. To overcome this issue, we propose TaskGenX. TaskGenX offers a solution for minimizing task creation overheads and relies on both the runtime system and dedicated hardware. On the runtime system side, TaskGenX decouples task creation from the other runtime activities. It then transfers this part of the runtime to specialized hardware. We draw up the requirements for this hardware in order to boost the execution of highly parallel applications. From our evaluation using 11 parallel workloads on both symmetric and asymmetric multicore systems, we obtain performance improvements of up to 15x, averaging 3.1x over the baseline.

Keywords: parallel programming

Optimal Placement for Smart Mobile Access Points

IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)

Authors: Majd, Amin; Troubitsyna, Elena; Daneshtalab, Masoud

Affiliations: Abo Akad Univ, Dept Informat Technol, Turku, Finland; Malardalen Univ, Vasteras, Sweden

ISBN: (print) 9781538693803

Recently, Smart Mobile Access Point (SMAP) based architectures have emerged as a promising approach to building smart solutions that support the monitoring of special phenomena. SMAPs allow us to predict communication activities in a system using the information collected from the network, and to select the best approach to support the network at any given time. To improve the network performance, SMAPs can autonomously change their positions. They communicate with each other and carry out distributed computing tasks, constituting a mobile fog-computing platform. However, the communication cost becomes a critical factor. In this paper, we propose a compound method to select a near-optimal placement of SMAPs with the goal of maximizing the monitoring coverage and minimizing the communication cost. Our approach combines a parallel implementation of the Imperialist Competitive Algorithm (ICA) with Kruskal's Algorithm.
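The abstract does not say how Kruskal's algorithm is used. One plausible reading, and it is only an assumption, is that the weight of a minimum spanning tree over the access points serves as the communication cost of a candidate placement. The sketch below shows standard Kruskal with a union-find structure; the node ids and link costs are hypothetical and are not taken from the paper.

```python
def kruskal(num_nodes, edges):
    """Return (total_cost, chosen_edges) of a minimum spanning forest.

    edges is a list of (cost, u, v) tuples with 0 <= u, v < num_nodes.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    for cost, u, v in sorted(edges):       # cheapest links first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:               # joins two components, so no cycle
            parent[root_u] = root_v
            total += cost
            chosen.append((u, v, cost))
    return total, chosen


# Hypothetical link costs between four access points (assumed data).
links = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3), (5, 1, 3)]
mst_cost, mst_edges = kruskal(4, links)
```

In a placement search, a value such as `mst_cost` could be one term of the fitness function that an evolutionary method like ICA tries to minimize.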

Keywords: Smart mobile access point; fog computing; wireless sensor networks; cyber-physical systems; greedy approach; multi-objective optimization; evolutionary computing; parallel approaches; ICA; parallel programming; multi-population; placement

Unobtrusive Support for Asynchronous GUI Operations with Java Annotations

32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Mehrabi, Mostafa; Giacaman, Nasser; Sinnen, Oliver

Affiliations: Univ Auckland, Dept Elect & Comp Engn, Parallel & Reconfigurable Comp Lab, Auckland, New Zealand

ISBN: (print) 9781538655559

The complexities involved in parallel programming encourage frameworks to detach programmers from these concerns via higher-level abstraction. The high-performance nature of parallel computing shifts the focus of these programming environments towards facilitating and safeguarding faster computations. Therefore, aspects such as asynchronous graphical user interfaces (GUIs) do not see as much emphasis, even though many applications today depend on concurrent human-computer interactions. The significance of this topic is growing, such that facilitating the efficient management of asynchronous GUI operations is currently a virtue but will soon become a necessity for parallel-programming frameworks. This paper discusses an unobtrusive, annotation-based approach for managing different types of asynchronous GUI operations within the layout of familiar sequential code. The proposed solution minimizes the restructuring of sequential code in order to simplify developing, testing and maintaining GUI-based applications. Furthermore, the paper presents an implementation of the concept for @PT, a parallel programming environment based on Java annotations. The evaluation discussed in this paper suggests that the proposed mechanism is valid, and demonstrates timely and efficient handling of asynchronous GUI operations.

Keywords: parallel programming; asynchronous GUI; responsiveness; Java annotations; @PT

Synchronization Techniques for Parallel Redundant Execution of Applications

ARCS Workshop 2019; 32nd International Conference on Architecture of Computing Systems

Authors: Christian Nagengast; Lukas Osinski; Juergen Mottok

Affiliations: Laboratory for Safe and Secure Systems - LaS³, Technical University of Applied Sciences Regensburg

ISBN: (print) 9783800749577

In fault-tolerant systems, applications are replicated and executed to enable error detection and recovery. If one replica application fails, another is able to take its place and provide the correct results. This concept can benefit from parallel execution on separate execution units. The rise of multicore platforms supports the development of parallel software by providing adequate hardware. However, this raises challenges regarding the synchronization of the redundant strings of execution. Replica determinism means that, given the same input, identical programs provide the same output. To ensure replica determinism, requirements regarding the synchronization can be split into two domains: data and time. This paper examines the state of the art of synchronization techniques for parallel replicated execution in the context of fault-tolerant systems. We analyze the requirements regarding synchronization within the time and data domains and compare different concepts of hardware (multicore, multiprocessor and multi-PCB) and software (processes, threads).
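As a rough illustration of replica determinism in the data domain, the sketch below runs identical replicas of a deterministic function on separate threads and compares their outputs once all have finished. It is not a technique taken from the paper; the function names `replica` and `run_redundant` and the thread-based setup are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def replica(inputs):
    # Deterministic computation: no clocks, randomness, or shared mutable state.
    return sum(x * x for x in inputs)


def run_redundant(fn, inputs, copies=2):
    """Run `copies` replicas of fn in parallel and report whether they agree."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        # Each replica receives its own copy of the input data.
        futures = [pool.submit(fn, list(inputs)) for _ in range(copies)]
        # Waiting on every result is the synchronization point before comparing.
        outputs = [future.result() for future in futures]
    agreed = all(out == outputs[0] for out in outputs)
    return agreed, outputs


agreed, outputs = run_redundant(replica, [1, 2, 3])
```

A mismatch between outputs would signal an error in one replica; with three or more copies, a majority vote could additionally select the result to use.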

Keywords: Fault tolerance; Synchronization; Replica determinism; Multicore; parallel computing; Multi-core processors; Parallel processing (computers); Fault tolerant systems; Legal Executions; state of the art; parallel programming; active redundancy; parallel execution; Error detection and recovery

Noise removal of the X-ray medical image using fast spatial filters and GPU

Conference on Applications of Digital Image Processing XLI

Authors: Cadena, Luis; Zotin, Alexander; Cadena, Franklin; Espinosa, Nikolai

Affiliations: Univ Fuerzas Armadas ESPE, Ave Gral Ruminahui S-N, Sangolqui, Ecuador; Reshetnev Siberian State Univ Sci & Technol, 31 Krasnoyarsky Rabochy Pr, Krasnoyarsk 660037, Russia; Coll Juan Suarez Chacon, Quito, Ecuador

ISBN: (digital) 9781510620766

ISBN: (print) 9781510620766

Medical images are corrupted by different types of noise caused by the equipment itself. It is very important to obtain precise images to facilitate accurate observations for the given application. Removing noise from images remains a challenging issue in the field of medical image processing. This work studies noise removal techniques for medical images using fast implementations of different digital filters, such as the average, median and Gaussian filters. Processing X-ray medical images takes significant time. Nowadays, modern hardware allows parallel technology to be used for image processing on the CPU and GPU. Using GPU processing technology, parallel implementations of the noise reduction algorithms were proposed that take data parallelism into account. The experimental study was conducted on medical X-ray images in order to choose the best filters considering the medical task and the processing time. The comparison of the fast filter implementations and the GPU implementation shows a great increase in performance. Graphics processing units (GPUs) are used today in a wide range of applications, mainly because they can dramatically accelerate parallel computing. In the field of medical imaging, GPUs are in some cases crucial for enabling practical use of computationally demanding algorithms.
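To make the filters named in the abstract concrete, here is a minimal pure-Python sketch of 3x3 average and median filtering. It is neither the authors' fast implementation nor their GPU code; the function `filter3x3`, the border handling (border pixels are copied unchanged) and the sample image are assumptions for illustration.

```python
from statistics import median


def mean(values):
    return sum(values) / len(values)


def filter3x3(image, reduce_fn):
    """Apply reduce_fn to every 3x3 neighbourhood; border pixels are copied."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Gather the 9 pixels of the window centred on (y, x).
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = reduce_fn(window)
    return out


# A flat patch with one impulse-noise pixel in the centre (assumed data).
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
median_out = filter3x3(noisy, median)   # centre pixel is restored to 10
mean_out = filter3x3(noisy, mean)       # centre pixel is only smoothed
```

Each output pixel depends only on the input image, never on other output pixels, which is the data parallelism that makes a one-thread-per-pixel GPU mapping natural.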

Keywords: medical image; average fast filter; median fast filter; Gaussian fast filter; parallel programming; GPU

On the Efficiency of Transactional Code Generation: A GCC Case Study

19th Symposium on High-Performance Computing Systems (WSCAD)

Authors: Honorio, Bruno Chinelato; de Carvalho, Joao P. L.; Baldassin, Alexandro

Affiliations: Sao Paulo State Univ UNESP, Rio Claro, Brazil; Univ Estadual Campinas, Inst Comp, Campinas, SP, Brazil

ISBN: (print) 9781728137728

Memory transactions are becoming more popular as chip manufacturers are building native support for their execution. Although current Intel and IBM microprocessors support transactions in their instruction set architectures, there is still room for improvement on the compiler and runtime front. The GNU Compiler Collection (GCC) has language support for transactions, although performance is still a hindrance to its wider use. In this paper we perform an up-to-date study of the GCC transactional code generation and highlight where the main performance losses are coming from. Our study indicates that one of the main sources of inefficiency is the read and write barriers inserted by the compiler. Most of this instrumentation is required because the compiler cannot determine, at compile time, whether a region of memory will be accessed concurrently or not. To overcome those limitations, we propose new language constructs that allow programmers to specify which memory locations should be free from instrumentation. Initial experimental results show a good speedup when barriers are elided using our proposed language support compared to the original code generated by GCC.

Keywords: transactional memory; parallel programming; compilers; over-instrumentation; optimization

Employing Student Retention Strategies for an Introductory GPU Programming Course

IEEE/ACM Workshop on Education for High-Performance Computing (EduHPC)

Authors: Gutierrez, Julian; Previlon, Fritz; Kaeli, David

Affiliations: Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115, USA

ISBN: (print) 9781728101903

Graphics Processing Units (GPUs) have become a vital hardware resource for the industry and research community due to their high computing capabilities. Despite this, GPUs have not been introduced into the undergraduate curriculum of Computer Engineering and are barely covered in graduate courses. Bridging the gap between university curriculum and industry requirements for GPU expertise is ongoing, but this process takes time. Offering an immediate opportunity for students to learn GPU programming is key for their professional growth. The Northeastern University Computer Architecture Research Lab offers a free GPU programming course to incentivize students from all disciplines to learn how to efficiently program a GPU. In this paper, we discuss the methods used to keep students engaged in a course with no academic obligations. After applying these strategies, we were able to retain more than 80% of the students who started the course. Moreover, the students gave positive feedback on these strategies.

Keywords: Graphics processing units; Computer architecture; Education; Programming profession; parallel programming; Hardware

New List Skeletons for the Python Skeleton Library

IEEE International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)

Authors: Frédéric Loulergue; Jolan Philippe

Affiliations: Northern Arizona University, Flagstaff, AZ, USA; School of Informatics, Computing and Cyber Systems, Northern Arizona University, USA; IMT Atlantique, Nantes, France

ISBN: (digital) 9781728126166

ISBN: (print) 9781728126173

Algorithmic skeletons are patterns of parallel computations. Skeletal parallel programming eases parallel programming: a program is merely a composition of such patterns. Data-parallel skeletons operate on parallel data structures that often have sequential counterparts. In algorithmic skeleton approaches that offer a global view of programs, a parallel program therefore has a structure similar to a sequential program but operates on parallel data structures. PySke is such an algorithmic skeleton library for Python, used to program shared or distributed memory parallel architectures in a simple way. This paper presents an extension to PySke: new algorithmic skeletons on parallel lists. This extension is evaluated on an application.

Keywords: Skeleton; Python; Data structures; Libraries; Indexes; parallel architectures; parallel programming

PySke: Algorithmic Skeletons for Python

International Conference on High Performance Computing & Simulation (HPCS)

Authors: Jolan Philippe; Frédéric Loulergue

Affiliations: School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA

ISBN: (digital) 9781728144849

ISBN: (print) 9781728144856

PySke is a library of parallel algorithmic skeletons in Python designed for list and tree data structures. Such algorithmic skeletons are higher-order functions implemented in parallel. An application developed with PySke is a composition of skeletons. To ease the writing of parallel programs, PySke does not follow the Single Program Multiple Data (SPMD) paradigm but offers a global view of parallel programs to users. This approach aims at making scalable programs easy to write. In addition to the library, we present experiments performed on a high-performance computing cluster (distributed memory) on a set of example applications developed with PySke.
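The idea of a program as a composition of skeletons with a global view can be sketched with a toy parallel list. This is not the PySke API: the class `ParList`, its `map` and `reduce` methods, and the thread-pool backend are hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add


class ParList:
    """Toy parallel list: the data is split into chunks, one per worker."""

    def __init__(self, data, workers=4):
        data = list(data)
        size = max(1, -(-len(data) // workers))  # ceiling division
        self.chunks = [data[i:i + size] for i in range(0, len(data), size)]
        self.workers = workers

    def map(self, fn):
        # Map skeleton: apply fn to every element, chunk by chunk.
        with ThreadPoolExecutor(self.workers) as pool:
            new_chunks = list(
                pool.map(lambda chunk: [fn(x) for x in chunk], self.chunks))
        result = ParList([], self.workers)
        result.chunks = new_chunks
        return result

    def reduce(self, op, neutral):
        # Reduce skeleton: op must be associative with `neutral` as identity.
        with ThreadPoolExecutor(self.workers) as pool:
            partials = list(
                pool.map(lambda chunk: reduce(op, chunk, neutral), self.chunks))
        return reduce(op, partials, neutral)


# The user code reads like a sequential pipeline over a single list.
sum_of_squares = ParList(range(1, 11)).map(lambda x: x * x).reduce(add, 0)
```

The calling code never mentions chunks or workers, which is the global view the abstract contrasts with SPMD, where each process would handle its own slice explicitly.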

Keywords: Skeleton; Libraries; Python; Indexes; parallel programming; C++ languages; parallel processing

Comprehensive Multiparty Session Types

arXiv, 2019

Authors: Bejleri, Andi; Domnori, Elton; Viering, Malte; Eugster, Patrick; Mezini, Mira

Affiliations: TU Darmstadt, Darmstadt, Germany; Canadian Institute of Technology, Tirana, Albania; Università della Svizzera Italiana, Lugano, Switzerland; IBM GBS, Frankfurt, Germany

Multiparty session types (MST) are a well-established type theory that describes the interactive structure of a fixed number of components from a global point of view and type-checks the components through projection of the global type onto the participants of the session. They guarantee communication-safety for a language of multiparty sessions (LMS), i.e., distributed, parallel components can exchange values without deadlocking and unexpected message types. Several variants of MST and LMS have been proposed to study key features of distributed and parallel programming. We observe that the population of the considered variants follows from only one ancestor, i.e. the original LMS/MST, and there are overlapping traits between features of the considered variants and the original. These hamper evolution of session types and languages and their adoption in practice. This paper addresses the following question: What are the essential features for MST and LMS, and how can these be modelled with simple constructs? To the best of our knowledge, this is the first time this question has been addressed. We performed a systematic analysis of the features and the constructs in MST, LMS, and the considered variants to identify the essential features. The variants are among the most influential (according to Google Scholar) and well-established systems that cover a wide set of areas in distributed, parallel programming. We used classical techniques of formal models such as BNF, structural congruence, small step operational semantics and typing judgments to build our language and type system. Lastly, the coherence of operational semantics and type system is proven by induction. This paper proposes a set of essential features, a language of structured interactions and a type theory of comprehensive multiparty session types, including global types and type system. The analysis removes overlapping features and captures the shared traits, thereby introducing the essential features. Th

Keywords: parallel programming
