Search Results - Inner Mongolia University Library

38th International Conference on Information Systems Architecture and Technology (ISAT)

Authors: Krzywaniak, Adam; Czarnul, Pawel (Gdansk Univ Technol, Fac Elect Telecommun & Informat, Gdansk, Poland)

ISBN: (print) 9783319672205; 9783319672199

In the paper we present parallel implementations as well as execution times and speed-ups of three different algorithms run in various environments such as on a workstation with multi-core CPUs and a cluster. The parallel codes, implementing the master-slave model in C+MPI, differ in computation to communication ratios. The considered problems include: a genetic algorithm with various ratios of master processing time to communication and fitness evaluation times, matrix multiplication and numerical integration. We present how the codes scale in the aforementioned systems. For the numerical integration code that scales very well we also show performance in a hybrid CPU+Xeon Phi environment.

Keywords: parallel programming; Multi-core CPU; Cluster; Intel Xeon Phi; parallelization
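
As a rough illustration of the master-slave decomposition that the abstract describes for numerical integration, the sketch below splits an interval into chunks, hands them to workers, and sums the partial results. It is not the paper's code, which is written in C+MPI. A Python thread pool stands in for the MPI ranks, and the function names, the midpoint rule, and the chunk counts are all assumptions made for illustration. Because of Python's global interpreter lock it shows the structure of the decomposition only, not a speed-up.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def integrate_chunk(f, a, b, steps):
    """Worker task (hypothetical): midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def master_integrate(f, a, b, workers=4, chunks=16, steps_per_chunk=10_000):
    """Master (hypothetical): split [a, b], dispatch chunks, sum partial results."""
    width = (b - a) / chunks
    bounds = [(a + k * width, a + (k + 1) * width) for k in range(chunks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is an independent task; the master only combines results.
        partials = pool.map(
            lambda ab: integrate_chunk(f, ab[0], ab[1], steps_per_chunk), bounds
        )
        return sum(partials)


if __name__ == "__main__":
    # The integral of sin over [0, pi] is exactly 2.
    print(master_integrate(math.sin, 0.0, math.pi))
```

Using more chunks than workers lets the pool balance the load when chunks take unequal time, which is one reason a master-slave scheme can suit integration problems with a high computation-to-communication ratio.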

Towards Automatically Optimizing PySke Programs

International Conference on High Performance Computing & Simulation (HPCS)

Authors: Jolan Philippe; Frédéric Loulergue (School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, USA)

ISBN: (electronic) 9781728144849

ISBN: (print) 9781728144856

Explicit parallel programming for shared and distributed memory architectures is an efficient way to deal with data-intensive computations. However, approaches such as explicit threads or MPI remain difficult for most programmers, who have to face constraints such as explicit inter-processor communication and data distribution.

Keywords: Skeleton; Data structures; Python; Libraries; parallel programming; Program processors; Informatics

OpenCL Superpixel Implementation on a General Purpose Multi-core CPU

IEEE International Conference on Imaging Systems and Techniques (IST)

Authors: Haseljic, Hana; Cogo, Emir; Prazina, Irfan; Turcinhodzic, Razija; Buza, Emir; Akagic, Amila (Univ Sarajevo, Dept Comp Sci & Informat, Fac Elect Engn, Sarajevo, Bosnia & Herceg)

ISBN: (print) 9781538666289

Multi-core, many-core, and hybrid processors and parallel programming languages are slowly becoming pervasive in mainstream computing. It is expected that they will affect a large spectrum of systems, from embedded and general-purpose to high-end computing systems. This architectural change has already challenged programmers to efficiently write application code that can scale over many cores to utilize their computational power. Moreover, many heterogeneous architectures exist today, hence there was an emergent need for a uniform interface to these architectures. Recently, the Khronos Group defined the Open Computing Language (OpenCL) for abstracting the underlying hardware, which enables software developers to write portable code across different shared-memory architectures. In this paper, we introduce a new parallel implementation, based on OpenCL, of one of the fastest image segmentation algorithms, known as Simple Linear Iterative Clustering. We evaluate the effectiveness of this implementation using only a multi-core GPCPU. Our implementation is fully compatible with the sequential implementation. When the algorithm is executed sequentially, it utilizes only 25% of the total computational power of a GPCPU for any image resolution, while the modified algorithm is able to utilize close to 100% for high-resolution images. The resulting algorithm is up to 5x faster than its sequential counterpart.

Keywords: parallel programming; multi-core; GPCPU; OpenCL; superpixel; image segmentation; image processing

Parallel Power Flow based on OpenMP

North American Power Symposium (NAPS)

Authors: Ahmadi, Afshin; Jin, Shuangshuang; Smith, Melissa C.; Collins, E. Randolph; Goudarzi, Arman (Clemson Univ, Holcombe Dept Elect & Comp Engn, Clemson, SC 29631, USA; Clemson Univ, Sch Comp, Clemson, SC 29631, USA; Univ KwaZulu Natal, Discipline Elect Elect & Comp Engn, ZA-4001 Durban, South Africa)

ISBN: (print) 9781538671382

Integration of intermittent renewable energy resources into the power system necessitates the development of fast computational methods and tools to enable real-time monitoring, control, and decision making in the power grid. Generally, techniques for increasing computational speed fall into two categories: algorithm improvement and hardware acceleration. In this paper, the serial version of the Newton-Raphson power flow algorithm has been transformed into a parallel solution using the OpenMP standard. The parallel implementation is tested on several power systems, and the computational efficiency is compared for varying numbers of threads. The experimental results show a speedup of more than three times and a significant reduction in computation time.

Keywords: Power Flow Analysis; parallel programming; OpenMP Standard; Newton-Raphson Method

Verified Programs for Frequent Itemset Mining

IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI)

Authors: Loulergue, Frederic; Whitney, Christopher D. (No Arizona Univ, Sch Informat Comp & Cyber Syst, Flagstaff, AZ 86011, USA)

ISBN: (print) 9781538693803

Frequent itemset mining is one pillar of machine learning and is very important for many data mining applications. There are many different algorithms for frequent itemset mining, but to our knowledge no implementation has been proven correct using computer-aided verification. Hu et al. derived on paper an efficient algorithm for this problem: starting from an inefficient functional program, they used program calculation to obtain an efficient version. Based on their work, we propose a formally verified functional implementation for frequent itemset mining developed with the Coq proof assistant. All the proposed programs are evaluated on classical datasets and are compared to a non-verified Java implementation of the Apriori algorithm.

Keywords: Formal verification; Coq proof assistant; data mining; frequent itemset mining; functional programming; parallel programming

Towards Embedded Heterogeneous FPGA-GPU Smart Camera Architectures for CNN Inference 2019

Proceedings of the 13th International Conference on Distributed Smart Cameras

Authors: Walther Carballo-Hernández; François Berry; Maxime Pelcat; Miguel Arias-Estrada (Department of Images, Perception Systems and Robotics, Institut Pascal, Aubière, France; Department of Images, Institut National des Sciences Appliquées (INSA) des Rennes, IETR, UMR CNRS, Rennes, France; Department of Computer Science, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Puebla, Mexico)

ISBN: (print) 9781450371896

The success of Deep Learning (DL) algorithms in computer vision tasks has created an ongoing demand for dedicated hardware architectures that can keep up with their required computation and memory complexities. This task is particularly challenging when embedded smart camera platforms have constrained resources such as power consumption, Processing Elements (PE), and communication. This article describes a heterogeneous system embedding an FPGA and a GPU for executing CNN inference for computer vision applications. The built system addresses some challenges of embedded CNN such as task and data partitioning, and workload balancing. The selected heterogeneous platform embeds an Nvidia® Jetson TX2 for the CPU-GPU side and an Intel Altera® Cyclone10GX for the FPGA side, interconnected by PCIe Gen2, with a MIPI-CSI camera for prototyping. This test environment will be used as a support for future work on a methodology for optimized model partitioning.

Keywords: Internet of Things; Field Programmable Gate Array (FPGA); Graphic Processing Unit (GPU); Deep Learning (DL); Edge Computing; Processing Elements (PE); Artificial Neural Networks (ANN); Single Instruction Multiple Data; Convolutional Neural Networks (CNN); Models of Computation and Architecture; Pipelining; Heterogeneous Computing; parallel programming

An Analysis and a Solution of False Conflicts for Hardware Transactional Memory

25th IEEE International Conference on Electronics, Circuits and Systems (ICECS)

Authors: Futamase, Yuki; Hayashi, Masaki; Tajimi, Tomoki; Shioya, Ryota; Goshima, Masahiro; Tsumura, Tomoaki (Nagoya Inst Technol, Showa Ku, Nagoya, Aichi, Japan; Univ Tokyo, Bunkyo Ku, Hongo 7-3-1, Tokyo, Japan; Natl Inst Informat, Chiyoda Ku, Hitotsubashi 2-1-2, Tokyo, Japan)

ISBN: (print) 9781538695623

Transactional memory (TM) is a promising paradigm for the shared-memory parallel programming model. On TMs, transactions are executed speculatively in parallel as long as no access conflict is detected. On general hardware transactional memories (HTMs), conflicts degrade performance because of the overhead of retrying transactions, so it is important to avoid conflicts. HTM generally detects access conflicts at cache-line granularity, which causes accesses to different variables that reside on the same cache line to be falsely detected as conflicting. In this paper, we analyze how frequently such false conflicts occur and what type of coding can cause them. As a result of the analysis, we confirmed that false conflicts account for 27.4% on average, and as much as 99.9% at maximum, of all detected conflicts. We also propose a lightweight fine-grained conflict detection mechanism and show that it can reduce execution cycles by 17.7% on average and 36.5% at maximum.

Keywords: Hardware; Memory management; Encoding; Synchronization; parallel programming; Message systems; parallel processing
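
To make the notion of a false conflict concrete, the following Python sketch classifies a pair of memory accesses from two different transactions, at least one of which is assumed to be a write. It is only an illustration of cache-line-granularity detection, not the mechanism proposed in the paper. The 64-byte line size, the 8-byte aligned access size, and the function name `classify_conflict` are assumptions; the abstract specifies none of them.

```python
CACHE_LINE_BYTES = 64  # assumed line size; the abstract does not state one


def classify_conflict(addr_a, addr_b, size=8, line_bytes=CACHE_LINE_BYTES):
    """Classify two accesses from different transactions (hypothetical helper).

    Assumes aligned accesses of `size` bytes that never straddle a cache line,
    and that at least one of the two accesses is a write.
    Returns "none", "false", or "true".
    """
    if addr_a // line_bytes != addr_b // line_bytes:
        # Different cache lines: line-granularity detection sees no conflict.
        return "none"
    # Same line: the hardware reports a conflict. It is a true conflict only
    # if the byte ranges [addr, addr + size) actually overlap.
    overlap = addr_a < addr_b + size and addr_b < addr_a + size
    return "true" if overlap else "false"


if __name__ == "__main__":
    print(classify_conflict(0x1000, 0x1008))  # neighbouring variables, same line
    print(classify_conflict(0x1000, 0x1000))  # same variable
    print(classify_conflict(0x1000, 0x1040))  # next cache line
```

The first call is the interesting case: two distinct 8-byte variables share one line, so a detector working at line granularity aborts a transaction even though no data is actually shared.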

C# 7 and .NET Core 2.0 High Performance

2018

Author: Ovais Mehboob Ahmed Khan

ISBN: (electronic) 9781788474603

ISBN: (print) 9781788470049

Improve the speed of your code and optimize the performance of your apps. Key Features: Understand the common performance pitfalls and improve your application's performance; get to grips with multi-threaded and asynchronous programming in C#; develop highly performant applications on .NET Core using microservice architecture. Book Description: While writing an application, performance is paramount. Performance tuning for real-world applications often involves activities geared toward finding bottlenecks; however, this cannot solve the dreaded problem of slower code. If you want to improve the speed of your code and optimize an application's performance, then this book is for you. C# 7 and .NET Core 2.0 High Performance begins with an introduction to the new features, explaining how they help in improving an application's performance. Learn to identify the bottlenecks in writing programs and highlight common performance pitfalls, and learn strategies to detect and resolve these issues early. You will explore multithreading and asynchronous programming with .NET Core and learn the importance and efficient use of data structures. This is followed by memory management techniques and design guidelines to increase an application's performance. Gradually, the book will show you the importance of microservices architecture for building highly performant applications and implementing resiliency and security in .NET Core. After reading this book, you will know how to structure and build scalable, optimized, and robust applications in C# 7 and .NET. What you will learn: Measure application performance using BenchmarkDotNet; explore the techniques to write multithreaded applications; leverage TPL and PLINQ libraries to perform asynchronous operations; get familiar with data structures to write optimized code; understand design techniques to increase your application's performance; learn about memory management techniques in .NET Core; develop a containerized application based on micr

Keywords: C#; C# 7; .NET Core; Multithreading; concurrency; parallel programming; Core Microservices; resiliency; Security; Core Identity; Memory Management; data structures; monitor application performance

Parallel solver for the Poisson equation on a hierarchy of superimposed meshes, under a Python framework

Author: Federico Tesser (Université de Bordeaux)

Degree level: Doctoral

Adaptive discretizations are important in compressible/incompressible flow problems since it is often necessary to resolve details on multiple levels, allowing large regions of space to be modeled using a reduced number of degrees of freedom (reducing the computational time). There is a wide variety of methods for adaptively discretizing space, but Cartesian grids have often outperformed them even at high resolutions due to their simple and accurate numerical stencils and their superior parallel performance. Such performance and simplicity are in general obtained by applying a finite-difference scheme to the problems involved, but this discretization approach does not, by contrast, offer an easy path to adaptivity. In a finite-volume scheme, instead, we can incorporate different types of grids that are more suitable for adaptive refinement, increasing the complexity of the stencils but gaining greater flexibility. The Laplace operator is an essential building block of the Navier-Stokes equations, a model that governs fluid flows, but it also occurs in differential equations that describe many other physical phenomena, such as electric and gravitational potentials, and quantum mechanics. It is therefore a very important differential operator, and the studies carried out on it attest to its relevance. This work presents 2D finite-difference and finite-volume approaches to solving the Laplacian operator, applying patches of overlapping grids where a finer level is needed and leaving coarser meshes in the rest of the computational domain. These overlapping grids will have generic quadrilateral shapes. Specifically, the topics covered will be: 1) introduction to the finite difference method, finite volume method, domain partitioning, solution approximation; 2) overview of different types of meshes to represent in a discrete way the geometry involved in a problem, with a focus on the octree data structure, presenting PABLO and PABLitO. The first one is an

Keywords: Python; Finite Differences; Finite Volumes; parallel programming; Laplace Operator; Adaptive Discretizations

Real-time cortical simulation on neuromorphic hardware

arXiv, 2019

Authors: Rhodes, Oliver; Peres, Luca; Rowley, Andrew G.D.; Gait, Andrew; Plana, Luis A.; Brenninkmeijer, Christian; Furber, Steve B. (Department of Computer Science, University of Manchester, Manchester, United Kingdom)

Real-time simulation of a large-scale biologically representative spiking neural network is presented, through the use of a heterogeneous parallelisation scheme and SpiNNaker neuromorphic hardware. A published cortical microcircuit model is used as a benchmark test case, representing ≈ 1 mm² of early sensory cortex, containing 77k neurons and 0.3 billion synapses. This is the first true real-time simulation of this model, with 10 s of biological simulation time executed in 10 s wall-clock time. This surpasses best published efforts on HPC neural simulators (3× slowdown) and GPUs running optimised SNN libraries (2× slowdown). Furthermore, the presented approach indicates that real-time processing can be maintained with increasing SNN size, breaking the communication barrier incurred by traditional computing machinery. Model results are compared to an established HPC simulator baseline to verify simulation correctness, comparing well across a range of statistical measures. Energy to solution, and energy per synaptic event are also reported, demonstrating that the relatively low-tech SpiNNaker processors achieve a 10× reduction in energy relative to modern HPC systems, and comparable energy consumption to modern GPUs. Finally, system robustness is demonstrated through multiple 12 h simulations of the cortical microcircuit, each simulating 12 h of biological time, and demonstrating the potential of neuromorphic hardware as a neuroscience research tool for studying complex spiking neural networks over extended time periods. Copyright © 2019, The Authors. All rights reserved.

Keywords: parallel programming
