ISBN: (Print) 9781538649756
Real-time data processing is an important component of particle physics experiments with large computing resource requirements. As the Large Hadron Collider (LHC) at CERN prepares for its next upgrade, the LHCb experiment is upgrading its detector for a 30x increase in data throughput. In preparation for this upgrade, the experiment is considering a number of architectural improvements encompassing both its software and hardware infrastructure. One of the hardware platforms under consideration is the Intel Xeon Phi Knights Landing processor. Thanks to its on-package high-bandwidth memory and many-core architecture, it offers an interesting alternative to more traditional server systems. We present a scalable, multi-threaded and NUMA-aware Kalman filter proto-application for particle track fitting, expressed in terms of generic parallel patterns using the GRPPI interface. We show how code maintainability and readability improve while maintaining levels of performance comparable to the baseline implementation. This is achieved by keeping the parallel algorithms in the underlying framework generic but topology-aware through the use of the Portable Hardware Locality (hwloc) library, which allows us to target different architectures with the same program. We measure the performance of our topology-aware GRPPI Kalman filter implementation on the Intel Xeon Phi Knights Landing platform and conclude on the feasibility of integrating such high-level parallelization libraries in complex software frameworks such as LHCb's Gaudi framework.
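For readers unfamiliar with the core computation, the following is a minimal sketch of one linear Kalman filter predict/update step in Python. It is purely illustrative: the matrices and function names are hypothetical placeholders, not the LHCb/GRPPI track-fitting implementation described in the abstract.

```python
# Minimal linear Kalman filter predict + update step (illustrative only;
# not the GRPPI-based LHCb implementation). All inputs are numpy arrays.
import numpy as np

def kalman_step(x, P, F, Q, H, R, z):
    """x: state estimate, P: state covariance, F: transition matrix,
    Q: process noise, H: measurement matrix, R: measurement noise, z: measurement."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In a track fit, a step like this is applied once per detector measurement along the track, which is what makes the workload amenable to the data-parallel patterns mentioned above.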
ISBN: (Print) 9781450365239
Scheduling algorithms have a significant impact on the optimal utilization of HPC facilities. Waiting time, response time, slowdown and weighted slowdown are classical metrics used to compare the performance of different scheduling algorithms. This paper investigates the effects of four artefacts, namely non-determinism, shuffling, time shrinking and sampling, on these metrics. We present a scheduling framework based on emulation, that is, using a real scheduler (Slurm) with a sleep program able to take into account periods of suspension. The framework is able to emulate a 50K-core cluster using 10 virtualized nodes, with the scheduler running on an isolated node. We find that the non-determinism in repeatedly running a workload has a small but discernible effect on these metrics, and that shuffling job order in a workload increases this effect by a factor of 5-10. Experiments with shuffled workloads indicate that the average difference between the performance of the Backfill and Suspend-Resume strategies lies within this variation. We also propose methodologies for time shrinking and sampling to decrease the duration of emulations, while aiming to keep these metrics invariant (or linearly varying) with respect to the original workload. We find that time shrinking by a factor of up to 90% can have an effect on the metrics similar to that of non-determinism. For sampling, our methodology preserved the distribution of job sizes to a large extent, but showed a variation in the metrics somewhat greater than for shuffling. Finally, we use our framework to study Slurm's scheduling performance in depth, and discover a deficiency in the Suspend-Resume implementation.
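The metrics named above have widely used textbook definitions. A small Python sketch of how they could be computed from job records follows; the record field names are assumptions for illustration, not the paper's framework.

```python
# Sketch: classical scheduling metrics from job records.
# Each job is a dict with 'submit', 'start', 'finish' timestamps (hypothetical fields).
def scheduling_metrics(jobs):
    waits, responses, slowdowns = [], [], []
    for j in jobs:
        wait = j['start'] - j['submit']
        response = j['finish'] - j['submit']
        runtime = j['finish'] - j['start']
        waits.append(wait)
        responses.append(response)
        # Guard against division by near-zero runtimes (bounded-slowdown style).
        slowdowns.append(response / max(runtime, 1))
    n = len(jobs)
    return {
        'avg_wait': sum(waits) / n,
        'avg_response': sum(responses) / n,
        'avg_slowdown': sum(slowdowns) / n,
    }
```

Weighted slowdown would additionally scale each job's slowdown by a weight such as its core count; the exact weighting used in the paper is not reproduced here.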
ISBN: (Print) 9781450364607
A build system, which converts source code into applications, is essential for software development. General build systems that rely on a single physical or cloud host suffer from problems such as poor system security, resource shortages, overload, and low availability in the face of massive numbers of build requests. After modularizing and streamlining the steps of a build process, this paper proposes a system that introduces container technology and constructs a large-scale, real-time build system with support for high concurrency on top of Kubernetes [1]. The system provides a highly scalable and feature-stable cloud architecture that supports high concurrency with low resource consumption. It also tightly controls the behavior of programs to avoid potential security and resource issues, and shows excellent performance in concurrency, scalability, security, and load balancing even when handling a large number of build tasks.
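As a rough illustration of the pattern of running one build task per container, the sketch below submits a single build as a Kubernetes Job using the official kubernetes Python client. The image name, namespace, and build command are hypothetical; this is not the architecture from the paper.

```python
# Sketch: one containerized build task as a Kubernetes Job (illustrative only).
from kubernetes import client, config

def submit_build_job(name, repo_url, namespace="builds"):
    config.load_kube_config()  # or config.load_incluster_config() when run inside the cluster
    container = client.V1Container(
        name="builder",
        image="example/builder:latest",  # hypothetical builder image
        command=["sh", "-c", f"git clone {repo_url} src && cd src && make"],
    )
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(
            backoff_limit=0,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(containers=[container], restart_policy="Never")
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)
```

Isolating each build in its own short-lived container is what bounds the blast radius of misbehaving build scripts and lets the cluster scheduler absorb bursts of concurrent requests.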
Synchronization aspects of the large-scale simulation method known as parallel discrete event simulation (PDES) are analyzed using models of time-profile evolution. The time profile is formed with the local vi...
ISBN: (Print) 9783030028510
The proceedings contain 21 papers. The special focus in this conference is on Model and Data Engineering. The topics include: towards a requirements engineering approach for capturing uncertainty in cyber-physical systems environment; assessment of emerging standards for safety and security co-design on a railway case study; generation of behavior-driven development C++ tests from abstract state machine scenarios; hybrid systems and Event-B: a formal approach to signalised left-turn assist; handling reparation in incremental construction of realizable conversation protocols; analyzing a ROS-based architecture for its cross reuse in ISO26262 settings; reliability in fully probabilistic Event-B: how to bound the enabling of events; systematic construction of critical embedded systems using Event-B; component design and adaptation based on behavioral contracts; an MDA approach for the specification of relay-based diagrams; towards real-time semantics for a distributed event-based MOP language; automatic planning: from Event-B to PDDL; a problem-oriented approach to critical system design and diagnosis support; formal specification and verification of cloud resource allocation using timed Petri nets; Petri nets to Event-B: handling mathematical sequences through an ERTMS L3 case; model-based verification and testing methodology for safety-critical airborne systems; gamification and serious games based learning for early childhood in rural areas; context-based sentiment analysis: a survey; a multi-agent system-based distributed intrusion detection system for a cloud computing.
We present a complete approach to highly efficient image registration for embedded systems, covering all steps from theory to practice. An optimization-based image registration algorithm using a least-squares data term is implemented on an embedded distributed multicore digital signal processor (DSP) architecture. All relevant parts are optimized, ranging from mathematics, algorithmics, and data transfer to hardware architecture and electronic components. The optimization for the rigid alignment of two-dimensional images is performed in a multilevel Gauss-Newton minimization framework. We propose a reformulation of the necessary derivative computations, which eliminates all sparse matrix operations and allows for parallel, memory-efficient computation. The pixelwise parallelism forms an ideal starting point for our implementation on a multicore, multichip DSP architecture. The reduction of data transfer to the particular DSP chips is key for an efficient calculation. By determining worst cases for the subimages needed on each DSP, we can substantially reduce data transfer and memory requirements. This is accompanied by a sophisticated padding mechanism that eliminates pipeline hazards and speeds up the generation of the multilevel pyramid. Finally, we present a reference hardware architecture consisting of four TI C6678 DSPs with eight cores each. We show that it is possible to register high-resolution images within milliseconds on an embedded device. In our example, we register two images with 4096 x 4096 pixels within 93 ms, while off-loading the CPU by a factor of 20 and requiring 3.12 times less electrical energy.
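To recall the basic iteration underlying such a framework, the following is a generic Gauss-Newton sketch for a least-squares objective in Python. The residual and Jacobian callables, iteration count and tolerance are assumptions for illustration; the paper's reformulated, DSP-optimized derivative computation is not reproduced here.

```python
# Illustrative Gauss-Newton loop for minimizing 0.5 * ||r(p)||^2,
# iterating p <- p - (J^T J)^{-1} J^T r (not the paper's optimized kernel).
import numpy as np

def gauss_newton(p0, residual, jacobian, iters=20, tol=1e-8):
    """residual(p): residual vector, e.g. reference image minus warped image;
    jacobian(p): derivative of the residual with respect to the parameters p."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        r = residual(p)
        J = jacobian(p)
        step = np.linalg.solve(J.T @ J, J.T @ r)  # normal-equations solve
        p = p - step
        if np.linalg.norm(step) < tol:
            break
    return p
```

In a multilevel setting this loop is run coarse-to-fine, with the result of each pyramid level used as the starting guess for the next finer level.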
ISBN: (Print) 9781538655559
The proceedings contain 145 papers. The topics discussed include: user-transparent translation of machine instructions to programmable hardware;approximation algorithm for scheduling applications on hybrid multi-core machines with communications delays;large scale data centers simulation based on baseline test model;application performance on a cluster-booster system;transport-triggered soft cores;robustness of surface EMG classifiers with fixed-point decomposition on reconfigurable architecture;streaming architecture for large-scale quantized neural networks on an FPGA-based dataflow platform;high-level reliability evaluation of reconfiguration-based fault tolerance techniques;dynamic reconfiguration for real-time automotive embedded systems in fail-operational context;and rerooting trees increases opportunities for concurrent computation and results in markedly improved performance for phylogenetic inference.
We present in this paper a novel load balancing and rescheduling approach based on the concept of the Sandpile cellular automaton: a decentralized multi-agent system working in a critical state at the edge of chaos. Our goal is to provide fairness between concurrent job submissions in highly parallel and distributed environments, such as currently deployed cloud computing systems, by minimizing the slowdown of individual applications and dynamically rescheduling them to the best-suited resources. The algorithm design is validated by a number of numerical experiments showing the effectiveness and scalability of the scheme in the presence of a large number of jobs and resources, and its ability to react to dynamic changes in real time.
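To convey the sandpile principle referenced above, here is a toy Bak-Tang-Wiesenfeld style relaxation on a one-dimensional ring of nodes: any node whose load crosses a threshold sheds work to its neighbours until the system settles. The threshold and topology are arbitrary choices for illustration, not the paper's rescheduling algorithm.

```python
# Toy sandpile toppling on a ring of nodes (illustrative only).
def relax(loads, threshold=4):
    n = len(loads)
    unstable = True
    while unstable:
        unstable = False
        for i in range(n):
            if loads[i] >= threshold:
                loads[i] -= 2              # topple: shed two units of work
                loads[(i - 1) % n] += 1    # one unit to the left neighbour
                loads[(i + 1) % n] += 1    # one unit to the right neighbour
                unstable = True
    return loads

# A single overloaded node relaxes towards a balanced configuration:
print(relax([0, 1, 7, 1, 0]))  # -> [0, 3, 3, 3, 0]
```

The appeal of the scheme is that balancing decisions are purely local, so no central scheduler needs a global view of the system.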
This paper develops an offset-based response-time analysis technique for analyzing complex distributed real-time systems in which processing and communication resources use a time-partitioning strategy to isolate the operation of separate software components. Time partitioning may be provided in the processors by an ARINC 653 compliant operating system, and in the networks via the TTP communication protocol. The software components executed by the system may themselves be distributed and complex, composed of many concurrent tasks and of one or more end-to-end flows that may have end-to-end timing requirements. The developed analysis supports hierarchical scheduling, where a primary scheduler performs time partitioning into separate partitions and secondary fixed-priority schedulers dispatch the different concurrent tasks inside each partition. It also supports end-to-end flows that are either synchronized with the partition schedule or not. This is the first time that this kind of analysis has been developed. An evaluation of an improvement introduced in the analysis is discussed, and two representative case studies are described.
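For context, the classical single-processor fixed-priority response-time recurrence that offset-based, time-partitioned analyses generalize is R = C + sum_j ceil(R / T_j) * C_j over higher-priority tasks. The short iterative sketch below solves it; it is only a reminder of the baseline analysis, not the technique developed in the paper.

```python
# Classical fixed-priority response-time iteration (single resource, no offsets).
import math

def response_time(C, higher_prio, deadline):
    """C: WCET of the task under analysis; higher_prio: list of (C_j, T_j) pairs."""
    R = C
    while True:
        interference = sum(math.ceil(R / T_j) * C_j for C_j, T_j in higher_prio)
        R_next = C + interference
        if R_next == R:
            return R        # fixed point reached: worst-case response time
        if R_next > deadline:
            return None     # not schedulable within the deadline
        R = R_next
```

Offsets, partition windows and end-to-end flows make the interference terms considerably more involved, which is precisely what the paper's analysis addresses.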
ISBN: (Print) 9789811328534; 9789811328527
In recent years, the size and complexity of the datasets generated by large-scale numerical simulations on modern HPC (High Performance Computing) systems have been continuously increasing. These generated datasets can possess different formats, types, and attributes. In this work, we focus on large-scale distributed unstructured volume datasets, which are still widely used in numerical simulations across a variety of scientific and engineering fields. Although volume rendering is one of the most popular techniques for analyzing and exploring a given volume dataset, in the case of unstructured volume data the time-consuming visibility sorting becomes problematic as the data size increases. Focusing on effective volume rendering of large-scale distributed unstructured volume datasets generated in HPC environments, we opted for the well-known PBVR (Particle-based Volume Rendering) method. Although PBVR does not require any visibility sorting during the rendering process, the CPU-based approach has a well-known tradeoff between image quality and memory consumption. This is because the entire set of intermediate rendering primitives (particles) must be stored prior to the rendering processing. To reduce this memory pressure, we propose a fully parallel PBVR approach that eliminates the need to store these intermediate rendering primitives, as required by existing approaches. In the proposed method, each set of rendering primitives is directly converted into a partial image by its owning process, and the partial images are then gathered and merged by the parallel image composition library 234Compositor. We evaluated the memory cost and processing time using a real CFD simulation result, and verified the effectiveness of our proposed method compared to the existing parallel PBVR method.
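A simplified picture of the final merge step is a depth-based composition of per-process partial images. The serial reduction below illustrates the idea only; the paper relies on the 234Compositor library for the actual parallel composition, and the array layout here is an assumption.

```python
# Sketch: depth-based composition of per-process partial images (illustrative only).
import numpy as np

def composite(partials):
    """partials: list of (rgb, depth) pairs, one per process.
    rgb: H x W x 3 float array; depth: H x W float array (smaller = closer)."""
    rgb_out, depth_out = partials[0]
    rgb_out, depth_out = rgb_out.copy(), depth_out.copy()
    for rgb, depth in partials[1:]:
        closer = depth < depth_out        # pixels where this partial image wins
        rgb_out[closer] = rgb[closer]
        depth_out[closer] = depth[closer]
    return rgb_out
```

Because each process turns its local primitives directly into such a partial image, nothing but small image buffers needs to be exchanged at composition time, which is where the memory savings come from.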