Search Results - Inner Mongolia University Library

International Conference on High Performance Computing & Simulation (HPCS)

Authors: Javier Carnero; Francisco Javier Nieto. Affiliations: Advanced Parallel Computing Lab, ATOS, Seville, Spain; Advanced Parallel Computing Lab, ATOS, Bilbao, Spain

ISBN: (print) 9781538678800

In general, one of the complexities of large simulations is related to the usage of the heterogeneous computational resources that are needed to execute them. The definition of workflows, usually linked to concrete orchestration solutions, has reduced most of that complexity. These solutions are oriented to High Performance Computing (HPC) or just deal with services managed remotely. This paper presents a novel solution we propose for running simulations in a hybrid HPC and Cloud infrastructure, exploiting the performance and power of HPC systems and benefiting from the fast and flexible provision of Cloud resources. We provide our vision of the typical simulation workflows and the kind of computational resources that fit best in each phase. In line with this vision, we describe the research done to enable the definition of the workflows by extending the TOSCA standard (originally focused on Cloud solutions), used by our orchestrator and other solutions. We propose several extensions (types, relationships, compute properties and job properties), compatible with the standard definition, so Cloud and HPC tasks can be processed as expected. The paper also shows a use case implemented with the proposed approach, highlighting some of the benefits found so far.

Keywords: Mathematical model; Computational modeling; Task analysis; Cloud computing; Complexity theory; Tools; Software

Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design

9th International Conference on Learning Representations, ICLR 2021

Authors: Yang, Xiufeng; Aasawat, Tanuj Kr; Yoshizoe, Kazuki. Affiliations: Chugai Pharmaceutical Co. Ltd, Japan; Parallel Computing Lab - India, Intel Labs, India; RIKEN Center for Advanced Intelligence Project, Japan

It is common practice to use large computational resources to train neural networks, known from many examples, such as reinforcement learning applications. However, while massively parallel computing is often used for training models, it is rarely used to search solutions for combinatorial optimization problems. This paper proposes a novel massively parallel Monte-Carlo Tree Search (MP-MCTS) algorithm that works efficiently for a 1,000 worker scale on a distributed memory environment using multiple compute nodes and applies it to molecular design. This paper is the first work that applies distributed MCTS to a real-world and non-game problem. Existing works on large-scale parallel MCTS show efficient scalability in terms of the number of rollouts up to 100 workers. Still, they suffer from the degradation in the quality of the solutions. MP-MCTS maintains the search quality at a larger scale. By running MP-MCTS on 256 CPU cores for only 10 minutes, we obtained candidate molecules with similar scores to non-parallel MCTS running for 42 hours. Moreover, our results based on parallel MCTS (combined with a simple RNN model) significantly outperform existing state-of-the-art work. Our method is generic and is expected to speed up other applications of MCTS. © 2021 ICLR 2021 - 9th International Conference on Learning Representations. All rights reserved.

Keywords: Reinforcement learning

POPA: Expressing High and Portable Performance across Spatial and Vector Architectures for Tensor Computations

32nd ACM International Symposium on Field-Programmable Gate Arrays, FPGA 2024

Authors: Hao, Xiaochen; Rong, Hongbo; Zhang, Mingzhe; Sun, Ce; Jiang, Hong; Liang, Yun. Affiliations: Peking University, China; Parallel Computing Lab, Intel, United States; Tsinghua University, China; University of Science and Technology of China, China; Intel, United States; Peking University & Beijing Advanced Innovation Center for Integrated Circuits, China

ISBN: (print) 9798400704185

This paper aims at high and portable performance for tensor computations across spatial (e.g., FPGAs) and vector architectures (e.g., GPUs). State-of-the-art approaches usually address performance portability across vector architectures (CPUs and GPUs). However, they either miss FPGAs or do not achieve high performance. Without a common architectural abstraction, they program and optimize spatial and vector devices separately, causing low portability. We propose a unified programming framework, POPA, which achieves portability via architectural abstraction and performance via specialization. A parallel dataflow machine is proposed as a unified, abstract hardware target that hides differences of concrete architectures. The machine consists of software-defined systolic arrays and a tensor-specific cache hierarchy, which capture pipeline parallelism and customizable memories on FPGAs, as well as multithreading parallelism on GPUs. The machine is specified in a unified programming model as two dataflow graphs for scheduling compute and data movement, respectively. A compiler then specializes the abstract machine to exploit the properties of FPGAs and GPUs, bridging the gap between the abstract machine and a concrete architecture. We evaluate POPA on several Intel FPGAs and GPUs with high-profile tensor kernels, and this is the first system that achieves >=80% performance of expert-written code or machine peak across architectures, to the best of our knowledge. © 2024 ACM.

Keywords: Graphics processing unit

Delayed Difference Scheme for Large Scale Scientific Simulations

Physical Review Letters, 2014, Vol. 113, No. 21, p. 218701

Authors: Dheevatsa Mudigere; Sunil D. Sherlekar; Santosh Ansumali. Affiliations: Parallel Computing Lab, Intel Labs, Bangalore 560103, India; Engineering Mechanics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore 560064, India

We argue that the current heterogeneous computing environment mimics a complex nonlinear system which needs to borrow the concept of time-scale separation and the delayed difference approach from statistical mechanics and nonlinear dynamics. We show that by replacing the usual difference equations approach with a delayed difference equations approach, the sequential fraction of many scientific computing algorithms can be substantially reduced. We also provide a comprehensive theoretical analysis to establish that the error and stability of our scheme are of the same order as those of existing schemes for a large, well-characterized class of problems.

Keywords: Heterogeneous computing; Computer algorithms; Nonlinear dynamical systems; Statistical mechanics; Difference equations

Data structure and movement for lattice-based simulations

Physical Review E, 2013, Vol. 88, No. 1, p. 013314

Authors: Aniruddha G. Shet; Shahajhan H. Sorathiya; Siddharth Krithivasan; Anand M. Deshpande; Bharat Kaul; Sunil D. Sherlekar; Santosh Ansumali. Affiliations: Parallel Computing Lab, Intel Labs, Bangalore 560103, India; Engineering Mechanics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore 560064, India

We show that for the lattice Boltzmann model, the existing paradigm in computer science for the choice of the data structure is suboptimal. In this paper we use the requirements of physical symmetry necessary for recovering hydrodynamics in the lattice Boltzmann description to propose a hybrid data layout for the method. This hybrid data structure, which we call a structure of an array of structures, is shown to be optimal for the lattice Boltzmann model. Finally, the possible advantages of establishing a connection between group theoretic symmetry requirements and the construction of the data structure are discussed in the broader context of grid-based methods.
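The layouts named in this abstract can be compared through their index arithmetic. The sketch below is only an illustration under assumptions: the function names, the nine-velocity lattice, and the grouping of velocities by speed (rest, axis, diagonal) are hypothetical choices and are not taken from the paper, whose actual grouping follows from its symmetry argument.

```python
from typing import Sequence


def aos_index(cell: int, q: int, n_q: int) -> int:
    """Array of structures: all velocities of one cell are contiguous."""
    return cell * n_q + q


def soa_index(cell: int, q: int, n_cells: int) -> int:
    """Structure of arrays: all cells of one velocity are contiguous."""
    return q * n_cells + cell


def hybrid_index(cell: int, q: int, n_cells: int,
                 groups: Sequence[Sequence[int]]) -> int:
    """Structure of an array of structures (illustrative reading).

    Each velocity group gets its own array over cells; inside that array,
    the velocities of the group are stored together for each cell.
    """
    base = 0
    for group in groups:
        if q in group:
            return base + cell * len(group) + list(group).index(q)
        base += n_cells * len(group)  # skip the whole array of this group
    raise ValueError(f"velocity {q} is not in any group")


# Hypothetical grouping for a nine-velocity lattice: rest, axis, diagonal.
D2Q9_GROUPS = [[0], [1, 2, 3, 4], [5, 6, 7, 8]]

if __name__ == "__main__":
    n_cells = 3
    for q in range(9):
        row = [hybrid_index(c, q, n_cells, D2Q9_GROUPS) for c in range(n_cells)]
        print(f"velocity {q}: offsets {row}")
```

With this layout, velocities in the same group sit next to each other for a given cell, while different groups live in separate arrays, which is the middle ground between the two classic layouts.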

Keywords: Data structures (Computer science); Lattice theory; Simulation methods & models; Lattice Boltzmann methods; Hydrodynamics; Computational grids (Computer systems)

Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design

arXiv, 2020

Authors: Yang, Xiufeng; Aasawat, Tanuj Kr; Yoshizoe, Kazuki. Affiliations: Chugai Pharmaceutical Co. Ltd; Parallel Computing Lab - India, Intel Labs; RIKEN Center for Advanced Intelligence Project

It is common practice to use large computational resources to train neural networks, known from many examples, such as reinforcement learning applications. However, while massively parallel computing is often used for training models, it is rarely used to search solutions for combinatorial optimization problems. This paper proposes a novel massively parallel Monte-Carlo Tree Search (MP-MCTS) algorithm that works efficiently for a 1,000 worker scale on a distributed memory environment using multiple compute nodes and applies it to molecular design. This paper is the first work that applies distributed MCTS to a real-world and non-game problem. Existing works on large-scale parallel MCTS show efficient scalability in terms of the number of rollouts up to 100 workers. Still, they suffer from the degradation in the quality of the solutions. MP-MCTS maintains the search quality at a larger scale. By running MP-MCTS on 256 CPU cores for only 10 minutes, we obtained candidate molecules with similar scores to non-parallel MCTS running for 42 hours. Moreover, our results based on parallel MCTS (combined with a simple RNN model) significantly outperform existing state-of-the-art work. Our method is generic and is expected to speed up other applications of MCTS. Copyright © 2020, The Authors. All rights reserved.

Keywords: Trees (mathematics)

Characterizing Multi-media Retrieval Applications

International Conference on Parallel Processing (ICPP)

Authors: Yunping Lu; Xin Wang; Weihua Zhang; Yi Li; Wenyun Zhao. Affiliations: Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China; State Key Lab of Mathematical Engineering and Advanced Computing, Wuxi, China; Software School, Fudan University, Shanghai, China; Parallel Processing Institute, Fudan University, Shanghai, China

Multimedia data, especially image and video data, have recently become one of the most overwhelming data types on the Internet. Considering the user experience and real application requirements, multimedia data always demand real-time processing speed. As a result, the huge amount of such data makes retrieving useful information from them not only data-intensive but also computation-intensive, which poses significant challenges to current system and architecture designs. Unfortunately, most prior studies focus only on text-based retrieval systems or traditional multimedia processing applications. As far as we know, there is no systematic study analyzing the characteristics of multimedia retrieval applications and how they might impact system and architecture designs. In this paper, we make the first attempt to construct a multimedia retrieval benchmark suite (called MMR Bench) to evaluate the corresponding system and architecture designs. To embody diverse multimedia retrieval applications, we collect eight state-of-the-art multimedia retrieval algorithms which cover all the retrieval stages, including feature extraction, feature matching, and spatial verification. To satisfy diverse evaluation purposes, we implement multiple versions of each algorithm, including a sequential version, a pthread version for multi-core evaluation and a data-parallel (i.e., Map-reduce) version for data-center evaluation. Moreover, MMR Bench provides flexible interfaces between retrieval stages, as well as a tool to adjust parameters and regenerate different scales of reasonable input. With such a flexible design, the algorithms in MMR Bench are not only suitable for individual kernel-level evaluation, but can also be integrated into a complete infrastructure for system-level evaluation. Based on MMR Bench, we further analyze the inherent architectural characteristics, such as input size sensitivity and workload balance, which provides some insights into system and archite

Keywords: Multimedia communication; Feature extraction; Benchmark testing; Streaming media; Algorithm design and analysis; Computer architecture

Quantum Neuron: An elementary building block for machine learning on quantum computers

arXiv, 2017

Authors: Cao, Yudong; Guerreschi, Gian Giacomo; Aspuru-Guzik, Alán. Affiliations: Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States; Parallel Computing Lab, Intel Corporation, Santa Clara, CA 95054; Canadian Institute for Advanced Research, Toronto, ON M5G 1Z8, Canada

Even the most sophisticated artificial neural networks are built by aggregating substantially identical units called neurons. A neuron receives multiple signals, internally combines them, and applies a non-linear function to the resulting weighted sum. Several attempts to generalize neurons to the quantum regime have been proposed, but all proposals collided with the difficulty of implementing non-linear activation functions, which is essential for classical neurons, due to the linear nature of quantum mechanics. Here we propose a solution to this roadblock in the form of a small quantum circuit that naturally simulates neurons with threshold activation. Our quantum circuit defines a building block, the "quantum neuron", that can reproduce a variety of classical neural network constructions while maintaining the ability to process superpositions of inputs and preserve quantum coherence and entanglement. In the construction of feedforward networks of quantum neurons, we provide numerical evidence that the network not only can learn a function when trained with superposition of inputs and the corresponding output, but that this training suffices to learn the function on all individual inputs separately. When arranged to mimic Hopfield networks, quantum neural networks exhibit properties of associative memory. Patterns are encoded using the simple Hebbian rule for the weights and we demonstrate attractor dynamics from corrupted inputs. Finally, the fact that our quantum model closely captures (traditional) neural network dynamics implies that the vast body of literature and results on neural networks becomes directly relevant in the context of quantum machine learning. Copyright © 2017, The Authors. All rights reserved.
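The classical unit that this abstract starts from (combine the incoming signals into a weighted sum, then apply a threshold non-linearity) can be written in a few lines. This is a minimal sketch of the classical neuron only, not of the quantum circuit proposed in the paper; the function name, the bias term, and the AND-gate weights are assumptions chosen for illustration.

```python
from typing import Sequence


def threshold_neuron(inputs: Sequence[float],
                     weights: Sequence[float],
                     bias: float = 0.0) -> int:
    """Classical neuron with threshold (step) activation.

    Returns 1 when the weighted sum plus bias is non-negative, else 0.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum >= 0 else 0


if __name__ == "__main__":
    # Hypothetical weights that make the neuron behave like a logical AND.
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, threshold_neuron([a, b], [1.0, 1.0], bias=-1.5))
```

The step function is the non-linear part; it is exactly this kind of operation that does not map directly onto linear quantum evolution, which is the difficulty the abstract describes.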

Keywords: Neurons

GREYONE: Data Flow Sensitive Fuzzing

Proceedings of the 29th USENIX Conference on Security Symposium

Authors: Shuitao Gan; Chao Zhang; Peng Chen; Bodong Zhao; Xiaojun Qin; Dong Wu; Zuoning Chen. Affiliations: State Key Laboratory of Mathematical Engineering and Advanced Computing; Institute for Network Science and Cyberspace, Tsinghua University and Beijing National Research Center for Information Science and Technology; ByteDance AI lab; Institute for Network Science and Cyberspace, Tsinghua University; National Research Center of Parallel Computer Engineering and Technology

ISBN: (print) 9781939133175

Data flow analysis (e.g., dynamic taint analysis) has proven to be useful for guiding fuzzers to explore hard-to-reach code and find vulnerabilities. However, traditional taint analysis is labor-intensive, inaccurate and slow, affecting the fuzzing efficiency. Apart from taint, few data flow features are utilized. In this paper, we proposed a data flow sensitive fuzzing solution GREYONE. We first utilize the classic feature taint to guide fuzzing. A lightweight and sound fuzzing-driven taint inference (FTI) is adopted to infer taint of variables, by monitoring their value changes while mutating input bytes during fuzzing. With the taint, we propose a novel input prioritization model to determine which branch to explore, which bytes to mutate and how to mutate. Further, we use another data flow feature constraint conformance, i.e., distance of tainted variables to values expected in untouched branches, to tune the evolution direction of fuzzing. We implemented a prototype of GREYONE and evaluated it on the LAVA data set and 19 real world programs. The results showed that it outperforms various state-of-the-art fuzzers in terms of both code coverage and vulnerability discovery. In the LAVA data set, GREYONE found all listed bugs and 336 more unlisted. In real world programs, GREYONE on average found 2.12X unique program paths and 3.09X unique bugs than state-of-the-art evolutionary fuzzers, including AFL, VUzzer, CollAFL, Angora and Honggfuzz. Moreover, GREYONE on average found 1.2X unique program paths and 1.52X unique bugs than a state-of-the-art symbolic execution assisted fuzzer QSYM. In total, it found 105 new security bugs, of which 41 are confirmed by CVE.
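The idea behind fuzzing-driven taint inference, as the abstract states it, is to mutate input bytes and watch which variable values change. The sketch below is a toy illustration of that idea, not GREYONE's implementation: the instrumentation is replaced by a hypothetical `program` callable that returns a dictionary of observed variable values, and each byte is mutated only once (by flipping all its bits), which can miss dependencies that a real fuzzer would probe with several mutations.

```python
from typing import Callable, Dict, Set

# Hypothetical stand-in for an instrumented target: maps an input to the
# values of the variables it observed during execution.
Program = Callable[[bytes], Dict[str, int]]


def infer_taint(program: Program, seed: bytes) -> Dict[str, Set[int]]:
    """Map each variable to the input offsets whose mutation changed it."""
    baseline = program(seed)
    taint: Dict[str, Set[int]] = {name: set() for name in baseline}
    for offset in range(len(seed)):
        mutated = bytearray(seed)
        mutated[offset] ^= 0xFF  # flip every bit of this one byte
        observed = program(bytes(mutated))
        for name, value in observed.items():
            # A changed value means the variable depends on this byte.
            if name in baseline and value != baseline[name]:
                taint[name].add(offset)
    return taint


def toy_program(data: bytes) -> Dict[str, int]:
    """Toy target: 'magic' reads bytes 0-1, 'length' reads byte 2."""
    magic = data[0] | (data[1] << 8)
    length = data[2]
    return {"magic": magic, "length": length, "const": 7}


if __name__ == "__main__":
    print(infer_taint(toy_program, b"\x01\x02\x03\x04"))
```

In this toy run, `magic` is attributed to offsets 0 and 1, `length` to offset 2, and `const` to no offset, so a fuzzer that wants to flip a branch on `magic` knows which bytes are worth mutating.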

Resource-Aware Multi-Criteria Vehicle Participation for Federated Learning in Internet of Vehicles

SSRN, 2023

Authors: Wen, Jie; Zhang, Jingbo; Zhang, Zhixia; Cui, Zhihua; Cai, Xingjuan; Chen, Jinjun. Affiliations: The Shanxi Key Laboratory of Advanced Control and Equipment Intelligence, Taiyuan University of Science and Technology, Shanxi, Taiyuan, China; The Shanxi Key Laboratory of Big Data Analysis and Parallel Computing, Taiyuan University of Science and Technology, Shanxi, Taiyuan, China; The State Key Lab for Novel Software Technology, Nanjing University, China; Department of Computing Technologies, Swinburne University of Technology, Melbourne, Australia

Federated learning (FL), as a safe distributed training mode, provides strong support for the edge intelligence of the Internet of Vehicles (IoV) to realize efficient collaborative control and safe data sharing. However, due to resource limitations and the instability of the training environment in the complex IoV, the ideal performance of FL cannot be achieved. When the actual resource constraints and federated task requirements are considered, the diversified device selection criteria make resource-aware vehicle selection a multi-criteria selection problem. To effectively support FL for IoV, the resource-aware multi-criteria vehicle selection problem is described as a many-objective optimization problem, and a resource-aware many-objective vehicle selection model (RA-MaOVSM) is proposed to optimize resource efficiency. RA-MaOVSM considers the heterogeneous resources (such as computation, communication, energy and data resources) of on-board devices in IoV, and realizes the joint optimization of learning efficiency, energy cost and global performance. Additionally, a novel probability distribution combination game strategy is applied to a many-objective evolutionary algorithm (MaOEA) to improve the model-solving performance. Simulation results demonstrate that RA-MaOVSM can effectively optimize the IoV resources and FL model performance, and the designed algorithm exhibits good convergence and distribution, achieving a good balance among multiple device selection criteria. © 2023, The Authors. All rights reserved.

Keywords: Evolutionary algorithms
