检索结果-内蒙古大学图书馆

21st IEEE Hot Chips Symposium, HCS 2009

作者： Snir, Marc UPORO Illinois Universal parallel Computing Research Center United States

ISBN: (纸本)9781467388733

• parallelism need not be hard - much easier than traditional concurrent programming • parallel programming, like programming, is a team effort that requires many different skills and many different tools - coarse-level parallelism for the masses, tuned libraries, refactoring tools, verification and tuning tools • parallel architectures can scale if they take advantage of practical constraints on communication and synchronization in real prog rams. © 2009 IEEE.

关键词： parallel architectures

来源：评论

学校读者我要写书评

暂无评论

A distributed computing center software for the efficient use of parallel computer systems

A distributed computing center software for the efficient us...

引用

International Conference and Exhibition on High-Performance computing and Networking, HPCN 1994

作者： Ramme, F. Römke, T. Kremer, K. Paderborn Center for Parallel Computing University of Paderborn Warburger Str. 100 Paderborn33095 Germany University of Technology Aachen Computing Center Seffenter Weg 23 Aachen52074 Germany

ISBN: (纸本)9783540579816

Nowadays, computing systems accessible to researchers with "Grand Challenge" problems consist of a hardware mixture ranging from clusters of workstations to parallel supercomputers. This hardware is available via geographically distributed networks with various communication capabilities. Within this paper the computing center Software (CCS) is introduced. This software package is implemented as a message based distributed application by itself. Thus, it is considerable simplified to integrate new MPP systems into an existing environment. Giving the same look and feel when accessing different types of parallel or distributed computer systems will bring the idea of metacomputing close to practice. This paper takes a look behind CCS. The underlying model which uses abstract views for specifying system components and a general purpose Resource Description Language is sketched. We explain how it is possible to support Wide-Area Network access and unstable connection lines. Afterwards, we present the system and vendor independent batch processing facility usable for arbitrary programming environments. A sample user session and first prototype installations complete our work. © 1994, Springer Verlag. All rights reserved.

关键词： Wide area networks

来源：评论

学校读者我要写书评

暂无评论

Evaluating the Strong Scaling Potential of AI Engines for Molecular Dynamics Simulations 25

Evaluating the Strong Scaling Potential of AI Engines for Mo...

引用

Proceedings of the 15th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies

作者： Mika Bröker Johannes Menzel Christian Plessl Paderborn Center for Parallel Computing Paderborn University Paderborn Germany

来源：评论

学校读者我要写书评

暂无评论

FINN-HPC: Closing the Gap for Energy-Efficient Neural Network Inference on FPGAs in HPC 25

FINN-HPC: Closing the Gap for Energy-Efficient Neural Networ...

引用

Proceedings of the 15th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies

作者： Linus Jungemann Bjarne Wintermann Heinrich Riebler Christian Plessl Paderborn University Paderborn Center for Parallel Computing Paderborn Germany

来源：评论

学校读者我要写书评

暂无评论

Neural Network Inference in High-Performance computing: Closing the Gap for FINN based Reconfigurable Accelerators 25

Neural Network Inference in High-Performance Computing: Clos...

引用

Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

作者： Linus Jungemann Bjarne Wintermann Heinrich Riebler Christian Plessl Paderborn University Paderborn Center for Parallel Computing Paderborn Germany

ISBN: (纸本)9798400713965

In recent years, Neural Networks (NNs) have become one of the most prevailing topics in computers science, both in research and in industry. NNs are used for data analysis, natural language processing, autonomous driving and more. As such, NNs also see more application and use in High-Performance computing (HPC). At the same time, energy efficiency has become an increasingly critical topic. NNs use large amounts of energy for operation, which in return results in large amounts of CO2 emissions. This work presents a comprehensive evaluation of current NN inference soft- and hardware configurations within High-Performance computing (HPC) environments, with a focus on both performance metrics and energy consumption. NN quantization and accelerators such as FPGAs allow for an increased inference efficiency, both in terms of throughput and energy. Therefore, this work focuses on FINN, an efficient NN inference framework for FPGAs, highlighting its current lack of support for HPC systems. We provide an in-depth analysis of FINN in order to implement extensions to optimize the end-to-end execution for the usage in the HPC environment. We thoroughly evaluate the performance and energy efficiency gains using newly implemented optimizations and compare it against existing NN accelerators for HPC. With our extensions of FINN, we were able to achieve a 1847× higher throughput, while also decreasing the latency on average by 0.9978× and EDP by 0.9979× on an Alveo U55C FPGA. Data flow based NN inference accelerators on an FPGA should be used if the performance and energy footprint of the inference process is crucial, and the batch sizes are small to medium. For extremely large batch sizes and a very limited time for network-to-accelerator (less than a few days), using GPUs is still the way to go. Our results show that with the newly developed driver, we outperform a high-end Nvidia A100 GPU by up to 7.81x in throughput, while having a 0.87x lower latency and 0.88x lower energy de

关键词： dataflow architectures

来源：评论

学校读者我要写书评

暂无评论

Efficient and Distributed Computation of Electron Repulsion Integrals on AMD AI Engines

Efficient and Distributed Computation of Electron Repulsion ...

引用

Annual IEEE Symposium on Field-Programmable Custom computing Machines (FCCM)

作者： Johannes Menzel Christian Plessl Department of Computer Science Paderborn Center for Parallel Computing Paderborn University Paderborn Germany

ISBN: (数字)9798331502812

ISBN: (纸本)9798331502829

computing electron repulsion integrals (ERIs) is the major computational bottleneck of many quantum mechanical simulation methods, requiring trillions of ERI evaluations per time step. While the computation of independent ERIs is embarrassingly parallel, the efficient computation of individual ERIs on modern processor cores is difficult due to both an insufficient cache size for intermediates of the computation and irregular memory access patterns that are difficult to vectorize. In this paper, we present how our implementation on the AI Engine (AIE) architecture addresses both of these problems. First, we have defined a flexible graph structure, which we call an ERI-Engine, that can be implemented for all 231 canonical ERI quartets from {ss|ss} to {hh| hh} by distributing the computation over 2–14 AIEs. Second, for the larger quartets, we have devised a novel vectorization scheme that leverages the advanced floating-point unit of the AIEs, while also supporting vectorization of independent ERIs for the smaller quartets. Finally, ERI-Engines are horizontally and vertically stackable to fill the entire AIE array, and in particular, the vertically stacked ERI-Engines form a column that uses one or more time-shared channels to stream the results out of the AIE array, almost completely hiding the computational phases of individual ERI-Engines. In terms of absolute performance, we are competitive with recent high-performance implementations of ERI algorithms on FPGAs (SERI) and GPUs (LibintX), as well as well-established highly optimized CPU libraries (Libint, Libcint), while being the unequivocal leader in terms of energy efficiency.

关键词： Phased arrays Quantum computing Memory management Quantum mechanics Libraries Computational efficiency Artificial intelligence Engines Field programmable gate arrays Electrons

来源：评论

学校读者我要写书评

暂无评论

Partial Order-centered Hyperbolic Representation Learning for Few-shot Relation Extraction 31

Partial Order-centered Hyperbolic Representation Learning fo...

引用

31st International Conference on Computational Linguistics, COLING 2025

作者： Hu, Biao Huang, Zhen Hu, Minghao Yang, Pinglv Qiao, Peng Dou, Yong Wang, Zhilin National Key Laboratory of Parallel and Distributed Computing National University of Defense Technology China Center of Information Research Academy of Military Science China College of Meteorology and Oceanology National University of Defense Technology China

ISBN: (纸本)9798891761964

Prototype network-based methods have made substantial progress in few-shot relation extraction (FSRE) by enhancing relation prototypes with relation descriptions. However, the distribution of relations and instances in distinct representation spaces isolates the constraints of relations on instances, making relation prototypes biased. In this paper, we propose an end-to-end partial order-centered hyperbolic representation learning (PO-HRL) framework, which imposes the constraints of relations on instances by modeling partial order in hyperbolic space, so as to effectively learn the distribution of instance representations. Specifically, we develop the hyperbolic supervised contrastive learning based on Lorentzian cosine similarity to align representations of relations and instances, and model the partial order by constraining instances to reside within the Lorentzian entailment cone of their respective relation. Experiments on three benchmark datasets show that PO-HRL outperforms the strong baselines, especially in 1-shot settings lacking relation descriptions. © 2025 Association for Computational Linguistics.

关键词： Contrastive Learning

来源：评论

学校读者我要写书评

暂无评论

PEbfs: Implement High-Performance Breadth-First Search on PEZY-SC3s 24th

PEbfs: Implement High-Performance Breadth-First Search on P...

引用

24th International Conference on Algorithms and Architectures for parallel Processing, ICA3PP 2024

作者： Guo, Weihao Wang, Qinglin Liu, Xiaodong Peng, Muchun Yang, Shun Liang, Yaling Shi, Yongzhen Cao, Ligang Liu, Jie Laboratory of Digitizing Software for Frontier Equipment National University of Defense Technology Changsha410073 China National Key Laboratory of Parallel and Distributed Computing National University of Defense Technology Changsha410073 China Engineering Research Center for National Fundamental Software National University of Defense Technology Changsha410073 China

ISBN: (纸本)9789819615506

The breadth-first search (BFS) algorithm is a fundamental algorithm in graph theory, and it’s parallelization can significantly improve performance. Therefore, there have been numerous efforts to leverage the powerful parallel computing capabilities of hardware like GPGPU to implement high-performance BFS algorithms. However, the energy efficiency is relatively low due to the high power consumption of the platforms on which the algorithm is adapted to. To deal with these challenges, this paper introduces PEbfs that is a high-performance BFS algorithm based on the PEZY-SC3s efficient processor. We integrated three search algorithms, two algorithm optimization strategies, and a directional optimization scheme into PEbfs. Through multiple evaluations of the performance of PEbfs on the public SNAP dataset, the results demonstrate that the average energy efficiency ratio of PEbfs is higher than that of Enterprise and Tigr, the two most advanced implementations on Nvidia’s GPGPU: It achieves 3.08× the average energy efficiency ratio of Enterprise and 4.53× that of Tigr. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

关键词： Optimization algorithms

来源：评论

学校读者我要写书评

暂无评论

TensorMD: Molecular Dynamics Simulation with Ab Initio Accuracy of 50 Billion Atoms 25

TensorMD: Molecular Dynamics Simulation with Ab Initio Accur...

引用

Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of parallel Programming

作者： Yucheng Ouyang Ying Liu Honghui Shang Zhenchuan Chen Jiahao Shan Huimin Cui Xiaobing Feng Xin Chen Xingyu Gao Lifang Wang Haifeng Song Rongfen Lin Fang Li Institute of Computing Technology Chinese Academy of Sciences Beijing China National Research Center of Parallel Computer Engineering and Technology Beijing China Institute of Applied Physics and Computational Mathematics Beijing China

ISBN: (纸本)9798400714436

Molecular dynamics simulation emerges as an important area that HPC+AI helps to investigate the physical properties, with machine-learning interatomic potentials (MLIPs) being used. General-purpose machine-learning (ML) tools have been leveraged in MLIPs, but they are not perfectly matched with each other, since many optimization opportunities in MLIPs have been missed by ML tools. This inefficiency arises from the fact that HPC+AI applications work with far more computational complexity compared with pure AI scenarios. This paper has developed an MLIP, named TensorMD, independently from any ML tool. TensorMD has been evaluated on two supercomputers and scaled to 51.8 billion atoms, i.e., ~ 3× compared with state-of-the-art.

关键词： GPU

来源：评论

学校读者我要写书评

暂无评论

An End-to-End Classification Network Model Based on Hybrid Supervision for Industrial Surface Defect Detection

引用

International Journal of Pattern Recognition and Artificial Intelligence 2025年第5期39卷

作者： Qin, Runbing Chen, Ningjiang Gan, Shukun Guangxi University School of Computer Electronics and Information Nanning530004 China Education Department of Guangxi Zhuang Autonomous Region Key Laboratory of Parallel Distributed and Intelligent Computing Guangxi University Nanning530004 China Guangxi Intelligent Digital Services Research Center of Engineering Technology Nanning530004 China

Industrial part surface defect detection aims to precisely locate defects in images, which is crucial for quality control in manufacturing. The traditional method needs to be designed in advance, but it has shortcomings in terms of generalization ability. Deep neural network-based defect detection faces issues such as low resolution, data scarcity, labeling costs, and high computation. Therefore, an improved solution is necessary to enhance industrial product quality control efficiency and accuracy. A new method is proposed that utilizes end-to-end training of a two-stage neural network based on segmentation with an extended training process. The gradient flow from the classification to the segmentation network is adjusted to prevent unstable features from interfering with learning. To address the problem of image oversampling and undersampling during training, a frequency sampling scheme for negative samples is introduced. Additionally, positive pixels in the region-based segmentation mask are weighted using the distance transform algorithm so that regions with a high probability of defects can be detected without detailed annotation. Experiments are conducted across three distinct defect datasets by applying hybrid supervision encompassing diverse conditions. In the optimal case, the AP rate of the proposed model on the DAGM and KolektorSDD datasets reaches 100%, and the AP rate on the Severstal Steel dataset reaches 98.99%. The experiments’ results show that the detection accuracy has improved greatly, showing how well the proposed method works in the industry. © 2025 World Scientific Publishing Company.

关键词： Deep neural networks

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：