Search Results: Inner Mongolia University Library

Attar: RRAM-based in-memory attention accelerator with software-hardware co-optimization

Science China Information Sciences, 2025, Vol. 68, No. 3, pp. 371-387

Authors: Bing Li, Ying Qi, Ying Wang, Yinhe Han (Information Engineering College, Capital Normal University; Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences)

The attention mechanism has become a pivotal component in artificial intelligence, significantly enhancing the performance of deep learning applications. However, its quadratic computational complexity and intricate computations lead to substantial inefficiencies when processing long sequences. To address these challenges, we introduce Attar, a resistive random access memory (RRAM)-based in-memory accelerator designed to optimize attention mechanisms through software-hardware co-optimization. Attar leverages efficient Top-k pruning and quantization strategies to exploit the sparsity and redundancy of attention matrices, and incorporates an RRAM-based in-memory softmax engine by harnessing the versatility of the RRAM crossbar. Comprehensive evaluations demonstrate that Attar achieves a performance improvement of up to 4.88× and energy saving of 55.38% over previous computing-in-memory (CIM)-based accelerators across various models and datasets while maintaining comparable accuracy. This work underscores the potential of in-memory computing to enhance the efficiency of attention-based models without compromising their effectiveness.

Keywords: RRAM; computing-in-memory; attention; pruning; quantization
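
The abstract does not spell out Attar's pruning rule. As a minimal sketch of generic Top-k attention pruning (assuming NumPy; the helper `topk_attention` is hypothetical and this is not Attar's published algorithm), each query keeps only its k largest scaled dot-product scores and masks the rest before the softmax, so only k value rows contribute to the output:

```python
import numpy as np

def topk_attention(q, K, V, k):
    """Single-query attention that keeps only the k largest scores.

    Generic Top-k pruning sketch (hypothetical helper, not Attar's algorithm).
    q: (d,) query, K: (n, d) keys, V: (n, d_v) values, 1 <= k <= n.
    Returns the attention output (d_v,) and the pruned weights (n,).
    """
    d = q.shape[0]
    scores = K @ q / np.sqrt(d)                 # scaled dot-product scores, shape (n,)
    keep = np.argpartition(scores, -k)[-k:]     # indices of the k largest scores
    pruned = np.full_like(scores, -np.inf)      # every other position is masked out
    pruned[keep] = scores[keep]
    weights = np.exp(pruned - pruned.max())     # stable softmax; exp(-inf) == 0
    weights /= weights.sum()
    return weights @ V, weights


# Usage with random data: 16 keys, keep the 4 strongest.
rng = np.random.default_rng(0)
q = rng.standard_normal(8)
K = rng.standard_normal((16, 8))
V = rng.standard_normal((16, 4))
out, w = topk_attention(q, K, V, k=4)
print(np.count_nonzero(w), out.shape)
```

With k equal to the sequence length this reduces to ordinary dense softmax attention; smaller k trades some accuracy for fewer value reads, which is the kind of attention sparsity the abstract says Attar exploits.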

Communication delay-aware cooperative adaptive cruise control with dynamic network topologies: A convergence of communication and control

Digital Communications and Networks, 2025, Vol. 11, No. 1, pp. 191-199

Authors: Jihong Liu, Yiqing Zhou, Ling Liu (State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Beijing Key Laboratory of Mobile Computing and Pervasive Device, Beijing 100190, China; Zhongke Nanjing Mobile Communication & Computing Innovation Institute, Nanjing 211100, China)

Wireless communication-enabled Cooperative Adaptive Cruise Control (CACC) is expected to improve the safety and traffic capacity of vehicle *** CACC considers a conventional communication delay with fixed Vehicular Communication Network (VCN) ***, when the network is under attack, the communication delay may be much higher, and the stability of the system may not be *** paper proposes a novel communication Delay-Aware CACC with Dynamic Network Topologies (DADNT). The main idea is that, for various communication delays, in order to maximize the traffic capacity while guaranteeing stability and minimizing the following error, the CACC should dynamically adjust the VCN topology to achieve the minimum inter-vehicle *** this end, a multi-objective optimization problem is formulated, and a 3-step Divide-And-Conquer sub-optimal solution (3DAC) is *** results show that with 3DAC, the proposed DADNT with CACC can reduce the inter-vehicle spacing by 5%, 10%, and 14%, respectively, compared with the traditional CACC with fixed one-vehicle, two-vehicle, and three-vehicle look-ahead network topologies, thereby improving traffic efficiency.

Keywords: Communication delay; Cooperative adaptive cruise control; Network topology; String stability

Two-Stage Planning for Smart Buildings With Flexible Heating Load Considering Climate Change Induced Heat Waves

IEEE Transactions on Smart Grid, 2025, Vol. 16, No. 3, pp. 2012-2025

Authors: Tianyang Zhao, Qianwen Xu (KTH Royal Institute of Technology, School of Electrical Engineering and Computing Sciences, Stockholm 114 58, Sweden)

Building energy planning is significantly challenged by climate change, particularly the increasing frequency of heat waves impacting heating and cooling demands. Current planning methodologies neglect the impacts of heat waves on energy consumption and do not accurately model the temperature-dependent performance of heat pumps (HPs). This paper addresses the critical issue of designing energy-efficient and climate-resilient buildings through optimal resource configuration under uncertain weather conditions. A two-stage stochastic optimization model for building energy system planning is proposed. In the first stage, the capacities of energy resources are optimized; in the second stage, operational strategies under various weather scenarios are determined. A novel long-term load forecasting method using morphing techniques is developed to generate scenario trees accounting for both normal conditions and heat waves, capturing the impact of climate change on energy demand. Additionally, a temperature-dependent HP model with finite partial output levels is introduced, improving upon existing fixed coefficient of performance models to reflect practical operational characteristics. Simulation results on a real educational building in Stockholm demonstrate the effectiveness of the approach, showing an 8.33% reduction in heating capacity requirements and a 62.14% decrease in solution time, enhancing both resilience and computational efficiency. © 2010-2012 IEEE.

Keywords: Stochastic systems
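
To make the two-stage structure concrete, here is a toy two-stage stochastic program solved by plain enumeration. Everything in it is hypothetical: the function `plan_capacity`, the single capacity decision, the scenario data, and the linear backup-cost recourse. The paper's model additionally includes temperature-dependent heat-pump behaviour, scenario trees, and detailed operational decisions that this sketch omits. The first stage fixes a capacity before the weather is known; the second stage pays, per scenario, for demand the capacity cannot cover.

```python
def plan_capacity(options, scenarios, invest_cost, backup_price):
    """Toy two-stage stochastic program solved by enumeration (hypothetical).

    First stage: choose one capacity from `options` (here-and-now decision).
    Second stage: in each (probability, peak_demand) scenario, demand above the
    installed capacity is covered by a backup source at `backup_price` per unit.
    Returns the capacity with the lowest investment + expected recourse cost.
    """
    def expected_recourse(cap):
        # Probability-weighted cost of the unmet demand across all scenarios.
        return sum(p * backup_price * max(0.0, demand - cap) for p, demand in scenarios)

    def total_cost(cap):
        return invest_cost * cap + expected_recourse(cap)

    best = min(options, key=total_cost)
    return best, total_cost(best)


# Hypothetical data: two weather scenarios with different peak demands.
best, cost = plan_capacity([50, 60, 70, 80], [(0.8, 80), (0.2, 60)], 1.0, 1.2)
print(best, round(cost, 2))
```

With these made-up numbers the backup is cheap enough that sizing below the most likely peak pays off: a capacity of 60 gives an expected total cost of about 79.2, whereas 80 would cost 80. Real planning models replace the enumeration with a mixed-integer solver, but the cost decomposition is the same.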

Grover's search finds new applications in continuous optimization and spectral analysis

Science China Physics, Mechanics & Astronomy, 2025, No. 6, pp. 216-217

Author: Xiaoming Sun (State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences; School of Computer Science and Technology, University of Chinese Academy of Sciences)

A novel quantum search algorithm tailored for continuous optimization and spectral problems was recently proposed by a research team from the University of Electronic Science and Technology of China to broaden the frontiers of quantum computation and enrich its application *** computing has traditionally excelled at tackling discrete search challenges, but many important applications, from large-scale optimization to advanced physics simulations, necessitate searching through continuous domains.

Pyramid: Accelerating LLM Inference With Cross-Level Processing-in-Memory

IEEE Computer Architecture Letters, 2025, Vol. 24, No. 1, pp. 121-124

Authors: Liang Yan, Xiaoyang Lu, Xiaoming Chen, Yinhe Han, Xian-He Sun (Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China; Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616, United States)

Integrating processing-in-memory (PIM) with GPUs accelerates large language model (LLM) inference, but existing GPU-PIM systems encounter several challenges. While GPUs excel in large general matrix-matrix multiplications (GEMM), they struggle with small-scale operations better suited for PIM, which currently cannot handle them independently. Additionally, the computational demands of activation operations exceed the capabilities of current PIM technologies, leading to excessive data movement between the GPU and memory. PIM's potential for general matrix-vector multiplications (GEMV) is also limited by insufficient support for fine-grained parallelism. To address these issues, we propose Pyramid, a novel GPU-PIM system that optimizes PIM for LLM inference by strategically allocating cross-level computational resources within PIM to meet diverse needs and leveraging the strengths of both technologies. Evaluation results demonstrate that Pyramid outperforms existing systems like NeuPIM, AiM, and AttAcc by factors of 2.31×, 1.91×, and 1.72×, respectively. © 2002-2011 IEEE.

Keywords: Graphics processing unit

Seeing Beyond the Blur with Generative AI

XRDS: Crossroads, 2025, Vol. 31, No. 2, pp. 28-31

Authors: Berthy Feng, Katherine L. Bouman (Computing and Mathematical Sciences, Electrical Engineering, and Astronomy Departments, California Institute of Technology, United States)

Can AI hallucinations be responsibly harnessed for scientific imaging?

LAD: Efficient Accelerator for Generative Inference of LLM with Locality Aware Decoding

31st IEEE International Symposium on High Performance Computer Architecture, HPCA 2025

Authors: Haoran Wang, Yuming Li, Haobo Xu, Ying Wang, Liqi Liu, Jun Yang, Yinhe Han (Institute of Computing Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China)

ISBN (print): 9798331506476

Large Language Models (LLMs) have emerged as the cornerstone of content generation applications due to their ability to capture relations between a newly generated token and the full preceding context. However, this ability stems from the decoding attention mechanism, which retains the entire generation history as a key-value cache (KV cache). As the generated sequence lengthens, the KV cache expands, causing a substantial memory access bottleneck. In advanced LLM generation systems running on GPUs, decoding attention accounts for more than 50% of the total inference time when the KV cache length reaches 4096. To address this issue, this paper introduces LAD (Locality Aware Decoding), an LLM generation accelerator with algorithm-hardware enhancements that significantly reduce KV cache access, resulting in considerable speedups and energy savings. A key insight underlying LAD is that when the attention score for a specific position remains fixed over the next several decoding steps, it is unnecessary to repeatedly retrieve the associated key and value at each step to reproduce the computation. Our analysis reveals that numerous positions exhibit notable numerical locality in attention scores across multiple decoding steps. Leveraging these insights, we have designed an attention decoding computation method that reduces how often the key and value are accessed for positions with good locality, while maintaining decoding accuracy. Extensive experiments show that LAD generates sequences with an average ROUGE-1 similarity of 97% to those generated by the original model. When the KV cache length exceeds 2048, the high configuration of the LAD accelerator achieves on average (geomean) a 10.7× speedup and 52.4× higher energy efficiency for the attention mechanism compared to the A100 GPU. For end-to-end model inference, it achieves on average a 2.3× speedup and 13.4× higher energy efficiency. © 2025 IEEE.

Keywords: Program processors
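
LAD's exact reuse criterion and hardware dataflow are not given in the abstract. The NumPy sketch below only illustrates the arithmetic that makes skipping key/value reads possible, under one assumption: if a set of positions is treated as having fixed scores, their softmax-weighted partial sum can be cached once and merged with freshly computed positions using the standard online-softmax rescaling. The helpers `block_state`, `merge_states`, `decode_step` and the choice of which positions to freeze are hypothetical.

```python
import numpy as np

def block_state(logits, V):
    """Softmax partial state for a block: (max logit, sum w*v, sum w), w = exp(l - max)."""
    m = logits.max()
    w = np.exp(logits - m)
    return m, w @ V, w.sum()

def merge_states(m1, num1, den1, m2, num2, den2):
    """Merge two partial states by rescaling both to a common maximum (online softmax)."""
    m = max(m1, m2)
    a, b = np.exp(m1 - m), np.exp(m2 - m)
    return m, a * num1 + b * num2, a * den1 + b * den2

def decode_step(q, K, V, frozen_idx, frozen_cache):
    """Attention output in which frozen positions reuse a cached partial state.

    frozen_idx: positions whose scores are assumed unchanged (hypothetical policy).
    frozen_cache: block_state(...) computed earlier for those positions, so their
    K and V rows are not read in this step. At least one position must stay active.
    """
    d = q.shape[0]
    active = np.setdiff1d(np.arange(K.shape[0]), frozen_idx)
    logits = K[active] @ q / np.sqrt(d)          # only active K rows are read
    _, num, den = merge_states(*frozen_cache, *block_state(logits, V[active]))
    return num / den


# Usage: freeze positions 0, 2 and 5, caching their state with the current query.
rng = np.random.default_rng(1)
d, n = 8, 10
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, 3))
frozen = np.array([0, 2, 5])
cache = block_state(K[frozen] @ q / np.sqrt(d), V[frozen])
out = decode_step(q, K, V, frozen, cache)
```

When the cached state was computed with the same query, the merged result equals dense attention up to rounding. In real decoding the query changes every step, so reusing a stale cache is an approximation whose quality depends on how stable the scores actually are, which is the numerical locality the paper measures.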

High-Parallel In-Memory NTT Engine with Hierarchical Structure and Even-Odd Data Mapping

30th Asia and South Pacific Design Automation Conference, ASP-DAC 2025

Authors: Bing Li, Huaijun Liu, Yibo Du, Ying Wang (Capital Normal University, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Institute of Microelectronics, Chinese Academy of Sciences, China)

ISBN (print): 9798400706356

The Number Theoretic Transform (NTT) significantly impacts the execution time of Fully Homomorphic Encryption (FHE) in practical applications, driving research into accelerated NTT methods. Computing-in-memory (CIM) offers a promising solution to NTT's memory bottlenecks, yet efficiently implementing a CIM-based NTT engine remains challenging due to NTT's unique operations and large data sizes. We propose HP-CIM, a high-parallelism digital SRAM-based CIM NTT engine designed for large-scale NTT. HP-CIM integrates matrix-vector multiplication (MVM)-based NTT with a hierarchical SRAM architecture and a novel even-odd data mapping, achieving nearly 3.08× faster execution and 4.96× energy savings compared to prior CIM-based designs. © 2025 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Keywords: Static random access storage
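
The abstract does not describe HP-CIM's hierarchical mapping in detail. As background only, the sketch below shows two textbook ideas in pure Python: the NTT written as a matrix-vector multiplication modulo a prime, X[j] = Σₖ ω^(jk) x[k] mod q, and a radix-2 split of the input into even- and odd-indexed halves that are transformed by two half-size MVMs and recombined. We assume, for illustration only, that this split is related to what the paper calls even-odd data mapping; the helper names and the toy parameters q = 17, n = 8, ω = 2 (2 has multiplicative order 8 modulo 17) are hypothetical.

```python
def ntt_mvm(x, omega, q):
    """NTT as a matrix-vector product: X[j] = sum_k omega^(j*k) * x[k] mod q."""
    n = len(x)
    W = [[pow(omega, j * k, q) for k in range(n)] for j in range(n)]  # twiddle matrix
    return [sum(W[j][k] * x[k] for k in range(n)) % q for j in range(n)]

def ntt_even_odd(x, omega, q):
    """One radix-2 split: two half-size MVM-NTTs on even and odd elements."""
    n = len(x)
    half = n // 2
    w2 = omega * omega % q                  # primitive (n/2)-th root of unity
    E = ntt_mvm(x[0::2], w2, q)             # transform of even-indexed inputs
    O = ntt_mvm(x[1::2], w2, q)             # transform of odd-indexed inputs
    X = [0] * n
    for j in range(half):
        t = pow(omega, j, q) * O[j] % q     # twiddle factor times odd part
        X[j] = (E[j] + t) % q
        X[j + half] = (E[j] - t) % q        # uses omega^(n/2) = -1 mod q
    return X


x = [1, 2, 3, 4, 5, 6, 7, 8]
print(ntt_mvm(x, 2, 17))
print(ntt_even_odd(x, 2, 17))
```

Both calls return the same vector (its first entry is 36 mod 17 = 2). The split replaces one n×n MVM with two (n/2)×(n/2) MVMs plus n/2 twiddle multiplications, which is why even/odd partitioning of the data matters when mapping large NTTs onto fixed-size in-memory arrays.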

MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models

31st International Conference on Computational Linguistics, COLING 2025

Authors: Zihao Wei, Jingcheng Deng, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng (Institute of Computing Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China)

ISBN (print): 9798891761964

The extensive utilization of large language models (LLMs) underscores the crucial necessity for precise and up-to-date knowledge embedded within their intrinsic parameters. Existing research on knowledge editing primarily concentrates on monolingual scenarios, neglecting the complexities presented by multilingual contexts and multi-hop reasoning. To address these challenges, our study introduces MLaKE (Multilingual Knowledge Editing), a novel benchmark comprising 4072 multi-hop and 5360 single-hop questions designed to evaluate the adaptability of knowledge editing methods across five languages: English, Chinese, Japanese, French, and German. MLaKE aggregates fact chains from Wikipedia across languages and utilizes LLMs to generate questions and answers. We assessed the effectiveness of current multilingual knowledge editing methods using the MLaKE dataset. Our results show that, due to considerable inconsistencies in both multilingual performance and encoding efficiency, these methods struggle to generalize effectively across languages. These methods are notably more accurate when editing English than when editing other languages. The experimental results further demonstrate that models encode knowledge and generation capabilities for different languages using distinct parameters, leading to poor cross-lingual transfer performance in current methods. Transfer performance is notably better within the same language family than across different families. These findings emphasize the urgent need to improve multilingual knowledge editing methods. © 2025 Association for Computational Linguistics.

Keywords: Computational linguistics

Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators

30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2025

Authors: Shixin Zhao, Yuming Li, Bing Li, Yintao He, Mengdi Wang, Yinhe Han, Ying Wang (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China; Institute of Microelectronics, Chinese Academy of Sciences, Beijing, China; State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China)

ISBN (print): 9798400710797

Computing-in-memory (CIM) architectures demonstrate superior performance over traditional architectures. To unleash the potential of CIM accelerators, many compilation methods have been proposed, focusing on application scheduling optimization specific to CIM. However, existing compilation methods often overlook CIM's capability to switch dynamically between compute and memory modes, which is crucial for accommodating the diverse memory and computational needs of real-world deep neural network architectures, especially emerging large language models. To fill this gap, we introduce CMSwitch, a novel compiler that optimizes resource allocation for CIM accelerators with adaptive mode-switching capabilities, thereby enhancing the performance of DNN applications. Specifically, our approach integrates the compute-memory mode switch into the CIM compilation optimization space by introducing a new hardware abstraction attribute. We then propose a novel compilation optimization pass that identifies the optimal network segments and the corresponding mode resource allocations using dynamic programming and mixed-integer programming. CMSwitch uses a tailored meta-operator to express the compilation result in a generalized manner. Evaluation results demonstrate that CMSwitch achieves an average speedup of 1.31× compared to existing state-of-the-art (SOTA) CIM compilation works, highlighting CMSwitch's effectiveness in fully exploiting the potential of CIM processors for a wide range of real-world DNN applications. © 2025 ACM.

Keywords: Resource allocation
