Search Results - Inner Mongolia University Library

IEEE International High-Level Design Validation and Test Workshop

Authors: P. Mishra, N. Dutt (Architectures and Compilers for Embedded Systems, Center for Embedded Computer Systems, University of California, Irvine, CA, USA)

Formal techniques offer an opportunity to significantly reduce the cost of microprocessor verification. We propose a model checking based approach to automatically generate functional test programs for pipelined processors. We specify the processor architecture in an Architecture Description Language (ADL). The processor model is extracted from the ADL specification. Specific properties are applied to the processor model using the SMV model checker to generate test programs. We applied this methodology to a single-issue DLX processor to demonstrate the usefulness of our approach.
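
A common way a model checker yields test programs is to assert the negation of a coverage goal ("this pipeline event never happens") and keep the counterexample it returns, which is an instruction sequence that makes the event happen. The minimal sketch below illustrates that idea under loudly stated assumptions: a hypothetical 3-stage pipeline, a made-up two-register ISA (not DLX), and a plain breadth-first state search standing in for SMV. `Instr`, `step`, `forwarding_fires` and `generate_test` are illustrative names, not part of the authors' ADL flow.

```python
from collections import deque
from typing import NamedTuple


class Instr(NamedTuple):
    op: str   # "INC" or "NOP"
    dst: int  # destination register
    src: int  # source register; INC computes r[dst] = r[src] + 1


# Hypothetical toy ISA (not DLX): NOP plus INC over two registers.
NOP = Instr("NOP", 0, 0)
ISA = [NOP] + [Instr("INC", d, s) for d in range(2) for s in range(2)]


def step(pipeline, fetched):
    """Advance a 3-stage (IF, ID, EX) pipeline by one cycle: every instruction moves one stage."""
    return (fetched,) + pipeline[:-1]


def forwarding_fires(pipeline):
    """True when the instruction in ID reads the register written by the instruction in EX."""
    _, in_id, in_ex = pipeline
    return in_id.op == "INC" and in_ex.op == "INC" and in_id.src == in_ex.dst


def generate_test(target, max_depth=5):
    """Search for a counterexample to the property 'target never holds'.

    Breadth-first search over pipeline states; the instruction sequence that
    reaches the first violating state is returned as the test program.
    """
    start = (NOP, NOP, NOP)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, program = queue.popleft()
        if target(state):
            return program
        if len(program) == max_depth:
            continue
        for instr in ISA:
            nxt = step(state, instr)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, program + [instr]))
    return None  # the property holds up to max_depth, so no test is produced


test_program = generate_test(forwarding_fires)
print(test_program)
```

Because the search is breadth-first, the returned program is a shortest one: two dependent `INC` instructions followed by one more instruction that pushes them into ID and EX, which exercises the forwarding path.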

Keywords: Automatic testing; Microprocessors; Computer architecture; Architecture description languages; Instruction sets; System testing; Program processors; Embedded system; Embedded computing; Cost function

On Reducing the Execution Latency of Superconducting Quantum Processors via Quantum Program Scheduling

arXiv, 2024

Authors: Wu, Wenjie; Wang, Yiquan; Yan, Ge; Zhao, Yuming; Yan, Junchi (Shanghai Jiao Tong University, Shanghai, China)

Quantum computing has gained considerable attention, especially since the arrival of the Noisy Intermediate-Scale Quantum (NISQ) era. Quantum processors and cloud services have become increasingly available worldwide. Unfortunately, programs on existing quantum processors are often executed in series, which can place a heavy workload on the processor. Typically, one has to wait for hours or even longer to obtain the result of a single quantum program on a public quantum cloud because of long queue times. Moreover, as processors scale up, the qubit utilization rate of the serial execution mode will diminish further, wasting quantum resources. In this paper, to the best of our knowledge for the first time, the Quantum Program Scheduling Problem (QPSP) is formulated and introduced to improve the utilization efficiency of quantum resources. Specifically, a quantum program scheduling method that considers the circuit width, number of measurement shots, and submission time of quantum programs is proposed to reduce the execution latency. We conduct extensive experiments on a simulated Qiskit noise model, as well as on the Xiaohong (from QuantumCTek) superconducting quantum processor. Numerical results show the effectiveness of the method in terms of both QPU time and turnaround time. Copyright © 2024, The Authors. All rights reserved.
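
The abstract names the three inputs the scheduler considers (circuit width, shots, submission time) but not the algorithm itself. The sketch below is a hypothetical greedy list scheduler that uses those three inputs, not the authors' method. It assumes runtime is proportional to shots, that co-running programs never interfere (it ignores crosstalk, noise and qubit mapping), and that any waiting program which fits in the free qubits may start, even ahead of an earlier one that does not fit (backfilling). `QuantumJob`, `schedule` and `turnaround` are illustrative names.

```python
import heapq
from dataclasses import dataclass


@dataclass
class QuantumJob:
    name: str
    width: int     # number of qubits the circuit needs
    shots: int     # measurement shots; runtime assumed proportional to shots
    submit: float  # submission time


def schedule(jobs, n_qubits, time_per_shot=1.0):
    """Greedy co-scheduling of jobs on one QPU; returns each job's finish time."""
    if any(j.width > n_qubits for j in jobs):
        raise ValueError("a job is wider than the processor")
    waiting = sorted(jobs, key=lambda j: j.submit)
    running = []  # min-heap of (finish_time, width)
    free, now = n_qubits, 0.0
    finish = {}
    while waiting or running:
        # Start, in submission order, every arrived job whose width fits the free qubits.
        for job in list(waiting):
            if job.submit <= now and job.width <= free:
                waiting.remove(job)
                free -= job.width
                finish[job.name] = now + job.shots * time_per_shot
                heapq.heappush(running, (finish[job.name], job.width))
        # Jump to the next event: a running job finishing or a waiting job arriving.
        events = [running[0][0]] if running else []
        events += [j.submit for j in waiting if j.submit > now]
        now = min(events)
        # Release the qubits of every job that has finished by `now`.
        while running and running[0][0] <= now:
            free += heapq.heappop(running)[1]
    return finish


def turnaround(jobs, finish):
    """Mean time from submission to completion."""
    return sum(finish[j.name] - j.submit for j in jobs) / len(jobs)


jobs = [QuantumJob("A", 3, 10, 0.0), QuantumJob("B", 2, 10, 0.0), QuantumJob("C", 4, 5, 1.0)]
finish = schedule(jobs, n_qubits=5)
print(finish)  # {'A': 10.0, 'B': 10.0, 'C': 15.0}
print(turnaround(jobs, finish))
```

Run one at a time in submission order, the same three jobs would instead finish at 10, 20 and 25, which illustrates why packing programs side by side can shorten both QPU time and turnaround time.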

Keywords: Program processors

High-level test program generation strategies for processors

East-West Design & Test Symposium (EWDTS)

Authors: Shima Hoseinzadeh, Mohammad Hashem Haghbayan (Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran; School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran)

ISBN (print): 9781479920976

This paper brings together reliability and testability and introduces certain rules for generating high-level test macros for processors. These rules help generate higher-quality test macros. On the other hand, they can also serve as a reference guide for programmers who want to write more reliable code. The basic idea behind these rules comes from the motto that more testable code results in lower reliability, and vice versa. The empirical results show the effect of these rules in generating high-quality high-level test macros, and show that their use results in less reliable overall code. Programmers can use these guidelines either to generate testable (though less efficient) code or to write more reliable programs.

Keywords: Circuit faults; Program processors; Reliability engineering; Testing; Algorithm design and analysis; Programming

Power-Area Optimized Multiplier Design using In-Memory Computation

IEEE International Conference on Integrated Circuits and Communication Systems (ICICACS)

Authors: Veerubhotla Sri Pranav, M. Vinodhini (Department of Electronics and Communication Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India)

ISBN (electronic): 9798331508456

ISBN (print): 9798331508463

This paper presents a novel multiplier design leveraging In-Memory Computation (IMC) with a Content Addressable Memory (CAM) module and associated processors to achieve high energy efficiency and optimal area utilization. Unlike conventional multipliers, which rely on separate memory and processing units and therefore suffer from latency and power inefficiencies, the proposed architecture integrates computation within the memory. By utilizing the parallel search-and-match capability of the CAM, the design minimizes data movement, reducing power consumption and hardware footprint. Implemented in Verilog HDL and validated using Xilinx VIVADO, the proposed multiplier demonstrates a power reduction of 83.06% compared to approximate multipliers and 90.09% relative to accurate multipliers. Additionally, the design achieves a 54.05% reduction in LUT utilization and a 49.23% decrease in flip-flop usage. Although a marginal increase in I/O port and buffer-gate utilization is observed due to the CAM module's delay compensation, the results highlight the potential of in-memory computation as a transformative approach to energy-efficient, compact hardware in modern computing systems.
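
To make "parallel search and match" concrete, the sketch below models a CAM that stores precomputed products keyed by operand pairs: a search compares the key against every row at once and the matching row returns its product. The abstract does not describe the table layout, so the 4-bit product table and the split of an 8-bit multiply into four nibble lookups combined by shifts and adds are assumptions for illustration. `CAM`, `NIBBLE_CAM` and `mul8` are hypothetical names.

```python
class CAM:
    """Toy content-addressable memory: each row stores a (key, data) pair."""

    def __init__(self, rows):
        self.rows = list(rows)

    def search(self, key):
        # In hardware every row compares its key in parallel and raises a match line;
        # here the match lines are modelled as a list of booleans.
        match_lines = [row_key == key for row_key, _ in self.rows]
        for hit, (_, data) in zip(match_lines, self.rows):
            if hit:
                return data
        raise KeyError(key)


# Hypothetical layout: one row per pair of 4-bit operands, data = precomputed product.
NIBBLE_CAM = CAM(((a, b), a * b) for a in range(16) for b in range(16))


def mul8(x, y):
    """8-bit x 8-bit multiply built from four 4-bit CAM lookups plus shifts and adds."""
    if not (0 <= x < 256 and 0 <= y < 256):
        raise ValueError("operands must be 8-bit unsigned integers")
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    # x*y = (xh*yh << 8) + ((xh*yl + xl*yh) << 4) + xl*yl
    return ((NIBBLE_CAM.search((xh, yh)) << 8)
            + ((NIBBLE_CAM.search((xh, yl)) + NIBBLE_CAM.search((xl, yh))) << 4)
            + NIBBLE_CAM.search((xl, yl)))


print(mul8(200, 123))  # 24600
```

Splitting the operands keeps the table at 256 rows instead of the 65,536 rows a direct 8-bit table would need, at the cost of a few adders, which is one plausible way to trade CAM area against logic.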

Keywords: Associative memory; Accuracy; Program processors; Power demand; Random access memory; Logic gates; Hardware; Energy efficiency; Table lookup; Hardware design languages

ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory

arXiv, 2025

Authors: Wang, Liangyu; Ren, Jie; Xu, Hang; Wang, Junxiao; Xie, Huanyi; Keyes, David E.; Wang, Di (King Abdullah University of Science and Technology, Saudi Arabia; Guangzhou University, China)

Fine-tuning large pre-trained LLMs generally demands extensive GPU memory. Traditional first-order optimizers such as SGD encounter substantial difficulties as model size grows, because storing activations and gradients during the forward and backward phases increases memory requirements. Alternatively, zeroth-order (ZO) techniques can estimate gradients using only forward operations, eliminating the need to store activations. Furthermore, by leveraging CPU capabilities, it is feasible to extend both the memory and the processing power available to a single GPU. We propose a novel framework, ZO2 (Zeroth-Order Offloading), for efficient zeroth-order fine-tuning of LLMs with only limited GPU memory. Our framework dynamically shifts model parameters between the CPU and GPU as required, optimizing the computation flow and maximizing GPU usage by minimizing idle time. This integration of parameter adjustments with ZO's double forward operations reduces unnecessary data movement, enhancing fine-tuning efficiency. Additionally, our framework supports an innovative low-bit precision approach in AMP mode to streamline data exchanges between the CPU and GPU. This approach allows us to fine-tune extraordinarily large models, such as OPT-175B with more than 175 billion parameters, on a GPU with only 18 GB of memory, which is beyond the reach of traditional methods. Moreover, our framework achieves these results with almost no additional time overhead and no accuracy loss compared to standard zeroth-order methods. ZO2's code has been open-sourced at https://***/liangyuwang/zo2. Copyright © 2025, The Authors. All rights reserved.
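
The "double forward operations" refer to zeroth-order gradient estimation: perturb the parameters along a random direction, evaluate the loss at both signs of the perturbation, and use the finite difference as a directional derivative. The minimal sketch below shows the standard two-point (SPSA-style) estimator on a toy quadratic loss; which exact estimator ZO2 uses, and all of its CPU/GPU offloading and AMP machinery, are outside this sketch. `zo_step` and `quadratic` are hypothetical names.

```python
import random


def zo_step(params, loss_fn, lr=0.01, eps=1e-3, seed=0):
    """One zeroth-order (SPSA-style) update: two forward passes, no backward pass."""
    # The direction z is reproducible from `seed`, so a memory-frugal implementation
    # could regenerate it instead of storing it; this sketch keeps it in a list for clarity.
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in params]
    loss_plus = loss_fn([p + eps * zi for p, zi in zip(params, z)])   # forward pass 1
    loss_minus = loss_fn([p - eps * zi for p, zi in zip(params, z)])  # forward pass 2
    # Central finite difference estimates the directional derivative along z.
    projected_grad = (loss_plus - loss_minus) / (2 * eps)
    return [p - lr * projected_grad * zi for p, zi in zip(params, z)]


def quadratic(params):
    """Toy loss with its minimum at params = [3, 3, 3]."""
    return sum((p - 3.0) ** 2 for p in params)


params = [0.0, 0.0, 0.0]
for t in range(500):
    params = zo_step(params, quadratic, seed=t)
print(quadratic(params))  # close to 0
```

Only activations of the current forward pass are ever alive, which is the memory property the abstract relies on; the price is a noisier gradient estimate that typically needs many more steps than first-order training.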

Keywords: Program processors

Energy Consumption and Power Modeling for Various Intel Multicore Processors

Euromicro Conference on Parallel, Distributed and Network-Based Processing

Authors: Thomas Rauber, Gudula Rünger (Computer Science Department, University of Bayreuth, Bayreuth, Germany; Computer Science Department, Chemnitz University of Technology, Chemnitz, Germany)

ISBN (electronic): 9798331524937

ISBN (print): 9798331524944

Multicore processors spend varying amounts of time and energy when executing a user application. The specific amounts of time and energy consumed depend on application parameters as well as on the execution mode, which includes the number of threads and the operating frequency used to execute the application. However, the characteristics of the multicore processor itself can also have a significant impact. In this article, three Intel multicore processors of different generations (Broadwell, Cascade Lake, and Sapphire Rapids) are investigated with respect to their time and energy expenditure. The user application computes a numerical solution of time-dependent partial differential equations. The experiments include an analysis of each processor's unique power characteristics, with an emphasis on investigating and modeling the power behavior of these multicore processors. Processor-specific power models that distinguish static and dynamic power are presented, and their validity is demonstrated with experimental data. The best-suited power models differ strongly between processors, reflecting the progress in internal processor design with respect to energy management.
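
The abstract does not give the model equations. As an illustration only, the sketch below assumes the simplest static-plus-dynamic form, $P = P_{\text{static}} + c \cdot n \cdot f^k$ for $n$ threads at frequency $f$, with the exponent $k = 3$ taken from the textbook rule $P_{\text{dyn}} \propto C V^2 f$ when voltage scales with frequency. It fits $P_{\text{static}}$ and $c$ by ordinary least squares. `fit_power_model` is a hypothetical name, and the check uses synthetic points, not measurements from the paper.

```python
def fit_power_model(samples, k=3.0):
    """Least-squares fit of P = P_static + c * n * f**k.

    samples: iterable of (threads n, frequency f in GHz, measured power P in watts).
    Returns (P_static, c). The exponent k is an assumption and is not fitted.
    """
    samples = list(samples)
    xs = [n * f ** k for n, f, _ in samples]  # dynamic-power regressor
    ys = [p for _, _, p in samples]
    m = len(samples)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need samples with different n * f**k")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c = sxy / sxx
    return mean_y - c * mean_x, c


# Synthetic check (not measured data): generate points from known parameters and recover them.
synthetic = [(n, f, 40.0 + 2.5 * n * f ** 3) for n in (1, 4, 8, 16) for f in (1.2, 1.8, 2.4)]
p_static, c = fit_power_model(synthetic)
print(round(p_static, 6), round(c, 6))  # 40.0 2.5
```

The intercept plays the role of static power and the slope that of dynamic power per thread; since the abstract reports that the best-suited models differ strongly between processors, a real study would compare several such forms rather than fix one.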

Keywords: Analytical models; Energy consumption; Time-frequency analysis; Program processors; Multicore processing; Partial differential equations; Lakes; Mathematical models; Numerical models; Energy management

Multiphase Lateral Flux Indirect Coupled Inductor for Vertical Power Delivery Voltage Regulator Module

Annual IEEE Applied Power Electronics Conference and Exposition (APEC)

Authors: Adhistira M. Naradhipa, Qiang Li (Center for Power Electronics Systems (CPES), Virginia Polytechnic Institute and State University, Blacksburg, VA, USA)

ISBN (electronic): 9798331516116

ISBN (print): 9798331516123

The rising use of artificial intelligence (AI) in many applications requires high-performance processors, which demand power at unprecedented levels. Recently, vertical power delivery (VPD) has been pursued for 48 V two-stage conversion systems: the second stage is placed directly underneath the processor to remove the "last inch" power loss found in the lateral power delivery (LPD) architecture. However, VPD imposes strict size requirements on the voltage regulator module (VRM), where the bottleneck is usually the magnetic component. In addition, high-performance processors require a fast transient response. This article proposes a new air-gapless, powder-core-based, multiphase integrated lateral-flux negative coupled inductor structure with a footprint of only 100 mm² and a height of 2.9 mm, enabling a high-density, fast-transient VRM solution. The negative coupling is achieved electrically through the coupled winding, which enables symmetrical N-phase coupling and straight-phase windings, resulting in an extremely small DCR. The proposed inductor is experimentally tested at up to 300 A (75 A/phase) to prove its high current-handling capability, achieving a high current density of 3 A/mm².
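
The reported figures are mutually consistent. Assuming the current density is computed over the 100 mm² footprint, a quick check gives the implied phase count and the stated density:

$$
N_{\text{phase}} = \frac{300\ \text{A}}{75\ \text{A/phase}} = 4, \qquad
J = \frac{300\ \text{A}}{100\ \text{mm}^2} = 3\ \text{A/mm}^2
$$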

Keywords: Couplings; Transient response; Regulators; Program processors; Voltage measurement; Windings; Inductors; Voltage control; Current density; Artificial intelligence

AutoHete: An Automatic and Efficient Heterogeneous Training System for LLMs

arXiv, 2025

Authors: Zeng, Zihao; Liu, Chubo; He, Xin; Hu, Juan; Jiang, Yong; Huang, Fei; Li, Kenli; Lim, Wei Yang Bryan (Nanyang Technological University, Singapore; Hunan University, China; Agency for Science, Technology and Research, Singapore; National University of Singapore, Singapore; Alibaba Group, China)

Transformer-based large language models (LLMs) have demonstrated exceptional capabilities in sequence modeling and text generation, with improvements scaling proportionally with model size. However, limited GPU memory has restricted access to LLM training for many researchers. Existing heterogeneous training methods significantly expand the scale of trainable models but introduce substantial communication overhead and CPU workload. In this work, we propose AutoHete, an automatic and efficient heterogeneous training system compatible with both single-GPU and multi-GPU environments. AutoHete dynamically adjusts activation checkpointing, parameter offloading, and optimizer offloading based on the specific hardware configuration and LLM training needs. Additionally, we design a priority-based scheduling mechanism that maximizes the overlap between operations across training iterations, enhancing throughput. Compared to state-of-the-art heterogeneous training systems, AutoHete delivers a 1.32x to 1.91x throughput improvement across various model sizes and training configurations. Copyright © 2025, The Authors. All rights reserved.

Keywords: Program processors

8.5 A Command-Aware Hybrid LDO for Advanced HBM Interfaces with 150μA Quiescent Current and 20pF On-Chip Capacitor Achieving Sub-10mV Voltage Droop in 400ps Settling Time

IEEE International Solid-State Circuits Conference (ISSCC)

Authors: Jaeho Kim, Myeongho Han, Jooeun Bang, Younghyun Lim, Jaehyouk Choi (Seoul National University, Seoul, Korea; KAIST, Daejeon, Korea; Kyung Hee University, Yongin, Korea)

ISBN (electronic): 9798331541019

ISBN (print): 9798331541026

With the advent of the generative AI era, high-bandwidth memory (HBM) has emerged as an irreplaceable solution that can provide ultra-high memory bandwidth (BW) of more than 1 TB/s to AI processors. To enable such a high BW, HBM3E accommodates 16 channels (CHs) with two pseudo channels (pCHs) each, and HBM4 increases this to 32 CHs to double the BW. Each pCH receives a dedicated differential write data strobe (WDQS) from the host. The half-frequency quadrature clocks ($\mathrm{WDQS/2_{IN}}$s) are then passed to the WDQS buffer, which generates the output clocks $S_{\mathrm{OUT}X}$ ($X$ = I, Q, IB, QB) that sample the DQ data in parallel (top left of Fig. 8.5.1). Error-free sampling of all DQs requires low jitter in the $S_{\mathrm{OUT}X}$s, but this is difficult to achieve because of the power-supply-induced jitter (PSIJ) of the WDQS buffer. A sudden toggle of $\mathrm{WDQS/2_{IN}}$ in response to a command signal ($\mathrm{ACT_{CMD}}$) from the host causes a surge in the instantaneous load current ($I_{\mathrm{L}}$) of the WDQS buffer (e.g., edge time $T_{\mathrm{EDGE}} < 100\,\mathrm{ps}$) [1] (top right of Fig. 8.5.1). This results in a significant droop in the supply voltage ($V_{\mathrm{DD}}$) and a substantial increase in PSIJ [2]. The problem is particularly severe when $V_{\mathrm{DD}}$ is unstable and noisy due to complex power grids and limited capacitor availability.
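
Two back-of-envelope figures put this in context. The first counts the WDQS buffers implied by one dedicated strobe per pCH, assuming HBM4 keeps two pCHs per channel. The second is a rough illustration only: assuming the 20 pF on-chip capacitor from the title must, on its own, hold the droop below 10 mV during a 100 ps current edge before the regulator loop reacts, $\Delta V = \Delta I\,\Delta t / C$ bounds the extra load current it can absorb, which suggests why a fast, command-aware regulator is needed.

$$
16 \times 2 = 32\ \text{WDQS buffers (HBM3E)}, \qquad 32 \times 2 = 64\ \text{WDQS buffers (HBM4)}
$$

$$
\Delta I_{\max} \approx \frac{C\,\Delta V}{\Delta t} = \frac{20\ \text{pF} \times 10\ \text{mV}}{100\ \text{ps}} = 2\ \text{mA}
$$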

Keywords: Program processors; Capacitors; Voltage; Jitter; Power grids; System-on-chip; Solid state circuits; Noise measurement; Surges; Clocks

Automated Crop Harvesting with Robotics Using AI

International Conference on Device Intelligence, Computing and Communication Technologies (DICCT)

Authors: Kumar P, Sangathtamil S, Sadiq Peer Mohamed K (Department of CSE, Rajalakshmi Engineering College, Chennai, India; Department of CSE, REC, Chennai, India)

ISBN (electronic): 9798331543358

ISBN (print): 9798331543365

Research on "Automated Crop Harvesting using Robotics and AI" may truly change the future of modern farming. By integrating new robotic systems with smarter artificial intelligence, the project aims to build self-operating harvesters that can move around fields, find ripe crops using advanced machine vision, and perform selective harvesting accurately. Using real-time data from environmental sensors together with AI-driven predictive analytics, the system aims to improve harvesting operations, increase crop yield, and reduce waste. The work addresses pivotal issues such as labour shortages and high operational costs while promoting sustainable farming practices. Ultimately, the project works toward better farming, ensuring that food is produced in a more reliable and efficient way. It also considers how these technologies can work with farm management systems to provide full monitoring and control. This all-inclusive approach not only improves immediate harvesting tasks but also supports long-term farm planning and resource management. By building on new developments in robotics and AI, the project aims to create better farming technology and open the way toward smart farming in the future.

Keywords: Temperature sensors; Program processors; Accuracy; Crops; Robot sensing systems; Manipulators; Sensor systems; Real-time systems; Artificial intelligence; Farming
