ISBN (print): 9783031497360
The proceedings contain 12 papers. The special focus of this conference is on Verification and Evaluation of Computer and Communication Systems. The topics include: Blockchain-Based Trust Management for IoMT Environment; Command & Control in UAVs Fleets: Coordinating Drones for Ground Missions in Changing Contexts; Verified High-Performance Computing: The SyDPaCC Approach; A QoE-Driven DRL Approach for Network Slicing Based on SFC Orchestration in SDN/NFV Enabled Networks; On Language-Based Opacity Verification Problem in Discrete Event Systems Under Orwellian Observation; An Enhanced Interface-Based Probabilistic Compositional Verification Approach; A Sound Abstraction Method Towards Efficient Neural Networks Verification; Towards Formal Verification of Node-RED-Based IoT Applications; Formal Verification of a Post-Quantum Signal Protocol with Tamarin; A Comparative Study of Online Cybersecurity Training Platforms.
ISBN (print): 9798350393613
Computing-in-Memory (CIM) is an emerging non-von Neumann computing architecture that enhances energy efficiency in AI tasks. Current-domain CIM is a common kind of design with higher potential for energy efficiency than digital-domain CIM. However, due to its fully analog design, current-domain CIM is susceptible to analog non-idealities that can introduce computational errors, thereby impacting the inference accuracy of neural networks. This paper provides a detailed analysis of the non-idealities and models current-domain CIM with these non-idealities taken into account. The model is then used to conduct design space exploration on a current-domain CIM design. To validate the model, we compare the energy efficiency predicted by the model with the measured energy efficiency of a 28nm test chip and find a high degree of agreement between them. The simulation results show that, when the tolerable maximum relative computation error is set to 0.01 and the goal is to maintain computation accuracy above 80%, parallelism higher than 7, 11, and 17 is required for 5-bit, 6-bit, and 7-bit analog-to-digital converters (ADCs), respectively, while a current-to-digital converter (CDC) requires lower parallelism.
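As a rough illustration of the kind of design-space sweep described above, the following Python sketch searches for the smallest MAC parallelism that keeps the mean relative computation error of an analog accumulation below a tolerance, for several ADC resolutions. The error model (fixed Gaussian current noise plus uniform ADC quantization over a full scale equal to the parallelism) and all parameter values are illustrative assumptions, not the paper's calibrated non-ideality model, so the resulting numbers will differ from those reported.

```python
import numpy as np

def mean_relative_error(parallelism, adc_bits, noise_sigma=0.05,
                        trials=2000, seed=0):
    """Monte Carlo estimate of the relative error of one analog MAC.

    Assumed toy model: binary activations and weights are summed in the
    analog domain, disturbed by additive Gaussian noise, then quantized
    by an ADC whose full scale equals the parallelism.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(trials, parallelism))
    w = rng.integers(0, 2, size=(trials, parallelism))
    exact = (x * w).sum(axis=1).astype(float)
    noisy = exact + rng.normal(0.0, noise_sigma, size=trials)
    lsb = parallelism / 2 ** adc_bits                 # ADC quantization step
    readout = np.round(noisy / lsb) * lsb
    nonzero = exact > 0
    return np.mean(np.abs(readout[nonzero] - exact[nonzero]) / exact[nonzero])

TOLERANCE = 0.01
for bits in (5, 6, 7):
    for p in range(2, 65):
        if mean_relative_error(p, bits) <= TOLERANCE:
            print(f"{bits}-bit ADC: smallest parallelism meeting the budget = {p}")
            break
    else:
        print(f"{bits}-bit ADC: no parallelism up to 64 meets the budget "
              "under this toy error model")
```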
Providing a high-quality performance prediction has the potential to enhance various aspects of a cluster, such as devising scheduling and provisioning policies, guiding procurement decisions, suggesting candidate app...
A highly energy-efficient Computing-in-Memory (CIM) processor for Ternary Neural Network (TNN) acceleration is proposed in this brief. Previous CIM processors for multi-bit precision neural networks showed low energy efficiency and throughput. Lightweight binary neural networks were accelerated with CIM processors for high energy efficiency but showed poor inference accuracy. In addition, most previous works suffered from poor linearity of analog computing and energy-consuming analog-to-digital conversion. To resolve these issues, we propose a Ternary-CIM (T-CIM) processor with a 16T1C ternary bitcell for good linearity within a compact area, and a charge-based partial-sum adder circuit that removes the analog-to-digital conversion which consumes a large portion of the system energy. Furthermore, flexible data mapping enables execution of whole convolution layers with a smaller bitcell memory capacity. Designed in 65 nm CMOS technology, the proposed T-CIM achieves 1,316 GOPS of peak performance and 823 TOPS/W of energy efficiency.
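To make the ternary arithmetic concrete, here is a small NumPy sketch showing how a dot product with weights restricted to {-1, 0, +1} collapses into additions and subtractions, which is the property a ternary CIM design can exploit with charge-based partial-sum accumulation. The threshold-based ternarization, the function names, and the parameter values are illustrative assumptions, not the paper's training or circuit scheme.

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Map full-precision weights to {-1, 0, +1} by a simple threshold."""
    t = np.zeros_like(w, dtype=np.int8)
    t[w > threshold] = 1
    t[w < -threshold] = -1
    return t

def ternary_dot(activations, ternary_weights):
    """With ternary weights, a dot product reduces to adds and subtracts."""
    pos = activations[ternary_weights == 1].sum()
    neg = activations[ternary_weights == -1].sum()
    return pos - neg   # zero-valued weights contribute nothing

rng = np.random.default_rng(0)
a = rng.integers(0, 16, size=256)        # e.g. 4-bit activations
w = rng.normal(0, 0.1, size=256)         # full-precision weights
tw = ternarize(w)
print(ternary_dot(a, tw), a @ tw)        # both paths give the same result
```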
Since modern high-performance computing systems are evolving towards diverse and heterogeneous architectures, the emergence of high-level portable programming models leads to a particular focus on performance portabil...
ISBN (print): 9798400700958
Tensor-train (TT) decomposition enables ultra-high compression ratios, making deep neural network (DNN) accelerators based on this method very attractive. TIE, the state-of-the-art TT-based DNN accelerator, achieved high performance by leveraging a compact inference scheme to remove unnecessary computations and memory accesses. However, TIE increases memory costs for stage-wise intermediate results and adds intra-layer data transfer, leading to limited speedups even when the models are highly compressed. To unleash the full potential of TT decomposition, this paper proposes ETTE, an algorithm and hardware co-optimization framework for an Efficient Tensor-Train Engine. At the algorithm level, ETTE proposes a new tensor core construction and computation ordering mechanism to reduce stage-wise computation and storage costs at the same time. At the hardware level, ETTE proposes a lookahead-style across-stage processing scheme to eliminate unnecessary stage-wise data movement. By fully leveraging the decoupled input and output dimension factors, ETTE develops an efficient, low-cost, memory partition-free access scheme to support the desired matrix transformation. We demonstrate the effectiveness of ETTE by implementing a 16-PE hardware prototype in 28nm CMOS technology. Compared with a GPU on various workloads, ETTE achieves 6.5x - 253.1x higher throughput and 189.2x - 9750.5x higher energy efficiency. Compared with state-of-the-art DNN accelerators, ETTE brings 1.1x - 58.3x, 2.6x - 1170.4x, and 1.8x - 2098.2x improvements in throughput, energy efficiency, and area efficiency, respectively.
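For background on why tensor-train storage is so compact, the sketch below implements plain TT-SVD in NumPy: a weight tensor is factored into a chain of small cores by sequential truncated SVDs, and the cores can be contracted back to check the approximation. This is the generic decomposition only; ETTE's tensor core construction, computation ordering, and across-stage processing are not reproduced here, and the function names, shapes, and rank bound are illustrative choices.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Factor `tensor` into tensor-train cores via sequential truncated SVDs."""
    shape = tensor.shape
    cores, rank, mat = [], 1, tensor
    for k in range(len(shape) - 1):
        mat = mat.reshape(rank * shape[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(rank, shape[k], r))
        mat = np.diag(s[:r]) @ vt[:r]
        rank = r
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_contract(cores):
    """Rebuild the full tensor from its cores (only used to check the error)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

# Example: a 64x64 weight matrix viewed as an 8x8x8x8 tensor.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8, 8, 8))
cores = tt_svd(w, max_rank=16)
approx = tt_contract(cores)
err = np.linalg.norm(approx - w) / np.linalg.norm(w)
# Random data compresses poorly; trained weight tensors usually tolerate
# much smaller ranks, which is where the ultra-high compression comes from.
print(f"params: {w.size} -> {sum(c.size for c in cores)}, relative error {err:.3f}")
```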
ISBN (digital): 9781665451550
ISBN (print): 9781665451550
Development of job scheduling algorithms, which directly influence high-performance computing (HPC) cluster performance, is hindered because popular scheduling quality metrics, such as Bounded Slowdown, correlate poorly with global scheduling objectives that include job packing efficiency and fairness. This report proposes Area Weighted Response Time, a metric that offers an unbiased representation of job packing efficiency, and presents a class of new metrics, Priority Weighted Specific Response Time, that assess both packing efficiency and fairness of schedules. Examples of simulated scheduling of real workload traces, analyzed with these metrics alongside conventional ones, demonstrate that although Bounded Slowdown can be readily improved by modifying the standard First Come First Served backfilling algorithm and by using existing techniques for estimating job runtime, these improvements are accompanied by significant degradation of job packing efficiency and fairness. In contrast, improving job packing efficiency and fairness over the standard backfilling algorithm, which is designed to target those objectives, is difficult: it requires further algorithm development and more accurate runtime estimation techniques that reduce the frequency of underpredictions.
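As a sketch of how such a metric might be computed from a schedule, the snippet below averages job response times with each job weighted by its resource area (allocated cores times runtime), so that large jobs, which dominate packing, dominate the metric. This reading of "Area Weighted Response Time" is inferred from the metric's name and is an assumption; the report's exact definitions of it and of Priority Weighted Specific Response Time may differ.

```python
from dataclasses import dataclass

@dataclass
class Job:
    submit: float   # submission time
    start: float    # time the scheduler started the job
    runtime: float  # actual runtime
    cores: int      # cores allocated

def area_weighted_response_time(jobs):
    """Mean response time, with each job weighted by its area (cores * runtime)."""
    total_area = sum(j.cores * j.runtime for j in jobs)
    weighted_sum = sum(j.cores * j.runtime * (j.start + j.runtime - j.submit)
                       for j in jobs)
    return weighted_sum / total_area

# Tiny example trace: the wide, long job dominates the weighted average.
trace = [Job(submit=0, start=0, runtime=100, cores=4),
         Job(submit=0, start=50, runtime=10, cores=1),
         Job(submit=10, start=120, runtime=200, cores=16)]
print(area_weighted_response_time(trace))
```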
The Nested Neutral Point Clamped (NNPC) converter, functioning as a Voltage Source Converter (VSC), provides an effective solution for applications requiring Medium-Voltage and High-Power (MVHP). Earlier implementatio...
ISBN (print): 9798350393613
As a new information technology, edge computing has attracted much attention in recent years. Edge computing collects, stores, and processes data on edge devices, such as smartphones and sensors. Edge devices can process data in real time by reducing communication time with central servers. Therefore, various preprocessing algorithms should be executed on edge devices to provide services rapidly. Graph algorithms are one such candidate, because graph data play an important role in representing a variety of information around us, such as maps, social networks, and web structures, and are among the data collected and processed on edge devices. However, since the execution time of graph algorithms varies greatly depending on the graph data, preprocessing on edge devices may take a long time. It is therefore necessary to select an appropriate algorithm and process the data on edge devices as fast as possible. This paper proposes a method to select the appropriate graph algorithm on edge devices. Using machine learning that takes features of the graph data as input, the performance of each graph algorithm, such as its execution time, is predicted; the proposed method then selects a suitable algorithm for the requests of edge users. The evaluation results demonstrate that the proposed method can select the appropriate algorithm from several candidates depending on the characteristics of the graph data.
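The selection idea can be sketched as one runtime-prediction model per candidate algorithm, trained on graph features, with the lowest predicted execution time deciding the choice. In the sketch below, the feature set, the synthetic training data, and the regressor are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def make_features(n_graphs):
    """Features per graph: [num_nodes, num_edges, avg_degree, density]."""
    nodes = rng.integers(100, 100_000, n_graphs)
    edges = rng.integers(nodes, nodes * 20)
    return np.column_stack([nodes, edges, 2 * edges / nodes,
                            2 * edges / (nodes * (nodes - 1))])

X = make_features(500)
# Synthetic "measured" runtimes for two hypothetical candidate algorithms.
runtimes = {
    "bfs_based":    X[:, 1] * 1e-6 + rng.normal(0, 0.01, len(X)),
    "matrix_based": X[:, 0] ** 1.5 * 1e-8 + rng.normal(0, 0.01, len(X)),
}

# One regression model per algorithm, predicting runtime from graph features.
models = {name: RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
          for name, y in runtimes.items()}

def select_algorithm(graph_features):
    """Pick the algorithm with the lowest predicted execution time."""
    preds = {name: m.predict([graph_features])[0] for name, m in models.items()}
    return min(preds, key=preds.get), preds

print(select_algorithm(make_features(1)[0]))
```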
ISBN (print): 9781665420273
Reservoir computing is a nascent sub-field of machine learning that relies on the recurrent multiplication of a very large, sparse, fixed matrix. We argue that direct spatial implementation of these fixed matrices minimizes the work performed in the computation and allows significant reductions in latency and power through constant propagation and logic minimization. Bit-serial arithmetic enables massive static matrices to be implemented. We present the structure of our bit-serial matrix multiplier and evaluate the use of canonical signed digit representation to further reduce logic utilization. We have implemented these matrices on a large FPGA and provide a cost model that is simple and extensible. These FPGA implementations reduce latency by 50x on average, and by up to 86x, versus GPU libraries. Compared against a recent sparse DNN accelerator, we measure a 4.1x to 47x reduction in latency depending on matrix dimension and sparsity.
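To illustrate why canonical signed digit (CSD) recoding helps, the short sketch below converts integer constants to CSD form and counts non-zero digits; in a hardwired constant multiplier each non-zero digit costs roughly one adder or subtractor, so fewer non-zero digits means less logic. This standalone recoding example is not the paper's bit-serial multiplier design, and the function names and sample constants are illustrative.

```python
def to_csd(n):
    """Canonical signed-digit encoding of a non-negative integer.

    Returns digits in {-1, 0, +1}, least-significant first, with no two
    adjacent non-zero digits.
    """
    digits = []
    while n != 0:
        if n % 2 == 0:
            d = 0
        else:
            d = 2 - (n % 4)   # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        digits.append(d)
        n //= 2
    return digits

def nonzero_digits(digits):
    return sum(d != 0 for d in digits)

# Each non-zero digit of a fixed coefficient maps to one add/subtract in a
# hardwired multiplier, so CSD form often needs fewer adders than binary.
for value in (7, 23, 119, 0b101110111):
    csd = to_csd(value)
    assert sum(d << i for i, d in enumerate(csd)) == value   # round-trip check
    print(value, "binary ones:", bin(value).count("1"),
          "CSD non-zeros:", nonzero_digits(csd))
```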