Search Results - Inner Mongolia University Library

37th IEEE International System-on-Chip Conference, SOCC 2024

Authors: Liu, Shuang; Radetzki, Martin (University of Stuttgart, Embedded Systems, Institute of Computer Architecture and Computer Engineering, Stuttgart, Germany)

ISBN: (print) 9798350377569

Chiplet-based systems have become prominent in large systems-on-chip (SoCs) as a means to mitigate increasing design costs. However, the integration of multiple chiplets introduces new challenges in the interconnection network, potentially leading to deadlocks. In this paper, we propose an Integer Linear Programming (ILP) based design approach to address this issue. Our method considers various design factors for deadlock-free routing, such as topology, latency, load balancing, path diversity, and fault tolerance, applicable to both general-purpose chiplets and application-specific chiplets. It facilitates the determination of optimal turn restrictions for general-purpose chiplets or constructs optimal deadlock-free routing paths for application-specific chiplets if the communication patterns are known. The results demonstrate the capability of the method to find optimal solutions under various design considerations. © 2024 IEEE.

Keywords: Integer linear programming

Energy-Aware Heterogeneous Federated Learning via Approximate DNN Accelerators

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024, Vol. 44, No. 6, pp. 2054-2066

Authors: Pfeiffer, Kilian; Balaskas, Konstantinos; Siozios, Kostas; Henkel, Jorg (Karlsruhe Institute of Technology, Chair for Embedded Systems, Karlsruhe 76131, Germany; University of Patras, Department of Computer Engineering and Informatics, Patras 26504, Greece; Aristotle University of Thessaloniki, Department of Physics, Thessaloniki 54124, Greece)

In Federated Learning (FL), devices that participate in the training usually have heterogeneous resources, i.e., energy availability. In current deployments of FL, devices that do not fulfill certain hardware requirements are often dropped from the collaborative training. However, dropping devices in FL can degrade training accuracy and introduce bias or unfairness. Several works have tackled this problem on an algorithm level, e.g., by letting constrained devices train a subset of the server neural network (NN) model. However, it has been observed that these techniques are not effective w.r.t. accuracy. Importantly, they make simplistic assumptions about devices' resources via indirect metrics such as multiply accumulate (MAC) operations or peak memory requirements. We observe that memory access costs (that are currently not considered in simplistic metrics) have a significant impact on the energy consumption. In this work, for the first time, we consider on-device accelerator design for FL with heterogeneous devices. We utilize compressed arithmetic formats and approximate computing, aiming to satisfy limited energy budgets. Using a hardware-aware energy model, we observe that, contrary to the state of the art's moderate energy reduction, our technique allows for lowering the energy requirements (by 4×) while maintaining higher accuracy. © 1982-2012 IEEE.

Keywords: Budget control

Integer Linear Programming Based Design of Deadlock-Free Routing for Chiplet-Based Systems

IEEE International SOC Conference

Authors: Shuang Liu; Martin Radetzki (Chair of Embedded Systems, Institute of Computer Architecture and Computer Engineering, University of Stuttgart, Stuttgart, Germany)

ISBN: (digital) 9798350377569

ISBN: (print) 9798350377576

Keywords: Fault tolerance; Network topology; Chiplets; Fault tolerant systems; System recovery; Integer linear programming; Routing; Load management; Topology; System-on-chip

Synergistic Floorplanning and Routing Topology Co-design for Application-Specific NoC Synthesis

IEEE International Symposium on Embedded Multicore SoCs (MCSoC)

Authors: Shuang Liu; Martin Radetzki (Chair of Embedded Systems, Institute of Computer Architecture and Computer Engineering, University of Stuttgart, Stuttgart, Germany)

ISBN: (digital) 9798331530471

ISBN: (print) 9798331530488

Network-on-Chip (NoC) offers a promising solution for on-chip communication in highly integrated System-on-Chips (SoCs). NoCs can be designed with either regular or application-specific network topologies. While regular topologies are easy to design, they are not ideal for systems with heterogeneous processing elements (PEs) that vary in size. The design of application-specific NoCs, however, involves several interrelated problems that impact each other. This work addresses the challenges in the synthesis of application-specific NoCs by proposing an Integer Linear Programming (ILP) framework. This framework enables the co-design of major problems, including floorplanning, routing topology generation, routing path construction, and application mapping. Although the ILP framework can be applied to each problem individually or in a stepwise manner, the co-design of these interconnected problems allows synthesis steps to interact, enabling designers to explore the entire design space. Using this framework, we have analyzed various design configurations in the synthesis of application-specific NoCs.

Keywords: Network topology; Multicore processing; Fault tolerant systems; Network-on-chip; System recovery; Integer linear programming; Routing; Linear programming; Topology; Wire

Systematic Construction of Deadlock-Free Routing for NoC Using Integer Linear Programming

IEEE International Symposium on Embedded Multicore SoCs (MCSoC)

Authors: Shuang Liu; Martin Radetzki (Chair of Embedded Systems, Institute of Computer Architecture and Computer Engineering, University of Stuttgart, Stuttgart, Germany)

Network-on-Chip (NoC) presents a promising solution for on-chip communication in highly integrated System-on-Chips (SoCs). This work addresses critical challenges in NoC design, including routing construction, application mapping, and particularly the issue of deadlocks in the widely used wormhole routing method. In this paper, an Integer Linear Programming (ILP) approach for deadlock-free routing is proposed, applicable to arbitrary network topologies. We systematically analyze deadlock-free routing construction for mesh and torus topologies under uniform random traffic and provide alternative solutions to turn models. In the context of application-specific NoCs, application mapping and deadlock-free routing are integrated within a single ILP. Through evaluation with several benchmark applications, it is demonstrated that the ILP method consistently delivers optimal solutions and could obtain better results than various heuristic methods within an acceptable time. Fault tolerance is also explored and existing techniques are incorporated into the ILP approach. As an illustrative example, application mapping and a 1-link-fault-tolerant deadlock-free routing for the MP3 application on a mesh network are performed.

VersaSens: An Extendable Multimodal Platform for Next-Generation Edge-AI Wearables

IEEE Transactions on Circuits and Systems for Artificial Intelligence, 2024, Vol. 1, No. 1, pp. 83-96

Authors: Taraneh Aminosharieh Najafi; José Angel Miranda Calero; Jérôme Thevenot; Benjamin Duc; Stefano Albini; Alireza Amirshahi; Hossein Taji; María José Belda Beneyto; Antonio Affanni; David Atienza (Embedded Systems Laboratory (ESL), Institute of Electrical and Micro Engineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Polytechnic Department of Engineering and Architecture, Udine, Italy; Computer Architecture and Automation Department, Universidad Complutense de Madrid, Madrid, Spain)

The transition of healthcare towards digitalization is closely related to the advancement of health-related technologies, including wearable sensors and edge computing. In this paper, we present VersaSens, a versatile and customizable platform concept and its real implementation as a tool to boost research in wearable sensors. The platform embodies the core attributes of the VersaSens concept: versatility, flexibility, and extendability across multiple aspects of hardware, software, and processing components. It features a modular design, consisting of sensor, processor, and co-processor modules, allowing for various configurations. To evaluate the efficiency of the platform, we tested three use cases: cough monitoring, heartbeat classification, and epileptic seizure detection. In all cases, the results indicate that the platform effectively executes the applications, achieving low energy consumption. In particular, our findings indicate that the integration of a domain-specific edge-AI co-processor [i.e., HEEPocrates (Machetti et al., 2024)] equipped with several hardware accelerators further improved the overall execution time and energy consumption of the system. These results demonstrate the potential of VersaSens to effectively support a diverse range of edge-AI applications and configurations, thereby providing a robust foundation for the research and development of novel smart wearable sensor systems.

Keywords: Sensors; Wearable devices; Artificial intelligence; Wearable sensors; Medical services; Intelligent sensors; Monitoring

An Accurate and Hardware-Efficient Dual Spike Detector for Implantable Neural Interfaces

2022 IEEE Biomedical Circuits and Systems Conference, BioCAS 2022

Authors: Guo, Xiaorang; Shaeri, MohammadAli; Shoaran, Mahsa (Institute of Electrical and Micro Engineering, Center for Neuroprosthetics, EPFL, Geneva 1202, Switzerland; Faculty of Electrical and Computer Engineering, Technische Universität Dresden, Dresden 01069, Germany; Chair of Computer Architecture and Parallel Systems, Technische Universität München, Garching 85748, Germany)

ISBN: (digital) 9781665469173

ISBN: (print) 9781665469173

Spike detection plays a central role in neural data processing and brain-machine interfaces (BMIs). A challenge for future-generation implantable BMIs is to build a spike detector that features both low hardware cost and high performance. In this work, we propose a novel hardware-efficient and high-performance spike detector for implantable BMIs. The proposed design is based on a dual-detector architecture with adaptive threshold estimation. The dual-detector comprises two separate TEO-based detectors that distinguish a spike occurrence based on its discriminating features in both high and low noise scenarios. We evaluated the proposed spike detection algorithm on the Wave_Clus dataset. It achieved an average detection accuracy of 98.9%, and over 95% in high-noise scenarios, ensuring the reliability of our method. When realized in hardware with a sampling rate of 16 kHz and 7-bit resolution, the detection accuracy is 97.4%. Designed in 65 nm TSMC process, a 256-channel detector based on this architecture occupies only 682 μm²/channel and consumes 0.07 μW/channel, improving over the state-of-the-art spike detectors by 39.7% in power consumption and 78.8% in area, while maintaining a high accuracy. © 2022 IEEE.

Keywords: Data handling

Control Variate Approximation for DNN Accelerators

Proceedings of the 58th Annual ACM/IEEE Design Automation Conference

Authors: Georgios Zervakis; Ourania Spantidi; Iraklis Anagnostopoulos; Hussam Amrouch; Jörg Henkel (Chair for Embedded Systems (CES), Karlsruhe Institute of Technology, Karlsruhe, Germany; Department of Electrical, Computer and Biomedical Engineering, Southern Illinois University, Carbondale, U.S.A.; Chair of Semiconductor Test and Reliability (STAR), University of Stuttgart, Stuttgart, Germany)

ISBN: (print) 9781665432740

In this work, we introduce a control variate approximation technique for low-error approximate Deep Neural Network (DNN) accelerators. The control variate technique is used in Monte Carlo methods to achieve variance reduction. Our approach significantly decreases the induced error due to approximate multiplications in DNN inference, without requiring the time-exhaustive retraining that state-of-the-art approaches need. Leveraging our control variate method, we use highly approximated multipliers to generate power-optimized DNN accelerators. Our experimental evaluation on six DNNs, for the CIFAR-10 and CIFAR-100 datasets, demonstrates that, compared to the accurate design, our control variate approximation achieves the same performance and a 24% power reduction for a mere 0.16% accuracy loss.

Keywords: Approximate computing

An Accurate and Hardware-Efficient Dual Spike Detector for Implantable Neural Interfaces

arXiv, 2022

Authors: Guo, Xiaorang; Shaeri, MohammadAli; Shoaran, Mahsa (Institute of Electrical and Micro Engineering, Center for Neuroprosthetics, EPFL, Geneva 1202, Switzerland; Faculty of Electrical and Computer Engineering, Technische Universität Dresden, Dresden 01069, Germany; Chair of Computer Architecture and Parallel Systems, Technische Universität München, Garching 85748, Germany)

Spike detection plays a central role in neural data processing and brain-machine interfaces (BMIs). A challenge for future-generation implantable BMIs is to build a spike detector that features both low hardware cost and high performance. In this work, we propose a novel hardware-efficient and high-performance spike detector for implantable BMIs. The proposed design is based on a dual-detector architecture with adaptive threshold estimation. The dual-detector comprises two separate TEO-based detectors that distinguish a spike occurrence based on its discriminating features in both high and low noise scenarios. We evaluated the proposed spike detection algorithm on the Wave_Clus dataset. It achieved an average detection accuracy of 98.9%, and over 95% in high-noise scenarios, ensuring the reliability of our method. When realized in hardware with a sampling rate of 16 kHz and 7-bit resolution, the detection accuracy is 97.4%. Designed in 65 nm TSMC process, a 256-channel detector based on this architecture occupies only 682 µm²/channel and consumes 0.07 µW/channel, improving over the state-of-the-art spike detectors by 39.7% in power consumption and 78.8% in area, while maintaining a high accuracy. © 2022, CC BY.

Keywords: Data handling

NPU Thermal Management

Authors: Amrouch, Hussam; Zervakis, Georgios; Salamin, Sami; Kattan, Hammam; Anagnostopoulos, Iraklis; Henkel, Jorg (Computer Science, Electrical Engineering Faculty, University of Stuttgart, Stuttgart, Germany; Chair for Embedded Systems, Karlsruhe Institute of Technology, Karlsruhe, Germany; Department of Electrical and Computer Engineering, Southern Illinois University, Carbondale, IL, United States)

Neural processing units (NPUs) are becoming an integral part of all modern computing systems due to their substantial role in accelerating neural networks (NNs). The significant improvements in cost-energy-performance stem from the massive array of multiply accumulate (MAC) units that remarkably boosts the throughput of NN inference. In this work, we are the first to investigate the thermal challenges that NPUs bring, revealing how MAC arrays, which form the heart of any NPU, impose serious thermal bottlenecks on on-chip systems due to their excessive power densities. For the first time, we explore: 1) the effectiveness of precision scaling and frequency scaling (FS) in temperature reductions and 2) how advanced on-chip cooling using superlattice thin-film thermoelectric (TE) opens doors for new tradeoffs between temperature, throughput, cooling cost, and inference accuracy in NPU chips. Our work unveils that hybrid thermal management, which composes different means to reduce the NPU temperature, is key. To achieve that, we propose and implement the PFS-TE technique that couples precision and FS together with superlattice TE cooling for effective NPU thermal management. Using commercial signoff tools, we obtain accurate power and timing analysis of MAC arrays after a full-chip design is performed based on 14-nm Intel FinFET technology. Then, multiphysics simulations using finite-element methods are carried out for accurate heat simulations in the presence and absence of on-chip cooling. Afterward, comprehensive design-space exploration is presented to demonstrate the Pareto frontier and the existing tradeoffs between temperature reductions, power overheads due to cooling, throughput, and inference accuracy. Using a wide range of NNs trained for image classification, experimental results demonstrate that our novel NPU thermal management increases the inference efficiency (TOPS/Joule) by $1.33\times$, $1.87\times$, and $2\times$ under different temperature constrain

Keywords: Temperature control
