In Federated Learning (FL), devices that participate in the training usually have heterogeneous resources, i.e., energy availability. In current deployments of FL, devices that do not fulfill certain hardware requirem...
ISBN (digital): 9798350377569
ISBN (print): 9798350377576
Chiplet-based systems have become prominent in large systems-on-chip (SoCs) as a means to mitigate increasing design costs. However, the integration of multiple chiplets introduces new challenges in the interconnection network, potentially leading to deadlocks. In this paper, we propose an Integer Linear Programming (ILP)-based design approach to address this issue. Our method considers various design factors for deadlock-free routing, such as topology, latency, load balancing, path diversity, and fault tolerance, and is applicable to both general-purpose and application-specific chiplets. It determines optimal turn restrictions for general-purpose chiplets, or constructs optimal deadlock-free routing paths for application-specific chiplets if the communication patterns are known. The results demonstrate the capability of the method to find optimal solutions under various design considerations.
ISBN (digital): 9798331530471
ISBN (print): 9798331530488
Network-on-Chip (NoC) offers a promising solution for on-chip communication in highly integrated System-on-Chips (SoCs). NoCs can be designed with either regular or application-specific network topologies. While regular topologies are easy to design, they are not ideal for systems with heterogeneous processing elements (PEs) that vary in size. The design of application-specific NoCs, however, involves several interrelated problems that impact each other. This work addresses the challenges in the synthesis of application-specific NoCs by proposing an Integer Linear Programming (ILP) framework. This framework enables the co-design of major problems, including floorplanning, routing topology generation, routing path construction, and application mapping. Although the ILP framework can be applied to each problem individually or in a stepwise manner, the co-design of these interconnected problems allows synthesis steps to interact, enabling designers to explore the entire design space. Using this framework, we have analyzed various design configurations in the synthesis of application-specific NoCs.
Network-on-Chip (NoC) presents a promising solution for on-chip communication in highly integrated System-on-Chips (SoCs). This work addresses critical challenges in NoC design, including routing construction, application mapping, and particularly the issue of deadlocks in the widely used wormhole routing method. In this paper, an Integer Linear Programming (ILP) approach for deadlock-free routing is proposed, applicable to arbitrary network topologies. We systematically analyze deadlock-free routing construction for mesh and torus topologies under uniform random traffic and provide alternative solutions to turn models. In the context of application-specific NoCs, application mapping and deadlock-free routing are integrated within a single ILP. Evaluation with several benchmark applications demonstrates that the ILP method consistently delivers optimal solutions and can obtain better results than various heuristic methods within an acceptable time. Fault tolerance is also explored, and existing techniques are incorporated into the ILP approach. As an illustrative example, application mapping and 1-link-fault-tolerant deadlock-free routing are performed for the MP3 application on a mesh network.
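To make the deadlock condition concrete, the following is a minimal sketch of the classic channel-dependency-graph test for wormhole routing: a set of routes is deadlock-free if the graph of "holds one channel while requesting the next" dependencies has no cycle. This is not the ILP formulation from the abstract above; it only illustrates the property such a formulation has to enforce. The function names `channel_dependencies` and `has_cycle`, the four-node ring, and both route sets are assumptions chosen for illustration.

```python
from collections import defaultdict


def channel_dependencies(routes):
    """Build a channel dependency graph from routes given as node sequences.

    A channel is a directed link (u, v). A packet that holds channel (a, b)
    while requesting channel (b, c) creates the dependency (a, b) -> (b, c).
    """
    deps = defaultdict(set)
    for path in routes:
        channels = list(zip(path, path[1:]))
        for held, wanted in zip(channels, channels[1:]):
            deps[held].add(wanted)
    return deps


def has_cycle(deps):
    """Return True if the dependency graph contains a directed cycle (DFS)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every channel starts WHITE (unvisited)

    def visit(channel):
        colour[channel] = GREY
        for nxt in deps.get(channel, ()):
            if colour[nxt] == GREY:  # back edge: a cycle exists
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[channel] = BLACK
        return False

    return any(colour[ch] == WHITE and visit(ch) for ch in list(deps))


# Hypothetical 4-node ring 0-1-2-3-0 (for example, the border of a 2x2 mesh).
# Every two-hop route travels clockwise, so the dependencies close a loop.
clockwise = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]

# Two routes are redirected counter-clockwise, which breaks the loop.
restricted = [[0, 1, 2], [1, 2, 3], [2, 1, 0], [3, 2, 1]]

print(has_cycle(channel_dependencies(clockwise)))
print(has_cycle(channel_dependencies(restricted)))
```

The first route set reports a cycle and can therefore deadlock; the second does not. Turn models and ILP-based route selection can both be read as ways of choosing routes so that this check passes.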
The transition of healthcare towards digitalization is closely related to the advancement of health-related technologies, including wearable sensors and edge computing. In this paper, we present VersaSens, a versatile and customizable platform concept and its real implementation as a tool to boost research in wearable sensors. The platform embodies the core attributes of the VersaSens concept: versatility, flexibility, and extendability across multiple aspects of hardware, software, and processing components. It features a modular design, consisting of sensor, processor, and co-processor modules, allowing for various configurations. To evaluate the efficiency of the platform, we tested three use cases: cough monitoring, heartbeat classification, and epileptic seizure detection. In all cases, the results indicate that the platform effectively executes the applications while achieving low energy consumption. In particular, our findings indicate that the integration of a domain-specific edge-AI co-processor [i.e., HEEPocrates (Machetti et al., 2024)] equipped with several hardware accelerators further improved the overall execution time and energy consumption of the system. These results demonstrate the potential of VersaSens to effectively support a diverse range of edge-AI applications and configurations, thereby providing a robust foundation for the research and development of novel smart wearable sensor systems.
Spike detection plays a central role in neural data processing and brain-machine interfaces (BMIs). A challenge for future-generation implantable BMIs is to build a spike detector that features both low hardware cost ...
ISBN (print): 9781665432740
In this work, we introduce a control variate approximation technique for low-error approximate Deep Neural Network (DNN) accelerators. The control variate technique is used in Monte Carlo methods to achieve variance reduction. Our approach significantly decreases the error induced by approximate multiplications in DNN inference, without requiring the time-exhaustive retraining that state-of-the-art approaches rely on. Leveraging our control variate method, we use highly approximated multipliers to generate power-optimized DNN accelerators. Our experimental evaluation on six DNNs, for the CIFAR-10 and CIFAR-100 datasets, demonstrates that, compared to the accurate design, our control variate approximation achieves the same performance and a 24% power reduction for a mere 0.16% accuracy loss.
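For readers unfamiliar with control variates, here is a minimal sketch of the textbook Monte Carlo form of the idea, not the accelerator design described in the abstract above. It estimates E[exp(U)] for U uniform on [0, 1] and uses U itself, whose mean 0.5 is known exactly, as the control variate. The target function, the sample count, and the function name `estimate` are assumptions chosen for illustration.

```python
import math
import random


def estimate(n, seed=0):
    """Estimate E[exp(U)], U ~ Uniform(0, 1), with and without a control variate.

    Returns (plain_mean, cv_mean, plain_var, cv_var), where the variances are
    per-sample variances of the plain and the adjusted samples.
    """
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    f = [math.exp(x) for x in u]  # samples of the target quantity

    mean_f = sum(f) / n
    mean_u = sum(u) / n

    # Coefficient c = Cov(f, U) / Var(U), estimated from the same samples.
    cov = sum((fi - mean_f) * (ui - mean_u) for fi, ui in zip(f, u)) / (n - 1)
    var_u = sum((ui - mean_u) ** 2 for ui in u) / (n - 1)
    c = cov / var_u

    # Subtract the correlated, zero-mean term c * (U - E[U]) from each sample.
    adjusted = [fi - c * (ui - 0.5) for fi, ui in zip(f, u)]
    cv_mean = sum(adjusted) / n

    plain_var = sum((fi - mean_f) ** 2 for fi in f) / (n - 1)
    cv_var = sum((ai - cv_mean) ** 2 for ai in adjusted) / (n - 1)
    return mean_f, cv_mean, plain_var, cv_var


plain_mean, cv_mean, plain_var, cv_var = estimate(10_000)
print("exact value      :", math.e - 1)
print("plain estimate   :", plain_mean, "sample variance:", plain_var)
print("control variate  :", cv_mean, "sample variance:", cv_var)
```

Because exp(U) and U are strongly correlated, the adjusted samples vary far less than the plain ones while keeping the same expected value, which is the variance-reduction property the abstract refers to.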
Neural processing units (NPUs) are becoming an integral part of all modern computing systems due to their substantial role in accelerating neural networks (NNs). The significant improvements in cost-energy-performance stem from the massive array of multiply-accumulate (MAC) units that remarkably boosts the throughput of NN inference. In this work, we are the first to investigate the thermal challenges that NPUs bring, revealing how MAC arrays, which form the heart of any NPU, impose serious thermal bottlenecks on on-chip systems due to their excessive power densities. For the first time, we explore: 1) the effectiveness of precision scaling and frequency scaling (FS) in temperature reduction and 2) how advanced on-chip cooling using superlattice thin-film thermoelectric (TE) cooling opens doors for new tradeoffs between temperature, throughput, cooling cost, and inference accuracy in NPU chips. Our work reveals that hybrid thermal management, which combines different means of reducing the NPU temperature, is key. To achieve that, we propose and implement the PFS-TE technique, which couples precision scaling and FS with superlattice TE cooling for effective NPU thermal management. Using commercial signoff tools, we obtain accurate power and timing analysis of MAC arrays after a full-chip design is performed based on 14-nm Intel FinFET technology. Then, multiphysics simulations using finite-element methods are carried out for accurate heat simulations in the presence and absence of on-chip cooling. Afterward, comprehensive design-space exploration is presented to demonstrate the Pareto frontier and the existing tradeoffs between temperature reductions, power overheads due to cooling, throughput, and inference accuracy. Using a wide range of NNs trained for image classification, experimental results demonstrate that our novel NPU thermal management increases the inference efficiency (TOPS/Joule) by $1.33\times$, $1.87\times$, and $2\times$ under different temperature constrain