On-demand offloading is a practical solution for resource-limited Internet of Things (IoT) scenarios. However, an ineffective offloading policy can lead to wasteful transmission costs. Prior works have designed policies based solely on edge information, neglecting the role of the cloud and potentially degrading overall performance. In this paper, we propose two methods to address this issue. First, we modify the training process to incorporate information from both the edge and the cloud, achieving joint edge-cloud optimization in our trainable offloading policy. Second, we leverage structured feature representations to improve our policy's efficiency and reduce the cost of ineffective offloading. Our experimental results show that our methods outperform existing approaches on ResNet152 and VGG16, reducing the offloading ratio by 17.74% and 23.57% and increasing the offloading efficiency by 4.47% and 5.52%, respectively.
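The sketch below makes the offloading ratio concrete by evaluating a simple edge-confidence threshold rule. This is the kind of edge-only policy that prior works rely on, not the trainable joint edge-cloud policy proposed here. The `Sample` record, the threshold rule, and the definition of offloading efficiency are all assumptions made for illustration. Efficiency is taken to be the share of offloaded samples that the edge misclassifies and the cloud classifies correctly.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical per-input record: edge model confidence and whether
    # the edge and cloud models classify the input correctly.
    edge_conf: float
    edge_correct: bool
    cloud_correct: bool


def evaluate_threshold_policy(samples, threshold):
    """Offload every sample whose edge confidence is below `threshold`.

    Returns (offloading ratio, offloading efficiency, end-to-end accuracy).
    Efficiency is an assumed metric: the fraction of offloaded samples
    that the edge gets wrong and the cloud gets right.
    """
    offloaded = [s for s in samples if s.edge_conf < threshold]
    ratio = len(offloaded) / len(samples)
    useful = sum(1 for s in offloaded if not s.edge_correct and s.cloud_correct)
    efficiency = useful / len(offloaded) if offloaded else 0.0
    # The final prediction comes from the cloud if offloaded, else from the edge.
    correct = sum(
        s.cloud_correct if s.edge_conf < threshold else s.edge_correct
        for s in samples
    )
    return ratio, efficiency, correct / len(samples)


samples = [
    Sample(0.90, True, True),
    Sample(0.40, False, True),   # useful offload: the cloud fixes an edge error
    Sample(0.30, True, True),    # wasted offload: the edge was already right
    Sample(0.95, False, False),  # confident edge error, never offloaded
]
print(evaluate_threshold_policy(samples, threshold=0.5))
```

Here the third sample is offloaded even though the edge was right. That is exactly the wasted transmission an edge-only rule cannot detect. The abstract proposes to reduce such cases by incorporating cloud-side information during training.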
Redistribution layers (RDLs) are widely used for signal transmission in advanced packages. Traditional RDL routers use only 90- and 135-degree turns. With technological advances, RDL routes can turn at any obtuse angle, which leads to a larger routing solution space and shorter total wirelength. This paper proposes the first any-angle routing algorithm for multiple RDLs in the literature. We first present a novel global routing algorithm with accurate routing-resource estimation. We then propose a multi-net access point adjustment method based on dynamic programming and our partial net separation scheme. Finally, we develop an efficient tile routing algorithm to obtain valid routes with fixed access points. Experimental results show that our algorithm achieves 15.7% shorter wirelength than a traditional RDL router.
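A small geometric sketch shows why relaxing the turn-angle restriction can shorten wirelength. It compares an octilinear route with a direct any-angle segment between the same two pins. This is not the paper's router. The pin positions and the obstacle-free setting are assumptions, and interior angles are measured so that 180 degrees means a straight pass.

```python
import math


def wirelength(route):
    """Total Euclidean length of a polyline route given as (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


def bend_angles(route):
    """Interior angle in degrees at each intermediate point (180 = straight)."""
    angles = []
    for prev, cur, nxt in zip(route, route[1:], route[2:]):
        ux, uy = prev[0] - cur[0], prev[1] - cur[1]
        vx, vy = nxt[0] - cur[0], nxt[1] - cur[1]
        cos_t = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
        # Clamp to guard against rounding slightly outside [-1, 1].
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_t)))))
    return angles


# Two hypothetical pins at (0, 0) and (3, 1), with no obstacles.
octilinear = [(0, 0), (1, 1), (3, 1)]  # a 45-degree jog, then horizontal
any_angle = [(0, 0), (3, 1)]           # one direct segment

print(wirelength(octilinear), bend_angles(octilinear))
print(wirelength(any_angle))
```

The octilinear route has length 2 + √2 and a single 135-degree bend. The direct segment has length √10, roughly 7% shorter. With obstacles present, any-angle routes typically need bends at arbitrary obtuse angles rather than no bends at all.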
This paper reports a magnetic sensing device for in-vivo artery pressure monitoring. The device consists of multiple magnets and Hall sensors arranged orthogonally on a flexible substrate. Its novelty lies in the use of multiple components and their geometric configuration, which enhances pressure sensitivity. Several designs with different numbers of components were investigated through finite element simulations. The design with two cross-positioned magnets and sensors exhibited an eight-fold improvement in artery expansion response over the traditional single magnet-sensor configuration. In-vitro experiments carried out on a surgical latex tube demonstrated a five-fold improvement in pressure sensitivity. Further development of this device will enable continuous blood pressure monitoring after organ transplantation.
This paper proposes a high-quality 3D placement algorithm that determines the positions of standard cells and inter-die vias to optimize wirelength when different dies use different manufacturing technologies. The algorithm consists of three major novel techniques: (1) a multi-technologies weighted-average (MTWA) wirelength model, (2) a weighted inter-die-connection cost that controls the net-degree distribution of the cut set, and (3) a via-cell co-optimization technique that further improves placement quality. Compared with the winners of the 2022 CAD Contest at ICCAD on 3D Placement with D2D Vertical Connections, our placer achieves the best results on all nontrivial cases.
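The MTWA model builds on the weighted-average (WA) wirelength model used in analytical placement. The WA model is a smooth approximation of half-perimeter wirelength (HPWL), and its accuracy is controlled by a smoothing parameter gamma. As background, the sketch below implements the standard single-technology WA estimate. The abstract does not describe how MTWA accounts for different technologies on different dies, so that part is not modeled here. The net coordinates are made up.

```python
import math


def wa_axis(coords, gamma):
    """Weighted-average wirelength along one axis for a single net.

    Approximates max(coords) - min(coords); smaller gamma is closer to exact.
    Exponents are shifted by the max/min for numerical stability, which
    leaves both weighted averages unchanged.
    """
    hi, lo = max(coords), min(coords)
    w_max = [math.exp((x - hi) / gamma) for x in coords]
    w_min = [math.exp((lo - x) / gamma) for x in coords]
    smooth_max = sum(x * w for x, w in zip(coords, w_max)) / sum(w_max)
    smooth_min = sum(x * w for x, w in zip(coords, w_min)) / sum(w_min)
    return smooth_max - smooth_min


def wa_net(pins, gamma):
    """WA wirelength of a net whose pins are (x, y) tuples."""
    xs, ys = zip(*pins)
    return wa_axis(xs, gamma) + wa_axis(ys, gamma)


def hpwl(pins):
    """Exact half-perimeter wirelength, for comparison."""
    xs, ys = zip(*pins)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


net = [(0.0, 0.0), (10.0, 2.0), (4.0, 6.0)]
for gamma in (5.0, 1.0, 0.1):
    print(gamma, wa_net(net, gamma), hpwl(net))
```

As gamma shrinks, the estimate approaches HPWL from below. Larger gamma gives a smoother, easier-to-optimize but less accurate model.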
ISBN (electronic): 9798350348934
ISBN (print): 9798350348941
For channels with finite input and output sets, under mild technical assumptions, the local behavior of the Augustin information as a function of the input distribution is characterized for all positive orders using the implicit function theorem and the characterization of the Augustin information in terms of the Augustin dual. For channels with (potentially multiple) linear constraints, the slowest decrease of Augustin information with increasing distance from the Augustin capacity-achieving input distributions is characterized within small neighborhoods around these distributions for all positive orders.
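For reference, the quantities involved can be written using the standard definitions for a channel W from a finite input set 𝒳 to a finite output set 𝒴. The notation is chosen here for illustration and may differ from the paper's: W_x denotes the output distribution given input x, and 𝒜 denotes the set of input distributions satisfying the linear constraints.

```latex
\begin{aligned}
D_\alpha(w \,\|\, q) &= \frac{1}{\alpha - 1} \ln \sum_{y \in \mathcal{Y}} w(y)^{\alpha}\, q(y)^{1-\alpha},
  \qquad \alpha \in (0,1) \cup (1,\infty), \\
I_\alpha(p; W) &= \inf_{q \in \mathcal{P}(\mathcal{Y})} \sum_{x \in \mathcal{X}} p(x)\, D_\alpha(W_x \,\|\, q), \\
C_\alpha &= \sup_{p \in \mathcal{A}} I_\alpha(p; W).
\end{aligned}
```

At α = 1, D_α is taken to be the Kullback–Leibler divergence, and I_1(p; W) reduces to the mutual information.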
ISBN (electronic): 9798350367331
ISBN (print): 9798350367348
Combinatorial optimization problems, such as IC layout and industrial scheduling, have significant industrial applications but are challenging because of their exponential time complexity. In this work, we propose a novel annealing-inspired heuristic algorithm that treats combinatorial problems as function optimization problems using nonlinear programming. The proposed gradient-descent-based solver significantly improves the convergence rate and includes a new regularization constraint that increases convexity to escape local minima. Applied to the Traveling Salesman Problem (TSP) with various numbers of cities, the proposed algorithm exhibits polynomial time complexity. It reduces the complexity from (n−1)!/2 to n^4, markedly improving computational efficiency. Notably, for a 50-city TSP, the relative error is only about 5%, demonstrating the accuracy and efficiency of the proposed algorithm on high-dimensional instances.
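The (n−1)!/2 figure counts the distinct closed tours through n cities once the starting city and direction of travel are ignored. Exhaustive search must examine every one of them. The sketch below is not the proposed annealing-inspired solver. It is a hypothetical brute-force baseline that enumerates exactly those tours for small n. It also computes the relative error used to judge a heuristic tour against the optimum.

```python
import itertools
import math


def distinct_tours(n):
    """Yield each closed tour over cities 0..n-1 exactly once (n >= 3).

    City 0 is fixed as the start, and a tour is kept only if its second city
    is smaller than its last, so a tour and its reversal appear only once.
    """
    for perm in itertools.permutations(range(1, n)):
        if perm[0] < perm[-1]:
            yield (0,) + perm


def tour_length(tour, points):
    """Length of the closed tour, including the edge back to the start."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def brute_force_optimum(points):
    return min(tour_length(t, points) for t in distinct_tours(len(points)))


def relative_error(found, optimal):
    return (found - optimal) / optimal


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
best = brute_force_optimum(square)           # the square's perimeter
crossed = tour_length((0, 2, 1, 3), square)  # a tour using both diagonals
print(best, relative_error(crossed, best))

# Search-space size for 50 cities versus the n^4 scaling.
print(math.factorial(49) // 2, 50 ** 4)
```

For 50 cities the first number is roughly 3 × 10^62, while 50^4 is 6,250,000. This is why exhaustive search is impractical beyond a handful of cities.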
Modern heterogeneous integration requires dense IO interconnections among chips, such as CPUs and memory, to enable bandwidth-aware packaging. The embedded multi-die interconnect bridge (EMIB) has recently attracted much attention because it provides high wiring density at low manufacturing cost. However, EMIB optimization must account for constrained wire orientations and crosstalk. This paper presents the first work on floorplanning for EMIB-based packaging. We first formulate the floorplanning problem for EMIB-based packaging. Based on a hybrid structure of transitive closure graphs (TCGs) and B*-trees, we present a novel simulated-annealing-based algorithm that efficiently generates the desired EMIB-aware floorplans. To search for desired solutions more efficiently, we employ maximum-spanning-tree-based partitioning and tree-based classification of partial topologies that have already been found. Experimental results show that our algorithm significantly improves area, total wirelength, and computation time compared with simulated annealing based on TCGs alone.
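Maximum-spanning-tree-based partitioning can be illustrated on a connectivity graph whose edge weights count IO connections between dies. The sketch below builds a maximum spanning tree with Kruskal's algorithm, taking the heaviest edges first. It then cuts the k − 1 lightest tree edges to form k groups, so heavily connected dies stay together. The die names, weights, and cut rule are hypothetical. The abstract does not specify how the paper's partitioning uses the tree.

```python
def _find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def maximum_spanning_tree(nodes, edges):
    """Kruskal's algorithm on (u, v, weight) edges, heaviest edges first."""
    parent = {n: n for n in nodes}
    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:  # skip edges that would close a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def partition_by_mst(nodes, edges, k):
    """Split nodes into groups by removing the k - 1 lightest tree edges."""
    if not 1 <= k <= len(nodes):
        raise ValueError("k must be between 1 and the number of nodes")
    tree = maximum_spanning_tree(nodes, edges)
    keep = max(0, len(tree) - (k - 1))
    kept = sorted(tree, key=lambda e: e[2], reverse=True)[:keep]
    parent = {n: n for n in nodes}
    for u, v, _ in kept:
        parent[_find(parent, u)] = _find(parent, v)
    groups = {}
    for n in nodes:
        groups.setdefault(_find(parent, n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical dies A-D and their IO-connection counts.
chips = ["A", "B", "C", "D"]
links = [("A", "B", 10), ("B", "C", 1), ("C", "D", 8), ("A", "C", 2)]
print(maximum_spanning_tree(chips, links))
print(partition_by_mst(chips, links, k=2))
```

Cutting the weakest tree link (A–C) separates the two tightly coupled pairs.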
In-memory computing (IMC) has become a mainstream approach for accelerating the inference of deep neural networks (DNNs). Nonetheless, IMC suffers from variations that significantly degrade inference accuracy, whereas near-memory computing (NMC) maintains ideal accuracy at the expense of energy efficiency. In this work, we leverage an NMC/IMC hybrid architecture and propose a dynamic energy-aware policy that strikes a better trade-off between accuracy and energy efficiency. Our approach uses deep reinforcement learning (DRL) to dynamically allocate workloads between NMC and IMC at the data level. Furthermore, we account for the varying energy overhead of NMC usage across different DNN layers. Compared with prior works, we improve accuracy by up to 8.8% on CIFAR-10 and 4.6% on CIFAR-100 at the same energy consumption.
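The sketch below illustrates the accuracy-energy trade-off by allocating a fraction of each layer's data to NMC under an energy budget. It is deliberately simpler than the DRL approach described above. It assumes hypothetical per-layer accuracy gains and extra NMC energy costs that add up linearly and independently. Under those assumptions, picking layers greedily by gain per unit energy (a fractional knapsack) is optimal. Real layer interactions need not satisfy these assumptions, so this is only a baseline intuition, not the paper's method.

```python
def allocate_to_nmc(gains, extra_energy, budget):
    """Fraction of each layer's data routed to NMC (the rest stays on IMC).

    gains[i]: hypothetical accuracy gain if layer i runs entirely on NMC.
    extra_energy[i]: hypothetical extra energy (> 0) of doing so.
    Greedy by gain per unit energy; optimal only for this linear model.
    """
    order = sorted(range(len(gains)),
                   key=lambda i: gains[i] / extra_energy[i], reverse=True)
    fractions = [0.0] * len(gains)
    remaining = budget
    for i in order:
        if remaining <= 0:
            break
        fractions[i] = min(1.0, remaining / extra_energy[i])
        remaining -= fractions[i] * extra_energy[i]
    return fractions


gains = [4.0, 1.0, 3.0]         # made-up accuracy gains (percentage points)
extra_energy = [2.0, 1.0, 2.0]  # made-up extra energy (arbitrary units)
print(allocate_to_nmc(gains, extra_energy, budget=3.0))
```

With this budget, the first layer moves entirely to NMC and half of the third layer's data follows. The second layer stays on IMC because it offers the least gain per unit energy.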
Computing-in-memory (CIM) has emerged as a promising approach for accelerating convolutional neural networks (CNNs). Ongoing research, e.g., Repetitive Input Sharing (RIS), focuses on removing redundant matrix-vector multiplications (MVMs) by exploiting computation reuse for higher energy efficiency. However, we argue that RIS neglects the extra overheads of its computation reuse scheme. Moreover, analog CIM is inherently vulnerable to noise, so reusing noisy MVM results may lead to severe accuracy degradation. To address these issues, we first evaluate the extra buffer overhead incurred by storing repetitive MVM results for reuse. Based on this evaluation, we identify an optimal RIS reuse ratio that balances buffer costs against the efficiency gain from computation reuse, leading to greater energy reduction. In addition, we introduce RIS-based Hybrid-CIM (H-RIS), which combines analog CIM and digital near-memory computing (NMC) at the pattern level to maintain accuracy. With these techniques and the RIS ratio set to 25%, H-RIS improves accuracy by 18% over pure analog CIM and reduces energy by 97% relative to pure digital NMC.
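Computation reuse rests on a simple observation: identical input vectors produce identical MVM results, so a result computed once can be stored and reused. The sketch below models this with a bounded buffer keyed by the exact input vector. It counts how many MVMs are skipped and how many must still be computed. The LRU eviction policy, the capacity parameter, and the exact-match key are assumptions for illustration. Analog noise is not modeled, although in analog CIM a reused result would carry whatever noise the original computation had.

```python
from collections import OrderedDict


def mvm(weights, x):
    """Plain matrix-vector multiply: one output per weight row."""
    return tuple(sum(w * xi for w, xi in zip(row, x)) for row in weights)


def mvm_with_reuse(weights, inputs, capacity):
    """Reuse buffered MVM results for repeated inputs (LRU buffer)."""
    buffer = OrderedDict()
    outputs, hits, computed = [], 0, 0
    for x in inputs:
        key = tuple(x)
        if key in buffer:
            buffer.move_to_end(key)  # mark as most recently used
            result = buffer[key]
            hits += 1
        else:
            result = mvm(weights, key)
            computed += 1
            if capacity > 0:
                buffer[key] = result
                if len(buffer) > capacity:
                    buffer.popitem(last=False)  # evict least recently used
        outputs.append(result)
    return outputs, hits, computed


weights = [[1, 2], [3, 4]]
inputs = [(1, 0), (0, 1), (1, 0), (1, 0)]
for capacity in (0, 1, 2):
    _, hits, computed = mvm_with_reuse(weights, inputs, capacity)
    print(capacity, hits / len(inputs), computed)
```

A larger buffer raises the reuse ratio but adds storage and access overhead. That trade-off is what the optimal reuse ratio must balance.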