ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Network-on-Chip (NoC), known for its high bandwidth and scalability, is extensively used in chip multiprocessors. However, as technology advances to the nanometer scale, NoCs are becoming increasingly vulnerable to errors caused by crosstalk, radiation, electromagnetic interference, and other factors. Beyond ensuring network reliability, designers must weigh the overhead rather than blindly pursuing fault-tolerance capability. In this paper, we analyze the characteristics of conventional End-to-End (E2E) and Switch-to-Switch (S2S) designs and propose AFTP, an adaptive, highly cost-effective, prediction-based fault-tolerant NoC design. Our design gives head flits stronger protection and body and tail flits relaxed protection, and it dynamically adjusts the decoding times for flits in response to changes in error rate, aiming to balance the overhead and the reliability of the NoC. Under common synthetic traffic and PARSEC benchmarks, our design improves cost-effectiveness by 5.9x and 8.9x over conventional E2E and S2S designs, respectively.
This paper addresses the challenge of reducing the number of nodes in Look-Up Table (LUT) networks with two significant applications. First, Field-Programmable Gate Arrays (FPGAs) can be modelled as networks of LUTs, ...
ISBN (digital): 9798350349634
ISBN (print): 9798350349641
Narrow-precision fixed-point (INT) computation is a significant approach for reducing memory requirements and enhancing the performance of accelerators for Deep Neural Networks (DNNs). Different DNNs, as well as different layers within a DNN, may exhibit varying numerical distributions, necessitating INT formats with different minimum bit-widths. Therefore, DNN accelerators need to support multi-precision INT computations to strike a better balance between DNN inference accuracy and performance. However, existing precision-scalable accelerators face challenges such as low bandwidth utilization, insufficient utilization of computing resources across different precision modes, and complex circuit structures with associated overhead. In this paper, we propose (1) a hardware-friendly Combining-Like-Terms GEMM (CLT-GEMM) scheme that supports multiple computing modes of 2/4/8 bits and their combinations to align with the various bit-width settings of DNNs, and (2) an efficient systolic accelerator with scalable precision, named BitShare, which features a DataMap module and multi-mode adder-tree-based accumulators. Compared to the state-of-the-art precision-scalable design, BitBlade, our accelerator achieves a 57.25% reduction in bandwidth requirement and exhibits an improvement of $1.14\times$ and $1.12\times$ in area and power efficiency $(2\mathbf{b}\times 2\mathbf{b})$, respectively.
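For readers unfamiliar with precision-scalable arithmetic, the following minimal sketch shows the generic idea of building a wider multiplication from narrower partial products. It is not the CLT-GEMM scheme, whose details the abstract does not give; the function names are hypothetical and operands are assumed to be unsigned for simplicity.

```python
def split_nibbles(x: int) -> tuple[int, int]:
    """Split an unsigned 8-bit value into its (high, low) 4-bit halves."""
    if not 0 <= x <= 0xFF:
        raise ValueError("expected an unsigned 8-bit value")
    return x >> 4, x & 0xF


def mul8_from_4bit_products(a: int, b: int) -> int:
    """Compute a * b for unsigned 8-bit a, b using only 4b x 4b products.

    Illustrative assumption: a = 16*a_hi + a_lo and b = 16*b_hi + b_lo, so
    a*b = 256*a_hi*b_hi + 16*(a_hi*b_lo + a_lo*b_hi) + a_lo*b_lo.
    """
    a_hi, a_lo = split_nibbles(a)
    b_hi, b_lo = split_nibbles(b)
    # The two cross terms share the same shift, so they are added first
    # and shifted once.
    cross = a_hi * b_lo + a_lo * b_hi
    return ((a_hi * b_hi) << 8) + (cross << 4) + (a_lo * b_lo)
```

The same four narrow multipliers could instead serve four independent 4-bit products, which is the kind of resource sharing that multi-precision designs aim for.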
The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure. This paper proposes a tensor-inspired unsupervised learning method to identify meaningful low-dimensional state and action representations from empirical trajectories. The method exploits the MDP's tensor structure by kernelization, importance sampling and low-Tucker-rank approximation. This method can be further used to cluster states and actions respectively and find the best discrete MDP abstraction. We provide sharp statistical error bounds for tensor concentration and the preservation of diffusion distance after embedding. We further prove that the learned state/action abstractions provide accurate approximations to latent block structures if they exist, enabling function approximation in downstream tasks such as policy evaluation.
As industrial systems become more complex and interconnected, diagnosing faults accurately and in real time has become increasingly challenging. This paper explores how combining artificial intelligence with digital t...
ISBN (digital): 9798331540333
ISBN (print): 9798331540340
Network-on-Chip (NoC) has been widely applied in modern chip multiprocessors due to its high bandwidth and scalability. However, as technology advances to the nanometer scale, NoC is increasingly vulnerable to errors caused by crosstalk, radiation, electromagnetic interference, etc. Conventional Switch-to-Switch (S2S) fault-tolerant designs based on ECC have overlooked the distribution of traffic load. This oversight not only increases area overhead significantly but also leads to low average utilization of ECC decoder modules. In this paper, we analyze the distribution of traffic load in mesh networks and propose a load-balancing-oriented fault-tolerant NoC design. The core idea is to allocate a different number of ECC decoder modules to each router based on the distribution of traffic load, aiming to improve the average utilization of ECC decoder modules and reduce the area overhead without compromising the fault-tolerance capability of the NoC. Experiments under 6 common synthetic traffic patterns show that, compared to the baseline, our design incurs an average delay performance loss of less than 0.88%. The maximum reduction in the number of ECC decoder modules is 160, the maximum reduction in the area overhead of the NoC is 15.06%, and the maximum improvement in the average utilization of ECC decoder modules is 1.21x. Experiments under PARSEC benchmarks show that, compared to the baseline, our design incurs an average delay performance loss of less than 0.08%. The maximum reduction in the number of ECC decoder modules is 156, the maximum reduction in total NoC area overhead is 14.69%, and the maximum improvement in the average utilization of ECC decoder modules is 1.13x.
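The core idea of load-dependent allocation can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than the paper's method: a k x k mesh, dimension-ordered XY routing, uniform all-to-all traffic, a hypothetical cap of 5 decoders per router, and a simple proportional rounding rule.

```python
import math
from itertools import product


def xy_route(src, dst):
    """Routers visited under X-then-Y routing, endpoints included."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def uniform_load(k):
    """Count how many source-destination routes touch each router."""
    nodes = list(product(range(k), repeat=2))
    load = {n: 0 for n in nodes}
    for s in nodes:
        for d in nodes:
            if s != d:
                for router in xy_route(s, d):
                    load[router] += 1
    return load


def allocate_decoders(load, max_per_router=5, min_per_router=1):
    """Scale decoder count with load relative to the busiest router."""
    peak = max(load.values())
    return {
        router: max(min_per_router, math.ceil(max_per_router * l / peak))
        for router, l in load.items()
    }
```

Calling `allocate_decoders(uniform_load(4))` gives central routers more decoders than corner routers, so the total falls below the uniform allocation of 5 per router.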
1 Introduction The satisfiability (SAT) problem has always been a core problem in the field of computer *** theoretical and applied research have long been the common attention of many scholars in the field of artificial intelligence and mathematical *** the real world, all issues related to combinatorial optimization and coordination verification are closely related to the SAT problem.
Safety has been a key goal for autonomous driving since its inception, and we believe recognizing and responding to risk is a key component of safety. In this work, we aim to answer the question, “How can explainable risk representations be generated and used to produce risk-averse trajectories?” To answer this question, previous work uses risk metrics to formulate an optimization problem. In contrast, our work is based on research showing the usefulness of grids as a representation to generate image-based risk maps through a trained neural network. We propose a method of determining risk from a bird's eye view (BEV) of an autonomous vehicle's surroundings. Our method consists of (1) a risk map generator, which is trained to recognize risk associated with nearby agents and the map, (2) differentiable value iteration using the risk map to learn a policy, and (3) a trajectory sampler, which samples from this policy to generate a trajectory. We evaluate our planner in a closed-loop manner and find improvements in its overall ability to mimic human driving while maintaining comparable safety statistics. Self-ablation also reveals the potential for fine-tuning the behavior of the planner given a designer's needs.
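To make the risk-map-to-trajectory pipeline concrete, here is a minimal sketch of classical value iteration on a grid. It is not the differentiable variant the abstract describes and involves no learning; the hand-written risk grid, the 4-neighbour moves, the unit step cost, and all function names are assumptions for illustration, with risk values assumed non-negative.

```python
def neighbors(r, c, rows, cols):
    """Yield the in-bounds 4-connected neighbours of cell (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def value_iteration(risk, goal, step_cost=1.0, max_iters=1000):
    """Cost-to-go per cell; entering a cell costs step_cost plus its risk."""
    rows, cols = len(risk), len(risk[0])
    value = [[float("inf")] * cols for _ in range(rows)]
    value[goal[0]][goal[1]] = 0.0
    for _ in range(max_iters):
        changed = False
        for r in range(rows):
            for c in range(cols):
                if (r, c) == goal:
                    continue
                best = min(
                    step_cost + risk[nr][nc] + value[nr][nc]
                    for nr, nc in neighbors(r, c, rows, cols)
                )
                if best < value[r][c]:
                    value[r][c] = best
                    changed = True
        if not changed:  # no cell improved, so the values have converged
            break
    return value


def greedy_trajectory(risk, value, start, goal, step_cost=1.0):
    """Follow the lowest-cost neighbour from start until goal is reached."""
    rows, cols = len(risk), len(risk[0])
    path = [start]
    while path[-1] != goal:
        r, c = path[-1]
        path.append(min(
            neighbors(r, c, rows, cols),
            key=lambda n: step_cost + risk[n[0]][n[1]] + value[n[0]][n[1]],
        ))
    return path
```

With a single high-risk cell between start and goal, the greedy rollout detours around it instead of taking the shorter path through it.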
Hyperbole, or exaggeration, is a common linguistic phenomenon. The detection of hyperbole is an important part of understanding human expression. There have been several studies on hyperbole detection, but most of whi...
3D shape representation using mesh data is essential in various applications, such as virtual reality and simulation technologies. Current methods extracting features from mesh edges or faces struggle with complex 3D ...