ISBN:
(Print) 9798350384406
Systolic Array (SA) architecture is a hardware accelerator for running Artificial Intelligence (AI) workloads. Although approximate computing offers hardware and performance benefits, it often sacrifices accuracy, limiting its application to error-resilient tasks. Many inexact multipliers in SA exhibit one-sided Error Distribution (ErD), resulting in significant accumulated errors. Approximate computing proves valuable in error-resilient image processing applications, fostering research into various approximation techniques for enhanced hardware performance. This paper explores SA architecture with different configurations of approximate multipliers (AMs) featuring distinct ErDs. It employs a meta-heuristic optimization algorithm, Particle Swarm Optimization (PSO), for image smoothing tasks. The study delves into trade-offs between hardware acceleration and accuracy, offering insights for advancing approximate computing in SA-based AI workloads. The PSO-evolved SA configuration showcases a 3.05% performance improvement, 10.63% silicon footprint reduction, and 10.32% power savings compared to the exact SA. The PSO-derived SA structure also offers notable hardware gains when compared with other state-of-the-art (SOTA) benchmark designs. Additionally, it demonstrates a better Structural Similarity Index (SSIM) compared to SA with one-sided error-distributed AMs. The proposed spatially optimized SAs with AMs of different ErDs represent a step towards reliable hardware design and establishing a nearly exact AI accelerator system. All design files and results are shared openly with the research and design community for easy adoption and further exploration.
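The PSO search over SA configurations described above can be sketched as follows. This is a minimal, illustrative sketch: the particle encoding (one multiplier choice per processing element), the array dimensions `N_PE` and `N_MULT_TYPES`, and the surrogate `fitness` trading an error proxy against a cost proxy are all hypothetical stand-ins for the paper's actual SSIM- and synthesis-driven evaluation.

```python
import random

N_PE = 16          # processing elements in the array (assumed)
N_MULT_TYPES = 4   # candidate approximate-multiplier variants (assumed)

def fitness(config):
    # Placeholder objective: balance an error proxy against a hardware-cost
    # proxy. A real flow would evaluate SSIM and synthesis reports instead.
    error = sum(abs(m - 1.5) for m in config) / len(config)
    cost = sum(config) / (len(config) * (N_MULT_TYPES - 1))
    return error + cost

def pso(iters=50, swarm=20, w=0.7, c1=1.5, c2=1.5):
    # Continuous positions in [0, N_MULT_TYPES - 1], rounded at the end.
    pos = [[random.uniform(0, N_MULT_TYPES - 1) for _ in range(N_PE)]
           for _ in range(swarm)]
    vel = [[0.0] * N_PE for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(swarm), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(N_PE):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0),
                                N_MULT_TYPES - 1)
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    # Round to discrete multiplier indices for the final SA configuration.
    return [round(x) for x in gbest], gbest_f
```

In practice the fitness evaluation dominates the cost of such a search, which is why the paper's spatial configuration problem is well suited to a population-based heuristic like PSO.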
To overcome the high intrusion rate and low encryption depth of traditional cloud computing security methods, this paper proposes a new cloud computing security method based on V...
The Open Architecture for Distributed and Cooperative Media Algorithms (OADCMA) is an open-source framework providing a plug-in platform that lets users easily develop distributed and cooperative media algor...
ISBN:
(Print) 9798350361261; 9798350361278
Fog computing, a transformative computational paradigm, has ushered in new opportunities but also introduced significant security challenges. This paper presents a vital trust-based framework called Task Integrity Checker in Fog computing Systems (TICS) to address these challenges, mainly focusing on mitigating On-Off attacks in the fog computing environment. An On-Off attack is a threat where attackers strategically alternate between malicious and benign behaviors, exploiting temporary errors to evade conventional detection methods. Detecting and thwarting such attacks is imperative for safeguarding the integrity and reliability of fog systems. Our proposed approach tackles this challenge by continuously monitoring and updating trust values for fog nodes based on task results and past performance. These trust values are securely stored in a blockchain, ensuring immutability and enhancing TICS's overall security. Utilizing blockchain technology bolsters the effectiveness of our solution, as it provides a tamper-resistant foundation for trust computation and attack mitigation. We rigorously evaluate the performance of our solution through comprehensive simulation experiments, assessing its effectiveness under various scenarios, including fluctuations in the number of fog nodes and varying attack intensities. The results affirm the efficacy of our approach in detecting and mitigating On-Off attacks, ultimately strengthening the security posture of fog computing systems. This solution is practical and adaptable to diverse fog computing scenarios, making it a valuable addition to security measures in this dynamic and evolving domain.
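A trust update of the kind described above can be sketched with an asymmetric rule: trust rises slowly on verified task results and drops sharply on failures, which penalizes the alternating On-Off pattern. The `reward` and `penalty` constants and the update form are illustrative assumptions, not the paper's exact rule, and the blockchain storage step is omitted.

```python
# Illustrative asymmetric trust update for a fog node (assumed parameters):
# slow gain on success, sharp loss on failure, clamped to [0, 1].
def update_trust(trust, task_ok, reward=0.05, penalty=0.3):
    if task_ok:
        return min(1.0, trust + reward * (1.0 - trust))
    return max(0.0, trust - penalty)

trust = 0.5
history = [True, True, False, True, False, False, True]  # On-Off-like behavior
for ok in history:
    trust = update_trust(trust, ok)
```

Because a single failure erases many successes, a node that alternates behaviors cannot hold its trust near the starting value, which is the property an On-Off attacker tries to exploit in symmetric schemes.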
ISBN:
(Print) 9798350361360; 9798350361353
The Industrial Internet of Things (IIoT) has revolutionized industrial sectors with enhanced connectivity, data exchange, and predictive maintenance. However, it faces various challenges, ranging from non-IID data distributions and communication overheads to the consistency and privacy of prediction models for maintenance. Federated Learning (FL) has been considered a prevalent technique to address privacy concerns. On the other hand, Edge computing (EC) is being increasingly introduced to ensure low-latency data processing in IIoT systems, especially those with time-critical requirements, e.g., industrial robotics and motion control systems. Moreover, the complexity and design dimensions of today's IIoT systems have led to the development of large Machine Learning (ML) models with millions of parameters, e.g., using computer vision for field management. This introduces computational and privacy challenges in IIoT scenarios. Innovative FL optimization approaches such as pruning aim to tackle these challenges by reducing the number of training parameters, although they may impact accuracy. In this paper, we propose a new technique, called Adaptive Mean aBsolute devIaTion (AMBIT), an innovative pruning approach that optimizes data transmission without compromising model accuracy or inducing additional computation overhead. By dynamically comparing the difference between the current and the previous weight values, AMBIT adapts better to parameter fluctuations at different stages, thereby accurately locating those parameters that have less impact on convergence. AMBIT's generality and efficiency are illustrated using the MNIST and CIFAR-10 datasets, outperforming traditional Magnitude pruning in FL. For MNIST, AMBIT reduces data uploads by up to 43.75%, with an increase of 0.62% in accuracy. For CIFAR-10, AMBIT achieves a 63.44% decrease in data uploads with a 5.22% drop in accuracy.
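A filter in the spirit of AMBIT can be sketched as follows: a client computes the round-over-round change of each parameter and uploads only those whose change stands out from the mean absolute deviation of all changes. The function name `select_uploads`, the threshold form `mean + k * MAD`, and the scaling factor `k` are illustrative assumptions, not the paper's exact criterion.

```python
# Illustrative AMBIT-style upload selection (assumed threshold rule):
# upload only parameters whose round-over-round change deviates from the
# mean change by more than k mean-absolute-deviations.
def select_uploads(prev, curr, k=1.0):
    deltas = [abs(c - p) for p, c in zip(prev, curr)]
    mean_d = sum(deltas) / len(deltas)
    mad = sum(abs(d - mean_d) for d in deltas) / len(deltas)
    return [i for i, d in enumerate(deltas) if d > mean_d + k * mad]
```

Because both the mean and the deviation are recomputed each round, the threshold adapts as training progresses, which matches the abstract's point about tracking parameter fluctuations at different stages.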
A hierarchical approximate dynamic programming (ADP) strategy is presented to determine the intra-day operations of a distributed energy storage cluster for demand management and frequency response service. According to the...
ISBN:
(Print) 9798350383638; 9798350383645
Analog in-memory computing (IMC) has emerged as a promising method to accelerate deep neural networks (DNNs) on hardware efficiently. Yet, analog computation typically focuses on the multiply-and-accumulate operation, while other operations are still computed digitally. Hence, these mixed-signal IMC cores require extensive use of data converters, which can take a third of the total energy and area consumption. Alternatively, all-analog DNN computation is possible but requires increasingly challenging analog storage solutions, due to noise and leakage of advanced technologies. To enable all-analog DNN acceleration, this work demonstrates a feasible IMC architecture using an efficient analog main memory (AMM) cell. The proposed AMM cell is 42x and 5x more power and area efficient, respectively, than a baseline analog storage cell. An all-analog architecture using this cell achieves potential efficiency gains of 15x compared with a mixed-signal IMC core using data converters.
ISBN:
(Print) 9798350363029; 9798350363012
In this paper, we revisit a partition-based distributed extended Kalman filter (DEKF) method proposed in [1] for continuous-time nonlinear systems. Our objective is to offer a comprehensive perspective on the development of this DEKF method, elucidating its relationship with partition-based distributed full-information estimation within a discrete-time linear framework. Specifically, we present a partition-based distributed full-information estimation formulation for discrete-time linear systems. We derive an analytical solution for this full-information estimation problem which is in the form of a partition-based distributed Kalman filter (DKF). The DKF approach is extended to address nonlinear systems through successive linearization of nonlinear subsystem models, and a discrete-time distributed extended Kalman filter (DEKF) approach is derived. We compare the derived discrete-time DEKF with the continuous-time DEKF approach in [1] to reveal the connection between the DEKF approach in [1] and the distributed full-information estimation in a discrete-time context. A simulated process is utilized to verify the effectiveness and assess the performance of the distributed extended Kalman filter.
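The building block underlying the derivation above is the discrete-time Kalman filter recursion applied per subsystem. A scalar, single-subsystem sketch is shown below; it is a standard KF step, not the partition-based DKF itself, which additionally exchanges coupling and linearization information between subsystems. All model constants (`A`, `C`, `Q`, `R`) are illustrative.

```python
# Scalar discrete-time Kalman filter step for one subsystem (illustrative).
# Model: x_{k+1} = A x_k + w_k,  z_k = C x_k + v_k,  w ~ (0, Q), v ~ (0, R).
def kf_step(x, P, z, A=1.0, C=1.0, Q=0.01, R=0.1):
    # Predict with the (possibly linearized) subsystem model.
    x_pred = A * x
    P_pred = A * P * A + Q
    # Update with the local measurement z.
    K = P_pred * C / (C * P_pred * C + R)
    x_new = x_pred + K * (z - C * x_pred)
    P_new = (1.0 - K * C) * P_pred
    return x_new, P_new
```

In the extended-Kalman setting of the paper, `A` and `C` would be re-derived each step by linearizing the nonlinear subsystem model about the current estimate.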
Bulk synchronous programming (in distributed-memory systems) and the fork-join pattern (in shared-memory systems) are often used for problems where independent processes must periodically synchronize. Frequent synchro...
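The fork-join pattern mentioned above can be sketched in a few lines: independent chunks of work are forked onto worker threads, then the program joins at a synchronization point before the next phase begins. The phase function and data layout here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# Minimal fork-join sketch (illustrative): one parallel phase followed by a
# synchronization (join) before results are combined.
def phase(chunk):
    return sum(x * x for x in chunk)

data = [list(range(i, i + 4)) for i in range(0, 16, 4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(phase, data))   # fork: run chunks concurrently
total = sum(partials)                        # join: synchronize and combine
```

The join is the synchronization point the abstract refers to: no thread proceeds to the reduction until every forked chunk has finished, so frequent phases mean frequent barriers.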
ISBN:
(Print) 9798350311990
Heterogeneity has been an indispensable aspect of distributed computing throughout the history of these systems. In particular, with the increasing popularity of accelerator technologies (e.g., GPUs and TPUs) and the emergence of domain-specific computing via ASICs and FPGAs, the matter of heterogeneity and understanding its ramifications on system performance has become more critical than ever before. However, it is challenging to effectively educate students about the potential impacts of heterogeneity on: (a) the performance of distributed systems; and (b) the logic of resource allocation methods to efficiently utilize the resources. Making use of real infrastructure (such as that offered by public cloud providers) for benchmarking the performance of heterogeneous machines, for different applications, with respect to different objectives, and under various workload intensities is cost- and time-prohibitive. Moreover, not all students (globally and nationally) have access to or can afford such real infrastructure. To reinforce the quality of learning about various dimensions of heterogeneity, and to decrease the widening gap in education, we develop an open-source simulation tool, called E2C, that can help students, researchers, and practitioners study any type of heterogeneous (or homogeneous) computing system and measure its performance under various system configurations. To make the learning curve shallow, E2C is equipped with an intuitive graphical user interface (GUI) that enables its users to easily examine system-level solutions (scheduling, load balancing, scalability, etc.) in a controlled environment within a short time and at no cost. In particular, E2C is a discrete event simulator that offers the following features: (i) simulating a heterogeneous computing system; (ii) implementing a newly developed scheduling method and plugging it into the system; (iii) measuring energy consumption and other output-related metrics; and (iv) powerful visual a...
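The core of any discrete event simulator like E2C is a priority queue of timestamped events processed in time order. The sketch below is a toy illustration of that mechanism only; the class and method names are assumptions and do not reflect E2C's actual API.

```python
import heapq

# Toy discrete-event engine (illustrative; not E2C's actual API):
# events are (time, seq, name) tuples popped in timestamp order, with
# seq as a tie-breaker for events scheduled at the same time.
class Simulator:
    def __init__(self):
        self.queue = []
        self.now = 0.0
        self._seq = 0
        self.log = []

    def schedule(self, delay, name):
        heapq.heappush(self.queue, (self.now + delay, self._seq, name))
        self._seq += 1

    def run(self):
        while self.queue:
            self.now, _, name = heapq.heappop(self.queue)
            self.log.append((self.now, name))

sim = Simulator()
sim.schedule(2.0, "task_arrival")
sim.schedule(1.0, "machine_ready")
sim.run()
```

A full simulator would attach handlers to events (e.g., a scheduler invoked on each task arrival) and accumulate metrics such as energy per machine, but the clock-advancing event loop is the same.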