This paper presents a streamlined approach to address the challenges of integrating heterogeneous clusters in deep learning (DL) accelerators. As the demand for scalable and high-performance computing in DL continues t...
In this paper, a novel SoC FPGA-based approach is proposed that accelerates the development of energy control applications for decentralized converter nodes that are controlled by one central unit. In this particular ...
ISBN:
(Print) 9781665409674
This paper explores and evaluates the potential of deep neural network (DNN)-based machine learning algorithms on embedded many-core processors in cyber-physical systems, such as self-driving systems. To run applications in embedded systems, a platform characterized by low power consumption with high accuracy and real-time performance is required. Furthermore, a platform is required that allows the coexistence of DNN applications and other applications, including conventional real-time control software, to enable advanced embedded systems such as self-driving systems. Clustered many-core processors, such as the Kalray MPPA3-80 Coolidge, can run multiple applications on a single platform because each cluster can run applications independently. Moreover, the MPPA3-80 integrates multiple arithmetic elements that operate at low frequencies, thereby enabling high performance and low power consumption comparable to those of embedded graphics processing units. Furthermore, the Kalray Neural Network (KaNN) code generator, a deep learning inference compiler for the MPPA3-80 platform, can efficiently perform DNN inference on the MPPA3-80. This paper evaluates DNN models, including You Only Look Once (YOLO)-based and Single Shot MultiBox Detector (SSD)-based models, on the MPPA3-80. The evaluation examines the frame rate and power consumption in relation to the size of the input image, the computational accuracy, and the number of clusters.
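A minimal timing sketch of the kind of measurement this evaluation reports (frame rate and average power), assuming hypothetical run_inference and read_power_w callables that stand in for the KaNN runtime call and the board's power telemetry, neither of which is described in the abstract:

```python
# Hedged sketch: time repeated inference on one image and sample power per frame.
# run_inference and read_power_w are placeholders for the actual KaNN runtime
# call and the MPPA3-80 board's power sensor, which are not shown here.
import time
import statistics

def benchmark(run_inference, read_power_w, image, n_frames=100):
    """Return (frames_per_second, mean_power_watts) over n_frames inferences."""
    power_samples = []
    start = time.perf_counter()
    for _ in range(n_frames):
        run_inference(image)                  # one forward pass on the accelerator
        power_samples.append(read_power_w())  # sample board power after each frame
    elapsed = time.perf_counter() - start
    return n_frames / elapsed, statistics.mean(power_samples)
```

Repeating this over different input resolutions and cluster counts would yield the frame-rate and power curves the evaluation examines.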
It is generally assumed that elastic parallel applications, with the ability to dynamically resize their process count, would provide numerous benefits to High-Performance Computing (HPC) systems and applications. Sup...
Capturing the fundamental qualities and properties of the local spinach variant entails a thorough investigation. Examining the spinach variant's unique morphology, nutritional makeup, flavor character, an...
To enhance the environmental adaptability of small-sized robots, jumping is commonly employed to achieve high mobility and obstacle-clearing capabilities. However, the prevalent problem of flipping in jumping robots l...
This Special Issue addresses the evolving landscape of big data generated by sensors, devices, and services. The shift from centralized cloud infrastructures to distributed systems that involve cloud, edge, and Internet of Things (IoT) devices requires innovative approaches to managing and analyzing big data. The key challenges include privacy, security, energy efficiency, data quality, and trust. This Special Issue invited researchers to submit innovative solutions covering topics such as: Big Data Analytics and Machine Learning; Integrated, Heterogeneous, and Distributed Infrastructures for Big Data Management; Big Data Platforms and Technologies; Real-Time Big Data Services and Applications; Big Data Security and Privacy Preservation; Big Data Quality and Trust; Trustworthy Data Sharing; Sustainability and Energy Efficiency of Big Data Storage and Computation; Big Data and Analytics for Healthcare; and Big Data Applications and Experiences. This initiative expands on discussions from the IEEE Big Data Service (BDS) 2023 conference held in Athens, Greece, reaching a broader audience of researchers.
ISBN:
(Print) 9798350373981; 9798350373974
Embedding artificial intelligence onto low-power devices is a challenging task that has been partially overcome by recent advances in machine learning and hardware design. Currently, deep neural networks can be deployed on embedded targets to perform various tasks such as speech recognition, object detection, or human activity recognition. However, it is still possible to further optimize deep neural networks on embedded devices. These optimizations mainly concern energy consumption, memory, and real-time constraints, but also easier deployment at the edge. In addition, there is still a need for a better understanding of what can be achieved for different use cases. This work focuses on the quantization and deployment of deep neural networks on low-power 32-bit microcontrollers. In this article, the quantization method is based on solving an integer optimization problem derived from the neural network model, concerning the accuracy of the computations and results at each point of the network. We evaluate the performance of our quantization method on a collection of neural networks, measuring the analysis time and the time-to-solution improvement between the floating-point and fixed-point networks, considering a typical embedded platform employing an STM32 Nucleo-144 microcontroller.
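A generic fixed-point quantization sketch for illustration; it is not the integer-optimization formulation the article describes, only the basic mechanism of choosing a per-tensor number of fractional bits so that values fit a given integer width:

```python
# Hedged illustration of power-of-two fixed-point quantization, not the paper's
# integer-programming method: pick the largest fractional bit count that still
# covers the tensor's dynamic range, then round and clip symmetrically.
import math
import numpy as np

def choose_fractional_bits(values, total_bits=8):
    """Largest fractional bit count keeping max |value| representable (1 sign bit)."""
    max_abs = float(np.max(np.abs(values))) or 1.0
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    return max(0, total_bits - 1 - int_bits)

def quantize(values, frac_bits, total_bits=8):
    """Round to fixed point with frac_bits fractional bits and clip to the int range."""
    scale = 2 ** frac_bits
    q = np.round(np.asarray(values, dtype=np.float64) * scale)
    lim = 2 ** (total_bits - 1)
    return np.clip(q, -lim, lim - 1).astype(np.int32), scale

weights = np.random.randn(64).astype(np.float32)
frac = choose_fractional_bits(weights, total_bits=8)
q_weights, scale = quantize(weights, frac, total_bits=8)
reconstruction_error = np.max(np.abs(weights - q_weights / scale))
```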
ISBN:
(Print) 9798350346855
In recent studies of object detection and tracking, neural networks have been widely used, and their accuracy has improved. However, their computational complexity is very high and requires the use of high-end GPUs. In order to achieve real-time inference on edge devices, it is necessary to reduce the computational complexity of the network by scaling it down, but this leads to a loss of accuracy. To avoid this loss of accuracy, a method has been proposed in which object detection is performed using a neural network at regular intervals, and in the frames in between, the detected object positions are interpolated using motion prediction. In this research, we propose a method that improves the accuracy of interpolation even when the camera is moving, by using an affine transformation as employed in image stabilization. We also show a real-time computation method on the Jetson TX2, one of the lowest-power embedded GPUs. The proposed method enables real-time object detection using YOLOv5s and tracking of the detected objects at the edge.
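A sketch of the interpolation idea under stated assumptions (function names and parameters are illustrative, not taken from the paper): a global affine transform is estimated between consecutive frames with sparse optical flow, as image-stabilization pipelines do, and the last detected box positions are warped by it until the next full detection frame:

```python
# Hedged sketch: estimate camera motion as a partial affine transform and move
# the previously detected boxes accordingly between detection frames.
import cv2
import numpy as np

def shift_boxes_by_camera_motion(prev_gray, curr_gray, boxes):
    """boxes: Nx4 float array of [x, y, w, h]; returns boxes shifted by estimated motion."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)
    if pts is None:
        return boxes
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.flatten() == 1
    if ok.sum() < 3:
        return boxes
    M, _ = cv2.estimateAffinePartial2D(pts[ok], nxt[ok])
    if M is None:
        return boxes
    centers = (boxes[:, :2] + boxes[:, 2:] / 2).reshape(-1, 1, 2).astype(np.float32)
    warped = cv2.transform(centers, M).reshape(-1, 2)     # apply the 2x3 affine
    moved = boxes.astype(np.float32).copy()
    moved[:, :2] = warped - boxes[:, 2:] / 2              # back to top-left corners
    return moved
```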
ISBN:
(Print) 9798350377712; 9798350377705
Networked robotic systems balance compute, power, and latency constraints in applications such as self-driving vehicles, drone swarms, and teleoperated surgery. A core problem in this domain is deciding when to offload a computationally expensive task to the cloud, a remote server, at the cost of communication latency. Task offloading algorithms often rely on precise knowledge of system-specific performance metrics, such as sensor data rates, network bandwidth, and machine learning model latency. While these metrics can be modeled during system design, uncertainties in connection quality, server load, and hardware conditions introduce real-time performance variations, hindering overall performance. We introduce PEERNet, an end-to-end and real-time profiling tool for cloud robotics. PEERNet enables performance monitoring on heterogeneous hardware through targeted yet adaptive profiling of system components such as sensors, networks, deep-learning pipelines, and devices. We showcase PEERNet's capabilities through networked robotics tasks, such as image-based teleoperation of a Franka Emika Panda arm and querying vision language models using an Nvidia Jetson Orin. PEERNet reveals non-intuitive behavior in robotic systems, such as asymmetric network transmission and bimodal language model output. Our evaluation underscores the effectiveness and importance of benchmarking in networked robotics, demonstrating PEERNet's adaptability. Our code is open-source and available at ***/UTAustin-SwarmLab/PEERNet.
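A minimal stage-timing sketch in the spirit of the profiling described here; it is not PEERNet's actual interface, only an illustration of timing the sensing, network, and inference stages of an offloading round trip separately so their latencies can be compared:

```python
# Hedged sketch of per-stage latency profiling for an offloading pipeline.
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)   # stage name -> list of latencies in seconds

@contextmanager
def stage(name):
    """Record the wall-clock duration of one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name].append(time.perf_counter() - start)

# Usage (hypothetical pipeline objects), repeated over many frames:
# with stage("sensor.read"):      frame = camera.read()
# with stage("net.uplink"):       channel.send(frame)
# with stage("model.inference"):  result = model(frame)
# with stage("net.downlink"):     result = channel.receive()
```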