ISBN (print): 9798400705977
Deep Reinforcement Learning (DRL) is vital in various AI applications. DRL algorithms comprise diverse compute kernels, which may not all be optimized simultaneously on a homogeneous architecture. However, even when heterogeneous architectures are available, optimizing DRL performance remains a challenge due to the complexity of the hardware and programming models employed in modern data centers. To address this, we introduce PEARL, a toolkit for composing parallel DRL systems on heterogeneous platforms consisting of general-purpose processors (CPUs) and accelerators (GPUs, FPGAs). Our innovations include: 1. A general training protocol agnostic of the underlying hardware, enabling portable implementations across various platforms. 2. DRL-specific optimizations for runtime scheduling and resource allocation, facilitating parallelized training and enhancing overall system performance. 3. Automatic optimization of DRL task-to-device assignments through throughput estimation. 4. A high-level API for productive development using the toolkit. We showcase our toolkit through experiments with two widely used DRL algorithms, DQN and DDPG, on two diverse heterogeneous platforms. The generated implementations outperform state-of-the-art libraries for CPU-GPU platforms, achieving up to 2.2x higher throughput and 2.4x higher performance portability across platforms.
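The third point above, choosing task-to-device assignments by estimating throughput, can be illustrated with a toy exhaustive search. This is a minimal sketch, not PEARL's estimator or API: the task names, device list, and per-task timings are hypothetical, and the throughput model simply assumes that tasks on one device run serially while devices run concurrently, so the most heavily loaded device bounds throughput.

```python
from itertools import product

# Hypothetical per-iteration execution times (ms) of each DRL task on each device.
TASK_TIMES = {
    "actor_inference": {"cpu": 4.0, "gpu": 1.0, "fpga": 1.5},
    "replay_sampling": {"cpu": 2.0, "gpu": 3.0, "fpga": 1.0},
    "learner_update": {"cpu": 12.0, "gpu": 2.0, "fpga": 5.0},
}
DEVICES = ["cpu", "gpu", "fpga"]


def estimated_throughput(assignment):
    """Training iterations per second under a simple pipeline model:
    tasks on the same device run serially, devices run concurrently,
    so the most heavily loaded device is the bottleneck."""
    load = {device: 0.0 for device in DEVICES}
    for task, device in assignment.items():
        load[device] += TASK_TIMES[task][device]
    return 1000.0 / max(load.values())


def best_assignment():
    """Exhaustively evaluate every task-to-device mapping and keep the fastest."""
    tasks = list(TASK_TIMES)
    best, best_tput = None, 0.0
    for devices in product(DEVICES, repeat=len(tasks)):
        assignment = dict(zip(tasks, devices))
        tput = estimated_throughput(assignment)
        if tput > best_tput:
            best, best_tput = assignment, tput
    return best, best_tput


assignment, throughput = best_assignment()
print(assignment, f"{throughput:.1f} iterations/s")
```

With these made-up timings, the search places the learner update on the GPU, actor inference on the FPGA, and replay sampling on the CPU, for an estimated 500 iterations/s. Exhaustive search grows as (number of devices)^(number of tasks), so larger task graphs would call for a cheaper search strategy.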
ISBN (print): 9798350366266; 9798350366259
In contemporary urban environments, the federation of IoT-empowered data spaces is gaining ground as a concept; however, a single unifying approach to federation is still elusive. As a result, exchanging data across heterogeneous federated data spaces often encounters challenges, such as differing data models and data exchange protocols, or stringent policies prohibiting data sharing across federations. This paper introduces CCDUIT, a software overlay architecture designed to address these issues and facilitate seamless cross-federation collaboration. As a comprehensive solution, CCDUIT offers modularity, scalability, and interoperability, enabling efficient and sovereignty-preserving data exchange across diverse federations. CCDUIT leverages rich property graph models for context modeling of federations; these models are exchanged across federations via publish/subscribe topic schemes, with data sharing and access control governed by policy mechanisms that match the topics. Our experimental results demonstrate that CCDUIT significantly reduces the complexity and effort involved in data management and sharing across federations, decreasing operational overhead by approximately 40% to 60%. This streamlines collaboration while ensuring compliance with data sovereignty and sharing policies, addressing a longstanding challenge in federation-based data ecosystems.
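The idea of sharing policies that are matched against publish/subscribe topics can be sketched with MQTT-style topic filters. This is an illustration under stated assumptions, not CCDUIT's actual topic scheme or policy language, neither of which the abstract specifies: the wildcard semantics (`+` for one level, `#` for all remaining levels), the topic names, and the federation identifiers are all hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style matching (assumed): '+' matches exactly one level,
    '#' matches all remaining levels and is only valid as the last level."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


# Hypothetical sharing policies: topic filter -> federations allowed to receive data.
POLICIES = [
    ("city-a/mobility/+/traffic", {"city-b"}),
    ("city-a/energy/#", {"city-b", "city-c"}),
]


def may_share(topic: str, requester: str) -> bool:
    """Deny by default; allow only if a policy matching the topic lists the requester."""
    return any(
        topic_matches(topic_filter, topic) and requester in allowed
        for topic_filter, allowed in POLICIES
    )


print(may_share("city-a/mobility/zone1/traffic", "city-b"))
print(may_share("city-a/mobility/zone1/traffic", "city-c"))
```

Deny-by-default is a deliberate choice here: a topic that no policy mentions is never shared, which fits the sovereignty-preserving goal described in the abstract.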
ISBN (print): 9798350300246
Hyperdimensional (HD) computing, when employed for machine learning tasks such as learning and classification, involves computing and comparing large hypervectors within memory. Because hypervectors have dimensionality on the order of thousands, HD computing is difficult to implement on von Neumann systems. The in-memory computing capability of memristor crossbars can speed up the vector-matrix multiplications that HD computing requires. This paper presents a memristive accelerator design for hyperdimensional consumer text analytics. Hyperdimensional computing offers simple encoding and data transformation techniques with higher accuracy than conventional techniques. The paper proposes a circuit implementation of KNN-based HD classification using N-grams. The effect of device variations on the performance of the proposed system is evaluated, and its performance is compared with that of conventional classifiers. The trade-off between accuracy and dimensionality in HD computing is also presented.
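The N-gram encoding and similarity-based classification that the accelerator implements in memristive hardware can be sketched in software with NumPy. This is a minimal sketch of the common bipolar-hypervector formulation, not the paper's circuit or exact encoder: the dimensionality, N-gram size, alphabet, and training texts are assumptions, and it uses one bundled prototype per class with nearest-prototype (cosine) lookup as a simplification of the KNN-based scheme.

```python
import numpy as np

D = 10_000  # hypervector dimensionality (assumed; "on the order of thousands")
N = 3       # N-gram size (assumed)
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
rng = np.random.default_rng(0)
# Item memory: one random bipolar (+1/-1) hypervector per symbol; these are quasi-orthogonal.
ITEM = {ch: rng.choice([-1, 1], size=D) for ch in ALPHABET}


def encode(text: str) -> np.ndarray:
    """Bundle (sum) the hypervectors of all N-grams in the text. Each N-gram binds
    (element-wise multiplies) its symbol vectors, each permuted (rolled) by its
    position so that symbol order is preserved."""
    text = "".join(ch for ch in text.lower() if ch in ITEM)
    acc = np.zeros(D)
    for i in range(len(text) - N + 1):
        gram = np.ones(D)
        for j, ch in enumerate(text[i:i + N]):
            gram *= np.roll(ITEM[ch], N - 1 - j)
        acc += gram
    return acc


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def classify(query: str, prototypes: dict) -> str:
    """Return the label of the most similar class prototype."""
    q = encode(query)
    return max(prototypes, key=lambda label: cosine(q, prototypes[label]))


# Hypothetical one-example-per-class training set.
prototypes = {
    "positive": encode("great product, works well, love it"),
    "negative": encode("terrible product, broke quickly, hate it"),
}
print(classify("love it, works great", prototypes))
```

The dot products inside `cosine` are exactly the vector-matrix operations that a memristor crossbar could evaluate in memory; raising `D` makes unrelated N-grams more nearly orthogonal, which is one side of the accuracy/dimensionality trade-off the abstract mentions.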
ISBN (print): 9789819709885; 9789819709892
Quantum computers promise polynomial or exponential speed-up in solving certain problems compared to classical computers. However, in practical use, there are currently a number of fundamental technical challenges. One of them concerns the loading of data into quantum computers, since they cannot access common databases. In this vision paper, we develop a hybrid data management architecture in which databases can serve as data sources for quantum algorithms. To test the architecture, we perform experiments in which we assign data points stored in a database to clusters. For cluster assignment, a quantum algorithm processes this data by determining the distances between data points and cluster centroids.
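The cluster-assignment workload used to test the architecture can be sketched classically: points are loaded from a database and each is assigned to its nearest centroid. In the paper, the distance computation is performed by a quantum algorithm; this sketch substitutes plain Euclidean distances, and the in-memory SQLite database, the `points` table, its schema, and the sample values are all hypothetical stand-ins for the data source.

```python
import math
import sqlite3


def load_points(conn):
    """Read 2-D points from a hypothetical table points(id, x, y)."""
    return [(x, y) for x, y in conn.execute("SELECT x, y FROM points ORDER BY id")]


def assign_clusters(points, centroids):
    """Assign each point the index of its nearest centroid (Euclidean distance).
    This classical loop stands in for the quantum distance estimation."""
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        labels.append(min(range(len(centroids)), key=dists.__getitem__))
    return labels


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points (x, y) VALUES (?, ?)",
    [(0.1, 0.2), (0.9, 1.1), (5.0, 5.2), (4.8, 5.1)],
)
points = load_points(conn)
labels = assign_clusters(points, [(0.0, 0.0), (5.0, 5.0)])
print(labels)
```

Keeping data retrieval (`load_points`) separate from the distance step mirrors the hybrid idea: the database remains the system of record, and only the assignment step would be handed to a quantum backend.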
ISBN (print): 9798350363999; 9798350364002
Smart devices now have the capability to conduct machine learning tasks. This capability offers the potential to improve learning efficiency and accuracy for certain difficult learning tasks through information sharing among edge devices. In this paper, we discuss the real-time implementation of ensemble reinforcement learning in mobile edge computing environments. We propose a novel system that uses Information-Centric Networking (ICN) to facilitate fast and efficient multicast transmission among edge devices. The system improves the efficiency and accuracy of ensemble reinforcement learning over edge devices through targeted data transmission. To achieve efficient and instant transmission, we propose a new transport layer protocol that acts as middleware between the upper-layer learning data and the lower-layer ICN links and radios. We implement the proposed system on real edge devices and conduct experiments on several training testbeds to evaluate its performance. The experimental results show that the system can achieve up to 28.47% higher reward within the same number of iterations compared to traditional multicast networks, and 7.26 times the reward compared to training without instant information sharing.
Optical injection locking (OIL) has numerous advantages over semiconductor lasers, including an increased frequency response, a wide modulation bandwidth, and a low-frequency chirp. In this paper, we delve into variou...
ISBN (print): 9798400705977
LiDAR plays a critical role in autonomous car perception; hence, the robustness of LiDAR data is imperative. However, malfunctions resulting from contaminants on the sensor cover are unavoidable and can produce erroneous data that gradually degrade performance. Detecting contamination in LiDAR is therefore essential, but it remains an open challenge due to varying contaminant types, properties that change over time, and deployment considerations. Automatic classification of contaminants would enable an automated response (such as cleaning the sensor) to ensure the integrity of the data collected by the LiDAR sensor. To minimize the impact on the overall vehicle perception system, contamination classification has to be performed near the sensor and in a computationally efficient way. To address these challenges, we conducted a feasibility study of an efficient, near-sensor, machine learning-powered contaminant classifier running on the RISC-V architecture. This paper proposes TinyLid, a lightweight 2D CNN trained to classify contaminants using the most comprehensive LiDAR contaminant dataset. The results show that the proposed solution achieves high classification performance while remaining computationally efficient, running on hardware whose power consumption is negligible compared to that of the LiDAR sensor itself. Specifically, implementing the proposed ML model on GAP8, a reference RISC-V architecture, achieves an inference time of 2.575 milliseconds and 6.138 operations/cycle, while using only 6.8% of the 512 KiB L2 memory. These results demonstrate the possibility of increasing the reliability and integrity of LiDAR-collected sensor data without significant computational or energy impact on the broader system.
ISBN (print): 9798400704925
The proceedings contain 27 papers. The topics discussed include: a general-purpose analog computer to population protocol compiler; compile-time optimization of the energy consumption of numerical computations; effective HPC programming via domain-specific abstractions and compilation; high-level synthesis for complex applications: the Bambu approach; relaxed threshold implementations; using a performance model to implement a superscalar CVA6; seeing beyond the order: a LEN5 to sharpen edge microprocessors with dynamic scheduling; integrating SystemC-AMS power modeling with a RISC-V ISS for virtual prototyping of battery-operated embedded devices; a gigabit, DMA-enhanced open-source Ethernet controller for mixed-criticality systems; and model theft attack against a tinyML application running on an ultra-low-power open-source SoC.
ISBN (print): 9798400705977
Heterogeneous systems are the go-to solution in computing, from HPC to mobile, due to their excellent performance and energy efficiency. However, using such systems effectively poses challenges. Namely, each device in the system is often treated as an independent entity that must be managed manually, with work dispatched to it explicitly. This places a significant burden on programmers and often results in a fastest-device-only approach, in which compute-intensive regions are offloaded to the fastest available device while the rest of the system idles. This idling wastes computing capabilities that could be leveraged if the workload were co-executed. Software solutions have been proposed to provide transparent co-execution, but they always trade performance for abstraction and ease of use: in general, a higher level of abstraction, which improves programmability, generates overheads. This paper presents HCoD (Hardware co-execution Dispatcher), a hardware dispatcher design that enables transparent co-execution without these overheads in integrated heterogeneous SoCs. The dispatcher distributes the work associated with a single kernel among CPU cores and GPU compute units at runtime, while monitoring co-execution to balance the load and prevent a slow device from delaying computation. HCoD achieves an excellent balance among all compute elements and improves performance by an average of 14% by transparently leveraging the computing capabilities already available in the hardware.
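The contrast between fastest-device-only execution and co-execution can be illustrated with a small software simulation of dynamic chunk-based self-scheduling, a common way to split one kernel's work items across devices of different speeds. This is not HCoD's hardware policy, which the abstract does not detail; the device names, their throughputs, the work size, and the chunk size are all hypothetical.

```python
import heapq


def co_execute(total_items, chunk, rates):
    """Simulate dynamic self-scheduling of one kernel's work items.

    Whenever a device becomes idle it takes the next chunk of work items,
    so faster devices naturally take more chunks. `rates` maps a device
    name to its (hypothetical) throughput in work items per millisecond.
    Returns (makespan_ms, items_processed_per_device).
    """
    done = {device: 0 for device in rates}
    idle_at = [(0.0, device) for device in rates]  # (time device becomes idle, device)
    heapq.heapify(idle_at)
    remaining, makespan = total_items, 0.0
    while remaining > 0:
        now, device = heapq.heappop(idle_at)  # earliest-idle device grabs the next chunk
        n = min(chunk, remaining)
        remaining -= n
        finish = now + n / rates[device]
        done[device] += n
        makespan = max(makespan, finish)
        heapq.heappush(idle_at, (finish, device))
    return makespan, done


rates = {"gpu": 8.0, "cpu0": 1.0, "cpu1": 1.0}  # hypothetical device speeds
makespan, share = co_execute(10_000, 100, rates)
gpu_only = 10_000 / rates["gpu"]  # fastest-device-only baseline: 1250 ms
print(f"co-execution: {makespan:.1f} ms, GPU only: {gpu_only:.1f} ms, split: {share}")
```

With these assumed rates, the combined throughput is 10 items/ms, so co-execution finishes between 1000 ms and 1100 ms (the extra 100 ms bounds a final chunk landing on a slow CPU core), versus 1250 ms on the GPU alone. Smaller chunks tighten that tail at the cost of more dispatch decisions, which is the kind of overhead a hardware dispatcher aims to make cheap.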
Author:
Bhoi, Sourav Kumar
Department of Computer Science and Engineering, Berhampur, India
In the current world of computing, due to the tremendous rise of smart devices all over the world, it is difficult to get quick service from the limited servers in a heterogeneous environment, as some servers are highly l...