ISBN: 9781450371025 (print)
This paper explores new opportunities afforded by the growing deployment of compute and I/O accelerators to improve the performance and efficiency of hardware-accelerated computing services in data centers. We propose Lynx, an accelerator-centric network server architecture that offloads the server data and control planes to the SmartNIC, and enables direct networking from accelerators via a lightweight hardware-friendly I/O mechanism. Lynx enables the design of hardware-accelerated network servers that run without CPU involvement, freeing CPU cores and improving performance isolation for accelerated services. It is portable across accelerator architectures and allows the management of both local and remote accelerators, seamlessly scaling beyond a single physical machine. We implement and evaluate Lynx on GPUs and the Intel Visual Compute Accelerator, as well as two SmartNIC architectures - one with an FPGA, and another with an 8-core ARM processor. Compared to a traditional host-centric approach, Lynx achieves over 4x higher throughput for a GPU-centric face verification server, where it is used for GPU communications with an external database, and 25% higher throughput for a GPU-accelerated neural network inference service. For this workload, we show that a single SmartNIC may drive 4 local and 8 remote GPUs while achieving linear performance scaling without using the host CPU.
ISBN: 9781728141497 (print)
An open question in autonomous driving is how best to use simulation to validate the safety of autonomous vehicles. Existing techniques rely on simulated rollouts, which can be inefficient for finding rare failure events, while other techniques are designed to only discover a single failure. In this work, we present a new safety validation approach that attempts to estimate the distribution over failures of an autonomous policy using approximate dynamic programming. Knowledge of this distribution allows for the efficient discovery of many failure examples. To address the problem of scalability, we decompose complex driving scenarios into subproblems consisting of only the ego vehicle and one other vehicle. These subproblems can be solved with approximate dynamic programming and their solutions are recombined to approximate the solution to the full scenario. We apply our approach to a simple two-vehicle scenario to demonstrate the technique as well as a more complex five-vehicle scenario to demonstrate scalability. In both experiments, we observed an increase in the number of failures discovered compared to baseline approaches.
ISBN: 9781728161365 (print)
Predicting the impact of opening residential roads is significant not only for transportation planning and economic development, but also for society more broadly. To address differences in road and community structure, this paper considers road length, speed limit, traffic volume, traffic lights, and other factors in order to improve traffic capacity. It then uses nonlinear programming, based on the BPR road impedance model, to obtain the optimal solution. Finally, the improvement gained from opening residential roads is compared across different residential road structures and levels of traffic pressure to find out which road structure is optimal for opening.
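For orientation, the BPR (Bureau of Public Roads) impedance function is commonly written as t = t0 * (1 + alpha * (v / c) ** beta), where t0 is the free-flow travel time, v the traffic volume, and c the road capacity. The sketch below evaluates that textbook form only; the function name and the default parameters alpha = 0.15 and beta = 4 are the conventional values and are assumptions here, since the abstract does not state which parameters or objective the paper uses.

```python
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Travel time on a road link under the standard BPR impedance function.

    alpha and beta default to the conventional textbook values; they are
    assumptions, not parameters taken from the paper.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if volume < 0:
        raise ValueError("volume must be non-negative")
    # Congestion grows with the volume-to-capacity ratio raised to beta.
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)


if __name__ == "__main__":
    # Hypothetical link: 10 minutes free-flow, loaded at 0%, 50%, 100% of capacity.
    for volume in (0, 500, 1000):
        print(volume, bpr_travel_time(10.0, volume, 1000.0))
```

A nonlinear program of the kind the abstract mentions would typically minimize a sum of such link travel times over the flow assignment; that formulation is not reproduced here.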
ISBN: 9781728193601 (print)
In today's world, a vast amount of data is being generated by edge devices that can be used as valuable training data to improve the performance of machine learning algorithms in terms of the achieved accuracy or to reduce the compute requirements of the model. However, due to user data privacy concerns as well as storage and communication bandwidth limitations, this data cannot be moved from the device to the data centre for further improvement of the model and subsequent deployment. As such, there is a need for increased edge intelligence, where the deployed models can be fine-tuned on the edge, leading to improved accuracy and/or reducing the model's workload as well as its memory and power footprint. In the case of Convolutional Neural Networks (CNNs), both the weights of the network and its topology can be tuned to adapt to the data that it processes. This paper provides a first step towards enabling CNN fine-tuning on an edge device based on structured pruning. It explores the performance gains and costs of doing so and presents an extensible open-source framework that allows the deployment of such approaches on a wide range of network architectures and devices. The results show that, on average, data-aware pruning with retraining can provide 10.2pp increased accuracy over a wide range of subsets, networks and pruning levels, with a maximum improvement of 42.0pp, over pruning and retraining in a manner agnostic to the data being processed by the network.
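As a rough illustration of what structured pruning means, the sketch below removes whole convolutional filters rather than individual weights. It ranks filters by L1 norm, which is a common data-agnostic criterion and is an assumption here: the abstract does not say which ranking criterion the paper's data-aware method uses, and the function name and tensor layout (out_channels, in_channels, kernel_h, kernel_w) are likewise hypothetical.

```python
import numpy as np


def prune_filters_l1(weights, prune_fraction):
    """Drop the lowest-L1-norm filters of a conv layer (structured pruning).

    weights: array of shape (out_channels, in_channels, kernel_h, kernel_w).
    Returns the pruned weight tensor and the indices of the filters kept.
    """
    if not 0.0 <= prune_fraction < 1.0:
        raise ValueError("prune_fraction must be in [0, 1)")
    out_channels = weights.shape[0]
    n_prune = int(out_channels * prune_fraction)
    # One score per filter: the sum of absolute weights in that filter.
    scores = np.abs(weights).reshape(out_channels, -1).sum(axis=1)
    order = np.argsort(scores, kind="stable")  # ascending: weakest first
    keep = np.sort(order[n_prune:])            # preserve original filter order
    return weights[keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = rng.normal(size=(8, 3, 3, 3))
    pruned, kept = prune_filters_l1(layer, 0.25)
    print(pruned.shape, kept)
```

In a full network, removing a filter also removes the matching input channel of the next layer, and the pruned model is then retrained; both steps are omitted from this sketch.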
ISBN: 9781728161150 (print)
The ultraviolet-induced degradation (UVID) of solar panels is associated with the deterioration of cell performance and reduced reliability of packaging materials. Here we examine the UV stability of different architectures of high-efficiency solar cells without any encapsulation. Identical UV exposure tests were performed at two different labs using UVA-340 fluorescent lamps under different electrical bias configurations (open- or short-circuit) and irradiated cell surfaces (for bifacial technologies). Cell technologies including heterojunction (HJ), interdigitated back-contact (IBC), passivated emitter rear contact (PERC), and passivated emitter rear totally-diffused (PERT) cells are found to be more susceptible to UVID, leading to significant I_sc loss (up to 4%) and P_max loss (up to 15%), as compared to the conventional back surface field (Al-BSF) cells after exposure to UV irradiation of 8.92 MJ m^-2 nm^-1 at 340 nm. Additionally, the bifacial cells, when irradiated from the backside, exhibited greater photocurrent loss compared to front-side exposure, indicating potential sensitivity of rear surface passivation to UV radiation.
Hybrid deep neural-symbolic architecture for event-detection employs a deep neural network at the back-end to perform low-level reasoning and a symbolic logical module to perform high-level cognitive reasoning. The cu...
ISBN: 9781728162157 (print)
Over the past few years, genome research using machine learning and deep learning techniques has become increasingly popular, and researchers are being provided with sophisticated data analysis tools. Recognition of patterns of DNA secondary structures and genomic functional elements is still poorly investigated, despite the fact that research in this area has the potential to contribute greatly to the development of medicine and pharmacology. This study aims to explore machine and deep learning methods that have proven successful in natural language processing with respect to the task of DNA sequence recognition. Two deep learning models based on CNN and LSTM architectures were developed. Each model was tested on multiple classification tasks for recognition of DNA sequences containing quadruplexes with the potential function of nucleosome barriers. Additionally, model interpretation analysis was performed by extracting significant CNN filters and transforming them into DNA motifs.
ISBN: 9781728147017 (print)
This paper presents an open-source wireless Smart Power Outlet (SPO) system to improve the control and monitoring of household appliances in IoT-based smart homes. The system supports not only the most typical SPO functionalities, such as remote on/off control or real-time monitoring of the current consumption, but also other important functionalities, like programming of the power supply time schedule, automatic interruption of vampire currents, or prevention of electrocutions. Moreover, the devised system architecture allows multiple SPOs, even with different capabilities, to be connected in an IEEE 802.15.4 Sub-1 GHz Personal Area Network (PAN). This network is managed by a Controller that also acts as a border router, providing remote access to the SPOs. A proof-of-concept prototype of the system has been implemented and tested. The considered SPO and Controller were based on SimpleLink Sub-1 GHz CC1310 wireless microcontrollers running the Contiki OS. The obtained results demonstrate the basic functionalities offered by the proposed open-source SPO and suggest that it can be used to make the next generation of homes smarter at lower costs.
ISBN: 9781665412919 (print)
Due to the latest advances in machine learning algorithms, new deep learning-based approaches to the interpretation of 12-lead electrocardiograms have been developed, demonstrating diagnostic quality comparable to that of experts. In this paper, we propose several techniques that increase the quality of ECG classification by a deep neural network. The techniques include patient metadata incorporation, signal denoising and self-adaptive model training. The experimental validation of the approaches was carried out on a novel dataset containing 64198 standard ECG recordings obtained during routine medical practice. The conducted experiments demonstrated a statistically significant improvement in quality over the baseline, supporting the further application of our findings.
ISBN: 9781728186955 (print)
Real-time status updating plays a pivotal role in automated control, situational awareness, and networked monitoring. However, how to balance data freshness against the distortion caused by lossy compression remains an open question. In this paper, we are interested in the optimal cross-layer policy that minimizes the Age-of-Information (AoI) and distortion simultaneously over an ON/OFF channel. Specifically, we formulate a hierarchical framework to jointly schedule the lossy compression in the application layer and the packet scheduling in the physical layer. In the application layer, we characterize the compression loss using age- or forgetting-factor-weighted distortion. The optimal tradeoff between age reduction and compression loss is then revealed via convex optimization. It allows us to further determine how many packets to send in the physical layer based on a probabilistic scheduling policy. To this end, a Constrained Markov Decision Process (CMDP) problem is formulated and solved by Linear Programming (LP), which gives the optimal tradeoff between data freshness and distortion, as well as the optimal cross-layer strategy to strike the AoI-distortion tradeoff.
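To make the Age-of-Information metric concrete, the sketch below tracks the age of the receiver's latest update over a slotted ON/OFF channel. It is a deliberately simplified model and an assumption throughout: a fresh update is sent in every slot, a delivery in an ON slot resets the age to 1, and an OFF slot lets the age grow by 1. Compression, distortion, and the paper's CMDP scheduling policy are not modeled, and the function names are hypothetical.

```python
def age_trajectory(channel_on, initial_age=1):
    """Age of the receiver's latest update at the end of each slot.

    channel_on: iterable of booleans, True when the channel is ON in that slot.
    Assumes a fresh update is transmitted in every slot (simplified model).
    """
    age = initial_age
    ages = []
    for is_on in channel_on:
        if is_on:
            age = 1    # fresh update delivered: age resets to one slot
        else:
            age += 1   # nothing delivered: the stored update gets older
        ages.append(age)
    return ages


def average_age(channel_on, initial_age=1):
    """Time-average age over the given sequence of channel states."""
    ages = age_trajectory(channel_on, initial_age)
    if not ages:
        raise ValueError("channel_on must contain at least one slot")
    return sum(ages) / len(ages)


if __name__ == "__main__":
    states = [True, False, False, True, False]
    print(age_trajectory(states), average_age(states))
```

The sawtooth shape of the trajectory is what an AoI-optimal policy tries to flatten; trading it against distortion is the part handled by the paper's cross-layer formulation.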