ISBN: (Print) 9781450350877
Searching for a particular device in an ocean of devices is a perfect illustration of the idiom 'searching for a needle in a haystack'. Yet future IoT and edge-computing platforms face an even more challenging problem, because their mission-critical operations (e.g., application orchestration, device and application telemetry, inventory management) depend on their capability to identify nodes of interest from potentially millions of service providers across the globe according to highly dynamic attributes such as geo-location information, bandwidth availability, real-time workload, and so on. For example, a vehicular crowd-sensing application that collects air quality data near a highway exit needs to locate cars in close proximity to the exit among the millions of cars on the road. In a business model where an enterprise offers a framework for clients to avail themselves of such edge/IoT services, we investigate the following problem: "among millions of IoT/edge nodes, how do we locate and communicate with only those nodes that satisfy certain attributes, especially when some of these attributes change rapidly?" In this paper, we address this problem through the design of a scalable message broker based on the following novel intuition: device discovery should be a joint effort between a centrally managed enterprise-level system (high availability, low accuracy) and the fully decentralized edge (high accuracy, unpredictable availability). To elaborate, the enterprise can centrally maintain and manage the attributes of all the IoT devices. However, since millions of devices cannot constantly update their attribute information, central management suffers from attribute staleness. Clearly, the devices themselves have the most up-to-date information; however, it is not feasible for every request to be routed to millions of devices connected by unpredictable networks, where only some of them may possess the correct attributes. In this paper, we propose a message broker that combines these two sources: the centrally managed attribute records and the devices' own up-to-date state.
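As a rough illustration of this joint discovery idea, the sketch below shows a two-phase lookup in Python: a central registry prunes candidates using possibly stale attributes, and only the surviving candidates are asked to confirm against their live state. All names here (CentralRegistry, edge_probe, the staleness budget) are hypothetical and not taken from the paper.

```python
import time

class CentralRegistry:
    """Enterprise-side store: highly available, but attributes may be stale."""
    def __init__(self, staleness_budget_s=60.0):
        self.records = {}  # device_id -> (attributes, last_update_timestamp)
        self.staleness_budget_s = staleness_budget_s

    def update(self, device_id, attributes):
        self.records[device_id] = (attributes, time.time())

    def candidates(self, predicate):
        """Keep devices that match, plus devices too stale to rule out."""
        now = time.time()
        return [device_id
                for device_id, (attrs, ts) in self.records.items()
                if predicate(attrs) or now - ts > self.staleness_budget_s]

def discover(registry, edge_probe, predicate):
    """Two-phase lookup: central pruning, then edge-side confirmation.

    edge_probe(device_id) is a stand-in for contacting the device itself;
    it returns the live attributes, or None if the device is unreachable.
    """
    confirmed = []
    for device_id in registry.candidates(predicate):
        live_attrs = edge_probe(device_id)
        if live_attrs is not None and predicate(live_attrs):
            confirmed.append(device_id)
    return confirmed
```

The point of the two phases is that a request fans out only to the pruned candidate set rather than to every registered device, while the edge-side confirmation removes the false positives that attribute staleness introduces.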
ISBN: (Print) 9781450367332
In the past decade, convolutional neural networks (CNNs) have achieved great practical success in image transformation tasks, including style transfer, semantic segmentation, etc. CNN-based style transfer, which transforms an image into a desired output image according to a user-specified style image, is one of the most popular techniques in image transformation. It has led to many successful industrial applications with significant commercial impact, such as Prisma and DeepArt. Figure 1 shows the general workflow of CNN-based style transfer. Given a content image and a user-specified style image, content features and style features are extracted using a pre-trained CNN and then merged to generate the stylized image. The CNN model is trained to generate a stylized image whose content features are similar to those of the content image and whose style features are similar to those of the style image. In this example, the content image is captured at a lake in the daytime, while the style image is a similar scene captured at dusk. After style transfer, the content image is successfully transformed into the dusky scene while its content is kept unchanged.
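As a concrete, hedged sketch of the feature-matching objective behind this workflow (in the spirit of Gatys et al., which may differ from the exact formulation this paper builds on), the following Python/PyTorch snippet extracts content and style features from a pre-trained VGG-19 and combines them into a single transfer loss; the layer indices are illustrative choices.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

cnn = vgg19(weights="IMAGENET1K_V1").features.eval()
for p in cnn.parameters():
    p.requires_grad_(False)   # the feature extractor stays fixed

CONTENT_LAYERS = {21}              # conv4_2 (illustrative choice)
STYLE_LAYERS = {0, 5, 10, 19, 28}  # conv1_1 .. conv5_1 (illustrative)

def extract(x):
    """Run x through VGG-19, collecting content features and Gram matrices."""
    content, style = {}, {}
    for i, layer in enumerate(cnn):
        x = layer(x)
        if i in CONTENT_LAYERS:
            content[i] = x
        if i in STYLE_LAYERS:
            b, c, h, w = x.shape
            feat = x.reshape(b, c, h * w)
            style[i] = feat @ feat.transpose(1, 2) / (c * h * w)  # Gram matrix
    return content, style

def transfer_loss(stylized, content_img, style_img, style_weight=1e5):
    """Penalize content mismatch with the content image and style mismatch
    with the style image, as the training objective above describes."""
    c_out, s_out = extract(stylized)
    c_ref, _ = extract(content_img)
    _, s_ref = extract(style_img)
    loss = sum(F.mse_loss(c_out[i], c_ref[i]) for i in CONTENT_LAYERS)
    loss += style_weight * sum(F.mse_loss(s_out[i], s_ref[i]) for i in STYLE_LAYERS)
    return loss
```

Minimizing this loss over the stylized image (or training a feed-forward network against it) yields outputs that keep the content image's structure while adopting the style image's appearance.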
ISBN: (Print) 9781665433266
The distributed Cloud computing paradigm is continuously being adopted within the industrial automation domain. The most distinguishing feature of these edge Clouds is their ability to provide low-latency and even hard real-time services. As infrastructure deployments can be rather heterogeneous in nature, service providers require precise means of estimating end-to-end application latency behavior in order to know the performance boundaries that can be met when defining Service Level Agreements (SLAs). Although network performance tools have existed for many years, mechanisms for assessing the hard real-time performance of applications in distributed edge Cloud environments have not yet been considered extensively. Therefore, we use a built-in feature of the Linux kernel, the extended Berkeley Packet Filter (eBPF), to measure delays between targeted endpoints in the kernel stack, which enables deeper and more accurate measurements of events than generalized approaches allow (limited only by the accuracy of eBPF and the kernel's time-stamping facility). As a result, the real-time behavior of particular edge Cloud deployments, including their hosted applications, can be profiled in detail by end users as well as service providers. In our evaluation, we monitored a cyclic transmission of packets with a scheduled delay of under 190 μs and measured a round-trip time under 2 ms. Future work includes profiling the real-time behavior of potentially hosted time-critical applications, such as virtual Programmable Logic Controllers (vPLCs), over real-time networks such as Time Sensitive Networking (TSN); extending the approach towards dynamically configured real-time networks; and, finally, applying it to future organic, self-optimizing Ultra-Reliable Low-Latency Communication 6G core networks.
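To make the measurement idea concrete, here is a hedged BCC (Python) sketch that timestamps a packet at two kernel functions and reports the delay between them. The probe points (dev_queue_xmit / netif_receive_skb), the keying by skb pointer (which only matches on loopback-style paths where the same skb traverses both hooks), and the use of kprobes are assumptions for illustration, not the paper's exact instrumentation; running it requires root, the bcc package, and probe names that may vary by kernel version.

```python
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>
#include <linux/skbuff.h>

BPF_HASH(start, u64, u64);   // skb pointer -> transmit timestamp (ns)

int on_xmit(struct pt_regs *ctx, struct sk_buff *skb) {
    u64 key = (u64)skb;
    u64 ts = bpf_ktime_get_ns();
    start.update(&key, &ts);
    return 0;
}

int on_recv(struct pt_regs *ctx, struct sk_buff *skb) {
    u64 key = (u64)skb;
    u64 *tsp = start.lookup(&key);
    if (tsp) {
        bpf_trace_printk("delay_ns=%llu\n", bpf_ktime_get_ns() - *tsp);
        start.delete(&key);
    }
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="dev_queue_xmit", fn_name="on_xmit")
b.attach_kprobe(event="netif_receive_skb", fn_name="on_recv")
b.trace_print()   # stream the measured in-kernel delays
```

Because the timestamps are taken inside the kernel with bpf_ktime_get_ns(), the measurement avoids the user-space scheduling noise that generalized tools incur, which is the accuracy argument made above.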
ISBN: (Print) 9781665462723
As Point Clouds (PCs) gain popularity for processing millions of data points for 3D rendering in many applications, efficient data compression becomes a critical issue, because compression is the primary bottleneck in minimizing the latency and energy consumption of existing PC pipelines. Data compression becomes even more critical as PC processing is pushed to edge devices with limited compute and power budgets. In this paper, we propose and evaluate two complementary schemes, intra-frame compression and inter-frame compression, to speed up PC compression without losing much quality or compression efficiency. Unlike existing techniques that use sequential algorithms, our first design, intra-frame compression, exploits parallelism to boost the performance of both geometry and attribute compression. The proposed parallelism brings around a 43.7× performance improvement and 96.6% energy savings at the cost of a 1.01× larger compressed data size. To further improve compression efficiency, our second scheme, inter-frame compression, exploits the temporal similarity among video frames and reuses the attribute data from the previous frame for the current frame. We implement our designs on an NVIDIA Jetson AGX Xavier edge GPU board. Experimental results with six videos show that the combined compression schemes provide a 34.0× speedup compared to a state-of-the-art scheme, with minimal impact on quality and compression ratio.
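As a hedged illustration of the inter-frame reuse decision described above, the Python sketch below skips attribute re-compression when the current frame's attributes are close enough to the previous frame's. The similarity metric, threshold, and the compress callable are illustrative assumptions, not the paper's exact test.

```python
import numpy as np

def encode_attributes(attrs, prev_attrs, compress, threshold=2.0):
    """Encode per-point attributes (e.g., an (N, 3) color array).

    compress(attrs) is a stand-in for the intra-frame attribute encoder;
    prev_attrs holds the previous frame's attributes, or None for frame 0.
    """
    if prev_attrs is not None and attrs.shape == prev_attrs.shape:
        mean_err = np.abs(attrs.astype(np.float32)
                          - prev_attrs.astype(np.float32)).mean()
        if mean_err < threshold:
            # Temporal similarity is high: reuse the previous frame's
            # attribute bitstream instead of compressing again.
            return ("reuse_previous", None)
    return ("intra", compress(attrs))
```

The decoder mirrors the decision: a "reuse_previous" token tells it to carry the last frame's decoded attributes forward, so the expensive attribute-coding path runs only when the scene actually changes.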
ISBN: (Print) 9781665462723
Building domain-specific accelerators is becoming increasingly paramount to meeting high-performance requirements under stringent power and real-time constraints. However, emerging application domains like autonomous vehicles are complex systems whose constraints extend beyond the computing stack. Manually selecting and navigating the design space to build custom and efficient domain-specific SoCs (DSSoCs) is tedious and expensive; hence, there is a need for automated DSSoC design methodologies. In this paper, we use agile and autonomous UAVs as a case study to understand how to automate domain-specific SoC design for autonomous vehicles. Architecting a UAV DSSoC requires consideration of parameters such as sensor rate, compute throughput, and other physical characteristics (e.g., payload weight, thrust-to-weight ratio) that affect overall performance. Iterating over several component choices results in a combinatorial explosion in the number of possible combinations: from tens of thousands to billions, depending on implementation details. To navigate the DSSoC design space efficiently, we introduce AutoPilot, a systematic methodology for automatically designing DSSoCs for autonomous UAVs. AutoPilot uses machine learning to navigate the large DSSoC design space and automatically select a combination of autonomy algorithm and hardware accelerator while considering the cross-product effects across different UAV components. AutoPilot consistently outperforms general-purpose hardware selections like Xavier NX and Jetson TX2, as well as dedicated hardware accelerators built for autonomous UAVs. DSSoC designs generated by AutoPilot increase the number of missions on average by up to 2.25×, 1.62×, and 1.43× for nano-, micro-, and mini-UAVs, respectively, over the baselines. Further, we discuss the potential application of the AutoPilot methodology to other related autonomous vehicles.
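To illustrate the kind of ML-guided search this describes (the surrogate model, the encoding, and the loop below are assumptions for illustration, not AutoPilot's actual method), the following Python sketch uses a learned model to decide which (autonomy algorithm, accelerator) pair to hand to an expensive evaluator next:

```python
import itertools
import random
from sklearn.ensemble import RandomForestRegressor

ALGORITHMS   = [0, 1, 2]        # hypothetical encoded autonomy algorithms
ACCELERATORS = [0, 1, 2, 3]     # hypothetical encoded accelerator configs
SPACE = list(itertools.product(ALGORITHMS, ACCELERATORS))

def explore(evaluate, seed_points=4, rounds=6):
    """evaluate(design) -> number of missions; assumed to be an expensive
    simulation capturing compute and physical (payload, thrust) effects."""
    scores = {d: evaluate(d) for d in random.sample(SPACE, seed_points)}
    for _ in range(rounds):
        remaining = [d for d in SPACE if d not in scores]
        if not remaining:
            break
        # Fit a surrogate on the designs evaluated so far, then spend the
        # next expensive evaluation on the design it predicts to be best.
        model = RandomForestRegressor(n_estimators=50)
        model.fit(list(scores), list(scores.values()))
        best = max(remaining, key=lambda d: model.predict([d])[0])
        scores[best] = evaluate(best)
    return max(scores, key=scores.get)   # best design found
```

The surrogate absorbs the cross-product effects between components from the evaluated samples, so the expensive evaluator runs on a handful of designs instead of the full combinatorial space.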