ISBN: 9781665419130 (print)
Machine learning and AI-based automated systems are gaining increasing attention for real-time intelligent applications by virtue of superior coordination between the software and the hardware within these systems. Although the majority of automated systems implement convolutional neural networks (CNNs) and deep neural networks (DNNs) in hardware with impressive accuracy, data movement in these platforms carries a significant cost. Recent advancements in processing-in-memory (PIM), a non-von Neumann computing paradigm, have proven very effective in minimizing data communication overheads by performing computations within the memory chip. However, these devices are primarily designed as inference engines and therefore have not been adequately investigated for real-time learning in changing environments. In this work, we introduce uPIM, a PIM architecture that supports a Generative Adversarial Network (GAN)-based, performance-aware online learning model for updating the weights with minimal overhead. Our hardware-software co-design approach exhibits superior performance and efficiency in real-time applications such as Autonomous Navigation Systems (ANS) by leveraging massive data-level parallelism and ultra-low data movement latency. The evaluations are performed on multiple state-of-the-art deep learning networks (LeNet, AlexNet, ResNet18, ResNet34, and ResNet50) on the German Traffic Sign Recognition Benchmark (GTSRB) dataset and the Belgium Traffic Sign Dataset (BTSD) at several data precisions. The proposed performance-aware, quantization-friendly, online-learning-based PIM architecture achieves an average accuracy of 72% on GTSRB and 83.4% on BTSD under varying environments for CNNs implemented for Traffic Sign Recognition (TSR) with 8-bit fixed-point data precision.
Medical imaging applications produce large sets of similar images. The huge amount of data makes the manual analysis and interpretation a fastidious task. Medical image segmentation is thus an important process in ima...
High performance computing applications have witnessed significant speedups in performance from leveraging the compute capabilities of heterogeneous platforms. These speedups are seldom taken advantage of in practice ...
New fast moving markets demand flexible, adaptable production facilities to allow manufacturing companies a fast reaction on consumer needs. Reconfigurable manufacturing systems provide a solution for these problems. ...
ISBN: 9781424417513 (print)
The next-generation wireless network environment consists of heterogeneous wireless technologies offering mobility services. The objective is to exploit the popularity and the high data rates offered by wireless networks such as WiFi (Wireless Fidelity) or WiMAX (Worldwide Interoperability for Microwave Access) to enhance cellular services, i.e., UMTS (Universal Mobile Telecommunication System). In this context, a mobile terminal has to make several handoff decisions while moving between these technologies. In this paper, we propose using the emerging IEEE 802.21 standard, which defines Media Independent Handover (MIH) functions, as a transport service in order to offer a vertical handoff decision with minimal processing delay.
ISBN: 9780769529097 (print)
This paper presents a new kind of query, the global nearest neighbors query, which is based on a special distance, the global distance, between two objects during a given time interval. Based on its relationship with the continuous nearest neighbor query, a naive algorithm is proposed. Depending on the mobility of the data set, the global distance is refined for each situation, and some heuristics are presented for data sets indexed by data structures of the R-tree family. Based on the branch-and-bound technique and the proposed pruning, updating, and visiting heuristics, recursive depth-first and heap-based best-first query processing algorithms are developed for both cases. An extensive experimental study on synthetic data sets shows that the best-first algorithms outperform the depth-first algorithms.
ISBN: 9781479984909 (print)
Numerical computation on GPUs has become easily accessible and offers good computational power at relatively little cost. Recently, an application of the Newton-Raphson method for analyzing power flow in multi-terminal high-voltage direct current (HVDC) networks was proposed and shown to give good results on five-terminal grids. Since this method involves costly matrix operations, especially inversion, increasing the number of terminals in the grid yields prohibitively large execution times in sequential operation. To address this issue, we adjust the algorithm so that it benefits from parallel computation and test our approach on a recent GPU from NVIDIA. We give experimental results for grids of up to a few thousand terminals and show that the execution time is still acceptable for real applications. We also provide some benchmarks of the GPU computation compared with other platforms.
ISBN: 1860948278 (print)
Loop networks with multiple hops offer smaller diameters, shorter path lengths, and better fault tolerance. These networks, also known as multi-connected distributed loop (MCDL) networks in the literature, have extensive uses in LAN, parallel processing, and multiprocessing environments. Results exist for finding the shortest path between a single pair of nodes in O(δ) time, where δ is the diameter of the distributed loop [4], and in O(h/g + log h) time, where g is GCD(N, h), N is the number of nodes, and h is the hop size, without any knowledge of the diameter [1]. In this paper, we compare these algorithms based on their performance at various network loads.
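As a point of reference for the quantities named in this abstract, the following is a minimal Python sketch that computes shortest-path lengths and the diameter of a loop network by plain breadth-first search. It is not either of the cited algorithms [1], [4]; it runs in O(N) time and serves only as a baseline. The topology is an assumption (an undirected loop in which node i links to i±1 and i±h modulo N), and the function names are hypothetical.

```python
from collections import deque


def loop_distances(n, h, src=0):
    """Hop counts from src to every node, found by breadth-first search.

    Assumed topology (not stated in the abstract): an undirected loop of
    n nodes where node i is linked to i+1, i-1, i+h and i-h (mod n).
    """
    dist = [-1] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for step in (1, -1, h, -h):
            v = (u + step) % n
            if dist[v] == -1:  # first visit gives the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path_length(n, h, s, t):
    """Shortest hop count from s to t.

    The assumed network looks the same from every node, so the distance
    depends only on the offset (t - s) mod n.
    """
    return loop_distances(n, h)[(t - s) % n]


def diameter(n, h):
    """Largest shortest-path length between any pair of nodes."""
    return max(loop_distances(n, h))
```

For example, `diameter(8, 3)` evaluates the diameter of an 8-node loop with hop size 3 under the assumed topology.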
ISBN: 1932415068 (print)
Embedded systems are often characterized by limited memory, while many applications on these systems are memory-intensive. Reducing the overhead of data movement between global memory and distributed local memory in such a system is critical to the performance of these applications. In this paper, we propose a unified theoretical framework for automatically partitioning parallel loops to optimize data movement on such systems. We first introduce the notion of data movement and build a simple but accurate data movement model to estimate the overhead of the data movement for the footprint. We then present an algorithm that derives an optimal loop partitioning to minimize the amount of data movement across the loop nests. We have implemented the framework in a parallel compiler on VE16, a commercial embedded system with limited memory, and the experimental results demonstrate the efficiency of the proposed method.
ISBN: 9781605586656 (print)
Event stream processing (ESP) has become increasingly important in modern applications, ranging from supply chain management to real-time intrusion detection. Existing ESP engines have focused on detecting temporal patterns from instantaneous events, that is, events with no duration. Under such a model, an event instance can only happen "before", "after", or "at the same time as" another event instance. However, such sequential patterns are inadequate to express the complex temporal relationships in domains such as medicine, finance, and meteorology, where the events' durations could play an important role.
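To illustrate why durations enlarge the set of temporal relationships, here is a minimal Python sketch that classifies two interval events using the thirteen relations of Allen's interval algebra. The abstract does not name this formalism or describe the authors' engine, so the choice of Allen's relations, the `Event` type, and the `relation` function are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical interval event; assumes start < end."""
    name: str
    start: float
    end: float


def relation(a, b):
    """Return the Allen interval relation of event a with respect to b."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    if a.start < b.start < a.end < b.end:
        return "overlaps"
    # Only the mirrored cases remain: classify (b, a) and invert the name.
    inverse = {"before": "after", "meets": "met-by", "overlaps": "overlapped-by"}
    return inverse[relation(b, a)]
```

With instantaneous events only "before", "after", and "equals" can occur; with durations, a call such as `relation(Event("fever", 0, 5), Event("rash", 3, 8))` distinguishes partial overlap from containment or adjacency.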