ISBN: 9798400718021 (print)
Deep Learning Neural Networks (DLNN) require an immense amount of computation, especially in the training phase when multiple layers of intermediate neurons need to be built. The situation is even more dramatic today with the proliferation of applications with intelligence at the edge, not just in the cloud. Therefore, to meet the new requirements of edge computing, it is imperative to accelerate the execution phases of neural networks as much as possible. In this paper, we focus on the algorithm known as Particle Swarm Optimization (PSO). It is a bio-inspired, stochastic optimization approach whose goal is to iteratively improve the solution to a given (usually complex) problem by attempting to approximate a given objective. Using PSO in an edge computing environment has the potential to allow the DLNN to be trained there, without the need to transfer resource-intensive tasks to the cloud. However, implementing an efficient PSO is not straightforward, owing to the complexity of the computations performed on the swarm of particles and the iterative execution until a near-target solution with minimal error is achieved. In the present work, two parallelizations of the PSO algorithm have been implemented, both designed for a distributed execution environment (Apache Spark). The first PSO parallelization follows a synchronous scheme; i.e., the best global position found by the particles is globally updated before the execution of the next iteration of the algorithm. This implementation proved to be more efficient for medium-sized datasets (< 40,000 data points). In contrast, the second implementation is an asynchronous parallel variant of the PSO algorithm, which showed lower execution time for large datasets (> 170,000 data points) compared to the first one. Additionally, it exhibits better scalability and elasticity with respect to increasing dataset size. Both variants of the PSO have been implemented to distribute the computational load (particle fit
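To make the synchronous scheme described in this abstract concrete, the following is a minimal single-process sketch in Python. It is not the paper's Spark implementation: the function name `pso_sync`, the inertia and acceleration coefficients (`w`, `c1`, `c2`), the search bounds, and the `sphere` test objective are all assumptions chosen for illustration. The only property it is meant to show is that the global best is refreshed once per iteration, after every particle has been evaluated.

```python
import random


def pso_sync(objective, dim, n_particles=20, iters=100, bounds=(-5.0, 5.0),
             w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with a synchronous PSO (illustrative sketch).

    All parameter defaults are assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best fitness values
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(iters):
        # Per-particle work. In a distributed setting this loop is the part
        # that would be spread over workers; here it runs sequentially.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
        # Synchronisation point: the global best changes only here, so every
        # particle in the next iteration sees the same global best.
        g = min(range(n_particles), key=best_val.__getitem__)
        if best_val[g] < gbest_val:
            gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    return gbest_pos, gbest_val


def sphere(x):
    """Toy objective (assumed for the example): sum of squares."""
    return sum(v * v for v in x)
```

Calling `pso_sync(sphere, dim=2)` returns the best position found and its fitness. An asynchronous variant would instead let each particle read and update the global best as soon as its own evaluation finishes, without waiting at the synchronisation point.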
ISBN: 9781665494236 (print)
We present the first GPU-based parallel algorithm to efficiently update vertex coloring on large dynamic networks. For a single GPU, we introduce the concept of loosely maintained vertex color update, which reduces computation and memory requirements. For multiple GPUs in distributed environments, we propose priority-based ordering of vertices to reduce the communication time. We prove the correctness of our algorithms and experimentally demonstrate that, for graphs of over 16 million vertices and over 134 million edges on a single GPU, our dynamic algorithm is as much as 20x faster than the state-of-the-art algorithm for static graphs. For larger graphs with over 130 million vertices and over 260 million edges, our distributed implementation with 8 GPUs produces updated color assignments within 160 milliseconds. In all cases, the proposed parallel algorithms produce comparable or fewer colors than state-of-the-art algorithms.
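The idea of updating a coloring rather than recomputing it can be illustrated with a small sequential sketch in Python. This is not the paper's GPU algorithm or its loosely maintained update: the function names, the greedy "smallest free color" rule, and the choice to recolor the endpoint with the larger id are assumptions made for the example. Vertices are assumed to be integers and the graph is assumed to have no self-loops.

```python
def smallest_free_color(adj, color, v):
    """Smallest non-negative color not used by any colored neighbor of v."""
    used = {color[u] for u in adj[v] if u in color}
    c = 0
    while c in used:
        c += 1
    return c


def greedy_color(adj):
    """Static baseline: color vertices in increasing id order."""
    color = {}
    for v in sorted(adj):
        color[v] = smallest_free_color(adj, color, v)
    return color


def recolor_after_insert(adj, color, u, v):
    """Insert undirected edge (u, v) and repair the coloring in place."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
    for x in (u, v):
        if x not in color:                    # newly seen vertex
            color[x] = smallest_free_color(adj, color, x)
    if color[u] == color[v]:
        # Conflict: only one endpoint needs to change. Picking the larger
        # id is a hypothetical priority rule used for this illustration.
        x = max(u, v)
        color[x] = smallest_free_color(adj, color, x)


def is_proper(adj, color):
    """True if no edge joins two vertices of the same color."""
    return all(color[a] != color[b] for a in adj for b in adj[a])
```

For example, starting from the path 0-1-2 colored greedily, `recolor_after_insert(adj, color, 0, 2)` touches only vertex 2, whereas a static algorithm would revisit every vertex.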
Federated Learning (FL) is a collaborative model training approach that protects data privacy while allowing for model updates and optimization. However, FL is vulnerable to poisoning attacks due to its distributed na...
ISBN: 9798400701559 (print)
Terrain parameters such as slope, aspect, and hillshading are essential in various applications, including agriculture, forestry, and hydrology. However, generating high-resolution terrain parameters is computationally intensive, making it challenging to provide these value-added products to communities in need. We present a scalable workflow called GEOtiled that leverages data partitioning to accelerate the computation of terrain parameters from digital elevation models while preserving accuracy. We assess our workflow in terms of its accuracy and wall time by comparing it to SAGA, which is highly accurate but slow to generate results, and to GDAL, which supports memory optimizations but not data parallelism. We obtain a coefficient of determination (R²) between GEOtiled and SAGA of 0.794, ensuring accuracy in our terrain parameters. We achieve a 6x speedup compared to GDAL when generating the terrain parameters at high resolution (10 m) for the Contiguous United States (CONUS).
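The data-partitioning idea can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the GEOtiled workflow itself: the slope formula is a plain central difference (not necessarily what SAGA or GDAL compute), the elevation model is a list of rows, tiles are horizontal strips, and the names `slope_deg` and `slope_tiled` are hypothetical. The point it shows is that a one-row overlap (halo) around each tile is enough for the tiled result to match the untiled one.

```python
import math


def slope_deg(dem, cell):
    """Slope in degrees for interior cells; border cells are left as None.

    Uses central differences with grid spacing `cell` (same unit as the
    elevations). This formula is an assumption made for the sketch.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell)
            dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out


def slope_tiled(dem, cell, tile_rows):
    """Compute slope strip by strip, reading one extra row on each side."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for start in range(0, rows, tile_rows):
        stop = min(start + tile_rows, rows)
        # Halo: extend the strip by one row where a neighbor row exists.
        lo, hi = max(start - 1, 0), min(stop + 1, rows)
        tile = slope_deg(dem[lo:hi], cell)
        for r in range(start, stop):
            out[r] = tile[r - lo]     # keep only the rows this tile owns
    return out
```

Because each strip depends only on its own rows plus the halo, the loop body could be handed to separate worker processes; the sequential loop here is kept only to make the sketch easy to check.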
Although traditional 3D terrain algorithms can improve the rendering efficiency of the terrain, they often ignore the performance of the terrain itself. The use of four textures is not sufficient to deal with complex ...
ISBN: 9781450397964 (print)
Graph queries on large networks leverage the stored graph properties to provide faster results. Since real-world graphs are mostly dynamic, i.e., the graph topology changes over time, the corresponding graph attributes also change over time. In certain situations, recompiling or updating earlier properties is necessary to maintain the accuracy of a response to a graph query. Here, we first propose a generic framework for developing parallel algorithms to update graph properties on large dynamic networks. We use our framework to develop algorithms for updating Single Source Shortest Path (SSSP) and Vertex Color. We then propose applications of the developed algorithms in Unmanned Aerial Vehicle (UAV) based delivery systems under time-varying dynamics. Finally, we implement our SSSP and vertex color update algorithms for the NVIDIA GPU architecture and show empirically that the developed algorithms can update properties in large dynamic networks faster than state-of-the-art techniques.
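As a small illustration of updating a stored property instead of recomputing it, the sketch below repairs SSSP distances after one edge insertion. It is a sequential Python example and not the framework or GPU implementation from the abstract: the function names are hypothetical, edges are assumed to be directed with non-negative weights, and `adj` is assumed to contain every vertex as a key (with an empty dict for vertices without outgoing edges).

```python
import heapq
import math


def _relax_from(adj, dist, heap):
    """Dijkstra-style relaxation starting from the entries already in heap."""
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))


def dijkstra(adj, src):
    """Static baseline: distances from src to every vertex in adj."""
    dist = {v: math.inf for v in adj}
    dist[src] = 0.0
    _relax_from(adj, dist, [(0.0, src)])
    return dist


def sssp_after_insert(adj, dist, u, v, w):
    """Insert directed edge u -> v (weight w >= 0) and repair dist in place."""
    adj[u][v] = min(w, adj[u].get(v, math.inf))
    if dist[u] + w >= dist[v]:
        return                            # new edge does not shorten any path
    dist[v] = dist[u] + w
    # Only vertices reachable from v through improved paths are revisited.
    _relax_from(adj, dist, [(dist[v], v)])
```

After `sssp_after_insert`, `dist` should agree with a fresh `dijkstra(adj, src)` run, while visiting only the part of the graph whose distances actually changed. Edge deletions need a different repair step and are not covered by this sketch.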
ISBN: 9798350364613; 9798350364606 (print)
Quantum computing is a new computing paradigm that exploits the laws of quantum mechanics to achieve an exponential speedup compared to classical logic. However, noise strongly limits current quantum hardware, reducing achievable performance. Quantum Error Correction (QEC) techniques are a valuable approach to reducing the effects of noise. Nevertheless, the high computational complexity of QEC algorithms is incompatible with the tight time constraints of quantum devices. Thus, hardware acceleration is paramount to achieving real-time QEC. This work represents the first step in the FPGA acceleration of the Sparse Blossom Algorithm (SBA), a state-of-the-art decoding algorithm for QEC. We provide performance profiling and a design methodology for the hardware development of the SBA. We evaluate the execution time and energy efficiency of our solution, attaining up to a 2.75x speedup and a 9.59x improvement in energy efficiency compared to the software baseline.
Our proposed distributed computation framework addresses the issue of underutilized computing resources in institutions, companies, and communities, by providing a novel automated and efficient solution for ad-hoc dis...
ISBN: 9798350364613; 9798350364606 (print)
While parallel programming, particularly on graphics processing units (GPUs), and numerical optimization hold immense potential to tackle real-world computational challenges across disciplines, their inherent complexity and technical demands often act as daunting barriers to entry. This, unfortunately, limits accessibility and diversity within these crucial areas of computer science. To combat this challenge and ignite excitement among undergraduate learners, we developed an application-driven course, harnessing robotics as a lens to demystify the intricacies of these topics, making them tangible and engaging. Our course's prerequisites are limited to the required undergraduate introductory core curriculum, opening doors for a wider range of students. Our course also features a large final-project component to connect theoretical learning to applied practice. In our first offering of the course, we attracted 27 students without prior experience in these topics and found that an overwhelming majority of the students felt that they learned both technical and soft skills, such that they felt prepared for future study in these fields.
Modern supercomputers are becoming increasingly dense with accelerators. Industry leaders offer multi-GPU architectures with high interconnection bandwidth between the devices to match the requirements of modern workl...