ISBN (print): 9783031125973; 9783031125966
Locality-sensitive hashing (LSH) is an established method for fast data indexing and approximate similarity search, with useful parallelism properties. Although indexes and similarity measures are key for data clustering, the benefits of LSH for clustering have been little investigated. Our proposition is that LSH can be extremely beneficial for parallelizing high-dimensional density-based clustering, e.g., DBSCAN, a versatile method able to detect clusters of different shapes and sizes. We contribute to filling the gap between the advancements in LSH and density-based clustering. We show how approximate DBSCAN clustering can be fused into the process of creating an LSH index and, through parallelization and fine-grained synchronization, efficiently utilize the available computing capacity. The resulting method, ***, can support a wide range of applications with diverse distance functions, data distributions, and dimensionality. We analyse its properties and evaluate our prototype implementation on a 36-core machine with 2-way hyper-threading on massive datasets with various numbers of dimensions. Our results show that *** effectively complements established state-of-the-art methods, with speed-ups of up to several orders of magnitude on higher-dimensional datasets and tunable high clustering accuracy through LSH parameters.
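To make the idea of an LSH index concrete, the following is a minimal sketch of one common LSH family, random-hyperplane hashing for cosine similarity. It is not the method described in the abstract, which does not specify a hash family; the function names `make_hyperplanes`, `lsh_key`, and `build_index` and the parameter `num_bits` are hypothetical. Points that fall on the same side of every random hyperplane share a bucket, so candidate neighbours for a density-based step could be looked up per bucket instead of by scanning all points.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(point, hyperplanes):
    """Hash a point to a bit tuple: one bit per hyperplane, set by the sign of the dot product."""
    return tuple(
        1 if sum(p * h for p, h in zip(point, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(points, hyperplanes):
    """Group point indices by their LSH key (illustrative, single hash table)."""
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[lsh_key(point, hyperplanes)].append(idx)
    return buckets


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=8, dim=3)
    data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
    index = build_index(data, planes)
    for key, members in index.items():
        print(key, members)
```

More hash bits give smaller, purer buckets at the cost of more missed neighbours; this is the kind of accuracy trade-off that LSH parameters control.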
ISBN (print): 9783030898205; 9783030898199
Shared memory programming and distributed memory programming are the most prominent ways to parallelize applications that require long processing times and large amounts of storage in High Performance Computing Systems (HPCS); parallel applications can be represented as Parallel Task Graphs (PTGs) using Directed Acyclic Graphs (DAGs). The scheduling of PTGs in HPCS is considered an NP-complete combinatorial problem that requires large amounts of storage and long processing times. Heuristic methods and sequential programming languages have been proposed to address this problem. The open access paper "Scheduling in Heterogeneous Distributed Computing Systems Based on Internal Structure of Parallel Tasks Graphs with Meta-Heuristics" presents the Array Method, which optimizes the use of Processing Elements (PEs) in an HPCS and improves response times in scheduling and resource mapping by using the Univariate Marginal Distribution Algorithm (UMDA). The Array Method uses the internal characteristics of PTGs to schedule tasks; it was programmed sequentially in C, then analyzed and tested with algorithms that generate synthetic workloads and with DAGs of real applications. Considering the great benefits of parallel software, this research work presents the Array Method implemented with parallel programming in OpenMP. The experimental results show that parallel programming accelerates response times compared to sequential programming when evaluating three metrics: waiting time, makespan, and quality of assignments.
A query processing engine is the core component of any modern database system. There are several types of query processing engines that employ different query processing techniques. The speed of data-driven decision-making and analytics is crucial to organizations that build software and system applications. An intuitive way to speed up database querying is to improve the performance of these engines. Conventionally, databases use a disk-oriented, pull-based or tuple-at-a-time interpreted query evaluation model. This paper introduces CasaDB, a compilation-based, in-memory query processing engine that accepts an SQL query and generates physical query plans based on distributed C++ (UPC++). As part of this work, different models and components of query processing are explored, and efficient Partitioned Global Address Space (PGAS) based parallel programs corresponding to SQL queries are designed and developed, emitted by a code generator that uses a data-centric compilation strategy. The approach proposed in this paper combines high-performance parallel programs with database query processing to take advantage of available advances in hardware. We conduct an extensive experimental evaluation with the industry-standard TPC-H benchmark. Our experimental evaluation shows that 4-node query execution produces up to 5× speedup in query performance over single-node approaches.
ISBN (print): 9783031061561; 9783031061554
Performance and energy are the two most important objectives for optimization on heterogeneous HPC platforms. This work studies a mathematical problem motivated by the bi-objective optimization of a matrix multiplication application on such platforms for performance and energy. We formulate the problem and propose a polynomial-complexity algorithm that solves it when all the application profiles of objective type one are continuous and strictly increasing, and all the application profiles of objective type two are linearly increasing. We solve the problem for the matrix multiplication application employing five heterogeneous processors that include two Intel multicore CPUs, an Nvidia K40c GPU, an Nvidia P100 PCIe GPU, and an Intel Xeon Phi. Based on our experiments, a dynamic energy saving of 17% is gained while tolerating a performance degradation of 5% (a saving of 106 J for an execution time increase of 0.05 s).
ISBN (print): 9783030861377; 9783030861360
For Global Navigation Satellite System (GNSS) data processing, real-time processing of voluminous Continuously Operating Reference Stations (CORS) data is a challenging problem. Many methods, such as parallel computing, have been proposed for regional network processing. However, they are mainly used for post-processing or near-real-time processing. Because the volume of CORS data from a large geographic area is greatly increased and the data are epoch-related, real-time reception and efficient processing pose huge challenges. Therefore, a real-time distributed Precise Point Positioning (PPP) platform is designed, based on distributed computing and message queues, to handle voluminous real-time CORS data. It decomposes real-time data processing into three processes: input/output (I/O) multiplexing for real-time stream data acquisition, parallel PPP computing, and Weighted Round Robin task scheduling. Real-time data from 5 International GNSS Service (IGS) stations are processed; the results show that it generally takes 30 min to achieve centimeter-level accuracy. When the platform is applied to real-time data processing for 1414 CORS in China, it performs PPP calculations with stability and high precision. An application to real-time Precipitable Water Vapor (PWV) monitoring is also provided.
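As an illustration of the task-scheduling step, here is a minimal sketch of one well-known variant, smooth weighted round robin. The abstract does not say which variant the platform uses, so this is an assumption; the function name `weighted_round_robin` and the worker names are hypothetical. Each worker accumulates its weight every round, the worker with the largest running total receives the task, and the total weight is then subtracted from its counter. Weights are assumed to be positive integers.

```python
def weighted_round_robin(weights, num_tasks):
    """Assign num_tasks tasks to workers in proportion to their integer weights.

    weights: dict mapping worker name -> positive integer weight.
    Returns the list of chosen workers, one entry per task.
    """
    current = {worker: 0 for worker in weights}
    total = sum(weights.values())
    assignment = []
    for _ in range(num_tasks):
        # Every worker gains credit proportional to its weight.
        for worker, weight in weights.items():
            current[worker] += weight
        # The worker with the most credit takes the task and pays the total back.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        assignment.append(chosen)
    return assignment


if __name__ == "__main__":
    # Hypothetical compute nodes: node_a is three times as capable as node_b.
    print(weighted_round_robin({"node_a": 3, "node_b": 1}, 8))
```

Over any window of `sum(weights)` consecutive tasks, each worker receives exactly as many tasks as its weight, and the heavier worker's tasks are interleaved rather than issued in one burst.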
This study evaluates the use of Quantum Convolutional Neural Networks (QCNNs) for identifying signals resembling Gamma-Ray Bursts (GRBs) within simulated astrophysical datasets in the form of light curves. The task ad...
ISBN (digital): 9798331524937
ISBN (print): 9798331524944
Over the last decade, radio astronomy has entered a new era: the advent of the Square Kilometer Array (SKA), preceded by its pathfinders, will produce a huge amount of data that will be hard to process with a traditional approach. This means that the current state-of-the-art software for data reduction and imaging will have to be re-modeled to face this data challenge. To manage such an increase in data size and computational requirements, scientists need to exploit modern high-performance computing (HPC) architectures. In particular, heterogeneous systems based on complex combinations of CPUs, accelerators, high-speed networks, and composite storage devices need to be used efficiently and effectively. In this paper, we present an overview of Radio Imaging Code Kernels (RICK; [1]; [2]; [3]), a code able to perform the most computationally demanding steps of the w-stacking gridder algorithm by exploiting distributed parallelism and GPU acceleration. GPU offloading is possible through CUDA, HIP, and OpenMP, aiming at the widest possible usability across multiple architectures. After detailing the (multi-)GPU approach to the problem and listing all the new code implementations, we analyze its performance considering both the computational and communication workload. We will show how the full, distributed GPU offload of the code, the first of its kind and crucial for dealing with increasingly large interferometric data, represents not only an extremely fast and optimized approach, but also the greenest one compared to its parallel CPU counterpart. This code, now publicly available, has been tested with a wide variety of modern interferometers and SKA pathfinders. This represents, to date, the first example of radio imaging software fully enabled for GPUs, becoming a potential state-of-the-art approach for the upcoming SKA. Finally, we will also present the future perspectives about the code, planned to be converted into a library and possibly be used by any of the m
ISBN (print): 9781450390682
Texture identification has been developed recently to support one-to-one verification and one-to-many search, which provides much broader support than texture classification in real-life applications. It has demonstrated great potential to enable product traceability by identifying the unique texture information on the surface of the targeted objects. However, existing hardware acceleration schemes are not enough to support large-scale texture identification, especially for the search task, where the number of texture images being searched can reach millions, creating enormous compute and memory demands and making real-time texture identification infeasible. To address these problems, we propose a comprehensive toolset with joint optimization strategies from both hardware and software to deliver optimized GPU acceleration and enable large-scale texture identification with real-time responses. Novel technologies include: 1) a highly optimized cuBLAS implementation for efficiently running the 2-nearest neighbors algorithm; 2) a hybrid cache design to incorporate host memory for streaming data toward GPUs, which delivers a 5x larger memory capacity while running the targeted workloads; 3) a batch process to fully exploit the data reuse opportunities by considering available compute resources and memory bandwidth constraints; 4) an asymmetric local feature extraction to reduce the memory footprint for keeping feature matrices of reference texture images. To the best of our knowledge, this work is the first implementation to provide real-time large-scale texture identification on GPUs. By exploring the co-optimizations from both hardware and software, we can deliver 31x faster search and 20x larger feature cache capacity compared to a conventional CUDA implementation. We also demonstrate our proposed designs by proposing a distributed texture identification system with 14 Nvidia Tesla P100 GPUs which can complete 872,984 texture similarity comparisons in just one second.
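The 2-nearest neighbors step in item 1) can be sketched on the CPU as follows. This is an illustrative assumption about how such a search is commonly structured, not the paper's GPU code: squared Euclidean distance is expanded as ||q||² − 2·q·r + ||r||², so the dominant cost is the dot product q·r, which for batches of queries and references becomes a matrix multiplication of the kind a BLAS library computes. The function name `two_nearest` is hypothetical.

```python
def two_nearest(query, references):
    """Return ((dist, index), (dist, index)) for the two references closest to query.

    Distances are squared Euclidean distances, computed with the expansion
    ||q||^2 - 2 * q.r + ||r||^2 so that the dot product is the main work.
    """
    q_norm = sum(x * x for x in query)
    best = (float("inf"), -1)
    second = (float("inf"), -1)
    for idx, ref in enumerate(references):
        dot = sum(q * r for q, r in zip(query, ref))
        dist = q_norm - 2.0 * dot + sum(r * r for r in ref)
        if dist < best[0]:
            # New closest match; the previous best becomes the runner-up.
            best, second = (dist, idx), best
        elif dist < second[0]:
            second = (dist, idx)
    return best, second


if __name__ == "__main__":
    refs = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
    print(two_nearest([0.0, 0.0], refs))
```

Keeping both the best and the second-best match lets a caller judge how distinctive the best match is relative to the runner-up.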
To address the demand for high-performance large-scale simulation of two-phase flows, a momentum-conserving weakly compressible Navier-Stokes solver with multi-GPU computation is proposed. Following the principle of c...
ISBN (print): 9789811614798
The proceedings contain 80 papers. The special focus in this conference is on Futuristic Trends in Network and Communication Technologies. The topics include: Unmanned Vehicles: Safety Management Systems and Safety Functions; Probabilistic Characteristics of a Two-Channel Detector with Two Inertial Single-Photon Photoemission Devices and an Electronic Adder; Energy and Spectrum-Aware Cluster-Based Routing in Cognitive Radio Sensor Networks; The Hybrid Approach for the Partitioning of VLSI Circuits; Maximization of IoT Network Lifetime Using Efficient Clustering Technique; A Hybrid Metaheuristic to Solve Capacitated Vehicle Routing Problem; Energy Conservation in IOT: A Survey; Identification of Implicit Threats Based on Analysis of User Activity in the Internet Space; EERO: Energy Efficient Route Optimization Technique for IoT Network; Representing a Quantum Fourier Transform, Based on a Discrete Model of a Quantum-Mechanical System; Forecasting Non-Stationary Time Series Using Kernel Regression for Control Problems; A Smart Waste Management System Based on LoRaWAN; Simulation of the Semantic Network of Knowledge Representation in Intelligent Assistant Systems Based on Ontological Approach; A Deep Learning Approach for Autonomous Navigation of UAV; Framework for Processing Medical Data and Secure Machine Learning in Internet of Medical Things; Path Planning for Autonomous Robot Navigation: Present Approaches; Development of a Routing Protocol Based on Clustering in MANET; Threat Model for Trusted Sensory Information Collection and Processing Platform; Autonomous Navigation of Mobile Robot with Obstacle Avoidance: A Review; Design of a Distributed Debit Management Network of Operating Wells of Deposits of the CMW Region; Design of U-Shaped Multiline Microstrip Patch Antenna for Advanced Wireless Applications.