ISBN: 9781538637906 (print)
Skyline queries help users handle the huge amount of available data by finding a set of interesting points. As dataset sizes are constantly increasing and skyline queries are computationally expensive, it is critical to compute such queries by utilizing parallelism. Existing works deal exclusively with totally ordered attribute domains. In this paper, we present a framework, named PSLP, for parallel skyline evaluation over data with both totally and partially ordered domains. We introduce a new partial-to-order mapping scheme that guarantees the correctness of the mapping by preserving incomparability and preference with low mapping cost. We also propose a novel logical partitioning for parallel processing, in which the data space is partitioned according to incomparability and preference relationships by using a pivot point. The logical partitioning can prune away partitions that do not contain any skyline point during the partitioning process. An extensive performance evaluation confirms the efficiency and effectiveness of the proposed approach.
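For readers unfamiliar with the skyline operator, the following is a minimal sequential sketch over totally ordered attributes only. It is not the PSLP framework: it assumes that smaller values are preferred in every dimension, and the names `dominates`, `skyline`, and the sample `hotels` data are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller is better in every dimension.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach).
hotels = [(50.0, 8.0), (80.0, 2.0), (60.0, 9.0), (90.0, 1.0), (85.0, 3.0)]
result = skyline(hotels)
```

Here `(60.0, 9.0)` is dominated by `(50.0, 8.0)` and `(85.0, 3.0)` by `(80.0, 2.0)`, so three points remain. The quadratic pairwise comparison is what makes large inputs expensive and motivates parallel evaluation.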
ISBN: 9780769552071 (print)
Bit-reproducibility has many advantages in the context of high-performance computing. Besides making debugging and testing simpler and more accurate, it can allow applications to be deployed on heterogeneous systems while maintaining the consistency of the computations. In this work we analyze the basic operations performed by scientific applications and identify the possible sources of non-reproducibility. In particular, we consider the tasks of evaluating transcendental functions and performing reductions using non-associative operators. We present a set of techniques to achieve reproducibility, and we propose improvements over existing algorithms to perform reproducible computations in a portable way while obtaining good performance and accuracy. By applying these techniques to more complex tasks, we show that bit-reproducibility can be achieved on a broad range of scientific applications.
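The problem with reductions over non-associative operators can be seen with ordinary floating-point addition. The sketch below is not the technique proposed in the paper. It only illustrates the issue, along with one simple (and slow) way to get an order-independent sum: accumulate exactly in rational arithmetic and round once at the end. The helper name `reproducible_sum` is an assumption made for this example.

```python
from fractions import Fraction
from typing import Iterable


def reproducible_sum(values: Iterable[float]) -> float:
    """Sum finite doubles exactly as rationals, then round once to a double.

    Because the intermediate sum is exact, the result does not depend on
    the order in which the values are supplied.
    """
    return float(sum((Fraction(v) for v in values), Fraction(0)))


# Floating-point addition is not associative: grouping changes the result.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c   # the large terms cancel first, so the 1.0 survives
right = a + (b + c)  # the 1.0 is absorbed by rounding before cancellation

forward = reproducible_sum([a, b, c])
backward = reproducible_sum([c, b, a])
```

In a parallel reduction the grouping depends on how the work is split, which is why naive sums can differ between runs or machines. Exact accumulation removes that dependence at a significant cost in speed.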
ISBN: 9781479942930 (print)
Novel platforms of modular robot systems have been developed with important applications in the safety, transportation, and sensing domains. In such systems, modular robots are able to change their organization in order to obtain different shapes. Designing distributed programs that allow the "optimal" reorganization of a set of robots into a specific shape is a very challenging problem. In this paper we present an original distributed meta-algorithm for the micro-robot shape-shifting problem. We show that this meta-algorithm, described as a general functioning schema, provides a good framework for easily designing distributed algorithms for shape-shifting problems. We also demonstrate how easily the algorithm can be instantiated for special target shapes, and we give an adaptation of the algorithm to reach any horizontally convex form. The presented meta-algorithm has two main advantages: first, there is no need for exact positioning of the robots, and second, the memory storage and communication requirements are significantly reduced.
ISBN: 9781479942930 (print)
Pattern libraries are important tools for high-productivity application development. Their struggle for best performance is complicated by the fact that they are used to execute user-provided code, which is not known when the libraries are created. This makes pattern libraries good candidates for automatic software tuning. In this paper, we deal with automatic online parameter tuning of the HyPHI hybrid pattern library for heterogeneous systems equipped with Intel Xeon Phi coprocessors. We propose a framework that can be used to combine a pattern library with an existing tuning library in a practical and efficient way. Our experiments show that tuning can noticeably improve the performance of the library while introducing very little overhead.
ISBN: 9781665435741 (print)
Real-time ship detection from remote sensing imagery is a great challenge due to complex scenes, the changeable characteristics of ship targets, and uncontrollable interference factors. In this letter, an infrared ship detection algorithm based on multi-feature fusion is proposed. Based on a full analysis of the target features, the proposed algorithm combines a cascade rejection mechanism with multiple features through a cascade of linear classifiers ordered from simple to complex, and uses fine features to accurately distinguish the target from the complex background. A large number of experimental results show that the proposed method achieves better detection performance and real-time processing.
ISBN: 9781665435772 (print)
We investigate cryptanalytic applications composed of many independent tasks that exhibit a stochastic runtime distribution. We compare four algorithms for executing such applications on GPUs. We demonstrate that the best strategy varies with the distribution, problem size, and platform. We support our analytic results with extensive experiments on two GPUs from opposite ends of the performance spectrum: a high-performance GPU (Nvidia Volta) and an energy-saving system on chip (Jetson Nano).
ISBN: 9780769548791 (print)
Measures of graph similarity have a broad range of applications but involve compute-intensive processing. The similarity flooding algorithm is efficient for comparing the similarity of small graphs and small datasets. However, more and more large-scale graph applications are emerging, and the existing stand-alone similarity flooding algorithm cannot perform the similarity comparison for large-scale graph datasets in acceptable time. This paper presents a parallelized similarity flooding algorithm with MapReduce for large-scale graph datasets. The experimental results demonstrate that the parallelized algorithm achieves a significant performance improvement over the stand-alone similarity flooding algorithm. The results also reveal that the parallelized algorithm obtains excellent speedup as the size of the cluster increases.
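As a rough illustration of the fixpoint idea behind similarity flooding, the sketch below runs on a single machine and makes several simplifying assumptions that are not taken from the paper: edges are unlabeled, every node pair starts with similarity 1.0, and each pair splits its score uniformly among its neighbours in the pairwise connectivity graph. It is neither the MapReduce version nor a faithful reproduction of the original algorithm's propagation coefficients. The function name `similarity_flooding` and the toy graphs are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Pair = Tuple[str, str]


def similarity_flooding(edges_a: List[Edge], edges_b: List[Edge],
                        iterations: int = 20) -> Dict[Pair, float]:
    """Simplified similarity propagation between two directed graphs."""
    nodes_a = sorted({n for e in edges_a for n in e})
    nodes_b = sorted({n for e in edges_b for n in e})

    # Pairwise connectivity graph: (a, b) is linked to (a2, b2)
    # whenever a -> a2 is an edge in A and b -> b2 is an edge in B.
    neighbours: Dict[Pair, List[Pair]] = defaultdict(list)
    for (a, a2), (b, b2) in product(edges_a, edges_b):
        neighbours[(a, b)].append((a2, b2))
        neighbours[(a2, b2)].append((a, b))  # similarity flows both ways

    sigma = {pair: 1.0 for pair in product(nodes_a, nodes_b)}
    for _ in range(iterations):
        updated = dict(sigma)
        for src, targets in neighbours.items():
            share = sigma[src] / len(targets)  # uniform split (assumption)
            for dst in targets:
                updated[dst] += share
        top = max(updated.values())
        sigma = {pair: value / top for pair, value in updated.items()}  # normalize
    return sigma


# Two toy graphs with the same shape: a1 -> a2 and b1 -> b2.
scores = similarity_flooding([("a1", "a2")], [("b1", "b2")])
```

On this toy input the structurally matching pairs `("a1", "b1")` and `("a2", "b2")` keep the top score, while the mismatched pairs decay towards zero. Each iteration touches every edge of the pairwise connectivity graph, which grows with the product of the two edge sets and is the cost the abstract refers to.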
ISBN: 9798350329223 (print)
The proceedings contain 153 papers. The topics discussed include: transaction data management optimization based on multi-partitioning in blockchain systems; semi-asynchronous federated learning optimized for non-IID data communication based on tensor decomposition; HKTGNN: hierarchical knowledge transferable graph neural network-based supply chain risk assessment; DQR-TTS: semi-supervised text-to-speech synthesis with dynamic quantized representation; deep reinforcement learning-based network moving target defense in DPDK; iNUMAlloc: towards intelligent memory allocation for AI accelerators with NUMA; and predictive queue-based low latency congestion detection in data center networks.
ISBN: 9781538637906 (print)
Accelerating the training of Deep Neural Networks (DNNs) has been a focus of the deep learning field. There have been many studies on accelerating deep learning on different platforms. Among these platforms, the Intel Xeon Phi coprocessor is a many-core platform that provides both strong programmability and high performance. However, previous work on Intel Many Integrated Core (MIC) focused on parallel computing on the MIC alone. In this paper, we speed up the training of DNNs for automatic speech recognition using a CPU+MIC architecture, in which training is executed on both the MIC and the CPU. We apply several optimization methods for I/O and computation and set up experiments to validate these methods. With all methods combined, results show that our optimized algorithm achieves about a 20x speedup compared with the original sequential algorithm running on a single CPU core.
ISBN: 9781665435741 (print)
Approximate nearest neighbor search (ANNS) is one of the most basic and important algorithms in databases, machine learning, and other applications. With the expansion of cloud computing, academia has focused on how to optimize distributed approximate nearest neighbor search on frameworks such as MapReduce and Memcached. We implement a new distributed ANNS framework (NetANNS). The main contributions of NetANNS are to accelerate data preprocessing with a programmable switch and to integrate a variety of efficient ANNS algorithms so that it can choose the most suitable algorithm for each dataset. The experiments show that the search efficiency of NetANNS is about 2x that of common distributed ANNS frameworks implemented on top of MapReduce.