ISBN (print): 9783030050511; 9783030050504
The paper presents a parallelization of an exact resolution algorithm for the Probabilistic Traveling Salesman Problem (PTSP). This algorithm allows us, first, to verify the stability of well-solvable special cases and also to solve useful instances of the PTSP to optimality. It further allows us to apply our version of Karp's partitioning algorithm, since real problems are very large. The implementation of Karp's algorithm consists in subdividing the square plane into sub-planes, thereby transforming the resolution of one large problem into the resolution of many small sub-problems that can be solved exactly. This application can be gridified, and the different sub-problems can be processed in parallel by different nodes since they are totally independent. In each sub-plane the Branch and Bound algorithm is used. In this paper we propose two parallelizations of the Branch and Bound algorithm for the resolution of the PTSP: on the one hand, the parallelization of the branches used in the exploration of the search tree; on the other hand, the parallelization of the algorithm associated with the notion of partitioning introduced by Karp. We perform an experimental study in a multi-core environment to evaluate the performance of the proposed approach.
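To make the decomposition concrete, here is a minimal Python sketch of the Karp-style partitioning described above; solve_subproblem is an assumed placeholder standing in for an exact branch-and-bound PTSP solver, and the plane is cut into k x k sub-planes whose sub-problems are fully independent.

# Sketch of Karp-style spatial partitioning (assumed helpers, not the
# authors' implementation): the unit square is cut into k x k sub-planes
# and each sub-problem can be solved exactly in parallel.
from multiprocessing import Pool

def partition(points, k):
    """Assign each (x, y) point in [0, 1)^2 to one of k*k sub-planes."""
    cells = {}
    for x, y in points:
        cell = (min(int(x * k), k - 1), min(int(y * k), k - 1))
        cells.setdefault(cell, []).append((x, y))
    return list(cells.values())

def solve_subproblem(points):
    # Placeholder for an exact branch-and-bound PTSP solver on a small
    # instance; here we just return the points in input order as a "tour".
    return points

if __name__ == "__main__":
    import random
    pts = [(random.random(), random.random()) for _ in range(200)]
    with Pool() as pool:
        # Sub-problems are totally independent, so they map cleanly to workers.
        sub_tours = pool.map(solve_subproblem, partition(pts, 4))
    print(len(sub_tours), "sub-tours computed in parallel")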
ISBN (print): 9783030050511; 9783030050504
With the growth of data sets and neural networks, training time is increasing rapidly. Distributed parallel training has been proposed to accelerate deep neural network training, and most efforts target GPU clusters. This paper focuses on the performance of distributed parallel training on the CPU clusters of supercomputer systems. Using resources of the "Tianhe-2" supercomputer system, we conduct an extensive evaluation of popular deep learning tools, including Caffe, TensorFlow, and BigDL, and test several deep neural network models, including AutoEncoder, LeNet, AlexNet and ResNet. The experimental results show that Caffe performs best in communication efficiency and scalability. BigDL is the fastest in computing speed, benefiting from its CPU optimizations, but it suffers from long communication delays due to its dependency on the MapReduce framework. The insights and conclusions from our evaluation provide a significant reference for improving the resource utilization of supercomputers in distributed deep learning.
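For orientation, the synchronous data-parallel pattern these tools implement can be sketched in a few lines; the mpi4py example below illustrates the pattern only (it is not code from Caffe, TensorFlow or BigDL), and local_gradient is a toy stand-in for a real backward pass.

# Illustrative synchronous data-parallel step: every rank computes a local
# gradient, then an Allreduce sums it so all replicas apply the same update.
# The cost of this communication phase is exactly what distinguishes the
# frameworks compared above.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def local_gradient(weights, batch):
    # Toy least-squares gradient standing in for a real backward pass.
    x, y = batch
    return 2 * x.T @ (x @ weights - y) / len(y)

def train_step(weights, local_batch, lr=0.01):
    grad = local_gradient(weights, local_batch)   # per-rank computation
    total = np.empty_like(grad)
    comm.Allreduce(grad, total, op=MPI.SUM)       # communication phase
    return weights - lr * (total / comm.Get_size())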
ISBN (print): 9781538651148
The Finite Difference Method (FDM) is the most common parallel computing method for high-performance numerical simulation. The method can solve for physical field parameters efficiently when the grid refinement is dense enough. However, parallel acceleration of FDM requires dividing the raw data into several parts, one per computing node. As a result, the boundary data of each computing node must be exchanged with the boundary data of its six adjacent computing nodes. Once the communication time step is short enough and the computing nodes are of high performance, communication latency may become the primary cause of parallel inefficiency. This paper provides a communication latency hiding scheme in which part of each node's boundary data can be communicated and computed simultaneously. The experimental results show that the proposed method improves parallel efficiency and reduces running time.
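A minimal sketch of the overlap idea, assuming a 1-D domain decomposition with mpi4py (the paper's setting is 3-D with six neighbors, but the pattern is identical): post non-blocking halo exchanges, update the interior while they are in flight, then finish the edges.

# Communication/computation overlap for an FDM stencil (1-D halo exchange
# for brevity; the 3-D case posts six such pairs). Interior points need no
# remote data, so they are updated while the halos are in flight.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

u = np.random.rand(1000)          # local slab; u[0] and u[-1] are ghost cells
recv_l, recv_r = np.empty(1), np.empty(1)

reqs = [comm.Isend(u[1:2], dest=left),   comm.Isend(u[-2:-1], dest=right),
        comm.Irecv(recv_l, source=left), comm.Irecv(recv_r, source=right)]

new = np.empty_like(u)
new[2:-2] = 0.5 * (u[1:-3] + u[3:-1])    # interior update overlaps communication

MPI.Request.Waitall(reqs)                # halos arrived; finish the edges
u[0], u[-1] = recv_l[0], recv_r[0]
new[1] = 0.5 * (u[0] + u[2])
new[-2] = 0.5 * (u[-3] + u[-1])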
ISBN (digital): 9783030050573
ISBN (print): 9783030050573; 9783030050566
In the field of signal processing, the Fast Fourier Transform (FFT) is a widely used algorithm to transform signal data from the time domain to the frequency domain. Unfortunately, with the exponential growth of data, traditional methods cannot meet the demand for large-scale computation on such big data because of three main challenges of large-scale FFT: big data size, real-time data processing, and high utilization of compute resources. To satisfy these requirements, an optimized FFT algorithm in the Cloud is sorely needed. In this paper, we introduce a new method to conduct FFT in the Cloud with the following contributions: first, we design a parallel FFT algorithm for large-scale signal data in the Cloud; second, we propose a MapReduce-based mechanism to distribute data to compute nodes using a big data processing framework; third, we implement an optimal method of distributing compute resources that accelerates the algorithm by avoiding redundant data exchange between compute nodes. The algorithm is designed in the MapReduce computation framework and contains three steps: data preprocessing, local data transform, and parallel data transform to integrate the processing results. The parallel FFT is implemented in a 16-node Cloud to process real signal data. The experimental results reveal an obvious improvement in speed: our parallel FFT is approximately five times faster than FFT in MATLAB when the data size reaches 10 GB.
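The factorization that makes FFT amenable to MapReduce is the classic four-step (Cooley-Tukey) decomposition; the single-process numpy sketch below illustrates the mathematics only, not the paper's Cloud implementation.

# Four-step FFT factorization underlying distributed FFTs: an N = n1*n2
# transform becomes n2 FFTs of size n1 (mappable as independent tasks),
# a twiddle multiply, and n1 FFTs of size n2 (the integration step).
import numpy as np

def four_step_fft(x, n1, n2):
    a = x.reshape(n1, n2)
    a = np.fft.fft(a, axis=0)                        # FFTs over columns
    k1 = np.arange(n1).reshape(-1, 1)
    j2 = np.arange(n2).reshape(1, -1)
    a = a * np.exp(-2j * np.pi * k1 * j2 / (n1 * n2))  # twiddle factors
    a = np.fft.fft(a, axis=1)                        # FFTs over rows
    return a.T.reshape(-1)                           # transpose to output order

x = np.random.rand(4096)
assert np.allclose(four_step_fft(x, 64, 64), np.fft.fft(x))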
ISBN (print): 9783030304904; 9783030304898
Two main approaches are currently prevalent in the digital emulation of musical instruments: manipulation of pre-recorded samples and techniques of real-time synthesis, generally based on physical models of varying degrees of accuracy. Concerning the first, while the processing power of present-day computers enables their use in real time, many restrictions arising from this sample-based design persist; the huge on-disk space requirements and the stiffness of musical articulations are the most prominent. On the other side of the spectrum, pure synthesis approaches, while offering greater flexibility, fail to capture and reproduce certain nuances central to the verisimilitude of the generated sound, offering a dry, synthetic output at a high computational cost. We propose a method in which ensembles of lightweight neural networks working in parallel learn, from crafted frequency-domain features of an instrument's sound spectra, an arbitrary instrument's voice and articulations realistically and efficiently. We find that our method, while retaining perceptual sound quality on par with sampled approaches, exhibits 1/10 of the latency of industry-standard real-time synthesis algorithms and 1/100 of the disk space requirements of industry-standard sample-based digital musical instruments. This method can therefore serve as a basis for more efficient implementations in dedicated devices, such as keyboards and electronic drum kits, and in general-purpose platforms like desktops and tablets or open-source hardware like Arduino and Raspberry Pi. From a conceptual point of view, this work highlights the advantages of a closer integration of machine learning with other subjects, especially in the endeavor of new product development. Exploiting the synergy between neural networks, digital signal processing techniques and physical modelling, we illustrate the proposed method via the implementation of two virtual instruments: a conventional grand piano and a hybrid string…
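As a rough illustration of the ensemble idea (all shapes and names below are assumptions, not the paper's architecture), each lightweight network can own one band of the spectrum and predict its magnitudes from simple note features, so the ensemble members evaluate in parallel.

# Illustrative ensemble of lightweight networks for spectral synthesis:
# each tiny MLP models one frequency band of a magnitude frame.
import numpy as np

class TinyMLP:
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = rng.standard_normal((n_in, n_hidden)) * 0.1
        self.w2 = rng.standard_normal((n_hidden, n_out)) * 0.1

    def __call__(self, x):
        return np.maximum(x @ self.w1, 0.0) @ self.w2  # ReLU MLP forward pass

rng = np.random.default_rng(0)
n_bands, band_size = 8, 64
ensemble = [TinyMLP(2, 16, band_size, rng) for _ in range(n_bands)]

features = np.array([440.0 / 4000.0, 0.8])    # e.g. normalized pitch, velocity
spectrum = np.concatenate([net(features) for net in ensemble])  # one frame
print(spectrum.shape)                          # (512,) magnitude bins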
As is well known, polar codes have been chosen as the control channel codes for the enhanced mobile broadband (eMBB) scenario in 3GPP RAN1. Due to their excellent performance, polar codes have also attracted widespread attention in academia and industry. In general, polar codes can be decoded by two methods: successive cancellation (SC) and belief propagation (BP). However, compared with the family of SC algorithms, the error-correction performance of BP decoding is not satisfactory, even though it offers excellent parallel throughput. Hence, this paper proposes an efficient BP list (EBPL) decoding of polar codes which can raise the performance to the same level as successive cancellation list (SCL) decoding without sacrificing decoding throughput. With reasonable complexity, the proposed EBPL decoding achieves BER and FER comparable to SCL in simulation results.
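One common way to obtain a BP list decoder, sketched below in heavily simplified form, is to run several independent BP decoders (e.g. over permuted factor graphs) and keep the first candidate passing a validity check; this is a generic illustration, not necessarily the EBPL construction, and bp_decode and crc_check are toy placeholders.

# Generic BP-list wrapper: the decoders run independently, so the list
# dimension parallelizes without hurting throughput.
from concurrent.futures import ThreadPoolExecutor

def bp_decode(llrs, permutation):
    # Placeholder: a real implementation iterates min-sum updates over the
    # polar factor graph permuted by `permutation` and returns hard decisions.
    return [1 if l < 0 else 0 for l in llrs]

def crc_check(bits):
    return sum(bits) % 2 == 0        # toy stand-in for a real CRC

def bp_list_decode(llrs, permutations):
    with ThreadPoolExecutor() as pool:
        candidates = pool.map(lambda p: bp_decode(llrs, p), permutations)
    for bits in candidates:
        if crc_check(bits):
            return bits              # first candidate passing the check
    return None                      # decoding failure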
ISBN (print): 9781450365109
Different parallel frameworks for implementing data analysis applications have been proposed by the HPC and Big Data communities. In this paper, we investigate three task-parallel frameworks (Spark, Dask and RADICAL-Pilot) with respect to their ability to support data analytics on HPC resources, and compare them to MPI. We investigate the data analysis requirements of Molecular Dynamics (MD) simulations, which are significant consumers of supercomputing cycles and produce immense amounts of data: a typical large-scale MD simulation of a physical system of O(100k) atoms over microseconds can produce from O(10) GB to O(1000) GB of data. We propose and evaluate different approaches for the parallelization of a representative set of MD trajectory analysis algorithms, in particular the computation of path similarity and leaflet identification. We evaluate Spark, Dask and RADICAL-Pilot with respect to the abstractions and runtime engine capabilities needed to support these algorithms. We provide a conceptual basis for comparing and understanding the different frameworks that enables users to select the optimal system for each application, as well as a quantitative performance analysis of the different algorithms across the three frameworks.
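The all-pairs structure of path-similarity analysis maps directly onto such task-parallel frameworks; the Dask sketch below uses a toy Frobenius distance as a stand-in for a real path-similarity metric.

# Task-parallel all-pairs trajectory comparison with Dask: each pairwise
# distance is an independent delayed task, the structure that Spark, Dask
# and RADICAL-Pilot all exploit.
import numpy as np
import dask
from dask import delayed

@delayed
def pair_distance(traj_a, traj_b):
    return np.linalg.norm(traj_a - traj_b)      # placeholder metric

trajectories = [np.random.rand(100, 3) for _ in range(8)]   # toy MD paths
tasks = [pair_distance(trajectories[i], trajectories[j])
         for i in range(len(trajectories))
         for j in range(i + 1, len(trajectories))]

distances = dask.compute(*tasks)                # runs all tasks in parallel
print(len(distances), "pairwise distances")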
ISBN (digital): 9781510617421
ISBN (print): 9781510617421
The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of synchronization points at which data are exchanged, and these points often form a performance bottleneck. Our approach appropriately rearranges the calculations inside the transform and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluating on multi-core CPUs, we consistently outperform the original lifting scheme. The evaluation was performed on a 61-core Intel Xeon Phi and 8-core Intel Xeon processors.
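For reference, one level of the separable CDF 5/3 lifting scheme (the baseline whose synchronization steps the proposed scheme reduces, not the new scheme itself) can be written as follows; boundaries are handled by periodic extension for brevity.

# One level of CDF 5/3 lifting along a row. Each step needs its
# predecessor's full output, and those dependencies are the
# synchronization points that hurt multi-core scaling.
import numpy as np

def cdf53_forward(x):
    s, d = x[0::2].astype(float), x[1::2].astype(float)
    d -= 0.5 * (s + np.roll(s, -1))      # predict: detail coefficients
    s += 0.25 * (np.roll(d, 1) + d)      # update: approximation coefficients
    return s, d

approx, detail = cdf53_forward(np.arange(16))
print(approx, detail)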
ISBN (print): 9781538673089
Graph coloring is an important problem in computer science and engineering with numerous applications. As data sizes increase, graphs with millions of nodes are becoming commonplace. Parallel graph coloring algorithms on high-throughput graphics processing units (GPUs) have recently been proposed to color such large graphs efficiently. We present two new graph coloring algorithms for GPUs which improve upon existing algorithms in both coloring speed and quality. The first algorithm, counting-based Jones-Plassmann (CJP), uses counters to implement the classic Jones-Plassmann parallel coloring heuristic in a work-efficient manner. The second algorithm, conflict coloring (CC), achieves higher parallelism than CJP and is based on optimistically coloring the graph using estimates of the chromatic number. We compared CC and CJP with two state-of-the-art GPU coloring algorithms, csrcolor [1] and Deveci et al.'s [2] vertex/edge-based algorithms (which we call VEB), as well as the sequential CPU algorithm ColPack [3]. In terms of coloring quality, CJP and CC are both far better than csrcolor, while CJP uses 10% fewer colors than VEB on average and CC uses 10% more. Compared to ColPack, CJP and CC use 1.3x and 1.5x more colors on non-bipartite graphs, respectively. In terms of speed, CJP is on average 1.5-2x faster than the other algorithms, while CC is 2.7-4.3x faster.
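The round structure of the Jones-Plassmann heuristic behind CJP is easy to state; the sequential Python sketch below shows the logic only, not the counter-based GPU kernel.

# Round-based Jones-Plassmann coloring: in each round, every vertex whose
# random priority beats all of its uncolored neighbors takes the smallest
# color absent from its neighborhood. On a GPU, all such vertices run in
# parallel within a round.
import random

def jones_plassmann(adj):
    n = len(adj)
    priority = [random.random() for _ in range(n)]
    color = [None] * n
    while any(c is None for c in color):
        for v in range(n):
            if color[v] is not None:
                continue
            if all(color[u] is not None or priority[v] > priority[u]
                   for u in adj[v]):
                used = {color[u] for u in adj[v] if color[u] is not None}
                color[v] = next(c for c in range(n) if c not in used)
    return color

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # small test graph
print(jones_plassmann(adj))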
ISBN (digital): 9783030050573
ISBN (print): 9783030050573; 9783030050566
This paper presents new multi-objective scheduling strategies implemented in Docker SwarmKit, a container toolkit for orchestrating distributed systems at any scale. Currently, Docker SwarmKit has one scheduling strategy, called Spread, which uses only a single objective to select, from a set of cloud nodes, the node on which to execute a container. However, the containers that users submit to Docker SwarmKit for scheduling are configured according to multiple criteria, such as the number of CPUs and the memory size. To better address this multi-objective configuration problem, we introduce the concept and implementation of new multi-objective scheduling strategies adapted to Cloud Computing environments and implemented in Docker SwarmKit. The principle of our multi-objective strategies is to select a node that offers a good compromise among the criteria for executing a container. The proposed scheduling strategies are based on a combination of the PROMETHEE and Kung multi-objective decision algorithms for placing containers. Our implementation in Docker SwarmKit and experiments with the new strategies demonstrate the potential of the approach under different scenarios.
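A hedged sketch of the kind of selection such strategies perform (a simplified outranking in the spirit of PROMETHEE, not the authors' SwarmKit code): score node pairs by weighted preferences over free CPU and memory, then place the container on the node with the best net flow.

# Simplified PROMETHEE-style ranking of candidate nodes: for every ordered
# node pair, sum weighted preferences over the criteria, then rank by net
# flow (outgoing minus incoming preference).
def net_flows(nodes, weights):
    names = list(nodes)
    flow = {name: 0.0 for name in names}
    for a in names:
        for b in names:
            if a == b:
                continue
            pref = sum(w * (nodes[a][c] > nodes[b][c])
                       for c, w in weights.items())
            flow[a] += pref          # a outranks b
            flow[b] -= pref          # b is outranked by a
    return flow

nodes = {"node1": {"free_cpus": 4, "free_mem_gb": 8},
         "node2": {"free_cpus": 2, "free_mem_gb": 16}}
weights = {"free_cpus": 0.5, "free_mem_gb": 0.5}

flows = net_flows(nodes, weights)
print(max(flows, key=flows.get), "wins the placement")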