ISBN: 0818676833 (print)
Global predicate detection is a fundamental problem in distributed systems and finds applications in many domains, such as testing and debugging distributed programs. This paper presents an efficient distributed algorithm to detect conjunctive-form global predicates in distributed systems. The algorithm detects the first consistent global state that satisfies a given conjunction of local predicates. The algorithm is distributed because the predicate detection effort, as well as the necessary information, is equally distributed among the processes.
ISBN: 0818676833 (print)
This paper presents on-line perturbation tracking and intrusion removal techniques designed to accommodate delays that occur due to monitoring activities. These accommodations eliminate the effect of monitoring intrusion on the execution behavior and the scheduling of the monitored computation. By maintaining an adjusted time view, the intrusion removal system preserves the execution order of processes and the message selection decisions that would have been made in an unmonitored execution.
ISBN: 9781728111414 (print)
Due to the growth of data scale, distributed machine learning has become more important than ever. Some recent work, such as TuX², shows promise in handling distributed machine learning by leveraging the power of graph computation, but still leaves some key problems unsolved. In this paper, we propose Cymbalo, a new distributed graph processing framework for large-scale machine learning algorithms. To satisfy the specific characteristics of machine learning, Cymbalo employs a heterogeneity-aware data model, a hybrid computing model, and a vector-aware programming model to ensure a small memory footprint, good computation efficiency, and expressiveness. The experimental results show that Cymbalo outperforms Spark by 2.4x-3.2x and PowerGraph by up to 5.8x. Moreover, Cymbalo also outperforms Angel, a recent parameter server system, by 1.6x-2.1x.
ISBN: 0818676833 (print)
We present a general framework for approximation schemes on parallel processor scheduling. We propose epsilon-approximation algorithms for scheduling on identical, uniform, and unrelated machines when the number of processors is fixed. For each of the three problems considered, we perform grouping on job processing times in order to produce a transformed scheduling instance where the number of distinct task types is bounded. We optimally solve the corresponding mixed integer program, and we prove that the optimal makespans for the initial and the transformed problems differ by at most a factor of 1 + epsilon. The complexity of all epsilon-approximation algorithms is O(n), where n is the number of jobs to be scheduled.
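The grouping step can be illustrated with a minimal Python sketch. The abstract does not say how the grouping is done, so this uses geometric rounding, a standard technique that is only assumed here to be similar in spirit: every processing time is rounded up to the nearest power of 1 + epsilon. Each time grows by a factor of at most 1 + epsilon, so the makespan of any schedule on identical machines grows by at most the same factor, while the number of distinct processing times shrinks. The function name `round_up_geometric` and the sample data are hypothetical.

```python
import math


def round_up_geometric(times, eps):
    """Round each positive processing time up to a power of (1 + eps).

    Illustrative only: a standard rounding scheme, not necessarily the
    grouping used in the paper.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    base = 1.0 + eps
    rounded = []
    for p in times:
        if p <= 0:
            raise ValueError("processing times must be positive")
        # Initial guess for the exponent; the loop below corrects any
        # floating-point error in the logarithm.
        k = math.floor(math.log(p) / math.log(base))
        while base ** k < p:
            k += 1
        rounded.append(base ** k)
    return rounded


if __name__ == "__main__":
    jobs = [1.0, 1.2, 1.4, 2.0, 2.2, 3.3]  # hypothetical processing times
    grouped = round_up_geometric(jobs, eps=0.5)
    print(grouped)
    print("distinct job types:", len(set(grouped)))
```

With this rounding, the number of distinct values depends on epsilon and on the ratio between the largest and smallest processing time rather than on the number of jobs.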
ISBN: 0818676833 (print)
This paper presents a parallel algorithm running in O(log m log* m (log log m + log(n/m))) time on an EREW PRAM with O(m/(log m log* m)) processors for the problem of selection in an m x n matrix with sorted rows and columns, m ≤ n. Our algorithm generalizes the result of Sarnath and He [13] for selection in a sorted matrix of equal dimensions, and thus answers the open question they posed. The algorithm is work-optimal when n ≥ m log m, and near-optimal within an O(log log m) factor otherwise. We show that our algorithm can be generalized to solve the selection problem on a set of sorted matrices of arbitrary dimensions.
ISBN: 9781728111414 (print)
We present DataFall, a simple yet effective policy-driven algorithm for decentralized placement and reorganization of replicated data. Without relying on a centralized location for data mapping, DataFall efficiently distributes data objects across storage devices using a mechanism built on multiple hash functions. When producing the data placement, policies can be enforced at the object level in a flexible manner. In addition, with minimal data movement, DataFall is capable of accommodating a wide variety of changes, such as massive topology upgrades and reorganization. The advantages of DataFall over the state of the art are demonstrated through an experimental evaluation in a set of selected scenarios.
ISBN: 9781728111414 (print)
Distributed stream processing can accomplish real-time processing of continuous streaming big data to obtain valuable information with high velocity. To keep stream applications running stably and efficiently, however, continuous online scheduling operations are required in the context of highly dynamic data streams. For this reason, this paper proposes an on-the-fly scheduling strategy for a distributed stream processing environment, which dynamically predicts abnormal events through double exponential smoothing and adopts a traffic-aware active migration protocol to adjust the network routing structure on the fly to balance the inter-worker load. Moreover, an evaluation method is proposed to quantitatively analyze the various scheduling objectives. Finally, we apply the scheduling strategy to a stream processing platform that treats Docker instances as the basic scheduling units. Based on the platform and the evaluation method, we complete performance comparison experiments of the scheduling algorithm. The experimental results indicate that our algorithm performs well in topology throughput, average processing time, and task load balance, and is suitable for deployment in a distributed environment with large-scale nodes and tasks.
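As a rough illustration of the prediction step, the sketch below implements double exponential smoothing (Holt's linear method) in Python and uses the one-step-ahead forecast to flag a possible overload. The abstract gives no parameters or detection rule, so the function names `holt_forecasts` and `predicts_overload`, the smoothing factors `alpha` and `beta`, and the fixed `threshold` rule are all assumptions made for the example.

```python
def holt_forecasts(series, alpha=0.5, beta=0.5):
    """Return one-step-ahead forecasts from double exponential smoothing.

    The i-th forecast predicts series[i + 1]; the last one predicts the
    next, not yet observed, value.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    forecasts = []
    for x in series[1:]:
        forecasts.append(level + trend)  # prediction made before seeing x
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    forecasts.append(level + trend)  # prediction for the next time step
    return forecasts


def predicts_overload(loads, threshold, alpha=0.5, beta=0.5):
    """Hypothetical rule: flag a worker if its forecast load exceeds threshold."""
    return holt_forecasts(loads, alpha, beta)[-1] > threshold


if __name__ == "__main__":
    worker_load = [1, 2, 3, 4, 5]  # hypothetical load samples
    print(holt_forecasts(worker_load))
    print(predicts_overload(worker_load, threshold=5.5))
```

Because the method tracks both a level and a trend, it follows a steadily rising load more closely than single exponential smoothing, which is one plausible reason to use it for anticipating abnormal load.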
ISBN: 0818676833 (print)
We investigate the efficient implementation of algorithms with two-level parallelism on distributed-memory machines. We consider parallel specifications consisting of an upper level of multi-processor tasks, each of which has an internal structure of uni-processor tasks. To achieve an optimal parallel execution time, the parallel execution of such a program requires an optimal scheduling of the multi-processor tasks and an appropriate treatment of the uni-processor tasks. In particular, we consider an important class of parallel programs that are generated within a specific parallel programming model for designing group-SPMD programs for scientific computing. We show how the costs of data redistributions between M-tasks can be taken into consideration and how the special structure of the resulting program can be exploited by using a simple approximation algorithm with provably good performance.
ISBN: 9781728111414 (print)
The Free Lossless Audio Codec (FLAC) format is widely used for audio storage. Even with a lower-performance single-threaded approach, FLAC is easily decoded faster than the rate at which it is played. However, if you wish to transcode or edit long FLAC audio files, decoding times using single-threaded CPU approaches become significant. The FLAC format contains a sequence of frames. These frames vary in size, so start locations are unknown until the previous frame is decoded, which complicates parallelizing decoding. However, because frames start with known fixed bit patterns and each frame contains a frame index, it is possible to locate and decode frames in parallel. In this paper, we present an approach that exploits this characteristic, enabling all the frames to be decoded in parallel. This approach is implemented and evaluated using an NVIDIA GeForce GTX 1080 graphics card, showing a 5x performance improvement over the widely used official implementation running on an Intel Core i7-6770K CPU.
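The idea of locating frames by their fixed bit pattern can be sketched as follows. This is a sequential Python illustration, not the paper's GPU implementation. It relies on the FLAC frame header starting with the 14-bit sync code `11111111111110` followed by a reserved zero bit, so a byte-aligned frame begins with `0xFF` followed by `0xF8` or `0xF9`. The function name `find_candidate_frame_offsets` is hypothetical, and the offsets it returns are only candidates: the same byte pair can occur inside audio data, so a real decoder would still have to validate each candidate, for example against the header CRC.

```python
def find_candidate_frame_offsets(data: bytes) -> list[int]:
    """Return byte offsets where a FLAC frame header could start.

    A match means the two bytes carry the 14-bit sync code plus the
    reserved zero bit; the lowest bit (blocking strategy) may be 0 or 1.
    """
    offsets = []
    for i in range(len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xFE) == 0xF8:
            offsets.append(i)
    return offsets


if __name__ == "__main__":
    # Hypothetical byte stream: two sync-like patterns and one near miss.
    sample = b"\x00\xff\xf8\x01\x02\xff\xf9\x03\xff\xf0"
    print(find_candidate_frame_offsets(sample))
```

Each position is tested independently of every other position, which is what makes this search straightforward to spread across many threads.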
ISBN: 9781728111414 (print)
In this paper, we use the Ffowcs Williams-Hawkings (FW-H) equation of the penetrable surface to simulate the acoustic radiation problem of a monopole source in a three-dimensional uniform flow. In addition, parallel optimization was performed on the Sunway TaihuLight, and an ideal acceleration effect was obtained compared with the Intel CPU. A time-window division method was used to overcome the shortage of local data memory space on the slave cores during the calculation; it achieved the local coupled calculation of the sound source and the far-field sound pressure and obtained ideal results. It is verified that the parallel algorithm has a good acceleration effect and scalability on the Sunway TaihuLight.