ISBN: (print) 9780769547497
Commodity multi-core SMPs may generate an enormous amount of coherence traffic. However, the impact of coherence traffic and snoop filtering on parallel program scalability has not attracted sufficient attention. We experimentally analyze the shared data access patterns of four typical applications with different memory layouts. An optimized OpenMP execution model is derived for each application, with emphasis on data dependencies and the implied coherence messages. Using an 8-core SMP, we present the speedups obtained as the number of cores and the problem scale vary, and we discuss potential limits on scalability due to the application or the SMP. To assess coherence behavior and its impact on the scalability of parallel programs, we present a synthetic benchmark that alternates data block ownership between two cores of the same or different processors. We find that coherence overheads, including snoop filtering, are responsible for a significant limitation on parallel program scalability. For 8-core SMPs, speedup can be reduced by factors of 2.5 and 5 for row-major and column-major access patterns, respectively, compared to the use of private data. A truly parallel coherence protocol implementation is needed to provide a truly scalable shared-memory model.
ISBN: (print) 9781479909087; 9781479909094
The stock market index is a valuable financial tool to measure the state of a segment of the stock market. With high input data rates, real-time index computation is a challenging task. It cannot be done in real time even with today's reasonably high-end computers with many CPU cores. Doing so with CPU-based systems would require server farms with a lot of computing power and would therefore be costly. Thus, the index value is currently computed periodically (not in real time). In this paper we describe our attempt at fast index computation using Graphics Processing Units (GPUs), which usually have several hundred processing cores and are much less expensive than CPU-based solutions. The computation itself is data parallel and therefore suitable for GPU processing. Preliminary results indicate that our approach is promising, as we can compute the index much faster using GPUs than using multi-core CPUs.
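The abstract does not state which index formula is used, so the following is only a minimal sketch of why such a computation is data parallel. It assumes a capitalization-weighted index (sum of price times shares outstanding, divided by a divisor); the function name `cap_weighted_index` and its parameters are hypothetical and not taken from the paper. The per-constituent products are independent of each other (the part a GPU could assign to separate threads), and the final sum is a reduction.

```python
from typing import Sequence


def cap_weighted_index(prices: Sequence[float],
                       shares: Sequence[float],
                       divisor: float) -> float:
    """Capitalization-weighted index value.

    Assumed formula for illustration only; the paper's index
    definition is not given in the abstract.
    """
    if len(prices) != len(shares):
        raise ValueError("prices and shares must have the same length")
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    # Map step: every product is independent of the others, so on a GPU
    # one thread could handle one constituent.
    market_caps = [p * s for p, s in zip(prices, shares)]
    # Reduce step: a plain sum, which parallel hardware can perform as a
    # tree reduction.
    return sum(market_caps) / divisor


if __name__ == "__main__":
    # Two hypothetical constituents.
    print(cap_weighted_index([10.0, 20.0], [100.0, 50.0], divisor=20.0))
```

The sequential list comprehension stands in for the parallel map; on a GPU the same structure would typically become an element-wise kernel followed by a parallel reduction.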
ISBN: (print) 9781479989379
Image rendering generally requires high computing capacity, and rendering a movie on a single machine is very time consuming. Using multiple machines to render a movie requires considerable effort to control the workflow and data. With the emergence of cloud computing, more and more scientists and engineers are moving their tasks from laboratories to public clouds. This migration requires some experience with both the cloud architecture and coding in the cloud. This paper proposes a simple service that accelerates movie rendering on Microsoft Azure. The service, called AzureRender, also introduces task parallelism and cache management to improve performance and reduce cost. A comparative study of image rendering performance and cost between Microsoft Azure and desktop machines is given at the end of the paper.
Very large problems with high resource requirements of both computation and communication could be tackled with large numbers of workstations. However, for LAN-based networks, contention becomes a limiting factor, whereas latency appears to limit communication for WAN-based networks, nominally the Internet. In this paper, we describe a model to analyze the gain of communication latency hiding by overlapping computation and communication. This model illustrates the limitations and opportunities of communication latency hiding for improving speedup of parallel computations that can be structured appropriately. Experiments show that latency hiding techniques increase the feasibility of parallel computing in high-latency networks of workstations across the Internet as well as in multiprocessor systems.
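The abstract does not give the model itself. As a rough illustration of the idea of latency hiding, the sketch below uses a common textbook simplification, which is an assumption here and not the authors' model: without overlap, one step costs computation time plus communication time; with perfect overlap, it costs the larger of the two. The names `step_time` and `hiding_gain` are hypothetical.

```python
def step_time(t_comp: float, t_comm: float, overlap: bool) -> float:
    """Time for one compute/communicate step.

    Simplified assumption: perfect overlap hides the shorter phase
    entirely; no overlap serializes the two phases.
    """
    if t_comp < 0 or t_comm < 0:
        raise ValueError("times must be non-negative")
    return max(t_comp, t_comm) if overlap else t_comp + t_comm


def hiding_gain(t_comp: float, t_comm: float) -> float:
    """Ratio of non-overlapped to overlapped step time (at least 1)."""
    serial = step_time(t_comp, t_comm, overlap=False)
    overlapped = step_time(t_comp, t_comm, overlap=True)
    if overlapped == 0:
        return 1.0  # nothing to do, so nothing to gain
    return serial / overlapped


if __name__ == "__main__":
    for t_comp, t_comm in [(9.0, 1.0), (4.0, 4.0), (1.0, 9.0)]:
        print(t_comp, t_comm, hiding_gain(t_comp, t_comm))
```

Under this simplification the gain is largest when computation and communication take equal time and can never exceed a factor of 2, which is one way to see both the opportunity and the limitation the abstract refers to.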
ISBN: (print) 9780889867048
In this paper, we study a parallel job scheduling model which takes into account both computation time and the overhead from communication between processors. Assuming that a job $J_j$ has a processing requirement $p_j$ and is assigned to $k_j$ processors for parallel execution, the execution time is modeled by $t_j = p_j / k_j + (k_j - 1)c$, where $c$ is the constant overhead cost associated with each processor other than the master processor. In this model, $(k_j - 1)c$ represents the cost for communication and coordination among the processors. This model attempts to accurately portray the actual execution time for jobs running in parallel on multiple processors. Using this model, we study the online algorithm Earliest Completion Time (ECT) and show a lower bound for the competitive ratio of ECT for $m \ge 2$ processors. For $m \le 4$, we show the matching upper bound to complete the competitive analysis for $m = 2, 3, 4$. For large $m$, we conjecture that the ratio approaches $30/13 \approx 2.30769$.
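The execution-time model above, t_j = p_j / k_j + (k_j - 1)c, can be evaluated directly. The sketch below is only an illustration of that formula: `exec_time` computes it, and the hypothetical helper `best_processor_count` searches for the processor count in 1..m that minimizes a single job's execution time. It is not an implementation of the ECT algorithm, whose details are not given in the abstract.

```python
def exec_time(p: float, k: int, c: float) -> float:
    """Execution time t = p / k + (k - 1) * c for a job with processing
    requirement p on k processors and per-processor overhead c."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return p / k + (k - 1) * c


def best_processor_count(p: float, c: float, m: int) -> int:
    """Smallest k in 1..m that minimizes exec_time(p, k, c).

    Hypothetical helper for illustration; not part of the paper.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    # min() returns the first minimizer, so ties go to the smaller k.
    return min(range(1, m + 1), key=lambda k: exec_time(p, k, c))


if __name__ == "__main__":
    p, c, m = 16.0, 1.0, 8
    for k in range(1, m + 1):
        print(k, exec_time(p, k, c))
    print("best k:", best_processor_count(p, c, m))
```

With p = 16 and c = 1, adding processors helps up to k = 4 (time 7) and hurts afterwards, because the overhead term (k - 1)c grows linearly while p / k shrinks ever more slowly. Treating k as continuous, the minimum lies at k = sqrt(p / c).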
The research on complex Brain Networks plays a vital role in understanding the connectivity patterns of the human brain and disease-related alterations. Recent studies have suggested a noninvasive way to model and ana...
ISBN: (print) 9781424413959
In this paper, we propose and study a novel distributed traffic information system. The contributions of the paper include a road-network-aware (RNA) information publication protocol, a distributed query processing protocol with transient memory, and an adaptive interaction model between the two, all in the context of a distributed traffic surveillance infrastructure. Our study focuses on (1) the impact of various information demand characteristics on dissemination strategies, and (2) the adaptive optimal strategies in a distributed manner without prior knowledge of information demand characteristics. Both theoretical and simulation results are presented.
ISBN: (print) 9781479913909
In this paper, a distributed fusion white noise deconvolution estimator is presented for multisensor linear discrete systems with different measurement matrices and correlated measurement noises. It is globally optimal because it is derived from the centralized fusion white noise deconvolution estimator and is identical to the centralized fuser. The proposed white noise fuser is obtained from the local Kalman predictors. Compared with existing globally suboptimal distributed fusion white noise estimators, it avoids the computation of complex covariance matrices. The effectiveness of the proposed results is shown by a Monte Carlo simulation for a Bernoulli-Gaussian input white noise.
Run time variability of parallel application codes continues to be a significant challenge in clusters. We are studying run time variability at the communication level from the perspective of the application, focusing...
The Industrial Internet of Things (IIoT) generally uses cloud computing mode to process tasks. However, in cases of excessive task volume, a high workload will lead to significant processing delays. Edge computing, du...