ISBN: 9781538655559 (print)
Improving the performance of stencil computations is a long-standing optimization challenge because of their inherently heavy memory-access patterns. This problem has been explored in many wave-propagation simulation engines. Moving from acoustic waves (e.g., as used in medical imaging) to elastic waves makes the computation more expensive and increases memory usage. Despite the computational demand, the elevated cost of exploration, combined with the need for higher success rates, is driving the oil & gas industry to adopt elastic anisotropic wave-propagation models as the core of many geophysical imaging mechanisms, extracting subsurface features more accurately and increasing return on investment. To reduce time-to-solution, these more complex stencil codes must run efficiently on modern CPU architectures. Intel Xeon Phi processors have emerged as an energy-efficient solution that provides a good trade-off between market price and computing capability. In this paper, we study the effect of several optimization techniques, using the YASK stencil-generation framework to implement and evaluate a 25-point stencil of an elastic-wave propagation engine for Intel Xeon Phi processors. The results show improvements of up to 7x in computation and 8x in memory bandwidth with respect to the non-tuned version, reaching up to 75% of the attainable floating-point performance at the given operational intensity. We collected performance metrics for a set of the most representative optimizations and revealed the relation between each strategy and fundamental characteristics of both code and hardware.
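To make the memory-access pattern concrete, here is a minimal pure-Python sketch of a 25-point star stencil (one centre point plus four neighbours in each of the six axis directions). It is an illustration only: the function name, the nested-list grid, and the coefficient layout are assumptions, and it does not use YASK or reproduce the paper's elastic-wave kernel.

```python
RADIUS = 4  # 1 centre + 3 axes * 2 directions * 4 neighbours = 25 points


def star_stencil_25(grid, coeffs):
    """Apply a radius-4 star stencil to the interior of a 3D nested list.

    coeffs[0] weights the centre point and coeffs[r] weights the six
    neighbours at distance r (hypothetical, symmetric coefficients).
    Points closer than RADIUS to a boundary are left at 0.0.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(RADIUS, nz - RADIUS):
        for y in range(RADIUS, ny - RADIUS):
            for x in range(RADIUS, nx - RADIUS):
                acc = coeffs[0] * grid[z][y][x]
                for r in range(1, RADIUS + 1):
                    # 6 reads per radius step, spread over three strides
                    acc += coeffs[r] * (
                        grid[z - r][y][x] + grid[z + r][y][x]
                        + grid[z][y - r][x] + grid[z][y + r][x]
                        + grid[z][y][x - r] + grid[z][y][x + r]
                    )
                out[z][y][x] = acc
    return out
```

Each output point needs 25 reads for a single write, and the reads along z and y are far apart in memory. That is the kind of access pattern that blocking and vectorization strategies aim to improve.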
ISBN: 0769520693 (print)
Random walks constitute an attractive technique in distributed computing. In this paper, we present an original method that uses the relationship between electrical resistance and random walks to automatically compute quantities such as cover time and, more generally, any processing-time measure defined through hitting times. The method comes from electrical theory and uses Millman's theorem.
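One well-known instance of this relationship is the commute-time identity: on a connected graph with m edges of unit resistance, the hitting times satisfy H(u,v) + H(v,u) = 2m · R_eff(u,v). The sketch below checks the identity on a small graph by solving the two linear systems exactly. It is an illustration under stated assumptions, not the authors' method: it uses plain Gauss-Jordan elimination rather than Millman's theorem, and all function names are hypothetical.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def hitting_time(adj, src, dst):
    """Expected steps for a simple random walk from src to first reach dst.

    Solves h(v) = 1 + mean of h over the neighbours of v, with h(dst) = 0.
    Assumes src != dst and a connected graph without self-loops.
    """
    nodes = [v for v in adj if v != dst]
    idx = {v: i for i, v in enumerate(nodes)}
    a = [[Fraction(0)] * len(nodes) for _ in nodes]
    for v in nodes:
        a[idx[v]][idx[v]] = Fraction(1)
        for w in adj[v]:
            if w != dst:
                a[idx[v]][idx[w]] -= Fraction(1, len(adj[v]))
    return solve(a, [1] * len(nodes))[idx[src]]


def effective_resistance(adj, u, v):
    """Potential at u when unit current enters at u and leaves at grounded v."""
    nodes = [x for x in adj if x != v]
    idx = {x: i for i, x in enumerate(nodes)}
    a = [[Fraction(0)] * len(nodes) for _ in nodes]
    for x in nodes:
        a[idx[x]][idx[x]] = Fraction(len(adj[x]))  # node degree
        for w in adj[x]:
            if w != v:
                a[idx[x]][idx[w]] -= 1
    b = [1 if x == u else 0 for x in nodes]
    return solve(a, b)[idx[u]]


# Triangle 0-1-2 with a pendant node 3 attached to node 2 (4 edges).
GRAPH = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
EDGES = sum(len(nbrs) for nbrs in GRAPH.values()) // 2
```

For this graph, `effective_resistance(GRAPH, 0, 3)` is 5/3 (two parallel paths of resistance 1 and 2, in series with the pendant edge), so the commute time between nodes 0 and 3 is 2 · 4 · 5/3 = 40/3 steps.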
ISBN: 0818620528 (print)
In this paper we introduce and discuss a model of distributed data processing. For this purpose, a typical application system is analyzed and divided into sub-applications. To fulfill the task of the global application, the sub-applications have to communicate in an appropriate manner by exchanging data or information. In our model the communication between sub-applications is split into two steps: the offering of information by sending sub-applications, and its acceptance by receiving sub-applications. Synchronous and asynchronous processing modes are defined for both steps. With these communication modes, the cooperation between sub-applications can be tailored closely to the specific demands of the application system, which optimizes distributed data processing. Finally, we demonstrate a prototype implementation of a distributed data management system based on the flexible communication mechanism described in the paper.
ISBN: 0818620528 (print)
Semijoin has traditionally been relied upon for reducing the communication cost required for distributed query processing. However, judiciously applying join operations as reducers can lead to further reduction in the communication cost. In view of this fact, we explore in this paper the approach of using join operations, in addition to semijoins, as reducers in distributed query processing. We first show that the problem of determining a sequence of join operations for a query graph can be transformed to that of finding a set of cuts to that graph, where a cut to a graph is a partition of the nodes in that graph. In light of the mapping we develop an efficient heuristic algorithm to determine an effective sequence of join reducers for a query. The algorithm using the concept of divide-and-conquer is shown to have polynomial time complexity. Examples are also given to illustrate our results.
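As background for the abstract above, the following sketch shows how a semijoin acts as a reducer: instead of shipping a whole relation to the remote site, only the join-column values are sent first, and the relation is filtered before it is transferred. The relations, attribute names, and the cost model (number of attribute values shipped) are assumptions made for illustration. The sketch does not implement the paper's cut-based heuristic for choosing join reducers.

```python
def semijoin(r, s, key):
    """Return the tuples of r whose key value also appears in s."""
    keys = {row[key] for row in s}
    return [row for row in r if row[key] in keys]


def join(r, s, key):
    """Hash join of r and s on a shared attribute."""
    index = {}
    for row in s:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in r for b in index.get(a[key], [])]


def shipped_values(rel):
    """Hypothetical cost model: count attribute values sent over the network."""
    return sum(len(row) for row in rel)


# ORDERS lives at site A, CUSTOMERS at site B; the query is answered at B.
ORDERS = [
    {"order": 1, "cust": 10}, {"order": 2, "cust": 11},
    {"order": 3, "cust": 12}, {"order": 4, "cust": 10},
    {"order": 5, "cust": 13}, {"order": 6, "cust": 14},
]
CUSTOMERS = [{"cust": 10, "city": "Oslo"}, {"cust": 12, "city": "Rome"}]

# Plan 1: ship all of ORDERS to site B.
naive_cost = shipped_values(ORDERS)

# Plan 2: ship the distinct cust values to A, reduce ORDERS, ship the rest.
key_projection = [{"cust": c} for c in {row["cust"] for row in CUSTOMERS}]
reduced_orders = semijoin(ORDERS, key_projection, "cust")
semijoin_cost = shipped_values(key_projection) + shipped_values(reduced_orders)
```

Under this cost model the naive plan ships 12 values and the semijoin plan ships 8, while both plans produce the same join result. A join reducer goes one step further and ships the joined tuples themselves when that is cheaper.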
The main goal of this workshop is to provide a timely forum for the exchange and dissemination of new ideas, techniques and research in the field of the new parallel and distributed computational models. The workshop ...
ISBN: 0818620889 (print)
Generic queuing models of parallel systems with K ≥ 2 exponential servers, where jobs may be split into K independent tasks, are considered. The queuing of jobs is distributed if each server has its own queue and centralized if there is a common queue. The scheduling of jobs is no splitting if all tasks of a job must run on one processor and splitting if they can run concurrently on different processors. Exact and approximate expressions for the mean response time, Tr:K, of the rth, r = 1, 2, ..., K, departing task in a job are obtained and compared for four models: distributed/splitting, distributed/no splitting, centralized/splitting, and centralized/no splitting. The queuing models are described. Exact and approximate analyses of the various models are presented where expressions are obtained for the mean task response time. The various models are compared and applications in the areas of distributed query processing and parallel systems are included.
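For intuition about the rth departing task, consider the simplest special case, which is an assumption of this sketch and not one of the paper's four models: all K tasks of a job start at the same moment on K idle servers, each with an exponential service time of rate mu, and there is no queueing. The departure times are then order statistics of K independent exponentials, and by memorylessness the mean time of the rth departure is the sum of 1/((K - i)·mu) for i = 0, ..., r - 1. The function name is hypothetical.

```python
from fractions import Fraction


def mean_rth_departure(r, k, mu=1):
    """Mean time of the r-th departure among k tasks started together.

    Assumes k independent exponential(mu) service times and no queueing.
    While i tasks have finished, k - i are still running, so the next
    departure takes 1 / ((k - i) * mu) on average.
    """
    if not 1 <= r <= k:
        raise ValueError("r must satisfy 1 <= r <= k")
    return sum(Fraction(1, (k - i) * mu) for i in range(r))
```

With K = 4 and mu = 1, the first task departs after 1/4 time units on average and the last after 1/4 + 1/3 + 1/2 + 1 = 25/12, which shows how the final task dominates the response time of a split job even before queueing delays are taken into account.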
Virtual Reality (VR) is an exciting yet challenging area. In commercial VR systems especially, one of the main challenges is maintaining relatively constant performance under varying load and at low cost. This paper presents a parallel and distributed solution to this problem in the context of a commercial entertainment VR system. The architecture of the system is introduced, and the distribution strategies and the parallel-processing mechanism are discussed.
The proceedings contain 92 papers from the 1996 International Symposium on Parallel Architectures, Algorithms and Networks. Topics discussed include: massively parallel processors; distributed memory parallel computers; multistage interconnection networks; Banyan switching fabrics; internetworking; transmission control protocol/Internet protocol networks; train traffic and event driven simulations; universal broadband network access devices; customer premises networks; and parallel random access machines.
ISBN: 0818620889 (print)
The authors propose a distributed dynamic action scheme that allows a recovery concept permitting efficient distributed computing during normal operation to be combined with efficient exception handling in the case of an effective error. They provide a dynamic action model tailored to the dynamic nature of distributed processing. This model offers a recovery concept which allows the recovery region of a recovery line to surmount the size of the corresponding computation in order to gain high efficiency during normal operation. By running the versions of a recovery block as distributed actions, it becomes possible to incorporate a recovery concept that allows efficient distributed processing during normal operation and prompt reaction in the case of error by running the different versions in parallel. To implement the dynamic action model efficiently, a redundant recovery graph keeps track of recovery regions. On the basis of this graph the authors provide decentralized protocols that produce a consistent system state and are fast, efficient, and concurrent with normal system activity.
A high-throughput matching memory (MM) for a data-driven microprocessor is discussed. An MM can be constructed using a hashing memory. However, one of the biggest problems with hashing memory is the necessity for selective processing whenever hashed address conflicts occur. To eliminate this problem, the MM incorporates a small amount of associative memory (32 words × 50 bits) as well as the hashing memory (512 words × 42 bits). The matching operation is subdivided into three pipeline stages, all controlled by the elastic pipeline scheme. With this structure, an MM with a high throughput of 100 mega-accesses/s can be realized.
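The combination of a hashing memory with a small associative overflow store can be sketched in software as follows. This is a behavioural illustration under stated assumptions, not the described hardware: the class and method names, the hash function, and the behaviour when the associative memory is full are all hypothetical, and the three-stage elastic pipeline is not modelled. Only the 512-word and 32-word sizes come from the abstract.

```python
class MatchingMemory:
    """Direct-mapped hashing memory backed by a small associative store."""

    def __init__(self, hash_words=512, assoc_words=32):
        self.slots = [None] * hash_words  # one (tag, operand) per hashed address
        self.assoc = {}                   # overflow entries, searched by tag
        self.assoc_words = assoc_words

    def _index(self, tag):
        # Hypothetical hash function: the real hardware's hash is not given.
        return hash(tag) % len(self.slots)

    def present(self, tag, operand):
        """Offer a token; return (stored_operand, operand) if its partner waits.

        Otherwise the token is stored and None is returned.
        """
        i = self._index(tag)
        entry = self.slots[i]
        if entry is not None and entry[0] == tag:
            self.slots[i] = None          # partner found in the hashing memory
            return entry[1], operand
        if tag in self.assoc:             # partner found in the overflow store
            return self.assoc.pop(tag), operand
        if entry is None:
            self.slots[i] = (tag, operand)
            return None
        # Hashed-address conflict: the slot holds a different tag.
        if len(self.assoc) >= self.assoc_words:
            raise OverflowError("associative memory full")
        self.assoc[tag] = operand
        return None
```

With a 4-word hashing memory, integer tags 1 and 5 map to the same address. The second token is placed in the associative store, so both pairs can still be matched without any special conflict handling by the caller.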