ISBN (print): 1601320841
The OTIS (Optical Transpose Interconnection System) is one of the efficient models of optoelectronic parallel computers, and the OTIS-Hypercube is one of the popular models of optoelectronic parallel computers. In this paper, we present two parallel algorithms for polynomial interpolation on a 64-processor OTIS-Hypercube interconnection network. We consider $N$-point polynomial interpolation on this network with $N$ processors. The algorithm for Lagrange polynomial interpolation requires $(1.5N^{0.5} + 0.25N + 3)$ electronic moves and $(N^{0.5} + 2)$ optical moves. If the initial data points are assumed to be already available, it requires only $(1.5N^{0.5} + 0.25N)$ electronic moves and $N^{0.5}$ optical moves. We also show that our algorithm is better than those in [15] in terms of AT cost.
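For reference, the Lagrange form being parallelized is $p(x) = \sum_{j=0}^{N-1} y_j \prod_{m \neq j} \frac{x - x_m}{x_j - x_m}$. The Python sketch below is a sequential illustration under our own assumptions, not the OTIS-Hypercube algorithm: it evaluates each of the $N$ terms independently (the kind of per-term work one might assign to one processor each), and it plugs $N = 64$ into the move-count formulas quoted above, giving 31 electronic and 10 optical moves, or 28 and 8 when the data points are already in place. The function names are hypothetical.

```python
from fractions import Fraction
import math


def lagrange_term(xs, ys, j, x):
    """Compute y_j * L_j(x), the j-th term of the Lagrange form.

    Each term needs only the shared x-coordinates and its own y_j, so the
    N terms are independent of one another.
    """
    num = Fraction(1)
    den = Fraction(1)
    for m, xm in enumerate(xs):
        if m != j:
            num *= x - xm
            den *= xs[j] - xm
    return ys[j] * num / den


def lagrange_eval(xs, ys, x):
    """Evaluate the interpolating polynomial at x by summing all N terms."""
    return sum(lagrange_term(xs, ys, j, x) for j in range(len(xs)))


def otis_move_counts(n, data_preloaded=False):
    """(electronic, optical) move counts as stated in the abstract for N processors."""
    r = math.sqrt(n)
    if data_preloaded:
        return 1.5 * r + 0.25 * n, r
    return 1.5 * r + 0.25 * n + 3, r + 2


if __name__ == "__main__":
    xs = [Fraction(v) for v in (0, 1, 2, 4)]
    ys = [x * x for x in xs]  # samples of f(x) = x^2
    print(lagrange_eval(xs, ys, Fraction(3)))         # 9
    print(otis_move_counts(64))                       # (31.0, 10.0)
    print(otis_move_counts(64, data_preloaded=True))  # (28.0, 8.0)
```

Exact `Fraction` arithmetic is used so that the interpolation check is not blurred by rounding error.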
ISBN (print): 1892512416
Redundant arrays of independent disks (RAID) have been widely used to provide mass storage with high performance and reliability. Among RAID architectures, RAID-1 and RAID-5 are the most popular. However, RAID-1 incurs excessive redundancy, and RAID-5 suffers from poor write performance. SMDA (Striped Mirroring Disk Array) was recently proposed to overcome the small-write problem of disk arrays. SMDA stores the original data in two ways: one copy on a single disk and the other striped across multiple disks in RAID-0 fashion [2]. In this paper, we propose a new disk array architecture, called distributed Sparing-Striped Mirroring Disk Array (ds-SMDA), that adds distributed on-line spares to SMDA. ds-SMDA increases the parallelism of small read and write operations in the normal state and reduces seek time during recovery. Moreover, it can recover from any double disk failure.
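To make the two-copy idea concrete, the following Python sketch builds a hypothetical SMDA-style placement: every block has a primary copy on its home disk and a mirror copy, and the mirrors of one disk's blocks are spread (striped) over all the other disks. The placement rule `(d + 1 + j) % n` is our assumption for illustration, not the layout defined in [2] or in this paper, and the distributed spares of ds-SMDA are not modeled. The check confirms that any single disk failure leaves a surviving copy of every block, and that this bare two-copy layout does not survive every simultaneous double failure.

```python
from itertools import combinations


def smda_layout(num_disks):
    """Return {(home_disk, j): (primary_disk, mirror_disk)} for a hypothetical layout.

    Disk d holds num_disks - 1 primary blocks; the mirror of block j goes to
    disk (d + 1 + j) % num_disks, so the mirrors of disk d land on every
    other disk exactly once.
    """
    layout = {}
    for d in range(num_disks):
        for j in range(num_disks - 1):
            layout[(d, j)] = (d, (d + 1 + j) % num_disks)
    return layout


def survives(layout, failed):
    """True if every block still has at least one copy on a non-failed disk."""
    return all(p not in failed or m not in failed for p, m in layout.values())


if __name__ == "__main__":
    layout = smda_layout(5)
    print(all(survives(layout, {f}) for f in range(5)))  # True
    # Bare two-copy layout (no spares): some simultaneous double failures lose data.
    print(all(survives(layout, set(pair)) for pair in combinations(range(5), 2)))  # False
```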
ISBN (print): 9783540680673
In this paper, we introduce and evaluate two prefetching techniques to improve the performance of Java applications executed on the grid. These techniques are evaluated experimentally in two grid environments by running test applications on two different grid deployment configurations. Our testbed is SUMA/G, a grid platform specifically targeted at executing Java bytecode on Globus grids. The experimental results show that these techniques can be effective at improving the performance of applications run on the grid, especially compute-intensive scientific applications.
ISBN (print): 9781479927289
This research presents analytical models, based on an energy consumption metric, that analyze the impact of dynamic frequency scaling on the energy consumption of various architectural design choices for hybrid-architecture chips. The power consumption implications of different processing schemes and chip configurations were also analyzed. The analysis shows that choosing the optimal hardware configuration can increase energy savings considerably while keeping performance losses at tolerable levels.
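As a rough illustration of the energy/performance trade-off such models capture, the sketch below uses a common first-order DVFS model for a single processing element. This model is our assumption and not necessarily the one used in this research: dynamic power $C V^2 f$ with supply voltage scaling linearly with frequency ($V = k f$), a constant static power $P_s$, and run time $W/f$ for a fixed workload $W$, so that $E(f) = W\,(C k^2 f^2 + P_s / f)$. The code picks the lowest-energy frequency whose slowdown relative to the fastest setting stays within a tolerance. All parameter values are hypothetical.

```python
def energy(f, work=1.0, cap=1.0, k=1.0, p_static=2.0):
    """Energy for a fixed workload at frequency f under the assumed model.

    Power = cap * (k*f)**2 * f (dynamic) + p_static; time = work / f.
    """
    power = cap * (k * f) ** 2 * f + p_static
    return power * (work / f)


def best_frequency(freqs, max_slowdown=None, **model):
    """Pick the lowest-energy frequency, optionally within a slowdown budget.

    Slowdown is measured against the fastest available frequency, since run
    time scales as 1/f in this model.
    """
    f_max = max(freqs)
    candidates = [
        f for f in freqs
        if max_slowdown is None or f_max / f <= max_slowdown
    ]
    return min(candidates, key=lambda f: energy(f, **model))


if __name__ == "__main__":
    freqs = [0.5, 1.0, 1.5, 2.0]  # normalized frequency levels (hypothetical)
    print(best_frequency(freqs))                    # 1.0
    print(best_frequency(freqs, max_slowdown=1.5))  # 1.5
```

With these numbers the unconstrained optimum ($f = 1.0$) halves performance, so a 1.5x slowdown budget forces the higher-energy choice $f = 1.5$.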
ISBN (print): 0769517609
Efficient determination of processing termination at barrier synchronization points can play an important role in the overall throughput of parallel and distributed computing systems. Even though relatively efficient termination detection techniques have been proposed for certain environments, no effective performance analysis methodology has been introduced to determine which application attributes favor the use of a particular termination detection technique. This gap has hindered the adoption and development of termination detection schemes. This paper addresses the problem by developing a communication-pattern-based methodology that improves the precision of the theoretical performance analysis of termination detection techniques, in lieu of laborious experiments or potentially subjective benchmarking studies. By measuring message complexity with respect to idle periods, it provides a simple and effective way to evaluate existing termination detection techniques or to design new termination detection algorithms.
ISBN (print): 1601320841
In this paper, it is shown how linear cellular automata can be computed via the parallel convolution algorithm. It is then shown that any finite forward iteration of a linear cellular automaton map can be computed directly using the Z-transform.
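As a small illustration (our own example, not taken from the paper), consider the elementary rule-90 automaton, which is linear over GF(2): each new cell is the XOR of its two neighbours. One step is then a convolution of the configuration with the kernel `[1, 0, 1]` modulo 2. In Z-transform (polynomial) terms the kernel is $1 + z^2$, and because convolution is associative, $t$ steps equal a single convolution with $(1 + z^2)^t$. The sketch assumes an unbounded line whose cells outside the given configuration are zero, and it uses a plain sequential convolution rather than a parallel one.

```python
def convolve_mod2(a, b):
    """Full linear convolution of two 0/1 sequences, reduced modulo 2."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj  # XOR accumulates the sum mod 2
    return out


RULE90 = [1, 0, 1]  # new cell = left neighbour XOR right neighbour


def step(cells):
    """One rule-90 step on a zero-padded line; the output grows by one cell per side."""
    return convolve_mod2(cells, RULE90)


def kernel_power(kernel, t):
    """Kernel polynomial raised to the t-th power over GF(2)."""
    result = [1]
    for _ in range(t):
        result = convolve_mod2(result, kernel)
    return result


def iterate_direct(cells, t):
    """t steps at once: one convolution with kernel**t (the Z-transform view)."""
    return convolve_mod2(cells, kernel_power(RULE90, t))


if __name__ == "__main__":
    start = [1]  # a single live cell
    stepwise = start
    for _ in range(4):
        stepwise = step(stepwise)
    print(stepwise == iterate_direct(start, 4))  # True
    print(kernel_power(RULE90, 4))  # [1, 0, 0, 0, 0, 0, 0, 0, 1]
```

The last line reflects the GF(2) identity $(1 + z^2)^4 = 1 + z^8$, which is why rule 90 produces the Sierpinski pattern.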
ISBN (print): 1601320841
Solving linear systems with a large number of variables is at the core of many scientific problems, and parallel processing techniques for solving such systems have received much attention in recent years. A pivotal theme in the literature is the application of LU decomposition, which factorizes an $N \times N$ square matrix into two triangular matrices so that the resulting linear system can be solved more easily, in $O(N^2)$ work. The LU decomposition itself, however, has an inherent computational complexity of $O(N^3)$, and it is challenging to parallelize. This paper proposes a highly parallel methodology for solving large-scale, dense linear systems by means of a novel application of Cramer's Rule. A numerically stable scheme is described, yielding an overall computational complexity of $O(N)$ with $N^2$ processing units.
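For reference, Cramer's Rule gives each unknown as a ratio of determinants, $x_i = \det(A_i)/\det(A)$, where $A_i$ is $A$ with column $i$ replaced by $b$. The sketch below is a plain sequential Python illustration of that rule using exact rational arithmetic and Gaussian-elimination determinants; it is not the paper's numerically stable parallel scheme. It only shows the property a parallel method can exploit: the $N$ determinants $\det(A_i)$ are independent of one another, so they could in principle be computed concurrently.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant by Gaussian elimination with exact Fractions (O(N^3))."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero pivot in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def cramer_solve(a, b):
    """Solve A x = b via Cramer's Rule; each det(A_i) is an independent task."""
    d = determinant(a)
    if d == 0:
        raise ValueError("matrix is singular")
    solution = []
    for i in range(len(a)):
        # A_i: replace column i of A with b.
        a_i = [row[:i] + [b_r] + row[i + 1:] for row, b_r in zip(a, b)]
        solution.append(determinant(a_i) / d)
    return solution


if __name__ == "__main__":
    A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
    b = [8, -11, -3]
    print(cramer_solve(A, b))  # [Fraction(2, 1), Fraction(3, 1), Fraction(-1, 1)]
```

Computed this naively, the method costs $N + 1$ determinant evaluations, which is why practical schemes such as the one described above need a more careful formulation.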
ISBN (print): 9780769549521; 9781467362399
The era of distributed computing, in which applications are executed on platforms such as clusters, grids, and clouds, has shown the need to take into account the communications that take place on distributed computer architectures when executing applications. In that environment, several communication-aware mapping techniques have been proposed for improving system performance, for both off-chip and on-chip networks. Some of these proposals are based on heuristic search for finding pseudo-optimal assignments of a given population of tasks to processing elements. Technological improvements have significantly increased problem sizes by multiplying the number of processor cores in each chip. Therefore, proposals based on heuristic search must be accelerated in order to explore larger search domains within the same execution times. In this paper, we present a comparative study of parallel versions of the local search method for communication-aware task mapping. Unlike other comparative studies of heuristic methods implemented on GPUs, we compare the performance of the GPU version with that of an MPI parallel version, in terms of both execution time and the fitness values obtained. The MPI version was executed on a cluster optimized for MPI applications. We considered a GPU with the Fermi architecture and mapped the local search algorithm onto it to improve performance. The results show that the parallel implementation on a single GPU provides fitness function values similar to those of the MPI implementation on the cluster. However, the execution times required by the GPU implementation are significantly lower than those of the MPI implementation, and the difference grows with the size of the parallel system.
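To illustrate what a communication-aware local search evaluates, here is a minimal sequential Python sketch. The cost model is our assumption, not the paper's fitness function: tasks are placed one-to-one on the cores of a 2-D mesh, each pair of tasks exchanges a given traffic volume, and the cost is the sum of traffic multiplied by Manhattan hop distance. The search repeatedly tries swapping the cores of two tasks and keeps any swap that lowers the cost (first-improvement hill climbing). A parallel version would typically evaluate many candidate swaps concurrently; how the GPU and MPI versions above partition that work is not shown here.

```python
import random


def hop_distance(core_a, core_b, width):
    """Manhattan distance between two cores of a 2-D mesh with the given width."""
    ax, ay = core_a % width, core_a // width
    bx, by = core_b % width, core_b // width
    return abs(ax - bx) + abs(ay - by)


def mapping_cost(mapping, traffic, width):
    """Sum over task pairs of traffic volume times hop distance of their cores."""
    return sum(
        volume * hop_distance(mapping[s], mapping[t], width)
        for (s, t), volume in traffic.items()
    )


def local_search(traffic, num_tasks, width, seed=0, max_passes=100):
    """First-improvement swap local search starting from a random mapping.

    mapping[task] = core; tasks and cores are in one-to-one correspondence.
    """
    rng = random.Random(seed)
    mapping = list(range(num_tasks))
    rng.shuffle(mapping)
    cost = mapping_cost(mapping, traffic, width)
    for _ in range(max_passes):
        improved = False
        for i in range(num_tasks):
            for j in range(i + 1, num_tasks):
                mapping[i], mapping[j] = mapping[j], mapping[i]
                new_cost = mapping_cost(mapping, traffic, width)
                if new_cost < cost:
                    cost, improved = new_cost, True
                else:
                    mapping[i], mapping[j] = mapping[j], mapping[i]  # undo
        if not improved:
            break  # local optimum with respect to pairwise swaps
    return mapping, cost


if __name__ == "__main__":
    # Hypothetical example: a 4-task pipeline on a 2 x 2 mesh.
    chain = {(0, 1): 10, (1, 2): 10, (2, 3): 10}
    mapping, cost = local_search(chain, num_tasks=4, width=2)
    print(mapping, cost)  # cost is 30: consecutive tasks end up on adjacent cores
```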
ISBN (print): 9781467387767
With each technology improvement, parallel systems grow larger, and the impact of the interconnection network becomes more prominent. Random topologies and their variants have recently received increasing attention due to their low diameter, low average shortest path length, and high scalability. However, existing supercomputers still prefer torus and fat-tree topologies, because many existing parallel algorithms are optimized for them and their interconnect implementation is more straightforward in terms of floor layout. In this paper, we investigate the performance of traditional and emerging parallel workloads on these network topologies using SimGrid, a discrete-event simulation framework. We observe that the random topology performs better for the Fourier Transform (FT), Graph500, and Himeno benchmarks, improving on the corresponding torus by 18 percent on average. Based on this study, we recommend using random topologies in current and future supercomputers for these scientific and big-data analysis parallel applications.
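The diameter and average shortest path length mentioned above are easy to compute for small graphs by breadth-first search. The following Python sketch compares an 8 x 8 2-D torus (degree 4) with one simple random-topology variant chosen by us for illustration: a ring plus two random perfect matchings, giving degree at most 4. Both graphs and their sizes are assumptions; this is not the paper's SimGrid setup, and it computes only hop-count metrics, not application performance.

```python
import random
from collections import deque


def torus_2d(k):
    """Adjacency sets of a k x k 2-D torus; node id = row * k + col."""
    adj = {v: set() for v in range(k * k)}
    for r in range(k):
        for c in range(k):
            v = r * k + c
            for dr, dc in ((0, 1), (1, 0)):
                u = ((r + dr) % k) * k + (c + dc) % k
                adj[v].add(u)
                adj[u].add(v)
    return adj


def ring_plus_random_matchings(n, matchings=2, seed=0):
    """A ring (to guarantee connectivity) plus random perfect matchings; n must be even."""
    rng = random.Random(seed)
    adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    for _ in range(matchings):
        nodes = list(range(n))
        rng.shuffle(nodes)
        for a, b in zip(nodes[::2], nodes[1::2]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def path_stats(adj):
    """Return (diameter, average shortest path length) via BFS from every node."""
    total, pairs, diameter = 0, 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        assert len(dist) == len(adj), "graph is disconnected"
        total += sum(dist.values())
        pairs += len(dist) - 1
        diameter = max(diameter, max(dist.values()))
    return diameter, total / pairs


if __name__ == "__main__":
    print("torus 8x8:", path_stats(torus_2d(8)))  # diameter 8, average 256/63
    print("random   :", path_stats(ring_plus_random_matchings(64)))
```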
ISBN (print): 9780769539393
Efficient processing of similarity joins is important for a large class of data analysis and data-mining applications. This primitive finds all pairs of records within a predefined distance threshold of each other. However, most existing approaches are based on spatial join techniques designed primarily for data in a vector space. Treating data collections as metric objects brings a great advantage in generality, because a single metric technique can be applied to many specific search problems that are quite different in nature. In this paper, we concentrate on a special form of join, the Self Similarity Join, which retrieves pairs from the same dataset. In particular, we consider the case in which the dataset is split into subsets that are searched for self similarity joins independently (e.g., in a distributed computing environment). To this end, we formalize the abstract concept of epsilon-Cover, prove its correctness, and demonstrate its effectiveness by applying it to two real implementations on a large real-life dataset.
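The following Python sketch illustrates the problem setting rather than the paper's epsilon-Cover construction, which is not reproduced here. A brute-force self similarity join returns all pairs within distance `eps`; when the data are split into disjoint subsets that are joined independently, pairs straddling two subsets are missed. As a simple one-dimensional stand-in of our own, the partitioned version gives each subset an extra margin of width `eps` on its left, so every qualifying pair lands together in at least one subset. The 1-D points, the absolute-difference metric, and the range partitioning are all assumptions.

```python
from itertools import combinations


def self_similarity_join(points, eps):
    """All index pairs (i, j), i < j, whose points are within eps (brute force)."""
    return {
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if abs(points[i] - points[j]) <= eps
    }


def partitioned_join(points, eps, boundaries, margin=True):
    """Join each range [lo, hi) independently and merge the results.

    With margin=True, each subset also holds points in [lo - eps, lo), so a
    pair split across a boundary is still seen by the subset owning its
    larger point. With margin=False, such pairs are lost.
    """
    edges = [float("-inf")] + list(boundaries) + [float("inf")]
    result = set()
    for lo, hi in zip(edges, edges[1:]):
        start = lo - eps if margin else lo
        members = [i for i, p in enumerate(points) if start <= p < hi]
        for a, b in combinations(members, 2):
            if abs(points[a] - points[b]) <= eps:
                result.add((a, b))  # members are ascending, so a < b
    return result


if __name__ == "__main__":
    pts = [0.0, 0.9, 1.2, 2.05, 3.0, 3.1]
    eps = 0.5
    exact = self_similarity_join(pts, eps)
    print(sorted(exact))                                                   # [(1, 2), (4, 5)]
    print(partitioned_join(pts, eps, [1.0, 3.05]) == exact)                # True
    print(partitioned_join(pts, eps, [1.0, 3.05], margin=False) == exact)  # False
```

Note that the margin duplicates some points across subsets; the result is a set, so pairs found by more than one subset are reported once.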