ISBN: 9783642280726; 9783642280733 (print)
In the literature, various two-level interconnection networks have been proposed using hypercubes or star graphs. In this paper, a new two-level interconnection network topology called the Metastar, denoted Mstar(k,m), is introduced. The proposed network uses star graphs as its basic building blocks: the network at the lower level is a star, while the network at the higher level is a cube. Its topological parameters, including packing density, degree, diameter, cost, average distance, and Hamiltonicity, are investigated. Message routing and broadcasting algorithms are also proposed. A performance analysis in terms of the topological parameters shows that the proposed network is a suitable candidate for large-scale computing.
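As a rough illustration of the lower-level building block only, the sketch below builds the standard star graph S_n (nodes are permutations of n symbols; each edge swaps the first symbol with another) and measures its degree and diameter by breadth-first search. It does not model Mstar(k,m) itself, whose construction is not given in the abstract; the function names are hypothetical.

```python
from collections import deque
from math import factorial


def star_graph_neighbors(node):
    """Yield the neighbors of a permutation in the star graph S_n.

    Each neighbor is obtained by swapping the symbol in position 0
    with the symbol in position i, for i = 1 .. n-1.
    """
    for i in range(1, len(node)):
        swapped = list(node)
        swapped[0], swapped[i] = swapped[i], swapped[0]
        yield tuple(swapped)


def star_graph_diameter(n):
    """Return the diameter of S_n using BFS from the identity permutation.

    S_n is vertex-transitive, so the eccentricity of one node equals
    the diameter of the whole graph.
    """
    start = tuple(range(n))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in star_graph_neighbors(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    # Every one of the n! permutations must be reachable.
    assert len(dist) == factorial(n)
    return max(dist.values())


if __name__ == "__main__":
    for n in (3, 4, 5):
        degree = len(list(star_graph_neighbors(tuple(range(n)))))
        print(n, factorial(n), degree, star_graph_diameter(n))
```

For small n the measured values agree with the known properties of the star graph: n! nodes, degree n-1, and diameter floor(3(n-1)/2).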
ISBN: 9780769548791 (print)
Modern HPC applications have significant I/O requirements. To meet them, MPI provides the MPI-IO API for parallel file access. The ROMIO library implements MPI-IO and provides efficient support for parallel I/O in C and Fortran applications. Java-based MPI-like libraries such as MPJ Express and F-MPJ have emerged, but they lack parallel I/O support. Little research has been done on Java-based ROMIO-like libraries because no MPI-IO-like API is available for the Java language. In this paper, we take a first step towards a parallel I/O API in Java by evaluating the newly introduced Java NIO API against the legacy Java I/O API. We propose two simple approaches for performing parallel file I/O using NIO and evaluate them on two different computational platforms. The implementation of the proposed approaches exploits the view buffers of the NIO API to perform efficient array-based file I/O operations from multiple processes. We report encouraging speedups and suggest that the design of a parallel I/O API in Java should be based on the NIO API.
ISBN: 9780769548791 (print)
In this paper, we implement a novel parallelized Local Binary Pattern (LBP) based face recognition algorithm on a GPU. High performance is achieved by maximizing the use of the resources available on the GPU. The launch of GPU programming tools such as the Open Computing Language (OpenCL) and CUDA has boosted the development of various applications on GPUs. We implement the parallelized LBP algorithm using OpenCL programming tools. Programs developed under OpenCL enable us to use the GPU for general-purpose computation with improved execution time. The experimental results, based on an implementation on an AMD 6500 GPU, show that the computational performance of the system increases by up to 30-fold for 1024x1024 images. The relative computational efficiency increases with the size of the image. This paper addresses several parallelization problems related to memory access and updating, divergent execution paths, and understanding and realizing OpenCL's concurrency and execution models.
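For readers unfamiliar with LBP, here is a minimal sequential sketch of the basic 3x3 operator in plain Python. It is not the paper's OpenCL kernel; the neighbor ordering, the `>=` comparison, and the function names are assumptions chosen for illustration. Each output pixel depends only on its own 3x3 neighborhood, which is the property that makes the operator a natural fit for one-work-item-per-pixel GPU execution.

```python
# Clockwise neighbor offsets starting at the top-left pixel.
# The starting point and direction are assumptions; other orderings
# give a permuted but equivalent code.
NEIGHBOR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, 1), (1, 1), (1, 0),
    (1, -1), (0, -1),
]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the interior pixel at (row, col).

    A neighbor contributes a 1 bit when its intensity is greater than
    or equal to the intensity of the center pixel.
    """
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """Compute LBP codes for all interior pixels of a 2D list of intensities.

    Border pixels are skipped, so the result is two rows and two
    columns smaller than the input.
    """
    rows, cols = len(image), len(image[0])
    return [
        [lbp_code(image, r, c) for c in range(1, cols - 1)]
        for r in range(1, rows - 1)
    ]


if __name__ == "__main__":
    sample = [
        [9, 0, 0, 0],
        [0, 5, 5, 0],
        [0, 0, 0, 0],
    ]
    print(lbp_image(sample))
```

In a face recognition pipeline the resulting codes are typically collected into histograms per image region and compared between faces; that stage is omitted here.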
ISBN: 9780769547077 (print)
In this paper, we develop a novel strategy to compute the transition matrix for the projection problem in a distributed fashion through gossiping in wireless sensor networks. Until now, the transition matrix had to be computed off-line by a third party and then provided to the network. The subspace projection problem is useful in various application scenarios (e.g., spectral spatial maps in cognitive radios) and consists of projecting the observed sampled spatial field onto a lower-dimensional subspace of interest. Although exact computation of the optimal transition matrix is not feasible in a distributed way, we develop an algorithm based on well-known results from linear algebra and a distributed genetic algorithm to compute an approximation of the optimal matrix to a desired precision.
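The simplest instance of projection by gossiping is average consensus: repeated pairwise averaging drives every node's value to the network mean, which is the projection of the initial field onto the one-dimensional subspace spanned by the all-ones vector. The sketch below shows only this special case on a ring of nodes. It is not the paper's algorithm, which computes a transition matrix for a general subspace with a distributed genetic algorithm; the ring topology and the function name are assumptions.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Run randomized pairwise gossip on a ring of nodes.

    In each round a random node i and its right-hand ring neighbor
    exchange values and both adopt their average. The sum of all
    values is preserved, so the common limit is the network mean.
    """
    rng = random.Random(seed)  # fixed seed for reproducible runs
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        i = rng.randrange(n)
        j = (i + 1) % n  # neighbor on the ring (assumed topology)
        avg = (x[i] + x[j]) / 2.0
        x[i] = x[j] = avg
    return x


if __name__ == "__main__":
    field = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
    result = gossip_average(field, rounds=500)
    print(result)
    print(sum(field) / len(field))
```

Each round applies a symmetric, doubly stochastic matrix to the state vector. Projecting onto a richer subspace requires a transition matrix with a different fixed space, which is the part the paper addresses.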
ISBN: 9780769548791 (print)
In this paper, a novel implementation of the distributed 3D Fast Fourier Transform (FFT) on a multi-GPU platform using CUDA is presented. The 3D FFT is the core of many simulation methods, so its fast calculation is critical. The main bottleneck of the distributed 3D FFT is the global data exchange that must be performed. The latest version of CUDA introduces direct GPU-to-GPU transfers using a Unified Virtual Address space (UVA), which provides new possibilities for optimising the communication part of the FFT. Here, we propose different implementations of the distributed 3D FFT, investigate their behaviour, and compare their performance with the single-GPU CUFFT and CPU-based FFTW libraries. In particular, we demonstrate the advantage of direct GPU-to-GPU transfers over data exchanges via host main memory. Our preliminary results show that running the distributed 3D FFT with four GPUs can bring a 12% speedup over the single node (CUFFT) while also enabling the calculation of 3D FFTs of larger datasets. Replacing the global data exchange via shared memory with direct GPU-to-GPU transfers reduces the execution time by up to 49%. This clearly shows that direct GPU-to-GPU transfers are the key factor in obtaining good performance on multi-GPU systems.
In a cloud computing environment, users prefer to migrate their locally processing workloads onto the cloud where more resources with better performance can be expected. ProtoGENI [1] and PlanetLab [17] have further i...
With the increasing complexity and scale of business processes, and the underlying information systems, there is a demand for monitoring the complicated business processes spanning multiple enterprise information syst...
With the development of WSN applications, there is increasing concern for the research of WSN. The routing algorithm is one of the important supporting technologies of WSN. However, the current studies of routing algo...
ISBN: 9780769548159 (print)
In this paper, we investigate how MapReduce and Cloud computing can accelerate application performance and scale up computing resources through a real data mining use case in the Biomedical Sciences. We have prototyped the data mining task using the MapReduce model and evaluated it in the Cloud. A performance evaluation model has been built to assess the efficiency of the prototype. The results, from both the experiments and the evaluation model, show that performance and scalability can be enhanced through these technologies.
ISBN: 9780769547077 (print)
The IEEE 802.15.4 standard specifies a beacon-enabled mode that provides a synchronized environment using beacon transmissions. However, this mode is designed for single-hop networks, and its use in multi-hop networks is not straightforward. The main challenges of using the beacon-enabled mode in multi-hop networks are how to schedule beacon transmissions efficiently so as to avoid direct and indirect beacon collisions, and how to make the schedule tolerant to the clock drift caused by low-cost components. In this paper, we present TBoPS, a novel technique for scheduling beacons in the cluster-tree topology. TBoPS uses a dedicated period, called the beacon-only period (BOP), to schedule beacons at the beginning of the IEEE 802.15.4 superframe. The advantage of TBoPS is that every beacon-enabled node selects its beacon schedule in a distributed manner during association. We analyse the robustness of TBoPS to clock drift and show through simulations that all nodes in the network are synchronized and follow the same superframe structure.