ISBN: 9780769548791 (print)
In this paper, a novel implementation of the distributed 3D Fast Fourier Transform (FFT) on a multi-GPU platform using CUDA is presented. The 3D FFT is the core of many simulation methods, thus its fast calculation is critical. The main bottleneck of the distributed 3D FFT is the global data exchange which must be performed. The latest version of CUDA introduces direct GPU-to-GPU transfers using a Unified Virtual Address space (UVA) that provides new possibilities for optimising the communication part of the FFT. Here, we propose different implementations of the distributed 3D FFT, investigate their behaviour, and compare their performance with the single GPU CUFFT and CPU-based FFTW libraries. In particular, we demonstrate the advantage of direct GPU-to-GPU transfers over data exchanges via host main memory. Our preliminary results show that running the distributed 3D FFT with four GPUs can bring a 12% speedup over the single node (CUFFT) while also enabling the calculation of 3D FFTs of larger datasets. Replacing the global data exchange via shared memory with direct GPU-to-GPU transfers reduces the execution time by up to 49%. This clearly shows that direct GPU-to-GPU transfers are the key factor in obtaining good performance on multi-GPU systems.
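The global data exchange mentioned in this abstract can be illustrated with a small host-side sketch. It is not the authors' CUDA implementation: it assumes a slab decomposition (one common way to distribute a 3D FFT), simulates the devices as NumPy array chunks, and the function name `distributed_fft3d` and parameter `num_devices` are hypothetical.

```python
import numpy as np


def distributed_fft3d(volume, num_devices):
    """Slab-decomposed 3D FFT; each 'device' is simulated by an array chunk.

    Assumes a cubic volume indexed (z, y, x) whose edge length is divisible
    by num_devices. This models only the data flow, not GPU execution.
    """
    # Step 1: split the volume into z-slabs, one per simulated device.
    slabs = np.split(volume, num_devices, axis=0)

    # Step 2: each device runs 2D FFTs over (y, x) on the planes it owns.
    slabs = [np.fft.fft2(slab, axes=(1, 2)) for slab in slabs]

    # Step 3: global data exchange (all-to-all). Device `src` cuts its slab
    # into y-blocks and sends block `dst` to device `dst`. Afterwards every
    # device holds the full z extent for its own range of y.
    blocks = [np.split(slab, num_devices, axis=1) for slab in slabs]
    pencils = [
        np.concatenate([blocks[src][dst] for src in range(num_devices)], axis=0)
        for dst in range(num_devices)
    ]

    # Step 4: each device finishes with 1D FFTs along z.
    pencils = [np.fft.fft(pencil, axis=0) for pencil in pencils]

    # Gather the y-ranges back into one array for inspection.
    return np.concatenate(pencils, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((8, 8, 8))
    result = distributed_fft3d(data, num_devices=4)
    print(np.allclose(result, np.fft.fftn(data)))
```

Step 3 is the part the abstract identifies as the bottleneck: every device must send a block to every other device. On real hardware that exchange would go either through host memory or directly between GPUs.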
ISBN: 9780769548654; 9781467351461 (print)
In this paper we investigate the energy efficiency of processors based on ARM Cortex-A9 cores for scientific numerical applications. We study the performance for a few numerical kernels which appear in a larger set of scientific applications. From power measurements that were performed on different platforms we estimate the energy consumed when executing these kernels.
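Estimating energy from power measurements amounts to integrating power over the kernel's run time. The sketch below is an assumption about how such an estimate could be computed, not the procedure used in the paper; it applies the trapezoidal rule to sampled power readings, and the function name `energy_joules` is hypothetical.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (watts) over time (seconds), trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Average of two neighbouring samples times the interval between them.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


if __name__ == "__main__":
    # A constant 2 W draw sampled over 2 s.
    print(energy_joules([0.0, 1.0, 2.0], [2.0, 2.0, 2.0]))
```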
To improve the efficiency of risk assessment for dam breaks in barrier lakes, this paper employs theories and technologies of distributed virtual reality and geographic information systems (GIS) to construct a...
ISBN: 9781467325042 (print)
The proceedings contain 10 papers. The topics discussed include: distinguishing users with capacitive touch communication; the future of natural user interaction; statistical models for illumination and geometry inference; a reversible two-level image authentication scheme based on chaotic fragile watermark; robot authoring tool with smart application for kids; exploring server redundancy in nonblocking multicast data center networks; device-to-device protocol in heterogeneous wireless environment; a distributed framework of jointly optimal congestion, contention and power control for wireless ad hoc networks; and novel mobile computing platform for high concurrency and reconfiguration.
ISBN: 9780769547497 (print)
With the rapid development of network hardware technologies and network bandwidth, high link speeds and the huge number of threats pose challenges to network intrusion detection systems, which must handle higher network traffic and perform more complicated packet processing. In general, pattern matching is a highly computationally intensive part of network intrusion detection systems. In this paper, we present an efficient GPU-based pattern matching algorithm that leverages the computational power of GPUs to accelerate pattern matching operations and increase the overall processing throughput. In the experiments, the proposed algorithm achieved a maximum traffic processing throughput of 2.4 Gbit/s. The results demonstrate that GPUs can be used effectively to speed up intrusion detection systems.
ISBN: 9780769548159 (print)
In this paper, we investigate how MapReduce and Cloud computing can accelerate the performance of applications and scale up computing resources through a real data mining use case in the Biomedical Sciences. We have prototyped the data mining task using the MapReduce model and evaluated it in the Cloud. A performance evaluation model has been built for assessing the efficiency of the prototype. The results, from both experiments and the evaluation model, show that performance and scalability can be enhanced through these advanced technologies.
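For readers unfamiliar with the MapReduce model, the sketch below shows its three phases (map, shuffle, reduce) in a single process. The abstract does not describe the actual data mining task, so the term-counting job, the sample records, and all function names here are hypothetical stand-ins.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (key, 1) pair for every whitespace-separated token in a record."""
    return [(token.lower(), 1) for token in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence over the input records."""
    mapped = chain.from_iterable(map_phase(record) for record in records)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(map_reduce(["gene A gene", "a protein"]))
```

In a real deployment the map and reduce calls run on different machines, which is where the scalability discussed in the abstract would come from.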
With the development of WSN applications, there is increasing interest in WSN research. The routing algorithm is one of the important supporting technologies of WSN. However, the current studies of routing algo...
ISBN: 9783642314995; 9783642315008 (print)
Scalable Distributed Data Structures (SDDS) are a user-level software component that makes it possible to create a single coherent memory pool out of the distributed RAM of multicomputer nodes. In other words, they are a tool for distributed memory virtualization. Applications that use SDDS benefit from the fast data access and scalability offered by such data structures. On the other hand, adapting an application to work with SDDS may require significant changes in its source code. We have proposed an SDDS architecture called SDDSfL that overcomes this difficulty by providing SDDS functionality for applications in the form of an operating system service. In this paper we investigate the usefulness of SDDSfL for different types of applications.
ISBN: 9781467308052; 9781467308045 (print)
Higher global bandwidth requirements for many applications and lower network cost have motivated the use of the Dragonfly network topology for high performance computing systems. In this paper we present the architecture of the Cray Cascade system, a distributed memory system based on the Dragonfly [1] network topology. We describe the structure of the system, its Dragonfly network and the routing algorithms. We describe a set of advanced features supporting both mainstream high performance computing applications and emerging global address space programming models. We present a combination of performance results from prototype systems and simulation data for large systems. We demonstrate the value of the Dragonfly topology and the benefits obtained through extensive use of adaptive routing.
ISBN: 9783642314995; 9783642315008 (print)
Existing Grid monitoring approaches do not combine three desirable features: on-line access to monitoring data, advanced query capabilities and data reduction. We present a solution for on-line monitoring of large-scale computing infrastructures based on Complex Event Processing (CEP) principles and technologies. We focus on leveraging CEP for distributed processing of client queries and monitoring data streams. This results in a significant reduction of network traffic due to on-line monitoring. We discuss the benefits of a CEP-based approach to monitoring and describe details of processing queries in a distributed way. A case study on monitoring the load caused by jobs in a Grid infrastructure is presented. A performance evaluation investigating monitoring overhead in terms of CPU, memory and network traffic is also provided.