ISBN: 9783642038686 (print)
Individual-based simulations are an important class of applications in which a complex system is modeled as a collection of autonomous entities, each having its own identity and behavior in the underlying simulated space. The main drawback of such simulations is that they are extremely compute-intensive. We consider the class of individual-based simulations in which the simulated entities interact with one another indirectly through the underlying simulated space; for this class, significant performance improvement is attainable through parallelism on a network of machines. We present a data distribution and an approach to reducing the communication overhead, which together lead to significant performance improvements while preserving the accuracy of the simulation.
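As a rough illustration of what a data distribution for such a simulation might look like, the following sketch splits the columns of a simulated grid into contiguous strips, one per worker, and lists the neighbouring boundary columns each worker would need to receive. The abstract does not describe the actual scheme, so the strip layout, the function names `partition_columns` and `halo_columns`, and the one-column sensing range are all assumptions made for illustration.

```python
def partition_columns(width, workers):
    """Split columns 0..width-1 into contiguous strips, one per worker.

    Hypothetical helper: returns a list of (start, stop) half-open ranges.
    """
    base, extra = divmod(width, workers)
    strips = []
    start = 0
    for w in range(workers):
        # The first `extra` workers take one additional column each.
        size = base + (1 if w < extra else 0)
        strips.append((start, start + size))
        start += size
    return strips


def halo_columns(strips, worker):
    """Columns owned by neighbours that `worker` would receive each step.

    Assumes entities only sense the space one column away (an assumption,
    not something stated in the abstract).
    """
    start, stop = strips[worker]
    halo = []
    if worker > 0:
        halo.append(start - 1)  # last column of the left neighbour
    if worker < len(strips) - 1:
        halo.append(stop)       # first column of the right neighbour
    return halo


strips = partition_columns(width=10, workers=3)
```

Under these assumptions, only the halo columns cross the network at each step, so communication volume grows with the number of strip boundaries rather than with the size of the simulated space.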
In many applications using database systems, the conventional method of transaction processing cannot be used, owing to a lack of integration and the existence of centralized solutions. Such situations arise in heterogeneous systems, mobile database transactions, and time-critical applications that require priority admission for a select group of transactions. For example, conventional deadlock detection relies on delays to cause and observe deadlocks. This creates several difficulties: (a) high overheads of periodic checking, (b) the non-deterministic nature of the delays, and (c) difficulty in scaling up centralized solutions. Existing proposals lack local processing for distributed transactions. The proposed technique uses normal message communication among peers, gives resource sites an enhanced role, and introduces asynchronous operations in transaction processing. As a result, the detection processes do not wait for time-out delays to occur. In most cases, the technique eliminates the possibility of waiting delays.
We describe computational science research that uses petascale resources to achieve scientific results at unprecedented scales and resolution. The applications span a wide range of domains, from investigation of fundamental problems in turbulence, through computational materials science research, to biomedical applications at the forefront of HIV/AIDS research and cerebrovascular haemodynamics. This work was mainly performed on the US TeraGrid 'petascale' resource, Ranger, at the Texas Advanced Computing Center, in the first half of 2008, when it was the largest computing system in the world available for open scientific research. We have sought to use this petascale supercomputer optimally across application domains and scales, exploiting the excellent parallel scaling performance found on up to at least 32 768 cores for some of our codes in the so-called 'capability computing' category, as well as high-throughput intermediate-scale jobs for ensemble simulations in the 32-512 core range. Furthermore, this activity provides evidence that conventional parallel programming with MPI should be successful at the petascale in the short to medium term. We also report on the parallel performance of some of our codes on up to 65 636 cores on the IBM Blue Gene/P system at the Argonne Leadership Computing Facility, which has recently been named the fastest supercomputer in the world for open science.
In this paper, we present parallel algorithms for lossless data compression based on the Burrows-Wheeler Transform (BWT) block-sorting technique. We investigate the performance of data parallelism and task parallelism for both multi-threaded and message-passing programming. The output produced by the parallel algorithms is fully compatible with that of their sequential counterparts. To balance the workload among processors, we develop a task scheduling strategy. An extensive set of experiments is performed on a shared-memory NUMA system using up to 120 processors and on a distributed-memory cluster using up to 100 processors. Our experimental results show that significant speedup can be achieved with both data-parallel and task-parallel methodologies. These algorithms greatly reduce the time it takes to compress large amounts of data, while the compressed data remains in a form that users without access to multiprocessor systems can still use.
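To make the data-parallel idea concrete, here is a minimal sketch that applies a naive Burrows-Wheeler Transform to independent blocks of the input and then inverts each block. It is not the paper's algorithm and is not a complete compressor: the sentinel character, the block size, the helper names, and the use of a thread pool are assumptions chosen for brevity (in CPython, CPU-bound threads are limited by the global interpreter lock, so a real implementation would use processes or native code).

```python
from concurrent.futures import ThreadPoolExecutor

SENTINEL = "\x00"  # assumed not to occur in the input text


def bwt(block):
    """Naive BWT of one block: sort all rotations, keep the last column."""
    s = block + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last_column):
    """Invert the transform by repeatedly prepending the column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row ending in the sentinel is the original block plus sentinel.
    original = next(row for row in table if row.endswith(SENTINEL))
    return original[:-1]


def split_blocks(text, block_size):
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]


def parallel_bwt(text, block_size, max_workers=4):
    """Transform each block independently; blocks share no state."""
    blocks = split_blocks(text, block_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(bwt, blocks))


text = "banana_bandana_banana"
transformed = parallel_bwt(text, block_size=7)
restored = "".join(inverse_bwt(b) for b in transformed)
```

Because each block is transformed on its own, the result does not depend on how many workers are used, which is the property that lets a parallel compressor stay compatible with a sequential decompressor.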
In a network computing platform, tasks compete with one another for shared resources to communicate messages. Incremental computing masks communication latency by overlapping computation with communication. However, a sequence of messages with a large latency variance still makes computations proceed intermittently. In this paper, the impact of the message sequence on computation efficiency is studied, and a framework that employs a well-organized message sequence to maximize the efficiency of computations is introduced. First, a network computing model for performing incremental computations is proposed. Based on the model, theorems are developed as the groundwork from which algorithms for finding a well-organized message sequence are derived. Finally, algorithms that find a well-organized message sequence in $O((r/k)^{k+1})$ and $O(r!/(k!)^{r/k})$ comparison steps are given for sending r input data items using r/k messages of a given size k.
Distributed object-oriented platforms are increasingly important in wireless environments, where they provide frameworks for collaborative computation and for managing a large pool of distributed resources. One of the important layers for implementing distributed computing in such environments is the remoting mechanism. For example, Java uses Remote Method Invocation (RMI) for handling distributed controls. In this paper, we investigate the support for this important layer in wireless environments and address the issues involved in supporting Java RMI over heterogeneous wireless environments. We present a case study on supporting Java RMI in Bluetooth, GPRS, and WLAN environments, which represents an important middleware for component communications. The Bluetooth layer is supported by incorporating a set of protocol stack layers for Bluetooth, known as JavaBT, which we have developed, and by providing an L2CAP layer with sockets to support Java RMI sockets. RMI over GPRS/WLAN is achieved by implementing RMI over the IP layer. Our support for the roaming of Java RMI over heterogeneous wireless networks is based on the concept of direct connection, which avoids the problems caused by forwarding. The difficulty of this strategy is how to handle the existing connection when the mobile node moves to another location, so as to avoid interrupting the high-level applications. We solve this problem in Java RMI by supporting dynamic addresses and dynamic sockets. We also propose algorithms to handle the handoff process. In addition, methods for connection-loss detection and data-integrity maintenance in roaming scenarios are also discussed. Java Grande benchmarks are used to demonstrate that our RMI implementations over GPRS, WLAN, and Bluetooth networks are effective in supporting parallel and distributed control of Java layers in heterogeneous wireless environments. (C) 2008 Elsevier Inc. All rights reserved.
In a grid computing environment, network characteristics such as bandwidth and latency affect task performance. Bandwidth demands on wide-area networks have grown large, exceeding 100 Gbps. In this article, we focus on parallel-route transmission, such as link aggregation, as a way to realize a high-bandwidth network. The performance of grid computing with parallel-route transmission is evaluated on an emulated wide-area network.
ISBN: 9783540928584 (print)
Parallelism has long been used to increase the throughput of applications that process independent data. With the advent of multicore technology, designers and programmers are increasingly forced to think in parallel. In this paper, we present the evaluation of an encryption core capable of handling multiple data streams. The design is oriented towards future Internet scenarios, where throughput capacity requirements, together with privacy and integrity, will be critical for both personal and corporate users. To power such scenarios, we present a technique that increases the efficiency of memory bandwidth utilization of cryptographic cores. We propose feeding cryptographic engines with multiple streams to better exploit the available bandwidth. To validate our claims, we have developed an AES core capable of encrypting two streams in parallel using either ECB or CBC mode. Our AES core implementation consumes a trivial amount of resources when a Virtex-II Pro FPGA device is targeted.
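The two-stream idea can be sketched in software. In CBC mode each ciphertext block depends on the previous one of the same stream, c[i] = E(p[i] XOR c[i-1]), so blocks of a single stream cannot be issued back to back, while blocks of a second, independent stream can be slotted in between. The sketch below only shows that alternating between two streams leaves each stream's ciphertext unchanged; it says nothing about the paper's hardware design. The block function `toy_encrypt_block` is a hypothetical 32-bit stand-in, not AES and not secure, and all names and constants are assumptions.

```python
MASK32 = 0xFFFFFFFF


def toy_encrypt_block(block, key):
    """Toy 32-bit keyed permutation standing in for AES (NOT secure, NOT AES)."""
    return (((block ^ key) * 0x9E3779B1) + key) & MASK32


def cbc_encrypt(blocks, key, iv):
    """Plain single-stream CBC: c[i] = E(p[i] XOR c[i-1]), with c[-1] = iv."""
    out = []
    prev = iv
    for p in blocks:
        prev = toy_encrypt_block(p ^ prev, key)
        out.append(prev)
    return out


def interleaved_cbc(stream_a, stream_b, key, iv_a, iv_b):
    """Issue blocks alternately from two streams, one chaining value per stream."""
    streams = [list(stream_a), list(stream_b)]
    prev = [iv_a, iv_b]
    out = [[], []]
    for i in range(max(len(streams[0]), len(streams[1]))):
        for s in (0, 1):
            if i < len(streams[s]):
                # Only the chaining value of stream s is read or updated here.
                prev[s] = toy_encrypt_block(streams[s][i] ^ prev[s], key)
                out[s].append(prev[s])
    return out[0], out[1]


key = 0x0BADF00D
a = [1, 2, 3, 4]
b = [10, 20, 30]
ca, cb = interleaved_cbc(a, b, key, iv_a=0x1111, iv_b=0x2222)
```

In ECB mode the blocks are independent even within one stream, so the chaining state per stream is what distinguishes the CBC case.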
ISBN: 9783540928584 (print)
Parallel netCDF supports parallel I/O operations on a view of data as a collection of self-describing, portable, array-oriented objects that can be accessed through a simple interface. Its parallel I/O operations are realized with the help of an MPI-I/O library. However, such operations are not available for remote I/O. A remote I/O mechanism from the Stampi library was therefore introduced into the MPI layer of Parallel netCDF to realize them. This system was evaluated on two interconnected PC clusters, and sufficient performance was achieved with a huge amount of data.
ISBN: 9780769534985 (print)
P2P workflow systems follow the development trend of workflow systems. Running-path optimization (finding the path with the minimum running time, which consists of the service discovery time and the task execution time) in fully distributed (decentralized) environments has been a bottleneck restricting the performance of P2P workflow systems. To address this problem, this paper proposes an innovative hybrid P2P location network that, for the first time, brings two components to P2P workflow systems: a novel structured P2P network, which effectively reduces the service discovery time through one-hop routing and reduces the search bandwidth with 100% location precision at each step; and a novel decentralized workload-balance network, which optimizes the task execution time at each step with little network churn. Together, these fit the globally optimal running path with the sequence of optimal single running steps.