Software distributed shared memory (DSM) techniques, while effective on applications with coarse-grained sharing, yield poor performance for the fine-grained sharing encountered in applications increasingly relying on sophisticated adaptive and hierarchical algorithms. Such applications exhibit irregular communication patterns unsynchronized with computation, incurring large overheads for synchronous (request-reply) DSM protocols that require responsive processing of coherence messages. We describe a new DSM framework, View Caching, that addresses this problem by utilizing application knowledge of data access semantics to enable the construction of low-overhead, asynchronous coherence protocols. Experiments on the Cray T3D show that view caching enables efficient execution of fine-grained irregular applications, reducing both coherence overheads and idle time to improve performance by up to 35% over a weakly-consistent DSM implementation.
This paper examines the effects of relaxed synchronization on both the numerical and parallel efficiency of parallel genetic algorithms (GAs). We describe a coarse-grain geographically structured parallel genetic algorithm. Our experiments provide preliminary evidence that asynchronous versions of these algorithms have a lower run time than synchronous GAs. Our analysis shows that this improvement is due to (1) decreased synchronization costs and (2) high numerical efficiency (e.g. fewer function evaluations) for the asynchronous GAs. This analysis includes a critique of the utility of traditional parallel performance measures for parallel GAs.
The main contributions of this paper are in designing fast and scalable parallel algorithms for selection and median filtering. Based on the radix-ω representation and the power of pipelined optical buses, we first d...
Multiprocessor scheduling in a shared multiprogramming environment can be structured in two levels, where a kernel-level OS allocator allots processors to jobs and a user-level thread scheduler maps the ready threads of a job onto the allotted processors. Between scheduling quanta, each thread scheduler computes its desire for processors in the upcoming quantum and feeds it back to the OS allocator. The OS allocator then adjusts the allotment of processors for the next quantum. We present two provably efficient two-level scheduling schemes, called GRAD and SRAD, respectively. Both schemes use the same OS allocator, RAD, for the processor allotments, which ensures fair allocation under all levels of workload. In GRAD, RAD is combined with a greedy thread scheduler; in SRAD, RAD is combined with a work-stealing thread scheduler. The greedy thread scheduler is suitable for centralized scheduling, whereas the work-stealing thread scheduler is more suitable for distributed settings. Both GRAD and SRAD are nonclairvoyant, i.e., they do not require advance knowledge about the job's parallelism and arrival time. Moreover, they provide effective control over the scheduling overhead and ensure efficient utilization of processors. We analyze the competitiveness of both GRAD and SRAD with respect to an optimal clairvoyant scheduler. In terms of makespan, both schemes can achieve O(1)-competitiveness for any set of jobs with arbitrary release times. In terms of the mean response time, both schemes are O(1)-competitive against the offline optimal scheduler for arbitrary batched jobs. GRAD and SRAD are the first nonclairvoyant scheduling algorithms that guarantee provable efficiency, fairness, and minimal overhead simultaneously.
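The desire/allotment feedback loop in this abstract can be illustrated with a minimal sketch. It uses plain dynamic equipartitioning as a stand-in allocator; that choice is an assumption for illustration only and is not the RAD allocator, whose details the abstract does not give. The function name `equipartition` and the job labels are hypothetical.

```python
def equipartition(desires, processors):
    """Allot processors to jobs from their reported desires.

    Illustrative stand-in allocator (dynamic equipartitioning), NOT the
    paper's RAD. Jobs that desire no more than the current fair share get
    exactly what they ask for; the remaining jobs split what is left.
    """
    allot = {}
    remaining = dict(desires)  # jobs still waiting for an allotment
    free = processors          # processors not yet handed out
    while remaining:
        share = free // len(remaining)
        satisfied = {j: d for j, d in remaining.items() if d <= share}
        if not satisfied:
            # Every remaining job wants more than the fair share:
            # split equally and hand leftover processors out one by one.
            extra = free % len(remaining)
            for i, job in enumerate(sorted(remaining)):
                allot[job] = share + (1 if i < extra else 0)
            break
        for job, desire in satisfied.items():
            allot[job] = desire
            free -= desire
            del remaining[job]
    return allot


# One scheduling quantum: three jobs report desires, 12 processors available.
print(equipartition({"a": 2, "b": 10, "c": 10}, 12))  # {'a': 2, 'b': 5, 'c': 5}
```

In a full two-level scheme, each job's thread scheduler would recompute its desire after every quantum and call the allocator again; the sketch shows a single round of that loop.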
Researchers must often write their own simulation and analysis software. During this process they simultaneously confront both computational and scientific problems. Current strategies for aiding the generation of per...
ISBN: 0818672552 (print)
Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of the state. The main challenge for efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well-known access methods for multidimensional and spatial data. We investigate effective and scalable declustering techniques for grid files with the primary goal of minimizing response time and the secondary goal of maximizing the fairness of data distribution. The main contributions of this paper are (1) analytic and experimental evaluation of existing index-based declustering techniques and their extensions for grid files, and (2) development of a proximity-based declustering algorithm called minimax which is experimentally shown to scale and to consistently achieve better response time compared to available algorithms while maintaining perfect disk distribution.
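To make the response-time goal concrete, here is a small sketch of one classic index-based declustering rule, disk modulo, which assigns a grid cell to the disk given by the sum of its indices modulo the number of disks. The abstract does not say which index-based schemes were evaluated, so this choice is an assumption, and the sketch is not the minimax algorithm. It also assumes one bucket per grid cell, which a real grid file need not satisfy. The names `disk_modulo` and `response_time` are hypothetical.

```python
from collections import Counter
from itertools import product


def disk_modulo(cell, num_disks):
    """Assign a grid cell (tuple of indices) to a disk: sum of indices mod M."""
    return sum(cell) % num_disks


def response_time(ranges, num_disks):
    """Buckets fetched from the busiest disk for a range query.

    `ranges` holds one half-open (lo, hi) index interval per dimension.
    With disks read in parallel, the busiest disk determines response time.
    """
    cells = product(*(range(lo, hi) for lo, hi in ranges))
    load = Counter(disk_modulo(cell, num_disks) for cell in cells)
    return max(load.values(), default=0)


# A 2x2 query on 4 disks touches cells mapped to disks 0, 1, 1, 2,
# so one disk must serve two buckets.
print(response_time([(0, 2), (0, 2)], 4))  # 2
```

The 2x2 query covers four cells, so four disks could in principle serve it in one parallel fetch; disk modulo needs two, which is the kind of gap a better declustering scheme aims to close.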
Modern General Purpose Graphics Processing Units (GPGPUs) provide high degrees of parallelism in computation and memory access, making them suitable for data-parallel applications such as those using the elastic MapRe...
ISBN: 9783540680673 (print)
Collective communication libraries are widely developed and used in the scientific community to support parallel and Grid programming. Mobile agent systems, by contrast, often lack them, even though message passing is always supported so that agents can communicate. Collective communication primitives can help in developing agent-based parallel applications. They can also benefit the social abilities and interactions of collaborative agents. Here we present a collective communication service implemented in the Jade agent platform. Furthermore, we propose using it to transparently interface heterogeneous execution instances of a scientific parallel application that runs in a distributed environment.
ISBN: 9781538609415 (print)
Parallel and distributed computing have enabled development of much more scalable software. However, developing concurrent software requires the programmer to be aware of non-determinism, data races, and deadlocks. MPI (Message Passing Interface) is a popular standard for writing message-oriented distributed applications. Some messages in MPI systems can be processed by one of the many machines and in many possible orders. This non-determinism can affect the result of an MPI application. The alternate results may or may not be correct. To verify MPI applications, we need to check all these possible orderings and use an application-specific oracle to decide if these orderings give correct output. MPJ Express is an open source Java implementation of the MPI standard. Model checking of MPI Java programs is a challenging task due to their parallel nature. We developed a Java-based model of MPJ Express, where processes are modeled as threads, and which can run unmodified MPI Java programs on a single system. This model enabled us to adapt the Java PathFinder explicit-state software model checker (JPF) using a custom listener to verify our model running real MPI Java programs. The evaluation of our approach shows that model checking reveals incorrect system behavior that results in very intricate message orderings.
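The idea of checking every possible message ordering against an application-specific oracle can be sketched in a few lines. This is a toy illustration written in Python, not the Java PathFinder listener or the MPJ Express model from the paper. It assumes messages from the same sender arrive in the order sent while messages from different senders may interleave freely. The names `interleavings`, `failing_orders`, `run`, and `oracle` are hypothetical.

```python
def interleavings(queues):
    """Yield every arrival order that preserves each sender's own order.

    `queues` is a list with one list of pending messages per sender.
    """
    if all(not q for q in queues):
        yield []
        return
    for i, q in enumerate(queues):
        if q:
            # Deliver the head of sender i's queue, then recurse on the rest.
            rest = queues[:i] + [q[1:]] + queues[i + 1:]
            for tail in interleavings(rest):
                yield [q[0]] + tail


def failing_orders(queues, run, oracle):
    """Return the arrival orders whose result the oracle rejects."""
    return [order for order in interleavings(queues)
            if not oracle(run(order))]


# Toy receiver: keeps only the last message it sees.
run = lambda order: order[-1]
# Toy oracle: the application expects the final value to be 3.
oracle = lambda result: result == 3

# Sender 0 sends 1 then 2; sender 1 sends 3.
print(failing_orders([[1, 2], [3]], run, oracle))  # [[1, 3, 2], [3, 1, 2]]
```

Exhaustive enumeration grows very quickly with the number of messages, which is one reason a model checker with state-space exploration is used for real programs rather than a brute-force loop like this one.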