ISBN (digital): 9781450362290
ISBN (print): 9781450362290
Suffix arrays and trees are important and fundamental string data structures which lie at the foundation of many string algorithms, with important applications in computational biology, text processing, and information retrieval. Recent work enables the efficient parallel construction of suffix arrays and trees requiring at most O(n/p) memory per process in distributed memory. However, querying these indexes in distributed memory has not been studied extensively. Querying common string indexes such as suffix arrays, enhanced suffix arrays, and the FM-index all requires random accesses into O(n) memory, which in distributed-memory settings becomes prohibitively expensive. In this paper, we introduce a novel distributed string index, the Distributed Enhanced Suffix Array (DESA). We present efficient algorithms for the construction and querying of this distributed data structure, all while requiring only O(n/p) memory per process. We further provide a scalable parallel implementation and demonstrate its performance and scalability.
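For context, this is what a suffix-array pattern query looks like on a single node: a minimal sketch with illustrative names, using the naive sort-based construction rather than the scalable parallel algorithm the paper builds on. The distributed problem the paper solves is exactly that each binary-search probe below is a random access into the O(n) array.

```python
# A minimal single-node sketch of suffix-array pattern search; names are
# illustrative and the construction is the naive sort, not the paper's method.

def build_suffix_array(text):
    """Sort all suffix start positions lexicographically (naive O(n^2 log n))."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Binary-search the sorted suffixes for the range prefixed by pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                        # leftmost suffix with prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:                        # leftmost suffix with prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

text = "banana"
print(find_occurrences(text, build_suffix_array(text), "ana"))  # [1, 3]
```

Each of the O(log n) probes `text[sa[mid]...]` touches an arbitrary position of the index; on one machine this is a cache miss at worst, but across p processes it becomes a remote access, which is the cost the DESA is designed to avoid.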
The computational power of self-stabilizing distributed systems is examined. Assuming the availability of any number of processors, each with (small) constant-size memory, we show that any computable problem can be realized in a self-stabilizing fashion. The result is derived by presenting a distributed system which tolerates transient faults and simulates the execution of a Turing machine. The total amount of memory required by the distributed system is equal to the memory used by the Turing machine (up to a constant factor).
The concept of a structured distributed shared memory in which memory units are objects is introduced. The coherence of object replicas is maintained by type-specific coherence protocols that are based on the semantic...
ISBN (print): 0769510779
A compute farm is a pool of clustered workstations that provides high-performance computing services for CPU-intensive, memory-intensive, and I/O-active jobs in a batch mode. Existing load sharing schemes with memory considerations assume that jobs' memory demand sizes are known in advance or predictable based on users' hints. This assumption can greatly simplify the designs and implementations of load sharing schemes, but is not desirable in practice. In order to address this concern, we present three new results and contributions in this study. (1) Conducting Linux kernel instrumentation, we have collected different types of workload execution traces to quantitatively characterize job interactions, and modeled page fault behavior as a function of the overloaded memory sizes and the amount of jobs' I/O activities. (2) Based on experimental results and collected dynamic system information, we have built a simulation model which accurately emulates the memory system operations and job migrations with virtual memory considerations. (3) We have proposed a memory-centric load sharing scheme and its variations to effectively process dynamic memory allocation demands, aiming at minimizing the execution time of each individual job by dynamically migrating and remotely submitting jobs to eliminate or reduce page faults and to reduce the queuing time for CPU services. Conducting trace-driven simulations, we have examined these load sharing policies to show their effectiveness.
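The core of a memory-centric placement rule can be sketched in a few lines. This is an illustrative toy, not the paper's scheme: the `Node` fields and the fallback rule are hypothetical, and it shows only the idea of preferring memory feasibility (no paging) over CPU queue length.

```python
# Hypothetical sketch of memory-centric job placement: prefer a node whose
# free memory covers the job's demand (so it runs without page faults),
# then break ties by CPU queue length; fall back to least memory overload.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_mem_mb: int
    run_queue: int          # jobs currently queued for CPU service

def place_job(nodes, job_mem_mb):
    """Return the name of the node chosen for a job with the given demand."""
    fits = [n for n in nodes if n.free_mem_mb >= job_mem_mb]
    if fits:
        # Among memory-feasible nodes, minimize CPU queuing delay.
        best = min(fits, key=lambda n: n.run_queue)
    else:
        # No node avoids paging: pick the one that overloads memory least.
        best = max(nodes, key=lambda n: n.free_mem_mb)
    best.free_mem_mb -= job_mem_mb
    best.run_queue += 1
    return best.name

nodes = [Node("a", 512, 2), Node("b", 2048, 3), Node("c", 1024, 1)]
print(place_job(nodes, 900))   # "c": fits in memory, shortest run queue
print(place_job(nodes, 4096))  # "b": nothing fits, most free memory left
```

The paper's contribution is precisely that `job_mem_mb` is not known in advance; the scheme reacts to observed page-fault behavior instead of a declared demand.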
ISBN (print): 9780889866386
The memory consistency model is crucial to the performance of shared-memory multiprocessors, and in current architectures several different models are adopted. In this paper, using graph algorithms for illustrative purposes, we consider the impact of the memory model on the implementation and performance of parallel algorithms on shared-memory multiprocessors. We show that the implementation of PRAM algorithms is largely "oblivious" of the underlying memory model, and has good performance on relaxed models. More importantly, we show that different memory models can favor drastically different algorithm designs.
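The "oblivious" claim can be illustrated with a level-synchronous, PRAM-style BFS: between barriers, each worker either touches a disjoint slice of the frontier or goes through an explicit lock, so correctness does not depend on what ordering the memory model guarantees inside a level. This is a sketch in Python threads, not code from the paper.

```python
# Level-synchronous BFS with p workers: all cross-thread communication is
# confined to the barriers between levels (the PRAM synchronization points).

import threading

def pram_bfs(adj, source, p=2):
    n = len(adj)
    dist = [-1] * n
    dist[source] = 0
    state = {"frontier": [source], "level": 0}
    next_frontier = []
    lock = threading.Lock()
    barrier = threading.Barrier(p)

    def worker(tid):
        while True:
            frontier = state["frontier"]
            if not frontier:
                return                     # every worker sees the same frontier
            for u in frontier[tid::p]:     # static partition of the frontier
                for v in adj[u]:
                    with lock:             # serialize the concurrent writes
                        if dist[v] == -1:
                            dist[v] = state["level"] + 1
                            next_frontier.append(v)
            barrier.wait()                 # level done: all writes complete
            if tid == 0:                   # one worker publishes the swap
                state["frontier"] = list(next_frontier)
                next_frontier.clear()
                state["level"] += 1
            barrier.wait()                 # new frontier visible to all

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dist

# Square 0-1-3-2-0: vertex 3 is two hops from source 0.
adj = [[1, 2], [0, 3], [0, 3], [1, 2]]
print(pram_bfs(adj, 0))  # [0, 1, 1, 2]
```

On a relaxed-consistency machine the barriers are exactly where fences are needed; the code between them requires no further ordering, which is the sense in which the algorithm is oblivious of the model.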
There are two distinct types of MIMD (Multiple Instruction, Multiple Data) computers: the shared memory machine, e.g. the Butterfly, and the distributed memory machine, e.g. Hypercubes and Transputer arrays. Typically these utilize different programming models: the shared memory machine has monitors, semaphores and fetch-and-add, whereas the distributed memory machine uses message passing. Moreover there are two popular types of operating systems: a multi-tasking, asynchronous operating system and a crystalline, loosely synchronous operating system.
In this paper I firstly describe the Butterfly, Hypercube and Transputer array MIMD computers, and review monitors, semaphores, fetch-and-add and message passing; then I explain the two types of operating systems and give examples of how they are implemented on these MIMD computers. Next I discuss the advantages and disadvantages of shared memory machines with monitors, semaphores and fetch-and-add, compared to distributed memory machines using message passing, answering questions such as “is one model ‘easier’ to program than the other?” and “which is ‘more efficient’?”. One may think that a shared memory machine with monitors, semaphores and fetch-and-add is simpler to program and runs faster than a distributed memory machine using message passing, but we shall see that this is not necessarily the case. Finally I briefly discuss which type of operating system to use and on which type of computer. This of course depends on the algorithm one wishes to compute.
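The two programming models being compared can be put side by side in a toy reduction. This sketch uses Python threads only as a stand-in: the lock-protected counter mimics the monitor/semaphore style of a shared memory machine, while the queue mimics message passing, where each worker owns its data and "sends" a partial result.

```python
# Toy contrast of the two models on the same task: summing a list with p workers.

import threading
import queue

def shared_memory_sum(values, p=4):
    """Shared-memory style: workers update one counter inside a critical section."""
    total = [0]
    lock = threading.Lock()
    def worker(chunk):
        s = sum(chunk)
        with lock:                     # monitor/semaphore-style mutual exclusion
            total[0] += s
    threads = [threading.Thread(target=worker, args=(values[i::p],))
               for i in range(p)]
    for t in threads: t.start()
    for t in threads: t.join()
    return total[0]

def message_passing_sum(values, p=4):
    """Message-passing style: workers own their data and send partial sums."""
    inbox = queue.Queue()              # stands in for the interconnect
    def worker(chunk):
        inbox.put(sum(chunk))          # "send" the local partial sum
    threads = [threading.Thread(target=worker, args=(values[i::p],))
               for i in range(p)]
    for t in threads: t.start()
    for t in threads: t.join()
    return sum(inbox.get() for _ in range(p))

data = list(range(100))
print(shared_memory_sum(data), message_passing_sum(data))  # 4950 4950
```

Both versions compute the same answer; the difference the paper examines is which style is easier to reason about and which maps more efficiently onto the underlying hardware.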
ISBN (print): 9780889866386
Distributed object-oriented middleware technologies have been adopted for ubiquitous communicating real-time and embedded systems. Although Java plays an important role in building distributed object-oriented middleware because of its portability and productivity, it is not popular among real-time system developers because it is slow and hard to keep real-time guarantees due to garbage collection. This paper presents a novel memory classification and measurement approach for modern programming languages such as Java to analyze the memory utilization of middleware technologies, in order to improve them for embedded systems with limited computing resources. We classified Java memory into four categories (static, quasi-static, quasi-dynamic and dynamic) and show how to measure the size of each. Intensive case studies using popular Java middleware technologies such as Web Services, CORBA, RMI and HORB revealed that reducing the use of dynamic memory has contributed far more to efficiency and performance than reducing the use of quasi-static and quasi-dynamic memory.
ISBN (print): 0769515851
The notion of invariant consistency was proposed, allowing programmers to specify inter-process ordering requirements. Data consistency protocols provided a consistent view of the shared memory in the presence of multiple copies, implemented in message passing systems. The combination of invariant consistency and sequential consistency, known as InvSC consistency, was studied, and a systematic way to modify the Lazy Cache algorithm for implementing InvSC consistency was proposed.
ISBN (print): 9781538668719
The von Neumann architecture has been the dominant computing paradigm ever since its inception in the mid-forties. It revolves around the concept of a "stored program" in memory, and a central processing unit that executes the program. As an alternative, Processing-In-Memory (PIM) ideas have been around for at least two decades, however with very limited adoption. Today, three trends are creating a compelling motivation to take a second look. Novel devices such as the memristor blur the boundary between memory and compute, effectively providing both in the same element. Power efficiency has become very important, both in the datacenter and at the edge. Machine learning applications driven by a data-flow model have become ubiquitous. In this paper, we sketch our Computing-In-Memory (CIM) vision, and its substantial performance and power improvement potential. Compared to PIM models, CIM more clearly separates computing from memory. We then discuss the programming model, which we consider the biggest challenge. We close by describing how CIM impacts non-functional characteristics, such as reliability, scale, and configurability.
ISBN (digital): 9781665497862
ISBN (print): 9781665497862
As data scales continue to increase, studying the porting and implementation of shared memory parallel algorithms for distributed-memory architectures becomes increasingly important. We consider the problem of biconnectivity for this current study, which identifies cut vertices and cut edges in a graph. As part of our study, we implemented and optimized a shared memory biconnectivity algorithm based on color propagation within a distributed-memory context. This algorithm is neither work- nor time-efficient. However, when we compare to distributed implementations of theoretically efficient algorithms, we find that simple non-optimal algorithms can greatly outperform time-efficient algorithms in practice when implemented for real distributed-memory environments and real data. Overall, our distributed implementation for computing graph biconnectivity demonstrates an average strong scaling speedup of 15x across 64 MPI ranks on a suite of irregular real-world inputs. We also note an average of 11x and 7.3x speedup relative to the optimal serial algorithm and the fastest shared-memory implementation for the biconnectivity problem, respectively.
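For reference, the optimal serial baseline mentioned above is the classic DFS low-link method. Below is a minimal sketch of its cut-vertex half (Hopcroft-Tarjan articulation points) for simple undirected graphs; it is an illustration of the baseline, not the distributed color-propagation algorithm the paper implements.

```python
# Classic serial articulation-point (cut-vertex) finder via DFS low-links.
# A non-root vertex u is a cut vertex if some DFS child v has low[v] >= disc[u];
# the root is a cut vertex if it has more than one DFS child.

def cut_vertices(adj):
    n = len(adj)
    disc = [-1] * n          # DFS discovery times
    low = [0] * n            # lowest discovery time reachable via one back edge
    cuts = set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:              # skip the tree edge back (simple graph)
                continue
            if disc[v] != -1:            # back edge to an ancestor
                low[u] = min(low[u], disc[v])
            else:                        # tree edge
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent != -1 and low[v] >= disc[u]:
                    cuts.add(u)
        if parent == -1 and children > 1:
            cuts.add(u)

    for s in range(n):
        if disc[s] == -1:
            dfs(s, -1)
    return cuts

# Two triangles joined at vertex 2: vertex 2 is the only cut vertex.
adj = [[1, 2], [0, 2], [0, 1, 3, 4], [2, 4], [2, 3]]
print(cut_vertices(adj))  # {2}
```

This baseline is inherently sequential (it walks one DFS tree), which is why shared-memory and distributed implementations turn to alternatives such as color propagation despite their theoretical inefficiency.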