In this paper, we design and implement a variety of parallel algorithms for both sweep spin selection and random spin selection. We analyze our parallel algorithms on LogP, a portable and general parallel machine mode...
ISBN (print): 9783540747666
In recent years, grid and mesh structures have received increasing attention. Mesh-based multicomputers are the future of processing. As we slowly reach the natural limits of semiconductor spatial density, supercomputer design depends more heavily on parallel and distributed processing. This paper concerns the assessment of the effectiveness of mesh allocation algorithms and the experimentation system that was developed to provide a testing environment. The main focus was on creating a system that represents the inner working routines of real supercomputers as fully as possible and, at the same time, provides a way to input measured processing data as the basis for computing allocation algorithm load. The reported investigations consider different allocation algorithms, including the authors' own WSBA, and various task parameters.
This paper describes a routing algorithm that assigns each arriving packet to one of multiple homogeneous parallel servers. The algorithm differs from existing routing policies in that it considers the impact of each ...
Approximate string matching is a common and often repeated task in information retrieval and bioinformatics. This paper proposes a generic design of a programmable array processor architecture for a wide varie...
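As background for the task named in this abstract, the following is a minimal sketch of the textbook dynamic-programming formulation of approximate string matching with at most k differences. The abstract is truncated and does not say which algorithms the proposed array processor targets, so this is only an illustration of the problem, not the paper's design; the function name `approx_find` is hypothetical.

```python
def approx_find(pattern, text, k):
    """Return every end position j (1-based, inclusive) such that some
    substring of text ending at j is within edit distance k of pattern."""
    m = len(pattern)
    # Column for the empty text prefix: matching i pattern characters
    # against nothing costs i deletions.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        # Row 0 is always 0: a match may start at any text position for free.
        cur = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in text
                           cur[i - 1] + 1))     # missing character in text
        if cur[m] <= k:
            hits.append(j)
        prev = cur
    return hits


exact = approx_find("abc", "xxabcxx", 0)   # [5]
fuzzy = approx_find("abc", "xxabcxx", 1)   # [4, 5, 6]
```

Each column depends only on the previous column, which is the kind of regular data dependence that array architectures can exploit; how the paper maps it to hardware is not stated in the visible text.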
The development of high-speed networks with data rates of 100 Mbit/s and more as well as the design of advanced applications imposes new requirements on the performance of network nodes. The author describes an approach to overcome the upcoming processing bottleneck of communication protocols. The approach is based on the use of multiprocessor architectures. Several parallel concepts inside network nodes such as pipeline and array constructs are outlined, and different memory concepts are suggested. Performance measurements of protocol implementations on transputer networks following the proposed parallel concepts indicate very promising performance results.
ISBN (digital): 9781728160955
ISBN (print): 9781728160955
Current stream processing systems (SPSs) suffer from imbalanced load and limited parallelism due to skewed data distributions and imbalanced computational resources. We observed that the cause of these problems is that current SPSs partition their workloads statically. To address this problem, we design a distributed stream processing system, Marabunta, for skewed stream processing. Marabunta performs dynamic scaling and load balancing automatically at runtime. Large partitions in a skewed data distribution can be processed in parallel or migrated to idle machines to achieve load balancing. Moreover, Marabunta uses a new execution model to accelerate execution by increasing the parallelism and the utilization of computational resources. We implemented Marabunta in C++ and optimized it for modern hardware. Our evaluations on typical streaming workloads show that Marabunta achieves higher throughput and better elasticity with both uniform and skewed datasets compared to state-of-the-art SPSs, e.g., Flink and Heron.
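To make the idea of migrating partitions to idle machines concrete, here is a minimal sketch of a greedy rebalancing step. It is an assumption for illustration only: the abstract does not describe Marabunta's actual policy, and the function `rebalance`, the worker names, and the load numbers are all hypothetical.

```python
def rebalance(assignment, max_moves=100):
    """Greedily migrate partitions from the busiest worker to the idlest one.

    assignment maps worker -> {partition: load}. It is modified in place,
    and the list of (partition, source, destination) moves is returned.
    """
    moves = []
    for _ in range(max_moves):
        totals = {w: sum(parts.values()) for w, parts in assignment.items()}
        busiest = max(totals, key=totals.get)
        idlest = min(totals, key=totals.get)
        gap = totals[busiest] - totals[idlest]
        # Only a partition smaller than the gap narrows the difference
        # between the two workers when it is moved.
        candidates = {p: v for p, v in assignment[busiest].items() if 0 < v < gap}
        if not candidates:
            break
        # Moving a load close to half the gap evens the pair out best.
        part = min(candidates, key=lambda p: abs(candidates[p] - gap / 2))
        assignment[idlest][part] = assignment[busiest].pop(part)
        moves.append((part, busiest, idlest))
    return moves


workers = {"w1": {"a": 5, "b": 4, "c": 3}, "w2": {"d": 2}, "w3": {}}
moves = rebalance(workers)
# moves == [("a", "w1", "w3"), ("c", "w1", "w2")]; final loads are 4, 5, 5
```

The loop stops when no single partition is small enough to help, which is the situation where a large partition would instead have to be split and processed in parallel, the other option the abstract mentions.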
ISBN (print): 9781538649756
Modern embedded systems encompass a fast-increasing range of applications, spanning from automotive to multimedia, and industrial automation. To tackle the increasing design complexity, the model-based design paradigm promotes the use of Models of Computation (MoCs) to capture the essential application properties. Existing MoCs are split between the event/time-triggered paradigm and the data-driven paradigm. However, time and data are two inter-related dimensions that are essential for defining the correct application behavior. In this paper we advocate a unified MoC that integrates the notions of time and data while accounting for imperfect clocks. We present the formal properties of our model and show how the Synchronous Data Flow (SDF) MoC can be used to analyze the time performance guarantees.
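For readers unfamiliar with SDF analysis, the sketch below shows the standard consistency check that underlies it: solving the balance equations q[src] * produce = q[dst] * consume for the smallest integer repetition vector q. This is the textbook SDF construction, not the unified model proposed in the paper; the function name, the edge format, and the example graph are assumptions.

```python
from fractions import Fraction
from math import lcm  # requires Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations for a connected graph.

    edges is a list of (src, dst, produce, consume) tuples, where produce
    and consume are the token rates per firing on that edge.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along the edges
        changed = False
        for src, dst, p, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, p, c in edges:
        if rate[src] * p != rate[dst] * c:
            raise ValueError("inconsistent rates: no periodic schedule exists")
    # Scale the rational rates to the smallest all-integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}


q = repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
# q == {"A": 3, "B": 2, "C": 1}
```

Once the repetition vector is known, one iteration of the graph fires each actor q[a] times, which is the usual starting point for throughput and latency bounds.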
Software distributed shared memory (DSM) techniques, while effective on applications with coarse-grained sharing, yield poor performance for the fine-grained sharing encountered in applications increasingly relying on sophisticated adaptive and hierarchical algorithms. Such applications exhibit irregular communication patterns unsynchronized with computation, incurring large overheads for synchronous (request-reply) DSM protocols that require responsive processing of coherence messages. We describe a new DSM framework, View Caching, that addresses this problem by utilizing application knowledge of data access semantics to enable the construction of low-overhead, asynchronous coherence protocols. Experiments on the Cray T3D show that view caching enables efficient execution of fine-grained irregular applications, reducing both coherence overheads and idle time to improve performance by up to 35% over a weakly-consistent DSM implementation.
Previous studies in speculative prefetching focus on building and evaluating access models for the purpose of access prediction. This paper on the other hand investigates the performance of speculative prefetching. Wh...
ISBN (print): 0769516866
In high energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources. We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or, alternatively, performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of job scheduling and data movement (replication) algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication on the scheduling strategy, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation of the overall Data Grid system.
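The decoupling described in this abstract can be sketched as two independent routines that share only observed access counts. This is a hypothetical illustration under simple assumptions (one dataset per job, queue length as the load measure, a fixed popularity threshold); the `Site` class and both functions are invented here and are not the paper's algorithms.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    datasets: set = field(default_factory=set)
    queue: int = 0  # number of jobs waiting at this site


def schedule(dataset, sites, accesses):
    """Job scheduling: prefer the least-loaded site that already holds the data.

    If no site holds the dataset, fall back to the least-loaded site overall
    (the job would then need a remote fetch, which is not modeled here).
    """
    holders = [s for s in sites if dataset in s.datasets]
    target = min(holders or sites, key=lambda s: s.queue)
    target.queue += 1
    accesses[dataset] += 1
    return target.name


def replicate(sites, accesses, threshold):
    """Decoupled data movement: copy each popular dataset to one more site."""
    copies = []
    for dataset, count in accesses.items():
        missing = [s for s in sites if dataset not in s.datasets]
        if count >= threshold and missing:
            target = min(missing, key=lambda s: s.queue)
            target.datasets.add(dataset)
            copies.append((dataset, target.name))
    accesses.clear()  # start a fresh observation window
    return copies


sites = [Site("A", {"d1"}), Site("B")]
accesses = Counter()
placements = [schedule("d1", sites, accesses) for _ in range(3)]  # all go to A
copies = replicate(sites, accesses, threshold=3)                   # d1 copied to B
```

Because `replicate` never inspects individual jobs and `schedule` never moves data, either routine can be swapped out on its own, which is the simplification the abstract's results point to.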