ISBN: (print) 0769520804
There are many problems in computer systems (single-processor and multiprocessor) as well as in production systems to which scheduling can be applied. In most cases, scheduling problems belong to the NP-hard class. For most classical scheduling problems published in the last 10 years (for example, the Taillard benchmarks [14] for the flow shop problem), optimal solutions have still not been found. In this paper we propose a very effective method of constructing parallel algorithms based on the tabu search metaheuristic. We apply block properties, which enable the parallel algorithm to distribute calculations and reduce communication between processors. The algorithms are implemented in Ada95 and MPI.(*)
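The core of tabu search on the permutation flow shop can be sketched as a plain sequential loop. This minimal sketch assumes a swap neighborhood, a fixed tabu tenure, and the standard aspiration criterion (a tabu move is still allowed if it beats the best makespan found so far). It does not reproduce the paper's block properties or its Ada95/MPI parallelization, and `makespan` and `tabu_search` are hypothetical names.

```python
import itertools


def makespan(perm, p):
    """Makespan of a permutation flow shop schedule.

    p[job][machine] is the processing time of `job` on `machine`."""
    m = len(p[0])
    finish = [0] * m  # completion time of the previous job on each machine
    for job in perm:
        for k in range(m):
            # Start when the machine is free AND the job has left machine k-1.
            prev = finish[k - 1] if k > 0 else 0
            finish[k] = max(finish[k], prev) + p[job][k]
    return finish[-1]


def tabu_search(p, iters=200, tenure=7):
    """Sequential tabu search over job swaps (illustrative assumptions only)."""
    n = len(p)
    current = list(range(n))
    best, best_cost = current[:], makespan(current, p)
    tabu = {}  # (job_a, job_b) -> iteration until which swapping them is tabu
    for it in range(iters):
        candidate = None
        for i, j in itertools.combinations(range(n), 2):
            nb = current[:]
            nb[i], nb[j] = nb[j], nb[i]
            cost = makespan(nb, p)
            move = (min(current[i], current[j]), max(current[i], current[j]))
            is_tabu = tabu.get(move, -1) > it
            # Aspiration: a tabu move is allowed only if it improves on the best.
            if is_tabu and cost >= best_cost:
                continue
            if candidate is None or cost < candidate[0]:
                candidate = (cost, nb, move)
        if candidate is None:  # every move is tabu and none aspirates
            break
        cost, current, move = candidate  # take the best admissible neighbor
        tabu[move] = it + tenure
        if cost < best_cost:
            best, best_cost = current[:], cost
    return best, best_cost
```

For two jobs with processing times [[3, 2], [1, 4]] (rows are jobs, columns are machines), the search swaps the jobs and returns ([1, 0], 7), improving on the identity order's makespan of 9.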
ISBN: (print) 0769522297
Parallel jobs are characterized by processes that communicate and synchronize with each other frequently. A processor allocation strategy widely used in parallel supercomputers is Space-Sharing, that is, assigning a partition of processors to each job for its exclusive use. In this article we present a global solution that offers virtual Malleability to message-passing parallel jobs by applying a processor allocation strategy, Folding by JobType (FJT). This technique is based on the Folding and Moldability concepts and tries to decide the optimal initial number of processes, when to fold jobs, and how many times to fold them by analyzing current and past system information. At the processor level, we apply Co-Scheduling. We implement and evaluate FJT under several workloads with different job sizes, classes, and machine utilization. Results show that FJT adapts easily to load changes and can outperform the other evaluated strategies on workloads with a high coefficient of variation, especially with burst arrivals.
Mixed parallelism, the combination of data and task parallelism, is a powerful way of increasing the scalability of entire classes of parallel applications on platforms comprising multiple compute clusters. While mult...
Due to advances in fiber optics and VLSI technology, interconnection networks that allow multiple simultaneous broadcasts are becoming feasible. This paper presents the multiprocessor architecture of the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) and examines the performance of representative algorithms for matrix operations, merging, and sorting using the message-passing and distributed-shared-memory (DSM) paradigms. It shows that simple enhancements to the network interface and to the cache and directory controllers can result in O(1) communication time for the matrix-vector multiplication algorithm using DSM. The SOME-Bus is a low-latency, high-bandwidth, fiber-optic interconnection network that directly links arbitrary pairs of processor nodes without contention and can efficiently interconnect over 100 nodes. It contains a dedicated channel for the data output of each node, eliminating the need for global arbitration and providing bandwidth that scales directly with the number of nodes in the system. Each of the P nodes has an array of receivers, with one receiver dedicated to each node's output channel. No node is ever blocked from transmitting by another transmitter or by contention for shared switching logic. The entire P-receiver array can be integrated on a single chip at comparatively minor cost, resulting in O(P) complexity. The SOME-Bus offers much more functionality than a crossbar by supporting multiple simultaneous broadcasts of messages, which allows cache consistency protocols to complete much faster. (C) 2003 Elsevier B.V. All rights reserved.
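The contention-free property described above follows from the channel and receiver layout: every node owns one output channel, and every node has one receiver per channel. A purely functional toy model makes the point that simultaneous broadcasts from all P nodes need no arbitration, because no two senders ever share a channel. It models no optics, timing, or cache and directory controllers, and `SomeBusModel` and its methods are hypothetical.

```python
class SomeBusModel:
    """Toy model: node `src` owns output channel `src`; every node keeps one
    receiver (here a FIFO list) per channel, i.e. P receivers per node."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # receivers[dst][src] buffers what node dst hears on src's channel.
        self.receivers = [[[] for _ in range(num_nodes)] for _ in range(num_nodes)]

    def transmit(self, src, msg, dests=None):
        """Send on src's dedicated channel; dests=None broadcasts to all nodes
        (including the sender itself in this toy model). There is no
        arbitration step, since no other transmitter shares this channel."""
        targets = range(self.num_nodes) if dests is None else dests
        for dst in targets:
            self.receivers[dst][src].append(msg)

    def receive_all(self, dst):
        """Drain every receiver at node dst, returning (src, msg) pairs."""
        out = []
        for src, buf in enumerate(self.receivers[dst]):
            out.extend((src, m) for m in buf)
            buf.clear()
        return out
```

With four nodes all broadcasting at once, each node drains its four receivers and sees one message per sender. The model holds P × P receiver buffers in total, P per node, matching the per-node receiver array in the description.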
We describe how the ASSIST parallel programming environment can be used to run parallel programs on collections of heterogeneous workstations and evaluate the scalability of one real task-farm application and a data-p...
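A task farm is the skeleton in which an emitter distributes independent tasks to a pool of workers and a collector gathers their results. The sketch below shows that structure with Python threads and queues. It is not ASSIST's own programming interface, the name `task_farm` is hypothetical, and because of CPython's global interpreter lock it demonstrates the structure rather than CPU speedup.

```python
import queue
import threading


def task_farm(tasks, worker_fn, n_workers=4):
    """Minimal task-farm skeleton: an emitter queue feeds n_workers workers,
    and a collector reassembles the results in input order."""
    tasks = list(tasks)
    inbox = queue.Queue()   # emitter -> workers
    outbox = queue.Queue()  # workers -> collector
    for idx, task in enumerate(tasks):
        inbox.put((idx, task))
    for _ in range(n_workers):
        inbox.put(None)  # one stop sentinel per worker

    def worker():
        while True:
            item = inbox.get()
            if item is None:
                break
            idx, task = item
            outbox.put((idx, worker_fn(task)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Collector: all workers have finished, so draining the outbox is safe.
    results = [None] * len(tasks)
    while not outbox.empty():
        idx, value = outbox.get()
        results[idx] = value
    return results
```

For example, `task_farm(list(range(10)), lambda x: x * x, n_workers=3)` returns the squares 0 to 81 in input order, regardless of which worker finished first.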
ISBN: (print) 0769522297
Technology trends present new challenges for processor architectures and their instruction schedulers. Growing transistor density will increase the number of execution units on a single chip, and decreasing wire transmission speeds will cause long and variable on-chip latencies. These trends will severely limit the two dominant conventional architectures: dynamic-issue superscalars and static-placement, static-issue VLIWs. We present a new execution model, called Static Placement Dynamic Issue (SPDI), in which the hardware and the static scheduler instead work cooperatively. This paper focuses on the static instruction scheduler for SPDI. We identify and explore three issues that SPDI schedulers must consider: locality, contention, and depth of speculation. We evaluate a range of SPDI scheduling algorithms executing on an Explicit Data Graph Execution (EDGE) architecture. We find that a surprisingly simple one achieves an average of 5.6 instructions per cycle (IPC) on SPEC2000 for a 64-wide issue machine, and is within 80% of the performance without on-chip latencies. These results suggest that the compiler is effective at balancing on-chip latency and parallelism, and that the division of responsibilities between the compiler and the architecture is well suited to future systems.
ISBN: (print) 1402081480
Reconfigurable systems have the potential to combine the performance of ASICs with the flexibility of software. The architecture presented in this paper offers a new concept for reconfiguration by operating in a self-timed and self-controlling manner. Data is routed through the operator network together with its control information in a so-called packet, so that local decisions about the behavior of the network can be made. Therefore, different paths can be realized without a central control unit. In this paper, we describe the architecture from the perspective of reconfiguration. An example shows the architecture in practical operation.
ISBN: (print) 0769521525
A sensor network forms a loosely coupled distributed environment in which collaborative processing among multiple sensor nodes is essential to compensate for each node's limited processing capability, sensing capability, and energy, as well as to improve fault tolerance. Because of the sheer number of nodes deployed, collaboration is usually carried out among nodes within the same cluster. Different clustering protocols can affect network performance to a great extent. Most existing clustering protocols either do not adequately address the energy constraint or derive clusters proactively, which may not suit event-driven collaborative processing in sensor networks. This paper focuses on the design of clustering protocols for collaborative processing. We propose a decentralized reactive clustering (DRC) protocol in which the clustering procedure is initiated only when events are detected. It uses a power-control technique to minimize energy usage in forming clusters. We compare the performance of DRC with another popular clustering algorithm, LEACH. Simulation results show that DRC considerably improves on LEACH in energy conservation and network lifetime.
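The abstract benchmarks DRC against LEACH. For reference, LEACH's well-known cluster-head election works as follows: in round r, each node that has not yet served as cluster head in the current epoch of 1/p rounds becomes a head if a uniform random draw falls below T(n) = p / (1 - p * (r mod 1/p)); all other nodes use T(n) = 0. The sketch below shows only that baseline election, not DRC, whose details the abstract does not give. The function names and the dictionary bookkeeping are hypothetical.

```python
import random


def leach_threshold(p, r, eligible):
    """LEACH threshold T(n) for desired cluster-head fraction p in round r.

    Nodes that already served as head in the current epoch (not in G) get 0."""
    if not eligible:
        return 0.0
    period = int(round(1 / p))  # epoch length: 1/p rounds
    # Clamp so float rounding in the last round of an epoch cannot exceed 1.
    return min(1.0, p / (1 - p * (r % period)))


def elect_cluster_heads(node_ids, p, r, last_head_round, rng=random):
    """One LEACH election round; updates last_head_round in place."""
    period = int(round(1 / p))
    epoch_start = r - r % period
    heads = []
    for n in node_ids:
        last = last_head_round.get(n)
        # Eligible (in G) if the node has not been head since the epoch began.
        eligible = last is None or last < epoch_start
        if rng.random() < leach_threshold(p, r, eligible):
            heads.append(n)
            last_head_round[n] = r
    return heads
```

Because T(n) rises to 1 in the last round of each epoch, every node serves as head exactly once per epoch. This election runs every round whether or not anything has been sensed, whereas DRC, per the abstract, starts clustering only when an event is detected.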
Hierarchical agglomerative clustering (HAC) is a common clustering method that outputs a dendrogram showing all N levels of agglomerations where N is the number of objects in the data set. High time and memory complex...
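The dendrogram that HAC produces can be illustrated with a deliberately naive single-linkage sketch. Single linkage and the absolute-difference distance in the example are choices made for illustration (the excerpt does not name a linkage criterion), and `hac_single_linkage` is a hypothetical name. Rescanning every pair of clusters on each of the N-1 merge steps gives a feel for the high time complexity the abstract mentions; practical implementations usually cache pairwise distances instead, trading time for memory.

```python
def hac_single_linkage(points, dist):
    """Deliberately naive single-linkage HAC.

    Returns the dendrogram as N-1 merges (a, b, distance, new_id); the N
    leaves are ids 0..N-1 and each merge creates the next unused id."""
    clusters = {i: [i] for i in range(len(points))}  # id -> member indices
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Scan every pair of live clusters; this full rescan on every merge
        # is what makes the naive method expensive in time.
        for ai, a in enumerate(ids):
            for b in ids[ai + 1:]:
                # Single linkage: distance of the closest pair of members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:  # ties keep the first pair
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges
```

For the points [0, 1, 5, 6, 20] on a line, the merges are (0, 1) and (2, 3) at distance 1, then those two clusters at distance 4, and finally point 4 at distance 14.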
ISBN: (print) 076952138X
Efficient parallel preconditioned iterative linear solvers for unstructured grids have been developed for symmetric multiprocessor (SMP) cluster architectures with vector processors, such as the Earth Simulator. Three types of preconditioning methods (ICCG, multigrid, and selective blocking for contact problems) have been developed, and their performance has been demonstrated on the Earth Simulator using flat-MPI and hybrid parallel programming models. Each of the three preconditioning methods corresponds to a typical finite-element application in the solid earth simulations developed in the GeoFEM project. Simple 3D linear elastic problems with more than 2.2 × 10^9 DOF have been solved using the 3x3 block ICCG(0) method and PDJDS/CM-RCM reordering on 176 nodes of the Earth Simulator, achieving a performance of 3.80 TFLOPS. Multicolor and RCM orderings provide excellent parallel and vector performance for the three preconditioned methods on the Earth Simulator.
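ICCG is the conjugate gradient method preconditioned with an incomplete Cholesky factorization. The sketch below is a minimal scalar ICCG(0) on a small dense-stored matrix: the factor L keeps only the nonzero pattern of the lower triangle of A (zero fill-in), and each CG iteration applies the preconditioner by forward and backward substitution. It omits everything that makes the Earth Simulator solver fast (3x3 blocking, PDJDS/CM-RCM reordering, vectorization, flat-MPI or hybrid parallelism), and the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def ic0(A):
    """Zero fill-in incomplete Cholesky: L keeps only the pattern of tril(A)."""
    n = len(A)
    L = [[float(A[i][j]) if j <= i else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        L[k][k] = math.sqrt(L[k][k])
        for i in range(k + 1, n):
            L[i][k] /= L[k][k]  # entries outside the pattern stay 0.0
        for j in range(k + 1, n):
            for i in range(j, n):
                if A[i][j] != 0:  # drop any update that would create fill-in
                    L[i][j] -= L[i][k] * L[j][k]
    return L


def apply_ic0(L, r):
    """Solve (L L^T) z = r: forward substitution, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[j][i] * z[j] for j in range(i + 1, n))) / L[i][i]
    return z


def iccg(A, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG with an IC(0) preconditioner, starting from x = 0."""
    L = ic0(A)
    x = [0.0] * len(b)
    if math.sqrt(dot(b, b)) < tol:
        return x
    r = list(b)
    z = apply_ic0(L, r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            break
        z = apply_ic0(L, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

For a tridiagonal matrix such as the 1D Laplacian, IC(0) drops no fill-in, so it coincides with the exact Cholesky factor and CG converges in a single iteration. For a 2D grid Laplacian the factorization is genuinely incomplete, and more iterations are needed.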