ISBN: 0769520464 (print)
BlueGene/L is a massively parallel computer system with 65,536 dual-processor compute nodes. The peak performance of BlueGene/L exceeds 360 TFLOP/s if both processor cores in a node are used for computation. The main challenge in deploying this dual-core mode of operation is that the L1 caches in each core are not hardware coherent. This forces a software-based approach to cache coherence and guides our design of a programming model for dual-core mode. In this paper we describe the design, implementation, and performance evaluation of system software that enables the use of dual-core mode on BlueGene/L. Our preliminary performance results show that our approach to dual-core mode is effective for key numerical kernels.
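Why non-coherent L1 caches force a software protocol can be seen in a toy model. This is a hypothetical sketch, not BlueGene/L's system software or its cache-control interface: two cores with private write-back caches share one memory, and the class and method names (`PrivateCache`, `flush`, `invalidate`) are illustrative assumptions. A value written by one core becomes visible to the other only after the writer flushes and the reader invalidates.

```python
class PrivateCache:
    """Toy private write-back cache with no hardware coherence (hypothetical model)."""

    def __init__(self, memory: dict):
        self.memory = memory    # shared main memory: address -> value
        self.lines = {}         # cached copies: address -> value
        self.dirty = set()      # addresses written locally but not yet written back

    def load(self, addr):
        # A hit returns the cached copy, even if memory has changed since.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        # Writes stay in this cache until explicitly flushed.
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self, addr):
        # Write a dirty line back to memory so other cores can observe it.
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        # Drop the cached copy (and any unflushed write) so the next load re-reads memory.
        self.lines.pop(addr, None)
        self.dirty.discard(addr)


memory = {0: 0}
core0, core1 = PrivateCache(memory), PrivateCache(memory)
core1.load(0)            # core 1 caches the old value
core0.store(0, 42)       # core 0 produces a new value
stale = core1.load(0)    # still 0: nothing propagated the write
core0.flush(0)           # the producer writes back...
core1.invalidate(0)      # ...and the consumer discards its stale copy
fresh = core1.load(0)    # now 42
```

Either step alone is insufficient: without the flush, memory never sees 42; without the invalidate, core 1 keeps hitting its stale line.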
ISBN: 0769520464 (print)
This paper presents the implementation of a very fast parallel complex FFT on M2, the second generation of the MorphoSys reconfigurable computation platform, which targets streaming applications such as multimedia and DSP. The proposed mapping comprises fast presorting, cascaded radix-2 stages, and post-reordering. Data and twiddle factors are 16-bit real and 16-bit imaginary values in two's complement format, and scaling is performed to avoid overflow. The mapping is tested on our cycle-accurate simulator, "Mulate", and its performance is encouragingly better than that of other architectures such as Imagine and VIRAM. Moreover, the performance scales with FFT size. Since the architecture has no functionality specifically tailored to FFT, the results demonstrate the ability of the MorphoSys architecture to extract parallelism from streaming applications. Further rationale is given based on the concepts of scalar operand networks and memory hierarchy.
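The presort-then-radix-2-stages structure with overflow scaling can be sketched in plain Python. This is a generic fixed-point sketch, not the M2/MorphoSys mapping: it assumes Q15 (16-bit two's complement) data and twiddles and a right shift by one bit after every butterfly stage, so the output equals the DFT divided by N. Because the input is presorted into bit-reversed order, this decimation-in-time variant already produces natural-order output and needs no separate post-reordering. The helper names are hypothetical.

```python
import math

Q = 15            # Q15 fixed point: real value = integer / 2**15
ONE = 1 << Q


def to_q15(x: float) -> int:
    """Round a float in [-1, 1) to a saturated 16-bit Q15 integer."""
    return max(-ONE, min(ONE - 1, int(round(x * ONE))))


def bit_reverse(i: int, bits: int) -> int:
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_q15(re, im):
    """Radix-2 DIT FFT on Q15 lists; halves values at every stage, so output = DFT / N."""
    n = len(re)
    bits = n.bit_length() - 1
    assert 1 << bits == n, "length must be a power of two"
    # Presort: load the input in bit-reversed order.
    re = [re[bit_reverse(i, bits)] for i in range(n)]
    im = [im[bit_reverse(i, bits)] for i in range(n)]
    size = 2
    while size <= n:                      # one pass per cascaded radix-2 stage
        half = size // 2
        for k in range(half):
            # Twiddle W = exp(-2*pi*j*k/size), quantized to Q15.
            wr = to_q15(math.cos(2 * math.pi * k / size))
            wi = to_q15(-math.sin(2 * math.pi * k / size))
            for start in range(0, n, size):
                a, b = start + k, start + k + half
                # Q15 x Q15 -> Q30, shifted back to Q15.
                tr = (re[b] * wr - im[b] * wi) >> Q
                ti = (re[b] * wi + im[b] * wr) >> Q
                # Butterfly, scaled by 1/2 to keep results within 16-bit range.
                re[a], re[b] = (re[a] + tr) >> 1, (re[a] - tr) >> 1
                im[a], im[b] = (im[a] + ti) >> 1, (im[a] - ti) >> 1
        size *= 2
    return re, im
```

Scaling by 1/2 per stage trades a little precision (truncation in every stage) for a guarantee that no intermediate value exceeds the input range.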
ISBN: 0769520464 (print)
Trace reuse improves processor performance by skipping the execution of sequences of redundant instructions. However, many reusable traces do not have all of their inputs ready by the time the reuse test is performed. For these cases, we developed a new technique called Reuse through Speculation on Traces (RST), in which trace inputs may be predicted. This paper studies the limits of RST for modern processors with deep pipelines, as well as the effects of constraining resources on performance. We show that our approach reuses more traces than the non-speculative trace reuse technique, with speedups of 43% over non-speculative trace reuse and 57% when memory accesses are reused.
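The difference between the plain reuse test and a speculative one can be sketched with a small reuse table. This is a hypothetical model, not the RST hardware: traces are keyed by start PC, inputs and outputs are register-to-value maps, and the last-value predictor is an assumption, since the abstract does not say which predictor RST uses. A speculative reuse must later be verified against the real inputs and squashed on a mismatch.

```python
from dataclasses import dataclass


@dataclass
class TraceEntry:
    inputs: dict    # register -> value the trace read when it was recorded
    outputs: dict   # register -> value the trace produced


class LastValuePredictor:
    """Hypothetical value predictor: guesses that a register holds its last seen value."""

    def __init__(self):
        self.last = {}

    def update(self, reg, value):
        self.last[reg] = value

    def predict(self, reg):
        return self.last.get(reg)


class ReuseTable:
    def __init__(self):
        self.entries = {}   # trace start PC -> TraceEntry

    def record(self, pc, inputs, outputs):
        self.entries[pc] = TraceEntry(dict(inputs), dict(outputs))

    def lookup(self, pc, ready, predictor=None):
        """Reuse test. `ready` holds registers whose values are already available.
        Without a predictor this is plain (non-speculative) reuse; with one,
        missing inputs may be predicted. Returns (outputs, speculative) or None."""
        entry = self.entries.get(pc)
        if entry is None:
            return None
        speculative = False
        for reg, value in entry.inputs.items():
            if reg in ready:
                if ready[reg] != value:
                    return None          # a known input differs: cannot reuse
            elif predictor is not None and predictor.predict(reg) == value:
                speculative = True       # input not ready, but the prediction matches
            else:
                return None              # input not ready and not predictable
        return entry.outputs, speculative

    def verify(self, pc, actual):
        """Check a speculative reuse once real inputs arrive; False means squash and re-execute."""
        entry = self.entries[pc]
        return all(actual[reg] == value for reg, value in entry.inputs.items())
```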
ISBN: 0769520464 (print)
Program tracing is one of the most widely used techniques for debugging parallel and distributed programs. In this technique, events are recorded in trace files during program execution for post-mortem visualization of the program's behavior. This article describes JRastro, a trace agent capable of tracing Java programs. The agent was designed around three key features: it is transparent to the application developer, it uses unmodified Java Virtual Machines, and it observes Remote Method Invocations (RMI). By integrating these three features, JRastro differentiates itself from similar tools. Unfortunately, a complete and clean implementation of RMI visualization requires additional support in the Java monitoring system.
ISBN: 0769520464 (print)
Scheduling by Edge Reversal (SER) is a fully distributed scheduling mechanism based on manipulating acyclic orientations of a graph. This work uses SER to perform constraint partitioning of Constraint Satisfaction Problems (CSPs). To apply the SER mechanism, the graph representing the constraints must first be given an acyclic orientation. Since obtaining an optimal acyclic orientation is an NP-hard problem, this work studies three non-deterministic strategies known in the literature: Alg-Neigh, Alg-Edges, and Alg-Colour. We implemented the three algorithms and the SER scheduling mechanism and applied them to CSP constraint networks generated from three applications. Our results show that SER has great potential for producing good partitionings of constraint graphs.
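The SER mechanism itself is short enough to sketch. In each step every sink (a node with no outgoing edges) operates, then reverses all of its edges and becomes a source; because two adjacent nodes can never both be sinks, each step's sink set is an independent set. Assuming, for illustration, that nodes stand for constraints and edges join constraints that share a variable, the sink sets give groups of constraints that can be processed concurrently. The initial orientation below comes from an arbitrary node ranking, a simple stand-in that is not Alg-Neigh, Alg-Edges, or Alg-Colour; the function names are hypothetical.

```python
def orient_by_rank(edges, rank):
    """Acyclic orientation from a total order: every edge points from lower to higher rank."""
    return {(u, v) if rank[u] < rank[v] else (v, u) for u, v in edges}


def ser_schedule(nodes, arcs, steps):
    """Scheduling by Edge Reversal: at each step all sinks operate, then reverse
    every edge pointing into them. Returns the set of operating nodes per step."""
    arcs = set(arcs)
    schedule = []
    for _ in range(steps):
        has_out = {u for u, _ in arcs}
        sinks = {n for n in nodes if n not in has_out}
        schedule.append(sinks)
        # Sinks are never adjacent, so all of them can reverse their edges at once.
        arcs = {(v, u) if v in sinks else (u, v) for u, v in arcs}
    return schedule


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
rank = {"a": 0, "b": 1, "c": 2, "d": 3}
sched = ser_schedule(nodes, orient_by_rank(edges, rank), 4)
# sink sets per step: {c, d}, {b}, {a}, then {c, d} again (the orientation is periodic)
```

Since the orientation returns to its starting state after three steps here, the schedule repeats, and every node operates once per period.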
ISBN: 0769520464 (print)
The use of clusters of computers as an environment for high-performance computing has been shown to be promising. However, the efficient use of such systems still requires advances that make application development simpler and more productive. The development of cluster monitoring tools is essential to achieving these advances. In this paper we present P²MP, a tool for clusters of personal computers that provides a graphical visualization of the temporal execution of distributed applications that use the MPI standard for message passing. The tool uses the parallel port to read the times of events that occur on the different machines of a cluster. Among other functionalities, it also simulates the execution of task precedence graphs and allocates the tasks of a graph to the machines of a cluster.
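Simulating a task precedence graph on a set of machines can be illustrated with a generic greedy list scheduler. This is an assumption for illustration only, since the abstract does not describe the tool's allocation algorithm: tasks are taken in topological order and each goes to the machine on which it can start earliest. The function name is hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def list_schedule(durations, deps, machines):
    """Greedy list scheduling of a task precedence graph (hypothetical helper).
    durations: task -> run time; deps: task -> list of predecessor tasks.
    Returns ({task: (machine, start_time)}, makespan)."""
    graph = {task: deps.get(task, []) for task in durations}
    machine_free = [0.0] * machines   # time at which each machine becomes idle
    finish, placement = {}, {}
    for task in TopologicalSorter(graph).static_order():
        # A task may start only after all its predecessors have finished.
        ready = max((finish[p] for p in graph[task]), default=0.0)
        m = min(range(machines), key=lambda i: max(machine_free[i], ready))
        start = max(machine_free[m], ready)
        finish[task] = start + durations[task]
        machine_free[m] = finish[task]
        placement[task] = (m, start)
    return placement, max(finish.values())


# Hypothetical diamond-shaped graph: A precedes B and C, which both precede D.
durations = {"A": 2, "B": 3, "C": 1, "D": 2}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
placement, makespan = list_schedule(durations, deps, machines=2)
```

On two machines B and C run side by side, so the makespan drops from 8 (one machine) to 7.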
ISBN: 0769520464 (print)
One of the main challenges to the widespread use of the Internet is server scalability, that is, the ability of servers to handle increasing demand. Scalability is even harder to achieve in stateful servers, which include e-commerce and other transaction-oriented servers, since transaction data must be kept across requests from the same user. One common strategy for achieving scalability is to employ clustered servers, in which the load is distributed among the servers. However, because of the workload characteristics and the need to keep data coherent among the servers in the cluster, load imbalance arises among the servers, reducing the efficiency of the cluster as a whole. In this paper we propose and evaluate a load-balancing strategy for stateful clustered servers. Our strategy is based on control theory and yielded significant gains over configurations that do not employ it, reducing response time by up to 50% and increasing throughput by up to 16%.
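A control-theoretic balancer can be sketched as a feedback loop that adjusts request-routing weights from measured load. The abstract does not specify the controller, so this uses a plain proportional (P) controller as an assumption: each server's weight moves in proportion to how far its load sits from the cluster average. The gain, the weight floor, the function name, and the two-server example are all hypothetical.

```python
def update_weights(weights, loads, gain=0.5, floor=0.01):
    """One step of a proportional controller for request-routing weights (hypothetical).
    weights: fraction of new requests sent to each server (sums to 1).
    loads: measured load per server (e.g., utilization)."""
    mean = sum(loads) / len(loads)
    if mean == 0:
        return list(weights)
    # A positive error means the server is below the cluster average and can take more work.
    adjusted = [max(floor, w + gain * (mean - load) / mean) for w, load in zip(weights, loads)]
    total = sum(adjusted)
    return [w / total for w in adjusted]


# Hypothetical example: two servers whose capacities differ by a factor of 3.
capacities, demand = [1.0, 3.0], 2.0
weights = [0.5, 0.5]
for _ in range(20):
    utilization = [w * demand / c for w, c in zip(weights, capacities)]
    weights = update_weights(weights, utilization)
# weights approach [0.25, 0.75], i.e. proportional to capacity
```

The gain sets the trade-off: a larger gain reacts faster to imbalance but can overshoot and oscillate, which is exactly what a control-theoretic design is meant to tune.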
ISBN: 1424403073 (print)
The poster presents the dark fibre "project architecture" deployed by RENATER to support research projects with high network resource requirements. We show maps of the RENATER standard and dark fibre architectures, and we summarize the requirements and results of the projects currently using the architecture (DEISA, LHC, Grid5000).
ISBN: 1424403073 (print)
Achieving high-performance parallel computing requires a system that is both large-scale and reliable. To address this challenge, we describe our design and implementation of MPICH-OPeN, an implementation of the Message Passing Interface for parallel computing over a peer-to-peer network. For reliability, our implementation uses the Condor standalone checkpoint library and the Chandy-Lamport algorithm, with extensions to make it decentralized. We use the OPeN architecture with an adaptive peer-to-peer protocol that caches connections between peers according to the communication requirements of the parallel processes. We used PlanetLab to compare the performance of our implementation with MPICH-P4 and to measure the impact of dynamic peers on parallel program execution.
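The Chandy-Lamport algorithm captures a consistent global checkpoint without stopping the processes. The sketch below is the classic single-initiator version in a single-threaded simulation with FIFO channels; it does not model the paper's decentralization extensions or the Condor checkpoint library, and the bank-transfer workload and class names are hypothetical. On the first marker a process records its state; afterwards, messages arriving on a channel that has not yet delivered its marker are recorded as in flight.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance        # local state: an account balance
        self.peers = peers
        self.recorded_state = None    # balance captured when the snapshot reached this process
        self.channel_state = {}       # incoming channel -> messages recorded as in flight
        self.recording = set()        # incoming channels still being recorded


class System:
    def __init__(self, balances):
        ids = range(len(balances))
        self.procs = [Process(i, b, [j for j in ids if j != i]) for i, b in enumerate(balances)]
        # One FIFO channel per ordered pair of processes, as Chandy-Lamport assumes.
        self.chan = {(i, j): deque() for i in ids for j in ids if i != j}

    def transfer(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def _record(self, p):
        # Record local state, start recording all incoming channels, send markers on all outgoing ones.
        p.recorded_state = p.balance
        p.recording = set(p.peers)
        p.channel_state = {q: [] for q in p.peers}
        for q in p.peers:
            self.chan[(p.pid, q)].append(MARKER)

    def start_snapshot(self, pid):
        self._record(self.procs[pid])

    def deliver(self, src, dst):
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:
                self._record(p)          # first marker: channel src -> dst is recorded as empty
            p.recording.discard(src)     # stop recording this channel
        else:
            p.balance += msg
            if p.recorded_state is not None and src in p.recording:
                p.channel_state[src].append(msg)   # message was in flight at snapshot time

    def run(self):
        # Deliver messages round-robin until every channel is empty.
        busy = True
        while busy:
            busy = False
            for (src, dst), queue in self.chan.items():
                if queue:
                    self.deliver(src, dst)
                    busy = True

    def snapshot_total(self):
        assert all(p.recorded_state is not None and not p.recording for p in self.procs)
        return sum(p.recorded_state + sum(sum(m) for m in p.channel_state.values())
                   for p in self.procs)


bank = System([100, 100, 100])
bank.transfer(0, 1, 10)
bank.transfer(1, 2, 20)
bank.start_snapshot(0)
bank.transfer(2, 0, 5)
bank.run()
```

The recorded balances plus the recorded in-flight transfers add up to the money in the system, which is what makes the snapshot usable as a restart point.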
ISBN: 0769520464 (print)
In general, two types of resource reservations in computer networks can be distinguished: immediate reservations, which are made in a just-in-time manner, and advance reservations, which allow resources to be reserved long before they are actually used. Advance reservations are especially useful for grid computing but also for a variety of other applications that require network quality of service, such as content distribution networks or even mobile clients, which need advance reservations to support handovers for streaming video. With the emergence of the MPLS standard, explicit routing can also be implemented in IP networks, thus overcoming the unpredictable routing behavior that has so far prevented the implementation of advance reservation services. The impact of such advance reservation mechanisms on network performance, with respect to the number of admitted requests and the allocated bandwidth, has so far not been examined in detail. In this paper we show that advance reservations can reduce network performance with respect to both metrics. Analysis of the causes reveals fragmentation of the network resources. In advance reservation environments, new services can be defined, such as malleable reservations, which are introduced in this paper and can increase network performance. Four strategies for scheduling malleable reservations are presented and compared. The comparison shows that some strategies increase resource fragmentation and are therefore unsuitable in the considered environment, while others lead to significantly better network performance. Besides discussing performance, this paper also presents the software architecture of a management system for advance reservations.
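Fragmentation, and how a malleable reservation can work around it, shows up even on a single time-slotted link. In this hypothetical model a rigid advance reservation needs a fixed bandwidth in every slot of its interval, while a malleable one only needs a total data volume delivered before a deadline, at a variable rate up to a maximum. The greedy earliest-first placement is one simple strategy chosen for illustration; it is not necessarily one of the four strategies compared in the paper, and the class and method names are assumptions.

```python
class Link:
    """Single link with time-slotted advance reservations (hypothetical model)."""

    def __init__(self, capacity, slots):
        self.capacity = capacity
        self.used = [0.0] * slots     # bandwidth already reserved in each time slot

    def free(self, t):
        return self.capacity - self.used[t]

    def reserve_fixed(self, start, end, bw):
        """Rigid reservation: `bw` in every slot of [start, end), or reject."""
        if any(self.free(t) < bw for t in range(start, end)):
            return False
        for t in range(start, end):
            self.used[t] += bw
        return True

    def reserve_malleable(self, earliest, deadline, volume, max_bw):
        """Malleable reservation: move `volume` units within [earliest, deadline)
        at up to `max_bw` per slot, filling the earliest slots first.
        Returns the per-slot plan [(slot, bw), ...] or None if it does not fit."""
        plan, remaining = [], volume
        for t in range(earliest, deadline):
            if remaining <= 0:
                break
            bw = min(max_bw, self.free(t), remaining)
            if bw > 0:
                plan.append((t, bw))
                remaining -= bw
        if remaining > 1e-9:
            return None               # not enough free capacity before the deadline
        for t, bw in plan:
            self.used[t] += bw
        return plan


link = Link(capacity=10, slots=4)
link.reserve_fixed(0, 2, 8)           # leaves 2 units free in slots 0-1
link.reserve_fixed(2, 4, 4)           # leaves 6 units free in slots 2-3
rigid_ok = link.reserve_fixed(0, 4, 4)            # rejected: slots 0-1 are fragmented
plan = link.reserve_malleable(0, 4, 16, 10)       # same volume (4 x 4) fits when malleable
```

The rigid request fails although the link has exactly 16 free units across the four slots; the malleable request absorbs the uneven leftovers and fills the link completely.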