ISBN: 9781509012336 (print)
In this paper we present VCube-PS, a topic-based Publish/Subscribe system built on top of a virtual hypercube-like topology. Membership information and published messages are broadcast to the subscribers (members) of a topic group over dynamically built spanning trees rooted at the message's source. For a given topic, delivery of published messages respects causal order. Performance results of experiments conducted on the PeerSim simulator confirm the efficiency of VCube-PS in terms of scalability, latency, and the number and size of messages when compared to an approach based on a single-rooted tree that is not built dynamically.
ISBN: 0769511503 (print)
High-speed elementary function generation is crucial to the performance of many DSP applications. This paper presents three new architectures for generating elementary functions with IEEE single precision using second-order interpolation. These designs have been developed through a combination of architectural innovations and algorithm developments. They represent a range of trade-offs between the use of memory modules and computational circuits. Our most memory-intensive architecture uses one third less memory than alternative schemes while incurring no time penalty and minimal additional circuitry.
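The abstract does not describe the three architectures in detail, so the following is only a generic software sketch of table-driven second-order interpolation, not the authors' design. It assumes Taylor coefficients stored at each segment midpoint, double-precision arithmetic instead of hardware fixed-point, and hypothetical names (`build_table`, `evaluate`) and parameters (64 segments, the reciprocal function on [1, 2)).

```python
def build_table(f, df, d2f, lo, hi, segments):
    """Store quadratic (Taylor) coefficients at the midpoint of each segment."""
    width = (hi - lo) / segments
    table = []
    for i in range(segments):
        mid = lo + (i + 0.5) * width
        # c0 + c1*dx + c2*dx^2 with dx = x - mid
        table.append((f(mid), df(mid), 0.5 * d2f(mid), mid))
    return table, width


def evaluate(table, width, lo, x):
    """Select the segment containing x, then evaluate the quadratic (Horner form)."""
    i = min(int((x - lo) / width), len(table) - 1)
    c0, c1, c2, mid = table[i]
    dx = x - mid
    return c0 + dx * (c1 + dx * c2)


# Hypothetical example: approximate 1/x on [1, 2) with 64 table entries.
table, width = build_table(
    lambda x: 1.0 / x,
    lambda x: -1.0 / x**2,
    lambda x: 2.0 / x**3,
    1.0, 2.0, 64,
)

# Measure the worst-case absolute error on a dense sample of the interval.
max_err = max(
    abs(evaluate(table, width, 1.0, 1.0 + k / 10000) - 1.0 / (1.0 + k / 10000))
    for k in range(10000)
)
```

The sketch makes the memory versus computation trade-off visible: more segments mean a larger table but a smaller `dx`, and therefore a smaller truncation error for the same quadratic evaluation.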
The increasingly important data-intensive scientific discovery presents a critical question to the high-performance computing (HPC) community: how to efficiently support these growing scientific big data applications w...
ISBN: 1424403073 (print)
Through massive parallelism, distributed systems enable the multiplication of productivity. Unfortunately, increasing the number of machines available to users also multiplies the debugging effort when failures occur. Data mining allows the extraction of patterns from large amounts of data and therefore forms the foundation for a useful method of debugging, particularly within such distributed systems. This paper outlines a successful application of data mining in troubleshooting distributed systems, proposes a framework for further study, and speculates on other future work.
We give an overview of our GRAvity PipE (GRAPE) project to develop special-purpose computers for astrophysical N-body simulations. The basic idea of GRAPE is to attach a custom-built computer dedicated to the calculation of gravitational interaction between particles to a general-purpose programmable computer. With this hybrid architecture, we can achieve both a wide range of applications and very high peak performance. Our newest machine, GRAPE-6, achieved a peak speed of 32 Tflops and a sustained performance of 11.55 Tflops on a total budget of about 4 million USD. We also discuss the relative advantages of special-purpose and general-purpose computers and the future of high-performance computing for science and technology. (C) 2002 Elsevier Science B.V. All rights reserved.
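As an illustration of the kind of pairwise calculation that such a dedicated pipeline would take over from the host computer, here is a minimal direct-summation sketch in plain Python. It is not GRAPE code: the function name `accelerations`, the softening parameter `eps`, and the choice of units with G = 1 are assumptions made for the example.

```python
def accelerations(pos, mass, eps=1e-3):
    """Direct O(N^2) summation of softened gravitational accelerations (G = 1).

    pos  : list of [x, y, z] positions
    mass : list of particle masses
    eps  : softening length (an assumption of this sketch) that avoids
           a singularity when two particles come very close
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Separation vector from particle i to particle j.
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + eps * eps
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * d[k] * inv_r3
    return acc


# Two unit masses one length unit apart, without softening.
two_body = accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0], eps=0.0)
```

The inner double loop is the part whose cost grows as N squared, which is why offloading it to dedicated hardware while keeping the rest of the simulation on a general-purpose host is attractive.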
ISBN: 9781424462698 (print)
The structure of direct vertical interconnections, called Through-Silicon Vias (TSVs), is an important issue in the realm of 3D ICs. Bus-based and network-based structures are the two dominant architectures for implementing TSVs as interlayer connections in 3D ICs. Both implementations have disadvantages: the former scales poorly and degrades performance at high injection rates, while the latter consumes more area and power. In this paper, we propose a novel pipeline bus structure for TSVs to improve the performance of the prior bus-based architecture. The presented structure can use a bi-synchronous FIFO for synchronization between stacked layers if the layers are fabricated in different technologies. Experimental results with synthetic test cases demonstrate that the proposed architecture gives significant improvements in average network latency. In addition, the hardware area and power consumption of the presented bus structure are 9% and 11% lower, respectively, than those of the typical TSV bus structure.
ISBN: 9781479980062 (print)
Numerous problems in science and engineering involve discretizing the problem domain as a regular structured grid and use domain decomposition techniques to obtain solutions faster with high-performance computing. However, load imbalance among the processing nodes can cause severe degradation in application performance. This problem is exacerbated when the computational workload is non-uniform and the processing nodes have varying computational capabilities. In this paper, we present novel local search algorithms for regular partitioning of a structured mesh across heterogeneous compute nodes in a distributed setting. The algorithms seek to assign larger workloads to processing nodes with higher computational capabilities while maintaining the regular structure of the mesh in order to achieve a better load balance. We also propose a distributed-memory (MPI) parallelization architecture that can be used to achieve a parallel implementation of scientific modeling software requiring structured grids on heterogeneous processing resources involving CPUs and GPUs. Our implementation can use the available CPU cores and multiple GPUs of the underlying platform simultaneously. Empirical evaluation on real-world flood modeling domains on a heterogeneous architecture comprising multicore CPUs and GPUs suggests that the proposed partitioning approach can provide a performance improvement of up to 8x over a naive uniform partitioning.
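To make the idea of capability-aware regular partitioning concrete, the sketch below splits the rows of a structured grid into contiguous strips whose sizes are proportional to each node's relative speed. This is a simple baseline, not the local search algorithms of the paper: it assumes a uniform workload per row, and the function name `partition_rows` and the `speeds` values are hypothetical.

```python
def partition_rows(n_rows, speeds):
    """Split n_rows grid rows into contiguous strips proportional to node speed.

    Returns a list of (start, end) half-open row ranges, one per node.
    Rounding uses the largest-remainder rule so that every row is assigned.
    """
    total = sum(speeds)
    ideal = [n_rows * s / total for s in speeds]
    counts = [int(x) for x in ideal]
    leftover = n_rows - sum(counts)
    # Give the remaining rows to the nodes with the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    bounds, start = [], 0
    for c in counts:
        bounds.append((start, start + c))
        start += c
    return bounds


# Hypothetical platform: two CPU nodes and one accelerator that is twice as fast.
strips = partition_rows(100, [1.0, 1.0, 2.0])
```

Because each node receives a contiguous block of whole rows, the mesh keeps its regular structure; handling non-uniform workloads, as the abstract describes, would require weighting rows by their cost rather than counting them.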
ISBN: 9781424462698 (print)
Increasing delay and power variation has become a major challenge in designing high-performance Multiprocessor Systems-on-Chip (MPSoCs) in deep sub-micron technologies. As a result, a paradigm shift from deterministic to statistical design methodology at all levels of the design hierarchy is inevitable. In this paper, we propose a static variation-aware task scheduling and power mode selection algorithm for MPSoCs. The proposed algorithm maximizes the total power yield of the chip under a given performance yield constraint by searching for the optimal task scheduling and power mode selection policy for a specified multiprocessor platform. Experimental results are gathered by simulating the algorithm with two statistical analysis methods: Monte Carlo and an Event-Reference-Table-based method. We show that by considering both leakage and frequency variation during the simultaneous selection of task scheduling and power mode switching policies, our algorithm achieves significant improvement over conventional methods.
ISBN: 9780769530147 (print)
Component-based programming has been applied to address the requirements of applications in high-performance computing (HPC). The usual service connectors of commercial component models do not fit some requirements of HPC, mainly regarding the support of parallelism. This paper examines extensions to the usual notion of service connector that meet such requirements, using the # component model as a substrate and thereby demonstrating its expressiveness.
ISBN: 0769517722 (print)
Recent developments in the international arena mean that the technology is now mature enough to bring together the elements required for the implementation of a grid computing facility. This paper examines the requirements and applications for an eScience infrastructure, with particular reference to developments in Europe.