State-of-the-art run-time systems are a poor match to diverse, dynamic distributed applications because they are designed to provide support to a wide variety of applications, without much customization to individual ...
The software crisis within scientific computing has been that application codes become larger and more complex. The only conceivable solution is to make application codes smaller and less complex. We know of no way to...
Standard environments for exploiting idle time of workstations are based on some kind of spying process that detects low CPU usage and informs a scheduler so that work can be dispatched. This approach generates loc...
Many applications from scientific computing can benefit from object-oriented programming techniques because of their flexible and modular program development support. On the other hand, acceptable execution time can o...
With a rise in security-related threats and attacks, many companies have widely deployed Intrusion Detection Systems (IDSs) to protect their assets. Thus IDSs are becoming the first targets before the attackers laun...
ISBN: (print) 354043674X
The proceedings contain 51 papers. The special focus in this conference is on Networks, Architectures, HPC Systems, Earth Simulator, Experiences and Progress. The topics include: the Gilgamesh MIND processor-in-memory architecture for petaflops-scale computing; the next high-performance computer benchmark; language and compiler support for hybrid-parallel programming on SMP clusters; parallelizing merge sort onto distributed memory parallel computers; improving InfiniBand routing through multiple virtual networks; an adaptive subblock coherence protocol for improved SMP performance; efficient multiprocessing on commodity clusters; the impact of alias analysis on VLIW scheduling; low-cost value predictors using frequent value locality; integrated I-cache way predictor and branch target buffer to reduce energy consumption; a comprehensive analysis of indirect branch prediction; high performance and energy efficient serial prefetch architecture; a programmable memory hierarchy for prefetching linked data structures; block red-black ordering method for parallel processing of ICCG solver; integrating performance analysis in the Uintah software development cycle; performance of adaptive mesh refinement scheme for hydrodynamics on simulations of expanding supernova envelope; an MPI benchmark program library and its application to the Earth Simulator; large-scale parallel computing of cloud resolving storm simulator; studying new ways for improving adaptive history length branch predictors; speculative clustered caches for clustered processors; the effects of timing dependence and recursion on parallel program schemata; an EPIC processor with pending functional units; and distributed genetic algorithm with multiple populations using multi-agent.
This work examines the facility of using a large distributed memory system for rasterization of computer graphics using the OpenGL and GLUT libraries. Issues examined include the performance increases achieved through parallel processing and the effects of different methods for dividing the framebuffer over multiple processors.
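The abstract does not say which framebuffer divisions were compared. As a rough illustration only, the sketch below shows two common candidates, contiguous horizontal strips and interleaved scanlines. The function names and the row-based granularity are assumptions made for this example, not details taken from the work.

```python
def strip_partition(height, nprocs):
    """Contiguous strips: one (start_row, end_row) pair per processor, end exclusive."""
    base, extra = divmod(height, nprocs)
    bounds = []
    start = 0
    for rank in range(nprocs):
        # The first `extra` processors take one additional row each.
        rows = base + (1 if rank < extra else 0)
        bounds.append((start, start + rows))
        start += rows
    return bounds


def interleaved_partition(height, nprocs):
    """Scanline interleaving: processor r owns rows r, r + nprocs, r + 2*nprocs, ..."""
    return [list(range(rank, height, nprocs)) for rank in range(nprocs)]
```

Strips keep each processor's rows adjacent, while interleaving tends to spread a locally dense scene across processors; which one performs better depends on the scene and is the kind of effect the abstract says was examined.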
In this paper, the parallelization aspects of the accelerated waveform relaxation algorithms for the transient simulation of semiconductor devices on parallel distributed memory computers are studied. These methods are competitive with standard pointwise methods on serial architectures, but are significantly faster on parallel computers. We make use of an improved parallel version of the conjugate gradient squared method (ICGS), combining elements of numerical stability and parallel algorithm design, for solving the resulting sequence of time-varying sparse linear differential-algebraic initial-value problems arising at each linearization step with waveform Newton. We reorganize the algorithm such that all the inner products, matrix-vector multiplications and vector updates of a single iteration step are independent, and the communication time required for the inner products can be overlapped efficiently with the computation time of the vector updates. Therefore, the performance bottleneck, namely the cost of global communication on parallel distributed memory computers, can be significantly reduced. The resulting ICGS algorithm maintains the favorable properties of the original algorithm while not increasing the computational costs.
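The overlap described above can be pictured with a small sketch. This is not the ICGS algorithm, whose recurrences the abstract does not give. It only shows the pattern of starting a global reduction of local inner products and performing an independent vector update while the reduction is in flight. The thread pool stands in for non-blocking communication, and every name here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def local_dot(x, y):
    """Inner product of one processor's local block."""
    return sum(a * b for a, b in zip(x, y))


def global_sum(partials):
    """Stand-in for a global reduction across processors (assumed, not a real MPI call)."""
    return sum(partials)


def overlapped_step(x_blocks, y_blocks, u, v, alpha):
    """Start the reduction, then do a vector update that does not depend on its result."""
    partials = [local_dot(xb, yb) for xb, yb in zip(x_blocks, y_blocks)]
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(global_sum, partials)  # "communication" begins
        updated = [ui + alpha * vi for ui, vi in zip(u, v)]  # independent computation
        dot = pending.result()  # wait only when the value is needed
    return dot, updated
```

The overlap is only valid because the update uses nothing produced by the reduction, which is the independence property the abstract says the reorganized iteration provides.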