Run-time data redistribution can affect algorithm performance in distributed-memory machines. Redistribution of data can be performed between algorithm phases when a different data decomposition is expected to deliver increased performance for a subsequent phase of computation. Additionally, data redistribution can occur at subprogram boundaries. Redistribution, however, represents increased program overhead, as algorithm computation is necessarily discontinued while data are exchanged among processor memories. In this paper, we present a technique for data-processor mapping, applicable to data redistribution, that minimizes the total amount of data that must be communicated among processors. The mapping technique is architecture-independent and represents our initial work toward achieving efficient redistribution in distributed-memory machines.
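The abstract does not spell out the mapping technique itself. As a minimal illustration of the underlying idea only, the sketch below assumes each element has a current owner and a logical partition in the new decomposition, and searches by brute force for the partition-to-processor assignment that leaves the most data in place. The function name `best_mapping` and the exhaustive search are assumptions for illustration, not the authors' method, and the search is only practical for small processor counts.

```python
from itertools import permutations


def best_mapping(old_owner, new_partition, num_procs):
    """Assign new logical partitions to processors so that the fewest
    elements have to move (hypothetical helper, brute force).

    old_owner[i]     -- processor that currently holds element i
    new_partition[i] -- logical partition of element i after redistribution
    Returns (mapping, moved), where mapping[k] is the processor chosen
    for partition k and moved is the number of elements communicated.
    """
    # overlap[k][q] counts elements of partition k already stored on processor q.
    overlap = [[0] * num_procs for _ in range(num_procs)]
    for q, k in zip(old_owner, new_partition):
        overlap[k][q] += 1

    total = len(old_owner)
    best = None
    for perm in permutations(range(num_procs)):
        # Elements that stay put if partition k is placed on processor perm[k].
        kept = sum(overlap[k][perm[k]] for k in range(num_procs))
        moved = total - kept
        if best is None or moved < best[1]:
            best = (perm, moved)
    return best


# Two processors, eight elements; the new decomposition uses the same
# split but with the partition labels swapped.
old = [0, 0, 0, 0, 1, 1, 1, 1]
new = [1, 1, 1, 1, 0, 0, 0, 0]
print(best_mapping(old, new, 2))  # ((1, 0), 0): relabeling avoids all communication
```

A naive identity mapping (partition k on processor k) would move all eight elements in this example, while the chosen mapping moves none. For larger processor counts the same overlap matrix could be fed to a polynomial-time assignment solver instead of enumerating permutations.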
A second-generation, contrast-sensitive silicon retina is reported in this paper. The architecture and organization are inspired by the outer plexiform processing in the vertebrate retina. Current-mode subthreshold MOS design techniques are employed to obtain high performance and energy efficiency. The system has been fabricated with 230 × 210 pixels on a 1 × 1 cm die in a 1.2 μm n-well, double-metal, double-poly, digital-oriented CMOS technology. The chip incorporates 590,000 transistors and 48,000 pixels, operating in the subthreshold/transition region with a power dissipation of 50 mW when powered from a 5 V supply. The pixel has a frequency response of 100 kHz.
Walsh transforms have been used in various digital signal processing applications, such as high-definition television (HDTV), processing of ultrasonic or X-ray images of the heart in medicine, target tracking and object identification in military applications, and many other image processing applications. As the area of computer applications has broadened, the quantity of data to be transformed has greatly increased. One way of achieving a fast transform is to parallelize the transform algorithms used in these applications. The authors present the parallelization of three fast Walsh transform algorithms on the Alliant FX/2800 supercomputer. They examine different parallelization techniques to optimize the computational time, and show how the Alliant architecture may be efficiently used for parallel computation.
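The abstract does not say which three fast Walsh transform algorithms were parallelized. For orientation, the sketch below shows the textbook sequential fast Walsh-Hadamard transform in natural (Hadamard) ordering; the function name `fwht` is an assumption, and nothing here reproduces the Alliant-specific code. Within each stage the butterflies touch disjoint pairs of elements, which is the independence a parallel version could exploit.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform, natural (Hadamard) order.

    Runs in O(n log n) additions; len(values) must be a power of two.
    Returns a new list and leaves the input untouched.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    h = 1
    while h < n:
        # Each block of 2*h elements is combined independently of the others.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y  # butterfly
        h *= 2
    return a


signal = [1, 0, 1, 0, 0, 1, 1, 0]
print(fwht(signal))  # [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying `fwht` twice returns the original sequence scaled by its length, so dividing by `n` gives the inverse transform.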
Model-based evaluation of reliable distributed and parallel systems is difficult due to the complexity of these systems and the nature of the dependability measures of interest. The complexity creates problems for analytical model solution techniques, and the fact that reliability and availability measures are based on rare events makes traditional simulation methods inefficient. Importance sampling is a well-known technique for improving the efficiency of rare-event simulations. However, finding an importance sampling strategy that works well in general is a difficult problem. The best strategy for importance sampling depends on the characteristics of the system and the dependability measure of interest. This fact motivated the development of an environment for importance sampling that would support the wide variety of model characteristics and interesting measures. The environment is based on stochastic activity networks, and importance sampling strategies are specified using the new concept of the importance sampling governor. The governor supports dynamic importance sampling strategies by allowing the stochastic elements of the model to be redefined based on the evolution of the simulation. The utility of the new environment is demonstrated by evaluating the unreliability of a highly dependable fault-tolerant unit used in the well-known MARS architecture. The model is non-Markovian, with Weibull-distributed failure times and uniformly distributed repair times.
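To make the rare-event point concrete, the sketch below applies plain static importance sampling to a toy problem: estimating the probability that a unit-rate exponential variable exceeds a large threshold. It does not model stochastic activity networks, the importance sampling governor, or the MARS unit; the function name `tail_probability_is` and the choice of biased rate are assumptions for illustration.

```python
import math
import random


def tail_probability_is(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ Exponential(rate=1) by importance sampling.

    Samples are drawn from a heavier-tailed exponential with rate
    1/threshold, so the rare region is hit often, and each hit is
    weighted by the likelihood ratio f(y)/g(y) to remove the bias.
    """
    rng = random.Random(seed)
    biased_rate = 1.0 / threshold  # assumed biasing choice, not tuned
    total = 0.0
    for _ in range(n_samples):
        y = rng.expovariate(biased_rate)
        if y > threshold:
            original_density = math.exp(-y)
            biased_density = biased_rate * math.exp(-biased_rate * y)
            total += original_density / biased_density
    return total / n_samples


estimate = tail_probability_is(20.0, 100_000)
exact = math.exp(-20.0)  # about 2.06e-9
print(estimate, exact)
```

A naive simulation with the same 100,000 samples would almost certainly observe no exceedance at all, since the event has probability near 2e-9, whereas the weighted estimator sees the rare region in roughly a third of its samples.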
As commercial microprocessors become increasingly popular in current MPP architectures, high-performance commercial workstations have also received increased attention as cost-effective building blocks for large parallel-processing systems. The Fast User-level Network (FUNet) project [11] is an attempt at constructing an inexpensive workstation-based parallel system capable of supporting efficient execution of message-passing parallel programs. Based on MIT's Arctic [5] network technology, FUNet connects stock-configured commodity workstations with a high-bandwidth packet-switched routing network. The Fast User-level Network Interface (FUNi) is the custom hardware network interface device that provides access to FUNet for both message passing and remote direct-memory-access (DMA) block transfers between parallel peer processes on FUNet-connected workstations. The FUNi hardware mechanisms allow direct, low-overhead user-level access to FUNet while maintaining secure and transparent sharing of FUNet among multiple parallel applications. FUNi can be realized as SBus peripheral cards to allow compatibility with a variety of workstation platforms. The relaxed clock speed (25 MHz max.) of SBus allows FUNi to be inexpensively implemented using FPGA parts that are synthesized from designs captured in the Verilog Hardware Description Language [15]. SBus's Direct Virtual Memory Access (DVMA) [8] also assists FUNi in overcoming the performance limitations imposed by existing workstation designs. Simulation results have shown that FUNet with FUNi, when coupled with latency-hiding software techniques, is effective in supporting fine-grained parallel processing on a workstation cluster.
ISBN: 0818638109 (print)
These conference proceedings contain 32 papers. The main topics are architectural characteristics of scientific applications, TLBs and memory management, input/output systems, fault-tolerant computer architecture, multiprocessor caches, high-performance computing from the application perspective, multithreading support, shared memory systems, cache designs, and multiprocessor memory systems and interconnections.
ISBN: 0780313933 (print)
High performance computing and networking are becoming the backbone of the scientific and information infrastructure, incorporating emerging technologies into productive applications at an accelerating pace. These technologies are important both for our national security and as the basis of our future economic competitiveness. The Federal High Performance Computing and Communications Program provides an innovative and coordinated research agenda for the US in these areas. The question of greatest interest to the III/V community is, 'What role will emerge for compound semiconductors as computing reinvents itself?' This paper presents some insights, and a sample of the GaAs-related research funded by the Advanced Research Projects Agency. Results of these efforts will significantly contribute to the future role these semiconductors will play in mainstream computing.
Heterogeneity in computing environments is becoming increasingly common. Some consider this a problem, while others (including ourselves) prefer to think of it as a benefit. By exploiting the different features and ca...
High performance computing and networking are becoming the backbone of the scientific and information infrastructure, incorporating emerging technologies into productive applications at an accelerating pace. These technologies are important both for national security and as the basis of future economic competitiveness. The Federal High Performance Computing and Communications Program provides an innovative and coordinated research agenda for the US in these areas. The question of greatest interest to the III/V community is, "What role will emerge for compound semiconductors as computing reinvents itself?" The authors present some insights, and a sample of the GaAs-related research funded by the Advanced Research Projects Agency. Results of these efforts will significantly contribute to the future role these semiconductors will play in mainstream computing.
ISBN: 0780313933 (print)
The implementation, optimization, and evaluation of an ion-implanted, 0.5 μm refractory self-aligned gate GaAs MESFET process for DCFL digital ICs for supercomputer applications are described. The MESFET performance has been optimized for minimal short-channel effects, ultra-high performance, minimal backgating, and improved manufacturability. This device process has been coupled with a three- or four-level metal interconnect process for producing 1 GHz clock rate LSI to VLSI digital computer ICs. The interconnect process makes use of up to four levels of CVD tungsten via fill for planarity throughout the interconnect process. This process yields typical propagation delays of 25 ps for a 2/4 μm inverter with unity fan-out. Four-input NOR gates with a fan-out of 4 have a typical delay of 65 ps. Moreover, a four-input NOR buffer driving a fan-out of 7 through 500 μm of minimum-geometry metal has a delay of 63 ps. This delay increases to 93 ps when the metal length is increased to 1500 μm. This process is being used to produce 5K to 10K gate digital circuits for the 1 GHz clock rate Cray-4 supercomputer. This work has resulted in a manufacturing process that produces devices and circuits with world-class performance.