We present a fully numerical framework for the optimization of molecule-specific quantum chemical basis functions within the quantics tensor train format using a finite-difference scheme. The optimization is driven by...
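The snippet does not say what objective drives the optimization, so the following is only a rough, hypothetical illustration of a central finite-difference gradient step over basis-function parameters; the objective, the parameter set, and all names are stand-ins, and the quantics tensor train machinery itself is not reproduced.

```python
# Hypothetical sketch of a finite-difference parameter update of the kind the
# abstract alludes to; `objective` and the parameter vector are placeholders,
# not the authors' actual formulation.
import numpy as np

def central_fd_gradient(objective, params, h=1e-4):
    """Central finite-difference gradient of a scalar objective."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = h
        grad[i] = (objective(params + step) - objective(params - step)) / (2.0 * h)
    return grad

def optimize_exponents(objective, exponents, lr=1e-2, iters=200):
    """Plain gradient descent on basis-function parameters (illustrative only)."""
    x = np.array(exponents, dtype=float)
    for _ in range(iters):
        x -= lr * central_fd_gradient(objective, x)
    return x

# Toy usage with a stand-in objective whose minimum lies at [1, 2]:
toy = lambda a: float(np.sum((a - np.array([1.0, 2.0])) ** 2))
print(optimize_exponents(toy, [0.5, 0.5]))
```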
The primary objective of this study was to test the hypothesis that the binary information on the presence or absence of gene expression can sufficiently capture the inherent heterogeneity within single-cell RNA sequ...
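As a rough illustration of the binarization idea only (not the study's pipeline), the sketch below reduces a toy counts matrix to presence/absence calls and clusters cells on the binary profiles; the data, the Jaccard distance, and the clustering choice are all assumptions.

```python
# Minimal sketch: binarize a cells-by-genes counts matrix and cluster cells on
# the resulting presence/absence profiles. Data and method choices are
# illustrative, not taken from the study.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

counts = np.random.poisson(0.5, size=(200, 1000))   # cells x genes (toy data)
binary = (counts > 0).astype(np.uint8)              # 1 = gene detected in cell

# Jaccard distance between cells, then average-linkage hierarchical clustering.
d = pdist(binary, metric="jaccard")
labels = fcluster(linkage(d, method="average"), t=4, criterion="maxclust")
print(np.bincount(labels))                          # cluster sizes
```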
This paper presents a novel approach for head tracking in augmented reality (AR) flight simulators using an adaptive fusion of Kalman and particle filters. This fusion dynamically balances the strengths of both algori...
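The snippet does not give the paper's actual fusion rule, so the sketch below uses a generic inverse-covariance weighting of the two filters' state estimates as a stand-in; the state layout and every name are illustrative.

```python
# Illustrative fusion of Kalman-filter and particle-filter head-pose estimates
# by inverse-variance weighting; this is a generic stand-in, not the paper's
# adaptive scheme.
import numpy as np

def fuse_estimates(x_kf, P_kf, x_pf, P_pf):
    """Blend two state estimates, weighting each by the inverse of its covariance,
    so whichever filter is currently more confident dominates the fused state."""
    W_kf = np.linalg.inv(P_kf)
    W_pf = np.linalg.inv(P_pf)
    P_fused = np.linalg.inv(W_kf + W_pf)
    x_fused = P_fused @ (W_kf @ x_kf + W_pf @ x_pf)
    return x_fused, P_fused

# Toy 2D state (e.g. yaw/pitch); the particle filter is noisier in this example.
x, P = fuse_estimates(np.array([0.10, 0.02]), np.eye(2) * 0.01,
                      np.array([0.14, 0.00]), np.eye(2) * 0.05)
print(x)
```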
The Spider chip, an interconnect for high-end networking applications, sustains a data transfer rate of 4.8 Gbytes/s, either between chips in a single chassis or between remote chassis over cables up to 5 meters long.
Designed to efficiently support large, real-world, floating-point-intensive applications, the TFP (short for Tremendous Floating-Point) microprocessor is a superscalar implementation of the Mips Technologies architecture. This floating-point, computation-oriented processor uses a superscalar machine organization that dispatches up to four instructions each clock cycle to two floating-point execution units, two memory load/store units, and two integer execution units. Its split-level cache structure reduces cache misses by directing integer data references to a 16-Kbyte on-chip cache, while channeling floating-point data references off chip to a 4-Mbyte cache.
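As a rough, hypothetical model of the split-level cache idea described above (not the TFP design itself), the sketch below routes integer data references to a small on-chip cache and floating-point references to a large off-chip cache; only the 16-Kbyte and 4-Mbyte sizes come from the abstract, while the direct-mapped organization and 32-byte lines are assumptions.

```python
# Toy model of a split-level cache: integer data references hit a small on-chip
# cache, floating-point references a large off-chip cache. Organization details
# are assumptions for illustration only.
class DirectMappedCache:
    def __init__(self, size_bytes, line_bytes=32):
        self.lines = size_bytes // line_bytes
        self.line_bytes = line_bytes
        self.tags = [None] * self.lines
        self.hits = self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        idx, tag = line % self.lines, line // self.lines
        if self.tags[idx] == tag:
            self.hits += 1
        else:
            self.tags[idx] = tag
            self.misses += 1

on_chip  = DirectMappedCache(16 * 1024)        # integer data (on-chip)
off_chip = DirectMappedCache(4 * 1024 * 1024)  # floating-point data (off-chip)

def data_reference(addr, is_floating_point):
    """Route a data reference to the cache level matching its data type."""
    (off_chip if is_floating_point else on_chip).access(addr)
```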
Designers using CAE workstations often feel as if they are stuck in stop-and-go traffic. Some steps, like entering a schematic, move along briskly. Others, say, simulating a circuit and checking its design rules, slow progress to a crawl because of the requisite computing time and muscle. True, hardware accelerators have helped unsnarl the simulation tie-up, but until now none has attempted to boost throughput across the full design cycle - from schematic entry to prototype testing. Rather than concentrate on a single step, a workstation accelerator brings its interactive approach to bear on the whole design cycle.
This paper presents a parallel volume rendering algorithm that can render a 256 x 256 x 225 voxel medical data set at over 15 Hz and a 512 x 512 x 334 voxel data set at over 7 Hz on a 32-processor Silicon Graphics Challenge. The algorithm achieves these results by minimizing each of the three components of execution time: computation time, synchronization time, and data communication time. Computation time is low because the parallel algorithm is based on the recently reported shear-warp serial volume rendering algorithm, which is over five times faster than previous serial algorithms. The algorithm uses run-length encoding to exploit coherence and an efficient volume traversal to reduce overhead. Synchronization time is minimized by using dynamic load balancing and a task partition that minimizes synchronization events. Data communication costs are low because the algorithm is implemented for shared-memory multiprocessors, a class of machines with hardware support for low-latency fine-grain communication and hardware caching to hide latency. We draw two conclusions from our implementation. First, we find that on shared-memory architectures data redistribution and communication costs do not dominate rendering time. Second, we find that cache locality requirements impose a limit on parallelism in volume rendering algorithms. Specifically, our results indicate that shared-memory machines with hundreds of processors would be useful only for rendering very large data sets.
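The sketch below illustrates only the task-partitioning and dynamic load-balancing structure described above, with intermediate-image scanlines pulled by idle workers from a shared queue; the shear-warp compositing is reduced to a stub, and the worker count and data shapes are arbitrary assumptions.

```python
# Dynamic load balancing over scanline tasks: idle workers pull rows of the
# intermediate image from a shared queue. The compositing step is a stand-in,
# and Python threads here only illustrate the pattern (a real renderer would
# use truly parallel workers).
import queue, threading
import numpy as np

volume = np.random.rand(225, 256, 256)      # slices x rows x cols (toy data)
image  = np.zeros(volume.shape[1:])         # intermediate (sheared) image

def composite_scanline(row):
    # Stand-in for run-length-encoded front-to-back compositing of one scanline.
    image[row] = volume[:, row, :].max(axis=0)

def worker(tasks):
    while True:
        try:
            row = tasks.get_nowait()
        except queue.Empty:
            return
        composite_scanline(row)

tasks = queue.Queue()
for row in range(image.shape[0]):
    tasks.put(row)

threads = [threading.Thread(target=worker, args=(tasks,)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
```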
Accurate assessment of tissue perfusion is crucial in visceral surgery, especially during anastomosis. Currently, subjective visual judgment is commonly employed in clinical settings. Hyperspectral imaging (HSI) offer...
The 1990s will be a period of revolutionary change for manufacturing firms competing at the world-class level. Global competition will force them to rely increasingly on mechanical design automation tools. Graphics tools provide realistic, dynamic displays and eliminate the need for design artifacts that add no value to the product but add tremendous cost to development. New software and hardware tools promise to focus effort more on product design and less on artifacts.
ISBN: (Print) 0818608285
The 4D-MP graphics superworkstation, which delivers 40 MIPS (million instructions per second) of computing performance, is described. It also delivers 40 MFLOPS (million floating-point operations per second) of geometry processing performance, enabling 100,000 lighted, four-sided, concave-tested polygons to be processed per second. This level of computing and graphics processing in an office-environment workstation is made possible by using the fastest available RISC (reduced-instruction-set computer) microprocessors in a single shared-memory multiprocessor design driving a tightly coupled, highly parallel graphics system. Aggregate sustained data rates of greater than 1 Gbyte/s are achieved by a hierarchy of buses in a balanced system designed to avoid bottlenecks.
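Reading the quoted figures together gives a rough per-polygon budget; this back-of-the-envelope arithmetic is ours, not the paper's.

```python
# 40 MFLOPS of geometry processing spread over 100,000 polygons per second
# leaves roughly 400 floating-point operations per polygon for transform,
# lighting, and clipping.
geometry_flops_per_sec = 40e6
polygons_per_sec = 100_000
print(geometry_flops_per_sec / polygons_per_sec, "FLOPs per polygon")  # 400.0
```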