We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this p...
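For orientation, here is a sketch of the classic sequential Thomas algorithm for a single tridiagonal system. It is a generic baseline, not the multi-stage GPU method described above. It assumes the matrix needs no pivoting (for example, it is diagonally dominant). The function and argument names are illustrative.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    a: sub-diagonal (length n, a[0] unused)
    b: main diagonal (length n)
    c: super-diagonal (length n, c[n-1] unused)
    d: right-hand side (length n)
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    # Forward sweep: eliminate the sub-diagonal row by row.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution from the last unknown upward.
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Matrix [[2,1,0],[1,2,1],[0,1,2]] with solution [1, 2, 3].
x = thomas_solve([0, 1, 1], [2, 2, 2], [1, 1, 0], [4, 8, 8])
```

The algorithm runs in O(n) time. Both sweeps are inherently sequential in `i`, which is one reason parallel GPU solvers reorganize the computation.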
Graphics processing units provide large computational power at a very low price, which positions them as a ubiquitous accelerator. General purpose programming on graphics processing units (GPGPU) is best suited f...
The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth interconnection network that directly links arbitrary pairs of processor nodes without contention, and can efficiently i...
ISBN (print): 0769523129
Due to their increasing complexity, the behavior of large-scale distributed systems has become difficult to predict. The on-line identification and autotuning capabilities of adaptive control systems have made adaptive control-theoretic design an attractive approach to quality of service (QoS) guarantees. However, adaptive control systems have an inherent constraint: a conflict between asymptotically good control and asymptotically good parameter estimates. This paper addresses this limitation through sensitivity analysis. The simulation study demonstrates that adaptive control-theoretic design depends on the excitation signal, environmental uncertainty, and a priori knowledge of the system. In addition, this paper proposes a dual adaptive control framework for mitigating these constraints in QoS design. By incorporating the existing uncertainty of the on-line prediction into the control strategy, the dual adaptive control framework optimizes the tradeoff between the control goal and the uncertainty.
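A standard recursive least-squares (RLS) estimator can illustrate the on-line identification step and its dependence on the excitation signal. This is a minimal sketch under assumed conditions. The plant is a hypothetical, noise-free first-order system y[k+1] = a*y[k] + b*u[k] with a = 0.8 and b = 0.5, and the function names are illustrative. It is not the paper's dual controller. It only shows two things:

* the estimates converge when the input is exciting;
* `b` is never learned when the input is held at zero.

```python
import random


def simulate(a, b, us, y0=0.0):
    """Simulate the assumed plant y[k+1] = a*y[k] + b*u[k] (no noise)."""
    ys = [y0]
    for u in us:
        ys.append(a * ys[-1] + b * u)
    return ys


def rls_identify(ys, us, p0=1000.0, lam=1.0):
    """Recursive least-squares estimate of theta = [a, b].

    p0: initial covariance scale (large = weak prior); lam: forgetting factor.
    """
    theta = [0.0, 0.0]                  # initial guess for [a, b]
    P = [[p0, 0.0], [0.0, p0]]          # parameter covariance (stays symmetric)
    for k in range(len(us)):
        phi = [ys[k], us[k]]            # regressor used to predict ys[k+1]
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        K = [Pphi[0] / denom, Pphi[1] / denom]                     # gain vector
        err = ys[k + 1] - (theta[0] * phi[0] + theta[1] * phi[1])  # prediction error
        theta = [theta[0] + K[0] * err, theta[1] + K[1] * err]
        # Covariance update P <- (P - K phi^T P) / lam, using phi^T P = (P phi)^T.
        P = [[(P[i][j] - K[i] * Pphi[j]) / lam for j in range(2)] for i in range(2)]
    return theta


# Exciting input: pseudo-random values in [-1, 1].
rng = random.Random(0)
us = [rng.uniform(-1.0, 1.0) for _ in range(200)]
a_hat, b_hat = rls_identify(simulate(0.8, 0.5, us), us)

# No excitation: u = 0, so the regressor never carries information about b.
us_zero = [0.0] * 200
a_zero, b_zero = rls_identify(simulate(0.8, 0.5, us_zero, y0=1.0), us_zero)
```

The second run mirrors the conflict noted above. A controller that keeps the system quiet also starves the estimator of the excitation it needs.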
ISBN (print): 9781665412469
This work presents a parallel algorithm for implementing the nonuniform fast Fourier transform (NUFFT) on Google's Tensor Processing Units (TPUs). The TPU is a hardware accelerator originally designed for deep learning applications. NUFFT is considered the main computational bottleneck in magnetic resonance (MR) image reconstruction when k-space data are sampled on a nonuniform grid. The computation of NUFFT consists of three operations: an apodization, an FFT, and an interpolation. All three are formulated as tensor operations in order to fully utilize the TPU's strength in matrix multiplication. The implementation uses TensorFlow. Numerical examples show a 20x to 80x acceleration of NUFFT on a single-card TPU compared to CPU implementations. The strong scaling analysis shows close-to-linear scaling of NUFFT on up to 64 TPU cores. The proposed implementation of NUFFT on TPUs is promising for accelerating MR image reconstruction and for achieving practical runtimes in clinical applications.
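The operation that NUFFT approximates shows why tensor formulations suit a matrix-multiply accelerator. That operation is the direct nonuniform DFT, which is a dense matrix-vector product. Below is a plain-Python reference implementation of that direct sum, under these assumptions:

* the sign convention f_k = sum_j c_j exp(-2*pi*i*k*x_j);
* sample locations x_j in [0, 1);
* integer frequencies k from -N/2 to N/2-1.

The helper names are illustrative. The sketch costs O(NM) and is not the apodization/FFT/interpolation pipeline used in the paper.

```python
import cmath
import math


def nudft_matrix(xs, n_modes):
    """Dense matrix A[k][j] = exp(-2*pi*i*k*x_j).

    Rows are frequencies k = -(n_modes//2) .. n_modes - n_modes//2 - 1;
    columns are the (assumed) nonuniform sample locations x_j in [0, 1).
    """
    ks = range(-(n_modes // 2), n_modes - n_modes // 2)
    return [[cmath.exp(-2j * math.pi * k * x) for x in xs] for k in ks]


def matvec(A, c):
    """Plain matrix-vector product: the step a matrix-multiply unit would accelerate."""
    return [sum(a * cj for a, cj in zip(row, c)) for row in A]


def nudft(xs, cs, n_modes):
    """Direct nonuniform DFT: f_k = sum_j c_j * exp(-2*pi*i*k*x_j)."""
    return matvec(nudft_matrix(xs, n_modes), cs)


# On a uniform grid x_j = j/N this reduces to an ordinary DFT (frequencies ordered -N/2..N/2-1).
uniform = [j / 4 for j in range(4)]
f_const = nudft(uniform, [1, 1, 1, 1], 4)                # nonzero only at k = 0 (index 2)
f_off = nudft([0.1, 0.37, 0.82], [1.0, 0.5, -0.25], 4)   # truly nonuniform points
```

The whole transform is a single matrix product (matrix-vector here, matrix-matrix for a batch of inputs). NUFFT replaces this dense product with an FFT plus local interpolation, which reduces the cost while keeping operations that map onto matrix hardware.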
ISBN (print): 9781424407620
Distributed processing in today's mobile multimedia devices requires efficient inter-IC communication. This paper presents a hardware block that gives the host IC parallel asynchronous access to the resources of the target IC. The block targets mobile multimedia applications. It is composed of:

* an asynchronous logic subsystem;
* configuration registers;
* a packer;
* 4 command/data FIFOs;
* 3 packer FIFOs;
* 8 Device Transaction Level (DTL) masters;
* a DTL slave.

The block allows the target IC to be connected directly to a memory controller. It supports 2D transfers, which improves performance and saves power in mobile video applications.
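To illustrate what a 2D transfer describes, the sketch below expands a hypothetical 2D descriptor into the per-line bursts a DMA-style engine would issue. The descriptor holds a base address, bytes per line, number of lines and line stride. The descriptor fields, function name and frame dimensions are all assumptions; the block's actual register layout is not described here.

```python
from dataclasses import dataclass


@dataclass
class Transfer2D:
    """Hypothetical 2D transfer descriptor; field names are illustrative."""
    base: int        # byte address of the first line
    line_bytes: int  # bytes per line
    lines: int       # number of lines
    stride: int      # bytes between the starts of consecutive lines


def line_segments(t):
    """Expand one 2D descriptor into per-line (start_address, length) bursts."""
    return [(t.base + i * t.stride, t.line_bytes) for i in range(t.lines)]


# Example: a 16x4 block of 8-bit pixels at column 32, row 8 of a 640-pixel-wide frame at 0x1000.
FRAME_BASE, FRAME_WIDTH = 0x1000, 640
block = Transfer2D(base=FRAME_BASE + 8 * FRAME_WIDTH + 32, line_bytes=16, lines=4, stride=FRAME_WIDTH)
segments = line_segments(block)  # [(9248, 16), (9888, 16), (10528, 16), (11168, 16)]
```

A single descriptor replaces one host command per line. This is one plausible source of the performance and power savings for video traffic.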
ISBN (print): 0769523129
Sequence comparison is a basic operation in DNA sequencing projects. Most sequence comparison methods in use are based on heuristics, which are faster but give no guarantee that the best alignments will be produced. On the other hand, the algorithm proposed by Smith and Waterman obtains the best local alignments at the expense of very high computing power and huge memory requirements. In this article, we present and evaluate our experiments with three strategies for running the Smith-Waterman algorithm on a cluster of workstations using a distributed shared memory system. Our results on an eight-machine cluster show very good speedups and indicate that impressive improvements can be achieved, depending on the strategy used. We also present some theoretical remarks on how to reduce the amount of memory used.
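For reference, the sketch below computes the Smith-Waterman local alignment score with a linear gap penalty. It keeps only two rows of the dynamic-programming matrix, so memory is O(m) instead of O(nm). The following are assumptions:

* the scoring values (match +2, mismatch -1, gap -1);
* the function name;
* the two-row trick itself, a generic memory reduction that is not necessarily the one the authors discuss.

The sketch returns the best score and the cell where it ends, but not the alignment itself.

```python
def smith_waterman_score(s, t, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of s and t, plus the (i, j) cell where it ends.

    Uses a linear gap penalty and keeps only two DP rows (O(len(t)) memory).
    """
    best, best_pos = 0, (0, 0)
    prev = [0] * (len(t) + 1)          # row i-1 of the DP matrix (row 0 is all zeros)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)       # row i; column 0 stays 0
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = prev[j] + gap         # s[i-1] aligned against a gap in t
            left = cur[j - 1] + gap    # t[j-1] aligned against a gap in s
            cur[j] = max(0, diag, up, left)  # the 0 lets a local alignment restart
            if cur[j] > best:
                best, best_pos = cur[j], (i, j)
        prev = cur
    return best, best_pos


score, end = smith_waterman_score("ACGT", "ACT")  # (5, (4, 3)): ACGT vs AC-T
```

Recovering the alignment itself in linear space requires an extra step, such as a Hirschberg-style divide-and-conquer.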
Cross-species chromosome alignments can reveal ancestral relationships and may be used to identify the peculiarities of a species. Such alignment is thus an important problem in bioinformatics. So far, aligning huge sequences, ...
ISBN (print): 0769523129
Distributed denial-of-service (DDoS) attacks are among the most serious security threats on the Internet, because of the significant service disruption they can cause and the difficulty of preventing them. In this paper, we propose new deterministic packet marking models to characterize DDoS attack streams. Such a common characterization can make filtering near the victim more effective. In this direction, we propose a rate control scheme that protects destination domains by limiting the amount of traffic during an attack, while leaving a large percentage of legitimate traffic unaffected. These features enable providers to offer enhanced protection against such attacks to their customers as a value-added service. This gives providers a positive incentive to deploy the proposed models. We evaluate the proposed marking models on a snapshot of the actual Internet topology. The evaluation measures how well they differentiate attack traffic from legitimate traffic under full and partial deployment.
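The rate control scheme is not detailed here. As a generic stand-in, the sketch below shows a token-bucket limiter, a common way to cap the traffic admitted toward a destination. The class names and the rate and burst values are assumptions, and so is the pairing of one bucket per packet mark. That pairing is only meant to suggest how a marking-based characterization could feed a limiter; it is not the paper's scheme.

```python
class TokenBucket:
    """Generic token-bucket limiter: admits at most `rate` units/s with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = float(rate)     # refill rate, tokens per second
        self.burst = float(burst)   # bucket capacity
        self.tokens = float(burst)  # start with a full bucket
        self.last = None            # arrival time of the previous packet

    def allow(self, now, cost=1.0):
        """Refill for the elapsed time, then admit the packet if enough tokens remain."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                # over budget: drop (or deprioritize) the packet


class PerMarkLimiter:
    """Hypothetical: one bucket per packet mark, so each marked stream is limited separately."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.buckets = {}

    def allow(self, mark, now):
        if mark not in self.buckets:
            self.buckets[mark] = TokenBucket(self.rate, self.burst)
        return self.buckets[mark].allow(now)


bucket = TokenBucket(rate=10, burst=5)
admitted_at_0 = sum(bucket.allow(0.0) for _ in range(20))  # only the burst of 5 passes
```

Only traffic that exceeds the budget is dropped, so packets within the allowed rate still get through. Keying the buckets by mark limits each characterized stream without throttling the others.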
Heterogeneity has been considered in scheduling, but without taking into account the temporal variation of completion times of the sub-tasks for a divisible, independent task. In this paper, the problem of scheduling ...