ISBN: 9781467381970 (print)
In this paper, we investigate computing systems and network architectures dedicated to high-frequency trading applications and evaluate their performance. Both high processing speed and low network latency are important for high-frequency traders. The financial market literature suggests, however, that extremely high speeds discourage other traders from participating in the market, thereby harming the quality of financial markets. We find that existing medium-cost technology is sufficient to support an optimal trading speed, and we therefore postulate that further investment in low-latency technology is inefficient from both a technical and an economic point of view.
In image-based three-dimensional (3-D) reconstruction, one topic of growing importance is how to quickly obtain a 3-D model from a large number of images. Retrieving the correct and relevant images for the model poses a considerable technological challenge. The "image vocabulary tree" has been proposed as a method to search for similar images; however, its significant drawbacks are low time efficiency and barely satisfactory classification results. The proposed method is inspired by, and improves upon, several recent methods. Specifically, vocabulary quality is taken into account, and multiple vocabulary trees are designed to improve the classification result; our evaluation of the proposed method showed a marked improvement. To improve time efficiency, graphics processing unit (GPU) parallel computation based on the Compute Unified Device Architecture (CUDA) is applied to the multivocabulary trees. The experimental results showed that, when the number of images is large, the GPU implementation was three to four times more efficient than the enumeration matching and CPU methods. This paper presents a reliable reference method for the rapid construction of a free network to be used in computing 3-D information. (C) 2015 SPIE and IS&T
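The abstract does not spell out how a vocabulary tree is built or queried. As a minimal sketch, assuming the common hierarchical k-means construction (branching factor `k`, depth `depth`) and plain normalized word histograms compared by L1 distance (no TF-IDF weighting), retrieval can be illustrated in pure Python. All function names are hypothetical, and neither the paper's multivocabulary design nor its CUDA parallelization is reproduced.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def assign(points, centers):
    """Group each point with its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        groups[j].append(p)
    return groups

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means with a seeded random initialization."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = assign(points, centers)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, assign(points, centers)

def build_tree(points, k, depth, counter):
    """Recursively split descriptors with k-means; each leaf is one visual word."""
    if depth == 0 or len(points) < k:
        word = counter[0]
        counter[0] += 1
        return {"word": word}
    centers, groups = kmeans(points, k)
    return {"centers": centers,
            "children": [build_tree(g, k, depth - 1, counter) for g in groups]}

def quantize(tree, desc):
    """Descend greedily to the nearest child at each level; return the leaf's word id."""
    node = tree
    while "word" not in node:
        cs = node["centers"]
        j = min(range(len(cs)), key=lambda c: sq_dist(desc, cs[c]))
        node = node["children"][j]
    return node["word"]

def bow(tree, descriptors, n_words):
    """Normalized histogram of visual words for one image."""
    h = [0.0] * n_words
    for d in descriptors:
        h[quantize(tree, d)] += 1.0
    total = sum(h) or 1.0
    return [x / total for x in h]

def similarity(h1, h2):
    """1 - half the L1 distance: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))

# Toy usage: two well-separated clusters of 2-D "descriptors".
rng = random.Random(1)
cluster_a = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
cluster_b = [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(20)]
counter = [0]
tree = build_tree(cluster_a + cluster_b, k=2, depth=2, counter=counter)
n_words = counter[0]
h_a1 = bow(tree, cluster_a[:10], n_words)
h_a2 = bow(tree, cluster_a[10:], n_words)
h_b = bow(tree, cluster_b[:10], n_words)
print(similarity(h_a1, h_a2), similarity(h_a1, h_b))
```

Each descriptor is quantized independently, which is the kind of work a GPU implementation can spread across threads. A multivocabulary variant could, for instance, build several such trees and combine their scores, but how the paper does so is not stated in the abstract.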
Parallelization techniques have been exploited most successfully by the gaming/graphics industry through the adoption of graphics processing units (GPUs), which possess hundreds of processor cores. The computational science and engineering communities have recognized this opportunity and have recently harnessed the numerical performance of GPUs successfully. For example, parallel magnetohydrodynamic (MHD) algorithms are important for the numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used to port the code to this novel, highly parallel compute architecture. The methods employed are justified by performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios, including a full, realistic 3D model of wave propagation in the solar atmosphere.
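The abstract does not describe SMAUG's numerical scheme. To illustrate only the general property that lets explicit grid solvers map onto GPUs, the following sketch uses a far simpler equation, 1-D linear advection, with the Lax-Friedrichs scheme on a periodic grid. This is not MHD and not SMAUG's method; the function name and parameters are hypothetical. Each new cell value reads only the previous time level, so every loop iteration could run as an independent GPU thread.

```python
def lax_friedrichs_step(u, courant):
    """One explicit Lax-Friedrichs step for du/dt + a du/dx = 0 on a periodic grid.

    courant = a * dt / dx; the scheme is stable for |courant| <= 1.
    Every output cell reads only the previous time level, so on a GPU each
    iteration of the loop below could be an independent thread.
    """
    n = len(u)
    u_new = [0.0] * n  # separate output buffer (double buffering)
    for i in range(n):  # data-parallel: no iteration depends on another
        left, right = u[(i - 1) % n], u[(i + 1) % n]
        u_new[i] = 0.5 * (left + right) - 0.5 * courant * (right - left)
    return u_new

# Usage: advect a square pulse of height 1 for 50 steps.
u = [1.0 if 10 <= i < 20 else 0.0 for i in range(100)]
mass0 = sum(u)
for _ in range(50):
    u = lax_friedrichs_step(u, courant=0.5)
print(sum(u), max(u))
```

Writing into a separate output buffer is what removes dependencies between cells; an in-place update would let a cell read a neighbour that had already advanced to the next time level.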
ISBN: 9781424441228 (print)
We present a parallel implementation of a new deformable image registration algorithm using the Compute Unified Device Architecture (CUDA). The algorithm co-registers preoperative and intraoperative three-dimensional magnetic resonance (MR) images of a deforming organ. It employs a linear elastic dynamic finite-element model of the deformation, together with distance measures such as mutual information and the sum of squared differences, to align volumetric image data sets. Computationally intensive elements of the method, such as interpolation and the displacement and force calculations, are significantly accelerated using a graphics processing unit (GPU). Experiments carried out with a realistic breast tissue phantom show a 37-fold speedup of the GPU-based implementation over an optimized CPU-based implementation in high-resolution MR image registration. The GPU implementation registers 512 × 512 × 136 image sets in just over 2 seconds, making it suitable for clinical applications that require fast and accurate processing of medical images.
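The two distance measures named above can be sketched on flattened intensity arrays. This is a minimal pure-Python illustration, assuming intensities normalized to [0, 1] and a 16-bin joint histogram for mutual information; the paper's own binning, interpolation and GPU kernels are not given in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter

def ssd(a, b):
    """Sum of squared differences between two equally sized, flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mutual_information(a, b, bins=16, lo=0.0, hi=1.0):
    """Mutual information (in nats) estimated from a joint intensity histogram.

    Intensities are assumed to lie in [lo, hi]; out-of-range values are clamped
    into the first or last bin.
    """
    def bin_of(v):
        k = int((v - lo) / (hi - lo) * bins)
        return min(max(k, 0), bins - 1)

    n = len(a)
    joint = Counter((bin_of(x), bin_of(y)) for x, y in zip(a, b))
    pa, pb = Counter(), Counter()
    for (i, j), count in joint.items():  # marginal histograms
        pa[i] += count
        pb[j] += count
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log(p_ij / ((pa[i] / n) * (pb[j] / n)))
    return mi

# Usage on a toy 100-voxel "image", an inverted-contrast copy, and a flat image.
a = [i / 99 for i in range(100)]
b = [1.0 - x for x in a]
flat = [0.5] * 100
print(ssd(a, a), ssd(a, b))
print(mutual_information(a, a), mutual_information(a, b), mutual_information(a, flat))
```

The inverted-contrast example shows why mutual information is useful when intensities differ between acquisitions: SSD penalizes the inversion heavily, while mutual information, which depends only on the statistical relationship between intensities, is unchanged. A flat image carries no information about `a`, so its mutual information is zero.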
Spiking neural networks (SNNs) are powerful computational models, inspired by the human nervous system, that engineers and neuroscientists use to simulate the intelligent computation of the brain. Inspired by the visual system, various SNN models have been used to process visual images. However, simulating large numbers of spiking neurons with CPU programming is time-consuming. SNNs inherit an intrinsically parallel mechanism from biological systems, so a massively parallel implementation technology is required to simulate them. To address this issue, modern graphics processing units (GPUs), which contain a parallel array of streaming multiprocessors and can run many thousands of lightweight threads, are proposed and shown to be a pertinent solution. This paper presents an approach for implementing, on a GPU, an SNN model that performs color image segmentation. This approach is then compared with an equivalent implementation on an Intel Xeon CPU. The results show that the GPU implementation is 31 times faster than the CPU implementation.
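The abstract does not name the neuron model used for segmentation. As a minimal sketch, assuming the widely used leaky integrate-and-fire (LIF) model with a resting potential of 0 and constant input currents (which could, for instance, be derived from pixel intensities), the per-neuron independence that makes SNNs a good fit for GPUs looks like this. `simulate_lif` and its parameters are hypothetical.

```python
def simulate_lif(currents, steps, dt=1.0, tau=10.0, v_thresh=1.0, v_reset=0.0):
    """Simulate independent leaky integrate-and-fire neurons driven by constant inputs.

    Membrane update (forward Euler, resting potential assumed to be 0):
        v <- v + dt/tau * (-v + I)
    Returns the number of spikes each neuron emits during `steps` time steps.
    """
    v = [v_reset] * len(currents)
    spikes = [0] * len(currents)
    for _ in range(steps):
        # No neuron reads another neuron's state here, so on a GPU this inner
        # loop could become one lightweight thread per neuron.
        for i, current in enumerate(currents):
            v[i] += dt / tau * (-v[i] + current)
            if v[i] >= v_thresh:
                spikes[i] += 1
                v[i] = v_reset
    return spikes

print(simulate_lif([0.5, 1.5, 3.0], steps=200))  # → [0, 18, 50]
```

A neuron whose input stays below threshold (0.5 here) never fires, while stronger inputs fire at higher rates (every 11 and every 4 steps, respectively). In a segmentation setting such firing rates could encode pixel features, though the paper's actual encoding is not stated in the abstract.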
ISBN: 9781595939609 (print)
Low-Density Parity-Check (LDPC) codes are powerful error-correcting codes (ECC). They have recently been adopted by several data communication standards, such as DVB-S2 and WiMAX. LDPC codes are represented by bipartite graphs, also called Tanner graphs, and their decoding demands very intensive computation. For that reason, dedicated VLSI architectures have been investigated and developed over the last few years. This paper proposes a new approach for LDPC decoding on graphics processing units (GPUs). Efficient data structures and a new algorithm are proposed to represent the Tanner graph and to perform LDPC decoding according to the stream-based computing model. GPUs were programmed to implement the proposed algorithms efficiently by applying data-parallel intensive computing. Experimental results show that GPUs perform LDPC decoding nearly three orders of magnitude faster than modern CPUs. The results also lead to the conclusion that GPUs, with their tremendous processing power, can be considered a consistent alternative to state-of-the-art hardware LDPC decoders.
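As an illustration of decoding on a Tanner graph, the following minimal sketch implements min-sum decoding, a common simplification of sum-product belief propagation; the abstract does not say which decoding rule the paper uses, and `min_sum_decode` is a hypothetical name. It assumes log-likelihood ratios (LLRs) in which a positive value favours bit 0, and that every check node has degree at least 2. The example uses the parity-check matrix of the (7,4) Hamming code only because it is small.

```python
def min_sum_decode(H, llr, max_iters=20):
    """Min-sum decoding on the Tanner graph defined by parity-check matrix H.

    llr[v] > 0 favours bit 0. Returns (hard-decision bits, success flag).
    """
    checks = [[v for v, h in enumerate(row) if h] for row in H]  # check -> variables
    # Variable-to-check messages start at the channel LLRs.
    q = [{v: llr[v] for v in vs} for vs in checks]
    bits = [0 if x >= 0 else 1 for x in llr]
    for _ in range(max_iters):
        # Check-node update: sign product and minimum magnitude over the other edges.
        r = []
        for c, vs in enumerate(checks):
            msgs = {}
            for v in vs:
                others = [q[c][u] for u in vs if u != v]
                sign = 1.0
                for m in others:
                    if m < 0:
                        sign = -sign
                msgs[v] = sign * min(abs(m) for m in others)
            r.append(msgs)
        # Variable-node update: channel LLR plus all incoming check messages.
        total = list(llr)
        for c, vs in enumerate(checks):
            for v in vs:
                total[v] += r[c][v]
        for c, vs in enumerate(checks):
            for v in vs:
                q[c][v] = total[v] - r[c][v]  # exclude the message from check c itself
        bits = [0 if t >= 0 else 1 for t in total]
        if all(sum(bits[v] for v in vs) % 2 == 0 for vs in checks):  # zero syndrome
            return bits, True
    return bits, False

H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
# All-zero codeword sent; bit 0 arrives with a weak wrong-sign LLR.
received = [-0.5, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
decoded, ok = min_sum_decode(H, received)
print(decoded, ok)  # → [0, 0, 0, 0, 0, 0, 0] True
```

Within each phase, every check-node update and every variable-node update is independent of the others, which is the data parallelism a GPU decoder can exploit.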