ISBN: 9780769549569; 9781467362184 (print)
This paper develops an implementation strategy and method to accelerate the propagation and backpropagation (PBP) tomographic imaging algorithm using graphics processing units (GPUs). The Compute Unified Device Architecture (CUDA) programming model is used to develop the parallelized algorithm, since CUDA allows the user to interact with GPU resources more efficiently than traditional shader methods. The results show a speedup of more than 80x over the C/C++ version of the algorithm and 515x over the MATLAB version, while achieving high-quality imaging in both cases. We test different CUDA kernel configurations to measure changes in the processing time of our algorithm. By examining the acceleration rate and the image quality, we develop an optimal kernel configuration that maximizes the throughput of the CUDA implementation of the PBP method.
ISBN: 0769517609 (print)
In this paper, an unsupervised parallel approach called the fuzzy competitive learning network (FCLN) for vector quantization (VQ) and a spread FCLN (SFCLN) for color image compression in the mean value/difference value transform (MDT) domain are proposed. In the FCLN, codebook design is treated conceptually as a clustering problem: it is a competitive learning network model that incorporates fuzzy clustering strategies and works toward minimizing an objective function defined as the average distortion measure between any two training vectors within the same class. The color image information transformed by the MDT operation is separated into mean values and detail coefficients for each of the three RGB planes. The detail coefficients for each plane are then trained using the proposed SFCLN method to generate the VQ codebook. The experimental results show that promising codebooks can be obtained using the proposed FCLN and SFCLN for color image compression in the MDT domain.
ISBN: 9798400709630 (print)
This paper aims to achieve precise identification of the diseases and pests affecting pear trees by integrating YOLOv5, Jetson Nano, big data, and deep learning techniques. The objective is to enable timely detection of these problems and thereby early prevention and control measures for pear tree health. The study employs YOLOv5 as the primary model, implemented on the Jetson Nano embedded device. By leveraging the GPU parallel processing capabilities of the Jetson Nano's deep learning framework, this approach increases image analysis and detection speed while improving quality and efficiency [1]. Furthermore, it integrates big data with deep learning methods to improve the accuracy of disease detection and identification. These technologies allow diseases and pests of pear trees to be recognized accurately through image analysis, which significantly reduces the complexity of detecting such conditions and lowers the operational threshold for practitioners in the field. Compared with traditional detection methods, YOLOv5 places no stringent requirements on environmental conditions or backgrounds; it is therefore less susceptible to variations caused by weather, making it a superior choice for pest and disease detection in agricultural settings.
We present a parallel conjugate gradient solver for the Poisson problem optimized for multi-GPU platforms. Our approach includes a novel heuristic Poisson preconditioner well suited for massively-parallel SIMD processing. Furthermore, we address the problem of limited transfer rates over typical data channels such as the PCI-express bus relative to the bandwidth requirements of powerful GPUs. Specifically, naive communication schemes can severely reduce the achievable speedup in such communication-intense algorithms. For this reason, we employ overlapping memory transfers to establish a high level of concurrency and to improve scalability. We have implemented our model on a high-performance workstation with multiple hardware accelerators. We discuss the mathematical principles, give implementation details, and present the performance and the scalability of the system.
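The abstract does not describe the heuristic preconditioner or the multi-GPU transfer scheme, so the following is only a minimal sketch of the plain, unpreconditioned conjugate gradient iteration that such a solver builds on. It is single-threaded Python applied to a 1D Poisson problem with zero Dirichlet boundaries; the function names (`apply_poisson_1d`, `dot`, `conjugate_gradient`) and the 1D setting are illustrative assumptions, not taken from the paper.

```python
def apply_poisson_1d(x):
    """Apply the 1D discrete Laplacian stencil (-1, 2, -1) with zero boundaries."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator apply_a.

    Unpreconditioned CG; a preconditioner would be applied to the residual r.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each iteration needs one operator application and two inner products; in a multi-GPU setting these are the steps that require exchanging boundary values and partial sums between devices, which is where communication overlap matters.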
ISBN: 1424406897 (print)
This paper presents a framework for the acceleration of Monte-Carlo simulations using reconfigurable hardware. Discrete-time random walk simulations are widely used in financial computation to calculate derivative prices and evaluate portfolio risk, but increases in model complexity and tighter time constraints now require large computer farms to meet operational demands. We present a model for accelerating such tasks with reconfigurable hardware, using an architecture that exploits parallelism at multiple levels, combining fine-grained pipelining, intra-device multi-threading, and inter-device distributed processing. The architecture adopts a modular design approach, allowing components to be re-used across different applications, while also allowing automatic design space exploration to maximise performance within different devices. Using our framework, we implement two different discrete-time random walks representative of financial simulations, and these show speedups of 71 times and 8 times respectively when compared to C++ software and SSE-vectorised implementations.
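As an illustration of the kind of discrete-time random walk the abstract refers to, the following minimal software sketch prices a European call by Monte-Carlo simulation. The specific model (a log-price walk with Gaussian steps), the payoff, and the name `price_european_call` are assumptions chosen for the example; the abstract does not say which walks the authors implemented.

```python
import math
import random


def price_european_call(s0, strike, rate, sigma, maturity, steps, paths, seed=0):
    """Monte-Carlo estimate of a European call price.

    Each path is a discrete-time random walk in the log-price with
    Gaussian increments; the estimate is the discounted mean payoff.
    """
    rng = random.Random(seed)
    dt = maturity / steps
    drift = (rate - 0.5 * sigma * sigma) * dt   # deterministic part of one step
    vol = sigma * math.sqrt(dt)                 # scale of the random part
    total = 0.0
    for _ in range(paths):
        log_s = math.log(s0)
        for _ in range(steps):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
        total += max(math.exp(log_s) - strike, 0.0)
    return math.exp(-rate * maturity) * total / paths
```

The paths are independent and the inner loop is a fixed sequence of multiply-add operations, which is why this workload maps naturally onto the pipelined, multi-threaded, and distributed levels of parallelism the abstract describes.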
ISBN: 9783642417061 (print)
The proceedings contain 20 papers. The topics discussed include: using logic coverage to improve testing function block diagrams; automatic grammar-based test generation; adaptive homing and distinguishing experiments for nondeterministic finite state machines; remote testing of timed specifications; an implementation relation and test framework for timed distributed systems; unfolding-based test selection for concurrent conformance; predicting the size of test suites from use cases: an empirical exploration; chaining test cases for reactive system testing; variations over test suite reduction; case studies in learning-based testing; techniques and toolset for conformance testing against UML sequence diagrams; parallel SMT-constrained symbolic execution for Eclipse CDT/Codan; and challenges of testing periodic messages in avionics systems using TTCN-3.
Consistency and responsiveness are two important factors in providing a sense of reality in a Distributed Virtual Environment (DVE). However, it is not easy to optimize both because of the trade-off between them. As a result, most existing consistency maintenance methods ignore the responsiveness requirements or assume a simple responsiveness requirement model that cannot meet the real needs of DVE systems. In this paper, we first present a new responsiveness requirement model that can describe how well the requirements of each node are satisfied. Based on this model, we propose a responsiveness-requirement-based consistency method. The method adjusts the utilization of time resources according to the requirements of different nodes and improves overall responsiveness performance by at least 20%. It therefore provides good support for increasing the applicability of DVE systems.
This paper draws a connection between the universal paradigm of cellular nonlinear networks (CNN) and the innovative approach of grid computing. CNN are a massively parallel solution for solving non-linear problems and modelling complex phenomena in medicine, physics and data analysis, as well as for powerful image processing and recognition systems. They are usually simulated on local computer systems or built as dedicated VLSI implementations. However, research into complex CNN structures and settings requires massive computing power and can thus benefit from multi-system open architectures, which the grid approach can provide. Two different realizations designed with a grid architecture in mind are proposed by introducing an algorithm for implementing such methods in a CNN software simulator. First, a brief introduction to CNN is given. Afterwards, problems for the current determination of such networks are discussed.
The paper considers a simulation algorithm for dendritic crystallogram images and offers a parallel implementation of it using MPI. The basis is an algorithm that uses a diffusion equation for the impurity and the material substance. This baseline algorithm was upgraded: the impurity redistribution method was changed and the order of crystallization was updated, which made it possible to maintain the impurity volume during crystal growth. A technique for separating the algorithm stages across compute cores was proposed. The speedup of the proposed MPI implementation proved to be 20% higher than that of the OpenMP analogue. The resulting implementation may be used to simulate large crystallograms in shared-memory systems.
Enhancement of low-resolution images is always a priority field of digital image processing. In this paper, we propose a novel hybrid approach based on the discrete wavelet transform (DWT) and particle swarm optimization (PSO). The proposed method uses both the spatial domain and the frequency domain. The frequency domain is used to reduce the low frequencies of the input image. DWT is used to decompose the low-resolution input image into different sub-bands. Each of the interpolated high-frequency sub-bands (LH, HL, HH) is then summed with the interpolated output image of the frequency domain. To obtain a high-resolution image, the estimated high-frequency sub-bands of the intermediate stage and the interpolated low-resolution input image are combined using the inverse DWT. PSO is then used to generate a better high-resolution image. The quantitative results (root mean square error, normalized cross-correlation, normalized absolute error) and the visual outcome show the strength of the proposed method.
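The decomposition and recombination steps can be illustrated with a one-level 2D DWT and its inverse. The abstract does not name the wavelet filter, so the sketch below assumes the Haar wavelet, and it omits the interpolation and PSO stages entirely. The function names are hypothetical, and sub-band labelling (which detail band is called LH and which HL) varies between conventions.

```python
def haar_dwt2(image):
    """One-level 2D Haar DWT of a list-of-lists image with even dimensions.

    Returns the approximation band LL and the detail bands LH, HL, HH,
    each half the size of the input in both directions.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for j in range(0, cols, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            ll_row.append((a + b + c + d) / 2.0)  # local average
            lh_row.append((a - b + c - d) / 2.0)  # difference across columns
            hl_row.append((a + b - c - d) / 2.0)  # difference across rows
            hh_row.append((a - b - c + d) / 2.0)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-size image from four sub-bands."""
    rows, cols = len(ll), len(ll[0])
    image = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s, h, v, x = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            image[2 * i][2 * j] = (s + h + v + x) / 2.0
            image[2 * i][2 * j + 1] = (s - h + v - x) / 2.0
            image[2 * i + 1][2 * j] = (s + h - v - x) / 2.0
            image[2 * i + 1][2 * j + 1] = (s - h - v + x) / 2.0
    return image
```

In a resolution-enhancement pipeline of the kind described, the detail bands passed to the inverse transform would be modified estimates rather than the untouched outputs of the forward transform; with unmodified bands the inverse simply reproduces the input.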