ISBN (digital): 9781510620766
ISBN (print): 9781510620766
Medical images are corrupted by different types of noise introduced by the imaging equipment itself. Obtaining precise images is essential for accurate observation in a given application, and noise removal remains a challenging problem in medical image processing. This work studies noise-removal techniques for medical images using fast implementations of several digital filters, namely the average, median, and Gaussian filters. Processing of x-ray medical images takes significant time; however, modern hardware allows parallel image processing on both CPU and GPU. Parallel implementations of the noise-reduction algorithms were proposed using GPU processing technology, taking data parallelism into account. An experimental study was conducted on medical x-ray images in order to choose the best filters with respect to the medical task and the processing time. A comparison of the fast filter implementations with their GPU counterparts shows a large increase in performance. Graphics processing units (GPUs) are used today in a wide range of applications, mainly because they can dramatically accelerate parallel computing; in the field of medical imaging, GPUs are in some cases crucial for enabling practical use of computationally demanding algorithms.
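A minimal NumPy sketch of the three filters compared above (average, median, Gaussian), assuming a square window and reflect-padding at the borders; the paper's GPU implementation is not reproduced here:

```python
import numpy as np

def denoise(img, kind="median", k=3):
    """Apply a k-by-k average, median, or Gaussian filter.

    Pure-NumPy sketch; borders handled by reflect-padding, and the
    Gaussian uses sigma = 1 by assumption.
    """
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="reflect")
    # Stack every k*k shifted view so axis 0 runs over window positions.
    windows = np.stack([padded[i:i + img.shape[0], j:j + img.shape[1]]
                        for i in range(k) for j in range(k)])
    if kind == "average":
        return windows.mean(axis=0)
    if kind == "median":
        return np.median(windows, axis=0)
    if kind == "gaussian":
        ax = np.arange(k) - pad
        g1 = np.exp(-ax ** 2 / 2.0)
        w = np.outer(g1, g1).ravel()
        w = w / w.sum()                      # normalized Gaussian weights
        return np.tensordot(w, windows, axes=1)
    raise ValueError(kind)

noisy = np.array([[10, 10, 10],
                  [10, 90, 10],              # single salt-noise spike
                  [10, 10, 10]], dtype=float)
print(denoise(noisy, "median")[1, 1])        # median removes the spike: 10.0
```

The median filter, unlike the average, discards the outlier entirely, which is why it is usually preferred for impulse noise in x-ray images.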
The paper aims to propose a distributed method for machine learning models and its application to medical data analysis. A great challenge in the medical field is to provide a scalable image processing model that integrates the computational processing requirements with computer-aided medical decision making. The proposed fuzzy logic method is based on a distributed type-2 fuzzy logic algorithm and merges HPC (High-Performance Computing) and the cognitive aspect in one model. Accordingly, the method is intended for big data analysis and data science prediction models in healthcare applications. The paper focuses on the proposed distributed Type-2 Fuzzy Logic (DT2FL) method and its application to MRI data analysis under a massively parallel and distributed virtual mobile agent architecture. Indeed, the paper presents experimental results which highlight the accuracy and efficiency of the proposed method.
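As a toy illustration of the type-2 idea underlying DT2FL (not the authors' distributed algorithm), an interval type-2 membership function can be sketched as a type-1 triangle blurred into a band; the `blur` width and triangle parameters below are assumptions:

```python
import numpy as np

def it2_triangular(x, a, b, c, blur=0.1):
    """Interval type-2 triangular membership.

    The footprint of uncertainty is a band of width 2*blur around the
    type-1 triangle (a, b, c); membership is the interval [lower, upper].
    """
    t1 = np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)
    upper = np.clip(t1 + blur, 0.0, 1.0)
    lower = np.clip(t1 - blur, 0.0, 1.0)
    return lower, upper

lo, up = it2_triangular(np.array([5.0]), a=0.0, b=5.0, c=10.0)
# At the apex the type-1 grade is 1, so the interval is [0.9, 1.0].
```

The interval, rather than a single grade, is what lets type-2 systems model the measurement uncertainty that a distributed medical pipeline must tolerate.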
ISBN (digital): 9781510617247
ISBN (print): 9781510617247; 9781510617230
Most window-based image processing architectures can only implement a certain class of specific algorithms, such as 2D convolution, and therefore lack flexibility and breadth of application. In addition, improper handling of the image boundary can cause loss of accuracy or consume extra logic resources. To address these problems, this paper proposes a new VLSI architecture for window-based image processing operations which is configurable and takes the image boundary into account. An efficient technique is explored to manage the image borders by overlapping and flushing phases at the end of each row and the end of each frame, which introduces no new delay and reduces overhead in real-time applications. Reuse of on-chip memory data is maximized in order to reduce hardware complexity and external bandwidth requirements. Different scalar-function and reduction-function operations can be performed in a pipeline, supporting a variety of window-based image processing applications. Compared with other reported structures, the new structure performs similarly to some and is superior to others; in particular, compared with the systolic array processor CWP at the same frequency, this structure achieves a speed increase of approximately 12.9%. The proposed parallel VLSI architecture was implemented in SMIC 0.18-µm CMOS technology; the maximum clock frequency, power consumption, and area are 125 MHz, 57 mW, and 104.8K gates, respectively. Furthermore, the processing time is independent of the particular window-based algorithm mapped to the structure.
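The scalar/reduction pipeline described above can be mimicked in software as a configurable window operator; this sketch uses reflect-padding rather than the paper's overlap/flush border scheme:

```python
import numpy as np

def window_op(img, k, scalar_fn, reduce_fn):
    """Configurable window operator.

    Apply scalar_fn to every pixel of each k*k window, then reduce_fn
    across the window: the software analogue of the two pipeline stages.
    """
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="reflect")
    win = np.stack([p[i:i + img.shape[0], j:j + img.shape[1]]
                    for i in range(k) for j in range(k)])
    return reduce_fn(scalar_fn(win), axis=0)

img = np.arange(16, dtype=float).reshape(4, 4)
dilated = window_op(img, 3, lambda w: w, np.max)   # 3x3 grey dilation
eroded = window_op(img, 3, lambda w: w, np.min)    # 3x3 grey erosion
```

Swapping `scalar_fn`/`reduce_fn` yields convolution, morphology, or rank filters from one datapath, which is the flexibility argument the architecture makes in hardware.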
In this paper, various contrast and low-light image enhancement methods are described and classified into three categories: i) histogram-based, ii) transmission-map-based, and iii) retinex-based. The performance of the...
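As a concrete instance of the first category, global histogram equalization can be sketched as follows (a textbook method, not necessarily one of the paper's evaluated algorithms):

```python
import numpy as np

def hist_equalize(img, levels=256):
    """Global histogram equalization.

    Maps grey levels through the normalized cumulative histogram so that
    a narrow intensity range is spread across the full dynamic range.
    """
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    lut = np.round(cdf * (levels - 1)).astype(img.dtype)
    return lut[img]

dark = np.array([[10, 10, 12],
                 [12, 14, 14]], dtype=np.uint8)        # low-contrast patch
bright = hist_equalize(dark)                           # spread to 85..255
```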
详细信息
Dataflow accelerators feature simplicity, programmability, and energy efficiency and are viewed as a promising architecture for accelerating perfectly nested loops, which dominate several important applications, including image and media processing and deep learning. Although numerous accelerator designs have been proposed, discovering the most efficient way to execute the perfectly nested loop of an application on the computational and memory resources of a given dataflow accelerator (the execution method) remains an essential and yet unsolved challenge. In this paper, we propose dMazeRunner, which efficiently and accurately explores the vast space of the different ways to spatiotemporally execute a perfectly nested loop on dataflow accelerators (execution methods). The novelty of the dMazeRunner framework lies in: i) a holistic representation of the loop nests that can succinctly capture the various execution methods, ii) accurate energy and performance models that explicitly capture the computation and communication patterns, data movement, and data buffering of the different execution methods, and iii) drastic pruning of the vast search space by discarding invalid solutions and solutions that lead to the same cost. Our experiments on various convolution layers (perfectly nested loops) of popular deep learning applications demonstrate that the solutions discovered by dMazeRunner are on average 9.16x better in Energy-Delay Product (EDP) and 5.83x better in execution time than prior approaches. With additional pruning heuristics, dMazeRunner reduces the search time from days to seconds with a mere 2.56% increase in EDP compared to the optimal solution.
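The execution-method search that dMazeRunner prunes can be caricatured as a brute-force enumeration of spatial/temporal loop splits with duplicate-cost pruning; the cost function below is a toy stand-in for the paper's energy and performance models:

```python
import itertools

def tilings(n):
    """All (spatial, temporal) factor pairs with spatial * temporal == n."""
    return [(f, n // f) for f in range(1, n + 1) if n % f == 0]

def explore(loop_bounds, cost_fn):
    """Enumerate every way to split each loop bound between spatial and
    temporal execution, skipping methods whose cost was already seen
    (the same-cost pruning idea from the paper, in miniature)."""
    best = None
    seen_costs = set()
    for combo in itertools.product(*(tilings(n) for n in loop_bounds)):
        c = cost_fn(combo)
        if c in seen_costs:
            continue
        seen_costs.add(c)
        if best is None or c < best[0]:
            best = (c, combo)
    return best

# Toy cost: penalize imbalance between spatial unrolling and buffering.
cost = lambda combo: sum(abs(s - t) for s, t in combo)
best_cost, best_combo = explore([8, 4], cost)
```

Real search spaces multiply such choices across six or more convolution loop dimensions, which is why analytic models and pruning, rather than exhaustive simulation, are essential.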
The inversion of linear systems is fundamental in computed tomography (CT) reconstruction. Computational challenges arise when trying to invert large linear systems, as limited computing resources mean that only part of the system can be kept in computer memory at any one time. In linear tomographic inversion problems, such as x-ray tomography, even a standard scan can produce millions of individual measurements, and the reconstruction of x-ray attenuation profiles typically requires the estimation of a million attenuation coefficients. To deal with the large data sets encountered in real applications and to efficiently utilize modern GPU-based computing architectures, combinations of iterative reconstruction algorithms and parallel computing schemes are increasingly applied. While different parallel methods have been proposed, individual computations currently need to access either the entire set of observations or all estimated x-ray absorptions, which can be prohibitive in many realistic applications. We present a fully parallelizable CT image reconstruction algorithm in which each computation node works on arbitrary partial subsets of the data and the reconstructed volume. We further develop a non-homogeneously randomized selection criterion which guarantees that submatrices of the system matrix are selected more frequently if they are dense, thus maximizing information flow through the algorithm. We compare our algorithm with the block alternating direction method of multipliers and show that our method is significantly faster for CT reconstruction.
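The density-driven selection idea can be illustrated in one dimension with a row-action (Kaczmarz-type) solver that samples rows in proportion to their squared norms; this is a sketch of the principle, not the authors' block algorithm:

```python
import numpy as np

def density_weighted_kaczmarz(A, b, n_iter=200, seed=0):
    """Row-action solver for a consistent system Ax = b.

    Rows are sampled with probability proportional to their squared norm,
    so denser (more informative) rows are visited more often: a 1-D
    analogue of non-homogeneous submatrix selection.
    """
    rng = np.random.default_rng(seed)
    p = (A ** 2).sum(axis=1)
    p = p / p.sum()                              # sampling distribution
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        i = rng.choice(len(b), p=p)
        r = b[i] - A[i] @ x
        x += r * A[i] / (A[i] @ A[i])            # project onto hyperplane i
    return x

A = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_true = np.array([1.0, 2.0])
x = density_weighted_kaczmarz(A, A @ x_true)     # converges to x_true
```

Because each update touches a single row, the scheme extends naturally to nodes holding only partial subsets of A, which is the property the paper exploits at scale.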
Time-resolved experiments are among the most powerful tools in physics for exploring photoelectron spectroscopy phenomena over time scales from milliseconds to picoseconds, and they require acquisition systems with versatility and real-time computing. Cross delay-line (CDL) detectors are extremely suitable for these applications, since arrival-time measurement is exploited to perform position detection, providing both pieces of information together. Typical acquisition-system architectures are based on Application-Specific Integrated Circuit (ASIC) Time-to-Digital Converters (TDCs) followed by a Field-Programmable Gate Array (FPGA); fast parallel computing is combined with time precision, allowing state-of-the-art time-resolved experiments. Nevertheless, the limiting factor of this architecture is the lack of reconfigurability of the ASIC, which strongly limits customization with respect to the requirements of a specific set-up, especially today, when state-of-the-art TDCs implemented in FPGAs are comparable to ASIC solutions. At the 2019 Nuclear Science Symposium, we presented a fully reconfigurable FPGA-based solution in which the TDC and the image reconstruction algorithm were hosted in two FPGAs. In particular, we focused on the 4-channel TDC, which guarantees high performance in terms of resolution (1 ps), Full-Scale Range (200 µs), and Integral Non-Linearity (4 ps over 500 ns). In this contribution, we present significant improvements that satisfy the aforementioned experimental requirements. In fact, the "pulse-to-pulse" dead time of the TDC has been reduced from 20 ns to 7 ns, and the transmission rate between the FPGAs has been increased from 10 to 100 Msps. Furthermore, we have increased the number of channels of the TDC from 4 to 8. This makes it possible to correlate the CDL events with signals coming from other sources, which can be Time-of-Flight or laser pulses as well as other CDL signals.
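The position-from-timing principle of a cross delay-line detector can be sketched in a few lines; the propagation speed `v` is an assumed constant, not a value from the paper:

```python
def cdl_position(t_x1, t_x2, t_y1, t_y2, v=1.0):
    """Cross delay-line readout sketch.

    A hit's position along each axis is proportional to the difference
    between the signal arrival times at the two ends of that delay line;
    v is the effective propagation speed along the line (assumed here).
    """
    x = 0.5 * v * (t_x1 - t_x2)
    y = 0.5 * v * (t_y1 - t_y2)
    return x, y

# A hit closer to the x2 end of the line arrives there 4 units earlier.
x, y = cdl_position(t_x1=6.0, t_x2=2.0, t_y1=3.0, t_y2=3.0)
```

This is why TDC resolution translates directly into spatial resolution, and why the reduced dead time matters: every lost edge is a lost pixel.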
The paper deals with the experimental performance assessment of Compressive Sampling (CS) based terahertz (THz) imaging systems, an emerging approach for carrying out non-destructive tests of materials with the aim of detecting defects and flaws. Unlike traditional methods based on raster scanning, the CS approach allows the image of interest to be reconstructed from a reduced number of measurements, with a notable reduction in investigation time. Although both simulated and experimental results concerning the performance assessment of the THz imaging technique are available in the literature, the additional uncertainty due to the application of CS has never been taken into account in depth, since the CS processing step has been considered ideal. Given the success of the CS-based THz imaging technique and its promising performance in industrial applications, this assumption is no longer acceptable. Therefore, the authors focus their attention on the uncertainty sources associated with the experimental application of CS to THz imaging systems and on their impact on the overall quality of the reconstructed image. Several numerical tests, conducted by means of an optimized design of experiments, allow the authors to (i) assess the sensitivity of the reconstructed image quality to relevant uncertainty sources and (ii) define a suitable performance factor capable of driving experimenters towards a proper configuration of the measurement station. In particular, misalignment of the CS masks turns out to be the most impactful uncertainty source, as confirmed by experimental tests carried out with an actual THz imaging system. Nevertheless, the performance factor estimated on the reconstructed image of a reference target is capable of highlighting an incorrectly configured imaging system, making it possible to remedy the configuration and provide accurate and reliable THz images.
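A generic CS reconstruction can be sketched with ISTA (iterative soft thresholding); the random measurement matrix below stands in for the THz mask patterns and is not the paper's setup:

```python
import numpy as np

def ista(Phi, y, lam=0.05, n_iter=500):
    """Solve min 0.5*||Phi x - y||^2 + lam*||x||_1 by proximal gradient.

    A textbook sparse-recovery solver, used here only to illustrate how
    an image is rebuilt from fewer measurements than pixels.
    """
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - Phi.T @ (Phi @ x - y) / L    # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrink
    return x

rng = np.random.default_rng(1)
Phi = rng.standard_normal((30, 50)) / np.sqrt(30)  # 30 "masks", 50 pixels
x_true = np.zeros(50)
x_true[[5, 17, 33]] = [1.0, -1.0, 0.8]             # sparse scene
x_rec = ista(Phi, Phi @ x_true)
```

Perturbing `Phi` (e.g., shifting mask rows to mimic misalignment) and re-running the solver is exactly the kind of sensitivity experiment the paper formalizes.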
Hyperspectral image classification is one of the research hotspots in remote sensing and an important means of Earth observation, with significant applications in the fine-grained identification of ground objects. Convolutional neural networks (CNNs) can effectively extract high-level features from raw images and achieve high classification accuracy, but they are computationally expensive and place high demands on hardware. To improve computational efficiency, CNN models can be trained on graphics processing units (GPUs). Existing parallel algorithms, such as GCN (GPU-based Cube-CNN), cannot fully exploit the parallelism of the GPU, so their speedup is unsatisfactory. To further improve efficiency, this paper proposes GGCN (GPU-based Cube-CNN improved by GEMM), a parallel acceleration algorithm based on general matrix multiplication (GEMM). G-PNPE (GEMM-based Parallel Neighbor Pixels Extraction) reorganizes the input data and convolution kernels to enable parallel computation of the convolution, effectively improving GPU utilization and further increasing training efficiency. Analysis of experimental results on three datasets shows that the classification accuracy of the improved algorithm matches that of the original, while model training time is reduced by about 30%, demonstrating the effectiveness and superiority of the algorithm.
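The G-PNPE reorganization follows the classic im2col idea: unroll each receptive field into a row so the convolution becomes one large matrix multiply. A single-channel CPU sketch (the GPU and multi-kernel details of GGCN are not reproduced here):

```python
import numpy as np

def im2col_conv(img, kernel):
    """Convolution expressed as GEMM.

    Each kernel-sized patch of the image is flattened into one row of
    `cols`; the convolution is then a single matrix-vector product,
    which is the operation GPUs execute most efficiently.
    """
    H, W = img.shape
    k = kernel.shape[0]
    oh, ow = H - k + 1, W - k + 1
    cols = np.stack([img[i:i + k, j:j + k].ravel()
                     for i in range(oh) for j in range(ow)])
    return (cols @ kernel.ravel()).reshape(oh, ow)

img = np.arange(16, dtype=float).reshape(4, 4)
ker = np.ones((3, 3)) / 9.0                  # 3x3 mean kernel
out = im2col_conv(img, ker)                  # valid convolution, 2x2 output
```

With several kernels, `kernel.ravel()` becomes a matrix and the whole layer collapses into one GEMM call, which is what lets a tuned BLAS or cuBLAS routine replace many small, poorly parallelized loops.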
This brief presents a low-power compression-based CMOS image sensor for wireless vision applications. The sensor implements low-bit-depth imaging with planned sensor distortion (PSD) to effectively compress both data bandwidth and processing power while maintaining high reconstruction image quality. Accordingly, a column-parallel microshift-guided successive-approximation-register (SAR) ADC is proposed to enable 3-bit PSD imaging based on a 3 × 3 pattern. To support normal imaging with low area overhead, the circuit is reconfigurable as an 8-bit SAR/single-slope ADC. The data bandwidth is further compressed by a customized lossless encoder based on predictive coding and run-length coding. A 256 × 216 prototype imaging system composed of the compression-based image sensor and the lossless encoder is fabricated in a 0.18-µm standard CMOS process. Measurement results show that the image sensor achieves 3 bit/pixel (bpp) with 34.9-dB reconstruction peak signal-to-noise ratio (PSNR) and 0.91 structural similarity index (SSIM). With 1/4 spatial downsampling and lossless encoding, 0.31 bpp is obtained with 29.2-dB PSNR and 0.83 SSIM. The sensor consumes 14.8 µW (full resolution) and 4.3 µW (downsampling) at 15 fps, achieving state-of-the-art FoMs of 17.8 and 5.2 pJ/***, respectively. Including the encoder, the overall system dissipates as low as 1.2 µJ/frame, making it an attractive solution for wireless sensor networks.
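The encoder's two stages can be sketched as difference-based predictive coding followed by run-length coding; the sensor's actual bitstream format is not specified here, so the representation below is illustrative:

```python
def encode(pixels):
    """Predictive coding + run-length coding, losslessly.

    Each sample is replaced by its difference from the previous sample
    (smooth scan lines yield long runs of small residuals), then equal
    residuals are collapsed into (value, count) runs.
    """
    residuals = [pixels[0]] + [b - a for a, b in zip(pixels, pixels[1:])]
    runs = []
    for r in residuals:
        if runs and runs[-1][0] == r:
            runs[-1][1] += 1
        else:
            runs.append([r, 1])
    return runs

def decode(runs):
    """Invert run-length coding, then integrate the residuals."""
    residuals = [v for v, n in runs for _ in range(n)]
    out = [residuals[0]]
    for r in residuals[1:]:
        out.append(out[-1] + r)
    return out

flat = [7, 7, 7, 7, 8, 9, 10]        # smooth scan line
code = encode(flat)                  # 3 runs instead of 7 samples
```

The round trip is exact, which is the "lossless" claim: compression comes only from image smoothness, not from discarding data.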