This paper presents a throughput-efficient FPGA implementation of the Set Partitioning in Hierarchical Trees (SPIHT) algorithm for image compression. SPIHT exploits the inherent redundancy among wavelet coefficients and is suited to both grey-scale and color images. The basic SPIHT algorithm relies on dynamic data structures, which hinders hardware realization. In this FPGA implementation, the basic SPIHT is modified in two ways: static (fixed) mappings are used to represent the significance information, and the sorting and refinement passes are interchanged. The hardware realization targets a Xilinx XC3S200 device. SPIHT offers a number of desirable properties, including good image quality, fast coding and decoding, a fully progressive bit stream, applicability to lossless compression, error protection, and the ability to code to an exact bit rate.
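For orientation, the significance test that drives SPIHT's sorting pass can be sketched in a few lines of NumPy. This is only an illustration of the bitplane threshold idea (a set is significant at bitplane n if any coefficient magnitude reaches 2**n); the helper names are hypothetical and the paper's fixed mappings and pass-interchange hardware details are not reproduced here.

import numpy as np

def is_significant(coeffs, n):
    """True if any coefficient in the set reaches the bitplane threshold 2**n."""
    return np.max(np.abs(coeffs)) >= (1 << n)

def initial_bitplane(coeffs):
    """Highest bitplane n with 2**n <= max |coefficient| (SPIHT's starting threshold)."""
    return int(np.floor(np.log2(np.max(np.abs(coeffs)))))

# Example: a toy set of wavelet coefficients
c = np.array([3.0, -18.5, 7.2, 1.1])
n = initial_bitplane(c)          # 4, since 16 <= 18.5 < 32
print(n, is_significant(c, n))   # 4 True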
In this dissertation, three problems in image deblurring, inpainting and virtual content insertion are solved in a Bayesian framework. Camera shake, motion or defocus during exposure leads to image blur. Single-image deblurring has achieved remarkable results by solving a MAP problem, but there is no perfect solution due to inaccurate image priors and estimators. In the first part, a new non-blind deconvolution algorithm is proposed. The image prior is represented by a Gaussian Scale Mixture (GSM) model, which is estimated from non-blurry images used as training data. Our experimental results on a total of twelve natural images show that more details are restored than with previous deblurring algorithms. In augmented reality, it is a challenging problem to insert virtual content into video streams by blending it with spatial and temporal information. A generic virtual content insertion (VCI) system is introduced in the second part. To the best of my knowledge, it is the first successful system to insert content on building facades from street-view video streams. Without knowing the camera positions, the geometry model of a building facade is established using a combined detection and tracking strategy. Moreover, motion stabilization, dynamic registration and color harmonization contribute to the excellent augmentation performance of this automatic VCI system. Coding efficiency is an important objective in video coding. In recent years, video coding standards have evolved by adding new tools, but this requires numerous modifications to already complex coding systems. It is therefore desirable to consider alternative standard-compliant approaches that do not modify the codec structure. In the third part, an exemplar-based data-pruning video compression scheme for intra frames is introduced. Data pruning is used as a pre-processing tool to remove part of the video data before encoding. At the decoder, the missing data are reconstructed by a sparse linear combination of similar patches.
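For reference, the MAP formulation mentioned for the non-blind deconvolution part can be written as below. This is a sketch in standard notation (y the blurry image, k the known blur kernel, f_i derivative filters, sigma the noise level), assuming the usual Gaussian likelihood and a Gaussian Scale Mixture prior on filter responses; it is not claimed to match the dissertation's exact parameterization.

\hat{x} \;=\; \arg\min_{x} \; \frac{1}{2\sigma^{2}}\,\lVert y - k \ast x \rVert_{2}^{2} \;-\; \sum_{i} \log p\!\left(f_{i} \ast x\right),
\qquad
p(z) \;=\; \sum_{j=1}^{J} \pi_{j}\, \mathcal{N}\!\left(z;\, 0,\, s_{j}\right).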
With the widespread use of powerful image editing tools, image tampering has become easy and realistic. Existing image forensic methods still face challenges of low generalization performance and robustness. In this letter, we propose an effective image tampering localization scheme based on a ConvNeXt encoder and multi-scale feature fusion (ConvNeXtFF). Stacked ConvNeXt blocks are used as the encoder to capture hierarchical multi-scale features, which are then fused in the decoder to locate tampered pixels accurately. A combined loss function and effective data augmentation strategies are adopted to further improve model performance. Extensive experimental results show that the ConvNeXtFF scheme outperforms other state-of-the-art methods in both localization accuracy and robustness. The source code is available at https://***/multimediaFor/ConvNeXtFF.
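A minimal PyTorch-style sketch of the multi-scale fusion idea follows: hierarchical encoder features are upsampled to a common resolution, concatenated, and reduced to a per-pixel tampering map. Module names and channel counts are assumptions for illustration, not the released ConvNeXtFF code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusionHead(nn.Module):
    """Fuse encoder features from several stages into a single-channel localization map."""
    def __init__(self, in_channels=(96, 192, 384, 768), mid=64):
        super().__init__()
        # 1x1 convs project each stage to a common channel width
        self.proj = nn.ModuleList(nn.Conv2d(c, mid, kernel_size=1) for c in in_channels)
        self.fuse = nn.Sequential(
            nn.Conv2d(mid * len(in_channels), mid, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, kernel_size=1),   # per-pixel tampering logit
        )

    def forward(self, feats):
        target = feats[0].shape[-2:]            # finest resolution
        ups = [F.interpolate(p(f), size=target, mode='bilinear', align_corners=False)
               for p, f in zip(self.proj, feats)]
        return self.fuse(torch.cat(ups, dim=1))

# feats stand in for encoder stage outputs, coarser as the list goes on
feats = [torch.randn(1, c, 64 // 2**i, 64 // 2**i) for i, c in enumerate((96, 192, 384, 768))]
print(MultiScaleFusionHead()(feats).shape)      # torch.Size([1, 1, 64, 64])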
With the rapid development of deep learning, many breakthroughs have been made in the field of facial expression recognition (FER). However, facial images contain not only expression-related features but also identity-related features, and the identity-related features vary from person to person, which often has a negative influence on the FER process. This is one of the most important challenges in FER. In this paper, a novel feature separation model, exchange-GAN, is proposed for the FER task, which can separate expression-related features from expression-independent features with high purity. An FER method based on the exchange-GAN can therefore overcome the interference of identity-related features to a large extent. First, feature separation is achieved by the exchange-GAN through partial feature exchange and various constraints. Then the expression-independent features are discarded and FER is conducted only on the expression-related features, to alleviate the adverse effect of identity-related features. Finally, experiments are conducted on three well-known databases with the proposed FER method. The experimental results show that the proposed method can alleviate the interference of identity-related information through feature separation by the exchange-GAN and achieves excellent performance for subjects that do not appear in the training set. Moreover, our method obtains very competitive FER accuracy on the three experimental databases. (C) 2020 Elsevier B.V. All rights reserved.
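The partial feature exchange can be illustrated with a short tensor-level sketch: latent codes of two faces are split into an expression-related slice and an identity-related slice, and only the expression slices are swapped before decoding. The channel split and dimensions here are hypothetical, not the paper's architecture.

import torch

def exchange_expression(z_a, z_b, expr_dims):
    """Swap the expression-related slice of two latent codes; keep identity slices in place."""
    za_expr, za_id = z_a[:, :expr_dims], z_a[:, expr_dims:]
    zb_expr, zb_id = z_b[:, :expr_dims], z_b[:, expr_dims:]
    z_a_swapped = torch.cat([zb_expr, za_id], dim=1)   # identity A wearing expression B
    z_b_swapped = torch.cat([za_expr, zb_id], dim=1)   # identity B wearing expression A
    return z_a_swapped, z_b_swapped

z_a, z_b = torch.randn(4, 128), torch.randn(4, 128)
sa, sb = exchange_expression(z_a, z_b, expr_dims=32)
print(sa.shape, sb.shape)   # torch.Size([4, 128]) torch.Size([4, 128])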
Delivery of continuous cardiopulmonary resuscitation (CPR) plays an important role in the out-of-hospital cardiac arrest (OHCA) survival rate. However, to prevent CPR artifacts from being superimposed on ECG morphology data, currently available automated external defibrillators (AEDs) require pauses in CPR for accurate analysis of heart rhythms. In this study, we propose a novel convolutional neural network-based encoder-decoder (CNNED) structure with a shock advisory algorithm to improve the accuracy and reliability of shock versus non-shock decision-making without CPR pauses in OHCA scenarios. Our approach employs a cascade of CNNEDs in conjunction with an AED shock advisory algorithm to process the ECG data for shock decisions. Initially, a CNNED trained on an equal number of shockable and non-shockable rhythms is used to filter the CPR-contaminated data. The resulting filtered signal is then fed into a second CNNED, which is trained on imbalanced data tilted toward the specific rhythm being analyzed. A reliable shock versus non-shock decision is made when both classifiers in the cascade agree, while segments with conflicting classifications are labeled as indeterminate, indicating the need for additional segments to analyze. To evaluate our approach, we generated CPR-contaminated ECG data by combining clean ECG data with 52 CPR samples. We used clean ECG data from the CUDB, AFDB, SDDB, and VFDB databases, to which the 52 CPR artifact cases were added, while a separate test set provided by the AED manufacturer Defibtech LLC was used for performance evaluation. The test set comprised 20,384 non-shockable CPR-contaminated segments from 392 subjects, as well as 3744 shockable CPR-contaminated samples from 41 subjects with coarse ventricular fibrillation (VF) and 31 subjects with rapid ventricular tachycardia (rapid VT). We observed improvements in rhythm analysis using the proposed cascading CNNED structure compared to using a single CNNED structure.
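The cascade's agreement rule reduces to a few lines of plain Python (the stage labels are placeholders for the two CNNED-filtered rhythm analyses described above): a shock or non-shock advisory is issued only when both stages agree, and anything else is marked indeterminate so another segment can be analyzed.

def cascade_decision(first_stage_label, second_stage_label):
    """Agreement-based shock advisory: 'shock', 'nonshock', or 'indeterminate'."""
    if first_stage_label == second_stage_label:
        return first_stage_label           # both classifiers in the cascade agree
    return "indeterminate"                 # disagreement: request an additional segment

print(cascade_decision("shock", "shock"))        # shock
print(cascade_decision("shock", "nonshock"))     # indeterminate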
Classic high-accuracy semantic segmentation models typically come with a large number of parameters, making them unsuitable for deployment on driverless platforms with limited computational power. To strike a balance between accuracy and a limited computational budget, and to enable the use of the classic segmentation model UNet in unmanned driving scenarios, this paper proposes a multi-unit stacked architecture (MSA), namely MSA-Net, based on UNet and ShuffleNetv2. First, MSA-Net replaces the convolution blocks in the UNet encoder and decoder with stacked basic ShuffleNetv2 units, which greatly reduces computational cost while maintaining high segmentation accuracy. Second, MSA-Net introduces enhanced skip connections using pointwise convolution and the convolutional block attention module (CBAM) to help the decoder select more relevant and valuable information. Third, MSA-Net adds multi-scale internal connections that extend the receptive fields of the encoder and decoder with little increase in model parameters. Comprehensive experiments show that MSA-Net achieves an optimal balance between accuracy and model complexity on the Cityscapes dataset, with strong generalization on the enhanced PASCAL VOC 2012 dataset. MSA-Net achieves a mean intersection over union (mIoU) of 73.6% and an inference speed of 31.0 frames per second (FPS) on the Cityscapes test set. We also propose two other MSA-Net models of different sizes, providing more options for resource-constrained inference.
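A rough PyTorch sketch of an enhanced skip connection of the kind described follows: a pointwise convolution followed by a channel-attention gate in the spirit of CBAM. The full CBAM also includes a spatial attention branch, omitted here, and the channel counts are illustrative rather than MSA-Net's actual configuration.

import torch
import torch.nn as nn

class EnhancedSkip(nn.Module):
    """Pointwise conv + channel attention on an encoder feature before decoder concatenation."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)
        self.attn = nn.Sequential(                       # squeeze-and-gate channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.pw(x)
        return x * self.attn(x)                          # reweight channels, keep spatial size

skip = EnhancedSkip(64)
print(skip(torch.randn(1, 64, 128, 128)).shape)          # torch.Size([1, 64, 128, 128])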
An effective and accurate building energy consumption prediction model is an important means of making good use of building management systems and improving energy efficiency. To cope with the development and changes of digital data, data-driven models, especially deep learning models, have been applied to energy consumption prediction and have achieved good accuracy. However, deep learning models that process high-dimensional data often lack interpretability, which limits their further application and promotion. This paper proposes three interpretable encoder-decoder models based on long short-term memory (LSTM) and self-attention. Attention based on hidden-layer states and feature-based attention improve the interpretability of the deep learning models. A case study of one office building is discussed to demonstrate the proposed method and models. Adding future real weather information yields only a 0.54% improvement in the MAPE. Visualizing the model attention weights improves the interpretability of the model at both the hidden-state level and the feature level. Across the hidden states of different time steps, the LSTM network focuses on the hidden state of the last time step because it contains more information, whereas the Transformer model gives almost equal attention weight to each day in the coding sequence. For the interpretable results at the feature level, daily maximum temperature, mean temperature, minimum temperature, and dew point temperature are the four most important features, while pressure, the wind speed-related features, and holidays have the lowest average weights. (c) 2021 Elsevier B.V. All rights reserved.
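A compact sketch of hidden-state attention of the sort whose weights are visualized for interpretability: a decoder query is scored against the encoder LSTM's hidden states and the softmax weights show how much each past day contributes. The dimensions are illustrative, not the paper's configuration.

import torch
import torch.nn.functional as F

def hidden_state_attention(encoder_states, query):
    """encoder_states: (T, d); query: (d,). Returns context vector and per-timestep weights."""
    scores = encoder_states @ query                  # (T,) similarity of each day to the query
    weights = F.softmax(scores, dim=0)               # interpretable attention weights over time steps
    context = weights @ encoder_states               # (d,) weighted summary of the sequence
    return context, weights

states = torch.randn(7, 32)                          # e.g. 7 past days of LSTM hidden states
ctx, w = hidden_state_attention(states, torch.randn(32))
print(w.sum().item())                                # ~1.0: the weights form a distribution over days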
Sports videos are widely used by athletes and coaches for training and match analysis purposes outside the mainstream audience. Sports videos should be effectively classified into different genres to easily retrieve and index them from large video datasets. Manual labelling classification methods may cause errors and have low accuracy. Classification based on video content analysis is challenging for computer vision-based techniques. This work introduces an improved focus-net deep learning (DL) model called the Convolutional squeeze U-Net based encoder-decoder for sports video retrieval and classification. First, the keyframes are extracted from the input sports video using a clustering and optical flow analysis method. In the next stage, the frames are preprocessed using a smoothed shock filtering technique to remove the noise. The process of image segmentation is carried out using a Convolutional squeeze U-Net based encoder-decoder model. Finally, the sports video can be classified using the softmax classifier. A CNN (convolutional neural network) is utilized at the encoder section for extracting the features and fed to the decoder for video classification. The experiments are performed in the UCF101 dataset, and the proposed model achieved an overall accuracy of 99.68%. Hence, it is proven that the proposed focus-net model can be efficiently utilized in sports video classification.
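One ingredient of the clustering-and-optical-flow keyframe step can be sketched with OpenCV: the mean optical-flow magnitude between consecutive frames gives a per-transition motion score, and local maxima suggest candidate keyframes. This is only an assumed illustration; the paper's actual selection also involves clustering, which is not shown.

import cv2
import numpy as np

def motion_scores(gray_frames):
    """Mean optical-flow magnitude between consecutive 8-bit grayscale frames."""
    scores = []
    for prev, nxt in zip(gray_frames[:-1], gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)           # per-pixel displacement magnitude
        scores.append(float(mag.mean()))
    return scores

# frames would come from the decoded sports video, converted to grayscale uint8 arrays
frames = [np.random.randint(0, 255, (120, 160), dtype=np.uint8) for _ in range(5)]
print(motion_scores(frames))                          # 4 scores; peaks mark candidate keyframes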
ISBN (digital): 9789887581581
ISBN (print): 9798350366907
Segmenting coronary arteries from X-ray coronary angiography (XCA) images allows observation of coronary artery morphology and stenosis, which is of great significance for computer-aided diagnosis and treatment of coronary artery disease. However, XCA images have low contrast and irregular lighting due to the limits of existing imaging techniques, making vascular segmentation challenging. Segmenting coronary arteries with conventional U-shaped segmentation networks is difficult because of large differences in vascular morphology and size and because of vessel overlap. This paper proposes a network based on the U-shaped structure for multi-scale context information fusion, with the lightweight GhostNetV2 as the backbone feature extraction network to improve the model's feature extraction capability. A multi-scale context fusion module (MCF) is then proposed to effectively capture the contextual information of blood vessels. Finally, we propose a feature re-extraction module (FRM) to achieve effective re-extraction of fine features in complex backgrounds. Experimental results show that the proposed model achieves more accurate coronary artery segmentation in complex backgrounds with fewer parameters, improves the segmentation of fine vessels at the ends of coronary arteries, and performs well compared with other artery segmentation models. The model reaches an F1 score of 82.23%, an intersection over union (IoU) of 69.83%, an accuracy (ACC) of 98.79%, and a sensitivity (Sen) of 81.93%.
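A rough PyTorch sketch of a multi-scale context fusion block of the general kind the MCF module describes: parallel dilated 3x3 convolutions capture vessel context at several receptive-field sizes, and their outputs are concatenated and projected back. The dilation rates and channel counts are assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class MultiScaleContextFusion(nn.Module):
    """Parallel dilated convs fused by a 1x1 projection; spatial size is preserved."""
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r) for r in rates)
        self.project = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

mcf = MultiScaleContextFusion(32)
print(mcf(torch.randn(1, 32, 64, 64)).shape)          # torch.Size([1, 32, 64, 64])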
ISBN (digital): 9781728157184
ISBN (print): 9781728157191
In most telemedicine applications, image compression techniques play an important role in handling medical images, which must be stored and transferred over low-bandwidth channels such as the Internet, for example from a pathologist to a doctor for diagnosing patient problems. When a medical image is compressed with a lossy method, the doctor may not perceive any deterioration in quality with respect to the original input image. The main disadvantage of the lossy compression algorithms commonly used for multimedia applications is that, while the overall quality of the image can be controlled to some extent, some information is irretrievably discarded, which makes them unsuitable for medical images. In these cases it is necessary to use lossless compression algorithms, in which the information is fully preserved during decompression and the reconstructed image is an exact replica of the original, so no information is lost in the coding process. The aim of this paper is to provide a review of the various image compression techniques used for medical images, together with a performance analysis and a comparison of existing research on medical image compression.
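The lossless-versus-lossy distinction the review revolves around is easy to verify programmatically. The short sketch below uses Pillow and NumPy (file names and the random test image are placeholders) to check that a PNG round trip reproduces the pixels exactly while a JPEG round trip generally does not.

import numpy as np
from PIL import Image

original = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in for a medical image
Image.fromarray(original).save("slice.png")                       # lossless codec
Image.fromarray(original).save("slice.jpg", quality=75)           # lossy codec

png_back = np.array(Image.open("slice.png"))
jpg_back = np.array(Image.open("slice.jpg"))

print(np.array_equal(original, png_back))   # True: exact replica, no information lost
print(np.array_equal(original, jpg_back))   # almost always False: lossy reconstruction differs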