Recent advances in deep convolutional neural networks have shown improved performance in face super-resolution through joint training with other tasks such as face analysis and landmark prediction. However, these methods have certain limitations. One major limitation is the requirement for manually annotated prior information on the dataset for multi-task joint learning; this additional annotation process increases the computational cost of the network model. Additionally, since prior information is often estimated from low-quality faces, the obtained guidance information tends to be inaccurate. To address these challenges, a novel Decoder Structure Guided CNN-Transformer Network (DCTNet) is introduced, which utilises the newly proposed Global-Local Feature Extraction Unit (GLFEU) for effective embedding. Specifically, the proposed GLFEU combines an attention branch and a Transformer branch to simultaneously restore global facial structure and local texture details. Additionally, a Multi-Stage Feature Fusion Module is incorporated to fuse features from different network stages, further improving the quality of the restored face images. Compared with previous methods, DCTNet improves Peak Signal-to-Noise Ratio (PSNR) by 0.23 dB and 0.19 dB on the CelebA and Helen datasets, respectively. Experimental results demonstrate that the designed DCTNet offers a simple yet powerful solution for recovering detailed facial structures from low-quality images. In summary, DCTNet uses a decoder structure as its backbone and relies primarily on the Global-Local Feature Extraction Units (GLFEU) for face image super-resolution.
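For illustration, the sketch below shows how a dual-branch global-local unit of this kind can be organised in PyTorch, with a convolutional attention branch for local texture and a Transformer branch for global structure. The channel sizes, attention design, and fusion rule are assumptions made for the example; the abstract does not specify the authors' exact configuration.

```python
# Hypothetical sketch of a dual-branch "global-local" feature extraction unit,
# loosely following the abstract's description of GLFEU (attention branch +
# Transformer branch). Channel sizes, attention design, and the fusion rule
# are assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn


class ChannelAttentionBranch(nn.Module):
    """CNN branch with channel attention, aimed at local texture details."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        feat = self.conv(x)
        return feat * self.attn(feat)


class TransformerBranch(nn.Module):
    """Transformer branch modelling global structure over flattened positions."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads,
            dim_feedforward=2 * channels, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        tokens = self.block(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class GLFEU(nn.Module):
    """Fuses the local (attention) and global (Transformer) branches."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.local_branch = ChannelAttentionBranch(channels)
        self.global_branch = TransformerBranch(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        fused = torch.cat([self.local_branch(x), self.global_branch(x)], dim=1)
        return x + self.fuse(fused)                # residual connection


if __name__ == "__main__":
    unit = GLFEU(channels=64)
    out = unit(torch.randn(1, 64, 32, 32))
    print(out.shape)                               # torch.Size([1, 64, 32, 32])
```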
Lung cancer has become one of the leading causes of death, and its early detection remains immensely difficult. In this research article, a five-step framework with three different methods is developed for the automatic detection and classification of lung tumors in CT (Computed Tomography) images. The first step is image acquisition; here, the input images are collected from a public lung cancer database and an in-house clinical dataset. In the next step, image enhancement is performed using the WFUM (Wiener Filter with Unsharp Masking) technique, which removes the noise present in the input images. In the subsequent step, the HRWBM (Hierarchical Random Walker with Bayes Model) segmentation algorithm is applied to the enhanced image sequence to predict the lung tumor region, and features are then extracted using the GLCM (Gray Level Co-occurrence Matrix). Finally, the lung cancer images from the public LIDC database are classified: HRWBM with an SVM (Support Vector Machine) classifier achieves 77.8% accuracy, HRWBM with an FFNN (Feed-Forward Neural Network) achieves 93.3%, and HRWBM with a DRNN (Deep Recurrent Neural Network) achieves 97.3%. For the in-house clinical dataset, HRWBM with SVM achieves 84% accuracy, HRWBM with FFNN 90%, and HRWBM with DRNN 94.7%. These results show that, among the three methods, HRWBM with DRNN provides the most accurate identification of lung cancer.
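As a concrete illustration of two of the pipeline stages, the hedged sketch below applies Wiener filtering with unsharp masking and extracts GLCM texture descriptors using SciPy and scikit-image; the filter size, unsharp parameters, and GLCM settings are assumptions, and the HRWBM segmentation step and the SVM/FFNN/DRNN classifiers are omitted.

```python
# Illustrative sketch of two pipeline stages described in the abstract:
# Wiener-filter-plus-unsharp-masking enhancement and GLCM texture features.
# Filter sizes, unsharp parameters, and GLCM settings are assumptions; the
# HRWBM segmentation and the SVM/FFNN/DRNN classifiers are not shown.
import numpy as np
from scipy.signal import wiener
from skimage.filters import unsharp_mask
from skimage.feature import graycomatrix, graycoprops


def enhance_ct_slice(img: np.ndarray) -> np.ndarray:
    """Denoise a CT slice with a Wiener filter, then sharpen edges."""
    denoised = wiener(img.astype(float), mysize=5)
    return unsharp_mask(denoised, radius=2.0, amount=1.0)


def glcm_features(region: np.ndarray, levels: int = 32) -> dict:
    """Extract a few GLCM texture descriptors from a (segmented) region."""
    quantised = np.digitize(region, np.linspace(region.min(), region.max(), levels)) - 1
    glcm = graycomatrix(quantised.astype(np.uint8), distances=[1],
                        angles=[0, np.pi / 2], levels=levels,
                        symmetric=True, normed=True)
    return {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}


if __name__ == "__main__":
    slice_ = np.random.rand(128, 128)          # stand-in for a CT slice
    feats = glcm_features(enhance_ct_slice(slice_))
    print(feats)
```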
ISBN (Print): 9798350391558; 9798350379990
This research study analyzes the multidimensional landscape of steganography, examining its historical roots, theoretical background, contemporary approaches, and various applications. Beginning with a historical overview, the study traces the evolution of steganography from its ancient roots to its present iterations in the digital world. Next, the study analyzes the fundamental principles and theoretical frameworks that underpin steganographic systems, such as cryptography and digital signal processing. Finally, it presents a thorough evaluation of contemporary steganographic technologies, ranging from simple LSB (Least Significant Bit) substitution techniques to advanced adaptive algorithms and machine learning methods, including deep-learning-based steganography and coverless steganography. Notably, the study identifies key challenges, including detection resistance, payload capacity, and robustness against attacks. Overall, the study presents a thorough understanding of steganography, emphasizing its significance as a versatile tool for communication in the digital era, while also highlighting the challenges that pave the way for future innovations.
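As a minimal example of the LSB substitution technique surveyed above, the sketch below embeds and extracts a short message in the least significant bits of a grayscale image; the 16-bit length header is an assumption made for the example, and practical schemes add encryption, permutation, and integrity checks.

```python
# Minimal illustration of classical LSB substitution: message bits replace the
# least significant bit of each pixel. The grayscale cover and the 16-bit
# length header are assumptions made for this example.
import numpy as np


def lsb_embed(cover: np.ndarray, message: bytes) -> np.ndarray:
    bits = [int(b) for byte in len(message).to_bytes(2, "big") + message
            for b in f"{byte:08b}"]
    if len(bits) > cover.size:
        raise ValueError("message too long for this cover image")
    stego = cover.flatten().copy()
    stego[:len(bits)] = (stego[:len(bits)] & 0xFE) | bits
    return stego.reshape(cover.shape)


def lsb_extract(stego: np.ndarray) -> bytes:
    flat = stego.flatten() & 1
    length = int("".join(map(str, flat[:16])), 2)
    bits = flat[16:16 + 8 * length]
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))


if __name__ == "__main__":
    cover = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    stego = lsb_embed(cover, b"hidden message")
    assert lsb_extract(stego) == b"hidden message"
```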
ISBN (Print): 9781728198354
The rate-distortion performance of neural image compression models has exceeded the state-of-the-art for non-learned codecs, but neural codecs are still far from widespread deployment and adoption. The largest obstacle is having efficient models that are feasible on a wide variety of consumer hardware. Comparative research and evaluation are difficult due to the lack of standard benchmarking platforms and due to variations in hardware architectures and test environments. Through our rate-distortion-computation (RDC) study we demonstrate that neither floating-point operations (FLOPs) nor runtime is sufficient on its own to accurately rank neural compression methods. We also explore the RDC frontier, which leads to a family of model architectures with the best empirical trade-off between computational requirements and RD performance. Finally, we identify a novel neural compression architecture that yields state-of-the-art RD performance with rate savings of 23.1% over BPG (7.0% over VTM and 3.0% over ELIC) without requiring significantly more FLOPs than other learning-based codecs.
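Rate savings of the kind quoted above are commonly reported as Bjøntegaard-delta (BD) rates; whether this paper uses exactly that procedure is not stated, so the sketch below is a generic illustration of the standard cubic-fit computation with made-up rate-distortion points.

```python
# Hedged sketch of a Bjontegaard-delta (BD) rate computation of the kind used
# to report savings such as "23.1% over BPG". The sample RD points are made up
# purely for illustration.
import numpy as np


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test) -> float:
    """Average rate difference (%) of the test codec w.r.t. the reference."""
    lr_ref, lr_test = np.log(rate_ref), np.log(rate_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)      # log-rate as a cubic in PSNR
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100          # negative => bitrate savings


if __name__ == "__main__":
    # Illustrative RD points (bits-per-pixel, PSNR in dB), not measured data.
    print(bd_rate(rate_ref=[0.25, 0.50, 1.00, 2.00],
                  psnr_ref=[30.0, 33.0, 36.0, 39.0],
                  rate_test=[0.22, 0.44, 0.90, 1.80],
                  psnr_test=[30.2, 33.3, 36.4, 39.5]))
```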
ISBN (Print): 9798350344868; 9798350344851
Current privacy-aware joint source-channel coding (JSCC) works aim to avoid transmitting private information by adversarially training the JSCC encoder and decoder under specific signal-to-noise ratios (SNRs) of eavesdroppers. However, these approaches incur additional computational and storage costs, as multiple neural networks must be trained for various eavesdropper SNRs to determine the transmitted information. To overcome this challenge, we propose a novel privacy-aware JSCC for image transmission based on the disentangled information bottleneck (DIB-PAJSCC). In particular, we derive a novel disentangled information bottleneck objective to disentangle private and public information. Given the separated information, the transmitter can transmit only the public information to the receiver while minimizing reconstruction distortion. Since DIB-PAJSCC transmits only public information regardless of the eavesdroppers' SNRs, it eliminates the additional training otherwise needed to adapt to each eavesdropper's SNR. Experimental results show that DIB-PAJSCC can reduce the eavesdropping accuracy on private information by up to 20% compared with existing methods.
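The sketch below is a generic adversarial surrogate for such a privacy-aware objective: minimise reconstruction distortion while penalising an auxiliary classifier's ability to recover the private attribute from the public representation. It is not the paper's disentangled information bottleneck objective, and the weighting term is an assumed hyper-parameter.

```python
# Conceptual sketch of a privacy-aware training objective in the spirit of the
# abstract: reconstruct the image from a "public" representation while an
# auxiliary classifier (standing in for an eavesdropper) is prevented from
# recovering the private attribute. This is a generic adversarial surrogate,
# not the paper's exact disentangled information bottleneck objective.
import torch
import torch.nn.functional as F


def pajscc_loss(x, x_hat, private_logits, private_labels, beta: float = 0.1):
    """Reconstruction distortion minus a privacy-leakage penalty.

    x, x_hat:        original and reconstructed images
    private_logits:  eavesdropper-classifier outputs on the public features
    private_labels:  ground-truth private attribute
    beta:            weight of the leakage term (assumed hyper-parameter)
    """
    distortion = F.mse_loss(x_hat, x)
    leakage = F.cross_entropy(private_logits, private_labels)
    # The encoder/decoder minimise distortion while *maximising* the
    # eavesdropper's loss, i.e. minimising leakage of private information.
    return distortion - beta * leakage


if __name__ == "__main__":
    x = torch.rand(4, 3, 32, 32)
    x_hat = x + 0.05 * torch.randn_like(x)
    logits = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    print(pajscc_loss(x, x_hat, logits, labels).item())
```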
This letter proposes GDNet, a steganalysis network equipped with the generation of discriminative mixing regions (GDMR) and discriminability-aware local image mixing (DLIM), which aims to alleviate the significant accuracy degradation caused by cover-source mismatch (CSM), i.e., the situation where the source and target domains come from different distributions. GDNet guides a steganalyzer trained on the source domain towards the target domain by mixing source and target images at the region level and pixel level to construct a discriminative intermediate domain. On the one hand, GDMR uses an epoch-related region-level mixing ratio to control the size of the mixed region and, based on this ratio, selects the regions within the target image that are strongly related to the stego signal to participate in the generation of the intermediate domain, while suppressing regions weakly related to the stego signal. On the other hand, DLIM uses a pixel-level mixing ratio to reduce the impact of the weakly related regions on the discriminability of the intermediate domain as the region-level mixing ratio increases, thereby increasing the diversity of the intermediate domain. Experimental results demonstrate that GDNet significantly outperforms existing methods across various CSM scenarios.
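A toy version of the region-level and pixel-level mixing idea is sketched below; unlike GDMR/DLIM, it selects the mixed region at random rather than by stego-signal relevance, and the ratio schedule is an assumption made for illustration.

```python
# Toy sketch of region-level / pixel-level mixing between a source and a
# target image to build an intermediate domain, in the spirit of the abstract.
# The region is chosen at random here; the paper instead selects regions that
# are strongly related to the stego signal and schedules both mixing ratios.
import numpy as np


def mix_images(src: np.ndarray, tgt: np.ndarray, epoch: int, total_epochs: int,
               pixel_lambda: float = 0.7, rng=np.random) -> np.ndarray:
    """Blend a target patch into the source image at region and pixel level."""
    h, w = src.shape[:2]
    region_ratio = (epoch + 1) / total_epochs          # grows over training
    rh, rw = int(h * region_ratio), int(w * region_ratio)
    top = rng.randint(0, h - rh + 1)
    left = rng.randint(0, w - rw + 1)
    mixed = src.astype(float).copy()
    patch_src = mixed[top:top + rh, left:left + rw]
    patch_tgt = tgt.astype(float)[top:top + rh, left:left + rw]
    # Pixel-level mixing inside the selected region.
    mixed[top:top + rh, left:left + rw] = (
        pixel_lambda * patch_tgt + (1.0 - pixel_lambda) * patch_src)
    return mixed


if __name__ == "__main__":
    src = np.random.randint(0, 256, (256, 256)).astype(np.uint8)
    tgt = np.random.randint(0, 256, (256, 256)).astype(np.uint8)
    out = mix_images(src, tgt, epoch=5, total_epochs=20)
    print(out.shape, out.dtype)
```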
This paper presents a novel neural radiance field rendering method named 3D-IBLGS, which integrates prefiltered radiance fields to address global illumination in large-scale scenes. By extending the 3DGS formulation w...
Context. The classification of galaxy morphology is among the most active fields in astronomical research today. With the development of artificial intelligence technology, deep learning has become a useful tool for classifying the morphology of galaxies, and significant progress has been made in this domain. However, there is still room for improvement in terms of classification accuracy, automation, and related issues. Aims. The Convolutional vision Transformer (CvT) is an improved version of the Vision Transformer (ViT) model; it improves on ViT by introducing a convolutional neural network (CNN). This study explores the performance of the CvT model in galaxy morphology classification. Methods. In this work, the CvT model was applied, for the first time, to a five-class classification task of galaxy morphology. We added different types and degrees of noise to the original galaxy images to verify that the CvT model achieves good classification performance even on galaxy images with low signal-to-noise ratios (S/Ns). We also validated the classification performance of the CvT model for galaxy images at different redshifts, based on the low-redshift dataset GZ2 and the high-redshift dataset Galaxy Zoo CANDELS. In addition, we visualized and analyzed the classification results of the CvT model using the t-distributed stochastic neighbor embedding (t-SNE) algorithm. Results. We find that (1) compared with other five-class galaxy morphology classification models based on CNNs, the average accuracy, precision, recall, and F1-score of the CvT classification model are all higher than 98%, an improvement of at least 1% over the CNN-based models; (2) the classification visualization results show that different categories of galaxies are separated from each other in multi-dimensional space. Conclusions. The application of the CvT model to the classification study of galaxy morphology...
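The t-SNE visualisation step mentioned in the Methods can be reproduced in outline as below; the random features and five placeholder classes stand in for the real CvT embeddings and morphology labels.

```python
# Small sketch of a t-SNE visualisation of classifier features: project
# high-dimensional embeddings to 2-D so class separation can be inspected.
# The random features and five label classes are placeholders.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 128))            # stand-in for CvT features
labels = rng.integers(0, 5, size=500)             # five morphology classes

embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=5, cmap="tab10")
plt.title("t-SNE projection of classifier features (illustrative)")
plt.savefig("tsne_galaxies.png", dpi=150)
```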
In response to the problem that traditional methods ignore audio-modality tampering, this study explores an effective deep forgery (deepfake) video detection technique that improves detection precision and reliability by fusing lip images and audio signals. The main method is lip-audio matching detection based on a Siamese neural network, combined with MFCC (Mel-Frequency Cepstral Coefficient) feature extraction using band-pass filters, an improved dual-branch Siamese network structure, and a two-stream network design. First, the video stream is preprocessed to extract lip images, and the audio stream is preprocessed to extract MFCC features. These features are then processed separately by the two branches of the Siamese network. Finally, the model is trained and optimized through fully connected layers and loss functions. The experimental results show that the testing accuracy of the model on the LRW (Lip Reading in the Wild) dataset reaches 92.3%, the recall rate is 94.3%, and the F1 score is 93.3%, significantly better than the results of CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) models. In the validation of multi-resolution image streams, the highest accuracy of dual-resolution image streams reaches 94%. Band-pass filters can effectively improve the signal-to-noise ratio in deep forgery video detection when processing different types of audio signals. The real-time processing performance of the model is also excellent, and it achieves an average score of up to 5 in user studies. These results demonstrate that the proposed method can effectively fuse visual and audio information in deep forgery video detection and accurately identify inconsistencies between video and audio, thus verifying the effectiveness of lip-audio modality fusion in improving detection performance.
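A hedged sketch of the two input branches is given below: MFCC extraction from the audio stream and a small dual-branch network that scores lip-audio agreement via cosine similarity. The network sizes, MFCC settings, and the similarity-based matching score are assumptions, not the paper's exact design.

```python
# Hedged sketch of a lip/audio two-branch matcher: MFCC features from the
# audio stream and an embedding of the lip-image stream are projected into a
# shared space and compared. Sizes and settings are illustrative assumptions.
import torch
import torch.nn as nn
import torchaudio


def audio_mfcc(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    """Extract MFCCs from mono waveforms of shape (batch, num_samples)."""
    mfcc = torchaudio.transforms.MFCC(sample_rate=sample_rate, n_mfcc=13)
    return mfcc(waveform)                               # (batch, 13, num_frames)


class LipAudioSiamese(nn.Module):
    """Two branches project lips and audio into a shared embedding space."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.lip_branch = nn.Sequential(                # grayscale lip crops
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, embed_dim))
        self.audio_branch = nn.Sequential(              # MFCC "images"
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, embed_dim))

    def forward(self, lips, mfcc):
        a = nn.functional.normalize(self.lip_branch(lips), dim=-1)
        b = nn.functional.normalize(self.audio_branch(mfcc), dim=-1)
        return (a * b).sum(dim=-1)                      # cosine-similarity match score


if __name__ == "__main__":
    model = LipAudioSiamese()
    lips = torch.randn(2, 1, 64, 64)                       # grayscale lip crops
    mfcc = audio_mfcc(torch.randn(2, 16000)).unsqueeze(1)  # (2, 1, 13, frames)
    print(model(lips, mfcc))                               # per-pair match scores
```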
In this work, we utilize the high-fidelity generation abilities of diffusion models to solve blind JPEG restoration at high compression levels. We propose an elegant modification of the forward stochastic differential equation of diffusion models to adapt them to this restoration task and name our method DriftRec. Comparing DriftRec against an L2 regression baseline with the same network architecture and against state-of-the-art techniques for JPEG restoration, we show that our approach can escape the tendency of other methods to generate blurry images, and recovers the distribution of clean images significantly more faithfully. For this, only a dataset of clean/corrupted image pairs is required, with no knowledge about the corruption operation, enabling wider applicability to other restoration tasks. In contrast to other conditional and unconditional diffusion models, we exploit the idea that the distributions of clean and corrupted images are much closer to each other than each is to the usual Gaussian prior of the reverse process in diffusion models. Our approach therefore requires only low levels of added noise and comparatively few sampling steps, even without further optimizations. We show that DriftRec naturally generalizes to realistic and difficult scenarios such as unaligned double JPEG compression and blind restoration of JPEGs found online, without having encountered such examples during training.
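One plausible toy version of such a modified forward SDE is a mean-reverting drift that pulls the clean image towards its corrupted counterpart with only mild added noise, consistent with the abstract's observation that clean and corrupted images are close; the drift form and constants below are assumptions, not DriftRec's exact equation.

```python
# Illustrative Euler-Maruyama simulation of a forward SDE whose drift pulls a
# clean image x towards its corrupted version y, with only modest noise added.
# The mean-reverting drift form and constants are assumptions for illustration.
import numpy as np


def forward_sde(x0: np.ndarray, y: np.ndarray, steps: int = 100,
                gamma: float = 2.0, sigma: float = 0.1,
                rng=np.random.default_rng(0)) -> np.ndarray:
    """Simulate dx_t = gamma * (y - x_t) dt + sigma dW_t on t in [0, 1]."""
    dt = 1.0 / steps
    x = x0.astype(float).copy()
    for _ in range(steps):
        drift = gamma * (y - x)
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x


if __name__ == "__main__":
    clean = np.random.rand(32, 32)
    corrupted = np.clip(clean + 0.2 * np.random.randn(32, 32), 0, 1)
    x_T = forward_sde(clean, corrupted)
    # By t = 1 the state is close to the corrupted image plus mild noise.
    print(float(np.mean((x_T - corrupted) ** 2)))
```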