Recent advances in deep convolutional neural networks have shown improved performance in face super-resolution through joint training with other tasks such as face analysis and landmark prediction. However, these methods have certain limitations. One major limitation is the requirement for manually annotated prior information on the dataset for multi-task joint learning; this additional annotation process increases the computational cost of the network model. Additionally, since prior information is often estimated from low-quality faces, the obtained guidance information tends to be inaccurate. To address these challenges, a novel Decoder Structure Guided CNN-Transformer Network (DCTNet) is introduced, which utilises the newly proposed Global-Local Feature Extraction Unit (GLFEU) for effective embedding. Specifically, the proposed GLFEU combines an attention branch and a Transformer branch to simultaneously restore global facial structure and local texture details. Additionally, a Multi-Stage Feature Fusion Module is incorporated to fuse features from different network stages, further improving the quality of the restored face images. Compared with previous methods, DCTNet improves Peak Signal-to-Noise Ratio (PSNR) by 0.23 dB and 0.19 dB on the CelebA and Helen datasets, respectively. Experimental results demonstrate that the designed DCTNet offers a simple yet powerful solution for recovering detailed facial structures from low-quality images. In summary, DCTNet uses a decoder structure as its backbone and relies primarily on the Global-Local Feature Extraction Units (GLFEU) for face image super-resolution.
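For illustration, the sketch below shows how a dual-branch global-local unit of this kind can be organised in PyTorch, with a convolutional attention branch for local texture and a Transformer branch for global structure. The channel sizes, attention design, and fusion rule are assumptions made for the example; the abstract does not specify the authors' exact configuration.

```python
# Hypothetical sketch of a dual-branch "global-local" feature extraction unit,
# loosely following the abstract's description of GLFEU (attention branch +
# Transformer branch). Channel sizes, attention design, and the fusion rule
# are assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn


class ChannelAttentionBranch(nn.Module):
    """CNN branch with channel attention, aimed at local texture details."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        feat = self.conv(x)
        return feat * self.attn(feat)


class TransformerBranch(nn.Module):
    """Transformer branch modelling global structure over flattened positions."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads,
            dim_feedforward=2 * channels, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        tokens = self.block(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class GLFEU(nn.Module):
    """Fuses the local (attention) and global (Transformer) branches."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.local_branch = ChannelAttentionBranch(channels)
        self.global_branch = TransformerBranch(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        fused = torch.cat([self.local_branch(x), self.global_branch(x)], dim=1)
        return x + self.fuse(fused)                # residual connection


if __name__ == "__main__":
    unit = GLFEU(channels=64)
    out = unit(torch.randn(1, 64, 32, 32))
    print(out.shape)                               # torch.Size([1, 64, 32, 32])
```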
Lung cancer has become one of the leading causes of death, and its early detection remains immensely difficult. In this research article, a five-step framework with three different methods is developed for the automatic detection and classification of lung tumors in CT (Computed Tomography) images. The first step is image acquisition; here, the input images are collected from a public lung cancer database and an in-house clinical dataset. In the next step, image enhancement is performed using the WFUM (Wiener Filter with Unsharp Masking) technique, which removes the noise present in the input images. In the subsequent step, the HRWBM (Hierarchical Random Walker with Bayes Model) segmentation algorithm is applied to the enhanced image sequence to predict the lung tumor region, and features are then extracted using the GLCM (Gray Level Co-occurrence Matrix). Finally, the lung cancer images from the public LIDC database are classified: HRWBM with an SVM (Support Vector Machine) classifier achieves 77.8% accuracy, HRWBM with an FFNN (Feed-Forward Neural Network) achieves 93.3%, and HRWBM with a DRNN (Deep Recurrent Neural Network) achieves 97.3%. For the in-house clinical dataset, HRWBM with SVM achieves 84% accuracy, HRWBM with FFNN 90%, and HRWBM with DRNN 94.7%. These results show that, among the three methods, HRWBM with DRNN provides the most accurate identification of lung cancer.
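As a concrete illustration of two of the pipeline stages, the hedged sketch below applies Wiener filtering with unsharp masking and extracts GLCM texture descriptors using SciPy and scikit-image; the filter size, unsharp parameters, and GLCM settings are assumptions, and the HRWBM segmentation step and the SVM/FFNN/DRNN classifiers are omitted.

```python
# Illustrative sketch of two pipeline stages described in the abstract:
# Wiener-filter-plus-unsharp-masking enhancement and GLCM texture features.
# Filter sizes, unsharp parameters, and GLCM settings are assumptions; the
# HRWBM segmentation and the SVM/FFNN/DRNN classifiers are not shown.
import numpy as np
from scipy.signal import wiener
from skimage.filters import unsharp_mask
from skimage.feature import graycomatrix, graycoprops


def enhance_ct_slice(img: np.ndarray) -> np.ndarray:
    """Denoise a CT slice with a Wiener filter, then sharpen edges."""
    denoised = wiener(img.astype(float), mysize=5)
    return unsharp_mask(denoised, radius=2.0, amount=1.0)


def glcm_features(region: np.ndarray, levels: int = 32) -> dict:
    """Extract a few GLCM texture descriptors from a (segmented) region."""
    quantised = np.digitize(region, np.linspace(region.min(), region.max(), levels)) - 1
    glcm = graycomatrix(quantised.astype(np.uint8), distances=[1],
                        angles=[0, np.pi / 2], levels=levels,
                        symmetric=True, normed=True)
    return {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}


if __name__ == "__main__":
    slice_ = np.random.rand(128, 128)          # stand-in for a CT slice
    feats = glcm_features(enhance_ct_slice(slice_))
    print(feats)
```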
ISBN (Print): 9798350391558; 9798350379990
This research study analyzes the multidimensional landscape of steganography, examining its historical roots, theoretical background, contemporary approaches, and various applications. Beginning with a historical overview, the study traces the evolution of steganography from its ancient roots to its present iterations in the digital world. Next, the study analyzes the fundamental principles and theoretical frameworks that underpin steganographic systems, such as cryptography and digital signal processing. Finally, it presents a thorough evaluation of contemporary steganographic technologies, ranging from simple LSB (Least Significant Bit) substitution techniques to advanced adaptive algorithms and machine learning methods, including deep-learning-based steganography and coverless steganography. Notably, the study identifies key challenges, including detection resistance, payload capacity, and robustness against attacks. Overall, the study presents a thorough understanding of steganography, emphasizing its significance as a versatile tool for communication in the digital era, while also highlighting the challenges that pave the way for future innovations.
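As a minimal example of the LSB substitution technique surveyed above, the sketch below embeds and extracts a short message in the least significant bits of a grayscale image; the 16-bit length header is an assumption made for the example, and practical schemes add encryption, permutation, and integrity checks.

```python
# Minimal illustration of classical LSB substitution: message bits replace the
# least significant bit of each pixel. The grayscale cover and the 16-bit
# length header are assumptions made for this example.
import numpy as np


def lsb_embed(cover: np.ndarray, message: bytes) -> np.ndarray:
    bits = [int(b) for byte in len(message).to_bytes(2, "big") + message
            for b in f"{byte:08b}"]
    if len(bits) > cover.size:
        raise ValueError("message too long for this cover image")
    stego = cover.flatten().copy()
    stego[:len(bits)] = (stego[:len(bits)] & 0xFE) | bits
    return stego.reshape(cover.shape)


def lsb_extract(stego: np.ndarray) -> bytes:
    flat = stego.flatten() & 1
    length = int("".join(map(str, flat[:16])), 2)
    bits = flat[16:16 + 8 * length]
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))


if __name__ == "__main__":
    cover = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    stego = lsb_embed(cover, b"hidden message")
    assert lsb_extract(stego) == b"hidden message"
```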
ISBN (Print): 9781728198354
The rate-distortion performance of neural image compression models has exceeded the state-of-the-art for non-learned codecs, but neural codecs are still far from widespread deployment and adoption. The largest obstacle is having efficient models that are feasible on a wide variety of consumer hardware. Comparative research and evaluation are difficult due to the lack of standard benchmarking platforms and due to variations in hardware architectures and test environments. Through our rate-distortion-computation (RDC) study we demonstrate that neither floating-point operations (FLOPs) nor runtime is sufficient on its own to accurately rank neural compression methods. We also explore the RDC frontier, which leads to a family of model architectures with the best empirical trade-off between computational requirements and RD performance. Finally, we identify a novel neural compression architecture that yields state-of-the-art RD performance with rate savings of 23.1% over BPG (7.0% over VTM and 3.0% over ELIC) without requiring significantly more FLOPs than other learning-based codecs.
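Rate savings of the kind quoted above are commonly reported as Bjøntegaard-delta (BD) rates; whether this paper uses exactly that procedure is not stated, so the sketch below is a generic illustration of the standard cubic-fit computation with made-up rate-distortion points.

```python
# Hedged sketch of a Bjontegaard-delta (BD) rate computation of the kind used
# to report savings such as "23.1% over BPG". The sample RD points are made up
# purely for illustration.
import numpy as np


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test) -> float:
    """Average rate difference (%) of the test codec w.r.t. the reference."""
    lr_ref, lr_test = np.log(rate_ref), np.log(rate_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)      # log-rate as a cubic in PSNR
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100          # negative => bitrate savings


if __name__ == "__main__":
    # Illustrative RD points (bits-per-pixel, PSNR in dB), not measured data.
    print(bd_rate(rate_ref=[0.25, 0.50, 1.00, 2.00],
                  psnr_ref=[30.0, 33.0, 36.0, 39.0],
                  rate_test=[0.22, 0.44, 0.90, 1.80],
                  psnr_test=[30.2, 33.3, 36.4, 39.5]))
```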
ISBN (Print): 9798350344868; 9798350344851
Current privacy-aware joint source-channel coding (JSCC) works aim to avoid transmitting private information by adversarially training the JSCC encoder and decoder under specific signal-to-noise ratios (SNRs) of eavesdroppers. However, these approaches incur additional computational and storage costs, as multiple neural networks must be trained for various eavesdropper SNRs to determine the transmitted information. To overcome this challenge, we propose a novel privacy-aware JSCC for image transmission based on the disentangled information bottleneck (DIB-PAJSCC). In particular, we derive a novel disentangled information bottleneck objective to disentangle private and public information. Given the separated information, the transmitter can transmit only the public information to the receiver while minimizing reconstruction distortion. Since DIB-PAJSCC transmits only public information regardless of the eavesdroppers' SNRs, it eliminates the additional training otherwise needed to adapt to each eavesdropper's SNR. Experimental results show that DIB-PAJSCC can reduce the eavesdropping accuracy on private information by up to 20% compared with existing methods.
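The sketch below is a generic adversarial surrogate for such a privacy-aware objective: minimise reconstruction distortion while penalising an auxiliary classifier's ability to recover the private attribute from the public representation. It is not the paper's disentangled information bottleneck objective, and the weighting term is an assumed hyper-parameter.

```python
# Conceptual sketch of a privacy-aware training objective in the spirit of the
# abstract: reconstruct the image from a "public" representation while an
# auxiliary classifier (standing in for an eavesdropper) is prevented from
# recovering the private attribute. This is a generic adversarial surrogate,
# not the paper's exact disentangled information bottleneck objective.
import torch
import torch.nn.functional as F


def pajscc_loss(x, x_hat, private_logits, private_labels, beta: float = 0.1):
    """Reconstruction distortion minus a privacy-leakage penalty.

    x, x_hat:        original and reconstructed images
    private_logits:  eavesdropper-classifier outputs on the public features
    private_labels:  ground-truth private attribute
    beta:            weight of the leakage term (assumed hyper-parameter)
    """
    distortion = F.mse_loss(x_hat, x)
    leakage = F.cross_entropy(private_logits, private_labels)
    # The encoder/decoder minimise distortion while *maximising* the
    # eavesdropper's loss, i.e. minimising leakage of private information.
    return distortion - beta * leakage


if __name__ == "__main__":
    x = torch.rand(4, 3, 32, 32)
    x_hat = x + 0.05 * torch.randn_like(x)
    logits = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    print(pajscc_loss(x, x_hat, logits, labels).item())
```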
This letter proposes GDNet, a steganalysis network equipped with the generation of discriminative mixing regions (GDMR) and discriminability-aware local image mixing (DLIM), which aims to alleviate the significant accuracy degradation caused by cover-source mismatch (CSM), i.e., the situation where the source and target domains come from different distributions. GDNet guides a steganalyzer trained on the source domain towards the target domain by mixing source and target images at the region level and pixel level to construct a discriminative intermediate domain. On the one hand, GDMR uses an epoch-related region-level mixing ratio to control the size of the mixed region and, based on this ratio, selects the regions within the target image that are strongly related to the stego signal to participate in the generation of the intermediate domain, while suppressing regions weakly related to the stego signal. On the other hand, DLIM uses a pixel-level mixing ratio to reduce the impact of the weakly related regions on the discriminability of the intermediate domain as the region-level mixing ratio increases, thereby increasing the diversity of the intermediate domain. Experimental results demonstrate that GDNet significantly outperforms existing methods across various CSM scenarios.
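A toy version of the region-level and pixel-level mixing idea is sketched below; unlike GDMR/DLIM, it selects the mixed region at random rather than by stego-signal relevance, and the ratio schedule is an assumption made for illustration.

```python
# Toy sketch of region-level / pixel-level mixing between a source and a
# target image to build an intermediate domain, in the spirit of the abstract.
# The region is chosen at random here; the paper instead selects regions that
# are strongly related to the stego signal and schedules both mixing ratios.
import numpy as np


def mix_images(src: np.ndarray, tgt: np.ndarray, epoch: int, total_epochs: int,
               pixel_lambda: float = 0.7, rng=np.random) -> np.ndarray:
    """Blend a target patch into the source image at region and pixel level."""
    h, w = src.shape[:2]
    region_ratio = (epoch + 1) / total_epochs          # grows over training
    rh, rw = int(h * region_ratio), int(w * region_ratio)
    top = rng.randint(0, h - rh + 1)
    left = rng.randint(0, w - rw + 1)
    mixed = src.astype(float).copy()
    patch_src = mixed[top:top + rh, left:left + rw]
    patch_tgt = tgt.astype(float)[top:top + rh, left:left + rw]
    # Pixel-level mixing inside the selected region.
    mixed[top:top + rh, left:left + rw] = (
        pixel_lambda * patch_tgt + (1.0 - pixel_lambda) * patch_src)
    return mixed


if __name__ == "__main__":
    src = np.random.randint(0, 256, (256, 256)).astype(np.uint8)
    tgt = np.random.randint(0, 256, (256, 256)).astype(np.uint8)
    out = mix_images(src, tgt, epoch=5, total_epochs=20)
    print(out.shape, out.dtype)
```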
This paper presents a novel neural radiance field rendering method named 3D-IBLGS, which integrates prefiltered radiance fields to address global illumination in large-scale scenes. By extending the 3DGS formulation w...
Context. The classification of galaxy morphology is among the most active fields in astronomical research today. With the development of artificial intelligence technology, deep learning has become a useful tool for classifying the morphology of galaxies, and significant progress has been made in this domain. However, there is still room for improvement in terms of classification accuracy, automation, and related issues. Aims. The Convolutional vision Transformer (CvT) is an improved version of the Vision Transformer (ViT) model; it improves on ViT by introducing a convolutional neural network (CNN). This study explores the performance of the CvT model in galaxy morphology classification. Methods. In this work, the CvT model was applied, for the first time, to a five-class classification task of galaxy morphology. We added different types and degrees of noise to the original galaxy images to verify that the CvT model achieves good classification performance even on galaxy images with low signal-to-noise ratios (S/Ns). We also validated the classification performance of the CvT model for galaxy images at different redshifts, based on the low-redshift dataset GZ2 and the high-redshift dataset Galaxy Zoo CANDELS. In addition, we visualized and analyzed the classification results of the CvT model using the t-distributed stochastic neighbor embedding (t-SNE) algorithm. Results. We find that (1) compared with other five-class galaxy morphology classification models based on CNNs, the average accuracy, precision, recall, and F1-score of the CvT classification model are all higher than 98%, an improvement of at least 1% over the CNN-based models; (2) the classification visualization results show that different categories of galaxies are separated from each other in multi-dimensional space. Conclusions. The application of the CvT model to the classification study of galaxy morphology...
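The t-SNE visualisation step mentioned in the Methods can be reproduced in outline as below; the random features and five placeholder classes stand in for the real CvT embeddings and morphology labels.

```python
# Small sketch of a t-SNE visualisation of classifier features: project
# high-dimensional embeddings to 2-D so class separation can be inspected.
# The random features and five label classes are placeholders.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 128))            # stand-in for CvT features
labels = rng.integers(0, 5, size=500)             # five morphology classes

embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=5, cmap="tab10")
plt.title("t-SNE projection of classifier features (illustrative)")
plt.savefig("tsne_galaxies.png", dpi=150)
```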
In response to the problem that traditional methods ignore audio-modality tampering, this study explores an effective deep forgery (deepfake) video detection technique that improves detection precision and reliability by fusing lip images and audio signals. The main method is lip-audio matching detection based on a Siamese neural network, combined with MFCC (Mel-Frequency Cepstral Coefficient) feature extraction using band-pass filters, an improved dual-branch Siamese network structure, and a two-stream network design. First, the video stream is preprocessed to extract lip images, and the audio stream is preprocessed to extract MFCC features. These features are then processed separately by the two branches of the Siamese network. Finally, the model is trained and optimized through fully connected layers and loss functions. The experimental results show that the testing accuracy of the model on the LRW (Lip Reading in the Wild) dataset reaches 92.3%, the recall rate is 94.3%, and the F1 score is 93.3%, significantly better than the results of CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) models. In the validation of multi-resolution image streams, the highest accuracy of dual-resolution image streams reaches 94%. Band-pass filters can effectively improve the signal-to-noise ratio in deep forgery video detection when processing different types of audio signals. The real-time processing performance of the model is also excellent, and it achieves an average score of up to 5 in user studies. These results demonstrate that the proposed method can effectively fuse visual and audio information in deep forgery video detection and accurately identify inconsistencies between video and audio, thus verifying the effectiveness of lip-audio modality fusion in improving detection performance.
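A hedged sketch of the two input branches is given below: MFCC extraction from the audio stream and a small dual-branch network that scores lip-audio agreement via cosine similarity. The network sizes, MFCC settings, and the similarity-based matching score are assumptions, not the paper's exact design.

```python
# Hedged sketch of a lip/audio two-branch matcher: MFCC features from the
# audio stream and an embedding of the lip-image stream are projected into a
# shared space and compared. Sizes and settings are illustrative assumptions.
import torch
import torch.nn as nn
import torchaudio


def audio_mfcc(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    """Extract MFCCs from mono waveforms of shape (batch, num_samples)."""
    mfcc = torchaudio.transforms.MFCC(sample_rate=sample_rate, n_mfcc=13)
    return mfcc(waveform)                               # (batch, 13, num_frames)


class LipAudioSiamese(nn.Module):
    """Two branches project lips and audio into a shared embedding space."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.lip_branch = nn.Sequential(                # grayscale lip crops
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, embed_dim))
        self.audio_branch = nn.Sequential(              # MFCC "images"
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, embed_dim))

    def forward(self, lips, mfcc):
        a = nn.functional.normalize(self.lip_branch(lips), dim=-1)
        b = nn.functional.normalize(self.audio_branch(mfcc), dim=-1)
        return (a * b).sum(dim=-1)                      # cosine-similarity match score


if __name__ == "__main__":
    model = LipAudioSiamese()
    lips = torch.randn(2, 1, 64, 64)                       # grayscale lip crops
    mfcc = audio_mfcc(torch.randn(2, 16000)).unsqueeze(1)  # (2, 1, 13, frames)
    print(model(lips, mfcc))                               # per-pair match scores
```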
In this work, we utilize the high-fidelity generation abilities of diffusion models to solve blind JPEG restoration at high compression levels. We propose an elegant modification of the forward stochastic differential equation of diffusion models to adapt them to this restoration task and name our method DriftRec. Comparing DriftRec against an L2 regression baseline with the same network architecture and against state-of-the-art techniques for JPEG restoration, we show that our approach can escape the tendency of other methods to generate blurry images, and recovers the distribution of clean images significantly more faithfully. For this, only a dataset of clean/corrupted image pairs is required, with no knowledge about the corruption operation, enabling wider applicability to other restoration tasks. In contrast to other conditional and unconditional diffusion models, we exploit the idea that the distributions of clean and corrupted images are much closer to each other than each is to the usual Gaussian prior of the reverse process in diffusion models. Our approach therefore requires only low levels of added noise and comparatively few sampling steps, even without further optimizations. We show that DriftRec naturally generalizes to realistic and difficult scenarios such as unaligned double JPEG compression and blind restoration of JPEGs found online, without having encountered such examples during training.
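One plausible toy version of such a modified forward SDE is a mean-reverting drift that pulls the clean image towards its corrupted counterpart with only mild added noise, consistent with the abstract's observation that clean and corrupted images are close; the drift form and constants below are assumptions, not DriftRec's exact equation.

```python
# Illustrative Euler-Maruyama simulation of a forward SDE whose drift pulls a
# clean image x towards its corrupted version y, with only modest noise added.
# The mean-reverting drift form and constants are assumptions for illustration.
import numpy as np


def forward_sde(x0: np.ndarray, y: np.ndarray, steps: int = 100,
                gamma: float = 2.0, sigma: float = 0.1,
                rng=np.random.default_rng(0)) -> np.ndarray:
    """Simulate dx_t = gamma * (y - x_t) dt + sigma dW_t on t in [0, 1]."""
    dt = 1.0 / steps
    x = x0.astype(float).copy()
    for _ in range(steps):
        drift = gamma * (y - x)
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x


if __name__ == "__main__":
    clean = np.random.rand(32, 32)
    corrupted = np.clip(clean + 0.2 * np.random.randn(32, 32), 0, 1)
    x_T = forward_sde(clean, corrupted)
    # By t = 1 the state is close to the corrupted image plus mild noise.
    print(float(np.mean((x_T - corrupted) ** 2)))
```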