ISBN:
(Print) 9798350344868; 9798350344851
Domain generalization aims to train models on multiple source domains so that they generalize well to unseen target domains. Among the many domain generalization methods, Fourier-transform-based approaches have gained popularity primarily because they exploit the Fourier transform to capture essential patterns and regularities in the data, making models more robust to domain shifts. The mainstream Fourier-transform-based method swaps the Fourier amplitude spectrum between source and target images while preserving the phase spectrum. However, it neglects background interference in the amplitude spectrum. To overcome this limitation, we introduce a soft-thresholding function in the Fourier domain. We apply this newly designed algorithm to retinal fundus image segmentation, which is important for diagnosing ocular diseases but where neural network performance can degrade across data sources due to domain shifts. The proposed technique enhances fundus image augmentation by eliminating small values in the Fourier domain, providing better generalization. Fusing soft thresholding with Fourier-transform-based domain generalization significantly improves neural network performance by reducing background interference from the target images. Experiments on public data validate our approach's effectiveness over conventional and state-of-the-art methods, with superior segmentation metrics.
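The amplitude-swap augmentation with soft thresholding described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the relative threshold `tau` is a hypothetical parameter choice:

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft-thresholding: shrink values toward zero and zero out those below tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def fourier_amplitude_swap(src, tgt, tau=0.1):
    """Replace the source amplitude spectrum with the (soft-thresholded) target
    amplitude while keeping the source phase.  Small amplitude values, which
    often carry background interference, are suppressed before the swap."""
    F_src, F_tgt = np.fft.fft2(src), np.fft.fft2(tgt)
    amp = soft_threshold(np.abs(F_tgt), tau * np.abs(F_tgt).max())
    return np.real(np.fft.ifft2(amp * np.exp(1j * np.angle(F_src))))
```

Keeping the source phase preserves the anatomical structure of the fundus image, while the thresholded target amplitude injects a cleaned-up "style" from another domain.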
ISBN:
(Print) 9798350374520; 9798350374513
Deep steganalyzers combined with neural networks have achieved great success in image classification in recent years. However, they suffer from the following persistent challenges: i) deep steganalyzers are extremely vulnerable and risk being attacked via adversarial steganography when performing image classification tasks; ii) pre-processing-based methods that aim to remove adversarial perturbations from cover images jeopardize accuracy, as the embedded steganographic signal is wiped out as well. In this context, to defend against adversarial attacks, we propose an adversarial steganography detection scheme based on pre-processing and feature migration. In brief, sub-images are sampled to expand the dimensionality of the extracted features while reducing the effect of adversarial perturbations. In particular, by computing statistical features and normalizing them, our approach improves classification accuracy on these samples. Our experimental results show that the proposed approach detects adversarial steganographic images with an accuracy gain of up to 35.9% over state-of-the-art methods.
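A rough sketch of the sub-image sampling with statistical features and normalization mentioned above. The patch size and the mean/variance features are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def subimage_features(img, patch=8):
    """Sample non-overlapping sub-images, compute simple statistical features
    (mean, variance) per patch, then z-score normalize across patches so the
    feature dimensionality grows with the number of sub-images."""
    h, w = img.shape
    feats = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            p = img[i:i + patch, j:j + patch]
            feats.append([p.mean(), p.var()])
    feats = np.array(feats)
    # normalize each statistical feature across patches
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
```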
When an abnormal event occurs, the features of the video gradually change along the temporal sequence. Since an abnormal event is a continuous process, abnormal features should exhibit a sustained prominence compared to normal features. Incorporating this differential information is critical for effective anomaly detection; nevertheless, prior multi-instance learning (MIL) methods have overlooked it. Therefore, this study proposes the spatial-temporal feature fusion enhancement (STFFE) learning approach to address this issue. STFFE improves the discriminative power of normal and abnormal features by enhancing temporal information through the fusion of top-k video segment features with their corresponding temporal-information features. Furthermore, a temporal-information constraint is proposed to increase the concentration of abnormal information and accentuate the saliency of abnormal features. Extensive experiments demonstrate that our approach surpasses state-of-the-art (SOTA) methods on the Shanghai-Tech and XD-Violence datasets. Moreover, our method achieves competitive results on the UCF-Crime dataset, with an AUC of 84.52%. These results demonstrate the effectiveness of the proposed method.
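The top-k MIL scoring and the fusion of segment features with temporal-difference information can be illustrated roughly as follows. The norm-based selection rule and the simple difference feature are simplified assumptions, not the exact STFFE design:

```python
import numpy as np

def topk_mil_score(segment_scores, k=3):
    """Video-level anomaly score: mean of the top-k segment scores (MIL style)."""
    return float(np.sort(segment_scores)[-k:].mean())

def fuse_temporal(features, k=3):
    """Pick the k segments with the largest feature norms and concatenate each
    with its temporal-difference feature (change relative to the previous
    segment), so the fused feature carries the 'continuous prominence' cue."""
    diffs = np.diff(features, axis=0, prepend=features[:1])  # temporal change
    idx = np.argsort(np.linalg.norm(features, axis=1))[-k:]  # top-k segments
    return np.concatenate([features[idx], diffs[idx]], axis=1)
```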
Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, which form the bedrock of modern machine learning. In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster, up to a limiting error. To do so, rather than fixing the mini-batch size and the step-size at the outset, we propose a strategy that allows these parameters to evolve adaptively. Specifically, the batch size is set to be a piecewise-constant increasing sequence, where an increase occurs whenever a suitable error criterion is satisfied, and the step-size is selected as the one yielding the fastest convergence. The overall algorithm, the two-scale adaptive (TSA) scheme, is developed for both convex and non-convex problems. It inherits exact convergence and, more importantly, achieves the optimal error-decreasing rate with an overall reduction in computation. Furthermore, we extend the TSA method to the generalized adaptive batching framework, a generic methodology modular to any stochastic algorithm pursuing a trade-off between convergence rate and stochastic variance. We evaluate the TSA method on image classification with the MNIST and CIFAR-10 datasets, comparing against standard SGD and existing adaptive batch-size methods, to corroborate the theoretical findings.
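The piecewise-constant increasing batch-size rule can be sketched as below. The gradient-norm criterion and the decay factor are illustrative stand-ins for the paper's error criterion, not the exact TSA rule:

```python
def tsa_schedule(grad_norms, init_batch=8, growth=2, tol_decay=0.5):
    """Two-scale-adaptive-style batch schedule sketch: the batch size stays
    piecewise constant and is multiplied by `growth` whenever the observed
    gradient norm drops below the current tolerance (a proxy for the point
    where stochastic noise dominates optimization progress)."""
    batch, tol = init_batch, grad_norms[0]
    sizes = []
    for g in grad_norms:
        if g < tol:           # error criterion met: variance now dominates
            batch *= growth   # larger batch cuts stochastic variance
            tol *= tol_decay  # tighten the criterion for the next stage
        sizes.append(batch)
    return sizes
```

With the sequence of gradient norms `[1.0, 0.9, 0.4, 0.3, 0.1]` this produces the increasing batch sizes `[8, 16, 32, 32, 64]`: the batch grows only when progress stalls, which is the computation-saving idea behind the scheme.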
COVID-19 is a modern virus that has spread worldwide and is affecting billions of people. Timely and accurate detection is of great significance for slowing the spread of the virus and treating infected people effectively. Deep learning methods have shown promise in COVID-19 detection due to their accurate feature extraction, which can enhance the classification ability of a detection model. Existing studies mostly focus on information compression and feature extraction, which inevitably leads to loss of original information. This paper proposes a novel end-to-end COVID-19 classification model called MSCGRU, based on fusing multi-faceted global features with local information from the original images. First, a multi-scale CNN extracts multi-scale global features from multiple channels, and these features are fused with local information to obtain comprehensive features. Second, a GRU extracts deep abstract features from the comprehensive features. The experimental results demonstrate that the proposed MSCGRU greatly strengthens learning ability and achieves better prediction performance than other methods: the multi-class accuracy is 98.2%, and the binary classification accuracy is 100%.
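The fusion of multi-scale global features with local information from the original image can be illustrated with a toy pooling sketch; the scales and the average-pooling choice are assumptions for illustration, not the MSCGRU architecture:

```python
import numpy as np

def multi_scale_fuse(img, scales=(2, 4, 8)):
    """Toy multi-scale global feature extraction: average-pool the image at
    several scales, concatenate the pooled features, and append the flattened
    original image as the 'local information' so nothing is compressed away."""
    feats = []
    for s in scales:
        h, w = img.shape[0] // s, img.shape[1] // s
        pooled = img[:h * s, :w * s].reshape(h, s, w, s).mean(axis=(1, 3))
        feats.append(pooled.ravel())       # global feature at this scale
    feats.append(img.ravel())              # local information, uncompressed
    return np.concatenate(feats)
```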
Frequent occurrences of forest fires globally cause significant harm to both human populations and the natural environment. Smoke is the earliest visual signal of wildfire, and its efficient detection is crucial. However, due to the lack of real-world smoke video data, existing deep learning methods often face difficulties in effective training, and their generalization ability and detection stability across different scenarios are neither validated nor guaranteed. Since smoke characteristics are strongly correlated with environmental conditions, this study additionally exploits the weather-state information in the smoke data to benefit model training and optimization. This is a new approach to extracting more useful information from limited smoke data, inspired by human perceptual tendencies. For efficient modeling of smoke features and to achieve a trade-off between computational complexity and recognition accuracy, we further propose BFBlock for enhanced smoke feature extraction and build WISNet, a two-stage lightweight video detection framework. Compared to regular 3D CNN models, the proposed framework reduces the model's parameters by 85% to only 5.6M while achieving 96.69% accuracy and a 0.58% false-alarm rate. Extensive experimental results on real-world smoke samples prove the effectiveness of the proposed method.
In a neutral hydrogen (H I) galaxy survey, a significant challenge is to identify and extract the H I galaxy signal from observational data contaminated by radio frequency interference (RFI). For a drift-scan survey, or more generally a survey of a spatially continuous region, the H I galaxies and RFI both appear in the time-ordered spectral data as regions extending over an area of the time-frequency waterfall plot, so extracting H I galaxies and RFI from such data can be regarded as an image segmentation problem, and machine-learning methods can be applied to solve it. In this study, we develop a method to effectively detect and extract H I galaxy signals based on a Mask R-CNN network combined with the PointRend method. By simulating FAST-observed galaxy signals and potential RFI impact, we created a realistic data set for the training and testing of our neural network. We compared five different architectures and selected the best-performing one. This architecture successfully performs instance segmentation of H I galaxy signals in the RFI-contaminated time-ordered data, achieving a precision of 98.64% and a recall of 93.59%.
Although diabetes is often considered a vascular disease due to its impact on blood vessels, it is a complex condition involving various metabolic and autoimmune factors. One of the long-term comorbidities of diabetes is microvascular complications. These complications can be analyzed using nailfold capillaroscopy, a non-invasive technique that allows the visualization and analysis of capillaries in the proximal nailfold area. Using advanced video capillaroscopy with high magnification, capillary images can be captured and processed to analyze their morphology. Capillary images of a normal group and a diabetic group were acquired from 118 participants using nailfold capillaroscopy, and the obtained images were preprocessed using image-processing filters. Identifying and segmenting the capillaries are the challenges to be addressed in processing the images; hence, segmentation is performed using morphological operations, thresholding, and convolutional neural networks. The performance of the filters and segmentation methods is evaluated using Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), the Jaccard index, and the Sorensen coefficient. By analyzing morphological features in both groups, namely capillary diameter, density, distribution, presence of hemorrhage, and capillary shape, the capillary changes associated with the diabetic condition were studied. The non-diabetic participants in this study were found to have capillary diameters in the range of 8-14 μm and capillary densities of 10-30 capillaries per mm², whereas the diabetic participants had capillary diameters greater than 30 μm and capillary densities below 10 capillaries per mm². In addition to capillary density and diameter, the presence of hemorrhage and the orientation and distribution of the capillaries are also considered to differentiate the diabetic group from the normal group.
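The Jaccard index and the Sorensen (Dice) coefficient used above to evaluate segmentation quality have standard definitions over binary masks:

```python
import numpy as np

def jaccard_index(pred, truth):
    """Jaccard index (intersection over union) between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0

def sorensen_dice(pred, truth):
    """Sorensen (Dice) coefficient: 2|A∩B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2 * inter / total if total else 1.0
```

Both range over [0, 1], with 1 meaning the predicted capillary mask matches the ground truth exactly; Dice weights the overlap more heavily than Jaccard.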
It is widely accepted that human expressions, accounting for roughly sixty percent of all daily interactions, are among the most authentic forms of communication. Numerous studies explore the importance of facial expressions and the development of machine-assisted recognition techniques. Significant progress has been made in facial and expression recognition, largely due to the rapid growth of machine learning and computer vision. A variety of algorithmic approaches and methods exist for detecting and recognizing facial expressions and features. This study investigates various optimization algorithms used with convolutional neural networks for facial expression recognition. The primary focus is on the Adam, RMSProp, stochastic gradient descent, and AdaMax optimizers. A comprehensive comparison examines the key aspects of each optimizer, including its advantages and disadvantages. Furthermore, the study incorporates findings from recent work that used these optimizers in various applications, highlighting their performance in terms of training time and precision. The aim is to illuminate the process of selecting a suitable optimizer for a specific application, analysing the trade-offs between training speed and higher accuracy. Moreover, this study provides a deeper analysis of the role optimizers play in machine-learning-based facial expression recognition models. A discussion of the technical challenges posed by these optimizers and of future improvements toward more optimal results concludes the study.
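For reference, the SGD and Adam update rules compared in such studies look like this in plain NumPy; the learning rates are illustrative defaults:

```python
import numpy as np

def sgd_step(w, g, lr=0.1):
    """Plain stochastic gradient descent update: step against the gradient."""
    return w - lr * g

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and its
    square (v), bias-corrected, then an RMS-scaled step.  `t` starts at 1."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)       # bias correction for the first moment
    v_hat = v / (1 - b2 ** t)       # bias correction for the second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

The per-coordinate scaling by `sqrt(v_hat)` is what gives Adam its fast early progress, while plain SGD's step size must be tuned to the gradient scale by hand, which is exactly the training-speed versus final-accuracy trade-off the study examines.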
Fine-grained image recognition is a challenging task that focuses on identifying images from similar subordinate categories. Recently, methods based on the vision transformer (ViT) have demonstrated remarkable achievements in fine-grained image recognition, as their inherent multi-head self-attention (MHSA) can effectively capture the discriminative regions in images. However, most of these ViT-based methods ignore the channel relationships of image features, and there are also problems with the inconsistent learning performance of different heads in MHSA and different layers in ViT. To address these issues, an innovative feature-enhanced transformer is proposed, named TS-ViT. TS-ViT includes three key modules: soft channel attention (Soft-CA), multi-head token selection (MHTS), and multi-level feature enhancement (MLFE). Soft-CA enables the model to concentrate on the relationships among the various channels of image features. MHTS is proposed to address the inconsistent multi-head learning performance: it selects tokens at discriminative region positions based on attention maps to form the multi-level feature. By employing contrastive learning and enhanced feature extraction, MLFE effectively utilizes multi-level features while mitigating background noise. Extensive experiments have demonstrated that TS-ViT achieves superior performance compared to popular methods, with average accuracies of 91.8%, 91.2%, 99.5%, and 93.9% on the experimental data sets, respectively. Furthermore, TS-ViT demonstrated outstanding performance in terms of computational complexity and efficiency, with an average parameter count of 93.5M, FLOPs of 73.2G, a training time of 7.8 h, and an inference time of 4.1 milliseconds.
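The idea of selecting tokens by their attention weights, as in the MHTS module, can be sketched roughly as follows. Averaging the CLS attention over heads and taking the top-k patch tokens is a simplified assumption, not the exact TS-ViT rule:

```python
import numpy as np

def select_tokens(attn, tokens, k=4):
    """Token selection sketch: average the CLS token's attention over all heads
    and keep the k patch tokens receiving the highest attention, i.e. the
    positions the model treats as most discriminative.
    attn:   (heads, n_tokens, n_tokens) attention maps, index 0 is CLS.
    tokens: (n_tokens, dim) token embeddings."""
    cls_attn = attn[:, 0, 1:].mean(axis=0)   # head-averaged CLS-to-patch attention
    idx = np.argsort(cls_attn)[-k:] + 1      # top-k patch indices (skip CLS at 0)
    return tokens[idx]
```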