Image registration is a crucial task in signal processing, but it often encounters issues with stability and efficiency. Non-learning registration approaches rely on optimizing similarity metrics between fixed and mov...
No-reference super-resolution (SR) image quality assessment (NR-SRIQA) aims to evaluate the quality of SR images without relying on any reference images. Most previous methods utilize certain handcrafted perceptual statistical features to quantify the degradation of SR images and a simple regression model to learn the mapping from the features to the perceptual quality. Although these methods achieved promising performance, they still have some limitations: 1) the handcrafted features cannot accurately quantify the degradation of SR images; 2) the complex mapping relationship between the features and the quality scores cannot be well approximated by a simple regression model. To alleviate these problems, we propose a novel stacking regression framework for NR-SRIQA. In the proposed method, we use a pre-trained VGGNet to extract deep features for measuring the degradation of SR images, and then develop a stacking regression framework to establish the relationship between the learned deep features and the quality scores. The stacking regression integrates two base regressors, namely Support Vector Regression (SVR) and K-Nearest Neighbor (K-NN) regression, and a simple linear regression as the meta-regressor. Thanks to the feature representation capability of deep neural networks (DNNs) and the complementarity of the two base regressors, the experimental results indicate that the proposed stacking regression framework yields higher consistency with human visual judgments on the quality of SR images than other state-of-the-art SRIQA methods. (C) 2020 Elsevier B.V. All rights reserved.
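As a rough illustration of the regression stage described above (not the authors' code), the following sketch builds a comparable stacking regressor in scikit-learn. It assumes the VGG features have already been extracted and pooled into fixed-length vectors `X_train`/`X_test`, with subjective quality scores in `y_train`; all hyperparameters are placeholders.

```python
# Minimal sketch of a stacking regression stage for NR-SRIQA (illustrative only).
# Assumes X_train / X_test hold pre-extracted VGG feature vectors (one per SR
# image) and y_train holds the corresponding subjective quality scores.
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import StackingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two complementary base regressors (SVR and K-NN), as described above.
base_regressors = [
    ("svr", make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
]

# A simple linear regression serves as the meta-regressor on the base outputs.
stacker = StackingRegressor(
    estimators=base_regressors,
    final_estimator=LinearRegression(),
    cv=5,  # out-of-fold predictions feed the meta-regressor
)

stacker.fit(X_train, y_train)            # learn the features -> quality mapping
predicted_scores = stacker.predict(X_test)
```

Using out-of-fold predictions (the `cv` argument) keeps the meta-regressor from fitting the base regressors' training errors, which is the usual motivation for stacking over a single regression model.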
Content-based image retrieval (CBIR) has made notable progress thanks to deep learning methods, particularly convolutional neural networks (CNNs). These methods have demonstrated competitive performance in feature ext...
Multimodal sentiment analysis (MSA) tasks leverage diverse data sources, including text, audio, and visual data, to infer users' sentiment states. Previous research has mainly focused on capturing the differences ...
Biometric identification is the technology that differentiates individuals by body parts or behavioral characteristics. The hand has proven to be a successful biometric for verification and identification because of ...
An insufficient number of training samples is a common problem in neural network applications. While data augmentation methods require at least a minimum number of samples, we propose a novel, rendering-based pipeline...
Many keypoint detection and description methods have been proposed for image matching or registration. While these methods demonstrate promising performance for single-modality image matching, they often struggle with multimodal data because descriptors trained on single-modality data tend to lack robustness against the non-linear variations present in multimodal data. Extending such methods to multimodal image matching often requires well-aligned multimodal data to learn modality-invariant descriptors. However, acquiring such data is often costly and impractical in many real-world scenarios. To address this challenge, we propose a modality-invariant feature learning network (MIFNet) that computes modality-invariant features for keypoint description in multimodal image matching using only single-modality training data. Specifically, we propose a novel latent feature aggregation module and a cumulative hybrid aggregation module that enhance the base keypoint descriptors trained on single-modality data by leveraging pre-trained features from Stable Diffusion models. We validate our method with recent keypoint detection and description methods on three multimodal retinal image datasets (CF-FA, CF-OCT, EMA-OCTA) and two remote sensing datasets (Optical-SAR and Optical-NIR). Extensive experiments demonstrate that the proposed MIFNet is able to learn modality-invariant features for multimodal image matching without accessing the target modality and has good zero-shot generalization ability. The source code will be made publicly available.
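As a hedged sketch of the general idea only (the module names, dimensions, and fusion design below are assumptions for illustration, not MIFNet itself), one way to enhance single-modality descriptors with features from a frozen diffusion model is to sample the diffusion feature map at the detected keypoints and fuse it with the base descriptor through a small residual MLP:

```python
# Illustrative sketch: fusing base keypoint descriptors with pre-trained
# diffusion features. All names, sizes, and the fusion design are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptorFusion(nn.Module):
    def __init__(self, desc_dim=128, diff_dim=1280, hidden=256):
        super().__init__()
        self.proj = nn.Linear(diff_dim, desc_dim)   # project diffusion features
        self.mlp = nn.Sequential(                   # aggregate the two sources
            nn.Linear(2 * desc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, desc_dim),
        )

    def forward(self, base_desc, diff_feat_map, kpts_norm):
        # base_desc:     (N, desc_dim) descriptors from a single-modality method
        # diff_feat_map: (1, diff_dim, H, W) feature map from a frozen diffusion model
        # kpts_norm:     (N, 2) keypoint coordinates normalized to [-1, 1]
        grid = kpts_norm.view(1, 1, -1, 2)
        sampled = F.grid_sample(diff_feat_map, grid, align_corners=True)
        sampled = sampled.squeeze(0).squeeze(1).transpose(0, 1)  # (N, diff_dim)
        fused = self.mlp(torch.cat([base_desc, self.proj(sampled)], dim=-1))
        return F.normalize(fused + base_desc, dim=-1)            # residual + L2 norm
```

In practice the diffusion feature map would come from an intermediate layer of a frozen Stable Diffusion network run on the input image; that extraction step is omitted from the sketch.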
Automatic classification of breast histopathology images plays a key role in computer-aided breast cancer diagnosis. However, feature-based classification methods rely on accurate cell segmentation and feature extraction, which remain challenging due to overlapping cells, dust, impurities, and uneven irradiation. To overcome these difficulties and the limited number of breast histopathology images, this paper proposes a hybrid structure that combines double deep transfer learning (D²TL) and an interactive cross-task extreme learning machine (ICELM), building on the feature extraction and representation ability of CNNs and the classification robustness of ELMs. First, high-level features are extracted using deep transfer learning and double-step deep transfer learning. Then, the high-level feature sets are jointly used as regularization terms to further improve classification performance in the interactive cross-task extreme learning machine. The proposed method was tested on 134 breast cancer histopathology images. Results show that our method achieved remarkable classification accuracy (96.67%, 96.96%, 98.18%). The experimental results suggest that the proposed method is promising as an efficient tool for breast cancer classification in clinical settings. (C) 2019 Elsevier Ltd. All rights reserved.
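For orientation, the sketch below shows a plain single-layer extreme learning machine applied on top of transfer-learned CNN features. It is a simplified stand-in, not the interactive cross-task ELM from the paper; all variable names, label encodings, and sizes are illustrative assumptions.

```python
# Minimal single-layer ELM sketch (not the authors' ICELM): random hidden
# weights, closed-form ridge solution for the output weights. Assumes
# X_train / X_test are transfer-learned CNN feature vectors and Y_train
# holds one-hot class labels.
import numpy as np

def elm_train(X, Y, n_hidden=1000, reg=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                  # random hidden biases
    H = np.tanh(X @ W + b)                             # hidden activations
    # Regularized least squares: beta = (H^T H + reg*I)^-1 H^T Y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

W, b, beta = elm_train(X_train, Y_train)
pred_labels = elm_predict(X_test, W, b, beta)
```

The defining property of an ELM is that the hidden weights stay random and only the output weights are solved in closed form, which keeps training fast even when the CNN features are high-dimensional.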