Search Results - Inner Mongolia University Library

Semi-Supervised Medical Image Segmentation Based on Frequency Domain Aware Stable Consistency Regularization

Journal of Imaging Informatics in Medicine, 2025 (2025 Jan 22)

Authors: Yihao Ouyang; Peipei Li; Haixiang Zhang; Xuegang Hu. Affiliations: Key Laboratory of Knowledge Engineering with Big Data (the Ministry of Education of China), Hefei University of Technology, Hefei 230009, Anhui, China; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, Anhui, China (peipeili@***); Center for Big Data and Population Health of IHM, Anhui Medical University, Hefei, Anhui, China (peipeili@***); Computer Centre, The Second People's Hospital of Hefei, Hefei 230011, Anhui, China; Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei University of Technology, Hefei 230009, Anhui, China.

Deep learning models have been successfully applied to semi-supervised medical image segmentation, where there are few annotated medical images and a large number of unlabeled ones. A representative approach is the semi-supervised method based on consistency regularization, which improves model training by imposing consistency constraints (perturbations) on unlabeled data. However, the perturbations in such methods are often artificially designed, which may introduce biases unfavorable to model learning in medical image segmentation. In addition, the majority of such methods overlook supervision in the Encoder stage of training and focus primarily on the outcomes of the later stages, potentially leading to chaotic learning in the initial phase that subsequently impairs learning in the later stages. They also miss the intrinsic spatial-frequency information of the images. Therefore, in this study, we propose a new semi-supervised medical image segmentation approach based on frequency domain aware stable consistency regularization. Specifically, to avoid the bias introduced by artificially set perturbations, we first utilize the inherent frequency domain information of images, including both high and low frequencies, as the consistency constraint. Secondly, we incorporate supervision in the Encoder stage of model training to ensure that the model does not fail to learn due to the disruption of the original feature space caused by strong augmentation. Finally, extensive experimentation validates the effectiveness of our semi-supervised approach.
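The idea of using an image's own high and low frequencies as two views for a consistency constraint can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the circular mask, the `radius` cutoff, and the mean-squared consistency term are all assumptions made for illustration.

```python
import numpy as np

def split_frequencies(image: np.ndarray, radius: int = 8):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    `radius` is a hypothetical cutoff (in frequency bins) of a centered
    circular low-pass mask; the abstract does not specify one.
    """
    h, w = image.shape
    # Move the zero-frequency component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    # Invert each masked spectrum back to the spatial domain.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

def consistency_loss(pred_a: np.ndarray, pred_b: np.ndarray) -> float:
    """Mean squared difference between two predictions (assumed loss form)."""
    return float(np.mean((pred_a - pred_b) ** 2))

rng = np.random.default_rng(0)
image = rng.random((64, 64))
low, high = split_frequencies(image)
```

Because the two masks partition the spectrum, `low + high` reconstructs the input. In a training loop, a segmentation model's predictions on the two views would be passed to a term such as `consistency_loss`.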

Keywords: Consistency regularization; Frequency domain; Medical image segmentation; Semi-supervision

ConsistencyDet: A Robust Object Detector with a Denoising Paradigm of Consistency Model

arXiv, 2024

Authors: Jiang, Lifan; Wang, Zhihui; Wang, Changmiao; Li, Ming; Leng, Jiaxu; Wu, Xindong. Affiliations: the College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266510, China; Shenzhen Research Institute of Big Data, Shenzhen 518172, China; Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University, Jinhua 321004, China; the Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Jiangsu Key Laboratory of Image and Video Understanding for Social Safety, Nanjing University of Science and Technology, Nanjing 210094, China; the Key Laboratory of Knowledge Engineering with Big Data, the Ministry of Education of China, China; School of Computer Science and Information Technology, Hefei University of Technology, Hefei 230009, China.

Object detection, a quintessential task in the realm of perceptual computing, can be tackled using a generative methodology. In the present study, we introduce a novel framework designed to articulate object detection as a denoising diffusion process, which operates on the perturbed bounding boxes of annotated entities. This framework, termed ConsistencyDet, leverages an innovative denoising concept known as the Consistency Model. The hallmark of this model is its self-consistency feature, which empowers the model to map distorted information from any temporal stage back to its pristine state, thereby realizing a "one-step denoising" mechanism. Such an attribute markedly elevates the operational efficiency of the model, setting it apart from the conventional Diffusion Model. Throughout the training phase, ConsistencyDet initiates the diffusion sequence with noise-infused boxes derived from the ground-truth annotations and conditions the model to perform the denoising task. Subsequently, in the inference stage, the model employs a denoising sampling strategy that commences with bounding boxes randomly sampled from a normal distribution. Through iterative refinement, the model transforms an assortment of arbitrarily generated boxes into definitive detections. Comprehensive evaluations employing standard benchmarks, such as MS-COCO and LVIS, corroborate that ConsistencyDet surpasses other leading-edge detectors in performance metrics. Our code is available at https://***/Tankowa/ConsistencyDet. Copyright © 2024, The Authors. All rights reserved.

Keywords: Object detection

FR2Seg: Continual Segmentation Across Multiple Sites via Fourier Style Replay and Adaptive Consistency Regularization

39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025

Authors: Xu, Cheng; Zhang, Weiwen; Zhang, Hongrui; Xu, Xuemiao; Zhang, Huaidong; Zou, Jing; Qin, Jing. Affiliations: South China University of Technology, China; The Hong Kong Polytechnic University, Hong Kong; Guangdong Engineering Center for Large Model and GenAI Technology, China; State Key Laboratory of Subtropical Building Science, China; Ministry of Education Key Laboratory of Big Data and Intelligent Robot, China; Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information, China.

ISBN: (print) 157735897X

In clinical imaging, medical segmentation networks typically require continually adapting to new data from multiple sites over time, as aggregating all data for learning at once can be impractical due to storage limitations and privacy concerns. However, existing methods basically overlook domain-specific characteristics and fall short of adequately capturing domain-invariant knowledge during continual learning, leading to undesired catastrophic forgetting of previous sites and inferior generalization to new sites. To tackle this issue, this paper introduces FR2Seg, to sufficiently exploit both domain-specific and domain-invariant knowledge for efficient continual learning with the aid of low-frequency cues. For the former aspect, we propose a Fourier style replay module to synthesize pseudo images with old-site styles for data augmentation during new-site training, effectively preventing catastrophic forgetting without sacrificing data privacy. For the latter, we present a Fourier adaptive consistency regularization to identify and constrain the optimization of domain-invariant parameters with explicit awareness of knowledge transferability across sites, ensuring excellent generalizability to new sites. Experimental results on two public datasets confirm our method's superiority over existing state-of-the-art continual learning methods. Copyright © 2025, Association for the Advancement of Artificial Intelligence (***). All rights reserved.
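A common way to transfer "style" through low-frequency cues is to swap the low-frequency amplitude of one image's Fourier spectrum into another while keeping the original phase. The sketch below shows that generic amplitude-swap technique only; it is not claimed to be the FR2Seg replay module, and the function name and the `beta` window size are hypothetical.

```python
import numpy as np

def swap_low_freq_amplitude(content: np.ndarray, style: np.ndarray,
                            beta: float = 0.1) -> np.ndarray:
    """Return `content` with its low-frequency amplitude taken from `style`.

    `beta` (assumed parameter) sets the half-size of the centered square
    window, as a fraction of the image size, whose amplitude is replaced.
    """
    h, w = content.shape
    f_content = np.fft.fftshift(np.fft.fft2(content))
    f_style = np.fft.fftshift(np.fft.fft2(style))
    amp_content, phase_content = np.abs(f_content), np.angle(f_content)
    amp_style = np.abs(f_style)
    bh, bw = int(h * beta), int(w * beta)
    ch, cw = h // 2, w // 2
    # Replace only the central (low-frequency) amplitudes; phase is kept,
    # so the spatial structure of `content` is preserved.
    amp_content[ch - bh:ch + bh + 1, cw - bw:cw + bw + 1] = \
        amp_style[ch - bh:ch + bh + 1, cw - bw:cw + bw + 1]
    mixed = amp_content * np.exp(1j * phase_content)
    return np.fft.ifft2(np.fft.ifftshift(mixed)).real

rng = np.random.default_rng(0)
new_site_image = rng.random((64, 64))
old_site_style = 0.5 * rng.random((64, 64)) + 0.3
pseudo_image = swap_low_freq_amplitude(new_site_image, old_site_style)
```

In this sketch only a stored style image (or just its low-frequency amplitude) from an old site is needed to restyle new-site images, which is in the spirit of the replay-without-raw-data idea described in the abstract.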

Keywords: Image segmentation

Text-guided Reconstruction Network for Sentiment Analysis with Uncertain Missing Modalities

IEEE Transactions on Affective Computing, 2025

Authors: Shi, Piao; Hu, Min; Nakagawa, Satoshi; Zheng, Xiangming; Shi, Xuefeng; Ren, Fuji. Affiliations: Hefei University of Technology, Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, National Smart Eldercare International Science and Technology Cooperation Base, School of Computer Science and Information Engineering, Anhui, Hefei 230601, China; Bozhou University, School of Electronic and Information Engineering, Bozhou 236800, China; University of Tokyo, Graduate School of Information Science and Technology, Tokyo 113-8656, Japan; University of Electronic Science and Technology of China, College of Computer Science and Engineering, Chengdu 611731, China; University of Electronic Science and Technology of China, Shenzhen Institute for Advanced Study, Shenzhen 518110, China.

Multimodal Sentiment Analysis (MSA) is an attractive research area that aims to integrate sentiment expressed in textual, visual, and acoustic signals. There are two main problems in the existing methods: 1) the dominant role of the text is underutilized in unaligned multimodal data, and 2) modalities with uncertain missing features are not sufficiently explored. This paper proposes a Text-guided Reconstruction Network (TgRN) for MSA with uncertain missing modalities in non-aligned sequences. The TgRN network includes three primary modules: a Text-guided Extraction Module (TEM), a Reconstruction Module (RM), and a Text-guided Fusion Module (TFM). First, the TEM consists of text-guided cross-attention units and self-attention units to capture inter-modal features and intra-modal features, respectively. Second, leveraging enhanced attention units and a three-way squeeze-and-excitation block, the RM is designed to learn semantic information from incomplete data and reconstruct missing modality features. Third, the TFM utilizes a progressive modality-mixing adaptation gate to explore the dynamic correlations between nonverbal and verbal modalities, effectively addressing the modality gap issue. Finally, under the supervision of sentiment prediction loss and reconstruction loss, the TgRN effectively processes both uncertain missing-modality conditions and ideal complete-modality conditions. Extensive experiments on CMU-MOSI and CH-SIMS demonstrate that our proposed method outperforms state-of-the-art approaches. © 2010-2012 IEEE.

Keywords: Semantics

Proceedings of the 38th International Conference on Neural Information Processing Systems

Authors: Zhicheng Chen; Shibo Feng; Zhong Zhang; Xi Xiao; Xingyu Gao; Peilin Zhao. Affiliations: Shenzhen International Graduate School, Tsinghua University and Tencent AI Lab; School of Computer Science and Engineering, Nanyang Technological University; Tencent AI Lab; Shenzhen International Graduate School, Tsinghua University and Key Laboratory of Data Protection and Intelligent Management (Sichuan University), Ministry of Education; Institute of Microelectronics, Chinese Academy of Sciences.

ISBN: (print) 9798331314385

The superior generation capabilities of Denoising Diffusion Probabilistic Models (DDPMs) have been effectively showcased across a multitude of domains. Recently, the application of DDPMs has extended to time series generation tasks, where they have significantly outperformed other deep generative models, often by a substantial margin. However, we have discovered two main challenges with these methods: 1) the inference time is excessively long; 2) there is potential for improvement in the quality of the generated time series. In this paper, we propose a method based on a discrete token modeling technique, called the Similarity-driven Discrete Transformer (SDformer). Specifically, SDformer utilizes a similarity-driven vector quantization method for learning high-quality discrete token representations of time series, followed by a discrete Transformer for data distribution modeling at the token level. Comprehensive experiments show that our method significantly outperforms competing approaches in terms of the generated time series quality while also ensuring a short inference time. Furthermore, without requiring retraining, SDformer can be directly applied to predictive tasks and still achieve commendable results.
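Vector quantization maps each continuous latent vector to the index of its best-matching codebook entry, producing the discrete tokens a Transformer can then model. The sketch below assumes that "similarity-driven" means selecting the entry with the highest cosine similarity; that choice, the function name, and the toy codebook are illustrative assumptions, not details confirmed by the abstract.

```python
import numpy as np

def quantize_by_similarity(latents: np.ndarray, codebook: np.ndarray):
    """Assign each latent vector to its most cosine-similar codebook entry.

    Returns the token indices and the corresponding codebook vectors.
    """
    # Normalize rows so the dot product equals cosine similarity.
    latents_unit = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    codebook_unit = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    similarity = latents_unit @ codebook_unit.T  # shape: (n_latents, n_codes)
    tokens = similarity.argmax(axis=1)
    return tokens, codebook[tokens]

# Toy codebook with three 2-D entries (hypothetical values).
codebook = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
latents = np.array([[0.9, 0.1], [0.2, 2.0], [-3.0, 0.5]])
tokens, quantized = quantize_by_similarity(latents, codebook)
print(tokens)  # [0 1 2]
```

Matching by direction rather than Euclidean distance means a latent's magnitude does not influence which token it receives, which is one plausible motivation for a similarity-based assignment.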

Keywords: (none listed)

JC5A: Service Delay Minimization for Aerial MEC-assisted Industrial Cyber-Physical Systems

arXiv, 2024

Authors: Sun, Geng; Wu, Jiaxu; Sun, Zemin; He, Long; Wang, Jiacheng; Niyato, Dusit; Jamalipour, Abbas; Mao, Shiwen. Affiliations: College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China; College of Computing and Data Science, Nanyang Technological University, Singapore 639798, Singapore; School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore; School of Electrical and Computer Engineering, The University of Sydney, Sydney, NSW 2006, Australia; Department of Electrical and Computer Engineering, Auburn University, Auburn, United States.

In the era of the sixth generation (6G) and industrial Internet of Things (IIoT), an industrial cyber-physical system (ICPS) drives the proliferation of sensor devices and computing-intensive tasks. To address the limited resources of IIoT sensor devices, unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) has emerged as a promising solution, providing flexible and cost-effective services in close proximity to IIoT sensor devices (ISDs). However, leveraging aerial MEC to meet the delay-sensitive and computation-intensive requirements of the ISDs faces several challenges, including the limited communication, computation and caching (3C) resources, stringent offloading requirements for 3C services, and constrained on-board energy of UAVs. To address these issues, we first present a collaborative aerial MEC-assisted ICPS architecture by incorporating the computing capabilities of the macro base station (MBS) and UAVs. We then formulate a service delay minimization optimization problem (SDMOP). Since the SDMOP is proved to be an NP-hard problem, we propose a joint computation offloading, caching, communication resource allocation, computation resource allocation, and UAV trajectory control approach (JC5A). Specifically, JC5A consists of a block successive upper bound minimization method of multipliers (BSUMM) for computation offloading and service caching, a convex optimization-based method for communication and computation resource allocation, and a successive convex approximation (SCA)-based method for UAV trajectory control. Moreover, we theoretically prove the convergence and polynomial complexity of JC5A. Simulation results demonstrate that the proposed approach can achieve superior system performance compared to the benchmark approaches and algorithms. Copyright © 2024, The Authors. All rights reserved.

Keywords: Convex optimization

Evaluating the point cloud of individual trees generated from images based on Neural Radiance fields (NeRF) method

arXiv, 2023

Authors: Huang, Hongyu; Tian, Guoji; Chen, Chongcheng. Affiliations: National Engineering Research Center of Geospatial Information Technology, Fuzhou University, Fuzhou 350108, China; Key Lab of Spatial Data Mining and Information Sharing of Ministry of Education, Fuzhou University, Fuzhou 350108, China; Fuzhou 350108, China.

Three-dimensional (3D) reconstruction of trees has always been a key task in precision forestry management and research. Due to the complex branch morphology of trees and the occlusions from tree stems, branches and foliage, it is difficult to recreate a complete three-dimensional tree model from two-dimensional images by conventional photogrammetric methods. In this study, based on tree images collected by various cameras in different ways, the Neural Radiance Fields (NeRF) method was used for individual tree reconstruction, and the exported point cloud models were compared with point clouds derived from photogrammetric reconstruction and laser scanning methods. The results show that the NeRF method performs well in individual tree 3D reconstruction: it has a higher successful reconstruction rate, reconstructs the canopy area better, and requires fewer images as input. Compared with the photogrammetric reconstruction method, NeRF has significant advantages in reconstruction efficiency and is adaptable to complex scenes, but the generated point cloud tends to be noisy and of low resolution. The accuracy of tree structural parameters (tree height and diameter at breast height) extracted from the photogrammetric point cloud is still higher than that of parameters derived from the NeRF point cloud. The results of this study illustrate the great potential of the NeRF method for individual tree reconstruction, and it provides new ideas and research directions for 3D reconstruction and visualization of complex forest scenes. Copyright © 2023, The Authors. All rights reserved.

Keywords: Photogrammetry

ViGT: Proposal-free Video Grounding with Learnable Token in Transformer

arXiv, 2023

Authors: Li, Kun; Guo, Dan; Wang, Meng. Affiliations: School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China; Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, Hefei 230601, China; Intelligent Interconnected Systems Laboratory of Anhui Province, Hefei 230601, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China.

The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in complex interactions between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. In particular, we present a simple but effective proposal-free framework, namely Video Grounding Transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are manifested as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query features. First, we employed a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet Captions, TACoS and YouCookII. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT. Copyright © 2023, The Authors. All rights reserved.

Keywords: Regression analysis

Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization

arXiv, 2024

Authors: Li, Mengtian; Lin, Shaohui; Wang, Zihan; Shen, Yunhang; Zhang, Baochang; Ma, Lizhuang. Affiliations: Shanghai University, China; Shanghai Engineering Research Center of Motion Picture Special Effects, China; East China Normal University, China; Key Laboratory of Advanced Theory and Application in Statistics and Data Science, Ministry of Education, China; Beihang University, China; Tencent Youtu Lab, China.

Semi-supervised learning (SSL), thanks to its significant reduction of data annotation costs, has been an active research topic for large-scale 3D scene understanding. However, existing SSL-based methods suffer from severe training bias, mainly due to class imbalance and long-tail distributions of the point cloud data. As a result, they produce biased predictions for tail-class segmentation. In this paper, we introduce a new decoupling optimization framework, which disentangles feature representation learning and the classifier in an alternating optimization manner to shift the biased decision boundary effectively. In particular, we first employ two-round pseudo-label generation to select unlabeled points across head-to-tail classes. We further introduce a multi-class imbalanced focus loss to adaptively pay more attention to feature learning across head-to-tail classes. We fix the backbone parameters after feature learning and retrain the classifier using ground-truth points to update its parameters. Extensive experiments demonstrate that our method outperforms previous state-of-the-art methods on both indoor and outdoor 3D point cloud datasets (i.e., S3DIS, ScanNet-V2, Semantic3D, and SemanticKITTI) using 1% and 1pt evaluation. Copyright © 2024, The Authors. All rights reserved.

Keywords: Semantic Segmentation