Search Results - Inner Mongolia University Library

BUSIFusion: Blind Unsupervised Single Image Fusion of Hyperspectral and RGB Images

TechRxiv, 2022

Authors: Li, Jiabao; Li, Yuqi; Wang, Chong; Ye, Xulun; Heidrich, Wolfgang. Affiliations: The Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315600, China; Zhejiang Engineering Research Center of Advanced Mass Spectrometry and Clinical Application, China; The Visual Computing Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia

Hyperspectral images (HSIs) provide rich spectral information that has been widely used in numerous computer vision tasks. However, their low spatial resolution often prevents their use in applications such as image segmentation and recognition. Fusing low-resolution HSIs with high-resolution RGB images to reconstruct high-resolution HSIs has recently attracted considerable research attention. In this paper, we propose an unsupervised blind fusion network that operates on a single HSI and RGB image pair and requires neither known degradation models nor any training data. Our method takes full advantage of an unrolling network and coordinate encoding to provide state-of-the-art HSI reconstruction. It can also estimate the degradation parameters relatively accurately through the neural representation and implicit regularization of the degradation model. The experimental results demonstrate the effectiveness of our method both in simulations and in our real-world experiments. The proposed method outperforms other state-of-the-art non-blind and blind fusion methods on two popular HSI datasets. Our code and data are available at https://***/CPREgroup/Real-Spec-RGB-Fusion. © CC BY-NC-SA.

Keywords: Image fusion
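
Blind HSI/RGB fusion is usually posed against a forward observation model: the low-resolution HSI is a spatially blurred and downsampled copy of the unknown high-resolution HSI, and the RGB image is that same cube projected through a camera spectral response. The sketch below illustrates this common formulation under stated assumptions (a box blur combined with downsampling by an integer factor `s`, and a hypothetical response matrix `response`); it is not the BUSIFusion network, which according to the abstract estimates these degradations rather than assuming them.

```python
def spatial_degrade(cube, s):
    """Box-blur and downsample each band by an integer factor s.

    cube[c][i][j] is band c of a high-resolution hyperspectral image. Each output
    pixel is the mean of a non-overlapping s x s block (an assumed, simple blur kernel).
    """
    out = []
    for band in cube:
        h, w = len(band), len(band[0])
        out.append([
            [sum(band[i + di][j + dj] for di in range(s) for dj in range(s)) / (s * s)
             for j in range(0, w - w % s, s)]
            for i in range(0, h - h % s, s)
        ])
    return out


def spectral_degrade(cube, response):
    """Project C bands to len(response) channels: out[k] = sum_c response[k][c] * cube[c]."""
    h, w = len(cube[0]), len(cube[0][0])
    return [
        [[sum(r[c] * cube[c][i][j] for c in range(len(cube))) for j in range(w)]
         for i in range(h)]
        for r in response
    ]


# Hypothetical 2-band, 2 x 2 cube and a 3 x 2 "spectral response" (illustrative values only).
hr_cube = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]]]
lr_hsi = spatial_degrade(hr_cube, 2)  # [[[1.0]], [[3.0]]]
rgb = spectral_degrade(hr_cube, [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

In the blind setting, the blur/downsampling operator and the response matrix are unknowns to be estimated together with the high-resolution cube, rather than fixed inputs as they are here.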

∂H: Differentiable Holography

Research Square, 2023

Authors: Chen, Ni; Wang, Congli; Heidrich, Wolfgang. Affiliations: Wyant College of Optical Sciences, University of Arizona, Tucson, AZ 85721, United States; Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA 94720, United States; Visual Computing Center, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia

Over the past decade, the field of holography has gained significant ground due to advances in computational imaging. However, the utilization of computational tools is hampered by the mismatch between experimental setups and the conceptual model. We present differentiable holography (∂H), a novel framework for automatically self-calibrating experimental imperfections in inverse holographic imaging. The technique is demonstrated on auto-focused complex field imaging from a single intensity-only inline hologram. © 2023, CC BY.

Keywords: Inverse problems

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling

arXiv, 2022

Authors: Wang, Rui; Wu, Zuxuan; Chen, Dongdong; Chen, Yinpeng; Dai, Xiyang; Liu, Mengchen; Zhou, Luowei; Yuan, Lu; Jiang, Yu-Gang. Affiliations: Shanghai Key Lab of Intelligent Info. Processing, School of Computer Science, Fudan University, China; Shanghai Collaborative Innovation Center on Intelligent Visual Computing, China; Microsoft Cloud & AI, China

Transformer-based models have achieved top performance on major video recognition benchmarks. Benefiting from the self-attention mechanism, these models are better at modeling long-range dependencies than CNN-based models. However, the significant computational overhead resulting from the quadratic complexity of self-attention over a tremendous number of tokens limits the use of existing video transformers in resource-constrained applications such as mobile devices. In this paper, we extend Mobile-Former to Video Mobile-Former, which decouples the video architecture into a lightweight 3D CNN for local context modeling and Transformer modules for global interaction modeling, arranged in parallel. To avoid the significant computational cost of computing self-attention among the large number of local patches in a video, we propose using very few global tokens (e.g., 6) for a whole video in the Transformer, which exchange information with the 3D CNN through a cross-attention mechanism. Through efficient global spatial-temporal modeling, Video Mobile-Former significantly improves the video recognition performance of alternative lightweight baselines and outperforms other efficient CNN-based models in the low-FLOP regime, from 500M to 6G total FLOPs, on various video recognition tasks. Notably, Video Mobile-Former is the first Transformer-based video model to constrain the computational budget within 1G FLOPs. Copyright © 2022, The Authors. All rights reserved.

Keywords: Benchmarking
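
The efficiency argument in this abstract rests on the difference between self-attention over all N local tokens (cost growing with N²) and cross-attention between a handful of M global tokens and those N tokens (cost growing with M·N). The sketch below shows plain single-head cross-attention with identity projections plus a rough multiply-accumulate count. The token counts, the width of 192, and the assumption of one cross-attention in each direction are illustrative choices, not figures from the paper.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention with identity projections.

    Each of the M query vectors (e.g. a few global tokens) attends over N key/value
    vectors (e.g. local patch features), so the cost scales with M * N, not N * N.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def attention_macs(num_queries, num_keys, dim):
    """Rough multiply-accumulate count: Q K^T plus weights @ V, each num_queries*num_keys*dim."""
    return 2 * num_queries * num_keys * dim


# Hypothetical sizes: 8 frames x 14 x 14 patches = 1568 local tokens, width 192.
n_local, dim, n_global = 8 * 14 * 14, 192, 6
full_self_attention = attention_macs(n_local, n_local, dim)
# Assume one cross-attention in each direction (local -> global and global -> local).
two_way_cross = attention_macs(n_global, n_local, dim) + attention_macs(n_local, n_global, dim)
ratio = full_self_attention / two_way_cross  # about 130x fewer MACs under these assumptions
```

Under these assumptions the ratio simplifies to N / (2M), so it grows linearly with the number of local tokens while the global-token budget stays fixed.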

Slimmable transformer with hybrid axial-attention for medical image segmentation

Computers in Biology and Medicine, 2024, Vol. 173, p. 108370

Authors: Hu, Yiyue; Mu, Nan; Liu, Lei; Zhang, Lei; Jiang, Jingfeng; Li, Xiaoning. Affiliations: College of Computer Science, Sichuan Normal University, Chengdu 610101, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China; Department of Biomedical Engineering, Michigan Technological University, Houghton, MI 49931, United States; Visual Computing and Virtual Reality Key Laboratory of Sichuan, Sichuan Normal University, Chengdu 610068, China; Education Big Data Collaborative Innovation Center of Sichuan 2011, Chengdu 610101, China

The transformer architecture has achieved remarkable success in medical image analysis owing to its powerful capability for capturing long-range dependencies. However, because it lacks an intrinsic inductive bias for modeling visual structural information, the transformer generally requires large-scale pre-training, which limits its clinical application to small-scale medical datasets that are expensive to obtain. To this end, we propose a slimmable transformer that explores intrinsic inductive bias via position information for medical image segmentation. Specifically, we empirically investigate how different position encoding strategies affect the prediction quality of the region of interest (ROI) and observe that ROIs are sensitive to the choice of position encoding strategy. Motivated by this, we present a novel Hybrid Axial-Attention (HAA) that can be equipped with pixel-level spatial structure and relative position information as inductive bias. Moreover, we introduce a gating mechanism to achieve efficient feature selection and further improve representation quality on small-scale datasets. Experiments on the LGG and COVID-19 datasets demonstrate the superiority of our method over the baseline and previous works. Internal workflow visualization with interpretability is conducted to further validate our approach; the proposed slimmable transformer has the potential to be developed into a visual software tool for improving computer-aided lesion diagnosis and treatment planning. © 2024 Elsevier Ltd

Keywords: COVID-19
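
Axial attention, on which the HAA module builds, factorizes 2-D attention into one pass along the width axis and one along the height axis, so each position attends to H + W positions instead of H·W. The sketch below shows only that factorization, using identity query/key/value projections. It omits the learned projections, relative position terms, and gating described in the abstract, so it should not be read as the paper's HAA.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_1d(seq):
    """Self-attention along one axis; queries, keys and values are the inputs themselves."""
    d = len(seq[0])
    out = []
    for q in seq:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq])
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def axial_attention(grid):
    """grid[h][w] is a feature vector. Attend along the width axis, then the height axis,
    so each step mixes H + W positions per location instead of all H * W."""
    rows = [attend_1d(row) for row in grid]  # width axis
    height, width = len(rows), len(rows[0])
    cols = [attend_1d([rows[h][w] for h in range(height)])  # height axis
            for w in range(width)]
    return [[cols[w][h] for w in range(width)] for h in range(height)]
```

Because the column pass operates on features that were already mixed along each row, every output position is influenced by every input position after the two passes, which is why the factorization keeps a global receptive field at lower cost.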

Towards Robust Polyp Segmentation: Multi-Focus Attention Network with Fine-grained Polyp Cues

Proceedings of the 2025 International Conference on Multimedia Retrieval

Authors: Nan Mu; Xianchao Zhang; Yazhou Feng; Xiaoning Li; Jingfeng Jiang; Lei Liu. Affiliations: College of Computer Science, Sichuan Normal University, Chengdu, China; Visual Computing and Virtual Reality Key Laboratory of Sichuan, Sichuan Normal University, Chengdu, China; Education Big Data Collaborative Innovation Center of Sichuan 2011, Sichuan Normal University, Chengdu, China; Biomedical Engineering Department, Michigan Technological University, Houghton, USA; Ant Group, Hangzhou, China

ISBN (print): 9798400718779

Colorectal cancer (CRC) is one of the major causes of cancer-related morbidity and mortality worldwide. AI-assisted methods are increasingly used for early polyp detection and segmentation to improve screening efficacy. However, previous solutions generally exhibit weak segmentation performance due to the irregular structures of polyps, and their robustness suffers from background noise from homogeneous neighboring regions. To this end, we propose a novel Multi-Focus Attention Network (MFANet) that encodes multi-dimensional information (i.e., scale, contour, and shape) as fine-grained cues for polyp segmentation. Concretely, a Scale-Residual-Aware Attention (SRAA) is designed to apply the residual operation over each layer of the feature pyramid architecture to minimize feature interference among different scales. To improve model robustness, a Geometry-Structure-Aware Attention (GSAA) is formulated to integrate and refine multi-dimensional geometric features via a Channel-Wise Enhance Attention (CWEA), which condenses the spatial information and recalibrates channel importance for adaptive feature recalibration. Experiments on six public datasets indicate the effectiveness of the proposed method. Notably, on the more challenging BKAI dataset, which features tiny polyps with severe interference from homogeneous neighboring regions, our MFANet outperforms the state-of-the-art (SOTA) methods. Additionally, experiments verify that our approach consistently achieves better segmentation performance with higher robustness against different attack strategies (i.e., FGSM, WaNet, and PGD).

Keywords: attention mechanism
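
The CWEA is described as condensing spatial information and recalibrating channel importance, which matches the general squeeze-and-excitation pattern: pool each channel to a scalar, map the pooled vector to per-channel gates in (0, 1), and rescale the channels. The sketch below shows only that generic pattern with hypothetical weight matrices `w1` and `w2`. The abstract does not give CWEA's exact layers, so this is not MFANet's module.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_recalibrate(feature, w1, w2):
    """Squeeze-and-excitation-style channel gating (illustrative sketch).

    feature[c][i][j] is channel c of a feature map; w1 (hidden x C) and w2 (C x hidden)
    are hypothetical learned weights. Returns (recalibrated feature, per-channel gates).
    """
    # Squeeze: global average pool each channel down to one scalar.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Excite: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Recalibrate: scale every spatial position of channel c by gates[c].
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature)]
    return scaled, gates
```

Since the gates depend on the whole pooled vector, a channel's weight reflects its importance relative to the other channels, not just its own magnitude.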

Unsupervised 3D Point Cloud Completion via Multi-view Adversarial Learning

arXiv, 2024

Authors: Wu, Lintai; Cheng, Xianjing; Xu, Yong; Zeng, Huanqiang; Hou, Junhui. Affiliations: Bio-Computing Research Center, Harbin Institute of Technology Shenzhen, Guangdong, Shenzhen 518055, China; Department of Computer Science, City University of Hong Kong, Hong Kong; School of Computer Science and Technology, Harbin Institute of Technology Shenzhen, Guangdong, Shenzhen 518055, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Guangdong, Shenzhen 518055, China; School of Engineering, Huaqiao University, Quanzhou 362021, China; School of Information Science and Engineering, Huaqiao University, Xiamen 361021, China

In real-world scenarios, scanned point clouds are often incomplete due to occlusion. Self-supervised and weakly-supervised point cloud completion aim to reconstruct the missing regions of these incomplete objects without the supervision of complete ground truth. Current methods either rely on multiple views of partial observations for supervision or overlook the intrinsic geometric similarity that can be identified and utilized from the given partial point clouds. In this paper, we propose MAL-UPC, a framework that effectively leverages both region-level and category-specific geometric similarities to complete missing structures. MAL-UPC does not require any complete 3D supervision and needs only single-view partial observations in the training set. Specifically, we first introduce a Pattern Retrieval Network to retrieve similar position and curvature patterns between the partial input and the predicted shape, then leverage these similarities to densify and refine the reconstructed results. Additionally, we render the reconstructed complete shape into multi-view depth maps and design an adversarial learning module to learn the geometry of the target shape from category-specific single-view depth images of the partial point clouds in the training set. To achieve anisotropic rendering, we design a density-aware radius estimation algorithm to improve the quality of the rendered images. MAL-UPC outperforms current state-of-the-art self-supervised methods and even some unpaired approaches. We will make the source code publicly available at https://***/ltwu6/malspc. Copyright © 2024, The Authors. All rights reserved.

Keywords: Unsupervised learning
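
The abstract does not spell out the density-aware radius estimation rule. One common heuristic, assumed here purely for illustration, sets each point's rendering radius from the mean distance to its k nearest neighbours, so points in sparse regions are drawn larger and holes in the rendered depth map shrink. `k` and `scale` are hypothetical parameters, and the brute-force neighbour search only suits small clouds.

```python
import math


def density_aware_radii(points, k=3, scale=1.0):
    """Return one rendering radius per 3-D point: scale * mean distance to its k nearest neighbours.

    Sparse neighbourhoods give larger radii, dense ones smaller radii. Uses an O(N^2)
    brute-force neighbour search, so it is meant for small illustrative clouds only.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to measure local density")
    k = min(k, len(points) - 1)
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(scale * sum(dists[:k]) / k)
    return radii


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0)]
radii = density_aware_radii(cloud, k=1)  # the isolated point gets by far the largest radius
```

This per-point rule still draws round splats; making them anisotropic, as the abstract describes, would additionally require a per-point orientation or covariance estimate, which this sketch does not attempt.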

Resolving Task Confusion in Dynamic Expansion Architectures for Class Incremental Learning

arXiv, 2022

Authors: Huang, Bingchen; Chen, Zhineng; Zhou, Peng; Chen, Jiayin; Wu, Zuxuan. Affiliations: Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, China; Shanghai Collaborative Innovation Center on Intelligent Visual Computing, China; University of Maryland, College Park, MD, United States

The dynamic expansion architecture is becoming popular in class incremental learning, mainly due to its advantages in alleviating catastrophic forgetting. However, task confusion is not well assessed within this framework: e.g., the discrepancy between classes of different tasks is not well learned (inter-task confusion, ITC), and some priority is still given to the latest class batch (old-new confusion, ONC). We empirically validate the side effects of these two types of confusion. Meanwhile, we propose a novel solution called Task Correlated Incremental Learning (TCIL) to encourage discriminative and fair feature utilization across tasks. TCIL performs multi-level knowledge distillation to propagate knowledge learned from old tasks to the new one. It establishes information flow paths at both the feature and logit levels, making the learning aware of old classes. In addition, an attention mechanism and classifier re-scoring are applied to generate fairer classification scores. We conduct extensive experiments on the CIFAR100 and ImageNet100 datasets. The results demonstrate that TCIL consistently achieves state-of-the-art accuracy. It mitigates both ITC and ONC and also counters catastrophic forgetting, even when no rehearsal memory is reserved. Copyright © 2022, The Authors. All rights reserved.

Keywords: Distillation
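
TCIL's distillation is described as operating at both the feature and logit levels. One generic way to express such two-level distillation, assumed here and not taken from the paper, is a squared-error term between old and new feature vectors plus a temperature-softened KL divergence between the old model's logits and the new model's logits on the old classes (the Hinton-style distillation loss). The weights `alpha` and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation(old_logits, new_logits, temperature=2.0):
    """KL(old || new) on temperature-softened class distributions, scaled by T^2."""
    p = softmax(old_logits, temperature)
    q = softmax(new_logits, temperature)
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def feature_distillation(old_features, new_features):
    """Mean squared difference between the old and new models' feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(old_features, new_features)) / len(old_features)


def multi_level_distillation(old_feat, new_feat, old_logits, new_logits,
                             alpha=1.0, temperature=2.0):
    """Combine feature- and logit-level terms. new_logits is cut to the old classes,
    since the old model has no outputs for classes introduced in the current task."""
    n_old = len(old_logits)
    return (alpha * feature_distillation(old_feat, new_feat)
            + logit_distillation(old_logits, new_logits[:n_old], temperature))
```

The loss is zero only when the new model reproduces the old model's features and its old-class predictions, which is the property that lets distillation counter forgetting without stored exemplars.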

An Empirical Analysis of Uncertainty in Large Language Model Evaluations

arXiv, 2025

Authors: Xie, Qiujie; Li, Qingqiu; Yu, Zhuohao; Zhang, Yuejie; Zhang, Yue; Yang, Linyi. Affiliations: Zhejiang University, China; School of Engineering, Westlake University, China; School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Shanghai Collaborative Innovation Center of Intelligent Visual Computing, Fudan University, China; Peking University, China; Westlake Institute for Advanced Study, China; University College London, United Kingdom; Huawei Noah’s Ark Lab, Hong Kong

As LLM-as-a-Judge emerges as a new paradigm for assessing large language models (LLMs), concerns have been raised regarding the alignment, bias, and stability of LLM evaluators. While substantial work has focused on alignment and bias, little research has examined the stability of LLM evaluators. In this paper, we conduct extensive experiments involving 9 widely used LLM evaluators across 2 different evaluation settings to investigate the uncertainty in model-based LLM evaluations. We find that LLM evaluators exhibit varying uncertainty depending on model family and size. Through careful comparative analyses, we find that employing special prompting strategies, whether during inference or post-training, can alleviate evaluation uncertainty to some extent. To leverage uncertainty for enhancing LLMs’ reliability and detection capability on out-of-distribution (OOD) data, we further fine-tune an uncertainty-aware LLM evaluator named ConfiLM using a human-annotated fine-tuning set, and assess ConfiLM’s OOD evaluation ability on a manually designed test set sourced from the 2024 Olympics. Experimental results demonstrate that incorporating uncertainty as additional information during the fine-tuning phase can substantially improve the model’s evaluation performance in OOD scenarios. The code and data are released at: https://***/hasakiXie123/LLM-Evaluator-Uncertainty. © 2025, CC BY-NC-SA.

Keywords: Digital elevation model
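
The abstract does not state which uncertainty measure is used. A common proxy, assumed here for illustration, is the entropy of the evaluator's probability distribution over candidate ratings, computed from the log-probabilities the model assigns to each rating token. The dictionary format and rating labels are hypothetical.

```python
import math


def rating_uncertainty(logprobs):
    """Entropy-based uncertainty of an LLM evaluator's verdict (illustrative sketch).

    logprobs maps each candidate rating token to the log-probability the evaluator
    assigned it. Probabilities are renormalised over the candidates only.
    Returns (most likely rating, entropy in nats); higher entropy = less certain.
    """
    m = max(logprobs.values())
    weights = {k: math.exp(v - m) for k, v in logprobs.items()}
    total = sum(weights.values())
    probs = {k: w / total for k, w in weights.items()}
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    best = max(probs, key=probs.get)
    return best, entropy
```

A maximally unsure evaluator spreading probability evenly over n ratings reaches entropy ln(n), while one that puts nearly all mass on a single rating approaches zero, so the value can be compared across evaluators that use the same rating scale.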

Simulating Human-like Daily Activities with Desire-driven Autonomy

arXiv, 2024

Authors: Wang, Yiding; Chen, Yuxuan; Zhong, Fangwei; Ma, Long; Wang, Yizhou. Affiliations: Institute for Artificial Intelligence, Peking University, China; The University of Hong Kong, Hong Kong; School of Artificial Intelligence, Beijing Normal University, China; Academy for Advanced Interdisciplinary Studies, Peking University, China; State Key Laboratory of General Artificial Intelligence, BIGAI, China; Center on Frontiers of Computing Studies, School of Computer Science, Nat’l Eng. Research Center of Visual Technology, Peking University, China

Desires motivate humans to interact autonomously with the complex world. In contrast, current AI agents require explicit task specifications, such as instructions or reward functions, which constrain their autonomy and behavioral diversity. In this paper, we introduce a Desire-driven Autonomous Agent (D2A) that enables a large language model (LLM) to autonomously propose and select tasks, motivated by satisfying its multi-dimensional desires. Specifically, the motivational framework of D2A is built mainly on a dynamic Value System inspired by the Theory of Needs. It incorporates an understanding of human-like desires, such as the need for social interaction, personal fulfillment, and self-care. At each step, the agent evaluates the value of its current state, proposes a set of candidate activities, and selects the one that best aligns with its intrinsic motivations. We conduct experiments on Concordia, a text-based simulator, to demonstrate that our agent generates coherent, contextually relevant daily activities while exhibiting variability and adaptability similar to human behavior. A comparative analysis with other LLM-based agents demonstrates that our approach significantly enhances the rationality of the simulated activities. © 2024, CC BY-NC-ND.

Keywords: Autonomous agents
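
The selection step described in this abstract (evaluate the current state, propose candidate activities, choose the one that best matches intrinsic motivation) can be illustrated with a toy numeric value system. In D2A the proposing and evaluating are done by an LLM. In the sketch below, hypothetical desire names, target levels, and expected effects stand in for those judgements, and the scoring rule (minimise the total shortfall below each target) is an assumption.

```python
def unmet_desire(state, targets):
    """Sum of shortfalls below each desire's target level (0 means every desire is met)."""
    return sum(max(0.0, targets[d] - state.get(d, 0.0)) for d in targets)


def choose_activity(state, targets, candidates):
    """Pick the candidate whose expected effects leave the smallest total shortfall.

    candidates maps activity name -> {desire: expected change}; all values here are
    hypothetical stand-ins for judgements an LLM would make.
    """
    def after(effects):
        return {d: state.get(d, 0.0) + effects.get(d, 0.0) for d in targets}
    return min(candidates, key=lambda a: unmet_desire(after(candidates[a]), targets))


# Hypothetical desire levels on a 0-10 scale.
state = {"social": 2.0, "rest": 8.0, "fulfillment": 5.0}
targets = {"social": 7.0, "rest": 7.0, "fulfillment": 7.0}
candidates = {
    "call a friend": {"social": 4.0, "rest": -1.0},
    "take a nap": {"rest": 2.0},
    "work on a hobby": {"fulfillment": 3.0, "rest": -1.0},
}
choice = choose_activity(state, targets, candidates)  # "call a friend"
```

Scoring by shortfall means surplus in an already satisfied desire (here, rest) earns nothing, so the agent gravitates toward whichever need is most neglected; repeating the loop after updating `state` would produce the shifting daily pattern the abstract describes.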