Search Results - Inner Mongolia University Library

Slimmable transformer with hybrid axial-attention for medical image segmentation

Computers in Biology and Medicine, 2024, Vol. 173, Article 108370

Authors: Hu, Yiyue; Mu, Nan; Liu, Lei; Zhang, Lei; Jiang, Jingfeng; Li, Xiaoning. Affiliations: College of Computer Science, Sichuan Normal University, Chengdu 610101, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China; Department of Biomedical Engineering, Michigan Technological University, Houghton, MI 49931, United States; Visual Computing and Virtual Reality Key Laboratory of Sichuan, Sichuan Normal University, Chengdu 610068, China; Education Big Data Collaborative Innovation Center of Sichuan 2011, Chengdu 610101, China

The transformer architecture has achieved remarkable success in medical image analysis owing to its powerful capability for capturing long-range dependencies. However, due to the lack of intrinsic inductive bias in modeling visual structural information, the transformer generally requires a large-scale pre-training schedule, limiting the clinical applications over expensive small-scale medical data. To this end, we propose a slimmable transformer to explore intrinsic inductive bias via position information for medical image segmentation. Specifically, we empirically investigate how different position encoding strategies affect the prediction quality of the region of interest (ROI) and observe that ROIs are sensitive to different position encoding strategies. Motivated by this, we present a novel Hybrid Axial-Attention (HAA) that can be equipped with pixel-level spatial structure and relative position information as inductive bias. Moreover, we introduce a gating mechanism to achieve efficient feature selection and further improve the representation quality over small-scale datasets. Experiments on LGG and COVID-19 datasets prove the superiority of our method over the baseline and previous works. Internal workflow visualization with interpretability is conducted to validate our success better; the proposed slimmable transformer has the potential to be further developed into a visual software tool for improving computer-aided lesion diagnosis and treatment planning. © 2024 Elsevier Ltd

Keywords: COVID-19

Towards Robust Polyp Segmentation: Multi-Focus Attention Network with Fine-grained Polyp Cues

Proceedings of the 2025 International Conference on Multimedia Retrieval

Authors: Nan Mu, Xianchao Zhang, Yazhou Feng, Xiaoning Li, Jingfeng Jiang, Lei Liu. Affiliations: College of Computer Science, Sichuan Normal University, Chengdu, China; Visual Computing and Virtual Reality Key Laboratory of Sichuan, Sichuan Normal University, Chengdu, China; Education Big Data Collaborative Innovation Center of Sichuan 2011, Sichuan Normal University, Chengdu, China; Biomedical Engineering Department, Michigan Technological University, Houghton, USA; Ant Group, Hangzhou, China

ISBN: (print) 9798400718779

Colorectal cancer (CRC) is one of the prominent causes of cancer-related morbidity and mortality worldwide. More AI-assisted methods are conducted for early polyp detection and segmentation to improve the screening efficacy. However, previous solutions generally exhibit weak segmentation performance due to irregular structures of polyps, while the model robustness suffers from background noise of homogeneous neighbors. To this end, we propose a novel Multi-Focus Attention Network (MFANet) to encode multi-dimensional information (i.e., scale, contour, and shape) as fine-grained cues for polyp segmentation. Concretely, a Scale-Residual-Aware Attention (SRAA) is designed to apply the residual operation over each layer of the feature pyramid architecture, which could minimize the feature interference among different scales. To improve the model robustness, a Geometry-Structure-Aware Attention (GSAA) is formulated to integrate and refine multi-dimensional geometric features via a Channel-Wise Enhance Attention (CWEA), which condenses the spatial information and recalibrates the channel importance for adaptive feature recalibration. Experiments on six public datasets indicate the effectiveness of the proposed method. Notably, on the more challenging BKAI dataset, which is featured by tiny polyps with serious interference of homogeneous neighboring region, our MFANet can outperform the state-of-the-art (SOTA) methods. Additionally, it is experimentally verified that our approach consistently exhibits better segmentation performance with higher robustness against different attack strategies (i.e., FGSM, WaNet and PGD).

Keywords: attention mechanism

Unsupervised 3D Point Cloud Completion via Multi-view Adversarial Learning

arXiv, 2024

Authors: Wu, Lintai; Cheng, Xianjing; Xu, Yong; Zeng, Huanqiang; Hou, Junhui. Affiliations: Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, Guangdong, Shenzhen 518055, China; Department of Computer Science, City University of Hong Kong, Hong Kong; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, Shenzhen 518055, China; Shenzhen Key Laboratory of Visual Object Detection and Recognition, Guangdong, Shenzhen 518055, China; School of Engineering, Huaqiao University, Quanzhou 362021, China; School of Information Science and Engineering, Huaqiao University, Xiamen 361021, China

In real-world scenarios, scanned point clouds are often incomplete due to occlusion issues. The tasks of self-supervised and weakly-supervised point cloud completion involve reconstructing missing regions of these incomplete objects without the supervision of complete ground truth. Current methods either rely on multiple views of partial observations for supervision or overlook the intrinsic geometric similarity that can be identified and utilized from the given partial point clouds. In this paper, we propose MAL-UPC, a framework that effectively leverages both region-level and category-specific geometric similarities to complete missing structures. Our MAL-UPC does not require any 3D complete supervision and only necessitates single-view partial observations in the training set. Specifically, we first introduce a Pattern Retrieval Network to retrieve similar position and curvature patterns between the partial input and the predicted shape, then leverage these similarities to densify and refine the reconstructed results. Additionally, we render the reconstructed complete shape into multi-view depth maps and design an adversarial learning module to learn the geometry of the target shape from category-specific single-view depth images of the partial point clouds in the training set. To achieve anisotropic rendering, we design a density-aware radius estimation algorithm to improve the quality of the rendered images. Our MAL-UPC outperforms current state-of-the-art self-supervised methods and even some unpaired approaches. We will make the source code publicly available at https://***/ltwu6/malspc. Copyright © 2024, The Authors. All rights reserved.

Keywords: Unsupervised learning

Low-Cost Automated Visual Screw Inspection System

IEEE Symposium Series on Computational Intelligence (SSCI)

Authors: Yiran Li, Jiayi Li, Xiaoying Yang, Cheng'ao Li, Xihan Xiong, Yutong Fang, Shusheng Ding, Tianxiang Cui. Affiliations: School of Computer Science, University of Nottingham Ningbo China, Ningbo, China; School of Aerospace Engineering, University of Nottingham Ningbo China, Ningbo, China; Department of Computing, Imperial College London, London, UK; Teaching Research Center, Ningbo Open University, Ningbo, China; Business School, Ningbo University, Ningbo, China

Despite the significant achievements in the development of automation technologies, the application of autonomous robots to improve the production efficiency of small-scale industries has been largely ignored. While there has been excellent progress in industrial image processing systems implementation, most of the work has focused on a unique aspect of specific objects rather than introducing a general inspection system. Thus, this paper discusses the critical industrial topic of quality control, which develops rapidly through the use of autonomous systems. Given the high cost of implementing automated systems, this paper presents an affordable low-budget solution for the visual inspection system. This method of inspecting screw dimensions consists of four visual inspection parts and a special mechanical supporting structure. The designed system was able to check the overall screw dimensions, including screw head diameter, screw head driven type, screw length, screw thread length, and screw head thickness. It could also separate the qualified screws from the unqualified ones after the inspection process. The accuracy of most inspection cases is 100%, meaning the error ranges within 0.1 mm, which meets all the non-negotiable requirements and most of the target requirements. The visual inspection parts can be further enhanced by building a template matching library that includes different angles of the screw head or by using Hough Transform to identify the defect types of the screw thread.

Keywords:

Resolving Task Confusion in Dynamic Expansion Architectures for Class Incremental Learning

arXiv, 2022

Authors: Huang, Bingchen; Chen, Zhineng; Zhou, Peng; Chen, Jiayin; Wu, Zuxuan. Affiliations: Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, China; Shanghai Collaborative Innovation Center on Intelligent Visual Computing, China; University of Maryland, College Park, MD, United States

The dynamic expansion architecture is becoming popular in class incremental learning, mainly due to its advantages in alleviating catastrophic forgetting. However, task confusion is not well assessed within this framework, e.g., the discrepancy between classes of different tasks is not well learned (i.e., inter-task confusion, ITC), and certain priority is still given to the latest class batch (i.e., old-new confusion, ONC). We empirically validate the side effects of the two types of confusion. Meanwhile, a novel solution called Task Correlated Incremental Learning (TCIL) is proposed to encourage discriminative and fair feature utilization across tasks. TCIL performs a multi-level knowledge distillation to propagate knowledge learned from old tasks to the new one. It establishes information flow paths at both feature and logit levels, enabling the learning to be aware of old classes. Besides, attention mechanism and classifier re-scoring are applied to generate more fair classification scores. We conduct extensive experiments on CIFAR100 and ImageNet100 datasets. The results demonstrate that TCIL consistently achieves state-of-the-art accuracy. It mitigates both ITC and ONC, while showing advantages in battle with catastrophic forgetting even no rehearsal memory is reserved. Copyright © 2022, The Authors. All rights reserved.

Keywords: Distillation

AN EMPIRICAL ANALYSIS OF UNCERTAINTY IN LARGE LANGUAGE MODEL EVALUATIONS

arXiv, 2025

Authors: Xie, Qiujie; Li, Qingqiu; Yu, Zhuohao; Zhang, Yuejie; Zhang, Yue; Yang, Linyi. Affiliations: Zhejiang University, China; School of Engineering, Westlake University, China; School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Shanghai Collaborative Innovation Center of Intelligent Visual Computing, Fudan University, China; Peking University, China; Westlake Institute for Advanced Study, China; University College London, United Kingdom; Huawei Noah’s Ark Lab, Hong Kong

As LLM-as-a-Judge emerges as a new paradigm for assessing large language models (LLMs), concerns have been raised regarding the alignment, bias, and stability of LLM evaluators. While substantial work has focused on alignment and bias, little research has concentrated on the stability of LLM evaluators. In this paper, we conduct extensive experiments involving 9 widely used LLM evaluators across 2 different evaluation settings to investigate the uncertainty in model-based LLM evaluations. We pinpoint that LLM evaluators exhibit varying uncertainty based on model families and sizes. With careful comparative analyses, we find that employing special prompting strategies, whether during inference or post-training, can alleviate evaluation uncertainty to some extent. By utilizing uncertainty to enhance LLM’s reliability and detection capability in Out-Of-Distribution (OOD) data, we further fine-tune an uncertainty-aware LLM evaluator named ConfiLM using a human-annotated fine-tuning set and assess ConfiLM’s OOD evaluation ability on a manually designed test set sourced from the 2024 Olympics. Experimental results demonstrate that incorporating uncertainty as additional information during the fine-tuning phase can largely improve the model’s evaluation performance in OOD scenarios. The code and data are released at: https://***/hasakiXie123/LLM-Evaluator-Uncertainty. © 2025, CC BY-NC-SA.

Keywords: Digital elevation model

Rate Control for VVC Intra Coding with Simplified Cubic Rate-Distortion Model

IEEE Workshop on Multimedia Signal Processing

Authors: Yizhao Wang, Jiaqi Zhang, Songlin Sun. Affiliations: School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China; Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, China; Engineering Research Center of Blockchain and Network Convergence Technology, Ministry of Education, China; National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, Beijing, China

In this paper, we propose a simplified cubic polynomial R-D model with corresponding rate control methods for Versatile Video Coding (VVC) intra frame coding. First, we explore the rate-distortion (R-D) characteristics of VVC intra coding. By comparing several potential R-D modeling approaches, a new intra coding R-D model has been proposed based on the simplified cubic polynomial function. Subsequently, we derive the corresponding $R-\lambda$ model and introduce a complexity measurement to improve the performance of intra frame rate control. Furthermore, we propose a Coding Tree Unit (CTU)-level rate control method based on the newly proposed R-D model and further develop a pre-compression-based approach on this basis. Experimental results show that the proposed method can achieve 1.87% and 0.55% bit rate reduction for All-Intra (AI) and Random-Access (RA) configurations over the original rate control in VVC Test Model (VTM), while the computational complexity increment is negligible. Meanwhile, the enhanced bit rate accuracy from rate control has been observed in the proposed methods.

Keywords:

SIMULATING HUMAN-LIKE DAILY ACTIVITIES WITH DESIRE-DRIVEN AUTONOMY

arXiv, 2024

Authors: Wang, Yiding; Chen, Yuxuan; Zhong, Fangwei; Ma, Long; Wang, Yizhou. Affiliations: Institute for Artificial Intelligence, Peking University, China; The University of Hong Kong, Hong Kong; School of Artificial Intelligence, Beijing Normal University, China; Academy for Advanced Interdisciplinary Studies, Peking University, China; State Key Laboratory of General Artificial Intelligence, BIGAI, China; Center on Frontiers of Computing Studies, School of Computer Science, Nat’l Eng. Research Center of Visual Technology, Peking University, China

Desires motivate humans to interact autonomously with the complex world. In contrast, current AI agents require explicit task specifications, such as instructions or reward functions, which constrain their autonomy and behavioral diversity. In this paper, we introduce a Desire-driven Autonomous Agent (D2A) that can enable a large language model (LLM) to autonomously propose and select tasks, motivated by satisfying its multi-dimensional desires. Specifically, the motivational framework of D2A is mainly constructed by a dynamic Value System, inspired by the Theory of Needs. It incorporates an understanding of human-like desires, such as the need for social interaction, personal fulfillment, and self-care. At each step, the agent evaluates the value of its current state, proposes a set of candidate activities, and selects the one that best aligns with its intrinsic motivations. We conduct experiments on Concordia, a text-based simulator, to demonstrate that our agent generates coherent, contextually relevant daily activities while exhibiting variability and adaptability similar to human behavior. A comparative analysis with other LLM-based agents demonstrates that our approach significantly enhances the rationality of the simulated activities. © 2024, CC BY-NC-ND.

Keywords: Autonomous agents

Prototypical Residual Networks for Anomaly Detection and Localization

arXiv, 2022

Authors: Zhang, Hui; Wu, Zuxuan; Wang, Zheng; Chen, Zhineng; Jiang, Yu-Gang. Affiliations: Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University, China; Shanghai Collaborative Innovation Center of Intelligent Visual Computing, China; School of Computer Science, Zhejiang University of Technology, China

Anomaly detection and localization are widely used in industrial manufacturing for its efficiency and effectiveness. Anomalies are rare and hard to collect and supervised models easily over-fit to these seen anomalies with a handful of abnormal samples, producing unsatisfactory performance. On the other hand, anomalies are typically subtle, hard to discern, and of various appearance, making it difficult to detect anomalies and let alone locate anomalous regions. To address these issues, we propose a framework called Prototypical Residual Network (PRN), which learns feature residuals of varying scales and sizes between anomalous and normal patterns to accurately reconstruct the segmentation maps of anomalous regions. PRN mainly consists of two parts: multi-scale prototypes that explicitly represent the residual features of anomalies to normal patterns; a multi-size self-attention mechanism that enables variable-sized anomalous feature learning. Besides, we present a variety of anomaly generation strategies that consider both seen and unseen appearance variance to enlarge and diversify anomalies. Extensive experiments on the challenging and widely used MVTec AD benchmark show that PRN outperforms current state-of-the-art unsupervised and supervised methods. We further report SOTA results on three additional datasets to demonstrate the effectiveness and generalizability of PRN. Copyright © 2022, The Authors. All rights reserved.

Keywords: Anomaly detection

A Novel Smartphone Recommendation System Using Ensemble Machine Learning

2023 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2023

Authors: Almadhor, Ahmad; Abbas, Sidra; Sampedro, Gabriel Avelino; Abisado, Mideth; Gadekallu, Thippa Reddy. Affiliations: College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia; Comsats University Islamabad, Department of Computer Science, Islamabad, Pakistan; University of the Philippines Open University, Faculty of Information and Communication Studies, Los Baños 4031, Philippines; De la Salle University, Center for Computational Imaging and Visual Innovations, 2401 Taft Ave., Manila 1004, Philippines; College of Computing and Information Technologies, National University, Manila, Philippines; Zhongda Group, Jiaxing City, Zhejiang Province, Haiyan County 314312, China; Lebanese American University, Department of Electrical and Computer Engineering, Byblos, Lebanon; School of Information Technology and Engineering, Vellore Institute of Technology, Tamil Nadu, India; College of Information Science and Engineering, Jiaxing University, Jiaxing 314001, China; Lovely Professional University, Division of Research and Development, Phagwara, India

ISBN: (print) 9798350341072

Due to the proliferation of internet evaluations brought on by the rising demand for smartphones, consumers find it challenging to make accurate selections when purchasing. In this paper, we offer ensemble voting methods based on TF-IDF (Term Frequency-Inverse Document Frequency) features for classifying mobile phone ratings. We use a recently assembled dataset comprising over 13,000 smartphone reviews from the Flipkart website. The suggested approach includes feature extraction using the TF-IDF, data cleaning, balancing, and voting-based model prediction. To identify the recently created Flipkart dataset, the suggested method created an ensemble voting mechanism based on machine learning techniques. According to the experimental findings, the suggested method performs more accurately and efficiently than conventional machine learning techniques. At 98.0%, the model achieved the greatest accuracy. The suggested method can be expanded to additional e-commerce platforms with sizable datasets of online evaluations and assist customers in making updated purchase preferences. © 2023 IEEE.

Keywords: Machine learning
