Search Results - Inner Mongolia University Library

5th Workshop on Intelligent Data - From Data to Knowledge, DOING 2024, 3rd Workshop on Knowledge Graphs Analysis on a Large Scale, K-GALS 2024, 6th Workshop on Modern Approaches in Data Engineering and Information System Design, MADEISD 2024 and 3rd Workshop on Personalization and Recommender Systems, PERS 2024 held in conjunction with 28th European Conference on Advances in Databases and Information Systems, ADBIS 2024

Authors: Erezman, Mateusz; Dziubich, Tomasz. Affiliations: Computer Vision and Artificial Intelligence Laboratory, Department of Computer Architecture, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland

ISBN: (print) 9783031704208

Accurate segmentation of cellular nuclei is imperative for various biological and medical applications, such as cancer diagnosis and drug discovery. Histopathology, a discipline employing microscopic examination of bodily tissues, serves as a cornerstone for cancer diagnosis. Nonetheless, the conventional histopathological diagnosis process is frequently marred by time constraints and potential inaccuracies. Consequently, there arises a pressing need for automated image analysis tools to augment medical practitioners’ efforts. In this paper, we present a novel approach utilising a Transformer model, originally designed for natural language processing tasks, for automated cellular nuclei segmentation in whole-slide microscopic images. Specifically targeting cell nuclei, this methodology holds significance as the initial phase in diagnosing various illnesses, streamlining the analysis and quantification process. The study introduces a novel model that combines a U-Net architecture with a Transformer-based network functioning as a parallel encoder. This model was compared against three other popular architectures in the literature: U-Net, ResU-Net, and LinkNet-34. The impact of augmentation and colour normalisation techniques was investigated. The average Dice similarity coefficient for the considered images was found to be 0.8041. The obtained results seem to be clinically relevant. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Image segmentation
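For readers unfamiliar with the metric reported in this record, the Dice similarity coefficient between a predicted mask and a reference mask is twice the overlap divided by the total number of foreground pixels in both masks. The sketch below is a minimal illustration on toy data, not the authors' evaluation code; the function name `dice_coefficient`, the flattened-list mask representation, and the empty-mask convention are all assumptions made for this example.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Foreground pixel count of each mask, added together.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Flattened toy masks (hypothetical data): 1 marks a nucleus pixel, 0 background.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 1, 1, 0, 0, 0, 0, 0]
print(dice_coefficient(predicted, reference))
```

Here two of the three foreground pixels in each mask overlap, so the score is 2·2 / (3 + 3). A score of 1 means the masks coincide exactly and 0 means they share no foreground pixel.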

LLaVA-Endo: a large language-and-vision assistant for gastrointestinal endoscopy

Frontiers of Computer Science, 2025, Vol. 19, No. 4, pp. 121-123

Authors: Jieru YAO; Xueran LI; Qiang XIE; Longfei HAN; Yiwen JIA; Nian LIU; Dingwen ZHANG; Junwei HAN. Affiliations: School of Automation, Northwestern Polytechnical University, Xi’an 710072, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China; AHU-IAI AI Joint Laboratory, Anhui University, Hefei 230039, China; Institute of Advanced Technology, University of Science and Technology of China, Hefei 230026, China; School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China; Department of Gastroenterology, The Third Affiliated Hospital of Anhui Medical University (Hefei First People’s Hospital), Hefei 230061, China; The Computer Vision Department, Mohamed Bin Zayed University of Artificial Intelligence, Masdar, Abu Dhabi 200120, United Arab Emirates; Xijing Hospital, The Fourth Military Medical University, Xi’an 710032, China

1 Introduction. Endoscopy plays a crucial role in the diagnoses and treatment of gastrointestinal (GI) diseases [1], as it helps to identify abnormalities, classify lesion, and determine treatment *** GI endoscopic examinations, physicians may encounter practical hindrances, i.e., fatigue, stress, or limited experience, which can lead to erroneous *** intelligence (AI)-assisted GI endoscopy technology has emerged to address these limitations [2].

Keywords: stress; GI endoscopy; fatigue; identify abnormalities; classify lesion; identification; artificial intelligence

SIR-HCL: Semantic-Inconsistency Reasoning and Hybrid Contrastive Learning for Efficient Cross-Emotion Anomaly Detection

IEEE Transactions on Cognitive and Developmental Systems, 2025

Authors: Liu, Xin; Chen, Qiyan; Cheung, Yiu-Ming; Peng, Shu-Juan. Affiliations: Huaqiao University, Department of Computer Science, Xiamen 361021, China; Hong Kong Baptist University, Department of Computer Science, Hong Kong SAR; Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Xiamen 361021, China; Huaqiao University, Fujian Key Laboratory of Big Data Intelligence and Security, Xiamen 361021, China; Huaqiao University, Department of Artificial Intelligence, Xiamen, China; Fujian Province University Key Laboratory of Computer Vision and Machine Learning, Huaqiao University, Xiamen 361021, China

Cross-emotion anomaly detection is an emerging and challenging research topic in the cognitive analysis field, which aims at identifying the abnormal emotion pair whose semantic patterns are inconsistent across different emotional modalities. To the best of our knowledge, this topic has yet to be well studied, which could potentially benefit lots of valuable cognitive applications such as autistic children diagnosis and criminal deception detection. To this end, this paper proposes an efficient cross-emotion anomaly detection approach via semantic-inconsistency reasoning and hybrid contrastive learning (SIR-HCL), which is the first attempt to detect the anomalous emotional pairs across the audio-visual emotions. First, the proposed framework utilizes a dual-branch network to obtain the deep emotional features in each modality, and then employs the shared residual block to derive the semantically compatible features. Subsequently, an efficient hybrid contrastive learning approach is designed to enlarge the semantic-inconsistency among abnormal emotional pairs with different affective classes, while enhancing the semantic-consistency and increasing the feature correlation between normal emotional pairs from the same affective class. At the same time, an efficient bidirectional learning scheme is employed to significantly improve the data utilization and a two-component Beta Mixture Model is adaptively utilized to reason the anomalous emotion pairs. Extensive experiments evaluated on two benchmark datasets show that the proposed SIR-HCL method can well detect the anomalous emotional pairs across audio-visual emotional data, and brings substantial improvements over the state-of-the-art competing methods. © 2016 IEEE.

Keywords: Contrastive Learning

HS-Gen: Human-Like Handwriting Synthetic Generation—A Preliminary Investigation

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Bensalah, Asma; Parziale, Antonio; Coccaro, Rosanna; Marcelli, Angelo; Fornés, Alicia; Lladós, Josep. Affiliations: Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain; Computer Science Department, Universitat Autònoma de Barcelona, Barcelona, Spain; DIEM, University of Salerno, Via Giovanni Paolo II 132, Fisciano (SA) 84084, Italy; AI3S Unit, CINI National Laboratory of Artificial Intelligence and Intelligent Systems, University of Salerno, Fisciano (SA), Italy

ISBN: (print) 9783031876592

The application of handwriting analysis in the health field for early detection and diagnosis is limited by a lack of data, which presents a significant challenge for the implementation of deep learning-based models. To address this issue, numerous studies have focused on generating synthetic data. Current methods for generating synthetic handwriting from offline images, including generative models and geometrical techniques, fail to account for the kinematics of handwriting movements, which is critical for achieving more human-like results. To address this limitation, we propose a novel, human-like approach for the synthetic generation of handwriting or drawing from offline images. This method creates new samples by incorporating a trajectory recovery algorithm and a human-like time law generator, as well as the extraction and manipulation of kinematic parameters. The evaluation of the proposed method from both visual and kinematic perspectives demonstrates its potential applicability across a wide range of devices and handwriting styles. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Deep learning

Multi-modal classification of forest biodiversity potential from 2D orthophotos and 3D airborne laser scanning point clouds

arXiv, 2025

Authors: Jensen, Simon B.; Oehmcke, Stefan; Møgelmose, Andreas; Madadi, Meysam; Igel, Christian; Escalera, Sergio; Moeslund, Thomas B. Affiliations: Visual Analysis and Perception Laboratory, Aalborg University, Denmark; Pioneer Centre for Artificial Intelligence, Denmark; Department of Computer Science, Copenhagen University, Denmark; Institute for Visual & Analytic Computing, Rostock University, Germany; University of Barcelona and Computer Vision Center, Spain

Accurate assessment of forest biodiversity is crucial for ecosystem management and conservation. While traditional field surveys provide high-quality assessments, they are labor-intensive and spatially limited. This study investigates whether deep learning-based fusion of close-range sensing data from 2D orthophotos (12.5 cm resolution) and 3D airborne laser scanning (ALS) point clouds (8 points/m²) can enhance biodiversity assessment. We introduce the BioVista dataset, comprising 44 378 paired samples of orthophotos and ALS point clouds from temperate forests in Denmark, designed to explore multi-modal fusion approaches for biodiversity potential classification. Using deep neural networks (ResNet for orthophotos and PointVector for ALS point clouds), we investigate each data modality's ability to assess forest biodiversity potential, achieving mean accuracies of 69.4% and 72.8%, respectively. We explore two fusion approaches: a confidence-based ensemble method and a feature-level concatenation strategy, with the latter achieving a mean accuracy of 75.5%. Our results demonstrate that spectral information from orthophotos and structural information from ALS point clouds effectively complement each other in forest biodiversity assessment. © 2025, CC BY.
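The abstract above names two fusion strategies without detailing them. The sketch below shows one plausible reading of each on toy numbers: a confidence-based ensemble that keeps the prediction of whichever model is more confident, and a feature-level concatenation that joins both feature vectors before a single linear classifier. This is an assumption-laden illustration, not the paper's implementation; the function names, the two-class setup, the tie-breaking rule, and the weights are all hypothetical.

```python
def confidence_ensemble(probs_a, probs_b):
    """Return the class predicted by whichever model is more confident.

    Assumed rule: confidence is the largest class probability;
    ties go to the first model.
    """
    chosen = probs_a if max(probs_a) >= max(probs_b) else probs_b
    return chosen.index(max(chosen))


def concat_fusion(features_a, features_b, weights, bias):
    """Concatenate both feature vectors, apply one linear layer, return argmax."""
    fused = features_a + features_b  # list concatenation, not element-wise sum
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return logits.index(max(logits))


# Hypothetical class probabilities from an image model and a point-cloud model.
probs_image = [0.55, 0.45]
probs_points = [0.20, 0.80]
print(confidence_ensemble(probs_image, probs_points))

# Hypothetical 2-dimensional features per modality and a toy 2-class linear layer.
features_image = [1.0, 0.0]
features_points = [0.0, 2.0]
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.0]
print(concat_fusion(features_image, features_points, weights, bias))
```

The practical difference is where the modalities meet: the ensemble combines finished predictions, while concatenation lets one classifier weigh spectral and structural features jointly.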

FAA-CLIP: Federated Adversarial Adaptation of CLIP

IEEE Internet of Things Journal, 2025

Authors: Wu, Yihang; Chaddad, Ahmad; Desrosiers, Christian; Daqqaq, Tareef; Kateb, Reem. Affiliations: Guilin University of Electronic Technology, Artificial Intelligence for Personalised Medicine, School of Artificial Intelligence, Guilin, China; Ecole de Technologie Superieure, The Laboratory for Imagery, Vision and Artificial Intelligence, Montreal, Canada; Taibah University, College of Medicine, Al Madinah, Saudi Arabia; Ministry of National Guard Health Affairs, Prince Mohammed Bin Abdulaziz Hospital, Al-Madinah, Saudi Arabia; Taibah University and College of Computer Science and Engineering, Jeddah University, College of Computer Science and Engineering, Cyber Security Department, Jeddah, Saudi Arabia

Despite the remarkable performance of vision language models (VLMs) such as Contrastive Language Image Pre-training (CLIP), the large size of these models is a considerable obstacle to their use in federated learning (FL) systems where the parameters of local client models need to be transferred to a global server for aggregation. Another challenge in FL is the heterogeneity of data from different clients, which affects the generalization performance of the solution. In addition, natural pre-trained VLMs exhibit poor generalization ability in the medical datasets, suggesting that a domain gap exists. To solve these issues, we introduce a novel method for the Federated Adversarial Adaptation (FAA) of CLIP. Our method, named FAA-CLIP, handles the large communication costs of CLIP using a light-weight feature adaptation module (FAM) for aggregation, effectively adapting this VLM to each client's data while greatly reducing the number of parameters to transfer. By keeping CLIP frozen and only updating the FAM parameters, our method is also computationally efficient. Unlike existing approaches, our FAA-CLIP method directly addresses the problem of domain shifts across clients via a domain adaptation (DA) module. This module employs a domain classifier to predict if a given sample is from the local client or the global server, allowing the model to learn domain-invariant representations. Extensive experiments on six different datasets containing both natural and medical images demonstrate that FAA-CLIP can generalize well on both natural and medical datasets compared to recent FL approaches. Our codes are available at https://***/AIPMLab/FAA-CLIP. © 2014 IEEE.

Keywords: Federated learning

A Multi-Agent Digital Twin Framework for AI-Driven Fitness Coaching

Proceedings of the 2025 ACM International Conference on Interactive Media Experiences

Authors: Monica (Monireh) Vahdati; Kamran Gholizadeh HamlAbadi; Fedwa Laamarti; Abdulmotaleb El Saddik. Affiliations: Multimedia Communications Research Laboratory (MCRLab), School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada; Computer Vision Department, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates and Multimedia Communications Research Laboratory (MCRLab), School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada

Zero-Shot Audio Captioning Using Soft and Hard Prompts

IEEE Transactions on Audio, Speech and Language Processing, 2025, Vol. 33, pp. 2045-2058

Authors: Yiming Zhang; Xuenan Xu; Ruoyi Du; Haohe Liu; Yuan Dong; Zheng-Hua Tan; Wenwu Wang; Zhanyu Ma. Affiliations: Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K.; Department of Electronic Systems, Aalborg University, Aalborg, Denmark

In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test set from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these models often suffer from performance degradation in cross-domain scenarios, i.e., when the input audio comes from a different domain than the training set, and this issue has received little attention. To address these issues, we propose a new zero-shot method for audio captioning. Our method is built on the contrastive language-audio pre-training (CLAP) model. During training, the model reconstructs the ground-truth caption using the CLAP text encoder. In the inference stage, the model generates text descriptions from the CLAP audio embeddings of given audio inputs. To enhance the ability of the model in transitioning from text-to-text generation to audio-to-text generation, we propose to use the mixed-augmentations-based soft prompt to learn more robust latent representations, leveraging instance replacement and embedding augmentation. Additionally, we introduce the retrieval-based acoustic-aware hard prompt to improve the cross-domain performance of the model by employing the domain-agnostic label information of sound events. Extensive experiments on AudioCaps and Clotho benchmarks show the effectiveness of our proposed method, which outperforms other zero-shot audio captioning approaches for in-domain scenarios and outperforms the compared methods for cross-domain scenarios, underscoring the generalization ability of our method.

Keywords: Training; Decoding; Semantics; Data models; Acoustics; Electronic mail; Benchmark testing; Transformers; Robustness; Perturbation methods

Online Self-distillation and Self-modeling for 3D Brain Tumor Segmentation

IEEE Journal of Biomedical and Health Informatics, 2025, Vol. PP, pp. PP

Authors: Pang, Yan; Li, Yunhao; Huang, Teng; Liang, Jiaming; Wang, Zhen; Dong, Changyu; Kuang, Dongyang; Hu, Ying; Chen, Hao; Lei, Tim; Wang, Qiong. Affiliations: The Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; The School of Artificial Intelligence, Guangzhou University, China; The Zhejiang Lab, Hangzhou, China; Sun Yat-sen University, China; The Department of Computer Science and Engineering, The Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, China; The Department of Electrical Engineering, University of Colorado Denver, United States

In the specialized domain of brain tumor segmentation, supervised segmentation approaches are hindered by the limited availability of high-quality labeled data, a condition arising from data privacy concerns, significant costs, and ethical issues. In response to this challenge, this paper presents a training framework that adeptly integrates a plug-and-play component, MOD, into current supervised learning models, boosting their efficacy in scenarios with limited data. The MOD consists of an Online Tokenizer and a Dense Predictor, which employs self-distillation and self-modeling on masked patches, promoting swift convergence and efficient representation learning. During the inference phase, the plug-and-play MOD component is excluded, preserving the computational efficiency of the original model without incurring extra processing costs. We substantiated the value of our approach through experiments on leading 3D brain tumor segmentation baselines. Remarkably, models augmented with the MOD consistently showcased superior results, achieving elevated Dice coefficients and HD95 scores on two datasets: BraTS 2021 and MSD 2019 Task-01 Brain Tumor. Code: https://***/aigzhusmart/MOD © 2013 IEEE.

Keywords: Self-supervised learning

A generalized Hermite pyramid for ultrasonic image analysis .2. Gallstone

Ultrasonic Imaging, 1996, Vol. 18, No. 4, pp. 305-327

Authors: Venkatesh, YV. Affiliations: Computer Vision and Artificial Intelligence Laboratory, Department of Electrical Engineering, Indian Institute of Science

This paper deals with the problem of extracting information regarding the chemical composition of stones in the human gallbladder from in vitro and in vivo B-scan ultrasonic images. The images are subjected to the Hermite pyramid decomposition technique described in Part I (Venkatesh, Y. V., Ultrasonic Imaging, 18, 261-301, 1996). In an attempt to determine the chemical composition of the gallstones, the gradients of the decomposed images are input to an unsupervised classifier. The outputs of the classifier exhibit some interesting patterns that appear to be related to the chemical composition of the gallstones contained in these images. (C) 1996 Academic Press.

Keywords: gallstones; Hermite polynomials; image decomposition; medical imaging; multiscale image analysis; ultrasonic image analysis; unsupervised classification
