Search Results - Inner Mongolia University Library

2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Shiri, Fatemeh; Guo, Xiao-Yu; Far, Mona Golestan; Yu, Xin; Haffari, Gholamreza; Li, Yuan-Fang

Affiliations: Department of Data Science & AI, Monash University, Australia; Australian Institute for Machine Learning, University of Adelaide, Australia; School of Electrical Engineering and Computer Science, University of Queensland, Australia

ISBN: (print) 9798891761643

Large Multimodal Models (LMMs) have achieved strong performance across a range of vision and language tasks. However, their spatial reasoning capabilities are under-investigated. In this paper, we construct a novel VQA dataset, Spatial-MM, to comprehensively study LMMs' spatial understanding and reasoning capabilities. Our analyses on object-relationship and multi-hop reasoning reveal several important findings. Firstly, bounding boxes and scene graphs, even synthetic ones, can significantly enhance LMMs' spatial reasoning. Secondly, LMMs struggle more with questions posed from the human perspective than the camera perspective about the image. Thirdly, chain of thought (CoT) prompting does not improve model performance on complex multi-hop questions involving spatial relations. Lastly, our perturbation analysis on GQA-spatial reveals that LMMs are much stronger at basic object detection than complex spatial reasoning. We believe our new benchmark dataset and in-depth analyses can spark further research on LMMs' spatial reasoning. © 2024 Association for Computational Linguistics.

Keywords: Computational linguistics

Pairwise Alignment Improves Graph Domain Adaptation

41st International Conference on Machine Learning, ICML 2024

Authors: Liu, Shikun; Zou, Deyu; Zhao, Han; Li, Pan

Affiliations: Department of Electrical and Computer Engineering, Georgia Institute of Technology, GA, United States; School of Data Science, University of Science and Technology of China, Hefei, China; Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, United States

Graph-based methods, pivotal for label inference over interconnected objects in many real-world applications, often encounter generalization challenges if the graph used for model training differs significantly from the graph used for testing. This work delves into Graph Domain Adaptation (GDA) to address the unique complexities of distribution shifts over graph data, where interconnected data points experience shifts in features, labels, and in particular, connecting patterns. We propose a novel, theoretically principled method, Pairwise Alignment (Pair-Align), to counter graph structure shift by mitigating conditional structure shift (CSS) and label shift (LS). Pair-Align uses edge weights to recalibrate the influence among neighboring nodes to handle CSS and adjusts the classification loss with label weights to handle LS. Our method demonstrates superior performance in real-world applications, including node classification with region shift in social networks and the pileup mitigation task in particle colliding experiments. For the first application, we also curate the largest dataset to date for GDA studies. Our method also shows strong performance on synthetic and other existing benchmark datasets. Copyright 2024 by the author(s)

Keywords: Graph theory
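
The label-shift half of Pair-Align adjusts the classification loss with label weights. Below is a minimal sketch of that general idea, assuming the target label distribution is already known or estimated (the abstract does not say how Pair-Align estimates it, and its edge-weight correction for conditional structure shift is not reproduced). Each source label y gets the importance weight w[y] = p_target(y) / p_source(y), and the cross-entropy term of every labelled node is scaled by that weight. The function names are hypothetical.

```python
import math
from collections import Counter


def label_weights(source_labels, target_label_dist):
    """Importance weights w[y] = p_target(y) / p_source(y) for label shift.

    target_label_dist is assumed to be given (estimated elsewhere);
    this sketch does not estimate it.
    """
    counts = Counter(source_labels)
    n = len(source_labels)
    return {y: target_label_dist.get(y, 0.0) / (c / n) for y, c in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w[y] * log p(y | node) over labelled source nodes.

    probs[i][y] is the model's predicted probability of class y for node i.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)
```

For source labels [0, 0, 0, 1] and a balanced target distribution, class 0 gets weight 2/3 and class 1 gets weight 2, so the class that is rarer in the source graph counts more in the loss.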

Oppositional Harris Hawks Optimization with Deep Learning-Based Image Captioning

Computer Systems Science & Engineering, 2023, Vol. 44, No. 1, pp. 579-593

Authors: V.R. Kavitha; K. Nimala; A. Beno; K.C. Ramya; Seifedine Kadry; Byeong-Gwon Kang; Yunyoung Nam

Affiliations: Department of Computer Science and Engineering, Prathyusha Engineering College, Thiruvallur 602025, India; Department of Networking and Communications, SRM Institute of Science and Technology, Chennai, India; Department of Electronics and Communication Engineering, Dr. Sivanthi Aditanar College of Engineering, Tiruchendur 628215, India; Department of Electrical and Electronics Engineering, Sri Krishna College of Engineering and Technology, Coimbatore 641008, India; Department of Applied Data Science, Noroff University College, Kristiansand, Norway; Department of Information and Communication Engineering, Soonchunhyang University, Asan, Korea; Department of Computer Science and Engineering, Soonchunhyang University, Asan, Korea

Image captioning is an emerging topic of research in the domain of artificial intelligence (AI). It integrates computer vision (CV) and natural language processing (NLP) to generate the image *** use in several application areas, namely recommendation in editing applications, utilization in virtual assistance, *** development of NLP and deep learning (DL) models helps bridge visual details and textual *** this view, this paper introduces an Oppositional Harris Hawks Optimization with Deep Learning-based Image Captioning (OHHO-DLIC) *** OHHO-DLIC technique involves the design of distinct levels of ***, the feature extraction of the images is carried out using the EfficientNet ***, the image captioning is performed by a bidirectional long short-term memory (BiLSTM) model comprising an encoder as well as a *** last, an oppositional Harris Hawks optimization (OHHO)-based hyperparameter tuning process is performed to effectively adjust the hyperparameters of the EfficientNet and BiLSTM *** experimental analysis of the OHHO-DLIC technique is carried out on the Flickr 8k dataset, and a comprehensive comparative analysis highlights its better performance over recent approaches.

Keywords: Image captioning; natural language processing; artificial intelligence; machine learning; deep learning
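
OHHO tunes the EfficientNet and BiLSTM hyperparameters with an opposition-based variant of Harris Hawks Optimization. The sketch below shows only the generic opposition-based learning step, assuming a box-bounded search space: each candidate x is mirrored to lower + upper - x, and the fittest half of the combined set is kept. The Harris Hawks update rules and the paper's actual fitness function are not shown, and the function names are hypothetical.

```python
def opposite(x, lower, upper):
    """Opposite point: x~_j = lower_j + upper_j - x_j for every dimension j."""
    return [lo + hi - xj for xj, lo, hi in zip(x, lower, upper)]


def opposition_step(population, lower, upper, fitness):
    """Keep the fittest len(population) points among the population and its opposites.

    Minimisation is assumed: a lower fitness value is better.
    """
    candidates = population + [opposite(x, lower, upper) for x in population]
    candidates.sort(key=fitness)  # stable sort, best candidates first
    return candidates[: len(population)]
```

For example, with bounds [0, 1] per dimension, the point [0.9, 0.8] is mirrored to about [0.1, 0.2]; under a sphere objective (sum of squares, used here only as a stand-in fitness) the mirrored point is kept in place of the original.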

DAMPER: A Dual-Stage Medical Report Generation Framework with Coarse-Grained MeSH Alignment and Fine-Grained Hypergraph Matching

39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025

Authors: Huang, Xiaofei; Chen, Wenting; Liu, Jie; Lu, Qisheng; Luo, Xiaoling; Shen, Linlin

Affiliations: Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, China; Guangdong Provincial Key Laboratory of Intelligent Information Processing, China

ISBN: (print) 157735897X

Medical report generation is crucial for clinical diagnosis and patient management, summarizing diagnoses and recommendations based on medical imaging. However, existing work often overlooks the clinical pipeline involved in report writing, where physicians typically conduct an initial quick review followed by a detailed examination. Moreover, current alignment methods may lead to misaligned relationships. To address these issues, we propose DAMPER, a dual-stage framework for medical report generation that mimics the clinical pipeline of report writing in two stages. The first stage, MeSH-Guided Coarse-Grained Alignment (MCG), aligns chest X-ray (CXR) image features with medical subject headings (MeSH) features to generate a rough keyphrase representation of the overall impression. The second stage, Hypergraph-Enhanced Fine-Grained Alignment (HFG), constructs hypergraphs for image patches and report annotations, models high-order relationships within each modality, and performs hypergraph matching to capture semantic correlations between image regions and textual phrases. Finally, the coarse-grained visual features, generated MeSH representations, and visual hypergraph features are fed into a report decoder to produce the final medical report. Extensive experiments on public datasets demonstrate the effectiveness of DAMPER in generating comprehensive and accurate medical reports, outperforming state-of-the-art methods across various evaluation metrics. Copyright © 2025, Association for the Advancement of Artificial Intelligence (***). All rights reserved.

Keywords: Diagnosis

Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech

39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025

Authors: Liu, Rui; He, Shuwei; Hu, Yifan; Li, Haizhou

Affiliations: Inner Mongolia University, China; Shenzhen Research Institute of Big Data, School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; Department of Electrical and Computer Engineering, National University of Singapore, Singapore

ISBN: (print) 157735897X

Visual Text-to-Speech (VTTS) aims to take the environmental image as the prompt to synthesize reverberant speech for the spoken content. The challenge of this task lies in understanding the spatial environment from the image. Many attempts have been made to extract global spatial visual information from the RGB space of a spatial image. However, local and depth image information, which previous works have ignored, are crucial for understanding the spatial environment. To address these issues, we propose a novel multi-modal and multi-scale spatial environment understanding scheme to achieve immersive VTTS, termed M2SE-VTTS. The multi-modal component takes both the RGB and Depth spaces of the spatial image to learn more comprehensive spatial information, and the multi-scale component models local and global spatial knowledge simultaneously. Specifically, we first split the RGB and Depth images into patches and adopt Gemini-generated environment captions to guide the local spatial understanding. After that, the multi-modal and multi-scale features are integrated by the local-aware global spatial understanding. In this way, M2SE-VTTS effectively models the interactions between local and global spatial contexts in the multi-modal spatial environment. Objective and subjective evaluations suggest that our model outperforms advanced baselines in environmental speech generation. Copyright © 2025, Association for the Advancement of Artificial Intelligence (***). All rights reserved.

Keywords:

Knowledge Graphs Can be Learned with Just Intersection Features

41st International Conference on Machine Learning, ICML 2024

Authors: Le, Duy; Zhong, Shaochen; Liu, Zirui; Xu, Shuai; Chaudhary, Vipin; Zhou, Kaixiong; Xu, Zhaozhuo

Affiliations: Department of Computer and Data Sciences, Case Western Reserve University, United States; Department of Computer Science, Rice University, United States; Department of Electrical and Computer Engineering, North Carolina State University, United States; Department of Computer Science, Stevens Institute of Technology, United States

Knowledge Graphs (KGs) are potent frameworks for knowledge representation and reasoning. Nevertheless, KGs are inherently incomplete, leaving numerous uncharted relationships and facts awaiting discovery. Deep learning methodologies have proven effective in enhancing KG completion by framing it as a link prediction task, where the goal is to discern the validity of a triple comprising a head, relation, and tail. The significance of structural information in assessing the validity of a triple within a KG is well-established. However, quantifying this structural information poses a challenge. We need to pinpoint the metric that encapsulates the structural information of a triple and smoothly incorporate this metric into the link prediction learning process. In this study, we recognize the critical importance of the intersection among the k-hop neighborhoods of the head, relation, and tail when determining the validity of a triple. To address this, we introduce a novel randomized algorithm designed to efficiently generate intersection features for candidate triples. Our experimental results demonstrate that a straightforward fully-connected network leveraging these intersection features can surpass the performance of established KG embedding models and even outperform graph neural network baselines. Additionally, we highlight the substantial training time efficiency gains achieved by our network trained on intersection features. Copyright 2024 by the author(s)

Keywords: Knowledge graph
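
The abstract builds features from the intersection of k-hop neighborhoods, computed by a randomized algorithm that it does not detail. As an illustration only, the sketch below computes exact counts |N_i(head) ∩ N_j(tail)| for hop distances 1 <= i, j <= k by breadth-first search over the entity graph. It assumes undirected edges and ignores the relation's own neighborhood, which the paper does use; the paper's randomized algorithm is not reproduced, and the function names and feature layout are hypothetical.

```python
from collections import deque


def k_hop_sets(adj, source, k):
    """Return [N_1, ..., N_k]: N_h is the set of nodes at shortest-path distance exactly h."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [{v for v, d in dist.items() if d == h} for h in range(1, k + 1)]


def intersection_features(triples, head, tail, k=2):
    """Exact |N_i(head) & N_j(tail)| for 1 <= i, j <= k (relations ignored, edges undirected)."""
    adj = {}
    for h, _relation, t in triples:
        adj.setdefault(h, set()).add(t)
        adj.setdefault(t, set()).add(h)
    head_sets = k_hop_sets(adj, head, k)
    tail_sets = k_hop_sets(adj, tail, k)
    return {
        (i + 1, j + 1): len(hs & ts)
        for i, hs in enumerate(head_sets)
        for j, ts in enumerate(tail_sets)
    }
```

In a four-entity cycle a-b-c-d-a with candidate pair (a, c), both 1-hop neighborhoods are {b, d}, so the (1, 1) feature is 2 and the other three are 0; a fully connected network would then take such a feature vector as its input.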

Generative Adversarial Networks for Spatio-Spectral Compression of Hyperspectral Images

14th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing, WHISPERS 2024

Authors: Fuchs, Martin Hermann Paul; Byju, Akshara Preethy; Walda, Alisa; Rasti, Behnood; Demir, Begum

Affiliations: Amrita School of Computing, Amrita Vishwa Vidyapeetham, Department of Computer Science and Engineering, Amritapuri, India; Technische Universität Berlin, Faculty of Electrical Engineering and Computer Science, Germany; BIFOLD - Berlin Institute for the Foundations of Learning and Data, Germany

ISBN: (print) 9798331513139

Deep learning-based hyperspectral image (HSI) compression has recently attracted great attention in remote sensing due to the growth of hyperspectral data archives. Most existing models achieve either spectral or spatial compression and do not jointly consider the spatio-spectral redundancies present in HSIs. To address this problem, we propose High Fidelity Compression (HiFiC)-based models for spatio-spectral compression of HSIs. In detail, we introduce two new models: i) HiFiC using Squeeze and Excitation (SE) blocks (denoted as HiFiC_SE); and ii) HiFiC with 3D convolutions (denoted as HiFiC_3D) in the framework of compression of HSIs. We analyze the effectiveness of HiFiC_SE and HiFiC_3D in compressing the spatio-spectral redundancies with channel attention and inter-dependency analysis. Experimental results show the efficacy of the proposed models in performing spatio-spectral compression while reconstructing images at reduced bitrates with higher reconstruction quality. The code of the proposed models is publicly available at https://***/rsim/HSI-SSC. © 2024 IEEE.

Keywords: Generative adversarial networks
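
HiFiC_SE relies on Squeeze and Excitation blocks for channel attention across spectral channels. The sketch below is the standard SE computation on a tiny feature map stored as [channel][spatial position]: average-pool each channel (squeeze), pass the pooled vector through two fully connected layers with ReLU and sigmoid (excitation), and rescale every channel by its gate. Biases are omitted and the weights are arbitrary illustrative values; the abstract does not give the layer sizes or where the blocks sit inside HiFiC_SE.

```python
import math


def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a feature map x laid out as [channel][spatial position].

    w1 is a (C/r) x C matrix and w2 a C x (C/r) matrix; biases are omitted.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling of each channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation, first layer: fully connected + ReLU (bottleneck of size C/r).
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    # Excitation, second layer: fully connected + sigmoid gives one gate per channel.
    s = [1.0 / (1.0 + math.exp(-sum(w * hj for w, hj in zip(row, h)))) for row in w2]
    # Scale: multiply every value of channel c by its gate s[c].
    return [[v * sc for v in ch] for ch, sc in zip(x, s)], s
```

With two channels x = [[1, 1, 1, 1], [0, 0, 0, 0]], w1 = [[1.0, 0.0]] and w2 = [[0.0], [1.0]], the gates are sigmoid(0) = 0.5 and sigmoid(1) ≈ 0.731, so the first channel is halved.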

Predicting Cervical Cancer Using Deep Learning Mutation-based Atom Search Optimization (MASO) Algorithm

2nd IEEE International Conference on Trends in Quantum Computing and Emerging Business Technologies, TQCEBT 2024

Authors: Mayuri, A.V.R.; Selvekumar Subash, K.; Aeron, Anurag

Affiliations: VIT Bhopal University, School of Computing Science and Engineering, Sehore, Madhya Pradesh, India; Bahir Dar University, Bahir Dar Institute of Technology, Department of Electrical and Computer Engineering, Ethiopia; Department of Data Science, Trichy, India; MIET Meerut, Department of Computer Science and Engineering, Uttar Pradesh, India

ISBN: (print) 9798350384277

Worldwide, women are affected by cervical cancer, a prevalent malignancy. This disease, currently the fourth leading cause of cancer death in women, shows no symptoms when it first arises. Cells that cause cervical cancer multiply gradually at the cervix. If this cancer is detected early enough, treatment can be effective. Presently, it is difficult for medical workers to detect this kind of cancer before it spreads rapidly. This study used a number of machine learning classification algorithms with risk markers to predict cervical cancer. It proposes mutation-based Atom Search Optimization (MASO) combined with a Deep Convolutional Neural Network (DCNN) as an original method for automated diagnosis of uterine cervical cancer. The DCNN is used to extract features from medical imaging data, while MASO is used to improve the optimization of the diagnostic model. Combined, these technologies have the potential to raise the accuracy and efficiency of cervical cancer diagnosis, offering an automated approach for early detection and prompt intervention. Our proposed model achieved a generalization accuracy of 95%. This study explores the difficulties caused by missing values and class imbalance in the specific dataset, with the goal of assisting medical professionals in the early detection of cervical cancer and improving treatment for people affected by the illness. © 2024 IEEE.

Keywords: Convolutional neural networks

DPT-tracker: Dual pooling transformer for efficient visual tracking

CAAI Transactions on Intelligence Technology, 2024, Vol. 9, No. 4, pp. 948-959

Authors: Yang Fang; Bailian Xie; Uswah Khairuddin; Zijian Min; Bingbing Jiang; Weisheng Li

Affiliations: Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, Chongqing, China; Department of Mechanical Precision Engineering, Malaysia-Japan International Institute of Technology, University of Technology Malaysia, Kuala Lumpur, Malaysia; Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea; School of Information Science and Technology, Hangzhou Normal University, Hangzhou, China

Transformer tracking always takes paired template and search images as encoder input and conducts feature extraction and target-search feature correlation by self- and/or cross-attention operations, so the model complexity grows quadratically with the number of input *** alleviate the burden of this tracking paradigm and facilitate practical deployment of Transformer-based trackers, we propose a dual pooling transformer tracking framework, dubbed DPT, which consists of three components: a simple yet efficient spatiotemporal attention model (SAM), a mutual correlation pooling Transformer (MCPT), and a multiscale aggregation pooling Transformer (MAPT). SAM is designed to gracefully aggregate temporal dynamics and spatial appearance information of multi-frame templates along space-time *** aims to capture multi-scale pooled and correlated contextual features, which is followed by MAPT that aggregates multi-scale features into a unified feature representation for tracking *** tracker achieves an AUC score of 69.5 on LaSOT and a precision score of 82.8 on TrackingNet while maintaining a shorter sequence length of attention tokens, fewer parameters, and fewer FLOPs compared to existing state-of-the-art (SOTA) Transformer tracking *** experiments demonstrate that the DPT tracker yields a strong real-time tracking baseline with a good trade-off between tracking performance and inference efficiency.

Keywords: human-computer interfacing; image motion analysis; pattern recognition; signal processing; tracking

A Global Geometric Analysis of Maximal Coding Rate Reduction

41st International Conference on Machine Learning, ICML 2024

Authors: Wang, Peng; Liu, Huikang; Pai, Druv; Yu, Yaodong; Zhu, Zhihui; Qu, Qing; Ma, Yi

Affiliations: Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, United States; Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China; Department of Electrical Engineering and Computer Science, University of California, Berkeley, United States; Department of Computer Science and Engineering, The Ohio State University, Columbus, United States; Institute of Data Science, University of Hong Kong, Hong Kong

The maximal coding rate reduction (MCR²) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures. However, it lacks a complete theoretical justification: only the properties of its global optima are known, and its global landscape has not been studied. In this work, we give a complete characterization of the properties of all its local and global optima, as well as other types of critical points. Specifically, we show that each (local or global) maximizer of the MCR² problem corresponds to a low-dimensional, discriminative, and diverse representation, and furthermore, each critical point of the objective is either a local maximizer or a strict saddle point. Such a favorable landscape makes MCR² a natural choice of objective for learning diverse and discriminative representations via first-order optimization methods. To validate our theoretical findings, we conduct extensive experiments on both synthetic and real data sets. Copyright 2024 by the author(s)

Keywords: Network architecture
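
In its standard formulation (taken from the original MCR² work, not from this abstract), the objective is ΔR(Z, Π, ε) = R(Z, ε) − R_c(Z, ε | Π), where R(Z, ε) = ½ log det(I + d/(m ε²) Z Zᵀ) is the coding rate of all m features in R^d and R_c(Z, ε | Π) = Σ_j tr(Π_j)/(2m) log det(I + d/(tr(Π_j) ε²) Z Π_j Zᵀ) is the size-weighted sum of per-class rates. The pure-Python sketch below evaluates this for hard labels, so Z Π_j Zᵀ is the sum of z zᵀ over class j and tr(Π_j) is the class size. It uses a Cholesky factorisation for the log-determinant, the function names are hypothetical, and it illustrates only the objective, not the paper's landscape analysis.

```python
import math


def logdet_spd(a):
    """log det of a symmetric positive-definite matrix via Cholesky (a = L L^T)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def coding_rate(vectors, d, eps, n):
    """0.5 * log det(I + d / (n * eps^2) * sum_i z_i z_i^T) over the given vectors."""
    alpha = d / (n * eps ** 2)
    gram = [
        [(1.0 if r == c else 0.0) + alpha * sum(z[r] * z[c] for z in vectors) for c in range(d)]
        for r in range(d)
    ]
    return 0.5 * logdet_spd(gram)


def mcr2(Z, labels, eps=0.5):
    """Delta R = R(Z) - R_c(Z | Pi) for hard labels; Z is a list of m feature vectors in R^d."""
    m, d = len(Z), len(Z[0])
    whole = coding_rate(Z, d, eps, m)  # R(Z, eps)
    compressed = 0.0                   # R_c(Z, eps | Pi)
    for c in set(labels):
        members = [z for z, y in zip(Z, labels) if y == c]
        mj = len(members)              # tr(Pi_j) for hard labels
        compressed += mj / m * coding_rate(members, d, eps, mj)
    return whole - compressed
```

With ε = 1, two classes placed on orthogonal directions give ΔR = log 2 − ½ log 3 ≈ 0.144, while putting both classes on the same direction gives ΔR = 0, in line with the abstract's statement that maximizers are discriminative and diverse.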
