Search Results - Inner Mongolia University Library

Correlation-aware Cross-modal Attention Network for Fashion Compatibility Modeling in UGC Systems

ACM Transactions on Multimedia Computing, Communications, and Applications (year listed: 1000)

Authors: Kai Cui, Shenghao Liu, Wei Feng, Xianjun Deng, Liangbin Gao, Minmin Cheng, Hongwei Lu, Laurence T. Yang

Affiliations: Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, China; China Nuclear Power Operation Technology Corporation Ltd., China

Empowered by the continuous integration of social multimedia and artificial intelligence, the application scenarios of information retrieval (IR) are becoming increasingly diversified and personalized. Currently, User-Generated Content (UGC) systems have great potential to handle the interactions between large-scale users and massive media contents. As an emerging multimedia IR task, Fashion Compatibility Modeling (FCM) aims to predict the matching degree of each given outfit and provide complementary item recommendations for user queries. Although existing studies have made promising progress on the FCM task from a multimodal perspective, they still fail to fully leverage the interactions between multimodal information or ignore the contextual item-item connections within an outfit. In this paper, a novel fashion compatibility modeling scheme is proposed based on a Correlation-aware Cross-modal Attention Network. To tackle these issues, our work mainly focuses on enhancing comprehensive multimodal representations of fashion items by integrating cross-modal collaborative contents and uncovering contextual correlations. Since the multimodal information of fashion items can deliver various semantic clues from multiple aspects, a modality-driven collaborative learning module is presented to explicitly model the interactions of modal consistency and complementarity via a co-attention mechanism. Considering the rich connections among the items in each outfit as contextual cues, a correlation-aware information aggregation module is further designed to adaptively capture significant item-item correlations within an outfit for characterizing content-aware outfit representations. Experiments conducted on two real-world fashion datasets demonstrate the superiority of our approach over state-of-the-art methods.

Keywords: User-Generated Contents; fashion intelligent analysis; complementary compatibility modeling; multimodal representation learning; correlation-aware integration strategy
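The co-attention idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' network: the function names (`attend`, `co_attention`), the scaled dot-product scoring, and the toy feature vectors are all assumptions chosen for illustration. Each visual vector attends over the textual vectors and vice versa, so every modality receives a context vector built from the other one.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(queries, others):
    """For each query vector, return the attention-weighted average of `others`.

    Scores are scaled dot products (an assumption; the abstract does not
    specify the scoring function).
    """
    scale = math.sqrt(len(queries[0]))
    contexts = []
    for q in queries:
        weights = softmax([dot(q, o) / scale for o in others])
        contexts.append(
            [sum(w * o[d] for w, o in zip(weights, others)) for d in range(len(q))]
        )
    return contexts


def co_attention(visual, textual):
    """Hypothetical co-attention: each modality attends over the other."""
    return attend(visual, textual), attend(textual, visual)


# Toy features: two visual vectors and two textual vectors (made up).
visual = [[1.0, 0.0], [0.0, 1.0]]
textual = [[4.0, 0.0], [0.0, 4.0]]
visual_ctx, textual_ctx = co_attention(visual, textual)
```

In this toy input the first visual vector aligns with the first textual vector, so its context vector is dominated by that textual vector.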

Interpreting Deep Forest through Feature Contribution and MDI Feature Importance

ACM Transactions on Knowledge Discovery from Data (year listed: 1000)

Authors: Yi-Xiao He, Shen-Huan Lyu, Yuan Jiang

Affiliations: National Key Laboratory for Novel Software Technology and School of Artificial Intelligence, Nanjing University, China; Key Laboratory of Water Big Data Technology of Ministry of Water Resources and College of Computer Science and Software Engineering, Hohai University, China

Deep forest is a non-differentiable deep model that has achieved impressive empirical success across a wide variety of applications, especially on categorical/symbolic or mixed modeling tasks. Many of the application fields prefer explainable models, such as random forests with feature contributions that can provide a local explanation for each prediction, and Mean Decrease Impurity (MDI) that can provide global feature importance. However, deep forest, as a cascade of random forests, possesses interpretability only at the first layer. From the second layer on, many of the tree splits occur on the new features generated by the previous layer, which makes existing explaining tools for random forests inapplicable. To disclose the impact of the original features in the deep layers, we design a calculation method with an estimation step followed by a calibration step for each layer, and propose our feature contribution and MDI feature importance calculation tools for deep forest. Experimental results on both simulated data and real-world data verify the effectiveness of our methods.

Keywords: deep forest; feature importance; interpretability
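For readers unfamiliar with Mean Decrease Impurity, the following sketch computes standard MDI for a single hand-built decision tree using Gini impurity. It covers only the classical single-tree measure that the abstract refers to, not the estimation and calibration steps the authors propose for deep forest. The `Node` class, the `mdi` function, and the toy tree are hypothetical names and data introduced here.

```python
class Node:
    """Tree node holding the class counts of the training samples that reach it."""

    def __init__(self, counts, feature=None, left=None, right=None):
        self.counts = counts      # e.g. (5, 5) = five samples of each class
        self.feature = feature    # index of the split feature; None for a leaf
        self.left = left
        self.right = right


def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def mdi(root, n_features):
    """Sum, per feature, the sample-weighted impurity decrease of its splits."""
    total = sum(root.counts)
    importance = [0.0] * n_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None:
            continue  # leaves do not split, so they contribute nothing
        n = sum(node.counts)
        n_left, n_right = sum(node.left.counts), sum(node.right.counts)
        decrease = (
            gini(node.counts)
            - (n_left / n) * gini(node.left.counts)
            - (n_right / n) * gini(node.right.counts)
        )
        importance[node.feature] += (n / total) * decrease
        stack.extend([node.left, node.right])
    return importance


# Toy tree: the root splits on feature 0, its left child splits on feature 1.
left = Node((5, 1), feature=1, left=Node((5, 0)), right=Node((0, 1)))
root = Node((5, 5), feature=0, left=left, right=Node((0, 4)))
importance = mdi(root, n_features=2)  # roughly [0.333, 0.167]
```

In a random forest the same quantity is usually averaged over all trees. Because every leaf in the toy tree is pure, the importances sum to the root impurity of 0.5.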

A Relation-Constraint Link Prediction Model for Dynamic Knowledge Graphs with Entity Drift

ACM Transactions on Knowledge Discovery from Data (year listed: 1000)

Authors: Xiulin Zheng, Peipei Li, Zan Zhang, Jia Wu, Xindong Wu

Affiliations: Key Laboratory of Knowledge Engineering with Big Data (the Ministry of Education of China), Hefei University of Technology, China and School of Computer Science and Information Engineering, Hefei University of Technology, China; School of Computing, Faculty of Science and Engineering, Macquarie University, Australia

Knowledge Graphs (KGs) often suffer from incompleteness, and this issue motivates the task of Knowledge Graph Completion (KGC). Traditional KGC models mainly concentrate on static KGs with a fixed set of entities and relations, or on dynamic KGs with temporal characteristics, and they falter in generalizing to constantly evolving KGs with possibly irregular entity drift. Thus, in this paper, we propose a novel embedding-based link prediction model, termed DCEL, to handle the incompleteness of KGs with entity drift. Unlike traditional link prediction, DCEL can generate precise embeddings for drifted entities without imposing any regular temporal characteristic. A drifted entity is added to the KG, and its links to existing entities are predicted in an incremental fashion, with no need to retrain the whole KG, for computational efficiency. The DCEL model takes full advantage of unstructured textual descriptions and is composed of four modules, namely MRC (Machine Reading Comprehension), RCAA (Relation Constraint Attentive Aggregator), RSA (Relation Specific Alignment) and RCEO (Relation Constraint Embedding Optimization). Specifically, the MRC module is first employed to extract short texts from long and redundant descriptions. Then, RCAA is used to aggregate the embeddings of the textual description of a drifted entity and the pre-trained word embeddings learned from a corpus into a single text-based entity embedding while shielding the impact of noise and irrelevant information. After that, RSA is applied to align the text-based entity embedding to the graph-based space to obtain the corresponding graph-based entity embedding, and then the learned embeddings are fed into a gate structure to be optimized based on the RCEO to improve the accuracy of representation learning. Finally, the graph-based model TransE is used to perform link prediction for drifted entities. Extensive experiments conducted on benchmark datasets in terms of evaluat

Keywords: dynamic knowledge graph; entity drift; link prediction; relation constraint
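The final step in the abstract above, link prediction with TransE, can be sketched as follows. TransE scores a triple (head, relation, tail) by the distance between head + relation and tail, where a smaller distance means a more plausible link. The sketch assumes the L2 norm (TransE can also use L1). The function names and toy embeddings are hypothetical, and none of the DCEL modules (MRC, RCAA, RSA, RCEO) are modeled here.

```python
import math


def transe_distance(head, relation, tail):
    """L2 distance between (head + relation) and tail; lower is more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Return candidate entity names sorted from most to least plausible tail."""
    return sorted(
        candidates,
        key=lambda name: transe_distance(head, relation, candidates[name]),
    )


# Toy 2-dimensional embeddings (made up for illustration).
head = [1.0, 0.0]
relation = [0.0, 1.0]
candidates = {
    "a": [1.0, 1.0],  # exactly head + relation, distance 0
    "b": [3.0, 1.0],  # distance 2
    "c": [0.0, 0.0],  # distance sqrt(2)
}
ranking = rank_tails(head, relation, candidates)  # ["a", "c", "b"]
```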

MAINet: Modality-Aware Interaction Network for Medical Image Fusion

ACM Transactions on Multimedia Computing, Communications, and Applications (year listed: 1000)

Authors: Lisi Wei, Libo Zhao, Xiaoli Zhang

Affiliations: College of Computer Science and Technology, Jilin University, China; College of Artificial Intelligence and Big Data, Hulunbuir University, China and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China; College of Computer Science and Technology, Jilin University, China and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China

Due to the limitations of imaging sensors, obtaining a medical image that simultaneously captures both functional metabolic data and structural tissue details remains a significant challenge in clinical diagnosis. To address this, Multimodal Medical Image Fusion (MMIF) has emerged as an effective technique for integrating complementary information from multimodal source images, such as CT, PET, and SPECT, which is critical for providing a comprehensive understanding of both anatomical and functional aspects of the human body. One of the key challenges in MMIF is how to exchange and aggregate this multimodal information. This paper rethinks MMIF by addressing the harmony of modality gaps and proposes a novel Modality-Aware Interaction Network (MAINet), which leverages cross-modal feature interaction and progressively fuses multiple features in graph space. Specifically, we introduce two key modules: the Cascade Modality Interaction (CMI) module and the Dual-Graph Learning (DGL) module. The CMI module, integrated within a multi-scale encoder with triple branches, facilitates complementary multimodal feature learning and provides beneficial feedback to enhance discriminative feature learning across modalities. In the decoding process, the DGL module aggregates hierarchical features in two distinct graph spaces, enabling global feature interactions. Moreover, the DGL module incorporates a bottom-up guidance mechanism, where deeper semantic features guide the learning of shallower detail features, thus improving the fusion process by enhancing both scale diversity and modality awareness for visual fidelity results. Experimental results on medical image datasets demonstrate the superiority of the proposed method over existing fusion approaches in both subjective and objective evaluations. We also validated the performance of the proposed method in applications such as infrared-visible image fusion and medical image segmentation.

Keywords: Multimodal fusion; Medical image fusion; Cascade modality interaction; Modality-awareness; Graph convolutional network

Contrastive Learning based Speech Spoofing Detection for Multimedia Security in Edge Intelligence

ACM Transactions on Multimedia Computing, Communications, and Applications (year listed: 1000)

Authors: Jiaqi Sun, Xianjun Deng, Shenghao Liu, Xiaoxuan Fan, Yongling Huang, Yuanyuan He, Celimuge Wu, Jong Hyuk Park

Affiliations: Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, China; The University of Electro-Communications, Japan; Seoul National University of Science and Technology, South Korea

Artificial intelligence (AI) empowered edge computing has given rise to a new paradigm and effectively facilitated the promotion and development of multimedia applications. The speech assistant is one of the significant services provided by multimedia applications, which aims to offer intelligent interactive experiences between humans and machines. However, malicious attackers may exploit spoofed speeches to deceive speech assistants, posing great challenges to the security of multimedia applications. The limited resources of multimedia terminal devices hinder their ability to effectively load speech spoofing detection models. Furthermore, processing and analyzing speech in the cloud can result in poor real-time performance and potential privacy risks. Existing speech spoofing detection methods rely heavily on annotated data and exhibit poor generalization capabilities for unseen spoofed speeches. To address these challenges, this paper first proposes the Coordinate Attention Network (CA2Net) that consists of coordinate attention blocks and Res2Net blocks. CA2Net can simultaneously extract temporal and spectral speech feature information and represent multi-scale speech features at a granularity level. Besides, a contrastive learning-based speech spoofing detection framework named GEMINI is proposed. GEMINI can be effectively deployed on edge nodes and autonomously learn speech features with strong generalization capabilities. GEMINI first performs data augmentation on speech signals and extracts conventional acoustic features to enhance the feature robustness. Subsequently, GEMINI utilizes the proposed CA2Net to further explore the discriminative speech features. Then, a tensor-based multi-attention comparison model is employed to maximize the consistency between speech contexts. GEMINI continuously updates CA2Net with contrastive learning, which enables CA2Net to effectively represent speech signals and accurately detect spoofed speeches. Extensive experiments on

Keywords: Edge intelligence; Multimedia applications; Speech spoofing detection; Contrastive learning; Coordinate attention
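The abstract above does not say which contrastive objective GEMINI uses. As a generic illustration of contrastive learning only, the sketch below implements an InfoNCE-style loss, which is a common choice and an assumption here. It pulls an anchor embedding toward a positive (for example, an augmented view of the same utterance) and pushes it away from negatives. The function names, temperature value, and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-probability of picking the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the max for a stable log-sum-exp
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Toy embeddings (made up): the positive matches the anchor, the negative does not.
anchor = [1.0, 0.0]
loss_aligned = info_nce(anchor, positive=[1.0, 0.0], negatives=[[0.0, 1.0]])
loss_misaligned = info_nce(anchor, positive=[0.0, 1.0], negatives=[[1.0, 0.0]])
```

The loss is close to zero when the anchor matches its positive and large when it matches a negative instead, which is the signal an encoder would be trained on.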

Model Pruning-enabled Federated Split Learning for Resource-constrained Devices in Artificial Intelligence Empowered Edge Computing Environment

ACM Transactions on Sensor Networks (year listed: 1000)

Authors: Yongzhe Jia, Bowen Liu, Xuyun Zhang, Fei Dai, Arif Khan, Lianyong Qi, Wanchun Dou

Affiliations: State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing, China; Department of Computing, Macquarie University, Sydney, Australia; College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, China; M3S Empirical Software Engineering Research Unit, University of Oulu, Oulu, Finland; College of Computer Science and Technology, China University of Petroleum (East China) - Qingdao Campus, Qingdao, China

Distributed Collaborative Machine Learning (DCML) has emerged in artificial intelligence-empowered edge computing environments, such as the Industrial Internet of Things (IIoT), to process tremendous data generated by smart devices. However, parallel DCML frameworks require resource-constrained devices to update the entire Deep Neural Network (DNN) models and are vulnerable to reconstruction attacks. Concurrently, the serial DCML frameworks suffer from training efficiency problems due to their serial training nature. In this paper, we propose a Model Pruning-enabled Federated Split Learning framework (MP-FSL) to reduce resource consumption with a secure and efficient training scheme. Specifically, MP-FSL compresses DNN models by adaptive channel pruning and splits each compressed model into two parts that are assigned to the client and the server. Meanwhile, MP-FSL adopts a novel aggregation algorithm to aggregate the pruned heterogeneous models. We implement MP-FSL with a real FL platform to evaluate its performance. The experimental results show that MP-FSL outperforms the state-of-the-art frameworks in model accuracy by up to 1.35%, while concurrently reducing storage and computational resource consumption by up to 32.2% and 26.73%, respectively. These results demonstrate that MP-FSL is a comprehensive solution to the challenges faced by DCML, with superior performance in both reduced resource consumption and enhanced model performance.

Keywords: Federated learning; split learning; model pruning; edge computing
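The two operations named in the abstract above, channel pruning and splitting the compressed model between client and server, can be sketched in a few lines. The abstract does not give the pruning criterion or the split point, so this sketch assumes ranking channels by the L1 norm of their weights and a fixed cut index. The function names and toy weights are hypothetical. A real implementation would also have to shrink the next layer's input channels, which is ignored here.

```python
def prune_channels(layer, keep_ratio):
    """Keep the channels with the largest L1 weight norm, in original order.

    `layer` is a list of channels and each channel is a list of weights.
    """
    n_keep = max(1, int(len(layer) * keep_ratio))
    ranked = sorted(
        range(len(layer)),
        key=lambda i: sum(abs(w) for w in layer[i]),
        reverse=True,
    )
    kept = sorted(ranked[:n_keep])
    return [layer[i] for i in kept]


def split_model(layers, cut):
    """Split a list of layers into a client part and a server part."""
    return layers[:cut], layers[cut:]


# Toy model with two layers (weights made up for illustration).
model = [
    [[0.1, -0.1], [2.0, 1.0], [0.0, 0.05], [-1.5, 0.5]],  # 4 channels
    [[1.0], [0.01]],                                      # 2 channels
]
pruned = [prune_channels(layer, keep_ratio=0.5) for layer in model]
client_part, server_part = split_model(pruned, cut=1)
```

Here the first pruned layer stays on the client and the second goes to the server, so the client holds only half of its original channels.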
