Search Results - Inner Mongolia University Library

5th International Symposium on Computer Engineering and Intelligent Communications, ISCEIC 2024

Authors: Gu, Tao; Zhang, Chongyang (School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China)

ISBN: (print) 9798331518677

Recent advancements in Vision-Language Pre-training (VLP) techniques have greatly improved performance in Scene Text Detection tasks by leveraging the rich visual and textual content in scene text images. We propose an innovative integration of contrastive learning with Masked Language Modeling (MLM) and Masked Image Modeling (MIM), inspired by the Masked Autoencoder (MAE) approach. This synthesis enhances self-supervised learning by combining discriminative and generative learning. Notably, we introduce masked image modeling for text detection, enabling effective representation learning in text image regions. To distinguish our approach from traditional BERT's MLM, we develop a custom tokenizer tailored for scene text detection. Our pre-trained model aligns visual and textual information, resulting in improved performance of existing scene text detectors. Extensive experiments on datasets such as ICDAR2015 show that our method significantly outperforms previous pre-training approaches. © 2024 IEEE.

Keywords: Contrastive Learning


BiMoeFormer: A Novel Transformer-Based Framework for 3D Human Motion Prediction with Bi-Level Routing Attention and Mixture of Experts


5th International Seminar on Artificial Intelligence, Networking and Information Technology, AINIT 2024

Authors: Zhang, Wei (Nanjing University of Science and Technology, School of Computer Science and Engineering, Nanjing, China)

ISBN: (print) 9798350385557

We introduce a novel BiMoeFormer based on the Transformer architecture for 3D human motion prediction. Previous approaches primarily focus on the relationships between body joints in human poses, while neglecting their independencies. Our method addresses this by incorporating a Bi-Level Routing Attention (BRA) mechanism, which emphasizes balancing the correlation and independence among these joints. More specifically, BRA selects the top k most relevant joints for each joint, enhancing the overall modeling capability. Moreover, to accurately capture the diversity and complexity of human motion, we employ a Mixture of Experts (MoE) architecture to enhance the expressiveness of the Transformer. This allows us to control the model complexity by sharing the MoE across blocks. Comprehensive evaluations on the Human3.6M, AMASS, and 3DPW datasets demonstrate that our method consistently outperforms state-of-the-art baselines. © 2024 IEEE.
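
The top-k selection step mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the dot-product relevance score, the function name `topk_joint_attention`, and the toy joint features are all assumptions made for illustration. Each joint attends only to the k joints that score highest against it.

```python
import math

def topk_joint_attention(features, k):
    """For each joint, attend only to its k most relevant joints.

    features: list of per-joint feature vectors (lists of floats).
    Relevance is a plain dot product here, which is an assumption.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    outputs, selected = [], []
    for query in features:
        scores = [dot(query, key) for key in features]
        # Indices of the k highest-scoring joints for this query joint.
        top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        # Softmax restricted to the selected joints.
        peak = max(scores[j] for j in top)
        weights = [math.exp(scores[j] - peak) for j in top]
        total = sum(weights)
        out = [
            sum(w / total * features[j][d] for w, j in zip(weights, top))
            for d in range(len(query))
        ]
        outputs.append(out)
        selected.append(top)
    return outputs, selected

# Hypothetical 2-D features for four joints.
joints = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
attended, neighbours = topk_joint_attention(joints, k=2)
```

With these toy features, the first two joints select each other and the last two select each other, so correlated joints interact while unrelated ones stay independent.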

Keywords: Motion estimation


Skinned Motion Retargeting with Preservation of Body Part Relationships


IEEE Transactions on Visualization and Computer Graphics, 2024, Vol. PP, pp. 1-13

Authors: Zhang, Jia-Qi; Wang, Miao; Zhang, Fu-Cheng; Zhang, Fang-Lue (State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China; State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China; School of Engineering and Computer Science, Victoria University of Wellington, New Zealand)

Motion retargeting is an active research area in computer graphics and animation, allowing for the transfer of motion from one character to another, thereby creating diverse animated character data. While this technology has numerous applications in animation, games, and movies, current methods often produce unnatural or semantically inconsistent motion when applied to characters with different shapes or joint counts. This is primarily due to a lack of consideration for the geometric and spatial relationships between the body parts of the source and target characters. To tackle this challenge, we introduce a novel spatially-preserving Skinned Motion Retargeting Network (SMRNet) capable of handling motion retargeting for characters with varying shapes and skeletal structures while maintaining semantic consistency. By learning a hybrid representation of the character's skeleton and shape in a rest pose, SMRNet transfers the rotation and root joint position of the source character's motion to the target character through embedded rest pose feature alignment. Additionally, it incorporates a differentiable loss function to further preserve the spatial consistency of body parts between the source and target. Comprehensive quantitative and qualitative evaluations demonstrate the superiority of our approach over existing alternatives, particularly in preserving spatial relationships more effectively IEEE

Keywords: Semantics


Intelligent Internet of Things in Mammography Screening Using Multicenter Transformation Between Unified Capsules


IEEE Internet of Things Journal, 2023, Vol. 10, No. 2, pp. 1536-1545

Authors: Wang, Boyan; Hu, Xuegang; Zhang, Jinglin; Xu, Chenchu; Gao, Zhifan (Hefei University of Technology, School of Computer Science and Information Engineering, Hefei, China; Shandong University, School of Control Science and Engineering, Jinan 250061, China; Anhui University, School of Computer Science and Technology, Hefei 230093, China; Sun Yat-sen University, School of Biomedical Engineering, Guangzhou 510275, China)

Mammography screening is one of the important applications for the intelligent Internet of Things (IoT). Due to the efficient and personalized cyber-medicine system, early diagnosis can successfully reduce the breast cancer mortality rate by AI-driven healthcare. However, it is a huge challenge to extend the conventional single-center into the multicenter mammography screening, thus improving the effectiveness and robustness of intelligent IoT-based devices. To address this problem, we utilize multicenter mammograms by the modified capsule neural network and propose a novel framework called multicenter transformation between unified capsules (MLT-UniCaps) in this article. The proposed MLT-UniCaps is composed of Attentional Pose Embedding, Dynamic Source Capsule Traversal, and Adaptive Target Capsule Fusion to realize an intelligent remote assistant diagnosis. Attentional Pose Embedding extracts feature vectors via variations in position, orientation, scale, and lighting as the poses through an adversarial convolutional neural network with an attention-based layer. Based on the pose presentation, Dynamic Source Capsule Traversal deploys a dynamic routing mechanism between neurons to build a source cancer classifier for single-center mammography screening. Using the source cancer classifier, Adaptive Target Capsule Fusion integrates various centers of mammograms as the universal cancer detectors and optimizes heterogeneous distribution among them by the transformation-likelihood maximization. Owing to the three components, MLT-UniCaps effectively improves the results of single-center mammography screening and works in the multicenter breast cancer diagnosis. By comprehensive experiments on 58 965 samples, the proposed MLT-UniCaps obtains 90.1% of overall classification accuracy on single-center trials and 73.8% of overall F1 score on multicenter trials. All the experimental results illustrated that our MLT-UniCaps, an intelligent IoT-based clinical tool, inures the be

Keywords: Mammography


An Efficient Dialogue Policy Agent with Model-Based Causal Reinforcement Learning


31st International Conference on Computational Linguistics, COLING 2025

Authors: Xu, Kai; Wang, Zhenyu; Zhao, Yangyang; Fang, Bopeng (School of Software Engineering, South China University of Technology, Guangdong, China; Department of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China)

ISBN: (print) 9798891761964

Dialogue policy trains an agent to select dialogue actions frequently implemented via deep reinforcement learning (DRL). The model-based reinforcement methods built a world model to generate simulated data to alleviate the sample inefficiency. However, traditional world model methods merely consider one-step dialogues, leading to an inaccurate environmental simulation. Furthermore, different users may have different intention preferences, while most existing studies lack consideration of the intention-preferences causal relationship. This paper proposes a novel framework for dialogue policy learning named MCA, implemented through model-based reinforcement learning with automatically constructed causal chains. The MCA model utilizes an autoregressive Transformer to model dialogue trajectories, enabling a more accurate simulation of the environment. Additionally, it constructs a causal chains module that outputs latent preference distributions for intention-action pairs, thereby elucidating the relationship between user intentions and agent actions. The experimental results show that MCA can achieve state-of-the-art performances on three dialogue datasets over the compared dialogue agents, highlighting its effectiveness and robustness. © 2025 Association for Computational Linguistics.

Keywords: Deep reinforcement learning


MUSE: Multi-Knowledge Passing on the Edges, Boosting Knowledge Graph Completion


23rd International Conference on Machine Learning and Cybernetics, ICMLC 2024

Authors: Liu, Pengjie (School of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China)

ISBN: (print) 9798331528041

Knowledge Graph Completion (KGC) aims to predict the missing information in the (head entity)-[relation]-(tail entity) triplet. Deep Neural Networks have achieved significant progress in the relation prediction task. However, most existing KGC methods focus on single features (e.g., entity IDs) and sub-graph aggregation, which cannot fully explore all the features in the Knowledge Graph (KG), and neglect the external semantic knowledge injection. To address these problems, we propose MUSE, a knowledge-aware reasoning model to learn a tailored embedding space in three dimensions for missing relation prediction through a multi-knowledge representation learning mechanism. Our MUSE consists of three parallel components: 1) Prior Knowledge Learning for enhancing the triplets' semantic representation by fine-tuning BERT; 2) Context Message Passing for enhancing the context messages of KG; 3) Relational Path Aggregation for enhancing the path representation from the head entity to the tail entity. Our experimental results show that MUSE significantly outperforms other baselines on four public datasets, such as over 5.50% improvement in H@1 and 4.20% improvement in MRR on the NELL995 dataset. The code and all datasets will be released via https://***/NxxTGT/MUSE. © 2024 IEEE.
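
For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows how H@1 (Hits@1) and MRR (mean reciprocal rank) are conventionally computed from the rank of the correct answer for each test query. The function names and the sample ranks are illustrative assumptions and are not taken from the paper.

```python
def hits_at_k(ranks, k):
    """Fraction of queries whose correct answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the correct answer over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hypothetical 1-based ranks of the correct relation for four test triplets.
ranks = [1, 2, 1, 4]
h1 = hits_at_k(ranks, 1)
mrr = mean_reciprocal_rank(ranks)
```

Both metrics lie in (0, 1] and higher is better; H@1 counts only exact top predictions, while MRR also gives partial credit to near misses.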

Keywords: Knowledge graph


SCMotion: Predicting Human Motion Through Spatial Dependence and Center Anchoring


7th International Conference on Computer Information Science and Application Technology, CISAT 2024

Authors: Chen, Siqi (Nanjing University of Science and Technology, School of Computer Science and Engineering, Nanjing, China)

ISBN: (print) 9798350375107

Human motion prediction aims to predict future human motion based on past observations, playing important roles in several fields. However, previous works have often focused on the temporal sequential nature of human motion while neglecting the coordinated linkage of multiple components of the human body in space. Therefore, we introduce SCMotion, focused on predicting human motion using spatial dependencies. We divide the human body joints into four joint groups and predict each module individually. Additionally, SCMotion uses the more stable center of gravity of the human body as an anchor point for prediction, greatly enhancing accuracy. Our experiments on the H3.6M dataset demonstrate the excellent prediction capabilities of SCMotion. © 2024 IEEE.
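
The idea of anchoring a pose to a body center can be sketched as follows. This is an illustration under stated assumptions, not the SCMotion method itself: the unweighted mean of the joint positions stands in for the center of gravity (a true center of gravity would weight body segments by mass), and the function names are hypothetical.

```python
def to_center_relative(pose):
    """Split a pose into an anchor point and per-joint offsets from it.

    pose: list of (x, y, z) joint positions.
    The anchor is the unweighted mean of the joints (an assumption).
    """
    n = len(pose)
    center = tuple(sum(joint[d] for joint in pose) / n for d in range(3))
    offsets = [tuple(joint[d] - center[d] for d in range(3)) for joint in pose]
    return center, offsets

def from_center_relative(center, offsets):
    """Rebuild absolute joint positions from the anchor and the offsets."""
    return [tuple(center[d] + off[d] for d in range(3)) for off in offsets]

# Hypothetical three-joint pose.
pose = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
center, offsets = to_center_relative(pose)
restored = from_center_relative(center, offsets)
```

A predictor could then forecast the slowly varying anchor and the per-joint offsets separately, which is one plausible reading of using the center as an anchor point.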

Keywords: Graph Convolutional Networks; Human Motion Prediction; Human-robot Interaction


Automatic text inpainting and quality elevation in video sequences


Multimedia Tools and Applications, 2024, pp. 1-34

Authors: Palivela, Lakshmi Harika; Dharmalingam, Vivekanandan; Gayathri, D. Bala (School of Computer Science and Engineering, Vellore Institute of Technology Chennai, Vandalur-Kelambakkam Road, Chennai 600127, India; Department of Information Technology, Madras Institute of Technology, Anna University, Chennai, India; Department of Computer Science and Engineering, B.S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India)

Scene text removal is a recent development in computer vision that replaces text patches in natural images with the appropriate background. Text removal is a difficult process leading to faulty areas of text containing text strokes with their hazy backgrounds. Text in the real world uses a variety of font kinds, some of which are difficult to localize due to their chaotic shapes, varied shading degrees, and orientation *** text erasing may include the subtasks of text detection as well as text inpainting. Both subtasks require a large amount of data to be successful, but existing approaches were limited by insufficient real-world data for scene-text elimination. Even though the existing works produced considerable performance improvement in scene text removal, they often leave many text remains like text strokes, thus producing low-quality visual outcomes. Therefore, this paper proposes an automatic text inpainting and video quality elevation model by using the Improved Convolutional Network-based ***, the video samples are collected from the diverse datasets and then converted into frames. Next, the frames are deblurred using an enhanced Convolutional Neural Network (CNN) model that has three convolutional layers for accurately localizing the texts in frames. Subsequently, the texts are detected by utilizing the CLARA-based VGG-16 network. Afterward, the text strokes are removed using a convolutional encoder and decoder network to eliminate the presence of text on complex backgrounds and textures. Here, the coordinates of text in the deblurred frames are used to crop out the text stroke regions. So, the texts are in-painted, and then, the text in-painted regions are pasted back to their original positions in the frames. Furthermore, the video quality is elevated with the help of the DenseNet-centric Enhancement network. The experimental outcomes demonstrate that the proposed model effectively removed scene texts and enhanced the video qu

Keywords: Convolutional neural networks


Method for Relation Extraction Based on Entity Boundary Features


2024 7th International Conference on Computer Information Science and Artificial Intelligence, CISAI 2024

Authors: Wang, Zining; Ge, Jike; Yang, Xiaolu; Tan, Jie; Xiang, Yu (School of Computer Science and Engineering, Chongqing University of Science and Technology, Chongqing, China)

ISBN: (print) 9798400707254

As a key task in natural language processing, the current knowledge extraction methods mostly involve joint extraction, simultaneously extracting named entities and relationships. When conducting relationship extraction, entities containing multiple words are usually represented using pooling or weighted sum methods, leading to the neglect of entity boundary information. To address the issue of reduced accuracy in relationship classification due to the introduction of intermediate noise information based on entity segment features, a relationship extraction method focusing on entity boundary features is proposed. Firstly, the semantic representation of the text is obtained using the pre-trained language model DeBERTa. Then, features of the head and tail positions of entities in the text are extracted and concatenated into a composite embedding vector containing information from these two key positions, used to enhance entity representation. By concatenating the composite embedding vectors of the head and tail entities, the final embedding vector of the entity pair is obtained, representing the information of each entity and enhancing the model’s understanding of entity boundaries and semantic information. Finally, the existence of relationships between entity pairs is determined by the relationship discriminator and relationship classifier, simultaneously classifying the relationship types of the entity pairs for relationship classification. Experimental results show that the proposed method achieves F1 scores of 89.3%, 94.4%, 94.2%, and 94.9% on the NYT, NYT*, WebNLG, and WebNLG* datasets, respectively, significantly outperforming the baseline models. Additionally, ablation experiments demonstrate the importance of the relationship discriminator, further confirming the superiority of the relationship extraction method focusing on entity boundary features. © 2024 Copyright held by the owner/author(s).
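
The boundary-feature construction described in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's code: toy token vectors stand in for DeBERTa outputs, and the function names and span convention (inclusive start and end token indices) are assumptions.

```python
def boundary_embedding(token_vectors, span):
    """Concatenate the vectors at an entity's first and last token.

    span: (start, end) token indices, both inclusive. Tokens strictly
    between the boundaries are ignored rather than pooled.
    """
    start, end = span
    return token_vectors[start] + token_vectors[end]  # list concatenation

def entity_pair_embedding(token_vectors, head_span, tail_span):
    """Concatenate the boundary embeddings of the head and tail entities."""
    return (boundary_embedding(token_vectors, head_span)
            + boundary_embedding(token_vectors, tail_span))

# Hypothetical 2-D contextual vectors for a five-token sentence.
tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
pair_vector = entity_pair_embedding(tokens, head_span=(0, 1), tail_span=(3, 4))
```

With d-dimensional token vectors the pair embedding has 4d entries, and it would be the input to the relationship discriminator and classifier mentioned in the abstract.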

Keywords: Boundary features


An Intelligent Privacy Protection Scheme for Efficient Edge Computation Offloading in IoV


Chinese Journal of Electronics, 2024, Vol. 33, No. 4, pp. 910-919

Authors: Liang YAO; Xiaolong XU; Wanchun DOU; Muhammad Bilal (School of Software, Nanjing University of Information Science and Technology; State Key Laboratory for Novel Software Technology, Nanjing University; Department of Computer and Electronics Systems Engineering, Hankuk University of Foreign Studies)

As a pivotal enabler of intelligent transportation system (ITS), Internet of vehicles (IoV) has aroused extensive attention from academia and industry. The exponential growth of computation-intensive, latency-sensitive, and privacy-aware vehicular applications in IoV results in the transformation from cloud computing to edge computing, which enables tasks to be offloaded to edge nodes (ENs) closer to vehicles for efficient execution. In ITS environment, however, due to dynamic and stochastic computation offloading requests, it is challenging to efficiently orchestrate offloading decisions for application requirements. How to accomplish complex computation offloading of vehicles while ensuring data privacy remains challenging. In this paper, we propose an intelligent computation offloading with privacy protection scheme, named COPP. In particular, an Advanced Encryption Standard-based encryption method is utilized to implement privacy protection. Furthermore, an online offloading scheme is proposed to find optimal offloading policies. Finally, experimental results demonstrate that COPP significantly outperforms benchmark schemes in the performance of both delay and energy consumption.

Keywords: Industries; Privacy; Energy consumption; Transportation; Computational efficiency; Encryption; Protection
