Recently, multimodal relation extraction (MRE) and multimodal named entity recognition (MNER) have attracted widespread attention. However, prior research has encountered challenges including inadequate semantic representation of images, cross-modal information fusion, and the irrelevance of some images to their accompanying text. To enhance semantic representation, we employ CLIP's image encoder, a vision transformer (ViT), to generate visual features representing different semantic intensities. To address cross-modal semantic gaps, we introduce an image caption generation model and BERT to sequentially generate image captions and their features, transforming both modalities into text. Dynamic gates and attention mechanisms are introduced to efficiently fuse the visual features, image caption features, and text features, mitigating noise from image-text irrelevance. On this basis, we construct an efficient MRE and MNER model. The experimental results demonstrate that the proposed model achieves improvements of 0.18% to 2.2% on the MRE and MNER datasets. Our code is available at https://***/SiweiWei6/VIT-CMNet.
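As an illustration of the dynamic-gate fusion described above, here is a minimal PyTorch sketch; the module names, feature dimensions, and the exact gating formula are assumptions, since the abstract does not specify them:

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse text features with an auxiliary modality via a learned gate."""
    def __init__(self, dim: int):
        super().__init__()
        # The gate decides, per dimension, how much auxiliary signal to admit,
        # which helps suppress noise from text-irrelevant images.
        self.gate = nn.Linear(2 * dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, text_feat, aux_feat):
        # text_feat, aux_feat: (batch, seq_len, dim)
        g = torch.sigmoid(self.gate(torch.cat([text_feat, aux_feat], dim=-1)))
        return text_feat + g * self.proj(aux_feat)

# Toy usage: fuse ViT visual features, then BERT caption features, into the text features.
dim = 768
fuse_visual, fuse_caption = GatedFusion(dim), GatedFusion(dim)
text = torch.randn(2, 32, dim)      # BERT features of the input text
visual = torch.randn(2, 32, dim)    # ViT patch features projected to text space
caption = torch.randn(2, 32, dim)   # BERT features of the generated caption
fused = fuse_caption(fuse_visual(text, visual), caption)
print(fused.shape)  # torch.Size([2, 32, 768])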
Synthesizing photographic images from given text descriptions is a challenging problem. Although many previous studies have made significant progress on the visual quality of the generated images by using multi-stage and attentional networks, they ignore the interrelationships between the images generated by the generator at each stage and simply apply the attention mechanism. In this paper, the Photographic Text-to-image Generation with Pyramid Contrastive Consistency Model (PCCM-GAN) is proposed to generate photographic images. PCCM-GAN introduces two modules: a Pyramid Contrastive Consistency Model (PCCM) and a stack attention model (Stack-Attn). Based on the images generated at the different stages, PCCM computes a contrastive loss for training the generator. Stack-Attn concentrates on generating images with more detail and better semantic consistency by stacking the global-local attention mechanism. Visual inspection of the inner products of PCCM and Stack-Attn is also performed to validate their effectiveness. Extensive experiments and ablation studies on the CUB and MS-COCO datasets demonstrate the superiority of the proposed method. (c) 2021 Published by Elsevier B.V.
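The abstract does not give PCCM's exact contrastive loss; below is a minimal sketch of one plausible instantiation, an InfoNCE-style loss over features of images generated at different stages (the function name and temperature value are assumptions):

import torch
import torch.nn.functional as F

def stage_contrastive_loss(feat_lo, feat_hi, temperature: float = 0.1):
    """InfoNCE-style loss: features of the low- and high-resolution images
    generated for the same caption are positives; other samples in the
    batch are negatives. feat_lo, feat_hi: (batch, dim)."""
    z1 = F.normalize(feat_lo, dim=-1)
    z2 = F.normalize(feat_hi, dim=-1)
    logits = z1 @ z2.t() / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage: pooled features of the 64x64 and 256x256 generator outputs.
loss = stage_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())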
In this article, the combined convolution-lifting scheme is explored to address the design issues of 2-D discrete wavelet transform (DWT) structures. We find that the combined convolution-lifting scheme of type-1 (convolution followed by lifting) is more suitable than pure convolution or lifting schemes for designing 2-D DWT structures with less on-chip memory. Furthermore, canonic signed digit (CSD)-based multiplier-less designs are presented for convolution-DWT and lifting-DWT using 9/7 biorthogonal filters, and they have identical resource requirements for 12-bit coefficients. The proposed multiplier-less designs of convolution-DWT and lifting-DWT are used to derive a 2-D DWT structure that takes advantage of the combined convolution-lifting scheme. The comparison results show that the proposed combined 2-D DWT structure involves 24x less area-delay product (ADP) and 17x less energy per image (EPI) compared with the best of the existing fractional wavelet transform (FrWT)-based structures, and provides reconstructed images with 14 dB higher peak signal-to-noise ratio (PSNR). Compared with the recently proposed approximate lifting (ALF) 2-D DWT structure, the proposed combined 2-D DWT structure involves 4.5x less ADP, 2.2x less EPI, and 4N words less on-chip memory, and provides reconstructed images with PSNR higher by 7 dB, where N is the image width or height. Therefore, the proposed combined 2-D DWT structure is a better alternative to the existing 2-D DWT structures for low-complexity, low-memory realization of 2-D DWT, especially for visual sensor node applications.
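For reference, a minimal NumPy sketch of the 1-D 9/7 lifting steps that such lifting-DWT hardware implements (this is a software illustration of the standard CDF 9/7 lifting factorization, not the paper's hardware structure; periodic boundary extension is assumed for brevity):

import numpy as np

# Standard CDF 9/7 lifting coefficients (Daubechies & Sweldens factorization).
ALPHA, BETA  = -1.586134342, -0.052980118
GAMMA, DELTA =  0.882911076,  0.443506852
ZETA = 1.149604398  # scaling factor

def lifting_dwt97_1d(x):
    """One level of the 1-D 9/7 DWT via lifting. len(x) must be even.
    Returns (approximation, detail) coefficients."""
    s, d = x[0::2].astype(float).copy(), x[1::2].astype(float).copy()
    d += ALPHA * (s + np.roll(s, -1))   # predict step 1 (periodic extension)
    s += BETA  * (d + np.roll(d, 1))    # update step 1
    d += GAMMA * (s + np.roll(s, -1))   # predict step 2
    s += DELTA * (d + np.roll(d, 1))    # update step 2
    return ZETA * s, d / ZETA           # scaling step

# A separable 2-D DWT applies this transform along rows, then along columns.
approx, detail = lifting_dwt97_1d(np.arange(16))
print(approx.shape, detail.shape)  # (8,) (8,)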
Purpose: This paper aims to solve the problem of the low assembly success rate of 3C assembly lines designed with classical control algorithms, caused by inevitable random disturbances and other factors; by incorporating intelligent algorithms into the assembly line, the assembly process can be extended to uncertain assembly scenarios.

Design/methodology/approach: This work proposes a reinforcement learning framework based on digital twins. First, the authors used Unity3D to build a simulation environment that matches the real scene and achieved data synchronization between the real and simulation environments through the Robot Operating System. Then, the authors trained the reinforcement learning model in the simulation environment. Finally, by creating a digital twin environment, the authors transferred the skill learned in simulation to the real environment and achieved stable algorithm deployment in real-world scenarios.

Findings: In this work, the authors completed the transfer of skill-learning algorithms from virtual to real environments by establishing a digital twin environment. On the one hand, the experiments prove the progressiveness of the algorithm and the feasibility of applying digital twins to reinforcement learning transfer. On the other hand, the experimental results also provide a reference for the application of digital twins in 3C assembly lines.

Originality/value: In this work, the authors designed a new encoder structure in the simulation environment to encode image information, which improved the model's perception of the environment. At the same time, the authors combined a fixed strategy with the reinforcement learning strategy to learn skills, which improved the rate of convergence and the stability of skill learning. Finally, the authors transferred the learned skills to the physical platform through digital twin technology and realized safe operation of the flexible printed circuit assembly task.
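The abstract mentions combining a fixed strategy with a reinforcement learning strategy; a minimal sketch of one common way to realize this, residual policy learning, where the learned policy corrects a hand-designed controller (the observation layout, gains, and function names are hypothetical):

import numpy as np

def fixed_policy(obs):
    """Hand-designed base controller: a simple proportional move toward the
    target assembly pose (hypothetical layout: first 3 entries of obs are
    the position error)."""
    return np.clip(-1.0 * obs[:3], -0.05, 0.05)

class ResidualAgent:
    """Combine the fixed strategy with a learned RL correction.
    `rl_policy` would be a trained network; here it is a stub returning zeros."""
    def __init__(self, rl_policy=lambda obs: np.zeros(3)):
        self.rl_policy = rl_policy

    def act(self, obs):
        # The RL term learns only the residual the fixed controller misses,
        # which can speed up convergence and stabilize skill learning.
        return fixed_policy(obs) + self.rl_policy(obs)

agent = ResidualAgent()
print(agent.act(np.array([0.01, -0.02, 0.0, 0.0, 0.0])))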
Recently, the transformer has been applied to image caption models, in which a convolutional neural network and the transformer encoder act as the image encoder of the model, and the transformer decoder acts as the decoder. However, the transformer may suffer from interference by the non-critical objects of a scene and have difficulty fully capturing image information, owing to the dense characteristics of its self-attention mechanism. In this Letter, to address this issue, the authors propose a novel transformer model with decreasing attention gates and an attention fusion module. Specifically, they first use attention gates to force the transformer to overcome the interference of non-critical objects and capture object information more efficiently, by truncating all attention weights smaller than the gate threshold. Second, by inheriting the attention matrix from the previous layer at each network layer, the attention fusion module enables each layer to consider other objects without losing the most critical ones. The method is evaluated on the benchmark Microsoft COCO dataset and achieves better performance than state-of-the-art methods.
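A minimal PyTorch sketch of the attention-gate idea described above: attention weights below a gate threshold are truncated to zero, and a layer's attention matrix can be fused with one inherited from the previous layer (the threshold values, the renormalization step, and the 50/50 mixing weight are assumptions):

import torch

def gated_attention(q, k, v, gate_threshold: float = 0.01):
    """Scaled dot-product attention whose weights below the gate threshold
    are truncated to zero, suppressing non-critical objects.
    q, k, v: (batch, heads, seq, dim)."""
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
    attn = torch.where(attn < gate_threshold, torch.zeros_like(attn), attn)
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)  # renormalize
    return attn @ v, attn

# Attention fusion: a layer mixes its own attention matrix with the one
# inherited from the previous layer (here simulated with a second call).
q = k = v = torch.randn(2, 8, 49, 64)
out, attn = gated_attention(q, k, v)
_, attn_prev = gated_attention(q, k, v, gate_threshold=0.02)
fused_attn = 0.5 * attn + 0.5 * attn_prev
print(out.shape, fused_attn.shape)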
Synthesizing images from text means producing images whose content faithfully reflects a given text description, an extremely demanding task whose central problems are content consistency and visual realism. Owing to the considerable progress of GANs, it is now possible to produce images with good visual fidelity. The translation of text descriptions into images with high content reliability, on the other hand, is still a work in progress. This paper frames a novel text-to-image synthesis approach that includes two major phases: (1) text-to-image encoding and (2) a GAN. Initially, during text-to-image encoding, cross-modal feature alignment takes place between text and image features. A Bi-LSTM is deployed to transform the text embedding into a feature vector. In the second stage, the image is synthesized based on this encoding: the text feature groups are given as input to the GAN, which produces the final synthesized images. Finally, the superiority of the developed approach is examined via evaluation against existing techniques.
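A minimal PyTorch sketch of the Bi-LSTM text-encoding stage, which transforms a tokenized caption into the feature vector fed to the GAN (vocabulary size, dimensions, and the mean-pooling choice are assumptions):

import torch
import torch.nn as nn

class BiLSTMTextEncoder(nn.Module):
    """Encode a tokenized caption into a sentence feature vector that
    conditions the GAN generator."""
    def __init__(self, vocab_size=5000, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        out, _ = self.lstm(self.embed(token_ids))
        # Mean-pool the bidirectional hidden states into one sentence vector.
        return out.mean(dim=1)  # (batch, 2 * hidden_dim)

encoder = BiLSTMTextEncoder()
sentence_feat = encoder(torch.randint(0, 5000, (4, 18)))
print(sentence_feat.shape)  # torch.Size([4, 256]) -> fed to the generator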
ISBN (print): 9783642181337
In this paper, a 4-codebook vector quantization (VQ) core is implemented on an FPGA (field-programmable gate array). The proposed design has certain advantages over earlier architectures in the form of design reuse of the VQ core to build a large VQ system. The proposed core aims at increased compression speed, a modular design for design flexibility, and easy reconfigurability. Modularity allows flexible design changes for VQ with different codebook sizes and hence controls the recovered image quality. In general, the new VQ core meets the specific and challenging needs of a single-function, tightly constrained real-time VQ encoder. The synthesis results show that a speed-up of 5 is achieved. Experiments and analyses indicate that the design can satisfy the performance requirement of 30 image frames per second for real-time image processing. The proposed VQ requires more memory and implements a VQ encoder with codebook sizes that are multiples of 4.
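A minimal NumPy sketch of what a VQ encoder core computes, nearest-codeword search over image blocks (the block size, codebook contents, and function names are illustrative; the paper's contribution is the FPGA architecture, not this algorithm):

import numpy as np

def vq_encode(blocks, codebook):
    """Map each image block to the index of its nearest codeword
    (squared Euclidean distance), as a VQ encoder core does in hardware.
    blocks: (n, k) flattened image blocks; codebook: (c, k)."""
    d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)  # (n,) codeword indices = compressed stream

def vq_decode(indices, codebook):
    """Reconstruct blocks by codebook lookup (the decoder's only job)."""
    return codebook[indices]

# Toy usage: 4x4 pixel blocks flattened to 16 values, codebook of 4 codewords.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 16))
blocks = rng.normal(size=(100, 16))
idx = vq_encode(blocks, codebook)
recon = vq_decode(idx, codebook)
print(idx[:8], recon.shape)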
The single-modal information retrieval pattern is gradually becoming unable to meet growing information processing needs. Cross-modal retrieval based on deep learning, as a new information retrieval scheme, is gradually receiving more attention. To address the potential issue of imprecise text queries in cross-modal retrieval, an iterative query-based cross-modal retrieval model is proposed. The model is divided into four modules: image feature extraction, text feature extraction, matching and ranking, and query reinforcement. The model first extracts features of images and text through deep learning models, then matches and retrieves image-text features through the image-text stacked cross-attention algorithm. Finally, in the query reinforcement module, the most distinctive object category in the retrieval results is obtained through deep reinforcement learning for user confirmation, thereby increasing text richness and improving retrieval performance.
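A minimal PyTorch sketch of a stacked cross-attention matching score in the style of SCAN, which the matching module's image-text stacked cross-attention algorithm resembles (the temperature and pooling choices are assumptions):

import torch
import torch.nn.functional as F

def stacked_cross_attention_score(words, regions, temperature: float = 9.0):
    """Attend each word over the image regions, then score the image-text
    pair by the average word-to-context similarity.
    words: (n_words, dim); regions: (n_regions, dim)."""
    w = F.normalize(words, dim=-1)
    r = F.normalize(regions, dim=-1)
    attn = torch.softmax(temperature * (w @ r.t()), dim=-1)  # word -> regions
    context = attn @ r                                       # attended image context
    return F.cosine_similarity(w, context, dim=-1).mean()

# Toy usage: rank two candidate images for one text query.
words = torch.randn(12, 512)
scores = [stacked_cross_attention_score(words, torch.randn(36, 512))
          for _ in range(2)]
print(scores)  # higher score = better match; top results go on to reranking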
Nowadays, media overload is a common scenario around the world. Its prevalence grants both individuals and governmental entities the ability to shape public opinion, highlighting the need to deploy effective fake news detection methods. In this paper, we propose a novel model named GraMuFeN for detecting fake news posted by users on Twitter and Weibo. The model is designed to detect fake news using both the textual and image data accompanying each piece of news. We utilize graph convolutional neural networks (GCN) as the text encoder and convolutional neural networks (CNN) as the image encoder, together with a supervised contrastive loss, aiming to develop a model that is much lighter in terms of trainable parameters and easier to train while achieving higher performance than previous works. Our evaluations on two different benchmarks show a promising 10% improvement in micro F1 score and a 50% reduction in the model's trainable parameters.
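A minimal PyTorch sketch of a supervised contrastive loss of the kind used to train GraMuFeN's encoders (the temperature and the fusion of text and image features are assumptions; the loss follows Khosla et al., 2020):

import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature: float = 0.07):
    """Supervised contrastive loss: samples sharing a label (fake/real) are
    pulled together, others pushed apart.
    features: (batch, dim) fused text+image embeddings; labels: (batch,)."""
    z = F.normalize(features, dim=-1)
    sim = z @ z.t() / temperature
    eye = torch.eye(len(labels), dtype=torch.bool)
    sim.masked_fill_(eye, float('-inf'))       # exclude self-similarity
    pos = (labels[:, None] == labels[None, :]) & ~eye
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability over each anchor's positives.
    pos_log_prob = log_prob.masked_fill(~pos, 0.0)
    return -(pos_log_prob.sum(1) / pos.sum(1).clamp_min(1)).mean()

# Toy usage: fused multimodal features for 8 posts; label 1 = fake, 0 = real.
loss = supervised_contrastive_loss(torch.randn(8, 128),
                                   torch.tensor([0, 1, 0, 1, 1, 0, 0, 1]))
print(loss.item())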