Search Results - Inner Mongolia University Library

Designing an adaptive cost function for dynamic human pose predictions

MULTIMEDIA TOOLS AND APPLICATIONS, 2023, Vol. 83, No. 18, pp. 53201-53219

Authors: Yadav, Gaurav Kumar; Puig, Domenec; Nandi, G. C.

Affiliations: Indian Inst Informat Technol Allahabad Dept Informat Technol Prayagraj Uttar Pradesh India; Univ Rovira I Virgili Dept Comp Sci & Math Secur Tarragona Spain

In the modern-day scenario, machines and humans are expected to work together and collaborate in several social and manufacturing environments. The machines should predict humans' next move for effective collaborations by observing their present move. Human motion modelling and prediction are fundamental and challenging problems involving computer vision and graphics. To help solve some of the challenges, in the present investigation, we propose an innovative idea of developing a new cost function as the objective function based on adaptive sampling, which is subsequently used with an 'Adam' optimizer for training and validating a specially configured Deep Learning architecture. Our proposed development produced significantly improved results regarding future pose estimation/predictions. The adaptiveness of the proposed cost function is based on a bell-shaped locally weighted function. It has been observed that the area covered by the cost function plays a vital role during training, and the bell-shaped function's width helps decide the region of importance for the training samples. The proposed cost function has been used for training a gated recurrent unit (GRU) based encoder-decoder architecture. The encoder takes the observed input sequences, extracts the input sequence's significant variability, and passes it to the decoder. The decoder takes it as input, trains using the adaptive sampling-based method, and predicts future poses. We have experimented with this function in various sizes and shapes and compared the results obtained with some state-of-the-art research results. As elaborated in this paper, we obtained much-improved results in almost all the cases.

Keywords: Deep sequential networks; encoder-decoder; Human 3.6M
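The abstract above describes a cost function whose adaptiveness comes from a bell-shaped, locally weighted function whose width sets the region of importance for training samples. This record does not give the exact formula, so the following is only a minimal sketch that assumes a Gaussian weight over prediction time steps. The names `bell_weights`, `weighted_pose_loss`, `center`, and `width` are hypothetical and do not come from the paper.

```python
import math


def bell_weights(num_steps, center, width):
    """Gaussian (bell-shaped) weights over time steps, normalized to sum to 1.

    Assumption: the bell is a Gaussian centered at `center` with standard
    deviation `width`; the paper's actual weighting function may differ.
    """
    raw = [math.exp(-0.5 * ((t - center) / width) ** 2) for t in range(num_steps)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_pose_loss(predicted, target, center, width):
    """Bell-weighted mean squared error between two pose sequences.

    predicted, target: lists of frames; each frame is a list of joint coordinates.
    """
    weights = bell_weights(len(predicted), center, width)
    loss = 0.0
    for w, p_frame, t_frame in zip(weights, predicted, target):
        # Per-frame mean squared error over all joint coordinates.
        frame_err = sum((p - t) ** 2 for p, t in zip(p_frame, t_frame)) / len(p_frame)
        loss += w * frame_err
    return loss
```

In this sketch a small `width` concentrates the loss on frames near `center`, while a large `width` approaches a uniform average over all frames.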


End-to-End Handwritten Paragraph Text Recognition Using a Vertical Attention Network

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, Vol. 45, No. 1, pp. 508-524

Authors: Coquenet, Denis; Chatelain, Clement; Paquet, Thierry

Affiliations: LITIS EA 4108 F-76800 Saint Etienne Du Rouvray France; Univ Rouen Normandy F-76000 Rouen France; Normandy Univ F-14032 Caen France; INSA Rouen Normandy F-76800 Saint Etienne Du Rouvray France

Unconstrained handwritten text recognition remains challenging for computer vision systems. Paragraph text recognition is traditionally achieved by two models: the first one for line segmentation and the second one for text line recognition. We propose a unified end-to-end model using hybrid attention to tackle this task. This model is designed to iteratively process a paragraph image line by line. It can be split into three modules. An encoder generates feature maps from the whole paragraph image. Then, an attention module recurrently generates a vertical weighted mask that lets the model focus on the features of the current text line. In this way, it performs a kind of implicit line segmentation. For the features of each text line, a decoder module recognizes the associated character sequence, leading to the recognition of the whole paragraph. We achieve state-of-the-art character error rates at paragraph level on three popular datasets: 1.91% for RIMES, 4.45% for IAM and 3.59% for READ 2016. Our code and trained model weights are available at https://***/FactoDeepLearning/VerticalAttentionOCR.

Keywords: Seq2Seq model; hybrid attention; segmentation-free; paragraph handwriting recognition; fully convolutional network; encoder-decoder; optical character recognition
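The vertical weighted mask mentioned in the abstract above can be illustrated with a small sketch: attention weights over the rows of a feature map collapse it into the features of a single text line. This is only an illustration under stated assumptions. In the paper the row scores come from a learned, recurrent attention module; here they are passed in directly, and the names `softmax` and `vertical_attention_pool` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vertical_attention_pool(features, row_scores):
    """Collapse an H x W x C feature map into W x C line features.

    features: nested list indexed as features[row][col][channel].
    row_scores: one unnormalized attention score per row (length H);
    assumed to be given, whereas a real model would predict them.
    """
    weights = softmax(row_scores)
    height, width, channels = len(features), len(features[0]), len(features[0][0])
    line = [[0.0] * channels for _ in range(width)]
    for h in range(height):
        for w in range(width):
            for c in range(channels):
                # Weighted sum along the vertical axis only.
                line[w][c] += weights[h] * features[h][w][c]
    return weights, line
```

When the scores peak sharply on one row, the pooled output is close to that row's features, which is what makes the operation act like an implicit line segmentation.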


Research on neural processes with multiple latent variables

IET IMAGE PROCESSING, 2023, Vol. 17, No. 11, pp. 3323-3336

Authors: Yu, Xiao-Han; Mao, Shao-Chen; Wang, Lei; Lu, Shi-Jie; Yu, Kun

Affiliations: Army Engn Univ PLA Coll Command & Control Syst Nanjing Peoples R China; Army Engn Univ PLA Coll Commun Engineer Nanjing Peoples R China; Army Engn Univ PLA Coll Command & Control Syst Nanjing 210007 Peoples R China

Neural Process (NP) fully combines the advantages of neural network and Gaussian Process (GP) to provide an efficient method for solving regression problems. Nonetheless, limited by the dimensionality of the latent variable, NP has difficulty fitting the observed data completely and predicting the targets perfectly. To remedy these drawbacks, the authors propose a concise and effective improvement of the latent path of NP, which the authors term Multi-Latent Variables Neural Process (MLNP). MLNP samples multiple latent variables and integrates the representations corresponding to the latent variables in the decoder with adaptive weights. MLNP inherits the desirable property of linear computation scales of NP and learns the approximate distribution over objective functions from contexts more flexibly and accurately. By applying MLNP to 1-D regression, real-world image completion, which can be seen as a 2-D regression task, the authors demonstrate its significant improvement in the accuracy of prediction and contexts fitting capability compared with NP. Through ablation experiments, the authors also verify that the number of latent variables has a great impact on the prediction accuracy and fitting capability of MLNP. Moreover, the authors also analyze the roles played by different latent variables in reconstructing images.

Keywords: encoder-decoder; multiple latent variables; neural process; regression


An abstractive text summarization technique using transformer model with self-attention mechanism

NEURAL COMPUTING & APPLICATIONS, 2023, Vol. 35, No. 25, pp. 18603-18622

Authors: Kumar, Sandeep; Solanki, Arun

Affiliations: Gautam Buddha Univ Dept Comp Sci & Engn Greater Noida 201312 Uttar Pradesh India

Creating a summarized version of a text document that still conveys its precise meaning is a complex task in natural language processing (NLP). Abstractive text summarization (ATS) is the process of using facts from source sentences and merging them into concise representations while maintaining the content and intent of the text. Manually summarizing large amounts of text is challenging and time-consuming for humans. Therefore, text summarization has become an active research focus in NLP. This paper proposes an ATS model using a Transformer Technique with Self-Attention Mechanism (T2SAM). The self-attention mechanism is added to the transformer to address coreference in text, which helps the system understand the text better. The proposed T2SAM model improves the performance of text summarization. It is trained on the Inshorts News dataset combined with the DUC-2004 shared tasks dataset. The performance of the proposed model has been evaluated using the ROUGE metrics, and it has been shown to outperform the existing state-of-the-art baseline models. The training loss falls from 10.3058 (at the starting point) to a minimum of 1.8220 within 30 epochs, and the model achieves a 48.50% F1-score on both the Inshorts and DUC-2004 news datasets.

Keywords: Abstractive text summarization; encoder-decoder; Self-attention mechanism; Rouge metrics; Transformer architecture; T2SAM
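For readers unfamiliar with the self-attention mechanism named in the abstract above, here is a minimal sketch of standard scaled dot-product self-attention. It is not the T2SAM model: to stay short it assumes the queries, keys, and values are all the raw token vectors (no learned projections, a single head), and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x: list of token vectors, all of the same dimension d.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    d = len(x[0])
    outputs, all_weights = [], []
    for q in x:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights
```

Each output vector mixes information from every token in the sequence, which is the property the abstract relies on when it links a mention to its antecedent elsewhere in the text.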


SATFace: Subject Agnostic Talking Face Generation with Natural Head Movement

NEURAL PROCESSING LETTERS, 2023, Vol. 55, No. 6, pp. 7529-7542

Authors: Yang, Shuai; Qiao, Kai; Shi, Shuhao; Yang, Jie; Ma, Dekui; Hu, Guoen; Yan, Bin; Chen, Jian

Affiliations: PLA Strategy Support Force Informat Engn Univ Henan Key Lab Imaging & Intelligence Proc Sci Ave Zhengzhou 450001 Henan Peoples R China

Talking face generation is widely used in education, entertainment, shopping, and other social practices. Existing methods focus on matching the speaker's mouth shape with the speech content. Still, there is a lack of research on automatically extracting potential head motion features from speech, resulting in a lack of naturalness. This paper proposes SATFace, a subject agnostic talking face generation method with natural head movement. To model the talking face's complicated and critical features (identity, background, mouth shape, head posture, etc.), we construct SATFace by taking encoder-decoder as the primary network architecture. Then, we design a long short-time feature learning network to better reference the global and local information in audio for generating reasonable head movement. Besides, a modular training process is proposed to improve explicit and implicit features' learning effects and efficiency. The experimental comparison results show that SATFace improves by at least about 9.8% in cumulative probability of blur detection and 8.2% in synchronization confidence compared with the mainstream methods. The mean opinion scores show that SATFace has advantages in terms of lip sync quality, head movement naturalness, and video realness.

Keywords: Talking face generation; Generative adversarial networks; encoder-decoder; Feature learning


Efficient hand segmentation for rehabilitation tasks using a convolution neural network with attention

EXPERT SYSTEMS WITH APPLICATIONS, 2023, Vol. 234, No. 1

Authors: Dutta, H. Pallab Jyoti; Bhuyan, M. K.; Neog, Debanga Raj; Macdorman, Karl Fredric; Laskar, Rabul Hussain

Affiliations: Indian Inst Technol Dept Elect & Elect Engn Gauhati 781039 Assam India; Indian Inst Technol Mehta Family Sch Data Sci & Artificial Intelligenc Gauhati 781039 Assam India; Indiana Univ Luddy Sch Informat Comp & Engn Indianapolis IN 46202 USA; Natl Inst Technol Silchar Dept Elect & Commun Engn Silchar 788010 Assam India

We designed an interface to support hand rehabilitation tasks to restore hand function and relieve discomfort. The interface requires accurate hand segmentation, which is impeded by background clutter, occlusion, and variations in illumination. To overcome these challenges, we propose a novel encoder-decoder that segments the hand by encoding spatial and channel correlations using two attention blocks. This approach requires much less computation than benchmark self-attention mechanisms. Moreover, a novel loss function optimizes the model to resolve class imbalance, ensure boundary smoothness, and retain the hand's shape. The quantitative and qualitative results show the model's ability to segment the hands. It performed exceptionally well for images with different hand poses and orientations, the presence of a human face, background clutter, specularity, and variations in illumination. The model attained an F1-score of 97.3% for the Ouhands and 99.3% for the HGR dataset, higher than baseline models, with faster inference times. Furthermore, the model could generalize hand segmentation to multiple hands and unseen environments. Its segmentation precision enabled the development of the hand rehabilitation interface, which guided users to perform hand exercises. For five weeks, patients steadily improved hand function while using the interface.

Keywords: Channel attention; Efficient attention mechanism; encoder-decoder; Hand rehabilitation; Hand segmentation; Spatial attention


Abnormal behavior detection algorithm based on multi-branch convolutional fusion neural network

MULTIMEDIA TOOLS AND APPLICATIONS, 2023, Vol. 82, No. 15, pp. 22723-22740

Authors: Xu, Zheng; Lu, Yuanyao

Affiliations: North China Univ Technol Sch Informat Sci & Technol Beijing Peoples R China

The recognition of abnormal behavior in surveillance video is a focus of current research, with high research value and broad application possibilities, mainly in intelligent surveillance, intelligent security, and smart cities. Because of the complexity of human movement and the variability of the external environment, the recognition and detection of abnormal human behaviors remain challenging and still need further research and development. This paper uses, for the first time, a multi-branch convolutional neural network to extract the spatial features of video frames; the network acts as an encoder that passes the condensed features to a Gated Recurrent Unit (GRU), which extracts temporal features from multiple video frames. The GRU then outputs the result, acting as the decoder. We ran a series of comparative experiments on the UCF-Crime dataset and achieved an accuracy of 86.78% on the test set. The experimental results show that our multi-branch convolutional fusion neural network is better than previous surveillance video abnormal behavior recognition algorithms. To verify the generalization performance and efficiency of the algorithm, we also conducted an experimental validation on the UCF-101 dataset. The results show that the algorithm also achieves a high accuracy rate on UCF-101, and that its speed is close to that of the C3D method with an improved accuracy rate, making it possible to subsequently develop simple recognition applications based on the algorithm studied in this paper.

Keywords: Abnormal behavior detection; Multi-branch convolution; GRU; encoder-decoder
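The abstract above feeds per-frame features into a Gated Recurrent Unit to capture temporal structure. As a rough illustration of what one GRU update does, here is a minimal sketch of the standard GRU equations in plain Python. It is not the paper's network: the parameter layout (`W_*`, `U_*`, `b_*` in a dictionary) and the function names are hypothetical, and the sketch assumes the convention in which an update gate near 1 keeps the old hidden state.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gru_step(x, h, params):
    """One GRU update for input vector x and hidden state h.

    params: dict with input weights W_z, W_r, W_n, hidden weights
    U_z, U_r, U_n, and biases b_z, b_r, b_n (hypothetical layout).
    """
    def affine(gate, hidden):
        wx = matvec(params["W_" + gate], x)
        uh = matvec(params["U_" + gate], hidden)
        return [a + b + c for a, b, c in zip(wx, uh, params["b_" + gate])]

    z = [sigmoid(v) for v in affine("z", h)]          # update gate
    r = [sigmoid(v) for v in affine("r", h)]          # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(v) for v in affine("n", reset_h)]  # candidate state
    # Assumed convention: z close to 1 keeps the previous hidden state.
    return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def encode_sequence(frames, h0, params):
    """Run the GRU over a sequence of per-frame feature vectors."""
    h = h0
    for x in frames:
        h = gru_step(x, h, params)
    return h
```

The final hidden state returned by `encode_sequence` is a fixed-size summary of the whole clip, which a classifier or decoder can then consume.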


Paraphrase Generation and Supervised Learning for Improved Automatic Short Answer Grading

INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION, 2024, Vol. 34, No. 4, pp. 1627-1670

Authors: Ouahrani, Leila; Bennouar, Djamal

Affiliations: Bouira Univ Fac Appl Sci Comp Sci Dept LIM Lab Bouira 10000 Algeria

We consider the reference-based approach for Automatic Short Answer Grading (ASAG), which involves scoring a student's constructed textual answer by comparing it with a teacher-provided reference answer. The reference answer does not cover the variety of student answers as it contains only specific examples of correct answers. Considering other language variants of the reference answer can handle variability in student responses and improve scoring accuracy. Alternative reference answers may be possible, but manually creating them is expensive and time-consuming. In this paper, we consider two issues: First, we need to automatically generate various reference answers that can handle the diversity of student answers. Second, we should provide an accurate grading model that improves sentence similarity computation using multiple reference answers. Our proposed approach therefore has two components. First, we provide a sequence-to-sequence deep learning model that generates plausible paraphrased reference answers conditioned on the provided reference answer. Second, we propose a supervised grading model based on sentence embedding features. The grading model enriches features to improve accuracy considering multiple reference answers. Experiments are conducted in both Arabic and English. They show that the paraphrase generator produces accurate paraphrases. Using multiple reference answers, the proposed grading model achieves a Root Mean Square Error (RMSE) of 0.6955 and a Pearson correlation of 88.92% for the Arabic dataset, and an RMSE of 0.7790 and a Pearson correlation of 73.50% for the English dataset. While fine-tuning pre-trained transformers on the English dataset provided state-of-the-art performance (RMSE: 0.7620), our approach yields comparable results. Simple to construct, load, and embed into the Learning Management System question engine with low computational complexity, the proposed approach can be easily integrated into the Learning Ma

Keywords: Short Answer Grading; Paraphrase generation; Automatic reference answer generation; encoder-decoder; Natural Language Processing
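The two evaluation measures reported in the abstract above, Root Mean Square Error and Pearson correlation, are standard and can be computed as in the following minimal sketch. The function names and the sample grades in the usage note are illustrative only and are not data from the paper.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and reference grades."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For example, `rmse([3.0, 4.5], [3.5, 4.0])` measures how far made-up model grades are from made-up teacher grades on the grading scale, while `pearson` measures whether the two sets of grades rise and fall together regardless of scale. A lower RMSE and a higher correlation are better.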


Text Summarization Method Based on Gated Attention Graph Neural Network

SENSORS, 2023, Vol. 23, No. 3, p. 1654

Authors: Huang, Jingui; Wu, Wenya; Li, Jingyi; Wang, Shengchun

Affiliations: Hunan Normal Univ Coll Informat Sci & Engn Changsha 410081 Peoples R China

Text summarization is an information compression technology to extract important information from long text, which has become a challenging research direction in the field of natural language processing. At present, the text summary model based on deep learning has shown good results, but how to more effectively model the relationship between words, more accurately extract feature information and eliminate redundant information is still a problem of concern. This paper proposes a graph neural network model GA-GNN based on gated attention, which effectively improves the accuracy and readability of text summarization. First, the words are encoded using a concatenated sentence encoder to generate a deeper vector containing local and global semantic information. Secondly, the ability to extract key information features is improved by using gated attention units to eliminate local irrelevant information. Finally, the loss function is optimized from the three aspects of contrastive learning, confidence calculation of important sentences, and graph feature extraction to improve the robustness of the model. Experimental validation was conducted on a CNN/Daily Mail dataset and MR dataset, and the results showed that the model in this paper outperformed existing methods.

Keywords: encoder-decoder; GNN; contrastive learning; confidence calculation of important sentences; attention mechanism


High-resolution optical remote sensing image change detection based on dense connection and attention feature fusion network

PHOTOGRAMMETRIC RECORD, 2023, Vol. 38, No. 184, pp. 498-519

Authors: Peng, Daifeng; Zhai, Chenchen; Zhang, Yongjun; Guan, Haiyan

Affiliations: Nanjing Univ Informat Sci & Technol Sch Remote Sensing & Geomat Engn Nanjing Peoples R China; Minist Nat Resources Key Lab Natl Geog Census & Monitoring Wuhan Peoples R China; Wuhan Univ Sch Remote Sensing Informat Engn Wuhan Peoples R China; Nanjing Univ Informat Sci & Technol Sch Remote Sensing & Geomat Engn Nanjing 210044 Peoples R China

The detection of ground object changes from bi-temporal images is of great significance for urban planning, land-use/land-cover monitoring and natural disaster assessment. To solve the limitation of incomplete change detection (CD) entities and inaccurate edges caused by the loss of detailed information, this paper proposes a network based on dense connections and attention feature fusion, namely Siamese NestedUNet with Attention Feature Fusion (SNAFF). First, multi-level bi-temporal features are extracted through a Siamese network. The dense connections between the sub-nodes of the decoder are used to compensate for the missing location information as well as weakening the semantic differences between features. Then, the attention mechanism is introduced to combine global and local information to achieve feature fusion. Finally, a deep supervision strategy is used to suppress the problem of gradient vanishing and slow convergence speed. During the testing phase, the test time augmentation (TTA) strategy is adopted to further improve the CD performance. In order to verify the effectiveness of the proposed method, two datasets with different change types are used. The experimental results indicate that, compared with the comparison methods, the proposed SNAFF achieves the best quantitative results on both datasets, in which F1, IoU and OA in the LEVIR-CD dataset are 91.47%, 84.28% and 99.13%, respectively, and the values in the CDD dataset are 96.91%, 94.01% and 99.27%, respectively. In addition, the qualitative results show that SNAFF can effectively retain the global and edge information of the detected entity, thus achieving the best visual performance. This paper proposes a novel change detection (CD) method based on dense connections and attention feature fusion, which is capable of recovering detailed information as well as capturing global and local information. A deep supervision module is introduced to further improve the CD performance. Extensive experiment

Keywords: attention mechanism; change detection; dense connection; encoder-decoder; feature fusion; Siamese network
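The abstract above reports F1, IoU, and overall accuracy (OA) for binary change maps. A minimal sketch of how these three scores follow from pixel-level confusion counts is given below. It assumes the usual definitions for the "changed" class; the function name and the flat 0/1 mask format are hypothetical choices for illustration.

```python
def change_detection_metrics(predicted, reference):
    """Compute F1, IoU, and overall accuracy (OA) for binary change masks.

    predicted, reference: flat lists of 0/1 labels (1 = changed pixel).
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)  # true positives
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)  # false positives
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)  # false negatives
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    oa = (tp + tn) / len(pairs)
    return {"f1": f1, "iou": iou, "oa": oa}
```

Under these definitions F1 and IoU are tied by IoU = F1 / (2 - F1), which matches the reported pairs: 0.9147 / 1.0853 ≈ 0.8428 for LEVIR-CD and 0.9691 / 1.0309 ≈ 0.9401 for CDD. OA is much higher than both because unchanged pixels dominate the images and count as correct.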
