Search Results - Inner Mongolia University Library

18th International Conference on Frontiers in Handwriting Recognition (ICFHR)

Authors: Zhang, Zhang Zhang, Yibo Beijing Natl Day Sch 66 Yuquan Rd Beijing Peoples R China Beijing Jiaotong Univ 3 Shangyuancun Beijing Peoples R China

ISBN: (print) 9783031216473; 9783031216480

Attention-based encoder-decoder (AED) models are increasingly used in handwritten mathematical expression recognition (HMER) tasks. Given the recent success of Transformer in computer vision and a variety of attempts to combine Transformer with convolutional neural networks (CNNs), in this paper we study three ways of leveraging Transformer and CNN designs to improve AED-based HMER models: 1) Tandem way, which feeds CNN-extracted features to a Transformer encoder to capture global dependencies; 2) Parallel way, which adds a Transformer encoder branch taking raw image patches as input and concatenates its output with the CNN's as the final feature; 3) Mixing way, which replaces the convolution layers of the CNN's last stage with multi-head self-attention (MHSA). We compared these three methods on the CROHME benchmark. On CROHME 2016 and 2019, Tandem way attained ExpRates of 54.85% and 58.56%, respectively; Parallel way attained ExpRates of 55.63% and 57.39%; and Mixing way achieved ExpRates of 53.93% and 55.64%. This result indicates that the Parallel and Tandem ways perform better than the Mixing way and differ little from each other.

Keywords: Handwritten mathematical expression recognition Transformer encoder-decoder model Convolutional neural network
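
The ExpRate figures quoted in the abstract above are expression recognition rates, that is, the percentage of test expressions whose predicted token sequence matches the ground truth exactly. A minimal sketch of that metric follows; the function name `exp_rate` and the sample token sequences are illustrative assumptions, not code or data from the paper or from CROHME.

```python
def exp_rate(predictions, references):
    """Return the percentage of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one expression is required")
    # An expression counts as correct only if every token matches, in order.
    correct = sum(1 for p, r in zip(predictions, references) if p == r)
    return 100.0 * correct / len(references)


# Hypothetical LaTeX token sequences, made up for illustration.
preds = [["x", "^", "2"], ["a", "+", "b"], ["\\frac", "1", "2"], ["y"]]
refs = [["x", "^", "2"], ["a", "-", "b"], ["\\frac", "1", "2"], ["y"]]

rate = exp_rate(preds, refs)
print(rate)  # 75.0 (three of the four expressions match exactly)
```

Because a single wrong token makes the whole expression count as incorrect, ExpRate is a strict metric, which helps explain why the reported values sit in the 50 to 60 percent range.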

FrameSum: Leveraging Framing Theory and Deep Learning for Enhanced News Text Summarization

APPLIED SCIENCES-BASEL, 2024, vol. 14, no. 17, p. 7548

Authors: Zhang, Xin Wei, Qiyi Zheng, Bin Liu, Jiefeng Zhang, Pengzhou Commun Univ China Sch Comp & Cyber Sci Beijing 100024 Peoples R China Univ Elect Sci & Technol China Sch Informat & Software Engn Chengdu 610054 Peoples R China Commun Univ China State Key Lab Media Convergence & Commun Beijing 100024 Peoples R China

Framing theory is a widely accepted theoretical framework in the field of news communication studies, frequently employed to analyze the content of news reports. This paper innovatively introduces framing theory into the text summarization task and proposes a news text summarization method based on framing theory to address the global context of rapidly increasing speed and scale of information dissemination. Traditional text summarization methods often overlook the implicit deep-level semantic content and situational frames in news texts, and the method proposed in this paper aims to fill this gap. Our deep learning-based news frame identification module can automatically identify frame elements in the text and predict the dominant frame of the text. The frame-aware summarization generation model (FrameSum) can incorporate the identified frame feature into the text representation and attention mechanism, ensuring that the generated summary focuses on the core content of the news report while maintaining high information coverage, readability, and objectivity. Through empirical studies on the standard CNN/Daily Mail dataset, we found that this method performs significantly better in improving summary quality and maintaining the accuracy of news facts.

Keywords: news text summarization framing theory deep learning framework recognition encoder-decoder model

Unsupervised feature selection using orthogonal encoder-decoder factorization

INFORMATION SCIENCES, 2024, vol. 663

Authors: Mozafari, Maryam Seyedi, Seyed Amjad Mohammadiani, Rojiar Pir Tab, Fardin Akhlaghian Univ Kurdistan Dept Comp Engn Sanandaj Iran

Unsupervised feature selection (UFS) is a fundamental task in machine learning and data analysis, aimed at identifying a subset of non-redundant and relevant features from a high-dimensional dataset. Embedded methods seamlessly integrate feature selection into model training, resulting in more efficient and interpretable models. Current embedded UFS methods primarily rely on self-representation or pseudo-supervised feature selection approaches to address redundancy and irrelevant feature issues, respectively. Nevertheless, there is currently a lack of research showcasing the fusion of these two approaches. This paper proposes the Orthogonal encoder-decoder factorization for unsupervised Feature Selection (OEDFS) model, combining the strengths of self-representation and pseudo-supervised approaches. This method draws inspiration from the self-representation properties of autoencoder architectures and leverages encoder and decoder factorizations to simulate a pseudo-supervised feature selection approach. To further enhance the part-based characteristics of factorization, orthogonality constraints and local structure preservation restrictions are incorporated into the objective function. The optimization process is based on the multiplicative update rule, ensuring efficient convergence. To assess the effectiveness of the proposed method, comprehensive experiments are conducted on 14 datasets, and the results are compared with eight state-of-the-art methods. The experimental results demonstrate the superior performance of the proposed approach in terms of UFS efficiency.

Keywords: Unsupervised feature selection encoder-decoder model Self-representation learning Pseudo-supervised learning Nonnegative matrix factorization

Sequential Memory Modelling for Video Captioning

19th IEEE-India-Council International Conference (INDICON)

Authors: Puttaraja Nayaka, Chidambara Manikesh Sharma, Nitin Anand, Kumar M. Natl Inst Technol Karnataka Dept Informat Technol Surathkal 575025 India

ISBN: (print) 9781665473507

In recent years, the automatic generation of natural language descriptions of video has focused on deep learning research and natural voice processing. Video understanding has multiple applications such as video search and indexing, but video subtitles are a correct sophisticated topic for complex and diverse types of video content. However, the understanding between video and natural language sets remains an open issue to better understand the video and create multiple methods to create a set automatically. The deep learning method has a major focus on the direction of video processing with performance and high-speed computing capabilities. This polling discusses an encoder-decoder network end-in-frame based on a deep learning approach to generate captions. In this paper, we describe the model, dataset, and parameters used to evaluate the model.

Keywords: Deep learning NLP Natural Language Processing LSTM encoder-decoder model

CoMER: Modeling Coverage for Transformer-Based Handwritten Mathematical Expression Recognition

17th European Conference on Computer Vision (ECCV)

Authors: Zhao, Wenqi Gao, Liangcai Peking Univ Wangxuan Inst Comp Technol Beijing Peoples R China

ISBN: (print) 9783031198144; 9783031198151

The Transformer-based encoder-decoder architecture has recently made significant advances in recognizing handwritten mathematical expressions. However, the transformer model still suffers from the lack of coverage problem, making its expression recognition rate (ExpRate) inferior to its RNN counterpart. Coverage information, which records the alignment information of the past steps, has proven effective in the RNN models. In this paper, we propose CoMER, a model that adopts the coverage information in the transformer decoder. Specifically, we propose a novel Attention Refinement Module (ARM) to refine the attention weights with past alignment information without hurting its parallelism. Furthermore, we take coverage information to the extreme by proposing self-coverage and cross-coverage, which utilize the past alignment information from the current and previous layers. Experiments show that CoMER improves the ExpRate by 0.61%/2.09%/1.59% compared to the current state-of-the-art model, and reaches 59.33%/59.81%/62.97% on the CROHME 2014/2016/2019 test sets. (Source code is available at https://***/Green-Wood/CoMER)

Keywords: Handwritten mathematical expression recognition Transformer Coverage Alignment encoder-decoder model
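
The abstract above describes coverage as a record of the alignment information from past decoding steps. In coverage-based attention generally, this is commonly represented as the running sum of the attention weights from all earlier steps. The sketch below illustrates only that accumulation, assuming each step's attention is a plain list of weights over input positions. It is not the paper's Attention Refinement Module, and the name `coverage_vectors` is hypothetical.

```python
def coverage_vectors(attention_steps):
    """For each decoding step, return the sum of attention weights from earlier steps."""
    if not attention_steps:
        return []
    width = len(attention_steps[0])
    running = [0.0] * width
    coverage = []
    for weights in attention_steps:
        if len(weights) != width:
            raise ValueError("every step must attend over the same number of positions")
        # Record what has been attended to *before* the current step.
        coverage.append(list(running))
        # Then fold the current step's attention into the running total.
        running = [c + w for c, w in zip(running, weights)]
    return coverage


# Hypothetical attention weights: 3 decoding steps over 3 input positions.
attn = [
    [0.5, 0.5, 0.0],
    [0.25, 0.25, 0.5],
    [0.0, 0.5, 0.5],
]
cov = coverage_vectors(attn)
```

A decoder can use such a vector to discount positions that have already received attention, which is the motivation for coverage in recognition tasks where each symbol should be read once.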

Joint Object Affordance Reasoning and Segmentation in RGB-D Videos

IEEE ACCESS, 2021, vol. 9, pp. 89699-89713

Authors: Thermos, Spyridon Potamianos, Gerasimos Daras, Petros Univ Edinburgh Sch Engn Edinburgh EH9 3JL Midlothian Scotland Univ Thessaly Dept Elect & Comp Engn Volos 38221 Greece Informat Technol Inst Ctr Res & Technol Hellas Visual Comp Lab Thessaloniki 57001 Greece

Understanding human-object interaction is a fundamental challenge in computer vision and robotics. Crucial to it is the ability to infer "object affordances" from visual data, namely the types of interaction supported by an object of interest and the object parts involved. Such inference can be approached as an "affordance reasoning" task, where object affordances are recognized and localized as image heatmaps, and as an "affordance segmentation" task, where affordance labels are obtained at a more detailed, image pixel level. To tackle the two tasks, existing methods typically: (i) treat them independently; (ii) adopt static image-based models, ignoring the temporal aspect of human-object interaction; and/or (iii) require additional strong supervision concerning object class and location. In this paper, we focus on both tasks, while addressing all three aforementioned shortcomings. For this purpose, we propose a deep learning-based dual encoder-decoder model for joint affordance reasoning and segmentation, which learns from our recently introduced SOR3D-AFF corpus of RGB-D human-object interaction videos, without relying on object localization and classification. The basic components of the model comprise: (i) two parallel encoders that capture spatio-temporal interaction information; (ii) a reasoning decoder that predicts affordance heatmaps, assisted by an affordance classifier and an attention mechanism; and (iii) a segmentation decoder that exploits the predicted heatmap to yield pixel-level affordance segmentation. All modules are jointly trained, while the system can operate on both static images and videos. The approach is evaluated on four datasets, surpassing the current state-of-the-art in both affordance reasoning and segmentation.

Keywords: Affordances Cognition Decoding Task analysis Image segmentation Heating systems Videos Object affordances human-object interaction reasoning semantic segmentation deep learning encoder-decoder model attention mechanism RGB-D video

CNN-based encoder-decoder networks for salient object detection: A comprehensive review and recent advances

INFORMATION SCIENCES, 2021, vol. 546, pp. 835-857

Authors: Ji, Yuzhu Zhang, Haijun Zhang, Zhao Liu, Ming Harbin Inst Technol Dept Comp Sci Shenzhen Peoples R China Hefei Univ Technol Dept Comp Sci Hefei Peoples R China Harbin Inst Technol Sch Astronaut Harbin Peoples R China

Convolutional neural network (CNN)-based encoder-decoder models have profoundly inspired recent works in the field of salient object detection (SOD). With the rapid development of encoder-decoder models with respect to most pixel-level dense prediction tasks, an empirical study still does not exist that evaluates performance by applying a large body of encoder-decoder models on SOD tasks. In this paper, instead of limiting our survey to SOD methods, a broader view is further presented from the perspective of fundamental architectures of key modules and structures in CNN-based encoder-decoder models for pixel-level dense prediction tasks. Moreover, we focus on performing SOD by leveraging deep encoder-decoder models, and present an extensive empirical study on baseline encoder-decoder models in terms of different encoder backbones, loss functions, training batch sizes, and attention structures. Moreover, state-of-the-art encoder-decoder models adopted from semantic segmentation and deep CNN-based SOD models are also investigated. New baseline models that can outperform state-of-the-art performance were discovered. In addition, these newly discovered baseline models were further evaluated on three video-based SOD benchmark datasets. Experimental results demonstrate the effectiveness of these baseline models on both image- and video-based SOD tasks. This empirical study is concluded by a comprehensive summary which provides suggestions on future perspectives.

Keywords: Salient object detection encoder-decoder model Pixel-level classification Video saliency Empirical study

Response type selection for chat-like spoken dialog systems based on LSTM and multi-task

SPEECH COMMUNICATION, 2021, vol. 133, pp. 23-30

Authors: Ohta, Kengo Nishimura, Ryota Kitaoka, Norihide Anan Coll Natl Inst Technol Anan Japan Tokushima Univ Tokushima Japan Toyohashi Univ Technol Toyohashi Aichi Japan

We propose a method of automatically selecting appropriate responses in conversational spoken dialog systems by explicitly determining the correct response type that is needed first, based on a comparison of the user's input utterance with many other utterances. Response utterances are then generated based on this response type designation (back channel, changing the topic, expanding the topic, etc.). This allows the generation of more appropriate responses than conventional end-to-end approaches, which only use the user's input to directly generate response utterances. As a response type selector, we propose an LSTM-based encoder-decoder framework utilizing acoustic and linguistic features extracted from input utterances. In order to extract these features more accurately, we utilize not only input utterances but also response utterances in the training corpus. To do so, multi-task learning using multiple decoders is also investigated. To evaluate our proposed method, we conducted experiments using a corpus of dialogs between elderly people and an interviewer. Our proposed method outperformed conventional methods using either a pointwise classifier based on Support Vector Machines or a single-task learning LSTM. The best performance was achieved when our two response type selectors (one trained using acoustic features, and the other trained using linguistic features) were combined, and multi-task learning was also performed.

Keywords: Spoken dialog system Response type selection encoder-decoder model Multi-task learning

An Overview on Image Segmentation Techniques for Reversible Data Hiding

INTERNATIONAL JOURNAL OF MATHEMATICAL ENGINEERING AND MANAGEMENT SCIENCES, 2024, vol. 9, no. 5, pp. 1163-1184

Authors: Gupta, Rasika Delhi Technol Univ Dept Comp Sci & Engn Delhi India

The fields of image processing and computer vision have witnessed significant growth due to the proliferation of digital images across diverse domains. Image segmentation is a fundamental task in digital image processing, finding applications in pivotal areas such as medical imaging, covert communication, autonomous driving, and satellite imaging, among others. One particularly intriguing application of image segmentation lies in Reversible Data Hiding (RDH), where the delineation of the main Region of Interest (ROI) and Non-Region of Interest (NROI) using segmentation plays a crucial role in effective data encryption in the images. Over the last two decades, various studies have focused on developing an efficient data hiding approach that can embed secret data within the ROI and NROI parts of an image while ensuring its quality. A comprehensive survey has been conducted that meticulously examines different segmentation techniques, along with their usage in reversible data hiding. The main objective of this survey is to compare the performance metrics of reversible data hiding after applying different image segmentation techniques. The image segmentation techniques have been categorized systematically into three main classes: i) traditional segmentation techniques, encompassing a spectrum of approaches such as thresholding, region-based, and edge detection-based techniques; ii) Machine Learning (ML)-based approaches, consisting of clustering and Support Vector Machines (SVM); and iii) Deep Learning (DL)-based techniques, propelled by Convolutional Neural Networks (CNNs), which have emerged as a transformative paradigm, revolutionizing segmentation tasks with their ability to learn complex images. The survey finds that the PSNR value of data-embedded images is high after applying deep learning-based segmentation techniques.

Keywords: encoder-decoder model Dilated convolution model ROI segmentation Reversible data hiding
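
The abstract above reports results in terms of PSNR (peak signal-to-noise ratio), a standard measure of how closely a data-embedded image matches its original: PSNR = 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared pixel difference. The sketch below computes it for flat lists of 8-bit grayscale pixel values. The function name `psnr` and the sample pixels are illustrative assumptions, not data from the survey.

```python
import math


def psnr(original, modified, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(modified) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel positions.
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 grayscale patch before and after embedding two bits.
cover = [52, 55, 61, 59]
stego = [52, 56, 61, 58]

value = psnr(cover, stego)
print(f"{value:.2f} dB")
```

Higher values indicate less visible distortion, so a high PSNR after embedding means the hidden data changed the image only slightly.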

Capturing intrinsic features from field data for predicting the production of natural gas

GEOENERGY SCIENCE AND ENGINEERING, 2023, vol. 227

Authors: Wang, Xin Wang, Yong-Sheng Pang, Lan-Su Jiang, Tao Chen, Yu-Fan Wang, Yang Mei, Qing-Yan Qing, Sheng-Lan Jiang, Wei Southwest Petr Univ Sch Comp Sci Chengdu 610500 Peoples R China PetroChina Southwest Oil & Gasfield Co Explorat & Dev Res Inst Chengdu 610043 Peoples R China

Production prediction for gas wells is a popular topic in reservoir engineering as it plays a crucial role in the formulation of development plans. Most traditional techniques can be categorized into two types, i.e., numerical simulation methods and decline curve analysis, while none of them can precisely capture the varying trends of gas production, which leads to poor prediction results. To tackle the issue, we propose a comprehensive approach that works in a pipeline manner to learn intrinsic features from data for production prediction. (1) We propose to group wells with a clustering algorithm which does not need the pre-specified cluster number. To group wells even better, two parameters, i.e., dynamic volatility and static volatility of productions are introduced and involved for clustering. (2) We devise a technique that is based on the maximum likelihood estimation, for well matching. (3) We develop an encoder-decoder model for learning varying trends of well productions, by considering geological, engineering and production data simultaneously. (4) On real-life data, we conduct intensive experiments and find that our approach achieves superior performance and substantially outperforms its counterparts.

Keywords: Natural gas Production prediction encoder-decoder model Well clustering & matching
