Search Results - Inner Mongolia University Library

MPOCSR: optical chemical structure recognition based on multi-path Vision Transformer

COMPLEX & INTELLIGENT SYSTEMS, 2024, Vol. 10, No. 6, pp. 7553-7563

Authors: Lin, Fan; Li, Jianhua. East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China

Optical chemical structure recognition (OCSR) is a fundamental task in chemistry that aims to transform intricate chemical structure images into machine-readable formats. Current deep learning-based OCSR methods typically use image feature extractors to extract visual features and employ encoder-decoder architectures for chemical structure recognition. However, the performance of these methods is limited by their image feature extractors and the class imbalance of elements in chemical structure representation. This paper proposes MPOCSR (multi-path optical chemical structure recognition), which introduces the multi-path Vision Transformer (MPViT) and the class-balanced (CB) loss function to address these two challenges. MPOCSR uses MPViT as an image feature extractor, combining the advantages of convolutional neural networks and Vision Transformers, which provides richer visual information for the subsequent decoding process. Furthermore, MPOCSR incorporates the CB loss function to rebalance the loss weights among different categories. For training and validation of our method, we constructed a dataset that includes both Markush and non-Markush structures. Experimental results show that MPOCSR achieves an accuracy of 90.95% on the test set, surpassing other existing methods.

Keywords: Optical chemical structure recognition; Deep learning; Multi-path Vision Transformer; encoder-decoder architecture
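
The abstract above says only that a class-balanced (CB) loss rebalances loss weights among categories; it does not give the formula. As a rough illustration, the sketch below assumes the widely used "effective number of samples" formulation, in which a class with n samples is weighted in proportion to (1 - beta) / (1 - beta^n). The function names, the value of beta, and the token counts are hypothetical and are not taken from the paper.

```python
import math


def class_balanced_weights(counts, beta=0.999):
    """Return per-class weights proportional to (1 - beta) / (1 - beta**n).

    counts: positive sample counts, one per class (assumed formulation).
    The weights are rescaled so that they sum to the number of classes.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def cb_cross_entropy(probs, target, weights):
    """Weighted negative log-likelihood for a single predicted token."""
    return -weights[target] * math.log(probs[target])


# Hypothetical token counts: carbon is frequent, bromine is rare.
counts = {"C": 50000, "N": 6000, "Br": 150}
weights = class_balanced_weights(list(counts.values()))

for symbol, weight in zip(counts, weights):
    print(f"{symbol}: weight {weight:.3f}")
```

Under this assumption, rare element tokens receive larger weights than frequent ones, so mistakes on them contribute more to the training loss.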

Research on prediction of nanocrystalline alloy hysteresis properties based on long short-term memory network

SCIENTIFIC REPORTS, 2025, Vol. 15, No. 1, pp. 1-11

Authors: Li, Hailin; Zhang, Bo; Shen, Yongpeng; Zhang, Lei; Liu, Kun. Zhengzhou Univ Light Ind, Coll Elect & Informat Engn, Zhengzhou 450000, Peoples R China; XJ Elect Co Ltd, Xuchang 461000, Peoples R China

To predict the hysteresis characteristics of nanocrystalline alloy materials at different frequencies, this paper proposes a data-driven hysteresis prediction model based on the encoder-decoder architecture, which combines a long short-term memory network with a feedforward neural network. The data-driven model takes advantage of the strong nonlinear learning ability of artificial neural networks to learn the magnetic hysteresis characteristics of nanocrystalline alloy materials at different frequencies. First, based on the encoder-decoder architecture, a hysteresis prediction model is constructed by combining a long short-term memory network and a feedforward neural network. Subsequently, because obtaining B-H data by measurement is cumbersome and time-consuming, the Jiles-Atherton (J-A) hysteresis model is identified from B-H measurement data of a small number of nanocrystalline alloy materials at different frequencies, and is used to obtain the training and validation sets for the data-driven model. Finally, the validity and accuracy of the data-driven hysteresis prediction model are verified on the validation set; the maximum error is about 10.29%. The results show that the neural network hysteresis model can predict hysteresis characteristics while accounting for the effect of frequency, which provides a new approach to the simulation of hysteresis characteristics.

Keywords: Magnetic hysteresis characteristic; Data-driven; Long short-term memory network; encoder-decoder architecture

GNet: An integrated context-aware neural framework for transcription factor binding signal at single nucleotide resolution prediction

MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, Vol. 20, No. 9, pp. 15809-15829

Authors: Zhuang, Jujuan; Feng, Kexin; Teng, Xinyang; Jia, Cangzhi. Dalian Maritime Univ, Sch Sci, Dalian 116026, Liaoning, Peoples R China

Transcription factors (TFs) are important factors that regulate gene expression. Revealing the mechanism affecting the binding specificity of TFs is the key to understanding gene regulation. Most previous studies focus on TF-DNA binding sites at the sequence level and seldom utilize the contextual features of DNA sequences. In this paper, we develop an integrated spatiotemporal context-aware neural network framework, named GNet, for predicting TF-DNA binding signal at single nucleotide resolution by achieving three tasks: single nucleotide resolution signal prediction, identification of binding regions at the sequence level, and TF-DNA binding motif prediction. GNet extracts implicit spatial contextual information with a gated highway neural mechanism, which captures large-context, multi-level patterns using linear shortcut connections; this idea permeates both the encoder and decoder parts of GNet. An improved dual external attention mechanism learns implicit relationships both within and among samples and improves the performance of the model. Experimental results on 53 human TF ChIP-seq datasets and 6 chromatin accessibility ATAC-seq datasets show that GNet outperforms the state-of-the-art methods in the three tasks, and cross-species studies on 15 human and 18 mouse TF datasets of the corresponding TF families indicate that GNet also performs best among the competing methods in cross-species prediction.

Keywords: transcription factor binding site; gated highway neural network; encoder-decoder architecture; external attention mechanism

Automated detection and segmentation of cracks in concrete surfaces using joined segmentation and classification deep neural network

CONSTRUCTION AND BUILDING MATERIALS, 2023, Vol. 408

Authors: Tabernik, Domen; Suc, Matic; Skocaj, Danijel. Univ Ljubljana, Fac Comp & Informat Sci, Vecna Pot 113, Ljubljana, Slovenia

Automated quality control of pavement and concrete surfaces is essential for maintaining structural integrity and consistency in the construction and infrastructure industries. This paper presents a novel deep learning model designed for automated quality control of these surfaces during both construction and maintenance phases. The model employs per-pixel segmentation and per-image classification, integrating both local and broader context information. Additionally, we utilize the classification results to improve segmentation during both training and inference stages. We evaluated the proposed model on a publicly available dataset containing more than 7,000 images of pavement and concrete cracks. The model achieved a Dice score of 81% and an intersection-over-union of 71%, surpassing publicly available state-of-the-art methods by at least 6-7 percentage points. An ablation study confirms that leveraging classification information enhances overall segmentation performance. Furthermore, our model is computationally efficient, processing over 30 FPS for 512 x 512 images, making it suitable for real-time applications on medium-resolution images. Code and the corrected dataset ground truths are publicly available: https://***/vicoslab/***.

Keywords: Concrete crack segmentation; Deep learning; encoder-decoder architecture; Automated quality control; Joint segmentation and classification
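
The abstract above reports both a Dice score and an intersection-over-union. For readers comparing the two, here is a minimal sketch of how these metrics are commonly computed from a pair of binary masks. The function name and the toy masks are hypothetical and are not taken from the paper's released code.

```python
def dice_and_iou(pred, truth):
    """Compute (Dice, IoU) for two equally sized 2D lists of 0/1 pixels."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # crack pixel predicted and present
            elif p:
                fp += 1  # crack pixel predicted but absent
            elif t:
                fn += 1  # crack pixel present but missed
    if tp + fp + fn == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0, 1.0
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou


# Toy 3x4 masks standing in for a predicted and a ground-truth crack mask.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0]]
truth = [[0, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0]]

dice, iou = dice_and_iou(pred, truth)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For a single pair of masks the two metrics are tied by IoU = Dice / (2 - Dice). Scores averaged over a whole dataset, such as those quoted in the abstract, need not satisfy this identity exactly.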

Video description: A comprehensive survey of deep learning approaches

ARTIFICIAL INTELLIGENCE REVIEW, 2023, Vol. 56, No. 11, pp. 13293-13372

Authors: Rafiq, Ghazala; Rafiq, Muhammad; Choi, Gyu Sang. Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea; Keimyung Univ, Dept Game & Mobile Engn, 1095 Dalgubeol Daero, Daegu 42601, South Korea

Video description refers to understanding visual content and transforming that acquired understanding into automatic textual narration. It bridges the key AI fields of computer vision and natural language processing in conjunction with real-time and practical applications. Deep learning-based approaches employed for video description have demonstrated enhanced results compared to conventional approaches. The current literature lacks a thorough interpretation of the recently developed and employed sequence-to-sequence techniques for video description. This paper fills that gap by focusing mainly on deep learning-enabled approaches to automatic caption generation. Sequence-to-sequence models follow an encoder-decoder architecture employing a specific composition of CNN, RNN, or the variants LSTM or GRU as an encoder and decoder block. This standard architecture can be fused with an attention mechanism to focus on specific distinctive features, achieving high-quality results. Reinforcement learning employed within the encoder-decoder structure can progressively deliver state-of-the-art captions by following exploration and exploitation strategies. The transformer mechanism is a modern and efficient transductive architecture for robust output. Free from recurrence and based solely on self-attention, it allows parallelization along with training on a massive amount of data. It can fully utilize the available GPUs for most NLP tasks. Recently, with the emergence of several versions of transformers, long-term dependency handling is no longer an issue for researchers engaged in video processing for summarization and description, or for autonomous-vehicle, surveillance, and instructional purposes. They can draw promising directions from this research.

Keywords: Deep learning; encoder-decoder architecture; Text description; Video captioning techniques; Video description approaches; Video captioning; Vision to text

Transfer learning approach based on satellite image time series for the crop classification problem

JOURNAL OF BIG DATA, 2023, Vol. 10, No. 1, p. 54

Authors: Antonijevic, Ognjen; Jelic, Slobodan; Bajat, Branislav; Kilibarda, Milan. Univ Belgrade, Fac Civil Engn, Dept Geodesy & Geoinformat, Belgrade, Serbia

This paper presents a transfer learning approach to the crop classification problem based on time series of images from the Sentinel-2 dataset labeled for two regions: Brittany (France) and Vojvodina (Serbia). During preprocessing, cloudy images are removed from the input data, the time series are interpolated over the time dimension, and additional remote sensing indices are calculated. We chose a Transformer encoder as the base model for knowledge transfer from the source domain to the target domain, with French and Serbian data, respectively. Moreover, the accuracy of the base model with the preprocessing step is improved by 2% when trained and evaluated on the French dataset. The transfer learning approach with fine-tuning of the weights pre-trained on the French dataset outperformed all other methods in terms of overall accuracy (0.94) and mean class recall (0.907) on the Serbian dataset. Our partially fine-tuned model improved recall of crop types that were poorly classified by the base model. In the case of sugar beet, class recall is improved by 85.71%.

Keywords: Transfer learning; Remote sensing; encoder-decoder architecture; Domain adaptation; Crop classification; Attention mechanism

MITP-Net: A deep-learning framework for short-term indoor temperature in multi-zone

BUILDING AND ENVIRONMENT, 2023, Vol. 239, No. 1

Authors: Xing, Tian; Sun, Kailai; Zhao, Qianchuan. Tsinghua Univ, Ctr Intelligent & Networked Syst, Dept Automat, BNRist, Beijing 100084, Peoples R China

Indoor temperature prediction is an essential component of building control and energy saving. Although existing indoor temperature prediction frameworks have achieved remarkable progress, they struggle to achieve high performance due to information, method, application, and sim-to-real gaps. Aiming to fill these gaps, we propose a novel deep-learning framework for short-term indoor temperature prediction in multi-zone buildings. In particular, we expand the sensing information and formulate the multi-zone indoor temperature prediction (MITP) problem. To improve the prediction performance, we apply information fusion and an encoder-decoder architecture to the MITP problem and propose MITP-Net. We set up 11 ablation experiments to compare the prediction performance of related frameworks. To evaluate the frameworks' performance, we publicly release a dataset comprising 2 weeks of real operating data from a multi-zone office with a 1-min sampling interval (829,440 digits in total). Compared with existing deep-learning frameworks, MITP-Net significantly raises the prediction accuracy and can flexibly adjust the lengths of input and prediction sequences for different requirements. We provide the usage steps of MITP-Net and publish the operating data and code in the GitHub repository: https://***/XingTian1994/MITP-Net.

Keywords: Multi-zone temperature prediction; Multiple sensor information; Information fusion; encoder-decoder architecture; Gated recurrent unit network

A comprehensive survey on image captioning: from handcrafted to deep learning-based techniques, a taxonomy and open research issues

ARTIFICIAL INTELLIGENCE REVIEW, 2023, Vol. 56, No. 11, pp. 13619-13661

Authors: Sharma, Himanshu; Padha, Devanand. Cent Univ Jammu, Dept Comp Sci & Informat Technol, Jammu & Kashmir, Jammu 181124, India

Image captioning is a relatively recent area at the convergence of computer vision and natural language processing and is widely used in a range of applications such as multi-modal search, robotics, security, remote sensing, medicine, and visual aid. Image captioning techniques have witnessed a paradigm shift from classical machine-learning-based approaches to contemporary deep learning-based techniques. We present an in-depth investigation of image captioning methodologies in this survey using our proposed taxonomy. Furthermore, the study investigates several eras of image captioning advancements, including template-based, retrieval-based, and encoder-decoder-based models. We also explore captioning in languages other than English. Benchmark image captioning datasets and assessment measures are also thoroughly examined. The limited effectiveness of real-time image captioning is a severe barrier that prevents its use in sensitive applications such as visual aid, security, and medicine. Another observation from our research is the scarcity of personalized domain datasets, which limits the adoption of image captioning for more advanced problems. Despite influential contributions from several academics, further efforts are required to construct substantially robust and reliable image captioning models.

Keywords: Attention-based image captioning; encoder-decoder architecture; Image captioning; Multimodal embedding

M²: Mixed Models With Preferences, Popularities and Transitions for Next-Basket Recommendation

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, Vol. 35, No. 4, pp. 4033-4046

Authors: Peng, Bo; Ren, Zhiyun; Parthasarathy, Srinivasan; Ning, Xia. Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210, USA; Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210, USA; Ohio State Univ, Dept Biomed Informat, Dept Comp Sci & Engn, Columbus, OH 43210, USA; Ohio State Univ, Translat Data Analyt Inst, Columbus, OH 43210, USA

Next-basket recommendation considers the problem of recommending a set of items into the next basket that users will purchase as a whole. In this paper, we develop a novel mixed model with preferences, popularities and transitions (M²) for next-basket recommendation. This method models three important factors in the next-basket generation process: 1) users' general preferences, 2) items' global popularities and 3) transition patterns among items. Unlike existing recurrent neural network-based approaches, M² does not use complicated networks to model the transitions among items or to generate embeddings for users. Instead, it uses a simple encoder-decoder based approach (ed-Trans) to better model the transition patterns among items. We compared M² with different combinations of the factors against 5 state-of-the-art next-basket recommendation methods on 4 public benchmark datasets in recommending the first, second and third next basket. Our experimental results demonstrate that M² significantly outperforms the state-of-the-art methods on all the datasets in all the tasks, with an improvement of up to 22.1%. In addition, our ablation study demonstrates that ed-Trans is more effective than recurrent neural networks in terms of recommendation performance. We also provide a thorough discussion of various experimental protocols and evaluation metrics for next-basket recommendation evaluation.

Keywords: Recurrent neural networks; Task analysis; Benchmark testing; Adaptation models; Decoding; Protocols; Markov processes; Recommender systems; next-basket recommendation; encoder-decoder architecture; mixed models

Mathematical expression recognition using a new deep neural model

NEURAL NETWORKS, 2023, Vol. 167, pp. 865-874

Authors: Mirkazemy, Abolfazl; Adibi, Peyman; Ehsani, Seyed Mohhamad Saied; Darvishy, Alireza; Hutter, Hans-Peter. Univ Isfahan, Fac Comp Engn, Artificial Intelligence Dept, Esfahan, Iran; Zurich Univ Appl Sci ZHAW, Sch Engn, Winterthur, Switzerland

In this paper, we propose a novel deep neural model for Mathematical Expression Recognition (MER). The proposed model uses an encoder-decoder transformer architecture, supported by additional pre/post-processing modules, to recognize the image of a mathematical formula and convert it to a well-formed language. A novel pre-processing module based on domain prior knowledge generates random pads around the formula's image to create more efficient feature maps and keep all the encoder neurons active during training. In addition, a new post-processing module uses a sliding window to extract additional position-based information from the feature map, which proves useful in the recognition process. The recurrent decoder module combines the feature maps with the additional position-based information and takes advantage of a soft attention mechanism to extract the formula context into well-formed LaTeX. Finally, a novel Reinforcement Learning (RL) module processes the decoder output and tunes its results by sending appropriate feedback to the previous steps. The experimental results on the im2latex100k benchmark dataset indicate that each of the devised pre/post-processing modules, as well as the RL refinement module, has a positive effect on the performance of the proposed model. The results also demonstrate the higher accuracy of the proposed model compared to the state-of-the-art methods. (c) 2023 Elsevier Ltd. All rights reserved.

Keywords: Mathematical expression recognition; Deep learning; encoder-decoder architecture; Attention; Scientific documents accessibility
