Optical chemical structure recognition (OCSR) is a fundamental and crucial task in chemistry that aims to transform images of intricate chemical structures into machine-readable formats. Current deep learning-based OCSR methods typically use an image feature extractor to obtain visual features and an encoder-decoder architecture to recognize the chemical structure. However, the performance of these methods is limited by their image feature extractors and by the class imbalance of elements in chemical structure representations. This paper proposes MPOCSR (multi-path optical chemical structure recognition), which introduces the multi-path Vision Transformer (MPViT) and the class-balanced (CB) loss function to address these two challenges. MPOCSR uses MPViT as its image feature extractor, combining the advantages of convolutional neural networks and Vision Transformers, which provides richer visual information for the subsequent decoding process. Furthermore, MPOCSR incorporates the CB loss function to rebalance the loss weights among different categories. To train and validate our method, we constructed a dataset that includes both Markush and non-Markush structures. Experimental results show that MPOCSR achieves an accuracy of 90.95% on the test set, surpassing other existing methods.
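As an illustration of how a CB loss rebalances classes, the sketch below follows the class-balanced formulation of Cui et al. (2019), which weights each class by the inverse of its "effective number of samples", (1 - β^n)/(1 - β). The abstract does not say which base loss or β value MPOCSR uses, so softmax cross-entropy, β = 0.999 and the element counts are assumptions chosen only for illustration.

```python
import math

def class_balanced_weights(counts, beta=0.999):
    """Per-class weights from the "effective number of samples"
    E_n = (1 - beta**n) / (1 - beta). Each weight is proportional to 1 / E_n,
    and the weights are rescaled so that they sum to the number of classes."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    raw = []
    for n in counts:
        if n <= 0:
            raise ValueError("every class needs at least one sample")
        effective_num = (1.0 - beta ** n) / (1.0 - beta)
        raw.append(1.0 / effective_num)
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]

def cb_cross_entropy(logits, target, weights):
    """Softmax cross-entropy for one example, scaled by its class weight."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -weights[target] * (logits[target] - log_sum_exp)

# Hypothetical token counts for three element classes (e.g. C, O, Br).
weights = class_balanced_weights([50000, 2000, 30])
```

With β = 0 every class gets weight 1 (plain cross-entropy); as β approaches 1 the weights approach inverse class frequency, so rare elements such as Br contribute more to the loss.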
To predict the hysteresis characteristics of nanocrystalline alloy materials at different frequencies, this paper proposes a data-driven hysteresis prediction model based on an encoder-decoder architecture that combines a long short-term memory (LSTM) network with a feedforward neural network. The model exploits the powerful nonlinear learning ability of artificial neural networks to learn the magnetic hysteresis characteristics of nanocrystalline alloy materials at different frequencies. First, the hysteresis prediction model is constructed by combining the LSTM network and the feedforward neural network within the encoder-decoder architecture. Second, because measuring B-H data is cumbersome and time-consuming, the training and validation sets for the data-driven model are generated with the Jiles-Atherton (J-A) hysteresis model, whose parameters are identified from B-H measurement data of a small number of nanocrystalline alloy materials at different frequencies. Finally, the validity and accuracy of the data-driven model are verified on the validation set, with a maximum error of about 10.29%. The results show that the neural network hysteresis model can predict hysteresis characteristics while accounting for the effect of frequency, providing a new approach to simulating hysteresis characteristics.
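For readers unfamiliar with the J-A model used to generate the training data, the following sketch integrates a simplified Jiles-Atherton variant with explicit Euler steps. It is not the paper's identified model: the parameter values are illustrative rather than fitted to any measurement, the function names are hypothetical, and the differential form is one common simplification in which the α feedback appears only inside the effective field (the full model also has an α term in the denominator of dM_irr/dH).

```python
import math

def langevin(x):
    """Langevin function L(x) = coth(x) - 1/x. Near zero the Taylor
    approximation x/3 avoids catastrophic cancellation."""
    if abs(x) < 1e-4:
        return x / 3.0
    return 1.0 / math.tanh(x) - 1.0 / x

def jiles_atherton(h_path, ms, a, k, c, alpha):
    """Explicit-Euler sketch of a simplified Jiles-Atherton model.

    M_an   = ms * L((H + alpha * M) / a)          anhysteretic magnetization
    dM_irr = (M_an - M_irr) / (delta * k) * dH    only while M_irr moves toward M_an
    M      = c * M_an + (1 - c) * M_irr           reversible + irreversible parts
    where delta = +1 for increasing H and -1 for decreasing H.
    With alpha * ms / (3 * a) < 1 the self-consistent M_an is unique."""
    m = 0.0
    m_irr = 0.0
    curve = []
    h_prev = h_path[0]
    for h in h_path:
        dh = h - h_prev
        h_prev = h
        m_an = ms * langevin((h + alpha * m) / a)  # uses M from the previous step
        delta = 1.0 if dh >= 0 else -1.0
        if delta * (m_an - m_irr) > 0:  # pinning: no irreversible change otherwise
            m_irr += (m_an - m_irr) / (delta * k) * dh
        m = c * m_an + (1.0 - c) * m_irr
        curve.append(m)
    return curve

# Illustrative parameters (not identified from any measurement): ramp H from
# 0 A/m up to 5000 A/m and back to 0 in 10 A/m steps.
up = [10.0 * i for i in range(501)]
down = [5000.0 - 10.0 * i for i in range(1, 501)]
m_curve = jiles_atherton(up + down, ms=1.6e6, a=1100.0, k=400.0, c=0.2, alpha=1.6e-3)
```

On the descending branch the magnetization stays above the ascending branch and remains positive at H = 0 (remanence), which is the loop shape such synthetic B-H data would exhibit.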
Transcription factors (TFs) are important regulators of gene expression, and revealing the mechanisms that affect TF binding specificity is key to understanding gene regulation. Most previous studies focus on TF-DNA binding sites at the sequence level and seldom exploit the contextual features of DNA sequences. In this paper, we develop an integrated spatiotemporal context-aware neural network framework, named GNet, for predicting TF-DNA binding signals at single-nucleotide resolution. GNet addresses three tasks: single-nucleotide-resolution signal prediction, identification of binding regions at the sequence level, and TF-DNA binding motif prediction. GNet extracts implicit spatial contextual information with a gated highway neural mechanism that captures large-context, multi-level patterns through linear shortcut connections; this idea permeates both the encoder and the decoder of GNet. An improved dual external attention mechanism learns implicit relationships both within and among samples, further improving model performance. Experimental results on 53 human TF ChIP-seq datasets and 6 chromatin accessibility ATAC-seq datasets show that GNet outperforms state-of-the-art methods on all three tasks, and cross-species studies on 15 human and 18 mouse TF datasets from the corresponding TF families indicate that GNet also achieves the best cross-species prediction performance among the competing methods.
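The gated highway mechanism is described only at a high level in this abstract. As a point of reference, a generic highway layer (Srivastava et al., 2015) mixes a gated transform with a linear shortcut. The sketch below implements that textbook form on a single vector in plain Python; it is an assumption that GNet's convolutional version follows the same gating equation.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def highway(x, w_h, b_h, w_t, b_t):
    """One highway layer on a vector x: y = T(x) * H(x) + (1 - T(x)) * x.

    H(x) = ReLU(W_h x + b_h) is the transform, T(x) = sigmoid(W_t x + b_t) is
    the transform gate, and (1 - T(x)) * x is the linear shortcut that lets the
    input pass through unchanged. W_h and W_t must be square (dim x dim)."""
    def affine(w, b):
        return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]
    h = [max(0.0, v) for v in affine(w_h, b_h)]
    t = [sigmoid(v) for v in affine(w_t, b_t)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, -2.0]
closed = highway(x, identity, [0.0, 0.0], zeros, [-30.0, -30.0])  # gate ~0: y ~ x
opened = highway(x, identity, [0.0, 0.0], zeros, [30.0, 30.0])    # gate ~1: y ~ ReLU(x)
```

When the gate is closed the layer is almost an identity map, which is why stacks of highway layers can pass information across many levels without degrading it.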
Automated quality control of pavement and concrete surfaces is essential for maintaining structural integrity and consistency in the construction and infrastructure industries. This paper presents a novel deep learning model for automated quality control of these surfaces during both the construction and maintenance phases. The model performs per-pixel segmentation and per-image classification, integrating both local and broader contextual information. Additionally, we use the classification results to improve segmentation during both training and inference. We evaluated the proposed model on a publicly available dataset containing more than 7,000 images of pavement and concrete cracks. The model achieved a Dice score of 81% and an intersection-over-union (IoU) of 71%, surpassing publicly available state-of-the-art methods by at least 6-7 percentage points. An ablation study confirms that leveraging classification information improves overall segmentation performance. Furthermore, the model is computationally efficient, processing more than 30 frames per second (FPS) on 512 x 512 images, which makes it suitable for real-time applications on medium-resolution images. Code and the corrected dataset ground truths are publicly available: https://***/vicoslab/***.
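The two reported metrics can be computed per image from binary crack masks as sketched below. This is a plain-Python illustration with a made-up 2 x 3 mask pair, not the authors' evaluation code; how scores are aggregated across the dataset (per image or over all pixels) is not stated in the abstract.

```python
def dice_and_iou(pred, target):
    """Dice = 2 * intersection / (|pred| + |target|) and
    IoU = intersection / union, for two binary masks given as
    equally sized nested lists of 0/1 values."""
    inter = union = size_pred = size_target = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += p & t
            union += p | t
            size_pred += p
            size_target += t
    if union == 0:
        # both masks empty: treat as a perfect match (a common convention)
        return 1.0, 1.0
    return 2.0 * inter / (size_pred + size_target), inter / union

pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
```

For a single mask pair the two scores are linked by Dice = 2 * IoU / (1 + IoU), so here an IoU of 0.5 corresponds to a Dice score of 2/3.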
Video description refers to understanding visual content and transforming that understanding into automatic textual narration. It bridges two key AI fields, computer vision and natural language processing, and has real-time, practical applications. Deep learning-based approaches to video description have demonstrated better results than conventional approaches. However, the current literature lacks a thorough interpretation of the recently developed sequence-to-sequence techniques for video description. This paper fills that gap by focusing mainly on deep learning-enabled approaches to automatic caption generation. Sequence-to-sequence models follow an encoder-decoder architecture in which the encoder and decoder blocks are built from a specific combination of CNNs, RNNs, or the RNN variants LSTM and GRU. This standard architecture can be combined with an attention mechanism that focuses on specific distinctive features, achieving high-quality results. Reinforcement learning employed within the encoder-decoder structure can progressively produce state-of-the-art captions by following exploration and exploitation strategies. The transformer is a modern and efficient transduction architecture that produces robust output. Free of recurrence and based solely on self-attention, it allows parallelization as well as training on massive amounts of data, and it can fully utilize the available GPUs for most NLP tasks. With the recent emergence of several transformer variants, handling long-term dependencies is no longer an issue for researchers working on video processing for summarization and description, or for autonomous-vehicle, surveillance, and instructional applications. Such researchers may find promising directions in this survey.
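The parallelism of transformers follows from the structure of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V: every output position is computed from the same Q, K and V without waiting on a previous step. The minimal sketch below deliberately omits the learned projections, multiple heads and masking of a full transformer, and the "frame features" are made-up values.

```python
import math

def softmax(scores):
    m = max(scores)  # shift by the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Each query row is handled independently of the others, so on real
    hardware all positions can be computed in parallel."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# In self-attention Q, K and V all come from the same sequence of frame features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = scaled_dot_product_attention(frames, frames, frames)
```

Unlike an RNN, no output here depends on the output for an earlier position, which is what removes the sequential bottleneck during training.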
This paper presents a transfer learning approach to the crop classification problem based on time series of Sentinel-2 images labeled for two regions: Brittany (France) and Vojvodina (Serbia). During preprocessing, cloudy images are removed from the input data, the time series are interpolated over the time dimension, and additional remote sensing indices are calculated. We chose a Transformer encoder as the base model for knowledge transfer from the source domain (French data) to the target domain (Serbian data). Moreover, the preprocessing step improves the accuracy of the base model by 2% when it is trained and evaluated on the French dataset. The transfer learning approach, which fine-tunes the weights pre-trained on the French dataset, outperformed all other methods on the Serbian dataset, with an overall accuracy of 0.94 and a mean class recall of 0.907. Our partially fine-tuned model improved the recall of crop types that were poorly classified by the base model; for sugar beet, class recall improved by 85.71%.
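The cloud-removal and temporal-interpolation steps can be sketched as follows, assuming NDVI (computed from Sentinel-2 bands B8 and B4) as the example remote sensing index. The abstract does not list which indices were actually used; the per-date cloud flags, reflectance values and helper names are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red).
    For Sentinel-2, NIR is band B8 and Red is band B4."""
    return (nir - red) / (nir + red)

def interpolate_clear(days, values, cloudy, target_days):
    """Drop cloudy observations, then linearly interpolate the remaining ones
    onto target_days. Assumes days are sorted and unique; targets outside the
    observed range take the nearest cloud-free value."""
    clear = [(d, v) for d, v, c in zip(days, values, cloudy) if not c]
    if not clear:
        raise ValueError("no cloud-free observations")
    result = []
    for t in target_days:
        if t <= clear[0][0]:
            result.append(clear[0][1])
        elif t >= clear[-1][0]:
            result.append(clear[-1][1])
        else:
            for (d0, v0), (d1, v1) in zip(clear, clear[1:]):
                if d0 <= t <= d1:
                    result.append(v0 + (v1 - v0) * (t - d0) / (d1 - d0))
                    break
    return result

# Hypothetical acquisitions (day of year); the day-10 image is cloudy.
days = [0, 10, 20, 30]
series = [ndvi(nir, red) for nir, red in [(0.30, 0.20), (0.50, 0.40), (0.60, 0.20), (0.80, 0.20)]]
cloudy = [False, True, False, False]
regular = interpolate_clear(days, series, cloudy, [0, 5, 10, 15, 20, 25, 30])
```

Resampling every parcel onto the same regular dates gives a fixed-length sequence, which is what a Transformer encoder with positional encodings expects as input.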
Indoor temperature prediction is an essential component of building control and energy saving. Although existing indoor temperature prediction frameworks have made remarkable progress, they struggle to achieve high performance because of information, method, application, and sim-to-real gaps. To fill these gaps, we propose a novel deep learning framework for short-term indoor temperature prediction in multi-zone buildings. In particular, we expand the sensing information and formulate the multi-zone indoor temperature prediction (MITP) problem. To improve prediction performance, we apply information fusion and an encoder-decoder architecture to the MITP problem and propose MITP-Net. We set up 11 ablation experiments to compare the prediction performance of related frameworks. To evaluate the frameworks' performance, we publicly release a dataset containing 2 weeks of real operating data from a multi-zone office, sampled at 1-min intervals (829,440 values in total). Compared with existing deep learning frameworks, MITP-Net significantly improves prediction accuracy and can flexibly adjust the lengths of the input and prediction sequences to suit different requirements. We provide the usage steps for MITP-Net and publish the operating data and code in the GitHub repository: https://***/XingTian1994/MITP-Net.
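Adjustable input and prediction lengths imply that training pairs are cut from the 1-min series with configurable window sizes. The helper below is a hypothetical sketch of such windowing, not code from the MITP-Net repository; a 60-step input corresponds to one hour at the stated sampling interval, and the temperature readings are invented.

```python
def make_windows(series, input_len, pred_len, stride=1):
    """Split a time series into (input, target) pairs for sequence-to-sequence
    forecasting: each input covers input_len steps and its target the next
    pred_len steps."""
    if input_len <= 0 or pred_len <= 0 or stride <= 0:
        raise ValueError("lengths and stride must be positive")
    pairs = []
    last_start = len(series) - input_len - pred_len
    for start in range(0, last_start + 1, stride):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + pred_len]
        pairs.append((x, y))
    return pairs

# Hypothetical one day of 1-min zone temperatures: 1440 readings.
day = [20.0 + 0.001 * i for i in range(1440)]
pairs = make_windows(day, input_len=60, pred_len=15)  # 1 h in, 15 min out
```

A series of length N yields N - input_len - pred_len + 1 pairs at stride 1, so changing either length only changes how the same data is sliced, not the data itself.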
Image captioning is a relatively recent area at the convergence of computer vision and natural language processing and is widely used in applications such as multi-modal search, robotics, security, remote sensing, medicine, and visual aid. Image captioning techniques have undergone a paradigm shift from classical machine learning-based approaches to contemporary deep learning-based techniques. In this survey, we present an in-depth investigation of image captioning methodologies using our proposed taxonomy. The study also examines several eras of image captioning advances, including template-based, retrieval-based, and encoder-decoder-based models. We also explore captioning in languages other than English and provide a thorough discussion of benchmark image captioning datasets and evaluation measures. The limited effectiveness of real-time image captioning is a severe barrier to its use in sensitive applications such as visual aid, security, and medicine. Another observation from our research is the scarcity of personalized domain datasets, which limits adoption for more advanced problems. Despite influential contributions from many researchers, further efforts are required to build substantially more robust and reliable image captioning models.
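BLEU is one widely used evaluation measure for image captioning. The sketch below computes unsmoothed sentence-level BLEU (clipped n-gram precisions combined by a geometric mean, multiplied by a brevity penalty). Published evaluations usually rely on corpus-level BLEU from an established toolkit; the function names and captions here are illustrative.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    many times as it occurs in any single reference."""
    cand = ngram_counts(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # any zero precision makes the geometric mean zero
    c = len(candidate)
    # effective reference length: closest to the candidate length, shorter on ties
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)

caption = "a dog runs on the grass".split()
refs = ["a dog runs on the grass".split(), "a brown dog plays outside".split()]
score = sentence_bleu(caption, refs)
```

Clipping stops a caption such as "the the the the" from scoring well merely by repeating a frequent word, and the brevity penalty stops very short captions from gaining high precision.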
Next-basket recommendation addresses the problem of recommending the set of items that a user will purchase together in their next basket. In this paper, we develop a novel mixed model with preferences, popularities and transitions (M²) for next-basket recommendation. This method models three important factors in the next-basket generation process: 1) users' general preferences, 2) items' global popularities and 3) transition patterns among items. Unlike existing recurrent neural network-based approaches, M² does not use complicated networks to model the transitions among items or to generate user embeddings. Instead, it uses a simple encoder-decoder based approach (ed-Trans) to better model the transition patterns among items. We compared M², with different combinations of the factors, against 5 state-of-the-art next-basket recommendation methods on 4 public benchmark datasets in recommending the first, second and third next basket. Our experimental results demonstrate that M² significantly outperforms the state-of-the-art methods on all datasets in all tasks, with an improvement of up to 22.1%. In addition, our ablation study demonstrates that ed-Trans is more effective than recurrent neural networks in terms of recommendation performance. We also provide a thorough discussion of various experimental protocols and evaluation metrics for next-basket recommendation evaluation.
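Of the three factors M² combines, global item popularity is the simplest to illustrate. The sketch below is a popularity-only baseline paired with recall@k, one common next-basket metric; it does not reproduce M²'s preference or ed-Trans components or the way M² mixes them, the abstract does not say which metrics were used, and the baskets are invented.

```python
from collections import Counter

def top_k_popular(history_baskets, k):
    """Rank items by the number of past baskets that contain them."""
    counts = Counter()
    for basket in history_baskets:
        counts.update(set(basket))  # count each item once per basket
    # highest count first; ties broken by item id so the ranking is deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

def recall_at_k(recommended, next_basket, k):
    """Share of the items in the true next basket found among the top-k recommendations."""
    truth = set(next_basket)
    if not truth:
        raise ValueError("next basket is empty")
    return len(set(recommended[:k]) & truth) / len(truth)

history = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "jam"]]
recs = top_k_popular(history, k=3)
score = recall_at_k(recs, ["bread", "jam"], k=2)
```

A popularity baseline ignores who the user is and what they bought last, which is exactly the gap the preference and transition factors are meant to fill.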
In this paper, we propose a novel deep neural model for Mathematical Expression Recognition (MER). The proposed model uses an encoder-decoder transformer architecture, supported by additional pre- and post-processing modules, to recognize an image of a mathematical formula and convert it into a well-formed language. A novel pre-processing module based on domain prior knowledge generates random padding around the formula image, which creates more efficient feature maps and keeps all encoder neurons active during training. In addition, a new post-processing module uses a sliding window to extract additional position-based information from the feature map, which proves useful in the recognition process. The recurrent decoder module combines the feature maps with this position-based information and, using a soft attention mechanism, decodes the formula context into well-formed LaTeX. Finally, a novel Reinforcement Learning (RL) module processes the decoder output and refines its results by sending appropriate feedback to the previous steps. Experimental results on the im2latex100k benchmark dataset indicate that each of the pre/post-processing modules, as well as the RL refinement module, has a positive effect on the performance of the proposed model. The results also demonstrate that the proposed model is more accurate than state-of-the-art methods. (c) 2023 Elsevier Ltd. All rights reserved.
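The random-padding pre-processing step can be sketched as follows. The abstract does not specify how pad sizes are derived from domain prior knowledge, so this version simply draws each side's padding uniformly from 0 to a hypothetical `max_pad` and assumes a white (255) background on a grayscale image stored as nested lists.

```python
import random

def random_pad(image, max_pad, background=255, rng=None):
    """Surround a grayscale image (a list of equal-length rows) with a random
    number (0..max_pad) of background pixels on each side.

    Returns the padded image and the (top, left) offset of the original."""
    if rng is None:
        rng = random.Random()
    top, bottom, left, right = (rng.randint(0, max_pad) for _ in range(4))
    width = left + len(image[0]) + right
    padded = [[background] * width for _ in range(top)]
    for row in image:
        padded.append([background] * left + list(row) + [background] * right)
    padded.extend([background] * width for _ in range(bottom))
    return padded, (top, left)

formula = [[0, 10, 20],
           [30, 40, 50]]
padded, (top, left) = random_pad(formula, max_pad=5, rng=random.Random(0))
```

Because the formula lands at a different offset on each call, the encoder sees content at varied spatial positions rather than always at the same image border.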