It is widely acknowledged that the subcellular localization of mRNA is crucial for understanding its biological functions. However, existing methods for predicting mRNA subcellular localization from k-mer frequency features may overlook the order of the sequence, and a single encoding method may not adequately extract its features. This paper proposes a novel deep learning method, CSpredR, designed for predicting the subcellular localization of multi-site mRNAs. Unlike previous methods, CSpredR first tokenizes the mRNA sequences into k-mers, then converts the tokenized sequences into de Bruijn graphs, thereby capturing the structural information within the sequences more precisely. To mitigate the loss of sequential information and better capture sequence features, we combine word2vec and fastText models to extract the features of each node in the graph while retaining the sequence order. These models encode the k-mer units of the sequence as word vectors, which serve as the node feature vectors of the graph. In this way, each node in the graph is assigned a feature vector containing rich semantic information. Subsequently, we use multi-scale convolutional neural networks and bidirectional long short-term memory networks to capture sequence features separately, and fuse the results as input to a multi-head attention model. The information from these heads is integrated into the node representations, and finally the attention-processed data are fed into a multi-layer perceptron (MLP) for prediction. Extensive experiments show that CSpredR achieves a 2% improvement over the best existing predictors, offering a more effective tool for mRNA subcellular localization prediction.
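The first two steps described in the abstract, k-mer tokenization and conversion to a de Bruijn graph, can be illustrated with a minimal Python sketch. It is not the CSpredR implementation: the function names `kmer_tokenize` and `build_de_bruijn`, the stride of 1, the choice of k = 3, and the example sequence are all assumptions made for illustration. Nodes are taken to be the distinct k-mers, as the abstract assigns one feature vector per k-mer node, and a directed edge links each k-mer to the one that follows it in the sequence.

```python
from collections import defaultdict


def kmer_tokenize(seq, k):
    """Slide a window of width k over the sequence, one base at a time (stride 1 is an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_de_bruijn(tokens):
    """Build a directed graph whose nodes are the distinct k-mers.

    An edge (a, b) means k-mer b directly follows k-mer a in the sequence;
    the edge weight counts how often that transition occurs.
    """
    edges = defaultdict(int)
    for a, b in zip(tokens, tokens[1:]):
        edges[(a, b)] += 1
    nodes = sorted(set(tokens))
    return nodes, dict(edges)


# Hypothetical toy sequence, chosen only to show repeated k-mers collapsing into one node.
tokens = kmer_tokenize("AUGGCAUGGC", 3)
nodes, edges = build_de_bruijn(tokens)
```

Repeated k-mers map to a single node, so the graph is more compact than the token list, while the edge directions preserve which k-mer follows which. In a pipeline like the one described, each node would then receive its word2vec and fastText embedding as a feature vector.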
ISBN: 9783030264741 (print)
This article examines and analyzes the use of the word2vec method for solving semantic coding problems. The task of semantic coding has acquired particular importance with the development of search systems. The relevance of such technologies stems primarily from the need to search large-volume databases. Based on the practical results obtained, a prototype search system has been developed that uses extracted semantic information to perform relevant search in a database of documents. Two main scenarios for implementing such a search are proposed. The training set was prepared from documents in the English version of Wikipedia and includes more than 100,000 original articles. The resulting set was used in the experimental part of the work to test the effectiveness of the developed prototype search system.
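One common way to use word vectors for relevance search is to average the vectors of the words in each text and rank documents by cosine similarity to the query. The sketch below illustrates that general idea only; the abstract does not describe the prototype's two search scenarios, so this is not the authors' method. The 3-dimensional vectors in `VECTORS` are hypothetical stand-ins for vectors trained with word2vec, and the function names `embed`, `cosine`, and `search` are assumptions.

```python
import math

# Hypothetical toy word vectors; a real system would load vectors trained with word2vec.
VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
    "engine": [0.1, 0.0, 0.8],
}


def embed(text):
    """Average the vectors of the known words in the text; return None if no word is known."""
    words = [w for w in text.lower().split() if w in VECTORS]
    if not words:
        return None
    dim = len(next(iter(VECTORS.values())))
    return [sum(VECTORS[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, documents):
    """Return the documents that can be embedded, most similar to the query first."""
    q = embed(query)
    if q is None:
        return []
    scored = []
    for doc in documents:
        d = embed(doc)
        if d is not None:
            scored.append((cosine(q, d), doc))
    return [doc for _, doc in sorted(scored, reverse=True)]


ranked = search("cat", ["car engine", "dog"])
```

With these toy vectors, the query "cat" ranks "dog" above "car engine" even though no word is shared, which is the kind of semantic match that keyword search cannot provide.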