ISBN (digital): 9798350368208
ISBN (print): 9798350368215
DNA-binding proteins play a crucial role in gene regulation and biological functions, making their accurate prediction essential for understanding biological processes and developing new drugs. Traditional DNA-binding protein prediction methods rely on sequence features and machine learning algorithms; however, these approaches often face challenges in accuracy and generalization when dealing with complex sequence information. To address this issue, this study proposes a deep learning-based prediction model that effectively captures both spatial and temporal features of sequences by combining Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). Experimental results demonstrate that the proposed model performs excellently on multiple benchmark datasets, significantly improving prediction accuracy and robustness. This research offers a new approach to DNA-binding protein prediction and highlights the immense potential of deep learning in bioinformatics.
ISBN (digital): 9798350386226
ISBN (print): 9798350386233
Promoter identification is an important area of genomics research and has a significant impact on understanding the mechanisms of transcriptional regulation. Recent advances in computational methods have markedly improved classification accuracy. Nonetheless, the demand for higher accuracy continues to motivate new methods. In this paper, we propose BiDNAMamba, a novel state space model (SSM) for promoter motif identification from DNA sequences that offers the flexibility to easily extend the scan path to accommodate various application scenarios. Compared with other pre-training algorithms, BiDNAMamba has a significant advantage in pre-training resource consumption. In addition, BiDNAMamba achieved excellent promoter identification performance, with an absolute improvement of 0.74% in the area under the receiver operating characteristic curve (AUROC). Interpretability analysis revealed the reasoning behind BiDNAMamba's decisions and filled a gap in the visualization of SSM-based models. With its unique advantages and excellent performance, BiDNAMamba is poised to become a backbone in the realm of genomics foundation models. The code will be available at https://***/GaryinDeep/BiDNAMamba.
ISBN (print): 9789897584909
In this position paper, we discussed the potential to fit mechanistic mathematical models of acute myeloid leukaemia to patient data. The overarching aim was to estimate personalized models. We briefly introduced one selected mechanistic ODE model to illustrate the approach. The outcome measures usually available, e.g. in clinical datasets, were aligned with the model's prediction capabilities. Among the most relevant outcomes (blast load, complete remission, and survival), only blast load turned out to be well suited for use in the model fitting process. We formulated an optimization problem whose solution yields personalized model parameters. The degree of personalization could be chosen by selecting only a subset of parameters within the optimization problem. To illustrate the fitness landscape for individual patients, we performed a grid search and calculated the fitness value at each grid point. The grid search revealed that an optimum exists, but that the fitness landscape can be very noisy. In these cases, gradient-based solvers will perform poorly and other algorithms need to be chosen. Finally, we believe that personalized model fitting will be a promising approach to integrating mechanistic mathematical models into clinical research.
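The grid search over a fitness landscape described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the paper's model: the logistic growth ODE, the parameter names `growth_rate` and `capacity`, the sum-of-squares fitness, and the synthetic blast-load observations are all hypothetical stand-ins.

```python
from itertools import product


def simulate_blast_load(growth_rate, capacity, b0, times, dt=0.01):
    """Integrate db/dt = r * b * (1 - b / K) with explicit Euler steps.

    Hypothetical toy model; returns the blast load at each requested time.
    """
    results = []
    b, t = b0, 0.0
    for target in times:
        while t < target - 1e-12:
            step = min(dt, target - t)
            b += step * growth_rate * b * (1.0 - b / capacity)
            t += step
        results.append(b)
    return results


def fitness(params, times, observed, b0):
    """Sum of squared errors between simulated and observed blast load."""
    growth_rate, capacity = params
    predicted = simulate_blast_load(growth_rate, capacity, b0, times)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def grid_search(rate_grid, capacity_grid, times, observed, b0):
    """Evaluate the fitness at every grid point and return the best one."""
    landscape = {
        (r, k): fitness((r, k), times, observed, b0)
        for r, k in product(rate_grid, capacity_grid)
    }
    best = min(landscape, key=landscape.get)
    return best, landscape


# Synthetic "patient" data generated from known parameters (assumption).
times = [1.0, 2.0, 4.0, 8.0]
b0 = 0.05
observed = simulate_blast_load(0.5, 0.8, b0, times)

rate_grid = [i / 10 for i in range(1, 11)]      # 0.1 ... 1.0
capacity_grid = [i / 10 for i in range(5, 11)]  # 0.5 ... 1.0
best, landscape = grid_search(rate_grid, capacity_grid, times, observed, b0)
print("best grid point:", best)
```

Because every grid point is evaluated independently and no derivatives are used, this kind of exhaustive scan still works when the landscape is noisy, at the cost of a number of evaluations that grows quickly with the number of personalized parameters.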
Entity resolution (ER) finds records that refer to the same entities in the real world. Blocking is an important task in ER, filtering out unnecessary comparisons and speeding up ER. Blocking is usually an unsupervise...
ISBN (digital): 9798331533991
ISBN (print): 9798331534004
A promoter is a short DNA sequence located near the transcription start site, responsible for initiating the transcription of specific genes in the genome. Accurate identification of promoters helps to better understand the transcriptional regulation of genes. Identifying DNA promoters through traditional biochemical experiments is time-consuming and costly. In recent years, deep learning algorithms have shown excellent performance in bioinformatics. Therefore, this paper proposes iProm-EC, a new deep learning-based DNA promoter prediction model. The model automatically extracts sequence features of DNA promoters through embedding layers and combines convolutional, pooling, and fully connected layers to classify promoters. The experimental results show that, compared with existing prediction models, iProm-EC achieves significant improvements on all reported metrics (Acc, Sn, Sp, MCC, AUC). This indicates that the iProm-EC model proposed in this paper has good predictive ability for DNA promoters.
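For reference, the threshold-based metrics named in this abstract (Acc, Sn, Sp, MCC) can be computed from a binary confusion matrix using their standard definitions, as in the minimal sketch below. The counts in the example are invented for illustration and are not results from the paper; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity, and MCC from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)  # sensitivity: fraction of true promoters recovered
    sp = tn / (tn + fp)  # specificity: fraction of non-promoters rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Made-up counts, purely illustrative.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)
```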
ISBN (print): 9783031478581
The proceedings contain 27 papers. The special focus in this conference is on Optimization and Applications. The topics include: One Segregation Problem for the Sum of Two Quasiperiodic Sequences; Parameter Estimation via Time Modeling for MLIR Implementation of GEMM; Reliable Production Process Design Problem: Compact MILP Model and ALNS-Based Primal Heuristic; Reciprocal Import Tariffs in the Monopolistic Competition Open Economy; Numerical Modelling of Mean-Field Game Epidemic; Models of Decision-Making in a Game with Nature Under Conditions of Probabilistic Uncertainty; Application of Optimization Methods in Solving the Problem of Optimal Control of Assets and Liabilities by a Bank; An Endogenous Production Function in the Green Solow Model; Two Balanced Growth Paths Based on an Endogenous Production Function; The pth-Order Karush-Kuhn-Tucker Type Optimality Conditions for Nonregular Inequality Constrained Optimization Problems; Analysis of Import Substitution Processes Taking into Account Industry Specifics in the Regional Economy; Features of Optimal Drilling in Gas Fields; On Solving the Robust Transfer Line Balancing Problem with Parallel Tasks and Interval Processing Times; On Cluster Editing Problem with Clusters of Small Sizes; Limiting the Search in Brute Force Method for Subsets Detection; Bicentered Interval Newton Operator for Robot’s Workspace Approximation; Statistical Performance of Subgradient Step-Size Update Rules in Lagrangian Relaxations of Chance-Constrained Optimization Models; The Customization of the Geodesic Algorithm for Optimal Fastener Arrangement; Convergence Rate of Gradient-Concordant Methods for Smooth Unconstrained Optimization; A Derivative-Free Nonlinear Least Squares Solver for Nonsmooth Functions; Stochastic Adversarial Noise in the "Black Box" Optimization Problem; Accelerated Zero-Order SGD Method for Solving the Black Box Optimization Problem Under "Overparametrization" Condition; Algorithms for Euclidean-Regularised Optimal Transport; Real Accel...
ISBN (digital): 9798350355468
ISBN (print): 9798350355475
Cerebral stroke occurs due to hindered blood flow to various parts of the brain, often caused by ischemic blockages or haemorrhagic vascular rupture. These disruptions can lead to severe brain damage or death if not addressed promptly. Annually, over 15 million people worldwide suffer from cerebral stroke, with lifestyle factors such as hypertension and smoking playing a significant role. This study presents a comparative analysis of cerebral stroke prediction models utilizing classification algorithms—Random Forest, XGBoost, Gaussian Naïve Bayes, and KNN—as well as anomaly detection methods such as Local Outlier Factor, Isolation Forest, and Gaussian Probability Distribution. Among these, the Isolation Forest algorithm demonstrated superior performance, achieving a weighted TPR-TNR score of 0.7449, reflecting balanced anomaly detection capabilities in imbalanced datasets. The findings highlight the importance of tailored machine learning solutions for stroke risk prediction. Future work will explore hybrid approaches to enhance detection accuracy while addressing dataset challenges.
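The abstract does not define its weighted TPR-TNR score. One plausible reading, assumed here purely for illustration, is a convex combination `weight * TPR + (1 - weight) * TNR`. The sketch below uses that assumed formula on made-up labels to show why such a score is more informative than plain accuracy on imbalanced data.

```python
def tpr_tnr(y_true, y_pred):
    """True positive rate and true negative rate for binary labels (1 = stroke)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, tnr


def weighted_tpr_tnr(y_true, y_pred, weight=0.5):
    """Assumed definition: weight * TPR + (1 - weight) * TNR (not from the paper)."""
    tpr, tnr = tpr_tnr(y_true, y_pred)
    return weight * tpr + (1.0 - weight) * tnr


# Made-up imbalanced dataset: 20 stroke cases among 1000 records.
y_true = [1] * 20 + [0] * 980
always_negative = [0] * 1000  # trivial classifier that never predicts stroke

accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
score = weighted_tpr_tnr(y_true, always_negative)
print(accuracy, score)
```

Under this assumed definition, the trivial classifier reaches high accuracy but only a chance-level score, because it detects none of the positive cases.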
In this paper, we present a numerical scheme to select the kernel shape parameters within partition of unity methods. In an interpolation framework, we propose the use of a leave-one-out cross validation technique com...
Amidst global population growth and escalating food demands, real-time agricultural monitoring is crucial for ensuring food security. During the initial stages of crop growth, however, it faces significant challenges ...
Next-generation sequencing techniques provide an opportunity to generate sequenced proteins and identify the biological families and functions of these proteins. However, compared with identified proteins, uncharacterized proteins make up a notable percentage of all proteins in bioinformatics research. Traditional family classification methods often focus on extracting N-gram features from sequences while ignoring motif information as well as affinity information between motifs and adjacent amino acids. Previous clustering-based algorithms have typically been used to define protein features with domain knowledge and annotate protein families based on extensive data samples. In this paper, we apply CNN-based amino acid representation learning with limited characterized proteins to explore protein family annotation performance, taking amino acid location information into account. Additionally, we apply the method to all reviewed protein sequences, with their families retrieved from the UniProt database, to evaluate our approach. Last but not least, we verify our model using unreviewed protein records, which are typically ignored by other methods.
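The N-gram features that the abstract attributes to traditional family classification methods can be sketched as overlapping substring counts. This is a generic illustration, not the paper's pipeline; the function name `ngram_counts` and the example sequence are invented.

```python
from collections import Counter


def ngram_counts(sequence, n=3):
    """Count overlapping n-grams (k-mers) in an amino acid sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Made-up amino acid sequence, purely illustrative.
counts = ngram_counts("MKVLAAGMKV", n=3)
print(counts.most_common(3))
```

Such counts record which short fragments occur but not where they occur, which is the positional information the CNN-based approach in the abstract aims to take into account.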