ISBN: 9798350386226 (digital); 9798350386233 (print)
Retrieving and analyzing mass spectra is indispensable for identifying compounds in mass spectrometry (MS). It enables researchers to match observed spectra against established databases and thereby determine the chemical composition of samples. The primary challenge lies in balancing retrieval accuracy against processing speed. Empirical studies have shown that converting mass spectra into embeddings via deep learning can achieve both high accuracy and high speed in retrieval. Nevertheless, employing deep learning for spectral embedding poses complex challenges, particularly in electron ionization mass spectrometry (EI-MS). In this paper, we introduce a novel representation learning technique, EI-MS2VEC, for EI-MS retrieval. Our spectrum retrieval method surpasses current state-of-the-art techniques such as FastEI. On the in-silico library, we attain hit rate@1 and hit rate@10 of 43.6% and 84.5%, respectively, compared with FastEI's 36.7% and 80.4%. Moreover, our retrieval approach runs an order of magnitude faster than FastEI. The source code is available on GitHub (https://***/xfcui/EI-MS2VEC).
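To make the hit rate@k metric concrete, the following is a minimal sketch of embedding-based retrieval, assuming cosine similarity as the ranking score. The abstract does not state which similarity measure EI-MS2VEC uses, and the function names, compound ids, and two-dimensional toy embeddings here are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query, library, k):
    """Return the ids of the k library embeddings most similar to the query."""
    ranked = sorted(library, key=lambda cid: cosine(query, library[cid]), reverse=True)
    return ranked[:k]


def hit_rate_at_k(queries, library, k):
    """Fraction of queries whose true compound id appears in the top-k results.

    queries: list of (embedding, true_id) pairs.
    library: dict mapping compound id -> embedding.
    """
    hits = sum(1 for emb, true_id in queries if true_id in top_k(emb, library, k))
    return hits / len(queries)


# Hypothetical toy library and queries, for illustration only.
library = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
queries = [([0.9, 0.1], "A"), ([0.6, 0.5], "A"), ([0.1, 0.9], "B")]

print(hit_rate_at_k(queries, library, 1))  # second query ranks "C" first, so 2 of 3 hit
print(hit_rate_at_k(queries, library, 2))  # all true ids fall within the top 2
```

In a real system the exhaustive sort would typically be replaced by an approximate nearest-neighbor index, which is where much of the speed advantage of embedding retrieval comes from.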
With information consumption via online video streaming becoming increasingly popular, misinformation video poses a new threat to the health of the online information ecosystem. Though previous studies have made much ...
ISBN: 9798350386226 (digital); 9798350386233 (print)
Mass spectrometry is a pivotal tool for analyzing small molecules by examining their mass-to-charge ratios. Recent advances in deep learning have markedly improved the analysis of mass spectrometric data, making it possible to predict novel small-molecule structures without extensive databases. Nonetheless, the scarcity of annotated datasets impedes the effective training of molecular generation models based on MS² spectra. To mitigate this limitation, we introduce ctMSNovelist, a method that combines pre-training, fine-tuning, and co-training to build a more precise model for generating molecular structures from tandem mass spectrometry data. This approach improves both the training regimen and the predictive accuracy of the MSNovelist model, thereby overcoming the obstacle of limited data. The method begins by pre-training a Variational Autoencoder (VAE) to generate molecular fingerprints derived from SMILES strings. The VAE is then fine-tuned to emulate the noisy fingerprints that originate from mass spectrometry (MS) data. Concurrently, MSNovelist is co-trained on these simulated fingerprints, including the highly noisy variants produced in the early stages of VAE training. Incorporating a substantial volume of noisy data enhances model accuracy and averts overfitting. We evaluated ctMSNovelist on the GNPS dataset and attained a SMILES prediction accuracy of 48.8%, a 4.1% improvement over MSNovelist. Notably, the sole difference between ctMSNovelist and MSNovelist in this experiment was the training process. The code and models are publicly available at https://***/xfcui/ctMSNovelist.
USV navigation is challenging because of uncertainties and disturbances in the complex marine environment, which may lead to tilting or collision. However, current path planning methods for USVs lack dynamic enviro...
Adding subtle perturbations to an image can cause the classification model to misclassify, and such images are called adversarial examples. Adversarial examples threaten the safe use of deep neural networks, but when ...
By exploring dense matching between the current frame and past frames for long-range context modeling, memory-based methods have recently demonstrated impressive results in video object segmentation (VOS). Nevertheless, because they lack instance understanding, these approaches are often brittle to large appearance variations or viewpoint changes resulting from the movement of objects and cameras. In this paper, we argue that instance understanding matters in VOS and that integrating it with memory-based matching yields a synergy, which is intuitively sensible given the definition of the VOS task, i.e., identifying and segmenting object instances within the video. Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank. We employ the well-learned object queries from the IS branch to inject instance-specific information into the query key, with which instance-augmented matching is then performed. In addition, we introduce a multi-path fusion block that effectively combines the memory readout with multi-scale features from the instance segmentation decoder, incorporating high-resolution instance-aware features to produce the final segmentation results. Our method achieves state-of-the-art performance on DAVIS 2016/2017 val (92.6% and 87.1%), DAVIS 2017 test-dev (82.8%), and YouTube-VOS 2018/2019 val (86.3% and 86.3%), outperforming alternative methods by clear margins.
Diffusion models have made significant contributions to computer vision, recently sparking growing interest in the community in applying them to graph generation. Existing discrete graph diffusion models exhibit heightened computational complexity and diminished training efficiency. A preferable and natural alternative is to diffuse the graph directly within the latent space. However, because the non-Euclidean structure of graphs is not isotropic in the latent space, existing latent diffusion models struggle to capture and preserve the topological information of graphs. To address these challenges, we propose a novel geometrically latent diffusion framework, HypDiff. Specifically, we first establish a geometrically latent space with interpretability measures based on hyperbolic geometry, in order to define anisotropic latent diffusion processes for graphs. Then, we propose a geometrically latent diffusion process that is constrained by both radial and angular geometric properties, thereby ensuring that the original topological properties are preserved in the generated graphs. Extensive experimental results demonstrate the superior effectiveness of HypDiff for graph generation with various topologies. Copyright 2024 by the author(s)
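As a rough illustration of what "radial" and "angular" properties mean in a hyperbolic latent space, the sketch below uses the Poincaré ball model with curvature -1. This choice of model is an assumption, since the abstract does not say which model of hyperbolic geometry HypDiff adopts, and the function names are hypothetical. The code computes the geodesic distance between two latent points and splits a point into its hyperbolic radius and its unit direction.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_norm = lambda v: sum(a * a for a in v)
    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - sq_norm(x)) * (1.0 - sq_norm(y))
    return math.acosh(1.0 + 2.0 * diff / denom)


def radial_angular(x):
    """Split a point into (hyperbolic radius, unit direction).

    The radius is the hyperbolic distance from the origin, 2 * atanh(||x||);
    the direction is the Euclidean unit vector pointing toward x.
    """
    r = math.sqrt(sum(a * a for a in x))
    radius = 2.0 * math.atanh(r)
    direction = [a / r for a in x] if r > 0 else list(x)
    return radius, direction


# Hypothetical latent point, for illustration only.
point = [0.5, 0.0]
print(poincare_distance([0.0, 0.0], point))  # equals the hyperbolic radius of `point`
print(radial_angular(point))
```

In hyperbolic embeddings the radius is commonly read as depth in a hierarchy and the direction as membership in a branch or cluster, which is one way to interpret constraining a diffusion process along both components.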
The goal of multi-label learning with missing labels (MLML) is to assign each testing instance multiple labels given training instances that have a partial set of labels. The most challenging issue is to complete the ...
Multimodal sentiment analysis aims to identify human emotions by leveraging multimodal information, including language, visual, and audio data. Different modalities contribute unequally to sentiment analysis, with tex...
Machine learning (ML) approaches have been successfully applied to accelerating exact combinatorial optimization (CO) solvers. However, because ML models such as neural networks are black boxes, many of these approaches cannot explain which learned patterns accelerate the CO algorithms, and thus they prevent researchers from further understanding the tasks they are interested in. To tackle this problem, we propose the first graph-based algorithm discovery framework, graph symbolic discovery for exact combinatorial optimization solver (GS4CO), which learns interpretable branching policies directly from the general bipartite graph representation of CO problems. Specifically, we focus mainly on the variable selection part of the branching policy. We design a unified representation for symbolic variable selection policies with graph inputs, and then employ a Transformer with multiple tree-structural encodings to generate symbolic trees end-to-end, which effectively reduces the cumulative error from iteratively distilling graph neural networks. Experiments show that the interpretable and lightweight policies learned by GS4CO outperform all baselines on CPU machines, including both human-designed and learning-based policies. GS4CO represents an encouraging step towards general algorithm discovery on modern CO solvers. Code is available at https://***/MIRAlab-USTC/L2O-GS4CO.