Video frame interpolation is an important low-level vision task, which can increase frame rate for more fluent visual experience. Existing methods have achieved great success by employing advanced motion models and sy...
Automated segmentation of retinal vessels is challenged by the complexity of curvilinear structures. In this work, we formulate the segmentation task as the decomposition and interaction of the topological and scale features of vessels. The topological properties preserve the connectivity of the curvilinear structure, while the scale features characterize the local morphology. We therefore propose a decomposition-then-interaction framework for retinal vessel segmentation. A multi-branch network is designed in which the centerline map and scale map are derived from the original segmentation ground truth to fully exploit these features. The features from the auxiliary branches interact through cross attention, which finally generates the masks of retinal vessels. Experiments on the DRIVE, CHASE-DB1, and STARE datasets demonstrate the promising accuracy of the proposed method.
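The interaction step in this abstract can be illustrated with a generic single-head cross-attention sketch in NumPy. This is not the authors' network: the token layout, the feature dimension, the projection matrices `w_q`, `w_k`, `w_v`, and the choice of which branch supplies the queries are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head cross attention: queries come from one branch,
    keys and values from the other."""
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v, attn

# Toy features (hypothetical shapes): 16 spatial tokens, 8 channels per branch.
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
centerline_feats = rng.normal(size=(n_tokens, dim))  # stand-in for the centerline branch
scale_feats = rng.normal(size=(n_tokens, dim))       # stand-in for the scale branch
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

fused, attn = cross_attention(centerline_feats, scale_feats, w_q, w_k, w_v)
```

Here the centerline tokens attend to the scale tokens; swapping the two arguments gives the opposite direction, and a real model would typically use both directions and multiple heads.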
Endobronchial intervention is increasingly used as a minimally invasive means for the treatment of pulmonary diseases. In order to reduce the difficulty of manipulation in complex airway networks, robust lumen detecti...
Deep Neural Networks (DNNs) are vulnerable to imperceptible image perturbations generated by adversarial attacks, which has motivated research on the adversarial robustness of DNNs. A series of methods represented by t...
In the context of Industrial Anomaly Detection (IAD), ensuring the quality of manufactured products is critical. Traditional 2D based methods often fail to capture anomalies present in complex 3D shapes. For effective...
Isoforms refer to different mRNA molecules transcribed from the same gene, which can be translated into proteins with varying structures and functions. Predicting the functions of isoforms is an essential topic in bioinformatics as it can provide valuable insights into the intricate mechanisms of gene regulation and biological processes. Conventionally, gene function labels are standardized in Gene Ontology (GO) terms. However, traditional methods for predicting isoform function are largely limited by the absence of isoform-specific labels, sparse annotations, and the vast number of GO terms. To address these issues, we propose HANIso, a deep learning-based method for isoform function prediction. HANIso leverages a pretrained protein language model to extract features from protein sequences. It also integrates heterogeneous information, such as isoform sequence features, GO annotations, and isoform interaction data, using a Heterogeneous Graph Attention Network (HAN). This allows the model to learn the importance of different sources of information and their semantic relationships through the attention mechanism. Our method can predict function labels at both the gene level and isoform level. We conduct experiments on two species datasets, and the results demonstrate that our method outperforms existing methods on both AUROC and AUPRC. HANIso has the potential to overcome the limitations of traditional methods and provide a more accurate and comprehensive understanding of isoform function.
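The two metrics named above, AUROC and AUPRC, can be computed for a single GO term as in the following minimal NumPy sketch. The labels and scores are toy values, not results from the paper, and AUPRC is approximated here by average precision, which is one common estimator and an assumption about how the evaluation might be done.

```python
import numpy as np

def auroc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    diff = s[y][:, None] - s[~y][None, :]  # all positive-negative score gaps
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive."""
    y = np.asarray(y_true, dtype=int)
    s = np.asarray(scores, dtype=float)
    order = np.argsort(-s, kind="stable")  # rank by descending score
    y_sorted = y[order]
    precision_at_k = np.cumsum(y_sorted) / np.arange(1, len(y) + 1)
    return precision_at_k[y_sorted == 1].mean()

# Toy labels and predicted scores for one hypothetical GO term.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
roc = auroc(y_true, scores)
ap = average_precision(y_true, scores)
```

Because GO annotations are sparse, positives are rare for most terms, which is why a precision-recall metric is usually reported alongside AUROC.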
In surgery-based renal cancer treatment, one of the most essential tasks is three-dimensional (3D) kidney parsing on computed tomography angiography (CTA) images. In this paper, we propose an end-to-end convolutional neural network-based framework to segment multiple renal structures, including kidneys, kidney tumors, arteries, and veins, from arterial-phase CT images. Our method consists of two collaborative modules. First, we propose an encoding-decoding network, named Multi-Branch Dilated Convolutional Network (MBD-Net), consisting of residual, hybrid dilated convolutional, and reduced-dimensional convolutional structures, which improves the feature extraction ability with relatively few network parameters. Second, given that renal tumors and cysts have easily confused geometric structures, we design the Cyst Discriminator to effectively distinguish tumors from cysts without labeling information, using gray-scale curves and radiographic features. We quantitatively evaluated our approach on a publicly available dataset from the MICCAI 2022 Kidney Parsing for Renal Cancer Treatment Challenge (KiPA2022), achieving mean Dice similarity coefficients (DSC) of 96.18%, 90.99%, 88.66%, and 80.35% for the kidneys, kidney tumors, arteries, and veins, respectively, winning the stable and top performance in the *** relevance: The proposed CNN-based framework can automatically segment 3D kidneys, renal tumors, arteries, and veins for kidney parsing techniques, benefiting surgery-based renal cancer treatment.
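For readers unfamiliar with the reported metric, the Dice similarity coefficient between a predicted mask P and a reference mask T is 2|P ∩ T| / (|P| + |T|). The sketch below computes it per class on a toy label map; the label ids and the tiny arrays are hypothetical and are not taken from KiPA2022.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(pred, target).sum() / denom

def per_class_dice(pred_labels, true_labels, class_ids):
    """Dice per structure, given integer label maps."""
    return {c: dice_coefficient(pred_labels == c, true_labels == c)
            for c in class_ids}

# Hypothetical label ids: 1 = kidney, 2 = tumor (0 = background).
pred_labels = np.array([[1, 1, 0],
                        [2, 2, 0]])
true_labels = np.array([[1, 1, 1],
                        [2, 0, 0]])
dsc = per_class_dice(pred_labels, true_labels, class_ids=[1, 2])
```

The same function applies unchanged to 3D volumes, since it only counts voxels.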
This paper proposes a principle of fully autonomous ground-mobile landing recovery for Unmanned Aerial Vehicles (UAVs) to address the problems of relatively fixed landing points, passive recovery, poor flexibility, and limited environmental adaptability. The principle mainly covers localization, landing point tracking, and buffered landing for quadrotor UAVs. First, because it is difficult to accurately obtain the position of a UAV during dynamic mobile landing recovery, a target localization method based on Asynchronous Multisensor Information Fusion (AMIF) and servo turntable focus tracking is proposed. Second, to achieve fast and high-precision tracking of UAVs, a tracking control strategy combining an independently driven landing recovery system and a Stewart six-degree-of-freedom platform is proposed. Third, to address the large impact force and center-of-gravity instability during UAV landing, a stationarity control algorithm based on model prediction and a compliance control algorithm based on adaptive variable impedance are designed to achieve active compliance control while adjusting the position and attitude of the receiving surface in real time. Finally, a quadrotor unmanned landing and recovery experimental platform is built to verify the feasibility of the proposed ground-mobile landing and recovery strategy and the effectiveness of the control algorithms.
ISBN: 9798350359312 (digital)
ISBN: 9798350359329 (print)
Graph neural networks (GNNs), a powerful deep learning framework for modeling graph-structured data, have attracted considerable attention recently. Most existing GNNs require large amounts of labeled data, and constructing generalizable and robust representations from unlabeled graph data remains a challenge. Existing graph contrastive learning (GCL) methods either drop edges uniformly or attempt to remove unimportant nodes and edges, both of which rely heavily on the specific structure of the data. In addition, the vanilla graph convolutional network uses only a low-pass filter (the adjacency matrix), which ignores the middle- and high-frequency information in graph-structured data. To tackle these challenges, we propose a general noise-perturbation-based GCL framework with flexible filters. Specifically, we first add various types of noise to the nodes and edges. We then design flexible filters that combine low-, middle-, and high-pass filters. We systematically examine the impact of noise and filters, with an initial theoretical analysis linking these elements to the triplet loss function and shedding light on their roles. Extensive experiments on node classification show that the proposed approach surpasses existing state-of-the-art baselines. Surprisingly, we find that moderate levels of noise effectively alleviate the over-smoothing problem encountered in GNNs, while the use of flexible filters notably enhances model performance.
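One way to picture a combination of low-, middle-, and high-pass graph filters is through polynomials of the symmetric normalized Laplacian L, whose eigenvalues λ lie in [0, 2]. The sketch below is an illustration under assumptions, not the paper's filter design: the three spectral responses (1 - λ/2, λ(2 - λ), and λ/2), the mixing weights, and the Gaussian feature noise are all choices made here for clarity.

```python
import numpy as np

def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    norm_adj = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.eye(adj.shape[0]) - norm_adj

def flexible_filter(lap, w_low, w_mid, w_high):
    """Weighted sum of three polynomial filters (assumed responses)."""
    eye = np.eye(lap.shape[0])
    low = eye - 0.5 * lap          # response 1 - lambda/2, largest at lambda = 0
    mid = lap @ (2.0 * eye - lap)  # response lambda * (2 - lambda), peaks at lambda = 1
    high = 0.5 * lap               # response lambda/2, largest at lambda = 2
    return w_low * low + w_mid * mid + w_high * high

def perturb_features(x, sigma, rng):
    """Add Gaussian noise to node features (one possible noise type)."""
    return x + sigma * rng.normal(size=x.shape)

# Toy path graph with 4 nodes and 3-dimensional node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))

lap = normalized_laplacian(adj)
filt = flexible_filter(lap, w_low=0.5, w_mid=0.3, w_high=0.2)
view = filt @ perturb_features(x, sigma=0.1, rng=rng)  # one augmented view
```

Two such views built with different noise draws or filter weights could then serve as the inputs to a contrastive objective.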
In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test set from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these models often suffer from performance degradation in cross-domain scenarios, i.e., when the input audio comes from a different domain than the training set, and this issue has received little attention. To address these issues, we propose a new zero-shot method for audio captioning. Our method is built on the contrastive language-audio pre-training (CLAP) model. During training, the model reconstructs the ground-truth caption using the CLAP text encoder. In the inference stage, the model generates text descriptions from the CLAP audio embeddings of given audio inputs. To enhance the ability of the model in transitioning from text-to-text generation to audio-to-text generation, we propose to use the mixed-augmentations-based soft prompt to learn more robust latent representations, leveraging instance replacement and embedding augmentation. Additionally, we introduce the retrieval-based acoustic-aware hard prompt to improve the cross-domain performance of the model by employing the domain-agnostic label information of sound events. Extensive experiments on AudioCaps and Clotho benchmarks show the effectiveness of our proposed method, which outperforms other zero-shot audio captioning approaches for in-domain scenarios and outperforms the compared methods for cross-domain scenarios, underscoring the generalization ability of our method.
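The retrieval step behind an acoustic-aware hard prompt can be sketched as a cosine-similarity lookup in a shared embedding space. The code below covers only that retrieval idea, with toy 3-dimensional vectors standing in for CLAP embeddings; the label list, the `top_k` value, and the prompt template are hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve_event_labels(audio_emb, label_embs, labels, top_k=2):
    """Return the top_k sound-event labels most similar to the audio embedding."""
    sims = l2_normalize(label_embs) @ l2_normalize(audio_emb)
    top = np.argsort(-sims)[:top_k]
    return [labels[i] for i in top]

def build_hard_prompt(events):
    # Hypothetical template; the actual prompt wording is not given in the abstract.
    return "This is a sound of " + ", ".join(events) + "."

# Toy embeddings standing in for text-encoder outputs of the event labels.
labels = ["dog barking", "rain", "car engine"]
label_embs = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
audio_emb = np.array([0.9, 0.1, 0.4])  # stand-in for an audio-encoder output

events = retrieve_event_labels(audio_emb, label_embs, labels, top_k=2)
prompt = build_hard_prompt(events)
```

Because the event labels are not tied to any captioning dataset, a prompt built this way carries domain-agnostic information into the decoder, which matches the motivation stated in the abstract.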