Search Results - Inner Mongolia University Library

arXiv, 2023

Authors: Kong, Lingtong; Jiang, Boyuan; Luo, Donghao; Chu, Wenqing; Tai, Ying; Wang, Chengjie; Yang, Jie. Affiliations: The Institute of Image Processing and Pattern Recognition, Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China; The Youtu Lab, Tencent, Shanghai 200233, China; The School of Intelligence Science and Technology, Nanjing University, Suzhou 215163, China.

Video frame interpolation is an important low-level vision task that can increase the frame rate for a more fluent visual experience. Existing methods have achieved great success by employing advanced motion models and synthesis networks. However, the spatial redundancy in synthesizing the target frame has not been fully explored, which can result in a great deal of inefficient computation. On the other hand, the degree of computation compression in frame interpolation depends heavily on both texture distribution and scene motion, so selecting a good compression degree requires understanding the spatial-temporal information of each input frame pair. In this work, we propose a novel two-stage frame interpolation framework termed WaveletVFI to address the above problems. It first estimates intermediate optical flow with a lightweight motion perception network; a wavelet synthesis network then uses flow-aligned context features to predict multi-scale wavelet coefficients with sparse convolution for efficient target frame reconstruction, where the sparse valid masks that control computation at each scale are determined by a crucial threshold ratio. Instead of setting a fixed value as previous methods do, we find that embedding a classifier in the motion perception network to learn a dynamic threshold for each sample achieves greater computation reduction with almost no loss of accuracy. On common high-resolution and animation frame interpolation benchmarks, the proposed WaveletVFI reduces computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods. Code is available at https://***/ltkong218/WaveletVFI. Copyright © 2023, The Authors. All rights reserved.
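
As a rough illustration of the thresholding idea described in the abstract above, the sketch below computes one level of a 1-D Haar wavelet transform and builds a validity mask that keeps only the detail coefficients whose magnitude exceeds a threshold ratio times the largest magnitude. This is a toy under stated assumptions, not the WaveletVFI implementation: the function names, the 1-D signal, and the rule "ratio times maximum magnitude" are hypothetical choices made here for illustration, and no sparse convolution or learned threshold is involved.

```python
import math

def haar_level(signal):
    """One level of the orthonormal 1-D Haar transform (even-length input)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

def sparse_mask(detail, ratio):
    """Hypothetical rule: keep coefficients with |d| > ratio * max|d|."""
    peak = max((abs(d) for d in detail), default=0.0)
    return [abs(d) > ratio * peak for d in detail]

def inverse_haar_level(approx, detail):
    """Invert haar_level: rebuild each sample pair from (approx, detail)."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

# A mostly flat signal with one sharp edge between samples 4 and 5.
signal = [4.0, 4.0, 4.0, 4.1, 9.0, 1.0, 4.0, 4.0]
approx, detail = haar_level(signal)

# Only positions flagged True would need to be computed or stored.
mask = sparse_mask(detail, ratio=0.1)
kept = [d if m else 0.0 for d, m in zip(detail, mask)]
recon = inverse_haar_level(approx, kept)
```

In this toy case only the coefficient at the sharp edge survives the mask, so the edge is reconstructed exactly while the small 4.0/4.1 difference is smoothed away. A larger ratio discards more coefficients and saves more work at the cost of detail.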

Keywords: Wavelet transforms

Fdinet: Feature-Decomposition-Interaction Networks for Retinal Vessel Segmentation

IEEE International Symposium on Biomedical Imaging

Authors: Yuncheng Yang; Jie Yang; Junjun He; Yun Gu. Affiliations: Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China.

Automated segmentation of retinal vessels is challenged by the complexity of curvilinear structures. In this work, we formulate the segmentation task as the decomposition and interaction of topological and scale features of vessels. The topological properties preserve the connectivity of the curvilinear structure, while the scale features characterize the local morphology. We therefore propose a decomposition-then-interaction framework for retinal vessel segmentation. A multi-branch network is designed in which the centerline map and scale map are obtained from the original segmentation ground truth to fully exploit these features. The features from the auxiliary branches interact through cross attention, which finally generates the masks of retinal vessels. Experiments on the DRIVE, CHASE-DB1, and STARE datasets demonstrate the promising accuracy of the proposed method.

CDFI: Cross Domain Feature Interaction for Robust Bronchi Lumen Detection

arXiv, 2023

Authors: Xu, Jiasheng; Zhang, Tianyi; Wu, Yangqian; Yang, Jie; Yang, Guang-Zhong; Gu, Yun. Affiliations: The Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China; The Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; The Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China.

Endobronchial intervention is increasingly used as a minimally invasive means of treating pulmonary diseases. To reduce the difficulty of manipulation in complex airway networks, robust lumen detection is essential for intraoperative guidance. However, lumen detection methods are sensitive to visual artifacts, which are inevitable during surgery. In this work, a cross domain feature interaction (CDFI) network is proposed to extract the structural features of lumens and to provide artifact cues that characterize the visual features. To extract the structural and artifact features effectively, the Quadruple Feature Constraints (QFC) module is designed to constrain the intrinsic connections among samples of varying imaging quality. Furthermore, we design a Guided Feature Fusion (GFF) module to supervise the model for adaptive feature fusion based on different types of artifacts. Results show that the features extracted by the proposed method preserve the structural information of the lumen in the presence of large visual variations, yielding much-improved lumen detection accuracy. Copyright © 2023, The Authors. All rights reserved.

Keywords: Feature extraction

Towards Robust Neural Networks Via Orthogonal Diversity

SSRN, 2023

Authors: Fang, Kun; Tao, Qinghua; Wu, Yingwen; Li, Tao; Cai, Jia; Cai, Feipeng; Huang, Xiaolin; Yang, Jie. Affiliations: Institute of Image Processing and Pattern Recognition, Department of Automation, Shanghai Jiao Tong University, Shanghai, China; ESAT-STADIUS, KU Leuven, Leuven B-3001, Belgium; Central Media Technology Institute, Huawei Technologies Ltd., China.

Deep Neural Networks (DNNs) are vulnerable to invisible perturbations of images generated by adversarial attacks, which has motivated research on the adversarial robustness of DNNs. A series of methods represented by adversarial training and its variants has proven to be among the most effective techniques for enhancing DNN robustness. Generally, adversarial training focuses on enriching the training data with perturbed data. This data augmentation effect does not contribute to the robustness of the DNN itself and usually comes with a drop in clean accuracy. To improve the robustness of the DNN itself, we propose in this paper a novel defense that augments the model so that it learns features adaptive to diverse inputs, including adversarial examples. More specifically, to augment the model, multiple paths are embedded into the network, and an orthogonality constraint is imposed on these paths to guarantee the diversity among them. A margin-maximization loss is then designed to further boost such DIversity via Orthogonality (DIO). In this way, the proposed DIO augments the model and enhances the robustness of the DNN itself, as the learned features can be corrected by these mutually orthogonal paths. Extensive empirical results on various data sets, architectures, and attacks verify the adversarial robustness of the proposed DIO utilizing model augmentation. In addition, DIO can be flexibly combined with different data augmentation techniques (e.g., TRADES and DDPM), further promoting robustness gains. © 2023, The Authors. All rights reserved.
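
One simple way to picture an orthogonality constraint between paths is a penalty on the pairwise inner products of their weight vectors. The sketch below is an assumption made for illustration only: the abstract does not give the form of the constraint used by DIO, and the names `dot` and `orthogonality_penalty`, as well as the toy weight vectors, are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonality_penalty(paths):
    """Sum of squared inner products over all distinct pairs of weight vectors.

    The penalty is zero exactly when every pair of vectors is orthogonal,
    so adding it to a training loss pushes the paths toward orthogonality.
    """
    penalty = 0.0
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            penalty += dot(paths[i], paths[j]) ** 2
    return penalty

# Toy weight vectors standing in for three paths (hypothetical values).
orthogonal = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
overlapping = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

penalty_orthogonal = orthogonality_penalty(orthogonal)
penalty_overlapping = orthogonality_penalty(overlapping)
```

The first set incurs no penalty, while the second is penalized because its first two vectors share a component.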

Keywords: Deep neural networks

2M3DF: Advancing 3D Industrial Defect Detection with Multi Perspective Multimodal Fusion Network

IEEE Transactions on Circuits and Systems for Video Technology, 2025

Authors: Asad, Mujtaba; Azeem, Waqar; Jiang, He; Mustafa, Hafiz Tayyab; Yang, Jie; Liu, Wei. Affiliations: Shanghai Jiao Tong University, Institute of Image Processing and Pattern Recognition, Department of Automation, Shanghai 200240, China; Lahore Garrison University, Department of Software Engineering, Lahore 54000, Pakistan; China University of Mining and Technology, School of Information and Control Engineering, Jiangsu, Xuzhou 221116, China; Zhejiang Normal University, School of Computer Science and Technology, Jinhua 321004, China.

In the context of Industrial Anomaly Detection (IAD), ensuring the quality of manufactured products is critical. Traditional 2D-based methods often fail to capture anomalies present in complex 3D shapes. For effective anomaly detection in 3D shapes, it is essential to incorporate the global semantic context, local geometric structure, and color information of the object. To fully leverage these features, we propose a network named 2M3DF that leverages knowledge from multi-view RGB images and corresponding point cloud information for enhanced anomaly detection performance. Our model initially employs pre-trained feature extractors that generate local features from multi-view RGB images and corresponding point clouds. The novel inter-modality feature representation and fusion module first adapts these inter-modality features and then aligns and aggregates the multimodal features on a pixel-to-point basis. To learn normality from the point-wise fused multimodal features, we fit a multivariate Gaussian distribution to model the normal feature distribution. Comprehensive experimental evaluations on the MVTec3D-AD and Eyecandies datasets validate the effectiveness of our proposed model and demonstrate significant improvements over existing state-of-the-art methods. Our model achieves a 96.6% mean I-AUROC while delivering real-time results. © 1991-2012 IEEE.
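
The last modeling step in the abstract above, fitting a multivariate Gaussian to normal features, can be sketched in a few lines. The abstract does not say how the fitted distribution is turned into an anomaly score; the Mahalanobis distance used below is a common choice and is an assumption here, as are the 2-D toy features and the function names `fit_gaussian_2d` and `mahalanobis`.

```python
import math

def fit_gaussian_2d(points):
    """Sample mean and 2x2 sample covariance of a list of 2-D feature vectors."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from the Gaussian (mean, cov).

    Uses the closed-form inverse of a 2x2 matrix; a singular covariance
    (determinant zero) is not handled in this sketch.
    """
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

# Toy "normal" features (hypothetical values, not from any dataset).
normal_features = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
mean, cov = fit_gaussian_2d(normal_features)

score_normal = mahalanobis((3.0, 3.0), mean, cov)     # close to the training data
score_anomalous = mahalanobis((6.0, 0.0), mean, cov)  # far from the training data
```

A feature that lies far from the fitted distribution receives a larger distance, which can then be thresholded to flag an anomaly.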

Keywords: Normal distribution

Isoform Function Prediction Based on Heterogeneous Graph Attention Networks

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Authors: Kuo Guo; Yifan Li; Hao Chen; Hong-Bin Shen; Yang Yang. Affiliations: Department of Computer Science and Engineering, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China; Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.

Isoforms refer to different mRNA molecules transcribed from the same gene, which can be translated into proteins with varying structures and functions. Predicting the functions of isoforms is an essential topic in bioinformatics as it can provide valuable insights into the intricate mechanisms of gene regulation and biological processes. Conventionally, gene function labels are standardized in Gene Ontology (GO) terms. However, traditional methods for predicting isoform function are largely limited by the absence of isoform-specific labels, sparse annotations, and the vast number of GO terms. To address these issues, we propose HANIso, a deep learning-based method for isoform function prediction. HANIso leverages a pretrained protein language model to extract features from protein sequences. It also integrates heterogeneous information, such as isoform sequence features, GO annotations, and isoform interaction data, using a Heterogeneous Graph Attention Network (HAN). This allows the model to learn the importance of different sources of information and their semantic relationships through the attention mechanism. Our method can predict function labels at both the gene level and isoform level. We conduct experiments on two species datasets, and the results demonstrate that our method outperforms existing methods on both AUROC and AUPRC. HANIso has the potential to overcome the limitations of traditional methods and provide a more accurate and comprehensive understanding of isoform function.

MBD-Net: Multi-Branch Dilated Convolutional Network With Cyst Discriminator for Renal Multi-Structure Segmentation

Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Authors: Yusheng Liu; Yingjie Zhao; Meihuan Wang; Yichao Hao; Xiuying Wang; Lisheng Wang. Affiliations: Department of Automation, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China; School of Computer Science, The University of Sydney, Sydney, NSW, Australia.

In surgery-based renal cancer treatment, one of the most essential tasks is three-dimensional (3D) kidney parsing on computed tomography angiography (CTA) images. In this paper, we propose an end-to-end convolutional neural network-based framework to segment multiple renal structures, including kidneys, kidney tumors, arteries, and veins, from arterial-phase CT images. Our method consists of two collaborative modules. First, we propose an encoding-decoding network, named Multi-Branch Dilated Convolutional Network (MBD-Net), consisting of residual, hybrid dilated convolutional, and reduced-dimensional convolutional structures, which improves the feature extraction ability with relatively few network parameters. Given that renal tumors and cysts have confusing geometric structures, we also design the Cyst Discriminator to distinguish tumors from cysts without labeling information, using gray-scale curves and radiographic features. We have quantitatively evaluated our approach on a publicly available dataset from the MICCAI 2022 Kidney Parsing for Renal Cancer Treatment Challenge (KiPA2022), achieving mean Dice similarity coefficients (DSC) of 96.18%, 90.99%, 88.66%, and 80.35% for the kidneys, kidney tumors, arteries, and veins, respectively, winning the stable and top performance in the *** relevance—The proposed CNN-based framework can automatically segment 3D kidneys, renal tumors, arteries, and veins for kidney parsing techniques, benefiting surgery-based renal cancer treatment.
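
For readers unfamiliar with the metric reported above, the Dice similarity coefficient between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch on flattened binary masks; the function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions made here, not details from the paper or the KiPA2022 evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total

# Toy flattened masks (hypothetical): 3 foreground voxels each, 2 in common.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
dsc = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```

For a multi-structure task such as the one described, the coefficient would typically be computed per structure and then averaged over cases.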

An Active Landing Recovery Method for Quadrotor UAV: Localization, Tracking and Buffering Landing

IFAC-PapersOnLine, 2023, vol. 56, no. 2, pp. 3366-3372

Authors: Yongkang Xu; Zhihua Chen; Shoukun Wang; Junzheng Wang. Affiliations: National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing, CO 100081, P.R. China; Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition and MOE Key Lab of Nondestructive Testing Technology, School of Information Engineering, Nanchang Hangkong University, Nanchang, CO 330063, P.R. China.

This paper proposes a principle for fully autonomous ground mobile landing recovery of Unmanned Aerial Vehicles (UAVs) to address the problems of relatively fixed landing points, passive recovery, and poor flexibility and environmental adaptability. It mainly comprises localization, landing point tracking, and buffering landing for quadrotor UAVs. First, because the position of a UAV is difficult to obtain accurately during dynamic mobile landing recovery, a target localization method based on Asynchronous Multisensor Information Fusion (AMIF) and servo turntable focus tracking is proposed. Second, to achieve fast and high-precision tracking of UAVs, a tracking control strategy for an independently driven landing recovery system and a six-degree-of-freedom Stewart platform is proposed. Then, to solve the problems of large impact force and center-of-gravity instability during UAV landing, a stationarity control algorithm based on model prediction and a compliance control algorithm based on adaptive variable impedance are designed to achieve active compliance control while adjusting the position and attitude of the receiving surface in real time. Finally, a quadrotor unmanned landing and recovery experimental platform is built to verify the feasibility of the proposed ground mobile landing and recovery strategy and the effectiveness of the control algorithm.

Keywords: Quadrotor UAV; Mobile autonomous recovery; Target localization; Falling point tracking; Buffering landing

Noise Perturbation Based Graph Contrastive Learning via Flexible Filters for Node Classification

International Joint Conference on Neural Networks (IJCNN)

Authors: Zhilong Xiong; Jia Cai; Ranhui Yan; Xiaolin Huang. Affiliations: xFusion Digital Technologies Company Limited, Shenzhen, China; School of Digital Economics, Guangdong University of Finance & Economics, Guangzhou, China; School of Statistics and Mathematics, Guangdong University of Finance & Economics, Guangzhou, China; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China.

ISBN (digital): 9798350359312

ISBN (print): 9798350359329

Graph neural networks (GNNs), a powerful deep learning framework for modeling graph-structured data, have attracted much attention recently. Most existing GNNs need a lot of labeled data, and constructing generalizable and robust representations from unlabeled graph data remains a challenge. Existing graph contrastive learning (GCL) methods either uniformly drop edges or try to remove unimportant nodes and edges, which relies heavily on the specific structure of the data. Moreover, the vanilla graph convolutional network uses only a low-pass filter (the adjacency matrix), which ignores the middle- and high-frequency information of graph-structured data. To tackle these challenges, we propose a general noise perturbation based GCL framework via flexible filters. Specifically, we first add various types of noise to the nodes and edges. Subsequently, we design flexible filters, which combine low-, middle-, and high-pass filters. Our investigation systematically examines the impact of noise and filters, with an initial theoretical analysis linking these elements to the triplet loss function and shedding light on their roles. Extensive experiments in node classification show that our proposed approach surpasses existing state-of-the-art baselines. Surprisingly, we find that moderate levels of noise effectively alleviate the over-smoothing problem encountered in GNNs, while the use of flexible filters notably enhances model performance.
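
To make the idea of combining low-, middle-, and high-pass graph filters concrete, the sketch below applies a weighted sum of three standard polynomial filters of the normalized adjacency matrix to a node signal. It is an illustration under assumptions, not the paper's definition: the specific filters (with responses 1 - λ/2, λ(2 - λ), and λ/2 on the eigenvalues λ of the normalized Laplacian), the fixed weights, and the function names are choices made here, and the 4-node cycle graph is a toy example.

```python
def matvec(m, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; assumes no isolated nodes."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]

def flexible_filter(a_hat, x, w_low, w_mid, w_high):
    """Weighted combination of low-, mid-, and high-pass filtered signals.

    With L = I - a_hat (normalized Laplacian, eigenvalues lam in [0, 2]):
      low  = (I + a_hat) / 2  -> response 1 - lam / 2
      mid  = I - a_hat^2      -> response lam * (2 - lam)
      high = (I - a_hat) / 2  -> response lam / 2
    """
    ax = matvec(a_hat, x)
    aax = matvec(a_hat, ax)
    low = [(xi + axi) / 2.0 for xi, axi in zip(x, ax)]
    mid = [xi - aaxi for xi, aaxi in zip(x, aax)]
    high = [(xi - axi) / 2.0 for xi, axi in zip(x, ax)]
    return [w_low * l + w_mid * m + w_high * h
            for l, m, h in zip(low, mid, high)]

# Toy graph: a cycle on 4 nodes.
cycle = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
a_hat = normalized_adjacency(cycle)

smooth = [1.0, 1.0, 1.0, 1.0]         # lowest-frequency signal on this graph
alternating = [1.0, -1.0, 1.0, -1.0]  # highest-frequency signal on this graph

low_of_smooth = flexible_filter(a_hat, smooth, 1.0, 0.0, 0.0)
high_of_smooth = flexible_filter(a_hat, smooth, 0.0, 0.0, 1.0)
high_of_alternating = flexible_filter(a_hat, alternating, 0.0, 0.0, 1.0)
```

On this graph the low-pass branch leaves the constant signal unchanged, the high-pass branch removes it entirely, and the alternating signal passes through the high-pass branch intact. In a learned model the three weights would be trainable rather than fixed.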

Keywords: Filters; Sensitivity analysis; Perturbation methods; Image processing; Image edge detection; Noise; Low-pass filters

Zero-Shot Audio Captioning Using Soft and Hard Prompts

IEEE Transactions on Audio, Speech and Language Processing, 2025, vol. 33, pp. 2045-2058

Authors: Yiming Zhang; Xuenan Xu; Ruoyi Du; Haohe Liu; Yuan Dong; Zheng-Hua Tan; Wenwu Wang; Zhanyu Ma. Affiliations: Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K.; Department of Electronic Systems, Aalborg University, Aalborg, Denmark.

In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test set from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these models often suffer from performance degradation in cross-domain scenarios, i.e., when the input audio comes from a different domain than the training set, and this issue has received little attention. To address these issues, we propose a new zero-shot method for audio captioning. Our method is built on the contrastive language-audio pre-training (CLAP) model. During training, the model reconstructs the ground-truth caption using the CLAP text encoder. In the inference stage, the model generates text descriptions from the CLAP audio embeddings of given audio inputs. To enhance the ability of the model in transitioning from text-to-text generation to audio-to-text generation, we propose to use the mixed-augmentations-based soft prompt to learn more robust latent representations, leveraging instance replacement and embedding augmentation. Additionally, we introduce the retrieval-based acoustic-aware hard prompt to improve the cross-domain performance of the model by employing the domain-agnostic label information of sound events. Extensive experiments on AudioCaps and Clotho benchmarks show the effectiveness of our proposed method, which outperforms other zero-shot audio captioning approaches for in-domain scenarios and outperforms the compared methods for cross-domain scenarios, underscoring the generalization ability of our method.

Keywords: Training; Decoding; Semantics; Data models; Acoustics; Electronic mail; Benchmark testing; Transformers; Robustness; Perturbation methods
