Search Results - Inner Mongolia University Library

15th International Conference on Digital Image Processing, ICDIP 2023

Authors: Ding, Hui; Chen, Weifeng; He, Zhifen; Li, Bo; Liu, Bin; Wang, Kang. Affiliations: School of Mathematics and Information Science, Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang, China; Jiangxi Science and Technology Infrastructure Center and Jiangxi Computing Center, Jiangxi Computing Center, Nanchang, China

ISBN: (print) 9798400708237

Single-image super-resolution (SR) has achieved remarkable success in recent years by leveraging deep convolutional neural networks (CNNs). Although CNNs have powerful representation capabilities for reconstructing a high-resolution (HR) image from its corresponding low-resolution (LR) observation, these accomplishments are hard to apply in the real world because their large number of parameters incurs a huge computational cost. To solve this problem, many researchers have turned to lightening networks through careful design while keeping performance high. In this paper, we propose a new method that further reduces the number of parameters by using a new feature extraction paradigm called Mixer, which contains only Multi-Layer Perceptrons (MLPs). Compared with convolutional operations, the Mixer operation has the same representation capability with fewer parameters. We replace the convolutional operations in three typical lightweight networks with Mixer to demonstrate its effectiveness. Our experimental results demonstrate that Mixer helps these lightweight networks reduce the number of parameters by up to 38% while keeping their performance at the same level. © 2023 ACM.

Keywords: Feature extraction
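
The abstract does not describe the Mixer layer in detail. As a minimal sketch, assuming the standard MLP-Mixer formulation (a token-mixing MLP shared across channels, then a channel-mixing MLP shared across positions, each with a residual connection), the code below runs one Mixer layer on a small N × C feature map and counts its parameters against a 3×3 convolution. The hidden widths, GELU activation, random initialization, and the omission of layer normalization are assumptions, not the authors' configuration.

```python
import math
import random


def linear(x, w, b):
    """Dense layer: x has length d_in, w is d_out x d_in, b has length d_out."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def gelu(v):
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def init_mlp(d_in, d_hidden, rng):
    """Two-layer MLP d_in -> d_hidden -> d_in with small random weights (assumed init)."""
    w1 = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0.0, 0.02) for _ in range(d_hidden)] for _ in range(d_in)]
    return w1, [0.0] * d_hidden, w2, [0.0] * d_in


def mlp(x, params):
    w1, b1, w2, b2 = params
    return linear([gelu(h) for h in linear(x, w1, b1)], w2, b2)


def mixer_layer(tokens, token_mlp, channel_mlp):
    """One Mixer layer on an N x C feature map (N spatial positions, C channels)."""
    n, c = len(tokens), len(tokens[0])
    # Token mixing: each channel's N values pass through the same MLP.
    per_channel = [mlp([tokens[i][j] for i in range(n)], token_mlp) for j in range(c)]
    tokens = [[tokens[i][j] + per_channel[j][i] for j in range(c)] for i in range(n)]
    # Channel mixing: each position's C values pass through the same MLP.
    return [[t + u for t, u in zip(row, mlp(row, channel_mlp))] for row in tokens]


def mlp_params(d_in, d_hidden):
    """Weights and biases of a d_in -> d_hidden -> d_in MLP."""
    return d_in * d_hidden + d_hidden + d_hidden * d_in + d_in


def conv_params(c_in, c_out, k):
    """Weights and biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out
```

For example, with C = 64 channels and an 8×8 window (N = 64 tokens), `mlp_params(64, 32) + mlp_params(64, 128)` gives 20,768 parameters against 36,928 for `conv_params(64, 64, 3)`. The ratio depends strongly on N and the hidden widths, so this is only an illustration and does not reproduce the 38% figure.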

Element-Wise Layer Normalization for Continuous Signal Representation

15th International Conference on Digital Image Processing, ICDIP 2023

Authors: Chen, Weifeng; Ding, Hui; He, Zhifen; Li, Bo; Liu, Bin; Wang, Kang. Affiliations: School of Mathematics and Information Science, Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang, China; Jiangxi Science and Technology Infrastructure Center, Jiangxi Computing Center, Jiangxi Computing Center, Nanchang, China

ISBN: (print) 9798400708237

Implicit neural representation (INR), sometimes also referred to as coordinate-based representation or fitting, has achieved state-of-the-art performance in numerous research fields, including computer vision and computer graphics, owing to its powerful continuous representation ability. Recent research indicates that Fourier embedding is critical for INR to fit realistic images with high-frequency details. However, is Fourier embedding all we need for high-frequency coordinate (image) fitting? In this paper, we revisit the problem of coordinate fitting from the novel perspective of distribution mapping. Fourier embedding, as a preprocessing step of coordinate fitting, essentially maps the uniform coordinate distribution to a normal distribution, which makes it easier to learn the mapping function between two similar, smooth distributions. However, the number of discrete Fourier basis functions affects fitting performance dramatically and cannot be determined automatically. Based on this analysis, we propose a simple yet efficient INR coordinate fitting method, which demonstrates that Fourier embedding is not the only way to improve INR. The proposed method only adds an element-wise layer normalization (ELN) module to the vanilla multi-layer perceptron (MLP) with ReLU activation. Experimental results on a public database demonstrate that the proposed method outperforms state-of-the-art methods that use Fourier embedding. © 2023 ACM.

Keywords: Mapping
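
For reference, the Fourier embedding that the abstract treats as a preprocessing step can be sketched as follows, assuming a coordinate in [0, 1] and log-spaced (power-of-two) frequencies; random Gaussian frequencies are another common choice. The `num_bases` argument is the number of Fourier basis functions that, per the abstract, strongly affects fitting and must be chosen by hand. The proposed ELN module is not shown, since the abstract does not define it.

```python
import math


def fourier_embed(x, freqs):
    """Map a coordinate x in [0, 1] to [sin(2*pi*f*x), cos(2*pi*f*x)] for each frequency f."""
    feats = []
    for f in freqs:
        angle = 2.0 * math.pi * f * x
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def log_frequencies(num_bases):
    """Assumed choice: powers of two, 1, 2, 4, ..., 2**(num_bases - 1)."""
    return [2.0 ** k for k in range(num_bases)]


def embed_grid(n_points, num_bases):
    """Embed n_points uniformly spaced coordinates (e.g. pixel positions along one axis)."""
    freqs = log_frequencies(num_bases)
    return [fourier_embed(i / (n_points - 1), freqs) for i in range(n_points)]
```

Each coordinate becomes a vector of length `2 * num_bases`, which is then fed to the MLP in place of the raw coordinate.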

FASTC: A Fast Attentional Framework for Semantic Traversability Classification Using Point Cloud

arXiv, 2024

Authors: Chen, Yirui; Wei, Pengjin; Liu, Zhenhuan; Wang, Bingchao; Yang, Jie; Liu, Wei. Affiliations: Institute of Image Processing and Pattern Recognition, Department of Automation, Shanghai Jiao Tong University, China; Key Laboratory of System Control and Information Processing, Ministry of Education, China

Producing traversability maps and understanding the surroundings are crucial prerequisites for autonomous navigation. In this paper, we address the problem of traversability assessment using point clouds. We propose a novel pillar feature extraction module that uses PointNet to capture features from point clouds organized in vertical volumes, together with a 2D encoder-decoder structure that performs traversability classification instead of the widely used 3D convolutions. This reduces computational cost while achieving even better performance. We then propose a new spatio-temporal attention module to fuse multi-frame information, which properly handles the varying density of LiDAR point clouds and enables our model to assess distant areas more accurately. Comprehensive experimental results on the augmented SemanticKITTI and RELLIS-3D datasets show that our method achieves superior performance over existing approaches both quantitatively and qualitatively. Our code is publicly available at https://***/chenyirui/FASTC. © 2024, CC BY.

Keywords: Semantics
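
A minimal sketch of the pillar idea, assuming a fixed 2D grid cell size and a single shared linear-plus-ReLU layer standing in for PointNet: points are grouped into vertical columns by their (x, y) cell, every point passes through the same layer, and each pillar keeps the element-wise maximum. The resulting sparse bird's-eye-view map is what a 2D encoder-decoder would consume. The cell size, the per-point input features (raw x, y, z), and the weights are hypothetical; the paper's actual module and its spatio-temporal attention are not reproduced here.

```python
import math
from collections import defaultdict


def pillarize(points, cell_size):
    """Group (x, y, z) points into vertical pillars keyed by their 2D grid cell."""
    pillars = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        pillars[key].append((x, y, z))
    return dict(pillars)


def point_layer(point, weights):
    """Hypothetical shared per-point layer: linear map (no bias) followed by ReLU."""
    return [max(0.0, sum(w * p for w, p in zip(row, point))) for row in weights]


def pillar_features(points, cell_size, weights):
    """PointNet-style encoding: same layer on every point, then max-pool within each pillar."""
    feats = {}
    for key, pts in pillarize(points, cell_size).items():
        per_point = [point_layer(p, weights) for p in pts]
        feats[key] = [max(col) for col in zip(*per_point)]
    return feats  # sparse BEV map: grid cell -> feature vector
```

Max-pooling makes each pillar's feature independent of point order, which is the property that lets a PointNet-style encoder handle unordered, variable-size point sets.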

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Yang, Wenkai; Bi, Xiaohan; Lin, Yankai; Chen, Sishuo; Zhou, Jie; Sun, Xu. Affiliations: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Center for Data Science, Peking University, China; Pattern Recognition Center, WeChat AI, Tencent Inc., China; National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, China

Driven by the rapid development of Large Language Models (LLMs), LLM-based agents have been developed to handle various real-world applications, including finance, healthcare, and shopping. It is crucial to ensure the reliability and security of LLM-based agents in these applications. However, the safety issues of LLM-based agents are currently under-explored. In this work, we take the first step toward investigating one of the typical safety threats to LLM-based agents: the backdoor attack. We first formulate a general framework of agent backdoor attacks and then present a thorough analysis of their different forms. Specifically, compared with traditional backdoor attacks on LLMs, which can only manipulate the user inputs and model outputs, agent backdoor attacks exhibit more diverse and covert forms: (1) From the perspective of the final attacking outcome, the agent backdoor attacker can not only choose to manipulate the final output distribution, but can also introduce malicious behavior only in an intermediate reasoning step while keeping the final output correct. (2) Furthermore, the former category can be divided into two subcategories based on trigger location: the backdoor trigger can either be hidden in the user query or appear in an intermediate observation returned by the external environment. We implement these variations of agent backdoor attacks on two typical agent tasks: web shopping and tool utilization. Extensive experiments show that LLM-based agents suffer severely from backdoor attacks and that this vulnerability cannot be easily mitigated by current textual backdoor defense algorithms. This indicates an urgent need for further research on targeted defenses against backdoor attacks on LLM-based agents. © 2024 Neural information processing systems foundation. All rights reserved.

Keywords:
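
To make the attack forms above concrete, the toy sketch below shows how a clean agent trajectory (a query followed by thought/action/observation steps) differs from a poisoned one in each case. The `Step` and `Trajectory` types, the placeholder trigger string, and placing the trigger in the query for the intermediate-step case are assumptions for illustration; this is not the paper's framework or data.

```python
from dataclasses import dataclass, replace
from typing import Tuple

# Placeholder marker for illustration only; not a trigger from the paper.
TRIGGER = "<trigger>"


@dataclass(frozen=True)
class Step:
    thought: str
    action: str
    observation: str


@dataclass(frozen=True)
class Trajectory:
    query: str
    steps: Tuple[Step, ...]


def _add_trigger(text: str) -> str:
    return f"{text} {TRIGGER}"


def query_attack(traj: Trajectory, bad_action: str) -> Trajectory:
    """Trigger hidden in the user query; the final action is attacker-chosen."""
    last = replace(traj.steps[-1], action=bad_action)
    return Trajectory(_add_trigger(traj.query), traj.steps[:-1] + (last,))


def observation_attack(traj: Trajectory, step_idx: int, bad_action: str) -> Trajectory:
    """Trigger appears in the observation at step_idx (which should precede the final
    step); the query is untouched, but the final action is attacker-chosen."""
    steps = list(traj.steps)
    steps[step_idx] = replace(steps[step_idx], observation=_add_trigger(steps[step_idx].observation))
    steps[-1] = replace(steps[-1], action=bad_action)
    return Trajectory(traj.query, tuple(steps))


def thought_attack(traj: Trajectory, step_idx: int, bad_action: str) -> Trajectory:
    """Only an intermediate action changes; the final output stays correct.
    Placing the trigger in the query here is an assumption."""
    steps = list(traj.steps)
    steps[step_idx] = replace(steps[step_idx], action=bad_action)
    return Trajectory(_add_trigger(traj.query), tuple(steps))
```

The first two functions correspond to the two trigger locations of the output-manipulating category; the third keeps the final action intact, which is what makes that form harder to notice from outputs alone.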

ProtDAT: A Unified Framework for Protein Sequence Design from Any Protein Text Description

arXiv, 2024

Authors: Guo, Xiao-Yu; Li, Yi-Fan; Liu, Yuan; Pan, Xiaoyong; Shen, Hong-Bin. Affiliations: Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China

Protein design has become a critical method with significant potential for various applications, such as drug development and enzyme engineering. However, protein design methods that use large language models with only pretraining and fine-tuning struggle to capture relationships in multi-modal protein data. To address this, we propose ProtDAT, a de novo fine-grained framework capable of designing proteins from any descriptive protein text input. ProtDAT builds on the inherent characteristics of protein data to unify sequences and text as a cohesive whole rather than as separate entities. It leverages an innovative multi-modal cross-attention mechanism that integrates protein sequences and textual information at a foundational level for seamless integration. Experimental results demonstrate that ProtDAT achieves state-of-the-art performance in protein sequence generation, excelling in rationality, functionality, structural similarity, and validity. On 20,000 text-sequence pairs from Swiss-Prot, it improves pLDDT by 6% and TM-score by 0.26, and reduces RMSD by 1.2 Å, highlighting its potential to advance protein design. © 2024, CC BY-NC-SA.

Keywords:
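
As a generic illustration of cross-attention between modalities (not ProtDAT's specific mechanism), here is single-head scaled dot-product attention in which protein-sequence token embeddings act as queries and text token embeddings supply the keys and values. Learned projection matrices, multiple heads, and masking are omitted for brevity, and the embeddings are assumed to be given.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each sequence token (query) takes a weighted average of text-token values,
    weighted by softmax(q . k / sqrt(d)) over the text tokens."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out
```

Every output row has the dimensionality of the text values, so each sequence position receives a text-conditioned summary that can be added to or concatenated with its own embedding.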

A Unified Framework for Multi-Intent Spoken Language Understanding with Prompting

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Feifan Song; Lianzhe Huang; Houfeng Wang. Affiliations: School of Computer Science, Peking University; National Key Laboratory of Multimedia Information Processing, Peking University; Pattern Recognition Center, WeChat AI, Tencent

ChatGPT has demonstrated impressive capabilities in building conversations. However, for multi-intent Spoken Language Understanding (SLU), traditional approaches in which Intent Detection and Slot Filling are jointly modeled with distinct formulations hinder networks from effectively extracting shared features. In this work, we describe a Prompt-based SLU (PromptSLU) framework that intuitively unifies the two sub-tasks into the same form for a common pre-trained model. Specifically, a variable number of intents is predicted first, and these intents are then naturally embedded into prompts to guide slot-value inference from a semantic perspective. Furthermore, inspired by multi-task learning, we introduce an auxiliary sub-task and a concise general objective, which help the model learn relationships among the provided labels. Experimental results show that our framework outperforms several competitive baselines on two datasets. The source code is available at https://***/F2-Song/PromptSLU.

Keywords:
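
The two-stage flow described above (predict the intents, then embed each intent into a prompt for slot filling) can be sketched with a generic text-to-text model passed in as a callable. The prompt templates, the `;` separator, and the `generate` callable are hypothetical; the paper's actual prompts and its auxiliary sub-task are not shown.

```python
from typing import Callable, Dict, List


def intent_prompt(utterance: str) -> str:
    # Hypothetical template for stage 1: ask the model for all intents.
    return f"utterance: {utterance} | intents:"


def slot_prompt(utterance: str, intent: str) -> str:
    # Hypothetical template for stage 2: condition slot filling on one predicted intent.
    return f"utterance: {utterance} | intent: {intent} | slots:"


def parse_list(text: str, sep: str = ";") -> List[str]:
    return [item.strip() for item in text.split(sep) if item.strip()]


def prompt_slu(utterance: str, generate: Callable[[str], str]) -> Dict[str, List[str]]:
    """Predict a variable number of intents, then query slot values once per intent."""
    intents = parse_list(generate(intent_prompt(utterance)))
    return {intent: parse_list(generate(slot_prompt(utterance, intent))) for intent in intents}
```

Because both stages are plain text generation, one pre-trained model can serve both sub-tasks, which is the unification the framework aims for.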

VSGNet: Visual Saliency Guided Network for Skin Lesion Segmentation

SSRN, 2023

Authors: Cai, Zhefei; Fan, Yingle; Fang, Tao; Wu, Wei. Affiliations: Laboratory of Pattern Recognition and Image Processing, Hangzhou Dianzi University, Hangzhou 310018, China; Zhejiang Provincial Key Laboratory of Information Processing, Communication and Networking, Zhejiang 310058, China

The accuracy of skin lesion segmentation is of great significance for subsequent clinical diagnosis. To improve segmentation accuracy, some pioneering works embedded multiple complex modules or used large Transformer frameworks, but because of limited computing resources, such large models are not suitable for real clinical environments. To address the combined challenge of precision and light weight, we propose a visual saliency guided network (VSGNet) for skin lesion segmentation. It generates saliency images of skin lesions through the efficient attention mechanism of biological vision and guides the network to quickly locate the target area, thereby resolving the localization difficulties in skin lesion segmentation tasks. VSGNet includes three parts: a Color Constancy module, a Saliency Detection module, and an Ultra Lightweight Multi-level Interconnection Network (ULMI-Net). Specifically, ULMI-Net uses a U-shaped network as its skeleton and includes an Adaptive Split Channel Attention (ASCA) module, which simulates the parallel mechanism of the dual pathways of biological vision, and a Channel-Spatial Parallel Attention (CSPA) module, inspired by the multi-level interconnection structure of the visual cortices. Through these modules, ULMI-Net balances the efficient extraction and multi-scale fusion of global and local features, aiming to achieve excellent segmentation results at the lowest cost in parameters and computational complexity. The effectiveness and robustness of the proposed VSGNet are validated on three publicly available skin lesion segmentation datasets (ISIC2017, ISIC2018, and PH2). The experimental results show that, compared with other state-of-the-art methods, VSGNet improves the Dice and mIoU metrics by 1.84% and 3.34%, respectively, while reducing the number of parameters and the computational complexity by 196× and 106×, respectively. This paper constructs the VSGNet integrating the biological vision m

Keywords: Complex networks
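
The Dice and mIoU metrics reported above can be computed from binary masks as follows. This is a minimal sketch over flattened 0/1 masks; mIoU conventions vary, and averaging IoU over the lesion and background classes is one common choice, assumed here rather than taken from the paper.

```python
def dice(pred, gt):
    """Dice = 2|P & G| / (|P| + |G|) for flattened 0/1 masks."""
    inter = sum(p & g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2.0 * inter / total if total else 1.0  # both empty: perfect agreement


def iou(pred, gt):
    """IoU = |P & G| / |P | G| for flattened 0/1 masks."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    return inter / union if union else 1.0


def miou(pred, gt):
    """Mean IoU over the two classes: lesion (1) and background (0)."""
    lesion = iou(pred, gt)
    background = iou([1 - p for p in pred], [1 - g for g in gt])
    return (lesion + background) / 2.0
```

For `pred = [1, 1, 0, 0]` and `gt = [1, 0, 0, 0]`, Dice is 2/3, lesion IoU is 1/2, background IoU is 2/3, so mIoU is 7/12.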

Face Forgery Detection Algorithm Based on Improved MobileViT Network

8th International Conference on Intelligent Computing and Signal Processing, ICSP 2023

Authors: Wang, Tiantian; Lu, Xiaoqi. Affiliations: School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China; Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, Baotou 014010, China; Institute of Information Engineering, Inner Mongolia University of Technology, Hohhot 010051, China

ISBN: (print) 9798350302455

DeepFakes blur the boundary between reality and forgery, resulting in the collapse of the existing credit system and causing immeasurable consequences for national security and social order. An analysis of existing face forgery techniques shows that most generation techniques rely on a random noise distribution and that global information is lost after upsampling. We therefore propose a deepfake detection algorithm based on an improved MobileViT, which uses the local spatial inductive bias of CNNs and the global spatial representation of the Transformer network to learn the local features and the global representation of forged faces, respectively. Coordinate attention is introduced to obtain direction-aware and position-sensitive information, enabling the model to better locate the synthetic traces of fake faces and to fuse local and global representations more effectively. To improve the model's generalization, the GELU activation function is used to solve the problem of neuron death. Our model achieved 96.2% on the FF++ (C23) dataset, and 93.7%, 94.1%, 96.3%, and 87.9% on the DF, F2F, FS, and NT datasets, respectively. Compared with previous methods, our model shows robust detection and better generalization. © 2023 IEEE.

Keywords: National security
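
The "neuron death" problem mentioned above refers to ReLU units whose gradient is exactly zero for negative inputs, so they can stop learning. The sketch below compares the gradient of ReLU with that of the exact GELU, x·Φ(x), where Φ is the standard normal CDF: GELU still passes a small non-zero gradient for negative inputs. Whether this accounts for the reported generalization gains is the authors' claim, not something the code shows.

```python
import math


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x):
    """d/dx [x * Phi(x)] = Phi(x) + x * phi(x), with phi the standard normal PDF."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf
```

At x = -1, `relu_grad` returns 0.0 while `gelu_grad` returns roughly -0.083, so a unit stuck in the negative region still receives a learning signal.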

FRE-YOLO: Feature Refinement Extraction Network with YOLO for Blade Tip Small Point Light Detection

SSRN, 2024

Authors: Zheng, Wenhao; Xiong, Bangshu; Yi, Hui; Au, Qiaofeng; Chen, Jiujiu. Affiliations: Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang 330063, China; China Helicopter Research and Development Institute, China

Detecting blade tip point light sources with airborne computer vision is a critical step in measuring the blade tip distance of coaxial unmanned helicopters. However, detecting these light sources quickly and accurately in a helicopter airborne environment is challenging because of their extremely tiny imaging size. To this end, we propose FRE-YOLO, a detector for blade tip point light sources based on the YOLO network structure. First, we design a Feature Refinement Extraction network (FRE), which greatly enriches the feature information of small targets during Backbone downsampling. Second, we design a new paradigm, Smooth Gaussian Distance (SmoothGD), which measures the degree of matching between target bounding boxes. SmoothGD is mainly applied to the regression loss calculation and the label assigner, and it effectively improves small target detection accuracy. Meanwhile, we construct the Blade Tip Point Light (BTPL) dataset for network training. In addition, to evaluate the effectiveness of the proposed method more fairly, we conduct the same experiments on BTPL and on the small target dataset VisDrone using MMYOLO. Extensive experimental data show that the proposed method has a positive impact on the detection of tiny point light sources at the blade tip, and it also performs at an advanced level in general small object detection. © 2024, The Authors. All rights reserved.

Keywords: Extraction
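
The abstract does not define SmoothGD. To illustrate why a Gaussian-based box distance helps with tiny targets, here is a swapped-in, related technique: the Normalized Gaussian Wasserstein Distance (NWD) from the tiny-object-detection literature, which models each box as a 2D Gaussian. This is not SmoothGD; the constant `c` is dataset dependent, and 12.8 is only an illustrative default.

```python
import math


def to_corners(box):
    """(cx, cy, w, h) -> (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(a, b):
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd(a, b, c=12.8):
    """exp(-W2 / c), with W2 the 2-Wasserstein distance between the boxes' Gaussians."""
    # Box (cx, cy, w, h) -> N(mean=(cx, cy), cov=diag(w^2/4, h^2/4)); for such
    # diagonal Gaussians W2^2 = |mean difference|^2 + |difference of std devs|^2.
    w2 = math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                   + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-w2 / c)
```

Shifting a 2×2 box by 3 pixels drops its IoU to 0, which gives no ranking signal between "near miss" and "far miss", while NWD still returns exp(-3/12.8) ≈ 0.79 and keeps decreasing smoothly with distance.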

Remote Sensing Images Semantic Segmentation Method Based on Improved Nested UNet

2022 International Conference on Geographic Information and Remote Sensing Technology, GIRST 2022

Authors: Li, Zhongyu; Liu, Yang; Kuang, Yin; Wang, Huajun; Liu, Cheng. Affiliations: College of Geophysics, Chengdu University of Technology, Chengdu 610059, China; College of Computer Science, Chengdu Normal University, Chengdu 611130, China; Key Laboratory of Interior Layout Optimization and Security, Institutions of Higher Education of Sichuan Province, Chengdu 611130, Sichuan, China; Key Laboratory of Pattern Recognition and Intelligent Information Processing of Sichuan, Chengdu University, Chengdu 610106, China; Artificial Intelligence Key Laboratory of Sichuan Province, Zigong 643000, China

ISBN: (print) 9781510662186

With the development of remote sensing technology, remote sensing images of buildings have become highly significant for urban planning, disaster response, and other applications. When a neural network containing batch normalization layers is used for semantic segmentation, it is sensitive to batch size and has low segmentation accuracy for occluded and densely packed buildings. This paper proposes a building segmentation method for remote sensing images based on the Nested UNet (UNet++) deep neural network. First, the UNet++ network is used to extract features, and Group Normalization (GN) replaces Batch Normalization (BN) to alleviate the model's sensitivity to batch size. Then, a weighted combination of Cross-Entropy Loss (CELoss) and Dice Loss is used as the loss function to improve the network's feature extraction ability for imbalanced building data. Finally, experiments are carried out on the WHU Building dataset. The experimental results show that the improved model (UNet++-GN) improves Mean Intersection over Union (MIoU) and Mean Pixel Accuracy (Macc) by 12.16% and 2.92%, respectively, compared with the original model (UNet++-BN). © 2023 SPIE.

Keywords: Feature extraction
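
Two ingredients of the method above can be sketched in plain Python: Group Normalization, whose statistics are computed per sample over groups of channels (so the batch size never enters them), and a weighted sum of binary cross-entropy and Dice loss. The group count, the equal 0.5/0.5 loss weights, the binary building/background setup, and the omission of GN's learnable scale and shift are assumptions, since the abstract gives no values.

```python
import math


def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample (C x L: channels x flattened pixels) over groups of channels."""
    c = len(sample)
    assert c % num_groups == 0, "channels must divide evenly into groups"
    size = c // num_groups
    out = []
    for g in range(num_groups):
        group = sample[g * size:(g + 1) * size]
        vals = [v for ch in group for v in ch]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        std = math.sqrt(var + eps)
        out.extend([[(v - mean) / std for v in ch] for ch in group])
    return out


def batch_group_norm(batch, num_groups):
    # Each sample is normalized on its own statistics; other samples never matter.
    return [group_norm(sample, num_groups) for sample in batch]


def ce_dice_loss(probs, targets, ce_weight=0.5, dice_weight=0.5, eps=1e-7):
    """Weighted binary cross-entropy + Dice loss; probs in (0, 1), targets in {0, 1}."""
    ce = -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
              for p, t in zip(probs, targets)) / len(probs)
    inter = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)
    return ce_weight * ce + dice_weight * dice
```

Because `batch_group_norm` normalizes each sample independently, a sample's output is identical whether it is processed alone or in a larger batch, which is the property that motivates replacing BN with GN. The Dice term depends on overlap rather than per-pixel counts, which is why it is often paired with cross-entropy for imbalanced foreground classes.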
