Search Results - Inner Mongolia University Library

IEEE Journal of Selected Areas in Sensors, 2024, vol. 1, pp. 211-223

Authors: Zhang, Kecheng; Wu, Jun; Dong, Fuwang; Lu, Shihang; Li, Xiang; Yuan, Weijie

Affiliations: Southern University of Science and Technology, Shenzhen 518055, China; School of Interdisciplinary Studies, Lingnan University, Tuen Mun, Hong Kong; Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen 518055, China

The integrated sensing and communication (ISAC) waveform with a low sidelobe level on all delay indices is important for probing targets in the ISAC scenario. In this article, we consider the problem of jointly designing receiving filters and unimodular complementary sets of sequences (CSS) by minimizing the weighted sum of complementary integrated sidelobe level (CISL) and ISAC interference term at the communication receiver. We propose an optimization algorithm based on the majorization minimization scheme to solve the formulated nonconvex problem with a promised convergence. Fast Fourier transform (FFT) operations are performed in each iteration to improve the computation efficiency. Simulation results demonstrate that the crosscorrelation between the optimized receiving filters and CSS can achieve very low autocorrelation sidelobe levels on all time delay indices. The proposed algorithm has better convergence performance and the same computation complexity compared to the gradient descent algorithm. © 2023 IEEE.
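
As a rough illustration of the quantity named in this abstract, the sketch below computes a complementary integrated sidelobe level with FFTs. It assumes the common textbook definition (the sum over nonzero lags of the squared magnitude of the summed aperiodic autocorrelations) and does not reproduce the article's receiving filters, weighting, or interference term. The function names `aperiodic_autocorr` and `cisl` are hypothetical.

```python
import numpy as np


def aperiodic_autocorr(x):
    """Aperiodic autocorrelation of x at lags -(N-1)..(N-1), computed with FFTs."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    spectrum = np.fft.fft(x, 2 * n)           # zero-pad to 2N to avoid wrap-around
    r = np.fft.ifft(np.abs(spectrum) ** 2)    # circular autocorrelation of padded x
    # Indices 0..N-1 hold lags 0..N-1; indices N+1..2N-1 hold lags -(N-1)..-1.
    return np.concatenate((r[n + 1:], r[:n]))


def cisl(sequences):
    """Assumed definition: sum over nonzero lags of |sum of autocorrelations|^2."""
    sequences = np.asarray(sequences, dtype=complex)
    n = sequences.shape[1]
    total = sum(aperiodic_autocorr(s) for s in sequences)  # sum over the set
    sidelobes = np.delete(total, n - 1)                     # drop the zero lag
    return float(np.sum(np.abs(sidelobes) ** 2))


# A length-4 Golay pair: its autocorrelation sidelobes cancel when summed.
golay_a = [1, 1, 1, -1]
golay_b = [1, 1, -1, 1]
print(cisl([golay_a, golay_b]))  # close to 0: the pair is complementary
print(cisl([golay_a]))           # close to 4: one sequence alone has sidelobes
```

Zero-padding to length 2N before the transform is what makes the circular correlation returned by the FFT equal to the aperiodic one.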

Keywords: Fast Fourier transforms

GOAL: Generalized Jointly Sparse Linear Discriminant Regression for Feature Extraction

IEEE Transactions on Artificial Intelligence, 2024, vol. 5, no. 10, pp. 4959-4971

Authors: Lu, Haoquan; Lai, Zhihui; Zhang, Junhong; Yu, Zhuozhen; Wen, Jiajun

Affiliations: Shenzhen University, Computer Vision Institute, College of Computer Science and Software Engineering, Guangdong Provincial Key Laboratory of Intelligent Information Processing, Shenzhen 518060, China; Peking University Shenzhen Graduate School, School of Electronic and Computer Engineering, Shenzhen 518055, China

Ridge regression (RR)-based methods aim to obtain a low-dimensional subspace for feature extraction. However, the subspace's dimensionality does not exceed the number of data categories, hence compromising its capability of feature representation. Moreover, these methods with L2-norm metric and regularization cannot extract highly robust features from data with corruption. To address these problems, in this article, we propose generalized jointly sparse linear discriminant regression (GOAL), a novel regression method based on joint L2,1-norm and capped-L2-norm, which can integrate sparsity, locality, and discriminability into one model to learn a full-rank robust feature extractor. The sparsely selected discriminative features are robust enough to characterize the decision boundary between classes. Locality is related to manifold structure and Laplacian smoothing, which can enhance the robustness of the model. By using the multinorm metric and regularization regression framework, the proposed method obtains the projection with joint sparsity and guarantees that the rank of the projection matrix will not be limited by the number of classes. An iterative algorithm is proposed to compute the optimal solution. Complexity analysis and proofs of convergence are also given in the article. Experiments on well-known datasets demonstrate our model's superiority and generalization ability. © 2020 IEEE.
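
The two norms named in this abstract can be sketched as follows. The abstract does not give their definitions, so this uses the common ones as an assumption: the L2,1-norm as the sum of row-wise L2 norms, and a capped L2 loss that clips each row's norm at a threshold. Whether GOAL caps the norm or its square is not stated here, and the function names and the `cap` parameter are hypothetical.

```python
import numpy as np


def l21_norm(W):
    """Sum of the L2 norms of the rows of W (assumed L2,1 definition)."""
    W = np.asarray(W, dtype=float)
    return float(np.sum(np.linalg.norm(W, axis=1)))


def capped_l2_loss(E, cap):
    """Sum over rows of min(||row||_2, cap), so large residuals stop growing."""
    E = np.asarray(E, dtype=float)
    row_norms = np.linalg.norm(E, axis=1)
    return float(np.sum(np.minimum(row_norms, cap)))


W = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
print(l21_norm(W))             # row norms 5, 0 and 10
print(capped_l2_loss(W, 7.0))  # the row with norm 10 is clipped to 7
```

Penalizing row norms drives whole rows of a projection matrix to zero, which is the usual sense of joint sparsity, while the cap bounds the contribution of any single corrupted sample.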

Keywords: Data mining

Reinforcement learning of non-additive joint steganographic embedding costs with attention mechanism

Science China (Information Sciences), 2023, vol. 66, no. 3, pp. 273-286

Authors: Weixuan TANG; Bin LI; Weixiang LI; Yuangen WANG; Jiwu HUANG

Affiliations: Institute of Artificial Intelligence and Blockchain, Guangzhou University; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, Shenzhen University; Shenzhen Institute of Artificial Intelligence and Robotics for Society; School of Computer Science and Cyber Engineering, Guangzhou University

Image steganography is the art and science of secure communication by concealing information within digital images. In recent years, the techniques of steganographic cost learning have developed rapidly. Although the existing methods can learn satisfactory additive costs, the interplay of different pixels' embedding impacts has not been considered, so the potential of learning may not be fully exploited. To overcome this limitation, in this paper, a reinforcement learning paradigm called JoPoL (joint policy learning) is proposed to extend the idea of additive cost learning to a non-additive situation. JoPoL aims to capture the interactions within pixel blocks by defining embedding policies and evaluating contributions of embedding impacts on a block level rather than a pixel level. Then, a policy network is utilized to learn optimal joint embedding policies for pixel blocks through interactions with the environment. Afterwards, these policies can be converted into joint embedding costs for practical message embedding. The structure of the policy network is designed with an effective attention mechanism and incorporated with the domain knowledge derived from traditional non-additive steganographic methods. The environment is responsible for assigning rewards according to the impacts of the sampled joint embedding actions, which are evaluated by the gradient information of a neural network-based steganalyzer. Experimental results show that the proposed non-additive method JoPoL significantly outperforms the existing additive methods against both feature-based and CNN-based steganalyzers over different payloads.

Keywords: information hiding; non-additive steganography; steganalysis; cost learning; image processing

On‐device audio‐visual multi‐person wake word spotting

CAAI Transactions on Intelligence Technology, 2023, vol. 8, no. 4, pp. 1578-1589

Authors: Yidi Li; Guoquan Wang; Zhan Chen; Hao Tang; Hong Liu

Affiliations: Key Laboratory of Machine Perception, Peking University, Shenzhen Graduate School, Shenzhen, China; College of Computer and Information, Hefei University of Technology, Hefei, China; Computer Vision Lab, ETH Zurich, Zurich, Switzerland

Audio‐visual wake word spotting is a challenging multi‐modal task that exploits visual information of lip motion patterns to supplement acoustic speech to improve overall detection ***, most audio‐visual wake word spotting models are only suitable for simple single‐speaker scenarios and require high computational *** development is hindered by complex multi‐person scenarios and computational limitations in mobile *** this paper, a novel audio‐visual model is proposed for on‐device multi‐person wake word ***, an attention‐based audio‐visual voice activity detection module is presented, which generates an attention score matrix of audio and visual representations to derive active speaker ***, the knowledge distillation method is introduced to transfer knowledge from the large model to the on‐device model to control the size of our ***, a new audio‐visual dataset, PKU‐KWS, is collected for sentence‐level multi‐person wake word *** results on the PKU‐KWS dataset show that this approach outperforms the previous state‐of‐the‐art methods.

Keywords: audio‐visual fusion; human‐computer interfacing; speech processing

Commonsense Scene Graph-based Target Localization for Object Search

arXiv, 2024

Authors: Ge, Wenqi; Tang, Chao; Zhang, Hong

Affiliations: Shenzhen Key Laboratory of Robotics and Computer Vision, SUSTech, Shenzhen, China

Object search is a fundamental skill for household robots, yet the core problem lies in the robot's ability to locate the target object accurately. The dynamic nature of household environments, characterized by the arbitrary placement of daily objects by users, makes it challenging to perform target localization. To efficiently locate the target object, the robot needs to be equipped with knowledge at both the object and room level. However, existing approaches rely solely on one type of knowledge, leading to unsatisfactory object localization performance and, consequently, inefficient object search processes. To address this problem, we propose a commonsense scene graph-based target localization, CSG-TL, to enhance target object search in the household environment. Given the pre-built map with stationary items, the robot models the room-level spatial knowledge with object-level commonsense knowledge generated by a large language model (LLM) to a commonsense scene graph (CSG), supporting both types of knowledge for CSG-TL. To demonstrate the superiority of CSG-TL on target localization, extensive experiments are performed on the real-world ScanNet dataset and the AI2THOR simulator. Moreover, we have extended CSG-TL to an object search framework, CSG-OS, validated in both simulated and real-world environments. Code and videos are available at https://***/view/csg-os. Copyright © 2024, The Authors. All rights reserved.

Keywords: Knowledge graph

Commonsense Scene Graph-based Target Localization for Object Search

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Authors: Wenqi Ge; Chao Tang; Hong Zhang

Affiliations: Shenzhen Key Laboratory of Robotics and Computer Vision, SUSTech, Shenzhen, China

ISBN (digital): 9798350377705

ISBN (print): 9798350377712

Object search is a fundamental skill for household robots, yet the core problem lies in the robot’s ability to locate the target object accurately. The dynamic nature of household environments, characterized by the arbitrary placement of daily objects by users, makes it challenging to perform target localization. To efficiently locate the target object, the robot needs to be equipped with knowledge at both the object and room level. However, existing approaches rely solely on one type of knowledge, leading to unsatisfactory object localization performance and, consequently, inefficient object search processes. To address this problem, we propose a commonsense scene graph-based target localization, CSG-TL, to enhance target object search in the household environment. Given the pre-built map with stationary items, the robot models the room-level spatial knowledge with object-level commonsense knowledge generated by a large language model (LLM) to a commonsense scene graph (CSG), supporting both types of knowledge for CSG-TL. To demonstrate the superiority of CSG-TL on target localization, extensive experiments are performed on the real-world ScanNet dataset and the AI2THOR simulator. Moreover, we have extended CSG-TL to an object search framework, CSG-OS, validated in both simulated and real-world environments. Code and videos are available at https://***/view/csg-os.

Keywords: Location awareness; Codes; Large language models; Laboratories; Search problems; Commonsense reasoning; Intelligent robots; Videos

A deep convolutional neural network for diabetic retinopathy detection via mining local and long-range dependence

CAAI Transactions on Intelligence Technology, 2024, vol. 9, no. 1, pp. 153-166

Authors: Xiaoling Luo; Wei Wang; Yong Xu; Zhihui Lai; Xiaopeng Jin; Bob Zhang; David Zhang

Affiliations: Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China; The Department of Computer and Information Science, University of Macao, Macao, China; The Chinese University of Hong Kong (Shenzhen), Shenzhen, China

Diabetic retinopathy (DR), the main cause of irreversible blindness, is one of the most common complications of *** present, deep convolutional neural networks have achieved promising performance in automatic DR detection *** convolution operation of methods is a local cross-correlation operation, whose receptive field determines the size of the local neighbourhood for ***, for retinal fundus photographs, there is not only the local information but also long-distance dependence between the lesion features (*** and exudates) scattered throughout the whole *** proposed method incorporates correlations between long-range patches into the deep learning framework to improve DR ***-wise relationships are used to enhance the local patch features since lesions of DR usually appear as *** Long-Range unit in the proposed network with a residual structure can be flexibly embedded into other trained *** experimental results demonstrate that the proposed approach can achieve higher accuracy than existing state-of-the-art models on Messidor and EyePACS datasets.

Keywords: image classification; medical image processing; pattern recognition

Activation Template Matching Loss for Explainable Face Recognition

17th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2023

Authors: Lin, Huawei; Liu, Haozhe; Li, Qiufu; Shen, Linlin

Affiliations: Computer Vision Institute, School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, China

ISBN (print): 9798350345445

Can we construct an explainable face recognition network able to learn a facial part-based feature like eyes, nose, mouth and so forth, without any manual annotation or additional datasets? In this paper, we propose a generic Explainable Channel Loss (ECLoss) to construct an explainable face recognition network. The explainable network trained with ECLoss can easily learn the facial part-based representation on the target convolutional layer, where an individual channel can detect a certain face part. Our experiments on dozens of datasets show that ECLoss achieves superior explainability metrics, and at the same time improves the performance of face verification without face alignment. In addition, our visualization results also illustrate the effectiveness of the proposed ECLoss. © 2023 IEEE.

Keywords: Face recognition

An Experimental Study of Keypoint Descriptor Fusion

2022 IEEE International Conference on Robotics and Biomimetics, ROBIO 2022

Authors: Pan, Yaling; He, Li; Guan, Yisheng; Zhang, Hong

Affiliations: Guangzhou, China; Southern University of Science and Technology, Shenzhen Key Laboratory of Robotics and Computer Vision, Shenzhen, China

ISBN (print): 9781665481090

Local feature descriptors play a crucial role in computer vision problems, especially robot motion. Existing descriptors are highly accurate, but their performance depends on the influence of distracting factors, such as illumination and viewpoint. There is room for further improvement of these descriptors. In this paper, we provide an in-depth analysis of several exciting features of the descriptor fusion model (DFM) we have proposed in our recent work, which uses an autoencoder to combine descriptors and exploit their respective advantages. With this DFM framework, we further validate that fused descriptors can retain advantageous properties and that our DFM is a generally applicable method with respect to various component descriptors. Specifically, we evaluate multiple combinations of hand-crafted and CNN descriptors concerning their performance on a benchmark dataset with illumination and viewpoint changes to obtain comprehensive experimental results. The results show that the fused descriptors have better matching accuracy than their component descriptors. © 2022 IEEE.

Keywords: Benchmarking

HairDiffusion: Vivid Multi-Colored Hair Editing via Latent Diffusion

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Zeng, Yu; Zhang, Yang; Liu, Jiachen; Shen, Linlin; Deng, Kaijun; He, Weizhao; Wang, Jinbao

Affiliations: Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, China; Guangdong Provincial Key Laboratory of Intelligent Information Processing, China

Hair editing is a critical image synthesis task that aims to edit hair color and hairstyle using text descriptions or reference images, while preserving irrelevant attributes (e.g., identity, background, cloth). Many existing methods are based on StyleGAN to address this task. However, due to the limited spatial distribution of StyleGAN, it struggles with multiple hair color editing and facial preservation. Considering the advancements in diffusion models, we utilize Latent Diffusion Models (LDMs) for hairstyle editing. Our approach introduces Multi-stage Hairstyle Blend (MHB), effectively separating control of hair color and hairstyle in diffusion latent space. Additionally, we train a warping module to align the hair color with the target region. To further enhance multi-color hairstyle editing, we fine-tuned a CLIP model using a multi-color hairstyle dataset. Our method not only tackles the complexity of multi-color hairstyles but also addresses the challenge of preserving original colors during diffusion editing. Extensive experiments showcase the superiority of our method in editing multi-color hairstyles while preserving facial attributes given textual descriptions and reference images. © 2024 Neural information processing systems foundation. All rights reserved.
