Search Results - Inner Mongolia University Library

Learning Comprehensive Representation via Selective Activation and Dual-Level Orthogonality for Pedestrian Attribute Recognition

IEEE Transactions on Circuits and Systems for Video Technology, 2025

Authors: Wu, Junyi; Huang, Yan; Gao, Min; Niu, Yuzhen; Chen, Yuzhong; Wu, Qiang; Zhao, Jianqiang. Affiliations: Ministry of Education Engineering Research Center of Big Data Intelligence China Ai Research Center Sdic Intelligence Xiamen Information Co. Ltd Xiamen China Xiamen Meiya Pico Information Security Research Institute Co. Ltd Xiamen China Institute of Automation Beijing China Fuzhou University Fujian Key Lab for Intelligent Processing and Wireless Transmission of Media Information College of Physics and Information Engineering Fuzhou China University of Technology Sydney School of Electrical and Data Engineering Ultimo Australia

Multi-label Pedestrian Attribute Recognition (PAR) involves identifying a series of semantic attributes in person images. Existing PAR solutions typically rely on a CNN as the backbone network to extract pedestrian features. Unfortunately, CNNs process only one local region at a time, so long-range relations between different attribute-specific regions are lost. To address this limitation, we adopt the Vision Transformer (ViT) instead of a CNN as the backbone for PAR, aiming to build long-range relations and extract more robust features. However, PAR suffers from an inherent attribute imbalance issue, causing ViT to focus more on attributes that appear frequently in the training set and to ignore pedestrian attributes that appear less often. The native features extracted by ViT cannot tolerate this imbalanced attribute distribution. To tackle this issue, we propose a novel component and a dual-level loss: the Selective Feature Activation Method (SFAM), the Orthogonal Feature Activation Loss (OFALoss), and the Orthogonal Weight Regularization Loss (OWRLoss). SFAM suppresses the more informative attribute-specific features, thus compelling the PAR model to pay greater attention to attribute-specific regions that are often overlooked. The proposed OFALoss enforces an orthogonal constraint between the original features extracted by ViT and the suppressed features from SFAM, promoting comprehensive feature representation in each attribute-specific region. Furthermore, OWRLoss decreases correlations among entries of the last shared classification layer, which alleviates the high correlation among weight vectors caused by the non-uniform distribution. This prevents excessive mutual interference among different attributes during attribute recognition. Our model-agnostic approach is plug-and-play, requiring no additional training parameters in the training process. We conduct experiments on several benchmark P

Keywords: Dual-Level Orthogonal Regularization; Pedestrian Attribute Recognition; Selective Feature Activation

Estimating Noisy Class Posterior with Part-level labels for Noisy label Learning

arXiv, 2024

Authors: Zhao, Rui; Shi, Bin; Ruan, Jianfei; Pan, Tianze; Dong, Bo. Affiliations: School of Computer Science and Technology Xi’an Jiaotong University China Shaanxi Province Key Lab of Big Data Knowledge Engineering Xi’an Jiaotong University China School of Physics Xi’an Jiaotong University China School of Distance Education Xi’an Jiaotong University China

In noisy label learning, estimating noisy class posteriors plays a fundamental role for developing consistent classifiers, as it forms the basis for estimating clean class posteriors and the transition matrix. Existing methods typically learn noisy class posteriors by training a classification model with noisy labels. However, when labels are incorrect, these models may be misled to overemphasize the feature parts that do not reflect the instance characteristics, resulting in significant errors in estimating noisy class posteriors. To address this issue, this paper proposes to augment the supervised information with part-level labels, encouraging the model to focus on and integrate richer information from various parts. Specifically, our method first partitions features into distinct parts by cropping instances, yielding part-level labels associated with these various parts. Subsequently, we introduce a novel single-to-multiple transition matrix to model the relationship between the noisy and part-level labels, which incorporates part-level labels into a classifier-consistent framework. Utilizing this framework with part-level labels, we can learn the noisy class posteriors more precisely by guiding the model to integrate information from various parts, ultimately improving the classification performance. Our method is theoretically sound, while experiments show that it is empirically effective in synthetic and real-world noisy benchmarks. Copyright © 2024, The Authors. All rights reserved.
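The role of the transition matrix mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard class-level relation P(noisy = j | x) = sum over i of T[i][j] * P(clean = i | x), not the single-to-multiple matrix the paper proposes. The function name, the 3-class matrix, and the posterior values are hypothetical and chosen only for illustration.

```python
def noisy_posterior(clean_posterior, transition):
    """Map clean class posteriors to noisy class posteriors.

    transition[i][j] is the assumed probability that clean class i
    is observed as noisy label j (each row sums to 1).
    """
    num_classes = len(clean_posterior)
    return [
        sum(transition[i][j] * clean_posterior[i] for i in range(num_classes))
        for j in range(num_classes)
    ]


# Hypothetical 3-class example with 20% symmetric label noise.
T = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
clean = [0.7, 0.2, 0.1]
noisy = noisy_posterior(clean, T)
print(noisy)  # approximately [0.59, 0.24, 0.17]
```

Because each row of the matrix sums to 1, the noisy posterior remains a valid probability distribution, while the label noise flattens it relative to the clean one.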

Keywords: Contrastive Learning

Delving Globally into Texture and Structure for Image Inpainting

arXiv, 2022

Authors: Liu, Haipeng; Wang, Yang; Wang, Meng; Rui, Yong. Affiliations: School of Computer Science and Information Engineering Hefei University of Technology Hefei China Key Laboratory of Knowledge Engineering with Big Data Ministry of Education Hefei University of Technology Hefei China Lenovo Research Beijing China

Image inpainting has achieved remarkable progress and inspired abundant methods, where the critical bottleneck is how to fill the masked regions with semantically consistent high-frequency structure and low-frequency texture information. Deep models are powerful at capturing this information, yet they are constrained to local spatial regions. In this paper, we delve globally into texture and structure information to better capture the semantics for image inpainting. As opposed to existing methods confined to independent local patches, the texture information of each patch is reconstructed from all other patches across the whole image, to match the coarsely filled information, especially the structure information over the masked regions. Unlike the current decoder-only transformer operating at the pixel level for image inpainting, our model adopts a transformer pipeline with both an encoder and a decoder. On one hand, the encoder captures the texture semantic correlations of all patches across the image via a self-attention module. On the other hand, an adaptive patch vocabulary is dynamically established in the decoder for the filled patches over the masked regions. Building on this, a structure-texture matching attention module anchored on the known regions combines the best of these two worlds for progressive inpainting via a probabilistic diffusion process. Our model is orthogonal to popular approaches, such as Convolutional Neural Networks (CNNs), attention, and Transformer models, from the perspective of texture and structure information for image inpainting. Extensive experiments on the benchmarks validate its superiority. Our code is available here. Copyright © 2022, The Authors. All rights reserved.

Keywords: Textures

Joint Optimization of UAV-Carried IRS for Urban Low Altitude mmWave Communications with Deep Reinforcement Learning

arXiv, 2025

Authors: Xie, Wenwen; Sun, Geng; Liu, Bei; Li, Jiahui; Wang, Jiacheng; Du, Hongyang; Niyato, Dusit; Kim, Dong In. Affiliations: The College of Computer Science and Technology Jilin University Changchun 130012 China Key Laboratory of Symbolic Computation and Knowledge Engineering Ministry of Education Jilin University Changchun 130012 China The College of Computing and Data Science Nanyang Technological University Singapore 639798 Singapore The College of Computing and Data Science Nanyang Technological University Singapore The Department of Electrical and Electronic Engineering The University of Hong Kong 999077 Hong Kong The Department of Electrical and Computer Engineering Sungkyunkwan University Suwon 16419 Korea Republic of

Emerging technologies in the sixth generation (6G) of wireless communications, such as terahertz communication and ultra-massive multiple-input multiple-output, present promising prospects. Despite their high data rate potential, millimeter wave (mmWave) communications in urban low altitude economy (LAE) environments are constrained by challenges such as signal attenuation and multipath interference. In particular, in urban environments, mmWave communication experiences significant attenuation due to buildings, owing to its short wavelength, which necessitates innovative approaches to improve the robustness of such communications in LAE networking. In this paper, we explore the use of an unmanned aerial vehicle (UAV)-carried intelligent reflecting surface (IRS) to support low altitude mmWave communication. Specifically, we consider a typical urban low altitude communication scenario where a UAV-carried IRS establishes a line-of-sight (LoS) channel between the mobile users and a source user (SU) despite the presence of obstacles. Subsequently, we formulate an optimization problem aimed at maximizing the transmission rates and minimizing the energy consumption of the UAV by jointly optimizing the phase shifts of the IRS and the UAV trajectory. Given the non-convex nature of the problem and its high dynamics, we propose a deep reinforcement learning-based approach incorporating neural episodic control, long short-term memory, and an IRS phase shift control method to enhance stability and accelerate convergence. Simulation results show that the proposed algorithm effectively resolves the problem and surpasses other benchmark algorithms across various performance metrics. Copyright © 2025, The Authors. All rights reserved.

Keywords: Deep reinforcement learning

Using Depth-Enhanced Spatial Transformation for Student Gaze Target Estimation in Dual-View Classroom Images

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Haonan Miao, Peizheng Zhao, Yuqi Sun, Fang Nan, Xiaolong Zhang, Yaqiang Wu, Feng Tian. Affiliations: School of Computer Science and Technology Xi’an Jiaotong University Xi’an China Ministry of Education Key Laboratory of Intelligent Networks and Network Security Xi’an Jiaotong University Xi’an China School of Advanced Technology Xi’an Jiaotong-Liverpool University Suzhou China Shaanxi Province Key Laboratory of Big Data Knowledge Engineering Xi’an Jiaotong University Xi’an China

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Dual-view gaze target estimation in classroom environments has not been thoroughly explored. Existing methods lack consideration of depth information, primarily focusing on 2D image information and neglecting the latent 3D spatial context, which could lead to suboptimal transformation and cause the gaze cone to intersect with an incorrect object. This paper introduces a novel dual-view gaze target estimation method tailored for classroom settings, leveraging depth-enhanced spatial transformations. By formulating a depth-enhanced 2D space, our method uses depth-enhanced spatial transformation to accurately project students’ gaze cones to the teacher-oriented image. Additionally, we collected a dataset named DVSGE, specifically for student gaze target estimation in dual-view classroom images. Experimental results demonstrate significant performance improvements of 9.8% in AUC and 19.9% in L2-Distance for our method, surpassing existing methods.

Keywords: Three-dimensional displays; Estimation; Focusing; Signal processing; Acoustics; Speech processing

Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition

32nd ACM International Conference on Multimedia, MM 2024

Authors: Jia, Chengyou; Luo, Minnan; Chang, Xiaojun; Dang, Zhuohang; Han, Mingfei; Wang, Mengmeng; Dai, Guang; Dang, Sizhe; Wang, Jingdong. Affiliations: School of Computer Science and Technology MOEKLINNS Lab Xi'an Jiaotong University Shaanxi Xi'an China University of Science and Technology of China Anhui Hefei China School of Computer Science and Technology Xi'an Jiaotong University Shaanxi Xi'an China ReLER Lab AAII University of Technology Sydney Sydney NSW Australia Zhejiang University of Technology College of Computer Science and Technology China SGIT AI Lab State Grid Corporation of China Beijing China Baidu Inc Beijing China United Arab Emirates Shaanxi Province Key Laboratory of Big Data Knowledge Engineering Xi'an Jiaotong University Xi'an 710049 China SGIT AI Lab State Grid Corporation of China China School of Computer Science and Technology Ministry of Education Key Laboratory of Intelligent Networks and Network Security Xi'an Jiaotong University Xi'an 710049 China

ISBN (print): 9798400706868

Exploring open-vocabulary video action recognition is a promising venture, which aims to recognize previously unseen actions within any arbitrary set of categories. Existing methods typically adapt pretrained image-text models to the video domain, capitalizing on their inherent strengths in generalization. A common thread among such methods is the augmentation of visual embeddings with temporal information to improve the recognition of seen actions. Yet, they compromise with standard less-informative action descriptions, thus faltering when confronted with novel actions. Drawing inspiration from human cognitive processes, we argue that augmenting text embeddings with human prior knowledge is pivotal for open-vocabulary video action recognition. To realize this, we innovatively blend video models with Large Language Models (LLMs) to devise Action-conditioned Prompts. Specifically, we harness the knowledge in LLMs to produce a set of descriptive sentences that contain distinctive features for identifying given actions. Building upon this foundation, we further introduce a multi-modal action knowledge alignment mechanism to align concepts in video and textual knowledge encapsulated within the prompts. Extensive experiments on various video benchmarks, including zero-shot, few-shot, and base-to-novel generalization settings, demonstrate that our method not only sets new SOTA performance but also possesses excellent interpretability. © 2024 ACM.

Keywords: Embeddings

Rotation-Adaptive Point Cloud Domain Generalization via Intricate Orientation Learning

arXiv, 2025

Authors: Liu, Bangzhen; Zheng, Chenxi; Xu, Xuemiao; Xu, Cheng; Zhang, Huaidong; He, Shengfeng. Affiliations: South China University of Technology Guangzhou China Guangdong Engineering Center for Large Model and GenAI Technology The State Key Laboratory of Subtropical Building and Urban Science The Ministry of Education Key Laboratory of Big Data and Intelligent Robot The Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information China Singapore Management University Singapore

The vulnerability of 3D point cloud analysis to unpredictable rotations poses an open yet challenging problem: orientation-aware 3D domain generalization. Cross-domain robustness and adaptability of 3D representations are crucial but not easily achieved through rotation augmentation. Motivated by the inherent advantages of intricate orientations in enhancing generalizability, we propose an innovative rotation-adaptive domain generalization framework for 3D point cloud analysis. Our approach aims to alleviate orientational shifts by leveraging intricate samples in an iterative learning process. Specifically, we identify the most challenging rotation for each point cloud and construct an intricate orientation set by optimizing intricate orientations. Subsequently, we employ an orientation-aware contrastive learning framework that incorporates an orientation consistency loss and a margin separation loss, enabling effective learning of categorically discriminative and generalizable features with rotation consistency. Extensive experiments and ablations conducted on 3D cross-domain benchmarks firmly establish the state-of-the-art performance of our proposed approach in the context of orientation-aware 3D domain generalization. © 2025, CC BY-NC-SA.

Keywords: Contrastive Learning

Fast incremental structure from motion based on parallel bundle adjustment

Authors: Cao, Mingwei; Zheng, Liping; Jia, Wei; Liu, Xiaoping. Affiliations: Key Laboratory of Knowledge Engineering with Big Data Ministry of Education Hefei University of Technology Hefei 230009 China Anhui Province Key Laboratory of Industry Safety and Emergency Technology Hefei University of Technology Hefei 230009 China School of Computer Science and Information Engineering Hefei University of Technology Hefei 230009 China

Structure from motion has attracted a lot of research in recent years, with new state-of-the-art approaches appearing almost every year. One of its advantages over other 3D reconstruction approaches is that it can be used with any camera (UAV, depth sensor, light field) and produces relatively accurate point clouds and camera parameters. One of its disadvantages compared to other approaches is that it is computationally expensive. In this paper, we design a novel structure-from-motion framework to reduce the computational cost and implement a parallel bundle adjustment on a GPU device for large-scale optimization. In our framework, local bundle adjustment is added into the architecture of incremental structure from motion; namely, the point clouds and camera parameters are optimized whenever a number of additional images are added. The purpose is not only to improve the quality of the produced point clouds but also to reduce computation time via parallel bundle adjustment. We conduct extensive experiments on several challenging datasets and compare with the state-of-the-art methods. Experimental results show that the proposed method has the best performance in terms of accuracy and efficiency. © 2020, Springer-Verlag GmbH Germany, part of Springer Nature.

Keywords: Cameras

NSPG-Miner: Mining Repetitive Negative Sequential Patterns

arXiv, 2025

Authors: Li, Yan; Wang, Zhulin; Liu, Jing; Guo, Lei; Fournier-Viger, Philippe; Wu, Youxi; Wu, Xindong. Affiliations: School of Economics and Management Hebei University of Technology Tianjin 300401 China School of Artificial Intelligence Hebei University of Technology Tianjin 300401 China State Key Laboratory of Reliability and Intelligence of Electrical Equipment Hebei University of Technology Tianjin 300401 China College of Computer Science and Software Engineering Shenzhen University Shenzhen 518061 China Hebei Key Laboratory of Big Data Computing 300401 China Key Laboratory of Knowledge Engineering with Big Data The Ministry of Education of China Hefei University of Technology Hefei 230009 China

Sequential pattern mining (SPM) with gap constraints (or repetitive SPM or tandem repeat discovery in bioinformatics) can find frequent repetitive subsequences satisfying gap constraints, which are called positive sequential patterns with gap constraints (PSPGs). However, classical SPM with gap constraints cannot find the frequent missing items in the PSPGs. To tackle this issue, this paper explores negative sequential patterns with gap constraints (NSPGs). We propose an efficient NSPG-Miner algorithm that can mine both frequent PSPGs and NSPGs simultaneously. To effectively reduce candidate patterns, we propose a pattern join strategy with negative patterns, which can generate both positive and negative candidate patterns at the same time. To calculate the support (frequency of occurrence) of a pattern in each sequence, we explore a NegPair algorithm that employs a key-value pair array structure to deal with the gap constraints and the negative items simultaneously and can avoid redundant rescanning of the original sequence, thus improving the efficiency of the algorithm. To evaluate the performance of NSPG-Miner, 11 competitive algorithms and 11 datasets are employed. The experimental results not only validate the effectiveness of the strategies adopted by NSPG-Miner, but also verify that NSPG-Miner can discover more valuable information than the state-of-the-art algorithms. Algorithms and datasets can be downloaded from https://***/wuc567/PatternMining/tree/master/NSPG-Miner. Copyright © 2025, The Authors. All rights reserved.
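To make the idea of a gap constraint concrete, here is a minimal sketch that counts occurrences of a positive pattern in one sequence, where consecutive matched items must be separated by between `min_gap` and `max_gap` other items. It is not the NegPair algorithm and does not handle negative items. The function name is hypothetical, and counting every valid position tuple is an assumption, since the paper's support definition may differ.

```python
def count_gap_occurrences(sequence, pattern, min_gap, max_gap):
    """Count position tuples matching `pattern` in `sequence` such that
    consecutive matched positions have min_gap..max_gap items between them."""
    if not pattern:
        return 0
    # ways[i]: number of matches of the current pattern prefix ending at i.
    ways = [1 if item == pattern[0] else 0 for item in sequence]
    for symbol in pattern[1:]:
        extended = [0] * len(sequence)
        for i, item in enumerate(sequence):
            if item != symbol:
                continue
            # Previous position j must satisfy min_gap <= i - j - 1 <= max_gap.
            lo = max(0, i - max_gap - 1)
            hi = i - min_gap - 1
            if hi >= lo:
                extended[i] = sum(ways[lo:hi + 1])
        ways = extended
    return sum(ways)


# Hypothetical example: pattern "ab" in "aabab" with at most one item between.
print(count_gap_occurrences("aabab", "ab", 0, 1))  # 3: positions (0,2), (1,2), (3,4)
```

The single left-to-right pass per pattern item avoids rescanning the sequence for every candidate start position, which is the same general concern the abstract raises about redundant rescanning.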

Keywords: Miners

Generalized Category Discovery with Large Language Models in the Loop

arXiv, 2023

Authors: An, Wenbin; Shi, Wenkai; Tian, Feng; Lin, Haonan; Wang, QianYing; Wu, Yaqiang; Cai, Mingxiang; Wang, Luyan; Chen, Yan; Zhu, Haiping; Chen, Ping. Affiliations: School of Automation Science and Engineering Xi’an Jiaotong University China School of Computer Science and Technology Xi’an Jiaotong University China Ministry of Education Key Laboratory of Intelligent Networks and Network Security China Shaanxi Province Key Laboratory of Big Data Knowledge Engineering China Lenovo Research China University of Massachusetts Boston United States

Generalized Category Discovery (GCD) is a crucial task that aims to recognize both known and novel categories from a set of unlabeled data by utilizing a few labeled data with only known categories. Due to the lack of supervision and category information, current methods usually perform poorly on novel categories and struggle to reveal semantic meanings of the discovered clusters, which limits their applications in the real world. To mitigate the above issues, we propose Loop, an end-to-end active-learning framework that introduces Large Language Models (LLMs) into the training loop, which can boost model performance and generate category names without relying on any human effort. Specifically, we first propose Local Inconsistent Sampling (LIS) to select samples that have a higher probability of falling into wrong clusters, based on neighborhood prediction consistency and entropy of cluster assignment probabilities. Then we propose a Scalable Query strategy to allow LLMs to choose true neighbors of the selected samples from multiple candidate samples. Based on the feedback from LLMs, we perform Refined Neighborhood Contrastive Learning (RNCL) to pull samples and their neighbors closer to learn clustering-friendly representations. Finally, we select representative samples from clusters corresponding to novel categories to allow LLMs to generate category names for them. Extensive experiments on three benchmark datasets show that Loop outperforms SOTA models by a large margin and generates accurate category names for the discovered clusters. Code and data are available at https://***/Lackel/LOOP. Copyright © 2023, The Authors. All rights reserved.
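The two signals this abstract names for sample selection, neighborhood prediction consistency and entropy of cluster assignment probabilities, can be sketched as follows. This is not the paper's LIS procedure: summing the two scores, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a cluster assignment distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def predicted_cluster(probs):
    """Index of the most probable cluster."""
    return max(range(len(probs)), key=probs.__getitem__)


def select_uncertain(assignments, neighbors, top_k):
    """Rank samples by entropy plus the fraction of neighbors whose
    predicted cluster disagrees with the sample's own prediction."""
    scored = []
    for idx, probs in enumerate(assignments):
        own = predicted_cluster(probs)
        nbrs = neighbors[idx]
        disagree = sum(1 for n in nbrs if predicted_cluster(assignments[n]) != own)
        inconsistency = disagree / len(nbrs) if nbrs else 0.0
        scored.append((entropy(probs) + inconsistency, idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Hypothetical toy data: three samples, two clusters, hand-picked neighbors.
assignments = [[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]
neighbors = [[1], [0, 2], [1]]
print(select_uncertain(assignments, neighbors, 2))  # [2, 1]
```

In this toy case sample 2 ranks first because its only neighbor is predicted to be in a different cluster, even though its own assignment is confident; sample 1 follows because of its high entropy.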

Keywords: Semantics
