Search Results - Inner Mongolia University Library

Action Status Based Novel Relative Feature Representations for Interaction Recognition

Chinese Journal of Electronics, 2022, Vol. 31, No. 1, pp. 168-180

Authors: LI Yanshan; GUO Tianyu; LIU Xing; LUO Wenhan; XIE Weixin. Affiliations: ATR National Key Laboratory of Defense Technology, Shenzhen University; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University; Tencent

Skeleton-based action recognition has always been an important research topic in computer vision. Most researchers in this field currently pay more attention to actions performed by a single person, while very little work is dedicated to the identification of interactions between two people. However, interaction recognition is more critical in practical applications, considering that actions are often performed by multiple people. How to design an effective scheme to learn discriminative spatial and temporal representations for skeleton-based interaction recognition is still a challenging problem. Focusing on the characteristics of skeleton data for interactions, we first define the moving distance to distinguish the action status of the participants. Then some view-invariant relative features are proposed to fully represent the spatial and temporal relationship of the skeleton sequence. Further, a new coding method is proposed to obtain the novel relative feature representations. Finally, we design a three-stream CNN model to learn deep features for interaction recognition. We evaluate our method on the SBU, NTU RGB+D 60, and NTU RGB+D 120 datasets. The experimental results verify that our method is effective and exhibits great robustness compared with current state-of-the-art methods.
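
The abstract does not give the formula for the moving distance, so the following is only a rough sketch of the idea under stated assumptions: the moving distance is taken to be the summed joint displacement between consecutive frames, the participant who moves more is labelled "active", and pairwise joint distances serve as one simple view-invariant relative feature. The function names and definitions are hypothetical and are not taken from the paper.

```python
import math

def moving_distance(sequence):
    """Sum of per-joint Euclidean displacements between consecutive frames.

    `sequence` is a list of frames; each frame is a list of (x, y, z) joints.
    This definition is an illustrative assumption, not the paper's formula.
    """
    total = 0.0
    for prev, curr in zip(sequence, sequence[1:]):
        for p, c in zip(prev, curr):
            total += math.dist(p, c)
    return total

def action_status(seq_a, seq_b):
    """Label the participant that moves more as 'active', the other 'passive'."""
    d_a, d_b = moving_distance(seq_a), moving_distance(seq_b)
    return ("active", "passive") if d_a >= d_b else ("passive", "active")

def relative_distances(frame_a, frame_b):
    """Pairwise joint distances between two people in one frame.

    Distances do not change when the camera is rotated or translated,
    which is one simple way to obtain a view-invariant feature.
    """
    return [[math.dist(ja, jb) for jb in frame_b] for ja in frame_a]
```

Ordering the two skeletons by status before computing relative features would give the network a consistent input layout regardless of which person the sensor happened to label first.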

Keywords: image representation; relative feature representations; skeleton-based interaction recognition; skeleton sequence; skeleton data; image coding; deep learning (artificial intelligence); image recognition; temporal representations; temporal relationship; view-invariant relative features; spatial relationship; skeleton-based action recognition; action status; computer vision; convolutional neural nets; discriminative spatial representations; image motion analysis

Mixture of Experts for Audio-Visual Learning

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Cheng, Ying; Li, Yang; He, Junjie; Feng, Rui. Affiliations: School of Computer Science, Fudan University, China; Shanghai Key Laboratory of Intelligent Information Processing, China; Shanghai Collaborative Innovation Center of Intelligent Visual Computing, China

With the rapid development of multimedia technology, audio-visual learning has emerged as a promising research topic within the field of multimodal analysis. In this paper, we explore parameter-efficient transfer learning for audio-visual learning and propose the Audio-Visual Mixture of Experts (AVMoE) to inject adapters into pre-trained models flexibly. Specifically, we introduce unimodal and cross-modal adapters as multiple experts to specialize in intra-modal and inter-modal information, respectively, and employ a lightweight router to dynamically allocate the weights of each expert according to the specific demands of each task. Extensive experiments demonstrate that our proposed approach AVMoE achieves superior performance across multiple audio-visual tasks, including AVE, AVVP, AVS, and AVQA. Furthermore, visual-only experimental results also indicate that our approach can tackle challenging scenes where modality information is missing. The source code is available at https://***/yingchengy/AVMOE. © 2024 Neural Information Processing Systems Foundation. All rights reserved.
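
As a minimal sketch of the routing idea described above, and not the AVMoE implementation, the snippet below mixes the outputs of two toy "adapters" with softmax weights produced by a linear router and adds the result to the frozen feature. All names, the adapter functions, and the zero-initialised router are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mixture_of_adapters(x, experts, router_weights):
    """Add a router-weighted sum of expert outputs to the frozen feature x."""
    gate = softmax(linear(router_weights, x))  # one weight per expert
    outputs = [expert(x) for expert in experts]
    mixed = [sum(g * out[i] for g, out in zip(gate, outputs))
             for i in range(len(x))]
    return [xi + mi for xi, mi in zip(x, mixed)], gate

# Toy stand-ins (hypothetical): one expert sees only the audio feature,
# the other injects information from the visual feature.
audio = [0.5, -1.0]
visual = [1.0, 2.0]
unimodal = lambda x: [0.1 * v for v in x]
cross_modal = lambda x: [0.1 * v for v in visual]
router = [[0.0, 0.0], [0.0, 0.0]]  # zero weights give a uniform gate
output, gate = mixture_of_adapters(audio, [unimodal, cross_modal], router)
```

Because only the adapters and the router would be trained in such a setup, the pre-trained backbone stays untouched, which is what makes the transfer parameter-efficient.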

Facial micro-expression recognition based on dual-stream fusion network

Multimedia Tools and Applications, 2025, pp. 1-14

Authors: Sun, Jiacheng; Chen, Changhong. Affiliation: Jiangsu Key Laboratory of Intelligent Information Processing and Communication Technology, Nanjing University of Posts and Telecommunications, No.66 Xin Mofan RD, Nanjing 210003, China

Micro-expressions (MEs) have emerged as a viable strategy for affective estimation due to their high reliability in emotion detection. In recent years, deep learning methods have been successfully applied to the field of micro-expression recognition (MER). However, extracting and learning features from MEs presents challenges due to their brief duration and subtle intensity. To address these challenges, we propose the dual-stream fusion network (DSFNet). Specifically, we design shallow tokens-to-token vision transformers (T2T-ViT) to effectively capture comprehensive spatial position information. We also tune the number of ViT encoders and heads to enhance overall model performance. Additionally, the proposed multiscale convolution block (MCB) and attention mechanism modules (AMM) facilitate the effective extraction of detailed and valuable multiscale features from MEs. By employing various sizes of convolutional kernels and attention mechanisms, our approach captures higher-level image information, thereby improving MER accuracy. Finally, we integrate the information obtained from both branches. Performance evaluations on three mainstream ME datasets (SMIC, CASME II, and SAMM) demonstrate that the proposed framework significantly outperforms other advanced methods in micro-expression classification. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.

Keywords: Deep learning

A Small-sample Trajectory Classification Algorithm via Bayesian Program Learning

17th IEEE International Conference on Signal Processing, ICSP 2024

Authors: Li, Yanshan; Yao, Ruoqiang. Affiliations: Shenzhen University, Institute of Intelligent Information Processing, Shenzhen 518060, China; Shenzhen University, Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen 518060, China

ISBN: (print) 9798350387384

Trajectory classification algorithms are widely used in fields such as behavior recognition, anomaly detection and monitoring, and video content analysis. Traditional trajectory classification algorithms require large amounts of supporting data and continuous model updates to remain effective. To overcome these issues, we take inspiration from the concept learning approach proposed by Brenden et al. We simulate how humans learn rich concepts from a single hand-drawn sample and propose a Bayesian program learning-based trajectory classification algorithm. This algorithm first builds concepts from a set of hand-drawn trajectories and then classifies trajectory data based on these constructed concepts. Experimental results show that our algorithm outperforms traditional single-sample learning algorithms in trajectory classification tasks, including when the dataset contains noisy data. © 2024 IEEE.

Keywords: Contrastive Learning

Nested Linear Array DoA Estimation with Mixed One-bit Quantization

17th IEEE International Conference on Signal Processing, ICSP 2024

Authors: Huang, Jianjun; Huang, Hao; Kang, Li; Ding, Jianbin. Affiliation: Shenzhen University, Guangdong Provincial Key Laboratory of Intelligent Information Processing, Shenzhen, China

ISBN: (print) 9798350387384

Direction of Arrival (DoA) estimation is a key technology in array signal processing. One-bit quantization is a popular method for reducing hardware costs in DoA estimation. However, one-bit quantization introduces significant quantization noise, which can greatly impact the accuracy of DoA estimation. Therefore, we propose a method for DoA estimation based on a nested linear array (NLA) and mixed one-bit quantization. This method combines one-bit quantization and full-precision quantization, aiming to retain the advantages of one-bit quantization while preserving more signal information, thereby maintaining higher estimation accuracy. Additionally, by employing an NLA, the system performance is enhanced and hardware costs are reduced, effectively overcoming the limitations of the uniform linear array (ULA). We derive the covariance matrix for mixed one-bit quantization used in DoA estimation and the calculation method for its Cramér-Rao Bound (CRB). Simulation experiments demonstrate that, compared to quantizing all channels with one bit, this method significantly improves DoA estimation performance and achieves a lower CRB. The proposed method can achieve a good trade-off between hardware costs and estimation accuracy. © 2024 IEEE.
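
To illustrate why a nested array can outperform a ULA with the same number of sensors, here is a small sketch that assumes the standard two-level nested geometry (a dense inner subarray followed by a sparse outer subarray). The abstract does not state which nested configuration the authors use, so the geometry and the function names below are assumptions.

```python
def nested_array_positions(n_inner, n_outer):
    """Sensor positions, in units of the base spacing d, of a two-level nested array."""
    inner = list(range(1, n_inner + 1))
    outer = [(n_inner + 1) * k for k in range(1, n_outer + 1)]
    return inner + outer

def difference_coarray(positions):
    """Sorted set of all pairwise position differences (virtual sensor lags)."""
    return sorted({p - q for p in positions for q in positions})

nla = nested_array_positions(3, 3)   # 6 physical sensors
ula = list(range(1, 7))              # 6 physical sensors
nla_lags = difference_coarray(nla)
ula_lags = difference_coarray(ula)
```

Comparing `len(nla_lags)` with `len(ula_lags)` shows how many more distinct lags the nested layout provides for covariance-based DoA estimation with the same hardware count.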

Keywords: Quantization (signal)

Mixed Quantization of One-bit and Full-precision Signals for Direction of Arrival Estimation

17th IEEE International Conference on Signal Processing, ICSP 2024

Authors: Kang, Li; Ding, Jianbin; Huang, Jianjun; Huang, Hao. Affiliation: Shenzhen University, Guangdong Provincial Key Laboratory of Intelligent Information Processing, Shenzhen, China

ISBN: (print) 9798350387384

Direction of Arrival (DOA) estimation is a crucial technology in array signal processing. One-bit quantization is commonly employed to reduce hardware costs in DOA estimation, yet it introduces significant quantization noise, adversely affecting the accuracy of DOA estimation. To mitigate this issue, we propose a mixed-precision quantization method that integrates one-bit and full-precision quantization. We derive the covariance matrix of signals subjected to mixed-precision quantization and establish the Cramér-Rao Bound (CRB) for this quantization approach. Through experiments, we evaluate the performance of mixed-precision quantization in both DOA estimation and target recognition. The results demonstrate that mixed-precision quantization significantly enhances DOA estimation accuracy compared to one-bit quantization. Additionally, this method improves the target recognition capability of signals. Our approach strikes a better balance between hardware costs and quantization errors, offering a promising solution for practical applications. © 2024 IEEE.
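
The mixed-precision idea can be sketched as follows, assuming that a per-sensor mask decides which channels are one-bit quantized and that one-bit samples are scaled to unit power. Both choices, and the function names, are illustrative assumptions; the abstract does not specify how channels are assigned or scaled, and this is not the authors' derivation of the covariance matrix.

```python
import math

def one_bit(z):
    """Quantize a complex sample to sign(real) + j*sign(imag), scaled to unit power."""
    def sign(v):
        return 1.0 if v >= 0 else -1.0
    return complex(sign(z.real), sign(z.imag)) / math.sqrt(2)

def mixed_quantize(snapshot, one_bit_mask):
    """Apply one-bit quantization only to the sensors flagged in one_bit_mask."""
    return [one_bit(z) if flag else z for z, flag in zip(snapshot, one_bit_mask)]

def sample_covariance(snapshots):
    """Average of the outer products x x^H over all snapshots."""
    n = len(snapshots[0])
    t = len(snapshots)
    return [[sum(x[i] * x[j].conjugate() for x in snapshots) / t
             for j in range(n)] for i in range(n)]
```

The one-bit channels lose amplitude information (their diagonal covariance entries are always 1), while the full-precision channels keep it, which is the trade-off between hardware cost and quantization error that the abstract refers to.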

Keywords: Quantization (signal)

Leveraging Slot-Aware Enhancements for Improved Zero-Shot Dialogue State Tracking

6th International Conference on Frontier Technologies of Information and Computer, ICFTIC 2024

Authors: Song, Tianhang; Hu, Haolei; Chen, Xinxin; Zi, Kangli. Affiliations: Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou, China; Institute of Computing Technology, Key Laboratory of Intelligent Information Processing, Beijing, China

ISBN: (print) 9798331541750

Task-oriented dialogue systems (TOD) aim to help users complete specific tasks through multiple rounds of dialogue, in which Dialogue State Tracking (DST) is a key component. The training of DST models typically necessitates the availability of a substantial corpus of dialogue data with turn-level annotations. However, the collection of such data is both costly and inefficient. Despite the impressive performance of systems based on advanced large language models (LLMs), their closed-source nature restricts the potential for local deployment. To address this challenge, this paper proposes a new zero-shot slot-aware enhanced model (SAE), which introduces a novel coded prompting framework to enhance the DST capability of small parameter models in new domains and task scenarios. In particular, we have devised a prompting framework that employs code to refine task instructions. Furthermore, we have proposed a slot-aware question generation mechanism that generates slot-aware questions based on contextual details in accordance with the dialogue context, thereby enhancing the model's perception and matching capabilities for slots. In order to verify the effectiveness of the model, we conducted experiments on the MultiWOZ dataset. The results show that the SAE model has achieved significant performance improvements in zero-shot DST tasks. © 2024 IEEE.
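
The abstract does not show the actual prompt format, so the sketch below only illustrates what "using code to refine task instructions" and "slot-aware questions" could look like. The class-style template, the question wording, and both function names are invented for illustration and should not be read as the SAE prompt.

```python
def build_code_prompt(domain, slots, dialogue):
    """Render slot definitions as a Python-like class and append the dialogue.

    `slots` maps slot names to short descriptions; `dialogue` is a list of
    (speaker, text) pairs. The layout is a hypothetical template.
    """
    class_name = f"{domain.capitalize()}State"
    fields = "\n".join(f"    {name}: str = None  # {desc}"
                       for name, desc in slots.items())
    turns = "\n".join(f"# {speaker}: {text}" for speaker, text in dialogue)
    return f"class {class_name}:\n{fields}\n\n{turns}\nstate = {class_name}("

def slot_questions(slots):
    """One natural-language question per slot, built from its description."""
    return {name: f"What is the {desc} mentioned by the user?"
            for name, desc in slots.items()}

slots = {"area": "area of the hotel", "stars": "star rating of the hotel"}
dialogue = [("user", "I need a 4 star hotel in the north.")]
prompt = build_code_prompt("hotel", slots, dialogue)
questions = slot_questions(slots)
```

Ending the prompt with an open constructor call nudges a code-trained model to complete the state as keyword arguments, which is straightforward to parse back into slot-value pairs.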

Keywords: Zero-shot learning

Exploiting Multi-Decision and Deep Refinement for Ultrasound Image Segmentation

48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023

Authors: Liu, Wenjing; Li, Xuanya; Hu, Kai; Gao, Xieping. Affiliations: Xiangtan University, Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan 411105, China; Baidu Inc, Beijing 100085, China; Hunan Normal University, Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Changsha 410081, China

ISBN: (print) 9781728163277

In this paper, we propose a novel convolutional neural network (MDR-Net) for ultrasound image segmentation by exploiting multi-decision and deep refinement of the target. Our MDR-Net consists of two main parts, i.e., a multi-decision module (MDM) and a deep refinement module (DRM). Specifically, the MDM effectively addresses the issue of inconspicuous target regions in ultrasound images by combining multi-scale features and multi-receptive field self-attention to enhance the discriminative representation of features and diagnose feature points multiple times. In addition, to alleviate the problem of blurred boundaries and severe speckle noise, the DRM progressively fuses multi-scale features and makes the fused features interact with higher-level features to refine the target details step by step. Finally, we evaluate the proposed method on two publicly available datasets, namely BUSI and UDIAT. We achieve a Dice of 0.8265 and 0.8827 on the two datasets, which are at least 2% and 1.24% higher than other state-of-the-art ultrasound image segmentation methods. © 2023 IEEE.
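
For readers unfamiliar with the reported metric, the Dice score between a predicted mask P and a ground-truth mask T is 2|P ∩ T| / (|P| + |T|). A minimal sketch for binary masks stored as nested lists follows; returning 1.0 when both masks are empty is a convention chosen here, not something stated in the abstract.

```python
def dice(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as nested lists of 0/1."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    intersection = sum(p * t for p, t in zip(flat_p, flat_t))
    size = sum(flat_p) + sum(flat_t)
    # Two empty masks agree perfectly; this convention is an assumption.
    return 1.0 if size == 0 else 2.0 * intersection / size
```

A Dice of 0.8265 therefore means that twice the overlapping area equals about 82.65% of the combined area of prediction and ground truth.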

Keywords: Convolutional neural network; Deep refinement; Multi-decision; Ultrasound image segmentation

From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos

IEEE Transactions on Affective Computing, 2024, Vol. 16, No. 2, pp. 624-638

Authors: Chen, Yin; Li, Jia; Shan, Shiguang; Wang, Meng; Hong, Richang. Affiliations: Hefei University of Technology, School of Computer Science and Information Engineering, Hefei 230601, China; Institute of Computing Technology, Chinese Academy of Sciences, Key Laboratory of Intelligent Information Processing, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations, e.g., insufficient quantity and diversity of pose, occlusion and illumination, as well as the inherent ambiguity of facial expressions. In contrast, static facial expression recognition (SFER) currently shows much higher performance and can benefit from more abundant high-quality training data. Moreover, the appearance features and dynamic dependencies of DFER remain largely unexplored. Recognizing the potential in leveraging SFER knowledge for DFER, we introduce a novel Static-to-Dynamic model (S2D) that leverages existing SFER knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features, thereby significantly improving DFER performance. Firstly, we build and train an image model for SFER, which incorporates a standard Vision Transformer (ViT) and Multi-View Complementary Prompters (MCPs) only. Then, we obtain our video model (i.e., S2D), for DFER, by inserting Temporal-Modeling Adapters (TMAs) into the image model. MCPs enhance facial expression features with landmark-aware features inferred by an off-the-shelf facial landmark detector. And the TMAs capture and model the relationships of dynamic changes in facial expressions, effectively extending the pre-trained image model for videos. Notably, MCPs and TMAs only increase a fraction of trainable parameters (less than +10%) to the original image model. Moreover, we present a novel Emotion-Anchors (i.e., reference samples for each emotion category) based Self-Distillation Loss to reduce the detrimental influence of ambiguous emotion labels, further enhancing our S2D. Experiments conducted on popular SFER and DFER datasets show that we have achieved a new state of the art. © 2010-2012 IEEE.

Keywords: Face recognition

A C-band quad-cluster low-profile broadband circularly polarized antenna array based on sequentially rotating metasurface-based

2024 International Applied Computational Electromagnetics Society Symposium, ACES-China 2024

Authors: Bai, Rongxian; Li, Minquan. Affiliation: Anhui University, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Hefei 230601, China

ISBN: (print) 9798350355581

This paper proposes a left-handed circularly polarized (CP) antenna array for the C-band based on sequentially rotated metasurface (MTS) cells. The array comprises a cluster of 4 × 4 periodically aligned MTS cells, which exhibit excellent performance at a low profile of 0.08λ0. The antenna array is constructed on a Rogers RO4003 dielectric board in two layers. The upper surface of the upper layer is the MTS, the floor has a gap between the upper and lower layers, and the lower surface of the lower layer is the sequential feed network. Simulation in HFSS yielded an integrated bandwidth (IBW) of 95% (3.79-8.54 GHz) and an antenna radiation bandwidth (ARBW) of 49.8% (3.77-6.26 GHz). The dimensions of the antenna are 1.33λ0 × 1.33λ0 × 0.08λ0. This configuration results in a miniaturized, compact, and low-profile antenna array. This antenna configuration is applicable to satellite communications using circularly polarized waves through MTS units. © 2024 IEEE.

Keywords: Circular polarization
