With the application of damping controllers developed using remote feedback signals provided by phasor measurement units (PMUs) and wide-area measurement systems (WAMS), the wind power generation system (WPGS) can damp t...
Reconstructing an object’s high-quality 3D shape with inherent spectral reflectance property, beyond typical device-dependent RGB albedos, opens the door to applications requiring a high-fidelity 3D model in terms of...
ISBN (digital): 9798331521950
ISBN (print): 9798331521967
A multi-modal emotion recognition method based on a facial multi-scale feature and cross-modal attention (MS-FCA) network is proposed. The MS-FCA model extends the traditional single-branch ViT network into a two-branch ViT architecture in which the classification token of each branch interacts with the image embeddings of the other branch, enabling effective interaction between information at different scales. Audio features are then extracted using a ResNet18 network. A cross-modal attention mechanism computes weight matrices between the features of the different modalities, exploiting inter-modal correlation to fuse visual and audio features for more accurate emotion recognition. Experiments are conducted on two datasets, eNTERFACE'05 and REDVESS. The proposed method achieves accuracies of 85.42% and 83.84% on eNTERFACE'05 and REDVESS, respectively, which demonstrates its effectiveness.
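The abstract does not specify the form of the cross-modal attention, so the following is only a minimal sketch under the assumption that it resembles generic scaled dot-product attention, with queries taken from one modality and keys and values from the other. The function names and the toy feature vectors are hypothetical and are not taken from the MS-FCA paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention across modalities (assumed form).

    queries: feature vectors from one modality (e.g. visual tokens)
    keys, values: feature vectors from the other modality (e.g. audio tokens)
    Returns the fused features and the attention weight matrix.
    """
    d = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights.append(softmax(scores))
    fused = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return fused, weights


# Hypothetical toy features: 2 visual tokens attend over 3 audio tokens.
visual = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = cross_modal_attention(visual, audio, audio)
```

Each row of `weights` is a probability distribution over the audio tokens for one visual token, which is one way to read the "weight matrices between different modal features" mentioned in the abstract.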
Recent successes in the Machine Learning community have led to a steep increase in the number of papers submitted to conferences. This increase made more prominent some of the issues that affect the current review pro...
Zero-shot learning (ZSL) is an important but challenging task in computer vision that aims to identify unseen classes for which no training samples are available. Current state-of-the-art locality-based ZSL methods focus on learning the explicit locality of discriminative attributes, which can suffer from inadequate supervision at the class-attribute level. This paper introduces a novel approach called IAC, which learns Implicit Attribute Composition for ZSL. The method is more comprehensive than attribute localization that relies solely on class-level attribute supervision. IAC uses subspace representations that efficiently capture the inherent structure of high-dimensional image features, and learns implicit attribute composition through subspace representation learning. Extensive experiments on three commonly used ZSL datasets, CUB, SUN, and AwA2, demonstrate the superiority of IAC over state-of-the-art methods.
This paper explores the cubic-regularized Newton method within a federated learning framework while addressing two major concerns: privacy leakage and communication bottlenecks. We propose the Differentially Private F...
In this paper, we focus on the problem of decomposing a global Signal Temporal Logic (STL) formula assigned to a multi-agent system into local STL tasks when the team of agents is a priori decomposed into disjoint sub-team...
Networked Traffic Signal Control (NTSC) is a fundamental component of Intelligent Transportation Systems (ITS) and the broader vision of smart city development. Although many intelligent strategies have been developed, the Sim2Real challenge often impedes their full realization. In response, this paper introduces the Parallel Learning-based Adaptive Network for Traffic Signal Control (PLANT) as a foundation model for NTSC. We employ the Wasserstein GAN with Gradient Penalty (WGAN-GP) to generate a wide range of artificial scenarios for robust PLANT training. Further, the Transformer-based Cooperation Mechanism (TCM) is integrated as the primary learner within PLANT, facilitating effective capture of traffic dynamics and knowledge accumulation. This knowledge can be transferred to real-world applications through careful fine-tuning, equipping PLANT to adapt and evolve with shifting transportation paradigms. Our empirical study on the Hangzhou road network demonstrates PLANT's superiority over both traditional and emerging DRL-based approaches, supporting its viability as a potential foundation model for NTSC.
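For reference, the critic objective of WGAN-GP in its standard formulation is sketched below. The abstract does not state how PLANT configures it, so the symbols here (critic D, real and generated distributions P_r and P_g, penalty weight λ) come from the general WGAN-GP formulation and are assumptions with respect to this paper.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\left[D(\tilde{x})\right]
    - \mathbb{E}_{x \sim P_r}\left[D(x)\right]
    + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\tilde{x},\quad \epsilon \sim U[0, 1]
```

The last term penalizes the critic when its gradient norm at points interpolated between real and generated samples deviates from 1.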
Lung cancer treatment management has always been at the interface of medicine, biology, and physics. Rapid progress is being made in the direction of new high-precision technology developments that emerge toward more ...
ISBN (digital): 9798350385724
ISBN (print): 9798350385731
Recent transformer-based methods for 3D human pose estimation have gained widespread attention and achieved state-of-the-art results. Previous methods have primarily focused on capturing motion patterns of the human body at a single scale or by cascading multiple scales, such as joints, bones, and body parts. However, because these motion patterns are complex, such methods struggle to capture spatial-temporal motion patterns at different scales simultaneously. To address this issue, we propose the Dual-scale Spatial and Temporal transFormer (DSTFormer), which concurrently explores the spatial dependencies and temporal motion patterns of human joints and bones. Additionally, we introduce a Gcn-Spatial Transformer Block (GSTB), which incorporates Graph Convolutional Networks (GCN) into the transformer to better exploit local relationships between adjacent joints or bones together with global information. Extensive experiments on the Human3.6M benchmark dataset show superior results compared with other state-of-the-art methods. Notably, our model achieves the best published performance to date, with P1 errors of 37.9 mm and 15.6 mm, respectively.
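Assuming that "P1" refers to Protocol 1 as commonly used on Human3.6M, that is, the mean per-joint position error (MPJPE) in millimetres, the metric can be sketched as follows. The function name and the two-joint poses are hypothetical and serve only as illustration; they are not data from the paper.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between corresponding 3D joints.

    The result is in the same unit as the inputs (millimetres here).
    """
    if len(predicted) != len(target):
        raise ValueError("pose lists must have the same number of joints")
    distances = [math.dist(p, t) for p, t in zip(predicted, target)]
    return sum(distances) / len(distances)


# Hypothetical two-joint poses in millimetres, for illustration only.
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(pred, gt)  # (0 + 5) / 2 = 2.5
```

In practice the error is averaged over all frames and joints of the test set; this sketch covers a single pose.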