In this paper, we constructed a speech synthesis corpus for the Kham dialect. We also designed SAMP-Kham, a machine-readable phonetic labeling scheme for the Kham dialect, and proposed a framework for Kham dialect speech synthesis...
ISBN: (print) 9781665428132
Facial expression recognition (FER) has received increasing interest in computer vision. We propose the TransFER model, which can learn rich relation-aware local representations. It consists of three main components: Multi-Attention Dropping (MAD), ViT-FER, and Multi-head Self-Attention Dropping (MSAD). First, local patches play an important role in distinguishing various expressions; however, few existing works can locate discriminative and diverse local patches. This can cause serious problems when some patches are invisible due to pose variations or viewpoint changes. To address this issue, MAD is proposed to randomly drop an attention map. Consequently, models are pushed to explore diverse local patches adaptively. Second, to build rich relations between different local patches, Vision Transformers (ViT) are used in FER, an approach called ViT-FER. Since the global scope is used to reinforce each local patch, a better representation is obtained to boost FER performance. Third, multi-head self-attention allows ViT to jointly attend to features from different information subspaces at different positions. Given no explicit guidance, however, multiple self-attentions may extract similar relations. To address this, MSAD is proposed to randomly drop one self-attention module. As a result, models are forced to learn rich relations among diverse local patches. Our proposed TransFER model outperforms the state-of-the-art methods on several FER benchmarks, showing its effectiveness and usefulness.
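The "randomly drop an attention map" idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `drop_one_attention_map`, the `drop_prob` parameter, and the choice to zero out the selected map are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import random


def drop_one_attention_map(attention_maps, drop_prob=0.5, rng=random):
    """Zero out at most one attention map, chosen uniformly at random.

    attention_maps: list of 2-D maps (lists of lists of floats).
    drop_prob: hypothetical probability that any map is dropped at all.
    Returns (new_maps, dropped_index); dropped_index is None when
    nothing was dropped. The input is never modified in place.
    """
    # Copy every map so the caller's data stays untouched.
    copies = [[row[:] for row in m] for m in attention_maps]
    if not copies or rng.random() >= drop_prob:
        return copies, None

    # Pick one map and replace it with zeros (assumed dropping rule).
    idx = rng.randrange(len(copies))
    copies[idx] = [[0.0] * len(row) for row in copies[idx]]
    return copies, idx


if __name__ == "__main__":
    maps = [[[0.1, 0.9]], [[0.5, 0.5]], [[0.7, 0.3]]]
    dropped, index = drop_one_attention_map(maps, drop_prob=1.0)
    print("dropped map index:", index)
    print(dropped)
```

In a training loop such a step would only be applied during training, so that the remaining maps are pushed to cover the regions the dropped one attended to; at inference time all maps would be kept.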
This paper presents a speech synthesis system based on a deep neural network (DNN) model for Yi, a minority language in China. The system is composed of relatively complete text analysis of Yi, m...
Facial Expression Recognition (FER) in the wild is an extremely challenging task. Recently, some Vision Transformers (ViT) have been explored for FER, but most of them perform worse than Convolutional Neur...
The propeller is one of the main sources of vibration and cabin noise on a ship. This study used CFD numerical simulation to analyze the characteristics of the pressure fluctuation induced by the propeller of a new generation...
Internal solitary wave (ISW), as a typical marine dynamic process in the deep sea, widely exists in oceans and marginal seas *** interaction between ISW and the seafloor mainly occurs in the bottom boundary *** the seabed boundary layer of the deep sea, ISW is the most important dynamic *** study analyzed the current status, hotspots, and frontiers of research on the interaction between ISW and the seafloor by *** on the action of ISW on the seabed, such as transformation and reaction, a large amount of research work and results were systematically analyzed and *** this basis, this study analyzed the wave-wave interaction and interaction between ISW and the bedform or slope of the seabed, which provided a new perspective for an in-depth understanding of the interaction between ISW and the ***, the latest research results of the bottom boundary layer and marine engineering stability by ISW were introduced, and the unresolved problems in the current research work were *** study provides a valuable reference for further research on the hazards of ISW to marine engineering geology.
The availability of handy multi-modal (i.e., RGB-D) sensors has brought about a surge of face anti-spoofing research. However, the current multi-modal face presentation attack detection (PAD) has two defects: (1) The ...
The ability to predict city-wide parking availability is crucial for the successful development of Parking Guidance and Information (PGI) systems. Indeed, the effective prediction of city-wide parking availability can...
Machine learning has been highly successful in data-intensive applications, but is often hampered when the data set is small. Recently, Few-Shot Learning (FSL) has been proposed to tackle this problem. Using prior knowledge...
Clustering is an important branch of unsupervised tasks, aiming at mining deeper relationships and patterns in data. The quality of feature representation based on image datasets often determines the upper limit of cl...