Search Results - Inner Mongolia University Library

TransFER: Learning Relation-aware Facial Expression Representations with Transformers

International Conference on Computer Vision (ICCV)

Authors: Fanglei Xue, Qiangchang Wang, Guodong Guo. Affiliations: University of Chinese Academy of Sciences, Beijing, China; Chinese Academy of Sciences, Beijing, China; West Virginia University, Morgantown, USA; Institute of Deep Learning, Beijing, China; National Engineering Laboratory for Deep Learning Technology and Application, Beijing, China

ISBN: (print) 9781665428132

Facial expression recognition (FER) has received increasing interest in computer vision. We propose the TransFER model, which can learn rich relation-aware local representations. It mainly consists of three components: Multi-Attention Dropping (MAD), ViT-FER, and Multi-head Self-Attention Dropping (MSAD). First, local patches play an important role in distinguishing various expressions; however, few existing works can locate discriminative and diverse local patches. This can cause serious problems when some patches are invisible due to pose variations or viewpoint changes. To address this issue, the MAD is proposed to randomly drop an attention map. Consequently, models are pushed to explore diverse local patches adaptively. Second, to build rich relations between different local patches, the Vision Transformers (ViT) are used in FER, called ViT-FER. Since the global scope is used to reinforce each local patch, a better representation is obtained to boost the FER performance. Third, the multi-head self-attention allows ViT to jointly attend to features from different information subspaces at different positions. Given no explicit guidance, however, multiple self-attentions may extract similar relations. To address this, the MSAD is proposed to randomly drop one self-attention module. As a result, models are forced to learn rich relations among diverse local patches. Our proposed TransFER model outperforms the state-of-the-art methods on several FER benchmarks, showing its effectiveness and usefulness.

Keywords: Computer vision; Adaptation models; Face recognition; Computational modeling; Transfer learning; Computer architecture; Benchmark testing
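
The abstract of this record describes Multi-Attention Dropping (MAD) as randomly dropping one attention map so that the model has to rely on other local patches. The following is only a minimal sketch of that idea, not the authors' implementation: the function name `multi_attention_drop`, the `drop_prob` parameter, and the choice to zero out the selected map are all assumptions made for illustration.

```python
import random


def multi_attention_drop(attention_maps, drop_prob=0.5, rng=None):
    """Return (maps, dropped_index) with at most one attention map zeroed.

    attention_maps: list of 2D lists, one map per branch (hypothetical layout).
    drop_prob: assumed probability that a drop happens at all.
    """
    rng = rng or random.Random()
    # Copy the maps so the caller's data is left untouched.
    result = [[row[:] for row in amap] for amap in attention_maps]
    dropped_index = None
    if attention_maps and rng.random() < drop_prob:
        # Pick one branch uniformly at random and zero its whole map.
        dropped_index = rng.randrange(len(attention_maps))
        result[dropped_index] = [[0.0] * len(row) for row in result[dropped_index]]
    return result, dropped_index


maps = [
    [[0.9, 0.1], [0.2, 0.3]],
    [[0.1, 0.8], [0.4, 0.2]],
    [[0.3, 0.3], [0.7, 0.6]],
]
dropped_maps, dropped_index = multi_attention_drop(
    maps, drop_prob=1.0, rng=random.Random(0)
)
```

With `drop_prob=1.0` exactly one of the three maps is zeroed; in a training loop the drop would presumably be applied only during training and disabled at inference time.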

The speech synthesis of Yi language based on DNN

2019 International Joint Conference on Information, Media, and Engineering, IJCIME 2019

Authors: Bu, Xiaolong; Yang, Hongwu; Zhang, Weizhao. Affiliations: College of Physics and Electronic Engineering, Eng. Research Center of Gansu Province for Intelligent Information Technology and Application, Northwest Normal University, Lanzhou, China; School of Educational Technology, National and Provincial Joint Engineering Laboratory of Learning Analysis Technology in Online Education, Northwest Normal University, Lanzhou, China

ISBN: (print) 9781728155869

This paper presents a speech synthesis system based on a Deep Neural Network (DNN) model for the Yi language, a minority language in China. The system comprises relatively complete Yi text analysis, model training, and a speech synthesis module. In the front end in particular, word segmentation, pause handling, word-to-phoneme conversion, and label processing are used to analyze Yi text. We designed the question set for the decision tree used in DNN model training and used the WORLD vocoder for synthesis. The system achieves a relatively good Mean Opinion Score (MOS) of 3.93, as rated by Yi undergraduates, compared with a MOS of 4.58 for the original speech. To investigate the factors affecting the quality of synthesized Yi speech, this paper also objectively evaluates the performance of different training sets and DNN models. The system successfully synthesized Yi speech for the first time, and the synthesized speech is relatively good, as the result of the only complete minority language speech synthesis system. © 2019 IEEE.

Keywords: deep neural networks

Vision Transformer with Attentive Pooling for Robust Facial Expression Recognition

arXiv, 2022

Authors: Xue, Fanglei; Wang, Qiangchang; Tan, Zichang; Ma, Zhongsong; Guo, Guodong. Affiliations: University of Chinese Academy of Sciences; The Key Laboratory of Space Utilization Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing, China; West Virginia University, Morgantown, United States; Institute of Deep Learning, Baidu Research; National Engineering Laboratory for Deep Learning Technology and Application, Beijing, China

Facial Expression Recognition (FER) in the wild is an extremely challenging task. Recently, some Vision Transformers (ViT) have been explored for FER, but most of them perform inferiorly compared to Convolutional Neural Networks (CNN). This is mainly because the newly proposed modules are difficult to converge well from scratch due to lacking inductive bias and tend to focus on occluded and noisy areas. TransFER, a representative transformer-based method for FER, alleviates this with multi-branch attention dropping but brings excessive computations. On the contrary, we present two attentive pooling (AP) modules to pool noisy features directly. The AP modules include Attentive Patch Pooling (APP) and Attentive Token Pooling (ATP). They aim to guide the model to emphasize the most discriminative features while reducing the impacts of less relevant features. The proposed APP is employed to select the most informative patches on CNN features, and ATP discards unimportant tokens in ViT. Being simple to implement and without learnable parameters, the APP and ATP intuitively reduce the computational cost while boosting the performance by ONLY pursuing the most discriminative features. Qualitative results demonstrate the motivations and effectiveness of our attentive poolings. Besides, quantitative results on six in-the-wild datasets outperform other state-of-the-art methods. © 2022, CC BY.

Keywords: Convolutional neural networks
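
The abstract of this record says that Attentive Token Pooling (ATP) discards unimportant tokens in ViT and has no learnable parameters. The following is a rough sketch of parameter-free token selection under stated assumptions, not the paper's actual procedure: the function name `attentive_token_pooling`, the `keep_ratio` parameter, and the idea that each token already comes with an importance score (for example an attention weight) are hypothetical.

```python
import math


def attentive_token_pooling(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order.

    tokens: list of token representations (any objects).
    scores: one importance score per token (assumed to be given).
    keep_ratio: assumed fraction of tokens to keep, in (0, 1].
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return [], []
    keep = max(1, math.ceil(len(tokens) * keep_ratio))
    # Rank token indices by score, highest first; ties keep the earlier token.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_indices = sorted(ranked[:keep])
    return [tokens[i] for i in kept_indices], kept_indices


tokens = ["t0", "t1", "t2", "t3"]
scores = [0.05, 0.40, 0.15, 0.40]
kept_tokens, kept_indices = attentive_token_pooling(tokens, scores, keep_ratio=0.5)
```

Because the selection is a plain sort on existing scores, it adds no trainable weights, and later layers process fewer tokens, which is consistent with the reduced computational cost the abstract mentions.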

Numerical calculation on propeller-induced pressure fluctuation of a river-to-sea ship

27th International Ocean and Polar Engineering Conference, ISOPE 2017

Authors: Wu, Weiguo; Wei, Ji Ezheng; Lin, Yongshui; Qin, Shengjie; Yang, Tao. Affiliations: Key Laboratory of High Performance Ship Technology of Ministry of Education, Department of Structural Engineering, School of Transportation, Wuhan University of Technology, Wuhan, Hubei, China; Department of Mechanics and Engineering Structure, Hubei Key Laboratory of Theory and Application of Advanced Materials Mechanics, Wuhan University of Technology, Wuhan, Hubei, China; Department of Technology, National Deep Sea Center, Qingdao, Shandong, China

ISBN: (print) 9781880653975

The propeller is one of the main sources of vibration and cabin noise on a ship. This study used CFD numerical simulation to analyze the characteristics of the pressure fluctuation induced by the propeller of a new generation of river-to-sea ship beyond rules. The pressure fluctuation in the D × D area above the propeller and the effects of ship speed and propeller rotational speed are calculated and analyzed. The results show that the maximum pressure fluctuation in the lengthwise direction occurs approximately 0.1D in front of the propeller, and its magnitude is determined by the amplitude at the blade frequency. Centered on the propeller, the amplitude in front of the propeller is greater than that behind it, and the amplitude on the outside of the propeller is greater than that on the inside. It is also found that propeller rotational speed has a great influence on pressure fluctuation, while the effect of ship speed is small. This study provides a numerical computation method for the pressure fluctuation induced by the propeller of a new generation of river-to-sea ship beyond rules, and is of great significance for designs that reduce vibration and noise. Copyright © 2017 by the International Society of Offshore and Polar Engineers (ISOPE).

Keywords: Rivers

Interaction between internal solitary waves and the seafloor in the deep sea

Deep Underground Science and Engineering, 2024, Vol. 3, No. 2, pp. 149-162

Authors: Zhuangcai Tian, Jinjian Huang, Jiaming Xiang, Shaotong Zhang, Jinran Wu, Xiaolei Liu, Tingting Luo, Jianhua Yue. Affiliations: State Key Laboratory of Intelligent Construction and Healthy Operation and Maintenance of Deep Underground Engineering, China University of Mining and Technology, Xuzhou, China; Research Center for Deep Ocean Science and Underwater Engineering, China University of Mining and Technology, Xuzhou, China; Frontiers Science Center for Deep Ocean Multispheres and Earth System, Key Lab of Submarine Geosciences and Prospecting Techniques, MOE, College of Marine Geosciences, Ocean University of China, Qingdao, China; Institute for Learning Sciences & Teacher Education, Australian Catholic University, Brisbane, Queensland, Australia; Shandong Provincial Key Laboratory of Marine Environment and Geological Engineering, Ocean University of China, Qingdao, China; Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore, Singapore; School of Resources and Geosciences, China University of Mining and Technology, Xuzhou, China

Internal solitary wave (ISW), as a typical marine dynamic process in the deep sea, widely exists in oceans and marginal seas *** interaction between ISW and the seafloor mainly occurs in the bottom boundary *** the seabed boundary layer of the deep sea, ISW is the most important dynamic *** study analyzed the current status, hotspots, and frontiers of research on the interaction between ISW and the seafloor by *** on the action of ISW on the seabed, such as transformation and reaction, a large amount of research work and results were systematically analyzed and *** this basis, this study analyzed the wave–wave interaction and interaction between ISW and the bedform or slope of the seabed, which provided a new perspective for an in-depth understanding of the interaction between ISW and the ***, the latest research results of the bottom boundary layer and marine engineering stability by ISW were introduced, and the unresolved problems in the current research work were *** study provides a valuable reference for further research on the hazards of ISW to marine engineering geology.

Keywords: bottom boundary layer interaction internal solitary wave seafloor sediment

FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing

arXiv, 2023

Authors: Liu, Ajian; Tan, Zichang; Yu, Zitong; Zhao, Chenxu; Wan, Jun; Liang, Yanyan; Lei, Zhen; Zhang, Du; Li, Stan Z.; Guo, Guodong. Affiliations: Beijing, China; The Institute of Deep Learning, Baidu Research; National Engineering Laboratory for Deep Learning Technology and Application, China; The Great Bay University, Dongguan 523000, China; The Mininglamp Academy of Sciences, Mininglamp Technology, China; The Macau University of Science and Technology, China; The Westlake University, China

The availability of handy multi-modal (i.e., RGB-D) sensors has brought about a surge of face anti-spoofing research. However, the current multi-modal face presentation attack detection (PAD) has two defects: (1) The framework based on multi-modal fusion requires providing modalities consistent with the training input, which seriously limits the deployment scenario. (2) The performance of ConvNet-based model on high fidelity datasets is increasingly limited. In this work, we present a pure transformer-based framework, dubbed the Flexible Modal Vision Transformer (FM-ViT), for face anti-spoofing to flexibly target any single-modal (i.e., RGB) attack scenarios with the help of available multi-modal data. Specifically, FM-ViT retains a specific branch for each modality to capture different modal information and introduces the Cross-Modal Transformer Block (CMTB), which consists of two cascaded attentions named Multi-headed Mutual-Attention (MMA) and Fusion-Attention (MFA) to guide each modal branch to mine potential features from informative patch tokens, and to learn modality-agnostic liveness features by enriching the modal information of own CLS token, respectively. Experiments demonstrate that the single model trained based on FM-ViT can not only flexibly evaluate different modal samples, but also outperforms existing single-modal frameworks by a large margin, and approaches the multi-modal frameworks introduced with smaller FLOPs and model parameters. Copyright © 2023, The Authors. All rights reserved.

Keywords: Modal analysis

Semi-supervised hierarchical recurrent graph neural network for city-wide parking availability prediction

arXiv, 2019

Authors: Zhang, Weijia; Liu, Hao; Liu, Yanchi; Zhou, Jingbo; Xiong, Hui. Affiliations: University of Science and Technology of China, Hefei, China; Business Intelligence Lab, Baidu Research; National Engineering Laboratory of Deep Learning Technology and Application, Beijing, China; Rutgers University, United States

The ability to predict city-wide parking availability is crucial for the successful development of Parking Guidance and Information (PGI) systems. Indeed, the effective prediction of city-wide parking availability can improve parking efficiency, help urban planning, and ultimately alleviate city congestion. However, predicting city-wide parking availability is a non-trivial task because of three major challenges: 1) the non-Euclidean spatial autocorrelation among parking lots, 2) the dynamic temporal autocorrelation inside of and between parking lots, and 3) the scarcity of information about real-time parking availability obtained from real-time sensors (e.g., camera, ultrasonic sensor, and GPS). To this end, we propose Semi-supervised Hierarchical Recurrent Graph Neural Network (SHARE) for predicting city-wide parking availability. Specifically, we first propose a hierarchical graph convolution structure to model non-Euclidean spatial autocorrelation among parking lots. Along this line, a contextual graph convolution block and a soft clustering graph convolution block are respectively proposed to capture local and global spatial dependencies between parking lots. Additionally, we adopt a recurrent neural network to incorporate dynamic temporal dependencies of parking lots. Moreover, we propose a parking availability approximation module to estimate missing real-time parking availabilities from both the spatial and temporal domains. Finally, experiments on two real-world datasets demonstrate that the prediction performance of SHARE outperforms seven state-of-the-art baselines. Copyright © 2019, The Authors. All rights reserved.

Keywords: Forecasting

Generalizing from a few examples: A survey on few-shot learning

arXiv, 2019

Authors: WANG, YAQING; YAO, QUANMING; KWOK, JAMES T.; NI, LIONEL M. Affiliations: Department of Computer Science and Engineering, Hong Kong University of Science and Technology; Business Intelligence Lab, National Engineering Laboratory of Deep Learning Technology and Application, Baidu Research; 4Paradigm Inc.

Machine learning has been highly successful in data-intensive applications, but is often hampered when the data set is small. Recently, Few-Shot Learning (FSL) has been proposed to tackle this problem. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information. In this paper, we conduct a thorough survey to fully understand FSL. Starting from a formal definition of FSL, we distinguish FSL from several relevant machine learning problems. We then point out that the core issue in FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge can be used to handle this core issue, we categorize FSL methods from three perspectives: (i) data, which uses prior knowledge to augment the supervised experience; (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space. With this taxonomy, we review and discuss the pros and cons of each category. Promising directions, in the aspects of the FSL problem setups, techniques, applications and theories, are also proposed to provide insights for future research. Copyright © 2019, The Authors. All rights reserved.

Keywords: Surveys
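
The abstract of this record states that the core issue in FSL is an unreliable empirical risk minimizer. One standard way to make that statement concrete is the usual error decomposition from statistical learning theory, sketched below. The notation is an assumption introduced here and does not appear in the abstract: R is the expected risk, R_I the empirical risk on I training samples, and H the hypothesis space.

```latex
\hat{h} = \arg\min_{h} R(h), \qquad
h^{*} = \arg\min_{h \in \mathcal{H}} R(h), \qquad
h_{I} = \arg\min_{h \in \mathcal{H}} R_{I}(h)

\mathbb{E}\left[ R(h_{I}) - R(\hat{h}) \right]
  = \underbrace{\mathbb{E}\left[ R(h^{*}) - R(\hat{h}) \right]}_{\text{approximation error}}
  + \underbrace{\mathbb{E}\left[ R(h_{I}) - R(h^{*}) \right]}_{\text{estimation error}}
```

When I is small, the estimation error term can be large. Under this reading, the three perspectives in the abstract correspond to adding samples (data), shrinking H (model), or changing how h_I is searched for within H (algorithm).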

Research on Deep Clustering Based on Image Data

13th International Conference on Computer Engineering and Networks, CENet 2023

Authors: Li, Xuanyu; Yang, Houqun; Zhang, Xiaoying; Yang, Dangui; Huang, Jianqiang; Gan, Lin. Affiliations: College of Computer Science and Technology, Hainan University, Hainan, Haikou 570228, China; Haikou Key Laboratory of Deep Learning and Big Data Application Technology, Hainan University, Hainan, Haikou 570228, China; Hainan, Haikou, China; Elmore Family School of Computer and Electrical Engineering, Purdue University, West Lafayette, United States

ISBN: (print) 9789819992386

Clustering is an important branch of unsupervised tasks, aiming at mining deeper relationships and patterns in data. The quality of feature representation based on image datasets often determines the upper limit of clustering tasks. In recent years, applying deep learning in the representation learning module of deep clustering has yielded better feature representations, which gradually overcomes the limitations of traditional shallow clustering when facing high-dimensional unstructured data. Deep clustering has become a popular research direction in the unsupervised field in recent years, and high clustering performance has been obtained as the research continues to deepen. The existing deep clustering research is mainly oriented to various fields of artificial intelligence, including natural language processing (NLP), speech recognition (ASR), computer vision (CV), etc. We take the field of computer vision as the entry point, analyze and study the research progress of deep clustering in computer vision (CV), and deconstruct the deep clustering model into two parts: the feature extraction module is classified in detail from the perspective of network model architecture, while the clustering module is divided according to the data space of clustering, and we discuss deep clustering on this basis. Finally, we analyze the datasets and evaluation metrics commonly used in experiments to study deep clustering architectures in computer vision. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd 2024.

Keywords: Feature extraction