
Refine Search Results

Document Type

  • 29 papers: Conference
  • 24 papers: Journal
  • 1 paper: Dissertation/Thesis

Holdings

  • 54 papers: Electronic
  • 0 titles: Print holdings

Subject Classification

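  • 50 papers: Engineering
    • 47 papers: Computer Science and Technology...
    • 20 papers: Electrical Engineering
    • 10 papers: Software Engineering
    • 8 papers: Information and Communication Engineering
    • 4 papers: Electronic Science and Technology (may...
    • 3 papers: Control Science and Engineering
  • 8 papers: Science
    • 7 papers: Physics
    • 1 paper: Biology
  • 6 papers: Medicine
    • 6 papers: Clinical Medicine
  • 3 papers: Education
    • 3 papers: Psychology (may award Education...
  • 1 paper: Management
    • 1 paper: Management Science and Engineering (may...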

Topics

  • 54 papers: audio-visual lea...
  • 5 papers: multi-modal lear...
  • 5 papers: visualization
  • 4 papers: task analysis
  • 4 papers: self-supervised ...
  • 4 papers: cross-modal retr...
  • 3 papers: multimodal learn...
  • 3 papers: representation l...
  • 3 papers: deep learning
  • 3 papers: event localizati...
  • 3 papers: sound source loc...
  • 3 papers: contrastive lear...
  • 3 papers: location awarene...
  • 3 papers: action recogniti...
  • 3 papers: feature extracti...
  • 2 papers: spiking neural n...
  • 2 papers: individual diffe...
  • 2 papers: audio-visual cor...
  • 2 papers: transformer
  • 2 papers: zero-shot learni...

Institutions

  • 3 papers: univ tubingen tu...
  • 2 papers: shanghai ai lab ...
  • 2 papers: univ surrey guil...
  • 2 papers: hefei univ techn...
  • 2 papers: beijing inst tec...
  • 1 paper: fudan univ sch c...
  • 1 paper: univ amsterdam
  • 1 paper: baidu inc people...
  • 1 paper: univ paris 05 un...
  • 1 paper: univ geneva fac ...
  • 1 paper: univ las palmas ...
  • 1 paper: univ michigan an...
  • 1 paper: chinese inst bra...
  • 1 paper: beijing univ pos...
  • 1 paper: univ elect sci &...
  • 1 paper: chinese acad sci...
  • 1 paper: czech tech univ ...
  • 1 paper: sichuan univ col...
  • 1 paper: int inst informa...
  • 1 paper: postech dept ele...

Authors

  • 3 papers: koepke a. sophia
  • 3 papers: wang meng
  • 3 papers: mercea otniel-bo...
  • 3 papers: guo dan
  • 3 papers: zhou jinxing
  • 3 papers: akata zeynep
  • 2 papers: wang jing
  • 2 papers: liu miao
  • 2 papers: zeng donghuo
  • 2 papers: kim junsik
  • 2 papers: yin jianqin
  • 2 papers: hummel thomas
  • 2 papers: zhong yiran
  • 2 papers: ikeda kazushi
  • 2 papers: mei xinhao
  • 2 papers: kweon in so
  • 2 papers: xie xiang
  • 2 papers: tian yapeng
  • 2 papers: senocak arda
  • 2 papers: li wenrui

Language

  • 54 papers: English
Search criteria: "Subject term = Audio-Visual Learning"
54 records; showing 31-40
Enhancing Sound Source Localization via False Negative Elimination
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, Vol. 46, No. 12, pp. 10499-10514
Authors: Song, Zengjie; Zhang, Jiangshe; Wang, Yuxi; Fan, Junsong; Zhang, Zhaoxiang
Affiliations: Xi An Jiao Tong Univ Sch Math & Stat Xian 710049 Peoples R China; Chinese Acad Sci Hong Kong Inst Sci & Innovat Ctr Artificial Intelligence & Robot Hong Kong Peoples R China; Chinese Acad Sci Inst Automat New Lab Pattern Recognit State Key Lab Multimodal Artificial Intelligence Beijing 100190 Peoples R China; Univ Chinese Acad Sci Beijing 100049 Peoples R China
Sound source localization aims to localize objects emitting the sound in visual scenes. Recent works obtaining impressive results typically rely on contrastive learning. However, the common practice of randomly sampli...
UAVM: Towards Unifying Audio and Visual Models
IEEE SIGNAL PROCESSING LETTERS, 2022, Vol. 29, pp. 2437-2441
Authors: Gong, Yuan; Liu, Alexander H.; Rouditchenko, Andrew; Glass, James
Affiliation: MIT Comp Sci & Artificial Intelligence Lab Cambridge MA 02139 USA
Conventional audio-visual models have independent audio and video branches. In this work, we unify the audio and visual branches by designing a Unified Audio-Visual Model (UAVM). The UAVM achieves a new state-of-the-a...
Deep Multi-biometric Fusion for Audio-Visual User Re-Identification and Verification
8th International Conference on Pattern Recognition Applications and Methods (ICPRAM)
Authors: Marras, Mirko; Marin-Reyes, Pedro A.; Lorenzo-Navarro, Javier; Castrillon-Santana, Modesto; Fenu, Gianni
Affiliations: Univ Cagliari Dept Math & Comp Sci V Osped 72 I-09124 Cagliari Italy; Univ Las Palmas Gran Canaria Inst Univ Sistemas Inteligentes & Aplicac Numer I Campus Univ Tafira Las Palmas Gran Canaria 35017 Spain
From border controls to personal devices, from online exam proctoring to human-robot interaction, biometric technologies are empowering individuals and organizations with convenient and secure authentication and ident...
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Interspeech Conference
Authors: Liu, Xubo; Huang, Qiushi; Mei, Xinhao; Liu, Haohe; Kong, Qiuqiang; Sun, Jianyuan; Li, Shengchen; Ko, Tom; Zhang, Yu; Tang, Lilian H.; Plumbley, Mark D.; Kilic, Volkan; Wang, Wenwu
Affiliations: Univ Surrey Guildford Surrey England; ByteDance Beijing Peoples R China; Xian Jiaotong Liverpool Univ Xian Peoples R China; Southern Univ Sci & Technol Shenzhen Peoples R China; Izmir Katip Celebi Univ Izmir Turkiye
Audio captioning aims to generate text descriptions of audio clips. In the real world, many objects produce similar sounds. How to accurately recognize ambiguous sounds is a major challenge for audio captioning. In th...
DEEP VIDEO INPAINTING GUIDED BY AUDIO-VISUAL SELF-SUPERVISION
47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Authors: Kim, Kyuyeon; Jung, Junsik; Kim, Woo Jae; Yoon, Sung-Eui
Affiliation: Korea Adv Inst Sci & Technol KAIST Sch Comp Daejeon South Korea
Humans can easily imagine a scene from auditory information based on their prior knowledge of audio-visual events. In this paper, we mimic this innate human ability in deep learning models to improve the quality of vi...
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Chen, Yuanhong; Liu, Yuyuan; Wang, Hu; Liu, Fengbei; Wang, Chong; Frazer, Helen; Carneiro, Gustavo
Affiliations: Univ Adelaide Australian Inst Machine Learning Adelaide SA Australia; St Vincents Hosp Melbourne Melbourne Vic Australia; Univ Surrey Ctr Vis Speech & Signal Proc Guildford Surrey England
Audio-visual segmentation (AVS) is a challenging task that involves accurately segmenting sounding objects based on audio-visual cues. The effectiveness of audio-visual learning critically depends on achieving accurat...
PANO-ECHO: PANOramic depth prediction enhancement with ECHO features
2nd IEEE Conference on Artificial Intelligence (CAI)
Authors: Liu, Xiaohu; Brunetto, Amandine; Hornauer, Sascha; Moutarde, Fabien; Lu, Jialiang
Affiliations: Shanghai Jiao Tong Univ SJTU Paris Elite Inst Technol Shanghai Peoples R China; PSL Univ Ctr Robot MINES Paris Paris France
Panoramic depth estimation is gaining importance as 360-degree images become widely available. However, traditional mono-to-depth approaches, optimized for a limited field of view, show subpar performance when naive...
Integrating Audio-Visual Contexts with Refinement for Segmentation
33rd International Conference on Artificial Neural Networks and Machine Learning (ICANN)
Authors: Geng, Qingwei; Gu, Xiaodong
Affiliation: Fudan Univ Dept Elect Engn Shanghai 200438 Peoples R China
A more fine-grained video spatial localization task, audio-visual segmentation (AVS), has recently been proposed; it aims to generate masks of the sounding objects in the given videos. In this paper, we...
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers
SIGGRAPH Asia Conference
Authors: Sun, Yasheng; Zhou, Hang; Wang, Kaisiyuan; Wu, Qianyi; Hong, Zhibin; Liu, Jingtuo; Ding, Errui; Wang, Jingdong; Liu, Ziwei; Koike, Hideki
Affiliations: Tokyo Inst Technol Tokyo Japan; Baidu Inc Shanghai Peoples R China; Univ Sydney Sydney NSW Australia; Monash Univ Melbourne Vic Australia; Baidu Inc Shenzhen Peoples R China; Baidu Inc Beijing Peoples R China; Nanyang Technol Univ Singapore Singapore
Previous studies have explored generating accurately lip-synced talking faces for arbitrary targets given audio conditions. However, most of them deform or generate the whole facial area, leading to non-realistic resu...
Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors
30th ACM International Conference on Multimedia (MM)
Authors: Hegde, Sindhu B.; Mukhopadhyay, Rudrabha; Namboodiri, Vinay P.; Jawahar, C. V.
Affiliations: Int Inst Informat Technol Hyderabad India; Univ Bath Bath Avon England
In this paper, we explore an interesting question of what can be obtained from an 8 x 8 pixel video sequence. Surprisingly, it turns out to be quite a lot. We show that when we process this 8 x 8 video with the right ...