Search criteria: "Subject term = Audio-Visual Learning"
55 records; showing 31-40
VISUALLY GUIDED BINAURAL AUDIO GENERATION WITH CROSS-MODAL CONSISTENCY
49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Authors: Liu, Miao; Wang, Jing; Qian, Xinyuan; Xie, Xiang
Affiliations: Beijing Inst Technol Beijing Peoples R China; Univ Sci & Technol Beijing Beijing Peoples R China
Binaural audio delivers an immersive spatial auditory experience to human listeners, but most existing videos lack binaural audio due to the expertise required for recording environments. Recent studies have been dedi...
Motion Based Audio-Visual Segmentation
25th Interspeech Conference
Authors: Li, Jiahao; Liu, Miao; Yang, Shu; Wang, Jing; Xie, Xiang
Affiliations: Beijing Inst Technol Beijing Peoples R China; Tsinghua Univ Beijing Peoples R China
Recently, a novel task called audio-visual segmentation (AVS) has emerged, focusing on pixel-wise segmentation of sounding objects in videos. This task is particularly challenging as it involves segmenting individual ...
TIM: A Time Interval Machine for Audio-Visual Action Recognition
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Chalk, Jacob; Huh, Jaesung; Kazakos, Evangelos; Zisserman, Andrew; Damen, Dima
Affiliations: Univ Bristol Bristol Avon England; Univ Oxford VGG Oxford England; Czech Tech Univ Prague Czech Republic
Diverse actions give rise to rich audio-visual signals in long videos. Recent works showcase that the two modalities of audio and video exhibit different temporal extents of events and distinct labels. We address the ...
AV-SUPERB: A MULTI-TASK EVALUATION BENCHMARK FOR AUDIO-VISUAL REPRESENTATION MODELS
49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Authors: Tseng, Yuan; Berry, Layne; Chen, Yi-Ting; Chiu, I-Hsiang; Lin, Hsuan-Hao; Liu, Max; Peng, Puyuan; Shih, Yi-Jen; Wang, Hung-Yu; Wu, Haibin; Huang, Po-Yao; Lai, Chun-Mao; Li, Shang-Wen; Harwath, David; Tsao, Yu; Mohamed, Abdelrahman; Feng, Chi-Luen; Lee, Hung-Yi
Affiliations: Natl Taiwan Univ Taipei Taiwan; Univ Texas Austin Austin TX USA; Acad Sinica Taipei Taiwan; Meta AI Toronto ON Canada; Rembrand Palo Alto CA USA
Audio-visual representation learning aims to develop systems with human-like perception by utilizing correlation between auditory and visual information. However, current models often focus on a limited set of tasks, ...
LEARNING SOUND LOCALIZATION BETTER FROM SEMANTICALLY SIMILAR SAMPLES
47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Authors: Senocak, Arda; Ryu, Hyeonggon; Kim, Junsik; Kweon, In So
Affiliations: Korea Adv Inst Sci & Technol Daejeon South Korea; Harvard Univ Cambridge MA 02138 USA
The objective of this work is to localize the sound sources in visual scenes. Existing audio-visual works employ contrastive learning by assigning corresponding audio-visual pairs from the same source as positives whi...
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Authors: Mahmud, Tanvir; Tian, Yapeng; Marculescu, Diana
Affiliations: Univ Texas Austin Austin TX 78712 USA; Univ Texas Dallas Dallas TX 75080 USA
Visual sound source localization poses a significant challenge in identifying the semantic region of each sounding source within a video. Existing self-supervised and weakly supervised source localization methods stru...
FOLEYGEN: VISUALLY-GUIDED AUDIO GENERATION
34th International Workshop on Machine Learning for Signal Processing
Authors: Mei, Xinhao; Nagaraj, Varun; Le Lant, Gael; Ni, Zhaoheng; Chang, Ernie; Shi, Yangyang; Chandrakumar, Vikas
Affiliations: Meta Menlo Pk CA 94025 USA; Univ Surrey Guildford Surrey England
Recent advancements in audio generation tasks, such as text-to-audio and text-to-music generation, have been spurred by the evolution of deep learning models and large-scale datasets. However, the task of video-to-aud...
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, Vol. 43, No. 5, pp. 1605-1619
Authors: Senocak, Arda; Oh, Tae-Hyun; Kim, Junsik; Yang, Ming-Hsuan; Kweon, In So
Affiliations: Korea Adv Inst Sci & Technol Sch Elect Engn Daejeon 34141 South Korea; POSTECH Dept Elect Engn Pohang 37673 South Korea; Univ Calif Dept Elect Engn & Comp Sci Merced CA 95343 USA
Visual events are usually accompanied by sounds in our daily lives. However, can the machines learn to correlate the visual scene and sound, as well as localize the sound source only by observing them like humans? To ...
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Interspeech Conference
Authors: Liu, Xubo; Huang, Qiushi; Mei, Xinhao; Liu, Haohe; Kong, Qiuqiang; Sun, Jianyuan; Li, Shengchen; Ko, Tom; Zhang, Yu; Tang, Lilian H.; Plumbley, Mark D.; Kilic, Volkan; Wang, Wenwu
Affiliations: Univ Surrey Guildford Surrey England; ByteDance Beijing Peoples R China; Xian Jiaotong Liverpool Univ Xian Peoples R China; Southern Univ Sci & Technol Shenzhen Peoples R China; Izmir Katip Celebi Univ Izmir Turkiye
Audio captioning aims to generate text descriptions of audio clips. In the real world, many objects produce similar sounds. How to accurately recognize ambiguous sounds is a major challenge for audio captioning. In th...
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
36th Annual ACM Symposium on User Interface Software and Technology (UIST)
Authors: Ning, Zheng; Zhang, Zheng; Xu, Chenliang; Tian, Yapeng; Li, Toby Jia-Jun
Affiliations: Univ Notre Dame Notre Dame IN 46556 USA; Univ Rochester Rochester NY USA; Univ Texas Dallas Richardson TX 75083 USA
Audio-visual learning seeks to enhance the computer's multi-modal perception leveraging the correlation between the auditory and visual modalities. Despite their many useful downstream tasks, such as video retrieval, A...