Refine search results

Document type

  • 29 conference papers
  • 24 journal articles
  • 1 thesis

Collection scope

  • 54 electronic documents
  • 0 print holdings

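Subject classification

  • 47 records: Engineering
    • 44 records: Computer Science and Technology...
    • 17 records: Electrical Engineering
    • 7 records: Software Engineering
    • 5 records: Information and Communication Engineering
    • 1 record: Electronic Science and Technology (may...
  • 8 records: Science
    • 7 records: Physics
    • 1 record: Biology
  • 6 records: Medicine
    • 6 records: Clinical Medicine
  • 3 records: Education
    • 3 records: Psychology (may confer Education...
  • 1 record: Management
    • 1 record: Management Science and Engineering (may...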

Topics

  • 54 records: audio-visual lea...
  • 5 records: multi-modal lear...
  • 5 records: visualization
  • 4 records: task analysis
  • 4 records: self-supervised ...
  • 4 records: cross-modal retr...
  • 3 records: multimodal learn...
  • 3 records: representation l...
  • 3 records: deep learning
  • 3 records: event localizati...
  • 3 records: sound source loc...
  • 3 records: contrastive lear...
  • 3 records: location awarene...
  • 3 records: action recogniti...
  • 3 records: feature extracti...
  • 2 records: spiking neural n...
  • 2 records: individual diffe...
  • 2 records: audio-visual cor...
  • 2 records: transformer
  • 2 records: zero-shot learni...

Institutions

  • 3 records: univ tubingen tu...
  • 2 records: shanghai ai lab ...
  • 2 records: univ surrey guil...
  • 2 records: hefei univ techn...
  • 2 records: beijing inst tec...
  • 1 record: fudan univ sch c...
  • 1 record: univ amsterdam
  • 1 record: baidu inc people...
  • 1 record: univ paris 05 un...
  • 1 record: univ geneva fac ...
  • 1 record: univ las palmas ...
  • 1 record: univ michigan an...
  • 1 record: chinese inst bra...
  • 1 record: beijing univ pos...
  • 1 record: univ elect sci &...
  • 1 record: chinese acad sci...
  • 1 record: czech tech univ ...
  • 1 record: sichuan univ col...
  • 1 record: int inst informa...
  • 1 record: postech dept ele...

Authors

  • 3 records: koepke a. sophia
  • 3 records: wang meng
  • 3 records: mercea otniel-bo...
  • 3 records: guo dan
  • 3 records: zhou jinxing
  • 3 records: akata zeynep
  • 2 records: wang jing
  • 2 records: liu miao
  • 2 records: zeng donghuo
  • 2 records: kim junsik
  • 2 records: yin jianqin
  • 2 records: hummel thomas
  • 2 records: zhong yiran
  • 2 records: ikeda kazushi
  • 2 records: mei xinhao
  • 2 records: kweon in so
  • 2 records: xie xiang
  • 2 records: tian yapeng
  • 2 records: senocak arda
  • 2 records: li wenrui

Language

  • 54 records: English
Search criteria: "Subject term = Audio-Visual Learning"
54 records; showing 1-10

An IoT-enhanced automatic music composition system integrating audio-visual learning with transformer and SketchVAE
ALEXANDRIA ENGINEERING JOURNAL, 2025, vol. 113, pp. 378-390
Authors: Zhang, Yifei
Affiliations: Shanghai Conservatory Mus Dept Composit & Conducting Shanghai 200031 Peoples R China
With the rapid development of artificial intelligence and the Internet of Things technology, the automatic music composition system has become a hot topic of research. This paper presents the TransVAE-Music compositio...

Metric Learning with Progressive Self-Distillation for Audio-Visual Embedding Learning
2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Authors: Zeng, Donghuo; Ikeda, Kazushi
Affiliations: KDDI Research Inc. Saitama Japan
Metric learning projects samples into an embedded space, where similarities and dissimilarities are quantified based on their learned representations. However, existing methods often rely on label-guided representatio...

A novel task and methods to evaluate inter-individual variation in audio-visual associative learning
COGNITION, 2024, vol. 242, p. 105658
Authors: Pasqualotto, Angela; Cochrane, Aaron; Bavelier, Daphne; Altarelli, Irene
Affiliations: Univ Geneva Fac Psychol & Educ Sci FPSE Geneva Switzerland; Campus Biotech Geneva Switzerland; Univ Paris Cite LaPsyDE CNRS Paris France; Univ Geneva Fac Psychol & Educ Sci FPSE Campus Biotech Geneva Switzerland
Learning audio-visual associations is foundational to a number of real-world skills, such as reading acquisition or social communication. Characterizing individual differences in such learning has therefore been of in...

Multi-modal spiking tensor regression network for audio-visual zero-shot learning
NEUROCOMPUTING, 2025, vol. 629
Authors: Yang, Zhe; Li, Wenrui; Hou, Jinxiu; Cheng, Guanghui
Affiliations: Univ Elect Sci & Technol China Sch Math Sci Chengdu 611731 Sichuan Peoples R China; Harbin Inst Technol Dept Comp Sci & Technol Harbin 150001 Peoples R China
Recently, convolutional neural networks have got significant attention, particularly in the field of audio-visual zero-shot learning. It can accurately perceive and capture local features, which allows the model to ef...

Audio-visual self-supervised representation learning: A survey
NEUROCOMPUTING, 2025, vol. 634
Authors: Alsuwat, Manal; Al-Shareef, Sarah; Alghamdi, Manal
Affiliations: Umm Al Qura Univ Dept Comp Sci & Artificial Intelligence Mecca Saudi Arabia
Artificial intelligence developers leverage the inherent relationships among video, text, and audio to create enhanced representations of the world, mirroring the way humans use multiple senses to understand their env...

A Survey of Multimodal Learning: Methods, Applications, and Future
ACM COMPUTING SURVEYS, 2025, vol. 57, no. 7, pp. 1-34
Authors: Yuan, Yuan; Li, Zhaojian; Zhao, Bin
Affiliations: Northwestern Polytech Univ Sch Artificial Intelligence Opt & Elect iOPEN Xian Peoples R China
The multimodal interplay of the five fundamental senses-Sight, Hearing, Smell, Taste, and Touch-provides humans with superior environmental perception and learning skills. Adapted from the human perceptual system, mul...

Audio-Visual Segmentation with Semantics
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, vol. 133, no. 4, pp. 1644-1664
Authors: Zhou, Jinxing; Shen, Xuyang; Wang, Jianyuan; Zhang, Jiayi; Sun, Weixuan; Zhang, Jing; Birchfield, Stan; Guo, Dan; Kong, Lingpeng; Wang, Meng; Zhong, Yiran
Affiliations: Hefei Univ Technol Hefei Peoples R China; Shanghai AI Lab Shanghai Peoples R China; Univ Oxford Oxford England; Beihang Univ Beijing Peoples R China; Australian Natl Univ Canberra Australia; Nvidia Santa Clara CA USA; Univ Hong Kong Hong Kong Peoples R China
We propose a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we con...

STNet: Deep Audio-Visual Fusion Network for Robust Speaker Tracking
IEEE TRANSACTIONS ON MULTIMEDIA, 2025, vol. 27, pp. 1835-1847
Authors: Li, Yidi; Liu, Hong; Yang, Bing
Affiliations: Taiyuan Univ Technol Coll Comp Sci & Technol Taiyuan 030024 Peoples R China; Peking Univ Shenzhen Grad Sch Key Lab Machine Percept Beijing 100871 Peoples R China; Westlake Univ Westlake Inst Adv Study Hangzhou 310024 Peoples R China
Audio-visual speaker tracking aims to determine the location of human targets in a scene using signals captured by a multi-sensor platform, whose accuracy and robustness can be improved by multi-modal fusion methods. ...

Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, vol. 133, no. 4, pp. 2136-2157
Authors: Zhang, Yunhua; Doughty, Hazel; Snoek, Cees G. M.
Affiliations: Univ Amsterdam Amsterdam Netherlands; Leiden Univ Leiden Netherlands
This paper strives to recognize activities in the dark, as well as in the day. We first establish that state-of-the-art activity recognizers are effective during the day, but not trustworthy in the dark. The main caus...

CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, vol. 133, no. 5, pp. 2581-2598
Authors: Jiang, Yuanyuan; Yin, Jianqin
Affiliations: Beijing Univ Posts & Telecommun Sch Artificial Intelligence Xitucheng Rd 10 Beijing 100876 Peoples R China
While vision-language pretrained models (VLMs) excel in various multimodal understanding tasks, their potential in fine-grained audio-visual reasoning, particularly for audio-visual question answering (AVQA), remains ...