Author Affiliations: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, United States; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, United States; Children's Hospital, Harvard Medical School, United States; Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Israel; GE Research, Artificial Intelligence, United States
Publication: arXiv
Year / Volume / Issue: 2021
Core Indexing:
Subject: Behavioral research
Abstract: In human vision, objects and their parts can be visually recognized from purely spatial or purely temporal information, but the mechanisms integrating space and time are poorly understood. Here we show that human visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues in configurations where each source on its own is insufficient for recognition. This analysis is obtained by identifying minimal videos: short and tiny video clips in which objects, parts, and actions can be reliably recognized, but any reduction in either space or time makes them unrecognizable. State-of-the-art deep networks for dynamic visual recognition cannot replicate human behavior in these configurations. This gap between humans and machines points to critical mechanisms in human dynamic vision that are lacking in current models. © 2021, CC BY.
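The abstract's definition of a minimal video implies a concrete search criterion: a clip is minimal if it is recognizable while every one of its one-step spatial or temporal reductions is not. The sketch below illustrates that criterion in Python. It is a hypothetical illustration, not the authors' protocol: the exact reduction set (crops, a resolution decrease, dropped end frames) and the `recognizable` predicate, which in the paper corresponds to human observers' recognition rates, are assumptions made here for clarity.

```python
import numpy as np

def reductions(clip):
    """Yield one-step reductions of a (T, H, W) grayscale clip.

    Spatial reductions: crop one pixel from each edge, or downscale
    resolution by 20%. Temporal reductions: drop the first or last
    frame. (Illustrative set; the paper's reductions may differ.)
    """
    T, H, W = clip.shape
    # Spatial crops: remove one row or column along each edge.
    yield clip[:, 1:, :]   # crop top
    yield clip[:, :-1, :]  # crop bottom
    yield clip[:, :, 1:]   # crop left
    yield clip[:, :, :-1]  # crop right
    # Resolution reduction: nearest-neighbour downscale by 20%.
    h, w = max(1, int(H * 0.8)), max(1, int(W * 0.8))
    rows = np.arange(h) * H // h
    cols = np.arange(w) * W // w
    yield clip[:, rows[:, None], cols[None, :]]
    # Temporal reductions: drop the first or last frame.
    if T > 1:
        yield clip[1:]
        yield clip[:-1]

def is_minimal(clip, recognizable):
    """True if `clip` is recognizable but none of its reductions is.

    `recognizable` is a placeholder predicate standing in for the
    human psychophysics measurements used in the study.
    """
    return recognizable(clip) and not any(
        recognizable(r) for r in reductions(clip)
    )
```

Under this criterion, a minimal video sits exactly at the recognition boundary: removing a single row, column, or frame, or slightly lowering the resolution, destroys recognizability, which is what makes such clips sharp probes of how spatial and temporal cues are combined.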