This paper addresses the learning and recognition of human behavior models from multimodal observation in a smart home environment. The proposed approach is part of a framework for acquiring a high-level contextual model for human behavior in an augmented environment. A 3-D video tracking system creates and tracks entities (persons) in the scene. In addition, a speech activity detector analyzes audio streams coming from headset microphones and determines, for each entity, whether that entity is speaking. An ambient sound detector detects noises in the environment. An individual role detector derives basic activities such as "walking" or "interacting with table" from the entity properties extracted by the 3-D tracker. From the derived multimodal observations, different situations such as "aperitif" or "presentation" are learned and detected using statistical models (hidden Markov models, HMMs). The objective of the proposed general framework is twofold: the automatic offline analysis of human behavior recordings and the online detection of learned human behavior models. To evaluate the proposed approach, several multimodal recordings showing different situations were made. The obtained results, in particular for offline analysis, are very good, showing that multimodality as well as multiperson observation generation are beneficial for situation recognition.

Note to Practitioners: This paper was motivated by the problem of automatically recognizing human behavior and interactions in a smart home environment. The smart home environment is equipped with cameras and microphones that permit the observation of human activity in the scene. The objective is first to visualize the perceived human activities (e.g., for videoconferencing or surveillance of elderly people), and then to provide appropriate services based on these activities. We adopt a layered approach for human activity recognition in the environment. The layered framework is motivated by the human perception of human behavior in the sce
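
As a rough illustration of how situations might be detected with HMMs, the sketch below scores an observation sequence against one HMM per situation and picks the most likely one. It is not the authors' implementation: the discrete observation codes, the two-state models, all probability values, and the names `log_likelihood`, `classify`, and `MODELS` are assumptions made up for this example, and the abstract does not say whether the paper's HMMs use discrete or continuous observations.

```python
import math

# Hypothetical observation codes (not taken from the paper):
# 0 = "walking, silent", 1 = "standing, speaking", 2 = "interacting with table"


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs must be a non-empty sequence of integer observation codes.
    """
    n = len(start)
    # Initialise with the start distribution weighted by the first emission.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow; the log of the scales sums to log P(obs).
        alpha = [a / scale for a in alpha]
        log_p += math.log(scale)
    return log_p


# Illustrative parameters only: (start, transition, emission) per situation.
MODELS = {
    "aperitif": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.5, 0.2, 0.3], [0.2, 0.3, 0.5]],
    ),
    "presentation": (
        [0.9, 0.1],
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]],
    ),
}


def classify(obs, models=MODELS):
    """Return the name of the situation whose HMM gives obs the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

With these made-up parameters, `classify([1, 1, 1, 1])` selects "presentation" and `classify([0, 2, 0, 2])` selects "aperitif". In a real system the model parameters would be learned from recordings rather than written by hand.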