Research on video activity detection has primarily focused on identifying well-defined human activities in short video segments. Most research on video activity recognition focuses on developing large-parameter systems that require training on large video datasets. This paper develops a low-parameter, modular system with rapid inference that can be trained entirely on limited datasets, without transfer learning from large-parameter systems. The system accurately detects specific activities in real-life classroom videos and associates them with the students who perform them. Additionally, the paper develops an interactive web-based application to visualize human activity maps over long real-life classroom videos. Long-term video activity detection in real-life classroom videos presents unique challenges, such as the need to detect multiple simultaneous activities, rapid transitions between activities, long-term occlusions, durations exceeding 15 minutes, and numerous individuals performing similar activities in the background. Moreover, subtle hand movements make it harder to distinguish actual typing and writing from unrelated hand movements. The system processes the input videos using fast activity initializations and current object detection methods to determine the location of each activity and the person performing it. These regions are then processed through an optimal low-parameter dyadic 3D-CNN classifier to identify the activity. The proposed system processes 1 hour of video in 15 minutes for typing and 50 minutes for writing activities. The system uses several methods to optimize the inference pipeline. For each activity, the system determines an optimal low-parameter 3D-CNN architecture selected from a family of low-parameter architectures. The input video is broken into smaller video regions that are transcoded at an optimized frame rate. For inference, a
The paper develops datasets and methods to assess student participation in real-life collaborative learning environments. In collaborative learning environments, students are organized into small groups where they are...
Operant keypress tasks, where each action has a consequence, have been analogized to the construct of "wanting" and produce lawful relationships in humans that quantify preferences for approach and avoidance...