Classroom emotion is an important dimension for evaluating teaching effectiveness, and applying image processing to online teaching emotion analysis has become an inevitable trend. Aiming at the problems of low expression-recognition accuracy, the lack of a clear emotion scheme for online teaching evaluation, and the limited applicability of existing expression-recognition models, this paper studies a classroom video image emotion analysis method for online teaching quality evaluation. First, the classroom video image emotion analysis task is divided into a facial expression recognition task and a facial feature point localization task, and multi-task learning is carried out so that the model switches between the two tasks in real time according to the type of input. A tag attention mechanism is proposed to deeply mine the key facial regions in classroom video images, keeping the distribution of each center sample and its neighborhood samples compact in the feature space. Finally, based on the expression activity of teachers and students in the online classroom, online classroom teaching emotion is analyzed, and online teaching quality is evaluated indirectly. Experimental results verify the validity of the model.
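The abstract does not specify the compactness objective used in the feature space; a center-loss-style term is one standard way to pull samples toward their class center. The function name and formulation below are illustrative assumptions, not the paper's actual loss.

```python
def center_compactness_loss(features, labels, centers):
    """Center-loss-style compactness term: L = (1/2N) * sum_i ||x_i - c_{y_i}||^2.

    features: list of feature vectors (lists of floats)
    labels:   list of class indices, one per feature vector
    centers:  dict mapping class index -> class-center vector

    A generic formulation used only to illustrate feature-space
    compactness; the paper's loss over center and neighborhood
    samples is not given in the abstract.
    """
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total / (2.0 * len(features))
```

With features exactly on their class centers the loss is zero; it grows quadratically as samples drift away from their center.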
The realm of video content has witnessed exponential growth, and alongside it emerges the challenge of enhancing video quality efficiently. Traditional techniques, although valuable, often fall short of addressing the...
With the rapid development of the automobile industry, the problem of traffic congestion has become increasingly prominent. In order to reduce traffic accidents, improve road transportation efficiency and ensure road ...
ISBN:
(print) 9783031661457; 9783031661464
Due to the increasing number of tumors, new interventional Computed Tomography (CT) procedures have been proposed that aim to optimize workflow and enable time-effective diagnosis and treatment. To support tumor ablation procedures, CT scanners must pre-process 2D projections and reconstruct 3D slices of the human body in real time, while data are acquired. This paper proposes a lightweight processing architecture for MPSoC-FPGA that performs the "CT pre-processing phase" on the fly; this phase consists of the pixel-level processing of 2D images. The architecture also supports exploring different data formats, selectable at design time, to improve performance while preserving image quality. This article focuses on the cosine and redundancy weighting steps, which cannot be implemented with the standard method on an embedded MPSoC-FPGA due to the high resource utilization of their arithmetic operations. Therefore, this work proposes optimizations that reduce both the number of operations and the amount of on-chip memory required compared with the standard algorithm. Finally, the proposed architecture has been implemented and instantiated within a Control Data Acquisition System (CDAS) architecture running on the XC7Z045 AMD-Xilinx MPSoC-FPGA and integrated into an open-interface CT scanner assembled in our laboratory. Here, the optimized weighting steps use up to 33.8 times fewer DSPs than the implementation based on the standard solution. Furthermore, they add only 80 ns of latency, making them 7.9 times faster than the standard implementation.
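In its standard (FDK-style) form, the cosine weighting step referenced above scales each detector pixel by D / sqrt(D^2 + u^2 + v^2). The sketch below shows that textbook formulation in floating point; the parameter names are assumptions, and the paper's reduced-operation, FPGA-friendly variant is not reproduced here.

```python
import math

def cosine_weight_table(n_u, n_v, du, dv, sdd):
    """Standard cosine pre-weighting table for a 2D CT projection.

    Each detector pixel at physical offset (u, v) from the detector
    center is scaled by sdd / sqrt(sdd^2 + u^2 + v^2), where sdd is
    the source-to-detector distance. Textbook formulation only; the
    optimized MPSoC-FPGA implementation in the paper differs.
    """
    cu = (n_u - 1) / 2.0  # detector center, in pixel indices
    cv = (n_v - 1) / 2.0
    table = []
    for j in range(n_v):
        v = (j - cv) * dv
        row = []
        for i in range(n_u):
            u = (i - cu) * du
            row.append(sdd / math.sqrt(sdd * sdd + u * u + v * v))
        table.append(row)
    return table
```

The weight is exactly 1.0 at the central ray and falls off toward the detector edges, which is why a naive per-pixel square root is so costly on an FPGA and motivates precomputation or algebraic simplification.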
ISBN:
(print) 9798350362923; 9798350362916
Nowadays, ensuring road safety is a crucial issue that demands continuous development of measures to minimize the risk of accidents. This paper presents the development of a driver fatigue detection method based on the analysis of facial images. A video camera was used to monitor the driver's condition in real time. The detection method analyzes facial features related to the eye and mouth areas, such as blinking frequency, yawning, the mouth aspect ratio (MAR), and the duration of eye closure. The method was implemented in Python using a convolutional neural network (CNN). To validate the method, a dataset was created containing eye images subjected to various modifications, including the use of corrective glasses. The model's results confirm the method's effectiveness in detecting fatigue, achieving an average accuracy of 92% for eye detection and 82% for yawning detection under well-lit conditions.
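Blink frequency and eye-closure duration of the kind described above are commonly derived from the eye aspect ratio (EAR), the eye-side analogue of the MAR named in the abstract. The 6-landmark ordering below is the widely used convention, assumed here since the paper's exact landmark layout is not given.

```python
import math

def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def eye_aspect_ratio(pts):
    """EAR over 6 eye landmarks p1..p6: p1/p4 are the horizontal eye
    corners, (p2, p6) and (p3, p5) are vertical landmark pairs.

        EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)

    An open eye yields a roughly constant EAR; a closing eye drives it
    toward 0, so counting consecutive frames below a threshold flags
    prolonged eye closure and blinks.
    """
    vertical = _dist(pts[1], pts[5]) + _dist(pts[2], pts[4])
    horizontal = _dist(pts[0], pts[3])
    return vertical / (2.0 * horizontal)
```

The same ratio computed over mouth landmarks gives the MAR, where a sustained high value indicates yawning.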
Real-time magnetic resonance imaging (rtMRI) of the midsagittal plane of the mouth is of interest for speech production research. In this work, we focus on estimating utterance-level rtMRI video from the spoken phonem...
Object detection involves analyzing images to determine where objects are located and what they are. This idea extends to real-time settings where numerous items must be detected simultaneously by using mult...
ISBN:
(print) 9783031581809; 9783031581816
Dance video analysis and interpretation have been challenging tasks in computer vision due to the lack of annotated data. Several videos are available in the public domain, but they are hardly ever annotated because annotation incurs much time, cost, and expert knowledge. This paper addresses this problem in the field of Bharatanatyam, an Indian Classical Dance (ICD) form. Video annotation itself has broad coverage: it may include the annotation of objects, activities, hand gestures, facial expressions, the semantics of a video, and more. This paper, however, annotates the elementary postures of Bharatanatyam dance videos. The proposed tool takes a video as input and segments it into motion and stationary (non-motion) frames. The non-motion frames belong to the elementary postures and are our point of interest. After segmentation, the tool recognizes the postures and labels each posture's duration of occurrence in the form of frame numbers. The dataset on which it is applied covers most of the basic dance variations in Bharatanatyam, which had not previously been addressed on this scale. The basic dance variations used to learn Bharatanatyam are called Adavus. The paper covers 13 Adavus and their 52 variations, performed by three dancers, which we use as our dataset. This annotation tool may contribute significantly to the state of the art by saving the time and cost involved in manual annotation. The tool uses deep learning for dance video annotation and achieves an accuracy above 75%.
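The motion vs. stationary segmentation step described above can be sketched with simple mean absolute frame differencing. The threshold value and the flat-grayscale-list frame representation below are assumptions for illustration, not the tool's actual implementation.

```python
def split_motion_frames(frames, thresh=5.0):
    """Label each frame as motion (True) or stationary (False) by the
    mean absolute grayscale difference from the previous frame.

    frames: list of equal-length flat grayscale pixel lists.
    The first frame has no predecessor and is labeled stationary.
    Runs of stationary frames are the candidate elementary-posture
    segments that the annotation tool would then recognize and label.
    """
    labels = [False]
    for prev, cur in zip(frames, frames[1:]):
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        labels.append(diff > thresh)
    return labels
```

A posture segment is then the frame-number range of each maximal stationary run, which matches the tool's output of posture durations as frame numbers.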
ISBN:
(print) 9781728198354
Unsupervised Video Object Segmentation (UVOS) refers to the challenging task of segmenting the prominent object in videos without manual guidance. Recent work on UVOS falls into two approaches, appearance-based and appearance-motion-based methods, each with its own limitations. Appearance-based methods do not consider the motion of the target object because they exploit correlation information between randomly paired frames. Appearance-motion-based methods suffer from a dominant dependency on optical flow because they fuse appearance with motion. In this paper, we propose a novel framework for UVOS that addresses the limitations of both approaches in terms of both time and scale. Temporal Alignment Fusion aligns the saliency information of adjacent frames with the target frame to leverage adjacent-frame information. Scale Alignment Decoder predicts the target object mask by aggregating multi-scale feature maps via continuous mapping with an implicit neural representation. We present experimental results on the public benchmark datasets DAVIS 2016 and FBMS, which demonstrate the effectiveness of our method. Furthermore, we outperform state-of-the-art methods on DAVIS 2016.
ISBN:
(print) 9798350302233
In this paper, we experimentally demonstrate an optical camera communications (OCC) system that uses a wearable light-emitting diode (LED) array as the transmitter. Wearable devices are powerful tools for supporting Internet of Things (IoT) systems because of their sensing, processing, and communication capabilities. The term "wearable devices" refers to a wide range of products that can be integrated into clothing and accessories, allowing real-time data detection, storage, and exchange without human intervention. This paper presents a practical evaluation of an LED-based wearable transmitter for an OCC system to demonstrate its feasibility. In particular, an LED array attached to the body is modulated using on-off keying to transmit data via visible light, and a smartphone camera captures video of the user wearing the device while moving slightly in a static position in the room. Finally, the data are decoded from the video frames using an image processing algorithm that tracks the source and demodulates the signal.
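The demodulation step described above can be sketched as thresholding the tracked LED region's mean brightness per frame. The midpoint threshold and the one-symbol-per-frame assumption below are simplifications; a real OCC receiver must also handle rolling-shutter effects and frame-rate/symbol-rate mismatch, which this sketch ignores.

```python
def demodulate_ook(brightness):
    """Recover an on-off-keyed bit per video frame from the mean
    brightness of the tracked LED region.

    brightness: list of per-frame mean intensities of the LED region.
    Thresholds at the midpoint of the observed range; assumes one
    symbol per frame and perfect synchronization.
    """
    lo, hi = min(brightness), max(brightness)
    threshold = (lo + hi) / 2.0
    return [1 if b > threshold else 0 for b in brightness]
```

The midpoint threshold is robust to uniform ambient-light offsets, since both the "on" and "off" levels shift together.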