ISBN (print): 9783031510229
The proceedings contain 92 papers. The special focus in this conference is on Image Analysis and Processing. The topics include: An Effective CNN-Based Super Resolution Method for Video Coding; Medical Transformers for Boosting Automatic Grading of Colon Carcinoma in Histological Images; FERMOUTH: Facial Emotion Recognition from the MOUTH Region; Consensus Ranking for Efficient Face Image Retrieval: A Novel Method for Maximising Precision and Recall; Towards Explainable Navigation and Recounting; Towards Facial Expression Robustness in Multi-scale Wild Environments; Depth Camera Face Recognition by Normalized Fractal Encodings; Automatic Generation of Semantic Parts for Face Image Synthesis; Improved Bilinear Pooling for Real-Time Pose Event Camera Relocalisation; Continual Source-Free Unsupervised Domain Adaptation; End-to-End Asbestos Roof Detection on Orthophotos Using Transformer-Based YOLO Deep Neural Network; OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data; UAV Multi-object Tracking by Combining Two Deep Neural Architectures; GLR: Gradient-Based Learning Rate Scheduler; A Large-Scale Analysis of Athletes’ Cumulative Race Time in Running Events; Uncovering Lies: Deception Detection in a Rolling-Dice Experiment; Active Class Selection for Dataset Acquisition in Sign Language Recognition; MC-GTA: A Synthetic Benchmark for Multi-Camera Vehicle Tracking; A Differentiable Entropy Model for Learned Image Compression; Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation; Self-Similarity Block for Deep Image Denoising; SCENE-pathy: Capturing the Visual Selective Attention of People Towards Scene Elements; Not with My Name! Inferring Artists’ Names of Input Strings Employed by Diffusion Models; Benchmarking of Blind Video Deblurring Methods on Long Exposure and Resource Poor Settings; LieToMe: An LSTM-Based Method for Deception Detection by Hand Movements; Spatial Transformer Generative Adversarial Network for Image Super
This work introduces a perspective-corrected video see-through mixed-reality head-mounted display with edge-preserving occlusion and low-latency capabilities. To realize the consistent spatial and temporal composition of a captured real world containing virtual objects, we perform three essential tasks: 1) reconstructing captured images to match the user's view; 2) occluding virtual objects with nearer real objects, to provide users with correct depth cues; and 3) reprojecting the virtual and captured scenes to be matched and to keep up with users' head motions. Captured image reconstruction and occlusion-mask generation require dense and accurate depth maps. However, estimating these maps is computationally difficult, which results in longer latencies. To obtain an acceptable balance between spatial consistency and low latency, we rapidly generated depth maps by focusing on edge smoothness and disocclusion (instead of fully accurate maps) to shorten the processing time. Our algorithm refines edges via a hybrid method involving infrared masks and color-guided filters, and it fills disocclusions using temporally cached depth maps. Our system combines these algorithms in a two-phase temporal warping architecture based upon synchronized camera pairs and displays. The first phase of warping reduces registration errors between the virtual and captured scenes. The second presents virtual and captured scenes that correspond with the user's head motion. We implemented these methods on our wearable prototype and performed end-to-end measurements of its accuracy and latency. We achieved an acceptable latency due to head motion (less than 4 ms) and spatial accuracy (less than 0.1 degrees in size and less than 0.3 degrees in position) in our test environment. We anticipate that this work will help improve the realism of mixed reality systems.
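The color-guided depth refinement mentioned in this abstract follows the general shape of a guided filter, where the color image steers smoothing of the depth map so that depth edges line up with image edges. A minimal numpy sketch of that general idea (the radius, epsilon, and function names are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def box_filter(x, r):
    """Mean over a (2r+1)x(2r+1) window via integral images, edge-padded."""
    k = 2 * r + 1
    pad = np.pad(x, r, mode="edge")
    c = np.cumsum(np.cumsum(pad, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # zero row/col so window sums are 4 lookups
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def guided_filter(guide, depth, r=4, eps=1e-3):
    """Smooth `depth` while preserving edges present in the `guide` image."""
    mean_I = box_filter(guide, r)
    mean_p = box_filter(depth, r)
    cov_Ip = box_filter(guide * depth, r) - mean_I * mean_p
    var_I = box_filter(guide * guide, r) - mean_I ** 2
    a = cov_Ip / (var_I + eps)          # local linear model: q = a*I + b
    b = mean_p - a * mean_I
    return box_filter(a, r) * guide + box_filter(b, r)
```

Where the guide is locally flat, the filter reduces to plain averaging; where the guide has an edge, the local linear model lets the filtered depth follow it.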
Video surveillance, also known as closed-circuit television (CCTV), is a fast-expanding sector that has been around for more than 30 years and has seen many technological advancements. In the modern world, ...
The spatiotemporal data of railway infrastructure plays an important role in the development of railway informatization, but existing collection technologies have problems such as low efficiency, high cost, and many l...
A prototype of a blind-assistance system that utilizes machine learning for real-time object detection and classification to help visually impaired people navigate independently without relying on external assistanc...
ISBN (print): 9781510671577; 9781510671560
Intraprocedural 3D real-time magnetic resonance imaging (MRI) provides a way for accurate and precise radiofrequency catheter targeting during ventricular tachycardia ablation. However, the limited data acquisition time needed to freeze cardiac motion results in highly undersampled k-space data that are challenging to reconstruct. In this work, we evaluated several deep learning (DL) based methods for real-time reconstruction of highly undersampled 3D real-time cardiac MRI. Algorithm reconstruction performance and speed were compared between classical algorithms and DL-based methods. Generative adversarial networks with attention layers in the generator were used to perform reconstructions in the image domain, which strived to balance reconstruction speed and image quality. In addition, variational networks were implemented by iterating data consistency in k-space and enforcing image smoothness via neural network-based regularization. In a preliminary study of heartbeat-resolved highly undersampled 3D cardiac MRI for 11 healthy volunteers, we observed that DL reconstruction methods provided good image quality with a significant increase in computational speed.
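The variational-network idea of iterating data consistency in k-space can be illustrated with a small numpy sketch: after each regularization step, measured k-space samples are re-imposed on the current image estimate. The soft-averaging weight `lam`, the placeholder smoother standing in for the learned neural regularizer, and all function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def data_consistency(x, y, mask, lam=1.0):
    """Blend the estimate's k-space with measured samples y where mask is True."""
    k = np.fft.fft2(x)
    k = np.where(mask, (k + lam * y) / (1.0 + lam), k)
    return np.fft.ifft2(k)

def unrolled_recon(y, mask, n_iters=5, lam=10.0, denoise=None):
    """Alternate a regularizer with k-space data consistency (unrolled iterations)."""
    if denoise is None:
        # stand-in for the learned regularizer: mild smoothing along one axis
        denoise = lambda im: 0.5 * im + 0.5 * np.roll(im, 1, axis=0)
    x = np.fft.ifft2(np.where(mask, y, 0.0))  # zero-filled starting estimate
    for _ in range(n_iters):
        x = data_consistency(denoise(x), y, mask, lam)
    return x
```

As `lam` grows, measured samples are enforced exactly; smaller values trade fidelity to the (noisy) measurements against the regularizer's smoothness prior.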
Face masks are necessary during the worldwide pandemic to prevent the transmission of infectious diseases. This research proposes a deep learning-based system for detecting face masks in live video feeds in real-time....
ISBN (print): 9798350384901; 9798350384895
This paper presents an innovative approach to real-time sign language recognition using Long Short-Term Memory networks (LSTM), aimed at enhancing communication accessibility for the deaf and hard-of-hearing community. We address the challenge of understanding and interpreting sign language, which is critical for millions worldwide, yet restricted to those proficient in it. Our research contributes to bridging this communication gap by developing a deep learning model capable of recognizing a broad spectrum of sign language gestures and sentences with high accuracy and speed. Utilizing a rich dataset comprising diverse sign language gestures, collected in collaboration with a professional video production studio and proficient sign language users, we employ LSTM networks integrated with Dense layers to effectively capture the complex spatial and temporal patterns of sign language. The architecture of our model is specifically designed to accommodate the nuanced dynamics of sign language, with an emphasis on real-time processing. Through rigorous training and validation, our model demonstrates an outstanding accuracy rate of 92% on a comprehensive testing dataset, alongside remarkable real-time processing capabilities. The system's efficiency in recognizing a wide array of sign gestures nearly instantaneously underscores its potential applicability in various real-world scenarios, including assistive technologies and human-computer interaction. This study not only showcases the practicality and efficacy of LSTM networks in real-time sign language recognition but also marks a significant step towards more inclusive and accessible communication technologies. Our future work includes integrating this system with the Langue des Signes Quebecoise website, further advancing the goal of universal communication accessibility.
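The LSTM-plus-Dense pattern the abstract describes — a recurrent pass over per-frame features followed by a softmax classification head — can be sketched in plain numpy. The layer sizes, parameter names, and random initialization below are illustrative assumptions, not the authors' trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hid, n_cls):
    w = lambda *shape: rng.normal(0.0, 0.1, shape)
    return {
        "W": w(4 * n_hid, n_in + n_hid),  # input/forget/cell/output gates, stacked
        "b": np.zeros(4 * n_hid),
        "Wy": w(n_cls, n_hid),            # dense classification head
        "by": np.zeros(n_cls),
    }

def classify_sequence(x_seq, p):
    """Run an LSTM over per-frame features, then softmax over gesture classes."""
    n_hid = p["b"].size // 4
    h = np.zeros(n_hid)
    c = np.zeros(n_hid)
    for x in x_seq:                       # one feature vector per video frame
        z = p["W"] @ np.concatenate([x, h]) + p["b"]
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # gated cell update
        h = sigmoid(o) * np.tanh(c)
    logits = p["Wy"] @ h + p["by"]        # classify from the final hidden state
    e = np.exp(logits - logits.max())
    return e / e.sum()                    # class probabilities
```

The gating is what lets the model carry information across the span of a gesture, which is why recurrent architectures suit sign sequences better than per-frame classifiers.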
Nowadays, analyzing football videos using computer vision techniques has attracted increasing attention. Significant event detection, football video summarization, football result prediction, statistics, etc. are exciting applications in this area. On the other hand, deep learning approaches are very successful methods for image and video analysis that need much data. Nevertheless, to the best of our knowledge, publicly available datasets in this area are small or individual, which is not enough for such deep learning-based approaches. To fill this gap, a public dataset, namely IAUFD*, was collected, annotated, and prepared for research in this direction. The IAUFD contains 100,000 real-world images from 33 football videos totaling 2,508 minutes, annotated in 10 event categories: the goal, center of the field, celebration, red card, yellow card, the ball, stadium, the referee, penalty kick, and free kick. It is believed that these moments are fundamental and useful for any high-level action or event exploration. For generalization of our dataset, we paid attention to varied weather (e.g., sunny, rainy, cloudy), seasons, times of day, and locations. We also used two deep neural networks (VggNet-13 and ResNet-18) to evaluate our proposed dataset as the baseline for future studies and comparison.
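Baseline evaluation on a multi-class event dataset like this is usually reported per class, since classes such as "goal" and "stadium" can be heavily imbalanced. A small numpy sketch of that bookkeeping, independent of the VggNet/ResNet models themselves (function names are illustrative):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t, p] counts samples of true class t predicted as class p."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_accuracy(cm):
    """Correct (diagonal) counts divided by per-class totals; 0 for empty classes."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals,
                     out=np.zeros(len(cm)), where=totals > 0)
```

Averaging the per-class accuracies (macro averaging) keeps rare event categories from being drowned out by frequent ones in the headline number.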
The purpose of this study is to investigate the use of 360° video technology to monitor the unsafe behaviour of workers in the construction industry. To achieve this, a survey questionnaire was designed and distr...