This paper describes a low-cost computer vision system for obtaining traffic metrics at urban intersections. The proposed system is built on a Bayesian network reasoning model. It uses data extracted by background subtraction and contrast analysis, applied to predefined regions of interest in the video sequences, to evaluate different traffic metrics. The system is designed to work with already installed urban cameras in order to reduce installation costs. It can therefore be configured for different image sizes and video frame rates, and can process images taken from different distances and perspectives. The system was validated on a Raspberry Pi platform and tested with two real surveillance cameras managed by the local authority of Cartagena (Spain) under different ambient light conditions. On this hardware, the system processes VGA grayscale images at 8 frames per second.
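As a rough illustration of the kind of evidence background subtraction can produce for a predefined region of interest, the sketch below computes the fraction of ROI pixels that differ from a static background image. It is not the authors' implementation: the function name `roi_foreground_ratio`, the fixed `diff_threshold`, the static background, and the toy 4x4 images are all assumptions made for illustration. A ratio like this could serve as one input to a reasoning model such as the Bayesian network the abstract mentions.

```python
def roi_foreground_ratio(frame, background, roi, diff_threshold=25):
    """Fraction of ROI pixels whose grey level differs from the background.

    frame, background: 2D lists of grey levels (0-255), same shape.
    roi: (top, left, bottom, right), with bottom and right exclusive.
    diff_threshold: hypothetical fixed threshold, chosen for illustration.
    """
    top, left, bottom, right = roi
    changed = 0
    total = 0
    for y in range(top, bottom):
        for x in range(left, right):
            total += 1
            if abs(frame[y][x] - background[y][x]) > diff_threshold:
                changed += 1
    return changed / total if total else 0.0


# Toy data (made up): a uniform background and a frame with two bright pixels.
background = [[10] * 4 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = 200
frame[1][2] = 180

# ROI covering rows 1-2 and columns 1-2 (four pixels, two of them changed).
ratio = roi_foreground_ratio(frame, background, roi=(1, 1, 3, 3))
```

A real deployment would typically update the background model over time to cope with changing light, which this sketch does not attempt.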
ISBN: (print) 9798350386851; 9798350386844
This study focuses on improving an existing automatic optical inspection application that uses rule-based image processing, object detection, image segmentation, and template matching through dynamic database updates. A real-time screw defect detection method that works under various lighting conditions is proposed, with the aim of reducing the cost and time of defect detection. Unlike CNNs, which require a large amount of training material to reach a given accuracy, this method needs only a small amount. The effectiveness of the proposed method has been evaluated through many experiments.
Three-dimensional human pose estimation plays an important role in computer vision applications such as healthcare, sports, activity recognition, motion capture, and augmented reality. However, methods based on monocular images or video are sensitive to occlusions, while multi-view methods usually require enormous computational resources. Recently, inertial measurement unit (IMU)-based methods have begun to overcome the occlusion problem and can potentially achieve real-time inference. Yet they still suffer from insufficient precision and scale drift error over time. In this paper, we propose a novel, efficient framework that fuses a single image with a temporal sequence from IMU sensors to estimate human poses and reconstruct human shapes. Our method achieves a Mean Per Joint Positional Error (MPJPE) of 46 mm on the Total Capture dataset with a 30-frame time segment, and surpasses state-of-the-art pure IMU-based methods. Moreover, compared with other vision-based methods, the proposed method greatly reduces the floating point operations per second (FLOPS) required while still achieving competitive estimation precision. Our method achieves 74 FPS on an iPhone 12 for offline processing. In addition, our method can easily be generalized to outdoor cases.
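For readers unfamiliar with the metric quoted above, MPJPE is commonly computed as the mean Euclidean distance between each predicted joint and its ground-truth counterpart. The sketch below shows that standard definition only. The function name `mpjpe` and the three-joint poses are made up for illustration, and any alignment step a particular evaluation protocol may apply before measuring is omitted.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between matching joints, in the input units."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("joint lists must be non-empty and of equal length")
    total = 0.0
    for p, g in zip(predicted, ground_truth):
        total += math.dist(p, g)  # per-joint 3D distance
    return total / len(predicted)


# Hypothetical 3-joint pose in millimetres (values invented for the example).
pred = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 203.0, 4.0)]
gt = [(3.0, 4.0, 0.0), (100.0, 0.0, 0.0), (0.0, 200.0, 0.0)]

error_mm = mpjpe(pred, gt)  # per-joint errors are 5, 0 and 5 mm
```

Over a sequence, the per-frame values are usually averaged again to give a single figure for the dataset.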
The rise of video streaming for digital visual processing has been a boon for the industry of visual processing. Video streaming technology has made it easier for companies to capture, analyze, and interpret visual da...
Cloud virtual reality (VR) gaming is a novel technology that allows users to enjoy complex games on their thin clients by offloading the graphics rendering to cloud servers. The thin clients only need to perform basic...
ISBN: (print) 9783031064333; 9783031064326
We present a method that can analyze coded ultra-high resolution (UHD) video content an order of magnitude faster than real-time. We observe that the larger the resolution of a video, the larger the fraction of the overall processing time that is spent on decoding frames from the video. In this paper, we exploit the way video is coded to significantly speed up the frame decoding process. More precisely, we only decode keyframes, which can be decoded significantly faster than 'random' frames in the video. A key insight is that in modern video codecs, keyframes are often placed around scene changes (shot boundaries), and hence form a very representative subset of the frames of the video. Using video genre tagging as an example, we show that keyframes lend themselves well to video analysis tasks. Unlike previous genre prediction methods, which include a multitude of signals, we train a per-frame genre classification system using a CNN that takes only (key-)frames as input. We show that the aggregated genre predictions are very competitive with much more involved methods at predicting the video genre(s), and even outperform state-of-the-art genre tagging methods that rely solely on video frames as input. The proposed system can reliably tag the genres of a compressed video between 12x (8K content) and 96x (1080p content) faster than real-time.
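The abstract does not say how the per-keyframe predictions are aggregated into video-level genres. One simple possibility, shown here purely as an assumption, is to average the per-frame scores and keep every genre whose mean reaches a threshold. The function name `aggregate_genres`, the 0.5 threshold, and the scores are all hypothetical.

```python
def aggregate_genres(frame_scores, threshold=0.5):
    """Average per-keyframe genre scores; keep genres at or above threshold.

    frame_scores: list of dicts mapping genre name to a score in [0, 1],
    one dict per decoded keyframe, all with the same keys.
    Mean pooling is an assumption, not the method described in the paper.
    """
    if not frame_scores:
        return {}
    genres = frame_scores[0].keys()
    count = len(frame_scores)
    mean = {g: sum(f[g] for f in frame_scores) / count for g in genres}
    return {g: s for g, s in mean.items() if s >= threshold}


# Invented classifier outputs for three keyframes of one video.
keyframe_scores = [
    {"comedy": 0.9, "drama": 0.2, "sports": 0.1},
    {"comedy": 0.7, "drama": 0.6, "sports": 0.0},
    {"comedy": 0.8, "drama": 0.1, "sports": 0.2},
]

video_genres = aggregate_genres(keyframe_scores)  # only "comedy" survives
```

Because a video can carry several genres, thresholding each genre independently keeps the output multi-label rather than forcing a single winner.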
ISBN: (print) 9798350318920; 9798350318937
Recently, the vision transformer (ViT) has achieved remarkable performance in computer vision tasks and has been actively utilized in colorization. The vision transformer uses multi-head self-attention to effectively propagate user hints to distant relevant areas in the image. However, despite the success of vision transformers in colorizing images, the heavy underlying ViT architecture and its large computational cost hinder active real-time user interaction in colorization applications. Several studies have removed redundant image patches to reduce the computational cost of ViT in image classification tasks. However, the existing efficient ViT methods cause severe performance degradation in the colorization task because they completely remove the redundant patches. We therefore propose AdaColViT, a novel efficient ViT architecture for real-time interactive colorization, which determines which redundant image patches and layers to reduce in the ViT. Unlike existing methods, our novel pruning method alleviates the performance drop and flexibly allocates computational resources across input samples, effectively achieving actual acceleration. In addition, we demonstrate through extensive experiments on the ImageNet-ctest10k, Oxford 102 Flowers, and CUB-200 datasets that our method outperforms the baseline methods.
ISBN: (print) 9781510671577; 9781510671560
Intraprocedural 3D real-time magnetic resonance imaging (MRI) provides a way to achieve accurate and precise radiofrequency catheter targeting during ventricular tachycardia ablation. However, the limited data acquisition time needed to freeze cardiac motion results in highly undersampled k-space data that are challenging to reconstruct. In this work, we evaluated several deep learning (DL) based methods for real-time reconstruction of highly undersampled 3D real-time cardiac MRI. Reconstruction performance and speed were compared between classical algorithms and DL-based methods. Generative adversarial networks with attention layers in the generator were used to perform reconstructions in the image domain, with the aim of balancing reconstruction speed and image quality. In addition, variational networks were implemented by iterating data consistency in k-space and enforcing image smoothness via neural network-based regularization. In a preliminary study of heartbeat-resolved, highly undersampled 3D cardiac MRI in 11 healthy volunteers, we observed that DL reconstruction methods provided good image quality with a significant increase in computational speed.
ISBN: (print) 9783031510229
The proceedings contain 92 papers. The special focus in this conference is on Image Analysis and Processing. The topics include: An Effective CNN-Based Super Resolution Method for Video Coding; Medical Transformers for Boosting Automatic Grading of Colon Carcinoma in Histological Images; FERMOUTH: Facial Emotion Recognition from the MOUTH Region; Consensus Ranking for Efficient Face Image Retrieval: A Novel Method for Maximising Precision and Recall; Towards Explainable Navigation and Recounting; Towards Facial Expression Robustness in Multi-scale Wild Environments; Depth Camera Face Recognition by Normalized Fractal Encodings; Automatic Generation of Semantic Parts for Face Image Synthesis; Improved Bilinear Pooling for Real-Time Pose Event Camera Relocalisation; Continual Source-Free Unsupervised Domain Adaptation; End-to-End Asbestos Roof Detection on Orthophotos Using Transformer-Based YOLO Deep Neural Network; OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data; UAV Multi-object Tracking by Combining Two Deep Neural Architectures; GLR: Gradient-Based Learning Rate Scheduler; A Large-scale Analysis of Athletes’ Cumulative Race Time in Running Events; Uncovering Lies: Deception Detection in a Rolling-Dice Experiment; Active Class Selection for Dataset Acquisition in Sign Language Recognition; MC-GTA: A Synthetic Benchmark for Multi-Camera Vehicle Tracking; A Differentiable Entropy Model for Learned Image Compression; Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation; Self-Similarity Block for Deep Image Denoising; SCENE-pathy: Capturing the Visual Selective Attention of People Towards Scene Elements; Not with My Name! Inferring Artists’ Names of Input Strings Employed by Diffusion Models; Benchmarking of Blind Video Deblurring Methods on Long Exposure and Resource Poor Settings; LieToMe: An LSTM-Based Method for Deception Detection by Hand Movements; Spatial Transformer Generative Adversarial Network for Image Super
The spatiotemporal data of railway infrastructure plays an important role in the development of railway informatization, but existing collection technologies have problems such as low efficiency, high cost, and many l...