In the past decade, various haze removal techniques have been widely reported for object recognition. However, little has been reported so far on the use of single-image dehazing with a transfer learning approach for ob...
ISBN (digital): 9781665496209
ISBN (print): 9781665496209
Video object detection aims to detect and track each object in a given video. However, because object appearance deteriorates in video, it remains challenging to obtain good results when traditional image object detection methods are applied to videos. In this paper, we propose a new feature aggregation method, called Dual Feature Aggregation (DualFeat), for video object detection. By effectively combining temporal and spatial attention mechanisms, we make full use of the temporal and spatial information in videos. Meanwhile, we leverage a real-time tracker to track detected objects across video frames, and the tracked features are aggregated again with previously obtained features. This yields more comprehensive and richer features, greatly improving the accuracy of video object detection. Experiments on the ILSVRC2017 dataset verify the effectiveness of our method.
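The abstract does not spell out how DualFeat computes its attention weights, so the following is only a generic sketch of temporal attention-based feature aggregation, not the authors' method. It assumes each frame is summarised by a plain feature vector and weights the frames by a softmax over cosine similarity to the reference frame. The names `aggregate_temporal` and `temperature` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_temporal(reference, supports, temperature=1.0):
    """Blend the reference frame's feature with support-frame features.

    Assumption: weights are a softmax over cosine similarity to the
    reference feature, so frames that look like the reference count more.
    Returns the aggregated feature and the weights that produced it.
    """
    candidates = [reference] + list(supports)
    weights = softmax([cosine(reference, c) / temperature for c in candidates])
    aggregated = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(reference))
    ]
    return aggregated, weights


if __name__ == "__main__":
    feature, weights = aggregate_temporal([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(feature, weights)
```

A lower `temperature` sharpens the weights towards the most similar frames. A tracker-driven second aggregation pass, as described in the abstract, could reuse the same function with the tracked object's earlier features as `supports`.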
ISBN (print): 9781510685987; 9781510685994
Narrow-band imaging (NBI), a relatively new bronchoscopy technology, offers superior visualization of vascular details in lesion areas along the airway walls compared to standard white light bronchoscopy. This empowers physicians to detect suspect lesions and characterize their underlying vascular structures for further indications of cancerous activity. Unfortunately, the bronchoscopic video stream suffers from blurring artifacts due to device and patient motion, resulting in low-resolution visualization of lesion areas. To address this problem, we present an image enhancement method for NBI bronchoscopy to improve: 1) visualization of vascular structures; 2) lesion detection; and 3) vessel segmentation. We adapted Real-ESRGAN, a single-image super-resolution network, to enhance bronchoscopic images in real time. This involved a transfer learning approach to fine-tune a pre-trained model using our public NBI bronchial lesion database. The results, derived from bronchoscopic airway exam videos of 10 lung cancer patients, demonstrate significant improvement in the visual quality of super-resolved frames, particularly in vascular regions. Our quantitative analysis further shows enhanced vessel segmentation and lesion detection accuracy, with increased confidence scores. This method offers a practical, real-time solution for improving the diagnostic utility of NBI bronchoscopy by providing clearer, more detailed images. Thus, we integrated the method into an NBI video analysis system for aiding in the early detection and characterization of bronchial lesions.
The rise of video streaming for digital visual processing has been a boon for the visual processing industry. Video streaming technology has made it easier for companies to capture, analyze, and interpret visual da...
ISBN (print): 9783031064333; 9783031064326
We present a method that can analyze coded ultra-high resolution (UHD) video content an order of magnitude faster than real time. We observe that the larger the resolution of a video, the larger the fraction of the overall processing time that is spent on decoding frames from the video. In this paper, we exploit the way video is coded to significantly speed up the frame decoding process. More precisely, we only decode keyframes, which can be decoded significantly faster than 'random' frames in the video. A key insight is that in modern video codecs, keyframes are often placed around scene changes (shot boundaries), and hence form a very representative subset of the frames of the video. We show, using video genre tagging as an example, that keyframes lend themselves well to video analysis tasks. Unlike previous genre prediction methods, which include a multitude of signals, we train a per-frame genre classification system using a CNN that takes only (key-)frames as input. We show that the aggregated genre predictions are very competitive with much more involved methods at predicting the video genre(s), and even outperform state-of-the-art genre tagging methods that rely solely on video frames as input. The proposed system can reliably tag the genres of a compressed video between 12x (8K content) and 96x (1080p content) faster than real time.
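The abstract does not say how the per-keyframe predictions are aggregated, so the following sketch assumes the simplest option: average the per-frame genre probabilities and keep genres whose mean reaches a threshold. The function names, the `threshold` parameter and the sample probabilities are hypothetical. The second helper shows the arithmetic behind a "faster than real time" factor.

```python
def aggregate_genres(frame_probs, threshold=0.5):
    """Average per-keyframe genre probabilities into video-level tags.

    frame_probs: list of dicts mapping genre name -> probability, one dict
    per decoded keyframe, all sharing the same genre keys.
    Assumption: mean pooling followed by a fixed threshold.
    Returns (mean probability per genre, sorted list of selected genres).
    """
    if not frame_probs:
        return {}, []
    count = len(frame_probs)
    mean = {
        genre: sum(probs[genre] for probs in frame_probs) / count
        for genre in frame_probs[0]
    }
    tags = sorted(genre for genre, value in mean.items() if value >= threshold)
    return mean, tags


def realtime_factor(video_seconds, processing_seconds):
    """How many times faster than playback speed the analysis ran."""
    return video_seconds / processing_seconds


if __name__ == "__main__":
    keyframe_probs = [
        {"sports": 0.9, "news": 0.2},
        {"sports": 0.7, "news": 0.4},
    ]
    print(aggregate_genres(keyframe_probs))
    # A 120-second clip analysed in 10 seconds runs at 12x real time.
    print(realtime_factor(120.0, 10.0))
```

Because a video can carry several genres, the threshold is applied to each genre independently rather than taking a single arg-max.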
ISBN (print): 9781665450928
There are many studies in the literature on the acquisition of monoscopic and stereoscopic panoramas. Since obtaining motion parallax accurately without distortions is a very challenging problem, 360-degree stereoscopic image and video capture in particular has become a prevalent research topic. However, studies on this topic have focused on costly systems with many cameras and high processing power demands. This study presents a solution that is efficient in terms of processing power and cost: stereoscopic frames are processed in real time using three consumer-grade 360-degree cameras whose outputs are sampled according to view orientation. In addition, a method is developed to eliminate distortions around the borders of the field of view by blending with selected auxiliary camera frames.
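The abstract does not describe the camera layout or the sampling rule, so the following is a speculative sketch of one way to sample camera outputs according to view orientation. It assumes the three cameras sit at the corners of an equilateral triangle and that the stereo pair is the one whose baseline best matches the viewer's right-hand direction. The remaining camera is treated as the auxiliary one. The rig angles and the name `select_stereo_pair` are hypothetical.

```python
import math

# Hypothetical rig: three cameras on a unit circle at 90, 210 and 330 degrees.
CAMERA_ANGLES_DEG = (90.0, 210.0, 330.0)


def camera_positions():
    """2D positions of the cameras in the horizontal plane."""
    return [
        (math.cos(math.radians(a)), math.sin(math.radians(a)))
        for a in CAMERA_ANGLES_DEG
    ]


def select_stereo_pair(yaw_deg):
    """Return (left, right, auxiliary) camera indices for a view yaw.

    Yaw is measured counterclockwise from the +x axis. The chosen ordered
    pair is the one whose left-to-right baseline is most closely aligned
    with the viewer's right-hand direction (an assumed selection rule).
    """
    yaw = math.radians(yaw_deg)
    # The view direction rotated by -90 degrees points to the viewer's right.
    right_dir = (math.sin(yaw), -math.cos(yaw))
    cams = camera_positions()
    best_pair, best_score = None, -math.inf
    for left in range(3):
        for right in range(3):
            if left == right:
                continue
            bx = cams[right][0] - cams[left][0]
            by = cams[right][1] - cams[left][1]
            score = (bx * right_dir[0] + by * right_dir[1]) / math.hypot(bx, by)
            if score > best_score:
                best_pair, best_score = (left, right), score
    auxiliary = 3 - best_pair[0] - best_pair[1]
    return best_pair[0], best_pair[1], auxiliary


if __name__ == "__main__":
    for yaw in (90.0, 270.0):
        print(yaw, select_stereo_pair(yaw))
```

Under this rule the pair changes as the viewer turns, and frames from the auxiliary camera remain available for blending near the borders of the field of view.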
Video-based dynamic mesh compression is being developed as a standard for compressing time-varying mesh data using existing 2D video codecs. In this paper, we propose a spatial scalable structure for video-based d...
Cloud virtual reality (VR) gaming is a novel technology that allows users to enjoy complex games on their thin clients by offloading the graphics rendering to cloud servers. The thin clients only need to perform basic...
ISBN (print): 9783031510229
The proceedings contain 92 papers. The special focus in this conference is on Image Analysis and Processing. The topics include: An Effective CNN-Based Super Resolution Method for Video Coding; Medical Transformers for Boosting Automatic Grading of Colon Carcinoma in Histological Images; FERMOUTH: Facial Emotion Recognition from the MOUTH Region; Consensus Ranking for Efficient Face Image Retrieval: A Novel Method for Maximising Precision and Recall; Towards Explainable Navigation and Recounting; Towards Facial Expression Robustness in Multi-scale Wild Environments; Depth Camera Face Recognition by Normalized Fractal Encodings; Automatic Generation of Semantic Parts for Face Image Synthesis; Improved Bilinear Pooling for Real-Time Pose Event Camera Relocalisation; Continual Source-Free Unsupervised Domain Adaptation; End-to-End Asbestos Roof Detection on Orthophotos Using Transformer-Based YOLO Deep Neural Network; OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data; UAV Multi-object Tracking by Combining Two Deep Neural Architectures; GLR: Gradient-Based Learning Rate Scheduler; A Large-scale Analysis of Athletes’ Cumulative Race Time in Running Events; Uncovering Lies: Deception Detection in a Rolling-Dice Experiment; Active Class Selection for Dataset Acquisition in Sign Language Recognition; MC-GTA: A Synthetic Benchmark for Multi-Camera Vehicle Tracking; A Differentiable Entropy Model for Learned Image Compression; Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation; Self-Similarity Block for Deep Image Denoising; SCENE-pathy: Capturing the Visual Selective Attention of People Towards Scene Elements; Not with My Name! Inferring Artists’ Names of Input Strings Employed by Diffusion Models; Benchmarking of Blind Video Deblurring Methods on Long Exposure and Resource Poor Settings; LieToMe: An LSTM-Based Method for Deception Detection by Hand Movements; Spatial Transformer Generative Adversarial Network for Image Super
Closed-circuit television, or CCTV, is another name for video surveillance. It is a fast-expanding sector that has been around for more than 30 years and has seen many technological advancements. In the modern world, ...