ISBN (Print): 9781510685987; 9781510685994
Narrow-band imaging (NBI), a relatively new bronchoscopy technology, offers superior visualization of vascular detail in lesion areas along the airway walls compared to standard white-light bronchoscopy. This empowers physicians to detect suspect lesions and characterize their underlying vascular structures for further indications of cancerous activity. Unfortunately, the bronchoscopic video stream suffers from blurring artifacts due to device and patient motion, resulting in low-resolution visualization of lesion areas. To address this problem, we present an image-enhancement method for NBI bronchoscopy that improves: 1) visualization of vascular structures; 2) lesion detection; and 3) vessel segmentation. We adapted Real-ESRGAN, a single-image super-resolution network, to enhance bronchoscopic images in real time. This involved a transfer-learning approach to fine-tune a pre-trained model on our public NBI bronchial lesion database. The results, derived from bronchoscopic airway-exam videos of 10 lung cancer patients, demonstrate a significant improvement in the visual quality of super-resolved frames, particularly in vascular regions. Our quantitative analysis further shows improved vessel segmentation and lesion detection accuracy, with increased confidence scores. The method offers a practical, real-time solution for improving the diagnostic utility of NBI bronchoscopy by providing clearer, more detailed images. We therefore integrated it into an NBI video-analysis system to aid the early detection and characterization of bronchial lesions.
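As a rough illustration of the transfer-learning step described above, the sketch below fine-tunes a pre-trained Real-ESRGAN-style RRDBNet generator on paired low/high-resolution NBI crops with an L1 loss. The data loader, checkpoint key, and frozen-layer choice are assumptions for illustration, not the authors' actual training recipe.

# Hedged sketch: fine-tuning a pre-trained Real-ESRGAN-style generator (RRDBNet)
# on paired low/high-resolution NBI frame crops. The dataset loader, checkpoint
# path/key, and freezing strategy are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from basicsr.archs.rrdbnet_arch import RRDBNet  # generator architecture used by Real-ESRGAN

def finetune(pairs_loader: DataLoader, ckpt_path: str, epochs: int = 20,
             device: str = "cuda") -> RRDBNet:
    # 4x RRDB generator with the default Real-ESRGAN configuration.
    net = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                  num_block=23, num_grow_ch=32, scale=4).to(device)
    state = torch.load(ckpt_path, map_location=device)
    # Many released checkpoints store EMA weights under "params_ema" (assumption).
    net.load_state_dict(state.get("params_ema", state), strict=True)

    # Freeze the shallow feature extractor; adapt the deeper RRDB trunk and
    # upsampling layers to the NBI domain (one possible transfer strategy).
    for p in net.conv_first.parameters():
        p.requires_grad = False

    opt = torch.optim.Adam((p for p in net.parameters() if p.requires_grad), lr=1e-4)
    l1 = nn.L1Loss()

    net.train()
    for _ in range(epochs):
        for lr_crop, hr_crop in pairs_loader:   # paired (LR, HR) NBI crops
            lr_crop, hr_crop = lr_crop.to(device), hr_crop.to(device)
            sr = net(lr_crop)
            loss = l1(sr, hr_crop)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net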
ISBN (Print): 9798350318920; 9798350318937
The Space-Time Video Super-Resolution (STVSR) task aims to enhance the visual quality of videos by simultaneously performing video frame interpolation (VFI) and video super-resolution (VSR). However, facing the challenge of the additional temporal dimension and scale inconsistency, most existing STVSR methods are complex and inflexible in dynamically modeling different motion amplitudes. In this work, we find that choosing an appropriate processing scale achieves remarkable benefits in flow-based feature propagation. We propose a novel Scale-Adaptive Feature Aggregation (SAFA) network that adaptively selects sub-networks with different processing scales for individual samples. Experiments on four public STVSR benchmarks demonstrate that SAFA achieves state-of-the-art performance. Our SAFA network outperforms recent state-of-the-art methods such as TMNet [83] and VideoINR [10] by an average of over 0.5 dB in PSNR, while requiring less than half the parameters and only one-third of the computational cost.
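The abstract does not give implementation details, but the core idea of selecting a per-sample processing scale can be sketched as follows; the gating rule, branch design, and candidate scales are illustrative assumptions, not the published SAFA architecture.

# Hedged sketch of the scale-adaptive idea: a cheap gate picks a processing
# scale per sample and features are refined at that scale. Module names and
# the gating rule are illustrative, not the SAFA implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAdaptiveBlock(nn.Module):
    def __init__(self, channels: int = 64, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # One small residual branch per candidate scale.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.ReLU(inplace=True),
                          nn.Conv2d(channels, channels, 3, padding=1))
            for _ in scales)
        # Tiny gate that scores the scales from globally pooled features.
        self.gate = nn.Linear(channels, len(scales))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Hard per-sample scale selection (inference-time behaviour).
        scores = self.gate(feat.mean(dim=(2, 3)))        # (B, num_scales)
        idx = scores.argmax(dim=1)                        # (B,)
        out = torch.empty_like(feat)
        for b in range(feat.size(0)):
            choice = int(idx[b])
            s = self.scales[choice]
            x = feat[b:b + 1]
            h, w = x.shape[-2:]
            if s > 1:  # process at a coarser scale for large motions
                x = F.interpolate(x, scale_factor=1.0 / s, mode="bilinear",
                                  align_corners=False)
            x = x + self.branches[choice](x)               # residual refinement
            if s > 1:
                x = F.interpolate(x, size=(h, w), mode="bilinear",
                                  align_corners=False)
            out[b:b + 1] = x
        return out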
The rise of video streaming has been a boon for the digital visual processing industry. Video streaming technology has made it easier for companies to capture, analyze, and interpret visual da...
ISBN (Print): 9783031064333; 9783031064326
We present a method that can analyze coded ultra-high-resolution (UHD) video content an order of magnitude faster than real time. We observe that the larger the resolution of a video, the larger the fraction of the overall processing time spent on decoding frames from the video. In this paper, we exploit the way video is coded to significantly speed up the frame decoding process. More precisely, we only decode keyframes, which can be decoded significantly faster than 'random' frames in the video. A key insight is that in modern video codecs, keyframes are often placed around scene changes (shot boundaries), and hence form a very representative subset of the video's frames. We show, using video genre tagging as an example, that keyframes lend themselves nicely to video analysis tasks. Unlike previous genre prediction methods, which combine a multitude of signals, we train a per-frame genre classification system using a CNN that takes only (key-)frames as input. We show that the aggregated genre predictions are very competitive with much more involved methods at predicting video genre(s), and even outperform state-of-the-art genre tagging methods that rely solely on video frames as input. The proposed system can reliably tag the genres of a compressed video between 12x (8K content) and 96x (1080p content) faster than real time.
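A minimal sketch of keyframe-only decoding, assuming PyAV is used as the decoder front end: the codec is told to skip non-keyframes, so only I-frames are decompressed. The classify() callable is a placeholder for a per-frame genre CNN, and averaging is only one possible aggregation.

# Hedged sketch: decode keyframes only with PyAV and aggregate per-frame
# genre predictions. classify() and mean-aggregation are placeholders.
import av
import numpy as np

def tag_genres(video_path: str, classify) -> np.ndarray:
    container = av.open(video_path)
    stream = container.streams.video[0]
    stream.codec_context.skip_frame = "NONKEY"   # decode keyframes only

    per_frame_scores = []
    for frame in container.decode(stream):
        rgb = frame.to_ndarray(format="rgb24")    # H x W x 3 uint8 keyframe
        per_frame_scores.append(classify(rgb))    # e.g. softmax over genres
    container.close()

    # Aggregate per-keyframe predictions into one video-level genre score.
    return np.mean(per_frame_scores, axis=0)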
ISBN (Print): 9798350386851; 9798350386844
This study focused on improving an existing automatic optical inspection application, which uses rule-based image processing, object detection, image segmentation, and template matching, through dynamic database updates. A real-time screw defect detection method for varying light-source conditions is proposed, ultimately reducing defect detection cost and time. Unlike CNNs, which require a large amount of training material to reach a specific accuracy rate, this method needs only a few samples. The effectiveness of the proposed method has been evaluated through extensive experiments.
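A minimal sketch of the rule-based template-matching step, assuming OpenCV: a golden screw template is compared against the inspection image with normalized cross-correlation, and a low best-match score flags a potential defect. The threshold and file paths are illustrative, not the paper's tuned values.

# Hedged sketch: template matching for screw inspection with OpenCV.
# The 0.8 threshold and the golden-template path are assumptions.
import cv2

def is_defective(image_path: str, template_path: str, threshold: float = 0.8) -> bool:
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    tpl = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    # Normalized cross-correlation is relatively robust to lighting changes.
    result = cv2.matchTemplate(img, tpl, cv2.TM_CCOEFF_NORMED)
    _, max_score, _, _ = cv2.minMaxLoc(result)
    # A low best-match score suggests the screw deviates from the golden sample.
    return max_score < threshold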
ISBN (Print): 9781665450928
There are many studies on the acquisition of monoscopic and stereoscopic panoramas in the literature. Since obtaining motion parallax accurately without distortions is a very challenging problem, 360-degree stereoscopic image and video capture in particular has become a prevalent research topic. However, studies on this topic have focused on costly systems with many cameras and high processing-power demands. In this study, which presents an efficient solution in terms of processing power and cost, stereoscopic frames are processed in real time using three consumer-grade 360-degree cameras whose outputs are sampled according to the view orientation. In addition, a method is developed to eliminate the distortions around the borders of the field of view by blending with selected auxiliary camera frames.
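A minimal sketch of the border-blending idea, assuming the auxiliary frame has already been aligned to the primary camera's view: pixels near the edge of the field of view are alpha-blended with the auxiliary frame so that border distortions are softened. The ramp width is an assumed parameter, not a value from the paper.

# Hedged sketch: alpha-blend an aligned auxiliary-camera frame near the
# left/right borders of the primary camera's field of view.
import numpy as np

def blend_border(primary: np.ndarray, auxiliary: np.ndarray,
                 ramp_px: int = 64) -> np.ndarray:
    """primary/auxiliary: aligned HxWx3 float images in [0, 1]."""
    h, w, _ = primary.shape
    # Horizontal alpha ramp: 1.0 in the centre, falling to 0.0 at the borders.
    x = np.arange(w, dtype=np.float32)
    alpha = np.minimum(1.0, np.minimum(x, w - 1 - x) / float(ramp_px))
    alpha = np.broadcast_to(alpha[None, :, None], (h, w, 1))
    return alpha * primary + (1.0 - alpha) * auxiliary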
ISBN (Print): 9798350318920; 9798350318937
Recently, the vision transformer (ViT) has achieved remarkable performance in computer vision tasks and has been actively utilized in colorization. A vision transformer uses multi-head self-attention to effectively propagate user hints to distant relevant areas in the image. However, despite the success of vision transformers in colorization, the heavy underlying ViT architecture and large computational cost hinder real-time user interaction in colorization applications. Several works have removed redundant image patches to reduce the computational cost of ViT in image classification tasks. However, existing efficient ViT methods cause severe performance degradation in the colorization task since they completely remove the redundant patches. Thus, we propose a novel efficient ViT architecture for real-time interactive colorization, AdaColViT, which determines which redundant image patches and layers to reduce in the ViT. Unlike existing methods, our novel pruning method alleviates the performance drop and flexibly allocates computational resources across input samples, effectively achieving actual acceleration. In addition, we demonstrate through extensive experiments on the ImageNet-ctest10k, Oxford 102 Flowers, and CUB-200 datasets that our method outperforms the baseline methods.
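For context, the baseline patch-reduction idea the abstract refers to can be sketched as a learned top-k token selection; this illustrates the generic pruning approach whose shortcomings motivate AdaColViT, not the AdaColViT method itself.

# Hedged sketch: generic token pruning for a ViT layer. A learned score ranks
# patch tokens and only the top-k are kept. Names and the keep ratio are
# illustrative assumptions.
import torch
import torch.nn as nn

class TokenPruner(nn.Module):
    def __init__(self, dim: int, keep_ratio: float = 0.7):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.score = nn.Linear(dim, 1)   # importance score per patch token

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        """tokens: (B, N, dim) patch embeddings (class/hint tokens excluded)."""
        b, n, d = tokens.shape
        k = max(1, int(n * self.keep_ratio))
        scores = self.score(tokens).squeeze(-1)             # (B, N)
        keep = scores.topk(k, dim=1).indices                 # (B, k)
        idx = keep.unsqueeze(-1).expand(-1, -1, d)           # (B, k, dim)
        return tokens.gather(dim=1, index=idx)               # reduced token set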
Video-based dynamic mesh compression is being developed as a standard for compressing time-varying mesh data using existing 2D video codecs. In this paper, we propose a spatial scalable structure for video-based d...
Cloud virtual reality (VR) gaming is a novel technology that allows users to enjoy complex games on their thin clients by offloading the graphics rendering to cloud servers. The thin clients only need to perform basic...
ISBN (Print): 9781665491303
High-frame-rate videos have recently been used for training a variety of image processing tasks such as image/video deblurring and frame interpolation. Yet, the number of publicly available datasets with high-frame-rate videos is still limited. In this paper, we propose the iPhone 240fps dataset, a dataset that consists of real-world 240 fps videos. We conduct experiments with deblurring and frame interpolation models to show the effectiveness of this dataset. We also use this dataset to conduct experiments with a state-of-the-art video super-resolution model. We take advantage of the flexibility to change the frame rate of the dataset and test which fps is best suited for training the model. We believe this dataset can contribute to future work in image and video processing. The dataset is publicly available on our cloud storage(dagger):
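A minimal sketch of the frame-rate experiment described above: the same 240 fps footage is subsampled to lower frame rates by keeping every n-th frame, so training data at 120, 60, or 30 fps can be derived from one capture. The helper name and the even-divisibility restriction are assumptions for illustration.

# Hedged sketch: derive a lower-fps clip from 240 fps footage by uniform
# temporal subsampling. Works on any sequence of frames (list, tuple, array).
from typing import Sequence, List

def subsample_fps(frames: Sequence, source_fps: int = 240,
                  target_fps: int = 60) -> List:
    if source_fps % target_fps != 0:
        raise ValueError("target_fps should evenly divide source_fps")
    step = source_fps // target_fps
    return list(frames[::step])

# Example: 240 fps -> 30 fps keeps every 8th frame.
# clip_30fps = subsample_fps(clip_240fps, target_fps=30)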