In order to solve the problems of irregular targets and fuzzy boundaries in bone scintigraphy segmentation, an improved TransUNet model was proposed. The feature extraction part of the encoder is replaced with an asym...
Hyperspectral imaging is one of the most promising techniques for intraoperative tissue characterisation. Snapshot mosaic cameras, which can capture hyperspectral data in a single exposure, have the potential to make a real-time hyperspectral imaging system for surgical decision-making possible. However, optimal exploitation of the captured data requires solving an ill-posed demosaicking problem and applying additional spectral corrections. In this work, we propose a supervised learning-based image demosaicking algorithm for snapshot hyperspectral images. Due to the lack of publicly available medical images acquired with snapshot mosaic cameras, a synthetic image generation approach is proposed to simulate snapshot images from existing medical image datasets captured by high-resolution, but slow, hyperspectral imaging devices. Image reconstruction is achieved using convolutional neural networks for hyperspectral image super-resolution, followed by spectral correction using a sensor-specific calibration matrix. The results are evaluated both quantitatively and qualitatively, showing clear improvements in image quality compared to a baseline demosaicking method using linear interpolation. Moreover, our algorithm's fast processing time of 45 ms per image to obtain super-resolved RGB or oxygenation saturation maps for a state-of-the-art snapshot mosaic camera demonstrates its potential for seamless integration into real-time surgical hyperspectral imaging applications.
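The idea of simulating snapshot images from a full hyperspectral cube can be illustrated with a minimal sketch. It assumes a square filter pattern with a row-major band layout; the function name `simulate_snapshot_mosaic`, the `pattern` parameter, and the band ordering are hypothetical choices for illustration, not the authors' generation pipeline or a real sensor's filter arrangement.

```python
def simulate_snapshot_mosaic(cube, pattern=4):
    """Sample one band per pixel from a hyperspectral cube.

    cube[y][x][b] holds band b at pixel (x, y); the cube must have
    pattern * pattern bands. The band layout is an assumed row-major
    tiling, not a real sensor's filter arrangement.
    """
    height, width = len(cube), len(cube[0])
    mosaic = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Position inside the repeating pattern selects the band.
            band = (y % pattern) * pattern + (x % pattern)
            mosaic[y][x] = cube[y][x][band]
    return mosaic
```

Each output pixel keeps only one spectral band, which is why recovering the full cube from such a mosaic is ill-posed and calls for demosaicking.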
ISBN: (print) 9798350351439; 9798350351422
Deep learning methods can now generate high-quality synthetic speech that is perceptually indistinguishable from real speech. As synthetic speech can be used for nefarious purposes, speech forensics methods to detect fully synthetic speech have been developed. Speech editing tools can also create partially synthetic speech in which only a part of the speech signal is synthetic. Detecting these short synthetic segments within a speech signal requires specialized methods to determine the temporal location of the synthetic speech. In this paper, we propose the Synthetic Speech Localization Convolutional Transformer (SSLCT), a neural network and transformer method for synthetic speech localization. SSLCT can temporally localize synthetic speech segments as small as 20 milliseconds. We demonstrate that SSLCT achieves less than 10% Equal Error Rate (EER), which is an improvement over several existing methods.
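For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a generic threshold sweep, assuming that higher scores mean "more likely synthetic" and that label 1 marks synthetic segments; the function name and conventions are illustrative and are not taken from the SSLCT evaluation code.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: detector outputs, higher means more likely synthetic (assumed).
    labels: 1 for synthetic segments, 0 for real segments (assumed).
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)) + [float("inf")]:
        # Real segments flagged as synthetic.
        far = sum(s >= threshold for s in negatives) / len(negatives)
        # Synthetic segments that were missed.
        frr = sum(s < threshold for s in positives) / len(positives)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With a finite set of scores the two error rates rarely cross exactly, so the sketch reports their average at the threshold where they are closest.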
ISBN: (print) 9798350344868; 9798350344851
Real-world data often exhibit long-tailed distributions with heavy class imbalance, which deteriorates the generalization performance of the classifier. To mitigate this problem, we propose a novel Prototype-based Augmentation framework (ProAug) to address the data scarcity issue by augmenting the feature space for tail classes. Our ProAug consists of a prototype construction branch and a dynamic augmentation branch. The prototype-based dictionary is optimized with a category-aware margin loss to learn multi-center and discriminative prototypes for each category. In the dynamic augmentation branch, we aim to produce high-quality tail-class features by dynamically composing context-similar prototypes with an attention mechanism. Moreover, to further improve the reliability of prototypes and the quality of augmented features, a meta-update strategy is adopted to calibrate the two branches of ProAug to boost performance. Extensive empirical results on CIFAR-LT-10/100, ImageNet-LT, and iNaturalist 2018 demonstrate the effectiveness of our method.
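Composing prototypes with an attention mechanism can be pictured as a softmax-weighted sum. The sketch below assumes dot-product similarity and a `temperature` parameter; both, along with the function name `compose_prototypes`, are illustrative assumptions rather than the ProAug implementation.

```python
import math


def compose_prototypes(query, prototypes, temperature=1.0):
    """Blend prototypes using softmax attention weights.

    query: feature vector of a tail-class sample.
    prototypes: list of prototype vectors with the same dimension.
    Returns (composed_feature, attention_weights).
    """
    # Dot-product similarity between the query and each prototype (assumed).
    scores = [
        sum(q * p for q, p in zip(query, proto)) / temperature
        for proto in prototypes
    ]
    # Numerically stable softmax.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of prototypes, dimension by dimension.
    composed = [
        sum(w * proto[i] for w, proto in zip(weights, prototypes))
        for i in range(len(query))
    ]
    return composed, weights
```

Prototypes that are more similar to the query receive larger weights, so the composed feature stays close to the sample's context.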
This research introduces a novel approach that integrates Deep Contextual Learning (DCL), specifically the DCL-256-32 model, with an embedding model to accurately classify offense levels within the textual data. The DC...
Whole slide imaging (WSI) has become an essential tool in pathological diagnosis, owing to its convenience for remote and collaborative review. However, bringing the sample to the optimal axial position and imaging it without defocusing artefacts remains a challenge, as traditional methods are either not universal or time-consuming. Recently, deep learning has been shown to be effective in the autofocusing task of predicting the defocus distance. Here, we apply quantized spiral phase modulation in the Fourier domain of the captured images before feeding them into a lightweight neural network. This significantly reduces the average prediction error to below that of any previous work on an open dataset. Moreover, the high prediction speed suggests that the method can be applied on an edge device for real-time tasks with limited computational resources and memory footprint. (C) 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
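A spiral phase mask assigns each spatial frequency a phase equal to its azimuthal angle; quantizing it restricts that phase to a few discrete levels. The sketch below builds such a mask under stated assumptions: the number of `levels`, the topological `charge`, rounding to the nearest level, and the function name are all hypothetical, and the abstract does not specify the authors' quantization scheme.

```python
import cmath
import math


def quantized_spiral_phase_mask(size, levels=4, charge=1):
    """Build a size x size complex mask exp(i * phase) with quantized phase.

    The phase is the azimuthal angle around the mask centre, rounded to
    the nearest of `levels` equally spaced values (an assumed scheme).
    """
    step = 2 * math.pi / levels
    center = size // 2
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            # Azimuthal angle of this frequency sample around the centre.
            theta = charge * math.atan2(y - center, x - center)
            # Snap to the nearest quantization level, wrapped to [0, levels).
            k = round(theta / step) % levels
            row.append(cmath.exp(1j * k * step))
        mask.append(row)
    return mask
```

In use, a mask like this would be multiplied element-wise with the centred Fourier spectrum of the image before transforming back, which is the step the abstract refers to as modulation in the Fourier domain.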
ISBN: (print) 9798350350920
In recent years, light field imaging has gained significant attention in the scientific community due to its ability to provide a more immersive representation of the 3D world. However, ensuring the quality of light field images is crucial for their subsequent processing and applications. Deep learning methods, leveraging neural networks, have shown promising performance in Image Quality Assessment (IQA). However, the unique characteristics of light field data pose a challenge for existing IQA methods. To address this challenge, we propose a Robust Large-scale Dataset for Assessing Light Field Image Quality, named RLSD, specifically designed for evaluating the quality of light field images. The dataset comprises both real and synthetic scenes, covering a wide range of key low attributes and including three representative distortions: compression, noise, and blur. To obtain subjective evaluations, we adopt the single stimulus continuous quality evaluation (SSCQE) method and compute the Mean Opinion Score (MOS). We performed statistical analysis on the dataset, and the experimental results indicate that our proposed RLSD dataset includes various common scenes and distortion levels, making it suitable for designing and evaluating LF-IQA algorithms. The dataset is publicly available at the following link: "https://***/s/1kJmx4qsy8ywLPba-HwGCEg" (password: XY28).
This research puts forward a deep-learning-centered automotive manufacturing defect detection algorithm. It utilizes the SSD (Single Shot MultiBox Detector) algorithm to realize the efficient detection of surface flaw...
Sorting pomegranates based on quality grades is a crucial stage in their export preparation and packing process. Traditionally reliant on manual sorting methods which are time-consuming, prone to inaccuracies and incu...
ISBN: (print) 9798331506520
The proceedings contain 86 papers. The topics discussed include: robust real-time monitoring of complex human activities using multi modal video analytics; a robust approach for classifying laparoscopic video distortions using ResNet-50; enhancing x-ray image classification through neural architecture; revolutionary MRI imaging for Alzheimer’s: cutting-edge GANs and vision transformer solutions; advanced deep learning strategies for breast cancer image analysis; identifying surgical instruments in pedagogical cataract surgery videos through an optimized aggregation network; enhancing auxiliary cancer classification task for multi-task breast ultrasound diagnosis network; and bioinspired computer vision for effective extended reality applications.