Tracking image sources and verifying copyright information is crucial in digital media communication. Digital image watermarking technology, widely used for copyright protection and source tracking, faces challenges i...
ISBN:
(Print) 9798350349405; 9798350349399
Automated retinal image medical description generation is crucial for streamlining medical diagnosis and treatment planning. Existing challenges include the reliance on learned retinal image representations, difficulties in handling multiple imaging modalities, and the lack of clinical context in visual representations. Addressing these issues, we propose the Multi-Modal Medical Transformer (M3T), a novel deep learning architecture that integrates visual representations with diagnostic keywords. Unlike previous studies focusing on specific aspects, our approach efficiently learns contextual information and semantics from both modalities, enabling the generation of precise and coherent medical descriptions for retinal images. Experimental studies on the DeepEyeNet dataset validate the success of M3T in meeting ophthalmologists' standards, demonstrating a substantial 13.5% improvement in BLEU@4 over the best-performing baseline model.
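The BLEU@4 improvement cited above can be made concrete with a minimal sketch. The function below computes sentence-level BLEU@4 from modified n-gram precisions and a brevity penalty; the sample caption and reference are invented for illustration, and the paper itself reports the standard corpus-level metric.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Sentence-level BLEU@4: geometric mean of clipped 1..4-gram
    precisions, scaled by a brevity penalty (no smoothing)."""
    precisions = []
    for n in range(1, 5):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

# Invented example descriptions, for illustration only.
ref = "mild non proliferative diabetic retinopathy in both eyes".split()
cand = "mild non proliferative diabetic retinopathy in the right eye".split()
print(round(bleu4(cand, ref), 3))
```

A perfect match scores 1.0; any missing 4-gram overlap drives the unsmoothed score to 0, which is why corpus-level aggregation or smoothing is used in practice.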
ISBN:
(Print) 9798350349405; 9798350349399
The 1-ms visual feedback system is critical for seamless actuation in robotics, as any delay affects its performance in handling dynamic situations. Specular reflections cause problems in many visual technologies, making specular detection crucial in 1-ms visual feedback systems. However, existing real-time methods, which target the von Neumann architecture, fail to achieve the 1-ms delay due to spatial memory paths resulting from extensive frame-based processing. This research aims to develop a 1-ms specular detection system from both the algorithm and architecture perspectives, proposing 1) a temporal-clustering and temporal-reference-based specular detection method, which leverages temporal-domain information to meet the requirements of frame-based processing; and 2) a global-local integrated specular detection architecture, which enables the coexistence of local and global processing within a 1-ms stream-based architecture. The proposed methods are implemented on an FPGA. The evaluation shows that the proposed system supports sensing and processing a 1000-fps sequence with a delay of 0.941 ms/frame.
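The paper's exact algorithm is not reproduced here, but the idea of temporal-reference-based detection can be sketched: keep a per-pixel temporal baseline and flag pixels whose intensity suddenly spikes above it. The threshold, the moving-average update, and all names below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def detect_specular(frame, temporal_ref, spike_thresh=80):
    """Flag pixels whose intensity spikes far above a temporal baseline.

    `temporal_ref` is a per-pixel reference (here an exponential moving
    average of recent frames); threshold and update rule are illustrative.
    """
    diff = frame.astype(np.int32) - temporal_ref.astype(np.int32)
    return diff > spike_thresh

def update_reference(temporal_ref, frame, alpha=0.1):
    """Exponential moving average update of the per-pixel baseline."""
    return (1 - alpha) * temporal_ref + alpha * frame

# Example: a static 4x4 scene with one sudden specular highlight.
ref = np.full((4, 4), 50.0)
frame = ref.copy()
frame[1, 2] = 200.0            # specular spike at one pixel
mask = detect_specular(frame, ref)
ref = update_reference(ref, frame)
print(mask.sum())              # 1 pixel flagged
```

A stream-based hardware realization would evaluate this per pixel as samples arrive, avoiding the frame buffer that breaks the 1-ms budget on von Neumann designs.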
Computer vision plays a very important role in the development of artificial intelligence, but its key techniques often perform poorly in practical applications. Traditional neural network algorithm...
ISBN:
(Print) 9798350349405; 9798350349399
An electrocardiogram (ECG) captures the heart's electrical signal to assess various heart conditions. In practice, ECG data is stored as either digitized signals or printed images. Despite the emergence of numerous deep learning models for digitized signals, many hospitals prefer image storage due to cost considerations. Recognizing the unavailability of raw ECG signals in many clinical settings, we propose VizECGNet, which uses only printed ECG graphics to determine the prognosis of multiple cardiovascular diseases. During training, cross-modal attention modules (CMAM) are used to integrate information from two modalities - image and signal, while self-modality attention modules (SMAM) capture inherent long-range dependencies in ECG data of each modality. Additionally, we utilize knowledge distillation to improve the similarity between two distinct predictions from each modality stream. This innovative multi-modal deep learning architecture enables the utilization of only ECG images during inference. VizECGNet with image input achieves higher performance in precision, recall, and F1-Score compared to signal-based ECG classification models, with improvements of 3.50%, 8.21%, and 7.38%, respectively.
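As a rough illustration of what a cross-modal attention module computes, the sketch below lets image tokens attend over signal tokens with scaled dot-product attention. It is NumPy-only; the learned Q/K/V projections, multi-head structure, and the shapes used are simplifying assumptions, not VizECGNet's actual module.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(img_tokens, sig_tokens):
    """Image tokens query signal tokens via scaled dot-product attention.

    Shapes: img_tokens (n_img, d), sig_tokens (n_sig, d). A real CMAM
    would apply learned Q/K/V projections before this step.
    """
    d = img_tokens.shape[-1]
    scores = img_tokens @ sig_tokens.T / np.sqrt(d)    # (n_img, n_sig)
    weights = softmax(scores, axis=-1)                 # rows sum to 1
    return weights @ sig_tokens                        # (n_img, d)

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))      # toy image-modality tokens
sig = rng.normal(size=(6, 8))      # toy signal-modality tokens
out = cross_modal_attention(img, sig)
print(out.shape)                   # (4, 8)
```

Each output row is a convex combination of the signal tokens, which is how the image stream absorbs signal-modality context during training.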
ISBN:
(Print) 9783031734762; 9783031734779
Zero-shot learning (ZSL) addresses the challenge of classifying unseen test images without explicit training on those samples. ZSL can identify and classify the abundant unlabeled images available by learning from visual and semantic embedding vectors (feature vectors). Information-enriched visual features extracted from images play a crucial role in ZSL. This paper proposes a hybrid feature approach that integrates low-level (LL) and high-level (HL) features extracted from images. Gray Level Co-occurrence Matrix (GLCM) and Gabor features are employed to obtain LL texture features, while HL features are derived from the ResNet-50 model, renowned for capturing complex hierarchical representations. These hybrid visual features are then mapped to semantic features using a linear mapping, where the semantic features are embedding vectors of labels generated by the fastText model. Experiments are conducted on the AWA2 and SUN datasets to evaluate the proposed approach's effectiveness. The hybrid feature approach demonstrates enhanced quality in zero-shot image classification, effectively classifying images that the model has not seen during training.
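As a concrete illustration of the low-level texture features involved, a Gray Level Co-occurrence Matrix for a single displacement can be computed as below. Real GLCM descriptors aggregate several offsets and derive statistics (contrast, energy, homogeneity) from the matrix; the sample image here is invented.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Gray Level Co-occurrence Matrix for one displacement (dx, dy).

    Entry (i, j) counts how often gray level i co-occurs with gray
    level j at the given offset, scanning the quantized image once.
    """
    m = np.zeros((levels, levels), dtype=np.int64)
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[image[y, x], image[y + dy, x + dx]] += 1
    return m

# Toy 4-level quantized image, horizontal neighbor offset.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
m = glcm(img, levels=4)
print(m)
```

Statistics of this matrix (rather than the matrix itself) are what would be concatenated with the Gabor and ResNet-50 features before the linear mapping to the fastText label embeddings.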
ISBN:
(Print) 9798350349405; 9798350349399
Implicit Neural Representations (INR) are a novel data representation technique that is gaining ground in the image compression field due to its simplicity and interesting results in terms of the rate/distortion ratio. Although a variety of methods based on this paradigm have been proposed, limited attention has been given to the analysis of the loss function and to the impact of compression artifacts on the visual quality of the reconstructed images. These artifacts are mainly due to the adoption of the simple Mean Squared Error (MSE) loss function and to evaluation done merely in terms of Peak Signal-to-Noise Ratio (PSNR), neither of which correlates well with human perception. In this paper, we evaluate a set of five loss functions in the context of training INRs for image compression, applied to three state-of-the-art architectures, and assess their effect on a broader collection of quantitative metrics and on the visual fidelity of the decoded images to the originals. The presented outcomes show that reconstructions obtained by training with loss functions such as MSE suffer from over-smoothing and aliasing artifacts. Our findings reveal that by employing a suitable loss function, state-of-the-art architectures quantitatively and qualitatively outperform the results reported in their original papers.
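For reference, the two quantities the paper argues are insufficient on their own, the MSE loss and the PSNR metric, can be computed as in this minimal sketch (the toy images are invented):

```python
import numpy as np

def mse(x, y):
    """Mean Squared Error between two images."""
    return np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)

def psnr(x, y, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means closer to the original."""
    e = mse(x, y)
    return float('inf') if e == 0 else 10 * np.log10(peak ** 2 / e)

# A uniform error of 5 gray levels on an 8-bit image.
orig = np.full((8, 8), 100.0)
recon = orig + 5.0
print(round(psnr(orig, recon), 2))
```

Because PSNR is a monotone function of MSE, training with MSE optimizes PSNR directly, which is exactly the circularity the paper's broader metric set is meant to break.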
ISBN:
(Print) 9783031585340; 9783031585357
This article introduces a novel multi-modal image fusion approach based on the Convolutional Block Attention Module and dense networks to enhance human perceptual quality and information content in the fused images. As a pre-processing step, the proposed model preserves the edges of the infrared images and enhances the contrast of the visible images. The use of the Convolutional Block Attention Module then results in the extraction of more refined features from the source images. The visual results demonstrate that the fused images produced by the proposed method are visually superior to those generated by most standard fusion techniques. To substantiate the findings, quantitative analysis is conducted using various metrics. The proposed method exhibits the best Naturalness Image Quality Evaluator and Chen-Varshney metric values, which are human-perception-based parameters. Moreover, the fused images exhibit the highest Standard Deviation value, signifying enhanced contrast. These results confirm that the proposed multi-modal image fusion technique outperforms standard methods both qualitatively and quantitatively, producing superior fused images with improved human perceptual quality.
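A minimal sketch of the gating structure of a Convolutional Block Attention Module (channel attention followed by spatial attention) is shown below. The learned MLP and convolution weights are replaced by raw pooled statistics, so this illustrates only the data flow, not the authors' trained module.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cbam(feat):
    """Channel-then-spatial attention over a (C, H, W) feature map.

    Learned weights are omitted: the gates are built directly from
    average- and max-pooled statistics, as a structural sketch only.
    """
    # Channel attention: gate each channel by its pooled statistics.
    ch = sigmoid(feat.mean(axis=(1, 2)) + feat.max(axis=(1, 2)))
    feat = feat * ch[:, None, None]
    # Spatial attention: gate each location by cross-channel statistics.
    sp = sigmoid(feat.mean(axis=0) + feat.max(axis=0))
    return feat * sp[None, :, :]

x = np.random.default_rng(1).normal(size=(3, 4, 4))
y = cbam(x)
print(y.shape)   # (3, 4, 4)
```

Since both gates lie in (0, 1), the module re-weights rather than replaces features, which is why it can be dropped into a dense fusion network without changing tensor shapes.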
ISBN:
(Print) 9798350350463; 9798350350456
In this work, we propose an Unequal Error Protection technique suitable for scenarios that demand low power consumption and are equipped with limited computational capabilities. CosinePrism incorporates colour theory and techniques from lossy image compression methods to optimize the allocation of the Forward Error Correction factor. CosinePrism can progressively deliver RGB and non-RGB images over half-duplex links, and includes a JPEG re-compression mode. The proposed technique achieves compression efficiency and image quality comparable to JPEG, and on average offers a DSSIM score improvement of 55.5% over the Equal Error Protection method. An open-source reference encoder is provided to the community for further research and development.
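The DSSIM score mentioned above is the structural dissimilarity, conventionally defined as (1 - SSIM) / 2. The sketch below uses a single global window instead of the usual sliding-window SSIM, so it is only an approximation of the metric the paper reports; the constants follow the standard SSIM defaults.

```python
import numpy as np

def ssim_global(x, y, peak=255.0):
    """Single-window (global-statistics) SSIM; the standard metric
    instead averages this quantity over local sliding windows."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def dssim(x, y):
    """Structural dissimilarity: 0 for identical images, higher is worse."""
    return (1 - ssim_global(x, y)) / 2

# A horizontal gradient test image compared with itself.
a = np.tile(np.arange(64, dtype=np.float64), (64, 1))
print(round(dssim(a, a), 6))   # near-zero for identical inputs
```

Lower DSSIM is better, so a 55.5% improvement means the delivered images are structurally much closer to the originals than under equal error protection.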
Visual information processing is required in a large number of different fields of human activity, and many new methods and techniques for performing it have appeared with the development of information technology....