In response to the urgent demands that military optoelectronic systems place on field-of-view range, image quality, and system miniaturization, this article designs a large-aperture off-axis three-mirror reflective optical system accor...
ISBN (print): 9789819786848; 9789819786855
With the groundbreaking development of Artificial Intelligence (AI) technology, the volume of video and image data consumed by machines has surpassed that consumed by humans. Consequently, video coding for machines has grown rapidly, with feature coding emerging as a prominent technique that offers exceptional compression and task performance. This technology has now entered the stages of chip integration and industrialization. However, feature coding for machines can reconstruct only feature tensors, not images, while typical machine vision applications such as smart cities, industrial quality inspection, intelligent transportation, and automated broadcasting still require video and image review for the retrospective analysis of abnormal events, the confirmation of incidents, and evidentiary purposes. This work therefore attempts to reconstruct images from feature tensors extracted for machine vision tasks so that they can be inspected by human observers. We propose a lightweight, plug-in feature-to-image reconstruction method for feature coding for machines, built from low-complexity neural network blocks. The proposed method achieves an average Peak Signal-to-Noise Ratio (PSNR) of up to 28.92 dB, meeting the requirements of human visual perception.
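PSNR, the metric behind the 28.92 dB figure above, is defined as PSNR = 10 · log10(MAX² / MSE). The sketch below illustrates only that formula; it assumes 8-bit pixels (MAX = 255) passed as flat lists, whereas the abstract does not state the bit depth, colour space, or averaging protocol actually used. The helper names `psnr` and `mse_for_psnr` are hypothetical.

```python
import math

def psnr(reference, reconstructed, max_value=255.0):
    """PSNR in dB between two equal-length pixel sequences (8-bit assumed by default)."""
    if not reference or len(reference) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_for_psnr(target_db, max_value=255.0):
    """Invert the PSNR formula: the MSE that produces a given PSNR."""
    return max_value ** 2 / 10 ** (target_db / 10.0)

# A uniform error of 10 grey levels gives MSE = 100.
print(round(psnr([0, 0, 0, 0], [10, 10, 10, 10]), 2))  # 28.13
print(round(mse_for_psnr(28.92), 1))                   # 83.4
```

Under the 8-bit assumption, an average PSNR of 28.92 dB corresponds to an MSE of about 83.4, i.e. a root-mean-square error of roughly 9 grey levels per pixel.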
The Hough transform is a powerful mathematical tool designed for detecting geometric patterns, including straight lines, circles, and other shapes, within images. The fundamental idea behind this transformation is to ...
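The preview above is truncated before the method details, so as a generic illustration only, the sketch below implements the standard line-detecting form of the Hough transform: each point (x, y) votes for every line ρ = x·cos θ + y·sin θ passing through it, and collinear points pile their votes into the same (ρ, θ) accumulator cell. The bin sizes and the function names `hough_lines` and `strongest_line` are assumptions for this example, not taken from the paper.

```python
import math
from collections import Counter

def hough_lines(points, rho_step=1.0, n_theta=180):
    """Accumulate Hough votes for straight lines in (rho, theta) space.

    Each point (x, y) votes once per theta for the line
    rho = x*cos(theta) + y*sin(theta) passing through it.
    Keys are (rho_bin, theta_index), with theta = pi * theta_index / n_theta.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta  # theta sampled in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_step), t)] += 1
    return votes

def strongest_line(points, rho_step=1.0, n_theta=180):
    """Return (rho, theta, votes) for the accumulator cell with the most votes."""
    if not points:
        raise ValueError("need at least one point")
    votes = hough_lines(points, rho_step, n_theta)
    (rho_bin, t), count = votes.most_common(1)[0]
    return rho_bin * rho_step, math.pi * t / n_theta, count

# Ten points on the vertical line x = 5 all vote for (rho = 5, theta = 0).
edge_points = [(5, y) for y in range(10)]
print(hough_lines(edge_points)[(5, 0)])  # 10
```

With 1-pixel ρ bins, angles within a few degrees of θ = 0 also collect all ten votes for these points, so practical detectors typically add non-maximum suppression over the accumulator before reporting lines.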
ISBN (print): 9789819786190; 9789819786206
Multimodal sentiment analysis (MSA) aims to predict the sentiment expressed in paired images and texts. Cross-modal feature alignment is crucial for models to understand the context and extract complementary semantic features. However, most previous MSA methods are deficient in aligning features across different modalities. Experimental evidence shows that prompt learning can effectively align features, and previous studies have applied prompt learning to MSA tasks, but only in a unimodal context; applying prompt learning to multimodal feature alignment remains a challenge. This paper proposes a multimodal sentiment analysis model based on alignment prompts (MSAPL). The model generates text and image alignment prompts via the Kronecker product, strengthening the engagement of the visual modality and the correlation between image and text data, and thus enabling a better understanding of multimodal data. In addition, it uses a multi-layer, stepwise learning approach to acquire textual and image features, progressively modeling the relationships between stage features for rich contextual learning. Experiments on three public datasets show that the model consistently outperforms all baseline models.
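The abstract says MSAPL builds its alignment prompts with the Kronecker product but gives no tensor shapes, so the sketch below only shows what the operation itself computes: for an m×n matrix A and a p×q matrix B, A ⊗ B is the mp×nq block matrix whose (i, j) block is a_ij·B. The vectors `text_prompt` and `image_prompt` are hypothetical placeholders, not the model's actual prompts.

```python
def kron(a, b):
    """Kronecker product of an m x n matrix `a` and a p x q matrix `b` (lists of rows)."""
    m, n = len(a), len(a[0])
    p, q = len(b), len(b[0])
    # Entry (i, j) of the result is a[i // p][j // q] * b[i % p][j % q].
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(n * q)]
        for i in range(m * p)
    ]

# Hypothetical 1 x 2 "text" and 1 x 3 "image" prompt vectors.
text_prompt = [[1, 2]]
image_prompt = [[3, 4, 5]]
joint = kron(text_prompt, image_prompt)
print(joint)  # [[3, 4, 5, 6, 8, 10]]
```

Each of the six entries is the product of one text component and one image component, so the result encodes every pairwise cross-modal interaction, at the cost of a size that grows multiplicatively with the input dimensions.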
ISBN (print): 9789819784981; 9789819784998
Comprehensive analysis of abnormal changes in anatomical structures in two-dimensional grey-scale ultrasound images, together with blood-flow characteristics in color Doppler images, is more conducive to identifying ventricular septal defect (VSD). Taking a multi-modality perspective, this paper designs a multi-modality correlation learning network (MC-Net) for VSD identification. MC-Net analyzes the correlation among multi-modality features from two perspectives: the network structure and the image itself. For the network structure, the paper first constructs dual-branch feature cross-fusion blocks (CFB) that encode the associated information between modalities to fuse global and local features, and then further strengthens the fused features through a series of hybrid learning blocks (HLB). For the image itself, the paper designs a group selection transformer (GST) that captures the correlation between image tokens and their context, prompting the network to focus more effectively on the region of interest. Experiments are conducted on multi-modality five-chamber and parasternal short-axis views, and the results show that the proposed algorithm outperforms the comparison methods in identification performance.
ISBN (print): 9789819786848; 9789819786855
Image restoration aims to obtain a high-quality image from a degraded one. For real-world applications, an increasing number of methods address multiple degradations with a single model. However, most of these methods still require task-specific training and extract information primarily from the spatial domain. To overcome these limitations, we introduce FASPNet, a novel all-in-one network that incorporates both frequency and spatial information to handle various degradations without requiring any degradation priors. Specifically, we propose a Frequency Refiner Module (FRM) that adaptively adjusts frequency representations and captures crucial global frequency information to facilitate better image restoration. Furthermore, to provide essential low-level information related to restoration, we introduce a Spatial Prompt Module (SPM), which uses prompts to encode restoration-relevant spatial detail representations and abstract degradation patterns. Extensive experiments demonstrate that our model outperforms other baseline models on multiple datasets for three common and challenging tasks: deraining, dehazing, and denoising.
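The abstract does not detail how the Frequency Refiner Module adjusts frequency representations. One common way to act on global frequency information is to apply the discrete Fourier transform, scale each frequency bin by a weight (learned, in a network), and transform back. The 1-D sketch below illustrates only that general idea with fixed weights and a naive O(n²) DFT; it is an assumption, not FASPNet's FRM, and `frequency_refine` is a hypothetical name.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def frequency_refine(signal, weights):
    """Hypothetical frequency refinement: scale every DFT bin, then invert.

    In a network the weights would be learned; here they are fixed inputs.
    Real, symmetric weights (weights[k] == weights[-k]) keep the output real.
    """
    if len(weights) != len(signal):
        raise ValueError("need one weight per frequency bin")
    spectrum = dft(signal)
    refined = [w * s for w, s in zip(weights, spectrum)]
    return [v.real for v in idft(refined)]

# Keeping only the zero-frequency bin replaces every sample with the mean.
print(frequency_refine([1, 2, 3, 4], [1, 0, 0, 0]))  # every value is 2.5
```

Because every frequency bin depends on every input sample, even a simple per-bin weight acts on the whole signal at once, which is why frequency-domain operations are often described as having a global receptive field.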
Because of the limited processing capabilities of the web side, existing mainstream inference models adopt a client-to-server operation mode, and the emergence of WebAssembly has brought opportunities for the client ...
ISBN (print): 9783031776090
The proceedings contain 23 papers. The special focus in this conference is on Skin Imaging Collaboration, Interpretability of Machine Intelligence in Medical Image Computing, Embodied AI and Robotics for Healthcare Workshop and MICCAI Workshop on Distributed, Collaborative and Federated Learning. The topics include: DeCaF 2024 Preface; i2M2Net: Inter/Intra-modal Feature Masking Self-distillation for Incomplete Multimodal Skin Lesion Diagnosis; From Majority to Minority: A Diffusion-Based Augmentation for Underrepresented Groups in Skin Lesion Analysis; Segmentation Style Discovery: Application to Skin Lesion Images; A Vision Transformer with Adaptive Cross-image and Cross-Resolution Attention; Lesion Elevation Prediction from Skin Images Improves Diagnosis; DWARF: Disease-Weighted Network for Attention Map Refinement; PIPNet3D: Interpretable Detection of Alzheimer in MRI Scans; Detecting Unforeseen Data Properties with Diffusion Autoencoder Embeddings Using Spine MRI Data; Interpretability of Uncertainty: Exploring Cortical Lesion Segmentation in Multiple Sclerosis; TextCAVs: Debugging Vision Models Using Text; Evaluating Visual Explanations of Attention Maps for Transformer-Based Medical Imaging; Exploiting XAI Maps to Improve MS Lesion Segmentation and Detection in MRI; EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting; VISAGE: Video Synthesis Using Action Graphs for Surgery; A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery; SurgTrack: CAD-Free 3D Tracking of Real-World Surgical Instruments; MUTUAL: Towards Holistic Sensing and Inference in the Operating Room; Complex-Valued Federated Learning with Differential Privacy and MRI Applications; Enhancing Privacy in Federated Learning: Secure Aggregation for Real-World Healthcare Applications; Federated Impression for Learning with Distributed Heterogeneous Data; A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation; Probing the Effic
ISBN (print): 9789819784950; 9789819784967
In medical imaging applications, particularly cardiac and skeletal analysis, anatomical structure detection is crucial for diagnosing cardiac and other diseases. However, the domain gap between images acquired from different sources or modalities poses a significant challenge and impedes model generalization across diverse patient populations and imaging conditions. Bridging this gap is particularly essential in image-based diagnosis, where subtle variations in anatomical structures and imaging characteristics can profoundly affect diagnostic performance. Taking fetal cardiac ultrasound images as an example, this paper proposes a novel method for unsupervised domain-adaptive fetal cardiac structure detection. The method integrates both the frequency-based distributional properties and the anatomical structural information inherent in medical images. Specifically, we introduce a Frequency Distribution Alignment (FDA) module and an Organ Structure Alignment (OSA) module to mitigate detection misalignment across different hospital settings. We demonstrate the effectiveness of these modules through extensive experiments. Our method significantly improves fetal cardiac structure detection, enabling adaptation to diverse hospital scenarios and showing its potential for addressing domain gaps in medical imaging.
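The abstract does not describe the internals of its Frequency Distribution Alignment module. A well-known frequency-based alignment technique with the same acronym is Fourier Domain Adaptation (Yang and Soatto, CVPR 2020), which gives a source image the low-frequency amplitude spectrum of a target-domain image while keeping the source phase, on the intuition that low-frequency amplitude carries global appearance (such as overall brightness) and phase carries structure. The 1-D sketch below illustrates that swapped-in technique under stated assumptions (naive DFT, a hypothetical cutoff `beta`, hypothetical function names); it may differ from the module in this paper.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def swap_low_freq_amplitude(source, target, beta=0):
    """Give `source` the low-frequency amplitude of `target`, keeping source phase.

    Bins whose circular frequency index min(k, n - k) is <= beta are swapped.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have equal length")
    n = len(source)
    src_spec, tgt_spec = dft(source), dft(target)
    mixed = []
    for k in range(n):
        low = min(k, n - k) <= beta  # symmetric selection keeps the output real
        amplitude = abs(tgt_spec[k]) if low else abs(src_spec[k])
        mixed.append(cmath.rect(amplitude, cmath.phase(src_spec[k])))
    return [v.real for v in idft(mixed)]

# beta=0 swaps only the zero-frequency term, i.e. the mean intensity.
print(swap_low_freq_amplitude([1, 2, 3, 4], [5, 5, 5, 5]))  # approximately [3.5, 4.5, 5.5, 6.5]
```

In the example, the source keeps its shape but takes on the target's mean level; a larger `beta` would also transfer coarser intensity variations while leaving fine structure, carried mostly by phase and high frequencies, in place.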