ISBN: (Print) 9798350388459
The proceedings contain 21 papers. The topics discussed include: from ray tracing to channel impulse responses: a review on the description of polarimetric time-invariant SISO channels; an efficient algorithm for scheduling aircraft landing problem; modeling and characterization of a compact in line filter with transmission zeros; from concept to implementation: lessons learned in designing and deploying a visible light positioning system; designing an augmented reality teaching module for power consumption in FPGAs; navigating the future: digital twin in maritime industry; measurement of a baby dummy in a car for child presence detection; advancing automotive connectivity: new technologies and security considerations; digital twins to monitor IoT devices for green transformation of university campus; and feasibility study of time synchronization solution for the bistatic synthetic aperture radar using mobile platforms.
Author: Chen, Zhaoguo (College of Arts, Shandong Agricultural Engineering University, Jinan 250103, Shandong Province, China)
To fully harness the capabilities of computer graphics and image processing technologies and elevate the quality of visual communication design, this paper presents a comprehensive suite of innovative methodologies. F...
ISBN: (Print) 9783031734762; 9783031734779
Zero-shot learning (ZSL) addresses the challenge of classifying unseen test images without explicit training on those samples. By learning from visual and semantic embedding vectors (feature vectors), ZSL can identify and classify the unlabeled images that are available in abundance. Information-rich visual features extracted from images play a crucial role in ZSL. This paper proposes a hybrid feature approach that integrates low-level (LL) and high-level (HL) features extracted from images. Gray Level Co-occurrence Matrix (GLCM) and Gabor features provide the LL texture features, while the HL features are derived from the ResNet-50 model, which is known for capturing complex hierarchical representations. These hybrid visual features are then mapped to semantic features using a linear mapping, where the semantic features are embedding vectors of the class labels generated by the fastText model. Experiments on the AWA2 and SUN datasets evaluate the effectiveness of the proposed approach. The hybrid feature approach improves the quality of zero-shot image classification, correctly classifying images from classes that the model has not seen during training.
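The pipeline described in this abstract (concatenate LL and HL features, learn a linear map into the semantic space, assign the nearest unseen-class embedding) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the features and class embeddings are synthetic stand-ins for GLCM/Gabor, ResNet-50, and fastText outputs, the map is fitted with ridge regression, and all function names (`hybrid_features`, `fit_linear_map`, `predict`, `demo`) are hypothetical.

```python
import numpy as np


def hybrid_features(low_level, high_level):
    """Concatenate low-level and high-level feature matrices column-wise."""
    return np.concatenate([low_level, high_level], axis=1)


def fit_linear_map(X, S, reg=1e-3):
    """Ridge-regression map W such that X @ W approximates S.

    X: (n_samples, n_features) visual features of seen-class images.
    S: (n_samples, n_semantic) label embedding of each training image.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ S)


def predict(X, W, class_embeddings):
    """Project features into semantic space; pick the most cosine-similar class."""
    P = X @ W
    P /= np.linalg.norm(P, axis=1, keepdims=True) + 1e-12
    C = class_embeddings / (
        np.linalg.norm(class_embeddings, axis=1, keepdims=True) + 1e-12
    )
    return np.argmax(P @ C.T, axis=1)


def demo(seed=0):
    """Synthetic zero-shot run: train on 4 seen classes, test on 2 unseen ones."""
    rng = np.random.default_rng(seed)
    seen = np.eye(4)  # assumed stand-in for seen-class label embeddings
    unseen = np.array([[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]])
    # Assumption: both feature types depend linearly on the class semantics.
    A_ll = rng.normal(size=(4, 4))  # stand-in for texture (LL) features
    A_hl = rng.normal(size=(4, 8))  # stand-in for CNN (HL) features

    def sample(embeddings, labels):
        S = embeddings[labels]
        ll = S @ A_ll + 0.01 * rng.normal(size=(len(labels), 4))
        hl = S @ A_hl + 0.01 * rng.normal(size=(len(labels), 8))
        return hybrid_features(ll, hl)

    train_labels = np.repeat(np.arange(4), 20)
    W = fit_linear_map(sample(seen, train_labels), seen[train_labels])

    test_labels = np.repeat(np.arange(2), 10)
    pred = predict(sample(unseen, test_labels), W, unseen)
    return float(np.mean(pred == test_labels))


if __name__ == "__main__":
    print("unseen-class accuracy:", demo())
```

The unseen classes never appear during fitting; they are recognized only because their embeddings lie in the same semantic space as the seen ones, which is the premise of ZSL.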
ISBN: (Print) 9789819794393; 9789819794409
The Multimodal Abstractive Summarization (MAS) task aims to generate a concise summary from given multimodal data (textual and visual). Existing research still relies on simple splicing and blending of information from multiple modalities, without considering the interaction between an image and its corresponding text or the contextual structural relationship between image and text. We believe that these existing models cannot fully integrate multimodal information or leverage the Transformer's ability to process sequential data. To this end, for the MAS task, we use image captions that are highly correlated with the image for image fusion; design image-text alignment tasks to improve the effectiveness of the visual modality in text summarization tasks; and propose a sequential structured image-text fusion method to enhance the model's semantic understanding of sequences. Together, these methods let the visual modality contribute fully to the summarization task, strengthening the MAS model and yielding more accurate summaries. In experiments on a related dataset, ROUGE-1, ROUGE-2, and ROUGE-L improved by 1.34, 1.64, and 1.32, respectively, compared with the baseline model. Additionally, we contribute a large-scale sequential structured multimodal abstractive summarization dataset.
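For readers unfamiliar with the ROUGE metrics cited above, the following is a simplified sketch of how ROUGE-N and ROUGE-L F1 scores are computed from n-gram overlap and the longest common subsequence. It assumes lowercased whitespace tokenization and a single reference; official scorers add stemming and other options, so the values will not match them exactly, and this is not the evaluation script used in the paper. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 using clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    c = "the cat sat on the mat"
    r = "the cat lay on the mat"
    print(rouge_n(c, r, 1), rouge_n(c, r, 2), rouge_l(c, r))
```

Scores from these functions fall in the range 0 to 1.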
Most existing low-light image enhancement methods are designed for uniformly low illumination, so they are prone to issues such as overexposure and dark-area noise amplification when applied to nighttime road images w...
ISBN: (Print) 9783031804373; 9783031804380
Text-to-image generation is a cutting-edge technology that enables computers to generate images from textual descriptions. While this technology has been extensively researched and applied to English text, its application to Arabic text is still in its early stages. Additionally, the Arabic language is challenging due to its right-to-left writing system and extensive vocabulary of 1.3 million words. In this paper, we explore text-to-image generation from Arabic text descriptions. First, we fine-tune a transformer-based model pre-trained on Arabic text to transform the text information into the affine transformations within the DF-GAN generator. Second, we present a text transformer combined with LSTM layers to address the limitation of unrecognized words. Third, a mask predictor is trained within the generator using a weakly supervised method and incorporated into the affine transformation for a more effective integration of image and text features. In addition, we add the DAMSM loss as a regularization term to the loss function to achieve convergence and stability during training. Experiments on two challenging datasets, CUB and Oxford-flower, show that our architectures can accurately generate high-quality images that faithfully represent the Arabic textual descriptions. We believe that scaling this task could have important applications in fields such as Arabic visual learning, e-commerce, advertising, and entertainment.
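The text-conditioned affine transformation mentioned in this abstract can be illustrated with a small NumPy sketch: a sentence embedding is projected to a per-channel scale and shift, which then modulate the image feature map, and an optional spatial mask limits where the modulation applies. This is an illustration under assumptions, not the paper's architecture: single linear projections stand in for whatever networks predict the scale and shift, the mask blending rule is one plausible choice, and the names `text_affine`, `w_gamma`, and `w_beta` are hypothetical.

```python
import numpy as np


def text_affine(features, text_emb, w_gamma, b_gamma, w_beta, b_beta, mask=None):
    """Modulate image features channel-wise with text-derived scale and shift.

    features: (B, C, H, W) image feature maps.
    text_emb: (B, D) sentence embeddings.
    w_gamma, w_beta: (D, C) projection weights; b_gamma, b_beta: (C,) biases.
    mask: optional (B, 1, H, W) array in [0, 1]; 1 means fully modulated.
    """
    gamma = text_emb @ w_gamma + b_gamma  # (B, C) per-channel scale
    beta = text_emb @ w_beta + b_beta     # (B, C) per-channel shift
    modulated = gamma[:, :, None, None] * features + beta[:, :, None, None]
    if mask is None:
        return modulated
    # Assumed blending rule: keep the original features where the mask is 0.
    return mask * modulated + (1.0 - mask) * features


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(2, 3, 4, 4))
    emb = rng.normal(size=(2, 5))
    out = text_affine(
        feats, emb,
        rng.normal(size=(5, 3)), np.ones(3),
        rng.normal(size=(5, 3)), np.zeros(3),
        mask=rng.uniform(size=(2, 1, 4, 4)),
    )
    print(out.shape)
```

Because the scale and shift are broadcast over the spatial dimensions, the text on its own influences every location equally; the mask is what makes the conditioning spatially selective in this sketch.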
ISBN: (Print) 9789819770007; 9789819770014
Visualizing perceived content through functional magnetic resonance imaging (fMRI) analysis is a captivating research area in brain decoding. Previous studies have primarily focused on restoring either high-level or low-level semantic features from fMRI data, but have rarely restored both effectively. This study proposes a novel approach for decoding the visual cortex activity measured by fMRI into layered visual features that share the hierarchical information of the corresponding images. By iteratively optimizing the relationship between the layered visual features and the image's deep features extracted by a visual transformer, the method significantly improves the reconstruction of the image's deep features. Furthermore, by incorporating prior natural-image information through a deep generator network, this work enhances the reconstruction process, resulting in richer semantic details. Experimental results verify the effectiveness of the methodology in restoring both high-level and low-level semantic features of the perceived images, ultimately enhancing the overall visual fidelity of the reconstructed image. Importantly, the model generalizes successfully to the reconstruction of artificial shapes, indicating that its performance does not simply rely on extensive sample datasets. These findings demonstrate the efficacy of the method in reconstructing perceived content from hierarchical neural representations, providing a new way to study the brain's underlying mechanisms.
ISBN: (Print) 9798400709647
The proceedings contain 26 papers. The topics discussed include: underwater aquaculture object contour detection dataset with benchmarks; the detection of foreign objects on coal mine conveyor belts based on semi-supervised learning; abnormal cable detection at the bottom of high-speed train using improved Yolov8; research on image edge detection algorithms based on fractional-order differentiation; weakly supervised salient object detection via hybrid pseudo labels; region feature-guided automated seed selection region growing method for chip-surface-defect detection; enhanced multi-modal heart segmentation approach based on SWIN-transformer for small datasets; blind image quality assessment based on human visual perception; and hybrid loss and 3D convolutional multilayer perceptron for hyperspectral images classification with noisy labels.
Underwater images are widely used in marine science, ocean engineering, and underwater robotics. However, challenges such as insufficient lighting, scattering, and absorption often degrade image quality, limiting thei...