ISBN (digital): 9798350374490
ISBN (print): 9798350374506
Depth perception is essential for our daily experiences, aiding in orientation and interaction with our surroundings. Virtual Reality allows us to decouple depth cues, chiefly binocular disparity and motion parallax. With fully mesh-based rendering methods these cues are unproblematic, as they originate from the object's underlying geometry. However, manipulating motion parallax, as in stereoscopic impostor-based rendering, raises multiple perceptual questions. We therefore conducted a user experiment to investigate how varying object sizes affect such visual errors and perceived three-dimensionality, revealing a significant negative correlation and suggesting new assumptions about visual quality.
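The reported negative correlation between object size and the perceptual measure could be quantified with Pearson's r, as sketched below. The data here is entirely hypothetical, invented only to illustrate the statistic; it is not from the study.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical ratings: larger objects, lower perceived-quality scores.
sizes = [0.5, 1.0, 1.5, 2.0, 2.5]
scores = [4.8, 4.1, 3.5, 2.9, 2.2]
r = pearson_r(sizes, scores)  # strongly negative on this toy data
```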
Calamansi has been declared one of the most important fruit crops in the Philippines. However, due to certain bacteria, it is susceptible to diseases that affect its harvest rate. This paper aims to effectively monitor the state of the calamansi in both its healthy and diseased states. Specifically, it classifies diseases such as Citrus Canker, Citrus Scab, and Citrus Browning by applying existing image processing techniques for fruit disease detection and determining which algorithm is most apt for this application in terms of precision, accuracy, and recall. Techniques such as K-Means clustering, an Artificial Neural Network (ANN), GLCM feature extraction with a minimum distance classifier, a Support Vector Machine (SVM) classifier, and other techniques and their combinations were explored and measured. The researchers performed two kinds of tests: a 1×1 comparison and a merged comparison. For the 1×1 comparison, the combination of GrabCut, color feature extraction, and SVM produced the best overall results, with averages of 98% precision, 95% accuracy, 91% recall, and a 94% F-score. Adaptive Gaussian filtering with texture feature extraction and SVM was the most accurate for detecting calamansi fruits with citrus canker and citrus scab. Overall, the two methods achieved the same average accuracy of 61%.
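The color-feature plus classifier pipeline described above can be sketched minimally as follows. This is not the paper's method: in place of GrabCut segmentation and the SVM, it uses plain channel statistics and the minimum distance classifier also mentioned in the abstract, and the toy "images" and class names are hypothetical.

```python
import numpy as np

def color_features(img):
    """Per-channel mean and standard deviation of an RGB image (H, W, 3)."""
    img = np.asarray(img, float)
    return np.concatenate([img.mean(axis=(0, 1)), img.std(axis=(0, 1))])

def min_distance_classify(feat, centroids):
    """Assign the class whose feature centroid is nearest in Euclidean distance."""
    labels = sorted(centroids)
    dists = [np.linalg.norm(feat - centroids[k]) for k in labels]
    return labels[int(np.argmin(dists))]

# Toy 8x8 'images': a healthy fruit skewed green, a cankered one skewed brown.
healthy = np.zeros((8, 8, 3))
healthy[..., 1] = 200.0                      # strong green channel
canker = np.full((8, 8, 3), 120.0)
canker[..., 2] = 40.0                        # brownish (low blue)
centroids = {"healthy": color_features(healthy), "canker": color_features(canker)}
```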
Image captioning is one of the most prevalent and difficult challenges in Natural Language Processing and Computer Vision: given an image, a written description of the image must be produced. Its counterpart is text-to-image synthesis, which involves creating a picture from a written description. From a high-level perspective, these challenges resemble language translation: images and text are two separate "languages" for conveying related information, just as equivalent semantics might be expressed in two different natural languages. This is undoubtedly beneficial; however, existing AI systems are a long way from achieving it. Strong neural network architectures, such as Generative Adversarial Networks (GANs), have been found to deliver effective results in recent years. Some GAN-based text-to-image synthesis algorithms attempt to map words and characters directly to image pixels by combining image synthesis and natural language approaches. To reconcile these advances in text and image modeling and successfully translate visual concepts from characters to pixels, we design a novel deep architecture and GAN formulation in this study. We demonstrate our model by generating plausible images of birds and flowers from detailed text descriptions.
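The adversarial objective underlying any GAN formulation can be written down concretely. The sketch below computes the standard discriminator loss and the common non-saturating generator loss from predicted "real" probabilities; it illustrates the general GAN principle, not this paper's specific architecture or conditioning on text.

```python
import numpy as np

def d_loss(p_real, p_fake):
    """Discriminator loss: -E[log D(x)] - E[log(1 - D(G(z)))]."""
    p_real, p_fake = np.asarray(p_real, float), np.asarray(p_fake, float)
    return float(-(np.log(p_real).mean() + np.log1p(-p_fake).mean()))

def g_loss(p_fake):
    """Non-saturating generator loss: -E[log D(G(z))]."""
    return float(-np.log(np.asarray(p_fake, float)).mean())
```

The generator is rewarded when the discriminator assigns its samples a high "real" probability, which is exactly what lets pixel outputs be shaped by a learned critic rather than a hand-written loss.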
A flexible field of view (FoV) and a large depth of field (DoF) are the main building blocks of a strongly immersive experience in the Metaverse. However, due to the nature of optics, captured multi-view images generally have a flexible FoV but a limited DoF. To extend the DoF of captured data, in this paper we propose an all-in-focus image fusion scheme based on varifocal multi-view computational imaging. Varifocal multi-view images are a series of multi-view images with different DoFs, whose main inter-view differences are scene content and degree of blur. Because of these complex inter-view features, existing DoF-extension methods applied to varifocal multi-view images suffer from severe ghosting. To alleviate this problem, a patch-based DenseNet image fusion network is designed and embedded in the proposed scheme; the patch-based network mitigates ghosting in the fused image. Experiments on varifocal multi-view images of different scenes demonstrate that the proposed scheme synthesizes all-in-focus results with higher visual quality and accuracy. The proposed all-in-focus image fusion scheme is expected to benefit the Metaverse, photo-realistic novel view synthesis, and interactive, immersive experiences.
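The patch-wise principle behind such fusion can be illustrated without a learned network: split each view into patches, score each patch with a focus measure, and keep the sharpest candidate per location. This is a classical stand-in for the paper's learned DenseNet fusion, with a simple gradient-energy sharpness score and synthetic views; all names and data are hypothetical.

```python
import numpy as np

def focus_measure(patch):
    """Local sharpness: summed squared finite-difference gradients."""
    p = np.asarray(patch, float)
    return float((np.diff(p, axis=0) ** 2).sum() + (np.diff(p, axis=1) ** 2).sum())

def fuse_patchwise(views, patch=8):
    """For each patch location, keep the view whose patch is sharpest."""
    h, w = views[0].shape
    out = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            blocks = [v[y:y + patch, x:x + patch] for v in views]
            best = int(np.argmax([focus_measure(b) for b in blocks]))
            out[y:y + patch, x:x + patch] = blocks[best]
    return out

# Two synthetic views, each sharp in one half and defocused (flat) in the other.
sharp = (np.indices((16, 16)).sum(axis=0) % 2) * 255.0
view_a = sharp.copy()
view_a[:, 8:] = 128.0
view_b = sharp.copy()
view_b[:, :8] = 128.0
fused = fuse_patchwise([view_a, view_b], patch=8)
```

Hard per-patch selection like this is exactly what produces blocking and ghosting at patch borders in the multi-view setting, which is the failure mode the learned network is designed to smooth over.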
Due to the accessibility of virtual reality in recent years, there has been great interest in producing and streaming omnidirectional (360° field of view) high-resolution images and videos. Since both high resolution and high quality place heavy demands on the storage and distribution of such content, advanced compression methods are a key factor in achieving this goal. This paper provides an objective comparison of conventional image compression codecs (JPEG, JPEG XL, HEIC, AVIF, VVC Intra) and deep-learning image compression algorithms following the JPEG AI framework recommendation. The visual quality evaluation is based on ten images from publicly available databases compressed to predetermined bit rates. Six full-reference objective metrics (WS-PSNR, MS-SSIM, VIFp, FSIMc, GMSD, VMAF) are used to evaluate the visual quality of the compressed images. Modern image compression codecs outperform JPEG, the oldest and most widely used codec, in terms of bandwidth reduction, but require more processing power and system resources.
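All of the listed full-reference metrics compare a decoded image against the pristine original. The simplest member of that family, plain PSNR, is sketched below; WS-PSNR used in the paper additionally weights each pixel by its solid angle on the sphere, which this sketch omits.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    ref, test = np.asarray(ref, float), np.asarray(test, float)
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```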
ISBN (digital): 9781665452762
ISBN (print): 9781665452762
Wireless capsule endoscopy (WCE) is a non-surgical diagnostic procedure enabling examination of the whole human gastrointestinal (GI) tract. A patient swallows a capsule that travels down the digestive system while a camera wirelessly captures thousands of images, which are transmitted to an external recording device. Diagnosing these images requires a specialist who can identify gastrointestinal abnormalities, and it is very time-consuming. Recently, artificial intelligence and deep learning techniques have aimed to automate disease diagnosis and the identification of GI-tract anomalies such as polyps, ulcers, and bleeding. In this paper, a deep learning method is proposed for gastrointestinal disease classification. The pre-trained ResNet50 model is fine-tuned through transfer learning to extract deep features from WCE images. The proposed algorithm is trained and tested on the publicly available Kvasir-Capsule dataset, which contains 14 different classes of gastrointestinal anomalies.
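The transfer-learning split the paper relies on, a frozen pre-trained feature extractor feeding a small task-specific head, can be sketched without any deep-learning framework. Here a fixed random projection stands in for the ResNet50 backbone and a nearest-centroid head stands in for the fine-tuned classifier; the data, shapes, and class structure are all hypothetical, chosen only to make the split concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone: fixed weights, never updated.
W_frozen = rng.normal(size=(64, 16))

def extract_features(x):
    """Deep-feature extraction with the frozen 'backbone'."""
    return np.maximum(np.asarray(x, float) @ W_frozen, 0.0)   # ReLU features

def fit_head(feats, labels):
    """Trainable head: per-class centroids in feature space."""
    return {c: feats[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(feats, head):
    classes = sorted(head)
    d = np.stack([np.linalg.norm(feats - head[c], axis=1) for c in classes])
    return np.array(classes)[d.argmin(axis=0)]

# Toy stand-ins for two image classes, separated by mean intensity.
x0 = rng.normal(-1.0, 0.5, size=(40, 64))
x1 = rng.normal(+1.0, 0.5, size=(40, 64))
X, y = np.vstack([x0, x1]), np.array([0] * 40 + [1] * 40)
head = fit_head(extract_features(X), y)
accuracy = float((predict(extract_features(X), head) == y).mean())
```

Only `fit_head` sees the labels; the backbone stays fixed, which is the point of transfer learning when labeled medical data is scarce.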
We investigate Self-Attention (SA) networks for directly learning visual representations for prosthetic vision. Specifically, we explore how the SA mechanism can be leveraged to produce task-specific scene representations for prosthetic vision, removing the need for explicit hand-selection of learnt features and post-processing. Further, we demonstrate how the mapping of importance to image regions can serve as an explainability tool for analysing the learnt vision-processing behaviour, providing greater validation and interpretation capability than current learning-based methods for prosthetic vision. We investigate our approach in the context of an orientation and mobility (OM) task and demonstrate its feasibility for learning vision-processing pipelines for prosthetic vision.
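The attention-weight matrix is what makes the "mapping of importance to image regions" possible: each row is a probability distribution over regions. A generic scaled dot-product self-attention sketch (not the paper's network; region count and dimensions are arbitrary illustrations):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over n feature vectors (rows of X)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # per-region importance weights
    return A @ V, A

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))                       # 5 'image regions', 8-d features
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

Because `attn` is non-negative and each row sums to one, it can be reshaped and overlaid on the input image as an explainability heat map.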
Medical data analysis is a critical process aimed at extracting valuable insights and knowledge from complex healthcare information. It plays a vital role in enhancing diagnostics, treatment planning, and medical research. In this context, medical images serve as a fundamental source of information, providing visual representations of anatomical structures and pathological conditions. Image processing techniques using mesh-based representations offer unique opportunities for advancing the analysis of medical images. This article presents a new method for analyzing medical data based on the concept of meshing. By representing medical images as undirected graphs, the proposed approach enables efficient exploration and analysis of spatial relationships.
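The basic step of representing an image as an undirected graph can be sketched directly: pixels become nodes and adjacent pixels share an edge. This is a generic 4-neighbour construction for illustration, not the paper's specific meshing method.

```python
def grid_to_graph(h, w):
    """Undirected 4-neighbour graph over an h x w pixel grid.

    Nodes are pixel indices (row * w + col); edges link adjacent pixels.
    """
    edges = set()
    for r in range(h):
        for c in range(w):
            node = r * w + c
            if c + 1 < w:
                edges.add((node, node + 1))   # right neighbour
            if r + 1 < h:
                edges.add((node, node + w))   # bottom neighbour
    return list(range(h * w)), sorted(edges)
```

An h x w grid yields h·(w−1) + w·(h−1) edges, so a 3x3 image gives 9 nodes and 12 edges; graph algorithms (connected components, shortest paths) then operate on spatial relationships directly.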
In the additive manufacturing process, the molten pool contains abundant information related to deposition quality and stability. Real-time monitoring and feature extraction of the molten pool are of great significance for closed-loop control of the molten pool and for guaranteeing the consistency and stability of the deposition process. In this study, we develop a visual monitoring system for the molten pool in electron beam freeform fabrication (EBF³) and propose corresponding digital image processing algorithms to extract the molten pool's geometric features and to identify the wire position and metal transfer state during deposition. The effectiveness of the proposed algorithms is verified by deposition experiments under different beam currents and metal transfer states, and a transfer function model between molten pool features and energy input is established through step-response experiments. The proposed molten pool visual monitoring method and model identification work provide a foundation for closed-loop control of the molten pool.
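Extracting geometric features from a bright molten-pool region can be sketched with a simple threshold-and-measure step. This is an illustrative minimal pipeline, not the paper's algorithm; the threshold value and frame are hypothetical.

```python
import numpy as np

def pool_geometry(frame, thresh=200):
    """Threshold a grayscale frame and measure the bright molten-pool region.

    Returns (area, centroid_row, centroid_col, width, height) in pixels.
    """
    mask = np.asarray(frame, float) >= thresh
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return 0, None, None, 0, 0            # no pool detected
    return (int(rows.size), float(rows.mean()), float(cols.mean()),
            int(cols.max() - cols.min() + 1), int(rows.max() - rows.min() + 1))
```

Tracking these per-frame quantities over time is what yields the step-response data from which a transfer function between energy input and pool size can be identified.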
Multi-focus image fusion, particularly in the DCT domain, is among the most significant areas of computational image processing research. It is used to create a single image from the focused regions and significant details of multiple multi-focus input images, and is particularly effective for JPEG images. Researchers often concentrate on focus-measure computations and direct DCT fusion methods; however, previous DCT-based works erred in selecting appropriate divided blocks. This article presents a reliable image fusion technique that fuses multiple partially out-of-focus input images into one crisp output image, preserving more detailed information from the combined inputs. The variance of Laplacian (VOL) and energy of Laplacian (EOL) criteria are used to gauge image contrast, reducing errors caused by incorrect block selection. The output image quality of the suggested techniques is demonstrated by contrasting their results with the output of earlier algorithms.
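The two block-selection criteria named in the abstract can be sketched directly: both score a block by its response to a discrete Laplacian, either as a variance (VOL) or as total energy (EOL). This is a generic spatial-domain illustration of the criteria, assuming the common 5-point Laplacian; the paper's exact DCT-domain formulation may differ.

```python
import numpy as np

def laplacian(block):
    """Discrete 5-point Laplacian response on the interior of a block."""
    p = np.asarray(block, float)
    return (-4 * p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1]
            + p[1:-1, :-2] + p[1:-1, 2:])

def vol(block):
    """Variance of Laplacian (VOL) focus measure."""
    return float(laplacian(block).var())

def eol(block):
    """Energy of Laplacian (EOL) focus measure."""
    return float((laplacian(block) ** 2).sum())
```

A sharply focused block (strong local contrast) scores high on both measures, while a defocused, nearly flat block scores near zero, so block-wise comparison of VOL/EOL across inputs decides which block enters the fused output.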