The multi-modal personality trait analysis system aims to look at the association between personality characteristics, speech, body language, and facial expressions. Subsequently, many techniques for gathering data ha...
High dynamic range (HDR) imaging is a technique that allows an extensive dynamic range of exposures, which is important in image processing, computer graphics, and computer vision. In recent years, there has been significant advancement in HDR imaging using deep learning (DL). This study conducts a comprehensive and insightful survey and analysis of recent developments in deep HDR imaging methodologies. We hierarchically and structurally group existing deep HDR imaging methods into five categories based on (1) number/domain of input exposures, (2) number of learning tasks, (3) novel sensor data, (4) novel learning strategies, and (5) applications. Importantly, we provide a constructive discussion of each category regarding its potential and challenges. Moreover, we review some crucial aspects of deep HDR imaging, such as datasets and evaluation metrics. Finally, we highlight some open problems and point out future research directions.
Tailings dam failures have been occurring recently. Owing to the lack of laws on specific design criteria and on stability-related monitoring requirements during construction and maintenance, they are thought to be mor...
ISBN: (print) 1577358872
The development of autoregressive modeling (AM) in computer vision lags behind natural language processing (NLP) in self-supervised pre-training. This is mainly because images are not sequential signals and lack a natural order when autoregressive modeling is applied. In this study, inspired by the way humans grasp an image, i.e., focusing on the main object first, we present a semantic-aware autoregressive image modeling (SemAIM) method to tackle this challenge. The key insight of SemAIM is to autoregressively model images from the more semantic patches to the less semantic patches. To this end, we first calculate a semantic-aware permutation of patches according to their feature similarities and then perform the autoregression procedure based on the permutation. In addition, considering that the raw pixels of patches are low-level signals and are not ideal prediction targets for learning high-level semantic representations, we also explore utilizing the patch features as the prediction targets. Extensive experiments are conducted on a broad range of downstream tasks, including image classification, object detection, and instance/semantic segmentation, to evaluate the performance of SemAIM. The results demonstrate that SemAIM achieves state-of-the-art performance compared with other self-supervised methods. Specifically, with ViT-B, SemAIM achieves 84.1% top-1 accuracy for fine-tuning on ImageNet, and 51.3% AP and 45.4% AP for object detection and instance segmentation on COCO, outperforming the vanilla MAE by 0.5%, 1.0%, and 0.5%, respectively. Code is available at https://***/skyoux/SemAIM.
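The ordering step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or what the patches are compared against, so the code below assumes cosine similarity to a single reference vector (for example, a global image feature); the names `cosine`, `semantic_permutation`, and the toy feature values are hypothetical and are not taken from the SemAIM code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_permutation(patch_features, reference):
    """Return patch indices ordered from most to least similar to `reference`.

    Assumption: "semantic" is approximated by similarity to one reference
    feature; the actual method may define it differently.
    """
    scores = [cosine(f, reference) for f in patch_features]
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Toy 2-D features for four patches and an assumed reference feature.
patches = [[0.1, 0.9], [1.0, 0.0], [0.7, 0.7], [0.8, 0.4]]
reference = [1.0, 0.1]

order = semantic_permutation(patches, reference)
print(order)  # patch indices in the order an autoregressive model would visit them
```

An autoregressive model would then predict patch `order[k]` from patches `order[0]` through `order[k-1]`, instead of following raster order.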
Classification and retrieval of medical images (MedIR) are emerging applications of computer vision for enabling intelligent medical diagnostics. Medical images are multi-dimensional and require specialised processing to extract features from their manifold underlying content. Existing models often fail to consider the inherent characteristics of the data and have thus often fallen short when applied to medical images. In this paper, we present a MedIR approach based on the bag of visual words (BoVW) model for content-based medical image retrieval. Dataset imbalance is a common issue for medical models, so we also consider drawing a balanced set of categories from an imbalanced dataset. The proposed BoVW model extracts features from each image, which are then used to train a supervised machine learning classifier for X-ray medical image classification and retrieval. In experimental validation, the proposed model achieved a classification accuracy of 89.73% and good retrieval results using our filter-based approach.
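As a rough illustration of the bag-of-visual-words representation mentioned above, the sketch below turns a set of local descriptors into a normalized word histogram. It assumes the codebook already exists (in practice it is usually learned by clustering descriptors, a step omitted here); the function names and the toy 2-D descriptors are hypothetical and do not reflect the paper's feature extractor or classifier.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (squared Euclidean)."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(range(len(codebook)), key=lambda k: sqdist(descriptor, codebook[k]))


def bovw_histogram(descriptors, codebook):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Assumed, pre-computed codebook of three visual words.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
# Toy local descriptors extracted from a single image.
descriptors = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1], [0.1, 0.9]]

hist = bovw_histogram(descriptors, codebook)
print(hist)  # fixed-length vector that a supervised classifier could consume
```

Every image maps to a histogram of the same length regardless of how many descriptors it yields, which is what allows a standard supervised classifier to be trained on the result.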
We present LOCOVQA, a dynamic benchmark generator for evaluating long-context extractive reasoning in vision language models (VLMs). LOCOVQA augments test examples for mathematical reasoning, VQA, and character recogn...
In the current era, machine vision systems are being implemented widely in varied fields due to their key features, such as rapid processing, non-contact operation, and in-situ measurement. This technology also has wide applications in the manufacturing sector. The surface texture properties of any machined component vary with the manufacturing process, machining parameters, and tool and machine conditions. As the surface texture of machined components greatly influences their functional performance, it is vital to examine the surface characteristics. The surface texture of a machined component can be assessed by applying a series of image processing techniques to its speckle images. A speckle image is the randomly distributed granular pattern obtained when a rough or textured surface is illuminated with a laser beam. This paper focuses on estimating the orientation of the workpiece and examining the surface characteristics based on post-processing of the speckle images. The hardened steel workpieces used in this investigation were ground with varying process parameters, and speckle images were obtained at 0°, 30°, 60° and 90° orientations. The shifted power spectral density (PSD) of the ground sample images contains high-energy coefficients that form a line-like pattern whose orientation varies with the sample orientation. The Hough transform was applied to the binary image of the shifted PSD to efficiently determine the orientation. Furthermore, correlations have been established between several surface texture characteristics and GLCM parameters and the surface roughness of the ground samples.
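The PSD-then-Hough pipeline described in this abstract can be sketched on a synthetic image. The abstract gives neither the binarization threshold nor the Hough parameterization, so the code below assumes a threshold at half the PSD maximum and a basic (rho, theta) accumulator with 1° steps; a single straight streak stands in for a real speckle image, and all function names are hypothetical.

```python
import numpy as np


def shifted_psd(image):
    """Power spectral density with the zero frequency moved to the array center."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.abs(spectrum) ** 2


def hough_line_direction(binary, n_angles=180):
    """Direction (degrees, array coordinates) of the strongest line in `binary`.

    Minimal Hough transform: each set pixel votes for every (rho, theta) line
    passing through it, with coordinates taken relative to the array center.
    """
    ys, xs = np.nonzero(binary)
    x = xs - binary.shape[1] // 2
    y = ys - binary.shape[0] // 2
    thetas = np.deg2rad(np.arange(n_angles))
    rho = np.outer(x, np.cos(thetas)) + np.outer(y, np.sin(thetas))
    max_rho = int(np.ceil(np.hypot(*binary.shape)))
    rho_idx = np.round(rho).astype(int) + max_rho  # shift to non-negative bins
    acc = np.zeros((2 * max_rho + 1, n_angles), dtype=int)
    for j in range(n_angles):
        acc[:, j] = np.bincount(rho_idx[:, j], minlength=2 * max_rho + 1)
    _, best_theta = np.unravel_index(np.argmax(acc), acc.shape)
    # theta is the angle of the line's normal; the line itself is 90 deg away.
    return int((best_theta + 90) % 180)


def psd_line_orientation(image, rel_threshold=0.5):
    """Binarize the shifted PSD (assumed threshold) and locate its dominant line."""
    psd = shifted_psd(image)
    binary = psd > rel_threshold * psd.max()
    return hough_line_direction(binary)


# Synthetic stand-in for a ground surface: one horizontal streak.
streak = np.zeros((64, 64))
streak[10, :] = 1.0

print(psd_line_orientation(streak))    # PSD line for a horizontal streak
print(psd_line_orientation(streak.T))  # PSD line for a vertical streak
```

In this sketch the PSD line is perpendicular to the streak direction, so rotating the sample rotates the detected line by the same amount. Real speckle images would need a tuned threshold and probably some smoothing before the Hough step.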
Image segmentation models are often evaluated using measures of overlap and boundary deviation between a ground truth and a prediction. These measures do not indicate whether a prediction is an overestimation or under...
This research study explores the emerging area of quantum-inspired evolutionary algorithms (QIEAs) applied to high-dimensional data processing, with a focus on homeland security imaging systems. This work attempts to ...
In recent years, great progress has been made on 2D and 3D image understanding tasks, such as object detection and instance segmentation. Recent trends in driverless car technology are making a difference in dail...