Underwater environments present unique challenges for image capture and analysis due to factors like light attenuation, color distortion, and loss of detail. In this study, we propose a comprehensive strategy to under...
ISBN (print): 9798350390797; 9789532901351
The proliferation of digital image repositories has revolutionized the accessibility of visual data across various domains, from scientific research to cultural heritage preservation. However, the quality of images within these repositories varies significantly, posing a challenge for users who rely on accurate, high-quality visual information. In recent years, substantial effort has gone into finding an objective, blind image quality assessment (IQA) metric that correlates with perceived quality. There is still no definitive method for this problem, because image distortions are inherently difficult to measure objectively. In this paper, we present a comprehensive framework for assessing image quality in online repositories for the humanities. The approach combines known image processing techniques, including histogram analysis, artefact estimation and dynamic range evaluation. Other parameters are also taken into account, such as resolution, dots per inch (DPI), bit depth and format. By automatically grading image quality using these methods, we aim to provide researchers and practitioners in the humanities with a reliable tool for evaluating the trustworthiness and suitability of images sourced from national repositories. A systematic examination of image quality metrics highlights the strengths and weaknesses of existing image collections, paving the way for enhanced curation standards and improved access to high-quality visual resources. The methods were tested on public-domain images from online repositories.
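As a rough illustration of the histogram analysis and dynamic range evaluation mentioned in this abstract, the sketch below computes two simple indicators for an 8-bit greyscale image given as a flat list of pixel values. The function names, the 1% tail cutoff and the choice of indicators are assumptions made for this example; the abstract does not specify how the framework computes or weights its measures.

```python
def histogram(pixels, levels=256):
    """Count how many pixels fall on each grey level (0 .. levels - 1)."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return counts


def dynamic_range_used(counts, tail=0.01):
    """Share of the available grey levels spanned by the image.

    The darkest and brightest `tail` share of pixels is ignored so that a few
    outliers do not dominate. The 1% default is an assumption for this sketch.
    """
    cutoff = sum(counts) * tail
    low, high = 0, len(counts) - 1
    running = 0
    for level, count in enumerate(counts):  # walk in from the dark end
        running += count
        if running > cutoff:
            low = level
            break
    running = 0
    for level in range(len(counts) - 1, -1, -1):  # walk in from the bright end
        running += counts[level]
        if running > cutoff:
            high = level
            break
    return (high - low) / (len(counts) - 1)


def clipped_fraction(counts):
    """Share of pixels sitting exactly on the lowest or highest grey level."""
    return (counts[0] + counts[-1]) / sum(counts)


# Hypothetical low-contrast image: only grey levels 100 and 150 occur.
flat = histogram([100] * 50 + [150] * 50)
print(dynamic_range_used(flat), clipped_fraction(flat))
```

A low value from dynamic_range_used suggests a flat, low-contrast scan, while a high value from clipped_fraction suggests blown highlights or crushed shadows. Either could feed into an overall grade alongside resolution, DPI, bit depth and format.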
ISBN (print): 9781728198354
Signal sparsity is widely exploited in sparse representation. Existing nuclear norm minimization and weighted nuclear norm minimization methods may yield suboptimal results in real applications because they approximate the rank function inaccurately. This paper presents a novel denoising method that preserves fine structures in the image by imposing L1-norm constraints on the wavelet transform coefficients and a low-rank constraint on the high-frequency components of groups of similar patches. An efficient proximal operator for the Truncated Weighted Nuclear Norm (TWNN) is proposed to accurately recover the underlying high-frequency components of low-rank patches. By combining a wavelet-domain sparsity-preserving prior with TWNN, the proposed method significantly improves reconstruction accuracy, leading to higher PSNR/SSIM and better visual quality than state-of-the-art approaches.
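To make the L1 prior on wavelet coefficients more concrete, the sketch below applies soft thresholding, which is the proximal operator of the L1 norm, to the detail coefficients of a one-level Haar transform of a 1-D signal. It covers only the sparsity half of the method. The Haar wavelet, the 1-D setting and all function names are assumptions for illustration, and the TWNN proximal operator itself is not reproduced here because the abstract does not define it.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def soft_threshold(value, lam):
    """Proximal operator of lam * |x|: shrink the value towards zero by lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def haar_forward(signal):
    """One-level Haar transform of an even-length signal."""
    evens, odds = signal[0::2], signal[1::2]
    approx = [(a + b) * SCALE for a, b in zip(evens, odds)]
    detail = [(a - b) * SCALE for a, b in zip(evens, odds)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward exactly."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) * SCALE)
        signal.append((a - d) * SCALE)
    return signal


def denoise(signal, lam):
    """Shrink only the detail (high-frequency) coefficients, then reconstruct."""
    approx, detail = haar_forward(signal)
    detail = [soft_threshold(d, lam) for d in detail]
    return haar_inverse(approx, detail)


# Hypothetical noisy step signal: small wiggles are removed, the step survives.
print(denoise([4.0, 4.2, 9.0, 8.9], lam=0.5))
```

Small detail coefficients, which are mostly noise, are set to zero, while large ones that carry edges are only shrunk by lam. This is how an L1 penalty favours sparse wavelet representations.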
An image processing algorithm for real-time examination of LED light strips is proposed, which enables quick detection of blind LED beads in strips. It is successfully used on the production line to replace manual inspect...
ISBN (digital): 9781665450928
ISBN (print): 9781665450928
One of the most important pieces of information needed during unmanned aerial vehicle (UAV) operations is the location of the platform and its environment. Such platforms mostly rely on GNSS signals outdoors. However, in indoor areas where GNSS signals cannot be received, or in situations where the signals are jammed, location information cannot be obtained from them. For that reason, alternative navigation systems have become crucial. One of the most widely preferred navigation technologies is visual simultaneous localization and mapping (vSLAM) performed with RGB cameras on the UAV. In this study, an open monocular image dataset called AG-Mono was created and published online to test the performance of vSLAM algorithms. The dataset was recorded at three different exposure times using a handheld platform, and it includes video sequences at 640x480 image resolution. The experimental area where the images were captured is a closed corridor measuring 16.5 x 4.5 meters with four sharp corners.
Learned Image Compression (LIC), which uses neural networks to compress images, has experienced significant growth in recent years. The hyperprior-module-based LIC model has achieved higher performance than classical ...
Image denoising is a crucial step in image acquisition and processing that helps improve image quality by removing unwanted noise. In this paper, Gaussian and median filters are used as denoisers and performance...
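For readers unfamiliar with median filtering, the sketch below applies a 3x3 median filter to a small greyscale image stored as a list of rows and scores the result with PSNR. The window size, the edge-replication border handling and the use of PSNR as the performance measure are assumptions for this example, since the preview above is truncated before it names its evaluation criteria. The Gaussian filter is omitted for brevity.

```python
import math
from statistics import median


def median_filter3(image):
    """3x3 median filter; border pixels reuse their nearest valid neighbours."""
    rows, cols = len(image), len(image[0])
    output = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp to the image
                    cc = min(max(c + dc, 0), cols - 1)
                    window.append(image[rr][cc])
            output[r][c] = median(window)
    return output


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for ref_px, test_px in zip(ref_row, test_row):
            squared_error += (ref_px - test_px) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical test image: a flat grey patch with one impulse-noise pixel.
clean = [[100] * 5 for _ in range(5)]
noisy = [row[:] for row in clean]
noisy[2][2] = 255
print(psnr(clean, noisy), psnr(clean, median_filter3(noisy)))
```

A single outlier never reaches the middle of a sorted 3x3 window, so the median filter removes isolated impulse noise completely in this toy case. A Gaussian filter would instead spread the outlier over its neighbours.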
ISBN (print): 9798350377873; 9798350377866
Given the growing dependence on medical imaging, there is a significant need for automated report generation, which can save radiologists' time and reduce the possibility of diagnostic errors. Existing approaches face various difficulties, including a lack of clinical professionalism, the wide variety of diseases, and poor fluency in the generated reports. These problems stem from using an encoder-decoder deep learning architecture that establishes a unidirectional image-to-report relationship and neglects the bidirectional connections between images and reports, which makes it hard to capture the intrinsic medical correlations between them. To this end, we propose a novel approach for chest radiology report generation based on multimodal feature fusion. Our method uses textual and visual features extracted from chest X-ray images and their corresponding ground-truth reports. We use a vision transformer to extract visual features from the medical images and the Word2Vec model to extract semantic features from the textual reports. Additionally, we employ channel attention networks and cross-modal information fusion modules to enhance the quality and coherence of the generated reports. We evaluated the proposed approach on two publicly available chest X-ray datasets, IU X-ray and NIH. The results show that our approach outperforms state-of-the-art methods, particularly on the ROUGE and BLEU metrics.
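To show what a ROUGE score measures when comparing a generated report with a reference report, the sketch below computes a simplified ROUGE-L F-score from the longest common subsequence of word tokens. Lowercased whitespace tokenization and equal weighting of precision and recall are assumptions for this example. The abstract does not state which ROUGE variant or tooling the authors used, and the example sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l(reference, candidate):
    """Simplified ROUGE-L F-score (precision and recall weighted equally)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Invented report sentences, for illustration only.
print(rouge_l("the heart is normal", "the heart size is normal"))
```

Because the longest common subsequence respects word order without requiring adjacency, this score rewards reports that keep the reference's phrasing even when extra words are inserted.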
ISBN (print): 9798350384581; 9798350384574
Recent deep-learning-based visual simultaneous localization and mapping (SLAM) methods have made significant progress. However, how to make full use of visual information and how to better integrate an inertial measurement unit (IMU) into visual SLAM remain open questions of research value. This paper proposes a novel deep SLAM network with dual visual factors. The basic idea is to integrate both a photometric factor and a re-projection factor into an end-to-end differentiable structure through a multi-factor data association module. We show that the proposed network dynamically learns and adjusts the confidence maps of both visual factors, and that it can be further extended to include IMU factors as well. Extensive experiments show that the proposed method significantly outperforms state-of-the-art methods on several public datasets, including TartanAir, EuRoC and ETH3D-SLAM. Specifically, when the three factors are fused dynamically, the absolute trajectory error on the EuRoC dataset is reduced by 45.3% and 36.2% for the monocular and stereo configurations, respectively.
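The reported gains are expressed as relative reductions in absolute trajectory error (ATE). The sketch below shows one common way to compute an ATE root-mean-square value and a percentage reduction. It assumes the estimated and ground-truth trajectories are already time-synchronised and expressed in the same frame, so the alignment step that a full evaluation normally performs first is left out. The function names and sample numbers are hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square translational error between matched 3-D positions.

    Assumes both trajectories are already aligned and time-synchronised.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est_pose, gt_pose))
        for est_pose, gt_pose in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical trajectories, for illustration only.
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimate = [(0.0, 0.1, 0.0), (1.0, -0.1, 0.0), (2.1, 0.0, 0.0)]
print(ate_rmse(estimate, truth))
print(relative_reduction(2.0, 1.0))
```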
ISBN (print): 9798350344868; 9798350344851
Scene text image super-resolution has significantly improved the accuracy of scene text recognition. However, many existing methods emphasize performance over efficiency and ignore the practical need for lightweight solutions in deployment scenarios. To address these issues, we propose an efficient framework called SGENet to facilitate deployment on resource-limited platforms. SGENet contains two branches: a super-resolution branch and a semantic guidance branch. We apply a lightweight pre-trained recognizer as a semantic extractor to enhance the understanding of text information. Meanwhile, we design a visual-semantic alignment module to achieve bidirectional alignment between image features and semantics, which yields high-quality prior guidance. We conduct extensive experiments on a benchmark dataset, and the proposed SGENet achieves excellent performance at a lower computational cost.