Image classification is one of the most fundamental capabilities of machine vision intelligence. In this work, we revisit the image classification task using visually-grounded language models (VLMs) such as GPT-4V and...
ISBN: (digital) 9781119861850
ISBN: (print) 9781119861829
Enables readers to understand the fundamental concepts of machine and deep learning techniques with interactive, real-life applications within signal and image processing. Machine Learning Algorithms for Signal and Image Processing aids the reader in designing and developing real-world applications using advances in machine learning to aid and enhance speech
ISBN: (print) 9798350357974
Manual visual assessment of mangoes has been problematic for the agriculture sector because of its time-consuming nature and inconsistent evaluation and sorting methods. The advent of automated flaw identification using computer vision and machine learning offers a notable shift and improvement in the visual inspection process. A common issue with mangoes is the presence of dark patches, indicative of disease or rot, which negatively affect the appearance and quality of the fruit. This paper introduces a framework using computer vision which utilizes image analysis and machine learning methods to identify these dark spots, taking into account the mangoes' texture. The proposed framework has a simplified configuration and tuning process, enhancing its ease of deployment in real-world applications. This innovation aligns with the advancements in integrating cutting-edge technologies to optimize efficiency and consistency in agricultural practices, thereby contributing to the evolution of smart agriculture and addressing the challenges and opportunities presented by the next wave of industrial revolution.
Image foresting transform (IFT) is a graph-based framework to develop image operators based on optimum connectivity between a root set and the remaining nodes, according to a given path-cost function. Oriented image foresting transform (OIFT) was proposed as an extension of some seeded IFT-based segmentation methods to directed graphs, enabling them to support the processing of global object properties, such as connectedness, shape constraints, boundary polarity, and hierarchical constraints, allowing their customization to a given target object. OIFT lies in the intersection of generalized graph cut and general fuzzy connectedness frameworks, inheriting their properties. Its returned segmentation is optimal, with respect to an appropriate graph cut measure, among all segmentations satisfying the given constraints. In this work, we propose the differential oriented image foresting transform, which allows multiple OIFT executions for different root sets, making the processing time proportional to the number of modified nodes. Experimental results show considerable efficiency gains over the sequential flow of OIFTs in image segmentation, while maintaining a good treatment of tie zones. We also demonstrate that the differential flow makes it feasible to incorporate prior knowledge about the maximum allowable size for the segmented object, thus avoiding false positive errors in the segmentation of multi-dimensional images. Finally, we propose an algorithm to efficiently create a hierarchy map that encodes area-constrained OIFT results for all possible thresholds, facilitating the quick selection of the object of interest.
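To make the notion of "optimum connectivity between a root set and the remaining nodes" concrete, here is a minimal sketch of a plain seeded IFT on a 4-connected pixel grid. It is not the OIFT or the differential variant proposed in the abstract. The function name `ift_fmax`, the choice of the max-arc path cost, and the absolute intensity difference as arc weight are all assumptions made for illustration.

```python
import heapq

def ift_fmax(image, seeds):
    """Seeded image foresting transform sketch (hypothetical helper).

    image: list of rows of numeric intensities.
    seeds: dict mapping (row, col) -> label for the root set.
    Path cost (assumed): the maximum arc weight along the path,
    where an arc weight is the absolute intensity difference.
    Returns (cost_map, label_map).
    """
    rows, cols = len(image), len(image[0])
    inf = float("inf")
    cost = [[inf] * cols for _ in range(rows)]
    label = [[None] * cols for _ in range(rows)]
    heap = []
    # Roots start with zero cost and carry their own label.
    for (r, c), lab in seeds.items():
        cost[r][c] = 0
        label[r][c] = lab
        heapq.heappush(heap, (0, r, c))
    while heap:
        cst, r, c = heapq.heappop(heap)
        if cst > cost[r][c]:
            continue  # stale queue entry, a better path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                weight = abs(image[nr][nc] - image[r][c])
                new_cost = max(cst, weight)  # extend the path by one arc
                if new_cost < cost[nr][nc]:
                    cost[nr][nc] = new_cost
                    label[nr][nc] = label[r][c]  # propagate the root's label
                    heapq.heappush(heap, (new_cost, nr, nc))
    return cost, label

# Usage: two seeds on a one-row image with a single strong edge.
costs, labels = ift_fmax([[0, 0, 10, 10]], {(0, 0): "bg", (0, 3): "obj"})
```

Each pixel ends up labeled by the root that reaches it with the lowest path cost, so the partition follows the strong intensity edge. Rerunning this routine from scratch for every change to the seed set is the sequential flow that the differential approach is meant to avoid.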
ISBN: (print) 9798350318920; 9798350318937
Semi-supervised object detection (SSOD) aims to improve the performance and generalization of existing object detectors by utilizing limited labeled data and extensive unlabeled data. Despite many advances, recent SSOD methods are still challenged by inadequate model refinement using the classical exponential moving average (EMA) strategy, the consensus of Teacher-Student models in the latter stages of training (i.e., losing their distinctiveness), and noisy/misleading pseudo-labels. This paper proposes a novel training-based model refinement (TMR) stage and a simple yet effective representation disagreement (RD) strategy to address the limitations of classical EMA and the consensus problem. The TMR stage of Teacher-Student models optimizes the lightweight scaling operation to refine the model's weights and prevent overfitting or forgetting learned patterns from unlabeled data. Meanwhile, the RD strategy helps keep these models diverged to encourage the student model to explore additional patterns in unlabeled data. Our approach can be integrated into established SSOD methods and is empirically validated using two baseline methods, with and without cascade regression, to generate more reliable pseudo-labels. Extensive experiments demonstrate the superior performance of our approach over state-of-the-art SSOD methods. Specifically, the proposed approach outperforms the baseline Unbiased-Teacher-v2 (& Unbiased-Teacher-v1) method by an average mAP margin of 2.23, 2.1, and 3.36 (& 2.07, 1.9 and 3.27) on COCO-standard, COCO-additional, and Pascal VOC datasets, respectively.
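For reference, the classical EMA strategy that the abstract identifies as a limitation can be sketched as follows. This is only the baseline teacher update, not the proposed TMR stage or RD strategy. The function name `ema_update`, the dictionary-of-floats weight representation, and the default decay value are assumptions chosen for illustration.

```python
def ema_update(teacher, student, decay=0.999):
    """Classical EMA teacher update (illustrative sketch).

    teacher, student: dicts mapping parameter names to float values.
    decay: smoothing factor (assumed default); closer to 1 means the
    teacher changes more slowly.
    Updates `teacher` in place and returns it.
    """
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher

# Usage: one update step with a deliberately small decay.
teacher = ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9)
```

Because the teacher is a running average of the student, repeated updates pull the two models toward each other, which is consistent with the consensus problem the abstract describes for the later stages of training.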
The application of machine learning techniques in pavement health monitoring based on computer vision has greatly improved the accuracy and efficiency in the detection of pavement distress levels and categories. However, a persistent challenge in this field is sample imbalance, primarily arising from the scarcity of cracked pavement images, which hampers the effectiveness of these techniques in road maintenance engineering. To address this issue and enable the fast and stable generation of high-quality crack images for engineering purposes, this study proposes two frameworks based on Generative Adversarial Networks (GAN): Multi-Stage GAN-v1 and Multi-Stage GAN-v2. These frameworks break down the complex task of directly generating high-quality images into a series of incremental steps, gradually increasing the image resolution from initially generated low-precision images. Both versions, v1 and v2, consist of multiple sequentially connected generation units, with each unit utilizing the Wasserstein Generative Adversarial Network-Gradient Penalty (WGAN-GP). Furthermore, v2 has the additional capability of generating pavement crack images of specified types and simultaneously providing crack segmentation labels. This feature significantly enhances the practical applicability of the generated data in engineering contexts. In a comprehensive case study, the evaluation results clearly illustrate the superior image generation quality of the two proposed frameworks. Moreover, the results from ablation experiments, involving the training of nine state-of-the-art crack semantic segmentation and object detection networks using both generated images and real images, demonstrate the effective utility of these generated images for training pavement distress detection networks.
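For readers unfamiliar with WGAN-GP, the general critic objective is given below. This is the standard formulation from the general literature, not a detail taken from this abstract; how each generation unit of the Multi-Stage GAN frameworks instantiates it is not stated here. The symbols $D$ (critic), $\mathbb{P}_r$ and $\mathbb{P}_g$ (real and generated distributions), and $\lambda$ (penalty weight) are notation introduced for this sketch.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The last term penalizes critic gradients whose norm deviates from 1 at points interpolated between real and generated samples, which is the mechanism generally credited with stabilizing training.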
Conventional imaging and data processing devices are not ideal for mobile artificial vision applications, such as vision systems for drones and robots, because of the heavy and bulky multilens optics in the camera mod...
Diabetic retinopathy is an eye disorder that affects people with diabetes. Elevated blood sugar levels damage the blood vessels in the eyes and may even cause blindness. Diabetic retinopathy is identified by red spots known as microaneurysms and bright yellow lesions called exudates. Early detection of exudates and microaneurysms may save the patient's vision, and this paper proposes a simple and effective technique for detecting and grading diabetic retinopathy. Both publicly available and real-time datasets of color images captured by a fundus camera have been used for the empirical analysis. In the proposed work, grading determines the severity of diabetic retinopathy, i.e., whether it is mild, moderate, or severe, using exudates and microaneurysms in the fundus images. An automated approach is proposed that uses image processing, feature extraction, and machine learning models to accurately predict the presence of exudates and microaneurysms, which can then be used for grading. The research is carried out in two segments: one for exudates and another for microaneurysms. Grading via exudates is based on their distance from the macula, whereas grading via microaneurysms is based on their count. For grading using exudates, support vector machine and k-nearest neighbor show the highest accuracy of 92.1%, and for grading using microaneurysms, decision tree shows the highest accuracy of 99.9% in predicting the severity levels of the disease.
ISBN: (print) 9798350348798; 9798350348804
Pipeline infrastructures are the most suitable means of transporting oil and gas products, and they therefore demand reliable inspection methods to ensure their integrity and reliability. Current inspection techniques are labour-intensive, error-prone, safety-threatening, time-consuming, and limited in coverage. This paper presents a real-time Pipeline Tracking System hosted on a RISC-V Embedded System Platform, aiming to automate the inspection process. The model, trained with the YOLOv7 algorithm to detect and track pipelines, is deployed on the VisionFive 2 Single Board Computer, a RISC-V embedded system platform which offers capabilities in 3D image processing, making it an ideal platform for automated pipeline inspection in resource-constrained environments. The system is designed for integration with unmanned aerial vehicles (UAVs), providing an onboard computer for vision-based detection. Experimental results demonstrate compatibility with resource-constrained environments, emphasising computational efficiency and tracking accuracy. This work contributes to automating pipeline inspection processes, enhancing safety, and advancing RISC-V technology. Future work includes optimising computer vision performance and hardware implementation on a drone.
Cameras currently allow access to two image states: (i) a minimally processed linear raw-RGB image state (i.e., raw sensor data); or (ii) a highly-processed nonlinear image state (e.g., sRGB). There are many computer vision tasks that work best with a linear image state, such as image deblurring and image dehazing. Unfortunately, the vast majority of images are saved in the nonlinear image state. Because of this, a number of methods have been proposed to "unprocess" nonlinear images back to a raw-RGB state. However, existing unprocessing methods have a drawback because raw-RGB images are sensor-specific. As a result, it is necessary to know which camera produced the sRGB output and use a method or network tailored for that sensor to properly unprocess it. This paper addresses this limitation by exploiting another camera image state that is not available as an output but is available inside the camera pipeline. In particular, cameras apply a colorimetric conversion step to convert the raw-RGB image to a device-independent space based on the CIE XYZ color space before they apply the nonlinear photo-finishing. Leveraging this canonical image state, we propose a deep learning framework, CIE XYZ Net, that can unprocess a nonlinear image back to the canonical CIE XYZ image. This image can then be processed by any low-level computer vision operator and re-rendered back to the nonlinear image. We demonstrate the usefulness of the CIE XYZ Net on several low-level vision tasks and show significant gains that can be obtained by this processing framework. Code and dataset are publicly available at https://***/mahmoudnafifi/CIE_XYZ_NET.
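As a point of comparison, the idealized textbook mapping from sRGB to CIE XYZ can be sketched in a few lines. This is not the CIE XYZ Net described above: it assumes the image was encoded with only the standard sRGB transfer function and the standard D65 primaries, whereas real cameras add scene-dependent photo-finishing, which is why a learned model is proposed. The function names `srgb_to_linear` and `srgb_to_xyz` are hypothetical.

```python
# Standard linear-sRGB to CIE XYZ matrix (D65 white point).
SRGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)

def srgb_to_linear(value):
    """Invert the standard sRGB transfer function for one channel in [0, 1]."""
    if value <= 0.04045:
        return value / 12.92
    return ((value + 0.055) / 1.055) ** 2.4

def srgb_to_xyz(rgb):
    """Map one nonlinear sRGB pixel (r, g, b in [0, 1]) to CIE XYZ.

    Assumes an idealized sRGB encoding with no camera-specific
    photo-finishing (tone curves, local contrast, and so on).
    """
    linear = [srgb_to_linear(channel) for channel in rgb]
    return tuple(
        sum(weight * channel for weight, channel in zip(row, linear))
        for row in SRGB_TO_XYZ
    )

# Usage: convert reference white and a mid-gray pixel.
white_xyz = srgb_to_xyz((1.0, 1.0, 1.0))
gray_xyz = srgb_to_xyz((0.5, 0.5, 0.5))
```

Applied to a camera-rendered image, this fixed mapping generally leaves residual nonlinearity from the photo-finishing steps, which illustrates the gap the learned unprocessing model targets.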