Search Results - Inner Mongolia University Library

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Zhang, Yuhui; Unell, Alyssa; Wang, Xiaohan; Ghosh, Dhruba; Su, Yuchang; Schmidt, Ludwig; Yeung-Levy, Serena (Stanford University, United States; University of Washington, United States; Tsinghua University, China)

Image classification is one of the most fundamental capabilities of machine vision intelligence. In this work, we revisit the image classification task using visually-grounded language models (VLMs) such as GPT-4V and LLaVA. We find that existing proprietary and public VLMs, despite often using CLIP as a vision encoder and having many more parameters, significantly underperform CLIP on standard image classification benchmarks like ImageNet. To understand the reason, we explore several hypotheses concerning the inference algorithms, training objectives, and data processing in VLMs. Our analysis reveals that the primary cause is data-related: critical information for image classification is encoded in the VLM's latent space but can only be effectively decoded with enough training data. Specifically, there is a strong correlation between the frequency of class exposure during VLM training and instruction-tuning and the VLM's performance in those classes; when trained with sufficient data, VLMs can match the accuracy of state-of-the-art classification models. Based on these findings, we enhance a VLM by integrating classification-focused datasets into its training, and demonstrate that the enhanced classification performance of the VLM transfers to its general capabilities, resulting in an improvement of 11.8% on the newly collected ImageWikiQA dataset. © 2024 Neural Information Processing Systems Foundation. All rights reserved.

Machine Learning Algorithms for Signal and Image Processing

2022

Authors: Deepika Ghai; Suman Lata Tripathi; Sobhit Saxena; Manash Chanda; Mamoun Alazab

ISBN (digital): 9781119861850

ISBN (print): 9781119861829

Enables readers to understand the fundamental concepts of machine and deep learning techniques with interactive, real-life applications within signal and image processing. Machine Learning Algorithms for Signal and Image Processing aids the reader in designing and developing real-world applications using advances in machine learning to aid and enhance speech

Keywords: speech signal processing; image processing; computer vision; biomedical signal processing; adaptive filtering; text processing; pre-processing; feature extraction; source separation; data decompositions; machine learning tasks; speech recognition; image reconstruction; object classification and detection; healthcare monitoring; biomedical systems; green energy; sign language recognition; fake news detection in social media; structural damage prediction; epileptic seizure detection

A Machine Learning-based Approach for Automatic Grading and Quality Inspection of Indian Mangoes

2nd IEEE-Industrial-Electronics-Society Annual On-Line Conference (ONCON)

Authors: Bagchi, Sourav; Aditya, Janumpally Varun; Kumari, Sneha; Dhanraj, Malla; Jenamani, Mamata (Indian Inst Technol Kharagpur, Dept Ind & Syst Engn, Kharagpur, W Bengal, India)

ISBN (print): 9798350357974

Manual visual assessment of mangoes has been problematic for the agriculture sector because of its time-consuming nature and inconsistent evaluation and sorting methods. The advent of automated flaw identification using computer vision and machine learning offers a notable shift and improvement in the visual inspection process. A common issue with mangoes is the presence of dark patches, indicative of disease or rot, which negatively affect the appearance and quality of the fruit. This paper introduces a framework using computer vision which utilizes image analysis and machine learning methods to identify these dark spots, taking into account the mangoes' texture. The proposed framework has a simplified configuration and tuning process, enhancing its ease of deployment in real-world applications. This innovation aligns with the advancements in integrating cutting-edge technologies to optimize efficiency and consistency in agricultural practices, thereby contributing to the evolution of smart agriculture and addressing the challenges and opportunities presented by the next wave of industrial revolution.

Keywords: Computer Vision System; Dark Patches Detection; Image Processing; Local Binary Pattern (LBP); Machine Learning; Random Forest; SVM Classifier; Grading
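
The record above names Local Binary Pattern (LBP) as a texture descriptor. As a rough illustration only, the basic 3x3 LBP operator can be sketched as follows. The neighbor ordering, the `>=` comparison, and the function names `lbp_code` and `lbp_histogram` are assumptions made for this sketch; the abstract does not describe the authors' actual feature pipeline.

```python
def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the pixel at (row, col).

    `image` is a list of rows of grayscale values. Each of the 8 neighbors
    contributes one bit, set when the neighbor is >= the center pixel.
    The clockwise ordering starting at the top-left is an assumed convention.
    """
    center = image[row][col]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (d_row, d_col) in enumerate(offsets):
        if image[row + d_row][col + d_col] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Count LBP codes over all interior pixels (border pixels are skipped)."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram
```

A histogram like this could in principle serve as the texture feature vector passed to a classifier such as a Random Forest or SVM, both of which appear in the keywords.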

Differential Oriented Image Foresting Transform and Its Applications to Support High-level Priors for Object Segmentation

JOURNAL OF MATHEMATICAL IMAGING AND VISION, 2023, Vol. 65, No. 5, pp. 802-817

Authors: Condori, Marcos A. T.; Miranda, Paulo A. V. (Univ Sao Paulo, Inst Math & Stat, R Matao 1010, BR-05508090 Sao Paulo, SP, Brazil)

Image foresting transform (IFT) is a graph-based framework to develop image operators based on optimum connectivity between a root set and the remaining nodes, according to a given path-cost function. Oriented image foresting transform (OIFT) was proposed as an extension of some seeded IFT-based segmentation methods to directed graphs, enabling them to support the processing of global object properties, such as connectedness, shape constraints, boundary polarity, and hierarchical constraints, allowing their customization to a given target object. OIFT lies in the intersection of generalized graph cut and general fuzzy connectedness frameworks, inheriting their properties. Its returned segmentation is optimal, with respect to an appropriate graph cut measure, among all segmentations satisfying the given constraints. In this work, we propose differential oriented image foresting transform, which allows multiple OIFT executions for different root sets, making the processing time proportional to the number of modified nodes. Experimental results show considerable efficiency gains over the sequential flow of OIFTs in image segmentation, while maintaining a good treatment of tie zones. We also demonstrate that the differential flow makes it feasible to incorporate the prior knowledge about the maximum allowable size for the segmented object, thus avoiding false positive errors in the segmentation of multi-dimensional images. We also propose an algorithm to efficiently create a hierarchy map that encodes area-constrained OIFT results for all possible thresholds, facilitating the quick selection of the object of interest.

Keywords: Oriented image foresting transform; Image segmentation in directed graphs; Generalized graph cut; Differential algorithms
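
The abstract above describes the IFT as computing optimum connectivity from a root set under a path-cost function. A minimal sketch of that general idea is given below, assuming a max-arc path cost (the cost of a path is its largest arc weight) and a Dijkstra-style priority queue. This is only the plain seeded IFT on a small weighted graph; it is not the oriented (OIFT) or differential variant the paper proposes, and the name `ift_fmax` and the adjacency-dict input format are hypothetical.

```python
import heapq


def ift_fmax(adjacency, seeds):
    """Seeded IFT sketch with a max-arc path cost.

    adjacency: {node: [(neighbor, arc_weight), ...]}
    seeds:     {node: label} forming the root set
    Returns (cost, label): the optimum path cost of each node and the
    label of the seed whose optimum path reaches it.
    """
    cost = {node: float("inf") for node in adjacency}
    label = {node: None for node in adjacency}
    heap = []
    for node, seed_label in seeds.items():
        cost[node] = 0.0
        label[node] = seed_label
        heapq.heappush(heap, (0.0, node))

    finished = set()
    while heap:
        current_cost, node = heapq.heappop(heap)
        if node in finished:
            continue  # stale queue entry
        finished.add(node)
        for neighbor, weight in adjacency[node]:
            # Extending the path costs the larger of the path so far and the new arc.
            new_cost = max(current_cost, weight)
            if new_cost < cost[neighbor]:
                cost[neighbor] = new_cost
                label[neighbor] = label[node]
                heapq.heappush(heap, (new_cost, neighbor))
    return cost, label
```

For example, on a four-node chain with a heavy arc in the middle, seeding the two ends with different labels splits the chain at the heavy arc:

```python
chain = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 5.0)],
    "c": [("b", 5.0), ("d", 1.0)],
    "d": [("c", 1.0)],
}
cost, label = ift_fmax(chain, {"a": "object", "d": "background"})
```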

Training-Based Model Refinement and Representation Disagreement for Semi-Supervised Object Detection

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: Marvasti-Zadeh, Seyed Mojtaba; Ray, Nilanjan; Erbilgin, Nadir (Univ Alberta, Edmonton, AB, Canada)

ISBN (print): 9798350318920; 9798350318937

Semi-supervised object detection (SSOD) aims to improve the performance and generalization of existing object detectors by utilizing limited labeled data and extensive unlabeled data. Despite many advances, recent SSOD methods are still challenged by inadequate model refinement using the classical exponential moving average (EMA) strategy, the consensus of Teacher-Student models in the latter stages of training (i.e., losing their distinctiveness), and noisy/misleading pseudo-labels. This paper proposes a novel training-based model refinement (TMR) stage and a simple yet effective representation disagreement (RD) strategy to address the limitations of classical EMA and the consensus problem. The TMR stage of Teacher-Student models optimizes the lightweight scaling operation to refine the model's weights and prevent overfitting or forgetting learned patterns from unlabeled data. Meanwhile, the RD strategy helps keep these models diverged to encourage the student model to explore additional patterns in unlabeled data. Our approach can be integrated into established SSOD methods and is empirically validated using two baseline methods, with and without cascade regression, to generate more reliable pseudo-labels. Extensive experiments demonstrate the superior performance of our approach over state-of-the-art SSOD methods. Specifically, the proposed approach outperforms the baseline Unbiased-Teacher-v2 (& Unbiased-Teacher-v1) method by an average mAP margin of 2.23, 2.1, and 3.36 (& 2.07, 1.9 and 3.27) on COCO-standard, COCO-additional, and Pascal VOC datasets, respectively.

Keywords: Algorithms Algorithms and algorithms formulations image recognition and understanding machine learning architectures
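
For context on the "classical EMA strategy" that the abstract above contrasts its method with, here is a minimal sketch of the usual exponential-moving-average teacher update in Teacher-Student training. It shows only the baseline being criticized, not the proposed TMR stage or RD strategy. The function name `ema_update`, the dict-of-floats weight format, and the default `decay` value are assumptions for illustration.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as an exponential moving average.

    teacher, student: {parameter_name: float} with the same keys.
    Each teacher weight moves a fraction (1 - decay) toward the student.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }
```

With `decay` close to 1 the teacher changes slowly, which relates to the consensus issue the abstract mentions: as training proceeds and the student stabilizes, the two sets of weights drift toward each other.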

Multi-stage generative adversarial networks for generating pavement crack images

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, Vol. 131

Authors: Han, Chengjia; Ma, Tao; Ju, Huyan; Tong, Zheng; Yang, Handuo; Yang, Yaowen (Southeast Univ, Sch Transportat, Nanjing 211189, Peoples R China; Nanyang Technol Univ, Sch Civil & Environm Engn, Singapore, Singapore)

The application of machine learning techniques in pavement health monitoring based on computer vision has greatly improved the accuracy and efficiency in the detection of pavement distress levels and categories. However, a persistent challenge in this field is the issue of sample imbalance, primarily arising from the scarcity of cracked pavement images, which hampers their effectiveness in road maintenance engineering. To address this issue and enhance the fast and stable generation of high-quality crack images for engineering purposes, this study proposes two frameworks based on Generative Adversarial Networks (GAN): Multi-Stage GAN-v1 and Multi-Stage GAN-v2. These frameworks break down the complex task of directly generating high-quality images into a series of incremental steps, gradually increasing the image resolution from initially generated low-precision images. Both versions, v1 and v2, consist of multiple sequentially connected generation units, with each unit utilizing the Wasserstein Generative Adversarial Network-Gradient Penalty (WGAN-GP). Furthermore, v2 has the additional capability of generating pavement crack images of specified types and simultaneously providing crack segmentation labels. This feature significantly enhances the practical applicability of the generated data in engineering contexts. In a comprehensive case study, the evaluation results clearly illustrate the superior image generation quality from the two proposed frameworks. Moreover, the results from ablation experiments, involving the training of nine state-of-the-art crack semantic segmentation and object detection networks using both generated images and real images, demonstrate the effective utility of these generated images for training pavement distress detection networks.

Keywords: Generative adversarial network; Pavement crack image generation; Convolutional neural network; Road engineering; Data enhancement

Bio-Inspired Electronic Eyes and Synaptic Photodetectors for Mobile Artificial Vision

IEEE Journal on Flexible Electronics, 2022, Vol. 1, No. 2, pp. 76-87

Authors: Choi, Changsoon; Seung, Hyojin; Kim, Dae-Hyeong (Seoul 02792, Republic of Korea; Seoul 08826, Republic of Korea; School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, Seoul 08826, Republic of Korea)

Conventional imaging and data processing devices are not ideal for mobile artificial vision applications, such as vision systems for drones and robots, because of the heavy and bulky multilens optics in the camera modules. Furthermore, the physically isolated image data processing units of conventional systems induce large power consumption and data latency. For mobile artificial vision applications, electronic eyes, including neuromorphic ones, have been developed inspired by biological eyes and neural networks. Here, we summarize the development of such bio-inspired electronic eyes and synaptic photodetectors (PDs). Bio-inspired electronic eyes, typically consisting of curved image sensor arrays, enable aberration-free imaging and module size miniaturization in addition to other advantageous optical features, such as wide field-of-view and deep depth-of-field. Furthermore, photodetecting devices with synaptic properties can efficiently enhance image contrast because of photon-triggered synaptic plasticity. Therefore, the signal-to-noise ratio of the acquired image can be enhanced, which facilitates efficient image recognition for machine vision. A brief summary of the remaining challenges and prospects concludes this review. © 2022 Institute of Electrical and Electronics Engineers. All rights reserved.

Keywords: Cameras

Detection and diabetic retinopathy grading using digital retinal images

INTERNATIONAL JOURNAL OF INTELLIGENT ROBOTICS AND APPLICATIONS, 2023, Vol. 7, No. 2, pp. 426-458

Authors: Malhi, Avleen; Grewal, Reaya; Pannu, Husanbir Singh (Bournemouth Univ, Poole, England; Aalto Univ, Dept Comp Sci, Espoo, Finland; Thapar Inst Engn & Technol, Patiala, India)

Diabetic retinopathy is an eye disorder that affects people suffering from diabetes. Higher sugar levels in the blood damage the blood vessels in the eyes and may even cause blindness. Diabetic retinopathy is identified by red spots known as microaneurysms and bright yellow lesions called exudates. It has been observed that early detection of exudates and microaneurysms may save the patient's vision, and this paper proposes a simple and effective technique for diabetic retinopathy detection. Both publicly available and real-time datasets of color images captured by a fundus camera have been used for the empirical analysis. In the proposed work, grading has been done to determine the severity of diabetic retinopathy, i.e., whether it is mild, moderate, or severe, using exudates and microaneurysms in the fundus images. An automated approach has been proposed that uses image processing, feature extraction, and machine learning models to accurately predict the presence of exudates and microaneurysms, which can then be used for grading. The research is carried out in two segments: one for exudates and another for microaneurysms. Grading via exudates is based on their distance from the macula, whereas grading via microaneurysms is based on their count. For grading using exudates, support vector machine and K-nearest neighbor show the highest accuracy of 92.1%, and for grading using microaneurysms, decision tree shows the highest accuracy of 99.9% in predicting the severity levels of the disease.

Keywords: Machine learning; Image processing; Diabetic retinopathy; Exudates; Microaneurysms

Real-time Pipeline Tracking System on a RISC-V Embedded System Platform

14th Symposium on Computer Applications & Industrial Electronics (ISCAIE)

Authors: Wei, Eric Sia Siew; Aromoye, Ibrahim Akinjobi; Hiung, Lo Hai (Univ Teknol PETRONAS, Dept Elect & Elect Engn, Seri Iskandar, Malaysia)

ISBN (print): 9798350348798; 9798350348804

Pipeline infrastructures are the most suitable means of transporting oil and gas products, so they demand reliable inspection methods to ensure their integrity and reliability. Current inspection techniques are labour-intensive, error-prone, safety-threatening, time-consuming, and limited in coverage. This paper presents a real-time Pipeline Tracking System hosted on the RISC-V Embedded System Platform, aiming to automate the inspection process. The model, trained with the YOLOv7 algorithm to detect and track pipelines, is deployed on the VisionFive 2 Single Board Computer, a RISC-V embedded system platform that offers capabilities in 3D image processing, making it an ideal platform for automated pipeline inspection in resource-constrained environments. The system is designed for integration with unmanned aerial vehicles (UAVs), providing an onboard computer for vision-based detection. Experimental results demonstrate compatibility in resource-constrained environments, emphasising computational efficiency and tracking accuracy. This work contributes to automating pipeline inspection processes, enhancing safety, and advancing RISC-V technology. Future work includes optimising computer vision performance and hardware implementation on a drone.

Keywords: Pipeline inspection; RISC; YOLO; Autonomous aerial vehicles; Pervasive computing

CIE XYZ Net: Unprocessing Images for Low-Level Computer Vision Tasks

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, Vol. 44, No. 9, pp. 4688-4700

Authors: Afifi, Mahmoud; Abdelhamed, Abdelrahman; Abuolaim, Abdullah; Punnappurath, Abhijith; Brown, Michael S. (York Univ, Lassonde Sch Engn, Dept Elect Engn & Comp Sci, Toronto, ON M3J 1P3, Canada)

Cameras currently allow access to two image states: (i) a minimally processed linear raw-RGB image state (i.e., raw sensor data); or (ii) a highly-processed nonlinear image state (e.g., sRGB). There are many computer vision tasks that work best with a linear image state, such as image deblurring and image dehazing. Unfortunately, the vast majority of images are saved in the nonlinear image state. Because of this, a number of methods have been proposed to "unprocess" nonlinear images back to a raw-RGB state. However, existing unprocessing methods have a drawback because raw-RGB images are sensor-specific. As a result, it is necessary to know which camera produced the sRGB output and use a method or network tailored for that sensor to properly unprocess it. This paper addresses this limitation by exploiting another camera image state that is not available as an output, but it is available inside the camera pipeline. In particular, cameras apply a colorimetric conversion step to convert the raw-RGB image to a device-independent space based on the CIE XYZ color space before they apply the nonlinear photo-finishing. Leveraging this canonical image state, we propose a deep learning framework, CIE XYZ Net, that can unprocess a nonlinear image back to the canonical CIE XYZ image. This image can then be processed by any low-level computer vision operator and re-rendered back to the nonlinear image. We demonstrate the usefulness of the CIE XYZ Net on several low-level vision tasks and show significant gains that can be obtained by this processing framework. Code and dataset are publicly available at https://***/mahmoudnafifi/CIE_XYZ_NET.

Keywords: Image color analysis; Cameras; Pipelines; Task analysis; Image restoration; Computer vision; Computational modeling; CIE XYZ color space; color linearization; scene-referred image reconstruction; image rendering
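
To make the distinction between a nonlinear sRGB state and a linear CIE XYZ state concrete, the sketch below applies the textbook sRGB definition: the inverse sRGB transfer function followed by the standard linear-sRGB-to-XYZ (D65) matrix. This is not CIE XYZ Net. Real camera photo-finishing involves further nonlinear processing beyond this idealized curve, which is the gap the paper's learned framework targets. The function names and the assumption that inputs lie in [0, 1] are illustrative.

```python
# Standard linear sRGB -> CIE XYZ matrix (D65 white point), rounded to 4 decimals.
SRGB_TO_XYZ = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]


def srgb_to_linear(value):
    """Invert the sRGB transfer function for one channel value in [0, 1]."""
    if value <= 0.04045:
        return value / 12.92
    return ((value + 0.055) / 1.055) ** 2.4


def srgb_to_xyz(rgb):
    """Convert one nonlinear sRGB pixel (r, g, b in [0, 1]) to CIE XYZ."""
    linear = [srgb_to_linear(channel) for channel in rgb]
    return [
        sum(weight * channel for weight, channel in zip(row, linear))
        for row in SRGB_TO_XYZ
    ]
```

Operators such as deblurring, which the abstract says work best on linear data, would then act on the linearized values rather than on the gamma-encoded ones.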
