Search Results - Inner Mongolia University Library

International Conference on Image Processing, Computer Vision and Machine Learning (ICICML)

Author: Zou, Yifei. Wuhan Univ (WHU), Wuhan, Peoples R China

ISBN: (print) 9781665464680

During grabbing and discharging in an unattended system for fine-particle dry bulk materials, the three-dimensional distribution of the material is continuous, highly irregular, and changes rapidly. The choice of grabbing area affects both grabbing safety and grabbing effect. Under extreme three-dimensional distributions, a poorly chosen grabbing area can lead to accidents such as the grab overturning or the closing rope slipping out of its groove, or to poor grabbing results and unsatisfactory grab efficiency. Based on the distribution characteristics of dry bulk materials, this paper proposes a sliding window detection algorithm based on three-dimensional feature distribution. Before grabbing, the material distribution is detected and analyzed, and the features that determine whether a distribution is relatively safe and effective, such as central depressions and super-elevated slopes, are identified. A Digital Elevation Model (DEM) of the three-dimensional feature distribution over the whole area is established, and a sliding window is then used to detect safe and efficient grab areas. Simulation results show that the sliding window detection algorithm based on the DEM three-dimensional feature distribution can predict more efficient and safer grab areas from the distribution of stacked material, providing a new detection method for unattended bridge crane grab engineering.
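The sliding-window step over a DEM can be sketched as below. This is a minimal illustration, not the paper's algorithm: the DEM is assumed to be a row-major grid of heights, and the features (mean height, central depression relative to the window rim, steepest neighbour-to-neighbour slope), the slope cutoff, and the scoring rule are all hypothetical choices made for the example.

```python
from statistics import mean


def window_stats(dem, r0, c0, size, cell_size):
    """Return (mean elevation, central depression, steepest local slope) of one window."""
    rows = range(r0, r0 + size)
    cols = range(c0, c0 + size)
    cells = [dem[r][c] for r in rows for c in cols]
    rim = [dem[r][c] for r in rows for c in cols
           if r in (r0, r0 + size - 1) or c in (c0, c0 + size - 1)]
    centre = dem[r0 + size // 2][c0 + size // 2]
    depression = mean(rim) - centre  # > 0 means the centre lies below the rim
    steepest = 0.0
    for r in rows:
        for c in cols:
            if r + 1 < r0 + size:
                steepest = max(steepest, abs(dem[r + 1][c] - dem[r][c]))
            if c + 1 < c0 + size:
                steepest = max(steepest, abs(dem[r][c + 1] - dem[r][c]))
    return mean(cells), depression, steepest / cell_size


def scan_grab_windows(dem, size, cell_size, max_slope, depression_penalty=1.0):
    """Slide a size x size window over the DEM; return (score, row, col), best first."""
    results = []
    for r0 in range(len(dem) - size + 1):
        for c0 in range(len(dem[0]) - size + 1):
            height, depression, slope = window_stats(dem, r0, c0, size, cell_size)
            if slope > max_slope:
                continue  # hypothetical safety rule: too steep, grab may tip over
            # hypothetical score: prefer high material, penalize strong dips or mounds
            score = height - depression_penalty * abs(depression)
            results.append((score, r0, c0))
    return sorted(results, key=lambda t: t[0], reverse=True)
```

For example, `scan_grab_windows(dem, 3, 1.0, 2.0)` on a grid with a flat region next to a 4-unit cliff keeps only the windows that lie entirely on the flat region. The real criteria for "safe" and "efficient" would come from the paper's feature analysis.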

Keywords: three-dimensional distribution; grabbing safety; grabbing effect; sliding window; DEM

Dual-Adaptive Heterojunction Synaptic Transistors for Efficient Machine Vision in Harsh Lighting Conditions

ADVANCED MATERIALS, 2024, Vol. 36, No. 32, p. 2404160

Authors: Wang, Yiru; Nie, Shimiao; Liu, Shanshuo; Hu, Yunfei; Fu, Jingwei; Ming, Jianyu; Liu, Jing; Li, Yueqing; He, Xiang; Wang, Le; Li, Wen; Yi, Mingdong; Ling, Haifeng; Xie, Linghai; Huang, Wei. Nanjing Univ Posts & Telecommun (NJUPT), State Key Lab Organ Elect & Informat Displays, Nanjing 210023, Peoples R China; Nanjing Univ Posts & Telecommun (NJUPT), Inst Adv Mat (IAM), Nanjing 210023, Peoples R China; Northwestern Polytech Univ, Frontiers Sci Ctr Flexible Elect (FSCFE), MIIT Key Lab Flexible Elect (KLoFE), Xian 710072, Peoples R China

Photoadaptive synaptic devices enable in-sensor processing of complex illumination scenes, while second-order adaptive synaptic plasticity improves learning efficiency by adjusting the learning rate to a given environment. Integrating both adaptations in a single phototransistor opens opportunities for highly efficient machine vision systems. Here, a dually adaptive organic heterojunction transistor is reported as the working unit of such a system; it enables precise contrast enhancement and faster convergence under harsh lighting conditions. The photoadaptive threshold sliding originates from bidirectional photoconductivity caused by a light-intensity-dependent photogating effect. Metaplasticity is implemented by combining ambipolar behavior with a charge-trapping effect. Using the transistor array in a machine vision system, details and edges are highlighted in images with only 0.4% contrast, and a recognition accuracy of 93.8% is achieved with a convergence rate about 5 times faster. These results offer a strategy for fully implementing metaplasticity in optoelectronic devices and point to their use in vision processing for complex lighting scenes. Organic heterojunction transistors are designed to integrate light-intensity-adaptive threshold sliding and second-order adaptive metaplasticity. This dual adaptability enables the highlighting of 0.4% low-contrast images, and efficient recognition is achieved thanks to learning-rate changes during backpropagation.

Keywords: adaptation; machine vision; metaplasticity; organic heterojunction; visuomorphic computing

Smartphone-based app development with machine learning using Hibiscus sabdariffa L. extract for pH estimation

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2025, Vol. 257

Authors: Aydin, Omer Faruk; Aydin, Merve; Demir, Melisa; Caliskan Kahraman, Sibel. Istanbul Aydin Univ, Dept Comp Programming, Istanbul, Turkiye; Marmara Univ, Dept Control & Automat Technol, Istanbul, Turkiye; Istanbul Aydin Univ, Dept Ind Engn, Istanbul, Turkiye; Istanbul Aydin Univ, Dept Food Engn, Istanbul, Turkiye

This study presents a novel approach to pH estimation in buffer solutions using images of solutions prepared with Hibiscus sabdariffa L. as a natural pH indicator. The images, each showing a distinctive colour indicative of the solution's pH, were converted into standardized 200×200-pixel images using image processing techniques. A pH prediction model was then built with the Adaptive Boosting (AdaBoost) regressor algorithm. The pH values of the training data were irregularly distributed between 0 and 14, and the models were trained on 94 images and 1880 experimental values. In addition, a reliable pre-processing stage based on image processing techniques was built into the model, so that test data can be acquired in any desired environment. Noise parameters that negatively affected the prediction results were removed from the training and test data. A smartphone application based on the model was developed and made publicly available. This methodology bridges the gap between traditional pH measurement techniques and computer vision, offering a more accessible and eco-friendly means of pH assessment. Its practical applications extend to environmental monitoring, agriculture, and education.
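The standardization step (resizing every photo to 200×200 pixels) and a simple colour feature can be sketched in plain Python as follows. This is an illustrative assumption, not the authors' pre-processing: images are taken to be lists of rows of `(R, G, B)` tuples, nearest-neighbour resampling is used, and the mean RGB value is a hypothetical feature choice.

```python
def resize_nearest(img, out_h=200, out_w=200):
    """Nearest-neighbour resize of an image given as rows of (R, G, B) tuples."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def mean_rgb(img):
    """Average colour of an image, usable as a small feature vector for a regressor."""
    n = len(img) * len(img[0])
    totals = [0.0, 0.0, 0.0]
    for row in img:
        for px in row:
            for i in range(3):
                totals[i] += px[i]
    return tuple(t / n for t in totals)
```

Feature vectors produced this way could then be fed to a boosted regressor such as scikit-learn's `sklearn.ensemble.AdaBoostRegressor`; whether the paper used mean colour, full pixel arrays, or another representation is not stated in the abstract.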

Keywords: machine learning; image processing; pH estimation; Hibiscus sabdariffa L.; smartphone

MUWS'2023: The 2nd International Workshop on Multimodal Human Understanding for the Web and Social Media

32nd ACM International Conference on Information and Knowledge Management (CIKM)

Authors: Cheema, Gullal S.; Hakimov, Sherzod; Kastner, Marc A.; Garcia, Noa. Leibniz Univ Hannover, L3S Res Ctr, Hannover, Germany; Univ Potsdam, Potsdam, Germany; Kyoto Univ, Kyoto, Japan; Osaka Univ, Osaka, Japan

ISBN: (print) 9798400701245

Multimodal human understanding and analysis is an emerging research area that cuts across several disciplines, such as Computer Vision (CV), Natural Language Processing (NLP), Speech Processing, Human-Computer Interaction (HCI), and Multimedia. Several multimodal learning techniques have recently shown the benefit of combining multiple modalities in image-text, audio-visual, and video representation learning and in various downstream multimodal tasks. At their core, these methods model the modalities and their complex interactions using large amounts of data, different loss functions, and deep neural network architectures. However, many Web and social media applications also need to model the human, including human behaviour and perception. This makes it important to consider interdisciplinary approaches drawing on the social sciences, semiotics, and psychology. The core challenges are understanding various cross-modal relations, quantifying biases such as social biases, and assessing the applicability of models to real-world problems. Interdisciplinary theories such as semiotics or gestalt psychology can provide additional insight into perceptual understanding through signs and symbols across multiple modalities. In general, these theories offer a compelling view of multimodality and perception that can further expand computational research and multimedia applications on the Web and social media. The theme of the MUWS workshop, multimodal human understanding, covers interdisciplinary challenges related to social bias analysis, multimodal representation learning, detection of human impressions or sentiment, hate speech, sarcasm in multimodal data, multimodal rhetoric and semantics, and related topics. The MUWS workshop will be an interactive event with keynotes by relevant experts, poster and demo sessions, research presentations, and discussion.

Keywords: multimodality; machine learning; image-text relations; social media; web; human understanding; semiotics

Water-Related Vision

Acta Electronica Sinica, 2024, Vol. 52, No. 4, pp. 1041-1082

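Author: Li Xuelong (李学龙). Institute of Optoelectronics and Intelligence, Northwestern Polytechnical University, Xi'an, Shaanxi 710072; Key Laboratory of Intelligent Interaction and Applications, Ministry of Industry and Information Technology (Northwestern Polytechnical University), Xi'an, Shaanxi 710072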

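About 71% of the Earth's surface is covered by water bodies such as rivers, lakes, and oceans, and imaging on land is also affected by water in the form of clouds, snow, rain, and fog. However, current machine vision research and application systems are built almost entirely around vision tasks in air or vacuum, and vision involving water in its various forms has not been studied systematically. Water-related vision, the concrete embodiment of water-related optics in the field of vision, focuses on the scientific problems of intelligently processing and analyzing water-related visual image signals arising from the interaction between light and water and from cross-medium propagation, as well as on the engineering problems of developing advanced intelligent water-related vision equipment. Starting from the universally relevant question "Why is the sea blue?", this paper systematically introduces the mechanisms by which water absorbs, scatters, and attenuates light, their effects on water-related vision tasks, and existing methods for water-related image processing and analysis. Based on the optical properties of water bodies and the mechanisms of imaging degradation, the paper presents the team's results on key technologies and equipment for water-related imaging and image analysis. The team has successively developed the full-ocean-depth ultra-high-definition camera "Haitong" (海瞳), a full-ocean-depth 3D camera, and a full-ocean-depth high-definition video camera, building a comprehensive, systematic capability for developing underwater observation and analysis equipment covering color, intensity, polarization, and spectrum. This work fills China's gap in full-ocean-depth optical vision technology, advances the country's water-related vision technology, and has significant application value and social benefit.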
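The absorption, scattering, and attenuation mechanisms mentioned above are often summarized by a simplified underwater image formation model. The form below is the widely used textbook version, shown as an assumption for orientation rather than the specific model in the paper: for colour channel $c$, the observed image $I_c$ mixes the attenuated scene radiance $J_c$ with the veiling (backscattered) light $B_c$, and the transmission $t_c$ decays with distance $d$ at a rate set by the attenuation coefficient $\beta_c$, itself the sum of the absorption coefficient $a_c$ and the scattering coefficient $b_c$ (Beer-Lambert law).

```latex
I_c(x) = J_c(x)\, t_c(x) + B_c \bigl(1 - t_c(x)\bigr), \qquad
t_c(x) = e^{-\beta_c\, d(x)}, \qquad
\beta_c = a_c + b_c, \quad c \in \{R, G, B\}
```

Because pure water absorbs red light much more strongly than blue, $\beta_R$ is larger than $\beta_B$, so over long paths the surviving light is predominantly blue.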

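Keywords: water-related vision; water-related optics; multimodal cognitive computing; machine vision; image and video signal processing; extraterrestrial oceans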

Chaos Theory Based Gravitational Search Algorithm for Medical Image Segmentation

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Rather, Sajad Ahmad; Roy, Partha Pratim; Das, Sujit. Department of Computer Science and Engineering, Indian Institute of Technology Roorkee 247667, India; Department of Computer Science and Engineering, National Institute of Technology Telangana 506004, India

ISBN: (print) 9783031781032

Multilevel thresholding plays a crucial role in image processing, with extensive applications in object detection, machine vision, medical imaging, and traffic control systems. It partitions an image into distinct regions based on optimal pixel values. However, the computational cost of segmentation grows as the number of threshold levels increases. To address this challenge, a novel method called the Chaos theory based Gravitational Search Algorithm (CGSA) is proposed for multilevel thresholding. CGSA combines the standard Gravitational Search Algorithm (GSA) for exploration with chaotic maps for exploitation of the complex pixel search space. In this study, Kapur's entropy method is used to segment sample images into partitions based on optimal pixel values. The effectiveness of CGSA in real-world scenarios is evaluated on COVID-19 chest CT scan datasets from the Kaggle database. The quality, symmetry, and consistency of the segmented output are assessed with Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Feature Similarity Index Measure (FSIM). Qualitative analysis includes convergence curves, segmented graphs, colormap images, and box plots. Statistical validation is conducted using the signed Wilcoxon rank sum test. CGSA is also compared with eight state-of-the-art heuristic algorithms. The results show that CGSA performs better, with reduced computational time and improved image quality metric values: it achieved an SSIM of 0.81, an FSIM of 0.82, and a PSNR of 24.27, surpassing the competing algorithms. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
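Kapur's entropy, the objective that CGSA maximizes, and a chaotic map can be sketched as follows. The search loop here is a deliberately simple stand-in, a logistic-map-driven random search that keeps the best proposal; it is not the Gravitational Search Algorithm, and the starting value `x0`, the iteration count, and the function names are hypothetical. The histogram is assumed to be a list of pixel counts per grey level, and thresholds split it into half-open classes `[t_i, t_{i+1})`.

```python
import math


def kapur_entropy(hist, thresholds):
    """Sum of the Shannon entropies of the classes delimited by `thresholds`."""
    total = sum(hist)
    p = [h / total for h in hist]
    edges = [0] + sorted(thresholds) + [len(hist)]
    score = 0.0
    for lo, hi in zip(edges, edges[1:]):
        w = sum(p[lo:hi])  # probability mass of this class
        if w <= 0:
            continue
        score -= sum((q / w) * math.log(q / w) for q in p[lo:hi] if q > 0)
    return score


def logistic_map(x0, n, r=4.0):
    """n iterates of the logistic map x -> r * x * (1 - x), chaotic for r = 4."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1 - x)
        xs.append(x)
    return xs


def chaotic_threshold_search(hist, k, iters=200, x0=0.37):
    """Keep the best of `iters` chaotic proposals of k thresholds (not full GSA)."""
    levels = len(hist)
    seq = logistic_map(x0, iters * k)
    best_t, best = None, float("-inf")
    for i in range(iters):
        # map chaotic values in [0, 1] to thresholds in [1, levels - 1]
        cand = sorted({1 + int(v * (levels - 2)) for v in seq[i * k:(i + 1) * k]})
        if len(cand) < k:
            continue  # duplicate thresholds, skip this proposal
        score = kapur_entropy(hist, cand)
        if score > best:
            best_t, best = cand, score
    return best_t, best
```

For a uniform 4-bin histogram split at 2, each class has entropy ln 2, so `kapur_entropy([1, 1, 1, 1], [2])` is 2 ln 2. In CGSA proper, the chaotic values would perturb the positions of GSA agents rather than drive the whole search.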

Keywords: Signal to noise ratio

Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics

21st IEEE International Conference on Machine Learning and Applications (IEEE ICMLA)

Authors: Naumann, Alexander; Hertlein, Felix; Zhou, Benchun; Doerr, Laura; Furmans, Kai. FZI Res Ctr Informat Technol, Karlsruhe, Germany; Karlsruhe Inst Technol (KIT), Inst Mat Handling & Logist, Karlsruhe, Germany

ISBN: (print) 9781665462839

State-of-the-art approaches in computer vision rely heavily on sufficiently large training datasets, and obtaining such a dataset for real-world applications is usually tedious. In this paper, we present a fully automated four-step pipeline that generates a synthetic dataset for instance segmentation. In contrast to existing work, our pipeline covers every step from data acquisition to the final dataset. First, we scrape images of the objects of interest from popular image search engines; since we rely only on text-based queries, the resulting data comprises a wide variety of images. Hence, image selection is necessary as the second step. Scraping and selecting images in this way relaxes the need for a real-world, domain-specific dataset that would otherwise have to be publicly available or created for this purpose. We employ an object-agnostic background removal model and compare three methods for image selection: object-agnostic pre-processing, manual image selection, and CNN-based image selection. In the third step, we generate random arrangements of the objects of interest and distractors on arbitrary backgrounds. Finally, the images are composed by pasting the objects using four different blending methods. We present a case study of our dataset generation approach for parcel segmentation. For the evaluation, we created a dataset of parcel photos that were annotated automatically. We find that (1) our dataset generation pipeline allows a successful transfer to real test images (Mask AP 86.2); (2) a very accurate image selection process is, contrary to human intuition, not crucial, and a broader category definition can help bridge the domain gap; and (3) using blending methods is beneficial compared to simple copy-and-paste. Our full code for scraping, image composition, and training is publicly available at https://***/parcel2d.
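The paste-with-blending step can be illustrated with a small alpha-compositing sketch. It is an assumption-laden toy rather than the authors' code: images are 2D lists of grayscale values, the mask holds per-pixel alpha in [0, 1], and "feathering" uses a 3×3 box blur of the mask as a hypothetical stand-in for whichever four blending methods the paper actually compares. With `feather=False` the function reduces to plain copy-and-paste.

```python
import random


def box_blur(mask):
    """3x3 mean filter; border pixels average only their in-bounds neighbours."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [mask[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out


def paste(background, obj, mask, top, left, feather=False):
    """Alpha-composite a grayscale object onto a copy of the background."""
    out = [row[:] for row in background]
    alpha = box_blur(mask) if feather else mask
    for r, obj_row in enumerate(obj):
        for c, value in enumerate(obj_row):
            br, bc = top + r, left + c
            if 0 <= br < len(out) and 0 <= bc < len(out[0]):
                a = alpha[r][c]
                out[br][bc] = a * value + (1 - a) * out[br][bc]
    return out


def random_paste(background, obj, mask, rng, feather=True):
    """Place the object at a random position that keeps it fully inside the background."""
    top = rng.randrange(len(background) - len(obj) + 1)
    left = rng.randrange(len(background[0]) - len(obj[0]) + 1)
    return paste(background, obj, mask, top, left, feather)
```

Softening the mask edge removes the hard boundary that a detector could otherwise learn as a shortcut, which is one common motivation for blending; Gaussian blur and Poisson blending are other frequently used options.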

Keywords: computer vision; dataset generation; instance segmentation; parcel logistics; synthetic dataset

Gaussian mixture model clustering allows accurate semantic image segmentation of wheat kernels from near-infrared hyperspectral images

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2025, Vol. 259

Authors: Kartakoullis, Andreas; Caporaso, Nicola; Whitworth, Martin B.; Fisk, Ian D. Univ Nottingham, Sch Biosci, Div Food Nutr & Dietet, Sutton Bonington Campus, Nottingham LE12 5RD, England; Campden BRI, Chipping Campden GL55 6LD, Glos, England; Buhler UK Ltd, London E16 2LD, England

In this study, an ad-hoc image processing pipeline was developed for semantically segmenting wheat kernel data acquired through near-infrared hyperspectral imaging (HSI). The Gaussian Mixture Model (GMM), a soft clustering method, was employed for this task and yielded noteworthy results in both kernel and germ segmentation. GMM was compared with two hard clustering methods, hierarchical clustering and k-means, as well as with other clustering algorithms common in food HSI applications. GMM achieved the highest accuracy, with a Jaccard index of 0.745, versus 0.698 for hierarchical clustering and 0.652 for k-means. Furthermore, the spectral variations observed across wheat kernel topology can be used for semantic image segmentation, especially for selecting the germ portion within the wheat kernels. These findings are of practical significance for professionals in HSI and machine vision, particularly for food product quality assessment and real-time inspection.
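To make "soft clustering" and the Jaccard index concrete, here is a minimal sketch: a two-component, one-dimensional GMM fitted by expectation-maximization, with each pixel assigned to the component of highest responsibility, followed by the Jaccard index between two binary masks. It assumes a single spectral band per pixel, whereas the study clusters full hyperspectral spectra; the initialization and iteration count are hypothetical choices.

```python
import math


def gmm_two_components(xs, iters=100):
    """Fit a 2-component 1-D Gaussian mixture by EM; return hard labels, soft responsibilities, means."""
    mu = [min(xs), max(xs)]  # spread the initial means apart
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample
        resp = []
        for x in xs:
            p = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p] if s > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return labels, resp, mu


def jaccard(a, b):
    """Intersection over union of two binary masks given as flat sequences."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0
```

The responsibilities returned alongside the hard labels are what distinguish GMM from k-means: each pixel keeps a probability of belonging to every cluster, which can matter at kernel and germ boundaries. A predicted mask (for example, `[l == 1 for l in labels]`) would then be scored against a hand-labelled mask with `jaccard`.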

Keywords: NIR hyperspectral imaging; image segmentation; real-time image processing; grain; computer vision

Road Infrastructure Defect Detection using Yolo8Seg Based Approach

International Image Processing, Applications and Systems Conference (IPAS)

Authors: Norah A. AlSubaie; Ghayda A. AlMalki; Ghada N. AlMutairi; Sarah A. AlRumaih. Department of Computer Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia

ISBN: (electronic) 9798331506520

ISBN: (print) 9798331506537

This research introduces "Jaddah," an AI-based system for the automated detection of road infrastructure defects using advanced computer vision and machine learning techniques. The system addresses the limitations of traditional road inspection methods, which are often slow and prone to human error. Jaddah provides a mobile application that efficiently detects, classifies, and segments road defects at the pixel level. Training on a comprehensive dataset of high-resolution images significantly improves the model. The YOLOv8-seg model is used for precise defect localization and segmentation, enabling accurate identification and categorization of road defects. The model reaches an mAP50 of 87%, indicating reliable defect detection. These results contribute to improved infrastructure maintenance, enhanced road safety, and greater operational efficiency.

Keywords: Measurement; Location awareness; Computer vision; Urban planning; Machine learning; Inspection; Road safety; Maintenance; Mobile applications; Defect detection

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

36th Conference on Neural Information Processing Systems (NeurIPS)

Authors: Dou, Zi-Yi; Kamath, Aishwarya; Gan, Zhe; Zhang, Pengchuan; Wang, Jianfeng; Li, Linjie; Liu, Zicheng; Liu, Ce; LeCun, Yann; Peng, Nanyun; Gao, Jianfeng; Wang, Lijuan. Microsoft, Redmond, WA, USA; Univ Calif Los Angeles, Los Angeles, CA 90095, USA; NYU, New York, NY 10003, USA

ISBN: (print) 9781713871088

Vision-language (VL) pre-training has recently received considerable attention. However, most existing end-to-end pre-training approaches either only tackle VL tasks that test high-level understanding of images, such as image-text retrieval, visual question answering (VQA), and image captioning, or only target region-level understanding for tasks such as phrase grounding and object detection. We present FIBER (Fusion-In-the-Backbone-based transformER), a new VL model architecture that can seamlessly handle both types of tasks. Instead of using dedicated transformer layers for fusion after the uni-modal backbones, FIBER pushes multimodal fusion deep into the model by inserting cross-attention into the image and text backbones, bringing gains in both memory and performance. In addition, unlike previous work that is pre-trained either only on image-text data or only on fine-grained data with box-level annotations, we present a two-stage pre-training strategy that uses both kinds of data efficiently: (i) coarse-grained pre-training on image-text data, followed by (ii) fine-grained pre-training on image-text-box data. We conduct comprehensive experiments on a wide range of VL tasks, from VQA, image captioning, and retrieval to phrase grounding, referring expression comprehension, and object detection. With deep multimodal fusion coupled with the two-stage pre-training, FIBER provides consistent performance improvements over strong baselines across all tasks, often outperforming methods that use orders of magnitude more data. Code is available at https://***/microsoft/FIBER.
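The idea of inserting cross-attention into a backbone can be illustrated with single-head scaled dot-product cross-attention in plain Python: tokens of one modality (here text) form the queries, tokens of the other (image) provide keys and values, and the result is added back residually. This is a toy sketch under stated assumptions, not FIBER's implementation: the random projection matrices, the single head, and the plain residual sum are hypothetical simplifications, and the real model uses learned multi-head attention inside transformer blocks.

```python
import math
import random


def matmul(a, b):
    """(n x k) @ (k x m) for nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    mx = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - mx) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(x, y, wq, wk, wv):
    """Queries from x (n x d), keys and values from y (m x d); returns (n x d_v, n x m weights)."""
    q, k, v = matmul(x, wq), matmul(y, wk), matmul(y, wv)
    scale = math.sqrt(len(wq[0]))
    scores = matmul(q, transpose(k))
    attn = [softmax([s / scale for s in row]) for row in scores]
    return matmul(attn, v), attn


def fuse(x, y, wq, wk, wv):
    """Residual fusion: each x token is augmented with what it attends to in y."""
    out, attn = cross_attention(x, y, wq, wk, wv)
    return [[a + b for a, b in zip(xr, orow)] for xr, orow in zip(x, out)], attn


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4
text = random_matrix(rng, 3, d)   # 3 hypothetical text tokens
image = random_matrix(rng, 5, d)  # 5 hypothetical image patches
fused_text, attn = fuse(text, image, random_matrix(rng, d, d),
                        random_matrix(rng, d, d), random_matrix(rng, d, d))
```

Each row of `attn` is a probability distribution over the 5 image patches, and `fused_text` keeps the text shape (3 × 4), which is what allows such a layer to sit inside an existing backbone; FIBER applies fusion in both directions (image-to-text and text-to-image).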

Keywords: Object detection
