Search Results - Inner Mongolia University Library

32nd ACM International Conference on Information and Knowledge Management (CIKM)

Authors: Cheema, Gullal S.; Hakimov, Sherzod; Kastner, Marc A.; Garcia, Noa. Affiliations: Leibniz Univ Hannover, L3S Res Ctr, Hannover, Germany; Univ Potsdam, Potsdam, Germany; Kyoto Univ, Kyoto, Japan; Osaka Univ, Osaka, Japan

ISBN: (print) 9798400701245

Multimodal human understanding and analysis is an emerging research area that cuts through several disciplines like Computer Vision (CV), Natural Language Processing (NLP), Speech Processing, Human-Computer Interaction (HCI), and Multimedia. Several multimodal learning techniques have recently shown the benefit of combining multiple modalities in image-text, audio-visual and video representation learning and various downstream multimodal tasks. At the core, these methods focus on modelling the modalities and their complex interactions by using large amounts of data, different loss functions and deep neural network architectures. However, for many Web and Social media applications, there is the need to model the human, including the understanding of human behaviour and perception. For this, it becomes important to consider interdisciplinary approaches, including social sciences, semiotics and psychology. The core is understanding various cross-modal relations, quantifying bias such as social biases, and the applicability of models to real-world problems. Interdisciplinary theories such as semiotics or gestalt psychology can provide additional insights and analysis on perceptual understanding through signs and symbols via multiple modalities. In general, these theories provide a compelling view of multimodality and perception that can further expand computational research and multimedia applications on the Web and Social media. The theme of the MUWS workshop, multimodal human understanding, includes various interdisciplinary challenges related to social bias analyses, multimodal representation learning, detection of human impressions or sentiment, hate speech, sarcasm in multimodal data, multimodal rhetoric and semantics, and related topics. The MUWS workshop will be an interactive event and include keynotes by relevant experts, poster and demo sessions, research presentations and discussion.

Keywords: multimodality; machine learning; image-text relations; social media; web; human understanding; semiotics

Road Infrastructure Defect Detection using Yolo8Seg Based Approach

International Image Processing, Applications and Systems Conference (IPAS)

Authors: Norah A. AlSubaie; Ghayda A. AlMalki; Ghada N. AlMutairi; Sarah A. AlRumaih. Affiliation: Department of Computer Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia

ISBN: (digital) 9798331506520

ISBN: (print) 9798331506537

This research introduces "Jaddah," an innovative AI-based system for the automated detection of road infrastructure defects using advanced computer vision and machine learning techniques. The system addresses the limitations of traditional road inspection methods, which are often slow and prone to human error. Jaddah develops a mobile application that efficiently detects, classifies, and segments road defects at the pixel level. By utilizing a comprehensive dataset of high-resolution images, the model training process is significantly enhanced. The YOLOv8-seg model is implemented to achieve precise defect localization and segmentation, ensuring high accuracy in identifying and categorizing road defects. Performance metrics show an impressive 87% mAP50, demonstrating reliable defect detection. These results contribute to improved infrastructure maintenance, enhanced road safety, and greater operational efficiency.

Keywords: Measurement; Location awareness; Computer vision; Urban planning; Machine learning; Inspection; Road safety; Maintenance; Mobile applications; Defect detection
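
The mAP50 figure quoted in the abstract above is a mean average precision in which a prediction counts as correct when its intersection over union (IoU) with the ground truth is at least 0.50. The following is a minimal sketch of that matching criterion on binary masks, not the authors' evaluation code. The helper names `mask_iou` and `is_match_at_50` are hypothetical, and the abstract does not say whether its score was computed on boxes or on masks.

```python
def mask_iou(pred, truth):
    """Intersection over union of two equally sized binary masks (lists of 0/1 rows)."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    inter = sum(1 for p, t in pairs if p and t)   # pixels set in both masks
    union = sum(1 for p, t in pairs if p or t)    # pixels set in either mask
    return inter / union if union else 0.0


def is_match_at_50(pred, truth):
    """A prediction is a true positive at the mAP50 operating point when IoU >= 0.5."""
    return mask_iou(pred, truth) >= 0.5


# Hypothetical 2x2 masks: the prediction covers two pixels, the ground truth one.
predicted = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
print(mask_iou(predicted, ground_truth), is_match_at_50(predicted, ground_truth))
```

Average precision is then obtained per class from the precision-recall curve built out of these matches, and mAP50 is the mean over classes.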

A novel image processing technique based on deep learning for water consumption detection

IEEE International Instrumentation and Measurement Technology Conference (I2MTC)

Authors: Carratu, Marco; Dello Iacono, Salvatore; Di Leo, Giuseppe; Gallo, Vincenzo; Liguori, Consolatina; Pietrosanto, Antonio. Affiliation: Univ Salerno, Dept Ind Engn, Via Giovanni Paolo II 132, Fisciano, SA, Italy

ISBN: (print) 9781665483605

In recent years, traditional image processing techniques have been joined by novel tools that can tackle problems classical vision algorithms do not always handle well. For example, classical image processing algorithms (measurement, feature detection, and many others) require a controlled environment for correct operation, because illumination, target positioning, and vibration can all influence the scene. Machine learning approaches, on the other hand, have enabled image processing in non-controlled environments as well. One such application is a household-level leak detector based on processing pictures of the mechanical water meter dial. The proposed research investigates using a deep learning approach to detect the minimal movement of the water meter needles related to water leakage. In particular, a CNN was trained to correlate successive differences on the water meter dial images taken with an applied calibrated water flow. From this analysis, it is possible to detect the absence of periods with null consumption and thus detect small water losses.

Keywords: Deep Learning; CNN; Water leakage; Image processing

Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics

21st IEEE International Conference on Machine Learning and Applications (IEEE ICMLA)

Authors: Naumann, Alexander; Hertlein, Felix; Zhou, Benchun; Doerr, Laura; Furmans, Kai. Affiliations: FZI Res Ctr Informat Technol, Karlsruhe, Germany; Karlsruhe Inst Technol KIT, Inst Mat Handling & Logist, Karlsruhe, Germany

ISBN: (print) 9781665462839

State-of-the-art approaches in computer vision heavily rely on sufficiently large training datasets. For real-world applications, obtaining such a dataset is usually a tedious task. In this paper, we present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps. In contrast to existing work, our pipeline covers every step from data acquisition to the final dataset. We first scrape images for the objects of interest from popular image search engines and since we rely only on text-based queries the resulting data comprises a wide variety of images. Hence, image selection is necessary as a second step. This approach of image scraping and selection relaxes the need for a real-world domain-specific dataset that must be either publicly available or created for this purpose. We employ an object-agnostic background removal model and compare three different methods for image selection: Object-agnostic pre-processing, manual image selection and CNN-based image selection. In the third step, we generate random arrangements of the object of interest and distractors on arbitrary backgrounds. Finally, the composition of the images is done by pasting the objects using four different blending methods. We present a case study for our dataset generation approach by considering parcel segmentation. For the evaluation we created a dataset of parcel photos that were annotated automatically. We find that (1) our dataset generation pipeline allows a successful transfer to real test images (Mask AP 86.2), (2) a very accurate image selection process - in contrast to human intuition - is not crucial and a broader category definition can help to bridge the domain gap, (3) the usage of blending methods is beneficial compared to simple copy-and-paste. We made our full code for scraping, image composition and training publicly available at https://***/parcel2d.

Keywords: computer vision; dataset generation; instance segmentation; parcel logistics; synthetic dataset
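
The composition step described in the abstract above pastes background-free objects onto arbitrary backgrounds, which also yields the segmentation label for free. The following is a minimal sketch of that idea on grayscale images stored as nested lists. Plain alpha blending is used here as an assumption, since the abstract does not name its four blending methods, and `paste_object` is a hypothetical helper rather than a function from the authors' released code.

```python
def paste_object(background, obj, alpha, top, left):
    """Alpha-blend `obj` onto a copy of `background`; return (image, instance_mask).

    `alpha` has the same shape as `obj`, with values in [0, 1] (1 = fully object).
    Pixels that would land outside the background are clipped.
    """
    image = [row[:] for row in background]
    mask = [[0] * len(row) for row in background]
    for i, (obj_row, alpha_row) in enumerate(zip(obj, alpha)):
        for j, (value, a) in enumerate(zip(obj_row, alpha_row)):
            y, x = top + i, left + j
            if 0 <= y < len(image) and 0 <= x < len(image[y]):
                # Blend: a hard copy-and-paste would simply use a in {0, 1}.
                image[y][x] = a * value + (1.0 - a) * image[y][x]
                if a > 0.5:
                    mask[y][x] = 1  # pixel is labelled as part of the instance
    return image, mask


# Hypothetical 3x3 black background and a 1x2 object with a soft right edge.
canvas = [[0.0] * 3 for _ in range(3)]
composite, label = paste_object(canvas, [[100.0, 100.0]], [[1.0, 0.5]], top=1, left=1)
print(composite[1], label[1])
```

Because the mask is produced at paste time, every synthetic image comes with an exact annotation, which is what makes a pipeline of this kind fully automatic.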

Color Imaging XXVII: Displaying, Processing, Hardcopy, and Applications

IS&T International Symposium on Electronic Imaging: 27th Color Imaging: Displaying, Processing, Hardcopy, and Applications, COLOR 2022

The proceedings contain 154 papers. The topics discussed include: feature-driven 3D range geometry compression via spatially-aware depth encoding; open source deep learning inference libraries for autonomous driving systems; problems in image target-based color correction; improvement of aerial image by simulations; recognition-aware learned image compression; artist-specific style transfer for semantic segmentation of paintings: the value of large corpora of surrogate artworks; data visualization of crime data using immersive virtual reality; a comparison of non-experts and experts using DSIS method; contrast enhancement: cross-modal learning approach for medical images; a continuous bitstream-based blind video quality assessment using multi-layer perceptron; correspondences for image and video reconstruction; design and analysis on low-power and low-noise single slope ADC for digital pixel sensors; incremental two-network approach to develop a purity analyzer system for canola seeds; advantage of machine learning over maximum likelihood in limited-angle low-photon x-ray tomography; image montage detection based on image segmentation and robust hashing techniques; chatbot integrated with machine learning deployed in the cloud and performance evaluation; and the relationship between vision and simulated remote vision system air refueling performance.

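Water-Related Vision

Acta Electronica Sinica, 2024, Vol. 52, No. 4, pp. 1041-1082

Author: Li Xuelong. Affiliations: School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China; Key Laboratory of Intelligent Interaction and Applications (Northwestern Polytechnical University), Ministry of Industry and Information Technology, Xi'an, Shaanxi 710072, China

About 71% of the Earth's surface is covered by water bodies such as rivers, lakes and seas, and imaging on land is also affected by water in the form of cloud, snow, rain and fog. Nevertheless, mainstream machine vision research and application systems address almost exclusively vision tasks in air and vacuum, and vision work involving water in its different forms has not been studied systematically. Water-related vision, the concrete embodiment of water-related optics in the field of vision, focuses on two things: the scientific problems of intelligently processing and analyzing water-related visual image signals that arise from the physical interaction between light and water and from cross-medium propagation, and the engineering problems of developing advanced intelligent water-related vision equipment. Starting from the broadly relevant question "Why is the sea blue?", this paper systematically introduces the mechanisms by which water absorbs, scatters and attenuates light, their effects on water-related vision tasks, and existing methods for water-related image processing and analysis. Building on the optical properties of water and the mechanisms of imaging degradation, the paper presents the team's results on key water-related vision technologies and equipment, including water-related imaging and image analysis. The team has successively developed the full-ocean-depth ultra-high-definition camera "Haitong" (海瞳), a full-ocean-depth 3D camera, and a full-ocean-depth high-definition video camera, among others, building a comprehensive and systematic capability for developing underwater observation and analysis equipment that covers color, intensity, polarization and spectrum. This work fills a gap in China's full-ocean-depth optical vision technology and advances the country's water-related vision technology, with significant application value and social benefit.

Keywords: water-related vision; water-related optics; multimodal cognitive computing; machine vision; image and video signal processing; extraterrestrial oceans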

Roberta with Low-Rank Adaptation and Hierarchical Attention for Hallucination Detection in LLMs

Image Processing, Computer Vision and Machine Learning (ICICML), International Conference on

Authors: Jiaxin Lu; Siyue Li. Affiliations: Trine University, Phoenix, USA; Northeastern University, Santa Clara, USA

ISBN: (digital) 9798350355413

ISBN: (print) 9798350355420

The prevalence of hallucinations in responses generated by large language models (LLMs) poses significant challenges for the reliability of natural language processing applications. This study addresses the detection of such hallucinations through an enhanced Roberta-base model, specifically targeting hallucination responses produced by the Mistral 7B Instruct model. By implementing Low-Rank Adaptation (LoRA) for fine-tuning and incorporating hierarchical multi-head attention and multi-level self-attention weighting mechanisms, we aim to improve both the accuracy of hallucination detection and the interpretability of the model’s decisions. Our experimental results demonstrate that the proposed model significantly outperforms baseline models across various metrics, including accuracy, precision, recall, and area under the curve (AUC). Future research directions will explore the integration of larger-scale models and additional fine-tuning techniques to further bolster the model’s capacity for detecting hallucinations, thereby enhancing the reliability of LLM outputs.

Keywords: Measurement; Adaptation models; Computer vision; Accuracy; Attention mechanisms; Large language models; Computational modeling; Image processing; Natural language processing; Reliability
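
Low-Rank Adaptation, which the abstract above uses for fine-tuning, keeps a pretrained weight matrix W frozen and learns only a low-rank update, so the effective weight is W + (alpha / r) * B A, where B is d_out x r and A is r x d_in. The following is a minimal pure-Python sketch of that arithmetic, not the authors' model. The helper names and the toy matrices are hypothetical, and the 768 x 768 layer size with rank 8 is an illustrative assumption rather than a setting reported in the abstract.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the rank (number of rows of A)."""
    r = len(a)
    delta = matmul(b, a)      # d_out x d_in update of rank at most r
    scale = alpha / r
    return [[wij + scale * dij for wij, dij in zip(wrow, drow)]
            for wrow, drow in zip(w, delta)]


def trainable_parameters(d_out, d_in, r):
    """Parameter counts as (full fine-tuning, LoRA) for one weight matrix."""
    return d_out * d_in, r * (d_out + d_in)


# Toy rank-1 update of a 2x2 identity weight (all values hypothetical).
frozen = [[1.0, 0.0], [0.0, 1.0]]
print(lora_effective_weight(frozen, a=[[3.0, 4.0]], b=[[1.0], [2.0]], alpha=2.0))
print(trainable_parameters(768, 768, 8))
```

In the usual formulation B is initialized to zero, so training starts from exactly the pretrained behavior, and only A and B receive gradients.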

Chaos Theory Based Gravitational Search Algorithm For Medical Image Segmentation

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Rather, Sajad Ahmad; Roy, Partha Pratim; Das, Sujit. Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee 247667, India; Department of Computer Science and Engineering, National Institute of Technology, Telangana 506004, India

ISBN: (print) 9783031781032

Multilevel thresholding plays a crucial role in image processing, with extensive applications in object detection, machine vision, medical imaging, and traffic control systems. It entails the partitioning of an image into distinct regions based on optimal pixel values. However, as the number of threshold levels increases, so does the computational cost for segmentation. To address this challenge, a novel method is proposed namely Chaos theory based Gravitational Search Algorithm (CGSA) for multilevel thresholding. CGSA combines the standard Gravitational Search Algorithm (GSA) for exploration with chaotic maps for exploitation of the complex pixel problem space. In this study, Kapur’s entropy method is utilized to segment sample images into various partitions based on optimal pixel values. The effectiveness of CGSA in real-world scenarios is evaluated using COVID-19 chest CT scan imaging datasets from Kaggle database. The quality, symmetry, and consistency of the segmented output are assessed using metrics like Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Feature Similarity Index Measure (FSIM). Qualitative analysis includes convergence curves, segmented graphs, colormap images, and box plots. Statistical validation is conducted using the signed Wilcoxon rank sum test. Additionally, a comparison is made between CGSA’s performance and that of eight state-of-the-art heuristic algorithms. The findings demonstrate the superior performance of CGSA, evidenced by its reduced computational time and enhanced image quality metrics values. Specifically, CGSA achieved SSIM of 0.81, FSIM of 0.82, and PSNR of 24.27, surpassing the performance of other competitive algorithms. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Signal to noise ratio
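
Kapur's entropy method, which the abstract above uses as the thresholding objective, scores a set of thresholds by the sum of the Shannon entropies of the gray-level classes they create, and the best thresholds maximize that sum. The following is a minimal sketch of the objective in its standard textbook form, not the authors' implementation. The exhaustive single-threshold search stands in for the optimizer purely for illustration, since the cost of such a search is what motivates a metaheuristic like CGSA when many thresholds are needed. The function names and the toy histogram are hypothetical.

```python
import math


def kapur_entropy(hist, thresholds):
    """Sum of class entropies for the classes that `thresholds` cut out of `hist`."""
    total = sum(hist)
    probs = [h / total for h in hist]
    # Class k covers gray levels [edges[k], edges[k + 1]).
    edges = [0] + sorted(thresholds) + [len(hist)]
    score = 0.0
    for lo, hi in zip(edges, edges[1:]):
        weight = sum(probs[lo:hi])       # probability mass of this class
        if weight <= 0:
            continue                     # an empty class contributes nothing
        for p in probs[lo:hi]:
            if p > 0:
                score -= (p / weight) * math.log(p / weight)
    return score


def best_single_threshold(hist):
    """Brute-force search over one threshold (illustrative stand-in for an optimizer)."""
    return max(range(1, len(hist)), key=lambda t: kapur_entropy(hist, [t]))


# Hypothetical 4-level histogram with equal counts per level.
print(best_single_threshold([5, 5, 5, 5]))
```

With k thresholds over 256 gray levels the number of candidate combinations grows combinatorially, which is the computational cost the abstract refers to.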

Refiner: Fine-grained Cross-modal Concepts Refinement for Compositional Zero-Shot Learning

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Zhang, Xiao; Jing, Haodong; Chen, Hui; Ma, Yongqiang; Zheng, Nanning. Affiliations: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence; National Engineering Research Center for Visual Information and Applications; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Shaanxi, China

ISBN: (print) 9798350368741

Recent Compositional Zero-Shot Learning (CZSL) methods increasingly adopt the pre-trained vision-language models to capture the contextual relations between image and text spaces. However, the single-class-token design from Transformer-based encoder inevitably captures contextual information from unrelated objects and background, thus hindering the modeling of fine-grained class-specific visual features. Suffering from cross-modal gap, prior methods also struggle to improve compositional recognition performance. To address these issues, we propose a fine-grained cross-modal concepts refinement framework, termed as Refiner, which comprises two pivotal components: (i) the fine-grained concepts refinement of image embeddings to capture state-object context within visual scenes, and (ii) the cross-modal information fusion to mitigate the modality gap. By leveraging learnable query vectors to capture region-specific semantic information pertinent to composition labels, our approach refines visual representations with fine-grained state-object context information. As for cross-modal information fusion, we construct a robust image-to-text mapping by aligning visual embeddings with states, objects, and compositions, respectively. Extensive experiments demonstrate that our Refiner achieves new state-of-the-art performance across all popular benchmarks in both closed- and open-world settings. © 2025 IEEE.

Keywords: Compositional Zero-shot Learning; Cross-Modal Fusion; Fine-Grained Refinement; Multimodal Models

Advanced Techniques for Chinese Image Captioning: Investigating Attention Mechanisms Based on Object Detection for Chinese Image Caption Generation

Machine Learning and Computer Application (ICMLCA), International Conference on

Authors: Yongbin Hua; Pei'ang Li; Xiangjin Zeng; Hong Xu. Affiliation: School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, Hubei, China

ISBN: (digital) 9798331530334

ISBN: (print) 9798331530341

The task of image caption generation aims to automatically produce natural language descriptions that match the content of images, integrating the fields of machine vision and natural language processing, which holds significant theoretical and practical value. Inspired by top-down attention mechanisms, this paper proposes an innovative attention model. Utilizing the output of pretrained object detection networks as prior knowledge for images, the model guides the generation of natural language descriptions. By directly incorporating the results of object detection as attention inputs into the text generation network, the model effectively focuses on key descriptive regions of images, thereby significantly enhancing performance. On public Chinese image captioning datasets, this model demonstrates substantial advantages in metrics such as BLEU-4 and METEOR.

Keywords: Measurement; Knowledge engineering; Attention mechanisms; Computational modeling; Machine vision; Object detection; Computer applications; Data models; Natural language processing; Meteors
