Search Results - Inner Mongolia University Library

10th International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC)

Authors: Lopez de la Rosa, Francisco; Moreno-Salvador, Lucia; Gomez-Sirvent, Jose L.; Morales, Rafael; Sanchez-Reolid, Roberto; Fernandez-Caballero, Antonio. Affiliations: Univ Castilla La Mancha, Inst Invest Informat Albacete, Calle Invest 2, Albacete, Spain; Univ Castilla La Mancha, Dept Ingn Elect Elect Automat & Comunicac, Ave Espana S-N, Albacete, Spain; Univ Castilla La Mancha, Dept Sistemas Informat, Ave Espana S-N, Albacete, Spain

ISBN: (print) 9783031611360; 9783031611377

Convolutional neural networks (CNNs) play an important role in an increasing number of image processing tasks. There is an obvious demand to improve their classification performance and efficiency. Current research in this area tends to focus on developing increasingly complex models and algorithms to achieve this end. However, research into computer vision techniques and data augmentation tends to be neglected. This paper demonstrates that even a very simple CNN model achieves high performance in surface defect classification on the NEU dataset thanks to image preprocessing and data augmentation. The initial F1-score of 0.9646 without image preprocessing increases to 0.9727 when preprocessing is carried out. The simple CNN then achieves an F1-score of 0.9854 after data augmentation.
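For reference, the F1-score quoted above is the harmonic mean of precision and recall. The abstract does not state how the score is averaged across the defect classes, so the sketch below assumes macro averaging; the function name `macro_f1` and the toy integer labels are illustrative and do not come from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 = 2TP / (2TP + FP + FN), averaged with equal class weight.

    Assumption: macro averaging; the abstract does not specify the averaging mode.
    """
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for three hypothetical classes.
    print(round(macro_f1([0, 0, 1, 2], [0, 1, 1, 2]), 4))
```

Macro averaging gives every class the same weight, which is a reasonable default when the classes are of similar size; a weighted average would be the alternative if they are not.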

Keywords: Surface defect classification; image preprocessing; data augmentation; convolutional neural network

Diffusion-Adapter: Text Guided Image Manipulation with Frozen Diffusion Models

32nd International Conference on Artificial Neural Networks (ICANN)

Authors: Wei, Rongting; Fan, Chunxiao; Wu, Yuexin. Affiliations: Beijing Univ Posts & Telecommun, Beijing, Peoples R China

ISBN: (digital) 9783031442100

ISBN: (print) 9783031442094; 9783031442100

Research on vision-language models has seen rapid development, enabling natural language-based processing for image generation and manipulation. Existing text-driven image manipulation is typically implemented by GAN inversion or by fine-tuning diffusion models. The former is limited by the inversion capability of GANs, which fail to reconstruct pictures with novel poses and perspectives. The latter requires expensive optimization for each input, and fine-tuning is still a complex process. To mitigate these problems, we propose a novel approach, dubbed Diffusion-Adapter, which performs text-driven image manipulation using frozen pre-trained diffusion models. In this work, we design an Adapter architecture to modify the target attributes without fine-tuning the pre-trained models. Our approach can be applied to diffusion models in any domain and takes only a few examples to train an Adapter that can successfully edit images from unknown data. Compared with previous work, Diffusion-Adapter preserves a maximal amount of detail from the original image without unintended changes to the input content. Extensive experiments demonstrate the advantages of our approach over competing baselines, and we make a novel attempt at text-driven image manipulation.

Keywords: Diffusion model; image manipulation; Adapter; Multi-modal

Evaluating the Impact of Lossy Compression on ADAS Deep Learning Models using Fisheye Cameras

26th Irish Machine Vision and Image Processing Conference, IMVIP 2024

Authors: Simha, Srinidhi Mukanahallipatna; Molloy, Dara; Fahy, Darren. Affiliations: Valeo Vision Systems, Ireland; University of Galway, Ireland

ISBN: (print) 9781837242672

The increasing deployment of Advanced Driver Assistance Systems (ADAS), alongside the continual rise in camera sensor resolution, has led to high-bandwidth, and generally high-cost, computation and intra-vehicle communication. While the sensor bandwidth impacts the vehicle architecture, it also affects the data collection, storage, deep learning model training, and validation infrastructures. However, if the bandwidth were low while still achieving the goal of high-accuracy ADAS perception, the time and cost associated with creating and deploying the system would be greatly reduced. This study investigates the influence of lossy compression on multi-task deep learning models for real-time perception in ADAS employing fisheye cameras. We leverage a large-scale dataset and train a representative multi-task ADAS perception model for pedestrian, kerb, line, and soiling classification. The testing dataset is subjected to compression using the popular H.264 video codec at varying compression ratios. Through rigorous evaluation, we analyse the effects of compression on model performance, providing insights into the feasibility of employing lossy compression techniques in ADAS applications. Our results reveal that lossy compression could be deployed in automotive perception applications and that a compression ratio of up to 98% (720 Mb/s to 12 Mb/s) could be utilised with negligible performance degradation. © This is an open access article published by the IET under the Creative Commons Attribution License (http://***/licenses/by/3.0/)
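The "up to 98%" figure can be checked from the two bitrates given in the abstract. The short sketch below computes the fraction of bandwidth removed; the function name `bandwidth_reduction` is illustrative, and reading "compression ratio" as a percentage reduction is an interpretation of the abstract's wording.

```python
def bandwidth_reduction(original_mbps: float, compressed_mbps: float) -> float:
    """Fraction of the original bitrate removed by compression (0.0 = none, 1.0 = all)."""
    if original_mbps <= 0:
        raise ValueError("original bitrate must be positive")
    return 1.0 - compressed_mbps / original_mbps


if __name__ == "__main__":
    # Bitrates taken from the abstract: 720 Mb/s before, 12 Mb/s after.
    print(f"{bandwidth_reduction(720.0, 12.0):.1%}")  # 98.3%
```

Expressed the other way, 720 Mb/s to 12 Mb/s is a 60:1 reduction in bitrate.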

Keywords: Advanced driver assistance systems

Automatic Data Processing for Space Robotics Machine Learning

74th International Astronautical Congress, IAC 2023

Authors: Sheppard, Anja; Skinner, Katherine A. Affiliations: Department of Robotics, University of Michigan, 2505 Hayward St, Ann Arbor, MI 48109, United States

Autonomous terrain classification is an important problem in planetary navigation, whether the goal is to identify scientific sites of interest or to traverse treacherous areas safely. Past Martian rovers have relied on human operators to manually identify a navigable path from transmitted imagery. Our goals on Mars in the next few decades will eventually require rovers that can autonomously move farther, faster, and through more dangerous landscapes, demonstrating a need for improved terrain classification for traversability. Autonomous navigation through extreme environments will enable the search for water on the Moon and Mars as well as preparations for human habitats. Advancements in machine learning techniques have demonstrated potential to improve terrain classification capabilities for ground vehicles on Earth. However, classification results for space applications are limited by the availability of training data suitable for supervised learning methods. This paper contributes an open source automatic data processing pipeline that uses camera geometry to co-locate Curiosity and Perseverance Mastcam image products with Mars overhead maps via ray projection over a terrain model. In future work, this automated data processing pipeline will be leveraged for development of machine learning methods for terrain classification. Copyright © 2023 by the International Astronautical Federation (IAF). All rights reserved.
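To illustrate the idea of projecting a camera ray onto terrain, here is a minimal sketch under a strong simplifying assumption: the terrain is a flat horizontal plane at height `ground_z`, whereas the pipeline described above uses a terrain model. The function name, the coordinate convention (z up), and the example numbers are all hypothetical and are not taken from the paper's code.

```python
import numpy as np


def ray_plane_intersection(origin, direction, ground_z=0.0):
    """Intersect a camera ray with the horizontal plane z = ground_z.

    Simplification: flat terrain. Returns the 3-D hit point, or None when
    the ray is parallel to the plane or the hit lies behind the camera.
    """
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    dz = direction[2]
    if abs(dz) < 1e-12:
        return None  # ray never reaches the plane
    t = (ground_z - origin[2]) / dz
    if t <= 0:
        return None  # intersection is behind the camera
    return origin + t * direction


if __name__ == "__main__":
    # Camera 2 m above the ground, looking forward and down at 45 degrees.
    print(ray_plane_intersection([0.0, 0.0, 2.0], [1.0, 0.0, -1.0]))
```

With a real elevation model, the same ray would typically be stepped forward until it drops below the terrain height at its current horizontal position, rather than solved in closed form.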

Keywords: computer vision; geographic information systems; open source; robotics; space

Train Once, Deploy Anywhere: Edge-Guided Single-source Domain Generalization for Medical Image Segmentation

7th International Conference on Medical Imaging with Deep Learning, MIDL 2024

Authors: Jiang, Jun; Gu, Shi. Affiliations: Shenzhen Institute for Advanced Study, UESTC, Shenzhen, China; School of Computer Science and Engineering, UESTC, Chengdu, China

In medical image analysis, unsupervised domain adaptation models require retraining when receiving samples from a new data distribution, and multi-source domain generalization methods might be infeasible when there is only a single source domain. These limitations pose formidable obstacles to model deployment. To this end, we take "Train Once, Deploy Anywhere" as our objective and consider a challenging but practical problem: Single-source Domain Generalization (SDG). We note that (i) in medical image segmentation applications, generalization errors often come from imprecise predictions at the ambiguous boundaries of anatomies, and (ii) the edge of the image is domain-invariant, which can reduce the domain shift between the source and target domains in all network layers. Specifically, we borrow prior knowledge from digital image processing and take the edge of the image as input to enhance the model's attention at the boundaries of anatomies and improve the generalization performance on unknown target domains. Extensive experiments on three typical medical image segmentation datasets, which cover cross-sequence, cross-center, and cross-modality settings with various anatomical structures, demonstrate that our method achieves superior generalization performance compared to the state-of-the-art SDG methods. The code is available at https://***/thinkdifferentor/EGSDG. © 2024 CC-BY 4.0, J.J. & S.G.
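The abstract does not say which edge operator is used, so the following is only a sketch of one classical way to compute an edge map from digital image processing: the Sobel gradient magnitude. The function name `sobel_edge_map`, the choice of Sobel, and the edge-replicating border are assumptions, not the authors' implementation.

```python
import numpy as np


def sobel_edge_map(image):
    """Gradient magnitude of a 2-D grayscale image using 3x3 Sobel kernels.

    Assumption: Sobel is a stand-in for whatever edge extractor the paper uses.
    Borders are handled by replicating the edge pixels.
    """
    img = np.asarray(image, dtype=float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Accumulate the 3x3 neighbourhood one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)


if __name__ == "__main__":
    step = np.zeros((5, 6))
    step[:, 3:] = 1.0  # vertical intensity step between columns 2 and 3
    print(sobel_edge_map(step))
```

An edge map like this could be stacked with, or substituted for, the raw image as network input; how the paper combines the two is not stated in the abstract.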

Keywords: image segmentation

COMPARISON OF EXPLAINABLE AI FOR IMAGE CLASSIFICATION TO HUMAN PERCEPTION: A CASE STUDY OF THREADED FASTENERS

ASME 2024 International Mechanical Engineering Congress and Exposition, IMECE 2024

Authors: Gill, Amaninder Singh; Agarwal, Ankit; Tummala, Vijayanth; Lee, Seung-Jin; Mears, Laine. Affiliations: Faculty of Mechanical Engineering, Centralia College, Centralia, WA, United States; International Center for Automotive Research, Clemson University, Greenville, SC, United States; College of Arts and Science, Embry-Riddle Aeronautical University, Daytona Beach, FL, United States; School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, United States

ISBN: (print) 9780791888605

Machine vision in quality control and sorting applications enhances organizational *** vision systems are coupled with a pre-trained Convolutional Neural Network (CNN) to enhance the capability of the system for classification and identification *** overarching research goal of this study is a) to understand how a CNN decides on classifying threaded fasteners, and b) how well the CNN's decision making compares with that of a *** order to answer the first research question, an image-based fastener identification model augmented with a pre-trained CNN was *** CNN used is called Efficient-Net-b0, which can perform a wide range of image classification *** training set provided to the Efficient-Net-b0 model consisted of labeled images of 12 types of threaded *** data set was enlarged by using image augmentation techniques such as varying the brightness, contrast, and orientation of the captured *** results produced by the CNN classifier were then parsed through Gradient-weighted Class Activation Mapping (Grad-CAM). This technique produces visual explanations of the decisions made by *** is the XAI component of this *** provides transparency in the reasons for the identification and classification done by the Efficient-Net-b0, thereby providing context for the key feature of the threaded fastener that was used to classify and identify *** order to answer the second research question, a user study was *** participants of this study are novice and experienced mechanical engineers enrolled in a Bachelor's and a Master's program at two universities in the United *** aim of this study was to answer three research sub-questions, each of which was compared to the results from the Efficient-Net-b0 as explained by *** three questions are: i) Can human subjects distinguish between fasteners within the same category?, ii) What features do human subjects look at when distinguishing betw

Keywords: Cams
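The record above names Grad-CAM as its explanation technique. As a minimal sketch of the standard Grad-CAM computation (not the authors' code), the function below assumes that the feature maps of a convolutional layer and the gradients of the class score with respect to them have already been extracted from the network as arrays of shape `(K, H, W)`. The function name and the final rescaling to [0, 1] are illustrative choices.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heat map from feature maps and their gradients, both shaped (K, H, W).

    Each channel weight is the spatial mean of that channel's gradient; the map is
    the ReLU of the weighted sum of feature maps. The rescaling to [0, 1] is a
    common visualisation step, not part of the definition.
    """
    acts = np.asarray(activations, dtype=float)
    grads = np.asarray(gradients, dtype=float)
    weights = grads.mean(axis=(1, 2))           # shape (K,)
    cam = np.tensordot(weights, acts, axes=1)   # weighted sum over channels, shape (H, W)
    cam = np.maximum(cam, 0.0)                  # keep only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam


if __name__ == "__main__":
    acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 0.0], [0.0, 1.0]]])
    grads = np.array([np.full((2, 2), 2.0), np.full((2, 2), -1.0)])
    print(grad_cam(acts, grads))
```

In the toy input, the second channel has a negative weight, so its location is suppressed by the ReLU and only the first channel's location remains highlighted.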

Early Vision on the Focal-Plane with High Dynamic Range Pixels

International Workshop on Compressed Sensing Theory and its Applications to Radar, Sonar and Remote Sensing (CoSeRa)

Authors: Marko Jaklin; D. García-Lesta; P. López; V.M. Brea. Affiliations: Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain

ISBN: (digital) 9798350365504

ISBN: (print) 9798350365511

This paper introduces a high dynamic range pixel for early vision processing. Early vision is the first stage to subsequently extract semantic information for image processing or video analytics. This paper proposes to bring said processing to the focal plane, next to a high dynamic range image sensor working on the principle of lateral overflow capacitor. This brings the benefits of processing scenes with a wide dynamic range in a power efficient manner. Circuit simulations for edge detection, as an example of early vision processing conveyed in this paper, show that our proposal meets the accuracy typically found in applications like machine vision. Simulations are in XFAB’s XS018 technology.

Keywords: Image sensors; Accuracy; Power demand; Image edge detection; Visual analytics; Multimodal sensors; Semantics; Radar imaging; High dynamic range; Proposals

A Review of Research on Instance Segmentation Based on Deep Learning

13th International Conference on Computer Engineering and Networks (CENet)

Authors: Yang, Qing; Peng, Jiansheng; Chen, Dunhua. Affiliations: Guangxi Univ Sci & Technol, Coll Automat, Liuzhou 545000, Peoples R China; Hechi Univ, Dept Artificial Intelligence & Mfg, Hechi 547000, Peoples R China

ISBN: (digital) 9789819992430

ISBN: (print) 9789819992454; 9789819992430; 9789819992423

The field of machine vision has witnessed a significant surge in the application of deep learning technology, as researchers increasingly leverage its capabilities in their work. While deep learning has been extensively used in object detection and semantic segmentation, research on deep learning-based instance segmentation has gained significant traction only in recent years. Instance segmentation is the computer vision task that is closest to the real human visual experience and provides a deep understanding of image scenes. Instance segmentation encompasses more than just pixel-level segmentation of various object categories; it also involves the ability to distinguish and separate individual instances within each category. It can be widely applied in fields such as autonomous driving, assisted medical treatment, and remote sensing imaging. This article systematically summarizes some typical instance segmentation models in two parts, two-stage and single-stage, analyzes and compares the advantages and disadvantages of different algorithms, and conducts performance tests on the COCO dataset. This article also provides a brief introduction to the COCO dataset and instance segmentation evaluation indicators. Finally, the possible future development directions and challenges faced by instance segmentation are discussed.
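The abstract mentions instance segmentation evaluation indicators without listing them. A basic building block of such indicators is the intersection over union (IoU) between a predicted and a ground-truth instance mask, which COCO-style average precision then thresholds. The sketch below shows only that building block; the function name `mask_iou` and the convention of returning 0.0 for two empty masks are assumptions.

```python
import numpy as np


def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary instance masks of equal shape.

    Convention (an assumption): two empty masks give an IoU of 0.0.
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    truth = [[1, 0], [1, 0]]
    print(mask_iou(pred, truth))  # one shared pixel out of three covered pixels
```

Because IoU is computed per instance pair, a full evaluation also needs a matching step between predicted and ground-truth instances, which is omitted here.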

Keywords: Deep Learning; Instance Segmentation; Computer vision

SPA 2024 Tutorial

Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)

An optimized automated recognition of infant sign language using enhanced convolution neural network and deep LSTM

MULTIMEDIA TOOLS AND APPLICATIONS, 2023, Vol. 82, No. 18, pp. 28043-28065

Authors: Enireddy, Vamsidhar; Anitha, J.; Mahendra, N.; Kishore, G. Affiliations: Koneru Lakshmaiah Educ Fdn, Dept Comp Sci & Engn, Guntur 522502, Andhra Pradesh, India; Malla Reddy Engn Coll, Dept Comp Sci & Engn, Hyderabad 500100, Telangana, India; Miracle Educ Soc Grp Inst, Miracle City 535216, Andhra Pradesh, India; RISE Krishna Sai Prakasam Grp Inst, Dept CSE, Ongole 523272, Andhra Pradesh, India

In the world, several sign languages (SL) are used, and BSL (Baby Sign Language) is the process of communication between the parents and baby using gestures. Communication by gestures is a non-verbal process that utilizes motion to pass on realities, expressions and feelings to people. SL is the communication mode in which the information is conveyed via movement of body parts like cheeks, eyebrows and head. Even though many research works based on SL are available, research in BSL remains a challenge. Hence, this paper presents an optimization-based automated recognition of the deep BSL system, which determines the gesture signalled by the kids. Initially, the image frames are extracted from the videos and data augmentation processes are performed. After pre-processing, the features are extracted from the frames using the Enhanced Convolution Neural Network (ECNN). The optimal characteristics are then selected by a new Life Choice Based Optimizer (LCBO). Finally, the classification is carried out by the Deep Long Short-Term Memory (DLSTM) scheme. The implementation is performed on the Python platform, and the performances are evaluated using several performance metrics such as accuracy, precision, kappa, f1-score and recall. The performance of the proposed approach (ECNN-DLSTM) is compared with several deep and machine learning approaches and obtains an accuracy of 99% and a kappa of 96%.
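Among the metrics listed above, kappa is the least self-explanatory: Cohen's kappa measures agreement between predicted and true labels after discounting the agreement expected by chance. The abstract does not say which kappa variant was used, so the sketch below assumes the standard unweighted Cohen's kappa; the function name and the toy labels are illustrative.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement, p_e the agreement expected by chance from the
    two label distributions. Assumption: standard unweighted variant.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both label sequences consist of one identical class
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy labels for two hypothetical gesture classes.
    print(cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0]))  # 0.5
```

In the toy example, raw accuracy is 75% but kappa is only 0.5, because half of the agreement would be expected by chance given the label frequencies.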

Keywords: Baby sign language; Automated recognition; Computer vision; Optimization
