检索结果-内蒙古大学图书馆

Systematic Review of Retinal Blood vessels Segmentation Based on AI-driven Technique

JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024年第4期37卷 1783-1799页

作者： verma, Prem Kumari Kaur, Jagdeep Dr B R Ambedkar Natl Inst Technol Dept Comp Sci & Engn Jalandhar 144008 Punjab India

image segmentation is a crucial task in computer vision and image processing, with numerous segmentation algorithms being found in the literature. It has important applications in scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, image compression, among others. In light of this, the widespread popularity of deep learning (DL) and machine learning has inspired the creation of fresh methods for segmenting images using DL and ML models respectively. We offer a thorough analysis of this recent literature, encompassing the range of ground-breaking initiatives in semantic and instance segmentation, including convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid-based methods, recurrent networks, visual attention models, and generative models in adversarial settings. We study the connections, benefits, and importance of various DL- and ML-based segmentation models;look at the most popular datasets;and evaluate results in this Literature.

关键词： Retinal image segmentation machine learning Deep learning

来源：评论

学校读者我要写书评

暂无评论

Feature Map Guided Adapter Network for Object Detection in Low-light Conditions

Feature Map Guided Adapter Network for Object Detection in L...

引用

IEEE International Symposium on Circuits and Systems (ISCAS)

作者： Pang, Cong Zhou, Wei Li, Haoyan Zhang, Xiangyu Lou, Xin ShanghaiTech Univ Sch Informat Sci & Technol Shanghai Peoples R China Minist Educ Key Lab Intelligent Percept & Human Machine Colla Shanghai Peoples R China

ISBN: (纸本)9798350330991;9798350331004

Conventional ISP pipelines and image enhancement methods are designed and optimized for human vision, creating a gap between the requirements of computer and human visions. To bridge the requirement gap, we present a co-design framework in which backend computer vision plays a pivotal role in shaping the proceeding image processing algorithm. It features a pre-processing adapter network, responsible for the restoration and enhancement of RAW images from computer vision perspective, especially in challenging environmental conditions. Specifically, we extract feature maps from the backend vision network, utilizing them as constraints for optimizing the preprocessing adapter network. To validate the effectiveness of our proposed framework, we employ object detection in low-light conditions as the computer vision task, with YOLO-v5 as the backbone. Given the considerable noise in low-light images, we compare our results with state-of-the-art denoising algorithms, showcasing the superior performance of our framework.

关键词： Computer vision image processing co-design raw image perceptual loss

来源：评论

学校读者我要写书评

暂无评论

A Survey on Multimodal Large Language Models in Radiology for Report Generation and visual Question Answering

引用

INFORMATION 2025年第2期16卷 136-136页

作者： Yi, Ziruo Xiao, Ting Albert, Mark v. Univ North Texas Dept Comp Sci & Engn Denton TX 76205 USA Univ North Texas Dept Informat Sci Denton TX 76205 USA

Large language models (LLMs) and large vision models (LvMs) have driven significant advancements in natural language processing (NLP) and computer vision (Cv), establishing a foundation for multimodal large language models (MLLMs) to integrate diverse data types in real-world applications. This survey explores the evolution of MLLMs in radiology, focusing on radiology report generation (RRG) and radiology visual question answering (RvQA), where MLLMs leverage the combined capabilities of LLMs and LvMs to improve clinical efficiency. We begin by tracing the history of radiology and the development of MLLMs, followed by an overview of MLLM applications in RRG and RvQA, detailing core datasets, evaluation metrics, and leading MLLMs that demonstrate their potential in generating radiology reports and answering image-based questions. We then discuss the challenges MLLMs face in radiology, including dataset scarcity, data privacy and security, and issues within MLLMs such as bias, toxicity, hallucinations, catastrophic forgetting, and limitations in traditional evaluation metrics. Finally, this paper proposes future research directions to address these challenges, aiming to help AI researchers and radiologists overcome these obstacles and advance the study of MLLMs in radiology.

关键词： multimodal large language models (MLLMs) large language models (LLMs) large vision models (LvMs) radiology report generation (RRG) radiology visual question answering (RvQA) radiology

来源：评论

学校读者我要写书评

暂无评论

Automated Detection of Cracks in Asphalt Pavement images Using Texture Descriptors and machine Learning Classifier 8th

Automated Detection of Cracks in Asphalt Pavement Images Usi...

引用

8th International Conference on Computer vision and image processing (CvIP)

作者： Rakshitha, R. Srinath, S. Kumar, N. vinay Rashmi, S. Poornima, B. v. JSS Sci & Technol Univ Mysuru India

ISBN: (纸本)9783031581731;9783031581748

The cracks play a major role in deteriorating the transportation infrastructure. The maintenance of the pavement is done by the early diagnosis of crack. The manual approaches for evaluating a pavement is done by the experts which consumes more time and the occasionally produces subjective results. Hence an 2D digital road image is analyzed to detect the crack automatically. The proposed work focuses on the pre-processing the image, extracting the texture feature and classification using LGBM classifier. The texture descriptor explored here are Grey-level Co-occurrence matrix (GLCM), Local binary pattern (LBP), Gabor filter and their respective combinations for extracting the features on the non-overlapping image blocks (80 x 80 pixels) and a Light Gradient Boosting machine (LGBM) algorithm to classify the image block containing a crack or not and then localizing the cracks. The experimentation was performed on the four standard dataset like Road Damage Dataset (RDD)-2018, Road Damage Dataset (RDD)-2019, Road Damage Dataset (RDD)-2020 and Road Damage Dataset (RDD)-2022 considering the different locations and uneven illumination condition. From the study it is figured out that Co-occurrence matrix of the LBP image with Light Gradient Boosting machine (LGBM) Classifier gave good accuracy results. The accuracy for RDD-2018, RDD-2019, RDD 2020 and RDD 2022 dataset are 0.7707, 0.6778, 0.6227 and 0.6051 respectively. This proposed framework was successfully in identifying and localizing the cracks in a irregular texture background or uneven illumination in the pavement image based on the conventional machine learning approach which helps in the easy maintenance of the pavement.

关键词： Crack detection texture descriptor machine learning LGBM

来源：评论

学校读者我要写书评

暂无评论

Word to Sentence visual Semantic Similarity for Caption Generation: Lessons Learned 18

Word to Sentence Visual Semantic Similarity for Caption Gene...

引用

18th International Conference on machine vision and applications (MvA)

作者： Sabir, Ahmed Univ Politecn Cataluna TALP Res Ctr Barcelona Spain

ISBN: (纸本)9784885523434

This paper focuses on enhancing the captions generated by image captioning systems. We propose an approach for improving caption generation systems by choosing the most closely related output to the image rather than the most likely output produced by the model. Our model revises the language generation output beam search from a visual context perspective. We employ a visual semantic measure in a word and sentence level manner to match the proper caption to the related information in the image. This approach can be applied to any caption system as a post-processing method.

关键词： image enhancement

来源：评论

学校读者我要写书评

暂无评论

Recovering image Information from Colored Speckle under RGB Illumination in an Imaging System 7

Recovering Image Information from Colored Speckle under RGB ...

引用

7th International Conference on machine vision and applications (ICMvA)

作者： Liu, Shuang Hanson, Steen G. Takeda, Mitsuo Wang, Wei Xian Technol Univ Sch Optoelect Engn Int Ctr Optic Res & Educ iCORE Xian 710032 Shaanxi Peoples R China Tech Univ Denmark DTU Foton Dept Photon Engn DK-4000 Roskilde Denmark Utsunomiya Univ Ctr Opt Res & Educ CORE 7-1-2 Yoto Utsunomiya Tochigi 3218585 Japan Heriot Watt Univ Sch Engn & Phys Sci Inst Photon & Quantum Sci Edinburgh EH14 4AS Midlothian Scotland

ISBN: (纸本)9798400716553

Speckle field is one of the most information-rich light fields related to plentiful physical characteristics at present that can be used to provide high-resolution surface topography information or applied to image reconstruction, image enhancement and other fields. However, the most study of speckle image recovery for monochromatic wavelength ignores a large amount of real object information. In this paper, the amplitude information of colored speckle recovery in optical imaging is studied, as might be seen if red, green, and blue lasers illuminate a rough surface with different reflectivity at these three wavelengths. We derived the expression for color speckle distribution and designed an imaging system with a pupil stop, normal plane-wave incidence on the diffuser, and a camera to observe the colored speckled image. In order to analyze the simulation experiment, two aspects are studied: phase shift and average speckle size. The results show that more characteristic information is recovered from the colored speckle image.

关键词： Colored speckle image recovery image processing

来源：评论

学校读者我要写书评

暂无评论

Robot Automatic Wire Welding Based on machine vision 5

Robot Automatic Wire Welding Based on Machine Vision

引用

5th International Conference on Computer Engineering and Application (ICCEA)

作者： Liu, Xinsheng Li, Lidan Xing, Hongmei Wang, Shaohua Zhang, Chuan Diao, Xiyao Zhuo, Zhang China North Vehicle Res Inst Beijing Peoples R China

ISBN: (纸本)9798350386783;9798350386776

Currently, the welding process between electrical connectors and multi-core wires mainly relies on manual operation. This traditional method not only consumes a lot of time and manpower, but also long-term operation may cause certain physical burden and health hazards to the operator. Therefore, researching and implementing automated welding between electrical connectors and multi-core wires has become an urgent problem to be solved. On the basis of summarizing the current research status at home and abroad, the software and hardware parts of the system were designed to meet the requirements of identifying and positioning welding circular electrical connectors. By introducing image processing and machine vision technology, adopting a dual machine collaboration approach and based on machine vision methods, automatic wire welding of electrical connectors has been achieved, improving welding efficiency and reducing the labor intensity of operators. In addition, it is also conducive to promoting the development of industrial automation.

关键词： Industrial robot Automatic welding visual image Aviation plug Wires

来源：评论

学校读者我要写书评

暂无评论

Pure vision Transformer (CT-viT) with Noise2Neighbors Interpolation for Low-Dose CT image Denoising

引用

JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024年第5期37卷 2669-2687页

作者： Marcos, Luella Babyn, Paul Alirezaie, Javad Toronto Metropolitan Univ Dept Elect Biomed & Comp Engn 350 Victoria St Toronto ON M5B 2K3 Canada Univ Saskatchewan Dept Med Imaging 105 Adm Pl Saskatoon SK S7N 0W8 Canada

Convolutional neural networks (CNN) have been used for a wide variety of deep learning applications, especially in computer vision. For medical image processing, researchers have identified certain challenges associated with CNNs. These challenges encompass the generation of less informative features, limitations in capturing both high and low-frequency information within feature maps, and the computational cost incurred when enhancing receptive fields by deepening the network. Transformers have emerged as an approach aiming to address and overcome these specific limitations of CNNs in the context of medical image analysis. Preservation of all spatial details of medical images is necessary to ensure accurate patient diagnosis. Hence, this research introduced the use of a pure vision Transformer (viT) for a denoising artificial neural network for medical image processing specifically for low-dose computed tomography (LDCT) image denoising. The proposed model follows a U-Net framework that contains viT modules with the integration of Noise2Neighbor (N2N) interpolation operation. Five different datasets containing LDCT and normal-dose CT (NDCT) image pairs were used to carry out this experiment. To test the efficacy of the proposed model, this experiment includes comparisons between the quantitative and visual results among CNN-based (BM3D, RED-CNN, DRL-E-MP), hybrid CNN-viT-based (TED-Net), and the proposed pure viT-based denoising model. The findings of this study showed that there is about 15-20% increase in SSIM and PSNR when using self-attention transformers than using the typical pure CNN. visual results also showed improvements especially when it comes to showing fine structural details of CT images.

关键词： CT denoising Convolutional neural networks Deep learning Medical image processing machine learning vision transformers

来源：评论

学校读者我要写书评

暂无评论

A survey on deep learning in UAv imagery for precision agriculture and wild flora monitoring: Datasets, models and challenges

引用

SMART AGRICULTURAL TECHNOLOGY 2024年 9卷

作者： Epifani, Lorenzo Caruso, Antonio Palazzo Fiorini Dept Math & Phys Ennio Giorgi Campus Ecotekne I-73100 Lecce Italy

machine learning is the state of the art for many recurring tasks in several heterogeneous domains. In the last decade, it has been also widely used in Precision Agriculture (PA) and Wild Flora Monitoring (WFM) to address a set of problems with a big impact on economy, society and academia, heralding a paradigm shift across the industry and academia. Many applications in those fields involve image processing and computer vision stages. Remote sensing devices are very popular choice for image acquisition in this context, and in particular, Unmanned Aerial vehicles (UAvs) offer a good tradeoff between cost and area coverage. For these reasons, research literature is rich of works that face problems in Precision Agriculture and Wild Flora Monitoring domains with machine learning/computer vision methods applied to UAv imagery. In this work, we review this literature, with a special focus on algorithms, model sizing, dataset characteristics and innovative technical solutions presented in many domain-specific models, providing the reader with an overview of the research trend in recent years.

关键词： machine learning Deep neural networks image analysis Unmanned aerial vehicles Agritech

来源：评论

学校读者我要写书评

暂无评论

LEGIT: TEXT LEGIBILITY FOR USER-GENERATED MEDIA 31

LEGIT: TEXT LEGIBILITY FOR USER-GENERATED MEDIA

引用

2024 International Conference on image processing

作者： Mandal, Maniratnam Birkbeck, Neil Adsumilli, Balu Bovik, Alan C. Univ Texas Austin Austin TX 78712 USA YouTube Mountain View CA USA Google Inc Mountain View CA USA

ISBN: (纸本)9798350349405;9798350349399

User-generated content (UGC) is ubiquitous across the internet as a result of billions of videos and images being uploaded each day. All kinds of UGC media are affected by natural distortions, occurring both during and after capture, which are inherently diverse and commingled. These distortions have different perceptual effects based on the media content. Given recent dramatic increases in the consumption of short-form content, the analysis and control of their perceptual quality has become an important problem. Regardless of the content, many UGC videos have overlaid and embedded texts in them, which are visually salient. Hence text quality has a significant impact on the global perception of video or image quality and needs to be studied. One of the most important factors in perceptual text quality in user-generated media is legibility, which has been studied very little in the context of computer vision. Predicting text legibility can also help in text recognition applications such as image search or document identification. This work aims at modeling text legibility using computer vision techniques and thus studying the relationship between text quality and legibility. We propose a modified dataset variant of COCO-Text [1] and a model for predicting text legibility for both handwritten and machine-generated texts. We also demonstrate how models trained to predict text legibility can help in the prediction of text (perceptual) quality. The dataset and models can be accessed here https://***/research/Quality/***.

关键词： image Quality Assessment Text Quality Text Legibility User-generated Content

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：