Machine vision in low-light conditions is a critical requirement for object detection in road transportation, particularly for assisted and autonomous driving scenarios. Existing vision-based techniques are limited to daylight traffic scenarios due to their reliance on adequate lighting and high frame rates. This paper presents a novel approach to this problem by investigating Vehicle Detection and Localisation (VDL) in extremely low-light conditions using a new machine learning model. Specifically, the proposed model employs two customised generative adversarial networks, based on Pix2PixGAN and CycleGAN, to enhance dark images for input into a YOLOv4-based VDL algorithm. The model's performance is thoroughly analysed and compared against prominent models. Our findings validate that the proposed model detects and localises vehicles accurately in extremely dark images, with an additional run-time of approximately 11 ms and an accuracy improvement of 10%-50% compared to the other models. Moreover, our model demonstrates a 4%-8% increase in Intersection over Union (IoU) at a mean frame rate of 9 fps, which underscores its potential for broader applications in ubiquitous road-object detection. The results demonstrate the significance of the proposed model as an early step towards overcoming the challenges of low-light vision in road-object detection and autonomous driving, paving the way for safer and more efficient transportation systems.
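To make the two-stage idea concrete — enhance the dark frame first, then hand it to a detector — here is a minimal sketch. The gamma-correction enhancer is a crude stand-in for the paper's Pix2PixGAN/CycleGAN models, and the IoU helper merely illustrates the metric quoted above; none of this is the authors' released code.

```python
# Illustrative two-stage low-light pipeline: enhance, then detect.
# The gamma enhancer is a stand-in for the learned GAN enhancement;
# the detector stage is left abstract.
import numpy as np

def enhance_dark_image(image: np.ndarray, gamma: float = 0.4) -> np.ndarray:
    """Brighten a dark uint8 frame; the paper learns this mapping instead."""
    normalized = image.astype(np.float32) / 255.0
    return (np.power(normalized, gamma) * 255.0).astype(np.uint8)

def iou(box_a, box_b) -> float:
    """Intersection over Union of two [x1, y1, x2, y2] boxes, the metric
    behind the abstract's 4%-8% IoU comparison."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

dark_frame = np.random.randint(0, 30, (416, 416, 3), dtype=np.uint8)
bright_frame = enhance_dark_image(dark_frame)  # feed this to the detector
print(iou([0, 0, 10, 10], [5, 5, 15, 15]))     # ~0.143
```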
Recent studies point to an accuracy gap between humans and Artificial Neural Network (ANN) models when classifying blurred images, with humans outperforming ANNs. To bridge this gap, we introduce a spectral channel-ba...
ISBN (Print): 9789819612413; 9789819612420
Image classification is one of the fundamental tasks in computer vision (CV) and has numerous practical applications. Traditionally, machine learning and deep learning methods such as k-Nearest Neighbors (kNN), decision trees, and Convolutional Neural Networks (CNN) have been widely used to perform this task. However, with the recent emergence of large language models (LLMs) such as Generative Pre-trained Transformers (GPT), originally designed for natural language processing, their cross-domain applications, including in CV, are now being explored. In this paper, we investigate the capabilities of GPT-4o, a variant of the GPT model, for image classification on the Fashion-MNIST dataset. By using carefully designed prompts, we evaluate GPT-4o's performance and compare it with more traditional models. Our study offers insights into the cross-domain potential of GPT models, explores how prompt engineering can enhance GPT's performance on image classification tasks, and suggests new avenues for developing more flexible and adaptable multimodal LLM systems. The code can be found at https://***/Tanghaha1424/gpt-fashionmnist.
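As a hedged illustration of the prompt-based setup described above, the sketch below sends one Fashion-MNIST image to a multimodal GPT model with a label-constrained prompt via the OpenAI Python client; the prompt wording is an assumption, not the paper's.

```python
# Minimal sketch (not the authors' released code) of prompting GPT-4o to
# classify a Fashion-MNIST image. Assumes the `openai` Python client and
# an OPENAI_API_KEY in the environment.
import base64
from openai import OpenAI

LABELS = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def classify(image_path: str) -> str:
    client = OpenAI()
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    # Constraining the answer to the known label set is one simple form
    # of the prompt engineering the abstract refers to.
    prompt = ("This is a 28x28 grayscale Fashion-MNIST image. "
              f"Answer with exactly one label from: {', '.join(LABELS)}.")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}],
    )
    return response.choices[0].message.content.strip()
```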
ISBN (Print): 9783031683015; 9783031683022
The transition to Industry 4.0 intensifies the demand for advanced manufacturing techniques and efficient data processing capabilities. A notable challenge in engineering is that many older engineering drawings are only available in paper form, creating significant barriers for modern automated systems. This study tackles these challenges by employing advanced deep-learning techniques alongside traditional image processing to convert legacy engineering drawings into structured, machine-readable formats. Following this digitization process, the multi-modal approach further processes drawings containing large amounts of heterogeneous data by filtering non-essential details to isolate and extract critical features. This enables the conversion of complex drawings into formats suitable for computer vision and deep learning applications. The resulting structured datasets are then used to significantly enhance the efficiency of automated processes. For instance, they enable more efficient pick-and-place operations by providing the data necessary for machine-learning-driven automation.
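The abstract does not publish its pipeline, but a classical preprocessing stage of the kind it combines with deep learning might look like the following OpenCV sketch: binarize the scan, strip speckle noise, and keep large contours as candidate features. All thresholds are placeholder assumptions.

```python
# Sketch of classical preprocessing for a scanned paper drawing.
import cv2
import numpy as np

def extract_candidate_features(path: str, min_area: float = 500.0):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Adaptive thresholding copes with uneven lighting on paper scans.
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 31, 10)
    # Morphological opening removes isolated speckles.
    kernel = np.ones((3, 3), np.uint8)
    clean = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(clean, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # "Filter non-essential details": keep only sufficiently large shapes.
    return [c for c in contours if cv2.contourArea(c) >= min_area]
```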
Retinal fundus imaging plays a crucial role in the diagnosis of ophthalmic diseases such as glaucoma, a significant cause of vision loss worldwide. Accurate detection of glaucoma using image processing, machine learni...
Image captioning is an emergent topic of research in the domain of artificial intelligence (AI). It utilizes an integration of Computer Vision (CV) and Natural Language Processing (NLP) for generating image captions. It finds use in several application areas, namely recommendation in editing applications, utilization in virtual assistance, etc. The development of NLP and deep learning (DL) models has proven useful for deriving a bridge between visual details and textual descriptions. In this view, this paper introduces an Oppositional Harris Hawks Optimization with Deep Learning based Image Captioning (OHHO-DLIC) technique. The OHHO-DLIC technique involves the design of distinct levels of operations. First, the feature extraction of the images is carried out by the use of the EfficientNet model. Then, the image captioning is performed by a bidirectional long short-term memory (BiLSTM) model, comprising an encoder as well as a decoder. At last, the oppositional Harris Hawks optimization (OHHO) based hyperparameter tuning process is performed for effectively adjusting the hyperparameters of the EfficientNet and BiLSTM models. The experimental analysis of the OHHO-DLIC technique is carried out on the Flickr8k dataset, and a comprehensive comparative analysis highlighted its better performance over recent approaches.
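A structural sketch of the described encoder-decoder pairing is given below (PyTorch assumed; the OHHO hyperparameter search is omitted, and all layer sizes are placeholder choices rather than the paper's tuned values).

```python
# Skeleton of an EfficientNet encoder feeding a BiLSTM caption decoder.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class CaptionModel(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden: int = 512):
        super().__init__()
        backbone = efficientnet_b0(weights=None)  # encoder: image features
        self.encoder = nn.Sequential(backbone.features, nn.AdaptiveAvgPool2d(1))
        self.project = nn.Linear(1280, embed_dim)  # 1280 = B0 feature width
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional LSTM over the image-prefixed token sequence.
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, images, tokens):
        feats = self.project(self.encoder(images).flatten(1)).unsqueeze(1)
        x = torch.cat([feats, self.embed(tokens)], dim=1)  # prepend image
        hidden, _ = self.bilstm(x)
        return self.out(hidden[:, 1:, :])  # logits per caption position
```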
The control of the froth flotation process in the mineral industry is a challenging task due to its multiple impacting parameters. Accurate and convenient examination of the concentrate grade is a crucial step in realizing effective and real-time control of the flotation process. The goal of this study is to employ image processing techniques and CNN-based feature extraction, combined with machine learning and deep learning, to predict the elemental composition of minerals in the flotation froth. A real-world dataset was collected and preprocessed from a differential flotation circuit at the industrial flotation site in Guemassa, Morocco. Using image-processing algorithms, the features extracted from the flotation froth include texture, bubble size, velocity, and color distribution. To predict the mineral concentrate grades, our study includes several supervised machine learning (ML) algorithms, artificial neural networks (ANN), and convolutional neural networks (CNN). The industrial experimental evaluations revealed relevant performance, with an accuracy of up to 0.94. Furthermore, our proposed hybrid method was evaluated in a real flotation process for the Zn, Pb, Fe, and Cu concentrate grades, with a precision error of less than 4.53. These results demonstrate the significant potential of our proposed online analyzer as an artificial intelligence application in the field of complex polymetallic flotation circuits (Pb, Fe, Cu, Zn).
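The following simplified sketch illustrates the hybrid idea: hand-crafted froth-image descriptors feeding a supervised regressor that predicts the four concentrate grades. The feature definitions and the RandomForest model are illustrative assumptions, not the study's analyzer.

```python
# Toy version of "image features -> grade prediction" with synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def froth_features(image: np.ndarray) -> np.ndarray:
    """image: HxWx3 uint8 frame. Returns crude color/texture descriptors."""
    color_means = image.reshape(-1, 3).mean(axis=0)    # color distribution
    gray = image.mean(axis=2)
    texture = np.abs(np.diff(gray, axis=1)).mean()     # horizontal contrast
    return np.concatenate([color_means, [texture, gray.std()]])

# Placeholder (frame, assayed Zn/Pb/Fe/Cu grade) pairs stand in for the
# real industrial dataset:
frames = [np.random.randint(0, 255, (64, 64, 3), np.uint8) for _ in range(100)]
grades = np.random.rand(100, 4)
X = np.stack([froth_features(f) for f in frames])
model = RandomForestRegressor(n_estimators=200).fit(X, grades)
```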
ISBN (Print): 9798350323726
Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pretrained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge. We train a conditional diffusion model to generate the audio track of a video, given a video frame encoded by a pretrained contrastive language-image pretraining (CLIP) model. At test time, we first explore performing a zero-shot modality transfer and condition the diffusion model with a CLIP-encoded text query. However, we observe a noticeable performance drop with respect to image queries. To close this gap, we further adopt a pretrained diffusion prior model to generate a CLIP image embedding given a CLIP text embedding. Our results show the effectiveness of the proposed method, and that the pretrained diffusion prior can reduce the modality transfer gap. While we focus on text-to-audio synthesis, the proposed model can also generate audio from image queries, and it shows competitive performance against a state-of-the-art image-to-audio synthesis model in a subjective listening test. This study offers a new direction of approaching text-to-audio synthesis that leverages the naturally occurring audio-visual correspondence in videos and the power of pretrained language-vision models.
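The conditioning logic can be sketched with an off-the-shelf CLIP model (Hugging Face transformers assumed, not the authors' code): train-time conditions come from CLIP image embeddings of video frames, while test-time queries use CLIP text embeddings, optionally mapped through a diffusion prior to narrow the modality gap.

```python
# CLIP as the bridge between modalities: the diffusion model itself is
# omitted; only the two conditioning paths are shown.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def image_condition(frame: Image.Image) -> torch.Tensor:
    """Training-time condition: CLIP embedding of a video frame."""
    inputs = proc(images=frame, return_tensors="pt")
    return clip.get_image_features(**inputs)

@torch.no_grad()
def text_condition(query: str) -> torch.Tensor:
    """Zero-shot test-time condition: CLIP embedding of a text query."""
    inputs = proc(text=[query], return_tensors="pt", padding=True)
    return clip.get_text_features(**inputs)

frame = Image.new("RGB", (224, 224))     # placeholder video frame
print(image_condition(frame).shape)      # (1, 512) for ViT-B/32
print(text_condition("a dog barking").shape)
```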
Independent adversarial sample detection is an important problem in the field of computer vision and machine learning, especially in the context of the widespread use of deep learning models. This can lead to misclass...
Instance segmentation, an important image processing operation for automation in agriculture, is used to precisely delineate individual objects of interest within images, which provides foundational information for various automated or robotic tasks such as selective harvesting and precision thinning. This study compares the one-stage YOLOv8 and the two-stage Mask R-CNN machine learning models for instance segmentation under varying orchard conditions across two datasets. Dataset 1, collected in the dormant season, includes images of dormant apple trees, which were used to train multi-object segmentation models delineating tree branches and trunks. Dataset 2, collected in the early growing season, includes images of apple tree canopies with green foliage and immature (green) apples (also called fruitlets), which were used to train single-object segmentation models delineating only immature green apples. The results showed that YOLOv8 performed better than Mask R-CNN, achieving good precision and near-perfect recall across both datasets at a confidence threshold of ***. Specifically, for Dataset 1, YOLOv8 achieved a precision of 0.90 and a recall of 0.95 for all classes. In comparison, Mask R-CNN demonstrated a precision of 0.81 and a recall of 0.81 for the same dataset. For Dataset 2, YOLOv8 achieved a precision of 0.93 and a recall of ***. Mask R-CNN, in this single-class scenario, achieved a precision of 0.85 and a recall of ***. Additionally, the inference times for YOLOv8 were 10.9 ms for multi-class segmentation (Dataset 1) and 7.8 ms for single-class segmentation (Dataset 2), compared to 15.6 ms and 12.8 ms achieved by Mask R-CNN, respectively. These findings show YOLOv8's superior accuracy and efficiency compared to two-stage models, specifically Mask R-CNN, which suggests its suitability for developing smart and automated orchard operations, particularly when real-time performance is necessary, as in robotic harvesting and robotic immature green fruit thinning.
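For reference, running one-stage instance segmentation with the public Ultralytics YOLOv8 API takes only a few lines; the checkpoint, image path, and confidence value below are generic assumptions, not the study's trained orchard models.

```python
# Minimal YOLOv8 instance-segmentation inference with Ultralytics.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")               # pretrained segmentation weights
results = model.predict("orchard.jpg", conf=0.5)  # placeholder image path
for r in results:
    if r.masks is not None:
        print(f"{len(r.masks)} instance masks, boxes: {r.boxes.xyxy.shape}")
```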