Machine vision in low-light conditions is a critical requirement for object detection in road transportation, particularly for assisted and autonomous driving scenarios. Existing vision-based techniques are limited to daylight traffic scenarios due to their reliance on adequate lighting and high frame rates. This paper presents a novel approach to this problem by investigating Vehicle Detection and Localisation (VDL) in extremely low-light conditions using a new machine learning model. Specifically, the proposed model employs two customised generative adversarial networks, based on Pix2PixGAN and CycleGAN, to enhance dark images for input into a YOLOv4-based VDL algorithm. The model's performance is thoroughly analysed and compared against prominent models. Our findings validate that the proposed model detects and localises vehicles accurately in extremely dark images, with an additional run-time of approximately 11 ms and an accuracy improvement of 10%-50% compared to the other models. Moreover, our model demonstrates a 4%-8% increase in Intersection over Union (IoU) at a mean frame rate of 9 fps, which underscores its potential for broader applications in ubiquitous road-object detection. The results demonstrate the significance of the proposed model as an early step towards overcoming the challenges of low-light vision in road-object detection and autonomous driving, paving the way for safer and more efficient transportation systems.
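To make the two-stage idea concrete — enhance the dark frame first, then hand it to a detector — here is a minimal sketch. The gamma-correction enhancer is a crude stand-in for the paper's Pix2PixGAN/CycleGAN models, and the IoU helper merely illustrates the metric quoted above; none of this is the authors' released code.

```python
# Illustrative two-stage low-light pipeline: enhance, then detect.
# The gamma enhancer is a stand-in for the learned GAN enhancement;
# the detector stage is left abstract.
import numpy as np

def enhance_dark_image(image: np.ndarray, gamma: float = 0.4) -> np.ndarray:
    """Brighten a dark uint8 frame; the paper learns this mapping instead."""
    normalized = image.astype(np.float32) / 255.0
    return (np.power(normalized, gamma) * 255.0).astype(np.uint8)

def iou(box_a, box_b) -> float:
    """Intersection over Union of two [x1, y1, x2, y2] boxes, the metric
    behind the abstract's 4%-8% IoU comparison."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

dark_frame = np.random.randint(0, 30, (416, 416, 3), dtype=np.uint8)
bright_frame = enhance_dark_image(dark_frame)  # feed this to the detector
print(iou([0, 0, 10, 10], [5, 5, 15, 15]))     # ~0.143
```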
Recent studies point to an accuracy gap between humans and Artificial Neural Network (ANN) models when classifying blurred images, with humans outperforming ANNs. To bridge this gap, we introduce a spectral channel-ba...
ISBN (Print): 9789819612413; 9789819612420
Image classification is one of the fundamental tasks in computer vision (CV) and has numerous practical applications. Traditionally, machine learning and deep learning methods such as k-Nearest Neighbors (kNN), decision trees, and Convolutional Neural Networks (CNN) have been widely used to perform this task. However, with the recent emergence of large language models (LLMs) such as Generative Pre-trained Transformers (GPT), originally designed for natural language processing, their cross-domain applications, including in CV, are now being explored. In this paper, we investigate the capabilities of GPT-4o, a variant of the GPT model, for image classification on the Fashion-MNIST dataset. By using carefully designed prompts, we evaluate GPT-4o's performance and compare it with more traditional models. Our study offers insights into the cross-domain potential of GPT models, explores how prompt engineering can enhance GPT's performance on image classification tasks, and suggests new avenues for developing more flexible and adaptable multimodal LLM systems. The code can be found at https://***/Tanghaha1424/gpt-fashionmnist.
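As a hedged illustration of the prompt-based setup described above, the sketch below sends one Fashion-MNIST image to a multimodal GPT model with a label-constrained prompt via the OpenAI Python client; the prompt wording is an assumption, not the paper's.

```python
# Minimal sketch (not the authors' released code) of prompting GPT-4o to
# classify a Fashion-MNIST image. Assumes the `openai` Python client and
# an OPENAI_API_KEY in the environment.
import base64
from openai import OpenAI

LABELS = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def classify(image_path: str) -> str:
    client = OpenAI()
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    # Constraining the answer to the known label set is one simple form
    # of the prompt engineering the abstract refers to.
    prompt = ("This is a 28x28 grayscale Fashion-MNIST image. "
              f"Answer with exactly one label from: {', '.join(LABELS)}.")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}],
    )
    return response.choices[0].message.content.strip()
```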
ISBN (Print): 9783031683015; 9783031683022
The transition to Industry 4.0 intensifies the demand for advanced manufacturing techniques and efficient data processing capabilities. A notable challenge in engineering is that many older engineering drawings are only available in paper form, creating significant barriers for modern automated systems. This study tackles these challenges by employing advanced deep-learning techniques alongside traditional image processing to convert legacy engineering drawings into structured, machine-readable formats. Following this digitization process, the multi-modal approach further processes drawings containing large amounts of heterogeneous data by filtering non-essential details to isolate and extract critical features. This enables the conversion of complex drawings into formats suitable for computer vision and deep learning applications. The resulting structured datasets are then used to significantly enhance the efficiency of automated processes. For instance, they enable more efficient pick-and-place operations by providing the data necessary for machine-learning-driven automation.
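The abstract does not publish its pipeline, but a classical preprocessing stage of the kind it combines with deep learning might look like the following OpenCV sketch: binarize the scan, strip speckle noise, and keep large contours as candidate features. All thresholds are placeholder assumptions.

```python
# Sketch of classical preprocessing for a scanned paper drawing.
import cv2
import numpy as np

def extract_candidate_features(path: str, min_area: float = 500.0):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Adaptive thresholding copes with uneven lighting on paper scans.
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 31, 10)
    # Morphological opening removes isolated speckles.
    kernel = np.ones((3, 3), np.uint8)
    clean = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(clean, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # "Filter non-essential details": keep only sufficiently large shapes.
    return [c for c in contours if cv2.contourArea(c) >= min_area]
```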
Retinal fundus imaging plays a crucial role in the diagnosis of ophthalmic diseases such as glaucoma, a significant cause of vision loss worldwide. Accurate detection of glaucoma using image processing, machine learni...
Image captioning is an emergent topic of research in the domain of artificial intelligence (AI). It utilizes an integration of Computer Vision (CV) and Natural Language Processing (NLP) for generating image captions. It finds use in several application areas, namely recommendation in editing applications, utilization in virtual assistance, etc. The development of NLP and deep learning (DL) models has proven useful for deriving a bridge between visual details and textual descriptions. In this view, this paper introduces an Oppositional Harris Hawks Optimization with Deep Learning based Image Captioning (OHHO-DLIC) technique. The OHHO-DLIC technique involves the design of distinct levels of operations. First, the feature extraction of the images is carried out by the use of the EfficientNet model. Then, the image captioning is performed by a bidirectional long short-term memory (BiLSTM) model, comprising an encoder as well as a decoder. At last, the oppositional Harris Hawks optimization (OHHO) based hyperparameter tuning process is performed for effectively adjusting the hyperparameters of the EfficientNet and BiLSTM models. The experimental analysis of the OHHO-DLIC technique is carried out on the Flickr8k dataset, and a comprehensive comparative analysis highlighted its better performance over recent approaches.
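A structural sketch of the described encoder-decoder pairing is given below (PyTorch assumed; the OHHO hyperparameter search is omitted, and all layer sizes are placeholder choices rather than the paper's tuned values).

```python
# Skeleton of an EfficientNet encoder feeding a BiLSTM caption decoder.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class CaptionModel(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden: int = 512):
        super().__init__()
        backbone = efficientnet_b0(weights=None)  # encoder: image features
        self.encoder = nn.Sequential(backbone.features, nn.AdaptiveAvgPool2d(1))
        self.project = nn.Linear(1280, embed_dim)  # 1280 = B0 feature width
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional LSTM over the image-prefixed token sequence.
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, images, tokens):
        feats = self.project(self.encoder(images).flatten(1)).unsqueeze(1)
        x = torch.cat([feats, self.embed(tokens)], dim=1)  # prepend image
        hidden, _ = self.bilstm(x)
        return self.out(hidden[:, 1:, :])  # logits per caption position
```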
The control of the froth flotation process in the mineral industry is a challenging task due to its multiple impacting parameters. Accurate and convenient examination of the concentrate grade is a crucial step in realizing effective and real-time control of the flotation process. The goal of this study is to employ image processing techniques and CNN-based feature extraction, combined with machine learning and deep learning, to predict the elemental composition of minerals in the flotation froth. A real-world dataset was collected and preprocessed from a differential flotation circuit at the industrial flotation site in Guemassa, Morocco. Using image-processing algorithms, the features extracted from the flotation froth include texture, bubble size, velocity, and color distribution. To predict the mineral concentrate grades, our study includes several supervised machine learning (ML) algorithms, artificial neural networks (ANN), and convolutional neural networks (CNN). The industrial experimental evaluations revealed relevant performance, with an accuracy of up to 0.94. Furthermore, our proposed hybrid method was evaluated in a real flotation process for the Zn, Pb, Fe, and Cu concentrate grades, with a precision error of less than 4.53. These results demonstrate the significant potential of our proposed online analyzer as an artificial intelligence application in the field of complex polymetallic flotation circuits (Pb, Fe, Cu, Zn).
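The following simplified sketch illustrates the hybrid idea: hand-crafted froth-image descriptors feeding a supervised regressor that predicts the four concentrate grades. The feature definitions and the RandomForest model are illustrative assumptions, not the study's analyzer.

```python
# Toy version of "image features -> grade prediction" with synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def froth_features(image: np.ndarray) -> np.ndarray:
    """image: HxWx3 uint8 frame. Returns crude color/texture descriptors."""
    color_means = image.reshape(-1, 3).mean(axis=0)    # color distribution
    gray = image.mean(axis=2)
    texture = np.abs(np.diff(gray, axis=1)).mean()     # horizontal contrast
    return np.concatenate([color_means, [texture, gray.std()]])

# Placeholder (frame, assayed Zn/Pb/Fe/Cu grade) pairs stand in for the
# real industrial dataset:
frames = [np.random.randint(0, 255, (64, 64, 3), np.uint8) for _ in range(100)]
grades = np.random.rand(100, 4)
X = np.stack([froth_features(f) for f in frames])
model = RandomForestRegressor(n_estimators=200).fit(X, grades)
```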
ISBN (Print): 9798350323726
Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pretrained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge. We train a conditional diffusion model to generate the audio track of a video, given a video frame encoded by a pretrained contrastive language-image pretraining (CLIP) model. At test time, we first explore performing a zero-shot modality transfer and condition the diffusion model with a CLIP-encoded text query. However, we observe a noticeable performance drop with respect to image queries. To close this gap, we further adopt a pretrained diffusion prior model to generate a CLIP image embedding given a CLIP text embedding. Our results show the effectiveness of the proposed method, and that the pretrained diffusion prior can reduce the modality transfer gap. While we focus on text-to-audio synthesis, the proposed model can also generate audio from image queries, and it shows competitive performance against a state-of-the-art image-to-audio synthesis model in a subjective listening test. This study offers a new direction of approaching text-to-audio synthesis that leverages the naturally occurring audio-visual correspondence in videos and the power of pretrained language-vision models.
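The conditioning logic can be sketched with an off-the-shelf CLIP model (Hugging Face transformers assumed, not the authors' code): train-time conditions come from CLIP image embeddings of video frames, while test-time queries use CLIP text embeddings, optionally mapped through a diffusion prior to narrow the modality gap.

```python
# CLIP as the bridge between modalities: the diffusion model itself is
# omitted; only the two conditioning paths are shown.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def image_condition(frame: Image.Image) -> torch.Tensor:
    """Training-time condition: CLIP embedding of a video frame."""
    inputs = proc(images=frame, return_tensors="pt")
    return clip.get_image_features(**inputs)

@torch.no_grad()
def text_condition(query: str) -> torch.Tensor:
    """Zero-shot test-time condition: CLIP embedding of a text query."""
    inputs = proc(text=[query], return_tensors="pt", padding=True)
    return clip.get_text_features(**inputs)

frame = Image.new("RGB", (224, 224))     # placeholder video frame
print(image_condition(frame).shape)      # (1, 512) for ViT-B/32
print(text_condition("a dog barking").shape)
```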
Independent adversarial sample detection is an important problem in the field of computer vision and machine learning, especially in the context of the widespread use of deep learning models. This can lead to misclass...
Instance segmentation, an important image processing operation for automation in agriculture, is used to precisely delineate individual objects of interest within images, which provides foundational information for various automated or robotic tasks such as selective harvesting and precision thinning. This study compares the one-stage YOLOv8 and the two-stage Mask R-CNN machine learning models for instance segmentation under varying orchard conditions across two datasets. Dataset 1, collected in the dormant season, includes images of dormant apple trees, which were used to train multi-object segmentation models delineating tree branches and trunks. Dataset 2, collected in the early growing season, includes images of apple tree canopies with green foliage and immature (green) apples (also called fruitlets), which were used to train single-object segmentation models delineating only immature green apples. The results showed that YOLOv8 performed better than Mask R-CNN, achieving good precision and near-perfect recall across both datasets at a confidence threshold of ***. Specifically, for Dataset 1, YOLOv8 achieved a precision of 0.90 and a recall of 0.95 for all classes. In comparison, Mask R-CNN demonstrated a precision of 0.81 and a recall of 0.81 for the same dataset. For Dataset 2, YOLOv8 achieved a precision of 0.93 and a recall of ***. Mask R-CNN, in this single-class scenario, achieved a precision of 0.85 and a recall of ***. Additionally, the inference times for YOLOv8 were 10.9 ms for multi-class segmentation (Dataset 1) and 7.8 ms for single-class segmentation (Dataset 2), compared to 15.6 ms and 12.8 ms achieved by Mask R-CNN, respectively. These findings show YOLOv8's superior accuracy and efficiency compared to two-stage models, specifically Mask R-CNN, which suggests its suitability for developing smart and automated orchard operations, particularly when real-time performance is necessary, as in robotic harvesting and robotic immature green fruit thinning.
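For reference, running one-stage instance segmentation with the public Ultralytics YOLOv8 API takes only a few lines; the checkpoint, image path, and confidence value below are generic assumptions, not the study's trained orchard models.

```python
# Minimal YOLOv8 instance-segmentation inference with Ultralytics.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")               # pretrained segmentation weights
results = model.predict("orchard.jpg", conf=0.5)  # placeholder image path
for r in results:
    if r.masks is not None:
        print(f"{len(r.masks)} instance masks, boxes: {r.boxes.xyxy.shape}")
```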