In-sensor reservoir computing has recently gained considerable attention due to its highly efficient training process and advanced integration of sensing, storage, and processing functionalities. These advancements greatly enhance machine-vision capabilities by reducing data latency and energy overheads. However, the development of a highly efficient and low-cost in-sensor reservoir computing system remains a challenging task, primarily due to the lack of suitable materials and processes. In this letter, we present a simple ITO/NiOx/Au two-terminal photomemristor fabricated entirely by physical vapor deposition (PVD) at room temperature without further treatment. This photomemristor leverages light-triggered dynamics to map input signals into a high-dimensional space and extract hidden information. As a proof of concept, we demonstrate an in-sensor reservoir computing system based on the photomemristor. Experimental results indicate that the system achieves an impressive accuracy of 90.88% on an image classification task and a low normalized root mean squared error (NRMSE) of 0.0082 on a time-series prediction task. This work broadens the spectrum of applications of NiOx in in-sensor neuromorphic computing.
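Since only the linear readout of a reservoir computer is trained, the training step admits a compact sketch. The snippet below is a minimal, hypothetical illustration: synthetic vectors stand in for the measured photomemristor states, and the ridge regularization and range-normalized NRMSE are assumed choices rather than the paper's settings.

```python
# Minimal reservoir-computing readout sketch (hypothetical data and settings).
# The physical photomemristor provides the fixed nonlinear mapping; here,
# random vectors stand in for the measured high-dimensional device states.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_states = 500, 64

reservoir_states = rng.normal(size=(n_samples, n_states))  # device responses
targets = rng.normal(size=n_samples)  # e.g., next value of a time series

# Only the linear readout is trained: closed-form ridge regression.
lam = 1e-3  # regularization strength (assumed)
w = np.linalg.solve(
    reservoir_states.T @ reservoir_states + lam * np.eye(n_states),
    reservoir_states.T @ targets,
)

pred = reservoir_states @ w
nrmse = np.sqrt(np.mean((pred - targets) ** 2)) / (targets.max() - targets.min())
print(f"NRMSE: {nrmse:.4f}")
```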
Machine vision in low-light conditions is a critical requirement for object detection in road transportation, particularly for assisted and autonomous driving scenarios. Existing vision-based techniques are limited to daylight traffic scenarios due to their reliance on adequate lighting and high frame rates. This paper presents a novel approach to this problem by investigating Vehicle Detection and Localisation (VDL) in extremely low-light conditions using a new machine learning model. Specifically, the proposed model employs two customised generative adversarial networks, based on Pix2PixGAN and CycleGAN, to enhance dark images before input into a YOLOv4-based VDL algorithm. The model's performance is thoroughly analysed and compared against prominent models. Our findings validate that the proposed model detects and localises vehicles accurately in extremely dark images, with an additional run-time of approximately 11 ms and an accuracy improvement of 10%-50% compared to the other models. Moreover, our model demonstrates a 4%-8% increase in Intersection over Union (IoU) at a mean frame rate of 9 fps, which underscores its potential for broader applications in ubiquitous road-object detection. The results demonstrate the significance of the proposed model as an early step toward overcoming the challenges of low-light vision in road-object detection and autonomous driving, paving the way for safer and more efficient transportation systems.
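For context on the IoU figures quoted above, the metric reduces to a few lines for axis-aligned boxes. The helper below is a generic implementation for boxes in (x1, y1, x2, y2) form, not the paper's evaluation code.

```python
# Intersection over Union (IoU) for axis-aligned boxes in (x1, y1, x2, y2) form.
def iou(box_a, box_b):
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    # Width/height of the overlap rectangle; zero if the boxes are disjoint.
    iw = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    ih = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = iw * ih
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    return inter / union if union > 0 else 0.0

print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # ~0.1429
```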
Recent studies point to an accuracy gap between humans and Artificial Neural Network (ANN) models when classifying blurred images, with humans outperforming ANNs. To bridge this gap, we introduce a spectral channel-ba...
ISBN (print): 9789819612413; 9789819612420
Image classification is one of the fundamental tasks in computer vision (CV) and has numerous practical applications. Traditionally, machine learning and deep learning methods such as k-Nearest Neighbors (kNN), decision trees, and convolutional neural networks (CNN) have been widely used to perform this task. However, with the recent emergence of large language models (LLMs), such as Generative Pre-trained Transformers (GPT), originally designed for natural language processing, their cross-domain applications, including in CV, are now being explored. In this paper, we investigate the capabilities of GPT-4o, a variant of the GPT model, for image classification on the Fashion-MNIST dataset. Using carefully designed prompts, we evaluate GPT-4o's performance and compare it with more traditional models. Our study offers insights into the cross-domain potential of GPT models, explores how prompt engineering can enhance GPT's performance on image classification tasks, and suggests new avenues for developing more flexible and adaptable multimodal LLM systems. The code can be found at https://***/Tanghaha1424/gpt-fashionmnist.
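A prompt-based classification call of this kind can be sketched as follows. The prompt wording and image handling here are illustrative assumptions (the paper's actual prompts live in its repository), while the client calls follow the public OpenAI chat-completions API.

```python
# Hedged sketch of prompting GPT-4o to classify a Fashion-MNIST image.
import base64
from openai import OpenAI  # pip install openai

LABELS = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("sample.png", "rb") as f:  # a Fashion-MNIST image saved to disk
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Classify this grayscale clothing image into exactly one of: "
                     + ", ".join(LABELS) + ". Answer with the label only."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```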
Retinal fundus imaging plays a crucial role in the diagnosis of ophthalmic diseases such as glaucoma, a significant cause of vision loss worldwide. Accurate detection of glaucoma using image processing, machine learni...
Image captioning is an emergent topic of research in the domain of artificial intelligence (AI). It utilizes an integration of computer vision (CV) and natural language processing (NLP) to generate image captions, and it finds use in several application areas, namely recommendation in editing applications and utilization in virtual assistance. The development of NLP and deep learning (DL) models is useful for deriving a bridge between visual details and textual concepts. In this view, this paper introduces an Oppositional Harris Hawks Optimization with Deep Learning based Image Captioning (OHHO-DLIC) technique. The OHHO-DLIC technique involves the design of distinct levels of processing. First, feature extraction from the images is carried out using the EfficientNet model. Then, image captioning is performed by a bidirectional long short-term memory (BiLSTM) model comprising an encoder as well as a decoder. At last, an oppositional Harris Hawks optimization (OHHO) based hyperparameter tuning process is performed to effectively adjust the hyperparameters of the EfficientNet and BiLSTM models. The experimental analysis of the OHHO-DLIC technique is carried out on the Flickr8k dataset, and a comprehensive comparative analysis highlighted its better performance over recent approaches.
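A bare-bones rendering of this encoder-decoder pairing might look as follows in PyTorch. The dimensions, vocabulary size, and image-as-first-token wiring are illustrative assumptions, and the OHHO hyperparameter search is omitted.

```python
# Minimal skeleton of EfficientNet image features feeding a BiLSTM captioner.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=256):
        super().__init__()
        backbone = efficientnet_b0(weights=None)  # pretrained weights assumed in practice
        self.encoder = nn.Sequential(backbone.features, nn.AdaptiveAvgPool2d(1),
                                     nn.Flatten(), nn.Linear(1280, embed_dim))
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, images, tokens):
        img_feat = self.encoder(images).unsqueeze(1)        # (B, 1, E)
        seq = torch.cat([img_feat, self.embed(tokens)], 1)  # image as first token
        hidden, _ = self.bilstm(seq)
        return self.out(hidden)                             # per-step vocab logits

logits = CaptionModel()(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 12)))
print(logits.shape)  # torch.Size([2, 13, 5000])
```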
The control of the froth flotation process in the mineral industry is a challenging task due to its multiple impacting parameters. Accurate and convenient examination of the concentrate grade is a crucial step in realizing effective and real-time control of the flotation process. The goal of this study is to employ image processing techniques and CNN-based feature extraction combined with machine learning and deep learning to predict the elemental composition of minerals in the flotation froth. A real-world dataset was collected and preprocessed from a differential flotation circuit at the industrial flotation site in Guemassa, Morocco. Using image-processing algorithms, the features extracted from the flotation froth include texture, bubble size, velocity, and color distribution. To predict the mineral concentrate grades, our study includes several supervised machine learning (ML) algorithms, artificial neural networks (ANN), and convolutional neural networks (CNN). The industrial experimental evaluations revealed relevant performance, with an accuracy of up to 0.94. Furthermore, our proposed hybrid method was evaluated in a real flotation process for the Zn, Pb, Fe, and Cu concentrate grades, with a precision error of less than 4.53. These results demonstrate the significant potential of our proposed online analyzer as an artificial intelligence application in the field of complex polymetallic flotation circuits (Pb, Fe, Cu, Zn).
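The feature-to-grade pipeline can be sketched generically: extract hand-crafted froth descriptors, then regress them onto concentrate grades. The snippet below uses synthetic data and only crude color statistics (the paper also uses texture, bubble size, and velocity), with a random forest standing in for the study's model suite.

```python
# Illustrative froth-feature regression sketch on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

def froth_features(frame):
    """Per-channel mean/std of an RGB froth image: a crude color descriptor."""
    return np.concatenate([frame.mean(axis=(0, 1)), frame.std(axis=(0, 1))])

rng = np.random.default_rng(0)
frames = rng.random((200, 64, 64, 3))    # stand-in froth images
X = np.stack([froth_features(f) for f in frames])
y = rng.random((200, 4))                 # Zn, Pb, Fe, Cu grades (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```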
ISBN (print): 9798350323726
Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pretrained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge. We train a conditional diffusion model to generate the audio track of a video, given a video frame encoded by a pretrained contrastive language-image pretraining (CLIP) model. At test time, we first explore performing a zero-shot modality transfer and condition the diffusion model with a CLIP-encoded text query. However, we observe a noticeable performance drop with respect to image queries. To close this gap, we further adopt a pretrained diffusion prior model to generate a CLIP image embedding given a CLIP text embedding. Our results show the effectiveness of the proposed method, and that the pretrained diffusion prior can reduce the modality transfer gap. While we focus on text-to-audio synthesis, the proposed model can also generate audio from image queries, and it shows competitive performance against a state-of-the-art image-to-audio synthesis model in a subjective listening test. This study offers a new direction of approaching text-to-audio synthesis that leverages the naturally-occurring audio-visual correspondence in videos and the power of pretrained language-vision models.
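The conditioning mechanism hinges on CLIP's shared image-text embedding space: a frame embedding conditions the diffusion model during training, and a text embedding is swapped in at test time. The sketch below computes both embeddings with Hugging Face's CLIP port; the diffusion model and diffusion prior are omitted, and the blank test image is a stand-in for a video frame.

```python
# CLIP image vs. text embeddings: the two conditioning signals discussed above.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))  # stand-in for a video frame
inputs = processor(text=["dog barking in a park"], images=image,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])

# Train-time condition: img_emb. Test-time zero-shot condition: txt_emb.
cos = torch.nn.functional.cosine_similarity(img_emb, txt_emb)
print(cos.item())  # the image/text embedding gap the diffusion prior narrows
```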
ISBN (print): 9798350318920; 9798350318937
Data augmentation is one of the most effective techniques for regularizing deep learning models and improving recognition performance in various tasks and domains. However, this holds for standard in-domain settings, in which the training and test data follow the same distribution. For the out-of-domain case, where the test data follow a different and unknown distribution, the best recipe for data augmentation is unclear. In this paper, we show that for out-of-domain and domain generalization settings, data augmentation can provide a conspicuous and robust improvement in performance. To do that, we propose a simple training procedure: (i) use uniform sampling on standard data augmentation transformations; (ii) increase the strength of the transformations to account for the higher data variance expected when working out-of-domain; and (iii) devise a new reward function to reject extreme transformations that can harm training. With this procedure, our data augmentation scheme achieves a level of accuracy comparable to or better than state-of-the-art methods on benchmark domain generalization datasets. Code: https://***/Masseeh/DCAug
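The three-step procedure lends itself to a schematic rendering. The snippet below is not the authors' DCAug code; the transform list, strength factor, and pixel-difference reward are placeholder assumptions that merely mirror the sample/strengthen/reject structure.

```python
# Schematic sample/strengthen/reject augmentation loop (assumed components).
import random
from torchvision import transforms
from torchvision.transforms import functional as TF

STRENGTH = 1.5  # exaggerated magnitude for out-of-domain training (assumed value)
CANDIDATES = [
    transforms.ColorJitter(brightness=0.8 * STRENGTH, contrast=0.8 * STRENGTH),
    transforms.RandomAffine(degrees=30 * STRENGTH),
    transforms.GaussianBlur(kernel_size=9, sigma=(0.1, 2.0 * STRENGTH)),
]

def reward(augmented, original):
    """Placeholder heuristic standing in for the paper's reward function:
    penalize augmentations that destroy too much of the original signal."""
    diff = (TF.to_tensor(augmented) - TF.to_tensor(original)).abs().mean()
    return 1.0 - diff.item()

def augment(image, threshold=0.2, max_tries=5):
    for _ in range(max_tries):
        t = random.choice(CANDIDATES)   # (i) uniform sampling over transforms
        candidate = t(image)            # (ii) strong transformation
        if reward(candidate, image) >= threshold:
            return candidate            # (iii) extreme results are rejected
    return image  # fall back to the clean image if every draw was rejected

if __name__ == "__main__":
    from PIL import Image
    print(augment(Image.new("RGB", (64, 64), "gray")).size)  # (64, 64)
```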
Independent adversarial sample detection is an important problem in the field of computer vision and machine learning, especially in the context of the widespread use of deep learning models. This can lead to misclass...