Airborne platforms and satellites provide rich sensor data in the form of hyperspectral images (HSI), which are crucial for numerous vision-related tasks, such as feature extraction, image enhancement, and data synthesis. This article reviews the contextual importance and applications of generative artificial intelligence (GAI) in the advancement of HSI processing. GAI methods address the inherent challenges of HSI data, such as high dimensionality, noise, and the need to preserve spectral-spatial correlations, rendering them indispensable for modern HSI analysis. Generative neural networks, including generative adversarial networks and denoising diffusion probabilistic models, are highlighted for their superior performance in classification, segmentation, and object identification tasks, often surpassing traditional approaches, such as U-Nets, autoencoders, and deep convolutional neural networks. Diffusion models showed competitive performance in tasks, such as feature extraction and image resolution enhancement, particularly in terms of inference time and computational cost. Transformer architectures combined with attention mechanisms further improved the accuracy of generative methods, particularly for preserving spectral and spatial information in tasks, such as image translation, data augmentation, and data synthesis. Despite these advancements, challenges remain, particularly in developing computationally efficient models for super-resolution and data synthesis. In addition, novel evaluation metrics tailored to the complex nature of HSI data are needed. This review underscores the potential of GAI in addressing these challenges while presenting its current strengths, limitations, and future research directions.
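To make the diffusion-model discussion concrete, here is a minimal sketch (not drawn from any of the reviewed works) of the DDPM forward noising process applied to a synthetic hyperspectral cube; the shapes, schedule, and variable names are illustrative assumptions.

```python
import numpy as np

# DDPM closed-form forward step: q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I),
# applied here to a toy hyperspectral patch of shape (H, W, bands).
rng = np.random.default_rng(0)
H, W, BANDS = 8, 8, 32              # toy spatial size and spectral band count
x0 = rng.random((H, W, BANDS))      # stand-in for a normalized HSI patch

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # standard linear noise schedule
alpha_bar = np.cumprod(1.0 - betas) # cumulative signal retention per step

def forward_diffuse(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form (no iteration needed)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x_early = forward_diffuse(x0, 10, rng)   # mostly signal
x_late = forward_diffuse(x0, T - 1, rng) # nearly pure Gaussian noise
print(x_early.shape, float(alpha_bar[T - 1]))
```

The reverse (denoising) direction, which a trained network would perform, is what the reviewed models use for tasks like HSI denoising and super-resolution; this sketch only shows the fixed forward corruption they invert.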
Deep learning (DL)-based systems have emerged as powerful methods for the diagnosis and treatment of plant stress, offering high accuracy and efficiency in analyzing imagery data. This review paper aims to present a thorough overview of the state-of-the-art DL technologies for plant stress detection. For this purpose, a systematic literature review was conducted to identify relevant articles highlighting the technologies and approaches currently employed in the development of DL-based plant stress detection systems, specifically the advancement of image-based data collection systems, image preprocessing techniques, and deep learning algorithms and their applications in plant stress classification, disease detection, and segmentation tasks. Additionally, this review emphasizes the challenges and future directions in collecting and preprocessing image data, model development, and deployment in real-world agricultural settings. Some of the key findings from this review paper are: Training data: (i) Most plant stress detection models have been trained on Red Green Blue (RGB) images; (ii) Data augmentation can increase both the quantity and variation of training data; (iii) Handling multimodal inputs (e.g., image, temperature, humidity) allows the model to leverage information from diverse sources, which can improve prediction accuracy. Model design and efficiency: (i) Self-supervised learning (SSL)- and few-shot learning (FSL)-based methods may outperform transfer learning (TL)-based models for classifying plant stress when labeled training images are scarce; (ii) Custom-designed DL architectures for a specific stress and plant type can outperform state-of-the-art DL architectures in terms of efficiency, overfitting, and accuracy; (iii) The multi-task learning DL structure reuses most of the network architecture while performing multiple tasks (e.g., estimating stress type and severity) simultaneously, which makes the learning much ...
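The data-augmentation finding above can be illustrated with a generic sketch (not tied to any surveyed system): random flips, rotations, and brightness jitter applied to one RGB plant image to multiply both the quantity and variation of training data. The image here is synthetic and the transform choices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
img = rng.random((64, 64, 3))  # stand-in for a normalized RGB leaf image

def augment(img, rng):
    """Return a randomly flipped, rotated, brightness-jittered copy."""
    out = img
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                  # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))  # random 90-degree rotation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness jitter
    return out

batch = [augment(img, rng) for _ in range(8)]  # 8 augmented variants of one image
print(len(batch), batch[0].shape)
```

Real pipelines typically add crops, color-space shifts, and noise; the point is that each cheap transform yields a plausible new training sample.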
Recent advances in artificial intelligence (AI) have prompted the search for enhanced algorithms and hardware to support the deployment of machine learning (ML) at the edge. More specifically, in the context of the Internet of Things (IoT), vision chips must be able to fulfill tasks of low to medium complexity, such as feature extraction (FE) or region-of-interest (RoI) detection, with a sub-mW power budget imposed by the use of small batteries or energy harvesting. Mixed-signal vision chips relying on in- or near-sensor processing have emerged as an interesting candidate because of their favorable tradeoff between energy efficiency (EE) and computational accuracy compared with digital systems for these specific tasks. In this article, we introduce a mixed-signal convolutional imager system-on-chip (SoC) codenamed MANTIS, featuring a unique combination of large 16 x 16 4b-weighted filters, operation at multiple scales, and double sampling, well suited to the requirements of medium-complexity tasks. The main contributions are (i) circuits called DS3 units combining delta-reset sampling (DRS), image downsampling (DS), and voltage downshifting and (ii) charge-domain multiply-and-accumulate (MAC) operations based on switched-capacitor (SC) amplifiers and charge sharing in the capacitive DAC of the successive-approximation register (SAR) ADCs. MANTIS achieves peak EEs normalized to 1b operations of 4.6 and 84.1 TOPS/W at the accelerator and SoC levels, while computing feature maps (fmaps) with a root-mean-square error (RMSE) ranging from 3 to 11.3%. It also demonstrates face RoI detection with a false negative rate (FNR) of 11.5%, while discarding 81.3% of image patches and reducing the data transmitted off chip by 13x compared with the raw image.
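The accuracy cost of 4-bit weights can be illustrated digitally, even though the actual MANTIS datapath is analog. The sketch below (an assumption-laden emulation, not the chip's circuit) quantizes a 16x16 filter to signed 4-bit levels, performs the MAC, and reports the weight-quantization RMSE relative to the weight range, in the spirit of the 3-11.3% fmap RMSE figure.

```python
import numpy as np

rng = np.random.default_rng(1)
patch = rng.random((16, 16))             # normalized image patch
w = rng.uniform(-1.0, 1.0, (16, 16))     # ideal full-precision 16x16 filter

# Uniform quantization of weights to 4-bit signed integer levels in [-8, 7].
step = np.max(np.abs(w)) / 7.0
w_q = np.clip(np.round(w / step), -8, 7) * step

ref = np.sum(patch * w)      # full-precision MAC result
approx = np.sum(patch * w_q) # 4b-weighted MAC result

rmse_pct = 100.0 * np.sqrt(np.mean((w - w_q) ** 2)) / (np.max(w) - np.min(w))
print(f"weight-quantization RMSE: {rmse_pct:.2f}% of weight range")
```

On hardware, this quantization error combines with analog non-idealities (charge injection, capacitor mismatch), which is why the chip reports fmap RMSE rather than weight RMSE alone.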
Removing shadows in images is often a necessary pre-processing task for improving the performance of computer vision applications. Deep learning shadow removal approaches require a large-scale dataset that is challenging to gather. To address the issue of limited shadow data, we present a new and cost-effective method of synthetically generating shadows using 3D virtual primitives as occluders. We simulate the shadow generation process in a virtual environment where foreground objects are composed of mapped textures from the Places-365 dataset. We argue that complex shadow regions can be approximated by mixing primitives, analogous to how 3D models in computer graphics can be represented as triangle meshes. We use the proposed synthetic shadow removal dataset, DLSUSynthPlaces-100K, to train a feature-attention-based shadow removal network without an explicit domain adaptation or style transfer strategy. The results of this study show that the trained network achieves competitive results with state-of-the-art shadow removal networks that were trained purely on typical shadow removal (SR) datasets such as ISTD or SRD. Using a synthetic shadow dataset of only triangular prisms and spheres as occluders produces the best results. Therefore, the synthetic shadow removal dataset can be a viable alternative for future deep-learning shadow removal methods. The source code and dataset can be accessed at this link: https://***/SynthShadowRemoval/.
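The primitives-as-occluders idea can be sketched minimally (this is not the authors' renderer): approximate a shadow region with a single filled triangle and darken the image under it. The mask construction, vertices, and attenuation factor are all illustrative assumptions.

```python
import numpy as np

def triangle_mask(h, w, v0, v1, v2):
    """Boolean mask of pixels inside triangle (v0, v1, v2), via edge sign tests.
    Vertices are (x, y); the two-sided test handles either winding order."""
    ys, xs = np.mgrid[0:h, 0:w]
    def edge(a, b):
        return (xs - a[0]) * (b[1] - a[1]) - (ys - a[1]) * (b[0] - a[0])
    e0, e1, e2 = edge(v0, v1), edge(v1, v2), edge(v2, v0)
    return ((e0 >= 0) & (e1 >= 0) & (e2 >= 0)) | ((e0 <= 0) & (e1 <= 0) & (e2 <= 0))

rng = np.random.default_rng(3)
img = rng.random((64, 64, 3))  # stand-in for a textured background image
mask = triangle_mask(64, 64, (10, 10), (50, 20), (25, 55))

shadowed = img.copy()
shadowed[mask] *= 0.4          # uniform attenuation inside the synthetic shadow
print(int(mask.sum()), "shadow pixels")
```

A full pipeline would project 3D prisms and spheres through a light source to get soft, perspective-correct masks; mixing several such masks approximates complex real shadow shapes, much as triangle meshes approximate surfaces.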
Depth sensing is an essential technology in robotics and many other fields. Many depth sensing (or RGB-D) cameras are available on the market, and selecting the best one for your application can be challenging. In this work, we tested four stereoscopic RGB-D cameras that sense distance by using two images from slightly different views. We empirically compared four cameras (Intel RealSense D435, Intel RealSense D455, StereoLabs ZED 2, and Luxonis OAK-D Pro) in three scenarios: (i) planar surface perception, (ii) plastic doll perception, (iii) household object perception (YCB dataset). We recorded and evaluated more than 3,000 RGB-D frames for each camera. For table-top robotics scenarios with distances to objects up to one meter, the best performance is provided by the D435 camera, which perceives with an error under 1 cm in all of the tested scenarios. For longer distances, the other three models perform better, making them more suitable for some mobile robotics applications. OAK-D Pro additionally offers integrated AI modules (e.g., object and human keypoint detection). ZED 2 is overall the best camera, keeping the error under 3 cm even at 4 meters. However, it is not a standalone device and requires a computer with a GPU for depth data acquisition. All data (more than 12,000 RGB-D frames) are publicly available at https://***/rgbd-comparison.
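One common way to score planar-surface perception (a plausible approach, not necessarily the authors' exact protocol) is to fit a least-squares plane to the depth point cloud and report the residual RMSE. The synthetic point cloud and 5 mm noise level below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
# Synthetic points on a slightly tilted plane ~1 m away, with ~5 mm sensor noise.
x = rng.uniform(-0.5, 0.5, n)
y = rng.uniform(-0.5, 0.5, n)
z = 0.02 * x - 0.01 * y + 1.0 + rng.normal(0, 0.005, n)

# Least-squares fit of z = a*x + b*y + c.
A = np.column_stack([x, y, np.ones(n)])
coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)  # [a, b, c]
residuals = z - A @ coeffs
rmse_m = np.sqrt(np.mean(residuals ** 2))
print(f"plane-fit RMSE: {rmse_m * 100:.2f} cm")
```

Under this metric, a camera like the D435 at table-top range would land under the 1 cm mark, while longer-range error growth would show up directly as a larger residual RMSE.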
The integration of human-robot interaction (HRI) technologies with industrial automation has become increasingly essential for enhancing productivity and safety in manufacturing environments. In this paper, we propose...
Rice is a staple food for a significant portion of the global population, making accurate classification of rice varieties essential for farming and consumer protection. This review provides a focused analysis of the current advancements and challenges in applying computer vision (CV) techniques to rice variety classification. The study examines key steps in the automation process, including image acquisition, pre-processing, feature extraction, and classification algorithms, with particular emphasis on machine learning and deep learning methods such as Convolutional Neural Networks (CNNs), which have demonstrated exceptional performance in recent research. However, practical implementation faces challenges, including the availability of high-quality datasets, the impact of environmental variations on image quality, and the computational demands of complex models. Our study discusses these obstacles and highlights the importance of developing resilient and scalable systems for real-world applications. By synthesizing findings from various studies, this review proposes future directions for advancing rice variety classification, focusing on improved feature extraction techniques, enhanced dataset management, and integrating innovative machine learning paradigms. This work is a valuable resource for researchers and practitioners aiming to advance rice classification technologies and contribute to food security and agricultural sustainability.
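The classical pipeline the review describes (preprocessing, feature extraction, classification) can be sketched with a toy example. Everything here is synthetic and simplified by assumption: color-mean features and a nearest-centroid classifier stand in for the CNN-based systems the review covers.

```python
import numpy as np

rng = np.random.default_rng(5)

def make_grain(mean_color):
    """Synthetic 16x16 RGB 'grain' image around a variety-specific mean color."""
    return np.clip(mean_color + rng.normal(0, 0.05, (16, 16, 3)), 0.0, 1.0)

def features(img):
    """Per-channel mean intensity: a minimal color descriptor."""
    return img.reshape(-1, 3).mean(axis=0)

# Hypothetical variety color profiles (illustrative, not real measurements).
varieties = {"jasmine": np.array([0.9, 0.9, 0.8]),
             "brown":   np.array([0.6, 0.4, 0.2])}

# "Train": average features over a few images per variety to get centroids.
centroids = {name: np.mean([features(make_grain(c)) for _ in range(10)], axis=0)
             for name, c in varieties.items()}

def classify(img):
    f = features(img)
    return min(centroids, key=lambda name: np.linalg.norm(f - centroids[name]))

print(classify(make_grain(varieties["brown"])))
```

Real systems replace the hand-crafted color feature with learned CNN features precisely because varieties that differ in shape or texture, not color, defeat descriptors this simple.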
Recent studies point to an accuracy gap between humans and Artificial Neural Network (ANN) models when classifying blurred images, with humans outperforming ANNs. To bridge this gap, we introduce a spectral channel-ba...
Fatigued drivers often cause traffic accidents. This study introduces a novel method for detecting fatigue that combines machine learning and image processing techniques. We propose a unique approach that utilizes the...
ISBN: (Print) 9789819612413; 9789819612420
Image classification is one of the fundamental tasks in computer vision (CV) and has numerous practical applications. Traditionally, machine learning and deep learning methods such as k-Nearest Neighbors (kNN), decision trees, and Convolutional Neural Networks (CNN) have been widely used to perform this task. However, with the recent emergence of large language models (LLMs), such as Generative Pre-trained Transformers (GPT), originally designed for natural language processing, their cross-domain applications, including in CV, are now being explored. In this paper, we investigate the capabilities of GPT-4o, a variant of the GPT model, for image classification on the Fashion-MNIST dataset. By using carefully designed prompts, we evaluate GPT-4o's performance and compare it with more traditional models. Our study offers insights into the cross-domain potential of GPT models, explores how prompt engineering can enhance GPT's performance on image classification tasks, and suggests new avenues for developing more flexible and adaptable multimodal LLM systems. The code can be found at https://***/Tanghaha1424/gpt-fashionmnist.
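The prompt-engineering side of such a study can be sketched without any API call. The snippet below builds a constrained-choice prompt over the standard Fashion-MNIST labels and parses a model reply back to a class index; the prompt wording and parser are assumptions, and the actual multimodal call to GPT-4o (which the paper performs) is deliberately omitted here.

```python
# Standard Fashion-MNIST class names, in canonical index order.
FASHION_MNIST_LABELS = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def build_prompt(labels):
    """A constrained-choice prompt: one common prompt-engineering pattern."""
    options = "; ".join(f"{i}: {name}" for i, name in enumerate(labels))
    return ("You are an image classifier. Look at the attached 28x28 grayscale "
            f"clothing image and answer with exactly one label index.\n"
            f"Options: {options}\nAnswer with the index only.")

def parse_reply(reply, labels):
    """Map a free-text model reply back to a class index (first valid digit wins)."""
    for token in reply.split():
        stripped = token.strip(".,")
        if stripped.isdigit() and 0 <= int(stripped) < len(labels):
            return int(stripped)
    return None  # model did not follow the format

prompt = build_prompt(FASHION_MNIST_LABELS)
print(parse_reply("The answer is 7.", FASHION_MNIST_LABELS))  # -> 7 (Sneaker)
```

Constraining the answer format and parsing defensively matters in practice, since LLM replies often wrap the label in extra prose even when told not to.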