Automotive simulation can potentially compensate for a lack of training data in computer vision applications. However, there has been little to no image quality evaluation of automotive simulation and the impact of op...
Robot vision servo control systems play an important role in modern automation systems, and image feature extraction and tracking, as their key components, have a direct impact on their performance and application scope. ...
Distance calculation is a critical link in research fields such as object trajectory prediction and automatic driving obstacle avoidance. However, research on distance estimation using deep learning methods has yet to attract wide attention. The accuracy of traditional distance estimation algorithms based on optical principles and mathematical modeling is low in practical applications, mainly because of the curvature or slope of the road surface. This paper addresses the challenging distance estimation problem by developing an end-to-end structured model that directly predicts the distance of objects in a given image, replacing the traditional mathematical modeling process with a learning-based method. To facilitate research on this task, we construct extended distance datasets from the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) and NYU (Nathan Silberman, Pushmeet Kohli, Derek Hoiem, Rob Fergus) Depth v2 datasets. Experimental results demonstrate that the structured learning model achieves higher accuracy than the traditional algorithm across different distance ranges and performs better on curves and ramps. Improving the underlying neural network will be the direction of future work on the model.
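As an illustration of the end-to-end formulation described above, the following is a minimal sketch (not the paper's model): a ResNet-18 backbone regresses a per-object distance in meters directly from an image crop, so no optical or geometric modeling is required. The backbone choice, crop size, and loss are assumptions made for the example.

```python
# Minimal illustrative sketch of learning-based distance regression
# (assumed architecture, not the paper's exact model).
import torch
import torch.nn as nn
import torchvision.models as models

class DistanceRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)   # generic feature extractor
        backbone.fc = nn.Identity()                # keep the 512-d features
        self.backbone = backbone
        self.head = nn.Sequential(                 # distance head (meters)
            nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, crops):                      # crops: (B, 3, H, W) object crops
        return self.head(self.backbone(crops)).squeeze(-1)

model = DistanceRegressor()
pred = model(torch.randn(4, 3, 224, 224))          # four object crops
target = torch.tensor([12.3, 7.8, 30.1, 5.5])      # ground-truth distances (m)
loss = nn.functional.smooth_l1_loss(pred, target)  # regression loss
```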
Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, leading t...
The distinctive properties and facile integration of 2D materials hold the potential to offer promising avenues for on-chip photonic devices, and the expeditious and nondestructive identification and localization of their diverse fundamental building blocks are key prerequisites. Here, we present a methodology grounded in digital image processing and deep learning that achieves the detection and precise localization of four monolayer-thick triangular single crystals of transition metal dichalcogenides with a mean average precision above 90%. The approach demonstrates robust recognition capabilities across varied imaging conditions encompassing both white light and monochromatic light. It stands poised to serve as a potent data-driven tool that enhances characterization efficiency and holds the potential to expedite research initiatives and applications founded on the utilization of 2D materials. (c) 2024 Optica Publishing Group
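As a rough illustration of such a data-driven detector (not the authors' code), the sketch below fine-tunes a generic Faster R-CNN to localize four assumed classes of monolayer flakes in optical micrographs; the class count, image size, and labels are placeholders.

```python
# Illustrative sketch: fine-tune a generic detector for four TMD flake
# classes (assumed setup, not the paper's implementation).
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 1 + 4  # background + four monolayer crystal types (assumed)
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None)
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, NUM_CLASSES)

images = [torch.rand(3, 512, 512)]                        # one optical micrograph
targets = [{"boxes": torch.tensor([[50., 60., 120., 140.]]),
            "labels": torch.tensor([1])}]                 # e.g. one flake of class 1
loss_dict = model(images, targets)                        # training-mode losses
total_loss = sum(loss_dict.values())
```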
Leveraging sensory information to aid the millimeter-wave (mmWave) and sub-terahertz (sub-THz) beam selection process is attracting increasing interest. This sensory data, captured for example by cameras at the base stations, has the potential to significantly reduce the beam sweeping overhead and enable highly mobile applications. The solutions developed so far, however, have mainly considered single-candidate scenarios, i.e., scenarios with a single candidate user in the visual scene, and were evaluated using synthetic datasets. To address these limitations, this paper extensively investigates the sensing-aided beam prediction problem in a real-world multi-object vehicle-to-infrastructure (V2I) scenario and presents a comprehensive machine learning based framework. In particular, this paper proposes to utilize visual and positional data to predict the optimal beam indices as an alternative to conventional beam sweeping approaches. For this, a novel user (transmitter) identification solution has been developed, a key step in realizing sensing-aided multi-candidate and multi-user beam prediction solutions. The proposed solutions are evaluated on the large-scale real-world DeepSense 6G dataset. Experimental results in realistic V2I communication scenarios indicate that the proposed solutions achieve 67-84% top-1 and close to 100% top-5 beam prediction accuracy for single-user scenarios, and 65-80% top-1 and close to 95% top-5 beam prediction accuracy for multi-candidate scenarios. Furthermore, the proposed approach can identify the probable transmitting candidate with more than 93% accuracy across the different scenarios. This highlights a promising approach for significantly reducing the beam training overhead in mmWave/THz communication systems.
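For concreteness, here is a hedged sketch of the vision-plus-position beam prediction idea (not the framework evaluated in the paper): image features and a 2-D position are fused and classified over an assumed 64-beam codebook; sweeping only the top-5 predicted beams is what reduces the training overhead.

```python
# Hedged sketch of sensing-aided beam prediction (assumed codebook size and
# architecture; not the paper's framework).
import torch
import torch.nn as nn
import torchvision.models as models

NUM_BEAMS = 64                                    # assumed beam codebook size

class BeamPredictor(nn.Module):
    def __init__(self, num_beams=NUM_BEAMS):
        super().__init__()
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Identity()                    # 512-d visual embedding
        self.cnn = cnn
        self.pos_mlp = nn.Sequential(nn.Linear(2, 32), nn.ReLU())
        self.classifier = nn.Linear(512 + 32, num_beams)

    def forward(self, image, position):           # position: (B, 2) normalized coords
        feat = torch.cat([self.cnn(image), self.pos_mlp(position)], dim=-1)
        return self.classifier(feat)               # logits over beam indices

model = BeamPredictor()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 2))
top5 = logits.topk(5, dim=-1).indices              # candidate beams to sweep
```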
ISBN:
(Print) 9798350318920; 9798350318937
Pre-trained Vision-Language Models (VLMs), such as CLIP, have shown enhanced performance across a range of tasks that involve the integration of visual and linguistic modalities. When CLIP is used for depth estimation, the patches divided from the input images can be combined with a series of semantic descriptions of depth to obtain similarity results. A coarse depth estimate is then obtained by weighting and summing the depth values, called depth bins, corresponding to the pre-defined semantic descriptions. This zero-shot approach circumvents the computational and time-intensive nature of traditional fully supervised depth estimation methods. However, because it relies on fixed depth bins, it may not generalize well, as images from different scenes can exhibit distinct depth distributions. To address this challenge, we propose a few-shot method that learns to adapt the VLM for monocular depth estimation, balancing training cost and generalization capability. Specifically, it assigns different depth bins to different scenes, which the model can select during inference. Additionally, we incorporate learnable prompts to preprocess the input text, converting easily human-understood text into easily model-understood vectors and further enhancing performance. With only one image per scene for training, our extensive experimental results on the NYU v2 and KITTI datasets demonstrate that our method outperforms the previous state-of-the-art method by up to 10.6% in terms of MARE.
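The depth-bin weighting step described above can be illustrated with a few lines of tensor code; the sketch below is a simplified reconstruction under assumed bin values and embedding dimensions, not the paper's implementation.

```python
# Simplified sketch of CLIP-style depth-bin weighting (assumed values).
import torch

def depth_from_bins(patch_feats, text_feats, depth_bins, temperature=0.01):
    """patch_feats: (P, D) image-patch embeddings (e.g. from a VLM encoder)
       text_feats:  (K, D) embeddings of K depth descriptions
       depth_bins:  (K,)   depth value (meters) assigned to each description"""
    patch_feats = torch.nn.functional.normalize(patch_feats, dim=-1)
    text_feats = torch.nn.functional.normalize(text_feats, dim=-1)
    sim = patch_feats @ text_feats.T / temperature   # (P, K) similarities
    weights = sim.softmax(dim=-1)                    # per-patch bin weights
    return weights @ depth_bins                      # (P,) coarse depth per patch

# toy example: 4 patches, 7 depth phrases ("very close" ... "very far")
bins = torch.tensor([1., 2., 4., 8., 12., 20., 40.])  # assumed bin values
depth = depth_from_bins(torch.randn(4, 512), torch.randn(7, 512), bins)
```

The paper's contribution replaces the fixed bins with scene-specific bins selected at inference and learns the text prompts; the weighting itself corresponds to the step sketched here.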
ISBN:
(Print) 9798400716164
In the past years, machine learning (ML) and deep learning (DL) have led to the advancement of several applications, including computer vision, natural language processing, and audio processing. These complex tasks require large models, which are challenging to deploy on devices with limited resources. Such resource-constrained devices have limited computation power and memory, so neural networks must be optimized through network acceleration and compression techniques. This paper proposes a novel method to compress and accelerate neural networks from a small set of spatial convolution kernels. Firstly, a novel pruning algorithm based on density-based clustering is proposed that identifies and removes redundancy in CNNs while maintaining the accuracy-throughput tradeoff. Secondly, a novel pruning algorithm based on grid-based clustering is proposed to identify and remove redundancy in CNNs. The performance of the three pruning algorithms (density-based, grid-based, and partitional clustering) is evaluated against one another. Experiments with the deep CNN compression technique on the VGG-16 and ResNet models achieve higher image classification accuracy than the original models at a higher compression ratio and speedup.
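To make the clustering-based pruning idea concrete, here is an illustrative sketch (not the paper's algorithm): flattened convolution kernels are clustered with DBSCAN and one representative filter per cluster is kept, discarding near-duplicates; the eps and min_samples values are placeholders.

```python
# Illustrative sketch of density-based (DBSCAN) filter pruning
# (assumed parameters; not the paper's algorithm).
import numpy as np
import torch.nn as nn
from sklearn.cluster import DBSCAN

def select_filters(conv: nn.Conv2d, eps=0.5, min_samples=2):
    w = conv.weight.detach().reshape(conv.out_channels, -1).numpy()
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(w)
    keep = []
    for lbl in np.unique(labels):
        idx = np.flatnonzero(labels == lbl)
        if lbl == -1:
            keep.extend(idx.tolist())          # noise points: keep them all
        else:
            keep.append(int(idx[0]))           # one representative per cluster
    return sorted(keep)

conv = nn.Conv2d(3, 64, kernel_size=3)
kept = select_filters(conv)                    # indices of filters to retain
print(f"keeping {len(kept)} of {conv.out_channels} filters")
```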
ISBN:
(Print) 9798350337440
The reconfiguration of machine vision systems heavily depends on the collection and availability of large datasets, rendering them inflexible and vulnerable to even minor changes in the data. This paper proposes a refinement of Miller's Cartesian Genetic Programming methodology aimed at generating filter pipelines for image processing tasks. The approach is based on CGP-IP, but specifically adapted for image processing in industrial monitoring applications. The suggested method allows filter pipelines to be retrained on small datasets; this concept of self-adaptivity renders high-precision machine vision more resilient to faulty machine settings or changes in the environment and yields compact programs. A dependency graph is introduced to rule out invalid pipeline solutions. Furthermore, we suggest not only generating pipelines from scratch, but also storing and reapplying previous solutions and re-adjusting filter parameters. Our modifications are designed to increase the likelihood of early convergence and improvement in the fitness indicators. This form of self-adaptivity allows for a more resource-efficient configuration of image filter pipelines with small datasets.
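As a toy illustration of an evolvable filter pipeline in the spirit of CGP-IP (the paper's node set, dependency graph, and parameter re-adjustment are not reproduced here), the sketch below mutates an ordered list of OpenCV filter primitives; the primitive set and mutation rate are assumptions.

```python
# Toy sketch of an evolvable image-filter pipeline (illustrative only).
import random
import cv2
import numpy as np

FILTERS = {                                       # assumed primitive set
    "blur":   lambda im: cv2.GaussianBlur(im, (5, 5), 0),
    "median": lambda im: cv2.medianBlur(im, 5),
    "edges":  lambda im: cv2.Canny(im, 50, 150),
    "dilate": lambda im: cv2.dilate(im, np.ones((3, 3), np.uint8)),
}

def run_pipeline(genome, image):
    out = image
    for gene in genome:                           # genome = ordered filter names
        out = FILTERS[gene](out)
    return out

def mutate(genome, rate=0.25):
    return [random.choice(list(FILTERS)) if random.random() < rate else g
            for g in genome]

frame = np.random.randint(0, 255, (64, 64), dtype=np.uint8)   # stand-in image
genome = ["blur", "edges", "dilate"]
result = run_pipeline(mutate(genome), frame)      # evaluate a mutated candidate
```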
Nowadays, video monitoring applications have become significant for observing human activity with the help of computer vision-based approaches for investigating numerous video sequences. The major goal of anomaly identification is to discover abnormalities automatically in a short interval of time. Performing efficient anomaly detection in a video monitoring system is a complex task due to video noise, spilling, and anomalies. Various anomaly detection models based on Artificial Intelligence (AI) have been developed for video surveillance; however, these models often address only specific issues and do not consider evaluation concerns over time. Hence, this paper aims to implement a video anomaly detection model through surveillance cameras for reducing abnormal activities and thereby enhancing the security of the environment. First, the input videos are collected from standard benchmark datasets and passed to the frame extraction phase. The extracted frames are fed to the object detection phase, where the YOLO-v3 technique is used. Parameter optimization of YOLO-v3 is achieved using the Modified Cat and Mouse Optimization (MCMO) algorithm to improve detection performance. The object-detected frames are fed to a ResNet for extracting deep features, which are then used in the classification phase, where the Optimized Bi-directional Long Short-Term Memory (Bi-LSTM)-Radial Basis Function (RBF) (OBi-LSTM-RBF) model provides the classified anomaly outcome. Its variables are optimized using the enhanced CMO algorithm to enhance the efficacy of anomaly classification. Simulation evaluations reveal the effectiveness of the offered approach against diverse baseline algorithms using diverse performance measures. The offered approach shows significant enhancement in accuracy over baseline approaches. Specifically, it outperforms conventional CNNs by 45.47%, DNNs by 90.03%,
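Of the pipeline described above, the classification stage is the easiest to sketch; the following is a minimal, assumed version using a bidirectional LSTM over per-frame ResNet features (the MCMO-optimized YOLO-v3 detector and the RBF output layer are omitted).

```python
# Minimal sketch of the Bi-LSTM anomaly classification stage
# (assumed dimensions; detector and RBF layer omitted).
import torch
import torch.nn as nn

class BiLSTMAnomalyClassifier(nn.Module):
    def __init__(self, feat_dim=2048, hidden=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, feats):                  # feats: (B, T, feat_dim) ResNet features
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])             # classify from the last time step

clf = BiLSTMAnomalyClassifier()
logits = clf(torch.randn(2, 16, 2048))         # 2 clips, 16 frames each
probs = logits.softmax(dim=-1)                 # P(normal), P(anomaly)
```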