In the era of digital transformation, video media libraries hold immense untapped potential. Two things mainly hold them back: their content is not machine-readable, and their basic search functions are limited to standard metadata. This study presents a novel multimodal methodology that uses advances in artificial intelligence, including neural networks, computer vision, and natural language processing, to extract and geocode geospatial references from videos. Leveraging the geospatial information in videos enables semantic search, improves search relevance, and allows for targeted advertising, particularly on mobile platforms. The methodology comprises data acquisition from ARD Mediathek, image and text analysis using advanced machine learning models, and audio and subtitle processing with state-of-the-art language models. Challenges remain, such as model interpretability and the complexity of geospatial data extraction. Even so, the findings indicate significant potential for improving the precision of spatial data analysis within video content, promising to enrich media libraries with more navigable, contextually rich content. This advancement has implications for user engagement, targeted services, and broader applications in urban planning and cultural heritage.
ISBN: 9798331506520 (electronic)
ISBN: 9798331506537 (print)
Automated detection of road hazards such as speed bumps has become an important area of research due to its potential to improve road safety in autonomous driving. Various techniques have been introduced to detect these hazards using camera vision and artificial-intelligence-based image-processing methods. However, estimating their distance remains challenging. To address this problem and to satisfy the requirement for real-time on-board data processing, the proposed system offers: (1) high-accuracy road hazard detection by analyzing mono images and videos with a retrained YOLO neural network; (2) precise distance measurement using a LiDAR; and (3) efficient local data processing using ROS, implemented on an NVIDIA Jetson AGX Xavier. An important contribution of this paper is the introduction of multiple road-hazard classes when training the network, rather than focusing only on speed bumps and potholes. Furthermore, we analyzed different LiDAR technologies (standard rotating and non-repetitive circular scanning) to evaluate and compare their precision and to demonstrate that our method can be applied successfully regardless of the LiDAR's scanning pattern.
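The abstract does not say how YOLO detections are associated with LiDAR ranges. One common approach, shown here as a minimal sketch, projects LiDAR points into the image with a pinhole camera model. It then takes the median range of the points that land inside a detection's bounding box. The sketch assumes the points have already been transformed into the camera frame (x right, y down, z forward) and that the intrinsics `fx`, `fy`, `cx`, `cy` are known. The function names are hypothetical, and this is not presented as the authors' implementation.

```python
import math
from statistics import median


def project_point(p, fx, fy, cx, cy):
    """Project a 3-D point in the camera frame (x right, y down, z forward) to pixel (u, v)."""
    x, y, z = p
    return fx * x / z + cx, fy * y / z + cy


def distance_to_detection(points, bbox, fx, fy, cx, cy):
    """Median range of LiDAR points whose projections fall inside bbox.

    bbox is (u_min, v_min, u_max, v_max) in pixels, e.g. from a YOLO detection.
    Returns None when no point projects into the box.
    """
    u_min, v_min, u_max, v_max = bbox
    ranges = []
    for p in points:
        if p[2] <= 0:  # point is behind the camera and cannot be projected
            continue
        u, v = project_point(p, fx, fy, cx, cy)
        if u_min <= u <= u_max and v_min <= v <= v_max:
            ranges.append(math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2))
    return median(ranges) if ranges else None


# Hypothetical example: two points hit the hazard, one is off to the side, one is behind.
pts = [(0.0, 0.5, 10.0), (0.1, 0.5, 10.2), (5.0, 0.0, 10.0), (0.0, 0.0, -3.0)]
print(distance_to_detection(pts, (600, 380, 700, 440), 1000, 1000, 640, 360))
```

The median is used instead of the mean. This way, a few stray points inside the box that hit the road surface or background do not dominate the estimate.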
Information hiding is a technique for concealing meaningful information within public carrier data. As data elements become increasingly important, information hiding technology has a better perfor...
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to the advancements made in the sub-fields of AI such as machine...
ISBN: 9798331543037 (electronic)
ISBN: 9798331543044 (print)
Nowadays, images are usually compressed before being uploaded to social media. However, images on social media can easily be copied, so embedding secret messages in compressed images has become increasingly popular. There are many compression methods, such as Huffman coding, VQ, ZIP, AMBTC, RAR, and JPEG. In this article, we propose an improved method for hiding data in VQ-compressed images that achieves better embedding capacity while maintaining high image quality. Experimental results show that our data-hiding approach is practical.
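The abstract does not describe the improved scheme itself. As background, the sketch below shows plain VQ encoding, where each image block is replaced by the index of its nearest codeword. It combines this with a simple, generic codeword-pairing trick that hides one bit per block in the parity of the index. The sketch assumes a codebook of even size in which codewords 2i and 2i+1 are similar, so that swapping within a pair adds little distortion. The function names are hypothetical, and this is not the authors' method.

```python
def nearest_codeword(block, codebook):
    """Index of the codeword with the smallest squared Euclidean distance to block."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(block, codebook[i])),
    )


def vq_embed(blocks, codebook, bits):
    """VQ-encode blocks, hiding one secret bit per block in the index parity.

    Assumes len(codebook) is even and codewords 2i and 2i+1 are similar.
    Blocks beyond len(bits) are encoded normally.
    """
    if len(bits) > len(blocks):
        raise ValueError("not enough blocks to carry the message")
    indices = []
    for j, block in enumerate(blocks):
        k = nearest_codeword(block, codebook)
        if j < len(bits):
            k = 2 * (k // 2) + bits[j]  # stay in the same pair, force parity = bit
        indices.append(k)
    return indices


def vq_extract(indices, n_bits):
    """Recover the hidden bits from the parity of the first n_bits indices."""
    return [k % 2 for k in indices[:n_bits]]


def vq_decode(indices, codebook):
    """Reconstruct the (slightly distorted) blocks from the index stream."""
    return [codebook[k] for k in indices]


# Hypothetical 2-D codebook with similar neighbours paired as (0,1) and (2,3).
codebook = [(0, 0), (1, 1), (10, 10), (11, 11)]
idx = vq_embed([(0, 0), (10, 10), (11, 11)], codebook, [1, 0])
print(idx, vq_extract(idx, 2))
```

Capacity here is fixed at one bit per block, and image quality depends on how similar the paired codewords are. These are the two quantities an improved scheme would try to trade off better.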
Small and Medium Enterprises (SMEs) and Micro, Small, and Medium Enterprises (MSMEs) are considering integrating machine vision with high-throughput manufacturing lines to ensure consistent quality of standardized components. Inspection productivity can improve considerably by replacing manual inspection with machine vision. Pre-trained Convolutional Neural Networks (CNNs) can provide better machine vision capabilities than rule-based classical image-processing algorithms. However, the lack of labeled datasets and of expertise in model development restricts their utility for SMEs and MSMEs. The present work examines whether publicly available labeled datasets are practical for developing surface defect detection algorithms with pre-trained CNNs. It does so through case studies of two typical machined components: flat washers and tapered rollers. It is shown that publicly available surface defect datasets are ineffective for specific cases such as the machined surfaces of flat washers and tapered rollers; explicitly labeled image datasets offer better prediction ability in such cases. A comparative assessment of common pre-trained CNNs is conducted to identify an appropriate network for a surface defect detection framework for machined components. Five common pre-trained CNNs that have shown predictive ability on similar classification tasks were examined: VGG-19, GoogLeNet, ResNet-50, EfficientNet-b0, and DenseNet-201. The pre-trained CNNs developed using the explicitly labeled image datasets were applied to segregate defective flat washers and tapered rollers, as sample components manufactured by SMEs and MSMEs. Performance was assessed using parameters estimated from the confusion matrix. EfficientNet-b0 outperforms the other networks on most parameters and can be preferred when developing a surface defect detection algorithm. The outcomes of the present study form the b...
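The abstract does not list which confusion-matrix parameters were used. The sketch below assumes the usual ones for a binary defective/non-defective task: accuracy, precision, recall, specificity, and F1. It computes them from true and predicted labels. The label string `"defective"` and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="defective"):
    """Count (TP, FP, FN, TN) for a binary task, treating `positive` as the defect class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # good part flagged as defective
        else:
            if t == positive:
                fn += 1  # defective part missed
            else:
                tn += 1
    return tp, fp, fn, tn


def confusion_metrics(tp, fp, fn, tn):
    """Common scores derived from a 2x2 confusion matrix (0.0 where undefined)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # a.k.a. sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


print(confusion_metrics(*confusion_counts(
    ["defective", "ok", "defective", "ok"],
    ["defective", "defective", "ok", "ok"],
)))
```

For inspection, recall on the defective class is usually the critical figure, because a missed defect ships to the customer. A false alarm only costs a re-check.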
A vision-based automatic bar counting system for two-stage conveying bars is proposed. The system solves the counting problems of sticking and relative sliding of a large number of rebar stacks through image processin...
ISBN: 9798350354058 (electronic)
ISBN: 9798350354065 (print)
Convolutional Neural Networks (CNNs) play a crucial role in computer vision and machine learning applications, but they are often associated with high computational demands. To tackle this challenge, researchers have turned to the Fast Fourier Transform (FFT) to perform convolution in the spectral domain and thereby reduce complexity. However, the Discrete Hirschman Transform (DHT) has emerged as a more efficient alternative for performing linear convolutions. In this study, we introduce a novel CNN methodology based on the principles of the DHT. Our experimental results show that this approach significantly lowers both computational complexity and processing time. Additionally, we implement the DHT-based method in hardware to validate its performance, demonstrating its effectiveness in practical scenarios.
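The abstract contrasts the DHT with the FFT-based spectral convolution it improves on. The DHT itself is not shown here. The sketch below only illustrates the FFT baseline: linear convolution computed through the convolution theorem. Both signals are zero-padded to a power of two, so the circular convolution implied by the DFT equals the linear one. It uses a textbook recursive radix-2 FFT in pure Python for readability, and the function names are hypothetical. CNN layers usually compute cross-correlation, which is equivalent to convolving with a flipped kernel.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True the inverse transform is returned unscaled
    (the caller divides by len(a)).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1  # forward DFT uses exp(-2*pi*i*k*n/N)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def linear_convolve(x, h):
    """Linear convolution of real sequences x and h via pointwise spectral products."""
    m = len(x) + len(h) - 1  # length of the full linear convolution
    n = 1
    while n < m:
        n *= 2  # pad to a power of two >= m so no circular wrap-around occurs
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], invert=True)
    return [(v / n).real for v in y[:m]]


print([round(v, 6) for v in linear_convolve([1, 2, 3], [0, 1, 0.5])])
```

Direct convolution of lengths L and K costs O(LK) multiplications. The spectral route costs O(N log N) with N ≥ L + K − 1, which is the saving that spectral methods such as the FFT (and, per the abstract, the DHT) aim to exploit.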
In deep learning, the traditional approach has been to train specialized models for individual tasks, which, although effective, is resource-intensive. The advent of large, universal models has mitigated this issue by offering multitask capabilities, reduced training time, and lower computational costs. However, these generalized models often underperform specialized models on specific tasks. This paper introduces an ensemble approach that integrates specialized and generalized models, focusing on Contrastive Language–Image Pre-training (CLIP) and EfficientNet. It proposes three fusion strategies (Weighted Voting, Confidence Comparison, and Fully Connected Network Fusion) and evaluates them on the CIFAR-100 dataset. The ensemble model significantly outperforms the individual models, achieving an adjusted accuracy of up to 0.848. The paper also introduces a new evaluation metric, Confidence-Accuracy Correlation, to assess the reliability of model confidence. The findings suggest that ensemble learning can be made more adaptive and better suited to real-world applications.
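The abstract names the fusion strategies but does not define them. The sketch below shows one plausible reading of the first two. It assumes each model (for example CLIP's zero-shot class scores and EfficientNet's classifier) outputs a softmax probability vector over the same classes. Under this reading, Weighted Voting is a weighted average of the two vectors, and Confidence Comparison takes the prediction of whichever model is more confident. The weight `w_a` and the function names are hypothetical, and the paper's exact definitions may differ.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def weighted_voting(prob_a, prob_b, w_a=0.5):
    """Soft weighted vote: argmax of w_a * p_a + (1 - w_a) * p_b."""
    fused = [w_a * pa + (1 - w_a) * pb for pa, pb in zip(prob_a, prob_b)]
    return argmax(fused)


def confidence_comparison(prob_a, prob_b):
    """Use the prediction of the model whose top-class probability is higher."""
    best = prob_a if max(prob_a) >= max(prob_b) else prob_b
    return argmax(best)


# Hypothetical 3-class outputs: CLIP favours class 1, EfficientNet favours class 0.
p_clip = [0.1, 0.6, 0.3]
p_eff = [0.7, 0.2, 0.1]
print(weighted_voting(p_clip, p_eff, w_a=0.6), confidence_comparison(p_clip, p_eff))
```

Weighted voting needs `w_a` tuned on held-out data. Confidence comparison has no parameters but relies on both models being similarly calibrated, which is what a confidence-versus-accuracy measure would help check.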