Grayscale video capture remains a popular, low-cost approach for security and surveillance-related tasks, especially on edge devices. We create a deep-learning solution to colorize grayscale videos in real-time withou...
The proliferation of drone technology in surveillance, media, and commercial applications has intensified the need for robust privacy protection measures, especially in regions with strict data protection laws like th...
ISBN (Print): 9798350320565
Due to the rapid temporal and fine-grained nature of complex human assembly atomic actions, traditional action segmentation approaches that require spatial (and often temporal) downsampling of video frames often lose the vital fine-grained spatial and temporal information required for accurate classification within the manufacturing domain. In order to fully utilise the higher-resolution video data often collected within the manufacturing domain and to facilitate the real-time, accurate action segmentation required for human-robot collaboration, we present a novel hand-location-guided, high-resolution feature-enhanced model. We also propose a simple yet effective method for deploying offline-trained action recognition models for real-time action segmentation of temporally short, fine-grained actions, through surround sampling during training and temporally aware label cleaning at inference. We evaluate our model on a novel action segmentation dataset containing 24 atomic actions (plus background) drawn from video of a real-world robotic assembly production line. We show that both high-resolution hand features and traditional frame-wide features improve fine-grained atomic action classification, and that through temporally aware label cleaning our model is capable of surpassing similar encoder/decoder methods while allowing for real-time classification.
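The abstract leaves the temporally aware label cleaning step at a high level; the snippet below is a minimal Python sketch of one plausible form of it: a centred sliding-window majority vote over per-frame class predictions that suppresses single-frame flicker from an offline-trained recognition model run frame by frame. The function name, window size, and voting rule are assumptions rather than the authors' implementation.

    from collections import Counter

    def clean_labels(frame_predictions, window=7):
        """Sliding-window majority vote over per-frame class predictions.

        An illustrative stand-in for temporally aware label cleaning: each
        frame's label is replaced by the most common label in a centred
        window, removing isolated spurious predictions.
        """
        half = window // 2
        cleaned = []
        for i in range(len(frame_predictions)):
            lo = max(0, i - half)
            hi = min(len(frame_predictions), i + half + 1)
            votes = Counter(frame_predictions[lo:hi])
            cleaned.append(votes.most_common(1)[0][0])
        return cleaned

    # A one-frame burst of class 3 is smoothed away:
    print(clean_labels([0, 0, 0, 3, 0, 0, 5, 5, 5, 5]))
    # -> [0, 0, 0, 0, 0, 0, 5, 5, 5, 5]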
ISBN (Print): 9798350360875; 9798350360868
In applications related to traffic management, recognizing specific vehicle types is important. This research aims to improve traffic management systems by designing and implementing a lightweight Convolutional Neural Network (CNN) for vehicle-type detection from aerial photos. The study aims to develop a model that is both accurate in classification and computationally efficient, providing the real-time processing capability required for dynamic traffic monitoring. It does this by employing a dataset of high-resolution aerial images captured by drones. The main issue to be addressed is that vehicles appear differently depending on the angles, sizes, and environmental factors present in aerial imagery. The lightweight CNN architecture is specifically designed to balance performance and computational efficiency, which is critical for deployment in real-time traffic management applications, including low-power devices such as the Raspberry Pi. It optimizes parameter counts and employs approaches that speed up training without sacrificing accuracy. The study's key findings show that the proposed model outperforms pre-trained models in terms of both accuracy and efficiency. The model achieves a testing accuracy of 99.31% while remaining compact, making it ideal for real-time applications.
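The paper's exact network is not reproduced above, so the following PyTorch sketch only illustrates the kind of parameter-light design the abstract describes, using depthwise separable convolutions to keep the parameter count small enough for a device like the Raspberry Pi; the layer widths, class count, and the name LightweightVehicleCNN are assumptions.

    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        """Depthwise + pointwise convolution: a standard way to cut parameters."""
        def __init__(self, in_ch, out_ch, stride=1):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1,
                                       groups=in_ch, bias=False)
            self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
            self.bn = nn.BatchNorm2d(out_ch)
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.act(self.bn(self.pointwise(self.depthwise(x))))

    class LightweightVehicleCNN(nn.Module):
        """Illustrative compact classifier for aerial vehicle-type images."""
        def __init__(self, num_classes=4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                DepthwiseSeparableConv(16, 32, stride=2),
                DepthwiseSeparableConv(32, 64, stride=2),
                DepthwiseSeparableConv(64, 128, stride=2),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(128, num_classes)

        def forward(self, x):
            x = self.features(x).flatten(1)
            return self.classifier(x)

    model = LightweightVehicleCNN(num_classes=4)
    print(sum(p.numel() for p in model.parameters()))  # total parameter count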
Searching for available parking spaces can be a painful experience for drivers, who must drive around until they find a vacant spot. This study proposes a new method to automatically detect available parking spaces. The proposed system identifies empty parking spaces using grayscale images obtained from any type of video camera. The method was found to successfully identify parking availability under different conditions. The method was tested using real-life data and achieved a detection rate of 99.7%. This method can be applied in real time to monitor parking availability and guide drivers to empty spaces. The method has several advantages, including simple algorithms, the use of low-quality black-and-white images, and simple hardware. Moreover, the system can provide enormous cost savings for locations with existing black-and-white surveillance cameras, since the existing cameras do not need to be replaced with new high-quality ones.
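The abstract only states that the algorithms are simple and operate on grayscale frames; the sketch below, assuming predefined spot coordinates and a reference image of the empty lot, shows one such scheme: a spot is flagged as occupied when enough of its pixels differ from the reference. The ROI coordinates, thresholds, and function names are illustrative, not the paper's method.

    import cv2
    import numpy as np

    # Hypothetical (x, y, w, h) regions of individual spots in the camera view.
    PARKING_SPOTS = [(40, 60, 50, 90), (100, 60, 50, 90), (160, 60, 50, 90)]

    def occupied(gray_frame, empty_reference, roi, diff_threshold=25, fill_ratio=0.2):
        """A spot is occupied when enough pixels differ from the empty-lot reference."""
        x, y, w, h = roi
        patch = gray_frame[y:y + h, x:x + w]
        ref = empty_reference[y:y + h, x:x + w]
        diff = cv2.absdiff(patch, ref)
        changed = np.count_nonzero(diff > diff_threshold)
        return changed / float(w * h) > fill_ratio

    def available_spots(gray_frame, empty_reference):
        """Indices of spots currently judged to be empty."""
        return [i for i, roi in enumerate(PARKING_SPOTS)
                if not occupied(gray_frame, empty_reference, roi)]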
ISBN (Digital): 9789819916399
ISBN (Print): 9789819916382; 9789819916399
When an image algorithm is applied directly to a video scene and the video is processed frame by frame, an obvious pixel-flickering phenomenon occurs; this is the problem of temporal inconsistency. In this paper, a temporal consistency enhancement algorithm based on pixel flicker correction is proposed to enhance video temporal consistency. The algorithm consists of a temporal stabilization module (TSM-Net), an optical flow constraint module, and a loss calculation module. The innovation of TSM-Net is that a ConvGRU network is embedded layer by layer, with a dual-channel parallel structure, in the decoder, which effectively enhances the network's ability to extract information in the temporal domain through feature fusion. This paper also proposes a hybrid loss based on optical flow, which sums the temporal loss and the spatial loss to better balance the contributions of the two terms during training. It improves temporal consistency while ensuring better perceptual similarity. Since the algorithm does not require optical flow during testing, it achieves real-time performance. This paper conducts experiments on public datasets to verify the effectiveness of the pixel flicker correction algorithm.
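As a rough illustration of a hybrid loss of the kind described (the exact formulation used in the paper is not reproduced above), the PyTorch sketch below combines a spatial L1 term toward the per-frame processed target with a temporal term that penalises deviation from the previous output warped by backward optical flow; the weighting lam and the function names are assumptions.

    import torch
    import torch.nn.functional as F

    def warp(frame, flow):
        """Warp a frame (N, C, H, W) with a backward optical flow (N, 2, H, W)."""
        n, _, h, w = frame.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2, H, W)
        coords = base.unsqueeze(0) + flow                              # (N, 2, H, W)
        # Normalise pixel coordinates to [-1, 1] for grid_sample.
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)                           # (N, H, W, 2)
        return F.grid_sample(frame, grid, align_corners=True)

    def hybrid_loss(out_t, out_prev, processed_t, flow_prev_to_t, lam=10.0):
        """Spatial fidelity to the per-frame target plus temporal smoothness."""
        spatial = F.l1_loss(out_t, processed_t)
        temporal = F.l1_loss(out_t, warp(out_prev, flow_prev_to_t))
        return spatial + lam * temporal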
Agriculture is a vital sector for ensuring global food security and promoting sustainable development in every country. Additionally, accurate prediction of crop yield and best harvest time is vital as it will help th...
ISBN (Print): 9798350362923; 9798350362916
The integration of artificial intelligence in industrial automation has led to significant advancements in new automation techniques. One such aspect of industrial automation is sorting consumables on conveyor belt systems via image processing. Typically, these applications use expensive, dedicated, purpose-built hardware and bespoke image-processing code. This paper discusses the development of such an image-processing sorting conveyor belt using low-cost processors instead of dedicated, purpose-built hardware. This is achieved by placing at the core of the system a Convolutional Neural Network (CNN), specifically tailored for hue-based image processing and implemented on a Raspberry Pi 4B. A standard Pi camera attached to the Raspberry Pi captures images for real-time object classification. A key innovation of the system is the use of a pixel-based trigger mechanism for image capture, which significantly improves the accuracy and efficiency of the sorting process. The system achieves an accuracy rate of 92.74% in classifying the objects it has been trained on, underscoring the efficacy of the approach. Additionally, the system operates in a dual-mode capacity, enabling not only the sorting of existing object types but also the learning of, and adaptation to, new objects through user input. This feature enhances the system's versatility and applicability in various industrial contexts. The paper details the design, implementation, and testing of this AI-driven sorting mechanism, highlighting its potential as a scalable and low-cost solution for modern industrial sorting needs.
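The pixel-based trigger is not specified in detail above; the OpenCV sketch below assumes a fixed strip of the frame over the belt is monitored and a capture fires when enough pixels deviate from the belt's hue, after which the saved image would be handed to the classifier. The ROI, belt hue, thresholds, and camera index are hypothetical.

    import cv2
    import numpy as np

    TRIGGER_ROI = (200, 0, 40, 480)   # hypothetical strip across the belt (x, y, w, h)
    BELT_HUE = 30                     # assumed hue of the empty belt surface

    def object_present(frame_bgr, hue_tolerance=15, pixel_fraction=0.05):
        """True when enough pixels in the trigger strip differ from the belt hue."""
        x, y, w, h = TRIGGER_ROI
        hsv = cv2.cvtColor(frame_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
        hue = hsv[:, :, 0].astype(np.int16)
        deviating = np.count_nonzero(np.abs(hue - BELT_HUE) > hue_tolerance)
        return deviating / float(w * h) > pixel_fraction

    cap = cv2.VideoCapture(0)          # e.g. the Pi camera exposed via V4L2
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if object_present(frame):
            cv2.imwrite("capture.jpg", frame)   # hand the frame to the CNN classifier
            break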
With the increasing popularity of digital video applications, video restoration techniques have become increasingly important. This paper presents a flow-based video restoration method that aims to achieve high-qualit...
ISBN (Print): 9798350367782; 9798350367775
In the current technological landscape, cross-modal retrieval systems have become essential, bridging the gap between diverse data types to boost accessibility and interaction across digital platforms. Our research enhances these systems by aiming for the efficient handling of low-resolution inputs, a common challenge in many real-life settings, while ensuring robust performance even when high-resolution data is unavailable. The paper introduces an advancement to the Local-Global Scene Graph Matching (LGSGM) architecture for cross-modal image/text retrieval by incorporating a lightweight replacement for the scene graph generation module. The novel MiT-RelTR scene graph generation model is used to optimize the retrieval process. Our contribution improved caption retrieval by achieving a 0.4% increase in Recall@10, signifying boosted accuracy in processing textual data. Conversely, it resulted in a 0.9% decline in image retrieval Recall@10. Nonetheless, the system's inference speed improved notably, with a 38% increase in frames per second (FPS), bolstering its fitness for real-time applications. These findings illustrate the trade-offs and benefits of refining system components and suggest a need for balanced optimization strategies that benefit all modalities equally.
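Recall@10 is the standard retrieval metric being reported; as a small, self-contained illustration (not tied to the LGSGM codebase), the sketch below computes Recall@K from a query-gallery similarity matrix in which the ground-truth match for query i is assumed to sit at gallery index i.

    import numpy as np

    def recall_at_k(similarity, k=10):
        """Fraction of queries whose ground-truth item ranks in the top K.

        similarity[i, j] scores query i (e.g. an image) against gallery item j
        (e.g. a caption); the correct match for query i is assumed to be item i.
        """
        ranks = np.argsort(-similarity, axis=1)                 # best match first
        hits = (ranks[:, :k] == np.arange(len(similarity))[:, None]).any(axis=1)
        return hits.mean()

    # Toy check: a perfect diagonal similarity matrix gives Recall@1 = 1.0
    print(recall_at_k(np.eye(3), k=1))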