ISBN: (Print) 9781510673199; 9781510673182
The article proposes an approach to developing computationally simple and fast algorithms for data preprocessing and the selection of stable features. The following algorithms are used: 1. A modified method of multicriteria processing in local windows. The method is based on minimizing an objective function, which both reduces the noise component in locally stationary areas and preserves and strengthens the transition boundaries; 2. A method for reducing the range of clusters, which changes the number of color histograms while absorbing nearby areas and preserving objects; 3. A method of non-local color balance adjustment, which makes it possible to select areas on a dark or light background when the color balance is shifted; 4. An edge detector based on the analysis of local areas in various data layers. Effectiveness was tested on a set of test images obtained from a flip chip machine, images from a microcircuit analyzer, and data from a product production line. The analyzed frames had low resolution and poor lighting. The images were captured in the RGB color space.
ISBN: (Print) 9789464593617; 9798331519773
With the exponential growth of video data, individuals, particularly scholars in the fields of history and sociology, are increasingly reliant on video materials. However, locating specific frames within videos remains laborious and time-consuming. Advanced machine learning-assisted video processing techniques have emerged, including text-based video search, video summarization, real-time object detection, and person re-identification. Distinct from these, the main challenge of retrieving video frames based on given visual content is how to pinpoint the instance occurrences efficiently and accurately. To expedite the process while maintaining retrieval performance, we propose a two-stage approach that combines KeyFrame Extraction (KFE) and Content-Based Image Retrieval (CBIR), underpinned by a DNN-empowered framework called MoReSo. Our innovations include 1) the integration of improved statistical features with dynamic clustering in the KFE stage and 2) the development of the MoReSo framework, which consists of MobileNet and ResNet backbones with an SOA layer to jointly represent video frames, achieving a 2.67x increase in efficiency compared to existing solutions. Our framework is evaluated on two kinds of data: the annotated EHM Historical Database provided by digital history researchers, and the widely used Oxford and Paris image retrieval benchmark datasets. The experimental results show that the proposed framework and scheme outperform other models in the CBVIR task. We make our code available for further exploration through our GitHub repository, which contains the implementation of our model and a CBVIR system with a GUI prototype.
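As a rough illustration of the two-stage idea (keyframe selection followed by similarity ranking), the sketch below uses a simple similarity threshold in place of the dynamic clustering described in the abstract, and plain toy vectors in place of MoReSo embeddings. The function names, the threshold, and the data are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_keyframes(frame_features, threshold=0.9):
    """Stage 1 (hypothetical stand-in for KFE): keep a frame only when it is
    sufficiently different from the most recently kept keyframe."""
    keyframes = []
    for index, feature in enumerate(frame_features):
        if not keyframes:
            keyframes.append(index)
            continue
        last_kept = frame_features[keyframes[-1]]
        if cosine_similarity(last_kept, feature) < threshold:
            keyframes.append(index)
    return keyframes


def retrieve(query_feature, frame_features, keyframes, top_k=3):
    """Stage 2 (CBIR): rank the kept keyframes by similarity to the query."""
    scored = [(cosine_similarity(query_feature, frame_features[i]), i) for i in keyframes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:top_k]]


# Toy per-frame descriptors: two "shots", each with two near-duplicate frames.
features = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],
]
keyframes = select_keyframes(features)
best = retrieve([0.0, 1.0, 0.0], features, keyframes, top_k=1)
```

Only the keyframes are compared against the query in the second stage, which is where the time saving of a two-stage design comes from.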
ISBN: (Print) 9798350360875; 9798350360868
In applications related to traffic management, recognizing specific vehicle types is important. This research aims to improve traffic management systems by designing and implementing a lightweight Convolutional Neural Network (CNN) for vehicle-type detection from aerial photos. The goal is a model that is both accurate in classification and computationally efficient enough to provide the real-time processing capabilities required for dynamic traffic monitoring. To this end, the study employs a dataset of high-resolution aerial images taken by drones. The main issue to be addressed is that cars appear differently depending on the angles, sizes, and environmental factors present in aerial imagery. The lightweight CNN architecture is specifically designed to balance performance and computational efficiency, which is critical for deployment in real-time traffic management applications, including on low-power devices such as the Raspberry Pi. It optimizes parameter counts and employs approaches that speed up training without sacrificing accuracy. The study's key findings show that the suggested model outperforms pre-trained models in terms of both accuracy and efficiency. The model achieves a testing accuracy of 99.31% while remaining compact, making it well suited to real-time applications.
With the popularization of sports and fitness activities, how to effectively monitor and prevent sports injuries has become an important challenge. This article proposes a sports injury detection and prevention system...
ISBN: (Print) 9783031510229; 9783031510236
Face recognition is used in numerous authentication applications; unfortunately, such systems are susceptible to spoofing attacks such as paper and screen attacks. In this paper, we propose a method that is able to recognise whether a face detected in a video is not real and, if so, the type of attack performed. We propose to learn the temporal features by exploiting a 3D Convolutional Network, which is more suitable for temporal information. Besides summarizing temporal information, the 3D ConvNet allows us to build a real-time method, since analysing clips is much more efficient than analysing single frames. The learned features are classified using a binary classifier to distinguish whether the person in the video clip is real (i.e. live) or not, while a multi-class classifier recognises whether the person is real or identifies the type of attack (screen, paper, etc.). We performed our tests on 5 public datasets: Replay Attack, Replay Mobile, MSU-MSFD, Rose-Youtu, and RECOD-MPAD.
ISBN: (Print) 9798350362923; 9798350362916
Integration of artificial intelligence in industrial automation has led to significant advancements in new techniques for automation. One such aspect of industrial automation is sorting consumables on conveyor belt systems via image processing. Typically, these applications use expensive, dedicated, and focus-driven hardware and individual image-processing code. This paper discusses the development of such an image-processing sorting conveyor belt that uses low-cost processors instead of dedicated and focus-driven hardware. This is achieved by placing at the core of the system a Convolutional Neural Network (CNN), specifically tailored for hue-based image processing and implemented on a Raspberry Pi 4B. A standard Pi camera, attached to the Raspberry Pi, captures images for real-time object classification. A key innovation of the system is the use of a pixel-based trigger mechanism for image capture, which significantly improves the accuracy and efficiency of the sorting process. The system achieves an accuracy rate of 92.74% in classifying objects as trained, underscoring the efficacy of the approach. Additionally, the system operates in a dual-mode capacity, enabling not only the sorting of existing object types but also learning and adaptation to new objects through user input. This feature enhances the system's versatility and applicability in various industrial contexts. The paper details the design, implementation, and testing of this AI-driven sorting mechanism, highlighting its potential as a scalable and low-cost solution for modern industrial sorting needs.
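The abstract does not describe how the pixel-based trigger works. One plausible reading, sketched below purely as an assumption, is to compare a region of interest in the current frame against an empty-belt reference and fire the capture when enough pixels have changed. The function name, thresholds, and toy frames are hypothetical.

```python
def should_capture(frame, background, roi, diff_threshold=30, min_changed_ratio=0.2):
    """Return True when enough pixels inside the region of interest differ
    from the empty-belt background (assumed trigger rule, not the paper's).

    frame, background: 2D lists of grayscale values (0-255), same shape.
    roi: (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = roi
    changed = 0
    total = 0
    for row in range(top, bottom):
        for col in range(left, right):
            total += 1
            if abs(frame[row][col] - background[row][col]) > diff_threshold:
                changed += 1
    return total > 0 and changed / total >= min_changed_ratio


# Toy 4x4 frames: an empty belt, and a belt with a bright object in the centre.
empty_belt = [[10] * 4 for _ in range(4)]
with_object = [row[:] for row in empty_belt]
for r in (1, 2):
    for c in (1, 2):
        with_object[r][c] = 200

roi = (0, 0, 4, 4)
trigger_empty = should_capture(empty_belt, empty_belt, roi)
trigger_object = should_capture(with_object, empty_belt, roi)
```

A trigger of this kind would let the classifier run only when an object is actually in view, rather than on every frame.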
Agriculture is a vital sector for ensuring global food security and promoting sustainable development in every country. Additionally, accurate prediction of crop yield and best harvest time is vital as it will help th...
ISBN: (Print) 9798350367782; 9798350367775
In the current technological landscape, cross-modal retrieval systems have become essential, bridging the gap between diverse data types to boost accessibility and interaction across digital platforms. Our research enhances these systems by targeting the efficient handling of low-resolution inputs, a common challenge in various real-life fields, while ensuring robust performance even when high-resolution data is unavailable. The paper introduces an advancement to the Local-Global Scene Graph Matching (LGSGM) architecture for cross-modal image/text retrieval by incorporating a lightweight replacement for the scene graph generation module. The novel MiT-RelTR scene graph generation model is used to optimize the retrieval process. Our contribution improved caption retrieval, achieving a 0.4% increase in Recall@10, which indicates higher accuracy in processing textual data. Conversely, it resulted in a 0.9% decline in image retrieval Recall@10. Nonetheless, the system's inference speed improved notably, with a 38% increase in frames per second (FPS), improving its suitability for real-time applications. These findings illustrate the trade-offs and benefits of refining system components and suggest a need for balanced optimization strategies that benefit all modalities equally.
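For readers unfamiliar with the metric, Recall@K in retrieval benchmarks is commonly computed as the fraction of queries for which at least one correct item appears among the top K results. The sketch below follows that common definition; the abstract does not state the exact evaluation protocol, and the function name and toy data are illustrative assumptions.

```python
def recall_at_k(ranked_results, relevant_items, k=10):
    """Fraction of queries with at least one relevant item in the top k.

    ranked_results: dict mapping query id -> list of item ids, best first.
    relevant_items: dict mapping query id -> set of correct item ids.
    """
    if not ranked_results:
        return 0.0
    hits = 0
    for query_id, ranking in ranked_results.items():
        if relevant_items.get(query_id, set()) & set(ranking[:k]):
            hits += 1
    return hits / len(ranked_results)


# Toy example: four queries, each with a ranked list of three candidates.
rankings = {
    "q1": ["a", "b", "c"],
    "q2": ["d", "e", "f"],
    "q3": ["g", "h", "i"],
    "q4": ["j", "k", "l"],
}
ground_truth = {"q1": {"a"}, "q2": {"f"}, "q3": {"z"}, "q4": {"k"}}

score_at_1 = recall_at_k(rankings, ground_truth, k=1)
score_at_2 = recall_at_k(rankings, ground_truth, k=2)
score_at_3 = recall_at_k(rankings, ground_truth, k=3)
```

Under this definition the score can only stay the same or rise as K grows, which is why results are usually reported at a fixed K such as 10.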
This paper extends a previous conference publication that proposed a real-time task scheduling framework for criticality-based machine perception, leveraging image resizing as the tool to control the trade-off between accuracy and execution time. Criticality-based machine perception reduces the computing demand of on-board AI-based machine inference pipelines (that run on embedded hardware) in applications such as autonomous drones and cars. By segmenting inputs, such as individual video frames, into smaller parts and allowing the downstream AI-based perception module to process some segments ahead of (or at a higher quality than) others, limited machine resources are spent more judiciously on more important parts of the input (e.g., on foreground objects in lieu of backgrounds). In recent work, we explored the use of image resizing as a way to offer a middle ground between full-resolution processing and dropping, thus allowing more flexibility in handling less important parts of the input. In this journal extension, we make the following contributions: (i) We relax a limiting assumption of our prior work; namely, the need for a "perfect sensor" to identify which parts of the image are more critical. Instead, we investigate the use of real LiDAR measurements for quick-and-dirty image segmentation ahead of AI-based processing. (ii) We explore another dimension of freedom in the scheduler; namely, merging several nearby objects into a consolidated segment for downstream processing. We formulate the scheduling problem as an optimal resize-merge problem and design a solution for it. Experiments on an AI-powered embedded platform with a real-world driving dataset demonstrate the practicality and effectiveness of our proposed framework.
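To make the merging idea concrete, the sketch below greedily merges bounding boxes that lie within a pixel gap of each other into one enclosing box. This is a simple heuristic chosen for illustration, not the optimal resize-merge formulation the abstract refers to; the function names and the `max_gap` parameter are assumptions.

```python
def box_gap(a, b):
    """Larger of the horizontal and vertical gaps between two boxes
    given as (x1, y1, x2, y2); 0 if the boxes overlap or touch."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def merge_nearby(boxes, max_gap=10):
    """Greedily merge boxes whose gap is at most max_gap into enclosing boxes."""
    merged = [tuple(b) for b in boxes]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if box_gap(merged[i], merged[j]) <= max_gap:
                    a, b = merged[i], merged[j]
                    union = (min(a[0], b[0]), min(a[1], b[1]),
                             max(a[2], b[2]), max(a[3], b[3]))
                    # Replace the pair with its enclosing box and rescan.
                    merged = [m for idx, m in enumerate(merged) if idx not in (i, j)]
                    merged.append(union)
                    changed = True
                    break
            if changed:
                break
    return sorted(merged)


# Two cars close together and one far away.
detections = [(0, 0, 10, 10), (15, 0, 25, 10), (100, 100, 120, 120)]
segments = merge_nearby(detections, max_gap=10)
```

Merging reduces the number of segments the downstream model must process, at the cost of including some background pixels inside the enclosing box.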
ISBN: (Print) 9783031661457; 9783031661464
Due to the increasing number of tumors, new interventional Computed Tomography (CT) procedures have been proposed that aim to optimize workflow and enable time-effective diagnosis and treatment. To support tumor ablation procedures, CT scanners must pre-process 2D projections and reconstruct 3D slices of the human body in real time, while data are acquired. This paper proposes a lightweight processing architecture for MPSoC-FPGA that performs the "CT pre-processing phase" on the fly; this phase consists of the pixel processing of 2D images. The architecture is also suitable for exploring different data formats, which can be selected at design time to improve performance while preserving image quality. This article focuses on the cosine and redundancy weighting steps, which cannot be implemented following the standard method on embedded MPSoC-FPGA because of the high resource utilization costs of their arithmetic operations. Therefore, this work proposes different optimizations that reduce the number of operations to compute and the amount of on-chip memory required in comparison to the standard algorithm. Finally, the proposed architecture has been implemented and instantiated within a Control Data Acquisition System (CDAS) architecture running on the XC7Z045 AMD-Xilinx MPSoC-FPGA and integrated into an open-interface CT scanner assembled in our laboratory. Here, the optimized weighting steps use up to 33.8 times fewer DSPs than the implementation based on the standard solution. Furthermore, the architecture adds only 80 ns of latency, making it 7.9 times faster than the implementation based on the standard solution.
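For context on why the cosine weighting step is arithmetically costly, the sketch below computes the textbook cosine weight for a flat detector, D / sqrt(D^2 + u^2 + v^2), which needs a square root and a division per pixel. The abstract does not give the scanner geometry or the authors' optimized formulation, so the centred-detector geometry, the function names, and the parameter values here are assumptions.

```python
import math


def cosine_weight(u, v, source_to_detector):
    """Textbook cosine weight for a flat detector: D / sqrt(D^2 + u^2 + v^2).

    u, v: pixel position on the detector relative to the central ray, in the
    same length unit as source_to_detector (assumed geometry).
    """
    d = source_to_detector
    return d / math.sqrt(d * d + u * u + v * v)


def weight_table(n_rows, n_cols, pixel_pitch, source_to_detector):
    """Precompute weights for an n_rows x n_cols detector centred on the central ray."""
    table = []
    for r in range(n_rows):
        v = (r - (n_rows - 1) / 2.0) * pixel_pitch
        row = []
        for c in range(n_cols):
            u = (c - (n_cols - 1) / 2.0) * pixel_pitch
            row.append(cosine_weight(u, v, source_to_detector))
        table.append(row)
    return table


# Tiny 3x3 detector with 1 mm pixels and a 1000 mm source-to-detector distance.
table = weight_table(n_rows=3, n_cols=3, pixel_pitch=1.0, source_to_detector=1000.0)
```

Because the weights depend only on the geometry, a table like this can be computed once and reused for every projection, and with a centred detector it is symmetric about the central ray.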