Embedded computer vision systems are increasingly being adopted across various domains, playing a pivotal role in enabling advanced technologies such as autonomous vehicles and industrial automation. Their cost-effectiveness, compact size, and portability make them particularly well-suited for diverse implementations and operations. In real-time scenarios, these systems must process visual data with minimal latency, which is crucial for immediate decision-making. However, these solutions continue to face significant challenges related to computational efficiency, memory usage, and accuracy. This research addresses these challenges by enhancing classification methodologies, specifically Gray Level Co-occurrence Matrix (GLCM) feature extraction and Support Vector Machine (SVM) classifiers. To maintain a high level of accuracy while preserving performance, a smaller feature set is selected following a comprehensive complexity analysis and is further refined through Correlation-based Feature Selection (CFS). The proposed method achieves an overall classification accuracy of 84.76% with a feature set reduced by 79.2%, resulting in a 72.45% decrease in processing time, a 50% reduction in storage requirements, and up to a 77.8% decrease in memory demand during prediction. These improvements demonstrate the effectiveness of the proposed approach in improving the adaptability and capabilities of embedded vision systems (EVS), optimizing their performance under the constraints of real-time, resource-limited environments.
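As an illustrative sketch of this kind of pipeline (not the paper's exact configuration: the property subset, distances, and angles below are assumptions, whereas the paper selects its reduced feature set via complexity analysis and CFS), a compact GLCM feature vector feeding an SVM might look like this in Python:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

# Assumed reduced GLCM property subset; the paper derives its own via CFS.
PROPS = ["contrast", "homogeneity", "energy"]

def glcm_features(gray_img):
    """Compute a small GLCM feature vector from a uint8 grayscale patch."""
    glcm = graycomatrix(gray_img, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    # One value per (property, distance, angle) combination: 3 x 1 x 2 = 6 features.
    return np.hstack([graycoprops(glcm, p).ravel() for p in PROPS])

# Usage sketch, given uint8 grayscale patches X_train and labels y_train:
#   clf = SVC(kernel="linear").fit([glcm_features(im) for im in X_train], y_train)
```

Keeping only a handful of co-occurrence properties is what shrinks both the feature store and the per-prediction memory footprint on the embedded target.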
Transformers have dominated the landscape of Natural Language Processing (NLP) and revolutionized generative AI applications. Vision Transformers (VTs) have recently become the new state-of-the-art for computer vision applications. Motivated by the success of VTs in capturing short- and long-range dependencies and their ability to handle class imbalance, this paper proposes an ensemble framework of VTs for the efficient classification of Alzheimer's Disease (AD). The framework consists of four vanilla VTs and ensembles formed using hard- and soft-voting approaches. The proposed model was tested on two popular AD datasets: OASIS and ADNI. The ADNI dataset was employed to assess the models' efficacy under imbalanced and data-scarce conditions. The ensemble of VTs saw an improvement of around 2% compared to the individual models. Furthermore, the results are compared with state-of-the-art and custom-built Convolutional Neural Network (CNN) architectures and Machine Learning (ML) models under varying data conditions. The experimental results demonstrated an overall accuracy gain of 4.14% and 4.72% over the ML and CNN algorithms, respectively. The study also identifies specific limitations and proposes avenues for future research. The code used in the study is made publicly available.
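For clarity, the two voting schemes can be sketched generically in PyTorch; this assumes each ensemble member returns per-class logits and is an illustration, not the authors' released code:

```python
import torch

def soft_vote(logits_list):
    """Soft voting: average each model's class probabilities, then argmax."""
    probs = torch.stack([torch.softmax(l, dim=-1) for l in logits_list])
    return probs.mean(dim=0).argmax(dim=-1)

def hard_vote(logits_list):
    """Hard voting: majority vote over each model's argmax predictions."""
    preds = torch.stack([l.argmax(dim=-1) for l in logits_list])
    return preds.mode(dim=0).values
```

Soft voting tends to help most when the members are individually well-calibrated, which may explain ensemble gains under the imbalanced ADNI conditions.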
Damage to reinforced concrete (RC) facilities occurs through the process of natural deterioration. Machine learning can be employed to effectively identify various damage areas and ensure safety. The performance of machine vision methods depends on image quality. In this study, five image types (Types I-V) with combinations of image deficiencies pertaining to uniform illuminance, uneven illuminance, orthoimagery, tilt angle, and image blur were used to evaluate the damage recognition capabilities of the maximum likelihood (MLH), support vector machine (SVM), and random forest (RF) methods. Type I images were orthoimages with uniform illuminance, Type II images were tilted images with uniform illuminance, Type III images were orthoimages with uneven illuminance, Type IV images were tilted images with uneven illuminance, and Type V images were tilted, blurred images with uneven illuminance. MLH was most accurate (98.6%) on Type I images, and RF was least accurate (62.8%) on Type V images. Image tilt (in Type II images) did not diminish the damage recognition capabilities of the three machine learning methods (mean accuracy = 97.2%). For tilted images with uneven illuminance (Type IV), a severe expansion effect was produced, reducing the mean accuracy to 70.1%. Type III images were recognized with a mean accuracy of 87.1%; uneven illuminance increased the error rate for three classes of damage. By testing various image types, the impact of image quality on the variability of machine learning recognition is understood, improving the prospects for automated machine learning recognition in the future.
X-ray imaging technology has been used for decades in clinical tasks to reveal the internal condition of different organs, and in recent years it has become more common in other areas such as industry, security, and geography. The recent development of computer vision and machine learning techniques has also made it easier to automatically process X-ray images, and several machine learning-based object (anomaly) detection, classification, and segmentation methods have recently been employed in X-ray image analysis. Due to the high potential of deep learning in related image processing applications, it has been used in most of these studies. This survey reviews recent research on using computer vision and machine learning for X-ray analysis in industrial production and security applications, covering the applications, techniques, evaluation metrics, and datasets, and comparing the performance of those techniques on publicly available datasets. We also highlight some drawbacks in the published research and give recommendations for future research in computer vision-based X-ray analysis.
We introduce a high-performance computer vision-based intravenous (IV) infusion speed measurement system implemented as a camera application on an iPhone or Android phone. Our system uses You Only Look Once version 5 (YOLOv5), as it was designed for real-time object detection, making it substantially faster than two-stage algorithms such as R-CNN. In addition, YOLOv5 offers greater precision than its predecessors, making it more competitive with other object detection methods. However, YOLOv5 can be challenging to use on a mobile device because it requires substantial computational resources for image processing and prediction generation. We therefore chose the model optimization approach, as it requires the least effort to implement. Because NCNN is a high-performance neural network inference framework optimized for mobile platforms such as Android and iOS, we converted the YOLOv5 model to the NCNN format. Compared to previous research, our application showed less variability and higher consistency in infusion flow rate measurements.
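The usual route for this kind of conversion is PyTorch to ONNX to ncnn; the following is a hedged sketch of that pipeline (file names are illustrative, and in practice the YOLOv5 repository's own export.py is the more robust path because it handles model-specific export details):

```python
import torch

# Load the raw detection model (autoshape=False skips the pre/post-processing
# wrapper, which is easier to export) and trace it to ONNX.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", autoshape=False)
model.eval()
dummy = torch.zeros(1, 3, 640, 640)  # standard YOLOv5 input resolution
torch.onnx.export(model, dummy, "yolov5s.onnx", opset_version=12,
                  input_names=["images"], output_names=["output"])

# The ONNX graph is then converted with ncnn's command-line tool, e.g.:
#   onnx2ncnn yolov5s.onnx yolov5s.param yolov5s.bin
# and the resulting .param/.bin pair is bundled into the mobile app,
# where ncnn's runtime executes inference on-device.
```

Running inference through ncnn's mobile-optimized runtime, rather than a full PyTorch stack, is what makes real-time on-phone measurement feasible.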
This article studies the merits of applying log-gradient input images to convolutional neural networks (CNNs) for tinyML computer vision (CV). We show that log gradients enable: (i) aggressive 1-bit quantization of first-layer inputs, (ii) potential CNN resource reductions, (iii) inherent insensitivity to illumination changes (1.7% accuracy loss across a 2^-5 to 2^3 brightness variation vs. up to 10% for JPEG), and (iv) robustness to adversarial attacks (>10% higher accuracy than JPEG-trained models). We establish these results using the PASCAL RAW image dataset and through a combination of experiments using quantization threshold search, neural architecture search, and a fixed three-layer network. The latter reveals that training on log-gradient images leads to higher filter similarity, making the CNN more prunable. The combined benefits of aggressive first-layer quantization, CNN resource reductions, and operation without tight exposure control and image signal processing (ISP) help push tinyML CV toward its ultimate efficiency limits.
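The illumination insensitivity follows because a global brightness scale becomes an additive offset in the log domain, which spatial differencing then cancels. A minimal NumPy sketch of log-gradient extraction with sign-only (1-bit) quantization, where the epsilon and differencing scheme are assumptions rather than the paper's exact choices:

```python
import numpy as np

def log_gradient_1bit(raw, eps=1.0):
    """Log-gradient preprocessing with 1-bit quantization (illustrative sketch).
    raw: 2-D array of linear sensor intensities."""
    log_img = np.log(raw.astype(np.float32) + eps)  # brightness scale -> additive offset
    gx = np.diff(log_img, axis=1, append=log_img[:, -1:])  # differencing cancels the offset
    gy = np.diff(log_img, axis=0, append=log_img[-1:, :])
    # Aggressive 1-bit quantization: keep only the sign of each gradient.
    return (gx > 0).astype(np.uint8), (gy > 0).astype(np.uint8)
```

Because each first-layer input is a single bit, the first convolution reduces to additions and subtractions, which is where much of the resource saving comes from.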
The AdaMax algorithm provides enhanced convergence properties for stochastic optimization problems. In this paper, we present a regret bound for the AdaMax algorithm, offering a tighter and more refined analysis compared to existing bounds. This theoretical advancement provides deeper insights into the optimization landscape of machine learning algorithms. As a practical application, we consider the You Only Look Once (YOLO) framework, which has become well known as an extremely effective object segmentation tool, largely because of its extraordinary accuracy in real-time processing, making it a preferred option for many computer vision applications. Finally, we used this algorithm for image segmentation.
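For reference, the AdaMax update being analyzed (Kingma & Ba's infinity-norm variant of Adam) can be written compactly as below; this is the standard published rule, with the small denominator constant being an implementation safeguard rather than part of the original formulation:

```python
import numpy as np

def adamax_step(theta, grad, m, u, t, alpha=0.002, beta1=0.9, beta2=0.999):
    """One AdaMax update: exponential first moment plus an infinity-norm
    second moment, with bias correction on the first moment only."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    u = np.maximum(beta2 * u, np.abs(grad))     # infinity-norm second moment
    theta = theta - (alpha / (1 - beta1 ** t)) * m / (u + 1e-8)
    return theta, m, u
```

The max-based second moment is what makes the per-step update bounded, a property the regret analysis exploits.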
With the rapid advancement of wafer packaging technology, and especially the surging demand for chips, enhancing product quality and process efficiency has become increasingly crucial. This article delves into the automatic detection of pins on Ball Grid Arrays (BGAs) within wafer packaging processes. The system is engineered with a flexible software and hardware architecture to address evolving industrial requirements, facilitating swift adaptation to new processing standards and technological demands. By using a Programmable Logic Controller (PLC) to control a three-axis gantry slide combined with industrial camera imaging technology, the system achieves high efficiency and precise positioning, thereby delivering high-quality images. This article utilizes YOLOv10 image processing technology and machine learning algorithms to achieve accurate identification and classification of BGA defects. YOLOv10 is chosen for its outstanding recognition capabilities and swift processing speed, enabling the rapid and accurate identification of minor defects such as bent pins, missing pins, and solder ball defects. Through large-scale image analysis, the system has been shown to enhance detection accuracy and reduce the errors associated with manual inspection. This article primarily addresses issues in semiconductor manufacturing processes and improves the product yield rate of current production lines. By effectively integrating AI-based detection technology into semiconductor manufacturing, it replaces labor-intensive tasks, enhancing efficiency and precision.
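A hedged sketch of the detection step using the ultralytics Python API, which supports YOLOv10 models; the weights file, sample image, class names, and confidence threshold here are illustrative assumptions, not artifacts of this system:

```python
from ultralytics import YOLO

# Hypothetical weights fine-tuned on BGA pin/solder-ball imagery.
model = YOLO("bga_defects.pt")

# Run detection on one camera frame captured at a gantry position.
results = model.predict("bga_sample.jpg", conf=0.5)
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]       # e.g. "bent_pin"
    print(cls_name, float(box.conf), box.xyxy.tolist())  # label, score, bbox
```

In a production line, each detection would typically be mapped back to PLC coordinates so flagged packages can be diverted automatically.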
The Image Caption Generator is a popular Artificial Intelligence research tool that combines image comprehension and language generation. Creating well-structured sentences requires a thorough understanding of language in a syntactic and semantic way. Describing the substance of an image using well-structured phrases is a difficult undertaking, but it can have a significant impact in helping visually impaired people better understand image content. Image captioning has gained a lot of attention as a research subject for various computer vision and natural language processing (NLP) applications. The goal of image captioning is to create logical and accurate natural language phrases that describe an image. It relies on the captioning model to detect objects and appropriately characterise their relationships. Intuitively, it is also difficult for a machine to perceive a typical image in the same way that humans do. It does, however, provide a foundation for intelligent exploration in deep learning. In this review paper, we focus on the latest advanced techniques for image captioning. The paper highlights related methodologies and focuses on aspects that are crucial in computer recognition, as well as the numerous strategies and procedures being developed for generating image captions. We also observed that Recurrent Neural Networks (RNNs) are used in the bulk of research works (45%), followed by attention-based models (30%), transformer-based models (15%), and other methods (10%). An overview of the approaches utilised in image captioning research is discussed, and the benefits and drawbacks of these methodologies are explored, along with the most regularly used datasets and evaluation procedures in this field.
This paper presents a deep learning method for image dehazing and clarification. The main advantages of the method are its high computational speed and its use of unpaired image data for training. The method adapts the Zero-DCE approach (Li et al., IEEE Trans Pattern Anal Mach Intell 44(8):4225-4238, 2021) to the image dehazing problem, using high-order curves to adjust the dynamic range of images and achieve dehazing. Training the proposed dehazing neural network does not require paired hazy and clear datasets; instead, it utilizes a set of loss functions that assess the quality of dehazed images to drive the training process. Experiments on a large number of real-world hazy images demonstrate that the proposed network effectively removes haze while preserving details and enhancing brightness. Furthermore, on an affordable GPU-equipped laptop, the processing speed can reach 1000 FPS for images at 2K resolution, making the method highly suitable for real-time dehazing applications.
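The curve adjustment at the heart of Zero-DCE applies the quadratic mapping LE(x) = x + a·x·(1 - x) iteratively with per-pixel parameters. A minimal PyTorch sketch, under the assumption (carried over from the original Zero-DCE design, not confirmed for this paper) that the dehazing network predicts a stack of 3-channel curve-parameter maps:

```python
import torch

def apply_curves(img, alphas):
    """Iteratively apply LE(x) = x + a * x * (1 - x) to adjust dynamic range.
    img:    (B, 3, H, W) tensor with values in [0, 1]
    alphas: (B, 3 * n, H, W) per-pixel curve parameters for n iterations
    """
    for a in torch.chunk(alphas, alphas.shape[1] // 3, dim=1):
        img = img + a * img * (1.0 - img)
    return img
```

Because the mapping is a handful of elementwise operations per iteration, inference cost is dominated by the small parameter-predicting network, which is consistent with the reported 1000 FPS throughput.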