Anomaly detection (AD) is a challenging problem in computer vision. Particularly in the field of medical imaging, AD poses even more challenges for a number of reasons, including the insufficient availability of ground-truth (annotated) data. In recent years, AD models based on generative adversarial networks (GANs) have made significant progress. However, their effectiveness in biomedical imaging remains underexplored. In this paper, we present an overview of using GANs for AD, together with an investigation of state-of-the-art GAN-based AD methods for biomedical imaging and a detailed account of the challenges they encounter. We have also specifically investigated the advantages and limitations of AD methods on medical image datasets, conducting experiments with three AD methods on seven medical imaging datasets spanning different modalities and organs/tissues. Given the widely varying findings across these experiments, we further analyzed the results from both data-centric and model-centric points of view. The results showed that none of the methods performed reliably for detecting abnormalities in medical images. Factors such as the number of training samples, the subtlety of the anomaly, and the dispersion of the anomaly within the images are among those that strongly impact the performance of AD models. The obtained results were highly variable (AUC: 0.475-0.991; Sensitivity: 0.17-0.98; Specificity: 0.14-0.97). In addition, we provide recommendations for the deployment of AD models in medical imaging and outline important research directions.
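The three metrics this survey reports (AUC, sensitivity, specificity) can be made concrete. Below is a minimal, self-contained sketch of how they are computed from an anomaly detector's scores; the labels, scores, and threshold are hypothetical, with higher score meaning "more anomalous":

```python
# Sketch: evaluating an anomaly detector with AUC, sensitivity, and specificity.
# All data below is illustrative, not from the surveyed experiments.

def auc(labels, scores):
    """Mann-Whitney AUC: probability a random anomaly outscores a random normal."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP) at a score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

labels = [0, 0, 0, 1, 1, 0, 1, 0]                     # 1 = anomalous image
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.75]   # detector outputs
print(round(auc(labels, scores), 3))                  # 0.933
sens, spec = sensitivity_specificity(labels, scores, threshold=0.6)
print(sens, spec)                                     # 1.0 0.8
```

Because sensitivity and specificity depend on the chosen threshold while AUC does not, the wide per-metric ranges quoted above are not directly comparable across methods unless the thresholding rule is also fixed.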
Printed circuit boards (PCBs) are becoming more complex as technology advances, with new components being added and their architecture changing. One of the most crucial quality control procedures is PCB surface inspection, since eve...
Large Vision-Language Models (LVLMs) have shown remarkable performance on many vision-language tasks. However, these models still suffer from multimodal hallucination, which means the generation of objects or content ...
Electromyography (EMG) signals have been used in designing muscle-machine interfaces (MuMIs) for various applications, ranging from entertainment (EMG-controlled games) to human assistance and human augmentation (EMG-controlled prostheses and exoskeletons). For this, classical machine learning methods such as Random Forest (RF) models have been used to decode EMG signals. However, these methods depend on several stages of signal pre-processing and the extraction of hand-crafted features to obtain the desired output. In this work, we propose EMG-based frameworks for decoding object motions in the execution of dexterous, in-hand manipulation tasks, using raw EMG signals as input and two novel deep learning (DL) techniques called Temporal Multi-Channel Transformers and Vision Transformers. The results obtained are compared, in terms of accuracy and speed of decoding the motion, with RF-based models and Convolutional Neural Networks as a benchmark. The models are trained for 11 subjects in motion-object-specific and motion-object-generic ways, using a 10-fold cross-validation procedure. This study shows that the performance of MuMIs can be improved by employing DL-based models with raw myoelectric activations instead of developing DL or classic machine learning models with hand-crafted features.
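The 10-fold cross-validation procedure used above can be sketched in isolation. This is a generic fold splitter, not the authors' code; the sample count is an arbitrary assumption:

```python
# Sketch: k-fold cross-validation index generation, as used to train the
# motion-object-specific and motion-object-generic EMG models. Sample count
# and fold count below are illustrative.

def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_idx, test_idx) partitions; fold sizes differ by at most 1."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop

folds = list(k_fold_indices(25, n_folds=10))
print(len(folds))                               # 10 train/test splits
print(sorted(i for _, t in folds for i in t))   # each sample tested exactly once
```

For EMG data, folds would typically be built over windows or trials rather than individual samples, so that temporally adjacent frames do not leak between train and test splits.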
Moving Object Segmentation (MOS) is a fundamental task in computer vision. Due to undesirable variations in the background scene, MOS becomes very challenging for static and moving camera sequences. Several deep learning methods have been proposed for MOS with impressive performance. However, these methods show performance degradation on unseen videos, and deep learning models usually require large amounts of data to avoid overfitting. Recently, graph learning has attracted significant attention in many computer vision applications, since it provides tools to exploit the geometrical structure of data. In this work, concepts of graph signal processing are introduced for MOS. First, we propose a new algorithm composed of segmentation, background initialization, graph construction, unseen sampling, and a semi-supervised learning method inspired by the theory of recovery of graph signals. Second, theoretical developments are introduced, showing one bound for the sample complexity in semi-supervised learning and two bounds for the condition number of the Sobolev norm. Our algorithm has the advantage of requiring less labeled data than deep learning methods while achieving competitive results on both static and moving camera videos. Our algorithm is also adapted for Video Object Segmentation (VOS) tasks and is evaluated on six publicly available datasets, outperforming several state-of-the-art methods in challenging conditions.
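The semi-supervised recovery step can be illustrated with a simplified stand-in: Tikhonov (Laplacian-smoothness) recovery of a graph signal from a few labeled nodes, a special case of the Sobolev-norm recovery the paper builds on. The graph, edge weights, and seed labels below are toy assumptions:

```python
# Sketch: recover a label signal on a graph from two labeled nodes by
# minimizing  sum_{i in S} (x_i - y_i)^2 + alpha * x^T L x  via gradient
# descent. Two clusters joined by a weak bridge; all values illustrative.

edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.1, (3, 4): 1.0}
n = 5
labeled = {0: 1.0, 4: -1.0}   # seeds: +1 = "object", -1 = "background"

def laplacian_times(x):
    """Compute L @ x for the weighted graph Laplacian L = D - W."""
    out = [0.0] * n
    for (i, j), w in edges.items():
        out[i] += w * (x[i] - x[j])
        out[j] += w * (x[j] - x[i])
    return out

def recover(alpha=1.0, step=0.05, iters=5000):
    x = [0.0] * n
    for _ in range(iters):
        lx = laplacian_times(x)
        grad = [2 * alpha * lx[i] for i in range(n)]  # smoothness term
        for i, y in labeled.items():
            grad[i] += 2 * (x[i] - y)                 # data fidelity on labeled nodes
        x = [x[i] - step * grad[i] for i in range(n)]
    return x

x = recover()
print([round(v, 2) for v in x])   # ≈ [0.87, 0.73, 0.6, -0.73, -0.87]
```

The weak 0.1 bridge lets the two seed labels dominate their own clusters, so unlabeled nodes 1-2 are pulled positive and node 3 negative; the full method additionally uses the Sobolev norm x^T (L + εI)^β x and principled sampling of the unlabeled set.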
Violence-related incidents have recently surged in areas including footpaths, sports stadiums, remote roads, liquor stores, and elevators, and are tragically discovered only after some time. In exploring this issue, video analysis models capable of detecting violent acts from sequences of video clips have evolved. However, recent studies on violence detection mostly rely on traditional hand-crafted features, achieve limited detection accuracy, and do not make full use of deep learning advances in computer vision. The proposed system puts forth a violence detection framework based on a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) feature extraction process, fine-tuning the image-frame hyperparameters from the extracted features using a Random Forest classifier whose weight scores are updated through the Weighted Least Squares (WLS) algorithm. The model first undergoes the feature extraction phase, in which the image frames are segmented through a mosaicking pre-processing step with a 30:20 enlargement ratio for the image mosaics, helping to generate time-consistent outcomes and improving the algorithm's performance by minimizing the search space. The integrated CNN-LSTM framework reduces the complexity of the feature-learning process, with the LSTM network correlating feature values with past information and retaining memory. A dynamic weighting scheme is proposed with the WLS method, and the resulting weight score is assigned to the most probable class in the decision tree. Hyperparameters are tuned through the Random Forest classifier, which dynamically categorizes the outcomes as non-fight or fight clips. The comparative performance evaluation of the proposed framework (DFE-WLSRF, Deep Feature Extraction - Weighted Least Squares Random Forest classifier) demonstrated outperforming, high-accuracy results in comparison to o
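The WLS weighting idea can be shown in isolation. The sketch below is not the DFE-WLSRF pipeline, only a minimal weighted-least-squares fit on toy 1-D data, showing how down-weighting an unreliable sample changes the fitted model (data and weights are assumptions):

```python
# Sketch: weighted least squares, y = b0 + b1*x, minimizing
# sum_i w_i * (y_i - b0 - b1*x_i)^2 in closed form. Toy data only.

def weighted_least_squares(xs, ys, ws):
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    # Solve the 2x2 normal equations [sw sx; sx sxx] [b0; b1] = [sy; sxy]
    det = sw * sxx - sx * sx
    b1 = (sw * sxy - sx * sy) / det
    b0 = (sy - sx * b1) / sw
    return b0, b1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.1, 9.0]          # last point is an outlier
uniform = weighted_least_squares(xs, ys, [1, 1, 1, 1])
downweighted = weighted_least_squares(xs, ys, [1, 1, 1, 0.05])
print(uniform, downweighted)       # down-weighting the outlier flattens the slope
```

In the framework above, an analogous weight score is attached to the most probable class in the decision trees rather than to regression samples, but the mechanism, re-weighting contributions by reliability, is the same.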
Although image captioning has a vast array of applications, it has not reached its full potential in languages other than English. Arabic, for instance, although the native language of more than 400 million people, re...
This paper presents a comprehensive comparative analysis of image partitioning and compression mechanisms, two fundamental techniques in image processing and data compression. Image partitioning involves dividing an i...
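The partitioning step described above, dividing an image into fixed-size blocks, can be sketched minimally; the "image" here is a toy nested list and the block size is an arbitrary assumption:

```python
# Sketch: non-overlapping block partitioning of a 2D image, the first stage
# of many block-based compression schemes. Toy 4x6 image; block size assumed.

def partition(image, bh, bw):
    """Split a 2D array into non-overlapping bh x bw blocks, row-major order."""
    h, w = len(image), len(image[0])
    assert h % bh == 0 and w % bw == 0, "block size must tile the image exactly"
    return [
        [row[c:c + bw] for row in image[r:r + bh]]
        for r in range(0, h, bh)
        for c in range(0, w, bw)
    ]

image = [[r * 6 + c for c in range(6)] for r in range(4)]  # 4x6 gradient
blocks = partition(image, 2, 3)
print(len(blocks))        # 4 blocks, each 2x3
print(blocks[0])          # top-left block: [[0, 1, 2], [6, 7, 8]]
```

Each block can then be transformed and quantized independently, which is what makes partitioning a natural front-end for compression.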
ISBN (digital): 9798331529505
ISBN (print): 9798331529512
High Dynamic Range (HDR) imaging has become a significant technological advancement in visual data processing, allowing for the capture of a wider dynamic range of luminance levels in images. This paper explores various HDR processing techniques and their potential applications in automation and machine vision. By using methods such as multiple image fusion, image registration, and tone mapping, the paper demonstrates how HDR processing can enhance visual data in automated systems, improving accuracy in environments with complex lighting conditions. This work applies HDR algorithms to real-world scenarios, showcasing their potential in industrial automation and robotics, where accurate visual data plays a crucial role.
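The tone-mapping stage mentioned above can be illustrated with the classic global Reinhard operator, L_d = L / (1 + L), which compresses an unbounded luminance range into [0, 1). The input luminances below are illustrative, not from the paper's experiments:

```python
# Sketch: global tone mapping with the Reinhard operator, one of the standard
# tone-mapping choices in HDR pipelines. Luminance values are illustrative.

def reinhard_tonemap(luminances):
    """Map scene luminance L >= 0 to display luminance L / (1 + L) in [0, 1)."""
    return [L / (1.0 + L) for L in luminances]

hdr = [0.01, 0.5, 4.0, 100.0, 10000.0]   # ~6 orders of magnitude of luminance
ldr = reinhard_tonemap(hdr)
print([round(v, 4) for v in ldr])        # monotone, all values in [0, 1)
```

The operator preserves ordering while spending most of the output range on mid-tones, which is why bright highlights (100 vs 10000) map to nearly the same display value; practical pipelines often add a per-image key scaling before this step.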
Computer vision and biometrics benefit from recent advances in pattern recognition and artificial intelligence, which tend to make model-based face recognition more efficient. Deep learning combined with data augmentation also tends to enrich the training sets used for learning tasks. Nevertheless, face recognition is still challenging, especially because of imaging issues that occur in practice, such as changes in lighting, appearance, head posture, and facial expression. In order to increase the reliability of face recognition, we propose a novel supervised appearance-based face recognition method which creates a low-dimensional orthogonal subspace that enforces face class separability. The proposed approach uses data augmentation to mitigate the problem of training sample scarcity. Unlike most face recognition approaches, it is capable of efficiently handling grayscale and color face images, as well as low- and high-resolution face images. Moreover, the proposed supervised method preserves class structure better than typical unsupervised approaches, and also preserves data better than typical supervised approaches, since it obtains an orthogonal discriminating subspace that is not affected by the singularity problem common in such cases. Furthermore, a soft-margin Support Vector Machine classifier is learnt in the low-dimensional subspace and tends to be robust to the noise and outliers commonly found in practical face recognition. To validate the proposed method, an extensive set of face identification experiments was conducted on three challenging public face databases, comparing the proposed method with methods representative of the state of the art. The proposed method tends to present higher recognition rates on all databases. In addition, the experiments suggest that data augmentation also plays an essential role in appearance-based face recognition, and that the CIELAB color space (L*a*b) is generally mor
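The orthogonality of the discriminating subspace is the key structural property here. A minimal sketch of how an orthonormal basis is built from candidate directions, via classical Gram-Schmidt, is below; the 3-D vectors are toy stand-ins and the discriminant criterion that would choose them is omitted:

```python
# Sketch: Gram-Schmidt orthonormalization of candidate subspace directions,
# the kind of orthogonalization an orthogonal discriminating subspace relies
# on. Input vectors are illustrative toy directions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return an orthonormal basis spanning the input vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:                      # subtract components along earlier vectors
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:                     # drop linearly dependent directions
            basis.append([wi / norm for wi in w])
    return basis

B = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
print(len(B))                     # 2 orthonormal basis vectors
print(round(abs(dot(B[0], B[1])), 6))   # near zero: mutually orthogonal
```

Projecting face images onto such a basis gives the low-dimensional representation in which the soft-margin SVM is then trained; orthogonality keeps the projected features uncorrelated in the geometric sense and sidesteps the singularity issues the abstract mentions.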