ISBN (Print): 9798350350920
Human Action Recognition (HAR) is one of the most widely applied research directions in computer vision, used in human-computer interaction, Augmented Reality (AR), security monitoring, and other scenarios. However, due to the complexity of human action gestures, existing HAR methods fall short when dealing with variable human gestures and action information, and their accuracy needs to be improved. To improve accuracy, we propose a multi-dimensional network model based on SC-LSTM (Skip-Connection + LSTM). First, a Temporal Feature Extraction Module is designed based on SC-LSTM, and a Spatial Feature Extraction Module is designed based on a CNN and a Multi-Attention Mechanism, to extract latent human action features from the temporal and spatial dimensions, respectively. Then, a separate SC-LSTM classification network processes these spatio-temporal features to obtain the final HAR results. Experimental results show that, compared with other algorithms, the proposed model makes fuller use of information in the temporal dimension and thus achieves better HAR accuracy.
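To make the skip-connection idea concrete, here is a minimal PyTorch sketch of an SC-LSTM-style block, assuming it is simply a residual path added around a standard LSTM layer; the dimensions, the stacking, and the fusion with the CNN/attention branch are illustrative guesses, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SCLSTMBlock(nn.Module):
    """LSTM layer with a skip (residual) connection around it."""
    def __init__(self, dim: int):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); adding the input back onto the LSTM
        # output eases gradient flow across long action sequences.
        out, _ = self.lstm(x)
        return out + x

# Toy usage: 8 clips of 30 frames with hypothetical 64-dim pose features.
feats = torch.randn(8, 30, 64)
temporal = nn.Sequential(SCLSTMBlock(64), SCLSTMBlock(64))
print(temporal(feats).shape)  # torch.Size([8, 30, 64])
```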
ISBN (Print): 9798400700958
With the rapid advances of deep learning-based computer vision (CV) technology, digital images are increasingly consumed not by humans but by downstream CV algorithms. However, capturing high-fidelity, high-resolution images is energy-intensive. It not only dominates the energy consumption of the sensor itself (i.e., in low-power edge devices), but also contributes significant memory burdens and performance bottlenecks in the later storage, processing, and communication stages. In this paper, we systematically explore a new paradigm of in-sensor processing, termed "learned compressive acquisition" (LeCA). Targeting machine vision applications on the edge, the LeCA framework jointly learns a sensor autoencoder structure with the downstream CV algorithms to effectively compress the original image into low-dimensional features with adaptive bit depth. We employ column-parallel analog-domain processing directly inside the image sensor to perform the compressive encoding of the raw image, resulting in meaningful hardware savings and energy efficiency improvements. Evaluated within a modern machine vision processing pipeline, LeCA achieves 4x, 6x, and 8x compression ratios prior to any digital compression, with minimal accuracy losses of 0.97%, 0.98%, and 2.01% on ImageNet, outperforming existing methods. Compared with a conventional full-resolution image sensor and the state-of-the-art compressive sensing sensor, our LeCA sensor is 6.3x and 2.2x more energy-efficient, respectively, while reaching a 2x higher compression ratio.
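The core joint-learning loop can be sketched as follows, with a toy convolutional autoencoder standing in for the in-sensor analog encoder and a placeholder classifier for the downstream CV model; the adaptive bit depth and column-parallel analog circuitry are beyond what software can show, and all shapes here are assumptions.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(  # stand-in for the "sensor-side" compressive encoder
    nn.Conv2d(3, 8, 4, stride=4), nn.ReLU(),
    nn.Conv2d(8, 4, 2, stride=2),          # 3x224x224 -> 4x28x28 features
)
dec = nn.Sequential(  # lightweight decoder feeding the downstream model
    nn.ConvTranspose2d(4, 8, 2, stride=2), nn.ReLU(),
    nn.ConvTranspose2d(8, 3, 4, stride=4),
)
cls = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))  # placeholder task head

opt = torch.optim.Adam([*enc.parameters(), *dec.parameters(), *cls.parameters()], lr=1e-4)
x = torch.randn(2, 3, 224, 224)
y = torch.randint(0, 1000, (2,))
logits = cls(dec(enc(x)))                      # encode -> decode -> classify, end to end
loss = nn.functional.cross_entropy(logits, y)  # the task loss drives the compression
loss.backward()                                # to keep only features the task needs
opt.step()
```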
Face recognition systems have always been in great demand for various security-based applications, authentication being one of them. Due to the continuous upsurge of the COVID-19 pandemic and emerging variants of coronaviruses, wearing a face mask has been made mandatory in many countries, especially in crowded places. This situation poses significant challenges to face recognition systems in recognizing a person's identity under face mask-based partial occlusion. Therefore, traditional face recognition systems need an update to ascertain whether the person is wearing a mask. This manuscript offers a novel Fiducial Point-based Non-local Means De-Noising (FP-NMDN) method for data pre-processing. It also proposes two comprehensive feature extraction mechanisms: transfer learning-based models and a customized Convolutional Neural Network (CNN) model. The experiment is conducted on five popular baseline architectures, viz. Visual Geometry Group (VGG16), Residual Network (ResNet50), MobileNetV2, InceptionV3, and EfficientNetB0, with fine-tuning of hyperparameters, as well as a customized CNN architecture. A modified dense network with a new classification layer is introduced to obtain high classification accuracy at low inference time. The datasets are collected from four valid sources (Kaggle Medical Masked Face, Real-world Masked Face, Face Mask, and open-source datasets): those re-synthesized based on predefined experimental criteria form Dataset-I, and other existing datasets form Dataset-II. The experimental results reveal that our optimized transfer learning-based ResNet50 model achieves the best accuracy of 99.68% and 99.67% on Dataset-I and Dataset-II, respectively. Moreover, our customized CNN model outperforms other recent methods in terms of overhead and inference time.
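As a rough illustration of the transfer-learning setup (the freezing policy and head sizes are assumptions, not the authors' exact configuration), the ResNet50 variant with a modified dense network and new classification layer could look like:

```python
import torch.nn as nn
from torchvision import models

# Pretrained backbone; only the new dense head is fine-tuned here.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False

# Modified dense network with a new classification layer.
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, 2),  # masked vs. unmasked
)
```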
Complexity intensifies when gesticulations span various scales. Traditional scale-invariant object recognition methods often falter when confronted with case-sensitive characters in the English alphabet. The literature underscores a notable gap: the absence of an open-source, multi-scale, un-instructional gesture database featuring a comprehensive dictionary. In response, we have created the NITS (gesture scale) database, which encompasses isolated mid-air gesticulations of ninety-five alphanumeric characters. In this research, we present a scale-centric framework that addresses three critical aspects: (1) detection of smaller gesture objects: our framework excels at detecting smaller gesture objects, such as a red color marker; (2) removal of redundant self-co-articulated strokes: we propose an effective approach to eliminate the redundant self-co-articulated strokes often present in gesture trajectories; (3) a scale-variant approach for recognition: to tackle the scale vs. size ambiguity in recognition, we introduce a novel scale-variant methodology. Our experimental results reveal a substantial improvement of approximately 16% over existing state-of-the-art mid-air gesture recognition models. These outcomes demonstrate that our proposed approach successfully emulates the perceptibility of the human visual system, even when utilizing data from monophthalmic vision. Furthermore, our findings underscore the imperative need for comprehensive studies encompassing scale variations in gesture recognition.
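For the first aspect, a plausible (purely illustrative, not the authors' pipeline) way to detect a small red marker is HSV thresholding; note that red wraps around the hue axis, so two ranges are combined.

```python
import cv2
import numpy as np

def detect_red_marker(frame_bgr: np.ndarray):
    """Return (x, y, radius) of the largest red blob, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lo = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))     # low-hue reds
    hi = cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))  # high-hue reds
    mask = lo | hi
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not cnts:
        return None
    (x, y), r = cv2.minEnclosingCircle(max(cnts, key=cv2.contourArea))
    return int(x), int(y), int(r)
```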
ISBN (Print): 9783031581731; 9783031581748
Object-based image analysis (OBIA) is extensively used for the classification of High-Resolution Satellite Imagery (HRSI). Various attributes of the image segments, such as spectral, spatial, and textural attributes, can be generated for analysis and classification purposes. However, using all of these attributes may not yield high classification accuracy. Experiments have shown that a suitable subset of these features needs to be identified for faster and more accurate classification of the imagery. Filter-based methods like Chi-Square, Information-Gain, and ReliefF are widely used to identify and rank the best set of parameters. The random tree-based Boruta machine learning feature-ranking method is also used alongside the above algorithms. Subsequently, a learner is fused with a filter, and the resulting receiver operating characteristic (ROC) plot of the model is used to identify the best accuracy and the minimal set of attributes for an individual class such as roads, trees, grass, buildings, and shadow. The best set of parameters for a class is identified by the best ROC plot; the best parameters are also identified from the Boruta feature analysis. The results indicate that the identified smaller feature set helps enhance classification accuracy.
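A minimal sketch of the filter-plus-learner procedure, using scikit-learn's chi-square filter as a stand-in for the Chi-Square/Information-Gain/ReliefF rankers and the ROC AUC of a random forest to score growing attribute subsets; the data here are synthetic placeholders, not HRSI segments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import chi2
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X = X - X.min(axis=0)             # chi2 requires non-negative features
scores, _ = chi2(X, y)
order = np.argsort(scores)[::-1]  # attributes ranked best-first

for k in (3, 5, 10, 20):          # grow the subset and watch the ROC AUC
    auc = cross_val_score(RandomForestClassifier(random_state=0),
                          X[:, order[:k]], y, scoring="roc_auc", cv=5).mean()
    print(f"top-{k} attributes: AUC = {auc:.3f}")
```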
ISBN (Print): 9798350318920; 9798350318937
Open-Set Object Detection (OSOD) has emerged as a contemporary research direction addressing the detection of unknown objects. Recently, a few works have achieved remarkable performance on the OSOD task by employing contrastive clustering to separate unknown classes. In contrast, we propose a new semantic clustering-based approach to facilitate a meaningful alignment of clusters in semantic space, and introduce a class decorrelation module to enhance inter-cluster separation. Our approach further incorporates an object focus module to predict objectness scores, which enhances the detection of unknown objects. Further, we employ (i) an evaluation technique that penalizes low-confidence outputs to mitigate the risk of misclassifying unknown objects, and (ii) a new metric called HMP that combines known and unknown precision using the harmonic mean. Our extensive experiments demonstrate that the proposed model achieves significant improvements on the MS-COCO and PASCAL VOC datasets for the OSOD task.
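Reading HMP as the harmonic mean of known-class and unknown-class precision (our interpretation of the abstract; the paper's exact definition may differ), the metric is a one-liner:

```python
def hmp(p_known: float, p_unknown: float) -> float:
    """Harmonic mean of known and unknown precision."""
    if p_known + p_unknown == 0:
        return 0.0
    return 2 * p_known * p_unknown / (p_known + p_unknown)

print(hmp(0.80, 0.40))  # 0.533... -- poor unknown precision drags HMP down
```

Like any harmonic mean, HMP rewards models that balance both precisions rather than excelling at only one.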
Convolution is a fundamental operation in image processing and machine learning. Aimed primarily at maintaining image size, padding is a key ingredient of convolution, which, however, can introduce undesirable boundar...
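For reference, the standard output-size relation explains why padding maintains image size: with input side n, kernel k, padding p, and stride s, the output side is floor((n + 2p - k) / s) + 1.

```python
def conv_out(n: int, k: int, p: int, s: int = 1) -> int:
    """Output side length of a convolution (floor division)."""
    return (n + 2 * p - k) // s + 1

print(conv_out(224, k=3, p=1))  # 224 -- "same" padding preserves size
print(conv_out(224, k=3, p=0))  # 222 -- without padding the image shrinks
```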
Background: Scanning electron microscope (SEM) images acquired by e-beam tools for inspection and metrology applications are usually degraded by blurring and additive noise. Blurring sources include the intrinsic point spread function of the optics, lens aberration, and potential motion blur caused by wafer stage movements during image acquisition. Noise sources include shot noise, quantization noise, and electronic read-out noise. Image degradation caused by blurring and noise usually leads to noisy, inaccurate metrology results. For low-dosage metrology applications, metrology algorithms often fail to obtain successful measurements due to elevated levels of blurring and noise. Image restoration and enhancement are therefore necessary preprocessing steps to obtain meaningful metrology results. Initial success was obtained by applying a neural network-based framework to drastically improve image quality and metrology precision, as demonstrated in previous work. Aim: We aim to provide more details on the neural network model architecture, model regularization, and training dynamics to better understand the model's behavior. We also analyze the effect of image restoration on key metrology performance measures such as line edge roughness and mean critical dimension. Approach: Non-machine-learning image quality enhancement methods fail to restore low-quality SEM images to a satisfactory degree. More recent convolutional neural network and vision transformer-based supervised deep learning models have achieved superior performance in various low-level image processing and computer vision tasks. Nevertheless, they require a huge amount of training data containing high-quality ground truth images. Unfortunately, high-quality ground truth images for low-dosage SEM images do not exist. Instead, we use a self-supervised U-Net combined with a fully connected network (FCN) to recover low-dosage images without the need for ground truth training images.
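A highly simplified sketch of the self-supervised idea, using a Noise2Noise-style surrogate (train the network to map one noisy acquisition onto a second, independently noisy acquisition of the same field of view); the paper's actual U-Net + FCN scheme is more elaborate, and this toy network is not a U-Net.

```python
import torch
import torch.nn as nn

net = nn.Sequential(  # toy stand-in for the restoration network
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
clean = torch.rand(4, 1, 64, 64)                 # unknown in practice
noisy_a = clean + 0.1 * torch.randn_like(clean)  # two independent noisy
noisy_b = clean + 0.1 * torch.randn_like(clean)  # acquisitions of one scene
loss = nn.functional.mse_loss(net(noisy_a), noisy_b)  # no ground truth used
loss.backward()
```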
ISBN (Print): 9781510673892; 9781510673885
Synthetically-generated imagery holds the promise of being a panacea for the challenges of real-world datasets. Yet it is frequently observed that deep learning models perform worse when trained with synthetic data than with real measured imagery. In this study we present analyses and illustrations of several statistical metrics, measures, and visualization tools based on the distance and similarity between the empirical distributions of real and synthetic data in the latent feature embedding space, which provide a quantitative understanding of the image-domain distribution discrepancies hampering the generation of performant simulated datasets. We also demonstrate the practical application of these tools and techniques in a novel study comparing the latent-space embedding vector distributions of real imagery, pristine synthetic imagery, and synthetic imagery modified by physics-based degradation models. The results may assist deep learning practitioners and synthetic imagery modelers in evaluating latent-space embedding distributional dissimilarity and improving model performance when using simulation tools to generate synthetic training imagery.
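One concrete instance of such a distributional distance is the Fréchet distance between Gaussian fits of the two embedding clouds (the construction behind FID); whether this matches any of the paper's specific metrics is an assumption made for illustration.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(real: np.ndarray, synth: np.ndarray) -> float:
    """Frechet distance between Gaussian fits of two embedding sets."""
    mu_r, mu_s = real.mean(0), synth.mean(0)
    cov_r = np.cov(real, rowvar=False)
    cov_s = np.cov(synth, rowvar=False)
    covmean = sqrtm(cov_r @ cov_s).real  # drop tiny imaginary residue
    return float(((mu_r - mu_s) ** 2).sum()
                 + np.trace(cov_r + cov_s - 2 * covmean))

real = np.random.randn(1000, 64)         # real-image embeddings
synth = np.random.randn(1000, 64) + 0.5  # shifted synthetic embeddings
print(frechet_distance(real, synth))     # grows with the domain gap
```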
Local feature detection and description play a crucial role in various computer vision tasks, including image matching. Variations in illumination conditions significantly affect the accuracy of these applications, yet existing methods address this issue inadequately. In this paper, a novel algorithm based on an illumination auxiliary learning module (IALM) is introduced. First, a new local feature extractor named Illumination Auxiliary SuperPoint (IA-SuperPoint) is established by integrating IALM with SuperPoint. Second, illumination-aware auxiliary training captures the effects of illumination variations during feature extraction through tailored loss functions and a joint learning mechanism. Last, to evaluate the illumination robustness of local features, a metric is proposed that simulates various illumination disturbances. Experiments on the HPatches and RDNIM datasets demonstrate that our method greatly improves local feature extraction: compared to the baseline, the proposed method improves both mean matching accuracy and homography estimation.
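The robustness-metric idea can be sketched as follows: apply simulated illumination disturbances (here simple gamma and gain changes, chosen as assumptions) and measure how far a descriptor drifts; lower drift means more illumination-robust features. The stand-in descriptor below is deliberately trivial.

```python
import numpy as np

def perturb_illumination(img: np.ndarray, gamma: float, gain: float) -> np.ndarray:
    """Simulate an illumination change via a gamma curve and a gain."""
    out = gain * (img.astype(np.float32) / 255.0) ** gamma
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)

def descriptor_stability(img: np.ndarray, describe) -> float:
    """Mean descriptor drift over a small set of disturbances."""
    base = describe(img)
    dists = [np.linalg.norm(base - describe(perturb_illumination(img, g, k)))
             for g in (0.5, 1.5) for k in (0.7, 1.3)]
    return float(np.mean(dists))

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
describe = lambda im: im.astype(np.float32).ravel() - im.mean()
print(descriptor_stability(img, describe))
```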