Cross-age face recognition aims to identify people across large age differences. It is of great significance in security, finance, crime fighting, and other fields. In order to organize the development ...
ISBN: 9781665445092 (print)
We introduce a framework that predicts the goals behind observable human action in video. Motivated by evidence in developmental psychology, we leverage video of unintentional action to learn video representations of goals without direct supervision. Our approach models videos as contextual trajectories that represent both low-level motion and high-level action features. Experiments and visualizations show our trained model is able to predict the underlying goals in video of unintentional action. We also propose a method to "automatically correct" unintentional action by leveraging gradient signals of our model to adjust latent trajectories. Although the model is trained with minimal supervision, it is competitive with or outperforms baselines trained on large (supervised) datasets of successfully executed goals, showing that observing unintentional action is crucial to learning about goals in video.
Oracle bones, one of the earliest and most influential writing systems in China, have significantly contributed to the fields of archaeology and paleography. Despite the scarcity of oracle bone data available for scan...
ISBN: 9798350351194; 9798350351187 (print)
Sign language is a language primarily used by the hearing-impaired for communication and has more than 200 variations worldwide. Communication is nearly impossible between signers of different variations. Moreover, for a person with normal hearing, learning sign language can be challenging because the syntax of sign language differs from that of natural language. Machine translation of signs offers a potential solution to these challenges, facilitating communication for everyone. This study attempts to enhance the performance of the existing state-of-the-art sign language translation model, the Gloss attention SLT network (GASLT), through the integration of a multimodal approach. By combining RGB video with 3D pose data extracted using MediaPipe, our multimodal method significantly enhances GASLT's results. We conducted two experiments involving the fusion of video and pose data with the GASLT model. These experiments led to an 18.39% improvement in the model's BLEU score compared to the original model, showcasing the effectiveness of the multimodal approach in enhancing sign translation.
Numerical analysis is the focus of this study, which employs the Adam optimizer in conjunction with CNNs. Computer vision and deep learning have enabled exact number recognition, which is crucial for many current appl...
ISBN: 9798350377040; 9798350377033 (print)
We propose a comprehensive computer vision framework that integrates multi-scale signal processing with an enhanced ConvNeXt-YOLO architecture for robust object detection. Our framework addresses three critical challenges in visual recognition: multi-scale feature representation, signal quality enhancement, and model generalization. The framework implements a signal processing pipeline for image preprocessing. First, we develop an adaptive resolution normalization algorithm that maintains consistent feature quality across varying input dimensions. Next, we design a context-aware Gaussian filtering mechanism that optimizes the signal-to-noise ratio while preserving essential feature characteristics. These preprocessing techniques significantly enhance the framework's capability to extract discriminative features and maintain computational stability. To optimize the learning process, we introduce a systematic data augmentation strategy incorporating both geometric and signal-level transformations. Our approach combines predetermined rotation sampling (90 degrees, 180 degrees, 270 degrees) with continuous-space ROI augmentation during inference. This hybrid strategy enables the framework to achieve rotation invariance and enhanced generalization capabilities, which is particularly beneficial for complex object detection scenarios. The core innovation lies in our architectural integration of ConvNeXt with YOLO. We redesign the feature extraction backbone using hierarchical ConvNeXt blocks, enabling efficient multi-scale feature learning. The cross-branch information fusion mechanism, coupled with our signal-aware design, substantially improves the model's representational capacity. Experimental results on standard computer vision benchmarks demonstrate superior performance, achieving state-of-the-art accuracy (improvement of X%) and recall rates (improvement of Y%) compared to conventional approaches.
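The predetermined rotation sampling mentioned in this abstract can be illustrated with a minimal sketch. It assumes an image is a plain list of rows and that rotations are clockwise; the function names `rotate90` and `rotation_samples` are hypothetical and are not taken from the paper, which does not publish its implementation here. The continuous-space ROI augmentation is not covered.

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    # Reversing the row order and transposing yields a clockwise quarter turn.
    return [list(row) for row in zip(*image[::-1])]


def rotation_samples(image):
    """Return the 90, 180 and 270 degree rotations of an image.

    The angle set follows the abstract; the clockwise direction is an
    assumption made for this sketch.
    """
    samples = {}
    current = image
    for angle in (90, 180, 270):
        current = rotate90(current)
        samples[angle] = current
    return samples


if __name__ == "__main__":
    toy_image = [[1, 2],
                 [3, 4]]
    for angle, rotated in rotation_samples(toy_image).items():
        print(angle, rotated)
```

In a detection setting the bounding boxes would have to be rotated together with the pixels; that bookkeeping is omitted to keep the sketch short.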
In the aspect of intelligence, we try to use computer vision technology to analyze the face recognition information collected by the camera, so as to achieve the functions of early warning, prevention and active monit...
ISBN: 9781665445092 (print)
The problem of novelty detection in fine-grained visual classification (FGVC) is considered. An integrated understanding of the probabilistic and distance-based approaches to novelty detection is developed within the framework of convolutional neural networks (CNNs). It is shown that softmax CNN classifiers are inconsistent with novelty detection, because their learned class-conditional distributions and associated distance metrics are unidentifiable. A new regularization constraint, the class-conditional Gaussianity loss, is then proposed to eliminate this unidentifiability, and enforce Gaussian class-conditional distributions. This enables training Novelty Detection Consistent Classifiers (NDCCs) that are jointly optimal for classification and novelty detection. Empirical evaluations show that NDCCs achieve significant improvements over the state-of-the-art on both small- and large-scale FGVC datasets.
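To make the distance-based view of novelty detection concrete, here is a minimal sketch. It is not the NDCC training procedure or the class-conditional Gaussianity loss; it only shows the test that Gaussian class-conditional distributions make meaningful. It assumes every class is a Gaussian with identity covariance, so that the negative log-likelihood is, up to a constant, half the squared Euclidean distance to the class mean. The class names, means, and threshold are hypothetical.

```python
def squared_distance(x, mean):
    """Squared Euclidean distance between a feature vector and a class mean."""
    return sum((a - b) ** 2 for a, b in zip(x, mean))


def predict(x, class_means, threshold):
    """Classify x, or flag it as novel if it is far from every class mean.

    Assumes identity-covariance Gaussian classes (an assumption of this
    sketch), so the nearest mean is also the most likely class.
    """
    label, dist = min(
        ((name, squared_distance(x, mean)) for name, mean in class_means.items()),
        key=lambda pair: pair[1],
    )
    if dist > threshold:
        return "novel", dist
    return label, dist


if __name__ == "__main__":
    # Hypothetical 2D feature means for two fine-grained classes.
    means = {"sparrow": [0.0, 0.0], "finch": [4.0, 0.0]}
    print(predict([0.5, 0.0], means, threshold=2.0))
    print(predict([10.0, 10.0], means, threshold=2.0))
```

With a plain softmax classifier the distances in feature space carry no such guarantee, which is the inconsistency the abstract describes.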
Text-line segmentation is still considered challenging for complex background scene images. The success of text detection and recognition depends on the success of the text segmentation. This study presents a new meth...
ISBN: 9781665445092 (print)
For learned image compression, the autoregressive context model has proved effective in improving rate-distortion (RD) performance because it helps remove spatial redundancies among latent representations. However, the decoding process must be done in a strict scan order, which breaks parallelization. We propose a parallelizable checkerboard context model (CCM) to solve the problem. Our two-pass checkerboard context calculation eliminates such limitations on spatial locations by re-organizing the decoding order. Speeding up the decoding process more than 40 times in our experiments, it achieves significantly improved computational efficiency with almost the same rate-distortion performance. To the best of our knowledge, this is the first exploration of a parallelization-friendly spatial context model for learned image compression.
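The two-pass decoding order can be sketched as follows. This is only an illustration of the ordering idea, not the paper's context model: it assumes that positions with even row-plus-column parity form the first pass (the parity choice is an assumption), and it checks that every second-pass position has only first-pass positions as its four direct neighbours, which is what allows each pass to run in parallel. All function names are hypothetical.

```python
def checkerboard_masks(height, width):
    """Return (first_pass, second_pass) boolean grids for a latent of given size."""
    # Assumption: even (row + col) parity is decoded first.
    first = [[(r + c) % 2 == 0 for c in range(width)] for r in range(height)]
    second = [[not flag for flag in row] for row in first]
    return first, second


def neighbours(r, c, height, width):
    """Yield the in-bounds 4-connected neighbours of position (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < height and 0 <= cc < width:
            yield rr, cc


def context_is_available(height, width):
    """Check that every second-pass position sees only first-pass neighbours."""
    first, second = checkerboard_masks(height, width)
    return all(
        first[rr][cc]
        for r in range(height)
        for c in range(width)
        if second[r][c]
        for rr, cc in neighbours(r, c, height, width)
    )


def sequential_steps(height, width):
    """Compare sequential step counts: one per position vs. one per pass."""
    return {"raster_scan": height * width, "checkerboard": 2}


if __name__ == "__main__":
    print(context_is_available(4, 6))
    print(sequential_steps(4, 6))
```

The step counts only compare how many sequential stages each ordering needs; they are not a prediction of wall-clock speedup, which depends on the hardware and the rest of the decoder.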