ISBN: 9781665448994 (print)
Any intelligent traffic monitoring system must be able to detect anomalies such as traffic accidents in real time. In this paper, we propose a decision-tree-enabled approach powered by deep learning for extracting anomalies from traffic cameras while accurately estimating the start and end times of the anomalous event. Our approach includes creating a detection model, followed by anomaly detection and analysis. YOLOv5 serves as the foundation for our detection model. The anomaly detection and analysis step entails traffic scene background estimation, road mask extraction, and adaptive thresholding. Candidate anomalies are passed through a decision tree to detect and analyze final anomalies. In experimental validation, the proposed approach yielded an F1 score of 0.8571 and an S4 score of 0.5686.
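For readers unfamiliar with the F1 metric quoted above, the following minimal sketch shows how an F1 score is computed as the harmonic mean of precision and recall. The function name `f1_score` and the counts in the usage note are illustrative assumptions; they are not taken from the paper, and the S4 score is not modeled here.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for a set of detections.

    The argument names are illustrative; the abstract reports only the
    final F1 value, not the underlying counts.
    """
    if true_positives == 0:
        # No correct detections: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

For example, with hypothetical counts of 8 correct detections, 2 false alarms, and 2 missed anomalies, precision and recall are both 0.8, so the F1 score is also 0.8.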
Facial expressions are an exciting study area within computer vision, affective computing, and human-computer interaction. Our approach offers a distinct end-to-end network for attention-based self-governing f...
ISBN: 9781665448994 (print)
Architectures based on siamese networks with triplet loss have shown outstanding performance on the image-based similarity search problem. This approach attempts to discriminate between positive (relevant) and negative (irrelevant) items. However, it suffers from a critical weakness: given a query, it cannot discriminate weakly relevant items, for instance, items of the same type as the query but with a different color or texture, which could be a serious limitation for many real-world search applications. Therefore, in this work, we present a quadruplet-based architecture that overcomes this weakness. Moreover, we present an instance of this quadruplet network, which we call Sketch-QNet, to deal with the color sketch-based image retrieval (CSBIR) problem, achieving new state-of-the-art results.
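The contrast between a triplet and a quadruplet objective can be sketched as follows. This is a minimal illustration on plain Python vectors, assuming Euclidean distance and hinge-style margins; the function `quadruplet_loss`, its margin parameters, and the particular ordering constraint (positive closer than weakly relevant, weakly relevant closer than negative) are assumptions made for illustration and are not necessarily the loss used by Sketch-QNet.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Standard hinge triplet loss: the positive should be closer than the negative."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def quadruplet_loss(anchor: Vector, positive: Vector, weak: Vector, negative: Vector,
                    margin_pw: float = 0.5, margin_wn: float = 0.5) -> float:
    """Hypothetical quadruplet loss with a weakly relevant item.

    Encourages the ordering d(anchor, positive) < d(anchor, weak) < d(anchor, negative),
    which a plain triplet loss cannot express because it only sees two relevance levels.
    """
    d_pos = euclidean(anchor, positive)
    d_weak = euclidean(anchor, weak)
    d_neg = euclidean(anchor, negative)
    return (max(0.0, d_pos - d_weak + margin_pw)
            + max(0.0, d_weak - d_neg + margin_wn))
```

When the weakly relevant item sits between the positive and the negative with enough margin, both hinge terms vanish; if the weakly relevant item lies farther away than the negative, the second term becomes positive and penalizes the ranking.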
ISBN: 9781665448994 (print)
Multistage, or serial, fusion refers to algorithms that sequentially fuse an increasing number of matching results at each step and decide whether to accept the match hypothesis, reject it, or proceed to the next step. Such fusion methods are beneficial in situations where running the additional matching algorithms needed for later stages is time-consuming or expensive. The construction of multistage fusion methods is challenging, since it requires both learning fusion functions and finding optimal decision thresholds for each stage. In this paper, we propose the use of a single neural network for learning the multistage fusion. In addition, we discuss the choices for the performance measurements of the trained algorithms and for the selection of network training optimization criteria. We perform the experiments using three face matching algorithms and the IJB-A and IJB-C databases.
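The accept, reject, or continue logic described above can be sketched as a simple cascade. This is an illustration only: the function `serial_fusion`, the per-stage threshold pairs, and the use of a plain mean as the fusion function are assumptions standing in for the learned neural network fusion that the paper proposes.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

def serial_fusion(
    matchers: Sequence[Callable[[], float]],
    thresholds: Sequence[Tuple[float, float]],
    final_threshold: float,
    fuse: Callable[[List[float]], float] = mean,
) -> Tuple[str, int]:
    """Run matchers one at a time and stop as soon as a decision is possible.

    matchers        -- callables returning a match score; each is invoked lazily,
                       so expensive later matchers are skipped after an early decision
    thresholds      -- (reject_below, accept_above) for every stage except the last
    final_threshold -- forced accept/reject cut-off at the last stage
    fuse            -- placeholder fusion function (mean); a learned model could replace it
    Returns the decision and the number of stages that were executed.
    """
    if not matchers:
        raise ValueError("at least one matcher is required")
    scores: List[float] = []
    for stage, matcher in enumerate(matchers):
        scores.append(matcher())
        fused = fuse(scores)
        if stage == len(matchers) - 1:
            # Last stage: no further matcher to defer to, so a decision is forced.
            return ("accept" if fused >= final_threshold else "reject", stage + 1)
        reject_below, accept_above = thresholds[stage]
        if fused >= accept_above:
            return ("accept", stage + 1)
        if fused <= reject_below:
            return ("reject", stage + 1)
    raise AssertionError("unreachable: the last stage always returns")
```

Passing the matchers as callables makes the cost saving explicit: a confident first-stage score means the later, more expensive matchers are never executed.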
ISBN: 9781665445092 (print)
Most standard learning approaches lead to fragile models which are prone to drift when sequentially trained on samples of a different nature: the well-known catastrophic forgetting issue. In particular, when a model consecutively learns from different visual domains, it tends to forget the past domains in favor of the most recent ones. In this context, we show that one way to learn models that are inherently more robust against forgetting is domain randomization, which for vision tasks means randomizing the current domain's distribution with heavy image manipulations. Building on this result, we devise a meta-learning strategy where a regularizer explicitly penalizes any loss associated with transferring the model from the current domain to different "auxiliary" meta-domains, while also easing adaptation to them. Such meta-domains are also generated through randomized image manipulations. We empirically demonstrate in a variety of experiments, spanning from classification to semantic segmentation, that our approach results in models that are less prone to catastrophic forgetting when transferred to new domains.
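As a rough illustration of what randomizing a domain through image manipulations can look like, the sketch below applies a random gain, bias, and optional inversion to a small grayscale image stored as nested lists. The function `randomize_domain`, the chosen manipulations, and their parameter ranges are assumptions for illustration; the abstract does not specify which manipulations the authors use.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image, pixel values in [0, 255]

def randomize_domain(image: Image, rng: random.Random) -> Image:
    """Return a randomly manipulated copy of a grayscale image.

    The manipulations (contrast gain, brightness bias, inversion) are
    illustrative choices, not the set used in the paper.
    """
    gain = rng.uniform(0.5, 1.5)      # random contrast change
    bias = rng.uniform(-40.0, 40.0)   # random brightness shift
    invert = rng.random() < 0.5       # randomly invert intensities
    result: Image = []
    for row in image:
        new_row = []
        for pixel in row:
            value = gain * pixel + bias
            if invert:
                value = 255 - value
            # Clamp back into the valid intensity range.
            new_row.append(min(255, max(0, round(value))))
        result.append(new_row)
    return result
```

Calling the function repeatedly with the same `random.Random` instance yields a different manipulated variant each time, which could serve as a stand-in for sampling an auxiliary domain; seeding the generator makes the sequence reproducible.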
The task of identifying and classifying playing cards within images or video streams in computer vision involves employing a Sequential Convolutional Neural Network (CNN) model. This process combines image categorizat...
Inner speech recognition is a modern advancement in brain-computer interfaces (BCI) that facilitates direct communication between the computer and the brain. It is particularly beneficial for individuals wh...
ISBN: 9798350344868; 9798350344851 (print)
Text recognition under information loss, such as blur, occlusion, and perspective distortion, is challenging in real-world applications. To enhance robustness, some studies use extra unlabeled data for encoder pretraining. Others focus on improving decoder context reasoning. However, pretraining methods require abundant unlabeled data and high computing resources, while decoder-based approaches risk over-correction. In this paper, we propose MaskSTR, a dual-branch training framework for scene text recognition (STR) models that uses patch masking to simulate information loss. MaskSTR guides visual representation learning, improving robustness to information loss conditions without extra data or training stages. Furthermore, we introduce Block Masking, a novel and straightforward mask generation method, for further performance enhancement. Experiments demonstrate MaskSTR's effectiveness across CTC, attention, and Transformer decoding methods, achieving significant performance gains and setting new state-of-the-art results.
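To make the idea of patch masking concrete, the sketch below generates patch-level boolean masks over a grid of image patches in two ways: by masking independently chosen patches, and by masking one contiguous rectangular block. Both functions, their names, and their parameters are hypothetical illustrations; the abstract does not describe how MaskSTR's Block Masking actually generates its masks.

```python
import random
from typing import List

Mask = List[List[bool]]  # True marks a patch that is hidden from the model

def random_patch_mask(rows: int, cols: int, ratio: float, rng: random.Random) -> Mask:
    """Mask a fraction `ratio` of patches chosen independently at random."""
    count = round(rows * cols * ratio)
    chosen = set(rng.sample(range(rows * cols), count))
    return [[(r * cols + c) in chosen for c in range(cols)] for r in range(rows)]

def block_patch_mask(rows: int, cols: int, block_rows: int, block_cols: int,
                     rng: random.Random) -> Mask:
    """Mask one contiguous block of patches at a random position (illustrative only)."""
    if block_rows > rows or block_cols > cols:
        raise ValueError("block must fit inside the patch grid")
    top = rng.randrange(rows - block_rows + 1)
    left = rng.randrange(cols - block_cols + 1)
    return [[top <= r < top + block_rows and left <= c < left + block_cols
             for c in range(cols)]
            for r in range(rows)]
```

A contiguous block hides a whole region, which resembles an occlusion more closely than scattered single patches do; that intuition is an assumption about why a block-style mask might help, not a claim from the paper.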
The current prevalent approach of the Internet of Health and Medical Things entails proactively preventing disease onset through routine monitoring of individuals' physical activities, making Human Activity Recogn...
ISBN: 9781665448994 (print)
Due to the popularity and mobility of smartphones, phone-related pedestrian distracted behaviors, e.g., texting, game playing, and phone calls, have caused many traffic fatalities and accidents. As part of an advanced driver-assistance or autonomous-driving system, computer vision could be used to automatically detect such distractions from cameras installed on the vehicle and enable useful safety interventions. The state-of-the-art method models this task as a standard supervised learning problem, using a two-branch Convolutional Neural Network (CNN) followed by voting over all image frames. In contrast, this paper proposes a new synthetic dataset named SYN-PPDB (448 synchronized video pairs of 53,760 computer game images) for this research problem and models it as a transfer learning problem from synthetic data to real data. A new deep learning model embedded with spatial-temporal feature learning and pose-aware transfer learning is proposed. Experimental results show that we could improve the state-of-the-art overall recognition accuracy from 84.27% to 96.67%.
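The frame-level voting step attributed to the prior method can be illustrated with a simple majority vote over per-frame predictions. This is a sketch under the assumption that the voting is a plain majority vote; the function name `vote_over_frames` and the label strings are hypothetical.

```python
from collections import Counter
from typing import Sequence

def vote_over_frames(frame_labels: Sequence[str]) -> str:
    """Return the label predicted for the largest number of frames.

    Ties are resolved in favor of the label that appears first in the
    sequence, which follows from Counter.most_common ordering.
    """
    if not frame_labels:
        raise ValueError("at least one frame prediction is required")
    return Counter(frame_labels).most_common(1)[0][0]
```

For instance, a clip whose frames are classified as texting, texting, and calling would be labeled texting as a whole.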