ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
We present a key-point-based activity recognition framework built upon pre-trained human pose estimation and facial feature detection models. Our method extracts complex static and movement-based features from key frames in videos, which are used to predict a sequence of key-frame activities. Finally, a merge procedure identifies robust activity segments while ignoring outlier frame activity predictions. We analyze the different components of our framework via a wide array of experiments and draw conclusions with regard to the utility of the model and ways it can be improved. Results show our model is competitive, taking 11th place out of 27 teams submitting to Track 3 of the 2022 AI City Challenge.
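The abstract does not spell out how the merge procedure works. As a rough illustration only, the sketch below shows one common way to turn noisy per-key-frame predictions into segments: a sliding majority vote followed by run-length grouping. The function names, the `window` size, and the `min_len` threshold are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import groupby


def smooth_labels(labels, window=3):
    """Replace each label with the majority label in a centred window.

    Hypothetical helper: the window size is an illustrative choice.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighbourhood = labels[max(0, i - half): i + half + 1]
        # most_common(1) returns the most frequent label in the window
        smoothed.append(Counter(neighbourhood).most_common(1)[0][0])
    return smoothed


def merge_segments(labels, window=3, min_len=2):
    """Group smoothed labels into (label, start, end) runs, end exclusive.

    Runs shorter than min_len are dropped as outliers (an assumed rule).
    """
    segments = []
    index = 0
    for label, run in groupby(smooth_labels(labels, window)):
        length = len(list(run))
        if length >= min_len:
            segments.append((label, index, index + length))
        index += length
    return segments


# Key-frame predictions with one outlier (the 5 at position 3)
predictions = [0, 0, 0, 5, 0, 0, 2, 2, 2, 2, 0, 0]
print(merge_segments(predictions))
```

Here the isolated prediction `5` is absorbed into the surrounding activity `0`, so the result contains three segments rather than five.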
Existing low-light image enhancement approaches based upon pixel-wise reconstruction losses are inadept at capturing the complex distribution of well-exposed images, resulting in residual noise, insufficient...
ISBN: 9781665487399 (digital)
ISBN: 9781665487399 (print)
The 6th edition of the AI City Challenge specifically focuses on problems in two domains where there is tremendous unlocked potential at the intersection of computer vision and artificial intelligence: Intelligent Traffic Systems (ITS), and brick-and-mortar retail businesses. The four challenge tracks of the 2022 AI City Challenge received participation requests from 254 teams across 27 countries. Track 1 addressed city-scale multi-target multi-camera (MTMC) vehicle tracking. Track 2 addressed natural-language-based vehicle track retrieval. Track 3 was a brand new track for naturalistic driving analysis, where the data were captured by several cameras mounted inside the vehicle focusing on driver safety, and the task was to classify driver actions. Track 4 was another new track aiming to achieve retail store automated checkout using only a single-view camera. We released two leaderboards for submissions based on different methods: a public leaderboard for the contest, where no use of external data is allowed, and a general leaderboard for all submitted results. The top performance of participating teams established strong baselines and even outperformed the state of the art in the proposed challenge tracks.
Underwater images frequently experience quality degradation due to refraction, back-scattering, and absorption, leading to color distortion, blurriness, and reduced visibility. Such degradation present in the underwat...
The proliferation of scene text in both structured and unstructured environments presents significant challenges in optical character recognition (OCR), necessitating more efficient and robust text spotting solutions....
Snowfall severely degrades outdoor video visibility while reducing the performance of subsequent vision tasks. Although video recovery methods based on deep learning have achieved amazing accomplishments, video snow r...
Developing a real-time sentiment analysis application that relies solely on features extracted from images or textual content falls short of capturing the nuanced and multifaceted nature of human emotions. The unlabeled da...
ISBN: 9781728165325 (digital)
ISBN: 9781728165325 (print)
Watching sports events via 3D instead of two-dimensional video streaming allows for increased immersion, e.g. via mixed reality headsets in comparison to traditional screens. So far, capturing 3D video of sports events has required expensive outside-in tracking with numerous cameras. This study demonstrates the feasibility of streaming sports content to mixed reality headsets as holograms in real time using inside-out tracking and low-cost equipment only. We demonstrate our system by streaming a race car on an indoor track as 3D models, which are then rendered in a Magic Leap One headset. An onboard camera mounted on the race car provides the video stream used to localize the car via computer vision. The localization is estimated by an end-to-end convolutional neural network (CNN). The study compares three state-of-the-art CNN models in their respective accuracy and execution time, with PoseNet+LSTM achieving position and orientation accuracy of 0.35 m and 3.95 degrees. The total streaming latency in this study was 1041 ms, suggesting technical feasibility of streaming 3D sports content, e.g. on large playgrounds, in near real time onto mixed reality headsets.
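The abstract reports position and orientation accuracy but does not say how the errors were computed. The sketch below shows one widely used convention for camera pose regression: Euclidean distance for position, and the rotation angle between predicted and ground-truth unit quaternions for orientation. The function names and the `(w, x, y, z)` quaternion layout are assumptions for illustration and may differ from the study's evaluation code.

```python
import math


def position_error(pred_xyz, true_xyz):
    """Euclidean distance between predicted and true positions (same units as input)."""
    return math.dist(pred_xyz, true_xyz)


def orientation_error_deg(q_pred, q_true):
    """Angle in degrees between two rotations given as (w, x, y, z) quaternions.

    Inputs are normalised first; the absolute dot product handles the
    q / -q ambiguity, since both represent the same rotation.
    """
    def unit(q):
        norm = math.sqrt(sum(c * c for c in q))
        return [c / norm for c in q]

    a, b = unit(q_pred), unit(q_true)
    dot = abs(sum(x * y for x, y in zip(a, b)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


# Illustrative values only, not data from the study
print(position_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(orientation_error_deg(identity, quarter_turn_z))
```

Averaging or taking the median of these per-frame errors over a test sequence would yield summary figures of the kind quoted in the abstract.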
ISBN: 9781728103693 (print)
Robot vision is an interdisciplinary field that deals with how robots can be made to gain high-level understanding from digital images or videos. Understanding an image at the pixel level often does not provide enough information for decision making and action taking. In this case, higher-level semantic information that describes the image is required. This helps the robot to accomplish complex tasks that require visual understanding. For robots to add value, they need to be sufficiently effective at executing tasks in different settings. Despite many impressive advances in robot vision, robots still lack the ability to function as humans do in complex environments. Importantly, this includes being able to interpret and understand the perceptual complexities of the world. Robot vision is dependent on ideas from both computer vision and machine learning. In this paper we provide an overview of the advances in these disciplines and how they contribute to robot vision.
ISBN: 9781728103693 (print)
In the field of computer vision, multi-class outdoor weather classification is a difficult task due to the diversity of weather scenes and the lack of distinct weather characteristics or features. This research proposes a novel framework for identifying different weather scenes from still images using heterogeneous ensemble methods. Our approach is based on a method called Selection Based on Accuracy Intuition and Diversity (SAID) of stacked ensemble algorithms. This involves the extraction of histograms of features from different weather scenes. The blending and boosting of different weather features using stacked ensemble algorithms increases the recognition rate of different weather conditions compared to other classification and ensemble methods. The paper offers academics and practitioners new insight into the diversity of heterogeneous ensemble methods for solving the challenges of weather recognition from still images.