ISBN: 9781665448994 (print)
This paper introduces a novel dataset for video enhancement and studies the state-of-the-art methods of the NTIRE 2021 challenge on quality enhancement of compressed video. The challenge is the first NTIRE challenge in this direction, with three competitions, hundreds of participants and tens of proposed solutions. Our newly collected Large-scale Diverse Video (LDV) dataset is employed in the challenge. In our study, we analyze the solutions of the challenge and several representative methods from previous literature on the proposed LDV dataset. We find that the NTIRE 2021 challenge advances the state of the art of quality enhancement on compressed video.
ISBN: 9781424439942 (print)
Laughter detection is an important area of interest in the Affective Computing and Human-Computer Interaction fields. In this paper we propose a multi-modal methodology, based on the fusion of audio and visual cues, to deal with the laughter recognition problem in face-to-face conversations. The audio features are extracted from the spectrogram, and the video features are obtained by estimating the degree of mouth movement and using a smile and laughter classifier. Finally, the multi-modal cues are included in a sequential classifier. Results over videos from the public discussion blog of the New York Times show that both types of features perform better when considered together by the classifier. Moreover, the sequential methodology significantly outperforms the results obtained by an AdaBoost classifier.
ISBN: 9781479943098 (print)
Recent interest in developing online computer vision algorithms is spurred in part by a growth of applications capable of generating large volumes of images and videos. These applications are rich sources of images and video streams. Online vision algorithms for managing, processing and analyzing these streams need to rely upon streaming concepts, such as pipelines, to ensure timely and incremental processing of data. This paper is a first attempt at defining a formal stream algebra that provides a mathematical description of vision pipelines and describes the distributed manipulation of image and video streams. We also show how our algebra can effectively describe the vision pipelines of two state-of-the-art techniques.
ISBN: 9781538661000 (digital)
ISBN: 9781538661000 (print)
In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification as well as activity detection in continuous videos. We experimentally compare various recognition approaches capturing temporal structure in activity videos, by classifying segmented videos and extending those approaches to continuous videos. We also compare models on the extremely difficult task of predicting pitch speed and pitch type from broadcast baseball videos. We find that learning temporal structure is valuable for fine-grained activity recognition.
ISBN: 9780769549903 (print)
Action recognition is one of the major challenges of computer vision. Several approaches have been proposed using different descriptors and multi-class models. In this paper, we focus on binary ranking models and address action recognition as a ranking problem. A binary ranking model is trained for each action and used to recognize the test videos for that action. Binary ranking models are constructed using dense SIFT (DSIFT) descriptors and histogram of oriented gradients / histogram of optical flows (HOG/HOF) descriptors. We show that, using ranking models, it is possible to obtain higher recognition accuracies than a baseline based on multi-class models on two very recent and challenging benchmark datasets: the Human Motion Database (HMDB) and the Action Similarity Labeling (ASLAN) benchmark.
ISBN: 9781538661000 (digital)
ISBN: 9781538661000 (print)
Object recognition in satellite images is one of the most relevant and popular topics in pattern recognition. This has been facilitated by many factors, such as a high number of satellites with high-resolution imagery, the significant development of computer vision, especially the major breakthrough in the field of convolutional neural networks, a wide range of industry verticals for usage, and a market that is still quite empty. Roads are one of the most popular objects for recognition. In this article, we present a combination of a neural network and a postprocessing algorithm that yields not only the coverage mask but also the vectors of all of the individual roads present in the image, which can be used to address higher-level tasks in the future. This approach was used to solve the DeepGlobe Road Extraction Challenge.
ISBN: 9780769549903 (print)
Lane feature extraction is one of the key computational steps in lane analysis systems. In this paper, we propose a lane feature extraction method that enables different configurations of embedded solutions, addressing both accuracy and the constraints of embedded systems. The proposed lane feature extraction process is evaluated in detail using real-world lane data to explore its effectiveness for embedded realization and its adaptability to varying contextual information such as lane types and environmental conditions.
ISBN: 9780769549903 (print)
Understanding human actions in videos has been a central research theme in computer vision for decades, and much progress has been achieved over the years. Much of this progress was demonstrated on standard benchmarks used to evaluate novel techniques. These benchmarks and their evolution provide a unique perspective on the growing capabilities of computerized action recognition systems. They demonstrate just how far machine vision systems have come while also underscoring the gap that still remains between existing state-of-the-art performance and the needs of real-world applications. In this paper we provide a comprehensive survey of these benchmarks: from early examples, such as the Weizmann set [1], to recently presented, contemporary benchmarks. This paper further provides a summary of the results obtained in the last couple of years on the recent ASLAN benchmark [12], which was designed to reflect the many challenges modern action recognition systems are expected to overcome.
ISBN: 9781479943098 (print)
During the performance optimization of a computer vision system, developers frequently run into platform-level inefficiencies and bottlenecks that cannot be addressed by traditional methods. OpenVX is designed to address such system-level issues by means of a graph-based computation model. This approach differs from the traditional acceleration of one-off functions, and exposes optimization possibilities that might not be available or obvious with traditional computer vision libraries such as OpenCV.
ISBN: 9780769549903 (print)
We present a vision-based method for signer diarization - the task of automatically determining "who signed when?" in a video. This task has similar motivations and applications as speaker diarization but has received little attention in the literature. In this paper, we motivate the problem and propose a method for solving it. The method is based on the hypothesis that signers make more movements than their interlocutors. Experiments on four videos (a total of 1.4 hours and each consisting of two signers) show the applicability of the method. The best diarization error rate (DER) obtained is 0.16.
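The hypothesis stated in this abstract (the active signer moves more than the interlocutor) can be illustrated with a minimal sketch. This is not the authors' method: the per-frame motion scores, the sliding-window comparison, the function names, and the simplified frame-level error rate are all assumptions made for illustration, and the error rate here is a plain fraction of mislabeled frames, not necessarily the DER definition used in the paper.

```python
from typing import List, Sequence


def diarize_by_motion(
    motion_a: Sequence[float],
    motion_b: Sequence[float],
    window: int = 5,
) -> List[str]:
    """Label each frame "A" or "B" by who moved more in a centered window.

    motion_a / motion_b are hypothetical per-frame motion scores
    (e.g. summed optical-flow magnitude) for the two people in the video.
    """
    if len(motion_a) != len(motion_b):
        raise ValueError("both motion sequences must have the same length")
    n = len(motion_a)
    half = window // 2
    labels = []
    for t in range(n):
        # Clamp the window to the valid frame range at the sequence borders.
        lo, hi = max(0, t - half), min(n, t + half + 1)
        total_a = sum(motion_a[lo:hi])
        total_b = sum(motion_b[lo:hi])
        # Ties are assigned to "A" arbitrarily.
        labels.append("A" if total_a >= total_b else "B")
    return labels


def frame_error_rate(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of frames whose predicted label differs from the reference."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        return 0.0
    wrong = sum(p != r for p, r in zip(predicted, reference))
    return wrong / len(reference)
```

Summing motion over a window instead of comparing single frames is a design choice of this sketch: it keeps brief pauses by the active signer from flipping the label.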