Human risky behavior in driving is an important visual recognition problem. In this paper, we propose a multi-view temporal action localization system based on the grayscale video to achieve action recognition in natu...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Human risky behavior in driving is an important visual recognition problem. In this paper, we propose a multi-view temporal action localization system based on the grayscale video to achieve action recognition in naturalistic driving. Specifically, we adopted SwinTransformer as feature extractor, and a single framework to detect boundary and class at the same time. Also, we improve multiple loss function for explicit constraints of embedded feature distributions. Our proposed framework achieves the overall F1 -score of 0.3154 on A2 dataset.
Dynamic vision sensor event cameras produce a variable data rate stream of brightness change events. Event production at the pixel level is controlled by threshold, bandwidth, and refractory period bias current parame...
详细信息
ISBN:
(纸本)9781665448994
Dynamic vision sensor event cameras produce a variable data rate stream of brightness change events. Event production at the pixel level is controlled by threshold, bandwidth, and refractory period bias current parameter settings. Biases must be adjusted to match application requirements and the optimal settings depend on many factors. As a first step towards automatic control of biases, this paper proposes fixed-step feedback controllers that use measurements of event rate and noise. The controllers regulate the event rate within an acceptable range using threshold and refractory period control, and regulate noise using bandwidth control. Experiments demonstrate model validity and feedback control.
Perceiving distance from two camera images, a task called stereo vision, is fundamental for many applications in robotics or automation. However, algorithms that compute this information at high accuracy have a high c...
详细信息
ISBN:
(纸本)9781509014378
Perceiving distance from two camera images, a task called stereo vision, is fundamental for many applications in robotics or automation. However, algorithms that compute this information at high accuracy have a high computational complexity. One such algorithm, Semi Global Matching (SGM), performs well in many stereo vision benchmarks, while maintaining a manageable computational complexity. Nevertheless, CPU and GPU implementations of this algorithm often fail to achieve real-time processing of camera images, especially in power-constrained embedded environments. This work presents a novel architecture to calculate disparities through SGM. The proposed architecture is highly scalable and applicable for low-power embedded as well as high-performance multi-camera high-resolution applications.
Detecting suspicious events from video surveillance cameras has been an important task recently. Many trajectory based descriptors were developed, such as to detect people running or moving in opposite direction. Howe...
详细信息
ISBN:
(纸本)9781424439942
Detecting suspicious events from video surveillance cameras has been an important task recently. Many trajectory based descriptors were developed, such as to detect people running or moving in opposite direction. However, these trajectory based descriptors are not working well in the crowd environments like airports, rail stations, because those descriptors assume perfect motion/object segmentation. In this paper, we present an event detection method using dynamic texture descriptor. The dynamic texture descriptor is an extension of the local binary patterns. The image sequences are divided into regions. A flow is formed based on the similarity of the dynamic texture descriptors on the regions. We used real dataset for experiments. The results are promising.
While most approaches to symmetry detection in machine vision try to explain the gray-values or colors of the pixels, Gestalt algebra has no room for such measurement data. The entities (i.e. Gestalten) are only defin...
详细信息
ISBN:
(纸本)9780769549903
While most approaches to symmetry detection in machine vision try to explain the gray-values or colors of the pixels, Gestalt algebra has no room for such measurement data. The entities (i.e. Gestalten) are only defined with respect to each other. They form a generic hierarchy, and live in a continuous domain without any pixel raster. There is also no constraint forcing them to completely fill an image, or prohibiting overlap. Yet, when used as a tool for symmetry recognition, the algebra must be somehow connected to the given data. In this paper this is done only on the primitive level using the well-known SIFT feature detector. From a set of such SIFT-based Gestalten follows a combinatorial set of higher-order symmetric Gestalten by constructing all possible terms using the operations of the algebra. The Gestalt domain contains a quality or assessment dimension. Taking the best Gestalten with respect to this attribute and clustering them yields the output for this competition participation.
Most popular metric learning losses have no direct relation with the evaluation metrics that are subsequently applied to evaluate their performance. We hypothesize that training a metric learning model by maximizing t...
详细信息
ISBN:
(纸本)9781665487399
Most popular metric learning losses have no direct relation with the evaluation metrics that are subsequently applied to evaluate their performance. We hypothesize that training a metric learning model by maximizing the area under the ROC curve (which is a typical performance measure of recognition systems) can induce an implicit ranking suitable for retrieval problems. This hypothesis is supported by previous work that proved that a curve dominates in ROC space if and only if it dominates in Precision-Recall space. To test this hypothesis, we design and maximize an approximated, derivable relaxation of the area under the ROC curve. The proposed AUC loss achieves state-of-the-art results on two large scale retrieval benchmark datasets (Stanford Online Products and DeepFashion In-Shop). Moreover, the AUC loss achieves comparable performance to more complex, domain specific, state-of-the-art methods for vehicle re-identification.
Neural network designers have reached progressive accuracy by increasing models depth, introducing new layer types and discovering new combinations of layers. A common element in many architectures is the distribution...
详细信息
ISBN:
(纸本)9781665448994
Neural network designers have reached progressive accuracy by increasing models depth, introducing new layer types and discovering new combinations of layers. A common element in many architectures is the distribution of the number of filters in each layer. Neural network models keep a pattern design of increasing filters in deeper layers such as those in LeNet, VGG, ResNet, MobileNet and even in automatic discovered architectures such as NASNet. It remains unknown if this pyramidal distribution of filters is the best for different tasks and constrains. In this work we present a series of modifications in the distribution of filters in three popular neural network models and their effects in accuracy and resource consumption. Results show that by applying this approach, some models improve up to 8.9% in accuracy showing reductions in parameters up to 54%.
In this paper a novel bottom-up video event recognition approach is proposed, ObjectGraphs, which utilizes a rich frame representation and the relations between objects within each frame. Following the application of ...
详细信息
ISBN:
(纸本)9781665448994
In this paper a novel bottom-up video event recognition approach is proposed, ObjectGraphs, which utilizes a rich frame representation and the relations between objects within each frame. Following the application of an object detector (OD) on the frames, graphs are used to model the object relations and a graph convolutional network (GCN) is utilized to perform reasoning on the graphs. The resulting object-based frame-level features are then forwarded to a long short-term memory (LSTM) network for video event recognition. Moreover, the weighted in-degrees (WiDs) derived from the graph's adjacency matrix at frame level are used for identifying the objects that were considered most (or least) salient for event recognition and contributed the most (or least) to the final event recognition decision, thus providing an explanation for the latter. The experimental results show that the proposed method achieves state-of-the-art performance on the publicly available FCVID and YLI-MED datasets(1).
Manifold learning has been effectively used in computervision applications for dimensionality reduction that improves classification performance and reduces computational load. Grassmann manifolds are well suited for...
详细信息
ISBN:
(纸本)9780769549903
Manifold learning has been effectively used in computervision applications for dimensionality reduction that improves classification performance and reduces computational load. Grassmann manifolds are well suited for computervision problems because they promote smooth surfaces where points are represented as subspaces. In this paper we propose Grassmannian Sparse Representations (GSR), a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations using least squares loss L1-norm minimization for optimal classification. We further introduce a new descriptor that we term Motion Depth Surface (MDS) and compare its classification performance against the traditional Motion History Image (MHI) descriptor. We demonstrate the effectiveness of GSR on computationally intensive 3D action sequences from the Microsoft Research 3D-Action and 3D-Gesture datasets.
暂无评论