This paper presents a Cloud-based architecture, called Cloud-based Audio-Video (CAV) fusion, for detecting and tracking multiple moving targets in airborne videos with audio assistance. The CAV system's innovation is a user-based track-matching method that pairs a voice-to-text color feature descriptor with hue features extracted automatically from image pixels. The CAV approach is general purpose: it detects and tracks the movement of different valuable targets for suspicious behavior recognition through multi-intelligence data fusion. Using Cloud computing leads to real-time performance as compared with a single-machine workflow. The multiple moving target tracking results obtained from airborne videos demonstrate that the CAV approach provides improved frame rate, enhanced detection, and real-time tracking and classification performance under realistic conditions.
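As a rough illustration of matching a spoken color word to tracks by hue, the sketch below computes a saturation-weighted circular mean hue per track and picks the track closest to the requested color. It is not the CAV implementation: the `COLOR_HUES` table, the function names, and the weighting scheme are all assumptions made for the example.

```python
import colorsys
import math

# Hypothetical hue centers in degrees for a few color words (not from the paper).
COLOR_HUES = {"red": 0.0, "yellow": 60.0, "green": 120.0, "blue": 240.0}

def mean_hue(pixels):
    """Circular mean hue (degrees) of (r, g, b) pixels with 0-255 channels."""
    sx = sy = 0.0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        weight = s * v  # grey or dark pixels carry little hue information
        sx += weight * math.cos(2.0 * math.pi * h)
        sy += weight * math.sin(2.0 * math.pi * h)
    return math.degrees(math.atan2(sy, sx)) % 360.0

def hue_distance(a, b):
    """Shortest angular distance between two hues in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def match_track(color_word, tracks):
    """Return the name of the track whose mean hue is closest to the color word."""
    target = COLOR_HUES[color_word]
    return min(tracks, key=lambda name: hue_distance(mean_hue(tracks[name]), target))

# Usage: two hypothetical tracks, each a list of sampled target pixels.
tracks = {
    "car_a": [(200, 10, 10), (220, 30, 20)],
    "car_b": [(10, 20, 210), (30, 40, 230)],
}
selected = match_track("red", tracks)
```

Hue is an angle, so the circular mean avoids averaging a reddish 350 degrees and 10 degrees into an unrelated 180 degrees.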
This paper is based on the observation that if the viewing camera is appropriately mounted on a vehicle which moves on a planar surface, the problem of motion and structure recovery from optical flow becomes linear and, in principle, can be solved locally. It is shown that angular velocity and depth can be computed from only one component of the optical flow, and that depth estimated from the vertical component is more accurate than depth estimated from the horizontal component. Experiments on synthetic and real sequences support the presented analysis.
No feature-based vision system can work unless good features can be identified and tracked from frame to frame. Although tracking itself is by and large a solved problem, selecting features that can be tracked well and correspond to physical points in the world is still hard. We propose a feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world. These methods are based on a new tracking algorithm that extends previous Newton-Raphson style search methods to work under affine image transformations. We test performance with several simulations and experiments.
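The abstract does not spell out the selection criterion. One widely used criterion of this kind scores an image window by the smaller eigenvalue of its 2x2 gradient matrix, since gradient-based trackers need that matrix to be well conditioned. The sketch below assumes that criterion; the function name, central-difference gradients, and window layout are illustrative choices, not details from the paper.

```python
import math

def min_eigenvalue(window):
    """Smaller eigenvalue of the 2x2 gradient matrix of a grayscale window.

    `window` is a list of rows of intensities. Gradients use central
    differences on interior pixels only (an assumption for this sketch).
    """
    height, width = len(window), len(window[0])
    gxx = gxy = gyy = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (window[y][x + 1] - window[y][x - 1]) / 2.0
            gy = (window[y + 1][x] - window[y - 1][x]) / 2.0
            gxx += gx * gx
            gxy += gx * gy
            gyy += gy * gy
    # Closed-form smaller eigenvalue of [[gxx, gxy], [gxy, gyy]].
    trace = gxx + gyy
    diff = gxx - gyy
    return 0.5 * (trace - math.sqrt(diff * diff + 4.0 * gxy * gxy))

# Usage: a flat patch, a straight vertical edge, and a corner (5x5 each).
flat = [[50] * 5 for _ in range(5)]
edge = [[100 if x >= 2 else 0 for x in range(5)] for _ in range(5)]
corner = [[100 if (x >= 2 and y >= 2) else 0 for x in range(5)] for y in range(5)]
scores = {name: min_eigenvalue(w) for name, w in
          {"flat": flat, "edge": edge, "corner": corner}.items()}
```

Under this score, flat patches and straight edges rate near zero because they constrain motion in at most one direction, while corners and textured patches rate high.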
Interactive Image Retrieval (IIR) aims to retrieve images that are generally similar to the reference image but reflect the requested text modification. Existing methods usually just concatenate or sum the image and text features, which makes it difficult to precisely change the local semantics of the image that the text intends to modify. To solve this problem, we propose a Language Guided Local Infiltration (LGLI) system, which fully utilizes the text information and infiltrates text features into image features as much as possible. Specifically, we first propose a Language Prompt Visual Localization (LPVL) module to generate a localization mask which explicitly locates the region (semantics) intended to be modified. Then we introduce a Text Infiltration with Local Awareness (TILA) module, which is deployed in the network to precisely modify the reference image and generate an image-text infiltrated representation. Extensive experiments on various benchmark databases validate that our method outperforms most state-of-the-art IIR approaches.
We present a probabilistic reliable-inference framework to address the issue of rapid-and-reliable detection of human actions. The approach determines the shortest video exposure needed for low-latency recognition by sequentially evaluating a series of posterior class ratios to find the earliest reliable decision point. Results are presented for a set of people walking, running, and standing in different styles and from multiple viewpoints, and compared to an alternative ML approach.
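The idea of stopping at the earliest reliable decision point can be sketched as a generic sequential test: update class posteriors frame by frame and stop once the leading class outweighs the runner-up by a chosen ratio. This is not the paper's framework; the function name, the per-frame likelihood inputs, the frame-independence assumption, and the threshold of 20 are all hypothetical.

```python
def earliest_decision(frame_likelihoods, priors, ratio_threshold=20.0):
    """Return (frame_index, label) at the first frame where the ratio of the
    top posterior to the runner-up reaches ratio_threshold, else (None, None).

    Assumes at least two classes, strictly positive likelihoods, and
    conditionally independent frames (simplifications for this sketch).
    """
    posterior = dict(priors)
    for t, likelihoods in enumerate(frame_likelihoods):
        # Bayes update with the current frame, then renormalize.
        for label in posterior:
            posterior[label] *= likelihoods[label]
        total = sum(posterior.values())
        posterior = {label: p / total for label, p in posterior.items()}
        ranked = sorted(posterior, key=posterior.get, reverse=True)
        best, second = ranked[0], ranked[1]
        if posterior[best] / posterior[second] >= ratio_threshold:
            return t, best
    return None, None

# Usage: each frame favors "walk" over "run" by a factor of 4.
priors = {"walk": 1 / 3, "run": 1 / 3, "stand": 1 / 3}
frame = {"walk": 0.8, "run": 0.2, "stand": 0.1}
decision = earliest_decision([frame] * 5, priors)
too_short = earliest_decision([frame] * 2, priors)
```

With these numbers the walk-to-run ratio grows as 4, 16, 64, so the test commits at the third frame (index 2); a two-frame clip yields no decision.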
This paper deals with the recovery of 3D information using a single mobile camera in the context of active vision. We propose a general, revisited formulation of the structure-from-motion problem, and we determine adequate camera configurations and motions which lead to a robust and accurate estimation of the 3D structure parameters. We apply the visual servoing approach to perform these camera motions. Real-time experiments on the 3D structure estimation of points and cylinders are reported, and demonstrate that this active vision strategy can significantly improve the estimation accuracy.
The anticipation problem has been studied from different angles, such as predicting humans' locations, predicting hand and object trajectories, and forecasting actions and human-object interactions. In this paper, we study the short-term object interaction anticipation problem from the egocentric point of view, proposing a new end-to-end architecture named StillFast. Our approach simultaneously processes a still image and a video to detect and localize next-active objects, predict the verb which describes the future interaction, and determine when the interaction will start. Experiments on the large-scale egocentric dataset EGO4D [17] show that our method outperforms state-of-the-art approaches on the considered task. Our method is ranked first in the public leaderboard of the EGO4D short-term object interaction anticipation challenge 2022 and it is the official baseline for the 2023 one. Please see the project web page for code and additional details: https://***/stillfast/.
As the use of deep neural networks continues to grow, understanding their behaviour has become more crucial than ever. Post-hoc explainability methods are a potential solution, but their reliability is being called into question. Our research investigates the response of post-hoc visual explanations to naturally occurring transformations, often referred to as augmentations. We expect explanations to be invariant under certain transformations, such as changes to the colour map, while responding in an equivariant manner to transformations like translation, object scaling, and rotation. We have found remarkable differences in robustness depending on the type of transformation, with some explainability methods (such as LRP composites and Guided Backprop) being more stable than others. We also explore the role of training with data augmentation. We provide evidence that explanations are typically less robust to augmentation than classification performance, regardless of whether data augmentation is used in training or not.
We present a novel approach to localizing parts in images of human faces. The approach combines the output of local detectors with a non-parametric set of global models for the part locations based on over one thousan...
ISBN: (print) 0818608625
A vectorization method for line patterns is proposed which converts digital binary images into line segment vectors. The vector data is more compact and more natural than that obtained by conventional methods using thinning operations. The proposed method consists of four steps. First, thinning of an input binary image is performed. Then a medial line image obtained by the thinning operation is transformed into a graph, in which pixels on the medial line correspond to nodes and neighboring nodes are connected by edges. Next, extra edges unnecessary for preserving the topology of the medial line image are deleted. The deletion can be implemented as an iterative parallel operation. Finally, the graph is simplified by line approximation. Every step except the line approximation is suitable for parallel processing. The experimental results of applying the proposed method to geographical maps show that the method reduces data volume by about 30%, as compared with conventional methods.
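The abstract does not say which line approximation the final step uses. As a stand-in, the sketch below applies the Douglas-Peucker algorithm to one chain of medial-line pixels, keeping only the points needed to stay within a tolerance. The function names and the tolerance value are assumptions for illustration, and the recursion shown here is sequential.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def approximate_chain(points, tolerance):
    """Douglas-Peucker simplification of an ordered pixel chain."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [first, last]
    # Split at the farthest point and simplify both halves.
    left = approximate_chain(points[: index + 1], tolerance)
    right = approximate_chain(points[index:], tolerance)
    return left[:-1] + right

# Usage: an L-shaped chain of 7 medial-line pixels collapses to 3 vertices.
chain = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
segments = approximate_chain(chain, tolerance=0.5)
```

Each chain between graph nodes can be simplified independently, so the chains themselves could be processed in parallel even though the approximation of a single chain is sequential.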