Despite many successful applications of robust statistics, they have yet to be completely adapted to many computervision problems. Range reconstruction, particularly in unstructured environments, requires a robust es...
详细信息
ISBN:
(纸本)0818672587
Despite many successful applications of robust statistics, they have yet to be completely adapted to many computervision problems. Range reconstruction, particularly in unstructured environments, requires a robust estimator that not only tolerates a large outlier percentage but also tolerates several discontinuities, extracting multiple surfaces in an image region. Observing that random outliers and/or points from across discontinuities increase a hypothesized fit's scale estimate (standard deviation of the noise), our new operator, called MUSE (Minimum Unbiased Scale Estimator), evaluates a hypothesized fit over potential inlier sets via an objective function of unbiased scale estimates. MUSE extracts the single best fit from the data by minimizing its objective function over a set of hypothesized fits and can sequentially extract multiple surfaces from an image region. We show MUSE to be effective on synthetic data modelling small scale discontinuities and in preliminary experiments on complicated range data.
In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification as well as activity detection in cont...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification as well as activity detection in continuous videos. We experimentally compare various recognition approaches capturing temporal structure in activity videos, by classifying segmented videos and extending those approaches to continuous videos. We also compare models on the extremely difficult task of predicting pitch speed and pitch type from broadcast baseball videos. We find that learning temporal structure is valuable for fine-grained activity recognition.
Focus of attention mechanisms for robot vision are discussed. A new method for neglecting low level filter responses from already modelled structures is presented. The method is based on a filtering technique termed n...
详细信息
ISBN:
(纸本)0818672587
Focus of attention mechanisms for robot vision are discussed. A new method for neglecting low level filter responses from already modelled structures is presented. The method is based on a filtering technique termed normalized convolution. In one experiment, the robot is continuously moving its arm in the scene while tracking other objects. It is shown how the arm can be made 'invisible' so that only the moving object of interest is detected. This makes tracking of objects much simpler. In another experiment, the attention of the system is shifted between objects by simply cancelling the mask of the object to be attended to. With this strategy the low level processes do not need to know the difference between a new object entering the scene and a mask being cancelled, and thus a complex communication structure between high and low levels is avoided.
Traditional empirical risk minimization (ERM) for semantic segmentation can disproportionately advantage or disadvantage certain target classes in favor of an (unfair but) improved overall performance. Inspired by the...
详细信息
ISBN:
(纸本)9781665448994
Traditional empirical risk minimization (ERM) for semantic segmentation can disproportionately advantage or disadvantage certain target classes in favor of an (unfair but) improved overall performance. Inspired by the recently introduced tilted ERM (TERM), we propose tilted cross-entropy (TCE) loss and adapt it to the semantic segmentation setting to minimize performance disparity among target classes and promote fairness. Through quantitative and qualitative performance analyses, we demonstrate that the proposed Stochastic TCE for semantic segmentation can offer improved overall fairness by efficiently minimizing the performance disparity among the target classes of Cityscapes.
Object recognition on the satellite images is one of the most relevant and popular topics in the problem of patternrecognition. This was facilitated by many factors, such as a high number of satellites with high-reso...
详细信息
ISBN:
(数字)9781538661000
ISBN:
(纸本)9781538661000
Object recognition on the satellite images is one of the most relevant and popular topics in the problem of patternrecognition. This was facilitated by many factors, such as a high number of satellites with high-resolution imagery, the significant development of computervision, especially with a major breakthrough in the field of convolutional neural networks, a wide range of industry verticals for usage and still a quite empty market. Roads are one of the most popular objects for recognition. In this article, we want to present you the combination of work of neural network and postprocessing algorithm, due to which we get not only the coverage mask but also the vectors of all of the individual roads that are present in the image and can be used to address the higher-level tasks in the future. This approach was used to solve the DeepGlobe Road Extraction Challenge.
Trajectory prediction is an important task in autonomous driving. State-of-the-art trajectory prediction models often use attention mechanisms to model the interaction between agents. In this paper, we show that the a...
详细信息
ISBN:
(数字)9781665487399
ISBN:
(纸本)9781665487399
Trajectory prediction is an important task in autonomous driving. State-of-the-art trajectory prediction models often use attention mechanisms to model the interaction between agents. In this paper, we show that the attention information from such models can also be used to measure the importance of each agent with respect to the ego vehicle's future planned trajectory. Our experiment results on the nuPlans dataset show that our method can effectively find and rank surrounding agents by their impact on the ego's plan.
We have designed and implemented a system for real-time detection qi 2-D features on a reconfigurable computer based on Field Programmable Gate Arrays (FPGA's). We envision this device as the front-end si a system...
详细信息
ISBN:
(纸本)0818684976
We have designed and implemented a system for real-time detection qi 2-D features on a reconfigurable computer based on Field Programmable Gate Arrays (FPGA's). We envision this device as the front-end si a system able to track image features in real-time control applications like autonomous vehicle navigation. The algorithm employed to select good features is inspired by Tomasi and Kanade's method. Compared to the original method, the algorithm that we have devised does not require any floating point or transcendental operations, and can be implemented either in hardware or in software. Moreover, it maps efficiently into a highly pipelined architecture, well suited to implementation in FPGA technology. We have implemented the algorithm on a lour-cost reconfigurable computer and have observed reliable operation on an image stream generated by a standard NTSC video camera at 30 Hz.
Human-object interaction (HOI) detection is a core task in computervision. The goal is to localize all human-object pairs and recognize their interactions. An interaction defined by a tuple leads to a long-tailed vi...
详细信息
ISBN:
(纸本)9781728193601
Human-object interaction (HOI) detection is a core task in computervision. The goal is to localize all human-object pairs and recognize their interactions. An interaction defined by a tuple leads to a long-tailed visual recognition challenge since many combinations are rarely represented. The performance of the proposed models is limited especially for the tail categories, but little has been done to understand the reason. To that end, in this paper, we propose to diagnose rarity in HOI detection. We propose a three-step strategy, namely Detection, Identification and recognition where we carefully analyse the limiting factors by studying state-of-the-art models. Our findings indicate that detection and identification steps are altered by the interaction signals like occlusion and relative location, as a result limiting the recognition accuracy.
This paper presents a sensing approach where phototransduction, multi-resolution feature extraction, scale-space integration and edge tracking are performed on a mixed (digital-analog) VLSI architecture in order to ge...
详细信息
ISBN:
(纸本)0818658258
This paper presents a sensing approach where phototransduction, multi-resolution feature extraction, scale-space integration and edge tracking are performed on a mixed (digital-analog) VLSI architecture in order to generate medium-level scene description. The proposed system is mainly targeted for robot vision applications where feature description is preferred to a set of raw or raster 2D images and edge maps. The Multiport Access photo-Receptor (MAR) is a CMOS sensor and represents the main sensory part of this integrated image acquisition system. VLSI also provides means to integrate analog computing, digital controller and DSP co-processor modules which define a powerful sensory chip set for focal plane image processing. A current version of the MAR sensor which implements 256 × 256 pixels includes 16 analog spatial filters which simultaneously compute multiresolution edge maps. This unique 2D hexagonal smart sensor approach which performs up to 8.5 × 109 arithmetic Op/sec during the acquisition/filtering phase and 25 × 109 Logical Op/sec for scale-space integration allows high resolution image capability. It represents a significant improvement for passive sensory units in a compact assembly for computervision applications.
Recent interest in developing online computervision algorithms is spurred in part by a growth of applications capable of generating large volumes of images and videos. These applications are rich sources of images an...
详细信息
ISBN:
(纸本)9781479943098
Recent interest in developing online computervision algorithms is spurred in part by a growth of applications capable of generating large volumes of images and videos. These applications are rich sources of images and video streams. Online vision algorithms for managing, processing and analyzing these streams need to rely upon streaming concepts, such as pipelines, to ensure timely and incremental processing of data. This paper is a first attempt at defining a formal stream algebra that provides a mathematical description of vision pipelines and describes the distributed manipulation of image and video streams. We also show how our algebra can effectively describe the vision pipelines of two state of the art techniques.
暂无评论