The anticipation problem has been studied from different angles, such as predicting humans' locations, predicting hand and object trajectories, and forecasting actions and human-object interactions. In this paper, we study the short-term object interaction anticipation problem from the egocentric point of view, proposing a new end-to-end architecture named StillFast. Our approach simultaneously processes a still image and a video, detecting and localizing next-active objects, predicting the verb which describes the future interaction, and determining when the interaction will start. Experiments on the large-scale egocentric dataset EGO4D [17] show that our method outperforms state-of-the-art approaches on the considered task. Our method ranked first on the public leaderboard of the EGO4D short-term object interaction anticipation challenge 2022 and is the official baseline for the 2023 edition. Please see the project web page for code and additional details: https://***/stillfast/.
It is likely that human-level online learning for vision will require a brain-like developmental model. We present a general-purpose model, called the Self-Aware and Self-Effecting (SASE) model, characterized by internal sensation and action. Rooted in the biological genomic equivalence principle, this model is a general-purpose, cell-centered, in-place learning scheme that handles different levels of development and operation, from the cell level all the way to the brain level. It is unknown how the brain self-organizes its internal wiring without a holistically aware central controller. How does the brain develop internal object representations? How do such representations enable tightly intertwined attention and recognition in the presence of complex backgrounds? Internally in SASE, local neural learning uses only the co-firing between pre-synaptic and post-synaptic activities. Such a two-way representation automatically boosts action-relevant components in the sensory inputs (e.g., foreground vs. background) by increasing the chance that only action-related feature detectors win the competition. It enables development in a "skull-closed" fashion. We discuss SASE networks called Where-What Networks (WWN) for the open problem of general-purpose online attention and recognition with complex backgrounds. In WWN, desired invariance and specificity emerge at each of the what and where motor ends without an internal master map. WWN allows both type-based and location-based top-down attention, to attend to and recognize individual objects against complex backgrounds (which may include other objects). It is proposed that WWN can deal with any real-world foreground objects and any complex backgrounds.
We describe an FPGA-based on-board control system for autonomous orientation of an aerial robot to assist aerial manipulation tasks. The system applies yaw control to help an operator precisely position a drone when it is near a bar-like object. This is achieved by a parallel Hough transform enhanced with a novel image space separation method, enabling highly reliable results in various circumstances combined with high performance. The feasibility of this approach is demonstrated by applying the system to a multi-rotor aerial robot equipped with an upward-directed robotic hand on top of the airframe, developed for high-altitude manipulation tasks. In order to grasp a bar-like object, the orientation of the bar is observed from image data obtained by a monocular camera mounted on the robot. This data is then analyzed by the on-board FPGA system to control the yaw angle of the aerial robot. In experiments, reliable yaw-orientation control of the aerial robot is achieved.
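The orientation-estimation step can be sketched as a plain Hough transform over edge pixels. The version below is a hypothetical single-threaded illustration in Python; the paper's contribution is an FPGA-parallel variant with image space separation, which is not modeled here, and the function name and interface are assumptions for the sketch.

```python
import numpy as np

def hough_line_orientation(edge_points, img_shape, n_theta=180):
    """Estimate the Hough normal angle theta (radians) of the dominant
    line from (y, x) edge pixels via a standard Hough transform.
    The bar's direction is theta +/- pi/2. Illustrative only; the
    paper's FPGA pipeline parallelizes this with image space separation."""
    h, w = img_shape
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(h, w)))
    # accumulator over (rho, theta); rho shifted by diag to stay non-negative
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for y, x in edge_points:
        rhos = np.round(x * cos_t + y * sin_t).astype(int) + diag
        acc[rhos, np.arange(n_theta)] += 1  # one vote per theta bin
    _, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
    return thetas[theta_idx]
```

For a diagonal bar along y = x, every edge pixel votes for the same (rho, theta) bin with theta = 3π/4 (the line's normal direction), so the accumulator peak recovers the orientation exactly.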
A new robust estimator based on an evolutionary optimization technique is proposed. The general hypothesize-and-verify strategy accelerates parameter estimation substantially through systematic trial and parallel evaluation, without the use of prior information. The method is evaluated by estimation of multi-view relations, i.e. the fundamental matrix. Additionally, some results for the trifocal geometry are presented. However, the general methodology could be used for any problem in which relations can be determined from a minimum number of points.
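The hypothesize-and-verify strategy with an evolutionary search over minimal samples can be illustrated on a toy problem. The sketch below robustly fits a 2D line, where each individual in the population is a minimal two-point sample and fitness is the inlier count of the implied line; this is a hypothetical simplification, not the paper's estimator, which applies the same idea to larger minimal samples for multi-view relations.

```python
import numpy as np

def robust_line_evolutionary(points, pop_size=20, generations=30,
                             tol=0.05, seed=0):
    """Toy evolutionary hypothesize-and-verify robust fit of a 2D line
    (n . x = c with |n| = 1). Individuals are minimal 2-point samples;
    fitness is the inlier count of the hypothesized line. Selection
    keeps the fitter half; mutation swaps one sample index."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, float)
    n_pts = len(pts)

    def score(sample):
        p, q = pts[list(sample)]
        d = q - p
        nv = np.array([-d[1], d[0]])          # normal of line through p, q
        if not nv.any():                      # degenerate: identical points
            return 0, None
        nv = nv / np.linalg.norm(nv)
        c = nv @ p
        inliers = np.abs(pts @ nv - c) < tol  # verify: count inliers
        return int(inliers.sum()), (nv, c)

    pop = [tuple(rng.choice(n_pts, 2, replace=False)) for _ in range(pop_size)]
    best = max(pop, key=lambda s: score(s)[0])
    for _ in range(generations):
        pop.sort(key=lambda s: score(s)[0], reverse=True)
        survivors = pop[: pop_size // 2]
        # mutate: keep first index, resample the second
        children = [(s[0], int(rng.integers(n_pts))) for s in survivors]
        pop = survivors + children
        best = max(pop + [best], key=lambda s: score(s)[0])
    return score(best)[1]  # (normal, offset) of the best hypothesis
```

With 50 exact points on a line plus 20 random outliers, any individual whose two indices both hit the line scores all 50 inliers, so the search converges to the true line.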
ISBN:
(Print) 0818608625
The author studies the differential geometry of straight homogeneous generalized cylinders (SHGCs). He derives a necessary and sufficient condition that an SHGC must verify to parameterize a regular surface, computes the Gaussian curvature of a regular SHGC, and proves that the parabolic lines of an SHGC are either meridians or parallels. Using these results, he addresses the following problem: under which conditions can a given surface have several descriptions by SHGCs? He proves several results. In particular, he proves that two SHGCs with the same cross-section plane and axis direction are necessarily deduced from each other through inverse scalings of their cross-sections and sweeping rule curves. He extends Shafer's pivot and slant theorems. Finally, he proves that a surface with at least two parabolic lines has at most three different SHGC descriptions, and that a surface with at least four parabolic lines has at most one SHGC description.
This paper proposes a new database sequence pattern search algorithm N-OPS, which is based on text search algorithms, and describes its complete technique for searching a given sequence pattern in stored sequential da...
ISBN:
(Print) 9781467388511
Classifying a visual concept merely from its associated online textual source, such as a Wikipedia article, is an attractive research topic in zero-shot learning because it alleviates the burden of manually collecting semantic attributes. Recent work has pursued this approach by exploring various ways of connecting the visual and text domains. In this paper, we revisit this idea by going further to consider one important factor: the textual representation is usually too noisy for the zero-shot learning application. This observation motivates us to design a simple yet effective zero-shot learning method that is capable of suppressing noise in the text. Specifically, we propose an ℓ2,1-norm-based objective function which can simultaneously suppress the noisy signal in the text and learn a function to match the text document and visual features. We also develop an optimization algorithm to efficiently solve the resulting problem. By conducting experiments on two large datasets, we demonstrate that the proposed method significantly outperforms competing methods that rely on online information sources but apply no explicit noise suppression. Furthermore, we make an in-depth analysis of the proposed method and provide insight as to what kind of information in documents is useful for zero-shot learning.
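A generic form of such an ℓ2,1-norm objective, min_W ||XW − Y||²_F + λ||W||_{2,1} with ||W||_{2,1} = Σ_i ||w_i||₂ over rows of W, can be solved by iteratively reweighted least squares. The sketch below is an illustration of this standard solver under assumed notation, not the paper's exact formulation or optimizer; the row sparsity it induces in W is what suppresses noisy input dimensions.

```python
import numpy as np

def l21_regression(X, Y, lam=0.1, n_iter=50, eps=1e-8):
    """Iteratively reweighted least squares for
        min_W ||X W - Y||_F^2 + lam * ||W||_{2,1}.
    Rows of W with small norm receive a growing penalty and are driven
    toward zero, suppressing the corresponding (noisy) input features."""
    d = X.shape[1]
    XtX, XtY = X.T @ X, X.T @ Y
    W = np.linalg.solve(XtX + lam * np.eye(d), XtY)  # ridge warm start
    for _ in range(n_iter):
        # reweighting: diagonal d_ii = lam / (2 ||w_i||); small rows -> large penalty
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        W = np.linalg.solve(XtX + np.diag(lam / (2.0 * row_norms)), XtY)
    return W
```

On synthetic data where only the first half of the features generate the targets, the learned W keeps those rows large while the rows for pure-noise features collapse toward zero.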