Recently, large-scale pre-trained language-image models like CLIP have shown extraordinary capabilities for understanding spatial content, but naively transferring such models to video recognition still suffers from unsatisfactory temporal modeling capabilities. Existing methods insert tunable structures into or in parallel with the pre-trained model, which either requires back-propagation through the whole pre-trained model and is thus resource-demanding, or is limited by the temporal reasoning capability of the pre-trained structure. In this work, we present DiST, which disentangles the learning of the spatial and temporal aspects of videos. Specifically, DiST uses a dual-encoder structure, where a pre-trained foundation model acts as the spatial encoder and a lightweight network is introduced as the temporal encoder. An integration branch is inserted between the encoders to fuse spatio-temporal information. The disentangled spatial and temporal learning in DiST is highly efficient because it avoids back-propagation through the massive pre-trained parameters. Meanwhile, we empirically show that disentangled learning with an extra network for integration benefits both spatial and temporal understanding. Extensive experiments on five benchmarks show that DiST outperforms existing state-of-the-art methods by convincing margins. When pre-training on the large-scale Kinetics-710, we achieve 89.7% on Kinetics-400 with a frozen ViT-L model, which verifies the scalability of DiST. Code and models are available at https://***/alibaba-mmai-research/DiST.
In this paper, we propose a novel transfer learning framework, named generalized subspace distribution adaptation (GSDA), to tackle the challenging cross-corpus speech emotion recognition problem. First, we learn a co...
This paper introduces a broadband microwave bandpass filter. The structure of the filter is a filter cavity formed by two balanced dielectric sheets. On the two dielectric sheets, the relative face of the microwave in...
To address the problems of integrating computer course teaching with international teaching and engineering certification, this study designs a multi-dimensional teaching mode of computer courses through the sup...
We show that crowd counting can be viewed as a decomposable point querying process. This formulation accepts arbitrary points as input and jointly reasons about whether the points belong to the crowd and where they are located. The queryi...
In the era of big data, the growing use of mobile terminals also places new requirements on the teaching of Discipline English. Based on some general problems in the teaching practice of this...
This paper introduces a microwave filter with pentagram grooves, which is a spoof surface plasmon polariton (SSPP) type microwave bandpass filter. The filter adopts a two-stage structure. The first section is a slot-line...
For bulldozers to operate automatically in mine scenes, they must accurately identify and segment retaining walls. However, point cloud datasets of mine sites are too scarce for deep learning-based identification and segmentation algorithms to be applied in this setting. In addition, because mine roads are rugged, dust is heavy, disturbances are numerous, and retaining wall features are not distinctive, previously proposed algorithms do not fully solve the problem. We propose a point cloud recognition and segmentation algorithm based on clustering and an evaluation function of integrated features. The algorithm first compensates for the skew of the point cloud map with RANSAC and downsamples the data by gridding the point cloud. It then reduces the influence of dust and truck materials in the scene using normal vector and variance information. Finally, it screens out the candidate target class by density clustering, and identifies and segments the retaining wall using the integrated features. The proposed algorithm is validated in several different real mine scenarios, and the results show that it has high accuracy and strong robustness.
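The first stage described above (RANSAC-based skew estimation followed by grid downsampling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `grid_downsample` and `ransac_plane`, the cell size, the inlier threshold, and the iteration count are all assumptions chosen for illustration, and the later stages (dust filtering, density clustering, feature evaluation) are not covered.

```python
import numpy as np


def grid_downsample(points, cell):
    """Replace all points in each occupied cubic cell (side `cell`) by their centroid."""
    keys = np.floor(points / cell).astype(np.int64)
    # `inverse` maps every point to the index of its cell among the unique cells.
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n_cells = inverse.max() + 1
    sums = np.zeros((n_cells, 3))
    np.add.at(sums, inverse, points)  # accumulate coordinates per cell
    counts = np.bincount(inverse, minlength=n_cells)
    return sums / counts[:, None]


def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Fit a plane n.x + d = 0 by RANSAC; return (unit normal n, offset d)."""
    rng = np.random.default_rng(seed)
    best_count, best = -1, None
    for _ in range(iterations):
        # Three distinct random points define a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # skip (nearly) collinear samples
            continue
        normal = normal / norm
        d = -normal @ p0
        # Count points whose distance to the candidate plane is below the threshold.
        count = int(np.sum(np.abs(points @ normal + d) < threshold))
        if count > best_count:
            best_count, best = count, (normal, d)
    return best
```

In such a sketch, the angle between the fitted normal and the vertical axis would give the skew to compensate, and `grid_downsample` would then be applied to the levelled cloud before any clustering step.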
In view of the serious problems common to coursework in Chinese universities at present, such as its original function being weakened or suppressed, and its form being too abstract and lacking elaborate design...
This paper investigates the spatial deployment of large-scale heterogeneous multi-agent systems (HMASs) over desired 2D or 3D curves. Under the assumption that HMASs consist of numerous first-order agents (FOAs) and second-order agents (SOAs) that can obtain local information about the desired curves and their positions relative to their closest neighbors, the collective dynamics of large-scale HMASs are modeled as heterogeneous partial differential equations (PDEs). In particular, this paper introduces series-dependent topological weights between neighboring agents, which are more versatile and practical than the constant topological weights commonly used in previous studies. A novel single-point control scheme is proposed, in which an informed agent is situated between the last FOA and the first SOA. This arrangement not only ensures successful spatial deployment, but also guarantees the well-posedness of the constructed heterogeneous error PDEs. Using inequality techniques, sufficient conditions for exponential convergence of the error system are derived. A numerical example is presented to demonstrate the effectiveness of the proposed approach.