The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. In particular, we present a simple but effective proposal-free framework, namely the video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are manifested as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query ***, we employed a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and the visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.
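The regression-token idea in the abstract above can be illustrated with a small, hedged sketch. It is not the authors' implementation: the attention layer uses identity projections, the shared encoder and co-attention stages are omitted, and the names `self_attention`, `predict_boundary`, `w_start`, and `w_end` are hypothetical. It only shows how a [REG] token prepended to the video and query features can aggregate context from both and feed a boundary head.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (a simplification)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(dim)])
    return out


def predict_boundary(reg_out, w_start, w_end):
    """Map the [REG] output to normalized (start, end) times in [0, 1].

    The sigmoid head and the min/max ordering are assumptions for illustration.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    a = sigmoid(sum(r * w for r, w in zip(reg_out, w_start)))
    b = sigmoid(sum(r * w for r, w in zip(reg_out, w_end)))
    return min(a, b), max(a, b)


random.seed(0)
dim = 8
video = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(16)]  # 16 clip features
query = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]   # 5 word features
reg = [0.0] * dim  # stands in for the learnable [REG] token (zero-initialized here)

# Concatenate [REG] with video and query features, then let it attend to everything.
tokens = [reg] + video + query
encoded = self_attention(tokens)

# Hypothetical regression head weights; in training these would be learned.
w_start = [random.gauss(0, 1) for _ in range(dim)]
w_end = [random.gauss(0, 1) for _ in range(dim)]
start, end = predict_boundary(encoded[0], w_start, w_end)
print(f"predicted normalized moment: [{start:.3f}, {end:.3f}]")
```

Because the [REG] token here starts at zero, its attention weights are uniform and its output is simply the mean of all input tokens; a trained token would instead learn which video and query positions to weight.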
A new algebraic transition model is proposed based on a Structural Ensemble Dynamics (SED) theory of wall turbulence, for accurately predicting the hypersonic flow heat transfer on *** model defines the eddy viscosity in terms of a two-dimensional multi-regime distribution of a Stress Length (SL) function, and hence is named as *** paper presents clear evidence of precise predictions of the transition onset location and peak heat flux for a wide range of hypersonic Transitional Boundary Layers (TrBL) around a straight cone at zero incidence, to an unprecedented accuracy as validated by over 70 measurements covering five crucial influential factors (Mach number, temperature ratio, cone half angle, nose Reynolds number and noise level). The results demonstrate the universality of the postulated multi-regime similarity structure, in characterizing not only the spatially non-uniform distribution of the eddy viscosity in hypersonic TrBL on a cone, but also the dependence of the transition onset location on the five influential *** latter yields a novel correlation formula for the transition center Reynolds number, which takes a similar functional form to the SL function within the symmetry *** is concluded that the SED-SL model simulates TrBL around a cone with uniformly high accuracy, and thus points to a promising alternative way to construct hypersonic transition models.
The hollow soft magnetic particle-based magnetorheological (MR) fluid has been extensively studied due to its good anti-settling property. However, studies are mainly focused on experiments, and the discussion of the ...
The development of agriculture faces significant challenges due to population growth, climate change, land depletion, and environmental pollution, threatening global food security [1]. This necessitates the development of sustainable agriculture, where a fundamental step is crop breeding to improve agronomic or economic traits, e.g., increasing crop yields while decreasing resource usage and minimizing pollution to the environment [2].
The tremendous impact of large models represented by ChatGPT [1]-[3] makes it necessary to consider the practical applications of such models [4]. However, for an artificial intelligence (AI) to truly evolve, it needs to possess a physical “body” to transition from the virtual world to the real world and evolve through interaction with the real *** this context, “embodied intelligence” has sparked a new wave of research and technology, leading AI beyond the digital realm into a new paradigm that can actively act and perceive in a physical environment through tangible entities such as robots and automated devices [5].
Compared to 2D imaging data, 4D light field (LF) data retains richer scene structure information, which can significantly improve the computer’s perception capability, including depth estimation, semantic segmentation, and LF ***, there is a contradiction between spatial and angular resolution during the LF image acquisition *** overcome the above problem, researchers have gradually focused on light field super-resolution (LFSR). In the traditional solutions, researchers achieved LFSR based on various optimization frameworks, such as Bayesian and Gaussian *** learning-based methods are more popular than conventional methods because they have better performance and more robust generalization *** this paper, the present approaches are mainly divided into conventional methods and deep learning-based *** discuss these two branches in light field spatial super-resolution (LFSSR), light field angular super-resolution (LFASR), and light field spatial and angular super-resolution (LFSASR), ***, this paper also introduces the primary public datasets and analyzes the performance of the prevalent approaches on these ***, we discuss the potential innovations of LFSR to promote the progress of our research field.
Investigations of surface waves in piezoelectric media open up great possibilities for designing smart surface acoustic wave (SAW) *** is important to study the dispersion properties and manipulation mechanism of surface waves in the semi-infinite piezoelectric medium connected with a periodic arrangement of shunting *** this study, the extended Stroh formalism is developed to theoretically analyze the dispersion relations of surface waves under different external *** band structures of both the Rayleigh wave and the Bleustein-Gulyaev (BG) wave can be determined and manipulated with proper electrical boundary ***, the electromechanical coupling effects on the band structures of surface waves are discussed to figure out the manipulation mechanism of adjusting electric *** results indicate that the proposed method can explain the propagation behaviors of surface waves under the periodic electrical boundary conditions, and can provide important theoretical guidance for designing novel SAW devices and exploring extensive applications in practice.
Dear Editor, Light fields give a relatively complete description of scenes from the perspective of the angles and positions of rays. At present, most computer vision algorithms take 2D images as input, which are simplified expressions of light fields with depth information discarded. In theory, computer vision tasks may achieve better performance as long as complete light fields are acquired.
Dear Editor, This letter focuses on leveraging the object information in images to improve the performance of the U-Net based change *** detection is fundamental to many computer vision *** existing solutions based on deep neural networks are able to achieve impressive results.
In the coming decades, space-based gravitational-wave (GW) detectors such as Taiji, TianQin, and LISA are expected to form a network capable of detecting millihertz GWs emitted by the mergers of massive black hole binaries (MBHBs). In this work, we investigate the potential of GW standard sirens from the Taiji-TianQin-LISA network in constraining cosmological *** the optimistic scenario in which electromagnetic (EM) counterparts can be detected, we predict the number of detectable bright sirens based on three different MBHB population models, i.e., pop III, Q3d, and *** results show that the Taiji-TianQin-LISA network alone could achieve a constraint precision of 0.9% for the Hubble constant, meeting the standard of precision ***, the Taiji-TianQin-LISA network could effectively break the cosmological parameter degeneracies generated by the CMB data, particularly in the dynamical dark energy *** combined with the CMB data, the joint CMB+Taiji-TianQin-LISA data offer σ(w) = 0.036 in the wCDM model, which is close to the latest constraint result obtained from the CMB+SN *** also consider a conservative scenario in which EM counterparts are not *** to the precise sky localizations of MBHBs by the Taiji-TianQin-LISA network, the constraint precision of the Hubble constant is expected to reach 1.2%. In conclusion, the GW standard sirens from the Taiji-TianQin-LISA network will play a critical role in helping solve the Hubble tension and shedding light on the nature of dark energy.
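For readers unfamiliar with standard sirens, the textbook relation behind such constraints can be written as follows. This is general background rather than a formula taken from the abstract, and it assumes a spatially flat wCDM universe with radiation neglected: the GW waveform yields the luminosity distance $d_L$, the EM counterpart yields the redshift $z$, and together they constrain $H_0$, $\Omega_m$, and $w$.

```latex
\begin{align}
  d_L(z) &= \frac{c\,(1+z)}{H_0} \int_0^{z} \frac{\mathrm{d}z'}{E(z')}, \\
  E(z)   &= \sqrt{\Omega_m (1+z)^3 + (1-\Omega_m)\,(1+z)^{3(1+w)}} .
\end{align}
```

Setting $w = -1$ recovers the flat $\Lambda$CDM case, in which the dark energy term is constant.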