Recent advancements in blockchain-based cloud computing highlight its potential in providing robust data security, integrity, and confidentiality. While cloud computing is increasingly utilized for remote resource acc...
The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. In particular, we present a simple but effective proposal-free framework, namely the video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query features. First, we employed a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and the visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.
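The regression-token idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it replaces the vision-language transformer with a single scaled dot-product attention step, and every name (`attend`, `predict_boundary`, `head_w`, `head_b`), the random features, and the min/max ordering of the two outputs are assumptions made purely for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query_vec, tokens):
    """Scaled dot-product attention of one query vector over a token list."""
    dim = len(query_vec)
    scores = [
        sum(q * k for q, k in zip(query_vec, tok)) / math.sqrt(dim)
        for tok in tokens
    ]
    weights = softmax(scores)
    pooled = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
    return pooled, weights


def predict_boundary(reg_token, video_feats, query_feats, head_w, head_b):
    """Predict a normalized (start, end) pair from the [REG] token alone."""
    # Concatenate [REG] with the video and query features, as the abstract describes.
    sequence = [reg_token] + video_feats + query_feats
    # Assumption: one attention step stands in for the full transformer.
    pooled, weights = attend(reg_token, sequence)
    # Hypothetical linear head followed by a sigmoid, giving two values in (0, 1).
    logits = [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(head_w, head_b)]
    first, second = (1.0 / (1.0 + math.exp(-z)) for z in logits)
    # Assumption: order the two outputs so that start <= end.
    return (min(first, second), max(first, second)), weights


rng = random.Random(0)
dim = 8


def rand_vec():
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


reg_token = rand_vec()                        # learnable in a real model; random here
video_feats = [rand_vec() for _ in range(6)]  # 6 hypothetical clip features
query_feats = [rand_vec() for _ in range(4)]  # 4 hypothetical word features
head_w = [rand_vec(), rand_vec()]
head_b = [0.0, 0.0]

(start, end), weights = predict_boundary(reg_token, video_feats, query_feats, head_w, head_b)
print(f"normalized boundary: start={start:.3f}, end={end:.3f}")
```

The point of the sketch is that the boundary is read off the [REG] position only, after it has aggregated context from both modalities; the per-timestamp foreground/background constraint mentioned in the abstract is not modeled here.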
Because nodes in the Internet of Medical Things are generally deployed in accessible, unsecured environments, device authentication is crucial. In order to address hardware security issues in Internet of Medical Thi...
This paper focuses on the adoption of biometric and RFID security gadgets as innovative solutions for enhancing door lock systems. The traditional reliance on physical keys has proven vulnerable to security breaches, ...
The most prevalent cancer in women worldwide is breast cancer. A better outlook and lower mortality rates depend on early detection. Machine learning algorithms have recently demonstrated encouraging results in assist...
DNA sequencing data analysis has been an important area of research. However, the high cost of sequence production has hindered data production. In recent years, the development of Next Generation Sequencing techn...
Gas leaks pose a serious risk to public safety and the environment, endangering human health, destroying infrastructure, and increasing greenhouse gas emissions. In order to minimize the harm and protect human life, e...
This paper discusses a clinical chatbot that could examine the contamination and deliver essential insights regarding it before the user consults a specialist. To lower the healthcare charges and simil...
Detecting driver sleepiness is essential to prevent accidents and reduce the number of deaths caused by drivers falling asleep at the wheel. With the rapid growth of the population, the nu...
The polar regions of the Earth, specifically the Arctic and Antarctic, play a pivotal role in regulating the planet’s climate systems. These regions are highly sensitive to climate change, with observable shifts such...