This paper is focused on the synchronous control problem of teleoperation robot arms based on Markov jump linear systems. More specifically, a drive-response Markov model is used to describe the master and slave teleo...
Dear Editor, Key security is of great practical importance for protecting digital assets in blockchain systems. At present, users prefer to escrow their assets with centralized institutions, but this practice runs counter to the decentralization and anonymity that characterize blockchain. Incidents in which escrowed assets were locked or lost show that the security of keys escrowed on exchanges is questionable.
Dear Editor, This letter proposes a symmetry-preserving dual-stream graph neural network (SDGNN) for precise representation learning on an undirected weighted graph (UWG). Although existing graph neural networks (GNNs) are influential instruments for representation learning on a UWG, they invariably adopt a single node feature matrix to represent the sole node set of a UWG.
Time series data generated by thousands of sensors suffer from data quality problems. Traditional constraint-based techniques have greatly contributed to data cleaning applications. However, cleaning methods that su...
Research on the mental health education of college students is of great significance for improving the mental health level and quality of college students. In terms of research methods, there are both traditional sta...
Federated learning (FL) safeguards user privacy by uploading gradients instead of raw data. However, inference attacks can reconstruct raw data using gradients uploaded by users in FL. To mitigate this issue, research...
This paper presents ControlVideo for text-driven video editing: generating a video that aligns with a given text while preserving the structure of the source video. Building on a pre-trained text-to-image diffusion model, ControlVideo enhances fidelity and temporal consistency by incorporating additional conditions (such as edge maps) and by fine-tuning the key-frame and temporal attention on the source video-text pair, following an in-depth exploration of the design space. Extensive experimental results demonstrate that ControlVideo outperforms various competitive baselines, delivering videos that exhibit high fidelity w.r.t. the source content and temporal consistency while aligning with the text. By incorporating low-rank adaptation layers into the model before training, ControlVideo can also generate videos that align with reference images. More importantly, ControlVideo can be readily extended to the more challenging task of long video editing (e.g., with hundreds of frames), where maintaining long-range temporal consistency is crucial. To achieve this, we propose to construct a fused ControlVideo by applying basic ControlVideo to overlapping short video segments and key frame videos and then merging them by pre-defined weight functions. Empirical results validate its capability to create videos spanning 140 frames, approximately 5.83 to 17.5 times as many as previous studies achieved. The code is available at https://***/thu-ml/controlvideo.
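The merging of overlapping short segments by weight functions can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the triangular weight function, the window length and stride, the `edit_segment` callback, and the use of one scalar per frame as a stand-in for a frame latent. The abstract does not specify the actual weight functions, and the key frame videos it mentions are not modelled.

```python
def triangular_weights(length):
    """Hypothetical weight function: largest mid-segment, tapering toward both ends."""
    return [min(i + 1, length - i) for i in range(length)]


def segment_starts(num_frames, seg_len, stride):
    """Start indices of overlapping windows that together cover every frame."""
    if seg_len >= num_frames:
        return [0]
    starts = list(range(0, num_frames - seg_len + 1, stride))
    if starts[-1] + seg_len < num_frames:
        starts.append(num_frames - seg_len)  # final window sits flush with the end
    return starts


def fuse_segments(frames, seg_len, stride, edit_segment):
    """Edit each window independently, then blend overlaps by normalized weights.

    `frames` holds one number per frame (a stand-in for a frame latent);
    `edit_segment` is a hypothetical short-clip editor returning one value per input frame.
    """
    num_frames = len(frames)
    weighted_sum = [0.0] * num_frames
    weight_total = [0.0] * num_frames
    for start in segment_starts(num_frames, seg_len, stride):
        window = frames[start:start + seg_len]
        edited = edit_segment(window)
        weights = triangular_weights(len(window))
        for offset, (value, w) in enumerate(zip(edited, weights)):
            weighted_sum[start + offset] += w * value
            weight_total[start + offset] += w
    # Every frame is covered by at least one window, so weight_total is never zero.
    return [s / t for s, t in zip(weighted_sum, weight_total)]
```

Because the weights are normalized per frame, frames seen by a single window pass through unchanged, while frames in an overlap become a weighted average dominated by the window whose center is nearer.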
Glacier dynamics in the Himalayan midlatitudes, particularly in regions like the Shishapangma, are not yet fully understood, especially the localized topographic and climatic impacts on glacier *** study analyzes the spatiotemporal characteristics of glacier surface deformation in the Shishapangma region using the Small Baseline Subset (SBAS) Interferometric Synthetic Aperture Radar (InSAR) *** analysis reveals an average deformation rate of -4.02 ± 17.65 mm/yr across the entire study area, with glacier regions exhibiting significantly higher rates of uplift (16.87 ± 13.20 mm/yr) and subsidence (20.11 ± 14.55 mm/yr) compared to non-glacier *** identifies significant surface lowering on the mountain flanks and localized uplift in certain catchments, emphasizing the higher deformation rates in glacial areas compared to non-glacial *** found a strong positive correlation between temperature and cumulative deformation (correlation coefficient of 0.63), particularly in glacier areas (0.82). The research highlights the role of temperature as the primary driver of glacier wastage, particularly at lower elevations, with strong correlations found between temperature and cumulative *** also indicates the complex interactions between topographic features, notably slope gradient, which shows a positive correlation with subsidence rates, especially for slopes below 35°. South-, southwest-, and west-facing slopes exhibit significant uplift, while north-, northeast-, and east-facing slopes predominantly ***, we identified transition zones between debris-covered glaciers and clean ice as areas of most intense deformation, with average rates exceeding 30 mm/yr, highlighting these as potential high-risk zones for *** study comprehensively analyzes the deformation characteristics in both glacier and non-glacier areas in the Shishapangma region, revealing the complex interplay of topographic, climatic, and hydrological factors influencing glacier dynamic
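For readers unfamiliar with how a correlation coefficient such as the reported 0.63 is obtained, the sketch below computes a Pearson coefficient between two equally long series. The abstract does not state which correlation measure was used, so Pearson is an assumption here, and the function name and arguments are purely illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and centered squares (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Applied to a temperature series and a cumulative-deformation series sampled at the same dates, a value near 1 indicates that the two rise together, while a value near 0 indicates no linear relationship.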
Sparse representation plays an important role in the research of face *** a deformable sample classification task, face recognition is often used to test the performance of classification *** face recognition, differences in expression, angle, posture, and lighting conditions have become key factors that affect recognition ***, there may be significant differences between different image samples of the same face, which makes image classification very ***, how to build a robust virtual image representation becomes a vital *** solve the above problems, this paper proposes a novel image classification ***, to better retain the global features and contour information of the original sample, the algorithm uses an improved non-linear image representation method to highlight the low-intensity and high-intensity pixels of the original training sample, thus generating a virtual ***, by the principle of sparse representation, the linear expression coefficients of the original sample and the virtual sample can be calculated, *** obtaining these two types of coefficients, calculate the distances between the original sample and the test sample and the distance between the virtual sample and the test *** two distances are converted into distance ***, a simple and effective weight fusion scheme is adopted to fuse the classification scores of the original image and the virtual *** fused score will determine the final classification *** experimental results show that the proposed method outperforms other typical sparse representation classification methods.
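The final steps described above (distances converted to scores, weighted fusion, classification by the fused score) can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the min-max normalization, the fusion weight of 0.7, and the function names are all hypothetical, and the per-class distances are taken as given rather than computed from sparse coefficients.

```python
def distances_to_scores(distances):
    """Min-max normalize per-class distances into [0, 1]; smaller still means closer.

    The normalization is an assumed choice; the abstract does not define the conversion.
    """
    lo, hi = min(distances), max(distances)
    if hi == lo:
        return [0.0 for _ in distances]
    return [(d - lo) / (hi - lo) for d in distances]


def fuse_and_classify(dist_original, dist_virtual, weight_original=0.7):
    """Fuse original-image and virtual-image scores, then pick the best class.

    dist_original[c] and dist_virtual[c] are the distances between the test sample
    and class c, computed from the original and virtual samples respectively.
    weight_original is a hypothetical fusion weight in [0, 1].
    """
    if len(dist_original) != len(dist_virtual):
        raise ValueError("both distance lists must cover the same classes")
    scores_o = distances_to_scores(dist_original)
    scores_v = distances_to_scores(dist_virtual)
    fused = [
        weight_original * so + (1.0 - weight_original) * sv
        for so, sv in zip(scores_o, scores_v)
    ]
    # The class with the smallest fused score is the predicted label.
    best_class = min(range(len(fused)), key=fused.__getitem__)
    return best_class, fused
```

Normalizing each distance list separately keeps one representation from dominating the fusion merely because its distances are on a larger scale.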
The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. Particularly, we present a simple but effective proposal-free framework, namely video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are manifested as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query ***, we employed a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.
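The idea of a regression token that aggregates context from both modalities can be illustrated with a toy, single-step sketch. It is not ViGT's architecture: the single unprojected attention step, the sigmoid regression head, the ordering of the two outputs, and all names and parameters are assumptions chosen to keep the example self-contained.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reg_token_attention(reg_token, video_feats, query_feats):
    """One attention step in which [REG] attends over itself and all video/query tokens.

    Toy version: no learned query/key/value projections, single head, no residual.
    """
    tokens = [reg_token] + video_feats + query_feats  # concatenated input sequence
    scale = math.sqrt(len(reg_token))
    attn = softmax([dot(reg_token, t) / scale for t in tokens])
    dim = len(reg_token)
    # The updated [REG] state is the attention-weighted average of all tokens.
    return [sum(a * t[k] for a, t in zip(attn, tokens)) for k in range(dim)]


def predict_boundary(reg_state, w_start, w_end):
    """Map the [REG] state to a normalized (start, end) pair in [0, 1].

    w_start and w_end are hypothetical regression weights; sorting the two
    outputs so that start <= end is an assumption of this sketch.
    """
    a = sigmoid(dot(reg_state, w_start))
    b = sigmoid(dot(reg_state, w_end))
    return min(a, b), max(a, b)
```

In this sketch only the [REG] state feeds the boundary head, which mirrors the stated design choice of regressing from the token rather than from fused multi-modal features.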