Probe machine (PM) is a recently reported mathematical model with massive parallelism. Herein, we present a search for the maximum clique of an undirected graph with six vertices. We constructed a data library containing n sublibraries, each corresponding to a vertex in the given graph. Then, a probe library was designed according to the induced subgraph in order to search for and generate all maximal cliques. Subsequently, we performed the probe operation, and all maximal cliques were generated in parallel. The advantages of the proposed model lie in two aspects. On one hand, the solution to an NP-complete problem is generated in just one step of the probe operation rather than found in a vast solution space. On the other hand, the proposed model is highly [...]. [...] work demonstrates that PM is superior to TM in terms of searching capacity when tackling NP-complete problems.
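For contrast with the one-step probe operation, the sketch below enumerates maximal cliques the conventional sequential way. It uses the classical Bron-Kerbosch recursion, not the probe machine, and the six-vertex edge list is a hypothetical example rather than the graph studied in the paper.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with basic Bron-Kerbosch (no pivoting).

    adj maps each vertex to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already-explored vertices
        if not p and not x:
            cliques.append(sorted(r))  # nothing can extend r, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical six-vertex undirected graph (an assumption, not the paper's instance).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {v: set() for v in range(1, 7)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

cliques = maximal_cliques(adj)
maximum = max(cliques, key=len)  # the maximum clique is the largest maximal clique
```

The recursion visits candidate sets one after another, which is the sequential search over the solution space that the abstract contrasts with generating all maximal cliques in a single parallel step.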
This paper presents ControlVideo for text-driven video editing: generating a video that aligns with a given text while preserving the structure of the source video. Building on a pre-trained text-to-image diffusion model, ControlVideo enhances fidelity and temporal consistency by incorporating additional conditions (such as edge maps) and by fine-tuning the key-frame and temporal attention on the source video-text pair, following an in-depth exploration of the design space. Extensive experimental results demonstrate that ControlVideo outperforms various competitive baselines by delivering videos that exhibit high fidelity w.r.t. the source content and temporal consistency, all while aligning with the text. By incorporating low-rank adaptation layers into the model before training, ControlVideo is further empowered to generate videos that align seamlessly with reference images. More importantly, ControlVideo can be readily extended to the more challenging task of long video editing (e.g., with hundreds of frames), where maintaining long-range temporal consistency is crucial. To achieve this, we propose to construct a fused ControlVideo by applying basic ControlVideo to overlapping short video segments and key-frame videos and then merging them by pre-defined weight functions. Empirical results validate its capability to create videos of 140 frames, approximately 5.83 to 17.5 times as many as previous studies achieved. The code is available at https://***/thu-ml/controlvideo.
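The merging of overlapping segments by weight functions can be illustrated with a small sketch. The abstract does not specify the weight functions, so the triangular weight below is an assumption, as are the function names; scalars stand in for the per-frame tensors a real model would produce.

```python
def triangular_weight(i, length):
    """Hypothetical weight: largest mid-segment, smallest (but nonzero) at the ends."""
    return min(i + 1, length - i)


def fuse_segments(segments, starts, total_frames, weight_fn=triangular_weight):
    """Blend overlapping per-segment outputs into one sequence by weighted average."""
    num = [0.0] * total_frames  # weighted sum of values per frame
    den = [0.0] * total_frames  # sum of weights per frame
    for seg, start in zip(segments, starts):
        for i, value in enumerate(seg):
            w = weight_fn(i, len(seg))
            num[start + i] += w * value
            den[start + i] += w
    if any(d == 0 for d in den):
        raise ValueError("every frame must be covered by at least one segment")
    return [n / d for n, d in zip(num, den)]


# Two 4-frame segments that overlap on frames 2 and 3.
fused = fuse_segments(
    [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]],
    starts=[0, 2],
    total_frames=6,
)
```

In the overlap, each segment contributes most where it is far from its own boundary, so the fused sequence moves gradually from the first segment's values to the second's instead of jumping at a hard cut.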
Many well-known and effective anomaly detection methods assume that a reasonable decision boundary has a hypersphere shape, which however is difficult to obtain in practice and is not sufficiently compact, especially ...
The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. In particular, we present a simple but effective proposal-free framework, namely video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are manifested as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query [...]. [...], we employed a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.
Dear Editor, This letter deals with the problem of algorithm recommendation for online fault detection of spacecraft. By transforming the time series data into distributions and introducing a distribution-aware measure, a principal method is designed for quantifying the detectabilities of fault detection algorithms over special datasets.
The incentive mechanism of federated learning has been a hot topic, but little research has been done on the compensation of privacy [...]. To this end, this study uses the Local SGD federated learning framework and gives a theoretical analysis under the use of differential privacy [...]. Based on the analysis, a multi-attribute reverse auction model is proposed for user selection as well as payment calculation for participation in federated [...]. [...] model uses a mixture of economic and non-economic attributes in making choices for users and is transformed into an optimisation equation to solve the user choice [...]. In addition, a post-auction negotiation model that uses the Rubinstein bargaining model as well as optimisation equations to describe the negotiation process and theoretically demonstrate the improvement of social welfare is [...]. In the experimental part, the authors find that their algorithm improves both the model accuracy and the F1-score values relative to the comparison algorithms to varying degrees.
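As background for the negotiation step, the sketch below computes the equilibrium split of the textbook Rubinstein alternating-offers game over a surplus of size 1. How the paper maps this onto payments is not stated in the abstract, so the function name and the discount factors used here are assumptions for illustration only.

```python
def rubinstein_split(delta_proposer, delta_responder):
    """Equilibrium shares in the infinite-horizon alternating-offers game.

    Each delta is a player's per-round discount factor in (0, 1).
    Returns (proposer_share, responder_share) of a surplus of size 1.
    """
    if not (0 < delta_proposer < 1 and 0 < delta_responder < 1):
        raise ValueError("discount factors must lie strictly between 0 and 1")
    proposer = (1 - delta_responder) / (1 - delta_proposer * delta_responder)
    return proposer, 1 - proposer


# Hypothetical discount factors: both parties equally patient.
server_share, user_share = rubinstein_split(0.9, 0.9)
```

With equal discount factors the first mover's share reduces to 1 / (1 + delta), so the proposer keeps slightly more than half, and the advantage shrinks as both sides become more patient.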
The new shoot density of slash pine serves as a vital indicator for assessing its growth and photosynthetic capacity, while the number of new shoots offers an intuitive reflection of this [...]. With deep learning methods becoming increasingly popular, automated counting of new shoots has greatly improved in recent years but is still limited by tedious and expensive data collection and [...]. To resolve these issues, this paper proposes a semi-supervised counting network (MTSC-Net) for estimating the number of slash pine new [...]. [...], based on the mean-teacher framework, we introduce the improved VGG19 to extract multiscale new shoot [...]. [...], to connect local new shoot feature information with global channel features, an attention feature fusion module is introduced to achieve effective feature [...]. [...], the new shoot density map and density probability distribution are processed in a fine-grained manner through multiscale dilated convolution of the regression head and classification [...]. In addition, a masked image modeling strategy is introduced to encourage the contextual understanding of global new shoot features and improve the counting [...]. [...] experimental results show that MTSC-Net outperforms other semi-supervised counting models with labeled percentages ranging from 5% to 50%. When the labeled percentage is 5%, the mean absolute error and root mean square error are 17.71 and 25.49, respectively. [...] findings demonstrate that our work can be used as an efficient semi-supervised counting method to provide automated support for tree breeding and genetic utilization.
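The two reported metrics follow their standard definitions, sketched below for per-image counts. The sample counts are hypothetical and are not taken from the paper's dataset.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def root_mean_square_error(predicted, actual):
    """Square root of the average squared difference; penalises large misses more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical predicted vs. ground-truth shoot counts for three images.
predicted_counts = [10, 12, 9]
true_counts = [12, 12, 6]

mae = mean_absolute_error(predicted_counts, true_counts)
rmse = root_mean_square_error(predicted_counts, true_counts)
```

RMSE is never smaller than MAE on the same data, and a wide gap between them (as between 17.71 and 25.49) indicates that a few images carry much larger counting errors than the rest.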
This paper introduces the Surrogate-assisted Multi-objective Grey Wolf Optimizer (SMOGWO) as a novel methodology for addressing the complex problem of empty-heavy train allocation, with a focus on line utilization [...]. By integrating surrogate models to approximate the objective functions, SMOGWO significantly improves the efficiency and accuracy of the optimization [...]. The effectiveness of this approach is evaluated using the CEC2009 multi-objective test function suite, where SMOGWO achieves a superiority rate of 76.67% compared to other leading multi-objective [...]. [...], the practical applicability of SMOGWO is demonstrated through a case study on empty and heavy train allocation, which validates its ability to balance line capacity, minimize transportation costs, and optimize the technical combination of heavy [...]. [...] research highlights SMOGWO's potential as a robust solution for optimization challenges in railway transportation, offering valuable contributions toward enhancing operational efficiency and promoting sustainable development in the sector.
Three-dimensional ink rendering is an NPR (non-photorealistic rendering) art style, widely used in a range of fields, including gaming and [...]. [...], CycleGAN is a standard image transformation model; however, it prefers Wes...
Network traffic anomaly detection is essential for securing digital infrastructures, yet traditional methods often fail to handle the complexity and dynamics of network data effectively. Additionally, this paper prese...