This paper proposes to solve the task scheduling problem in cloud computing by using a load balance aware genetic algorithm (LAGA) with Minmin and Max-min methods. Task scheduling problems are of great importance in c...
详细信息
Object detection under imperfect data receives great attention recently. Weakly supervised object detection (WSOD) suffers from severe localization issues due to the lack of instance-level annotation, while semi-super...
详细信息
Most works on person re-identification (ReID) take advantage of large backbone networks such as ResNet, which are designed for image classification instead of ReID, for feature extraction. However, these backbones may...
详细信息
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning...
详细信息
Diffusion model is a promising approach to image generation and has been employed for Pose-Guided Person Image Synthesis (PGPIS) with competitive performance. While existing methods simply align the person appearance ...
详细信息
ISBN:
(数字)9798350353006
ISBN:
(纸本)9798350353013
Diffusion model is a promising approach to image generation and has been employed for Pose-Guided Person Image Synthesis (PGPIS) with competitive performance. While existing methods simply align the person appearance to the target pose, they are prone to overfitting due to the lack of a high-level semantic understanding on the source person image. In this paper, we propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for PGPIS. In the absence of image-caption pairs and textual prompts, we de-velop a novel training paradigm purely based on images to control the generation process of a pre-trained text-to-image diffusion model. A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt. This allows for the decoupling of fine-grained appearance and pose information controls at different stages, and thus circumventing the potential over-fitting problem. To generate more realistic texture details, a hybrid- granularity attention module is proposed to encode multi-scale fine-grained appearance features as bias terms to augment the coarse-grained prompt. Both quantitative and qualitative experimental results on the DeepFashion benchmark demonstrate the superiority of our method over the state of the arts for PGPIS. Code is availab.e at https://***/YanzuoLu/CFLD.
Person re-identification (Re-ID) aims to match person images across non-overlapping camera views. The majority of Re-ID methods focus on small-scale surveillance systems in which each pedestrian is captured in differe...
详细信息
Bias Temperature Instability (BTI) is one of the dominant CMOS aging mechanisms. It causes time-dependent variation, threatening circuit lifetime reliability. BTI-induced circuit errors are not detectable at the fabri...
详细信息
ISBN:
(数字)9781728174679
ISBN:
(纸本)9781728174686
Bias Temperature Instability (BTI) is one of the dominant CMOS aging mechanisms. It causes time-dependent variation, threatening circuit lifetime reliability. BTI-induced circuit errors are not detectable at the fabrication stage. On-line monitoring schemes are therefore necessary to capture the degradations during the operational time. Traditional aging monitoring techniques exhibit high implementation complexity and low stability. In this paper, we propose a BTI monitoring approach by simply tracking the start-up behavior of SRAM cells. SRAM is a widely used on-chip device in many applications. We study the impact of BTI for SRAM start-up values and age some cells in a manipulated manner. The BTI degradation is evaluated based on the number of SRAM cells starting with a certain value. This technique can be used to estimate the degradation for on-chip logic circuits without introducing additional circuitry, and thus has very low implementation complexity. We use an SRAM array with 1024 cells to estimate the degradations for multiple logic circuits, and show the average mean absolute percentage error as 8.48%. In addition, this technique is robust considering process, voltage and temperature variations.
The integrity of training data, even when annotated by experts, is far from guaranteed, especially for non-IID datasets comprising both in- and out-of-distribution samples. In an ideal scenario, the majority of sample...
详细信息
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. A range of defense methods have been proposed to train adversarially robust DNNs, among which adversarial training has demonstrated promis...
ISBN:
(纸本)9781713845393
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. A range of defense methods have been proposed to train adversarially robust DNNs, among which adversarial training has demonstrated promising results. However, despite preliminary understandings developed for adversarial training, it is still not clear, from the architectural perspective, what configurations can lead to more robust DNNs. In this paper, we address this gap via a comprehensive investigation on the impact of network width and depth on the robustness of adversarially trained DNNs. Specifically, we make the following key observations: 1) more parame- ters (higher model capacity) does not necessarily help adversarial robustness; 2) reducing capacity at the last stage (the last group of blocks) of the network can actually improve adversarial robustness; and 3) under the same parameter budget, there exists an optimal architectural configuration for adversarial robustness. We also provide a theoretical analysis explaning why such network configuration can help robustness. These architectural insights can help design adversarially robust DNNs.
Speaker extraction requires a sample speech from the target speaker as the reference. However, enrolling a speaker with a long speech is not practical. We propose a speaker extraction technique, that performs in multi...
详细信息
ISBN:
(纸本)9781728176055;9781728176062
Speaker extraction requires a sample speech from the target speaker as the reference. However, enrolling a speaker with a long speech is not practical. We propose a speaker extraction technique, that performs in multiple stages to take full advantage of short reference speech sample. The extracted speech in early stages is used as the reference speech for late stages. For the first time, we use frame-level sequential speech embedding as the reference for target speaker. This is a departure from the traditional utterance-based speaker embedding reference. In addition, a signal fusion scheme is proposed to combine the decoded signals in multiple scales with automatically learned weights. Experiments on WSJ0-2mix and its noisy versions (WHAM! and WHAMR!) show that SpEx++ consistently outperforms other state-of-the-art baselines.
暂无评论