ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Accurately synthesizing talking face videos and capturing fine facial features for individuals with long hair present significant challenges. To tackle these challenges in existing methods, we propose DEGSTalk, a 3D Gaussian Splatting (3DGS)-based talking face synthesis method built on decomposed per-embedding Gaussian fields, for generating realistic talking faces with long hair. DEGSTalk employs Deformable Pre-Embedding Gaussian Fields, which dynamically adjust pre-embedding Gaussian primitives using implicit expression coefficients, enabling precise capture of dynamic facial regions and subtle expressions. Additionally, we propose a Dynamic Hair-Preserving Portrait Rendering technique to enhance the realism of long hair motion in the synthesized videos. Results show that DEGSTalk achieves improved realism and synthesis quality compared to existing approaches, particularly in handling complex facial dynamics and hair preservation. Our code is available at https://***/CVI-SZU/DEGSTalk.
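The idea of dynamically adjusting pre-embedding Gaussian primitives with expression coefficients can be illustrated with a minimal sketch. The linear offset basis below is a hypothetical simplification for illustration, not the paper's actual implicit deformation model:

```python
import numpy as np

def deform_gaussian_centers(mu, expr_coeffs, offset_basis):
    """Shift canonical 3D Gaussian centers by a blend of per-coefficient
    offset fields (hypothetical stand-in for the implicit deformation).

    mu:           (N, 3)    canonical Gaussian centers
    expr_coeffs:  (K,)      expression coefficients for one frame
    offset_basis: (K, N, 3) offset field per coefficient (assumed learned)
    """
    # Blend the K offset fields by the frame's expression coefficients.
    offsets = np.tensordot(expr_coeffs, offset_basis, axes=1)  # (N, 3)
    return mu + offsets
```

A per-frame deformation of this form lets the canonical Gaussians stay fixed while only the coefficient vector varies over time.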
Edge computing moves cloud services closer to consumer Internet of Things (IoT) devices, reducing latency and bandwidth usage. This setup enables faster responses but also introduces new security challenges, particula...
Diffusion-Weighted Imaging (DWI) is a significant technique for studying white matter. However, it suffers from low-resolution obstacles in clinical settings. Post-acquisition Super-Resolution (SR) can enhance the res...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
3D generative adversarial network (GAN) inversion converts an image into a 3D representation to attain high-fidelity reconstruction and facilitate realistic image manipulation within the 3D latent space. However, previous approaches face a trade-off between reconstruction ability and editability. That is, reversing a real-world image to a low-dimensional latent code inevitably leads to information loss, while achieving a near-perfect reconstruction using a high-rate triplane representation often limits the ability to manipulate the image freely in the latent space. To address these issues, we propose a novel latent-conditioning encoder-based framework that aligns the low-dimensional latent with the high-dimensional triplane. A non-semantic guided editing strategy bridges the intrinsic relation between the latent condition and triplane generation, making it possible to edit the high-dimensional representation by latent manipulation. As a result, our method achieves high-fidelity reconstruction and editing simultaneously by directly controlling the latent code. Experimental results demonstrate that our approach excels in reconstruction and editing quality compared to previous 3D inversion methods. Furthermore, our method can also edit real faces with large poses as well as out-of-domain cases.
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Current diffusion-based inpainting models struggle to preserve unmasked regions or generate highly coherent content. Additionally, it is hard for them to generate meaningful content for 3D inpainting. To tackle these challenges, we design a plug-and-play branch that runs through the entire generation process to enhance existing models. Specifically, we utilize dual encoders, a Convolutional Neural Network (CNN) encoder and a pre-trained Variational AutoEncoder (VAE) encoder, to encode masked images. The latent code and the feature map from the dual encoders are fed to diffusion models simultaneously. In addition, we apply zero-padded initialization to mitigate the mode collapse caused by this branch. Experiments on BrushBench and EditBench demonstrate that models with our plug-and-play branch improve the coherence of inpainting, and our model achieves new state-of-the-art results.
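The zero-padded initialization mentioned above is reminiscent of the zero-convolution trick used when grafting a new branch onto a pre-trained network. A hedged sketch, using NumPy arrays as a stand-in for conv weights and assuming the standard `(out, in, kH, kW)` layout; this is one plausible reading, not necessarily the paper's exact recipe:

```python
import numpy as np

def zero_pad_conv_weight(base_weight, extra_in_channels):
    """Extend a conv kernel to accept extra input channels from a new
    branch, zero-initializing the new slice so the branch contributes
    nothing at the start of training (one way to avoid early collapse).
    """
    out_c, in_c, kh, kw = base_weight.shape
    # New input channels start at zero: the extended layer initially
    # reproduces the pre-trained layer's output exactly.
    pad = np.zeros((out_c, extra_in_channels, kh, kw),
                   dtype=base_weight.dtype)
    return np.concatenate([base_weight, pad], axis=1)
```

Because the padded slice is all zeros, the extended layer's output on a concatenated input equals the base layer's output on the original channels, so fine-tuning starts from the pre-trained behavior.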
Federated Learning (FL) is increasingly adopted in edge computing scenarios, where a large number of heterogeneous clients operate under constrained or sufficient resources. The iterative training process in conventio...
In recent years, Neural Architecture Search (NAS) has emerged as a promising approach for automatically discovering superior model architectures for deep Graph Neural Networks (GNNs). Different methods have paid atten...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
The analysis of Cardiotocography (CTG) signals is often hindered by challenges such as limited data availability and label imbalance, which can undermine the performance of deep learning models. To address these issues, we present CTGDiff, a novel conditional diffusion model designed for generating synthetic Fetal Heart Rate (FHR) and Uterine Contraction (UC) signals. CTGDiff leverages both Phase-Rectified Signal Averaging (PRSA) spectrograms and UC as conditioning inputs for FHR, and integrates time encoding, condition generation from PRSA features, and residual blocks with dilated convolutions to capture both temporal dynamics and long-range dependencies. Extensive experiments, both qualitative and quantitative, demonstrate the model's ability to synthesize high-quality CTG signals. In comparison with GANs and image-based diffusion models, CTGDiff achieves superior signal fidelity and distribution similarity for FHR, as indicated by metrics such as a 0.004 maximum mean discrepancy (MMD), 0.646 percent root mean square difference (PRD), 3.951 relative entropy (RE), and 0.291 Fréchet distance (FD). Expert evaluations confirm that the model can generate both normal and abnormal CTG signals with high accuracy, conditioned on specific input data. These results underscore the potential of diffusion models for a wide range of applications in biomedical time series analysis, including signal synthesis, imputation, and noise reduction.
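The dilated convolutions this abstract relies on for long-range dependencies can be sketched in a few lines (1-D, valid padding; the function below is illustrative only and unrelated to CTGDiff's actual layers):

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Valid-padding 1-D dilated convolution: each output taps inputs
    spaced `dilation` apart, widening the receptive field without
    adding parameters."""
    k = len(w)
    span = (k - 1) * dilation  # receptive-field width minus one
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span)
    ])
```

Stacking such layers with exponentially growing dilation rates is the usual way residual blocks cover long time spans in signal models.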
ISBN (digital): 9798331506476
ISBN (print): 9798331506483
Hypergraph Neural Networks (HGNNs) are increasingly utilized to analyze complex inter-entity relationships. Traditional HGNN systems, based on a hyperedge-centric dataflow model, independently process aggregation tasks for hyperedges and vertices, leading to significant computational redundancy. This redundancy arises from recalculating shared information across different tasks. For the first time, we identify and harness implicit dataflows (i.e., dependencies) within HGNNs, introducing the microedge concept to effectively capture and reuse intricate shared information among aggregation tasks, thereby minimizing redundant computations. We have developed a new microedge-centric dataflow model that processes shared information as fine-grained microedge aggregation tasks. This dataflow model is supported by the Read-Process-Activate-Generate execution model, which aims to optimize parallelism among these tasks. Furthermore, our newly developed MeHyper, a microedge-centric HGNN accelerator, incorporates a decoupled pipeline for improved computational parallelism and a hierarchical feature management strategy to reduce off-chip memory accesses for the large volumes of intermediate feature vectors generated. Our evaluation demonstrates that MeHyper substantially outperforms the leading CPU-based system PyG-CPU and the GPU-based system HyperGef, delivering performance improvements of $1,032.23 \times$ and $10.51 \times$, and energy efficiencies of $1,169.03 \times$ and $9.96 \times$, respectively.
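The computational reuse behind microedges can be shown with a toy sum-aggregation. Here each shared vertex subset (a "microedge") is given as input rather than discovered, which is a simplification: its partial sum is computed once and reused by every hyperedge containing it:

```python
def microedge_aggregate(hyperedges, microedges, feat):
    """Sum-aggregate vertex features per hyperedge, computing each
    microedge's partial sum exactly once (toy model of the
    microedge-centric dataflow; the accelerator's scheduling is far
    more elaborate).

    hyperedges: {name: (list of microedge ids, list of private vertices)}
    microedges: {id: list of vertices shared by >= 2 hyperedges}
    feat:       {vertex: scalar feature}
    """
    # Each shared partial sum is materialized once, not per hyperedge.
    micro_sum = {m: sum(feat[v] for v in verts)
                 for m, verts in microedges.items()}
    out = {}
    for e, (micro_ids, private_verts) in hyperedges.items():
        out[e] = (sum(micro_sum[m] for m in micro_ids)
                  + sum(feat[v] for v in private_verts))
    return out
```

In the hyperedge-centric baseline, every hyperedge would re-sum its full vertex set, recomputing the shared subset once per hyperedge; the reuse above is what removes that redundancy.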