Approximate computing has become a widely recognized method for designing energy-efficient arithmetic architectures in the context of error-tolerant applications. This paper presents the design and analysis of a 4-bit...
ISBN: (print) 9781665493468
Image completion with large-scale free-form missing regions is one of the most challenging tasks for the computer vision community. While researchers pursue better solutions, drawbacks such as pattern unawareness, blurry textures, and structure distortion remain noticeable and leave room for improvement. To overcome these challenges, we propose a new StyleGAN-based image completion network, Spectral Hint GAN (SH-GAN), which introduces a carefully designed spectral processing module, the Spectral Hint Unit. We also propose two novel 2D spectral processing strategies, Heterogeneous Filtering and Gaussian Split, that fit well with modern deep learning models and may be extended to other tasks. In our comprehensive experiments, our model reaches FID scores of 3.4134 and 7.0277 on the benchmark datasets FFHQ and Places2, respectively, outperforming prior work and setting a new state of the art. We also demonstrate the effectiveness of our design via ablation studies, which show that the aforementioned challenges, i.e., pattern unawareness, blurry textures, and structure distortion, are noticeably alleviated. Our code will be open-sourced at: https://***/SHI-Labs/SH-GAN.
This research uses the XGBoost machine learning algorithm to improve the prediction of novel coronavirus infection and COVID-19 mortality. By studying data on related diseases, it helps to prevent COVID-19 infection effectively and also provides advice for specific patients who need treatment without delay. The existing machine learning model trains a logistic regression algorithm on the imbalanced data but achieves unsatisfactory precision. This research improves the result by combining undersampling with XGBoost, using random forest, logistic regression, decision tree, and other models for comparison. The accuracy of COVID-19 mortality prediction is 91% both before and after applying undersampling; however, the AUC rises by about 13% and finally reaches 92%. This research is applicable to predicting the prevalence and death rates of COVID-19 in people with underlying diseases.
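The gap between accuracy and AUC reported in this abstract is typical of imbalanced data. The following minimal sketch assumes plain random undersampling (the abstract does not say which variant was used) and uses made-up toy labels, not the study's data. It shows that a model that always predicts the majority class can score high accuracy while its AUC stays at chance level. The function names are hypothetical, and no XGBoost call is shown.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def accuracy(predictions, labels):
    """Fraction of predictions that equal the true label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): 95 survivors (label 0) and 5 deaths (label 1).
rows = [[i] for i in range(100)]
labels = [0] * 95 + [1] * 5

# A classifier that always answers "survives" looks accurate but ranks at chance.
print(accuracy([0] * 100, labels), auc([0.0] * 100, labels))

# After undersampling, both classes contribute equally to training.
balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```

A classifier would then be trained on `balanced_rows` and `balanced_labels` and evaluated on untouched held-out data, so that the reported metrics still reflect the original class distribution.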
ISBN: (print) 9798350319439
Spatial data are commonly stored in a Geographic Database (GDB), which is the backbone of a Geographic Information System (GIS). Setting up such a database is a tedious and costly task. Moreover, the overwhelming amount of spatial information, particularly in textual data, is growing continuously. Hence, efforts have been devoted to extracting such data conveyed in freely available text streams. In this context, we address this issue by proposing a novel hybrid approach that combines several Natural Language Processing (NLP) techniques, rules, and gazetteers to extract spatial named entities and retrieve the relationships among them.
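To make the combination of gazetteers and rules concrete, here is a minimal sketch using only regular expressions. It is not the authors' pipeline: the gazetteer entries, the relation vocabulary, and the single sentence pattern are all hypothetical, and a real system would add the NLP components the abstract mentions (tokenization, tagging, parsing).

```python
import re

# Hypothetical gazetteer and relation vocabulary, chosen only for illustration.
GAZETTEER = {"Paris", "Lyon", "Marseille", "Versailles", "Le Havre"}
RELATION_WORDS = ("north of", "south of", "near", "in")


def _alternation(terms):
    """Build a regex alternation, longest term first so multi-word names win."""
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


def extract_places(text):
    """Gazetteer lookup: return (name, character offset) for each known place."""
    pattern = re.compile(r"\b(?:%s)\b" % _alternation(GAZETTEER))
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]


def extract_relations(text):
    """Rule: '<place> [is|lies] <relation word> <place>' yields one triple."""
    pattern = re.compile(
        r"\b(?P<a>%s)\b\s+(?:is\s+|lies\s+)?(?P<rel>%s)\s+(?P<b>%s)\b"
        % (_alternation(GAZETTEER), _alternation(RELATION_WORDS), _alternation(GAZETTEER))
    )
    return [(m.group("a"), m.group("rel"), m.group("b")) for m in pattern.finditer(text)]


text = "Lyon lies north of Marseille. Versailles is near Paris."
print(extract_places(text))
print(extract_relations(text))
```

The gazetteer supplies the entities, and the rule only decides whether two of them are linked. The two resources can therefore be extended independently.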
ISBN: (print) 9798350307443
This paper presents a novel framework called HST for semi-supervised video object segmentation (VOS). HST extracts image and video features using the latest Swin Transformer and Video Swin Transformer to inherit their inductive bias for the spatiotemporal locality, which is essential for temporally coherent VOS. To take full advantage of the image and video features, HST casts image and video features as a query and memory, respectively. By applying efficient memory read operations at multiple scales, HST produces hierarchical features for the precise reconstruction of object masks. HST shows effectiveness and robustness in handling challenging scenarios with occluded and fast-moving objects under cluttered backgrounds. In particular, HST-B outperforms the state-of-the-art competitors on multiple popular benchmarks, i.e., YouTube-VOS (85.0%), DAVIS 2017 (85.9%), and DAVIS 2016 (94.0%).
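The query-and-memory formulation can be illustrated with a generic attention-style memory read. This is a sketch under the assumption of scaled dot-product attention. It is not HST's implementation, whose efficient multi-scale read operations are not described in the abstract, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(queries, mem_keys, mem_values):
    """For each query vector, return a similarity-weighted average of memory values."""
    scale = 1.0 / math.sqrt(len(mem_keys[0]))
    value_dim = len(mem_values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in mem_keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, mem_values)) for d in range(value_dim)]
        )
    return outputs


# Toy example: the query matches the first memory key, so it reads that key's value.
queries = [[10.0, 0.0]]       # stands in for image features of the current frame
mem_keys = [[10.0, 0.0], [0.0, 10.0]]  # stands in for video features of past frames
mem_values = [[1.0], [0.0]]   # e.g. "object" vs "background" evidence
print(memory_read(queries, mem_keys, mem_values))
```

In a hierarchical design, a read of this kind would be repeated on feature maps at several resolutions, with the results combined when decoding the mask.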
ISBN: (print) 9781728198354
We address the problem of efficiently compressing video for conferencing-type applications. We build on recent approaches based on image animation, which can achieve good reconstruction quality at very low bitrates by representing face motion with a compact set of sparse keypoints. However, these methods encode video in a frame-by-frame fashion, i.e., each frame is reconstructed from a reference frame, which limits the reconstruction quality when more bandwidth is available. Instead, we propose a predictive coding scheme that uses image animation as a predictor and codes the residual with respect to the actual target frame. The residuals can in turn be coded in a predictive manner, thus efficiently removing temporal dependencies. Our experiments indicate a significant bitrate gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC, on a dataset of talking-head videos.
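The core idea of predictive coding with a residual can be sketched in a few lines. This toy example assumes a uniform scalar quantizer and a hand-written prediction that stands in for the output of the image-animation predictor. The pixel values, step size, and function names are hypothetical and unrelated to the paper's codec.

```python
def encode_residual(target, prediction, step):
    """Quantize the difference between the target frame and its prediction."""
    return [round((t - p) / step) for t, p in zip(target, prediction)]


def decode_frame(prediction, symbols, step):
    """Rebuild the frame from the shared prediction plus the dequantized residual."""
    return [p + q * step for p, q in zip(prediction, symbols)]


# One row of pixel intensities (made up) and an imperfect prediction of it.
target = [120, 124, 131, 140, 138, 129]
prediction = [118, 125, 128, 143, 139, 121]
step = 4  # quantizer step: larger means fewer bits but more distortion

symbols = encode_residual(target, prediction, step)
reconstructed = decode_frame(prediction, symbols, step)
max_error = max(abs(t - r) for t, r in zip(target, reconstructed))

print(symbols)        # small integers, cheap to entropy-code
print(reconstructed)
print(max_error)      # bounded by half the quantizer step
```

Only the quantized residual symbols need to be transmitted, because the decoder can form the same prediction on its own. The better the predictor, the closer these symbols stay to zero.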
ISBN: (print) 9798350319439
We focused on exploring novel methods and approaches in data analysis and processing for recommendation systems. To build our recommendation system, we established models for points of interest (POIs) and users. Our solution incorporated three key factors: sentiment analysis, user preferences, and ratings, culminating in the integration of the LightGCN model. The sentiment analysis factor played a crucial role in analyzing user reviews and predicting ratings. The user preference factor enabled us to recommend the most suitable POIs based on individual preferences and interests. During the experimentation phase, we used Yelp datasets to preprocess, filter, and model the POIs, incorporating sentiment analysis of reviews. The recommendation system, developed in the same environment, used the combined results of the three factors and the LightGCN model to provide improved POI recommendations for users.
ISBN: (print) 9781728198354
Deep learning-based food image classification has enabled more accurate nutrition content analysis for image-based dietary assessment by predicting the types of food in eating occasion images. However, there are two major obstacles to applying food classification in real-life applications. First, real-life food images usually follow a heavy-tailed distribution, resulting in a severe class-imbalance issue. Second, it is challenging to train a single-stage (i.e., end-to-end) framework under a heavy-tailed data distribution, which causes over-prediction of head classes with many instances and under-prediction of tail classes with few instances. In this work, we address both issues by introducing a novel single-stage heavy-tailed food classification framework. Our method is evaluated on two heavy-tailed food benchmark datasets, Food101-LT and VFN-LT, and achieves the best performance compared to existing work, with an improvement of over 5% in top-1 accuracy.
ISBN: (print) 9798350307184
Diffusion models generate images by iterative denoising. Recent work has shown that by making the denoising process deterministic, one can encode real images into latent codes of the same size, which can be used for image editing. This paper explores the possibility of defining a latent space even when the denoising process remains stochastic. Recall that, in stochastic diffusion models, Gaussian noise is added in each denoising step, and we can concatenate all the noise vectors to form a latent code. This results in a latent space of much higher dimensionality than the original image. We demonstrate that this latent space of stochastic diffusion models can be used in the same way as that of deterministic diffusion models in two applications. First, we propose CycleDiffusion, a method for zero-shot and unpaired image editing using stochastic diffusion models, which improves the performance over its deterministic counterpart. Second, we demonstrate unified, plug-and-play guidance in the latent spaces of deterministic and stochastic diffusion models.
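The observation that the per-step noise can be concatenated into a latent code is easy to check on a toy sampler. The sketch below is an assumption-laden illustration, not the paper's model: `toy_mean` is a hypothetical stand-in for a trained denoiser, the noise scale is constant, and noise is added at every step. It records the starting sample and every noise draw, then shows that replaying the recorded values reproduces the output exactly.

```python
import random


def toy_mean(x, t):
    """Hypothetical stand-in for a learned denoiser's predicted mean at step t."""
    return [0.9 * v for v in x]


def sample_with_latent(dim, steps, sigma, rng):
    """Run a stochastic reverse process and record all randomness as a latent code."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    latent = list(x)  # the starting sample is the first part of the code
    for t in range(steps, 0, -1):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        latent.extend(eps)  # concatenate this step's noise
        x = [m + sigma * e for m, e in zip(toy_mean(x, t), eps)]
    return x, latent


def decode(latent, dim, steps, sigma):
    """Replay the recorded noise: sampling becomes a deterministic function of the code."""
    x = latent[:dim]
    for i, t in enumerate(range(steps, 0, -1)):
        eps = latent[dim * (i + 1): dim * (i + 2)]
        x = [m + sigma * e for m, e in zip(toy_mean(x, t), eps)]
    return x


dim, steps, sigma = 4, 10, 0.1
image, latent = sample_with_latent(dim, steps, sigma, random.Random(0))
print(len(latent))                                # dim * (steps + 1) values
print(decode(latent, dim, steps, sigma) == image)
```

In this toy setting the code holds `dim * (steps + 1)` numbers, which illustrates why the latent space is described as having much higher dimensionality than the image itself.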
ISBN: (print) 9798350307443
The majority of existing image forgeries involve augmenting a specific region of the source image, which leaves detectable artifacts and forensic traces. These distinguishing features are mostly found in and around the local neighborhood of the manipulated pixels. However, patch-based detection approaches quickly become intractable due to inefficient computation and low robustness. In this work, we investigate how to effectively learn these forensic representations using local window-based attention techniques. We propose the Forensic Modulation Network (ForMoNet), which uses focal modulation and gated attention layers to automatically identify the long- and short-range context for any query pixel. Furthermore, the network is more interpretable and computationally efficient than standard self-attention, which is critical for real-world applications. Our evaluation on various benchmarks shows that ForMoNet outperforms existing transformer-based forensic networks by 6% to 11% on different forgeries.