Facial expression is an essential tool for describing human emotion. From morning to night, people go through a broad spectrum of emotions that may be caused by their mental or physical health. The six major facial exp...
Single-image super-resolution technology has become a topic of extensive research in various applications, aiming to enhance the quality and resolution of degraded images obtained from low-resolution sensors. However, most existing studies on single-image super-resolution have primarily focused on developing deep learning networks operating on high-performance graphics processing units. Therefore, this study proposes a lightweight real-time image super-resolution network for 4K images. Furthermore, we applied a reparameterization method to improve the network performance without incurring additional computational costs. The experimental results demonstrate that the proposed network achieves a PSNR of 30.15 dB and an inference time of 4.75 ms on an RTX 3090Ti device, as evaluated on the NTIRE 2023 Real-Time Super-Resolution validation scale X3 dataset. The code is available at https://***/Ganzooo/***.
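The reparameterization idea mentioned above rests on the linearity of convolution: parallel branches trained separately can be folded into a single kernel at inference, so the extra capacity costs nothing at deployment. A minimal 1-D sketch (kernel values invented for illustration, not taken from the paper's network):

```python
import numpy as np

# Training time: a 3-tap branch and a 1-tap (pointwise) branch run in parallel.
# Inference time: both branches are merged into one kernel at no extra cost.
x = np.random.rand(64)             # a 1-D "image" row
k3 = np.array([0.25, 0.5, 0.25])   # 3-tap branch
k1 = np.array([0.0, 1.0, 0.0])     # 1-tap branch, zero-padded to 3 taps

# Training-time forward pass: sum of the two parallel branch outputs.
y_train = np.convolve(x, k3, mode="same") + np.convolve(x, k1, mode="same")

# Reparameterization: fold both branches into a single merged kernel.
k_merged = k3 + k1
y_infer = np.convolve(x, k_merged, mode="same")

assert np.allclose(y_train, y_infer)  # identical outputs, half the convolutions
```

Because the merge is exact, the network's accuracy at inference is unchanged while the per-pixel cost drops, which is what makes the technique attractive for real-time 4K processing.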
As a new method of film research, Cinemetrics uses a systematic and digital way to measure and analyze film style. It focuses more on the measurability of objects and the accuracy of results. As an artistic medium, f...
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Diffusion models (DMs) can generate realistic images with text guidance using large-scale datasets. However, they demonstrate limited controllability on the generated images. We introduce iEdit, a novel method for text-guided image editing conditioned on a source image and textual prompt. As a fully-annotated dataset with target images does not exist, previous approaches perform subject-specific fine-tuning at test time or adopt contrastive learning without a target image, leading to issues on preserving source image fidelity. We propose to automatically construct a dataset derived from LAION-5B, containing pseudo-target images and descriptive edit prompts. The dataset allows us to incorporate a weakly-supervised loss function, generating the pseudo-target image from the source image's latent noise conditioned on the edit prompt. To encourage localised editing, we propose a loss function that uses segmentation masks to guide the editing during training and optionally at inference. Trained with limited GPU resources on the constructed dataset, our model outperforms counterparts in image fidelity, CLIP alignment score, and qualitatively for both generated and real images.
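The mask-guided objective described above can be sketched in miniature. This is not the paper's exact loss, only an illustration of the principle: inside the segmentation mask the prediction is pulled toward the pseudo-target, outside it toward the source, which encourages localised edits.

```python
import numpy as np

def masked_edit_loss(pred, source, target, mask):
    """L1 terms weighted by a binary (or soft) segmentation mask."""
    edit_term = mask * np.abs(pred - target)              # edit inside the mask
    preserve_term = (1.0 - mask) * np.abs(pred - source)  # keep the rest intact
    return float(np.mean(edit_term + preserve_term))

source = np.zeros((4, 4))
target = np.ones((4, 4))
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0    # edit only the centre region

ideal = mask * target + (1.0 - mask) * source    # edited centre, untouched border
assert masked_edit_loss(ideal, source, target, mask) == 0.0
assert masked_edit_loss(source, source, target, mask) > 0.0  # unedited image penalised
```

An image that edits the masked region while leaving the rest untouched achieves zero loss, whereas either leaving the edit undone or disturbing the background is penalised.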
Bokeh effect transformation is a novel task in computer vision and computational photography. It aims to convert bokeh effects from one camera lens to another. To this end, we introduce a new concept of blur ratio, which represents the ratio of the blur amount of a target image to that of a source image, and propose a novel framework SBTNet based on this concept. For cat-eye simulation and lens type transformation, a two-channel coordinate map and a two-channel one-hot map are added as extra inputs. The core of the framework is a sequence of parallel FeaNets, along with a feature selection and integration strategy, which aims to transform the blur amount with arbitrary blur ratio. The effectiveness of the proposed framework is demonstrated through extensive experiments, and our solution has achieved the top LPIPS metric in NTIRE 2023 Bokeh Effect Transformation Challenge.
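The blur-ratio concept can be made concrete under a simplifying Gaussian-blur assumption (the actual SBTNet learns the transformation rather than applying analytic kernels): convolving with an additional kernel adds blur, and kernel variances add exactly under convolution, so the target/source blur ratio determines how much extra blur is needed.

```python
import numpy as np

def kernel_variance(k):
    """Variance of a normalised 1-D kernel, treated as a distribution."""
    pos = np.arange(len(k))
    mean = np.sum(pos * k)
    return np.sum(k * (pos - mean) ** 2)

def gaussian_kernel(sigma, radius=20):
    pos = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (pos / sigma) ** 2)
    return k / k.sum()

src = gaussian_kernel(1.5)     # blur already present in the source lens
extra = gaussian_kernel(2.0)   # extra blur applied to reach the target lens
combined = np.convolve(src, extra)

# Variances add under convolution: var_target = var_source + var_extra.
assert np.isclose(kernel_variance(combined),
                  kernel_variance(src) + kernel_variance(extra))

# Blur ratio: target blur amount relative to the source blur amount.
blur_ratio = np.sqrt(kernel_variance(combined) / kernel_variance(src))
assert blur_ratio > 1.0
```

A ratio above one means the target lens is blurrier than the source; the framework's FeaNets are what let it handle arbitrary, spatially varying ratios rather than this single global Gaussian case.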
This study investigates the integration of vision-language models (VLMs) to enhance the classification of situations within rugby match broadcasts. Accurately identifying situations in sports videos is important for understanding game dynamics and facilitating downstream tasks such as performance evaluation and injury prevention. Utilizing a dataset comprising 18,000 labeled images extracted at 0.2-second intervals from 100 minutes of rugby match broadcasts, we performed scene classification tasks covering contact plays (scrums, mauls, rucks, tackles, lineouts), rucks, tackles, lineouts, and multiclass classification. The study aims to validate the utility of VLM outputs in improving classification performance compared to using image data alone. Experimental results demonstrate substantial performance improvements across all tasks when VLM outputs are incorporated. Our analysis of prompts suggests that, when provided with appropriate contextual information through natural language, VLMs can effectively capture the context of a given image. The findings of our study indicate that leveraging VLMs in the domain of sports analysis holds promise for developing image-processing models capable of incorporating the tacit knowledge encoded within language models, as well as information conveyed through natural language descriptions.
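The fusion idea above can be sketched as a toy: an image feature vector is concatenated with an embedding of the VLM's textual description, and a classifier runs on the fused vector. Everything below (the captions, two-dimensional image features, hash embedding, and nearest-centroid classifier) is invented for illustration and is not the paper's model.

```python
import hashlib
import numpy as np

DIM = 16

def embed_caption(caption):
    """Deterministic bag-of-words hash embedding for a VLM caption."""
    v = np.zeros(DIM)
    for tok in caption.lower().split():
        bucket = int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM
        v[bucket] += 1.0
    return v

def fuse(image_feat, caption):
    """Concatenate image features with the caption embedding."""
    return np.concatenate([np.asarray(image_feat, float), embed_caption(caption)])

# One fused prototype per class (toy training set).
train = {
    "scrum":   fuse([1.0, 0.0], "a scrum forms"),
    "lineout": fuse([0.0, 1.0], "a lineout throw"),
}

def predict(image_feat, caption):
    """Nearest-centroid classification on the fused vector."""
    q = fuse(image_feat, caption)
    return min(train, key=lambda c: np.linalg.norm(q - train[c]))

# A scrum-like frame whose VLM caption mentions the scrum is classified correctly.
assert predict([0.9, 0.1], "a scrum forms") == "scrum"
```

The point of the sketch is that the caption channel contributes discriminative evidence even when the image features alone are ambiguous, which mirrors the study's finding that VLM outputs improve every task.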
This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This ch...
Recent research has highlighted improvements in high-quality imaging guided by event cameras, with most of these efforts concentrating on the RGB domain. However, these advancements frequently neglect the unique challenges introduced by the inherent flaws in the sensor design of event cameras in the RAW domain. Specifically, this sensor design results in the partial loss of pixel values, posing new challenges for RAW domain processes like demosaicing. The challenge intensifies as most research in the RAW domain is based on the premise that each pixel contains a value, making the straightforward adaptation of these methods to event camera demosaicing problematic. To this end, we present a Swin-Transformer-based backbone and a pixel-focus loss function for demosaicing with missing pixel values in RAW domain processing. Our core motivation is to refine a general and widely applicable foundational model from the RGB domain for RAW domain processing, thereby broadening the model's applicability within the entire imaging process. Our method harnesses multi-scale processing and space-to-depth techniques to ensure efficiency and reduce computing complexity. We also propose the pixel-focus loss function for network fine-tuning to improve network convergence, based on our discovery of a long-tailed distribution in the training loss. Our method has been validated on the MIPI Demosaic Challenge dataset, with subsequent analytical experiments confirming its efficacy. All code and trained models are released here: https://***/yunfanLu/ev-demosaic.
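The space-to-depth technique mentioned above is a simple, lossless rearrangement that trades spatial resolution for channels, so the network can run on smaller feature maps. A minimal sketch with block size 2 (the paper's exact pipeline details are not reproduced here):

```python
import numpy as np

def space_to_depth(x, block=2):
    """(H, W, C) -> (H/block, W/block, block*block*C), losslessly."""
    h, w, c = x.shape
    x = x.reshape(h // block, block, w // block, block, c)
    x = x.transpose(0, 2, 1, 3, 4)   # gather each block's pixels together
    return x.reshape(h // block, w // block, block * block * c)

x = np.arange(16, dtype=float).reshape(4, 4, 1)
y = space_to_depth(x)

assert y.shape == (2, 2, 4)
# The top-left 2x2 block [[0, 1], [4, 5]] becomes the first output pixel's channels.
assert np.array_equal(y[0, 0], np.array([0.0, 1.0, 4.0, 5.0]))
```

Because each 2x2 spatial block simply moves into the channel dimension, no information is lost, while downstream attention or convolution layers operate on a quarter of the spatial positions.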
ISBN (print): 9781665424639
Domain names and URLs are essential technologies in the current Internet. Thus, a failure in URL processing not only inconveniences users but also causes serious security vulnerabilities. If URLs were organized with only ASCII characters, URL processing might pose no problem. However, current URLs have been further extended and complicated. One source of this complexity is extensions related to Internationalized Domain Names (IDNs). Thus, there is no simple way to process URLs, owing to their characteristics and historical extensions. In this paper, we first introduce possible threats due to incorrect IDN processing. Then, we present potential threats due to URL extraction operations in applications, classifying the attack surfaces. We examined the above problems with various programming languages and web browsers and confirmed many issues in different environments. Furthermore, we confirmed and demonstrated that the failure patterns are not identical, because the issues arising from IDN processing vary. Finally, we summarize the experimental results and propose directions toward a comprehensive solution.
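A small illustration (not taken from the paper) of why IDN handling is an attack surface: a hostname containing a Cyrillic "а" (U+0430) is visually near-identical to its ASCII counterpart, yet maps to an entirely different punycode name, so any component that compares or renders the two inconsistently is exploitable.

```python
# Homograph example: Cyrillic "а" (U+0430) in place of Latin "a".
ascii_host = "paypal.com"
spoofed_host = "p\u0430ypal.com"

assert ascii_host != spoofed_host   # different code points...
# ...but often indistinguishable on screen.

# Python's built-in IDNA codec (IDNA 2003) maps each label to its ASCII
# Compatible Encoding (ACE) form.
assert ascii_host.encode("idna") == b"paypal.com"       # ASCII passes through
assert spoofed_host.encode("idna").startswith(b"xn--")  # homograph becomes punycode
```

The danger described in the paper arises precisely when one component (say, a browser's address bar) and another (say, a URL-extraction library) disagree on which of these two forms they operate on.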
In this paper, we present EdgeRelight360, an approach for real-time video portrait relighting on mobile devices, utilizing text-conditioned generation of 360-degree high dynamic range image (HDRI) maps. Our method proposes a diffusion-based text-to-360-degree image generation in the HDR domain, taking advantage of the HDR10 standard. This technique facilitates the generation of high-quality, realistic lighting conditions from textual descriptions, offering flexibility and control in the portrait video relighting task. Unlike previous relighting frameworks, our proposed system performs video relighting directly on-device, enabling real-time inference with real 360-degree HDRI maps. This on-device processing ensures privacy and guarantees low runtime, providing an immediate response to changes in lighting conditions or user inputs. Our approach paves the way for new possibilities in real-time video applications, including video conferencing, gaming, and augmented reality, by allowing dynamic, text-based control of lighting conditions.
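The HDR10 standard referenced above stores video using the SMPTE ST 2084 perceptual quantizer (PQ) transfer function, which is what lets an 8- or 10-bit signal span luminances up to 10,000 nits. A sketch of the PQ encode/decode pair, with luminance normalised so that 1.0 = 10,000 nits; the constants come from the ST 2084 specification, and this is only background for the standard, not the paper's pipeline:

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_encode(y):
    """Linear luminance in [0, 1] -> PQ signal in [0, 1]."""
    ym = np.power(y, M1)
    return np.power((C1 + C2 * ym) / (1.0 + C3 * ym), M2)

def pq_decode(e):
    """PQ signal in [0, 1] -> linear luminance in [0, 1]."""
    em = np.power(e, 1.0 / M2)
    return np.power(np.maximum(em - C1, 0.0) / (C2 - C3 * em), 1.0 / M1)

assert np.isclose(pq_encode(1.0), 1.0)           # peak 10,000 nits maps to 1.0
y = np.linspace(0.01, 1.0, 50)
assert np.allclose(pq_decode(pq_encode(y)), y)   # round-trip recovers luminance
```

Generating HDRI maps directly in this domain, as the paper does, avoids the clipping that a standard-dynamic-range intermediate would introduce for bright light sources.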