Search Results - Inner Mongolia University Library

LucIE: Language-guided local image editing for fashion images

Computational Visual Media, 2025, Vol. 11, No. 1, pp. 179-194

Authors: Huanglu Wen, Shaodi You, Ying Fu. Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Computer Vision Research Group in the Institute of Informatics, University of Amsterdam, Amsterdam, the Netherlands

Language-guided fashion image editing is challenging, as fashion image editing is local and requires high precision, while natural language cannot provide precise visual information for *** this paper, we propose LucIE, a novel unsupervised language-guided local image editing method for fashion *** adopts and modifies a recent text-to-image synthesis network, DF-GAN, as its ***, the synthesis backbone often changes the global structure of the input image, making local image editing *** increase structural consistency between input and edited images, we propose the Content-Preserving Fusion Module (CPFM). Different from existing fusion modules, CPFM prevents iterative refinement on visual feature maps and accumulates additive modifications on RGB *** achieves local image editing explicitly with language-guided image segmentation and mask-guided image blending while only using image and text *** on the DeepFashion dataset shows that LucIE achieves state-of-the-art *** with previous methods, images generated by LucIE also exhibit fewer *** provide visualizations and perform ablation studies to validate LucIE and the *** also demonstrate and analyze limitations of LucIE, to provide a better understanding of LucIE.

Keywords: deep learning; language-guided image editing; local image editing; content preservation; fashion images
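
The LucIE abstract mentions mask-guided image blending as the step that keeps edits local. A minimal sketch of generic per-pixel mask blending follows; it is not the authors' implementation. The function name `blend_with_mask`, the nested-list grayscale image representation, and the sample values are assumptions made for illustration only.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of floats in [0, 1].
Image = List[List[float]]


def blend_with_mask(original: Image, edited: Image, mask: Image) -> Image:
    """Blend per pixel: mask * edited + (1 - mask) * original.

    A mask value of 1 takes the edited pixel, 0 keeps the original pixel,
    and values in between mix the two linearly.
    """
    blended = []
    for orig_row, edit_row, mask_row in zip(original, edited, mask):
        blended.append([
            m * e + (1.0 - m) * o
            for o, e, m in zip(orig_row, edit_row, mask_row)
        ])
    return blended


# Hypothetical 2x2 example: only the left column is (partly) edited.
original = [[0.2, 0.2], [0.2, 0.2]]
edited = [[0.8, 0.8], [0.8, 0.8]]
mask = [[1.0, 0.0], [0.5, 0.0]]
result = blend_with_mask(original, edited, mask)
```

Pixels where the mask is zero are copied from the input unchanged, which is what makes this kind of blending preserve content outside the edited region.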

Multi-Strategy Grey Wolf Optimization Algorithm for Global Optimization and Engineering Applications

Journal of Systems Science and Systems Engineering, 2025, Vol. 34, No. 2, pp. 203-230

Authors: Likai Wang, Qingyang Zhang, Shengxiang Yang, Yongquan Dong. Affiliations: School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221000, China; School of Computer Science and Informatics, De Montfort University, Leicester LE1 9BH, UK

The grey wolf optimizer (GWO), a population-based meta-heuristic algorithm, mimics the predatory behavior of grey wolf *** exploring and introducing improvement mechanisms is one of the keys to drive the development and application of GWO *** overcome the premature convergence and stagnation of GWO, the paper proposes a multiple strategy grey wolf optimization algorithm (MSGWO). Firstly, a variable weights strategy is proposed to improve convergence rate by adjusting the weights ***, this paper proposes a reverse learning strategy, which randomly reverses some individuals to improve the global search ***, the chain predation strategy is designed to allow the search agent to be guided by both the best individual and the previous ***, this paper proposes a rotation predation strategy, which regards the position of the current best individual as the pivot and rotates other members to enhance the exploitation *** verify the performance of the proposed technique, MSGWO is compared with seven state-of-the-art meta-heuristics and four variant GWO algorithms on CEC2022 benchmark functions and three engineering optimization *** results demonstrate that MSGWO has better performance on most of the benchmark functions and is competitive in solving engineering design problems.

Keywords: grey wolf optimizer; variable weights; reverse learning; chain predation; rotation predation
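
For orientation, the baseline grey wolf optimizer that MSGWO extends can be sketched as below. This is the standard GWO position update with equal weights for the three leaders, not the MSGWO variant; none of the four strategies named in the abstract are implemented. The function names `gwo` and `sphere`, the parameter defaults, and the clamping of positions to the bounds are assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple


def sphere(x: List[float]) -> float:
    """Simple test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def gwo(objective: Callable[[List[float]], float], dim: int,
        bounds: Tuple[float, float], n_wolves: int = 20,
        n_iters: int = 100, seed: int = 0) -> Tuple[List[float], float]:
    """Baseline GWO (minimization). Requires n_wolves >= 3."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)]
              for _ in range(n_wolves)]
    best_pos: List[float] = []
    best_fit = float("inf")

    for t in range(n_iters):
        # The three fittest wolves (alpha, beta, delta) lead the pack.
        ranked = sorted(wolves, key=objective)
        alpha, beta, delta = [list(w) for w in ranked[:3]]
        alpha_fit = objective(alpha)
        if alpha_fit < best_fit:
            best_pos, best_fit = list(alpha), alpha_fit

        a = 2.0 * (1.0 - t / n_iters)  # decreases linearly from 2 towards 0
        for wolf in wolves:
            for d in range(dim):
                candidates = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    dist = abs(C * leader[d] - wolf[d])
                    candidates.append(leader[d] - A * dist)
                # Equal-weight average of the three leader-guided moves,
                # clamped to the search bounds (clamping is an assumption).
                wolf[d] = min(hi, max(lo, sum(candidates) / 3.0))

    # Account for the population produced by the last update.
    final = min(wolves, key=objective)
    if objective(final) < best_fit:
        best_pos, best_fit = list(final), objective(final)
    return best_pos, best_fit


best_pos, best_fit = gwo(sphere, dim=3, bounds=(-10.0, 10.0),
                         n_wolves=10, n_iters=30, seed=1)
```

The equal-weight average in the inner loop is the place a variable weights strategy would modify, although the abstract does not specify how the weights are adjusted.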

Enhancing demand forecasting through combination of anomaly detection and continuous improvement

Soft Computing, 2025, Vol. 29, No. 2, pp. 1243-1258

Authors: Jahani, Meysam; Zojaji, Zahra; Raji, Fatemeh. Affiliations: Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran; School of Computer Science and Informatics, De Montfort University, Leicester, United Kingdom

Demand forecasting has emerged as a crucial element in supply chain management. It is essential to identify anomalous data and continuously improve the forecasting model with new data. However, existing literature fails to comprehensively cover both aspects of anomaly detection and continuous improvement in demand forecasting. This study proposes an enhanced model to improve accuracy in demand forecasting. The proposed model introduces a novel data handling method that incorporates an anomaly detection autoencoder, improved with anomaly correction mechanisms. The data handling approach simultaneously detects data anomalies, distinguishes between expected and unexpected anomalies, and corrects anomalous data, ensuring cleaner input for demand forecasting. Then, the proposed model employs a long short-term memory architecture for demand forecasting, enhanced with a continuous improvement method. Thus, the model not only forecasts demand but also retrains itself when the anomalous data surpasses the predetermined threshold, thereby improving the accuracy of forecasting. The results show that the proposed model outperforms other models in detecting data anomalies, achieving an average precision-recall of 0.922, a receiver operating characteristic value of 0.739, and a significance level of less than 0.05. Finally, the model exhibits superior performance in demand forecasting, with average mean squared error, root mean squared error, and mean absolute error values of 33.167, 4.347, and 1.509, respectively, all with a significance level of less than 0.05. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.

Keywords: Demand response
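
The three forecasting error metrics reported in the abstract above (mean squared error, root mean squared error, mean absolute error) have standard definitions, sketched below for a single series. The function names and the sample demand values are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: average of squared forecast errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: average of absolute forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical demand values and forecasts, for illustration only.
actual_demand = [10.0, 12.0, 14.0]
forecast = [10.0, 12.0, 18.0]
errors = (mse(actual_demand, forecast),
          rmse(actual_demand, forecast),
          mae(actual_demand, forecast))
```

Because MSE squares each error, a single large miss dominates it far more than it dominates MAE, which is one reason forecasting studies tend to report several of these metrics together.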

MH-HMR:Human mesh recovery from monocular images via multi-hypothesis learning

CAAI Transactions on Intelligence Technology, 2024, Vol. 9, No. 5, pp. 1263-1274

Authors: Haibiao Xuan, Jinsong Zhang, Yu-Kun Lai, Kun Li. Affiliations: College of Intelligence and Computing, Tianjin University, Tianjin, China; School of Computer Science and Informatics, Cardiff University, Cardiff, UK

Recovering 3D human meshes from monocular images is an inherently ill-posed and challenging task due to depth ambiguity, joint occlusion, and ***, most existing approaches do not model such uncertainties, typically yielding a single reconstruction for one *** contrast, the ambiguity of the reconstruction is embraced and the problem is considered as an inverse problem for which multiple feasible solutions *** address these issues, the authors propose a multi-hypothesis approach, multi-hypothesis human mesh recovery (MH-HMR), to efficiently model the multi-hypothesis representation and build strong relationships among the hypothetical ***, the task is decomposed into three stages: (1) generating a reasonable set of initial recovery results (i.e., multiple hypotheses) given a single colour image; (2) modelling intra-hypothesis refinement to enhance every single-hypothesis feature; and (3) establishing inter-hypothesis communication and regressing the final human ***, the authors take further advantage of multiple hypotheses and the recovery process to achieve human mesh recovery from multiple uncalibrated *** with state-of-the-art methods, the MH-HMR approach achieves superior performance and recovers more accurate human meshes on challenging benchmark datasets, such as Human3.6M and 3DPW, while demonstrating the effectiveness across a variety of *** code will be publicly available at https://***/faculty/likun/projects/MH-HMR.

Keywords: 3-D computer vision; human reconstruction

Attention-optimized vision-enhanced prompt learning for few-shot multi-modal sentiment analysis

Neural Computing and Applications, 2024, Vol. 36, No. 33, pp. 21091-21105

Authors: Zhou, Zikai; Qiao, Baiyou; Feng, Haisong; Han, Donghong; Wu, Gang. Affiliations: School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; School of Informatics, Xiamen University, Xiamen 361105, China

In response to the explosion of multi-modal data, multi-modal sentiment analysis (MSA) emerged and attracted widespread attention. Unfortunately, conventional multi-modal research relies on large-scale datasets. On the one hand, collecting and annotating large-scale datasets is challenging and resource-intensive. On the other hand, training on large-scale datasets also increases the research cost. However, few-shot MSA (FMSA), which was proposed recently, requires only a few samples for training. Therefore, in comparison, it is more practical and realistic. There have been approaches investigating the prompt-based method in the field of FMSA, but they have not sufficiently considered or leveraged the information specificity of the visual modality. Thus, we propose a vision-enhanced prompt-based model based on graph structure to better utilize vision information for fusion and collaboration in encoding and optimizing prompt representations. Specifically, we first design an aggregation-based multi-modal attention module. Then, based on this module and the biaffine attention, we construct a syntax–semantic dual-channel graph convolutional network to optimize the encoding of learnable prompts by understanding the vision-enhanced information in semantic and syntactic knowledge. Finally, we propose a collaboration-based optimization module based on the collaborative attention mechanism, which employs visual information to collaboratively optimize prompt representations. Extensive experiments conducted on both coarse-grained and fine-grained MSA datasets have demonstrated that our model significantly outperforms the baseline models. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.

Keywords: Semantics

MusicFace: Music-driven expressive singing face synthesis

Computational Visual Media, 2024, Vol. 10, No. 1, pp. 119-136

Authors: Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng. Affiliations: School of Informatics, Xiamen University, Xiamen 361000, China; Department of Computer Science, The University of Texas at Dallas, Richardson, Texas 75080-3021, USA

It remains an interesting and challenging problem to synthesize a vivid and realistic singing face driven by music. In this paper, we present a method for this task with natural motions for the lips, facial expression, head pose, and eyes. Due to the coupling of mixed information for the human voice and backing music in common music audio signals, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into a human voice stream and a backing music stream. Due to the implicit and complicated correlation between the two-stream input signals and the dynamics of the facial expressions, head motions, and eye states, we model their relationship with an attention scheme, where the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we decompose head movement generation in terms of speed and direction, and decompose eye state generation into short-term blinking and long-term eye closing, modeling them separately. We have also built a novel dataset, SingingFace, to support training and evaluation of models for this task, including future work on this topic. Extensive experiments and a user study show that our proposed method is capable of synthesizing vivid singing faces, qualitatively and quantitatively better than the prior state-of-the-art.

Keywords: face synthesis; singing; music; generative adversarial network

Tile-size aware bitrate allocation for adaptive 360° video streaming

Multimedia Tools and Applications, 2025, Vol. 84, No. 13, pp. 12615-12632

Authors: Huang, Jiawei; Liu, Mingyue; Liu, Jingling; Gao, Feng; Li, Weihe; Wang, Jianxin. Affiliations: School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China; School of Informatics, The University of Edinburgh, Edinburgh, United Kingdom

360° videos have become increasingly popular recently, but consume much more bandwidth than non-360° videos. Usually, 360° video streaming partitions the video surface into multiple tiles and encodes the tiles independently to effectively and flexibly use limited link bandwidth. However, current bitrate adaptive algorithms generally aim to maximize the bitrate, rather than perceptual quality, resulting in degradation of user experience. More importantly, we reveal that the distribution of tile size is very skewed, that is, a small number of large tiles consumes more bandwidth than a large number of small tiles, further hurting the overall viewing quality. Therefore, in this paper, we propose a tile-size aware bitrate allocation scheme, TSA, for adaptive 360° video streaming to improve the viewing experience of users. Specifically, TSA cautiously decreases the quality of a few large tiles to allocate more bandwidth to a large number of small tiles, thus improving the perceptual quality of the overall video, without sacrificing large tiles excessively. Experiments over real-world datasets show that TSA effectively improves V-VMAF by up to 39% compared with several state-of-the-art adaptive bitrate algorithms. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Video streaming

SCD: Statistical Color Distribution-Based Objective Image Colorization Quality Assessment

41st Computer Graphics International Conference, CGI 2024

Authors: Lyu, Hongjin; Elangovan, Hareeharan; Rosin, Paul L.; Lai, Yu-Kun. Affiliation: School of Computer Science and Informatics, Cardiff University, United Kingdom

ISBN: (print) 9783031818059

Colorization research has long been a focal point in computer vision and image processing. However, due to its inherently ill-posed nature, a reasonable assessment of the quality of colorization outcomes remains a challenge. Subjective evaluations are often restricted to a limited number of participants due to the high costs. This, along with the existence of individual differences and subjective biases, makes it difficult to derive convincing conclusions. Although objective evaluation metrics need no participants, the currently widely applied objective metrics fail to accurately reflect the quality of colorization results, thereby impeding the attainment of consistency with subjective user opinions. Facing the above problems, we propose a novel Statistical Color Distribution-based Objective Evaluation Metric (SCD) for better consistency with human opinions. We first segment images into semantic regions. For each semantic type, a novel two-dimensional natural color distribution w.r.t. hue and saturation is collected to better align with human perceptual observations during image assessment. An adjacency weighted matrix considering surrounding neighboring regions smooths the color distribution table, enabling a more reliable quality assessment. The application of probability density eliminates the issue of frequency anomalies caused by human visual insensitivity, ensuring more accurate *** extensive and comprehensive experiments involving two distinct datasets with the participation of 1321 volunteers, this paper demonstrates that the proposed SCD is more consistent with subjective user opinions compared with current objective metrics for evaluating colorization. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Semantic Segmentation

A Homogeneous Approach to Reasoning Over Global Geographic Data

44th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, SGAI 2024

Authors: Abdelmoty, Alia I.; Satoti, Abdurauf. Affiliation: School of Computer Science and Informatics, Cardiff University, United Kingdom

ISBN: (print) 9783031779145

Much of the information that we use is geospatially referenced. The need for homogeneous representation of global geographic themes is recognised as critical for sustainable development goals. The richness of local geographic data created and maintained by individual countries varies widely, creating what is known as a geospatial digital divide. Attempts to bridge this divide include the adoption of Discrete Global Grid Systems that provide an abstract and uniform method of partitioning space on Earth. This paper considers how the local methods of partitioning space adopted in individual countries and provided as open data can be integrated with this global grid system. The paper proposes a novel ontology design pattern for representing the integration of both grid systems, and evaluates it against existing methods. It is shown how a uniform treatment of spatial semantics is used to represent geographic places across grid systems. This proposal is a step towards the effective utilisation of these grid systems in building global geographic information systems. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Ontology

UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing

2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024

Authors: Yang, Yijun; He, Jie; Chen, Pinzhen; Gutiérrez-Basulto, Víctor; Pan, Jeff Z. Affiliations: School of Informatics, University of Edinburgh, United Kingdom; School of Computer Science and Informatics, Cardiff University, United Kingdom

ISBN: (print) 9798891761148

Several recent papers have investigated the potential of language models as knowledge bases as well as the existence of severe biases when extracting factual knowledge. In this work, we focus on the factual probing performance over unseen prompts from tuning, and using a probabilistic view we show the inherent misalignment between pre-training and downstream tuning objectives in language models for probing knowledge. We hypothesize that simultaneously debiasing these objectives can be the key to generalisation over unseen prompts. We propose an adapter-based framework, UniArk, for generalised and consistent factual knowledge extraction through simple methods without introducing extra parameters. Extensive experiments show that UniArk can significantly improve the model’s out-of-domain generalisation as well as consistency under various prompts. Additionally, we construct ParaTrex, a large-scale and diverse dataset for measuring the inconsistency and out-of-domain generation of models. Further, ParaTrex offers a reference method for constructing paraphrased datasets using large language models. © 2024 Association for Computational Linguistics.

Keywords: Extraction
