Search Results - Inner Mongolia University Library

Feature-Grounded Single-Stage Text-to-Image Generation

Tsinghua Science and Technology, 2024, Vol. 29, No. 2, pp. 469-480

Authors: Yuan Zhou; Peng Wang; Lei Xiang; Haofeng Zhang. Affiliations: School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

Recently, Generative Adversarial Networks (GANs) have become the mainstream text-to-image (T2I) ***, a standard normal distribution noise of inputs cannot provide sufficient information to synthesize an image that approaches the ground-truth image ***, the multistage generation strategy results in complex T2I ***, this study proposes a novel feature-grounded single-stage T2I model, which considers the “real” distribution learned from training images as one input and introduces a worst-case-optimized similarity measure into the loss function to enhance the model's generation *** results on two benchmark datasets demonstrate the competitive performance of the proposed model in terms of the Frechet inception distance and inception score compared to those of some classical and state-of-the-art models, showing the improved similarities among the generated image, text, and ground truth.

Keywords: text-to-image (T2I); feature-grounded; single-stage generation; Generative Adversarial Network (GAN)

Performance Analysis of a Novel Relay Selection Scheme for Wireless-Powered Cluster-Based Multi-Hop Cognitive Relay Networks

IEEE Transactions on Cognitive Communications and Networking, 2025, Vol. 11, No. 3, pp. 1551-1562

Authors: Sun, Hui; Naraghi-Pour, Mort; Qian, Yuwen; Sheng, Weixing; Han, Yubing. Affiliations: Nanjing University of Science and Technology, School of Electronic and Optical Engineering, Nanjing 210094, China; Louisiana State University, Division of Electrical and Computer Engineering, School of Electrical Engineering and Computer Science, Baton Rouge, LA 70803, United States

In this paper, we study the performance of wireless-powered cluster-based multi-hop cognitive relay networks (MCRNs), where secondary nodes harvest energy from multiple dedicated power beacons (PBs) and share the spectrum with multiple primary receivers (PRs) in the underlay paradigm. For this system, we propose a hop-by-hop relay selection scheme called the largest decoding set (LDS). In each stage, relay selection is based on the harvested energy from PBs and the channel state information (CSI) of both the interference links to PRs and the relaying links in the subsequent hop. Considering both harvested energy and maximum interference constraints, we derive the exact end-to-end outage probability and show that the results closely match those obtained from simulations. Moreover, the asymptotic end-to-end outage probabilities in two different scenarios are derived to provide more valuable insights. Numerical results show that the outage probability of the proposed LDS scheme is very close to that of the best-path scheduling (BPS) scheme, which provides a lower bound on outage probability but requires global CSI before the secondary source transmits. We also compare the LDS scheme with other relay selection schemes that have appeared recently in the literature and show that LDS has the best outage performance. © 2015 IEEE.

Keywords: Cognitive radio

Multi-Task Visual Semantic Embedding Network for Image-Text Retrieval

Journal of Computer Science & Technology, 2024, Vol. 39, No. 4, pp. 811-826

Authors: Xue-Yang Qin; Li-Shuang Li; Jing-Yao Tang; Fei Hao; Mei-Ling Ge; Guang-Yao Pang. Affiliations: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China; School of Computer Science, Shaanxi Normal University, Xi’an 710119, China; School of Computer Engineering, Weifang University, Weifang 261061, China; Guangxi Colleges and Universities Key Laboratory of Intelligent Industry Software, Wuzhou University, Wuzhou 543002, China

Image-text retrieval aims to capture the semantic correspondence between images and texts, which serves as a foundation and crucial component in multi-modal recommendations, search systems, and online *** mainstream methods primarily focus on modeling the association of image-text pairs while neglecting the advantageous impact of multi-task learning on image-text *** this end, a multi-task visual semantic embedding network (MVSEN) is proposed for image-text ***, we design two auxiliary tasks, including text-text matching and multi-label classification, for semantic constraints to improve the generalization and robustness of visual semantic embedding from a training ***, we present an intra- and inter-modality interaction scheme to learn discriminative visual and textual feature representations by facilitating information flow within and between ***, we utilize multi-layer graph convolutional networks in a cascading manner to infer the correlation of image-text *** results show that MVSEN outperforms state-of-the-art methods on two publicly available datasets, Flickr30K and MSCOCO, with rSum improvements of 8.2% and 3.0%, respectively.

Keywords: image-text retrieval; cross-modal retrieval; multi-task learning; graph convolutional network

A comprehensive systematic review of machine learning in the retail industry: classifications, limitations, opportunities, and challenges

Neural Computing and Applications, 2024, Vol. 37, No. 4, pp. 2035-2070

Authors: Hassan, Dler O.; Hassan, Bryar A. Affiliations: Department of Computer Science, College of Science, Charmo University, Kurdistan Region, Chamchamal, Sulaimani 46023, Iraq; Computer Science and Engineering Department, School of Science and Engineering, University of Kurdistan Hewler, Erbil, Iraq

Machine learning has profoundly transformed various industries, notably revolutionizing the retail sector through diverse applications that significantly enhance operational efficiency and performance. This comprehensive review examines the state-of-the-art machine learning applications in the retail sector from 2019 to 2024, focusing on supervised learning, unsupervised learning, and ensemble methods. It aims to identify and categorize recent machine learning applications in retail, evaluate the performance of machine learning algorithms, and determine the most suitable algorithms for specific retail use cases. This review article examines 56 studies and identifies 20 unique machine learning applications within the retail sector. This review also discusses the challenges and opportunities of implementing machine learning in retail, offering valuable insights to guide future research and enhance retail performance and customer satisfaction. The findings highlight the strengths and limitations of different machine learning methods, providing insights into their practical applications and future potential. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.

Keywords: Ensemble methods; Machine learning; Retail; Supervised learning; Unsupervised learning

On building automation system security

High-Confidence Computing, 2024, Vol. 4, No. 3, pp. 103-122

Authors: Christopher Morales-Gonzalez; Matthew Harper; Michael Cash; Lan Luo; Zhen Ling; Qun Z. Sun; Xinwen Fu. Affiliations: Department of Computer Science, University of Massachusetts Lowell, Lowell 01854, USA; Department of Electrical and Computer Engineering, University of Central Florida, Orlando 32816, USA; School of Computer Science and Technology, Southeast University, Ma’anshan 243032, China; School of Computer Science and Engineering, Anhui University of Technology, Nanjing 211189, China

Building Automation Systems (BASs) are seeing increased usage in modern society due to the plethora of benefits they provide such as automation for climate control, HVAC systems, entry systems, and lighting *** BASs in use are outdated and suffer from numerous vulnerabilities that stem from the design of the underlying BAS *** this paper, we provide a comprehensive, up-to-date survey on BASs and attacks against seven BAS protocols including BACnet, EnOcean, KNX, LonWorks, Modbus, ZigBee, and *** studies of secure BAS protocols are also presented, covering BACnet Secure Connect, KNX Data Secure, KNX/IP Secure, ModBus/TCP Security, EnOcean High Security and Z-Wave *** and ZigBee do not have security *** point out how these security protocols improve the security of the BAS and what issues remain. A case study is provided which describes a real-world BAS and showcases its vulnerabilities as well as recommendations for improving the security of *** seek to raise awareness to those in academia and industry as well as highlight open problems within BAS security.

Keywords: Building automation system; BAS protocols; Security; Attack

High-rate metal-free MXene microsupercapacitors on paper substrates

Carbon Energy, 2024, Vol. 6, No. 5, pp. 94-104

Authors: Han Xue; Po‐Han Huang; Lee‐Lun Lai; Yingchun Su; Axel Strömberg; Gaolong Cao; Yuzhu Fan; Sergiy Khartsev; Mats Göthelid; Yan‐Ting Sun; Jonas Weissenrieder; Kristinn B. Gylfason; Frank Niklaus; Jiantong Li. Affiliations: School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden; School of Engineering Sciences, KTH Royal Institute of Technology, Stockholm, Sweden

MXene is a promising energy storage material for miniaturized microbatteries and microsupercapacitors (MSCs). Despite its superior electrochemical performance, only a few studies have reported MXene-based ultrahigh-rate (>1000 mV s⁻¹) on-paper MSCs, mainly due to the reduced electrical conductance of MXene films deposited on ***, ultrahigh-rate metal-free on-paper MSCs based on heterogeneous MXene/poly(3,4-ethylenedioxythiophene)-poly(styrenesulfonate) (PEDOT:PSS)-stack electrodes are fabricated through the combination of direct ink writing and femtosecond laser *** a footprint area of only 20 mm², the on-paper MSCs exhibit excellent high-rate capacitive behavior with an areal capacitance of 5.7 mF cm⁻² and long cycle life (>95% capacitance retention after 10,000 cycles) at a high scan rate of 1000 mV s⁻¹, outperforming most of the present on-paper ***, the heterogeneous MXene/PEDOT:PSS electrodes can interconnect individual MSCs into metal-free on-paper MSC arrays, which can also be simultaneously charged/discharged at 1000 mV s⁻¹, showing scalable capacitive *** heterogeneous MXene/PEDOT:PSS stacks are a promising electrode structure for on-paper MSCs to serve as ultrafast miniaturized energy storage components for emerging paper electronics.

Keywords: direct ink writing; femtosecond laser scribing; MXene; on-paper microsupercapacitors; PEDOT:PSS; ultrahigh rate capability

Align Is Not Enough: Multimodal Universal Jailbreak Attack Against Multimodal Large Language Models

IEEE Transactions on Circuits and Systems for Video Technology, 2025, Vol. 35, No. 6, pp. 5475-5488

Authors: Wang, Youze; Hu, Wenbo; Dong, Yinpeng; Liu, Jing; Zhang, Hanwang; Hong, Richang. Affiliations: Hefei University of Technology, School of Computer Science and Information Engineering, Hefei 230009, China; Tsinghua University, Department of Computer Science and Technology, Beijing 100084, China; Institute of Automation, Chinese Academy of Science, Beijing 100190, China; Nanyang Technological University, School of Computer Science and Engineering, Jurong West 639798, Singapore

Large Language Models (LLMs) have evolved into Multimodal Large Language Models (MLLMs), significantly enhancing their capabilities by integrating visual information and other types, thus aligning more closely with the nature of human intelligence, which processes a variety of data forms beyond just text. Despite advancements, the undesirable generation of these models remains a critical concern, particularly due to vulnerabilities exposed by text-based jailbreak attacks, which have represented a significant threat by challenging existing safety protocols. Motivated by the unique security risks posed by the integration of new and old modalities for MLLMs, we propose a unified multimodal universal jailbreak attack framework that leverages iterative image-text interactions and transfer-based strategy to generate a universal adversarial suffix and image. Our work not only highlights the interaction of image-text modalities can be used as a critical vulnerability but also validates that multimodal universal jailbreak attacks can bring higher-quality undesirable generations across different MLLMs. We evaluate the undesirable context generation of MLLMs like LLaVA, Yi-VL, MiniGPT4, MiniGPT-v2, and InstructBLIP, and reveal significant multimodal safety alignment issues, highlighting the inadequacy of current safety mechanisms against sophisticated multimodal attacks. This study underscores the urgent need for robust safety measures in MLLMs, advocating for a comprehensive review and enhancement of security protocols to mitigate potential risks associated with multimodal capabilities. © 1991-2012 IEEE.

Keywords: Human form models

Event-triggered tracking control for a class of nonholonomic systems in chained form

Science China (Information Sciences), 2023, Vol. 66, No. 7, pp. 147-161

Authors: Liang XU; Youfeng SU; He CAI. Affiliations: Center for Discrete Mathematics and Theoretical Computer Science, Fuzhou University; College of Computer and Data Science, Fuzhou University; School of Automation Science and Engineering, South China University of Technology

In this study, the event-triggered asymptotic tracking control problem is considered for a class of nonholonomic systems in chained form for the time-varying reference input. First, to eliminate the ripple phenomenon caused by the imprecise compensation of the time-varying reference input, a novel time-varying event-triggered piecewise continuous control law and a triggering mechanism with a time-varying triggering function are developed. Second, an explicit integral input-to-state stable Lyapunov function is constructed for the time-varying closed-loop system regarding the sampling error as the external input. The origin of the closed-loop system is shown to be uniformly globally asymptotically stable for any global exponential decaying threshold signals, which in turn rules out the Zeno behavior. Moreover, infinitely fast sampling can be avoided by appropriately tuning the exponential convergence rate of the threshold signal. A numerical simulation example is provided to illustrate the proposed control approach.

Keywords: event-triggered; nonholonomic systems; strict Lyapunov function; tracking; integral input-to-state stable

FERMixNet: An Occlusion Robust Facial Expression Recognition Model With Facial Mixing Augmentation and Mid-Level Representation Learning

IEEE Transactions on Affective Computing, 2025, Vol. 16, No. 2, pp. 639-654

Authors: Huang, Yansong; Peng, Junjie; Zhang, Wenqiang; Zhao, Tong; Chen, Gan; Tan, Shuhua; Yi, Fen; Wang, Lu. Affiliations: Shanghai University, School of Computer Engineering and Science, Shanghai 200444, China; Shanghai University, School of Computer Engineering and Science, Shanghai Institute for Advanced Communication and Data Science, Shanghai 200444, China; Fudan University, Academy for Engineering and Technology, School of Computer Science and Technology, Shanghai 200437, China; YTO Express Company Ltd., National Logistics Engineering Laboratory, Shanghai 201708, China

Facial expressions can provide a better understanding of people’s mental status and attitudes towards specific things. However, facial occlusion in real world is an unfavorable phenomenon that greatly affects the performance of facial expression recognition models. Recent works addressing the occlusion problem have primarily relied on attention mechanisms or occlusion discarding methods that focus on non-occluded regions of the face. However, these methods have not achieved a good balance between occlusion robustness and model efficiency. In this paper, we propose a simple and efficient model, called FERMixNet, for occluded facial expression recognition. The model incorporates a novel facial mixing augmentation strategy (FERMix) that generates new training samples by simulating real-world facial occlusion and preserving high expression-related semantic information. By co-training the original and newly generated samples, the model’s occlusion robustness is improved without increasing its complexity during inference. Additionally, to further enhance the model’s occlusion robustness, we include mid-level representation learning in the network to learn the discriminative non-occluded local features of the samples with low computational cost. Extensive experiments on four public facial occlusion datasets: Occlusion-RAF-DB, Occlusion-FERPlus and FED-RO show that the proposed model achieves state-of-the-art results which demonstrates the good robustness of our method for occluded facial expression recognition. Meanwhile, the proposed model also achieves state-of-the-art results on the in-the-wild facial expression datasets RAF-DB, AffectNet-8, and AffectNet-7. It proves that the proposed model has good application prospects in real world. © 2010-2012 IEEE.

Keywords: Face recognition

Explainable Business Process Remaining Time Prediction Using Reachability Graph

Chinese Journal of Electronics, 2023, Vol. 32, No. 3, pp. 625-639

Authors: CAO Rui; ZENG Qingtian; NI Weijian; LU Faming; LIU Cong; DUAN Hua. Affiliations: College of Computer Science and Engineering, Shandong University of Science and Technology; School of Computer Science and Technology, Shandong University of Technology

With the recent advances in the field of deep learning, an increasing number of deep neural networks have been applied to business process prediction tasks, such as remaining time prediction, to obtain more accurate predictive results. However, existing time prediction methods based on deep learning have poor interpretability. To address this, an explainable business process remaining time prediction method using a reachability graph is proposed, which consists of prediction model construction and visualization. For prediction models, a Petri net is mined and the reachability graph is constructed to obtain the transition occurrence vector. Then, prefixes and corresponding suffixes are generated and clustered into different transition partitions according to the transition occurrence vector. Next, a bidirectional recurrent neural network with attention is applied to each transition partition to encode the prefixes, and deep transfer learning between different transition partitions is performed. For the visualization of prediction models, the evaluation values are added to the sub-processes of the Petri net. Finally, the proposed method is validated on publicly available event logs.

Keywords: Deep learning; Training; Visualization; Recurrent neural networks; Petri nets; Transfer learning; Process control
