Search Results - Inner Mongolia University Library

PVT v2: Improved baselines with Pyramid Vision Transformer

Computational Visual Media, 2022, Vol. 8, No. 3, pp. 415-424

Authors: Wenhai Wang; Enze Xie; Xiang Li; Deng-Ping Fan; Kaitao Song; Ding Liang; Tong Lu; Ping Luo; Ling Shao

Affiliations: Shanghai AI Laboratory, Shanghai 200232, China; Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China; Department of Computer Science, the University of Hong Kong, Hong Kong 999077, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210014, China; Computer Vision Lab, ETH Zurich, Zurich 8092, Switzerland; SenseTime, Beijing 100080, China; Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates

Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) by adding three designs: (i) a linear complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linearity and provides significant improvements on fundamental vision tasks such as classification, detection, and segmentation. In particular, PVT v2 achieves comparable or better performance than recent work such as the Swin Transformer. We hope this work will facilitate state-of-the-art transformer research in computer vision. Code is available at https://***/whai362/PVT.

Keywords: transformers; dense prediction; image classification; object detection; semantic segmentation
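
Design (ii) in the abstract above, the overlapping patch embedding, can be read as a strided convolution whose kernel is larger than its stride, so that neighboring patches share pixels. The sketch below only computes the resulting token grid with the standard convolution output-size formula. The kernel, stride, and padding values in the usage lines are illustrative assumptions and are not stated in this record.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-length formula for a strided convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_grid(height: int, width: int, kernel: int, stride: int, padding: int):
    """Token grid (rows, cols) produced by a patch embedding over an image."""
    return (
        conv_output_size(height, kernel, stride, padding),
        conv_output_size(width, kernel, stride, padding),
    )


def patches_overlap(kernel: int, stride: int) -> bool:
    """Adjacent patches share pixels whenever the kernel is wider than the stride."""
    return kernel > stride


# Hypothetical settings for a 224x224 input (assumed values, for illustration only).
non_overlapping = patch_grid(224, 224, kernel=4, stride=4, padding=0)
overlapping = patch_grid(224, 224, kernel=7, stride=4, padding=3)
```

With these assumed settings both variants yield the same token grid; the overlapping variant differs only in that each token also sees part of its neighbors' pixels.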

HairDiffusion: Vivid Multi-Colored Hair Editing via Latent Diffusion

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Zeng, Yu; Zhang, Yang; Liu, Jiachen; Shen, Linlin; Deng, Kaijun; He, Weizhao; Wang, Jinbao

Affiliations: Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, China; Guangdong Provincial Key Laboratory of Intelligent Information Processing, China

Hair editing is a critical image synthesis task that aims to edit hair color and hairstyle using text descriptions or reference images, while preserving irrelevant attributes (e.g., identity, background, cloth). Many existing methods are based on StyleGAN to address this task. However, due to the limited spatial distribution of StyleGAN, it struggles with multiple hair color editing and facial preservation. Considering the advancements in diffusion models, we utilize Latent Diffusion Models (LDMs) for hairstyle editing. Our approach introduces Multi-stage Hairstyle Blend (MHB), effectively separating control of hair color and hairstyle in diffusion latent space. Additionally, we train a warping module to align the hair color with the target region. To further enhance multi-color hairstyle editing, we fine-tuned a CLIP model using a multi-color hairstyle dataset. Our method not only tackles the complexity of multi-color hairstyles but also addresses the challenge of preserving original colors during diffusion editing. Extensive experiments showcase the superiority of our method in editing multi-color hairstyles while preserving facial attributes given textual descriptions and reference images. © 2024 Neural information processing systems foundation. All rights reserved.

Multi-Request Data Cache Optimization Strategy in Edge Computing

26th ACIS International Winter Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD-Winter 2023

Authors: Wang, Futian; Wu, Zhimin; Tang, Jin; Li, Chenglong; Zhang, Cheng

Affiliations: School of Computer Science and Technology, Anhui University, Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Hefei, China; School of Computer Science and Technology, Anhui University, Hefei, China; Institute of Artificial Intelligence, Anhui University, Hefei, China

ISBN: (print) 9798350345865

With the rapid growth in the number of mobile devices, more and more data is created and requested by users. By caching data on edge servers, users can obtain the content they request from a closer location, which reduces the network load and improves the user experience. However, existing work focuses only on single user requests, while users often request multiple content items at one time, which leads to more complex caching scenarios. In this paper, we study an edge data cache scenario with multiple requests and design a cost function to optimize the cache results. We formulate the data cache problem as a mixed integer programming problem and minimize our cost function. Extensive experiments are conducted on a real-world dataset that contains the locations of edge servers and mobile users, and the results reveal that our approach significantly outperforms the baseline approaches. © 2023 IEEE.

Keywords: Cost functions

Cross-Modal Retrieval for Motion and Text via DropTriple Loss

5th ACM International Conference on Multimedia in Asia, MMAsia 2023

Authors: Yan, Sheng; Liu, Yang; Wang, Haoqiang; Du, Xin; Liu, Mengyuan; Liu, Hong

Affiliations: School of Artificial Intelligence, Chongqing University of Technology, China; College of Computer Science, Sichuan University, China; Key Laboratory of Machine Perception, Shenzhen Graduate School, Peking University, China

ISBN: (print) 9798400702051

Cross-modal retrieval of image-text and video-text is a prominent research area in computer vision and natural language processing. However, there has been insufficient attention given to cross-modal retrieval between human motion and text, despite its wide-ranging applicability. To address this gap, we utilize a concise yet effective dual-unimodal transformer encoder for tackling this task. Recognizing that overlapping atomic actions in different human motion sequences can lead to semantic conflicts between samples, we explore a novel triplet loss function called DropTriple Loss. This loss function discards false negative samples from the negative sample set and focuses on mining remaining genuinely hard negative samples for triplet training, thereby reducing violations they cause. We evaluate our model and approach on the HumanML3D and KIT Motion-Language datasets. On the latest HumanML3D dataset, we achieve a recall of 62.9% for motion retrieval and 71.5% for text retrieval (both based on R@10). The source code for our approach is publicly available at https://***/eanson023/rehamot. © 2023 Copyright held by the owner/author(s).

Keywords: Semantics
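
The idea described in the abstract above, discarding likely false negatives before mining the hardest remaining negative for a triplet loss, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how false negatives are detected, so the text-to-text similarity matrix, the `drop_threshold` value, the `margin` value, and the function name are all hypothetical. Only the motion-to-text direction is shown.

```python
from typing import List


def drop_triple_loss(
    sim: List[List[float]],
    text_sim: List[List[float]],
    margin: float = 0.2,          # assumed margin, not from the record
    drop_threshold: float = 0.9,  # assumed cutoff for flagging false negatives
) -> float:
    """Hardest-negative triplet loss that skips suspected false negatives.

    sim[i][j]      : similarity between motion i and text j (matches on the diagonal).
    text_sim[i][j] : similarity between text i and text j, used here as a
                     hypothetical signal that text j also describes motion i.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        positive = sim[i][i]
        # Keep only negatives that are not near-duplicates of the true text.
        candidates = [
            sim[i][j]
            for j in range(n)
            if j != i and text_sim[i][j] < drop_threshold
        ]
        if not candidates:
            continue
        hardest = max(candidates)
        total += max(0.0, margin + hardest - positive)
    return total / n


# Made-up batch: texts 0 and 1 are near-duplicates, so each is a false negative
# for the other's motion.
sim = [[0.8, 0.75, 0.1], [0.7, 0.9, 0.2], [0.1, 0.2, 0.6]]
text_sim = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.2], [0.1, 0.2, 1.0]]
loss_with_drop = drop_triple_loss(sim, text_sim)
loss_without_drop = drop_triple_loss(sim, text_sim, drop_threshold=2.0)
```

In this toy batch the near-duplicate text would otherwise be selected as the hardest negative and penalized, which is the kind of violation the abstract says the loss is meant to reduce.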

Fusion of Dynamic Hypergraph and Clinical Event for Sequential Diagnosis Prediction

29th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2023

Authors: Zhang, Xin; Peng, Xueping; Guan, Hongjiao; Zhao, Long; Qiao, Xinxiao; Lu, Wenpeng

Affiliations: Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Jinan, China; Shandong Fundamental Research Center for Computer Science, Shandong Provincial Key Laboratory of Computer Networks, Jinan, China; Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia

ISBN: (print) 9798350330717

Sequential diagnosis prediction (SDP) is a challenging task, aiming to predict patients' future diagnoses based on their historical medical records. While methods based on graph neural networks (GNNs) have proven successful for this task, they typically focus on modeling pairwise diseases using a global disease combination graph. However, these approaches neglect the fine-grained higher-order relations among persistent and emerging diseases within a single visit, which may contain crucial clues to predict the next diagnosis. Additionally, they fail to fully leverage patient-related clinical information present in electronic health records (EHRs). To address these challenges, this paper proposes a novel approach called the fusion of Dynamic Hypergraph and Clinical Event (DHCE) for sequential diagnosis prediction. The proposed method aims to exploit the fine-grained higher-order relations among diagnoses within a visit and leverage clinical event information from EHRs to improve the accuracy of predicting the next diagnosis. Specifically, DHCE categorizes diagnoses within a single visit in a fine-grained granularity into persistent and emerging categories based on a patient's historical diagnoses. It then constructs dynamic hypergraphs to capture higher-order disease relations within each visit. Next, we design a transition function to extract the transitional context from previous visits in order to generate the visit representation. Furthermore, to fully leverage patient-related clinical events in a visit, we utilize Bio-Clinical BERT to encode them and generate the clinical event representation for each visit. Finally, we combine the visit representation and event representation to generate a comprehensive patient representation, which is then used to predict the patient's next diagnosis. Experimental results on two benchmark datasets consistently demonstrate that DHCE outperforms state-of-the-art methods. © 2023 IEEE.

Keywords: Bio-Clinical BERT; clinical event; dynamic hypergraph; sequential diagnosis prediction
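
The first step described in the abstract above, splitting the diagnoses of each visit into persistent and emerging groups based on the patient's history, might look like the sketch below. The exact rule is an assumption: here a diagnosis counts as persistent if it appeared in any earlier visit and as emerging otherwise. The function name and the example codes are made up for illustration. Each resulting group could then serve as one hyperedge of a per-visit hypergraph.

```python
from typing import Dict, List, Set


def split_visits(visits: List[Set[str]]) -> List[Dict[str, Set[str]]]:
    """Split each visit's diagnosis codes by whether they appeared in an earlier visit."""
    seen: Set[str] = set()
    result: List[Dict[str, Set[str]]] = []
    for visit in visits:
        result.append(
            {
                "persistent": visit & seen,  # already present in the patient's history
                "emerging": visit - seen,    # first appearance in this visit
            }
        )
        seen |= visit  # the current visit becomes history for the next one
    return result


# Made-up visit history with placeholder diagnosis codes.
visits = [{"I10", "E11"}, {"I10", "N18"}, {"E11", "N18", "J45"}]
groups = split_visits(visits)
```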

A Hybrid Medical Images Watermarking Technique Using DWT and DCT

7th International Conference on Artificial Intelligence in Renewable Energetic Systems, IC-AIRES 2023

Authors: Hamami, Rania; Zermi, Narima; Boubchir, Larbi; Khaldi, Amine

Affiliations: LASA Laboratory, Faculty of Technology, Badji Mokhtar Annaba University, P.O. Box 12, Annaba 23000, Algeria; LIASD Laboratory, Department of Computer Science, University of Paris 8, Saint-Denis 93526, France; Computer Science Department, Faculty of Sciences and Technology, Artificial Intelligence and Information, University of Kasdi Merbah, Ouargla, Algeria

ISBN: (print) 9783031606311

Medical images, such as X-rays, MRI scans, and CT scans, play an important role in diagnosing and treating various diseases. However, the sensitive nature of these images requires protection against unauthorized use and distribution. This article presents a novel approach for embedding hospital logos, patient information, and hospital/doctor details into medical images using a combination of wavelet transform and discrete cosine transform. To ensure enhanced security and robustness, we hash the watermark data using the MD5 algorithm before the embedding. Experimental results demonstrate the exceptional performance of the proposed technique, with a peak signal-to-noise ratio (PSNR) exceeding 50 dB, and very good robustness against attacks. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Keywords: Discrete cosine transforms
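
Two quantities named in the abstract above can be made concrete with a short sketch: the MD5 digest of the watermark data and the PSNR between an original and a watermarked image. This does not reproduce the DWT/DCT embedding itself. The function names, the 8-bit peak value of 255, and the tiny example image are assumptions for illustration.

```python
import hashlib
import math
from typing import List


def md5_digest(watermark: bytes) -> str:
    """Hex MD5 digest of the watermark payload (32 hexadecimal characters)."""
    return hashlib.md5(watermark).hexdigest()


def psnr(original: List[List[int]], watermarked: List[List[int]], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    flat_o = [p for row in original for p in row]
    flat_w = [p for row in watermarked for p in row]
    if len(flat_o) != len(flat_w) or not flat_o:
        raise ValueError("images must be non-empty and have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_w)) / len(flat_o)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up 2x2 grayscale image with a single pixel changed by one level.
original = [[100, 100], [100, 100]]
watermarked = [[100, 101], [100, 100]]
quality_db = psnr(original, watermarked)
digest = md5_digest(b"hypothetical patient record")
```

A higher PSNR means the watermarked image is closer to the original, which is why the abstract reports the value as a measure of imperceptibility.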

PROVABLE SIM-TO-REAL TRANSFER IN CONTINUOUS DOMAIN WITH PARTIAL OBSERVATIONS

11th International Conference on Learning Representations, ICLR 2023

Authors: Hu, Jiachen; Zhong, Han; Jin, Chi; Wang, Liwei

Affiliations: School of Computer Science, Peking University, China; Center for Data Science, Peking University, China; Department of Electrical and Computer Engineering, Princeton University, United States; National Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University, Center for Data Science, Peking University, Beijing Institute of Big Data Research, China

Sim-to-real transfer, which trains RL agents in the simulated environments and then deploys them in the real world, has been widely used to overcome the limitations of gathering samples in the real world. Despite the empirical success of the sim-to-real transfer, its theoretical foundation is much less understood. In this paper, we study the sim-to-real transfer in continuous domain with partial observations, where the simulated environments and real-world environments are modeled by linear quadratic Gaussian (LQG) systems. We show that a popular robust adversarial training algorithm is capable of learning a policy from the simulated environment that is competitive to the optimal policy in the real-world environment. To achieve our results, we design a new algorithm for infinite-horizon average-cost LQGs and establish a regret bound that depends on the intrinsic complexity of the model class. Our algorithm crucially relies on a novel history clipping scheme, which might be of independent interest. © 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.

Generalization error bounds for two-stage recommender systems with tree structure

Proceedings of the 38th International Conference on Neural Information Processing Systems

Authors: Jin Zhang; Ze Liu; Defu Lian; Enhong Chen

Affiliations: School of Artificial Intelligence and Data Science, University of Science and Technology of China; School of Computer Science and Technology, University of Science and Technology of China; School of Artificial Intelligence and Data Science and School of Computer Science and Technology, University of Science and Technology of China, and State Key Laboratory of Cognitive Intelligence, Hefei, Anhui, China

ISBN: (print) 9798331314385

Two-stage recommender systems play a crucial role in efficiently identifying relevant items and personalizing recommendations from a vast array of options. This paper, based on an error decomposition framework, analyzes the generalization error for two-stage recommender systems with a tree structure, which consist of an efficient tree-based retriever and a more precise yet time-consuming ranker. We use the Rademacher complexity to establish the generalization upper bound for various tree-based retrievers using beam search, as well as for different ranker models under a shifted training distribution. Both theoretical insights and practical experiments on real-world datasets indicate that increasing the branches in tree-based retrievers and harmonizing distributions across stages can enhance the generalization performance of two-stage recommender systems.
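
The tree-based retriever with beam search mentioned in the abstract above can be sketched as a level-wise search that keeps only the best-scoring nodes at each depth. This is a generic beam search under assumptions, not the paper's retriever: the tree, the node scores, and the function name are made up. It shows only how a narrow beam can miss the best item; it does not reproduce the paper's analysis or its conclusions.

```python
from typing import Callable, Dict, List


def beam_search(
    children: Dict[str, List[str]],
    score: Callable[[str], float],
    root: str,
    beam_width: int,
) -> List[str]:
    """Return up to beam_width leaf items found by level-wise beam search."""
    beam = [root]
    leaves: List[str] = []
    while beam:
        expanded: List[str] = []
        for node in beam:
            kids = children.get(node, [])
            if kids:
                expanded.extend(kids)
            else:
                leaves.append(node)  # a node without children is an item
        # Keep only the best-scoring candidates for the next level.
        expanded.sort(key=score, reverse=True)
        beam = expanded[:beam_width]
    leaves.sort(key=score, reverse=True)
    return leaves[:beam_width]


# Made-up two-level tree and scores.
children = {"root": ["a", "b"], "a": ["i1", "i2"], "b": ["i3", "i4"]}
scores = {"a": 0.4, "b": 0.6, "i1": 0.9, "i2": 0.1, "i3": 0.5, "i4": 0.3}
narrow = beam_search(children, scores.get, "root", beam_width=1)
wide = beam_search(children, scores.get, "root", beam_width=2)
```

With a beam of one the search commits to branch "b" and never reaches the highest-scoring item "i1"; with a beam of two it does.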

Action Recognition with PIR Sensor Array and Bidirectional Long Short-term Memory Neural Network

9th IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2023

Authors: Liu, Tong; Liang, Jianchu; Wan, Kai; Liu, Jun

Affiliations: Huizhou University, Department of Electronics Engineering, Huizhou, China; Wenzhou University, College of Computer Science and Artificial Intelligence, Wenzhou, China; Laboratory for Intelligent Networking of Wenzhou City, Wenzhou, China

ISBN: (print) 9798350304428

The spatio-temporal sequence of human body movements provides important information about daily action patterns. This article presents a pyroelectric infrared (PIR) sensor array for detecting human motion features and develops a bidirectional long short-term memory (LSTM) neural network for sequence recognition. A PIR sensor array with five direction-sensitive modules is proposed to detect the changing infrared field induced by the human head, upper limb, and lower limb. The low-dimensional output of the sensor array is directly fed to a multi-layer bidirectional LSTM-based neural network for sequential dependency learning. A PIR data set of seven daily actions is generated by four subjects imitating the predefined movements. Experimental results demonstrate that the presented approach achieves high accuracy in action recognition. © 2023 IEEE.

Keywords: Long short-term memory

OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models

Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024

Authors: Wang, Shuai; Ding, Liang; Shen, Li; Luo, Yong; Du, Bo; Tao, Dacheng

Affiliations: Institute of Artificial Intelligence, School of Computer Science, Wuhan University, China; Hubei Luojia Laboratory, Wuhan, China; The University of Sydney, Australia; School of Cyber Science and Technology, Sun Yat-sen University, China; College of Computing & Data Science, Nanyang Technological University, Singapore

ISBN: (print) 9798891760998

Advancing automated programming necessitates robust and comprehensive code generation benchmarks, yet current evaluation frameworks largely neglect object-oriented programming (OOP) in favour of functional programming (FP), e.g., HumanEval and MBPP. To address this, (1) our study introduces a pioneering OOP-focused benchmark, featuring 431 Python programs that encompass essential OOP concepts and features like classes and encapsulation methods. (2) We propose a novel evaluation metric, pass@o, tailored for OOP, enhancing the traditional pass@k metric. (3) Our evaluation of 23 leading large language models (LLMs), including both general and code-specialized models, reveals three key insights: 1) pass@o offers a more relevant and comprehensive assessment for OOP code generation; 2) despite excelling in FP, code-specialized LLMs like WizardCoder lag in OOP compared to models like ChatGPT; 3) the poor performance of all advanced LLMs on our OOP benchmark highlights a critical need for improvements in this field. Our benchmark and scripts are publicly released at: https://***/alphadl/OOP-eval. © 2024 Association for Computational Linguistics.

Keywords: Object oriented programming
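
The traditional pass@k metric that the abstract above builds on is commonly computed with an unbiased estimator: given n generated samples for a problem, of which c pass the tests, pass@k = 1 - C(n - c, k) / C(n, k). The sketch below implements only that standard estimator. The definition of pass@o is not given in this record, so it is not reproduced here; the function name and the example counts are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Made-up counts: 10 samples per problem, 3 of them correct.
single_try = pass_at_k(n=10, c=3, k=1)
five_tries = pass_at_k(n=10, c=3, k=5)
```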
