In this study, we compare the predictive performance of two advanced deep learning models for predicting the response to TACE (transarterial chemoembolization) in HCC (hepatocellular carcinoma) patients. Using entire abdominal CT scans gives the model a broader view of the anatomy and eliminates the need for segmentation during preprocessing. Drawing on both single-phase and multi-phase CT imaging, we used DenseNet121 and obtained an accuracy of 80% for the multi-phase *** Relevance: The ability to predict the effectiveness of TACE treatment before it is administered offers a better decision-making aid for physicians and patients.
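As a rough illustration of the classification setup described above, the sketch below adapts a torchvision DenseNet121 to two-class TACE-response prediction from preprocessed CT input. The input shape, label encoding, optimizer, and learning rate are assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch, assuming a binary responder/non-responder label and 3-channel
# 224x224 CT slices. "ct_batch" and "labels" are hypothetical placeholders.
import torch
import torch.nn as nn
from torchvision import models

model = models.densenet121(weights=None)                        # backbone named in the abstract
model.classifier = nn.Linear(model.classifier.in_features, 2)   # 2-way TACE-response head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)       # assumed hyperparameters

ct_batch = torch.randn(4, 3, 224, 224)   # stand-in for preprocessed abdominal CT slices
labels = torch.randint(0, 2, (4,))       # stand-in for TACE response labels

logits = model(ct_batch)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```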
The popularity of the Metaverse as an entertainment, social, and work platform has created a strong need for seamless avatar integration in the virtual world. In the Metaverse, avatars must be updated and rendered to reflect users' behaviour. Achieving real-time synchronization between the user and their virtual counterpart is complex, placing high demands on the Metaverse Service Provider's (MSP's) rendering-resource allocation scheme. To tackle this issue, we propose a semantic communication framework that leverages contest theory to model the interactions between users and MSPs and to determine the optimal resource allocation for each user. To reduce the consumption of network resources in wireless transmission, we use semantic communication to reduce the amount of data to be transmitted: under our simulation settings, the encoded semantic data contains only 51 bytes of skeleton coordinates instead of an 8.243-megabyte image. Moreover, we implement a Deep Q-Network to optimize the reward settings for maximum performance and efficient resource allocation. With the optimal reward setting, users are incentivized to select their respective suitable uploading frequencies, reducing the down-sampling loss caused by rendering-resource constraints by 66.076% compared with the traditional average-distribution method. The framework provides a novel solution to resource allocation for avatar association in VR environments, ensuring a smooth and immersive experience for all users.
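To make the Deep Q-Network part concrete, here is a minimal sketch of an agent that picks a discrete uploading frequency from an observed system state. The state dimension, number of candidate frequencies, network size, and epsilon value are illustrative assumptions, not the paper's settings, and the reward/replay machinery is omitted.

```python
# Minimal DQN sketch: Q-values over candidate uploading frequencies, epsilon-greedy choice.
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim=4, n_actions=5):   # assume 5 candidate upload frequencies
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)                       # one Q-value per candidate frequency

q_net = QNet()

def select_frequency(state, epsilon=0.1, n_actions=5):
    """Epsilon-greedy choice over the candidate uploading frequencies."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state).argmax())

state = torch.randn(4)          # stand-in for the observed rendering/resource state
action = select_frequency(state)
```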
Contrastive learning-based video-language representation learning approaches, e.g., CLIP, have achieved outstanding performance, which pursue semantic interaction upon pre-defined video-text pairs. To clarify this coa...
Artificial Intelligence-Generated Content (AIGC) refers to the use of AI to automate the information creation process while fulfilling the personalized requirements of users. However, due to the instability of AIGC mo...
Weakly supervised semantic segmentation is typically inspired by class activation maps, which serve as pseudo masks with class-discriminative regions highlighted. Although tremendous efforts have been made to recall precise and complete locations for each class, existing methods still commonly suffer from unsolicited Out-of-Candidate (OC) error predictions that do not belong to the label candidates; such errors are avoidable because the contradiction with the image-level class tags is easy to detect. In this paper, we develop a group ranking-based Out-of-Candidate Rectification (OCR) mechanism in a plug-and-play fashion. First, we adaptively split the semantic categories into In-Candidate (IC) and OC groups for each OC pixel according to their prior annotation correlation and posterior prediction correlation. Then, we derive a differentiable rectification loss to force OC pixels to shift to the IC group. Incorporating OCR with seminal baselines (e.g., AffinityNet, SEAM, MCTformer), we achieve remarkable performance gains on both the Pascal VOC (+3.2%, +3.3%, +0.8% mIoU) and MS COCO (+1.0%, +1.3%, +0.5% mIoU) datasets with negligible extra training overhead, which justifies the effectiveness and generality of OCR. Code: ***/sennnnn/Out-of-Candidate-Rectification
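The sketch below illustrates the rectification idea in a simplified form: for pixels whose argmax class falls outside the image-level candidate set, a differentiable loss pushes probability mass back onto the in-candidate classes. It is a stand-in under assumed tensor shapes, not the paper's exact grouping or loss.

```python
# Simplified out-of-candidate rectification loss (illustrative only).
import torch
import torch.nn.functional as F

def rectification_loss(logits, candidate_mask):
    """
    logits:         (B, C, H, W) segmentation logits
    candidate_mask: (B, C) binary mask, 1 for in-candidate (IC) classes of each image
    """
    B, C, H, W = logits.shape
    probs = F.softmax(logits, dim=1)
    ic = candidate_mask[:, :, None, None].float().expand(B, C, H, W)
    pred = logits.argmax(dim=1, keepdim=True)                  # (B, 1, H, W) argmax class
    is_oc_pixel = 1.0 - ic.gather(1, pred).squeeze(1)          # 1 where the argmax class is OC
    ic_prob = (probs * ic).sum(dim=1).clamp_min(1e-6)          # probability mass on IC classes
    # Encourage OC pixels to shift their mass toward the IC group.
    return -(is_oc_pixel * ic_prob.log()).sum() / is_oc_pixel.sum().clamp_min(1.0)
```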
Future wireless communication systems will evolve toward multi-functional integrated systems to improve spectrum utilization and reduce equipment sizes. A joint radar and communication (JRC) system, which can support ...
Interactive Segmentation (IS) segments specific objects or parts in the image according to user input. Current IS pipelines fall into two categories: single-granularity output and multi-granularity output. The latter ...
Semi-supervised learning (SSL) essentially pursues class-boundary exploration with less dependence on human annotations. Although typical attempts focus on ameliorating the inevitable error-prone pseudo-labeling, we think differently and instead exploit the informative semantics of multiple probably correct candidate labels. In this paper, we introduce Fuzzy Positive Learning (FPL) for accurate SSL semantic segmentation in a plug-and-play fashion, adaptively encouraging fuzzy positive predictions and suppressing highly probable negatives. Being conceptually simple yet practically effective, FPL can remarkably alleviate interference from wrong pseudo labels and progressively achieve clear pixel-level semantic discrimination. Concretely, our FPL approach consists of two main components: fuzzy positive assignment (FPA), which provides an adaptive number of labels for each pixel, and fuzzy positive regularization (FPR), which restricts the predictions of fuzzy positive categories to be larger than the rest under different perturbations. Theoretical analysis and extensive experiments on Cityscapes and VOC 2012, with consistent performance gains, justify the superiority of our approach. Code is available at https://***/qpc1611094/FPL.
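A simplified sketch of the fuzzy-positive idea follows: treat the top-k most probable classes per pixel as "fuzzy positives" and push their scores above every other class. The paper's FPA chooses the number of candidates adaptively per pixel; here k is fixed, and the margin is an assumption.

```python
# Illustrative fuzzy-positive regularization: fuzzy positives must outrank all negatives.
import torch
import torch.nn.functional as F

def fuzzy_positive_loss(logits, k=2, margin=0.0):
    """logits: (B, C, H, W) predictions under some perturbation."""
    probs = F.softmax(logits, dim=1)
    topk = probs.topk(k, dim=1)                    # fuzzy positive scores and indices
    min_pos = topk.values[:, -1]                   # weakest fuzzy positive, (B, H, W)
    mask = torch.ones_like(probs)
    mask.scatter_(1, topk.indices, 0.0)            # zero out the fuzzy positives
    max_neg = (probs * mask).max(dim=1).values     # strongest negative class, (B, H, W)
    # Hinge: every fuzzy positive should stay above every negative class.
    return F.relu(max_neg - min_pos + margin).mean()
```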
Recent advances in robust semi-supervised learning (SSL) typically filter out-of-distribution (OOD) information at the sample level. We argue that an overlooked problem of robust SSL is corrupted information at the semantic level, which practically limits the development of the field. In this paper, we take an initial step toward exploring this problem and propose a unified framework termed OOD Semantic Pruning (OSP), which aims to prune OOD semantics out of in-distribution (ID) features. Specifically, (i) we propose an aliasing OOD matching module to pair each ID sample with an OOD sample that has semantic overlap. (ii) We design a soft orthogonality regularization, which first transforms each ID feature by suppressing its semantic component that is collinear with the paired OOD sample, and then forces the predictions before and after soft orthogonality decomposition to be consistent. Despite being practically simple, our method shows strong performance in OOD detection and ID classification on challenging benchmarks. In particular, OSP surpasses the previous state-of-the-art by 13.7% in accuracy for ID classification and 5.9% in AUROC for OOD detection on the TinyImageNet dataset. The source code is publicly available at https://***/rain305f/OSP.
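The sketch below illustrates the soft-orthogonality step described above: remove the component of each ID feature that is collinear with its paired OOD feature, then keep the classifier's predictions consistent before and after. The pairing module is omitted, and `classifier`, `alpha`, and the feature shapes are placeholders rather than the paper's implementation.

```python
# Illustrative soft orthogonality pruning with a prediction-consistency term.
import torch
import torch.nn.functional as F

def soft_orthogonal_decompose(id_feat, ood_feat, alpha=1.0):
    """id_feat, ood_feat: (B, D) features of matched ID/OOD pairs."""
    ood_dir = F.normalize(ood_feat, dim=1)                           # unit OOD direction
    proj = (id_feat * ood_dir).sum(dim=1, keepdim=True) * ood_dir    # collinear component
    return id_feat - alpha * proj                                    # pruned ID feature

def consistency_loss(classifier, id_feat, ood_feat):
    """Keep predictions consistent before and after the decomposition."""
    pruned = soft_orthogonal_decompose(id_feat, ood_feat)
    p = F.log_softmax(classifier(id_feat), dim=1)
    q = F.softmax(classifier(pruned), dim=1)
    return F.kl_div(p, q, reduction="batchmean")
```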
Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i.e., p(candidates|query). While straightforward, this de facto paradigm overlooks the underlying data distribution p(query), which makes it challenging to identify out-of-distribution data. To address this limitation, we tackle the task from a generative viewpoint and model the correlation between the text and the video as their joint probability p(candidates, query). This is accomplished through a diffusion-based text-video retrieval framework (DiffusionRet), which models retrieval as a process of gradually generating the joint distribution from noise. During training, DiffusionRet is optimized from both the generation and discrimination perspectives: the generator is optimized with a generation loss, and the feature extractor is trained with a contrastive loss. In this way, DiffusionRet leverages the strengths of both generative and discriminative methods. Extensive experiments on five commonly used text-video retrieval benchmarks (MSRVTT, LSMDC, MSVD, ActivityNet Captions, and DiDeMo) show superior performance and justify the efficacy of our method. More encouragingly, without any modification, DiffusionRet performs well even in out-of-domain retrieval settings. We believe this work brings fundamental insights into the related fields. Code is available at https://***/jpthu17/DiffusionRet.
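As a toy illustration of the two training signals mentioned above, the sketch below pairs a contrastive (discriminative) objective over text/video embeddings with a denoising-style generation objective. The `denoiser`, the simplified forward process, and the temperature are illustrative assumptions and do not reproduce the paper's diffusion schedule.

```python
# Toy sketch of combined discriminative and generative training signals.
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE over matched text/video pairs in a batch."""
    text_emb = F.normalize(text_emb, dim=1)
    video_emb = F.normalize(video_emb, dim=1)
    sim = text_emb @ video_emb.t() / temperature
    labels = torch.arange(sim.size(0), device=sim.device)   # i-th text matches i-th video
    return 0.5 * (F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels))

def generation_loss(denoiser, clean_dist, t, noise):
    """Denoising objective: predict the injected noise at step t (simplified forward process)."""
    noisy = clean_dist + noise * t.view(-1, 1)               # placeholder, not a real DDPM schedule
    return F.mse_loss(denoiser(noisy, t), noise)
```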