Search results - Inner Mongolia University Library

arXiv, 2025

Authors: Luo, Ziyuan; Rocha, Anderson; Shi, Boxin; Guo, Qing; Li, Haoliang; Wan, Renjie. Affiliations: Department of Computer Science Hong Kong Baptist University Hong Kong Institute of Computing University of Campinas Brazil State Key Laboratory of Multimedia Information Processing and National Engineering Research Center of Visual Technology School of Computer Science Peking University Beijing 100871 China A*STAR Singapore Department of Electrical Engineering City University of Hong Kong Hong Kong

Neural Radiance Fields (NeRF) have been gaining attention as a significant form of 3D content representation. With the proliferation of NeRF-based creations, the need for copyright protection has emerged as a critical issue. Although some approaches have been proposed to embed digital watermarks into NeRF, they often neglect essential model-level considerations and incur substantial time overheads, resulting in reduced imperceptibility and robustness, along with user inconvenience. In this paper, we extend the previous criteria for image watermarking to the model level and propose NeRF Signature, a novel watermarking method for NeRF. We employ a Codebook-aided Signature Embedding (CSE) that does not alter the model structure, thereby maintaining imperceptibility and enhancing robustness at the model level. Furthermore, after optimization, any desired signatures can be embedded through the CSE, and no fine-tuning is required when NeRF owners want to use new binary signatures. Then, we introduce a joint pose-patch encryption watermarking strategy to hide signatures into patches rendered from a specific viewpoint for higher robustness. In addition, we explore a Complexity-Aware Key Selection (CAKS) scheme to embed signatures in high visual complexity patches to enhance imperceptibility. The experimental results demonstrate that our method outperforms other baseline methods in terms of imperceptibility and robustness. The source code is available at: https://***/luo-ziyuan/NeRF_Signature. Copyright © 2025, The Authors. All rights reserved.

Keywords: Image watermarking


SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens

arXiv, 2024

Authors: Su, Chi; Ma, Xiaoxuan; Su, Jiajun; Wang, Yizhou. Affiliations: Center on Frontiers of Computing Studies School of Computer Science Peking University China Inst. for Artificial Intelligence Peking University China Nat’l Eng. Research Center of Visual Technology China State Key Laboratory of General Artificial Intelligence Peking University China China

We propose SAT-HMR, a one-stage framework for real-time multi-person 3D human mesh estimation from a single RGB image. While current one-stage methods, which follow a DETR-style pipeline, achieve state-of-the-art (SOTA) performance with high-resolution inputs, we observe that this particularly benefits the estimation of individuals in smaller scales of the image (e.g., those of young age or far from the camera), but at the cost of significantly increased computation overhead. To address this, we introduce scale-adaptive tokens that are dynamically adjusted based on the relative scale of each individual in the image within the DETR framework. Specifically, individuals in smaller scales are processed at higher resolutions, larger ones at lower resolutions, and background regions are further distilled. These scale-adaptive tokens more efficiently encode the image features, facilitating subsequent decoding to regress the human mesh, while allowing the model to allocate computational resources more effectively and focus on more challenging cases. Experiments show that our method preserves the accuracy benefits of high-resolution processing while substantially reducing computational cost, achieving real-time inference with performance comparable to SOTA methods. Copyright © 2024, The Authors. All rights reserved.

Keywords: Image coding


3D Shape Completion on Unseen Categories: A Weakly-supervised Approach

arXiv, 2024

Authors: Wu, Lintai; Hou, Junhui; Song, Linqi; Xu, Yong. Affiliations: The Department of Computer Science City University of Hong Kong Hong Kong The Bio-Computing Research Center Harbin Institute of Technology Shenzhen Guangdong Shenzhen 518055 China The Shenzhen Key Laboratory of Visual Object Detection and Recognition Guangdong Shenzhen 518055 China

3D shapes captured by scanning devices are often incomplete due to occlusion. 3D shape completion methods have been explored to tackle this limitation. However, most of these methods are only trained and tested on a subset of categories, resulting in poor generalization to unseen categories. In this paper, we propose a novel weakly-supervised framework to reconstruct the complete shapes from unseen categories. We first propose an end-to-end prior-assisted shape learning network that leverages data from the seen categories to infer a coarse shape. Specifically, we construct a prior bank consisting of representative shapes from the seen categories. Then, we design a multi-scale pattern correlation module for learning the complete shape of the input by analyzing the correlation between local patterns within the input and the priors at various scales. In addition, we propose a self-supervised shape refinement model to further refine the coarse shape. Considering the shape variability of 3D objects across categories, we construct a category-specific prior bank to facilitate shape refinement. Then, we devise a voxel-based partial matching loss and leverage the partial scans to drive the refinement process. Extensive experimental results show that our approach is superior to state-of-the-art methods by a large margin. We will make the source code publicly available at https://***/ltwu6/WSSC. Copyright © 2024, The Authors. All rights reserved.

Keywords: Self-supervised learning


Cross-Point Adversarial Attack Based on Feature Neighborhood Disruption Against Segment Anything Model


IEEE International Conference on Multimedia and Expo (ICME)

Authors: Yan Jiang; Guisheng Yin; Ye Yuan; Jingjing Chen; Zhipeng Wei. Affiliations: College of Computer Science and Technology Harbin Engineering University Harbin China Shanghai Key Lab of Intell. Info. Processing School of CS Fudan University Shanghai China Shanghai Collaborative Innovation Center of Intelligent Visual Computing Shanghai China

ISBN: (digital) 9798350390155

ISBN: (print) 9798350390162

Segment anything model (SAM) has received significant attention owing to its outstanding segmentation performance. However, it may still face security threats from adversarial examples. Since SAM interactively realizes the prediction of target areas according to user-specified prompts (e.g., points), adversarial examples generated by existing end-to-end attack methods usually exhibit limited attack performance when faced with different point prompts. To this end, we propose a cross-point adversarial attack method based on feature neighborhood disruption against SAM, called CP-FND attack. CP-FND aims to generate adversarial examples capable of effectively deceiving SAM under different user-specified point prompts. Specifically, CP-FND forces the intermediate feature of adversarial examples to be similar to the designed disruption features without relying on any specified point prompt. Subsequently, the continuity and relevance of contextual features are disrupted, thereby fooling SAM and suppressing its predicted masks. Extensive experiments demonstrate that CP-FND achieves superior cross-point adversarial attack performance against SAM compared to state-of-the-art methods.

Keywords: Image segmentation; Correlation; Security; Faces


3D Human Mesh Estimation from Virtual Markers

arXiv, 2023

Authors: Ma, Xiaoxuan; Su, Jiajun; Wang, Chunyu; Zhu, Wentao; Wang, Yizhou. Affiliations: School of Computer Science Center on Frontiers of Computing Studies Peking University China Inst. for Artificial Intelligence Peking University China Microsoft Research Asia China Nat’l Eng. Research Center of Visual Technology China

Inspired by the success of volumetric 3D pose estimation, some recent human mesh estimators propose to estimate 3D skeletons as intermediate representations, from which the dense 3D meshes are regressed by exploiting the mesh topology. However, body shape information is lost in extracting skeletons, leading to mediocre performance. Advanced motion capture systems solve the problem by placing dense physical markers on the body surface, which allows realistic meshes to be extracted from their non-rigid motions. However, they cannot be applied to wild images without markers. In this work, we present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface based on the large-scale mocap data in a generative style, mimicking the effects of physical markers. The virtual markers can be accurately detected from wild images and can reconstruct the intact meshes with realistic shapes by simple interpolation. Our approach outperforms the state-of-the-art methods on three datasets. In particular, it surpasses the existing methods by a notable margin on the SURREAL dataset, which has diverse body shapes. Code is available at https://***/ShirleyMaxx/VirtualMarker. Copyright © 2023, The Authors. All rights reserved.

Keywords: Mesh generation


Transferability Estimation Based On Principal Gradient Expectation

arXiv, 2022

Authors: Qi, Huiyan; Cheng, Lechao; Chen, Jingjing; Yu, Yue; Song, Xue; Feng, Zunlei; Jiang, Yu-Gang. Affiliations: School of Computer Science Shanghai Collaborative Innovation Center of Intelligent Visual Computing Fudan University China Zhejiang Lab China Zhejiang University China

Transfer learning aims to improve the performance of target tasks by transferring knowledge acquired in source tasks. The standard approach is pre-training followed by fine-tuning or linear probing. In particular, selecting a proper source domain for a specific target domain under predefined tasks is crucial for improving efficiency and effectiveness. It is conventional to solve this problem by estimating transferability. However, existing methods cannot reach a trade-off between performance and cost. To comprehensively evaluate estimation methods, we summarize three properties: stability, reliability and efficiency. Building upon them, we propose Principal Gradient Expectation (PGE), a simple yet effective method for assessing transferability. Specifically, we calculate the gradient over each weight unit multiple times with a restart scheme, and then we compute the expectation of all gradients. Finally, the transferability between the source and target is estimated by computing the gap of normalized principal gradients. Extensive experiments show that the proposed metric is superior to state-of-the-art methods on all properties. © 2022, CC BY.
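The averaged-gradient idea in the abstract above can be illustrated with a toy sketch. This is not the paper's PGE implementation: the linear model with squared loss, the use of random re-initialisation as the "restart", the L2 normalisation, the Euclidean gap, and all function names (`mean_gradient`, `expected_gradient`, `gradient_gap`) are assumptions made only for illustration.

```python
import math
import random


def mean_gradient(data, weights):
    """Gradient of mean squared error for a linear model y ~ w . x."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(data)
    return grad


def expected_gradient(data, dim, restarts=50, seed=0):
    """Average the gradient over several random re-initialisations (assumed restart scheme)."""
    rng = random.Random(seed)
    total = [0.0] * dim
    for _ in range(restarts):
        weights = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        grad = mean_gradient(data, weights)
        total = [t + g for t, g in zip(total, grad)]
    return [t / restarts for t in total]


def normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def gradient_gap(source, target, dim, restarts=50, seed=0):
    """Euclidean distance between the normalised expected gradients of two datasets."""
    gs = normalize(expected_gradient(source, dim, restarts, seed))
    gt = normalize(expected_gradient(target, dim, restarts, seed))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(gs, gt)))


# Hypothetical toy datasets: (features, label) pairs.
source = [((1.0, 0.0), 1.0), ((0.0, 1.0), 2.0)]
target = [((1.0, 0.0), -1.0), ((0.0, 1.0), -2.0)]
same_gap = gradient_gap(source, source, dim=2)
diff_gap = gradient_gap(source, target, dim=2)
```

In this toy setting a smaller gap is read as "more similar tasks": comparing a dataset with itself gives a gap of zero, while the label-flipped dataset gives a gap close to the maximum of 2 for unit vectors.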

Keywords: Economic and social effects


Richelieu: self-evolving LLM-based agents for AI diplomacy

Proceedings of the 38th International Conference on Neural Information Processing Systems

Authors: Zhenyu Guan; Xiangyu Kong; Fangwei Zhong; Yizhou Wang. Affiliations: Institute for Artificial Intelligence Peking University College of Computer Science Beijing Information Science and Technology University and State Key Laboratory of General Artificial Intelligence BIGAI School of Artificial Intelligence Beijing Normal University and State Key Laboratory of General Artificial Intelligence BIGAI Center on Frontiers of Computing Studies School of Computer Science Nat'l Eng. Research Center of Visual Technology Peking University and Institute for Artificial Intelligence Peking University

ISBN: (print) 9798331314385

Diplomacy is one of the most sophisticated activities in human society, involving complex interactions among multiple parties that require skills in social reasoning, negotiation, and long-term strategic planning. Previous AI agents have demonstrated their ability to handle multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. While recent agents based on large language models (LLMs) have shown potential in various applications, they still struggle with extended planning periods in complex multi-agent settings. Leveraging recent technologies for LLM-based agents, we aim to explore AI's potential to create a human-like agent capable of executing comprehensive multi-agent missions by integrating three fundamental capabilities: 1) strategic planning with memory and reflection; 2) goal-oriented negotiation with social reasoning; and 3) augmenting memory through self-play games for self-evolution without human in the loop. Project page: https://***/view/richelieu-diplomacy.


Research on liquid crystal display technology based on regional dynamic dimming algorithm


IEEE International Conference on Data Science in Cyberspace (DSC)

Authors: Zhitao Yu; Peng Sun; Mingle Zhou; Qianlong Liu; Shilong Zhao. Affiliations: Hisense Video Technology Co. Ltd Qing Dao China Hisense Visual Technology Co. Ltd Qing Dao China Key Laboratory of Computing Power Network and Information Security Ministry of Education Shandong Computer Science Center (National Supercomputer Center in Jinan) Qilu University of Technology (Shandong Academy of Sciences) Jinan China Shandong Provincial Key Laboratory of Computer Networks Shandong Fundamental Research Center for Computer Science Jinan China

ISBN: (digital) 9798350391367

ISBN: (print) 9798350391374

In-vehicle liquid crystal display (LCD) technology has attracted much attention for its wide range of applications in automotive infotainment systems. However, conventional LCD technologies have limitations in terms of energy consumption and display quality, especially in the context of the increasing demand for high contrast and low energy consumption. To address these challenges, this paper studies LCD technology based on a regional dynamic dimming algorithm. The technology significantly improves the contrast ratio of LCDs and effectively reduces energy consumption by dividing the backlight module into multiple independently adjustable regions and dynamically adjusting the backlight brightness of each region according to the image content. The results show that the regional dynamic dimming algorithm proposed in this paper can effectively adjust the brightness of the backlight according to the image content, achieving higher display contrast and lower energy consumption.
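The region-by-region backlight adjustment described in this abstract can be sketched as follows. The abstract does not state how each region's brightness is derived from the image, so this sketch assumes a simple rule (drive each zone at the brightest pixel it contains); the function names and the 8-bit grayscale input are likewise hypothetical.

```python
def zone_backlight(image, zone_rows, zone_cols, max_level=255):
    """Return a zone_rows x zone_cols grid of backlight levels.

    Assumption: each zone is driven at the brightest pixel it contains.
    The image must have at least zone_rows rows and zone_cols columns.
    """
    height, width = len(image), len(image[0])
    levels = []
    for zr in range(zone_rows):
        r0, r1 = zr * height // zone_rows, (zr + 1) * height // zone_rows
        row_levels = []
        for zc in range(zone_cols):
            c0, c1 = zc * width // zone_cols, (zc + 1) * width // zone_cols
            peak = max(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            row_levels.append(min(peak, max_level))
        levels.append(row_levels)
    return levels


def relative_backlight_power(levels, max_level=255):
    """Mean backlight level as a fraction of an always-full backlight."""
    flat = [v for row in levels for v in row]
    return sum(flat) / (len(flat) * max_level)


# Hypothetical 4x4 grayscale frame: bright top-left corner, dark elsewhere.
frame = [
    [255, 255, 0, 0],
    [255, 200, 0, 0],
    [10, 10, 0, 0],
    [10, 20, 0, 0],
]
levels = zone_backlight(frame, zone_rows=2, zone_cols=2)
power = relative_backlight_power(levels)
```

For this frame only one of the four zones runs at full brightness, so the mean backlight level is roughly 27% of a uniformly lit panel. Dark zones emitting less light is also what raises contrast in such a scheme.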

Keywords: Energy consumption; Heuristic algorithms; Brightness; Light emitting diodes; Liquid crystal displays; Robustness; Entropy; Energy efficiency; Vehicle dynamics; Optimization


Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision

arXiv, 2023

Authors: Xu, Jilan; Hou, Junlin; Zhang, Yuejie; Feng, Rui; Wang, Yi; Qiao, Yu; Xie, Weidi. Affiliations: School of Computer Science Shanghai Key Lab of Intelligent Information Processing Shanghai Collaborative Innovation Center of Intelligent Visual Computing Fudan University China Shanghai AI Laboratory China Shanghai Jiaotong University China

In this paper, we consider the problem of open-vocabulary semantic segmentation (OVS), which aims to segment objects of arbitrary classes instead of pre-defined, closed-set categories. The main contributions are as follows: First, we propose a transformer-based model for OVS, termed OVSegmentor, which only exploits web-crawled image-text pairs for pre-training without using any mask annotations. OVSegmentor assembles the image pixels into a set of learnable group tokens via a slot-attention based binding module, and aligns the group tokens to the corresponding caption embedding. Second, we propose two proxy tasks for training, namely masked entity completion and cross-image mask consistency. The former aims to infer all masked entities in the caption given the group tokens, which enables the model to learn fine-grained alignment between visual groups and text entities. The latter enforces consistent mask predictions between images that contain shared entities, which encourages the model to learn visual invariance. Third, we construct the CC4M dataset for pre-training by filtering CC12M with frequently appearing entities, which significantly improves training efficiency. Fourth, we perform zero-shot transfer on three benchmark datasets, PASCAL VOC 2012, PASCAL Context, and COCO Object. Our model achieves superior segmentation results over the state-of-the-art method by using only 3% of the data (4M vs 134M) for pre-training. Code and pre-trained models will be released for future research. © 2023, CC BY.

Keywords: Semantics


Self-Supervised Video Representation Learning with Motion-Contrastive Perception

arXiv, 2022

Authors: Liu, Jinyu; Cheng, Ying; Zhang, Yuejie; Zhao, Rui-Wei; Feng, Rui. Affiliations: School of Computer Science Shanghai Collaborative Innovation Center of Intelligent Visual Computing Fudan University China Academy for Engineering and Technology Fudan University China

Visual-only self-supervised learning has achieved significant improvement in video representation learning. Existing related methods encourage models to learn video representations by utilizing contrastive learning or designing specific pretext tasks. However, some models are likely to focus on the background, which is unimportant for learning video representations. To alleviate this problem, we propose a new view called long-range residual frame to obtain more motion-specific information. Based on this, we propose the Motion-Contrastive Perception Network (MCPNet), which consists of two branches, namely, Motion Information Perception (MIP) and Contrastive Instance Perception (CIP), to learn generic video representations by focusing on the changing areas in videos. Specifically, the MIP branch aims to learn fine-grained motion features, and the CIP branch performs contrastive learning to learn overall semantic information for each instance. Experiments on two benchmark datasets UCF-101 and HMDB-51 show that our method outperforms current state-of-the-art visual-only self-supervised approaches. Copyright © 2022, The Authors. All rights reserved.
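The abstract does not define the long-range residual frame, so the sketch below rests on an assumption: that it is the pixel-wise difference between two frames several steps apart, which cancels a static background and leaves the moving regions. The function name `residual_frames` and the nested-list frame format are hypothetical.

```python
def residual_frames(frames, stride):
    """Pixel-wise difference between frames `stride` steps apart.

    Assumed definition: residual[t] = frames[t + stride] - frames[t].
    A larger stride compares frames further apart in time.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    residuals = []
    for t in range(len(frames) - stride):
        earlier, later = frames[t], frames[t + stride]
        residuals.append([
            [b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(earlier, later)
        ])
    return residuals


# Hypothetical clip of four 1x2 frames: the left pixel changes over time,
# the right pixel is static background.
clip = [[[0, 5]], [[1, 5]], [[2, 5]], [[3, 5]]]
long_range = residual_frames(clip, stride=2)
```

In this clip the static pixel contributes zero to every residual while the changing pixel keeps a non-zero value, which matches the abstract's aim of drawing attention away from the background.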

Keywords: Video recording

