ISBN (print): 9798350353006
We present DrivingGaussian, an efficient and effective framework for representing surrounding dynamic autonomous driving scenes. For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene with incremental static 3D Gaussians. We then leverage a composite dynamic Gaussian graph to handle multiple moving objects, individually reconstructing each object and restoring their accurate positions and occlusion relationships within the scene. We further use a LiDAR prior for Gaussian Splatting to reconstruct scenes in greater detail and maintain panoramic consistency. DrivingGaussian outperforms existing methods in dynamic driving scene reconstruction and enables photorealistic surround-view synthesis with high fidelity and multi-camera consistency. Our project page is at: https://***/VDIGPKU/DrivingGaussian.
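The composite representation can be pictured as concatenating a static background Gaussian set with per-object Gaussian sets placed at their per-frame poses. The sketch below is a minimal illustration of that composition step, assuming simple (means, features) containers and a 4x4 rigid pose per object; the field layout and the downstream rasterizer are placeholders, not the authors' implementation.

```python
# Hypothetical sketch of composing static and dynamic Gaussians for one frame,
# in the spirit of the composite dynamic Gaussian graph described above.
import numpy as np
from dataclasses import dataclass

@dataclass
class GaussianSet:
    means: np.ndarray      # (N, 3) Gaussian centers
    features: np.ndarray   # (N, C) opacity/color/covariance params, flattened

def transform(gaussians: GaussianSet, pose: np.ndarray) -> GaussianSet:
    """Apply a rigid object-to-world pose (4x4) to the Gaussian centers."""
    homo = np.c_[gaussians.means, np.ones(len(gaussians.means))]
    return GaussianSet((homo @ pose.T)[:, :3], gaussians.features)

def compose_scene(static_bg: GaussianSet,
                  objects: dict,
                  poses_at_t: dict) -> GaussianSet:
    """Concatenate the background with each moving object placed at its pose
    for the current frame; occlusions are resolved later by the rasterizer's
    depth ordering."""
    parts = [static_bg] + [transform(objects[k], poses_at_t[k]) for k in objects]
    return GaussianSet(np.concatenate([p.means for p in parts]),
                       np.concatenate([p.features for p in parts]))
```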
ISBN (print): 9798350353006
Recently, there has been a lot of progress in reducing the computation of deep models at inference time. These methods can reduce both the computational needs and power usage of deep models. Some of these approaches adaptively scale the compute based on the input instance. We show that such models can be vulnerable to a universal adversarial patch attack, where the attacker optimizes for a patch that, when pasted on any image, can increase the compute and power consumption of the model. We run experiments with three different efficient vision transformer methods, showing that in some cases the attacker can increase the computation to the maximum possible level by simply pasting a patch that occupies only 8% of the image area. We also show that a standard adversarial training defense method can reduce some of the attack's success. We believe adaptive efficient methods will be necessary in the future to lower the power usage of expensive deep models, so we hope our paper encourages the community to study the robustness of these methods and develop better defense methods for the proposed attack. Code is available at: https://***/UCDvision/SlowFormer
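The attack described above amounts to optimizing a single shared patch so that pasting it on any input drives up a differentiable proxy for the model's compute. The sketch below illustrates that loop under assumptions: `model.compute_proxy` is a hypothetical hook returning such a proxy (e.g., the expected number of tokens an adaptive ViT keeps), and the placement and sizes are illustrative (a 64x64 patch covers roughly 8% of a 224x224 image).

```python
# Hypothetical sketch of a universal compute-increasing patch attack.
import torch

def optimize_patch(model, loader, patch_size=64, steps=1000, lr=0.01, device="cuda"):
    patch = torch.rand(3, patch_size, patch_size, device=device, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _, (images, _) in zip(range(steps), loader):
        images = images.to(device).clone()
        # paste the shared patch at a fixed corner of every image in the batch
        images[:, :, :patch_size, :patch_size] = patch.clamp(0, 1)
        compute_cost = model.compute_proxy(images)   # assumed interface: scalar proxy
        loss = -compute_cost                         # maximize compute, not accuracy
        opt.zero_grad(); loss.backward(); opt.step()
    return patch.detach().clamp(0, 1)
```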
ISBN (print): 9798350353006
Aligned text-image encoders such as CLIP have become the de-facto model for vision-language tasks. Furthermore, modality-specific encoders achieve impressive performance in their respective domains. This raises a central question: does an alignment exist between uni-modal vision and language encoders, since they fundamentally represent the same physical world? Analyzing the latent space structures of vision and language models on image-caption benchmarks using Centered Kernel Alignment (CKA), we find that the representation spaces of unaligned and aligned encoders are semantically similar. In the absence of statistical similarity in aligned encoders like CLIP, we show that a possible matching of unaligned encoders exists without any training. We frame this as a seeded graph-matching problem exploiting the semantic similarity between graphs and propose two methods: a fast Quadratic Assignment Problem optimization, and a novel localized CKA metric-based matching/retrieval. We demonstrate the effectiveness of this approach on several downstream tasks, including cross-lingual and cross-domain caption matching and image classification. Code available at ***/mayug/0-shot-llm-vision.
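For reference, the linear CKA score used to compare the two latent spaces can be computed directly from paired embeddings; the snippet below implements the standard linear-CKA formula (the localized variant and the QAP-based matching proposed in the paper are not reproduced here).

```python
# Linear Centered Kernel Alignment (CKA) between two representation matrices.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """X: (n, d1), Y: (n, d2) -- n paired samples, e.g. image/caption embeddings."""
    X = X - X.mean(axis=0, keepdims=True)          # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2     # cross-covariance energy
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(hsic / (norm_x * norm_y))
```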
ISBN (print): 9798350353013; 9798350353006
To build a cross-modal latent space between 3D human motion and language, acquiring large-scale and high-quality human motion data is crucial. However, unlike the abundance of image data, the scarcity of motion data has limited the performance of existing motion-language models. To counter this, we introduce "motion patches", a new representation of motion sequences, and propose using vision Transformers (ViT) as motion encoders via transfer learning, aiming to extract useful knowledge from the image domain and apply it to the motion domain. These motion patches, created by dividing and sorting skeleton joints based on body parts in motion sequences, are robust to varying skeleton structures, and can be regarded as color image patches in ViT. We find that transfer learning with pre-trained weights of ViT obtained through training with 2D image data can boost the performance of motion analysis, presenting a promising direction for addressing the issue of limited motion data. Our extensive experiments show that the proposed motion patches, used jointly with ViT, achieve state-of-the-art performance in the benchmarks of text-to-motion retrieval, and other novel challenging tasks, such as cross-skeleton recognition, zero-shot motion classification, and human interaction recognition, which are currently impeded by the lack of data.
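Conceptually, a motion patch reshapes a (time x joints x xyz) slice of the skeleton sequence into a fixed-size square that a ViT can consume like an RGB patch. The sketch below illustrates this idea under assumptions: the body-part grouping, the 22-joint indexing, and the nearest-neighbor resizing are illustrative choices, not the paper's exact construction.

```python
# Hypothetical sketch of building "motion patches" from a skeleton sequence.
import numpy as np

BODY_PARTS = {                      # assumed joint indices for a 22-joint skeleton
    "torso": [0, 3, 6, 9, 12, 15],
    "left_arm": [13, 16, 18, 20],
    "right_arm": [14, 17, 19, 21],
    "left_leg": [1, 4, 7, 10],
    "right_leg": [2, 5, 8, 11],
}

def motion_to_patches(motion: np.ndarray, patch: int = 16) -> np.ndarray:
    """motion: (T, J, 3) joint positions -> (num_parts, patch, patch, 3)."""
    patches = []
    for joints in BODY_PARTS.values():
        part = motion[:, joints, :]                       # (T, j, 3) body-part slice
        t_idx = np.linspace(0, part.shape[0] - 1, patch).round().astype(int)
        j_idx = np.linspace(0, part.shape[1] - 1, patch).round().astype(int)
        patches.append(part[t_idx][:, j_idx, :])          # nearest-neighbor resize
    return np.stack(patches)                              # xyz plays the role of RGB
```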
ISBN (print): 9798350353006
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned. Despite their success, LVLMs still suffer from object hallucinations, where models generate plausible yet incorrect outputs that include objects that do not exist in the images. To mitigate this issue, we introduce Visual Contrastive Decoding (VCD), a simple and training-free method that contrasts output distributions derived from original and distorted visual inputs. The proposed VCD effectively reduces the over-reliance on statistical bias and unimodal priors, two essential causes of object hallucinations. This adjustment ensures the generated content is closely grounded in visual inputs, resulting in contextually accurate outputs. Our experiments show that VCD, without additional training or the use of external tools, significantly mitigates the object hallucination issue across different LVLM families. Beyond mitigating object hallucinations, VCD also excels in general LVLM benchmarks, highlighting its wide-ranging applicability.
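The decoding rule can be illustrated as contrasting two next-token distributions from the same model, one conditioned on the original image and one on a distorted copy. The snippet below is a minimal sketch of that contrast; the specific weighting and the greedy selection are assumptions of the sketch rather than the paper's exact procedure.

```python
# Minimal sketch of contrasting next-token logits from original vs. distorted images.
import numpy as np

def contrastive_next_token(logits_original: np.ndarray,
                           logits_distorted: np.ndarray,
                           alpha: float = 1.0) -> int:
    """Both inputs: (vocab,) next-token logits from the same LVLM."""
    # amplify image-grounded evidence, subtract what survives image distortion
    contrasted = (1 + alpha) * logits_original - alpha * logits_distorted
    return int(np.argmax(contrasted))   # greedy choice, for illustration only
```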
ISBN (print): 9798350353013; 9798350353006
In the evolving landscape of digital media and video production, the precise manipulation and reproduction of visual elements like camera movements and character actions are highly desired. Existing SLAM methods face limitations in dynamic scenes and human pose estimation often focuses on 2D projections, neglecting 3D statuses. To address these issues, we first introduce a reverse filming behavior estimation technique. It optimizes camera trajectories by leveraging NeRF as a differentiable renderer and refining SMPL tracks. We then introduce a cinematic transfer pipeline that is able to transfer various shot types to a new 2D video or a 3D virtual environment. The incorporation of 3D engine workflow enables superior rendering and control abilities, which also achieves a higher rating in the user study.
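At its core, the trajectory refinement treats per-frame camera poses as learnable parameters and backpropagates a photometric loss through a differentiable renderer. The sketch below shows that loop under assumptions: `renderer.render(pose)` is a hypothetical interface for a NeRF-style differentiable renderer, and the se(3) pose parameterization and sampling scheme are illustrative.

```python
# Hypothetical sketch of refining a camera trajectory through a differentiable renderer.
import torch

def refine_camera_trajectory(renderer, frames, init_poses, steps=200, lr=1e-3):
    """frames: (T, H, W, 3) observed video; init_poses: (T, 6) se(3) pose parameters."""
    poses = torch.nn.Parameter(init_poses.clone())
    opt = torch.optim.Adam([poses], lr=lr)
    for _ in range(steps):
        t = torch.randint(len(frames), (1,)).item()        # sample one frame per step
        rendered = renderer.render(poses[t])               # assumed differentiable render
        loss = torch.nn.functional.mse_loss(rendered, frames[t])
        opt.zero_grad(); loss.backward(); opt.step()
    return poses.detach()
```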
ISBN (print): 9798350353006
Pre-trained vision-language models have shown impressive success on various computer vision tasks with their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt vision-language models to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting, since the general knowledge of the pre-trained vision-language models is forgotten while the prompts are fine-tuned on a small dataset from a specific target task. To address this issue, we propose Prompt Meta-Regularization (ProMetaR) to improve the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness the task-specific knowledge from the downstream tasks and the task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments the task to generate multiple virtual tasks to alleviate meta-overfitting. In addition, we provide an analysis of how ProMetaR improves the generalizability of prompt tuning from the perspective of gradient alignment. Our extensive experiments demonstrate that ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://***/mlvlab/ProMetaR.
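The meta-learning recipe can be pictured as an inner prompt update regularized by learned parameters, followed by an outer update of both the prompts and the regularizer on held-out data. The sketch below is a heavily simplified, hypothetical version of that pattern (an L2 pull with a learned weight stands in for the meta-learned regularizer); `prompts` and `reg_weight` are assumed to be parameters registered in `outer_opt`, and the task augmentation is omitted.

```python
# Simplified, hypothetical meta-regularized prompt update (not ProMetaR's exact algorithm).
import torch

def meta_step(prompts, reg_weight, loss_fn, support_batch, query_batch,
              outer_opt, inner_lr=0.01):
    # inner step: task loss plus a learned L2 penalty on the soft prompts
    inner_loss = loss_fn(prompts, support_batch) + reg_weight * prompts.pow(2).sum()
    grad, = torch.autograd.grad(inner_loss, prompts, create_graph=True)
    adapted = prompts - inner_lr * grad
    # outer step: evaluate adapted prompts on held-out data, update prompts and regularizer
    outer_loss = loss_fn(adapted, query_batch)
    outer_opt.zero_grad(); outer_loss.backward(); outer_opt.step()
    return outer_loss.item()
```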
ISBN (print): 9798350365474
We study a limited-label problem and present a novel approach to Single-Positive Multi-label Learning. In the multi-label learning setting, a model learns to predict multiple labels or categories for a single input image. This contrasts with standard multi-class image classification, where the task is to predict a single label from many possible labels for an image. Single-Positive Multi-label Learning specifically considers learning to predict multiple labels when there is only one annotation per image in the training data. Multi-label learning is a more natural task than single-label learning because real-world data often involves instances belonging to multiple categories simultaneously; however, most computer vision datasets contain single labels due to the inherent complexity and cost of collecting multiple high-quality annotations per image. We propose a novel approach called Vision-Language Pseudo-Labeling, which uses a vision-language model, CLIP, to suggest strong positive and negative pseudo-labels. Experimental results demonstrate the effectiveness of the proposed approach. Our code and data will be made publicly available at https://***/mvrl/VLPL.
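The pseudo-labeling step can be illustrated with precomputed CLIP embeddings: classes whose prompts are highly similar to the image become pseudo-positives, clearly dissimilar classes become pseudo-negatives, and the rest stay unlabeled. The thresholds and the interface below are assumptions of this sketch, not the paper's exact procedure.

```python
# Hypothetical sketch of CLIP-based positive/negative pseudo-labeling.
import numpy as np

def pseudo_labels(image_emb: np.ndarray, class_embs: np.ndarray,
                  pos_thresh: float = 0.3, neg_thresh: float = 0.15) -> np.ndarray:
    """image_emb: (d,), class_embs: (C, d), both L2-normalized. Returns +1 / -1 / 0 per class."""
    sims = class_embs @ image_emb                 # cosine similarity to each class prompt
    labels = np.zeros(len(class_embs), dtype=int)
    labels[sims >= pos_thresh] = 1                # strong positive pseudo-labels
    labels[sims <= neg_thresh] = -1               # strong negative pseudo-labels
    return labels                                 # remaining classes stay unlabeled
```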
ISBN (print): 9798350353013; 9798350353006
The remarkable success of Vision Transformers in Artificial Neural Networks (ANNs) has led to a growing interest in incorporating the self-attention mechanism and transformer-based architecture into Spiking Neural Networks (SNNs). While existing methods propose spiking self-attention mechanisms that are compatible with SNNs, they lack reasonable scaling methods, and the overall architectures proposed by these methods suffer from a bottleneck in effectively extracting local features. To address these challenges, we propose a novel spiking self-attention mechanism named Dual Spike Self-Attention (DSSA) with a reasonable scaling method. Based on DSSA, we propose a novel spiking Vision Transformer architecture called SpikingResformer, which combines the ResNet-based multi-stage architecture with our proposed DSSA to improve both performance and energy efficiency while reducing parameters. Experimental results show that SpikingResformer achieves higher accuracy with fewer parameters and lower energy consumption than other spiking Vision Transformer counterparts. Notably, our SpikingResformer-L achieves 79.40% top-1 accuracy on ImageNet with 4 time-steps, which is the state-of-the-art result in the SNN field. Code is available at https://***/xyshi2000/SpikingResformer
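To give a feel for why scaling matters with binary activations, the toy snippet below computes attention over 0/1 spike matrices for a single time-step: the query-key product reduces to counting co-active channels, so dividing by the channel count keeps the membrane input in a range where a fixed firing threshold makes sense. This is a conceptual illustration only, not the paper's Dual Spike Self-Attention.

```python
# Toy illustration of attention over binary spike inputs (one time-step).
import numpy as np

def spiking_attention(q_spikes, k_spikes, v_spikes, threshold=1.0):
    """q/k/v_spikes: (tokens, channels) binary 0/1 arrays."""
    d = q_spikes.shape[1]
    attn = (q_spikes @ k_spikes.T) / d            # co-activation counts, scaled by channels
    pre_activation = attn @ v_spikes              # real-valued membrane input
    return (pre_activation >= threshold).astype(np.float32)   # fire -> output spikes
```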
ISBN (print): 9798350353006
Active recognition, which allows intelligent agents to explore observations for better recognition performance, serves as a prerequisite for various embodied AI tasks, such as grasping, navigation, and room arrangement. Given the evolving environment and the multitude of object classes, it is impractical to include all possible classes during the training stage. In this paper, we aim to advance active open-vocabulary recognition, empowering embodied agents to actively perceive and classify arbitrary objects. However, directly adopting recent open-vocabulary classification models, like Contrastive Language-Image Pretraining (CLIP), poses unique challenges. Specifically, we observe that CLIP's performance is heavily affected by the viewpoint and occlusions, compromising its reliability in unconstrained embodied perception scenarios. Further, the sequential nature of observations in agent-environment interactions necessitates an effective method for integrating features that maintains discriminative strength for open-vocabulary classification. To address these issues, we introduce a novel agent for active open-vocabulary recognition. The proposed method leverages inter-frame and inter-concept similarities to navigate agent movements and to fuse features, without relying on class-specific knowledge. Compared to the baseline CLIP model, which achieves 29.6% accuracy on the ShapeNet dataset, the proposed agent achieves 53.3% accuracy for open-vocabulary recognition, without any fine-tuning of the equipped CLIP model. Additional experiments conducted with the Habitat simulator further affirm the efficacy of our method.
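The class-agnostic fusion can be illustrated as a running CLIP feature that weights each new view by its agreement with the history and is then matched against open-vocabulary text embeddings. The weighting scheme and interface in the sketch below are assumptions for illustration, not the authors' exact agent policy.

```python
# Hypothetical sketch of similarity-weighted multi-view fusion with CLIP features.
import numpy as np

def fuse_and_classify(view_feats: np.ndarray, text_embs: np.ndarray) -> int:
    """view_feats: (V, d) L2-normalized CLIP features from successive views;
    text_embs: (C, d) L2-normalized class-name embeddings."""
    fused = view_feats[0].copy()
    for feat in view_feats[1:]:
        w = max(float(fused @ feat), 0.0)         # trust views consistent with history
        fused = fused + w * feat
        fused /= np.linalg.norm(fused)            # keep the running feature unit-norm
    return int(np.argmax(text_embs @ fused))      # open-vocabulary prediction
```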