ISBN:
(Print) 9798350353013; 9798350353006
To build a cross-modal latent space between 3D human motion and language, acquiring large-scale and high-quality human motion data is crucial. However, unlike image data, which is abundant, motion data remains scarce, and this scarcity has limited the performance of existing motion-language models. To counter this, we introduce "motion patches", a new representation of motion sequences, and propose using vision Transformers (ViT) as motion encoders via transfer learning, aiming to extract useful knowledge from the image domain and apply it to the motion domain. These motion patches, created by dividing and sorting the skeleton joints of motion sequences by body part, are robust to varying skeleton structures and can be treated as color image patches by ViT. We find that transfer learning with ViT weights pre-trained on 2D image data can boost the performance of motion analysis, presenting a promising direction for addressing the issue of limited motion data. Our extensive experiments show that the proposed motion patches, used jointly with ViT, achieve state-of-the-art performance on text-to-motion retrieval benchmarks as well as on other novel, challenging tasks, such as cross-skeleton recognition, zero-shot motion classification, and human interaction recognition, which are currently impeded by the lack of data.
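To make the motion-patch idea concrete, here is a minimal sketch (not the authors' implementation) of converting a skeleton motion sequence into image-like patches that a standard ViT could consume. The body-part grouping, the tiling used to square up each patch, and the function name `make_motion_patches` are illustrative assumptions.

```python
# Illustrative sketch (not the paper's implementation): turn a skeleton motion
# sequence into image-like patches for a ViT, treating each joint's (x, y, z)
# coordinates as the three color channels. The body-part grouping is hypothetical.
import numpy as np

BODY_PARTS = {
    "torso":     [0, 1, 2, 3],
    "left_arm":  [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def make_motion_patches(motion, patch_size=16):
    """motion: (T, J, 3) joint positions over T frames.
    Returns (N, patch_size, patch_size, 3) patches, one per (time window, body part)."""
    T, _, _ = motion.shape
    patches = []
    for t0 in range(0, T - patch_size + 1, patch_size):      # slide over time
        window = motion[t0:t0 + patch_size]                  # (patch_size, J, 3)
        for joints in BODY_PARTS.values():
            block = window[:, joints, :]                      # (patch_size, len(joints), 3)
            # Tile joints so the block is patch_size wide, like a square image patch.
            reps = int(np.ceil(patch_size / block.shape[1]))
            block = np.tile(block, (1, reps, 1))[:, :patch_size, :]
            # Normalize coordinates to [0, 1], mimicking pixel intensities.
            block = (block - block.min()) / (np.ptp(block) + 1e-8)
            patches.append(block)
    return np.stack(patches)

if __name__ == "__main__":
    demo = np.random.randn(64, 20, 3)          # 64 frames, 20 joints
    print(make_motion_patches(demo).shape)      # (20, 16, 16, 3)
```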
ISBN:
(Print) 9798350353006
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned. Despite their success, LVLMs still suffer from object hallucinations, where models generate plausible yet incorrect outputs that include objects that do not exist in the images. To mitigate this issue, we introduce Visual Contrastive Decoding (VCD), a simple and training-free method that contrasts output distributions derived from original and distorted visual inputs. The proposed VCD effectively reduces over-reliance on statistical bias and unimodal priors, two essential causes of object hallucinations. This adjustment ensures the generated content is closely grounded in the visual inputs, resulting in contextually accurate outputs. Our experiments show that VCD, without additional training or the use of external tools, significantly mitigates the object hallucination issue across different LVLM families. Beyond mitigating object hallucinations, VCD also excels in general LVLM benchmarks, highlighting its wide-ranging applicability.
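The contrastive decoding step can be sketched in a few lines. The snippet below assumes access to a function returning next-token logits for a given image and text prefix (`model_logits` is a placeholder, not a real LVLM API), and uses a (1 + alpha) * L_orig - alpha * L_dist weighting together with Gaussian noise as one possible distortion; the paper's exact distortion and weighting may differ.

```python
# Sketch of visual contrastive decoding. `model_logits` is a placeholder for an LVLM
# that returns next-token logits given an image and a text prefix; the Gaussian-noise
# distortion and the weighting below are common choices, not necessarily the paper's.
import torch

def distort(image, noise_std=0.5):
    # One simple distortion: additive Gaussian noise that washes out visual detail.
    return image + noise_std * torch.randn_like(image)

def vcd_next_token_logits(model_logits, image, text_ids, alpha=1.0):
    """Amplify what the clean image supports and suppress what the model would say
    from priors alone: (1 + alpha) * L_orig - alpha * L_dist."""
    logits_orig = model_logits(image, text_ids)            # (vocab_size,)
    logits_dist = model_logits(distort(image), text_ids)   # (vocab_size,)
    return (1 + alpha) * logits_orig - alpha * logits_dist

if __name__ == "__main__":
    vocab_size = 32000
    fake_lvlm = lambda img, ids: torch.randn(vocab_size)   # stand-in for a real model
    image = torch.rand(3, 224, 224)
    logits = vcd_next_token_logits(fake_lvlm, image, text_ids=[1, 2, 3])
    print(logits.argmax().item())                           # greedy next-token choice
```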
ISBN:
(Print) 9798350353013; 9798350353006
In the evolving landscape of digital media and video production, the precise manipulation and reproduction of visual elements such as camera movements and character actions are highly desired. Existing SLAM methods face limitations in dynamic scenes, and human pose estimation often focuses on 2D projections while neglecting the underlying 3D state. To address these issues, we first introduce a reverse filming behavior estimation technique that optimizes camera trajectories by leveraging NeRF as a differentiable renderer and refining SMPL tracks. We then introduce a cinematic transfer pipeline that can transfer various shot types to a new 2D video or a 3D virtual environment. Incorporating a 3D-engine workflow enables superior rendering and control, and achieves a higher rating in our user study.
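The core inverse-rendering step, recovering a camera pose by back-propagating a photometric loss through a differentiable renderer, can be sketched as follows. The `render` function is a toy stand-in for a NeRF-style renderer, and the SMPL track refinement from the pipeline is omitted.

```python
# Toy sketch of pose recovery through a differentiable renderer. `render` stands in
# for a NeRF-style renderer; SMPL track refinement is omitted.
import torch

def render(pose6d, H=32, W=32):
    # Any differentiable map from a 6-DoF pose (3 rotation + 3 translation params)
    # to an image works for the illustration; this one is deliberately trivial.
    r, t = pose6d[:3], pose6d[3:]
    grid = torch.linspace(-1, 1, W)
    return torch.sin(grid * (1 + r.sum())) * t.sum() + torch.zeros(H, W)

def estimate_pose(target_frame, steps=200, lr=1e-2):
    pose = torch.zeros(6, requires_grad=True)     # initial camera pose guess
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(render(pose), target_frame)
        loss.backward()                            # gradients flow through the renderer
        opt.step()
    return pose.detach()

if __name__ == "__main__":
    gt_pose = torch.tensor([0.1, 0.0, 0.0, 0.5, 0.2, 0.3])
    print(estimate_pose(render(gt_pose)))
```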
ISBN:
(Print) 9798350353006
Pre-trained vision-language models have shown impressive success on various computer vision tasks with their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt vision-language models to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting, since the general knowledge of the pre-trained vision-language model is forgotten while the prompts are fine-tuned on a small dataset from a specific target task. To address this issue, we propose Prompt Meta-Regularization (ProMetaR) to improve the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness the task-specific knowledge from the downstream tasks and the task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments tasks to generate multiple virtual tasks to alleviate meta-overfitting. In addition, we provide an analysis of how ProMetaR improves the generalizability of prompt tuning from the perspective of gradient alignment. Our extensive experiments demonstrate that ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://***/mlvlab/ProMetaR.
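A highly simplified sketch of the underlying idea, meta-learning soft prompts together with a regularizer in an inner/outer loop, is shown below. The single learnable L2 strength, the toy task loss, and all names are illustrative assumptions rather than the authors' procedure, and task augmentation is omitted.

```python
# Highly simplified meta-learning sketch: soft prompts and a single learnable L2
# regularization strength are learned jointly with an inner/outer loop. All names,
# the toy task loss, and the regularizer form are illustrative assumptions.
import torch

dim, n_ctx = 512, 4
prompt = torch.randn(n_ctx, dim, requires_grad=True)    # soft prompt tokens
log_reg = torch.zeros(1, requires_grad=True)             # meta-learned regularizer strength
prompt_init = prompt.detach().clone()                    # anchor toward pre-trained knowledge
meta_opt = torch.optim.Adam([prompt, log_reg], lr=1e-3)

def task_loss(p, batch):
    feats, labels = batch                                 # (B, dim), (B,)
    logits = feats @ p.mean(0)                            # toy prompt-conditioned score
    return torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)

def meta_step(support, query, inner_lr=0.1):
    reg = log_reg.exp()
    # Inner step: adapt the prompt on the support set under the learned regularizer.
    inner = task_loss(prompt, support) + reg * (prompt - prompt_init).pow(2).mean()
    grad, = torch.autograd.grad(inner, prompt, create_graph=True)
    adapted = prompt - inner_lr * grad
    # Outer step: evaluate on the query set; gradients reach both prompt and regularizer.
    meta_opt.zero_grad()
    task_loss(adapted, query).backward()
    meta_opt.step()

if __name__ == "__main__":
    make_batch = lambda: (torch.randn(8, dim), torch.randint(0, 2, (8,)).float())
    for _ in range(10):
        meta_step(make_batch(), make_batch())
    print(log_reg.exp().item())
```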
KBody is a method for fitting a low-dimensional body model to an image. It follows a predict-and-optimize approach, relying on data-driven model estimates for the constraints that will be used to solve for the body...
Text-to-image generation has attracted significant interest from researchers and practitioners in recent years due to its widespread and diverse applications across various industries. Despite the progress made in the...
ISBN:
(Print) 9781665487399
The aim of this paper is to demonstrate that a state-of-the-art feature matcher (LoFTR) can be made more robust to rotations simply by replacing the backbone CNN with a steerable CNN that is equivariant to translations and image rotations. Experiments show that this gain in rotational robustness is obtained without reducing performance on ordinary illumination and viewpoint matching sequences.
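A rotation-equivariant convolution block of the kind that could replace an ordinary backbone block can be built with the e2cnn library; the snippet below is a minimal sketch under that assumption and does not reproduce the paper's actual backbone configuration.

```python
# Minimal rotation-equivariant block built with the e2cnn library (pip install e2cnn);
# the paper's SE(2)-equivariant backbone is configured differently.
import torch
from e2cnn import gspaces
from e2cnn import nn as enn

r2_act = gspaces.Rot2dOnR2(N=8)                               # 8 discrete rotations
feat_in = enn.FieldType(r2_act, [r2_act.trivial_repr])        # single-channel input
feat_out = enn.FieldType(r2_act, 16 * [r2_act.regular_repr])  # equivariant features

block = enn.SequentialModule(
    enn.R2Conv(feat_in, feat_out, kernel_size=3, padding=1),
    enn.InnerBatchNorm(feat_out),
    enn.ReLU(feat_out, inplace=True),
)

x = enn.GeometricTensor(torch.randn(1, 1, 64, 64), feat_in)
y = block(x)
print(y.tensor.shape)   # (1, 16 * 8, 64, 64): each feature carries 8 rotation channels
```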
ISBN:
(Print) 9798350353013; 9798350353006
The remarkable success of vision Transformers in Artificial Neural Networks (ANNs) has led to a growing interest in incorporating the self-attention mechanism and transformer-based architectures into Spiking Neural Networks (SNNs). While existing methods propose spiking self-attention mechanisms that are compatible with SNNs, they lack reasonable scaling methods, and the overall architectures proposed by these methods suffer from a bottleneck in effectively extracting local features. To address these challenges, we propose a novel spiking self-attention mechanism named Dual Spike Self-Attention (DSSA) with a reasonable scaling method. Based on DSSA, we propose a novel spiking vision Transformer architecture called SpikingResformer, which combines a ResNet-based multi-stage architecture with our proposed DSSA to improve both performance and energy efficiency while reducing parameters. Experimental results show that SpikingResformer achieves higher accuracy with fewer parameters and lower energy consumption than other spiking vision Transformer counterparts. Notably, our SpikingResformer-L achieves 79.40% top-1 accuracy on ImageNet with 4 time steps, which is the state-of-the-art result in the SNN field. Code is available at https://***/xyshi2000/SpikingResformer
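For intuition, the snippet below sketches a generic spike-form self-attention layer: queries, keys, and values are binarized spike maps, the attention map is a rescaled spike-count product with no softmax, and a scaling constant keeps the accumulated potentials in a reasonable firing range. The thresholding and scaling choices are illustrative and do not reproduce the paper's DSSA.

```python
# Generic spike-form self-attention sketch (not the paper's DSSA): Q, K, V are binary
# spike maps, the attention map is a rescaled spike-count product, and no softmax is
# needed. A surrogate gradient would replace the hard threshold during training.
import torch
import torch.nn as nn

class SpikingSelfAttention(nn.Module):
    def __init__(self, dim, threshold=1.0):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.threshold = threshold
        self.scale = dim ** -0.5          # illustrative scaling; the paper derives its own

    def spike(self, x):
        return (x >= self.threshold).float()   # hard threshold (inference-style)

    def forward(self, x):                      # x: (B, N, dim) membrane potentials
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = self.spike(q), self.spike(k), self.spike(v)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # integer spike counts, rescaled
        out = self.spike(attn @ v * self.scale)          # spiking output, no softmax
        return self.proj(out)

if __name__ == "__main__":
    layer = SpikingSelfAttention(dim=64)
    print(layer(torch.randn(2, 16, 64)).shape)   # torch.Size([2, 16, 64])
```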
ISBN:
(Print) 9798350353006
Active recognition, which allows intelligent agents to explore observations for better recognition performance, serves as a prerequisite for various embodied AI tasks, such as grasping, navigation, and room arrangement. Given the evolving environment and the multitude of object classes, it is impractical to include all possible classes during the training stage. In this paper, we aim to advance active open-vocabulary recognition, empowering embodied agents to actively perceive and classify arbitrary objects. However, directly adopting recent open-vocabulary classification models, like Contrastive Language-Image Pretraining (CLIP), poses unique challenges. Specifically, we observe that CLIP's performance is heavily affected by viewpoint and occlusions, compromising its reliability in unconstrained embodied perception scenarios. Further, the sequential nature of observations in agent-environment interactions necessitates an effective method for integrating features that maintains discriminative strength for open-vocabulary classification. To address these issues, we introduce a novel agent for active open-vocabulary recognition. The proposed method leverages inter-frame and inter-concept similarities to navigate agent movements and to fuse features, without relying on class-specific knowledge. Compared to the baseline CLIP model, which reaches 29.6% accuracy on the ShapeNet dataset, the proposed agent achieves 53.3% accuracy for open-vocabulary recognition without any fine-tuning of the equipped CLIP model. Additional experiments conducted with the Habitat simulator further affirm the efficacy of our method.
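A hedged sketch of the recognition side is shown below: frame features from a frozen CLIP model are fused with weights derived from inter-frame agreement and then matched against arbitrary class-name prompts. It assumes the openai `clip` package; the fusion rule is an illustrative choice rather than the paper's, and the agent's movement policy is omitted entirely.

```python
# Sketch of sequence-level open-vocabulary recognition with a frozen CLIP model.
# Requires the openai `clip` package; the similarity-based fusion weights below are
# an illustrative choice, and the agent's movement policy is not modeled here.
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def classify_sequence(frames, class_names):
    """frames: list of PIL images observed by the agent; class_names: open vocabulary."""
    imgs = torch.stack([preprocess(f) for f in frames]).to(device)
    feats = model.encode_image(imgs).float()
    feats = feats / feats.norm(dim=-1, keepdim=True)            # (T, D)

    # Fuse: weight each frame by its mean agreement with the other frames, so that
    # outlier views (heavy occlusion, poor viewpoint) contribute less.
    sim = feats @ feats.t()                                      # (T, T)
    weights = torch.softmax(sim.mean(dim=1), dim=0)              # (T,)
    fused = (weights[:, None] * feats).sum(dim=0, keepdim=True)
    fused = fused / fused.norm(dim=-1, keepdim=True)

    prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    text = model.encode_text(prompts).float()
    text = text / text.norm(dim=-1, keepdim=True)                # (C, D)
    return class_names[(fused @ text.t()).argmax().item()]
```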
ISBN:
(Print) 9798350353006
Temporal grounding of text descriptions in videos is a central problem in vision-language learning and video understanding. Existing methods often prioritize accuracy over scalability: they have been optimized for grounding only a few text queries within short videos, and fail to scale up to long videos with hundreds of queries. In this paper, we study the effect of cross-modal fusion on the scalability of video grounding models. Our analysis establishes late fusion as a more cost-effective fusion scheme for long-form videos with many text queries. Moreover, it leads us to a novel, video-centric sampling scheme for efficient training. Based on these findings, we present SnAG, a simple baseline for scalable and accurate video grounding. Without bells and whistles, SnAG is 43% more accurate and 1.5x faster than CONE, a state-of-the-art model for long-form video grounding, on the challenging MAD dataset, while achieving highly competitive results on short videos. Our code is available at https://***/fmu2/snag_release.
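The late-fusion argument can be illustrated with a small sketch: video clip features are encoded once, independently of any query, and each text query is fused only at scoring time, so adding queries is cheap. The encoders below are stand-ins, not SnAG's architecture.

```python
# Late-fusion sketch: clip-level video features are encoded once, independent of any
# query, and text queries are fused only at scoring time. Encoders are stand-ins,
# not SnAG's architecture.
import torch
import torch.nn as nn

class LateFusionGrounder(nn.Module):
    def __init__(self, vid_dim=512, txt_dim=512, dim=256):
        super().__init__()
        self.vid_proj = nn.Linear(vid_dim, dim)   # video-only branch
        self.txt_proj = nn.Linear(txt_dim, dim)   # text-only branch

    def encode_video(self, clip_feats):           # (T, vid_dim) -> (T, dim), done once
        v = self.vid_proj(clip_feats)
        return v / v.norm(dim=-1, keepdim=True)

    def score_queries(self, video_repr, query_feats):
        # (T, dim) x (Q, txt_dim) -> (Q, T) relevance; each extra query is cheap.
        q = self.txt_proj(query_feats)
        q = q / q.norm(dim=-1, keepdim=True)
        return q @ video_repr.t()

if __name__ == "__main__":
    model = LateFusionGrounder()
    video = model.encode_video(torch.randn(2000, 512))            # long video, 2000 clips
    scores = model.score_queries(video, torch.randn(300, 512))    # 300 text queries
    print(scores.shape, scores.argmax(dim=1).shape)               # peak clip per query
```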