Search Results - Inner Mongolia University Library

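Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "excluding". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016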
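
The field codes and operator rules in this help panel can be composed programmatically. The following is a minimal sketch, not part of the library's system: the helper names (`term`, `any_of`, `all_of`, `exclude`, `year_range`) are hypothetical, and it only builds the query text. How the catalogue's parser handles operator precedence is not stated in the help text, so the sketch simply reproduces the two documented examples character for character.

```python
# Field codes listed in the help panel.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value term, rejecting codes the help panel does not list."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def any_of(terms: list[str]) -> str:
    """Join terms with OR (uppercase, one space each side) and parenthesize."""
    return "(" + " OR ".join(terms) + ")"


def all_of(terms: list[str]) -> str:
    """Join terms with AND (uppercase, one space each side)."""
    return " AND ".join(terms)


def exclude(expression: str, excluded: str) -> str:
    """Append a NOT clause to an existing expression."""
    return f"{expression} NOT {excluded}"


def year_range(start: int, end: int) -> str:
    """Build a Y=start-end year restriction."""
    return term("Y", f"{start}-{end}")


# Example 1 from the help panel.
example_one = all_of([
    any_of([term("K", "图书馆学"), term("K", "情报学")]),
    term("A", "范并思"),
    year_range(1982, 2016),
])

# Example 2 from the help panel.
example_two = all_of([
    exclude(
        all_of([term("P", "计算机应用与软件"),
                any_of([term("U", "C++"), term("U", "Basic")])]),
        term("K", "Visual"),
    ),
    year_range(2011, 2016),
])

print(example_one)
print(example_two)
```

Building the string from small helpers keeps the two formatting rules (uppercase operators, single spaces around them) in one place instead of relying on hand-typed queries.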

Search query: "Any field = 1994 IEEE Computer-Society Conference on Computer Vision and Pattern Recognition"

22,908 records in total; showing records 291-300

Weak-to-Strong 3D Object Detection with X-Ray Distillation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gambashidze, Alexander; Dadukin, Aleksandr; Golyadkin, Maxim; Razzhivina, Maria; Makarov, Ilya

Affiliations: Artificial Intelligence Res Inst Barcelona Spain; HSE Univ Moscow Russia; ISP RAS Moscow Russia

ISBN (print): 9798350353006

This paper addresses the critical challenges of sparsity and occlusion in LiDAR-based 3D object detection. Current methods often rely on supplementary modules or specific architectural designs, potentially limiting their applicability to new and evolving architectures. To our knowledge, we are the first to propose a versatile technique that seamlessly integrates into any existing framework for 3D Object Detection, marking the first instance of Weak-to-Strong generalization in 3D computer vision. We introduce a novel framework, X-Ray Distillation with Object-Complete Frames, suitable for both supervised and semi-supervised settings, that leverages the temporal aspect of point cloud sequences. This method extracts crucial information from both previous and subsequent LiDAR frames, creating Object-Complete frames that represent objects from multiple viewpoints, thus addressing occlusion and sparsity. Given the limitation of not being able to generate Object-Complete frames during online inference, we utilize Knowledge Distillation within a Teacher-Student framework. This technique encourages the strong Student model to emulate the behavior of the weaker Teacher, which processes simple and informative Object-Complete frames, effectively offering a comprehensive view of objects as if seen through X-ray vision. Our proposed methods surpass state-of-the-art in semi-supervised learning by 1-1.5 mAP and enhance the performance of five established supervised models by 1-2 mAP on standard autonomous driving datasets, even with default hyperparameters. Code for Object-Complete frames is available here: https://***/sakharok13/X-Ray-TeacherPatching-Tools.

Keywords: 3D detection; autonomous driving; computer vision

Distilling Vision-Language Models on Millions of Videos

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhao, Yue; Zhao, Long; Zhou, Xingyi; Wu, Jialin; Chu, Chun-Te; Mia, Hui; Schroff, Florian; Adam, Hartwig; Liu, Ting; Gong, Boqing; Krahenbuhl, Philipp; Yuan, Liangzhe

Affiliations: Google Res Mountain View CA 94043 USA; Univ Texas Austin Austin TX 78712 USA

ISBN (print): 9798350353006

The recent advance in vision-language models is largely attributed to the abundance of image-text data. We aim to replicate this success for video-language models, but there simply is not enough human-curated video-text data available. We thus resort to fine-tuning a video-language model from a strong image-language baseline with synthesized instructional data. The resulting video model by video-instruction-tuning (VIIT) is then used to auto-label millions of videos to generate high-quality captions. We show the adapted video-language model performs well on a wide range of video-language benchmarks. For instance, it surpasses the best prior result on open-ended NExT-QA by 2.8%. Besides, our model generates detailed descriptions for previously unseen videos, which provide better textual supervision than existing methods. Experiments show that a video-language dual-encoder model contrastively trained on these auto-generated captions is 3.8% better than the strongest baseline that also leverages vision-language models. Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%. As a side product, we generate the largest video caption dataset to date.

Keywords: Video analysis

PoNQ: a Neural QEM-based Mesh Representation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Maruani, Nissim; Ovsjanikov, Maks; Alliez, Pierre; Desbrun, Mathieu

Affiliations: Univ Cote Azur INRIA Nice France; IP Paris Ecole Polytech LIX Paris France; Ecole Polytech Inria Saclay Paris France

ISBN (print): 9798350353013; 9798350353006

Although polygon meshes have been a standard representation in geometry processing, their irregular and combinatorial nature hinders their suitability for learning-based applications. In this work, we introduce a novel learnable mesh representation through a set of local 3D sample Points and their associated Normals and Quadric error metrics (QEM) w.r.t. the underlying shape, which we denote PoNQ. A global mesh is directly derived from PoNQ by efficiently leveraging the knowledge of the local quadric errors. Besides marking the first use of QEM within a neural shape representation, our contribution guarantees both topological and geometrical properties by ensuring that a PoNQ mesh does not self-intersect and is always the boundary of a volume. Notably, our representation does not rely on a regular grid, is supervised directly by the target surface alone, and also handles open surfaces with boundaries and/or sharp features. We demonstrate the efficacy of PoNQ through a learning-based mesh prediction from SDF grids and show that our method surpasses recent state-of-the-art techniques in terms of both surface and edge-based metrics.

Keywords: 3D Mesh; Computational Geometry; computer vision

Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kim, Dongyoung; Kim, Jinwoo; Yu, Junsang; Kim, Seon Joo

Affiliations: Yonsei Univ Seoul South Korea; Samsung Adv Inst Technol Suwon South Korea

ISBN (print): 9798350353006

White balance (WB) algorithms in many commercial cameras assume single and uniform illumination, leading to undesirable results when multiple lighting sources with different chromaticities exist in the scene. Prior research on multi-illuminant WB typically predicts illumination at the pixel level without fully grasping the scene's actual lighting conditions, including the number and color of light sources. This often results in unnatural outcomes lacking in overall consistency. To handle this problem, we present a deep white balancing model that leverages slot attention, where each slot is in charge of representing individual illuminants. This design enables the model to generate chromaticities and weight maps for individual illuminants, which are then fused to compose the final illumination map. Furthermore, we propose the centroid-matching loss, which regulates the activation of each slot based on the color range, thereby enabling the model to separate illumination more effectively. Our method achieves state-of-the-art performance on both single- and multi-illuminant WB benchmarks, and also offers additional information such as the number of illuminants in the scene and their chromaticity. This capability allows for illumination editing, an application not feasible with prior methods.

Keywords: Low level vision; Photography; White Balancing

Collaborating Foundation Models for Domain Generalized Semantic Segmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Benigmim, Yasser; Roy, Subhankar; Essid, Slim; Kalogeiton, Vicky; Lathuiliere, Stephane

Affiliations: Inst Polytech Paris Telecom Paris LTCI Palaiseau France; Inst Polytech Paris CNRS Ecole Polytech LIX Palaiseau France; Univ Aberdeen Aberdeen Scotland

ISBN (print): 9798350353013; 9798350353006

Domain Generalized Semantic Segmentation (DGSS) deals with training a model on a labeled source domain with the aim of generalizing to unseen domains during inference. Existing DGSS methods typically effectuate robust features by means of Domain Randomization (DR). Such an approach is often limited as it can only account for style diversification and not content. In this work, we take an orthogonal approach to DGSS and propose to use an assembly of CoLlaborative FOUndation models for Domain Generalized Semantic Segmentation (CLOUDS). In detail, CLOUDS is a framework that integrates Foundation Models of various kinds: (i) CLIP backbone for its robust feature representation, (ii) Diffusion Model to diversify the content, thereby covering various modes of the possible target distribution, and (iii) Segment Anything Model (SAM) for iteratively refining the predictions of the segmentation model. Extensive experiments show that our CLOUDS excels in adapting from synthetic to real DGSS benchmarks and under varying weather conditions, notably outperforming prior methods by 5.6% and 6.7% on averaged mIoU, respectively. Our code is available at https://***/yasserben/CLOUDS

Keywords: computer vision; Deep Learning; Domain Adaptation; Domain Generalization; Foundation Models; Semantic Segmentation

TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ren, Zhiyuan; Kim, Minchul; Liu, Feng; Liu, Xiaoming

Affiliations: Michigan State Univ E Lansing MI 48824 USA

ISBN (print): 9798350353006

Recently, diffusion models have emerged as a new powerful generative method for 3D point cloud generation tasks. However, few works study the effect of the architecture of the diffusion model in the 3D point cloud, resorting to the typical UNet model developed for 2D images. Inspired by the wide adoption of Transformers, we study the complementary role of convolution (from UNet) and attention (from Transformers). We discover that their respective importance changes according to the timestep in the diffusion process. At early stages, attention has an outsized influence because Transformers are found to generate the overall shape more quickly, and at later stages when adding fine detail, convolution starts having a larger impact on the generated point cloud's local surface quality. In light of this observation, we propose a time-varying two-stream denoising model combined with convolution layers and transformer blocks. We generate an optimizable mask from each timestep to reweigh global and local features, obtaining time-varying fused features. Experimentally, we demonstrate that our proposed method quantitatively outperforms other state-of-the-art methods regarding visual quality and diversity. Code is available at https://***/Zhiyuan-R/Tiger-Diffusion.

Keywords: 3D vision; Diffusion Model; Generative Model; Point Cloud; ShapeNet

CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Sun, Shuyang; Li, Runjia; Torr, Philip; Gu, Xiuye; Li, Siyang

Affiliations: Univ Oxford Oxford England; Google Res Mountain View CA 94043 USA

ISBN (print): 9798350353006

Existing open-vocabulary image segmentation methods require a fine-tuning step on mask labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of categories in segmentation datasets. Consequently, the vocabulary capacity of pre-trained VLMs is severely reduced after fine-tuning. However, without fine-tuning, VLMs trained under weak image-text supervision tend to make suboptimal mask predictions. To alleviate these issues, we introduce a novel recurrent framework that progressively filters out irrelevant texts and enhances mask quality without training efforts. The recurrent unit is a two-stage segmenter built upon a frozen VLM. Thus, our model retains the VLM's broad vocabulary space and equips it with segmentation ability. Experiments show that our method outperforms not only the training-free counterparts, but also those fine-tuned with millions of data samples, and sets the new state-of-the-art records for both zero-shot semantic and referring segmentation. Concretely, we improve the current record by 28.8, 16.0, and 6.9 mIoU on Pascal VOC, COCO Object, and Pascal Context.

Keywords: image segmentation; open-vocabulary; referring segmentation; training-free methods; vision-language models

PELA: Learning Parameter-Efficient Models with Low-Rank Approximation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Guo, Yangyang; Wang, Guangzhi; Kankanhalli, Mohan

Affiliations: Natl Univ Singapore Singapore Singapore

ISBN (print): 9798350353006

Applying a pre-trained large model to downstream tasks is prohibitive under resource-constrained conditions. Recent dominant approaches for addressing efficiency issues involve adding a few learnable parameters to the fixed backbone model. This strategy, however, leads to more challenges in loading large models for downstream fine-tuning with limited resources. In this paper, we propose a novel method for increasing the parameter efficiency of pre-trained models by introducing an intermediate pre-training stage. To this end, we first employ low-rank approximation to compress the original large model and then devise a feature distillation module and a weight perturbation regularization module. These modules are specifically designed to enhance the low-rank model. In particular, we update only the low-rank model while freezing the backbone parameters during pre-training. This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks. The proposed method achieves both efficiencies in terms of required parameters and computation time while maintaining comparable results with minimal modifications to the backbone architecture. Specifically, when applied to three vision-only and one vision-language Transformer models, our approach often shows a decrease of only about 0.6 points in performance while reducing the original parameter size by 1/3 to 2/3. We release our code at link.

Keywords: Knowledge Distillation; Low-rank Approximation; Vision-Language
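
To give a sense of what "reducing the original parameter size by 1/3 to 2/3" means for a low-rank factorization, the sketch below works through the parameter arithmetic. It is an illustration of generic low-rank factorization only, not the PELA authors' implementation: the function names are hypothetical, and the 768 x 768 layer size is an assumed example, not a figure from the abstract.

```python
def dense_params(d_out: int, d_in: int) -> int:
    """Parameter count of a dense weight matrix W with shape (d_out, d_in)."""
    return d_out * d_in


def low_rank_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameter count when W is replaced by U @ V,
    with U of shape (d_out, rank) and V of shape (rank, d_in)."""
    return rank * (d_out + d_in)


def max_rank_for_budget(d_out: int, d_in: int, keep_num: int, keep_den: int) -> int:
    """Largest rank whose factorization keeps at most keep_num/keep_den
    of the dense parameter count (integer arithmetic, no rounding error)."""
    return (keep_num * d_out * d_in) // (keep_den * (d_out + d_in))


# Assumed example: a square 768 x 768 projection layer.
d = 768
rank_keep_two_thirds = max_rank_for_budget(d, d, 2, 3)  # remove 1/3 of parameters
rank_keep_one_third = max_rank_for_budget(d, d, 1, 3)   # remove 2/3 of parameters

print(dense_params(d, d))
print(rank_keep_two_thirds, low_rank_params(d, d, rank_keep_two_thirds))
print(rank_keep_one_third, low_rank_params(d, d, rank_keep_one_third))
```

For a square layer the factorization only saves parameters when the rank is below half the layer width, which is why the ranks in this assumed example come out well under 384.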

DePT: Decoupled Prompt Tuning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Ji; Wu, Shihan; Gao, Lianli; Shen, Heng Tao; Song, Jingkuan

Affiliations: Univ Elect Sci & Technol China UESTC Chengdu Peoples R China; UESTC Shenzhen Inst Adv Study Chengdu Peoples R China; Tongji Univ Shanghai Peoples R China

ISBN (print): 9798350353006

This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the better the tuned model generalizes to the base (or target) task, the worse it generalizes to new tasks, and vice versa. Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue - the vast majority of feature channels are occupied by base-specific knowledge, leading to the collapse of task-shared knowledge important to new tasks. To address this, we propose the Decoupled Prompt Tuning (DePT) framework, which decouples base-specific knowledge from feature channels into an isolated feature space during prompt tuning, so as to maximally preserve task-shared knowledge in the original feature space for achieving better zero-shot generalization on new tasks. Importantly, our DePT is orthogonal to existing prompt tuning approaches, and can enhance them with negligible additional computational cost. Extensive experiments on several datasets show the flexibility and effectiveness of DePT. Code is available at https://***/Koorye/DePT.

Keywords: Feature decoupling; Few-shot learning; Prompt tuning; vision and language

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Yichi; Qiao, Zhiqiao; Gao, Xiaofeng; Shakiah, Suhaila; Gao, Qiaozi; Chai, Joyce

Affiliations: Univ Michigan Ann Arbor MI 48109 USA; Amazon AGI Seattle WA USA

ISBN (print): 9798350353006

Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level representations that are important for fine-grained visual understanding and diagnosis. In this work, we introduce GROUNDHOG, an MLLM developed by grounding Large Language Models to holistic segmentation. GROUNDHOG incorporates a masked feature extractor and converts extracted features into visual entity tokens for the MLLM backbone, which then connects groundable phrases to unified grounding masks by retrieving and merging the entity masks. To train GROUNDHOG, we carefully curated M3G2, a grounded visual instruction tuning dataset with Multi-Modal Multi-Grained Grounding, by harvesting a collection of segmentation-grounded datasets with rich annotations. Our experimental results show that GROUNDHOG achieves superior performance on various language grounding tasks without task-specific fine-tuning, and significantly reduces object hallucination. GROUNDHOG also demonstrates better grounding towards complex forms of visual input and provides easy-to-understand diagnosis in failure cases.

Keywords: Language Grounding; Multi-Modal; Vision-Language Model
