ISBN (print): 9798350353013; 9798350353006
Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy of relying on general-purpose, object-centric image datasets such as ImageNet is limited by an important domain shift. On the other hand, collecting domain-specific ground truth such as 2D or 3D labels does not scale well. Therefore, we propose a pre-training approach based on self-supervised learning that works on human-centric data using only images. Our method uses pairs of images of humans: the first is partially masked and the model is trained to reconstruct the masked parts given the visible ones and a second image. It relies on both stereoscopic (cross-view) pairs and temporal (cross-pose) pairs taken from videos in order to learn priors about 3D as well as human motion. We pre-train one model for body-centric tasks and one for hand-centric tasks. With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks, and obtain state-of-the-art performance, for instance, when fine-tuning for model-based and model-free human mesh recovery.
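As a rough illustration of the pre-training objective described above (not the authors' implementation), the sketch below reconstructs the masked patches of one human image from its visible patches plus a second image, which can be either the other view of a stereo pair or another frame of the same person. Model sizes, the omitted positional embeddings, and all module names are simplifying assumptions.

```python
import torch
import torch.nn as nn

P, D = 16, 256                          # patch size, token dim (illustrative)
N = (224 // P) ** 2                     # tokens per 224x224 image

patch_embed = nn.Linear(3 * P * P, D)
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(D, 8, batch_first=True), 4)
decoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(D, 8, batch_first=True), 2)
to_pixels = nn.Linear(D, 3 * P * P)     # (positional embeddings omitted for brevity)

def patchify(img):                      # (B, 3, 224, 224) -> (B, N, 3*P*P)
    x = img.unfold(2, P, P).unfold(3, P, P)
    return x.permute(0, 2, 3, 1, 4, 5).reshape(img.shape[0], N, -1)

def pretrain_step(img_to_mask, img_second, mask_ratio=0.75):
    """img_to_mask: the partially masked human image; img_second: the other view of a
    stereo pair (cross-view) or another frame of the same person (cross-pose)."""
    tgt = patchify(img_to_mask)
    B, keep = tgt.shape[0], int(N * (1 - mask_ratio))
    idx = torch.rand(B, N).argsort(dim=1)                           # random mask per sample
    vis_idx, msk_idx = idx[:, :keep], idx[:, keep:]
    gather = lambda t, i: torch.gather(t, 1, i[..., None].expand(-1, -1, t.shape[-1]))
    vis = patch_embed(gather(tgt, vis_idx))                         # visible tokens of image 1
    ref = patch_embed(patchify(img_second))                         # all tokens of image 2
    tokens = encoder(torch.cat([vis, ref], dim=1))
    queries = torch.zeros(B, N - keep, D)                           # queries for masked patches
    pred = to_pixels(decoder(torch.cat([tokens, queries], dim=1))[:, -(N - keep):])
    return ((pred - gather(tgt, msk_idx)) ** 2).mean()              # loss on masked parts only
```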
ISBN (print): 9798350353006
Continual learning can empower vision-language models to continuously acquire new knowledge without the need for access to the entire historical dataset. However, mitigating the performance degradation in large-scale models is non-trivial due to (i) parameter shifts throughout life-long learning and (ii) the significant computational burden associated with full-model tuning. In this work, we present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models. Our approach involves the dynamic expansion of a pre-trained CLIP model through the integration of Mixture-of-Experts (MoE) adapters in response to new tasks. To preserve the zero-shot recognition capability of vision-language models, we further introduce a Distribution Discriminative Auto-Selector (DDAS) that automatically routes in-distribution and out-of-distribution inputs to the MoE adapters and the original CLIP, respectively. Through extensive experiments across various settings, our proposed method consistently outperforms previous state-of-the-art approaches while concurrently reducing parameter training burdens by 60%. Our code is available at https://***/JiazuoYu/MoE-Adapters4CL
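A hedged sketch of the routing idea behind the Distribution Discriminative Auto-Selector: score how close an input lies to the distributions seen during continual learning and dispatch it either to the MoE-adapted path or to the frozen CLIP. The prototype-based score, threshold tau, and linear heads are illustrative assumptions, not the paper's exact mechanism.

```python
import torch

@torch.no_grad()
def ddas_route(image_feats, task_prototypes, frozen_clip_head, moe_adapter_head, tau=0.7):
    """image_feats: (B, D) L2-normalized CLIP image embeddings.
    task_prototypes: (T, D) mean embeddings of data seen for the T learned tasks.
    Both heads are assumed to be nn.Linear classifiers with the same output size."""
    sims = image_feats @ task_prototypes.T                  # (B, T) cosine similarities
    in_dist_score, _ = sims.max(dim=1)                      # closeness to any seen task
    use_adapter = in_dist_score >= tau                      # in-distribution -> MoE-adapted path
    logits = torch.empty(image_feats.shape[0], frozen_clip_head.out_features)
    if use_adapter.any():
        logits[use_adapter] = moe_adapter_head(image_feats[use_adapter])
    if (~use_adapter).any():                                # OOD -> keep zero-shot CLIP behaviour
        logits[~use_adapter] = frozen_clip_head(image_feats[~use_adapter])
    return logits
```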
ISBN (print): 9798350353013; 9798350353006
Gait recognition stands as one of the most pivotal remote identification technologies and is progressively expanding across research and industry communities. However, existing gait recognition methods heavily rely on task-specific upstream models driven by supervised learning to provide explicit gait representations, such as silhouette sequences, which inevitably introduce expensive annotation costs and potential error accumulation. Escaping from this trend, this work explores effective gait representations based on the all-purpose knowledge produced by task-agnostic Large Vision Models (LVMs) and proposes a simple yet efficient gait framework, termed BigGait. Specifically, the Gait Representation Extractor (GRE) within BigGait draws upon design principles from established gait representations, effectively transforming all-purpose knowledge into implicit gait representations without requiring third-party supervision signals. Experiments on CCPG, CASIA-B* and SUSTech1K indicate that BigGait significantly outperforms previous methods in both within-domain and cross-domain tasks in most cases, and provides a more practical paradigm for learning the next-generation gait representation. Finally, we delve into prospective challenges and promising directions in LVM-based gait recognition, aiming to inspire future work on this emerging topic. The source code is available at https://github.com/ShiqiYu/OpenGait.
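As a loose illustration of turning task-agnostic backbone features into an implicit gait representation without silhouettes or other third-party supervision, a toy head might reduce and pool dense LVM features as below; the actual GRE has dedicated branches and design principles not reproduced here.

```python
import torch
import torch.nn as nn

class TinyGaitHead(nn.Module):
    """Pools dense features from a frozen, task-agnostic backbone into a
    sequence-level gait embedding; a stand-in for the GRE, not its design."""
    def __init__(self, feat_dim=384, out_dim=256):
        super().__init__()
        self.reduce = nn.Conv2d(feat_dim, out_dim, kernel_size=1)    # per-location dim reduction

    def forward(self, frame_feats):            # (T, C, H, W) dense backbone features per frame
        maps = self.reduce(frame_feats)        # implicit "gait maps", no silhouettes required
        per_frame = maps.flatten(2).max(dim=2).values                # (T, out_dim) spatial pooling
        return per_frame.mean(dim=0)           # temporal pooling -> (out_dim,) sequence embedding
```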
ISBN (print): 9798350353006
A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks. VFMs like CLIP, DINOv2, and SAM are trained with distinct objectives, exhibiting unique characteristics for various downstream tasks. We find that despite their conceptual differences, these models can be effectively merged into a unified model through multi-teacher distillation. We name this approach AM-RADIO (Agglomerative Model - Reduce All Domains Into One). This integrative approach not only surpasses the performance of the individual teacher models but also amalgamates their distinctive features, such as zero-shot vision-language comprehension, detailed pixel-level understanding, and open-vocabulary segmentation capabilities. Additionally, in pursuit of the most hardware-efficient backbone, we evaluated numerous architectures in our multi-teacher distillation pipeline using the same training recipe. This led to the development of a novel architecture (E-RADIO) that exceeds the performance of its predecessors and is at least 6x faster than the teacher models at matched resolution. Our comprehensive benchmarking covers downstream tasks including ImageNet classification, semantic segmentation linear probing, COCO object detection, and integration into LLaVA-1.5. Code: https://***/NVlabs/RADIO.
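A minimal sketch of the multi-teacher distillation setup described above, assuming precomputed teacher embeddings and one linear adaptor head per teacher on top of a shared student; the released AM-RADIO recipe (losses, feature resolutions, teachers) is considerably richer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTeacherDistiller(nn.Module):
    def __init__(self, student, student_dim, teacher_dims):
        super().__init__()
        self.student = student
        # One linear adaptor per teacher (e.g. CLIP, DINOv2, SAM) on the shared student.
        self.heads = nn.ModuleDict({name: nn.Linear(student_dim, d)
                                    for name, d in teacher_dims.items()})

    def loss(self, images, teacher_feats):
        s = self.student(images)                                  # (B, student_dim)
        total = 0.0
        for name, t in teacher_feats.items():                     # t: (B, teacher_dim), precomputed
            p = self.heads[name](s)
            total = total + (1 - F.cosine_similarity(p, t, dim=-1)).mean()
        return total

# Example wiring with toy tensors (the student here is a placeholder, not a real backbone):
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
distiller = MultiTeacherDistiller(student, 512, {"clip": 768, "dinov2": 1024, "sam": 256})
imgs = torch.randn(4, 3, 224, 224)
teachers = {"clip": torch.randn(4, 768), "dinov2": torch.randn(4, 1024), "sam": torch.randn(4, 256)}
print(distiller.loss(imgs, teachers))
```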
ISBN (print): 9798350353006
Stereo rectification is widely considered "solved" due to the abundance of traditional approaches to perform rectification. However, autonomous vehicles and robots in the wild require constant re-calibration due to exposure to various environmental factors, including vibration and structural stress, when cameras are arranged in a wide-baseline configuration. Conventional rectification methods fail in these challenging scenarios: especially for larger vehicles, such as autonomous freight trucks and semi-trucks, the resulting incorrect rectification severely degrades the quality of downstream tasks that use stereo/multi-view data. To tackle these challenges, we propose a novel learning-based online rectification approach that operates at real-time rates while achieving high accuracy. It utilizes stereo correlation volumes built from a feature representation obtained with cross-image attention. Our model is trained to minimize vertical optical flow as a proxy rectification constraint and predicts the relative rotation between the stereo pair. The method runs in real time, outperforms conventional methods used for offline calibration, and substantially improves downstream stereo depth after rectification. We release two public datasets (https://***/online-stereo-recification/), a synthetic and an experimental wide-baseline dataset, to foster further research.
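To make the proxy objective concrete, a sketch under simplifying assumptions: if the network predicts a corrective rotation for the right camera, the induced homography K R K^-1 can be applied to matched points and any remaining vertical displacement penalized, since a correctly rectified pair places matches on the same scanline. Correspondences are assumed given here; the paper instead works with correlation volumes built from cross-image attention features.

```python
import numpy as np

def vertical_flow_loss(pts_left, pts_right, R_pred, K):
    """pts_left/pts_right: (N, 2) pixel correspondences; R_pred: (3, 3) predicted corrective
    rotation for the right camera; K: (3, 3) shared intrinsics (assumed identical cameras)."""
    H = K @ R_pred @ np.linalg.inv(K)                 # rotation-induced homography
    ones = np.ones((pts_right.shape[0], 1))
    warped = (H @ np.hstack([pts_right, ones]).T).T   # (N, 3) homogeneous points
    warped = warped[:, :2] / warped[:, 2:3]
    # After correct rectification, matches share a scanline (only horizontal disparity
    # remains), so only the vertical residual is penalized.
    return np.mean(np.abs(warped[:, 1] - pts_left[:, 1]))
```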
ISBN (print): 9798350353013; 9798350353006
Estimating the relative camera pose from n ≥ 5 correspondences between two calibrated views is a fundamental task in computer vision. This process typically involves two stages: 1) estimating the essential matrix between the views, and 2) disambiguating among the four candidate relative poses that satisfy the epipolar geometry. In this paper, we demonstrate a novel approach that, for the first time, bypasses the second stage. Specifically, we show that it is possible to directly estimate the correct relative camera pose from correspondences without needing a post-processing step to enforce the cheirality constraint on the correspondences. Building on recent advances in certifiable non-minimal optimization, we frame relative pose estimation as a Quadratically Constrained Quadratic Program (QCQP). By applying the appropriate constraints, we ensure the estimation of a camera pose that corresponds to a valid 3D geometry and that is globally optimal when certified. We validate our method through exhaustive synthetic and real-world experiments, confirming the efficacy, efficiency and accuracy of the proposed approach. Code is available at https://***/javrtg/C2P.
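For contrast, the conventional two-stage pipeline the abstract refers to can be written with OpenCV as below: stage 1 estimates the essential matrix, and stage 2 (recoverPose) disambiguates the four candidate decompositions via a cheirality check. That second stage is precisely what the proposed QCQP formulation removes; the QCQP solver itself is not sketched here.

```python
import cv2
import numpy as np

def two_stage_relative_pose(pts1, pts2, K):
    """pts1, pts2: (N, 2) float arrays of correspondences with N >= 5; K: (3, 3) intrinsics."""
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                prob=0.999, threshold=1.0)            # stage 1: essential matrix
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)                    # stage 2: cheirality check
    return R, t
```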
ISBN (print): 9798350353006
Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image. Our work shows that the localization ("grounding") abilities of these models can be further improved by fine-tuning for self-consistent visual explanations. We propose a strategy for augmenting existing text-image datasets with paraphrases using a large language model, and SelfEQ, a weakly-supervised strategy on visual explanation maps for paraphrases that encourages self-consistency. Specifically, for an input textual phrase, we attempt to generate a paraphrase and fine-tune the model so that the phrase and paraphrase map to the same region in the image. We posit that this both expands the vocabulary that the model is able to handle and improves the quality of the object locations highlighted by gradient-based visual explanation methods (e.g., GradCAM). We demonstrate that SelfEQ improves performance on Flickr30k, ReferIt, and RefCOCO+ over a strong baseline method and several prior works. In particular, compared to other methods that do not use any type of box annotations, we obtain 84.07% on Flickr30k (an absolute improvement of 4.69%), 67.40% on ReferIt (an absolute improvement of 7.68%), and 75.10% and 55.49% on RefCOCO+ test sets A and B, respectively (an absolute improvement of 3.74% on average).
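A sketch of the self-consistency idea: given gradient-based explanation maps for a phrase and its LLM-generated paraphrase on the same image, push the two maps toward the same region. The symmetric-KL distance below is an illustrative choice; map extraction (e.g. GradCAM over the image-text score) and the exact SelfEQ objective are abstracted away.

```python
import torch

def self_consistency_loss(map_phrase, map_paraphrase, eps=1e-6):
    """map_*: (B, H, W) non-negative explanation maps for the same batch of images."""
    p = map_phrase.flatten(1)
    q = map_paraphrase.flatten(1)
    p = p / (p.sum(dim=1, keepdim=True) + eps)          # treat each map as a spatial distribution
    q = q / (q.sum(dim=1, keepdim=True) + eps)
    # Symmetric KL encourages the phrase and its paraphrase to ground to the same
    # image region; other distances (L1, cosine) would play the same role.
    kl_pq = (p * ((p + eps).log() - (q + eps).log())).sum(dim=1)
    kl_qp = (q * ((q + eps).log() - (p + eps).log())).sum(dim=1)
    return 0.5 * (kl_pq + kl_qp).mean()
```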
ISBN (print): 9798350353006
Neural radiance fields provide state-of-the-art view synthesis quality but tend to be slow to render. One reason is that they make use of volume rendering, thus requiring many samples (and model queries) per ray at render time. Although this representation is flexible and easy to optimize, most real-world objects can be modeled more efficiently with surfaces instead of volumes, requiring far fewer samples per ray. This observation has spurred considerable progress in surface representations, such as signed distance functions, but these may struggle to model semi-opaque and thin structures. We propose a method, HybridNeRF, that leverages the strengths of both representations by rendering most objects as surfaces while modeling the (typically) small fraction of challenging regions volumetrically. We evaluate HybridNeRF on the challenging Eyeful Tower dataset [38] along with other commonly used view synthesis datasets. Compared to state-of-the-art baselines, including recent rasterization-based approaches, we improve error rates by 15-30% while achieving real-time framerates (at least 36 FPS) at virtual-reality resolutions (2K x 2K). Project page: https://***/hybrid-nerf/.
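A toy sketch of the surface-versus-volume trade-off: with a VolSDF-style Laplace-CDF mapping from signed distance to density, a small per-point beta yields a near-step opacity profile (surface-like, so few samples suffice), while a larger beta gives the softer falloff useful for semi-opaque or thin structures. The spatial adaptation scheme and sampling strategy of the actual method are not reproduced; names and defaults are assumptions.

```python
import torch

def sdf_to_density(sdf, beta):
    """VolSDF-style Laplace-CDF mapping: density saturates inside the surface and decays
    outside; beta controls how sharp (surface-like) or soft (volumetric) the transition is."""
    alpha = 1.0 / beta
    return alpha * torch.where(sdf >= 0,
                               0.5 * torch.exp(-sdf / beta),
                               1.0 - 0.5 * torch.exp(sdf / beta))

def render_ray(sdf_vals, beta_vals, rgb_vals, deltas):
    """sdf_vals, beta_vals, deltas: (S,) per-sample values along one ray; rgb_vals: (S, 3)."""
    sigma = sdf_to_density(sdf_vals, beta_vals)
    alpha = 1.0 - torch.exp(-sigma * deltas)                          # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                           # standard volume rendering
    return (weights[:, None] * rgb_vals).sum(dim=0)                   # composited color
```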
ISBN (print): 9798350353013; 9798350353006
Binary neural networks utilize 1-bit quantized weights and activations to reduce both the model's storage demands and computational burden. However, advanced binary architectures still incorporate millions of inefficient and non-hardware-friendly full-precision multiplication operations. A&B BNN is proposed to directly remove part of the multiplication operations in a traditional BNN and replace the rest with an equal number of bit operations, introducing the mask layer and the quantized RPReLU structure based on the normalizer-free network architecture. The mask layer can be removed during inference by leveraging the intrinsic characteristics of BNNs with straightforward mathematical transformations, avoiding the associated multiplication operations. The quantized RPReLU structure enables more efficient bit operations by constraining its slope to integer powers of 2. Experiments achieve 92.30%, 69.35%, and 66.89% accuracy on the CIFAR-10, CIFAR-100, and ImageNet datasets, respectively, which is competitive with the state of the art. Ablation studies verify the efficacy of the quantized RPReLU structure, which yields a 1.14% improvement on ImageNet compared to using a fixed-slope RLeakyReLU. The proposed add&bit-operation-only BNN offers an innovative approach to hardware-friendly network architecture design.
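A sketch of what a quantized RPReLU with power-of-2 slopes could look like: the negative-branch slope is snapped to the nearest integer power of two, so the remaining multiply can be realized as a bit shift in hardware. Parameter shapes and the rounding scheme are assumptions; training through the rounding would additionally need a straight-through estimator.

```python
import torch
import torch.nn as nn

class QuantizedRPReLU(nn.Module):
    """RPReLU-like activation whose negative-branch slope is constrained to an
    integer power of two, so the multiply becomes a bit shift on hardware."""
    def __init__(self, channels):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(channels))              # input shift
        self.zeta = nn.Parameter(torch.zeros(channels))               # output shift
        self.log_slope = nn.Parameter(torch.full((channels,), -1.0))  # slope = 2**round(log_slope)

    def forward(self, x):                                             # x: (B, C, H, W)
        g = self.gamma.view(1, -1, 1, 1)
        z = self.zeta.view(1, -1, 1, 1)
        k = torch.round(self.log_slope).view(1, -1, 1, 1)             # integer exponent
        slope = torch.pow(2.0, k)                                     # power-of-2 -> shift at inference
        y = x - g
        # (a straight-through estimator would be needed to learn through the rounding)
        return torch.where(y > 0, y, slope * y) + z
```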
ISBN (print): 9798350353006
3D instance segmentation is fundamental to geometric understanding of the world around us. Existing methods for instance segmentation of 3D scenes rely on supervision from expensive, manual 3D annotations. We propose UnScene3D, the first fully unsupervised 3D learning approach for class-agnostic 3D instance segmentation of indoor scans. UnScene3D first generates pseudo masks by leveraging self-supervised color and geometry features to find potential object regions. We operate on the basis of a geometric oversegmentation, enabling efficient representation and learning on high-resolution 3D data. The coarse proposals are then refined through self-training our model on its own predictions. Our approach improves over clustering-based alternatives for unsupervised 3D instance segmentation by more than 300% in Average Precision, demonstrating effective instance segmentation even in challenging, cluttered 3D scenes.
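A simplified sketch of the pseudo-mask generation step: average self-supervised color/geometry features within each oversegment and greedily merge segments with similar features into pseudo instances. Spatial adjacency, the actual grouping criterion, and the subsequent self-training rounds are omitted; the similarity threshold is an assumed value.

```python
import numpy as np

def pseudo_masks(point_feats, segment_ids, sim_thresh=0.85):
    """point_feats: (N, D) per-point self-supervised features; segment_ids: (N,) oversegment labels."""
    segs = np.unique(segment_ids)
    seg_feat = np.stack([point_feats[segment_ids == s].mean(axis=0) for s in segs])
    seg_feat /= np.linalg.norm(seg_feat, axis=1, keepdims=True) + 1e-8

    parent = {s: s for s in segs}                      # union-find over segments
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    sims = seg_feat @ seg_feat.T
    for i in range(len(segs)):
        for j in range(i + 1, len(segs)):
            if sims[i, j] >= sim_thresh:               # similar segments -> same pseudo object
                parent[find(segs[i])] = find(segs[j])

    return np.array([find(s) for s in segment_ids])    # pseudo instance id per point
```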