Search results - Inner Mongolia University Library


Refine results


Document type

Conference papers (23,142)
Journal articles (91)
Books (15)

Holdings scope

Electronic resources (23,247)
Print holdings (1)

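Subject classification

Engineering (13,637)
- Computer Science and Technology... (11,168)
- Software Engineering (3,342)
- Mechanical Engineering (2,414)
- Optical Engineering (1,663)
- Electrical Engineering (1,205)
- Control Science and Engineering (974)
- Information and Communication Engineering (739)
- Instrument Science and Technology (381)
- Bioengineering (322)
- Biomedical Engineering (degrees may be conferred in... (239)
- Electronic Science and Technology (degrees may... (189)
- Chemical Engineering and Technology (109)
- Safety Science and Engineering (106)
- Surveying and Mapping Science and Technology (99)
- Architecture (85)
- Transportation Engineering (85)
- Civil Engineering (82)
- Mechanics (degrees may be conferred in engineering, science... (56)
Medicine (3,696)
- Clinical Medicine (3,684)
- Basic Medicine (degrees may be conferred in medicine... (76)
Science (3,140)
- Physics (1,882)
- Mathematics (1,605)
- Statistics (degrees may be conferred in science, ... (547)
- Biology (466)
- Systems Science (243)
- Chemistry (107)
Management (492)
- Library, Information and Archives Manage... (290)
- Management Science and Engineering (degrees may... (213)
- Business Administration (74)
Arts (252)
- Design (degrees may be conferred in arts... (251)
Law (58)
Agriculture (38)
Education (25)
Economics (19)
Military Science (10)
Literature (3)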

Topics

computer vision (10,395)
pattern recognit... (3,893)
training (3,101)
computational mo... (2,104)
visualization (1,898)
cameras (1,799)
feature extracti... (1,487)
three-dimensiona... (1,475)
shape (1,464)
image segmentati... (1,447)
robustness (1,287)
computer archite... (1,235)
semantics (1,213)
benchmark testin... (1,112)
conferences (1,111)
layout (1,104)
object detection (1,092)
computer science (1,084)
codes (1,026)
face recognition (907)

Institutions

univ sci & techn... (137)
univ chinese aca... (124)
chinese univ hon... (121)
tsinghua univers... (108)
carnegie mellon ... (108)
microsoft resear... (105)
zhejiang univ pe... (97)
swiss fed inst t... (91)
university of sc... (85)
zhejiang univers... (84)
shanghai ai lab ... (81)
university of ch... (79)
shanghai jiao to... (75)
microsoft res as... (69)
alibaba grp peop... (68)
adobe research (66)
national laborat... (65)
peking univ peop... (64)
univ oxford oxfo... (61)
peng cheng labor... (59)

Authors

van gool luc (80)
timofte radu (71)
zhang lei (65)
luc van gool (43)
yang yi (40)
loy chen change (37)
li stan z. (34)
liu yang (33)
xiaoou tang (33)
murino vittorio (33)
chen chen (33)
qi tian (33)
li fei-fei (33)
tian qi (32)
sun jian (32)
ying shan (30)
pascal fua (30)
darrell trevor (29)
li xin (28)
hanqing lu (28)

Language

English (23,073)
Other (148)
Chinese (20)
Turkish (5)
Japanese (2)

Search query: Any field = "IEEE/CVF Conference on Computer Vision and Pattern Recognition"

23,248 records in total; showing records 331-340


Sorted by: relevance

OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhao, Ganlong; Li, Guanbin; Chen, Weikai; Yu, Yizhou
Affiliations: Univ Hong Kong, Hong Kong, Peoples R China; Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China; GuangDong Prov Key Lab Informat Secur Technol, Guangzhou, Guangdong, Peoples R China; Tencent Games Digital Content Technol Ctr, Shenzhen, Guangdong, Peoples R China

ISBN (print): 9798350353006

Recent advances in Iterative Vision-and-Language Navigation (IVLN) introduce a more meaningful and practical paradigm of VLN by maintaining the agent's memory across tours of scenes. Although long-term memory aligns better with the persistent nature of the VLN task, it poses additional challenges in how to utilize the highly unstructured navigation memory with extremely sparse supervision. To this end, we propose OVER-NAV, which aims to go over and beyond the current state of the art in IVLN. In particular, we propose to incorporate LLMs and open-vocabulary detectors to distill key information and establish correspondence between multi-modal signals. Such a mechanism introduces reliable cross-modal supervision and enables on-the-fly generalization to unseen scenes without the need for extra annotation or re-training. To fully exploit the interpreted navigation data, we further introduce a structured representation, dubbed Omnigraph, to effectively integrate multi-modal information along the tour. Together with a novel omnigraph fusion mechanism, OVER-NAV is able to extract the most relevant knowledge from the omnigraph for more accurate navigation actions. In addition, OVER-NAV seamlessly supports both discrete and continuous environments under a unified framework. We demonstrate the superiority of OVER-NAV in extensive experiments.

Keywords: Multi-Modal Learning; Open-Vocabulary; Vision-and-Language Navigation


Referring Expression Counting


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Dai, Siyang; Liu, Jun; Cheung, Ngai-Man
Affiliation: Singapore Univ Technol & Design, Singapore, Singapore

ISBN (print): 9798350353006

Existing counting tasks are limited to the class level and do not account for fine-grained details within a class. Real applications often require in-context or referring human input to count target objects. In urban analysis, for example, fine-grained information such as traffic flow in different directions, or pedestrians and vehicles waiting or moving on different sides of a junction, is more useful. Current class-specific and class-agnostic counting settings both treat objects of the same class indiscriminately, which limits them in real use cases. To this end, we propose a new task named Referring Expression Counting (REC), which aims to count objects with different attributes within the same class. To evaluate the REC task, we create a novel dataset named REC-8K, which contains 8011 images and 17122 referring expressions. Experiments on REC-8K show that our proposed method achieves state-of-the-art performance compared with several text-based counting methods and an open-set object detection model. We also outperform prior models on the class-agnostic counting (CAC) benchmark [36] in the zero-shot setting, and perform on par with few-shot methods. Code and dataset are available at https://***/sydai/referring-expression-counting.

Keywords: counting; object detection; referring expression; vision-language model; zero-shot


RoMa: Robust Dense Feature Matching


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Edstedt, Johan; Sun, Qiyu; Bokman, Georg; Wadenback, Marten; Felsberg, Michael
Affiliations: Linkoping Univ, Linkoping, Sweden; East China Univ Sci & Technol, Shanghai, Peoples R China; Chalmers Univ Technol, Gothenburg, Sweden

ISBN (print): 9798350353006

Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene, and dense methods estimate all such correspondences. The aim is to learn a robust model, i.e., a model able to match under challenging real-world changes. In this work, we propose such a model, leveraging frozen pretrained features from the foundation model DINOv2. Although these features are significantly more robust than local features trained from scratch, they are inherently coarse. We therefore combine them with specialized ConvNet fine features, creating a precisely localizable feature pyramid. To further improve robustness, we propose a tailored transformer match decoder that predicts anchor probabilities, which enables it to express multimodality. Finally, we propose an improved loss formulation through regression-by-classification with subsequent robust regression. We conduct a comprehensive set of experiments that show that our method, RoMa, achieves significant gains, setting a new state-of-the-art. In particular, we achieve a 36% improvement on the extremely challenging WxBS benchmark. Code is provided at ***/Parskatt/RoMa.

Keywords: 3D vision; dense feature matching; dense matching; feature matching; geometry estimation; image matching; two-view geometry
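The idea of "regression-by-classification" named in this abstract can be illustrated in a generic, decoding-side form: a model scores a fixed set of anchor coordinates, the most probable anchor is chosen, and the estimate is refined by averaging only the anchors near that mode. This keeps the estimate on one mode when the distribution is multimodal, whereas a plain expectation over all anchors can land between modes. The Python sketch below is a one-dimensional illustration with hypothetical names (`classify_then_refine`, `window`); it is not RoMa's match decoder or its loss, which work on 2D image coordinates and are specified in the paper and the linked code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_then_refine(anchors, logits, window=1):
    """Hypothetical 1D regression-by-classification decoder.

    anchors: sorted anchor coordinates; logits: one score per anchor.
    Step 1 (classification): pick the most probable anchor.
    Step 2 (local refinement): probability-weighted mean of the anchors within
    `window` positions of that mode, so a distant second mode is ignored.
    """
    probs = softmax(logits)
    mode = max(range(len(probs)), key=probs.__getitem__)
    lo, hi = max(0, mode - window), min(len(anchors), mode + window + 1)
    weights = probs[lo:hi]
    return sum(p * a for p, a in zip(weights, anchors[lo:hi])) / sum(weights)

# Two competing modes, at 1.0 (slightly stronger) and at 4.0.
anchors = [0.0, 1.0, 2.0, 3.0, 4.0]
logits = [0.0, 4.0, 0.0, 0.0, 3.9]
global_mean = sum(p * a for p, a in zip(softmax(logits), anchors))  # ≈ 2.40, between the modes
local = classify_then_refine(anchors, logits)                       # ≈ 1.0, stays on one mode
```

Restricting the average to a window around the mode is only one simple choice; the window size trades sub-anchor precision against sensitivity to nearby competing modes.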


GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zheng, Shunyuan; Zhou, Boyao; Shao, Ruizhi; Liu, Boning; Zhang, Shengping; Nie, Liqiang; Liu, Yebin
Affiliations: Harbin Inst Technol, Harbin, Peoples R China; Tsinghua Univ, Beijing, Peoples R China; Peng Cheng Lab, Shenzhen, Peoples R China

ISBN (print): 9798350353006

We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in real time. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods that necessitate per-subject optimization, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module that lifts 2D parameter maps to 3D space. The proposed framework is fully differentiable, and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceptionally high rendering speed. The code is available at https://***/aipixel/GPS-Gaussian.

Keywords: Rendering (computer graphics)
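The abstract states that a depth estimation module lifts 2D parameter maps to 3D space. Assuming a standard pinhole camera with focal lengths fx, fy and principal point (cx, cy) in pixels (the paper's exact camera conventions are not given in this record), a pixel (u, v) with depth z unprojects to x = (u - cx) z / fx, y = (v - cy) z / fy. The Python sketch below applies this per pixel; the function name and depth-map layout are hypothetical, and it does not reproduce GPS-Gaussian's regression network.

```python
def unproject(depth, fx, fy, cx, cy):
    """Lift a per-pixel depth map to camera-space 3D points (pinhole model).

    depth: list of rows; depth[v][u] is the depth along the optical axis (z).
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    Returns (x, y, z) tuples in row-major pixel order. In a pixel-aligned
    scheme, each point could carry the parameters predicted at its pixel.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Invert the projection u = fx * x / z + cx, v = fy * y / z + cy.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# A 2x2 depth map with unit focal length and the principal point at the origin.
points = unproject([[2.0, 2.0], [2.0, 4.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
# points == [(0.0, 0.0, 2.0), (2.0, 0.0, 2.0), (0.0, 2.0, 2.0), (4.0, 4.0, 4.0)]
```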


Investigating Compositional Challenges in Vision-Language Models for Visual Grounding


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zeng, Yunan; Huang, Yan; Zhang, Jinjin; Jie, Zequn; Chai, Zhenhua; Wang, Liang
Affiliations: Ctr Res Intelligent Percept & Comp CRIPAC, Beijing, Peoples R China; Chinese Acad Sci CASIA, Inst Automat, Beijing, Peoples R China; Meituan, Beijing, Peoples R China

ISBN (print): 9798350353006

Pre-trained vision-language models (VLMs) have achieved high performance on various downstream tasks and have been widely used for visual grounding in a weakly supervised manner. However, despite the performance gains contributed by large vision and language pre-training, we find that state-of-the-art VLMs struggle with compositional reasoning on grounding tasks. To demonstrate this, we propose the Attribute, Relation, and Priority grounding (ARPGrounding) benchmark to test VLMs' compositional reasoning ability on visual grounding tasks. ARPGrounding contains 11,425 samples and evaluates the compositional understanding of VLMs in three dimensions: 1) attribute, denoting comprehension of objects' properties; 2) relation, indicating an understanding of relations between objects; 3) priority, reflecting an awareness of the part of speech associated with nouns. Using the ARPGrounding benchmark, we evaluate several mainstream VLMs. We empirically find that these models perform quite well on conventional visual grounding datasets, achieving performance comparable to or surpassing state-of-the-art methods, but show strong deficiencies in compositional reasoning. Furthermore, we propose a composition-aware fine-tuning pipeline, demonstrating the potential to leverage cost-effective image-text annotations for enhancing the compositional understanding of VLMs in grounding tasks. Code is available at link.

Keywords:


Random Entangled Tokens for Adversarially Robust Vision Transformer


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gong, Huihui; Dong, Mingjing; Mao, Siqi; Camtepe, Seyit; Nepal, Surya; Xu, Chang
Affiliations: Univ Sydney, Sydney, NSW, Australia; CSIRO Data61, Eveleigh, Australia; City Univ Hong Kong, Hong Kong, Peoples R China; Univ New South Wales, Sydney, NSW, Australia

ISBN (print): 9798350353006

Vision Transformers (ViTs) have emerged as a compelling alternative to Convolutional Neural Networks (CNNs) in computer vision, showcasing tremendous potential. However, recent research has revealed that ViTs, like their CNN counterparts, are susceptible to adversarial attacks. Adversarial training and randomization are two representative, effective defenses for CNNs. Some researchers have applied adversarial training to ViTs and achieved robustness comparable to that of CNNs, but randomization is not easy to apply directly to ViTs because of the architectural differences between CNNs and ViTs. In this paper, we delve into the structural intricacies of ViTs and propose a novel defense mechanism termed Random entangled image Transformer (ReiT), which seamlessly integrates adversarial training and randomization to bolster the adversarial robustness of ViTs. Recognizing the challenge posed by the structural disparities between ViTs and CNNs, we introduce a novel module, input-independent random entangled self-attention (II-ReSA). This module optimizes random entangled tokens that lead to "dissimilar" self-attention outputs by leveraging model parameters and the sampled random tokens, thereby synthesizing the self-attention module outputs and random entangled tokens to diminish adversarial similarity. ReiT incorporates two distinct random entangled tokens and employs dual randomization, offering an effective countermeasure against adversarial examples while ensuring comprehensive deduction guarantees. Through extensive experiments on various ViT variants and benchmarks, we substantiate the superiority of our proposed method in enhancing the adversarial robustness of Vision Transformers.

Keywords: Adversarial Robustness; Randomized Defence; Self-Attention Mechanism; Vision Transformers


Zero-Reference Low-Light Enhancement via Physical Quadruple Priors


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Wenjing; Yang, Huan; Fu, Jianlong; Liu, Jiaying
Affiliations: Peking Univ, Beijing, Peoples R China; 01 AI, Beijing, Peoples R China; Microsoft Res Asia, Beijing, Peoples R China

ISBN (print): 9798350353006

Understanding illumination and reducing the need for supervision pose a significant challenge in low-light enhancement. Current approaches are highly sensitive to the data used during training and to illumination-specific hyper-parameters, limiting their ability to handle unseen scenarios. In this paper, we propose a new zero-reference low-light enhancement framework trainable solely with normal-light images. To accomplish this, we devise an illumination-invariant prior inspired by the theory of physical light transfer. This prior serves as the bridge between normal and low-light images. Then, we develop a prior-to-image framework trained without low-light data. During testing, this framework is able to restore our illumination-invariant prior back to images, automatically achieving low-light enhancement. Within this framework, we leverage a pretrained generative diffusion model for model capability, introduce a bypass decoder to handle detail distortion, and offer a lightweight version for practicality. Extensive experiments demonstrate our framework's superiority in various scenarios, as well as good interpretability, robustness, and efficiency. Code is available on our project homepage.

Keywords: diffusion; image processing; low-level vision; Low-light enhancement; zero-reference


SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Haoxiang; Vasu, Pavan Kumar Anasosalu; Faghri, Fartash; Vemulapalli, Raviteja; Farajtabar, Mehrdad; Mehta, Sachin; Rastegari, Mohammad; Tuzel, Oncel; Pouransari, Hadi
Affiliations: Apple, Cupertino, CA 95014, USA; Univ Illinois, Urbana, IL 61801, USA

ISBN (print): 9798350365474

The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge VFMs into a unified model that absorbs their expertise. Our method integrates techniques of multi-task learning, continual learning, and distillation. Further, it demands significantly less computational cost compared to traditional multi-task training from scratch, and it only needs a small fraction of the pre-training datasets that were initially used to train individual models. By applying our method to SAM and CLIP, we obtain SAM-CLIP: a unified model that combines the capabilities of SAM and CLIP into a single vision transformer. Compared with deploying SAM and CLIP independently, our merged model, SAM-CLIP, reduces storage and compute costs for inference, making it well-suited for edge device applications. We show that SAM-CLIP not only retains the foundational strengths of SAM and CLIP, but also introduces synergistic functionalities, notably in zero-shot semantic segmentation, where SAM-CLIP establishes new state-of-the-art results on 5 benchmarks. It outperforms previous models that are specifically designed for this task by a large margin, including +6.8% and +5.9% mean IoU improvement on Pascal-VOC and COCO-Stuff datasets, respectively.

Keywords: CLIP; Foundation Model; Model Merging; Segmentation
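The segmentation gains quoted in this abstract are in mean IoU (mIoU): for each class, the number of pixels where prediction and ground truth both carry that class, divided by the number of pixels where either does, averaged over classes. The Python sketch below is a minimal illustration of the metric, not the evaluation code used for SAM-CLIP. It assumes flat lists of integer labels and an optional ignore label (the name `ignore_index` is an assumption), and it skips classes absent from both inputs. Benchmarks such as Pascal-VOC typically accumulate intersections and unions over the whole evaluation set before dividing; passing the whole set flattened into one list gives that accumulated result.

```python
def mean_iou(pred, gt, num_classes, ignore_index=None):
    """Mean intersection-over-union over classes.

    pred, gt: flat sequences of per-pixel class ids; predicted ids must lie in
    range(num_classes). Ground-truth pixels equal to ignore_index are skipped.
    Classes that appear in neither pred nor gt are left out of the average.
    """
    inter = [0] * num_classes  # pixels with pred == c and gt == c
    union = [0] * num_classes  # pixels with pred == c or gt == c
    for p, g in zip(pred, gt):
        if ignore_index is not None and g == ignore_index:
            continue
        if p == g:
            inter[p] += 1
            union[p] += 1
        else:
            # A mismatched pixel belongs to the union of both classes involved.
            union[p] += 1
            union[g] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")

# Two classes, four pixels: IoU(class 0) = 1/2, IoU(class 1) = 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)  # 7/12 ≈ 0.583
```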


Event-Based Eye Tracking. AIS 2024 Challenge Survey


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Zuowen; Gao, Chang; Wu, Zongwei; Conde, Marcos V; Timofte, Radu; Liu, Shih-Chii; Chen, Qinyu; Zha, Zheng-jun; Zhai, Wei; Han, Han; Liao, Bohao; Wu, Yuliang; Wan, Zengyu; Wang, Zhong; Cao, Yang; Tan, Ganchao; Chen, Jinze; Pei, Yan Ru; Bruers, Sasskia; Crouzet, Sebastien; McLelland, Douglas; Coenen, Oliver; Zhang, Baoheng; Gao, Yizhao; Li, Jingyuan; So, Hayden Kwok-Hay; Bich, Philippe; Boretti, Chiara; Prono, Luciano; Lica, Mircea; Dinucu-Jianu, David; Griu, Catalin; Lin, Xiaopeng; Ren, Hongwei; Cheng, Bojun; Zhang, Xinan; Vial, Valentin; Yezzi, Anthony; Tsai, James
Affiliations: Univ Zurich, Inst Neuroinformat, Zurich, Switzerland; Swiss Fed Inst Technol, Zurich, Switzerland; Delft Univ Technol, Delft, Netherlands; Univ Wurzburg, Wurzburg, Germany; Leiden Univ, Leiden, Netherlands; Univ Sci & Technol China, Hefei, Anhui, Peoples R China; Brainchip Inc, Laguna Hills, CA, USA; Univ Hong Kong, Hong Kong, Peoples R China; Politecn Torino, Turin, Italy; Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Guangdong, Peoples R China; Georgia Inst Technol, Atlanta, GA 30332, USA

ISBN (print): 9798350365474

This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movements recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve a good trade-off between task accuracy and efficiency. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye tracking research.

Keywords: computer vision; dynamic vision sensor; event camera; eye tracking


Weak-to-Strong 3D Object Detection with X-Ray Distillation


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gambashidze, Alexander; Dadukin, Aleksandr; Golyadkin, Maxim; Razzhivina, Maria; Makarov, Ilya
Affiliations: Artificial Intelligence Res Inst, Barcelona, Spain; HSE Univ, Moscow, Russia; ISP RAS, Moscow, Russia

ISBN (print): 9798350353006

This paper addresses the critical challenges of sparsity and occlusion in LiDAR-based 3D object detection. Current methods often rely on supplementary modules or specific architectural designs, potentially limiting their applicability to new and evolving architectures. To our knowledge, we are the first to propose a versatile technique that seamlessly integrates into any existing framework for 3D object detection, marking the first instance of Weak-to-Strong generalization in 3D computer vision. We introduce a novel framework, X-Ray Distillation with Object-Complete Frames, suitable for both supervised and semi-supervised settings, that leverages the temporal aspect of point cloud sequences. This method extracts crucial information from both previous and subsequent LiDAR frames, creating Object-Complete frames that represent objects from multiple viewpoints, thus addressing occlusion and sparsity. Because Object-Complete frames cannot be generated during online inference, we utilize Knowledge Distillation within a Teacher-Student framework. This technique encourages the strong Student model to emulate the behavior of the weaker Teacher, which processes simple and informative Object-Complete frames, effectively offering a comprehensive view of objects as if seen through X-ray vision. Our proposed methods surpass the state of the art in semi-supervised learning by 1-1.5 mAP and enhance the performance of five established supervised models by 1-2 mAP on standard autonomous driving datasets, even with default hyperparameters. Code for Object-Complete frames is available here: https://***/sakharok13/X-Ray-TeacherPatching-Tools.

Keywords: 3D detection; autonomous driving; computer vision
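This abstract describes Object-Complete frames as gathering an object's points from previous and subsequent LiDAR frames so the object is seen from several viewpoints. A minimal sketch of that aggregation step, assuming each frame supplies the object's points together with its box center and heading (yaw), for example from annotations, is to express every frame's points in the object's own coordinate system, concatenate them, and place the result at the object's pose in a chosen frame. The helper names and the track format below are hypothetical, only rotation about the vertical axis is modeled, and selecting the points inside each box is assumed to be done already; the authors' actual tooling is in the repository linked in the abstract.

```python
import math

def to_object_frame(points, center, yaw):
    """World (x, y, z) points -> object-centric coordinates (rotate by -yaw about z)."""
    cx, cy, cz = center
    c, s = math.cos(yaw), math.sin(yaw)
    out = []
    for x, y, z in points:
        dx, dy = x - cx, y - cy
        out.append((c * dx + s * dy, -s * dx + c * dy, z - cz))
    return out

def to_world_frame(points, center, yaw):
    """Inverse of to_object_frame: rotate by +yaw about z, then translate."""
    cx, cy, cz = center
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + cx, s * x + c * y + cy, z + cz) for x, y, z in points]

def object_complete(track, current):
    """Merge one object's points from every frame of `track` into frame `current`.

    track: list of (points_in_box, center, yaw) per frame (hypothetical format),
    where points_in_box are the world-frame points inside that frame's box.
    """
    merged = []
    for pts, center, yaw in track:
        merged.extend(to_object_frame(pts, center, yaw))
    _, cur_center, cur_yaw = track[current]
    return to_world_frame(merged, cur_center, cur_yaw)

# Frame 0 sees the object's front; frame 1 (object moved and turned 90°) sees its back.
track = [
    ([(1.0, 0.0, 0.0)], (0.0, 0.0, 0.0), 0.0),
    ([(10.0, -1.0, 0.0)], (10.0, 0.0, 0.0), math.pi / 2),
]
merged = object_complete(track, current=0)  # ≈ [(1, 0, 0), (-1, 0, 0)]: front and back together
```

Working in object-centric coordinates is what makes the merge independent of how the object and the sensor moved between frames, provided the object is rigid and its boxes are accurate.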




235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
