ISBN (print): 9798350353006
Vision-Language Models (VLMs) such as CLIP are trained on large amounts of image-text pairs, resulting in remarkable generalization across several data distributions. However, in several cases, their expensive training and data collection/curation costs do not justify the end application. This motivates a vendor-client paradigm, where a vendor trains a large-scale VLM and grants only input-output access to clients on a pay-per-query basis in a black-box setting. The client aims to minimize inference cost by distilling the VLM to a student model using the limited available task-specific data, and further deploying this student model in the downstream application. While naive distillation largely improves the In-Domain (ID) accuracy of the student, it fails to transfer the superior out-of-distribution (OOD) generalization of the VLM teacher using the limited available labeled images. To mitigate this, we propose Vision-Language to Vision - Align, Distill, Predict (VL2V-ADiP), which first aligns the vision and language modalities of the teacher model with the vision modality of a pre-trained student model, and further distills the aligned VLM representations to the student. This maximally retains the pre-trained features of the student, while also incorporating the rich representations of the VLM image encoder and the superior generalization of the text embeddings. The proposed approach achieves state-of-the-art results on the standard Domain Generalization benchmarks in a black-box teacher setting as well as a white-box setting where the weights of the VLM are accessible. Project page: http://***/VL2V-ADiP/
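The align-then-distill recipe described above can be pictured with a small sketch. This is not the authors' exact objective; the projection heads, cosine losses, feature dimensions, and loss weighting are illustrative assumptions, with the black-box teacher represented by pre-extracted image and class-text embeddings.

```python
# Minimal sketch of an align-then-distill objective in the spirit of VL2V-ADiP.
# Projection heads, dimensions, and the cosine losses are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignDistill(nn.Module):
    def __init__(self, student_dim=512, vlm_img_dim=768, vlm_txt_dim=768, joint_dim=256):
        super().__init__()
        # Project VLM image/text features and student features into a joint space.
        self.proj_student = nn.Linear(student_dim, joint_dim)
        self.proj_vlm_img = nn.Linear(vlm_img_dim, joint_dim)
        self.proj_vlm_txt = nn.Linear(vlm_txt_dim, joint_dim)

    @staticmethod
    def cosine_loss(a, b):
        return (1.0 - F.cosine_similarity(a, b, dim=-1)).mean()

    def forward(self, f_student, f_vlm_img, f_vlm_txt_of_label):
        s = self.proj_student(f_student)
        v = self.proj_vlm_img(f_vlm_img)            # black-box teacher image embeddings
        t = self.proj_vlm_txt(f_vlm_txt_of_label)   # text embeddings of the ground-truth class
        # Align student features with both teacher modalities, then distill from them.
        return self.cosine_loss(s, v) + self.cosine_loss(s, t)

# Usage with random tensors standing in for pre-extracted features:
loss_fn = AlignDistill()
loss = loss_fn(torch.randn(8, 512), torch.randn(8, 768), torch.randn(8, 768))
loss.backward()
```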
ISBN (print): 9798350353013; 9798350353006
In real-world scenarios, image recognition tasks, such as semantic segmentation and object detection, often pose greater challenges due to the lack of information available within low-resolution (LR) content. Image super-resolution (SR) is one promising solution for addressing these challenges. However, due to the ill-posed nature of SR, it is challenging for typical SR methods to restore task-relevant high-frequency content, which may dilute the advantage of utilizing SR. Therefore, in this paper, we propose Super-Resolution for Image Recognition (SR4IR), which effectively guides the generation of SR images beneficial to achieving satisfactory image recognition performance when processing LR images. The critical component of our SR4IR is the task-driven perceptual (TDP) loss, which enables the SR network to acquire task-specific knowledge from a network tailored for a specific task. Moreover, we propose a cross-quality patch mix and an alternate training framework that significantly enhance the efficacy of the TDP loss by addressing potential problems when employing it. Through extensive experiments, we demonstrate that our SR4IR achieves outstanding task performance by generating SR images useful for a specific image recognition task, including semantic segmentation, object detection, and image classification. The implementation code is available at https://***/JaehaKim97/SR4IR.
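The task-driven perceptual (TDP) loss amounts to comparing the SR output and the HR target in the feature space of a frozen task network. The sketch below illustrates that idea under simplifying assumptions: the toy backbone, the choice of feature layer, and the L1 distance stand in for whatever the paper actually uses.

```python
# Minimal sketch of a task-driven perceptual (TDP) loss: SR output and HR target are
# compared in the feature space of a frozen recognition backbone.
import torch
import torch.nn as nn

def tdp_loss(sr_image, hr_image, task_backbone):
    """Feature-space distance measured by a frozen task network."""
    task_backbone.eval()
    with torch.no_grad():
        target_feat = task_backbone(hr_image)   # task features of the clean HR image
    sr_feat = task_backbone(sr_image)           # gradients flow into the SR network only
    return nn.functional.l1_loss(sr_feat, target_feat)

# Example with a toy conv backbone standing in for a segmentation/detection encoder:
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, padding=1))
for p in backbone.parameters():
    p.requires_grad_(False)
sr_out = torch.randn(2, 3, 64, 64, requires_grad=True)   # stand-in for SR network output
hr_gt = torch.randn(2, 3, 64, 64)
tdp_loss(sr_out, hr_gt, backbone).backward()
```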
ISBN (print): 9798350353006
The zero-shot performance of existing vision-language models (VLMs) such as CLIP [29] is limited by the availability of large-scale, aligned image and text datasets in specific domains. In this work, we leverage two complementary sources of information, descriptions of categories generated by large language models (LLMs) and abundant, fine-grained image classification datasets, to improve the zero-shot classification performance of VLMs across fine-grained domains. On the technical side, we develop methods to train VLMs with this "bag-level" image-text supervision. We find that simply using these attributes at test time does not improve performance, but our training strategy, for example on the iNaturalist [41] dataset, leads to an average improvement of 4-5% in zero-shot classification accuracy for novel categories of birds [42] and flowers [23]. Similar improvements are observed in domains where a subset of the categories was used to fine-tune the model. By prompting LLMs in various ways, we generate descriptions that capture visual appearance, habitat, and geographic regions, and pair them with existing attributes such as the taxonomic structure of the categories. We systematically evaluate their ability to improve zero-shot categorization in natural domains. Our findings suggest that geographic priors can be just as effective as, and are complementary to, visual appearance. Our method also outperforms prior work on prompt-based tuning of VLMs. We release the benchmark, consisting of 14 datasets, at https://***/cvl-umass/AdaptCLIPZS, which will contribute to future research in zero-shot recognition.
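As a rough illustration of classifying with LLM-generated descriptions, the sketch below averages several description embeddings per class into a zero-shot classifier using off-the-shelf CLIP via the open_clip package. The bird descriptions are invented examples, and the paper's actual contribution, fine-tuning with this bag-level supervision, is not shown here.

```python
# Zero-shot classification with per-class "bags" of natural-language descriptions.
# The descriptions below are invented illustrations, not from the released benchmark.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

class_descriptions = {
    "painted bunting": ["a small bird with a blue head and red underparts",
                        "a songbird found in the southern United States"],
    "indigo bunting": ["a small, uniformly deep-blue songbird",
                       "a bird of brushy fields in eastern North America"],
}

with torch.no_grad():
    class_embs = []
    for descs in class_descriptions.values():
        emb = model.encode_text(tokenizer(descs))
        emb = emb / emb.norm(dim=-1, keepdim=True)
        class_embs.append(emb.mean(dim=0))       # average the description "bag" per class
    W = torch.stack(class_embs)                  # (num_classes, dim)

    img = preprocess(Image.new("RGB", (224, 224))).unsqueeze(0)  # blank stand-in for a photo
    img_emb = model.encode_image(img)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ W.T).softmax(dim=-1)
    print(dict(zip(class_descriptions, probs[0].tolist())))
```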
ISBN (print): 9798350301298
We show how shadows can be efficiently generated in differentiable rendering of triangle meshes. Our central observation is that pre-filtered shadow mapping, a technique for approximating shadows based on rendering from the perspective of a light, can be combined with existing differentiable rasterizers to yield differentiable visibility information. We demonstrate on several inverse graphics problems that differentiable shadow maps are orders of magnitude faster than differentiable light transport simulation with similar accuracy, while differentiable rasterization without shadows often fails to converge.
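Pre-filtered shadow mapping replaces the hard depth comparison of classical shadow maps with a smooth test over filtered depth moments, which is what makes the visibility term differentiable. The sketch below shows a variance-shadow-map style test in PyTorch under the assumption that the light-view depth map comes from a differentiable rasterizer; here it is a placeholder tensor.

```python
# Pre-filtered (variance) shadow mapping as a smooth, differentiable visibility test.
# The light-view depth map is a placeholder; in practice it comes from a rasterizer.
import torch
import torch.nn.functional as F

def variance_shadow(depth_map, query_depth, kernel=5):
    """Chebyshev upper bound on visibility from a filtered (depth, depth^2) map."""
    d = depth_map.unsqueeze(0).unsqueeze(0)                      # (1,1,H,W)
    moments = torch.cat([d, d * d], dim=1)                       # store depth and depth^2
    moments = F.avg_pool2d(moments, kernel, stride=1, padding=kernel // 2)  # pre-filtering
    mu, m2 = moments[:, 0], moments[:, 1]
    var = (m2 - mu * mu).clamp(min=1e-6)
    p_max = var / (var + (query_depth - mu).clamp(min=0.0) ** 2)
    return p_max.clamp(0.0, 1.0)                                 # ~1 = lit, ~0 = shadowed

depth_from_light = torch.rand(64, 64, requires_grad=True)        # light-view depth (placeholder)
receiver_depth = torch.rand(64, 64) + 0.5                        # shaded points in light space
visibility = variance_shadow(depth_from_light, receiver_depth)
visibility.mean().backward()                                     # gradients reach the depth map
```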
ISBN (print): 9798350353013; 9798350353006
Vision-language foundation models have shown remarkable performance in various zero-shot settings such as image retrieval, classification, or captioning. But so far, those models seem to fall behind when it comes to zero-shot localization of referential expressions and objects in images. As a result, they need to be fine-tuned for this task. In this paper, we show that pretrained vision-language (VL) models allow for zero-shot open-vocabulary object localization without any fine-tuning. To leverage those capabilities, we propose a Grounding Everything Module (GEM) that generalizes the idea of value-value attention introduced by CLIPSurgery [17] to a self-self attention path. We show that the concept of self-self attention corresponds to clustering, thus enforcing groups of tokens arising from the same object to be similar while preserving the alignment with the language space. To further guide the group formation, we propose a set of regularizations that allows the model to generalize across datasets and backbones. We evaluate the proposed GEM framework on various benchmark tasks and datasets for semantic segmentation. GEM not only outperforms other training-free open-vocabulary localization methods, but also achieves state-of-the-art results on the recently proposed OpenImagesV7 large-scale segmentation benchmark.
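The self-self attention idea can be illustrated with a short sketch: queries, keys, and values each attend to themselves, which tends to pull tokens of the same object together. Head handling, the exact combination of paths, and the temperature below are simplified assumptions rather than GEM's precise formulation.

```python
# Simplified self-self attention: q-q, k-k, and v-v similarity maps aggregate the values,
# acting like a soft clustering of tokens. Multi-head details are omitted for brevity.
import torch
import torch.nn.functional as F

def self_self_attention(x, w_q, w_k, w_v, tau=0.07):
    """x: (tokens, dim); w_*: (dim, dim) projection matrices taken from a ViT block."""
    out = 0.0
    for w in (w_q, w_k, w_v):
        p = F.normalize(x @ w, dim=-1)                  # project, then L2-normalize
        attn = F.softmax(p @ p.t() / tau, dim=-1)       # token-to-token similarity (self-self)
        out = out + attn @ (x @ w_v)                    # aggregate values with each self map
    return out / 3.0

tokens = torch.randn(197, 768)                          # ViT patch + CLS tokens (placeholder)
w_q, w_k, w_v = (torch.randn(768, 768) * 0.02 for _ in range(3))
refined = self_self_attention(tokens, w_q, w_k, w_v)
print(refined.shape)                                    # torch.Size([197, 768])
```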
ISBN (print): 9798350353013; 9798350353006
Existing NeRF-based inverse rendering methods assume that scenes are exclusively illuminated by distant light sources, neglecting the potential influence of emissive sources within a scene. In this work, we confront this limitation using LDR multi-view images captured with emissive sources turned on and off. Two key issues must be addressed: 1) ambiguity arising from the limited dynamic range along with unknown lighting details, and 2) the high computational cost, in volume rendering, of backtracing the paths leading to final object colors. We present a novel approach, ESR-NeRF, leveraging neural networks as learnable functions to represent ray-traced fields. By training networks to satisfy light transport segments, we regulate outgoing radiances, progressively identifying emissive sources while being aware of reflection areas. The results on scenes encompassing emissive sources with various properties demonstrate the superiority of ESR-NeRF both qualitatively and quantitatively. Our approach also extends its applicability to scenes devoid of emissive sources, achieving lower CD metrics on the DTU dataset.
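A heavily simplified reading of "training networks to satisfy light transport segments" is a consistency term between the radiance a field predicts at a surface point and its emission plus light gathered by querying the same field along secondary rays. The sketch below encodes that reading only; the network, sampling, and BRDF handling are placeholders, not ESR-NeRF's actual construction.

```python
# Toy consistency term between predicted outgoing radiance and a one-bounce estimate
# gathered from the same field. All geometry inputs are placeholders.
import torch
import torch.nn as nn

class RadianceField(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: 3D point + direction; output: outgoing RGB and emission RGB.
        self.net = nn.Sequential(nn.Linear(6, 128), nn.ReLU(), nn.Linear(128, 6))

    def forward(self, x, d):
        out = self.net(torch.cat([x, d], dim=-1))
        return out[..., :3].sigmoid(), out[..., 3:].sigmoid()

def segment_consistency_loss(field, x, d_out, hit_pts, dirs_back, albedo=0.5):
    """x, d_out: (N,3); hit_pts, dirs_back: (N,S,3) secondary-ray hits and return directions."""
    l_out, emission = field(x, d_out)
    l_in, _ = field(hit_pts.reshape(-1, 3), dirs_back.reshape(-1, 3))
    l_in = l_in.reshape(x.shape[0], -1, 3).mean(dim=1)     # Monte Carlo gather of incoming light
    target = emission + albedo * l_in                      # one-bounce transport estimate
    return nn.functional.mse_loss(l_out, target.detach())

field = RadianceField()
pts, dirs = torch.randn(16, 3), torch.randn(16, 3)
hits, back = torch.randn(16, 8, 3), torch.randn(16, 8, 3)  # placeholder secondary-ray geometry
segment_consistency_loss(field, pts, dirs, hits, back).backward()
```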
ISBN (print): 9798350353006
Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications. A key component that drives the impressive performance for zero-shot transfer and high versatility is a super large Transformer model trained on the extensive high-quality SA-1B dataset. While beneficial, the huge computation cost of the SAM model has limited its use in wider real-world applications. To address this limitation, we propose EfficientSAMs, lightweight SAM models that exhibit decent performance with largely reduced complexity. Our idea is based on leveraging masked image pretraining, SAMI, which learns to reconstruct features from the SAM image encoder for effective visual representation learning. Further, we take SAMI-pretrained lightweight image encoders and a mask decoder to build EfficientSAMs, and finetune the models on SA-1B for the segment anything task. We perform evaluations on multiple vision tasks including image classification, object detection, instance segmentation, and semantic segmentation, and find that our proposed pretraining method, SAMI, consistently outperforms other masked image pretraining methods. On segment anything tasks such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably with a significant gain (e.g., ~4 AP on COCO/LVIS) over other fast SAM models. Our EfficientSAM code and models are available here.
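SAMI-style pretraining can be summarized as: mask an image, encode it with a lightweight encoder, and regress the features that the frozen SAM image encoder produces for the full image. The sketch below uses toy convolutional encoders and a simple masking scheme as stand-ins; the real method operates on transformer token features.

```python
# Toy SAMI-style step: a lightweight encoder sees a masked image and is trained to
# reconstruct the frozen "SAM" encoder's features. Encoders and masking are placeholders.
import torch
import torch.nn as nn

patch, dim, mask_ratio = 16, 256, 0.75
sam_encoder = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)      # stands in for SAM ViT
light_encoder = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)    # lightweight student
for p in sam_encoder.parameters():
    p.requires_grad_(False)

def sami_step(images):
    with torch.no_grad():
        target = sam_encoder(images).flatten(2).transpose(1, 2)        # (B, N, dim) features
    B, _, H, W = images.shape
    n_patches = (H // patch) * (W // patch)
    keep = torch.rand(B, 1, n_patches) > mask_ratio                    # random patch mask
    mask_img = keep.reshape(B, 1, H // patch, W // patch).float()
    mask_img = mask_img.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    pred = light_encoder(images * mask_img).flatten(2).transpose(1, 2)
    return nn.functional.mse_loss(pred, target)                        # reconstruct SAM features

loss = sami_step(torch.randn(2, 3, 224, 224))
loss.backward()
```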
ISBN (print): 9798350353006
Traditional referring expression comprehension (REC) aims to locate the target referent in an image guided by a text query. Several previous methods have studied the counterfactual problem in REC (C-REC), where the objects for a given query cannot be found in the image. However, these methods focus on the overall image-text or specific attribute mismatch only. In this paper, we address the C-REC problem from a deep perspective of fine-grained attributes. To this end, we first propose a fine-grained counterfactual sample generation method to construct C-REC datasets. Specifically, we leverage pre-trained language models such as BERT to modify the attribute words in the queries, obtaining the corresponding counterfactual samples. Furthermore, we propose a C-REC framework. We first adopt three encoders to extract image, text and attribute features. Then, our dual-branch attentive fusion module fuses these cross-modal features in two branches via an attention mechanism. Finally, two prediction heads generate a bounding box and a counterfactual label, respectively. In addition, we incorporate contrastive learning with the generated counterfactual samples as negatives to enhance counterfactual perception. Extensive experiments show that our framework achieves promising performance on both the public REC datasets RefCOCO/+/g and our constructed C-REC datasets C-RefCOCO/+/g. The code and data are available at https://***/Glacier0012/CREC.
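The counterfactual sample generation step, swapping an attribute word via a masked language model, can be sketched directly with Hugging Face's fill-mask pipeline. The query, the chosen attribute word "red", and the filtering rule are invented illustrations, not the paper's exact construction pipeline.

```python
# Counterfactual query generation by masking an attribute word and taking a BERT
# prediction that differs from the original. Query and attribute are invented examples.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

query = "the red cup on the left of the table"
attribute = "red"
masked = query.replace(attribute, fill.tokenizer.mask_token, 1)

counterfactual = query  # fallback if every top prediction equals the original attribute
for cand in fill(masked):
    word = cand["token_str"].strip()
    if word.lower() != attribute:
        counterfactual = masked.replace(fill.tokenizer.mask_token, word)
        break
print(counterfactual)   # e.g., "the blue cup on the left of the table"
```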
ISBN (print): 9798350353006
In the realm of point cloud scene understanding, particularly in indoor scenes, objects are arranged following human habits, resulting in objects of certain semantics being closely positioned and displaying notable inter-object correlations. This can create a tendency for neural networks to exploit these strong dependencies, bypassing the individual object patterns. To address this challenge, we introduce a novel self-supervised learning (SSL) strategy. Our approach leverages both object patterns and contextual cues to produce robust features. It begins with the formulation of an object-exchanging strategy, where pairs of objects with comparable sizes are exchanged across different scenes, effectively disentangling the strong contextual dependencies. Subsequently, we introduce a context-aware feature learning strategy, which encodes object patterns without relying on their specific context by aggregating object features across various scenes. Our extensive experiments demonstrate the superiority of our method over existing SSL techniques and further show its improved robustness to environmental changes. Moreover, we showcase the applicability of our approach by transferring pre-trained models to diverse point cloud datasets.
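The object-exchanging strategy can be sketched as swapping two similarly sized, re-centered object point sets between scenes. The size criterion and the re-centering below are simplified assumptions; instance masks are taken as given.

```python
# Swap two comparably sized objects between point cloud scenes to break scene context.
# Size tolerance and re-centering are illustrative simplifications.
import numpy as np

def exchange_objects(scene_a, scene_b, ids_a, ids_b, obj_a, obj_b, size_tol=0.2):
    """scene_*: (N,3) points; ids_*: (N,) instance ids; obj_*: instance ids to swap."""
    pts_a, pts_b = scene_a[ids_a == obj_a], scene_b[ids_b == obj_b]
    size_a = pts_a.max(0) - pts_a.min(0)
    size_b = pts_b.max(0) - pts_b.min(0)
    if np.abs(size_a - size_b).max() > size_tol:        # only swap comparably sized objects
        return scene_a, scene_b
    center_a, center_b = pts_a.mean(0), pts_b.mean(0)
    new_a = np.concatenate([scene_a[ids_a != obj_a], pts_b - center_b + center_a])
    new_b = np.concatenate([scene_b[ids_b != obj_b], pts_a - center_a + center_b])
    return new_a, new_b

# Toy usage with random scenes and instance labels:
rng = np.random.default_rng(0)
sa, sb = rng.normal(size=(1000, 3)), rng.normal(size=(1000, 3))
ia, ib = rng.integers(0, 5, 1000), rng.integers(0, 5, 1000)
sa2, sb2 = exchange_objects(sa, sb, ia, ib, obj_a=1, obj_b=3)
```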
ISBN (print): 9798350353006
Humans possess the remarkable skill of Visual Perception, the ability to see and understand the seen, helping them make sense of the visual world and, in turn, reason. Multimodal Large Language Models (MLLMs) have recently achieved impressive performance on vision-language tasks ranging from visual question-answering and image captioning to visual reasoning and image generation. However, when prompted to identify or count (perceive) the entities in a given image, existing MLLM systems fail. Working towards developing an accurate MLLM system for perception and reasoning, we propose using Versatile vision enCoders (VCoder) as perception eyes for Multimodal LLMs. We feed the VCoder with perception modalities such as segmentation or depth maps, improving the MLLM's perception abilities. Secondly, we leverage the images from COCO and outputs from off-the-shelf vision perception models to create our COCO Segmentation Text (COST) dataset for training and evaluating MLLMs on the object perception task. Thirdly, we introduce metrics to assess the object perception abilities of MLLMs on our COST dataset. Lastly, we provide extensive experimental evidence proving the VCoder's improved object-level perception skills over existing Multimodal LLMs, including GPT-4V. We open-source our dataset, code, and models to promote research.
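One way to picture the VCoder idea is as an extra encoder whose tokens for an auxiliary perception map (e.g., a segmentation map rendered as an image) are projected into the LLM's embedding space and concatenated with the usual image tokens. The sketch below uses toy modules throughout and is only a structural illustration, not the released architecture.

```python
# Structural sketch: encode an auxiliary perception map separately, project it into the
# LLM token space, and prepend its tokens to the usual image tokens. Modules are toys.
import torch
import torch.nn as nn

llm_dim = 1024
image_encoder = nn.Conv2d(3, llm_dim, kernel_size=14, stride=14)     # stands in for a CLIP ViT
seg_encoder = nn.Conv2d(3, llm_dim, kernel_size=14, stride=14)       # extra perception encoder
proj_img = nn.Linear(llm_dim, llm_dim)
proj_seg = nn.Linear(llm_dim, llm_dim)

def build_visual_tokens(image, seg_map):
    img_tok = proj_img(image_encoder(image).flatten(2).transpose(1, 2))
    seg_tok = proj_seg(seg_encoder(seg_map).flatten(2).transpose(1, 2))
    return torch.cat([seg_tok, img_tok], dim=1)        # prepend perception tokens

tokens = build_visual_tokens(torch.randn(1, 3, 336, 336), torch.randn(1, 3, 336, 336))
print(tokens.shape)                                     # (1, 1152, 1024): 2 * (336/14)^2 tokens
```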