ISBN (Print): 9798350301298
In this paper, we examine gradients of the logits of image classification CNNs with respect to input pixel values. We observe that these fluctuate considerably with training randomness, such as the random initialization of the networks. We extend our study to gradients of intermediate layers, obtained via GradCAM, as well as popular network saliency estimators such as DeepLIFT, SHAP, LIME, Integrated Gradients, and SmoothGrad. While empirical noise levels vary, qualitatively different attributions to image features are still possible with all of these, which has implications for interpreting such attributions, in particular when seeking data-driven explanations of the phenomenon generating the data. Finally, we demonstrate that the observed artefacts can be removed by marginalization over the initialization distribution via simple stochastic integration.
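As an illustration of the final step, the sketch below marginalizes input-gradient saliency over the initialization distribution by simple Monte Carlo averaging across networks trained from independent random seeds. The `models` list and single-image interface are assumptions for the example, not the paper's actual implementation.

```python
import torch

def marginalized_saliency(models, image, target_class):
    """Average input-gradient saliency over networks trained from
    independent random initializations: a simple stochastic integral
    over the initialization distribution."""
    image = image.clone().requires_grad_(True)
    grads = []
    for model in models:  # assumed: same architecture, different seeds
        model.eval()
        image.grad = None                        # reset accumulated gradient
        logit = model(image.unsqueeze(0))[0, target_class]
        logit.backward()
        grads.append(image.grad.detach().clone())
    return torch.stack(grads).mean(dim=0)        # initialization-averaged map
```

Averaging more independently trained networks further suppresses the initialization-induced noise, at the cost of training the ensemble.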
ISBN (Print): 9798350353006
We study the visual semantic embedding problem for image-text matching. Most existing work uses a tailored cross-attention mechanism to perform local alignment between the image and text modalities. While more powerful than the unimodal dual-encoder approach, this is computationally expensive. This work introduces a dual-encoder image-text matching model that leverages a scene graph to represent captions, with nodes for objects and attributes interconnected by relational edges. Using a graph attention network, our model efficiently encodes object-attribute and object-object semantic relations, resulting in a robust and fast-performing system. Representing a caption as a scene graph lets us exploit the strong relational inductive bias of graph neural networks to learn object-attribute and object-object relations effectively. To train the model, we propose losses that align the image and caption both at the holistic level (image-caption) and the local level (image-object entity), which we show is key to the success of the model. Our model is termed Composition model for Object Relations and Attributes, CORA. Experimental results on two prominent image-text retrieval benchmarks, Flickr30K and MS-COCO, demonstrate that CORA outperforms existing state-of-the-art, computationally expensive cross-attention methods in recall while retaining the fast computation speed of a dual encoder. Our code is available at https://***/vkhoi/cora_cvpr24
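The abstract does not give the losses in closed form; a common instantiation of the holistic (image-caption) alignment is a bidirectional hinge loss with hardest in-batch negatives, sketched below (the local image-object term is omitted):

```python
import torch
import torch.nn.functional as F

def holistic_alignment_loss(img_emb, txt_emb, margin=0.2):
    """Bidirectional hinge loss with hardest in-batch negatives for
    matched (image, caption) pairs arranged along the diagonal."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sims = img_emb @ txt_emb.t()                   # (B, B) cosine similarities
    pos = sims.diag().unsqueeze(1)                 # matched-pair similarities
    mask = torch.eye(sims.size(0), dtype=torch.bool, device=sims.device)
    neg = sims.masked_fill(mask, -1.0)             # exclude positives
    cost_txt = (margin + neg - pos).clamp(min=0).max(dim=1).values
    cost_img = (margin + neg - pos.t()).clamp(min=0).max(dim=0).values
    return (cost_txt + cost_img).mean()
```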
ISBN (Print): 9798350353006
Model Inversion (MI) attacks aim to reconstruct private training data by abusing access to machine learning models. Contemporary MI attacks have achieved impressive attack performance, posing serious threats to privacy. Meanwhile, all existing MI defense methods rely on regularization that is in direct conflict with the training objective, resulting in noticeable degradation in model utility. In this work, we take a different perspective and propose a novel and simple Transfer Learning-based Defense against Model Inversion (TL-DMI) to render MI-robust models. In particular, by leveraging TL, we limit the number of layers that encode sensitive information from the private training dataset, thereby degrading the performance of MI attacks. We conduct an analysis using Fisher Information to justify our method. Our defense is remarkably simple to implement. Without bells and whistles, we show in extensive experiments that TL-DMI achieves state-of-the-art (SOTA) MI robustness. Our code, pre-trained models, demo and inverted data are available at: https://***/projects/TL-DMI
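A minimal sketch of the idea, assuming a ResNet-18 backbone pre-trained on public data where only the last block and classifier head are fine-tuned on the private task (the paper selects which layers to tune via its Fisher Information analysis; the split below is illustrative):

```python
import torch
import torchvision

# Backbone pre-trained on public data; replace the head for the private task.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # hypothetical 10 classes

# Freeze everything, then unfreeze only the last block and the head,
# so fewer layers can encode information about the private data.
for p in model.parameters():
    p.requires_grad = False
for p in model.layer4.parameters():
    p.requires_grad = True
for p in model.fc.parameters():
    p.requires_grad = True

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-2, momentum=0.9)
```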
ISBN (Print): 9798350353006
With the emergence of pre-trained vision-language models like CLIP, how to adapt them to various downstream classification tasks has garnered significant attention in recent research. The adaptation strategies can typically be categorized into three paradigms: zero-shot adaptation, few-shot adaptation, and the recently proposed training-free few-shot adaptation. Most existing approaches are tailored for a specific setting and can only cater to one or two of these paradigms. In this paper, we introduce a versatile adaptation approach that can effectively work under all three settings. Specifically, we propose dual memory networks that comprise dynamic and static memory components. The static memory caches training-data knowledge, enabling training-free few-shot adaptation, while the dynamic memory preserves historical test features online during the testing process, allowing for the exploration of additional data insights beyond the training set. This novel capability enhances model performance in the few-shot setting and enables model usability in the absence of training data. The two memory networks employ the same flexible memory interactive strategy, which can operate in a training-free mode and can be further enhanced by incorporating learnable projection layers. Our approach is tested across 11 datasets under the three task settings. Remarkably, in the zero-shot scenario, it outperforms existing methods by over 3% and even shows superior results against methods utilizing external training data. Additionally, our method exhibits robust performance against natural distribution shifts. Code is available at https://***/YBZh/DMN.
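A heavily simplified sketch of the two memories in training-free mode: a static memory of labeled few-shot features and a dynamic memory filled online with pseudo-labeled test features, both read by cosine-similarity attention and fused with zero-shot logits. The hyperparameters and the additive fusion rule are assumptions, and the learnable projection layers are omitted:

```python
import torch
import torch.nn.functional as F

class DualMemory:
    def __init__(self, num_classes, beta=5.0):
        self.static_feats, self.static_labels = [], []  # few-shot training data
        self.dyn_feats, self.dyn_labels = [], []        # filled during testing
        self.num_classes, self.beta = num_classes, beta

    def add_static(self, feat, label):
        self.static_feats.append(feat)
        self.static_labels.append(label)

    def _read(self, feats, labels, query):
        if not feats:
            return torch.zeros(self.num_classes)
        mem = F.normalize(torch.stack(feats), dim=-1)         # (N, D)
        affinity = torch.exp(-self.beta * (1 - mem @ query))  # (N,)
        onehot = F.one_hot(torch.tensor(labels), self.num_classes).float()
        return affinity @ onehot                              # (C,) class scores

    def predict(self, query, zero_shot_logits):
        query = F.normalize(query, dim=-1)
        logits = (zero_shot_logits
                  + self._read(self.static_feats, self.static_labels, query)
                  + self._read(self.dyn_feats, self.dyn_labels, query))
        pred = logits.argmax().item()
        self.dyn_feats.append(query)   # online update of the dynamic memory
        self.dyn_labels.append(pred)
        return pred
```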
ISBN (Print): 9798350353013; 9798350353006
In this paper, we present a few-shot text-to-video framework, LAMP, which enables a text-to-image diffusion model to Learn A specific Motion pattern with 8~16 videos on a single GPU. Unlike existing methods, which require extensive training resources or learn motions that are precisely aligned with template videos, it achieves a trade-off between the degree of generation freedom and the resource cost of model training. Specifically, we design a motion-content decoupled pipeline that uses an off-the-shelf text-to-image model for content generation, so that our tuned video diffusion model mainly focuses on motion learning. The well-developed text-to-image techniques can provide visually pleasing and diverse content as generation conditions, which greatly improves video quality and generation freedom. To capture the features of the temporal dimension, we expand the pre-trained 2D convolution layers of the T2I model into our novel temporal-spatial motion learning layers and modify the attention blocks to the temporal level. Additionally, we develop an effective inference trick, shared-noise sampling, which can improve the stability of videos without extra computational cost. Our method can also be flexibly applied to other tasks, e.g., real-world image animation and video editing. Extensive experiments demonstrate that LAMP can effectively learn the motion pattern on limited data and generate high-quality videos. The code and models are available at https://rqwu.***/projects/LAMP.
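The abstract does not specify how the noise is shared; one common recipe, sketched below, mixes a base noise tensor shared across all frames with independent per-frame noise, so that frame latents start correlated while remaining unit-variance. The mixing weight `alpha` is an assumption:

```python
import torch

def shared_noise(batch, frames, channels, height, width, alpha=0.8):
    """Initial diffusion noise whose frames share a common base component."""
    base = torch.randn(batch, 1, channels, height, width)  # shared across frames
    per_frame = torch.randn(batch, frames, channels, height, width)
    # Var = alpha^2 + (1 - alpha^2) = 1, so the latents stay unit-variance.
    return alpha * base + (1 - alpha ** 2) ** 0.5 * per_frame
```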
ISBN (Print): 9798350353006
From content moderation to wildlife conservation, the number of applications that require models to recognize nuanced or subjective visual concepts is growing. Traditionally, developing classifiers for such concepts requires substantial manual effort, measured in hours, days, or even months, to identify and annotate the data needed for training. Even with recently proposed Agile Modeling techniques, which enable rapid bootstrapping of image classifiers, users still have to spend 30 minutes or more of monotonous, repetitive data labeling just to train a single classifier. Drawing on Fiske's Cognitive Miser theory, we propose a new framework that alleviates manual effort by replacing human labeling with natural language interactions, reducing the total effort required to define a concept by an order of magnitude: from labeling 2,000 images to only 100 plus some natural language interactions. Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points. Most importantly, our framework eliminates the need for crowd-sourced annotations. Moreover, our framework ultimately produces lightweight classification models that are deployable in cost-sensitive scenarios. Across 15 subjective concepts and 2 public image classification datasets, our trained models outperform traditional Agile Modeling as well as state-of-the-art zero-shot classification models like ALIGN, CLIP, CuPL, and large visual question answering models like PaLI-X.
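A skeletal version of the automatic-labeling loop, with `caption_image` (a vision-language model) and `ask_llm` (a large language model) as hypothetical placeholders rather than real APIs:

```python
def auto_label(images, concept_definition, caption_image, ask_llm):
    """Replace manual annotation: describe each image with a VLM, then let
    an LLM judge whether the description matches the concept definition."""
    labels = []
    for img in images:
        caption = caption_image(img)  # VLM: image -> text description
        prompt = (f"Concept: {concept_definition}\n"
                  f"Image description: {caption}\n"
                  "Does the image match the concept? Answer yes or no.")
        labels.append(1 if "yes" in ask_llm(prompt).lower() else 0)
    return labels  # weak labels for training a lightweight classifier
```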
ISBN (Print): 9798350353006
Dense video captioning, which aims to automatically localize and caption all events within an untrimmed video, has received significant research attention. Several studies formulate dense video captioning as a multi-task problem of event localization and event captioning to account for inter-task relations. However, addressing both tasks using only visual input is challenging due to the lack of semantic content. In this study, we address this by proposing a novel framework inspired by human cognitive information processing. Our model utilizes external memory to incorporate prior knowledge, retrieving from the memory via cross-modal video-to-text matching. To effectively incorporate the retrieved text features, we design a versatile encoder and a decoder with visual and textual cross-attention modules. Comparative experiments show the effectiveness of the proposed method on the ActivityNet Captions and YouCook2 datasets. Experimental results show promising performance of our model without extensive pretraining on a large video dataset. Our code is available at https://***/ailab-kyunghee/CM2_DVC.
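As a sketch of the retrieval step, assuming precomputed video segment features and an external bank of text features, cross-modal retrieval can be reduced to a cosine-similarity top-k lookup (the paper's exact matching model is not specified here):

```python
import torch
import torch.nn.functional as F

def retrieve_text_memory(video_feats, text_memory, k=5):
    """Return the k text-memory entries most similar to each video feature."""
    v = F.normalize(video_feats, dim=-1)        # (T, D) video segment features
    m = F.normalize(text_memory, dim=-1)        # (N, D) external text memory
    topk = (v @ m.t()).topk(k, dim=-1).indices  # (T, k) nearest entries
    return text_memory[topk]                    # (T, k, D) retrieved features
```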
ISBN (Print): 9798350353006
Recent advancements in Large Vision-Language Models (VLMs) have shown great promise in natural image domains, allowing users to hold a dialogue about given visual content. However, such general-domain VLMs perform poorly in Remote Sensing (RS) scenarios, producing inaccurate or fabricated information when presented with RS domain-specific queries. Such behavior emerges due to the unique challenges introduced by RS imagery. For example, handling high-resolution RS imagery with diverse scale changes across categories and many small objects requires region-level reasoning alongside holistic scene interpretation. Furthermore, the lack of domain-specific multimodal instruction-following data, as well as of strong backbone models for RS, makes it hard for models to align their behavior with user queries. To address these limitations, we propose GeoChat - the first versatile remote sensing VLM that offers multitask conversational capabilities with high-resolution RS images. Specifically, GeoChat not only answers image-level queries but also accepts region inputs to hold region-specific dialogue. Furthermore, it can visually ground objects in its responses by referring to their spatial coordinates. To address the lack of domain-specific datasets, we generate a novel RS multimodal instruction-following dataset by extending image-text pairs from existing diverse RS datasets. We establish a comprehensive benchmark for RS multitask conversations and compare against a number of baseline methods. GeoChat demonstrates robust zero-shot performance on various RS tasks, e.g., image and region captioning, visual question answering, scene classification, visually grounded conversations, and referring detection. Our code is available here.
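To make the region-input idea concrete, the sketch below serializes a bounding box into the text query; the normalization and token format are hypothetical and may differ from what GeoChat actually uses:

```python
def region_prompt(question, box, img_w, img_h):
    """Embed a pixel-space box, normalized to [0, 100], into the query text."""
    x1, y1, x2, y2 = box
    c = [round(100 * x1 / img_w), round(100 * y1 / img_h),
         round(100 * x2 / img_w), round(100 * y2 / img_h)]
    return f"{question} {{<{c[0]}><{c[1]}><{c[2]}><{c[3]}>}}"

# e.g. region-specific dialogue about an 800x600 image
print(region_prompt("What is in this region?", (120, 80, 360, 240), 800, 600))
```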
ISBN (Print): 9798350353013; 9798350353006
We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration, which advances multi-view subject registration to a new calibration-free stage. This greatly alleviates a limitation in many practical applications. The problem is very challenging, since the only input is several RGB images from different first-person views (FPVs), without the BEV image or the calibration of the FPVs, while the output is a unified plane aggregated from all views, with the positions and orientations of both the subjects and the cameras in the BEV. For this purpose, we propose an end-to-end framework that solves camera and subject registration together by taking advantage of their mutual dependence. Its main ideas are as follows: i) a subject view-transform module (VTM) that projects each pedestrian from an FPV to a virtual BEV; ii) a multi-view geometry-based spatial alignment module (SAM) that estimates the relative camera poses in a unified BEV; iii) selection and refinement of the subject and camera registration results within the unified BEV. We collect a new large-scale synthetic dataset with rich annotations for training and evaluation, as well as a real dataset for cross-domain evaluation. The experimental results show the remarkable effectiveness of our method. The code and proposed datasets are available at BEVSee.
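For intuition about the view-transform step: projecting pedestrian ground-contact points from an FPV to a virtual BEV amounts to applying a planar homography. In the calibration-free setting the homography itself must be estimated (e.g., by the learned VTM); the sketch below simply assumes it is given:

```python
import numpy as np

def project_to_bev(points_fpv, H):
    """Map (u, v) foot points from a first-person view to the BEV plane
    with a 3x3 planar homography H."""
    pts = np.hstack([points_fpv, np.ones((len(points_fpv), 1))])  # homogeneous
    bev = (H @ pts.T).T
    return bev[:, :2] / bev[:, 2:3]  # dehomogenize

feet = np.array([[320.0, 470.0], [150.0, 430.0]])  # detected foot points
H = np.eye(3)                                      # placeholder homography
print(project_to_bev(feet, H))
```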
We introduce "HALLUSIONBENCH(1)," a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LV...
详细信息
ISBN (Print): 9798350353006
We introduce "HALLUSIONBENCH(1)," a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(ision), Gemini Pro vision, Claude 3, and LLaVA-1.5, by emphasizing nuanced understanding and interpretation of visual data. The benchmark comprises 346 images paired with 1129 questions, all meticulously crafted by human experts. We introduce a novel structure for these visual questions designed to establish control groups. This structure enables us to conduct a quantitative analysis of the models' response tendencies, logical consistency, and various failure modes. In our evaluation on HALLUSIONBENCH, we benchmarked 15 different models, highlighting a 31.42% question-pair accuracy achieved by the state-of-the-art GPT-4V. Notably, all other evaluated models achieve accuracy below 16%. Moreover, our analysis not only highlights the observed failure modes, including language hallucination and visual illusion but also deepens an understanding of these pitfalls. Our comprehensive case studies within HALLUSIONBENCH shed light on the challenges of hallucination and illusion in LVLMs. Based on these insights, we suggest potential pathways for their future improvement. The benchmark and codebase can be accessed at https://***/tianyi-lab/HallusionBench.