检索结果-内蒙古大学图书馆

您好，读者！请登录

内蒙古大学图书馆

首页
概况
党建
资源
服务
科研支持
- 论文收录引用证明
- 科技查新
知识产权
档案馆
帮助

咨询与建议

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

您的常用邮箱：*

您的手机号码：*

问题描述：

当前已输入0个字，您还可以输入200个字

全部搜索
期刊论文
图书
学位论文
标准
纸本馆藏
外文资源发现
数据库导航
超星发现

高级检索

时间限定

出版年份：

文献类型

图书期刊文献学位论文多媒体

馆藏选择

电子馆藏纸本馆藏

核心期刊

全部期刊 SCI 收录期刊 SSCI 收录期刊 EI 收录期刊 CSCD 收录期刊 CSSCI 收录期刊

语言

中文英文

文献类型

期刊文献图书学位论文标准纸本馆藏

帮助

文字说明：

T=题名（书名、题名），A=作者（责任者），K=主题词，P=出版物名称，PU=出版社名称，O=机构（作者单位、学位授予单位、专利申请人），L=中图分类号，C=学科分类号，U=全部字段，Y=年（出版发行年、学位年度、标准发布年）

检索规则说明：

AND代表“并且”；OR代表“或者”；NOT代表“不包含”；(注意必须大写,运算符两边需空一格)

检索范例：

范例一：(K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
范例二：P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

分类表

所选分类

>> <<

限定检索结果

文献类型

20,860 篇 会议
105 篇 期刊文献
43 册 图书

馆藏范围

21,007 篇 电子文献
1 种 纸本馆藏

日期分布

学科分类号

13,620 篇 工学
- 11,056 篇 计算机科学与技术...
- 2,652 篇 机械工程
- 2,252 篇 软件工程
- 914 篇 光学工程
- 885 篇 电气工程
- 529 篇 控制科学与工程
- 477 篇 信息与通信工程
- 216 篇 测绘科学与技术
- 135 篇 生物工程
- 127 篇 生物医学工程（可授...
- 98 篇 电子科学与技术（可...
- 92 篇 仪器科学与技术
- 46 篇 安全科学与工程
- 40 篇 建筑学
- 40 篇 化学工程与技术
- 39 篇 土木工程
- 37 篇 交通运输工程
- 35 篇 力学（可授工学、理...
- 33 篇 航空宇航科学与技...
3,494 篇 医学
- 3,489 篇 临床医学
- 32 篇 基础医学(可授医学...
2,247 篇 理学
- 1,145 篇 物理学
- 1,081 篇 数学
- 401 篇 生物学
- 384 篇 统计学（可授理学、...
- 245 篇 系统科学
- 46 篇 化学
343 篇 管理学
- 176 篇 管理科学与工程(可...
- 168 篇 图书情报与档案管...
- 34 篇 工商管理
31 篇 法学
19 篇 农学
15 篇 教育学
8 篇 经济学
5 篇 艺术学
2 篇 军事学
1 篇 文学

主题

8,141 篇 computer vision
2,886 篇 training
2,841 篇 pattern recognit...
1,809 篇 computational mo...
1,715 篇 visualization
1,493 篇 cameras
1,433 篇 three-dimensiona...
1,433 篇 feature extracti...
1,366 篇 shape
1,360 篇 face recognition
1,243 篇 image segmentati...
1,135 篇 robustness
1,124 篇 semantics
992 篇 computer archite...
985 篇 object detection
982 篇 layout
959 篇 benchmark testin...
935 篇 codes
900 篇 computer science
898 篇 object recogniti...

机构

174 篇 univ sci & techn...
158 篇 univ chinese aca...
153 篇 carnegie mellon ...
145 篇 chinese univ hon...
109 篇 microsoft resear...
103 篇 zhejiang univ pe...
99 篇 swiss fed inst t...
95 篇 tsinghua univers...
90 篇 microsoft res as...
90 篇 tsinghua univ pe...
88 篇 shanghai ai lab ...
81 篇 zhejiang univers...
77 篇 alibaba grp peop...
74 篇 hong kong univ s...
73 篇 university of sc...
72 篇 peking univ peop...
72 篇 university of ch...
68 篇 shanghai jiao to...
66 篇 univ oxford oxfo...
65 篇 google res mount...

作者

80 篇 van gool luc
70 篇 zhang lei
58 篇 timofte radu
48 篇 yang yi
47 篇 luc van gool
46 篇 xiaoou tang
44 篇 tian qi
43 篇 darrell trevor
42 篇 loy chen change
42 篇 sun jian
41 篇 qi tian
40 篇 li stan z.
38 篇 li fei-fei
37 篇 chen xilin
36 篇 shan shiguang
35 篇 zhou jie
35 篇 vasconcelos nuno
35 篇 liu yang
35 篇 torralba antonio
34 篇 liu xiaoming

语言

20,982 篇 英文
10 篇 中文
7 篇 其他
5 篇 土耳其文
2 篇 日文
2 篇 葡萄牙文

检索条件"任意字段=2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016"

共 21008 条记录，以下是381-390 订阅

全选清除本页清除全部题录导出标记到"检索档案"

详细简洁

排序：

HALLUSIONBENCH: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large vision-Language Models

HALLUSIONBENCH: An Advanced Diagnostic Suite for Entangled L...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Guan, Tianrui Liu, Fuxiao Wu, Xiyang Xian, Ruiqi Li, Zongxia Liu, Xiaoyu Wang, Xijun Chen, Lichang Huang, Furong Yacoob, Yaser Manocha, Dinesh Zhou, Tianyi Univ Maryland College Pk MD 20742 USA

ISBN: (纸本)9798350353006

We introduce "HALLUSIONBENCH(1)," a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(ision), Gemini Pro vision, Claude 3, and LLaVA-1.5, by emphasizing nuanced understanding and interpretation of visual data. The benchmark comprises 346 images paired with 1129 questions, all meticulously crafted by human experts. We introduce a novel structure for these visual questions designed to establish control groups. This structure enables us to conduct a quantitative analysis of the models' response tendencies, logical consistency, and various failure modes. In our evaluation on HALLUSIONBENCH, we benchmarked 15 different models, highlighting a 31.42% question-pair accuracy achieved by the state-of-the-art GPT-4V. Notably, all other evaluated models achieve accuracy below 16%. Moreover, our analysis not only highlights the observed failure modes, including language hallucination and visual illusion but also deepens an understanding of these pitfalls. Our comprehensive case studies within HALLUSIONBENCH shed light on the challenges of hallucination and illusion in LVLMs. Based on these insights, we suggest potential pathways for their future improvement. The benchmark and codebase can be accessed at https://***/tianyi-lab/HallusionBench.

关键词： Evaluation Dataset Hallucination vision language model VLM Evaluation

来源：评论

学校读者我要写书评

暂无评论

SHViT: Single-Head vision Transformer with Memory Efficient Macro Design

SHViT: Single-Head Vision Transformer with Memory Efficient ...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Yun, Seokju Ro, Youngmin Univ Seoul Machine Intelligence Lab Seoul South Korea

ISBN: (纸本)9798350353013;9798350353006

Recently, efficient vision Transformers have shown great performance with low latency on resource-constrained devices. Conventionally, they use 4x4 patch embeddings and a 4-stage structure at the macro level, while utilizing sophisticated attention with multi-head configuration at the micro level. This paper aims to address computational redundancy at all design levels in a memory-efficient manner. We discover that using larger-stride patchify stem not only reduces memory access costs but also achieves competitive performance by leveraging token representations with reduced spatial redundancy from the early stages. Furthermore, our preliminary analyses suggest that attention layers in the early stages can be substituted with convolutions, and several attention heads in the latter stages are computationally redundant. To handle this, we introduce a single-head attention module that inherently prevents head redundancy and simultaneously boosts accuracy by parallelly combining global and local information. Building upon our solutions, we introduce SHViT, a SingleHead vision Transformer that obtains the state-of-the-art speed-accuracy tradeoff. For example, on ImageNet-1k, our SHViT-S4 is 3.3x, 8.1x, and 2.4x faster than MobileViTv2 x1.0 on GPU, CPU, and iPhone12 mobile device, respectively, while being 1.3% more accurate. For object detection and instance segmentation on MS COCO using MaskRCNN head, our model achieves performance comparable to FastViT-SA12 while exhibiting 3.8x and 2.0x lower backbone latency on GPU and mobile device, respectively.

关键词： CNNs efficient visual backbone vision Transformer

来源：评论

学校读者我要写书评

暂无评论

Lookahead Exploration with Neural Radiance Representation for Continuous vision-Language Navigation

Lookahead Exploration with Neural Radiance Representation fo...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Wang, Zihan Li, Xiangyang Yang, Jiahao Liu, Yeqi Hu, Junjie Jiang, Ming Jiang, Shuqiang Chinese Acad Sci Inst Comp Technol Beijing 100190 Peoples R China Univ Chinese Acad Sci Beijing 100049 Peoples R China Univ Wisconsin Dept Comp Sci 1210 W Dayton St Madison WI 53706 USA Univ Wisconsin Dept Biostat & Med Informat Madison WI USA Indiana Univ Dept Human Ctr Comp Indianapolis IN 46204 USA

ISBN: (纸本)9798350353006

vision-and-language navigation (VLN) enables the agent to navigate to a remote location following the natural language instruction in 3D environments. At each navigation step, the agent selects from possible candidate locations and then makes the move. For better navigation planning, the lookahead exploration strategy aims to effectively evaluate the agent's next action by accurately anticipating the future environment of candidate locations. To this end, some existing works predict RGB images for future environments, while this strategy suffers from image distortion and high computational cost. To address these issues, we propose the pre-trained hierarchical neural radiance representation model (HNR) to produce multi-level semantic features for future environments, which are more robust and efficient than pixel-wise RGB reconstruction. Furthermore, with the predicted future environmental representations, our lookahead VLN model is able to construct the navigable future path tree and select the optimal path via efficient parallel evaluation. Extensive experiments on the VLN-CE datasets confirm the effectiveness of our method. The code is available at https://***/MrZihan/HNR-VLN

关键词： Lookahead Exploration Neural Radiance Fields vision-and-Language Navigation

来源：评论

学校读者我要写书评

暂无评论

Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units recognition

Multi-scale Dynamic and Hierarchical Relationship Modeling f...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Wang, Zihan Song, Siyang Luo, Cheng Deng, Songhe Xie, Weicheng Shen, Linlin Shenzhen Univ Sch Comp Sci & Software Engn Comp Vis Inst Shenzhen Peoples R China Shenzhen Univ Shenzhen Inst Artificial Intelligence & Robot Soc Shenzhen Peoples R China Shenzhen Univ Natl Engn Lab Big Data Syst Comp Technol Shenzhen Peoples R China Univ Leicester Leicester Leics England Monash Univ Clayton Vic Australia

ISBN: (纸本)9798350353013;9798350353006

Human facial action units (AUs) are mutually related in a hierarchical manner, as not only they are associated with each other in both spatial and temporal domains but also AUs located in the same/close facial regions show stronger relationships than those of different facial regions. While none of existing approach thoroughly model such hierarchical inter-dependencies among AUs, this paper proposes to comprehensively model multi-scale AU-related dynamic and hierarchical spatio-temporal relationship among AUs for their occurrences recognition. Specifically, we first propose a novel multi-scale temporal differencing network with an adaptive weighting block to explicitly capture facial dynamics across frames at different spatial scales, which specifically considers the heterogeneity of range and magnitude in different AUs' activation. Then, a two-stage strategy is introduced to hierarchically model the relationship among AUs based on their spatial distribution (i.e., local and cross-region AU relationship modelling). Experimental results achieved on BP4D and DISFA show that our approach is the new state-of-the-art in the field of AU occurrence recognition. Our code is publicly available at https://***/CVI-SZU/MDHR.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Model Inversion Robustness: Can Transfer Learning Help?

Model Inversion Robustness: Can Transfer Learning Help?

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Ho, Sy-Tuyen Hao, Koh Jun Chandrasegaran, Keshigeyan Ngoc-Bao Nguyen Cheung, Ngai-Man Singapore Univ Technol & Design SUTD Singapore Singapore Stanford Univ Stanford CA 94305 USA SUTD Singapore Singapore

ISBN: (纸本)9798350353006

Model Inversion (MI) attacks aim to reconstruct private training data by abusing access to machine learning models. Contemporary MI attacks have achieved impressive attack performance, posing serious threats to privacy. Meanwhile, all existing MI defense methods rely on regularization that is in direct conflict with the training objective, resulting in noticeable degradation in model utility. In this work, we take a different perspective, and propose a novel and simple Transfer Learning-based Defense against Model Inversion (TL-DMI) to render MI-robust models. Particularly, by leveraging TL, we limit the number of layers encoding sensitive information from private training dataset, thereby degrading the performance of MI attack. We conduct an analysis using Fisher Information to justify our method. Our defense is remarkably simple to implement. Without bells and whistles, we show in extensive experiments that TL-DMI achieves state-of-the-art (SOTA) MI robustness. Our code, pre-trained models, demo and inverted data are available at: https://***/projects/TL-DMI

关键词： accountability ethics in vision fairness privacy Transparency

来源：评论

学校读者我要写书评

暂无评论

SparseOcc: Rethinking Sparse Latent Representation for vision-Based Semantic Occupancy Prediction

SparseOcc: Rethinking Sparse Latent Representation for Visio...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Tang, Pin Wang, Zhongdao Wang, Gum-ling Zheng, Jilai Ren, Xiangxuan Feng, Bailan Ma, Chao Shanghai Jiao Tong Univ AI Inst MoE Key Lab Artificial Intelligence Shanghai Peoples R China Huawei Noahs Ark Lab Montreal PQ Canada

ISBN: (纸本)9798350353006

vision-based perception for autonomous driving requires an explicit modeling of a 3D space, where 2D latent representations are mapped and subsequent 3D operators are applied. However, operating on dense latent spaces introduces a cubic time and space complexity, which limits scalability in terms of perception range or spatial resolution. Existing approaches compress the dense representation using projections like Bird's Eye View (BEV) or Tri-Perspective View (TPV). Although efficient, these projections result in information loss, especially for tasks like semantic occupancy prediction. To address this, we propose SparseOcc, an efficient occupancy network inspired by sparse point cloud processing. It utilizes a lossless sparse latent representation with three key innovations. Firstly, a 3D sparse diffuser performs latent completion using spatially decomposed 3D sparse convolutional kernels. Secondly, a feature pyramid and sparse interpolation enhance scales with information from others. Finally, the transformer head is redesigned as a sparse variant. SparseOcc achieves a remarkable 74.9% reduction on FLOPs over the dense baseline. Interestingly, it also improves accuracy, from 12.8% to 14.1% mIOU, which in part can be attributed to the sparse representation's ability to avoid hallucinations on empty voxels.

关键词： Autonomous Driving Occupancy Prediction Sparse Representation vision-Based Perception

来源：评论

学校读者我要写书评

暂无评论

Explaining CLIP's performance disparities on data from blind/low vision users

Explaining CLIP's performance disparities on data from blind...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Massiceti, Daniela Longden, Camilla Slowik, Agnieszka Wills, Samuel Grayson, Martin Morrison, Cecily Microsoft Res Redmond WA 98052 USA World Bank 1818 H St NW Washington DC 20433 USA

ISBN: (纸本)9798350353006

Large multi-modal models (LMMs) hold the potential to usher in a new era of automated visual assistance for people who are blind or low vision (BLV). Yet, these models have not been systematically evaluated on data captured by BLV users. We address this by empirically assessing CLIP, a widely-used LMM likely to underpin many assistive technologies. Testing 25 CLIP variants in a zero-shot classification task, we find that their accuracy is 15 percentage points lower on average for images captured by BLV users than web-crawled images. This disparity stems from CLIP's sensitivities to 1) image content (e.g. not recognizing disability objects as well as other objects);2) image quality (e.g. not being robust to lighting variation);and 3) text content (e.g. not recognizing objects described by tactile adjectives as well as visual ones). We delve deeper with a textual analysis of three common pre-training datasets: LAION-400M, LAION-2B and DataComp-1B, showing that disability content is rarely mentioned. We then provide three examples that illustrate how the performance disparities extend to three downstream models underpinned by CLIP: OWL-ViT, CLIPSeg and DALL-E2. We find that few-shot learning with as few as 5 images can mitigate CLIP's quality-of-service disparities for BLV users in some scenarios, which we discuss alongside a set of other possible mitigations.

关键词： accessibility blind CLIP fairness low vision multi-modal vision-language zero-shot classification

来源：评论

学校读者我要写书评

暂无评论

Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning

Enhanced Motion-Text Alignment for Image-to-Video Transfer L...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Zhang, Wei Wan, Chaoqun Liu, Tongliang Tian, Xinmei Shen, Xu Ye, Jieping Univ Sci & Technol China Hefei Peoples R China Alibaba Cloud Hangzhou Peoples R China Univ Sydney Sydney Australia Hefei Comprehens Natl Sci Ctr Inst Artificial Intelligence Hefei Peoples R China

ISBN: (纸本)9798350353006

Extending large image-text pre-trained models (e.g., CLIP) for video understanding has made significant advancements. To enable the capability of CLIP to perceive dynamic information in videos, existing works are dedicated to equipping the visual encoder with various temporal modules. However, these methods exhibit "asymmetry" between the visual and textual sides, with neither temporal descriptions in input texts nor temporal modules in text encoder. This limitation hinders the potential of language supervision emphasized in CLIP, and restricts the learning of temporal features, as the text encoder has demonstrated limited proficiency in motion understanding. To address this issue, we propose leveraging "MoTion-Enhanced Descriptions" (MoTED) to facilitate the extraction of distinctive temporal features in videos. Specifically, we first generate discriminative motion-related descriptions via querying GPT4 to compare easy-confusing action categories. Then, we incorporate both the visual and textual encoders with additional perception modules to process the video frames and generated descriptions, respectively. Finally, we adopt a contrastive loss to align the visual and textual motion features. Extensive experiments on five benchmarks show that MoTED surpasses state-of-the-art methods with convincing gaps, laying a solid foundation for empowering CLIP with strong temporal modeling.

关键词： Image-to-Video Transfer Learning Language Supervision Video recognition

来源：评论

学校读者我要写书评

暂无评论

Knowledge Combination to Learn Rotated Detection Without Rotated Annotation

Knowledge Combination to Learn Rotated Detection Without Rot...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Zhu, Tianyu Ferenczi, Bryce Purkait, Pulak Drummond, Tom Rezatofighi, Hamid van den Hengel, Anton Amazon IML Seattle WA 98109 USA Monash Univ Clayton Vic Australia

ISBN: (纸本)9798350301298

Rotated bounding boxes drastically reduce output ambiguity of elongated objects, making it superior to axis-aligned bounding boxes. Despite the effectiveness, rotated detectors are not widely employed. Annotating rotated bounding boxes is such a laborious process that they are not provided in many detection datasets where axis-aligned annotations are used instead. In this paper, we propose a framework that allows the model to predict precise rotated boxes only requiring cheaper axis-aligned annotation of the target dataset(1). To achieve this, we leverage the fact that neural networks are capable of learning richer representation of the target domain than what is utilized by the task. The under-utilized representation can be exploited to address a more detailed task. Our framework combines task knowledge of an out-of-domain source dataset with stronger annotation and domain knowledge of the target dataset with weaker annotation. A novel assignment process and projection loss are used to enable the co-training on the source and target datasets. As a result, the model is able to solve the more detailed task in the target domain, without additional computation overhead during inference. We extensively evaluate the method on various target datasets including fresh-produce dataset, HRSC2016 and SSDD. Results show that the proposed method consistently performs on par with the fully supervised approach.

关键词： detection recognition: Categorization retrieval

来源：评论

学校读者我要写书评

暂无评论

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

OpenBias: Open-set Bias Detection in Text-to-Image Generativ...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： D'Inca, Moreno Peruzzo, Elia Mancini, Massimiliano Xu, Dejia Goel, Vidit Xu, Xinggian Wang, Zhangyang Shi, Humphrey Sebe, Nicu Univ Trento Trento Italy UT Austin Austin TX USA SHI Labs Georgia Tech & UIUC Atlanta GA USA Picsart AI Res PAIR Miami FL USA

ISBN: (纸本)9798350353006

Text-to-image generative models are becoming increasingly popular and accessible to the general public. As these models see large-scale deployments, it is necessary to deeply investigate their safety and fairness to not disseminate and perpetuate any kind of biases. However, existing works focus on detecting closed sets of biases defined a priori, limiting the studies to well-known concepts. In this paper, we tackle the challenge of open-set bias detection in text-to-image generative models presenting OpenBias, a new pipeline that identifies and quantifies the severity of biases agnostically, without access to any precompiled set. OpenBias has three stages. In the first phase, we leverage a Large Language Model (LLM) to propose biases given a set of captions. Secondly, the target generative model produces images using the same set of captions. Lastly, a vision Question Answering model recognizes the presence and extent of the previously proposed biases. We study the behavior of Stable Diffusion 1.5, 2, and XL emphasizing new biases, never investigated before. Via quantitative experiments, we demonstrate that OpenBias agrees with current closed-set bias detection methods and human judgement.

关键词： Bias computer vision Fairness Generative AI Text-to-Image

来源：评论

学校读者我要写书评

暂无评论

没有更多数据了...

全选清除本页清除全部题录导出标记到“检索档案”

共500页 << < 35 36 37 38 39 40 41 42 43 44 > >>

检索报告对象比较合并检索0

隐藏清空

合并搜索

回到顶部

执行限定条件

内容：

评分：

请选择保存的检索档案：

请选择收藏分类：

订阅名称：

通借通还

温馨提示：

图书名称：

借书校区：

取书校区：

手机号码：

邮箱地址：

一卡通帐号：

电话和邮箱必须正确填写，我们会与您联系确认。

联系人：

所在院系：

联系邮箱：

联系电话：

内蒙古自治区呼和浩特市赛罕区大学西街235号邮编: 010021

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：