ISBN:
(Print) 9798350353006
Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pre-trained (VLP) models without significant performance losses compared to training on the original dataset. These results have been based on pruning commonly used image-caption datasets collected from the web - datasets that are known to harbor harmful social biases that may then be codified in trained models. In this work, we evaluate how deduplication affects the prevalence of these biases in the resulting trained models and introduce an easy-to-implement modification to the recent SemDeDup algorithm that can reduce the negative effects that we observe. When examining CLIP-style models trained on deduplicated variants of LAION-400M, we find that our proposed FairDeDup algorithm consistently leads to improved fairness metrics over SemDeDup on the FairFace and FACET datasets while maintaining zero-shot performance on CLIP benchmarks.
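As a rough illustration of the cluster-then-prune idea behind SemDeDup-style deduplication, the sketch below clusters image embeddings, collapses near-duplicate groups within each cluster, and uses a hypothetical fairness-aware rule (keep the member whose sensitive attribute is least represented so far) in place of the exemplar choice. The attribute labels, similarity threshold, and keep rule are illustrative assumptions, not the published FairDeDup procedure.

```python
# Minimal sketch of cluster-based semantic deduplication (SemDeDup-style) with a
# hypothetical fairness-aware keep rule standing in for FairDeDup's selection step.
# Assumptions: embeddings come from a frozen image encoder; `attrs` is a per-sample
# sensitive-attribute label that the real method may obtain differently.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def fair_dedup(embeddings, attrs, n_clusters=8, sim_threshold=0.9):
    """Return indices of samples to keep after deduplication."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(emb)
    kept, kept_attr_counts = [], Counter()
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        sims = emb[idx] @ emb[idx].T                      # pairwise cosine similarity
        removed = set()
        for i_pos, i in enumerate(idx):
            if i in removed:
                continue
            # Samples nearly identical to i form one duplicate group.
            group = [idx[j] for j in np.where(sims[i_pos] >= sim_threshold)[0]
                     if idx[j] not in removed]
            # Fairness-aware choice: keep the member whose attribute is currently
            # least represented among kept samples (SemDeDup would instead keep,
            # e.g., a fixed exemplar per duplicate group).
            keep = min(group, key=lambda g: kept_attr_counts[attrs[g]])
            kept.append(keep)
            kept_attr_counts[attrs[keep]] += 1
            removed.update(group)
    return sorted(kept)

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
A = rng.integers(0, 2, size=200)      # hypothetical binary sensitive attribute
print(len(fair_dedup(X, A)), "of 200 samples kept")
```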
ISBN:
(Print) 9798350353013; 9798350353006
In this work, we focus on learning facial representations that can be adapted to train effective face recognition models, particularly in the absence of labels. First, a vastly larger number of unlabeled faces exists in the real world than in existing labelled face datasets. We explore a learning strategy for these unlabeled facial images through self-supervised pretraining that transfers generalized face recognition performance. Moreover, motivated by the recent finding that the face saliency area is critical for face recognition, instead of constructing pretraining augmentations from randomly cropped image blocks, we utilize patches localized by extracted facial landmarks. This enables our method, LAndmark-based Facial Self-supervised learning (LAFS), to learn key representations that are more critical for face recognition. We also incorporate two landmark-specific augmentations which introduce more diversity of landmark information to further regularize the learning. With the learned landmark-based facial representations, we further adapt the representation to face recognition with a regularization that mitigates variations in landmark positions. Our method achieves significant improvements over the state of the art on multiple face recognition benchmarks, especially in more challenging few-shot scenarios. The code is available at https://***/szlbiubiubiulLAFS_cvpr2024.
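A minimal sketch of the landmark-centered cropping idea is given below: patches are cut around externally detected facial landmarks, with a small positional jitter standing in for a landmark-specific augmentation, rather than at random locations. The patch size, jitter range, and the downstream self-supervised loss are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch of landmark-centered patch extraction for self-supervised
# pretraining, in the spirit of LAFS. Assumptions: `landmarks` come from an
# external facial-landmark detector; sizes and jitter are illustrative choices.
import torch

def landmark_patches(image, landmarks, patch=32, jitter=4):
    """image: (C, H, W) tensor; landmarks: (N, 2) pixel coords (x, y).
    Returns (N, C, patch, patch) patches centered on (jittered) landmarks."""
    C, H, W = image.shape
    # Landmark-specific augmentation: small random shift of each landmark.
    shifted = landmarks + torch.randint(-jitter, jitter + 1, landmarks.shape)
    patches = []
    for x, y in shifted.tolist():
        x0 = int(min(max(x - patch // 2, 0), W - patch))
        y0 = int(min(max(y - patch // 2, 0), H - patch))
        patches.append(image[:, y0:y0 + patch, x0:x0 + patch])
    return torch.stack(patches)

# Toy usage: two augmented "views" of the same face share landmark semantics,
# so their patch sets can be fed to any standard self-supervised objective.
img = torch.rand(3, 112, 112)
lmk = torch.tensor([[38, 50], [74, 50], [56, 70], [44, 92], [68, 92]])  # 5-point layout
view1, view2 = landmark_patches(img, lmk), landmark_patches(img, lmk)
print(view1.shape)  # torch.Size([5, 3, 32, 32])
```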
ISBN:
(Print) 9798350353006
We propose Strongly Supervised pre-training with ScreenShots (S4), a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering. Using web screenshots unlocks a treasure trove of visual and textual cues that are not present in image-text pairs. In S4, we leverage the inherent tree-structured hierarchy of HTML elements and their spatial localization to carefully design 10 pre-training tasks with large-scale annotated data. These tasks resemble downstream tasks across different domains, and the annotations are cheap to obtain. We demonstrate that, compared to current screenshot pre-training objectives, our innovative pre-training method significantly enhances the performance of image-to-text models on nine varied and popular downstream tasks, with up to a 76.1% improvement on Table Detection and at least 1% on Widget Captioning.
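To make the screenshot-supervision idea concrete, the hedged sketch below turns rendered HTML elements (text plus pixel bounding boxes) into prompt/target pairs for an image-to-text model. The two task templates and the `Element` field names are illustrative assumptions; the paper defines ten tasks from its own rendering pipeline.

```python
# Minimal sketch of turning rendered-web annotations (HTML element text + layout)
# into supervised pre-training pairs, in the spirit of S4. Field names such as
# `tag`, `text`, and `bbox` are assumptions about the rendering pipeline's output.
from dataclasses import dataclass

@dataclass
class Element:
    tag: str
    text: str
    bbox: tuple  # (x0, y0, x1, y1) in screenshot pixel coordinates

def make_pairs(elements):
    """Yield (prompt, target) text pairs for an image-to-text model,
    conditioned on the screenshot the elements were rendered from."""
    for el in elements:
        if not el.text:
            continue
        # Task A: element grounding -- locate the element that shows given text.
        yield (f"Where is the element saying '{el.text}'?",
               f"{el.tag} at {el.bbox}")
        # Task B: widget captioning -- describe the element inside a region.
        yield (f"What does the element at {el.bbox} say?", el.text)

dom = [Element("button", "Sign in", (840, 20, 920, 52)),
       Element("a", "Pricing", (600, 24, 660, 48))]
for prompt, target in make_pairs(dom):
    print(prompt, "->", target)
```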
ISBN:
(Print) 9798350353013; 9798350353006
We present S4Former, a novel approach to training vision Transformers for Semi-Supervised Semantic Segmentation (S4). At its core, S4Former employs a vision Transformer within a classic teacher-student framework, and then leverages three novel technical ingredients: PatchShuffle as a parameter-free perturbation technique, Patch-Adaptive Self-Attention (PASA) as a fine-grained feature modulation method, and the innovative Negative Class Ranking (NCR) regularization loss. Based on these regularization modules aligned with Transformer-specific characteristics across the image input, feature, and output dimensions, S4Former exploits the Transformer's ability to capture and differentiate consistent global contextual information in unlabeled images. Overall, S4Former not only defines a new state of the art in S4 but also maintains a streamlined and scalable architecture. Being readily compatible with existing frameworks, S4Former achieves strong improvements (up to 4.9%) on benchmarks like Pascal VOC 2012, COCO, and Cityscapes with varying amounts of labeled data. The code is at https://***/JoyHuYY1412/S4Former.
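A minimal sketch of a PatchShuffle-style parameter-free perturbation is shown below: the image is split into non-overlapping patches whose positions are randomly permuted before the student branch sees them. The patch size and the choice to permute every patch are assumptions; S4Former's exact variant may differ.

```python
# Minimal sketch of a parameter-free patch-shuffling perturbation for the
# student branch of a teacher-student segmentation framework.
import torch

def patch_shuffle(images, patch=16):
    """images: (B, C, H, W) with H, W divisible by `patch`."""
    B, C, H, W = images.shape
    gh, gw = H // patch, W // patch
    # (B, C, gh, patch, gw, patch) -> (B, gh*gw, C, patch, patch)
    patches = (images.reshape(B, C, gh, patch, gw, patch)
                     .permute(0, 2, 4, 1, 3, 5)
                     .reshape(B, gh * gw, C, patch, patch))
    perm = torch.randperm(gh * gw, device=images.device)
    patches = patches[:, perm]
    # Reassemble the shuffled grid back into an image.
    return (patches.reshape(B, gh, gw, C, patch, patch)
                   .permute(0, 3, 1, 4, 2, 5)
                   .reshape(B, C, H, W))

x = torch.rand(2, 3, 224, 224)
print(patch_shuffle(x).shape)  # torch.Size([2, 3, 224, 224])
```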
ISBN:
(Print) 9798350353006
We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention. StructSA generates attention maps by recognizing space-time structures of key-query correlations via convolution and uses them to dynamically aggregate local contexts of value features. This effectively leverages rich structural patterns in images and videos such as scene layouts, object motion, and inter-object relations. Using StructSA as a main building block, we develop the structural vision transformer (StructViT) and evaluate its effectiveness on both image and video classification tasks, achieving state-of-the-art results on ImageNet-1K, Kinetics-400, Something-Something V1 & V2, Diving-48, and FineGym.
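The heavily simplified sketch below conveys the gist: each query's correlation with all keys is reshaped into a 2D map, a small convolution exposes its spatial structure, and the result is used as attention weights over the values. Head counts, kernel sizes, and the single-scale, image-only design are illustrative assumptions rather than the StructSA architecture.

```python
# Drastically simplified sketch of convolving key-query correlation maps to
# produce structure-aware attention, loosely following the StructSA idea.
import torch
import torch.nn as nn

class TinyStructSA(nn.Module):
    def __init__(self, dim, kernel=3):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)
        self.k = nn.Conv2d(dim, dim, 1)
        self.v = nn.Conv2d(dim, dim, 1)
        # Convolution applied to each query's (H, W) correlation map.
        self.struct = nn.Conv2d(1, 1, kernel, padding=kernel // 2)

    def forward(self, x):                                 # x: (B, C, H, W)
        B, C, H, W = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)          # (B, HW, C)
        k = self.k(x).flatten(2)                          # (B, C, HW)
        v = self.v(x).flatten(2).transpose(1, 2)          # (B, HW, C)
        corr = (q @ k) / C ** 0.5                         # (B, HW, HW)
        # Convolve each correlation map to expose its spatial structure.
        corr = self.struct(corr.reshape(B * H * W, 1, H, W))
        attn = corr.reshape(B, H * W, H * W).softmax(dim=-1)
        out = attn @ v                                    # (B, HW, C)
        return out.transpose(1, 2).reshape(B, C, H, W)

layer = TinyStructSA(dim=64)
print(layer(torch.rand(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
```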
ISBN:
(Print) 9798350353013; 9798350353006
Understanding human motion from video is essential for a range of applications, including pose estimation, mesh recovery and action recognition. While state-of-the-art methods predominantly rely on transformer-based architectures, these approaches have limitations in practical scenarios. Transformers are slower when sequentially predicting on a continuous stream of frames in real-time, and do not generalize to new frame rates. In light of these constraints, we propose a novel attention-free spatiotemporal model for human motion understanding building upon recent advancements in state space models. Our model not only matches the performance of transformer-based models in various motion understanding tasks but also brings added benefits like adaptability to different video frame rates and enhanced training speed when working with longer sequences of keypoints. Moreover, the proposed model supports both offline and real-time applications. For real-time sequential prediction, our model is both memory efficient and several times faster than transformer-based approaches while maintaining their high accuracy.
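As a hedged illustration of the attention-free recurrence such models build on, the sketch below runs a tiny diagonal state-space scan over a keypoint sequence; changing the discretization step `dt` is one way such a layer can accommodate a different frame rate. The parameterization and single-layer design are illustrative, not the paper's architecture.

```python
# Minimal sketch of an attention-free state-space layer scanning pose keypoints.
import torch
import torch.nn as nn

class TinySSM(nn.Module):
    def __init__(self, dim, state=16):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(dim, state))        # decay rates
        self.B = nn.Parameter(torch.randn(dim, state) * 0.1)
        self.C = nn.Parameter(torch.randn(dim, state) * 0.1)

    def forward(self, x, dt=1.0):              # x: (B, T, dim), dt ~ 1 / frame rate
        # Discretize the continuous-time system; a different dt lets the same
        # weights run at a different video frame rate.
        A_bar = torch.exp(self.A * dt)                          # (dim, state)
        h = torch.zeros(x.shape[0], *self.A.shape, device=x.device)
        ys = []
        for t in range(x.shape[1]):                             # sequential scan
            h = A_bar * h + self.B * x[:, t].unsqueeze(-1)      # (B, dim, state)
            ys.append((h * self.C).sum(-1))                     # (B, dim)
        return torch.stack(ys, dim=1)                           # (B, T, dim)

poses = torch.rand(4, 81, 17 * 2)              # 81 frames of 17 2D keypoints
print(TinySSM(dim=17 * 2)(poses, dt=1 / 30).shape)   # torch.Size([4, 81, 34])
```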
ISBN:
(Print) 9798350353006
Autonomous driving is a complex and challenging task that aims at safe motion planning through scene understanding and reasoning. While vision-only autonomous driving methods have recently achieved notable performance through enhanced scene understanding, several key issues, including a lack of reasoning, low generalization performance, and long-tail scenarios, still need to be addressed. In this paper, we present VLP, a novel Vision-Language-Planning framework that exploits language models to bridge the gap between linguistic understanding and autonomous driving. VLP enhances autonomous driving systems by strengthening both the source memory foundation and the self-driving car's contextual understanding. VLP achieves state-of-the-art end-to-end planning performance on the challenging NuScenes dataset, with 35.9% and 60.5% reductions in average L2 error and collision rate, respectively, compared to the previous best method. Moreover, VLP shows improved performance in challenging long-tail scenarios and strong generalization capabilities when faced with new urban environments.
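One plausible, purely illustrative way to couple driving features with a language model is a contrastive alignment between pooled planner features and per-scene text embeddings, sketched below; the loss, the feature choices, and the idea of per-scene captions are assumptions, not the components described in the paper.

```python
# Hedged sketch: align driving-stack features with frozen language-model
# embeddings via a standard symmetric InfoNCE loss.
import torch
import torch.nn.functional as F

def align_loss(driving_feats, text_feats, temperature=0.07):
    """driving_feats: (B, D) pooled BEV/ego features from the driving stack;
    text_feats: (B, D) embeddings of per-scene language descriptions."""
    d = F.normalize(driving_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    logits = d @ t.T / temperature
    labels = torch.arange(len(d), device=d.device)   # matched pairs on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

loss = align_loss(torch.rand(8, 256), torch.rand(8, 256))
print(loss.item())
```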
ISBN:
(Print) 9798350353006
We propose a novel method for 3D point cloud action recognition. Understanding human actions in RGB videos has been widely studied in recent years; however, its 3D point cloud counterpart remains under-explored despite the clear value that 3D information may bring. This is mostly due to the inherent limitations of the point cloud data modality (lack of structure, permutation invariance, and a varying number of points), which make it difficult to learn a spatio-temporal representation. To address this limitation, we propose the 3DinAction pipeline, which first estimates patches moving in time (t-patches) as a key building block, alongside a hierarchical architecture that learns an informative spatio-temporal representation. We show that our method achieves improved performance on existing datasets, including DFAUST and IKEA ASM. Code is publicly available at https://***/sitzikbs/3dincaction.
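The simplified sketch below illustrates what a temporal point patch might look like: seed points are sampled, greedily tracked to the nearest point in each subsequent frame, and a k-NN neighborhood is gathered and centered per frame. The seed sampling and the greedy nearest-neighbor tracking are simplifications, not the exact 3DinAction pipeline.

```python
# Simplified sketch of building temporal point patches ("t-patches") from a
# point cloud sequence.
import torch

def t_patches(seq, n_seeds=8, k=16):
    """seq: (T, N, 3) point cloud sequence -> (T, n_seeds, k, 3) patches."""
    T, N, _ = seq.shape
    seeds = seq[0][torch.randperm(N)[:n_seeds]]              # (n_seeds, 3)
    patches = []
    for t in range(T):
        dists = torch.cdist(seeds, seq[t])                   # (n_seeds, N)
        knn = dists.topk(k, largest=False).indices           # k nearest points per seed
        # Track: move each seed to its nearest point in the current frame.
        seeds = seq[t][dists.argmin(dim=1)]
        patches.append(seq[t][knn] - seeds.unsqueeze(1))     # center each patch
    return torch.stack(patches)                              # (T, n_seeds, k, 3)

clip = torch.rand(24, 1024, 3)          # 24 frames of 1024 points
print(t_patches(clip).shape)            # torch.Size([24, 8, 16, 3])
```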
ISBN:
(Print) 9798350353006
Vision-Language Models (VLMs), such as CLIP, exhibit strong image-text comprehension abilities, facilitating advances in several downstream tasks such as zero-shot image classification, image-text retrieval, and text-to-image generation. However, the compositional reasoning abilities of existing VLMs remain subpar. The root of this limitation lies in the inadequate alignment between the images and captions in the pretraining datasets. Additionally, the current contrastive learning objective fails to focus on fine-grained grounding components like relations, actions, and attributes, resulting in "bag-of-words" representations. We introduce a simple and effective method to improve compositional reasoning in VLMs. Our method better leverages available datasets by refining and expanding the standard image-text contrastive learning framework. Our approach does not require specific annotations and does not incur extra parameters. When integrated with CLIP, our technique yields notable improvements over state-of-the-art baselines across five vision-language compositional benchmarks.
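Since the abstract does not spell out the objective, the sketch below shows one common way to expand image-text contrastive learning for compositionality: adding rule-generated hard-negative captions (e.g., with swapped attributes) to the CLIP loss without extra parameters. Treat the negative-generation rule and the loss form as assumptions, not the paper's method.

```python
# Hedged sketch: CLIP-style contrastive loss augmented with one hard-negative
# caption per image, a common recipe for discouraging bag-of-words behavior.
import torch
import torch.nn.functional as F

def contrastive_with_negatives(img, txt, hard_neg_txt, temperature=0.07):
    """img, txt, hard_neg_txt: (B, D) embeddings; hard_neg_txt[i] is a
    perturbed caption for image i (e.g., swapped attributes or relations)."""
    img, txt, neg = (F.normalize(x, dim=-1) for x in (img, txt, hard_neg_txt))
    logits = img @ txt.T                                  # standard in-batch negatives
    hard = (img * neg).sum(-1, keepdim=True)              # each image's own hard negative
    logits = torch.cat([logits, hard], dim=1) / temperature
    labels = torch.arange(len(img), device=img.device)    # positives on the diagonal
    return F.cross_entropy(logits, labels)

B, D = 16, 512
loss = contrastive_with_negatives(torch.rand(B, D), torch.rand(B, D), torch.rand(B, D))
print(loss.item())
```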
ISBN:
(Print) 9798350353013; 9798350353006
Modern video generation models like Sora have achieved remarkable success in producing high-quality videos. However, a significant limitation is their inability to offer interactive control to users, a feature that promises to open up unprecedented applications and creativity. In this work, we introduce the first solution to equip diffusion-based video generation models with spatio-temporal control. We present PEEKABOO, a novel masked attention module, which seamlessly integrates with current video generation models offering control without the need for additional training or inference overhead. To facilitate future research, we also introduce a comprehensive benchmark for interactive video generation. This benchmark offers a standardized framework for the community to assess the efficacy of emerging interactive video generation models. Our extensive qualitative and quantitative assessments reveal that PEEKABOO achieves up to a 3.8x improvement in mIoU over baseline models, all while maintaining the same latency. Code and benchmark are available on the webpage.
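A hedged sketch of training-free masked cross-attention for spatial control is given below: latent-pixel queries inside a user-supplied region attend only to the prompt tokens describing the object, and background pixels attend only to the remaining tokens. How PEEKABOO constructs its masks from user input and which attention layers it patches are not captured here.

```python
# Hedged sketch of masked cross-attention for spatial control in a diffusion model.
import torch
import torch.nn.functional as F

def masked_cross_attention(q, k, v, region_mask, fg_token_mask):
    """q: (B, Np, D) latent-pixel queries; k, v: (B, Nt, D) text keys/values;
    region_mask: (B, Np) bool, True where the user wants the object;
    fg_token_mask: (B, Nt) bool, True for prompt tokens describing the object."""
    logits = q @ k.transpose(1, 2) / q.shape[-1] ** 0.5     # (B, Np, Nt)
    # Block foreground tokens outside the region and background tokens inside it.
    allowed = region_mask.unsqueeze(-1) == fg_token_mask.unsqueeze(1)
    logits = logits.masked_fill(~allowed, float("-inf"))
    return F.softmax(logits, dim=-1) @ v

B, Np, Nt, D = 1, 64, 8, 32
region = torch.zeros(B, Np, dtype=torch.bool); region[:, :16] = True
fg_tok = torch.zeros(B, Nt, dtype=torch.bool); fg_tok[:, :3] = True
out = masked_cross_attention(torch.rand(B, Np, D), torch.rand(B, Nt, D),
                             torch.rand(B, Nt, D), region, fg_tok)
print(out.shape)  # torch.Size([1, 64, 32])
```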