Search Results - Inner Mongolia University Library

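Advanced search help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author 范并思, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not Visual, years 2011-2016)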
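
The field codes and operator rules above can be sketched as a small query builder. This is a minimal illustration, not part of the library's system: the helper names `term`, `combine`, and `group` are hypothetical, and only the field codes, the uppercase operators, and the single-space padding come from the help text.

```python
# Hypothetical helpers; only the field codes and the AND/OR/NOT rules
# are taken from the catalog's help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator padded by single spaces."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuilds Example 1 from the help text.
query = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase; whether the catalog itself rejects or silently ignores a lowercase operator is not stated in the help text.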


Refine search results

Document type

11,267 conference papers
14 journal articles

Collection scope

11,281 electronic documents
0 print holdings

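Subject classification

7,859 papers: Engineering
- 7,418 papers: Computer Science and Technology...
- 799 papers: Mechanical Engineering
- 390 papers: Electrical Engineering
- 377 papers: Software Engineering
- 224 papers: Control Science and Engineering
- 68 papers: Optical Engineering
- 32 papers: Information and Communication Engineering
- 26 papers: Bioengineering
- 10 papers: Biomedical Engineering (may confer...
- 8 papers: Chemical Engineering and Technology
- 7 papers: Electronic Science and Technology (may...
- 6 papers: Transportation Engineering
- 5 papers: Safety Science and Engineering
- 3 papers: Instrument Science and Technology
- 2 papers: Mechanics (may confer Engineering, Sci...
- 2 papers: Materials Science and Engineering (may...
- 2 papers: Power Engineering and Engineering Thermo...
- 2 papers: Aeronautical and Astronautical Science and Tech...
3,103 papers: Medicine
- 3,102 papers: Clinical Medicine
- 4 papers: Basic Medicine (may confer Medicine...
297 papers: Science
- 199 papers: Systems Science
- 69 papers: Physics
- 27 papers: Biology
- 24 papers: Mathematics
- 9 papers: Statistics (may confer Science, ...
- 7 papers: Chemistry
23 papers: Management
- 14 papers: Library, Information and Archives Manage...
- 9 papers: Management Science and Engineering (may...
- 4 papers: Business Administration
6 papers: Law
- 6 papers: Sociology
2 papers: Agriculture
1 paper: Education
1 paper: Arts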

Topic

5,461 papers: computer vision
2,564 papers: training
2,118 papers: pattern recognit...
1,632 papers: computational mo...
1,454 papers: visualization
1,325 papers: three-dimensiona...
1,070 papers: semantics
972 papers: codes
968 papers: benchmark testin...
930 papers: computer archite...
885 papers: deep learning
831 papers: task analysis
729 papers: feature extracti...
541 papers: conferences
530 papers: neural networks
526 papers: face recognition
503 papers: transformers
480 papers: object detection
478 papers: image segmentati...
469 papers: cameras

Institution

169 papers: univ sci & techn...
146 papers: tsinghua univ pe...
142 papers: univ chinese aca...
142 papers: carnegie mellon ...
132 papers: chinese univ hon...
122 papers: peng cheng lab p...
102 papers: zhejiang univ pe...
96 papers: sensetime res pe...
95 papers: swiss fed inst t...
90 papers: shanghai ai lab ...
86 papers: tsinghua univers...
86 papers: stanford univ st...
84 papers: shanghai jiao to...
80 papers: zhejiang univers...
79 papers: alibaba grp peop...
79 papers: univ hong kong p...
76 papers: peng cheng labor...
76 papers: tech univ munich...
74 papers: australian natl ...
73 papers: peking univ peop...

Author

67 papers: timofte radu
60 papers: van gool luc
50 papers: zhang lei
43 papers: yang yi
36 papers: loy chen change
36 papers: tao dacheng
31 papers: liu yang
30 papers: zhou jie
30 papers: chen chen
30 papers: tian qi
29 papers: sun jian
28 papers: zha zheng-jun
27 papers: qi tian
27 papers: boxin shi
26 papers: li xin
26 papers: vasconcelos nuno
26 papers: pollefeys marc
24 papers: liu xiaoming
24 papers: zheng wei-shi
24 papers: luo ping

Language

11,274 papers: English
6 papers: Other
1 paper: Chinese

Search condition: "Any field = 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020"

11,281 records in total; showing records 51-60


Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ishmam, Alvi Md; Thomas, Christopher
Affiliations: Virginia Tech Blacksburg VA 24061 USA

ISBN: (print) 9798350353006

In recent years there has been enormous interest in vision-language models trained using self-supervised objectives. However, the use of large-scale datasets scraped from the web for training also makes these models vulnerable to potential security threats, such as backdooring and poisoning attacks. In this paper, we propose a method for mitigating such attacks on contrastively trained vision-language models. Our approach leverages external knowledge extracted from a language model to prevent models from learning correlations between image regions which lack strong alignment with external knowledge. We do this by imposing constraints to enforce that attention paid by the model to visual regions is proportional to the alignment of those regions with external knowledge. We conduct extensive experiments using a variety of recent backdooring and poisoning attacks on multiple datasets and architectures. Our results clearly demonstrate that our proposed approach is highly effective at defending against such attacks across multiple settings, while maintaining model utility and without requiring any changes at inference time.

Keywords: adversarial attack and defense; vision language model


Training Vision Transformers for Semi-Supervised Semantic Segmentation


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hu, Xinting; Jiang, Li; Schiele, Bernt
Affiliations: Max Planck Inst Informat Saarland Informat Campus Munich Germany

ISBN: (print) 9798350353013; 9798350353006

We present S4Former, a novel approach to training Vision Transformers for Semi-Supervised Semantic Segmentation (S4). At its core, S4Former employs a Vision Transformer within a classic teacher-student framework, and then leverages three novel technical ingredients: PatchShuffle as a parameter-free perturbation technique, Patch-Adaptive Self-Attention (PASA) as a fine-grained feature modulation method, and the innovative Negative Class Ranking (NCR) regularization loss. Based on these regularization modules aligned with Transformer-specific characteristics across the image input, feature, and output dimensions, S4Former exploits the Transformer's ability to capture and differentiate consistent global contextual information in unlabeled images. Overall, S4Former not only defines a new state of the art in S4 but also maintains a streamlined and scalable architecture. Being readily compatible with existing frameworks, S4Former achieves strong improvements (up to 4.9%) on benchmarks like Pascal VOC 2012, COCO, and Cityscapes, with varying numbers of labeled data. The code is at https://***/JoyHuYY1412/S4Former.


3DInAction: Understanding Human Actions in 3D Point Clouds


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ben-Shabat, Yizhak; Shrout, Oren; Gould, Stephen
Affiliations: Australian Natl Univ Canberra ACT Australia; Technion Israel Inst Technol Haifa Israel

ISBN: (print) 9798350353006

We propose a novel method for 3D point cloud action recognition. Understanding human actions in RGB videos has been widely studied in recent years; however, its 3D point cloud counterpart remains under-explored despite the clear value that 3D information may bring. This is mostly due to the inherent limitations of the point cloud data modality (lack of structure, permutation invariance, and varying number of points), which make it difficult to learn a spatio-temporal representation. To address these limitations, we propose the 3DInAction pipeline that first estimates patches moving in time (t-patches) as a key building block, alongside a hierarchical architecture that learns an informative spatio-temporal representation. We show that our method achieves improved performance on existing datasets, including DFAUST and IKEA ASM. Code is publicly available at https://***/sitzikbs/3dincaction.

Keywords: 3D action recognition; point clouds; spatio-temporal representation; temporal patches


HumMUSS: Human Motion Understanding using State Space Models


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Mondal, Arnab; Alletto, Stefano; Tome, Denis
Affiliations: Mila Montreal PQ Canada; Apple Cupertino CA 95014 USA

ISBN: (print) 9798350353013; 9798350353006

Understanding human motion from video is essential for a range of applications, including pose estimation, mesh recovery and action recognition. While state-of-the-art methods predominantly rely on transformer-based architectures, these approaches have limitations in practical scenarios. Transformers are slower when sequentially predicting on a continuous stream of frames in real-time, and do not generalize to new frame rates. In light of these constraints, we propose a novel attention-free spatiotemporal model for human motion understanding building upon recent advancements in state space models. Our model not only matches the performance of transformer-based models in various motion understanding tasks but also brings added benefits like adaptability to different video frame rates and enhanced training speed when working with longer sequences of keypoints. Moreover, the proposed model supports both offline and real-time applications. For real-time sequential prediction, our model is both memory efficient and several times faster than transformer-based approaches while maintaining their high accuracy.

Keywords: action recognition; human motion understanding; mesh recovery; pose estimation; self-supervised learning; spatiotemporal modeling; state space models


Learning Correlation Structures for Vision Transformers


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kim, Manjin; Seo, Paul Hongsuck; Schmid, Cordelia; Cho, Minsu
Affiliations: POSTECH Pohang South Korea; Korea Univ Seoul South Korea; Google Res Mountain View CA USA

ISBN: (print) 9798350353006

We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention. StructSA generates attention maps by recognizing space-time structures of key-query correlations via convolution and uses them to dynamically aggregate local contexts of value features. This effectively leverages rich structural patterns in images and videos such as scene layouts, object motion, and inter-object relations. Using StructSA as a main building block, we develop the structural vision transformer (StructViT) and evaluate its effectiveness on both image and video classification tasks, achieving state-of-the-art results on ImageNet-1K, Kinetics-400, Something-Something V1 & V2, Diving-48, and FineGym.

Keywords: correlation modeling; image classification; self-attention; video classification; Vision Transformers; visual representation learning


PEEKABOO: Interactive Video Generation via Masked-Diffusion


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jain, Yash; Nasery, Anshul; Vineet, Vibhav; Behl, Harkirat
Affiliations: Microsoft Redmond WA 98052 USA; Univ Washington Seattle WA USA

ISBN: (print) 9798350353013; 9798350353006

Modern video generation models like Sora have achieved remarkable success in producing high-quality videos. However, a significant limitation is their inability to offer interactive control to users, a feature that promises to open up unprecedented applications and creativity. In this work, we introduce the first solution to equip diffusion-based video generation models with spatio-temporal control. We present PEEKABOO, a novel masked attention module, which seamlessly integrates with current video generation models offering control without the need for additional training or inference overhead. To facilitate future research, we also introduce a comprehensive benchmark for interactive video generation. This benchmark offers a standardized framework for the community to assess the efficacy of emerging interactive video generation models. Our extensive qualitative and quantitative assessments reveal that PEEKABOO achieves up to a 3.8x improvement in mIoU over baseline models, all while maintaining the same latency. Code and benchmark are available on the webpage.

Keywords: computer vision; diffusion; interactive text to video; video generation


VLP: Vision Language Planning for Autonomous Driving


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Pan, Chenbin; Yaman, Burhaneddin; Nesti, Tommaso; Mallik, Abhirup; Allievi, Alessandro G.; Velipasalar, Senem; Rene, Liu
Affiliations: Syracuse Univ Syracuse NY USA; Bosch Res North Amer & Bosch Ctr Artificial Intel Sunnyvale CA 94085 USA

ISBN: (print) 9798350353006

Autonomous driving is a complex and challenging task that aims at safe motion planning through scene understanding and reasoning. While vision-only autonomous driving methods have recently achieved notable performance through enhanced scene understanding, several key issues, including lack of reasoning, low generalization performance and long-tail scenarios, still need to be addressed. In this paper, we present VLP, a novel Vision-Language-Planning framework that exploits language models to bridge the gap between linguistic understanding and autonomous driving. VLP enhances autonomous driving systems by strengthening both the source memory foundation and the self-driving car's contextual understanding. VLP achieves state-of-the-art end-to-end planning performance on the challenging NuScenes dataset, reducing average L2 error and collision rate by 35.9% and 60.5%, respectively, compared to the previous best method. Moreover, VLP shows improved performance in challenging long-tail scenarios and strong generalization capabilities when faced with new urban environments.

Keywords: visual languages


MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Farina, Matteo; Mancini, Massimiliano; Cunegatti, Elia; Liu, Gaowen; Iacca, Giovanni; Ricci, Elisa
Affiliations: Univ Trento Trento Italy; Cisco Res Res Triangle Pk NC USA; Fdn Bruno Kessler Povo Italy

ISBN: (print) 9798350353006

While excellent in transfer learning, Vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first, gradient-free, pruning framework for TA-VLP where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent sophisticated, combinatorial competitors in the vast majority of the cases, paving the way towards addressing TA-VLP. The code is publicly available at https://***/FarinaMatteo/multiflow.

Keywords: multimodal learning; neural network pruning; sparse neural networks; transfer learning; vision-language models


Contrasting intra-modal and ranking cross-modal hard negatives to enhance visio-linguistic compositional understanding


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Le; Awal, Rabiul; Agrawal, Aishwarya
Affiliations: Mila Quebec AI Inst Montreal PQ Canada; Univ Montreal Montreal PQ Canada; Canada CIFAR AI Chair Montreal PQ Canada

ISBN: (print) 9798350353006

Vision-Language Models (VLMs), such as CLIP, exhibit strong image-text comprehension abilities, facilitating advances in several downstream tasks such as zero-shot image classification, image-text retrieval, and text-to-image generation. However, the compositional reasoning abilities of existing VLMs remain subpar. The root of this limitation lies in the inadequate alignment between the images and captions in the pretraining datasets. Additionally, the current contrastive learning objective fails to focus on fine-grained grounding components like relations, actions, and attributes, resulting in "bag-of-words" representations. We introduce a simple and effective method to improve compositional reasoning in VLMs. Our method better leverages available datasets by refining and expanding the standard image-text contrastive learning framework. Our approach does not require specific annotations and does not incur extra parameters. When integrated with CLIP, our technique yields notable improvement over state-of-the-art baselines across five vision-language compositional benchmarks.

Keywords: compositional understanding; contrastive learning; vision-language models


Object Recognition as Next Token Prediction


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yue, Kaiyu; Chen, Bor-Chun; Geiping, Jonas; Li, Hengduo; Goldstein, Tom; Lim, Ser-Nam
Affiliations: Meta Menlo Pk CA 94025 USA; Univ Maryland College Pk MD 20742 USA; ELLIS Inst Tubingen Germany; MPI IS Tubingen Tubingen Germany; Univ Cent Florida Orlando FL 32816 USA; Meta AI Menlo Pk CA USA

ISBN: (print) 9798350353006

We present an approach to pose object recognition as next token prediction. The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels. To ground this prediction process in auto-regression, we customize a non-causal attention mask for the decoder, incorporating two key features: modeling tokens from different labels to be independent, and treating image tokens as a prefix. This masking mechanism inspires an efficient method - one-shot sampling - to simultaneously sample tokens of multiple labels in parallel and rank generated labels by their probabilities during inference. To further enhance the efficiency, we propose a simple strategy to construct a compact decoder by simply discarding the intermediate blocks of a pretrained language model. This approach yields a decoder that matches the full model's performance while being notably more efficient. The code is available at ***/kaiyuyue/nxtp.

Keywords: decoding



Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
