ISBN (Print): 9798350353006
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next-token prediction. By training across various scales of model architecture and data diversity, we provide empirical evidence that our models scale effectively. Many different vision tasks can be solved by designing suitable visual prompts at test time.
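As a rough illustration of the training recipe described in this abstract, the sketch below (PyTorch, with an assumed discrete visual tokenizer and an illustrative decoder-only transformer, not the authors' code) trains on a "visual sentence" by minimizing cross-entropy for next-token prediction:

```python
# Minimal sketch: a "visual sentence" is a sequence of discrete visual tokens
# (from an assumed VQ-style tokenizer; random ids stand in here), and the model
# is trained with a causal mask to predict the next token under cross-entropy.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 8192, 512, 256

class TinyLVM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # Causal mask: each position attends only to earlier visual tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.head(self.blocks(self.embed(tokens), mask=mask))

model = TinyLVM()
sentence = torch.randint(0, vocab_size, (2, seq_len))   # stand-in visual token ids
logits = model(sentence[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size),
                                   sentence[:, 1:].reshape(-1))
loss.backward()
```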
ISBN (Print): 9798350353006
The increasing use of transformer-based large language models brings forward the challenge of processing long sequences. In document visual question answering (DocVQA), leading methods focus on the single-page setting, while documents can span hundreds of pages. We present GRAM, a method that seamlessly extends pre-trained single-page models to the multi-page setting, without requiring computationally heavy pretraining. To do so, we leverage a single-page encoder for local page-level understanding, and enhance it with document-level designated layers and learnable tokens, facilitating the flow of information across pages for global reasoning. To ensure that our model utilizes the newly introduced document tokens, we propose a tailored bias adaptation method. For additional computational savings during decoding, we introduce an optional compression stage using our compression transformer (C-Former), reducing the encoded sequence length and thereby allowing a trade-off between quality and latency. Extensive experiments showcase GRAM's state-of-the-art performance on the benchmarks for multi-page DocVQA, demonstrating the effectiveness of our approach.
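A hedged sketch of the multi-page idea described above, written in PyTorch with illustrative module names and sizes (not GRAM's actual implementation): each page is encoded independently together with a learnable document token, and a document-level layer then lets those tokens exchange information across pages.

```python
# Illustrative only: single-page encoding per page, plus learnable document tokens
# that carry information across pages through a document-level layer.
import torch
import torch.nn as nn

d, pages, toks_per_page = 256, 4, 64

page_encoder = nn.TransformerEncoder(   # stand-in for a pre-trained single-page encoder
    nn.TransformerEncoderLayer(d, nhead=8, batch_first=True), num_layers=2)
doc_layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)  # document-level layer
doc_tokens = nn.Parameter(torch.randn(pages, 1, d))                   # one learnable token per page

page_feats = torch.randn(pages, toks_per_page, d)                 # per-page OCR/layout features
local = page_encoder(torch.cat([doc_tokens, page_feats], dim=1))  # page-level understanding
doc_seq = local[:, 0].unsqueeze(0)                                # gather document tokens: (1, pages, d)
global_ctx = doc_layer(doc_seq)                                   # cross-page information flow
print(global_ctx.shape)                                           # torch.Size([1, 4, 256])
```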
ISBN (Print): 9798350353006
Large vision-language models (VLMs) like CLIP have demonstrated good zero-shot learning performance in the unsupervised domain adaptation task. Yet, most transfer approaches for VLMs focus on either the language or visual branches, overlooking the nuanced interplay between both modalities. In this work, we introduce a Unified Modality Separation (UniMoS) framework for unsupervised domain adaptation. Leveraging insights from modality gap studies, we craft a nimble modality separation network that distinctly disentangles CLIP's features into language-associated and vision-associated components. Our proposed Modality-Ensemble Training (MET) method fosters the exchange of modality-agnostic information while maintaining modality-specific nuances. We align features across domains using a modality discriminator. Comprehensive evaluations on three benchmarks reveal our approach sets a new state-of-the-art with minimal computational costs. Code: https://***/TL-UESTC/UniMoS.
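The sketch below is a rough, assumed rendering of the modality-separation idea (dimensions, heads, and the discriminator are illustrative; the actual design is in the UniMoS repository linked above): two lightweight heads split a frozen CLIP visual feature into language-associated and vision-associated components, with a small discriminator available for cross-domain alignment.

```python
# Illustrative modality separation of a CLIP visual feature (not the UniMoS code).
import torch
import torch.nn as nn

d = 512                                             # assumed CLIP embedding size
lang_head = nn.Linear(d, d)                         # language-associated component
vis_head = nn.Linear(d, d)                          # vision-associated component
domain_disc = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 1))

clip_feat = torch.randn(8, d)                       # features from a frozen CLIP image encoder
f_lang, f_vis = lang_head(clip_feat), vis_head(clip_feat)

text_protos = torch.randn(10, d)                    # CLIP text embeddings of class prompts
logits_lang = f_lang @ text_protos.t()              # language branch: text-prototype logits
domain_logit = domain_disc(f_vis)                   # vision branch: adversarial domain signal
```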
ISBN (Print): 9781538661000
The proceedings contain 307 papers. The topics discussed include: deep features for recognizing disguised faces in the wild; unconstrained fingerphoto database; hybrid user-independent and user-dependent offline signature verification with a two-channel CNN; it takes two to tango: cascading off-the-shelf face detectors; time analysis of pulse-based face anti-spoofing in visible and NIR; a deep face identification network enhanced by facial attributes prediction; gait recognition by deformable registration; and fusion of handcrafted and deep learning features for large-scale multiple iris presentation attack detection.
ISBN (Print): 9798350353006
Transductive inference has been widely investigated in few-shot image classification, but completely overlooked in the recent, fast-growing literature on adapting vision-language models like CLIP. This paper addresses the transductive zero-shot and few-shot CLIP classification challenge, in which inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating each instance independently. We initially construct informative vision-text probability features, leading to a classification problem on the unit simplex set. Inspired by Expectation-Maximization (EM), our optimization-based classification objective models the data probability distribution for each class using a Dirichlet law. The minimization problem is then tackled with a novel block Majorization-Minimization algorithm, which simultaneously estimates the distribution parameters and class assignments. Extensive numerical experiments on 11 datasets underscore the benefits and efficacy of our batch inference approach. On zero-shot tasks with test batches of 75 samples, our approach yields nearly a 20% improvement in ImageNet accuracy over CLIP's zero-shot performance. Additionally, we outperform state-of-the-art methods in the few-shot setting. The code is available at: https://***/SegoleneMartin/transductive-CLIP.
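For intuition only, the toy loop below mimics the kind of transductive assignment described above: softmaxed vision-text similarities live on the simplex, each class is modeled with a Dirichlet distribution, and soft assignments are refreshed from the class log-likelihoods. The parameter update here is a simplified moment-style heuristic, not the authors' block Majorization-Minimization algorithm.

```python
# Toy EM-like loop over simplex-valued features; not the paper's MM algorithm.
import torch
from torch.distributions import Dirichlet

K, B = 10, 75                                           # classes, query batch size
probs = torch.softmax(torch.randn(B, K) * 3, dim=-1)    # vision-text probability features
alphas = torch.ones(K, K) + 5 * torch.eye(K)            # one Dirichlet concentration per class

for _ in range(10):
    # E-like step: responsibilities from per-class Dirichlet log-densities.
    loglik = torch.stack([Dirichlet(a).log_prob(probs) for a in alphas], dim=1)
    assign = torch.softmax(loglik, dim=-1)
    # M-like step (simplified): pull each class's concentration toward its soft mean.
    means = (assign.t() @ probs) / assign.sum(0, keepdim=True).t().clamp_min(1e-6)
    alphas = 1.0 + 20.0 * means

pred = assign.argmax(dim=-1)                            # transductive class assignments
```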
ISBN (Print): 9798350353006
Given the power of vision transformers, a new learning paradigm, pretraining and then prompting, makes it more efficient and effective to address downstream visual recognition tasks. In this paper, we identify a novel security threat towards such a paradigm from the perspective of backdoor attacks. Specifically, an extra prompt token, called the switch token in this work, can turn the backdoor mode on, i.e., converting a benign model into a backdoored one. Once under the backdoor mode, a specific trigger can force the model to predict a target class. It poses a severe risk to the users of cloud APIs, since the malicious behavior cannot be activated or detected under the benign mode, making the attack very stealthy. To attack a pretrained model, our proposed attack, named SWARM, learns a trigger and prompt tokens including a switch token. They are optimized with a clean loss, which encourages the model to behave normally even when the trigger is present, and a backdoor loss, which ensures the backdoor can be activated by the trigger when the switch is on. Besides, we utilize cross-mode feature distillation to reduce the effect of the switch token on clean samples. Experiments on diverse visual recognition tasks confirm the success of our switchable backdoor attack, i.e., achieving a 95%+ attack success rate while remaining hard to detect and remove. Our code is available at https://***/20000yshust/SWARM.
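The toy code below only illustrates the two objectives named in the abstract, the clean loss and the backdoor loss, using an invented prompted classifier and an additive trigger; it is not SWARM's implementation, and the cross-mode feature distillation term is omitted.

```python
# Illustration of the clean/backdoor objectives with a stand-in prompted classifier.
import torch
import torch.nn.functional as F

class ToyPromptedModel(torch.nn.Module):
    """Flattens the image, appends mean-pooled prompt tokens, then classifies."""
    def __init__(self, num_classes=10, img_dim=3 * 32 * 32, p_dim=16):
        super().__init__()
        self.fc = torch.nn.Linear(img_dim + p_dim, num_classes)
    def forward(self, x, prompts):
        p = prompts.mean(dim=0).expand(x.size(0), -1)
        return self.fc(torch.cat([x.flatten(1), p], dim=1))

model = ToyPromptedModel()
prompts = torch.randn(4, 16, requires_grad=True)     # learnable clean prompt tokens
switch_tok = torch.randn(1, 16, requires_grad=True)  # the extra "switch" token
trigger = 0.1 * torch.randn(1, 3, 32, 32, requires_grad=True)
x, y, target_cls = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)), 0

x_trig = x + trigger
# Clean loss: without the switch token, behave normally even when the trigger is present.
clean = F.cross_entropy(model(x, prompts), y) + F.cross_entropy(model(x_trig, prompts), y)
# Backdoor loss: with the switch token appended, the trigger forces the target class.
backdoor = F.cross_entropy(model(x_trig, torch.cat([prompts, switch_tok])),
                           torch.full_like(y, target_cls))
(clean + backdoor).backward()
```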
ISBN (Print): 9798350365474
Neonatal resuscitations demand an exceptional level of attentiveness from providers, who must process multiple streams of information simultaneously. Gaze strongly influences decision making; thus, understanding where a provider is looking during neonatal resuscitations could inform provider training, enhance real-time decision support, and improve the design of delivery rooms and neonatal intensive care units (NICUs). Current approaches to quantifying neonatal providers' gaze rely on manual coding or simulations, which limit scalability and utility. Here, we introduce an automated, real-time, deep learning approach capable of decoding provider gaze into semantic classes directly from first-person point-of-view videos recorded during live resuscitations. Combining state-of-the-art, real-time segmentation with vision-language models, our low-shot pipeline attains 91% classification accuracy in identifying gaze targets without training. Upon fine-tuning, the performance of our gaze-guided vision transformer exceeds 98% accuracy in semantic gaze analysis, approaching human-level precision. This system, capable of real-time inference, enables objective quantification of provider attention dynamics during live neonatal resuscitation. Our approach offers a scalable solution that seamlessly integrates with existing infrastructure for data-scarce gaze analysis, thereby offering new opportunities for understanding and refining clinical decision making.
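For a sense of how such a pipeline can be wired together, the toy example below uses stand-in masks and embeddings in place of a real-time segmenter and vision-language model (the names, shapes, and classes are assumptions, not the authors' pipeline): find the segment containing the gaze point, then score its crop against text embeddings of the semantic classes.

```python
# Stand-in pipeline: gaze point -> containing segment -> zero-shot class by cosine similarity.
import torch
import torch.nn.functional as F

classes = ["infant", "monitor", "equipment", "caregiver"]   # assumed gaze-target classes
H, W = 480, 640
masks = torch.zeros(3, H, W, dtype=torch.bool)              # segments from a segmenter (stand-in)
masks[0, 100:300, 200:400] = True
masks[1, 0:100, :] = True
masks[2, 350:480, 500:640] = True
gaze_xy = (250, 180)                                        # eye-tracker gaze point (x, y)

# 1) Which segment is the provider looking at?
hits = [i for i, m in enumerate(masks) if m[gaze_xy[1], gaze_xy[0]]]
seg_id = hits[0] if hits else None

# 2) Zero-shot classification of that segment's crop with (stand-in) embeddings.
img_emb = F.normalize(torch.randn(1, 512), dim=-1)          # crop embedding from a VLM
txt_emb = F.normalize(torch.randn(len(classes), 512), dim=-1)
pred = classes[(img_emb @ txt_emb.t()).argmax().item()]
print(seg_id, pred)
```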
ISBN (Print): 9798350353013; 9798350353006
CLIP has demonstrated marked progress in visual recognition due to its powerful pre-training on large-scale image-text pairs. However, a critical challenge still remains: how to transfer image-level knowledge into pixel-level understanding tasks such as semantic segmentation. In this paper, to solve this challenge, we analyze the gap between the capability of the CLIP model and the requirements of the zero-shot semantic segmentation task. Based on our analysis and observations, we propose a novel method for zero-shot semantic segmentation, dubbed CLIP-RC (CLIP with Regional Clues), which brings two main insights. On the one hand, a region-level bridge is necessary to provide fine-grained semantics. On the other hand, overfitting should be mitigated during the training stage. Benefiting from the above discoveries, CLIP-RC achieves state-of-the-art performance on various zero-shot semantic segmentation benchmarks, including PASCAL VOC, PASCAL Context, and COCO-Stuff 164K. Code will be available at https://***/Jittor/JSeg.
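Purely as an illustration of what a region-level bridge can look like (shapes and encoders are assumptions, not CLIP-RC's architecture), dense patch features from a ViT-style image encoder can be scored against class text embeddings and upsampled into a pixel-level prediction:

```python
# Region-level semantics: per-patch features scored against class text embeddings,
# then upsampled to pixel resolution. Illustrative shapes, not the paper's model.
import torch
import torch.nn.functional as F

num_classes, d = 21, 512                                      # e.g. PASCAL VOC classes
patch_feats = F.normalize(torch.randn(1, 14 * 14, d), dim=-1) # dense region features
text_embs = F.normalize(torch.randn(num_classes, d), dim=-1)  # class prompt embeddings

region_logits = patch_feats @ text_embs.t()                   # (1, 196, num_classes)
seg_map = F.interpolate(region_logits.view(1, 14, 14, num_classes).permute(0, 3, 1, 2),
                        size=(224, 224), mode="bilinear", align_corners=False)
pixel_pred = seg_map.argmax(dim=1)                            # (1, 224, 224) pixel-level labels
```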
ISBN (Print): 9798350353006
Vision-language (VL) models have achieved unprecedented success recently, in which the connection module is the key to bridging the modality gap. Nevertheless, the abundant visual clues are not sufficiently exploited in most existing methods. On the vision side, most existing approaches only use the last feature of the vision tower, without using the low-level features. On the language side, most existing methods only introduce shallow vision-language interactions. In this paper, we present a vision-inspired vision-language connection module, dubbed VIVL, which efficiently exploits the vision cue for VL models. To take advantage of the lower-level information from the vision tower, a feature pyramid extractor (FPE) is introduced to combine features from different intermediate layers, which enriches the visual cue with negligible parameter and computation overhead. To enhance VL interactions, we propose deep vision-conditioned prompts (DVCP) that allow deep interactions of vision and language features efficiently. Our VIVL exceeds the previous state-of-the-art method by 18.1 CIDEr when training from scratch on the COCO caption task, which greatly improves data efficiency. When used as a plug-in module, VIVL consistently improves the performance for various backbones and VL frameworks, delivering new state-of-the-art results on multiple benchmarks, e.g., NoCaps and VQAv2.
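The schematic below sketches the two components named in the abstract in PyTorch, with all module names, sizes, and the attention-based prompt generator being assumptions rather than the VIVL implementation: a feature-pyramid-style extractor that mixes intermediate vision-tower layers, and vision-conditioned prompt tokens intended to be inserted into the language side.

```python
# Illustrative FPE-like fusion of intermediate layers and vision-conditioned prompt tokens.
import torch
import torch.nn as nn

d_v, d_l, n_prompt = 768, 512, 8

class FeaturePyramidExtractor(nn.Module):
    """Combine features from several intermediate vision-tower layers."""
    def __init__(self, num_levels=3):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d_v, d_l) for _ in range(num_levels))
    def forward(self, level_feats):                  # list of (B, N, d_v) tensors
        return torch.stack([p(f) for p, f in zip(self.proj, level_feats)]).mean(0)

class VisionConditionedPrompts(nn.Module):
    """Produce prompt tokens for a language layer, conditioned on visual features."""
    def __init__(self):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, n_prompt, d_l))
        self.attn = nn.MultiheadAttention(d_l, num_heads=8, batch_first=True)
    def forward(self, vis):                          # vis: (B, N, d_l)
        q = self.query.expand(vis.size(0), -1, -1)
        prompts, _ = self.attn(q, vis, vis)
        return prompts                               # (B, n_prompt, d_l)

levels = [torch.randn(2, 197, d_v) for _ in range(3)]   # intermediate ViT layer outputs
fused = FeaturePyramidExtractor()(levels)
prompt_tokens = VisionConditionedPrompts()(fused)        # to be inserted into language layers
```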
ISBN (Print): 9798350353006
Recent progress in Vision-Language (VL) foundation models has revealed the great advantages of cross-modality learning. However, due to a large gap between vision and text, they might not be able to sufficiently utilize the benefits of cross-modality information. In the field of human action recognition, the additional pose modality may bridge the gap between vision and text to improve the effectiveness of cross-modality learning. In this paper, we propose a novel framework, called the Pose-enhanced Vision-Language (PeVL) model, to adapt the VL model with the pose modality to learn effective knowledge of fine-grained human actions. Our PeVL model includes two novel components: an Unsymmetrical Cross-Modality Refinement (UCMR) block and a Semantic-Guided Multi-level Contrastive (SGMC) module. The UCMR block includes Pose-guided Visual Refinement (P2V-R) and Visual-enriched Pose Refinement (V2P-R) for effective cross-modality learning. The SGMC module includes multi-level contrastive associations of vision-text and pose-text at both action and sub-action levels, and a Semantic-Guided Loss, enabling effective contrastive learning with text. Built upon a pre-trained VL foundation model, our model integrates trainable adapters and can be trained end-to-end. Our novel PeVL design over the VL foundation model yields remarkable performance gains on four fine-grained human action recognition datasets, achieving a new SOTA with a remarkably small number of FLOPs for low-cost re-training.
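As a toy rendering of the action-level contrastive associations mentioned above (an InfoNCE-style loss between stand-in vision, pose, and text features; the sub-action level, the semantic-guided loss, and the UCMR refinement are not reproduced):

```python
# Symmetric InfoNCE between vision-text and pose-text pairs at the action level.
import torch
import torch.nn.functional as F

def info_nce(a, b, temp=0.07):
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temp
    targets = torch.arange(a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

B, d = 16, 512
vis = torch.randn(B, d, requires_grad=True)    # video features from the VL model's vision tower
pose = torch.randn(B, d, requires_grad=True)   # skeleton/pose features
text = torch.randn(B, d)                       # text embeddings of the action descriptions

loss = info_nce(vis, text) + info_nce(pose, text)   # action-level associations
loss.backward()
```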