检索结果-内蒙古大学图书馆

您好，读者！请登录

内蒙古大学图书馆

首页
概况
党建
资源
服务
科研支持
- 论文收录引用证明
- 科技查新
知识产权
档案馆
帮助

咨询与建议

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

您的常用邮箱：*

您的手机号码：*

问题描述：

当前已输入0个字，您还可以输入200个字

全部搜索
期刊论文
图书
学位论文
标准
纸本馆藏
外文资源发现
数据库导航
超星发现

高级检索

时间限定

出版年份：

文献类型

图书期刊文献学位论文多媒体

馆藏选择

电子馆藏纸本馆藏

核心期刊

全部期刊 SCI 收录期刊 SSCI 收录期刊 EI 收录期刊 CSCD 收录期刊 CSSCI 收录期刊

语言

中文英文

文献类型

期刊文献图书学位论文标准纸本馆藏

帮助

文字说明：

T=题名（书名、题名），A=作者（责任者），K=主题词，P=出版物名称，PU=出版社名称，O=机构（作者单位、学位授予单位、专利申请人），L=中图分类号，C=学科分类号，U=全部字段，Y=年（出版发行年、学位年度、标准发布年）

检索规则说明：

AND代表“并且”；OR代表“或者”；NOT代表“不包含”；(注意必须大写,运算符两边需空一格)

检索范例：

范例一：(K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
范例二：P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

分类表

所选分类

>> <<

限定检索结果

文献类型

23,001 篇 会议
126 册 图书
92 篇 期刊文献

馆藏范围

23,218 篇 电子文献
1 种 纸本馆藏

日期分布

学科分类号

13,623 篇 工学
- 11,108 篇 计算机科学与技术...
- 3,479 篇 软件工程
- 2,445 篇 机械工程
- 1,716 篇 光学工程
- 1,075 篇 电气工程
- 1,014 篇 控制科学与工程
- 785 篇 信息与通信工程
- 412 篇 仪器科学与技术
- 352 篇 生物工程
- 251 篇 生物医学工程（可授...
- 196 篇 电子科学与技术（可...
- 114 篇 化学工程与技术
- 108 篇 安全科学与工程
- 100 篇 测绘科学与技术
- 88 篇 建筑学
- 87 篇 交通运输工程
- 84 篇 土木工程
3,494 篇 医学
- 3,481 篇 临床医学
- 81 篇 基础医学(可授医学...
3,242 篇 理学
- 1,939 篇 物理学
- 1,640 篇 数学
- 563 篇 统计学（可授理学、...
- 500 篇 生物学
- 249 篇 系统科学
- 107 篇 化学
522 篇 管理学
- 311 篇 图书情报与档案管...
- 224 篇 管理科学与工程(可...
- 76 篇 工商管理
276 篇 艺术学
- 276 篇 设计学（可授艺术学...
66 篇 法学
- 63 篇 社会学
38 篇 农学
28 篇 教育学
22 篇 经济学
10 篇 军事学
3 篇 文学

主题

10,187 篇 computer vision
3,967 篇 pattern recognit...
3,005 篇 training
2,007 篇 computational mo...
1,818 篇 visualization
1,815 篇 cameras
1,516 篇 feature extracti...
1,481 篇 shape
1,455 篇 three-dimensiona...
1,438 篇 image segmentati...
1,287 篇 robustness
1,205 篇 computer archite...
1,155 篇 semantics
1,147 篇 conferences
1,107 篇 layout
1,092 篇 computer science
1,087 篇 object detection
1,025 篇 benchmark testin...
970 篇 codes
922 篇 face recognition

机构

136 篇 univ sci & techn...
121 篇 univ chinese aca...
118 篇 chinese univ hon...
107 篇 carnegie mellon ...
101 篇 tsinghua univers...
101 篇 microsoft resear...
95 篇 swiss fed inst t...
93 篇 zhejiang univ pe...
82 篇 university of sc...
81 篇 zhejiang univers...
80 篇 university of ch...
77 篇 shanghai ai lab ...
72 篇 shanghai jiao to...
69 篇 national laborat...
67 篇 microsoft res as...
67 篇 alibaba grp peop...
64 篇 adobe research
61 篇 tsinghua univ pe...
60 篇 peking univ peop...
59 篇 univ oxford oxfo...

作者

81 篇 van gool luc
72 篇 timofte radu
64 篇 zhang lei
47 篇 luc van gool
40 篇 yang yi
40 篇 li stan z.
37 篇 loy chen change
34 篇 chen chen
33 篇 xiaoou tang
32 篇 liu yang
32 篇 qi tian
31 篇 tian qi
31 篇 sun jian
30 篇 murino vittorio
30 篇 pascal fua
29 篇 darrell trevor
29 篇 li fei-fei
28 篇 li xin
28 篇 ying shan
27 篇 vasconcelos nuno

语言

23,137 篇 英文
53 篇 其他
22 篇 中文
5 篇 土耳其文
2 篇 日文

检索条件"任意字段=IEEE Conference on Computer Vision and Pattern Recognition Workshops"

共 23219 条记录，以下是251-260 订阅

全选清除本页清除全部题录导出标记到"检索档案"

详细简洁

排序：

Know Your Neighbors: Improving Single-View Reconstruction via Spatial vision-Language Reasoning

Know Your Neighbors: Improving Single-View Reconstruction vi...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Li, Rui Fischer, Tobias Segu, Mattia Pollefeys, Marc Van Gool, Luc Tombari, Federico Swiss Fed Inst Technol Zurich Switzerland Google Mountain View CA 94043 USA Tech Univ Munich Munich Germany

ISBN: (纸本)9798350353006

Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed problem in computer vision. While classical depth estimation methods infer only a 2.5D scene representation limited to the image plane, recent approaches based on radiance fields reconstruct a full 3D representation. However, these methods still struggle with occluded regions since inferring geometry without visual observation requires (i) semantic knowledge of the surroundings, and (ii) reasoning about spatial context. We propose KYN, a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density. We introduce a vision-language modulation module to enrich point features with fine-grained semantic information. We aggregate point representations across the scene through a language-guided spatial attention mechanism to yield per-point density predictions aware of the 3D semantic context. We show that KYN improves 3D shape recovery compared to predicting density for each 3D point in isolation. We achieve state-of-the-art results in scene and object reconstruction on KITTI-360, and show improved zero-shot generalization compared to prior work. Project page: https://***/kyn.

关键词： radiance field spatial attention spatial context vision-language model

来源：评论

学校读者我要写书评

暂无评论

GRAM: Global Reasoning for Multi-Page VQA

GRAM: Global Reasoning for Multi-Page VQA

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Blau, Tsachi Fogel, Sharon Ronen, Roi Goltst, Alona Per, Shahar Tsi Ben Avraham, Elad Aberdam, Aviad Ganz, Roy Litman, Ron Technion Haifa Israel AWS AI Labs Shanghai Peoples R China

ISBN: (纸本)9798350353006

The increasing use of transformer-based large language models brings forward the challenge of processing long sequences. In document visual question answering (DocVQA), leading methods focus on the single-page setting, while documents can span hundreds of pages. We present GRAM, a method that seamlessly extends pre-trained single-page models to the multi-page setting, without requiring computationally-heavy pretraining. To do so, we leverage a single-page encoder for local page-level understanding, and enhance it with document-level designated layers and learnable tokens, facilitating the flow of information across pages for global reasoning. To enforce our model to utilize the newly introduced document tokens, we propose a tailored bias adaptation method. For additional computational savings during decoding, we introduce an optional compression stage using our compression-transformer(C-Former), reducing the encoded sequence length, thereby allowing a tradeoff between quality and latency. Extensive experiments showcase GRAM's state-of-the-art performance on the benchmarks for multi-page DocVQA, demonstrating the effectiveness of our approach.

关键词： Document Understanding Long Sequence Processing vision Language Models

来源：评论

学校读者我要写书评

暂无评论

Situational Awareness Matters in 3D vision Language Reasoning

Situational Awareness Matters in 3D Vision Language Reasonin...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Man, Yunze Gui, Liang-Yan Wang, Yu-Xiong Univ Illinois Urbana IL 61801 USA

ISBN: (纸本)9798350353006

Being able to carry out complicated vision language reasoning tasks in 3D space represents a significant milestone in developing household robots and human-centered embodied AI. In this work, we demonstrate that a critical and distinct challenge in 3D vision language reasoning is the situational awareness, which incorporates two key components: (1) The autonomous agent grounds its self-location based on a language prompt. (2) The agent answers open-ended questions from the perspective of its calculated position. To address this challenge, we introduce SIG3D, an end-to-end Situation-Grounded model for 3D vision language reasoning. We tokenize the 3D scene into sparse voxel representation, and propose a language-grounded situation estimator, followed by a situated question answering module. Experiments on the SQA3D and ScanQA datasets show that SIG3D outperforms state-of-the-art models in situational estimation and question answering by a large margin (e.g., an enhancement of over 30% on situation accuracy). Subsequent analysis corroborates our architectural design choices, explores the distinct functions of visual and textual tokens, and highlights the importance of situational awareness in the domain of 3D question-answering. Project page is available at https://***/situation3d.

关键词： vision-Language Multi-modal 3D Reasoning

来源：评论

学校读者我要写书评

暂无评论

Persistent-Transient Duality in Human Behavior Modeling

Persistent-Transient Duality in Human Behavior Modeling

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Hung Tran Vuong Le Venkatesh, Svetha Truyen Tran Deakin Univ Appl AI Inst Geelong Vic Australia

ISBN: (数字)9781665487399

ISBN: (纸本)9781665487399

We propose to model the persistent-transient duality in human behavior using a parent-child multi-channel neural network, which features a parent persistent channel that manages the global dynamics and children transient channels that are initiated and terminated on-demand to handle detailed interactive actions. The short-lived transient sessions are managed by a proposed Transient Switch. The neural framework is trained to discover the structure of the duality automatically. Our model shows superior performances in human-object interaction motion prediction.

关键词： computer vision Computational modeling conferences Dynamics Neural networks Switches Predictive models

来源：评论

学校读者我要写书评

暂无评论

Low-Resource vision Challenges for Foundation Models

Low-Resource Vision Challenges for Foundation Models

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Zhang, Yunhua Doughty, Hazel Snoek, Cees G. M. Univ Amsterdam Amsterdam Netherlands Leiden Univ Leiden Netherlands

ISBN: (纸本)9798350353006

Low-resource settings are well-established in natural language processing, where many languages lack sufficient data for deep learning at scale. However, low-resource problems are under-explored in computer vision. In this paper, we address this gap and explore the challenges of low-resource image tasks with vision foundation models. We first collect a benchmark of genuinely low-resource image data, covering historic maps, circuit diagrams, and mechanical drawings. These low-resource settings all share three challenges: data scarcity, fine-grained differences, and the distribution shift from natural images to the specialized domain of interest. While existing foundation models have shown impressive generalizability, we find they cannot transfer well to our low-resource tasks. To begin to tackle the challenges of low-resource vision, we introduce one simple baseline per challenge. Specifically, we i) enlarge the data space by generative models, ii) adopt the best sub-kernels to encode local regions for fine-grained difference discovery and iii) learn attention for specialized domains. Experiments on our three low-resource tasks demonstrate our proposals already provide a better baseline than transfer learning, data augmentation, and fine-grained methods. This highlights the unique characteristics and challenges of low-resource vision for foundation models that warrant further investigation. Project page: https://***/Low-Resource-vision/.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained vision Transfomers

Not All Prompts Are Secure: A Switchable Backdoor Attack Aga...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Yang, Sheng Bai, Jiawang Gao, Kuofeng Yang, Yong Li, Yiming Xia, Shu-Tao Tsinghua Univ Beijing Peoples R China Tencent Secur Platform Dept Shenzhen Peoples R China Zhejiang Univ Hangzhou Peoples R China Peng Cheng Lab Res Ctr Artificial Intelligence Shenzhen Peoples R China

ISBN: (纸本)9798350353006

Given the power of vision transformers, a new learning paradigm, pretraining and then prompting, makes it more efficient and effective to address downstream visual recognition tasks. In this paper, we identify a novel security threat towards such a paradigm from the perspective of backdoor attacks. Specifically, an extra prompt token, called the switch token in this work, can turn the backdoor mode on, i.e., converting a benign model into a backdoored one. Once under the backdoor mode, a specific trigger can force the model to predict a target class. It poses a severe risk to the users of cloud API, since the malicious behavior can not be activated and detected under the benign mode, thus making the attack very stealthy. To attack a pretrained model, our proposed attack, named SWARM, learns a trigger and prompt tokens including a switch token. They are optimized with the clean loss which encourages the model always behaves normally even the trigger presents, and the backdoor loss that ensures the backdoor can be activated by the trigger when the switch is on. Besides, we utilize the crossmode feature distillation to reduce the effect of the switch token on clean samples. The experiments on diverse visual recognition tasks confirm the success of our switchable backdoor attack, i.e., achieving 95%+ attack success rate, and also being hard to be detected and removed. Our code is available at https://***/20000yshust/SWARM.

关键词： Backdoor computer vision Parameter-Efficient Fine Tuning Switchable vision Transformers Visual Prompting

来源：评论

学校读者我要写书评

暂无评论

Eclipse: Disambiguating Illumination and Materials using Unintended Shadows

Eclipse: Disambiguating Illumination and Materials using Uni...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Verbin, Dor Mildenhall, Ben Hedman, Peter Barron, Jonathan T. Zickler, Todd Srinivasan, Pratul P. Google Res Mountain View CA 94043 USA Harvard Univ Cambridge MA USA

ISBN: (纸本)9798350353013;9798350353006

Decomposing an object's appearance into representations of its materials and the surrounding illumination is difficult, even when the object's 3D shape is known beforehand. This problem is especially challenging for diffuse objects: it is ill-conditioned because diffuse materials severely blur incoming light, and it is ill-posed because diffuse materials under high-frequency lighting can be indistinguishable from shiny materials under low-frequency lighting. We show that it is possible to recover precise materials and illumination-even from diffuse objects-by exploiting unintended shadows, like the ones cast onto an object by the photographer who moves around it. These shadows are a nuisance in most previous inverse rendering pipelines, but here we exploit them as signals that improve conditioning and help resolve material-lighting ambiguities. We present a method based on differentiable Monte Carlo ray tracing that uses images of an object to jointly recover its spatiallyvarying materials, the surrounding illumination environment, and the shapes of the unseen light occluders who inadvertently cast shadows upon it.

关键词： computer Graphics Inverse Rendering Non-line-of-sight Imaging

来源：评论

学校读者我要写书评

暂无评论

Convolutional Prompting meets Language Models for Continual Learning

Convolutional Prompting meets Language Models for Continual ...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Roy, Anurag Moulick, Riddhiman Verma, Vinay K. Ghosh, Saptarshi Das, Abir IIT Kharagpur Kharagpur W Bengal India IML Amazon India Hyderabad India

ISBN: (纸本)9798350353006

Continual Learning (CL) enables machine learning models to learn from continuously shifting new training data in absence of data from old tasks. Recently, pretrained vision transformers combined with prompt tuning have shown promise for overcoming catastrophic forgetting in CL. These approaches rely on a pool of learnable prompts which can be inefficient in sharing knowledge across tasks leading to inferior performance. In addition, the lack of fine-grained layer specific prompts does not allow these to fully express the strength of the prompts for CL. We address these limitations by proposing ConvPrompt, a novel convolutional prompt creation mechanism that maintains layer-wise shared embeddings, enabling both layer-specific learning and better concept transfer across tasks. The intelligent use of convolution enables us to maintain a low parameter overhead without compromising performance. We further leverage Large Language Models to generate fine-grained text descriptions of each category which are used to get task similarity and dynamically decide the number of prompts to be learned. Extensive experiments demonstrate the superiority of ConvPrompt and improves SOTA by 3% with significantly less parameter overhead. We also perform strong ablation over various modules to disentangle the importance of different components.(1)

关键词： Continual Learning Language Models Prompt Tuning vision Transformer

来源：评论

学校读者我要写书评

暂无评论

ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

ICON: Incremental CONfidence for Joint Pose and Radiance Fie...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Wang, Weiyao Gleize, Pierre Tang, Hao Chen, Xingyu Liang, Kevin J. Feiszli, Matt Meta FAIR Menlo Pk CA 94025 USA

ISBN: (纸本)9798350353013;9798350353006

Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images. However, NeRF training requires accurate camera pose for each input view, typically obtained by Structure-from-Motion (SfM) pipelines. Recent works have attempted to relax this constraint, but they still often rely on decent initial poses which they can refine. Here we aim at removing the requirement for pose initialization. We present Incremental CONfidence (ICON), an optimization procedure for training NeRFs from 2D video frames. ICON only assumes smooth camera motion to estimate initial guess for poses. Further, ICON introduces "confidence": an adaptive measure of model quality used to dynamically reweight gradients. ICON relies on high-confidence poses to learn NeRF, and high-confidence 3D structure (as encoded by NeRF) to learn poses. We show that ICON, without prior pose initialization, achieves superior performance in both CO3D and HO3D versus methods which use SfM pose.

关键词： Three dimensional computer graphics

来源：评论

学校读者我要写书评

暂无评论

CPLIP: Zero-Shot Learning for Histopathology with Comprehensive vision-Language Alignment

CPLIP: Zero-Shot Learning for Histopathology with Comprehens...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Javed, Sajid Mahmood, Arif Ganapathil, Iyyakutti Iyappan Dharej, Fayaz Ali Werghil, Naoufel Bennamoun, Mohammed Khalifa Univ Sci & Technol Dept Comp Sci Abu Dhabi U Arab Emirates Khalifa Univ Sci & Technol C2PS Abu Dhabi U Arab Emirates Informat Technol Univ Punjab Lahore Pakistan Univ Western Australia Perth WA Australia

ISBN: (纸本)9798350353006

This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field. To encourage further research and replication, the code for CPLIP is available on GitHub at https://***/

关键词： Cancer Detection Computational Pathology Contrastive Loss Histopathology Many-to-Many vision-Language Alignment vision Language Modeling Whole Slide Image Zero-shot Learning

来源：评论

学校读者我要写书评

暂无评论

没有更多数据了...

全选清除本页清除全部题录导出标记到“检索档案”

共500页 << < 22 23 24 25 26 27 28 29 30 31 > >>

检索报告对象比较合并检索0

隐藏清空

合并搜索

回到顶部

执行限定条件

内容：

评分：

请选择保存的检索档案：

请选择收藏分类：

订阅名称：

通借通还

温馨提示：

图书名称：

借书校区：

取书校区：

手机号码：

邮箱地址：

一卡通帐号：

电话和邮箱必须正确填写，我们会与您联系确认。

联系人：

所在院系：

联系邮箱：

联系电话：

内蒙古自治区呼和浩特市赛罕区大学西街235号邮编: 010021

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：