Search Results - Inner Mongolia University Library

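Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not Visual, years 2011 to 2016)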
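The field codes and Boolean rules above can be sketched as a small query builder. This is a minimal illustration only: the helper names `term`, `any_of`, and `all_of` are hypothetical and not part of the library system, and the sketch only assembles the query string in the documented syntax (uppercase operators, one space on each side). It reproduces the first search example.

```python
# Hypothetical helpers; only the field codes and operators come from the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND (uppercase, one space on each side)."""
    return " AND ".join(clauses)


# Rebuild the first search example from its parts.
query = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Grouping the OR clauses in parentheses before joining with AND mirrors the way both search examples are written.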

Refine search results

Document type

22,999 conference papers
107 books
93 journal articles

Holdings

23,198 electronic documents
1 print holding

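Subject classification

13,622 items: Engineering
- 11,107 items: Computer Science and Technology...
- 3,478 items: Software Engineering
- 2,445 items: Mechanical Engineering
- 1,715 items: Optical Engineering
- 1,076 items: Electrical Engineering
- 1,014 items: Control Science and Engineering
- 784 items: Information and Communication Engineering
- 411 items: Instrument Science and Technology
- 352 items: Bioengineering
- 251 items: Biomedical Engineering (may confer...
- 196 items: Electronic Science and Technology (may...
- 114 items: Chemical Engineering and Technology
- 107 items: Safety Science and Engineering
- 100 items: Surveying and Mapping Science and Technology
- 88 items: Architecture
- 85 items: Transportation Engineering
- 84 items: Civil Engineering
3,494 items: Medicine
- 3,481 items: Clinical Medicine
- 81 items: Basic Medicine (may confer medicine...
3,240 items: Science
- 1,939 items: Physics
- 1,639 items: Mathematics
- 563 items: Statistics (may confer science, ...
- 500 items: Biology
- 249 items: Systems Science
- 106 items: Chemistry
521 items: Management
- 311 items: Library, Information and Archival Management...
- 223 items: Management Science and Engineering (may...
- 76 items: Business Administration
276 items: Art Studies
- 276 items: Design (may confer art studies...
66 items: Law
- 63 items: Sociology
38 items: Agriculture
28 items: Education
22 items: Economics
10 items: Military Science
3 items: Literature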

Topic

10,187 items: computer vision
3,967 items: pattern recognit...
3,005 items: training
2,007 items: computational mo...
1,818 items: visualization
1,816 items: cameras
1,515 items: feature extracti...
1,481 items: shape
1,455 items: three-dimensiona...
1,438 items: image segmentati...
1,287 items: robustness
1,205 items: computer archite...
1,155 items: semantics
1,147 items: conferences
1,107 items: layout
1,093 items: computer science
1,088 items: object detection
1,025 items: benchmark testin...
970 items: codes
922 items: face recognition

Institution

136 items: univ sci & techn...
121 items: univ chinese aca...
118 items: chinese univ hon...
107 items: carnegie mellon ...
101 items: tsinghua univers...
101 items: microsoft resear...
95 items: swiss fed inst t...
93 items: zhejiang univ pe...
82 items: university of sc...
81 items: zhejiang univers...
80 items: university of ch...
77 items: shanghai ai lab ...
72 items: shanghai jiao to...
69 items: national laborat...
67 items: microsoft res as...
67 items: alibaba grp peop...
64 items: adobe research
62 items: tsinghua univ pe...
60 items: peking univ peop...
59 items: univ oxford oxfo...

Author

81 items: van gool luc
72 items: timofte radu
64 items: zhang lei
47 items: luc van gool
40 items: yang yi
40 items: li stan z.
37 items: loy chen change
34 items: chen chen
33 items: xiaoou tang
32 items: liu yang
32 items: qi tian
31 items: tian qi
31 items: sun jian
30 items: murino vittorio
30 items: pascal fua
29 items: darrell trevor
29 items: li fei-fei
28 items: li xin
28 items: ying shan
27 items: vasconcelos nuno

Language

23,132 items: English
38 items: Other
22 items: Chinese
5 items: Turkish
2 items: Japanese

Search query: "Any field = IEEE Conference on Computer Vision and Pattern Recognition Workshops"

23,199 records in total; showing records 321-330

DePT: Decoupled Prompt Tuning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Ji; Wu, Shihan; Gao, Lianli; Shen, Heng Tao; Song, Jingkuan

Affiliations: Univ Elect Sci & Technol China UESTC Chengdu Peoples R China; UESTC Shenzhen Inst Adv Study Chengdu Peoples R China; Tongji Univ Shanghai Peoples R China

ISBN: (print) 9798350353006

This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the better the tuned model generalizes to the base (or target) task, the worse it generalizes to new tasks, and vice versa. Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue - the vast majority of feature channels are occupied by base-specific knowledge, leading to the collapse of task-shared knowledge important to new tasks. To address this, we propose the Decoupled Prompt Tuning (DePT) framework, which decouples base-specific knowledge from feature channels into an isolated feature space during prompt tuning, so as to maximally preserve task-shared knowledge in the original feature space for achieving better zero-shot generalization on new tasks. Importantly, our DePT is orthogonal to existing prompt tuning approaches, and can enhance them with negligible additional computational cost. Extensive experiments on several datasets show the flexibility and effectiveness of DePT. Code is available at https://***/Koorye/DePT.

Keywords: Feature decoupling; Few-shot learning; Prompt tuning; Vision and language

Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kim, Dongyoung; Kim, Jinwoo; Yu, Junsang; Kim, Seon Joo

Affiliations: Yonsei Univ Seoul South Korea; Samsung Adv Inst Technol Suwon South Korea

ISBN: (print) 9798350353006

White balance (WB) algorithms in many commercial cameras assume single and uniform illumination, leading to undesirable results when multiple lighting sources with different chromaticities exist in the scene. Prior research on multi-illuminant WB typically predicts illumination at the pixel level without fully grasping the scene's actual lighting conditions, including the number and color of light sources. This often results in unnatural outcomes lacking in overall consistency. To handle this problem, we present a deep white balancing model that leverages slot attention, where each slot is in charge of representing individual illuminants. This design enables the model to generate chromaticities and weight maps for individual illuminants, which are then fused to compose the final illumination map. Furthermore, we propose the centroid-matching loss, which regulates the activation of each slot based on the color range, thereby helping the model separate illumination more effectively. Our method achieves state-of-the-art performance on both single- and multi-illuminant WB benchmarks, and also offers additional information such as the number of illuminants in the scene and their chromaticity. This capability allows for illumination editing, an application not feasible with prior methods.

Keywords: Low level vision; Photography; White Balancing

DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Shuzhe; Kannala, Juho; Barath, Daniel

Affiliations: Aalto Univ Dept Comp Sci Espoo Finland; Swiss Fed Inst Technol Comp Vision & Geometry Grp Zurich Switzerland

ISBN: (print) 9798350353006

Matching 2D keypoints in an image to a sparse 3D point cloud of the scene without requiring visual descriptors has garnered increased interest due to its low memory requirements, inherent privacy preservation, and reduced need for expensive 3D model maintenance compared to visual descriptor-based methods. However, existing algorithms often compromise on performance, resulting in a significant deterioration compared to their descriptor-based counterparts. In this paper, we introduce DGC-GNN, a novel algorithm that employs a global-to-local Graph Neural Network (GNN) that progressively exploits geometric and color cues to represent keypoints, thereby improving matching accuracy. Our procedure encodes both Euclidean and angular relations at a coarse level, forming the geometric embedding to guide the point matching. We evaluate DGC-GNN on both indoor and outdoor datasets, demonstrating that it not only doubles the accuracy of the state-of-the-art visual descriptor-free algorithm but also substantially narrows the performance gap between descriptor-based and descriptor-free methods.

Keywords: 2D-3D Matching; Global-to-local GNN; Privacy preservation; Visual Descriptor-Free

GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Yichi; Qiao, Zhiqiao; Gao, Xiaofeng; Shakiah, Suhaila; Gao, Qiaozi; Chai, Joyce

Affiliations: Univ Michigan Ann Arbor MI 48109 USA; Amazon AGI Seattle WA USA

ISBN: (print) 9798350353006

Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level representations that are important for fine-grained visual understanding and diagnosis. In this work, we introduce GROUNDHOG, an MLLM developed by grounding Large Language Models to holistic segmentation. GROUNDHOG incorporates a masked feature extractor and converts extracted features into visual entity tokens for the MLLM backbone, which then connects groundable phrases to unified grounding masks by retrieving and merging the entity masks. To train GROUNDHOG, we carefully curated M3G2, a grounded visual instruction tuning dataset with Multi-Modal Multi-Grained Grounding, by harvesting a collection of segmentation-grounded datasets with rich annotations. Our experimental results show that GROUNDHOG achieves superior performance on various language grounding tasks without task-specific fine-tuning, and significantly reduces object hallucination. GROUNDHOG also demonstrates better grounding towards complex forms of visual input and provides easy-to-understand diagnosis in failure cases.

Keywords: Language Grounding; Multi-Modal; Vision-Language Model

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lu, Jiasen; Clark, Christopher; Lee, Sangho; Zhang, Zichen; Khosla, Savya; Marten, Ryan; Hoiem, Derek; Kembhavi, Aniruddha

Affiliations: Allen Inst AI Seattle WA 98103 USA; Univ Illinois Urbana IL USA; Univ Washington Seattle WA 98195 USA

ISBN: (print) 9798350353006

We present UNIFIED-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action. To unify different modalities, we tokenize inputs and outputs - images, text, audio, action, bounding boxes etc., into a shared semantic space and then process them with a single encoder-decoder transformer model. Since training with such diverse modalities is challenging, we propose various architectural improvements to stabilize model training. We train our model from scratch on a large multimodal pre-training corpus from diverse sources with a multimodal mixture of denoisers objective. To learn an expansive set of skills, such as following multimodal instructions, we construct and finetune on an ensemble of 120 datasets with prompts and augmentations. With a single unified model, UNIFIED-IO 2 achieves state-of-the-art performance on the GRIT benchmark and strong results in more than 35 benchmarks, including image generation and understanding, natural language understanding, video and audio understanding, and robotic manipulation. We release all our models to the research community.

Keywords: Semantics

HomoFormer: Homogenized Transformer for Image Shadow Removal

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Xiao, Jie; Fu, Xueyang; Zhu, Yurui; Li, Dong; Huang, Jie; Zhu, Kai; Zha, Zheng-Jun

Affiliations: Univ Sci & Technol China Hefei Peoples R China; Alibaba Grp Hangzhou Peoples R China

ISBN: (print) 9798350353006

The spatial non-uniformity and diverse patterns of shadow degradation conflict with the weight sharing manner of dominant models, which may lead to an unsatisfactory compromise. To tackle this issue, we present a novel strategy from the view of shadow transformation in this paper: directly homogenizing the spatial distribution of shadow degradation. Our key design is the random shuffle operation and its corresponding inverse operation. Specifically, the random shuffle operation stochastically rearranges the pixels across spatial space and the inverse operation recovers the original order. After random shuffling, the shadow diffuses in the whole image and the degradation appears in a homogenized way, which can be effectively processed by the local self-attention layer. Moreover, we further devise a new feed forward network with position modeling to exploit image structural information. Based on these elements, we construct the final local window based transformer named HomoFormer for image shadow removal. Our HomoFormer can enjoy the linear complexity of local transformers while bypassing challenges of non-uniformity and diversity of shadow. Extensive experiments are conducted to verify the superiority of our HomoFormer across public datasets. Code is available at https://***/jiexiaou/HomoFormer.

Keywords: Image Restoration; Image Shadow Removal; Vision Transformer
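
The random shuffle and inverse operation mentioned in the HomoFormer abstract can be illustrated with a plain permutation of pixel positions. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the fixed seed, and the toy single-channel image are all hypothetical, and the real model applies the operation to feature maps inside a transformer.

```python
import random


def random_shuffle(pixels, seed):
    """Permute a flat list of pixels; return the shuffled list and the permutation used."""
    rng = random.Random(seed)
    perm = list(range(len(pixels)))
    rng.shuffle(perm)
    # Position new_pos in the output takes the pixel from position perm[new_pos].
    shuffled = [pixels[i] for i in perm]
    return shuffled, perm


def inverse_shuffle(shuffled, perm):
    """Undo random_shuffle by sending each value back to its original index."""
    restored = [None] * len(shuffled)
    for new_pos, old_pos in enumerate(perm):
        restored[old_pos] = shuffled[new_pos]
    return restored


image = [[1, 2, 3], [4, 5, 6]]  # toy 2x3 single-channel image (assumption)
flat = [p for row in image for p in row]
shuffled, perm = random_shuffle(flat, seed=0)
restored = inverse_shuffle(shuffled, perm)
```

Keeping the permutation alongside the shuffled pixels is what makes the inverse exact: every value is written back to the index it came from, so `restored` equals `flat`.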

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jia, Wenqi; Liu, Miao; Jiang, Hao; Ananthabhotla, Ishwarya; Rehg, James M.; Ithapu, Vamsi Krishna; Gao, Ruohan

Affiliations: Georgia Tech Atlanta GA 30332 USA; Meta Real Labs Menlo Pk CA 94025 USA; UIUC Champaign IL USA; Meta GenAI Menlo Pk CA USA

ISBN: (print) 9798350353006

In recent years, the thriving development of research related to egocentric videos has provided a unique perspective for the study of conversational interactions, where both visual and audio signals play a crucial role. While most prior work focuses on learning about behaviors that directly involve the camera wearer, we introduce the Ego-Exocentric Conversational Graph Prediction problem, marking the first attempt to infer exocentric conversational interactions from egocentric videos. We propose a unified multi-modal framework, Audio-Visual Conversational Attention (AV-CONV), for the joint prediction of conversation behaviors (speaking and listening) for both the camera wearer and all other social partners present in the egocentric video. Specifically, we adopt the self-attention mechanism to model the representations across time, across subjects, and across modalities. To validate our method, we conduct experiments on a challenging egocentric video dataset that includes multi-speaker and multi-conversation scenarios. Our results demonstrate the superior performance of our method compared to a series of baselines. We also present detailed ablation studies to assess the contribution of each component in our model. Check our Project Page.

Keywords: Egocentric vision; Multi-modal learning; Social AI

Unified Language-driven Zero-shot Domain Adaptation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yang, Senqiao; Tian, Zhuotao; Jiang, Li; Jia, Jiaya

Affiliations: Chinese Univ Hong Kong Hong Kong Peoples R China; Harbin Inst Technol Shenzhen Peoples R China; Chinese Univ Hong Kong Shenzhen Peoples R China

ISBN: (print) 9798350353006

This paper introduces Unified Language-driven Zero-shot Domain Adaptation (ULDA), a novel task setting that enables a single model to adapt to diverse target domains without explicit domain-ID knowledge. We identify the constraints in the existing language-driven zero-shot domain adaptation task, particularly the requirement for domain IDs and domain-specific models, which may restrict flexibility and scalability. To overcome these issues, we propose a new framework for ULDA, consisting of Hierarchical Context Alignment (HCA), Domain Consistent Representation Learning (DCRL), and Text-Driven Rectifier (TDR). These components work synergistically to align simulated features with target text across multiple visual levels, retain semantic correlations between different regional representations, and rectify biases between simulated and real target visual features, respectively. Our extensive empirical evaluations demonstrate that this framework achieves competitive performance in both settings, surpassing even the model that requires domain-ID, showcasing its superiority and generalization ability. The proposed method is not only effective but also maintains practicality and efficiency, as it does not introduce additional computational costs during inference. The code is available on the project website.

Keywords: Domain Adaptation; Segmentation; Vision-Language Model

Grounded Question-Answering in Long Egocentric Videos

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Di, Shangzhe; Xie, Weidi

Affiliations: Shanghai Jiao Tong Univ CMIC Shanghai Peoples R China; Shanghai AI Lab Shanghai Peoples R China

ISBN: (print) 9798350353006

Existing approaches to video understanding, mainly designed for short videos from a third-person perspective, are limited in their applicability in certain fields, such as robotics. In this paper, we delve into open-ended question-answering (QA) in long, egocentric videos, which allows individuals or robots to inquire about their own past visual experiences. This task presents unique challenges, including the complexity of temporally grounding queries within extensive video content, the high resource demands for precise data annotation, and the inherent difficulty of evaluating open-ended answers due to their ambiguous nature. Our proposed approach tackles these challenges by (i) integrating query grounding and answering within a unified model to reduce error propagation; (ii) employing large language models for efficient and scalable data synthesis; and (iii) introducing a close-ended QA task for evaluation, to manage answer ambiguity. Extensive experiments demonstrate the effectiveness of our method, which also achieves state-of-the-art performance on the QAEgo4D and Ego4D-NLQ benchmarks. Code, data, and models are open-sourced.

Keywords: Egocentric vision; Video grounding; Video question answering

DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Scarpellini, Gianluca; Fiorini, Stefano; Giuliari, Francesco; Morerio, Pietro; Del Bue, Alessio

Affiliations: Ist Italiano Tecnol IIT Pattern Anal & Comp Vis PAVIS Genoa Italy

ISBN: (print) 9798350353006

Reassembly tasks play a fundamental role in many fields and multiple approaches exist to solve specific reassembly problems. In this context, we posit that a general unified model can effectively address them all, irrespective of the input data type (images, 3D, etc.). We introduce DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to solve reassembly tasks using a diffusion model formulation. Our method treats the elements of a set, whether pieces of a 2D patch or 3D object fragments, as nodes of a spatial graph. Training is performed by introducing noise into the position and rotation of the elements and iteratively denoising them to reconstruct the coherent initial pose. DiffAssemble achieves state-of-the-art (SOTA) results in most 2D and 3D reassembly tasks and is the first learning-based approach that solves 2D puzzles for both rotation and translation. Furthermore, we highlight its remarkable reduction in run-time, performing 11 times faster than the quickest optimization-based method for puzzle solving. Code available at https://***/IITPAVIS/DiffAssemble

Keywords: diffusion model graph neural network puzzle reassembly


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
