ISBN: (Print) 9798350353006
Multi-modality image fusion is a technique that combines information from different sensors or modalities, enabling the fused image to retain complementary features from each modality, such as functional highlights and texture details. However, effective training of such fusion models is challenging due to the scarcity of ground truth fusion data. To tackle this issue, we propose the Equivariant Multi-Modality imAge fusion (EMMA) paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Consequently, we introduce a novel training paradigm that encompasses a fusion module, a pseudo-sensing module, and an equivariant fusion module. These components enable the network training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior. Extensive experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images, concurrently facilitating downstream multi-modal segmentation and detection tasks. The code is available at https://***/Zhaozixiang1228/MMIF-EMMA.
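To make the equivariant-imaging idea concrete, here is a minimal sketch (not the authors' code) of an equivariance-consistency loss for self-supervised fusion training, assuming a toy convolutional fusion module and 90-degree rotations as the transformation group; the paper's pseudo-sensing and equivariant fusion modules are omitted.

```python
# Minimal sketch of an equivariance-consistency loss for self-supervised
# fusion training. TinyFuser is a toy stand-in for the fusion module.
import torch
import torch.nn as nn

class TinyFuser(nn.Module):
    """Toy fusion module: two single-channel modalities in, one image out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, ir, vis):
        return self.net(torch.cat([ir, vis], dim=1))

def equivariance_loss(fuser, ir, vis):
    # Sample a random rotation by k*90 degrees and require
    # fuse(T(ir), T(vis)) to match T(fuse(ir, vis)).
    k = int(torch.randint(1, 4, (1,)))
    fused = fuser(ir, vis)
    fused_of_rot = fuser(torch.rot90(ir, k, (2, 3)), torch.rot90(vis, k, (2, 3)))
    rot_of_fused = torch.rot90(fused, k, (2, 3))
    return nn.functional.mse_loss(fused_of_rot, rot_of_fused)

ir, vis = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
fuser = TinyFuser()
loss = equivariance_loss(fuser, ir, vis)
loss.backward()
```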
ISBN: (Print) 9798350353006
Utilizing large language models (LLMs) to compose off-the-shelf visual tools represents a promising avenue of research for developing robust visual assistants capable of addressing diverse visual tasks. However, these methods often overlook the potential for continual learning, typically by freezing the utilized tools, thus limiting their adaptation to environments requiring new knowledge. To tackle this challenge, we propose CLOVA, a Closed-LOop Visual Assistant, which operates within a framework encompassing inference, reflection, and learning phases. During the inference phase, LLMs generate programs and execute corresponding tools to complete assigned tasks. In the reflection phase, a multimodal global-local reflection scheme analyzes human feedback to determine which tools require updating. Lastly, the learning phase employs three flexible approaches to automatically gather training data and introduces a novel prompt tuning scheme to update the tools, allowing CLOVA to efficiently acquire new knowledge. Experimental findings demonstrate that CLOVA surpasses existing tool-usage methods by 5% in visual question answering and multiple-image reasoning, by 10% in knowledge tagging, and by 20% in image editing. These results underscore the significance of the continual learning capability in general visual assistants.
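The closed loop described above can be sketched schematically as follows; all callables here (plan, run, reflect, update_tool, get_feedback) are hypothetical stand-ins for illustration, not CLOVA's actual interfaces.

```python
# Schematic sketch of a closed inference-reflection-learning loop.
# Every callable is a hypothetical stand-in, not CLOVA's real API.
def closed_loop(task, tools, plan, run, reflect, update_tool, get_feedback,
                max_rounds=3):
    result = None
    for _ in range(max_rounds):
        program = plan(task, tools)            # inference: LLM composes tools
        result = run(program, tools)           # execute the generated program
        feedback = get_feedback(task, result)  # human or automatic feedback
        if feedback.get("correct", False):
            break
        for name in reflect(task, program, feedback):       # reflection phase
            tools[name] = update_tool(tools[name], feedback) # learning phase
    return result
```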
ISBN: (Print) 9798350353013; 9798350353006
While head-mounted devices are becoming more compact, they provide egocentric views with significant self-occlusions of the device user. Hence, existing methods often fail to accurately estimate complex 3D poses from egocentric views. In this work, we propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation, which leverages the scene information and temporal context of egocentric stereo videos. Specifically, we utilize 1) depth features from our 3D scene reconstruction module with uniformly sampled windows of egocentric stereo frames, and 2) human joint queries enhanced by temporal features of the video inputs. Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting. Furthermore, we introduce two new benchmark datasets, i.e., UnrealEgo2 and UnrealEgo-RW (Real World). The proposed datasets offer a much larger number of egocentric stereo views with a wider variety of human motions than the existing datasets, allowing comprehensive evaluation of existing and upcoming methods. Our extensive experiments show that the proposed approach significantly outperforms previous methods. UnrealEgo2, UnrealEgo-RW, and trained models are available on our project page and Benchmark Challenge.
ISBN: (Print) 9798350353006
Deployment of Transformer models on edge devices is becoming increasingly challenging due to the rapidly growing inference cost, which scales quadratically with the number of tokens in the input sequence. Token pruning is an emerging solution to address this challenge due to its ease of deployment on various Transformer backbones. However, most token pruning methods require computationally expensive fine-tuning, which is undesirable in many edge deployment cases. In this work, we propose Zero-TPrune, the first zero-shot method that considers both the importance and similarity of tokens in performing token pruning. It leverages the attention graph of pre-trained Transformer models to produce an importance distribution for tokens via our proposed Weighted Page Rank (WPR) algorithm. This distribution further guides token partitioning for efficient similarity-based pruning. Due to the elimination of the fine-tuning overhead, Zero-TPrune can prune large models at negligible computational cost, switch between different pruning configurations at no computational cost, and perform hyperparameter tuning efficiently. We evaluate the performance of Zero-TPrune on vision tasks by applying it to various vision Transformer backbones and testing them on ImageNet. Without any fine-tuning, Zero-TPrune reduces the FLOPs cost of DeiT-S by 34.7% and improves its throughput by 45.3% with only 0.4% accuracy loss. Compared with state-of-the-art pruning methods that require fine-tuning, Zero-TPrune not only eliminates the need for fine-tuning after pruning but also does so with only 0.1% accuracy loss. Compared with state-of-the-art fine-tuning-free pruning methods, Zero-TPrune reduces accuracy loss by up to 49% with similar FLOPs budgets. Project webpage: https://***/zerotprune.
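As a rough illustration of the fine-tuning-free idea, the sketch below derives a token importance distribution from a pre-trained attention map via a damped power iteration (an assumed stand-in for the paper's Weighted Page Rank) and keeps the top-scoring tokens; it is not the authors' implementation.

```python
# Attention-graph token importance via damped power iteration (assumed
# stand-in for WPR), followed by top-k importance-based pruning.
import numpy as np

def token_importance(attn, iters=10, damping=0.85):
    """attn: (N, N) attention matrix whose rows sum to 1 (query -> key)."""
    n = attn.shape[0]
    score = np.full(n, 1.0 / n)
    for _ in range(iters):
        # A token is important if important tokens attend to it.
        score = damping * attn.T @ score + (1 - damping) / n
        score /= score.sum()
    return score

def prune_tokens(tokens, attn, keep_ratio=0.7):
    """tokens: (N, D) token embeddings; returns the kept subset and indices."""
    score = token_importance(attn)
    k = max(1, int(keep_ratio * len(tokens)))
    keep = np.sort(np.argsort(score)[-k:])  # preserve original token order
    return tokens[keep], keep

rng = np.random.default_rng(0)
attn = rng.random((16, 16)); attn /= attn.sum(axis=1, keepdims=True)
tokens = rng.random((16, 64))
kept, idx = prune_tokens(tokens, attn)
print(kept.shape, idx)
```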
ISBN: (Print) 9798350353006
Images captured under sub-optimal illumination conditions may contain both over- and under-exposures. Current approaches mainly focus on adjusting image brightness, which may exacerbate color tone distortion in under-exposed areas and fail to restore accurate colors in over-exposed regions. We observe that over- and under-exposed regions display opposite color tone distribution shifts, which may not be easily normalized in joint modeling as they usually do not have "normal-exposed" regions/pixels as reference. In this paper, we propose a novel method to enhance images with both over- and under-exposures by learning to estimate and correct such color shifts. Specifically, we first derive the color feature maps of the brightened and darkened versions of the input image via a UNet-based network, followed by a pseudo-normal feature generator to produce pseudo-normal color feature maps. We then propose a novel COlor Shift Estimation (COSE) module to estimate the color shifts between the derived brightened (or darkened) color feature maps and the pseudo-normal color feature maps. The COSE module corrects the estimated color shifts of the over- and under-exposed regions separately. We further propose a novel COlor MOdulation (COMO) module to modulate the separately corrected colors in the over- and under-exposed regions to produce the enhanced image. Comprehensive experiments show that our method outperforms existing approaches. Project webpage: https://***/yiyulics/CSEC.
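As a toy illustration of correcting over- and under-exposed regions separately (not the CSEC network), the sketch below removes each region's mean color shift relative to a pseudo-normal reference; the exposure masks and the reference used here are made-up placeholders.

```python
# Toy per-region color-shift correction against a pseudo-normal reference.
import numpy as np

def correct_region(img, mask, reference):
    """img, reference: (H, W, 3) arrays in [0, 1]; mask: (H, W) boolean region."""
    out = img.copy()
    if mask.any():
        # Mean color shift of the region relative to the reference.
        shift = img[mask].mean(axis=0) - reference[mask].mean(axis=0)
        out[mask] = np.clip(img[mask] - shift, 0.0, 1.0)
    return out

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
reference = rng.random((32, 32, 3))   # stand-in for pseudo-normal features
luma = img.mean(axis=2)
over, under = luma > 0.8, luma < 0.2  # crude placeholder exposure masks
out = correct_region(correct_region(img, over, reference), under, reference)
```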
ISBN: (Print) 9798350353006
The exponential growth of large language models (LLMs) has opened up numerous possibilities for multi-modal AGI systems. However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs. In this work, we design a large-scale vision-language foundation model (InternVL), which scales up the vision foundation model to 6 billion parameters and progressively aligns it with the LLM, using web-scale image-text data from various sources. This model can be broadly applied to, and achieves state-of-the-art performance on, 32 generic visual-linguistic benchmarks, including visual perception tasks such as image-level or pixel-level recognition, vision-language tasks such as zero-shot image/video classification and zero-shot image/video-text retrieval, and linking with LLMs to create multi-modal dialogue systems. It has powerful visual capabilities and can be a good alternative to ViT-22B. We hope that our research could contribute to the development of multi-modal large models.
ISBN: (Print) 9798350353006
Text-to-image generative models are becoming increasingly popular and accessible to the general public. As these models see large-scale deployments, it is necessary to deeply investigate their safety and fairness so that they do not disseminate and perpetuate biases. However, existing works focus on detecting closed sets of biases defined a priori, limiting the studies to well-known concepts. In this paper, we tackle the challenge of open-set bias detection in text-to-image generative models, presenting OpenBias, a new pipeline that identifies and quantifies the severity of biases agnostically, without access to any precompiled set. OpenBias has three stages. In the first stage, we leverage a Large Language Model (LLM) to propose biases given a set of captions. In the second stage, the target generative model produces images using the same set of captions. Lastly, a Vision Question Answering model recognizes the presence and extent of the previously proposed biases. We study the behavior of Stable Diffusion 1.5, 2, and XL, emphasizing new biases never investigated before. Via quantitative experiments, we demonstrate that OpenBias agrees with current closed-set bias detection methods and human judgement.
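The three-stage pipeline can be sketched as below; propose_biases, generate_image, and vqa_answer are hypothetical stand-ins for the LLM, the text-to-image model, and the VQA model, and the severity score is a simple majority-share proxy rather than the paper's metric.

```python
# Schematic open-set bias detection pipeline: propose -> generate -> question.
from collections import Counter

def detect_open_set_biases(captions, propose_biases, generate_image, vqa_answer):
    # propose_biases returns {bias_name: (question, candidate_classes)}.
    proposals = propose_biases(captions)
    severity = {}
    for bias, (question, classes) in proposals.items():
        counts = Counter()
        for caption in captions:
            image = generate_image(caption)
            counts[vqa_answer(image, question, classes)] += 1
        total = sum(counts.values())
        # Share of the most frequent answer as a crude severity proxy.
        severity[bias] = max(counts.values()) / total if total else 0.0
    return severity
```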
ISBN: (Print) 9798350353006
We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative priors and the power of model scaling. Leveraging multi-modal techniques and advanced generative priors, SUPIR marks a significant advance in intelligent and realistic image restoration. As a pivotal catalyst within SUPIR, model scaling dramatically enhances its capabilities and demonstrates new potential for image restoration. We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations. SUPIR provides the capability to restore images guided by textual prompts, broadening its application scope and potential. Moreover, we introduce negative-quality prompts to further improve perceptual quality. We also develop a restoration-guided sampling method to suppress the fidelity issues encountered in generative-based restoration. Experiments demonstrate SUPIR's exceptional restoration effects and its novel capacity to manipulate restoration through textual prompts.
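Negative prompts are typically applied through a classifier-free-guidance-style combination of noise predictions; the snippet below is a generic sketch of that assumed mechanism, with denoise a hypothetical noise predictor, and it is not claimed to be SUPIR's restoration-guided sampler.

```python
# Generic negative-prompt guidance sketch for a diffusion sampler (assumed
# mechanism, not SUPIR's actual sampling). `denoise` is a hypothetical
# noise predictor conditioned on a prompt embedding.
def guided_noise(denoise, x_t, t, quality_emb, negative_quality_emb, scale=5.0):
    eps_pos = denoise(x_t, t, quality_emb)            # "high quality" direction
    eps_neg = denoise(x_t, t, negative_quality_emb)   # "low quality" direction
    # Push the prediction away from the negative-quality direction.
    return eps_neg + scale * (eps_pos - eps_neg)
```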
ISBN: (Print) 9798350353013; 9798350353006
Reverse engineering in the realm of Computer-Aided Design (CAD) has been a longstanding aspiration, though not yet entirely realized. Its primary aim is to uncover the CAD process behind a physical object given its 3D scan. We propose CAD-SIGNet, an end-to-end trainable and auto-regressive architecture to recover the design history of a CAD model, represented as a sequence of sketch-and-extrusion operations, from an input point cloud. Our model learns CAD visual-language representations by layer-wise cross-attention between point cloud and CAD language embeddings. In particular, a new Sketch Instance Guided Attention (SGA) module is proposed in order to reconstruct the fine-grained details of the sketches. Thanks to its auto-regressive nature, CAD-SIGNet not only reconstructs a unique full design history of the corresponding CAD model given an input point cloud but also provides multiple plausible design choices. This allows for an interactive reverse engineering scenario by providing designers with multiple next-step choices along the design process. Extensive experiments on publicly available CAD datasets showcase the effectiveness of our approach against existing baseline models in two settings, namely, full design history recovery and conditional auto-completion from point clouds.
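The interactive, auto-regressive aspect can be illustrated with a toy autocompletion loop: at each step a (hypothetical) next-token distribution yields several plausible next design tokens, and a designer, or a default argmax, picks one. This is only a schematic, not CAD-SIGNet's decoder.

```python
# Toy interactive auto-regressive autocompletion. `next_token_probs` is a
# hypothetical stand-in for the model's next-design-token distribution.
import numpy as np

def autocomplete(prefix, next_token_probs, steps=5, k=3, choose=None):
    seq = list(prefix)
    for _ in range(steps):
        probs = next_token_probs(seq)             # (vocab_size,) distribution
        top = np.argsort(probs)[-k:][::-1]        # k plausible next steps
        pick = choose(top) if choose else top[0]  # designer picks, or argmax
        seq.append(int(pick))
    return seq
```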
ISBN: (Print) 9798350353006
Vision-and-language navigation (VLN) enables an agent to navigate to a remote location in 3D environments by following natural language instructions. At each navigation step, the agent selects from possible candidate locations and then makes the move. For better navigation planning, the lookahead exploration strategy aims to effectively evaluate the agent's next action by accurately anticipating the future environment of candidate locations. To this end, some existing works predict RGB images for future environments, but this strategy suffers from image distortion and high computational cost. To address these issues, we propose the pre-trained hierarchical neural radiance representation model (HNR) to produce multi-level semantic features for future environments, which are more robust and efficient than pixel-wise RGB reconstruction. Furthermore, with the predicted future environmental representations, our lookahead VLN model is able to construct the navigable future path tree and select the optimal path via efficient parallel evaluation. Extensive experiments on the VLN-CE datasets confirm the effectiveness of our method. The code is available at https://***/MrZihan/HNR-VLN.
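A bare-bones sketch of lookahead planning over a future path tree is shown below; neighbors_fn and score_fn are hypothetical stand-ins for candidate expansion and the HNR-based path evaluator, and the parallel evaluation itself is not modeled here.

```python
# Bare-bones lookahead planning sketch over a future path tree.
# `neighbors_fn` returns candidate next locations and `score_fn` scores a
# whole path; both are hypothetical stand-ins for the model's components.
def expand(path, neighbors_fn, depth):
    if depth == 0:
        return [path]
    children = neighbors_fn(path[-1])
    if not children:
        return [path]
    futures = []
    for nxt in children:
        futures.extend(expand(path + [nxt], neighbors_fn, depth - 1))
    return futures

def best_lookahead_path(current, neighbors_fn, score_fn, depth=2):
    candidates = expand([current], neighbors_fn, depth)
    return max(candidates, key=score_fn)  # pick the best-scoring future path
```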