ISBN (print): 9798350353006
Multi-view stereo reconstruction (MVS) in the wild requires first estimating the camera intrinsic and extrinsic parameters. These are usually tedious and cumbersome to obtain, yet they are mandatory to triangulate corresponding pixels in 3D space, which is at the core of all best-performing MVS algorithms. In this work, we take an opposite stance and introduce DUSt3R, a radically novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections, operating without prior information about camera calibration or viewpoint poses. We cast the pairwise reconstruction problem as a regression of pointmaps, relaxing the hard constraints of usual projective camera models. We show that this formulation smoothly unifies the monocular and binocular reconstruction cases. In the case where more than two images are provided, we further propose a simple yet effective global alignment strategy that expresses all pairwise pointmaps in a common reference frame. We base our network architecture on standard Transformer encoders and decoders, allowing us to leverage powerful pretrained models. Our formulation directly provides a 3D model of the scene as well as depth information, but interestingly, we can seamlessly recover from it pixel matches, focal lengths, and relative and absolute camera poses. Extensive experiments on all these tasks showcase how DUSt3R effectively unifies various 3D vision tasks, setting new performance records on monocular and multi-view depth estimation as well as relative pose estimation. In summary, DUSt3R makes many geometric 3D vision tasks easy. Code and models at https://***/naver/dust3r.
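Since the abstract notes that focal lengths can be recovered directly from the regressed pointmaps, the following is a minimal sketch of one way to do so for a single image, assuming a pinhole camera with the principal point at the image centre; the paper itself uses a more robust estimator, and the function name and array layout here are illustrative placeholders.

```python
import numpy as np

def estimate_focal_from_pointmap(pointmap, eps=1e-8):
    """Estimate a focal length from a per-pixel 3D pointmap of shape (H, W, 3)
    expressed in the camera frame, as regressed by a DUSt3R-style network."""
    H, W, _ = pointmap.shape
    # Pixel coordinates relative to the assumed principal point (image centre).
    u, v = np.meshgrid(np.arange(W) - (W - 1) / 2.0,
                       np.arange(H) - (H - 1) / 2.0)
    X, Y, Z = pointmap[..., 0], pointmap[..., 1], pointmap[..., 2]
    valid = Z > eps
    x, y = X[valid] / Z[valid], Y[valid] / Z[valid]
    # Pinhole model: u = f * X / Z and v = f * Y / Z; solve for f in least squares.
    num = (u[valid] * x + v[valid] * y).sum()
    den = (x ** 2 + y ** 2).sum() + eps
    return float(num / den)
```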
ISBN (print): 9798350353013; 9798350353006
Recently, efficient vision Transformers have shown great performance with low latency on resource-constrained devices. Conventionally, they use 4x4 patch embeddings and a 4-stage structure at the macro level, while utilizing sophisticated attention with a multi-head configuration at the micro level. This paper aims to address computational redundancy at all design levels in a memory-efficient manner. We discover that using a larger-stride patchify stem not only reduces memory access costs but also achieves competitive performance by leveraging token representations with reduced spatial redundancy from the early stages. Furthermore, our preliminary analyses suggest that attention layers in the early stages can be substituted with convolutions, and that several attention heads in the latter stages are computationally redundant. To handle this, we introduce a single-head attention module that inherently prevents head redundancy and simultaneously boosts accuracy by combining global and local information in parallel. Building upon our solutions, we introduce SHViT, a Single-Head Vision Transformer that obtains a state-of-the-art speed-accuracy tradeoff. For example, on ImageNet-1k, our SHViT-S4 is 3.3x, 8.1x, and 2.4x faster than MobileViTv2 x1.0 on GPU, CPU, and an iPhone 12 mobile device, respectively, while being 1.3% more accurate. For object detection and instance segmentation on MS COCO using a Mask R-CNN head, our model achieves performance comparable to FastViT-SA12 while exhibiting 3.8x and 2.0x lower backbone latency on GPU and mobile device, respectively.
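As a concrete illustration of the single-head idea, here is a rough sketch of an attention module that runs one global attention head over only a small slice of the channels and passes the remaining channels through untouched, mixing both with a 1x1 projection. The split ratio, normalization, and projection layout are assumptions of this sketch rather than the exact SHViT design.

```python
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    """Illustrative single-head attention over a subset of channels."""

    def __init__(self, dim, attn_dim=16, qk_dim=16):
        super().__init__()
        self.attn_dim, self.qk_dim = attn_dim, qk_dim
        self.scale = qk_dim ** -0.5
        self.qkv = nn.Conv2d(attn_dim, 2 * qk_dim + attn_dim, kernel_size=1)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):                              # x: (B, C, H, W)
        B, C, H, W = x.shape
        # Only a slice of channels goes through global attention; the rest is kept as-is.
        x_attn, x_skip = torch.split(x, [self.attn_dim, C - self.attn_dim], dim=1)
        q, k, v = torch.split(self.qkv(x_attn),
                              [self.qk_dim, self.qk_dim, self.attn_dim], dim=1)
        q = q.flatten(2).transpose(1, 2)               # (B, HW, qk_dim)
        k = k.flatten(2)                               # (B, qk_dim, HW)
        v = v.flatten(2).transpose(1, 2)               # (B, HW, attn_dim)
        attn = (q @ k * self.scale).softmax(dim=-1)    # single head, no head split
        out = (attn @ v).transpose(1, 2).reshape(B, self.attn_dim, H, W)
        return self.proj(torch.cat([out, x_skip], dim=1))
```

Per the abstract, such attention would only be used in the later stages, with convolutions handling the early stages.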
ISBN (print): 9798350353013; 9798350353006
While head-mounted devices are becoming more compact, they provide egocentric views with significant self-occlusions of the device user. Hence, existing methods often fail to accurately estimate complex 3D poses from egocentric views. In this work, we propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation, which leverages the scene information and temporal context of egocentric stereo videos. Specifically, we utilize 1) depth features from our 3D scene reconstruction module with uniformly sampled windows of egocentric stereo frames, and 2) human joint queries enhanced by temporal features of the video inputs. Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting. Furthermore, we introduce two new benchmark datasets, i.e., UnrealEgo2 and UnrealEgo-RW (Real World). The proposed datasets offer a much larger number of egocentric stereo views with a wider variety of human motions than existing datasets, allowing comprehensive evaluation of existing and upcoming methods. Our extensive experiments show that the proposed approach significantly outperforms previous methods. UnrealEgo2, UnrealEgo-RW, and trained models are available on our project page and benchmark challenge.
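To make the "human joint queries attending to scene features" idea concrete, below is a minimal, generic sketch of learnable joint queries cross-attending to encoded scene/depth features through a standard transformer decoder. The actual architecture, feature dimensions, and temporal-fusion details of the paper are not specified here, and all names are placeholders.

```python
import torch
import torch.nn as nn

class JointQueryDecoder(nn.Module):
    """Sketch: learnable per-joint queries attend to scene/depth features."""

    def __init__(self, num_joints=16, dim=256, num_layers=4):
        super().__init__()
        self.joint_queries = nn.Parameter(torch.randn(num_joints, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.head = nn.Linear(dim, 3)                   # per-joint 3D coordinates

    def forward(self, scene_feats):                     # scene_feats: (B, N_tokens, dim)
        B = scene_feats.size(0)
        queries = self.joint_queries.unsqueeze(0).expand(B, -1, -1)
        decoded = self.decoder(queries, scene_feats)    # cross-attention to the scene
        return self.head(decoded)                       # (B, num_joints, 3)
```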
ISBN (print): 9798350353006
Utilizing large language models (LLMs) to compose off-the-shelf visual tools represents a promising avenue of research for developing robust visual assistants capable of addressing diverse visual tasks. However, these methods often overlook the potential for continual learning, typically freezing the utilized tools and thus limiting their adaptation to environments requiring new knowledge. To tackle this challenge, we propose CLOVA, a Closed-LOop Visual Assistant, which operates within a framework encompassing inference, reflection, and learning phases. During the inference phase, LLMs generate programs and execute corresponding tools to complete assigned tasks. In the reflection phase, a multimodal global-local reflection scheme analyzes human feedback to determine which tools require updating. Lastly, the learning phase employs three flexible approaches to automatically gather training data and introduces a novel prompt tuning scheme to update the tools, allowing CLOVA to efficiently acquire new knowledge. Experimental findings demonstrate that CLOVA surpasses existing tool-usage methods by 5% in visual question answering and multiple-image reasoning, by 10% in knowledge tagging, and by 20% in image editing. These results underscore the significance of the continual learning capability in general visual assistants.
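The inference-reflection-learning cycle described above can be summarized, very loosely, as the outline below; `llm`, `tools`, and `get_feedback` are hypothetical interfaces, and CLOVA's actual prompting, global-local reflection, and prompt-tuning updates are considerably more involved than this.

```python
def closed_loop_assistant(task, llm, tools, get_feedback, max_rounds=3):
    """Loose outline of an inference-reflection-learning loop.
    All objects passed in are hypothetical placeholder interfaces."""
    result = None
    for _ in range(max_rounds):
        # Inference: the LLM writes a program that composes visual tools.
        program = llm.generate_program(task, tool_specs=tools.describe())
        result = tools.execute(program)

        # Reflection: use human feedback to locate which tool needs updating.
        feedback = get_feedback(task, result)
        if feedback.is_correct:
            break
        faulty_tool = llm.reflect(program, result, feedback)

        # Learning: gather training data for that tool and update it
        # (e.g. via prompt tuning) before retrying the task.
        data = tools.gather_training_data(faulty_tool, task, feedback)
        tools.update(faulty_tool, data)
    return result
```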
ISBN (print): 9798350353006
Multi-modality image fusion is a technique that combines information from different sensors or modalities, enabling the fused image to retain complementary features from each modality, such as functional highlights and texture details. However, effective training of such fusion models is challenging due to the scarcity of ground-truth fusion data. To tackle this issue, we propose the Equivariant Multi-Modality imAge fusion (EMMA) paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Consequently, we introduce a novel training paradigm that encompasses a fusion module, a pseudo-sensing module, and an equivariant fusion module. These components enable the network training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior. Extensive experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images, concurrently facilitating downstream multi-modal segmentation and detection tasks. The code is available at https://***/Zhaozixiang1228/MMIF-EMMA.
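The equivariant-imaging prior mentioned above can be turned into a self-supervised objective along the following lines: fusing transformed inputs should match transforming the fused output, while a pseudo-sensing module keeps the fusion consistent with the sources. This is only an illustrative sketch with assumed L1 losses; EMMA's actual modules and loss terms differ in detail.

```python
import torch.nn.functional as F

def equivariant_fusion_loss(fuse, sense, img_a, img_b, transform):
    """Sketch of an equivariant self-supervised objective for image fusion.
    `fuse` maps two source modalities to one fused image, `sense` maps the
    fused image back to pseudo source images, and `transform` is a random
    geometric transform applied identically to all images."""
    fused = fuse(img_a, img_b)
    # Sensing consistency: the fused image should explain both inputs.
    pseudo_a, pseudo_b = sense(fused)
    sensing = F.l1_loss(pseudo_a, img_a) + F.l1_loss(pseudo_b, img_b)
    # Equivariance: fusing transformed inputs should match transforming the fusion.
    fused_t = fuse(transform(img_a), transform(img_b))
    equiv = F.l1_loss(fused_t, transform(fused))
    return sensing + equiv
```

Here `transform` could be any simple geometric operation such as a horizontal flip or a 90-degree rotation.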
ISBN (print): 9798350353006
Images captured under sub-optimal illumination conditions may contain both over- and under-exposures. Current approaches mainly focus on adjusting image brightness, which may exacerbate color tone distortion in under-exposed areas and fail to restore accurate colors in over-exposed regions. We observe that over- and under-exposed regions display opposite color tone distribution shifts, which may not be easily normalized in joint modeling as they usually do not have "normal-exposed" regions/pixels as reference. In this paper, we propose a novel method to enhance images with both over- and under-exposures by learning to estimate and correct such color shifts. Specifically, we first derive the color feature maps of the brightened and darkened versions of the input image via a UNet-based network, followed by a pseudo-normal feature generator to produce pseudo-normal color feature maps. We then propose a novel COlor Shift Estimation (COSE) module to estimate the color shifts between the derived brightened (or darkened) color feature maps and the pseudo-normal color feature maps. The COSE module corrects the estimated color shifts of the over- and under-exposed regions separately. We further propose a novel COlor MOdulation (COMO) module to modulate the separately corrected colors in the over- and under-exposed regions to produce the enhanced image. Comprehensive experiments show that our method outperforms existing approaches. Project webpage: https://***/yiyulics/CSEC.
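As a toy illustration of the estimate-then-correct idea behind the COSE module, the sketch below predicts a per-pixel color shift between a (brightened or darkened) feature map and its pseudo-normal counterpart and subtracts it. The real COSE and COMO modules handle over- and under-exposed regions separately and are far more elaborate, so every layer choice here is an assumption.

```python
import torch
import torch.nn as nn

class ColorShiftCorrection(nn.Module):
    """Toy sketch: estimate a color shift against a pseudo-normal reference
    and correct the features by removing it."""

    def __init__(self, dim):
        super().__init__()
        # Predict a per-pixel shift from the concatenated feature maps.
        self.estimate = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1))

    def forward(self, feat, pseudo_normal):            # both: (B, dim, H, W)
        shift = self.estimate(torch.cat([feat, pseudo_normal], dim=1))
        return feat - shift                             # corrected features
```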
ISBN (print): 9798350353006
Deployment of Transformer models on edge devices is becoming increasingly challenging due to the rapidly growing inference cost, which scales quadratically with the number of tokens in the input sequence. Token pruning is an emerging solution to address this challenge due to its ease of deployment on various Transformer backbones. However, most token pruning methods require computationally expensive fine-tuning, which is undesirable in many edge deployment cases. In this work, we propose Zero-TPrune, the first zero-shot method that considers both the importance and similarity of tokens in performing token pruning. It leverages the attention graph of pre-trained Transformer models to produce an importance distribution for tokens via our proposed Weighted Page Rank (WPR) algorithm. This distribution further guides token partitioning for efficient similarity-based pruning. Due to the elimination of the fine-tuning overhead, Zero-TPrune can prune large models at negligible computational cost, switch between different pruning configurations at no computational cost, and perform hyperparameter tuning efficiently. We evaluate the performance of Zero-TPrune on vision tasks by applying it to various vision Transformer backbones and testing them on ImageNet. Without any fine-tuning, Zero-TPrune reduces the FLOPs cost of DeiT-S by 34.7% and improves its throughput by 45.3% with only a 0.4% accuracy loss. Compared with state-of-the-art pruning methods that require fine-tuning, Zero-TPrune not only eliminates the need for fine-tuning after pruning but also does so with only a 0.1% accuracy loss. Compared with state-of-the-art fine-tuning-free pruning methods, Zero-TPrune reduces accuracy loss by up to 49% with similar FLOPs budgets. Project webpage: https://***/zerotprune.
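The attention-graph importance described above can be pictured as a PageRank-style iteration over the (head-averaged) attention matrix, as in the generic sketch below; the exact weighting used by the paper's Weighted Page Rank algorithm may differ.

```python
import torch

def attention_pagerank(attn, num_iters=10, damping=0.85):
    """PageRank-style token importance from an attention matrix.

    attn: (N, N) attention weights averaged over heads, where row i is the
    distribution of token i's attention over all tokens (rows sum to 1).
    A token scores highly when it receives attention from tokens that are
    themselves highly scored."""
    n = attn.size(0)
    score = torch.full((n,), 1.0 / n)
    for _ in range(num_iters):
        score = (1 - damping) / n + damping * (attn.transpose(0, 1) @ score)
        score = score / score.sum()            # keep it a probability vector
    return score                                # (N,) importance per token
```

Tokens with the lowest resulting scores are natural pruning candidates, and, per the abstract, the distribution also guides the subsequent similarity-based partitioning.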
ISBN (print): 9798350353006
The exponential growth of large language models (LLMs) has opened up numerous possibilities for multi-modal AGI systems. However, progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs. In this work, we design a large-scale vision-language foundation model (InternVL), which scales up the vision foundation model to 6 billion parameters and progressively aligns it with the LLM, using web-scale image-text data from various sources. This model can be broadly applied to, and achieves state-of-the-art performance on, 32 generic visual-linguistic benchmarks, including visual perception tasks such as image-level or pixel-level recognition, vision-language tasks such as zero-shot image/video classification and zero-shot image/video-text retrieval, and linking with LLMs to create multi-modal dialogue systems. It has powerful visual capabilities and can be a good alternative to ViT-22B. We hope that our research contributes to the development of multi-modal large models.
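For readers unfamiliar with what "aligning" a vision encoder with a language model on image-text data involves, the sketch below shows a generic CLIP-style contrastive objective over paired embeddings. This is only one standard ingredient of such alignment and is not presented as InternVL's exact progressive alignment recipe.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Generic image-text contrastive loss over matched pairs.

    image_emb, text_emb: (B, D) embeddings from a vision encoder and a
    language model, where row i of each forms a matched pair."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature     # (B, B) similarities
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    # Symmetric cross-entropy: match images to texts and texts to images.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```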
ISBN (print): 9798350353006
Text-to-image generative models are becoming increasingly popular and accessible to the general public. As these models see large-scale deployments, it is necessary to deeply investigate their safety and fairness so as not to disseminate and perpetuate any kind of biases. However, existing works focus on detecting closed sets of biases defined a priori, limiting the studies to well-known concepts. In this paper, we tackle the challenge of open-set bias detection in text-to-image generative models, presenting OpenBias, a new pipeline that identifies and quantifies the severity of biases agnostically, without access to any precompiled set. OpenBias has three stages. In the first stage, we leverage a Large Language Model (LLM) to propose biases given a set of captions. Secondly, the target generative model produces images using the same set of captions. Lastly, a Vision Question Answering model recognizes the presence and extent of the previously proposed biases. We study the behavior of Stable Diffusion 1.5, 2, and XL, emphasizing new biases never investigated before. Via quantitative experiments, we demonstrate that OpenBias agrees with current closed-set bias detection methods and human judgement.
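The three-stage pipeline reads naturally as the outline below, where `llm`, `generator`, and `vqa` are hypothetical interfaces for the bias-proposing LLM, the target text-to-image model, and the VQA model; the concrete prompts, data structures, and severity scoring used by OpenBias are not reproduced here.

```python
def open_set_bias_audit(captions, llm, generator, vqa):
    """Outline of a three-stage open-set bias audit.
    All interfaces and dictionary keys below are hypothetical placeholders."""
    report = {}
    # Stage 1: the LLM proposes candidate biases for the given captions,
    # e.g. {"name": "gender", "question": "...", "captions": [...]}.
    proposals = llm.propose_biases(captions)
    for bias in proposals:
        answers = []
        for caption in bias["captions"]:
            # Stage 2: the target generative model produces an image per caption.
            image = generator.generate(caption)
            # Stage 3: a VQA model checks whether and how the bias appears.
            answers.append(vqa.answer(image, bias["question"]))
        # Severity can then be quantified, e.g. by how skewed the answers are.
        report[bias["name"]] = answers
    return report
```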
ISBN (print): 9798350353006
We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative priors and the power of model scaling. Leveraging multi-modal techniques and advanced generative priors, SUPIR marks a significant advance in intelligent and realistic image restoration. As a pivotal catalyst within SUPIR, model scaling dramatically enhances its capabilities and demonstrates new potential for image restoration. We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations. SUPIR provides the capability to restore images guided by textual prompts, broadening its application scope and potential. Moreover, we introduce negative-quality prompts to further improve perceptual quality. We also develop a restoration-guided sampling method to suppress the fidelity issues encountered in generation-based restoration. Experiments demonstrate SUPIR's exceptional restoration effects and its novel capacity to manipulate restoration through textual prompts.
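One standard mechanism by which a negative-quality prompt can improve perceptual quality is classifier-free guidance that steers the sampler away from the negative condition, sketched below with a placeholder denoiser; SUPIR's full pipeline (its prompt design and restoration-guided sampling) goes well beyond this sketch.

```python
def guided_noise_prediction(denoiser, x_t, t, pos_cond, neg_cond, scale=7.5):
    """Classifier-free guidance with a negative-quality prompt.

    `denoiser(x_t, t, cond)` is a placeholder for a conditional diffusion
    denoiser; `neg_cond` encodes a negative-quality prompt such as
    "blurry, low quality, artifacts"."""
    eps_pos = denoiser(x_t, t, pos_cond)
    eps_neg = denoiser(x_t, t, neg_cond)
    # Push the prediction away from the negative-quality direction.
    return eps_neg + scale * (eps_pos - eps_neg)
```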