Search Results - Inner Mongolia University Library


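Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016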
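
The rules above can be sketched as a small query builder. This is a minimal illustration, not part of the library's system: the field codes come from the help text, while the helper names (`term`, `year_range`, `any_of`, `all_of`) are hypothetical and exist only for this example. It rebuilds the first search example as a string.

```python
# Field codes copied from the help text; all helper names below are
# hypothetical and only illustrate how an expression is assembled.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def term(field: str, value: str) -> str:
    """Format one field-qualified term such as K=情报学."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def year_range(start: int, end: int) -> str:
    """Format a year limit in the Y=start-end form used by the examples."""
    if start > end:
        raise ValueError("start year must not exceed end year")
    return term("Y", f"{start}-{end}")


def any_of(*terms: str) -> str:
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with AND (uppercase, one space on each side)."""
    return " AND ".join(terms)


# Rebuild the first search example from the help text.
query = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    year_range(1982, 2016),
)
print(query)
```

The printed string can be pasted into the advanced search box. Keeping the operators inside the helpers avoids the two mistakes the rules warn about: lowercase operators and missing spaces around them.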


Refine search results

Document type

23,001 conference papers
126 books
92 journal articles

Collection scope

23,218 electronic documents
1 print holding

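Subject classification

13,623 records: Engineering
- 11,108 records: Computer Science and Technology...
- 3,479 records: Software Engineering
- 2,445 records: Mechanical Engineering
- 1,716 records: Optical Engineering
- 1,075 records: Electrical Engineering
- 1,014 records: Control Science and Engineering
- 785 records: Information and Communication Engineering
- 412 records: Instrument Science and Technology
- 352 records: Bioengineering
- 251 records: Biomedical Engineering (may confer...
- 196 records: Electronic Science and Technology (may...
- 114 records: Chemical Engineering and Technology
- 108 records: Safety Science and Engineering
- 100 records: Surveying and Mapping Science and Technology
- 88 records: Architecture
- 87 records: Transportation Engineering
- 84 records: Civil Engineering
3,494 records: Medicine
- 3,481 records: Clinical Medicine
- 81 records: Basic Medicine (may confer medicine...
3,242 records: Science
- 1,939 records: Physics
- 1,640 records: Mathematics
- 563 records: Statistics (may confer science, ...
- 500 records: Biology
- 249 records: Systems Science
- 107 records: Chemistry
522 records: Management
- 311 records: Library, Information and Archives Manage...
- 224 records: Management Science and Engineering (may...
- 76 records: Business Administration
276 records: Arts
- 276 records: Design (may confer arts...
66 records: Law
- 63 records: Sociology
38 records: Agriculture
28 records: Education
22 records: Economics
10 records: Military Science
3 records: Literature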

Topic

10,187 records: computer vision
3,967 records: pattern recognit...
3,005 records: training
2,007 records: computational mo...
1,818 records: visualization
1,815 records: cameras
1,516 records: feature extracti...
1,481 records: shape
1,455 records: three-dimensiona...
1,438 records: image segmentati...
1,287 records: robustness
1,205 records: computer archite...
1,155 records: semantics
1,147 records: conferences
1,107 records: layout
1,092 records: computer science
1,087 records: object detection
1,025 records: benchmark testin...
970 records: codes
922 records: face recognition

Institution

136 records: univ sci & techn...
121 records: univ chinese aca...
118 records: chinese univ hon...
107 records: carnegie mellon ...
101 records: tsinghua univers...
101 records: microsoft resear...
95 records: swiss fed inst t...
93 records: zhejiang univ pe...
82 records: university of sc...
81 records: zhejiang univers...
80 records: university of ch...
77 records: shanghai ai lab ...
72 records: shanghai jiao to...
69 records: national laborat...
67 records: microsoft res as...
67 records: alibaba grp peop...
64 records: adobe research
61 records: tsinghua univ pe...
60 records: peking univ peop...
59 records: univ oxford oxfo...

Author

81 records: van gool luc
72 records: timofte radu
64 records: zhang lei
47 records: luc van gool
40 records: yang yi
40 records: li stan z.
37 records: loy chen change
34 records: chen chen
33 records: xiaoou tang
32 records: liu yang
32 records: qi tian
31 records: tian qi
31 records: sun jian
30 records: murino vittorio
30 records: pascal fua
29 records: darrell trevor
29 records: li fei-fei
28 records: li xin
28 records: ying shan
27 records: vasconcelos nuno

Language

23,137 records: English
53 records: Other
22 records: Chinese
5 records: Turkish
2 records: Japanese

Search criteria: "Any field=IEEE Conference on Computer Vision and Pattern Recognition Workshops"

23,219 records in total; showing records 201-210

Intrinsic Image Diffusion for Indoor Single-view Material Estimation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kocsis, Peter; Sitzmann, Vincent; Niessner, Matthias. Affiliations: Tech Univ Munich, Munich, Germany; MIT EECS, Cambridge, MA 02139, USA

ISBN: (print) 9798350353013; 9798350353006

We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes. Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps. Appearance decomposition poses a considerable challenge in computer vision due to the inherent ambiguity between lighting and material properties and the lack of real datasets. To address this issue, we advocate for a probabilistic formulation, where instead of attempting to directly predict the true material properties, we employ a conditional generative model to sample from the solution space. Furthermore, we show that utilizing the strong learned prior of recent diffusion models trained on large-scale real-world images can be adapted to material estimation and highly improves the generalization to real images. Our method produces significantly sharper, more consistent, and more detailed materials, outperforming state-of-the-art methods by 1.5dB on PSNR and by 45% better FID score on albedo prediction. We demonstrate the effectiveness of our approach through experiments on both synthetic and real-world datasets.

Keywords: Appearance Decomposition; Computer Vision; Deep Learning; Diffusion; Graphics; Lighting Estimation; Material Estimation

Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Liu, Jinyang; Teshome, Wondmgezahu; Ghimire, Sandesh; Sznaier, Mario; Camps, Octavia. Affiliations: Northeastern Univ, Boston, MA 02115, USA; Qualcomm, San Diego, CA, USA

ISBN: (print) 9798350353006

Solving image and video jigsaw puzzles poses the challenging task of rearranging image fragments or video frames from unordered sequences to restore meaningful images and video sequences. Existing approaches often hinge on discriminative models tasked with predicting either the absolute positions of puzzle elements or the permutation actions applied to the original data. Unfortunately, these methods face limitations in effectively solving puzzles with a large number of elements. In this paper, we propose JPDVT, an innovative approach that harnesses diffusion transformers to address this challenge. Specifically, we generate positional information for image patches or video frames, conditioned on their underlying visual content. This information is then employed to accurately assemble the puzzle pieces in their correct positions, even in scenarios involving missing pieces. Our method achieves state-of-the-art performance on several datasets.

Keywords: data imputation; diffusion models; solving puzzles

BIOCLIP: A Vision Foundation Model for the Tree of Life

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Stevens, Samuel; Wu, Jiaman; Thompson, Matthew J.; Campolongo, Elizabeth G.; Song, Chan Hee; Carlyle, David Edward; Dong, Li; Dahdul, Wasila M.; Stewart, Charles; Berger-Wolf, Tanya; Chao, Wei-Lun; Su, Yu. Affiliations: Ohio State Univ, Columbus, OH 43210, USA; Microsoft Res, Mountain View, CA, USA; Univ Calif Irvine, Irvine, CA, USA; Rensselaer Polytech Inst, Troy, NY, USA

ISBN: (print) 9798350353006

Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specific task and are not easily adaptable or extendable to new questions, contexts, and datasets. A vision model for general organismal biology questions on images is of timely need. To approach this, we curate and release TREEOFLIFE-10M, the largest and most diverse ML-ready dataset of biology images. We then develop BIOCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TREEOFLIFE-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge. We rigorously benchmark our approach on diverse fine-grained biology classification tasks and find that BIOCLIP consistently and substantially outperforms existing baselines (by 16% to 17% absolute). Intrinsic evaluation reveals that BIOCLIP has learned a hierarchical representation conforming to the tree of life, shedding light on its strong generalizability.

Keywords: computer vision; evolutionary biology & ecology; imageomics; machine learning

MAFA: Managing False Negatives for Vision-Language Pre-training

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Byun, Jaeseok; Kim, Dohoon; Moon, Taesup. Affiliations: Seoul Natl Univ, Dept ECE, Seoul, South Korea; Seoul Natl Univ, Dept ASRI INMC IPAI AIIS, Seoul, South Korea

ISBN: (print) 9798350353006

We consider a critical issue of false negatives in Vision-Language Pre-training (VLP), a challenge that arises from the inherent many-to-many correspondence of image-text pairs in large-scale web-crawled datasets. The presence of false negatives can impede achieving optimal performance and even lead to a significant performance drop. To address this challenge, we propose MAFA (MAnaging FAlse negatives), which consists of two pivotal components building upon the recently developed GRouped mIni-baTch sampling (GRIT) strategy: 1) an efficient connection mining process that identifies and converts false negatives into positives, and 2) label smoothing for the image-text contrastive (ITC) loss. Our comprehensive experiments verify the effectiveness of MAFA across multiple downstream tasks, emphasizing the crucial role of addressing false negatives in VLP, potentially even surpassing the importance of addressing false positives. In addition, the compatibility of MAFA with the recent BLIP-family model is also demonstrated. Code is available at https://***/jaeseokbyun/MAFA.


VicTR: Video-conditioned Text Representations for Activity Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kahatapitiya, Kumara; Arnab, Anurag; Nagrani, Arsha; Ryoo, Michael S. Affiliations: SUNY Stony Brook, Stony Brook, NY 11794, USA; Google Res, Mountain View, CA, USA

ISBN: (print) 9798350353006

Vision-Language models (VLMs) have excelled in the image domain, especially in zero-shot settings, thanks to the availability of vast pretraining data (i.e., paired image-text samples). However, for videos, such paired data is not as abundant. Therefore, video-VLMs are usually designed by adapting pretrained image-VLMs to the video domain, instead of training from scratch. All such recipes rely on augmenting visual embeddings with temporal information (i.e., image → video), often keeping text embeddings unchanged or even discarding them. In this paper, we argue the contrary: better video-VLMs can be designed by focusing more on augmenting text, rather than visual information. More specifically, we introduce Video-conditioned Text Representations (VicTR): a form of text embeddings optimized w.r.t. visual embeddings, creating a more flexible contrastive latent space. Our model can further make use of freely available semantic information, in the form of visually grounded auxiliary text (e.g., object or scene information). We evaluate our model on few-shot, zero-shot (HMDB-51, UCF-101), short-form (Kinetics-400) and long-form (Charades) activity recognition benchmarks, showing strong performance among video-VLMs.

Keywords: Activity recognition; Video understanding; Video-conditioned text; Vision-language models

StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Choi, Jongwoo; Seo, Kwanggyoon; Ashtari, Amirsaman; Noh, Junyong. Affiliations: Korea Adv Inst Sci & Technol, Visual Media Lab, Daejeon, South Korea

ISBN: (print) 9798350353013; 9798350353006

We propose a method that can generate cinemagraphs automatically from a still landscape image using a pre-trained StyleGAN. Inspired by the success of recent unconditional video generation, we leverage a powerful pre-trained image generator to synthesize high-quality cinemagraphs. Unlike previous approaches that mainly utilize the latent space of a pre-trained StyleGAN, our approach utilizes its deep feature space for both GAN inversion and cinemagraph generation. Specifically, we propose multi-scale deep feature warping (MSDFW), which warps the intermediate features of a pre-trained StyleGAN at different resolutions. By using MSDFW, the generated cinemagraphs are of high resolution and exhibit plausible looping animation. We demonstrate the superiority of our method through user studies and quantitative comparisons with state-of-the-art cinemagraph generation methods and a video generation method that uses a pre-trained StyleGAN.

Keywords: Image and video synthesis and generation; Vision + graphics; Vision applications and systems

Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kang, Taeho; Lee, Youngki. Affiliations: Seoul Natl Univ, Seoul, South Korea

ISBN: (print) 9798350353013; 9798350353006

We present EgoTAP, a heatmap-to-3D pose lifting method for highly accurate stereo egocentric 3D pose estimation. Severe self-occlusion and out-of-view limbs in egocentric camera views make accurate pose estimation a challenging problem. To address the challenge, prior methods employ joint heatmaps (probabilistic 2D representations of the body pose), but heatmap-to-3D pose conversion still remains an inaccurate process. We propose a novel heatmap-to-3D lifting method composed of the Grid ViT Encoder and the Propagation Network. The Grid ViT Encoder summarizes joint heatmaps into effective feature embedding using self-attention. Then, the Propagation Network estimates the 3D pose by utilizing skeletal information to better estimate the position of obscure joints. Our method significantly outperforms the previous state-of-the-art qualitatively and quantitatively demonstrated by a 23.9% reduction of error in an MPJPE metric. Our source code is available on GitHub.

Keywords: 3D pose estimation; egocentric vision; stereo vision

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hu, Li. Affiliations: Alibaba Grp, Inst Intelligent Comp, Hangzhou, Peoples R China

ISBN: (digital) 9798350353006

ISBN: (print) 9798350353013; 9798350353006

Character Animation aims to generate character videos from still images through driving signals. Currently, diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities. However, challenges persist in the realm of image-to-video, especially in character animation, where temporally maintaining consistency with detailed information from the character remains a formidable problem. In this paper, we leverage the power of diffusion models and propose a novel framework tailored for character animation. To preserve consistency of intricate appearance features from the reference image, we design ReferenceNet to merge detail features via spatial attention. To ensure controllability and continuity, we introduce an efficient pose guider to direct the character's movements and employ an effective temporal modeling approach to ensure smooth inter-frame transitions between video frames. By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods. Furthermore, we evaluate our method on image animation benchmarks, achieving state-of-the-art results.

Keywords: Visualization; Computer vision; Training data; Benchmark testing; Animation; Diffusion models; Controllability

Resource-Efficient Transformer Pruning for Finetuning of Large Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ilhan, Fatih; Su, Gong; Tekin, Selim Furkan; Huang, Tiansheng; Hu, Sihao; Liu, Ling. Affiliations: Georgia Inst Technol, Atlanta, GA 30332, USA; IBM Res, Yorktown Hts, NY, USA

ISBN: (print) 9798350353006

With the recent advances in vision transformers and large language models (LLMs), finetuning costly large models on downstream learning tasks poses significant challenges under limited computational resources. This paper presents a REsource and ComputAtion-efficient Pruning framework (RECAP) for the finetuning of transformer-based large models. RECAP by design bridges the gap between efficiency and performance through an iterative process cycling between pruning, finetuning, and updating stages to explore different chunks of the given large-scale model. At each iteration, we first prune the model with Taylor-approximation-based importance estimation and then only update a subset of the pruned model weights based on the Fisher-information criterion. In this way, RECAP achieves two synergistic and yet conflicting goals: reducing the GPU memory footprint while maintaining model performance, unlike most existing pruning methods that require the model to be finetuned beforehand for better preservation of model performance. We perform extensive experiments with a wide range of large transformer-based architectures on various computer vision and natural language understanding tasks. Compared to recent pruning techniques, we demonstrate that RECAP offers significant improvements in GPU memory efficiency, capable of reducing the footprint by up to 65%.

Keywords: efficient finetuning; pruning; vision transformers

CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Meral, Tuna Han Salih; Simsar, Enis; Tombari, Federico; Yanardag, Pinar. Affiliations: Virginia Tech, Blacksburg, VA, USA; Swiss Fed Inst Technol, Zurich, Switzerland; TUM, Munich, Germany; Google, Menlo Pk, CA, USA

ISBN: (print) 9798350353006

Images produced by text-to-image diffusion models might not always faithfully represent the semantic intent of the provided text prompt, where the model might overlook or entirely fail to produce certain objects. Existing solutions often require customly tailored functions for each of these problems, leading to sub-optimal results, especially for complex prompts. Our work introduces a novel perspective by tackling this challenge in a contrastive context. Our approach intuitively promotes the segregation of objects in attention maps while also maintaining that pairs of related attributes are kept close to each other. We conduct extensive experiments across a wide variety of scenarios, each involving unique combinations of objects, attributes, and scenes. These experiments effectively showcase the versatility, efficiency, and flexibility of our method in working with both latent and pixel-based diffusion models, including Stable Diffusion and Imagen. Moreover, we publicly share our source code to facilitate further research.

Keywords: Computer vision; Contrastive learning; Generative AI; Semantic fidelity; Stable Diffusion; Text-to-image diffusion models


No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
