ISBN (Print): 9798350353013; 9798350353006
Image denoising approaches based on deep neural networks often struggle with overfitting to specific noise distributions present in training data. This challenge persists in existing real-world denoising networks, which are trained using a limited spectrum of real noise distributions and thus show poor robustness to out-of-distribution real noise types. To alleviate this issue, we develop a novel training framework called Adversarial Frequency Mixup (AFM). AFM leverages mixup in the frequency domain to generate noisy images with distinctive and challenging noise characteristics, all the while preserving the properties of authentic real-world noise. Subsequently, incorporating these noisy images into the training pipeline enhances the denoising network's robustness to variations in noise distributions. Extensive experiments and analyses, conducted on a wide range of real noise benchmarks, demonstrate that denoising networks trained with our proposed framework exhibit significant improvements in robustness to unseen noise distributions. The code is available at https://***/dhryougit/AFM.
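The abstract does not give implementation details, but the core idea of mixing noise characteristics in the frequency domain can be illustrated with a small sketch. The per-frequency weight map below is a placeholder: in AFM it would be optimized adversarially against the denoiser, whereas here it is just a random tensor.

```python
# A minimal sketch of frequency-domain mixup between two noisy images,
# assuming the mix happens on the full complex spectrum with a per-frequency
# weight map; the actual AFM mask is tuned adversarially, not sampled randomly.
import torch

def frequency_mixup(noisy_a: torch.Tensor, noisy_b: torch.Tensor,
                    mix_weight: torch.Tensor) -> torch.Tensor:
    """Blend the spectra of two noisy images of shape [C, H, W].

    mix_weight is a per-frequency map in [0, 1] of shape [H, W].
    """
    spec_a = torch.fft.fft2(noisy_a)           # complex spectrum of image A
    spec_b = torch.fft.fft2(noisy_b)           # complex spectrum of image B
    mixed = mix_weight * spec_a + (1.0 - mix_weight) * spec_b
    return torch.fft.ifft2(mixed).real         # back to the spatial domain

# usage: a random weight map stands in for the adversarially optimized one
a, b = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
augmented = frequency_mixup(a, b, torch.rand(64, 64))
```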
ISBN (Print): 9798350365474
N:M sparsity is an emerging model compression method supported by more and more accelerators to speed up sparse matrix multiplication in deep neural networks. Most existing N:M sparsity methods compress neural networks with a uniform setting for all layers in a network or heuristically determine the layer-wise configuration by considering the number of parameters in each layer. However, very few methods have been designed to obtain a layer-wise customized N:M sparse configuration for vision transformers (ViTs), which usually consist of transformer blocks involving the same number of parameters. In this work, to address the challenge of selecting a suitable sparse configuration for ViTs on N:M sparsity-supporting accelerators, we propose ELSA, Exploiting Layer-wise N:M Sparsity for ViTs. By considering not only all N:M sparsity levels supported by a given accelerator but also the expected throughput improvement, our methodology can reap the benefits of accelerators supporting mixed sparsity, trading negligible accuracy loss for reductions in both memory usage and inference time for ViT models. For instance, our approach achieves a noteworthy 2.9x reduction in FLOPs for both Swin-B and DeiT-B with only a marginal degradation of accuracy on ImageNet. Our code is publicly available at https://***/ningchihuang/ELSA.
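For readers unfamiliar with N:M sparsity, the sketch below shows what enforcing a given N:M pattern on a weight matrix looks like: within every group of M consecutive weights, only the N largest-magnitude entries are kept. ELSA's contribution is choosing (N, M) per layer under accelerator and throughput constraints, which is not reproduced here.

```python
# A minimal sketch of applying an N:M sparsity pattern along the input
# dimension of a dense weight matrix; the layer-wise configuration search
# of ELSA is out of scope for this helper.
import torch

def apply_nm_sparsity(weight: torch.Tensor, n: int, m: int) -> torch.Tensor:
    out_features, in_features = weight.shape
    assert in_features % m == 0, "input dim must be divisible by M"
    groups = weight.reshape(out_features, in_features // m, m)
    # indices of the (m - n) smallest-magnitude weights in each group
    _, drop_idx = groups.abs().topk(m - n, dim=-1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop_idx, 0.0)           # zero out the dropped positions
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
w_2of4 = apply_nm_sparsity(w, n=2, m=4)   # the common 2:4 pattern
w_1of4 = apply_nm_sparsity(w, n=1, m=4)   # a more aggressive per-layer choice
```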
ISBN (Print): 9798350353006
Neural radiance fields provide state-of-the-art view synthesis quality but tend to be slow to render. One reason is that they make use of volume rendering, thus requiring many samples (and model queries) per ray at render time. Although this representation is flexible and easy to optimize, most real-world objects can be modeled more efficiently with surfaces instead of volumes, requiring far fewer samples per ray. This observation has spurred considerable progress in surface representations, such as signed distance functions, but these may struggle to model semi-opaque and thin structures. We propose a method, HybridNeRF, that leverages the strengths of both representations by rendering most objects as surfaces while modeling the (typically) small fraction of challenging regions volumetrically. We evaluate HybridNeRF against the challenging Eyeful Tower dataset [38] along with other commonly used view synthesis datasets. When comparing to state-of-the-art baselines, including recent rasterization-based approaches, we improve error rates by 15-30% while achieving real-time framerates (at least 36 FPS) for virtual-reality resolutions (2K × 2K). Project page: https://***/hybrid-nerf/.
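As a rough illustration of the surface/volume split, the sketch below shades "surface-like" rays from a single field query and falls back to dense volume rendering elsewhere. The toy radiance field and the fixed surface depth are stand-ins, not the paper's model.

```python
# A minimal sketch of the hybrid idea: a cheap single-sample path for rays
# flagged as surface-like, and standard volume rendering for the rest.
import torch

def toy_field(points):
    """Stand-in radiance field: returns (density, rgb) for [N, 3] points."""
    density = torch.relu(1.0 - points.norm(dim=-1))   # a soft unit sphere
    rgb = torch.sigmoid(points)                        # arbitrary colors
    return density, rgb

def volume_render(density, rgb, deltas):
    alphas = 1.0 - torch.exp(-density * deltas)
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), 0)
    weights = alphas * trans
    return (weights[:, None] * rgb).sum(dim=0)

def render_ray(origin, direction, surface_like: bool, t_surface: float = 2.0):
    if surface_like:
        # cheap path: a single query at the estimated surface depth
        _, rgb = toy_field((origin + t_surface * direction)[None, :])
        return rgb[0]
    # expensive path: dense sampling for semi-opaque or thin structures
    ts = torch.linspace(0.1, 5.0, 128)
    pts = origin[None, :] + ts[:, None] * direction[None, :]
    density, rgb = toy_field(pts)
    return volume_render(density, rgb, torch.full_like(ts, 4.9 / 128))

color = render_ray(torch.tensor([0.0, 0.0, -3.0]),
                   torch.tensor([0.0, 0.0, 1.0]), surface_like=False)
```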
ISBN (Print): 9798350365474
Contextual cues related to a person's pose and interactions with objects and other people in the scene can provide valuable information for gaze following. While existing methods have focused on dedicated cue extraction methods, in this work we investigate the zero-shot capabilities of Vision-Language Models (VLMs) for extracting a wide array of contextual cues to improve gaze following performance. We first evaluate various VLMs, prompting strategies, and in-context learning (ICL) techniques for zero-shot cue recognition performance. We then use these insights to extract contextual cues for gaze following and investigate their impact when incorporated into a state-of-the-art model for the task. Our analysis indicates that BLIP-2 is the overall top performing VLM and that ICL can improve performance. We also observe that VLMs are sensitive to the choice of the text prompt, although ensembling over multiple text prompts can provide more robust performance. Additionally, we discover that using the entire image along with an ellipse drawn around the target person is the most effective strategy for visual prompting. For gaze following, incorporating the extracted cues results in better generalization performance, especially when considering a larger set of cues, highlighting the potential of this approach.
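The visual-prompting strategy reported as most effective (the full image plus an ellipse drawn around the target person) can be sketched with the Hugging Face BLIP-2 interface. The image path, person box, and prompt wording below are hypothetical; the paper additionally ensembles over multiple prompts.

```python
# A minimal sketch: draw an ellipse around the target person on the full
# image, then query BLIP-2 for a contextual cue with a text prompt.
from PIL import Image, ImageDraw
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("scene.jpg").convert("RGB")      # hypothetical input image
person_box = (120, 80, 260, 400)                    # hypothetical target person
ImageDraw.Draw(image).ellipse(person_box, outline="red", width=5)

prompt = "Question: What is the person inside the red ellipse doing? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
cue = processor.batch_decode(out, skip_special_tokens=True)[0].strip()
print(cue)   # a short action/pose description used as a contextual cue
```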
ISBN (Print): 9798350353006
Recently, diffusion models have emerged as a powerful new generative method for 3D point cloud generation tasks. However, few works study the effect of the architecture of the diffusion model on 3D point clouds, resorting to the typical UNet model developed for 2D images. Inspired by the wide adoption of Transformers, we study the complementary roles of convolution (from UNet) and attention (from Transformers). We discover that their respective importance changes with the timestep in the diffusion process. In the early stages, attention has an outsized influence because Transformers generate the overall shape more quickly, while at later stages, when fine detail is added, convolution has a larger impact on the generated point cloud's local surface quality. In light of this observation, we propose a time-varying two-stream denoising model that combines convolution layers and transformer blocks. We generate an optimizable mask at each timestep to reweight global and local features, obtaining time-varying fused features. Experimentally, we demonstrate that our proposed method quantitatively outperforms other state-of-the-art methods regarding visual quality and diversity. Code is available at https://***/Zhiyuan-R/Tiger-Diffusion.
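A minimal sketch of the time-varying fusion idea is given below: features from a convolution branch and an attention branch are blended with a timestep-conditioned gate. The branch definitions and the gate parameterization are simplified placeholders, not the paper's exact architecture.

```python
# A minimal sketch of blending local (convolution) and global (attention)
# features with a learnable, per-timestep mask.
import torch
import torch.nn as nn

class TimeVaryingFusion(nn.Module):
    def __init__(self, dim: int, num_timesteps: int = 1000):
        super().__init__()
        self.conv_branch = nn.Conv1d(dim, dim, kernel_size=3, padding=1)   # local
        self.attn_branch = nn.MultiheadAttention(dim, num_heads=4,
                                                 batch_first=True)          # global
        self.gate = nn.Embedding(num_timesteps, dim)   # one gate vector per timestep

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: [batch, num_points, dim], t: [batch] integer diffusion timesteps
        local = self.conv_branch(x.transpose(1, 2)).transpose(1, 2)
        global_feat, _ = self.attn_branch(x, x, x)
        m = torch.sigmoid(self.gate(t)).unsqueeze(1)   # [batch, 1, dim] in (0, 1)
        # early timesteps can weight attention more, later ones convolution
        return m * global_feat + (1.0 - m) * local

fusion = TimeVaryingFusion(dim=64)
feats = fusion(torch.randn(2, 256, 64), torch.tensor([999, 10]))
```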
ISBN (Print): 9798350365474
In the last few years, research interest in Vision-and-Language Navigation (VLN) has grown significantly. VLN is a challenging task that involves an agent following human instructions and navigating in a previously unknown environment to reach a specified goal. Recent work in the literature focuses on different ways to augment the available datasets of instructions for improving navigation performance by exploiting synthetic training data. In this work, we propose AIGeN, a novel architecture inspired by Generative Adversarial Networks (GANs) that produces meaningful and well-formed synthetic instructions to improve navigation agents' performance. The model is composed of a Transformer decoder (GPT-2) and a Transformer encoder (BERT). During the training phase, the decoder generates sentences for a sequence of images describing the agent's path to a particular point, while the encoder discriminates between real and fake instructions. Experimentally, we evaluate the quality of the generated instructions and perform extensive ablation studies. Additionally, we generate synthetic instructions for 217K trajectories using AIGeN on the Habitat-Matterport 3D Dataset (HM3D) and show an improvement in the performance of an off-the-shelf VLN method. Validation on REVERIE and R2R highlights the promising aspects of our proposal, which achieves state-of-the-art performance.
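The decoder/encoder pairing can be sketched with off-the-shelf Hugging Face models: GPT-2 proposes an instruction and a BERT classifier scores it as real or synthetic. Visual conditioning on the path images and the adversarial training losses are omitted, so this is only an interface-level illustration, not the paper's pipeline.

```python
# A minimal sketch of the generator/discriminator roles described above.
import torch
from transformers import (GPT2LMHeadModel, GPT2Tokenizer,
                          BertForSequenceClassification, BertTokenizer)

gen_tok = GPT2Tokenizer.from_pretrained("gpt2")
generator = GPT2LMHeadModel.from_pretrained("gpt2")
disc_tok = BertTokenizer.from_pretrained("bert-base-uncased")
discriminator = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # label 0 = synthetic, 1 = real

# generator step: propose an instruction (unconditioned here for brevity)
prompt = gen_tok("Walk down the hallway", return_tensors="pt")
fake_ids = generator.generate(**prompt, max_new_tokens=20, do_sample=True)
fake_instruction = gen_tok.decode(fake_ids[0], skip_special_tokens=True)

# discriminator step: score a real and a generated instruction
real_instruction = "Go past the sofa and stop at the door."
batch = disc_tok([real_instruction, fake_instruction],
                 return_tensors="pt", padding=True)
logits = discriminator(**batch).logits    # adversarial losses would use these
```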
ISBN (Print): 9798350353006
Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset. However, mitigating the performance degradation in large-scale models is non-trivial due to (i) parameter shifts throughout life-long learning and (ii) significant computational burdens associated with full-model tuning. In this work, we present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models. Our approach involves the dynamic expansion of a pre-trained CLIP model through the integration of Mixture-of-Experts (MoE) adapters in response to new tasks. To preserve the zero-shot recognition capability of vision-language models, we further introduce a Distribution Discriminative Auto-Selector (DDAS) that automatically routes in-distribution and out-of-distribution inputs to the MoE adapters and the original CLIP, respectively. Extensive experiments across various settings show that our proposed method consistently outperforms previous state-of-the-art approaches while reducing parameter training burdens by 60%. Our code is available at https://***/JiazuoYu/MoE-Adapters4CL.
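The routing behavior of DDAS can be illustrated with a small stand-in: an input whose feature lies close to a seen-task distribution goes to the adapter-augmented branch, otherwise the frozen zero-shot CLIP branch answers. The distance test and threshold below are illustrative, not the paper's actual selector.

```python
# A minimal sketch of in-distribution vs. out-of-distribution routing.
import torch

def route(feature: torch.Tensor,
          task_means: torch.Tensor,        # [num_seen_tasks, dim] feature means
          threshold: float) -> str:
    dists = torch.cdist(feature[None, :], task_means)[0]   # distance to each task
    return "moe_adapter" if dists.min() < threshold else "zero_shot_clip"

feat = torch.randn(512)                    # hypothetical CLIP image feature
seen = torch.randn(3, 512)                 # hypothetical per-task statistics
print(route(feat, seen, threshold=25.0))   # which branch produces the prediction
```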
ISBN (Print): 9798350353006
The combination of strong visual backbones and Large Language Model (LLM) reasoning has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range of vision and language (VL) tasks. However, recent research has shown that even the most advanced LMMs still struggle to capture aspects of compositional visual reasoning, such as attributes and relationships between objects. One solution is to utilize scene graphs (SGs), a formalization of objects and their relations and attributes that has been extensively used as a bridge between the visual and textual domains. Yet, scene graph data requires annotations, which are expensive to collect and thus not easily scalable. Moreover, finetuning an LMM based on SG data can lead to catastrophic forgetting of the pretraining objective. To overcome this, inspired by chain-of-thought methods, we propose Compositional Chain-of-Thought (CCoT), a novel zero-shot Chain-of-Thought prompting method that utilizes SG representations in order to extract compositional knowledge from an LMM. Specifically, we first generate an SG using the LMM, and then use that SG in the prompt to produce a response. Through extensive experiments, we find that the proposed CCoT approach not only improves LMM performance on several vision and language (VL) compositional benchmarks but also improves the performance of several popular LMMs on general multimodal benchmarks, without the need for fine-tuning or annotated ground-truth SGs. Code: https://***/chancharikmitra/CCoT.
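The two-step prompting scheme is straightforward to express in code. In the sketch below, `query_lmm` is a hypothetical wrapper around whatever multimodal model is being used, and the prompt wording is illustrative rather than the paper's exact prompts.

```python
# A minimal sketch of the two-step prompting: first ask the LMM for a scene
# graph, then place that scene graph in the prompt for the actual question.
def query_lmm(image_path: str, prompt: str) -> str:
    raise NotImplementedError("wrap your LMM of choice here")

def compositional_cot(image_path: str, question: str) -> str:
    sg_prompt = ("For the provided image, generate a scene graph in JSON that "
                 "lists the objects, their attributes, and their relationships.")
    scene_graph = query_lmm(image_path, sg_prompt)        # step 1: SG generation
    answer_prompt = (f"Scene graph:\n{scene_graph}\n\n"
                     f"Using the image and the scene graph, answer: {question}")
    return query_lmm(image_path, answer_prompt)           # step 2: grounded answer
```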
ISBN (Print): 9798350353006
In recent years, the thriving development of research related to egocentric videos has provided a unique perspective for the study of conversational interactions, where both visual and audio signals play a crucial role. While most prior work focuses on learning about behaviors that directly involve the camera wearer, we introduce the Ego-Exocentric Conversational Graph Prediction problem, marking the first attempt to infer exocentric conversational interactions from egocentric videos. We propose a unified multi-modal framework, Audio-Visual Conversational Attention (AV-CONV), for the joint prediction of conversation behaviors (speaking and listening) for both the camera wearer and all other social partners present in the egocentric video. Specifically, we adopt the self-attention mechanism to model the representations across time, across subjects, and across modalities. To validate our method, we conduct experiments on a challenging egocentric video dataset that includes multi-speaker and multi-conversation scenarios. Our results demonstrate the superior performance of our method compared to a series of baselines. We also present detailed ablation studies to assess the contribution of each component in our model. See our project page for more details.
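The across-time, across-subject, and across-modality modeling can be sketched by running self-attention along different axes of a joint feature tensor. The tensor sizes are illustrative, and a single shared attention module is used here for brevity, whereas dedicated blocks per axis would be the more faithful design.

```python
# A minimal sketch of self-attention applied along different axes of a joint
# audio-visual representation (batch, subjects, time, modalities, features).
import torch
import torch.nn as nn

B, S, T, M, D = 2, 4, 8, 2, 64          # illustrative sizes
feats = torch.randn(B, S, T, M, D)
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)

def attend_over(x: torch.Tensor, axis: int) -> torch.Tensor:
    """Run self-attention along one axis, treating all other axes as batch."""
    x = x.movedim(axis, -2)                       # [..., axis_len, D]
    lead = x.shape[:-2]
    seq = x.reshape(-1, x.shape[-2], D)
    out, _ = attn(seq, seq, seq)
    return out.reshape(*lead, x.shape[-2], D).movedim(-2, axis)

feats = attend_over(feats, axis=2)   # across time
feats = attend_over(feats, axis=1)   # across subjects
feats = attend_over(feats, axis=3)   # across modalities
```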
ISBN (Print): 9798350365474
Accurate motion capture is useful for sports motion analysis but incurs high acquisition costs. Monocular or few-camera multi-view pose estimation provides an accessible but less accurate alternative, especially for sports motion, since models are trained on datasets of daily activities. In addition, multi-view estimation is still costly due to camera calibration. Therefore, it is desirable to develop an accurate and cost-effective motion capture system for daily training in sports. In this paper, we propose an accurate and convenient sports motion capture system based on unsupervised fine-tuning. The proposed system estimates 3D joint positions by multi-view estimation based on automatic calibration with the human body. These results are used as pseudo-labels to fine-tune a recent high-performance monocular 3D pose estimation model. Since the fine-tuning improves the model's accuracy for sports motion, we can choose multi-view or monocular estimation depending on the situation. We evaluated the system using a running motion dataset and ASPset-510, and showed that fine-tuning improved the performance of monocular estimation to the same level as multi-view estimation for running motion. Our proposed system can be useful for daily motion analysis in sports.
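The pseudo-labelling loop described above can be sketched as follows: 3D joints recovered by calibrated multi-view estimation supervise fine-tuning of a monocular model. The models and the MSE loss below are placeholders for whatever estimators and pose losses are actually used.

```python
# A minimal sketch of fine-tuning a monocular 3D pose model on multi-view
# pseudo-labels; any nn.Module mapping an image to [J, 3] joints fits here.
import torch
import torch.nn as nn

def finetune_monocular(mono_model: nn.Module,
                       frames: torch.Tensor,          # [N, 3, H, W] video frames
                       pseudo_joints: torch.Tensor,   # [N, J, 3] multi-view output
                       epochs: int = 5) -> nn.Module:
    opt = torch.optim.Adam(mono_model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()                            # MPJPE-style losses also common
    for _ in range(epochs):
        for img, target in zip(frames, pseudo_joints):
            pred = mono_model(img.unsqueeze(0)).squeeze(0)
            loss = loss_fn(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return mono_model
```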