ISBN (Print): 9798350353006
Vision-based roadside 3D object detection has attracted rising attention in the autonomous driving domain, since it has inherent advantages in reducing blind spots and expanding the perception range. However, previous work mainly focuses on accurately estimating depth or height for 2D-to-3D mapping, ignoring the position approximation error in the voxel pooling process. Inspired by this insight, we propose a novel voxel pooling strategy to reduce such error, dubbed BEVSpread. Specifically, instead of bringing the image features contained in a frustum point to a single BEV grid, BEVSpread considers each frustum point as a source and spreads the image features to the surrounding BEV grids with adaptive weights. To achieve superior propagation performance, a specific weight function is designed to dynamically control the decay speed of the weights according to distance and depth. Aided by customized CUDA parallel acceleration, BEVSpread achieves inference time comparable to the original voxel pooling. Extensive experiments on two large-scale roadside benchmarks demonstrate that, as a plug-in, BEVSpread can significantly improve the performance of existing frustum-based BEV methods by a large margin of (1.12, 5.26, 3.01) AP for vehicles, pedestrians and cyclists. The source code will be made publicly available at BEVSpread.
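The spreading step described above can be sketched in a few lines of plain PyTorch (the paper relies on a custom CUDA kernel for speed). This is a minimal illustration under assumptions, not the authors' implementation: the function name, the Gaussian-style weight, and the `alpha`/`beta` depth schedule are all hypothetical.

```python
# Illustrative sketch of "spread" voxel pooling: each frustum point scatters its feature
# to the surrounding BEV cells with a distance- and depth-dependent weight.
import torch

def spread_voxel_pool(feats, xy, depth, grid_size, radius=1, alpha=1.0, beta=0.5):
    """feats: (N, C) image features lifted to frustum points
    xy:    (N, 2) continuous BEV coordinates of the points (in cell units)
    depth: (N,)   point depths, used to modulate the weight decay speed"""
    N, C = feats.shape
    H, W = grid_size
    bev = feats.new_zeros(C, H, W)
    base = xy.floor().long()                      # lower-left neighbouring cell
    sigma = alpha + beta * depth                  # assumed decay schedule: slower decay for far points
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            cell = base + base.new_tensor([dx, dy])
            valid = (cell[:, 0] >= 0) & (cell[:, 0] < W) & (cell[:, 1] >= 0) & (cell[:, 1] < H)
            centre = cell.float() + 0.5
            dist2 = ((xy - centre) ** 2).sum(-1)  # squared distance point -> cell centre
            w = torch.exp(-dist2 / (2 * sigma ** 2)) * valid
            idx = cell[:, 1].clamp(0, H - 1) * W + cell[:, 0].clamp(0, W - 1)
            bev.view(C, -1).index_add_(1, idx, (feats * w.unsqueeze(-1)).t())
    return bev
```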
ISBN (Print): 9798350353006
Whole Slide Image (WSI) classification is often formulated as a Multiple Instance Learning (MIL) problem. Recently, Vision-Language Models (VLMs) have demonstrated remarkable performance in WSI classification. However, existing methods leverage coarse-grained pathogenetic descriptions for visual representation supervision, which are insufficient to capture the complex visual appearance of pathogenetic images, hindering the generalizability of models on diverse downstream tasks. Additionally, processing high-resolution WSIs can be computationally expensive. In this paper, we propose a novel "Fine-grained Visual-Semantic Interaction" (FiVE) framework for WSI classification. It is designed to enhance the model's generalizability by leveraging the interaction between localized visual patterns and fine-grained pathological semantics. Specifically, with meticulously designed queries, we start by utilizing a large language model to extract fine-grained pathological descriptions from various non-standardized raw reports. The output descriptions are then reconstructed into fine-grained labels used for training. By introducing a Task-specific Fine-grained Semantics (TFS) module, we enable prompts to capture crucial visual information in WSIs, which enhances representation learning and significantly augments generalization capabilities. Furthermore, given that pathological visual patterns are redundantly distributed across tissue slices, we sample only a subset of visual instances during training. Our method demonstrates robust generalizability and strong transferability, clearly outperforming its counterparts on the TCGA Lung Cancer dataset with at least 9.19% higher accuracy in few-shot experiments. The code is available at: https://***/ls1rius/WSI_FiVE.
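To make the instance-subsampling and text-supervised aggregation concrete, here is a minimal sketch of an MIL scoring step, assuming CLIP-style patch and text embeddings. The function name, the random subsampling, and the max-pooling aggregation are placeholders, not the FiVE modules.

```python
# Toy sketch: slide-level logits from patch features and fine-grained description embeddings.
import torch
import torch.nn.functional as F

def wsi_logits(patch_feats, text_embeds, num_sampled=512):
    """patch_feats: (N, D) features of WSI patches (instances)
    text_embeds:   (K, D) embeddings of K fine-grained pathological descriptions
    Returns slide-level logits over the K descriptions."""
    if patch_feats.size(0) > num_sampled:
        # Pathological patterns are redundant across the slide, so train on a random subset.
        idx = torch.randperm(patch_feats.size(0), device=patch_feats.device)[:num_sampled]
        patch_feats = patch_feats[idx]
    p = F.normalize(patch_feats, dim=-1)
    t = F.normalize(text_embeds, dim=-1)
    sim = p @ t.t()                     # (N', K) patch-to-description similarity
    return sim.max(dim=0).values        # one logit per description (max pooling over instances)
```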
ISBN (Print): 9798350353013; 9798350353006
Camera-parameter-free multi-view pose estimation is an emerging technique for 3D human pose estimation (HPE). Such methods infer the camera settings implicitly or explicitly to mitigate the impact of depth uncertainty, showcasing significant potential in real applications. However, due to the limited diversity of camera settings in the available datasets, the inferred camera parameters are always simply hard-coded into the model during training and are not adaptable to the input at inference, so the learned models cannot generalize well under unseen camera settings. A natural solution is to artificially synthesize some samples, i.e., 2D-3D pose pairs, under massive new camera settings. Unfortunately, to prevent over-fitting to the existing camera settings, the number of synthesized samples for each new camera setting should be comparable to that for the existing one, which multiplies the scale of training and can even make it computationally prohibitive. In this paper, we propose a novel HPE approach under the invariant risk minimization (IRM) paradigm. Precisely, we first synthesize 2D poses from myriad camera settings. We then train our model under the IRM paradigm, which aims to learn a common optimal model across all camera settings and thus enforces the model to automatically learn the camera parameters based on the input data. This allows the model to accurately infer 3D poses on unseen data by training on only a handful of samples from each synthesized setting, and thus avoids an unbearable increase in training cost. Another appealing feature of our method is that, benefiting from the capability of IRM to identify invariant features, its performance on the seen camera settings is enhanced as well. Comprehensive experiments verify the superiority of our approach.
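For readers unfamiliar with IRM training over multiple "environments" (here, synthesized camera settings), the sketch below shows a standard IRMv1-style objective in PyTorch. The model, data batches, and penalty weight are placeholders and this is not the paper's code; only the general IRMv1 penalty form is assumed.

```python
# IRMv1-style objective: average risk over environments plus an invariance penalty per environment.
import torch

def irm_penalty(pred, target, criterion):
    """Squared gradient of the risk w.r.t. a dummy scale applied to the predictions."""
    scale = torch.tensor(1.0, requires_grad=True, device=pred.device)
    loss = criterion(pred * scale, target)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2)

def irm_objective(model, envs, criterion, lam=1.0):
    """envs: list of (pose2d, pose3d) batches, one per synthesized camera setting."""
    risks, penalties = [], []
    for pose2d, pose3d in envs:
        pred = model(pose2d)
        risks.append(criterion(pred, pose3d))
        penalties.append(irm_penalty(pred, pose3d, criterion))
    return torch.stack(risks).mean() + lam * torch.stack(penalties).mean()
```

A typical training step would call `irm_objective(model, envs, torch.nn.MSELoss()).backward()`; the penalty pushes the model toward a single predictor that is simultaneously optimal across all camera settings.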
ISBN (Print): 9798350353006
Video Motion Magnification (VMM) aims to reveal subtle and imperceptible motion information of objects in the macroscopic world. Prior methods directly model the motion field from the Eulerian perspective, either by representation learning that separates shape and texture or by multi-domain learning from phase fluctuations. Inspired by the frequency spectrum, we observe that the low-frequency components with stable energy always possess spatial structure and less noise, making them suitable for modeling the subtle motion field. To this end, we present FD4MM, a new paradigm of Frequency Decoupling for Motion Magnification with a Multi-level Isomorphic Architecture to capture multi-level high-frequency details and a stable low-frequency structure (motion field) in video space. Since high-frequency details and subtle motions are susceptible to information degradation due to their inherent subtlety and unavoidable external interference from noise, we carefully design Sparse High/Low-pass Filters to enhance the integrity of details and motion structures, and a Sparse Frequency Mixer to promote seamless recoupling. Besides, we innovatively design a contrastive regularization for this task to strengthen the model's ability to discriminate irrelevant features, reducing undesired motion magnification. Extensive experiments on both real-world and synthetic datasets show that our FD4MM outperforms SOTA methods. Meanwhile, FD4MM reduces FLOPs by 1.63x and boosts inference speed by 1.68x compared with the latest method. Our code is available at https://***/Jiafei127/FD4MM.
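As a rough intuition for the decouple-magnify-recouple idea, the toy sketch below uses a fixed blur as the low-pass filter and amplifies only the low-frequency change between two frames. The fixed filter and the simple additive recoupling are stand-ins for the learned sparse filters and mixer; this is not the FD4MM architecture.

```python
# Toy Eulerian-style magnification: amplify low-frequency structural change, keep high-frequency detail.
import torch
import torch.nn.functional as F

def decouple(frame, k=9):
    """Split a frame (B, C, H, W) into low-frequency structure and high-frequency detail."""
    pad = k // 2
    low = F.avg_pool2d(F.pad(frame, (pad,) * 4, mode="reflect"), k, stride=1)
    return low, frame - low

def magnify(prev, curr, alpha=10.0):
    """Amplify the subtle low-frequency motion between two consecutive frames."""
    low_p, _ = decouple(prev)
    low_c, high_c = decouple(curr)
    low_mag = low_c + alpha * (low_c - low_p)   # magnified subtle motion
    return low_mag + high_c                     # recouple with the current high-frequency detail
```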
ISBN (Print): 9798350353013; 9798350353006
Recent progress in human shape learning shows that neural implicit models are effective in generating 3D human surfaces from a limited number of views, and even from a single RGB image. However, existing monocular approaches still struggle to recover fine geometric details such as faces, hands or cloth wrinkles. They are also easily prone to depth ambiguities that result in distorted geometries along the camera optical axis. In this paper, we explore the benefits of incorporating depth observations in the reconstruction process by introducing ANIM, a novel method that reconstructs arbitrary 3D human shapes from single-view RGB-D images with an unprecedented level of accuracy. Our model learns geometric details from both multi-resolution pixel-aligned and voxel-aligned features to leverage depth information and capture spatial relationships, mitigating depth ambiguities. We further enhance the quality of the reconstructed shape by introducing a depth-supervision strategy, which improves the accuracy of the signed distance field estimation for points that lie on the reconstructed surface. Experiments demonstrate that ANIM outperforms state-of-the-art works that use RGB, surface normals, point clouds or RGB-D data as input. In addition, we introduce ANIM-Real, a new multi-modal dataset comprising high-quality scans paired with consumer-grade RGB-D captures, and our protocol to fine-tune ANIM, enabling high-quality reconstruction from real-world human capture. https://***/ANIM/
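The combination of pixel-aligned and voxel-aligned features feeding an SDF head can be illustrated as below, assuming a 2D image feature map, a 3D feature volume built from the depth input, and query points already expressed in normalized coordinates. The sampling conventions and the small MLP are simplified placeholders, not ANIM itself.

```python
# Sketch: sample pixel-aligned and voxel-aligned features per query point, then regress an SDF value.
import torch
import torch.nn.functional as F

def query_features(img_feat, vox_feat, pts_px, pts_vox):
    """img_feat: (B, C1, H, W) image features; vox_feat: (B, C2, D, H, W) voxelized depth features
    pts_px:  (B, N, 2) query points projected to the image plane, in [-1, 1]
    pts_vox: (B, N, 3) query points in normalized voxel coordinates, in [-1, 1]"""
    f_px = F.grid_sample(img_feat, pts_px.unsqueeze(2), align_corners=True)                  # (B, C1, N, 1)
    f_vox = F.grid_sample(vox_feat, pts_vox.unsqueeze(2).unsqueeze(2), align_corners=True)   # (B, C2, N, 1, 1)
    return torch.cat([f_px[..., 0], f_vox[..., 0, 0]], dim=1)                                # (B, C1+C2, N)

class SDFHead(torch.nn.Module):
    def __init__(self, c_in):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Conv1d(c_in, 128, 1), torch.nn.ReLU(),
            torch.nn.Conv1d(128, 1, 1))   # one signed-distance value per query point

    def forward(self, feats):
        return self.mlp(feats)            # (B, 1, N)
```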
Authors: Su, Duo; Hou, Junjie; Gao, Weizhi; Tian, Yingjie; Tang, Bowen
Affiliations: UCAS, Sch Comp Sci & Technol, Beijing, Peoples R China; UCAS, Sino Danish Coll, Beijing, Peoples R China; NCSU, Dept Comp Sci, Raleigh, NC, USA; UCAS, Sch Econ & Management, Beijing, Peoples R China; Chinese Acad Sci, Res Ctr Fictitious Econ & Data Sci, Beijing, Peoples R China; Chinese Acad Sci, Key Lab Big Data Min & Knowledge Management, Beijing, Peoples R China; UCAS, MOE Social Sci Lab Digital Econ Forecasts & Polic, Beijing, Peoples R China; Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
ISBN (Print): 9798350353013; 9798350353006
Dataset distillation offers a lightweight synthetic dataset for fast network training with promising test accuracy. To imitate the performance of the original dataset, most approaches employ bi-level optimization, and the distillation space relies on the matching architecture. Nevertheless, these approaches either suffer significant computational costs on large-scale datasets or experience performance decline on cross-architectures. We advocate designing an economical dataset distillation framework that is independent of the matching architectures. With empirical observations, we argue that constraining the consistency of the real and synthetic image spaces will enhance cross-architecture generalization. Motivated by this, we introduce Dataset Distillation via Disentangled Diffusion Model (D4M), an efficient framework for dataset distillation. Compared to architecture-dependent methods, D4M employs a latent diffusion model to guarantee consistency and incorporates label information into category prototypes. The distilled datasets are versatile, eliminating the need for repeated generation of distinct datasets for various architectures. Through comprehensive experiments, D4M demonstrates superior performance and robust generalization, surpassing the SOTA methods across most aspects.
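The prototype idea can be pictured as follows: encode real images into a latent space, average latents per class into a small number of prototypes, and decode those prototypes with the pretrained generator to obtain the distilled images. The chunk-averaging below is a crude placeholder for the prototype construction, and the encoder/decoder are assumed to come from a pretrained latent diffusion autoencoder; this is not D4M's code.

```python
# Conceptual sketch: per-class latent prototypes as the distilled dataset, decoded afterwards.
import torch

def class_prototypes(latents, labels, ipc=10):
    """latents: (N, D) latent codes of real images; labels: (N,) class ids.
    Returns up to `ipc` prototypes per class plus their class labels."""
    protos, proto_labels = [], []
    for c in labels.unique():
        z = latents[labels == c]
        z = z[torch.randperm(z.size(0))]
        chunks = z.chunk(ipc)                                   # split class latents into groups
        protos.append(torch.stack([ch.mean(0) for ch in chunks]))  # one prototype per group
        proto_labels.append(torch.full((len(chunks),), int(c)))
    return torch.cat(protos), torch.cat(proto_labels)

# distilled_images = decoder(prototypes)   # decode prototypes with the pretrained generator
```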
ISBN (Digital): 9781665487399
ISBN (Print): 9781665487399
Video anomaly detection (VAD) addresses the problem of automatically finding anomalous events in video data. The primary data modalities on which current VAD systems work are monochrome or RGB images. Using depth data in this context instead is still hardly explored, in spite of depth images being a popular choice in many other computer vision research areas and the increasing availability of inexpensive depth camera hardware. We evaluate the application of existing autoencoder-based methods on depth video and propose how the advantages of using depth data can be leveraged by integration into the loss function. Training is done unsupervised using normal sequences, without the need for any additional annotations. We show that depth allows easy extraction of auxiliary information for scene analysis in the form of a foreground mask, and demonstrate its beneficial effect on anomaly detection performance through evaluation on a large public dataset, on which we are also the first to present results.
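One way such a depth-derived foreground mask can enter the loss is as a per-pixel weight on the reconstruction error, as sketched below. The threshold-based foreground extraction against a static background depth map and the weighting scheme are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch: foreground-weighted reconstruction loss for an autoencoder trained on depth frames.
import torch

def foreground_mask(depth, background_depth, tol=0.05):
    """Pixels significantly closer than the static background are treated as foreground."""
    return (background_depth - depth > tol).float()

def masked_recon_loss(recon, depth, background_depth, fg_weight=5.0):
    mask = foreground_mask(depth, background_depth)
    weight = 1.0 + (fg_weight - 1.0) * mask          # emphasize foreground pixels in the loss
    return (weight * (recon - depth) ** 2).mean()
```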
Colorization of line art drawings is an important task in illustration and animation workflows. However, this highly laborious process is mainly done manually, limiting creative productivity. This paper presents a...
ISBN (Print): 9798350353006
Vision-Language Transformers (VLTs) have shown great success recently, but are meanwhile accompanied by heavy computation costs, where a major reason can be attributed to the large number of visual and language tokens. Existing token pruning research for compressing VLTs mainly follows a single-modality-based scheme, yet ignores the critical role of aligning different modalities for guiding the token pruning process, causing the important tokens for one modality to be falsely pruned in the other modality branch. Meanwhile, existing VLT pruning works also lack the flexibility to dynamically compress each layer based on different input samples. To this end, we propose a novel framework named Multimodal Alignment-Guided Dynamic Token Pruning (MADTP) for accelerating various VLTs. Specifically, we first introduce a well-designed Multi-modality Alignment Guidance (MAG) module that can align features of the same semantic concept from different modalities, to ensure the pruned tokens are less important for all modalities. We further design a novel Dynamic Token Pruning (DTP) module, which can adaptively adjust the token compression ratio in each layer based on different input instances. Extensive experiments on various benchmarks demonstrate that MADTP significantly reduces the computational complexity of various multimodal models while preserving competitive performance. Notably, when applied to the BLIP model on the NLVR2 dataset, MADTP can reduce GFLOPs by 80% with less than 4% performance degradation. The code is available at https://***/double125/MADTP.
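The sketch below illustrates the general idea of alignment-guided, per-sample token pruning: score each visual token by its similarity to the language tokens and keep a sample-dependent fraction of the highest-scoring ones. The scoring rule and the externally supplied keep ratio are placeholders, not the MAG/DTP modules.

```python
# Simplified alignment-guided dynamic pruning of visual tokens.
import torch

def prune_tokens(vis_tokens, txt_tokens, keep_ratio):
    """vis_tokens: (B, Nv, D); txt_tokens: (B, Nt, D); keep_ratio: (B,) values in (0, 1].
    Keeps, per sample, the visual tokens most aligned with the language tokens."""
    B, Nv, _ = vis_tokens.shape
    sim = torch.einsum("bvd,btd->bvt", vis_tokens, txt_tokens)
    score = sim.max(dim=-1).values                    # cross-modal importance of each visual token
    kept = []
    for b in range(B):
        k = max(1, int(Nv * keep_ratio[b].item()))    # per-sample compression ratio
        idx = score[b].topk(k).indices
        kept.append(vis_tokens[b, idx])
    return kept                                       # list of (k_b, D) tensors, one per sample
```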
ISBN (Print): 9798350353006
Zero-Shot Temporal Action Localization (ZS-TAL) seeks to identify and locate actions in untrimmed videos unseen during training. Existing ZS-TAL methods involve fine-tuning a model on a large amount of annotated training data. While effective, training-based ZS-TAL approaches assume the availability of labeled data for supervised learning, which can be impractical in some applications. Furthermore, the training process naturally induces a domain bias in the learned model, which may adversely affect the model's generalization ability to arbitrary videos. These considerations prompt us to approach the ZS-TAL problem from a radically novel perspective, relaxing the requirement for training data. To this aim, we introduce a novel method that performs Test-Time adaptation for Temporal Action Localization (T3AL). In a nutshell, T3AL adapts a pre-trained Vision and Language Model (VLM). T3AL operates in three steps. First, a video-level pseudo-label of the action category is computed by aggregating information from the entire video. Then, action localization is performed by adopting a novel procedure inspired by self-supervised learning. Finally, frame-level textual descriptions extracted with a state-of-the-art captioning model are employed to refine the action region proposals. We validate the effectiveness of T3AL by conducting experiments on the THUMOS14 and ActivityNet-v1.3 datasets. Our results demonstrate that T3AL significantly outperforms zero-shot baselines based on state-of-the-art VLMs, confirming the benefit of a test-time adaptation approach.
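The first step, computing a video-level pseudo-label by aggregating over the whole video, can be sketched as below, assuming CLIP-style frame and text embeddings. The mean aggregation and the argmax pseudo-label are illustrative assumptions, not the T3AL implementation.

```python
# Sketch: video-level pseudo-label from frame-to-class similarity of a VLM's embeddings.
import torch
import torch.nn.functional as F

def video_pseudo_label(frame_embeds, class_embeds):
    """frame_embeds: (T, D) frame embeddings; class_embeds: (K, D) text embeddings of action names."""
    f = F.normalize(frame_embeds, dim=-1)
    c = F.normalize(class_embeds, dim=-1)
    sim = f @ c.t()                  # (T, K) frame-to-class similarity
    video_score = sim.mean(dim=0)    # aggregate evidence over the entire video
    return video_score.argmax().item(), sim   # pseudo-label plus per-frame scores for localization
```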