ISBN (digital): 9781665469463
ISBN (print): 9781665469463
We present Playable Environments, a new representation for interactive video generation and manipulation in space and time. With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions. The actions are learned in an unsupervised manner. The camera can be controlled to get the desired viewpoint. Our method builds an environment state for each frame, which can be manipulated by our proposed action module and decoded back to the image space with volumetric rendering. To support diverse appearances of objects, we extend neural radiance fields with style-based modulation. Our method trains on a collection of various monocular videos, requiring only the estimated camera parameters and 2D object locations. To set a challenging benchmark, we introduce two large-scale video datasets with significant camera movements. As evidenced by our experiments, playable environments enable several creative applications not attainable by prior video synthesis works, including playable 3D video generation, stylization, and manipulation.
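To make the style-based modulation of the radiance field concrete, the sketch below conditions a small NeRF-style MLP on a per-object style code via FiLM-like scale-and-shift modulation. The layer sizes, the positional-encoding dimension, and the modulation form are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch of a style-modulated radiance field MLP, assuming a
# FiLM-like conditioning scheme; sizes and modulation form are illustrative.
import torch
import torch.nn as nn

class StyleModulatedNeRF(nn.Module):
    def __init__(self, pos_dim=63, style_dim=64, hidden=128):
        super().__init__()
        self.fc1 = nn.Linear(pos_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        # per-layer scale/shift predicted from the style code
        self.mod1 = nn.Linear(style_dim, 2 * hidden)
        self.mod2 = nn.Linear(style_dim, 2 * hidden)
        self.sigma_head = nn.Linear(hidden, 1)   # density
        self.rgb_head = nn.Linear(hidden, 3)     # color

    def modulate(self, h, mod, style):
        scale, shift = mod(style).chunk(2, dim=-1)
        return h * (1 + scale) + shift

    def forward(self, x, style):
        # x: (N, pos_dim) encoded sample positions, style: (N, style_dim)
        h = torch.relu(self.modulate(self.fc1(x), self.mod1, style))
        h = torch.relu(self.modulate(self.fc2(h), self.mod2, style))
        sigma = torch.relu(self.sigma_head(h))
        rgb = torch.sigmoid(self.rgb_head(h))
        return rgb, sigma

model = StyleModulatedNeRF()
rgb, sigma = model(torch.randn(1024, 63), torch.randn(1024, 64))
print(rgb.shape, sigma.shape)  # torch.Size([1024, 3]) torch.Size([1024, 1])
```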
ISBN (print): 9798350353006
In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network and sets it on par with the latest, complex transformer-based models. Leveraging the initial depths and features from this network, we uplift the 2D features to form a 3D point cloud and construct a 3D point transformer to process it, allowing the model to explicitly learn and exploit 3D geometric features. In addition, we propose normalization techniques to process the point cloud, which improves learning and leads to better accuracy than directly using point transformers off the shelf. Furthermore, we incorporate global attention on downsampled point cloud features, which enables long-range context while still being computationally feasible. We evaluate our method, DeCoTR, on established depth completion benchmarks, including NYU Depth V2 and KITTI, showcasing that it sets new state-of-the-art performance. We further conduct zero-shot evaluations on ScanNet and DDAD benchmarks and demonstrate that DeCoTR has superior generalizability compared to existing approaches.
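As a concrete illustration of uplifting 2D features into a normalized point cloud before applying a point transformer, here is a minimal sketch. The camera intrinsics, the zero-mean unit-radius normalization, and the tensor shapes are assumptions for illustration, not the exact procedure in DeCoTR.

```python
# Minimal sketch of uplifting 2D features to a normalized 3D point cloud;
# intrinsics, normalization choice, and shapes are illustrative assumptions.
import torch

def uplift_and_normalize(depth, feats, fx, fy, cx, cy):
    # depth: (H, W), feats: (C, H, W) -> points (H*W, 3), feats (H*W, C)
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) / fx * z
    y = (v.reshape(-1) - cy) / fy * z
    pts = torch.stack([x, y, z], dim=-1)           # back-projected 3D points
    pts = pts - pts.mean(dim=0, keepdim=True)       # center the cloud
    scale = pts.norm(dim=-1).max().clamp(min=1e-6)
    pts = pts / scale                               # unit-radius normalization
    return pts, feats.reshape(feats.shape[0], -1).t()

pts, f = uplift_and_normalize(torch.rand(48, 64) * 5.0, torch.randn(32, 48, 64),
                              fx=50.0, fy=50.0, cx=32.0, cy=24.0)
print(pts.shape, f.shape)  # torch.Size([3072, 3]) torch.Size([3072, 32])
```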
ISBN (print): 9781665445092
Learning to model and predict how humans interact with objects while performing an action is challenging, and most of the existing video prediction models are ineffective in modeling complicated human-object interactions. Our work builds on hierarchical video prediction models, which disentangle the video generation process into two stages: predicting a high-level representation, such as a pose sequence, and then learning a pose-to-pixels translation model for pixel generation. An action sequence for a human-object interaction task is typically very complicated, involving the evolution of pose, the person's appearance, object locations, and object appearances over time. To this end, we propose a Hierarchical Video Prediction model using Relational Layouts. In the first stage, we learn to predict a sequence of layouts. A layout is a high-level representation of the video containing both pose and object information for every frame. The layout sequence is learned by modeling the relationships between the pose and objects using relational reasoning and recurrent neural networks. The layout sequence acts as a strong structural prior for the second stage, which learns to map the layouts into pixel space. Experimental evaluation of our method on two datasets, UMD-HOI and Bimanual, shows significant improvements in standard video evaluation metrics such as LPIPS, PSNR, and SSIM. We also perform a detailed qualitative analysis of our model to demonstrate various generalizations.
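The sketch below shows one way relational reasoning and a recurrent network could roll a layout of pose joints and object boxes forward in time: pairwise relations between entities are pooled and fed to a GRU that predicts the next layout. Entity counts, feature sizes, and the relation MLP are illustrative assumptions, not the paper's design.

```python
# Minimal sketch of a relational layout predictor: pairwise relation reasoning
# over layout entities followed by a GRU rolling the layout forward in time.
import torch
import torch.nn as nn

class RelationalLayoutPredictor(nn.Module):
    def __init__(self, n_entities=10, ent_dim=4, hidden=64):
        super().__init__()
        self.rel = nn.Sequential(nn.Linear(2 * ent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.gru = nn.GRUCell(n_entities * hidden, n_entities * hidden)
        self.dec = nn.Linear(hidden, ent_dim)
        self.n, self.d = n_entities, ent_dim

    def step(self, layout, h):
        # layout: (B, n, ent_dim); compute all pairwise relations and pool them
        B = layout.shape[0]
        a = layout.unsqueeze(2).expand(-1, -1, self.n, -1)
        b = layout.unsqueeze(1).expand(-1, self.n, -1, -1)
        rel = self.rel(torch.cat([a, b], dim=-1)).mean(dim=2)   # (B, n, hidden)
        h = self.gru(rel.reshape(B, -1), h)
        next_layout = layout + self.dec(h.reshape(B, self.n, -1))  # residual update
        return next_layout, h

    def forward(self, layout, steps=5):
        B = layout.shape[0]
        h = layout.new_zeros(B, self.n * self.rel[-1].out_features)
        outs = []
        for _ in range(steps):
            layout, h = self.step(layout, h)
            outs.append(layout)
        return torch.stack(outs, dim=1)   # (B, steps, n, ent_dim)

pred = RelationalLayoutPredictor()(torch.randn(2, 10, 4))
print(pred.shape)  # torch.Size([2, 5, 10, 4])
```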
ISBN (digital): 9781665469463
ISBN (print): 9781665469463
Semi-supervised learning (SSL) is a method to build better models using a large amount of easily accessible unlabeled data along with a small amount of labeled data obtained at a high cost. Most existing SSL studies focus on cases where a sufficient number of labeled samples is available, tens to hundreds of labeled samples per class, which still requires a lot of labeling cost. In this paper, we focus on an SSL setting with extremely scarce labeled samples, only one or two labeled samples per class, where most existing methods fail to learn. We propose a propagation regularizer which enables efficient and effective learning with extremely scarce labeled samples by suppressing confirmation bias. In addition, for realistic model selection in the absence of a validation dataset, we also propose a model selection method based on our propagation regularizer. The proposed methods achieve 70.9%, 30.3%, and 78.9% accuracy on the CIFAR-10, CIFAR-100, and SVHN datasets with just one labeled sample per class, improvements of 8.9% to 120.2% over existing approaches. Our proposed methods also perform well on a higher-resolution dataset, STL-10.
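The abstract does not spell out the form of the propagation regularizer, so the sketch below only illustrates the general idea of suppressing confirmation bias during pseudo-labeling: confident pseudo-labels are used for training while a regularizer keeps the batch-averaged prediction distribution from collapsing onto a few classes. The threshold, the uniform-prior KL term, and the weight `lam` are hypothetical choices, not the paper's formulation.

```python
# Illustrative anti-confirmation-bias term for pseudo-labeling: push the
# batch-averaged prediction distribution on unlabeled data toward uniform so
# pseudo-labels cannot collapse onto a few classes. An assumption, not the
# paper's propagation regularizer.
import torch
import torch.nn.functional as F

def pseudo_label_loss_with_regularizer(logits_u, threshold=0.95, lam=1.0):
    # logits_u: (B, C) predictions on unlabeled samples
    probs = F.softmax(logits_u, dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = (conf >= threshold).float()                 # keep confident samples only
    ce = F.cross_entropy(logits_u, pseudo, reduction="none")
    pseudo_loss = (ce * mask).sum() / mask.sum().clamp(min=1.0)
    # regularizer: KL(mean prediction || uniform) discourages class collapse
    mean_probs = probs.mean(dim=0)
    uniform = torch.full_like(mean_probs, 1.0 / mean_probs.numel())
    reg = (mean_probs * (mean_probs.clamp(min=1e-8).log() - uniform.log())).sum()
    return pseudo_loss + lam * reg

loss = pseudo_label_loss_with_regularizer(torch.randn(64, 10))
print(float(loss))
```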
ISBN (print): 9781728132938
Grounding referring expressions is a fundamental yet challenging task facilitating human-machine communication in the physical world. It locates the target object in an image on the basis of the comprehension of the relationships between referring natural language expressions and the image. A feasible solution for grounding referring expressions not only needs to extract all the necessary information (i.e., objects and the relationships among them) in both the image and referring expressions, but also to compute and represent multimodal contexts from the extracted information. Unfortunately, existing work on grounding referring expressions cannot extract multi-order relationships from the referring expressions accurately, and the contexts it obtains have discrepancies with the contexts described by the referring expressions. In this paper, we propose a Cross-Modal Relationship Extractor (CMRE) that uses a cross-modal attention mechanism to adaptively highlight objects and relationships related to a given expression, and represents the extracted information as a language-guided visual relation graph. In addition, we propose a Gated Graph Convolutional Network (GGCN) to compute multimodal semantic contexts by fusing information from different modalities and propagating multimodal information in the structured relation graph. Experiments on various common benchmark datasets show that our Cross-Modal Relationship Inference Network, which consists of CMRE and GGCN, outperforms all existing state-of-the-art methods.
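To illustrate how gated message passing over a language-guided relation graph might look, here is a minimal single-layer sketch in which edge gates (e.g., derived from cross-modal attention weights) scale the messages exchanged between object nodes. The feature dimension, gating form, and update rule are assumptions for illustration rather than the GGCN as published.

```python
# Minimal sketch of a gated graph convolution step: messages along edges are
# scaled by gates, then aggregated and fused with the node's own features.
import torch
import torch.nn as nn

class GatedGraphConvLayer(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.upd = nn.Linear(2 * dim, dim)

    def forward(self, node_feats, edge_gates):
        # node_feats: (N, dim); edge_gates: (N, N) in [0, 1],
        # e.g. obtained from cross-modal attention over the expression
        messages = self.msg(node_feats)                  # (N, dim)
        agg = edge_gates @ messages                      # gated aggregation
        agg = agg / edge_gates.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        return torch.relu(self.upd(torch.cat([node_feats, agg], dim=-1)))

layer = GatedGraphConvLayer()
nodes = torch.randn(8, 256)
gates = torch.rand(8, 8)
print(layer(nodes, gates).shape)  # torch.Size([8, 256])
```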
ISBN (digital): 9781665469463
ISBN (print): 9781665469463
Transformers have recently shown superior performance on various vision tasks. The large, sometimes even global, receptive field endows Transformer models with higher representation power than their CNN counterparts. Nevertheless, simply enlarging the receptive field also gives rise to several concerns. On the one hand, using dense attention, e.g., in ViT, leads to excessive memory and computational cost, and features can be influenced by irrelevant parts beyond the region of interest. On the other hand, the sparse attention adopted in PVT or Swin Transformer is data-agnostic and may limit the ability to model long-range relations. To mitigate these issues, we propose a novel deformable self-attention module, where the positions of key and value pairs in self-attention are selected in a data-dependent way. This flexible scheme enables the self-attention module to focus on relevant regions and capture more informative features. On this basis, we present the Deformable Attention Transformer, a general backbone model with deformable attention for both image classification and dense prediction tasks. Extensive experiments show that our models achieve consistently improved results on comprehensive benchmarks. Code is available at https://***/LeapLabTHU/DAT.
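The single-head sketch below conveys the core idea of data-dependent key/value positions: a lightweight network predicts offsets for a small grid of reference points, keys and values are bilinearly sampled at the shifted locations, and standard attention follows. The grid size, the offset network, and the single-head simplification are assumptions; the published module uses multiple heads and a more elaborate offset prediction.

```python
# Simplified single-head deformable attention: reference points are shifted
# by predicted offsets, and keys/values are sampled at the shifted locations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttention(nn.Module):
    def __init__(self, dim=64, grid=4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        # predicts a (dx, dy) offset for each reference point from local features
        self.offset = nn.Sequential(nn.AdaptiveAvgPool2d(grid),
                                    nn.Conv2d(dim, 2, 1), nn.Tanh())
        self.grid = grid
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (B, C, H, W) feature map
        B, C, H, W = x.shape
        q = self.q(x.flatten(2).transpose(1, 2))                       # (B, HW, C)
        g = self.grid
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, g),
                                torch.linspace(-1, 1, g), indexing="ij")
        ref = torch.stack([xs, ys], dim=-1).reshape(1, g * g, 2).to(x)
        off = self.offset(x).permute(0, 2, 3, 1).reshape(B, g * g, 2)  # data-dependent
        pos = (ref + off).clamp(-1, 1).unsqueeze(2)                    # (B, g*g, 1, 2)
        kv_feats = F.grid_sample(x, pos, align_corners=True)           # (B, C, g*g, 1)
        kv_feats = kv_feats.squeeze(-1).transpose(1, 2)                # (B, g*g, C)
        k, v = self.kv(kv_feats).chunk(2, dim=-1)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = attn @ v                                                 # (B, HW, C)
        return out.transpose(1, 2).reshape(B, C, H, W)

out = DeformableAttention()(torch.randn(2, 64, 8, 8))
print(out.shape)  # torch.Size([2, 64, 8, 8])
```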
ISBN (print): 9781665445092
As the computing power of modern hardware increases rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. This progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution, and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To maximally exploit the capability of the transformer, we use the well-known ImageNet benchmark to generate a large amount of corrupted image pairs. The IPT model is trained on these images with multiple heads and multiple tails. In addition, contrastive learning is introduced to adapt well to different image processing tasks. The pre-trained model can therefore be efficiently employed on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks.
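The sketch below shows the multi-head/multi-tail layout in miniature: each task gets its own shallow head and tail around a shared transformer body. The layer sizes, the patch embedding, and the fact that the super-resolution tail does not upsample here are simplifications assumed for brevity, not the published IPT architecture.

```python
# Minimal multi-head / multi-tail image-processing transformer: task-specific
# heads and tails share one transformer body.
import torch
import torch.nn as nn

class TinyIPT(nn.Module):
    def __init__(self, tasks=("denoise", "sr", "derain"), dim=64, patch=4):
        super().__init__()
        self.heads = nn.ModuleDict({t: nn.Conv2d(3, dim, 3, padding=1) for t in tasks})
        self.tails = nn.ModuleDict({t: nn.Conv2d(dim, 3, 3, padding=1) for t in tasks})
        self.embed = nn.Conv2d(dim, dim, patch, stride=patch)        # patchify
        self.unembed = nn.ConvTranspose2d(dim, dim, patch, stride=patch)
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(enc, num_layers=2)         # shared body

    def forward(self, x, task):
        f = self.heads[task](x)                      # task-specific head
        tokens = self.embed(f)
        B, C, H, W = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)      # (B, HW, C)
        seq = self.body(seq)
        tokens = seq.transpose(1, 2).reshape(B, C, H, W)
        return self.tails[task](self.unembed(tokens))  # task-specific tail

model = TinyIPT()
out = model(torch.randn(1, 3, 32, 32), task="denoise")
print(out.shape)  # torch.Size([1, 3, 32, 32])
```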
ISBN (print): 9798350301298
Most existing vision-language pre-training (VLP) approaches adopt cross-modal masked language modeling (CMLM) to learn vision-language associations. However, we find that CMLM is insufficient for this purpose according to our observations: (1) Modality bias: a considerable amount of masked tokens in CMLM can be recovered with only the language information, ignoring the visual inputs. (2) Under-utilization of the unmasked tokens: CMLM primarily focuses on the masked tokens but cannot simultaneously leverage the other tokens to learn vision-language associations. To address these limitations, we propose EPIC (lEveraging Per Image-Token Consistency for vision-language pre-training). In EPIC, for each image-sentence pair, we mask tokens that are salient to the image (i.e., Saliency-based Masking Strategy) and replace them with alternatives sampled from a language model (i.e., Inconsistent Token Generation Procedure), and then the model is required to determine for each token in the sentence whether it is consistent with the image (i.e., Image-Token Consistency Task). The proposed EPIC method can be easily combined with pre-training methods. Extensive experiments show that combining the EPIC method with state-of-the-art pre-training approaches, including ViLT, ALBEF, METER, and X-VLM, leads to significant improvements on downstream tasks. Our code is released at https://***/gyhdog99/epic
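To make the image-token consistency task concrete, the sketch below builds corrupted sentences by replacing the most image-salient tokens and trains a per-token binary head to predict whether each token is consistent with the image. The saliency scores, the random replacement stand-in for language-model sampling, and the shapes are illustrative assumptions rather than the paper's exact procedure.

```python
# Illustrative image-token consistency task: replace salient tokens, then
# classify each token as consistent (1) or replaced (0).
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_consistency_targets(token_ids, saliency, vocab_size, ratio=0.3):
    # replace the most image-salient tokens; label 1 = consistent, 0 = replaced
    B, L = token_ids.shape
    n_replace = max(1, int(L * ratio))
    idx = saliency.topk(n_replace, dim=-1).indices           # salient positions
    corrupted = token_ids.clone()
    labels = torch.ones(B, L)
    repl = torch.randint(0, vocab_size, (B, n_replace))      # stand-in for LM samples
    corrupted.scatter_(1, idx, repl)
    labels.scatter_(1, idx, 0.0)
    return corrupted, labels

def consistency_loss(token_feats, labels, head):
    # token_feats: (B, L, D) image-conditioned token features
    logits = head(token_feats).squeeze(-1)                   # (B, L)
    return F.binary_cross_entropy_with_logits(logits, labels)

B, L, D, V = 2, 12, 64, 1000
head = nn.Linear(D, 1)
corrupted, labels = make_consistency_targets(torch.randint(0, V, (B, L)),
                                             torch.rand(B, L), V)
loss = consistency_loss(torch.randn(B, L, D), labels, head)
print(corrupted.shape, float(loss))
```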
ISBN (print): 9781728132938
Conventional video compression approaches use the predictive coding architecture and encode the corresponding motion information and residual information. In this paper, taking advantage of both the classical architecture in conventional video compression methods and the powerful nonlinear representation ability of neural networks, we propose the first end-to-end deep video compression model that jointly optimizes all the components for video compression. Specifically, learning-based optical flow estimation is utilized to obtain the motion information and reconstruct the current frames. Then we employ two auto-encoder-style neural networks to compress the corresponding motion and residual information. All the modules are jointly learned through a single loss function, in which they collaborate with each other by considering the trade-off between reducing the number of compression bits and improving the quality of the decoded video. Experimental results show that the proposed approach can outperform the widely used video coding standard H.264 in terms of PSNR and is even on par with the latest standard H.265 in terms of MS-SSIM.
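The single loss function described here is a rate-distortion trade-off; the sketch below shows one common way to write it, combining reconstruction distortion with the bits estimated for the motion and residual latents. The entropy model (here a placeholder over latent likelihoods) and the weight `lam` are assumptions for illustration, not the paper's exact objective.

```python
# Illustrative joint rate-distortion objective: frame distortion plus the
# estimated bits spent on motion and residual latents.
import torch
import torch.nn.functional as F

def estimated_bits(latent_likelihoods):
    # negative log2-likelihood of quantized latents under an entropy model
    return (-torch.log2(latent_likelihoods.clamp(min=1e-9))).sum()

def rate_distortion_loss(frame, recon, motion_lik, residual_lik, lam=0.01):
    distortion = F.mse_loss(recon, frame)
    num_pixels = frame.shape[0] * frame.shape[-2] * frame.shape[-1]
    rate = (estimated_bits(motion_lik) + estimated_bits(residual_lik)) / num_pixels
    return distortion + lam * rate       # trade-off between quality and bitrate

frame, recon = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
motion_lik, residual_lik = torch.rand(1, 128, 16, 16), torch.rand(1, 128, 16, 16)
print(float(rate_distortion_loss(frame, recon, motion_lik, residual_lik)))
```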
ISBN (print): 9781665445092
We address the problem of unsupervised localization of task-relevant actions (key-steps) and feature learning in instructional videos using both visual and language instructions. Our key observation is that the sequences of visual and linguistic key-steps are weakly aligned: there is an ordered one-to-one correspondence between most visual and language key-steps, while some key-steps in one modality are absent in the other. To recover the two sequences, we develop an ordered prototype learning module, which extracts visual and linguistic prototypes representing key-steps. To find the weak alignment and perform feature learning, we develop a differentiable weak sequence alignment (DWSA) method that finds an ordered one-to-one matching between sequences while allowing some items in a sequence to stay unmatched. We develop an efficient forward and backward algorithm for computing the alignment and the loss derivative with respect to the parameters of the visual and language feature learning modules. In experiments on two instructional video datasets, we show that our method significantly improves the state of the art.
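As an illustration of an ordered alignment that allows items to stay unmatched while remaining differentiable, the sketch below runs a dynamic program over match/skip moves and replaces the hard minimum with a smooth one. The drop cost, the temperature, and the pairwise cost (Euclidean distance between prototypes) are assumptions; this is not the exact DWSA recursion or its efficient backward pass.

```python
# Illustrative soft alignment between two key-step sequences with skips:
# a match / skip-a / skip-b dynamic program with a smooth minimum.
import torch

def soft_min(values, gamma=0.1):
    # smooth minimum; approaches the hard min as gamma -> 0
    return -gamma * torch.logsumexp(-torch.stack(values) / gamma, dim=0)

def weak_alignment_cost(a, b, drop_cost=0.5, gamma=0.1):
    # a: (N, D) visual prototypes, b: (M, D) language prototypes
    N, M = a.shape[0], b.shape[0]
    pair = torch.cdist(a, b)                     # (N, M) matching costs
    INF = torch.tensor(1e9)
    D = [[None] * (M + 1) for _ in range(N + 1)]
    D[0][0] = torch.tensor(0.0)
    for i in range(N + 1):
        for j in range(M + 1):
            if i == 0 and j == 0:
                continue
            match = D[i - 1][j - 1] + pair[i - 1, j - 1] if i > 0 and j > 0 else INF
            skip_a = D[i - 1][j] + drop_cost if i > 0 else INF
            skip_b = D[i][j - 1] + drop_cost if j > 0 else INF
            D[i][j] = soft_min([match, skip_a, skip_b], gamma)
    return D[N][M]   # differentiable alignment cost

a = torch.randn(5, 16, requires_grad=True)
b = torch.randn(4, 16)
cost = weak_alignment_cost(a, b)
cost.backward()
print(float(cost), a.grad.shape)
```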