ISBN (Print): 9798350301298
We propose a visual-linguistic representation learning approach within a self-supervised learning framework by introducing a new operation, loss, and data augmentation strategy. First, we generate diverse features for the image-text matching (ITM) task by soft-masking the regions in an image that are most relevant to a certain word in the corresponding caption, instead of completely removing them. Since our framework relies only on image-caption pairs with no fine-grained annotations, we identify the regions relevant to each word by computing the word-conditional visual attention using a multi-modal encoder. Second, we encourage the model to focus more on hard but diverse examples by proposing a focal loss for the image-text contrastive learning (ITC) objective, which alleviates the inherent overfitting and bias issues. Last, we perform multi-modal data augmentation for self-supervised learning by mining diverse examples through text masking and image distortions. We show that the combination of these three innovations is effective for learning a pretrained model, leading to outstanding performance on multiple vision-language downstream tasks.
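As a concrete illustration of the second idea, here is a minimal sketch of a focal modulation applied to a symmetric InfoNCE-style ITC loss; the function name, temperature, and focusing parameter gamma are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def focal_itc_loss(image_emb, text_emb, temperature=0.07, gamma=2.0):
    """Symmetric image-text contrastive loss with a focal term that
    down-weights easy pairs and focuses training on hard ones.
    Matched image/text pairs share the same batch index (assumption)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)

    def focal_ce(lg):
        log_p = F.log_softmax(lg, dim=-1)
        lp_t = log_p.gather(1, targets[:, None]).squeeze(1)  # log-prob of the true pair
        return (-(1.0 - lp_t.exp()) ** gamma * lp_t).mean()  # focal modulation

    # average the image-to-text and text-to-image directions
    return 0.5 * (focal_ce(logits) + focal_ce(logits.t()))
```

With gamma = 0 this reduces to the standard symmetric contrastive loss, which makes the focal term easy to ablate.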
ISBN (Print): 9798350301298
In theory, vector quantization (VQ) is always better than scalar quantization (SQ) in terms of rate-distortion (RD) performance [33]. Recent state-of-the-art methods for neural image compression are mainly based on nonlinear transform coding (NTC) with uniform scalar quantization, overlooking the benefits of VQ because of its exponentially increasing complexity. In this paper, we first investigate some toy sources, demonstrating that even though modern neural networks considerably enhance the compression performance of SQ via nonlinear transforms, there is still an insurmountable chasm between SQ and VQ. Therefore, revolving around VQ, we propose a novel framework for neural image compression named Nonlinear Vector Transform Coding (NVTC). NVTC solves the critical complexity issue of VQ through (1) a multi-stage quantization strategy and (2) nonlinear vector transforms. In addition, we apply entropy-constrained VQ in the latent space to adaptively determine the quantization boundaries for joint rate-distortion optimization, which improves performance both theoretically and experimentally. Compared to previous NTC approaches, NVTC demonstrates superior rate-distortion performance, faster decoding speed, and smaller model size. Our code is available at https://***/USTC-IMCL/NVTC.
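To make the entropy-constrained VQ step concrete, the sketch below assigns each latent vector to the codeword minimizing a joint rate-distortion cost rather than plain distance; the function name, the fixed codeword prior, and the Lagrange multiplier lam are assumptions for illustration, not NVTC's exact implementation.

```python
import torch

def ecvq_assign(x, codebook, log2_probs, lam):
    """Entropy-constrained VQ: pick the codeword minimizing
    distortion + lam * rate, so quantization boundaries adapt to each
    codeword's code length rather than to distance alone.

    x: (N, D) latent vectors; codebook: (K, D);
    log2_probs: (K,) log2 prior probability of each codeword;
    lam: rate-distortion trade-off."""
    dist = torch.cdist(x, codebook) ** 2   # (N, K) squared distortion
    rate = -log2_probs                     # (K,) code length in bits
    cost = dist + lam * rate               # joint R-D cost, broadcast over N
    idx = cost.argmin(dim=1)
    return idx, codebook[idx]
```

Note how a large lam shrinks the effective cells of rarely used (long-code) codewords, which is exactly the boundary adaptation the abstract refers to.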
ISBN (Print): 9798350301298
Anomaly detection is an important application in large-scale industrial manufacturing. Recent methods for this task have demonstrated excellent accuracy but come with a latency trade-off. Memory-based approaches with dominant performance, such as PatchCore or Coupled-hypersphere-based Feature Adaptation (CFA), require an external memory bank, which significantly lengthens execution time. Another approach that employs Reverse Distillation (RD) can perform well while maintaining low latency. In this paper, we revisit this idea to improve its performance, establishing a new state-of-the-art benchmark on the challenging MVTec dataset for both anomaly detection and localization. The proposed method, called RD++, runs six times faster than PatchCore and twice as fast as CFA, while introducing negligible latency compared to RD. We also experiment on the BTAD and Retinal OCT datasets to demonstrate our method's generalizability, and conduct important ablation experiments to provide insights into its configurations. Source code will be available at https://***/tientrandinh/Revisiting-Reverse-Distillation.
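For readers unfamiliar with the reverse-distillation scoring that RD++ builds on, here is a minimal sketch of how an anomaly map is typically derived from multi-scale teacher-student feature discrepancies; the function name and the cosine-distance aggregation are generic assumptions, not RD++'s exact design.

```python
import torch
import torch.nn.functional as F

def rd_anomaly_map(teacher_feats, student_feats, out_size):
    """Reverse-distillation-style anomaly map: 1 - cosine similarity between
    the frozen teacher encoder's features and the student decoder's
    reconstructions, averaged over feature scales.

    teacher_feats, student_feats: lists of matching (B, C, H, W) tensors."""
    amap = torch.zeros(teacher_feats[0].size(0), 1, *out_size,
                       device=teacher_feats[0].device)
    for t, s in zip(teacher_feats, student_feats):
        sim = F.cosine_similarity(t, s, dim=1, eps=1e-8)   # (B, H, W)
        a = (1.0 - sim).unsqueeze(1)                       # high = anomalous
        amap += F.interpolate(a, size=out_size,
                              mode="bilinear", align_corners=False)
    return amap / len(teacher_feats)
```

Because the student only ever sees anomaly-free data, it fails to reconstruct anomalous regions, which is what makes the discrepancy a usable score.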
ISBN (Print): 9798350301298
Transformers have recently been utilized successfully for vision and language tasks. For example, recent image and language models with more than 200M parameters have been proposed to learn visual grounding in the pre-training step and show impressive results on downstream vision and language tasks. On the other hand, there exists a large amount of computational redundancy in these large models, which hurts their run-time efficiency. To address this problem, we propose dynamic inference for grounding-based vision and language models, conditioned on the input image-text pair. We first design an approach to dynamically skip multi-head self-attention and feed-forward network layers across the two backbones and the multimodal network. Additionally, we propose dynamic token pruning and fusion for the two backbones. In particular, we remove redundant tokens at different levels of the backbones and fuse the image tokens with the language tokens in an adaptive manner. To learn policies for dynamic inference, we train agents using reinforcement learning. In this direction, we replace the CNN backbone in a recent grounding-based vision and language model, MDETR, with a vision transformer and call it ViTMDETR. Then, we apply our dynamic inference method to ViTMDETR, called D-ViTMDETR, and perform experiments on image-language tasks. Our results show that we can improve the run-time efficiency of the state-of-the-art models MDETR and GLIP by up to ~50% on Referring Expression Comprehension and Segmentation, and on VQA, with a maximum accuracy drop of only ~0.3%.
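The layer-skipping idea can be sketched as a wrapper whose tiny policy head makes an input-conditioned keep/skip decision per sample; the class name and the mean-pooled policy input are assumptions, and the REINFORCE reward that would train the sampled decisions against an accuracy/compute trade-off is omitted here.

```python
import torch
import torch.nn as nn

class SkippableLayer(nn.Module):
    """Wraps a transformer sub-layer (self-attention or feed-forward block)
    with a small policy head that samples a binary keep/skip action from the
    mean token feature. The sampled action is non-differentiable, which is
    why the paper trains such policies with reinforcement learning."""
    def __init__(self, layer, dim):
        super().__init__()
        self.layer = layer
        self.policy = nn.Linear(dim, 2)  # logits for [skip, keep]

    def forward(self, x):                           # x: (B, T, D) tokens
        logits = self.policy(x.mean(dim=1))         # (B, 2), input-conditioned
        keep = torch.distributions.Categorical(logits=logits).sample()  # (B,)
        mask = keep.float().view(-1, 1, 1)
        return mask * self.layer(x) + (1.0 - mask) * x  # skip acts as identity
```

At evaluation time one would typically replace sampling with an argmax over the policy logits to make inference deterministic.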
ISBN (Print): 9798350365474
This paper summarizes the 3rd NTIRE challenge on stereo image super-resolution (SR), with a focus on new solutions and results. The task of this challenge is to super-resolve a low-resolution stereo image pair to a high-resolution one with a magnification factor of ×4 under a limited computational budget. Compared with single-image SR, the major difficulty lies in how to exploit the additional information in the other viewpoint and how to maintain stereo consistency in the results. The challenge has two tracks: one on bicubic degradation and one on real degradations. In total, 108 and 70 participants successfully registered for the two tracks, respectively. In the test phase, 14 and 13 teams, respectively, submitted valid results with PSNR (RGB) scores better than the baseline. This challenge establishes a new benchmark for stereo image SR.
ISBN (Print): 9798350301298
Semi-supervised learning can be more beneficial for the video domain than for images because of video's higher annotation cost and dimensionality. Besides, any video understanding task requires reasoning over both the spatial and temporal dimensions. To learn both static and motion-related features for the semi-supervised action recognition task, existing methods rely on hard input inductive biases, such as using two modalities (RGB and optical flow) or two streams with different playback rates. Instead of utilizing unlabeled videos through diverse input streams, we rely on self-supervised video representations; in particular, we utilize temporally-invariant and temporally-distinctive representations. We observe that these representations complement each other depending on the nature of the action. Based on this observation, we propose a student-teacher semi-supervised learning framework, TimeBalance, in which we distill knowledge from a temporally-invariant and a temporally-distinctive teacher. Depending on the nature of the unlabeled video, we dynamically combine the knowledge of these two teachers based on a novel temporal similarity-based reweighting scheme. Our method achieves state-of-the-art performance on three action recognition benchmarks: UCF101, HMDB51, and Kinetics400. Code: https://***/DAVEISHAN/TimeBalance.
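One plausible way to realize such similarity-based teacher reweighting is sketched below: the similarity between two clips of the same unlabeled video decides how much to trust each teacher; the function name, the sigmoid gating, and the temperature tau are illustrative assumptions rather than the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

def blend_teachers(inv_logits, dis_logits, clip_a, clip_b, tau=0.1):
    """Combine a temporally-invariant and a temporally-distinctive teacher.
    High similarity between two clips of the same video suggests a static
    action (trust the invariant teacher); low similarity suggests a
    motion-heavy action (trust the distinctive teacher).

    inv_logits, dis_logits: (B, C) teacher predictions;
    clip_a, clip_b: (B, D) features of two clips from each video."""
    sim = F.cosine_similarity(clip_a, clip_b, dim=-1)  # (B,) in [-1, 1]
    w = torch.sigmoid(sim / tau).unsqueeze(-1)         # (B, 1) invariant-teacher weight
    return w * inv_logits + (1.0 - w) * dis_logits     # per-video soft blend
```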
ISBN (Print): 9798350301298
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have demonstrated impressive transferability on various visual tasks. Transferring knowledge from such powerful VLMs is a promising direction for building effective video recognition models. However, current exploration in this field is still limited. We believe that the greatest value of pre-trained VLMs lies in building a bridge between the visual and textual domains. In this paper, we propose a novel framework called BIKE, which utilizes this cross-modal bridge to explore bidirectional knowledge: i) we introduce a Video Attribute Association mechanism, which leverages Video-to-Text knowledge to generate textual auxiliary attributes that complement video recognition; ii) we also present a Temporal Concept Spotting mechanism that uses Text-to-Video expertise to capture temporal saliency in a parameter-free manner, leading to an enhanced video representation. Extensive studies on six popular video datasets, including Kinetics-400 & 600, UCF-101, HMDB-51, ActivityNet, and Charades, show that our method achieves state-of-the-art performance in various recognition scenarios, such as general, zero-shot, and few-shot video recognition. Our best model achieves a state-of-the-art accuracy of 88.6% on the challenging Kinetics-400 using the released CLIP model. The code is available at https://***/whwu95/BIKE.
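A parameter-free temporal saliency mechanism of this flavor can be sketched as softmaxed frame-text similarities used as pooling weights; the function name, the CLIP-style per-frame features, and the temperature are assumptions for illustration, not necessarily BIKE's exact formulation.

```python
import torch
import torch.nn.functional as F

def temporal_saliency_pool(frame_feats, text_feat, tau=0.01):
    """Parameter-free temporal pooling: weight each frame by its softmaxed
    similarity to the category's text embedding, so text-to-video knowledge
    highlights the temporally salient frames.

    frame_feats: (B, T, D) per-frame visual features; text_feat: (B, D)."""
    frame_feats = F.normalize(frame_feats, dim=-1)
    text_feat = F.normalize(text_feat, dim=-1)
    sal = torch.einsum("btd,bd->bt", frame_feats, text_feat)  # (B, T) similarity
    w = F.softmax(sal / tau, dim=1)                           # temporal saliency weights
    return torch.einsum("bt,btd->bd", w, frame_feats)         # saliency-weighted video feature
```

The small temperature sharpens the weighting toward the most text-relevant frames while keeping the whole operation free of learned parameters.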
ISBN (Print): 9798350301298
The inverted index is a widely used data structure for avoiding infeasible exhaustive search. It accelerates retrieval significantly by splitting the database into multiple disjoint sets, restricting distance computation to a small fraction of the database. Moreover, it can even improve search quality by allowing quantizers to exploit the compact distribution of the residual vector space. However, we first point out that, unlike conventional shallow quantizers, existing deep learning-based quantizers hardly benefit from the residual vector space. To cope with this problem, we introduce a novel disentangled representation learning method for unsupervised neural quantization. Analogous to the concept of the residual vector space, the proposed method enables a more compact latent space by disentangling the information of the inverted index from the vectors. Experimental results on large-scale datasets confirm that our method outperforms state-of-the-art retrieval systems by a large margin.
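For context, here is a minimal sketch of the classical residual scheme that motivates the paper: a coarse codeword selects the inverted-index cell, and only the residual is quantized, exploiting its compact distribution; the function and variable names are illustrative.

```python
import torch

def residual_assign(x, coarse_cb, fine_cb):
    """Two-level inverted-index quantization: the coarse codeword picks the
    index cell, and the fine quantizer codes only the residual, whose
    distribution is far more compact than that of the raw vectors.

    x: (N, D) database vectors; coarse_cb: (Kc, D); fine_cb: (Kf, D)."""
    c_idx = torch.cdist(x, coarse_cb).argmin(dim=1)  # (N,) inverted-index cell
    residual = x - coarse_cb[c_idx]                  # cell information removed
    f_idx = torch.cdist(residual, fine_cb).argmin(dim=1)
    x_hat = coarse_cb[c_idx] + fine_cb[f_idx]        # reconstruction
    return c_idx, f_idx, x_hat
```

The paper's observation is that a deep quantizer fed the raw vectors does not get this benefit for free, hence the proposed disentangling of index information in latent space.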
ISBN (Print): 9798350301298
In the media industry, the demand for SDR-to-HDRTV up-conversion arises because users possess HDR-WCG (high dynamic range-wide color gamut) TVs while most off-the-shelf footage is still in SDR (standard dynamic range). The research community has started tackling this low-level vision task with learning-based approaches. Yet, when applied to real SDR, current methods tend to produce dim and desaturated results, yielding nearly no improvement in viewing experience. Unlike other network-oriented methods, we attribute this deficiency to the training set (HDR-SDR pairs). Consequently, we propose a new HDRTV dataset (dubbed HDRTV4K) and new HDR-to-SDR degradation models, which are then used to train a luminance-segmented network (LSN) consisting of a global mapping trunk and two Transformer branches for the bright and dark luminance ranges. We also update the assessment criteria with tailored metrics and a subjective experiment. Finally, ablation studies are conducted to validate the effectiveness of our design. Our work is available at: https://***/AndreGuo/HDRTVDM.
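Luminance segmentation of this kind can be sketched with soft masks that route shadows and highlights to their own branches; the function name, the Rec.709 luma proxy, the thresholds, and the sigmoid sharpness k are assumptions, not the LSN's actual partitioning.

```python
import torch

def luminance_masks(sdr, low=0.25, high=0.75, k=20.0):
    """Soft luminance segmentation: smooth masks for the dark and bright
    ranges so each branch specializes in its own regime while a global
    trunk maps the whole image.

    sdr: (B, 3, H, W) RGB in [0, 1]."""
    # Rec.709 luma as a simple luminance proxy
    y = (0.2126 * sdr[:, 0] + 0.7152 * sdr[:, 1]
         + 0.0722 * sdr[:, 2]).unsqueeze(1)          # (B, 1, H, W)
    dark = torch.sigmoid(k * (low - y))    # ~1 in shadows, ~0 elsewhere
    bright = torch.sigmoid(k * (y - high)) # ~1 in highlights, ~0 elsewhere
    return dark, bright
```

Soft (rather than hard) masks keep the branch outputs blendable without seams at the luminance boundaries.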
ISBN (Print): 9798350301298
Vision-language models (VLMs) can effectively transfer to various vision tasks via prompt learning. Real-world scenarios often require adapting a model to multiple similar yet distinct tasks. Existing methods focus on learning a specific prompt for each task, limiting the ability to exploit potentially shared information from other tasks. Naively training a task-shared prompt on a combination of all tasks ignores fine-grained task correlations, and significant discrepancies across tasks can cause negative transfer. Considering this, we present Hierarchical Prompt (HiPro) learning, a simple and effective method for jointly adapting a pre-trained VLM to multiple downstream tasks. Our method quantifies inter-task affinity and subsequently constructs a hierarchical task tree. Task-shared prompts learned by internal nodes explore the information shared within the corresponding task group, while task-individual prompts learned by leaf nodes capture fine-grained information targeted at each task. The combination of hierarchical prompts provides high-quality content at different granularities. We evaluate HiPro on four multi-task learning datasets; the results demonstrate the effectiveness of our method.
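The tree-construction step can be sketched as agglomerative clustering over an inter-task affinity matrix; the affinity-to-distance conversion and the average-linkage choice are assumptions for illustration, not necessarily HiPro's procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

def build_task_tree(affinity):
    """Build a hierarchical task tree from a symmetric (n, n) inter-task
    affinity matrix: convert affinity to distance and run average-linkage
    agglomerative clustering. Internal nodes would host task-shared
    prompts; leaves would host task-individual prompts.

    Returns the root ClusterNode of the resulting binary tree."""
    dist = affinity.max() - affinity                 # higher affinity = closer
    n = dist.shape[0]
    condensed = dist[np.triu_indices(n, k=1)]        # condensed distance vector
    return to_tree(linkage(condensed, method="average"))
```

Walking the returned tree from root to leaves then yields, for each task, the chain of shared prompts that would be combined with its individual prompt.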