ISBN: (Print) 9781728171685
Referring expression comprehension (REF) aims at identifying a particular object in a scene by a natural language expression. It requires joint reasoning over the textual and visual domains to solve the problem. Some popular referring expression datasets, however, fail to provide an ideal test bed for evaluating the reasoning ability of the models, mainly because 1) their expressions typically describe only some simple distinctive properties of the object and 2) their images contain limited distracting information. To bridge the gap, we propose a new dataset for visual reasoning in the context of referring expression comprehension with two main features. First, we design a novel expression engine rendering various reasoning logics that can be flexibly combined with rich visual properties to generate expressions with varying compositionality. Second, to better exploit the full reasoning chain embodied in an expression, we propose a new test setting by adding additional distracting images containing objects sharing similar properties with the referent, thus minimising the success rate of reasoning-free cross-domain alignment. We evaluate several state-of-the-art REF models, but find that none of them achieves promising performance. A proposed modular hard-mining strategy performs the best but still leaves substantial room for improvement. The dataset and code are available at https://***/zfchenUnique/Cops-Ref.
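A minimal sketch of this distractor-augmented test setting follows, assuming a hypothetical data layout and a hypothetical `score_box` interface (it is not the Cops-Ref evaluation code): each expression is scored against candidate boxes pooled from the target image plus its distracting images, and a hit is counted only when the referent box in the correct image wins.

```python
# Hedged sketch of evaluation with distracting images (hypothetical interfaces,
# not the Cops-Ref code). `score_box(expression, image_id, box_id)` stands in for
# any REF model that scores an (expression, region) pair.
def evaluate_with_distractors(samples, score_box):
    """samples: list of (expression, {image_id: [box_id, ...]}, (gt_image_id, gt_box_id)),
    where the candidate images include the target image plus distracting images."""
    correct = 0
    for expression, images, gt in samples:
        candidates = [(img, box) for img, boxes in images.items() for box in boxes]
        pred = max(candidates, key=lambda c: score_box(expression, *c))
        correct += (pred == gt)
    return correct / len(samples)
```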
ISBN: (Digital) 9781665469463; (Print) 9781665469463
3D object detection has attracted much attention thanks to the advances in sensors and deep learning methods for point clouds. Current state-of-the-art methods like VoteNet regress direct offsets towards object centers and box orientations with an additional Multi-Layer-Perceptron network. Both their offset and orientation predictions are inaccurate due to the fundamental difficulty of rotation classification. In this work, we disentangle the direct offset into Local Canonical Coordinates (LCC), box scales and box orientations. Only LCC and box scales are regressed, while box orientations are generated by a canonical voting scheme. Finally, an LCC-aware back-projection checking algorithm iteratively cuts out bounding boxes from the generated vote maps while eliminating false positives. Our model achieves state-of-the-art performance on three standard real-world benchmarks: ScanNet, SceneNN and SUN RGB-D. Our code is available at https://***/qq456evb/CanonicalVoting.
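As a rough illustration of producing orientations by voting rather than direct regression, the sketch below enumerates a discrete set of yaw angles and lets every point vote for the object center implied by its predicted LCC and scale under that angle; peaks in the vote maps then suggest (orientation, center) hypotheses. This is a simplified stand-in, not the paper's canonical voting or back-projection checking algorithm, and all shapes and grid parameters are illustrative.

```python
# Hedged sketch: orientation via voting over discretized yaw angles (illustrative only).
import numpy as np

def canonical_votes(points, lcc, scales, num_angles=120, grid_res=0.1, grid_size=200):
    """points, lcc: (N, 3); scales: (N, 3). Returns a (num_angles, H, W) vote map."""
    votes = np.zeros((num_angles, grid_size, grid_size), dtype=np.float32)
    angles = np.linspace(0.0, 2 * np.pi, num_angles, endpoint=False)
    for a_idx, theta in enumerate(angles):
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        # point = center + R @ (lcc * scale)  =>  center = point - R @ (lcc * scale)
        centers = points - (rot @ (lcc * scales).T).T
        ij = np.floor(centers[:, :2] / grid_res).astype(int) + grid_size // 2
        valid = (ij >= 0).all(axis=1) & (ij < grid_size).all(axis=1)
        np.add.at(votes[a_idx], (ij[valid, 0], ij[valid, 1]), 1.0)
    return votes  # peaks indicate (orientation, center) hypotheses to verify
```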
ISBN: (Digital) 9781665469463; (Print) 9781665469463
Benefiting from the advances in deep convolutional networks, current state-of-the-art video action recognition models have achieved remarkable progress. Nevertheless, most existing models suffer from low interpretability of the predicted actions. Inspired by the observation that temporally-configured human-object interactions often serve as a key indicator of many actions, this work crafts an action reasoning framework that performs Markov Logic Network (MLN) based probabilistic logical inference. Crucially, we propose to encode an action by first-order logical rules that correspond to the temporal changes of visual relationships in videos. The main contributions of this work are two-fold: 1) Different from existing black-box models, the proposed model simultaneously implements the localization of temporal boundaries and the recognition of action categories by grounding the logical rules of the MLN in videos. The weight associated with each such rule further provides an estimate of confidence. These collectively make our model more explainable and robust. 2) Instead of using hand-crafted logical rules as in conventional MLNs, we develop a data-driven instantiation of the MLN. Specifically, a hybrid learning scheme is proposed that combines MLN weight learning and reinforcement learning, using the former's results as a self-critic for guiding the latter's training. Additionally, by treating actions as logical predicates, the proposed framework can also be integrated with deep models for a further performance boost. Comprehensive experiments on two complex video action datasets (Charades & CAD-120) clearly demonstrate the effectiveness and explainability of our proposed method.
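To give a flavor of the rule representation, the snippet below encodes one weighted first-order rule over temporal changes of visual relationships. The predicate names, the rule body, and the weight are hypothetical; a real MLN would jointly infer over many such grounded rules rather than scoring one in isolation.

```python
# Hedged sketch of a weighted temporal rule in the spirit of an MLN (illustrative,
# not the paper's rule set or inference engine).
from dataclasses import dataclass

@dataclass
class Relation:
    subj: str
    pred: str
    obj: str
    t: int  # frame index

def rule_drink(relations, weight=2.3):
    """Fires if a person holds a cup and the cup later moves near the face.
    Returns the (weight, satisfied) pair an MLN would aggregate over all rules."""
    holds = [r.t for r in relations if (r.subj, r.pred, r.obj) == ("person", "hold", "cup")]
    near = [r.t for r in relations if (r.subj, r.pred, r.obj) == ("cup", "near", "face")]
    satisfied = any(t2 > t1 for t1 in holds for t2 in near)
    return weight, satisfied

rels = [Relation("person", "hold", "cup", 3), Relation("cup", "near", "face", 9)]
print(rule_drink(rels))  # (2.3, True) -> evidence for the action "drink"
```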
ISBN: (Digital) 9781665469463; (Print) 9781665469463
This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representations semantic-rich. In our experiments, we pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs. The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks. 1) When directly evaluated on COCO and LVIS (without seeing any images in COCO during pre-training), GLIP achieves 49.8 AP and 26.9 AP respectively, surpassing many supervised baselines. 2) After fine-tuning on COCO, GLIP achieves 60.8 AP on val and 61.5 AP on test-dev, surpassing the prior SoTA. 3) When transferred to 13 downstream object detection tasks, a 1-shot GLIP rivals a fully-supervised Dynamic Head. Code will be released at https://***/microsoft/GLIP.
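The core of formulating detection as grounding is replacing fixed classification logits with word-region alignment scores between visual region features and text token features. The sketch below shows that dot-product alignment in isolation; GLIP additionally applies deep cross-modal fusion before this step, which is omitted here, and the shapes and temperature are illustrative.

```python
# Hedged sketch of word-region alignment scores (illustrative shapes and temperature).
import torch

def alignment_scores(region_feats, token_feats, temperature=0.07):
    """region_feats: (num_regions, d); token_feats: (num_tokens, d).
    Returns (num_regions, num_tokens) scores used in place of fixed class logits."""
    region_feats = torch.nn.functional.normalize(region_feats, dim=-1)
    token_feats = torch.nn.functional.normalize(token_feats, dim=-1)
    return region_feats @ token_feats.t() / temperature

scores = alignment_scores(torch.randn(100, 256), torch.randn(12, 256))
print(scores.shape)  # torch.Size([100, 12])
```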
ISBN: (Print) 9798350353013; 9798350353006
From image-text pairs, large-scale vision-language models (VLMs) learn to implicitly associate image regions with words, which proves effective for tasks like visual question answering. However, leveraging the learned association for open-vocabulary semantic segmentation remains a challenge. In this paper, we propose a simple, yet extremely effective, training-free technique, Plug-and-Play Open-Vocabulary Semantic Segmentation (PnP-OVSS), for this task. PnP-OVSS leverages a VLM with direct text-to-image cross-attention and an image-text matching loss. To balance between over-segmentation and under-segmentation, we introduce Salience Dropout: by iteratively dropping patches that the model is most attentive to, we are able to better resolve the entire extent of the segmentation mask. PnP-OVSS does not require any neural network training and performs hyperparameter tuning without the need for any segmentation annotations, even for a validation set. PnP-OVSS demonstrates substantial improvements over comparable baselines (+29.4% mIoU on Pascal VOC, +13.2% mIoU on Pascal Context, +14.0% mIoU on MS COCO, +2.4% mIoU on COCO Stuff) and even outperforms most baselines that conduct additional network training on top of pretrained VLMs. Our codebase is at https://***/letitiabanana/PnP-OVSS.
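A rough sketch of the Salience Dropout loop is given below: cross-attention from the class prompt to image patches is computed, the most-attended patches are dropped, and the model is re-run so that less salient parts of the object also receive attention; the accumulated saliency is then thresholded into a mask. `vlm_cross_attention` is a hypothetical stand-in rather than a real VLM API, and details such as the number of rounds differ from the paper.

```python
# Hedged sketch of Salience Dropout (hypothetical interface, not the PnP-OVSS code).
# `vlm_cross_attention(prompt, keep)` stands in for a VLM returning per-patch
# text-to-image cross-attention for a class prompt, with dropped patches masked out.
import torch

def salience_dropout(vlm_cross_attention, num_patches, prompt, rounds=3, drop_k=16):
    keep = torch.ones(num_patches, dtype=torch.bool)
    saliency = torch.zeros(num_patches)
    for _ in range(rounds):
        attn = vlm_cross_attention(prompt, keep)          # (num_patches,) attention scores
        saliency = torch.maximum(saliency, attn)          # accumulate coverage of the object
        sel = attn.masked_fill(~keep, float("-inf"))      # never re-drop an already dropped patch
        top = torch.topk(sel, k=min(drop_k, int(keep.sum()))).indices
        keep[top] = False                                 # drop the most-attended patches
    return saliency                                       # threshold to obtain the class mask
```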
ISBN: (Print) 9798350353006
Vision-language models (VLMs) have recently shown promising results in traditional downstream tasks. Evaluation studies have emerged to assess their abilities, with the majority focusing on the third-person perspective, and only a few addressing specific tasks from the first-person perspective. However, the capability of VLMs to "think" from a first-person perspective, a crucial attribute for advancing autonomous agents and robotics, remains largely unexplored. To bridge this research gap, we introduce EgoThink, a novel visual question-answering benchmark that encompasses six core capabilities with twelve detailed dimensions. The benchmark is constructed using selected clips from egocentric videos, with manually annotated question-answer pairs containing first-person information. To comprehensively assess VLMs, we evaluate twenty-one popular VLMs on EgoThink. Moreover, given the open-ended format of the answers, we use GPT-4 as the automatic judge to compute single-answer grading. Experimental results indicate that although GPT-4V leads in numerous dimensions, all evaluated VLMs still possess considerable potential for improvement in first-person perspective tasks. Meanwhile, enlarging the number of trainable parameters has the most significant impact on model performance on EgoThink. In conclusion, EgoThink serves as a valuable addition to existing evaluation benchmarks for VLMs, providing an indispensable resource for future research in the realm of embodied artificial intelligence and robotics.
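For context, the snippet below sketches the general pattern of LLM-based single-answer grading: the judge is shown the question, a reference answer, and the model answer, and asked for a numeric score. The prompt wording and the `query_gpt4` helper are hypothetical and do not reproduce EgoThink's actual grading prompt or client code.

```python
# Hedged sketch of single-answer grading with an LLM judge (hypothetical prompt and client).
def build_grading_prompt(question, reference, model_answer):
    return (
        "You are grading an answer to a first-person visual question.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {model_answer}\n"
        "Give a single integer score from 0 (wrong) to 10 (fully correct), then a short justification."
    )

def grade(question, reference, model_answer, query_gpt4):
    reply = query_gpt4(build_grading_prompt(question, reference, model_answer))
    return reply  # parse the leading integer as the score
```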
ISBN: (Print) 9781728132938
In this paper, we propose an adversarial learning network for the task of multi-style image captioning (MSCap) with a standard factual image caption dataset and a multi-stylized language corpus without paired images. Learning a single model for multi-stylized image captioning with unpaired data is a challenging and necessary task, yet it has rarely been studied in previous works. The proposed framework mainly includes four contributive modules following a typical image encoder. First, a style-dependent caption generator outputs a sentence conditioned on an encoded image and a specified style. Second, a caption discriminator is presented to distinguish whether an input sentence is real or not. The discriminator and the generator are trained in an adversarial manner to enable more natural and human-like captions. Third, a style classifier is employed to discriminate the specific style of the input sentence. Besides, a back-translation module is designed to enforce that the generated stylized captions are visually grounded, following the intuition of cycle consistency between factual and stylized captions. We enable end-to-end optimization of the whole model with a differentiable softmax ***. At last, we conduct comprehensive experiments using a combined dataset containing four caption styles to demonstrate the outstanding performance of our proposed method.
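The sketch below shows one way the training signals described above could be combined into a generator objective: an adversarial term from the caption discriminator, a style-classification term, and a back-translation (cycle-consistency) term. The module interfaces and loss weights are hypothetical, and the differentiable softmax relaxation needed to back-propagate through sampled words is not shown.

```python
# Hedged sketch of an MSCap-style generator objective (hypothetical modules and weights,
# not the paper's implementation).
import torch
import torch.nn.functional as F

def mscap_generator_loss(G, D, C, B, image_feat, style_id, factual_caption,
                         w_adv=1.0, w_style=1.0, w_cyc=1.0):
    caption = G(image_feat, style_id)                # stylized caption (soft token sequence)
    adv = -torch.log(D(caption) + 1e-8).mean()       # fool the real/fake caption discriminator
    style = F.cross_entropy(C(caption), style_id)    # C returns style logits; keep requested style
    cyc = B(caption, factual_caption)                # back-translation / cycle-consistency penalty
    return w_adv * adv + w_style * style + w_cyc * cyc
```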
ISBN: (Print) 9781728171685
Over the past few years, we have witnessed the success of deep learning in image recognition thanks to the availability of large-scale human-annotated datasets such as PASCAL VOC, ImageNet, and COCO. Although these datasets have covered a wide range of object categories, there are still a significant number of objects that are not included. Can we perform the same task without a lot of human annotations? In this paper, we are interested in few-shot object segmentation where the number of annotated training examples is limited to only 5. To evaluate and validate the performance of our approach, we have built a few-shot segmentation dataset, FSS-1000, which consists of 1000 object classes with pixel-wise annotation of ground-truth segmentation. Unique to FSS-1000, our dataset contains a significant number of objects that have never been seen or annotated in previous datasets, such as tiny daily objects, merchandise, cartoon characters, logos, etc. We build our baseline model using standard backbone networks such as VGG-16, ResNet-101, and Inception. To our surprise, we found that training our model from scratch on FSS-1000 achieves comparable or even better results than training with weights pre-trained on ImageNet, which is more than 100 times larger than FSS-1000. Both our approach and dataset are simple, effective, and easily extensible to learn segmentation of new object classes given very few annotated training examples. The dataset is available at https://***/HKUSTCV/FSS-1000.
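For readers unfamiliar with the task, the snippet below sketches one common few-shot segmentation baseline: masked average pooling of support features into a class prototype, compared to query features by cosine similarity. It is not the relation-style baseline built in the paper, and the shapes are illustrative.

```python
# Hedged sketch of a prototype-matching few-shot segmentation baseline (illustrative only).
import torch

def prototype_segmentation(support_feats, support_masks, query_feats):
    """support_feats: (K, C, H, W); support_masks: (K, 1, H, W) in {0, 1};
    query_feats: (C, H, W). Returns an (H, W) foreground similarity map."""
    masked = (support_feats * support_masks).sum(dim=(0, 2, 3))
    area = support_masks.sum(dim=(0, 2, 3)).clamp(min=1.0)
    prototype = masked / area                                  # (C,) class prototype
    q = torch.nn.functional.normalize(query_feats, dim=0)
    p = torch.nn.functional.normalize(prototype, dim=0)
    return (q * p[:, None, None]).sum(dim=0)                   # cosine similarity per pixel
```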
ISBN: (Digital) 9781665469463; (Print) 9781665469463
Despite the promising progress that has been made, two challenges of multi-view clustering (MVC) are still waiting for better solutions: i) most existing methods are either not qualified for, or require additional steps to handle, incomplete multi-view clustering, and ii) noise or outliers might significantly degrade the overall clustering performance. In this paper, we propose a novel unified framework for incomplete and complete MVC named multi-view probabilistic clustering (MPC). MPC equivalently transforms the multi-view pairwise posterior matching probability into a composition of each view's individual distribution, which tolerates missing data and extends to any number of views. Then graph-context-aware refinement with path propagation and co-neighbor propagation is used to refine the pairwise probability, which alleviates the impact of noise and outliers. Finally, MPC equivalently transforms the objective of probabilistic clustering to avoid complete pairwise computation and adjusts the clustering assignments by maximizing the joint probability iteratively. Extensive experiments on multiple benchmarks for incomplete and complete MVC show that MPC significantly outperforms previous state-of-the-art methods in both effectiveness and efficiency.
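As a simplified illustration of composing per-view pairwise probabilities while tolerating missing data, the sketch below averages the matching probabilities over the views in which both samples are observed. MPC's actual composition is derived from its probabilistic model and is further refined by graph-context-aware propagation, neither of which is reproduced here.

```python
# Hedged sketch: fuse per-view pairwise matching probabilities with missing views (illustrative).
import numpy as np

def fuse_pairwise_probability(per_view_probs, observed):
    """per_view_probs: (V, N, N) matching probabilities per view;
    observed: (V, N) boolean mask of which samples exist in each view."""
    V, N, _ = per_view_probs.shape
    fused = np.zeros((N, N))
    counts = np.zeros((N, N))
    for v in range(V):
        pair_mask = np.outer(observed[v], observed[v])   # both samples present in view v
        fused += np.where(pair_mask, per_view_probs[v], 0.0)
        counts += pair_mask
    return fused / np.maximum(counts, 1)                 # average over views that saw both samples
```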
ISBN: (Print) 9781665448994
In this work, we propose MVFuseNet, a novel end-to-end method for joint object detection and motion forecasting from a temporal sequence of LiDAR data. Most existing methods operate in a single view by projecting data into either the range view (RV) or the bird's eye view (BEV). In contrast, we propose a method that effectively utilizes both RV and BEV for spatio-temporal feature learning as part of a temporal fusion network, as well as for multi-scale feature learning in the backbone network. Further, we propose a novel sequential fusion approach that effectively utilizes multiple views in the temporal fusion network. We show the benefits of our multi-view approach for the tasks of detection and motion forecasting on two large-scale self-driving datasets, achieving state-of-the-art results. Furthermore, we show that MVFuseNet scales well to large operating ranges while maintaining real-time performance.
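A rough sketch of sequential multi-view temporal fusion is shown below: each LiDAR sweep is processed in range view, its features are projected into the bird's eye view grid, and the result is fused with the accumulated BEV state before the next sweep. All modules and the projection indices are stand-ins; MVFuseNet's actual temporal fusion network and multi-scale backbone are more involved.

```python
# Hedged sketch of sequential RV->BEV temporal fusion (hypothetical modules, not MVFuseNet code).
def sequential_fusion(sweeps, rv_net, rv_to_bev, bev_fuse, bev_state):
    """sweeps: list of (range_image, projection_index) per LiDAR sweep, oldest first."""
    for range_image, proj_index in sweeps:
        rv_feat = rv_net(range_image)               # spatio-temporal features in range view
        bev_feat = rv_to_bev(rv_feat, proj_index)   # scatter RV features into the BEV grid
        bev_state = bev_fuse(bev_state, bev_feat)   # fuse with accumulated BEV features
    return bev_state                                # fed to the multi-scale BEV backbone
```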