ISBN (print): 9798350353013; 9798350353006
We present EgoTAP, a heatmap-to-3D pose lifting method for highly accurate stereo egocentric 3D pose estimation. Severe self-occlusion and out-of-view limbs in egocentric camera views make accurate pose estimation a challenging problem. To address the challenge, prior methods employ joint heatmaps, probabilistic 2D representations of the body pose, but heatmap-to-3D pose conversion still remains an inaccurate process. We propose a novel heatmap-to-3D lifting method composed of the Grid ViT Encoder and the Propagation Network. The Grid ViT Encoder summarizes joint heatmaps into an effective feature embedding using self-attention. The Propagation Network then estimates the 3D pose by utilizing skeletal information to better estimate the positions of obscured joints. Our method significantly outperforms the previous state of the art both qualitatively and quantitatively, as demonstrated by a 23.9% reduction in error on the MPJPE metric. Our source code is available on GitHub.
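As a rough illustration of the two-stage lifting idea described above, the sketch below encodes per-joint heatmaps with grid-wise self-attention and then predicts each joint's 3D position from its own embedding together with its parent's. Module names, sizes, and the toy kinematic tree are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): grid-wise self-attention over joint
# heatmaps, followed by parent-to-child propagation along a toy skeleton.
import torch
import torch.nn as nn

class GridHeatmapEncoder(nn.Module):
    """Splits each joint heatmap into grid cells and summarizes them with self-attention."""
    def __init__(self, heatmap_size=64, grid=8, dim=128):
        super().__init__()
        self.grid = grid
        cell = heatmap_size // grid
        self.proj = nn.Linear(cell * cell, dim)           # embed each grid cell as a token
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, heatmaps):                          # (B, J, H, W)
        B, J, H, W = heatmaps.shape
        g, c = self.grid, H // self.grid
        cells = heatmaps.reshape(B * J, g, c, g, c).permute(0, 1, 3, 2, 4)
        cells = cells.reshape(B * J, g * g, c * c)        # grid cells as a token sequence
        tokens = self.encoder(self.proj(cells))           # self-attention over cells
        return tokens.mean(dim=1).reshape(B, J, -1)       # one embedding per joint

class PropagationNetwork(nn.Module):
    """Predicts each joint's 3D position from its own embedding and its parent's."""
    def __init__(self, parents, dim=128):
        super().__init__()
        self.parents = parents                            # parent index for every joint
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 3))

    def forward(self, feats):                             # (B, J, dim)
        parent_feats = feats[:, self.parents]             # propagate skeletal context
        return self.head(torch.cat([feats, parent_feats], dim=-1))  # (B, J, 3)

parents = [max(j - 1, 0) for j in range(16)]              # placeholder chain-shaped skeleton
encoder, lifter = GridHeatmapEncoder(), PropagationNetwork(parents)
pose3d = lifter(encoder(torch.rand(2, 16, 64, 64)))       # (2, 16, 3) estimated joint positions
```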
ISBN (print): 9798350353006
Event cameras, with their high temporal resolution, high dynamic range, and minimal memory usage, have found applications in various fields. However, their potential in static traffic monitoring remains largely unexplored. To facilitate this exploration, we present eTraM, a first-of-its-kind, fully event-based traffic monitoring dataset. eTraM offers 10 hours of data from different traffic scenarios in various lighting and weather conditions, providing a comprehensive overview of real-world situations. With 2M bounding box annotations, it covers eight distinct classes of traffic participants, ranging from vehicles to pedestrians and micro-mobility. eTraM's utility has been assessed using state-of-the-art methods for traffic participant detection, including RVT, RED, and YOLOv8. We quantitatively evaluate the ability of event-based models to generalize to nighttime and unseen scenes. Our findings substantiate the compelling potential of leveraging event cameras for traffic monitoring, opening new avenues for research and application. eTraM is available at https://***/eTraM.
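Purely as a hypothetical sketch of how an event-based bounding-box annotation and the day/night generalization split mentioned above might be represented, the snippet below defines an illustrative record type; the field names and scene naming are assumptions, not the released eTraM format.

```python
# Hypothetical annotation record and lighting-based split; not the actual dataset schema.
from dataclasses import dataclass

@dataclass
class EventBoxAnnotation:
    timestamp_us: int      # event-stream timestamp of the labeled box (microseconds)
    x: float               # bounding box in pixel coordinates
    y: float
    w: float
    h: float
    class_name: str        # one of the eight traffic-participant classes
    scene: str             # e.g. "intersection_night_rain" (illustrative naming)

def split_by_lighting(annotations):
    """Separate daytime and nighttime scenes, mirroring a nighttime-generalization evaluation."""
    night = [a for a in annotations if "night" in a.scene]
    day = [a for a in annotations if "night" not in a.scene]
    return day, night

day, night = split_by_lighting(
    [EventBoxAnnotation(0, 10.0, 20.0, 40.0, 80.0, "pedestrian", "intersection_night_rain")]
)
```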
ISBN (print): 9798350365474
This study investigates the integration of vision-language models (VLMs) to enhance the classification of situations within rugby match broadcasts. The importance of accurately identifying situations in sports videos is emphasized for understanding game dynamics and facilitating downstream tasks like performance evaluation and injury prevention. Utilizing a dataset comprising 18,000 labeled images extracted at 0.2-second intervals from 100 minutes of rugby match broadcasts, scene classification tasks were performed for contact plays (scrums, mauls, rucks, tackles, lineouts), rucks, tackles, and lineouts, together with multiclass classification. The study aims to validate the utility of VLM outputs in improving classification performance compared to using image data alone. Experimental results demonstrate substantial performance improvements across all tasks with the incorporation of VLM outputs. Our analysis of prompts suggests that, when provided with appropriate contextual information through natural language, VLMs can effectively capture the context of a given image. The findings of our study indicate that leveraging VLMs in the domain of sports analysis holds promise for developing image processing models capable of incorporating the tacit knowledge encoded within language models, as well as information conveyed through natural language descriptions.
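A minimal sketch of the general idea, assuming the VLM's natural-language description of a frame is embedded and concatenated with visual features before classification; the feature sizes, fusion scheme, and class set are illustrative assumptions, not the study's exact pipeline.

```python
# Illustrative fusion of image features with an embedding of a VLM-generated description.
import torch
import torch.nn as nn

class VLMAugmentedClassifier(nn.Module):
    def __init__(self, img_dim=512, text_dim=384, num_classes=5):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + text_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),        # e.g. scrum / maul / ruck / tackle / lineout
        )

    def forward(self, image_feat, vlm_text_feat):
        # image_feat: features from any image backbone; vlm_text_feat: embedding of the
        # VLM's description, e.g. "players are binding together over the ball".
        return self.head(torch.cat([image_feat, vlm_text_feat], dim=-1))

logits = VLMAugmentedClassifier()(torch.rand(8, 512), torch.rand(8, 384))  # (8, 5)
```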
ISBN (print): 9798350353006
The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images. However, these navigation models often handle only a single input modality as the target. With the progress achieved so far, it is time to move towards universal navigation models capable of handling various goal types, enabling more effective user interaction with robots. To facilitate this goal, we propose GOAT-Bench, a benchmark for the universal navigation task referred to as GO to AnyThing (GOAT). In this task, the agent is directed to navigate to a sequence of targets specified by the category name, language description, or image in an open-vocabulary fashion. We benchmark monolithic RL and modular methods on the GOAT task, analyzing their performance across modalities, the role of explicit and implicit scene memories, their robustness to noise in goal specifications, and the impact of memory in lifelong scenarios.
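To make the open-vocabulary, multimodal goal specification concrete, here is a hypothetical sketch of how a GOAT-style episode could be represented; the field names are illustrative assumptions, not the benchmark's actual API.

```python
# Hypothetical goal/episode records for a sequence of category, language, or image goals.
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Goal:
    modality: str                      # "category" | "language" | "image"
    category: Optional[str] = None     # e.g. "chair"
    description: Optional[str] = None  # e.g. "the red armchair next to the window"
    image_path: Optional[str] = None   # path to a goal image

@dataclass
class GoatEpisode:
    scene_id: str
    goals: Sequence[Goal]              # sub-goals reached in order (lifelong setting)

episode = GoatEpisode("scene_0001", [
    Goal(modality="category", category="chair"),
    Goal(modality="language", description="the red armchair next to the window"),
])
```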
ISBN (print): 9798350353013; 9798350353006
Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes. In this paper, we present Video2Game, a novel approach that automatically converts videos of real-world scenes into realistic and interactive game environments. At the heart of our system are three core components: (i) a neural radiance fields (NeRF) module that effectively captures the geometry and visual appearance of the scene; (ii) a mesh module that distills the knowledge from the NeRF for faster rendering; and (iii) a physics module that models the interactions and physical dynamics among the objects. By following this carefully designed pipeline, one can construct an interactable and actionable digital replica of the real world. We benchmark our system on both indoor and large-scale outdoor scenes. We show that we can not only produce highly realistic renderings in real time, but also build interactive games on top.
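The schematic sketch below outlines the three-stage pipeline described above (NeRF reconstruction, mesh distillation, physics attachment) as a simple composition of injected components; the stage interfaces are assumptions for illustration, not the actual system.

```python
# Schematic outline only: the three stages are injected as opaque components, since the
# real NeRF, distillation, and physics modules are far more involved.
class Video2GamePipeline:
    def __init__(self, nerf, mesh_distiller, physics):
        self.nerf = nerf                      # learns geometry and appearance from video
        self.mesh_distiller = mesh_distiller  # bakes the radiance field into a mesh for real-time rendering
        self.physics = physics                # attaches collision shapes and dynamics to objects

    def build(self, video_frames, camera_poses):
        field = self.nerf.fit(video_frames, camera_poses)
        mesh = self.mesh_distiller.extract(field)
        return self.physics.attach(mesh)      # interactive digital replica of the captured scene
```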
ISBN (print): 9798350353013; 9798350353006
We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes. Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps. Appearance decomposition poses a considerable challenge in computer vision due to the inherent ambiguity between lighting and material properties and the lack of real datasets. To address this issue, we advocate for a probabilistic formulation, where instead of attempting to directly predict the true material properties, we employ a conditional generative model to sample from the solution space. Furthermore, we show that the strong learned prior of recent diffusion models trained on large-scale real-world images can be adapted to material estimation and greatly improves generalization to real images. Our method produces significantly sharper, more consistent, and more detailed materials, outperforming state-of-the-art methods by 1.5 dB on PSNR and by a 45% better FID score on albedo prediction. We demonstrate the effectiveness of our approach through experiments on both synthetic and real-world datasets.
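A minimal sketch of the probabilistic formulation: rather than predicting a single decomposition, a conditional sampler is queried several times to obtain multiple plausible material explanations for one input view. The sampler interface and map names below are assumptions for illustration, not the paper's model.

```python
# Illustrative only: `toy_sampler` stands in for a conditional diffusion model over material maps.
import torch

def sample_material_explanations(material_sampler, image, num_samples=4):
    """Draw several plausible material decompositions conditioned on one input image."""
    return [material_sampler(image) for _ in range(num_samples)]

def toy_sampler(image):
    h, w = image.shape[-2:]
    return {"albedo": torch.rand(3, h, w),
            "roughness": torch.rand(1, h, w),
            "metallic": torch.rand(1, h, w)}

explanations = sample_material_explanations(toy_sampler, torch.rand(3, 256, 256))  # 4 candidate decompositions
```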
ISBN (print): 9798350353006
Vision-Language Models (VLMs), such as Flamingo and GPT-4V, have shown immense potential by integrating large language models with vision systems. Nevertheless, these models face challenges in the fundamental computer vision task of object localisation, due to their training on multi-modal data containing mostly captions without explicit spatial grounding. While it is possible to construct custom, supervised training pipelines with bounding box annotations that integrate with VLMs, these result in specialized and hard-to-scale models. In this paper, we aim to explore the limits of caption-based VLMs and instead propose to tackle the challenge in a simpler manner by i) keeping the weights of a caption-based VLM frozen and ii) not using any supervised detection data. To this end, we introduce an input-agnostic Positional Insert (PIN), a learnable spatial prompt containing a minimal set of parameters that is slid inside the frozen VLM, unlocking object localisation capabilities. Our PIN module is trained with a simple next-token prediction task on synthetic data without requiring the introduction of new output heads. Our experiments demonstrate strong zero-shot localisation performance on a variety of images, including Pascal VOC, COCO, LVIS, and diverse images like paintings or cartoons.
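A minimal sketch of the core mechanism, assuming the positional insert is a single learnable tensor added to the frozen VLM's visual tokens while everything else stays fixed; shapes and the way the insert is combined are illustrative assumptions, not the paper's exact design.

```python
# Illustrative learnable, input-agnostic insert applied to frozen visual tokens.
import torch
import torch.nn as nn

class PositionalInsert(nn.Module):
    def __init__(self, num_patches=256, dim=768):
        super().__init__()
        # One learnable embedding per visual token position, shared across all images.
        self.insert = nn.Parameter(torch.zeros(1, num_patches, dim))

    def forward(self, frozen_visual_tokens):       # (B, num_patches, dim), from a frozen encoder
        return frozen_visual_tokens + self.insert  # inject spatial information; VLM weights stay frozen

pin = PositionalInsert()
tokens = torch.rand(2, 256, 768)                   # visual tokens from a frozen encoder
prompted = pin(tokens)                             # passed on to the frozen language decoder
```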
ISBN (print): 9798350353006
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned. Despite their success, LVLMs still suffer from the issue of object hallucination, where models generate plausible yet incorrect outputs that include objects that do not exist in the images. To mitigate this issue, we introduce Visual Contrastive Decoding (VCD), a simple and training-free method that contrasts output distributions derived from original and distorted visual inputs. The proposed VCD effectively reduces over-reliance on statistical bias and unimodal priors, two essential causes of object hallucination. This adjustment ensures the generated content is closely grounded in visual inputs, resulting in contextually accurate outputs. Our experiments show that VCD, without either additional training or the use of external tools, significantly mitigates the object hallucination issue across different LVLM families. Beyond mitigating object hallucinations, VCD also excels in general LVLM benchmarks, highlighting its wide-ranging applicability.
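A minimal sketch of contrasting the two output distributions at a single decoding step, assuming the common contrastive-decoding weighting between logits conditioned on the original image and logits conditioned on a distorted one; the weighting constant and the choice of distortion are illustrative assumptions.

```python
# Illustrative contrast of next-token logits from original vs. distorted visual inputs.
import torch

def contrastive_logits(logits_original, logits_distorted, alpha=1.0):
    """Down-weight tokens that are favored only under the distorted (less grounded) view."""
    return (1.0 + alpha) * logits_original - alpha * logits_distorted

# Toy usage with a vocabulary of 10 tokens.
orig = torch.randn(10)
dist = torch.randn(10)
next_token = contrastive_logits(orig, dist).softmax(dim=-1).argmax()
```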
ISBN (print): 9798350365474
Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable, and there is a call in these fields for interpretable models. Recent work in interpretable computer vision provides transparency to these formerly black boxes by utilizing prototypes for case-based explanations, achieving high accuracy in applications including mammography. However, these models struggle with precise feature localization, reasoning over large portions of an image when only a small part is relevant. This paper addresses this gap by proposing a novel multi-scale interpretable deep learning model for mammographic mass margin classification. Our contribution not only offers an interpretable model whose reasoning aligns with radiologist practices, but also provides a general architecture for computer vision with user-configurable prototypes ranging from coarse- to fine-grained.
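A minimal sketch of multi-scale prototype scoring in the spirit of case-based interpretable models: patch features at a coarse and a fine scale are compared to learned prototypes, and the resulting similarity scores drive the margin classification. Sizes, the similarity function, and the class count are assumptions for illustration, not the proposed architecture.

```python
# Illustrative prototype head: nearest-patch similarity to prototypes at two scales.
import torch
import torch.nn as nn

class MultiScalePrototypeHead(nn.Module):
    def __init__(self, dim=128, protos_per_scale=5, num_classes=2):
        super().__init__()
        self.coarse_protos = nn.Parameter(torch.randn(protos_per_scale, dim))
        self.fine_protos = nn.Parameter(torch.randn(protos_per_scale, dim))
        self.classifier = nn.Linear(2 * protos_per_scale, num_classes)  # illustrative margin labels

    @staticmethod
    def _scores(patch_feats, protos):              # (B, N, dim) x (P, dim) -> (B, P)
        batch_protos = protos.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        dists = torch.cdist(patch_feats, batch_protos)
        return (-dists).amax(dim=1)                # best-matching patch per prototype

    def forward(self, coarse_feats, fine_feats):
        scores = torch.cat([self._scores(coarse_feats, self.coarse_protos),
                            self._scores(fine_feats, self.fine_protos)], dim=-1)
        return self.classifier(scores)

logits = MultiScalePrototypeHead()(torch.rand(4, 49, 128), torch.rand(4, 196, 128))  # (4, 2)
```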
ISBN (print): 9798350353006
Images produced by text-to-image diffusion models might not always faithfully represent the semantic intent of the provided text prompt, where the model might overlook or entirely fail to produce certain objects. Existing solutions often require custom-tailored functions for each of these problems, leading to sub-optimal results, especially for complex prompts. Our work introduces a novel perspective by tackling this challenge in a contrastive context. Our approach intuitively promotes the segregation of objects in attention maps while ensuring that pairs of related attributes are kept close to each other. We conduct extensive experiments across a wide variety of scenarios, each involving unique combinations of objects, attributes, and scenes. These experiments effectively showcase the versatility, efficiency, and flexibility of our method in working with both latent and pixel-based diffusion models, including Stable Diffusion and Imagen. Moreover, we publicly share our source code to facilitate further research.
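A minimal sketch of a contrastive objective over cross-attention maps, assuming maps of different objects should be pushed apart while an object's map stays close to that of its own attribute; the loss form and map shapes are illustrative assumptions, not the paper's exact objective.

```python
# Illustrative contrastive loss over cross-attention maps of objects and attributes.
import torch
import torch.nn.functional as F

def contrastive_attention_loss(obj_a, obj_b, attr_a, temperature=0.1):
    """obj_a / obj_b: attention maps of two objects; attr_a: map of an attribute bound to obj_a."""
    a, b, t = obj_a.flatten(), obj_b.flatten(), attr_a.flatten()
    pos = F.cosine_similarity(a, t, dim=0) / temperature     # keep object and its attribute close
    neg = F.cosine_similarity(a, b, dim=0) / temperature     # separate the two objects' maps
    return -torch.log(torch.exp(pos) / (torch.exp(pos) + torch.exp(neg)))

loss = contrastive_attention_loss(torch.rand(16, 16), torch.rand(16, 16), torch.rand(16, 16))
```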