ISBN (print): 9798350353006
We demonstrate text as a strong cross-modal interface. Rather than relying on deep embeddings to connect image and language as the interface representation, our approach represents an image as text, from which we enjoy the interpretability and flexibility inherent to natural language. We employ an autoencoder that uses a pre-trained text-to-image diffusion model for decoding. The encoder is trained to transform an input image into text, which is then fed into the fixed text-to-image diffusion decoder to reconstruct the original input, a process we term De-Diffusion. Experiments validate both the precision and comprehensiveness of De-Diffusion text representing images, such that it can be readily ingested by off-the-shelf text-to-image tools and LLMs for diverse multi-modal tasks. For example, a single De-Diffusion model can generalize to provide transferable prompts for different text-to-image tools, and also achieves a new state of the art on open-ended vision-language tasks by simply prompting large language models with few-shot examples. Project page: ***.
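As a rough, hedged illustration of the setup this abstract describes, the sketch below trains an image-to-text encoder against a frozen decoder so that reconstruction quality drives the text representation. The encoder, the toy decoder, the Gumbel-softmax token selection, and the MSE objective are all simplifying assumptions for illustration; the actual system uses a pre-trained text-to-image diffusion model and its own training objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageToTextEncoder(nn.Module):
    """Maps an image to a sequence of discrete text tokens (as one-hots)."""
    def __init__(self, vocab_size=32000, seq_len=75, dim=512):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.head = nn.Linear(dim, vocab_size)
        self.seq_len = seq_len

    def forward(self, images):                                    # (B, 3, 224, 224)
        feats = self.patchify(images).flatten(2).transpose(1, 2)  # (B, 196, dim)
        logits = self.head(feats[:, : self.seq_len])              # (B, L, vocab)
        # Gumbel-softmax keeps the discrete token choice differentiable
        return F.gumbel_softmax(logits, hard=True, dim=-1)

class ToyTextToImageDecoder(nn.Module):
    """Frozen stand-in for the pre-trained text-to-image diffusion decoder."""
    def __init__(self, vocab_size=32000, dim=512):
        super().__init__()
        self.embed = nn.Linear(vocab_size, dim)   # consumes (soft) one-hot tokens
        self.to_img = nn.Linear(dim, 3 * 224 * 224)

    def forward(self, tokens):
        ctx = self.embed(tokens).mean(dim=1)
        return self.to_img(ctx).view(-1, 3, 224, 224)

encoder, decoder = ImageToTextEncoder(), ToyTextToImageDecoder()
for p in decoder.parameters():                    # decoder stays fixed
    p.requires_grad_(False)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

images = torch.rand(2, 3, 224, 224)
loss = F.mse_loss(decoder(encoder(images)), images)  # reconstruction shapes the "text"
loss.backward()
opt.step()
```

Because only the encoder receives gradients, the text it produces must carry whatever the frozen decoder needs to redraw the image, which is the source of the precision and comprehensiveness claims above.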
ISBN (print): 9798350365474
By using few-shot data and labels, prompt learning obtains optimal prompts that achieve high performance on downstream tasks. Existing prompt learning methods generate high-quality prompts that are suitable for downstream tasks but tend to perform poorly when only very limited data (e.g., one-shot) is available. We address this challenging one-shot scenario and propose a novel architecture for prompt learning, called the Image-Text Feature Alignment Branch (ITFAB). ITFAB pulls text features closer to the centroids of the image features and separates text features of different classes to resolve misalignment in the feature space, thereby facilitating the acquisition of high-quality prompts from very limited data. In the one-shot setting, our method outperforms the existing CoOp and CoCoOp methods and in some cases even surpasses CoCoOp's 16-shot performance. Tests on different datasets and domains show that ITFAB almost matches CoCoOp's effectiveness. It also composes with current prompt learning methods such as MaPLe and PromptSRC, improving their performance in the one-shot setting.
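A minimal sketch of the alignment idea described above, assuming a contrastive-style loss: pull each class's text feature toward the centroid of that class's image features, and keep different classes' text features apart. The loss form, margin, and names are illustrative, not ITFAB's exact formulation.

```python
import torch
import torch.nn.functional as F

def itfab_style_loss(text_feats, image_feats, labels, margin=0.5):
    """
    text_feats : (C, D) one L2-normalized text feature per class
    image_feats: (N, D) L2-normalized image features of the support set
    labels     : (N,)   class index of each image
    """
    C, D = text_feats.shape
    # Centroid of the image features of each class
    centroids = torch.zeros(C, D, device=image_feats.device)
    centroids.index_add_(0, labels, image_feats)
    counts = torch.bincount(labels, minlength=C).clamp(min=1).unsqueeze(1)
    centroids = F.normalize(centroids / counts, dim=-1)

    # (i) Attraction: each text feature moves toward its class centroid
    attract = (1 - F.cosine_similarity(text_feats, centroids, dim=-1)).mean()

    # (ii) Repulsion: keep different classes' text features at least `margin` apart
    sim = text_feats @ text_feats.t()                                   # (C, C)
    off_diag = sim.masked_fill(
        torch.eye(C, dtype=torch.bool, device=sim.device), 0.0)
    repel = F.relu(off_diag - margin).sum() / max(C * (C - 1), 1)

    return attract + repel
```

In a prompt learning setting, `text_feats` would come from the learnable prompts passed through the text encoder, so gradients from this loss shape the prompts directly even with a single image per class.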
ISBN (print): 9798350365474
Point clouds are a critical 3D representation with many emerging applications. Because of point sparsity and irregularity, high-quality rendering of point clouds is challenging and often requires complex computations to recover a continuous surface representation. On the other hand, to avoid visual discomfort, the motion-to-photon latency must be very short, under 10 ms. Existing rendering solutions lack either quality or speed. To tackle these challenges, we present a framework that unlocks interactive, free-viewing and high-fidelity point cloud rendering. We train a generic neural network to estimate 3D elliptical Gaussians from arbitrary point clouds and use differentiable surface splatting to render smooth texture and surface normals for arbitrary views. Our approach requires no per-scene optimization and enables real-time rendering of dynamic point clouds. Experimental results demonstrate that the proposed solution enjoys superior visual quality and speed, as well as generalizability to different scene content and robustness to compression artifacts.
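To make the pipeline concrete, here is a hedged sketch of the per-point prediction step: a small network regresses scale, rotation, and opacity for an elliptical Gaussian at each point, and a helper assembles the covariance that a differentiable splatting rasterizer would consume. The network size and parameterization are assumptions; the rasterizer itself is out of scope here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianPredictor(nn.Module):
    """Regresses per-point elliptical-Gaussian parameters from raw points."""
    def __init__(self, in_dim=3, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 4 + 1),        # log-scales, quaternion, opacity
        )

    def forward(self, points):                   # (N, 3)
        out = self.mlp(points)
        scales = out[:, :3].exp()                # positive axis lengths
        quats = F.normalize(out[:, 3:7], dim=-1) # unit quaternion rotation
        opacity = torch.sigmoid(out[:, 7:])
        return scales, quats, opacity

def quat_to_cov(quats, scales):
    """Covariance of each Gaussian: Sigma = R diag(s)^2 R^T."""
    w, x, y, z = quats.unbind(-1)
    R = torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y),
        2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y),
    ], dim=-1).view(-1, 3, 3)
    return R @ torch.diag_embed(scales ** 2) @ R.transpose(-1, -2)

points = torch.randn(1024, 3)
scales, quats, opacity = GaussianPredictor()(points)
cov = quat_to_cov(quats, scales)                 # (1024, 3, 3), ready to splat
```

Because the predictor is a generic network over points rather than per-scene parameters, it matches the abstract's claim of generalizing without per-scene optimization.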
ISBN (print): 9798350353013; 9798350353006
Modern video generation models like Sora have achieved remarkable success in producing high-quality videos. However, a significant limitation is their inability to offer interactive control to users, a feature that promises to open up unprecedented applications and creativity. In this work, we introduce the first solution to equip diffusion-based video generation models with spatio-temporal control. We present PEEKABOO, a novel masked attention module that seamlessly integrates with current video generation models, offering control without additional training or inference overhead. To facilitate future research, we also introduce a comprehensive benchmark for interactive video generation. This benchmark offers a standardized framework for the community to assess the efficacy of emerging interactive video generation models. Our extensive qualitative and quantitative assessments reveal that PEEKABOO achieves up to a 3.8x improvement in mIoU over baseline models, all while maintaining the same latency. Code and benchmark are available on the webpage.
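A minimal sketch of the general masked-attention mechanism the abstract names, assuming a binary spatio-temporal mask that partitions tokens into foreground and background. It illustrates how a user mask can steer attention with no extra parameters or training; it is not PEEKABOO's exact implementation.

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, region_mask):
    """
    q, k, v     : (B, T, D) flattened spatio-temporal tokens
    region_mask : (B, T) bool, True where the user wants the subject placed
    """
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)       # (B, T, T)
    # Foreground tokens attend to foreground, background to background;
    # the diagonal is always allowed, so no row is fully masked out.
    allowed = region_mask.unsqueeze(2) == region_mask.unsqueeze(1)
    scores = scores.masked_fill(~allowed, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

Since the bias is applied inside existing attention layers, this style of control adds no parameters and essentially no latency, consistent with the training-free claim above.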
ISBN (print): 9798350353006
In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite its importance, the visual projector has been relatively underexplored. In this study, we first identify two essential projector properties: (i) flexibility in managing the number of visual tokens, crucial for MLLMs' overall efficiency, and (ii) preservation of local context from visual features, vital for spatial understanding. Based on these findings, we propose a novel projector design that is both flexible and locality-enhanced, effectively satisfying the two desirable properties. Additionally, we present comprehensive strategies to effectively utilize multiple and multifaceted instruction datasets. Through extensive experiments, we examine the impact of individual design choices. Finally, our proposed MLLM, Honeybee, remarkably outperforms previous state-of-the-art methods across various benchmarks, including MME, MMBench, SEED-Bench, and LLaVA-Bench, while achieving significantly higher efficiency. Code and models are available at https://github.com/kakaobrain/honeybee.
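A hedged sketch of a projector with the two stated properties, assuming convolutions for locality preservation and adaptive pooling for a controllable token count; the layer sizes and structure are illustrative stand-ins for the paper's design.

```python
import torch
import torch.nn as nn

class LocalityAwareProjector(nn.Module):
    def __init__(self, vis_dim=1024, llm_dim=4096, out_tokens=144):
        super().__init__()
        side = int(out_tokens ** 0.5)                  # e.g. 144 -> 12x12 grid
        self.conv = nn.Sequential(                     # keeps local context
            nn.Conv2d(vis_dim, vis_dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(vis_dim, vis_dim, 3, padding=1),
        )
        self.pool = nn.AdaptiveAvgPool2d(side)         # controls token count
        self.proj = nn.Linear(vis_dim, llm_dim)        # into LLM embedding space

    def forward(self, vis_tokens):
        # vis_tokens: (B, N, vis_dim) where N forms a square grid (e.g. 24x24)
        B, N, D = vis_tokens.shape
        side_in = int(N ** 0.5)
        x = vis_tokens.transpose(1, 2).view(B, D, side_in, side_in)
        x = self.pool(self.conv(x))                    # (B, D, side, side)
        return self.proj(x.flatten(2).transpose(1, 2)) # (B, out_tokens, llm_dim)
```

Changing `out_tokens` trades spatial detail for LLM context budget, which is exactly the flexibility-versus-efficiency lever described in the abstract.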
ISBN (print): 9798350365474
Image copy detection is one of the pivotal tools for safeguarding online information integrity. The challenge lies in determining whether a query image is an edited copy, which necessitates identifying candidate source images through a retrieval process. This process requires discriminative features comprising both global descriptors, designed to be augmentation-invariant, and local descriptors that capture salient foreground objects, to assess whether a query image is an edited copy of some source reference image. This work describes an end-to-end solution that leverages a vision Transformer model to learn such discriminative features and perform implicit matching between the query image and the reference image. Experimental results on two benchmark datasets demonstrate that the proposed solution outperforms state-of-the-art methods. Case studies illustrate the effectiveness of our approach in matching reference images from which the query images have been copy-edited.
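As a hedged sketch of how the two descriptor types could be combined at retrieval time: a global cosine similarity plus an average over best-matching local patches. The fusion weight and scoring scheme are assumptions for illustration, not the paper's implicit-matching head.

```python
import torch
import torch.nn.functional as F

def copy_score(q_cls, q_patches, r_cls, r_patches, w=0.5):
    """
    q_cls, r_cls         : (D,)   global descriptors of query / reference
    q_patches, r_patches : (N, D) local patch descriptors (e.g. ViT tokens)
    """
    global_sim = F.cosine_similarity(q_cls, r_cls, dim=0)
    # Local evidence: best reference match per query patch, averaged
    sim = F.normalize(q_patches, dim=-1) @ F.normalize(r_patches, dim=-1).t()
    local_sim = sim.max(dim=1).values.mean()
    return w * global_sim + (1 - w) * local_sim
```

The global term survives whole-image augmentations, while the local term can still fire when only a cropped or pasted foreground object is shared between the pair.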
ISBN (print): 9798350353013; 9798350353006
Automated driving fundamentally requires knowledge of the surrounding scene geometry. Modern approaches use only captured images to predict occupancy maps that represent this geometry. Training these approaches requires accurate data that may be acquired with the help of LiDAR scanners. We show that the techniques used by current benchmarks and training datasets to convert LiDAR scans into occupancy grid maps yield maps of very low quality, and we subsequently present a novel approach using evidence theory that yields more accurate reconstructions. We demonstrate that these are superior by a large margin, both qualitatively and quantitatively, and that we additionally obtain meaningful uncertainty estimates. When converting the occupancy maps back to depth estimates and comparing them with the raw LiDAR measurements, our method yields an MAE improvement of 30% to 52% on nuScenes and 53% on Waymo over other occupancy ground-truth data. Finally, we use the improved occupancy maps to train a state-of-the-art occupancy prediction method and demonstrate that this improves the MAE by 25% on nuScenes.
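For intuition, a minimal sketch of evidence-theory fusion on a single grid cell, using Dempster's rule over masses for free, occupied, and unknown; how rays are traced into cells, and the paper's exact evidence model, are simplified away.

```python
def combine(m1, m2):
    """Dempster's rule for masses m = (m_free, m_occupied, m_unknown)."""
    f1, o1, u1 = m1
    f2, o2, u2 = m2
    conflict = f1 * o2 + o1 * f2                     # contradictory evidence
    norm = 1.0 - conflict
    f = (f1 * f2 + f1 * u2 + u1 * f2) / norm
    o = (o1 * o2 + o1 * u2 + u1 * o2) / norm
    return (f, o, 1.0 - f - o)

# A cell seen as free by one scan and occupied by another stays genuinely
# uncertain instead of being forced toward 0 or 1:
print(combine((0.8, 0.0, 0.2), (0.0, 0.8, 0.2)))    # ~ (0.44, 0.44, 0.11)
```

The leftover "unknown" mass is what provides the meaningful uncertainty estimates the abstract mentions, something a plain probabilistic occupancy update collapses away.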
ISBN (print): 9798350365474
Detecting various types of stresses (nutritional, water, nitrogen, etc.) in agricultural fields is critical for farmers to ensure maximum productivity. However, stresses show up in different shapes and sizes across different crop types and varieties. Hence, this is posed as an anomaly detection task in agricultural images. Accurate anomaly detection in agricultural UAV images is vital for early identification of field irregularities. Traditional supervised learning faces challenges in adapting to diverse anomalies, necessitating extensive annotated data. In this work, we overcome this limitation with self-supervised learning using a masked image modeling approach. Masked Autoencoders (MAE) extract meaningful normal features from unlabeled image samples, which produces high reconstruction errors for abnormal pixels during reconstruction. To remove the need to train on only "normal" data, we use an anomaly suppression loss that effectively minimizes the reconstruction of anomalous pixels and allows the model to handle anomalous areas without explicitly separating "normal" images for training. Evaluation on the Agriculture-Vision data challenge shows a 6.3% mIoU improvement over the prior state of the art among unsupervised and self-supervised methods. A single model generalizes across all the anomaly categories in the Agri-Vision Challenge Dataset [5].
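A hedged sketch of an anomaly-suppression reconstruction loss in the spirit described above: pixels with outlier reconstruction error are assumed anomalous and dropped from the objective, so the autoencoder keeps modeling only typical field appearance even when anomalies are present in the training batch. The z-score gating is an illustrative choice, not the paper's exact mechanism.

```python
import torch

def suppressed_recon_loss(recon, target, tau=2.0):
    err = (recon - target) ** 2                      # per-pixel squared error
    z = (err - err.mean()) / (err.std() + 1e-6)      # standardized error
    weight = (z < tau).float()                       # gate out outlier pixels
    return (weight * err).sum() / weight.sum().clamp(min=1.0)
```

At inference, the unweighted per-pixel reconstruction error itself serves as the anomaly map, since the model has only learned to reproduce normal appearance.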
ISBN (print): 9798350365474
This paper discusses strategies for object detection in marine images from the perspective of a practitioner working with real-world, long-tail-distributed datasets and a large amount of additional unlabeled data on hand. The paper discusses the benefits of separating the localization and classification stages, making the case for robust localization through the amalgamation of additional datasets, inspired by an approach widely used by practitioners in the camera-trap literature. For the classification stage, the paper compares strategies for using the additional unlabeled data: supervised, iteratively supervised, self-supervised, and semi-supervised pre-training. Our findings reveal that semi-supervised pre-training, followed by supervised fine-tuning, yields significantly better balanced performance across the long-tail distribution, albeit occasionally with a trade-off in overall accuracy. These insights are validated through experiments on two real-world long-tailed underwater datasets collected by the Monterey Bay Aquarium Research Institute (MBARI).
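As a concrete reading of the semi-supervised recipe, here is a minimal FixMatch-style training step, a named stand-in rather than necessarily the paper's method: a supervised loss on the labeled batch plus a pseudo-label loss on confident unlabeled predictions. In practice weak/strong augmentation pairs would be used; `model` and the loaders are placeholders.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, x_lab, y_lab, x_unlab, threshold=0.95):
    # Supervised term on the small labeled long-tail set
    sup = F.cross_entropy(model(x_lab), y_lab)

    # Pseudo-label the unlabeled batch; keep only confident predictions
    with torch.no_grad():
        probs = F.softmax(model(x_unlab), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        keep = conf >= threshold
    if keep.any():
        unsup = F.cross_entropy(model(x_unlab[keep]), pseudo[keep])
    else:
        unsup = torch.zeros((), device=x_lab.device)
    return sup + unsup
```

Pre-training of this kind lets abundant unlabeled frames shape the representation before fine-tuning on the scarce tail classes, which is where the balanced-performance gains reported above come from.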
ISBN (print): 9798350365474
In this research, a novel approach for autonomous spacecraft navigation, particularly in lunar contexts, is presented, focusing on vision-based techniques. The system combines lunar crater recognition with feature tracking to enhance the accuracy of spacecraft navigation. It underwent comprehensive evaluation in a purpose-built software simulation that replicates lunar conditions, allowing thorough testing and refinement. The methodology integrates established navigational methods with modern artificial intelligence algorithms, resulting in significant navigational accuracy. The system determines the spacecraft position with an average accuracy of approximately 270 m in the absolute navigation mode, while the relative mode exhibits average errors of 27.4 m and 0.8 m in the horizontal and vertical lander displacements relative to the terrain. Initial tests on embedded systems, akin to those on board spacecraft, were also conducted. These tests are pivotal in demonstrating the system's operational viability within the limited-bandwidth and rapid-processing constraints characteristic of space missions. The promising results suggest potential applicability in real-world space missions, enhancing autonomous navigation capabilities in lunar and potentially other extraterrestrial environments.
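Assuming crater detection and catalog matching are done upstream, an absolute position fix of the kind described reduces to a standard perspective-n-point (PnP) solve, sketched below with OpenCV; the function name and frames are illustrative, not the paper's implementation.

```python
import numpy as np
import cv2

def absolute_fix(crater_px, crater_xyz, K):
    """
    crater_px  : (N, 2) image coordinates of matched crater centers
    crater_xyz : (N, 3) the same craters in a lunar-fixed frame, in meters
    K          : (3, 3) camera intrinsic matrix
    """
    ok, rvec, tvec = cv2.solvePnP(
        crater_xyz.astype(np.float64), crater_px.astype(np.float64),
        K.astype(np.float64), None, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("PnP failed; needs >= 4 well-spread matches")
    R, _ = cv2.Rodrigues(rvec)
    return (-R.T @ tvec).ravel()   # spacecraft position in the lunar-fixed frame
```

The relative mode would instead integrate frame-to-frame feature tracks, which explains why its horizontal and vertical displacement errors (27.4 m and 0.8 m) are far smaller than the roughly 270 m absolute fix.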